This is the start of the stable review cycle for the 5.13.12 release. There are 151 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 18 Aug 2021 12:54:12 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.13.12-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.13.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.13.12-rc1
Kuan-Ying Lee Kuan-Ying.Lee@mediatek.com kasan, slub: reset tag when printing address
Jeff Layton jlayton@kernel.org ceph: take snap_empty_lock atomically with snaprealm refcount change
Jeff Layton jlayton@kernel.org ceph: clean up locking annotation for ceph_get_snap_realm and __lookup_snap_realm
Jeff Layton jlayton@kernel.org ceph: add some lockdep assertions around snaprealm handling
Sean Christopherson seanjc@google.com KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock
Sean Christopherson seanjc@google.com KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs
Sean Christopherson seanjc@google.com KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF
Sean Christopherson seanjc@google.com KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulation
Zhen Lei thunder.leizhen@huawei.com locking/rtmutex: Use the correct rtmutex debugging config option
Ard Biesheuvel ardb@kernel.org efi/libstub: arm64: Double check image alignment at entry
Christophe Leroy christophe.leroy@csgroup.eu powerpc/32: Fix critical and debug interrupts on BOOKE
Cédric Le Goater clg@kaod.org powerpc/xive: Do not skip CPU-less nodes when creating the IPIs
Christophe Leroy christophe.leroy@csgroup.eu powerpc/smp: Fix OOPS in topology_init()
Christophe Leroy christophe.leroy@csgroup.eu powerpc/32s: Fix napping restore in data storage interrupt (DSI)
Laurent Dufour ldufour@linux.ibm.com powerpc/pseries: Fix update of LPAR security flavor after LPM
Christophe Leroy christophe.leroy@csgroup.eu powerpc/interrupt: Do not call single_step_exception() from other exceptions
Thomas Gleixner tglx@linutronix.de PCI/MSI: Protect msi_desc::masked for multi-MSI
Thomas Gleixner tglx@linutronix.de PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown()
Thomas Gleixner tglx@linutronix.de PCI/MSI: Correct misleading comments
Thomas Gleixner tglx@linutronix.de PCI/MSI: Do not set invalid bits in MSI mask
Thomas Gleixner tglx@linutronix.de PCI/MSI: Enforce MSI[X] entry updates to be visible
Thomas Gleixner tglx@linutronix.de PCI/MSI: Enforce that MSI-X table entry is masked for update
Thomas Gleixner tglx@linutronix.de PCI/MSI: Mask all unused MSI-X entries
Thomas Gleixner tglx@linutronix.de PCI/MSI: Enable and mask MSI-X early
Christophe Leroy christophe.leroy@csgroup.eu powerpc/interrupt: Fix OOPS by not calling do_IRQ() from timer_interrupt()
Ben Dai ben.dai@unisoc.com genirq/timings: Prevent potential array overflow in __irq_timings_store()
Bixuan Cui cuibixuan@huawei.com genirq/msi: Ensure deactivation on teardown
Babu Moger Babu.Moger@amd.com x86/resctrl: Fix default monitoring groups reporting
Thomas Gleixner tglx@linutronix.de x86/ioapic: Force affinity setup before startup
Thomas Gleixner tglx@linutronix.de x86/msi: Force affinity setup before startup
Thomas Gleixner tglx@linutronix.de genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP
Randy Dunlap rdunlap@infradead.org x86/tools: Fix objdump version check again
Dhananjay Phadke dphadke@linux.microsoft.com i2c: iproc: fix race between client unreg and tasklet
Pu Lehui pulehui@huawei.com powerpc/kprobes: Fix kprobe Oops happens in booke
Ard Biesheuvel ardb@kernel.org efi/libstub: arm64: Relax 2M alignment again for relocatable kernels
Ard Biesheuvel ardb@kernel.org efi/libstub: arm64: Force Image reallocation if BSS was not reserved
David Brazdil dbrazdil@google.com KVM: arm64: Fix off-by-one in range_is_memory
Benjamin Herrenschmidt benh@kernel.crashing.org arm64: efi: kaslr: Fix occasional random alloc (and boot) failure
Xie Yongji xieyongji@bytedance.com nbd: Aovid double completion of a request
Longpeng(Mike) longpeng2@huawei.com vsock/virtio: avoid potential deadlock when vsock device remove
Maximilian Heyne mheyne@amazon.de xen/events: Fix race in set_evtchn_to_irq
Matt Roper matthew.d.roper@intel.com drm/i915: Only access SFC_DONE when media domain is not fused off
Eric Dumazet edumazet@google.com net: igmp: increase size of mr_ifc_count
Neal Cardwell ncardwell@google.com tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets
Willy Tarreau w@1wt.eu net: linkwatch: fix failure to restore device state across suspend/resume
Yang Yingliang yangyingliang@huawei.com net: bridge: fix memleak in br_add_if()
Nikolay Aleksandrov nikolay@nvidia.com net: bridge: fix flags interpretation for extern learn fdb entries
Andre Przywara andre.przywara@arm.com pinctrl: sunxi: Don't underestimate number of functions
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: sja1105: fix broken backpressure in .port_fdb_dump
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: lantiq: fix broken backpressure in .port_fdb_dump
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: lan9303: fix broken backpressure in .port_fdb_dump
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump
Eric Dumazet edumazet@google.com net: igmp: fix data-race in igmp_ifc_timer_expire()
Takeshi Misawa jeliantsurux@gmail.com net: Fix memory leak in ieee802154_raw_deliver
Ben Hutchings ben.hutchings@mind.be net: dsa: microchip: ksz8795: Don't use phy_port_cnt in VLAN table lookup
Ben Hutchings ben.hutchings@mind.be net: dsa: microchip: ksz8795: Fix VLAN filtering
Ben Hutchings ben.hutchings@mind.be net: dsa: microchip: ksz8795: Use software untagging on CPU port
Ben Hutchings ben.hutchings@mind.be net: dsa: microchip: ksz8795: Fix VLAN untagged flag change on deletion
Ben Hutchings ben.hutchings@mind.be net: dsa: microchip: ksz8795: Reject unsupported VLAN configuration
Ben Hutchings ben.hutchings@mind.be net: dsa: microchip: ksz8795: Fix PVID tag insertion
Ben Hutchings ben.hutchings@mind.be net: dsa: microchip: Fix ksz_read64()
Yonghong Song yhs@fb.com bpf: Fix potentially incorrect results with bpf_get_local_storage()
Miklos Szeredi mszeredi@redhat.com ovl: fix deadlock in splice write
Christian Hewitt christianshewitt@gmail.com drm/meson: fix colour distortion from HDR set during vendor u-boot
Aya Levin ayal@nvidia.com net/mlx5: Fix return value from tracer initialization
Shay Drory shayd@nvidia.com net/mlx5: Synchronize correct IRQ when destroying CQ
Chris Mi cmi@nvidia.com net/mlx5e: TC, Fix error handling memory leak
Aya Levin ayal@nvidia.com net/mlx5: Block switchdev mode while devlink traps are active
Maxim Mikityanskiy maximmi@nvidia.com net/mlx5e: Destroy page pool after XDP SQ to fix use-after-free
Roi Dayan roid@nvidia.com net/mlx5e: Avoid creating tunnel headers for local route
Alex Vesker valex@nvidia.com net/mlx5: DR, Add fail on error check on decap
Leon Romanovsky leon@kernel.org net/mlx5: Don't skip subfunction cleanup in case of error in module init
Hao Xu haoxu@linux.alibaba.com io-wq: fix IO_WORKER_F_FIXED issue in create_io_worker()
Hao Xu haoxu@linux.alibaba.com io-wq: fix bug of creating io-wokers unconditionally
Guillaume Nault gnault@redhat.com bareudp: Fix invalid read beyond skb's linear data
Roi Dayan roid@nvidia.com psample: Add a fwd declaration for skbuff
Md Fahad Iqbal Polash md.fahad.iqbal.polash@intel.com iavf: Set RSS LUT and key in reset handle path
Brett Creeley brett.creeley@intel.com ice: don't remove netdev->dev_addr from uc sync list
Anirudh Venkataramanan anirudh.venkataramanan@intel.com ice: Stop processing VF messages during teardown
Anirudh Venkataramanan anirudh.venkataramanan@intel.com ice: Prevent probing virtual functions
Hangbin Liu liuhangbin@gmail.com net: sched: act_mirred: Reset ct info when mirror/redirect skb
Guvenc Gulce guvenc@linux.ibm.com net/smc: Correct smc link connection counter in case of smc client
Karsten Graul kgraul@linux.ibm.com net/smc: fix wait on already cleared link
Nadav Amit namit@vmware.com io_uring: clear TIF_NOTIFY_SIGNAL when running task work
Pali Rohár pali@kernel.org ppp: Fix generating ifname when empty IFLA_IFNAME is specified
Ben Hutchings ben.hutchings@mind.be net: phy: micrel: Fix link detection on ksz87xx switch"
Oleksij Rempel linux@rempel-privat.de net: dsa: qca: ar9331: make proper initial port defaults
Tatsuhiko Yasumatsu th.yasumatsu@gmail.com bpf: Fix integer overflow involving bucket_size
Daniel Xu dxu@dxuuu.xyz libbpf: Do not close un-owned FD 0 on errors
Robin Gögge r.goegge@googlemail.com libbpf: Fix probe for BPF_PROG_TYPE_CGROUP_SOCKOPT
Christophe JAILLET christophe.jaillet@wanadoo.fr drm/amd/pm: Fix a memory leak in an error handling path in 'vangogh_tables_init()'
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Apply mid ACK for small core
Hans de Goede hdegoede@redhat.com platform/x86: pcengines-apuv2: Add missing terminating entries to gpio-lookup tables
John Hubbard jhubbard@nvidia.com net: mvvp2: fix short frame size on s390
DENG Qingfang dqfext@gmail.com net: dsa: mt7530: add the missing RxUnicast MIB counter
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs42l42: Fix mono playback
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs42l42: Fix LRCLK frame start edge
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs42l42: PLL must be running when changing MCLK_SRC_SEL
Andy Shevchenko andriy.shevchenko@linux.intel.com pinctrl: tigerlake: Fix GPIO mapping for newer version of software
Yajun Deng yajun.deng@linux.dev netfilter: nf_conntrack_bridge: Fix memory leak when error
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs42l42: Remove duplicate control for WNF filter frequency
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs42l42: Fix inversion of ADC Notch Switch control
Guennadi Liakhovetski guennadi.liakhovetski@linux.intel.com ASoC: SOF: Intel: hda-ipc: fix reply size checking
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com ASoC: SOF: Intel: Kconfig: fix SoundWire dependencies
Tianjia Zhang tianjia.zhang@linux.alibaba.com selftests/sgx: Fix Q1 and Q2 calculation in sigstruct.c
Mike Tipton mdtipton@codeaurora.org interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs42l42: Fix bclk calculation for mono
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs42l42: Don't allow SND_SOC_DAIFMT_LEFT_J
Richard Fitzgerald rf@opensource.cirrus.com ASoC: cs42l42: Correct definition of ADC Volume control
Hsin-Yi Wang hsinyi@chromium.org pinctrl: mediatek: Fix fallback behavior for bias_set_combo
jason-jh.lin jason-jh.lin@mediatek.com drm/mediatek: Fix cursor plane no update
Dongliang Mu mudongliangabcd@gmail.com ieee802154: hwsim: fix GPF in hwsim_new_edge_nl
Dongliang Mu mudongliangabcd@gmail.com ieee802154: hwsim: fix GPF in hwsim_set_edge_lqi
Alex Deucher alexander.deucher@amd.com drm/amdgpu: handle VCN instances when harvesting (v2)
Alex Deucher alexander.deucher@amd.com drm/amdgpu: don't enable baco on boco platforms in runpm
Solomon Chiu solomon.chiu@amd.com drm/amdgpu: Add preferred mode in modeset when freesync video mode's enabled.
Anson Jacob Anson.Jacob@amd.com drm/amd/display: use GFP_ATOMIC in amdgpu_dm_irq_schedule_work
Eric Bernstein eric.bernstein@amd.com drm/amd/display: Remove invalid assert for ODM + MPC case
Ankit Nautiyal ankit.k.nautiyal@intel.com drm/i915/display: Fix the 12 BPC bits for PIPE_MISC reg
Zhenyu Wang zhenyuw@linux.intel.com drm/i915/gvt: Fix cached atomics setting for Windows VM
Nathan Chancellor nathan@kernel.org vmlinux.lds.h: Handle clang's module.{c,d}tor sections
Changbin Du changbin.du@intel.com riscv: kexec: do not add '-mno-relax' flag if compiler doesn't support it
Dan Williams dan.j.williams@intel.com libnvdimm/region: Fix label activation vs errors
Dan Williams dan.j.williams@intel.com ACPI: NFIT: Fix support for virtual SPA ranges
Damien Le Moal damien.lemoal@wdc.com pinctrl: k210: Fix k210_fpioa_probe()
Luis Henriques lhenriques@suse.de ceph: reduce contention in ceph_check_delayed_caps()
Vineet Gupta vgupta@synopsys.com ARC: fp: set FPU_STATUS.FWE to enable FPU_STATUS update on context switch
Grygorii Strashko grygorii.strashko@ti.com net: ethernet: ti: cpsw: fix min eth packet size for non-switch use-cases
Loic Poulain loic.poulain@linaro.org net: wwan: mhi_wwan_ctrl: Fix possible deadlock
Hsuan-Chi Kuo hsuanchikuo@gmail.com seccomp: Fix setting loaded filter count during TSYNC
Tejun Heo tj@kernel.org cgroup: rstat: fix A-A deadlock on 32bit around u64_stats_sync
Ewan D. Milne emilne@redhat.com scsi: lpfc: Move initialization of phba->poll_list earlier to avoid crash
Pavel Begunkov asml.silence@gmail.com io_uring: fix ctx-exit io_rsrc_put_work() deadlock
Jens Axboe axboe@kernel.dk io_uring: drop ctx->uring_lock before flushing work item
Ronnie Sahlberg lsahlber@redhat.com cifs: use the correct max-length for dentry_path_raw()
Rohith Surabattula rohiths@microsoft.com cifs: Call close synchronously during unlink/rename/lease break.
Shyam Prasad N sprasad@microsoft.com cifs: create sd context must be a multiple of 8
Rohith Surabattula rohiths@microsoft.com cifs: Handle race conditions during rename
Greg Kroah-Hartman gregkh@linuxfoundation.org i2c: dev: zero out array used for i2c reads from userspace
Takashi Iwai tiwai@suse.de ASoC: intel: atom: Fix reference to PCM buffer address
Takashi Iwai tiwai@suse.de ASoC: kirkwood: Fix reference to PCM buffer address
Mark Brown broonie@kernel.org ASoC: tlv320aic31xx: Fix jack detection after suspend
Takashi Iwai tiwai@suse.de ASoC: uniphier: Fix reference to PCM buffer address
Takashi Iwai tiwai@suse.de ASoC: xilinx: Fix reference to PCM buffer address
Takashi Iwai tiwai@suse.de ASoC: amd: Fix reference to PCM buffer address
Colin Ian King colin.king@canonical.com iio: adc: Fix incorrect exit of for-loop
Chris Lesiak chris.lesiak@licor.com iio: humidity: hdc100x: Add margin to the conversion time
Antti Keränen detegr@rbx.email iio: adis: set GPIO reset pin direction
Uwe Kleine-König u.kleine-koenig@pengutronix.de iio: adc: ti-ads7950: Ensure CS is deasserted after reading channels
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "usb: dwc3: gadget: Use list_replace_init() before traversing lists"
Liang Wang wangliang101@huawei.com lib: use PFN_PHYS() in devmem_is_allowed()
-------------
Diffstat:
Documentation/virt/kvm/locking.rst | 8 +-
Makefile | 4 +-
arch/arc/kernel/fpu.c | 9 +-
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 2 +-
arch/powerpc/include/asm/interrupt.h | 3 +
arch/powerpc/include/asm/irq.h | 2 +-
arch/powerpc/include/asm/ptrace.h | 16 +++
arch/powerpc/kernel/asm-offsets.c | 31 +++--
arch/powerpc/kernel/head_book3s_32.S | 2 +-
arch/powerpc/kernel/head_booke.h | 27 +----
arch/powerpc/kernel/irq.c | 7 +-
arch/powerpc/kernel/kprobes.c | 3 +-
arch/powerpc/kernel/sysfs.c | 2 +-
arch/powerpc/kernel/time.c | 2 +-
arch/powerpc/kernel/traps.c | 9 +-
arch/powerpc/platforms/pseries/setup.c | 5 +-
arch/powerpc/sysdev/xive/common.c | 35 ++--
arch/riscv/kernel/Makefile | 2 +-
arch/x86/events/intel/core.c | 23 ++--
arch/x86/events/perf_event.h | 15 +++
arch/x86/include/asm/kvm_host.h | 7 ++
arch/x86/kernel/apic/io_apic.c | 6 +-
arch/x86/kernel/apic/msi.c | 11 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 27 +++--
arch/x86/kernel/hpet.c | 2 +-
arch/x86/kvm/mmu/mmu.c | 28 +++++
arch/x86/kvm/mmu/tdp_mmu.c | 26 +++--
arch/x86/kvm/vmx/nested.c | 3 +-
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/tools/chkobjdump.awk | 1 +
block/blk-cgroup.c | 14 ++-
drivers/acpi/nfit/core.c | 3 +
drivers/base/core.c | 1 +
drivers/block/nbd.c | 14 ++-
drivers/firmware/efi/libstub/arm64-stub.c | 69 ++++++++--
drivers/firmware/efi/libstub/randomalloc.c | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 12 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 7 +-
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c | 2 +-
.../gpu/drm/amd/display/dc/dcn30/dcn30_resource.c | 1 -
drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 2 +-
drivers/gpu/drm/i915/display/intel_display.c | 34 ++--
drivers/gpu/drm/i915/gvt/handlers.c | 1 +
drivers/gpu/drm/i915/gvt/mmio_context.c | 2 +
drivers/gpu/drm/i915/i915_gpu_error.c | 19 +++-
drivers/gpu/drm/i915/i915_reg.h | 16 ++-
drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 3 -
drivers/gpu/drm/mediatek/mtk_drm_plane.c | 60 +++++-----
drivers/gpu/drm/meson/meson_registers.h | 5 +
drivers/gpu/drm/meson/meson_viu.c | 7 +-
drivers/i2c/busses/i2c-bcm-iproc.c | 4 +-
drivers/i2c/i2c-dev.c | 5 +-
drivers/iio/adc/palmas_gpadc.c | 4 +-
drivers/iio/adc/ti-ads7950.c | 1 -
drivers/iio/humidity/hdc100x.c | 6 +-
drivers/iio/imu/adis.c | 3 +-
drivers/infiniband/hw/mlx5/cq.c | 4 +-
drivers/infiniband/hw/mlx5/devx.c | 3 +-
drivers/interconnect/qcom/icc-rpmh.c | 10 +-
drivers/net/bareudp.c | 16 ++-
drivers/net/dsa/hirschmann/hellcreek.c | 7 +-
drivers/net/dsa/lan9303-core.c | 34 +++---
drivers/net/dsa/lantiq_gswip.c | 14 ++-
drivers/net/dsa/microchip/ksz8795.c | 82 +++++++++---
drivers/net/dsa/microchip/ksz8795_reg.h | 4 +
drivers/net/dsa/microchip/ksz_common.h | 9 +-
drivers/net/dsa/mt7530.c | 1 +
drivers/net/dsa/qca/ar9331.c | 73 +++++++++-
drivers/net/dsa/sja1105/sja1105_main.c | 4 +-
drivers/net/ethernet/intel/iavf/iavf_main.c | 13 ++-
drivers/net/ethernet/intel/ice/ice.h | 1 +
drivers/net/ethernet/intel/ice/ice_main.c | 28 +++--
drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 7 ++
drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/cq.c | 1 +
.../ethernet/mellanox/mlx5/core/diag/fw_tracer.c | 11 +-
.../net/ethernet/mellanox/mlx5/core/en/tc_tun.c | 5 +
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 33 ++----
drivers/net/ethernet/mellanox/mlx5/core/eq.c | 20 +++-
.../net/ethernet/mellanox/mlx5/core/esw/sample.c | 1 +
.../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 14 ++-
.../net/ethernet/mellanox/mlx5/core/fpga/conn.c | 4 +-
drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h | 2 +
drivers/net/ethernet/mellanox/mlx5/core/main.c | 12 +-
.../net/ethernet/mellanox/mlx5/core/mlx5_core.h | 5 +
.../ethernet/mellanox/mlx5/core/steering/dr_send.c | 4 +-
.../mellanox/mlx5/core/steering/dr_ste_v0.c | 2 +
drivers/net/ethernet/ti/cpsw_new.c | 7 +-
drivers/net/ethernet/ti/cpsw_priv.h | 4 +-
drivers/net/ieee802154/mac802154_hwsim.c | 6 +-
drivers/net/phy/micrel.c | 2 -
drivers/net/ppp/ppp_generic.c | 2 +-
drivers/net/wwan/mhi_wwan_ctrl.c | 12 +-
drivers/nvdimm/namespace_devs.c | 17 ++-
drivers/pci/msi.c | 125 +++++++++++--------
drivers/pinctrl/intel/pinctrl-tigerlake.c | 26 ++---
drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c | 8 +-
drivers/pinctrl/pinctrl-k210.c | 26 ++++-
drivers/pinctrl/sunxi/pinctrl-sunxi.c | 8 +-
drivers/platform/x86/pcengines-apuv2.c | 2 +
drivers/scsi/lpfc/lpfc_init.c | 3 +-
drivers/usb/dwc3/gadget.c | 18 +--
drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +-
drivers/xen/events/events_base.c | 20 +++-
fs/ceph/caps.c | 17 ++-
fs/ceph/mds_client.c | 25 +++--
fs/ceph/snap.c | 54 +++++---
fs/ceph/super.h | 2 +-
fs/cifs/cifsglob.h | 5 +
fs/cifs/dir.c | 2 +-
fs/cifs/file.c | 35 +++---
fs/cifs/inode.c | 19 +++-
fs/cifs/misc.c | 50 +++++--
fs/cifs/smb2pdu.c | 2 +-
fs/io-wq.c | 26 +++--
fs/io_uring.c | 26 +++--
fs/overlayfs/file.c | 47 +++++--
include/asm-generic/vmlinux.lds.h | 1 +
include/linux/bpf-cgroup.h | 4 +-
include/linux/device.h | 1 +
include/linux/inetdevice.h | 2 +-
include/linux/irq.h | 2 +
include/linux/mlx5/driver.h | 3 +-
include/linux/msi.h | 2 +-
include/net/psample.h | 2 +
include/uapi/linux/neighbour.h | 7 +-
kernel/bpf/hashtab.c | 4 +-
kernel/bpf/helpers.c | 4 +-
kernel/cgroup/rstat.c | 19 ++--
kernel/irq/chip.c | 5 +-
kernel/irq/msi.c | 13 ++-
kernel/irq/timings.c | 5 +
kernel/locking/rtmutex.c | 2 +-
kernel/seccomp.c | 2 +-
lib/devmem_is_allowed.c | 2 +-
mm/slub.c | 4 +-
net/bridge/br.c | 3 +-
net/bridge/br_fdb.c | 11 +-
net/bridge/br_if.c | 2 +
net/bridge/br_private.h | 2 +-
net/bridge/netfilter/nf_conntrack_bridge.c | 6 +
net/core/link_watch.c | 5 +-
net/ieee802154/socket.c | 7 +-
net/ipv4/igmp.c | 21 ++--
net/ipv4/tcp_bbr.c | 2 +-
net/sched/act_mirred.c | 3 +
net/smc/af_smc.c | 2 +-
net/smc/smc_core.c | 4 +-
net/smc/smc_core.h | 4 +
net/smc/smc_llc.c | 10 +-
net/smc/smc_tx.c | 18 ++-
net/smc/smc_wr.c | 10 ++
net/vmw_vsock/virtio_transport.c | 7 +-
sound/soc/amd/acp-pcm-dma.c | 2 +-
sound/soc/amd/raven/acp3x-pcm-dma.c | 2 +-
sound/soc/amd/renoir/acp3x-pdm-dma.c | 2 +-
sound/soc/codecs/cs42l42.c | 83 ++++++------
sound/soc/codecs/cs42l42.h | 3 +
sound/soc/codecs/tlv320aic31xx.c | 10 ++
sound/soc/intel/atom/sst-mfld-platform-pcm.c | 3 +-
sound/soc/kirkwood/kirkwood-dma.c | 26 +++--
sound/soc/sof/intel/Kconfig | 4 +-
sound/soc/sof/intel/hda-ipc.c | 4 +-
sound/soc/uniphier/aio-dma.c | 2 +-
sound/soc/xilinx/xlnx_formatter_pcm.c | 4 +-
tools/lib/bpf/btf.c | 3 +-
tools/lib/bpf/libbpf_probes.c | 4 +-
tools/testing/selftests/sgx/sigstruct.c | 41 +++----
169 files changed, 1382 insertions(+), 656 deletions(-)
From: Liang Wang wangliang101@huawei.com
commit 854f32648b8a5e424d682953b1a9f3b7c3322701 upstream.
The physical address may exceed 32 bits on 32-bit systems with more than 32 bits of physical address. Use PFN_PHYS() in devmem_is_allowed(); otherwise the physical address may overflow and be truncated.
We found this bug when mapping a high address through the devmem tool: when CONFIG_STRICT_DEVMEM is enabled on ARM with ARM_LPAE and devmem is used to map a high address that is not in the iomem address range, an unexpected error indicating no permission is returned.
This bug was initially introduced in v2.6.37, and the function was moved to lib in v5.11.
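Purely as an illustration of the truncation (not part of the patch; the types and values below are made up to mimic a 32-bit kernel with LPAE):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
typedef uint32_t kernel_ulong;	/* stands in for unsigned long on a 32-bit kernel */
typedef uint64_t phys_addr;	/* stands in for phys_addr_t with LPAE enabled */

int main(void)
{
	kernel_ulong pfn = 0x140000;			/* page frame above the 4 GiB boundary */
	kernel_ulong shifted = pfn << PAGE_SHIFT;	/* 32-bit shift truncates to 0x40000000 */
	phys_addr phys = (phys_addr)pfn << PAGE_SHIFT;	/* what PFN_PHYS() effectively does */

	printf("truncated: 0x%x, correct: 0x%llx\n", shifted, (unsigned long long)phys);
	return 0;
}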
Link: https://lkml.kernel.org/r/20210731025057.78825-1-wangliang101@huawei.com Fixes: 087aaffcdf9c ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem") Fixes: 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()") Signed-off-by: Liang Wang wangliang101@huawei.com Reviewed-by: Luis Chamberlain mcgrof@kernel.org Cc: Palmer Dabbelt palmerdabbelt@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Russell King linux@armlinux.org.uk Cc: Liang Wang wangliang101@huawei.com Cc: Xiaoming Ni nixiaoming@huawei.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: stable@vger.kernel.org [2.6.37+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/devmem_is_allowed.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/lib/devmem_is_allowed.c +++ b/lib/devmem_is_allowed.c @@ -19,7 +19,7 @@ */ int devmem_is_allowed(unsigned long pfn) { - if (iomem_is_exclusive(pfn << PAGE_SHIFT)) + if (iomem_is_exclusive(PFN_PHYS(pfn))) return 0; if (!page_is_ram(pfn)) return 1;
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit 664cc971fb259007e49cc8a3ac43b0787d89443f upstream.
This reverts commit d25d85061bd856d6be221626605319154f9b5043 as it is reported to cause problems on many different types of boards.
Reported-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Reported-by: John Stultz john.stultz@linaro.org Cc: Ray Chi raychi@google.com Link: https://lore.kernel.org/r/CANcMJZCEVxVLyFgLwK98hqBEdc0_n4P0x_K6Gih8zNH3ouzbJ... Fixes: d25d85061bd8 ("usb: dwc3: gadget: Use list_replace_init() before traversing lists") Cc: stable stable@vger.kernel.org Cc: Felipe Balbi balbi@kernel.org Cc: Wesley Cheng wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/gadget.c | 18 ++---------------- 1 file changed, 2 insertions(+), 16 deletions(-)
--- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -1741,13 +1741,9 @@ static void dwc3_gadget_ep_cleanup_cance { struct dwc3_request *req; struct dwc3_request *tmp; - struct list_head local; struct dwc3 *dwc = dep->dwc;
-restart: - list_replace_init(&dep->cancelled_list, &local); - - list_for_each_entry_safe(req, tmp, &local, list) { + list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) { dwc3_gadget_ep_skip_trbs(dep, req); switch (req->status) { case DWC3_REQUEST_STATUS_DISCONNECTED: @@ -1765,9 +1761,6 @@ restart: break; } } - - if (!list_empty(&dep->cancelled_list)) - goto restart; }
static int dwc3_gadget_ep_dequeue(struct usb_ep *ep, @@ -2963,12 +2956,8 @@ static void dwc3_gadget_ep_cleanup_compl { struct dwc3_request *req; struct dwc3_request *tmp; - struct list_head local;
-restart: - list_replace_init(&dep->started_list, &local); - - list_for_each_entry_safe(req, tmp, &local, list) { + list_for_each_entry_safe(req, tmp, &dep->started_list, list) { int ret;
ret = dwc3_gadget_ep_cleanup_completed_request(dep, event, @@ -2976,9 +2965,6 @@ restart: if (ret) break; } - - if (!list_empty(&dep->started_list)) - goto restart; }
static bool dwc3_gadget_ep_should_continue(struct dwc3_ep *dep)
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
commit 9898cb24e454602beb6e17bacf9f97b26c85c955 upstream.
The ADS7950 requires that CS is deasserted after each SPI word. Before commit e2540da86ef8 ("iio: adc: ti-ads7950: use SPI_CS_WORD to reduce CPU usage") the driver used a message with one spi transfer per channel where each but the last one had .cs_change set to enforce a CS toggle. This was wrongly translated into a message with a single transfer and .cs_change set which results in a CS toggle after each word but the last which corrupts the first adc conversion of all readouts after the first readout.
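As a rough sketch of the message layout described above (illustrative only, with invented names; the actual driver code is more involved), the pre-SPI_CS_WORD approach builds one transfer per channel and sets .cs_change on every transfer except the last:

#include <linux/spi/spi.h>

/* Sketch: read nchan 16-bit words with CS deasserted between words. */
static int example_read_channels(struct spi_device *spi, __be16 *rx, int nchan)
{
	struct spi_transfer xfers[8] = { };	/* assumes nchan <= 8 */
	struct spi_message msg;
	int i;

	spi_message_init(&msg);
	for (i = 0; i < nchan; i++) {
		xfers[i].rx_buf = &rx[i];
		xfers[i].len = 2;
		/* toggle CS after every word except the final one */
		xfers[i].cs_change = (i != nchan - 1);
		spi_message_add_tail(&xfers[i], &msg);
	}
	return spi_sync(spi, &msg);
}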
Fixes: e2540da86ef8 ("iio: adc: ti-ads7950: use SPI_CS_WORD to reduce CPU usage") Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Reviewed-by: David Lechner david@lechnology.com Tested-by: David Lechner david@lechnology.com Cc: Stable@vger.kernel.org Link: https://lore.kernel.org/r/20210709101110.1814294-1-u.kleine-koenig@pengutron... Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/adc/ti-ads7950.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/iio/adc/ti-ads7950.c +++ b/drivers/iio/adc/ti-ads7950.c @@ -568,7 +568,6 @@ static int ti_ads7950_probe(struct spi_d st->ring_xfer.tx_buf = &st->tx_buf[0]; st->ring_xfer.rx_buf = &st->rx_buf[0]; /* len will be set later */ - st->ring_xfer.cs_change = true;
spi_message_add_tail(&st->ring_xfer, &st->ring_msg);
From: Antti Keränen detegr@rbx.email
commit 7e77ef8b8d600cf8448a2bbd32f682c28884551f upstream.
Set reset pin direction to output as the reset pin needs to be an active low output pin.
Co-developed-by: Hannu Hartikainen hannu@hrtk.in Signed-off-by: Hannu Hartikainen hannu@hrtk.in Signed-off-by: Antti Keränen detegr@rbx.email Reviewed-by: Nuno Sá nuno.sa@analog.com Fixes: ecb010d44108 ("iio: imu: adis: Refactor adis_initial_startup") Link: https://lore.kernel.org/r/20210708095425.13295-1-detegr@rbx.email Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/imu/adis.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/iio/imu/adis.c +++ b/drivers/iio/imu/adis.c @@ -415,12 +415,11 @@ int __adis_initial_startup(struct adis * int ret;
/* check if the device has rst pin low */ - gpio = devm_gpiod_get_optional(&adis->spi->dev, "reset", GPIOD_ASIS); + gpio = devm_gpiod_get_optional(&adis->spi->dev, "reset", GPIOD_OUT_HIGH); if (IS_ERR(gpio)) return PTR_ERR(gpio);
if (gpio) { - gpiod_set_value_cansleep(gpio, 1); msleep(10); /* bring device out of reset */ gpiod_set_value_cansleep(gpio, 0);
From: Chris Lesiak chris.lesiak@licor.com
commit 84edec86f449adea9ee0b4912a79ab8d9d65abb7 upstream.
The datasheets have the following note for the conversion time specification: "This parameter is specified by design and/or characterization and it is not tested in production."
Parts have been seen that require more time to do 14-bit conversions for the relative humidity channel. The result is ENXIO due to the address phase of a transfer not getting an ACK.
Delay an additional 1 ms per conversion to allow for additional margin.
Fixes: 4839367d99e3 ("iio: humidity: add HDC100x support") Signed-off-by: Chris Lesiak chris.lesiak@licor.com Acked-by: Matt Ranostay matt.ranostay@konsulko.com Link: https://lore.kernel.org/r/20210614141820.2034827-1-chris.lesiak@licor.com Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/humidity/hdc100x.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/iio/humidity/hdc100x.c +++ b/drivers/iio/humidity/hdc100x.c @@ -25,6 +25,8 @@ #include <linux/iio/trigger_consumer.h> #include <linux/iio/triggered_buffer.h>
+#include <linux/time.h> + #define HDC100X_REG_TEMP 0x00 #define HDC100X_REG_HUMIDITY 0x01
@@ -166,7 +168,7 @@ static int hdc100x_get_measurement(struc struct iio_chan_spec const *chan) { struct i2c_client *client = data->client; - int delay = data->adc_int_us[chan->address]; + int delay = data->adc_int_us[chan->address] + 1*USEC_PER_MSEC; int ret; __be16 val;
@@ -316,7 +318,7 @@ static irqreturn_t hdc100x_trigger_handl struct iio_dev *indio_dev = pf->indio_dev; struct hdc100x_data *data = iio_priv(indio_dev); struct i2c_client *client = data->client; - int delay = data->adc_int_us[0] + data->adc_int_us[1]; + int delay = data->adc_int_us[0] + data->adc_int_us[1] + 2*USEC_PER_MSEC; int ret;
/* dual read starts at temp register */
From: Colin Ian King colin.king@canonical.com
commit 5afc1540f13804a31bb704b763308e17688369c5 upstream.
Currently the for-loop that scans for the optimal adc_period iterates through all the possible adc_period levels because the exit logic in the loop is inverted. I believe the comparison should be swapped and the continue replaced with a break to exit the loop at the correct point.
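A stand-alone sketch of the intended selection (hypothetical helper, not the driver function itself): pick the first step whose period is at least the requested adc_period, then apply the driver's existing adjustment.

/* Sketch: choose the smallest step i whose period covers adc_period. */
static int pick_adc_period_index(int adc_period)
{
	int i;

	for (i = 0; i < 16; ++i) {
		if (((1000 * (1 << i)) / 32) >= adc_period)
			break;		/* first step that is long enough */
	}
	if (i > 0)
		i--;			/* mirrors the driver's final adjustment */
	return i;
}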
Addresses-Coverity: ("Continue has no effect") Fixes: e08e19c331fb ("iio:adc: add iio driver for Palmas (twl6035/7) gpadc") Signed-off-by: Colin Ian King colin.king@canonical.com Link: https://lore.kernel.org/r/20210730071651.17394-1-colin.king@canonical.com Cc: stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/adc/palmas_gpadc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/iio/adc/palmas_gpadc.c +++ b/drivers/iio/adc/palmas_gpadc.c @@ -664,8 +664,8 @@ static int palmas_adc_wakeup_configure(s
adc_period = adc->auto_conversion_period; for (i = 0; i < 16; ++i) { - if (((1000 * (1 << i)) / 32) < adc_period) - continue; + if (((1000 * (1 << i)) / 32) >= adc_period) + break; } if (i > 0) i--;
From: Takashi Iwai tiwai@suse.de
commit 8b5d95313b6d30f642e4ed0125891984c446604e upstream.
PCM buffers might be allocated dynamically when the buffer preallocation failed or a larger buffer is requested, and it's not guaranteed that substream->dma_buffer points to the actually used buffer. The driver needs to refer to substream->runtime->dma_addr instead for the buffer address.
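The same rule applies to the related PCM fixes further down in this series. As a generic, hedged sketch (not any particular driver's code), a hw_params callback should take the address from the runtime:

#include <sound/pcm.h>
#include <sound/pcm_params.h>
#include <sound/soc.h>

/* Sketch: use the DMA address of the buffer actually in use. */
static int example_hw_params(struct snd_soc_component *component,
			     struct snd_pcm_substream *substream,
			     struct snd_pcm_hw_params *params)
{
	struct snd_pcm_runtime *runtime = substream->runtime;
	dma_addr_t buf = runtime->dma_addr;	/* not substream->dma_buffer.addr */
	size_t size = params_buffer_bytes(params);

	/* program the DMA engine with buf/size here */
	pr_debug("buffer %pad, %zu bytes\n", &buf, size);
	return 0;
}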
Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Link: https://lore.kernel.org/r/20210731084331.32225-1-tiwai@suse.de Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/amd/acp-pcm-dma.c | 2 +- sound/soc/amd/raven/acp3x-pcm-dma.c | 2 +- sound/soc/amd/renoir/acp3x-pdm-dma.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-)
--- a/sound/soc/amd/acp-pcm-dma.c +++ b/sound/soc/amd/acp-pcm-dma.c @@ -969,7 +969,7 @@ static int acp_dma_hw_params(struct snd_
acp_set_sram_bank_state(rtd->acp_mmio, 0, true); /* Save for runtime private data */ - rtd->dma_addr = substream->dma_buffer.addr; + rtd->dma_addr = runtime->dma_addr; rtd->order = get_order(size);
/* Fill the page table entries in ACP SRAM */ --- a/sound/soc/amd/raven/acp3x-pcm-dma.c +++ b/sound/soc/amd/raven/acp3x-pcm-dma.c @@ -286,7 +286,7 @@ static int acp3x_dma_hw_params(struct sn pr_err("pinfo failed\n"); } size = params_buffer_bytes(params); - rtd->dma_addr = substream->dma_buffer.addr; + rtd->dma_addr = substream->runtime->dma_addr; rtd->num_pages = (PAGE_ALIGN(size) >> PAGE_SHIFT); config_acp3x_dma(rtd, substream->stream); return 0; --- a/sound/soc/amd/renoir/acp3x-pdm-dma.c +++ b/sound/soc/amd/renoir/acp3x-pdm-dma.c @@ -246,7 +246,7 @@ static int acp_pdm_dma_hw_params(struct return -EINVAL; size = params_buffer_bytes(params); period_bytes = params_period_bytes(params); - rtd->dma_addr = substream->dma_buffer.addr; + rtd->dma_addr = substream->runtime->dma_addr; rtd->num_pages = (PAGE_ALIGN(size) >> PAGE_SHIFT); config_acp_dma(rtd, substream->stream); init_pdm_ring_buffer(MEM_WINDOW_START, size, period_bytes,
From: Takashi Iwai tiwai@suse.de
commit 42bc62c9f1d3d4880bdc27acb5ab4784209bb0b0 upstream.
PCM buffers might be allocated dynamically when the buffer preallocation failed or a larger buffer is requested, and it's not guaranteed that substream->dma_buffer points to the actually used buffer. The driver needs to refer to substream->runtime->dma_addr instead for the buffer address.
Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Link: https://lore.kernel.org/r/20210728112353.6675-4-tiwai@suse.de Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/xilinx/xlnx_formatter_pcm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/sound/soc/xilinx/xlnx_formatter_pcm.c +++ b/sound/soc/xilinx/xlnx_formatter_pcm.c @@ -452,8 +452,8 @@ static int xlnx_formatter_pcm_hw_params(
stream_data->buffer_size = size;
- low = lower_32_bits(substream->dma_buffer.addr); - high = upper_32_bits(substream->dma_buffer.addr); + low = lower_32_bits(runtime->dma_addr); + high = upper_32_bits(runtime->dma_addr); writel(low, stream_data->mmio + XLNX_AUD_BUFF_ADDR_LSB); writel(high, stream_data->mmio + XLNX_AUD_BUFF_ADDR_MSB);
From: Takashi Iwai tiwai@suse.de
commit 827f3164aaa579eee6fd50c6654861d54f282a11 upstream.
Along with the transition to the managed PCM buffers, the driver now accepts a dynamically allocated buffer, but it still kept the reference to the old preallocated buffer address. This patch corrects it to use the right reference via runtime->dma_addr.
(Although this might have already been buggy before the cleanup with the managed buffer, let's put a Fixes tag pointing to that; it's a corner case, after all.)
Fixes: d55894bc2763 ("ASoC: uniphier: Use managed buffer allocation") Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Link: https://lore.kernel.org/r/20210728112353.6675-5-tiwai@suse.de Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/uniphier/aio-dma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/uniphier/aio-dma.c +++ b/sound/soc/uniphier/aio-dma.c @@ -198,7 +198,7 @@ static int uniphier_aiodma_mmap(struct s vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
return remap_pfn_range(vma, vma->vm_start, - substream->dma_buffer.addr >> PAGE_SHIFT, + substream->runtime->dma_addr >> PAGE_SHIFT, vma->vm_end - vma->vm_start, vma->vm_page_prot); }
From: Mark Brown broonie@kernel.org
commit 2c39ca6885a2ec03e5c9e7c12a4da2aa8926605a upstream.
The tlv320aic31xx driver relies on regcache_sync() to restore the register contents after going to _BIAS_OFF, for example during system suspend. This does not work for the jack detection configuration, since that is configured via the same register that status is read back from, so the register is volatile and not cached. This can also cause issues during init if the jack detection ends up getting set up before the CODEC is initially brought out of _BIAS_OFF, because we reset the CODEC and resync the cache as part of that process.
Fix this by explicitly reapplying the jack detection configuration after resyncing the register cache during power on.
This issue was found by an engineer working off-list on a product kernel; I just wrote up the upstream fix.
Signed-off-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20210723180200.25105-1-broonie@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/codecs/tlv320aic31xx.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/sound/soc/codecs/tlv320aic31xx.c +++ b/sound/soc/codecs/tlv320aic31xx.c @@ -35,6 +35,9 @@
#include "tlv320aic31xx.h"
+static int aic31xx_set_jack(struct snd_soc_component *component, + struct snd_soc_jack *jack, void *data); + static const struct reg_default aic31xx_reg_defaults[] = { { AIC31XX_CLKMUX, 0x00 }, { AIC31XX_PLLPR, 0x11 }, @@ -1256,6 +1259,13 @@ static int aic31xx_power_on(struct snd_s return ret; }
+ /* + * The jack detection configuration is in the same register + * that is used to report jack detect status so is volatile + * and not covered by the cache sync, restore it separately. + */ + aic31xx_set_jack(component, aic31xx->jack, NULL); + return 0; }
From: Takashi Iwai tiwai@suse.de
commit bb6a40fc5a830cae45ddd5cd6cfa151b008522ed upstream.
The transition to the managed PCM buffers allowed dynamic buffer allocation, while the driver code still assumes a fixed preallocated buffer and sets up the DMA stuff at the open call. This needs to be moved to hw_params after the buffer allocation and setup. Also, the reference to the buffer address has to be corrected to runtime->dma_addr.
Fixes: b3c0ae75f5d3 ("ASoC: kirkwood: Use managed DMA buffer allocation") Cc: Lars-Peter Clausen lars@metafoo.de Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Link: https://lore.kernel.org/r/20210728112353.6675-6-tiwai@suse.de Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/kirkwood/kirkwood-dma.c | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-)
--- a/sound/soc/kirkwood/kirkwood-dma.c +++ b/sound/soc/kirkwood/kirkwood-dma.c @@ -104,8 +104,6 @@ static int kirkwood_dma_open(struct snd_ int err; struct snd_pcm_runtime *runtime = substream->runtime; struct kirkwood_dma_data *priv = kirkwood_priv(substream); - const struct mbus_dram_target_info *dram; - unsigned long addr;
snd_soc_set_runtime_hwparams(substream, &kirkwood_dma_snd_hw);
@@ -142,20 +140,14 @@ static int kirkwood_dma_open(struct snd_ writel((unsigned int)-1, priv->io + KIRKWOOD_ERR_MASK); }
- dram = mv_mbus_dram_info(); - addr = substream->dma_buffer.addr; if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) { if (priv->substream_play) return -EBUSY; priv->substream_play = substream; - kirkwood_dma_conf_mbus_windows(priv->io, - KIRKWOOD_PLAYBACK_WIN, addr, dram); } else { if (priv->substream_rec) return -EBUSY; priv->substream_rec = substream; - kirkwood_dma_conf_mbus_windows(priv->io, - KIRKWOOD_RECORD_WIN, addr, dram); }
return 0; @@ -182,6 +174,23 @@ static int kirkwood_dma_close(struct snd return 0; }
+static int kirkwood_dma_hw_params(struct snd_soc_component *component, + struct snd_pcm_substream *substream, + struct snd_pcm_hw_params *params) +{ + struct kirkwood_dma_data *priv = kirkwood_priv(substream); + const struct mbus_dram_target_info *dram = mv_mbus_dram_info(); + unsigned long addr = substream->runtime->dma_addr; + + if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) + kirkwood_dma_conf_mbus_windows(priv->io, + KIRKWOOD_PLAYBACK_WIN, addr, dram); + else + kirkwood_dma_conf_mbus_windows(priv->io, + KIRKWOOD_RECORD_WIN, addr, dram); + return 0; +} + static int kirkwood_dma_prepare(struct snd_soc_component *component, struct snd_pcm_substream *substream) { @@ -246,6 +255,7 @@ const struct snd_soc_component_driver ki .name = DRV_NAME, .open = kirkwood_dma_open, .close = kirkwood_dma_close, + .hw_params = kirkwood_dma_hw_params, .prepare = kirkwood_dma_prepare, .pointer = kirkwood_dma_pointer, .pcm_construct = kirkwood_dma_new,
From: Takashi Iwai tiwai@suse.de
commit 2e6b836312a477d647a7920b56810a5a25f6c856 upstream.
PCM buffers might be allocated dynamically when the buffer preallocation failed or a larger buffer is requested, and it's not guaranteed that substream->dma_buffer points to the actually used buffer. The address should be retrieved from runtime->dma_addr, instead of substream->dma_buffer (and shouldn't use virt_to_phys).
Also, remove the line overriding runtime->dma_area superfluously, which was already set up at the PCM buffer allocation.
Cc: Cezary Rojewski cezary.rojewski@intel.com Cc: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Link: https://lore.kernel.org/r/20210728112353.6675-3-tiwai@suse.de Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/intel/atom/sst-mfld-platform-pcm.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/sound/soc/intel/atom/sst-mfld-platform-pcm.c +++ b/sound/soc/intel/atom/sst-mfld-platform-pcm.c @@ -127,7 +127,7 @@ static void sst_fill_alloc_params(struct snd_pcm_uframes_t period_size; ssize_t periodbytes; ssize_t buffer_bytes = snd_pcm_lib_buffer_bytes(substream); - u32 buffer_addr = virt_to_phys(substream->dma_buffer.area); + u32 buffer_addr = substream->runtime->dma_addr;
channels = substream->runtime->channels; period_size = substream->runtime->period_size; @@ -233,7 +233,6 @@ static int sst_platform_alloc_stream(str /* set codec params and inform SST driver the same */ sst_fill_pcm_params(substream, ¶m); sst_fill_alloc_params(substream, &alloc_params); - substream->runtime->dma_area = substream->dma_buffer.area; str_params.sparams = param; str_params.aparams = alloc_params; str_params.codec = SST_CODEC_TYPE_PCM;
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit 86ff25ed6cd8240d18df58930bd8848b19fce308 upstream.
If an i2c driver happens to not provide the full amount of data that a user asks for, it is possible that some uninitialized data could be sent to userspace. While all in-kernel drivers look to be safe, just be sure by initializing the buffer to zero before it is passed to the i2c driver so that any future drivers will not have this issue.
Also properly copy the amount of data received to the userspace buffer, as pointed out by Dan Carpenter.
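For context, a minimal user-space sketch of the read path being fixed (device path and slave address are made up): the driver may legally return fewer bytes than requested, and only that many bytes of the buffer are meaningful.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/i2c-dev.h>

int main(void)
{
	unsigned char buf[32];
	ssize_t n;
	int fd = open("/dev/i2c-1", O_RDWR);

	if (fd < 0 || ioctl(fd, I2C_SLAVE, 0x50) < 0)
		return 1;
	n = read(fd, buf, sizeof(buf));	/* may return fewer than 32 bytes */
	if (n >= 0)
		printf("read %zd bytes\n", n);
	close(fd);
	return 0;
}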
Reported-by: Eric Dumazet edumazet@google.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/i2c-dev.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/i2c/i2c-dev.c +++ b/drivers/i2c/i2c-dev.c @@ -141,7 +141,7 @@ static ssize_t i2cdev_read(struct file * if (count > 8192) count = 8192;
- tmp = kmalloc(count, GFP_KERNEL); + tmp = kzalloc(count, GFP_KERNEL); if (tmp == NULL) return -ENOMEM;
@@ -150,7 +150,8 @@ static ssize_t i2cdev_read(struct file *
ret = i2c_master_recv(client, tmp, count); if (ret >= 0) - ret = copy_to_user(buf, tmp, count) ? -EFAULT : ret; + if (copy_to_user(buf, tmp, ret)) + ret = -EFAULT; kfree(tmp); return ret; }
From: Rohith Surabattula rohiths@microsoft.com
commit 41535701da3324b80029cabb501e86c4fafe339d upstream.
When rename is executed on a directory that has files for which close is deferred, the rename will fail with EACCES.
This patch tries to close all deferred files when EACCES is received and then retries the rename of the directory.
Signed-off-by: Rohith Surabattula rohiths@microsoft.com Cc: stable@vger.kernel.org # 5.13 Reviewed-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/inode.c | 19 +++++++++++++++++-- fs/cifs/misc.c | 16 +++++++++++----- 2 files changed, 28 insertions(+), 7 deletions(-)
--- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c @@ -1637,7 +1637,7 @@ int cifs_unlink(struct inode *dir, struc goto unlink_out; }
- cifs_close_all_deferred_files(tcon); + cifs_close_deferred_file(CIFS_I(inode)); if (cap_unix(tcon->ses) && (CIFS_UNIX_POSIX_PATH_OPS_CAP & le64_to_cpu(tcon->fsUnixInfo.Capability))) { rc = CIFSPOSIXDelFile(xid, tcon, full_path, @@ -2096,6 +2096,7 @@ cifs_rename2(struct user_namespace *mnt_ FILE_UNIX_BASIC_INFO *info_buf_target; unsigned int xid; int rc, tmprc; + int retry_count = 0;
if (flags & ~RENAME_NOREPLACE) return -EINVAL; @@ -2125,10 +2126,24 @@ cifs_rename2(struct user_namespace *mnt_ goto cifs_rename_exit; }
- cifs_close_all_deferred_files(tcon); + cifs_close_deferred_file(CIFS_I(d_inode(source_dentry))); + if (d_inode(target_dentry) != NULL) + cifs_close_deferred_file(CIFS_I(d_inode(target_dentry))); + rc = cifs_do_rename(xid, source_dentry, from_name, target_dentry, to_name);
+ if (rc == -EACCES) { + while (retry_count < 3) { + cifs_close_all_deferred_files(tcon); + rc = cifs_do_rename(xid, source_dentry, from_name, target_dentry, + to_name); + if (rc != -EACCES) + break; + retry_count++; + } + } + /* * No-replace is the natural behavior for CIFS, so skip unlink hacks. */ --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@ -735,13 +735,19 @@ void cifs_close_deferred_file(struct cifsInodeInfo *cifs_inode) { struct cifsFileInfo *cfile = NULL; - struct cifs_deferred_close *dclose; + + if (cifs_inode == NULL) + return;
list_for_each_entry(cfile, &cifs_inode->openFileList, flist) { - spin_lock(&cifs_inode->deferred_lock); - if (cifs_is_deferred_close(cfile, &dclose)) - mod_delayed_work(deferredclose_wq, &cfile->deferred, 0); - spin_unlock(&cifs_inode->deferred_lock); + if (delayed_work_pending(&cfile->deferred)) { + /* + * If there is no pending work, mod_delayed_work queues new work. + * So, Increase the ref count to avoid use-after-free. + */ + if (!mod_delayed_work(deferredclose_wq, &cfile->deferred, 0)) + cifsFileInfo_get(cfile); + } } }
From: Shyam Prasad N sprasad@microsoft.com
commit 7d3fc01796fc895e5fcce45c994c5a8db8120a8d upstream.
We used to follow the rule that the create SD context always be a multiple of 8. However, with the change "cifs: refactor create_sd_buf() and and avoid corrupting the buffer" we recompute the length and no longer satisfy that rule. Fix that with this change.
Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/smb2pdu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -2375,7 +2375,7 @@ create_sd_buf(umode_t mode, bool set_own memcpy(aclptr, &acl, sizeof(struct cifs_acl));
buf->ccontext.DataLength = cpu_to_le32(ptr - (__u8 *)&buf->sd); - *len = ptr - (__u8 *)buf; + *len = roundup(ptr - (__u8 *)buf, 8);
return buf; }
From: Rohith Surabattula rohiths@microsoft.com
commit 9e992755be8f2d458a0bcbefd19e493483c1dba2 upstream.
During unlink/rename/lease break, deferred work for close is scheduled immediately but in an asynchronous manner, which might lead to a race with the actual (unlink/rename) commands.
This change schedules the close synchronously, which avoids the race conditions with other commands.
Signed-off-by: Rohith Surabattula rohiths@microsoft.com Reviewed-by: Shyam Prasad N sprasad@microsoft.com Cc: stable@vger.kernel.org # 5.13 Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/cifsglob.h | 5 +++++ fs/cifs/file.c | 35 +++++++++++++++++------------------ fs/cifs/misc.c | 46 ++++++++++++++++++++++++++++++++++------------ 3 files changed, 56 insertions(+), 30 deletions(-)
--- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -1615,6 +1615,11 @@ struct dfs_info3_param { int ttl; };
+struct file_list { + struct list_head list; + struct cifsFileInfo *cfile; +}; + /* * common struct for holding inode info when searching for or updating an * inode with new info --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -4860,34 +4860,33 @@ void cifs_oplock_break(struct work_struc
oplock_break_ack: /* - * releasing stale oplock after recent reconnect of smb session using - * a now incorrect file handle is not a data integrity issue but do - * not bother sending an oplock release if session to server still is - * disconnected since oplock already released by the server - */ - if (!cfile->oplock_break_cancelled) { - rc = tcon->ses->server->ops->oplock_response(tcon, &cfile->fid, - cinode); - cifs_dbg(FYI, "Oplock release rc = %d\n", rc); - } - /* * When oplock break is received and there are no active * file handles but cached, then schedule deferred close immediately. * So, new open will not use cached handle. */ spin_lock(&CIFS_I(inode)->deferred_lock); is_deferred = cifs_is_deferred_close(cfile, &dclose); + spin_unlock(&CIFS_I(inode)->deferred_lock); if (is_deferred && cfile->deferred_close_scheduled && delayed_work_pending(&cfile->deferred)) { - /* - * If there is no pending work, mod_delayed_work queues new work. - * So, Increase the ref count to avoid use-after-free. - */ - if (!mod_delayed_work(deferredclose_wq, &cfile->deferred, 0)) - cifsFileInfo_get(cfile); + if (cancel_delayed_work(&cfile->deferred)) { + _cifsFileInfo_put(cfile, false, false); + goto oplock_break_done; + } } - spin_unlock(&CIFS_I(inode)->deferred_lock); + /* + * releasing stale oplock after recent reconnect of smb session using + * a now incorrect file handle is not a data integrity issue but do + * not bother sending an oplock release if session to server still is + * disconnected since oplock already released by the server + */ + if (!cfile->oplock_break_cancelled) { + rc = tcon->ses->server->ops->oplock_response(tcon, &cfile->fid, + cinode); + cifs_dbg(FYI, "Oplock release rc = %d\n", rc); + } +oplock_break_done: _cifsFileInfo_put(cfile, false /* do not wait for ourself */, false); cifs_done_oplock_break(cinode); } --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@ -735,20 +735,32 @@ void cifs_close_deferred_file(struct cifsInodeInfo *cifs_inode) { struct cifsFileInfo *cfile = NULL; + struct file_list *tmp_list, *tmp_next_list; + struct list_head file_head;
if (cifs_inode == NULL) return;
+ INIT_LIST_HEAD(&file_head); + spin_lock(&cifs_inode->open_file_lock); list_for_each_entry(cfile, &cifs_inode->openFileList, flist) { if (delayed_work_pending(&cfile->deferred)) { - /* - * If there is no pending work, mod_delayed_work queues new work. - * So, Increase the ref count to avoid use-after-free. - */ - if (!mod_delayed_work(deferredclose_wq, &cfile->deferred, 0)) - cifsFileInfo_get(cfile); + if (cancel_delayed_work(&cfile->deferred)) { + tmp_list = kmalloc(sizeof(struct file_list), GFP_ATOMIC); + if (tmp_list == NULL) + continue; + tmp_list->cfile = cfile; + list_add_tail(&tmp_list->list, &file_head); + } } } + spin_unlock(&cifs_inode->open_file_lock); + + list_for_each_entry_safe(tmp_list, tmp_next_list, &file_head, list) { + _cifsFileInfo_put(tmp_list->cfile, true, false); + list_del(&tmp_list->list); + kfree(tmp_list); + } }
void @@ -756,20 +768,30 @@ cifs_close_all_deferred_files(struct cif { struct cifsFileInfo *cfile; struct list_head *tmp; + struct file_list *tmp_list, *tmp_next_list; + struct list_head file_head;
+ INIT_LIST_HEAD(&file_head); spin_lock(&tcon->open_file_lock); list_for_each(tmp, &tcon->openFileList) { cfile = list_entry(tmp, struct cifsFileInfo, tlist); if (delayed_work_pending(&cfile->deferred)) { - /* - * If there is no pending work, mod_delayed_work queues new work. - * So, Increase the ref count to avoid use-after-free. - */ - if (!mod_delayed_work(deferredclose_wq, &cfile->deferred, 0)) - cifsFileInfo_get(cfile); + if (cancel_delayed_work(&cfile->deferred)) { + tmp_list = kmalloc(sizeof(struct file_list), GFP_ATOMIC); + if (tmp_list == NULL) + continue; + tmp_list->cfile = cfile; + list_add_tail(&tmp_list->list, &file_head); + } } } spin_unlock(&tcon->open_file_lock); + + list_for_each_entry_safe(tmp_list, tmp_next_list, &file_head, list) { + _cifsFileInfo_put(tmp_list->cfile, true, false); + list_del(&tmp_list->list); + kfree(tmp_list); + } }
/* parses DFS refferal V3 structure
From: Ronnie Sahlberg lsahlber@redhat.com
commit 981567bd965329df7e64b13e92a54da816c1e0a4 upstream.
RHBZ: 1972502
PATH_MAX is 4096, but PAGE_SIZE can be larger than 4096 on some architectures such as ppc, so passing PAGE_SIZE would allow writing beyond the end of the actual object.
Cc: stable@vger.kernel.org Reported-by: Xiaoli Feng xifeng@redhat.com Suggested-by: Brian foster bfoster@redhat.com Reviewed-by: Paulo Alcantara (SUSE) pc@cjr.nz Signed-off-by: Ronnie Sahlberg lsahlber@redhat.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/cifs/dir.c +++ b/fs/cifs/dir.c @@ -112,7 +112,7 @@ build_path_from_dentry_optional_prefix(s if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_USE_PREFIX_PATH) pplen = cifs_sb->prepath ? strlen(cifs_sb->prepath) + 1 : 0;
- s = dentry_path_raw(direntry, page, PAGE_SIZE); + s = dentry_path_raw(direntry, page, PATH_MAX); if (IS_ERR(s)) return s; if (!s[1]) // for root we want "", not "/"
From: Jens Axboe axboe@kernel.dk
commit c018db4a57f3e31a9cb24d528e9f094eda89a499 upstream.
Ammar reports that he's seeing a lockdep splat on running test/rsrc_tags from the regression suite:
====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5 Tainted: G OE ------------------------------------------------------ kworker/2:4/2684 is trying to acquire lock: ffff88814bb1c0a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_rsrc_put_work+0x13d/0x1a0
but task is already holding lock: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}: __flush_work+0x31b/0x490 io_rsrc_ref_quiesce.part.0.constprop.0+0x35/0xb0 __do_sys_io_uring_register+0x45b/0x1060 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae
-> #0 (&ctx->uring_lock){+.+.}-{3:3}: __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 __mutex_lock+0x86/0x740 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 kthread+0x135/0x160 ret_from_fork+0x1f/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock); lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock);
*** DEADLOCK ***
2 locks held by kworker/2:4/2684: #0: ffff88810004d938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 #1: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
stack backtrace: CPU: 2 PID: 2684 Comm: kworker/2:4 Tainted: G OE 5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5 Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015 Workqueue: events io_rsrc_put_work Call Trace: dump_stack_lvl+0x6a/0x9a check_noncircular+0xfe/0x110 __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 ? io_rsrc_put_work+0x13d/0x1a0 __mutex_lock+0x86/0x740 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? process_one_work+0x1ce/0x530 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 ? process_one_work+0x530/0x530 kthread+0x135/0x160 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30
which is due to holding the ctx->uring_lock when flushing existing pending work, while the pending work flushing may need to grab the uring lock if we're using IOPOLL.
Fix this by dropping the uring_lock a bit earlier as part of the flush.
Cc: stable@vger.kernel.org Link: https://github.com/axboe/liburing/issues/404 Tested-by: Ammar Faizi ammarfaizi2@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -7166,17 +7166,19 @@ static int io_rsrc_ref_quiesce(struct io /* kill initial ref, already quiesced if zero */ if (atomic_dec_and_test(&data->refs)) break; + mutex_unlock(&ctx->uring_lock); flush_delayed_work(&ctx->rsrc_put_work); ret = wait_for_completion_interruptible(&data->done); - if (!ret) + if (!ret) { + mutex_lock(&ctx->uring_lock); break; + }
atomic_inc(&data->refs); /* wait for all works potentially completing data->done */ flush_delayed_work(&ctx->rsrc_put_work); reinit_completion(&data->done);
- mutex_unlock(&ctx->uring_lock); ret = io_run_task_work_sig(); mutex_lock(&ctx->uring_lock); } while (ret >= 0);
From: Pavel Begunkov asml.silence@gmail.com
commit 43597aac1f87230cb565ab354d331682f13d3c7a upstream.
__io_rsrc_put_work() might need ->uring_lock, so nobody should wait for rsrc nodes while holding the mutex. However, that's exactly what io_ring_ctx_free() does with io_wait_rsrc_data().
Split it into rsrc wait + dealloc, and move the first one out of the lock.
Cc: stable@vger.kernel.org Fixes: b60c8dce33895 ("io_uring: preparation for rsrc tagging") Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/0130c5c2693468173ec1afab714e0885d2c9c363.162855978... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8614,13 +8614,10 @@ static void io_req_caches_free(struct io mutex_unlock(&ctx->uring_lock); }
-static bool io_wait_rsrc_data(struct io_rsrc_data *data) +static void io_wait_rsrc_data(struct io_rsrc_data *data) { - if (!data) - return false; - if (!atomic_dec_and_test(&data->refs)) + if (data && !atomic_dec_and_test(&data->refs)) wait_for_completion(&data->done); - return true; }
static void io_ring_ctx_free(struct io_ring_ctx *ctx) @@ -8632,10 +8629,14 @@ static void io_ring_ctx_free(struct io_r ctx->mm_account = NULL; }
+ /* __io_rsrc_put_work() may need uring_lock to progress, wait w/o it */ + io_wait_rsrc_data(ctx->buf_data); + io_wait_rsrc_data(ctx->file_data); + mutex_lock(&ctx->uring_lock); - if (io_wait_rsrc_data(ctx->buf_data)) + if (ctx->buf_data) __io_sqe_buffers_unregister(ctx); - if (io_wait_rsrc_data(ctx->file_data)) + if (ctx->file_data) __io_sqe_files_unregister(ctx); if (ctx->rings) __io_cqring_overflow_flush(ctx, true);
From: Ewan D. Milne emilne@redhat.com
commit 9977d880f7a3c233db9165a75a3a14defc2a4aee upstream.
The phba->poll_list is traversed in case of an error in lpfc_sli4_hba_setup(), so it must be initialized earlier in case the error path is taken.
[ 490.030738] lpfc 0000:65:00.0: 0:1413 Failed to init iocb list. [ 490.036661] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 490.044485] PGD 0 P4D 0 [ 490.047027] Oops: 0000 [#1] SMP PTI [ 490.050518] CPU: 0 PID: 7 Comm: kworker/0:1 Kdump: loaded Tainted: G I --------- - - 4.18. [ 490.060511] Hardware name: Dell Inc. PowerEdge R440/0WKGTH, BIOS 1.4.8 05/22/2018 [ 490.067994] Workqueue: events work_for_cpu_fn [ 490.072371] RIP: 0010:lpfc_sli4_cleanup_poll_list+0x20/0xb0 [lpfc] [ 490.078546] Code: cf e9 04 f7 fe ff 0f 1f 40 00 0f 1f 44 00 00 41 57 49 89 ff 41 56 41 55 41 54 4d 8d a79 [ 490.097291] RSP: 0018:ffffbd1a463dbcc8 EFLAGS: 00010246 [ 490.102518] RAX: 0000000000008200 RBX: ffff945cdb8c0000 RCX: 0000000000000000 [ 490.109649] RDX: 0000000000018200 RSI: ffff9468d0e16818 RDI: 0000000000000000 [ 490.116783] RBP: ffff945cdb8c1740 R08: 00000000000015c5 R09: 0000000000000042 [ 490.123915] R10: 0000000000000000 R11: ffffbd1a463dbab0 R12: ffff945cdb8c25c0 [ 490.131049] R13: 00000000fffffff4 R14: 0000000000001800 R15: ffff945cdb8c0000 [ 490.138182] FS: 0000000000000000(0000) GS:ffff9468d0e00000(0000) knlGS:0000000000000000 [ 490.146267] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 490.152013] CR2: 0000000000000000 CR3: 000000042ca10002 CR4: 00000000007706f0 [ 490.159146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 490.166277] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 490.173409] PKRU: 55555554 [ 490.176123] Call Trace: [ 490.178598] lpfc_sli4_queue_destroy+0x7f/0x3c0 [lpfc] [ 490.183745] lpfc_sli4_hba_setup+0x1bc7/0x23e0 [lpfc] [ 490.188797] ? kernfs_activate+0x63/0x80 [ 490.192721] ? kernfs_add_one+0xe7/0x130 [ 490.196647] ? __kernfs_create_file+0x80/0xb0 [ 490.201020] ? lpfc_pci_probe_one_s4.isra.48+0x46f/0x9e0 [lpfc] [ 490.206944] lpfc_pci_probe_one_s4.isra.48+0x46f/0x9e0 [lpfc] [ 490.212697] lpfc_pci_probe_one+0x179/0xb70 [lpfc] [ 490.217492] local_pci_probe+0x41/0x90 [ 490.221246] work_for_cpu_fn+0x16/0x20 [ 490.224994] process_one_work+0x1a7/0x360 [ 490.229009] ? create_worker+0x1a0/0x1a0 [ 490.232933] worker_thread+0x1cf/0x390 [ 490.236687] ? create_worker+0x1a0/0x1a0 [ 490.240612] kthread+0x116/0x130 [ 490.243846] ? kthread_flush_work_fn+0x10/0x10 [ 490.248293] ret_from_fork+0x35/0x40 [ 490.251869] Modules linked in: lpfc(+) xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4i [ 490.332609] CR2: 0000000000000000
Link: https://lore.kernel.org/r/20210809150947.18104-1-emilne@redhat.com Fixes: 93a4d6f40198 ("scsi: lpfc: Add registration for CPU Offline/Online events") Cc: stable@vger.kernel.org Reviewed-by: James Smart jsmart2021@gmail.com Signed-off-by: Ewan D. Milne emilne@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/lpfc/lpfc_init.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -13091,6 +13091,8 @@ lpfc_pci_probe_one_s4(struct pci_dev *pd if (!phba) return -ENOMEM;
+ INIT_LIST_HEAD(&phba->poll_list); + /* Perform generic PCI device enabling operation */ error = lpfc_enable_pci_dev(phba); if (error) @@ -13225,7 +13227,6 @@ lpfc_pci_probe_one_s4(struct pci_dev *pd /* Enable RAS FW log support */ lpfc_sli4_ras_setup(phba);
- INIT_LIST_HEAD(&phba->poll_list); timer_setup(&phba->cpuhp_poll_timer, lpfc_sli4_poll_hbtimer, 0); cpuhp_state_add_instance_nocalls(lpfc_cpuhp_state, &phba->cpuhp);
From: Tejun Heo tj@kernel.org
commit c3df5fb57fe8756d67fd56ed29da65cdfde839f9 upstream.
0fa294fb1985 ("cgroup: Replace cgroup_rstat_mutex with a spinlock") added cgroup_rstat_flush_irqsafe() allowing flushing to happen from the irq context. However, rstat paths use u64_stats_sync to synchronize access to 64bit stat counters on 32bit machines. u64_stats_sync is implemented using seq_lock and trying to read from an irq context can lead to A-A deadlock if the irq happens to interrupt the stat update.
Fix it by using the irqsafe variants - u64_stats_update_begin_irqsave() and u64_stats_update_end_irqrestore() - in the update paths. Note that none of this matters on 64bit machines. All these are just for 32bit SMP setups.
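For readers unfamiliar with the API, a minimal sketch of the irqsafe writer-side pattern used by the patch below (the helper name is illustrative; on 64-bit the seqcount operations compile away):

	#include <linux/u64_stats_sync.h>

	static void stats_add(struct u64_stats_sync *syncp, u64 *counter, u64 delta)
	{
		unsigned long flags;

		/* on 32-bit SMP this also disables irqs, so an interrupting
		 * reader/writer cannot re-enter the seqcount and deadlock */
		flags = u64_stats_update_begin_irqsave(syncp);
		*counter += delta;
		u64_stats_update_end_irqrestore(syncp, flags);
	}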
Note that the interface was introduced way back; its first and currently only use was recently added by 2d146aa3aa84 ("mm: memcontrol: switch to rstat"). Stable tagging targets this commit.
Signed-off-by: Tejun Heo tj@kernel.org Reported-by: Rik van Riel riel@surriel.com Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat") Cc: stable@vger.kernel.org # v5.13+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/blk-cgroup.c | 14 ++++++++------ kernel/cgroup/rstat.c | 19 +++++++++++-------- 2 files changed, 19 insertions(+), 14 deletions(-)
--- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -774,6 +774,7 @@ static void blkcg_rstat_flush(struct cgr struct blkcg_gq *parent = blkg->parent; struct blkg_iostat_set *bisc = per_cpu_ptr(blkg->iostat_cpu, cpu); struct blkg_iostat cur, delta; + unsigned long flags; unsigned int seq;
/* fetch the current per-cpu values */ @@ -783,21 +784,21 @@ static void blkcg_rstat_flush(struct cgr } while (u64_stats_fetch_retry(&bisc->sync, seq));
/* propagate percpu delta to global */ - u64_stats_update_begin(&blkg->iostat.sync); + flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync); blkg_iostat_set(&delta, &cur); blkg_iostat_sub(&delta, &bisc->last); blkg_iostat_add(&blkg->iostat.cur, &delta); blkg_iostat_add(&bisc->last, &delta); - u64_stats_update_end(&blkg->iostat.sync); + u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
/* propagate global delta to parent (unless that's root) */ if (parent && parent->parent) { - u64_stats_update_begin(&parent->iostat.sync); + flags = u64_stats_update_begin_irqsave(&parent->iostat.sync); blkg_iostat_set(&delta, &blkg->iostat.cur); blkg_iostat_sub(&delta, &blkg->iostat.last); blkg_iostat_add(&parent->iostat.cur, &delta); blkg_iostat_add(&blkg->iostat.last, &delta); - u64_stats_update_end(&parent->iostat.sync); + u64_stats_update_end_irqrestore(&parent->iostat.sync, flags); } }
@@ -832,6 +833,7 @@ static void blkcg_fill_root_iostats(void memset(&tmp, 0, sizeof(tmp)); for_each_possible_cpu(cpu) { struct disk_stats *cpu_dkstats; + unsigned long flags;
cpu_dkstats = per_cpu_ptr(bdev->bd_stats, cpu); tmp.ios[BLKG_IOSTAT_READ] += @@ -848,9 +850,9 @@ static void blkcg_fill_root_iostats(void tmp.bytes[BLKG_IOSTAT_DISCARD] += cpu_dkstats->sectors[STAT_DISCARD] << 9;
- u64_stats_update_begin(&blkg->iostat.sync); + flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync); blkg_iostat_set(&blkg->iostat.cur, &tmp); - u64_stats_update_end(&blkg->iostat.sync); + u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags); } } } --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -347,19 +347,20 @@ static void cgroup_base_stat_flush(struc }
static struct cgroup_rstat_cpu * -cgroup_base_stat_cputime_account_begin(struct cgroup *cgrp) +cgroup_base_stat_cputime_account_begin(struct cgroup *cgrp, unsigned long *flags) { struct cgroup_rstat_cpu *rstatc;
rstatc = get_cpu_ptr(cgrp->rstat_cpu); - u64_stats_update_begin(&rstatc->bsync); + *flags = u64_stats_update_begin_irqsave(&rstatc->bsync); return rstatc; }
static void cgroup_base_stat_cputime_account_end(struct cgroup *cgrp, - struct cgroup_rstat_cpu *rstatc) + struct cgroup_rstat_cpu *rstatc, + unsigned long flags) { - u64_stats_update_end(&rstatc->bsync); + u64_stats_update_end_irqrestore(&rstatc->bsync, flags); cgroup_rstat_updated(cgrp, smp_processor_id()); put_cpu_ptr(rstatc); } @@ -367,18 +368,20 @@ static void cgroup_base_stat_cputime_acc void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec) { struct cgroup_rstat_cpu *rstatc; + unsigned long flags;
- rstatc = cgroup_base_stat_cputime_account_begin(cgrp); + rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags); rstatc->bstat.cputime.sum_exec_runtime += delta_exec; - cgroup_base_stat_cputime_account_end(cgrp, rstatc); + cgroup_base_stat_cputime_account_end(cgrp, rstatc, flags); }
void __cgroup_account_cputime_field(struct cgroup *cgrp, enum cpu_usage_stat index, u64 delta_exec) { struct cgroup_rstat_cpu *rstatc; + unsigned long flags;
- rstatc = cgroup_base_stat_cputime_account_begin(cgrp); + rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
switch (index) { case CPUTIME_USER: @@ -394,7 +397,7 @@ void __cgroup_account_cputime_field(stru break; }
- cgroup_base_stat_cputime_account_end(cgrp, rstatc); + cgroup_base_stat_cputime_account_end(cgrp, rstatc, flags); }
/*
From: Hsuan-Chi Kuo hsuanchikuo@gmail.com
commit b4d8a58f8dcfcc890f296696cadb76e77be44b5f upstream.
The desired behavior is to set the thread's filter count to the caller's. This value is reported via /proc, so this fixes the inaccurate count exposed to userspace; it is not used for reference counting, etc.
Signed-off-by: Hsuan-Chi Kuo hsuanchikuo@gmail.com Link: https://lore.kernel.org/r/20210304233708.420597-1-hsuanchikuo@gmail.com Co-developed-by: Wiktor Garbacz wiktorg@google.com Signed-off-by: Wiktor Garbacz wiktorg@google.com Link: https://lore.kernel.org/lkml/20210810125158.329849-1-wiktorg@google.com Signed-off-by: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Fixes: c818c03b661c ("seccomp: Report number of loaded filters in /proc/$pid/status") Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/seccomp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -602,7 +602,7 @@ static inline void seccomp_sync_threads( smp_store_release(&thread->seccomp.filter, caller->seccomp.filter); atomic_set(&thread->seccomp.filter_count, - atomic_read(&thread->seccomp.filter_count)); + atomic_read(&caller->seccomp.filter_count));
/* * Don't let an unprivileged task work around
From: Loic Poulain loic.poulain@linaro.org
commit 34737e1320db6d51f0d140d5c684b9eb32f0da76 upstream.
Lockdep detected possible interrupt unsafe locking scenario:
        CPU0                    CPU1
        ----                    ----
   lock(&mhiwwan->rx_lock);
                                local_irq_disable();
                                lock(&mhi_cntrl->pm_lock);
                                lock(&mhiwwan->rx_lock);
   <Interrupt>
     lock(&mhi_cntrl->pm_lock);
*** DEADLOCK ***
To prevent this we need to disable the soft-interrupts when taking the rx_lock.
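In sketch form (mirroring the patch below), every rx_lock acquisition moves to the _bh variants so a softirq cannot run on this CPU while the lock is held:

	spin_lock_bh(&mhiwwan->rx_lock);	/* bottom halves off while holding rx_lock */
	mhiwwan->rx_budget++;
	if (test_bit(MHI_WWAN_RX_REFILL, &mhiwwan->flags))
		schedule_work(&mhiwwan->rx_refill);
	spin_unlock_bh(&mhiwwan->rx_lock);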
Cc: stable@vger.kernel.org Fixes: fa588eba632d ("net: Add Qcom WWAN control driver") Reported-by: Thomas Perrot thomas.perrot@bootlin.com Signed-off-by: Loic Poulain loic.poulain@linaro.org Reviewed-by: Sergey Ryazanov ryazanov.s.a@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wwan/mhi_wwan_ctrl.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/drivers/net/wwan/mhi_wwan_ctrl.c +++ b/drivers/net/wwan/mhi_wwan_ctrl.c @@ -41,14 +41,14 @@ struct mhi_wwan_dev { /* Increment RX budget and schedule RX refill if necessary */ static void mhi_wwan_rx_budget_inc(struct mhi_wwan_dev *mhiwwan) { - spin_lock(&mhiwwan->rx_lock); + spin_lock_bh(&mhiwwan->rx_lock);
mhiwwan->rx_budget++;
if (test_bit(MHI_WWAN_RX_REFILL, &mhiwwan->flags)) schedule_work(&mhiwwan->rx_refill);
- spin_unlock(&mhiwwan->rx_lock); + spin_unlock_bh(&mhiwwan->rx_lock); }
/* Decrement RX budget if non-zero and return true on success */ @@ -56,7 +56,7 @@ static bool mhi_wwan_rx_budget_dec(struc { bool ret = false;
- spin_lock(&mhiwwan->rx_lock); + spin_lock_bh(&mhiwwan->rx_lock);
if (mhiwwan->rx_budget) { mhiwwan->rx_budget--; @@ -64,7 +64,7 @@ static bool mhi_wwan_rx_budget_dec(struc ret = true; }
- spin_unlock(&mhiwwan->rx_lock); + spin_unlock_bh(&mhiwwan->rx_lock);
return ret; } @@ -130,9 +130,9 @@ static void mhi_wwan_ctrl_stop(struct ww { struct mhi_wwan_dev *mhiwwan = wwan_port_get_drvdata(port);
- spin_lock(&mhiwwan->rx_lock); + spin_lock_bh(&mhiwwan->rx_lock); clear_bit(MHI_WWAN_RX_REFILL, &mhiwwan->flags); - spin_unlock(&mhiwwan->rx_lock); + spin_unlock_bh(&mhiwwan->rx_lock);
cancel_work_sync(&mhiwwan->rx_refill);
From: Grygorii Strashko grygorii.strashko@ti.com
commit acc68b8d2a1196c4db806947606f162dbeed2274 upstream.
The CPSW switchdev driver inherited a fix from commit 9421c9015047 ("net: ethernet: ti: cpsw: fix min eth packet size") which changes the min TX packet size to 64 bytes (VLAN_ETH_ZLEN, excluding ETH_FCS). It was done to fix an HW packet drop issue when packets are sent from the host to a port with PVID and un-tagging enabled. Unfortunately this breaks some other non-switch-specific use cases, like:
- [1] CPSW port as DSA CPU port with the DSA tag applied at the end of the packet
- [2] some industrial protocols, which expect a min TX packet size of 60 bytes (excluding FCS).
Fix it by configuring the min TX packet size depending on the driver mode:
- 60 bytes (ETH_ZLEN) for multi mac (dual-mac) mode
- 64 bytes (VLAN_ETH_ZLEN) for switch mode
Update it during a driver mode change, and annotate it with READ_ONCE()/WRITE_ONCE() as it can be read by NAPI while being written.
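A short sketch of the annotation added by the patch below: tx_packet_min is written at mode-change time and read concurrently from the xmit path, so both sides are marked to avoid load/store tearing.

	/* writer: devlink switch-mode change */
	WRITE_ONCE(priv->tx_packet_min, CPSW_MIN_PACKET_SIZE_VLAN);

	/* reader: hot xmit path */
	if (skb_put_padto(skb, READ_ONCE(priv->tx_packet_min)))
		return NET_XMIT_DROP;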
[1] https://lore.kernel.org/netdev/20210531124051.GA15218@cephalopod/ [2] https://e2e.ti.com/support/arm/sitara_arm/f/791/t/701669
Cc: stable@vger.kernel.org Fixes: ed3525eda4c4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac") Reported-by: Ben Hutchings ben.hutchings@essensium.com Signed-off-by: Grygorii Strashko grygorii.strashko@ti.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ti/cpsw_new.c | 7 +++++-- drivers/net/ethernet/ti/cpsw_priv.h | 4 +++- 2 files changed, 8 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/ti/cpsw_new.c +++ b/drivers/net/ethernet/ti/cpsw_new.c @@ -920,7 +920,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(s struct cpdma_chan *txch; int ret, q_idx;
- if (skb_padto(skb, CPSW_MIN_PACKET_SIZE)) { + if (skb_put_padto(skb, READ_ONCE(priv->tx_packet_min))) { cpsw_err(priv, tx_err, "packet pad failed\n"); ndev->stats.tx_dropped++; return NET_XMIT_DROP; @@ -1100,7 +1100,7 @@ static int cpsw_ndo_xdp_xmit(struct net_
for (i = 0; i < n; i++) { xdpf = frames[i]; - if (xdpf->len < CPSW_MIN_PACKET_SIZE) + if (xdpf->len < READ_ONCE(priv->tx_packet_min)) break;
if (cpsw_xdp_tx_frame(priv, xdpf, NULL, priv->emac_port)) @@ -1389,6 +1389,7 @@ static int cpsw_create_ports(struct cpsw priv->dev = dev; priv->msg_enable = netif_msg_init(debug_level, CPSW_DEBUG); priv->emac_port = i + 1; + priv->tx_packet_min = CPSW_MIN_PACKET_SIZE;
if (is_valid_ether_addr(slave_data->mac_addr)) { ether_addr_copy(priv->mac_addr, slave_data->mac_addr); @@ -1686,6 +1687,7 @@ static int cpsw_dl_switch_mode_set(struc
priv = netdev_priv(sl_ndev); slave->port_vlan = vlan; + WRITE_ONCE(priv->tx_packet_min, CPSW_MIN_PACKET_SIZE_VLAN); if (netif_running(sl_ndev)) cpsw_port_add_switch_def_ale_entries(priv, slave); @@ -1714,6 +1716,7 @@ static int cpsw_dl_switch_mode_set(struc
priv = netdev_priv(slave->ndev); slave->port_vlan = slave->data->dual_emac_res_vlan; + WRITE_ONCE(priv->tx_packet_min, CPSW_MIN_PACKET_SIZE); cpsw_port_add_dual_emac_def_ale_entries(priv, slave); }
--- a/drivers/net/ethernet/ti/cpsw_priv.h +++ b/drivers/net/ethernet/ti/cpsw_priv.h @@ -89,7 +89,8 @@ do { \
#define CPSW_POLL_WEIGHT 64 #define CPSW_RX_VLAN_ENCAP_HDR_SIZE 4 -#define CPSW_MIN_PACKET_SIZE (VLAN_ETH_ZLEN) +#define CPSW_MIN_PACKET_SIZE_VLAN (VLAN_ETH_ZLEN) +#define CPSW_MIN_PACKET_SIZE (ETH_ZLEN) #define CPSW_MAX_PACKET_SIZE (VLAN_ETH_FRAME_LEN +\ ETH_FCS_LEN +\ CPSW_RX_VLAN_ENCAP_HDR_SIZE) @@ -380,6 +381,7 @@ struct cpsw_priv { u32 emac_port; struct cpsw_common *cpsw; int offload_fwd_mark; + u32 tx_packet_min; };
#define ndev_to_cpsw(ndev) (((struct cpsw_priv *)netdev_priv(ndev))->cpsw)
From: Vineet Gupta vgupta@synopsys.com
commit 3a715e80400f452b247caa55344f4f60250ffbcf upstream.
FPU_STATUS register contains FP exception flag bits which are updated by the core as a side effect of FP instructions but can also be manually wiggled, such as by the glibc C99 functions fe{raise,clear,test}except() etc. To effect the update, the programming model requires OR'ing in the FWE bit (31). This bit is write-only and RAZ, meaning it is effectively auto-cleared after write and thus needs to be set on every write, which is how glibc implements this.
However there's another use case of FPU_STATUS update, at the time of a Linux task switch, when the incoming task's value needs to be programmed into the register. This was added as part of f45ba2bd6da0dc ("ARCv2: fpu: preserve userspace fpu state") which missed OR'ing the FWE bit, meaning the new value is effectively not being written at all. This patch remedies that.
Interestingly, this snafu was not caught in interim glibc testing, as the race window, which relies on a specific exception bit being set or cleared, is really small, especially when it involves a context switch. Fortunately it was caught by glibc's math/test-fenv-tls test, which repeatedly sets/clears exception flags in a big loop, concurrently in the main program and also in a thread.
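A minimal sketch of the required write sequence (the constant value is taken from the patch below; the local variable name is illustrative):

	const unsigned int fwe = 0x80000000;	/* FWE: write-only, reads as zero */

	/* without OR'ing FWE the new exception flags are silently discarded */
	write_aux_reg(ARC_REG_FPU_STATUS, fwe | restore->status);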
Fixes: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/54 Fixes: f45ba2bd6da0dc ("ARCv2: fpu: preserve userspace fpu state") Cc: stable@vger.kernel.org #5.6+ Signed-off-by: Vineet Gupta vgupta@synopsys.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arc/kernel/fpu.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/arch/arc/kernel/fpu.c +++ b/arch/arc/kernel/fpu.c @@ -57,23 +57,26 @@ void fpu_save_restore(struct task_struct
void fpu_init_task(struct pt_regs *regs) { + const unsigned int fwe = 0x80000000; + /* default rounding mode */ write_aux_reg(ARC_REG_FPU_CTRL, 0x100);
- /* set "Write enable" to allow explicit write to exception flags */ - write_aux_reg(ARC_REG_FPU_STATUS, 0x80000000); + /* Initialize to zero: setting requires FWE be set */ + write_aux_reg(ARC_REG_FPU_STATUS, fwe); }
void fpu_save_restore(struct task_struct *prev, struct task_struct *next) { struct arc_fpu *save = &prev->thread.fpu; struct arc_fpu *restore = &next->thread.fpu; + const unsigned int fwe = 0x80000000;
save->ctrl = read_aux_reg(ARC_REG_FPU_CTRL); save->status = read_aux_reg(ARC_REG_FPU_STATUS);
write_aux_reg(ARC_REG_FPU_CTRL, restore->ctrl); - write_aux_reg(ARC_REG_FPU_STATUS, restore->status); + write_aux_reg(ARC_REG_FPU_STATUS, (fwe | restore->status)); }
#endif
From: Luis Henriques lhenriques@suse.de
commit bf2ba432213fade50dd39f2e348085b758c0726e upstream.
Function ceph_check_delayed_caps() is called from the mdsc->delayed_work workqueue and it can be kept looping for quite some time if caps keep being added back to the mdsc->cap_delay_list. This may result in the watchdog tainting the kernel with the softlockup flag.
This patch breaks this loop if the caps have been added recently (i.e. during the loop execution). Any new caps added to the list will be handled in the next run.
Also, allow schedule_delayed() callers to explicitly set the delay value instead of defaulting to 5s, so we can ensure that it runs soon afterward if it looks like there is more work.
Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/46284 Signed-off-by: Luis Henriques lhenriques@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ceph/caps.c | 17 ++++++++++++++++- fs/ceph/mds_client.c | 25 ++++++++++++++++--------- fs/ceph/super.h | 2 +- 3 files changed, 33 insertions(+), 11 deletions(-)
--- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -4224,11 +4224,19 @@ bad:
/* * Delayed work handler to process end of delayed cap release LRU list. + * + * If new caps are added to the list while processing it, these won't get + * processed in this run. In this case, the ci->i_hold_caps_max will be + * returned so that the work can be scheduled accordingly. */ -void ceph_check_delayed_caps(struct ceph_mds_client *mdsc) +unsigned long ceph_check_delayed_caps(struct ceph_mds_client *mdsc) { struct inode *inode; struct ceph_inode_info *ci; + struct ceph_mount_options *opt = mdsc->fsc->mount_options; + unsigned long delay_max = opt->caps_wanted_delay_max * HZ; + unsigned long loop_start = jiffies; + unsigned long delay = 0;
dout("check_delayed_caps\n"); spin_lock(&mdsc->cap_delay_lock); @@ -4236,6 +4244,11 @@ void ceph_check_delayed_caps(struct ceph ci = list_first_entry(&mdsc->cap_delay_list, struct ceph_inode_info, i_cap_delay_list); + if (time_before(loop_start, ci->i_hold_caps_max - delay_max)) { + dout("%s caps added recently. Exiting loop", __func__); + delay = ci->i_hold_caps_max; + break; + } if ((ci->i_ceph_flags & CEPH_I_FLUSH) == 0 && time_before(jiffies, ci->i_hold_caps_max)) break; @@ -4252,6 +4265,8 @@ void ceph_check_delayed_caps(struct ceph } } spin_unlock(&mdsc->cap_delay_lock); + + return delay; }
/* --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -4502,22 +4502,29 @@ void inc_session_sequence(struct ceph_md }
/* - * delayed work -- periodically trim expired leases, renew caps with mds + * delayed work -- periodically trim expired leases, renew caps with mds. If + * the @delay parameter is set to 0 or if it's more than 5 secs, the default + * workqueue delay value of 5 secs will be used. */ -static void schedule_delayed(struct ceph_mds_client *mdsc) +static void schedule_delayed(struct ceph_mds_client *mdsc, unsigned long delay) { - int delay = 5; - unsigned hz = round_jiffies_relative(HZ * delay); - schedule_delayed_work(&mdsc->delayed_work, hz); + unsigned long max_delay = HZ * 5; + + /* 5 secs default delay */ + if (!delay || (delay > max_delay)) + delay = max_delay; + schedule_delayed_work(&mdsc->delayed_work, + round_jiffies_relative(delay)); }
static void delayed_work(struct work_struct *work) { - int i; struct ceph_mds_client *mdsc = container_of(work, struct ceph_mds_client, delayed_work.work); + unsigned long delay; int renew_interval; int renew_caps; + int i;
dout("mdsc delayed_work\n");
@@ -4557,7 +4564,7 @@ static void delayed_work(struct work_str } mutex_unlock(&mdsc->mutex);
- ceph_check_delayed_caps(mdsc); + delay = ceph_check_delayed_caps(mdsc);
ceph_queue_cap_reclaim_work(mdsc);
@@ -4565,7 +4572,7 @@ static void delayed_work(struct work_str
maybe_recover_session(mdsc);
- schedule_delayed(mdsc); + schedule_delayed(mdsc, delay); }
int ceph_mdsc_init(struct ceph_fs_client *fsc) @@ -5042,7 +5049,7 @@ void ceph_mdsc_handle_mdsmap(struct ceph mdsc->mdsmap->m_epoch);
mutex_unlock(&mdsc->mutex); - schedule_delayed(mdsc); + schedule_delayed(mdsc, 0); return;
bad_unlock: --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -1170,7 +1170,7 @@ extern void ceph_flush_snaps(struct ceph extern bool __ceph_should_report_size(struct ceph_inode_info *ci); extern void ceph_check_caps(struct ceph_inode_info *ci, int flags, struct ceph_mds_session *session); -extern void ceph_check_delayed_caps(struct ceph_mds_client *mdsc); +extern unsigned long ceph_check_delayed_caps(struct ceph_mds_client *mdsc); extern void ceph_flush_dirty_caps(struct ceph_mds_client *mdsc); extern int ceph_drop_caps_for_unlink(struct inode *inode); extern int ceph_encode_inode_release(void **p, struct inode *inode,
From: Damien Le Moal damien.lemoal@wdc.com
commit 31697ef7f3f45293bba3da87bcc710953e97fc3e upstream.
In k210_fpioa_probe(), add missing calls to clk_disable_unprepare() in case of error after enabling the clk and pclk clocks. Also add missing error handling when enabling pclk.
Reported-by: kernel test robot lkp@intel.com Reported-by: Dan Carpenter dan.carpenter@oracle.com Fixes: d4c34d09ab03 ("pinctrl: Add RISC-V Canaan Kendryte K210 FPIOA driver") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal damien.lemoal@wdc.com Link: https://lore.kernel.org/r/20210806004311.52859-1-damien.lemoal@wdc.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pinctrl/pinctrl-k210.c | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-)
--- a/drivers/pinctrl/pinctrl-k210.c +++ b/drivers/pinctrl/pinctrl-k210.c @@ -950,23 +950,37 @@ static int k210_fpioa_probe(struct platf return ret;
pdata->pclk = devm_clk_get_optional(dev, "pclk"); - if (!IS_ERR(pdata->pclk)) - clk_prepare_enable(pdata->pclk); + if (!IS_ERR(pdata->pclk)) { + ret = clk_prepare_enable(pdata->pclk); + if (ret) + goto disable_clk; + }
pdata->sysctl_map = syscon_regmap_lookup_by_phandle_args(np, "canaan,k210-sysctl-power", 1, &pdata->power_offset); - if (IS_ERR(pdata->sysctl_map)) - return PTR_ERR(pdata->sysctl_map); + if (IS_ERR(pdata->sysctl_map)) { + ret = PTR_ERR(pdata->sysctl_map); + goto disable_pclk; + }
k210_fpioa_init_ties(pdata);
pdata->pctl = pinctrl_register(&k210_pinctrl_desc, dev, (void *)pdata); - if (IS_ERR(pdata->pctl)) - return PTR_ERR(pdata->pctl); + if (IS_ERR(pdata->pctl)) { + ret = PTR_ERR(pdata->pctl); + goto disable_pclk; + }
return 0; + +disable_pclk: + clk_disable_unprepare(pdata->pclk); +disable_clk: + clk_disable_unprepare(pdata->clk); + + return ret; }
static const struct of_device_id k210_fpioa_dt_ids[] = {
From: Dan Williams dan.j.williams@intel.com
commit b93dfa6bda4d4e88e5386490f2b277a26958f9d3 upstream.
Fix the NFIT parsing code to treat a 0 index in a SPA Range Structure as a special case and not match Region Mapping Structures that use 0 to indicate that they are not mapped. Without this fix some platform BIOS descriptions of "virtual disk" ranges do not result in the pmem driver attaching to the range.
Details: In addition to typical persistent memory ranges, the ACPI NFIT may also convey "virtual" ranges. These ranges are indicated by a UUID in the SPA Range Structure of UUID_VOLATILE_VIRTUAL_DISK, UUID_VOLATILE_VIRTUAL_CD, UUID_PERSISTENT_VIRTUAL_DISK, or UUID_PERSISTENT_VIRTUAL_CD. The critical difference between virtual ranges and UUID_PERSISTENT_MEMORY is that virtual ranges do not support associations with Region Mapping Structures. For this reason the "index" value of virtual SPA Range Structures is allowed to be 0. If a platform BIOS decides to represent NVDIMMs with disconnected "Region Mapping Structures" (range-index == 0), the kernel may falsely associate them with standalone ranges where the "SPA Range Structure Index" is also zero. When this happens the driver may falsely require labels where "virtual disks" are expected to be label-less. I.e. "label-less" is where the namespace-range == region-range and the pmem driver attaches with no user action to create a namespace.
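Restated as a sketch, the region-matching loop must treat index 0 on either side as "no association" before doing the equality check (this mirrors the hunk below):

	/* range index 0 == unmapped in SPA or invalid SPA; never a valid match */
	if (memdev->range_index == 0 || spa->range_index == 0)
		continue;
	if (memdev->range_index != spa->range_index)
		continue;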
Cc: Jacek Zloch jacek.zloch@intel.com Cc: Lukasz Sobieraj lukasz.sobieraj@intel.com Cc: "Lee, Chun-Yi" jlee@suse.com Cc: stable@vger.kernel.org Fixes: c2f32acdf848 ("acpi, nfit: treat virtual ramdisk SPA as pmem region") Reported-by: Krzysztof Rusocki krzysztof.rusocki@intel.com Reported-by: Damian Bassa damian.bassa@intel.com Reviewed-by: Jeff Moyer jmoyer@redhat.com Link: https://lore.kernel.org/r/162870796589.2521182.1240403310175570220.stgit@dwi... Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/nfit/core.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -3021,6 +3021,9 @@ static int acpi_nfit_register_region(str struct acpi_nfit_memory_map *memdev = nfit_memdev->memdev; struct nd_mapping_desc *mapping;
+ /* range index 0 == unmapped in SPA or invalid-SPA */ + if (memdev->range_index == 0 || spa->range_index == 0) + continue; if (memdev->range_index != spa->range_index) continue; if (count >= ND_MAX_MAPPINGS) {
From: Dan Williams dan.j.williams@intel.com
commit d9cee9f85b22fab88d2b76d2e92b18e3d0e6aa8c upstream.
There are a few scenarios where init_active_labels() can return without registering deactivate_labels() to run when the region is disabled. In particular label error injection creates scenarios where a DIMM is disabled, but labels on other DIMMs in the region become activated.
Arrange for init_active_labels() to always register deactivate_labels().
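For context, a short sketch of the cleanup-registration pattern relied on here: devm_add_action_or_reset() either registers the callback to run when the region device is torn down or, if the registration itself fails, runs it immediately, so every exit path ends up running deactivate_labels() exactly once (this mirrors the tail of init_active_labels() in the patch below):

	if (rc) {
		deactivate_labels(nd_region);	/* explicit cleanup on the error path */
		return rc;
	}

	return devm_add_action_or_reset(&nd_region->dev, deactivate_labels,
					nd_region);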
Reported-by: Krzysztof Kensicki krzysztof.kensicki@intel.com Cc: stable@vger.kernel.org Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation.") Reviewed-by: Jeff Moyer jmoyer@redhat.com Link: https://lore.kernel.org/r/162766356450.3223041.1183118139023841447.stgit@dwi... Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/nvdimm/namespace_devs.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
--- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -2527,7 +2527,7 @@ static void deactivate_labels(void *regi
static int init_active_labels(struct nd_region *nd_region) { - int i; + int i, rc = 0;
for (i = 0; i < nd_region->ndr_mappings; i++) { struct nd_mapping *nd_mapping = &nd_region->mapping[i]; @@ -2546,13 +2546,14 @@ static int init_active_labels(struct nd_ else if (test_bit(NDD_LABELING, &nvdimm->flags)) /* fail, labels needed to disambiguate dpa */; else - return 0; + continue;
dev_err(&nd_region->dev, "%s: is %s, failing probe\n", dev_name(&nd_mapping->nvdimm->dev), test_bit(NDD_LOCKED, &nvdimm->flags) ? "locked" : "disabled"); - return -ENXIO; + rc = -ENXIO; + goto out; } nd_mapping->ndd = ndd; atomic_inc(&nvdimm->busy); @@ -2586,13 +2587,17 @@ static int init_active_labels(struct nd_ break; }
- if (i < nd_region->ndr_mappings) { + if (i < nd_region->ndr_mappings) + rc = -ENOMEM; + +out: + if (rc) { deactivate_labels(nd_region); - return -ENOMEM; + return rc; }
return devm_add_action_or_reset(&nd_region->dev, deactivate_labels, - nd_region); + nd_region); }
int nd_region_register_namespaces(struct nd_region *nd_region, int *err)
From: Changbin Du changbin.du@gmail.com
commit 030d6dbf0c2e5fdf23ad29557f0c87a882993e26 upstream.
The RISC-V-specific option '-mno-relax', which disables linker relaxations, is only supported by GCC 8 and newer; GCC 7 and earlier versions do not support it.
Fixes: fba8a8674f68 ("RISC-V: Add kexec support") Signed-off-by: Changbin Du changbin.du@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt palmerdabbelt@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/kernel/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -11,7 +11,7 @@ endif CFLAGS_syscall_table.o += $(call cc-option,-Wno-override-init,)
ifdef CONFIG_KEXEC -AFLAGS_kexec_relocate.o := -mcmodel=medany -mno-relax +AFLAGS_kexec_relocate.o := -mcmodel=medany $(call cc-option,-mno-relax) endif
extra-y += head.o
From: Nathan Chancellor nathan@kernel.org
commit 848378812e40152abe9b9baf58ce2004f76fb988 upstream.
A recent change in LLVM causes module_{c,d}tor sections to appear when CONFIG_K{A,C}SAN are enabled, which results in orphan section warnings because these are not handled anywhere:
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_ctor) is being placed in '.text.asan.module_ctor'
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_dtor) is being placed in '.text.asan.module_dtor'
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor'
Fangrui explains: "the function asan.module_ctor has the SHF_GNU_RETAIN flag, so it is in a separate section even with -fno-function-sections (default)".
Place them in the TEXT_TEXT section so that these technologies continue to work with the newer compiler versions. All of the KASAN and KCSAN KUnit tests continue to pass after this change.
Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1432 Link: https://github.com/llvm/llvm-project/commit/7b789562244ee941b7bf2cefeb3fc08a... Signed-off-by: Nathan Chancellor nathan@kernel.org Reviewed-by: Nick Desaulniers ndesaulniers@google.com Reviewed-by: Fangrui Song maskray@google.com Acked-by: Marco Elver elver@google.com Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/20210731023107.1932981-1-nathan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/asm-generic/vmlinux.lds.h | 1 + 1 file changed, 1 insertion(+)
--- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -586,6 +586,7 @@ NOINSTR_TEXT \ *(.text..refcount) \ *(.ref.text) \ + *(.text.asan.* .text.tsan.*) \ TEXT_CFI_JT \ MEM_KEEP(init.text*) \ MEM_KEEP(exit.text*) \
From: Zhenyu Wang zhenyuw@linux.intel.com
commit 699aa57b35672c3b2f230e2b7e5d0ab8c2bde80a upstream.
We've seen a recent regression with the host and a Windows VM running simultaneously that causes GPU hangs or even crashes. We finally bisected it to commit 58586680ffad ("drm/i915: Disable atomics in L3 for gen9"); it seems the difference in cached atomics behavior caused the regression.
This adds a new scratch register handler and adds those registers to the mmio save/restore list for context switch. No GPU hang is produced with this change.
Cc: stable@vger.kernel.org # 5.12+ Cc: "Xu, Terrence" terrence.xu@intel.com Cc: "Bloomfield, Jon" jon.bloomfield@intel.com Cc: "Ekstrand, Jason" jason.ekstrand@intel.com Reviewed-by: Colin Xu colin.xu@intel.com Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9") Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com Link: http://patchwork.freedesktop.org/patch/msgid/20210806044056.648016-1-zhenyuw... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/gvt/handlers.c | 1 + drivers/gpu/drm/i915/gvt/mmio_context.c | 2 ++ 2 files changed, 3 insertions(+)
--- a/drivers/gpu/drm/i915/gvt/handlers.c +++ b/drivers/gpu/drm/i915/gvt/handlers.c @@ -3149,6 +3149,7 @@ static int init_bdw_mmio_info(struct int MMIO_DFH(_MMIO(0xb100), D_BDW, F_CMD_ACCESS, NULL, NULL); MMIO_DFH(_MMIO(0xb10c), D_BDW, F_CMD_ACCESS, NULL, NULL); MMIO_D(_MMIO(0xb110), D_BDW); + MMIO_D(GEN9_SCRATCH_LNCF1, D_BDW_PLUS);
MMIO_F(_MMIO(0x24d0), 48, F_CMD_ACCESS | F_CMD_WRITE_PATCH, 0, 0, D_BDW_PLUS, NULL, force_nonpriv_write); --- a/drivers/gpu/drm/i915/gvt/mmio_context.c +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c @@ -105,6 +105,8 @@ static struct engine_mmio gen9_engine_mm {RCS0, COMMON_SLICE_CHICKEN2, 0xffff, true}, /* 0x7014 */ {RCS0, GEN9_CS_DEBUG_MODE1, 0xffff, false}, /* 0x20ec */ {RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */ + {RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */ + {RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */ {RCS0, GEN7_HALF_SLICE_CHICKEN1, 0xffff, true}, /* 0xe100 */ {RCS0, HALF_SLICE_CHICKEN2, 0xffff, true}, /* 0xe180 */ {RCS0, HALF_SLICE_CHICKEN3, 0xffff, true}, /* 0xe184 */
From: Ankit Nautiyal ankit.k.nautiyal@intel.com
commit abd9d66a055722393d33685214c08386694871d7 upstream.
Until Display 12, the PIPE_MISC bits 5-7 are used to set the dithering BPC, with valid values of 6, 8, and 10 BPC. For ADL-P and later these bits are used to set the PORT OUTPUT BPC, with valid values of 6, 8, 10, and 12 BPC, and they need to be programmed whether or not dithering is enabled.
This patch:
- corrects bits 5-7 of the PIPE_MISC register for 12 BPC.
- renames the bits and mask to generic names that cover both the dithering BPC and the port output BPC.
v3: Added a note for MIPI DSI which uses the PIPE_MISC for readout for pipe_bpp. (Uma Shankar)
v2: Added 'display' to the subject and fixes tag. (Uma Shankar)
Fixes: 756f85cffef2 ("drm/i915/bdw: Broadwell has PIPEMISC") Cc: Paulo Zanoni paulo.r.zanoni@intel.com (v1) Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Jani Nikula jani.nikula@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com Cc: intel-gfx@lists.freedesktop.org Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Ankit Nautiyal ankit.k.nautiyal@intel.com Reviewed-by: Uma Shankar uma.shankar@intel.com Signed-off-by: Uma Shankar uma.shankar@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20210811051857.109723-1-ankit.... (cherry picked from commit 70418a68713c13da3f36c388087d0220b456a430) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_display.c | 34 +++++++++++++++++++-------- drivers/gpu/drm/i915/i915_reg.h | 16 ++++++++---- 2 files changed, 35 insertions(+), 15 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -5424,16 +5424,18 @@ static void bdw_set_pipemisc(const struc
switch (crtc_state->pipe_bpp) { case 18: - val |= PIPEMISC_DITHER_6_BPC; + val |= PIPEMISC_6_BPC; break; case 24: - val |= PIPEMISC_DITHER_8_BPC; + val |= PIPEMISC_8_BPC; break; case 30: - val |= PIPEMISC_DITHER_10_BPC; + val |= PIPEMISC_10_BPC; break; case 36: - val |= PIPEMISC_DITHER_12_BPC; + /* Port output 12BPC defined for ADLP+ */ + if (DISPLAY_VER(dev_priv) > 12) + val |= PIPEMISC_12_BPC_ADLP; break; default: MISSING_CASE(crtc_state->pipe_bpp); @@ -5469,15 +5471,27 @@ int bdw_get_pipemisc_bpp(struct intel_cr
tmp = intel_de_read(dev_priv, PIPEMISC(crtc->pipe));
- switch (tmp & PIPEMISC_DITHER_BPC_MASK) { - case PIPEMISC_DITHER_6_BPC: + switch (tmp & PIPEMISC_BPC_MASK) { + case PIPEMISC_6_BPC: return 18; - case PIPEMISC_DITHER_8_BPC: + case PIPEMISC_8_BPC: return 24; - case PIPEMISC_DITHER_10_BPC: + case PIPEMISC_10_BPC: return 30; - case PIPEMISC_DITHER_12_BPC: - return 36; + /* + * PORT OUTPUT 12 BPC defined for ADLP+. + * + * TODO: + * For previous platforms with DSI interface, bits 5:7 + * are used for storing pipe_bpp irrespective of dithering. + * Since the value of 12 BPC is not defined for these bits + * on older platforms, need to find a workaround for 12 BPC + * MIPI DSI HW readout. + */ + case PIPEMISC_12_BPC_ADLP: + if (DISPLAY_VER(dev_priv) > 12) + return 36; + fallthrough; default: MISSING_CASE(tmp); return 0; --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -6134,11 +6134,17 @@ enum { #define PIPEMISC_HDR_MODE_PRECISION (1 << 23) /* icl+ */ #define PIPEMISC_OUTPUT_COLORSPACE_YUV (1 << 11) #define PIPEMISC_PIXEL_ROUNDING_TRUNC REG_BIT(8) /* tgl+ */ -#define PIPEMISC_DITHER_BPC_MASK (7 << 5) -#define PIPEMISC_DITHER_8_BPC (0 << 5) -#define PIPEMISC_DITHER_10_BPC (1 << 5) -#define PIPEMISC_DITHER_6_BPC (2 << 5) -#define PIPEMISC_DITHER_12_BPC (3 << 5) +/* + * For Display < 13, Bits 5-7 of PIPE MISC represent DITHER BPC with + * valid values of: 6, 8, 10 BPC. + * ADLP+, the bits 5-7 represent PORT OUTPUT BPC with valid values of: + * 6, 8, 10, 12 BPC. + */ +#define PIPEMISC_BPC_MASK (7 << 5) +#define PIPEMISC_8_BPC (0 << 5) +#define PIPEMISC_10_BPC (1 << 5) +#define PIPEMISC_6_BPC (2 << 5) +#define PIPEMISC_12_BPC_ADLP (4 << 5) /* adlp+ */ #define PIPEMISC_DITHER_ENABLE (1 << 4) #define PIPEMISC_DITHER_TYPE_MASK (3 << 2) #define PIPEMISC_DITHER_TYPE_SP (0 << 2)
From: Eric Bernstein eric.bernstein@amd.com
commit c90f6263f58a28c3d97b83679d6fd693b33dfd4e upstream.
Reviewed-by: Dmytro Laktyushkin Dmytro.Laktyushkin@amd.com Acked-by: Anson Jacob Anson.Jacob@amd.com Signed-off-by: Eric Bernstein eric.bernstein@amd.com Cc: stable@vger.kernel.org Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c @@ -1788,7 +1788,6 @@ static bool dcn30_split_stream_for_mpc_o } pri_pipe->next_odm_pipe = sec_pipe; sec_pipe->prev_odm_pipe = pri_pipe; - ASSERT(sec_pipe->top_pipe == NULL);
if (!sec_pipe->top_pipe) sec_pipe->stream_res.opp = pool->opps[pipe_idx];
From: Anson Jacob Anson.Jacob@amd.com
commit 0cde63a8fc4d9f9f580c297211fd05f91c0fd66d upstream.
Replace GFP_KERNEL with GFP_ATOMIC as amdgpu_dm_irq_schedule_work can't sleep.
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 253, name: kworker/6:1H CPU: 6 PID: 253 Comm: kworker/6:1H Tainted: G W OE 5.11.0-promotion_2021_06_07-18_36_28_prelim_revert_retrain #8 Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 3405 02/01/2021 Workqueue: events_highpri dm_irq_work_func [amdgpu] Call Trace: <IRQ> dump_stack+0x5e/0x74 ___might_sleep.cold+0x87/0x98 __might_sleep+0x4b/0x80 kmem_cache_alloc_trace+0x390/0x4f0 amdgpu_dm_irq_handler+0x171/0x230 [amdgpu] amdgpu_irq_dispatch+0xc0/0x1e0 [amdgpu] amdgpu_ih_process+0x81/0x100 [amdgpu] amdgpu_irq_handler+0x26/0xa0 [amdgpu] __handle_irq_event_percpu+0x49/0x190 ? __hrtimer_get_next_event+0x4d/0x80 handle_irq_event_percpu+0x33/0x80 handle_irq_event+0x33/0x60 handle_edge_irq+0x82/0x190 asm_call_irq_on_stack+0x12/0x20 </IRQ> common_interrupt+0xbb/0x140 asm_common_interrupt+0x1e/0x40 RIP: 0010:amdgpu_device_rreg.part.0+0x44/0xf0 [amdgpu] Code: 53 48 89 fb 4c 3b af c8 08 00 00 73 6d 83 e2 02 75 0d f6 87 40 62 01 00 10 0f 85 83 00 00 00 4c 03 ab d0 08 00 00 45 8b 6d 00 <8b> 05 3e b6 52 00 85 c0 7e 62 48 8b 43 08 0f b7 70 3e 65 8b 05 e3 RSP: 0018:ffffae7740fff9e8 EFLAGS: 00000286 RAX: ffffffffc05ee610 RBX: ffff8aaf8f620000 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000005430 RDI: ffff8aaf8f620000 RBP: ffffae7740fffa08 R08: 0000000000000001 R09: 000000000000000a R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000005430 R13: 0000000071000000 R14: 0000000000000001 R15: 0000000000005430 ? amdgpu_cgs_write_register+0x20/0x20 [amdgpu] amdgpu_device_rreg+0x17/0x20 [amdgpu] amdgpu_cgs_read_register+0x14/0x20 [amdgpu] dm_read_reg_func+0x38/0xb0 [amdgpu] generic_reg_wait+0x80/0x160 [amdgpu] dce_aux_transfer_raw+0x324/0x7c0 [amdgpu] dc_link_aux_transfer_raw+0x43/0x50 [amdgpu] dm_dp_aux_transfer+0x87/0x110 [amdgpu] drm_dp_dpcd_access+0x72/0x110 [drm_kms_helper] drm_dp_dpcd_read+0xb7/0xf0 [drm_kms_helper] drm_dp_get_one_sb_msg+0x349/0x480 [drm_kms_helper] drm_dp_mst_hpd_irq+0xc5/0xe40 [drm_kms_helper] ? drm_dp_mst_hpd_irq+0xc5/0xe40 [drm_kms_helper] dm_handle_hpd_rx_irq+0x184/0x1a0 [amdgpu] ? dm_handle_hpd_rx_irq+0x184/0x1a0 [amdgpu] handle_hpd_rx_irq+0x195/0x240 [amdgpu] ? __switch_to_asm+0x42/0x70 ? __switch_to+0x131/0x450 dm_irq_work_func+0x19/0x20 [amdgpu] process_one_work+0x209/0x400 worker_thread+0x4d/0x3e0 ? cancel_delayed_work+0xa0/0xa0 kthread+0x124/0x160 ? kthread_park+0x90/0x90 ret_from_fork+0x22/0x30
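The rule the trace above illustrates, in sketch form: allocations made from interrupt/atomic context must use GFP_ATOMIC, since GFP_KERNEL may sleep during reclaim (identifiers as in the patch below):

	/* interrupt handler context: must not sleep */
	handler_data_add = kzalloc(sizeof(*handler_data), GFP_ATOMIC);
	if (!handler_data_add) {
		DRM_ERROR("DM_IRQ: failed to allocate irq handler!\n");
		return;
	}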
Reviewed-by: Aurabindo Jayamohanan Pillai Aurabindo.Pillai@amd.com Acked-by: Anson Jacob Anson.Jacob@amd.com Signed-off-by: Anson Jacob Anson.Jacob@amd.com Cc: stable@vger.kernel.org Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c @@ -584,7 +584,7 @@ static void amdgpu_dm_irq_schedule_work( handler_data = container_of(handler_list->next, struct amdgpu_dm_irq_handler_data, list);
/*allocate a new amdgpu_dm_irq_handler_data*/ - handler_data_add = kzalloc(sizeof(*handler_data), GFP_KERNEL); + handler_data_add = kzalloc(sizeof(*handler_data), GFP_ATOMIC); if (!handler_data_add) { DRM_ERROR("DM_IRQ: failed to allocate irq handler!\n"); return;
From: Solomon Chiu solomon.chiu@amd.com
commit 46dd2965bdd1c5a4f6499c73ff32e636fa8f9769 upstream.
[Why] With the kernel module parameter "freesync_video" enabled, if the mode is changed to the preferred mode (the mode with the highest refresh rate), FreeSync fails because the preferred mode is treated as one of the freesync video modes and is then configured as a freesync video mode (fixed refresh rate).
[How] Skip the freesync fixed-rate configuration when modesetting to the preferred mode.
Signed-off-by: Solomon Chiu solomon.chiu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -9410,7 +9410,12 @@ static int dm_update_crtc_state(struct a } else if (amdgpu_freesync_vid_mode && aconnector && is_freesync_video_mode(&new_crtc_state->mode, aconnector)) { - set_freesync_fixed_config(dm_new_crtc_state); + struct drm_display_mode *high_mode; + + high_mode = get_highest_refresh_rate_mode(aconnector, false); + if (!drm_mode_equal(&new_crtc_state->mode, high_mode)) { + set_freesync_fixed_config(dm_new_crtc_state); + } }
ret = dm_atomic_get_state(state, &dm_state);
From: Alex Deucher alexander.deucher@amd.com
commit 202ead5a3c589b0594a75cb99f080174f6851fed upstream.
If the platform uses BOCO, don't use BACO in runtime suspend. We could end up executing the BACO path if the platform supports both.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1669 Reviewed-by: Evan Quan evan.quan@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -1537,6 +1537,8 @@ static int amdgpu_pmops_runtime_suspend( pci_ignore_hotplug(pdev); pci_set_power_state(pdev, PCI_D3cold); drm_dev->switch_power_state = DRM_SWITCH_POWER_DYNAMIC_OFF; + } else if (amdgpu_device_supports_boco(drm_dev)) { + /* nothing to do */ } else if (amdgpu_device_supports_baco(drm_dev)) { amdgpu_device_baco_enter(drm_dev); }
From: Alex Deucher alexander.deucher@amd.com
commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 upstream.
There may be multiple instances and only one is harvested.
v2: fix typo in commit message
Fixes: 83a0b8639185 ("drm/amdgpu: add judgement when add ip blocks (v2)") Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1673 Reviewed-by: Guchun Chen guchun.chen@amd.com Reviewed-by: James Zhu James.Zhu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c @@ -299,6 +299,9 @@ int amdgpu_discovery_reg_base_init(struc ip->major, ip->minor, ip->revision);
+ if (le16_to_cpu(ip->hw_id) == VCN_HWID) + adev->vcn.num_vcn_inst++; + for (k = 0; k < num_base_address; k++) { /* * convert the endianness of base addresses in place, @@ -377,7 +380,7 @@ void amdgpu_discovery_harvest_ip(struct { struct binary_header *bhdr; struct harvest_table *harvest_info; - int i; + int i, vcn_harvest_count = 0;
bhdr = (struct binary_header *)adev->mman.discovery_bin; harvest_info = (struct harvest_table *)(adev->mman.discovery_bin + @@ -389,8 +392,7 @@ void amdgpu_discovery_harvest_ip(struct
switch (le32_to_cpu(harvest_info->list[i].hw_id)) { case VCN_HWID: - adev->harvest_ip_mask |= AMD_HARVEST_IP_VCN_MASK; - adev->harvest_ip_mask |= AMD_HARVEST_IP_JPEG_MASK; + vcn_harvest_count++; break; case DMU_HWID: adev->harvest_ip_mask |= AMD_HARVEST_IP_DMU_MASK; @@ -399,6 +401,10 @@ void amdgpu_discovery_harvest_ip(struct break; } } + if (vcn_harvest_count == adev->vcn.num_vcn_inst) { + adev->harvest_ip_mask |= AMD_HARVEST_IP_VCN_MASK; + adev->harvest_ip_mask |= AMD_HARVEST_IP_JPEG_MASK; + } }
int amdgpu_discovery_get_gfx_info(struct amdgpu_device *adev)
From: Dongliang Mu mudongliangabcd@gmail.com
[ Upstream commit e9faf53c5a5d01f6f2a09ae28ec63a3bbd6f64fd ]
MAC802154_HWSIM_ATTR_RADIO_ID and MAC802154_HWSIM_ATTR_RADIO_EDGE, as well as MAC802154_HWSIM_EDGE_ATTR_ENDPOINT_ID and MAC802154_HWSIM_EDGE_ATTR_LQI, must all be present to avoid a general protection fault (GPF).
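In sketch form, the corrected check fails fast when either attribute is missing (logical OR over the "absent" tests), instead of only when both are missing:

	if (!info->attrs[MAC802154_HWSIM_ATTR_RADIO_ID] ||
	    !info->attrs[MAC802154_HWSIM_ATTR_RADIO_EDGE])
		return -EINVAL;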
Fixes: f25da51fdc38 ("ieee802154: hwsim: add replacement for fakelb") Signed-off-by: Dongliang Mu mudongliangabcd@gmail.com Acked-by: Alexander Aring aahringo@redhat.com Link: https://lore.kernel.org/r/20210705131321.217111-1-mudongliangabcd@gmail.com Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ieee802154/mac802154_hwsim.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ieee802154/mac802154_hwsim.c b/drivers/net/ieee802154/mac802154_hwsim.c index ebc976b7fcc2..cae52bfb871e 100644 --- a/drivers/net/ieee802154/mac802154_hwsim.c +++ b/drivers/net/ieee802154/mac802154_hwsim.c @@ -528,14 +528,14 @@ static int hwsim_set_edge_lqi(struct sk_buff *msg, struct genl_info *info) u32 v0, v1; u8 lqi;
- if (!info->attrs[MAC802154_HWSIM_ATTR_RADIO_ID] && + if (!info->attrs[MAC802154_HWSIM_ATTR_RADIO_ID] || !info->attrs[MAC802154_HWSIM_ATTR_RADIO_EDGE]) return -EINVAL;
if (nla_parse_nested_deprecated(edge_attrs, MAC802154_HWSIM_EDGE_ATTR_MAX, info->attrs[MAC802154_HWSIM_ATTR_RADIO_EDGE], hwsim_edge_policy, NULL)) return -EINVAL;
- if (!edge_attrs[MAC802154_HWSIM_EDGE_ATTR_ENDPOINT_ID] && + if (!edge_attrs[MAC802154_HWSIM_EDGE_ATTR_ENDPOINT_ID] || !edge_attrs[MAC802154_HWSIM_EDGE_ATTR_LQI]) return -EINVAL;
From: Dongliang Mu mudongliangabcd@gmail.com
[ Upstream commit 889d0e7dc68314a273627d89cbb60c09e1cc1c25 ]
Both MAC802154_HWSIM_ATTR_RADIO_ID and MAC802154_HWSIM_ATTR_RADIO_EDGE must be present to avoid a general protection fault (GPF).
Fixes: f25da51fdc38 ("ieee802154: hwsim: add replacement for fakelb") Signed-off-by: Dongliang Mu mudongliangabcd@gmail.com Acked-by: Alexander Aring aahringo@redhat.com Link: https://lore.kernel.org/r/20210707155633.1486603-1-mudongliangabcd@gmail.com Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ieee802154/mac802154_hwsim.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ieee802154/mac802154_hwsim.c b/drivers/net/ieee802154/mac802154_hwsim.c index cae52bfb871e..8caa61ec718f 100644 --- a/drivers/net/ieee802154/mac802154_hwsim.c +++ b/drivers/net/ieee802154/mac802154_hwsim.c @@ -418,7 +418,7 @@ static int hwsim_new_edge_nl(struct sk_buff *msg, struct genl_info *info) struct hwsim_edge *e; u32 v0, v1;
- if (!info->attrs[MAC802154_HWSIM_ATTR_RADIO_ID] && + if (!info->attrs[MAC802154_HWSIM_ATTR_RADIO_ID] || !info->attrs[MAC802154_HWSIM_ATTR_RADIO_EDGE]) return -EINVAL;
From: jason-jh.lin jason-jh.lin@mediatek.com
[ Upstream commit 1a64a7aff8da352c9419de3d5c34343682916411 ]
The cursor plane should use the current plane state in atomic_async_update, because it will not be the new plane state in the global atomic state: _swap_state has already happened by the time those hooks are run.
Fix the cursor plane issue with the modifications below:
1. Remove plane_helper_funcs->atomic_update(plane, state) from mtk_drm_crtc_async_update.
2. Add mtk_plane_update_new_state() to mtk_plane_atomic_async_update so that the cursor plane is updated from the current plane state and the other planes are updated from the new_state.
Fixes: 37418bf14c13 ("drm: Use state helper instead of the plane state pointer") Signed-off-by: jason-jh.lin jason-jh.lin@mediatek.com Tested-by: Enric Balletbo i Serra enric.balletbo@collabora.com Signed-off-by: Chun-Kuang Hu chunkuang.hu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 3 -- drivers/gpu/drm/mediatek/mtk_drm_plane.c | 60 ++++++++++++++---------- 2 files changed, 34 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c index 474efb844249..735efe79f075 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c @@ -532,13 +532,10 @@ void mtk_drm_crtc_async_update(struct drm_crtc *crtc, struct drm_plane *plane, struct drm_atomic_state *state) { struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc); - const struct drm_plane_helper_funcs *plane_helper_funcs = - plane->helper_private;
if (!mtk_crtc->enabled) return;
- plane_helper_funcs->atomic_update(plane, state); mtk_drm_crtc_update_config(mtk_crtc, false); }
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_plane.c b/drivers/gpu/drm/mediatek/mtk_drm_plane.c index b5582dcf564c..e6dcb34d3052 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_plane.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_plane.c @@ -110,6 +110,35 @@ static int mtk_plane_atomic_async_check(struct drm_plane *plane, true, true); }
+static void mtk_plane_update_new_state(struct drm_plane_state *new_state, + struct mtk_plane_state *mtk_plane_state) +{ + struct drm_framebuffer *fb = new_state->fb; + struct drm_gem_object *gem; + struct mtk_drm_gem_obj *mtk_gem; + unsigned int pitch, format; + dma_addr_t addr; + + gem = fb->obj[0]; + mtk_gem = to_mtk_gem_obj(gem); + addr = mtk_gem->dma_addr; + pitch = fb->pitches[0]; + format = fb->format->format; + + addr += (new_state->src.x1 >> 16) * fb->format->cpp[0]; + addr += (new_state->src.y1 >> 16) * pitch; + + mtk_plane_state->pending.enable = true; + mtk_plane_state->pending.pitch = pitch; + mtk_plane_state->pending.format = format; + mtk_plane_state->pending.addr = addr; + mtk_plane_state->pending.x = new_state->dst.x1; + mtk_plane_state->pending.y = new_state->dst.y1; + mtk_plane_state->pending.width = drm_rect_width(&new_state->dst); + mtk_plane_state->pending.height = drm_rect_height(&new_state->dst); + mtk_plane_state->pending.rotation = new_state->rotation; +} + static void mtk_plane_atomic_async_update(struct drm_plane *plane, struct drm_atomic_state *state) { @@ -126,8 +155,10 @@ static void mtk_plane_atomic_async_update(struct drm_plane *plane, plane->state->src_h = new_state->src_h; plane->state->src_w = new_state->src_w; swap(plane->state->fb, new_state->fb); - new_plane_state->pending.async_dirty = true;
+ mtk_plane_update_new_state(new_state, new_plane_state); + wmb(); /* Make sure the above parameters are set before update */ + new_plane_state->pending.async_dirty = true; mtk_drm_crtc_async_update(new_state->crtc, plane, state); }
@@ -189,14 +220,8 @@ static void mtk_plane_atomic_update(struct drm_plane *plane, struct drm_plane_state *new_state = drm_atomic_get_new_plane_state(state, plane); struct mtk_plane_state *mtk_plane_state = to_mtk_plane_state(new_state); - struct drm_crtc *crtc = new_state->crtc; - struct drm_framebuffer *fb = new_state->fb; - struct drm_gem_object *gem; - struct mtk_drm_gem_obj *mtk_gem; - unsigned int pitch, format; - dma_addr_t addr;
- if (!crtc || WARN_ON(!fb)) + if (!new_state->crtc || WARN_ON(!new_state->fb)) return;
if (!new_state->visible) { @@ -204,24 +229,7 @@ static void mtk_plane_atomic_update(struct drm_plane *plane, return; }
- gem = fb->obj[0]; - mtk_gem = to_mtk_gem_obj(gem); - addr = mtk_gem->dma_addr; - pitch = fb->pitches[0]; - format = fb->format->format; - - addr += (new_state->src.x1 >> 16) * fb->format->cpp[0]; - addr += (new_state->src.y1 >> 16) * pitch; - - mtk_plane_state->pending.enable = true; - mtk_plane_state->pending.pitch = pitch; - mtk_plane_state->pending.format = format; - mtk_plane_state->pending.addr = addr; - mtk_plane_state->pending.x = new_state->dst.x1; - mtk_plane_state->pending.y = new_state->dst.y1; - mtk_plane_state->pending.width = drm_rect_width(&new_state->dst); - mtk_plane_state->pending.height = drm_rect_height(&new_state->dst); - mtk_plane_state->pending.rotation = new_state->rotation; + mtk_plane_update_new_state(new_state, mtk_plane_state); wmb(); /* Make sure the above parameters are set before update */ mtk_plane_state->pending.dirty = true; }
From: Hsin-Yi Wang hsinyi@chromium.org
[ Upstream commit 798a315fc359aa6dbe48e09d802aa59b7e158ffc ]
Some pins don't support the PUPD register. If setting it fails and the bias_set_combo case falls back, it will call mtk_pinconf_bias_set_pupd_r1_r0() to modify the PUPD pin again.
Since the general bias settings are either PU/PD or PULLSEL/PULLEN, try bias_set or bias_set_rev1 for the other fallback case. If the pin supports neither PU/PD nor PULLSEL/PULLEN, it will return -ENOTSUPP.
Fixes: 81bd1579b43e ("pinctrl: mediatek: Fix fallback call path") Signed-off-by: Hsin-Yi Wang hsinyi@chromium.org Reviewed-by: Chen-Yu Tsai wenst@chromium.org Reviewed-by: Zhiyong Tao zhiyong.tao@mediatek.com Link: https://lore.kernel.org/r/20210701080955.2660294-1-hsinyi@chromium.org Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c index 5b3b048725cc..45ebdeba985a 100644 --- a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c +++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c @@ -925,12 +925,10 @@ int mtk_pinconf_adv_pull_set(struct mtk_pinctrl *hw, err = hw->soc->bias_set(hw, desc, pullup); if (err) return err; - } else if (hw->soc->bias_set_combo) { - err = hw->soc->bias_set_combo(hw, desc, pullup, arg); - if (err) - return err; } else { - return -ENOTSUPP; + err = mtk_pinconf_bias_set_rev1(hw, desc, pullup); + if (err) + err = mtk_pinconf_bias_set(hw, desc, pullup); } }
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit ee86f680ff4c9b406d49d4e22ddf10805b8a2137 ]
The ADC volume is a signed 8-bit number with range -97 to +12, with -97 being mute. Use a SOC_SINGLE_S8_TLV() to define this and fix the DECLARE_TLV_DB_SCALE() to have the correct start and mute flag.
Fixes: 2c394ca79604 ("ASoC: Add support for CS42L42 codec") Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Link: https://lore.kernel.org/r/20210729170929.6589-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs42l42.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/sound/soc/codecs/cs42l42.c b/sound/soc/codecs/cs42l42.c index 8434c48354f1..3956912e23ac 100644 --- a/sound/soc/codecs/cs42l42.c +++ b/sound/soc/codecs/cs42l42.c @@ -404,7 +404,7 @@ static const struct regmap_config cs42l42_regmap = { .use_single_write = true, };
-static DECLARE_TLV_DB_SCALE(adc_tlv, -9600, 100, false); +static DECLARE_TLV_DB_SCALE(adc_tlv, -9700, 100, true); static DECLARE_TLV_DB_SCALE(mixer_tlv, -6300, 100, true);
static const char * const cs42l42_hpf_freq_text[] = { @@ -443,8 +443,7 @@ static const struct snd_kcontrol_new cs42l42_snd_controls[] = { CS42L42_ADC_INV_SHIFT, true, false), SOC_SINGLE("ADC Boost Switch", CS42L42_ADC_CTL, CS42L42_ADC_DIG_BOOST_SHIFT, true, false), - SOC_SINGLE_SX_TLV("ADC Volume", CS42L42_ADC_VOLUME, - CS42L42_ADC_VOL_SHIFT, 0xA0, 0x6C, adc_tlv), + SOC_SINGLE_S8_TLV("ADC Volume", CS42L42_ADC_VOLUME, -97, 12, adc_tlv), SOC_SINGLE("ADC WNF Switch", CS42L42_ADC_WNF_HPF_CTL, CS42L42_ADC_WNF_EN_SHIFT, true, false), SOC_SINGLE("ADC HPF Switch", CS42L42_ADC_WNF_HPF_CTL,
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit 64324bac750b84ca54711fb7d332132fcdb87293 ]
The driver has no support for left-justified protocol so it should not have been allowing this to be passed to cs42l42_set_dai_fmt().
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 2c394ca79604 ("ASoC: Add support for CS42L42 codec") Link: https://lore.kernel.org/r/20210729170929.6589-2-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs42l42.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/sound/soc/codecs/cs42l42.c b/sound/soc/codecs/cs42l42.c index 3956912e23ac..0d31c84b0445 100644 --- a/sound/soc/codecs/cs42l42.c +++ b/sound/soc/codecs/cs42l42.c @@ -778,7 +778,6 @@ static int cs42l42_set_dai_fmt(struct snd_soc_dai *codec_dai, unsigned int fmt) /* interface format */ switch (fmt & SND_SOC_DAIFMT_FORMAT_MASK) { case SND_SOC_DAIFMT_I2S: - case SND_SOC_DAIFMT_LEFT_J: break; default: return -EINVAL;
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit 926ef1a4c245c093acc07807e466ad2ef0ff6ccb ]
An I2S frame always has a left and right channel slot even if mono data is being sent. So if channels==1 the actual bitclock frequency is 2 * snd_soc_params_to_bclk(params).
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 2cdba9b045c7 ("ASoC: cs42l42: Use bclk from hw_params if set_sysclk was not called") Link: https://lore.kernel.org/r/20210729170929.6589-3-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs42l42.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/sound/soc/codecs/cs42l42.c b/sound/soc/codecs/cs42l42.c index 0d31c84b0445..fe73a5c70bdd 100644 --- a/sound/soc/codecs/cs42l42.c +++ b/sound/soc/codecs/cs42l42.c @@ -820,6 +820,10 @@ static int cs42l42_pcm_hw_params(struct snd_pcm_substream *substream, cs42l42->srate = params_rate(params); cs42l42->bclk = snd_soc_params_to_bclk(params);
+ /* I2S frame always has 2 channels even for mono audio */ + if (channels == 1) + cs42l42->bclk *= 2; + switch (substream->stream) { case SNDRV_PCM_STREAM_CAPTURE: if (channels == 2) {
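To make the doubling above concrete, here is a small stand-alone C sketch of the arithmetic; it assumes snd_soc_params_to_bclk() works out to rate * width * channels for this format, which is an illustrative assumption rather than the exact helper behaviour:

#include <stdio.h>

int main(void)
{
        unsigned int rate = 48000, width = 24, channels = 1;
        unsigned int bclk = rate * width * channels;    /* what the helper reports for mono */

        if (channels == 1)
                bclk *= 2;      /* the I2S frame still carries a left and a right slot */

        printf("bclk = %u Hz\n", bclk); /* 2304000 */
        return 0;
}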
From: Mike Tipton mdtipton@codeaurora.org
[ Upstream commit f84f5b6f72e68bbaeb850b58ac167e4a3a47532a ]
We're only adding BCMs to the commit list in aggregate(), but there are cases where pre_aggregate() is called without subsequently calling aggregate(). In particular, in icc_sync_state() when a node with initial BW has zero requests. Since BCMs aren't added to the commit list in these cases, we don't actually send the zero BW request to HW. So the resources remain on unnecessarily.
Add BCMs to the commit list in pre_aggregate() instead, which is always called even when there are no requests.
Fixes: 976daac4a1c5 ("interconnect: qcom: Consolidate interconnect RPMh support") Signed-off-by: Mike Tipton mdtipton@codeaurora.org Link: https://lore.kernel.org/r/20210721175432.2119-5-mdtipton@codeaurora.org Signed-off-by: Georgi Djakov djakov@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/interconnect/qcom/icc-rpmh.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/interconnect/qcom/icc-rpmh.c b/drivers/interconnect/qcom/icc-rpmh.c index f6fae64861ce..27cc5f03611c 100644 --- a/drivers/interconnect/qcom/icc-rpmh.c +++ b/drivers/interconnect/qcom/icc-rpmh.c @@ -20,13 +20,18 @@ void qcom_icc_pre_aggregate(struct icc_node *node) { size_t i; struct qcom_icc_node *qn; + struct qcom_icc_provider *qp;
qn = node->data; + qp = to_qcom_provider(node->provider);
for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) { qn->sum_avg[i] = 0; qn->max_peak[i] = 0; } + + for (i = 0; i < qn->num_bcms; i++) + qcom_icc_bcm_voter_add(qp->voter, qn->bcms[i]); } EXPORT_SYMBOL_GPL(qcom_icc_pre_aggregate);
@@ -44,10 +49,8 @@ int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw, { size_t i; struct qcom_icc_node *qn; - struct qcom_icc_provider *qp;
qn = node->data; - qp = to_qcom_provider(node->provider);
if (!tag) tag = QCOM_ICC_TAG_ALWAYS; @@ -67,9 +70,6 @@ int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw, *agg_avg += avg_bw; *agg_peak = max_t(u32, *agg_peak, peak_bw);
- for (i = 0; i < qn->num_bcms; i++) - qcom_icc_bcm_voter_add(qp->voter, qn->bcms[i]); - return 0; } EXPORT_SYMBOL_GPL(qcom_icc_aggregate);
On 16.08.21 16:01, Greg Kroah-Hartman wrote:
From: Mike Tipton mdtipton@codeaurora.org
[ Upstream commit f84f5b6f72e68bbaeb850b58ac167e4a3a47532a ]
We're only adding BCMs to the commit list in aggregate(), but there are cases where pre_aggregate() is called without subsequently calling aggregate(). In particular, in icc_sync_state() when a node with initial BW has zero requests. Since BCMs aren't added to the commit list in these cases, we don't actually send the zero BW request to HW. So the resources remain on unnecessarily.
Add BCMs to the commit list in pre_aggregate() instead, which is always called even when there are no requests.
Fixes: 976daac4a1c5 ("interconnect: qcom: Consolidate interconnect RPMh support") Signed-off-by: Mike Tipton mdtipton@codeaurora.org Link: https://lore.kernel.org/r/20210721175432.2119-5-mdtipton@codeaurora.org Signed-off-by: Georgi Djakov djakov@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org
Hello Greg and Sasha,
Please drop this patch from both 5.10 and 5.13 stable queues. It's causing issues on some platforms and we are reverting it. The revert is in linux-next already.
Thanks, Georgi
On Mon, Aug 16, 2021 at 08:17:52PM +0300, Georgi Djakov wrote:
On 16.08.21 16:01, Greg Kroah-Hartman wrote:
From: Mike Tipton mdtipton@codeaurora.org
[ Upstream commit f84f5b6f72e68bbaeb850b58ac167e4a3a47532a ]
We're only adding BCMs to the commit list in aggregate(), but there are cases where pre_aggregate() is called without subsequently calling aggregate(). In particular, in icc_sync_state() when a node with initial BW has zero requests. Since BCMs aren't added to the commit list in these cases, we don't actually send the zero BW request to HW. So the resources remain on unnecessarily.
Add BCMs to the commit list in pre_aggregate() instead, which is always called even when there are no requests.
Fixes: 976daac4a1c5 ("interconnect: qcom: Consolidate interconnect RPMh support") Signed-off-by: Mike Tipton mdtipton@codeaurora.org Link: https://lore.kernel.org/r/20210721175432.2119-5-mdtipton@codeaurora.org Signed-off-by: Georgi Djakov djakov@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org
Hello Greg and Sasha,
Please drop this patch from both 5.10 and 5.13 stable queues. It's causing issues on some platforms and we are reverting it. The revert is in linux-next already.
Now dropped, thanks.
greg k-h
From: Tianjia Zhang tianjia.zhang@linux.alibaba.com
[ Upstream commit 567c39047dbee341244fe3bf79fea24ee0897ff9 ]
Q1 and Q2 are numbers with a *maximum* length of 384 bytes. If the calculated length of Q1 and Q2 is less than 384 bytes, things will go wrong.
E.g. if Q2 is 383 bytes, then
1. The bytes of q2 are copied to sigstruct->q2 in calc_q1q2(). 2. The entire sigstruct->q2 is reversed, which results in it being 256 * Q2, given that the last byte of sigstruct->q2 (never written by calc_q1q2()) ends up in front of the bytes that calc_q1q2() did write.
Either change in key or measurement can trigger the bug. E.g. an unmeasured heap could cause a devastating change in Q1 or Q2.
Reverse exactly the bytes of Q1 and Q2 in calc_q1q2() before returning to the caller.
Fixes: 2adcba79e69d ("selftests/x86: Add a selftest for SGX") Link: https://lore.kernel.org/linux-sgx/20210301051836.30738-1-tianjia.zhang@linux... Signed-off-by: Tianjia Zhang tianjia.zhang@linux.alibaba.com Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/sgx/sigstruct.c | 41 +++++++++++++------------ 1 file changed, 21 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/sgx/sigstruct.c b/tools/testing/selftests/sgx/sigstruct.c index dee7a3d6c5a5..92bbc5a15c39 100644 --- a/tools/testing/selftests/sgx/sigstruct.c +++ b/tools/testing/selftests/sgx/sigstruct.c @@ -55,10 +55,27 @@ static bool alloc_q1q2_ctx(const uint8_t *s, const uint8_t *m, return true; }
+static void reverse_bytes(void *data, int length) +{ + int i = 0; + int j = length - 1; + uint8_t temp; + uint8_t *ptr = data; + + while (i < j) { + temp = ptr[i]; + ptr[i] = ptr[j]; + ptr[j] = temp; + i++; + j--; + } +} + static bool calc_q1q2(const uint8_t *s, const uint8_t *m, uint8_t *q1, uint8_t *q2) { struct q1q2_ctx ctx; + int len;
if (!alloc_q1q2_ctx(s, m, &ctx)) { fprintf(stderr, "Not enough memory for Q1Q2 calculation\n"); @@ -89,8 +106,10 @@ static bool calc_q1q2(const uint8_t *s, const uint8_t *m, uint8_t *q1, goto out; }
- BN_bn2bin(ctx.q1, q1); - BN_bn2bin(ctx.q2, q2); + len = BN_bn2bin(ctx.q1, q1); + reverse_bytes(q1, len); + len = BN_bn2bin(ctx.q2, q2); + reverse_bytes(q2, len);
free_q1q2_ctx(&ctx); return true; @@ -152,22 +171,6 @@ static RSA *gen_sign_key(void) return key; }
-static void reverse_bytes(void *data, int length) -{ - int i = 0; - int j = length - 1; - uint8_t temp; - uint8_t *ptr = data; - - while (i < j) { - temp = ptr[i]; - ptr[i] = ptr[j]; - ptr[j] = temp; - i++; - j--; - } -} - enum mrtags { MRECREATE = 0x0045544145524345, MREADD = 0x0000000044444145, @@ -367,8 +370,6 @@ bool encl_measure(struct encl *encl) /* BE -> LE */ reverse_bytes(sigstruct->signature, SGX_MODULUS_SIZE); reverse_bytes(sigstruct->modulus, SGX_MODULUS_SIZE); - reverse_bytes(sigstruct->q1, SGX_MODULUS_SIZE); - reverse_bytes(sigstruct->q2, SGX_MODULUS_SIZE);
EVP_MD_CTX_destroy(ctx); RSA_free(key);
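The 383-byte case described in the commit message can be reproduced with this scaled-down, stand-alone C sketch (4-byte buffers instead of 384-byte ones; the helper names are local to the sketch):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static void reverse_bytes(uint8_t *p, int len)
{
        for (int i = 0, j = len - 1; i < j; i++, j--) {
                uint8_t t = p[i]; p[i] = p[j]; p[j] = t;
        }
}

/* read the buffer as a little-endian number */
static uint32_t le_value(const uint8_t *p, int len)
{
        uint32_t v = 0;

        for (int i = len - 1; i >= 0; i--)
                v = (v << 8) | p[i];
        return v;
}

int main(void)
{
        const uint8_t q2_be[3] = { 0x12, 0x34, 0x56 };  /* Q2 = 0x123456, one byte short */
        uint8_t buf[4];

        memset(buf, 0, sizeof(buf));
        memcpy(buf, q2_be, 3);
        reverse_bytes(buf, sizeof(buf));                /* old: reverse the whole buffer */
        printf("whole buffer reversed:  0x%x\n", le_value(buf, 4)); /* 0x12345600 = 256 * Q2 */

        memset(buf, 0, sizeof(buf));
        memcpy(buf, q2_be, 3);
        reverse_bytes(buf, 3);                          /* fix: reverse only the written bytes */
        printf("written bytes reversed: 0x%x\n", le_value(buf, 4)); /* 0x123456 */
        return 0;
}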
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
[ Upstream commit 6b994c554ebc4c065427f510db333081cbd7228d ]
The previous Kconfig cleanup added simplifications but also introduced a new issue by moving a boolean to a tristate. This leads to randconfig problems.
This patch moves the select operations into the SOUNDWIRE_LINK_BASELINE option. The INTEL_SOUNDWIRE config remains a tristate for backwards compatibility with older configurations but is essentially an on/off switch.
Fixes: cf5807f5f814f ('ASoC: SOF: Intel: SoundWire: simplify Kconfig') Reported-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Rander Wang rander.wang@intel.com Reviewed-by: Bard Liao bard.liao@intel.com Tested-by: Arnd Bergmann arnd@arndb.de Link: https://lore.kernel.org/r/20210802151628.15291-1-pierre-louis.bossart@linux.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sof/intel/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/sof/intel/Kconfig b/sound/soc/sof/intel/Kconfig index 4bce89b5ea40..4447f515e8b1 100644 --- a/sound/soc/sof/intel/Kconfig +++ b/sound/soc/sof/intel/Kconfig @@ -278,6 +278,8 @@ config SND_SOC_SOF_HDA
config SND_SOC_SOF_INTEL_SOUNDWIRE_LINK_BASELINE tristate + select SOUNDWIRE_INTEL if SND_SOC_SOF_INTEL_SOUNDWIRE + select SND_INTEL_SOUNDWIRE_ACPI if SND_SOC_SOF_INTEL_SOUNDWIRE
config SND_SOC_SOF_INTEL_SOUNDWIRE tristate "SOF support for SoundWire" @@ -285,8 +287,6 @@ config SND_SOC_SOF_INTEL_SOUNDWIRE depends on SND_SOC_SOF_INTEL_SOUNDWIRE_LINK_BASELINE depends on ACPI && SOUNDWIRE depends on !(SOUNDWIRE=m && SND_SOC_SOF_INTEL_SOUNDWIRE_LINK_BASELINE=y) - select SOUNDWIRE_INTEL - select SND_INTEL_SOUNDWIRE_ACPI help This adds support for SoundWire with Sound Open Firmware for Intel(R) platforms.
From: Guennadi Liakhovetski guennadi.liakhovetski@linux.intel.com
[ Upstream commit 973b393fdf073a4ebd8d82ef6edea99fedc74af9 ]
Checking that two values don't have common bits makes no sense; strict equality is meant.
Fixes: f3b433e4699f ("ASoC: SOF: Implement Probe IPC API") Reviewed-by: Ranjani Sridharan ranjani.sridharan@linux.intel.com Signed-off-by: Guennadi Liakhovetski guennadi.liakhovetski@linux.intel.com Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://lore.kernel.org/r/20210802151749.15417-1-pierre-louis.bossart@linux.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sof/intel/hda-ipc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/sof/intel/hda-ipc.c b/sound/soc/sof/intel/hda-ipc.c index c91aa951df22..acfeca42604c 100644 --- a/sound/soc/sof/intel/hda-ipc.c +++ b/sound/soc/sof/intel/hda-ipc.c @@ -107,8 +107,8 @@ void hda_dsp_ipc_get_reply(struct snd_sof_dev *sdev) } else { /* reply correct size ? */ if (reply.hdr.size != msg->reply_size && - /* getter payload is never known upfront */ - !(reply.hdr.cmd & SOF_IPC_GLB_PROBE)) { + /* getter payload is never known upfront */ + ((reply.hdr.cmd & SOF_GLB_TYPE_MASK) != SOF_IPC_GLB_PROBE)) { dev_err(sdev->dev, "error: reply expected %zu got %u bytes\n", msg->reply_size, reply.hdr.size); ret = -EINVAL;
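A stand-alone C sketch of why the old bit test misfires; the GLB_* constants here are made-up illustrative encodings, not the real SOF IPC values:

#include <stdint.h>
#include <stdio.h>

#define GLB_TYPE_MASK   0xf0000000u
#define GLB_PROBE       0x90000000u     /* illustrative value only */
#define GLB_TRACE       0xa0000000u     /* illustrative value only */

int main(void)
{
        uint32_t cmd = GLB_TRACE | 0x123;       /* a reply that is not a probe */

        /* old test: "does cmd share any bit with GLB_PROBE?" -> wrongly yes */
        int old_is_probe = (cmd & GLB_PROBE) != 0;

        /* fixed test: compare the whole type field -> correctly no */
        int new_is_probe = (cmd & GLB_TYPE_MASK) == GLB_PROBE;

        printf("old: %d, fixed: %d\n", old_is_probe, new_is_probe);     /* old: 1, fixed: 0 */
        return 0;
}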
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit 30615bd21b4cc3c3bb5ae8bd70e2a915cc5f75c7 ]
The underlying register field has inverted sense (0 = enabled) so the control definition must be marked as inverted.
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 2c394ca79604 ("ASoC: Add support for CS42L42 codec") Link: https://lore.kernel.org/r/20210803160834.9005-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs42l42.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/codecs/cs42l42.c b/sound/soc/codecs/cs42l42.c index fe73a5c70bdd..c7fb33a89224 100644 --- a/sound/soc/codecs/cs42l42.c +++ b/sound/soc/codecs/cs42l42.c @@ -436,7 +436,7 @@ static SOC_ENUM_SINGLE_DECL(cs42l42_wnf05_freq_enum, CS42L42_ADC_WNF_HPF_CTL, static const struct snd_kcontrol_new cs42l42_snd_controls[] = { /* ADC Volume and Filter Controls */ SOC_SINGLE("ADC Notch Switch", CS42L42_ADC_CTL, - CS42L42_ADC_NOTCH_DIS_SHIFT, true, false), + CS42L42_ADC_NOTCH_DIS_SHIFT, true, true), SOC_SINGLE("ADC Weak Force Switch", CS42L42_ADC_CTL, CS42L42_ADC_FORCE_WEAK_VCM_SHIFT, true, false), SOC_SINGLE("ADC Invert Switch", CS42L42_ADC_CTL,
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit 8b353bbeae20e2214c9d9d88bcb2fda4ba145d83 ]
The driver was defining two ALSA controls that both change the same register field for the wind noise filter corner frequency. The filter response has two corners, at different frequencies, and the duplicate controls most likely were an attempt to be able to set the value using either of the frequencies.
However, having two controls changing the same field can be problematic and it is unnecessary. Both frequencies are related to each other so setting one implies exactly what the other would be.
Removing a control affects user-side code, but there is currently no known use of the removed control so it would be best to remove it now before it becomes a problem.
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 2c394ca79604 ("ASoC: Add support for CS42L42 codec") Link: https://lore.kernel.org/r/20210803160834.9005-2-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs42l42.c | 10 ---------- 1 file changed, 10 deletions(-)
diff --git a/sound/soc/codecs/cs42l42.c b/sound/soc/codecs/cs42l42.c index c7fb33a89224..22d8c8d03308 100644 --- a/sound/soc/codecs/cs42l42.c +++ b/sound/soc/codecs/cs42l42.c @@ -424,15 +424,6 @@ static SOC_ENUM_SINGLE_DECL(cs42l42_wnf3_freq_enum, CS42L42_ADC_WNF_HPF_CTL, CS42L42_ADC_WNF_CF_SHIFT, cs42l42_wnf3_freq_text);
-static const char * const cs42l42_wnf05_freq_text[] = { - "280Hz", "315Hz", "350Hz", "385Hz", - "420Hz", "455Hz", "490Hz", "525Hz" -}; - -static SOC_ENUM_SINGLE_DECL(cs42l42_wnf05_freq_enum, CS42L42_ADC_WNF_HPF_CTL, - CS42L42_ADC_WNF_CF_SHIFT, - cs42l42_wnf05_freq_text); - static const struct snd_kcontrol_new cs42l42_snd_controls[] = { /* ADC Volume and Filter Controls */ SOC_SINGLE("ADC Notch Switch", CS42L42_ADC_CTL, @@ -450,7 +441,6 @@ static const struct snd_kcontrol_new cs42l42_snd_controls[] = { CS42L42_ADC_HPF_EN_SHIFT, true, false), SOC_ENUM("HPF Corner Freq", cs42l42_hpf_freq_enum), SOC_ENUM("WNF 3dB Freq", cs42l42_wnf3_freq_enum), - SOC_ENUM("WNF 05dB Freq", cs42l42_wnf05_freq_enum),
/* DAC Volume and Filter Controls */ SOC_SINGLE("DACA Invert Switch", CS42L42_DAC_CTL1,
From: Yajun Deng yajun.deng@linux.dev
[ Upstream commit 38ea9def5b62f9193f6bad96c5d108e2830ecbde ]
kfree_skb_list() should be called when err is not equal to zero in nf_br_ip_fragment().
v2: keep this aligned with IPv6. v3: modify iter.frag_list to iter.frag.
Fixes: 3c171f496ef5 ("netfilter: bridge: add connection tracking system") Signed-off-by: Yajun Deng yajun.deng@linux.dev Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/netfilter/nf_conntrack_bridge.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c index 8d033a75a766..fdbed3158555 100644 --- a/net/bridge/netfilter/nf_conntrack_bridge.c +++ b/net/bridge/netfilter/nf_conntrack_bridge.c @@ -88,6 +88,12 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
skb = ip_fraglist_next(&iter); } + + if (!err) + return 0; + + kfree_skb_list(iter.frag); + return err; } slow_path:
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
[ Upstream commit 2f658f7a3953f6d70bab90e117aff8d0ad44e200 ]
The software mapping for GPIO, which initially comes from Microsoft, is subject to change by the respective Windows and firmware developers. Due to the above, the driver had been written and published way ahead of schedule, and thus the numbering schema used in it is outdated.
Fix the numbering schema in accordance with the real products on market.
Fixes: 653d96455e1e ("pinctrl: tigerlake: Add support for Tiger Lake-H") Reported-and-tested-by: Kai-Heng Feng kai.heng.feng@canonical.com Reported-by: Riccardo Mori patacca@autistici.org Reported-and-tested-by: Lovesh lovesh.bond@gmail.com BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=213463 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=213579 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=213857 Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Acked-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/intel/pinctrl-tigerlake.c | 26 +++++++++++------------ 1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/pinctrl/intel/pinctrl-tigerlake.c b/drivers/pinctrl/intel/pinctrl-tigerlake.c index 75b6d66955bf..3ddaeffc0415 100644 --- a/drivers/pinctrl/intel/pinctrl-tigerlake.c +++ b/drivers/pinctrl/intel/pinctrl-tigerlake.c @@ -701,32 +701,32 @@ static const struct pinctrl_pin_desc tglh_pins[] = {
static const struct intel_padgroup tglh_community0_gpps[] = { TGL_GPP(0, 0, 24, 0), /* GPP_A */ - TGL_GPP(1, 25, 44, 128), /* GPP_R */ - TGL_GPP(2, 45, 70, 32), /* GPP_B */ - TGL_GPP(3, 71, 78, INTEL_GPIO_BASE_NOMAP), /* vGPIO_0 */ + TGL_GPP(1, 25, 44, 32), /* GPP_R */ + TGL_GPP(2, 45, 70, 64), /* GPP_B */ + TGL_GPP(3, 71, 78, 96), /* vGPIO_0 */ };
static const struct intel_padgroup tglh_community1_gpps[] = { - TGL_GPP(0, 79, 104, 96), /* GPP_D */ - TGL_GPP(1, 105, 128, 64), /* GPP_C */ - TGL_GPP(2, 129, 136, 160), /* GPP_S */ - TGL_GPP(3, 137, 153, 192), /* GPP_G */ - TGL_GPP(4, 154, 180, 224), /* vGPIO */ + TGL_GPP(0, 79, 104, 128), /* GPP_D */ + TGL_GPP(1, 105, 128, 160), /* GPP_C */ + TGL_GPP(2, 129, 136, 192), /* GPP_S */ + TGL_GPP(3, 137, 153, 224), /* GPP_G */ + TGL_GPP(4, 154, 180, 256), /* vGPIO */ };
static const struct intel_padgroup tglh_community3_gpps[] = { - TGL_GPP(0, 181, 193, 256), /* GPP_E */ - TGL_GPP(1, 194, 217, 288), /* GPP_F */ + TGL_GPP(0, 181, 193, 288), /* GPP_E */ + TGL_GPP(1, 194, 217, 320), /* GPP_F */ };
static const struct intel_padgroup tglh_community4_gpps[] = { - TGL_GPP(0, 218, 241, 320), /* GPP_H */ + TGL_GPP(0, 218, 241, 352), /* GPP_H */ TGL_GPP(1, 242, 251, 384), /* GPP_J */ - TGL_GPP(2, 252, 266, 352), /* GPP_K */ + TGL_GPP(2, 252, 266, 416), /* GPP_K */ };
static const struct intel_padgroup tglh_community5_gpps[] = { - TGL_GPP(0, 267, 281, 416), /* GPP_I */ + TGL_GPP(0, 267, 281, 448), /* GPP_I */ TGL_GPP(1, 282, 290, INTEL_GPIO_BASE_NOMAP), /* JTAG */ };
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit f1040e86f83b0f7d5f45724500a6a441731ff4b7 ]
Both SCLK and PLL clocks must be running to drive the glitch-free mux behind MCLK_SRC_SEL and complete the switchover.
This patch moves the writing of MCLK_SRC_SEL to when the PLL is started and stopped, so that it only transitions while the PLL is running. The unconditional write MCLK_SRC_SEL=0 in cs42l42_mute_stream() is safe because if the PLL is not running MCLK_SRC_SEL is already 0.
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 43fc357199f9 ("ASoC: cs42l42: Set clock source for both ways of stream") Link: https://lore.kernel.org/r/20210805161111.10410-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs42l42.c | 25 ++++++++++++++++++------- sound/soc/codecs/cs42l42.h | 1 + 2 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/sound/soc/codecs/cs42l42.c b/sound/soc/codecs/cs42l42.c index 22d8c8d03308..7b102a05a1b6 100644 --- a/sound/soc/codecs/cs42l42.c +++ b/sound/soc/codecs/cs42l42.c @@ -609,6 +609,8 @@ static int cs42l42_pll_config(struct snd_soc_component *component)
for (i = 0; i < ARRAY_SIZE(pll_ratio_table); i++) { if (pll_ratio_table[i].sclk == clk) { + cs42l42->pll_config = i; + /* Configure the internal sample rate */ snd_soc_component_update_bits(component, CS42L42_MCLK_CTL, CS42L42_INTERNAL_FS_MASK, @@ -617,14 +619,9 @@ static int cs42l42_pll_config(struct snd_soc_component *component) (pll_ratio_table[i].mclk_int != 24000000)) << CS42L42_INTERNAL_FS_SHIFT); - /* Set the MCLK src (PLL or SCLK) and the divide - * ratio - */ + snd_soc_component_update_bits(component, CS42L42_MCLK_SRC_SEL, - CS42L42_MCLK_SRC_SEL_MASK | CS42L42_MCLKDIV_MASK, - (pll_ratio_table[i].mclk_src_sel - << CS42L42_MCLK_SRC_SEL_SHIFT) | (pll_ratio_table[i].mclk_div << CS42L42_MCLKDIV_SHIFT)); /* Set up the LRCLK */ @@ -882,13 +879,21 @@ static int cs42l42_mute_stream(struct snd_soc_dai *dai, int mute, int stream) */ regmap_multi_reg_write(cs42l42->regmap, cs42l42_to_osc_seq, ARRAY_SIZE(cs42l42_to_osc_seq)); + + /* Must disconnect PLL before stopping it */ + snd_soc_component_update_bits(component, + CS42L42_MCLK_SRC_SEL, + CS42L42_MCLK_SRC_SEL_MASK, + 0); + usleep_range(100, 200); + snd_soc_component_update_bits(component, CS42L42_PLL_CTL1, CS42L42_PLL_START_MASK, 0); } } else { if (!cs42l42->stream_use) { /* SCLK must be running before codec unmute */ - if ((cs42l42->bclk < 11289600) && (cs42l42->sclk < 11289600)) { + if (pll_ratio_table[cs42l42->pll_config].mclk_src_sel) { snd_soc_component_update_bits(component, CS42L42_PLL_CTL1, CS42L42_PLL_START_MASK, 1);
@@ -909,6 +914,12 @@ static int cs42l42_mute_stream(struct snd_soc_dai *dai, int mute, int stream) CS42L42_PLL_LOCK_TIMEOUT_US); if (ret < 0) dev_warn(component->dev, "PLL failed to lock: %d\n", ret); + + /* PLL must be running to drive glitchless switch logic */ + snd_soc_component_update_bits(component, + CS42L42_MCLK_SRC_SEL, + CS42L42_MCLK_SRC_SEL_MASK, + CS42L42_MCLK_SRC_SEL_MASK); }
/* Mark SCLK as present, turn off internal oscillator */ diff --git a/sound/soc/codecs/cs42l42.h b/sound/soc/codecs/cs42l42.h index 5384105afe50..38fd91a168ae 100644 --- a/sound/soc/codecs/cs42l42.h +++ b/sound/soc/codecs/cs42l42.h @@ -775,6 +775,7 @@ struct cs42l42_private { struct gpio_desc *reset_gpio; struct completion pdn_done; struct snd_soc_jack jack; + int pll_config; int bclk; u32 sclk; u32 srate;
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit 0c2f2ad4f16a58879463d0979a54293f8f296d6f ]
An I2S frame starts on the falling edge of LRCLK so ASP_STP must be 0.
At the same time, move other format settings in the same register from cs42l42_pll_config() to cs42l42_set_dai_fmt() where you'd expect to find them, and merge into a single write.
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 2c394ca79604 ("ASoC: Add support for CS42L42 codec") Link: https://lore.kernel.org/r/20210805161111.10410-2-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs42l42.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/sound/soc/codecs/cs42l42.c b/sound/soc/codecs/cs42l42.c index 7b102a05a1b6..0c8cdfe78d96 100644 --- a/sound/soc/codecs/cs42l42.c +++ b/sound/soc/codecs/cs42l42.c @@ -657,15 +657,6 @@ static int cs42l42_pll_config(struct snd_soc_component *component) CS42L42_FSYNC_PULSE_WIDTH_MASK, CS42L42_FRAC1_VAL(fsync - 1) << CS42L42_FSYNC_PULSE_WIDTH_SHIFT); - snd_soc_component_update_bits(component, - CS42L42_ASP_FRM_CFG, - CS42L42_ASP_5050_MASK, - CS42L42_ASP_5050_MASK); - /* Set the frame delay to 1.0 SCLK clocks */ - snd_soc_component_update_bits(component, CS42L42_ASP_FRM_CFG, - CS42L42_ASP_FSD_MASK, - CS42L42_ASP_FSD_1_0 << - CS42L42_ASP_FSD_SHIFT); /* Set the sample rates (96k or lower) */ snd_soc_component_update_bits(component, CS42L42_FS_RATE_EN, CS42L42_FS_EN_MASK, @@ -765,6 +756,18 @@ static int cs42l42_set_dai_fmt(struct snd_soc_dai *codec_dai, unsigned int fmt) /* interface format */ switch (fmt & SND_SOC_DAIFMT_FORMAT_MASK) { case SND_SOC_DAIFMT_I2S: + /* + * 5050 mode, frame starts on falling edge of LRCLK, + * frame delayed by 1.0 SCLKs + */ + snd_soc_component_update_bits(component, + CS42L42_ASP_FRM_CFG, + CS42L42_ASP_STP_MASK | + CS42L42_ASP_5050_MASK | + CS42L42_ASP_FSD_MASK, + CS42L42_ASP_5050_MASK | + (CS42L42_ASP_FSD_1_0 << + CS42L42_ASP_FSD_SHIFT)); break; default: return -EINVAL;
From: Richard Fitzgerald rf@opensource.cirrus.com
[ Upstream commit e5ada3f6787a4d6234adc6f2f3ae35c6d5b71ba0 ]
I2S always has two LRCLK phases and both CH1 and CH2 of the RX must be enabled (corresponding to the low and high phases of LRCLK.) The selection of the valid data channels is done by setting the DAC CHA_SEL and CHB_SEL. CHA_SEL is always the first (left) channel, CHB_SEL depends on the number of active channels.
Previously, for mono, ASP CH2 was not enabled; as a result, playing mono data did not produce any audio output.
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com Fixes: 621d65f3b868 ("ASoC: cs42l42: Provide finer control on playback path") Link: https://lore.kernel.org/r/20210805161111.10410-4-rf@opensource.cirrus.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs42l42.c | 15 +++++++++++++-- sound/soc/codecs/cs42l42.h | 2 ++ 2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/sound/soc/codecs/cs42l42.c b/sound/soc/codecs/cs42l42.c index 0c8cdfe78d96..e0a524f8e16c 100644 --- a/sound/soc/codecs/cs42l42.c +++ b/sound/soc/codecs/cs42l42.c @@ -459,8 +459,8 @@ static const struct snd_soc_dapm_widget cs42l42_dapm_widgets[] = { SND_SOC_DAPM_OUTPUT("HP"), SND_SOC_DAPM_DAC("DAC", NULL, CS42L42_PWR_CTL1, CS42L42_HP_PDN_SHIFT, 1), SND_SOC_DAPM_MIXER("MIXER", CS42L42_PWR_CTL1, CS42L42_MIXER_PDN_SHIFT, 1, NULL, 0), - SND_SOC_DAPM_AIF_IN("SDIN1", NULL, 0, CS42L42_ASP_RX_DAI0_EN, CS42L42_ASP_RX0_CH1_SHIFT, 0), - SND_SOC_DAPM_AIF_IN("SDIN2", NULL, 1, CS42L42_ASP_RX_DAI0_EN, CS42L42_ASP_RX0_CH2_SHIFT, 0), + SND_SOC_DAPM_AIF_IN("SDIN1", NULL, 0, SND_SOC_NOPM, 0, 0), + SND_SOC_DAPM_AIF_IN("SDIN2", NULL, 1, SND_SOC_NOPM, 0, 0),
/* Playback Requirements */ SND_SOC_DAPM_SUPPLY("ASP DAI0", CS42L42_PWR_CTL1, CS42L42_ASP_DAI_PDN_SHIFT, 1, NULL, 0), @@ -837,6 +837,17 @@ static int cs42l42_pcm_hw_params(struct snd_pcm_substream *substream, snd_soc_component_update_bits(component, CS42L42_ASP_RX_DAI0_CH2_AP_RES, CS42L42_ASP_RX_CH_AP_MASK | CS42L42_ASP_RX_CH_RES_MASK, val); + + /* Channel B comes from the last active channel */ + snd_soc_component_update_bits(component, CS42L42_SP_RX_CH_SEL, + CS42L42_SP_RX_CHB_SEL_MASK, + (channels - 1) << CS42L42_SP_RX_CHB_SEL_SHIFT); + + /* Both LRCLK slots must be enabled */ + snd_soc_component_update_bits(component, CS42L42_ASP_RX_DAI0_EN, + CS42L42_ASP_RX0_CH_EN_MASK, + BIT(CS42L42_ASP_RX0_CH1_SHIFT) | + BIT(CS42L42_ASP_RX0_CH2_SHIFT)); break; default: break; diff --git a/sound/soc/codecs/cs42l42.h b/sound/soc/codecs/cs42l42.h index 38fd91a168ae..10cf2e4c8ead 100644 --- a/sound/soc/codecs/cs42l42.h +++ b/sound/soc/codecs/cs42l42.h @@ -653,6 +653,8 @@
/* Page 0x25 Audio Port Registers */ #define CS42L42_SP_RX_CH_SEL (CS42L42_PAGE_25 + 0x01) +#define CS42L42_SP_RX_CHB_SEL_SHIFT 2 +#define CS42L42_SP_RX_CHB_SEL_MASK (3 << CS42L42_SP_RX_CHB_SEL_SHIFT)
#define CS42L42_SP_RX_ISOC_CTL (CS42L42_PAGE_25 + 0x02) #define CS42L42_SP_RX_RSYNC_SHIFT 6
From: DENG Qingfang dqfext@gmail.com
[ Upstream commit aff51c5da3208bd164381e1488998667269c6cf4 ]
Add the missing RxUnicast counter.
Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: DENG Qingfang dqfext@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mt7530.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index 9b90f3d3a8f5..167c599a81a5 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -46,6 +46,7 @@ static const struct mt7530_mib_desc mt7530_mib[] = { MIB_DESC(2, 0x48, "TxBytes"), MIB_DESC(1, 0x60, "RxDrop"), MIB_DESC(1, 0x64, "RxFiltering"), + MIB_DESC(1, 0x68, "RxUnicast"), MIB_DESC(1, 0x6c, "RxMulticast"), MIB_DESC(1, 0x70, "RxBroadcast"), MIB_DESC(1, 0x74, "RxAlignErr"),
From: John Hubbard jhubbard@nvidia.com
[ Upstream commit 704e624f7b3e8a4fc1ce43fb564746d1d07b20c0 ]
On s390, the following build warning occurs:
drivers/net/ethernet/marvell/mvpp2/mvpp2.h:844:2: warning: overflow in conversion from 'long unsigned int' to 'int' changes value from '18446744073709551584' to '-32' [-Woverflow] 844 | ((total_size) - MVPP2_SKB_HEADROOM - MVPP2_SKB_SHINFO_SIZE)
This happens because MVPP2_SKB_SHINFO_SIZE, which is 320 bytes (already 64-byte aligned) on some architectures, actually gets ALIGN'd up to 512 bytes in the s390 case.
So then, when this is invoked:
MVPP2_RX_MAX_PKT_SIZE(MVPP2_BM_SHORT_FRAME_SIZE)
...that turns into:
704 - 224 - 512 == -32
...which is not a good frame size to end up with! The warning above is a bit lucky: it notices a signed/unsigned bad behavior here, which leads to the real problem of a frame that is too short for its contents.
Increase MVPP2_BM_SHORT_FRAME_SIZE by 32 (from 704 to 736), which is just exactly big enough. (The other values can't readily be changed without causing a lot of other problems.)
Fixes: 07dd0a7aae7f ("mvpp2: add basic XDP support") Cc: Sven Auhagen sven.auhagen@voleatech.de Cc: Matteo Croce mcroce@microsoft.com Cc: David S. Miller davem@davemloft.net Signed-off-by: John Hubbard jhubbard@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h index 4a61c90003b5..722209a14f53 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h @@ -938,7 +938,7 @@ enum mvpp22_ptp_packet_format { #define MVPP2_BM_COOKIE_POOL_OFFS 8 #define MVPP2_BM_COOKIE_CPU_OFFS 24
-#define MVPP2_BM_SHORT_FRAME_SIZE 704 /* frame size 128 */ +#define MVPP2_BM_SHORT_FRAME_SIZE 736 /* frame size 128 */ #define MVPP2_BM_LONG_FRAME_SIZE 2240 /* frame size 1664 */ #define MVPP2_BM_JUMBO_FRAME_SIZE 10432 /* frame size 9856 */ /* BM short pool packet size
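The size arithmetic from the commit message, as a stand-alone C sketch; the 320-byte shared-info size and the 224-byte headroom are taken from the message, and the cache-line sizes are illustrative assumptions:

#include <stdio.h>

#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

static int max_pkt_size(int total, int headroom, int cacheline)
{
        int shinfo = ALIGN(320, cacheline);     /* skb_shared_info, aligned up */

        return total - headroom - shinfo;
}

int main(void)
{
        /* 64-byte cache lines, old 704-byte short frame: fine */
        printf("%d\n", max_pkt_size(704, 224, 64));     /* 160 */
        /* 256-byte cache lines (s390), old 704-byte short frame: negative */
        printf("%d\n", max_pkt_size(704, 224, 256));    /* -32 */
        /* 256-byte cache lines, new 736-byte short frame: no longer negative */
        printf("%d\n", max_pkt_size(736, 224, 256));    /* 0 */
        return 0;
}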
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 9d7b132e62e41b7d49bf157aeaf9147c27492e0f ]
The gpiod_lookup_table.table passed to gpiod_add_lookup_table() must be terminated with an empty entry, add this.
Note we have likely been getting away with this not being present because the GPIO lookup code first matches on the dev_id, causing most lookups to skip checking the table and the lookups which do check the table will find a matching entry before reaching the end. With that said, terminating these tables properly still is obviously the correct thing to do.
Fixes: f8eb0235f659 ("x86: pcengines apuv2 gpio/leds/keys platform driver") Signed-off-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20210806115515.12184-1-hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/pcengines-apuv2.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/platform/x86/pcengines-apuv2.c b/drivers/platform/x86/pcengines-apuv2.c index c37349f97bb8..d063d91db9bc 100644 --- a/drivers/platform/x86/pcengines-apuv2.c +++ b/drivers/platform/x86/pcengines-apuv2.c @@ -94,6 +94,7 @@ static struct gpiod_lookup_table gpios_led_table = { NULL, 1, GPIO_ACTIVE_LOW), GPIO_LOOKUP_IDX(AMD_FCH_GPIO_DRIVER_NAME, APU2_GPIO_LINE_LED3, NULL, 2, GPIO_ACTIVE_LOW), + {} /* Terminating entry */ } };
@@ -123,6 +124,7 @@ static struct gpiod_lookup_table gpios_key_table = { .table = { GPIO_LOOKUP_IDX(AMD_FCH_GPIO_DRIVER_NAME, APU2_GPIO_LINE_MODESW, NULL, 0, GPIO_ACTIVE_LOW), + {} /* Terminating entry */ } };
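A minimal stand-alone sketch of why the empty entry matters (this is not the gpiolib code, just the sentinel-terminated-table pattern it relies on):

#include <stdio.h>

struct lookup {
        const char *chip;
        unsigned int idx;
};

static const struct lookup table[] = {
        { "gpio-chip0", 0 },
        { "gpio-chip0", 1 },
        { }     /* terminating entry: the walk below stops here */
};

int main(void)
{
        /* without the all-zero entry this loop would run off the end
         * of the array into whatever memory happens to follow it */
        for (const struct lookup *l = table; l->chip; l++)
                printf("%s line %u\n", l->chip, l->idx);
        return 0;
}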
From: Kan Liang kan.liang@linux.intel.com
[ Upstream commit acade6379930dfa7987f4bd9b26d1a701cc1b542 ]
A warning as below may be occasionally triggered in an ADL machine when these conditions occur:
- Two perf record commands run one by one. Both record a PEBS event. - Both runs on small cores. - They have different adaptive PEBS configuration (PEBS_DATA_CFG).
[ ] WARNING: CPU: 4 PID: 9874 at arch/x86/events/intel/ds.c:1743 setup_pebs_adaptive_sample_data+0x55e/0x5b0 [ ] RIP: 0010:setup_pebs_adaptive_sample_data+0x55e/0x5b0 [ ] Call Trace: [ ] <NMI> [ ] intel_pmu_drain_pebs_icl+0x48b/0x810 [ ] perf_event_nmi_handler+0x41/0x80 [ ] </NMI> [ ] __perf_event_task_sched_in+0x2c2/0x3a0
Unlike the big core, the small core requires the ACK right before re-enabling counters in the NMI handler; otherwise a stale PEBS record may be dumped into the later NMI handler, which triggers the warning.
Add a new mid_ack flag to track the case. Add all PMI handler bits in the struct x86_hybrid_pmu to track the bits for different types of PMUs. Apply mid ACK for the small cores on an Alder Lake machine.
The existing hybrid() macro has a compile error when taking the address of a bit-field variable. Add a new macro hybrid_bit() to get the bit-field value of a given PMU.
Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support") Reported-by: Ammy Yi ammy.yi@intel.com Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Andi Kleen ak@linux.intel.com Tested-by: Ammy Yi ammy.yi@intel.com Link: https://lkml.kernel.org/r/1627997128-57891-1-git-send-email-kan.liang@linux.... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/events/intel/core.c | 23 +++++++++++++++-------- arch/x86/events/perf_event.h | 15 +++++++++++++++ 2 files changed, 30 insertions(+), 8 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index d76be3bba11e..511d1f9a9bf8 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -2904,24 +2904,28 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status) */ static int intel_pmu_handle_irq(struct pt_regs *regs) { - struct cpu_hw_events *cpuc; + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + bool late_ack = hybrid_bit(cpuc->pmu, late_ack); + bool mid_ack = hybrid_bit(cpuc->pmu, mid_ack); int loops; u64 status; int handled; int pmu_enabled;
- cpuc = this_cpu_ptr(&cpu_hw_events); - /* * Save the PMU state. * It needs to be restored when leaving the handler. */ pmu_enabled = cpuc->enabled; /* - * No known reason to not always do late ACK, - * but just in case do it opt-in. + * In general, the early ACK is only applied for old platforms. + * For the big core starts from Haswell, the late ACK should be + * applied. + * For the small core after Tremont, we have to do the ACK right + * before re-enabling counters, which is in the middle of the + * NMI handler. */ - if (!x86_pmu.late_ack) + if (!late_ack && !mid_ack) apic_write(APIC_LVTPC, APIC_DM_NMI); intel_bts_disable_local(); cpuc->enabled = 0; @@ -2958,6 +2962,8 @@ again: goto again;
done: + if (mid_ack) + apic_write(APIC_LVTPC, APIC_DM_NMI); /* Only restore PMU state when it's active. See x86_pmu_disable(). */ cpuc->enabled = pmu_enabled; if (pmu_enabled) @@ -2969,7 +2975,7 @@ done: * have been reset. This avoids spurious NMIs on * Haswell CPUs. */ - if (x86_pmu.late_ack) + if (late_ack) apic_write(APIC_LVTPC, APIC_DM_NMI); return handled; } @@ -6123,7 +6129,6 @@ __init int intel_pmu_init(void) static_branch_enable(&perf_is_hybrid); x86_pmu.num_hybrid_pmus = X86_HYBRID_NUM_PMUS;
- x86_pmu.late_ack = true; x86_pmu.pebs_aliases = NULL; x86_pmu.pebs_prec_dist = true; x86_pmu.pebs_block = true; @@ -6161,6 +6166,7 @@ __init int intel_pmu_init(void) pmu = &x86_pmu.hybrid_pmu[X86_HYBRID_PMU_CORE_IDX]; pmu->name = "cpu_core"; pmu->cpu_type = hybrid_big; + pmu->late_ack = true; if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) { pmu->num_counters = x86_pmu.num_counters + 2; pmu->num_counters_fixed = x86_pmu.num_counters_fixed + 1; @@ -6186,6 +6192,7 @@ __init int intel_pmu_init(void) pmu = &x86_pmu.hybrid_pmu[X86_HYBRID_PMU_ATOM_IDX]; pmu->name = "cpu_atom"; pmu->cpu_type = hybrid_small; + pmu->mid_ack = true; pmu->num_counters = x86_pmu.num_counters; pmu->num_counters_fixed = x86_pmu.num_counters_fixed; pmu->max_pebs_events = x86_pmu.max_pebs_events; diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 2938c902ffbe..e3ac05c97b5e 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -656,6 +656,10 @@ struct x86_hybrid_pmu { struct event_constraint *event_constraints; struct event_constraint *pebs_constraints; struct extra_reg *extra_regs; + + unsigned int late_ack :1, + mid_ack :1, + enabled_ack :1; };
static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu) @@ -686,6 +690,16 @@ extern struct static_key_false perf_is_hybrid; __Fp; \ }))
+#define hybrid_bit(_pmu, _field) \ +({ \ + bool __Fp = x86_pmu._field; \ + \ + if (is_hybrid() && (_pmu)) \ + __Fp = hybrid_pmu(_pmu)->_field; \ + \ + __Fp; \ +}) + enum hybrid_pmu_type { hybrid_big = 0x40, hybrid_small = 0x20, @@ -755,6 +769,7 @@ struct x86_pmu {
/* PMI handler bits */ unsigned int late_ack :1, + mid_ack :1, enabled_ack :1; /* * sysfs attrs
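A stand-alone skeleton of the three ACK placements described above, using simplified stand-in helpers; it only illustrates the ordering chosen by the late_ack/mid_ack flags, not the real handler:

#include <stdbool.h>
#include <stdio.h>

static void apic_ack(void)          { puts("ack LVTPC"); }
static void disable_counters(void)  { puts("disable counters"); }
static void drain_and_handle(void)  { puts("handle PMI/PEBS status"); }
static void enable_counters(void)   { puts("enable counters"); }

static void pmi_handler(bool late_ack, bool mid_ack)
{
        if (!late_ack && !mid_ack)
                apic_ack();             /* early ack: old platforms */

        disable_counters();
        drain_and_handle();

        if (mid_ack)
                apic_ack();             /* small cores: ack before re-enable */

        enable_counters();

        if (late_ack)
                apic_ack();             /* big cores since Haswell */
}

int main(void)
{
        puts("-- cpu_atom (mid ack) --");
        pmi_handler(false, true);
        puts("-- cpu_core (late ack) --");
        pmi_handler(true, false);
        return 0;
}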
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 5126da7d99cf6396c929f3b577ba3aed1e74acd7 ]
'watermarks_table' must be freed instead of 'clocks_table', because 'clocks_table' is known to be NULL at this point and 'watermarks_table' is never freed if the last kzalloc fails.
Fixes: c98ee89736b8 ("drm/amd/pm: add the fine grain tuning function for vangogh") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c index 77f532a49e37..bacef9120b8d 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c @@ -242,7 +242,7 @@ static int vangogh_tables_init(struct smu_context *smu) return 0;
err3_out: - kfree(smu_table->clocks_table); + kfree(smu_table->watermarks_table); err2_out: kfree(smu_table->gpu_metrics_table); err1_out:
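A generic stand-alone sketch of the unwind-ordering rule this fix restores (not the driver code; the struct layout and sizes are illustrative): each error label frees only what was successfully allocated before the failure point, in reverse order.

#include <stdlib.h>

struct tables {
        void *gpu_metrics;
        void *watermarks;
        void *clocks;
};

static int tables_init(struct tables *t)
{
        t->gpu_metrics = calloc(1, 64);
        if (!t->gpu_metrics)
                goto err1_out;

        t->watermarks = calloc(1, 64);
        if (!t->watermarks)
                goto err2_out;

        t->clocks = calloc(1, 64);
        if (!t->clocks)
                goto err3_out;  /* clocks is NULL here... */

        return 0;

err3_out:
        free(t->watermarks);    /* ...so this must free watermarks, not clocks */
err2_out:
        free(t->gpu_metrics);
err1_out:
        return -1;
}

int main(void)
{
        struct tables t = { 0 };

        return tables_init(&t) ? 1 : 0;
}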
From: Robin Gögge r.goegge@googlemail.com
[ Upstream commit 78d14bda861dd2729f15bb438fe355b48514bfe0 ]
This patch fixes the probe for BPF_PROG_TYPE_CGROUP_SOCKOPT, so the probe reports accurate results when used by e.g. bpftool.
Fixes: 4cdbfb59c44a ("libbpf: support sockopt hooks") Signed-off-by: Robin Gögge r.goegge@gmail.com Signed-off-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Quentin Monnet quentin@isovalent.com Link: https://lore.kernel.org/bpf/20210728225825.2357586-1-r.goegge@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/bpf/libbpf_probes.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c index ecaae2927ab8..cd8c703dde71 100644 --- a/tools/lib/bpf/libbpf_probes.c +++ b/tools/lib/bpf/libbpf_probes.c @@ -75,6 +75,9 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns, case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: xattr.expected_attach_type = BPF_CGROUP_INET4_CONNECT; break; + case BPF_PROG_TYPE_CGROUP_SOCKOPT: + xattr.expected_attach_type = BPF_CGROUP_GETSOCKOPT; + break; case BPF_PROG_TYPE_SK_LOOKUP: xattr.expected_attach_type = BPF_SK_LOOKUP; break; @@ -104,7 +107,6 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns, case BPF_PROG_TYPE_SK_REUSEPORT: case BPF_PROG_TYPE_FLOW_DISSECTOR: case BPF_PROG_TYPE_CGROUP_SYSCTL: - case BPF_PROG_TYPE_CGROUP_SOCKOPT: case BPF_PROG_TYPE_TRACING: case BPF_PROG_TYPE_STRUCT_OPS: case BPF_PROG_TYPE_EXT:
From: Daniel Xu dxu@dxuuu.xyz
[ Upstream commit c34c338a40e4f3b6f80889cd17fd9281784d1c32 ]
Before this patch, btf_new() was liable to close an arbitrary FD 0 if BTF parsing failed. This was because:
* btf->fd was initialized to 0 through the calloc() * btf__free() (in the `done` label) closed any FDs >= 0 * btf->fd is left at 0 if parsing fails
This issue was discovered on a system using libbpf v0.3 (without BTF_KIND_FLOAT support) but with a kernel that had BTF_KIND_FLOAT types in BTF. Thus, parsing fails.
While this patch technically doesn't fix any issues because upstream libbpf has BTF_KIND_FLOAT support, it'll help prevent issues in the future if more BTF types are added. It also allows the fix to be backported to older libbpf versions.
Fixes: 3289959b97ca ("libbpf: Support BTF loading and raw data output in both endianness") Signed-off-by: Daniel Xu dxu@dxuuu.xyz Signed-off-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Yonghong Song yhs@fb.com Link: https://lore.kernel.org/bpf/5969bb991adedb03c6ae93e051fd2a00d293cf25.1627513... Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/bpf/btf.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index d57e13a13798..1d9e5b35524c 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -805,6 +805,7 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf) btf->nr_types = 0; btf->start_id = 1; btf->start_str_off = 0; + btf->fd = -1;
if (base_btf) { btf->base_btf = base_btf; @@ -833,8 +834,6 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf) if (err) goto done;
- btf->fd = -1; - done: if (err) { btf__free(btf);
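A minimal stand-alone sketch of the hazard described above, using a made-up struct rather than the libbpf one: a zero-initialized struct leaves fd == 0, so an early error path that closes "valid" descriptors closes whatever the process has on descriptor 0.

#include <stdlib.h>
#include <unistd.h>

struct obj {
        void *data;
        int fd;
};

static void obj_free(struct obj *o)
{
        if (!o)
                return;
        if (o->fd >= 0)         /* with fd == 0 this would close stdin */
                close(o->fd);
        free(o->data);
        free(o);
}

static struct obj *obj_new(int fail_early)
{
        struct obj *o = calloc(1, sizeof(*o));

        if (!o)
                return NULL;
        o->fd = -1;             /* the fix: mark "no fd yet" before any error path */

        if (fail_early)
                goto err;       /* parsing failed before a real fd was obtained */

        return o;
err:
        obj_free(o);
        return NULL;
}

int main(void)
{
        return obj_new(1) ? 1 : 0;
}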
From: Tatsuhiko Yasumatsu th.yasumatsu@gmail.com
[ Upstream commit c4eb1f403243fc7bbb7de644db8587c03de36da6 ]
In __htab_map_lookup_and_delete_batch(), hash buckets are iterated over to count the number of elements in each bucket (bucket_size). If bucket_size is large enough, the multiplication to calculate the kvmalloc() size could overflow, resulting in an out-of-bounds write as reported by KASAN:
[...] [ 104.986052] BUG: KASAN: vmalloc-out-of-bounds in __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.986489] Write of size 4194224 at addr ffffc9010503be70 by task crash/112 [ 104.986889] [ 104.987193] CPU: 0 PID: 112 Comm: crash Not tainted 5.14.0-rc4 #13 [ 104.987552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 104.988104] Call Trace: [ 104.988410] dump_stack_lvl+0x34/0x44 [ 104.988706] print_address_description.constprop.0+0x21/0x140 [ 104.988991] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989327] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.989622] kasan_report.cold+0x7f/0x11b [ 104.989881] ? __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990239] kasan_check_range+0x17c/0x1e0 [ 104.990467] memcpy+0x39/0x60 [ 104.990670] __htab_map_lookup_and_delete_batch+0x5ce/0xb60 [ 104.990982] ? __wake_up_common+0x4d/0x230 [ 104.991256] ? htab_of_map_free+0x130/0x130 [ 104.991541] bpf_map_do_batch+0x1fb/0x220 [...]
In hashtable, if the elements' keys have the same jhash() value, the elements will be put into the same bucket. By putting a lot of elements into a single bucket, the value of bucket_size can be increased to trigger the integer overflow.
Triggering the overflow is possible for both callers with CAP_SYS_ADMIN and callers without CAP_SYS_ADMIN.
It will be trivial for a caller with CAP_SYS_ADMIN to intentionally reach this overflow by enabling BPF_F_ZERO_SEED. As this flag will set the random seed passed to jhash() to 0, it will be easy for the caller to prepare keys which will be hashed into the same value, and thus put all the elements into the same bucket.
If the caller does not have CAP_SYS_ADMIN, BPF_F_ZERO_SEED cannot be used. However, it will be still technically possible to trigger the overflow, by guessing the random seed value passed to jhash() (32bit) and repeating the attempt to trigger the overflow. In this case, the probability to trigger the overflow will be low and will take a very long time.
Fix the integer overflow by calling kvmalloc_array() instead of kvmalloc() to allocate memory.
Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map") Signed-off-by: Tatsuhiko Yasumatsu th.yasumatsu@gmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20210806150419.109658-1-th.yasumatsu@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/hashtab.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index d7ebb12ffffc..49857e8cd6ce 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -1464,8 +1464,8 @@ alloc: /* We cannot do copy_from_user or copy_to_user inside * the rcu_read_lock. Allocate enough space here. */ - keys = kvmalloc(key_size * bucket_size, GFP_USER | __GFP_NOWARN); - values = kvmalloc(value_size * bucket_size, GFP_USER | __GFP_NOWARN); + keys = kvmalloc_array(key_size, bucket_size, GFP_USER | __GFP_NOWARN); + values = kvmalloc_array(value_size, bucket_size, GFP_USER | __GFP_NOWARN); if (!keys || !values) { ret = -ENOMEM; goto after_loop;
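A stand-alone C sketch of the overflow that kvmalloc_array() guards against; the key and bucket sizes are made-up illustration values, not taken from the hashtab code:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint32_t key_size = 1U << 20;           /* 1 MiB per key, illustrative   */
        uint32_t bucket_size = 1U << 13;        /* 8192 elements, illustrative   */
        size_t checked;

        /* kvmalloc(key_size * bucket_size, ...): the 32-bit product wraps,
         * so far less memory is allocated than is later written into */
        printf("wrapped u32 size: %u\n", key_size * bucket_size);       /* 0 */

        /* kvmalloc_array() computes the size with an overflow check,
         * similar in spirit to this */
        if (__builtin_mul_overflow(key_size, (size_t)bucket_size, &checked))
                printf("overflow detected, allocation refused\n");
        else
                printf("checked size: %zu bytes\n", checked);           /* 8589934592 */

        return 0;
}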
From: Oleksij Rempel o.rempel@pengutronix.de
[ Upstream commit 47fac45600aafc5939d9620055c3c46f7135d316 ]
Make sure that all external ports are actually isolated from each other, so no packets are leaked.
Fixes: ec6698c272de ("net: dsa: add support for Atheros AR9331 built-in switch") Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/qca/ar9331.c | 73 +++++++++++++++++++++++++++++++++++- 1 file changed, 72 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/qca/ar9331.c b/drivers/net/dsa/qca/ar9331.c index 6686192e1883..563d8a279030 100644 --- a/drivers/net/dsa/qca/ar9331.c +++ b/drivers/net/dsa/qca/ar9331.c @@ -101,6 +101,23 @@ AR9331_SW_PORT_STATUS_RX_FLOW_EN | AR9331_SW_PORT_STATUS_TX_FLOW_EN | \ AR9331_SW_PORT_STATUS_SPEED_M)
+#define AR9331_SW_REG_PORT_CTRL(_port) (0x104 + (_port) * 0x100) +#define AR9331_SW_PORT_CTRL_HEAD_EN BIT(11) +#define AR9331_SW_PORT_CTRL_PORT_STATE GENMASK(2, 0) +#define AR9331_SW_PORT_CTRL_PORT_STATE_DISABLED 0 +#define AR9331_SW_PORT_CTRL_PORT_STATE_BLOCKING 1 +#define AR9331_SW_PORT_CTRL_PORT_STATE_LISTENING 2 +#define AR9331_SW_PORT_CTRL_PORT_STATE_LEARNING 3 +#define AR9331_SW_PORT_CTRL_PORT_STATE_FORWARD 4 + +#define AR9331_SW_REG_PORT_VLAN(_port) (0x108 + (_port) * 0x100) +#define AR9331_SW_PORT_VLAN_8021Q_MODE GENMASK(31, 30) +#define AR9331_SW_8021Q_MODE_SECURE 3 +#define AR9331_SW_8021Q_MODE_CHECK 2 +#define AR9331_SW_8021Q_MODE_FALLBACK 1 +#define AR9331_SW_8021Q_MODE_NONE 0 +#define AR9331_SW_PORT_VLAN_PORT_VID_MEMBER GENMASK(25, 16) + /* MIB registers */ #define AR9331_MIB_COUNTER(x) (0x20000 + ((x) * 0x100))
@@ -371,12 +388,60 @@ static int ar9331_sw_mbus_init(struct ar9331_sw_priv *priv) return 0; }
-static int ar9331_sw_setup(struct dsa_switch *ds) +static int ar9331_sw_setup_port(struct dsa_switch *ds, int port) { struct ar9331_sw_priv *priv = (struct ar9331_sw_priv *)ds->priv; struct regmap *regmap = priv->regmap; + u32 port_mask, port_ctrl, val; int ret;
+ /* Generate default port settings */ + port_ctrl = FIELD_PREP(AR9331_SW_PORT_CTRL_PORT_STATE, + AR9331_SW_PORT_CTRL_PORT_STATE_FORWARD); + + if (dsa_is_cpu_port(ds, port)) { + /* CPU port should be allowed to communicate with all user + * ports. + */ + port_mask = dsa_user_ports(ds); + /* Enable Atheros header on CPU port. This will allow us + * communicate with each port separately + */ + port_ctrl |= AR9331_SW_PORT_CTRL_HEAD_EN; + } else if (dsa_is_user_port(ds, port)) { + /* User ports should communicate only with the CPU port. + */ + port_mask = BIT(dsa_upstream_port(ds, port)); + } else { + /* Other ports do not need to communicate at all */ + port_mask = 0; + } + + val = FIELD_PREP(AR9331_SW_PORT_VLAN_8021Q_MODE, + AR9331_SW_8021Q_MODE_NONE) | + FIELD_PREP(AR9331_SW_PORT_VLAN_PORT_VID_MEMBER, port_mask); + + ret = regmap_write(regmap, AR9331_SW_REG_PORT_VLAN(port), val); + if (ret) + goto error; + + ret = regmap_write(regmap, AR9331_SW_REG_PORT_CTRL(port), port_ctrl); + if (ret) + goto error; + + return 0; +error: + dev_err(priv->dev, "%s: error: %i\n", __func__, ret); + + return ret; +} + +static int ar9331_sw_setup(struct dsa_switch *ds) +{ + struct ar9331_sw_priv *priv = (struct ar9331_sw_priv *)ds->priv; + struct regmap *regmap = priv->regmap; + int ret, i; + ret = ar9331_sw_reset(priv); if (ret) return ret; @@ -402,6 +467,12 @@ static int ar9331_sw_setup(struct dsa_switch *ds) if (ret) goto error;
+ for (i = 0; i < ds->num_ports; i++) { + ret = ar9331_sw_setup_port(ds, i); + if (ret) + goto error; + } + ds->configure_vlan_while_not_filtering = false;
return 0;
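A toy stand-alone sketch of the isolation masks the new ar9331_sw_setup_port() programs, assuming port 0 is the CPU port and ports 1-4 are user ports (the real layout comes from the device tree):

#include <stdio.h>

#define BIT(n) (1u << (n))

int main(void)
{
        unsigned int cpu_port = 0, num_ports = 5;
        unsigned int user_ports = BIT(1) | BIT(2) | BIT(3) | BIT(4);

        for (unsigned int p = 0; p < num_ports; p++) {
                unsigned int mask;

                if (p == cpu_port)
                        mask = user_ports;      /* CPU talks to every user port */
                else
                        mask = BIT(cpu_port);   /* user ports talk only to the CPU */
                printf("port %u member mask 0x%02x\n", p, mask);
        }
        return 0;
}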
From: Ben Hutchings ben.hutchings@mind.be
[ Upstream commit 2383cb9497d113360137a2be308b390faa80632d ]
Commit a5e63c7d38d5 "net: phy: micrel: Fix detection of ksz87xx switch" broke link detection on the external ports of the KSZ8795.
The previously unused phy_driver structure for these devices specifies config_aneg and read_status functions that appear to be designed for a fixed link and do not work with the embedded PHYs in the KSZ8795.
Delete the use of these functions in favour of the generic PHY implementations which were used previously.
Fixes: a5e63c7d38d5 ("net: phy: micrel: Fix detection of ksz87xx switch") Signed-off-by: Ben Hutchings ben.hutchings@mind.be Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/micrel.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index 7afd9edaf249..22ca29cc9ad7 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -1406,8 +1406,6 @@ static struct phy_driver ksphy_driver[] = { .name = "Micrel KSZ87XX Switch", /* PHY_BASIC_FEATURES */ .config_init = kszphy_config_init, - .config_aneg = ksz8873mll_config_aneg, - .read_status = ksz8873mll_read_status, .match_phy_device = ksz8795_match_phy_device, .suspend = genphy_suspend, .resume = genphy_resume,
From: Pali Rohár pali@kernel.org
[ Upstream commit 2459dcb96bcba94c08d6861f8a050185ff301672 ]
IFLA_IFNAME is a nul-terminated string, which means that the IFLA_IFNAME buffer can be larger than the length of the string it contains.
The function __rtnl_newlink() generates its own new ifname if either IFLA_IFNAME was not specified at all or userspace passed an empty nul-terminated string.
It is expected that if userspace does not specify an ifname for a new ppp netdev, then the kernel generates one in the format "ppp<id>", where id matches the ppp unit id which can later be obtained via the PPPIOCGUNIT ioctl.
And it works this way if IFLA_IFNAME is not specified at all. But it does not work when IFLA_IFNAME is specified with an empty string.
So fix this logic also for an empty IFLA_IFNAME in the ppp_nl_newlink() function and correctly generate the ifname based on the ppp unit identifier if userspace did not provide a preferred ifname.
Without this patch, when IFLA_IFNAME was specified with an empty string, the kernel created a new ppp interface in the format "ppp<id>", but id did not match the ppp unit id returned by the PPPIOCGUNIT ioctl. In this case id was some number generated by the __rtnl_newlink() function.
Signed-off-by: Pali Rohár pali@kernel.org Fixes: bb8082f69138 ("ppp: build ifname using unit identifier for rtnl based devices") Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ppp/ppp_generic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c index b9dd47bd597f..7a099c37527f 100644 --- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -1317,7 +1317,7 @@ static int ppp_nl_newlink(struct net *src_net, struct net_device *dev, * the PPP unit identifer as suffix (i.e. ppp<unit_id>). This allows * userspace to infer the device name using to the PPPIOCGUNIT ioctl. */ - if (!tb[IFLA_IFNAME]) + if (!tb[IFLA_IFNAME] || !nla_len(tb[IFLA_IFNAME]) || !*(char *)nla_data(tb[IFLA_IFNAME])) conf.ifname_is_set = false;
err = ppp_dev_configure(src_net, dev, &conf);
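A minimal sketch of the condition the one-line fix above encodes; ifname_provided() is a hypothetical helper for illustration only, not part of the patch:

	/* Hypothetical helper: the three cases that count as "no ifname given" */
	static bool ifname_provided(const struct nlattr *attr)
	{
		if (!attr)				/* IFLA_IFNAME absent */
			return false;
		if (!nla_len(attr))			/* attribute has an empty payload */
			return false;
		return *(const char *)nla_data(attr);	/* payload is not an empty string */
	}

With such a helper the patch amounts to setting conf.ifname_is_set = false whenever !ifname_provided(tb[IFLA_IFNAME]).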
From: Nadav Amit namit@vmware.com
[ Upstream commit ef98eb0409c31c39ab55ff46b2721c3b4f84c122 ]
When using SQPOLL, the submission queue polling thread calls task_work_run() to run queued work. However, when work is added with TWA_SIGNAL - as done by io_uring itself - the TIF_NOTIFY_SIGNAL remains set afterwards and is never cleared.
Consequently, when the submission queue polling thread checks signal_pending(), it may always find a pending signal if task_work_add() was ever called before.
The impact of this bug might be different on different kernel versions. It appears that on 5.14 it would only cause unnecessary calculation and prevent the polling thread from sleeping. On 5.13, where the bug was found, it stops the polling thread from finding newly submitted work.
Instead of task_work_run(), use tracehook_notify_signal() that clears TIF_NOTIFY_SIGNAL. Test for TIF_NOTIFY_SIGNAL in addition to current->task_works to avoid a race in which task_works is cleared but the TIF_NOTIFY_SIGNAL is set.
Fixes: 685fe7feedb96 ("io-wq: eliminate the need for a manager thread") Cc: Jens Axboe axboe@kernel.dk Cc: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Nadav Amit namit@vmware.com Link: https://lore.kernel.org/r/20210808001342.964634-2-namit@vmware.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 32f3df13a812..8a8507cab580 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -78,6 +78,7 @@ #include <linux/task_work.h> #include <linux/pagemap.h> #include <linux/io_uring.h> +#include <linux/tracehook.h>
#define CREATE_TRACE_POINTS #include <trace/events/io_uring.h> @@ -2250,9 +2251,9 @@ static inline unsigned int io_put_rw_kbuf(struct io_kiocb *req)
static inline bool io_run_task_work(void) { - if (current->task_works) { + if (test_thread_flag(TIF_NOTIFY_SIGNAL) || current->task_works) { __set_current_state(TASK_RUNNING); - task_work_run(); + tracehook_notify_signal(); return true; }
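A simplified, hypothetical sketch of why the stale flag matters for the SQPOLL thread; this is not the real io_sq_thread() body, only the shape of the problem, and sqpoll_wait_for_work() is an invented placeholder:

	for (;;) {
		if (io_run_task_work())		/* must also clear TIF_NOTIFY_SIGNAL */
			continue;
		if (signal_pending(current))	/* stale flag => always looks pending */
			continue;		/* the sleep path is never reached */
		sqpoll_wait_for_work();		/* hypothetical helper */
	}

With the fix, io_run_task_work() goes through tracehook_notify_signal(), which clears TIF_NOTIFY_SIGNAL before the next signal_pending() check.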
From: Karsten Graul kgraul@linux.ibm.com
[ Upstream commit 8f3d65c166797746455553f4eaf74a5f89f996d4 ]
There can be a race between the waiters for a tx work request buffer and the link down processing that finally clears the link. Although all waiters are woken up before the link is cleared there might be waiters which did not yet get back control and are still waiting. This results in an access to a cleared wait queue head.
Fix this by introducing atomic reference counting around the wait calls, and wait with the link clear processing until all waiters have finished. Move the work request layer related calls into smc_wr.c and set the link state to INACTIVE before calling smcr_link_clear() in smc_llc_srv_add_link().
Fixes: 15e1b99aadfb ("net/smc: no WR buffer wait for terminating link group") Signed-off-by: Karsten Graul kgraul@linux.ibm.com Signed-off-by: Guvenc Gulce guvenc@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/smc_core.h | 2 ++ net/smc/smc_llc.c | 10 ++++------ net/smc/smc_tx.c | 18 +++++++++++++++++- net/smc/smc_wr.c | 10 ++++++++++ 4 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h index 6d6fd1397c87..64d86298e4df 100644 --- a/net/smc/smc_core.h +++ b/net/smc/smc_core.h @@ -97,6 +97,7 @@ struct smc_link { unsigned long *wr_tx_mask; /* bit mask of used indexes */ u32 wr_tx_cnt; /* number of WR send buffers */ wait_queue_head_t wr_tx_wait; /* wait for free WR send buf */ + atomic_t wr_tx_refcnt; /* tx refs to link */
struct smc_wr_buf *wr_rx_bufs; /* WR recv payload buffers */ struct ib_recv_wr *wr_rx_ibs; /* WR recv meta data */ @@ -109,6 +110,7 @@ struct smc_link {
struct ib_reg_wr wr_reg; /* WR register memory region */ wait_queue_head_t wr_reg_wait; /* wait for wr_reg result */ + atomic_t wr_reg_refcnt; /* reg refs to link */ enum smc_wr_reg_state wr_reg_state; /* state of wr_reg request */
u8 gid[SMC_GID_SIZE];/* gid matching used vlan id*/ diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c index 273eaf1bfe49..2e7560eba981 100644 --- a/net/smc/smc_llc.c +++ b/net/smc/smc_llc.c @@ -888,6 +888,7 @@ int smc_llc_cli_add_link(struct smc_link *link, struct smc_llc_qentry *qentry) if (!rc) goto out; out_clear_lnk: + lnk_new->state = SMC_LNK_INACTIVE; smcr_link_clear(lnk_new, false); out_reject: smc_llc_cli_add_link_reject(qentry); @@ -1184,6 +1185,7 @@ int smc_llc_srv_add_link(struct smc_link *link) goto out_err; return 0; out_err: + link_new->state = SMC_LNK_INACTIVE; smcr_link_clear(link_new, false); return rc; } @@ -1286,10 +1288,8 @@ static void smc_llc_process_cli_delete_link(struct smc_link_group *lgr) del_llc->reason = 0; smc_llc_send_message(lnk, &qentry->msg); /* response */
- if (smc_link_downing(&lnk_del->state)) { - if (smc_switch_conns(lgr, lnk_del, false)) - smc_wr_tx_wait_no_pending_sends(lnk_del); - } + if (smc_link_downing(&lnk_del->state)) + smc_switch_conns(lgr, lnk_del, false); smcr_link_clear(lnk_del, true);
active_links = smc_llc_active_link_count(lgr); @@ -1805,8 +1805,6 @@ void smc_llc_link_clear(struct smc_link *link, bool log) link->smcibdev->ibdev->name, link->ibport); complete(&link->llc_testlink_resp); cancel_delayed_work_sync(&link->llc_testlink_wrk); - smc_wr_wakeup_reg_wait(link); - smc_wr_wakeup_tx_wait(link); }
/* register a new rtoken at the remote peer (for all links) */ diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c index 4532c16bf85e..ff02952b3d03 100644 --- a/net/smc/smc_tx.c +++ b/net/smc/smc_tx.c @@ -479,7 +479,7 @@ static int smc_tx_rdma_writes(struct smc_connection *conn, /* Wakeup sndbuf consumers from any context (IRQ or process) * since there is more data to transmit; usable snd_wnd as max transmit */ -static int smcr_tx_sndbuf_nonempty(struct smc_connection *conn) +static int _smcr_tx_sndbuf_nonempty(struct smc_connection *conn) { struct smc_cdc_producer_flags *pflags = &conn->local_tx_ctrl.prod_flags; struct smc_link *link = conn->lnk; @@ -533,6 +533,22 @@ out_unlock: return rc; }
+static int smcr_tx_sndbuf_nonempty(struct smc_connection *conn) +{ + struct smc_link *link = conn->lnk; + int rc = -ENOLINK; + + if (!link) + return rc; + + atomic_inc(&link->wr_tx_refcnt); + if (smc_link_usable(link)) + rc = _smcr_tx_sndbuf_nonempty(conn); + if (atomic_dec_and_test(&link->wr_tx_refcnt)) + wake_up_all(&link->wr_tx_wait); + return rc; +} + static int smcd_tx_sndbuf_nonempty(struct smc_connection *conn) { struct smc_cdc_producer_flags *pflags = &conn->local_tx_ctrl.prod_flags; diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c index cbc73a7e4d59..a419e9af36b9 100644 --- a/net/smc/smc_wr.c +++ b/net/smc/smc_wr.c @@ -322,9 +322,12 @@ int smc_wr_reg_send(struct smc_link *link, struct ib_mr *mr) if (rc) return rc;
+ atomic_inc(&link->wr_reg_refcnt); rc = wait_event_interruptible_timeout(link->wr_reg_wait, (link->wr_reg_state != POSTED), SMC_WR_REG_MR_WAIT_TIME); + if (atomic_dec_and_test(&link->wr_reg_refcnt)) + wake_up_all(&link->wr_reg_wait); if (!rc) { /* timeout - terminate link */ smcr_link_down_cond_sched(link); @@ -566,10 +569,15 @@ void smc_wr_free_link(struct smc_link *lnk) return; ibdev = lnk->smcibdev->ibdev;
+ smc_wr_wakeup_reg_wait(lnk); + smc_wr_wakeup_tx_wait(lnk); + if (smc_wr_tx_wait_no_pending_sends(lnk)) memset(lnk->wr_tx_mask, 0, BITS_TO_LONGS(SMC_WR_BUF_CNT) * sizeof(*lnk->wr_tx_mask)); + wait_event(lnk->wr_reg_wait, (!atomic_read(&lnk->wr_reg_refcnt))); + wait_event(lnk->wr_tx_wait, (!atomic_read(&lnk->wr_tx_refcnt)));
if (lnk->wr_rx_dma_addr) { ib_dma_unmap_single(ibdev, lnk->wr_rx_dma_addr, @@ -728,7 +736,9 @@ int smc_wr_create_link(struct smc_link *lnk) memset(lnk->wr_tx_mask, 0, BITS_TO_LONGS(SMC_WR_BUF_CNT) * sizeof(*lnk->wr_tx_mask)); init_waitqueue_head(&lnk->wr_tx_wait); + atomic_set(&lnk->wr_tx_refcnt, 0); init_waitqueue_head(&lnk->wr_reg_wait); + atomic_set(&lnk->wr_reg_refcnt, 0); return rc;
dma_unmap:
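Reduced to its essentials, the pattern the patch introduces looks like this (a sketch distilled from the hunks above, not new driver code): users of the link take a reference for the duration of their access, and teardown first wakes any sleepers and then waits for the count to drop to zero before the wait queue heads go away.

	/* user side (wr_tx_refcnt; wr_reg_refcnt follows the same shape) */
	atomic_inc(&link->wr_tx_refcnt);
	if (smc_link_usable(link))
		rc = _smcr_tx_sndbuf_nonempty(conn);
	if (atomic_dec_and_test(&link->wr_tx_refcnt))
		wake_up_all(&link->wr_tx_wait);

	/* teardown side in smc_wr_free_link() */
	smc_wr_wakeup_tx_wait(lnk);
	wait_event(lnk->wr_tx_wait, !atomic_read(&lnk->wr_tx_refcnt));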
From: Guvenc Gulce guvenc@linux.ibm.com
[ Upstream commit 64513d269e8971aabb7e787955a1b320e3031306 ]
SMC clients may be assigned to a different link after the initial connection between two peers was established. In such a case, the connection counter was not correctly set.
Update the connection counter correctly when an SMC client connection is assigned to a different SMC link.
Fixes: 07d51580ff65 ("net/smc: Add connection counters for links") Signed-off-by: Guvenc Gulce guvenc@linux.ibm.com Tested-by: Karsten Graul kgraul@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/af_smc.c | 2 +- net/smc/smc_core.c | 4 ++-- net/smc/smc_core.h | 2 ++ 3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 5eff7cccceff..66fbdc63f965 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -757,7 +757,7 @@ static int smc_connect_rdma(struct smc_sock *smc, reason_code = SMC_CLC_DECL_NOSRVLINK; goto connect_abort; } - smc->conn.lnk = link; + smc_switch_link_and_count(&smc->conn, link); }
/* create send buffer and rmb */ diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 0df85a12651e..39b24f98eac5 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -916,8 +916,8 @@ static int smc_switch_cursor(struct smc_sock *smc, struct smc_cdc_tx_pend *pend, return rc; }
-static void smc_switch_link_and_count(struct smc_connection *conn, - struct smc_link *to_lnk) +void smc_switch_link_and_count(struct smc_connection *conn, + struct smc_link *to_lnk) { atomic_dec(&conn->lnk->conn_cnt); conn->lnk = to_lnk; diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h index 64d86298e4df..c043ecdca5c4 100644 --- a/net/smc/smc_core.h +++ b/net/smc/smc_core.h @@ -446,6 +446,8 @@ void smc_core_exit(void); int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk, u8 link_idx, struct smc_init_info *ini); void smcr_link_clear(struct smc_link *lnk, bool log); +void smc_switch_link_and_count(struct smc_connection *conn, + struct smc_link *to_lnk); int smcr_buf_map_lgr(struct smc_link *lnk); int smcr_buf_reg_lgr(struct smc_link *lnk); void smcr_lgr_set_type(struct smc_link_group *lgr, enum smc_lgr_type new_type);
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit d09c548dbf3b31cb07bba562e0f452edfa01efe3 ]
When mirroring or redirecting a skb to a different port, the ct info should be reset for reclassification; otherwise the packets will match unexpected rules. For example, with the following topology and commands:
-----------
| veth0 -+-------
| veth1 -+-------
|
------------
tc qdisc add dev veth0 clsact
# The same with "action mirred egress mirror dev veth1" or "action mirred ingress redirect dev veth1"
tc filter add dev veth0 egress chain 1 protocol ip flower ct_state +trk action mirred ingress mirror dev veth1
tc filter add dev veth0 egress chain 0 protocol ip flower ct_state -inv action ct commit action goto chain 1
tc qdisc add dev veth1 clsact
tc filter add dev veth1 ingress chain 0 protocol ip flower ct_state +trk action drop
ping <remote ip via veth0> &
tc -s filter show dev veth1 ingress

With the command 'tc -s filter show', we can see that the packets were dropped on veth1.
Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Roi Dayan roid@nvidia.com Signed-off-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/act_mirred.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 7153c67f641e..2ef4cd2c848b 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -273,6 +273,9 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a, goto out; }
+ /* All mirred/redirected skbs should clear previous ct info */ + nf_reset_ct(skb2); + want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
expects_nh = want_ingress || !m_mac_header_xmit;
From: Anirudh Venkataramanan anirudh.venkataramanan@intel.com
[ Upstream commit 50ac7479846053ca8054be833c1594e64de496bb ]
The userspace utility "driverctl" can be used to change/override the system's default driver choices. This is useful in some situations (buggy driver, old driver missing a device ID, trying a workaround, etc.) where the user needs to load a different driver.
However, this is also prone to user error, where a driver is mapped to a device it's not designed to drive. For example, if the ice driver is mapped to iavf devices, the ice driver crashes.
Add a check to return an error if the ice driver is being used to probe a virtual function.
Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series") Signed-off-by: Anirudh Venkataramanan anirudh.venkataramanan@intel.com Tested-by: Gurucharan G gurucharanx.g@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_main.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 0eb2307325d3..6a72a3b93037 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -4014,6 +4014,11 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent) struct ice_hw *hw; int i, err;
+ if (pdev->is_virtfn) { + dev_err(dev, "can't probe a virtual function\n"); + return -EINVAL; + } + /* this driver uses devres, see * Documentation/driver-api/driver-model/devres.rst */
From: Anirudh Venkataramanan anirudh.venkataramanan@intel.com
[ Upstream commit c503e63200c679e362afca7aca9d3dc63a0f45ed ]
When VFs are setup and torn down in quick succession, it is possible that a VF is torn down by the PF while the VF's virtchnl requests are still in the PF's mailbox ring. Processing the VF's virtchnl request when the VF itself doesn't exist results in undefined behavior. Fix this by adding a check to stop processing virtchnl requests when VF teardown is in progress.
Fixes: ddf30f7ff840 ("ice: Add handler to configure SR-IOV") Signed-off-by: Anirudh Venkataramanan anirudh.venkataramanan@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice.h | 1 + drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 7 +++++++ 2 files changed, 8 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 2924c67567b8..13ffa3f6a521 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -226,6 +226,7 @@ enum ice_pf_state { ICE_VFLR_EVENT_PENDING, ICE_FLTR_OVERFLOW_PROMISC, ICE_VF_DIS, + ICE_VF_DEINIT_IN_PROGRESS, ICE_CFG_BUSY, ICE_SERVICE_SCHED, ICE_SERVICE_DIS, diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c index 97a46c616aca..671902d9fc35 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c @@ -615,6 +615,8 @@ void ice_free_vfs(struct ice_pf *pf) struct ice_hw *hw = &pf->hw; unsigned int tmp, i;
+ set_bit(ICE_VF_DEINIT_IN_PROGRESS, pf->state); + if (!pf->vf) return;
@@ -680,6 +682,7 @@ void ice_free_vfs(struct ice_pf *pf) i);
clear_bit(ICE_VF_DIS, pf->state); + clear_bit(ICE_VF_DEINIT_IN_PROGRESS, pf->state); clear_bit(ICE_FLAG_SRIOV_ENA, pf->flags); }
@@ -4292,6 +4295,10 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event) struct device *dev; int err = 0;
+ /* if de-init is underway, don't process messages from VF */ + if (test_bit(ICE_VF_DEINIT_IN_PROGRESS, pf->state)) + return; + dev = ice_pf_to_dev(pf); if (ice_validate_vf_id(pf, vf_id)) { err = -EINVAL;
From: Brett Creeley brett.creeley@intel.com
[ Upstream commit 3ba7f53f8bf1fb862e36c7f74434ac3aceb60158 ]
In some circumstances, such as with bridging, it's possible that the stack will add the device's own MAC address to its unicast address list.
If, later, the stack deletes this address, the driver will receive a request to remove this address.
The driver stores its current MAC address as part of the VSI MAC filter list instead of separately. So, this causes a problem when the device's MAC address is deleted unexpectedly, which results in traffic failure in some cases.
The following configuration steps will reproduce the previously mentioned problem:
ip link set eth0 up
ip link add dev br0 type bridge
ip link set br0 up
ip addr flush dev eth0
ip link set eth0 master br0
echo 1 > /sys/class/net/br0/bridge/vlan_filtering
modprobe -r veth
modprobe -r bridge
ip addr add 192.168.1.100/24 dev eth0
The following ping command fails due to the netdev->dev_addr being deleted when removing the bridge module.
ping <link partner>
Fix this by making sure to not delete the netdev->dev_addr during MAC address sync. After fixing this issue it was noticed that the netdev_warn() in .set_mac was overly verbose, so make it a netdev_dbg().
Also, there is a possibility of a race condition between .set_mac and .set_rx_mode. Fix this by calling netif_addr_lock_bh() and netif_addr_unlock_bh() on the device's netdev when the netdev->dev_addr is going to be updated in .set_mac.
Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version") Signed-off-by: Brett Creeley brett.creeley@intel.com Tested-by: Liang Li liali@redhat.com Tested-by: Gurucharan G gurucharanx.g@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_main.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 6a72a3b93037..a7f2f5c490e3 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -183,6 +183,14 @@ static int ice_add_mac_to_unsync_list(struct net_device *netdev, const u8 *addr) struct ice_netdev_priv *np = netdev_priv(netdev); struct ice_vsi *vsi = np->vsi;
+ /* Under some circumstances, we might receive a request to delete our + * own device address from our uc list. Because we store the device + * address in the VSI's MAC filter list, we need to ignore such + * requests and not delete our device address from this list. + */ + if (ether_addr_equal(addr, netdev->dev_addr)) + return 0; + if (ice_fltr_add_mac_to_list(vsi, &vsi->tmp_unsync_list, addr, ICE_FWD_TO_VSI)) return -EINVAL; @@ -4913,7 +4921,7 @@ static int ice_set_mac_address(struct net_device *netdev, void *pi) return -EADDRNOTAVAIL;
if (ether_addr_equal(netdev->dev_addr, mac)) { - netdev_warn(netdev, "already using mac %pM\n", mac); + netdev_dbg(netdev, "already using mac %pM\n", mac); return 0; }
@@ -4924,6 +4932,7 @@ static int ice_set_mac_address(struct net_device *netdev, void *pi) return -EBUSY; }
+ netif_addr_lock_bh(netdev); /* Clean up old MAC filter. Not an error if old filter doesn't exist */ status = ice_fltr_remove_mac(vsi, netdev->dev_addr, ICE_FWD_TO_VSI); if (status && status != ICE_ERR_DOES_NOT_EXIST) { @@ -4933,30 +4942,28 @@ static int ice_set_mac_address(struct net_device *netdev, void *pi)
/* Add filter for new MAC. If filter exists, return success */ status = ice_fltr_add_mac(vsi, mac, ICE_FWD_TO_VSI); - if (status == ICE_ERR_ALREADY_EXISTS) { + if (status == ICE_ERR_ALREADY_EXISTS) /* Although this MAC filter is already present in hardware it's * possible in some cases (e.g. bonding) that dev_addr was * modified outside of the driver and needs to be restored back * to this value. */ - memcpy(netdev->dev_addr, mac, netdev->addr_len); netdev_dbg(netdev, "filter for MAC %pM already exists\n", mac); - return 0; - } - - /* error if the new filter addition failed */ - if (status) + else if (status) + /* error if the new filter addition failed */ err = -EADDRNOTAVAIL;
err_update_filters: if (err) { netdev_err(netdev, "can't set MAC %pM. filter update failed\n", mac); + netif_addr_unlock_bh(netdev); return err; }
/* change the netdev's MAC address */ memcpy(netdev->dev_addr, mac, netdev->addr_len); + netif_addr_unlock_bh(netdev); netdev_dbg(vsi->netdev, "updated MAC address to %pM\n", netdev->dev_addr);
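The shape of the fix, condensed from the hunks above (filter add/remove details elided): the device address is only compared or rewritten while the address lock is held, and our own address is never queued for deletion during sync.

	/* in the unsync-list builder: never schedule our own dev_addr for removal */
	if (ether_addr_equal(addr, netdev->dev_addr))
		return 0;

	/* in .set_mac: serialize against .set_rx_mode walking the address lists */
	netif_addr_lock_bh(netdev);
	/* ... remove the old MAC filter, add a filter for the new MAC ... */
	memcpy(netdev->dev_addr, mac, netdev->addr_len);
	netif_addr_unlock_bh(netdev);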
From: Md Fahad Iqbal Polash md.fahad.iqbal.polash@intel.com
[ Upstream commit a7550f8b1c9712894f9e98d6caf5f49451ebd058 ]
The iavf driver should set the RSS LUT and key unconditionally in the reset path. Currently, the driver does not do that; this patch fixes the issue.
Fixes: 2c86ac3c7079 ("i40evf: create a generic config RSS function") Signed-off-by: Md Fahad Iqbal Polash md.fahad.iqbal.polash@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/iavf/iavf_main.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 44bafedd09f2..244ec74ceca7 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -1506,11 +1506,6 @@ static int iavf_reinit_interrupt_scheme(struct iavf_adapter *adapter) set_bit(__IAVF_VSI_DOWN, adapter->vsi.state);
iavf_map_rings_to_vectors(adapter); - - if (RSS_AQ(adapter)) - adapter->aq_required |= IAVF_FLAG_AQ_CONFIGURE_RSS; - else - err = iavf_init_rss(adapter); err: return err; } @@ -2200,6 +2195,14 @@ continue_reset: goto reset_err; }
+ if (RSS_AQ(adapter)) { + adapter->aq_required |= IAVF_FLAG_AQ_CONFIGURE_RSS; + } else { + err = iavf_init_rss(adapter); + if (err) + goto reset_err; + } + adapter->aq_required |= IAVF_FLAG_AQ_GET_CONFIG; adapter->aq_required |= IAVF_FLAG_AQ_MAP_VECTORS;
From: Roi Dayan roid@nvidia.com
[ Upstream commit beb7f2de5728b0bd2140a652fa51f6ad85d159f7 ]
Without this there is a warning if a source file includes psample.h before skbuff.h or doesn't include skbuff.h at all.
Fixes: 6ae0a6286171 ("net: Introduce psample, a new genetlink channel for packet sampling") Signed-off-by: Roi Dayan roid@nvidia.com Link: https://lore.kernel.org/r/20210808065242.1522535-1-roid@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/psample.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/net/psample.h b/include/net/psample.h index e328c5127757..0509d2d6be67 100644 --- a/include/net/psample.h +++ b/include/net/psample.h @@ -31,6 +31,8 @@ struct psample_group *psample_group_get(struct net *net, u32 group_num); void psample_group_take(struct psample_group *group); void psample_group_put(struct psample_group *group);
+struct sk_buff; + #if IS_ENABLED(CONFIG_PSAMPLE)
void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
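The fix is the standard forward-declaration idiom: declaring the struct tag is enough for a header that only uses pointers to the type, so psample.h no longer cares whether skbuff.h was included first. A minimal illustration (hypothetical header and function, not the actual psample prototypes):

	/* example.h */
	struct sk_buff;			/* forward declaration, no skbuff.h needed */

	void example_handle_skb(struct sk_buff *skb);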
From: Guillaume Nault gnault@redhat.com
[ Upstream commit 143a8526ab5fd4f8a0c4fe2a9cb28c181dc5a95f ]
Data beyond the UDP header might not be part of the skb's linear data. Use skb_copy_bits() instead of direct access to skb->data+X, so that we read the correct bytes even on a fragmented skb.
Fixes: 4b5f67232d95 ("net: Special handling for IP & MPLS.") Signed-off-by: Guillaume Nault gnault@redhat.com Link: https://lore.kernel.org/r/7741c46545c6ef02e70c80a9b32814b22d9616b3.162826497... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bareudp.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c index edfad93e7b68..22e26458a86e 100644 --- a/drivers/net/bareudp.c +++ b/drivers/net/bareudp.c @@ -71,12 +71,18 @@ static int bareudp_udp_encap_recv(struct sock *sk, struct sk_buff *skb) family = AF_INET6;
if (bareudp->ethertype == htons(ETH_P_IP)) { - struct iphdr *iphdr; + __u8 ipversion;
- iphdr = (struct iphdr *)(skb->data + BAREUDP_BASE_HLEN); - if (iphdr->version == 4) { - proto = bareudp->ethertype; - } else if (bareudp->multi_proto_mode && (iphdr->version == 6)) { + if (skb_copy_bits(skb, BAREUDP_BASE_HLEN, &ipversion, + sizeof(ipversion))) { + bareudp->dev->stats.rx_dropped++; + goto drop; + } + ipversion >>= 4; + + if (ipversion == 4) { + proto = htons(ETH_P_IP); + } else if (ipversion == 6 && bareudp->multi_proto_mode) { proto = htons(ETH_P_IPV6); } else { bareudp->dev->stats.rx_dropped++;
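skb_copy_bits() reads from the full skb, paged fragments included, so it works even when the byte of interest is not in the linear head. The access pattern used by the fix, shown in isolation:

	__u8 ipversion;

	/* returns non-zero if the offset lies beyond the end of the packet */
	if (skb_copy_bits(skb, BAREUDP_BASE_HLEN, &ipversion, sizeof(ipversion)))
		goto drop;
	ipversion >>= 4;		/* the IP version sits in the top nibble */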
From: Hao Xu haoxu@linux.alibaba.com
[ Upstream commit 49e7f0c789add1330b111af0b7caeb0e87df063e ]
The earlier patch that added a check between nr_workers and max_workers has a bug which causes io-workers to be created unconditionally. That's because the result of the check doesn't affect the call to create_io_worker(); fix it by bringing in a boolean value for it.
Fixes: 21698274da5b ("io-wq: fix lack of acct->nr_workers < acct->max_workers judgement") Signed-off-by: Hao Xu haoxu@linux.alibaba.com Link: https://lore.kernel.org/r/20210808135434.68667-2-haoxu@linux.alibaba.com [axboe: drop hunk that isn't strictly needed] Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io-wq.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/io-wq.c b/fs/io-wq.c index 77026d42cb79..2c8a9a394884 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -283,16 +283,24 @@ static void create_worker_cb(struct callback_head *cb) struct io_wq *wq; struct io_wqe *wqe; struct io_wqe_acct *acct; + bool do_create = false;
cwd = container_of(cb, struct create_worker_data, work); wqe = cwd->wqe; wq = wqe->wq; acct = &wqe->acct[cwd->index]; raw_spin_lock_irq(&wqe->lock); - if (acct->nr_workers < acct->max_workers) + if (acct->nr_workers < acct->max_workers) { acct->nr_workers++; + do_create = true; + } raw_spin_unlock_irq(&wqe->lock); - create_io_worker(wq, cwd->wqe, cwd->index); + if (do_create) { + create_io_worker(wq, cwd->wqe, cwd->index); + } else { + atomic_dec(&acct->nr_running); + io_worker_ref_put(wq); + } kfree(cwd); }
From: Hao Xu haoxu@linux.alibaba.com
[ Upstream commit 47cae0c71f7a126903f930191e6e9f103674aca1 ]
There may be cases like:

        A                                B
spin_lock(wqe->lock)
nr_workers is 0
nr_workers++
spin_unlock(wqe->lock)
                                 spin_lock(wqe->lock)
                                 nr_workers is 1
                                 nr_workers++
                                 spin_unlock(wqe->lock)
create_io_worker()
acct->worker is 1
                                 create_io_worker()
                                 acct->worker is 1
There should be one worker marked IO_WORKER_F_FIXED, but none is. Fix this by introducing a new argument to create_io_worker() to indicate whether it is the first worker.
Fixes: 3d4e4face9c1 ("io-wq: fix no lock protection of acct->nr_worker") Signed-off-by: Hao Xu haoxu@linux.alibaba.com Link: https://lore.kernel.org/r/20210808135434.68667-3-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io-wq.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/fs/io-wq.c b/fs/io-wq.c index 2c8a9a394884..91b0d1fb90eb 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -130,7 +130,7 @@ struct io_cb_cancel_data { bool cancel_all; };
-static void create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index); +static void create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index, bool first); static void io_wqe_dec_running(struct io_worker *worker);
static bool io_worker_get(struct io_worker *worker) @@ -249,18 +249,20 @@ static void io_wqe_wake_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) rcu_read_unlock();
if (!ret) { - bool do_create = false; + bool do_create = false, first = false;
raw_spin_lock_irq(&wqe->lock); if (acct->nr_workers < acct->max_workers) { atomic_inc(&acct->nr_running); atomic_inc(&wqe->wq->worker_refs); + if (!acct->nr_workers) + first = true; acct->nr_workers++; do_create = true; } raw_spin_unlock_irq(&wqe->lock); if (do_create) - create_io_worker(wqe->wq, wqe, acct->index); + create_io_worker(wqe->wq, wqe, acct->index, first); } }
@@ -283,7 +285,7 @@ static void create_worker_cb(struct callback_head *cb) struct io_wq *wq; struct io_wqe *wqe; struct io_wqe_acct *acct; - bool do_create = false; + bool do_create = false, first = false;
cwd = container_of(cb, struct create_worker_data, work); wqe = cwd->wqe; @@ -291,12 +293,14 @@ static void create_worker_cb(struct callback_head *cb) acct = &wqe->acct[cwd->index]; raw_spin_lock_irq(&wqe->lock); if (acct->nr_workers < acct->max_workers) { + if (!acct->nr_workers) + first = true; acct->nr_workers++; do_create = true; } raw_spin_unlock_irq(&wqe->lock); if (do_create) { - create_io_worker(wq, cwd->wqe, cwd->index); + create_io_worker(wq, wqe, cwd->index, first); } else { atomic_dec(&acct->nr_running); io_worker_ref_put(wq); @@ -642,7 +646,7 @@ void io_wq_worker_sleeping(struct task_struct *tsk) raw_spin_unlock_irq(&worker->wqe->lock); }
-static void create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index) +static void create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index, bool first) { struct io_wqe_acct *acct = &wqe->acct[index]; struct io_worker *worker; @@ -683,7 +687,7 @@ fail: worker->flags |= IO_WORKER_F_FREE; if (index == IO_WQ_ACCT_BOUND) worker->flags |= IO_WORKER_F_BOUND; - if ((acct->nr_workers == 1) && (worker->flags & IO_WORKER_F_BOUND)) + if (first && (worker->flags & IO_WORKER_F_BOUND)) worker->flags |= IO_WORKER_F_FIXED; raw_spin_unlock_irq(&wqe->lock); wake_up_new_task(tsk);
From: Leon Romanovsky leonro@nvidia.com
[ Upstream commit c633e799641cf13960bd83189b4d5b1b2adb0d4e ]
Clean up SF resources if mlx5 eth fails to initialize.
Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver") Signed-off-by: Leon Romanovsky leonro@nvidia.com Reviewed-by: Parav Pandit parav@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 12 ++++-------- drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h | 5 +++++ 2 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 0d0f63a27aba..8c6d7f70e783 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1781,16 +1781,14 @@ static int __init init(void) if (err) goto err_sf;
-#ifdef CONFIG_MLX5_CORE_EN err = mlx5e_init(); - if (err) { - pci_unregister_driver(&mlx5_core_driver); - goto err_debug; - } -#endif + if (err) + goto err_en;
return 0;
+err_en: + mlx5_sf_driver_unregister(); err_sf: pci_unregister_driver(&mlx5_core_driver); err_debug: @@ -1800,9 +1798,7 @@ err_debug:
static void __exit cleanup(void) { -#ifdef CONFIG_MLX5_CORE_EN mlx5e_cleanup(); -#endif mlx5_sf_driver_unregister(); pci_unregister_driver(&mlx5_core_driver); mlx5_unregister_debugfs(); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h index a22b706eebd3..1824eb0b0e9a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h @@ -223,8 +223,13 @@ int mlx5_firmware_flash(struct mlx5_core_dev *dev, const struct firmware *fw, int mlx5_fw_version_query(struct mlx5_core_dev *dev, u32 *running_ver, u32 *stored_ver);
+#ifdef CONFIG_MLX5_CORE_EN int mlx5e_init(void); void mlx5e_cleanup(void); +#else +static inline int mlx5e_init(void){ return 0; } +static inline void mlx5e_cleanup(void){} +#endif
static inline bool mlx5_sriov_is_enabled(struct mlx5_core_dev *dev) {
From: Alex Vesker valex@nvidia.com
[ Upstream commit d3875924dae632d5edd908d285fffc5f07c835a3 ]
While processing an encapsulated packet on RX, one of the fields that is checked is the inner packet length. If the length specified in the header doesn't match the actual inner packet length, the packet is invalid and should be dropped. However, such a packet caused the NIC to hang.
This patch turns on a 'fail_on_error' HW bit which allows HW to drop such an invalid packet while processing RX packet and trying to decap it.
Fixes: ad17dc8cf910 ("net/mlx5: DR, Move STEv0 action apply logic") Signed-off-by: Alex Vesker valex@nvidia.com Signed-off-by: Yevgeny Kliteynik kliteyn@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v0.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v0.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v0.c index 0757a4e8540e..42446e92aa38 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v0.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v0.c @@ -352,6 +352,7 @@ static void dr_ste_v0_set_rx_decap(u8 *hw_ste_p) { MLX5_SET(ste_rx_steering_mult, hw_ste_p, tunneling_action, DR_STE_TUNL_ACTION_DECAP); + MLX5_SET(ste_rx_steering_mult, hw_ste_p, fail_on_error, 1); }
static void dr_ste_v0_set_rx_pop_vlan(u8 *hw_ste_p) @@ -365,6 +366,7 @@ static void dr_ste_v0_set_rx_decap_l3(u8 *hw_ste_p, bool vlan) MLX5_SET(ste_rx_steering_mult, hw_ste_p, tunneling_action, DR_STE_TUNL_ACTION_L3_DECAP); MLX5_SET(ste_modify_packet, hw_ste_p, action_description, vlan ? 1 : 0); + MLX5_SET(ste_rx_steering_mult, hw_ste_p, fail_on_error, 1); }
static void dr_ste_v0_set_rewrite_actions(u8 *hw_ste_p, u16 num_of_actions,
From: Roi Dayan roid@nvidia.com
[ Upstream commit c623c95afa56bf4bf64e4f58742dc94616ef83db ]
It could be that local and remote are on the same machine, in which case the route lookup returns a local route, which results in creating an encap id with src/dst MAC addresses of 0.
Fixes: a54e20b4fcae ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads") Signed-off-by: Roi Dayan roid@nvidia.com Reviewed-by: Maor Dickman maord@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c index 172e0474f2e6..3980a3905084 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c @@ -124,6 +124,11 @@ static int mlx5e_route_lookup_ipv4_get(struct mlx5e_priv *priv, if (IS_ERR(rt)) return PTR_ERR(rt);
+ if (rt->rt_type != RTN_UNICAST) { + ret = -ENETUNREACH; + goto err_rt_release; + } + if (mlx5_lag_is_multipath(mdev) && rt->rt_gw_family != AF_INET) { ret = -ENETUNREACH; goto err_rt_release;
From: Maxim Mikityanskiy maximmi@nvidia.com
[ Upstream commit 8ba3e4c85825c8801a2c298dcadac650a40d7137 ]
mlx5e_close_xdpsq does the cleanup: it calls mlx5e_free_xdpsq_descs to free the outstanding descriptors, which relies on mlx5e_page_release_dynamic and page_pool_release_page. However, page_pool_destroy is already called by this point, because mlx5e_close_rq runs before mlx5e_close_xdpsq.
This commit fixes the use-after-free by swapping mlx5e_close_xdpsq and mlx5e_close_rq.
The commit cited below started calling page_pool_destroy directly from the driver. Previously, the page pool was destroyed under a call_rcu from xdp_rxq_info_unreg_mem_model, which would defer the deallocation until after the XDPSQ is cleaned up.
Fixes: 1da4bbeffe41 ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy") Signed-off-by: Maxim Mikityanskiy maximmi@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en_main.c | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index d0d9acb17253..3221a6a2f221 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1887,30 +1887,30 @@ static int mlx5e_open_queues(struct mlx5e_channel *c, if (err) goto err_close_icosq;
+ err = mlx5e_open_rxq_rq(c, params, &cparam->rq); + if (err) + goto err_close_sqs; + if (c->xdp) { err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, NULL, &c->rq_xdpsq, false); if (err) - goto err_close_sqs; + goto err_close_rq; }
- err = mlx5e_open_rxq_rq(c, params, &cparam->rq); - if (err) - goto err_close_xdp_sq; - err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, NULL, &c->xdpsq, true); if (err) - goto err_close_rq; + goto err_close_xdp_sq;
return 0;
-err_close_rq: - mlx5e_close_rq(&c->rq); - err_close_xdp_sq: if (c->xdp) mlx5e_close_xdpsq(&c->rq_xdpsq);
+err_close_rq: + mlx5e_close_rq(&c->rq); + err_close_sqs: mlx5e_close_sqs(c);
@@ -1945,9 +1945,9 @@ err_close_async_icosq_cq: static void mlx5e_close_queues(struct mlx5e_channel *c) { mlx5e_close_xdpsq(&c->xdpsq); - mlx5e_close_rq(&c->rq); if (c->xdp) mlx5e_close_xdpsq(&c->rq_xdpsq); + mlx5e_close_rq(&c->rq); mlx5e_close_sqs(c); mlx5e_close_icosq(&c->icosq); mlx5e_close_icosq(&c->async_icosq);
From: Aya Levin ayal@nvidia.com
[ Upstream commit c85a6b8feb16c0cdbbc8d9f581c7861c4a9ac351 ]
Since switchdev mode can't support devlink traps, verify there are no active devlink traps before moving eswitch to switchdev mode. If there are active traps, prevent the switchdev mode configuration.
Fixes: eb3862a0525d ("net/mlx5e: Enable traps according to link state") Signed-off-by: Aya Levin ayal@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index b66e12753f37..d0e4daa55a4a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -48,6 +48,7 @@ #include "lib/fs_chains.h" #include "en_tc.h" #include "en/mapping.h" +#include "devlink.h"
#define mlx5_esw_for_each_rep(esw, i, rep) \ xa_for_each(&((esw)->offloads.vport_reps), i, rep) @@ -2984,12 +2985,19 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, if (cur_mlx5_mode == mlx5_mode) goto unlock;
- if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) + if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) { + if (mlx5_devlink_trap_get_num_active(esw->dev)) { + NL_SET_ERR_MSG_MOD(extack, + "Can't change mode while devlink traps are active"); + err = -EOPNOTSUPP; + goto unlock; + } err = esw_offloads_start(esw, extack); - else if (mode == DEVLINK_ESWITCH_MODE_LEGACY) + } else if (mode == DEVLINK_ESWITCH_MODE_LEGACY) { err = esw_offloads_stop(esw, extack); - else + } else { err = -EINVAL; + }
unlock: mlx5_esw_unlock(esw);
From: Chris Mi cmi@nvidia.com
[ Upstream commit 88bbd7b2369aca4598eb8f38c5f16be98c3bb5d4 ]
Free the offload sample action on error.
Fixes: f94d6389f6a8 ("net/mlx5e: TC, Add support to offload sample action") Signed-off-by: Chris Mi cmi@nvidia.com Reviewed-by: Oz Shlomo ozsh@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/esw/sample.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/sample.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/sample.c index 794012c5c476..d3ad78aa9d45 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/sample.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/sample.c @@ -501,6 +501,7 @@ err_sampler: err_offload_rule: mlx5_esw_vporttbl_put(esw, &per_vport_tbl_attr); err_default_tbl: + kfree(sample_flow); return ERR_PTR(err); }
From: Shay Drory shayd@nvidia.com
[ Upstream commit 563476ae0c5e48a028cbfa38fa9d2fc0418eb88f ]
The CQ destroy is performed based on the IRQ number that is stored in cq->irqn. That number wasn't set explicitly during CQ creation and as expected some of the API users of mlx5_core_create_cq() forgot to update it.
This caused the synchronization call to be made on the wrong IRQ, number 0 instead of the real one.

As a fix, set the IRQ number directly in mlx5_core_create_cq() and update all users accordingly.
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Fixes: ef1659ade359 ("IB/mlx5: Add DEVX support for CQ events") Signed-off-by: Shay Drory shayd@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/mlx5/cq.c | 4 +--- drivers/infiniband/hw/mlx5/devx.c | 3 +-- drivers/net/ethernet/mellanox/mlx5/core/cq.c | 1 + .../net/ethernet/mellanox/mlx5/core/en_main.c | 13 ++---------- drivers/net/ethernet/mellanox/mlx5/core/eq.c | 20 +++++++++++++++---- .../ethernet/mellanox/mlx5/core/fpga/conn.c | 4 +--- .../net/ethernet/mellanox/mlx5/core/lib/eq.h | 2 ++ .../mellanox/mlx5/core/steering/dr_send.c | 4 +--- drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +-- include/linux/mlx5/driver.h | 3 +-- 10 files changed, 27 insertions(+), 30 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c index 9ce01f729673..e14a14b634a5 100644 --- a/drivers/infiniband/hw/mlx5/cq.c +++ b/drivers/infiniband/hw/mlx5/cq.c @@ -941,7 +941,6 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, u32 *cqb = NULL; void *cqc; int cqe_size; - unsigned int irqn; int eqn; int err;
@@ -980,7 +979,7 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, INIT_WORK(&cq->notify_work, notify_soft_wc_handler); }
- err = mlx5_vector2eqn(dev->mdev, vector, &eqn, &irqn); + err = mlx5_vector2eqn(dev->mdev, vector, &eqn); if (err) goto err_cqb;
@@ -1003,7 +1002,6 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, goto err_cqb;
mlx5_ib_dbg(dev, "cqn 0x%x\n", cq->mcq.cqn); - cq->mcq.irqn = irqn; if (udata) cq->mcq.tasklet_ctx.comp = mlx5_ib_cq_comp; else diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c index eb9b0a2707f8..c869b2a91a28 100644 --- a/drivers/infiniband/hw/mlx5/devx.c +++ b/drivers/infiniband/hw/mlx5/devx.c @@ -975,7 +975,6 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_QUERY_EQN)( struct mlx5_ib_dev *dev; int user_vector; int dev_eqn; - unsigned int irqn; int err;
if (uverbs_copy_from(&user_vector, attrs, @@ -987,7 +986,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_QUERY_EQN)( return PTR_ERR(c); dev = to_mdev(c->ibucontext.device);
- err = mlx5_vector2eqn(dev->mdev, user_vector, &dev_eqn, &irqn); + err = mlx5_vector2eqn(dev->mdev, user_vector, &dev_eqn); if (err < 0) return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c index df3e4938ecdd..360e093874d4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c @@ -134,6 +134,7 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq, cq->cqn);
cq->uar = dev->priv.uar; + cq->irqn = eq->core.irqn;
return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 3221a6a2f221..779a4abead01 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1531,15 +1531,9 @@ static int mlx5e_alloc_cq_common(struct mlx5e_priv *priv, { struct mlx5_core_dev *mdev = priv->mdev; struct mlx5_core_cq *mcq = &cq->mcq; - int eqn_not_used; - unsigned int irqn; int err; u32 i;
- err = mlx5_vector2eqn(mdev, param->eq_ix, &eqn_not_used, &irqn); - if (err) - return err; - err = mlx5_cqwq_create(mdev, ¶m->wq, param->cqc, &cq->wq, &cq->wq_ctrl); if (err) @@ -1553,7 +1547,6 @@ static int mlx5e_alloc_cq_common(struct mlx5e_priv *priv, mcq->vector = param->eq_ix; mcq->comp = mlx5e_completion_event; mcq->event = mlx5e_cq_error_event; - mcq->irqn = irqn;
for (i = 0; i < mlx5_cqwq_get_size(&cq->wq); i++) { struct mlx5_cqe64 *cqe = mlx5_cqwq_get_wqe(&cq->wq, i); @@ -1601,11 +1594,10 @@ static int mlx5e_create_cq(struct mlx5e_cq *cq, struct mlx5e_cq_param *param) void *in; void *cqc; int inlen; - unsigned int irqn_not_used; int eqn; int err;
- err = mlx5_vector2eqn(mdev, param->eq_ix, &eqn, &irqn_not_used); + err = mlx5_vector2eqn(mdev, param->eq_ix, &eqn); if (err) return err;
@@ -1979,9 +1971,8 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, struct mlx5e_channel *c; unsigned int irq; int err; - int eqn;
- err = mlx5_vector2eqn(priv->mdev, ix, &eqn, &irq); + err = mlx5_vector2irqn(priv->mdev, ix, &irq); if (err) return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c index 940333410267..0879551161d2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c @@ -871,8 +871,8 @@ clean: return err; }
-int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn, - unsigned int *irqn) +static int vector2eqnirqn(struct mlx5_core_dev *dev, int vector, int *eqn, + unsigned int *irqn) { struct mlx5_eq_table *table = dev->priv.eq_table; struct mlx5_eq_comp *eq, *n; @@ -881,8 +881,10 @@ int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn,
list_for_each_entry_safe(eq, n, &table->comp_eqs_list, list) { if (i++ == vector) { - *eqn = eq->core.eqn; - *irqn = eq->core.irqn; + if (irqn) + *irqn = eq->core.irqn; + if (eqn) + *eqn = eq->core.eqn; err = 0; break; } @@ -890,8 +892,18 @@ int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn,
return err; } + +int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn) +{ + return vector2eqnirqn(dev, vector, eqn, NULL); +} EXPORT_SYMBOL(mlx5_vector2eqn);
+int mlx5_vector2irqn(struct mlx5_core_dev *dev, int vector, unsigned int *irqn) +{ + return vector2eqnirqn(dev, vector, NULL, irqn); +} + unsigned int mlx5_comp_vectors_count(struct mlx5_core_dev *dev) { return dev->priv.eq_table->num_comp_eqs; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c index bd66ab2af5b5..d5da4ab65766 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c @@ -417,7 +417,6 @@ static int mlx5_fpga_conn_create_cq(struct mlx5_fpga_conn *conn, int cq_size) struct mlx5_wq_param wqp; struct mlx5_cqe64 *cqe; int inlen, err, eqn; - unsigned int irqn; void *cqc, *in; __be64 *pas; u32 i; @@ -446,7 +445,7 @@ static int mlx5_fpga_conn_create_cq(struct mlx5_fpga_conn *conn, int cq_size) goto err_cqwq; }
- err = mlx5_vector2eqn(mdev, smp_processor_id(), &eqn, &irqn); + err = mlx5_vector2eqn(mdev, smp_processor_id(), &eqn); if (err) { kvfree(in); goto err_cqwq; @@ -476,7 +475,6 @@ static int mlx5_fpga_conn_create_cq(struct mlx5_fpga_conn *conn, int cq_size) *conn->cq.mcq.arm_db = 0; conn->cq.mcq.vector = 0; conn->cq.mcq.comp = mlx5_fpga_conn_cq_complete; - conn->cq.mcq.irqn = irqn; conn->cq.mcq.uar = fdev->conn_res.uar; tasklet_setup(&conn->cq.tasklet, mlx5_fpga_conn_cq_tasklet);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h index f607a3858ef5..bd3ed8660483 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h @@ -103,4 +103,6 @@ void mlx5_core_eq_free_irqs(struct mlx5_core_dev *dev); struct cpu_rmap *mlx5_eq_table_get_rmap(struct mlx5_core_dev *dev); #endif
+int mlx5_vector2irqn(struct mlx5_core_dev *dev, int vector, unsigned int *irqn); + #endif diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c index 12cf323a5943..9df0e73d1c35 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c @@ -749,7 +749,6 @@ static struct mlx5dr_cq *dr_create_cq(struct mlx5_core_dev *mdev, struct mlx5_cqe64 *cqe; struct mlx5dr_cq *cq; int inlen, err, eqn; - unsigned int irqn; void *cqc, *in; __be64 *pas; int vector; @@ -782,7 +781,7 @@ static struct mlx5dr_cq *dr_create_cq(struct mlx5_core_dev *mdev, goto err_cqwq;
vector = raw_smp_processor_id() % mlx5_comp_vectors_count(mdev); - err = mlx5_vector2eqn(mdev, vector, &eqn, &irqn); + err = mlx5_vector2eqn(mdev, vector, &eqn); if (err) { kvfree(in); goto err_cqwq; @@ -818,7 +817,6 @@ static struct mlx5dr_cq *dr_create_cq(struct mlx5_core_dev *mdev, *cq->mcq.arm_db = cpu_to_be32(2 << 28);
cq->mcq.vector = 0; - cq->mcq.irqn = irqn; cq->mcq.uar = uar;
return cq; diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 32dd5ed712cb..f3495386698a 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -526,7 +526,6 @@ static int cq_create(struct mlx5_vdpa_net *ndev, u16 idx, u32 num_ent) void __iomem *uar_page = ndev->mvdev.res.uar->map; u32 out[MLX5_ST_SZ_DW(create_cq_out)]; struct mlx5_vdpa_cq *vcq = &mvq->cq; - unsigned int irqn; __be64 *pas; int inlen; void *cqc; @@ -566,7 +565,7 @@ static int cq_create(struct mlx5_vdpa_net *ndev, u16 idx, u32 num_ent) /* Use vector 0 by default. Consider adding code to choose least used * vector. */ - err = mlx5_vector2eqn(mdev, 0, &eqn, &irqn); + err = mlx5_vector2eqn(mdev, 0, &eqn); if (err) goto err_vec;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index f8902bcd91e2..58236808fdf4 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1042,8 +1042,7 @@ void mlx5_unregister_debugfs(void); void mlx5_fill_page_array(struct mlx5_frag_buf *buf, __be64 *pas); void mlx5_fill_page_frag_array_perm(struct mlx5_frag_buf *buf, __be64 *pas, u8 perm); void mlx5_fill_page_frag_array(struct mlx5_frag_buf *frag_buf, __be64 *pas); -int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn, - unsigned int *irqn); +int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn); int mlx5_core_attach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn); int mlx5_core_detach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn);
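After this change the two lookups are independent, so callers request only what they actually need; a short sketch of the resulting call sites, taken from the hunks above:

	int eqn;
	unsigned int irqn;

	/* CQ creation paths only need the EQ number ... */
	err = mlx5_vector2eqn(mdev, vector, &eqn);

	/* ... while mlx5e_open_channel() only needs the IRQ for affinity hints */
	err = mlx5_vector2irqn(mdev, vector, &irqn);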
From: Aya Levin ayal@nvidia.com
[ Upstream commit bd37c2888ccaa5ceb9895718f6909b247cc372e0 ]
Check the return value of mlx5_fw_tracer_start(), add an error path, and fix the return value of mlx5_fw_tracer_init() accordingly.
Fixes: c71ad41ccb0c ("net/mlx5: FW tracer, events handling") Signed-off-by: Aya Levin ayal@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c index 01a1d02dcf15..3f8a98093f8c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c @@ -1019,12 +1019,19 @@ int mlx5_fw_tracer_init(struct mlx5_fw_tracer *tracer) MLX5_NB_INIT(&tracer->nb, fw_tracer_event, DEVICE_TRACER); mlx5_eq_notifier_register(dev, &tracer->nb);
- mlx5_fw_tracer_start(tracer); - + err = mlx5_fw_tracer_start(tracer); + if (err) { + mlx5_core_warn(dev, "FWTracer: Failed to start tracer %d\n", err); + goto err_notifier_unregister; + } return 0;
+err_notifier_unregister: + mlx5_eq_notifier_unregister(dev, &tracer->nb); + mlx5_core_destroy_mkey(dev, &tracer->buff.mkey); err_dealloc_pd: mlx5_core_dealloc_pd(dev, tracer->buff.pdn); + cancel_work_sync(&tracer->read_fw_strings_work); return err; }
From: Christian Hewitt christianshewitt@gmail.com
[ Upstream commit bf33677a3c394bb8fddd48d3bbc97adf0262e045 ]
Add support for the OSD1 HDR registers so meson DRM can handle the HDR properties set by Amlogic u-boot on G12A and newer devices, which otherwise result in blue/green/pink colour distortion of the display output.
This takes the original patch submissions from Mathias [0] and [1] with corrections for formatting and the missing description and attribution needed for merge.
[0] https://lore.kernel.org/linux-amlogic/59dfd7e6-fc91-3d61-04c4-94e078a3188c@b... [1] https://lore.kernel.org/linux-amlogic/CAOKfEHBx_fboUqkENEMd-OC-NSrf46nto+vDL...
Fixes: 728883948b0d ("drm/meson: Add G12A Support for VIU setup") Suggested-by: Mathias Steiger mathias.steiger@googlemail.com Signed-off-by: Christian Hewitt christianshewitt@gmail.com Tested-by: Neil Armstrong narmstrong@baylibre.com Tested-by: Philip Milev milev.philip@gmail.com [narmsrong: adding missing space on second tested-by tag] Signed-off-by: Neil Armstrong narmstrong@baylibre.com Link: https://patchwork.freedesktop.org/patch/msgid/20210806094005.7136-1-christia... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/meson/meson_registers.h | 5 +++++ drivers/gpu/drm/meson/meson_viu.c | 7 ++++++- 2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/meson/meson_registers.h b/drivers/gpu/drm/meson/meson_registers.h index 446e7961da48..0f3cafab8860 100644 --- a/drivers/gpu/drm/meson/meson_registers.h +++ b/drivers/gpu/drm/meson/meson_registers.h @@ -634,6 +634,11 @@ #define VPP_WRAP_OSD3_MATRIX_PRE_OFFSET2 0x3dbc #define VPP_WRAP_OSD3_MATRIX_EN_CTRL 0x3dbd
+/* osd1 HDR */ +#define OSD1_HDR2_CTRL 0x38a0 +#define OSD1_HDR2_CTRL_VDIN0_HDR2_TOP_EN BIT(13) +#define OSD1_HDR2_CTRL_REG_ONLY_MAT BIT(16) + /* osd2 scaler */ #define OSD2_VSC_PHASE_STEP 0x3d00 #define OSD2_VSC_INI_PHASE 0x3d01 diff --git a/drivers/gpu/drm/meson/meson_viu.c b/drivers/gpu/drm/meson/meson_viu.c index aede0c67a57f..259f3e6bec90 100644 --- a/drivers/gpu/drm/meson/meson_viu.c +++ b/drivers/gpu/drm/meson/meson_viu.c @@ -425,9 +425,14 @@ void meson_viu_init(struct meson_drm *priv) if (meson_vpu_is_compatible(priv, VPU_COMPATIBLE_GXM) || meson_vpu_is_compatible(priv, VPU_COMPATIBLE_GXL)) meson_viu_load_matrix(priv); - else if (meson_vpu_is_compatible(priv, VPU_COMPATIBLE_G12A)) + else if (meson_vpu_is_compatible(priv, VPU_COMPATIBLE_G12A)) { meson_viu_set_g12a_osd1_matrix(priv, RGB709_to_YUV709l_coeff, true); + /* fix green/pink color distortion from vendor u-boot */ + writel_bits_relaxed(OSD1_HDR2_CTRL_REG_ONLY_MAT | + OSD1_HDR2_CTRL_VDIN0_HDR2_TOP_EN, 0, + priv->io_base + _REG(OSD1_HDR2_CTRL)); + }
/* Initialize OSD1 fifo control register */ reg = VIU_OSD_DDR_PRIORITY_URGENT |
From: Miklos Szeredi mszeredi@redhat.com
[ Upstream commit 9b91b6b019fda817eb52f728eb9c79b3579760bc ]
There's a possibility of an ABBA deadlock in case of a splice write to an overlayfs file and a concurrent splice write to a corresponding real file.
The call chain for splice to an overlay file:
  -> do_splice                  [takes sb_writers on overlay file]
   -> do_splice_from
    -> iter_file_splice_write   [takes pipe->mutex]
     -> vfs_iter_write
     ...
      -> ovl_write_iter         [takes sb_writers on real file]
And the call chain for splice to a real file:
  -> do_splice                  [takes sb_writers on real file]
   -> do_splice_from
    -> iter_file_splice_write   [takes pipe->mutex]
Syzbot successfully bisected this to commit 82a763e61e2b ("ovl: simplify file splice").
Fix by reverting the write part of the above commit and by adding missing bits from ovl_write_iter() into ovl_splice_write().
Fixes: 82a763e61e2b ("ovl: simplify file splice") Reported-and-tested-by: syzbot+579885d1a9a833336209@syzkaller.appspotmail.com Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/overlayfs/file.c | 47 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 46 insertions(+), 1 deletion(-)
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 4d53d3b7e5fe..d081faa55e83 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -392,6 +392,51 @@ out_unlock: return ret; }
+/* + * Calling iter_file_splice_write() directly from overlay's f_op may deadlock + * due to lock order inversion between pipe->mutex in iter_file_splice_write() + * and file_start_write(real.file) in ovl_write_iter(). + * + * So do everything ovl_write_iter() does and call iter_file_splice_write() on + * the real file. + */ +static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags) +{ + struct fd real; + const struct cred *old_cred; + struct inode *inode = file_inode(out); + struct inode *realinode = ovl_inode_real(inode); + ssize_t ret; + + inode_lock(inode); + /* Update mode */ + ovl_copyattr(realinode, inode); + ret = file_remove_privs(out); + if (ret) + goto out_unlock; + + ret = ovl_real_fdget(out, &real); + if (ret) + goto out_unlock; + + old_cred = ovl_override_creds(inode->i_sb); + file_start_write(real.file); + + ret = iter_file_splice_write(pipe, real.file, ppos, len, flags); + + file_end_write(real.file); + /* Update size */ + ovl_copyattr(realinode, inode); + revert_creds(old_cred); + fdput(real); + +out_unlock: + inode_unlock(inode); + + return ret; +} + static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync) { struct fd real; @@ -603,7 +648,7 @@ const struct file_operations ovl_file_operations = { .fadvise = ovl_fadvise, .flush = ovl_flush, .splice_read = generic_file_splice_read, - .splice_write = iter_file_splice_write, + .splice_write = ovl_splice_write,
.copy_file_range = ovl_copy_file_range, .remap_file_range = ovl_remap_file_range,
Hi Greg,
Looks like upstream commit 9b91b6b019fd ("ovl: fix deadlock in splice write") needs to be added to linux-5.4.y.
The reason is that commit 82a763e61e2b ("ovl: simplify file splice") was backported to v5.4.155, and the above commit fixes this.
Applies cleanly and I reviewed that the backport is correct.
Thanks, Miklos
On Mon, Nov 15, 2021 at 02:54:35PM +0100, Miklos Szeredi wrote:
Hi Greg,
Looks like upstream commit 9b91b6b019fd ("ovl: fix deadlock in splice write") needs to be added to linux-5.4.y.
The reason is that commit 82a763e61e2b ("ovl: simplify file splice") was backported to v5.4.155, and the above commit fixes this.
Applies cleanly and I reviewed that the backport is correct.
Now queued up, thanks.
greg k-h
From: Yonghong Song yhs@fb.com
[ Upstream commit a2baf4e8bb0f306fbed7b5e6197c02896a638ab5 ]
Commit b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper") fixed a bug for bpf_get_local_storage() helper so different tasks won't mess up with each other's percpu local storage.
The percpu data contains 8 slots so it can hold up to 8 contexts (same or different tasks), for 8 different program runs, at the same time. This in general is sufficient. But our internal testing showed the following warning multiple times:
[...] warning: WARNING: CPU: 13 PID: 41661 at include/linux/bpf-cgroup.h:193 __cgroup_bpf_run_filter_sock_ops+0x13e/0x180 RIP: 0010:__cgroup_bpf_run_filter_sock_ops+0x13e/0x180 <IRQ> tcp_call_bpf.constprop.99+0x93/0xc0 tcp_conn_request+0x41e/0xa50 ? tcp_rcv_state_process+0x203/0xe00 tcp_rcv_state_process+0x203/0xe00 ? sk_filter_trim_cap+0xbc/0x210 ? tcp_v6_inbound_md5_hash.constprop.41+0x44/0x160 tcp_v6_do_rcv+0x181/0x3e0 tcp_v6_rcv+0xc65/0xcb0 ip6_protocol_deliver_rcu+0xbd/0x450 ip6_input_finish+0x11/0x20 ip6_input+0xb5/0xc0 ip6_sublist_rcv_finish+0x37/0x50 ip6_sublist_rcv+0x1dc/0x270 ipv6_list_rcv+0x113/0x140 __netif_receive_skb_list_core+0x1a0/0x210 netif_receive_skb_list_internal+0x186/0x2a0 gro_normal_list.part.170+0x19/0x40 napi_complete_done+0x65/0x150 mlx5e_napi_poll+0x1ae/0x680 __napi_poll+0x25/0x120 net_rx_action+0x11e/0x280 __do_softirq+0xbb/0x271 irq_exit_rcu+0x97/0xa0 common_interrupt+0x7f/0xa0 </IRQ> asm_common_interrupt+0x1e/0x40 RIP: 0010:bpf_prog_1835a9241238291a_tw_egress+0x5/0xbac ? __cgroup_bpf_run_filter_skb+0x378/0x4e0 ? do_softirq+0x34/0x70 ? ip6_finish_output2+0x266/0x590 ? ip6_finish_output+0x66/0xa0 ? ip6_output+0x6c/0x130 ? ip6_xmit+0x279/0x550 ? ip6_dst_check+0x61/0xd0 [...]
Using drgn [0] to dump the percpu buffer contents showed that on this CPU slot 0 is still available, but slots 1-7 are occupied and those tasks in slots 1-7 mostly don't exist any more. So we might have issues in bpf_cgroup_storage_unset().
Further debugging confirmed that there is a bug in bpf_cgroup_storage_unset(). Currently, it tries to unset the "current" task's slot by searching from the start. So the following sequence is possible:
1. A task is running and claims slot 0. 2. The running BPF program finishes; it has checked that slot 0 holds the "task" and is about to reset it to NULL (but has not done so yet). 3. An interrupt happens, another BPF program runs, and it claims slot 1 with the *same* task. 4. The unset() in interrupt context releases slot 0, since it matches "task". 5. The interrupt is done, and the task in process context resets slot 0.
At the end, slot 1 is not reset and the same process can continue to occupy slots 2-7; finally, when steps 1-5 above are repeated, the BPF program in step 3 won't be able to claim an empty slot and a warning will be issued.
To fix the issue, the unset() function should traverse from the last slot to the first. This way, the above issue can be avoided.
The same reverse traversal should also be done in bpf_get_local_storage() helper itself. Otherwise, incorrect local storage may be returned to BPF program.
[0] https://github.com/osandov/drgn
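To make the interleaving above easier to follow, here is a minimal user-space model of the slot logic. It is not the kernel code: the slot array, claim() and the two unset() variants are stand-ins for bpf_cgroup_storage_info, the set helper and bpf_cgroup_storage_unset(). It shows how a forward-scanning unset() lets the interrupting program release the interrupted program's slot 0 and leaves slot 1 occupied, while the backward scan releases the most recently claimed slot first:

#include <stdio.h>

#define NEST_MAX 8

static const void *slot[NEST_MAX];	/* models bpf_cgroup_storage_info[i].task */

static int claim(const void *task)	/* models claiming a free slot */
{
	for (int i = 0; i < NEST_MAX; i++) {
		if (!slot[i]) {
			slot[i] = task;
			return i;
		}
	}
	return -1;			/* no free slot: the warning case */
}

static void unset_forward(const void *task)	/* old, buggy scan direction */
{
	for (int i = 0; i < NEST_MAX; i++) {
		if (slot[i] == task) {
			slot[i] = NULL;
			return;
		}
	}
}

static void unset_backward(const void *task)	/* fixed scan direction */
{
	for (int i = NEST_MAX - 1; i >= 0; i--) {
		if (slot[i] == task) {
			slot[i] = NULL;
			return;
		}
	}
}

int main(void)
{
	int task;			/* the same "current" task everywhere */

	/* Old behaviour: process context claims slot 0 and, on unset, has
	 * already resolved index 0 when the interrupt arrives (step 2). */
	int idx = claim(&task);		/* step 1: slot 0 */
	claim(&task);			/* step 3: interrupt claims slot 1 */
	unset_forward(&task);		/* step 4: interrupt frees slot 0, not 1 */
	slot[idx] = NULL;		/* step 5: process context clears slot 0 */
	printf("forward scan leaves slot 1 = %p (leaked)\n", (void *)slot[1]);

	slot[1] = NULL;			/* reset the model */

	/* Fixed behaviour: the interrupt's backward scan frees its own slot 1,
	 * so the later clear of slot 0 by process context is correct. */
	idx = claim(&task);
	claim(&task);
	unset_backward(&task);
	slot[idx] = NULL;
	printf("backward scan leaves slot 1 = %p\n", (void *)slot[1]);
	return 0;
}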
Fixes: b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper") Signed-off-by: Yonghong Song yhs@fb.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/bpf/20210810010413.1976277-1-yhs@fb.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/bpf-cgroup.h | 4 ++-- kernel/bpf/helpers.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 8b77d08d4b47..6c9b10d82c80 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -201,8 +201,8 @@ static inline void bpf_cgroup_storage_unset(void) { int i;
- for (i = 0; i < BPF_CGROUP_STORAGE_NEST_MAX; i++) { - if (unlikely(this_cpu_read(bpf_cgroup_storage_info[i].task) != current)) + for (i = BPF_CGROUP_STORAGE_NEST_MAX - 1; i >= 0; i--) { + if (likely(this_cpu_read(bpf_cgroup_storage_info[i].task) != current)) continue;
this_cpu_write(bpf_cgroup_storage_info[i].task, NULL); diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index a2f1f15ce432..728f1a0fb442 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -397,8 +397,8 @@ BPF_CALL_2(bpf_get_local_storage, struct bpf_map *, map, u64, flags) void *ptr; int i;
- for (i = 0; i < BPF_CGROUP_STORAGE_NEST_MAX; i++) { - if (unlikely(this_cpu_read(bpf_cgroup_storage_info[i].task) != current)) + for (i = BPF_CGROUP_STORAGE_NEST_MAX - 1; i >= 0; i--) { + if (likely(this_cpu_read(bpf_cgroup_storage_info[i].task) != current)) continue;
storage = this_cpu_read(bpf_cgroup_storage_info[i].storage[stype]);
From: Ben Hutchings ben.hutchings@mind.be
[ Upstream commit c34f674c8875235725c3ef86147a627f165d23b4 ]
ksz_read64() currently does some dubious byte-swapping on the two halves of a 64-bit register, and then only returns the high bits. Replace this with a straightforward expression.
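As a quick illustration of what the straightforward expression computes (this mirrors the hunk below and assumes, as the fixed driver code does, that value[0] holds the high 32 bits and value[1] the low 32 bits of the register):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* The two 32-bit words regmap_bulk_read() would return for one
	 * 64-bit switch register, high word first. */
	uint32_t value[2] = { 0x11223344, 0x55667788 };
	uint64_t val = (uint64_t)value[0] << 32 | value[1];

	printf("0x%016llx\n", (unsigned long long)val); /* 0x1122334455667788 */
	return 0;
}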
Fixes: e66f840c08a2 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings ben.hutchings@mind.be Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/microchip/ksz_common.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h index 2e6bfd333f50..6afbb41ad39e 100644 --- a/drivers/net/dsa/microchip/ksz_common.h +++ b/drivers/net/dsa/microchip/ksz_common.h @@ -205,12 +205,8 @@ static inline int ksz_read64(struct ksz_device *dev, u32 reg, u64 *val) int ret;
ret = regmap_bulk_read(dev->regmap[2], reg, value, 2); - if (!ret) { - /* Ick! ToDo: Add 64bit R/W to regmap on 32bit systems */ - value[0] = swab32(value[0]); - value[1] = swab32(value[1]); - *val = swab64((u64)*value); - } + if (!ret) + *val = (u64)value[0] << 32 | value[1];
return ret; }
From: Ben Hutchings ben.hutchings@mind.be
[ Upstream commit ef3b02a1d79b691f9a354c4903cf1e6917e315f9 ]
ksz8795 has never actually enabled PVID tag insertion, and it also programmed the PVID incorrectly. To fix this:
* Allow tag insertion to be controlled per ingress port. On most chips, set bit 2 in Global Control 19. On KSZ88x3 this control flag doesn't exist.
* When adding a PVID: - Set the appropriate register bits to enable tag insertion on egress at every other port if this was the packet's ingress port. - Mask *out* the VID from the default tag, before or-ing in the new PVID.
* When removing a PVID: - Clear the same control bits to disable tag insertion. - Don't update the default tag. This wasn't doing anything useful.
Fixes: e66f840c08a2 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings ben.hutchings@mind.be Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/microchip/ksz8795.c | 26 ++++++++++++++++++------- drivers/net/dsa/microchip/ksz8795_reg.h | 4 ++++ 2 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c index ad509a57a945..bc9ca2b0e091 100644 --- a/drivers/net/dsa/microchip/ksz8795.c +++ b/drivers/net/dsa/microchip/ksz8795.c @@ -1083,6 +1083,16 @@ static int ksz8_port_vlan_filtering(struct dsa_switch *ds, int port, bool flag, return 0; }
+static void ksz8_port_enable_pvid(struct ksz_device *dev, int port, bool state) +{ + if (ksz_is_ksz88x3(dev)) { + ksz_cfg(dev, REG_SW_INSERT_SRC_PVID, + 0x03 << (4 - 2 * port), state); + } else { + ksz_pwrite8(dev, port, REG_PORT_CTRL_12, state ? 0x0f : 0x00); + } +} + static int ksz8_port_vlan_add(struct dsa_switch *ds, int port, const struct switchdev_obj_port_vlan *vlan, struct netlink_ext_ack *extack) @@ -1119,9 +1129,11 @@ static int ksz8_port_vlan_add(struct dsa_switch *ds, int port, u16 vid;
ksz_pread16(dev, port, REG_PORT_CTRL_VID, &vid); - vid &= 0xfff; + vid &= ~VLAN_VID_MASK; vid |= new_pvid; ksz_pwrite16(dev, port, REG_PORT_CTRL_VID, vid); + + ksz8_port_enable_pvid(dev, port, true); }
return 0; @@ -1132,7 +1144,7 @@ static int ksz8_port_vlan_del(struct dsa_switch *ds, int port, { bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED; struct ksz_device *dev = ds->priv; - u16 data, pvid, new_pvid = 0; + u16 data, pvid; u8 fid, member, valid;
if (ksz_is_ksz88x3(dev)) @@ -1154,14 +1166,11 @@ static int ksz8_port_vlan_del(struct dsa_switch *ds, int port, valid = 0; }
- if (pvid == vlan->vid) - new_pvid = 1; - ksz8_to_vlan(dev, fid, member, valid, &data); ksz8_w_vlan_table(dev, vlan->vid, data);
- if (new_pvid != pvid) - ksz_pwrite16(dev, port, REG_PORT_CTRL_VID, pvid); + if (pvid == vlan->vid) + ksz8_port_enable_pvid(dev, port, false);
return 0; } @@ -1394,6 +1403,9 @@ static int ksz8_setup(struct dsa_switch *ds)
ksz_cfg(dev, S_MIRROR_CTRL, SW_MIRROR_RX_TX, false);
+ if (!ksz_is_ksz88x3(dev)) + ksz_cfg(dev, REG_SW_CTRL_19, SW_INS_TAG_ENABLE, true); + /* set broadcast storm protection 10% rate */ regmap_update_bits(dev->regmap[1], S_REPLACE_VID_CTRL, BROADCAST_STORM_RATE, diff --git a/drivers/net/dsa/microchip/ksz8795_reg.h b/drivers/net/dsa/microchip/ksz8795_reg.h index c2e52c40a54c..383ba7a90f9c 100644 --- a/drivers/net/dsa/microchip/ksz8795_reg.h +++ b/drivers/net/dsa/microchip/ksz8795_reg.h @@ -631,6 +631,10 @@ #define REG_PORT_4_OUT_RATE_3 0xEE #define REG_PORT_5_OUT_RATE_3 0xFE
+/* 88x3 specific */ + +#define REG_SW_INSERT_SRC_PVID 0xC2 + /* PME */
#define SW_PME_OUTPUT_ENABLE BIT(1)
From: Ben Hutchings ben.hutchings@mind.be
[ Upstream commit 8f4f58f88fe0d9bd591f21f53de7dbd42baeb3fa ]
The switches supported by ksz8795 only have a per-port flag for Tag Removal. This means it is not possible to support both tagged and untagged VLANs on the same port. Reject attempts to add a VLAN that requires the flag to be changed, unless there are no VLANs currently configured.
VID 0 is excluded from this check since it is untagged regardless of the state of the flag.
On the CPU port we could support tagged and untagged VLANs at the same time. This will be enabled by a later patch.
Fixes: e66f840c08a2 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings ben.hutchings@mind.be Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/microchip/ksz8795.c | 27 +++++++++++++++++++++++++- drivers/net/dsa/microchip/ksz_common.h | 1 + 2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c index bc9ca2b0e091..c20fb6edd420 100644 --- a/drivers/net/dsa/microchip/ksz8795.c +++ b/drivers/net/dsa/microchip/ksz8795.c @@ -1099,13 +1099,38 @@ static int ksz8_port_vlan_add(struct dsa_switch *ds, int port, { bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED; struct ksz_device *dev = ds->priv; + struct ksz_port *p = &dev->ports[port]; u16 data, new_pvid = 0; u8 fid, member, valid;
if (ksz_is_ksz88x3(dev)) return -ENOTSUPP;
- ksz_port_cfg(dev, port, P_TAG_CTRL, PORT_REMOVE_TAG, untagged); + /* If a VLAN is added with untagged flag different from the + * port's Remove Tag flag, we need to change the latter. + * Ignore VID 0, which is always untagged. + */ + if (untagged != p->remove_tag && vlan->vid != 0) { + unsigned int vid; + + /* Reject attempts to add a VLAN that requires the + * Remove Tag flag to be changed, unless there are no + * other VLANs currently configured. + */ + for (vid = 1; vid < dev->num_vlans; ++vid) { + /* Skip the VID we are going to add or reconfigure */ + if (vid == vlan->vid) + continue; + + ksz8_from_vlan(dev, dev->vlan_cache[vid].table[0], + &fid, &member, &valid); + if (valid && (member & BIT(port))) + return -EINVAL; + } + + ksz_port_cfg(dev, port, P_TAG_CTRL, PORT_REMOVE_TAG, untagged); + p->remove_tag = untagged; + }
ksz8_r_vlan_table(dev, vlan->vid, &data); ksz8_from_vlan(dev, data, &fid, &member, &valid); diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h index 6afbb41ad39e..1597c63988b4 100644 --- a/drivers/net/dsa/microchip/ksz_common.h +++ b/drivers/net/dsa/microchip/ksz_common.h @@ -27,6 +27,7 @@ struct ksz_port_mib { struct ksz_port { u16 member; u16 vid_member; + bool remove_tag; /* Remove Tag flag set, for ksz8795 only */ int stp_state; struct phy_device phydev;
From: Ben Hutchings ben.hutchings@mind.be
[ Upstream commit af01754f9e3c553a2ee63b4693c79a3956e230ab ]
When a VLAN is deleted from a port, the flags in struct switchdev_obj_port_vlan are always 0. ksz8_port_vlan_del() copies the BRIDGE_VLAN_INFO_UNTAGGED flag to the port's Tag Removal flag, and therefore always clears it.
In case there are multiple VLANs configured as untagged on this port - which seems useless, but is allowed - deleting one of them changes the remaining VLANs to be tagged.
It's only ever necessary to change this flag when a VLAN is added to the port, so leave it unchanged in ksz8_port_vlan_del().
Fixes: e66f840c08a2 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings ben.hutchings@mind.be Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/microchip/ksz8795.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c index c20fb6edd420..46ef5bc79cbd 100644 --- a/drivers/net/dsa/microchip/ksz8795.c +++ b/drivers/net/dsa/microchip/ksz8795.c @@ -1167,7 +1167,6 @@ static int ksz8_port_vlan_add(struct dsa_switch *ds, int port, static int ksz8_port_vlan_del(struct dsa_switch *ds, int port, const struct switchdev_obj_port_vlan *vlan) { - bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED; struct ksz_device *dev = ds->priv; u16 data, pvid; u8 fid, member, valid; @@ -1178,8 +1177,6 @@ static int ksz8_port_vlan_del(struct dsa_switch *ds, int port, ksz_pread16(dev, port, REG_PORT_CTRL_VID, &pvid); pvid = pvid & 0xFFF;
- ksz_port_cfg(dev, port, P_TAG_CTRL, PORT_REMOVE_TAG, untagged); - ksz8_r_vlan_table(dev, vlan->vid, &data); ksz8_from_vlan(dev, data, &fid, &member, &valid);
From: Ben Hutchings ben.hutchings@mind.be
[ Upstream commit 9130c2d30c17846287b803a9803106318cbe5266 ]
On the CPU port, we can support both tagged and untagged VLANs at the same time by doing any necessary untagging in software rather than hardware. To enable that, keep the CPU port's Remove Tag flag cleared and set the dsa_switch::untag_bridge_pvid flag.
Fixes: e66f840c08a2 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings ben.hutchings@mind.be Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/microchip/ksz8795.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c index 46ef5bc79cbd..4bd735c5183c 100644 --- a/drivers/net/dsa/microchip/ksz8795.c +++ b/drivers/net/dsa/microchip/ksz8795.c @@ -1109,8 +1109,10 @@ static int ksz8_port_vlan_add(struct dsa_switch *ds, int port, /* If a VLAN is added with untagged flag different from the * port's Remove Tag flag, we need to change the latter. * Ignore VID 0, which is always untagged. + * Ignore CPU port, which will always be tagged. */ - if (untagged != p->remove_tag && vlan->vid != 0) { + if (untagged != p->remove_tag && vlan->vid != 0 && + port != dev->cpu_port) { unsigned int vid;
/* Reject attempts to add a VLAN that requires the @@ -1655,6 +1657,11 @@ static int ksz8_switch_init(struct ksz_device *dev) /* set the real number of ports */ dev->ds->num_ports = dev->port_cnt;
+ /* We rely on software untagging on the CPU port, so that we + * can support both tagged and untagged VLANs + */ + dev->ds->untag_bridge_pvid = true; + return 0; }
From: Ben Hutchings ben.hutchings@mind.be
[ Upstream commit 164844135a3f215d3018ee9d6875336beb942413 ]
Currently ksz8_port_vlan_filtering() sets or clears the VLAN Enable hardware flag. That controls discarding of packets with a VID that has not been enabled for any port on the switch.
Since it is a global flag, set the dsa_switch::vlan_filtering_is_global flag so that the DSA core understands this can't be controlled per port.
When VLAN filtering is enabled, the switch should also discard packets with a VID that's not enabled on the ingress port. Set or clear each external port's VLAN Ingress Filter flag in ksz8_port_vlan_filtering() to make that happen.
Fixes: e66f840c08a2 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings ben.hutchings@mind.be Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/microchip/ksz8795.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c index 4bd735c5183c..8e2a8103d590 100644 --- a/drivers/net/dsa/microchip/ksz8795.c +++ b/drivers/net/dsa/microchip/ksz8795.c @@ -1078,8 +1078,14 @@ static int ksz8_port_vlan_filtering(struct dsa_switch *ds, int port, bool flag, if (ksz_is_ksz88x3(dev)) return -ENOTSUPP;
+ /* Discard packets with VID not enabled on the switch */ ksz_cfg(dev, S_MIRROR_CTRL, SW_VLAN_ENABLE, flag);
+ /* Discard packets with VID not enabled on the ingress port */ + for (port = 0; port < dev->phy_port_cnt; ++port) + ksz_port_cfg(dev, port, REG_PORT_CTRL_2, PORT_INGRESS_FILTER, + flag); + return 0; }
@@ -1662,6 +1668,11 @@ static int ksz8_switch_init(struct ksz_device *dev) */ dev->ds->untag_bridge_pvid = true;
+ /* VLAN filtering is partly controlled by the global VLAN + * Enable flag + */ + dev->ds->vlan_filtering_is_global = true; + return 0; }
From: Ben Hutchings ben.hutchings@mind.be
[ Upstream commit 411d466d94a6b16a20c8b552e403b7e8ce2397a2 ]
The magic number 4 in the VLAN table lookup was the number of entries we can read and write at once. Using phy_port_cnt here doesn't make sense and presumably broke VLAN filtering for 3-port switches. Change it back to 4.
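For reference, the mapping the revert restores packs four 16-bit VLAN entries per hardware table address (as the hunks below show), so a VID splits into a table address and an index within it. A small stand-alone sketch, not driver code:

#include <stdio.h>

int main(void)
{
	unsigned int vid = 10;

	unsigned int addr  = vid / 4;	/* hardware VLAN table address */
	unsigned int index = vid & 3;	/* 16-bit entry within that address */

	printf("vid %u -> addr %u, index %u\n", vid, addr, index); /* addr 2, index 2 */
	return 0;
}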
Fixes: 4ce2a984abd8 ("net: dsa: microchip: ksz8795: use phy_port_cnt ...") Signed-off-by: Ben Hutchings ben.hutchings@mind.be Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/microchip/ksz8795.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c index 8e2a8103d590..8eb9a45c98cf 100644 --- a/drivers/net/dsa/microchip/ksz8795.c +++ b/drivers/net/dsa/microchip/ksz8795.c @@ -684,8 +684,8 @@ static void ksz8_r_vlan_entries(struct ksz_device *dev, u16 addr) shifts = ksz8->shifts;
ksz8_r_table(dev, TABLE_VLAN, addr, &data); - addr *= dev->phy_port_cnt; - for (i = 0; i < dev->phy_port_cnt; i++) { + addr *= 4; + for (i = 0; i < 4; i++) { dev->vlan_cache[addr + i].table[0] = (u16)data; data >>= shifts[VLAN_TABLE]; } @@ -699,7 +699,7 @@ static void ksz8_r_vlan_table(struct ksz_device *dev, u16 vid, u16 *vlan) u64 buf;
data = (u16 *)&buf; - addr = vid / dev->phy_port_cnt; + addr = vid / 4; index = vid & 3; ksz8_r_table(dev, TABLE_VLAN, addr, &buf); *vlan = data[index]; @@ -713,7 +713,7 @@ static void ksz8_w_vlan_table(struct ksz_device *dev, u16 vid, u16 vlan) u64 buf;
data = (u16 *)&buf; - addr = vid / dev->phy_port_cnt; + addr = vid / 4; index = vid & 3; ksz8_r_table(dev, TABLE_VLAN, addr, &buf); data[index] = vlan;
From: Takeshi Misawa jeliantsurux@gmail.com
[ Upstream commit 1090340f7ee53e824fd4eef66a4855d548110c5b ]
If an IEEE-802.15.4-RAW socket is closed before its received skbs are consumed, those skbs are leaked. Fix this by freeing sk_receive_queue in sk->sk_destruct().
syzbot report: BUG: memory leak unreferenced object 0xffff88810f644600 (size 232): comm "softirq", pid 0, jiffies 4294967032 (age 81.270s) hex dump (first 32 bytes): 10 7d 4b 12 81 88 ff ff 10 7d 4b 12 81 88 ff ff .}K......}K..... 00 00 00 00 00 00 00 00 40 7c 4b 12 81 88 ff ff ........@|K..... backtrace: [<ffffffff83651d4a>] skb_clone+0xaa/0x2b0 net/core/skbuff.c:1496 [<ffffffff83fe1b80>] ieee802154_raw_deliver net/ieee802154/socket.c:369 [inline] [<ffffffff83fe1b80>] ieee802154_rcv+0x100/0x340 net/ieee802154/socket.c:1070 [<ffffffff8367cc7a>] __netif_receive_skb_one_core+0x6a/0xa0 net/core/dev.c:5384 [<ffffffff8367cd07>] __netif_receive_skb+0x27/0xa0 net/core/dev.c:5498 [<ffffffff8367cdd9>] netif_receive_skb_internal net/core/dev.c:5603 [inline] [<ffffffff8367cdd9>] netif_receive_skb+0x59/0x260 net/core/dev.c:5662 [<ffffffff83fe6302>] ieee802154_deliver_skb net/mac802154/rx.c:29 [inline] [<ffffffff83fe6302>] ieee802154_subif_frame net/mac802154/rx.c:102 [inline] [<ffffffff83fe6302>] __ieee802154_rx_handle_packet net/mac802154/rx.c:212 [inline] [<ffffffff83fe6302>] ieee802154_rx+0x612/0x620 net/mac802154/rx.c:284 [<ffffffff83fe59a6>] ieee802154_tasklet_handler+0x86/0xa0 net/mac802154/main.c:35 [<ffffffff81232aab>] tasklet_action_common.constprop.0+0x5b/0x100 kernel/softirq.c:557 [<ffffffff846000bf>] __do_softirq+0xbf/0x2ab kernel/softirq.c:345 [<ffffffff81232f4c>] do_softirq kernel/softirq.c:248 [inline] [<ffffffff81232f4c>] do_softirq+0x5c/0x80 kernel/softirq.c:235 [<ffffffff81232fc1>] __local_bh_enable_ip+0x51/0x60 kernel/softirq.c:198 [<ffffffff8367a9a4>] local_bh_enable include/linux/bottom_half.h:32 [inline] [<ffffffff8367a9a4>] rcu_read_unlock_bh include/linux/rcupdate.h:745 [inline] [<ffffffff8367a9a4>] __dev_queue_xmit+0x7f4/0xf60 net/core/dev.c:4221 [<ffffffff83fe2db4>] raw_sendmsg+0x1f4/0x2b0 net/ieee802154/socket.c:295 [<ffffffff8363af16>] sock_sendmsg_nosec net/socket.c:654 [inline] [<ffffffff8363af16>] sock_sendmsg+0x56/0x80 net/socket.c:674 [<ffffffff8363deec>] __sys_sendto+0x15c/0x200 net/socket.c:1977 [<ffffffff8363dfb6>] __do_sys_sendto net/socket.c:1989 [inline] [<ffffffff8363dfb6>] __se_sys_sendto net/socket.c:1985 [inline] [<ffffffff8363dfb6>] __x64_sys_sendto+0x26/0x30 net/socket.c:1985
Fixes: 9ec767160357 ("net: add IEEE 802.15.4 socket family implementation") Reported-and-tested-by: syzbot+1f68113fa907bf0695a8@syzkaller.appspotmail.com Signed-off-by: Takeshi Misawa jeliantsurux@gmail.com Acked-by: Alexander Aring aahringo@redhat.com Link: https://lore.kernel.org/r/20210805075414.GA15796@DESKTOP Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ieee802154/socket.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c index a45a0401adc5..c25f7617770c 100644 --- a/net/ieee802154/socket.c +++ b/net/ieee802154/socket.c @@ -984,6 +984,11 @@ static const struct proto_ops ieee802154_dgram_ops = { .sendpage = sock_no_sendpage, };
+static void ieee802154_sock_destruct(struct sock *sk) +{ + skb_queue_purge(&sk->sk_receive_queue); +} + /* Create a socket. Initialise the socket, blank the addresses * set the state. */ @@ -1024,7 +1029,7 @@ static int ieee802154_create(struct net *net, struct socket *sock, sock->ops = ops;
sock_init_data(sock, sk); - /* FIXME: sk->sk_destruct */ + sk->sk_destruct = ieee802154_sock_destruct; sk->sk_family = PF_IEEE802154;
/* Checksums on by default */
From: Eric Dumazet edumazet@google.com
[ Upstream commit 4a2b285e7e103d4d6c6ed3e5052a0ff74a5d7f15 ]
Fix the data race reported by syzbot [1]. The issue here is that igmp_ifc_timer_expire() can update in_dev->mr_ifc_count while another change has just occurred from another context.
in_dev->mr_ifc_count is only 8 bits wide, so the race had little consequence.
[1] BUG: KCSAN: data-race in igmp_ifc_event / igmp_ifc_timer_expire
write to 0xffff8881051e3062 of 1 bytes by task 12547 on cpu 0: igmp_ifc_event+0x1d5/0x290 net/ipv4/igmp.c:821 igmp_group_added+0x462/0x490 net/ipv4/igmp.c:1356 ____ip_mc_inc_group+0x3ff/0x500 net/ipv4/igmp.c:1461 __ip_mc_join_group+0x24d/0x2c0 net/ipv4/igmp.c:2199 ip_mc_join_group_ssm+0x20/0x30 net/ipv4/igmp.c:2218 do_ip_setsockopt net/ipv4/ip_sockglue.c:1285 [inline] ip_setsockopt+0x1827/0x2a80 net/ipv4/ip_sockglue.c:1423 tcp_setsockopt+0x8c/0xa0 net/ipv4/tcp.c:3657 sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3362 __sys_setsockopt+0x18f/0x200 net/socket.c:2159 __do_sys_setsockopt net/socket.c:2170 [inline] __se_sys_setsockopt net/socket.c:2167 [inline] __x64_sys_setsockopt+0x62/0x70 net/socket.c:2167 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff8881051e3062 of 1 bytes by interrupt on cpu 1: igmp_ifc_timer_expire+0x706/0xa30 net/ipv4/igmp.c:808 call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1419 expire_timers+0x135/0x250 kernel/time/timer.c:1464 __run_timers+0x358/0x420 kernel/time/timer.c:1732 run_timer_softirq+0x19/0x30 kernel/time/timer.c:1745 __do_softirq+0x12c/0x26e kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu+0x9a/0xb0 kernel/softirq.c:636 sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1100 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 console_unlock+0x8e8/0xb30 kernel/printk/printk.c:2646 vprintk_emit+0x125/0x3d0 kernel/printk/printk.c:2174 vprintk_default+0x22/0x30 kernel/printk/printk.c:2185 vprintk+0x15a/0x170 kernel/printk/printk_safe.c:392 printk+0x62/0x87 kernel/printk/printk.c:2216 selinux_netlink_send+0x399/0x400 security/selinux/hooks.c:6041 security_netlink_send+0x42/0x90 security/security.c:2070 netlink_sendmsg+0x59e/0x7c0 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:703 [inline] sock_sendmsg net/socket.c:723 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392 ___sys_sendmsg net/socket.c:2446 [inline] __sys_sendmsg+0x1ed/0x270 net/socket.c:2475 __do_sys_sendmsg net/socket.c:2484 [inline] __se_sys_sendmsg net/socket.c:2482 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2482 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x01 -> 0x02
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 12539 Comm: syz-executor.1 Not tainted 5.14.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
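The fix below avoids the lost update by re-reading the counter and only decrementing it with a compare-and-swap that fails if another context changed the value in between. Here is a user-space sketch of that retry idiom, using C11 atomics as a stand-in for the kernel's READ_ONCE()/cmpxchg(); it illustrates the pattern only and is not the kernel code:

#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned char mr_ifc_count = 3;	/* stand-in for the 8-bit field */

static void timer_expire(void)
{
	unsigned char cur = atomic_load(&mr_ifc_count);

	while (cur) {
		/* Decrement only if nobody changed the value since we read it;
		 * on failure 'cur' is reloaded and the loop retries. */
		if (atomic_compare_exchange_weak(&mr_ifc_count, &cur, cur - 1))
			break;
	}
}

int main(void)
{
	timer_expire();
	printf("mr_ifc_count = %u\n", (unsigned)atomic_load(&mr_ifc_count)); /* 2 */
	return 0;
}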
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/igmp.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 6b3c558a4f23..a51360087b19 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -803,10 +803,17 @@ static void igmp_gq_timer_expire(struct timer_list *t) static void igmp_ifc_timer_expire(struct timer_list *t) { struct in_device *in_dev = from_timer(in_dev, t, mr_ifc_timer); + u8 mr_ifc_count;
igmpv3_send_cr(in_dev); - if (in_dev->mr_ifc_count) { - in_dev->mr_ifc_count--; +restart: + mr_ifc_count = READ_ONCE(in_dev->mr_ifc_count); + + if (mr_ifc_count) { + if (cmpxchg(&in_dev->mr_ifc_count, + mr_ifc_count, + mr_ifc_count - 1) != mr_ifc_count) + goto restart; igmp_ifc_start_timer(in_dev, unsolicited_report_interval(in_dev)); } @@ -818,7 +825,7 @@ static void igmp_ifc_event(struct in_device *in_dev) struct net *net = dev_net(in_dev->dev); if (IGMP_V1_SEEN(in_dev) || IGMP_V2_SEEN(in_dev)) return; - in_dev->mr_ifc_count = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; + WRITE_ONCE(in_dev->mr_ifc_count, in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv); igmp_ifc_start_timer(in_dev, 1); }
@@ -957,7 +964,7 @@ static bool igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb, in_dev->mr_qri; } /* cancel the interface change timer */ - in_dev->mr_ifc_count = 0; + WRITE_ONCE(in_dev->mr_ifc_count, 0); if (del_timer(&in_dev->mr_ifc_timer)) __in_dev_put(in_dev); /* clear deleted report items */ @@ -1724,7 +1731,7 @@ void ip_mc_down(struct in_device *in_dev) igmp_group_dropped(pmc);
#ifdef CONFIG_IP_MULTICAST - in_dev->mr_ifc_count = 0; + WRITE_ONCE(in_dev->mr_ifc_count, 0); if (del_timer(&in_dev->mr_ifc_timer)) __in_dev_put(in_dev); in_dev->mr_gq_running = 0; @@ -1941,7 +1948,7 @@ static int ip_mc_del_src(struct in_device *in_dev, __be32 *pmca, int sfmode, pmc->sfmode = MCAST_INCLUDE; #ifdef CONFIG_IP_MULTICAST pmc->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; - in_dev->mr_ifc_count = pmc->crcount; + WRITE_ONCE(in_dev->mr_ifc_count, pmc->crcount); for (psf = pmc->sources; psf; psf = psf->sf_next) psf->sf_crcount = 0; igmp_ifc_event(pmc->interface); @@ -2120,7 +2127,7 @@ static int ip_mc_add_src(struct in_device *in_dev, __be32 *pmca, int sfmode, /* else no filters; keep old mode for reports */
pmc->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; - in_dev->mr_ifc_count = pmc->crcount; + WRITE_ONCE(in_dev->mr_ifc_count, pmc->crcount); for (psf = pmc->sources; psf; psf = psf->sf_next) psf->sf_crcount = 0; igmp_ifc_event(in_dev);
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit cd391280bf4693ceddca8f19042cff42f98c1a89 ]
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into multiple netlink skbs if the buffer provided by user space is too small (one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that point, and then the dump resumes on the same port with a new skb, and FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to drivers, drivers must check for the -EMSGSIZE error code returned by it. Otherwise, when a netlink skb becomes full, DSA will no longer save newly dumped FDB entries to it, but the driver will continue dumping. So FDB entries will be missing from the dump.
Fix the broken backpressure by propagating the "cb" return code and allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
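A small user-space model of that contract (purely illustrative; the entry values, buffer size and hard-coded resume index are made up): the driver-side loop must stop and return the callback's error so the caller can allocate a new buffer and resume, instead of silently dropping what the callback could not store:

#include <errno.h>
#include <stdio.h>

#define BUF_SLOTS 3	/* pretend one netlink skb holds three entries */

static int used;

static int dump_cb(int entry, void *data)	/* models dsa_slave_port_fdb_do_dump() */
{
	(void)data;
	if (used == BUF_SLOTS)
		return -EMSGSIZE;	/* skb full: ask the driver to stop */
	printf("dumped entry %d\n", entry);
	used++;
	return 0;
}

static int fdb_dump(int start, int (*cb)(int, void *), void *data)	/* models a driver dump */
{
	for (int i = start; i < 10; i++) {
		int err = cb(i, data);
		if (err)
			return err;	/* propagate instead of dumping on */
	}
	return 0;
}

int main(void)
{
	/* First pass fills the buffer after three entries... */
	if (fdb_dump(0, dump_cb, NULL) == -EMSGSIZE) {
		/* ...so the caller starts over with a fresh buffer; the real
		 * code tracks the resume point in the dump context. */
		used = 0;
		fdb_dump(3, dump_cb, NULL);
	}
	return 0;
}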
Fixes: e4b27ebc780f ("net: dsa: Add DSA driver for Hirschmann Hellcreek switches") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Acked-by: Kurt Kanzenbach kurt@linutronix.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/hirschmann/hellcreek.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/hirschmann/hellcreek.c b/drivers/net/dsa/hirschmann/hellcreek.c index 4d78219da253..50109218baad 100644 --- a/drivers/net/dsa/hirschmann/hellcreek.c +++ b/drivers/net/dsa/hirschmann/hellcreek.c @@ -912,6 +912,7 @@ static int hellcreek_fdb_dump(struct dsa_switch *ds, int port, { struct hellcreek *hellcreek = ds->priv; u16 entries; + int ret = 0; size_t i;
mutex_lock(&hellcreek->reg_lock); @@ -944,12 +945,14 @@ static int hellcreek_fdb_dump(struct dsa_switch *ds, int port, if (!(entry.portmask & BIT(port))) continue;
- cb(entry.mac, 0, entry.is_static, data); + ret = cb(entry.mac, 0, entry.is_static, data); + if (ret) + break; }
mutex_unlock(&hellcreek->reg_lock);
- return 0; + return ret; }
static int hellcreek_vlan_filtering(struct dsa_switch *ds, int port,
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit ada2fee185d8145afb89056558bb59545b9dbdd0 ]
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into multiple netlink skbs if the buffer provided by user space is too small (one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that point, and then the dump resumes on the same port with a new skb, and FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to drivers, drivers must check for the -EMSGSIZE error code returned by it. Otherwise, when a netlink skb becomes full, DSA will no longer save newly dumped FDB entries to it, but the driver will continue dumping. So FDB entries will be missing from the dump.
Fix the broken backpressure by propagating the "cb" return code and allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
Fixes: ab335349b852 ("net: dsa: lan9303: Add port_fast_age and port_fdb_dump methods") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/lan9303-core.c | 34 +++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c index 344374025426..d7ce281570b5 100644 --- a/drivers/net/dsa/lan9303-core.c +++ b/drivers/net/dsa/lan9303-core.c @@ -557,12 +557,12 @@ static int lan9303_alr_make_entry_raw(struct lan9303 *chip, u32 dat0, u32 dat1) return 0; }
-typedef void alr_loop_cb_t(struct lan9303 *chip, u32 dat0, u32 dat1, - int portmap, void *ctx); +typedef int alr_loop_cb_t(struct lan9303 *chip, u32 dat0, u32 dat1, + int portmap, void *ctx);
-static void lan9303_alr_loop(struct lan9303 *chip, alr_loop_cb_t *cb, void *ctx) +static int lan9303_alr_loop(struct lan9303 *chip, alr_loop_cb_t *cb, void *ctx) { - int i; + int ret = 0, i;
mutex_lock(&chip->alr_mutex); lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, @@ -582,13 +582,17 @@ static void lan9303_alr_loop(struct lan9303 *chip, alr_loop_cb_t *cb, void *ctx) LAN9303_ALR_DAT1_PORT_BITOFFS; portmap = alrport_2_portmap[alrport];
- cb(chip, dat0, dat1, portmap, ctx); + ret = cb(chip, dat0, dat1, portmap, ctx); + if (ret) + break;
lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, LAN9303_ALR_CMD_GET_NEXT); lan9303_write_switch_reg(chip, LAN9303_SWE_ALR_CMD, 0); } mutex_unlock(&chip->alr_mutex); + + return ret; }
static void alr_reg_to_mac(u32 dat0, u32 dat1, u8 mac[6]) @@ -606,18 +610,20 @@ struct del_port_learned_ctx { };
/* Clear learned (non-static) entry on given port */ -static void alr_loop_cb_del_port_learned(struct lan9303 *chip, u32 dat0, - u32 dat1, int portmap, void *ctx) +static int alr_loop_cb_del_port_learned(struct lan9303 *chip, u32 dat0, + u32 dat1, int portmap, void *ctx) { struct del_port_learned_ctx *del_ctx = ctx; int port = del_ctx->port;
if (((BIT(port) & portmap) == 0) || (dat1 & LAN9303_ALR_DAT1_STATIC)) - return; + return 0;
/* learned entries has only one port, we can just delete */ dat1 &= ~LAN9303_ALR_DAT1_VALID; /* delete entry */ lan9303_alr_make_entry_raw(chip, dat0, dat1); + + return 0; }
struct port_fdb_dump_ctx { @@ -626,19 +632,19 @@ struct port_fdb_dump_ctx { dsa_fdb_dump_cb_t *cb; };
-static void alr_loop_cb_fdb_port_dump(struct lan9303 *chip, u32 dat0, - u32 dat1, int portmap, void *ctx) +static int alr_loop_cb_fdb_port_dump(struct lan9303 *chip, u32 dat0, + u32 dat1, int portmap, void *ctx) { struct port_fdb_dump_ctx *dump_ctx = ctx; u8 mac[ETH_ALEN]; bool is_static;
if ((BIT(dump_ctx->port) & portmap) == 0) - return; + return 0;
alr_reg_to_mac(dat0, dat1, mac); is_static = !!(dat1 & LAN9303_ALR_DAT1_STATIC); - dump_ctx->cb(mac, 0, is_static, dump_ctx->data); + return dump_ctx->cb(mac, 0, is_static, dump_ctx->data); }
/* Set a static ALR entry. Delete entry if port_map is zero */ @@ -1210,9 +1216,7 @@ static int lan9303_port_fdb_dump(struct dsa_switch *ds, int port, };
dev_dbg(chip->dev, "%s(%d)\n", __func__, port); - lan9303_alr_loop(chip, alr_loop_cb_fdb_port_dump, &dump_ctx); - - return 0; + return lan9303_alr_loop(chip, alr_loop_cb_fdb_port_dump, &dump_ctx); }
static int lan9303_port_mdb_prepare(struct dsa_switch *ds, int port,
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 871a73a1c8f55da0a3db234e9dd816ea4fd546f2 ]
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into multiple netlink skbs if the buffer provided by user space is too small (one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that point, and then the dump resumes on the same port with a new skb, and FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to drivers, drivers must check for the -EMSGSIZE error code returned by it. Otherwise, when a netlink skb becomes full, DSA will no longer save newly dumped FDB entries to it, but the driver will continue dumping. So FDB entries will be missing from the dump.
Fix the broken backpressure by propagating the "cb" return code and allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
Fixes: 58c59ef9e930 ("net: dsa: lantiq: Add Forwarding Database access") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/lantiq_gswip.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/net/dsa/lantiq_gswip.c b/drivers/net/dsa/lantiq_gswip.c index 314ae78bbdd6..e78026ef6d8c 100644 --- a/drivers/net/dsa/lantiq_gswip.c +++ b/drivers/net/dsa/lantiq_gswip.c @@ -1404,11 +1404,17 @@ static int gswip_port_fdb_dump(struct dsa_switch *ds, int port, addr[1] = mac_bridge.key[2] & 0xff; addr[0] = (mac_bridge.key[2] >> 8) & 0xff; if (mac_bridge.val[1] & GSWIP_TABLE_MAC_BRIDGE_STATIC) { - if (mac_bridge.val[0] & BIT(port)) - cb(addr, 0, true, data); + if (mac_bridge.val[0] & BIT(port)) { + err = cb(addr, 0, true, data); + if (err) + return err; + } } else { - if (((mac_bridge.val[0] & GENMASK(7, 4)) >> 4) == port) - cb(addr, 0, false, data); + if (((mac_bridge.val[0] & GENMASK(7, 4)) >> 4) == port) { + err = cb(addr, 0, false, data); + if (err) + return err; + } } } return 0;
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 21b52fed928e96d2f75d2f6aa9eac7a4b0b55d22 ]
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into multiple netlink skbs if the buffer provided by user space is too small (one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that point, and then the dump resumes on the same port with a new skb, and FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to drivers, drivers must check for the -EMSGSIZE error code returned by it. Otherwise, when a netlink skb becomes full, DSA will no longer save newly dumped FDB entries to it, but the driver will continue dumping. So FDB entries will be missing from the dump.
Fix the broken backpressure by propagating the "cb" return code and allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
Fixes: 291d1e72b756 ("net: dsa: sja1105: Add support for FDB and MDB management") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/sja1105/sja1105_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index 4b05a2424623..0aaf599119cd 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -1625,7 +1625,9 @@ static int sja1105_fdb_dump(struct dsa_switch *ds, int port, /* We need to hide the dsa_8021q VLANs from the user. */ if (priv->vlan_state == SJA1105_VLAN_UNAWARE) l2_lookup.vlanid = 0; - cb(macaddr, l2_lookup.vlanid, l2_lookup.lockeds, data); + rc = cb(macaddr, l2_lookup.vlanid, l2_lookup.lockeds, data); + if (rc) + return rc; } return 0; }
From: Andre Przywara andre.przywara@arm.com
[ Upstream commit d1dee814168538eba166ae4150b37f0d88257884 ]
When we are building all the various pinctrl structures for the Allwinner pinctrl devices, we do some estimation about the maximum number of distinct function (names) that we will need.
So far we take the number of pins as an upper bound, even though we can actually have up to four special functions per pin. This wasn't a problem until now, since we indeed have typically far more pins than functions, and most pins share common functions.
However the H616 "-r" pin controller has only two pins, but four functions, so we run over the end of the array when we are looking for a matching function name in sunxi_pinctrl_add_function - there is no NULL sentinel left that would terminate the loop:
[ 8.200648] Unable to handle kernel paging request at virtual address fffdff7efbefaff5 [ 8.209179] Mem abort info: .... [ 8.368456] Call trace: [ 8.370925] __pi_strcmp+0x90/0xf0 [ 8.374559] sun50i_h616_r_pinctrl_probe+0x1c/0x28 [ 8.379557] platform_probe+0x68/0xd8
Do an actual worst-case allocation (four functions per pin, three common functions and the sentinel) for the initial array. This heavily overestimates the number of functions in the common case, but we will reallocate the array later with the actual number of functions, so the overallocation is only temporary.
Fixes: 561c1cf17c46 ("pinctrl: sunxi: Add support for the Allwinner H616-R pin controller") Signed-off-by: Andre Przywara andre.przywara@arm.com Acked-by: Maxime Ripard maxime@cerno.tech Link: https://lore.kernel.org/r/20210722132548.22121-1-andre.przywara@arm.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/sunxi/pinctrl-sunxi.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/pinctrl/sunxi/pinctrl-sunxi.c b/drivers/pinctrl/sunxi/pinctrl-sunxi.c index dc8d39ae045b..9c7679c06dca 100644 --- a/drivers/pinctrl/sunxi/pinctrl-sunxi.c +++ b/drivers/pinctrl/sunxi/pinctrl-sunxi.c @@ -1219,10 +1219,12 @@ static int sunxi_pinctrl_build_state(struct platform_device *pdev) }
/* - * We suppose that we won't have any more functions than pins, - * we'll reallocate that later anyway + * Find an upper bound for the maximum number of functions: in + * the worst case we have gpio_in, gpio_out, irq and up to four + * special functions per pin, plus one entry for the sentinel. + * We'll reallocate that later anyway. */ - pctl->functions = kcalloc(pctl->ngroups, + pctl->functions = kcalloc(4 * pctl->ngroups + 4, sizeof(*pctl->functions), GFP_KERNEL); if (!pctl->functions)
From: Nikolay Aleksandrov nikolay@nvidia.com
[ Upstream commit 45a687879b31caae4032abd1c2402e289d2b8083 ]
Ignore fdb flags when adding port extern learn entries and always set BR_FDB_LOCAL flag when adding bridge extern learn entries. This is closest to the behaviour we had before and avoids breaking any use cases which were allowed.
This patch fixes iproute2 calls which assume NUD_PERMANENT and were allowed before, example: $ bridge fdb add 00:11:22:33:44:55 dev swp1 extern_learn
Extern learn entries are allowed to roam, but do not expire, so static or dynamic flags make no sense for them.
Also add a comment for future reference.
Fixes: eb100e0e24a2 ("net: bridge: allow to add externally learned entries from user-space") Fixes: 0541a6293298 ("net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry") Reviewed-by: Ido Schimmel idosch@nvidia.com Tested-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Nikolay Aleksandrov nikolay@nvidia.com Reviewed-by: Vladimir Oltean vladimir.oltean@nxp.com Link: https://lore.kernel.org/r/20210810110010.43859-1-razor@blackwall.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/uapi/linux/neighbour.h | 7 +++++-- net/bridge/br.c | 3 +-- net/bridge/br_fdb.c | 11 ++++------- net/bridge/br_private.h | 2 +- 4 files changed, 11 insertions(+), 12 deletions(-)
diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h index dc8b72201f6c..00a60695fa53 100644 --- a/include/uapi/linux/neighbour.h +++ b/include/uapi/linux/neighbour.h @@ -66,8 +66,11 @@ enum { #define NUD_NONE 0x00
/* NUD_NOARP & NUD_PERMANENT are pseudostates, they never change - and make no address resolution or NUD. - NUD_PERMANENT also cannot be deleted by garbage collectors. + * and make no address resolution or NUD. + * NUD_PERMANENT also cannot be deleted by garbage collectors. + * When NTF_EXT_LEARNED is set for a bridge fdb entry the different cache entry + * states don't make sense and thus are ignored. Such entries don't age and + * can roam. */
struct nda_cacheinfo { diff --git a/net/bridge/br.c b/net/bridge/br.c index bbab9984f24e..ef743f94254d 100644 --- a/net/bridge/br.c +++ b/net/bridge/br.c @@ -166,8 +166,7 @@ static int br_switchdev_event(struct notifier_block *unused, case SWITCHDEV_FDB_ADD_TO_BRIDGE: fdb_info = ptr; err = br_fdb_external_learn_add(br, p, fdb_info->addr, - fdb_info->vid, - fdb_info->is_local, false); + fdb_info->vid, false); if (err) { err = notifier_from_errno(err); break; diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index 87ce52bba649..3451c888ff79 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -1026,10 +1026,7 @@ static int __br_fdb_add(struct ndmsg *ndm, struct net_bridge *br, "FDB entry towards bridge must be permanent"); return -EINVAL; } - - err = br_fdb_external_learn_add(br, p, addr, vid, - ndm->ndm_state & NUD_PERMANENT, - true); + err = br_fdb_external_learn_add(br, p, addr, vid, true); } else { spin_lock_bh(&br->hash_lock); err = fdb_add_entry(br, p, addr, ndm, nlh_flags, vid, nfea_tb); @@ -1257,7 +1254,7 @@ void br_fdb_unsync_static(struct net_bridge *br, struct net_bridge_port *p) }
int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p, - const unsigned char *addr, u16 vid, bool is_local, + const unsigned char *addr, u16 vid, bool swdev_notify) { struct net_bridge_fdb_entry *fdb; @@ -1275,7 +1272,7 @@ int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p, if (swdev_notify) flags |= BIT(BR_FDB_ADDED_BY_USER);
- if (is_local) + if (!p) flags |= BIT(BR_FDB_LOCAL);
fdb = fdb_create(br, p, addr, vid, flags); @@ -1304,7 +1301,7 @@ int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p, if (swdev_notify) set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
- if (is_local) + if (!p) set_bit(BR_FDB_LOCAL, &fdb->flags);
if (modified) diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 4e3d26e0a2d1..e013d33f1c7c 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -707,7 +707,7 @@ int br_fdb_get(struct sk_buff *skb, struct nlattr *tb[], struct net_device *dev, int br_fdb_sync_static(struct net_bridge *br, struct net_bridge_port *p); void br_fdb_unsync_static(struct net_bridge *br, struct net_bridge_port *p); int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p, - const unsigned char *addr, u16 vid, bool is_local, + const unsigned char *addr, u16 vid, bool swdev_notify); int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port *p, const unsigned char *addr, u16 vid,
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit 519133debcc19f5c834e7e28480b60bdc234fe02 ]
I got a memleak report:
BUG: memory leak unreferenced object 0x607ee521a658 (size 240): comm "syz-executor.0", pid 955, jiffies 4294780569 (age 16.449s) hex dump (first 32 bytes, cpu 1): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d830ea5a>] br_multicast_add_port+0x1c2/0x300 net/bridge/br_multicast.c:1693 [<00000000274d9a71>] new_nbp net/bridge/br_if.c:435 [inline] [<00000000274d9a71>] br_add_if+0x670/0x1740 net/bridge/br_if.c:611 [<0000000012ce888e>] do_set_master net/core/rtnetlink.c:2513 [inline] [<0000000012ce888e>] do_set_master+0x1aa/0x210 net/core/rtnetlink.c:2487 [<0000000099d1cafc>] __rtnl_newlink+0x1095/0x13e0 net/core/rtnetlink.c:3457 [<00000000a01facc0>] rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3488 [<00000000acc9186c>] rtnetlink_rcv_msg+0x369/0xa10 net/core/rtnetlink.c:5550 [<00000000d4aabb9c>] netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2504 [<00000000bc2e12a3>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] [<00000000bc2e12a3>] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1340 [<00000000e4dc2d0e>] netlink_sendmsg+0x789/0xc70 net/netlink/af_netlink.c:1929 [<000000000d22c8b3>] sock_sendmsg_nosec net/socket.c:654 [inline] [<000000000d22c8b3>] sock_sendmsg+0x139/0x170 net/socket.c:674 [<00000000e281417a>] ____sys_sendmsg+0x658/0x7d0 net/socket.c:2350 [<00000000237aa2ab>] ___sys_sendmsg+0xf8/0x170 net/socket.c:2404 [<000000004f2dc381>] __sys_sendmsg+0xd3/0x190 net/socket.c:2433 [<0000000005feca6c>] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:47 [<000000007304477d>] entry_SYSCALL_64_after_hwframe+0x44/0xae
On the error paths of br_add_if(), the p->mcast_stats allocated in new_nbp() needs to be freed, or it will be leaked.
Fixes: 1080ab95e3c7 ("net: bridge: add support for IGMP/MLD stats and export them via netlink") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com Acked-by: Nikolay Aleksandrov nikolay@nvidia.com Link: https://lore.kernel.org/r/20210809132023.978546-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/br_if.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c index 6e4a32354a13..14cd6ef96111 100644 --- a/net/bridge/br_if.c +++ b/net/bridge/br_if.c @@ -616,6 +616,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
err = dev_set_allmulti(dev, 1); if (err) { + br_multicast_del_port(p); kfree(p); /* kobject not yet init'd, manually free */ goto err1; } @@ -729,6 +730,7 @@ err4: err3: sysfs_remove_link(br->ifobj, p->dev->name); err2: + br_multicast_del_port(p); kobject_put(&p->kobj); dev_set_allmulti(dev, -1); err1:
From: Willy Tarreau w@1wt.eu
[ Upstream commit 6922110d152e56d7569616b45a1f02876cf3eb9f ]
After migrating my laptop from 4.19-LTS to 5.4-LTS a while ago I noticed that my Ethernet port to which a bond and a VLAN interface are attached appeared to remain up after resuming from suspend with the cable unplugged (and that problem still persists with 5.10-LTS).
What happens is the following:
- the network driver (e1000e here) prepares to suspend, calls e1000e_down() which calls netif_carrier_off() to signal that the link is going down. - netif_carrier_off() adds a link_watch event to the list of events for this device - the device is completely stopped. - the machine suspends - the cable is unplugged and the machine brought to another location - the machine is resumed - the queued linkwatch events are processed for the device - the device doesn't yet have the __LINK_STATE_PRESENT bit and its events are silently dropped - the device is resumed with its link down - the upper VLAN and bond interfaces are never notified that the link had been turned down and remain up - the only way to provoke a change is to physically connect the machine to a port and possibly unplug it.
The state after resume looks like this:

  $ ip -br li | egrep 'bond|eth'
  bond0          UP    e8:6a:64:64:64:64 <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP>
  eth0           DOWN  e8:6a:64:64:64:64 <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP>
  eth0.2@eth0    UP    e8:6a:64:64:64:64 <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP>
Placing an explicit call to netdev_state_change() either in the suspend or the resume code of the NIC driver worked around this, but that solution is not satisfying.
The issue really lies in link_watch, which loses events that it ought not to. The test for the device being present was added by commit 124eee3f6955 ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev") in 4.20 to avoid accessing devices that are not present.
Instead of dropping events, this patch proceeds slightly differently by postponing their handling so that they happen after the device is fully resumed.
Fixes: 124eee3f6955 ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev") Link: https://lists.openwall.net/netdev/2018/03/15/62 Cc: Heiner Kallweit hkallweit1@gmail.com Cc: Geert Uytterhoeven geert+renesas@glider.be Cc: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Willy Tarreau w@1wt.eu Link: https://lore.kernel.org/r/20210809160628.22623-1-w@1wt.eu Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/link_watch.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/core/link_watch.c b/net/core/link_watch.c index 75431ca9300f..1a455847da54 100644 --- a/net/core/link_watch.c +++ b/net/core/link_watch.c @@ -158,7 +158,7 @@ static void linkwatch_do_dev(struct net_device *dev) clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
rfc2863_policy(dev); - if (dev->flags & IFF_UP && netif_device_present(dev)) { + if (dev->flags & IFF_UP) { if (netif_carrier_ok(dev)) dev_activate(dev); else @@ -204,7 +204,8 @@ static void __linkwatch_run_queue(int urgent_only) dev = list_first_entry(&wrk, struct net_device, link_watch_list); list_del_init(&dev->link_watch_list);
- if (urgent_only && !linkwatch_urgent_event(dev)) { + if (!netif_device_present(dev) || + (urgent_only && !linkwatch_urgent_event(dev))) { list_add_tail(&dev->link_watch_list, &lweventlist); continue; }
From: Neal Cardwell ncardwell@google.com
[ Upstream commit 6de035fec045f8ae5ee5f3a02373a18b939e91fb ]
Currently if BBR congestion control is initialized after more than 2B packets have been delivered, depending on the phase of the tp->delivered counter the tracking of BBR round trips can get stuck.
The bug arises because if tp->delivered is between 2^31 and 2^32 at the time the BBR congestion control module is initialized, then the initialization of bbr->next_rtt_delivered to 0 will cause the logic to believe that the end of the round trip is still billions of packets in the future. More specifically, the following check will fail repeatedly:
!before(rs->prior_delivered, bbr->next_rtt_delivered)
and thus the connection will take up to 2B packets delivered before that check will pass and the connection will set:
bbr->round_start = 1;
This could cause many mechanisms in BBR to fail to trigger, for example bbr_check_full_bw_reached() would likely never exit STARTUP.
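For illustration, here is a small standalone userspace sketch of that check, assuming before() is the usual signed 32-bit wraparound comparison used for these counters; the variable names mirror the BBR fields but the program itself is not kernel code:

#include <stdint.h>
#include <stdio.h>

/* before(a, b): true if a is "earlier" than b modulo 2^32 (signed difference) */
static int before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

int main(void)
{
	uint32_t delivered = 0x90000000u;	/* tp->delivered already > 2^31 */
	uint32_t next_rtt_old = 0;		/* old init: always 0 */
	uint32_t next_rtt_new = delivered;	/* fixed init: tp->delivered */

	/* Old init: the round-end test stays false for ~2B more packets. */
	printf("old init: round ends now? %d\n", !before(delivered, next_rtt_old));
	/* Fixed init: the very next delivery sample can end the round. */
	printf("new init: round ends now? %d\n", !before(delivered, next_rtt_new));
	return 0;
}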
This bug is 5 years old and has not been observed, and as a practical matter this would likely rarely trigger, since it would require transferring at least 2B packets, or likely more than 3 terabytes of data, before switching congestion control algorithms to BBR.
This patch is a stable candidate for kernels as far back as v4.9, when tcp_bbr.c was added.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell ncardwell@google.com Reviewed-by: Yuchung Cheng ycheng@google.com Reviewed-by: Kevin Yang yyd@google.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20210811024056.235161-1-ncardwell@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_bbr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c index 6ea3dc2e4219..6274462b86b4 100644 --- a/net/ipv4/tcp_bbr.c +++ b/net/ipv4/tcp_bbr.c @@ -1041,7 +1041,7 @@ static void bbr_init(struct sock *sk) bbr->prior_cwnd = 0; tp->snd_ssthresh = TCP_INFINITE_SSTHRESH; bbr->rtt_cnt = 0; - bbr->next_rtt_delivered = 0; + bbr->next_rtt_delivered = tp->delivered; bbr->prev_ca_state = TCP_CA_Open; bbr->packet_conservation = 0;
From: Eric Dumazet edumazet@google.com
[ Upstream commit b69dd5b3780a7298bd893816a09da751bc0636f7 ]
Some arches only support cmpxchg() on 4-byte and 8-byte quantities. Increase the width of mr_ifc_count to 32 bits to fix this problem.
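As a rough illustration of why the width matters: the timer path (after the data-race fix referenced in the Fixes tag) updates this counter with a compare-and-swap retry. Below is a minimal userspace approximation of that pattern using C11 atomics; the helper name is illustrative, not the kernel's, and the logic is simplified. The point is that the operand must be at least 32 bits wide for such an operation to be available on every architecture.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for in_dev->mr_ifc_count after the fix (now u32). */
static _Atomic uint32_t mr_ifc_count = 3;

/* Decrement the "interface change" retransmit counter without a lock. */
static void ifc_timer_tick(void)
{
	uint32_t old = atomic_load(&mr_ifc_count);

	while (old != 0 &&
	       !atomic_compare_exchange_weak(&mr_ifc_count, &old, old - 1))
		;	/* old was refreshed by the failed CAS, just retry */

	printf("mr_ifc_count is now %u\n", atomic_load(&mr_ifc_count));
}

int main(void)
{
	ifc_timer_tick();
	ifc_timer_tick();
	return 0;
}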
Fixes: 4a2b285e7e10 ("net: igmp: fix data-race in igmp_ifc_timer_expire()") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Guenter Roeck linux@roeck-us.net Link: https://lore.kernel.org/r/20210811195715.3684218-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/inetdevice.h | 2 +- net/ipv4/igmp.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h index 53aa0343bf69..aaf4f1b4c277 100644 --- a/include/linux/inetdevice.h +++ b/include/linux/inetdevice.h @@ -41,7 +41,7 @@ struct in_device { unsigned long mr_qri; /* Query Response Interval */ unsigned char mr_qrv; /* Query Robustness Variable */ unsigned char mr_gq_running; - unsigned char mr_ifc_count; + u32 mr_ifc_count; struct timer_list mr_gq_timer; /* general query timer */ struct timer_list mr_ifc_timer; /* interface change timer */
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index a51360087b19..00576bae183d 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -803,7 +803,7 @@ static void igmp_gq_timer_expire(struct timer_list *t) static void igmp_ifc_timer_expire(struct timer_list *t) { struct in_device *in_dev = from_timer(in_dev, t, mr_ifc_timer); - u8 mr_ifc_count; + u32 mr_ifc_count;
igmpv3_send_cr(in_dev); restart:
From: Matt Roper matthew.d.roper@intel.com
[ Upstream commit 24d032e2359e3abc926b3d423f49a7c33e0b7836 ]
The SFC_DONE register lives within the corresponding VD0/VD2/VD4/VD6 forcewake domain and is not accessible if the vdbox in that domain is fused off and the forcewake is not initialized.
This mistake went unnoticed because until recently we were using the wrong register offset for the SFC_DONE register; once the register offset was corrected, we started hitting errors like
<4> [544.989065] i915 0000:cc:00.0: Uninitialized forcewake domain(s) 0x80 accessed at 0x1ce000
on parts with fused-off vdbox engines.
Fixes: e50dbdbfd9fb ("drm/i915/tgl: Add SFC instdone to error state") Fixes: 9c9c6d0ab08a ("drm/i915: Correct SFC_DONE register offset") Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Mika Kuoppala mika.kuoppala@linux.intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20210806174130.1058960-1-matth... Reviewed-by: José Roberto de Souza jose.souza@intel.com (cherry picked from commit c5589bb5dccb0c5cb74910da93663f489589f3ce) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com [Changed Fixes tag to match the cherry-picked 82929a2140eb] Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/i915_gpu_error.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index bb181fe5d47e..725f241a428c 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -728,9 +728,18 @@ static void err_print_gt(struct drm_i915_error_state_buf *m, if (INTEL_GEN(m->i915) >= 12) { int i;
- for (i = 0; i < GEN12_SFC_DONE_MAX; i++) + for (i = 0; i < GEN12_SFC_DONE_MAX; i++) { + /* + * SFC_DONE resides in the VD forcewake domain, so it + * only exists if the corresponding VCS engine is + * present. + */ + if (!HAS_ENGINE(gt->_gt, _VCS(i * 2))) + continue; + err_printf(m, " SFC_DONE[%d]: 0x%08x\n", i, gt->sfc_done[i]); + }
err_printf(m, " GAM_DONE: 0x%08x\n", gt->gam_done); } @@ -1586,6 +1595,14 @@ static void gt_record_regs(struct intel_gt_coredump *gt)
if (INTEL_GEN(i915) >= 12) { for (i = 0; i < GEN12_SFC_DONE_MAX; i++) { + /* + * SFC_DONE resides in the VD forcewake domain, so it + * only exists if the corresponding VCS engine is + * present. + */ + if (!HAS_ENGINE(gt->_gt, _VCS(i * 2))) + continue; + gt->sfc_done[i] = intel_uncore_read(uncore, GEN12_SFC_DONE(i)); }
From: Maximilian Heyne mheyne@amazon.de
[ Upstream commit 88ca2521bd5b4e8b83743c01a2d4cb09325b51e9 ]
There is a TOCTOU issue in set_evtchn_to_irq. Rows in the evtchn_to_irq mapping are lazily allocated in this function. The check for whether the row is already present and the row initialization are not synchronized. Two threads can allocate a new row for evtchn_to_irq at the same time and each add the irq mapping to its own newly allocated row. One thread will overwrite what the other has set for evtchn_to_irq[row] and therefore that irq mapping is lost. This will trigger a BUG_ON later in bind_evtchn_to_cpu:
INFO: pci 0000:1a:15.4: [1d0f:8061] type 00 class 0x010802 INFO: nvme 0000:1a:12.1: enabling device (0000 -> 0002) INFO: nvme nvme77: 1/0/0 default/read/poll queues CRIT: kernel BUG at drivers/xen/events/events_base.c:427! WARN: invalid opcode: 0000 [#1] SMP NOPTI WARN: Workqueue: nvme-reset-wq nvme_reset_work [nvme] WARN: RIP: e030:bind_evtchn_to_cpu+0xc2/0xd0 WARN: Call Trace: WARN: set_affinity_irq+0x121/0x150 WARN: irq_do_set_affinity+0x37/0xe0 WARN: irq_setup_affinity+0xf6/0x170 WARN: irq_startup+0x64/0xe0 WARN: __setup_irq+0x69e/0x740 WARN: ? request_threaded_irq+0xad/0x160 WARN: request_threaded_irq+0xf5/0x160 WARN: ? nvme_timeout+0x2f0/0x2f0 [nvme] WARN: pci_request_irq+0xa9/0xf0 WARN: ? pci_alloc_irq_vectors_affinity+0xbb/0x130 WARN: queue_request_irq+0x4c/0x70 [nvme] WARN: nvme_reset_work+0x82d/0x1550 [nvme] WARN: ? check_preempt_wakeup+0x14f/0x230 WARN: ? check_preempt_curr+0x29/0x80 WARN: ? nvme_irq_check+0x30/0x30 [nvme] WARN: process_one_work+0x18e/0x3c0 WARN: worker_thread+0x30/0x3a0 WARN: ? process_one_work+0x3c0/0x3c0 WARN: kthread+0x113/0x130 WARN: ? kthread_park+0x90/0x90 WARN: ret_from_fork+0x3a/0x50
This patch sets evtchn_to_irq rows via a cmpxchg operation so that they will be set only once. The row is now cleared before writing it to evtchn_to_irq in order to not create a race once the row is visible for other threads.
While at it, do not require the page to be zeroed, because it will be overwritten with -1's in clear_evtchn_to_irq_row anyway.
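For reference, the underlying idiom is "prepare the row privately, publish it with cmpxchg, free it if somebody else won". A minimal userspace sketch of that idiom follows; the names and the row size are illustrative, not the driver's.

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define EVTCHN_PER_ROW 1024	/* illustrative row size */

/* Illustrative stand-in for one evtchn_to_irq[row] slot. */
static int *_Atomic row_slot;

static int install_row(void)
{
	int *expected = NULL;
	int *row = malloc(EVTCHN_PER_ROW * sizeof(*row));

	if (!row)
		return -1;

	/* Fill the row with -1 before it can become visible to anyone else. */
	memset(row, 0xff, EVTCHN_PER_ROW * sizeof(*row));

	/* Publish atomically; if another thread beat us to it, drop ours. */
	if (!atomic_compare_exchange_strong(&row_slot, &expected, row))
		free(row);

	return 0;
}

int main(void)
{
	install_row();
	install_row();	/* the "loser": frees its private copy */
	printf("row_slot[0] = %d\n", atomic_load(&row_slot)[0]);
	return 0;
}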
Signed-off-by: Maximilian Heyne mheyne@amazon.de Fixes: d0b075ffeede ("xen/events: Refactor evtchn_to_irq array to be dynamically allocated") Link: https://lore.kernel.org/r/20210812130930.127134-1-mheyne@amazon.de Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/xen/events/events_base.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c index d7e361fb0548..0e44098f3977 100644 --- a/drivers/xen/events/events_base.c +++ b/drivers/xen/events/events_base.c @@ -198,12 +198,12 @@ static void disable_dynirq(struct irq_data *data);
static DEFINE_PER_CPU(unsigned int, irq_epoch);
-static void clear_evtchn_to_irq_row(unsigned row) +static void clear_evtchn_to_irq_row(int *evtchn_row) { unsigned col;
for (col = 0; col < EVTCHN_PER_ROW; col++) - WRITE_ONCE(evtchn_to_irq[row][col], -1); + WRITE_ONCE(evtchn_row[col], -1); }
static void clear_evtchn_to_irq_all(void) @@ -213,7 +213,7 @@ static void clear_evtchn_to_irq_all(void) for (row = 0; row < EVTCHN_ROW(xen_evtchn_max_channels()); row++) { if (evtchn_to_irq[row] == NULL) continue; - clear_evtchn_to_irq_row(row); + clear_evtchn_to_irq_row(evtchn_to_irq[row]); } }
@@ -221,6 +221,7 @@ static int set_evtchn_to_irq(evtchn_port_t evtchn, unsigned int irq) { unsigned row; unsigned col; + int *evtchn_row;
if (evtchn >= xen_evtchn_max_channels()) return -EINVAL; @@ -233,11 +234,18 @@ static int set_evtchn_to_irq(evtchn_port_t evtchn, unsigned int irq) if (irq == -1) return 0;
- evtchn_to_irq[row] = (int *)get_zeroed_page(GFP_KERNEL); - if (evtchn_to_irq[row] == NULL) + evtchn_row = (int *) __get_free_pages(GFP_KERNEL, 0); + if (evtchn_row == NULL) return -ENOMEM;
- clear_evtchn_to_irq_row(row); + clear_evtchn_to_irq_row(evtchn_row); + + /* + * We've prepared an empty row for the mapping. If a different + * thread was faster inserting it, we can drop ours. + */ + if (cmpxchg(&evtchn_to_irq[row], NULL, evtchn_row) != NULL) + free_page((unsigned long) evtchn_row); }
WRITE_ONCE(evtchn_to_irq[row][col], irq);
From: Longpeng(Mike) longpeng2@huawei.com
[ Upstream commit 49b0b6ffe20c5344f4173f3436298782a08da4f2 ]
There is a potential deadlock when removing the vsock device or processing the RESET event:
vsock_for_each_connected_socket:
    spin_lock_bh(&vsock_table_lock) ----------- (1)
    ...
        virtio_vsock_reset_sock:
            lock_sock(sk) --------------------- (2)
    ...
    spin_unlock_bh(&vsock_table_lock)
lock_sock() may actively schedule when the 'sk' is owned by another thread at the same time, so we would receive a "scheduling while atomic" warning message.
Even worse, if the next task (selected by the scheduler) tries to release a 'sk', it needs to take vsock_table_lock and the deadlock occurs, sending the system into a softlockup state.
Call trace:
 queued_spin_lock_slowpath
 vsock_remove_bound
 vsock_remove_sock
 virtio_transport_release
 __vsock_release
 vsock_release
 __sock_release
 sock_close
 __fput
 ____fput
So we should not take sk_lock in this case, matching the behavior of vhost_vsock and vmci.
Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko") Cc: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Longpeng(Mike) longpeng2@huawei.com Reviewed-by: Stefano Garzarella sgarzare@redhat.com Link: https://lore.kernel.org/r/20210812053056.1699-1-longpeng2@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/vmw_vsock/virtio_transport.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index 2700a63ab095..3a056f8affd1 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -356,11 +356,14 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)
static void virtio_vsock_reset_sock(struct sock *sk) { - lock_sock(sk); + /* vmci_transport.c doesn't take sk_lock here either. At least we're + * under vsock_table_lock so the sock cannot disappear while we're + * executing. + */ + sk->sk_state = TCP_CLOSE; sk->sk_err = ECONNRESET; sk->sk_error_report(sk); - release_sock(sk); }
static void virtio_vsock_update_guest_cid(struct virtio_vsock *vsock)
From: Xie Yongji xieyongji@bytedance.com
[ Upstream commit cddce01160582a5f52ada3da9626c052d852ec42 ]
There is a race between iterating over requests in nbd_clear_que() and completing requests in recv_work(), which can lead to double completion of a request.
To fix it, flush the recv worker before iterating over the requests and don't abort the completed request while iterating.
Fixes: 96d97e17828f ("nbd: clear_sock on netlink disconnect") Reported-by: Jiang Yadong jiangyadong@bytedance.com Signed-off-by: Xie Yongji xieyongji@bytedance.com Reviewed-by: Josef Bacik josef@toxicpanda.com Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/nbd.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 45d2c28c8fc8..1061894a55df 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -805,6 +805,10 @@ static bool nbd_clear_req(struct request *req, void *data, bool reserved) { struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
+ /* don't abort one completed request */ + if (blk_mq_request_completed(req)) + return true; + mutex_lock(&cmd->lock); cmd->status = BLK_STS_IOERR; mutex_unlock(&cmd->lock); @@ -1973,15 +1977,19 @@ static void nbd_disconnect_and_put(struct nbd_device *nbd) { mutex_lock(&nbd->config_lock); nbd_disconnect(nbd); - nbd_clear_sock(nbd); - mutex_unlock(&nbd->config_lock); + sock_shutdown(nbd); /* * Make sure recv thread has finished, so it does not drop the last * config ref and try to destroy the workqueue from inside the work - * queue. + * queue. And this also ensure that we can safely call nbd_clear_que() + * to cancel the inflight I/Os. */ if (nbd->recv_workq) flush_workqueue(nbd->recv_workq); + nbd_clear_que(nbd); + nbd->task_setup = NULL; + mutex_unlock(&nbd->config_lock); + if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF, &nbd->config->runtime_flags)) nbd_config_put(nbd);
From: Benjamin Herrenschmidt benh@kernel.crashing.org
[ Upstream commit 4152433c397697acc4b02c4a10d17d5859c2730d ]
The EFI stub random allocator used for KASLR on arm64 has a subtle bug. In get_entry_num_slots(), which counts the number of possible allocation "slots" for the image in a given chunk of free EFI memory, "last_slot" can become negative if the chunk is smaller than the requested allocation size.
The test "if (first_slot > last_slot)" doesn't catch it because both first_slot and last_slot are unsigned.
I chose not to make them signed to avoid problems if this is ever used on architectures where there are meaningful addresses with the top bit set. Instead, fix it with an additional test against the allocation size.
This can cause a boot failure in addition to a loss of randomisation due to another bug in the arm64 stub fixed separately.
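To make the arithmetic concrete, here is a small standalone sketch with simplified round_up/round_down helpers and made-up numbers (a 64 KiB chunk versus a 32 MiB image); it shows the unsigned wrap, why the first_slot > last_slot test never fires, and what the added guard catches:

#include <stdio.h>

static unsigned long round_up_pow2(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

static unsigned long round_down_pow2(unsigned long x, unsigned long a)
{
	return x & ~(a - 1);
}

int main(void)
{
	unsigned long align = 0x200000;				/* 2 MiB slots */
	unsigned long phys_addr = 0x1000;			/* small low chunk */
	unsigned long region_end = phys_addr + 0x10000 - 1;	/* 64 KiB chunk */
	unsigned long size = 0x2000000;				/* 32 MiB image */

	unsigned long first_slot = round_up_pow2(phys_addr, align);
	/* region_end - size + 1 wraps: last_slot becomes huge, not "negative" */
	unsigned long last_slot = round_down_pow2(region_end - size + 1, align);

	printf("first_slot = %#lx, last_slot = %#lx\n", first_slot, last_slot);
	printf("first_slot > last_slot: %s\n",
	       first_slot > last_slot ? "yes" : "no -> bogus slots counted");
	printf("region_end < size (new guard): %s\n",
	       region_end < size ? "yes -> return 0" : "no");
	return 0;
}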
Signed-off-by: Benjamin Herrenschmidt benh@kernel.crashing.org Fixes: 2ddbfc81eac8 ("efi: stub: add implementation of efi_random_alloc()") Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/libstub/randomalloc.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/firmware/efi/libstub/randomalloc.c b/drivers/firmware/efi/libstub/randomalloc.c index a408df474d83..724155b9e10d 100644 --- a/drivers/firmware/efi/libstub/randomalloc.c +++ b/drivers/firmware/efi/libstub/randomalloc.c @@ -30,6 +30,8 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
region_end = min(md->phys_addr + md->num_pages * EFI_PAGE_SIZE - 1, (u64)ULONG_MAX); + if (region_end < size) + return 0;
first_slot = round_up(md->phys_addr, align); last_slot = round_down(region_end - size + 1, align);
From: David Brazdil dbrazdil@google.com
[ Upstream commit facee1be7689f8cf573b9ffee6a5c28ee193615e ]
Hyp checks whether an address range only covers RAM by checking the start/endpoints against a list of memblock_region structs. However, the endpoint here is exclusive but internally is treated as inclusive. Fix the off-by-one error that caused valid address ranges to be rejected.
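As a simplified illustration (a single hypothetical RAM region instead of the memblock list), the half-open range [start, end) has to be checked by looking up end - 1, not end:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical single memory region covering [0x80000000, 0x90000000) */
static const uint64_t ram_start = 0x80000000, ram_end = 0x90000000;

/* Does this one address fall inside RAM? (an inclusive, per-address lookup) */
static bool addr_is_memory(uint64_t addr)
{
	return addr >= ram_start && addr < ram_end;
}

/* Check the half-open range [start, end): its last byte is end - 1. */
static bool range_is_memory(uint64_t start, uint64_t end)
{
	return addr_is_memory(start) && addr_is_memory(end - 1);
}

int main(void)
{
	/* A range ending exactly at the top of RAM is valid ... */
	printf("fixed check: %d\n", range_is_memory(0x8ff00000, 0x90000000));
	/* ... but looking up the exclusive endpoint itself wrongly rejects it. */
	printf("buggy check: %d\n",
	       addr_is_memory(0x8ff00000) && addr_is_memory(0x90000000));
	return 0;
}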
Cc: Quentin Perret qperret@google.com Fixes: 90134ac9cabb6 ("KVM: arm64: Protect the .hyp sections from the host") Signed-off-by: David Brazdil dbrazdil@google.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20210728153232.1018911-2-dbrazdil@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/kvm/hyp/nvhe/mem_protect.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c index 4b60c0056c04..fa1b77fe629d 100644 --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c @@ -190,7 +190,7 @@ static bool range_is_memory(u64 start, u64 end) { struct kvm_mem_range r1, r2;
- if (!find_mem_range(start, &r1) || !find_mem_range(end, &r2)) + if (!find_mem_range(start, &r1) || !find_mem_range(end - 1, &r2)) return false; if (r1.start != r2.start) return false;
From: Ard Biesheuvel ardb@kernel.org
[ Upstream commit 5b94046efb4706b3429c9c8e7377bd8d1621d588 ]
Distro versions of GRUB replace the usual LoadImage/StartImage calls used to load the kernel image with some local code that fails to honor the allocation requirements described in the PE/COFF header, as it does not account for the image's BSS section at all: it fails to allocate space for it, and fails to zero initialize it.
Since the EFI stub itself is allocated in the .init segment, which is in the middle of the image, its BSS section is not impacted by this, and the main consequence of this omission is that the BSS section may overlap with memory regions that are already used by the firmware.
So let's warn about this condition, and force image reallocation to occur in this case, which works around the problem.
Fixes: 82046702e288 ("efi/libstub/arm64: Replace 'preferred' offset with alignment check") Signed-off-by: Ard Biesheuvel ardb@kernel.org Tested-by: Benjamin Herrenschmidt benh@kernel.crashing.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/libstub/arm64-stub.c | 49 ++++++++++++++++++++++- 1 file changed, 48 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c index 7bf0a7acae5e..3698c1ce2940 100644 --- a/drivers/firmware/efi/libstub/arm64-stub.c +++ b/drivers/firmware/efi/libstub/arm64-stub.c @@ -34,6 +34,51 @@ efi_status_t check_platform_features(void) return EFI_SUCCESS; }
+/* + * Distro versions of GRUB may ignore the BSS allocation entirely (i.e., fail + * to provide space, and fail to zero it). Check for this condition by double + * checking that the first and the last byte of the image are covered by the + * same EFI memory map entry. + */ +static bool check_image_region(u64 base, u64 size) +{ + unsigned long map_size, desc_size, buff_size; + efi_memory_desc_t *memory_map; + struct efi_boot_memmap map; + efi_status_t status; + bool ret = false; + int map_offset; + + map.map = &memory_map; + map.map_size = &map_size; + map.desc_size = &desc_size; + map.desc_ver = NULL; + map.key_ptr = NULL; + map.buff_size = &buff_size; + + status = efi_get_memory_map(&map); + if (status != EFI_SUCCESS) + return false; + + for (map_offset = 0; map_offset < map_size; map_offset += desc_size) { + efi_memory_desc_t *md = (void *)memory_map + map_offset; + u64 end = md->phys_addr + md->num_pages * EFI_PAGE_SIZE; + + /* + * Find the region that covers base, and return whether + * it covers base+size bytes. + */ + if (base >= md->phys_addr && base < end) { + ret = (base + size) <= end; + break; + } + } + + efi_bs_call(free_pool, memory_map); + + return ret; +} + /* * Although relocatable kernels can fix up the misalignment with respect to * MIN_KIMG_ALIGN, the resulting virtual text addresses are subtly out of @@ -92,7 +137,9 @@ efi_status_t handle_kernel_image(unsigned long *image_addr, }
if (status != EFI_SUCCESS) { - if (IS_ALIGNED((u64)_text, min_kimg_align())) { + if (!check_image_region((u64)_text, kernel_memsize)) { + efi_err("FIRMWARE BUG: Image BSS overlaps adjacent EFI memory region\n"); + } else if (IS_ALIGNED((u64)_text, min_kimg_align())) { /* * Just execute from wherever we were loaded by the * UEFI PE/COFF loader if the alignment is suitable.
From: Ard Biesheuvel ardb@kernel.org
[ Upstream commit 3a262423755b83a5f85009ace415d6e7f572dfe8 ]
Commit 82046702e288 ("efi/libstub/arm64: Replace 'preferred' offset with alignment check") simplified the way the stub moves the kernel image around in memory before booting it, given that a relocatable image does not need to be copied to a 2M aligned offset if it was loaded on a 64k boundary by EFI.
Commit d32de9130f6c ("efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure") inadvertently defeated this logic by overriding the value of efi_nokaslr if EFI_RNG_PROTOCOL is not available, which was mistaken by the loader logic as an explicit request on the part of the user to disable KASLR and any associated relocation of an Image not loaded on a 2M boundary.
So let's reinstate this functionality, by capturing the value of efi_nokaslr at function entry to choose the minimum alignment.
Fixes: d32de9130f6c ("efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure") Signed-off-by: Ard Biesheuvel ardb@kernel.org Tested-by: Benjamin Herrenschmidt benh@kernel.crashing.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/libstub/arm64-stub.c | 28 +++++++++++------------ 1 file changed, 13 insertions(+), 15 deletions(-)
diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c index 3698c1ce2940..6f214c9c303e 100644 --- a/drivers/firmware/efi/libstub/arm64-stub.c +++ b/drivers/firmware/efi/libstub/arm64-stub.c @@ -79,18 +79,6 @@ static bool check_image_region(u64 base, u64 size) return ret; }
-/* - * Although relocatable kernels can fix up the misalignment with respect to - * MIN_KIMG_ALIGN, the resulting virtual text addresses are subtly out of - * sync with those recorded in the vmlinux when kaslr is disabled but the - * image required relocation anyway. Therefore retain 2M alignment unless - * KASLR is in use. - */ -static u64 min_kimg_align(void) -{ - return efi_nokaslr ? MIN_KIMG_ALIGN : EFI_KIMG_ALIGN; -} - efi_status_t handle_kernel_image(unsigned long *image_addr, unsigned long *image_size, unsigned long *reserve_addr, @@ -101,6 +89,16 @@ efi_status_t handle_kernel_image(unsigned long *image_addr, unsigned long kernel_size, kernel_memsize = 0; u32 phys_seed = 0;
+ /* + * Although relocatable kernels can fix up the misalignment with + * respect to MIN_KIMG_ALIGN, the resulting virtual text addresses are + * subtly out of sync with those recorded in the vmlinux when kaslr is + * disabled but the image required relocation anyway. Therefore retain + * 2M alignment if KASLR was explicitly disabled, even if it was not + * going to be activated to begin with. + */ + u64 min_kimg_align = efi_nokaslr ? MIN_KIMG_ALIGN : EFI_KIMG_ALIGN; + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) { if (!efi_nokaslr) { status = efi_get_random_bytes(sizeof(phys_seed), @@ -130,7 +128,7 @@ efi_status_t handle_kernel_image(unsigned long *image_addr, * If KASLR is enabled, and we have some randomness available, * locate the kernel at a randomized offset in physical memory. */ - status = efi_random_alloc(*reserve_size, min_kimg_align(), + status = efi_random_alloc(*reserve_size, min_kimg_align, reserve_addr, phys_seed); } else { status = EFI_OUT_OF_RESOURCES; @@ -139,7 +137,7 @@ efi_status_t handle_kernel_image(unsigned long *image_addr, if (status != EFI_SUCCESS) { if (!check_image_region((u64)_text, kernel_memsize)) { efi_err("FIRMWARE BUG: Image BSS overlaps adjacent EFI memory region\n"); - } else if (IS_ALIGNED((u64)_text, min_kimg_align())) { + } else if (IS_ALIGNED((u64)_text, min_kimg_align)) { /* * Just execute from wherever we were loaded by the * UEFI PE/COFF loader if the alignment is suitable. @@ -150,7 +148,7 @@ efi_status_t handle_kernel_image(unsigned long *image_addr, }
status = efi_allocate_pages_aligned(*reserve_size, reserve_addr, - ULONG_MAX, min_kimg_align()); + ULONG_MAX, min_kimg_align);
if (status != EFI_SUCCESS) { efi_err("Failed to relocate kernel\n");
From: Pu Lehui pulehui@huawei.com
[ Upstream commit 43e8f76006592cb1573a959aa287c45421066f9c ]
When using kprobes on a powerpc BookE series processor, an Oops happens as shown below:
/ # echo "p:myprobe do_nanosleep" > /sys/kernel/debug/tracing/kprobe_events / # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable / # sleep 1 [ 50.076730] Oops: Exception in kernel mode, sig: 5 [#1] [ 50.077017] BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500 [ 50.077221] Modules linked in: [ 50.077462] CPU: 0 PID: 77 Comm: sleep Not tainted 5.14.0-rc4-00022-g251a1524293d #21 [ 50.077887] NIP: c0b9c4e0 LR: c00ebecc CTR: 00000000 [ 50.078067] REGS: c3883de0 TRAP: 0700 Not tainted (5.14.0-rc4-00022-g251a1524293d) [ 50.078349] MSR: 00029000 <CE,EE,ME> CR: 24000228 XER: 20000000 [ 50.078675] [ 50.078675] GPR00: c00ebdf0 c3883e90 c313e300 c3883ea0 00000001 00000000 c3883ecc 00000001 [ 50.078675] GPR08: c100598c c00ea250 00000004 00000000 24000222 102490c2 bff4180c 101e60d4 [ 50.078675] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f8 10240000 00500000 [ 50.078675] GPR24: 00000002 00000000 c3883ea0 00000001 00000000 0000c350 3b9b8d50 00000000 [ 50.080151] NIP [c0b9c4e0] do_nanosleep+0x0/0x190 [ 50.080352] LR [c00ebecc] hrtimer_nanosleep+0x14c/0x1e0 [ 50.080638] Call Trace: [ 50.080801] [c3883e90] [c00ebdf0] hrtimer_nanosleep+0x70/0x1e0 (unreliable) [ 50.081110] [c3883f00] [c00ec004] sys_nanosleep_time32+0xa4/0x110 [ 50.081336] [c3883f40] [c001509c] ret_from_syscall+0x0/0x28 [ 50.081541] --- interrupt: c00 at 0x100a4d08 [ 50.081749] NIP: 100a4d08 LR: 101b5234 CTR: 00000003 [ 50.081931] REGS: c3883f50 TRAP: 0c00 Not tainted (5.14.0-rc4-00022-g251a1524293d) [ 50.082183] MSR: 0002f902 <CE,EE,PR,FP,ME> CR: 24000222 XER: 00000000 [ 50.082457] [ 50.082457] GPR00: 000000a2 bf980040 1024b4d0 bf980084 bf980084 64000000 00555345 fefefeff [ 50.082457] GPR08: 7f7f7f7f 101e0000 00000069 00000003 28000422 102490c2 bff4180c 101e60d4 [ 50.082457] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f8 10240000 00500000 [ 50.082457] GPR24: 00000002 bf9803f4 10240000 00000000 00000000 100039e0 00000000 102444e8 [ 50.083789] NIP [100a4d08] 0x100a4d08 [ 50.083917] LR [101b5234] 0x101b5234 [ 50.084042] --- interrupt: c00 [ 50.084238] Instruction dump: [ 50.084483] 4bfffc40 60000000 60000000 60000000 9421fff0 39400402 914200c0 38210010 [ 50.084841] 4bfffc20 00000000 00000000 00000000 <7fe00008> 7c0802a6 7c892378 93c10048 [ 50.085487] ---[ end trace f6fffe98e2fa8f3e ]--- [ 50.085678] Trace/breakpoint trap
There is no real mode on the BookE arch and MMU translation is always on. The corresponding MSR_IS/MSR_DS bits on BookE are used to switch the address space, not to indicate real mode.
Fixes: 21f8b2fa3ca5 ("powerpc/kprobes: Ignore traps that happened in real mode") Signed-off-by: Pu Lehui pulehui@huawei.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210809023658.218915-1-pulehui@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/kprobes.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c index e8c2a6373157..00fafc8b249e 100644 --- a/arch/powerpc/kernel/kprobes.c +++ b/arch/powerpc/kernel/kprobes.c @@ -276,7 +276,8 @@ int kprobe_handler(struct pt_regs *regs) if (user_mode(regs)) return 0;
- if (!(regs->msr & MSR_IR) || !(regs->msr & MSR_DR)) + if (!IS_ENABLED(CONFIG_BOOKE) && + (!(regs->msr & MSR_IR) || !(regs->msr & MSR_DR))) return 0;
/*
From: Dhananjay Phadke dphadke@linux.microsoft.com
[ Upstream commit bba676cc0b6122a74fa2e246f38a6b05c6f95b36 ]
A similar NULL deref was originally fixed by a graceful teardown sequence -
https://lore.kernel.org/linux-i2c/1597106560-79693-1-git-send-email-dphadke@...
After this, a tasklet was added to take care of the FIFO-full condition for large i2c transactions.
https://lore.kernel.org/linux-arm-kernel/20201102035433.6774-1-rayagonda.kok...
This introduced a regression: a new race condition between the tasklet enabling interrupts and the client unregister teardown sequence.
Kill tasklet before unreg_slave() masks bits in IE_OFFSET. Updated teardown sequence:

(1) disable_irq()
(2) Kill tasklet
(3) Mask event enable bits in control reg
(4) Erase slave address (avoid further writes to rx fifo)
(5) Flush tx and rx FIFOs
(6) Clear pending event (interrupt) bits in status reg
(7) Set client pointer to NULL
(8) enable_irq()
--
Unable to handle kernel read from unreadable memory at virtual address 0000000000000320 Mem abort info: ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=000000009212a000 [0000000000000320] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 96000004 [#1] SMP CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O Hardware name: Overlake (DT) pstate: 40400085 (nZcv daIf +PAN -UAO -TCO BTYPE=--) pc : bcm_iproc_i2c_slave_isr+0x2b8/0x8e4 lr : bcm_iproc_i2c_slave_isr+0x1c8/0x8e4 sp : ffff800010003e70 x29: ffff800010003e80 x28: ffffda017acdc000 x27: ffffda017b0ae000 x26: ffff800010004000 x25: ffff800010000000 x24: ffffda017af4a168 x23: 0000000000000073 x22: 0000000000000000 x21: 0000000001400000 x20: 0000000001000000 x19: ffff06f09583f880 x18: 00000000fa83b2da x17: 000000000000b67e x16: 0000000002edb2f3 x15: 00000000000002c7 x14: 00000000000002c7 x13: 0000000000000006 x12: 0000000000000033 x11: 0000000000000000 x10: 0000000001000000 x9 : 0000000003289312 x8 : 0000000003289311 x7 : 02d0cd03a303adbc x6 : 02d18e7f0a4dfc6c x5 : 02edb2f33f76ea68 x4 : 00000000fa83b2da x3 : ffffda017af43cd0 x2 : ffff800010003e74 x1 : 0000000001400000 x0 : 0000000000000000 Call trace: bcm_iproc_i2c_slave_isr+0x2b8/0x8e4 bcm_iproc_i2c_isr+0x178/0x290 __handle_irq_event_percpu+0xd0/0x200 handle_irq_event+0x60/0x1a0 handle_fasteoi_irq+0x130/0x220 __handle_domain_irq+0x8c/0xcc gic_handle_irq+0xc0/0x120 el1_irq+0xcc/0x180 finish_task_switch+0x100/0x1d8 __schedule+0x61c/0x7a0 schedule_idle+0x28/0x44 do_idle+0x254/0x28c cpu_startup_entry+0x28/0x2c rest_init+0xc4/0xd0 arch_call_rest_init+0x14/0x1c start_kernel+0x33c/0x3b8 Code: f9423260 910013e2 11000509 b9047a69 (f9419009) ---[ end trace 4781455b2a7bec15 ]---
Fixes: 4d658451c9d6 ("i2c: iproc: handle rx fifo full interrupt")
Signed-off-by: Dhananjay Phadke dphadke@linux.microsoft.com Acked-by: Ray Jui ray.jui@broadcom.com Acked-by: Rayagonda Kokatanur rayagonda.kokatanur@broadcom.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-bcm-iproc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/i2c/busses/i2c-bcm-iproc.c b/drivers/i2c/busses/i2c-bcm-iproc.c index cceaf69279a9..6304d1dd2dd6 100644 --- a/drivers/i2c/busses/i2c-bcm-iproc.c +++ b/drivers/i2c/busses/i2c-bcm-iproc.c @@ -1224,14 +1224,14 @@ static int bcm_iproc_i2c_unreg_slave(struct i2c_client *slave)
disable_irq(iproc_i2c->irq);
+ tasklet_kill(&iproc_i2c->slave_rx_tasklet); + /* disable all slave interrupts */ tmp = iproc_i2c_rd_reg(iproc_i2c, IE_OFFSET); tmp &= ~(IE_S_ALL_INTERRUPT_MASK << IE_S_ALL_INTERRUPT_SHIFT); iproc_i2c_wr_reg(iproc_i2c, IE_OFFSET, tmp);
- tasklet_kill(&iproc_i2c->slave_rx_tasklet); - /* Erase the slave address programmed */ tmp = iproc_i2c_rd_reg(iproc_i2c, S_CFG_SMBUS_ADDR_OFFSET); tmp &= ~BIT(S_CFG_EN_NIC_SMB_ADDR3_SHIFT);
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit 839ad22f755132838f406751439363c07272ad87 ]
Skip (omit) any version string info that is parenthesized.
Warning: objdump version 15) is older than 2.19
Warning: Skipping posttest.

where 'objdump -v' says:

GNU objdump (GNU Binutils; SUSE Linux Enterprise 15) 2.35.1.20201123-7.18
Fixes: 8bee738bb1979 ("x86: Fix objdump version check in chkobjdump.awk for different formats.") Signed-off-by: Randy Dunlap rdunlap@infradead.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Masami Hiramatsu mhiramat@kernel.org Link: https://lore.kernel.org/r/20210731000146.2720-1-rdunlap@infradead.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/tools/chkobjdump.awk | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/x86/tools/chkobjdump.awk b/arch/x86/tools/chkobjdump.awk index fd1ab80be0de..a4cf678cf5c8 100644 --- a/arch/x86/tools/chkobjdump.awk +++ b/arch/x86/tools/chkobjdump.awk @@ -10,6 +10,7 @@ BEGIN {
/^GNU objdump/ { verstr = "" + gsub(/\(.*\)/, ""); for (i = 3; i <= NF; i++) if (match($(i), "^[0-9]")) { verstr = $(i);
From: Thomas Gleixner tglx@linutronix.de
commit 826da771291fc25a428e871f9e7fb465e390f852 upstream.
X86 IO/APIC and MSI interrupts (when used without interrupts remapping) require that the affinity setup on startup is done before the interrupt is enabled for the first time as the non-remapped operation mode cannot safely migrate enabled interrupts from arbitrary contexts. Provide a new irq chip flag which allows affected hardware to request this.
This has to be opt-in because there have been reports in the past that some interrupt chips cannot handle affinity setting before startup.
Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Marc Zyngier maz@kernel.org Reviewed-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.779791738@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/irq.h | 2 ++ kernel/irq/chip.c | 5 ++++- 2 files changed, 6 insertions(+), 1 deletion(-)
--- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -567,6 +567,7 @@ struct irq_chip { * IRQCHIP_SUPPORTS_NMI: Chip can deliver NMIs, only for root irqchips * IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND: Invokes __enable_irq()/__disable_irq() for wake irqs * in the suspend path if they are in disabled state + * IRQCHIP_AFFINITY_PRE_STARTUP: Default affinity update before startup */ enum { IRQCHIP_SET_TYPE_MASKED = (1 << 0), @@ -579,6 +580,7 @@ enum { IRQCHIP_SUPPORTS_LEVEL_MSI = (1 << 7), IRQCHIP_SUPPORTS_NMI = (1 << 8), IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND = (1 << 9), + IRQCHIP_AFFINITY_PRE_STARTUP = (1 << 10), };
#include <linux/irqdesc.h> --- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -265,8 +265,11 @@ int irq_startup(struct irq_desc *desc, b } else { switch (__irq_startup_managed(desc, aff, force)) { case IRQ_STARTUP_NORMAL: + if (d->chip->flags & IRQCHIP_AFFINITY_PRE_STARTUP) + irq_setup_affinity(desc); ret = __irq_startup(desc); - irq_setup_affinity(desc); + if (!(d->chip->flags & IRQCHIP_AFFINITY_PRE_STARTUP)) + irq_setup_affinity(desc); break; case IRQ_STARTUP_MANAGED: irq_do_set_affinity(d, aff, false);
From: Thomas Gleixner tglx@linutronix.de
commit ff363f480e5997051dd1de949121ffda3b753741 upstream.
The X86 MSI mechanism cannot handle interrupt affinity changes safely after startup other than from an interrupt handler, unless interrupt remapping is enabled. The startup sequence in the generic interrupt code violates that assumption.
Mark the irq chips with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that the default interrupt setting happens before the interrupt is started up for the first time.
While the interrupt remapping MSI chip does not require this, there is no point in treating it differently as this might spare an interrupt to a CPU which is not in the default affinity mask.
For the non-remapping case go to the direct write path when the interrupt is not yet started similar to the not yet activated case.
Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Marc Zyngier maz@kernel.org Reviewed-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.886722080@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/apic/msi.c | 11 ++++++++--- arch/x86/kernel/hpet.c | 2 +- 2 files changed, 9 insertions(+), 4 deletions(-)
--- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -58,11 +58,13 @@ msi_set_affinity(struct irq_data *irqd, * The quirk bit is not set in this case. * - The new vector is the same as the old vector * - The old vector is MANAGED_IRQ_SHUTDOWN_VECTOR (interrupt starts up) + * - The interrupt is not yet started up * - The new destination CPU is the same as the old destination CPU */ if (!irqd_msi_nomask_quirk(irqd) || cfg->vector == old_cfg.vector || old_cfg.vector == MANAGED_IRQ_SHUTDOWN_VECTOR || + !irqd_is_started(irqd) || cfg->dest_apicid == old_cfg.dest_apicid) { irq_msi_update_msg(irqd, cfg); return ret; @@ -150,7 +152,8 @@ static struct irq_chip pci_msi_controlle .irq_ack = irq_chip_ack_parent, .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_set_affinity = msi_set_affinity, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, };
int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, @@ -219,7 +222,8 @@ static struct irq_chip pci_msi_ir_contro .irq_mask = pci_msi_mask_irq, .irq_ack = irq_chip_ack_parent, .irq_retrigger = irq_chip_retrigger_hierarchy, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, };
static struct msi_domain_info pci_msi_ir_domain_info = { @@ -273,7 +277,8 @@ static struct irq_chip dmar_msi_controll .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_compose_msi_msg = dmar_msi_compose_msg, .irq_write_msi_msg = dmar_msi_write_msg, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, };
static int dmar_msi_init(struct irq_domain *domain, --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -508,7 +508,7 @@ static struct irq_chip hpet_msi_controll .irq_set_affinity = msi_domain_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_write_msi_msg = hpet_msi_write_msg, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_AFFINITY_PRE_STARTUP, };
static int hpet_msi_init(struct irq_domain *domain,
From: Thomas Gleixner tglx@linutronix.de
commit 0c0e37dc11671384e53ba6ede53a4d91162a2cc5 upstream.
The IO/APIC cannot handle interrupt affinity changes safely after startup other than from an interrupt handler. The startup sequence in the generic interrupt code violates that assumption.
Mark the irq chip with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that the default interrupt setting happens before the interrupt is started up for the first time.
Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Marc Zyngier maz@kernel.org Reviewed-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.832143400@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/apic/io_apic.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -1986,7 +1986,8 @@ static struct irq_chip ioapic_chip __rea .irq_set_affinity = ioapic_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_get_irqchip_state = ioapic_irq_get_chip_state, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, };
static struct irq_chip ioapic_ir_chip __read_mostly = { @@ -1999,7 +2000,8 @@ static struct irq_chip ioapic_ir_chip __ .irq_set_affinity = ioapic_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, .irq_get_irqchip_state = ioapic_irq_get_chip_state, - .flags = IRQCHIP_SKIP_SET_WAKE, + .flags = IRQCHIP_SKIP_SET_WAKE | + IRQCHIP_AFFINITY_PRE_STARTUP, };
static inline void init_IO_APIC_traps(void)
From: Babu Moger Babu.Moger@amd.com
commit 064855a69003c24bd6b473b367d364e418c57625 upstream.
Creating a new sub monitoring group in the root /sys/fs/resctrl leads to getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes on the entire filesystem.
Steps to reproduce:
1. mount -t resctrl resctrl /sys/fs/resctrl/
2. cd /sys/fs/resctrl/
3. cat mon_data/mon_L3_00/mbm_total_bytes 23189832
4. Create sub monitor group: mkdir mon_groups/test1
5. cat mon_data/mon_L3_00/mbm_total_bytes Unavailable
When a new monitoring group is created, a new RMID is assigned to the new group. But the RMID is not active yet. When the events are read on the new RMID, it is expected to report the status as "Unavailable".
When the user reads the events on the default monitoring group with multiple subgroups, the events on all subgroups are consolidated together. Currently, if any of the RMID reads report as "Unavailable", then everything will be reported as "Unavailable".
Fix the issue by discarding the "Unavailable" reads and reporting all the successful RMID reads. This is not a problem on Intel systems as Intel reports 0 on Inactive RMIDs.
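A rough userspace sketch of that aggregation rule follows; the RMID_VAL_UNAVAIL bit position mirrors the architectural counter "unavailable" bit, and the function and variable names are purely illustrative.

#include <stdint.h>
#include <stdio.h>

#define RMID_VAL_UNAVAIL (1ULL << 62)	/* "count unavailable" error bit */

/* Sum the readable RMID counts; fail only if every single read failed. */
static int sum_event_counts(const uint64_t *reads, int n, uint64_t *total)
{
	uint64_t sum = 0;
	int ok = 0;

	for (int i = 0; i < n; i++) {
		if (reads[i] & RMID_VAL_UNAVAIL)
			continue;	/* discard this RMID, keep the others */
		sum += reads[i];
		ok++;
	}
	if (!ok)
		return -1;	/* report "Unavailable" only in this case */
	*total = sum;
	return 0;
}

int main(void)
{
	/* default group plus one idle subgroup whose RMID is not active yet */
	uint64_t reads[] = { 23189832, RMID_VAL_UNAVAIL };
	uint64_t total;

	if (!sum_event_counts(reads, 2, &total))
		printf("mbm_total_bytes: %llu\n", (unsigned long long)total);
	return 0;
}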
Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data") Reported-by: Paweł Szulik pawel.szulik@intel.com Signed-off-by: Babu Moger Babu.Moger@amd.com Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Reinette Chatre reinette.chatre@intel.com Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311 Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmoge... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/resctrl/monitor.c | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-)
--- a/arch/x86/kernel/cpu/resctrl/monitor.c +++ b/arch/x86/kernel/cpu/resctrl/monitor.c @@ -285,15 +285,14 @@ static u64 mbm_overflow_count(u64 prev_m return chunks >>= shift; }
-static int __mon_event_count(u32 rmid, struct rmid_read *rr) +static u64 __mon_event_count(u32 rmid, struct rmid_read *rr) { struct mbm_state *m; u64 chunks, tval;
tval = __rmid_read(rmid, rr->evtid); if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) { - rr->val = tval; - return -EINVAL; + return tval; } switch (rr->evtid) { case QOS_L3_OCCUP_EVENT_ID: @@ -305,12 +304,6 @@ static int __mon_event_count(u32 rmid, s case QOS_L3_MBM_LOCAL_EVENT_ID: m = &rr->d->mbm_local[rmid]; break; - default: - /* - * Code would never reach here because - * an invalid event id would fail the __rmid_read. - */ - return -EINVAL; }
if (rr->first) { @@ -361,23 +354,29 @@ void mon_event_count(void *info) struct rdtgroup *rdtgrp, *entry; struct rmid_read *rr = info; struct list_head *head; + u64 ret_val;
rdtgrp = rr->rgrp;
- if (__mon_event_count(rdtgrp->mon.rmid, rr)) - return; + ret_val = __mon_event_count(rdtgrp->mon.rmid, rr);
/* - * For Ctrl groups read data from child monitor groups. + * For Ctrl groups read data from child monitor groups and + * add them together. Count events which are read successfully. + * Discard the rmid_read's reporting errors. */ head = &rdtgrp->mon.crdtgrp_list;
if (rdtgrp->type == RDTCTRL_GROUP) { list_for_each_entry(entry, head, mon.crdtgrp_list) { - if (__mon_event_count(entry->mon.rmid, rr)) - return; + if (__mon_event_count(entry->mon.rmid, rr) == 0) + ret_val = 0; } } + + /* Report error if none of rmid_reads are successful */ + if (ret_val) + rr->val = ret_val; }
/*
From: Bixuan Cui cuibixuan@huawei.com
commit dbbc93576e03fbe24b365fab0e901eb442237a8a upstream.
msi_domain_alloc_irqs() invokes irq_domain_activate_irq(), but msi_domain_free_irqs() does not enforce deactivation before tearing down the interrupts.
This happens when PCI/MSI interrupts are set up and never used before being torn down again, e.g. in error handling paths. The only place which cleans that up is the error handling path in msi_domain_alloc_irqs().
Move the cleanup from msi_domain_alloc_irqs() into msi_domain_free_irqs() to cure that.
Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early") Signed-off-by: Bixuan Cui cuibixuan@huawei.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210518033117.78104-1-cuibixuan@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/irq/msi.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -476,11 +476,6 @@ skip_activate: return 0;
cleanup: - for_each_msi_vector(desc, i, dev) { - irq_data = irq_domain_get_irq_data(domain, i); - if (irqd_is_activated(irq_data)) - irq_domain_deactivate_irq(irq_data); - } msi_domain_free_irqs(domain, dev); return ret; } @@ -505,7 +500,15 @@ int msi_domain_alloc_irqs(struct irq_dom
void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev) { + struct irq_data *irq_data; struct msi_desc *desc; + int i; + + for_each_msi_vector(desc, i, dev) { + irq_data = irq_domain_get_irq_data(domain, i); + if (irqd_is_activated(irq_data)) + irq_domain_deactivate_irq(irq_data); + }
for_each_msi_entry(desc, dev) { /*
From: Ben Dai ben.dai@unisoc.com
commit b9cc7d8a4656a6e815852c27ab50365009cb69c1 upstream.
When the interrupt interval is greater than 2 ^ PREDICTION_BUFFER_SIZE * PREDICTION_FACTOR us and less than 1s, the calculated index will be greater than the length of irqs->ema_time[]. Check the calculated index before using it to prevent array overflow.
Fixes: 23aa3b9a6b7d ("genirq/timings: Encapsulate storing function") Signed-off-by: Ben Dai ben.dai@unisoc.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210425150903.25456-1-ben.dai9703@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/irq/timings.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/kernel/irq/timings.c +++ b/kernel/irq/timings.c @@ -453,6 +453,11 @@ static __always_inline void __irq_timing */ index = irq_timings_interval_index(interval);
+ if (index > PREDICTION_BUFFER_SIZE - 1) { + irqs->count = 0; + return; + } + /* * Store the index as an element of the pattern in another * circular array.
From: Christophe Leroy christophe.leroy@csgroup.eu
commit 98694166c27d473c36b434bd3572934c2f2a16ab upstream.
An interrupt handler shall not be called from another interrupt handler, otherwise this leads to problems like the following:
Kernel attempted to write user page (afd4fa84) - exploit attempt? (uid: 1000) ------------[ cut here ]------------ Bug: Write fault blocked by KUAP! WARNING: CPU: 0 PID: 1617 at arch/powerpc/mm/fault.c:230 do_page_fault+0x484/0x720 Modules linked in: CPU: 0 PID: 1617 Comm: sshd Tainted: G W 5.13.0-pmac-00010-g8393422eb77 #7 NIP: c001b77c LR: c001b77c CTR: 00000000 REGS: cb9e5bc0 TRAP: 0700 Tainted: G W (5.13.0-pmac-00010-g8393422eb77) MSR: 00021032 <ME,IR,DR,RI> CR: 24942424 XER: 00000000
GPR00: c001b77c cb9e5c80 c1582c00 00000021 3ffffbff 085b0000 00000027 c8eb644c GPR08: 00000023 00000000 00000000 00000000 24942424 0063f8c8 00000000 000186a0 GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 c07640c4 cb9e5e98 cb9e5e90 GPR24: 00000040 afd4fa96 00000040 02000000 c1fda6c0 afd4fa84 00000300 cb9e5cc0 NIP [c001b77c] do_page_fault+0x484/0x720 LR [c001b77c] do_page_fault+0x484/0x720 Call Trace: [cb9e5c80] [c001b77c] do_page_fault+0x484/0x720 (unreliable) [cb9e5cb0] [c000424c] DataAccess_virt+0xd4/0xe4 --- interrupt: 300 at __copy_tofrom_user+0x110/0x20c NIP: c001f9b4 LR: c03250a0 CTR: 00000004 REGS: cb9e5cc0 TRAP: 0300 Tainted: G W (5.13.0-pmac-00010-g8393422eb77) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 48028468 XER: 20000000 DAR: afd4fa84 DSISR: 0a000000 GPR00: 20726f6f cb9e5d80 c1582c00 00000004 cb9e5e3a 00000016 afd4fa80 00000000 GPR08: 3835202d 72777872 2d78722d 00000004 28028464 0063f8c8 00000000 000186a0 GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 c07640c4 cb9e5e98 cb9e5e90 GPR24: 00000040 afd4fa96 00000040 cb9e5e0c 00000daa a0000000 cb9e5e98 afd4fa56 NIP [c001f9b4] __copy_tofrom_user+0x110/0x20c LR [c03250a0] _copy_to_iter+0x144/0x990 --- interrupt: 300 [cb9e5d80] [c03e89c0] n_tty_read+0xa4/0x598 (unreliable) [cb9e5df0] [c03e2a0c] tty_read+0xdc/0x2b4 [cb9e5e80] [c0156bf8] vfs_read+0x274/0x340 [cb9e5f00] [c01571ac] ksys_read+0x70/0x118 [cb9e5f30] [c0016048] ret_from_syscall+0x0/0x28 --- interrupt: c00 at 0xa7855c88 NIP: a7855c88 LR: a7855c5c CTR: 00000000 REGS: cb9e5f40 TRAP: 0c00 Tainted: G W (5.13.0-pmac-00010-g8393422eb77) MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 2402446c XER: 00000000
GPR00: 00000003 afd4ec70 a72137d0 0000000b afd4ecac 00004000 0065a990 00000800 GPR08: 00000000 a7947930 00000000 00000004 c15831b0 0063f8c8 00000000 000186a0 GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 0065a9e0 00000001 0065fac0 GPR24: 00000000 00000089 00664050 00000000 00668e30 a720c8dc a7943ff4 0065f9b0 NIP [a7855c88] 0xa7855c88 LR [a7855c5c] 0xa7855c5c --- interrupt: c00 Instruction dump: 3884aa88 38630178 48076861 807f0080 48042e45 2f830000 419e0148 3c80c079 3c60c076 38841be4 386301c0 4801f705 <0fe00000> 3860000b 4bfffe30 3c80c06b ---[ end trace fd69b91a8046c2e5 ]---
Here the problem is that by re-entering an exception handler, kuap_save_and_lock() is called a second time, this time with KUAP access locked, leading to regs->kuap being overwritten and hence KUAP not being unlocked at exception exit as expected.
Do not call do_IRQ() from timer_interrupt() directly. Instead, redefine do_IRQ() as a standard function named __do_IRQ(), and call it from both the do_IRQ() and timer_interrupt() handlers.
Fixes: 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers") Cc: stable@vger.kernel.org # v5.12+ Reported-by: Stan Johnson userm57@yahoo.com Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Reviewed-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/c17d234f4927d39a1d7100864a8e1145323d33a0.162861192... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/interrupt.h | 3 +++ arch/powerpc/include/asm/irq.h | 2 +- arch/powerpc/kernel/irq.c | 7 ++++++- arch/powerpc/kernel/time.c | 2 +- 4 files changed, 11 insertions(+), 3 deletions(-)
--- a/arch/powerpc/include/asm/interrupt.h +++ b/arch/powerpc/include/asm/interrupt.h @@ -531,6 +531,9 @@ DECLARE_INTERRUPT_HANDLER_NMI(hmi_except
DECLARE_INTERRUPT_HANDLER_ASYNC(TAUException);
+/* irq.c */ +DECLARE_INTERRUPT_HANDLER_ASYNC(do_IRQ); + void __noreturn unrecoverable_exception(struct pt_regs *regs);
void replay_system_reset(void); --- a/arch/powerpc/include/asm/irq.h +++ b/arch/powerpc/include/asm/irq.h @@ -53,7 +53,7 @@ extern void *mcheckirq_ctx[NR_CPUS]; extern void *hardirq_ctx[NR_CPUS]; extern void *softirq_ctx[NR_CPUS];
-extern void do_IRQ(struct pt_regs *regs); +void __do_IRQ(struct pt_regs *regs); extern void __init init_IRQ(void); extern void __do_irq(struct pt_regs *regs);
--- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -654,7 +654,7 @@ void __do_irq(struct pt_regs *regs) trace_irq_exit(regs); }
-DEFINE_INTERRUPT_HANDLER_ASYNC(do_IRQ) +void __do_IRQ(struct pt_regs *regs) { struct pt_regs *old_regs = set_irq_regs(regs); void *cursp, *irqsp, *sirqsp; @@ -678,6 +678,11 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(do_IRQ) set_irq_regs(old_regs); }
+DEFINE_INTERRUPT_HANDLER_ASYNC(do_IRQ) +{ + __do_IRQ(regs); +} + static void *__init alloc_vm_stack(void) { return __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, THREADINFO_GFP, --- a/arch/powerpc/kernel/time.c +++ b/arch/powerpc/kernel/time.c @@ -607,7 +607,7 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(timer_int
#if defined(CONFIG_PPC32) && defined(CONFIG_PPC_PMAC) if (atomic_read(&ppc_n_lost_interrupts) != 0) - do_IRQ(regs); + __do_IRQ(regs); #endif
old_regs = set_irq_regs(regs);
From: Thomas Gleixner tglx@linutronix.de
commit 438553958ba19296663c6d6583d208dfb6792830 upstream.
The ordering of MSI-X enable in hardware is dysfunctional:
1) MSI-X is disabled in the control register
2) Various setup functions
3) pci_msi_setup_msi_irqs() is invoked which ends up accessing the MSI-X table entries
4) MSI-X is enabled and masked in the control register with the comment that enabling is required for some hardware to access the MSI-X table
Step #4 obviously contradicts #3. The history of this is an issue with the NIU hardware. When #4 was introduced the table access actually happened in msix_program_entries() which was invoked after enabling and masking MSI-X.
This was changed in commit d71d6432e105 ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts") which removed the table write from msix_program_entries().
Interestingly enough nobody noticed and either NIU still works or it did not get any testing with a kernel 3.19 or later.
Nevertheless this is inconsistent and there is no reason why MSI-X can't be enabled and masked in the control register early on, i.e. move step #4 above to step #1. This preserves the NIU workaround and has no side effects on other hardware.
Fixes: d71d6432e105 ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts") Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Marc Zyngier maz@kernel.org Reviewed-by: Ashok Raj ashok.raj@intel.com Reviewed-by: Marc Zyngier maz@kernel.org Acked-by: Bjorn Helgaas bhelgaas@google.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.344136412@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/msi.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-)
--- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -772,18 +772,25 @@ static int msix_capability_init(struct p u16 control; void __iomem *base;
- /* Ensure MSI-X is disabled while it is set up */ - pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); + /* + * Some devices require MSI-X to be enabled before the MSI-X + * registers can be accessed. Mask all the vectors to prevent + * interrupts coming in before they're fully set up. + */ + pci_msix_clear_and_set_ctrl(dev, 0, PCI_MSIX_FLAGS_MASKALL | + PCI_MSIX_FLAGS_ENABLE);
pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control); /* Request & Map MSI-X table region */ base = msix_map_region(dev, msix_table_size(control)); - if (!base) - return -ENOMEM; + if (!base) { + ret = -ENOMEM; + goto out_disable; + }
ret = msix_setup_entries(dev, base, entries, nvec, affd); if (ret) - return ret; + goto out_disable;
ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSIX); if (ret) @@ -794,14 +801,6 @@ static int msix_capability_init(struct p if (ret) goto out_free;
- /* - * Some devices require MSI-X to be enabled before we can touch the - * MSI-X registers. We need to mask all the vectors to prevent - * interrupts coming in before they're fully set up. - */ - pci_msix_clear_and_set_ctrl(dev, 0, - PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE); - msix_program_entries(dev, entries);
ret = populate_msi_sysfs(dev); @@ -836,6 +835,9 @@ out_avail: out_free: free_msi_irqs(dev);
+out_disable: + pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); + return ret; }
From: Thomas Gleixner tglx@linutronix.de
commit 7d5ec3d3612396dc6d4b76366d20ab9fc06f399f upstream.
When MSI-X is enabled the ordering of calls is:
msix_map_region();
msix_setup_entries();
pci_msi_setup_msi_irqs();
msix_program_entries();
This has a few interesting issues:
1) msix_setup_entries() allocates the MSI descriptors and initializes them except for the msi_desc:masked member which is left zero initialized.
2) pci_msi_setup_msi_irqs() allocates the interrupt descriptors and sets up the MSI interrupts which ends up in pci_write_msi_msg() unless the interrupt chip provides its own irq_write_msi_msg() function.
3) msix_program_entries() does not do what the name suggests. It solely updates the entries array (if not NULL) and initializes the masked member for each MSI descriptor by reading the hardware state and then masks the entry.
Obviously this has some issues:
1) The uninitialized masked member of msi_desc prevents the enforcement of masking the entry in pci_write_msi_msg() depending on the cached masked bit. Aside from that, half-initialized data is a no-no in general.
2) msix_program_entries() only ensures that the actually allocated entries are masked. This is wrong as experimentation with crash testing and crash kernel kexec has shown.
This limited testing unearthed that when the production kernel had more entries in use (and unmasked) at the time it crashed, and the crash kernel allocated a smaller number of entries, a full scan of all entries found unmasked entries which were still in use by the production kernel.
This is obviously a device or emulation issue as the device reset should mask all MSI-X table entries, but obviously that's just part of the paper specification.
Cure this by:
1) Masking all table entries in hardware
2) Initializing msi_desc::masked in msix_setup_entries()
3) Removing the mask dance in msix_program_entries()
4) Renaming msix_program_entries() to msix_update_entries() to reflect the purpose of that function.
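As a rough user-space sketch of point 1 (the table layout, names and constants here are simplified stand-ins, not the kernel's definitions or the code touched by the patch):

#include <stdint.h>

#define ENTRY_WORDS		4	/* addr_lo, addr_hi, data, vector_ctrl */
#define VECTOR_CTRL_WORD	3
#define ENTRY_MASKBIT		0x1u

/* Illustrative model: walk every slot of the table (tsize comes from the
 * MSI-X capability), not just the vectors actually allocated, and set
 * the per-entry mask bit. */
static void mask_all_entries(volatile uint32_t *table, int tsize)
{
	int i;

	for (i = 0; i < tsize; i++)
		table[i * ENTRY_WORDS + VECTOR_CTRL_WORD] = ENTRY_MASKBIT;
}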
As the masking of unused entries has never been done the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support")
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Tested-by: Marc Zyngier maz@kernel.org
Reviewed-by: Marc Zyngier maz@kernel.org
Acked-by: Bjorn Helgaas bhelgaas@google.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.403833459@linutronix.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/msi.c | 45 +++++++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 18 deletions(-)
--- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -691,6 +691,7 @@ static int msix_setup_entries(struct pci { struct irq_affinity_desc *curmsk, *masks = NULL; struct msi_desc *entry; + void __iomem *addr; int ret, i; int vec_count = pci_msix_vec_count(dev);
@@ -711,6 +712,7 @@ static int msix_setup_entries(struct pci
entry->msi_attrib.is_msix = 1; entry->msi_attrib.is_64 = 1; + if (entries) entry->msi_attrib.entry_nr = entries[i].entry; else @@ -722,6 +724,10 @@ static int msix_setup_entries(struct pci entry->msi_attrib.default_irq = dev->irq; entry->mask_base = base;
+ addr = pci_msix_desc_addr(entry); + if (addr) + entry->masked = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL); + list_add_tail(&entry->list, dev_to_msi_list(&dev->dev)); if (masks) curmsk++; @@ -732,26 +738,25 @@ out: return ret; }
-static void msix_program_entries(struct pci_dev *dev, - struct msix_entry *entries) +static void msix_update_entries(struct pci_dev *dev, struct msix_entry *entries) { struct msi_desc *entry; - int i = 0; - void __iomem *desc_addr;
for_each_pci_msi_entry(entry, dev) { - if (entries) - entries[i++].vector = entry->irq; + if (entries) { + entries->vector = entry->irq; + entries++; + } + } +}
- desc_addr = pci_msix_desc_addr(entry); - if (desc_addr) - entry->masked = readl(desc_addr + - PCI_MSIX_ENTRY_VECTOR_CTRL); - else - entry->masked = 0; +static void msix_mask_all(void __iomem *base, int tsize) +{ + u32 ctrl = PCI_MSIX_ENTRY_CTRL_MASKBIT; + int i;
- msix_mask_irq(entry, 1); - } + for (i = 0; i < tsize; i++, base += PCI_MSIX_ENTRY_SIZE) + writel(ctrl, base + PCI_MSIX_ENTRY_VECTOR_CTRL); }
/** @@ -768,9 +773,9 @@ static void msix_program_entries(struct static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, int nvec, struct irq_affinity *affd) { - int ret; - u16 control; void __iomem *base; + int ret, tsize; + u16 control;
/* * Some devices require MSI-X to be enabled before the MSI-X @@ -782,12 +787,16 @@ static int msix_capability_init(struct p
pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control); /* Request & Map MSI-X table region */ - base = msix_map_region(dev, msix_table_size(control)); + tsize = msix_table_size(control); + base = msix_map_region(dev, tsize); if (!base) { ret = -ENOMEM; goto out_disable; }
+ /* Ensure that all table entries are masked. */ + msix_mask_all(base, tsize); + ret = msix_setup_entries(dev, base, entries, nvec, affd); if (ret) goto out_disable; @@ -801,7 +810,7 @@ static int msix_capability_init(struct p if (ret) goto out_free;
- msix_program_entries(dev, entries); + msix_update_entries(dev, entries);
ret = populate_msi_sysfs(dev); if (ret)
From: Thomas Gleixner tglx@linutronix.de
commit da181dc974ad667579baece33c2c8d2d1e4558d5 upstream.
The specification (PCIe r5.0, sec 6.1.4.5) states:
For MSI-X, a function is permitted to cache Address and Data values from unmasked MSI-X Table entries. However, anytime software unmasks a currently masked MSI-X Table entry either by clearing its Mask bit or by clearing the Function Mask bit, the function must update any Address or Data values that it cached from that entry. If software changes the Address or Data value of an entry while the entry is unmasked, the result is undefined.
The Linux kernel's MSI-X support never enforced that the entry is masked before the entry is modified, hence the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Enforce the entry to be masked across the update.
There is no point in enforcing this to be handled at all possible call sites as this is just pointless code duplication and the common update function is the obvious place to enforce this.
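As a user-space sketch of the enforced sequence (simplified entry layout, not the kernel's structures or the code in the patch below):

#include <stdint.h>
#include <stdbool.h>

#define ENTRY_MASKBIT	0x1u

struct msix_entry_regs {		/* illustrative model of one table entry */
	volatile uint32_t addr_lo;
	volatile uint32_t addr_hi;
	volatile uint32_t data;
	volatile uint32_t vector_ctrl;
};

/* Mask the entry across the address/data update if it was unmasked,
 * then restore the previous mask state afterwards, as the spec requires. */
static void write_entry(struct msix_entry_regs *e, uint64_t addr, uint32_t data)
{
	bool unmasked = !(e->vector_ctrl & ENTRY_MASKBIT);

	if (unmasked)
		e->vector_ctrl |= ENTRY_MASKBIT;

	e->addr_lo = (uint32_t)addr;
	e->addr_hi = (uint32_t)(addr >> 32);
	e->data    = data;

	if (unmasked)
		e->vector_ctrl &= ~ENTRY_MASKBIT;
}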
Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support")
Reported-by: Kevin Tian kevin.tian@intel.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Tested-by: Marc Zyngier maz@kernel.org
Reviewed-by: Marc Zyngier maz@kernel.org
Acked-by: Bjorn Helgaas bhelgaas@google.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.462096385@linutronix.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/msi.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
--- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -289,13 +289,28 @@ void __pci_write_msi_msg(struct msi_desc /* Don't touch the hardware now */ } else if (entry->msi_attrib.is_msix) { void __iomem *base = pci_msix_desc_addr(entry); + bool unmasked = !(entry->masked & PCI_MSIX_ENTRY_CTRL_MASKBIT);
if (!base) goto skip;
+ /* + * The specification mandates that the entry is masked + * when the message is modified: + * + * "If software changes the Address or Data value of an + * entry while the entry is unmasked, the result is + * undefined." + */ + if (unmasked) + __pci_msix_desc_mask_irq(entry, PCI_MSIX_ENTRY_CTRL_MASKBIT); + writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR); writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR); writel(msg->data, base + PCI_MSIX_ENTRY_DATA); + + if (unmasked) + __pci_msix_desc_mask_irq(entry, 0); } else { int pos = dev->msi_cap; u16 msgctl;
From: Thomas Gleixner tglx@linutronix.de
commit b9255a7cb51754e8d2645b65dd31805e282b4f3e upstream.
Nothing enforces the posted writes to be visible when the function returns. Flush them even if the flush might be redundant when the entry is already masked, as the unmask will flush as well. This is either setup or a rare affinity change event, so the extra flush is not the end of the world.
While this is more a theoretical issue especially the logic in the X86 specific msi_set_affinity() function relies on the assumption that the update has reached the hardware when the function returns.
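The pattern boils down to a dummy read after the writes. A minimal sketch (illustrative only; in the kernel the read-back is a readl() of a table register or a config-space read of the MSI flags):

#include <stdint.h>

/* Posted MMIO writes are only guaranteed to have reached the device once
 * a subsequent read from the same device has completed, so read something
 * back before returning. */
static void flush_posted_writes(volatile uint32_t *any_device_reg)
{
	(void)*any_device_reg;
}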
Again, as this never has been enforced the Fixes tag refers to a commit in: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support")
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Tested-by: Marc Zyngier maz@kernel.org
Reviewed-by: Marc Zyngier maz@kernel.org
Acked-by: Bjorn Helgaas bhelgaas@google.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.515188147@linutronix.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/msi.c | 5 +++++
 1 file changed, 5 insertions(+)
--- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -311,6 +311,9 @@ void __pci_write_msi_msg(struct msi_desc
if (unmasked) __pci_msix_desc_mask_irq(entry, 0); + + /* Ensure that the writes are visible in the device */ + readl(base + PCI_MSIX_ENTRY_DATA); } else { int pos = dev->msi_cap; u16 msgctl; @@ -331,6 +334,8 @@ void __pci_write_msi_msg(struct msi_desc pci_write_config_word(dev, pos + PCI_MSI_DATA_32, msg->data); } + /* Ensure that the writes are visible in the device */ + pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl); }
skip:
From: Thomas Gleixner tglx@linutronix.de
commit 361fd37397f77578735907341579397d5bed0a2d upstream.
msi_mask_irq() takes a mask and a flags argument. The mask argument is used to mask out bits from the cached mask and the flags argument to set bits.
Some places invoke it with a flags argument that sets bits which are not used by the device, e.g. when the device supports up to 8 vectors, a full unmask in those places sets the mask to 0xFFFFFF00. While devices probably do not care, it's still bad practice.
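A sketch of how the two arguments combine into the cached mask, and why passing ~mask as the flags sets bits the device does not implement (illustrative only, not the kernel helper itself):

#include <stdint.h>

/* 'mask' selects the bits to clear, 'flag' the bits to set.  With an
 * 8-vector device mask is 0xff; passing ~mask (0xffffff00) as 'flag'
 * sets the 24 upper bits the device does not implement, whereas
 * passing 0 leaves them alone. */
static uint32_t update_cached_mask(uint32_t cached, uint32_t mask, uint32_t flag)
{
	cached &= ~mask;
	cached |= flag;
	return cached;
}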
Fixes: 7ba1930db02f ("PCI MSI: Unmask MSI if setup failed")
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Tested-by: Marc Zyngier maz@kernel.org
Reviewed-by: Marc Zyngier maz@kernel.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.568173099@linutronix.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/msi.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -656,21 +656,21 @@ static int msi_capability_init(struct pc /* Configure MSI capability structure */ ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI); if (ret) { - msi_mask_irq(entry, mask, ~mask); + msi_mask_irq(entry, mask, 0); free_msi_irqs(dev); return ret; }
ret = msi_verify_entries(dev); if (ret) { - msi_mask_irq(entry, mask, ~mask); + msi_mask_irq(entry, mask, 0); free_msi_irqs(dev); return ret; }
ret = populate_msi_sysfs(dev); if (ret) { - msi_mask_irq(entry, mask, ~mask); + msi_mask_irq(entry, mask, 0); free_msi_irqs(dev); return ret; } @@ -962,7 +962,7 @@ static void pci_msi_shutdown(struct pci_ /* Return the device with MSI unmasked as initial states */ mask = msi_mask(desc->msi_attrib.multi_cap); /* Keep cached state to be restored */ - __pci_msi_desc_mask_irq(desc, mask, ~mask); + __pci_msi_desc_mask_irq(desc, mask, 0);
/* Restore dev->irq to its default pin-assertion IRQ */ dev->irq = desc->msi_attrib.default_irq;
From: Thomas Gleixner tglx@linutronix.de
commit 689e6b5351573c38ccf92a0dd8b3e2c2241e4aff upstream.
The comments about preserving the cached state in pci_msi[x]_shutdown() are misleading as the MSI descriptors are freed right after those functions return. So there is nothing to restore. Preparatory change.
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Tested-by: Marc Zyngier maz@kernel.org
Reviewed-by: Marc Zyngier maz@kernel.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.621609423@linutronix.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/msi.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -961,7 +961,6 @@ static void pci_msi_shutdown(struct pci_
/* Return the device with MSI unmasked as initial states */ mask = msi_mask(desc->msi_attrib.multi_cap); - /* Keep cached state to be restored */ __pci_msi_desc_mask_irq(desc, mask, 0);
/* Restore dev->irq to its default pin-assertion IRQ */ @@ -1047,10 +1046,8 @@ static void pci_msix_shutdown(struct pci }
/* Return the device with MSI-X masked as initial states */ - for_each_pci_msi_entry(entry, dev) { - /* Keep cached states to be restored */ + for_each_pci_msi_entry(entry, dev) __pci_msix_desc_mask_irq(entry, 1); - }
pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); pci_intx_for_msi(dev, 1);
From: Thomas Gleixner tglx@linutronix.de
commit d28d4ad2a1aef27458b3383725bb179beb8d015c upstream.
No point in using the raw write function from shutdown. Preparatory change to introduce proper serialization for the msi_desc::masked cache.
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Tested-by: Marc Zyngier maz@kernel.org
Reviewed-by: Marc Zyngier maz@kernel.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.674391354@linutronix.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/pci/msi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -961,7 +961,7 @@ static void pci_msi_shutdown(struct pci_
/* Return the device with MSI unmasked as initial states */ mask = msi_mask(desc->msi_attrib.multi_cap); - __pci_msi_desc_mask_irq(desc, mask, 0); + msi_mask_irq(desc, mask, 0);
/* Restore dev->irq to its default pin-assertion IRQ */ dev->irq = desc->msi_attrib.default_irq;
From: Thomas Gleixner tglx@linutronix.de
commit 77e89afc25f30abd56e76a809ee2884d7c1b63ce upstream.
Multi-MSI uses a single MSI descriptor and there is a single mask register when the device supports per vector masking. To avoid reading back the mask register the value is cached in the MSI descriptor and updates are done by clearing and setting bits in the cache and writing it to the device.
But nothing protects msi_desc::masked and the mask register from being modified concurrently on two different CPUs for two different Linux interrupts which belong to the same multi-MSI descriptor.
Add a lock to struct device and protect any operation on the mask and the mask register with it.
This makes the update of msi_desc::masked unconditional, but there is no place which requires a modification of the hardware register without updating the masked cache.
msi_mask_irq() is now an empty wrapper which will be cleaned up in follow up changes.
The problem goes way back to the initial support of multi-MSI, but picking the commit which introduced the mask cache is a valid cut off point (2.6.30).
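As a user-space sketch of the now-serialized update of the cached mask (pthread mutex and plain types standing in for the raw spinlock and the real msi_desc; not the kernel code in the patch below):

#include <pthread.h>
#include <stdint.h>

struct msi_state {
	pthread_mutex_t lock;		/* models dev->msi_lock */
	uint32_t masked;		/* cached mask register value */
};

/* Both the cache update and the (modeled) register write happen under
 * the lock, so two vectors of the same multi-MSI block racing on
 * different CPUs can no longer lose each other's mask bits. */
static void mask_irq(struct msi_state *s, volatile uint32_t *reg,
		     uint32_t mask, uint32_t flag)
{
	pthread_mutex_lock(&s->lock);
	s->masked &= ~mask;
	s->masked |= flag;
	*reg = s->masked;
	pthread_mutex_unlock(&s->lock);
}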
Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code")
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Tested-by: Marc Zyngier maz@kernel.org
Reviewed-by: Marc Zyngier maz@kernel.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.726833414@linutronix.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/base/core.c | 1 +
 drivers/pci/msi.c | 19 ++++++++++---------
 include/linux/device.h | 1 +
 include/linux/msi.h | 2 +-
 4 files changed, 13 insertions(+), 10 deletions(-)
--- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2809,6 +2809,7 @@ void device_initialize(struct device *de device_pm_init(dev); set_dev_node(dev, -1); #ifdef CONFIG_GENERIC_MSI_IRQ + raw_spin_lock_init(&dev->msi_lock); INIT_LIST_HEAD(&dev->msi_list); #endif INIT_LIST_HEAD(&dev->links.consumers); --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -143,24 +143,25 @@ static inline __attribute_const__ u32 ms * reliably as devices without an INTx disable bit will then generate a * level IRQ which will never be cleared. */ -u32 __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) +void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) { - u32 mask_bits = desc->masked; + raw_spinlock_t *lock = &desc->dev->msi_lock; + unsigned long flags;
if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit) - return 0; + return;
- mask_bits &= ~mask; - mask_bits |= flag; + raw_spin_lock_irqsave(lock, flags); + desc->masked &= ~mask; + desc->masked |= flag; pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->mask_pos, - mask_bits); - - return mask_bits; + desc->masked); + raw_spin_unlock_irqrestore(lock, flags); }
static void msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag) { - desc->masked = __pci_msi_desc_mask_irq(desc, mask, flag); + __pci_msi_desc_mask_irq(desc, mask, flag); }
static void __iomem *pci_msix_desc_addr(struct msi_desc *desc) --- a/include/linux/device.h +++ b/include/linux/device.h @@ -496,6 +496,7 @@ struct device { struct dev_pin_info *pins; #endif #ifdef CONFIG_GENERIC_MSI_IRQ + raw_spinlock_t msi_lock; struct list_head msi_list; #endif #ifdef CONFIG_DMA_OPS --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -233,7 +233,7 @@ void __pci_read_msi_msg(struct msi_desc void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg);
u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag); -u32 __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag); +void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag); void pci_msi_mask_irq(struct irq_data *data); void pci_msi_unmask_irq(struct irq_data *data);
From: Christophe Leroy christophe.leroy@csgroup.eu
commit 01fcac8e4dfc112f420dcaeb70056a74e326cacf upstream.
single_step_exception() is called by emulate_single_step(), which is called from (at least) the alignment_exception() handler and the program_check_exception() handler.
Redefine it as a regular __single_step_exception() which is called by both single_step_exception() handler and emulate_single_step() function.
Fixes: 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers")
Cc: stable@vger.kernel.org # v5.12+
Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu
Reviewed-by: Nicholas Piggin npiggin@gmail.com
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/aed174f5cbc06f2cf95233c071d8aac948e46043.162861192...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kernel/traps.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -1103,7 +1103,7 @@ DEFINE_INTERRUPT_HANDLER(RunModeExceptio _exception(SIGTRAP, regs, TRAP_UNK, 0); }
-DEFINE_INTERRUPT_HANDLER(single_step_exception) +static void __single_step_exception(struct pt_regs *regs) { clear_single_step(regs); clear_br_trace(regs); @@ -1120,6 +1120,11 @@ DEFINE_INTERRUPT_HANDLER(single_step_exc _exception(SIGTRAP, regs, TRAP_TRACE, regs->nip); }
+DEFINE_INTERRUPT_HANDLER(single_step_exception) +{ + __single_step_exception(regs); +} + /* * After we have successfully emulated an instruction, we have to * check if the instruction was being single-stepped, and if so, @@ -1129,7 +1134,7 @@ DEFINE_INTERRUPT_HANDLER(single_step_exc static void emulate_single_step(struct pt_regs *regs) { if (single_stepping(regs)) - single_step_exception(regs); + __single_step_exception(regs); }
static inline int __parse_fpscr(unsigned long fpscr)
From: Laurent Dufour ldufour@linux.ibm.com
commit c18956e6e0b95f78dad2773ecc8c61a9e41f6405 upstream.
After LPM, when migrating from a system with security mitigation enabled to a system with mitigation disabled, the security flavor exposed in /proc is not correctly set back to 0.
Do not assume the value of the security flavor is set to 0 when entering init_cpu_char_feature_flags(), so that, when called after an LPM, the value is set correctly even if the mitigations are not turned off.
Fixes: 6ce56e1ac380 ("powerpc/pseries: export LPAR security flavor in lparcfg")
Cc: stable@vger.kernel.org # v5.13+
Signed-off-by: Laurent Dufour ldufour@linux.ibm.com
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/20210805152308.33988-1-ldufour@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/platforms/pseries/setup.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -539,9 +539,10 @@ static void init_cpu_char_feature_flags( * H_CPU_BEHAV_FAVOUR_SECURITY_H could be set only if * H_CPU_BEHAV_FAVOUR_SECURITY is. */ - if (!(result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY)) + if (!(result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY)) { security_ftr_clear(SEC_FTR_FAVOUR_SECURITY); - else if (result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY_H) + pseries_security_flavor = 0; + } else if (result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY_H) pseries_security_flavor = 1; else pseries_security_flavor = 2;
From: Christophe Leroy christophe.leroy@csgroup.eu
commit 62376365048878f770d8b7d11b89b8b3e18018f1 upstream.
When a DSI (Data Storage Interrupt) is taken while in NAP mode, r11 doesn't survive the call to power_save_ppc32_restore().
So use r1 instead of r11 as they both contain the virtual stack pointer at that point.
Fixes: 4c0104a83fc3 ("powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: Finn Thain fthain@linux-m68k.org
Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/731694e0885271f6ee9ffc179eb4bcee78313682.162800356...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kernel/head_book3s_32.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/powerpc/kernel/head_book3s_32.S +++ b/arch/powerpc/kernel/head_book3s_32.S @@ -300,7 +300,7 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_HP EXCEPTION_PROLOG_1 EXCEPTION_PROLOG_2 INTERRUPT_DATA_STORAGE DataAccess handle_dar_dsisr=1 prepare_transfer_to_handler - lwz r5, _DSISR(r11) + lwz r5, _DSISR(r1) andis. r0, r5, DSISR_DABRMATCH@h bne- 1f bl do_page_fault
From: Christophe Leroy christophe.leroy@csgroup.eu
commit 8241461536f21bbe51308a6916d1c9fb2e6b75a7 upstream.
Running an SMP kernel on an UP platform not prepared for it, I encountered the following OOPS:
BUG: Kernel NULL pointer dereference on read at 0x00000034
Faulting instruction address: 0xc0a04110
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K SMP NR_CPUS=2 CMPCPRO
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-pmac-00001-g230fedfaad21 #5234
NIP: c0a04110 LR: c0a040d8 CTR: c0a04084
REGS: e100dda0 TRAP: 0300 Not tainted (5.13.0-pmac-00001-g230fedfaad21)
MSR: 00009032 <EE,ME,IR,DR,RI> CR: 84000284 XER: 00000000
DAR: 00000034 DSISR: 20000000
GPR00: c0006bd4 e100de60 c1033320 00000000 00000000 c0942274 00000000 00000000
GPR08: 00000000 00000000 00000001 00000063 00000007 00000000 c0006f30 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000005
GPR24: c0c67d74 c0c67f1c c0c60000 c0c67d70 c0c0c558 1efdf000 c0c00020 00000000
NIP [c0a04110] topology_init+0x8c/0x138
LR [c0a040d8] topology_init+0x54/0x138
Call Trace:
[e100de60] [80808080] 0x80808080 (unreliable)
[e100de90] [c0006bd4] do_one_initcall+0x48/0x1bc
[e100def0] [c0a0150c] kernel_init_freeable+0x1c8/0x278
[e100df20] [c0006f44] kernel_init+0x14/0x10c
[e100df30] [c00190fc] ret_from_kernel_thread+0x14/0x1c
Instruction dump:
7c692e70 7d290194 7c035040 7c7f1b78 5529103a 546706fe 5468103a 39400001
7c641b78 40800054 80c690b4 7fb9402e <81060034> 7fbeea14 2c080000 7fa3eb78
---[ end trace b246ffbc6bbbb6fb ]---
Fix it by checking smp_ops before using it, as is already done in several other places in arch/powerpc/kernel/smp.c.
Fixes: 39f87561454d ("powerpc/smp: Move ppc_md.cpu_die() to smp_ops.cpu_offline_self()")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/75287841cbb8740edd44880fe60be66d489160d9.162809799...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/kernel/sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -1167,7 +1167,7 @@ static int __init topology_init(void) * CPU. For instance, the boot cpu might never be valid * for hotplugging. */ - if (smp_ops->cpu_offline_self) + if (smp_ops && smp_ops->cpu_offline_self) c->hotpluggable = 1; #endif
From: Cédric Le Goater clg@kaod.org
commit cbc06f051c524dcfe52ef0d1f30647828e226d30 upstream.
On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at runtime. Today, the IPI is not created for such nodes, and hot-plugged CPUs use a bogus IPI, which leads to soft lockups.
We can not directly allocate and request the IPI on demand because bringup_up() is called under the IRQ sparse lock. The alternative is to allocate the IPIs for all possible nodes at startup and to request the mapping on demand when the first CPU of a node is brought up.
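As a rough sketch of the per-node one-shot guard (C11 atomics standing in for the kernel's atomic_t; the structure and names are simplified, not the patch's code):

#include <stdatomic.h>

struct node_ipi {
	atomic_int started;	/* 0 until the first CPU of the node shows up */
	int irq;
};

/* Only the first caller for a given node actually requests the IPI;
 * later CPUs of that node see a non-zero count and return early. */
static int request_node_ipi(struct node_ipi *xid)
{
	if (atomic_fetch_add(&xid->started, 1) > 0)
		return 0;

	/* the one-time request of xid->irq would go here */
	return 0;
}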
Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")
Cc: stable@vger.kernel.org # v5.13
Reported-by: Geetika Moolchandani Geetika.Moolchandani1@ibm.com
Signed-off-by: Cédric Le Goater clg@kaod.org
Tested-by: Srikar Dronamraju srikar@linux.vnet.ibm.com
Tested-by: Laurent Vivier lvivier@redhat.com
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/20210807072057.184698-1-clg@kaod.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/sysdev/xive/common.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)
--- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -67,6 +67,7 @@ static struct irq_domain *xive_irq_domai static struct xive_ipi_desc { unsigned int irq; char name[16]; + atomic_t started; } *xive_ipis;
/* @@ -1120,7 +1121,7 @@ static const struct irq_domain_ops xive_ .alloc = xive_ipi_irq_domain_alloc, };
-static int __init xive_request_ipi(void) +static int __init xive_init_ipis(void) { struct fwnode_handle *fwnode; struct irq_domain *ipi_domain; @@ -1144,10 +1145,6 @@ static int __init xive_request_ipi(void) struct xive_ipi_desc *xid = &xive_ipis[node]; struct xive_ipi_alloc_info info = { node };
- /* Skip nodes without CPUs */ - if (cpumask_empty(cpumask_of_node(node))) - continue; - /* * Map one IPI interrupt per node for all cpus of that node. * Since the HW interrupt number doesn't have any meaning, @@ -1159,11 +1156,6 @@ static int __init xive_request_ipi(void) xid->irq = ret;
snprintf(xid->name, sizeof(xid->name), "IPI-%d", node); - - ret = request_irq(xid->irq, xive_muxed_ipi_action, - IRQF_PERCPU | IRQF_NO_THREAD, xid->name, NULL); - - WARN(ret < 0, "Failed to request IPI %d: %d\n", xid->irq, ret); }
return ret; @@ -1178,6 +1170,22 @@ out: return ret; }
+static int __init xive_request_ipi(unsigned int cpu) +{ + struct xive_ipi_desc *xid = &xive_ipis[early_cpu_to_node(cpu)]; + int ret; + + if (atomic_inc_return(&xid->started) > 1) + return 0; + + ret = request_irq(xid->irq, xive_muxed_ipi_action, + IRQF_PERCPU | IRQF_NO_THREAD, + xid->name, NULL); + + WARN(ret < 0, "Failed to request IPI %d: %d\n", xid->irq, ret); + return ret; +} + static int xive_setup_cpu_ipi(unsigned int cpu) { unsigned int xive_ipi_irq = xive_ipi_cpu_to_irq(cpu); @@ -1192,6 +1200,9 @@ static int xive_setup_cpu_ipi(unsigned i if (xc->hw_ipi != XIVE_BAD_IRQ) return 0;
+ /* Register the IPI */ + xive_request_ipi(cpu); + /* Grab an IPI from the backend, this will populate xc->hw_ipi */ if (xive_ops->get_ipi(cpu, xc)) return -EIO; @@ -1231,6 +1242,8 @@ static void xive_cleanup_cpu_ipi(unsigne if (xc->hw_ipi == XIVE_BAD_IRQ) return;
+ /* TODO: clear IPI mapping */ + /* Mask the IPI */ xive_do_source_set_mask(&xc->ipi_data, true);
@@ -1253,7 +1266,7 @@ void __init xive_smp_probe(void) smp_ops->cause_ipi = xive_cause_ipi;
/* Register the IPI */ - xive_request_ipi(); + xive_init_ipis();
/* Allocate and setup IPI for the boot CPU */ xive_setup_cpu_ipi(smp_processor_id());
From: Christophe Leroy christophe.leroy@csgroup.eu
commit b5cfc9cd7b0426e94ffd9e9ed79d1b00ace7780a upstream.
32-bit BOOKE has special interrupts for debug and other critical events.
When handling those interrupts, dedicated registers are saved in the stack frame in addition to the standard registers, leading to a shift of the pt_regs struct.
Since commit db297c3b07af ("powerpc/32: Don't save thread.regs on interrupt entry"), the pt_regs struct is expected to be at the same place all the time.
Instead of handling a special struct in addition to pt_regs, just add those special registers to struct pt_regs.
Fixes: db297c3b07af ("powerpc/32: Don't save thread.regs on interrupt entry")
Cc: stable@vger.kernel.org
Reported-by: Radu Rendec radu.rendec@gmail.com
Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu
Signed-off-by: Michael Ellerman mpe@ellerman.id.au
Link: https://lore.kernel.org/r/028d5483b4851b01ea4334d0751e7f260419092b.162563726...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/include/asm/ptrace.h | 16 ++++++++++++++++
 arch/powerpc/kernel/asm-offsets.c | 31 ++++++++++++++-----------------
 arch/powerpc/kernel/head_booke.h | 27 +++------------------------
 3 files changed, 33 insertions(+), 41 deletions(-)
--- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -68,6 +68,22 @@ struct pt_regs }; unsigned long __pad[4]; /* Maintain 16 byte interrupt stack alignment */ }; +#if defined(CONFIG_PPC32) && defined(CONFIG_BOOKE) + struct { /* Must be a multiple of 16 bytes */ + unsigned long mas0; + unsigned long mas1; + unsigned long mas2; + unsigned long mas3; + unsigned long mas6; + unsigned long mas7; + unsigned long srr0; + unsigned long srr1; + unsigned long csrr0; + unsigned long csrr1; + unsigned long dsrr0; + unsigned long dsrr1; + }; +#endif }; #endif
--- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -348,24 +348,21 @@ int main(void) #endif
-#if defined(CONFIG_PPC32) -#if defined(CONFIG_BOOKE) || defined(CONFIG_40x) - DEFINE(EXC_LVL_SIZE, STACK_EXC_LVL_FRAME_SIZE); - DEFINE(MAS0, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas0)); +#if defined(CONFIG_PPC32) && defined(CONFIG_BOOKE) + STACK_PT_REGS_OFFSET(MAS0, mas0); /* we overload MMUCR for 44x on MAS0 since they are mutually exclusive */ - DEFINE(MMUCR, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas0)); - DEFINE(MAS1, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas1)); - DEFINE(MAS2, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas2)); - DEFINE(MAS3, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas3)); - DEFINE(MAS6, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas6)); - DEFINE(MAS7, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas7)); - DEFINE(_SRR0, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, srr0)); - DEFINE(_SRR1, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, srr1)); - DEFINE(_CSRR0, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, csrr0)); - DEFINE(_CSRR1, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, csrr1)); - DEFINE(_DSRR0, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, dsrr0)); - DEFINE(_DSRR1, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, dsrr1)); -#endif + STACK_PT_REGS_OFFSET(MMUCR, mas0); + STACK_PT_REGS_OFFSET(MAS1, mas1); + STACK_PT_REGS_OFFSET(MAS2, mas2); + STACK_PT_REGS_OFFSET(MAS3, mas3); + STACK_PT_REGS_OFFSET(MAS6, mas6); + STACK_PT_REGS_OFFSET(MAS7, mas7); + STACK_PT_REGS_OFFSET(_SRR0, srr0); + STACK_PT_REGS_OFFSET(_SRR1, srr1); + STACK_PT_REGS_OFFSET(_CSRR0, csrr0); + STACK_PT_REGS_OFFSET(_CSRR1, csrr1); + STACK_PT_REGS_OFFSET(_DSRR0, dsrr0); + STACK_PT_REGS_OFFSET(_DSRR1, dsrr1); #endif
#ifndef CONFIG_PPC64 --- a/arch/powerpc/kernel/head_booke.h +++ b/arch/powerpc/kernel/head_booke.h @@ -185,20 +185,18 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV /* only on e500mc */ #define DBG_STACK_BASE dbgirq_ctx
-#define EXC_LVL_FRAME_OVERHEAD (THREAD_SIZE - INT_FRAME_SIZE - EXC_LVL_SIZE) - #ifdef CONFIG_SMP #define BOOKE_LOAD_EXC_LEVEL_STACK(level) \ mfspr r8,SPRN_PIR; \ slwi r8,r8,2; \ addis r8,r8,level##_STACK_BASE@ha; \ lwz r8,level##_STACK_BASE@l(r8); \ - addi r8,r8,EXC_LVL_FRAME_OVERHEAD; + addi r8,r8,THREAD_SIZE - INT_FRAME_SIZE; #else #define BOOKE_LOAD_EXC_LEVEL_STACK(level) \ lis r8,level##_STACK_BASE@ha; \ lwz r8,level##_STACK_BASE@l(r8); \ - addi r8,r8,EXC_LVL_FRAME_OVERHEAD; + addi r8,r8,THREAD_SIZE - INT_FRAME_SIZE; #endif
/* @@ -225,7 +223,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV mtmsr r11; \ mfspr r11,SPRN_SPRG_THREAD; /* if from user, start at top of */\ lwz r11, TASK_STACK - THREAD(r11); /* this thread's kernel stack */\ - addi r11,r11,EXC_LVL_FRAME_OVERHEAD; /* allocate stack frame */\ + addi r11,r11,THREAD_SIZE - INT_FRAME_SIZE; /* allocate stack frame */\ beq 1f; \ /* COMING FROM USER MODE */ \ stw r9,_CCR(r11); /* save CR */\ @@ -533,24 +531,5 @@ label: bl kernel_fp_unavailable_exception; \ b interrupt_return
-#else /* __ASSEMBLY__ */ -struct exception_regs { - unsigned long mas0; - unsigned long mas1; - unsigned long mas2; - unsigned long mas3; - unsigned long mas6; - unsigned long mas7; - unsigned long srr0; - unsigned long srr1; - unsigned long csrr0; - unsigned long csrr1; - unsigned long dsrr0; - unsigned long dsrr1; -}; - -/* ensure this structure is always sized to a multiple of the stack alignment */ -#define STACK_EXC_LVL_FRAME_SIZE ALIGN(sizeof (struct exception_regs), 16) - #endif /* __ASSEMBLY__ */ #endif /* __HEAD_BOOKE_H__ */
From: Ard Biesheuvel ardb@kernel.org
commit c32ac11da3f83bb42b986702a9b92f0a14ed4182 upstream.
On arm64, the stub only moves the kernel image around in memory if needed, which is typically only for KASLR, given that relocatable kernels (which are the default) can run from any 64k-aligned address, which is also the minimum alignment communicated to EFI via the PE/COFF header.
Unfortunately, some loaders appear to ignore this header, and load the kernel at some arbitrary offset in memory. We can deal with this, but let's check for this condition anyway, so non-compliant code can be spotted and fixed.
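The added check amounts to a power-of-two alignment test on the load address. A minimal sketch (the 64 KiB constant is assumed here for illustration; the real alignment depends on the kernel configuration):

#include <stdint.h>
#include <stdbool.h>

#define KIMG_ALIGN	0x10000ULL	/* 64 KiB, assumed for this sketch */

/* A compliant loader must place the image on a KIMG_ALIGN boundary;
 * anything else gets flagged as a firmware/loader bug. */
static bool image_alignment_ok(uint64_t load_addr)
{
	return (load_addr & (KIMG_ALIGN - 1)) == 0;
}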
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Ard Biesheuvel ardb@kernel.org
Tested-by: Benjamin Herrenschmidt benh@kernel.crashing.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/firmware/efi/libstub/arm64-stub.c | 4 ++++
 1 file changed, 4 insertions(+)
--- a/drivers/firmware/efi/libstub/arm64-stub.c +++ b/drivers/firmware/efi/libstub/arm64-stub.c @@ -119,6 +119,10 @@ efi_status_t handle_kernel_image(unsigne if (image->image_base != _text) efi_err("FIRMWARE BUG: efi_loaded_image_t::image_base has bogus value\n");
+ if (!IS_ALIGNED((u64)_text, EFI_KIMG_ALIGN)) + efi_err("FIRMWARE BUG: kernel image not aligned on %ldk boundary\n", + EFI_KIMG_ALIGN >> 10); + kernel_size = _edata - _text; kernel_memsize = kernel_size + (_end - _edata); *reserve_size = kernel_memsize;
From: Zhen Lei thunder.leizhen@huawei.com
commit 07d25971b220e477eb019fcb520a9f2e3ac966af upstream.
It's CONFIG_DEBUG_RT_MUTEXES, not CONFIG_DEBUG_RT_MUTEX.
Fixes: f7efc4799f81 ("locking/rtmutex: Inline chainwalk depth check")
Signed-off-by: Zhen Lei thunder.leizhen@huawei.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Acked-by: Will Deacon will@kernel.org
Acked-by: Boqun Feng boqun.feng@gmail.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210731123011.4555-1-thunder.leizhen@huawei.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/locking/rtmutex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -343,7 +343,7 @@ static __always_inline bool rt_mutex_cond_detect_deadlock(struct rt_mutex_waiter *waiter, enum rtmutex_chainwalk chwalk) { - if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEX)) + if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES)) return waiter != NULL; return chwalk == RT_MUTEX_FULL_CHAINWALK; }
From: Sean Christopherson seanjc@google.com
commit 7b9cae027ba3aaac295ae23a62f47876ed97da73 upstream.
Use the secondary_exec_controls_get() accessor in vmx_has_waitpkg() to effectively get the controls for the current VMCS, as opposed to using vmx->secondary_exec_controls, which is the cached value of KVM's desired controls for vmcs01 and truly not reflective of any particular VMCS.
While the waitpkg control is not dynamic, i.e. vmcs01 will always hold the same waitpkg configuration as vmx->secondary_exec_controls, the same does not hold true for vmcs02 if the L1 VMM hides the feature from L2. If L1 hides the feature _and_ does not intercept MSR_IA32_UMWAIT_CONTROL, L2 could incorrectly read/write L1's virtual MSR instead of taking a #GP.
Fixes: 6e3ba4abcea5 ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson seanjc@google.com
Message-Id: 20210810171952.2758100-2-seanjc@google.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kvm/vmx/vmx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -538,7 +538,7 @@ static inline void decache_tsc_multiplie
static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx) { - return vmx->secondary_exec_control & + return secondary_exec_controls_get(vmx) & SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE; }
From: Sean Christopherson seanjc@google.com
commit 18712c13709d2de9516c5d3414f707c4f0a9c190 upstream.
Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF in L2 or if the VM-Exit should be forwarded to L1. The current logic fails to account for the case where #PF is intercepted to handle guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into L1. At best, L1 will complain and inject the #PF back into L2. At worst, L1 will eat the unexpected fault and cause L2 to hang on infinite page faults.
Note, while the bug was technically introduced by the commit that added support for the MAXPHYADDR madness, the shame is all on commit a0c134347baf ("KVM: VMX: introduce vmx_need_pf_intercept").
Fixes: 1dbf5d68af6f ("KVM: VMX: Add guest physical address check in EPT violation and misconfig")
Cc: stable@vger.kernel.org
Cc: Peter Shier pshier@google.com
Cc: Oliver Upton oupton@google.com
Cc: Jim Mattson jmattson@google.com
Signed-off-by: Sean Christopherson seanjc@google.com
Message-Id: 20210812045615.3167686-1-seanjc@google.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kvm/vmx/nested.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -5798,7 +5798,8 @@ static bool nested_vmx_l0_wants_exit(str if (is_nmi(intr_info)) return true; else if (is_page_fault(intr_info)) - return vcpu->arch.apf.host_apf_flags || !enable_ept; + return vcpu->arch.apf.host_apf_flags || + vmx_need_pf_intercept(vcpu); else if (is_debug(intr_info) && vcpu->guest_debug & (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
From: Sean Christopherson seanjc@google.com
commit 524a1e4e381fc5e7781008d5bd420fd1357c0113 upstream.
Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and really zap all SPTEs in this case. As is, zap_gfn_range() skips non-leaf SPTEs whose range exceeds the range to be zapped. If shadow_phys_bits is not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level paging, the "zap all" flows will skip top-level SPTEs whose range extends beyond shadow_phys_bits and leak their SPs when the VM is destroyed.
Use the current upper bound (based on host.MAXPHYADDR) to detect that the caller wants to zap all SPTEs, e.g. instead of using the max theoretical gfn, 1 << (52 - 12). The more precise upper bound allows the TDP iterator to terminate its walk earlier when running on hosts with MAXPHYADDR < 52.
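As a sketch of the bound arithmetic (illustrative only, not the patch's code): with shadow_phys_bits = 39 and 4 KiB pages the last host-mappable GFN is 1 << 27, so any request ending at or beyond that, including an all-ones end, means "zap everything".

#include <stdint.h>
#include <stdbool.h>

static uint64_t max_gfn_host(unsigned int shadow_phys_bits)
{
	return 1ULL << (shadow_phys_bits - 12);	/* 4 KiB pages */
}

/* An all-ones end (-1ull) always satisfies this, and clamping the walk
 * to max_gfn_host() lets the iterator stop early on small hosts. */
static bool is_zap_all(uint64_t start, uint64_t end, unsigned int shadow_phys_bits)
{
	return start == 0 && end >= max_gfn_host(shadow_phys_bits);
}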
Add a WARN on kvm->arch.tdp_mmu_pages when the TDP MMU is destroyed to help future debuggers should KVM decide to leak SPTEs again.
The bug is most easily reproduced by running (and unloading!) KVM in a VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped.
=============================================================================
BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1)
CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
 dump_stack_lvl+0x45/0x59
 slab_err+0x95/0xc9
 __kmem_cache_shutdown.cold+0x3c/0x158
 kmem_cache_destroy+0x3d/0xf0
 kvm_mmu_module_exit+0xa/0x30 [kvm]
 kvm_arch_exit+0x5d/0x90 [kvm]
 kvm_exit+0x78/0x90 [kvm]
 vmx_exit+0x1a/0x50 [kvm_intel]
 __x64_sys_delete_module+0x13f/0x220
 do_syscall_64+0x3b/0xc0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon bgardon@google.com
Signed-off-by: Sean Christopherson seanjc@google.com
Message-Id: 20210812181414.3376143-2-seanjc@google.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kvm/mmu/tdp_mmu.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)
--- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -41,6 +41,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm * if (!kvm->arch.tdp_mmu_enabled) return;
+ WARN_ON(!list_empty(&kvm->arch.tdp_mmu_pages)); WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots));
/* @@ -79,8 +80,6 @@ static void tdp_mmu_free_sp_rcu_callback void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root, bool shared) { - gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT); - kvm_lockdep_assert_mmu_lock_held(kvm, shared);
if (!refcount_dec_and_test(&root->tdp_mmu_root_count)) @@ -92,7 +91,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kv list_del_rcu(&root->link); spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
- zap_gfn_range(kvm, root, 0, max_gfn, false, false, shared); + zap_gfn_range(kvm, root, 0, -1ull, false, false, shared);
call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback); } @@ -722,8 +721,17 @@ static bool zap_gfn_range(struct kvm *kv gfn_t start, gfn_t end, bool can_yield, bool flush, bool shared) { + gfn_t max_gfn_host = 1ULL << (shadow_phys_bits - PAGE_SHIFT); + bool zap_all = (start == 0 && end >= max_gfn_host); struct tdp_iter iter;
+ /* + * Bound the walk at host.MAXPHYADDR, guest accesses beyond that will + * hit a #PF(RSVD) and never get to an EPT Violation/Misconfig / #NPF, + * and so KVM will never install a SPTE for such addresses. + */ + end = min(end, max_gfn_host); + kvm_lockdep_assert_mmu_lock_held(kvm, shared);
rcu_read_lock(); @@ -742,9 +750,10 @@ retry: /* * If this is a non-last-level SPTE that covers a larger range * than should be zapped, continue, and zap the mappings at a - * lower level. + * lower level, except when zapping all SPTEs. */ - if ((iter.gfn < start || + if (!zap_all && + (iter.gfn < start || iter.gfn + KVM_PAGES_PER_HPAGE(iter.level) > end) && !is_last_spte(iter.old_spte, iter.level)) continue; @@ -792,12 +801,11 @@ bool __kvm_tdp_mmu_zap_gfn_range(struct
void kvm_tdp_mmu_zap_all(struct kvm *kvm) { - gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT); bool flush = false; int i;
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) - flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, 0, max_gfn, + flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, 0, -1ull, flush, false);
if (flush) @@ -836,7 +844,6 @@ static struct kvm_mmu_page *next_invalid */ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm) { - gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT); struct kvm_mmu_page *next_root; struct kvm_mmu_page *root; bool flush = false; @@ -852,8 +859,7 @@ void kvm_tdp_mmu_zap_invalidated_roots(s
rcu_read_unlock();
- flush = zap_gfn_range(kvm, root, 0, max_gfn, true, flush, - true); + flush = zap_gfn_range(kvm, root, 0, -1ull, true, flush, true);
/* * Put the reference acquired in
From: Sean Christopherson seanjc@google.com
commit ce25681d59ffc4303321e555a2d71b1946af07da upstream.
Add yet another spinlock for the TDP MMU and take it when marking indirect shadow pages unsync. When using the TDP MMU and L1 is running L2(s) with nested TDP, KVM may encounter shadow pages for the TDP entries managed by L1 (controlling L2) when handling a TDP MMU page fault. The unsync logic is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and misbehaves when a shadow page is marked unsync via a TDP MMU page fault, which runs with mmu_lock held for read, not write.
Lack of a critical section manifests most visibly as an underflow of unsync_children in clear_unsync_child_bit() due to unsync_children being corrupted when multiple CPUs write it without a critical section and without atomic operations. But underflow is the best case scenario. The worst case scenario is that unsync_children prematurely hits '0' and leads to guest memory corruption due to KVM neglecting to properly sync shadow pages.
Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock would functionally be ok. Usurping the lock could degrade performance when building upper level page tables on different vCPUs, especially since the unsync flow could hold the lock for a comparatively long time depending on the number of indirect shadow pages and the depth of the paging tree.
For simplicity, take the lock for all MMUs, even though KVM could fairly easily know that mmu_lock is held for write. If mmu_lock is held for write, there cannot be contention for the inner spinlock, and marking shadow pages unsync across multiple vCPUs will be slow enough that bouncing the kvm_arch cacheline should be in the noise.
Note, even though L2 could theoretically be given access to its own EPT entries, a nested MMU must hold mmu_lock for write and thus cannot race against a TDP MMU page fault. I.e. the additional spinlock only _needs_ to be taken by the TDP MMU, as opposed to being taken by any MMU for a VM that is running with the TDP MMU enabled. Holding mmu_lock for read also prevents the indirect shadow page from being freed. But as above, keep it simple and always take the lock.
Alternative #1, the TDP MMU could simply pass "false" for can_unsync and effectively disable unsync behavior for nested TDP. Write protecting leaf shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such VMMs typically don't modify TDP entries, but the same may not hold true for non-standard use cases and/or VMMs that are migrating physical pages (from L1's perspective).
Alternative #2, the unsync logic could be made thread safe. In theory, simply converting all relevant kvm_mmu_page fields to atomics and using atomic bitops for the bitmap would suffice. However, (a) an in-depth audit would be required, (b) the code churn would be substantial, and (c) legacy shadow paging would incur additional atomic operations in performance sensitive paths for no benefit (to legacy shadow paging).
Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon bgardon@google.com
Signed-off-by: Sean Christopherson seanjc@google.com
Message-Id: 20210812181815.3378104-1-seanjc@google.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/virt/kvm/locking.rst | 8 ++++----
 arch/x86/include/asm/kvm_host.h | 7 +++++++
 arch/x86/kvm/mmu/mmu.c | 28 ++++++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 4 deletions(-)
--- a/Documentation/virt/kvm/locking.rst +++ b/Documentation/virt/kvm/locking.rst @@ -20,10 +20,10 @@ On x86:
- vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock
-- kvm->arch.mmu_lock is an rwlock. kvm->arch.tdp_mmu_pages_lock is - taken inside kvm->arch.mmu_lock, and cannot be taken without already - holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise - there's no need to take kvm->arch.tdp_mmu_pages_lock at all). +- kvm->arch.mmu_lock is an rwlock. kvm->arch.tdp_mmu_pages_lock and + kvm->arch.mmu_unsync_pages_lock are taken inside kvm->arch.mmu_lock, and + cannot be taken without already holding kvm->arch.mmu_lock (typically with + ``read_lock`` for the TDP MMU, thus the need for additional spinlocks).
Everything else is a leaf: no other lock is taken inside the critical sections. --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -987,6 +987,13 @@ struct kvm_arch { struct list_head lpage_disallowed_mmu_pages; struct kvm_page_track_notifier_node mmu_sp_tracker; struct kvm_page_track_notifier_head track_notifier_head; + /* + * Protects marking pages unsync during page faults, as TDP MMU page + * faults only take mmu_lock for read. For simplicity, the unsync + * pages lock is always taken when marking pages unsync regardless of + * whether mmu_lock is held for read or write. + */ + spinlock_t mmu_unsync_pages_lock;
struct list_head assigned_dev_head; struct iommu_domain *iommu_domain; --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2454,6 +2454,7 @@ bool mmu_need_write_protect(struct kvm_v bool can_unsync) { struct kvm_mmu_page *sp; + bool locked = false;
if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE)) return true; @@ -2465,9 +2466,34 @@ bool mmu_need_write_protect(struct kvm_v if (sp->unsync) continue;
+ /* + * TDP MMU page faults require an additional spinlock as they + * run with mmu_lock held for read, not write, and the unsync + * logic is not thread safe. Take the spinklock regardless of + * the MMU type to avoid extra conditionals/parameters, there's + * no meaningful penalty if mmu_lock is held for write. + */ + if (!locked) { + locked = true; + spin_lock(&vcpu->kvm->arch.mmu_unsync_pages_lock); + + /* + * Recheck after taking the spinlock, a different vCPU + * may have since marked the page unsync. A false + * positive on the unprotected check above is not + * possible as clearing sp->unsync _must_ hold mmu_lock + * for write, i.e. unsync cannot transition from 0->1 + * while this CPU holds mmu_lock for read (or write). + */ + if (READ_ONCE(sp->unsync)) + continue; + } + WARN_ON(sp->role.level != PG_LEVEL_4K); kvm_unsync_page(vcpu, sp); } + if (locked) + spin_unlock(&vcpu->kvm->arch.mmu_unsync_pages_lock);
/* * We need to ensure that the marking of unsync pages is visible @@ -5514,6 +5540,8 @@ void kvm_mmu_init_vm(struct kvm *kvm) { struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
+ spin_lock_init(&kvm->arch.mmu_unsync_pages_lock); + kvm_mmu_init_tdp_mmu(kvm);
node->track_write = kvm_mmu_pte_write;
From: Jeff Layton jlayton@kernel.org
commit a6862e6708c15995bc10614b2ef34ca35b4b9078 upstream.
Turn some comments into lockdep asserts.
Signed-off-by: Jeff Layton jlayton@kernel.org
Reviewed-by: Ilya Dryomov idryomov@gmail.com
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ceph/snap.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
--- a/fs/ceph/snap.c +++ b/fs/ceph/snap.c @@ -65,6 +65,8 @@ void ceph_get_snap_realm(struct ceph_mds_client *mdsc, struct ceph_snap_realm *realm) { + lockdep_assert_held_write(&mdsc->snap_rwsem); + dout("get_realm %p %d -> %d\n", realm, atomic_read(&realm->nref), atomic_read(&realm->nref)+1); /* @@ -113,6 +115,8 @@ static struct ceph_snap_realm *ceph_crea { struct ceph_snap_realm *realm;
+ lockdep_assert_held_write(&mdsc->snap_rwsem); + realm = kzalloc(sizeof(*realm), GFP_NOFS); if (!realm) return ERR_PTR(-ENOMEM); @@ -143,6 +147,8 @@ static struct ceph_snap_realm *__lookup_ struct rb_node *n = mdsc->snap_realms.rb_node; struct ceph_snap_realm *r;
+ lockdep_assert_held_write(&mdsc->snap_rwsem); + while (n) { r = rb_entry(n, struct ceph_snap_realm, node); if (ino < r->ino) @@ -176,6 +182,8 @@ static void __put_snap_realm(struct ceph static void __destroy_snap_realm(struct ceph_mds_client *mdsc, struct ceph_snap_realm *realm) { + lockdep_assert_held_write(&mdsc->snap_rwsem); + dout("__destroy_snap_realm %p %llx\n", realm, realm->ino);
rb_erase(&realm->node, &mdsc->snap_realms); @@ -198,6 +206,8 @@ static void __destroy_snap_realm(struct static void __put_snap_realm(struct ceph_mds_client *mdsc, struct ceph_snap_realm *realm) { + lockdep_assert_held_write(&mdsc->snap_rwsem); + dout("__put_snap_realm %llx %p %d -> %d\n", realm->ino, realm, atomic_read(&realm->nref), atomic_read(&realm->nref)-1); if (atomic_dec_and_test(&realm->nref)) @@ -236,6 +246,8 @@ static void __cleanup_empty_realms(struc { struct ceph_snap_realm *realm;
+ lockdep_assert_held_write(&mdsc->snap_rwsem); + spin_lock(&mdsc->snap_empty_lock); while (!list_empty(&mdsc->snap_empty)) { realm = list_first_entry(&mdsc->snap_empty, @@ -269,6 +281,8 @@ static int adjust_snap_realm_parent(stru { struct ceph_snap_realm *parent;
+ lockdep_assert_held_write(&mdsc->snap_rwsem); + if (realm->parent_ino == parentino) return 0;
@@ -696,6 +710,8 @@ int ceph_update_snap_trace(struct ceph_m int err = -ENOMEM; LIST_HEAD(dirty_realms);
+ lockdep_assert_held_write(&mdsc->snap_rwsem); + dout("update_snap_trace deletion=%d\n", deletion); more: ceph_decode_need(&p, e, sizeof(*ri), bad);
From: Jeff Layton jlayton@kernel.org
commit df2c0cb7f8e8c83e495260ad86df8c5da947f2a7 upstream.
They both say that the snap_rwsem must be held for write, but I don't see any real reason for it, and it's not currently always called that way.
The lookup is just walking the rbtree, so holding it for read should be fine there. The "get" is bumping the refcount and (possibly) removing it from the empty list. I see no need to hold the snap_rwsem for write for that.
Signed-off-by: Jeff Layton jlayton@kernel.org
Reviewed-by: Ilya Dryomov idryomov@gmail.com
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ceph/snap.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/fs/ceph/snap.c +++ b/fs/ceph/snap.c @@ -60,12 +60,12 @@ /* * increase ref count for the realm * - * caller must hold snap_rwsem for write. + * caller must hold snap_rwsem. */ void ceph_get_snap_realm(struct ceph_mds_client *mdsc, struct ceph_snap_realm *realm) { - lockdep_assert_held_write(&mdsc->snap_rwsem); + lockdep_assert_held(&mdsc->snap_rwsem);
dout("get_realm %p %d -> %d\n", realm, atomic_read(&realm->nref), atomic_read(&realm->nref)+1); @@ -139,7 +139,7 @@ static struct ceph_snap_realm *ceph_crea /* * lookup the realm rooted at @ino. * - * caller must hold snap_rwsem for write. + * caller must hold snap_rwsem. */ static struct ceph_snap_realm *__lookup_snap_realm(struct ceph_mds_client *mdsc, u64 ino) @@ -147,7 +147,7 @@ static struct ceph_snap_realm *__lookup_ struct rb_node *n = mdsc->snap_realms.rb_node; struct ceph_snap_realm *r;
- lockdep_assert_held_write(&mdsc->snap_rwsem); + lockdep_assert_held(&mdsc->snap_rwsem);
while (n) { r = rb_entry(n, struct ceph_snap_realm, node);
From: Jeff Layton jlayton@kernel.org
commit 8434ffe71c874b9c4e184b88d25de98c2bf5fe3f upstream.
There is a race in ceph_put_snap_realm. The change to the nref and the spinlock acquisition are not done atomically, so you could decrement nref, and before you take the spinlock, the nref is incremented again. At that point, you end up putting it on the empty list when it shouldn't be there. Eventually __cleanup_empty_realms runs and frees it when it's still in-use.
Fix this by protecting the 1->0 transition with atomic_dec_and_lock, and just drop the spinlock if we can get the rwsem.
Because these objects can also undergo a 0->1 refcount transition, we must protect that change as well with the spinlock. Increment locklessly unless the value is at 0, in which case we take the spinlock, increment and then take it off the empty list if it did the 0->1 transition.
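To make the pattern concrete, here is a minimal userspace sketch of the scheme described above. It is an analogue, not the kernel code: realm_get()/realm_put(), the on_empty_list flag, and the pthread mutex standing in for atomic_t and snap_empty_lock are all assumptions made for illustration. Only the 0->1 and 1->0 transitions take the lock; every other transition stays lockless.

/* Userspace analogue of the refcount/empty-list pattern (hypothetical names). */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct realm {
	atomic_int nref;
	bool on_empty_list;		/* stands in for the snap_empty list linkage */
};

static pthread_mutex_t empty_lock = PTHREAD_MUTEX_INITIALIZER;

static void realm_get(struct realm *r)
{
	int old = atomic_load(&r->nref);

	/* Fast path: bump the count locklessly unless it is currently 0. */
	while (old != 0) {
		if (atomic_compare_exchange_weak(&r->nref, &old, old + 1))
			return;
	}

	/* 0->1 transition: take the lock, increment, leave the empty list. */
	pthread_mutex_lock(&empty_lock);
	if (atomic_fetch_add(&r->nref, 1) == 0)
		r->on_empty_list = false;	/* list_del_init() in the real code */
	pthread_mutex_unlock(&empty_lock);
}

static void realm_put(struct realm *r)
{
	int old = atomic_load(&r->nref);

	/* Fast path: drop the count locklessly unless this is the last reference. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(&r->nref, &old, old - 1))
			return;
	}

	/* 1->0 transition: like atomic_dec_and_lock(), take the lock before
	 * the count can actually reach zero. */
	pthread_mutex_lock(&empty_lock);
	if (atomic_fetch_sub(&r->nref, 1) == 1)
		r->on_empty_list = true;	/* queue for later cleanup */
	pthread_mutex_unlock(&empty_lock);
}

int main(void)
{
	struct realm r = { .nref = 1 };

	realm_get(&r);	/* 1 -> 2, lockless */
	realm_put(&r);	/* 2 -> 1, lockless */
	realm_put(&r);	/* 1 -> 0, under empty_lock, queued on the empty list */
	return 0;
}

The point of the sketch is the invariant the patch establishes: an object can neither join nor leave the empty list without the lock held, so a concurrent increment can no longer race a final put onto the list.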
With these changes, I'm removing the dout() messages from these functions, as well as in __put_snap_realm. They've always been racy, and it's better to not print values that may be misleading.
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/46419
Reported-by: Mark Nelson mnelson@redhat.com
Signed-off-by: Jeff Layton jlayton@kernel.org
Reviewed-by: Luis Henriques lhenriques@suse.de
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ceph/snap.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -67,19 +67,19 @@ void ceph_get_snap_realm(struct ceph_mds
 {
 	lockdep_assert_held(&mdsc->snap_rwsem);

-	dout("get_realm %p %d -> %d\n", realm,
-	     atomic_read(&realm->nref), atomic_read(&realm->nref)+1);
 	/*
-	 * since we _only_ increment realm refs or empty the empty
-	 * list with snap_rwsem held, adjusting the empty list here is
-	 * safe. we do need to protect against concurrent empty list
-	 * additions, however.
+	 * The 0->1 and 1->0 transitions must take the snap_empty_lock
+	 * atomically with the refcount change. Go ahead and bump the
+	 * nref here, unless it's 0, in which case we take the spinlock
+	 * and then do the increment and remove it from the list.
 	 */
-	if (atomic_inc_return(&realm->nref) == 1) {
-		spin_lock(&mdsc->snap_empty_lock);
+	if (atomic_inc_not_zero(&realm->nref))
+		return;
+
+	spin_lock(&mdsc->snap_empty_lock);
+	if (atomic_inc_return(&realm->nref) == 1)
 		list_del_init(&realm->empty_item);
-		spin_unlock(&mdsc->snap_empty_lock);
-	}
+	spin_unlock(&mdsc->snap_empty_lock);
 }

 static void __insert_snap_realm(struct rb_root *root,
@@ -208,28 +208,28 @@ static void __put_snap_realm(struct ceph
 {
 	lockdep_assert_held_write(&mdsc->snap_rwsem);

-	dout("__put_snap_realm %llx %p %d -> %d\n", realm->ino, realm,
-	     atomic_read(&realm->nref), atomic_read(&realm->nref)-1);
+	/*
+	 * We do not require the snap_empty_lock here, as any caller that
+	 * increments the value must hold the snap_rwsem.
+	 */
 	if (atomic_dec_and_test(&realm->nref))
 		__destroy_snap_realm(mdsc, realm);
 }

 /*
- * caller needn't hold any locks
+ * See comments in ceph_get_snap_realm. Caller needn't hold any locks.
  */
 void ceph_put_snap_realm(struct ceph_mds_client *mdsc,
 			 struct ceph_snap_realm *realm)
 {
-	dout("put_snap_realm %llx %p %d -> %d\n", realm->ino, realm,
-	     atomic_read(&realm->nref), atomic_read(&realm->nref)-1);
-	if (!atomic_dec_and_test(&realm->nref))
+	if (!atomic_dec_and_lock(&realm->nref, &mdsc->snap_empty_lock))
 		return;

 	if (down_write_trylock(&mdsc->snap_rwsem)) {
+		spin_unlock(&mdsc->snap_empty_lock);
 		__destroy_snap_realm(mdsc, realm);
 		up_write(&mdsc->snap_rwsem);
 	} else {
-		spin_lock(&mdsc->snap_empty_lock);
 		list_add(&realm->empty_item, &mdsc->snap_empty);
 		spin_unlock(&mdsc->snap_empty_lock);
 	}
From: Kuan-Ying Lee Kuan-Ying.Lee@mediatek.com
commit 340caf178ddc2efb0294afaf54c715f7928c258e upstream.
The address still includes the tags when it is printed. With hardware tag-based kasan enabled, we will get a false positive KASAN issue when we access metadata.
Reset the tag before we access the metadata.
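As an aside, here is an illustrative sketch under the assumption of a hardware tag-based KASAN build; the helper below is hypothetical and not from the patch. kasan_reset_tag() only rewrites the pointer's top-byte tag and leaves the underlying address untouched, which is why the reset has to be applied to the address that is actually dumped (addr), not to the text label.

#include <linux/bug.h>
#include <linux/kasan.h>
#include <linux/slab.h>

/* Hypothetical helper: the two pointers refer to the same memory and may
 * differ only in the top byte, where the tag lives. */
static void show_tag_reset(void)
{
	void *obj = kmalloc(32, GFP_KERNEL);
	void *untagged;

	if (!obj)
		return;

	untagged = kasan_reset_tag(obj);
	/* The low 56 bits (the real address) are identical. */
	WARN_ON(((unsigned long)obj << 8) != ((unsigned long)untagged << 8));

	kfree(obj);
}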
Link: https://lkml.kernel.org/r/20210804090957.12393-3-Kuan-Ying.Lee@mediatek.com
Fixes: aa1ef4d7b3f6 ("kasan, mm: reset tags when accessing metadata")
Signed-off-by: Kuan-Ying Lee Kuan-Ying.Lee@mediatek.com
Reviewed-by: Marco Elver elver@google.com
Reviewed-by: Andrey Konovalov andreyknvl@gmail.com
Cc: Alexander Potapenko glider@google.com
Cc: Andrey Ryabinin ryabinin.a.a@gmail.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Chinwen Chang chinwen.chang@mediatek.com
Cc: Nicholas Tang nicholas.tang@mediatek.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/slub.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -551,8 +551,8 @@ static void print_section(char *level, c
 			  unsigned int length)
 {
 	metadata_access_enable();
-	print_hex_dump(level, kasan_reset_tag(text), DUMP_PREFIX_ADDRESS,
-			16, 1, addr, length, 1);
+	print_hex_dump(level, text, DUMP_PREFIX_ADDRESS,
+			16, 1, kasan_reset_tag((void *)addr), length, 1);
 	metadata_access_disable();
 }
On Mon, 16 Aug 2021 15:00:30 +0200, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
5.13.12-rc1 successfully compiled and booted on my Raspberry Pi 4b (8g) (bcm2711)
Tested-by: Fox Chen foxhlchen@gmail.com