This is the start of the stable review cycle for the 5.10.54 release. There are 167 patches in this series; all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jul 2021 15:38:12 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.54-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.10.54-rc1
Mathias Nyman mathias.nyman@linux.intel.com xhci: add xhci_get_virt_ep() helper
Íñigo Huguet ihuguet@redhat.com sfc: ensure correct number of XDP queues
Colin Xu colin.xu@intel.com drm/i915/gvt: Clear d3_entered on elsp cmd submission.
David Jeffery djeffery@redhat.com usb: ehci: Prevent missed ehci interrupts with edge-triggered MSI
Riccardo Mancini rickyman7@gmail.com perf inject: Close inject.output on exit
Robert Richter rrichter@amd.com Documentation: Fix intiramfs script name
Paul Blakey paulb@nvidia.com skbuff: Release nfct refcount on napi stolen or re-used skbs
Mahesh Bandewar maheshb@google.com bonding: fix build issue
Evan Quan evan.quan@amd.com PCI: Mark AMD Navi14 GPU ATS as broken
Marek Behún kabel@kernel.org net: dsa: mv88e6xxx: enable SerDes PCS register dump via ethtool -d on Topaz
Marek Behún kabel@kernel.org net: dsa: mv88e6xxx: enable SerDes RX stats for Topaz
Likun Gao Likun.Gao@amd.com drm/amdgpu: update golden setting for sienna_cichlid
Charles Baylis cb-kernel@fishzet.co.uk drm: Return -ENOTTY for non-drm ioctls
Jason Ekstrand jason@jlekstrand.net Revert "drm/i915: Propagate errors on awaiting already signaled fences"
Adrian Hunter adrian.hunter@intel.com driver core: Prevent warning when removing a device link from unregistered consumer
Greg Kroah-Hartman gregkh@linuxfoundation.org nds32: fix up stack guard gap
Jérôme Glisse jglisse@redhat.com misc: eeprom: at24: Always append device id even if label property is set.
Ilya Dryomov idryomov@gmail.com rbd: always kick acquire on "acquired" and "released" notifications
Ilya Dryomov idryomov@gmail.com rbd: don't hold lock_rwsem while running_list is being drained
Mike Kravetz mike.kravetz@oracle.com hugetlbfs: fix mount mode command line processing
Mike Rapoport rppt@kernel.org memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions
Peter Collingbourne pcc@google.com userfaultfd: do not untag user pointers
Pavel Begunkov asml.silence@gmail.com io_uring: remove double poll entry on arm failure
Pavel Begunkov asml.silence@gmail.com io_uring: explicitly count entries for poll reqs
Peter Collingbourne pcc@google.com selftest: use mmap instead of posix_memalign to allocate memory
Frederic Weisbecker frederic@kernel.org posix-cpu-timers: Fix rearm racing against process tick
Bhaumik Bhatt bbhatt@codeaurora.org bus: mhi: core: Validate channel ID when processing command completions
Markus Boehme markubo@amazon.com ixgbe: Fix packet corruption due to missing DMA sync
Gustavo A. R. Silva gustavoars@kernel.org media: ngene: Fix out-of-bounds bug in ngene_command_config_free_buf()
Anand Jain anand.jain@oracle.com btrfs: check for missing device in btrfs_trim_fs
Steven Rostedt (VMware) rostedt@goodmis.org tracing: Synthetic event field_pos is an index not a boolean
Haoran Luo www@aegistudio.net tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop.
Steven Rostedt (VMware) rostedt@goodmis.org tracing/histogram: Rename "cpu" to "common_cpu"
Steven Rostedt (VMware) rostedt@goodmis.org tracepoints: Update static_call before tp_funcs when adding a tracepoint
Marc Zyngier maz@kernel.org firmware/efi: Tell memblock about EFI iomem reservations
Amelie Delaunay amelie.delaunay@foss.st.com usb: typec: stusb160x: register role switch before interrupt registration
Minas Harutyunyan Minas.Harutyunyan@synopsys.com usb: dwc2: gadget: Fix sending zero length packet in DDMA mode.
Minas Harutyunyan Minas.Harutyunyan@synopsys.com usb: dwc2: gadget: Fix GOUTNAK flow for Slave mode.
Zhang Qilong zhangqilong3@huawei.com usb: gadget: Fix Unbalanced pm_runtime_enable in tegra_xudc_probe
John Keeping john@metanate.com USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
Ian Ray ian.ray@ge.com USB: serial: cp210x: fix comments for GE CS1000
Marco De Marco marco.demarco@posteo.net USB: serial: option: add support for u-blox LARA-R6 family
Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com usb: renesas_usbhs: Fix superfluous irqs happen after usb_pkt_pop()
Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz usb: max-3421: Prevent corruption of freed memory
Julian Sikorski belegdol@gmail.com USB: usb-storage: Add LaCie Rugged USB3-FW to IGNORE_UAS
Mathias Nyman mathias.nyman@linux.intel.com usb: hub: Fix link power management max exit latency (MEL) calculations
Mathias Nyman mathias.nyman@linux.intel.com usb: hub: Disable USB 3 device initiated lpm if exit latency is too high
Nicholas Piggin npiggin@gmail.com KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state
Nicholas Piggin npiggin@gmail.com KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow
Mathias Nyman mathias.nyman@linux.intel.com xhci: Fix lost USB 2 remote wake
Greg Thelen gthelen@google.com usb: xhci: avoid renesas_usb_fw.mem when it's unusable
Moritz Fischer mdf@kernel.org Revert "usb: renesas-xhci: Fix handling of unknown ROM state"
Takashi Iwai tiwai@suse.de ALSA: pcm: Fix mmap capability check
Alan Young consult.awy@gmail.com ALSA: pcm: Call substream ack() method upon compat mmap commit
Takashi Iwai tiwai@suse.de ALSA: hdmi: Expose all pins on MSI MS-7C94 board
Hui Wang hui.wang@canonical.com ALSA: hda/realtek: Fix pop noise and 2 Front Mic issues on a machine
Takashi Iwai tiwai@suse.de ALSA: sb: Fix potential ABBA deadlock in CSP driver
Alexander Tsoy alexander@tsoy.me ALSA: usb-audio: Add registration quirk for JBL Quantum headsets
Takashi Iwai tiwai@suse.de ALSA: usb-audio: Add missing proc text entry for BESPOKEN type
Alexander Egorenkov egorenar@linux.ibm.com s390/boot: fix use of expolines in the DMA code
Vasily Gorbik gor@linux.ibm.com s390/ftrace: fix ftrace_update_ftrace_func implementation
Stephen Boyd swboyd@chromium.org mmc: core: Don't allocate IDA for OF aliases
Marcelo Henrique Cerri marcelo.cerri@canonical.com proc: Avoid mixing integer types in mem_rw()
Ronnie Sahlberg lsahlber@redhat.com cifs: fix fallocate when trying to allocate a hole.
Ronnie Sahlberg lsahlber@redhat.com cifs: only write 64kb at a time when fallocating a small region of a file
Maxime Ripard maxime@cerno.tech drm/panel: raspberrypi-touchscreen: Prevent double-free
Yajun Deng yajun.deng@linux.dev net: sched: cls_api: Fix the the wrong parameter
Vladimir Oltean vladimir.oltean@nxp.com net: dsa: sja1105: make VID 4095 a bridge VLAN too
Wei Wang weiwan@google.com tcp: disable TFO blackhole logic by default
Xin Long lucien.xin@gmail.com sctp: update active_key for asoc when old key is being replaced
Christoph Hellwig hch@lst.de nvme: set the PRACT bit when using Write Zeroes with T10 PI
Sayanta Pattanayak sayanta.pattanayak@arm.com r8169: Avoid duplicate sysfs entry creation error
David Howells dhowells@redhat.com afs: Fix tracepoint string placement with built-in AFS
Vincent Palatin vpalatin@chromium.org Revert "USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem"
Zhihao Cheng chengzhihao1@huawei.com nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING
Luis Henriques lhenriques@suse.de ceph: don't WARN if we're still opening a session to an MDS
Paolo Abeni pabeni@redhat.com ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions
Peilin Ye peilin.ye@bytedance.com net/sched: act_skbmod: Skip non-Ethernet packets
Alexandru Tachici alexandru.tachici@analog.com spi: spi-bcm2835: Fix deadlock
Jian Shen shenjian15@huawei.com net: hns3: fix rx VLAN offload state inconsistent issue
Chengwen Feng fengchengwen@huawei.com net: hns3: fix possible mismatches resp of mailbox
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com ALSA: hda: intel-dsp-cfg: add missing ElkhartLake PCI ID
Eric Dumazet edumazet@google.com net/tcp_fastopen: fix data races around tfo_active_disable_stamp
Randy Dunlap rdunlap@infradead.org net: hisilicon: rename CACHE_LINE_MASK to avoid redefinition
Somnath Kotur somnath.kotur@broadcom.com bnxt_en: Check abort error state in bnxt_half_open_nic()
Michael Chan michael.chan@broadcom.com bnxt_en: Validate vlan protocol ID on RX packets
Michael Chan michael.chan@broadcom.com bnxt_en: Add missing check for BNXT_STATE_ABORT_ERR in bnxt_fw_rset_task()
Michael Chan michael.chan@broadcom.com bnxt_en: Refresh RoCE capabilities in bnxt_ulp_probe()
Kalesh AP kalesh-anakkur.purayil@broadcom.com bnxt_en: don't disable an already disabled PCI device
Robert Richter rrichter@amd.com ACPI: Kconfig: Fix table override from built-in initrd
Marek Vasut marex@denx.de spi: cadence: Correct initialisation of runtime PM again
Dmitry Bogdanov d.bogdanov@yadro.com scsi: target: Fix protect handling in WRITE SAME(32)
Mike Christie michael.christie@oracle.com scsi: iscsi: Fix iface sysfs attr detection
Nguyen Dinh Phi phind.uet@gmail.com netrom: Decrease sock refcount when sock timers expire
Xin Long lucien.xin@gmail.com sctp: trim optlen when it's a huge value in sctp_setsockopt
Pavel Skripkin paskripkin@gmail.com net: sched: fix memory leak in tcindex_partial_destroy_work
Nicholas Piggin npiggin@gmail.com KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak
Nicholas Piggin npiggin@gmail.com KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash
Yajun Deng yajun.deng@linux.dev net: decnet: Fix sleeping inside in af_decnet
Michal Suchanek msuchanek@suse.de efi/tpm: Differentiate missing and invalid final event log table.
Roman Skakun Roman_Skakun@epam.com dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable}
Dongliang Mu mudongliangabcd@gmail.com usb: hso: fix error handling code of hso_create_net_device
Ziyang Xuan william.xuanziyang@huawei.com net: fix uninit-value in caif_seqpkt_sendmsg
Tobias Klauser tklauser@distanz.ch bpftool: Check malloc return value in mount_bpffs_for_pin
Jakub Sitnicki jakub@cloudflare.com bpf, sockmap, udp: sk_prot needs inuse_idx set for proc stats
John Fastabend john.fastabend@gmail.com bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats
John Fastabend john.fastabend@gmail.com bpf, sockmap: Fix potential memory leak on unlikely error case
Colin Ian King colin.king@canonical.com s390/bpf: Perform r1 range checking before accessing jit->seen_reg[r1]
Colin Ian King colin.king@canonical.com liquidio: Fix unintentional sign extension issue on left shift of u16
Nicolas Saenz Julienne nsaenzju@redhat.com timers: Fix get_next_timer_interrupt() with no timers pending
Xuan Zhuo xuanzhuo@linux.alibaba.com xdp, net: Fix use-after-free in bpf_xdp_link_release
Daniel Borkmann daniel@iogearbox.net bpf: Fix tail_call_reachable rejection for interpreter when jit failed
Xuan Zhuo xuanzhuo@linux.alibaba.com bpf, test: fix NULL pointer dereference on invalid expected_attach_type
Maxim Schwalm maxim.schwalm@gmail.com ASoC: rt5631: Fix regcache sync errors on resume
Peter Hess peter.hess@ph-home.de spi: mediatek: fix fifo rx mode
Axel Lin axel.lin@ingics.com regulator: hi6421: Fix getting wrong drvdata
Axel Lin axel.lin@ingics.com regulator: hi6421: Use correct variable type for regmap api val argument
Alain Volmat alain.volmat@foss.st.com spi: stm32: fixes pm_runtime calls in probe/remove
Clark Wang xiaoning.wang@nxp.com spi: imx: add a check for speed_hz before calculating the clock
Charles Keepax ckeepax@opensource.cirrus.com ASoC: wm_adsp: Correct wm_coeff_tlv_get handling
Yang Jihong yangjihong1@huawei.com perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set
Riccardo Mancini rickyman7@gmail.com perf data: Close all files in close_dir()
Riccardo Mancini rickyman7@gmail.com perf probe-file: Delete namelist in del_events() on the error path
Riccardo Mancini rickyman7@gmail.com perf lzma: Close lzma stream on exit
Riccardo Mancini rickyman7@gmail.com perf script: Fix memory 'threads' and 'cpus' leaks on exit
Riccardo Mancini rickyman7@gmail.com perf report: Free generated help strings for sort option
Riccardo Mancini rickyman7@gmail.com perf env: Fix memory leak of cpu_pmu_caps
Riccardo Mancini rickyman7@gmail.com perf test maps__merge_in: Fix memory leak of maps
Riccardo Mancini rickyman7@gmail.com perf dso: Fix memory leak in dso__new_map()
Riccardo Mancini rickyman7@gmail.com perf test event_update: Fix memory leak of evlist
Riccardo Mancini rickyman7@gmail.com perf test session_topology: Delete session->evlist
Riccardo Mancini rickyman7@gmail.com perf env: Fix sibling_dies memory leak
Riccardo Mancini rickyman7@gmail.com perf probe: Fix dso->nsinfo refcounting
Riccardo Mancini rickyman7@gmail.com perf map: Fix dso->nsinfo refcounting
Riccardo Mancini rickyman7@gmail.com perf inject: Fix dso->nsinfo refcounting
Like Xu like.xu.linux@gmail.com KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM
Casey Chen cachen@purestorage.com nvme-pci: do not call nvme_dev_remove_admin from nvme_remove
Jianguo Wu wujianguo@chinatelecom.cn mptcp: fix warning in __skb_flow_dissect() when do syn cookie for subflow join
Antoine Tenart atenart@kernel.org net: do not reuse skbuff allocated from skbuff_fclone_cache in the skb cache
Shahjada Abul Husain shahjada@chelsio.com cxgb4: fix IRQ free race during driver unload
Uwe Kleine-König u.kleine-koenig@pengutronix.de pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped
Hangbin Liu liuhangbin@gmail.com selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect
Hangbin Liu liuhangbin@gmail.com selftests: icmp_redirect: remove from checking for IPv6 route get
YueHaibing yuehaibing@huawei.com stmmac: platform: Fix signedness bug in stmmac_probe_config_dt()
Nicolas Dichtel nicolas.dichtel@6wind.com ipv6: fix 'disable_policy' for fwd packets
Taehee Yoo ap420073@gmail.com bonding: fix incorrect return value of bond_ipsec_offload_ok()
Taehee Yoo ap420073@gmail.com bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()
Taehee Yoo ap420073@gmail.com bonding: Add struct bond_ipesc to manage SA
Taehee Yoo ap420073@gmail.com bonding: disallow setting nested bonding + ipsec offload
Taehee Yoo ap420073@gmail.com bonding: fix suspicious RCU usage in bond_ipsec_del_sa()
Taehee Yoo ap420073@gmail.com ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops
Taehee Yoo ap420073@gmail.com bonding: fix null dereference in bond_ipsec_add_sa()
Taehee Yoo ap420073@gmail.com bonding: fix suspicious RCU usage in bond_ipsec_add_sa()
Björn Töpel bjorn.topel@intel.com net: Introduce preferred busy-polling
Aleksandr Nogikh nogikh@google.com net: add kcov handle to skb extensions
Christophe JAILLET christophe.jaillet@wanadoo.fr gve: Fix an error handling path in 'gve_probe()'
Jedrzej Jagielski jedrzej.jagielski@intel.com igb: Fix position of assignment to *ring
Aleksandr Loktionov aleksandr.loktionov@intel.com igb: Check if num of q_vectors is smaller than max before array access
Christophe JAILLET christophe.jaillet@wanadoo.fr iavf: Fix an error handling path in 'iavf_probe()'
Christophe JAILLET christophe.jaillet@wanadoo.fr e1000e: Fix an error handling path in 'e1000_probe()'
Christophe JAILLET christophe.jaillet@wanadoo.fr fm10k: Fix an error handling path in 'fm10k_probe()'
Christophe JAILLET christophe.jaillet@wanadoo.fr igb: Fix an error handling path in 'igb_probe()'
Christophe JAILLET christophe.jaillet@wanadoo.fr igc: Fix an error handling path in 'igc_probe()'
Christophe JAILLET christophe.jaillet@wanadoo.fr ixgbe: Fix an error handling path in 'ixgbe_probe()'
Tom Rix trix@redhat.com igc: change default return of igc_read_phy_reg()
Vinicius Costa Gomes vinicius.gomes@intel.com igb: Fix use-after-free error during reset
Vinicius Costa Gomes vinicius.gomes@intel.com igc: Fix use-after-free error during reset
-------------
Diffstat:
 Documentation/arm64/tagged-address-abi.rst | 26 ++-
 .../early-userspace/early_userspace_support.rst | 8 +-
 .../filesystems/ramfs-rootfs-initramfs.rst | 2 +-
 Documentation/networking/ip-sysctl.rst | 2 +-
 Documentation/trace/histogram.rst | 2 +-
 Makefile | 4 +-
 arch/alpha/include/uapi/asm/socket.h | 2 +
 arch/mips/include/uapi/asm/socket.h | 2 +
 arch/nds32/mm/mmap.c | 2 +-
 arch/parisc/include/uapi/asm/socket.h | 2 +
 arch/powerpc/kvm/book3s_hv.c | 2 +
 arch/powerpc/kvm/book3s_hv_nested.c | 20 +++
 arch/powerpc/kvm/book3s_rtas.c | 25 ++-
 arch/powerpc/kvm/powerpc.c | 4 +-
 arch/s390/boot/text_dma.S | 19 +--
 arch/s390/include/asm/ftrace.h | 1 +
 arch/s390/kernel/ftrace.c | 2 +
 arch/s390/kernel/mcount.S | 4 +-
 arch/s390/net/bpf_jit_comp.c | 2 +-
 arch/sparc/include/uapi/asm/socket.h | 2 +
 arch/x86/kvm/cpuid.c | 3 +-
 drivers/acpi/Kconfig | 2 +-
 drivers/base/core.c | 6 +-
 drivers/block/rbd.c | 32 ++--
 drivers/bus/mhi/core/main.c | 17 +-
 drivers/firmware/efi/efi.c | 13 +-
 drivers/firmware/efi/tpm.c | 8 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 1 +
 drivers/gpu/drm/drm_ioctl.c | 3 +
 drivers/gpu/drm/i915/gvt/handlers.c | 15 ++
 drivers/gpu/drm/i915/i915_request.c | 8 +-
 .../gpu/drm/panel/panel-raspberrypi-touchscreen.c | 1 -
 drivers/media/pci/ngene/ngene-core.c | 2 +-
 drivers/media/pci/ngene/ngene.h | 14 +-
 drivers/misc/eeprom/at24.c | 17 +-
 drivers/mmc/core/host.c | 20 +--
 drivers/net/bonding/bond_main.c | 183 +++++++++++++++++----
 drivers/net/dsa/mv88e6xxx/chip.c | 10 ++
 drivers/net/dsa/mv88e6xxx/serdes.c | 6 +-
 drivers/net/dsa/sja1105/sja1105_main.c | 6 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 34 +++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 9 +-
 .../ethernet/cavium/liquidio/cn23xx_pf_device.c | 2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 18 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c | 3 +
 drivers/net/ethernet/google/gve/gve_main.c | 5 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c | 6 +-
 drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h | 6 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 1 +
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 10 ++
 drivers/net/ethernet/intel/e1000e/netdev.c | 1 +
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 1 +
 drivers/net/ethernet/intel/iavf/iavf_main.c | 1 +
 drivers/net/ethernet/intel/igb/igb_main.c | 15 +-
 drivers/net/ethernet/intel/igc/igc.h | 2 +-
 drivers/net/ethernet/intel/igc/igc_main.c | 3 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 +-
 drivers/net/ethernet/intel/ixgbevf/ipsec.c | 20 ++-
 drivers/net/ethernet/realtek/r8169_main.c | 3 +-
 drivers/net/ethernet/sfc/efx_channels.c | 13 +-
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 8 +-
 drivers/net/usb/hso.c | 33 ++--
 drivers/nvme/host/core.c | 5 +-
 drivers/nvme/host/pci.c | 5 +-
 drivers/pci/quirks.c | 4 +-
 drivers/pwm/pwm-sprd.c | 11 +-
 drivers/regulator/hi6421-regulator.c | 30 ++--
 drivers/scsi/scsi_transport_iscsi.c | 90 ++++------
 drivers/spi/spi-bcm2835.c | 12 +-
 drivers/spi/spi-cadence.c | 14 +-
 drivers/spi/spi-imx.c | 37 +++--
 drivers/spi/spi-mt65xx.c | 16 +-
 drivers/spi/spi-stm32.c | 9 +-
 drivers/target/target_core_sbc.c | 35 ++--
 drivers/usb/core/hub.c | 120 ++++++++++----
 drivers/usb/core/quirks.c | 4 -
 drivers/usb/dwc2/gadget.c | 31 +++-
 drivers/usb/gadget/udc/tegra-xudc.c | 1 +
 drivers/usb/host/ehci-hcd.c | 18 +-
 drivers/usb/host/max3421-hcd.c | 44 ++---
 drivers/usb/host/xhci-hub.c | 3 +-
 drivers/usb/host/xhci-pci-renesas.c | 16 +-
 drivers/usb/host/xhci-pci.c | 7 +
 drivers/usb/host/xhci-ring.c | 58 +++++--
 drivers/usb/host/xhci.h | 3 +-
 drivers/usb/renesas_usbhs/fifo.c | 7 +
 drivers/usb/serial/cp210x.c | 5 +-
 drivers/usb/serial/option.c | 3 +
 drivers/usb/storage/unusual_uas.h | 7 +
 drivers/usb/typec/stusb160x.c | 11 +-
 fs/afs/cmservice.c | 25 +--
 fs/btrfs/extent-tree.c | 3 +
 fs/ceph/mds_client.c | 2 +-
 fs/cifs/smb2ops.c | 49 ++++--
 fs/eventpoll.c | 2 +-
 fs/hugetlbfs/inode.c | 2 +-
 fs/io_uring.c | 18 +-
 fs/proc/base.c | 2 +-
 fs/userfaultfd.c | 24 ++-
 include/drm/drm_ioctl.h | 1 +
 include/linux/memblock.h | 4 +-
 include/linux/netdevice.h | 35 ++--
 include/linux/skbuff.h | 33 ++++
 include/net/bonding.h | 9 +-
 include/net/busy_poll.h | 5 +-
 include/net/sock.h | 4 +
 include/trace/events/afs.h | 67 +++++++-
 include/uapi/asm-generic/socket.h | 2 +
 kernel/bpf/verifier.c | 2 +
 kernel/dma/ops_helpers.c | 12 +-
 kernel/time/posix-cpu-timers.c | 10 +-
 kernel/time/timer.c | 8 +-
 kernel/trace/ring_buffer.c | 28 +++-
 kernel/trace/trace.c | 4 +
 kernel/trace/trace_events_hist.c | 22 ++-
 kernel/trace/trace_synth.h | 2 +-
 kernel/tracepoint.c | 2 +-
 lib/Kconfig.debug | 1 +
 mm/memblock.c | 3 +-
 net/bpf/test_run.c | 3 +
 net/caif/caif_socket.c | 3 +-
 net/core/dev.c | 107 +++++++++---
 net/core/skbuff.c | 12 ++
 net/core/skmsg.c | 16 +-
 net/core/sock.c | 9 +
 net/decnet/af_decnet.c | 27 ++-
 net/ipv4/tcp_bpf.c | 2 +-
 net/ipv4/tcp_fastopen.c | 28 +++-
 net/ipv4/tcp_ipv4.c | 2 +-
 net/ipv4/udp_bpf.c | 2 +-
 net/ipv6/ip6_output.c | 4 +-
 net/ipv6/route.c | 2 +-
 net/mptcp/syncookies.c | 16 +-
 net/netrom/nr_timer.c | 20 ++-
 net/sched/act_skbmod.c | 12 +-
 net/sched/cls_api.c | 2 +-
 net/sched/cls_tcindex.c | 5 +-
 net/sctp/auth.c | 2 +
 net/sctp/socket.c | 4 +
 sound/core/pcm_native.c | 25 ++-
 sound/hda/intel-dsp-config.c | 4 +
 sound/isa/sb/sb16_csp.c | 4 +
 sound/pci/hda/patch_hdmi.c | 1 +
 sound/pci/hda/patch_realtek.c | 1 +
 sound/soc/codecs/rt5631.c | 2 +
 sound/soc/codecs/wm_adsp.c | 2 +-
 sound/usb/mixer.c | 10 +-
 sound/usb/quirks.c | 3 +
 tools/bpf/bpftool/common.c | 5 +
 tools/perf/builtin-inject.c | 13 +-
 tools/perf/builtin-report.c | 33 ++--
 tools/perf/builtin-sched.c | 33 +++-
 tools/perf/builtin-script.c | 7 +
 tools/perf/tests/event_update.c | 2 +-
 tools/perf/tests/maps.c | 2 +
 tools/perf/tests/topology.c | 1 +
 tools/perf/util/data.c | 2 +-
 tools/perf/util/dso.c | 4 +-
 tools/perf/util/env.c | 2 +
 tools/perf/util/lzma.c | 8 +-
 tools/perf/util/map.c | 2 +
 tools/perf/util/probe-event.c | 4 +-
 tools/perf/util/probe-file.c | 4 +-
 tools/perf/util/sort.c | 2 +-
 tools/perf/util/sort.h | 2 +-
 tools/testing/selftests/net/icmp_redirect.sh | 5 +-
 tools/testing/selftests/vm/userfaultfd.c | 6 +-
 167 files changed, 1515 insertions(+), 637 deletions(-)
From: Vinicius Costa Gomes vinicius.gomes@intel.com
[ Upstream commit 56ea7ed103b46970e171eb1c95916f393d64eeff ]
Cleans the next descriptor to watch (next_to_watch) when cleaning the TX ring.
Failure to do so can cause invalid memory accesses. If igc_poll() runs while the controller is being reset, this can lead to the driver trying to free an skb that was already freed.
Log message:
[ 101.525242] refcount_t: underflow; use-after-free.
[ 101.525251] WARNING: CPU: 1 PID: 646 at lib/refcount.c:28 refcount_warn_saturate+0xab/0xf0
[ 101.525259] Modules linked in: sch_etf(E) sch_mqprio(E) rfkill(E) intel_rapl_msr(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) binfmt_misc(E) kvm_intel(E) kvm(E) irqbypass(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) mei_wdt(E) libaes(E) crypto_simd(E) cryptd(E) glue_helper(E) snd_hda_codec_hdmi(E) rapl(E) intel_cstate(E) snd_hda_intel(E) snd_intel_dspcfg(E) sg(E) soundwire_intel(E) intel_uncore(E) at24(E) soundwire_generic_allocation(E) iTCO_wdt(E) soundwire_cadence(E) intel_pmc_bxt(E) serio_raw(E) snd_hda_codec(E) iTCO_vendor_support(E) watchdog(E) snd_hda_core(E) snd_hwdep(E) snd_soc_core(E) snd_compress(E) snd_pcsp(E) soundwire_bus(E) snd_pcm(E) evdev(E) snd_timer(E) mei_me(E) snd(E) soundcore(E) mei(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) i915(E) ahci(E) libahci(E) ehci_pci(E) igb(E) xhci_pci(E) ehci_hcd(E)
[ 101.525303] drm_kms_helper(E) dca(E) xhci_hcd(E) libata(E) crct10dif_pclmul(E) cec(E) crct10dif_common(E) tsn(E) igc(E) e1000e(E) ptp(E) i2c_i801(E) crc32c_intel(E) psmouse(E) i2c_algo_bit(E) i2c_smbus(E) scsi_mod(E) lpc_ich(E) pps_core(E) usbcore(E) drm(E) button(E) video(E)
[ 101.525318] CPU: 1 PID: 646 Comm: irq/37-enp7s0-T Tainted: G E 5.10.30-rt37-tsn1-rt-ipipe #ipipe
[ 101.525320] Hardware name: SIEMENS AG SIMATIC IPC427D/A5E31233588, BIOS V17.02.09 03/31/2017
[ 101.525322] RIP: 0010:refcount_warn_saturate+0xab/0xf0
[ 101.525325] Code: 05 31 48 44 01 01 e8 f0 c6 42 00 0f 0b c3 80 3d 1f 48 44 01 00 75 90 48 c7 c7 78 a8 f3 a6 c6 05 0f 48 44 01 01 e8 d1 c6 42 00 <0f> 0b c3 80 3d fe 47 44 01 00 0f 85 6d ff ff ff 48 c7 c7 d0 a8 f3
[ 101.525327] RSP: 0018:ffffbdedc0917cb8 EFLAGS: 00010286
[ 101.525329] RAX: 0000000000000000 RBX: ffff98fd6becbf40 RCX: 0000000000000001
[ 101.525330] RDX: 0000000000000001 RSI: ffffffffa6f2700c RDI: 00000000ffffffff
[ 101.525332] RBP: ffff98fd6becc14c R08: ffffffffa7463d00 R09: ffffbdedc0917c50
[ 101.525333] R10: ffffffffa74c3578 R11: 0000000000000034 R12: 00000000ffffff00
[ 101.525335] R13: ffff98fd6b0b1000 R14: 0000000000000039 R15: ffff98fd6be35c40
[ 101.525337] FS: 0000000000000000(0000) GS:ffff98fd6e240000(0000) knlGS:0000000000000000
[ 101.525339] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 101.525341] CR2: 00007f34135a3a70 CR3: 0000000150210003 CR4: 00000000001706e0
[ 101.525343] Call Trace:
[ 101.525346] sock_wfree+0x9c/0xa0
[ 101.525353] unix_destruct_scm+0x7b/0xa0
[ 101.525358] skb_release_head_state+0x40/0x90
[ 101.525362] skb_release_all+0xe/0x30
[ 101.525364] napi_consume_skb+0x57/0x160
[ 101.525367] igc_poll+0xb7/0xc80 [igc]
[ 101.525376] ? sched_clock+0x5/0x10
[ 101.525381] ? sched_clock_cpu+0xe/0x100
[ 101.525385] net_rx_action+0x14c/0x410
[ 101.525388] __do_softirq+0xe9/0x2f4
[ 101.525391] __local_bh_enable_ip+0xe3/0x110
[ 101.525395] ? irq_finalize_oneshot.part.47+0xe0/0xe0
[ 101.525398] irq_forced_thread_fn+0x6a/0x80
[ 101.525401] irq_thread+0xe8/0x180
[ 101.525403] ? wake_threads_waitq+0x30/0x30
[ 101.525406] ? irq_thread_check_affinity+0xd0/0xd0
[ 101.525408] kthread+0x183/0x1a0
[ 101.525412] ? kthread_park+0x80/0x80
[ 101.525415] ret_from_fork+0x22/0x30
Fixes: 13b5b7fd6a4a ("igc: Add support for Tx/Rx rings")
Reported-by: Erez Geva erez.geva.ext@siemens.com
Signed-off-by: Vinicius Costa Gomes vinicius.gomes@intel.com
Tested-by: Dvora Fuxbrumer dvorax.fuxbrumer@linux.intel.com
Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/intel/igc/igc_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 7b822cdcc6c5..4b58dd97a7c0 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -207,6 +207,8 @@ static void igc_clean_tx_ring(struct igc_ring *tx_ring)
 						 DMA_TO_DEVICE);
 		}

+		tx_buffer->next_to_watch = NULL;
+
 		/* move us one more past the eop_desc for start of next pkt */
 		tx_buffer++;
 		i++;
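For readers who want to see why a stale next_to_watch matters, here is a minimal userspace sketch of the race described in the commit message. It is not driver code: struct tx_slot, fill_ring(), clean_ring(), poll_ring() and max_free_count() are hypothetical names invented for illustration, and a counter stands in for actually freeing an skb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4

/* Hypothetical stand-in for a TX buffer slot; not the driver's real struct. */
struct tx_slot {
	int free_count;      /* how many times the slot's packet was released */
	bool next_to_watch;  /* true while a completion is still expected */
};

/* Queue one packet in every slot, as if transmissions were in flight. */
static void fill_ring(struct tx_slot ring[RING_SIZE])
{
	for (size_t i = 0; i < RING_SIZE; i++) {
		ring[i].free_count = 0;
		ring[i].next_to_watch = true;
	}
}

/* Reset path: release every pending packet; optionally clear the marker. */
static void clean_ring(struct tx_slot ring[RING_SIZE], bool clear_watch)
{
	for (size_t i = 0; i < RING_SIZE; i++) {
		if (!ring[i].next_to_watch)
			continue;
		ring[i].free_count++;
		if (clear_watch)
			ring[i].next_to_watch = false;
	}
}

/* Poll path: release packets for slots that still look pending. */
static void poll_ring(struct tx_slot ring[RING_SIZE])
{
	for (size_t i = 0; i < RING_SIZE; i++) {
		if (!ring[i].next_to_watch)
			continue;
		ring[i].free_count++;
		ring[i].next_to_watch = false;
	}
}

/* Largest release count of any slot; above 1 models a double free. */
static int max_free_count(const struct tx_slot ring[RING_SIZE])
{
	int max = 0;

	for (size_t i = 0; i < RING_SIZE; i++)
		if (ring[i].free_count > max)
			max = ring[i].free_count;
	return max;
}
```

Calling clean_ring(ring, false) followed by poll_ring(ring) releases each slot twice in this model, while clean_ring(ring, true) leaves nothing for the poll path to release again, which is the behaviour the two added lines aim for.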
From: Vinicius Costa Gomes vinicius.gomes@intel.com
[ Upstream commit 7b292608db23ccbbfbfa50cdb155d01725d7a52e ]
Cleans the next descriptor to watch (next_to_watch) when cleaning the TX ring.
Failure to do so can cause invalid memory accesses. If igb_poll() runs while the controller is being reset, this can lead to the driver trying to free an skb that was already freed.
(The crash is harder to reproduce with the igb driver, but the same potential problem exists, as the code is identical to igc.)
Fixes: 7cc6fd4c60f2 ("igb: Don't bother clearing Tx buffer_info in igb_clean_tx_ring")
Signed-off-by: Vinicius Costa Gomes vinicius.gomes@intel.com
Reported-by: Erez Geva erez.geva.ext@siemens.com
Tested-by: Tony Brelinski tonyx.brelinski@intel.com
Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 4b9b5148c916..b40654664025 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4836,6 +4836,8 @@ static void igb_clean_tx_ring(struct igb_ring *tx_ring)
 						 DMA_TO_DEVICE);
 		}

+		tx_buffer->next_to_watch = NULL;
+
 		/* move us one more past the eop_desc for start of next pkt */
 		tx_buffer++;
 		i++;
From: Tom Rix trix@redhat.com
[ Upstream commit 05682a0a61b6cbecd97a0f37f743b2cbfd516977 ]
Static analysis reports this problem:
igc_main.c:4944:20: warning: The left operand of '&' is a garbage value
        if (!(phy_data & SR_1000T_REMOTE_RX_STATUS) &&
              ~~~~~~~~ ^
phy_data is set by the call to igc_read_phy_reg() only if there is a read_reg() op, else it is unset and a 0 is returned. Change the return to -EOPNOTSUPP.
Fixes: 208983f099d9 ("igc: Add watchdog")
Signed-off-by: Tom Rix trix@redhat.com
Tested-by: Dvora Fuxbrumer dvorax.fuxbrumer@linux.intel.com
Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/intel/igc/igc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 6dca67d9c25d..a97bf7a5f1d6 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -532,7 +532,7 @@ static inline s32 igc_read_phy_reg(struct igc_hw *hw, u32 offset, u16 *data)
 	if (hw->phy.ops.read_reg)
 		return hw->phy.ops.read_reg(hw, offset, data);

-	return 0;
+	return -EOPNOTSUPP;
 }

 void igc_reinit_locked(struct igc_adapter *);
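The effect of the changed return value can be illustrated outside the kernel. The sketch below is an assumption-laden userspace model, not the igc code: struct phy_ops, fake_read_reg(), read_phy_reg() and link_bit_set() are hypothetical names, and the register value 0x1234 is made up. It shows that once the wrapper reports an error when no read_reg() op exists, a caller that checks the return code never inspects the uninitialized variable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the hardware ops table. */
struct phy_ops {
	int (*read_reg)(uint32_t offset, uint16_t *data);
};

/* Example backend that always reports the same made-up register value. */
static int fake_read_reg(uint32_t offset, uint16_t *data)
{
	(void)offset;
	*data = 0x1234;
	return 0;
}

/* Wrapper in the spirit of the fix: no backend means an error, not success. */
static int read_phy_reg(const struct phy_ops *ops, uint32_t offset,
			uint16_t *data)
{
	if (ops->read_reg)
		return ops->read_reg(offset, data);

	return -EOPNOTSUPP;
}

/* Caller that only inspects the value after a successful read. */
static int link_bit_set(const struct phy_ops *ops, uint16_t mask)
{
	uint16_t phy_data; /* deliberately uninitialized, as in the report */

	if (read_phy_reg(ops, 0, &phy_data))
		return 0; /* treat a failed read as "bit not set" */

	return (phy_data & mask) != 0;
}
```

With the old behaviour (returning 0 when the op is missing), link_bit_set() in this model would go on to test phy_data without it ever having been written, which is the garbage-value read the analyzer flagged.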
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit dd2aefcd5e37989ae5f90afdae44bbbf3a2990da ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function.
Fixes: 6fabd715e6d8 ("ixgbe: Implement PCIe AER support")
Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr
Tested-by: Tony Brelinski tonyx.brelinski@intel.com
Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1bfba87f1ff6..5c8f9ba43968 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -11081,6 +11081,7 @@ err_ioremap:
 	disable_dev = !test_and_set_bit(__IXGBE_DISABLED, &adapter->state);
 	free_netdev(netdev);
 err_alloc_etherdev:
+	pci_disable_pcie_error_reporting(pdev);
 	pci_release_mem_regions(pdev);
 err_pci_reg:
 err_dma:
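The probe fixes in this part of the series all follow the same goto-based unwinding pattern, which can be sketched in plain C. The model below is only an illustration under stated assumptions: struct fake_dev and fake_probe() are hypothetical, boolean flags stand in for the PCI resources, and the error codes -1 and -2 are arbitrary. It shows each error label undoing, in reverse order, everything acquired before the failing step, including the error-reporting enable that the patches add to the unwind path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical resource flags standing in for PCI state; not kernel APIs. */
struct fake_dev {
	bool regions_held;
	bool aer_enabled;
	bool netdev_allocated;
};

/* Probe that acquires resources in order and unwinds them in reverse. */
static int fake_probe(struct fake_dev *dev, bool fail_alloc, bool fail_late)
{
	int err;

	dev->regions_held = true;  /* models requesting the memory regions */
	dev->aer_enabled = true;   /* models enabling error reporting */

	if (fail_alloc) {
		err = -1;
		goto err_alloc_netdev;
	}
	dev->netdev_allocated = true;

	if (fail_late) {
		err = -2;
		goto err_late;
	}

	return 0;

err_late:
	dev->netdev_allocated = false;
err_alloc_netdev:
	dev->aer_enabled = false;  /* the undo step the fixes add */
	dev->regions_held = false;
	return err;
}
```

In this model, dropping the aer_enabled reset from the unwind path would leave the flag set after a failed probe, which corresponds to the unbalanced enable call the commit message describes.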
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit c6bc9e5ce5d37cb3e6b552f41b92a193db1806ab ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function.
Fixes: c9a11c23ceb6 ("igc: Add netdev")
Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr
Tested-by: Dvora Fuxbrumer dvorax.fuxbrumer@linux.intel.com
Acked-by: Sasha Neftin sasha.neftin@intel.com
Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/intel/igc/igc_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 4b58dd97a7c0..b9fe2785f573 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -5223,6 +5223,7 @@ err_sw_init:
 err_ioremap:
 	free_netdev(netdev);
 err_alloc_etherdev:
+	pci_disable_pcie_error_reporting(pdev);
 	pci_release_mem_regions(pdev);
 err_pci_reg:
 err_dma:
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit fea03b1cebd653cd095f2e9a58cfe1c85661c363 ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function.
Fixes: 40a914fa72ab ("igb: Add support for pci-e Advanced Error Reporting") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Tested-by: Tony Brelinski tonyx.brelinski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index b40654664025..43f2096a0669 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3616,6 +3616,7 @@ err_sw_init: err_ioremap: free_netdev(netdev); err_alloc_etherdev: + pci_disable_pcie_error_reporting(pdev); pci_release_mem_regions(pdev); err_pci_reg: err_dma:
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit e85e14d68f517ef12a5fb8123fff65526b35b6cd ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function.
Fixes: 19ae1b3fb99c ("fm10k: Add support for PCI power management and error handling") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c index 9e3103fae723..caedf24c24c1 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c @@ -2227,6 +2227,7 @@ err_sw_init: err_ioremap: free_netdev(netdev); err_alloc_netdev: + pci_disable_pcie_error_reporting(pdev); pci_release_mem_regions(pdev); err_pci_reg: err_dma:
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 4589075608420bc49fcef6e98279324bf2bb91ae ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function.
Fixes: 111b9dc5c981 ("e1000e: add aer support") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Acked-by: Sasha Neftin sasha.neftin@intel.com Tested-by: Dvora Fuxbrumer dvorax.fuxbrumer@linux.intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/e1000e/netdev.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index b3ad95ac3d85..361b8d0bd78d 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -7657,6 +7657,7 @@ err_flashmap: err_ioremap: free_netdev(netdev); err_alloc_etherdev: + pci_disable_pcie_error_reporting(pdev); pci_release_mem_regions(pdev); err_pci_reg: err_dma:
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit af30cbd2f4d6d66a9b6094e0aa32420bc8b20e08 ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function.
Fixes: 5eae00c57f5e ("i40evf: main driver core") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/iavf/iavf_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index ebd08543791b..f3caf5eab8d4 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -3759,6 +3759,7 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) err_ioremap: free_netdev(netdev); err_alloc_etherdev: + pci_disable_pcie_error_reporting(pdev); pci_release_regions(pdev); err_pci_reg: err_dma:
From: Aleksandr Loktionov aleksandr.loktionov@intel.com
[ Upstream commit 6c19d772618fea40d9681f259368f284a330fd90 ]
Ensure that the adapter->q_vector[MAX_Q_VECTORS] array isn't accessed beyond its size. The fix uses a local variable, num_q_vectors, as the limit for the loop index and ensures that num_q_vectors is not larger than MAX_Q_VECTORS.
Fixes: 047e0030f1e6 ("igb: add new data structure for handling interrupts and NAPI") Signed-off-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Reviewed-by: Grzegorz Siwik grzegorz.siwik@intel.com Reviewed-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Reviewed-by: Slawomir Laba slawomirx.laba@intel.com Reviewed-by: Sylwester Dziedziuch sylwesterx.dziedziuch@intel.com Reviewed-by: Mateusz Palczewski mateusz.placzewski@intel.com Tested-by: Tony Brelinski tonyx.brelinski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_main.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 43f2096a0669..c083e5e4e8e6 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -931,6 +931,7 @@ static void igb_configure_msix(struct igb_adapter *adapter) **/ static int igb_request_msix(struct igb_adapter *adapter) { + unsigned int num_q_vectors = adapter->num_q_vectors; struct net_device *netdev = adapter->netdev; int i, err = 0, vector = 0, free_vector = 0;
@@ -939,7 +940,13 @@ static int igb_request_msix(struct igb_adapter *adapter) if (err) goto err_out;
- for (i = 0; i < adapter->num_q_vectors; i++) { + if (num_q_vectors > MAX_Q_VECTORS) { + num_q_vectors = MAX_Q_VECTORS; + dev_warn(&adapter->pdev->dev, + "The number of queue vectors (%d) is higher than max allowed (%d)\n", + adapter->num_q_vectors, MAX_Q_VECTORS); + } + for (i = 0; i < num_q_vectors; i++) { struct igb_q_vector *q_vector = adapter->q_vector[i];
vector++;
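As a rough userspace illustration of the clamping described in this patch (not the driver code itself), the loop bound is copied into a local, capped at the array size, and a warning is printed when the cap takes effect. struct fake_adapter and visit_q_vectors() are hypothetical names, and the value of MAX_Q_VECTORS here is only illustrative.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_Q_VECTORS 8 /* illustrative value for this sketch only */

/* Hypothetical stand-in for the adapter structure. */
struct fake_adapter {
	unsigned int num_q_vectors;
	int q_vector[MAX_Q_VECTORS];
};

/* Walks the q_vector array and returns how many entries were visited. */
unsigned int visit_q_vectors(const struct fake_adapter *a)
{
	unsigned int num_q_vectors;
	unsigned int i, visited = 0;

	assert(a != NULL);
	num_q_vectors = a->num_q_vectors;

	/* Cap the local loop limit; the stored count is left untouched. */
	if (num_q_vectors > MAX_Q_VECTORS) {
		fprintf(stderr,
			"The number of queue vectors (%u) is higher than max allowed (%d)\n",
			a->num_q_vectors, MAX_Q_VECTORS);
		num_q_vectors = MAX_Q_VECTORS;
	}

	for (i = 0; i < num_q_vectors; i++) {
		(void)a->q_vector[i]; /* always within the array bounds */
		visited++;
	}
	return visited;
}
```

With num_q_vectors set to 100 the sketch visits only MAX_Q_VECTORS entries; with 3 it visits 3.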
From: Jedrzej Jagielski jedrzej.jagielski@intel.com
[ Upstream commit 382a7c20d9253bcd5715789b8179528d0f3de72c ]
The assignment to *ring should be done only after the queue argument has been checked for correctness.
Fixes: 91db364236c8 ("igb: Refactor igb_configure_cbs()") Signed-off-by: Jedrzej Jagielski jedrzej.jagielski@intel.com Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com Tested-by: Tony Brelinski tonyx.brelinski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index c083e5e4e8e6..e24fb122c03a 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -1685,14 +1685,15 @@ static bool is_any_txtime_enabled(struct igb_adapter *adapter) **/ static void igb_config_tx_modes(struct igb_adapter *adapter, int queue) { - struct igb_ring *ring = adapter->tx_ring[queue]; struct net_device *netdev = adapter->netdev; struct e1000_hw *hw = &adapter->hw; + struct igb_ring *ring; u32 tqavcc, tqavctrl; u16 value;
WARN_ON(hw->mac.type != e1000_i210); WARN_ON(queue < 0 || queue > 1); + ring = adapter->tx_ring[queue];
/* If any of the Qav features is enabled, configure queues as SR and * with HIGH PRIO. If none is, then configure them with LOW PRIO and
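The ordering issue can be shown with a small standalone sketch. Note that it is stricter than the patch: the driver only warns via WARN_ON() and carries on, whereas this hypothetical helper returns NULL for an out-of-range queue. struct fake_ring, struct fake_tx_adapter, get_tx_ring() and NUM_TX_QUEUES are all made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TX_QUEUES 2 /* the patch checks for queue 0 or 1 */

/* Hypothetical stand-ins for the ring and adapter structures. */
struct fake_ring {
	int id;
};

struct fake_tx_adapter {
	struct fake_ring *tx_ring[NUM_TX_QUEUES];
};

/* Returns the ring for queue, or NULL if queue is out of range. */
struct fake_ring *get_tx_ring(struct fake_tx_adapter *a, int queue)
{
	struct fake_ring *ring;

	assert(a != NULL);

	/* Validate the index first ... */
	if (queue < 0 || queue >= NUM_TX_QUEUES)
		return NULL;

	/* ... and only then use it to index the array. */
	ring = a->tx_ring[queue];
	return ring;
}
```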
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 2342ae10d1272d411a468a85a67647dd115b344f ]
If the 'register_netdev()' call fails, we must release the resources allocated by the previous 'gve_init_priv()' call, as already done in the remove function.
Add a new label and the missing 'gve_teardown_priv_resources()' in the error handling path.
Fixes: 893ce44df565 ("gve: Add basic driver framework for Compute Engine Virtual NIC") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Reviewed-by: Catherine Sullivan csully@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/google/gve/gve_main.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c index 3a74e4645ce6..0b714b606ba1 100644 --- a/drivers/net/ethernet/google/gve/gve_main.c +++ b/drivers/net/ethernet/google/gve/gve_main.c @@ -1340,13 +1340,16 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
err = register_netdev(dev); if (err) - goto abort_with_wq; + goto abort_with_gve_init;
dev_info(&pdev->dev, "GVE version %s\n", gve_version_str); gve_clear_probe_in_progress(priv); queue_work(priv->gve_wq, &priv->service_task); return 0;
+abort_with_gve_init: + gve_teardown_priv_resources(priv); + abort_with_wq: destroy_workqueue(priv->gve_wq);
From: Aleksandr Nogikh nogikh@google.com
[ Upstream commit 6370cc3bbd8a0f9bf975b013781243ab147876c6 ]
Remote KCOV coverage collection enables coverage-guided fuzzing of the code that is not reachable during normal system call execution. It is especially helpful for fuzzing networking subsystems, where it is common to perform packet handling in separate work queues even for the packets that originated directly from the user space.
Enable coverage-guided frame injection by adding kcov remote handle to skb extensions. Default initialization in __alloc_skb and __build_skb_around ensures that no socket buffer that was generated during a system call will be missed.
Code that is of interest and that performs packet processing should be annotated with kcov_remote_start()/kcov_remote_stop().
An alternative approach is to determine kcov_handle solely on the basis of the device/interface that received the specific socket buffer. However, in this case it would be impossible to distinguish between packets that originated during normal background network processes or were intentionally injected from the user space.
Signed-off-by: Aleksandr Nogikh nogikh@google.com Acked-by: Willem de Bruijn willemb@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/skbuff.h | 33 +++++++++++++++++++++++++++++++++ lib/Kconfig.debug | 1 + net/core/skbuff.c | 11 +++++++++++ 3 files changed, 45 insertions(+)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a828cf99c521..2d01b2bbb746 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4150,6 +4150,9 @@ enum skb_ext_id { #endif #if IS_ENABLED(CONFIG_MPTCP) SKB_EXT_MPTCP, +#endif +#if IS_ENABLED(CONFIG_KCOV) + SKB_EXT_KCOV_HANDLE, #endif SKB_EXT_NUM, /* must be last */ }; @@ -4605,5 +4608,35 @@ static inline void skb_reset_redirect(struct sk_buff *skb) #endif }
+#ifdef CONFIG_KCOV +static inline void skb_set_kcov_handle(struct sk_buff *skb, + const u64 kcov_handle) +{ + /* Do not allocate skb extensions only to set kcov_handle to zero + * (as it is zero by default). However, if the extensions are + * already allocated, update kcov_handle anyway since + * skb_set_kcov_handle can be called to zero a previously set + * value. + */ + if (skb_has_extensions(skb) || kcov_handle) { + u64 *kcov_handle_ptr = skb_ext_add(skb, SKB_EXT_KCOV_HANDLE); + + if (kcov_handle_ptr) + *kcov_handle_ptr = kcov_handle; + } +} + +static inline u64 skb_get_kcov_handle(struct sk_buff *skb) +{ + u64 *kcov_handle = skb_ext_find(skb, SKB_EXT_KCOV_HANDLE); + + return kcov_handle ? *kcov_handle : 0; +} +#else +static inline void skb_set_kcov_handle(struct sk_buff *skb, + const u64 kcov_handle) { } +static inline u64 skb_get_kcov_handle(struct sk_buff *skb) { return 0; } +#endif /* CONFIG_KCOV */ + #endif /* __KERNEL__ */ #endif /* _LINUX_SKBUFF_H */ diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 5b7f88a2876d..ffccc13d685b 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1869,6 +1869,7 @@ config KCOV depends on CC_HAS_SANCOV_TRACE_PC || GCC_PLUGINS select DEBUG_FS select GCC_PLUGIN_SANCOV if !CC_HAS_SANCOV_TRACE_PC + select SKB_EXTENSIONS help KCOV exposes kernel code coverage information in a form suitable for coverage-guided fuzzing (randomized testing). diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 1301ea694b94..d17b87aabc8b 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -249,6 +249,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
fclones->skb2.fclone = SKB_FCLONE_CLONE; } + + skb_set_kcov_handle(skb, kcov_common_handle()); + out: return skb; nodata: @@ -282,6 +285,8 @@ static struct sk_buff *__build_skb_around(struct sk_buff *skb, memset(shinfo, 0, offsetof(struct skb_shared_info, dataref)); atomic_set(&shinfo->dataref, 1);
+ skb_set_kcov_handle(skb, kcov_common_handle()); + return skb; }
@@ -4248,6 +4253,9 @@ static const u8 skb_ext_type_len[] = { #if IS_ENABLED(CONFIG_MPTCP) [SKB_EXT_MPTCP] = SKB_EXT_CHUNKSIZEOF(struct mptcp_ext), #endif +#if IS_ENABLED(CONFIG_KCOV) + [SKB_EXT_KCOV_HANDLE] = SKB_EXT_CHUNKSIZEOF(u64), +#endif };
static __always_inline unsigned int skb_ext_total_length(void) @@ -4264,6 +4272,9 @@ static __always_inline unsigned int skb_ext_total_length(void) #endif #if IS_ENABLED(CONFIG_MPTCP) skb_ext_type_len[SKB_EXT_MPTCP] + +#endif +#if IS_ENABLED(CONFIG_KCOV) + skb_ext_type_len[SKB_EXT_KCOV_HANDLE] + #endif 0; }
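The allocation rule from the skb_set_kcov_handle() comment (do not allocate extensions just to store zero, but do overwrite a previously stored value) can be tried out in isolation with the sketch below. It is a userspace model only: struct fake_ext, struct fake_skb and the fake_* functions are hypothetical, and calloc() merely stands in for skb_ext_add().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's skb or skb_ext layout. */
struct fake_ext {
	uint64_t kcov_handle;
};

struct fake_skb {
	struct fake_ext *ext; /* NULL until an extension is allocated */
};

void fake_set_kcov_handle(struct fake_skb *skb, uint64_t handle)
{
	assert(skb != NULL);

	/* Zero is the default, so do not allocate only to store zero. */
	if (skb->ext == NULL && handle == 0)
		return;

	if (skb->ext == NULL) {
		skb->ext = calloc(1, sizeof(*skb->ext));
		if (skb->ext == NULL)
			return; /* allocation failed; handle stays at the default */
	}

	/* Extensions exist: always update, even to reset a value to zero. */
	skb->ext->kcov_handle = handle;
}

uint64_t fake_get_kcov_handle(const struct fake_skb *skb)
{
	assert(skb != NULL);
	return skb->ext ? skb->ext->kcov_handle : 0;
}
```

Setting zero on a fresh buffer allocates nothing, while setting zero after a non-zero handle keeps the extension and clears the stored value.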
From: Björn Töpel bjorn.topel@intel.com
[ Upstream commit 7fd3253a7de6a317a0683f83739479fb880bffc8 ]
The existing busy-polling mode, enabled by the SO_BUSY_POLL socket option or system-wide using the /proc/sys/net/core/busy_read knob, is opportunistic. That means that if the NAPI context is not scheduled, busy-polling will poll it. If, after busy-polling, the budget is exceeded, the busy-polling logic will schedule the NAPI onto the regular softirq handling.
One implication of the behavior above is that a busy/heavily loaded NAPI context will never enter/allow for busy-polling. Some applications prefer that most NAPI processing be done by busy-polling.
This series adds a new socket option, SO_PREFER_BUSY_POLL, that works in concert with the napi_defer_hard_irqs and gro_flush_timeout knobs. These knobs were introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature") and allow a user to defer the re-enabling of interrupts and instead schedule the NAPI context from a watchdog timer. When a user enables SO_PREFER_BUSY_POLL, again with the other knobs enabled, and the NAPI context is being processed by a softirq, the softirq NAPI processing will exit early to allow the busy-polling to be performed.
If the application stops performing busy-polling via a system call, the watchdog timer defined by gro_flush_timeout will time out, and regular softirq handling will resume.
In summary: heavy-traffic applications that prefer busy-polling over softirq processing should use this option.
Example usage:
$ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
$ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout
Note that the timeout should be larger than the userspace processing window, otherwise the watchdog will time out and fall back to regular softirq processing.
Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.
Signed-off-by: Björn Töpel bjorn.topel@intel.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Jakub Kicinski kuba@kernel.org Link: https://lore.kernel.org/bpf/20201130185205.196029-2-bjorn.topel@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/alpha/include/uapi/asm/socket.h | 2 + arch/mips/include/uapi/asm/socket.h | 2 + arch/parisc/include/uapi/asm/socket.h | 2 + arch/sparc/include/uapi/asm/socket.h | 2 + fs/eventpoll.c | 2 +- include/linux/netdevice.h | 35 +++++++----- include/net/busy_poll.h | 5 +- include/net/sock.h | 4 ++ include/uapi/asm-generic/socket.h | 2 + net/core/dev.c | 78 +++++++++++++++++++++------ net/core/sock.c | 9 ++++ 11 files changed, 111 insertions(+), 32 deletions(-)
diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h index de6c4df61082..538359642554 100644 --- a/arch/alpha/include/uapi/asm/socket.h +++ b/arch/alpha/include/uapi/asm/socket.h @@ -124,6 +124,8 @@
#define SO_DETACH_REUSEPORT_BPF 68
+#define SO_PREFER_BUSY_POLL 69 + #if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64 diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h index d0a9ed2ca2d6..e406e73b5e6e 100644 --- a/arch/mips/include/uapi/asm/socket.h +++ b/arch/mips/include/uapi/asm/socket.h @@ -135,6 +135,8 @@
#define SO_DETACH_REUSEPORT_BPF 68
+#define SO_PREFER_BUSY_POLL 69 + #if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64 diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h index 10173c32195e..1bc46200889d 100644 --- a/arch/parisc/include/uapi/asm/socket.h +++ b/arch/parisc/include/uapi/asm/socket.h @@ -116,6 +116,8 @@
#define SO_DETACH_REUSEPORT_BPF 0x4042
+#define SO_PREFER_BUSY_POLL 0x4043 + #if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64 diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h index 8029b681fc7c..99688cf673a4 100644 --- a/arch/sparc/include/uapi/asm/socket.h +++ b/arch/sparc/include/uapi/asm/socket.h @@ -117,6 +117,8 @@
#define SO_DETACH_REUSEPORT_BPF 0x0047
+#define SO_PREFER_BUSY_POLL 0x0048 + #if !defined(__KERNEL__)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 6094b2e9058b..9e5b05e818ad 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -397,7 +397,7 @@ static void ep_busy_loop(struct eventpoll *ep, int nonblock) unsigned int napi_id = READ_ONCE(ep->napi_id);
if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on()) - napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep); + napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false); }
static inline void ep_reset_busy_poll_napi_id(struct eventpoll *ep) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e37480b5f4c0..2488638a8749 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -350,23 +350,25 @@ struct napi_struct { };
enum { - NAPI_STATE_SCHED, /* Poll is scheduled */ - NAPI_STATE_MISSED, /* reschedule a napi */ - NAPI_STATE_DISABLE, /* Disable pending */ - NAPI_STATE_NPSVC, /* Netpoll - don't dequeue from poll_list */ - NAPI_STATE_LISTED, /* NAPI added to system lists */ - NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */ - NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */ + NAPI_STATE_SCHED, /* Poll is scheduled */ + NAPI_STATE_MISSED, /* reschedule a napi */ + NAPI_STATE_DISABLE, /* Disable pending */ + NAPI_STATE_NPSVC, /* Netpoll - don't dequeue from poll_list */ + NAPI_STATE_LISTED, /* NAPI added to system lists */ + NAPI_STATE_NO_BUSY_POLL, /* Do not add in napi_hash, no busy polling */ + NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */ + NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/ };
enum { - NAPIF_STATE_SCHED = BIT(NAPI_STATE_SCHED), - NAPIF_STATE_MISSED = BIT(NAPI_STATE_MISSED), - NAPIF_STATE_DISABLE = BIT(NAPI_STATE_DISABLE), - NAPIF_STATE_NPSVC = BIT(NAPI_STATE_NPSVC), - NAPIF_STATE_LISTED = BIT(NAPI_STATE_LISTED), - NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL), - NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), + NAPIF_STATE_SCHED = BIT(NAPI_STATE_SCHED), + NAPIF_STATE_MISSED = BIT(NAPI_STATE_MISSED), + NAPIF_STATE_DISABLE = BIT(NAPI_STATE_DISABLE), + NAPIF_STATE_NPSVC = BIT(NAPI_STATE_NPSVC), + NAPIF_STATE_LISTED = BIT(NAPI_STATE_LISTED), + NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL), + NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), + NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL), };
enum gro_result { @@ -437,6 +439,11 @@ static inline bool napi_disable_pending(struct napi_struct *n) return test_bit(NAPI_STATE_DISABLE, &n->state); }
+static inline bool napi_prefer_busy_poll(struct napi_struct *n) +{ + return test_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state); +} + bool napi_schedule_prep(struct napi_struct *n);
/** diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h index b001fa91c14e..0292b8353d7e 100644 --- a/include/net/busy_poll.h +++ b/include/net/busy_poll.h @@ -43,7 +43,7 @@ bool sk_busy_loop_end(void *p, unsigned long start_time);
void napi_busy_loop(unsigned int napi_id, bool (*loop_end)(void *, unsigned long), - void *loop_end_arg); + void *loop_end_arg, bool prefer_busy_poll);
#else /* CONFIG_NET_RX_BUSY_POLL */ static inline unsigned long net_busy_loop_on(void) @@ -105,7 +105,8 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock) unsigned int napi_id = READ_ONCE(sk->sk_napi_id);
if (napi_id >= MIN_NAPI_ID) - napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk); + napi_busy_loop(napi_id, nonblock ? NULL : sk_busy_loop_end, sk, + READ_ONCE(sk->sk_prefer_busy_poll)); #endif }
diff --git a/include/net/sock.h b/include/net/sock.h index 3c7addf95150..95311369567f 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -301,6 +301,7 @@ struct bpf_local_storage; * @sk_ack_backlog: current listen backlog * @sk_max_ack_backlog: listen backlog set in listen() * @sk_uid: user id of owner + * @sk_prefer_busy_poll: prefer busypolling over softirq processing * @sk_priority: %SO_PRIORITY setting * @sk_type: socket type (%SOCK_STREAM, etc) * @sk_protocol: which protocol this socket belongs in this network family @@ -479,6 +480,9 @@ struct sock { u32 sk_ack_backlog; u32 sk_max_ack_backlog; kuid_t sk_uid; +#ifdef CONFIG_NET_RX_BUSY_POLL + u8 sk_prefer_busy_poll; +#endif struct pid *sk_peer_pid; const struct cred *sk_peer_cred; long sk_rcvtimeo; diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h index 77f7c1638eb1..7dd02408b7ce 100644 --- a/include/uapi/asm-generic/socket.h +++ b/include/uapi/asm-generic/socket.h @@ -119,6 +119,8 @@
#define SO_DETACH_REUSEPORT_BPF 68
+#define SO_PREFER_BUSY_POLL 69 + #if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__)) diff --git a/net/core/dev.c b/net/core/dev.c index 2fdf30eefc59..6b08de52bf0e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6496,7 +6496,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
- new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED); + new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED | + NAPIF_STATE_PREFER_BUSY_POLL);
/* If STATE_MISSED was set, leave STATE_SCHED set, * because we will call napi->poll() one more time. @@ -6535,8 +6536,29 @@ static struct napi_struct *napi_by_id(unsigned int napi_id)
#define BUSY_POLL_BUDGET 8
-static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock) +static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule) { + if (!skip_schedule) { + gro_normal_list(napi); + __napi_schedule(napi); + return; + } + + if (napi->gro_bitmask) { + /* flush too old packets + * If HZ < 1000, flush all packets. + */ + napi_gro_flush(napi, HZ >= 1000); + } + + gro_normal_list(napi); + clear_bit(NAPI_STATE_SCHED, &napi->state); +} + +static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool prefer_busy_poll) +{ + bool skip_schedule = false; + unsigned long timeout; int rc;
/* Busy polling means there is a high chance device driver hard irq @@ -6553,6 +6575,15 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
local_bh_disable();
+ if (prefer_busy_poll) { + napi->defer_hard_irqs_count = READ_ONCE(napi->dev->napi_defer_hard_irqs); + timeout = READ_ONCE(napi->dev->gro_flush_timeout); + if (napi->defer_hard_irqs_count && timeout) { + hrtimer_start(&napi->timer, ns_to_ktime(timeout), HRTIMER_MODE_REL_PINNED); + skip_schedule = true; + } + } + /* All we really want here is to re-enable device interrupts. * Ideally, a new ndo_busy_poll_stop() could avoid another round. */ @@ -6563,19 +6594,14 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock) */ trace_napi_poll(napi, rc, BUSY_POLL_BUDGET); netpoll_poll_unlock(have_poll_lock); - if (rc == BUSY_POLL_BUDGET) { - /* As the whole budget was spent, we still own the napi so can - * safely handle the rx_list. - */ - gro_normal_list(napi); - __napi_schedule(napi); - } + if (rc == BUSY_POLL_BUDGET) + __busy_poll_stop(napi, skip_schedule); local_bh_enable(); }
void napi_busy_loop(unsigned int napi_id, bool (*loop_end)(void *, unsigned long), - void *loop_end_arg) + void *loop_end_arg, bool prefer_busy_poll) { unsigned long start_time = loop_end ? busy_loop_current_time() : 0; int (*napi_poll)(struct napi_struct *napi, int budget); @@ -6603,12 +6629,18 @@ restart: * we avoid dirtying napi->state as much as we can. */ if (val & (NAPIF_STATE_DISABLE | NAPIF_STATE_SCHED | - NAPIF_STATE_IN_BUSY_POLL)) + NAPIF_STATE_IN_BUSY_POLL)) { + if (prefer_busy_poll) + set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state); goto count; + } if (cmpxchg(&napi->state, val, val | NAPIF_STATE_IN_BUSY_POLL | - NAPIF_STATE_SCHED) != val) + NAPIF_STATE_SCHED) != val) { + if (prefer_busy_poll) + set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state); goto count; + } have_poll_lock = netpoll_poll_lock(napi); napi_poll = napi->poll; } @@ -6626,7 +6658,7 @@ count:
if (unlikely(need_resched())) { if (napi_poll) - busy_poll_stop(napi, have_poll_lock); + busy_poll_stop(napi, have_poll_lock, prefer_busy_poll); preempt_enable(); rcu_read_unlock(); cond_resched(); @@ -6637,7 +6669,7 @@ count: cpu_relax(); } if (napi_poll) - busy_poll_stop(napi, have_poll_lock); + busy_poll_stop(napi, have_poll_lock, prefer_busy_poll); preempt_enable(); out: rcu_read_unlock(); @@ -6688,8 +6720,10 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer) * NAPI_STATE_MISSED, since we do not react to a device IRQ. */ if (!napi_disable_pending(napi) && - !test_and_set_bit(NAPI_STATE_SCHED, &napi->state)) + !test_and_set_bit(NAPI_STATE_SCHED, &napi->state)) { + clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state); __napi_schedule_irqoff(napi); + }
return HRTIMER_NORESTART; } @@ -6747,6 +6781,7 @@ void napi_disable(struct napi_struct *n)
hrtimer_cancel(&n->timer);
+ clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state); clear_bit(NAPI_STATE_DISABLE, &n->state); } EXPORT_SYMBOL(napi_disable); @@ -6819,6 +6854,19 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) goto out_unlock; }
+ /* The NAPI context has more processing work, but busy-polling + * is preferred. Exit early. + */ + if (napi_prefer_busy_poll(n)) { + if (napi_complete_done(n, work)) { + /* If timeout is not set, we need to make sure + * that the NAPI is re-scheduled. + */ + napi_schedule(n); + } + goto out_unlock; + } + if (n->gro_bitmask) { /* flush too old packets * If HZ < 1000, flush all packets. diff --git a/net/core/sock.c b/net/core/sock.c index 7de51ea15cdf..cf0e5fc3a8ba 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1167,6 +1167,12 @@ set_sndbuf: sk->sk_ll_usec = val; } break; + case SO_PREFER_BUSY_POLL: + if (valbool && !capable(CAP_NET_ADMIN)) + ret = -EPERM; + else + WRITE_ONCE(sk->sk_prefer_busy_poll, valbool); + break; #endif
case SO_MAX_PACING_RATE: @@ -1531,6 +1537,9 @@ int sock_getsockopt(struct socket *sock, int level, int optname, case SO_BUSY_POLL: v.val = sk->sk_ll_usec; break; + case SO_PREFER_BUSY_POLL: + v.val = READ_ONCE(sk->sk_prefer_busy_poll); + break; #endif
case SO_MAX_PACING_RATE:
Hi!
From: Björn Töpel bjorn.topel@intel.com
[ Upstream commit 7fd3253a7de6a317a0683f83739479fb880bffc8 ]
The existing busy-polling mode, enabled by the SO_BUSY_POLL socket option or system-wide using the /proc/sys/net/core/busy_read knob, is an opportunistic. That means that if the NAPI context is not
Do we need this in -stable? It is rather long at 400 lines, introduces a new API feature, and does not fix a bug.
I can revert it on top of 5.10-stable, so I don't believe we have a bugfix depending on it.
Best regards, Pavel
On Wed, Jul 28, 2021 at 09:48:28AM +0200, Pavel Machek wrote:
Hi!
From: Björn Töpel bjorn.topel@intel.com
[ Upstream commit 7fd3253a7de6a317a0683f83739479fb880bffc8 ]
The existing busy-polling mode, enabled by the SO_BUSY_POLL socket option or system-wide using the /proc/sys/net/core/busy_read knob, is an opportunistic. That means that if the NAPI context is not
Do we need this in -stable? It is rather long at 400 lines, introduces a new API feature, and does not fix a bug.
It was needed for a patch that was dropped, so I dropped this too, thanks.
greg k-h
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit b648eba4c69e5819880b4907e7fcb2bb576069ab ]
bond_ipsec_add_sa() uses rcu_dereference() to dereference bond->curr_active_slave, but neither it nor its caller holds the RCU read lock, so a warning occurs. Add rcu_read_lock() to fix this.
Test commands:
    ip link add dummy0 type dummy
    ip link add bond0 type bond
    ip link set dummy0 master bond0
    ip link set dummy0 up
    ip link set bond0 up
    ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 \
        mode transport \
        reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
        0x44434241343332312423222114131211f4f3f2f1 128 sel \
        src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload \
        dev bond0 dir in
Splat looks like: ============================= WARNING: suspicious RCU usage 5.13.0-rc3+ #1168 Not tainted ----------------------------- drivers/net/bonding/bond_main.c:411 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ip/684: #0: ffffffff9a2757c0 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3}, at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user] stack backtrace: CPU: 0 PID: 684 Comm: ip Not tainted 5.13.0-rc3+ #1168 Call Trace: dump_stack+0xa4/0xe5 bond_ipsec_add_sa+0x18c/0x1f0 [bonding] xfrm_dev_state_add+0x2a9/0x770 ? memcpy+0x38/0x60 xfrm_add_sa+0x2278/0x3b10 [xfrm_user] ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user] ? register_lock_class+0x1750/0x1750 xfrm_user_rcv_msg+0x331/0x660 [xfrm_user] ? rcu_read_lock_sched_held+0x91/0xc0 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? find_held_lock+0x3a/0x1c0 ? mutex_lock_io_nested+0x1210/0x1210 ? sched_clock_cpu+0x18/0x170 netlink_rcv_skb+0x121/0x350 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? netlink_ack+0x9d0/0x9d0 ? netlink_deliver_tap+0x17c/0xa50 xfrm_netlink_rcv+0x68/0x80 [xfrm_user] netlink_unicast+0x41c/0x610 ? netlink_attachskb+0x710/0x710 netlink_sendmsg+0x6b9/0xb70 [ ... ]
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo ap420073@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 345a3f61c723..8bb90e97898d 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -387,10 +387,12 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs) struct net_device *bond_dev = xs->xso.dev; struct bonding *bond; struct slave *slave; + int err;
if (!bond_dev) return -EINVAL;
+ rcu_read_lock(); bond = netdev_priv(bond_dev); slave = rcu_dereference(bond->curr_active_slave); xs->xso.real_dev = slave->dev; @@ -399,10 +401,13 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs) if (!(slave->dev->xfrmdev_ops && slave->dev->xfrmdev_ops->xdo_dev_state_add)) { slave_warn(bond_dev, slave->dev, "Slave does not support ipsec offload\n"); + rcu_read_unlock(); return -EINVAL; }
- return slave->dev->xfrmdev_ops->xdo_dev_state_add(xs); + err = slave->dev->xfrmdev_ops->xdo_dev_state_add(xs); + rcu_read_unlock(); + return err; }
/**
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit 105cd17a866017b45f3c45901b394c711c97bf40 ]
If a bond has no real device, bond->curr_active_slave is NULL. But bond_ipsec_add_sa() dereferences bond->curr_active_slave without a NULL check, so a null-ptr-deref occurs.
Test commands:
    ip link add bond0 type bond
    ip link set bond0 up
    ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi \
        0x07 mode transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
        0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
        dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
Splat looks like: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 4 PID: 680 Comm: ip Not tainted 5.13.0-rc3+ #1168 RIP: 0010:bond_ipsec_add_sa+0xc4/0x2e0 [bonding] Code: 85 21 02 00 00 4d 8b a6 48 0c 00 00 e8 75 58 44 ce 85 c0 0f 85 14 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 fc 01 00 00 48 8d bb e0 02 00 00 4d 8b 2c 24 48 RSP: 0018:ffff88810946f508 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88810b4e8040 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffffff8fe34280 RDI: ffff888115abe100 RBP: ffff88810946f528 R08: 0000000000000003 R09: fffffbfff2287e11 R10: 0000000000000001 R11: ffff888115abe0c8 R12: 0000000000000000 R13: ffffffffc0aea9a0 R14: ffff88800d7d2000 R15: ffff88810b4e8330 FS: 00007efc5552e680(0000) GS:ffff888119c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055c2530dbf40 CR3: 0000000103056004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: xfrm_dev_state_add+0x2a9/0x770 ? memcpy+0x38/0x60 xfrm_add_sa+0x2278/0x3b10 [xfrm_user] ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user] ? register_lock_class+0x1750/0x1750 xfrm_user_rcv_msg+0x331/0x660 [xfrm_user] ? rcu_read_lock_sched_held+0x91/0xc0 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? find_held_lock+0x3a/0x1c0 ? mutex_lock_io_nested+0x1210/0x1210 ? sched_clock_cpu+0x18/0x170 netlink_rcv_skb+0x121/0x350 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? netlink_ack+0x9d0/0x9d0 ? netlink_deliver_tap+0x17c/0xa50 xfrm_netlink_rcv+0x68/0x80 [xfrm_user] netlink_unicast+0x41c/0x610 ? netlink_attachskb+0x710/0x710 netlink_sendmsg+0x6b9/0xb70 [ ...]
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Taehee Yoo ap420073@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/bonding/bond_main.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 8bb90e97898d..a66d639c415f 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -395,6 +395,11 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs) rcu_read_lock(); bond = netdev_priv(bond_dev); slave = rcu_dereference(bond->curr_active_slave); + if (!slave) { + rcu_read_unlock(); + return -ENODEV; + } + xs->xso.real_dev = slave->dev; bond->xs = xs;
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit 2de7e4f67599affc97132bd07e30e3bd59d0b777 ]
struct xfrm_state_offload has two pointers, dev and real_dev, which are used in the callback functions of struct xfrmdev_ops. dev points to either the bonding interface or the real interface: if bonding ipsec offload is used, it points to the bonding interface; if not, it points to the real interface. real_dev always points to the real interface. So ixgbevf should always use real_dev instead of dev. Of course, real_dev is never NULL.
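The distinction between dev and real_dev can be illustrated with a small userspace sketch. All types and names below (mock_netdev, mock_offload, mock_driver_priv) are hypothetical stand-ins, not the kernel's structures; the sketch only shows why a hardware driver should resolve its private data through real_dev.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct net_device or
 * struct xfrm_state_offload. */
struct mock_netdev {
	const char *name;
	void *priv;             /* what netdev_priv() would return */
};

struct mock_offload {
	struct mock_netdev *dev;       /* bonding or real interface */
	struct mock_netdev *real_dev;  /* always the real interface */
};

/* A hardware driver callback should resolve its private data through
 * real_dev, because dev may be a bonding device whose private area
 * has a completely different layout. */
static void *mock_driver_priv(const struct mock_offload *xso)
{
	assert(xso->real_dev != NULL);
	return xso->real_dev->priv;
}
```

With this shape the driver gets the same private data whether the SA was offloaded through bond0 or directly through eth0.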
Test commands:
    ip link add bond0 type bond
    #eth0 is ixgbevf interface
    ip link set eth0 master bond0
    ip link set bond0 up
    ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
        transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
        0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
        dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
Splat looks like: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 6 PID: 688 Comm: ip Not tainted 5.13.0-rc3+ #1168 RIP: 0010:ixgbevf_ipsec_find_empty_idx+0x28/0x1b0 [ixgbevf] Code: 00 00 0f 1f 44 00 00 55 53 48 89 fb 48 83 ec 08 40 84 f6 0f 84 9c 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 01 0f 8e 4c 01 00 00 66 81 3b 00 04 0f RSP: 0018:ffff8880089af390 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: ffff8880089af4f8 R08: 0000000000000003 R09: fffffbfff4287e11 R10: 0000000000000001 R11: ffff888005de8908 R12: 0000000000000000 R13: ffff88810936a000 R14: ffff88810936a000 R15: ffff888004d78040 FS: 00007fdf9883a680(0000) GS:ffff88811a400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055bc14adbf40 CR3: 000000000b87c005 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ixgbevf_ipsec_add_sa+0x1bf/0x9c0 [ixgbevf] ? rcu_read_lock_sched_held+0x91/0xc0 ? ixgbevf_ipsec_parse_proto_keys.isra.9+0x280/0x280 [ixgbevf] ? lock_acquire+0x191/0x720 ? bond_ipsec_add_sa+0x48/0x350 [bonding] ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0 ? rcu_read_lock_held+0x91/0xa0 ? rcu_read_lock_sched_held+0xc0/0xc0 bond_ipsec_add_sa+0x193/0x350 [bonding] xfrm_dev_state_add+0x2a9/0x770 ? memcpy+0x38/0x60 xfrm_add_sa+0x2278/0x3b10 [xfrm_user] ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user] ? register_lock_class+0x1750/0x1750 xfrm_user_rcv_msg+0x331/0x660 [xfrm_user] ? rcu_read_lock_sched_held+0x91/0xc0 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? find_held_lock+0x3a/0x1c0 ? mutex_lock_io_nested+0x1210/0x1210 ? sched_clock_cpu+0x18/0x170 netlink_rcv_skb+0x121/0x350 [ ... ]
Fixes: 272c2330adc9 ("xfrm: bail early on slave pass over skb")
Signed-off-by: Taehee Yoo ap420073@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/intel/ixgbevf/ipsec.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c b/drivers/net/ethernet/intel/ixgbevf/ipsec.c index caaea2c920a6..e3e4676af9e4 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c +++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c @@ -211,7 +211,7 @@ struct xfrm_state *ixgbevf_ipsec_find_rx_state(struct ixgbevf_ipsec *ipsec, static int ixgbevf_ipsec_parse_proto_keys(struct xfrm_state *xs, u32 *mykey, u32 *mysalt) { - struct net_device *dev = xs->xso.dev; + struct net_device *dev = xs->xso.real_dev; unsigned char *key_data; char *alg_name = NULL; int key_len; @@ -260,12 +260,15 @@ static int ixgbevf_ipsec_parse_proto_keys(struct xfrm_state *xs, **/ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs) { - struct net_device *dev = xs->xso.dev; - struct ixgbevf_adapter *adapter = netdev_priv(dev); - struct ixgbevf_ipsec *ipsec = adapter->ipsec; + struct net_device *dev = xs->xso.real_dev; + struct ixgbevf_adapter *adapter; + struct ixgbevf_ipsec *ipsec; u16 sa_idx; int ret;
+ adapter = netdev_priv(dev); + ipsec = adapter->ipsec; + if (xs->id.proto != IPPROTO_ESP && xs->id.proto != IPPROTO_AH) { netdev_err(dev, "Unsupported protocol 0x%04x for IPsec offload\n", xs->id.proto); @@ -383,11 +386,14 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs) **/ static void ixgbevf_ipsec_del_sa(struct xfrm_state *xs) { - struct net_device *dev = xs->xso.dev; - struct ixgbevf_adapter *adapter = netdev_priv(dev); - struct ixgbevf_ipsec *ipsec = adapter->ipsec; + struct net_device *dev = xs->xso.real_dev; + struct ixgbevf_adapter *adapter; + struct ixgbevf_ipsec *ipsec; u16 sa_idx;
+ adapter = netdev_priv(dev); + ipsec = adapter->ipsec; + if (xs->xso.flags & XFRM_OFFLOAD_INBOUND) { sa_idx = xs->xso.offload_handle - IXGBE_IPSEC_BASE_RX_INDEX;
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit a22c39b831a081da9b2c488bd970a4412d926f30 ]
bond_ipsec_del_sa() uses rcu_dereference() to read bond->curr_active_slave, but neither it nor its caller holds the RCU read lock, so a warning occurs. Add rcu_read_lock().
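The shape of this fix (take the read lock once, route every early exit through a single unlock) can be sketched in userspace. The mock_rcu_read_lock()/mock_rcu_read_unlock() helpers below are hypothetical counters standing in for the kernel primitives, and mock_del_sa() is an illustrative function, not the driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel RCU primitives:
 * they only count nesting so the balance can be checked. */
static int rcu_depth;
static void mock_rcu_read_lock(void)   { rcu_depth++; }
static void mock_rcu_read_unlock(void) { assert(rcu_depth > 0); rcu_depth--; }

struct mock_slave { int can_delete; int deleted; };
struct mock_bond  { struct mock_slave *curr_active_slave; };

/* One lock, one unlock: every early exit funnels through "out". */
static void mock_del_sa(struct mock_bond *bond)
{
	struct mock_slave *slave;

	mock_rcu_read_lock();
	slave = bond->curr_active_slave;   /* rcu_dereference() in the kernel */
	if (!slave)
		goto out;
	if (!slave->can_delete)
		goto out;
	slave->deleted = 1;
out:
	mock_rcu_read_unlock();
}
```

Using a single exit label keeps the lock and unlock paired even when more checks are added later.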
Test commands:
    ip netns add A
    ip netns exec A bash
    modprobe netdevsim
    echo "1 1" > /sys/bus/netdevsim/new_device
    ip link add bond0 type bond
    ip link set eth0 master bond0
    ip link set eth0 up
    ip link set bond0 up
    ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
        transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
        0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
        dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
    ip x s f
Splat looks like: ============================= WARNING: suspicious RCU usage 5.13.0-rc3+ #1168 Not tainted ----------------------------- drivers/net/bonding/bond_main.c:448 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 2 locks held by ip/705: #0: ffff888106701780 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3}, at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user] #1: ffff8880075b0098 (&x->lock){+.-.}-{2:2}, at: xfrm_state_delete+0x16/0x30
stack backtrace: CPU: 6 PID: 705 Comm: ip Not tainted 5.13.0-rc3+ #1168 Call Trace: dump_stack+0xa4/0xe5 bond_ipsec_del_sa+0x16a/0x1c0 [bonding] __xfrm_state_delete+0x51f/0x730 xfrm_state_delete+0x1e/0x30 xfrm_state_flush+0x22f/0x390 xfrm_flush_sa+0xd8/0x260 [xfrm_user] ? xfrm_flush_policy+0x290/0x290 [xfrm_user] xfrm_user_rcv_msg+0x331/0x660 [xfrm_user] ? rcu_read_lock_sched_held+0x91/0xc0 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? find_held_lock+0x3a/0x1c0 ? mutex_lock_io_nested+0x1210/0x1210 ? sched_clock_cpu+0x18/0x170 netlink_rcv_skb+0x121/0x350 [ ... ]
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Taehee Yoo ap420073@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/bonding/bond_main.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index a66d639c415f..952796fb5f1a 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -428,21 +428,24 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs) if (!bond_dev) return;
+ rcu_read_lock(); bond = netdev_priv(bond_dev); slave = rcu_dereference(bond->curr_active_slave);
if (!slave) - return; + goto out;
xs->xso.real_dev = slave->dev;
if (!(slave->dev->xfrmdev_ops && slave->dev->xfrmdev_ops->xdo_dev_state_delete)) { slave_warn(bond_dev, slave->dev, "%s: no slave xdo_dev_state_delete\n", __func__); - return; + goto out; }
slave->dev->xfrmdev_ops->xdo_dev_state_delete(xs); +out: + rcu_read_unlock(); }
/**
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit b121693381b112b78c076dea171ee113e237c0e4 ]
A bonding interface can be nested, and it supports ipsec offload. So the nested bonding + ipsec scenario can currently be configured, but the code does not support it. It should therefore be disallowed.

interface graph:
bond2
  |
bond1
  |
eth0

Nested bonding + ipsec offload may not be a real use case, so disallowing this scenario is fine.
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Taehee Yoo ap420073@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/bonding/bond_main.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 952796fb5f1a..3555798879f2 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -403,8 +403,9 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs) xs->xso.real_dev = slave->dev; bond->xs = xs;
- if (!(slave->dev->xfrmdev_ops - && slave->dev->xfrmdev_ops->xdo_dev_state_add)) { + if (!slave->dev->xfrmdev_ops || + !slave->dev->xfrmdev_ops->xdo_dev_state_add || + netif_is_bond_master(slave->dev)) { slave_warn(bond_dev, slave->dev, "Slave does not support ipsec offload\n"); rcu_read_unlock(); return -EINVAL; @@ -437,8 +438,9 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
xs->xso.real_dev = slave->dev;
- if (!(slave->dev->xfrmdev_ops - && slave->dev->xfrmdev_ops->xdo_dev_state_delete)) { + if (!slave->dev->xfrmdev_ops || + !slave->dev->xfrmdev_ops->xdo_dev_state_delete || + netif_is_bond_master(slave->dev)) { slave_warn(bond_dev, slave->dev, "%s: no slave xdo_dev_state_delete\n", __func__); goto out; } @@ -463,8 +465,9 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs) if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) return true;
- if (!(slave_dev->xfrmdev_ops - && slave_dev->xfrmdev_ops->xdo_dev_offload_ok)) { + if (!slave_dev->xfrmdev_ops || + !slave_dev->xfrmdev_ops->xdo_dev_offload_ok || + netif_is_bond_master(slave_dev)) { slave_warn(bond_dev, slave_dev, "%s: no slave xdo_dev_offload_ok\n", __func__); return false; }
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit 9a5605505d9c7dbfdb89cc29a8f5fc5cf9fd2334 ]
bonding supports ipsec offload. When an SA is added, bonding just passes the SA to its current active real interface, but it doesn't manage the SA itself. So when events (add/del real interface, active real interface change, etc.) occur, bonding can't handle them well, and some problems (panic, UAF, refcnt leak) occur.
To make it stable, bonding should manage SAs. That's the reason why struct bond_ipsec is added. When a new SA is added to a bonding interface, it is stored in the bond_ipsec list, and the SA is passed to the current active real interface. If events occur, bonding uses the bond_ipsec data to handle them. bond->ipsec_list is protected by bond->ipsec_lock.
If the current active real interface is changed, the following logic works.
1. Delete all SAs from the old active real interface.
2. Add all SAs to the new active real interface.
3. If the new active real interface doesn't support ipsec offload or the SA's option, it sets real_dev to NULL.
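The active-interface change logic described above can be sketched as a simplified userspace model. Everything here (mock_dev, mock_sa, mock_bond, the fixed-size array in place of bond->ipsec_list, and the absence of locking) is an assumption made for illustration; the real patch walks a list under bond->ipsec_lock and calls the slave's xfrmdev_ops.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SA 8

/* Hypothetical, simplified model; none of these types are the kernel's. */
struct mock_dev {
	int supports_offload;   /* stands in for usable xfrmdev_ops */
	int installed;          /* number of SAs currently programmed */
};

struct mock_sa {
	struct mock_dev *real_dev;  /* NULL means "not offloaded anywhere" */
};

struct mock_bond {
	struct mock_dev *active;
	struct mock_sa *sas[MAX_SA];  /* plays the role of bond->ipsec_list */
	size_t nr_sas;
};

/* Step 1: remove every tracked SA from the device it is installed on. */
static void del_sa_all(struct mock_bond *bond)
{
	for (size_t i = 0; i < bond->nr_sas; i++) {
		struct mock_sa *sa = bond->sas[i];

		if (!sa->real_dev)
			continue;
		assert(sa->real_dev->installed > 0);
		sa->real_dev->installed--;
		sa->real_dev = NULL;
	}
}

/* Steps 2 and 3: add every tracked SA to the new active device, or
 * leave real_dev NULL when the device cannot take it. */
static void add_sa_all(struct mock_bond *bond)
{
	struct mock_dev *dev = bond->active;

	if (!dev || !dev->supports_offload)
		return;
	for (size_t i = 0; i < bond->nr_sas; i++) {
		bond->sas[i]->real_dev = dev;
		dev->installed++;
	}
}

static void change_active(struct mock_bond *bond, struct mock_dev *new_active)
{
	del_sa_all(bond);
	bond->active = new_active;
	add_sa_all(bond);
}
```

Because the bond keeps its own record of every SA, a failover can replay them onto the new device instead of relying on a single cached pointer.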
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Taehee Yoo ap420073@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/bonding/bond_main.c | 139 +++++++++++++++++++++++++++-----
 include/net/bonding.h | 9 ++-
 2 files changed, 127 insertions(+), 21 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 3555798879f2..484784757073 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -385,6 +385,7 @@ static int bond_vlan_rx_kill_vid(struct net_device *bond_dev, static int bond_ipsec_add_sa(struct xfrm_state *xs) { struct net_device *bond_dev = xs->xso.dev; + struct bond_ipsec *ipsec; struct bonding *bond; struct slave *slave; int err; @@ -400,9 +401,6 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs) return -ENODEV; }
- xs->xso.real_dev = slave->dev; - bond->xs = xs; - if (!slave->dev->xfrmdev_ops || !slave->dev->xfrmdev_ops->xdo_dev_state_add || netif_is_bond_master(slave->dev)) { @@ -411,11 +409,63 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs) return -EINVAL; }
+ ipsec = kmalloc(sizeof(*ipsec), GFP_ATOMIC); + if (!ipsec) { + rcu_read_unlock(); + return -ENOMEM; + } + xs->xso.real_dev = slave->dev; + err = slave->dev->xfrmdev_ops->xdo_dev_state_add(xs); + if (!err) { + ipsec->xs = xs; + INIT_LIST_HEAD(&ipsec->list); + spin_lock_bh(&bond->ipsec_lock); + list_add(&ipsec->list, &bond->ipsec_list); + spin_unlock_bh(&bond->ipsec_lock); + } else { + kfree(ipsec); + } rcu_read_unlock(); return err; }
+static void bond_ipsec_add_sa_all(struct bonding *bond) +{ + struct net_device *bond_dev = bond->dev; + struct bond_ipsec *ipsec; + struct slave *slave; + + rcu_read_lock(); + slave = rcu_dereference(bond->curr_active_slave); + if (!slave) + goto out; + + if (!slave->dev->xfrmdev_ops || + !slave->dev->xfrmdev_ops->xdo_dev_state_add || + netif_is_bond_master(slave->dev)) { + spin_lock_bh(&bond->ipsec_lock); + if (!list_empty(&bond->ipsec_list)) + slave_warn(bond_dev, slave->dev, + "%s: no slave xdo_dev_state_add\n", + __func__); + spin_unlock_bh(&bond->ipsec_lock); + goto out; + } + + spin_lock_bh(&bond->ipsec_lock); + list_for_each_entry(ipsec, &bond->ipsec_list, list) { + ipsec->xs->xso.real_dev = slave->dev; + if (slave->dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs)) { + slave_warn(bond_dev, slave->dev, "%s: failed to add SA\n", __func__); + ipsec->xs->xso.real_dev = NULL; + } + } + spin_unlock_bh(&bond->ipsec_lock); +out: + rcu_read_unlock(); +} + /** * bond_ipsec_del_sa - clear out this specific SA * @xs: pointer to transformer state struct @@ -423,6 +473,7 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs) static void bond_ipsec_del_sa(struct xfrm_state *xs) { struct net_device *bond_dev = xs->xso.dev; + struct bond_ipsec *ipsec; struct bonding *bond; struct slave *slave;
@@ -436,7 +487,10 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs) if (!slave) goto out;
- xs->xso.real_dev = slave->dev; + if (!xs->xso.real_dev) + goto out; + + WARN_ON(xs->xso.real_dev != slave->dev);
if (!slave->dev->xfrmdev_ops || !slave->dev->xfrmdev_ops->xdo_dev_state_delete || @@ -447,6 +501,48 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
slave->dev->xfrmdev_ops->xdo_dev_state_delete(xs); out: + spin_lock_bh(&bond->ipsec_lock); + list_for_each_entry(ipsec, &bond->ipsec_list, list) { + if (ipsec->xs == xs) { + list_del(&ipsec->list); + kfree(ipsec); + break; + } + } + spin_unlock_bh(&bond->ipsec_lock); + rcu_read_unlock(); +} + +static void bond_ipsec_del_sa_all(struct bonding *bond) +{ + struct net_device *bond_dev = bond->dev; + struct bond_ipsec *ipsec; + struct slave *slave; + + rcu_read_lock(); + slave = rcu_dereference(bond->curr_active_slave); + if (!slave) { + rcu_read_unlock(); + return; + } + + spin_lock_bh(&bond->ipsec_lock); + list_for_each_entry(ipsec, &bond->ipsec_list, list) { + if (!ipsec->xs->xso.real_dev) + continue; + + if (!slave->dev->xfrmdev_ops || + !slave->dev->xfrmdev_ops->xdo_dev_state_delete || + netif_is_bond_master(slave->dev)) { + slave_warn(bond_dev, slave->dev, + "%s: no slave xdo_dev_state_delete\n", + __func__); + } else { + slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs); + } + ipsec->xs->xso.real_dev = NULL; + } + spin_unlock_bh(&bond->ipsec_lock); rcu_read_unlock(); }
@@ -458,22 +554,27 @@ out: static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs) { struct net_device *bond_dev = xs->xso.dev; - struct bonding *bond = netdev_priv(bond_dev); - struct slave *curr_active = rcu_dereference(bond->curr_active_slave); - struct net_device *slave_dev = curr_active->dev; + struct net_device *real_dev; + struct slave *curr_active; + struct bonding *bond; + + bond = netdev_priv(bond_dev); + curr_active = rcu_dereference(bond->curr_active_slave); + real_dev = curr_active->dev;
if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) return true;
- if (!slave_dev->xfrmdev_ops || - !slave_dev->xfrmdev_ops->xdo_dev_offload_ok || - netif_is_bond_master(slave_dev)) { - slave_warn(bond_dev, slave_dev, "%s: no slave xdo_dev_offload_ok\n", __func__); + if (!xs->xso.real_dev) + return false; + + if (!real_dev->xfrmdev_ops || + !real_dev->xfrmdev_ops->xdo_dev_offload_ok || + netif_is_bond_master(real_dev)) { return false; }
- xs->xso.real_dev = slave_dev; - return slave_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs); + return real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs); }
static const struct xfrmdev_ops bond_xfrmdev_ops = { @@ -990,8 +1091,7 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active) return;
#ifdef CONFIG_XFRM_OFFLOAD - if (old_active && bond->xs) - bond_ipsec_del_sa(bond->xs); + bond_ipsec_del_sa_all(bond); #endif /* CONFIG_XFRM_OFFLOAD */
if (new_active) { @@ -1067,10 +1167,7 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active) }
#ifdef CONFIG_XFRM_OFFLOAD - if (new_active && bond->xs) { - xfrm_dev_state_flush(dev_net(bond->dev), bond->dev, true); - bond_ipsec_add_sa(bond->xs); - } + bond_ipsec_add_sa_all(bond); #endif /* CONFIG_XFRM_OFFLOAD */
/* resend IGMP joins since active slave has changed or @@ -3309,6 +3406,7 @@ static int bond_master_netdev_event(unsigned long event, return bond_event_changename(event_bond); case NETDEV_UNREGISTER: bond_remove_proc_entry(event_bond); + xfrm_dev_state_flush(dev_net(bond_dev), bond_dev, true); break; case NETDEV_REGISTER: bond_create_proc_entry(event_bond); @@ -4742,7 +4840,8 @@ void bond_setup(struct net_device *bond_dev) #ifdef CONFIG_XFRM_OFFLOAD /* set up xfrm device ops (only supported in active-backup right now) */ bond_dev->xfrmdev_ops = &bond_xfrmdev_ops; - bond->xs = NULL; + INIT_LIST_HEAD(&bond->ipsec_list); + spin_lock_init(&bond->ipsec_lock); #endif /* CONFIG_XFRM_OFFLOAD */
/* don't acquire bond device's netif_tx_lock when transmitting */ diff --git a/include/net/bonding.h b/include/net/bonding.h index adc3da776970..67d676059aa0 100644 --- a/include/net/bonding.h +++ b/include/net/bonding.h @@ -199,6 +199,11 @@ struct bond_up_slave { */ #define BOND_LINK_NOCHANGE -1
+struct bond_ipsec { + struct list_head list; + struct xfrm_state *xs; +}; + /* * Here are the locking policies for the two bonding locks: * Get rcu_read_lock when reading or RTNL when writing slave list. @@ -247,7 +252,9 @@ struct bonding { #endif /* CONFIG_DEBUG_FS */ struct rtnl_link_stats64 bond_stats; #ifdef CONFIG_XFRM_OFFLOAD - struct xfrm_state *xs; + struct list_head ipsec_list; + /* protecting ipsec_list */ + spinlock_t ipsec_lock; #endif /* CONFIG_XFRM_OFFLOAD */ };
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit 955b785ec6b3b2f9b91914d6eeac8ee66ee29239 ]
bond_ipsec_offload_ok() uses rcu_dereference() to read bond->curr_active_slave, but neither it nor its caller holds the RCU read lock, so a warning occurs. Add rcu_read_lock().
Splat looks like: WARNING: suspicious RCU usage 5.13.0-rc6+ #1179 Not tainted drivers/net/bonding/bond_main.c:571 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ping/974: #0: ffff888109e7db70 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x1303/0x2cb0
stack backtrace: CPU: 2 PID: 974 Comm: ping Not tainted 5.13.0-rc6+ #1179 Call Trace: dump_stack+0xa4/0xe5 bond_ipsec_offload_ok+0x1f4/0x260 [bonding] xfrm_output+0x179/0x890 xfrm4_output+0xfa/0x410 ? __xfrm4_output+0x4b0/0x4b0 ? __ip_make_skb+0xecc/0x2030 ? xfrm4_udp_encap_rcv+0x800/0x800 ? ip_local_out+0x21/0x3a0 ip_send_skb+0x37/0xa0 raw_sendmsg+0x1bfd/0x2cb0
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Taehee Yoo ap420073@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/bonding/bond_main.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 484784757073..9aa2d79aa942 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -557,24 +557,34 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs) struct net_device *real_dev; struct slave *curr_active; struct bonding *bond; + int err;
bond = netdev_priv(bond_dev); + rcu_read_lock(); curr_active = rcu_dereference(bond->curr_active_slave); real_dev = curr_active->dev;
- if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) - return true; + if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) { + err = true; + goto out; + }
- if (!xs->xso.real_dev) - return false; + if (!xs->xso.real_dev) { + err = false; + goto out; + }
if (!real_dev->xfrmdev_ops || !real_dev->xfrmdev_ops->xdo_dev_offload_ok || netif_is_bond_master(real_dev)) { - return false; + err = false; + goto out; }
- return real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs); + err = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs); +out: + rcu_read_unlock(); + return err; }
static const struct xfrmdev_ops bond_xfrmdev_ops = {
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit 168e696a36792a4a3b2525a06249e7472ef90186 ]
bond_ipsec_offload_ok() is called to check whether the interface supports ipsec offload. A bonding interface supports ipsec offload only in active-backup mode. So if a bond interface is not in active-backup mode, the function should return false, but it returns true.
Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Signed-off-by: Taehee Yoo ap420073@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/bonding/bond_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 9aa2d79aa942..1a795a858630 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -565,7 +565,7 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs) real_dev = curr_active->dev;
if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) { - err = true; + err = false; goto out; }
From: Nicolas Dichtel nicolas.dichtel@6wind.com
[ Upstream commit ccd27f05ae7b8ebc40af5b004e94517a919aa862 ]
The goal of commit df789fe75206 ("ipv6: Provide ipv6 version of "disable_policy" sysctl") was to have the disable_policy from ipv4 available on ipv6. However, it's not exactly the same mechanism. On IPv4, all packets coming from an interface that has disable_policy set bypass the policy check. For ipv6, this is done only for local packets, i.e. for packets destined to an address configured on the incoming interface.
Let's align ipv6 with ipv4 so that the 'disable_policy' sysctl has the same effect for both protocols.
My first approach was to create a new kind of route cache entry, to be able to set DST_NOPOLICY without modifying routes. This would have added a lot of code. Because the local delivery path is already handled, I chose to focus on the forwarding path to minimize code churn.
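The resulting decision on the forwarding path can be sketched as a small predicate. The struct and function names here (fwd_ctx, forward_should_drop) are hypothetical; in the kernel the two flags are read from net->ipv6.devconf_all and idev->cnf, and the last field stands in for the result of xfrm6_policy_check().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs for illustration only. */
struct fwd_ctx {
	bool all_disable_policy;   /* "all" devconf disable_policy */
	bool dev_disable_policy;   /* incoming interface disable_policy */
	bool policy_check_passes;  /* outcome of the xfrm policy check */
};

/* Returns true when a forwarded packet should be dropped. The policy
 * check only matters when neither sysctl disables it. */
static bool forward_should_drop(const struct fwd_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->all_disable_policy || ctx->dev_disable_policy)
		return false;
	return !ctx->policy_check_passes;
}
```

Either sysctl being set is enough to bypass the check, which matches the short-circuit order of the condition in the patch.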
Fixes: df789fe75206 ("ipv6: Provide ipv6 version of "disable_policy" sysctl")
Signed-off-by: Nicolas Dichtel nicolas.dichtel@6wind.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ipv6/ip6_output.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index e889655ca0e2..341d0c7acc8b 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -478,7 +478,9 @@ int ip6_forward(struct sk_buff *skb) if (skb_warn_if_lro(skb)) goto drop;
- if (!xfrm6_policy_check(NULL, XFRM_POLICY_FWD, skb)) { + if (!net->ipv6.devconf_all->disable_policy && + !idev->cnf.disable_policy && + !xfrm6_policy_check(NULL, XFRM_POLICY_FWD, skb)) { __IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS); goto drop; }
From: YueHaibing yuehaibing@huawei.com
[ Upstream commit eca81f09145d765c21dd8fb1ba5d874ca255c32c ]
The "plat->phy_interface" variable is an enum and in this context GCC will treat it as an unsigned int so the error handling is never triggered.
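The pattern used by the fix (keep the return value in a signed int, test it, and only then store it in the enum) can be sketched as follows. The enum, the lookup helper, and mock_probe() are hypothetical names for illustration; only the ordering of the check and the assignment reflects the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical enum with only non-negative enumerators, like a PHY mode. */
enum mock_phy_mode { MOCK_PHY_NA, MOCK_PHY_MII, MOCK_PHY_RGMII };

/* Hypothetical lookup returning a mode or a negative errno. */
static int mock_get_phy_mode(int have_property)
{
	return have_property ? MOCK_PHY_RGMII : -ENODEV;
}

/* Returns 0 and stores the mode, or returns the negative errno. */
static int mock_probe(int have_property, enum mock_phy_mode *out)
{
	int phy_mode;

	assert(out != NULL);
	phy_mode = mock_get_phy_mode(have_property);
	if (phy_mode < 0)       /* tested while still a signed int */
		return phy_mode;
	*out = (enum mock_phy_mode)phy_mode;
	return 0;
}
```

Testing the int before the assignment avoids depending on how the compiler chooses the enum's underlying type.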
Fixes: b9f0b2f634c0 ("net: stmmac: platform: fix probe for ACPI devices")
Signed-off-by: YueHaibing yuehaibing@huawei.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index ff95400594fc..53be8fc1d125 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -399,6 +399,7 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac) struct device_node *np = pdev->dev.of_node; struct plat_stmmacenet_data *plat; struct stmmac_dma_cfg *dma_cfg; + int phy_mode; int rc;
plat = devm_kzalloc(&pdev->dev, sizeof(*plat), GFP_KERNEL); @@ -413,10 +414,11 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac) *mac = NULL; }
- plat->phy_interface = device_get_phy_mode(&pdev->dev); - if (plat->phy_interface < 0) - return ERR_PTR(plat->phy_interface); + phy_mode = device_get_phy_mode(&pdev->dev); + if (phy_mode < 0) + return ERR_PTR(phy_mode);
+ plat->phy_interface = phy_mode; plat->interface = stmmac_of_get_mac_mode(np); if (plat->interface < 0) plat->interface = plat->phy_interface;
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 24b671aad4eae423e1abf5b7f08d9a5235458b8d ]
If the kernel is built without CONFIG_IPV6_SUBTREES, the RTA_SRC info is not exported to userspace in rt6_fill_node(), and the ip command does not print "from ::" in the route output. So remove this check.
Fixes: ec8105352869 ("selftests: Add redirect tests")
Signed-off-by: Hangbin Liu liuhangbin@gmail.com
Reviewed-by: David Ahern dsahern@kernel.org
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/testing/selftests/net/icmp_redirect.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/icmp_redirect.sh b/tools/testing/selftests/net/icmp_redirect.sh index bf361f30d6ef..bfcabee50155 100755 --- a/tools/testing/selftests/net/icmp_redirect.sh +++ b/tools/testing/selftests/net/icmp_redirect.sh @@ -311,7 +311,7 @@ check_exception()
if [ "$with_redirect" = "yes" ]; then ip -netns h1 -6 ro get ${H1_VRF_ARG} ${H2_N2_IP6} | \ - grep -q "${H2_N2_IP6} from :: via ${R2_LLADDR} dev br0.*${mtu}" + grep -q "${H2_N2_IP6} .*via ${R2_LLADDR} dev br0.*${mtu}" elif [ -n "${mtu}" ]; then ip -netns h1 -6 ro get ${H1_VRF_ARG} ${H2_N2_IP6} | \ grep -q "${mtu}"
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 0e02bf5de46ae30074a2e1a8194a422a84482a1a ]
After redirecting, it's already a new path, so the old PMTU info should be cleared. The IPv6 test "mtu exception plus redirect" should only have redirect info, without the old PMTU.

The IPv4 test cannot be changed, for legacy reasons.
Fixes: ec8105352869 ("selftests: Add redirect tests")
Signed-off-by: Hangbin Liu liuhangbin@gmail.com
Reviewed-by: David Ahern dsahern@kernel.org
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/testing/selftests/net/icmp_redirect.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/icmp_redirect.sh b/tools/testing/selftests/net/icmp_redirect.sh index bfcabee50155..104a7a5f13b1 100755 --- a/tools/testing/selftests/net/icmp_redirect.sh +++ b/tools/testing/selftests/net/icmp_redirect.sh @@ -309,9 +309,10 @@ check_exception() fi log_test $? 0 "IPv4: ${desc}"
- if [ "$with_redirect" = "yes" ]; then + # No PMTU info for test "redirect" and "mtu exception plus redirect" + if [ "$with_redirect" = "yes" ] && [ "$desc" != "redirect exception plus mtu" ]; then ip -netns h1 -6 ro get ${H1_VRF_ARG} ${H2_N2_IP6} | \ - grep -q "${H2_N2_IP6} .*via ${R2_LLADDR} dev br0.*${mtu}" + grep -v "mtu" | grep -q "${H2_N2_IP6} .*via ${R2_LLADDR} dev br0" elif [ -n "${mtu}" ]; then ip -netns h1 -6 ro get ${H1_VRF_ARG} ${H2_N2_IP6} | \ grep -q "${mtu}"
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
[ Upstream commit 65e2e6c1c20104ed19060a38f4edbf14e9f9a9a5 ]
As the last call to sprd_pwm_apply() might have exited early if state->enabled was false, the values for period and duty_cycle stored in pwm->state might not have been written to hardware and it must be ensured that they are configured before enabling the PWM.
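The failure mode can be reproduced with a toy model in which the cached state and the hardware registers are tracked separately. All names below (mock_state, mock_pwm, mock_apply, and the skip_if_unchanged switch that selects the old or new behaviour) are hypothetical and only approximate the driver's control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_state { unsigned period, duty; bool enabled; };

struct mock_pwm {
	struct mock_state cached;    /* what the PWM core remembers */
	unsigned hw_period, hw_duty; /* what the registers hold */
	bool hw_enabled;
};

/* skip_if_unchanged = true models the old code, which compared the
 * request against the cached state; false models the fixed code,
 * which always programs period and duty before enabling. */
static void mock_apply(struct mock_pwm *pwm, const struct mock_state *st,
		       bool skip_if_unchanged)
{
	assert(pwm != NULL && st != NULL);
	if (st->enabled) {
		bool changed = st->period != pwm->cached.period ||
			       st->duty != pwm->cached.duty;

		if (!skip_if_unchanged || changed) {
			pwm->hw_period = st->period;
			pwm->hw_duty = st->duty;
		}
		pwm->hw_enabled = true;
	} else {
		pwm->hw_enabled = false;  /* early exit: timing not written */
	}
	pwm->cached = *st;  /* the requested state is recorded regardless */
}
```

Applying a disabled state and then the same timing with enabled set leaves the old variant running with stale registers, while the fixed variant programs them.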
Fixes: 8aae4b02e8a6 ("pwm: sprd: Add Spreadtrum PWM support")
Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de
Signed-off-by: Thierry Reding thierry.reding@gmail.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/pwm/pwm-sprd.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/pwm/pwm-sprd.c b/drivers/pwm/pwm-sprd.c index 5123d948efd6..9eeb59cb81b6 100644 --- a/drivers/pwm/pwm-sprd.c +++ b/drivers/pwm/pwm-sprd.c @@ -180,13 +180,10 @@ static int sprd_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, } }
- if (state->period != cstate->period || - state->duty_cycle != cstate->duty_cycle) { - ret = sprd_pwm_config(spc, pwm, state->duty_cycle, - state->period); - if (ret) - return ret; - } + ret = sprd_pwm_config(spc, pwm, state->duty_cycle, + state->period); + if (ret) + return ret;
sprd_pwm_write(spc, pwm->hwpwm, SPRD_PWM_ENABLE, 1); } else if (cstate->enabled) {
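For illustration only (not part of the patch): a minimal user-space sketch of why the configuration has to be written unconditionally before enabling. All names here (fake_pwm, fake_pwm_apply, fake_pwm_config) are hypothetical stand-ins, and the "hardware" is just a struct; the assumption is that an apply() with enabled == false records the requested state without programming the registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the registers of one PWM channel. */
struct fake_pwm_hw {
	uint64_t period;
	uint64_t duty_cycle;
	bool enabled;
};

/* Requested state, loosely modelled on struct pwm_state. */
struct fake_pwm_state {
	uint64_t period;
	uint64_t duty_cycle;
	bool enabled;
};

struct fake_pwm {
	struct fake_pwm_hw hw;       /* what the "hardware" currently holds */
	struct fake_pwm_state state; /* last state accepted by apply() */
};

/* Write period and duty cycle to the fake registers. */
static int fake_pwm_config(struct fake_pwm *pwm, uint64_t duty, uint64_t period)
{
	if (period == 0 || duty > period)
		return -1;
	pwm->hw.period = period;
	pwm->hw.duty_cycle = duty;
	return 0;
}

/*
 * Apply a new state. When enabling, always program period and duty
 * cycle: an earlier apply() with enabled == false may have updated the
 * cached state without touching the hardware, so comparing against the
 * cached state would wrongly skip the register write.
 */
static int fake_pwm_apply(struct fake_pwm *pwm, const struct fake_pwm_state *state)
{
	assert(pwm && state);

	if (state->enabled) {
		int ret = fake_pwm_config(pwm, state->duty_cycle, state->period);

		if (ret)
			return ret;
		pwm->hw.enabled = true;
	} else if (pwm->state.enabled) {
		pwm->hw.enabled = false;
	}

	pwm->state = *state;
	return 0;
}
```

Applying a disabled state with period 1000 leaves the fake registers untouched; a following enable with the same period still programs them, which is the behaviour the fix is after.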
From: Shahjada Abul Husain shahjada@chelsio.com
[ Upstream commit 015fe6fd29c4b9ac0f61b8c4455ef88e6018b9cc ]
IRQs are requested during driver's ndo_open() and then later freed up in disable_interrupts() during driver unload. A race exists where driver can set the CXGB4_FULL_INIT_DONE flag in ndo_open() after the disable_interrupts() in driver unload path checks it, and hence misses calling free_irq().
Fix by unregistering netdevice first and sync with driver's ndo_open(). This ensures disable_interrupts() checks the flag correctly and frees up the IRQs properly.
Fixes: b37987e8db5f ("cxgb4: Disable interrupts and napi before unregistering netdev")
Signed-off-by: Shahjada Abul Husain shahjada@chelsio.com
Signed-off-by: Raju Rangoju rajur@chelsio.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 .../net/ethernet/chelsio/cxgb4/cxgb4_main.c | 18 ++++++++++--------
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c | 3 +++
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 8be525c5e2e4..6698afad4379 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2643,6 +2643,9 @@ static void detach_ulds(struct adapter *adap)
 {
 	unsigned int i;

+	if (!is_uld(adap))
+		return;
+
 	mutex_lock(&uld_mutex);
 	list_del(&adap->list_node);

@@ -7145,10 +7148,13 @@ static void remove_one(struct pci_dev *pdev)
 		 */
 		destroy_workqueue(adapter->workq);

-		if (is_uld(adapter)) {
-			detach_ulds(adapter);
-			t4_uld_clean_up(adapter);
-		}
+		detach_ulds(adapter);
+
+		for_each_port(adapter, i)
+			if (adapter->port[i]->reg_state == NETREG_REGISTERED)
+				unregister_netdev(adapter->port[i]);
+
+		t4_uld_clean_up(adapter);

 		adap_free_hma_mem(adapter);

@@ -7156,10 +7162,6 @@ static void remove_one(struct pci_dev *pdev)

 		cxgb4_free_mps_ref_entries(adapter);

-		for_each_port(adapter, i)
-			if (adapter->port[i]->reg_state == NETREG_REGISTERED)
-				unregister_netdev(adapter->port[i]);
-
 		debugfs_remove_recursive(adapter->debugfs_root);

 		if (!is_t4(adapter->params.chip))
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
index 743af9e654aa..17faac715882 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
@@ -581,6 +581,9 @@ void t4_uld_clean_up(struct adapter *adap)
 {
 	unsigned int i;

+	if (!is_uld(adap))
+		return;
+
 	mutex_lock(&uld_mutex);
 	for (i = 0; i < CXGB4_ULD_MAX; i++) {
 		if (!adap->uld[i].handle)
From: Antoine Tenart atenart@kernel.org
[ Upstream commit 28b34f01a73435a754956ebae826e728c03ffa38 ]
Some socket buffers allocated in the fclone cache (in __alloc_skb) can end up in the following path[1]:
  napi_skb_finish
    __kfree_skb_defer
      napi_skb_cache_put
The issue is that napi_skb_cache_put is not fclone friendly and will put those skbuffs in the skb cache to be reused later, although this cache only expects skbuffs allocated from skbuff_head_cache. When this happens the skbuff is eventually freed using the wrong origin cache, and we can see traces similar to:
[ 1223.947534] cache_from_obj: Wrong slab cache. skbuff_head_cache but object is from skbuff_fclone_cache
[ 1223.948895] WARNING: CPU: 3 PID: 0 at mm/slab.h:442 kmem_cache_free+0x251/0x3e0
[ 1223.950211] Modules linked in:
[ 1223.950680] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.13.0+ #474
[ 1223.951587] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-3.fc34 04/01/2014
[ 1223.953060] RIP: 0010:kmem_cache_free+0x251/0x3e0
Leading sometimes to other memory related issues.
Fix this by using __kfree_skb for fclone skbuffs, similar to what is done at the other place where __kfree_skb_defer is called.
[1] At least in setups using veth pairs and tunnels. Building a kernel with KASAN we can for example see packets allocated in sk_stream_alloc_skb hit the above path and later the issue arises when the skbuff is reused.
Fixes: 9243adfc311a ("skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing")
Cc: Alexander Lobakin alobakin@pm.me
Signed-off-by: Antoine Tenart atenart@kernel.org
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/core/dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 6b08de52bf0e..86a0fe0f4c02 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6100,6 +6100,8 @@ static gro_result_t napi_skb_finish(struct napi_struct *napi,
 	case GRO_MERGED_FREE:
 		if (NAPI_GRO_CB(skb)->free == NAPI_GRO_FREE_STOLEN_HEAD)
 			napi_skb_free_stolen_head(skb);
+		else if (skb->fclone != SKB_FCLONE_UNAVAILABLE)
+			__kfree_skb(skb);
 		else
 			__kfree_skb(skb);
 		break;
Hi!
[ Upstream commit 28b34f01a73435a754956ebae826e728c03ffa38 ]
Mainline is significantly different here. Patch makes no sense in 5.10, as both branches of if are same.
Best regards, Pavel
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6100,6 +6100,8 @@ static gro_result_t napi_skb_finish(struct napi_struct *napi,
 	case GRO_MERGED_FREE:
 		if (NAPI_GRO_CB(skb)->free == NAPI_GRO_FREE_STOLEN_HEAD)
 			napi_skb_free_stolen_head(skb);
+		else if (skb->fclone != SKB_FCLONE_UNAVAILABLE)
+			__kfree_skb(skb);
 		else
 			__kfree_skb(skb);
 		break;
On Tue, Jul 27, 2021 at 10:22:38AM +0200, Pavel Machek wrote:
Hi!
[ Upstream commit 28b34f01a73435a754956ebae826e728c03ffa38 ]
Mainline is significantly different here. Patch makes no sense in 5.10, as both branches of if are same.
Best regards, Pavel
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6100,6 +6100,8 @@ static gro_result_t napi_skb_finish(struct napi_struct *napi,
 	case GRO_MERGED_FREE:
 		if (NAPI_GRO_CB(skb)->free == NAPI_GRO_FREE_STOLEN_HEAD)
 			napi_skb_free_stolen_head(skb);
+		else if (skb->fclone != SKB_FCLONE_UNAVAILABLE)
+			__kfree_skb(skb);
 		else
 			__kfree_skb(skb);
 		break;
You are right, I'll go drop this patch from the queue now, thanks.
greg k-h
From: Jianguo Wu wujianguo@chinatelecom.cn
[ Upstream commit 0c71929b5893e410e0efbe1bbeca6f19a5f19956 ]
I ran a stress test with wrk[1] and webfsd[2] with the assistance of mptcp-tools[3]:
Server side: ./use_mptcp.sh webfsd -4 -R /tmp/ -p 8099
Client side: ./use_mptcp.sh wrk -c 200 -d 30 -t 4 http://192.168.174.129:8099/
and got the following warning message:
[ 55.552626] TCP: request_sock_subflow: Possible SYN flooding on port 8099. Sending cookies. Check SNMP counters.
[ 55.553024] ------------[ cut here ]------------
[ 55.553027] WARNING: CPU: 0 PID: 10 at net/core/flow_dissector.c:984 __skb_flow_dissect+0x280/0x1650
...
[ 55.553117] CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 5.12.0+ #18
[ 55.553121] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[ 55.553124] RIP: 0010:__skb_flow_dissect+0x280/0x1650
...
[ 55.553133] RSP: 0018:ffffb79580087770 EFLAGS: 00010246
[ 55.553137] RAX: 0000000000000000 RBX: ffffffff8ddb58e0 RCX: ffffb79580087888
[ 55.553139] RDX: ffffffff8ddb58e0 RSI: ffff8f7e4652b600 RDI: 0000000000000000
[ 55.553141] RBP: ffffb79580087858 R08: 0000000000000000 R09: 0000000000000008
[ 55.553143] R10: 000000008c622965 R11: 00000000d3313a5b R12: ffff8f7e4652b600
[ 55.553146] R13: ffff8f7e465c9062 R14: 0000000000000000 R15: ffffb79580087888
[ 55.553149] FS: 0000000000000000(0000) GS:ffff8f7f75e00000(0000) knlGS:0000000000000000
[ 55.553152] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.553154] CR2: 00007f73d1d19000 CR3: 0000000135e10004 CR4: 00000000003706f0
[ 55.553160] Call Trace:
[ 55.553166] ? __sha256_final+0x67/0xd0
[ 55.553173] ? sha256+0x7e/0xa0
[ 55.553177] __skb_get_hash+0x57/0x210
[ 55.553182] subflow_init_req_cookie_join_save+0xac/0xc0
[ 55.553189] subflow_check_req+0x474/0x550
[ 55.553195] ? ip_route_output_key_hash+0x67/0x90
[ 55.553200] ? xfrm_lookup_route+0x1d/0xa0
[ 55.553207] subflow_v4_route_req+0x8e/0xd0
[ 55.553212] tcp_conn_request+0x31e/0xab0
[ 55.553218] ? selinux_socket_sock_rcv_skb+0x116/0x210
[ 55.553224] ? tcp_rcv_state_process+0x179/0x6d0
[ 55.553229] tcp_rcv_state_process+0x179/0x6d0
[ 55.553235] tcp_v4_do_rcv+0xaf/0x220
[ 55.553239] tcp_v4_rcv+0xce4/0xd80
[ 55.553243] ? ip_route_input_rcu+0x246/0x260
[ 55.553248] ip_protocol_deliver_rcu+0x35/0x1b0
[ 55.553253] ip_local_deliver_finish+0x44/0x50
[ 55.553258] ip_local_deliver+0x6c/0x110
[ 55.553262] ? ip_rcv_finish_core.isra.19+0x5a/0x400
[ 55.553267] ip_rcv+0xd1/0xe0
...
After debugging, I found that in __skb_flow_dissect() both skb->dev and skb->sk are NULL, so net is NULL and WARN_ON_ONCE(!net) triggers. In fact net is always NULL in this code path, because skb->dev is set to NULL in tcp_v4_rcv() and skb->sk is never set.
Code snippet in __skb_flow_dissect() that triggers the warning:
975	if (skb) {
976		if (!net) {
977			if (skb->dev)
978				net = dev_net(skb->dev);
979			else if (skb->sk)
980				net = sock_net(skb->sk);
981		}
982	}
983
984	WARN_ON_ONCE(!net);
So, use a hash derived from the sequence number and the transport header instead.
[1] https://github.com/wg/wrk
[2] https://github.com/ourway/webfsd
[3] https://github.com/pabeni/mptcp-tools
Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use")
Suggested-by: Paolo Abeni pabeni@redhat.com
Suggested-by: Florian Westphal fw@strlen.de
Signed-off-by: Jianguo Wu wujianguo@chinatelecom.cn
Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/mptcp/syncookies.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/syncookies.c b/net/mptcp/syncookies.c
index abe0fd099746..37127781aee9 100644
--- a/net/mptcp/syncookies.c
+++ b/net/mptcp/syncookies.c
@@ -37,7 +37,21 @@ static spinlock_t join_entry_locks[COOKIE_JOIN_SLOTS] __cacheline_aligned_in_smp

 static u32 mptcp_join_entry_hash(struct sk_buff *skb, struct net *net)
 {
-	u32 i = skb_get_hash(skb) ^ net_hash_mix(net);
+	static u32 mptcp_join_hash_secret __read_mostly;
+	struct tcphdr *th = tcp_hdr(skb);
+	u32 seq, i;
+
+	net_get_random_once(&mptcp_join_hash_secret,
+			    sizeof(mptcp_join_hash_secret));
+
+	if (th->syn)
+		seq = TCP_SKB_CB(skb)->seq;
+	else
+		seq = TCP_SKB_CB(skb)->seq - 1;
+
+	i = jhash_3words(seq, net_hash_mix(net),
+			 (__force __u32)th->source << 16 | (__force __u32)th->dest,
+			 mptcp_join_hash_secret);

 	return i % ARRAY_SIZE(join_entries);
 }
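For illustration only (not part of the patch): a user-space sketch of the slot selection above, as I read the diff. The point it tries to show is that normalising a non-SYN segment's sequence number to seq - 1 lets the SYN and the ACK that follows it land in the same slot. The struct fake_pkt, JOIN_SLOTS, and the mix32() mixer are assumptions made for this sketch; mix32() is a generic 32-bit mixer and not the kernel's jhash_3words().

```c
#include <assert.h>
#include <stdint.h>

#define JOIN_SLOTS 32u /* hypothetical table size for this sketch */

/* Minimal stand-in for the packet fields the hash depends on. */
struct fake_pkt {
	uint32_t seq;    /* sequence number of this segment */
	uint16_t source; /* source port */
	uint16_t dest;   /* destination port */
	int syn;         /* non-zero for the initial SYN */
};

/* Generic 32-bit mixer, used here instead of the kernel's jhash. */
static uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* Pick a slot from seq, a per-namespace mix value, the ports and a secret. */
static uint32_t join_entry_hash(const struct fake_pkt *p, uint32_t net_mix,
				uint32_t secret)
{
	uint32_t seq, ports, h;

	assert(p);

	/* A non-SYN segment carries seq + 1; map it back to the SYN's seq. */
	seq = p->syn ? p->seq : p->seq - 1;
	ports = (uint32_t)p->source << 16 | p->dest;

	h = mix32(secret ^ seq);
	h = mix32(h ^ net_mix);
	h = mix32(h ^ ports);

	return h % JOIN_SLOTS;
}
```

None of the inputs depend on skb->dev or skb->sk, which is what avoids the WARN_ON_ONCE(!net) path described in the commit message.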
From: Casey Chen cachen@purestorage.com
[ Upstream commit 251ef6f71be2adfd09546a26643426fe62585173 ]
nvme_dev_remove_admin could free dev->admin_q and the admin_tagset while they are being accessed by nvme_dev_disable(), which can be called by nvme_reset_work via nvme_remove_dead_ctrl.
Commit cb4bfda62afa ("nvme-pci: fix hot removal during error handling") intended to avoid requests being stuck on a removed controller by killing the admin queue. But the later fix c8e9e9b7646e ("nvme-pci: unquiesce admin queue on shutdown"), together with nvme_dev_disable(dev, true) right before nvme_dev_remove_admin() could help dispatch requests and fail them early, so we don't need nvme_dev_remove_admin() any more.
Fixes: cb4bfda62afa ("nvme-pci: fix hot removal during error handling")
Signed-off-by: Casey Chen cachen@purestorage.com
Reviewed-by: Keith Busch kbusch@kernel.org
Signed-off-by: Christoph Hellwig hch@lst.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/nvme/host/pci.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3f05df98697d..80e1d45b0668 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3003,7 +3003,6 @@ static void nvme_remove(struct pci_dev *pdev)
 	if (!pci_device_is_present(pdev)) {
 		nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DEAD);
 		nvme_dev_disable(dev, true);
-		nvme_dev_remove_admin(dev);
 	}

 	flush_work(&dev->ctrl.reset_work);
From: Like Xu like.xu.linux@gmail.com
[ Upstream commit 7234c362ccb3c2228f06f19f93b132de9cfa7ae4 ]
The AMD platform does not support CPUID leaf 0xA (function 0Ah). The results returned for this entry should all remain zero, just as on native hardware:
AMD host:
   0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
(uncanny) AMD guest:
   0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00008000
Fixes: cadbaa039b99 ("perf/x86/intel: Make anythread filter support conditional")
Signed-off-by: Like Xu likexu@tencent.com
Message-Id: 20210628074354.33848-1-likexu@tencent.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/x86/kvm/cpuid.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7a3fbf3b796e..41b0dc37720e 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -684,7 +684,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)

 		edx.split.num_counters_fixed = min(cap.num_counters_fixed, MAX_FIXED_COUNTERS);
 		edx.split.bit_width_fixed = cap.bit_width_fixed;
-		edx.split.anythread_deprecated = 1;
+		if (cap.version)
+			edx.split.anythread_deprecated = 1;
 		edx.split.reserved1 = 0;
 		edx.split.reserved2 = 0;

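For illustration only (not part of the patch): a small sketch of the conditional above. The 0x00008000 seen in the guest dump corresponds to bit 15 of edx; the sketch sets that bit only when a PMU version is reported, so a host reporting version 0 gets an all-zero edx. The struct fake_pmu_cap and leaf_a_edx() are hypothetical names, and the other edx fields are left out on the assumption that they are zero when no PMU is present.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position matching the guest dump in the commit message (0x00008000). */
#define ANYTHREAD_DEPRECATED_BIT 15

/* Hypothetical subset of the PMU capabilities; version 0 means no PMU. */
struct fake_pmu_cap {
	unsigned int version;
};

/* Build edx for leaf 0xA, setting the flag only if a PMU version exists. */
static uint32_t leaf_a_edx(const struct fake_pmu_cap *cap)
{
	uint32_t edx = 0;

	assert(cap);

	if (cap->version)
		edx |= UINT32_C(1) << ANYTHREAD_DEPRECATED_BIT;

	return edx;
}
```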
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit 0967ebffe098157180a0bbd180ac90348c6e07d7 ]
ASan reports a memory leak of nsinfo during the execution of:
# perf test "31: Lookup mmap thread"
The leak is caused by a refcounted variable being replaced without dropping the refcount.
This patch makes sure that the refcnt of nsinfo is decreased when a refcounted variable is replaced with a new value.
Signed-off-by: Riccardo Mancini rickyman7@gmail.com
Fixes: 27c9c3424fc217da ("perf inject: Add --buildid-all option")
Cc: Ian Rogers irogers@google.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Link: http://lore.kernel.org/lkml/55223bc8821b34ccb01f92ef1401c02b6a32e61f.1626343...
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/perf/builtin-inject.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 5320ac1b1285..ec7e46b63551 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -358,9 +358,10 @@ static struct dso *findnew_dso(int pid, int tid, const char *filename,
 		dso = machine__findnew_dso_id(machine, filename, id);
 	}

-	if (dso)
+	if (dso) {
+		nsinfo__put(dso->nsinfo);
 		dso->nsinfo = nsi;
-	else
+	} else
 		nsinfo__put(nsi);

 	thread__put(thread);
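For illustration only (not part of the patch): a generic sketch of the "put the old value before replacing it" rule that this and the following nsinfo patches apply. The types and helpers (struct ns, ns_new, ns_put, holder_set_ns) are hypothetical and only mimic the shape of the perf code; live_objects exists solely so the effect can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct nsinfo. */
struct ns {
	int refcnt;
};

static int live_objects; /* allocations not yet freed, for this sketch only */

static struct ns *ns_new(void)
{
	struct ns *n = calloc(1, sizeof(*n));

	if (n) {
		n->refcnt = 1;
		live_objects++;
	}
	return n;
}

/* Drop one reference and free on the last one; NULL is a no-op. */
static void ns_put(struct ns *n)
{
	if (!n)
		return;
	assert(n->refcnt > 0);
	if (--n->refcnt == 0) {
		free(n);
		live_objects--;
	}
}

/* Hypothetical owner standing in for struct dso. */
struct holder {
	struct ns *ns; /* owns one reference */
};

/*
 * Replace the held pointer, taking over the caller's reference to @n.
 * The old value is released first; without that ns_put() the previous
 * object would never be freed.
 */
static void holder_set_ns(struct holder *h, struct ns *n)
{
	ns_put(h->ns);
	h->ns = n;
}
```

Replacing the pointer twice in a row leaves exactly one live object, and clearing it with NULL leaves none.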
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit 2d6b74baa7147251c30a46c4996e8cc224aa2dc5 ]
ASan reports a memory leak of nsinfo during the execution of
# perf test "31: Lookup mmap thread"
The leak is caused by a refcounted variable being replaced without dropping the refcount.
This patch makes sure that the refcnt of nsinfo is decreased whenever a refcounted variable is replaced with a new value.
Signed-off-by: Riccardo Mancini rickyman7@gmail.com
Fixes: bf2e710b3cb8445c ("perf maps: Lookup maps in both intitial mountns and inner mountns.")
Cc: Ian Rogers irogers@google.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Krister Johansen kjlx@templeofstupid.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Link: http://lore.kernel.org/lkml/55223bc8821b34ccb01f92ef1401c02b6a32e61f.1626343...
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/perf/util/map.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index f4d44f75ba15..6688f6b253a7 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -192,6 +192,8 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
 			if (!(prot & PROT_EXEC))
 				dso__set_loaded(dso);
 		}
+
+		nsinfo__put(dso->nsinfo);
 		dso->nsinfo = nsi;
 		dso__put(dso);
 	}
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit dedeb4be203b382ba7245d13079bc3b0f6d40c65 ]
ASan reports a memory leak of nsinfo during the execution of:
# perf test "31: Lookup mmap thread".
The leak is caused by a refcounted variable being replaced without dropping the refcount.
This patch makes sure that the refcnt of nsinfo is decreased whenever a refcounted variable is replaced with a new value.
Signed-off-by: Riccardo Mancini rickyman7@gmail.com
Fixes: 544abd44c7064c8a ("perf probe: Allow placing uprobes in alternate namespaces.")
Cc: Ian Rogers irogers@google.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Krister Johansen kjlx@templeofstupid.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Link: http://lore.kernel.org/lkml/55223bc8821b34ccb01f92ef1401c02b6a32e61f.1626343...
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/perf/util/probe-event.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 8eae2afff71a..07db6cfad65b 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -180,8 +180,10 @@ struct map *get_target_map(const char *target, struct nsinfo *nsi, bool user)
 		struct map *map;

 		map = dso__new_map(target);
-		if (map && map->dso)
+		if (map && map->dso) {
+			nsinfo__put(map->dso->nsinfo);
 			map->dso->nsinfo = nsinfo__get(nsi);
+		}
 		return map;
 	} else {
 		return kernel_get_module_map(target);
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit 42db3d9ded555f7148b5695109a7dc8d66f0dde4 ]
ASan reports a memory leak in perf_env while running:
# perf test "41: Session topology"
Caused by sibling_dies not being freed.
This patch adds the required free.
Fixes: acae8b36cded0ee6 ("perf header: Add die information in CPU topology")
Signed-off-by: Riccardo Mancini rickyman7@gmail.com
Cc: Ian Rogers irogers@google.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Link: http://lore.kernel.org/lkml/2140d0b57656e4eb9021ca9772250c24c032924b.1626343...
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/perf/util/env.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index fadc59708ece..744e51c4a6bd 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -178,6 +178,7 @@ void perf_env__exit(struct perf_env *env)
 	zfree(&env->cpuid);
 	zfree(&env->cmdline);
 	zfree(&env->cmdline_argv);
+	zfree(&env->sibling_dies);
 	zfree(&env->sibling_cores);
 	zfree(&env->sibling_threads);
 	zfree(&env->pmu_mappings);
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit 233f2dc1c284337286f9a64c0152236779a42f6c ]
ASan reports a memory leak related to session->evlist while running:
# perf test "41: Session topology".
When perf_data is in write mode, session->evlist is owned by the caller, which should also take care of deleting it.
This patch adds the missing evlist__delete().
Signed-off-by: Riccardo Mancini rickyman7@gmail.com
Fixes: c84974ed9fb67293 ("perf test: Add entry to test cpu topology")
Cc: Ian Rogers irogers@google.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Kan Liang kan.liang@intel.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Link: http://lore.kernel.org/lkml/822f741f06eb25250fb60686cf30a35f447e9e91.1626343...
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/perf/tests/topology.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c
index 22daf2bdf5fa..f4a2c0df0954 100644
--- a/tools/perf/tests/topology.c
+++ b/tools/perf/tests/topology.c
@@ -52,6 +52,7 @@ static int session_write_header(char *path)
 	TEST_ASSERT_VAL("failed to write header",
 			!perf_session__write_header(session, session->evlist, data.file.fd, true));

+	evlist__delete(session->evlist);
 	perf_session__delete(session);

 	return 0;
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit fc56f54f6fcd5337634f4545af6459613129b432 ]
ASan reports a memory leak when running:
# perf test "49: Synthesize attr update"
Caused by evlist not being deleted.
This patch adds the missing evlist__delete and removes the perf_cpu_map__put since it's already being deleted by evlist__delete.
Signed-off-by: Riccardo Mancini rickyman7@gmail.com
Fixes: a6e5281780d1da65 ("perf tools: Add event_update event unit type")
Cc: Ian Rogers irogers@google.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Link: http://lore.kernel.org/lkml/f7994ad63d248f7645f901132d208fadf9f2b7e4.1626343...
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/perf/tests/event_update.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/tests/event_update.c b/tools/perf/tests/event_update.c
index bdcf032f8516..1c9a6138fba1 100644
--- a/tools/perf/tests/event_update.c
+++ b/tools/perf/tests/event_update.c
@@ -119,6 +119,6 @@ int test__event_update(struct test *test __maybe_unused, int subtest __maybe_unu
 	TEST_ASSERT_VAL("failed to synthesize attr update cpus",
 		!perf_event__synthesize_event_update_cpus(&tmp.tool, evsel, process_event_cpus));

-	perf_cpu_map__put(evsel->core.own_cpus);
+	evlist__delete(evlist);
 	return 0;
 }
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit 581e295a0f6b5c2931d280259fbbfff56959faa9 ]
ASan reports a memory leak when running:
# perf test "65: maps__merge_in".
There are two causes for the leaks; this patch addresses only the first one, which is related to dso__new_map().
The bug is that dso__new_map() creates a new dso but never decreases the refcount it gets from creating it.
This patch adds the missing dso__put().
Signed-off-by: Riccardo Mancini rickyman7@gmail.com
Fixes: d3a7c489c7fd2463 ("perf tools: Reference count struct dso")
Cc: Ian Rogers irogers@google.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Link: http://lore.kernel.org/lkml/60bfe0cd06e89e2ca33646eb8468d7f5de2ee597.1626343...
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/perf/util/dso.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 55c11e854fe4..b1ff0c9f32da 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1141,8 +1141,10 @@ struct map *dso__new_map(const char *name)
 	struct map *map = NULL;
 	struct dso *dso = dso__new(name);

-	if (dso)
+	if (dso) {
 		map = map__new2(0, dso);
+		dso__put(dso);
+	}

 	return map;
 }
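For illustration only (not part of the patch): a generic sketch of the ownership hand-over this fix relies on. It assumes, as the commit message implies, that the map takes its own reference to the dso, so the creator must drop the reference it got from allocation. All names (struct obj, struct wrapper, wrapper_new_with_obj, ...) are hypothetical stand-ins, and obj_live exists only to make the result observable.

```c
#include <assert.h>
#include <stdlib.h>

struct obj { int refcnt; };          /* stand-in for struct dso */
struct wrapper { struct obj *obj; }; /* stand-in for struct map */

static int obj_live; /* objects not yet freed, for this sketch only */

static struct obj *obj_new(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o) {
		o->refcnt = 1; /* the creator's reference */
		obj_live++;
	}
	return o;
}

static struct obj *obj_get(struct obj *o)
{
	if (o)
		o->refcnt++;
	return o;
}

static void obj_put(struct obj *o)
{
	if (!o)
		return;
	assert(o->refcnt > 0);
	if (--o->refcnt == 0) {
		free(o);
		obj_live--;
	}
}

/* The wrapper takes its own reference to the object. */
static struct wrapper *wrapper_new(struct obj *o)
{
	struct wrapper *w = calloc(1, sizeof(*w));

	if (w)
		w->obj = obj_get(o);
	return w;
}

static void wrapper_delete(struct wrapper *w)
{
	if (!w)
		return;
	obj_put(w->obj);
	free(w);
}

/*
 * Create an object and wrap it. The creator's reference is dropped
 * afterwards, so the wrapper ends up holding the only one; if the
 * wrapper allocation failed, the same put frees the object.
 */
static struct wrapper *wrapper_new_with_obj(void)
{
	struct wrapper *w = NULL;
	struct obj *o = obj_new();

	if (o) {
		w = wrapper_new(o);
		obj_put(o);
	}
	return w;
}
```

With the put in place, deleting the wrapper is enough to free the object; without it the count would stay at one forever.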
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit 244d1797c8c8e850b8de7992af713aa5c70d5650 ]
ASan reports a memory leak when running:
# perf test "65: maps__merge_in"
This is the second and final patch addressing these memory leaks.
This time, the problem is simply that the maps object is never destructed.
This patch adds the missing maps__exit call.
Signed-off-by: Riccardo Mancini rickyman7@gmail.com
Fixes: 79b6bb73f888933c ("perf maps: Merge 'struct maps' with 'struct map_groups'")
Cc: Ian Rogers irogers@google.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Link: http://lore.kernel.org/lkml/a1a29b97a58738987d150e94d4ebfad0282fb038.1626343...
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/perf/tests/maps.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/tests/maps.c b/tools/perf/tests/maps.c
index edcbc70ff9d6..1ac72919fa35 100644
--- a/tools/perf/tests/maps.c
+++ b/tools/perf/tests/maps.c
@@ -116,5 +116,7 @@ int test__maps__merge_in(struct test *t __maybe_unused, int subtest __maybe_unus

 	ret = check_maps(merged3, ARRAY_SIZE(merged3), &maps);
 	TEST_ASSERT_VAL("merge check failed", !ret);
+
+	maps__exit(&maps);
 	return TEST_OK;
 }
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit da6b7c6c0626901428245f65712385805e42eba6 ]
ASan reports memory leaks while running:
# perf test "83: Zstd perf.data compression/decompression"
The first of the leaks is caused by env->cpu_pmu_caps not being freed.
This patch adds the missing (z)free inside perf_env__exit.
Signed-off-by: Riccardo Mancini rickyman7@gmail.com
Fixes: 6f91ea283a1ed23e ("perf header: Support CPU PMU capabilities")
Cc: Ian Rogers irogers@google.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Kan Liang kan.liang@linux.intel.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Link: http://lore.kernel.org/lkml/6ba036a8220156ec1f3d6be3e5d25920f6145028.1626343...
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/perf/util/env.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 744e51c4a6bd..03bc843b1cf8 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -183,6 +183,7 @@ void perf_env__exit(struct perf_env *env)
 	zfree(&env->sibling_threads);
 	zfree(&env->pmu_mappings);
 	zfree(&env->cpu);
+	zfree(&env->cpu_pmu_caps);
 	zfree(&env->numa_map);

 	for (i = 0; i < env->nr_numa_nodes; i++)
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit a37338aad8c4d8676173ead14e881d2ec308155c ]
ASan reports the memory leak of the strings allocated by sort_help() when running perf report.
This patch changes the returned pointer to char* (instead of const char*), saves it in a temporary variable, and finally deallocates it at function exit.
Signed-off-by: Riccardo Mancini rickyman7@gmail.com
Fixes: 702fb9b415e7c99b ("perf report: Show all sort keys in help output")
Cc: Andi Kleen ak@linux.intel.com
Cc: Ian Rogers irogers@google.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Link: http://lore.kernel.org/lkml/a38b13f02812a8a6759200b9063c6191337f44d4.1626343...
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/perf/builtin-report.c | 33 ++++++++++++++++++++++-----------
 tools/perf/util/sort.c | 2 +-
 tools/perf/util/sort.h | 2 +-
 3 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 3c74c9c0f3c3..5824aa24acfc 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1143,6 +1143,8 @@ int cmd_report(int argc, const char **argv)
 		.socket_filter = -1,
 		.annotation_opts = annotation__default_options,
 	};
+	char *sort_order_help = sort_help("sort by key(s):");
+	char *field_order_help = sort_help("output field(s): overhead period sample ");
 	const struct option options[] = {
 	OPT_STRING('i', "input", &input_name, "file",
 		    "input file name"),
@@ -1177,9 +1179,9 @@ int cmd_report(int argc, const char **argv)
 	OPT_BOOLEAN(0, "header-only", &report.header_only,
 		    "Show only data header."),
 	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
-		   sort_help("sort by key(s):")),
+		   sort_order_help),
 	OPT_STRING('F', "fields", &field_order, "key[,keys...]",
-		   sort_help("output field(s): overhead period sample ")),
+		   field_order_help),
 	OPT_BOOLEAN(0, "show-cpu-utilization", &symbol_conf.show_cpu_utilization,
 		    "Show sample percentage for different cpu modes"),
 	OPT_BOOLEAN_FLAG(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
@@ -1308,11 +1310,11 @@ int cmd_report(int argc, const char **argv)
 	char sort_tmp[128];

 	if (ret < 0)
-		return ret;
+		goto exit;

 	ret = perf_config(report__config, &report);
 	if (ret)
-		return ret;
+		goto exit;

 	argc = parse_options(argc, argv, options, report_usage, 0);
 	if (argc) {
@@ -1326,8 +1328,10 @@ int cmd_report(int argc, const char **argv)
 		report.symbol_filter_str = argv[0];
 	}

-	if (annotate_check_args(&report.annotation_opts) < 0)
-		return -EINVAL;
+	if (annotate_check_args(&report.annotation_opts) < 0) {
+		ret = -EINVAL;
+		goto exit;
+	}

 	if (report.mmaps_mode)
 		report.tasks_mode = true;
@@ -1341,12 +1345,14 @@ int cmd_report(int argc, const char **argv)
 	if (symbol_conf.vmlinux_name &&
 	    access(symbol_conf.vmlinux_name, R_OK)) {
 		pr_err("Invalid file: %s\n", symbol_conf.vmlinux_name);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto exit;
 	}
 	if (symbol_conf.kallsyms_name &&
 	    access(symbol_conf.kallsyms_name, R_OK)) {
 		pr_err("Invalid file: %s\n", symbol_conf.kallsyms_name);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto exit;
 	}

 	if (report.inverted_callchain)
@@ -1370,12 +1376,14 @@ int cmd_report(int argc, const char **argv)

 repeat:
 	session = perf_session__new(&data, false, &report.tool);
-	if (IS_ERR(session))
-		return PTR_ERR(session);
+	if (IS_ERR(session)) {
+		ret = PTR_ERR(session);
+		goto exit;
+	}

 	ret = evswitch__init(&report.evswitch, session->evlist, stderr);
 	if (ret)
-		return ret;
+		goto exit;

 	if (zstd_init(&(session->zstd_data), 0) < 0)
 		pr_warning("Decompression initialization failed. Reported data may be incomplete.\n");
@@ -1603,5 +1611,8 @@ error:

 	zstd_fini(&(session->zstd_data));
 	perf_session__delete(session);
+exit:
+	free(sort_order_help);
+	free(field_order_help);
 	return ret;
 }
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 8a3b7d5a4737..5e9e96452b9e 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -3177,7 +3177,7 @@ static void add_hpp_sort_string(struct strbuf *sb, struct hpp_dimension *s, int
 		add_key(sb, s[i].name, llen);
 }

-const char *sort_help(const char *prefix)
+char *sort_help(const char *prefix)
 {
 	struct strbuf sb;
 	char *s;
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 66d39c4cfe2b..fc94dcd67abc 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -293,7 +293,7 @@ void reset_output_field(void);
 void sort__setup_elide(FILE *fp);
 void perf_hpp__set_elide(int idx, bool elide);

-const char *sort_help(const char *prefix);
+char *sort_help(const char *prefix);

 int report_parse_ignore_callees_opt(const struct option *opt, const char *arg, int unset);

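For illustration only (not part of the patch): a small sketch of the single-exit cleanup pattern the patch introduces in cmd_report(). Heap strings are allocated up front and every return path goes through one label that frees them. make_help(), free_help(), run_command() and the help_live counter are hypothetical names for this sketch; make_help() only stands in for a function that, like sort_help(), returns a caller-owned string.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static int help_live; /* outstanding help strings, for this sketch only */

/* Return a heap-allocated copy of @prefix; the caller owns it. */
static char *make_help(const char *prefix)
{
	size_t len = strlen(prefix) + 1;
	char *s = malloc(len);

	if (s) {
		memcpy(s, prefix, len);
		help_live++;
	}
	return s;
}

static void free_help(char *s)
{
	if (s) {
		assert(help_live > 0);
		free(s);
		help_live--;
	}
}

/*
 * Early failures set ret and jump to the label instead of returning
 * directly, so both strings are released on every path.
 */
static int run_command(int fail_early)
{
	char *sort_order_help = make_help("sort by key(s):");
	char *field_order_help = make_help("output field(s):");
	int ret = 0;

	if (fail_early) {
		ret = -EINVAL;
		goto exit;
	}

	/* ... the real work would use the two strings here ... */

exit:
	free_help(sort_order_help);
	free_help(field_order_help);
	return ret;
}
```

A plain "return -EINVAL;" in the early-failure branch would leak both strings, which is the kind of leak ASan reported.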
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit faf3ac305d61341c74e5cdd9e41daecce7f67bfe ]
ASan reports several memory leaks while running:
# perf test "82: Use vfs_getname probe to get syscall args filenames"
Two of these are caused by some refcounts not being decreased on perf-script exit, namely script.threads and script.cpus.
This patch adds the missing __put calls in a new perf_script__exit function, which is called at the end of cmd_script.
This patch concludes the fixes of all remaining memory leaks in perf test "82: Use vfs_getname probe to get syscall args filenames".
Signed-off-by: Riccardo Mancini rickyman7@gmail.com Fixes: cfc8874a48599249 ("perf script: Process cpu/threads maps") Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@redhat.com Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Link: http://lore.kernel.org/lkml/5ee73b19791c6fa9d24c4d57f4ac1a23609400d7.1626343... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/builtin-script.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 48588ccf902e..2bb159c10503 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -2483,6 +2483,12 @@ static void perf_script__exit_per_event_dump_stats(struct perf_script *script) } }
+static void perf_script__exit(struct perf_script *script) +{ + perf_thread_map__put(script->threads); + perf_cpu_map__put(script->cpus); +} + static int __cmd_script(struct perf_script *script) { int ret; @@ -3937,6 +3943,7 @@ out_delete:
perf_evlist__free_stats(session->evlist); perf_session__delete(session); + perf_script__exit(&script);
if (script_started) cleanup_scripting();
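The leak pattern addressed by this patch can be sketched in plain userspace C. This is only an illustration under stated assumptions, not perf code: `struct map`, `map__new()`, `map__get()`, `map__put()`, `struct script` and `script__exit()` are hypothetical stand-ins for the refcounted thread/cpu maps and for perf_script__exit().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a thread or cpu map. */
struct map {
	int refcnt;
};

/* Number of live objects, so a leak can be observed without ASan. */
static int live_maps;

static struct map *map__new(void)
{
	struct map *m = calloc(1, sizeof(*m));

	if (m) {
		m->refcnt = 1;
		live_maps++;
	}
	return m;
}

/* Take an additional reference. */
static struct map *map__get(struct map *m)
{
	if (m)
		m->refcnt++;
	return m;
}

/* Drop one reference; free the object when the last one goes away. */
static void map__put(struct map *m)
{
	if (!m)
		return;
	assert(m->refcnt > 0);
	if (--m->refcnt == 0) {
		free(m);
		live_maps--;
	}
}

/* Hypothetical tool state that holds one reference on each map. */
struct script {
	struct map *threads;
	struct map *cpus;
};

/* Same idea as the new exit helper: release what the tool still holds. */
static void script__exit(struct script *s)
{
	map__put(s->threads);
	map__put(s->cpus);
	s->threads = NULL;
	s->cpus = NULL;
}
```

In this sketch, omitting the call to `script__exit()` leaves `live_maps` above zero after every other holder has dropped its reference, which is the kind of leak ASan reported.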
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit f8cbb0f926ae1e1fb5f9e51614e5437560ed4039 ]
ASan reports memory leaks when running:
# perf test "88: Check open filename arg using perf trace + vfs_getname"
One of these is caused by the lzma stream never being closed inside lzma_decompress_to_file().
This patch adds the missing lzma_end().
Signed-off-by: Riccardo Mancini rickyman7@gmail.com Fixes: 80a32e5b498a7547 ("perf tools: Add lzma decompression support for kernel module") Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@redhat.com Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Link: http://lore.kernel.org/lkml/aaf50bdce7afe996cfc06e1bbb36e4a2a9b9db93.1626343... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/lzma.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/lzma.c b/tools/perf/util/lzma.c index 39062df02629..51424cdc3b68 100644 --- a/tools/perf/util/lzma.c +++ b/tools/perf/util/lzma.c @@ -69,7 +69,7 @@ int lzma_decompress_to_file(const char *input, int output_fd)
if (ferror(infile)) { pr_err("lzma: read error: %s\n", strerror(errno)); - goto err_fclose; + goto err_lzma_end; }
if (feof(infile)) @@ -83,7 +83,7 @@ int lzma_decompress_to_file(const char *input, int output_fd)
if (writen(output_fd, buf_out, write_size) != write_size) { pr_err("lzma: write error: %s\n", strerror(errno)); - goto err_fclose; + goto err_lzma_end; }
strm.next_out = buf_out; @@ -95,11 +95,13 @@ int lzma_decompress_to_file(const char *input, int output_fd) break;
pr_err("lzma: failed %s\n", lzma_strerror(ret)); - goto err_fclose; + goto err_lzma_end; } }
err = 0; +err_lzma_end: + lzma_end(&strm); err_fclose: fclose(infile); return err;
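The label ordering used in this fix can be sketched as follows. This is a minimal userspace illustration, assuming two generic resources instead of the real FILE and lzma stream; `struct resource`, `resource_init()`, `resource_end()` and `process()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resource; the patch deals with a FILE and an lzma stream. */
struct resource {
	void *mem;
};

static int live_resources;	/* resources acquired and not yet released */

static int resource_init(struct resource *r, int simulate_failure)
{
	if (simulate_failure)
		return -1;
	r->mem = malloc(16);
	if (!r->mem)
		return -1;
	live_resources++;
	return 0;
}

static void resource_end(struct resource *r)
{
	free(r->mem);
	r->mem = NULL;
	live_resources--;
}

/*
 * Acquire two resources in order and release them in reverse order.
 * Each error jumps to the label that releases exactly what is held at
 * that point. Returns 0 on success and -1 on any failure.
 */
static int process(int fail_second_init, int fail_midway)
{
	struct resource file, stream;
	int err = -1;

	if (resource_init(&file, 0))
		return -1;

	if (resource_init(&stream, fail_second_init))
		goto err_file;		/* stream not set up: release file only */

	if (fail_midway)
		goto err_stream;	/* both held: release both */

	err = 0;
err_stream:
	resource_end(&stream);
err_file:
	resource_end(&file);
	return err;
}
```

The success path falls through both labels, so the release code exists only once. Jumping to the outer label after the inner resource was acquired is the mistake the patch corrects.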
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit e0fa7ab42232e742dcb3de9f3c1f6127b5adc019 ]
ASan reports some memory leaks when running:
# perf test "42: BPF filter"
This second leak is caused by a strlist not being deallocated on error inside probe_file__del_events.
This patch adds a goto label before the deallocation and makes the error path jump to it.
Signed-off-by: Riccardo Mancini rickyman7@gmail.com Fixes: e7895e422e4da63d ("perf probe: Split del_perf_probe_events()") Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@redhat.com Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Link: http://lore.kernel.org/lkml/174963c587ae77fa108af794669998e4ae558338.1626343... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/probe-file.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c index bbecb449ea94..d2b98d64438e 100644 --- a/tools/perf/util/probe-file.c +++ b/tools/perf/util/probe-file.c @@ -342,11 +342,11 @@ int probe_file__del_events(int fd, struct strfilter *filter)
ret = probe_file__get_events(fd, filter, namelist); if (ret < 0) - return ret; + goto out;
ret = probe_file__del_strlist(fd, namelist); +out: strlist__delete(namelist); - return ret; }
From: Riccardo Mancini rickyman7@gmail.com
[ Upstream commit d4b3eedce151e63932ce4a00f1d0baa340a8b907 ]
When using 'perf report' in directory mode, the first file is not closed on exit, causing a memory leak.
The problem is caused by the iterating variable never reaching 0.
Fixes: 145520631130bd64 ("perf data: Add perf_data__(create_dir|close_dir) functions") Signed-off-by: Riccardo Mancini rickyman7@gmail.com Acked-by: Namhyung Kim namhyung@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@redhat.com Cc: Mark Rutland mark.rutland@arm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Zhen Lei thunder.leizhen@huawei.com Link: http://lore.kernel.org/lkml/20210716141122.858082-1-rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/data.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c index 5d97b3e45fbb..bcb494dc816a 100644 --- a/tools/perf/util/data.c +++ b/tools/perf/util/data.c @@ -20,7 +20,7 @@
static void close_dir(struct perf_data_file *files, int nr) { - while (--nr >= 1) { + while (--nr >= 0) { close(files[nr].fd); zfree(&files[nr].path); }
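The off-by-one in the loop bound can be reproduced in a few lines. This is a sketch, assuming a hypothetical `struct file_slot` that only tracks whether a slot is open; it does not close real file descriptors.

```c
#include <assert.h>

/* Hypothetical stand-in for a per-file entry: only tracks open/closed. */
struct file_slot {
	int open;
};

/* Old bound: the loop stops before index 0, so slot 0 stays open. */
static void close_all_buggy(struct file_slot *files, int nr)
{
	while (--nr >= 1)
		files[nr].open = 0;
}

/* New bound: the pre-decrement walks from nr - 1 down to and including 0. */
static void close_all_fixed(struct file_slot *files, int nr)
{
	while (--nr >= 0)
		files[nr].open = 0;
}

/* Count the slots that are still open. */
static int count_open(const struct file_slot *files, int nr)
{
	int i, n = 0;

	for (i = 0; i < nr; i++)
		n += files[i].open;
	return n;
}
```

With three open slots, the old bound leaves exactly one slot (index 0) open, while the new bound closes all of them.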
From: Yang Jihong yangjihong1@huawei.com
[ Upstream commit b0f008551f0bf4d5f6db9b5f0e071b02790d6a2e ]
The tracepoints trace_sched_stat_{wait, sleep, iowait} are not exposed to userspace if CONFIG_SCHEDSTATS is not set, but "perf sched record" always tries to record these three events. As a result, the command fails.
Before:
  # perf sched record sleep 1
  event syntax error: 'sched:sched_stat_wait'
                       \___ unknown tracepoint

  Error: File /sys/kernel/tracing/events/sched/sched_stat_wait not found.
  Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?.

  Run 'perf list' for a list of valid events

   Usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

      -e, --event <event>   event selector. use 'perf list' to list available events
Solution: Check whether the schedstat tracepoints are exposed. If they are not, these events are not recorded.
After:

  # perf sched record sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.163 MB perf.data (1091 samples) ]
  # perf sched report
  run measurement overhead: 4736 nsecs
  sleep measurement overhead: 9059979 nsecs
  the run test took 999854 nsecs
  the sleep test took 8945271 nsecs
  nr_run_events: 716
  nr_sleep_events: 785
  nr_wakeup_events: 0
  ...
  ------------------------------------------------------------
Fixes: 2a09b5de235a6 ("sched/fair: do not expose some tracepoints to user if CONFIG_SCHEDSTATS is not set") Signed-off-by: Yang Jihong yangjihong1@huawei.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Jiri Olsa jolsa@redhat.com Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Steven Rostedt (VMware) rostedt@goodmis.org Cc: Yafang Shao laoar.shao@gmail.com Link: http://lore.kernel.org/lkml/20210713112358.194693-1-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/builtin-sched.c | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c index 0e16f9d5a947..d3b5f5faf8c1 100644 --- a/tools/perf/builtin-sched.c +++ b/tools/perf/builtin-sched.c @@ -3337,6 +3337,16 @@ static void setup_sorting(struct perf_sched *sched, const struct option *options sort_dimension__add("pid", &sched->cmp_pid); }
+static bool schedstat_events_exposed(void) +{ + /* + * Select "sched:sched_stat_wait" event to check + * whether schedstat tracepoints are exposed. + */ + return IS_ERR(trace_event__tp_format("sched", "sched_stat_wait")) ? + false : true; +} + static int __cmd_record(int argc, const char **argv) { unsigned int rec_argc, i, j; @@ -3348,21 +3358,33 @@ static int __cmd_record(int argc, const char **argv) "-m", "1024", "-c", "1", "-e", "sched:sched_switch", - "-e", "sched:sched_stat_wait", - "-e", "sched:sched_stat_sleep", - "-e", "sched:sched_stat_iowait", "-e", "sched:sched_stat_runtime", "-e", "sched:sched_process_fork", "-e", "sched:sched_wakeup_new", "-e", "sched:sched_migrate_task", }; + + /* + * The tracepoints trace_sched_stat_{wait, sleep, iowait} + * are not exposed to user if CONFIG_SCHEDSTATS is not set, + * to prevent "perf sched record" execution failure, determine + * whether to record schedstat events according to actual situation. + */ + const char * const schedstat_args[] = { + "-e", "sched:sched_stat_wait", + "-e", "sched:sched_stat_sleep", + "-e", "sched:sched_stat_iowait", + }; + unsigned int schedstat_argc = schedstat_events_exposed() ? + ARRAY_SIZE(schedstat_args) : 0; + struct tep_event *waking_event;
/* * +2 for either "-e", "sched:sched_wakeup" or * "-e", "sched:sched_waking" */ - rec_argc = ARRAY_SIZE(record_args) + 2 + argc - 1; + rec_argc = ARRAY_SIZE(record_args) + 2 + schedstat_argc + argc - 1; rec_argv = calloc(rec_argc + 1, sizeof(char *));
if (rec_argv == NULL) @@ -3378,6 +3400,9 @@ static int __cmd_record(int argc, const char **argv) else rec_argv[i++] = strdup("sched:sched_wakeup");
+ for (j = 0; j < schedstat_argc; j++) + rec_argv[i++] = strdup(schedstat_args[j]); + for (j = 1; j < (unsigned int)argc; j++, i++) rec_argv[i] = argv[j];
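The argument-vector sizing in this patch can be sketched in userspace C. This is an illustration under assumptions: `build_argv()`, `free_argv()`, `append_args()` and `dup_str()` are hypothetical helpers, the base argument list is shortened, and the caller passes `have_optional` instead of probing the tracepoint as schedstat_events_exposed() does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Shortened base list; the real command passes more options. */
static const char * const base_args[] = {
	"record", "-e", "sched:sched_switch",
};

/* Arguments that are only valid when the schedstat tracepoints exist. */
static const char * const optional_args[] = {
	"-e", "sched:sched_stat_wait",
	"-e", "sched:sched_stat_sleep",
	"-e", "sched:sched_stat_iowait",
};

static char *dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Free a NULL-terminated vector built by build_argv(). */
static void free_argv(char **argv)
{
	size_t i;

	if (!argv)
		return;
	for (i = 0; argv[i]; i++)
		free(argv[i]);
	free(argv);
}

/* Append copies of n strings at *pos; returns -1 on allocation failure. */
static int append_args(char **argv, size_t *pos,
		       const char * const *src, size_t n)
{
	size_t j;

	for (j = 0; j < n; j++) {
		argv[*pos] = dup_str(src[j]);
		if (!argv[*pos])
			return -1;
		(*pos)++;
	}
	return 0;
}

/*
 * Build a NULL-terminated vector. The optional block contributes to the
 * size computation only when it is actually going to be appended.
 */
static char **build_argv(int have_optional, size_t *argc_out)
{
	size_t opt_argc = have_optional ? ARRAY_SIZE(optional_args) : 0;
	size_t argc = ARRAY_SIZE(base_args) + opt_argc;
	char **argv = calloc(argc + 1, sizeof(*argv));
	size_t i = 0;

	if (!argv)
		return NULL;

	if (append_args(argv, &i, base_args, ARRAY_SIZE(base_args)) ||
	    append_args(argv, &i, optional_args, opt_argc)) {
		free_argv(argv);
		return NULL;
	}

	*argc_out = argc;
	return argv;
}
```

Using one count (`opt_argc`) for both the allocation and the copy loop keeps the two in step, which is the same role schedstat_argc plays in the patch.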
From: Charles Keepax ckeepax@opensource.cirrus.com
[ Upstream commit dd6fb8ff2210f74b056bf9234d0605e8c26a8ac0 ]
When wm_coeff_tlv_get was updated it was accidentally switched to the _raw version of the helper, causing it to ignore the current DSP state it should be checking. Switch the code back to the correct helper so that users can't read the controls when they aren't available.
Fixes: 73ecf1a673d3 ("ASoC: wm_adsp: Correct cache handling of new kernel control API") Signed-off-by: Charles Keepax ckeepax@opensource.cirrus.com Link: https://lore.kernel.org/r/20210626155941.12251-1-ckeepax@opensource.cirrus.c... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/wm_adsp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/codecs/wm_adsp.c b/sound/soc/codecs/wm_adsp.c index 985b2dcecf13..51d95437e0fd 100644 --- a/sound/soc/codecs/wm_adsp.c +++ b/sound/soc/codecs/wm_adsp.c @@ -1221,7 +1221,7 @@ static int wm_coeff_tlv_get(struct snd_kcontrol *kctl,
mutex_lock(&ctl->dsp->pwr_lock);
- ret = wm_coeff_read_ctrl_raw(ctl, ctl->cache, size); + ret = wm_coeff_read_ctrl(ctl, ctl->cache, size);
if (!ret && copy_to_user(bytes, ctl->cache, size)) ret = -EFAULT;
From: Clark Wang xiaoning.wang@nxp.com
[ Upstream commit 4df2f5e1372e9eec8f9e1b4a3025b9be23487d36 ]
When some drivers use spi to send data, spi_transfer->speed_hz is not assigned. If spidev->max_speed_hz is not assigned as well, it will cause an error in configuring the clock. Add a check for these two values before configuring the clock. An error will be returned when they are not assigned.
Signed-off-by: Clark Wang xiaoning.wang@nxp.com Link: https://lore.kernel.org/r/20210408103347.244313-2-xiaoning.wang@nxp.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-imx.c | 37 +++++++++++++++++++++---------------- 1 file changed, 21 insertions(+), 16 deletions(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c index 831a38920fa9..c8b750d8ac35 100644 --- a/drivers/spi/spi-imx.c +++ b/drivers/spi/spi-imx.c @@ -66,8 +66,7 @@ struct spi_imx_data; struct spi_imx_devtype_data { void (*intctrl)(struct spi_imx_data *, int); int (*prepare_message)(struct spi_imx_data *, struct spi_message *); - int (*prepare_transfer)(struct spi_imx_data *, struct spi_device *, - struct spi_transfer *); + int (*prepare_transfer)(struct spi_imx_data *, struct spi_device *); void (*trigger)(struct spi_imx_data *); int (*rx_available)(struct spi_imx_data *); void (*reset)(struct spi_imx_data *); @@ -572,11 +571,10 @@ static int mx51_ecspi_prepare_message(struct spi_imx_data *spi_imx, }
static int mx51_ecspi_prepare_transfer(struct spi_imx_data *spi_imx, - struct spi_device *spi, - struct spi_transfer *t) + struct spi_device *spi) { u32 ctrl = readl(spi_imx->base + MX51_ECSPI_CTRL); - u32 clk = t->speed_hz, delay; + u32 clk, delay;
/* Clear BL field and set the right value */ ctrl &= ~MX51_ECSPI_CTRL_BL_MASK; @@ -590,7 +588,7 @@ static int mx51_ecspi_prepare_transfer(struct spi_imx_data *spi_imx, /* set clock speed */ ctrl &= ~(0xf << MX51_ECSPI_CTRL_POSTDIV_OFFSET | 0xf << MX51_ECSPI_CTRL_PREDIV_OFFSET); - ctrl |= mx51_ecspi_clkdiv(spi_imx, t->speed_hz, &clk); + ctrl |= mx51_ecspi_clkdiv(spi_imx, spi_imx->spi_bus_clk, &clk); spi_imx->spi_bus_clk = clk;
if (spi_imx->usedma) @@ -702,13 +700,12 @@ static int mx31_prepare_message(struct spi_imx_data *spi_imx, }
static int mx31_prepare_transfer(struct spi_imx_data *spi_imx, - struct spi_device *spi, - struct spi_transfer *t) + struct spi_device *spi) { unsigned int reg = MX31_CSPICTRL_ENABLE | MX31_CSPICTRL_MASTER; unsigned int clk;
- reg |= spi_imx_clkdiv_2(spi_imx->spi_clk, t->speed_hz, &clk) << + reg |= spi_imx_clkdiv_2(spi_imx->spi_clk, spi_imx->spi_bus_clk, &clk) << MX31_CSPICTRL_DR_SHIFT; spi_imx->spi_bus_clk = clk;
@@ -807,14 +804,13 @@ static int mx21_prepare_message(struct spi_imx_data *spi_imx, }
static int mx21_prepare_transfer(struct spi_imx_data *spi_imx, - struct spi_device *spi, - struct spi_transfer *t) + struct spi_device *spi) { unsigned int reg = MX21_CSPICTRL_ENABLE | MX21_CSPICTRL_MASTER; unsigned int max = is_imx27_cspi(spi_imx) ? 16 : 18; unsigned int clk;
- reg |= spi_imx_clkdiv_1(spi_imx->spi_clk, t->speed_hz, max, &clk) + reg |= spi_imx_clkdiv_1(spi_imx->spi_clk, spi_imx->spi_bus_clk, max, &clk) << MX21_CSPICTRL_DR_SHIFT; spi_imx->spi_bus_clk = clk;
@@ -883,13 +879,12 @@ static int mx1_prepare_message(struct spi_imx_data *spi_imx, }
static int mx1_prepare_transfer(struct spi_imx_data *spi_imx, - struct spi_device *spi, - struct spi_transfer *t) + struct spi_device *spi) { unsigned int reg = MX1_CSPICTRL_ENABLE | MX1_CSPICTRL_MASTER; unsigned int clk;
- reg |= spi_imx_clkdiv_2(spi_imx->spi_clk, t->speed_hz, &clk) << + reg |= spi_imx_clkdiv_2(spi_imx->spi_clk, spi_imx->spi_bus_clk, &clk) << MX1_CSPICTRL_DR_SHIFT; spi_imx->spi_bus_clk = clk;
@@ -1195,6 +1190,16 @@ static int spi_imx_setupxfer(struct spi_device *spi, if (!t) return 0;
+ if (!t->speed_hz) { + if (!spi->max_speed_hz) { + dev_err(&spi->dev, "no speed_hz provided!\n"); + return -EINVAL; + } + dev_dbg(&spi->dev, "using spi->max_speed_hz!\n"); + spi_imx->spi_bus_clk = spi->max_speed_hz; + } else + spi_imx->spi_bus_clk = t->speed_hz; + spi_imx->bits_per_word = t->bits_per_word;
/* @@ -1236,7 +1241,7 @@ static int spi_imx_setupxfer(struct spi_device *spi, spi_imx->slave_burst = t->len; }
- spi_imx->devtype_data->prepare_transfer(spi_imx, spi, t); + spi_imx->devtype_data->prepare_transfer(spi_imx, spi);
return 0; }
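The speed selection added to spi_imx_setupxfer() can be sketched as a small standalone function. This is an illustration only: `pick_bus_clk()` and its parameter names are hypothetical, chosen to mirror t->speed_hz and spi->max_speed_hz.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Choose the bus clock for one transfer: prefer the per-transfer speed,
 * fall back to the device maximum, and reject the transfer when neither
 * is set. On error *bus_clk is left untouched.
 */
static int pick_bus_clk(uint32_t xfer_speed_hz, uint32_t dev_max_speed_hz,
			uint32_t *bus_clk)
{
	assert(bus_clk != NULL);

	if (xfer_speed_hz) {
		*bus_clk = xfer_speed_hz;
		return 0;
	}

	if (!dev_max_speed_hz)
		return -EINVAL;

	*bus_clk = dev_max_speed_hz;
	return 0;
}
```

Resolving the speed once, before any clock divider is computed, is what lets the per-variant prepare_transfer callbacks in the patch drop their spi_transfer argument.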
From: Alain Volmat alain.volmat@foss.st.com
[ Upstream commit 7999d2555c9f879d006ea8469d74db9cdb038af0 ]
Add pm_runtime calls in the probe, probe error and remove paths so that the ordering is consistent everywhere and pm_runtime is disabled before the resources used by the SPI controller are released.
This patch also fixes the 2 following warnings on driver remove: WARNING: CPU: 0 PID: 743 at drivers/clk/clk.c:594 clk_core_disable_lock+0x18/0x24 WARNING: CPU: 0 PID: 743 at drivers/clk/clk.c:476 clk_unprepare+0x24/0x2c
Fixes: 038ac869c9d2 ("spi: stm32: add runtime PM support")
Signed-off-by: Amelie Delaunay amelie.delaunay@foss.st.com Signed-off-by: Alain Volmat alain.volmat@foss.st.com Link: https://lore.kernel.org/r/1625646426-5826-2-git-send-email-alain.volmat@foss... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-stm32.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/spi/spi-stm32.c b/drivers/spi/spi-stm32.c index 0318f02d6212..8f91f8705eee 100644 --- a/drivers/spi/spi-stm32.c +++ b/drivers/spi/spi-stm32.c @@ -1946,6 +1946,7 @@ static int stm32_spi_probe(struct platform_device *pdev) master->can_dma = stm32_spi_can_dma;
pm_runtime_set_active(&pdev->dev); + pm_runtime_get_noresume(&pdev->dev); pm_runtime_enable(&pdev->dev);
ret = spi_register_master(master); @@ -1967,6 +1968,8 @@ static int stm32_spi_probe(struct platform_device *pdev)
err_pm_disable: pm_runtime_disable(&pdev->dev); + pm_runtime_put_noidle(&pdev->dev); + pm_runtime_set_suspended(&pdev->dev); err_dma_release: if (spi->dma_tx) dma_release_channel(spi->dma_tx); @@ -1983,9 +1986,14 @@ static int stm32_spi_remove(struct platform_device *pdev) struct spi_master *master = platform_get_drvdata(pdev); struct stm32_spi *spi = spi_master_get_devdata(master);
+ pm_runtime_get_sync(&pdev->dev); + spi_unregister_master(master); spi->cfg->disable(spi);
+ pm_runtime_disable(&pdev->dev); + pm_runtime_put_noidle(&pdev->dev); + pm_runtime_set_suspended(&pdev->dev); if (master->dma_tx) dma_release_channel(master->dma_tx); if (master->dma_rx) @@ -1993,7 +2001,6 @@ static int stm32_spi_remove(struct platform_device *pdev)
clk_disable_unprepare(spi->clk);
- pm_runtime_disable(&pdev->dev);
pinctrl_pm_select_sleep_state(&pdev->dev);
From: Axel Lin axel.lin@ingics.com
[ Upstream commit ae60e6a9d24e89a74e2512204ad04de94921bdd2 ]
Use unsigned int instead of u32 for regmap_read/regmap_update_bits val argument.
Signed-off-by: Axel Lin axel.lin@ingics.com Link: https://lore.kernel.org/r/20210619124133.4096683-1-axel.lin@ingics.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/hi6421-regulator.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/regulator/hi6421-regulator.c b/drivers/regulator/hi6421-regulator.c index dc631c1a46b4..bff8c515dcde 100644 --- a/drivers/regulator/hi6421-regulator.c +++ b/drivers/regulator/hi6421-regulator.c @@ -386,7 +386,7 @@ static int hi6421_regulator_enable(struct regulator_dev *rdev) static unsigned int hi6421_regulator_ldo_get_mode(struct regulator_dev *rdev) { struct hi6421_regulator_info *info = rdev_get_drvdata(rdev); - u32 reg_val; + unsigned int reg_val;
regmap_read(rdev->regmap, rdev->desc->enable_reg, &reg_val); if (reg_val & info->mode_mask) @@ -398,7 +398,7 @@ static unsigned int hi6421_regulator_ldo_get_mode(struct regulator_dev *rdev) static unsigned int hi6421_regulator_buck_get_mode(struct regulator_dev *rdev) { struct hi6421_regulator_info *info = rdev_get_drvdata(rdev); - u32 reg_val; + unsigned int reg_val;
regmap_read(rdev->regmap, rdev->desc->enable_reg, &reg_val); if (reg_val & info->mode_mask) @@ -411,7 +411,7 @@ static int hi6421_regulator_ldo_set_mode(struct regulator_dev *rdev, unsigned int mode) { struct hi6421_regulator_info *info = rdev_get_drvdata(rdev); - u32 new_mode; + unsigned int new_mode;
switch (mode) { case REGULATOR_MODE_NORMAL: @@ -435,7 +435,7 @@ static int hi6421_regulator_buck_set_mode(struct regulator_dev *rdev, unsigned int mode) { struct hi6421_regulator_info *info = rdev_get_drvdata(rdev); - u32 new_mode; + unsigned int new_mode;
switch (mode) { case REGULATOR_MODE_NORMAL:
From: Axel Lin axel.lin@ingics.com
[ Upstream commit 1c73daee4bf30ccdff5e86dc400daa6f74735da5 ]
Since config.dev = pdev->dev.parent in the current code, the dev_get_drvdata(rdev->dev.parent) call in hi6421_regulator_enable returns the drvdata of the mfd device rather than the regulator. Fix it.
This was broken while converting to use simplified DT parsing because the config.dev changed from pdev->dev to pdev->dev.parent for parsing the parent's of_node.
Fixes: 29dc269a85ef ("regulator: hi6421: Convert to use simplified DT parsing") Signed-off-by: Axel Lin axel.lin@ingics.com Link: https://lore.kernel.org/r/20210630095959.2411543-1-axel.lin@ingics.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/hi6421-regulator.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/drivers/regulator/hi6421-regulator.c b/drivers/regulator/hi6421-regulator.c index bff8c515dcde..d144a4bdb76d 100644 --- a/drivers/regulator/hi6421-regulator.c +++ b/drivers/regulator/hi6421-regulator.c @@ -366,9 +366,8 @@ static struct hi6421_regulator_info
static int hi6421_regulator_enable(struct regulator_dev *rdev) { - struct hi6421_regulator_pdata *pdata; + struct hi6421_regulator_pdata *pdata = rdev_get_drvdata(rdev);
- pdata = dev_get_drvdata(rdev->dev.parent); /* hi6421 spec requires regulator enablement must be serialized: * - Because when BUCK, LDO switching from off to on, it will have * a huge instantaneous current; so you can not turn on two or @@ -385,9 +384,10 @@ static int hi6421_regulator_enable(struct regulator_dev *rdev)
static unsigned int hi6421_regulator_ldo_get_mode(struct regulator_dev *rdev) { - struct hi6421_regulator_info *info = rdev_get_drvdata(rdev); + struct hi6421_regulator_info *info; unsigned int reg_val;
+ info = container_of(rdev->desc, struct hi6421_regulator_info, desc); regmap_read(rdev->regmap, rdev->desc->enable_reg, &reg_val); if (reg_val & info->mode_mask) return REGULATOR_MODE_IDLE; @@ -397,9 +397,10 @@ static unsigned int hi6421_regulator_ldo_get_mode(struct regulator_dev *rdev)
static unsigned int hi6421_regulator_buck_get_mode(struct regulator_dev *rdev) { - struct hi6421_regulator_info *info = rdev_get_drvdata(rdev); + struct hi6421_regulator_info *info; unsigned int reg_val;
+ info = container_of(rdev->desc, struct hi6421_regulator_info, desc); regmap_read(rdev->regmap, rdev->desc->enable_reg, &reg_val); if (reg_val & info->mode_mask) return REGULATOR_MODE_STANDBY; @@ -410,9 +411,10 @@ static unsigned int hi6421_regulator_buck_get_mode(struct regulator_dev *rdev) static int hi6421_regulator_ldo_set_mode(struct regulator_dev *rdev, unsigned int mode) { - struct hi6421_regulator_info *info = rdev_get_drvdata(rdev); + struct hi6421_regulator_info *info; unsigned int new_mode;
+ info = container_of(rdev->desc, struct hi6421_regulator_info, desc); switch (mode) { case REGULATOR_MODE_NORMAL: new_mode = 0; @@ -434,9 +436,10 @@ static int hi6421_regulator_ldo_set_mode(struct regulator_dev *rdev, static int hi6421_regulator_buck_set_mode(struct regulator_dev *rdev, unsigned int mode) { - struct hi6421_regulator_info *info = rdev_get_drvdata(rdev); + struct hi6421_regulator_info *info; unsigned int new_mode;
+ info = container_of(rdev->desc, struct hi6421_regulator_info, desc); switch (mode) { case REGULATOR_MODE_NORMAL: new_mode = 0; @@ -459,7 +462,9 @@ static unsigned int hi6421_regulator_ldo_get_optimum_mode(struct regulator_dev *rdev, int input_uV, int output_uV, int load_uA) { - struct hi6421_regulator_info *info = rdev_get_drvdata(rdev); + struct hi6421_regulator_info *info; + + info = container_of(rdev->desc, struct hi6421_regulator_info, desc);
if (load_uA > info->eco_microamp) return REGULATOR_MODE_NORMAL; @@ -543,14 +548,13 @@ static int hi6421_regulator_probe(struct platform_device *pdev) if (!pdata) return -ENOMEM; mutex_init(&pdata->lock); - platform_set_drvdata(pdev, pdata);
for (i = 0; i < ARRAY_SIZE(hi6421_regulator_info); i++) { /* assign per-regulator data */ info = &hi6421_regulator_info[i];
config.dev = pdev->dev.parent; - config.driver_data = info; + config.driver_data = pdata; config.regmap = pmic->regmap;
rdev = devm_regulator_register(&pdev->dev, &info->desc,
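The container_of() lookup that replaces rdev_get_drvdata() for the per-regulator info can be sketched in userspace. This is a minimal illustration: the macro below is a simplified equivalent of the kernel's container_of() without its type checking, and `struct regulator_desc`, `struct regulator_info` and `desc_to_info()` are hypothetical, trimmed-down stand-ins for the driver's types.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace equivalent of the kernel macro (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down descriptor. */
struct regulator_desc {
	const char *name;
};

/* Per-regulator data that embeds its descriptor by value. */
struct regulator_info {
	struct regulator_desc desc;
	unsigned int mode_mask;
	unsigned int eco_microamp;
};

/* Recover the enclosing info structure from a pointer to its descriptor. */
static struct regulator_info *desc_to_info(struct regulator_desc *desc)
{
	return container_of(desc, struct regulator_info, desc);
}
```

Because the descriptor is embedded rather than pointed to, the info structure can be recovered from the descriptor alone. That frees the driver_data slot to carry the shared pdata, as the patch does.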
From: Peter Hess peter.hess@ph-home.de
[ Upstream commit 3a70dd2d050331ee4cf5ad9d5c0a32d83ead9a43 ]
In FIFO mode there were two problems:
- RX mode was never handled, and
- in this case the tx_buf pointer was NULL and caused an exception.

Fix this by handling RX mode in mtk_spi_fifo_transfer.
Fixes: a568231f4632 ("spi: mediatek: Add spi bus for Mediatek MT8173") Signed-off-by: Peter Hess peter.hess@ph-home.de Signed-off-by: Frank Wunderlich frank-w@public-files.de Link: https://lore.kernel.org/r/20210706121609.680534-1-linux@fw-web.de Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-mt65xx.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/spi/spi-mt65xx.c b/drivers/spi/spi-mt65xx.c index 5d643051bf3d..8f2d112f0b5d 100644 --- a/drivers/spi/spi-mt65xx.c +++ b/drivers/spi/spi-mt65xx.c @@ -434,13 +434,23 @@ static int mtk_spi_fifo_transfer(struct spi_master *master, mtk_spi_setup_packet(master);
cnt = xfer->len / 4; - iowrite32_rep(mdata->base + SPI_TX_DATA_REG, xfer->tx_buf, cnt); + if (xfer->tx_buf) + iowrite32_rep(mdata->base + SPI_TX_DATA_REG, xfer->tx_buf, cnt); + + if (xfer->rx_buf) + ioread32_rep(mdata->base + SPI_RX_DATA_REG, xfer->rx_buf, cnt);
remainder = xfer->len % 4; if (remainder > 0) { reg_val = 0; - memcpy(&reg_val, xfer->tx_buf + (cnt * 4), remainder); - writel(reg_val, mdata->base + SPI_TX_DATA_REG); + if (xfer->tx_buf) { + memcpy(&reg_val, xfer->tx_buf + (cnt * 4), remainder); + writel(reg_val, mdata->base + SPI_TX_DATA_REG); + } + if (xfer->rx_buf) { + reg_val = readl(mdata->base + SPI_RX_DATA_REG); + memcpy(xfer->rx_buf + (cnt * 4), &reg_val, remainder); + } }
mtk_spi_enable_transfer(master);
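The word-plus-remainder handling with optional buffers can be sketched in userspace. This is an illustration under assumptions, not driver code: a plain `uint32_t` array stands in for the memory-mapped data registers, and `fifo_fill()` and `fifo_drain()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy len bytes from tx_buf into 32-bit FIFO words. A NULL tx_buf
 * (receive-only transfer) is skipped instead of dereferenced. The last
 * partial word is zero-padded. Returns the number of words written.
 */
static size_t fifo_fill(uint32_t *fifo, const void *tx_buf, size_t len)
{
	size_t cnt = len / 4;
	size_t remainder = len % 4;

	if (!tx_buf)
		return 0;

	memcpy(fifo, tx_buf, cnt * 4);
	if (remainder) {
		uint32_t word = 0;

		memcpy(&word, (const uint8_t *)tx_buf + cnt * 4, remainder);
		fifo[cnt++] = word;
	}
	return cnt;
}

/*
 * Copy len bytes out of the FIFO words into rx_buf. A NULL rx_buf
 * (transmit-only transfer) is skipped. Only the valid bytes of the last
 * partial word are copied, so rx_buf is never overrun.
 */
static void fifo_drain(const uint32_t *fifo, void *rx_buf, size_t len)
{
	size_t cnt = len / 4;
	size_t remainder = len % 4;

	if (!rx_buf)
		return;

	memcpy(rx_buf, fifo, cnt * 4);
	if (remainder) {
		uint32_t word = fifo[cnt];

		memcpy((uint8_t *)rx_buf + cnt * 4, &word, remainder);
	}
}
```

Going through a local `word` for the tail keeps both directions within the bounds of the caller's buffer when the length is not a multiple of four.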
From: Maxim Schwalm maxim.schwalm@gmail.com
[ Upstream commit c71f78a662611fe2c67f3155da19b0eff0f29762 ]
The ALC5631 does not like multi-write accesses, avoid them. This fixes:
rt5631 4-001a: Unable to sync registers 0x3a-0x3c. -121
errors on resume from suspend (and all registers after the registers in the error not being synced).
Inspired by commit 2d30e9494f1e ("ASoC: rt5651: Fix regcache sync errors on resume") from Hans de Goede, which fixed the same errors on ALC5651.
Signed-off-by: Maxim Schwalm maxim.schwalm@gmail.com Link: https://lore.kernel.org/r/20210712005011.28536-1-digetx@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/rt5631.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/sound/soc/codecs/rt5631.c b/sound/soc/codecs/rt5631.c index 653da3eaf355..86d58d0df057 100644 --- a/sound/soc/codecs/rt5631.c +++ b/sound/soc/codecs/rt5631.c @@ -1695,6 +1695,8 @@ static const struct regmap_config rt5631_regmap_config = { .reg_defaults = rt5631_reg, .num_reg_defaults = ARRAY_SIZE(rt5631_reg), .cache_type = REGCACHE_RBTREE, + .use_single_read = true, + .use_single_write = true, };
static int rt5631_i2c_probe(struct i2c_client *i2c,
From: Xuan Zhuo xuanzhuo@linux.alibaba.com
[ Upstream commit 5e21bb4e812566aef86fbb77c96a4ec0782286e4 ]
These two types of XDP progs (BPF_XDP_DEVMAP, BPF_XDP_CPUMAP) will not be executed directly in the driver, therefore we should also not directly run them from here. To run in these two situations, there must be further preparations done, otherwise these may cause a kernel panic.
For more details, see also dev_xdp_attach().
[ 46.982479] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 46.984295] #PF: supervisor read access in kernel mode [ 46.985777] #PF: error_code(0x0000) - not-present page [ 46.987227] PGD 800000010dca4067 P4D 800000010dca4067 PUD 10dca6067 PMD 0 [ 46.989201] Oops: 0000 [#1] SMP PTI [ 46.990304] CPU: 7 PID: 562 Comm: a.out Not tainted 5.13.0+ #44 [ 46.992001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/24 [ 46.995113] RIP: 0010:___bpf_prog_run+0x17b/0x1710 [ 46.996586] Code: 49 03 14 cc e8 76 f6 fe ff e9 ad fe ff ff 0f b6 43 01 48 0f bf 4b 02 48 83 c3 08 89 c2 83 e0 0f c0 ea 04 02 [ 47.001562] RSP: 0018:ffffc900005afc58 EFLAGS: 00010246 [ 47.003115] RAX: 0000000000000000 RBX: ffffc9000023f068 RCX: 0000000000000000 [ 47.005163] RDX: 0000000000000000 RSI: 0000000000000079 RDI: ffffc900005afc98 [ 47.007135] RBP: 0000000000000000 R08: ffffc9000023f048 R09: c0000000ffffdfff [ 47.009171] R10: 0000000000000001 R11: ffffc900005afb40 R12: ffffc900005afc98 [ 47.011172] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff825258a8 [ 47.013244] FS: 00007f04a5207580(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000 [ 47.015705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 47.017475] CR2: 0000000000000000 CR3: 0000000100182005 CR4: 0000000000770ee0 [ 47.019558] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 47.021595] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 47.023574] PKRU: 55555554 [ 47.024571] Call Trace: [ 47.025424] __bpf_prog_run32+0x32/0x50 [ 47.026296] ? printk+0x53/0x6a [ 47.027066] ? ktime_get+0x39/0x90 [ 47.027895] bpf_test_run.cold.28+0x23/0x123 [ 47.028866] ? 
printk+0x53/0x6a [ 47.029630] bpf_prog_test_run_xdp+0x149/0x1d0 [ 47.030649] __sys_bpf+0x1305/0x23d0 [ 47.031482] __x64_sys_bpf+0x17/0x20 [ 47.032316] do_syscall_64+0x3a/0x80 [ 47.033165] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 47.034254] RIP: 0033:0x7f04a51364dd [ 47.035133] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 48 [ 47.038768] RSP: 002b:00007fff8f9fc518 EFLAGS: 00000213 ORIG_RAX: 0000000000000141 [ 47.040344] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f04a51364dd [ 47.041749] RDX: 0000000000000048 RSI: 0000000020002a80 RDI: 000000000000000a [ 47.043171] RBP: 00007fff8f9fc530 R08: 0000000002049300 R09: 0000000020000100 [ 47.044626] R10: 0000000000000004 R11: 0000000000000213 R12: 0000000000401070 [ 47.046088] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 47.047579] Modules linked in: [ 47.048318] CR2: 0000000000000000 [ 47.049120] ---[ end trace 7ad34443d5be719a ]--- [ 47.050273] RIP: 0010:___bpf_prog_run+0x17b/0x1710 [ 47.051343] Code: 49 03 14 cc e8 76 f6 fe ff e9 ad fe ff ff 0f b6 43 01 48 0f bf 4b 02 48 83 c3 08 89 c2 83 e0 0f c0 ea 04 02 [ 47.054943] RSP: 0018:ffffc900005afc58 EFLAGS: 00010246 [ 47.056068] RAX: 0000000000000000 RBX: ffffc9000023f068 RCX: 0000000000000000 [ 47.057522] RDX: 0000000000000000 RSI: 0000000000000079 RDI: ffffc900005afc98 [ 47.058961] RBP: 0000000000000000 R08: ffffc9000023f048 R09: c0000000ffffdfff [ 47.060390] R10: 0000000000000001 R11: ffffc900005afb40 R12: ffffc900005afc98 [ 47.061803] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff825258a8 [ 47.063249] FS: 00007f04a5207580(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000 [ 47.065070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 47.066307] CR2: 0000000000000000 CR3: 0000000100182005 CR4: 0000000000770ee0 [ 47.067747] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 47.069217] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400 [ 47.070652] PKRU: 55555554 [ 47.071318] Kernel panic - not syncing: Fatal exception [ 47.072854] Kernel Offset: disabled [ 47.073683] ---[ end Kernel panic - not syncing: Fatal exception ]---
Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap") Fixes: fbee97feed9b ("bpf: Add support to attach bpf program to a devmap entry") Reported-by: Abaci abaci@linux.alibaba.com Signed-off-by: Xuan Zhuo xuanzhuo@linux.alibaba.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Dust Li dust.li@linux.alibaba.com Acked-by: Jesper Dangaard Brouer brouer@redhat.com Acked-by: David Ahern dsahern@kernel.org Acked-by: Song Liu songliubraving@fb.com Link: https://lore.kernel.org/bpf/20210708080409.73525-1-xuanzhuo@linux.alibaba.co... Signed-off-by: Sasha Levin sashal@kernel.org --- net/bpf/test_run.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 8b796c499cbb..e7cbd1b4a5e5 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -627,6 +627,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, void *data; int ret;
+ if (prog->expected_attach_type == BPF_XDP_DEVMAP || + prog->expected_attach_type == BPF_XDP_CPUMAP) + return -EINVAL; if (kattr->test.ctx_in || kattr->test.ctx_out) return -EINVAL;
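The early rejection added to bpf_prog_test_run_xdp() can be sketched as a standalone check. This is an illustration only: `enum attach_type` and `test_run_allowed()` are hypothetical; the kernel uses enum bpf_attach_type and performs the check inline.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical attach types; the kernel's real enum is bpf_attach_type. */
enum attach_type {
	ATTACH_XDP_GENERIC,
	ATTACH_XDP_DEVMAP,
	ATTACH_XDP_CPUMAP,
};

/*
 * Programs meant for devmap or cpumap entries need extra setup before
 * they can run, so a direct test run is refused up front.
 * Returns 0 when a test run may proceed and -EINVAL otherwise.
 */
static int test_run_allowed(enum attach_type type)
{
	switch (type) {
	case ATTACH_XDP_DEVMAP:
	case ATTACH_XDP_CPUMAP:
		return -EINVAL;
	default:
		return 0;
	}
}
```

Refusing the request before any buffers are set up means the rejected path has nothing to clean up.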
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit 5dd0a6b8582ffbfa88351949d50eccd5b6694ade ]
During testing of f263a81451c1 ("bpf: Track subprog poke descriptors correctly and fix use-after-free") under various failure conditions, for example, when jit_subprogs() fails and tries to clean up the program to be run under the interpreter, we ran into the following freeze:
[...] #127/8 tailcall_bpf2bpf_3:FAIL [...] [ 92.041251] BUG: KASAN: slab-out-of-bounds in ___bpf_prog_run+0x1b9d/0x2e20 [ 92.042408] Read of size 8 at addr ffff88800da67f68 by task test_progs/682 [ 92.043707] [ 92.044030] CPU: 1 PID: 682 Comm: test_progs Tainted: G O 5.13.0-53301-ge6c08cb33a30-dirty #87 [ 92.045542] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 [ 92.046785] Call Trace: [ 92.047171] ? __bpf_prog_run_args64+0xc0/0xc0 [ 92.047773] ? __bpf_prog_run_args32+0x8b/0xb0 [ 92.048389] ? __bpf_prog_run_args64+0xc0/0xc0 [ 92.049019] ? ktime_get+0x117/0x130 [...] // few hundred [similar] lines more [ 92.659025] ? ktime_get+0x117/0x130 [ 92.659845] ? __bpf_prog_run_args64+0xc0/0xc0 [ 92.660738] ? __bpf_prog_run_args32+0x8b/0xb0 [ 92.661528] ? __bpf_prog_run_args64+0xc0/0xc0 [ 92.662378] ? print_usage_bug+0x50/0x50 [ 92.663221] ? print_usage_bug+0x50/0x50 [ 92.664077] ? bpf_ksym_find+0x9c/0xe0 [ 92.664887] ? ktime_get+0x117/0x130 [ 92.665624] ? kernel_text_address+0xf5/0x100 [ 92.666529] ? __kernel_text_address+0xe/0x30 [ 92.667725] ? unwind_get_return_address+0x2f/0x50 [ 92.668854] ? ___bpf_prog_run+0x15d4/0x2e20 [ 92.670185] ? ktime_get+0x117/0x130 [ 92.671130] ? __bpf_prog_run_args64+0xc0/0xc0 [ 92.672020] ? __bpf_prog_run_args32+0x8b/0xb0 [ 92.672860] ? __bpf_prog_run_args64+0xc0/0xc0 [ 92.675159] ? ktime_get+0x117/0x130 [ 92.677074] ? lock_is_held_type+0xd5/0x130 [ 92.678662] ? ___bpf_prog_run+0x15d4/0x2e20 [ 92.680046] ? ktime_get+0x117/0x130 [ 92.681285] ? __bpf_prog_run32+0x6b/0x90 [ 92.682601] ? __bpf_prog_run64+0x90/0x90 [ 92.683636] ? lock_downgrade+0x370/0x370 [ 92.684647] ? mark_held_locks+0x44/0x90 [ 92.685652] ? ktime_get+0x117/0x130 [ 92.686752] ? lockdep_hardirqs_on+0x79/0x100 [ 92.688004] ? ktime_get+0x117/0x130 [ 92.688573] ? __cant_migrate+0x2b/0x80 [ 92.689192] ? bpf_test_run+0x2f4/0x510 [ 92.689869] ? bpf_test_timer_continue+0x1c0/0x1c0 [ 92.690856] ? rcu_read_lock_bh_held+0x90/0x90 [ 92.691506] ? 
__kasan_slab_alloc+0x61/0x80 [ 92.692128] ? eth_type_trans+0x128/0x240 [ 92.692737] ? __build_skb+0x46/0x50 [ 92.693252] ? bpf_prog_test_run_skb+0x65e/0xc50 [ 92.693954] ? bpf_prog_test_run_raw_tp+0x2d0/0x2d0 [ 92.694639] ? __fget_light+0xa1/0x100 [ 92.695162] ? bpf_prog_inc+0x23/0x30 [ 92.695685] ? __sys_bpf+0xb40/0x2c80 [ 92.696324] ? bpf_link_get_from_fd+0x90/0x90 [ 92.697150] ? mark_held_locks+0x24/0x90 [ 92.698007] ? lockdep_hardirqs_on_prepare+0x124/0x220 [ 92.699045] ? finish_task_switch+0xe6/0x370 [ 92.700072] ? lockdep_hardirqs_on+0x79/0x100 [ 92.701233] ? finish_task_switch+0x11d/0x370 [ 92.702264] ? __switch_to+0x2c0/0x740 [ 92.703148] ? mark_held_locks+0x24/0x90 [ 92.704155] ? __x64_sys_bpf+0x45/0x50 [ 92.705146] ? do_syscall_64+0x35/0x80 [ 92.706953] ? entry_SYSCALL_64_after_hwframe+0x44/0xae [...]
Turns out that the program rejection from e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT") is buggy since env->prog->aux->tail_call_reachable is never true. Commit ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT") added a tracker into check_max_stack_depth() which propagates the tail_call_reachable condition throughout the subprograms. This info is then assigned to the subprogram's func[i]->aux->tail_call_reachable. However, in the case of the rejection check upon JIT failure, env->prog->aux->tail_call_reachable is used. func[0]->aux->tail_call_reachable which represents the main program's information did not propagate this to the outer env->prog->aux, though. Add this propagation into check_max_stack_depth() where it needs to belong so that the check can be done reliably.
Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
Co-developed-by: John Fastabend john.fastabend@gmail.com
Signed-off-by: Daniel Borkmann daniel@iogearbox.net
Signed-off-by: John Fastabend john.fastabend@gmail.com
Signed-off-by: Alexei Starovoitov ast@kernel.org
Acked-by: Maciej Fijalkowski maciej.fijalkowski@intel.com
Link: https://lore.kernel.org/bpf/618c34e3163ad1a36b1e82377576a6081e182f25.1626123...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/bpf/verifier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1f8bf2b39d50..36bc34fce623 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3356,6 +3356,8 @@ continue_func:
 		if (tail_call_reachable)
 			for (j = 0; j < frame; j++)
 				subprog[ret_prog[j]].tail_call_reachable = true;
+		if (subprog[0].tail_call_reachable)
+			env->prog->aux->tail_call_reachable = true;

 		/* end of for() loop means the last insn of the 'subprog'
 		 * was reached. Doesn't matter whether it was JA or EXIT
From: Xuan Zhuo xuanzhuo@linux.alibaba.com
[ Upstream commit 5acc7d3e8d342858405fbbc671221f676b547ce7 ]
The problem occurs in the window between dev_get_by_index() and dev_xdp_attach_link(). If dev_xdp_uninstall() is called in that window, the xdp link will not be detached automatically when dev is released. However, link->dev already points to dev, so when the xdp link is later released, dev is accessed after it has been freed.
dev_get_by_index()        |
link->dev = dev           |
                          |      rtnl_lock()
                          |      unregister_netdevice_many()
                          |          dev_xdp_uninstall()
                          |      rtnl_unlock()
rtnl_lock();              |
dev_xdp_attach_link()     |
rtnl_unlock();            |
                          |      netdev_run_todo() // dev released
bpf_xdp_link_release()    |
    /* access dev.        |
       use-after-free */  |
[ 45.966867] BUG: KASAN: use-after-free in bpf_xdp_link_release+0x3b8/0x3d0 [ 45.967619] Read of size 8 at addr ffff00000f9980c8 by task a.out/732 [ 45.968297] [ 45.968502] CPU: 1 PID: 732 Comm: a.out Not tainted 5.13.0+ #22 [ 45.969222] Hardware name: linux,dummy-virt (DT) [ 45.969795] Call trace: [ 45.970106] dump_backtrace+0x0/0x4c8 [ 45.970564] show_stack+0x30/0x40 [ 45.970981] dump_stack_lvl+0x120/0x18c [ 45.971470] print_address_description.constprop.0+0x74/0x30c [ 45.972182] kasan_report+0x1e8/0x200 [ 45.972659] __asan_report_load8_noabort+0x2c/0x50 [ 45.973273] bpf_xdp_link_release+0x3b8/0x3d0 [ 45.973834] bpf_link_free+0xd0/0x188 [ 45.974315] bpf_link_put+0x1d0/0x218 [ 45.974790] bpf_link_release+0x3c/0x58 [ 45.975291] __fput+0x20c/0x7e8 [ 45.975706] ____fput+0x24/0x30 [ 45.976117] task_work_run+0x104/0x258 [ 45.976609] do_notify_resume+0x894/0xaf8 [ 45.977121] work_pending+0xc/0x328 [ 45.977575] [ 45.977775] The buggy address belongs to the page: [ 45.978369] page:fffffc00003e6600 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4f998 [ 45.979522] flags: 0x7fffe0000000000(node=0|zone=0|lastcpupid=0x3ffff) [ 45.980349] raw: 07fffe0000000000 fffffc00003e6708 ffff0000dac3c010 0000000000000000 [ 45.981309] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 45.982259] page dumped because: kasan: bad access detected [ 45.982948] [ 45.983153] Memory state around the buggy address: [ 45.983753] ffff00000f997f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 45.984645] ffff00000f998000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 45.985533] >ffff00000f998080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 45.986419] ^ [ 45.987112] ffff00000f998100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 45.988006] ffff00000f998180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 45.988895] ================================================================== [ 45.989773] Disabling lock debugging due to kernel taint [ 
45.990552] Kernel panic - not syncing: panic_on_warn set ... [ 45.991166] CPU: 1 PID: 732 Comm: a.out Tainted: G B 5.13.0+ #22 [ 45.991929] Hardware name: linux,dummy-virt (DT) [ 45.992448] Call trace: [ 45.992753] dump_backtrace+0x0/0x4c8 [ 45.993208] show_stack+0x30/0x40 [ 45.993627] dump_stack_lvl+0x120/0x18c [ 45.994113] dump_stack+0x1c/0x34 [ 45.994530] panic+0x3a4/0x7d8 [ 45.994930] end_report+0x194/0x198 [ 45.995380] kasan_report+0x134/0x200 [ 45.995850] __asan_report_load8_noabort+0x2c/0x50 [ 45.996453] bpf_xdp_link_release+0x3b8/0x3d0 [ 45.997007] bpf_link_free+0xd0/0x188 [ 45.997474] bpf_link_put+0x1d0/0x218 [ 45.997942] bpf_link_release+0x3c/0x58 [ 45.998429] __fput+0x20c/0x7e8 [ 45.998833] ____fput+0x24/0x30 [ 45.999247] task_work_run+0x104/0x258 [ 45.999731] do_notify_resume+0x894/0xaf8 [ 46.000236] work_pending+0xc/0x328 [ 46.000697] SMP: stopping secondary CPUs [ 46.001226] Dumping ftrace buffer: [ 46.001663] (ftrace buffer empty) [ 46.002110] Kernel Offset: disabled [ 46.002545] CPU features: 0x00000001,23202c00 [ 46.003080] Memory Limit: none
Fixes: aa8d3a716b59db6c ("bpf, xdp: Add bpf_link-based XDP attachment API")
Reported-by: Abaci abaci@linux.alibaba.com
Signed-off-by: Xuan Zhuo xuanzhuo@linux.alibaba.com
Signed-off-by: Alexei Starovoitov ast@kernel.org
Reviewed-by: Dust Li dust.li@linux.alibaba.com
Acked-by: Andrii Nakryiko andrii@kernel.org
Link: https://lore.kernel.org/bpf/20210710031635.41649-1-xuanzhuo@linux.alibaba.co...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/core/dev.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c index 86a0fe0f4c02..4935ca1e887f 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9401,14 +9401,17 @@ int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) struct net_device *dev; int err, fd;
+ rtnl_lock(); dev = dev_get_by_index(net, attr->link_create.target_ifindex); - if (!dev) + if (!dev) { + rtnl_unlock(); return -EINVAL; + }
link = kzalloc(sizeof(*link), GFP_USER); if (!link) { err = -ENOMEM; - goto out_put_dev; + goto unlock; }
bpf_link_init(&link->link, BPF_LINK_TYPE_XDP, &bpf_xdp_link_lops, prog); @@ -9418,14 +9421,14 @@ int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) err = bpf_link_prime(&link->link, &link_primer); if (err) { kfree(link); - goto out_put_dev; + goto unlock; }
- rtnl_lock(); err = dev_xdp_attach_link(dev, NULL, link); rtnl_unlock();
if (err) { + link->dev = NULL; bpf_link_cleanup(&link_primer); goto out_put_dev; } @@ -9435,6 +9438,9 @@ int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) dev_put(dev); return fd;
+unlock: + rtnl_unlock(); + out_put_dev: dev_put(dev); return err;
From: Nicolas Saenz Julienne nsaenzju@redhat.com
[ Upstream commit aebacb7f6ca1926918734faae14d1f0b6fae5cb7 ]
31cd0e119d50 ("timers: Recalculate next timer interrupt only when necessary") subtly altered get_next_timer_interrupt()'s behaviour. The function no longer consistently returns KTIME_MAX with no timers pending.
In order to decide if there are any timers pending we check whether the next expiry will happen NEXT_TIMER_MAX_DELTA jiffies from now. Unfortunately, the next expiry time and the timer base clock are no longer updated in unison. The former changes upon certain timer operations (enqueue, expire, detach), whereas the latter keeps track of jiffies as they move forward, which ultimately breaks the logic above.
A simplified example:
- Upon entering get_next_timer_interrupt() with:
	jiffies = 1
	base->clk = 0;
	base->next_expiry = NEXT_TIMER_MAX_DELTA;
  'base->next_expiry == base->clk + NEXT_TIMER_MAX_DELTA', the function returns KTIME_MAX.
- 'base->clk' is updated to the jiffies value.
- The next time we enter get_next_timer_interrupt(), assuming no timer operations happened in the meantime:
	base->clk = 1;
	base->next_expiry = NEXT_TIMER_MAX_DELTA;
  'base->next_expiry != base->clk + NEXT_TIMER_MAX_DELTA', the function returns a valid expire time, which is incorrect.
This ultimately might unnecessarily rearm sched's timer on nohz_full setups, and add latency to the system[1].
So, introduce 'base->timers_pending'[2], update it every time 'base->next_expiry' changes, and use it in get_next_timer_interrupt().
[1] See tick_nohz_stop_tick(). [2] A quick pahole check on x86_64 and arm64 shows it doesn't make 'struct timer_base' any bigger.
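The mismatch described above can be reproduced with a small userspace model. This is only a sketch: struct base_model, MAX_DELTA and the helper names are hypothetical stand-ins, not the kernel's definitions, and the real NEXT_TIMER_MAX_DELTA is far larger.

```c
#include <stdbool.h>

/* Hypothetical stand-in for NEXT_TIMER_MAX_DELTA; the value is arbitrary. */
#define MAX_DELTA 1000UL

/* Minimal model of the three timer_base fields discussed in the text. */
struct base_model {
	unsigned long clk;
	unsigned long next_expiry;
	bool timers_pending;
};

/* Mirrors the initial state: no timers queued at time 'now'. */
static void model_init(struct base_model *b, unsigned long now)
{
	b->clk = now;
	b->next_expiry = now + MAX_DELTA;
	b->timers_pending = false;
}

/* Old heuristic: infer "something is pending" from the distance to next_expiry. */
static bool pending_by_delta(const struct base_model *b)
{
	return b->next_expiry != b->clk + MAX_DELTA;
}

/* Patched idea: consult a flag that only changes together with next_expiry. */
static bool pending_by_flag(const struct base_model *b)
{
	return b->timers_pending;
}
```

Once clk advances on its own, pending_by_delta() starts reporting a pending timer although nothing was enqueued, while pending_by_flag() keeps returning false.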
Fixes: 31cd0e119d50 ("timers: Recalculate next timer interrupt only when necessary")
Signed-off-by: Nicolas Saenz Julienne nsaenzju@redhat.com
Signed-off-by: Frederic Weisbecker frederic@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/time/timer.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c index c3ad64fb9d8b..aa96b8a4e508 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -207,6 +207,7 @@ struct timer_base { unsigned int cpu; bool next_expiry_recalc; bool is_idle; + bool timers_pending; DECLARE_BITMAP(pending_map, WHEEL_SIZE); struct hlist_head vectors[WHEEL_SIZE]; } ____cacheline_aligned; @@ -595,6 +596,7 @@ static void enqueue_timer(struct timer_base *base, struct timer_list *timer, * can reevaluate the wheel: */ base->next_expiry = bucket_expiry; + base->timers_pending = true; base->next_expiry_recalc = false; trigger_dyntick_cpu(base, timer); } @@ -1575,6 +1577,7 @@ static unsigned long __next_timer_interrupt(struct timer_base *base) }
base->next_expiry_recalc = false; + base->timers_pending = !(next == base->clk + NEXT_TIMER_MAX_DELTA);
return next; } @@ -1626,7 +1629,6 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem) struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]); u64 expires = KTIME_MAX; unsigned long nextevt; - bool is_max_delta;
/* * Pretend that there is no timer pending if the cpu is offline. @@ -1639,7 +1641,6 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem) if (base->next_expiry_recalc) base->next_expiry = __next_timer_interrupt(base); nextevt = base->next_expiry; - is_max_delta = (nextevt == base->clk + NEXT_TIMER_MAX_DELTA);
/* * We have a fresh next event. Check whether we can forward the @@ -1657,7 +1658,7 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem) expires = basem; base->is_idle = false; } else { - if (!is_max_delta) + if (base->timers_pending) expires = basem + (u64)(nextevt - basej) * TICK_NSEC; /* * If we expect to sleep more than a tick, mark the base idle. @@ -1940,6 +1941,7 @@ int timers_prepare_cpu(unsigned int cpu) base = per_cpu_ptr(&timer_bases[b], cpu); base->clk = jiffies; base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA; + base->timers_pending = false; base->is_idle = false; } return 0;
From: Colin Ian King colin.king@canonical.com
[ Upstream commit e7efc2ce3d0789cd7c21b70ff00cd7838d382639 ]
Shifting the u16 integer oct->pcie_port left by CN23XX_PKT_INPUT_CTL_MAC_NUM_POS (29) bits promotes it to a 32 bit signed int, and the result is then sign-extended to a u64. In the cases where bit 2 of oct->pcie_port is set (i.e. values 4..7), the shifted value has its sign bit set, so it is sign extended and the top 32 bits of the result are all set.
Fix this by casting the u16 values to a u64 before the 29 bit left shift.
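A standalone sketch of the effect, under stated assumptions: MAC_NUM_POS is a hypothetical stand-in for CN23XX_PKT_INPUT_CTL_MAC_NUM_POS, and the promoted shift is emulated through uint32_t so that the demo avoids signed overflow. The conversion to int32_t is implementation-defined, but wraps on the usual compilers.

```c
#include <stdint.h>

/* Hypothetical stand-in for CN23XX_PKT_INPUT_CTL_MAC_NUM_POS (29 per the text). */
#define MAC_NUM_POS 29

/*
 * Models the old expression: the 16-bit value is shifted as a 32-bit
 * quantity, interpreted as a signed int, and only then widened to 64 bits.
 */
static uint64_t shift_promoted(uint16_t port)
{
	int32_t v = (int32_t)((uint32_t)port << MAC_NUM_POS);

	return (uint64_t)v;	/* sign-extends when bit 31 of v is set */
}

/* Models the fix: widen to 64 bits first, then shift. */
static uint64_t shift_widened(uint16_t port)
{
	return (uint64_t)port << MAC_NUM_POS;
}
```

With port = 4, shift_promoted() yields 0xffffffff80000000 while shift_widened() yields 0x80000000. For port = 3 both forms agree, because bit 31 stays clear.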
Addresses-Coverity: ("Unintended sign extension")
Fixes: 3451b97cce2d ("liquidio: CN23XX register setup")
Signed-off-by: Colin Ian King colin.king@canonical.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
index 4cddd628d41b..9ed3d1ab2ca5 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
@@ -420,7 +420,7 @@ static int cn23xx_pf_setup_global_input_regs(struct octeon_device *oct)
 	 * bits 32:47 indicate the PVF num.
 	 */
 	for (q_no = 0; q_no < ern; q_no++) {
-		reg_val = oct->pcie_port << CN23XX_PKT_INPUT_CTL_MAC_NUM_POS;
+		reg_val = (u64)oct->pcie_port << CN23XX_PKT_INPUT_CTL_MAC_NUM_POS;

 		/* for VF assigned queues. */
 		if (q_no < oct->sriov_info.pf_srn) {
From: Colin Ian King colin.king@canonical.com
[ Upstream commit 91091656252f5d6d8c476e0c92776ce9fae7b445 ]
Currently array jit->seen_reg[r1] is being accessed before the range checking of index r1. The range checking on r1 should be performed first since it will avoid any potential out-of-range accesses on the array seen_reg[] and also it is more optimal to perform checks on r1 before fetching data from the array. Fix this by performing the range checks before the array access.
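A minimal userspace sketch of the reordered condition. struct jit_state and NUM_REGS are hypothetical stand-ins for the s390 JIT structures; only the ordering of the tests is the point.

```c
#include <stdint.h>

#define NUM_REGS 16	/* hypothetical size of the seen_reg[] model */

struct jit_state {
	uint8_t seen_reg[NUM_REGS];	/* one flag per register */
};

/*
 * The range tests come first. Because && short-circuits, seen_reg[] is
 * only indexed once r1 is known to lie in 6..15.
 */
static void reg_set_seen(struct jit_state *jit, uint32_t r1)
{
	if (r1 >= 6 && r1 <= 15 && !jit->seen_reg[r1])
		jit->seen_reg[r1] = 1;
}
```

With this ordering an out-of-range r1 leaves the function without ever touching the array.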
Fixes: 054623105728 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Signed-off-by: Colin Ian King colin.king@canonical.com
Signed-off-by: Daniel Borkmann daniel@iogearbox.net
Tested-by: Ilya Leoshkevich iii@linux.ibm.com
Acked-by: Ilya Leoshkevich iii@linux.ibm.com
Link: https://lore.kernel.org/bpf/20210715125712.24690-1-colin.king@canonical.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/s390/net/bpf_jit_comp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 0a4182792876..fc44dce59536 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -112,7 +112,7 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
 {
 	u32 r1 = reg2hex[b1];

-	if (!jit->seen_reg[r1] && r1 >= 6 && r1 <= 15)
+	if (r1 >= 6 && r1 <= 15 && !jit->seen_reg[r1])
 		jit->seen_reg[r1] = 1;
 }
From: John Fastabend john.fastabend@gmail.com
[ Upstream commit 7e6b27a69167f97c56b5437871d29e9722c3e470 ]
If skb_linearize is needed and fails we could leak a msg on the error handling. To fix this, ensure we kfree the msg block before returning the error. Found during code review.
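The ownership rule behind the fix can be sketched in userspace, with malloc/free in place of kzalloc/kfree. All names below (msg_model, enqueue_model, ingress_model, live_msgs, queued) are hypothetical. The sketch assumes, as in the patch, that the callee never frees on failure and the caller frees on any negative return.

```c
#include <errno.h>
#include <stdlib.h>

struct msg_model { int len; };

static int live_msgs;			/* allocations not yet freed */
static struct msg_model *queued;	/* single-slot stand-in for the ingress queue */

static struct msg_model *msg_alloc(void)
{
	struct msg_model *m = calloc(1, sizeof(*m));

	if (m)
		live_msgs++;
	return m;
}

static void msg_free(struct msg_model *m)
{
	if (m) {
		free(m);
		live_msgs--;
	}
}

/* On any failure the message stays owned by the caller. */
static int enqueue_model(struct msg_model *msg, int fail_code)
{
	if (fail_code)
		return fail_code;	/* e.g. -EAGAIN when linearizing fails */
	queued = msg;			/* success: the queue now owns msg */
	return 0;
}

/* Caller side: one cleanup point that covers every error return. */
static int ingress_model(int fail_code)
{
	struct msg_model *msg = msg_alloc();
	int err;

	if (!msg)
		return -EAGAIN;
	err = enqueue_model(msg, fail_code);
	if (err < 0)
		msg_free(msg);
	return err;
}
```

Keeping the free in the caller means a new early return added to the enqueue helper cannot reintroduce the leak.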
Fixes: 4363023d2668e ("bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_list")
Signed-off-by: John Fastabend john.fastabend@gmail.com
Signed-off-by: Daniel Borkmann daniel@iogearbox.net
Reviewed-by: Cong Wang cong.wang@bytedance.com
Link: https://lore.kernel.org/bpf/20210712195546.423990-2-john.fastabend@gmail.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/core/skmsg.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 923a1d0f84ca..c4c224a5b9de 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -433,10 +433,8 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb, if (skb_linearize(skb)) return -EAGAIN; num_sge = skb_to_sgvec(skb, msg->sg.data, 0, skb->len); - if (unlikely(num_sge < 0)) { - kfree(msg); + if (unlikely(num_sge < 0)) return num_sge; - }
copied = skb->len; msg->sg.start = 0; @@ -455,6 +453,7 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb) { struct sock *sk = psock->sk; struct sk_msg *msg; + int err;
/* If we are receiving on the same sock skb->sk is already assigned, * skip memory accounting and owner transition seeing it already set @@ -473,7 +472,10 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb) * into user buffers. */ skb_set_owner_r(skb, sk); - return sk_psock_skb_ingress_enqueue(skb, psock, sk, msg); + err = sk_psock_skb_ingress_enqueue(skb, psock, sk, msg); + if (err < 0) + kfree(msg); + return err; }
/* Puts an skb on the ingress queue of the socket already assigned to the @@ -484,12 +486,16 @@ static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb { struct sk_msg *msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC); struct sock *sk = psock->sk; + int err;
if (unlikely(!msg)) return -EAGAIN; sk_msg_init(msg); skb_set_owner_r(skb, sk); - return sk_psock_skb_ingress_enqueue(skb, psock, sk, msg); + err = sk_psock_skb_ingress_enqueue(skb, psock, sk, msg); + if (err < 0) + kfree(msg); + return err; }
static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb,
From: John Fastabend john.fastabend@gmail.com
[ Upstream commit 228a4a7ba8e99bb9ef980b62f71e3be33f4aae69 ]
The proc socket stats use the sk_prot->inuse_idx value to record inuse sock stats. We currently do not set this correctly from the sockmap side. As a result, reading sock stats from '/proc/net/sockstat' gives incorrect values. The socket counter is incremented correctly, but because we don't set the counter correctly when we replace sk_prot, we may omit the decrement.
To get the correct inuse_idx value move the core_initcall that initializes the TCP proto handlers to late_initcall. This way it is initialized after TCP has the chance to assign the inuse_idx value from the register protocol handler.
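The ordering problem can be modelled as a struct copy taken before the source field has been assigned. This is a sketch with hypothetical names (proto_model, register_proto_model, build_bpf_proto_model) and an arbitrary index value, not the kernel's initcall machinery.

```c
/* Hypothetical model of the one struct proto field that matters here. */
struct proto_model {
	int inuse_idx;
};

static struct proto_model base_proto;	/* stands in for tcp_prot */
static struct proto_model bpf_proto;	/* stands in for the sockmap copy */

/* Protocol registration is what assigns inuse_idx (value is arbitrary). */
static void register_proto_model(void)
{
	base_proto.inuse_idx = 5;
}

/* The rebuild step snapshots the base proto, inuse_idx included. */
static void build_bpf_proto_model(void)
{
	bpf_proto = base_proto;
}
```

If build_bpf_proto_model() runs before register_proto_model(), the copy carries a stale index. Running it afterwards, which is what moving to late_initcall achieves, keeps the two in agreement.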
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Suggested-by: Jakub Sitnicki jakub@cloudflare.com
Signed-off-by: John Fastabend john.fastabend@gmail.com
Signed-off-by: Daniel Borkmann daniel@iogearbox.net
Reviewed-by: Cong Wang cong.wang@bytedance.com
Link: https://lore.kernel.org/bpf/20210712195546.423990-3-john.fastabend@gmail.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ipv4/tcp_bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index bc7d2a586e18..f91ae827d47f 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -588,7 +588,7 @@ static int __init tcp_bpf_v4_build_proto(void)
 	tcp_bpf_rebuild_protos(tcp_bpf_prots[TCP_BPF_IPV4], &tcp_prot);
 	return 0;
 }
-core_initcall(tcp_bpf_v4_build_proto);
+late_initcall(tcp_bpf_v4_build_proto);

 static int tcp_bpf_assert_proto_ops(struct proto *ops)
 {
From: Jakub Sitnicki jakub@cloudflare.com
[ Upstream commit 54ea2f49fd9400dd698c25450be3352b5613b3b4 ]
The proc socket stats use the sk_prot->inuse_idx value to record inuse sock stats. We currently do not set this correctly from the sockmap side. As a result, reading sock stats from '/proc/net/sockstat' gives incorrect values. The socket counter is incremented correctly, but because we don't set the counter correctly when we replace sk_prot, we may omit the decrement.
To get the correct inuse_idx value move the core_initcall that initializes the UDP proto handlers to late_initcall. This way it is initialized after UDP has the chance to assign the inuse_idx value from the register protocol handler.
Fixes: edc6741cc660 ("bpf: Add sockmap hooks for UDP sockets")
Signed-off-by: Jakub Sitnicki jakub@cloudflare.com
Signed-off-by: Daniel Borkmann daniel@iogearbox.net
Reviewed-by: Cong Wang cong.wang@bytedance.com
Acked-by: John Fastabend john.fastabend@gmail.com
Link: https://lore.kernel.org/bpf/20210714154750.528206-1-jakub@cloudflare.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ipv4/udp_bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
index 7a94791efc1a..69c9663f9ee7 100644
--- a/net/ipv4/udp_bpf.c
+++ b/net/ipv4/udp_bpf.c
@@ -39,7 +39,7 @@ static int __init udp_bpf_v4_build_proto(void)
 	udp_bpf_rebuild_protos(&udp_bpf_prots[UDP_BPF_IPV4], &udp_prot);
 	return 0;
 }
-core_initcall(udp_bpf_v4_build_proto);
+late_initcall(udp_bpf_v4_build_proto);

 struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock)
 {
From: Tobias Klauser tklauser@distanz.ch
[ Upstream commit d444b06e40855219ef38b5e9286db16d435f06dc ]
Fix and add a missing NULL check for the prior malloc() call.
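The pattern being restored is the usual check of malloc()'s result before the buffer is written. A small sketch follows. copy_name() is a hypothetical helper, not the bpftool function, and it reports failure by returning NULL instead of calling p_err().

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of name, or NULL when the allocation fails. */
static char *copy_name(const char *name)
{
	char *file = malloc(strlen(name) + 1);

	if (!file)
		return NULL;	/* never hand a NULL buffer to strcpy() */
	strcpy(file, name);
	return file;
}
```

The caller owns the returned buffer and is expected to free() it.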
Fixes: 49a086c201a9 ("bpftool: implement prog load command")
Signed-off-by: Tobias Klauser tklauser@distanz.ch
Signed-off-by: Daniel Borkmann daniel@iogearbox.net
Reviewed-by: Quentin Monnet quentin@isovalent.com
Acked-by: Roman Gushchin guro@fb.com
Link: https://lore.kernel.org/bpf/20210715110609.29364-1-tklauser@distanz.ch
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/bpf/bpftool/common.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 65303664417e..6ebf2b215ef4 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -221,6 +221,11 @@ int mount_bpffs_for_pin(const char *name)
 	int err = 0;

 	file = malloc(strlen(name) + 1);
+	if (!file) {
+		p_err("mem alloc failed");
+		return -1;
+	}
+
 	strcpy(file, name);
 	dir = dirname(file);
From: Ziyang Xuan william.xuanziyang@huawei.com
[ Upstream commit 991e634360f2622a683b48dfe44fe6d9cb765a09 ]
When nr_segs is equal to zero in iovec_from_user(), the object msg->msg_iter.iov seen by caif_seqpkt_sendmsg() is uninitialized stack memory that is defined in ___sys_sendmsg(). So we cannot simply test msg->msg_iter.iov->iov_base directly. Instead, we can use nr_segs in caif_seqpkt_sendmsg() to determine whether msg has any data buffers.
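A userspace sketch of the check order. struct seg_model and has_data() are hypothetical stand-ins for the iov_iter fields: the segment count is tested first, so the segment array is only dereferenced when at least one entry is known to be valid.

```c
#include <stddef.h>

/* Hypothetical stand-in for one struct iovec entry. */
struct seg_model {
	void *base;
	size_t len;
};

/* Returns 1 when the message carries at least one usable data buffer. */
static int has_data(const struct seg_model *segs, size_t nr_segs)
{
	if (nr_segs == 0)
		return 0;	/* segs may be uninitialized: do not touch it */
	return segs[0].base != NULL;
}
```

The patch keeps both tests in a single condition. The short-circuit || gives the same guarantee as the early return used here.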
=====================================================
BUG: KMSAN: uninit-value in caif_seqpkt_sendmsg+0x693/0xf60 net/caif/caif_socket.c:542
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x220 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
 caif_seqpkt_sendmsg+0x693/0xf60 net/caif/caif_socket.c:542
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg net/socket.c:672 [inline]
 ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2343
 ___sys_sendmsg net/socket.c:2397 [inline]
 __sys_sendmmsg+0x808/0xc90 net/socket.c:2480
 __compat_sys_sendmmsg net/compat.c:656 [inline]
Reported-by: syzbot+09a5d591c1f98cf5efcb@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=1ace85e8fc9b0d5a45c08c2656c3e91762daa9b...
Fixes: bece7b2398d0 ("caif: Rewritten socket implementation")
Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/caif/caif_socket.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 3ad0a1df6712..9d26c5e9da05 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -539,7 +539,8 @@ static int caif_seqpkt_sendmsg(struct socket *sock, struct msghdr *msg,
 		goto err;

 	ret = -EINVAL;
-	if (unlikely(msg->msg_iter.iov->iov_base == NULL))
+	if (unlikely(msg->msg_iter.nr_segs == 0) ||
+	    unlikely(msg->msg_iter.iov->iov_base == NULL))
 		goto err;
 	noblock = msg->msg_flags & MSG_DONTWAIT;
From: Dongliang Mu mudongliangabcd@gmail.com
[ Upstream commit a6ecfb39ba9d7316057cea823b196b734f6b18ca ]
The error handling code of hso_create_net_device() currently calls hso_free_net_device() no matter which error led there. This can trigger, for example, a WARNING in hso_free_net_device() [1].
Fix this by refactoring the error handling code of hso_create_net_device() so that each error is unwound by its own cleanup path.
[1] https://syzkaller.appspot.com/bug?id=66eff8d49af1b28370ad342787413e35bbe76ef...
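The staged-unwind idiom the refactoring follows can be sketched in userspace. Everything here is hypothetical (dev_model, create_model, the fail_at knob used to simulate a failure at a given step). The point is that each label releases only what was acquired before the failing step.

```c
#include <stdlib.h>

struct dev_model {
	void *net;
	void *rx;
	void *tx;
};

/* fail_at: 0 = succeed, 1..3 = simulate a failure at that step. */
static struct dev_model *create_model(int fail_at)
{
	struct dev_model *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;

	d->net = (fail_at == 1) ? NULL : malloc(16);
	if (!d->net)
		goto err_dev;

	d->rx = (fail_at == 2) ? NULL : malloc(16);
	if (!d->rx)
		goto err_net;

	d->tx = (fail_at == 3) ? NULL : malloc(16);
	if (!d->tx)
		goto err_rx;

	return d;

	/* Labels run in reverse order of acquisition and fall through. */
err_rx:
	free(d->rx);
err_net:
	free(d->net);
err_dev:
	free(d);
	return NULL;
}

/* Full teardown for a successfully created object. */
static void destroy_model(struct dev_model *d)
{
	free(d->tx);
	free(d->rx);
	free(d->net);
	free(d);
}
```

A single catch-all teardown, by contrast, has to cope with every partially initialized state, which is where the old code went wrong.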
Reported-by: syzbot+44d53c7255bb1aea22d2@syzkaller.appspotmail.com
Fixes: 5fcfb6d0bfcd ("hso: fix bailout in error case of probe")
Signed-off-by: Dongliang Mu mudongliangabcd@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/usb/hso.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)
diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c index fbfcbd0dcfcb..5b3aff2c279f 100644 --- a/drivers/net/usb/hso.c +++ b/drivers/net/usb/hso.c @@ -2496,7 +2496,7 @@ static struct hso_device *hso_create_net_device(struct usb_interface *interface, hso_net_init); if (!net) { dev_err(&interface->dev, "Unable to create ethernet device\n"); - goto exit; + goto err_hso_dev; }
hso_net = netdev_priv(net); @@ -2509,13 +2509,13 @@ static struct hso_device *hso_create_net_device(struct usb_interface *interface, USB_DIR_IN); if (!hso_net->in_endp) { dev_err(&interface->dev, "Can't find BULK IN endpoint\n"); - goto exit; + goto err_net; } hso_net->out_endp = hso_get_ep(interface, USB_ENDPOINT_XFER_BULK, USB_DIR_OUT); if (!hso_net->out_endp) { dev_err(&interface->dev, "Can't find BULK OUT endpoint\n"); - goto exit; + goto err_net; } SET_NETDEV_DEV(net, &interface->dev); SET_NETDEV_DEVTYPE(net, &hso_type); @@ -2524,18 +2524,18 @@ static struct hso_device *hso_create_net_device(struct usb_interface *interface, for (i = 0; i < MUX_BULK_RX_BUF_COUNT; i++) { hso_net->mux_bulk_rx_urb_pool[i] = usb_alloc_urb(0, GFP_KERNEL); if (!hso_net->mux_bulk_rx_urb_pool[i]) - goto exit; + goto err_mux_bulk_rx; hso_net->mux_bulk_rx_buf_pool[i] = kzalloc(MUX_BULK_RX_BUF_SIZE, GFP_KERNEL); if (!hso_net->mux_bulk_rx_buf_pool[i]) - goto exit; + goto err_mux_bulk_rx; } hso_net->mux_bulk_tx_urb = usb_alloc_urb(0, GFP_KERNEL); if (!hso_net->mux_bulk_tx_urb) - goto exit; + goto err_mux_bulk_rx; hso_net->mux_bulk_tx_buf = kzalloc(MUX_BULK_TX_BUF_SIZE, GFP_KERNEL); if (!hso_net->mux_bulk_tx_buf) - goto exit; + goto err_free_tx_urb;
add_net_device(hso_dev);
@@ -2543,7 +2543,7 @@ static struct hso_device *hso_create_net_device(struct usb_interface *interface, result = register_netdev(net); if (result) { dev_err(&interface->dev, "Failed to register device\n"); - goto exit; + goto err_free_tx_buf; }
hso_log_port(hso_dev); @@ -2551,8 +2551,21 @@ static struct hso_device *hso_create_net_device(struct usb_interface *interface, hso_create_rfkill(hso_dev, interface);
return hso_dev; -exit: - hso_free_net_device(hso_dev, true); + +err_free_tx_buf: + remove_net_device(hso_dev); + kfree(hso_net->mux_bulk_tx_buf); +err_free_tx_urb: + usb_free_urb(hso_net->mux_bulk_tx_urb); +err_mux_bulk_rx: + for (i = 0; i < MUX_BULK_RX_BUF_COUNT; i++) { + usb_free_urb(hso_net->mux_bulk_rx_urb_pool[i]); + kfree(hso_net->mux_bulk_rx_buf_pool[i]); + } +err_net: + free_netdev(net); +err_hso_dev: + kfree(hso_dev); return NULL; }
From: Roman Skakun Roman_Skakun@epam.com
[ Upstream commit 40ac971eab89330d6153e7721e88acd2d98833f9 ]
xen-swiotlb can use vmalloc backed addresses for dma coherent allocations and uses the common helpers. Properly handle them to unbreak Xen on ARM platforms.
Fixes: 1b65c4e5a9af ("swiotlb-xen: use xen_alloc/free_coherent_pages")
Signed-off-by: Roman Skakun roman_skakun@epam.com
Reviewed-by: Andrii Anisov andrii_anisov@epam.com
[hch: split the patch, renamed the helpers]
Signed-off-by: Christoph Hellwig hch@lst.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/dma/ops_helpers.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c index 910ae69cae77..af4a6ef48ce0 100644 --- a/kernel/dma/ops_helpers.c +++ b/kernel/dma/ops_helpers.c @@ -5,6 +5,13 @@ */ #include <linux/dma-map-ops.h>
+static struct page *dma_common_vaddr_to_page(void *cpu_addr) +{ + if (is_vmalloc_addr(cpu_addr)) + return vmalloc_to_page(cpu_addr); + return virt_to_page(cpu_addr); +} + /* * Create scatter-list for the already allocated DMA buffer. */ @@ -12,7 +19,7 @@ int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs) { - struct page *page = virt_to_page(cpu_addr); + struct page *page = dma_common_vaddr_to_page(cpu_addr); int ret;
ret = sg_alloc_table(sgt, 1, GFP_KERNEL); @@ -32,6 +39,7 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma, unsigned long user_count = vma_pages(vma); unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT; unsigned long off = vma->vm_pgoff; + struct page *page = dma_common_vaddr_to_page(cpu_addr); int ret = -ENXIO;
vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs); @@ -43,7 +51,7 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma, return -ENXIO;
return remap_pfn_range(vma, vma->vm_start, - page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff, + page_to_pfn(page) + vma->vm_pgoff, user_count << PAGE_SHIFT, vma->vm_page_prot); #else return -ENXIO;
From: Michal Suchanek msuchanek@suse.de
[ Upstream commit 674a9f1f6815849bfb5bf385e7da8fc198aaaba9 ]
Missing TPM final event log table is not a firmware bug.
Clearly, if providing the event log in the old format makes the final event log invalid, it should not be provided, at least in that case.
Fixes: b4f1874c6216 ("tpm: check event log version before reading final events") Signed-off-by: Michal Suchanek msuchanek@suse.de Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/tpm.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/firmware/efi/tpm.c b/drivers/firmware/efi/tpm.c index c1955d320fec..8f665678e9e3 100644 --- a/drivers/firmware/efi/tpm.c +++ b/drivers/firmware/efi/tpm.c @@ -62,9 +62,11 @@ int __init efi_tpm_eventlog_init(void) tbl_size = sizeof(*log_tbl) + log_tbl->size; memblock_reserve(efi.tpm_log, tbl_size);
- if (efi.tpm_final_log == EFI_INVALID_TABLE_ADDR || - log_tbl->version != EFI_TCG2_EVENT_LOG_FORMAT_TCG_2) { - pr_warn(FW_BUG "TPM Final Events table missing or invalid\n"); + if (efi.tpm_final_log == EFI_INVALID_TABLE_ADDR) { + pr_info("TPM Final Events table not present\n"); + goto out; + } else if (log_tbl->version != EFI_TCG2_EVENT_LOG_FORMAT_TCG_2) { + pr_warn(FW_BUG "TPM Final Events table invalid\n"); goto out; }
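To make the split in the hunk above concrete, here is a small userspace sketch of the two outcomes, assuming the same ordering of checks as the patch. The enum, the function name and the two constants are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel constants; values are illustrative. */
#define TOY_INVALID_TABLE_ADDR  UINT64_MAX
#define TOY_LOG_FORMAT_TCG_2    2u

enum toy_final_log_status {
	TOY_FINAL_LOG_ABSENT,   /* table not published: informational only */
	TOY_FINAL_LOG_INVALID,  /* published next to an old-format log: firmware bug */
	TOY_FINAL_LOG_USABLE,
};

/* Same order as the patch: check for absence first, then the log version. */
enum toy_final_log_status toy_classify_final_log(uint64_t final_log_addr,
						 uint32_t log_version)
{
	if (final_log_addr == TOY_INVALID_TABLE_ADDR)
		return TOY_FINAL_LOG_ABSENT;
	if (log_version != TOY_LOG_FORMAT_TCG_2)
		return TOY_FINAL_LOG_INVALID;
	return TOY_FINAL_LOG_USABLE;
}
```

Only the second case would map to a FW_BUG warning; the first corresponds to the new pr_info() path.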
From: Yajun Deng yajun.deng@linux.dev
[ Upstream commit 5f119ba1d5771bbf46d57cff7417dcd84d3084ba ]
release_sock() is a blocking function; it can change the task state after sleeping. Use wait_woken() instead.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yajun Deng yajun.deng@linux.dev Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/decnet/af_decnet.c | 27 ++++++++++++--------------- 1 file changed, 12 insertions(+), 15 deletions(-)
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c index 5dbd45dc35ad..dc92a67baea3 100644 --- a/net/decnet/af_decnet.c +++ b/net/decnet/af_decnet.c @@ -816,7 +816,7 @@ static int dn_auto_bind(struct socket *sock) static int dn_confirm_accept(struct sock *sk, long *timeo, gfp_t allocation) { struct dn_scp *scp = DN_SK(sk); - DEFINE_WAIT(wait); + DEFINE_WAIT_FUNC(wait, woken_wake_function); int err;
if (scp->state != DN_CR) @@ -826,11 +826,11 @@ static int dn_confirm_accept(struct sock *sk, long *timeo, gfp_t allocation) scp->segsize_loc = dst_metric_advmss(__sk_dst_get(sk)); dn_send_conn_conf(sk, allocation);
- prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); + add_wait_queue(sk_sleep(sk), &wait); for(;;) { release_sock(sk); if (scp->state == DN_CC) - *timeo = schedule_timeout(*timeo); + *timeo = wait_woken(&wait, TASK_INTERRUPTIBLE, *timeo); lock_sock(sk); err = 0; if (scp->state == DN_RUN) @@ -844,9 +844,8 @@ static int dn_confirm_accept(struct sock *sk, long *timeo, gfp_t allocation) err = -EAGAIN; if (!*timeo) break; - prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); } - finish_wait(sk_sleep(sk), &wait); + remove_wait_queue(sk_sleep(sk), &wait); if (err == 0) { sk->sk_socket->state = SS_CONNECTED; } else if (scp->state != DN_CC) { @@ -858,7 +857,7 @@ static int dn_confirm_accept(struct sock *sk, long *timeo, gfp_t allocation) static int dn_wait_run(struct sock *sk, long *timeo) { struct dn_scp *scp = DN_SK(sk); - DEFINE_WAIT(wait); + DEFINE_WAIT_FUNC(wait, woken_wake_function); int err = 0;
if (scp->state == DN_RUN) @@ -867,11 +866,11 @@ static int dn_wait_run(struct sock *sk, long *timeo) if (!*timeo) return -EALREADY;
- prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); + add_wait_queue(sk_sleep(sk), &wait); for(;;) { release_sock(sk); if (scp->state == DN_CI || scp->state == DN_CC) - *timeo = schedule_timeout(*timeo); + *timeo = wait_woken(&wait, TASK_INTERRUPTIBLE, *timeo); lock_sock(sk); err = 0; if (scp->state == DN_RUN) @@ -885,9 +884,8 @@ static int dn_wait_run(struct sock *sk, long *timeo) err = -ETIMEDOUT; if (!*timeo) break; - prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); } - finish_wait(sk_sleep(sk), &wait); + remove_wait_queue(sk_sleep(sk), &wait); out: if (err == 0) { sk->sk_socket->state = SS_CONNECTED; @@ -1032,16 +1030,16 @@ static void dn_user_copy(struct sk_buff *skb, struct optdata_dn *opt)
static struct sk_buff *dn_wait_for_connect(struct sock *sk, long *timeo) { - DEFINE_WAIT(wait); + DEFINE_WAIT_FUNC(wait, woken_wake_function); struct sk_buff *skb = NULL; int err = 0;
- prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); + add_wait_queue(sk_sleep(sk), &wait); for(;;) { release_sock(sk); skb = skb_dequeue(&sk->sk_receive_queue); if (skb == NULL) { - *timeo = schedule_timeout(*timeo); + *timeo = wait_woken(&wait, TASK_INTERRUPTIBLE, *timeo); skb = skb_dequeue(&sk->sk_receive_queue); } lock_sock(sk); @@ -1056,9 +1054,8 @@ static struct sk_buff *dn_wait_for_connect(struct sock *sk, long *timeo) err = -EAGAIN; if (!*timeo) break; - prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); } - finish_wait(sk_sleep(sk), &wait); + remove_wait_queue(sk_sleep(sk), &wait);
return skb == NULL ? ERR_PTR(err) : skb; }
From: Nicholas Piggin npiggin@gmail.com
[ Upstream commit bd31ecf44b8e18ccb1e5f6b50f85de6922a60de3 ]
When running with CPU_FTR_P9_TM_HV_ASSIST, HFSCR[TM] is set for the guest even if the host has CONFIG_PPC_TRANSACTIONAL_MEM=n, which leaves the host unprepared to handle guest exits while transactional.
Normal guests don't have a problem because the HTM capability will not be advertised, but a rogue or buggy one could crash the host.
Fixes: 4bb3c7a0208f ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9") Reported-by: Alexey Kardashevskiy aik@ozlabs.ru Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210716024310.164448-1-npiggin@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kvm/book3s_hv.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 2325b7a6e95f..bd7350a608d4 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -2366,8 +2366,10 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu) HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP | HFSCR_PREFIX; if (cpu_has_feature(CPU_FTR_HVMODE)) { vcpu->arch.hfscr &= mfspr(SPRN_HFSCR); +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM if (cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)) vcpu->arch.hfscr |= HFSCR_TM; +#endif } if (cpu_has_feature(CPU_FTR_TM_COMP)) vcpu->arch.hfscr |= HFSCR_TM;
From: Nicholas Piggin npiggin@gmail.com
[ Upstream commit bc4188a2f56e821ea057aca6bf444e138d06c252 ]
vcpu_put is not called if the user copy fails. This can result in preempt notifier corruption and crashes, among other issues.
Fixes: b3cebfe8c1ca ("KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl") Reported-by: Alexey Kardashevskiy aik@ozlabs.ru Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210716024310.164448-2-npiggin@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kvm/powerpc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 32fa0fa3d4ff..543db9157f3b 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -2041,9 +2041,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp, { struct kvm_enable_cap cap; r = -EFAULT; - vcpu_load(vcpu); if (copy_from_user(&cap, argp, sizeof(cap))) goto out; + vcpu_load(vcpu); r = kvm_vcpu_ioctl_enable_cap(vcpu, &cap); vcpu_put(vcpu); break; @@ -2067,9 +2067,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp, case KVM_DIRTY_TLB: { struct kvm_dirty_tlb dirty; r = -EFAULT; - vcpu_load(vcpu); if (copy_from_user(&dirty, argp, sizeof(dirty))) goto out; + vcpu_load(vcpu); r = kvm_vcpu_ioctl_dirty_tlb(vcpu, &dirty); vcpu_put(vcpu); break;
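The reordering above can be pictured with a toy model in which load and put simply adjust a counter. This is a userspace sketch, not kernel code: every toy_* name is made up, and toy_copy() merely imitates a copy_from_user() that can fail.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a vCPU: load_depth must be 0 whenever we return. */
struct toy_vcpu {
	int load_depth;
	int last_arg;
};

static void toy_load(struct toy_vcpu *v) { v->load_depth++; }
static void toy_put(struct toy_vcpu *v)  { v->load_depth--; }

/* Stand-in for copy_from_user(): returns non-zero when src is NULL. */
static int toy_copy(void *dst, const void *src, size_t n)
{
	if (!src)
		return 1;
	memcpy(dst, src, n);
	return 0;
}

/* Fixed ordering: copy first, so the early exit has nothing to undo. */
int toy_ioctl(struct toy_vcpu *v, const int *uarg)
{
	int arg;

	if (toy_copy(&arg, uarg, sizeof(arg)))
		return -EFAULT;
	toy_load(v);
	v->last_arg = arg;	/* the work done while loaded */
	toy_put(v);
	return 0;
}
```

With the old ordering, the failing-copy path would return with load_depth still at 1, which is the imbalance the commit message describes.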
From: Pavel Skripkin paskripkin@gmail.com
[ Upstream commit f5051bcece50140abd1a11a2d36dc3ec5484fc32 ]
Syzbot reported a memory leak in tcindex_set_parms(). The cause was the perfect hash not being freed in tcindex_partial_destroy_work().
In tcindex_set_parms(), a new tcindex_data is allocated and some fields are copied from the old one, but not the perfect hash. Since tcindex_partial_destroy_work() is the destroy function for the old tcindex_data, it needs to free the perfect hash to avoid the leak.
Reported-and-tested-by: syzbot+f0bbb2287b8993d4fa74@syzkaller.appspotmail.com Fixes: 331b72922c5f ("net: sched: RCU cls_tcindex") Signed-off-by: Pavel Skripkin paskripkin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/cls_tcindex.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c index 5b274534264c..e9a8a2c86bbd 100644 --- a/net/sched/cls_tcindex.c +++ b/net/sched/cls_tcindex.c @@ -278,6 +278,8 @@ static int tcindex_filter_result_init(struct tcindex_filter_result *r, TCA_TCINDEX_POLICE); }
+static void tcindex_free_perfect_hash(struct tcindex_data *cp); + static void tcindex_partial_destroy_work(struct work_struct *work) { struct tcindex_data *p = container_of(to_rcu_work(work), @@ -285,7 +287,8 @@ static void tcindex_partial_destroy_work(struct work_struct *work) rwork);
rtnl_lock(); - kfree(p->perfect); + if (p->perfect) + tcindex_free_perfect_hash(p); kfree(p); rtnl_unlock(); }
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 2f3fdd8d4805015fa964807e1c7f3d88f31bd389 ]
After commit ca84bd058dae ("sctp: copy the optval from user space in sctp_setsockopt"), sctp_setsockopt() allocates memory sized by optlen, so the allocation fails and an error is returned when user space passes a huge optlen.
This breaks some sockopts, such as SCTP_HMAC_IDENT, SCTP_RESET_STREAMS and SCTP_AUTH_KEY: previously, a huge optlen was trimmed to the largest value the sockopt needs instead of failing the allocation and returning an error.
Fix the allocation failure by trimming a huge optlen from user space to the largest size any sctp sockopt may need. That size comes from sctp_setsockopt_reset_streams() for SCTP_RESET_STREAMS, which is larger than those for SCTP_HMAC_IDENT and SCTP_AUTH_KEY.
Fixes: ca84bd058dae ("sctp: copy the optval from user space in sctp_setsockopt") Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sctp/socket.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 3ac6b21ecf2c..e872bc50bbe6 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4471,6 +4471,10 @@ static int sctp_setsockopt(struct sock *sk, int level, int optname, }
if (optlen > 0) { + /* Trim it to the biggest size sctp sockopt may need if necessary */ + optlen = min_t(unsigned int, optlen, + PAGE_ALIGN(USHRT_MAX + + sizeof(__u16) * sizeof(struct sctp_reset_streams))); kopt = memdup_sockptr(optval, optlen); if (IS_ERR(kopt)) return PTR_ERR(kopt);
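For a feel of the numbers behind the new cap, the following userspace sketch evaluates the same expression. It assumes 4 KiB pages and mirrors what I believe is the fixed part of struct sctp_reset_streams (a 32-bit association id followed by two 16-bit fields); both are assumptions, and all toy_* names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE     ((size_t)4096)	/* assumption: 4 KiB pages */
#define TOY_PAGE_ALIGN(x) (((x) + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1))

/* Assumed mirror of struct sctp_reset_streams; the fixed part is 8 bytes. */
struct toy_reset_streams {
	int32_t  srs_assoc_id;
	uint16_t srs_flags;
	uint16_t srs_number_streams;
	uint16_t srs_stream_list[];
};

/* The upper bound applied to optlen before the buffer is duplicated. */
unsigned int toy_sctp_optlen_cap(void)
{
	return (unsigned int)TOY_PAGE_ALIGN(USHRT_MAX +
			sizeof(uint16_t) * sizeof(struct toy_reset_streams));
}

/* Equivalent of the min_t() in the hunk above. */
unsigned int toy_trim_optlen(unsigned int optlen)
{
	unsigned int cap = toy_sctp_optlen_cap();

	return optlen < cap ? optlen : cap;
}
```

Under those assumptions the cap works out to 65535 + 2 * 8 = 65551 bytes, rounded up to 17 pages, i.e. 69632 bytes. Ordinary small option lengths pass through unchanged.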
From: Nguyen Dinh Phi phind.uet@gmail.com
[ Upstream commit 517a16b1a88bdb6b530f48d5d153478b2552d9a8 ]
Commit 63346650c1a9 ("netrom: switch to sock timer API") switched netrom to the sock timer API, replacing mod_timer() with sk_reset_timer() and del_timer() with sk_stop_timer().
sk_reset_timer() increases the sock refcount when it is called on an inactive timer. So when the timer expires, the handler has to drop that reference itself; otherwise the sock refcount becomes unbalanced and the sock is never freed.
Signed-off-by: Nguyen Dinh Phi phind.uet@gmail.com Reported-by: syzbot+10f1194569953b72f1ae@syzkaller.appspotmail.com Fixes: 63346650c1a9 ("netrom: switch to sock timer API") Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/netrom/nr_timer.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/net/netrom/nr_timer.c b/net/netrom/nr_timer.c index 9115f8a7dd45..a8da88db7893 100644 --- a/net/netrom/nr_timer.c +++ b/net/netrom/nr_timer.c @@ -121,11 +121,9 @@ static void nr_heartbeat_expiry(struct timer_list *t) is accepted() it isn't 'dead' so doesn't get removed. */ if (sock_flag(sk, SOCK_DESTROY) || (sk->sk_state == TCP_LISTEN && sock_flag(sk, SOCK_DEAD))) { - sock_hold(sk); bh_unlock_sock(sk); nr_destroy_socket(sk); - sock_put(sk); - return; + goto out; } break;
@@ -146,6 +144,8 @@ static void nr_heartbeat_expiry(struct timer_list *t)
nr_start_heartbeat(sk); bh_unlock_sock(sk); +out: + sock_put(sk); }
static void nr_t2timer_expiry(struct timer_list *t) @@ -159,6 +159,7 @@ static void nr_t2timer_expiry(struct timer_list *t) nr_enquiry_response(sk); } bh_unlock_sock(sk); + sock_put(sk); }
static void nr_t4timer_expiry(struct timer_list *t) @@ -169,6 +170,7 @@ static void nr_t4timer_expiry(struct timer_list *t) bh_lock_sock(sk); nr_sk(sk)->condition &= ~NR_COND_PEER_RX_BUSY; bh_unlock_sock(sk); + sock_put(sk); }
static void nr_idletimer_expiry(struct timer_list *t) @@ -197,6 +199,7 @@ static void nr_idletimer_expiry(struct timer_list *t) sock_set_flag(sk, SOCK_DEAD); } bh_unlock_sock(sk); + sock_put(sk); }
static void nr_t1timer_expiry(struct timer_list *t) @@ -209,8 +212,7 @@ static void nr_t1timer_expiry(struct timer_list *t) case NR_STATE_1: if (nr->n2count == nr->n2) { nr_disconnect(sk, ETIMEDOUT); - bh_unlock_sock(sk); - return; + goto out; } else { nr->n2count++; nr_write_internal(sk, NR_CONNREQ); @@ -220,8 +222,7 @@ static void nr_t1timer_expiry(struct timer_list *t) case NR_STATE_2: if (nr->n2count == nr->n2) { nr_disconnect(sk, ETIMEDOUT); - bh_unlock_sock(sk); - return; + goto out; } else { nr->n2count++; nr_write_internal(sk, NR_DISCREQ); @@ -231,8 +232,7 @@ static void nr_t1timer_expiry(struct timer_list *t) case NR_STATE_3: if (nr->n2count == nr->n2) { nr_disconnect(sk, ETIMEDOUT); - bh_unlock_sock(sk); - return; + goto out; } else { nr->n2count++; nr_requeue_frames(sk); @@ -241,5 +241,7 @@ static void nr_t1timer_expiry(struct timer_list *t) }
nr_start_t1timer(sk); +out: bh_unlock_sock(sk); + sock_put(sk); }
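The reference accounting described in this patch can be modelled in a few lines of userspace C. This is only a sketch of the counting rule (arming an idle timer takes a reference, and the expiry handler gives it back); the toy_* names are hypothetical and no real timer is involved.

```c
#include <stdbool.h>

/* Hypothetical socket model: a refcount plus one timer. */
struct toy_sock {
	int  refcnt;
	bool timer_pending;
};

/* Models sk_reset_timer(): arming an inactive timer takes a reference. */
void toy_reset_timer(struct toy_sock *sk)
{
	if (!sk->timer_pending) {
		sk->timer_pending = true;
		sk->refcnt++;
	}
}

/* Models sk_stop_timer(): cancelling a pending timer drops that reference. */
void toy_stop_timer(struct toy_sock *sk)
{
	if (sk->timer_pending) {
		sk->timer_pending = false;
		sk->refcnt--;
	}
}

/* Models an expiry handler after the fix. */
void toy_timer_expired(struct toy_sock *sk, bool rearm)
{
	sk->timer_pending = false;	/* the timer has fired, so it is idle */
	if (rearm)
		toy_reset_timer(sk);	/* takes a fresh reference */
	sk->refcnt--;			/* drop the reference taken when it was armed */
}
```

Without the final decrement, every expiry would leave one extra reference behind, matching the "never be freed" symptom in the commit message.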
From: Mike Christie michael.christie@oracle.com
[ Upstream commit e746f3451ec7f91dcc9fd67a631239c715850a34 ]
An ISCSI_IFACE_PARAM can have the same value as an ISCSI_NET_PARAM, so when iscsi_iface_attr_is_visible() tries to figure out the type by just checking the value, the two can collide and the wrong type is returned. When we then call into the driver, the param might not match, and the driver reports that the attr should not be visible in sysfs. The patch fixes this by setting the type at the point where we figure out what the param is.
Link: https://lore.kernel.org/r/20210701002559.89533-1-michael.christie@oracle.com Fixes: 3e0f65b34cc9 ("[SCSI] iscsi_transport: Additional parameters for network settings") Signed-off-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_transport_iscsi.c | 90 +++++++++++------------------ 1 file changed, 34 insertions(+), 56 deletions(-)
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c index 2171dab3e5dc..ac07a9ef3578 100644 --- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -440,39 +440,10 @@ static umode_t iscsi_iface_attr_is_visible(struct kobject *kobj, struct device *dev = container_of(kobj, struct device, kobj); struct iscsi_iface *iface = iscsi_dev_to_iface(dev); struct iscsi_transport *t = iface->transport; - int param; - int param_type; + int param = -1;
if (attr == &dev_attr_iface_enabled.attr) param = ISCSI_NET_PARAM_IFACE_ENABLE; - else if (attr == &dev_attr_iface_vlan_id.attr) - param = ISCSI_NET_PARAM_VLAN_ID; - else if (attr == &dev_attr_iface_vlan_priority.attr) - param = ISCSI_NET_PARAM_VLAN_PRIORITY; - else if (attr == &dev_attr_iface_vlan_enabled.attr) - param = ISCSI_NET_PARAM_VLAN_ENABLED; - else if (attr == &dev_attr_iface_mtu.attr) - param = ISCSI_NET_PARAM_MTU; - else if (attr == &dev_attr_iface_port.attr) - param = ISCSI_NET_PARAM_PORT; - else if (attr == &dev_attr_iface_ipaddress_state.attr) - param = ISCSI_NET_PARAM_IPADDR_STATE; - else if (attr == &dev_attr_iface_delayed_ack_en.attr) - param = ISCSI_NET_PARAM_DELAYED_ACK_EN; - else if (attr == &dev_attr_iface_tcp_nagle_disable.attr) - param = ISCSI_NET_PARAM_TCP_NAGLE_DISABLE; - else if (attr == &dev_attr_iface_tcp_wsf_disable.attr) - param = ISCSI_NET_PARAM_TCP_WSF_DISABLE; - else if (attr == &dev_attr_iface_tcp_wsf.attr) - param = ISCSI_NET_PARAM_TCP_WSF; - else if (attr == &dev_attr_iface_tcp_timer_scale.attr) - param = ISCSI_NET_PARAM_TCP_TIMER_SCALE; - else if (attr == &dev_attr_iface_tcp_timestamp_en.attr) - param = ISCSI_NET_PARAM_TCP_TIMESTAMP_EN; - else if (attr == &dev_attr_iface_cache_id.attr) - param = ISCSI_NET_PARAM_CACHE_ID; - else if (attr == &dev_attr_iface_redirect_en.attr) - param = ISCSI_NET_PARAM_REDIRECT_EN; else if (attr == &dev_attr_iface_def_taskmgmt_tmo.attr) param = ISCSI_IFACE_PARAM_DEF_TASKMGMT_TMO; else if (attr == &dev_attr_iface_header_digest.attr) @@ -509,6 +480,38 @@ static umode_t iscsi_iface_attr_is_visible(struct kobject *kobj, param = ISCSI_IFACE_PARAM_STRICT_LOGIN_COMP_EN; else if (attr == &dev_attr_iface_initiator_name.attr) param = ISCSI_IFACE_PARAM_INITIATOR_NAME; + + if (param != -1) + return t->attr_is_visible(ISCSI_IFACE_PARAM, param); + + if (attr == &dev_attr_iface_vlan_id.attr) + param = ISCSI_NET_PARAM_VLAN_ID; + else if (attr == &dev_attr_iface_vlan_priority.attr) + param = ISCSI_NET_PARAM_VLAN_PRIORITY; + else if (attr == &dev_attr_iface_vlan_enabled.attr) + param = ISCSI_NET_PARAM_VLAN_ENABLED; + else if (attr == &dev_attr_iface_mtu.attr) + param = ISCSI_NET_PARAM_MTU; + else if (attr == &dev_attr_iface_port.attr) + param = ISCSI_NET_PARAM_PORT; + else if (attr == &dev_attr_iface_ipaddress_state.attr) + param = ISCSI_NET_PARAM_IPADDR_STATE; + else if (attr == &dev_attr_iface_delayed_ack_en.attr) + param = ISCSI_NET_PARAM_DELAYED_ACK_EN; + else if (attr == &dev_attr_iface_tcp_nagle_disable.attr) + param = ISCSI_NET_PARAM_TCP_NAGLE_DISABLE; + else if (attr == &dev_attr_iface_tcp_wsf_disable.attr) + param = ISCSI_NET_PARAM_TCP_WSF_DISABLE; + else if (attr == &dev_attr_iface_tcp_wsf.attr) + param = ISCSI_NET_PARAM_TCP_WSF; + else if (attr == &dev_attr_iface_tcp_timer_scale.attr) + param = ISCSI_NET_PARAM_TCP_TIMER_SCALE; + else if (attr == &dev_attr_iface_tcp_timestamp_en.attr) + param = ISCSI_NET_PARAM_TCP_TIMESTAMP_EN; + else if (attr == &dev_attr_iface_cache_id.attr) + param = ISCSI_NET_PARAM_CACHE_ID; + else if (attr == &dev_attr_iface_redirect_en.attr) + param = ISCSI_NET_PARAM_REDIRECT_EN; else if (iface->iface_type == ISCSI_IFACE_TYPE_IPV4) { if (attr == &dev_attr_ipv4_iface_ipaddress.attr) param = ISCSI_NET_PARAM_IPV4_ADDR; @@ -599,32 +602,7 @@ static umode_t iscsi_iface_attr_is_visible(struct kobject *kobj, return 0; }
- switch (param) { - case ISCSI_IFACE_PARAM_DEF_TASKMGMT_TMO: - case ISCSI_IFACE_PARAM_HDRDGST_EN: - case ISCSI_IFACE_PARAM_DATADGST_EN: - case ISCSI_IFACE_PARAM_IMM_DATA_EN: - case ISCSI_IFACE_PARAM_INITIAL_R2T_EN: - case ISCSI_IFACE_PARAM_DATASEQ_INORDER_EN: - case ISCSI_IFACE_PARAM_PDU_INORDER_EN: - case ISCSI_IFACE_PARAM_ERL: - case ISCSI_IFACE_PARAM_MAX_RECV_DLENGTH: - case ISCSI_IFACE_PARAM_FIRST_BURST: - case ISCSI_IFACE_PARAM_MAX_R2T: - case ISCSI_IFACE_PARAM_MAX_BURST: - case ISCSI_IFACE_PARAM_CHAP_AUTH_EN: - case ISCSI_IFACE_PARAM_BIDI_CHAP_EN: - case ISCSI_IFACE_PARAM_DISCOVERY_AUTH_OPTIONAL: - case ISCSI_IFACE_PARAM_DISCOVERY_LOGOUT_EN: - case ISCSI_IFACE_PARAM_STRICT_LOGIN_COMP_EN: - case ISCSI_IFACE_PARAM_INITIATOR_NAME: - param_type = ISCSI_IFACE_PARAM; - break; - default: - param_type = ISCSI_NET_PARAM; - } - - return t->attr_is_visible(param_type, param); + return t->attr_is_visible(ISCSI_NET_PARAM, param); }
static struct attribute *iscsi_iface_attrs[] = {
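The collision this patch addresses is easy to reproduce with two independently numbered enums. The sketch below is a userspace illustration with invented names and values; it does not use the real ISCSI_* constants.

```c
/* Two hypothetical parameter spaces whose numeric values overlap. */
enum toy_param_type  { TOY_TYPE_NET, TOY_TYPE_IFACE };
enum toy_net_param   { TOY_NET_VLAN_ID = 1, TOY_NET_MTU = 2 };
enum toy_iface_param { TOY_IFACE_HDRDGST_EN = 1, TOY_IFACE_ERL = 2 };

/* Old approach: infer the type from the numeric value alone. */
enum toy_param_type toy_guess_type(int param)
{
	switch (param) {
	case TOY_IFACE_HDRDGST_EN:
	case TOY_IFACE_ERL:
		return TOY_TYPE_IFACE;
	default:
		return TOY_TYPE_NET;
	}
}

struct toy_attr {
	enum toy_param_type type;
	int param;
};

/* Fixed approach: record the type where the attribute is identified. */
struct toy_attr toy_lookup_mtu(void)
{
	struct toy_attr a = { TOY_TYPE_NET, TOY_NET_MTU };

	return a;
}
```

Here toy_guess_type(TOY_NET_MTU) reports the iface type because 2 is also TOY_IFACE_ERL, while the lookup that carries its type along stays correct.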
From: Dmitry Bogdanov d.bogdanov@yadro.com
[ Upstream commit 6d8e7e7c932162bccd06872362751b0e1d76f5af ]
WRITE SAME(32) command handling reads WRPROTECT at the wrong offset: from byte 1 of the CDB instead of byte 10.
Link: https://lore.kernel.org/r/20210702091655.22818-1-d.bogdanov@yadro.com Fixes: afd73f1b60fc ("target: Perform PROTECT sanity checks for WRITE_SAME") Signed-off-by: Dmitry Bogdanov d.bogdanov@yadro.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/target/target_core_sbc.c | 35 ++++++++++++++++---------------- 1 file changed, 17 insertions(+), 18 deletions(-)
diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index 6e8b8d30938f..eaf8551ebc61 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -25,7 +25,7 @@ #include "target_core_alua.h"
static sense_reason_t -sbc_check_prot(struct se_device *, struct se_cmd *, unsigned char *, u32, bool); +sbc_check_prot(struct se_device *, struct se_cmd *, unsigned char, u32, bool); static sense_reason_t sbc_execute_unmap(struct se_cmd *cmd);
static sense_reason_t @@ -279,14 +279,14 @@ static inline unsigned long long transport_lba_64_ext(unsigned char *cdb) }
static sense_reason_t -sbc_setup_write_same(struct se_cmd *cmd, unsigned char *flags, struct sbc_ops *ops) +sbc_setup_write_same(struct se_cmd *cmd, unsigned char flags, struct sbc_ops *ops) { struct se_device *dev = cmd->se_dev; sector_t end_lba = dev->transport->get_blocks(dev) + 1; unsigned int sectors = sbc_get_write_same_sectors(cmd); sense_reason_t ret;
- if ((flags[0] & 0x04) || (flags[0] & 0x02)) { + if ((flags & 0x04) || (flags & 0x02)) { pr_err("WRITE_SAME PBDATA and LBDATA" " bits not supported for Block Discard" " Emulation\n"); @@ -308,7 +308,7 @@ sbc_setup_write_same(struct se_cmd *cmd, unsigned char *flags, struct sbc_ops *o }
/* We always have ANC_SUP == 0 so setting ANCHOR is always an error */ - if (flags[0] & 0x10) { + if (flags & 0x10) { pr_warn("WRITE SAME with ANCHOR not supported\n"); return TCM_INVALID_CDB_FIELD; } @@ -316,7 +316,7 @@ sbc_setup_write_same(struct se_cmd *cmd, unsigned char *flags, struct sbc_ops *o * Special case for WRITE_SAME w/ UNMAP=1 that ends up getting * translated into block discard requests within backend code. */ - if (flags[0] & 0x08) { + if (flags & 0x08) { if (!ops->execute_unmap) return TCM_UNSUPPORTED_SCSI_OPCODE;
@@ -331,7 +331,7 @@ sbc_setup_write_same(struct se_cmd *cmd, unsigned char *flags, struct sbc_ops *o if (!ops->execute_write_same) return TCM_UNSUPPORTED_SCSI_OPCODE;
- ret = sbc_check_prot(dev, cmd, &cmd->t_task_cdb[0], sectors, true); + ret = sbc_check_prot(dev, cmd, flags >> 5, sectors, true); if (ret) return ret;
@@ -686,10 +686,9 @@ sbc_set_prot_op_checks(u8 protect, bool fabric_prot, enum target_prot_type prot_ }
static sense_reason_t -sbc_check_prot(struct se_device *dev, struct se_cmd *cmd, unsigned char *cdb, +sbc_check_prot(struct se_device *dev, struct se_cmd *cmd, unsigned char protect, u32 sectors, bool is_write) { - u8 protect = cdb[1] >> 5; int sp_ops = cmd->se_sess->sup_prot_ops; int pi_prot_type = dev->dev_attrib.pi_prot_type; bool fabric_prot = false; @@ -737,7 +736,7 @@ sbc_check_prot(struct se_device *dev, struct se_cmd *cmd, unsigned char *cdb, fallthrough; default: pr_err("Unable to determine pi_prot_type for CDB: 0x%02x " - "PROTECT: 0x%02x\n", cdb[0], protect); + "PROTECT: 0x%02x\n", cmd->t_task_cdb[0], protect); return TCM_INVALID_CDB_FIELD; }
@@ -812,7 +811,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) if (sbc_check_dpofua(dev, cmd, cdb)) return TCM_INVALID_CDB_FIELD;
- ret = sbc_check_prot(dev, cmd, cdb, sectors, false); + ret = sbc_check_prot(dev, cmd, cdb[1] >> 5, sectors, false); if (ret) return ret;
@@ -826,7 +825,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) if (sbc_check_dpofua(dev, cmd, cdb)) return TCM_INVALID_CDB_FIELD;
- ret = sbc_check_prot(dev, cmd, cdb, sectors, false); + ret = sbc_check_prot(dev, cmd, cdb[1] >> 5, sectors, false); if (ret) return ret;
@@ -840,7 +839,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) if (sbc_check_dpofua(dev, cmd, cdb)) return TCM_INVALID_CDB_FIELD;
- ret = sbc_check_prot(dev, cmd, cdb, sectors, false); + ret = sbc_check_prot(dev, cmd, cdb[1] >> 5, sectors, false); if (ret) return ret;
@@ -861,7 +860,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) if (sbc_check_dpofua(dev, cmd, cdb)) return TCM_INVALID_CDB_FIELD;
- ret = sbc_check_prot(dev, cmd, cdb, sectors, true); + ret = sbc_check_prot(dev, cmd, cdb[1] >> 5, sectors, true); if (ret) return ret;
@@ -875,7 +874,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) if (sbc_check_dpofua(dev, cmd, cdb)) return TCM_INVALID_CDB_FIELD;
- ret = sbc_check_prot(dev, cmd, cdb, sectors, true); + ret = sbc_check_prot(dev, cmd, cdb[1] >> 5, sectors, true); if (ret) return ret;
@@ -890,7 +889,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) if (sbc_check_dpofua(dev, cmd, cdb)) return TCM_INVALID_CDB_FIELD;
- ret = sbc_check_prot(dev, cmd, cdb, sectors, true); + ret = sbc_check_prot(dev, cmd, cdb[1] >> 5, sectors, true); if (ret) return ret;
@@ -949,7 +948,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) size = sbc_get_size(cmd, 1); cmd->t_task_lba = get_unaligned_be64(&cdb[12]);
- ret = sbc_setup_write_same(cmd, &cdb[10], ops); + ret = sbc_setup_write_same(cmd, cdb[10], ops); if (ret) return ret; break; @@ -1048,7 +1047,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) size = sbc_get_size(cmd, 1); cmd->t_task_lba = get_unaligned_be64(&cdb[2]);
- ret = sbc_setup_write_same(cmd, &cdb[1], ops); + ret = sbc_setup_write_same(cmd, cdb[1], ops); if (ret) return ret; break; @@ -1066,7 +1065,7 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) * Follow sbcr26 with WRITE_SAME (10) and check for the existence * of byte 1 bit 3 UNMAP instead of original reserved field */ - ret = sbc_setup_write_same(cmd, &cdb[1], ops); + ret = sbc_setup_write_same(cmd, cdb[1], ops); if (ret) return ret; break;
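As a quick illustration of the offset issue, the following userspace sketch picks the flags byte the way the patched callers do (byte 10 for the 32-byte variable-length CDB, byte 1 otherwise) and extracts the top three bits. The toy_* helpers are hypothetical and are not part of the target code.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode shared by 32-byte variable-length CDBs such as WRITE SAME(32). */
#define TOY_VARIABLE_LENGTH_CMD 0x7f

/* Offset of the flags byte (WRPROTECT, ANCHOR, UNMAP, ...), per the patch. */
size_t toy_write_same_flags_offset(const uint8_t *cdb)
{
	return cdb[0] == TOY_VARIABLE_LENGTH_CMD ? 10 : 1;
}

/* WRPROTECT occupies the top three bits of the flags byte. */
uint8_t toy_wrprotect(const uint8_t *cdb)
{
	return (uint8_t)(cdb[toy_write_same_flags_offset(cdb)] >> 5);
}
```

For a WRITE SAME(32) CDB, byte 1 holds unrelated data, so reading `cdb[1] >> 5` there yields whatever happens to be in that byte rather than the protection field.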
From: Marek Vasut marex@denx.de
[ Upstream commit 56912da7a68c8356df6a6740476237441b0b792a ]
The original implementation of RPM handling in probe() was mostly correct, except it failed to call pm_runtime_get_*() to activate the hardware. The subsequent fix, 734882a8bf98 ("spi: cadence: Correct initialisation of runtime PM"), breaks the implementation further, to the point where the system using this hard IP on ZynqMP hangs on boot, because it accesses hardware which is gated off.
Undo 734882a8bf98 ("spi: cadence: Correct initialisation of runtime PM") and instead add the missing pm_runtime_get_noresume() and move the RPM disabling all the way to the end of probe(). With that, ZynqMP no longer hangs on boot.
Fixes: 734882a8bf98 ("spi: cadence: Correct initialisation of runtime PM") Signed-off-by: Marek Vasut marex@denx.de Cc: Charles Keepax ckeepax@opensource.cirrus.com Cc: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20210716182133.218640-1-marex@denx.de Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-cadence.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/spi/spi-cadence.c b/drivers/spi/spi-cadence.c index a3afd1b9ac56..ceb16e70d235 100644 --- a/drivers/spi/spi-cadence.c +++ b/drivers/spi/spi-cadence.c @@ -517,6 +517,12 @@ static int cdns_spi_probe(struct platform_device *pdev) goto clk_dis_apb; }
+ pm_runtime_use_autosuspend(&pdev->dev); + pm_runtime_set_autosuspend_delay(&pdev->dev, SPI_AUTOSUSPEND_TIMEOUT); + pm_runtime_get_noresume(&pdev->dev); + pm_runtime_set_active(&pdev->dev); + pm_runtime_enable(&pdev->dev); + ret = of_property_read_u32(pdev->dev.of_node, "num-cs", &num_cs); if (ret < 0) master->num_chipselect = CDNS_SPI_DEFAULT_NUM_CS; @@ -531,11 +537,6 @@ static int cdns_spi_probe(struct platform_device *pdev) /* SPI controller initializations */ cdns_spi_init_hw(xspi);
- pm_runtime_set_active(&pdev->dev); - pm_runtime_enable(&pdev->dev); - pm_runtime_use_autosuspend(&pdev->dev); - pm_runtime_set_autosuspend_delay(&pdev->dev, SPI_AUTOSUSPEND_TIMEOUT); - irq = platform_get_irq(pdev, 0); if (irq <= 0) { ret = -ENXIO; @@ -566,6 +567,9 @@ static int cdns_spi_probe(struct platform_device *pdev)
master->bits_per_word_mask = SPI_BPW_MASK(8);
+ pm_runtime_mark_last_busy(&pdev->dev); + pm_runtime_put_autosuspend(&pdev->dev); + ret = spi_register_master(master); if (ret) { dev_err(&pdev->dev, "spi_register_master failed\n");
From: Robert Richter rrichter@amd.com
[ Upstream commit d2cbbf1fe503c07e466c62f83aa1926d74d15821 ]
During a rework of the initramfs code, the INITRAMFS_COMPRESSION config option was removed in commit 65e00e04e5ae. A leftover dependency on it broke the config option ACPI_TABLE_OVERRIDE_VIA_BUILTIN_INITRD, which is used to enable overriding ACPI tables from the built-in initrd. Fix the dependency.
Fixes: 65e00e04e5ae ("initramfs: refactor the initramfs build rules") Signed-off-by: Robert Richter rrichter@amd.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index edf1558c1105..b5ea34c340cc 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -359,7 +359,7 @@ config ACPI_TABLE_UPGRADE config ACPI_TABLE_OVERRIDE_VIA_BUILTIN_INITRD bool "Override ACPI tables from built-in initrd" depends on ACPI_TABLE_UPGRADE - depends on INITRAMFS_SOURCE!="" && INITRAMFS_COMPRESSION="" + depends on INITRAMFS_SOURCE!="" && INITRAMFS_COMPRESSION_NONE help This option provides functionality to override arbitrary ACPI tables from built-in uncompressed initrd.
From: Kalesh AP kalesh-anakkur.purayil@broadcom.com
[ Upstream commit c81cfb6256d90ea5ba4a6fb280ea3b171be4e05c ]
If the device is already disabled in the reset path and a PCI I/O error is detected before the device can be enabled again, the driver could call pci_disable_device() on an already disabled device. Fix this by calling pci_disable_device() only if the device is still enabled.
Fixes: 6316ea6db93d ("bnxt_en: Enable AER support.") Signed-off-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index db1b89f57079..f003f08de167 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -12901,7 +12901,8 @@ static pci_ers_result_t bnxt_io_error_detected(struct pci_dev *pdev, if (netif_running(netdev)) bnxt_close(netdev);
- pci_disable_device(pdev); + if (pci_is_enabled(pdev)) + pci_disable_device(pdev); bnxt_free_ctx_mem(bp); kfree(bp->ctx); bp->ctx = NULL;
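The guard added above amounts to "only undo what is currently done". A userspace sketch of that idea follows, using a counter that records unbalanced disables; the struct and all toy_* functions are hypothetical and only loosely modelled on an enable count.

```c
#include <stdbool.h>

/* Hypothetical device model: enable_cnt tracks balanced enable/disable. */
struct toy_pci_dev {
	int enable_cnt;
	int unbalanced_disables;	/* how often disable hit a disabled device */
};

bool toy_is_enabled(const struct toy_pci_dev *d)
{
	return d->enable_cnt > 0;
}

void toy_enable(struct toy_pci_dev *d)
{
	d->enable_cnt++;
}

/* Records a complaint instead of going negative on a disabled device. */
void toy_disable(struct toy_pci_dev *d)
{
	if (d->enable_cnt <= 0) {
		d->unbalanced_disables++;
		return;
	}
	d->enable_cnt--;
}

/* Error handler after the fix: disable only when currently enabled. */
void toy_io_error_detected(struct toy_pci_dev *d)
{
	if (toy_is_enabled(d))
		toy_disable(d);
}
```

Calling toy_io_error_detected() twice in a row, or after a reset path has already disabled the device, leaves the counters untouched the second time.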
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit 2c9f046bc377efd1f5e26e74817d5f96e9506c86 ]
The capabilities can change after a firmware upgrade/downgrade, so we should get the up-to-date RoCE capabilities every time bnxt_ulp_probe() is called.
Fixes: 2151fe0830fd ("bnxt_en: Handle RESET_NOTIFY async event from firmware.") Reviewed-by: Somnath Kotur somnath.kotur@broadcom.com Reviewed-by: Edwin Peer edwin.peer@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c index 64dbbb04b043..abf169001bf3 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c @@ -479,15 +479,16 @@ struct bnxt_en_dev *bnxt_ulp_probe(struct net_device *dev) if (!edev) return ERR_PTR(-ENOMEM); edev->en_ops = &bnxt_en_ops_tbl; - if (bp->flags & BNXT_FLAG_ROCEV1_CAP) - edev->flags |= BNXT_EN_FLAG_ROCEV1_CAP; - if (bp->flags & BNXT_FLAG_ROCEV2_CAP) - edev->flags |= BNXT_EN_FLAG_ROCEV2_CAP; edev->net = dev; edev->pdev = bp->pdev; edev->l2_db_size = bp->db_size; edev->l2_db_size_nc = bp->db_size; bp->edev = edev; } + edev->flags &= ~BNXT_EN_FLAG_ROCE_CAP; + if (bp->flags & BNXT_FLAG_ROCEV1_CAP) + edev->flags |= BNXT_EN_FLAG_ROCEV1_CAP; + if (bp->flags & BNXT_FLAG_ROCEV2_CAP) + edev->flags |= BNXT_EN_FLAG_ROCEV2_CAP; return bp->edev; }
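The clear-then-set pattern moved out of the allocation branch can be sketched in userspace as below. The bit positions and TOY_* names are invented for illustration and do not match the driver's flag values.

```c
#include <stdint.h>

/* Hypothetical flag layouts for the L2 driver (bp) and the ULP device (edev). */
#define TOY_BP_ROCEV1  (1u << 0)
#define TOY_BP_ROCEV2  (1u << 1)
#define TOY_EN_ROCEV1  (1u << 0)
#define TOY_EN_ROCEV2  (1u << 1)
#define TOY_EN_ROCE    (TOY_EN_ROCEV1 | TOY_EN_ROCEV2)
#define TOY_EN_OTHER   (1u << 4)	/* unrelated flag that must survive */

/* Recompute the RoCE bits on every probe, leaving other flags alone. */
uint32_t toy_refresh_roce(uint32_t edev_flags, uint32_t bp_flags)
{
	edev_flags &= ~TOY_EN_ROCE;	/* forget stale capabilities first */
	if (bp_flags & TOY_BP_ROCEV1)
		edev_flags |= TOY_EN_ROCEV1;
	if (bp_flags & TOY_BP_ROCEV2)
		edev_flags |= TOY_EN_ROCEV2;
	return edev_flags;
}
```

Clearing first matters for the downgrade case: without it, a capability that the new firmware no longer reports would stay set from the previous probe.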
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit 6cd657cb3ee6f4de57e635b126ffbe0e51d00f1a ]
In the BNXT_FW_RESET_STATE_POLL_VF state in bnxt_fw_reset_task() after all VFs have unregistered, we need to check for BNXT_STATE_ABORT_ERR after we acquire the rtnl_lock. If the flag is set, we need to abort.
Fixes: 230d1f0de754 ("bnxt_en: Handle firmware reset.") Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index f003f08de167..dee6bcfe2fe2 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -11480,6 +11480,10 @@ static void bnxt_fw_reset_task(struct work_struct *work) } bp->fw_reset_timestamp = jiffies; rtnl_lock(); + if (test_bit(BNXT_STATE_ABORT_ERR, &bp->state)) { + rtnl_unlock(); + goto fw_reset_abort; + } bnxt_fw_reset_close(bp); if (bp->fw_cap & BNXT_FW_CAP_ERR_RECOVER_RELOAD) { bp->fw_reset_state = BNXT_FW_RESET_STATE_POLL_FW_DOWN;
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit 96bdd4b9ea7ef9a12db8fdd0ce90e37dffbd3703 ]
Only pass supported VLAN protocol IDs for stripped VLAN tags to the stack. The stack will hit WARN() if the protocol ID is unsupported.
Existing firmware sets up the chip to strip 0x8100, 0x88a8, 0x9100. Only the first two protocols are supported by the kernel.
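
The check can be sketched in plain C as follows. This is a host-byte-order analogue of the kernel's eth_type_vlan() test, not the driver code; the function names are hypothetical, and the 16-bit TPID/TCI split of the metadata word is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TPID_8021Q  0x8100u /* IEEE 802.1Q tag */
#define TPID_8021AD 0x88A8u /* IEEE 802.1ad tag */

/* Only these two protocol IDs are accepted. */
bool tpid_is_supported(uint16_t tpid)
{
	return tpid == TPID_8021Q || tpid == TPID_8021AD;
}

/* Split a metadata word into TPID (assumed upper 16 bits) and TCI
 * (assumed lower 16 bits). Returns false when the TPID is unsupported,
 * in which case the caller should drop the packet. */
bool parse_vlan_metadata(uint32_t metadata, uint16_t *tpid, uint16_t *tci)
{
	uint16_t proto = (uint16_t)(metadata >> 16);

	assert(tpid != NULL && tci != NULL);
	if (!tpid_is_supported(proto))
		return false;
	*tpid = proto;
	*tci = (uint16_t)(metadata & 0xFFFFu);
	return true;
}
```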
Fixes: a196e96bb68f ("bnxt_en: clean up VLAN feature bit handling") Reviewed-by: Somnath Kotur somnath.kotur@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index dee6bcfe2fe2..e3a8c1c6d237 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1633,11 +1633,16 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
if ((tpa_info->flags2 & RX_CMP_FLAGS2_META_FORMAT_VLAN) && (skb->dev->features & BNXT_HW_FEATURE_VLAN_ALL_RX)) { - u16 vlan_proto = tpa_info->metadata >> - RX_CMP_FLAGS2_METADATA_TPID_SFT; + __be16 vlan_proto = htons(tpa_info->metadata >> + RX_CMP_FLAGS2_METADATA_TPID_SFT); u16 vtag = tpa_info->metadata & RX_CMP_FLAGS2_METADATA_TCI_MASK;
- __vlan_hwaccel_put_tag(skb, htons(vlan_proto), vtag); + if (eth_type_vlan(vlan_proto)) { + __vlan_hwaccel_put_tag(skb, vlan_proto, vtag); + } else { + dev_kfree_skb(skb); + return NULL; + } }
skb_checksum_none_assert(skb); @@ -1858,9 +1863,15 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr, (skb->dev->features & BNXT_HW_FEATURE_VLAN_ALL_RX)) { u32 meta_data = le32_to_cpu(rxcmp1->rx_cmp_meta_data); u16 vtag = meta_data & RX_CMP_FLAGS2_METADATA_TCI_MASK; - u16 vlan_proto = meta_data >> RX_CMP_FLAGS2_METADATA_TPID_SFT; + __be16 vlan_proto = htons(meta_data >> + RX_CMP_FLAGS2_METADATA_TPID_SFT);
- __vlan_hwaccel_put_tag(skb, htons(vlan_proto), vtag); + if (eth_type_vlan(vlan_proto)) { + __vlan_hwaccel_put_tag(skb, vlan_proto, vtag); + } else { + dev_kfree_skb(skb); + goto next_rx; + } }
skb_checksum_none_assert(skb);
From: Somnath Kotur somnath.kotur@broadcom.com
[ Upstream commit 11a39259ff79b74bc99f8b7c44075a2d6d5e7ab1 ]
bnxt_half_open_nic() is called during ethtool self test and is protected by rtnl_lock. Firmware reset can be happening at the same time. Only critical portions of the entire firmware reset sequence are protected by the rtnl_lock. It is possible that bnxt_half_open_nic() can be called when the firmware reset sequence is aborting. In that case, bnxt_half_open_nic() needs to check if the ABORT_ERR flag is set and abort if it is. The ethtool self test will fail but the NIC will be brought to a consistent IF_DOWN state.
Without this patch, if bnxt_half_open_nic() were to continue in this error state, it may crash like this:
bnxt_en 0000:82:00.1 enp130s0f1np1: FW reset in progress during close, FW reset will be aborted
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
Process ethtool (pid: 333327, stack limit = 0x0000000046476577)
Call trace:
bnxt_alloc_mem+0x444/0xef0 [bnxt_en]
bnxt_half_open_nic+0x24/0xb8 [bnxt_en]
bnxt_self_test+0x2dc/0x390 [bnxt_en]
ethtool_self_test+0xe0/0x1f8
dev_ethtool+0x1744/0x22d0
dev_ioctl+0x190/0x3e0
sock_ioctl+0x238/0x480
do_vfs_ioctl+0xc4/0x758
ksys_ioctl+0x84/0xb8
__arm64_sys_ioctl+0x28/0x38
el0_svc_handler+0xb0/0x180
el0_svc+0x8/0xc
Fixes: a1301f08c5ac ("bnxt_en: Check abort error state in bnxt_open_nic().") Signed-off-by: Somnath Kotur somnath.kotur@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index e3a8c1c6d237..8f169508a90a 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -9841,6 +9841,12 @@ int bnxt_half_open_nic(struct bnxt *bp) { int rc = 0;
+ if (test_bit(BNXT_STATE_ABORT_ERR, &bp->state)) { + netdev_err(bp->dev, "A previous firmware reset has not completed, aborting half open\n"); + rc = -ENODEV; + goto half_open_err; + } + rc = bnxt_alloc_mem(bp, false); if (rc) { netdev_err(bp->dev, "bnxt_alloc_mem err: %x\n", rc);
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit b16f3299ae1aa3c327e1fb742d0379ae4d6e86f2 ]
Building on ARCH=arc causes a "redefined" warning, so rename this driver's CACHE_LINE_MASK to avoid the warning.
../drivers/net/ethernet/hisilicon/hip04_eth.c:134: warning: "CACHE_LINE_MASK" redefined
  134 | #define CACHE_LINE_MASK 0x3F
In file included from ../include/linux/cache.h:6,
                 from ../include/linux/printk.h:9,
                 from ../include/linux/kernel.h:19,
                 from ../include/linux/list.h:9,
                 from ../include/linux/module.h:12,
                 from ../drivers/net/ethernet/hisilicon/hip04_eth.c:7:
../arch/arc/include/asm/cache.h:17: note: this is the location of the previous definition
   17 | #define CACHE_LINE_MASK (~(L1_CACHE_BYTES - 1))
Fixes: d413779cdd93 ("net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC") Signed-off-by: Randy Dunlap rdunlap@infradead.org Cc: Vineet Gupta vgupta@synopsys.com Cc: Jiangfeng Xiao xiaojiangfeng@huawei.com Cc: "David S. Miller" davem@davemloft.net Cc: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hip04_eth.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 12f6c2442a7a..e53512f6878a 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -131,7 +131,7 @@ /* buf unit size is cache_line_size, which is 64, so the shift is 6 */ #define PPE_BUF_SIZE_SHIFT 6 #define PPE_TX_BUF_HOLD BIT(31) -#define CACHE_LINE_MASK 0x3F +#define SOC_CACHE_LINE_MASK 0x3F #else #define PPE_CFG_QOS_VMID_GRP_SHIFT 8 #define PPE_CFG_RX_CTRL_ALIGN_SHIFT 11 @@ -531,8 +531,8 @@ hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev) #if defined(CONFIG_HI13X1_GMAC) desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV | TX_RELEASE_TO_PPE | priv->port << TX_POOL_SHIFT); - desc->data_offset = (__force u32)cpu_to_be32(phys & CACHE_LINE_MASK); - desc->send_addr = (__force u32)cpu_to_be32(phys & ~CACHE_LINE_MASK); + desc->data_offset = (__force u32)cpu_to_be32(phys & SOC_CACHE_LINE_MASK); + desc->send_addr = (__force u32)cpu_to_be32(phys & ~SOC_CACHE_LINE_MASK); #else desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV); desc->send_addr = (__force u32)cpu_to_be32(phys);
From: Eric Dumazet edumazet@google.com
[ Upstream commit 6f20c8adb1813467ea52c1296d52c4e95978cb2f ]
tfo_active_disable_stamp is read and written locklessly. We need to annotate these accesses appropriately.
Then, we need to perform the atomic_inc(tfo_active_disable_times) after the timestamp has been updated, and thus add barriers to make sure tcp_fastopen_active_should_disable() won't read a stale timestamp.
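
The ordering requirement can be illustrated with a userspace analogue. The sketch below swaps the kernel primitives (WRITE_ONCE/READ_ONCE, smp_mb__before_atomic(), smp_rmb()) for C11 release/acquire atomics, so it shows the same publish-then-count idea rather than the kernel implementation. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-netns TFO state. */
struct tfo_state {
	_Atomic unsigned long disable_stamp; /* time of last disable event */
	atomic_int disable_times;            /* number of disable events */
};

void tfo_init(struct tfo_state *s)
{
	assert(s != NULL);
	atomic_init(&s->disable_stamp, 0UL);
	atomic_init(&s->disable_times, 0);
}

/* Writer: store the timestamp first, then bump the counter with release
 * ordering so the stamp is visible to anyone who sees the new count. */
void tfo_disable(struct tfo_state *s, unsigned long now)
{
	assert(s != NULL);
	atomic_store_explicit(&s->disable_stamp, now, memory_order_relaxed);
	atomic_fetch_add_explicit(&s->disable_times, 1, memory_order_release);
}

/* Reader: load the counter with acquire ordering, then the stamp.
 * Returns true while the back-off window is still open. */
bool tfo_should_disable(struct tfo_state *s, unsigned long now,
			unsigned long base_timeout)
{
	int times, shift;
	unsigned long deadline;

	assert(s != NULL);
	times = atomic_load_explicit(&s->disable_times, memory_order_acquire);
	if (times == 0)
		return false;
	/* Cap the multiplier at 2^6 times the base timeout. */
	shift = times - 1 < 6 ? times - 1 : 6;
	deadline = atomic_load_explicit(&s->disable_stamp, memory_order_relaxed) +
		   (base_timeout << shift);
	/* Wrap-safe comparison, in the spirit of time_before(). */
	return (long)(now - deadline) < 0;
}
```

If the counter were incremented before the stamp was written, a reader could observe a non-zero count together with an old stamp and compute a window that has already expired.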
Fixes: cf1ef3f0719b ("net/tcp_fastopen: Disable active side TFO in certain scenarios") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Wei Wang weiwan@google.com Cc: Yuchung Cheng ycheng@google.com Cc: Neal Cardwell ncardwell@google.com Acked-by: Wei Wang weiwan@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_fastopen.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index af2814c9342a..08548ff23d83 100644 --- a/net/ipv4/tcp_fastopen.c +++ b/net/ipv4/tcp_fastopen.c @@ -507,8 +507,15 @@ void tcp_fastopen_active_disable(struct sock *sk) { struct net *net = sock_net(sk);
+ /* Paired with READ_ONCE() in tcp_fastopen_active_should_disable() */ + WRITE_ONCE(net->ipv4.tfo_active_disable_stamp, jiffies); + + /* Paired with smp_rmb() in tcp_fastopen_active_should_disable(). + * We want net->ipv4.tfo_active_disable_stamp to be updated first. + */ + smp_mb__before_atomic(); atomic_inc(&net->ipv4.tfo_active_disable_times); - net->ipv4.tfo_active_disable_stamp = jiffies; + NET_INC_STATS(net, LINUX_MIB_TCPFASTOPENBLACKHOLE); }
@@ -526,10 +533,16 @@ bool tcp_fastopen_active_should_disable(struct sock *sk) if (!tfo_da_times) return false;
+ /* Paired with smp_mb__before_atomic() in tcp_fastopen_active_disable() */ + smp_rmb(); + /* Limit timout to max: 2^6 * initial timeout */ multiplier = 1 << min(tfo_da_times - 1, 6); - timeout = multiplier * tfo_bh_timeout * HZ; - if (time_before(jiffies, sock_net(sk)->ipv4.tfo_active_disable_stamp + timeout)) + + /* Paired with the WRITE_ONCE() in tcp_fastopen_active_disable(). */ + timeout = READ_ONCE(sock_net(sk)->ipv4.tfo_active_disable_stamp) + + multiplier * tfo_bh_timeout * HZ; + if (time_before(jiffies, timeout)) return true;
/* Mark check bit so we can check for successful active TFO
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
[ Upstream commit 114613f62f42e7cbc1242c4e82076a0153043761 ]
We missed the fact that ElkhartLake platforms have two different PCI IDs. We only added one so the SOF driver is never selected by the autodetection logic for the missing configuration.
BugLink: https://github.com/thesofproject/linux/issues/2990 Fixes: cc8f81c7e625 ('ALSA: hda: fix intel DSP config') Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://lore.kernel.org/r/20210719231746.557325-1-pierre-louis.bossart@linux... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/hda/intel-dsp-config.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/sound/hda/intel-dsp-config.c b/sound/hda/intel-dsp-config.c index fe49e9a97f0e..61e1de6d7be0 100644 --- a/sound/hda/intel-dsp-config.c +++ b/sound/hda/intel-dsp-config.c @@ -318,6 +318,10 @@ static const struct config_entry config_table[] = { .flags = FLAG_SOF | FLAG_SOF_ONLY_IF_DMIC, .device = 0x4b55, }, + { + .flags = FLAG_SOF | FLAG_SOF_ONLY_IF_DMIC, + .device = 0x4b58, + }, #endif
};
From: Chengwen Feng fengchengwen@huawei.com
[ Upstream commit 1b713d14dc3c077ec45e65dab4ea01a8bc41b8c1 ]
Currently, the synchronous mailbox communication between VF and PF uses the following fields:
1. origin_mbx_msg, which combines the message code and subcode and is used to match a request with its response.
2. received_resp, which indicates whether a response has been received.
A mismatch is possible in the following situation:
1. VF sends message A with code=1 subcode=1.
2. PF is blocked for about 500ms while processing message A.
3. VF detects a timeout for message A because it did not get the response within 500ms.
4. VF sends message B with code=1 subcode=1, the same as message A.
5. PF processes the first message A and sends the response to VF.
6. VF treats the response as matching message B because the code/subcode is the same. The request and response are therefore mismatched.
To fix this bug, we use the following scheme:
1. Each message sent from VF is labelled with a match_id, a unique non-zero 16-bit value.
2. The response sent from PF is labelled with the match_id taken from the request.
3. VF uses the match_id to match request and response messages.
As for the PF driver, it only needs to copy the match_id from request to response.
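
The scheme can be sketched in a few lines of plain C. The struct layout and all function names here are hypothetical and do not mirror the hns3 mailbox structures; in particular, the simple incrementing counter used to generate ids is an assumption of this sketch, since this patch only adds the PF-side copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox message, reduced to the fields that matter here. */
struct mbx_msg {
	uint8_t code;
	uint8_t subcode;
	uint16_t match_id; /* 0 is reserved and never handed out */
};

/* VF side: return the next non-zero 16-bit id, skipping 0 on wrap. */
uint16_t next_match_id(uint16_t *last)
{
	assert(last != NULL);
	*last = (uint16_t)(*last + 1);
	if (*last == 0)
		*last = 1;
	return *last;
}

/* PF side: the response carries the match_id copied from the request. */
struct mbx_msg make_response(const struct mbx_msg *req)
{
	assert(req != NULL);
	return *req;
}

/* VF side: accept a response only if it carries the pending request's id. */
bool response_matches(const struct mbx_msg *pending, const struct mbx_msg *resp)
{
	assert(pending != NULL && resp != NULL);
	return resp->match_id != 0 && resp->match_id == pending->match_id;
}
```

With this in place, a late response to a timed-out message A no longer matches a newer message B, even though both share the same code and subcode.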
Fixes: dde1a86e93ca ("net: hns3: Add mailbox support to PF driver") Signed-off-by: Chengwen Feng fengchengwen@huawei.com Signed-off-by: Guangbin Huang huangguangbin2@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h | 6 ++++-- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 1 + 2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h b/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h index 98a9f5e3fe86..98f55fbe6c3d 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h +++ b/drivers/net/ethernet/hisilicon/hns3/hclge_mbx.h @@ -134,7 +134,8 @@ struct hclge_mbx_vf_to_pf_cmd { u8 mbx_need_resp; u8 rsv1[1]; u8 msg_len; - u8 rsv2[3]; + u8 rsv2; + u16 match_id; struct hclge_vf_to_pf_msg msg; };
@@ -144,7 +145,8 @@ struct hclge_mbx_pf_to_vf_cmd { u8 dest_vfid; u8 rsv[3]; u8 msg_len; - u8 rsv1[3]; + u8 rsv1; + u16 match_id; struct hclge_pf_to_vf_msg msg; };
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c index 2c2d53f5c56e..61f6f0287cbe 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c @@ -47,6 +47,7 @@ static int hclge_gen_resp_to_vf(struct hclge_vport *vport,
resp_pf_to_vf->dest_vfid = vf_to_pf_req->mbx_src_vfid; resp_pf_to_vf->msg_len = vf_to_pf_req->msg_len; + resp_pf_to_vf->match_id = vf_to_pf_req->match_id;
resp_pf_to_vf->msg.code = HCLGE_MBX_PF_VF_RESP; resp_pf_to_vf->msg.vf_mbx_msg_code = vf_to_pf_req->msg.code;
From: Jian Shen shenjian15@huawei.com
[ Upstream commit bbfd4506f962e7e6fff8f37f017154a3c3791264 ]
Currently, the VF doesn't enable rx VLAN offload when initializing, and the PF does it for VFs. If the user disables rx VLAN offload for the VF with ethtool -K and reloads the VF driver, the rx VLAN offload state may become inconsistent between hardware and software.
Fix it by enabling rx VLAN offload when the VF initializes.
Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support") Signed-off-by: Jian Shen shenjian15@huawei.com Signed-off-by: Guangbin Huang huangguangbin2@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c index ac6980acb6f0..d3010d5ab366 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c @@ -2518,6 +2518,16 @@ static int hclgevf_rss_init_hw(struct hclgevf_dev *hdev)
static int hclgevf_init_vlan_config(struct hclgevf_dev *hdev) { + struct hnae3_handle *nic = &hdev->nic; + int ret; + + ret = hclgevf_en_hw_strip_rxvtag(nic, true); + if (ret) { + dev_err(&hdev->pdev->dev, + "failed to enable rx vlan offload, ret = %d\n", ret); + return ret; + } + return hclgevf_set_vlan_filter(&hdev->nic, htons(ETH_P_8021Q), 0, false); }
From: Alexandru Tachici alexandru.tachici@analog.com
[ Upstream commit c45c1e82bba130db4f19d9dbc1deefcf4ea994ed ]
The bcm2835_spi_transfer_one function can create a deadlock if it is called while another thread already has the CCF lock.
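
The diff below avoids taking the clock framework lock in the transfer path by reading the rate once at probe time and reusing the cached value. A minimal userspace sketch of the divider computation on a cached rate, following the logic visible in the diff; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical controller state: the rate is read once at probe time. */
struct spi_ctx {
	unsigned long clk_hz;
};

/* Compute the clock divider from the cached rate, without querying the
 * clock framework. A result of 0 selects the slowest setting, which is
 * treated as a divider of 65536. */
unsigned long spi_clock_divider(const struct spi_ctx *ctx, unsigned long spi_hz)
{
	unsigned long cdiv;

	assert(ctx != NULL);
	if (spi_hz >= ctx->clk_hz / 2)
		return 2; /* clk_hz / 2 is the fastest we can go */
	if (spi_hz == 0)
		return 0; /* 0 is the slowest we can go */
	cdiv = (ctx->clk_hz + spi_hz - 1) / spi_hz; /* round up */
	cdiv += cdiv % 2;                           /* must be a multiple of two */
	return cdiv >= 65536 ? 0 : cdiv;
}

/* Effective bus speed for a given divider. */
unsigned long spi_effective_hz(const struct spi_ctx *ctx, unsigned long cdiv)
{
	assert(ctx != NULL);
	return ctx->clk_hz / (cdiv ? cdiv : 65536);
}
```

The trade-off of caching is that a later change to the core clock rate would not be picked up by the transfer path.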
Signed-off-by: Alexandru Tachici alexandru.tachici@analog.com Fixes: f8043872e796 ("spi: add driver for BCM2835") Reviewed-by: Florian Fainelli f.fainelli@gmail.com Link: https://lore.kernel.org/r/20210716210245.13240-2-alexandru.tachici@analog.co... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-bcm2835.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c index 29ee555a42f9..33c32e931767 100644 --- a/drivers/spi/spi-bcm2835.c +++ b/drivers/spi/spi-bcm2835.c @@ -84,6 +84,7 @@ MODULE_PARM_DESC(polling_limit_us, * struct bcm2835_spi - BCM2835 SPI controller * @regs: base address of register map * @clk: core clock, divided to calculate serial clock + * @clk_hz: core clock cached speed * @irq: interrupt, signals TX FIFO empty or RX FIFO ¾ full * @tfr: SPI transfer currently processed * @ctlr: SPI controller reverse lookup @@ -124,6 +125,7 @@ MODULE_PARM_DESC(polling_limit_us, struct bcm2835_spi { void __iomem *regs; struct clk *clk; + unsigned long clk_hz; int irq; struct spi_transfer *tfr; struct spi_controller *ctlr; @@ -1082,19 +1084,18 @@ static int bcm2835_spi_transfer_one(struct spi_controller *ctlr, struct spi_transfer *tfr) { struct bcm2835_spi *bs = spi_controller_get_devdata(ctlr); - unsigned long spi_hz, clk_hz, cdiv; + unsigned long spi_hz, cdiv; unsigned long hz_per_byte, byte_limit; u32 cs = bs->prepare_cs[spi->chip_select];
/* set clock */ spi_hz = tfr->speed_hz; - clk_hz = clk_get_rate(bs->clk);
- if (spi_hz >= clk_hz / 2) { + if (spi_hz >= bs->clk_hz / 2) { cdiv = 2; /* clk_hz/2 is the fastest we can go */ } else if (spi_hz) { /* CDIV must be a multiple of two */ - cdiv = DIV_ROUND_UP(clk_hz, spi_hz); + cdiv = DIV_ROUND_UP(bs->clk_hz, spi_hz); cdiv += (cdiv % 2);
if (cdiv >= 65536) @@ -1102,7 +1103,7 @@ static int bcm2835_spi_transfer_one(struct spi_controller *ctlr, } else { cdiv = 0; /* 0 is the slowest we can go */ } - tfr->effective_speed_hz = cdiv ? (clk_hz / cdiv) : (clk_hz / 65536); + tfr->effective_speed_hz = cdiv ? (bs->clk_hz / cdiv) : (bs->clk_hz / 65536); bcm2835_wr(bs, BCM2835_SPI_CLK, cdiv);
/* handle all the 3-wire mode */ @@ -1318,6 +1319,7 @@ static int bcm2835_spi_probe(struct platform_device *pdev) return bs->irq ? bs->irq : -ENODEV;
clk_prepare_enable(bs->clk); + bs->clk_hz = clk_get_rate(bs->clk);
err = bcm2835_dma_init(ctlr, &pdev->dev, bs); if (err)
From: Peilin Ye peilin.ye@bytedance.com
[ Upstream commit 727d6a8b7ef3d25080fad228b2c4a1d4da5999c6 ]
Currently tcf_skbmod_act() assumes that packets use Ethernet as their L2 protocol, which is not always the case. As an example, for CAN devices:
$ ip link add dev vcan0 type vcan
$ ip link set up vcan0
$ tc qdisc add dev vcan0 root handle 1: htb
$ tc filter add dev vcan0 parent 1: protocol ip prio 10 \
	matchall action skbmod swap mac
Doing the above silently corrupts all the packets. Do not perform skbmod actions for non-Ethernet packets.
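
The guard can be sketched as a plain C function that refuses to touch frames whose link type is not Ethernet. The frame struct and function name are hypothetical; the two link-type constants reuse the numeric values of ARPHRD_ETHER and ARPHRD_CAN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINK_TYPE_ETHER 1   /* same value as ARPHRD_ETHER */
#define LINK_TYPE_CAN   280 /* same value as ARPHRD_CAN */
#define MAC_LEN 6

/* Hypothetical frame descriptor for this sketch. */
struct frame {
	int link_type;
	uint8_t *data;
	size_t len;
};

/* Swap destination and source MAC addresses, but only on Ethernet frames.
 * Returns true if the frame was modified, false if it was left alone. */
bool swap_mac_if_ethernet(struct frame *f)
{
	uint8_t tmp[MAC_LEN];

	assert(f != NULL && f->data != NULL);
	if (f->link_type != LINK_TYPE_ETHER || f->len < 2 * MAC_LEN)
		return false;
	memcpy(tmp, f->data, MAC_LEN);
	memcpy(f->data, f->data + MAC_LEN, MAC_LEN);
	memcpy(f->data + MAC_LEN, tmp, MAC_LEN);
	return true;
}
```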
Fixes: 86da71b57383 ("net_sched: Introduce skbmod action") Reviewed-by: Cong Wang cong.wang@bytedance.com Signed-off-by: Peilin Ye peilin.ye@bytedance.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/act_skbmod.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c index 81a1c67335be..8d17a543cc9f 100644 --- a/net/sched/act_skbmod.c +++ b/net/sched/act_skbmod.c @@ -6,6 +6,7 @@ */
#include <linux/module.h> +#include <linux/if_arp.h> #include <linux/init.h> #include <linux/kernel.h> #include <linux/skbuff.h> @@ -33,6 +34,13 @@ static int tcf_skbmod_act(struct sk_buff *skb, const struct tc_action *a, tcf_lastuse_update(&d->tcf_tm); bstats_cpu_update(this_cpu_ptr(d->common.cpu_bstats), skb);
+ action = READ_ONCE(d->tcf_action); + if (unlikely(action == TC_ACT_SHOT)) + goto drop; + + if (!skb->dev || skb->dev->type != ARPHRD_ETHER) + return action; + /* XXX: if you are going to edit more fields beyond ethernet header * (example when you add IP header replacement or vlan swap) * then MAX_EDIT_LEN needs to change appropriately @@ -41,10 +49,6 @@ static int tcf_skbmod_act(struct sk_buff *skb, const struct tc_action *a, if (unlikely(err)) /* best policy is to drop on the floor */ goto drop;
- action = READ_ONCE(d->tcf_action); - if (unlikely(action == TC_ACT_SHOT)) - goto drop; - p = rcu_dereference_bh(d->skbmod_p); flags = p->flags; if (flags & SKBMOD_F_DMAC)
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 8fb4792f091e608a0a1d353dfdf07ef55a719db5 ]
While running the self-tests on a KASAN enabled kernel, I observed a slab-out-of-bounds splat very similar to the one reported in commit 821bbf79fe46 ("ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions").
We additionally need to take care of fib6_metrics initialization failure when the caller provides an nh.
The fix is similar: explicitly free the route instead of calling fib6_info_release() on a half-initialized object.
Fixes: f88d8ea67fbdb ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c index ccff4738313c..62db3c98424b 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3640,7 +3640,7 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, err = PTR_ERR(rt->fib6_metrics); /* Do not leave garbage there. */ rt->fib6_metrics = (struct dst_metrics *)&dst_default_metrics; - goto out; + goto out_free; }
if (cfg->fc_flags & RTF_ADDRCONF)
From: Luis Henriques lhenriques@suse.de
[ Upstream commit cdb330f4b41ab55feb35487729e883c9e08b8a54 ]
If MDSs aren't available while mounting a filesystem, the session state will transition from SESSION_OPENING to SESSION_CLOSING. And in that scenario check_session_state() will be called from delayed_work() and trigger this WARN.
Avoid this by only WARNing after a session has already been established (i.e., the s_ttl will be different from 0).
Fixes: 62575e270f66 ("ceph: check session state after bumping session->s_seq") Signed-off-by: Luis Henriques lhenriques@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/mds_client.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index d560752b764d..6b00f1d7c8e7 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -4401,7 +4401,7 @@ bool check_session_state(struct ceph_mds_session *s) break; case CEPH_MDS_SESSION_CLOSING: /* Should never reach this when we're unmounting */ - WARN_ON_ONCE(true); + WARN_ON_ONCE(s->s_ttl); fallthrough; case CEPH_MDS_SESSION_NEW: case CEPH_MDS_SESSION_RESTARTING:
From: Zhihao Cheng chengzhihao1@huawei.com
[ Upstream commit 7764656b108cd308c39e9a8554353b8f9ca232a3 ]
The following process:
nvme_probe
  nvme_reset_ctrl
    nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING)
    queue_work(nvme_reset_wq, &ctrl->reset_work)
--------------> nvme_remove
                  nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING)
worker_thread
  process_one_work
    nvme_reset_work
    WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING)
will trigger the WARN_ON in nvme_reset_work():
[ 127.534298] WARNING: CPU: 0 PID: 139 at drivers/nvme/host/pci.c:2594
[ 127.536161] CPU: 0 PID: 139 Comm: kworker/u8:7 Not tainted 5.13.0
[ 127.552518] Call Trace:
[ 127.552840] ? kvm_sched_clock_read+0x25/0x40
[ 127.553936] ? native_send_call_func_single_ipi+0x1c/0x30
[ 127.555117] ? send_call_function_single_ipi+0x9b/0x130
[ 127.556263] ? __smp_call_single_queue+0x48/0x60
[ 127.557278] ? ttwu_queue_wakelist+0xfa/0x1c0
[ 127.558231] ? try_to_wake_up+0x265/0x9d0
[ 127.559120] ? ext4_end_io_rsv_work+0x160/0x290
[ 127.560118] process_one_work+0x28c/0x640
[ 127.561002] worker_thread+0x39a/0x700
[ 127.561833] ? rescuer_thread+0x580/0x580
[ 127.562714] kthread+0x18c/0x1e0
[ 127.563444] ? set_kthread_struct+0x70/0x70
[ 127.564347] ret_from_fork+0x1f/0x30
The preceding problem can be easily reproduced by executing the following script (based on the blktests suite):
test() {
  pdev="$(_get_pci_dev_from_blkdev)"
  sysfs="/sys/bus/pci/devices/${pdev}"
  for ((i = 0; i < 10; i++)); do
    echo 1 > "$sysfs/remove"
    echo 1 > /sys/bus/pci/rescan
  done
}
Since the controller state can be changed to a non-RESETTING state by repeated probe/remove from userspace (which is a normal situation), we can replace the stack-dumping WARN_ON with a warning message.
Fixes: 82b057caefaff ("nvme-pci: fix multiple ctrl removal schedulin") Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 80e1d45b0668..fb48a88d1acb 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -2596,7 +2596,9 @@ static void nvme_reset_work(struct work_struct *work) bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL); int result;
- if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING)) { + if (dev->ctrl.state != NVME_CTRL_RESETTING) { + dev_warn(dev->ctrl.device, "ctrl state %d is not RESETTING\n", + dev->ctrl.state); result = -ENODEV; goto out; }
From: Vincent Palatin vpalatin@chromium.org
[ Upstream commit f3a1a937f7b240be623d989c8553a6d01465d04f ]
This reverts commit 0bd860493f81eb2a46173f6f5e44cc38331c8dbd.
While the patch was working as stated, i.e. preventing the L850-GL LTE modem from crashing on some U3 wake-ups due to a race condition between the host wake-up and the modem-side wake-up when using the MBIM interface, it would also force disabling USB runtime PM on the device.
The increased power consumption is significant for LTE laptops, and given that with decently recent modem firmwares, when the modem hits the bug, it automatically recovers (i.e. it drops from the bus, but automatically re-enumerates after less than half a second, rather than being stuck until a power cycle as it was doing with ancient firmware), for most people, the trade-off now seems in favor of re-enabling it by default.
For people with access to the platform code, the bug can also be worked around successfully by changing the USB3 LFPM polling off-time for the XHCI controller in the BIOS code.
Signed-off-by: Vincent Palatin vpalatin@chromium.org Link: https://lore.kernel.org/r/20210721092516.2775971-1-vpalatin@chromium.org Fixes: 0bd860493f81 ("USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem") Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/core/quirks.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c index 21e7522655ac..a54a735b6384 100644 --- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -502,10 +502,6 @@ static const struct usb_device_id usb_quirk_list[] = { /* DJI CineSSD */ { USB_DEVICE(0x2ca3, 0x0031), .driver_info = USB_QUIRK_NO_LPM },
- /* Fibocom L850-GL LTE Modem */ - { USB_DEVICE(0x2cb7, 0x0007), .driver_info = - USB_QUIRK_IGNORE_REMOTE_WAKEUP }, - /* INTEL VALUE SSD */ { USB_DEVICE(0x8086, 0xf1a5), .driver_info = USB_QUIRK_RESET_RESUME },
From: David Howells dhowells@redhat.com
[ Upstream commit 6c881ca0b3040f3e724eae513117ba4ddef86057 ]
To quote Alexey[1]:
I was adding custom tracepoint to the kernel, grabbed full F34 kernel .config, disabled modules and booted whole shebang as VM kernel.
Then did
perf record -a -e ...
It crashed:
general protection fault, probably for non-canonical address 0x435f5346592e4243: 0000 [#1] SMP PTI
CPU: 1 PID: 842 Comm: cat Not tainted 5.12.6+ #26
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
RIP: 0010:t_show+0x22/0xd0
Then reproducer was narrowed to
# cat /sys/kernel/tracing/printk_formats
Original F34 kernel with modules didn't crash.
So I started to disable options and after disabling AFS everything started working again.
The root cause is that AFS was placing char arrays content into a section full of _pointers_ to strings with predictable consequences.
Non canonical address 435f5346592e4243 is "CB.YFS_" which came from CM_NAME macro.
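
The decoding of the faulting address can be checked with a few lines of C. The sketch unpacks the 64-bit value least-significant byte first, which is the order in which x86-64 stores it in memory; the function name is hypothetical. For 0x435f5346592e4243 the resulting eight bytes begin with "CB.YFS_", i.e. string contents were being interpreted as a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack a 64-bit value into 8 characters, least-significant byte first,
 * and NUL-terminate the result. out must have room for 9 bytes. */
void addr_to_ascii(uint64_t addr, char out[9])
{
	assert(out != NULL);
	for (int i = 0; i < 8; i++)
		out[i] = (char)((addr >> (8 * i)) & 0xFF);
	out[8] = '\0';
}
```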
Steps to reproduce:
CONFIG_AFS=y
CONFIG_TRACING=y
# cat /sys/kernel/tracing/printk_formats
Fix this by the following means:
(1) Add enum->string translation tables in the event header with the AFS and YFS cache/callback manager operations listed by RPC operation ID.
(2) Modify the afs_cb_call tracepoint to print the string from the translation table rather than using the string at the afs_call name pointer.
(3) Switch translation table depending on the service we're being accessed as (AFS or YFS) in the tracepoint print clause. Will this cause problems to userspace utilities?
Note that the symbolic representation of the YFS service ID isn't available to this header, so I've put it in as a number. I'm not sure if this is the best way to do this.
(4) Remove the name wrangling (CM_NAME) macro and put the names directly into the afs_call_type structs in cmservice.c.
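
The idea behind steps (1) to (3) can be sketched as a plain lookup: operation IDs map to names through a table, and the table is chosen by service. This is a userspace illustration only; the kernel tracepoint uses its own symbolic-print machinery rather than a loop like this. The operation IDs and names are taken from the diff, while the function names and the is_yfs parameter (standing in for the service ID check) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct op_name {
	unsigned int op;
	const char *name;
};

/* Subset of the AFS callback manager operations. */
static const struct op_name afs_cm_names[] = {
	{ 204, "CB.CallBack" },
	{ 205, "CB.InitCallBackState" },
	{ 206, "CB.Probe" },
};

/* Subset of the YFS callback manager operations. */
static const struct op_name yfs_cm_names[] = {
	{ 64204, "YFSCB.CallBack" },
};

/* Linear search; returns NULL when the operation ID is not in the table. */
static const char *lookup(const struct op_name *tbl, size_t n, unsigned int op)
{
	assert(tbl != NULL);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].op == op)
			return tbl[i].name;
	return NULL;
}

/* Pick the table by service, then translate the operation ID. */
const char *cm_op_name(bool is_yfs, unsigned int op)
{
	if (is_yfs)
		return lookup(yfs_cm_names,
			      sizeof(yfs_cm_names) / sizeof(yfs_cm_names[0]), op);
	return lookup(afs_cm_names,
		      sizeof(afs_cm_names) / sizeof(afs_cm_names[0]), op);
}
```

Because only integers are recorded and the names live in ordinary tables, nothing has to be placed in the section that holds pointers to tracepoint strings.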
Fixes: 8e8d7f13b6d5a9 ("afs: Add some tracepoints") Reported-by: Alexey Dobriyan (SK hynix) adobriyan@gmail.com Signed-off-by: David Howells dhowells@redhat.com Reviewed-by: Steven Rostedt (VMware) rostedt@goodmis.org Reviewed-by: Marc Dionne marc.dionne@auristor.com cc: Andrew Morton akpm@linux-foundation.org cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/YLAXfvZ+rObEOdc%2F@localhost.localdomain/ [1] Link: https://lore.kernel.org/r/643721.1623754699@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/162430903582.2896199.6098150063997983353.stgit@war... # v1 Link: https://lore.kernel.org/r/162609463957.3133237.15916579353149746363.stgit@wa... # v1 (repost) Link: https://lore.kernel.org/r/162610726860.3408253.445207609466288531.stgit@wart... # v2 Signed-off-by: Sasha Levin sashal@kernel.org --- fs/afs/cmservice.c | 25 ++++---------- include/trace/events/afs.h | 67 +++++++++++++++++++++++++++++++++++--- 2 files changed, 69 insertions(+), 23 deletions(-)
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c index a4e9e6e07e93..2a528b70478c 100644 --- a/fs/afs/cmservice.c +++ b/fs/afs/cmservice.c @@ -29,16 +29,11 @@ static void SRXAFSCB_TellMeAboutYourself(struct work_struct *);
static int afs_deliver_yfs_cb_callback(struct afs_call *);
-#define CM_NAME(name) \ - char afs_SRXCB##name##_name[] __tracepoint_string = \ - "CB." #name - /* * CB.CallBack operation type */ -static CM_NAME(CallBack); static const struct afs_call_type afs_SRXCBCallBack = { - .name = afs_SRXCBCallBack_name, + .name = "CB.CallBack", .deliver = afs_deliver_cb_callback, .destructor = afs_cm_destructor, .work = SRXAFSCB_CallBack, @@ -47,9 +42,8 @@ static const struct afs_call_type afs_SRXCBCallBack = { /* * CB.InitCallBackState operation type */ -static CM_NAME(InitCallBackState); static const struct afs_call_type afs_SRXCBInitCallBackState = { - .name = afs_SRXCBInitCallBackState_name, + .name = "CB.InitCallBackState", .deliver = afs_deliver_cb_init_call_back_state, .destructor = afs_cm_destructor, .work = SRXAFSCB_InitCallBackState, @@ -58,9 +52,8 @@ static const struct afs_call_type afs_SRXCBInitCallBackState = { /* * CB.InitCallBackState3 operation type */ -static CM_NAME(InitCallBackState3); static const struct afs_call_type afs_SRXCBInitCallBackState3 = { - .name = afs_SRXCBInitCallBackState3_name, + .name = "CB.InitCallBackState3", .deliver = afs_deliver_cb_init_call_back_state3, .destructor = afs_cm_destructor, .work = SRXAFSCB_InitCallBackState, @@ -69,9 +62,8 @@ static const struct afs_call_type afs_SRXCBInitCallBackState3 = { /* * CB.Probe operation type */ -static CM_NAME(Probe); static const struct afs_call_type afs_SRXCBProbe = { - .name = afs_SRXCBProbe_name, + .name = "CB.Probe", .deliver = afs_deliver_cb_probe, .destructor = afs_cm_destructor, .work = SRXAFSCB_Probe, @@ -80,9 +72,8 @@ static const struct afs_call_type afs_SRXCBProbe = { /* * CB.ProbeUuid operation type */ -static CM_NAME(ProbeUuid); static const struct afs_call_type afs_SRXCBProbeUuid = { - .name = afs_SRXCBProbeUuid_name, + .name = "CB.ProbeUuid", .deliver = afs_deliver_cb_probe_uuid, .destructor = afs_cm_destructor, .work = SRXAFSCB_ProbeUuid, @@ -91,9 +82,8 @@ static const struct afs_call_type afs_SRXCBProbeUuid = { /* * 
CB.TellMeAboutYourself operation type */ -static CM_NAME(TellMeAboutYourself); static const struct afs_call_type afs_SRXCBTellMeAboutYourself = { - .name = afs_SRXCBTellMeAboutYourself_name, + .name = "CB.TellMeAboutYourself", .deliver = afs_deliver_cb_tell_me_about_yourself, .destructor = afs_cm_destructor, .work = SRXAFSCB_TellMeAboutYourself, @@ -102,9 +92,8 @@ static const struct afs_call_type afs_SRXCBTellMeAboutYourself = { /* * YFS CB.CallBack operation type */ -static CM_NAME(YFS_CallBack); static const struct afs_call_type afs_SRXYFSCB_CallBack = { - .name = afs_SRXCBYFS_CallBack_name, + .name = "YFSCB.CallBack", .deliver = afs_deliver_yfs_cb_callback, .destructor = afs_cm_destructor, .work = SRXAFSCB_CallBack, diff --git a/include/trace/events/afs.h b/include/trace/events/afs.h index 4eef374d4413..5deb9f490f6f 100644 --- a/include/trace/events/afs.h +++ b/include/trace/events/afs.h @@ -174,6 +174,34 @@ enum afs_vl_operation { afs_VL_GetCapabilities = 65537, /* AFS Get VL server capabilities */ };
+enum afs_cm_operation { + afs_CB_CallBack = 204, /* AFS break callback promises */ + afs_CB_InitCallBackState = 205, /* AFS initialise callback state */ + afs_CB_Probe = 206, /* AFS probe client */ + afs_CB_GetLock = 207, /* AFS get contents of CM lock table */ + afs_CB_GetCE = 208, /* AFS get cache file description */ + afs_CB_GetXStatsVersion = 209, /* AFS get version of extended statistics */ + afs_CB_GetXStats = 210, /* AFS get contents of extended statistics data */ + afs_CB_InitCallBackState3 = 213, /* AFS initialise callback state, version 3 */ + afs_CB_ProbeUuid = 214, /* AFS check the client hasn't rebooted */ +}; + +enum yfs_cm_operation { + yfs_CB_Probe = 206, /* YFS probe client */ + yfs_CB_GetLock = 207, /* YFS get contents of CM lock table */ + yfs_CB_XStatsVersion = 209, /* YFS get version of extended statistics */ + yfs_CB_GetXStats = 210, /* YFS get contents of extended statistics data */ + yfs_CB_InitCallBackState3 = 213, /* YFS initialise callback state, version 3 */ + yfs_CB_ProbeUuid = 214, /* YFS check the client hasn't rebooted */ + yfs_CB_GetServerPrefs = 215, + yfs_CB_GetCellServDV = 216, + yfs_CB_GetLocalCell = 217, + yfs_CB_GetCacheConfig = 218, + yfs_CB_GetCellByNum = 65537, + yfs_CB_TellMeAboutYourself = 65538, /* get client capabilities */ + yfs_CB_CallBack = 64204, +}; + enum afs_edit_dir_op { afs_edit_dir_create, afs_edit_dir_create_error, @@ -435,6 +463,32 @@ enum afs_cb_break_reason { EM(afs_YFSVL_GetCellName, "YFSVL.GetCellName") \ E_(afs_VL_GetCapabilities, "VL.GetCapabilities")
+#define afs_cm_operations \ + EM(afs_CB_CallBack, "CB.CallBack") \ + EM(afs_CB_InitCallBackState, "CB.InitCallBackState") \ + EM(afs_CB_Probe, "CB.Probe") \ + EM(afs_CB_GetLock, "CB.GetLock") \ + EM(afs_CB_GetCE, "CB.GetCE") \ + EM(afs_CB_GetXStatsVersion, "CB.GetXStatsVersion") \ + EM(afs_CB_GetXStats, "CB.GetXStats") \ + EM(afs_CB_InitCallBackState3, "CB.InitCallBackState3") \ + E_(afs_CB_ProbeUuid, "CB.ProbeUuid") + +#define yfs_cm_operations \ + EM(yfs_CB_Probe, "YFSCB.Probe") \ + EM(yfs_CB_GetLock, "YFSCB.GetLock") \ + EM(yfs_CB_XStatsVersion, "YFSCB.XStatsVersion") \ + EM(yfs_CB_GetXStats, "YFSCB.GetXStats") \ + EM(yfs_CB_InitCallBackState3, "YFSCB.InitCallBackState3") \ + EM(yfs_CB_ProbeUuid, "YFSCB.ProbeUuid") \ + EM(yfs_CB_GetServerPrefs, "YFSCB.GetServerPrefs") \ + EM(yfs_CB_GetCellServDV, "YFSCB.GetCellServDV") \ + EM(yfs_CB_GetLocalCell, "YFSCB.GetLocalCell") \ + EM(yfs_CB_GetCacheConfig, "YFSCB.GetCacheConfig") \ + EM(yfs_CB_GetCellByNum, "YFSCB.GetCellByNum") \ + EM(yfs_CB_TellMeAboutYourself, "YFSCB.TellMeAboutYourself") \ + E_(yfs_CB_CallBack, "YFSCB.CallBack") + #define afs_edit_dir_ops \ EM(afs_edit_dir_create, "create") \ EM(afs_edit_dir_create_error, "c_fail") \ @@ -567,6 +621,8 @@ afs_server_traces; afs_cell_traces; afs_fs_operations; afs_vl_operations; +afs_cm_operations; +yfs_cm_operations; afs_edit_dir_ops; afs_edit_dir_reasons; afs_eproto_causes; @@ -647,20 +703,21 @@ TRACE_EVENT(afs_cb_call,
TP_STRUCT__entry( __field(unsigned int, call ) - __field(const char *, name ) __field(u32, op ) + __field(u16, service_id ) ),
TP_fast_assign( __entry->call = call->debug_id; - __entry->name = call->type->name; __entry->op = call->operation_ID; + __entry->service_id = call->service_id; ),
- TP_printk("c=%08x %s o=%u", + TP_printk("c=%08x %s", __entry->call, - __entry->name, - __entry->op) + __entry->service_id == 2501 ? + __print_symbolic(__entry->op, yfs_cm_operations) : + __print_symbolic(__entry->op, afs_cm_operations)) );
TRACE_EVENT(afs_call,
From: Sayanta Pattanayak sayanta.pattanayak@arm.com
[ Upstream commit e9a72f874d5b95cef0765bafc56005a50f72c5fe ]
When registering the MDIO bus for a r8169 device, we use the PCI bus/device specifier as a (seemingly) unique device identifier. However the very same BDF number can be used on another PCI segment, which makes the driver fail probing:
[ 27.544136] r8169 0002:07:00.0: enabling device (0000 -> 0003)
[ 27.559734] sysfs: cannot create duplicate filename '/class/mdio_bus/r8169-700'
....
[ 27.684858] libphy: mii_bus r8169-700 failed to register
[ 27.695602] r8169: probe of 0002:07:00.0 failed with error -22
Add the segment number to the device name to make it more unique.
This fixes operation on ARM N1SDP boards, with two boards connected together to form an SMP system, and all on-board devices showing up twice, just on different PCI segments. A similar issue would occur on large systems with many PCI slots and multiple RTL8169 NICs.
Fixes: f1e911d5d0dfd ("r8169: add basic phylib support") Signed-off-by: Sayanta Pattanayak sayanta.pattanayak@arm.com [Andre: expand commit message, use pci_domain_nr()] Signed-off-by: Andre Przywara andre.przywara@arm.com Acked-by: Heiner Kallweit hkallweit1@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/realtek/r8169_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 9010aabd9782..e690a1b09e98 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -5160,7 +5160,8 @@ static int r8169_mdio_register(struct rtl8169_private *tp) new_bus->priv = tp; new_bus->parent = &pdev->dev; new_bus->irq[0] = PHY_IGNORE_INTERRUPT; - snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8169-%x", pci_dev_id(pdev)); + snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8169-%x-%x", + pci_domain_nr(pdev->bus), pci_dev_id(pdev));
new_bus->read = r8169_mdio_read_reg; new_bus->write = r8169_mdio_write_reg;
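The naming collision described above can be reproduced in user space. The following is only a sketch: `BUS_ID_SIZE`, `devid()`, `bus_id_old()`, `bus_id_new()` and `ids_collide()` are hypothetical names for illustration, and `devid()` assumes the usual packing of bus number and devfn into a 16-bit ID.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer size for this sketch; the driver uses MII_BUS_ID_SIZE. */
#define BUS_ID_SIZE 32

/* Pack bus, slot and function into one ID: (bus << 8) | (slot << 3) | fn. */
static unsigned int devid(unsigned int bus, unsigned int slot, unsigned int fn)
{
	return (bus << 8) | (slot << 3) | fn;
}

/* Old scheme: the PCI segment (domain) is not part of the name. */
static void bus_id_old(char *out, unsigned int bus, unsigned int slot,
		       unsigned int fn)
{
	assert(out != NULL);
	snprintf(out, BUS_ID_SIZE, "r8169-%x", devid(bus, slot, fn));
}

/* New scheme: the segment number is placed in front of the device ID. */
static void bus_id_new(char *out, unsigned int domain, unsigned int bus,
		       unsigned int slot, unsigned int fn)
{
	assert(out != NULL);
	snprintf(out, BUS_ID_SIZE, "r8169-%x-%x", domain, devid(bus, slot, fn));
}

/* Two bus names collide when they are byte-for-byte identical. */
static int ids_collide(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

With the old scheme, device 07:00.0 yields the same name on every segment, which matches the duplicate `r8169-700` in the log; with the new scheme the names on segments 0 and 2 differ.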
From: Christoph Hellwig hch@lst.de
[ Upstream commit aaeb7bb061be545251606f4d9c82d710ca2a7c8e ]
When using Write Zeroes on a namespace that has protection information enabled, the behavior without the PRACT bit is counter-intuitive and will generally lead to validation failures when reading the written blocks. Fix this by always setting the PRACT bit, which generates matching PI data on the fly.
Fixes: 6e02318eaea5 ("nvme: add support for the Write Zeroes command") Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Keith Busch kbusch@kernel.org Reviewed-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index f520a71a361f..ff5a16b17133 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -751,7 +751,10 @@ static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns, cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req))); cmnd->write_zeroes.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1); - cmnd->write_zeroes.control = 0; + if (nvme_ns_has_pi(ns)) + cmnd->write_zeroes.control = cpu_to_le16(NVME_RW_PRINFO_PRACT); + else + cmnd->write_zeroes.control = 0; return BLK_STS_OK; }
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 58acd10092268831e49de279446c314727101292 ]
syzbot reported a call trace:
BUG: KASAN: use-after-free in sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112
Call Trace:
 sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112
 sctp_set_owner_w net/sctp/socket.c:131 [inline]
 sctp_sendmsg_to_asoc+0x152e/0x2180 net/sctp/socket.c:1865
 sctp_sendmsg+0x103b/0x1d30 net/sctp/socket.c:2027
 inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:821
 sock_sendmsg_nosec net/socket.c:703 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:723
This is a use-after-free issue caused by not updating asoc->shkey after it was replaced in the key list asoc->endpoint_shared_keys, and the old key was freed.
This patch fixes it by also updating active_key for the asoc when the old key is being replaced with a new one. Note that this issue doesn't exist in sctp_auth_del_key_id(), as it's not allowed to delete the active_key from the asoc.
Fixes: 1b1e0bc99474 ("sctp: add refcnt support for sh_key") Reported-by: syzbot+b774577370208727d12b@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sctp/auth.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/sctp/auth.c b/net/sctp/auth.c index 6f8319b828b0..fe74c5f95630 100644 --- a/net/sctp/auth.c +++ b/net/sctp/auth.c @@ -860,6 +860,8 @@ int sctp_auth_set_key(struct sctp_endpoint *ep, if (replace) { list_del_init(&shkey->key_list); sctp_auth_shkey_release(shkey); + if (asoc && asoc->active_key_id == auth_key->sca_keynumber) + sctp_auth_asoc_init_active_key(asoc, GFP_KERNEL); } list_add(&cur_key->key_list, sh_keys);
From: Wei Wang weiwan@google.com
[ Upstream commit 213ad73d06073b197a02476db3a4998e219ddb06 ]
Multiple complaints have been raised from the TFO users on the internet stating that the TFO blackhole logic is too aggressive and gets falsely triggered too often. (e.g. https://blog.apnic.net/2021/07/05/tcp-fast-open-not-so-fast/) Considering that most middleboxes no longer drop TFO packets, we decide to disable the blackhole logic by setting /proc/sys/net/ipv4/tcp_fastopen_blackhole_timeout_sec to 0 by default.
Fixes: cf1ef3f0719b4 ("net/tcp_fastopen: Disable active side TFO in certain scenarios") Signed-off-by: Wei Wang weiwan@google.com Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Neal Cardwell ncardwell@google.com Acked-by: Soheil Hassas Yeganeh soheil@google.com Acked-by: Yuchung Cheng ycheng@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/networking/ip-sysctl.rst | 2 +- net/ipv4/tcp_fastopen.c | 9 ++++++++- net/ipv4/tcp_ipv4.c | 2 +- 3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 4abcfff15e38..4822a058a81d 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -751,7 +751,7 @@ tcp_fastopen_blackhole_timeout_sec - INTEGER initial value when the blackhole issue goes away. 0 to disable the blackhole detection.
- By default, it is set to 1hr. + By default, it is set to 0 (feature is disabled).
tcp_fastopen_key - list of comma separated 32-digit hexadecimal INTEGERs The list consists of a primary key and an optional backup key. The diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index 08548ff23d83..d49709ba8e16 100644 --- a/net/ipv4/tcp_fastopen.c +++ b/net/ipv4/tcp_fastopen.c @@ -507,6 +507,9 @@ void tcp_fastopen_active_disable(struct sock *sk) { struct net *net = sock_net(sk);
+ if (!sock_net(sk)->ipv4.sysctl_tcp_fastopen_blackhole_timeout) + return; + /* Paired with READ_ONCE() in tcp_fastopen_active_should_disable() */ WRITE_ONCE(net->ipv4.tfo_active_disable_stamp, jiffies);
@@ -526,10 +529,14 @@ void tcp_fastopen_active_disable(struct sock *sk) bool tcp_fastopen_active_should_disable(struct sock *sk) { unsigned int tfo_bh_timeout = sock_net(sk)->ipv4.sysctl_tcp_fastopen_blackhole_timeout; - int tfo_da_times = atomic_read(&sock_net(sk)->ipv4.tfo_active_disable_times); unsigned long timeout; + int tfo_da_times; int multiplier;
+ if (!tfo_bh_timeout) + return false; + + tfo_da_times = atomic_read(&sock_net(sk)->ipv4.tfo_active_disable_times); if (!tfo_da_times) return false;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 5212db9ea157..04e259a04443 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2913,7 +2913,7 @@ static int __net_init tcp_sk_init(struct net *net) net->ipv4.sysctl_tcp_comp_sack_nr = 44; net->ipv4.sysctl_tcp_fastopen = TFO_CLIENT_ENABLE; spin_lock_init(&net->ipv4.tcp_fastopen_ctx_lock); - net->ipv4.sysctl_tcp_fastopen_blackhole_timeout = 60 * 60; + net->ipv4.sysctl_tcp_fastopen_blackhole_timeout = 0; atomic_set(&net->ipv4.tfo_active_disable_times, 0);
/* Reno is always built in */
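The effect of a zero timeout on the check can be modelled outside the kernel. This is a rough sketch, not the kernel function: `tfo_should_disable()` is a hypothetical name, times are plain seconds instead of jiffies, and the exponential backoff (doubling per trigger, capped at 2^6) is an assumption about code that the hunks above do not show.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model of the decision made after this patch.
 * bh_timeout_sec: configured blackhole timeout (0 means detection is off)
 * disable_times:  how often the blackhole logic has triggered so far
 * disable_stamp:  time (seconds) of the most recent trigger
 * now:            current time (seconds)
 */
static bool tfo_should_disable(unsigned int bh_timeout_sec, int disable_times,
			       unsigned long disable_stamp, unsigned long now)
{
	unsigned long timeout;
	int shift;

	assert(disable_times >= 0);

	/* New early exit: a timeout of 0 switches the whole mechanism off. */
	if (!bh_timeout_sec)
		return false;

	/* Never triggered, so there is nothing to wait out. */
	if (!disable_times)
		return false;

	/* Assumed backoff: double the window per trigger, at most 2^6 times. */
	shift = disable_times - 1;
	if (shift > 6)
		shift = 6;
	timeout = (unsigned long)bh_timeout_sec << shift;

	/* Active TFO stays disabled while we are inside the window. */
	return now < disable_stamp + timeout;
}
```

With the old default of 3600 a single trigger suppresses active TFO for an hour; with the new default of 0 the function returns false regardless of history.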
Hi!
[ Upstream commit 213ad73d06073b197a02476db3a4998e219ddb06 ]
Multiple complaints have been raised from the TFO users on the internet stating that the TFO blackhole logic is too aggressive and gets falsely triggered too often. (e.g. https://blog.apnic.net/2021/07/05/tcp-fast-open-not-so-fast/) Considering that most middleboxes no longer drop TFO packets, we decide to disable the blackhole logic by setting /proc/sys/net/ipv4/tcp_fastopen_blackhole_timeout_sec to 0 by default.
I understand this makes sense for mainline, but should we have this in stable? Somebody may still be using broken middlebox with their "stable" server.
Best regards, Pavel
On Wed, Jul 28, 2021 at 3:12 AM Pavel Machek pavel@denx.de wrote:
Hi!
[ Upstream commit 213ad73d06073b197a02476db3a4998e219ddb06 ]
Multiple complaints have been raised from the TFO users on the internet stating that the TFO blackhole logic is too aggressive and gets falsely triggered too often. (e.g. https://blog.apnic.net/2021/07/05/tcp-fast-open-not-so-fast/) Considering that most middleboxes no longer drop TFO packets, we decide to disable the blackhole logic by setting /proc/sys/net/ipv4/tcp_fastopen_blackhole_timeout_sec to 0 by default.
I understand this makes sense for mainline, but should we have this in stable? Somebody may still be using broken middlebox with their "stable" server.
Thank you Pavel for raising this issue. You made a good point.
The enabled-by-default policy has caused disruptions to applications. We have received quite a few others over the years beside the cited report. Other major TFO implementations (e.g. iOS, Windows) do not have such mechanisms and seem to work fine.
On the other hand maybe we do not hear middlebox issues because this mechanism is working. So I am okay to avoid applying to stable and keep in net-next to test this new policy.
Best regards, Pavel
-- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
On Wed, Jul 28, 2021 at 09:32:42 -0700 Yuchung Cheng ycheng@google.com wrote:
On the other hand maybe we do not hear middlebox issues because this mechanism is working. So I am okay to avoid applying to stable and keep in net-next to test this new policy.
This change did indeed break our mail servers at Wikimedia, causing difficult to diagnose timeout errors on sending outgoing email. I resorted to bisecting the kernel, which resulted in finding this commit. I have verified that reverting the sysctl value for tcp_fastopen_blackhole_timeout_sec to 3600 does resolve the timeouts.
Given that it is not clear how a user would discover that this sysctl has changed, or know how to fix a middle box somewhere on a path to their destination, I would love to see this change reverted.
Yours kindly, Jesse Hathaway
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit e40cba9490bab1414d45c2d62defc0ad4f6e4136 ]
This simple series of commands:
ip link add br0 type bridge vlan_filtering 1
ip link set swp0 master br0

fails on sja1105 with the following error:
[ 33.439103] sja1105 spi0.1: vlan-lookup-table needs to have at least the default untagged VLAN
[ 33.447710] sja1105 spi0.1: Invalid config, cannot upload
Warning: sja1105: Failed to change VLAN Ethertype.
For context, sja1105 has 3 operating modes:
- SJA1105_VLAN_UNAWARE: the dsa_8021q_vlans are committed to hardware
- SJA1105_VLAN_FILTERING_FULL: the bridge_vlans are committed to hardware
- SJA1105_VLAN_FILTERING_BEST_EFFORT: both the dsa_8021q_vlans and the bridge_vlans are committed to hardware

Swapping one VLAN list out and another in happens in sja1105_build_vlan_table(), which performs a delta update procedure. That function is called from a few places, notably from sja1105_vlan_filtering() which is called from the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING handler.
The above set of 2 commands fails when run on a kernel pre-commit 8841f6e63f2c ("net: dsa: sja1105: make devlink property best_effort_vlan_filtering true by default"). So the priv->vlan_state transition that takes place is between VLAN-unaware and full VLAN filtering. So the dsa_8021q_vlans are swapped out and the bridge_vlans are swapped in.
So why does it fail?
Well, the bridge driver, through nbp_vlan_init(), first sets up the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING attribute, and only then proceeds to call nbp_vlan_add for the default_pvid.
So when we swap out the dsa_8021q_vlans and swap in the bridge_vlans in the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING handler, there are no bridge VLANs (yet). So we have wiped the VLAN table clean, and the low-level static config checker complains of an invalid configuration. We _will_ add the bridge VLANs using the dynamic config interface, albeit later, when nbp_vlan_add() calls us. So it is natural that it fails.
So why did it ever work?
Surprisingly, it looks like I only tested this configuration with 2 things set up in a particular way:
- a network manager that brings all ports up
- a kernel with CONFIG_VLAN_8021Q=y
It is widely known that commit ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") installs VID 0 to every net device that comes up. DSA treats these VLANs as bridge VLANs, and therefore, in my testing, the list of bridge_vlans was never empty.
However, if CONFIG_VLAN_8021Q is not enabled, or the port is not up when it joins a VLAN-aware bridge, the bridge_vlans list will be temporarily empty, and the sja1105_static_config_reload() call from sja1105_vlan_filtering() will fail.
To fix this, the simplest thing is to keep VID 4095, the one used for CPU-injected control packets since commit ed040abca4c1 ("net: dsa: sja1105: use 4095 as the private VLAN for untagged traffic"), in the list of bridge VLANs too, not just the list of tag_8021q VLANs. This ensures that the list of bridge VLANs will never be empty.
Fixes: ec5ae61076d0 ("net: dsa: sja1105: save/restore VLANs using a delta commit method") Reported-by: Radu Pirea (NXP OSS) radu-nicolae.pirea@oss.nxp.com Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/sja1105/sja1105_main.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index 82852c57cc0e..82b918d36117 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -350,6 +350,12 @@ static int sja1105_init_static_vlan(struct sja1105_private *priv) if (dsa_is_cpu_port(ds, port)) v->pvid = true; list_add(&v->list, &priv->dsa_8021q_vlans); + + v = kmemdup(v, sizeof(*v), GFP_KERNEL); + if (!v) + return -ENOMEM; + + list_add(&v->list, &priv->bridge_vlans); }
((struct sja1105_vlan_lookup_entry *)table->entries)[0] = pvid;
From: Yajun Deng yajun.deng@linux.dev
[ Upstream commit 9d85a6f44bd5585761947f40f7821c9cd78a1bbe ]
The 4th parameter in tc_chain_notify() should be flags rather than seq. Let's change it back correctly.
Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi") Signed-off-by: Yajun Deng yajun.deng@linux.dev Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/cls_api.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 30090794b791..31ac76a9189e 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -2905,7 +2905,7 @@ replay: break; case RTM_GETCHAIN: err = tc_chain_notify(chain, skb, n->nlmsg_seq, - n->nlmsg_seq, n->nlmsg_type, true); + n->nlmsg_flags, n->nlmsg_type, true); if (err < 0) NL_SET_ERR_MSG(extack, "Failed to send chain notify message"); break;
From: Maxime Ripard maxime@cerno.tech
[ Upstream commit 7bbcb919e32d776ca8ddce08abb391ab92eef6a9 ]
The mipi_dsi_device allocated by mipi_dsi_device_register_full() is already free'd on release.
Fixes: 2f733d6194bd ("drm/panel: Add support for the Raspberry Pi 7" Touchscreen.") Signed-off-by: Maxime Ripard maxime@cerno.tech Reviewed-by: Sam Ravnborg sam@ravnborg.org Link: https://patchwork.freedesktop.org/patch/msgid/20210720134525.563936-9-maxime... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/panel/panel-raspberrypi-touchscreen.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/panel/panel-raspberrypi-touchscreen.c b/drivers/gpu/drm/panel/panel-raspberrypi-touchscreen.c index 5e9ccefb88f6..bbdd086be7f5 100644 --- a/drivers/gpu/drm/panel/panel-raspberrypi-touchscreen.c +++ b/drivers/gpu/drm/panel/panel-raspberrypi-touchscreen.c @@ -447,7 +447,6 @@ static int rpi_touchscreen_remove(struct i2c_client *i2c) drm_panel_remove(&ts->base);
mipi_dsi_device_unregister(ts->dsi); - kfree(ts->dsi);
return 0; }
From: Ronnie Sahlberg lsahlber@redhat.com
[ Upstream commit 2485bd7557a7edb4520b4072af464f0a08c8efe0 ]
We only allow sending single credit writes through the SMB2_write() synchronous api so split this into smaller chunks.
Fixes: 966a3cb7c7db ("cifs: improve fallocate emulation")
Signed-off-by: Ronnie Sahlberg lsahlber@redhat.com Reported-by: Namjae Jeon namjae.jeon@samsung.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cifs/smb2ops.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index f6ceb79a995d..442bf422aa01 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -3466,7 +3466,7 @@ static int smb3_simple_fallocate_write_range(unsigned int xid, char *buf) { struct cifs_io_parms io_parms = {0}; - int nbytes; + int rc, nbytes; struct kvec iov[2];
io_parms.netfid = cfile->fid.netfid; @@ -3474,13 +3474,25 @@ static int smb3_simple_fallocate_write_range(unsigned int xid, io_parms.tcon = tcon; io_parms.persistent_fid = cfile->fid.persistent_fid; io_parms.volatile_fid = cfile->fid.volatile_fid; - io_parms.offset = off; - io_parms.length = len;
- /* iov[0] is reserved for smb header */ - iov[1].iov_base = buf; - iov[1].iov_len = io_parms.length; - return SMB2_write(xid, &io_parms, &nbytes, iov, 1); + while (len) { + io_parms.offset = off; + io_parms.length = len; + if (io_parms.length > SMB2_MAX_BUFFER_SIZE) + io_parms.length = SMB2_MAX_BUFFER_SIZE; + /* iov[0] is reserved for smb header */ + iov[1].iov_base = buf; + iov[1].iov_len = io_parms.length; + rc = SMB2_write(xid, &io_parms, &nbytes, iov, 1); + if (rc) + break; + if (nbytes > len) + return -EINVAL; + buf += nbytes; + off += nbytes; + len -= nbytes; + } + return rc; }
static int smb3_simple_fallocate_range(unsigned int xid,
Hi!
[ Upstream commit 2485bd7557a7edb4520b4072af464f0a08c8efe0 ]
We only allow sending single credit writes through the SMB2_write() synchronous api so split this into smaller chunks.
I'm not sure if this matters, but if len is ever zero, we'll return uninitialized value from the function.
Best regards, Pavel
+++ b/fs/cifs/smb2ops.c
@@ -3466,7 +3466,7 @@ static int smb3_simple_fallocate_write_range(unsigned int xid,
                                char *buf)
 {
        struct cifs_io_parms io_parms = {0};
-       int nbytes;
+       int rc, nbytes;
        struct kvec iov[2];

        io_parms.netfid = cfile->fid.netfid;
@@ -3474,13 +3474,25 @@ static int smb3_simple_fallocate_write_range(unsigned int xid,
        io_parms.tcon = tcon;
        io_parms.persistent_fid = cfile->fid.persistent_fid;
        io_parms.volatile_fid = cfile->fid.volatile_fid;
-       io_parms.offset = off;
-       io_parms.length = len;
-       /* iov[0] is reserved for smb header */
-       iov[1].iov_base = buf;
-       iov[1].iov_len = io_parms.length;
-       return SMB2_write(xid, &io_parms, &nbytes, iov, 1);
+       while (len) {
+               io_parms.offset = off;
+               io_parms.length = len;
+               if (io_parms.length > SMB2_MAX_BUFFER_SIZE)
+                       io_parms.length = SMB2_MAX_BUFFER_SIZE;
+               /* iov[0] is reserved for smb header */
+               iov[1].iov_base = buf;
+               iov[1].iov_len = io_parms.length;
+               rc = SMB2_write(xid, &io_parms, &nbytes, iov, 1);
+               if (rc)
+                       break;
+               if (nbytes > len)
+                       return -EINVAL;
+               buf += nbytes;
+               off += nbytes;
+               len -= nbytes;
+       }
+       return rc;
 }

 static int smb3_simple_fallocate_range(unsigned int xid,
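To make the concern about a zero len concrete, here is a small user-space model of the chunked write loop in which rc starts out as 0. It is only a sketch of one possible way to address the point, not the upstream fix: `struct sink`, `sink_write()`, `write_range()` and `MAX_CHUNK` are hypothetical stand-ins for the CIFS structures, SMB2_write() and SMB2_MAX_BUFFER_SIZE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-call limit standing in for SMB2_MAX_BUFFER_SIZE. */
#define MAX_CHUNK 65536

/* Hypothetical write target: counts calls and bytes instead of doing I/O. */
struct sink {
	size_t calls;
	size_t total;
};

/* Stand-in for SMB2_write(): accepts the whole chunk and reports its size. */
static int sink_write(struct sink *s, const char *buf, size_t len,
		      size_t *nbytes)
{
	(void)buf;
	s->calls++;
	s->total += len;
	*nbytes = len;
	return 0;
}

/* Write len bytes in chunks of at most MAX_CHUNK bytes. */
static int write_range(struct sink *s, const char *buf, size_t len)
{
	int rc = 0;	/* initialised, so len == 0 returns a defined value */

	assert(s != NULL);
	while (len) {
		size_t chunk = len > MAX_CHUNK ? MAX_CHUNK : len;
		size_t nbytes = 0;

		rc = sink_write(s, buf, chunk, &nbytes);
		if (rc)
			break;
		if (nbytes > len)
			return -EINVAL;
		buf += nbytes;
		len -= nbytes;
	}
	return rc;
}
```

A 150000-byte range is split into three calls, and a zero-length range returns 0 without calling the writer at all.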
From: Ronnie Sahlberg lsahlber@redhat.com
[ Upstream commit 488968a8945c119859d91bb6a8dc13bf50002f15 ]
Remove the conditional checking for out_data_len and skipping the fallocate if it is 0. This is wrong and will actually turn any legitimate fallocate where the entire region is unallocated into a no-op.
Additionally, before allocating the range, if FALLOC_FL_KEEP_SIZE is set then we need to clamp the length of the fallocate region so as not to extend the size of the file.
Fixes: 966a3cb7c7db ("cifs: improve fallocate emulation") Signed-off-by: Ronnie Sahlberg lsahlber@redhat.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cifs/smb2ops.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index 442bf422aa01..b0b06eb86edf 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -3516,11 +3516,6 @@ static int smb3_simple_fallocate_range(unsigned int xid, (char **)&out_data, &out_data_len); if (rc) goto out; - /* - * It is already all allocated - */ - if (out_data_len == 0) - goto out;
buf = kzalloc(1024 * 1024, GFP_KERNEL); if (buf == NULL) { @@ -3643,6 +3638,24 @@ static long smb3_simple_falloc(struct file *file, struct cifs_tcon *tcon, goto out; }
+ if (keep_size == true) { + /* + * We can not preallocate pages beyond the end of the file + * in SMB2 + */ + if (off >= i_size_read(inode)) { + rc = 0; + goto out; + } + /* + * For fallocates that are partially beyond the end of file, + * clamp len so we only fallocate up to the end of file. + */ + if (off + len > i_size_read(inode)) { + len = i_size_read(inode) - off; + } + } + if ((keep_size == true) || (i_size_read(inode) >= off + len)) { /* * At this point, we are trying to fallocate an internal
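The clamping rule added in this hunk is easy to check in isolation. The helper below is a sketch only; `clamp_keep_size()` is a hypothetical name, and plain `long long` values stand in for the kernel's `loff_t` and `i_size_read(inode)`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the length that should actually be fallocated.
 * A result of 0 means the request lies entirely beyond end of file
 * and there is nothing to do.
 */
static long long clamp_keep_size(long long off, long long len,
				 long long i_size, bool keep_size)
{
	assert(off >= 0 && len >= 0 && i_size >= 0);

	/* Without KEEP_SIZE the file may grow, so nothing is clamped. */
	if (!keep_size)
		return len;

	/* Starts at or past EOF: cannot preallocate there. */
	if (off >= i_size)
		return 0;

	/* Partially past EOF: stop at the end of the file. */
	if (off + len > i_size)
		len = i_size - off;

	return len;
}
```

For a 4096-byte file, a KEEP_SIZE request for 8192 bytes at offset 0 is reduced to 4096 bytes, while the same request at offset 4096 becomes a no-op.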
From: Marcelo Henrique Cerri marcelo.cerri@canonical.com
[ Upstream commit d238692b4b9f2c36e35af4c6e6f6da36184aeb3e ]
Use size_t when capping the count argument received by mem_rw(). Since count is size_t, using min_t(int, ...) can lead to a negative value that will later be passed to access_remote_vm(), which can cause unexpected behavior.
Since we are capping the value to at most PAGE_SIZE, the conversion from size_t to int when passing it to access_remote_vm() as "len" shouldn't be a problem.
Link: https://lkml.kernel.org/r/20210512125215.3348316-1-marcelo.cerri@canonical.c... Reviewed-by: David Disseldorp ddiss@suse.de Signed-off-by: Thadeu Lima de Souza Cascardo cascardo@canonical.com Signed-off-by: Marcelo Henrique Cerri marcelo.cerri@canonical.com Cc: Alexey Dobriyan adobriyan@gmail.com Cc: Souza Cascardo cascardo@canonical.com Cc: Christian Brauner christian.brauner@ubuntu.com Cc: Michel Lespinasse walken@google.com Cc: Helge Deller deller@gmx.de Cc: Oleg Nesterov oleg@redhat.com Cc: Lorenzo Stoakes lstoakes@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/proc/base.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c index df9b17dd92cb..5d52aea8d7e7 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -855,7 +855,7 @@ static ssize_t mem_rw(struct file *file, char __user *buf, flags = FOLL_FORCE | (write ? FOLL_WRITE : 0);
while (count > 0) { - int this_len = min_t(int, count, PAGE_SIZE); + size_t this_len = min_t(size_t, count, PAGE_SIZE);
if (write && copy_from_user(page, buf, this_len)) { copied = -EFAULT;
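The difference between the two casts can be seen with a count just above INT_MAX. This is a sketch under stated assumptions: `PAGE_SZ`, `cap_as_int()` and `cap_as_size_t()` are hypothetical names, and the negative result relies on the out-of-range conversion to int wrapping, which is implementation-defined in C but is what GCC and Clang do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page size for this sketch. */
#define PAGE_SZ 4096

/* Models min_t(int, count, PAGE_SIZE): both operands are cast to int. */
static int cap_as_int(size_t count)
{
	int a = (int)count;	/* wraps to a negative value above INT_MAX */
	int b = (int)PAGE_SZ;

	return a < b ? a : b;
}

/* Models min_t(size_t, count, PAGE_SIZE): the comparison stays unsigned. */
static size_t cap_as_size_t(size_t count)
{
	size_t b = PAGE_SZ;

	return count < b ? count : b;
}
```

For small counts both helpers agree; for a count of 2^31 the int variant returns a negative length, while the size_t variant returns the page size as intended.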
From: Stephen Boyd swboyd@chromium.org
commit 10252bae863d09b9648bed2e035572d207200ca1 upstream.
There's a chance that the IDA allocated in mmc_alloc_host() is not freed for some time because it's freed as part of a class' release function (see mmc_host_classdev_release() where the IDA is freed). If another thread is holding a reference to the class, then only once all balancing device_put() calls (in turn calling kobject_put()) have been made will the IDA be released and usable again.
Normally this isn't a problem because the kobject is released before anything else that may want to use the same number tries to again, but with CONFIG_DEBUG_KOBJECT_RELEASE=y and OF aliases it becomes pretty easy to try to allocate an alias from the IDA twice while the first time it was allocated is still pending a call to ida_simple_remove(). It's also possible to trigger it by using CONFIG_DEBUG_KOBJECT_RELEASE and probe deferring a driver at boot that calls mmc_alloc_host() before trying to get resources that may defer, like clks or regulators.
Instead of allocating from the IDA in this scenario, let's just skip it if we know this is an OF alias. The number is already "claimed" and devices that aren't using OF aliases won't try to use the claimed numbers anyway (see mmc_first_nonreserved_index()). This should avoid any issues with mmc_alloc_host() returning failures from the ida_simple_get() in the case that we're using an OF alias.
Cc: Matthias Schiffer matthias.schiffer@ew.tq-group.com Cc: Sujit Kautkar sujitka@chromium.org Reported-by: Zubin Mithra zsm@chromium.org Fixes: fa2d0aa96941 ("mmc: core: Allow setting slot index via device tree alias") Signed-off-by: Stephen Boyd swboyd@chromium.org Link: https://lore.kernel.org/r/20210623075002.1746924-3-swboyd@chromium.org Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/host.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
--- a/drivers/mmc/core/host.c +++ b/drivers/mmc/core/host.c @@ -74,7 +74,8 @@ static void mmc_host_classdev_release(st { struct mmc_host *host = cls_dev_to_mmc_host(dev); wakeup_source_unregister(host->ws); - ida_simple_remove(&mmc_host_ida, host->index); + if (of_alias_get_id(host->parent->of_node, "mmc") < 0) + ida_simple_remove(&mmc_host_ida, host->index); kfree(host); }
@@ -436,7 +437,7 @@ static int mmc_first_nonreserved_index(v */ struct mmc_host *mmc_alloc_host(int extra, struct device *dev) { - int err; + int index; struct mmc_host *host; int alias_id, min_idx, max_idx;
@@ -449,20 +450,19 @@ struct mmc_host *mmc_alloc_host(int extr
alias_id = of_alias_get_id(dev->of_node, "mmc"); if (alias_id >= 0) { - min_idx = alias_id; - max_idx = alias_id + 1; + index = alias_id; } else { min_idx = mmc_first_nonreserved_index(); max_idx = 0; - }
- err = ida_simple_get(&mmc_host_ida, min_idx, max_idx, GFP_KERNEL); - if (err < 0) { - kfree(host); - return NULL; + index = ida_simple_get(&mmc_host_ida, min_idx, max_idx, GFP_KERNEL); + if (index < 0) { + kfree(host); + return NULL; + } }
- host->index = err; + host->index = index;
dev_set_name(&host->class_dev, "mmc%d", host->index); host->ws = wakeup_source_register(NULL, dev_name(&host->class_dev));
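The allocation policy after this change can be modelled with a toy ID pool. This is a sketch only: `struct id_pool`, `pool_get()`, `pool_put()`, `host_index_alloc()` and `host_index_free()` are hypothetical names, and a fixed-size bitmap stands in for the kernel IDA.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 16	/* hypothetical pool size for the sketch */

struct id_pool {
	bool used[MAX_IDS];
};

/* Hand out the lowest free ID that is >= min, or -1 if none is left. */
static int pool_get(struct id_pool *p, int min)
{
	for (int i = min; i < MAX_IDS; i++) {
		if (!p->used[i]) {
			p->used[i] = true;
			return i;
		}
	}
	return -1;
}

static void pool_put(struct id_pool *p, int id)
{
	assert(id >= 0 && id < MAX_IDS);
	p->used[id] = false;
}

/*
 * alias_id < 0 means the host has no alias. first_nonreserved is the
 * first index above all aliased numbers, so pool users never collide
 * with aliased hosts.
 */
static int host_index_alloc(struct id_pool *p, int alias_id,
			    int first_nonreserved)
{
	if (alias_id >= 0)
		return alias_id;	/* number is already claimed: skip pool */
	return pool_get(p, first_nonreserved);
}

/* Only indices that came from the pool are returned to it. */
static void host_index_free(struct id_pool *p, int alias_id, int index)
{
	if (alias_id < 0)
		pool_put(p, index);
}
```

Because an aliased index never enters the pool, a second allocation with the same alias succeeds even if the release for the first one has not run yet, which is the delayed-release case the commit message describes.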
From: Vasily Gorbik gor@linux.ibm.com
commit f8c2602733c953ed7a16e060640b8e96f9d94b9b upstream.
s390 enforces DYNAMIC_FTRACE if FUNCTION_TRACER is selected. At the same time implementation of ftrace_caller is not compliant with HAVE_DYNAMIC_FTRACE since it doesn't provide implementation of ftrace_update_ftrace_func() and calls ftrace_trace_function() directly.
The subtle difference is that during ftrace code patching ftrace replaces the function tracer via ftrace_update_ftrace_func() and activates it back afterwards. Unexpected direct calls to ftrace_trace_function() during ftrace code patching lead to nullptr-dereferences when tracing is activated for one of the functions which are used during code patching. Those functions currently are:
copy_from_kernel_nofault()
copy_from_kernel_nofault_allowed()
preempt_count_sub() [with debug_defconfig]
preempt_count_add() [with debug_defconfig]
Corresponding KASAN report: BUG: KASAN: nullptr-dereference in function_trace_call+0x316/0x3b0 Read of size 4 at addr 0000000000001e08 by task migration/0/15
CPU: 0 PID: 15 Comm: migration/0 Tainted: G B 5.13.0-41423-g08316af3644d Hardware name: IBM 3906 M04 704 (LPAR) Stopper: multi_cpu_stop+0x0/0x3e0 <- stop_machine_cpuslocked+0x1e4/0x218 Call Trace: [<0000000001f77caa>] show_stack+0x16a/0x1d0 [<0000000001f8de42>] dump_stack+0x15a/0x1b0 [<0000000001f81d56>] print_address_description.constprop.0+0x66/0x2e0 [<000000000082b0ca>] kasan_report+0x152/0x1c0 [<00000000004cfd8e>] function_trace_call+0x316/0x3b0 [<0000000001fb7082>] ftrace_caller+0x7a/0x7e [<00000000006bb3e6>] copy_from_kernel_nofault_allowed+0x6/0x10 [<00000000006bb42e>] copy_from_kernel_nofault+0x3e/0xd0 [<000000000014605c>] ftrace_make_call+0xb4/0x1f8 [<000000000047a1b4>] ftrace_replace_code+0x134/0x1d8 [<000000000047a6e0>] ftrace_modify_all_code+0x120/0x1d0 [<000000000047a7ec>] __ftrace_modify_code+0x5c/0x78 [<000000000042395c>] multi_cpu_stop+0x224/0x3e0 [<0000000000423212>] cpu_stopper_thread+0x33a/0x5a0 [<0000000000243ff2>] smpboot_thread_fn+0x302/0x708 [<00000000002329ea>] kthread+0x342/0x408 [<00000000001066b2>] __ret_from_fork+0x92/0xf0 [<0000000001fb57fa>] ret_from_fork+0xa/0x30
The buggy address belongs to the page: page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1 flags: 0x1ffff00000001000(reserved|node=0|zone=0|lastcpupid=0x1ffff) raw: 1ffff00000001000 0000040000000048 0000040000000048 0000000000000000 raw: 0000000000000000 0000000000000000 ffffffff00000001 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: 0000000000001d00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 0000000000001d80: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
0000000000001e00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
^ 0000000000001e80: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 0000000000001f00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 ==================================================================
To fix that, introduce an ftrace_func callback to be called from ftrace_caller and update it in ftrace_update_ftrace_func().
Fixes: 4cc9bed034d1 ("[S390] cleanup ftrace backend functions") Cc: stable@vger.kernel.org Reviewed-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/ftrace.h | 1 + arch/s390/kernel/ftrace.c | 2 ++ arch/s390/kernel/mcount.S | 4 ++-- 3 files changed, 5 insertions(+), 2 deletions(-)
--- a/arch/s390/include/asm/ftrace.h +++ b/arch/s390/include/asm/ftrace.h @@ -27,6 +27,7 @@ void ftrace_caller(void);
extern char ftrace_graph_caller_end; extern unsigned long ftrace_plt; +extern void *ftrace_func;
struct dyn_arch_ftrace { };
--- a/arch/s390/kernel/ftrace.c +++ b/arch/s390/kernel/ftrace.c @@ -57,6 +57,7 @@ * > brasl %r0,ftrace_caller # offset 0 */
+void *ftrace_func __read_mostly = ftrace_stub; unsigned long ftrace_plt;
static inline void ftrace_generate_orig_insn(struct ftrace_insn *insn) @@ -120,6 +121,7 @@ int ftrace_make_call(struct dyn_ftrace *
int ftrace_update_ftrace_func(ftrace_func_t func) { + ftrace_func = func; return 0; }
--- a/arch/s390/kernel/mcount.S +++ b/arch/s390/kernel/mcount.S @@ -67,13 +67,13 @@ ENTRY(ftrace_caller) #ifdef CONFIG_HAVE_MARCH_Z196_FEATURES aghik %r2,%r0,-MCOUNT_INSN_SIZE lgrl %r4,function_trace_op - lgrl %r1,ftrace_trace_function + lgrl %r1,ftrace_func #else lgr %r2,%r0 aghi %r2,-MCOUNT_INSN_SIZE larl %r4,function_trace_op lg %r4,0(%r4) - larl %r1,ftrace_trace_function + larl %r1,ftrace_func lg %r1,0(%r1) #endif lgr %r3,%r14
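The indirection this patch introduces can be illustrated with a small userspace model. It is a sketch, not the s390 code: `tracer_safe`, `tracer_real`, `core_tracer`, `arch_tracer`, `arch_update_tracer()`, `arch_caller()` and `patch_code()` are hypothetical names. The assumption is that the generic core swaps in a tracer that is safe to run while code is being patched and restores the real one afterwards, and that the call site only ever goes through the arch-owned pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*tracer_fn)(unsigned long ip);

static int safe_hits, real_hits;

static void tracer_safe(unsigned long ip) { (void)ip; safe_hits++; }
static void tracer_real(unsigned long ip) { (void)ip; real_hits++; }

/* Pointer owned by the generic core (models ftrace_trace_function). */
static tracer_fn core_tracer = tracer_real;

/* Pointer owned by the architecture (models ftrace_func). */
static tracer_fn arch_tracer = tracer_safe;

/* Models ftrace_update_ftrace_func(): the only writer of arch_tracer. */
static int arch_update_tracer(tracer_fn fn)
{
	assert(fn != NULL);
	arch_tracer = fn;
	return 0;
}

/* Models ftrace_caller: it never reads core_tracer directly. */
static void arch_caller(unsigned long ip)
{
	arch_tracer(ip);
}

/* Models a patching pass that itself calls traced functions. */
static void patch_code(void)
{
	arch_update_tracer(tracer_safe);
	arch_caller(0x1000);		/* traced call made while patching */
	arch_update_tracer(core_tracer);
}
```

Calls made during `patch_code()` land in the safe tracer; only after the pass finishes does `arch_caller()` reach the real tracer again.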
From: Alexander Egorenkov egorenar@linux.ibm.com
commit 463f36c76fa4ec015c640ff63ccf52e7527abee0 upstream.
The DMA code section of the decompressor must be compiled with expolines if Spectre V2 mitigation has been enabled for the decompressed kernel. This is required because although the decompressor's image contains the DMA code section, it is handed over to the decompressed kernel for use.
Because the DMA code is already slow without expolines, always use expolines regardless of whether the decompressed kernel is using them or not. This simplifies the DMA code by dropping the conditional compilation of expolines.
Fixes: bf72630130c2 ("s390: use proper expoline sections for .dma code") Cc: stable@vger.kernel.org # 5.2 Signed-off-by: Alexander Egorenkov egorenar@linux.ibm.com Reviewed-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/boot/text_dma.S | 19 ++++--------------- 1 file changed, 4 insertions(+), 15 deletions(-)
--- a/arch/s390/boot/text_dma.S +++ b/arch/s390/boot/text_dma.S @@ -9,16 +9,6 @@ #include <asm/errno.h> #include <asm/sigp.h>
-#ifdef CC_USING_EXPOLINE - .pushsection .dma.text.__s390_indirect_jump_r14,"axG" -__dma__s390_indirect_jump_r14: - larl %r1,0f - ex 0,0(%r1) - j . -0: br %r14 - .popsection -#endif - .section .dma.text,"ax" /* * Simplified version of expoline thunk. The normal thunks can not be used here, @@ -27,11 +17,10 @@ __dma__s390_indirect_jump_r14: * affects a few functions that are not performance-relevant. */ .macro BR_EX_DMA_r14 -#ifdef CC_USING_EXPOLINE - jg __dma__s390_indirect_jump_r14 -#else - br %r14 -#endif + larl %r1,0f + ex 0,0(%r1) + j . +0: br %r14 .endm
/*
From: Takashi Iwai tiwai@suse.de
commit 64752a95b702817602d72f109ceaf5ec0780e283 upstream.
Recently we've added a new usb_mixer element type, USB_MIXER_BESPOKEN, but it wasn't added in the table in snd_usb_mixer_dump_cval(). This is no big problem since each bespoken type should have its own dump method, but it still isn't disallowed to use the standard one, so we should cover it as well. Along with it, define the table with the explicit array initializer for avoiding other pitfalls.
Fixes: 785b6f29a795 ("ALSA: usb-audio: scarlett2: Fix wrong resume call") Reported-by: Pavel Machek pavel@denx.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210714084836.1977-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/mixer.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -3274,7 +3274,15 @@ static void snd_usb_mixer_dump_cval(stru { struct usb_mixer_elem_info *cval = mixer_elem_list_to_info(list); static const char * const val_types[] = { - "BOOLEAN", "INV_BOOLEAN", "S8", "U8", "S16", "U16", "S32", "U32", + [USB_MIXER_BOOLEAN] = "BOOLEAN", + [USB_MIXER_INV_BOOLEAN] = "INV_BOOLEAN", + [USB_MIXER_S8] = "S8", + [USB_MIXER_U8] = "U8", + [USB_MIXER_S16] = "S16", + [USB_MIXER_U16] = "U16", + [USB_MIXER_S32] = "S32", + [USB_MIXER_U32] = "U32", + [USB_MIXER_BESPOKEN] = "BESPOKEN", }; snd_iprintf(buffer, " Info: id=%i, control=%i, cmask=0x%x, " "channels=%i, type=\"%s\"\n", cval->head.id,
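The benefit of the explicit array initializer can be seen in a reduced example. This is a sketch with a hypothetical `enum val_type` and `val_type_name()` helper, not the usb-audio code. Each string is tied to its enumerator, so reordering the enum cannot shift names, and an enumerator without an entry (here `VAL_RESERVED`) yields a NULL slot inside the table rather than a read past its end. The fallback to "UNKNOWN" is an addition of this sketch and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced copy of a value-type enum. */
enum val_type {
	VAL_BOOLEAN,
	VAL_INV_BOOLEAN,
	VAL_S8,
	VAL_U8,
	VAL_BESPOKEN,
	VAL_RESERVED,		/* deliberately has no name in the table */
	VAL_TYPE_COUNT
};

/* Sized by the enum, indexed by designator: gaps become NULL. */
static const char * const val_type_names[VAL_TYPE_COUNT] = {
	[VAL_BOOLEAN]     = "BOOLEAN",
	[VAL_INV_BOOLEAN] = "INV_BOOLEAN",
	[VAL_S8]          = "S8",
	[VAL_U8]          = "U8",
	[VAL_BESPOKEN]    = "BESPOKEN",
};

static const char *val_type_name(unsigned int type)
{
	assert(type < VAL_TYPE_COUNT);
	return val_type_names[type] ? val_type_names[type] : "UNKNOWN";
}
```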
From: Alexander Tsoy alexander@tsoy.me
commit b0084afde27fe8a504377dee65f55bc6aa776937 upstream.
These devices have two interfaces, but only the second interface contains the capture endpoint, thus a quirk is required to delay the registration until the second interface appears.
Tested-by: Jakub Fišer jakub@ufiseru.cz Signed-off-by: Alexander Tsoy alexander@tsoy.me Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210721235605.53741-1-alexander@tsoy.me Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -1895,6 +1895,9 @@ static const struct registration_quirk r REG_QUIRK_ENTRY(0x0951, 0x16d8, 2), /* Kingston HyperX AMP */ REG_QUIRK_ENTRY(0x0951, 0x16ed, 2), /* Kingston HyperX Cloud Alpha S */ REG_QUIRK_ENTRY(0x0951, 0x16ea, 2), /* Kingston HyperX Cloud Flight S */ + REG_QUIRK_ENTRY(0x0ecb, 0x1f46, 2), /* JBL Quantum 600 */ + REG_QUIRK_ENTRY(0x0ecb, 0x2039, 2), /* JBL Quantum 400 */ + REG_QUIRK_ENTRY(0x0ecb, 0x203e, 2), /* JBL Quantum 800 */ { 0 } /* terminator */ };
From: Takashi Iwai tiwai@suse.de
commit 1c2b9519159b470ef24b2638f4794e86e2952ab7 upstream.
The SB16 CSP driver may potentially hit a typical ABBA deadlock in two code paths:
In snd_sb_csp_stop(): spin_lock_irqsave(&p->chip->mixer_lock, flags); spin_lock(&p->chip->reg_lock);
In snd_sb_csp_load(): spin_lock_irqsave(&p->chip->reg_lock, flags); spin_lock(&p->chip->mixer_lock);
Also the similar pattern is seen in snd_sb_csp_start().
Although the practical impact is very small (those states aren't triggered in the same running state and this happens only on real hardware, decades old ISA sound boards -- which must be very difficult to find nowadays), it's a real scenario and has to be fixed.
This patch addresses those deadlocks by splitting the locks in snd_sb_csp_start() and snd_sb_csp_stop() for avoiding the nested locks.
Reported-by: Jia-Ju Bai baijiaju1990@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/7b0fcdaf-cd4f-4728-2eae-48c151a92e10@gmail.com Link: https://lore.kernel.org/r/20210716132723.13216-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/isa/sb/sb16_csp.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/sound/isa/sb/sb16_csp.c +++ b/sound/isa/sb/sb16_csp.c @@ -814,6 +814,7 @@ static int snd_sb_csp_start(struct snd_s mixR = snd_sbmixer_read(p->chip, SB_DSP4_PCM_DEV + 1); snd_sbmixer_write(p->chip, SB_DSP4_PCM_DEV, mixL & 0x7); snd_sbmixer_write(p->chip, SB_DSP4_PCM_DEV + 1, mixR & 0x7); + spin_unlock_irqrestore(&p->chip->mixer_lock, flags);
spin_lock(&p->chip->reg_lock); set_mode_register(p->chip, 0xc0); /* c0 = STOP */ @@ -853,6 +854,7 @@ static int snd_sb_csp_start(struct snd_s spin_unlock(&p->chip->reg_lock);
/* restore PCM volume */ + spin_lock_irqsave(&p->chip->mixer_lock, flags); snd_sbmixer_write(p->chip, SB_DSP4_PCM_DEV, mixL); snd_sbmixer_write(p->chip, SB_DSP4_PCM_DEV + 1, mixR); spin_unlock_irqrestore(&p->chip->mixer_lock, flags); @@ -878,6 +880,7 @@ static int snd_sb_csp_stop(struct snd_sb mixR = snd_sbmixer_read(p->chip, SB_DSP4_PCM_DEV + 1); snd_sbmixer_write(p->chip, SB_DSP4_PCM_DEV, mixL & 0x7); snd_sbmixer_write(p->chip, SB_DSP4_PCM_DEV + 1, mixR & 0x7); + spin_unlock_irqrestore(&p->chip->mixer_lock, flags);
spin_lock(&p->chip->reg_lock); if (p->running & SNDRV_SB_CSP_ST_QSOUND) { @@ -892,6 +895,7 @@ static int snd_sb_csp_stop(struct snd_sb spin_unlock(&p->chip->reg_lock);
/* restore PCM volume */ + spin_lock_irqsave(&p->chip->mixer_lock, flags); snd_sbmixer_write(p->chip, SB_DSP4_PCM_DEV, mixL); snd_sbmixer_write(p->chip, SB_DSP4_PCM_DEV + 1, mixR); spin_unlock_irqrestore(&p->chip->mixer_lock, flags);
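The effect of splitting the locks can be checked with a tiny single-threaded lock-order tracker, in the spirit of what lockdep does. This is a model only: `toy_lock()`, `toy_unlock()`, `abba_possible()` and the `csp_*` functions are hypothetical and do no real locking. The tracker records which lock was taken while another was held and reports an ABBA risk when both orders have been observed.

```c
#include <assert.h>
#include <stdbool.h>

enum { MIXER_LOCK, REG_LOCK, NUM_LOCKS };

static bool held[NUM_LOCKS];
/* nested_under[a][b]: lock b was taken at least once while a was held. */
static bool nested_under[NUM_LOCKS][NUM_LOCKS];

static void toy_lock(int id)
{
	assert(!held[id]);
	for (int other = 0; other < NUM_LOCKS; other++)
		if (held[other])
			nested_under[other][id] = true;
	held[id] = true;
}

static void toy_unlock(int id)
{
	assert(held[id]);
	held[id] = false;
}

/* Both nesting orders seen: two threads could deadlock on each other. */
static bool abba_possible(void)
{
	return nested_under[MIXER_LOCK][REG_LOCK] &&
	       nested_under[REG_LOCK][MIXER_LOCK];
}

/* reg_lock, then mixer_lock nested inside, as in snd_sb_csp_load(). */
static void csp_load(void)
{
	toy_lock(REG_LOCK);
	toy_lock(MIXER_LOCK);
	toy_unlock(MIXER_LOCK);
	toy_unlock(REG_LOCK);
}

/* Old stop path: mixer_lock held across the reg_lock section. */
static void csp_stop_nested(void)
{
	toy_lock(MIXER_LOCK);
	toy_lock(REG_LOCK);
	toy_unlock(REG_LOCK);
	toy_unlock(MIXER_LOCK);
}

/* Patched stop path: three separate critical sections, no nesting. */
static void csp_stop_split(void)
{
	toy_lock(MIXER_LOCK);
	toy_unlock(MIXER_LOCK);
	toy_lock(REG_LOCK);
	toy_unlock(REG_LOCK);
	toy_lock(MIXER_LOCK);
	toy_unlock(MIXER_LOCK);
}
```

Running `csp_load()` together with `csp_stop_split()` leaves only one nesting order recorded, while adding `csp_stop_nested()` records the reverse order and flags the conflict.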
From: Hui Wang hui.wang@canonical.com
commit e4efa82660e6d80338c554e45e903714e1b2c27b upstream.
This is a Lenovo ThinkStation machine which uses the codec alc623. There are 2 issues on this machine: the 1st one is the pop noise in the lineout, and the 2nd one is that there are 2 Front Mics and pulseaudio can't handle them. After applying the fixup of ALC623_FIXUP_LENOVO_THINKSTATION_P340 to this machine, the 2 issues are fixed.
Cc: stable@vger.kernel.org Signed-off-by: Hui Wang hui.wang@canonical.com Link: https://lore.kernel.org/r/20210719030231.6870-1-hui.wang@canonical.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -8550,6 +8550,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x17aa, 0x3151, "ThinkCentre Station", ALC283_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x17aa, 0x3176, "ThinkCentre Station", ALC283_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x17aa, 0x3178, "ThinkCentre Station", ALC283_FIXUP_HEADSET_MIC), + SND_PCI_QUIRK(0x17aa, 0x31af, "ThinkCentre Station", ALC623_FIXUP_LENOVO_THINKSTATION_P340), SND_PCI_QUIRK(0x17aa, 0x3818, "Lenovo C940", ALC298_FIXUP_LENOVO_SPK_VOLUME), SND_PCI_QUIRK(0x17aa, 0x3827, "Ideapad S740", ALC285_FIXUP_IDEAPAD_S740_COEF), SND_PCI_QUIRK(0x17aa, 0x3843, "Yoga 9i", ALC287_FIXUP_IDEAPAD_BASS_SPK_AMP),
From: Takashi Iwai tiwai@suse.de
commit 33f735f137c6539e3ceceb515cd1e2a644005b49 upstream.
The BIOS on the MSI Mortar B550m WiFi (MS-7C94) board with AMDGPU seems to disable the pins other than HDMI, although the board has more outputs including DP.
This patch adds the board to the allow list for enabling all pins.
Reported-by: Damjan Georgievski gdamjan@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/CAEk1YH4Jd0a8vfZxORVu7qg+Zsc-K+pR187ezNq8QhJBPW4gp... Link: https://lore.kernel.org/r/20210716135600.24176-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_hdmi.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_hdmi.c +++ b/sound/pci/hda/patch_hdmi.c @@ -1939,6 +1939,7 @@ static int hdmi_add_cvt(struct hda_codec static const struct snd_pci_quirk force_connect_list[] = { SND_PCI_QUIRK(0x103c, 0x870f, "HP", 1), SND_PCI_QUIRK(0x103c, 0x871a, "HP", 1), + SND_PCI_QUIRK(0x1462, 0xec94, "MS-7C94", 1), {} };
From: Alan Young consult.awy@gmail.com
commit 2e2832562c877e6530b8480982d99a4ff90c6777 upstream.
If a 32-bit application is being used with a 64-bit kernel and is using the mmap mechanism to write data, then the SNDRV_PCM_IOCTL_SYNC_PTR ioctl results in calling snd_pcm_ioctl_sync_ptr_compat(). Make this use pcm_lib_apply_appl_ptr() so that the substream's ack() method, if defined, is called.
The snd_pcm_sync_ptr() function, used in the 64-bit ioctl case, already uses pcm_lib_apply_appl_ptr().
Fixes: 9027c4639ef1 ("ALSA: pcm: Call ack() whenever appl_ptr is updated") Signed-off-by: Alan Young consult.awy@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/c441f18c-eb2a-3bdd-299a-696ccca2de9c@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/core/pcm_native.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -3062,9 +3062,14 @@ static int snd_pcm_ioctl_sync_ptr_compat boundary = 0x7fffffff; snd_pcm_stream_lock_irq(substream); /* FIXME: we should consider the boundary for the sync from app */ - if (!(sflags & SNDRV_PCM_SYNC_PTR_APPL)) - control->appl_ptr = scontrol.appl_ptr; - else + if (!(sflags & SNDRV_PCM_SYNC_PTR_APPL)) { + err = pcm_lib_apply_appl_ptr(substream, + scontrol.appl_ptr); + if (err < 0) { + snd_pcm_stream_unlock_irq(substream); + return err; + } + } else scontrol.appl_ptr = control->appl_ptr % boundary; if (!(sflags & SNDRV_PCM_SYNC_PTR_AVAIL_MIN)) control->avail_min = scontrol.avail_min;
From: Takashi Iwai tiwai@suse.de
commit c4824ae7db418aee6f50f308a20b832e58e997fd upstream.
The hw_support_mmap() doesn't cover all memory allocation types and might use a wrong device pointer for checking the capability. Check all memory allocation types more completely.
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210720092640.12338-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/core/pcm_native.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
--- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -246,12 +246,18 @@ static bool hw_support_mmap(struct snd_p if (!(substream->runtime->hw.info & SNDRV_PCM_INFO_MMAP)) return false;
- if (substream->ops->mmap || - (substream->dma_buffer.dev.type != SNDRV_DMA_TYPE_DEV && - substream->dma_buffer.dev.type != SNDRV_DMA_TYPE_DEV_UC)) + if (substream->ops->mmap) return true;
- return dma_can_mmap(substream->dma_buffer.dev.dev); + switch (substream->dma_buffer.dev.type) { + case SNDRV_DMA_TYPE_UNKNOWN: + return false; + case SNDRV_DMA_TYPE_CONTINUOUS: + case SNDRV_DMA_TYPE_VMALLOC: + return true; + default: + return dma_can_mmap(substream->dma_buffer.dev.dev); + } }
static int constrain_mask_params(struct snd_pcm_substream *substream,
From: Moritz Fischer mdf@kernel.org
commit 44cf53602f5a0db80d53c8fff6cdbcae59650a42 upstream.
This reverts commit d143825baf15f204dac60acdf95e428182aa3374.
Justin reports some of his systems now fail as a result of this commit:
xhci_hcd 0000:04:00.0: Direct firmware load for renesas_usb_fw.mem failed with error -2 xhci_hcd 0000:04:00.0: request_firmware failed: -2 xhci_hcd: probe of 0000:04:00.0 failed with error -2
The revert brings back the original issue the commit tried to solve but at least unbreaks existing systems relying on previous behavior.
Cc: stable@vger.kernel.org Cc: Mathias Nyman mathias.nyman@intel.com Cc: Vinod Koul vkoul@kernel.org Cc: Justin Forbes jmforbes@linuxtx.org Reported-by: Justin Forbes jmforbes@linuxtx.org Signed-off-by: Moritz Fischer mdf@kernel.org Fixes: d143825baf15 ("usb: renesas-xhci: Fix handling of unknown ROM state") Link: https://lore.kernel.org/r/20210719070519.41114-1-mdf@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-pci-renesas.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/drivers/usb/host/xhci-pci-renesas.c +++ b/drivers/usb/host/xhci-pci-renesas.c @@ -207,8 +207,7 @@ static int renesas_check_rom_state(struc return 0;
case RENESAS_ROM_STATUS_NO_RESULT: /* No result yet */ - dev_dbg(&pdev->dev, "Unknown ROM status ...\n"); - break; + return 0;
case RENESAS_ROM_STATUS_ERROR: /* Error State */ default: /* All other states are marked as "Reserved states" */ @@ -225,12 +224,13 @@ static int renesas_fw_check_running(stru u8 fw_state; int err;
- /* - * Only if device has ROM and loaded FW we can skip loading and - * return success. Otherwise (even unknown state), attempt to load FW. - */ - if (renesas_check_rom(pdev) && !renesas_check_rom_state(pdev)) - return 0; + /* Check if device has ROM and loaded, if so skip everything */ + err = renesas_check_rom(pdev); + if (err) { /* we have rom */ + err = renesas_check_rom_state(pdev); + if (!err) + return err; + }
/* * Test if the device is actually needing the firmware. As most
From: Greg Thelen gthelen@google.com
commit 0665e387318607d8269bfdea60723c627c8bae43 upstream.
Commit a66d21d7dba8 ("usb: xhci: Add support for Renesas controller with memory") added renesas_usb_fw.mem firmware reference to xhci-pci. Thus modinfo indicates xhci-pci.ko has "firmware: renesas_usb_fw.mem". But the firmware is only actually used with CONFIG_USB_XHCI_PCI_RENESAS. An unusable firmware reference can trigger safety checkers which look for drivers with unmet firmware dependencies.
Avoid referring to renesas_usb_fw.mem in circumstances when it cannot be loaded (when CONFIG_USB_XHCI_PCI_RENESAS isn't set).
Fixes: a66d21d7dba8 ("usb: xhci: Add support for Renesas controller with memory") Cc: stable stable@vger.kernel.org Signed-off-by: Greg Thelen gthelen@google.com Link: https://lore.kernel.org/r/20210702071224.3673568-1-gthelen@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-pci.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -631,7 +631,14 @@ static const struct pci_device_id pci_id { /* end: all zeroes */ } }; MODULE_DEVICE_TABLE(pci, pci_ids); + +/* + * Without CONFIG_USB_XHCI_PCI_RENESAS renesas_xhci_check_request_fw() won't + * load firmware, so don't encumber the xhci-pci driver with it. + */ +#if IS_ENABLED(CONFIG_USB_XHCI_PCI_RENESAS) MODULE_FIRMWARE("renesas_usb_fw.mem"); +#endif
/* pci driver glue; this is a "new style" PCI driver module */ static struct pci_driver xhci_pci_driver = {
From: Mathias Nyman mathias.nyman@linux.intel.com
commit 72f68bf5c756f5ce1139b31daae2684501383ad5 upstream.
There's a small window where a USB 2 remote wake may be left unhandled due to a race between hub thread and xhci port event interrupt handler.
When the resume event is detected in the xhci interrupt handler it kicks the hub timer, which should move the port from resume to U0 once resume has been signalled for long enough.
To keep the hub "thread" running we set a bus_state->resuming_ports flag. This flag makes sure hub timer function kicks itself.
Checking this flag was not properly protected by the spinlock. The flag was copied to a local variable before the lock was taken. The local variable was then checked later with the spinlock held.
If interrupt is handled right after copying the flag to the local variable we end up stopping the hub thread before it can handle the USB 2 resume.
CPU0                                    CPU1
(hub thread)                            (xhci event handler)

xhci_hub_status_data()
status = bus_state->resuming_ports;
                                        <Interrupt>
                                        handle_port_status()
                                        spin_lock()
                                        bus_state->resuming_ports = 1
                                        set_flag(HCD_FLAG_POLL_RH)
                                        spin_unlock()
spin_lock()
if (!status)
  clear_flag(HCD_FLAG_POLL_RH)
spin_unlock()
Fix this by taking the lock a bit earlier so that it covers the resuming_ports flag copy in the hub thread
Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20210715150651.1996099-2-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-hub.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/usb/host/xhci-hub.c +++ b/drivers/usb/host/xhci-hub.c @@ -1552,11 +1552,12 @@ int xhci_hub_status_data(struct usb_hcd * Inform the usbcore about resume-in-progress by returning * a non-zero value even if there are no status changes. */ + spin_lock_irqsave(&xhci->lock, flags); + status = bus_state->resuming_ports;
mask = PORT_CSC | PORT_PEC | PORT_OCC | PORT_PLC | PORT_WRC | PORT_CEC;
- spin_lock_irqsave(&xhci->lock, flags); /* For each port, did anything change? If so, set that bit in buf. */ for (i = 0; i < max_ports; i++) { temp = readl(ports[i]->addr);
From: Nicholas Piggin npiggin@gmail.com
commit f62f3c20647ebd5fb6ecb8f0b477b9281c44c10a upstream.
The kvmppc_rtas_hcall() sets the host rtas_args.rets pointer based on the rtas_args.nargs that was provided by the guest. That guest nargs value is not range checked, so the guest can cause the host rets pointer to be pointed outside the args array. The individual rtas function handlers check the nargs and nrets values to ensure they are correct, but if they are not, the handlers store a -3 (0xfffffffd) failure indication in rets[0] which corrupts host memory.
Fix this by testing up front whether the guest supplied nargs and nret would exceed the array size, and fail the hcall directly without storing a failure indication to rets[0].
Also expand on a comment about why we kill the guest and try not to return errors directly if we have a valid rets[0] pointer.
Fixes: 8e591cb72047 ("KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls") Cc: stable@vger.kernel.org # v3.10+ Reported-by: Alexey Kardashevskiy aik@ozlabs.ru Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kvm/book3s_rtas.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-)
--- a/arch/powerpc/kvm/book3s_rtas.c +++ b/arch/powerpc/kvm/book3s_rtas.c @@ -242,6 +242,17 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *v * value so we can restore it on the way out. */ orig_rets = args.rets; + if (be32_to_cpu(args.nargs) >= ARRAY_SIZE(args.args)) { + /* + * Don't overflow our args array: ensure there is room for + * at least rets[0] (even if the call specifies 0 nret). + * + * Each handler must then check for the correct nargs and nret + * values, but they may always return failure in rets[0]. + */ + rc = -EINVAL; + goto fail; + } args.rets = &args.args[be32_to_cpu(args.nargs)];
mutex_lock(&vcpu->kvm->arch.rtas_token_lock); @@ -269,9 +280,17 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *v fail: /* * We only get here if the guest has called RTAS with a bogus - * args pointer. That means we can't get to the args, and so we - * can't fail the RTAS call. So fail right out to userspace, - * which should kill the guest. + * args pointer or nargs/nret values that would overflow the + * array. That means we can't get to the args, and so we can't + * fail the RTAS call. So fail right out to userspace, which + * should kill the guest. + * + * SLOF should actually pass the hcall return value from the + * rtas handler call in r3, so enter_rtas could be modified to + * return a failure indication in r3 and we could return such + * errors to the guest rather than failing to host userspace. + * However old guests that don't test for failure could then + * continue silently after errors, so for now we won't do this. */ return rc; }
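The up-front range check can be reduced to a few lines of standalone C. This is a sketch: `struct rtas_args_sketch` is a simplified, hypothetical layout with host-endian fields (the kernel converts with be32_to_cpu()), and `rtas_point_rets()` is not a kernel function. The `>=` comparison is what keeps room for at least rets[0] inside the array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Simplified, hypothetical layout; values are host-endian here. */
struct rtas_args_sketch {
	uint32_t nargs;		/* supplied by the guest, untrusted */
	uint32_t nret;
	uint32_t args[16];
	uint32_t *rets;		/* must end up pointing inside args[] */
};

/*
 * Point rets just past the input arguments, but only if that still
 * leaves room for rets[0]. On failure rets is left untouched, so no
 * failure code can be stored through a bad pointer.
 */
static int rtas_point_rets(struct rtas_args_sketch *a)
{
	assert(a != NULL);
	if (a->nargs >= ARRAY_SIZE(a->args))
		return -EINVAL;
	a->rets = &a->args[a->nargs];
	return 0;
}
```

With 16 slots, nargs = 15 is the largest accepted value and places rets[0] in the last slot; nargs = 16 is rejected before any pointer is formed.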
From: Nicholas Piggin npiggin@gmail.com
commit d9c57d3ed52a92536f5fa59dc5ccdd58b4875076 upstream.
The H_ENTER_NESTED hypercall is handled by the L0, and it is a request by the L1 to switch the context of the vCPU over to that of its L2 guest, and return with an interrupt indication. The L1 is responsible for switching some registers to guest context, and the L0 switches others (including all the hypervisor privileged state).
If the L2 MSR has TM active, then the L1 is responsible for recheckpointing the L2 TM state. Then the L1 exits to L0 via the H_ENTER_NESTED hcall, and the L0 saves the TM state as part of the exit, and then it recheckpoints the TM state as part of the nested entry and finally HRFIDs into the L2 with TM active MSR. Not efficient, but about the simplest approach for something that's horrendously complicated.
Problems arise if the L1 exits to the L0 with a TM state which does not match the L2 TM state being requested. For example if the L1 is transactional but the L2 MSR is non-transactional, or vice versa. The L0's HRFID can take a TM Bad Thing interrupt and crash.
Fix this by disallowing H_ENTER_NESTED in TM[T] state entirely, and then ensuring that if the L1 is suspended then the L2 must have TM active, and if the L1 is not suspended then the L2 must not have TM active.
Fixes: 360cae313702 ("KVM: PPC: Book3S HV: Nested guest entry via hypercall") Cc: stable@vger.kernel.org # v4.20+ Reported-by: Alexey Kardashevskiy aik@ozlabs.ru Acked-by: Michael Neuling mikey@neuling.org Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kvm/book3s_hv_nested.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
--- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -232,6 +232,9 @@ long kvmhv_enter_nested_guest(struct kvm if (vcpu->kvm->arch.l1_ptcr == 0) return H_NOT_AVAILABLE;
+ if (MSR_TM_TRANSACTIONAL(vcpu->arch.shregs.msr)) + return H_BAD_MODE; + /* copy parameters in */ hv_ptr = kvmppc_get_gpr(vcpu, 4); regs_ptr = kvmppc_get_gpr(vcpu, 5); @@ -254,6 +257,23 @@ long kvmhv_enter_nested_guest(struct kvm if (l2_hv.vcpu_token >= NR_CPUS) return H_PARAMETER;
+ /* + * L1 must have set up a suspended state to enter the L2 in a + * transactional state, and only in that case. These have to be + * filtered out here to prevent causing a TM Bad Thing in the + * host HRFID. We could synthesize a TM Bad Thing back to the L1 + * here but there doesn't seem like much point. + */ + if (MSR_TM_SUSPENDED(vcpu->arch.shregs.msr)) { + if (!MSR_TM_ACTIVE(l2_regs.msr)) + return H_BAD_MODE; + } else { + if (l2_regs.msr & MSR_TS_MASK) + return H_BAD_MODE; + if (WARN_ON_ONCE(vcpu->arch.shregs.msr & MSR_TS_MASK)) + return H_BAD_MODE; + } + /* translate lpid */ l2 = kvmhv_get_nested(vcpu->kvm, l2_hv.lpid, true); if (!l2)
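The two checks added above amount to a small decision table over the L1 and L2 TM states. The sketch below encodes it with a hypothetical `enum tm_state` in place of the MSR[TS] bits, and a hypothetical `nested_entry_allowed()` in place of the H_BAD_MODE returns. It assumes "TM active" means either suspended or transactional.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the MSR[TS] field. */
enum tm_state { TM_NONE, TM_SUSPENDED, TM_TRANSACTIONAL };

/* True if the entry would be accepted, false where H_BAD_MODE is returned. */
static bool nested_entry_allowed(enum tm_state l1, enum tm_state l2)
{
	assert(l1 == TM_NONE || l1 == TM_SUSPENDED || l1 == TM_TRANSACTIONAL);
	assert(l2 == TM_NONE || l2 == TM_SUSPENDED || l2 == TM_TRANSACTIONAL);

	/* The hcall is refused outright while L1 is transactional. */
	if (l1 == TM_TRANSACTIONAL)
		return false;

	/* A suspended L1 must be entering an L2 that has TM active. */
	if (l1 == TM_SUSPENDED)
		return l2 != TM_NONE;

	/* A non-TM L1 must be entering an L2 without TM active. */
	return l2 == TM_NONE;
}
```

Of the nine combinations, only three are accepted: (none, none), (suspended, suspended) and (suspended, transactional).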
From: Mathias Nyman mathias.nyman@linux.intel.com
commit 1b7f56fbc7a1b66967b6114d1b5f5a257c3abae6 upstream.
The device initiated link power management U1/U2 states should not be enabled in case the system exit latency plus one bus interval (125us) is greater than the shortest service interval of any periodic endpoint.
This is the case for both U1 and U2 system exit latencies and link states.
See USB 3.2 section 9.4.9 "Set Feature" for more details
Note, before this patch the host and device initiated U1/U2 lpm states were both enabled with lpm. After this patch it's possible to end up with only host initiated U1/U2 lpm in case the exit latencies won't allow device initiated lpm.
In this case we still want to set the udev->usb3_lpm_ux_enabled flag so that sysfs users can see the link may go to U1/U2.
Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Cc: stable stable@vger.kernel.org Link: https://lore.kernel.org/r/20210715150122.1995966-2-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/core/hub.c | 68 ++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 56 insertions(+), 12 deletions(-)
--- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -4041,6 +4041,47 @@ static int usb_set_lpm_timeout(struct us }
/* + * Don't allow device intiated U1/U2 if the system exit latency + one bus + * interval is greater than the minimum service interval of any active + * periodic endpoint. See USB 3.2 section 9.4.9 + */ +static bool usb_device_may_initiate_lpm(struct usb_device *udev, + enum usb3_link_state state) +{ + unsigned int sel; /* us */ + int i, j; + + if (state == USB3_LPM_U1) + sel = DIV_ROUND_UP(udev->u1_params.sel, 1000); + else if (state == USB3_LPM_U2) + sel = DIV_ROUND_UP(udev->u2_params.sel, 1000); + else + return false; + + for (i = 0; i < udev->actconfig->desc.bNumInterfaces; i++) { + struct usb_interface *intf; + struct usb_endpoint_descriptor *desc; + unsigned int interval; + + intf = udev->actconfig->interface[i]; + if (!intf) + continue; + + for (j = 0; j < intf->cur_altsetting->desc.bNumEndpoints; j++) { + desc = &intf->cur_altsetting->endpoint[j].desc; + + if (usb_endpoint_xfer_int(desc) || + usb_endpoint_xfer_isoc(desc)) { + interval = (1 << (desc->bInterval - 1)) * 125; + if (sel + 125 > interval) + return false; + } + } + } + return true; +} + +/* * Enable the hub-initiated U1/U2 idle timeouts, and enable device-initiated * U1/U2 entry. * @@ -4112,20 +4153,23 @@ static void usb_enable_link_state(struct * U1/U2_ENABLE */ if (udev->actconfig && - usb_set_device_initiated_lpm(udev, state, true) == 0) { - if (state == USB3_LPM_U1) - udev->usb3_lpm_u1_enabled = 1; - else if (state == USB3_LPM_U2) - udev->usb3_lpm_u2_enabled = 1; - } else { - /* Don't request U1/U2 entry if the device - * cannot transition to U1/U2. - */ - usb_set_lpm_timeout(udev, state, 0); - hcd->driver->disable_usb3_lpm_timeout(hcd, udev, state); + usb_device_may_initiate_lpm(udev, state)) { + if (usb_set_device_initiated_lpm(udev, state, true)) { + /* + * Request to enable device initiated U1/U2 failed, + * better to turn off lpm in this case. + */ + usb_set_lpm_timeout(udev, state, 0); + hcd->driver->disable_usb3_lpm_timeout(hcd, udev, state); + return; + } } -}
+ if (state == USB3_LPM_U1) + udev->usb3_lpm_u1_enabled = 1; + else if (state == USB3_LPM_U2) + udev->usb3_lpm_u2_enabled = 1; +} /* * Disable the hub-initiated U1/U2 idle timeouts, and disable device-initiated * U1/U2 entry.
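The arithmetic in usb_device_may_initiate_lpm() can be tried out on its own. This is a sketch: `sel_ns_to_us()`, `service_interval_us()` and `device_lpm_allowed()` are hypothetical helpers for a single endpoint, whereas the kernel function walks every periodic endpoint of the active configuration. As in the diff, the service interval is taken as 2^(bInterval-1) * 125 us.

```c
#include <assert.h>
#include <stdbool.h>

#define BUS_INTERVAL_US 125

/* System exit latency is stored in ns; round up to whole microseconds. */
static unsigned int sel_ns_to_us(unsigned int sel_ns)
{
	return (sel_ns + 999) / 1000;
}

/* Service interval of a periodic endpoint in microseconds. */
static unsigned int service_interval_us(unsigned int bInterval)
{
	assert(bInterval >= 1 && bInterval <= 16);
	return (1u << (bInterval - 1)) * BUS_INTERVAL_US;
}

/* Device-initiated U1/U2 is refused if SEL + one bus interval is too long. */
static bool device_lpm_allowed(unsigned int sel_ns, unsigned int bInterval)
{
	unsigned int sel = sel_ns_to_us(sel_ns);

	return sel + BUS_INTERVAL_US <= service_interval_us(bInterval);
}
```

For bInterval = 4 the service interval is 1000 us, so an exit latency of 800 us passes (925 <= 1000) while 900 us does not (1025 > 1000). With bInterval = 1 any non-zero exit latency is rejected.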
From: Mathias Nyman mathias.nyman@linux.intel.com
commit 1bf2761c837571a66ec290fb66c90413821ffda2 upstream.
Maximum Exit Latency (MEL) value is used by host to know how much in advance it needs to start waking up a U1/U2 suspended link in order to service a periodic transfer in time.
Current MEL calculation only includes the time to wake up the path from U1/U2 to U0. This is called tMEL1 in USB 3.1 section C 1.5.2
Total MEL = tMEL1 + tMEL2 + tMEL3 + tMEL4, which should additionally include:
- tMEL2, the time it takes for the PING message to reach the device
- tMEL3, the time for the device to process the PING and submit a PING_RESPONSE
- tMEL4, the time for the PING_RESPONSE to traverse back upstream to the host.
Add the missing tMEL2, tMEL3 and tMEL4 to MEL calculation.
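As a rough illustration of the arithmetic, the per-hop calculation can be modelled in user space. This is only a sketch under stated assumptions: total_mel_ns() and its parameters are hypothetical names, the constants mirror the values used in this patch's diff (40 ns tTPTransmissionDelay, 400 ns ping response time, 2100 ns for a queued Max Packet Size DP), and wHubDelay is taken to be in nanoseconds.

```c
#include <stdbool.h>

#define TP_TRANSMISSION_DELAY_NS 40u	/* USB_TP_TRANSMISSION_DELAY */
#define PING_RESPONSE_TIME_NS	 400u	/* USB_PING_RESPONSE_TIME */
#define MAX_PACKET_DP_DELAY_NS	 2100u	/* PING_RESPONSE queued behind a DP */

/*
 * Hypothetical model of the MEL of one device, in nanoseconds.
 * parent_mel_ns already covers the path up to the parent hub.
 * Exit latencies are in microseconds, hub_hdr_dec_lat in 0.1 us units.
 */
static unsigned int total_mel_ns(unsigned int parent_mel_ns,
				 unsigned int udev_exit_latency_us,
				 unsigned int hub_exit_latency_us,
				 unsigned int hub_hdr_dec_lat,
				 unsigned int hub_delay_ns,
				 bool parent_is_roothub)
{
	unsigned int slower_us = udev_exit_latency_us > hub_exit_latency_us ?
				 udev_exit_latency_us : hub_exit_latency_us;
	unsigned int mel;

	/* tMEL1: wake the last link and decode the hub header. */
	mel = parent_mel_ns + slower_us * 1000 + hub_hdr_dec_lat * 100;

	/* tMEL2 + tMEL4: PING downstream and PING_RESPONSE upstream. */
	mel += (hub_delay_ns + TP_TRANSMISSION_DELAY_NS) * 2;

	/* tMEL3: added once per path, at the device below the roothub. */
	if (parent_is_roothub)
		mel += PING_RESPONSE_TIME_NS + MAX_PACKET_DP_DELAY_NS;

	return mel;
}
```

For a device directly below the roothub with a 10 us device exit latency, a 4 us hub exit latency, a header decode latency of 4 and no hub delay, this model yields 10000 + 400 + 80 + 2500 = 12980 ns.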
Cc: stable@kernel.org # v3.5 Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20210715150122.1995966-1-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/core/hub.c | 52 ++++++++++++++++++++++++++----------------------- 1 file changed, 28 insertions(+), 24 deletions(-)
--- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -47,6 +47,7 @@
#define USB_TP_TRANSMISSION_DELAY 40 /* ns */ #define USB_TP_TRANSMISSION_DELAY_MAX 65535 /* ns */ +#define USB_PING_RESPONSE_TIME 400 /* ns */
/* Protect struct usb_device->state and ->children members * Note: Both are also protected by ->dev.sem, except that ->state can @@ -181,8 +182,9 @@ int usb_device_supports_lpm(struct usb_d }
/* - * Set the Maximum Exit Latency (MEL) for the host to initiate a transition from - * either U1 or U2. + * Set the Maximum Exit Latency (MEL) for the host to wakup up the path from + * U1/U2, send a PING to the device and receive a PING_RESPONSE. + * See USB 3.1 section C.1.5.2 */ static void usb_set_lpm_mel(struct usb_device *udev, struct usb3_lpm_parameters *udev_lpm_params, @@ -192,35 +194,37 @@ static void usb_set_lpm_mel(struct usb_d unsigned int hub_exit_latency) { unsigned int total_mel; - unsigned int device_mel; - unsigned int hub_mel;
/* - * Calculate the time it takes to transition all links from the roothub - * to the parent hub into U0. The parent hub must then decode the - * packet (hub header decode latency) to figure out which port it was - * bound for. - * - * The Hub Header decode latency is expressed in 0.1us intervals (0x1 - * means 0.1us). Multiply that by 100 to get nanoseconds. + * tMEL1. time to transition path from host to device into U0. + * MEL for parent already contains the delay up to parent, so only add + * the exit latency for the last link (pick the slower exit latency), + * and the hub header decode latency. See USB 3.1 section C 2.2.1 + * Store MEL in nanoseconds */ total_mel = hub_lpm_params->mel + - (hub->descriptor->u.ss.bHubHdrDecLat * 100); + max(udev_exit_latency, hub_exit_latency) * 1000 + + hub->descriptor->u.ss.bHubHdrDecLat * 100;
/* - * How long will it take to transition the downstream hub's port into - * U0? The greater of either the hub exit latency or the device exit - * latency. - * - * The BOS U1/U2 exit latencies are expressed in 1us intervals. - * Multiply that by 1000 to get nanoseconds. + * tMEL2. Time to submit PING packet. Sum of tTPTransmissionDelay for + * each link + wHubDelay for each hub. Add only for last link. + * tMEL4, the time for PING_RESPONSE to traverse upstream is similar. + * Multiply by 2 to include it as well. */ - device_mel = udev_exit_latency * 1000; - hub_mel = hub_exit_latency * 1000; - if (device_mel > hub_mel) - total_mel += device_mel; - else - total_mel += hub_mel; + total_mel += (__le16_to_cpu(hub->descriptor->u.ss.wHubDelay) + + USB_TP_TRANSMISSION_DELAY) * 2; + + /* + * tMEL3, tPingResponse. Time taken by device to generate PING_RESPONSE + * after receiving PING. Also add 2100ns as stated in USB 3.1 C 1.5.2.4 + * to cover the delay if the PING_RESPONSE is queued behind a Max Packet + * Size DP. + * Note these delays should be added only once for the entire path, so + * add them to the MEL of the device connected to the roothub. + */ + if (!hub->hdev->parent) + total_mel += USB_PING_RESPONSE_TIME + 2100;
udev_lpm_params->mel = total_mel; }
From: Julian Sikorski belegdol@gmail.com
commit 6abf2fe6b4bf6e5256b80c5817908151d2d33e9f upstream.
LaCie Rugged USB3-FW appears to be incompatible with UAS. It generates errors like: [ 1151.582598] sd 14:0:0:0: tag#16 uas_eh_abort_handler 0 uas-tag 1 inflight: IN [ 1151.582602] sd 14:0:0:0: tag#16 CDB: Report supported operation codes a3 0c 01 12 00 00 00 00 02 00 00 00 [ 1151.588594] scsi host14: uas_eh_device_reset_handler start [ 1151.710482] usb 2-4: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd [ 1151.741398] scsi host14: uas_eh_device_reset_handler success [ 1181.785534] scsi host14: uas_eh_device_reset_handler start
Signed-off-by: Julian Sikorski belegdol+github@gmail.com Cc: stable stable@vger.kernel.org Link: https://lore.kernel.org/r/20210720171910.36497-1-belegdol+github@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/storage/unusual_uas.h | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/usb/storage/unusual_uas.h +++ b/drivers/usb/storage/unusual_uas.h @@ -45,6 +45,13 @@ UNUSUAL_DEV(0x059f, 0x105f, 0x0000, 0x99 USB_SC_DEVICE, USB_PR_DEVICE, NULL, US_FL_NO_REPORT_OPCODES | US_FL_NO_SAME),
+/* Reported-by: Julian Sikorski belegdol@gmail.com */ +UNUSUAL_DEV(0x059f, 0x1061, 0x0000, 0x9999, + "LaCie", + "Rugged USB3-FW", + USB_SC_DEVICE, USB_PR_DEVICE, NULL, + US_FL_IGNORE_UAS), + /* * Apricorn USB3 dongle sometimes returns "USBSUSBSUSBS" in response to SCSI * commands in UAS mode. Observed with the 1.28 firmware; are there others?
From: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
commit b5fdf5c6e6bee35837e160c00ac89327bdad031b upstream.
The MAX-3421 USB driver remembers the state of the USB toggles for a device/endpoint. To save SPI writes, this was only done when a new device/endpoint was being used. Unfortunately, if the old device was removed, this would cause writes to freed memory.
To fix this, a simpler scheme is used. The toggles are read from hardware when a URB is completed, and the toggles are always written to hardware when any URB transaction is started. This costs a few more SPI transactions, but no longer causes kernel panics.
Fixes: 2d53139f3162 ("Add support for using a MAX3421E chip as a host driver.") Cc: stable stable@vger.kernel.org Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz Link: https://lore.kernel.org/r/20210625031456.8632-1-mark.tomlinson@alliedtelesis... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/max3421-hcd.c | 44 +++++++++++++---------------------------- 1 file changed, 14 insertions(+), 30 deletions(-)
--- a/drivers/usb/host/max3421-hcd.c +++ b/drivers/usb/host/max3421-hcd.c @@ -153,8 +153,6 @@ struct max3421_hcd { */ struct urb *curr_urb; enum scheduling_pass sched_pass; - struct usb_device *loaded_dev; /* dev that's loaded into the chip */ - int loaded_epnum; /* epnum whose toggles are loaded */ int urb_done; /* > 0 -> no errors, < 0: errno */ size_t curr_len; u8 hien; @@ -492,39 +490,17 @@ max3421_set_speed(struct usb_hcd *hcd, s * Caller must NOT hold HCD spinlock. */ static void -max3421_set_address(struct usb_hcd *hcd, struct usb_device *dev, int epnum, - int force_toggles) +max3421_set_address(struct usb_hcd *hcd, struct usb_device *dev, int epnum) { - struct max3421_hcd *max3421_hcd = hcd_to_max3421(hcd); - int old_epnum, same_ep, rcvtog, sndtog; - struct usb_device *old_dev; + int rcvtog, sndtog; u8 hctl;
- old_dev = max3421_hcd->loaded_dev; - old_epnum = max3421_hcd->loaded_epnum; - - same_ep = (dev == old_dev && epnum == old_epnum); - if (same_ep && !force_toggles) - return; - - if (old_dev && !same_ep) { - /* save the old end-points toggles: */ - u8 hrsl = spi_rd8(hcd, MAX3421_REG_HRSL); - - rcvtog = (hrsl >> MAX3421_HRSL_RCVTOGRD_BIT) & 1; - sndtog = (hrsl >> MAX3421_HRSL_SNDTOGRD_BIT) & 1; - - /* no locking: HCD (i.e., we) own toggles, don't we? */ - usb_settoggle(old_dev, old_epnum, 0, rcvtog); - usb_settoggle(old_dev, old_epnum, 1, sndtog); - } /* setup new endpoint's toggle bits: */ rcvtog = usb_gettoggle(dev, epnum, 0); sndtog = usb_gettoggle(dev, epnum, 1); hctl = (BIT(rcvtog + MAX3421_HCTL_RCVTOG0_BIT) | BIT(sndtog + MAX3421_HCTL_SNDTOG0_BIT));
- max3421_hcd->loaded_epnum = epnum; spi_wr8(hcd, MAX3421_REG_HCTL, hctl);
/* @@ -532,7 +508,6 @@ max3421_set_address(struct usb_hcd *hcd, * address-assignment so it's best to just always load the * address whenever the end-point changed/was forced. */ - max3421_hcd->loaded_dev = dev; spi_wr8(hcd, MAX3421_REG_PERADDR, dev->devnum); }
@@ -667,7 +642,7 @@ max3421_select_and_start_urb(struct usb_ struct max3421_hcd *max3421_hcd = hcd_to_max3421(hcd); struct urb *urb, *curr_urb = NULL; struct max3421_ep *max3421_ep; - int epnum, force_toggles = 0; + int epnum; struct usb_host_endpoint *ep; struct list_head *pos; unsigned long flags; @@ -777,7 +752,6 @@ done: usb_settoggle(urb->dev, epnum, 0, 1); usb_settoggle(urb->dev, epnum, 1, 1); max3421_ep->pkt_state = PKT_STATE_SETUP; - force_toggles = 1; } else max3421_ep->pkt_state = PKT_STATE_TRANSFER; } @@ -785,7 +759,7 @@ done: spin_unlock_irqrestore(&max3421_hcd->lock, flags);
max3421_ep->last_active = max3421_hcd->frame_number; - max3421_set_address(hcd, urb->dev, epnum, force_toggles); + max3421_set_address(hcd, urb->dev, epnum); max3421_set_speed(hcd, urb->dev); max3421_next_transfer(hcd, 0); return 1; @@ -1380,6 +1354,16 @@ max3421_urb_done(struct usb_hcd *hcd) status = 0; urb = max3421_hcd->curr_urb; if (urb) { + /* save the old end-points toggles: */ + u8 hrsl = spi_rd8(hcd, MAX3421_REG_HRSL); + int rcvtog = (hrsl >> MAX3421_HRSL_RCVTOGRD_BIT) & 1; + int sndtog = (hrsl >> MAX3421_HRSL_SNDTOGRD_BIT) & 1; + int epnum = usb_endpoint_num(&urb->ep->desc); + + /* no locking: HCD (i.e., we) own toggles, don't we? */ + usb_settoggle(urb->dev, epnum, 0, rcvtog); + usb_settoggle(urb->dev, epnum, 1, sndtog); + max3421_hcd->curr_urb = NULL; spin_lock_irqsave(&max3421_hcd->lock, flags); usb_hcd_unlink_urb_from_ep(hcd, urb);
From: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com
commit 5719df243e118fb343725e8b2afb1637e1af1373 upstream.
This driver can raise superfluous irqs after usbhs_pkt_pop() is called. As a result, once commit 3af32605289e ("usb: renesas_usbhs: fix error return code of usbhsf_pkt_handler()") had been applied, the following error could be observed when using g_audio:
renesas_usbhs e6590000.usb: irq_ready run_error 1 : -22
To fix the issue, disable the tx or rx interrupt in usbhs_pkt_pop().
Fixes: 2743e7f90dc0 ("usb: renesas_usbhs: fix the usb_pkt_pop()") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Link: https://lore.kernel.org/r/20210624122039.596528-1-yoshihiro.shimoda.uh@renes... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/renesas_usbhs/fifo.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/usb/renesas_usbhs/fifo.c +++ b/drivers/usb/renesas_usbhs/fifo.c @@ -101,6 +101,8 @@ static struct dma_chan *usbhsf_dma_chan_ #define usbhsf_dma_map(p) __usbhsf_dma_map_ctrl(p, 1) #define usbhsf_dma_unmap(p) __usbhsf_dma_map_ctrl(p, 0) static int __usbhsf_dma_map_ctrl(struct usbhs_pkt *pkt, int map); +static void usbhsf_tx_irq_ctrl(struct usbhs_pipe *pipe, int enable); +static void usbhsf_rx_irq_ctrl(struct usbhs_pipe *pipe, int enable); struct usbhs_pkt *usbhs_pkt_pop(struct usbhs_pipe *pipe, struct usbhs_pkt *pkt) { struct usbhs_priv *priv = usbhs_pipe_to_priv(pipe); @@ -123,6 +125,11 @@ struct usbhs_pkt *usbhs_pkt_pop(struct u if (chan) { dmaengine_terminate_all(chan); usbhsf_dma_unmap(pkt); + } else { + if (usbhs_pipe_is_dir_in(pipe)) + usbhsf_rx_irq_ctrl(pipe, 0); + else + usbhsf_tx_irq_ctrl(pipe, 0); }
usbhs_pipe_clear_without_sequence(pipe, 0, 0);
From: Marco De Marco marco.demarco@posteo.net
commit 94b619a07655805a1622484967754f5848640456 upstream.
The patch is meant to support LARA-R6 Cat 1 module family.
Module USB ID:
Vendor ID: 0x05c6
Product ID: 0x90fA

Interface layout:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: QMI wwan (not available in all versions)
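The interface layout above explains why the device table entry in this patch's diff marks interface 3 as reserved: the option driver should leave the QMI interface alone. The following is a simplified, hypothetical model of such a reserved-interface bitmask; MODEL_RSVD() and model_iface_is_reserved() are illustrative names and are not the definitions used in option.c.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a per-interface "reserved" flag. */
#define MODEL_RSVD(ifnum) (1ul << (ifnum))

/*
 * Returns true if the serial driver should skip this interface.
 * Only interfaces 0..7 can be flagged in this model.
 */
static bool model_iface_is_reserved(unsigned long driver_info,
				    unsigned int ifnum)
{
	if (ifnum > 7)
		return false;

	return (driver_info & MODEL_RSVD(ifnum)) != 0;
}
```

In this model, a driver_info of MODEL_RSVD(3) leaves interfaces 0 to 2 (diagnostic and the two AT parsers) available to the serial driver and skips interface 3.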
Signed-off-by: Marco De Marco marco.demarco@posteo.net Link: https://lore.kernel.org/r/49260184.kfMIbaSn9k@mars Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/option.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -238,6 +238,7 @@ static void option_instat_callback(struc #define QUECTEL_PRODUCT_UC15 0x9090 /* These u-blox products use Qualcomm's vendor ID */ #define UBLOX_PRODUCT_R410M 0x90b2 +#define UBLOX_PRODUCT_R6XX 0x90fa /* These Yuga products use Qualcomm's vendor ID */ #define YUGA_PRODUCT_CLM920_NC5 0x9625
@@ -1101,6 +1102,8 @@ static const struct usb_device_id option /* u-blox products using Qualcomm vendor ID */ { USB_DEVICE(QUALCOMM_VENDOR_ID, UBLOX_PRODUCT_R410M), .driver_info = RSVD(1) | RSVD(3) }, + { USB_DEVICE(QUALCOMM_VENDOR_ID, UBLOX_PRODUCT_R6XX), + .driver_info = RSVD(3) }, /* Quectel products using Quectel vendor ID */ { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC21, 0xff, 0xff, 0xff), .driver_info = NUMEP2 },
From: Ian Ray ian.ray@ge.com
commit e9db418d4b828dd049caaf5ed65dc86f93bb1a0c upstream.
Fix comments for GE CS1000 CP210x USB ID assignments.
Fixes: 42213a0190b5 ("USB: serial: cp210x: add some more GE USB IDs") Signed-off-by: Ian Ray ian.ray@ge.com Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/cp210x.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -206,8 +206,8 @@ static const struct usb_device_id id_tab { USB_DEVICE(0x1901, 0x0194) }, /* GE Healthcare Remote Alarm Box */ { USB_DEVICE(0x1901, 0x0195) }, /* GE B850/B650/B450 CP2104 DP UART interface */ { USB_DEVICE(0x1901, 0x0196) }, /* GE B850 CP2105 DP UART interface */ - { USB_DEVICE(0x1901, 0x0197) }, /* GE CS1000 Display serial interface */ - { USB_DEVICE(0x1901, 0x0198) }, /* GE CS1000 M.2 Key E serial interface */ + { USB_DEVICE(0x1901, 0x0197) }, /* GE CS1000 M.2 Key E serial interface */ + { USB_DEVICE(0x1901, 0x0198) }, /* GE CS1000 Display serial interface */ { USB_DEVICE(0x199B, 0xBA30) }, /* LORD WSDA-200-USB */ { USB_DEVICE(0x19CF, 0x3000) }, /* Parrot NMEA GPS Flight Recorder */ { USB_DEVICE(0x1ADB, 0x0001) }, /* Schweitzer Engineering C662 Cable */
From: John Keeping john@metanate.com
commit d6a206e60124a9759dd7f6dfb86b0e1d3b1df82e upstream.
Add the USB serial device ID for the CEL ZigBee EM3588 radio stick.
Signed-off-by: John Keeping john@metanate.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/cp210x.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -159,6 +159,7 @@ static const struct usb_device_id id_tab { USB_DEVICE(0x10C4, 0x89A4) }, /* CESINEL FTBC Flexible Thyristor Bridge Controller */ { USB_DEVICE(0x10C4, 0x89FB) }, /* Qivicon ZigBee USB Radio Stick */ { USB_DEVICE(0x10C4, 0x8A2A) }, /* HubZ dual ZigBee and Z-Wave dongle */ + { USB_DEVICE(0x10C4, 0x8A5B) }, /* CEL EM3588 ZigBee USB Stick */ { USB_DEVICE(0x10C4, 0x8A5E) }, /* CEL EM3588 ZigBee USB Stick Long Range */ { USB_DEVICE(0x10C4, 0x8B34) }, /* Qivicon ZigBee USB Radio Stick */ { USB_DEVICE(0x10C4, 0xEA60) }, /* Silicon Labs factory default */
From: Zhang Qilong zhangqilong3@huawei.com
commit 5b01248156bd75303e66985c351dee648c149979 upstream.
Add the missing pm_runtime_disable() call to the probe error path. This keeps the pm_runtime implementation from complaining when the driver is removed and probed again.
Fixes: 49db427232fe ("usb: gadget: Add UDC driver for tegra XUSB device mode controller") Cc: stable stable@vger.kernel.org Signed-off-by: Zhang Qilong zhangqilong3@huawei.com Link: https://lore.kernel.org/r/20210618141441.107817-1-zhangqilong3@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/udc/tegra-xudc.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/gadget/udc/tegra-xudc.c +++ b/drivers/usb/gadget/udc/tegra-xudc.c @@ -3861,6 +3861,7 @@ static int tegra_xudc_probe(struct platf return 0;
free_eps: + pm_runtime_disable(&pdev->dev); tegra_xudc_free_eps(xudc); free_event_ring: tegra_xudc_free_event_ring(xudc);
From: Minas Harutyunyan Minas.Harutyunyan@synopsys.com
commit fecb3a171db425e5068b27231f8efe154bf72637 upstream.
Because dwc2_hsotg_ep_stop_xfr() uses poll mode, the GINTSTS_GOUTNAKEFF interrupt must be masked first. In Slave mode, the GINTSTS_GOUTNAKEFF interrupt is asserted only after the OUT NAK status packet has been popped from the RxFIFO.
In dwc2_hsotg_ep_sethalt(), the GOUTNAKEFF interrupt must be unmasked before DCTL_SGOUTNAK is set.
Tested with the USBCV CH9 and MSC test sets in Slave, BDMA and DDMA modes. All tests pass.
Fixes: a4f827714539a ("usb: dwc2: gadget: Disable enabled HW endpoint in dwc2_hsotg_ep_disable") Fixes: 6070636c4918c ("usb: dwc2: Fix Stalling a Non-Isochronous OUT EP") Cc: stable stable@vger.kernel.org Signed-off-by: Minas Harutyunyan Minas.Harutyunyan@synopsys.com Link: https://lore.kernel.org/r/e17fad802bbcaf879e1ed6745030993abb93baf8.162615292... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc2/gadget.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
--- a/drivers/usb/dwc2/gadget.c +++ b/drivers/usb/dwc2/gadget.c @@ -3900,9 +3900,27 @@ static void dwc2_hsotg_ep_stop_xfr(struc __func__); } } else { + /* Mask GINTSTS_GOUTNAKEFF interrupt */ + dwc2_hsotg_disable_gsint(hsotg, GINTSTS_GOUTNAKEFF); + if (!(dwc2_readl(hsotg, GINTSTS) & GINTSTS_GOUTNAKEFF)) dwc2_set_bit(hsotg, DCTL, DCTL_SGOUTNAK);
+ if (!using_dma(hsotg)) { + /* Wait for GINTSTS_RXFLVL interrupt */ + if (dwc2_hsotg_wait_bit_set(hsotg, GINTSTS, + GINTSTS_RXFLVL, 100)) { + dev_warn(hsotg->dev, "%s: timeout GINTSTS.RXFLVL\n", + __func__); + } else { + /* + * Pop GLOBAL OUT NAK status packet from RxFIFO + * to assert GOUTNAKEFF interrupt + */ + dwc2_readl(hsotg, GRXSTSP); + } + } + /* Wait for global nak to take effect */ if (dwc2_hsotg_wait_bit_set(hsotg, GINTSTS, GINTSTS_GOUTNAKEFF, 100)) @@ -4348,6 +4366,9 @@ static int dwc2_hsotg_ep_sethalt(struct epctl = dwc2_readl(hs, epreg);
if (value) { + /* Unmask GOUTNAKEFF interrupt */ + dwc2_hsotg_en_gsint(hs, GINTSTS_GOUTNAKEFF); + if (!(dwc2_readl(hs, GINTSTS) & GINTSTS_GOUTNAKEFF)) dwc2_set_bit(hs, DCTL, DCTL_SGOUTNAK); // STALL bit will be set in GOUTNAKEFF interrupt handler
From: Minas Harutyunyan Minas.Harutyunyan@synopsys.com
commit d53dc38857f6dbefabd9eecfcbf67b6eac9a1ef4 upstream.
In DDMA mode, a zero-length packet is sent by the DMA descriptor itself, by setting the SP (short packet) flag.
Therefore, in DDMA mode, dwc2_hsotg_complete_in() does not need to send a ZLP.
Tested with the USBCV MSC tests.
Fixes: f71b5e2533de ("usb: dwc2: gadget: fix zero length packet transfers") Cc: stable stable@vger.kernel.org Signed-off-by: Minas Harutyunyan Minas.Harutyunyan@synopsys.com Link: https://lore.kernel.org/r/967bad78c55dd2db1c19714eee3d0a17cf99d74a.162677773... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc2/gadget.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/drivers/usb/dwc2/gadget.c +++ b/drivers/usb/dwc2/gadget.c @@ -2749,12 +2749,14 @@ static void dwc2_hsotg_complete_in(struc return; }
- /* Zlp for all endpoints, for ep0 only in DATA IN stage */ + /* Zlp for all endpoints in non DDMA, for ep0 only in DATA IN stage */ if (hs_ep->send_zlp) { - dwc2_hsotg_program_zlp(hsotg, hs_ep); hs_ep->send_zlp = 0; - /* transfer will be completed on next complete interrupt */ - return; + if (!using_desc_dma(hsotg)) { + dwc2_hsotg_program_zlp(hsotg, hs_ep); + /* transfer will be completed on next complete interrupt */ + return; + } }
if (hs_ep->index == 0 && hsotg->ep0_state == DWC2_EP0_DATA_IN) {
From: Amelie Delaunay amelie.delaunay@foss.st.com
commit 86762ad4abcc549deb7a155c8e5e961b9755bcf0 upstream.
During interrupt registration, attach state is checked. If attached, then the Type-C state is updated with typec_set_xxx functions and role switch is set with usb_role_switch_set_role().
If the usb_role_switch parameter is an error pointer or NULL, usb_role_switch_set_role() simply returns 0.
So, to update usb_role_switch role if a device is attached before the irq is registered, usb_role_switch must be registered before irq registration.
Fixes: da0cb6310094 ("usb: typec: add support for STUSB160x Type-C controller family") Cc: stable stable@vger.kernel.org Signed-off-by: Amelie Delaunay amelie.delaunay@foss.st.com Link: https://lore.kernel.org/r/20210716120718.20398-2-amelie.delaunay@foss.st.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/stusb160x.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/drivers/usb/typec/stusb160x.c +++ b/drivers/usb/typec/stusb160x.c @@ -739,10 +739,6 @@ static int stusb160x_probe(struct i2c_cl typec_set_pwr_opmode(chip->port, chip->pwr_opmode);
if (client->irq) { - ret = stusb160x_irq_init(chip, client->irq); - if (ret) - goto port_unregister; - chip->role_sw = fwnode_usb_role_switch_get(fwnode); if (IS_ERR(chip->role_sw)) { ret = PTR_ERR(chip->role_sw); @@ -752,6 +748,10 @@ static int stusb160x_probe(struct i2c_cl ret); goto port_unregister; } + + ret = stusb160x_irq_init(chip, client->irq); + if (ret) + goto role_sw_put; } else { /* * If Source or Dual power role, need to enable VDD supply @@ -775,6 +775,9 @@ static int stusb160x_probe(struct i2c_cl
return 0;
+role_sw_put: + if (chip->role_sw) + usb_role_switch_put(chip->role_sw); port_unregister: typec_unregister_port(chip->port); all_reg_disable:
From: Marc Zyngier maz@kernel.org
commit 2bab693a608bdf614b9fcd44083c5100f34b9f77 upstream.
kexec_load_file() relies on the memblock infrastructure to avoid stamping over regions of memory that are essential to the survival of the system.
However, nobody seems to agree how to flag these regions as reserved, and (for example) EFI only publishes its reservations in /proc/iomem for the benefit of the traditional, userspace based kexec tool.
On arm64 platforms with GICv3, this can result in the payload being placed at the location of the LPI tables. Shock, horror!
Let's augment the EFI reservation code with a memblock_reserve() call, protecting our dear tables from the secondary kernel invasion.
Reported-by: Moritz Fischer mdf@kernel.org Tested-by: Moritz Fischer mdf@kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Cc: Ard Biesheuvel ardb@kernel.org Cc: James Morse james.morse@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/efi/efi.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
--- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -896,6 +896,7 @@ static int __init efi_memreserve_map_roo static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size) { struct resource *res, *parent; + int ret;
res = kzalloc(sizeof(struct resource), GFP_ATOMIC); if (!res) @@ -908,7 +909,17 @@ static int efi_mem_reserve_iomem(phys_ad
/* we expect a conflict with a 'System RAM' region */ parent = request_resource_conflict(&iomem_resource, res); - return parent ? request_resource(parent, res) : 0; + ret = parent ? request_resource(parent, res) : 0; + + /* + * Given that efi_mem_reserve_iomem() can be called at any + * time, only call memblock_reserve() if the architecture + * keeps the infrastructure around. + */ + if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK) && !ret) + memblock_reserve(addr, size); + + return ret; }
int __ref efi_mem_reserve_persistent(phys_addr_t addr, u64 size)
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 352384d5c84ebe40fa77098cc234fe173247d8ef upstream.
Because of the significant overhead that retpolines pose on indirect calls, the tracepoint code was updated to use the new "static_calls" that can modify the running code to directly call a function instead of using an indirect caller, and this function can be changed at runtime.
In the tracepoint code that calls all the registered callbacks that are attached to a tracepoint, the following is done:
	it_func_ptr = rcu_dereference_raw((&__tracepoint_##name)->funcs);
	if (it_func_ptr) {
		__data = (it_func_ptr)->data;
		static_call(tp_func_##name)(__data, args);
	}
If there's just a single callback, the static_call is updated to just call that callback directly. Once another handler is added, then the static caller is updated to call the iterator, that simply loops over all the funcs in the array and calls each of the callbacks like the old method using indirect calling.
The issue was discovered with a race between updating the funcs array and updating the static_call. The funcs array was updated first and then the static_call was updated. This is not an issue as long as the first element in the old array is the same as the first element in the new array. But that assumption is incorrect, because callbacks also have a priority field, and if there's a callback added that has a higher priority than the callback on the old array, then it will become the first callback in the new array. This means that it is possible to call the old callback with the new callback data element, which can cause a kernel panic.
  static_call = callback1()
  funcs[] = {callback1,data1};
  callback2 has higher priority than callback1

  CPU 1				CPU 2
  -----				-----

  new_funcs = {callback2,data2},
              {callback1,data1}

  rcu_assign_pointer(tp->funcs, new_funcs);

  /*
   * Now tp->funcs has the new array
   * but the static_call still calls callback1
   */

				it_func_ptr = tp->funcs [ new_funcs ]
				data = it_func_ptr->data [ data2 ]
				static_call(callback1, data);

				/* Now callback1 is called with
				 * callback2's data */

				[ KERNEL PANIC ]

  update_static_call(iterator);
To prevent this from happening, always switch the static_call to the iterator before assigning the tp->funcs to the new array. The iterator will always properly match the callback with its data.
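The effect of the ordering can be illustrated with a small single-threaded model. This is only a sketch with hypothetical names (model_tp, reader_sees_match(), update_is_safe()); it is not the kernel implementation. Each callback is reduced to an integer id, its data is assumed to belong to the callback with the same id, and a "reader" is run between the two update steps to see whether callback and data still match.

```c
#include <stdbool.h>

enum { CALL_ITERATOR = -1 };	/* static call points at the iterator */

struct model_func {
	int id;			/* callback id; its data carries the same id */
};

struct model_tp {
	int static_call;	/* callback id, or CALL_ITERATOR */
	const struct model_func *funcs;
};

/*
 * A reader takes the data of funcs[0] and invokes the static call.
 * The iterator always pairs each callback with its own data; a direct
 * call only matches if it targets the first callback in the array.
 */
static bool reader_sees_match(const struct model_tp *tp)
{
	if (tp->static_call == CALL_ITERATOR)
		return true;

	return tp->static_call == tp->funcs[0].id;
}

/*
 * Add a higher priority callback 2 in front of callback 1 and let a
 * reader run between the two update steps and again afterwards.
 */
static bool update_is_safe(bool update_call_first)
{
	static const struct model_func old_funcs[] = { { 1 } };
	static const struct model_func new_funcs[] = { { 2 }, { 1 } };
	struct model_tp tp = { .static_call = 1, .funcs = old_funcs };
	bool ok;

	if (update_call_first)
		tp.static_call = CALL_ITERATOR;
	else
		tp.funcs = new_funcs;

	ok = reader_sees_match(&tp);	/* reader races with the update */

	if (update_call_first)
		tp.funcs = new_funcs;
	else
		tp.static_call = CALL_ITERATOR;

	return ok && reader_sees_match(&tp);
}
```

In this model, update_is_safe(false) corresponds to the old order (array first, then static call) and reports a mismatch, while update_is_safe(true) corresponds to the order introduced by the fix.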
To trigger this bug:
In one terminal:
while :; do hackbench 50; done
In another terminal
  echo 1 > /sys/kernel/tracing/events/sched/sched_waking/enable
  while :; do
	echo 1 > /sys/kernel/tracing/set_event_pid; sleep 0.5
	echo 0 > /sys/kernel/tracing/set_event_pid; sleep 0.5
  done
And it doesn't take long to crash. This is because the set_event_pid adds a callback to the sched_waking tracepoint with a high priority, which will be called before the sched_waking trace event callback is called.
Note, the removal to a single callback updates the array first, before changing the static_call to single callback, which is the proper order as the first element in the array is the same as what the static_call is being changed to.
Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba....
Cc: stable@vger.kernel.org Fixes: d25e37d89dd2f ("tracepoint: Optimize using static_call()") Reported-by: Stefan Metzmacher metze@samba.org tested-by: Stefan Metzmacher metze@samba.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/tracepoint.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -320,8 +320,8 @@ static int tracepoint_add_func(struct tr * a pointer to it. This array is referenced by __DO_TRACE from * include/linux/tracepoint.h using rcu_dereference_sched(). */ - rcu_assign_pointer(tp->funcs, tp_funcs); tracepoint_update_call(tp, tp_funcs, false); + rcu_assign_pointer(tp->funcs, tp_funcs); static_key_enable(&tp->key);
release_probes(old);
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 1e3bac71c5053c99d438771fc9fa5082ae5d90aa upstream.
Currently the histogram logic allows the user to write "cpu" in as an event field, and it will record the CPU that the event happened on.
The problem with this is that there's a lot of events that have "cpu" as a real field, and using "cpu" as the CPU it ran on, makes it impossible to run histograms on the "cpu" field of events.
For example, if I want to have a histogram on the count of the workqueue_queue_work event on its cpu field, running:
# echo 'hist:keys=cpu' > events/workqueue/workqueue_queue_work/trigger
Gives a misleading and wrong result.
Change the command to "common_cpu" as no event should have "common_*" fields as that's a reserved name for fields used by all events. And this makes sense here as common_cpu would be a field used by all events.
Now we can even do:
 # echo 'hist:keys=common_cpu,cpu if cpu < 100' > events/workqueue/workqueue_queue_work/trigger
 # cat events/workqueue/workqueue_queue_work/hist
 # event histogram
 #
 # trigger info: hist:keys=common_cpu,cpu:vals=hitcount:sort=hitcount:size=2048 if cpu < 100 [active]
 #

 { common_cpu:   0, cpu:   2 } hitcount:     1
 { common_cpu:   0, cpu:   4 } hitcount:     1
 { common_cpu:   7, cpu:   7 } hitcount:     1
 { common_cpu:   0, cpu:   7 } hitcount:     1
 { common_cpu:   0, cpu:   1 } hitcount:     1
 { common_cpu:   0, cpu:   6 } hitcount:     2
 { common_cpu:   0, cpu:   5 } hitcount:     2
 { common_cpu:   1, cpu:   1 } hitcount:     4
 { common_cpu:   6, cpu:   6 } hitcount:     4
 { common_cpu:   5, cpu:   5 } hitcount:    14
 { common_cpu:   4, cpu:   4 } hitcount:    26
 { common_cpu:   0, cpu:   0 } hitcount:    39
 { common_cpu:   2, cpu:   2 } hitcount:   184
Now for backward compatibility, I added a trick. If "cpu" is used, and the field is not found, it will fall back to "common_cpu" and work as it did before. This way, it will still work for old programs that use "cpu" to get the actual CPU, but if the event has a "cpu" as a field, it will get that event's "cpu" field, which is probably what it wants anyway.
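The lookup order described above can be sketched as a small stand-alone function. This is a simplified model with hypothetical names (resolve_key(), event_has_field()); the real parse_field() also handles common_timestamp and works on trace event structures rather than string arrays.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum resolved {
	RESOLVED_EVENT_FIELD,	/* a real field of the event */
	RESOLVED_COMMON_CPU,	/* the CPU the event happened on */
	RESOLVED_NOT_FOUND,
};

/* Hypothetical helper: does the event define a field with this name? */
static bool event_has_field(const char *const *fields, size_t n,
			    const char *name)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(fields[i], name) == 0)
			return true;
	}
	return false;
}

/* Model of the key resolution order, including the "cpu" fallback. */
static enum resolved resolve_key(const char *const *fields, size_t n,
				 const char *name)
{
	if (strcmp(name, "common_cpu") == 0)
		return RESOLVED_COMMON_CPU;

	if (event_has_field(fields, n, name))
		return RESOLVED_EVENT_FIELD;

	/* Backward compatibility: plain "cpu" falls back to common_cpu. */
	if (strcmp(name, "cpu") == 0)
		return RESOLVED_COMMON_CPU;

	return RESOLVED_NOT_FOUND;
}
```

For an event that defines its own "cpu" field, the key "cpu" resolves to that field in this model; for an event without one, it resolves to the common CPU as before.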
I updated the tracefs/README to include documentation about both the common_timestamp and the common_cpu. This way, if that text is present in the README, then an application can know that common_cpu is supported over just plain "cpu".
Link: https://lkml.kernel.org/r/20210721110053.26b4f641@oasis.local.home
Cc: Namhyung Kim namhyung@kernel.org Cc: Ingo Molnar mingo@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: stable@vger.kernel.org Fixes: 8b7622bf94a44 ("tracing: Add cpu field for hist triggers") Reviewed-by: Tom Zanussi zanussi@kernel.org Reviewed-by: Masami Hiramatsu mhiramat@kernel.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/trace/histogram.rst | 2 +- kernel/trace/trace.c | 4 ++++ kernel/trace/trace_events_hist.c | 22 ++++++++++++++++------ 3 files changed, 21 insertions(+), 7 deletions(-)
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -191,7 +191,7 @@ Documentation written by Tom Zanussi
                                 with the event, in nanoseconds.  May be
                                 modified by .usecs to have timestamps
                                 interpreted as microseconds.
-    cpu                    int  the cpu on which the event occurred.
+    common_cpu             int  the cpu on which the event occurred.
     ====================== ==== =======================================
 
 Extended error information
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5241,6 +5241,10 @@ static const char readme_msg[] =
 	"\t [:name=histname1]\n"
 	"\t [:<handler>.<action>]\n"
 	"\t [if <filter>]\n\n"
+	"\t Note, special fields can be used as well:\n"
+	"\t common_timestamp - to record current timestamp\n"
+	"\t common_cpu - to record the CPU the event happened on\n"
+	"\n"
 	"\t When a matching event is hit, an entry is added to a hash\n"
 	"\t table using the key(s) and value(s) named, and the value of a\n"
 	"\t sum called 'hitcount' is incremented. Keys and values\n"
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1095,7 +1095,7 @@ static const char *hist_field_name(struc
 		 field->flags & HIST_FIELD_FL_ALIAS)
 		field_name = hist_field_name(field->operands[0], ++level);
 	else if (field->flags & HIST_FIELD_FL_CPU)
-		field_name = "cpu";
+		field_name = "common_cpu";
 	else if (field->flags & HIST_FIELD_FL_EXPR ||
 		 field->flags & HIST_FIELD_FL_VAR_REF) {
 		if (field->system) {
@@ -1975,14 +1975,24 @@ parse_field(struct hist_trigger_data *hi
 		hist_data->enable_timestamps = true;
 		if (*flags & HIST_FIELD_FL_TIMESTAMP_USECS)
 			hist_data->attrs->ts_in_usecs = true;
-	} else if (strcmp(field_name, "cpu") == 0)
+	} else if (strcmp(field_name, "common_cpu") == 0)
 		*flags |= HIST_FIELD_FL_CPU;
 	else {
 		field = trace_find_event_field(file->event_call, field_name);
 		if (!field || !field->size) {
-			hist_err(tr, HIST_ERR_FIELD_NOT_FOUND, errpos(field_name));
-			field = ERR_PTR(-EINVAL);
-			goto out;
+			/*
+			 * For backward compatibility, if field_name
+			 * was "cpu", then we treat this the same as
+			 * common_cpu.
+			 */
+			if (strcmp(field_name, "cpu") == 0) {
+				*flags |= HIST_FIELD_FL_CPU;
+			} else {
+				hist_err(tr, HIST_ERR_FIELD_NOT_FOUND,
+					 errpos(field_name));
+				field = ERR_PTR(-EINVAL);
+				goto out;
+			}
 		}
 	}
  out:
@@ -5057,7 +5067,7 @@ static void hist_field_print(struct seq_
 		seq_printf(m, "%s=", hist_field->var.name);
 
 	if (hist_field->flags & HIST_FIELD_FL_CPU)
-		seq_puts(m, "cpu");
+		seq_puts(m, "common_cpu");
 	else if (field_name) {
 		if (hist_field->flags & HIST_FIELD_FL_VAR_REF ||
 		    hist_field->flags & HIST_FIELD_FL_ALIAS)
From: Haoran Luo www@aegistudio.net
commit 67f0d6d9883c13174669f88adac4f0ee656cc16a upstream.
The "rb_per_cpu_empty()" function misinterprets the condition (as not-empty) when "head_page" and "commit_page" of "struct ring_buffer_per_cpu" point to the same buffer page, whose "buffer_data_page" is empty and whose "read" field is non-zero.
An error scenario could be constructed as follows (kernel perspective):
1. All pages in the buffer have been accessed by reader(s) so that all of them have a non-zero "read" field.
2. Read and clear all buffer pages so that "rb_num_of_entries()" returns 0, indicating there is no more data to read. It is also required that "read_page", "commit_page" and "tail_page" point to the same page, while "head_page" is the page following them.
3. Invoke "ring_buffer_lock_reserve()" with a large enough "length" so that it shoots past the end of the current tail buffer page. Now "head_page", "commit_page" and "tail_page" point to the same page.
4. Discard the current event with "ring_buffer_discard_commit()", so that "head_page", "commit_page" and "tail_page" point to a page whose buffer data page is now empty.
Once the error scenario has been constructed, "tracing_read_pipe" will be trapped inside an endless loop: "trace_empty()" returns 0 since "rb_per_cpu_empty()" returns 0 when it hits the CPU containing such a constructed ring buffer. Then "trace_find_next_entry_inc()" always returns NULL since "rb_num_of_entries()" reports there are no more entries to read. Finally "trace_seq_to_user()" returns "-EBUSY", sending "tracing_read_pipe" back to the start of the "waitagain" loop.
I've also written a proof-of-concept script to construct the scenario and trigger the bug automatically; you can use it to trace and validate my reasoning above:
https://github.com/aegistudio/RingBufferDetonator.git
Tests have been carried out on Linux kernel 5.14-rc2 (2734d6c1b1a089fb593ef6a23d4b70903526fe0c), my fixed version of the kernel (to test whether my update fixes the bug) and some older kernels (to find the range of affected kernels). The test results are also attached to the proof-of-concept repository.
Link: https://lore.kernel.org/linux-trace-devel/YPaNxsIlb2yjSi5Y@aegistudio/
Link: https://lore.kernel.org/linux-trace-devel/YPgrN85WL9VyrZ55@aegistudio
Cc: stable@vger.kernel.org
Fixes: bf41a158cacba ("ring-buffer: make reentrant")
Suggested-by: Linus Torvalds torvalds@linuxfoundation.org
Signed-off-by: Haoran Luo www@aegistudio.net
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/trace/ring_buffer.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3649,10 +3649,30 @@ static bool rb_per_cpu_empty(struct ring
 	if (unlikely(!head))
 		return true;
 
-	return reader->read == rb_page_commit(reader) &&
-		(commit == reader ||
-		 (commit == head &&
-		  head->read == rb_page_commit(commit)));
+	/* Reader should exhaust content in reader page */
+	if (reader->read != rb_page_commit(reader))
+		return false;
+
+	/*
+	 * If writers are committing on the reader page, knowing all
+	 * committed content has been read, the ring buffer is empty.
+	 */
+	if (commit == reader)
+		return true;
+
+	/*
+	 * If writers are committing on a page other than reader page
+	 * and head page, there should always be content to read.
+	 */
+	if (commit != head)
+		return false;
+
+	/*
+	 * Writers are committing on the head page, we just need
+	 * to care about there're committed data, and the reader will
+	 * swap reader page with head page when it is to read data.
+	 */
+	return rb_page_commit(commit) == 0;
 }
 
 /**
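For readers who want to see the difference between the old and new emptiness checks outside the kernel, here is a reduced userspace model. It is only a sketch and not part of the patch: "struct model_page", "model_empty_old()" and "model_empty_new()" are hypothetical names, and each page is reduced to the two counters the check looks at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer page: only the two counters matter. */
struct model_page {
	size_t read;	/* bytes already consumed by a reader */
	size_t commit;	/* bytes committed by writers */
};

/* Old check: compares head->read with the commit count of the head page. */
static bool model_empty_old(const struct model_page *reader,
			    const struct model_page *head,
			    const struct model_page *commit)
{
	return reader->read == reader->commit &&
	       (commit == reader ||
		(commit == head && head->read == commit->commit));
}

/* New check: a stale non-zero head->read no longer influences the result. */
static bool model_empty_new(const struct model_page *reader,
			    const struct model_page *head,
			    const struct model_page *commit)
{
	/* The reader page must be fully consumed first. */
	if (reader->read != reader->commit)
		return false;
	/* Writers commit on the reader page and everything was read. */
	if (commit == reader)
		return true;
	/* Writers commit somewhere past the head page: data is pending. */
	if (commit != head)
		return false;
	/* Writers commit on the head page: empty only if nothing committed. */
	return commit->commit == 0;
}
```

With a head page that is also the commit page, has nothing committed and carries a leftover non-zero "read" value, the old model reports "not empty" while the new one reports "empty", which matches the scenario described in the commit message.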
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 3b13911a2fd0dd0146c9777a254840c5466cf120 upstream.
Performing the following:
# echo 'wakeup_lat s32 pid; u64 delta; char wake_comm[]' > synthetic_events
# echo 'hist:keys=pid:__arg__1=common_timestamp.usecs' > events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:pid=next_pid,delta=common_timestamp.usecs-$__arg__1:onmatch(sched.sched_waking).trace(wakeup_lat,$pid,$delta,prev_comm)'\
> events/sched/sched_switch/trigger
# echo 1 > events/synthetic/enable
Crashed the kernel:
BUG: kernel NULL pointer dereference, address: 000000000000001b #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.13.0-rc5-test+ #104 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:strlen+0x0/0x20 Code: f6 82 80 2b 0b bc 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2b 0b bc 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 9 f8 c3 31 RSP: 0018:ffffaa75000d79d0 EFLAGS: 00010046 RAX: 0000000000000002 RBX: ffff9cdb55575270 RCX: 0000000000000000 RDX: ffff9cdb58c7a320 RSI: ffffaa75000d7b40 RDI: 000000000000001b RBP: ffffaa75000d7b40 R08: ffff9cdb40a4f010 R09: ffffaa75000d7ab8 R10: ffff9cdb4398c700 R11: 0000000000000008 R12: ffff9cdb58c7a320 R13: ffff9cdb55575270 R14: ffff9cdb58c7a000 R15: 0000000000000018 FS: 0000000000000000(0000) GS:ffff9cdb5aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000001b CR3: 00000000c0612006 CR4: 00000000001706e0 Call Trace: trace_event_raw_event_synth+0x90/0x1d0 action_trace+0x5b/0x70 event_hist_trigger+0x4bd/0x4e0 ? cpumask_next_and+0x20/0x30 ? update_sd_lb_stats.constprop.0+0xf6/0x840 ? __lock_acquire.constprop.0+0x125/0x550 ? find_held_lock+0x32/0x90 ? sched_clock_cpu+0xe/0xd0 ? lock_release+0x155/0x440 ? update_load_avg+0x8c/0x6f0 ? enqueue_entity+0x18a/0x920 ? __rb_reserve_next+0xe5/0x460 ? ring_buffer_lock_reserve+0x12a/0x3f0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x1ae/0x240 trace_event_raw_event_sched_switch+0x114/0x170 __traceiter_sched_switch+0x39/0x50 __schedule+0x431/0xb00 schedule_idle+0x28/0x40 do_idle+0x198/0x2e0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb
The reason is that the dynamic events array keeps track of the field position of the fields array, via the field_pos variable in the synth_field structure. Unfortunately, that field is a boolean for some reason, which means any field_pos greater than 1 will be a bug (in this case it was 2).
Link: https://lkml.kernel.org/r/20210721191008.638bce34@oasis.local.home
Cc: Masami Hiramatsu mhiramat@kernel.org
Cc: Namhyung Kim namhyung@kernel.org
Cc: Ingo Molnar mingo@kernel.org
Cc: Andrew Morton akpm@linux-foundation.org
Cc: stable@vger.kernel.org
Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events")
Reviewed-by: Tom Zanussi zanussi@kernel.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/trace/trace_synth.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/trace/trace_synth.h
+++ b/kernel/trace/trace_synth.h
@@ -14,10 +14,10 @@ struct synth_field {
 	char *name;
 	size_t size;
 	unsigned int offset;
+	unsigned int field_pos;
 	bool is_signed;
 	bool is_string;
 	bool is_dynamic;
-	bool field_pos;
 };
 
 struct synth_event {
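The effect of storing an index in a bool can be reproduced in a few lines of plain C. This is an illustrative sketch only: "struct field_before" and "struct field_after" are hypothetical, reduced stand-ins for struct synth_field before and after the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced struct: field_pos declared as bool (old layout). */
struct field_before {
	bool field_pos;		/* any non-zero value collapses to 1 */
};

/* Hypothetical reduced struct: field_pos declared as unsigned int (new). */
struct field_after {
	unsigned int field_pos;	/* keeps the real index */
};

/* Store pos in the old layout and read it back. */
static unsigned int stored_pos_before(unsigned int pos)
{
	struct field_before f = { .field_pos = pos };

	return f.field_pos;
}

/* Store pos in the new layout and read it back. */
static unsigned int stored_pos_after(unsigned int pos)
{
	struct field_after f = { .field_pos = pos };

	return f.field_pos;
}
```

Storing position 2 in the old layout reads back as 1, so the wrong field is looked up later; the new layout reads back 2.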
From: Anand Jain anand.jain@oracle.com
commit 16a200f66ede3f9afa2e51d90ade017aaa18d213 upstream.
An fstrim on a degraded raid1 can trigger the following null pointer dereference:
BTRFS info (device loop0): allowing degraded mounts BTRFS info (device loop0): disk space caching is enabled BTRFS info (device loop0): has skinny extents BTRFS warning (device loop0): devid 2 uuid 97ac16f7-e14d-4db1-95bc-3d489b424adb is missing BTRFS warning (device loop0): devid 2 uuid 97ac16f7-e14d-4db1-95bc-3d489b424adb is missing BTRFS info (device loop0): enabling ssd optimizations BUG: kernel NULL pointer dereference, address: 0000000000000620 PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 0 PID: 4574 Comm: fstrim Not tainted 5.13.0-rc7+ #31 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 RIP: 0010:btrfs_trim_fs+0x199/0x4a0 [btrfs] RSP: 0018:ffff959541797d28 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff946f84eca508 RCX: a7a67937adff8608 RDX: ffff946e8122d000 RSI: 0000000000000000 RDI: ffffffffc02fdbf0 RBP: ffff946ea4615000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: ffff946e8122d960 R12: 0000000000000000 R13: ffff959541797db8 R14: ffff946e8122d000 R15: ffff959541797db8 FS: 00007f55917a5080(0000) GS:ffff946f9bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000620 CR3: 000000002d2c8001 CR4: 00000000000706f0 Call Trace: btrfs_ioctl_fitrim+0x167/0x260 [btrfs] btrfs_ioctl+0x1c00/0x2fe0 [btrfs] ? selinux_file_ioctl+0x140/0x240 ? syscall_trace_enter.constprop.0+0x188/0x240 ? __x64_sys_ioctl+0x83/0xb0 __x64_sys_ioctl+0x83/0xb0
Reproducer:
$ mkfs.btrfs -fq -d raid1 -m raid1 /dev/loop0 /dev/loop1
$ mount /dev/loop0 /btrfs
$ umount /btrfs
$ btrfs dev scan --forget
$ mount -o degraded /dev/loop0 /btrfs
$ fstrim /btrfs
The reason is that we call btrfs_trim_free_extents() for the missing device, which uses device->bdev (NULL for a missing device) to find out whether the device supports discard.
The fix is to check whether the device is missing before calling btrfs_trim_free_extents().
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana fdmanana@suse.com
Signed-off-by: Anand Jain anand.jain@oracle.com
Reviewed-by: David Sterba dsterba@suse.com
Signed-off-by: David Sterba dsterba@suse.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/btrfs/extent-tree.c | 3 +++
 1 file changed, 3 insertions(+)
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5883,6 +5883,9 @@ int btrfs_trim_fs(struct btrfs_fs_info *
 	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 	devices = &fs_info->fs_devices->devices;
 	list_for_each_entry(device, devices, dev_list) {
+		if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
+			continue;
+
 		ret = btrfs_trim_free_extents(device, &group_trimmed);
 		if (ret) {
 			dev_failed++;
From: Gustavo A. R. Silva gustavoars@kernel.org
commit 8d4abca95ecc82fc8c41912fa0085281f19cc29f upstream.
Fix an 11-year-old bug in ngene_command_config_free_buf() while addressing the following warnings caught with -Warray-bounds:
arch/alpha/include/asm/string.h:22:16: warning: '__builtin_memcpy' offset [12, 16] from the object at 'com' is out of the bounds of referenced subobject 'config' with type 'unsigned char' at offset 10 [-Warray-bounds]
arch/x86/include/asm/string_32.h:182:25: warning: '__builtin_memcpy' offset [12, 16] from the object at 'com' is out of the bounds of referenced subobject 'config' with type 'unsigned char' at offset 10 [-Warray-bounds]
The problem is that the original code is trying to copy 6 bytes of data into a one-byte member _config_ of the wrong structure FW_CONFIGURE_BUFFERS, in a single call to memcpy(). This causes a legitimate compiler warning because memcpy() overruns the length of &com.cmd.ConfigureBuffers.config. It seems that the right structure is FW_CONFIGURE_FREE_BUFFERS, instead, because it contains 6 more members apart from the header _hdr_. Also, the name of the function ngene_command_config_free_buf() suggests that the actual intention is to ConfigureFreeBuffers, instead of ConfigureBuffers (which takes place in the function ngene_command_config_buf(), above).
Fix this by enclosing those 6 members of struct FW_CONFIGURE_FREE_BUFFERS into a new struct config, and use &com.cmd.ConfigureFreeBuffers.config as the destination address, instead of &com.cmd.ConfigureBuffers.config, when calling memcpy().
This also helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy().
Link: https://github.com/KSPP/linux/issues/109
Fixes: dae52d009fc9 ("V4L/DVB: ngene: Initial check-in")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot lkp@intel.com
Reviewed-by: Kees Cook keescook@chromium.org
Signed-off-by: Gustavo A. R. Silva gustavoars@kernel.org
Link: https://lore.kernel.org/linux-hardening/20210420001631.GA45456@embeddedor/
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/media/pci/ngene/ngene-core.c |  2 +-
 drivers/media/pci/ngene/ngene.h      | 14 ++++++++------
 2 files changed, 9 insertions(+), 7 deletions(-)
--- a/drivers/media/pci/ngene/ngene-core.c
+++ b/drivers/media/pci/ngene/ngene-core.c
@@ -385,7 +385,7 @@ static int ngene_command_config_free_buf
 
 	com.cmd.hdr.Opcode = CMD_CONFIGURE_FREE_BUFFER;
 	com.cmd.hdr.Length = 6;
-	memcpy(&com.cmd.ConfigureBuffers.config, config, 6);
+	memcpy(&com.cmd.ConfigureFreeBuffers.config, config, 6);
 	com.in_len = 6;
 	com.out_len = 0;
 
--- a/drivers/media/pci/ngene/ngene.h
+++ b/drivers/media/pci/ngene/ngene.h
@@ -407,12 +407,14 @@ enum _BUFFER_CONFIGS {
 
 struct FW_CONFIGURE_FREE_BUFFERS {
 	struct FW_HEADER hdr;
-	u8 UVI1_BufferLength;
-	u8 UVI2_BufferLength;
-	u8 TVO_BufferLength;
-	u8 AUD1_BufferLength;
-	u8 AUD2_BufferLength;
-	u8 TVA_BufferLength;
+	struct {
+		u8 UVI1_BufferLength;
+		u8 UVI2_BufferLength;
+		u8 TVO_BufferLength;
+		u8 AUD1_BufferLength;
+		u8 AUD2_BufferLength;
+		u8 TVA_BufferLength;
+	} __packed config;
 } __attribute__ ((__packed__));
 
 struct FW_CONFIGURE_UART {
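The idea of grouping the six length bytes into one sub-struct, so that memcpy() has a destination object of exactly the copied size, can be sketched in userspace as follows. This is an assumption-laden illustration, not the driver code: "struct model_hdr" (assumed to be two bytes), "struct model_free_buffers" and "model_set_config()" are hypothetical names, and the GCC/Clang packed attribute is used in place of the kernel's __packed macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-byte command header (layout assumed for this sketch). */
struct model_hdr {
	uint8_t opcode;
	uint8_t length;
} __attribute__((__packed__));

/* Six buffer lengths wrapped in one named sub-struct, as in the patch. */
struct model_free_buffers {
	struct model_hdr hdr;
	struct {
		uint8_t uvi1;
		uint8_t uvi2;
		uint8_t tvo;
		uint8_t aud1;
		uint8_t aud2;
		uint8_t tva;
	} __attribute__((__packed__)) config;
} __attribute__((__packed__));

/* Copy the caller's 6 configuration bytes into the 6-byte sub-struct. */
static void model_set_config(struct model_free_buffers *cmd,
			     const uint8_t *config)
{
	/* The destination object is sizeof(cmd->config) bytes long. */
	memcpy(&cmd->config, config, sizeof(cmd->config));
}
```

Because the destination is the whole sub-struct rather than a single u8 member, a bounds-checking compiler sees a 6-byte object and a 6-byte copy.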
From: Markus Boehme markubo@amazon.com
commit 09cfae9f13d51700b0fecf591dcd658fc5375428 upstream.
When receiving a packet with multiple fragments, hardware may still touch the first fragment until the entire packet has been received. The driver therefore keeps the first fragment mapped for DMA until end of packet has been asserted, and delays its dma_sync call until then.
The driver tries to fit multiple receive buffers on one page. When using 3K receive buffers (e.g. using Jumbo frames and legacy-rx is turned off/build_skb is being used) on an architecture with 4K pages, the driver allocates an order 1 compound page and uses one page per receive buffer. To determine the correct offset for a delayed DMA sync of the first fragment of a multi-fragment packet, the driver then cannot just use PAGE_MASK on the DMA address but has to construct a mask based on the actual size of the backing page.
Using PAGE_MASK in the 3K RX buffer/4K page architecture configuration will always sync the first page of a compound page. With the SWIOTLB enabled this can lead to corrupted packets (zeroed out first fragment, re-used garbage from another packet) and various consequences, such as slow/stalling data transfers and connection resets. For example, testing on a link with MTU exceeding 3058 bytes on a host with SWIOTLB enabled (e.g. "iommu=soft swiotlb=262144,force") TCP transfers quickly fizzle out without this patch.
Cc: stable@vger.kernel.org
Fixes: 0c5661ecc5dd7 ("ixgbe: fix crash in build_skb Rx code path")
Signed-off-by: Markus Boehme markubo@amazon.com
Tested-by: Tony Brelinski tonyx.brelinski@intel.com
Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1825,7 +1825,8 @@ static void ixgbe_dma_sync_frag(struct i
 				struct sk_buff *skb)
 {
 	if (ring_uses_build_skb(rx_ring)) {
-		unsigned long offset = (unsigned long)(skb->data) & ~PAGE_MASK;
+		unsigned long mask = (unsigned long)ixgbe_rx_pg_size(rx_ring) - 1;
+		unsigned long offset = (unsigned long)(skb->data) & mask;
 
 		dma_sync_single_range_for_cpu(rx_ring->dev,
 					      IXGBE_CB(skb)->dma,
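To make the offset arithmetic concrete, the following userspace sketch compares the two masks. It assumes a 4K base page and an order-1 (8K) compound page; "MODEL_PAGE_SIZE", "offset_page_mask()" and "offset_pg_size()" are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed base page size */

/* Old computation: offset within a single base page (addr & ~PAGE_MASK). */
static unsigned long offset_page_mask(uintptr_t addr)
{
	return (unsigned long)(addr & (MODEL_PAGE_SIZE - 1));
}

/* New computation: offset within the backing page of pg_size bytes. */
static unsigned long offset_pg_size(uintptr_t addr, unsigned long pg_size)
{
	return (unsigned long)(addr & (pg_size - 1));
}
```

For a buffer that starts 64 bytes into the second 4K half of an 8K compound page, the old computation yields 64 (pointing into the first page) while the new one yields 4160, which is the real offset from the start of the mapping.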
From: Bhaumik Bhatt bbhatt@codeaurora.org
commit 546362a9ef2ef40b57c6605f14e88ced507f8dd0 upstream.
MHI reads the channel ID from the event ring element sent by the device, which can be any value between 0 and 255. To prevent out-of-bounds accesses, check the channel ID against the maximum number of channels supported by the controller, and skip processing of the event ring element if the ID is out of range or the channel is not configured yet.
Link: https://lore.kernel.org/r/1624558141-11045-1-git-send-email-bbhatt@codeauror...
Fixes: 1d3173a3bae7 ("bus: mhi: core: Add support for processing events from client device")
Cc: stable@vger.kernel.org #5.10
Reviewed-by: Hemant Kumar hemantk@codeaurora.org
Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org
Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com
Signed-off-by: Bhaumik Bhatt bbhatt@codeaurora.org
Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org
Link: https://lore.kernel.org/r/20210716075106.49938-3-manivannan.sadhasivam@linar...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/bus/mhi/core/main.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)
--- a/drivers/bus/mhi/core/main.c
+++ b/drivers/bus/mhi/core/main.c
@@ -706,11 +706,18 @@ static void mhi_process_cmd_completion(s
 	cmd_pkt = mhi_to_virtual(mhi_ring, ptr);
 
 	chan = MHI_TRE_GET_CMD_CHID(cmd_pkt);
-	mhi_chan = &mhi_cntrl->mhi_chan[chan];
-	write_lock_bh(&mhi_chan->lock);
-	mhi_chan->ccs = MHI_TRE_GET_EV_CODE(tre);
-	complete(&mhi_chan->completion);
-	write_unlock_bh(&mhi_chan->lock);
+
+	if (chan < mhi_cntrl->max_chan &&
+	    mhi_cntrl->mhi_chan[chan].configured) {
+		mhi_chan = &mhi_cntrl->mhi_chan[chan];
+		write_lock_bh(&mhi_chan->lock);
+		mhi_chan->ccs = MHI_TRE_GET_EV_CODE(tre);
+		complete(&mhi_chan->completion);
+		write_unlock_bh(&mhi_chan->lock);
+	} else {
+		dev_err(&mhi_cntrl->mhi_dev->dev,
+			"Completion packet for invalid channel ID: %d\n", chan);
+	}
 
 	mhi_del_ring_element(mhi_cntrl, mhi_ring);
 }
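The validation pattern used here (range check first, then a per-slot "configured" check, and only then the array access) can be sketched in isolation. The types and the function "model_complete()" below are hypothetical and carry none of the locking or completion handling of the real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced channel and controller state. */
struct model_chan {
	bool configured;
	int ccs;		/* last completion code */
};

struct model_cntrl {
	struct model_chan *chan;
	unsigned int max_chan;	/* number of entries in chan[] */
};

/*
 * Apply a completion code to a device-supplied channel ID.
 * Returns true if applied, false if the ID was rejected.
 */
static bool model_complete(struct model_cntrl *c, unsigned int chan, int code)
{
	/* The range check must come first so chan[] is never over-indexed. */
	if (chan >= c->max_chan || !c->chan[chan].configured)
		return false;

	c->chan[chan].ccs = code;
	return true;
}
```

The short-circuit order matters: evaluating the "configured" flag before the range check would itself be an out-of-bounds read.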
From: Frederic Weisbecker frederic@kernel.org
commit 1a3402d93c73bf6bb4df6d7c2aac35abfc3c50e2 upstream.
Since the process wide cputime counter is started locklessly from posix_cpu_timer_rearm(), it can be concurrently stopped by operations on other timers from the same thread group, such as in the following unlucky scenario:
	   CPU 0                                CPU 1
	   -----                                -----
	                                   timer_settime(TIMER B)
	   posix_cpu_timer_rearm(TIMER A)
	       cpu_clock_sample_group()
	           (pct->timers_active already true)

	                                   handle_posix_cpu_timers()
	                                       check_process_timers()
	                                           stop_process_timers()
	                                               pct->timers_active = false
	       arm_timer(TIMER A)

	   tick -> run_posix_cpu_timers()
	       // sees !pct->timers_active, ignore
	       // our TIMER A
Fix this by simply locking the start of process wide cputime counting and the timer arm in the same block.
Acked-by: Peter Zijlstra (Intel) peterz@infradead.org
Signed-off-by: Frederic Weisbecker frederic@kernel.org
Fixes: 60f2ceaa8111 ("posix-cpu-timers: Remove unnecessary locking around cpu_clock_sample_group")
Cc: stable@vger.kernel.org
Cc: Oleg Nesterov oleg@redhat.com
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Ingo Molnar mingo@kernel.org
Cc: Eric W. Biederman ebiederm@xmission.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/time/posix-cpu-timers.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -991,6 +991,11 @@ static void posix_cpu_timer_rearm(struct
 	if (!p)
 		goto out;
 
+	/* Protect timer list r/w in arm_timer() */
+	sighand = lock_task_sighand(p, &flags);
+	if (unlikely(sighand == NULL))
+		goto out;
+
 	/*
 	 * Fetch the current sample and update the timer's expiry time.
 	 */
@@ -1001,11 +1006,6 @@ static void posix_cpu_timer_rearm(struct
 
 	bump_cpu_timer(timer, now);
 
-	/* Protect timer list r/w in arm_timer() */
-	sighand = lock_task_sighand(p, &flags);
-	if (unlikely(sighand == NULL))
-		goto out;
-
 	/*
 	 * Now re-arm for the new expiry time.
 	 */
From: Peter Collingbourne pcc@google.com
commit 0db282ba2c12c1515d490d14a1ff696643ab0f1b upstream.
This test passes pointers obtained from anon_allocate_area to the userfaultfd and mremap APIs. This causes a problem if the system allocator returns tagged pointers because with the tagged address ABI the kernel rejects tagged addresses passed to these APIs, which would end up causing the test to fail. To make this test compatible with such system allocators, stop using the system allocator to allocate memory in anon_allocate_area, and instead just use mmap.
Link: https://lkml.kernel.org/r/20210714195437.118982-3-pcc@google.com
Link: https://linux-review.googlesource.com/id/Icac91064fcd923f77a83e8e133f8631c5b...
Fixes: c47174fc362a ("userfaultfd: selftest")
Co-developed-by: Lokesh Gidra lokeshgidra@google.com
Signed-off-by: Lokesh Gidra lokeshgidra@google.com
Signed-off-by: Peter Collingbourne pcc@google.com
Reviewed-by: Catalin Marinas catalin.marinas@arm.com
Cc: Vincenzo Frascino vincenzo.frascino@arm.com
Cc: Dave Martin Dave.Martin@arm.com
Cc: Will Deacon will@kernel.org
Cc: Andrea Arcangeli aarcange@redhat.com
Cc: Alistair Delva adelva@google.com
Cc: William McVicker willmcvicker@google.com
Cc: Evgenii Stepanov eugenis@google.com
Cc: Mitch Phillips mitchp@google.com
Cc: Andrey Konovalov andreyknvl@gmail.com
Cc: stable@vger.kernel.org [5.4]
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/testing/selftests/vm/userfaultfd.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -180,8 +180,10 @@ static int anon_release_pages(char *rel_
 
 static void anon_allocate_area(void **alloc_area)
 {
-	if (posix_memalign(alloc_area, page_size, nr_pages * page_size)) {
-		fprintf(stderr, "out of memory\n");
+	*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
+			   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (*alloc_area == MAP_FAILED) {
+		fprintf(stderr, "mmap of anonymous memory failed");
 		*alloc_area = NULL;
 	}
 }
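As a standalone illustration of allocating page-aligned anonymous memory with mmap() instead of the system allocator, here is a minimal sketch. "model_alloc_area()" is a hypothetical helper, not the selftest function; it returns NULL on failure rather than MAP_FAILED.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Map nr_pages * page_size bytes of private anonymous memory.
 * The returned address comes straight from the kernel, so it carries
 * no allocator-applied pointer tag. Returns NULL on failure.
 */
static void *model_alloc_area(size_t nr_pages, size_t page_size)
{
	void *area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
			  MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

	return area == MAP_FAILED ? NULL : area;
}
```

The caller releases the area with munmap() using the same length.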
From: Pavel Begunkov asml.silence@gmail.com
commit 68b11e8b1562986c134764433af64e97d30c9fc0 upstream.
If __io_queue_proc() fails to add a second poll entry, e.g. kmalloc() failed, but it goes on with a third waitqueue, it may succeed and overwrite the error status. Count the number of poll entries we added, so we can set pt->error to zero at the beginning and find out when the mentioned scenario happens.
Cc: stable@vger.kernel.org
Fixes: 18bceab101add ("io_uring: allow POLL_ADD with double poll_wait() users")
Signed-off-by: Pavel Begunkov asml.silence@gmail.com
Link: https://lore.kernel.org/r/9d6b9e561f88bcc0163623b74a76c39f712151c3.162677445...
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/io_uring.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4916,6 +4916,7 @@ static int io_connect(struct io_kiocb *r
 struct io_poll_table {
 	struct poll_table_struct pt;
 	struct io_kiocb *req;
+	int nr_entries;
 	int error;
 };
 
@@ -5098,11 +5099,11 @@ static void __io_queue_proc(struct io_po
 	struct io_kiocb *req = pt->req;
 
 	/*
-	 * If poll->head is already set, it's because the file being polled
-	 * uses multiple waitqueues for poll handling (eg one for read, one
-	 * for write). Setup a separate io_poll_iocb if this happens.
+	 * The file being polled uses multiple waitqueues for poll handling
+	 * (e.g. one for read, one for write). Setup a separate io_poll_iocb
+	 * if this happens.
 	 */
-	if (unlikely(poll->head)) {
+	if (unlikely(pt->nr_entries)) {
 		struct io_poll_iocb *poll_one = poll;
 
 		/* already have a 2nd entry, fail a third attempt */
@@ -5124,7 +5125,7 @@ static void __io_queue_proc(struct io_po
 		*poll_ptr = poll;
 	}
 
-	pt->error = 0;
+	pt->nr_entries++;
 	poll->head = head;
 
 	if (poll->events & EPOLLEXCLUSIVE)
@@ -5210,9 +5211,12 @@ static __poll_t __io_arm_poll_handler(st
 
 	ipt->pt._key = mask;
 	ipt->req = req;
-	ipt->error = -EINVAL;
+	ipt->error = 0;
+	ipt->nr_entries = 0;
 
 	mask = vfs_poll(req->file, &ipt->pt) & poll->events;
+	if (unlikely(!ipt->nr_entries) && !ipt->error)
+		ipt->error = -EINVAL;
 
 	spin_lock_irq(&ctx->completion_lock);
 	if (likely(poll->head)) {
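The error-overwrite problem and the entry-counting fix can be modelled without any io_uring machinery. The sketch below is a deliberate simplification and makes assumptions: each waitqueue registration is reduced to a boolean "allocation succeeded", and all names ("struct model_pt", "arm_old()", "arm_new()") are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reduced poll table: just the two bookkeeping fields. */
struct model_pt {
	int nr_entries;
	int error;
};

/* Old scheme: error starts at -EINVAL and every success resets it to 0. */
static int arm_old(const bool *alloc_ok, int n)
{
	struct model_pt pt = { .nr_entries = 0, .error = -EINVAL };

	for (int i = 0; i < n; i++) {
		if (!alloc_ok[i])
			pt.error = -ENOMEM;
		else
			pt.error = 0;	/* may hide an earlier failure */
	}
	return pt.error;
}

/* New scheme: error starts at 0, successes are counted, never reset it. */
static int arm_new(const bool *alloc_ok, int n)
{
	struct model_pt pt = { .nr_entries = 0, .error = 0 };

	for (int i = 0; i < n; i++) {
		if (!alloc_ok[i])
			pt.error = -ENOMEM;
		else
			pt.nr_entries++;
	}
	/* No entry was ever added and nothing failed: report -EINVAL. */
	if (!pt.nr_entries && !pt.error)
		pt.error = -EINVAL;
	return pt.error;
}
```

For the sequence success, failure, success, the old scheme ends with error 0 and the failure is lost, while the new scheme still reports -ENOMEM.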
From: Pavel Begunkov asml.silence@gmail.com
commit 46fee9ab02cb24979bbe07631fc3ae95ae08aa3e upstream.
__io_queue_proc() can enqueue both poll entries and still fail afterwards, so the callers trying to cancel it should also try to remove the second poll entry (if any).
For example, it may leave the request alive referencing an io_uring context but not accessible for cancellation:
[ 282.599913][ T1620] task:iou-sqp-23145 state:D stack:28720 pid:23155 ppid: 8844 flags:0x00004004
[ 282.609927][ T1620] Call Trace:
[ 282.613711][ T1620] __schedule+0x93a/0x26f0
[ 282.634647][ T1620] schedule+0xd3/0x270
[ 282.638874][ T1620] io_uring_cancel_generic+0x54d/0x890
[ 282.660346][ T1620] io_sq_thread+0xaac/0x1250
[ 282.696394][ T1620] ret_from_fork+0x1f/0x30
Cc: stable@vger.kernel.org
Fixes: 18bceab101add ("io_uring: allow POLL_ADD with double poll_wait() users")
Reported-and-tested-by: syzbot+ac957324022b7132accf@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov asml.silence@gmail.com
Link: https://lore.kernel.org/r/0ec1228fc5eda4cb524eeda857da8efdc43c331c.162677445...
Signed-off-by: Jens Axboe axboe@kernel.dk
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/io_uring.c | 2 ++
 1 file changed, 2 insertions(+)
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5219,6 +5219,8 @@ static __poll_t __io_arm_poll_handler(st
 		ipt->error = -EINVAL;
 
 	spin_lock_irq(&ctx->completion_lock);
+	if (ipt->error)
+		io_poll_remove_double(req);
 	if (likely(poll->head)) {
 		spin_lock(&poll->head->lock);
 		if (unlikely(list_empty(&poll->wait.entry))) {
From: Peter Collingbourne pcc@google.com
commit e71e2ace5721a8b921dca18b045069e7bb411277 upstream.
Patch series "userfaultfd: do not untag user pointers", v5.
If a user program uses userfaultfd on ranges of heap memory, it may end up passing a tagged pointer to the kernel in the range.start field of the UFFDIO_REGISTER ioctl. This can happen when using an MTE-capable allocator, or on Android if using the Tagged Pointers feature for MTE readiness [1].
When a fault subsequently occurs, the tag is stripped from the fault address returned to the application in the fault.address field of struct uffd_msg. However, from the application's perspective, the tagged address *is* the memory address, so if the application is unaware of memory tags, it may get confused by receiving an address that is, from its point of view, outside of the bounds of the allocation. We observed this behavior in the kselftest for userfaultfd [2] but other applications could have the same problem.
Address this by not untagging pointers passed to the userfaultfd ioctls. Instead, let the system call fail. Also change the kselftest to use mmap so that it doesn't encounter this problem.
[1] https://source.android.com/devices/tech/debug/tagged-pointers
[2] tools/testing/selftests/vm/userfaultfd.c
This patch (of 2):
Do not untag pointers passed to the userfaultfd ioctls. Instead, let the system call fail. This will provide an early indication of problems with tag-unaware userspace code instead of letting the code get confused later, and is consistent with how we decided to handle brk/mmap/mremap in commit dcde237319e6 ("mm: Avoid creating virtual address aliases in brk()/mmap()/mremap()"), as well as being consistent with the existing tagged address ABI documentation relating to how ioctl arguments are handled.
The code change is a revert of commit 7d0325749a6c ("userfaultfd: untag user pointers") plus some fixups to some additional calls to validate_range that have appeared since then.
Link: https://lkml.kernel.org/r/20210714195437.118982-1-pcc@google.com Link: https://lkml.kernel.org/r/20210714195437.118982-2-pcc@google.com Link: https://linux-review.googlesource.com/id/I761aa9f0344454c482b83fcfcce547db0a... Fixes: 63f0c6037965 ("arm64: Introduce prctl() options to control the tagged user addresses ABI") Signed-off-by: Peter Collingbourne pcc@google.com Reviewed-by: Andrey Konovalov andreyknvl@gmail.com Reviewed-by: Catalin Marinas catalin.marinas@arm.com Cc: Alistair Delva adelva@google.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: Dave Martin Dave.Martin@arm.com Cc: Evgenii Stepanov eugenis@google.com Cc: Lokesh Gidra lokeshgidra@google.com Cc: Mitch Phillips mitchp@google.com Cc: Vincenzo Frascino vincenzo.frascino@arm.com Cc: Will Deacon will@kernel.org Cc: William McVicker willmcvicker@google.com Cc: stable@vger.kernel.org [5.4] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/arm64/tagged-address-abi.rst | 26 ++++++++++++++++++-------- fs/userfaultfd.c | 24 +++++++++++------------- 2 files changed, 29 insertions(+), 21 deletions(-)
--- a/Documentation/arm64/tagged-address-abi.rst +++ b/Documentation/arm64/tagged-address-abi.rst @@ -45,14 +45,24 @@ how the user addresses are used by the k
1. User addresses not accessed by the kernel but used for address space management (e.g. ``mprotect()``, ``madvise()``). The use of valid - tagged pointers in this context is allowed with the exception of - ``brk()``, ``mmap()`` and the ``new_address`` argument to - ``mremap()`` as these have the potential to alias with existing - user addresses. - - NOTE: This behaviour changed in v5.6 and so some earlier kernels may - incorrectly accept valid tagged pointers for the ``brk()``, - ``mmap()`` and ``mremap()`` system calls. + tagged pointers in this context is allowed with these exceptions: + + - ``brk()``, ``mmap()`` and the ``new_address`` argument to + ``mremap()`` as these have the potential to alias with existing + user addresses. + + NOTE: This behaviour changed in v5.6 and so some earlier kernels may + incorrectly accept valid tagged pointers for the ``brk()``, + ``mmap()`` and ``mremap()`` system calls. + + - The ``range.start``, ``start`` and ``dst`` arguments to the + ``UFFDIO_*`` ``ioctl()``s used on a file descriptor obtained from + ``userfaultfd()``, as fault addresses subsequently obtained by reading + the file descriptor will be untagged, which may otherwise confuse + tag-unaware programs. + + NOTE: This behaviour changed in v5.14 and so some earlier kernels may + incorrectly accept valid tagged pointers for this system call.
2. User addresses accessed by the kernel (e.g. ``write()``). This ABI relaxation is disabled by default and the application thread needs to --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1228,23 +1228,21 @@ static __always_inline void wake_userfau }
static __always_inline int validate_range(struct mm_struct *mm, - __u64 *start, __u64 len) + __u64 start, __u64 len) { __u64 task_size = mm->task_size;
- *start = untagged_addr(*start); - - if (*start & ~PAGE_MASK) + if (start & ~PAGE_MASK) return -EINVAL; if (len & ~PAGE_MASK) return -EINVAL; if (!len) return -EINVAL; - if (*start < mmap_min_addr) + if (start < mmap_min_addr) return -EINVAL; - if (*start >= task_size) + if (start >= task_size) return -EINVAL; - if (len > task_size - *start) + if (len > task_size - start) return -EINVAL; return 0; } @@ -1290,7 +1288,7 @@ static int userfaultfd_register(struct u if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP) vm_flags |= VM_UFFD_WP;
- ret = validate_range(mm, &uffdio_register.range.start, + ret = validate_range(mm, uffdio_register.range.start, uffdio_register.range.len); if (ret) goto out; @@ -1490,7 +1488,7 @@ static int userfaultfd_unregister(struct if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister))) goto out;
- ret = validate_range(mm, &uffdio_unregister.start, + ret = validate_range(mm, uffdio_unregister.start, uffdio_unregister.len); if (ret) goto out; @@ -1639,7 +1637,7 @@ static int userfaultfd_wake(struct userf if (copy_from_user(&uffdio_wake, buf, sizeof(uffdio_wake))) goto out;
- ret = validate_range(ctx->mm, &uffdio_wake.start, uffdio_wake.len); + ret = validate_range(ctx->mm, uffdio_wake.start, uffdio_wake.len); if (ret) goto out;
@@ -1679,7 +1677,7 @@ static int userfaultfd_copy(struct userf sizeof(uffdio_copy)-sizeof(__s64))) goto out;
- ret = validate_range(ctx->mm, &uffdio_copy.dst, uffdio_copy.len); + ret = validate_range(ctx->mm, uffdio_copy.dst, uffdio_copy.len); if (ret) goto out; /* @@ -1736,7 +1734,7 @@ static int userfaultfd_zeropage(struct u sizeof(uffdio_zeropage)-sizeof(__s64))) goto out;
- ret = validate_range(ctx->mm, &uffdio_zeropage.range.start, + ret = validate_range(ctx->mm, uffdio_zeropage.range.start, uffdio_zeropage.range.len); if (ret) goto out; @@ -1786,7 +1784,7 @@ static int userfaultfd_writeprotect(stru sizeof(struct uffdio_writeprotect))) return -EFAULT;
- ret = validate_range(ctx->mm, &uffdio_wp.range.start, + ret = validate_range(ctx->mm, uffdio_wp.range.start, uffdio_wp.range.len); if (ret) return ret;
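For reviewers who want to see the effect of dropping untagged_addr(), here is a minimal userspace sketch of the checks that remain in validate_range(). The SKETCH_* constants (4 KiB pages, a 48-bit task size, a 64 KiB mmap_min_addr) and the sketch_tag() helper are assumptions made for illustration, not values taken from the patch; the tag is assumed to live in the top byte of the address.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative constants (assumptions, not from the patch). */
#define SKETCH_PAGE_SIZE     4096ULL
#define SKETCH_PAGE_MASK     (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_TASK_SIZE     (1ULL << 48)
#define SKETCH_MMAP_MIN_ADDR 65536ULL

/* Same order of checks as the patched validate_range(): start is taken
 * by value and is never untagged before it is checked. */
static int sketch_validate_range(uint64_t start, uint64_t len)
{
    if (start & ~SKETCH_PAGE_MASK)
        return -EINVAL;
    if (len & ~SKETCH_PAGE_MASK)
        return -EINVAL;
    if (!len)
        return -EINVAL;
    if (start < SKETCH_MMAP_MIN_ADDR)
        return -EINVAL;
    if (start >= SKETCH_TASK_SIZE)
        return -EINVAL;
    if (len > SKETCH_TASK_SIZE - start)
        return -EINVAL;
    return 0;
}

/* Hypothetical helper: put a tag in the top byte of an address. */
static uint64_t sketch_tag(uint64_t addr, uint8_t tag)
{
    return addr | ((uint64_t)tag << 56);
}
```

Under these assumptions a tagged start address fails the task_size comparison, so the ioctl would return -EINVAL instead of silently operating on the untagged address.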
From: Mike Rapoport rppt@linux.ibm.com
commit 79e482e9c3ae86e849c701c846592e72baddda5a upstream.
Commit b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()") didn't take into account that when the movable_node parameter is present on the kernel command line, for_each_mem_range() skips ranges marked with MEMBLOCK_HOTPLUG.
The page table setup code in POWER uses for_each_mem_range() to create the linear mapping of the physical memory, and since the regions marked as MEMBLOCK_HOTPLUG are skipped, they never make it to the linear map.
A later access to the memory in those ranges will fail:
  BUG: Unable to handle kernel data access on write at 0xc000000400000000
  Faulting instruction address: 0xc00000000008a3c0
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 0 PID: 53 Comm: kworker/u2:0 Not tainted 5.13.0 #7
  NIP: c00000000008a3c0 LR: c0000000003c1ed8 CTR: 0000000000000040
  REGS: c000000008a57770 TRAP: 0300 Not tainted (5.13.0)
  MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 84222202 XER: 20040000
  CFAR: c0000000003c1ed4 DAR: c000000400000000 DSISR: 42000000 IRQMASK: 0
  GPR00: c0000000003c1ed8 c000000008a57a10 c0000000019da700 c000000400000000
  GPR04: 0000000000000280 0000000000000180 0000000000000400 0000000000000200
  GPR08: 0000000000000100 0000000000000080 0000000000000040 0000000000000300
  GPR12: 0000000000000380 c000000001bc0000 c0000000001660c8 c000000006337e00
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000040000000 0000000020000000 c000000001a81990 c000000008c30000
  GPR24: c000000008c20000 c000000001a81998 000fffffffff0000 c000000001a819a0
  GPR28: c000000001a81908 c00c000001000000 c000000008c40000 c000000008a64680
  NIP clear_user_page+0x50/0x80
  LR __handle_mm_fault+0xc88/0x1910
  Call Trace:
  __handle_mm_fault+0xc44/0x1910 (unreliable)
  handle_mm_fault+0x130/0x2a0
  __get_user_pages+0x248/0x610
  __get_user_pages_remote+0x12c/0x3e0
  get_arg_page+0x54/0xf0
  copy_string_kernel+0x11c/0x210
  kernel_execve+0x16c/0x220
  call_usermodehelper_exec_async+0x1b0/0x2f0
  ret_from_kernel_thread+0x5c/0x70
  Instruction dump:
  79280fa4 79271764 79261f24 794ae8e2 7ca94214 7d683a14 7c893a14 7d893050
  7d4903a6 60000000 60000000 60000000 <7c001fec> 7c091fec 7c081fec 7c051fec
  ---[ end trace 490b8c67e6075e09 ]---
Making for_each_mem_range() include MEMBLOCK_HOTPLUG regions in the traversal fixes this issue.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1976100 Link: https://lkml.kernel.org/r/20210712071132.20902-1-rppt@kernel.org Fixes: b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()") Signed-off-by: Mike Rapoport rppt@linux.ibm.com Tested-by: Greg Kurz groug@kaod.org Reviewed-by: David Hildenbrand david@redhat.com Cc: stable@vger.kernel.org [5.10+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/memblock.h | 4 ++-- mm/memblock.c | 3 ++- 2 files changed, 4 insertions(+), 3 deletions(-)
--- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -207,7 +207,7 @@ static inline void __next_physmem_range( */ #define for_each_mem_range(i, p_start, p_end) \ __for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE, \ - MEMBLOCK_NONE, p_start, p_end, NULL) + MEMBLOCK_HOTPLUG, p_start, p_end, NULL)
/** * for_each_mem_range_rev - reverse iterate through memblock areas from @@ -218,7 +218,7 @@ static inline void __next_physmem_range( */ #define for_each_mem_range_rev(i, p_start, p_end) \ __for_each_mem_range_rev(i, &memblock.memory, NULL, NUMA_NO_NODE, \ - MEMBLOCK_NONE, p_start, p_end, NULL) + MEMBLOCK_HOTPLUG, p_start, p_end, NULL)
/** * for_each_reserved_mem_range - iterate over all reserved memblock areas --- a/mm/memblock.c +++ b/mm/memblock.c @@ -940,7 +940,8 @@ static bool should_skip_region(struct me return true;
/* skip hotpluggable memory regions if needed */ - if (movable_node_is_enabled() && memblock_is_hotpluggable(m)) + if (movable_node_is_enabled() && memblock_is_hotpluggable(m) && + !(flags & MEMBLOCK_HOTPLUG)) return true;
/* if we want mirror memory skip non-mirror memory regions */
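The change to should_skip_region() can be modelled in a few lines. This is only a sketch of the hotplug test in isolation: the SKETCH_* flag values and the "patched" switch are introduced here for illustration, and the other skip conditions (mirror, nomap, node id) are left out.

```c
#include <stdbool.h>

/* Illustrative stand-ins for MEMBLOCK_NONE and MEMBLOCK_HOTPLUG. */
#define SKETCH_MEMBLOCK_NONE    0x0u
#define SKETCH_MEMBLOCK_HOTPLUG 0x1u

/* Returns true when a region would be skipped by the hotplug test.
 * region_flags: flags stored on the memblock region
 * iter_flags:   flags the iterator was invoked with
 * patched:      false models the old test, true the one from this patch */
static bool sketch_skip_hotplug(unsigned int region_flags,
                                unsigned int iter_flags,
                                bool movable_node, bool patched)
{
    bool hotpluggable = (region_flags & SKETCH_MEMBLOCK_HOTPLUG) != 0;

    if (!movable_node || !hotpluggable)
        return false;
    /* New in the patch: a caller that asks for HOTPLUG gets the region. */
    if (patched && (iter_flags & SKETCH_MEMBLOCK_HOTPLUG))
        return false;
    return true;
}
```

With movable_node set, a caller passing MEMBLOCK_NONE still skips hotpluggable regions, while for_each_mem_range(), which now passes MEMBLOCK_HOTPLUG, sees them.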
From: Mike Kravetz mike.kravetz@oracle.com
commit e0f7e2b2f7e7864238a4eea05cc77ae1be2bf784 upstream.
In commit 32021982a324 ("hugetlbfs: Convert to fs_context") processing of the mount mode string was changed from match_octal() to fsparam_u32.
This changed existing behavior as match_octal does not require octal values to have a '0' prefix, but fsparam_u32 does.
Use fsparam_u32oct which provides the same behavior as match_octal.
Link: https://lkml.kernel.org/r/20210721183326.102716-1-mike.kravetz@oracle.com Fixes: 32021982a324 ("hugetlbfs: Convert to fs_context") Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Reported-by: Dennis Camera bugs+kernel.org@dtnr.ch Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Cc: David Howells dhowells@redhat.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/hugetlbfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -77,7 +77,7 @@ enum hugetlb_param { static const struct fs_parameter_spec hugetlb_fs_parameters[] = { fsparam_u32 ("gid", Opt_gid), fsparam_string("min_size", Opt_min_size), - fsparam_u32 ("mode", Opt_mode), + fsparam_u32oct("mode", Opt_mode), fsparam_string("nr_inodes", Opt_nr_inodes), fsparam_string("pagesize", Opt_pagesize), fsparam_string("size", Opt_size),
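The behavioural difference is easy to reproduce in userspace. The sketch below uses strtoul() as a stand-in for the kernel parsers, on the assumption that fsparam_u32 picks the radix from the prefix (base 0) while fsparam_u32oct and match_octal always parse base 8; the sketch_* names are hypothetical.

```c
#include <stdlib.h>

/* Stand-in for fsparam_u32: base 0, so only a leading '0' means octal. */
static unsigned long sketch_parse_u32(const char *s)
{
    return strtoul(s, NULL, 0);
}

/* Stand-in for fsparam_u32oct / match_octal: always octal. */
static unsigned long sketch_parse_u32oct(const char *s)
{
    return strtoul(s, NULL, 8);
}
```

Under that assumption, mounting with mode=755 yields decimal 755 (octal 01363) with the first parser and the intended 0755 with the second; mode=0755 parses the same either way.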
From: Ilya Dryomov idryomov@gmail.com
commit ed9eb71085ecb7ded9a5118cec2ab70667cc7350 upstream.
Currently rbd_quiesce_lock() holds lock_rwsem for read while blocking on releasing_wait completion. On the I/O completion side, each image request also needs to take lock_rwsem for read. Because rw_semaphore implementation doesn't allow new readers after a writer has indicated interest in the lock, this can result in a deadlock if something that needs to take lock_rwsem for write gets involved. For example:
1. watch error occurs
2. rbd_watch_errcb() takes lock_rwsem for write, clears owner_cid and releases lock_rwsem
3. after reestablishing the watch, rbd_reregister_watch() takes lock_rwsem for write and calls rbd_reacquire_lock()
4. rbd_quiesce_lock() downgrades lock_rwsem to for read and blocks on releasing_wait until running_list becomes empty
5. another watch error occurs
6. rbd_watch_errcb() blocks trying to take lock_rwsem for write
7. no in-flight image request can complete and delete itself from running_list because lock_rwsem won't be granted anymore
A similar scenario can occur with "lock has been acquired" and "lock has been released" notification handlers which also take lock_rwsem for write to update owner_cid.
We don't actually get anything useful from sitting on lock_rwsem in rbd_quiesce_lock() -- owner_cid updates certainly don't need to be synchronized with. In fact the whole owner_cid tracking logic could probably be removed from the kernel client because we don't support proxied maintenance operations.
Cc: stable@vger.kernel.org # 5.3+ URL: https://tracker.ceph.com/issues/42757 Signed-off-by: Ilya Dryomov idryomov@gmail.com Tested-by: Robin Geuze robin.geuze@nl.team.blue Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/rbd.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
--- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -4147,8 +4147,6 @@ again:
static bool rbd_quiesce_lock(struct rbd_device *rbd_dev) { - bool need_wait; - dout("%s rbd_dev %p\n", __func__, rbd_dev); lockdep_assert_held_write(&rbd_dev->lock_rwsem);
@@ -4160,11 +4158,11 @@ static bool rbd_quiesce_lock(struct rbd_ */ rbd_dev->lock_state = RBD_LOCK_STATE_RELEASING; rbd_assert(!completion_done(&rbd_dev->releasing_wait)); - need_wait = !list_empty(&rbd_dev->running_list); - downgrade_write(&rbd_dev->lock_rwsem); - if (need_wait) - wait_for_completion(&rbd_dev->releasing_wait); - up_read(&rbd_dev->lock_rwsem); + if (list_empty(&rbd_dev->running_list)) + return true; + + up_write(&rbd_dev->lock_rwsem); + wait_for_completion(&rbd_dev->releasing_wait);
down_write(&rbd_dev->lock_rwsem); if (rbd_dev->lock_state != RBD_LOCK_STATE_RELEASING)
From: Ilya Dryomov idryomov@gmail.com
commit 8798d070d416d18a75770fc19787e96705073f43 upstream.
Skipping the "lock has been released" notification if the lock owner is not what we expect based on owner_cid can lead to I/O hangs. One example is our own notifications: because owner_cid is cleared in rbd_unlock(), when we get our own notification it is processed as unexpected/duplicate and maybe_kick_acquire() isn't called. If a peer that requested the lock then doesn't go through with acquiring it, I/O requests that came in while the lock was being quiesced would be stalled until another I/O request is submitted and kicks acquire from rbd_img_exclusive_lock().
This makes the comment in rbd_release_lock() actually true: prior to this change the canceled work was being requeued in response to the "lock has been acquired" notification from rbd_handle_acquired_lock().
Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: Ilya Dryomov idryomov@gmail.com Tested-by: Robin Geuze robin.geuze@nl.team.blue Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/rbd.c | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-)
--- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -4248,15 +4248,11 @@ static void rbd_handle_acquired_lock(str if (!rbd_cid_equal(&cid, &rbd_empty_cid)) { down_write(&rbd_dev->lock_rwsem); if (rbd_cid_equal(&cid, &rbd_dev->owner_cid)) { - /* - * we already know that the remote client is - * the owner - */ - up_write(&rbd_dev->lock_rwsem); - return; + dout("%s rbd_dev %p cid %llu-%llu == owner_cid\n", + __func__, rbd_dev, cid.gid, cid.handle); + } else { + rbd_set_owner_cid(rbd_dev, &cid); } - - rbd_set_owner_cid(rbd_dev, &cid); downgrade_write(&rbd_dev->lock_rwsem); } else { down_read(&rbd_dev->lock_rwsem); @@ -4281,14 +4277,12 @@ static void rbd_handle_released_lock(str if (!rbd_cid_equal(&cid, &rbd_empty_cid)) { down_write(&rbd_dev->lock_rwsem); if (!rbd_cid_equal(&cid, &rbd_dev->owner_cid)) { - dout("%s rbd_dev %p unexpected owner, cid %llu-%llu != owner_cid %llu-%llu\n", + dout("%s rbd_dev %p cid %llu-%llu != owner_cid %llu-%llu\n", __func__, rbd_dev, cid.gid, cid.handle, rbd_dev->owner_cid.gid, rbd_dev->owner_cid.handle); - up_write(&rbd_dev->lock_rwsem); - return; + } else { + rbd_set_owner_cid(rbd_dev, &rbd_empty_cid); } - - rbd_set_owner_cid(rbd_dev, &rbd_empty_cid); downgrade_write(&rbd_dev->lock_rwsem); } else { down_read(&rbd_dev->lock_rwsem);
From: Jérôme Glisse jglisse@redhat.com
commit c36748ac545421d94a5091c754414c0f3664bf10 upstream.
We need to append the device id even if the eeprom has a label property set, as some platforms can have multiple eeproms with the same label and we cannot register each of those under the same name. Failing to register those eeproms triggers cascading failures on such platforms (the system no longer works).
This fixes a regression on such platforms introduced by commit 4e302c3b568e.
Reported-by: Alexander Fomichev fomichev.ru@gmail.com Fixes: 4e302c3b568e ("misc: eeprom: at24: fix NVMEM name with custom AT24 device name") Cc: stable@vger.kernel.org Signed-off-by: Jérôme Glisse jglisse@redhat.com Signed-off-by: Bartosz Golaszewski bgolaszewski@baylibre.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/misc/eeprom/at24.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-)
--- a/drivers/misc/eeprom/at24.c +++ b/drivers/misc/eeprom/at24.c @@ -714,23 +714,20 @@ static int at24_probe(struct i2c_client }
/* - * If the 'label' property is not present for the AT24 EEPROM, - * then nvmem_config.id is initialised to NVMEM_DEVID_AUTO, - * and this will append the 'devid' to the name of the NVMEM - * device. This is purely legacy and the AT24 driver has always - * defaulted to this. However, if the 'label' property is - * present then this means that the name is specified by the - * firmware and this name should be used verbatim and so it is - * not necessary to append the 'devid'. + * We initialize nvmem_config.id to NVMEM_DEVID_AUTO even if the + * label property is set as some platform can have multiple eeproms + * with same label and we can not register each of those with same + * label. Failing to register those eeproms trigger cascade failure + * on such platform. */ + nvmem_config.id = NVMEM_DEVID_AUTO; + if (device_property_present(dev, "label")) { - nvmem_config.id = NVMEM_DEVID_NONE; err = device_property_read_string(dev, "label", &nvmem_config.name); if (err) return err; } else { - nvmem_config.id = NVMEM_DEVID_AUTO; nvmem_config.name = dev_name(dev); }
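A small model of the naming problem, for illustration only. It assumes the nvmem core uses the configured name verbatim for NVMEM_DEVID_NONE and appends a numeric id otherwise; the SKETCH_* values and helper names below are hypothetical and not the kernel's.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for NVMEM_DEVID_NONE / NVMEM_DEVID_AUTO. */
#define SKETCH_DEVID_NONE (-1)
#define SKETCH_DEVID_AUTO (-2)

/* Build the device name the way the nvmem core is assumed to. */
static void sketch_nvmem_name(char *buf, size_t len, const char *label,
                              int id_policy, int auto_id)
{
    if (id_policy == SKETCH_DEVID_NONE)
        snprintf(buf, len, "%s", label);
    else
        snprintf(buf, len, "%s%d", label, auto_id);
}

/* Returns 1 when two EEPROMs sharing a label end up with the same name. */
static int sketch_names_collide(const char *label, int id_policy)
{
    char first[64], second[64];

    sketch_nvmem_name(first, sizeof(first), label, id_policy, 0);
    sketch_nvmem_name(second, sizeof(second), label, id_policy, 1);
    return strcmp(first, second) == 0;
}
```

With the NONE policy both devices get the bare label and the second registration would clash; with AUTO the appended id keeps the names distinct.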
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit c453db6cd96418c79702eaf38259002755ab23ff upstream.
Commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas") fixed up all architectures to deal with the stack guard gap. But when nds32 was added to the tree, it forgot to do the same thing.
Resolve this by properly fixing up the nds32's version of arch_get_unmapped_area().
Cc: Nick Hu nickhu@andestech.com Cc: Greentime Hu green.hu@gmail.com Cc: Vincent Chen deanbo422@gmail.com Cc: Michal Hocko mhocko@suse.com Cc: Hugh Dickins hughd@google.com Cc: Qiang Liu cyruscyliu@gmail.com Cc: stable stable@vger.kernel.org Reported-by: iLifetruth yixiaonn@gmail.com Acked-by: Hugh Dickins hughd@google.com Link: https://lore.kernel.org/r/20210629104024.2293615-1-gregkh@linuxfoundation.or... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/nds32/mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/nds32/mm/mmap.c +++ b/arch/nds32/mm/mmap.c @@ -59,7 +59,7 @@ arch_get_unmapped_area(struct file *filp
vma = find_vma(mm, addr); if (TASK_SIZE - len >= addr && - (!vma || addr + len <= vma->vm_start)) + (!vma || addr + len <= vm_start_gap(vma))) return addr; }
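To illustrate what the one-line change buys, here is a userspace sketch of the guard-gap comparison. The gap size (256 pages of 4 KiB) and the SKETCH_VM_GROWSDOWN bit are assumptions for the example; the real gap is a kernel tunable and the real helper operates on a struct vm_area_struct.

```c
#include <stdbool.h>

#define SKETCH_VM_GROWSDOWN 0x1UL
/* Assumed guard gap: 256 pages of 4 KiB (illustrative value). */
#define SKETCH_GUARD_GAP    (256UL * 4096UL)

/* Lowest address a neighbouring mapping may end at: for a stack-like
 * (grows-down) vma this is vm_start minus the guard gap, clamped at 0. */
static unsigned long sketch_vm_start_gap(unsigned long vm_start,
                                         unsigned long vm_flags)
{
    unsigned long start = vm_start;

    if (vm_flags & SKETCH_VM_GROWSDOWN) {
        start -= SKETCH_GUARD_GAP;
        if (start > vm_start)   /* wrapped below zero */
            start = 0;
    }
    return start;
}

/* The hinted-address test from arch_get_unmapped_area(), with and
 * without the guard gap applied to the following vma. */
static bool sketch_hint_fits(unsigned long vm_start, unsigned long vm_flags,
                             unsigned long addr, unsigned long len,
                             bool use_gap)
{
    unsigned long limit = use_gap ? sketch_vm_start_gap(vm_start, vm_flags)
                                  : vm_start;
    return addr + len <= limit;
}
```

A hint that ends exactly at the start of a stack vma is accepted by the old comparison and rejected once the gap is taken into account.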
From: Adrian Hunter adrian.hunter@intel.com
commit e64daad660a0c9ace3acdc57099fffe5ed83f977 upstream.
sysfs_remove_link() causes a warning if the parent directory does not exist. That can happen if the device link consumer has not been registered. So do not attempt sysfs_remove_link() in that case.
Fixes: 287905e68dd29 ("driver core: Expose device link details in sysfs") Signed-off-by: Adrian Hunter adrian.hunter@intel.com Cc: stable@vger.kernel.org # 5.9+ Reviewed-by: Rafael J. Wysocki rafael@kernel.org Link: https://lore.kernel.org/r/20210716114408.17320-2-adrian.hunter@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/base/core.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -449,8 +449,10 @@ static void devlink_remove_symlinks(stru return; }
- snprintf(buf, len, "supplier:%s:%s", dev_bus_name(sup), dev_name(sup)); - sysfs_remove_link(&con->kobj, buf); + if (device_is_registered(con)) { + snprintf(buf, len, "supplier:%s:%s", dev_bus_name(sup), dev_name(sup)); + sysfs_remove_link(&con->kobj, buf); + } snprintf(buf, len, "consumer:%s:%s", dev_bus_name(con), dev_name(con)); sysfs_remove_link(&sup->kobj, buf); kfree(buf);
From: Jason Ekstrand jason@jlekstrand.net
commit 3761baae908a7b5012be08d70fa553cc2eb82305 upstream.
This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7. Ever since that commit, we've been having issues where a hang in one client can propagate to another. In particular, a hang in an app can propagate to the X server which causes the whole desktop to lock up.
Error propagation along fences sounds like a good idea, but as your bug shows, it has surprising consequences, since propagating errors across security boundaries is not a good thing.
What we do have is tracking of the hangs on the ctx, with information reported to userspace using RESET_STATS. That's how arb_robustness works. Also, if my understanding is still correct, the EIO from execbuf is when your context is banned (because not recoverable or too many hangs). And in all these cases it's up to userspace to figure out what all is impacted and should be reported to the application, that's not on the kernel to guess and automatically propagate.
What's more, we're also building more features on top of ctx error reporting with RESET_STATS ioctl: Encrypted buffers use the same, and the userspace fence wait also relies on that mechanism. So it is the path going forward for reporting gpu hangs and resets to userspace.
So all together that's why I think we should just bury this idea again as not quite the direction we want to go to, hence why I think the revert is the right option here.
For backporters: Please note that you _must_ have a backport of https://lore.kernel.org/dri-devel/20210602164149.391653-2-jason@jlekstrand.n... for otherwise backporting just this patch opens up a security bug.
v2: Augment commit message. Also restore Jason's sob that I accidentally lost.
v3: Add a note for backporters
Signed-off-by: Jason Ekstrand jason@jlekstrand.net Reported-by: Marcin Slusarz marcin.slusarz@intel.com Cc: stable@vger.kernel.org # v5.6+ Cc: Jason Ekstrand jason.ekstrand@intel.com Cc: Marcin Slusarz marcin.slusarz@intel.com Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080 Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences") Acked-by: Daniel Vetter daniel.vetter@ffwll.ch Reviewed-by: Jon Bloomfield jon.bloomfield@intel.com Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/20210714193419.1459723-3-jason... (cherry picked from commit 93a2711cddd5760e2f0f901817d71c93183c3b87) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/i915_request.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
--- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1285,10 +1285,8 @@ i915_request_await_execution(struct i915
do { fence = *child++; - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) { - i915_sw_fence_set_error_once(&rq->submit, fence->error); + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) continue; - }
if (fence->context == rq->fence.context) continue; @@ -1386,10 +1384,8 @@ i915_request_await_dma_fence(struct i915
do { fence = *child++; - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) { - i915_sw_fence_set_error_once(&rq->submit, fence->error); + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) continue; - }
/* * Requests on the same timeline are explicitly ordered, along
Hi!
According to changelog, this introduces security hole.
> From: Jason Ekstrand jason@jlekstrand.net
> commit 3761baae908a7b5012be08d70fa553cc2eb82305 upstream.
> This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7. Ever since that commit, we've been having issues where a hang in one client
Hmm. Sounds like problem I'm seeing in mainline. So... good to know.
> For backporters: Please note that you _must_ have a backport of https://lore.kernel.org/dri-devel/20210602164149.391653-2-jason@jlekstrand.n... for otherwise backporting just this patch opens up a security bug.
AFAICT we don't have that c9d9fdbc108af8915d3f497bbdf3898bf8f321b8 drm/i915: Revert "drm/i915/gem: Asynchronous cmdparser" in 5.10 tree.
Hmm, and it needs follow up fix: 6e0b6528d783b2b87bd9e1bea97cf4dac87540d7 drm/i915: Correct the docs for intel_engine_cmd_parser.
(Someone please double check this).
Best regards, Pavel
On Tue, Jul 27, 2021 at 11:35:46PM +0200, Pavel Machek wrote:
> Hi!
> According to changelog, this introduces security hole.
> > From: Jason Ekstrand jason@jlekstrand.net
> > commit 3761baae908a7b5012be08d70fa553cc2eb82305 upstream.
> > This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7. Ever since that commit, we've been having issues where a hang in one client
> Hmm. Sounds like problem I'm seeing in mainline. So... good to know.
> > For backporters: Please note that you _must_ have a backport of https://lore.kernel.org/dri-devel/20210602164149.391653-2-jason@jlekstrand.n... for otherwise backporting just this patch opens up a security bug.
> AFAICT we don't have that c9d9fdbc108af8915d3f497bbdf3898bf8f321b8 drm/i915: Revert "drm/i915/gem: Asynchronous cmdparser" in 5.10 tree.
> Hmm, and it needs follow up fix: 6e0b6528d783b2b87bd9e1bea97cf4dac87540d7 drm/i915: Correct the docs for intel_engine_cmd_parser.
> (Someone please double check this).
thanks, let me drop this for now.
Jason, you sent me a single patch to backport, should this be 3 patches instead?
thanks,
greg k-h
From: Charles Baylis cb-kernel@fishzet.co.uk
commit 3abab27c322e0f2acf981595aa8040c9164dc9fb upstream.
drm: Return -ENOTTY for non-drm ioctls
Return -ENOTTY from drm_ioctl() when userspace passes in a cmd number which doesn't relate to the drm subsystem.
Glibc uses the TCGETS ioctl to implement isatty(), and without this change isatty() incorrectly returns true for drm devices.
To test, run this command:
  $ if [ -t 0 ]; then echo is a tty; fi < /dev/dri/card0
which shows "is a tty" without this patch.
This may also modify memory which the userspace application is not expecting.
Signed-off-by: Charles Baylis cb-kernel@fishzet.co.uk Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/YPG3IBlzaMhfPqCr@stando.fishze... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_ioctl.c | 3 +++ include/drm/drm_ioctl.h | 1 + 2 files changed, 4 insertions(+)
--- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -827,6 +827,9 @@ long drm_ioctl(struct file *filp, if (drm_dev_is_unplugged(dev)) return -ENODEV;
+ if (DRM_IOCTL_TYPE(cmd) != DRM_IOCTL_BASE) + return -ENOTTY; + is_driver_ioctl = nr >= DRM_COMMAND_BASE && nr < DRM_COMMAND_END;
if (is_driver_ioctl) { --- a/include/drm/drm_ioctl.h +++ b/include/drm/drm_ioctl.h @@ -68,6 +68,7 @@ typedef int drm_ioctl_compat_t(struct fi unsigned long arg);
#define DRM_IOCTL_NR(n) _IOC_NR(n) +#define DRM_IOCTL_TYPE(n) _IOC_TYPE(n) #define DRM_MAJOR 226
/**
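The new check relies on the ioctl "type" byte. A rough userspace sketch follows, assuming the asm-generic encoding in which the type occupies bits 8..15 of the command and TCGETS is 0x5401; the SKETCH_* names are made up for the example, and architectures with a different TCGETS value are not covered.

```c
#include <stdbool.h>

/* Assumed asm-generic layout: type byte in bits 8..15 of the command. */
#define SKETCH_IOC_TYPE(cmd)  (((cmd) >> 8) & 0xffU)
/* DRM ioctls use 'd' as their type byte. */
#define SKETCH_DRM_IOCTL_BASE 'd'
/* TCGETS in the asm-generic headers: type 'T', number 1 (assumption). */
#define SKETCH_TCGETS         0x5401U

/* Mirrors the added test: anything whose type byte is not 'd' would be
 * rejected with -ENOTTY before the command number is even looked at. */
static bool sketch_is_drm_ioctl(unsigned int cmd)
{
    return SKETCH_IOC_TYPE(cmd) == SKETCH_DRM_IOCTL_BASE;
}
```

TCGETS carries type 'T', so it fails the check, which is what lets isatty() see ENOTTY on a drm device node.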
From: Likun Gao Likun.Gao@amd.com
commit 3e94b5965e624f7e6d8dd18eb8f3bf2bb99ba30d upstream.
Update GFX golden setting for sienna_cichlid.
Signed-off-by: Likun Gao Likun.Gao@amd.com Reviewed-by: Hawking Zhang Hawking.Zhang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -3137,6 +3137,7 @@ static const struct soc15_reg_golden gol SOC15_REG_GOLDEN_VALUE(GC, 0, mmSQ_PERFCOUNTER7_SELECT, 0xf0f001ff, 0x00000000), SOC15_REG_GOLDEN_VALUE(GC, 0, mmSQ_PERFCOUNTER8_SELECT, 0xf0f001ff, 0x00000000), SOC15_REG_GOLDEN_VALUE(GC, 0, mmSQ_PERFCOUNTER9_SELECT, 0xf0f001ff, 0x00000000), + SOC15_REG_GOLDEN_VALUE(GC, 0, mmSX_DEBUG_1, 0x00010000, 0x00010020), SOC15_REG_GOLDEN_VALUE(GC, 0, mmTA_CNTL_AUX, 0xfff7ffff, 0x01030000), SOC15_REG_GOLDEN_VALUE(GC, 0, mmUTCL1_CTRL, 0xffbfffff, 0x00a00000) };
From: Marek Behún kabel@kernel.org
commit a03b98d68367b18e5db6d6850e2cc18754fba94a upstream.
Commit 0df952873636a ("mv88e6xxx: Add serdes Rx statistics") added support for RX statistics on SerDes ports for Peridot.
This same implementation is also valid for Topaz, but was not enabled at the time.
We need to use the generic .serdes_get_lane() method instead of the Peridot specific one in the stats methods so that on Topaz the proper one is used.
Signed-off-by: Marek Behún kabel@kernel.org Fixes: 0df952873636a ("mv88e6xxx: Add serdes Rx statistics") Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/dsa/mv88e6xxx/chip.c | 6 ++++++ drivers/net/dsa/mv88e6xxx/serdes.c | 6 +++--- 2 files changed, 9 insertions(+), 3 deletions(-)
--- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -3433,6 +3433,9 @@ static const struct mv88e6xxx_ops mv88e6 .serdes_irq_enable = mv88e6390_serdes_irq_enable, .serdes_irq_status = mv88e6390_serdes_irq_status, .gpio_ops = &mv88e6352_gpio_ops, + .serdes_get_sset_count = mv88e6390_serdes_get_sset_count, + .serdes_get_strings = mv88e6390_serdes_get_strings, + .serdes_get_stats = mv88e6390_serdes_get_stats, .phylink_validate = mv88e6341_phylink_validate, };
@@ -4205,6 +4208,9 @@ static const struct mv88e6xxx_ops mv88e6 .gpio_ops = &mv88e6352_gpio_ops, .avb_ops = &mv88e6390_avb_ops, .ptp_ops = &mv88e6352_ptp_ops, + .serdes_get_sset_count = mv88e6390_serdes_get_sset_count, + .serdes_get_strings = mv88e6390_serdes_get_strings, + .serdes_get_stats = mv88e6390_serdes_get_stats, .phylink_validate = mv88e6341_phylink_validate, };
--- a/drivers/net/dsa/mv88e6xxx/serdes.c +++ b/drivers/net/dsa/mv88e6xxx/serdes.c @@ -590,7 +590,7 @@ static struct mv88e6390_serdes_hw_stat m
int mv88e6390_serdes_get_sset_count(struct mv88e6xxx_chip *chip, int port) { - if (mv88e6390_serdes_get_lane(chip, port) == 0) + if (mv88e6xxx_serdes_get_lane(chip, port) == 0) return 0;
return ARRAY_SIZE(mv88e6390_serdes_hw_stats); @@ -602,7 +602,7 @@ int mv88e6390_serdes_get_strings(struct struct mv88e6390_serdes_hw_stat *stat; int i;
- if (mv88e6390_serdes_get_lane(chip, port) == 0) + if (mv88e6xxx_serdes_get_lane(chip, port) == 0) return 0;
for (i = 0; i < ARRAY_SIZE(mv88e6390_serdes_hw_stats); i++) { @@ -638,7 +638,7 @@ int mv88e6390_serdes_get_stats(struct mv int lane; int i;
- lane = mv88e6390_serdes_get_lane(chip, port); + lane = mv88e6xxx_serdes_get_lane(chip, port); if (lane == 0) return 0;
From: Marek Behún kabel@kernel.org
commit 953b0dcbe2e3f7bee98cc3bca2ec82c8298e9c16 upstream.
Commit bf3504cea7d7e ("net: dsa: mv88e6xxx: Add 6390 family PCS registers to ethtool -d") added support for dumping SerDes PCS registers via ethtool -d for Peridot.
The same implementation is also valid for Topaz, but was not enabled at the time.
Signed-off-by: Marek Behún kabel@kernel.org Fixes: bf3504cea7d7e ("net: dsa: mv88e6xxx: Add 6390 family PCS registers to ethtool -d") Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/dsa/mv88e6xxx/chip.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -3436,6 +3436,8 @@ static const struct mv88e6xxx_ops mv88e6 .serdes_get_sset_count = mv88e6390_serdes_get_sset_count, .serdes_get_strings = mv88e6390_serdes_get_strings, .serdes_get_stats = mv88e6390_serdes_get_stats, + .serdes_get_regs_len = mv88e6390_serdes_get_regs_len, + .serdes_get_regs = mv88e6390_serdes_get_regs, .phylink_validate = mv88e6341_phylink_validate, };
@@ -4211,6 +4213,8 @@ static const struct mv88e6xxx_ops mv88e6 .serdes_get_sset_count = mv88e6390_serdes_get_sset_count, .serdes_get_strings = mv88e6390_serdes_get_strings, .serdes_get_stats = mv88e6390_serdes_get_stats, + .serdes_get_regs_len = mv88e6390_serdes_get_regs_len, + .serdes_get_regs = mv88e6390_serdes_get_regs, .phylink_validate = mv88e6341_phylink_validate, };
From: Evan Quan evan.quan@amd.com
commit e8946a53e2a698c148b3b3ed732f43c7747fbeb6 upstream
Observed unexpected GPU hang during runpm stress test on 0x7341 rev 0x00. Further debugging shows broken ATS is related.
Disable ATS on this part. Similar issues on other devices:
a2da5d8cc0b0 ("PCI: Mark AMD Raven iGPU ATS as broken in some platforms") 45beb31d3afb ("PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken") 5e89cd303e3a ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")
Suggested-by: Alex Deucher alexander.deucher@amd.com Link: https://lore.kernel.org/r/20210602021255.939090-1-evan.quan@amd.com Signed-off-by: Evan Quan evan.quan@amd.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Krzysztof Wilczyński kw@linux.com Cc: stable@vger.kernel.org [sudip: adjust context] Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/quirks.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -5264,7 +5264,8 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SE static void quirk_amd_harvest_no_ats(struct pci_dev *pdev) { if ((pdev->device == 0x7312 && pdev->revision != 0x00) || - (pdev->device == 0x7340 && pdev->revision != 0xc5)) + (pdev->device == 0x7340 && pdev->revision != 0xc5) || + (pdev->device == 0x7341 && pdev->revision != 0x00)) return;
pci_info(pdev, "disabling ATS\n"); @@ -5279,6 +5280,7 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AT DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, quirk_amd_harvest_no_ats); /* AMD Navi14 dGPU */ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, quirk_amd_harvest_no_ats); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7341, quirk_amd_harvest_no_ats); #endif /* CONFIG_PCI_ATS */
/* Freescale PCIe doesn't support MSI in RC mode */
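The early-return condition reads more easily as a predicate. This sketch reproduces only the device/revision test from the patched quirk_amd_harvest_no_ats() and assumes it is called solely for device ids that have the fixup registered; sketch_ats_disabled() is a name invented here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the patched quirk would go on to disable ATS,
 * false when it returns early and leaves ATS alone. */
static bool sketch_ats_disabled(uint16_t device, uint8_t revision)
{
    if ((device == 0x7312 && revision != 0x00) ||
        (device == 0x7340 && revision != 0xc5) ||
        (device == 0x7341 && revision != 0x00))
        return false;

    return true;
}
```

So for 0x7341 only revision 0x00 is affected, matching the part named in the changelog; other revisions of that device keep ATS.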
From: Mahesh Bandewar maheshb@google.com
commit 5b69874f74cc5707edd95fcdaa757c507ac8af0f upstream.
Commit 9a5605505d9c (" bonding: Add struct bond_ipesc to manage SA") causes the following build error when XFRM is not selected in the kernel config.
lld: error: undefined symbol: xfrm_dev_state_flush
referenced by bond_main.c:3453 (drivers/net/bonding/bond_main.c:3453) net/bonding/bond_main.o:(bond_netdev_event) in archive drivers/built-in.a
Fixes: 9a5605505d9c (" bonding: Add struct bond_ipesc to manage SA") Signed-off-by: Mahesh Bandewar maheshb@google.com CC: Taehee Yoo ap420073@gmail.com CC: Jay Vosburgh jay.vosburgh@canonical.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/bonding/bond_main.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3416,7 +3416,9 @@ static int bond_master_netdev_event(unsi return bond_event_changename(event_bond); case NETDEV_UNREGISTER: bond_remove_proc_entry(event_bond); +#ifdef CONFIG_XFRM_OFFLOAD xfrm_dev_state_flush(dev_net(bond_dev), bond_dev, true); +#endif /* CONFIG_XFRM_OFFLOAD */ break; case NETDEV_REGISTER: bond_create_proc_entry(event_bond);
From: Paul Blakey paulb@nvidia.com
commit 8550ff8d8c75416e984d9c4b082845e57e560984 upstream.
When multiple SKBs are merged to a new skb under napi GRO, or SKB is re-used by napi, if nfct was set for them in the driver, it will not be released while freeing their stolen head state or on re-use.
Release nfct on napi's stolen or re-used SKBs, and in gro_list_prepare, check conntrack metadata diff.
Fixes: 5c6b94604744 ("net/mlx5e: CT: Handle misses after executing CT action")
Reviewed-by: Roi Dayan roid@nvidia.com
Signed-off-by: Paul Blakey paulb@nvidia.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/core/dev.c | 13 +++++++++++++
 net/core/skbuff.c | 1 +
 2 files changed, 14 insertions(+)

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5870,6 +5870,18 @@ static struct list_head *gro_list_prepar
 			diffs = memcmp(skb_mac_header(p),
 				       skb_mac_header(skb),
 				       maclen);
+
+		diffs |= skb_get_nfct(p) ^ skb_get_nfct(skb);
+
+		if (!diffs) {
+			struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT);
+			struct tc_skb_ext *p_ext = skb_ext_find(p, TC_SKB_EXT);
+
+			diffs |= (!!p_ext) ^ (!!skb_ext);
+			if (!diffs && unlikely(skb_ext))
+				diffs |= p_ext->chain ^ skb_ext->chain;
+		}
+
 		NAPI_GRO_CB(p)->same_flow = !diffs;
 	}
 
@@ -6151,6 +6163,7 @@ static void napi_reuse_skb(struct napi_s
 	skb_shinfo(skb)->gso_type = 0;
 	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
 	skb_ext_reset(skb);
+	nf_reset_ct(skb);
 
 	napi->skb = skb;
 }
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -659,6 +659,7 @@ fastpath:
 
 void skb_release_head_state(struct sk_buff *skb)
 {
+	nf_reset_ct(skb);
 	skb_dst_drop(skb);
 	if (skb->destructor) {
 		WARN_ON(in_irq());
From: Robert Richter rrichter@amd.com
commit 5e60f363b38fd40e4d8838b5d6f4d4ecee92c777 upstream.
Documentation was not changed when renaming the script in commit 80e715a06c2d ("initramfs: rename gen_initramfs_list.sh to gen_initramfs.sh"). Fixing this.
Basically does:
$ sed -i -e s/gen_initramfs_list.sh/gen_initramfs.sh/g $(git grep -l gen_initramfs_list.sh)
Fixes: 80e715a06c2d ("initramfs: rename gen_initramfs_list.sh to gen_initramfs.sh")
Signed-off-by: Robert Richter rrichter@amd.com
Signed-off-by: Masahiro Yamada masahiroy@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 Documentation/driver-api/early-userspace/early_userspace_support.rst | 8 ++++----
 Documentation/filesystems/ramfs-rootfs-initramfs.rst | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

--- a/Documentation/driver-api/early-userspace/early_userspace_support.rst
+++ b/Documentation/driver-api/early-userspace/early_userspace_support.rst
@@ -69,17 +69,17 @@ early userspace image can be built by an
 
 As a technical note, when directories and files are specified, the
 entire CONFIG_INITRAMFS_SOURCE is passed to
-usr/gen_initramfs_list.sh. This means that CONFIG_INITRAMFS_SOURCE
+usr/gen_initramfs.sh. This means that CONFIG_INITRAMFS_SOURCE
 can really be interpreted as any legal argument to
-gen_initramfs_list.sh. If a directory is specified as an argument then
+gen_initramfs.sh. If a directory is specified as an argument then
 the contents are scanned, uid/gid translation is performed, and
 usr/gen_init_cpio file directives are output. If a directory is
-specified as an argument to usr/gen_initramfs_list.sh then the
+specified as an argument to usr/gen_initramfs.sh then the
 contents of the file are simply copied to the output. All of the output
 directives from directory scanning and file contents copying are
 processed by usr/gen_init_cpio.
 
-See also 'usr/gen_initramfs_list.sh -h'.
+See also 'usr/gen_initramfs.sh -h'.
 
 Where's this all leading?
 =========================
--- a/Documentation/filesystems/ramfs-rootfs-initramfs.rst
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.rst
@@ -170,7 +170,7 @@ Documentation/driver-api/early-userspace
 The kernel does not depend on external cpio tools. If you specify a
 directory instead of a configuration file, the kernel's build infrastructure
 creates a configuration file from that directory (usr/Makefile calls
-usr/gen_initramfs_list.sh), and proceeds to package up that directory
+usr/gen_initramfs.sh), and proceeds to package up that directory
 using the config file (by feeding it to usr/gen_init_cpio, which is created
 from usr/gen_init_cpio.c). The kernel's build-time cpio creation code is
 entirely self-contained, and the kernel's boot-time extractor is also
From: Riccardo Mancini rickyman7@gmail.com
commit 02e6246f5364d5260a6ea6f92ab6f409058b162f upstream.
ASan reports a memory leak when running:
# perf test "83: Zstd perf.data compression/decompression"
which happens inside 'perf inject'.
The bug is caused by inject.output never being closed.
This patch adds the missing perf_data__close().
Signed-off-by: Riccardo Mancini rickyman7@gmail.com
Fixes: 6ef81c55a2b6584c ("perf session: Return error code for perf_session__new() function on failure")
Cc: Ian Rogers irogers@google.com
Cc: Jiri Olsa jolsa@redhat.com
Cc: Mamatha Inamdar mamatha4@linux.vnet.ibm.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Namhyung Kim namhyung@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Link: http://lore.kernel.org/lkml/c06f682afa964687367cf6e92a64ceb49aec76a5.1626343...
Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 tools/perf/builtin-inject.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -906,8 +906,10 @@ int cmd_inject(int argc, const char **ar
 
 	data.path = inject.input_name;
 	inject.session = perf_session__new(&data, inject.output.is_pipe, &inject.tool);
-	if (IS_ERR(inject.session))
-		return PTR_ERR(inject.session);
+	if (IS_ERR(inject.session)) {
+		ret = PTR_ERR(inject.session);
+		goto out_close_output;
+	}
 
 	if (zstd_init(&(inject.session->zstd_data), 0) < 0)
 		pr_warning("Decompression initialization failed.\n");
@@ -949,5 +951,7 @@ int cmd_inject(int argc, const char **ar
 out_delete:
 	zstd_fini(&(inject.session->zstd_data));
 	perf_session__delete(inject.session);
+out_close_output:
+	perf_data__close(&inject.output);
 	return ret;
 }
From: David Jeffery djeffery@redhat.com
commit 0b60557230adfdeb8164e0b342ac9cd469a75759 upstream.
When the ehci-hcd driver uses MSI, interrupts can be lost, and EHCI then keeps working only because of a polling fallback. Relying on polling drastically reduces the performance of any I/O through EHCI.
Interrupts are lost because the EHCI interrupt handler does not safely handle edge-triggered interrupts. It fails to ensure that all interrupt status bits are cleared, which works with level-triggered interrupts but not with the edge-triggered interrupts typical of MSI.
To fix this problem, check if the driver may have raced with the hardware setting additional interrupt status bits and clear status until it is in a stable state.
Fixes: 306c54d0edb6 ("usb: hcd: Try MSI interrupts on PCI devices")
Tested-by: Laurence Oberman loberman@redhat.com
Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com
Acked-by: Alan Stern stern@rowland.harvard.edu
Signed-off-by: David Jeffery djeffery@redhat.com
Link: https://lore.kernel.org/r/20210715213744.GA44506@redhat
Cc: stable stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/host/ehci-hcd.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -703,7 +703,8 @@ EXPORT_SYMBOL_GPL(ehci_setup);
 static irqreturn_t ehci_irq (struct usb_hcd *hcd)
 {
 	struct ehci_hcd *ehci = hcd_to_ehci (hcd);
-	u32 status, masked_status, pcd_status = 0, cmd;
+	u32 status, current_status, masked_status, pcd_status = 0;
+	u32 cmd;
 	int bh;
 	unsigned long flags;
 
@@ -715,19 +716,22 @@ static irqreturn_t ehci_irq (struct usb_
 	 */
 	spin_lock_irqsave(&ehci->lock, flags);
 
-	status = ehci_readl(ehci, &ehci->regs->status);
+	status = 0;
+	current_status = ehci_readl(ehci, &ehci->regs->status);
+restart:
 
 	/* e.g. cardbus physical eject */
-	if (status == ~(u32) 0) {
+	if (current_status == ~(u32) 0) {
 		ehci_dbg (ehci, "device removed\n");
 		goto dead;
 	}
+	status |= current_status;
 
 	/*
 	 * We don't use STS_FLR, but some controllers don't like it to
 	 * remain on, so mask it out along with the other status bits.
 	 */
-	masked_status = status & (INTR_MASK | STS_FLR);
+	masked_status = current_status & (INTR_MASK | STS_FLR);
 
 	/* Shared IRQ? */
 	if (!masked_status || unlikely(ehci->rh_state == EHCI_RH_HALTED)) {
@@ -737,6 +741,12 @@ static irqreturn_t ehci_irq (struct usb_
 
 	/* clear (just) interrupts */
 	ehci_writel(ehci, masked_status, &ehci->regs->status);
+
+	/* For edge interrupts, don't race with an interrupt bit being raised */
+	current_status = ehci_readl(ehci, &ehci->regs->status);
+	if (current_status & INTR_MASK)
+		goto restart;
+
 	cmd = ehci_readl(ehci, &ehci->regs->command);
 	bh = 0;
 
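The re-read-and-restart idea in the EHCI patch above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `FakeStatusRegister`, `handle_irq`, `handle_irq_once` and the `INTR_MASK` value are hypothetical names and numbers invented for the illustration, not kernel code, and the model ignores STS_FLR, locking and the shared-IRQ path. It shows why a single read-then-clear pass can leave a status bit set (which produces no new edge with MSI), while looping until the masked status is stable does not.

```python
INTR_MASK = 0x37  # hypothetical mask of interrupt status bits for this model


class FakeStatusRegister:
    """Write-1-to-clear status register with bits that arrive late.

    After each clearing write, the simulated hardware raises the next
    entry of `late_bits`, modelling a race with the interrupt handler.
    """

    def __init__(self, initial, late_bits):
        self.value = initial
        self.late_bits = list(late_bits)

    def read(self):
        return self.value

    def write(self, bits):
        self.value &= ~bits  # write-1-to-clear semantics
        if self.late_bits:
            self.value |= self.late_bits.pop(0)  # hardware races in a new bit


def handle_irq_once(reg):
    """Old behaviour in this model: read once, clear once."""
    status = reg.read()
    reg.write(status & INTR_MASK)
    return status


def handle_irq(reg):
    """Patched behaviour in this model: accumulate and clear until stable."""
    status = 0
    current = reg.read()
    while current & INTR_MASK:
        status |= current
        reg.write(current & INTR_MASK)
        current = reg.read()  # re-read: did another bit get raised meanwhile?
    return status


if __name__ == "__main__":
    old = FakeStatusRegister(0x01, [0x04])
    print(hex(handle_irq_once(old)), hex(old.read()))
    new = FakeStatusRegister(0x01, [0x04])
    print(hex(handle_irq(new)), hex(new.read()))
```

In the model, the single-pass handler returns with a bit still pending in the register, whereas the looping handler reports both events and leaves the register clear.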
From: Colin Xu colin.xu@intel.com
commit c90b4503ccf42d9d367e843c223df44aa550e82a upstream.
The d3_entered flag marks, for vgpu_reset, a previous power transition from D3->D0, typically a VM resume from S3, so that gvt can skip PPGTT invalidation in the current vgpu_reset while resuming.
On S0ix exit, although there is a D3->D0 transition, the guest driver continues to use the vgpu as normal, with d3_entered set, until the next shutdown/reboot or power transition.
If a reboot follows an S0ix exit, the device power state transitions as D0->D3->D0->D0(reboot), while the system power state transitions as S0->S0 (reboot). There is no vgpu_reset until D0(reboot), so d3_entered is not cleared and the vgpu_reset skips PPGTT invalidation even though those PPGTT entries are no longer valid. The errors look like:
gvt: vgpu 2: vfio_pin_pages failed for gfn 0xxxxx, ret -22
gvt: vgpu 2: fail: spt xxxx guest entry 0xxxxx type 2
gvt: vgpu 2: fail: shadow page xxxx guest entry 0xxxxx type 2.
Give gvt a chance to clear d3_entered on elsp cmd submission so that the states before & after S0ix enter/exit are consistent.
Fixes: ba25d977571e ("drm/i915/gvt: Do not destroy ppgtt_mm during vGPU D3->D0.")
Reviewed-by: Zhenyu Wang zhenyuw@linux.intel.com
Signed-off-by: Colin Xu colin.xu@intel.com
Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com
Link: http://patchwork.freedesktop.org/patch/msgid/20210707004531.4873-1-colin.xu@...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/gpu/drm/i915/gvt/handlers.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

--- a/drivers/gpu/drm/i915/gvt/handlers.c
+++ b/drivers/gpu/drm/i915/gvt/handlers.c
@@ -1728,6 +1728,21 @@ static int elsp_mmio_write(struct intel_
 	if (drm_WARN_ON(&i915->drm, !engine))
 		return -EINVAL;
 
+	/*
+	 * Due to d3_entered is used to indicate skipping PPGTT invalidation on
+	 * vGPU reset, it's set on D0->D3 on PCI config write, and cleared after
+	 * vGPU reset if in resuming.
+	 * In S0ix exit, the device power state also transite from D3 to D0 as
+	 * S3 resume, but no vGPU reset (triggered by QEMU devic model). After
+	 * S0ix exit, all engines continue to work. However the d3_entered
+	 * remains set which will break next vGPU reset logic (miss the expected
+	 * PPGTT invalidation).
+	 * Engines can only work in D0. Thus the 1st elsp write gives GVT a
+	 * chance to clear d3_entered.
+	 */
+	if (vgpu->d3_entered)
+		vgpu->d3_entered = false;
+
 	execlist = &vgpu->submission.execlist[engine->id];
 
 	execlist->elsp_dwords.data[3 - execlist->elsp_dwords.index] = data;
From: Íñigo Huguet ihuguet@redhat.com
[ Upstream commit 788bc000d4c2f25232db19ab3a0add0ba4e27671 ]
Commit 99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues") intended to fix a problem caused by a round up when calculating the number of XDP channels and queues. However, this was not the real problem. The real problem was that the number of XDP TX queues had been reduced to half in commit e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues"), but the variable xdp_tx_queue_count had remained the same.
Once the correct number of XDP TX queues is created again in the previous patch of this series, this also can be reverted since the error doesn't actually exist.
Only in the case that there is a bug in the code we can have different values in xdp_queue_number and efx->xdp_tx_queue_count. Because of this, and per Edward Cree's suggestion, I add instead a WARN_ON to catch if it happens again in the future.
Note that the number of allocated queues can be higher than the number of used ones due to the round up, as explained in the existing comment in the code. That's why we also have to stop increasing xdp_queue_number beyond efx->xdp_tx_queue_count.
Signed-off-by: Íñigo Huguet ihuguet@redhat.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/sfc/efx_channels.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index a4a626e9cd9a..0a8799a208cf 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -889,18 +889,20 @@ int efx_set_channels(struct efx_nic *efx)
 			if (efx_channel_is_xdp_tx(channel)) {
 				efx_for_each_channel_tx_queue(tx_queue, channel) {
 					tx_queue->queue = next_queue++;
-					netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is XDP %u, HW %u\n",
-						  channel->channel, tx_queue->label,
-						  xdp_queue_number, tx_queue->queue);
+
 					/* We may have a few left-over XDP TX
 					 * queues owing to xdp_tx_queue_count
 					 * not dividing evenly by EFX_MAX_TXQ_PER_CHANNEL.
 					 * We still allocate and probe those
 					 * TXQs, but never use them.
 					 */
-					if (xdp_queue_number < efx->xdp_tx_queue_count)
+					if (xdp_queue_number < efx->xdp_tx_queue_count) {
+						netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is XDP %u, HW %u\n",
+							  channel->channel, tx_queue->label,
+							  xdp_queue_number, tx_queue->queue);
 						efx->xdp_tx_queues[xdp_queue_number] = tx_queue;
-					xdp_queue_number++;
+						xdp_queue_number++;
+					}
 				}
 			} else {
 				efx_for_each_channel_tx_queue(tx_queue, channel) {
@@ -912,6 +914,7 @@ int efx_set_channels(struct efx_nic *efx)
 			}
 		}
 	}
+	WARN_ON(xdp_queue_number != efx->xdp_tx_queue_count);
 
 	rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);
 	if (rc)
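The round-up described in the sfc commit message above can be worked through with small numbers. The following is a rough sketch, not driver code: `assign_xdp_queues` is a hypothetical helper, and `TXQ_PER_CHANNEL = 4` is an assumed stand-in for EFX_MAX_TXQ_PER_CHANNEL chosen for illustration. It shows how more TXQs can be allocated than are used, and why the counter must stop at the requested count for the final consistency check to hold.

```python
TXQ_PER_CHANNEL = 4  # assumed stand-in for EFX_MAX_TXQ_PER_CHANNEL


def assign_xdp_queues(xdp_tx_queue_count, txq_per_channel=TXQ_PER_CHANNEL):
    """Return (allocated, used) for a requested number of XDP TX queues.

    `allocated` is the TXQ count after rounding up to whole channels;
    `used` lists the hardware queue indices that actually get a slot.
    """
    # Round up to whole channels, so a few TXQs may be left over.
    n_channels = (xdp_tx_queue_count + txq_per_channel - 1) // txq_per_channel
    allocated = n_channels * txq_per_channel

    used = []
    xdp_queue_number = 0
    for hw_queue in range(allocated):
        # Only count queues that are really used; left-overs are skipped.
        if xdp_queue_number < xdp_tx_queue_count:
            used.append(hw_queue)
            xdp_queue_number += 1

    # Counterpart of the WARN_ON in the patch: both counts must agree.
    assert xdp_queue_number == xdp_tx_queue_count
    return allocated, used


if __name__ == "__main__":
    allocated, used = assign_xdp_queues(5)
    print(allocated, used)
```

With five requested queues and four TXQs per channel, the model allocates two channels' worth of TXQs and leaves the last three unused. Incrementing the counter unconditionally, as the pre-patch code did, would make it end at the allocated count instead of the requested one.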
From: Mathias Nyman mathias.nyman@linux.intel.com
[commit b1adc42d440df3233255e313a45ab7e9b2b74096 upstream]
In several event handlers we need to find the right endpoint structure from slot_id and ep_index in the event.
Add a helper for this, check that slot_id and ep_index are valid.
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com
Link: https://lore.kernel.org/r/20210129130044.206855-6-mathias.nyman@linux.intel....
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Carsten Schmid carsten_schmid@mentor.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/host/xhci-ring.c | 58 +++++++++++++++++++++++++++++++++----------
 drivers/usb/host/xhci.h | 3 +-
 2 files changed, 47 insertions(+), 14 deletions(-)

--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -446,6 +446,26 @@ void xhci_ring_doorbell_for_active_rings
 	ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
 }
 
+static struct xhci_virt_ep *xhci_get_virt_ep(struct xhci_hcd *xhci,
+					     unsigned int slot_id,
+					     unsigned int ep_index)
+{
+	if (slot_id == 0 || slot_id >= MAX_HC_SLOTS) {
+		xhci_warn(xhci, "Invalid slot_id %u\n", slot_id);
+		return NULL;
+	}
+	if (ep_index >= EP_CTX_PER_DEV) {
+		xhci_warn(xhci, "Invalid endpoint index %u\n", ep_index);
+		return NULL;
+	}
+	if (!xhci->devs[slot_id]) {
+		xhci_warn(xhci, "No xhci virt device for slot_id %u\n", slot_id);
+		return NULL;
+	}
+
+	return &xhci->devs[slot_id]->eps[ep_index];
+}
+
 /* Get the right ring for the given slot_id, ep_index and stream_id.
  * If the endpoint supports streams, boundary check the URB's stream ID.
  * If the endpoint doesn't support streams, return the singular endpoint ring.
@@ -456,7 +476,10 @@ struct xhci_ring *xhci_triad_to_transfer
 {
 	struct xhci_virt_ep *ep;
 
-	ep = &xhci->devs[slot_id]->eps[ep_index];
+	ep = xhci_get_virt_ep(xhci, slot_id, ep_index);
+	if (!ep)
+		return NULL;
+
 	/* Common case: no streams */
 	if (!(ep->ep_state & EP_HAS_STREAMS))
 		return ep->ring;
@@ -747,11 +770,14 @@ static void xhci_handle_cmd_stop_ep(stru
 	memset(&deq_state, 0, sizeof(deq_state));
 	ep_index = TRB_TO_EP_INDEX(le32_to_cpu(trb->generic.field[3]));
 
+	ep = xhci_get_virt_ep(xhci, slot_id, ep_index);
+	if (!ep)
+		return;
+
 	vdev = xhci->devs[slot_id];
 	ep_ctx = xhci_get_ep_ctx(xhci, vdev->out_ctx, ep_index);
 	trace_xhci_handle_cmd_stop_ep(ep_ctx);
 
-	ep = &xhci->devs[slot_id]->eps[ep_index];
 	last_unlinked_td = list_last_entry(&ep->cancelled_td_list,
 			struct xhci_td, cancelled_td_list);
 
@@ -1076,9 +1102,11 @@ static void xhci_handle_cmd_set_deq(stru
 
 	ep_index = TRB_TO_EP_INDEX(le32_to_cpu(trb->generic.field[3]));
 	stream_id = TRB_TO_STREAM_ID(le32_to_cpu(trb->generic.field[2]));
-	dev = xhci->devs[slot_id];
-	ep = &dev->eps[ep_index];
+	ep = xhci_get_virt_ep(xhci, slot_id, ep_index);
+	if (!ep)
+		return;
 
+	dev = xhci->devs[slot_id];
 	ep_ring = xhci_stream_id_to_ring(dev, ep_index, stream_id);
 	if (!ep_ring) {
 		xhci_warn(xhci, "WARN Set TR deq ptr command for freed stream ID %u\n",
@@ -1151,9 +1179,9 @@ static void xhci_handle_cmd_set_deq(stru
 	}
 
 cleanup:
-	dev->eps[ep_index].ep_state &= ~SET_DEQ_PENDING;
-	dev->eps[ep_index].queued_deq_seg = NULL;
-	dev->eps[ep_index].queued_deq_ptr = NULL;
+	ep->ep_state &= ~SET_DEQ_PENDING;
+	ep->queued_deq_seg = NULL;
+	ep->queued_deq_ptr = NULL;
 	/* Restart any rings with pending URBs */
 	ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
 }
@@ -1162,10 +1190,15 @@ static void xhci_handle_cmd_reset_ep(str
 		union xhci_trb *trb, u32 cmd_comp_code)
 {
 	struct xhci_virt_device *vdev;
+	struct xhci_virt_ep *ep;
 	struct xhci_ep_ctx *ep_ctx;
 	unsigned int ep_index;
 
 	ep_index = TRB_TO_EP_INDEX(le32_to_cpu(trb->generic.field[3]));
+	ep = xhci_get_virt_ep(xhci, slot_id, ep_index);
+	if (!ep)
+		return;
+
 	vdev = xhci->devs[slot_id];
 	ep_ctx = xhci_get_ep_ctx(xhci, vdev->out_ctx, ep_index);
 	trace_xhci_handle_cmd_reset_ep(ep_ctx);
@@ -1195,7 +1228,7 @@ static void xhci_handle_cmd_reset_ep(str
 		xhci_ring_cmd_db(xhci);
 	} else {
 		/* Clear our internal halted state */
-		xhci->devs[slot_id]->eps[ep_index].ep_state &= ~EP_HALTED;
+		ep->ep_state &= ~EP_HALTED;
 	}
 
 	/* if this was a soft reset, then restart */
@@ -2364,14 +2397,13 @@ static int handle_tx_event(struct xhci_h
 	trb_comp_code = GET_COMP_CODE(le32_to_cpu(event->transfer_len));
 	ep_trb_dma = le64_to_cpu(event->buffer);
 
-	xdev = xhci->devs[slot_id];
-	if (!xdev) {
-		xhci_err(xhci, "ERROR Transfer event pointed to bad slot %u\n",
-			 slot_id);
+	ep = xhci_get_virt_ep(xhci, slot_id, ep_index);
+	if (!ep) {
+		xhci_err(xhci, "ERROR Invalid Transfer event\n");
 		goto err_out;
 	}
 
-	ep = &xdev->eps[ep_index];
+	xdev = xhci->devs[slot_id];
 	ep_ring = xhci_dma_to_transfer_ring(ep, ep_trb_dma);
 	ep_ctx = xhci_get_ep_ctx(xhci, xdev->out_ctx, ep_index);
 
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -993,6 +993,7 @@ struct xhci_interval_bw_table {
 	unsigned int ss_bw_out;
 };
 
+#define EP_CTX_PER_DEV 31
 
 struct xhci_virt_device {
 	struct usb_device *udev;
@@ -1007,7 +1008,7 @@ struct xhci_virt_device {
 	struct xhci_container_ctx *out_ctx;
 	/* Used for addressing devices and configuration changes */
 	struct xhci_container_ctx *in_ctx;
-	struct xhci_virt_ep eps[31];
+	struct xhci_virt_ep eps[EP_CTX_PER_DEV];
 	u8 fake_port;
 	u8 real_port;
 	struct xhci_interval_bw_table *bw_table;
Hi!
> From: Mathias Nyman mathias.nyman@linux.intel.com
>
> [commit b1adc42d440df3233255e313a45ab7e9b2b74096 upstream]

This is yet another variation in upstream commit marking. So far I was using these:

    ma = re.match(".*Upstream commit ([0-9a-f]*) .*", l)
    if ma:
        m.upstream = ma.group(1)
    ma = re.match("[Cc]ommit ([0-9a-f]*) upstream[.]*", l)
    if ma:
        m.upstream = ma.group(1)
    ma = re.match("[Cc]ommit: ([0-9a-f]*)", l)
    if ma:
        m.upstream = ma.group(1)

I guess I could update the second regexp to search anywhere in the line... but at that point it will also match stuff like "commit 1234 upstream is broken".

Do you have a suggestion for how to extract the upstream sha1 automatically?
Best regards, Pavel
On Wed, Jul 28, 2021 at 12:10:40PM +0200, Pavel Machek wrote:
> Hi!
>
> > From: Mathias Nyman mathias.nyman@linux.intel.com
> >
> > [commit b1adc42d440df3233255e313a45ab7e9b2b74096 upstream]
>
> This is yet another variation in upstream commit marking. So far I was using these:
>
>     ma = re.match(".*Upstream commit ([0-9a-f]*) .*", l)
>     if ma:
>         m.upstream = ma.group(1)
>     ma = re.match("[Cc]ommit ([0-9a-f]*) upstream[.]*", l)
>     if ma:
>         m.upstream = ma.group(1)
>     ma = re.match("[Cc]ommit: ([0-9a-f]*)", l)
>     if ma:
>         m.upstream = ma.group(1)
>
> I guess I could update the second regexp to search anywhere in the line... but at that point it will also match stuff like "commit 1234 upstream is broken".
>
> Do you have a suggestion for how to extract the upstream sha1 automatically?
I use: grep -E -o '[0-9a-f]{40}'
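For a script like the one quoted above, the same idea (look for a full 40-character hex string rather than a particular phrasing) could be sketched in Python as follows. This is only an illustration: `extract_upstream_sha` is a hypothetical helper name, and restricting the search to lines that mention "upstream" is an added assumption, not part of the grep one-liner.

```python
import re

# A full SHA-1 as printed by git: exactly 40 lowercase hex digits.
SHA1_RE = re.compile(r"\b[0-9a-f]{40}\b")


def extract_upstream_sha(message):
    """Return the first full SHA-1 on a line mentioning 'upstream', else None."""
    for line in message.splitlines():
        if "upstream" not in line.lower():
            continue
        found = SHA1_RE.search(line)
        if found:
            return found.group(0)
    return None


if __name__ == "__main__":
    samples = [
        "commit 5b69874f74cc5707edd95fcdaa757c507ac8af0f upstream.",
        "[ Upstream commit 788bc000d4c2f25232db19ab3a0add0ba4e27671 ]",
        "[commit b1adc42d440df3233255e313a45ab7e9b2b74096 upstream]",
        "commit 1234 upstream is broken",
    ]
    for sample in samples:
        print(extract_upstream_sha(sample))
```

Requiring all 40 digits is what keeps a line such as "commit 1234 upstream is broken" from matching, and it also skips the abbreviated hashes used in Fixes: tags.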
Hello!
On 7/26/21 10:37 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.54 release.
> [...]
Build regressions detected across plenty of architectures and configurations:
/builds/linux/net/core/dev.c:5877:51: error: use of undeclared identifier 'TC_SKB_EXT'
        struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT);
                                                       ^
/builds/linux/net/core/dev.c:5878:47: error: use of undeclared identifier 'TC_SKB_EXT'
        struct tc_skb_ext *p_ext = skb_ext_find(p, TC_SKB_EXT);
                                                   ^
/builds/linux/net/core/dev.c:5882:19: error: incomplete definition of type 'struct tc_skb_ext'
        diffs |= p_ext->chain ^ skb_ext->chain;
                 ~~~~~^
/builds/linux/net/core/dev.c:5877:11: note: forward declaration of 'struct tc_skb_ext'
        struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT);
               ^
/builds/linux/net/core/dev.c:5882:36: error: incomplete definition of type 'struct tc_skb_ext'
        diffs |= p_ext->chain ^ skb_ext->chain;
                                ~~~~~~~^
/builds/linux/net/core/dev.c:5877:11: note: forward declaration of 'struct tc_skb_ext'
        struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT);
               ^
4 errors generated.
make[3]: *** [/builds/linux/scripts/Makefile.build:280: net/core/dev.o] Error 1
make[3]: Target '__build' not remade because of errors.
make[2]: *** [/builds/linux/scripts/Makefile.build:497: net/core] Error 2
As with 5.13, it failed everywhere for the same reason. It fails on defconfig and a bunch of others, with both GCC and Clang, and across many architectures.
Greetings!
Daniel Díaz daniel.diaz@linaro.org
On 7/26/21 8:37 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.54 release.
> [...]
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On 7/26/21 9:37 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.54 release.
> [...]
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks,
-- Shuah