This is the start of the stable review cycle for the 5.4.128 release. There are 90 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 23 Jun 2021 15:48:46 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.128-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.4.128-rc1
Peter Chen peter.chen@kernel.org usb: dwc3: core: fix kernel panic when do reboot
Jack Pham jackp@codeaurora.org usb: dwc3: debugfs: Add and remove endpoint dirs dynamically
Tony Lindgren tony@atomide.com clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940
Tony Lindgren tony@atomide.com clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue
Tony Lindgren tony@atomide.com clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support
afzal mohammed afzal.mohd.ma@gmail.com ARM: OMAP: replace setup_irq() by request_irq()
Eric Auger eric.auger@redhat.com KVM: arm/arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST read
Arnaldo Carvalho de Melo acme@redhat.com tools headers UAPI: Sync linux/in.h copy with the kernel sources
Fugang Duan fugang.duan@nxp.com net: fec_ptp: add clock rate zero check
Joakim Zhang qiangqing.zhang@nxp.com net: stmmac: disable clocks in stmmac_remove_config_dt()
Andrew Morton akpm@linux-foundation.org mm/slub.c: include swab.h
Kees Cook keescook@chromium.org mm/slub: fix redzoning for small allocations
Kees Cook keescook@chromium.org mm/slub: clarify verification reporting
Nikolay Aleksandrov nikolay@nvidia.com net: bridge: fix vlan tunnel dst refcnt when egressing
Nikolay Aleksandrov nikolay@nvidia.com net: bridge: fix vlan tunnel dst null pointer dereference
Esben Haabendal esben@geanix.com net: ll_temac: Fix TX BD buffer overwrite
Esben Haabendal esben@geanix.com net: ll_temac: Make sure to free skb when it is completely used
Yifan Zhang yifan1.zhang@amd.com drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue.
Yifan Zhang yifan1.zhang@amd.com drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell.
Avraham Stern avraham.stern@intel.com cfg80211: avoid double free of PMSR request
Johannes Berg johannes.berg@intel.com cfg80211: make certificate generation more robust
Bumyong Lee bumyong.lee@samsung.com dmaengine: pl330: fix wrong usage of spinlock flags in dma_cyclc
Thomas Gleixner tglx@linutronix.de x86/fpu: Reset state for all signal restore failures
Thomas Gleixner tglx@linutronix.de x86/pkru: Write hardware init value to PKRU when xstate is init
Thomas Gleixner tglx@linutronix.de x86/process: Check PF_KTHREAD and not current->mm for kernel threads
Vineet Gupta vgupta@synopsys.com ARCv2: save ABI registers across signal handling
Sean Christopherson seanjc@google.com KVM: x86: Immediately reset the MMU context when the SMM flag is cleared
Chiqijun chiqijun@huawei.com PCI: Work around Huawei Intelligent NIC VF FLR erratum
Sriharsha Basavapatna sriharsha.basavapatna@broadcom.com PCI: Add ACS quirk for Broadcom BCM57414 NIC
Pali Rohár pali@kernel.org PCI: aardvark: Fix kernel panic during PIO transfer
Remi Pommarel repk@triplefau.lt PCI: aardvark: Don't rely on jiffies while holding spinlock
Shanker Donthineni sdonthineni@nvidia.com PCI: Mark some NVIDIA GPUs to avoid bus reset
Antti Järvinen antti.jarvinen@gmail.com PCI: Mark TI C667X to avoid bus reset
Steven Rostedt (VMware) rostedt@goodmis.org tracing: Do no increment trace_clock_global() by one
Steven Rostedt (VMware) rostedt@goodmis.org tracing: Do not stop recording comms if the trace file is being read
Steven Rostedt (VMware) rostedt@goodmis.org tracing: Do not stop recording cmdlines when tracing is off
Andrew Lunn andrew@lunn.ch usb: core: hub: Disable autosuspend for Cypress CY7C65632
Pavel Skripkin paskripkin@gmail.com can: mcba_usb: fix memory leak in mcba_usb
Oleksij Rempel linux@rempel-privat.de can: j1939: fix Use-after-Free, hold skb ref while in use
Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp can: bcm/raw/isotp: use per module netdevice notifier
Norbert Slusarek nslusarek@gmx.net can: bcm: fix infoleak in struct bcm_msg_head
Odin Ugedal odin@uged.al sched/fair: Correctly insert cfs_rq's to list on unthrottle
Riwen Lu luriwen@kylinos.cn hwmon: (scpi-hwmon) shows the negative temperature properly
Chen Li chenli@uniontech.com radeon: use memcpy_to/fromio for UVD fw upload
Sergio Paracuellos sergio.paracuellos@gmail.com pinctrl: ralink: rt2880: avoid to error in calls is pin is already enabled
Patrice Chotard patrice.chotard@foss.st.com spi: stm32-qspi: Always wait BUSY bit to be cleared in stm32_qspi_wait_cmd()
Jack Yu jack.yu@realtek.com ASoC: rt5659: Fix the lost powers for the HDA header
Axel Lin axel.lin@ingics.com regulator: bd70528: Fix off-by-one for buck123 .n_voltages setting
Pavel Skripkin paskripkin@gmail.com net: ethernet: fix potential use-after-free in ec_bhf_remove
Toke Høiland-Jørgensen toke@redhat.com icmp: don't send out ICMP messages with a source address of 0.0.0.0
Somnath Kotur somnath.kotur@broadcom.com bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path
Michael Chan michael.chan@broadcom.com bnxt_en: Rediscover PHY capabilities after firmware reset
Pavel Machek pavel@denx.de cxgb4: fix wrong shift.
Linyu Yuan linyyuan@codeaurora.org net: cdc_eem: fix tx fixup skb leak
Pavel Skripkin paskripkin@gmail.com net: hamradio: fix memory leak in mkiss_close
Christophe JAILLET christophe.jaillet@wanadoo.fr be2net: Fix an error handling path in 'be_probe()'
Eric Dumazet edumazet@google.com net/af_unix: fix a data-race in unix_dgram_sendmsg / unix_release_sock
Chengyang Fan cy.fan@huawei.com net: ipv4: fix memory leak in ip_mc_add1_src
Joakim Zhang qiangqing.zhang@nxp.com net: fec_ptp: fix issue caused by refactor the fec_devtype
Dongliang Mu mudongliangabcd@gmail.com net: usb: fix possible use-after-free in smsc75xx_bind
Aleksander Jan Bajkowski olek2@wp.pl lantiq: net: fix duplicated skb in rx descriptor ring
Maciej Żenczykowski maze@google.com net: cdc_ncm: switch to eth%d interface naming
Jakub Kicinski kuba@kernel.org ptp: improve max_adj check against unreasonable values
Pavel Skripkin paskripkin@gmail.com net: qrtr: fix OOB Read in qrtr_endpoint_post
Christophe JAILLET christophe.jaillet@wanadoo.fr netxen_nic: Fix an error handling path in 'netxen_nic_probe()'
Christophe JAILLET christophe.jaillet@wanadoo.fr qlcnic: Fix an error handling path in 'qlcnic_probe()'
Changbin Du changbin.du@intel.com net: make get_net_ns return error if NET_NS is disabled
Jisheng Zhang Jisheng.Zhang@synaptics.com net: stmmac: dwmac1000: Fix extended MAC address registers definition
Christophe JAILLET christophe.jaillet@wanadoo.fr alx: Fix an error handling path in 'alx_probe()'
Maxim Mikityanskiy maximmi@nvidia.com sch_cake: Fix out of bounds when parsing TCP options and header
Maxim Mikityanskiy maximmi@nvidia.com netfilter: synproxy: Fix out of bounds when parsing TCP options
Aya Levin ayal@nvidia.com net/mlx5e: Block offload of outer header csum for UDP tunnels
Davide Caratti dcaratti@redhat.com net/mlx5e: allow TSO on VXLAN over VLAN topologies
Maor Gottlieb maorg@nvidia.com net/mlx5: Consider RoCE cap before init RDMA resources
Dima Chumak dchumak@nvidia.com net/mlx5e: Fix page reclaim for dead peer hairpin
Huy Nguyen huyn@nvidia.com net/mlx5e: Remove dependency in IPsec initialization flows
Marcelo Ricardo Leitner marcelo.leitner@gmail.com net/sched: act_ct: handle DNAT tuple collision
Ido Schimmel idosch@nvidia.com rtnetlink: Fix regression in bridge VLAN configuration
Paolo Abeni pabeni@redhat.com udp: fix race between close() and udp_abort()
Aleksander Jan Bajkowski olek2@wp.pl net: lantiq: disable interrupt before sheduling NAPI
Pavel Skripkin paskripkin@gmail.com net: rds: fix memory leak in rds_recvmsg
Nicolas Dichtel nicolas.dichtel@6wind.com vrf: fix maximum MTU
Nanyong Sun sunnanyong@huawei.com net: ipv4: fix memory leak in netlbl_cipsov4_add_std
Sven Eckelmann sven@narfation.org batman-adv: Avoid WARN_ON timing related checks
Jim Mattson jmattson@google.com kvm: LAPIC: Restore guard to prevent illegal APIC register access
yangerkun yangerkun@huawei.com mm/memory-failure: make sure wait for page writeback in memory_failure
Dan Carpenter dan.carpenter@oracle.com afs: Fix an IS_ERR() vs NULL check
Yang Yingliang yangyingliang@huawei.com dmaengine: stedma40: add missing iounmap() on error in d40_probe()
Randy Dunlap rdunlap@infradead.org dmaengine: QCOM_HIDMA_MGMT depends on HAS_IOMEM
Randy Dunlap rdunlap@infradead.org dmaengine: ALTERA_MSGDMA depends on HAS_IOMEM
-------------
Diffstat:
Documentation/vm/slub.rst | 10 +- Makefile | 4 +- arch/arc/include/uapi/asm/sigcontext.h | 1 + arch/arc/kernel/signal.c | 43 +++++ arch/arm/boot/dts/dra7-l4.dtsi | 4 +- arch/arm/boot/dts/dra7.dtsi | 20 +++ arch/arm/mach-omap1/pm.c | 13 +- arch/arm/mach-omap1/time.c | 10 +- arch/arm/mach-omap1/timer32k.c | 10 +- arch/arm/mach-omap2/board-generic.c | 4 +- arch/arm/mach-omap2/timer.c | 181 ++++++++++++++------- arch/x86/include/asm/fpu/internal.h | 13 +- arch/x86/kernel/fpu/signal.c | 26 +-- arch/x86/kvm/lapic.c | 3 + arch/x86/kvm/x86.c | 5 +- drivers/clk/ti/clk-7xx.c | 1 + drivers/dma/Kconfig | 1 + drivers/dma/pl330.c | 6 +- drivers/dma/qcom/Kconfig | 1 + drivers/dma/ste_dma40.c | 3 + drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 +- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 +- drivers/gpu/drm/radeon/radeon_uvd.c | 4 +- drivers/hwmon/scpi-hwmon.c | 9 + drivers/net/can/usb/mcba_usb.c | 17 +- drivers/net/ethernet/atheros/alx/main.c | 1 + drivers/net/ethernet/broadcom/bnxt/bnxt.c | 6 + drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 2 +- drivers/net/ethernet/ec_bhf.c | 4 +- drivers/net/ethernet/emulex/benet/be_main.c | 1 + drivers/net/ethernet/freescale/fec_ptp.c | 8 +- drivers/net/ethernet/lantiq_xrx200.c | 5 +- .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 3 - drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/rdma.c | 3 + drivers/net/ethernet/mellanox/mlx5/core/transobj.c | 30 +++- .../net/ethernet/qlogic/netxen/netxen_nic_main.c | 2 + drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 1 + drivers/net/ethernet/stmicro/stmmac/dwmac1000.h | 8 +- .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 2 + drivers/net/ethernet/xilinx/ll_temac_main.c | 8 +- drivers/net/hamradio/mkiss.c | 1 + drivers/net/usb/cdc_eem.c | 2 +- drivers/net/usb/cdc_ncm.c | 2 +- drivers/net/usb/smsc75xx.c | 10 +- drivers/net/vrf.c | 6 +- drivers/pci/controller/pci-aardvark.c | 59 +++++-- drivers/pci/quirks.c | 89 ++++++++++ drivers/ptp/ptp_clock.c | 6 +- drivers/spi/spi-stm32-qspi.c | 5 +- drivers/staging/mt7621-pinctrl/pinctrl-rt2880.c | 2 +- drivers/usb/core/hub.c | 7 + drivers/usb/dwc3/core.c | 2 +- drivers/usb/dwc3/debug.h | 3 + drivers/usb/dwc3/debugfs.c | 21 +-- drivers/usb/dwc3/gadget.c | 3 + fs/afs/main.c | 4 +- include/linux/cpuhotplug.h | 1 + include/linux/mfd/rohm-bd70528.h | 4 +- include/linux/mlx5/transobj.h | 1 + include/linux/ptp_clock_kernel.h | 2 +- include/linux/socket.h | 2 - include/net/net_namespace.h | 7 + include/uapi/linux/in.h | 3 + kernel/sched/fair.c | 44 ++--- kernel/trace/trace.c | 11 -- kernel/trace/trace_clock.c | 6 +- mm/memory-failure.c | 7 +- mm/slab_common.c | 3 +- mm/slub.c | 23 +-- net/batman-adv/bat_iv_ogm.c | 4 +- net/bridge/br_private.h | 4 +- net/bridge/br_vlan_tunnel.c | 38 +++-- net/can/bcm.c | 62 +++++-- net/can/j1939/transport.c | 54 ++++-- net/can/raw.c | 62 +++++-- net/core/net_namespace.c | 12 ++ net/core/rtnetlink.c | 8 +- net/ipv4/cipso_ipv4.c | 1 + net/ipv4/icmp.c | 7 + net/ipv4/igmp.c | 1 + net/ipv4/udp.c | 10 ++ net/ipv6/udp.c | 3 + net/netfilter/nf_synproxy_core.c | 5 + net/qrtr/qrtr.c | 2 +- net/rds/recv.c | 2 +- net/sched/act_ct.c | 21 ++- net/sched/sch_cake.c | 6 +- net/socket.c | 13 -- net/unix/af_unix.c | 7 +- net/wireless/Makefile | 2 +- net/wireless/pmsr.c | 16 +- sound/soc/codecs/rt5659.c | 26 ++- tools/include/uapi/linux/in.h | 3 + virt/kvm/arm/vgic/vgic-kvm-device.c | 4 +- 96 files changed, 865 insertions(+), 339 deletions(-)
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit 253697b93c2a1c237d34d3ae326e394aeb0ca7b3 ]
When CONFIG_HAS_IOMEM is not set/enabled, certain iomap() family functions [including ioremap(), devm_ioremap(), etc.] are not available. Drivers that use these functions should depend on HAS_IOMEM so that they do not cause build errors.
Repairs this build error: s390-linux-ld: drivers/dma/altera-msgdma.o: in function `request_and_map': altera-msgdma.c:(.text+0x14b0): undefined reference to `devm_ioremap'
Fixes: a85c6f1b2921 ("dmaengine: Add driver for Altera / Intel mSGDMA IP core") Signed-off-by: Randy Dunlap rdunlap@infradead.org Reported-by: kernel test robot lkp@intel.com Cc: Stefan Roese sr@denx.de Cc: Vinod Koul vkoul@kernel.org Cc: dmaengine@vger.kernel.org Reviewed-by: Stefan Roese sr@denx.de Phone: (+49)-8142-66989-51 Fax: (+49)-8142-66989-80 Email: sr@denx.de Link: https://lore.kernel.org/r/20210522021313.16405-2-rdunlap@infradead.org Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 7af874b69ffb..a32d0d715247 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -59,6 +59,7 @@ config DMA_OF #devices config ALTERA_MSGDMA tristate "Altera / Intel mSGDMA Engine" + depends on HAS_IOMEM select DMA_ENGINE help Enable support for Altera / Intel mSGDMA controller.
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit 0cfbb589d67f16fa55b26ae02b69c31b52e344b1 ]
When CONFIG_HAS_IOMEM is not set/enabled, certain iomap() family functions [including ioremap(), devm_ioremap(), etc.] are not available. Drivers that use these functions should depend on HAS_IOMEM so that they do not cause build errors.
Rectifies these build errors: s390-linux-ld: drivers/dma/qcom/hidma_mgmt.o: in function `hidma_mgmt_probe': hidma_mgmt.c:(.text+0x780): undefined reference to `devm_ioremap_resource' s390-linux-ld: drivers/dma/qcom/hidma_mgmt.o: in function `hidma_mgmt_init': hidma_mgmt.c:(.init.text+0x126): undefined reference to `of_address_to_resource' s390-linux-ld: hidma_mgmt.c:(.init.text+0x16e): undefined reference to `of_address_to_resource'
Fixes: 67a2003e0607 ("dmaengine: add Qualcomm Technologies HIDMA channel driver") Signed-off-by: Randy Dunlap rdunlap@infradead.org Reported-by: kernel test robot lkp@intel.com Cc: Sinan Kaya okaya@codeaurora.org Cc: Vinod Koul vkoul@kernel.org Cc: dmaengine@vger.kernel.org Link: https://lore.kernel.org/r/20210522021313.16405-3-rdunlap@infradead.org Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/qcom/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/dma/qcom/Kconfig b/drivers/dma/qcom/Kconfig index 1d189438aeb0..bef309ef6a71 100644 --- a/drivers/dma/qcom/Kconfig +++ b/drivers/dma/qcom/Kconfig @@ -10,6 +10,7 @@ config QCOM_BAM_DMA
config QCOM_HIDMA_MGMT tristate "Qualcomm Technologies HIDMA Management support" + depends on HAS_IOMEM select DMA_ENGINE help Enable support for the Qualcomm Technologies HIDMA Management.
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit fffdaba402cea79b8d219355487d342ec23f91c6 ]
Add the missing iounmap() before return from d40_probe() in the error handling case.
Fixes: 8d318a50b3d7 ("DMAENGINE: Support for ST-Ericssons DMA40 block v3") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Link: https://lore.kernel.org/r/20210518141108.1324127-1-yangyingliang@huawei.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/ste_dma40.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/dma/ste_dma40.c b/drivers/dma/ste_dma40.c index de8bfd9a76e9..6671bfe08489 100644 --- a/drivers/dma/ste_dma40.c +++ b/drivers/dma/ste_dma40.c @@ -3678,6 +3678,9 @@ static int __init d40_probe(struct platform_device *pdev)
kfree(base->lcla_pool.base_unaligned);
+ if (base->lcpa_base) + iounmap(base->lcpa_base); + if (base->phy_lcpa) release_mem_region(base->phy_lcpa, base->lcpa_size);
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit a33d62662d275cee22888fa7760fe09d5b9cd1f9 ]
The proc_symlink() function returns NULL on error, it doesn't return error pointers.
Fixes: 5b86d4ff5dce ("afs: Implement network namespacing") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: David Howells dhowells@redhat.com cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/YLjMRKX40pTrJvgf@mwanda/ Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/afs/main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/afs/main.c b/fs/afs/main.c index 5cd26af2464c..d129a1a49616 100644 --- a/fs/afs/main.c +++ b/fs/afs/main.c @@ -196,8 +196,8 @@ static int __init afs_init(void) goto error_fs;
afs_proc_symlink = proc_symlink("fs/afs", NULL, "../self/net/afs"); - if (IS_ERR(afs_proc_symlink)) { - ret = PTR_ERR(afs_proc_symlink); + if (!afs_proc_symlink) { + ret = -ENOMEM; goto error_proc; }
From: yangerkun yangerkun@huawei.com
[ Upstream commit e8675d291ac007e1c636870db880f837a9ea112a ]
Our syzkaller trigger the "BUG_ON(!list_empty(&inode->i_wb_list))" in clear_inode:
kernel BUG at fs/inode.c:519! Internal error: Oops - BUG: 0 [#1] SMP Modules linked in: Process syz-executor.0 (pid: 249, stack limit = 0x00000000a12409d7) CPU: 1 PID: 249 Comm: syz-executor.0 Not tainted 4.19.95 Hardware name: linux,dummy-virt (DT) pstate: 80000005 (Nzcv daif -PAN -UAO) pc : clear_inode+0x280/0x2a8 lr : clear_inode+0x280/0x2a8 Call trace: clear_inode+0x280/0x2a8 ext4_clear_inode+0x38/0xe8 ext4_free_inode+0x130/0xc68 ext4_evict_inode+0xb20/0xcb8 evict+0x1a8/0x3c0 iput+0x344/0x460 do_unlinkat+0x260/0x410 __arm64_sys_unlinkat+0x6c/0xc0 el0_svc_common+0xdc/0x3b0 el0_svc_handler+0xf8/0x160 el0_svc+0x10/0x218 Kernel panic - not syncing: Fatal exception
A crash dump of this problem show that someone called __munlock_pagevec to clear page LRU without lock_page: do_mmap -> mmap_region -> do_munmap -> munlock_vma_pages_range -> __munlock_pagevec.
As a result memory_failure will call identify_page_state without wait_on_page_writeback. And after truncate_error_page clear the mapping of this page. end_page_writeback won't call sb_clear_inode_writeback to clear inode->i_wb_list. That will trigger BUG_ON in clear_inode!
Fix it by checking PageWriteback too to help determine should we skip wait_on_page_writeback.
Link: https://lkml.kernel.org/r/20210604084705.3729204-1-yangerkun@huawei.com Fixes: 0bc1f8b0682c ("hwpoison: fix the handling path of the victimized page frame that belong to non-LRU") Signed-off-by: yangerkun yangerkun@huawei.com Acked-by: Naoya Horiguchi naoya.horiguchi@nec.com Cc: Jan Kara jack@suse.cz Cc: Theodore Ts'o tytso@mit.edu Cc: Oscar Salvador osalvador@suse.de Cc: Yu Kuai yukuai3@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/memory-failure.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index d823ec74f3fc..9030ab0d9d97 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1382,7 +1382,12 @@ int memory_failure(unsigned long pfn, int flags) return 0; }
- if (!PageTransTail(p) && !PageLRU(p)) + /* + * __munlock_pagevec may clear a writeback page's LRU flag without + * page_lock. We need wait writeback completion for this page or it + * may trigger vfs BUG while evict inode. + */ + if (!PageTransTail(p) && !PageLRU(p) && !PageWriteback(p)) goto identify_page_state;
/*
From: Jim Mattson jmattson@google.com
[ Upstream commit 218bf772bddd221489c38dde6ef8e917131161f6 ]
Per the SDM, "any access that touches bytes 4 through 15 of an APIC register may cause undefined behavior and must not be executed." Worse, such an access in kvm_lapic_reg_read can result in a leak of kernel stack contents. Prior to commit 01402cf81051 ("kvm: LAPIC: write down valid APIC registers"), such an access was explicitly disallowed. Restore the guard that was removed in that commit.
Fixes: 01402cf81051 ("kvm: LAPIC: write down valid APIC registers") Signed-off-by: Jim Mattson jmattson@google.com Reported-by: syzbot syzkaller@googlegroups.com Message-Id: 20210602205224.3189316-1-jmattson@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/lapic.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 3f6b866c644d..eea2d6f10f59 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1332,6 +1332,9 @@ int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len, if (!apic_x2apic_mode(apic)) valid_reg_mask |= APIC_REG_MASK(APIC_ARBPRI);
+ if (alignment + len > 4) + return 1; + if (offset > 0x3f0 || !(valid_reg_mask & APIC_REG_MASK(offset))) return 1;
From: Sven Eckelmann sven@narfation.org
[ Upstream commit 9f460ae31c4435fd022c443a6029352217a16ac1 ]
The soft/batadv interface for a queued OGM can be changed during the time the OGM was queued for transmission and when the OGM is actually transmitted by the worker.
But WARN_ON must be used to denote kernel bugs and not to print simple warnings. A warning can simply be printed using pr_warn.
Reported-by: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp Reported-by: syzbot+c0b807de416427ff3dd1@syzkaller.appspotmail.com Fixes: ef0a937f7a14 ("batman-adv: consider outgoing interface in OGM sending") Signed-off-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich sw@simonwunderlich.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/batman-adv/bat_iv_ogm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c index d88a4de02237..8be8a1feca84 100644 --- a/net/batman-adv/bat_iv_ogm.c +++ b/net/batman-adv/bat_iv_ogm.c @@ -409,8 +409,10 @@ static void batadv_iv_ogm_emit(struct batadv_forw_packet *forw_packet) if (WARN_ON(!forw_packet->if_outgoing)) return;
- if (WARN_ON(forw_packet->if_outgoing->soft_iface != soft_iface)) + if (forw_packet->if_outgoing->soft_iface != soft_iface) { + pr_warn("%s: soft interface switch for queued OGM\n", __func__); return; + }
if (forw_packet->if_incoming->if_status != BATADV_IF_ACTIVE) return;
From: Nanyong Sun sunnanyong@huawei.com
[ Upstream commit d612c3f3fae221e7ea736d196581c2217304bbbc ]
Reported by syzkaller: BUG: memory leak unreferenced object 0xffff888105df7000 (size 64): comm "syz-executor842", pid 360, jiffies 4294824824 (age 22.546s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000e67ed558>] kmalloc include/linux/slab.h:590 [inline] [<00000000e67ed558>] kzalloc include/linux/slab.h:720 [inline] [<00000000e67ed558>] netlbl_cipsov4_add_std net/netlabel/netlabel_cipso_v4.c:145 [inline] [<00000000e67ed558>] netlbl_cipsov4_add+0x390/0x2340 net/netlabel/netlabel_cipso_v4.c:416 [<0000000006040154>] genl_family_rcv_msg_doit.isra.0+0x20e/0x320 net/netlink/genetlink.c:739 [<00000000204d7a1c>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] [<00000000204d7a1c>] genl_rcv_msg+0x2bf/0x4f0 net/netlink/genetlink.c:800 [<00000000c0d6a995>] netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2504 [<00000000d78b9d2c>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811 [<000000009733081b>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] [<000000009733081b>] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1340 [<00000000d5fd43b8>] netlink_sendmsg+0x789/0xc70 net/netlink/af_netlink.c:1929 [<000000000a2d1e40>] sock_sendmsg_nosec net/socket.c:654 [inline] [<000000000a2d1e40>] sock_sendmsg+0x139/0x170 net/socket.c:674 [<00000000321d1969>] ____sys_sendmsg+0x658/0x7d0 net/socket.c:2350 [<00000000964e16bc>] ___sys_sendmsg+0xf8/0x170 net/socket.c:2404 [<000000001615e288>] __sys_sendmsg+0xd3/0x190 net/socket.c:2433 [<000000004ee8b6a5>] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:47 [<00000000171c7cee>] entry_SYSCALL_64_after_hwframe+0x44/0xae
The memory of doi_def->map.std pointing is allocated in netlbl_cipsov4_add_std, but no place has freed it. It should be freed in cipso_v4_doi_free which frees the cipso DOI resource.
Fixes: 96cb8e3313c7a ("[NetLabel]: CIPSOv4 and Unlabeled packet integration") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Nanyong Sun sunnanyong@huawei.com Acked-by: Paul Moore paul@paul-moore.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/cipso_ipv4.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c index e290a0c9e928..c1ac802d6894 100644 --- a/net/ipv4/cipso_ipv4.c +++ b/net/ipv4/cipso_ipv4.c @@ -472,6 +472,7 @@ void cipso_v4_doi_free(struct cipso_v4_doi *doi_def) kfree(doi_def->map.std->lvl.local); kfree(doi_def->map.std->cat.cipso); kfree(doi_def->map.std->cat.local); + kfree(doi_def->map.std); break; } kfree(doi_def);
From: Nicolas Dichtel nicolas.dichtel@6wind.com
[ Upstream commit 9bb392f62447d73cc7dd7562413a2cd9104c82f8 ]
My initial goal was to fix the default MTU, which is set to 65536, ie above the maximum defined in the driver: 65535 (ETH_MAX_MTU).
In fact, it's seems more consistent, wrt min_mtu, to set the max_mtu to IP6_MAX_MTU (65535 + sizeof(struct ipv6hdr)) and use it by default.
Let's also, for consistency, set the mtu in vrf_setup(). This function calls ether_setup(), which set the mtu to 1500. Thus, the whole mtu config is done in the same function.
Before the patch: $ ip link add blue type vrf table 1234 $ ip link list blue 9: blue: <NOARP,MASTER> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether fa:f5:27:70:24:2a brd ff:ff:ff:ff:ff:ff $ ip link set dev blue mtu 65535 $ ip link set dev blue mtu 65536 Error: mtu greater than device maximum.
Fixes: 5055376a3b44 ("net: vrf: Fix ping failed when vrf mtu is set to 0") CC: Miaohe Lin linmiaohe@huawei.com Signed-off-by: Nicolas Dichtel nicolas.dichtel@6wind.com Reviewed-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/vrf.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 14dfb9278345..1267786d2931 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -908,9 +908,6 @@ static int vrf_dev_init(struct net_device *dev)
dev->flags = IFF_MASTER | IFF_NOARP;
- /* MTU is irrelevant for VRF device; set to 64k similar to lo */ - dev->mtu = 64 * 1024; - /* similarly, oper state is irrelevant; set to up to avoid confusion */ dev->operstate = IF_OPER_UP; return 0; @@ -1343,7 +1340,8 @@ static void vrf_setup(struct net_device *dev) * which breaks networking. */ dev->min_mtu = IPV6_MIN_MTU; - dev->max_mtu = ETH_MAX_MTU; + dev->max_mtu = IP6_MAX_MTU; + dev->mtu = dev->max_mtu; }
static int vrf_validate(struct nlattr *tb[], struct nlattr *data[],
From: Pavel Skripkin paskripkin@gmail.com
[ Upstream commit 49bfcbfd989a8f1f23e705759a6bb099de2cff9f ]
Syzbot reported memory leak in rds. The problem was in unputted refcount in case of error.
int rds_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int msg_flags) { ...
if (!rds_next_incoming(rs, &inc)) { ... }
After this "if" inc refcount incremented and
if (rds_cmsg_recv(inc, msg, rs)) { ret = -EFAULT; goto out; } ... out: return ret; }
in case of rds_cmsg_recv() fail the refcount won't be decremented. And it's easy to see from ftrace log, that rds_inc_addref() don't have rds_inc_put() pair in rds_recvmsg() after rds_cmsg_recv()
1) | rds_recvmsg() { 1) 3.721 us | rds_inc_addref(); 1) 3.853 us | rds_message_inc_copy_to_user(); 1) + 10.395 us | rds_cmsg_recv(); 1) + 34.260 us | }
Fixes: bdbe6fbc6a2f ("RDS: recv.c") Reported-and-tested-by: syzbot+5134cdf021c4ed5aaa5f@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin paskripkin@gmail.com Reviewed-by: Håkon Bugge haakon.bugge@oracle.com Acked-by: Santosh Shilimkar santosh.shilimkar@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rds/recv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rds/recv.c b/net/rds/recv.c index aba4afe4dfed..967d115f97ef 100644 --- a/net/rds/recv.c +++ b/net/rds/recv.c @@ -714,7 +714,7 @@ int rds_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
if (rds_cmsg_recv(inc, msg, rs)) { ret = -EFAULT; - goto out; + break; } rds_recvmsg_zcookie(rs, msg);
From: Aleksander Jan Bajkowski olek2@wp.pl
[ Upstream commit f2386cf7c5f4ff5d7b584f5d92014edd7df6c676 ]
This patch fixes TX hangs with threaded NAPI enabled. The scheduled NAPI seems to be executed in parallel with the interrupt on second thread. Sometimes it happens that ltq_dma_disable_irq() is executed after xrx200_tx_housekeeping(). The symptom is that TX interrupts are disabled in the DMA controller. As a result, the TX hangs after a few seconds of the iperf test. Scheduling NAPI after disabling interrupts fixes this issue.
Tested on Lantiq xRX200 (BT Home Hub 5A).
Fixes: 9423361da523 ("net: lantiq: Disable IRQs only if NAPI gets scheduled ") Signed-off-by: Aleksander Jan Bajkowski olek2@wp.pl Acked-by: Hauke Mehrtens hauke@hauke-m.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/lantiq_xrx200.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/lantiq_xrx200.c b/drivers/net/ethernet/lantiq_xrx200.c index 6ece99e6b6dd..8d073b895fc0 100644 --- a/drivers/net/ethernet/lantiq_xrx200.c +++ b/drivers/net/ethernet/lantiq_xrx200.c @@ -351,8 +351,8 @@ static irqreturn_t xrx200_dma_irq(int irq, void *ptr) struct xrx200_chan *ch = ptr;
if (napi_schedule_prep(&ch->napi)) { - __napi_schedule(&ch->napi); ltq_dma_disable_irq(&ch->dma); + __napi_schedule(&ch->napi); }
ltq_dma_ack_irq(&ch->dma);
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit a8b897c7bcd47f4147d066e22cc01d1026d7640e ]
Kaustubh reported and diagnosed a panic in udp_lib_lookup(). The root cause is udp_abort() racing with close(). Both racing functions acquire the socket lock, but udp{v6}_destroy_sock() release it before performing destructive actions.
We can't easily extend the socket lock scope to avoid the race, instead use the SOCK_DEAD flag to prevent udp_abort from doing any action when the critical race happens.
Diagnosed-and-tested-by: Kaustubh Pandey kapandey@codeaurora.org Fixes: 5d77dca82839 ("net: diag: support SOCK_DESTROY for UDP sockets") Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/udp.c | 10 ++++++++++ net/ipv6/udp.c | 3 +++ 2 files changed, 13 insertions(+)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 24841a9e9966..4644f86c932f 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2511,6 +2511,9 @@ void udp_destroy_sock(struct sock *sk) { struct udp_sock *up = udp_sk(sk); bool slow = lock_sock_fast(sk); + + /* protects from races with udp_abort() */ + sock_set_flag(sk, SOCK_DEAD); udp_flush_pending_frames(sk); unlock_sock_fast(sk, slow); if (static_branch_unlikely(&udp_encap_needed_key)) { @@ -2770,10 +2773,17 @@ int udp_abort(struct sock *sk, int err) { lock_sock(sk);
+ /* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing + * with close() + */ + if (sock_flag(sk, SOCK_DEAD)) + goto out; + sk->sk_err = err; sk->sk_error_report(sk); __udp_disconnect(sk, 0);
+out: release_sock(sk);
return 0; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 6762430280f5..3c94b81bb459 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1539,6 +1539,9 @@ void udpv6_destroy_sock(struct sock *sk) { struct udp_sock *up = udp_sk(sk); lock_sock(sk); + + /* protects from races with udp_abort() */ + sock_set_flag(sk, SOCK_DEAD); udp_v6_flush_pending_frames(sk); release_sock(sk);
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit d2e381c4963663bca6f30c3b996fa4dbafe8fcb5 ]
Cited commit started returning errors when notification info is not filled by the bridge driver, resulting in the following regression:
# ip link add name br1 type bridge vlan_filtering 1 # bridge vlan add dev br1 vid 555 self pvid untagged RTNETLINK answers: Invalid argument
As long as the bridge driver does not fill notification info for the bridge device itself, an empty notification should not be considered as an error. This is explained in commit 59ccaaaa49b5 ("bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify").
Fix by removing the error and add a comment to avoid future bugs.
Fixes: a8db57c1d285 ("rtnetlink: Fix missing error code in rtnl_bridge_notify()") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov nikolay@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/rtnetlink.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index bdeb169a7a99..0bad5db23129 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -4535,10 +4535,12 @@ static int rtnl_bridge_notify(struct net_device *dev) if (err < 0) goto errout;
- if (!skb->len) { - err = -EINVAL; + /* Notification info is only filled for bridge ports, not the bridge + * device itself. Therefore, a zero notification length is valid and + * should not result in an error. + */ + if (!skb->len) goto errout; - }
rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC); return 0;
From: Marcelo Ricardo Leitner marcelo.leitner@gmail.com
[ Upstream commit 13c62f5371e3eb4fc3400cfa26e64ca75f888008 ]
This this the counterpart of 8aa7b526dc0b ("openvswitch: handle DNAT tuple collision") for act_ct. From that commit changelog:
""" With multiple DNAT rules it's possible that after destination translation the resulting tuples collide.
...
Netfilter handles this case by allocating a null binding for SNAT at egress by default. Perform the same operation in openvswitch for DNAT if no explicit SNAT is requested by the user and allocate a null binding for SNAT for packets in the "original" direction. """
Fixes: 95219afbb980 ("act_ct: support asymmetric conntrack") Signed-off-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/act_ct.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 31eb8eefc868..16c4cbf6d1f0 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -361,14 +361,19 @@ static int tcf_ct_act_nat(struct sk_buff *skb, }
err = ct_nat_execute(skb, ct, ctinfo, range, maniptype); - if (err == NF_ACCEPT && - ct->status & IPS_SRC_NAT && ct->status & IPS_DST_NAT) { - if (maniptype == NF_NAT_MANIP_SRC) - maniptype = NF_NAT_MANIP_DST; - else - maniptype = NF_NAT_MANIP_SRC; - - err = ct_nat_execute(skb, ct, ctinfo, range, maniptype); + if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) { + if (ct->status & IPS_SRC_NAT) { + if (maniptype == NF_NAT_MANIP_SRC) + maniptype = NF_NAT_MANIP_DST; + else + maniptype = NF_NAT_MANIP_SRC; + + err = ct_nat_execute(skb, ct, ctinfo, range, + maniptype); + } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) { + err = ct_nat_execute(skb, ct, ctinfo, NULL, + NF_NAT_MANIP_SRC); + } } return err; #else
From: Huy Nguyen huyn@nvidia.com
[ Upstream commit 8ad893e516a77209a1818a2072d2027d87db809f ]
Currently, IPsec feature is disabled because mlx5e_build_nic_netdev is required to be called after mlx5e_ipsec_init. This requirement is invalid as mlx5e_build_nic_netdev and mlx5e_ipsec_init initialize independent resources.
Remove ipsec pointer check in mlx5e_build_nic_netdev so that the two functions can be called at any order.
Fixes: 547eede070eb ("net/mlx5e: IPSec, Innova IPSec offload infrastructure") Signed-off-by: Huy Nguyen huyn@nvidia.com Reviewed-by: Raed Salem raeds@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index cf58c9637904..c467f5e981f6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -515,9 +515,6 @@ void mlx5e_ipsec_build_netdev(struct mlx5e_priv *priv) struct mlx5_core_dev *mdev = priv->mdev; struct net_device *netdev = priv->netdev;
- if (!priv->ipsec) - return; - if (!(mlx5_accel_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_CAP_ESP) || !MLX5_CAP_ETH(mdev, swp)) { mlx5_core_dbg(mdev, "mlx5e: ESP and SWP offload not supported\n");
From: Dima Chumak dchumak@nvidia.com
[ Upstream commit a3e5fd9314dfc4314a9567cde96e1aef83a7458a ]
When adding a hairpin flow, a firmware-side send queue is created for the peer net device, which claims some host memory pages for its internal ring buffer. If the peer net device is removed/unbound before the hairpin flow is deleted, then the send queue is not destroyed which leads to a stack trace on pci device remove:
[ 748.005230] mlx5_core 0000:08:00.2: wait_func:1094:(pid 12985): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource [ 748.005231] mlx5_core 0000:08:00.2: reclaim_pages:514:(pid 12985): failed reclaiming pages: err -110 [ 748.001835] mlx5_core 0000:08:00.2: mlx5_reclaim_root_pages:653:(pid 12985): failed reclaiming pages (-110) for func id 0x0 [ 748.002171] ------------[ cut here ]------------ [ 748.001177] FW pages counter is 4 after reclaiming all pages [ 748.001186] WARNING: CPU: 1 PID: 12985 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:685 mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core] [ +0.002771] Modules linked in: cls_flower mlx5_ib mlx5_core ptp pps_core act_mirred sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay fuse [last unloaded: pps_core] [ 748.007225] CPU: 1 PID: 12985 Comm: tee Not tainted 5.12.0+ #1 [ 748.001376] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 748.002315] RIP: 0010:mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core] [ 748.001679] Code: 28 00 00 00 0f 85 22 01 00 00 48 81 c4 b0 00 00 00 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 40 cc 19 a1 e8 9f 71 0e e2 <0f> 0b e9 30 ff ff ff 48 c7 c7 a0 cc 19 a1 e8 8c 71 0e e2 0f 0b e9 [ 748.003781] RSP: 0018:ffff88815220faf8 EFLAGS: 00010286 [ 748.001149] RAX: 0000000000000000 RBX: ffff8881b4900280 RCX: 0000000000000000 [ 748.001445] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102a441f51 [ 748.001614] RBP: 00000000000032b9 R08: 0000000000000001 R09: ffffed1054a15ee8 [ 748.001446] R10: ffff8882a50af73b R11: ffffed1054a15ee7 R12: fffffbfff07c1e30 [ 748.001447] R13: dffffc0000000000 R14: ffff8881b492cba8 R15: 0000000000000000 [ 748.001429] FS: 00007f58bd08b580(0000) GS:ffff8882a5080000(0000) knlGS:0000000000000000 [ 748.001695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 748.001309] CR2: 000055a026351740 CR3: 00000001d3b48006 CR4: 0000000000370ea0 [ 748.001506] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 748.001483] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 748.001654] Call Trace: [ 748.000576] ? mlx5_satisfy_startup_pages+0x290/0x290 [mlx5_core] [ 748.001416] ? mlx5_cmd_teardown_hca+0xa2/0xd0 [mlx5_core] [ 748.001354] ? mlx5_cmd_init_hca+0x280/0x280 [mlx5_core] [ 748.001203] mlx5_function_teardown+0x30/0x60 [mlx5_core] [ 748.001275] mlx5_uninit_one+0xa7/0xc0 [mlx5_core] [ 748.001200] remove_one+0x5f/0xc0 [mlx5_core] [ 748.001075] pci_device_remove+0x9f/0x1d0 [ 748.000833] device_release_driver_internal+0x1e0/0x490 [ 748.001207] unbind_store+0x19f/0x200 [ 748.000942] ? sysfs_file_ops+0x170/0x170 [ 748.001000] kernfs_fop_write_iter+0x2bc/0x450 [ 748.000970] new_sync_write+0x373/0x610 [ 748.001124] ? new_sync_read+0x600/0x600 [ 748.001057] ? lock_acquire+0x4d6/0x700 [ 748.000908] ? lockdep_hardirqs_on_prepare+0x400/0x400 [ 748.001126] ? fd_install+0x1c9/0x4d0 [ 748.000951] vfs_write+0x4d0/0x800 [ 748.000804] ksys_write+0xf9/0x1d0 [ 748.000868] ? __x64_sys_read+0xb0/0xb0 [ 748.000811] ? filp_open+0x50/0x50 [ 748.000919] ? syscall_enter_from_user_mode+0x1d/0x50 [ 748.001223] do_syscall_64+0x3f/0x80 [ 748.000892] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 748.001026] RIP: 0033:0x7f58bcfb22f7 [ 748.000944] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 748.003925] RSP: 002b:00007fffd7f2aaa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 748.001732] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f58bcfb22f7 [ 748.001426] RDX: 000000000000000d RSI: 00007fffd7f2abc0 RDI: 0000000000000003 [ 748.001746] RBP: 00007fffd7f2abc0 R08: 0000000000000000 R09: 0000000000000001 [ 748.001631] R10: 00000000000001b6 R11: 0000000000000246 R12: 000000000000000d [ 748.001537] R13: 00005597ac2c24a0 R14: 000000000000000d R15: 00007f58bd084700 [ 748.001564] irq event stamp: 0 [ 748.000787] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [ 748.001399] hardirqs last disabled at (0): [<ffffffff813132cf>] copy_process+0x146f/0x5eb0 [ 748.001854] softirqs last enabled at (0): [<ffffffff8131330e>] copy_process+0x14ae/0x5eb0 [ 748.013431] softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 748.001492] ---[ end trace a6fabd773d1c51ae ]---
Fix by destroying the send queue of a hairpin peer net device that is being removed/unbound, which returns the allocated ring buffer pages to the host.
Fixes: 4d8fcf216c90 ("net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules") Signed-off-by: Dima Chumak dchumak@nvidia.com Reviewed-by: Roi Dayan roid@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en_tc.c | 2 +- .../ethernet/mellanox/mlx5/core/transobj.c | 30 +++++++++++++++---- include/linux/mlx5/transobj.h | 1 + 3 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index fe7342e8a043..9d26463f3fa5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -4079,7 +4079,7 @@ static void mlx5e_tc_hairpin_update_dead_peer(struct mlx5e_priv *priv, list_for_each_entry_safe(hpe, tmp, &init_wait_list, dead_peer_wait_list) { wait_for_completion(&hpe->res_ready); if (!IS_ERR_OR_NULL(hpe->hp) && hpe->peer_vhca_id == peer_vhca_id) - hpe->hp->pair->peer_gone = true; + mlx5_core_hairpin_clear_dead_peer(hpe->hp->pair);
mlx5e_hairpin_put(priv, hpe); } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/transobj.c b/drivers/net/ethernet/mellanox/mlx5/core/transobj.c index b1068500f1df..0b5437051a3b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/transobj.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/transobj.c @@ -457,6 +457,15 @@ err_modify_sq: return err; }
+static void mlx5_hairpin_unpair_peer_sq(struct mlx5_hairpin *hp) +{ + int i; + + for (i = 0; i < hp->num_channels; i++) + mlx5_hairpin_modify_sq(hp->peer_mdev, hp->sqn[i], MLX5_SQC_STATE_RDY, + MLX5_SQC_STATE_RST, 0, 0); +} + static void mlx5_hairpin_unpair_queues(struct mlx5_hairpin *hp) { int i; @@ -465,13 +474,9 @@ static void mlx5_hairpin_unpair_queues(struct mlx5_hairpin *hp) for (i = 0; i < hp->num_channels; i++) mlx5_hairpin_modify_rq(hp->func_mdev, hp->rqn[i], MLX5_RQC_STATE_RDY, MLX5_RQC_STATE_RST, 0, 0); - /* unset peer SQs */ - if (hp->peer_gone) - return; - for (i = 0; i < hp->num_channels; i++) - mlx5_hairpin_modify_sq(hp->peer_mdev, hp->sqn[i], MLX5_SQC_STATE_RDY, - MLX5_SQC_STATE_RST, 0, 0); + if (!hp->peer_gone) + mlx5_hairpin_unpair_peer_sq(hp); }
struct mlx5_hairpin * @@ -518,3 +523,16 @@ void mlx5_core_hairpin_destroy(struct mlx5_hairpin *hp) mlx5_hairpin_destroy_queues(hp); kfree(hp); } + +void mlx5_core_hairpin_clear_dead_peer(struct mlx5_hairpin *hp) +{ + int i; + + mlx5_hairpin_unpair_peer_sq(hp); + + /* destroy peer SQ */ + for (i = 0; i < hp->num_channels; i++) + mlx5_core_destroy_sq(hp->peer_mdev, hp->sqn[i]); + + hp->peer_gone = true; +} diff --git a/include/linux/mlx5/transobj.h b/include/linux/mlx5/transobj.h index dc6b1e7cb8c4..3abefd40a4d8 100644 --- a/include/linux/mlx5/transobj.h +++ b/include/linux/mlx5/transobj.h @@ -92,4 +92,5 @@ mlx5_core_hairpin_create(struct mlx5_core_dev *func_mdev, struct mlx5_hairpin_params *params);
void mlx5_core_hairpin_destroy(struct mlx5_hairpin *pair); +void mlx5_core_hairpin_clear_dead_peer(struct mlx5_hairpin *hp); #endif /* __TRANSOBJ_H__ */
From: Maor Gottlieb maorg@nvidia.com
[ Upstream commit c189716b2a7c1d2d8658e269735273caa1c38b54 ]
Check if RoCE is supported by the device before enable it in the vport context and create all the RDMA steering objects.
Fixes: 80f09dfc237f ("net/mlx5: Eswitch, enable RoCE loopback traffic") Signed-off-by: Maor Gottlieb maorg@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/rdma.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/rdma.c b/drivers/net/ethernet/mellanox/mlx5/core/rdma.c index 8e0dddc6383f..2389239acadc 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/rdma.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/rdma.c @@ -156,6 +156,9 @@ void mlx5_rdma_enable_roce(struct mlx5_core_dev *dev) { int err;
+ if (!MLX5_CAP_GEN(dev, roce)) + return; + err = mlx5_nic_vport_enable_roce(dev); if (err) { mlx5_core_err(dev, "Failed to enable RoCE: %d\n", err);
From: Davide Caratti dcaratti@redhat.com
[ Upstream commit a1718505d7f67ee0ab051322f1cbc7ac42b5da82 ]
since mlx5 hardware can segment correctly TSO packets on VXLAN over VLAN topologies, CPU usage can improve significantly if we enable tunnel offloads in dev->vlan_features, like it was done in the past with other NIC drivers (e.g. mlx4, be2net and ixgbe).
Signed-off-by: Davide Caratti dcaratti@redhat.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 36b9a364ef26..374ce17e1499 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -5012,6 +5012,8 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev) netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_UDP_TUNNEL_CSUM; netdev->gso_partial_features = NETIF_F_GSO_UDP_TUNNEL_CSUM; + netdev->vlan_features |= NETIF_F_GSO_UDP_TUNNEL | + NETIF_F_GSO_UDP_TUNNEL_CSUM; }
if (mlx5e_tunnel_proto_supported(mdev, IPPROTO_GRE)) {
From: Aya Levin ayal@nvidia.com
[ Upstream commit 6d6727dddc7f93fcc155cb8d0c49c29ae0e71122 ]
The device is able to offload either the outer header csum or inner header csum. The driver utilizes the inner csum offload. Hence, block setting of tx-udp_tnl-csum-segmentation and set it to off[fixed].
Fixes: b49663c8fb49 ("net/mlx5e: Add support for UDP tunnel segmentation with outer checksum offload") Signed-off-by: Aya Levin ayal@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 374ce17e1499..24c49a84947f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -5007,13 +5007,9 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev) }
if (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev)) { - netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL | - NETIF_F_GSO_UDP_TUNNEL_CSUM; - netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL | - NETIF_F_GSO_UDP_TUNNEL_CSUM; - netdev->gso_partial_features = NETIF_F_GSO_UDP_TUNNEL_CSUM; - netdev->vlan_features |= NETIF_F_GSO_UDP_TUNNEL | - NETIF_F_GSO_UDP_TUNNEL_CSUM; + netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL; + netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL; + netdev->vlan_features |= NETIF_F_GSO_UDP_TUNNEL; }
if (mlx5e_tunnel_proto_supported(mdev, IPPROTO_GRE)) {
From: Maxim Mikityanskiy maximmi@nvidia.com
[ Upstream commit 5fc177ab759418c9537433e63301096e733fb915 ]
The TCP option parser in synproxy (synproxy_parse_options) could read one byte out of bounds. When the length is 1, the execution flow gets into the loop, reads one byte of the opcode, and if the opcode is neither TCPOPT_EOL nor TCPOPT_NOP, it reads one more byte, which exceeds the length of 1.
This fix is inspired by commit 9609dad263f8 ("ipv4: tcp_input: fix stack out of bounds when parsing TCP options.").
v2 changes:
Added an early return when length < 0 to avoid calling skb_header_pointer with negative length.
Cc: Young Xiao 92siuyang@gmail.com Fixes: 48b1de4c110a ("netfilter: add SYNPROXY core/target") Signed-off-by: Maxim Mikityanskiy maximmi@nvidia.com Reviewed-by: Florian Westphal fw@strlen.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_synproxy_core.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c index 4bb4cfde28b4..c6c0d27caaed 100644 --- a/net/netfilter/nf_synproxy_core.c +++ b/net/netfilter/nf_synproxy_core.c @@ -31,6 +31,9 @@ synproxy_parse_options(const struct sk_buff *skb, unsigned int doff, int length = (th->doff * 4) - sizeof(*th); u8 buf[40], *ptr;
+ if (unlikely(length < 0)) + return false; + ptr = skb_header_pointer(skb, doff + sizeof(*th), length, buf); if (ptr == NULL) return false; @@ -47,6 +50,8 @@ synproxy_parse_options(const struct sk_buff *skb, unsigned int doff, length--; continue; default: + if (length < 2) + return true; opsize = *ptr++; if (opsize < 2) return true;
From: Maxim Mikityanskiy maximmi@nvidia.com
[ Upstream commit ba91c49dedbde758ba0b72f57ac90b06ddf8e548 ]
The TCP option parser in cake qdisc (cake_get_tcpopt and cake_tcph_may_drop) could read one byte out of bounds. When the length is 1, the execution flow gets into the loop, reads one byte of the opcode, and if the opcode is neither TCPOPT_EOL nor TCPOPT_NOP, it reads one more byte, which exceeds the length of 1.
This fix is inspired by commit 9609dad263f8 ("ipv4: tcp_input: fix stack out of bounds when parsing TCP options.").
v2 changes:
Added doff validation in cake_get_tcphdr to avoid parsing garbage as TCP header. Although it wasn't strictly an out-of-bounds access (memory was allocated), garbage values could be read where CAKE expected the TCP header if doff was smaller than 5.
Cc: Young Xiao 92siuyang@gmail.com Fixes: 8b7138814f29 ("sch_cake: Add optional ACK filter") Signed-off-by: Maxim Mikityanskiy maximmi@nvidia.com Acked-by: Toke Høiland-Jørgensen toke@toke.dk Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_cake.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index 896c0562cb42..e8eebe40e0ae 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -907,7 +907,7 @@ static struct tcphdr *cake_get_tcphdr(const struct sk_buff *skb, }
tcph = skb_header_pointer(skb, offset, sizeof(_tcph), &_tcph); - if (!tcph) + if (!tcph || tcph->doff < 5) return NULL;
return skb_header_pointer(skb, offset, @@ -931,6 +931,8 @@ static const void *cake_get_tcpopt(const struct tcphdr *tcph, length--; continue; } + if (length < 2) + break; opsize = *ptr++; if (opsize < 2 || opsize > length) break; @@ -1068,6 +1070,8 @@ static bool cake_tcph_may_drop(const struct tcphdr *tcph, length--; continue; } + if (length < 2) + break; opsize = *ptr++; if (opsize < 2 || opsize > length) break;
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 33e381448cf7a05d76ac0b47d4a6531ecd0e5c53 ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function.
Fixes: ab69bde6b2e9 ("alx: add a simple AR816x/AR817x device driver") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/atheros/alx/main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/atheros/alx/main.c b/drivers/net/ethernet/atheros/alx/main.c index bde8ec75ac4e..bee62d7caccc 100644 --- a/drivers/net/ethernet/atheros/alx/main.c +++ b/drivers/net/ethernet/atheros/alx/main.c @@ -1852,6 +1852,7 @@ out_free_netdev: free_netdev(netdev); out_pci_release: pci_release_mem_regions(pdev); + pci_disable_pcie_error_reporting(pdev); out_pci_disable: pci_disable_device(pdev); return err;
From: Jisheng Zhang Jisheng.Zhang@synaptics.com
[ Upstream commit 1adb20f0d496b2c61e9aa1f4761b8d71f93d258e ]
The register starts from 0x800 is the 16th MAC address register rather than the first one.
Fixes: cffb13f4d6fb ("stmmac: extend mac addr reg and fix perfect filering") Signed-off-by: Jisheng Zhang Jisheng.Zhang@synaptics.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/dwmac1000.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h index b70d44ac0990..3c73453725f9 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h @@ -76,10 +76,10 @@ enum power_event { #define LPI_CTRL_STATUS_TLPIEN 0x00000001 /* Transmit LPI Entry */
/* GMAC HW ADDR regs */ -#define GMAC_ADDR_HIGH(reg) (((reg > 15) ? 0x00000800 : 0x00000040) + \ - (reg * 8)) -#define GMAC_ADDR_LOW(reg) (((reg > 15) ? 0x00000804 : 0x00000044) + \ - (reg * 8)) +#define GMAC_ADDR_HIGH(reg) ((reg > 15) ? 0x00000800 + (reg - 16) * 8 : \ + 0x00000040 + (reg * 8)) +#define GMAC_ADDR_LOW(reg) ((reg > 15) ? 0x00000804 + (reg - 16) * 8 : \ + 0x00000044 + (reg * 8)) #define GMAC_MAX_PERFECT_ADDRESSES 1
#define GMAC_PCS_BASE 0x000000c0 /* PCS register base */
From: Changbin Du changbin.du@gmail.com
[ Upstream commit ea6932d70e223e02fea3ae20a4feff05d7c1ea9a ]
There is a panic in socket ioctl cmd SIOCGSKNS when NET_NS is not enabled. The reason is that nsfs tries to access ns->ops but the proc_ns_operations is not implemented in this case.
[7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010 [7.670268] pgd = 32b54000 [7.670544] [00000010] *pgd=00000000 [7.671861] Internal error: Oops: 5 [#1] SMP ARM [7.672315] Modules linked in: [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2da49 #16 [7.673309] Hardware name: Generic DT based system [7.673642] PC is at nsfs_evict+0x24/0x30 [7.674486] LR is at clear_inode+0x20/0x9c
The same to tun SIOCGSKNS command.
To fix this problem, we make get_net_ns() return -EINVAL when NET_NS is disabled. Meanwhile move it to right place net/core/net_namespace.c.
Signed-off-by: Changbin Du changbin.du@gmail.com Fixes: c62cce2caee5 ("net: add an ioctl to get a socket network namespace") Cc: Cong Wang xiyou.wangcong@gmail.com Cc: Jakub Kicinski kuba@kernel.org Cc: David Laight David.Laight@ACULAB.COM Cc: Christian Brauner christian.brauner@ubuntu.com Suggested-by: Jakub Kicinski kuba@kernel.org Acked-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/socket.h | 2 -- include/net/net_namespace.h | 7 +++++++ net/core/net_namespace.c | 12 ++++++++++++ net/socket.c | 13 ------------- 4 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/include/linux/socket.h b/include/linux/socket.h index 4049d9755cf1..a465c6a45d6f 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -406,6 +406,4 @@ extern int __sys_getpeername(int fd, struct sockaddr __user *usockaddr, extern int __sys_socketpair(int family, int type, int protocol, int __user *usockvec); extern int __sys_shutdown(int fd, int how); - -extern struct ns_common *get_net_ns(struct ns_common *ns); #endif /* _LINUX_SOCKET_H */ diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 0fca98a3d2d3..167e390ac9d4 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -195,6 +195,8 @@ struct net *copy_net_ns(unsigned long flags, struct user_namespace *user_ns, void net_ns_get_ownership(const struct net *net, kuid_t *uid, kgid_t *gid);
void net_ns_barrier(void); + +struct ns_common *get_net_ns(struct ns_common *ns); #else /* CONFIG_NET_NS */ #include <linux/sched.h> #include <linux/nsproxy.h> @@ -214,6 +216,11 @@ static inline void net_ns_get_ownership(const struct net *net, }
static inline void net_ns_barrier(void) {} + +static inline struct ns_common *get_net_ns(struct ns_common *ns) +{ + return ERR_PTR(-EINVAL); +} #endif /* CONFIG_NET_NS */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 39402840025e..c303873496a3 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -643,6 +643,18 @@ void __put_net(struct net *net) } EXPORT_SYMBOL_GPL(__put_net);
+/** + * get_net_ns - increment the refcount of the network namespace + * @ns: common namespace (net) + * + * Returns the net's common namespace. + */ +struct ns_common *get_net_ns(struct ns_common *ns) +{ + return &get_net(container_of(ns, struct net, ns))->ns; +} +EXPORT_SYMBOL_GPL(get_net_ns); + struct net *get_net_ns_by_fd(int fd) { struct file *file; diff --git a/net/socket.c b/net/socket.c index d1a0264401b7..b14917dd811a 100644 --- a/net/socket.c +++ b/net/socket.c @@ -1071,19 +1071,6 @@ static long sock_do_ioctl(struct net *net, struct socket *sock, * what to do with it - that's up to the protocol still. */
-/** - * get_net_ns - increment the refcount of the network namespace - * @ns: common namespace (net) - * - * Returns the net's common namespace. - */ - -struct ns_common *get_net_ns(struct ns_common *ns) -{ - return &get_net(container_of(ns, struct net, ns))->ns; -} -EXPORT_SYMBOL_GPL(get_net_ns); - static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg) { struct socket *sock;
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit cb3376604a676e0302258b01893911bdd7aa5278 ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function.
Fixes: 451724c821c1 ("qlcnic: aer support") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c index f2e5f494462b..3a96fd6deef7 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c @@ -2709,6 +2709,7 @@ err_out_free_hw_res: kfree(ahw);
err_out_free_res: + pci_disable_pcie_error_reporting(pdev); pci_release_regions(pdev);
err_out_disable_pdev:
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 49a10c7b176295f8fafb338911cf028e97f65f4d ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function.
Fixes: e87ad5539343 ("netxen: support pci error handlers") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c index 25b6f2ee2beb..1b9867ea4333 100644 --- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c +++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c @@ -1602,6 +1602,8 @@ err_out_free_netdev: free_netdev(netdev);
err_out_free_res: + if (NX_IS_REVISION_P3(pdev->revision)) + pci_disable_pcie_error_reporting(pdev); pci_release_regions(pdev);
err_out_disable_pdev:
From: Pavel Skripkin paskripkin@gmail.com
[ Upstream commit ad9d24c9429e2159d1e279dc3a83191ccb4daf1d ]
Syzbot reported slab-out-of-bounds Read in qrtr_endpoint_post. The problem was in wrong _size_ type:
if (len != ALIGN(size, 4) + hdrlen) goto err;
If size from qrtr_hdr is 4294967293 (0xfffffffd), the result of ALIGN(size, 4) will be 0. In case of len == hdrlen and size == 4294967293 in header this check won't fail and
skb_put_data(skb, data + hdrlen, size);
will read out of bound from data, which is hdrlen allocated block.
Fixes: 194ccc88297a ("net: qrtr: Support decoding incoming v2 packets") Reported-and-tested-by: syzbot+1917d778024161609247@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin paskripkin@gmail.com Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/qrtr/qrtr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c index 46273a838361..faea2ce12511 100644 --- a/net/qrtr/qrtr.c +++ b/net/qrtr/qrtr.c @@ -257,7 +257,7 @@ int qrtr_endpoint_post(struct qrtr_endpoint *ep, const void *data, size_t len) const struct qrtr_hdr_v2 *v2; struct sk_buff *skb; struct qrtr_cb *cb; - unsigned int size; + size_t size; unsigned int ver; size_t hdrlen;
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 475b92f932168a78da8109acd10bfb7578b8f2bb ]
Scaled PPM conversion to PPB may (on 64bit systems) result in a value larger than s32 can hold (freq/scaled_ppm is a long). This means the kernel will not correctly reject unreasonably high ->freq values (e.g. > 4294967295ppb, 281474976645 scaled PPM).
The conversion is equivalent to a division by ~66 (65.536), so the value of ppb is always smaller than ppm, but not small enough to assume narrowing the type from long -> s32 is okay.
Note that reasonable user space (e.g. ptp4l) will not use such high values, anyway, 4289046510ppb ~= 4.3x, so the fix is somewhat pedantic.
Fixes: d39a743511cd ("ptp: validate the requested frequency adjustment.") Fixes: d94ba80ebbea ("ptp: Added a brand new class driver for ptp clocks.") Signed-off-by: Jakub Kicinski kuba@kernel.org Acked-by: Richard Cochran richardcochran@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ptp/ptp_clock.c | 6 +++--- include/linux/ptp_clock_kernel.h | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c index b84f16bbd6f2..eedf067ee8e3 100644 --- a/drivers/ptp/ptp_clock.c +++ b/drivers/ptp/ptp_clock.c @@ -63,7 +63,7 @@ static void enqueue_external_timestamp(struct timestamp_event_queue *queue, spin_unlock_irqrestore(&queue->lock, flags); }
-s32 scaled_ppm_to_ppb(long ppm) +long scaled_ppm_to_ppb(long ppm) { /* * The 'freq' field in the 'struct timex' is in parts per @@ -80,7 +80,7 @@ s32 scaled_ppm_to_ppb(long ppm) s64 ppb = 1 + ppm; ppb *= 125; ppb >>= 13; - return (s32) ppb; + return (long) ppb; } EXPORT_SYMBOL(scaled_ppm_to_ppb);
@@ -138,7 +138,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct __kernel_timex *tx) delta = ktime_to_ns(kt); err = ops->adjtime(ops, delta); } else if (tx->modes & ADJ_FREQUENCY) { - s32 ppb = scaled_ppm_to_ppb(tx->freq); + long ppb = scaled_ppm_to_ppb(tx->freq); if (ppb > ops->max_adj || ppb < -ops->max_adj) return -ERANGE; if (ops->adjfine) diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h index 93cc4f1d444a..874f7e73ed01 100644 --- a/include/linux/ptp_clock_kernel.h +++ b/include/linux/ptp_clock_kernel.h @@ -218,7 +218,7 @@ extern int ptp_clock_index(struct ptp_clock *ptp); * @ppm: Parts per million, but with a 16 bit binary fractional field */
-extern s32 scaled_ppm_to_ppb(long ppm); +extern long scaled_ppm_to_ppb(long ppm);
/** * ptp_find_pin() - obtain the pin index of a given auxiliary function
From: Maciej Żenczykowski maze@google.com
[ Upstream commit c1a3d4067309451e68c33dbd356032549cc0bd8e ]
This is meant to make the host side cdc_ncm interface consistently named just like the older CDC protocols: cdc_ether & cdc_ecm (and even rndis_host), which all use 'FLAG_ETHER | FLAG_POINTTOPOINT'.
include/linux/usb/usbnet.h: #define FLAG_ETHER 0x0020 /* maybe use "eth%d" names */ #define FLAG_WLAN 0x0080 /* use "wlan%d" names */ #define FLAG_WWAN 0x0400 /* use "wwan%d" names */ #define FLAG_POINTTOPOINT 0x1000 /* possibly use "usb%d" names */
drivers/net/usb/usbnet.c @ line 1711: strcpy (net->name, "usb%d"); ... // heuristic: "usb%d" for links we know are two-host, // else "eth%d" when there's reasonable doubt. userspace // can rename the link if it knows better. if ((dev->driver_info->flags & FLAG_ETHER) != 0 && ((dev->driver_info->flags & FLAG_POINTTOPOINT) == 0 || (net->dev_addr [0] & 0x02) == 0)) strcpy (net->name, "eth%d"); /* WLAN devices should always be named "wlan%d" */ if ((dev->driver_info->flags & FLAG_WLAN) != 0) strcpy(net->name, "wlan%d"); /* WWAN devices should always be named "wwan%d" */ if ((dev->driver_info->flags & FLAG_WWAN) != 0) strcpy(net->name, "wwan%d");
So by using ETHER | POINTTOPOINT the interface naming is either usb%d or eth%d based on the global uniqueness of the mac address of the device.
Without this 2.5gbps ethernet dongles which all seem to use the cdc_ncm driver end up being called usb%d instead of eth%d even though they're definitely not two-host. (All 1gbps & 5gbps ethernet usb dongles I've tested don't hit this problem due to use of different drivers, primarily r8152 and aqc111)
Fixes tag is based purely on git blame, and is really just here to make sure this hits LTS branches newer than v4.5.
Cc: Lorenzo Colitti lorenzo@google.com Fixes: 4d06dd537f95 ("cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind") Signed-off-by: Maciej Żenczykowski maze@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/cdc_ncm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c index 0646bcd26968..4cff0a6b1098 100644 --- a/drivers/net/usb/cdc_ncm.c +++ b/drivers/net/usb/cdc_ncm.c @@ -1662,7 +1662,7 @@ static void cdc_ncm_status(struct usbnet *dev, struct urb *urb) static const struct driver_info cdc_ncm_info = { .description = "CDC NCM", .flags = FLAG_POINTTOPOINT | FLAG_NO_SETINT | FLAG_MULTI_PACKET - | FLAG_LINK_INTR, + | FLAG_LINK_INTR | FLAG_ETHER, .bind = cdc_ncm_bind, .unbind = cdc_ncm_unbind, .manage_power = usbnet_manage_power,
From: Aleksander Jan Bajkowski olek2@wp.pl
[ Upstream commit 7ea6cd16f1599c1eac6018751eadbc5fc736b99a ]
The previous commit didn't fix the bug properly. By mistake, it replaces the pointer of the next skb in the descriptor ring instead of the current one. As a result, the two descriptors are assigned the same SKB. The error is seen during the iperf test when skb_put tries to insert a second packet and exceeds the available buffer.
Fixes: c7718ee96dbc ("net: lantiq: fix memory corruption in RX ring ") Signed-off-by: Aleksander Jan Bajkowski olek2@wp.pl Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/lantiq_xrx200.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/lantiq_xrx200.c b/drivers/net/ethernet/lantiq_xrx200.c index 8d073b895fc0..6e504854571c 100644 --- a/drivers/net/ethernet/lantiq_xrx200.c +++ b/drivers/net/ethernet/lantiq_xrx200.c @@ -154,6 +154,7 @@ static int xrx200_close(struct net_device *net_dev)
static int xrx200_alloc_skb(struct xrx200_chan *ch) { + struct sk_buff *skb = ch->skb[ch->dma.desc]; dma_addr_t mapping; int ret = 0;
@@ -168,6 +169,7 @@ static int xrx200_alloc_skb(struct xrx200_chan *ch) XRX200_DMA_DATA_LEN, DMA_FROM_DEVICE); if (unlikely(dma_mapping_error(ch->priv->dev, mapping))) { dev_kfree_skb_any(ch->skb[ch->dma.desc]); + ch->skb[ch->dma.desc] = skb; ret = -ENOMEM; goto skip; } @@ -198,7 +200,6 @@ static int xrx200_hw_receive(struct xrx200_chan *ch) ch->dma.desc %= LTQ_DESC_NUM;
if (ret) { - ch->skb[ch->dma.desc] = skb; net_dev->stats.rx_dropped++; netdev_err(net_dev, "failed to allocate new rx buffer\n"); return ret;
From: Dongliang Mu mudongliangabcd@gmail.com
[ Upstream commit 56b786d86694e079d8aad9b314e015cd4ac02a3d ]
The commit 46a8b29c6306 ("net: usb: fix memory leak in smsc75xx_bind") fails to clean up the work scheduled in smsc75xx_reset-> smsc75xx_set_multicast, which leads to use-after-free if the work is scheduled to start after the deallocation. In addition, this patch also removes a dangling pointer - dev->data[0].
This patch calls cancel_work_sync to cancel the scheduled work and set the dangling pointer to NULL.
Fixes: 46a8b29c6306 ("net: usb: fix memory leak in smsc75xx_bind") Signed-off-by: Dongliang Mu mudongliangabcd@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/smsc75xx.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c index d0ae0df34e13..aa848be459ec 100644 --- a/drivers/net/usb/smsc75xx.c +++ b/drivers/net/usb/smsc75xx.c @@ -1482,7 +1482,7 @@ static int smsc75xx_bind(struct usbnet *dev, struct usb_interface *intf) ret = smsc75xx_wait_ready(dev, 0); if (ret < 0) { netdev_warn(dev->net, "device not ready in smsc75xx_bind\n"); - goto err; + goto free_pdata; }
smsc75xx_init_mac_address(dev); @@ -1491,7 +1491,7 @@ static int smsc75xx_bind(struct usbnet *dev, struct usb_interface *intf) ret = smsc75xx_reset(dev); if (ret < 0) { netdev_warn(dev->net, "smsc75xx_reset error %d\n", ret); - goto err; + goto cancel_work; }
dev->net->netdev_ops = &smsc75xx_netdev_ops; @@ -1502,8 +1502,11 @@ static int smsc75xx_bind(struct usbnet *dev, struct usb_interface *intf) dev->net->max_mtu = MAX_SINGLE_PACKET_SIZE; return 0;
-err: +cancel_work: + cancel_work_sync(&pdata->set_multicast); +free_pdata: kfree(pdata); + dev->data[0] = 0; return ret; }
@@ -1514,7 +1517,6 @@ static void smsc75xx_unbind(struct usbnet *dev, struct usb_interface *intf) cancel_work_sync(&pdata->set_multicast); netif_dbg(dev, ifdown, dev->net, "free pdata\n"); kfree(pdata); - pdata = NULL; dev->data[0] = 0; } }
From: Joakim Zhang qiangqing.zhang@nxp.com
[ Upstream commit d23765646e71b43ed2b809930411ba5c0aadee7b ]
Commit da722186f654 ("net: fec: set GPR bit on suspend by DT configuration.") refactor the fec_devtype, need adjust ptp driver accordingly.
Fixes: da722186f654 ("net: fec: set GPR bit on suspend by DT configuration.") Signed-off-by: Joakim Zhang qiangqing.zhang@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/fec_ptp.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c index 49fad118988b..da04e61d819a 100644 --- a/drivers/net/ethernet/freescale/fec_ptp.c +++ b/drivers/net/ethernet/freescale/fec_ptp.c @@ -220,15 +220,13 @@ static u64 fec_ptp_read(const struct cyclecounter *cc) { struct fec_enet_private *fep = container_of(cc, struct fec_enet_private, cc); - const struct platform_device_id *id_entry = - platform_get_device_id(fep->pdev); u32 tempval;
tempval = readl(fep->hwp + FEC_ATIME_CTRL); tempval |= FEC_T_CTRL_CAPTURE; writel(tempval, fep->hwp + FEC_ATIME_CTRL);
- if (id_entry->driver_data & FEC_QUIRK_BUG_CAPTURE) + if (fep->quirks & FEC_QUIRK_BUG_CAPTURE) udelay(1);
return readl(fep->hwp + FEC_ATIME);
From: Chengyang Fan cy.fan@huawei.com
[ Upstream commit d8e2973029b8b2ce477b564824431f3385c77083 ]
BUG: memory leak unreferenced object 0xffff888101bc4c00 (size 32): comm "syz-executor527", pid 360, jiffies 4294807421 (age 19.329s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 01 00 00 00 00 00 00 00 ac 14 14 bb 00 00 02 00 ................ backtrace: [<00000000f17c5244>] kmalloc include/linux/slab.h:558 [inline] [<00000000f17c5244>] kzalloc include/linux/slab.h:688 [inline] [<00000000f17c5244>] ip_mc_add1_src net/ipv4/igmp.c:1971 [inline] [<00000000f17c5244>] ip_mc_add_src+0x95f/0xdb0 net/ipv4/igmp.c:2095 [<000000001cb99709>] ip_mc_source+0x84c/0xea0 net/ipv4/igmp.c:2416 [<0000000052cf19ed>] do_ip_setsockopt net/ipv4/ip_sockglue.c:1294 [inline] [<0000000052cf19ed>] ip_setsockopt+0x114b/0x30c0 net/ipv4/ip_sockglue.c:1423 [<00000000477edfbc>] raw_setsockopt+0x13d/0x170 net/ipv4/raw.c:857 [<00000000e75ca9bb>] __sys_setsockopt+0x158/0x270 net/socket.c:2117 [<00000000bdb993a8>] __do_sys_setsockopt net/socket.c:2128 [inline] [<00000000bdb993a8>] __se_sys_setsockopt net/socket.c:2125 [inline] [<00000000bdb993a8>] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2125 [<000000006a1ffdbd>] do_syscall_64+0x40/0x80 arch/x86/entry/common.c:47 [<00000000b11467c4>] entry_SYSCALL_64_after_hwframe+0x44/0xae
In commit 24803f38a5c0 ("igmp: do not remove igmp souce list info when set link down"), the ip_mc_clear_src() in ip_mc_destroy_dev() was removed, because it was also called in igmpv3_clear_delrec().
Rough callgraph:
inetdev_destroy -> ip_mc_destroy_dev -> igmpv3_clear_delrec -> ip_mc_clear_src -> RCU_INIT_POINTER(dev->ip_ptr, NULL)
However, ip_mc_clear_src() called in igmpv3_clear_delrec() doesn't release in_dev->mc_list->sources. And RCU_INIT_POINTER() assigns the NULL to dev->ip_ptr. As a result, in_dev cannot be obtained through inetdev_by_index() and then in_dev->mc_list->sources cannot be released by ip_mc_del1_src() in the sock_close. Rough call sequence goes like:
sock_close -> __sock_release -> inet_release -> ip_mc_drop_socket -> inetdev_by_index -> ip_mc_leave_src -> ip_mc_del_src -> ip_mc_del1_src
So we still need to call ip_mc_clear_src() in ip_mc_destroy_dev() to free in_dev->mc_list->sources.
Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info ...") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Chengyang Fan cy.fan@huawei.com Acked-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/igmp.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 480d0b22db1a..c8cbdc4d5cbc 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -1803,6 +1803,7 @@ void ip_mc_destroy_dev(struct in_device *in_dev) while ((i = rtnl_dereference(in_dev->mc_list)) != NULL) { in_dev->mc_list = i->next_rcu; in_dev->mc_count--; + ip_mc_clear_src(i); ip_ma_put(i); } }
From: Eric Dumazet edumazet@google.com
[ Upstream commit a494bd642d9120648b06bb7d28ce6d05f55a7819 ]
While unix_may_send(sk, osk) is called while osk is locked, it appears unix_release_sock() can overwrite unix_peer() after this lock has been released, making KCSAN unhappy.
Changing unix_release_sock() to access/change unix_peer() before lock is released should fix this issue.
BUG: KCSAN: data-race in unix_dgram_sendmsg / unix_release_sock
write to 0xffff88810465a338 of 8 bytes by task 20852 on cpu 1: unix_release_sock+0x4ed/0x6e0 net/unix/af_unix.c:558 unix_release+0x2f/0x50 net/unix/af_unix.c:859 __sock_release net/socket.c:599 [inline] sock_close+0x6c/0x150 net/socket.c:1258 __fput+0x25b/0x4e0 fs/file_table.c:280 ____fput+0x11/0x20 fs/file_table.c:313 task_work_run+0xae/0x130 kernel/task_work.c:164 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:175 [inline] exit_to_user_mode_prepare+0x156/0x190 kernel/entry/common.c:209 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:302 do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57 entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88810465a338 of 8 bytes by task 20888 on cpu 0: unix_may_send net/unix/af_unix.c:189 [inline] unix_dgram_sendmsg+0x923/0x1610 net/unix/af_unix.c:1712 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350 ___sys_sendmsg net/socket.c:2404 [inline] __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490 __do_sys_sendmmsg net/socket.c:2519 [inline] __se_sys_sendmmsg net/socket.c:2516 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0xffff888167905400 -> 0x0000000000000000
Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 20888 Comm: syz-executor.0 Not tainted 5.13.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index ecadd9e482c4..9f96826eb3ba 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -537,12 +537,14 @@ static void unix_release_sock(struct sock *sk, int embrion) u->path.mnt = NULL; state = sk->sk_state; sk->sk_state = TCP_CLOSE; + + skpair = unix_peer(sk); + unix_peer(sk) = NULL; + unix_state_unlock(sk);
wake_up_interruptible_all(&u->peer_wait);
- skpair = unix_peer(sk); - if (skpair != NULL) { if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) { unix_state_lock(skpair); @@ -557,7 +559,6 @@ static void unix_release_sock(struct sock *sk, int embrion)
unix_dgram_peer_wake_disconnect(sk, skpair); sock_put(skpair); /* It may now die */ - unix_peer(sk) = NULL; }
/* Try to flush out this socket. Throw out buffers at least */
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit c19c8c0e666f9259e2fc4d2fa4b9ff8e3b40ee5d ]
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function.
Fixes: d6b6d9877878 ("be2net: use PCIe AER capability") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Acked-by: Somnath Kotur somnath.kotur@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/emulex/benet/be_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index 39eb7d525043..9aebb121365f 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -6030,6 +6030,7 @@ drv_cleanup: unmap_bars: be_unmap_pci_bars(adapter); free_netdev: + pci_disable_pcie_error_reporting(pdev); free_netdev(netdev); rel_reg: pci_release_regions(pdev);
From: Pavel Skripkin paskripkin@gmail.com
[ Upstream commit 7edcc682301492380fbdd604b4516af5ae667a13 ]
My local syzbot instance hit memory leak in mkiss_open()[1]. The problem was in missing free_netdev() in mkiss_close().
In mkiss_open() netdevice is allocated and then registered, but in mkiss_close() netdevice was only unregistered, but not freed.
Fail log:
BUG: memory leak unreferenced object 0xffff8880281ba000 (size 4096): comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s) hex dump (first 32 bytes): 61 78 30 00 00 00 00 00 00 00 00 00 00 00 00 00 ax0............. 00 27 fa 2a 80 88 ff ff 00 00 00 00 00 00 00 00 .'.*............ backtrace: [<ffffffff81a27201>] kvmalloc_node+0x61/0xf0 [<ffffffff8706e7e8>] alloc_netdev_mqs+0x98/0xe80 [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1] [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110 [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670 [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440 [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200 [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0 [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
BUG: memory leak unreferenced object 0xffff8880141a9a00 (size 96): comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s) hex dump (first 32 bytes): e8 a2 1b 28 80 88 ff ff e8 a2 1b 28 80 88 ff ff ...(.......(.... 98 92 9c aa b0 40 02 00 00 00 00 00 00 00 00 00 .....@.......... backtrace: [<ffffffff8709f68b>] __hw_addr_create_ex+0x5b/0x310 [<ffffffff8709fb38>] __hw_addr_add_ex+0x1f8/0x2b0 [<ffffffff870a0c7b>] dev_addr_init+0x10b/0x1f0 [<ffffffff8706e88b>] alloc_netdev_mqs+0x13b/0xe80 [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1] [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110 [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670 [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440 [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200 [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0 [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
BUG: memory leak unreferenced object 0xffff8880219bfc00 (size 512): comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s) hex dump (first 32 bytes): 00 a0 1b 28 80 88 ff ff 80 8f b1 8d ff ff ff ff ...(............ 80 8f b1 8d ff ff ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81a27201>] kvmalloc_node+0x61/0xf0 [<ffffffff8706eec7>] alloc_netdev_mqs+0x777/0xe80 [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1] [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110 [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670 [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440 [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200 [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0 [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
BUG: memory leak unreferenced object 0xffff888029b2b200 (size 256): comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81a27201>] kvmalloc_node+0x61/0xf0 [<ffffffff8706f062>] alloc_netdev_mqs+0x912/0xe80 [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1] [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110 [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670 [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440 [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200 [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0 [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 815f62bf7427 ("[PATCH] SMP rewrite of mkiss") Signed-off-by: Pavel Skripkin paskripkin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/hamradio/mkiss.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c index deef14215110..352f9e75954c 100644 --- a/drivers/net/hamradio/mkiss.c +++ b/drivers/net/hamradio/mkiss.c @@ -800,6 +800,7 @@ static void mkiss_close(struct tty_struct *tty) ax->tty = NULL;
unregister_netdev(ax->dev); + free_netdev(ax->dev); }
/* Perform I/O control on an active ax25 channel. */
From: Linyu Yuan linyyuan@codeaurora.org
[ Upstream commit c3b26fdf1b32f91c7a3bc743384b4a298ab53ad7 ]
when usbnet transmit a skb, eem fixup it in eem_tx_fixup(), if skb_copy_expand() failed, it return NULL, usbnet_start_xmit() will have no chance to free original skb.
fix it by free orginal skb in eem_tx_fixup() first, then check skb clone status, if failed, return NULL to usbnet.
Fixes: 9f722c0978b0 ("usbnet: CDC EEM support (v5)") Signed-off-by: Linyu Yuan linyyuan@codeaurora.org Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/cdc_eem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/usb/cdc_eem.c b/drivers/net/usb/cdc_eem.c index 0eeec80bec31..e4a570366646 100644 --- a/drivers/net/usb/cdc_eem.c +++ b/drivers/net/usb/cdc_eem.c @@ -123,10 +123,10 @@ static struct sk_buff *eem_tx_fixup(struct usbnet *dev, struct sk_buff *skb, }
skb2 = skb_copy_expand(skb, EEM_HEAD, ETH_FCS_LEN + padlen, flags); + dev_kfree_skb_any(skb); if (!skb2) return NULL;
- dev_kfree_skb_any(skb); skb = skb2;
done:
From: Pavel Machek pavel@denx.de
[ Upstream commit 39eb028183bc7378bb6187067e20bf6d8c836407 ]
While fixing coverity warning, commit dd2c79677375 introduced typo in shift value. Fix that.
Signed-off-by: Pavel Machek (CIP) pavel@denx.de Fixes: dd2c79677375 ("cxgb4: Fix unintentional sign extension issues") Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c index ccb28182f745..44f86a33ef62 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c @@ -198,7 +198,7 @@ static void set_nat_params(struct adapter *adap, struct filter_entry *f, WORD_MASK, f->fs.nat_lip[3] | f->fs.nat_lip[2] << 8 | f->fs.nat_lip[1] << 16 | - (u64)f->fs.nat_lip[0] << 25, 1); + (u64)f->fs.nat_lip[0] << 24, 1); } }
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit 0afd6a4e8028cc487c240b6cfe04094e45a306e4 ]
There is a missing bnxt_probe_phy() call in bnxt_fw_init_one() to rediscover the PHY capabilities after a firmware reset. This can cause some PHY related functionalities to fail after a firmware reset. For example, in multi-host, the ability for any host to configure the PHY settings may be lost after a firmware reset.
Fixes: ec5d31e3c15d ("bnxt_en: Handle firmware reset status during IF_UP.") Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 00ae7a9a42bf..f983d03da1e4 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -10610,6 +10610,8 @@ static void bnxt_fw_init_one_p3(struct bnxt *bp) bnxt_hwrm_coal_params_qcaps(bp); }
+static int bnxt_probe_phy(struct bnxt *bp, bool fw_dflt); + static int bnxt_fw_init_one(struct bnxt *bp) { int rc; @@ -10624,6 +10626,9 @@ static int bnxt_fw_init_one(struct bnxt *bp) netdev_err(bp->dev, "Firmware init phase 2 failed\n"); return rc; } + rc = bnxt_probe_phy(bp, false); + if (rc) + return rc; rc = bnxt_approve_mac(bp, bp->dev->dev_addr, false); if (rc) return rc;
From: Somnath Kotur somnath.kotur@broadcom.com
[ Upstream commit 03400aaa69f916a376e11526cf591901a96a3a5c ]
bnxt_ethtool_init() may have allocated some memory and we need to call bnxt_ethtool_free() to properly unwind if bnxt_init_one() fails.
Fixes: 7c3809181468 ("bnxt_en: Refactor bnxt_init_one() and turn on TPA support on 57500 chips.") Signed-off-by: Somnath Kotur somnath.kotur@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index f983d03da1e4..d1c3939b0307 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -11963,6 +11963,7 @@ init_err_cleanup: init_err_pci_clean: bnxt_free_hwrm_short_cmd_req(bp); bnxt_free_hwrm_resources(bp); + bnxt_ethtool_free(bp); kfree(bp->fw_health); bp->fw_health = NULL; bnxt_cleanup_pci(bp);
From: Toke Høiland-Jørgensen toke@redhat.com
[ Upstream commit 321827477360934dc040e9d3c626bf1de6c3ab3c ]
When constructing ICMP response messages, the kernel will try to pick a suitable source address for the outgoing packet. However, if no IPv4 addresses are configured on the system at all, this will fail and we end up producing an ICMP message with a source address of 0.0.0.0. This can happen on a box routing IPv4 traffic via v6 nexthops, for instance.
Since 0.0.0.0 is not generally routable on the internet, there's a good chance that such ICMP messages will never make it back to the sender of the original packet that the ICMP message was sent in response to. This, in turn, can create connectivity and PMTUd problems for senders. Fortunately, RFC7600 reserves a dummy address to be used as a source for ICMP messages (192.0.0.8/32), so let's teach the kernel to substitute that address as a last resort if the regular source address selection procedure fails.
Below is a quick example reproducing this issue with network namespaces:
ip netns add ns0 ip l add type veth peer netns ns0 ip l set dev veth0 up ip a add 10.0.0.1/24 dev veth0 ip a add fc00:dead:cafe:42::1/64 dev veth0 ip r add 10.1.0.0/24 via inet6 fc00:dead:cafe:42::2 ip -n ns0 l set dev veth0 up ip -n ns0 a add fc00:dead:cafe:42::2/64 dev veth0 ip -n ns0 r add 10.0.0.0/24 via inet6 fc00:dead:cafe:42::1 ip netns exec ns0 sysctl -w net.ipv4.icmp_ratelimit=0 ip netns exec ns0 sysctl -w net.ipv4.ip_forward=1 tcpdump -tpni veth0 -c 2 icmp & ping -w 1 10.1.0.1 > /dev/null tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 29, seq 1, length 64 IP 0.0.0.0 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92 2 packets captured 2 packets received by filter 0 packets dropped by kernel
With this patch the above capture changes to: IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 31127, seq 1, length 64 IP 192.0.0.8 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Juliusz Chroboczek jch@irif.fr Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Toke Høiland-Jørgensen toke@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/uapi/linux/in.h | 3 +++ net/ipv4/icmp.c | 7 +++++++ 2 files changed, 10 insertions(+)
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h index e7ad9d350a28..60e1241d4b77 100644 --- a/include/uapi/linux/in.h +++ b/include/uapi/linux/in.h @@ -284,6 +284,9 @@ struct sockaddr_in { /* Address indicating an error return. */ #define INADDR_NONE ((unsigned long int) 0xffffffff)
+/* Dummy address for src of ICMP replies if no real address is set (RFC7600). */ +#define INADDR_DUMMY ((unsigned long int) 0xc0000008) + /* Network number for local host loopback. */ #define IN_LOOPBACKNET 127
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index dd8fae89be72..c88612242c89 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -739,6 +739,13 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info, icmp_param.data_len = room; icmp_param.head_len = sizeof(struct icmphdr);
+ /* if we don't have a source address at this point, fall back to the + * dummy address instead of sending out a packet with a source address + * of 0.0.0.0 + */ + if (!fl4.saddr) + fl4.saddr = htonl(INADDR_DUMMY); + icmp_push_reply(&icmp_param, &fl4, &ipc, &rt); ende: ip_rt_put(rt);
From: Pavel Skripkin paskripkin@gmail.com
[ Upstream commit 9cca0c2d70149160407bda9a9446ce0c29b6e6c6 ]
static void ec_bhf_remove(struct pci_dev *dev) { ... struct ec_bhf_priv *priv = netdev_priv(net_dev);
unregister_netdev(net_dev); free_netdev(net_dev);
pci_iounmap(dev, priv->dma_io); pci_iounmap(dev, priv->io); ... }
priv is netdev private data, but it is used after free_netdev(). It can cause use-after-free when accessing priv pointer. So, fix it by moving free_netdev() after pci_iounmap() calls.
Fixes: 6af55ff52b02 ("Driver for Beckhoff CX5020 EtherCAT master module.") Signed-off-by: Pavel Skripkin paskripkin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ec_bhf.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ec_bhf.c b/drivers/net/ethernet/ec_bhf.c index 46b0dbab8aad..7c992172933b 100644 --- a/drivers/net/ethernet/ec_bhf.c +++ b/drivers/net/ethernet/ec_bhf.c @@ -576,10 +576,12 @@ static void ec_bhf_remove(struct pci_dev *dev) struct ec_bhf_priv *priv = netdev_priv(net_dev);
unregister_netdev(net_dev); - free_netdev(net_dev);
pci_iounmap(dev, priv->dma_io); pci_iounmap(dev, priv->io); + + free_netdev(net_dev); + pci_release_regions(dev); pci_clear_master(dev); pci_disable_device(dev);
From: Axel Lin axel.lin@ingics.com
[ Upstream commit 0514582a1a5b4ac1a3fd64792826d392d7ae9ddc ]
The valid selectors for bd70528 bucks are 0 ~ 0xf, so the .n_voltages should be 16 (0x10). Use 0x10 to make it consistent with BD70528_LDO_VOLTS. Also remove redundant defines for BD70528_BUCK_VOLTS.
Signed-off-by: Axel Lin axel.lin@ingics.com Acked-by: Matti Vaittinen matti.vaittinen@fi.rohmeurope.com Link: https://lore.kernel.org/r/20210523071045.2168904-1-axel.lin@ingics.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/mfd/rohm-bd70528.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/include/linux/mfd/rohm-bd70528.h b/include/linux/mfd/rohm-bd70528.h index b0109ee6dae2..1c3014d2f28b 100644 --- a/include/linux/mfd/rohm-bd70528.h +++ b/include/linux/mfd/rohm-bd70528.h @@ -25,9 +25,7 @@ struct bd70528_data { struct mutex rtc_timer_lock; };
-#define BD70528_BUCK_VOLTS 17 -#define BD70528_BUCK_VOLTS 17 -#define BD70528_BUCK_VOLTS 17 +#define BD70528_BUCK_VOLTS 0x10 #define BD70528_LDO_VOLTS 0x20
#define BD70528_REG_BUCK1_EN 0x0F
From: Jack Yu jack.yu@realtek.com
[ Upstream commit 6308c44ed6eeadf65c0a7ba68d609773ed860fbb ]
The power of "LDO2", "MICBIAS1" and "Mic Det Power" were powered off after the DAPM widgets were added, and these powers were set by the JD settings "RT5659_JD_HDA_HEADER" in the probe function. In the codec probe function, these powers were ignored to prevent them controlled by DAPM.
Signed-off-by: Oder Chiou oder_chiou@realtek.com Signed-off-by: Jack Yu jack.yu@realtek.com Message-Id: 15fced51977b458798ca4eebf03dafb9@realtek.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/rt5659.c | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/sound/soc/codecs/rt5659.c b/sound/soc/codecs/rt5659.c index afd61599d94c..a28afb480060 100644 --- a/sound/soc/codecs/rt5659.c +++ b/sound/soc/codecs/rt5659.c @@ -2470,13 +2470,18 @@ static int set_dmic_power(struct snd_soc_dapm_widget *w, return 0; }
-static const struct snd_soc_dapm_widget rt5659_dapm_widgets[] = { +static const struct snd_soc_dapm_widget rt5659_particular_dapm_widgets[] = { SND_SOC_DAPM_SUPPLY("LDO2", RT5659_PWR_ANLG_3, RT5659_PWR_LDO2_BIT, 0, NULL, 0), - SND_SOC_DAPM_SUPPLY("PLL", RT5659_PWR_ANLG_3, RT5659_PWR_PLL_BIT, 0, - NULL, 0), + SND_SOC_DAPM_SUPPLY("MICBIAS1", RT5659_PWR_ANLG_2, RT5659_PWR_MB1_BIT, + 0, NULL, 0), SND_SOC_DAPM_SUPPLY("Mic Det Power", RT5659_PWR_VOL, RT5659_PWR_MIC_DET_BIT, 0, NULL, 0), +}; + +static const struct snd_soc_dapm_widget rt5659_dapm_widgets[] = { + SND_SOC_DAPM_SUPPLY("PLL", RT5659_PWR_ANLG_3, RT5659_PWR_PLL_BIT, 0, + NULL, 0), SND_SOC_DAPM_SUPPLY("Mono Vref", RT5659_PWR_ANLG_1, RT5659_PWR_VREF3_BIT, 0, NULL, 0),
@@ -2501,8 +2506,6 @@ static const struct snd_soc_dapm_widget rt5659_dapm_widgets[] = { RT5659_ADC_MONO_R_ASRC_SFT, 0, NULL, 0),
/* Input Side */ - SND_SOC_DAPM_SUPPLY("MICBIAS1", RT5659_PWR_ANLG_2, RT5659_PWR_MB1_BIT, - 0, NULL, 0), SND_SOC_DAPM_SUPPLY("MICBIAS2", RT5659_PWR_ANLG_2, RT5659_PWR_MB2_BIT, 0, NULL, 0), SND_SOC_DAPM_SUPPLY("MICBIAS3", RT5659_PWR_ANLG_2, RT5659_PWR_MB3_BIT, @@ -3697,10 +3700,23 @@ static int rt5659_set_bias_level(struct snd_soc_component *component,
static int rt5659_probe(struct snd_soc_component *component) { + struct snd_soc_dapm_context *dapm = + snd_soc_component_get_dapm(component); struct rt5659_priv *rt5659 = snd_soc_component_get_drvdata(component);
rt5659->component = component;
+ switch (rt5659->pdata.jd_src) { + case RT5659_JD_HDA_HEADER: + break; + + default: + snd_soc_dapm_new_controls(dapm, + rt5659_particular_dapm_widgets, + ARRAY_SIZE(rt5659_particular_dapm_widgets)); + break; + } + return 0; }
From: Patrice Chotard patrice.chotard@foss.st.com
[ Upstream commit d38fa9a155b2829b7e2cfcf8a4171b6dd3672808 ]
In U-boot side, an issue has been encountered when QSPI source clock is running at low frequency (24 MHz for example), waiting for TCF bit to be set didn't ensure that all data has been send out the FIFO, we should also wait that BUSY bit is cleared.
To prevent similar issue in kernel driver, we implement similar behavior by always waiting BUSY bit to be cleared.
Signed-off-by: Patrice Chotard patrice.chotard@foss.st.com Link: https://lore.kernel.org/r/20210603073421.8441-1-patrice.chotard@foss.st.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-stm32-qspi.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/spi/spi-stm32-qspi.c b/drivers/spi/spi-stm32-qspi.c index 4e726929bb4f..ea77d915216a 100644 --- a/drivers/spi/spi-stm32-qspi.c +++ b/drivers/spi/spi-stm32-qspi.c @@ -291,7 +291,7 @@ static int stm32_qspi_wait_cmd(struct stm32_qspi *qspi, int err = 0;
if (!op->data.nbytes) - return stm32_qspi_wait_nobusy(qspi); + goto wait_nobusy;
if (readl_relaxed(qspi->io_base + QSPI_SR) & SR_TCF) goto out; @@ -312,6 +312,9 @@ static int stm32_qspi_wait_cmd(struct stm32_qspi *qspi, out: /* clear flags */ writel_relaxed(FCR_CTCF | FCR_CTEF, qspi->io_base + QSPI_FCR); +wait_nobusy: + if (!err) + err = stm32_qspi_wait_nobusy(qspi);
return err; }
From: Sergio Paracuellos sergio.paracuellos@gmail.com
[ Upstream commit eb367d875f94a228c17c8538e3f2efcf2eb07ead ]
In 'rt2880_pmx_group_enable' driver is printing an error and returning -EBUSY if a pin has been already enabled. This begets anoying messages in the caller when this happens like the following:
rt2880-pinmux pinctrl: pcie is already enabled mt7621-pci 1e140000.pcie: Error applying setting, reverse things back
To avoid this just print the already enabled message in the pinctrl driver and return 0 instead to not confuse the user with a real bad problem.
Signed-off-by: Sergio Paracuellos sergio.paracuellos@gmail.com Link: https://lore.kernel.org/r/20210604055337.20407-1-sergio.paracuellos@gmail.co... Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/mt7621-pinctrl/pinctrl-rt2880.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/mt7621-pinctrl/pinctrl-rt2880.c b/drivers/staging/mt7621-pinctrl/pinctrl-rt2880.c index d0f06790d38f..0ba4e4e070a9 100644 --- a/drivers/staging/mt7621-pinctrl/pinctrl-rt2880.c +++ b/drivers/staging/mt7621-pinctrl/pinctrl-rt2880.c @@ -127,7 +127,7 @@ static int rt2880_pmx_group_enable(struct pinctrl_dev *pctrldev, if (p->groups[group].enabled) { dev_err(p->dev, "%s is already enabled\n", p->groups[group].name); - return -EBUSY; + return 0; }
p->groups[group].enabled = 1;
From: Chen Li chenli@uniontech.com
[ Upstream commit ab8363d3875a83f4901eb1cc00ce8afd24de6c85 ]
I met a gpu addr bug recently and the kernel log tells me the pc is memcpy/memset and link register is radeon_uvd_resume.
As we know, in some architectures, optimized memcpy/memset may not work well on device memory. Trival memcpy_toio/memset_io can fix this problem.
BTW, amdgpu has already done it in: commit ba0b2275a678 ("drm/amdgpu: use memcpy_to/fromio for UVD fw upload"), that's why it has no this issue on the same gpu and platform.
Signed-off-by: Chen Li chenli@uniontech.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/radeon/radeon_uvd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c index 1ad5c3b86b64..a18bf70a251e 100644 --- a/drivers/gpu/drm/radeon/radeon_uvd.c +++ b/drivers/gpu/drm/radeon/radeon_uvd.c @@ -286,7 +286,7 @@ int radeon_uvd_resume(struct radeon_device *rdev) if (rdev->uvd.vcpu_bo == NULL) return -EINVAL;
- memcpy(rdev->uvd.cpu_addr, rdev->uvd_fw->data, rdev->uvd_fw->size); + memcpy_toio((void __iomem *)rdev->uvd.cpu_addr, rdev->uvd_fw->data, rdev->uvd_fw->size);
size = radeon_bo_size(rdev->uvd.vcpu_bo); size -= rdev->uvd_fw->size; @@ -294,7 +294,7 @@ int radeon_uvd_resume(struct radeon_device *rdev) ptr = rdev->uvd.cpu_addr; ptr += rdev->uvd_fw->size;
- memset(ptr, 0, size); + memset_io((void __iomem *)ptr, 0, size);
return 0; }
From: Riwen Lu luriwen@kylinos.cn
[ Upstream commit 78d13552346289bad4a9bf8eabb5eec5e5a321a5 ]
The scpi hwmon shows the sub-zero temperature in an unsigned integer, which would confuse the users when the machine works in low temperature environment. This shows the sub-zero temperature in an signed value and users can get it properly from sensors.
Signed-off-by: Riwen Lu luriwen@kylinos.cn Tested-by: Xin Chen chenxin@kylinos.cn Link: https://lore.kernel.org/r/20210604030959.736379-1-luriwen@kylinos.cn Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/scpi-hwmon.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/hwmon/scpi-hwmon.c b/drivers/hwmon/scpi-hwmon.c index 25aac40f2764..919877970ae3 100644 --- a/drivers/hwmon/scpi-hwmon.c +++ b/drivers/hwmon/scpi-hwmon.c @@ -99,6 +99,15 @@ scpi_show_sensor(struct device *dev, struct device_attribute *attr, char *buf)
scpi_scale_reading(&value, sensor);
+ /* + * Temperature sensor values are treated as signed values based on + * observation even though that is not explicitly specified, and + * because an unsigned u64 temperature does not really make practical + * sense especially when the temperature is below zero degrees Celsius. + */ + if (sensor->info.class == TEMPERATURE) + return sprintf(buf, "%lld\n", (s64)value); + return sprintf(buf, "%llu\n", value); }
From: Odin Ugedal odin@uged.al
[ Upstream commit a7b359fc6a37faaf472125867c8dc5a068c90982 ]
Fix an issue where fairness is decreased since cfs_rq's can end up not being decayed properly. For two sibling control groups with the same priority, this can often lead to a load ratio of 99/1 (!!).
This happens because when a cfs_rq is throttled, all the descendant cfs_rq's will be removed from the leaf list. When they initial cfs_rq is unthrottled, it will currently only re add descendant cfs_rq's if they have one or more entities enqueued. This is not a perfect heuristic.
Instead, we insert all cfs_rq's that contain one or more enqueued entities, or it its load is not completely decayed.
Can often lead to situations like this for equally weighted control groups:
$ ps u -C stress USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 10009 88.8 0.0 3676 100 pts/1 R+ 11:04 0:13 stress --cpu 1 root 10023 3.0 0.0 3676 104 pts/1 R+ 11:04 0:00 stress --cpu 1
Fixes: 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()") [vingo: !SMP build fix] Signed-off-by: Odin Ugedal odin@uged.al Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Vincent Guittot vincent.guittot@linaro.org Link: https://lore.kernel.org/r/20210612112815.61678-1-odin@uged.al Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/fair.c | 44 +++++++++++++++++++++++++------------------- 1 file changed, 25 insertions(+), 19 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d3f4113e87de..877672df822f 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3131,6 +3131,24 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq, int flags)
#ifdef CONFIG_SMP #ifdef CONFIG_FAIR_GROUP_SCHED + +static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq) +{ + if (cfs_rq->load.weight) + return false; + + if (cfs_rq->avg.load_sum) + return false; + + if (cfs_rq->avg.util_sum) + return false; + + if (cfs_rq->avg.runnable_load_sum) + return false; + + return true; +} + /** * update_tg_load_avg - update the tg's load avg * @cfs_rq: the cfs_rq whose avg changed @@ -3833,6 +3851,11 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
#else /* CONFIG_SMP */
+static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq) +{ + return true; +} + #define UPDATE_TG 0x0 #define SKIP_AGE_LOAD 0x0 #define DO_ATTACH 0x0 @@ -4488,8 +4511,8 @@ static int tg_unthrottle_up(struct task_group *tg, void *data) cfs_rq->throttled_clock_task_time += rq_clock_task(rq) - cfs_rq->throttled_clock_task;
- /* Add cfs_rq with already running entity in the list */ - if (cfs_rq->nr_running >= 1) + /* Add cfs_rq with load or one or more already running entities to the list */ + if (!cfs_rq_is_decayed(cfs_rq) || cfs_rq->nr_running) list_add_leaf_cfs_rq(cfs_rq); }
@@ -7620,23 +7643,6 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
#ifdef CONFIG_FAIR_GROUP_SCHED
-static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq) -{ - if (cfs_rq->load.weight) - return false; - - if (cfs_rq->avg.load_sum) - return false; - - if (cfs_rq->avg.util_sum) - return false; - - if (cfs_rq->avg.runnable_load_sum) - return false; - - return true; -} - static bool __update_blocked_fair(struct rq *rq, bool *done) { struct cfs_rq *cfs_rq, *pos;
From: Norbert Slusarek nslusarek@gmx.net
commit 5e87ddbe3942e27e939bdc02deb8579b0cbd8ecc upstream.
On 64-bit systems, struct bcm_msg_head has an added padding of 4 bytes between struct members count and ival1. Even though all struct members are initialized, the 4-byte hole will contain data from the kernel stack. This patch zeroes out struct bcm_msg_head before usage, preventing infoleaks to userspace.
Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol") Link: https://lore.kernel.org/r/trinity-7c1b2e82-e34f-4885-8060-2cd7a13769ce-16235... Cc: linux-stable stable@vger.kernel.org Signed-off-by: Norbert Slusarek nslusarek@gmx.net Acked-by: Oliver Hartkopp socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/can/bcm.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/can/bcm.c +++ b/net/can/bcm.c @@ -404,6 +404,7 @@ static enum hrtimer_restart bcm_tx_timeo if (!op->count && (op->flags & TX_COUNTEVT)) {
/* create notification to user */ + memset(&msg_head, 0, sizeof(msg_head)); msg_head.opcode = TX_EXPIRED; msg_head.flags = op->flags; msg_head.count = op->count; @@ -441,6 +442,7 @@ static void bcm_rx_changed(struct bcm_op /* this element is not throttled anymore */ data->flags &= (BCM_CAN_FLAGS_MASK|RX_RECV);
+ memset(&head, 0, sizeof(head)); head.opcode = RX_CHANGED; head.flags = op->flags; head.count = op->count; @@ -562,6 +564,7 @@ static enum hrtimer_restart bcm_rx_timeo }
/* create notification to user */ + memset(&msg_head, 0, sizeof(msg_head)); msg_head.opcode = RX_TIMEOUT; msg_head.flags = op->flags; msg_head.count = op->count;
From: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp
commit 8d0caedb759683041d9db82069937525999ada53 upstream.
syzbot is reporting hung task at register_netdevice_notifier() [1] and unregister_netdevice_notifier() [2], for cleanup_net() might perform time consuming operations while CAN driver's raw/bcm/isotp modules are calling {register,unregister}_netdevice_notifier() on each socket.
Change raw/bcm/isotp modules to call register_netdevice_notifier() from module's __init function and call unregister_netdevice_notifier() from module's __exit function, as with gw/j1939 modules are doing.
Link: https://syzkaller.appspot.com/bug?id=391b9498827788b3cc6830226d4ff5be87107c3... [1] Link: https://syzkaller.appspot.com/bug?id=1724d278c83ca6e6df100a2e320c10d991cf2bc... [2] Link: https://lore.kernel.org/r/54a5f451-05ed-f977-8534-79e7aa2bcc8f@i-love.sakura... Cc: linux-stable stable@vger.kernel.org Reported-by: syzbot syzbot+355f8edb2ff45d5f95fa@syzkaller.appspotmail.com Reported-by: syzbot syzbot+0f1827363a305f74996f@syzkaller.appspotmail.com Reviewed-by: Kirill Tkhai ktkhai@virtuozzo.com Tested-by: syzbot syzbot+355f8edb2ff45d5f95fa@syzkaller.appspotmail.com Tested-by: Oliver Hartkopp socketcan@hartkopp.net Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/can/bcm.c | 59 +++++++++++++++++++++++++++++++++++++++++++------------ net/can/raw.c | 62 ++++++++++++++++++++++++++++++++++++++++++++-------------- 2 files changed, 94 insertions(+), 27 deletions(-)
--- a/net/can/bcm.c +++ b/net/can/bcm.c @@ -127,7 +127,7 @@ struct bcm_sock { struct sock sk; int bound; int ifindex; - struct notifier_block notifier; + struct list_head notifier; struct list_head rx_ops; struct list_head tx_ops; unsigned long dropped_usr_msgs; @@ -135,6 +135,10 @@ struct bcm_sock { char procname [32]; /* inode number in decimal with \0 */ };
+static LIST_HEAD(bcm_notifier_list); +static DEFINE_SPINLOCK(bcm_notifier_lock); +static struct bcm_sock *bcm_busy_notifier; + static inline struct bcm_sock *bcm_sk(const struct sock *sk) { return (struct bcm_sock *)sk; @@ -1383,20 +1387,15 @@ static int bcm_sendmsg(struct socket *so /* * notification handler for netdevice status changes */ -static int bcm_notifier(struct notifier_block *nb, unsigned long msg, - void *ptr) +static void bcm_notify(struct bcm_sock *bo, unsigned long msg, + struct net_device *dev) { - struct net_device *dev = netdev_notifier_info_to_dev(ptr); - struct bcm_sock *bo = container_of(nb, struct bcm_sock, notifier); struct sock *sk = &bo->sk; struct bcm_op *op; int notify_enodev = 0;
if (!net_eq(dev_net(dev), sock_net(sk))) - return NOTIFY_DONE; - - if (dev->type != ARPHRD_CAN) - return NOTIFY_DONE; + return;
switch (msg) {
@@ -1431,7 +1430,28 @@ static int bcm_notifier(struct notifier_ sk->sk_error_report(sk); } } +}
+static int bcm_notifier(struct notifier_block *nb, unsigned long msg, + void *ptr) +{ + struct net_device *dev = netdev_notifier_info_to_dev(ptr); + + if (dev->type != ARPHRD_CAN) + return NOTIFY_DONE; + if (msg != NETDEV_UNREGISTER && msg != NETDEV_DOWN) + return NOTIFY_DONE; + if (unlikely(bcm_busy_notifier)) /* Check for reentrant bug. */ + return NOTIFY_DONE; + + spin_lock(&bcm_notifier_lock); + list_for_each_entry(bcm_busy_notifier, &bcm_notifier_list, notifier) { + spin_unlock(&bcm_notifier_lock); + bcm_notify(bcm_busy_notifier, msg, dev); + spin_lock(&bcm_notifier_lock); + } + bcm_busy_notifier = NULL; + spin_unlock(&bcm_notifier_lock); return NOTIFY_DONE; }
@@ -1451,9 +1471,9 @@ static int bcm_init(struct sock *sk) INIT_LIST_HEAD(&bo->rx_ops);
/* set notifier */ - bo->notifier.notifier_call = bcm_notifier; - - register_netdevice_notifier(&bo->notifier); + spin_lock(&bcm_notifier_lock); + list_add_tail(&bo->notifier, &bcm_notifier_list); + spin_unlock(&bcm_notifier_lock);
return 0; } @@ -1476,7 +1496,14 @@ static int bcm_release(struct socket *so
/* remove bcm_ops, timer, rx_unregister(), etc. */
- unregister_netdevice_notifier(&bo->notifier); + spin_lock(&bcm_notifier_lock); + while (bcm_busy_notifier == bo) { + spin_unlock(&bcm_notifier_lock); + schedule_timeout_uninterruptible(1); + spin_lock(&bcm_notifier_lock); + } + list_del(&bo->notifier); + spin_unlock(&bcm_notifier_lock);
lock_sock(sk);
@@ -1699,6 +1726,10 @@ static struct pernet_operations canbcm_p .exit = canbcm_pernet_exit, };
+static struct notifier_block canbcm_notifier = { + .notifier_call = bcm_notifier +}; + static int __init bcm_module_init(void) { int err; @@ -1712,12 +1743,14 @@ static int __init bcm_module_init(void) }
register_pernet_subsys(&canbcm_pernet_ops); + register_netdevice_notifier(&canbcm_notifier); return 0; }
static void __exit bcm_module_exit(void) { can_proto_unregister(&bcm_can_proto); + unregister_netdevice_notifier(&canbcm_notifier); unregister_pernet_subsys(&canbcm_pernet_ops); }
--- a/net/can/raw.c +++ b/net/can/raw.c @@ -85,7 +85,7 @@ struct raw_sock { struct sock sk; int bound; int ifindex; - struct notifier_block notifier; + struct list_head notifier; int loopback; int recv_own_msgs; int fd_frames; @@ -97,6 +97,10 @@ struct raw_sock { struct uniqframe __percpu *uniq; };
+static LIST_HEAD(raw_notifier_list); +static DEFINE_SPINLOCK(raw_notifier_lock); +static struct raw_sock *raw_busy_notifier; + /* Return pointer to store the extra msg flags for raw_recvmsg(). * We use the space of one unsigned int beyond the 'struct sockaddr_can' * in skb->cb. @@ -265,21 +269,16 @@ static int raw_enable_allfilters(struct return err; }
-static int raw_notifier(struct notifier_block *nb, - unsigned long msg, void *ptr) +static void raw_notify(struct raw_sock *ro, unsigned long msg, + struct net_device *dev) { - struct net_device *dev = netdev_notifier_info_to_dev(ptr); - struct raw_sock *ro = container_of(nb, struct raw_sock, notifier); struct sock *sk = &ro->sk;
if (!net_eq(dev_net(dev), sock_net(sk))) - return NOTIFY_DONE; - - if (dev->type != ARPHRD_CAN) - return NOTIFY_DONE; + return;
if (ro->ifindex != dev->ifindex) - return NOTIFY_DONE; + return;
switch (msg) { case NETDEV_UNREGISTER: @@ -307,7 +306,28 @@ static int raw_notifier(struct notifier_ sk->sk_error_report(sk); break; } +} + +static int raw_notifier(struct notifier_block *nb, unsigned long msg, + void *ptr) +{ + struct net_device *dev = netdev_notifier_info_to_dev(ptr); + + if (dev->type != ARPHRD_CAN) + return NOTIFY_DONE; + if (msg != NETDEV_UNREGISTER && msg != NETDEV_DOWN) + return NOTIFY_DONE; + if (unlikely(raw_busy_notifier)) /* Check for reentrant bug. */ + return NOTIFY_DONE;
+ spin_lock(&raw_notifier_lock); + list_for_each_entry(raw_busy_notifier, &raw_notifier_list, notifier) { + spin_unlock(&raw_notifier_lock); + raw_notify(raw_busy_notifier, msg, dev); + spin_lock(&raw_notifier_lock); + } + raw_busy_notifier = NULL; + spin_unlock(&raw_notifier_lock); return NOTIFY_DONE; }
@@ -336,9 +356,9 @@ static int raw_init(struct sock *sk) return -ENOMEM;
/* set notifier */ - ro->notifier.notifier_call = raw_notifier; - - register_netdevice_notifier(&ro->notifier); + spin_lock(&raw_notifier_lock); + list_add_tail(&ro->notifier, &raw_notifier_list); + spin_unlock(&raw_notifier_lock);
return 0; } @@ -353,7 +373,14 @@ static int raw_release(struct socket *so
ro = raw_sk(sk);
- unregister_netdevice_notifier(&ro->notifier); + spin_lock(&raw_notifier_lock); + while (raw_busy_notifier == ro) { + spin_unlock(&raw_notifier_lock); + schedule_timeout_uninterruptible(1); + spin_lock(&raw_notifier_lock); + } + list_del(&ro->notifier); + spin_unlock(&raw_notifier_lock);
lock_sock(sk);
@@ -879,6 +906,10 @@ static const struct can_proto raw_can_pr .prot = &raw_proto, };
+static struct notifier_block canraw_notifier = { + .notifier_call = raw_notifier +}; + static __init int raw_module_init(void) { int err; @@ -888,6 +919,8 @@ static __init int raw_module_init(void) err = can_proto_register(&raw_can_proto); if (err < 0) pr_err("can: registration of raw protocol failed\n"); + else + register_netdevice_notifier(&canraw_notifier);
return err; } @@ -895,6 +928,7 @@ static __init int raw_module_init(void) static __exit void raw_module_exit(void) { can_proto_unregister(&raw_can_proto); + unregister_netdevice_notifier(&canraw_notifier); }
module_init(raw_module_init);
From: Oleksij Rempel o.rempel@pengutronix.de
commit 2030043e616cab40f510299f09b636285e0a3678 upstream.
This patch fixes a Use-after-Free found by the syzbot.
The problem is that a skb is taken from the per-session skb queue, without incrementing the ref count. This leads to a Use-after-Free if the skb is taken concurrently from the session queue due to a CTS.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/r/20210521115720.7533-1-o.rempel@pengutronix.de Cc: Hillf Danton hdanton@sina.com Cc: linux-stable stable@vger.kernel.org Reported-by: syzbot+220c1a29987a9a490903@syzkaller.appspotmail.com Reported-by: syzbot+45199c1b73b4013525cf@syzkaller.appspotmail.com Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/can/j1939/transport.c | 54 ++++++++++++++++++++++++++++++++++------------ 1 file changed, 40 insertions(+), 14 deletions(-)
--- a/net/can/j1939/transport.c +++ b/net/can/j1939/transport.c @@ -330,6 +330,9 @@ static void j1939_session_skb_drop_old(s
if ((do_skcb->offset + do_skb->len) < offset_start) { __skb_unlink(do_skb, &session->skb_queue); + /* drop ref taken in j1939_session_skb_queue() */ + skb_unref(do_skb); + kfree_skb(do_skb); } spin_unlock_irqrestore(&session->skb_queue.lock, flags); @@ -349,12 +352,13 @@ void j1939_session_skb_queue(struct j193
skcb->flags |= J1939_ECU_LOCAL_SRC;
+ skb_get(skb); skb_queue_tail(&session->skb_queue, skb); }
static struct -sk_buff *j1939_session_skb_find_by_offset(struct j1939_session *session, - unsigned int offset_start) +sk_buff *j1939_session_skb_get_by_offset(struct j1939_session *session, + unsigned int offset_start) { struct j1939_priv *priv = session->priv; struct j1939_sk_buff_cb *do_skcb; @@ -371,6 +375,10 @@ sk_buff *j1939_session_skb_find_by_offse skb = do_skb; } } + + if (skb) + skb_get(skb); + spin_unlock_irqrestore(&session->skb_queue.lock, flags);
if (!skb) @@ -381,12 +389,12 @@ sk_buff *j1939_session_skb_find_by_offse return skb; }
-static struct sk_buff *j1939_session_skb_find(struct j1939_session *session) +static struct sk_buff *j1939_session_skb_get(struct j1939_session *session) { unsigned int offset_start;
offset_start = session->pkt.dpo * 7; - return j1939_session_skb_find_by_offset(session, offset_start); + return j1939_session_skb_get_by_offset(session, offset_start); }
/* see if we are receiver @@ -776,7 +784,7 @@ static int j1939_session_tx_dat(struct j int ret = 0; u8 dat[8];
- se_skb = j1939_session_skb_find_by_offset(session, session->pkt.tx * 7); + se_skb = j1939_session_skb_get_by_offset(session, session->pkt.tx * 7); if (!se_skb) return -ENOBUFS;
@@ -801,7 +809,8 @@ static int j1939_session_tx_dat(struct j netdev_err_once(priv->ndev, "%s: 0x%p: requested data outside of queued buffer: offset %i, len %i, pkt.tx: %i\n", __func__, session, skcb->offset, se_skb->len , session->pkt.tx); - return -EOVERFLOW; + ret = -EOVERFLOW; + goto out_free; }
if (!len) { @@ -835,6 +844,12 @@ static int j1939_session_tx_dat(struct j if (pkt_done) j1939_tp_set_rxtimeout(session, 250);
+ out_free: + if (ret) + kfree_skb(se_skb); + else + consume_skb(se_skb); + return ret; }
@@ -1007,7 +1022,7 @@ static int j1939_xtp_txnext_receiver(str static int j1939_simple_txnext(struct j1939_session *session) { struct j1939_priv *priv = session->priv; - struct sk_buff *se_skb = j1939_session_skb_find(session); + struct sk_buff *se_skb = j1939_session_skb_get(session); struct sk_buff *skb; int ret;
@@ -1015,8 +1030,10 @@ static int j1939_simple_txnext(struct j1 return 0;
skb = skb_clone(se_skb, GFP_ATOMIC); - if (!skb) - return -ENOMEM; + if (!skb) { + ret = -ENOMEM; + goto out_free; + }
can_skb_set_owner(skb, se_skb->sk);
@@ -1024,12 +1041,18 @@ static int j1939_simple_txnext(struct j1
ret = j1939_send_one(priv, skb); if (ret) - return ret; + goto out_free;
j1939_sk_errqueue(session, J1939_ERRQUEUE_SCHED); j1939_sk_queue_activate_next(session);
- return 0; + out_free: + if (ret) + kfree_skb(se_skb); + else + consume_skb(se_skb); + + return ret; }
static bool j1939_session_deactivate_locked(struct j1939_session *session) @@ -1170,9 +1193,10 @@ static void j1939_session_completed(stru struct sk_buff *skb;
if (!session->transmission) { - skb = j1939_session_skb_find(session); + skb = j1939_session_skb_get(session); /* distribute among j1939 receivers */ j1939_sk_recv(session->priv, skb); + consume_skb(skb); }
j1939_session_deactivate_activate_next(session); @@ -1744,7 +1768,7 @@ static void j1939_xtp_rx_dat_one(struct { struct j1939_priv *priv = session->priv; struct j1939_sk_buff_cb *skcb; - struct sk_buff *se_skb; + struct sk_buff *se_skb = NULL; const u8 *dat; u8 *tpdat; int offset; @@ -1786,7 +1810,7 @@ static void j1939_xtp_rx_dat_one(struct goto out_session_cancel; }
- se_skb = j1939_session_skb_find_by_offset(session, packet * 7); + se_skb = j1939_session_skb_get_by_offset(session, packet * 7); if (!se_skb) { netdev_warn(priv->ndev, "%s: 0x%p: no skb found\n", __func__, session); @@ -1848,11 +1872,13 @@ static void j1939_xtp_rx_dat_one(struct j1939_tp_set_rxtimeout(session, 250); } session->last_cmd = 0xff; + consume_skb(se_skb); j1939_session_put(session);
return;
out_session_cancel: + kfree_skb(se_skb); j1939_session_timers_cancel(session); j1939_session_cancel(session, J1939_XTP_ABORT_FAULT); j1939_session_put(session);
From: Pavel Skripkin paskripkin@gmail.com
commit 91c02557174be7f72e46ed7311e3bea1939840b0 upstream.
Syzbot reported memory leak in SocketCAN driver for Microchip CAN BUS Analyzer Tool. The problem was in unfreed usb_coherent.
In mcba_usb_start() 20 coherent buffers are allocated and there is nothing, that frees them:
1) In callback function the urb is resubmitted and that's all 2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER is not set (see mcba_usb_start) and this flag cannot be used with coherent buffers.
Fail log: | [ 1354.053291][ T8413] mcba_usb 1-1:0.0 can0: device disconnected | [ 1367.059384][ T8420] kmemleak: 20 new suspected memory leaks (see /sys/kernel/debug/kmem)
So, all allocated buffers should be freed with usb_free_coherent() explicitly
NOTE: The same pattern for allocating and freeing coherent buffers is used in drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c
Fixes: 51f3baad7de9 ("can: mcba_usb: Add support for Microchip CAN BUS Analyzer") Link: https://lore.kernel.org/r/20210609215833.30393-1-paskripkin@gmail.com Cc: linux-stable stable@vger.kernel.org Reported-and-tested-by: syzbot+57281c762a3922e14dfe@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin paskripkin@gmail.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/can/usb/mcba_usb.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-)
--- a/drivers/net/can/usb/mcba_usb.c +++ b/drivers/net/can/usb/mcba_usb.c @@ -82,6 +82,8 @@ struct mcba_priv { bool can_ka_first_pass; bool can_speed_check; atomic_t free_ctx_cnt; + void *rxbuf[MCBA_MAX_RX_URBS]; + dma_addr_t rxbuf_dma[MCBA_MAX_RX_URBS]; };
/* CAN frame */ @@ -633,6 +635,7 @@ static int mcba_usb_start(struct mcba_pr for (i = 0; i < MCBA_MAX_RX_URBS; i++) { struct urb *urb = NULL; u8 *buf; + dma_addr_t buf_dma;
/* create a URB, and a buffer for it */ urb = usb_alloc_urb(0, GFP_KERNEL); @@ -642,7 +645,7 @@ static int mcba_usb_start(struct mcba_pr }
buf = usb_alloc_coherent(priv->udev, MCBA_USB_RX_BUFF_SIZE, - GFP_KERNEL, &urb->transfer_dma); + GFP_KERNEL, &buf_dma); if (!buf) { netdev_err(netdev, "No memory left for USB buffer\n"); usb_free_urb(urb); @@ -661,11 +664,14 @@ static int mcba_usb_start(struct mcba_pr if (err) { usb_unanchor_urb(urb); usb_free_coherent(priv->udev, MCBA_USB_RX_BUFF_SIZE, - buf, urb->transfer_dma); + buf, buf_dma); usb_free_urb(urb); break; }
+ priv->rxbuf[i] = buf; + priv->rxbuf_dma[i] = buf_dma; + /* Drop reference, USB core will take care of freeing it */ usb_free_urb(urb); } @@ -708,7 +714,14 @@ static int mcba_usb_open(struct net_devi
static void mcba_urb_unlink(struct mcba_priv *priv) { + int i; + usb_kill_anchored_urbs(&priv->rx_submitted); + + for (i = 0; i < MCBA_MAX_RX_URBS; ++i) + usb_free_coherent(priv->udev, MCBA_USB_RX_BUFF_SIZE, + priv->rxbuf[i], priv->rxbuf_dma[i]); + usb_kill_anchored_urbs(&priv->tx_submitted); }
From: Andrew Lunn andrew@lunn.ch
commit a7d8d1c7a7f73e780aa9ae74926ae5985b2f895f upstream.
The Cypress CY7C65632 appears to have an issue with auto suspend and detecting devices, not too dissimilar to the SMSC 5534B hub. It is easiest to reproduce by connecting multiple mass storage devices to the hub at the same time. On a Lenovo Yoga, around 1 in 3 attempts result in the devices not being detected. It is however possible to make them appear using lsusb -v.
Disabling autosuspend for this hub resolves the issue.
Fixes: 1208f9e1d758 ("USB: hub: Fix the broken detection of USB3 device in SMSC hub") Cc: stable@vger.kernel.org Signed-off-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20210614155524.2228800-1-andrew@lunn.ch Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/core/hub.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -39,6 +39,8 @@ #define USB_VENDOR_GENESYS_LOGIC 0x05e3 #define USB_VENDOR_SMSC 0x0424 #define USB_PRODUCT_USB5534B 0x5534 +#define USB_VENDOR_CYPRESS 0x04b4 +#define USB_PRODUCT_CY7C65632 0x6570 #define HUB_QUIRK_CHECK_PORT_AUTOSUSPEND 0x01 #define HUB_QUIRK_DISABLE_AUTOSUSPEND 0x02
@@ -5515,6 +5517,11 @@ static const struct usb_device_id hub_id .bInterfaceClass = USB_CLASS_HUB, .driver_info = HUB_QUIRK_DISABLE_AUTOSUSPEND}, { .match_flags = USB_DEVICE_ID_MATCH_VENDOR + | USB_DEVICE_ID_MATCH_PRODUCT, + .idVendor = USB_VENDOR_CYPRESS, + .idProduct = USB_PRODUCT_CY7C65632, + .driver_info = HUB_QUIRK_DISABLE_AUTOSUSPEND}, + { .match_flags = USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_INT_CLASS, .idVendor = USB_VENDOR_GENESYS_LOGIC, .bInterfaceClass = USB_CLASS_HUB,
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 85550c83da421fb12dc1816c45012e1e638d2b38 upstream.
The saved_cmdlines is used to map pids to the task name, such that the output of the tracing does not just show pids, but also gives a human readable name for the task.
If the name is not mapped, the output looks like this:
<...>-1316 [005] ...2 132.044039: ...
Instead of this:
gnome-shell-1316 [005] ...2 132.044039: ...
The names are updated when tracing is running, but are skipped if tracing is stopped. Unfortunately, this stops the recording of the names if the top level tracer is stopped, and not if there's other tracers active.
The recording of a name only happens when a new event is written into a ring buffer, so there is no need to test if tracing is on or not. If tracing is off, then no event is written and no need to test if tracing is off or not.
Remove the check, as it hides the names of tasks for events in the instance buffers.
Cc: stable@vger.kernel.org Fixes: 7ffbd48d5cab2 ("tracing: Cache comms only after an event occurred") Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace.c | 2 -- 1 file changed, 2 deletions(-)
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -2236,8 +2236,6 @@ static bool tracing_record_taskinfo_skip { if (unlikely(!(flags & (TRACE_RECORD_CMDLINE | TRACE_RECORD_TGID)))) return true; - if (atomic_read(&trace_record_taskinfo_disabled) || !tracing_is_on()) - return true; if (!__this_cpu_read(trace_taskinfo_save)) return true; return false;
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 4fdd595e4f9a1ff6d93ec702eaecae451cfc6591 upstream.
A while ago, when the "trace" file was opened, tracing was stopped, and code was added to stop recording the comms to saved_cmdlines, for mapping of the pids to the task name.
Code has been added that only records the comm if a trace event occurred, and there's no reason to not trace it if the trace file is opened.
Cc: stable@vger.kernel.org Fixes: 7ffbd48d5cab2 ("tracing: Cache comms only after an event occurred") Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace.c | 9 --------- 1 file changed, 9 deletions(-)
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1948,9 +1948,6 @@ struct saved_cmdlines_buffer { }; static struct saved_cmdlines_buffer *savedcmd;
-/* temporary disable recording */ -static atomic_t trace_record_taskinfo_disabled __read_mostly; - static inline char *get_saved_cmdlines(int idx) { return &savedcmd->saved_cmdlines[idx * TASK_COMM_LEN]; @@ -3458,9 +3455,6 @@ static void *s_start(struct seq_file *m, return ERR_PTR(-EBUSY); #endif
- if (!iter->snapshot) - atomic_inc(&trace_record_taskinfo_disabled); - if (*pos != iter->pos) { iter->ent = NULL; iter->cpu = 0; @@ -3503,9 +3497,6 @@ static void s_stop(struct seq_file *m, v return; #endif
- if (!iter->snapshot) - atomic_dec(&trace_record_taskinfo_disabled); - trace_access_unlock(iter->cpu_file); trace_event_read_unlock(); }
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 89529d8b8f8daf92d9979382b8d2eb39966846ea upstream.
The trace_clock_global() tries to make sure the events between CPUs is somewhat in order. A global value is used and updated by the latest read of a clock. If one CPU is ahead by a little, and is read by another CPU, a lock is taken, and if the timestamp of the other CPU is behind, it will simply use the other CPUs timestamp.
The lock is also only taken with a "trylock" due to tracing, and strange recursions can happen. The lock is not taken at all in NMI context.
In the case where the lock is not able to be taken, the non synced timestamp is returned. But it will not be less than the saved global timestamp.
The problem arises because when the time goes "backwards" the time returned is the saved timestamp plus 1. If the lock is not taken, and the plus one to the timestamp is returned, there's a small race that can cause the time to go backwards!
CPU0 CPU1 ---- ---- trace_clock_global() { ts = clock() [ 1000 ] trylock(clock_lock) [ success ] global_ts = ts; [ 1000 ]
<interrupted by NMI> trace_clock_global() { ts = clock() [ 999 ] if (ts < global_ts) ts = global_ts + 1 [ 1001 ]
trylock(clock_lock) [ fail ]
return ts [ 1001] } unlock(clock_lock); return ts; [ 1000 ] }
trace_clock_global() { ts = clock() [ 1000 ] if (ts < global_ts) [ false 1000 == 1000 ]
trylock(clock_lock) [ success ] global_ts = ts; [ 1000 ] unlock(clock_lock)
return ts; [ 1000 ] }
The above case shows to reads of trace_clock_global() on the same CPU, but the second read returns one less than the first read. That is, time when backwards, and this is not what is allowed by trace_clock_global().
This was triggered by heavy tracing and the ring buffer checker that tests for the clock going backwards:
Ring buffer clock went backwards: 20613921464 -> 20613921463 ------------[ cut here ]------------ WARNING: CPU: 2 PID: 0 at kernel/trace/ring_buffer.c:3412 check_buffer+0x1b9/0x1c0 Modules linked in: [..] [CPU: 2]TIME DOES NOT MATCH expected:20620711698 actual:20620711697 delta:6790234 before:20613921463 after:20613921463 [20613915818] PAGE TIME STAMP [20613915818] delta:0 [20613915819] delta:1 [20613916035] delta:216 [20613916465] delta:430 [20613916575] delta:110 [20613916749] delta:174 [20613917248] delta:499 [20613917333] delta:85 [20613917775] delta:442 [20613917921] delta:146 [20613918321] delta:400 [20613918568] delta:247 [20613918768] delta:200 [20613919306] delta:538 [20613919353] delta:47 [20613919980] delta:627 [20613920296] delta:316 [20613920571] delta:275 [20613920862] delta:291 [20613921152] delta:290 [20613921464] delta:312 [20613921464] delta:0 TIME EXTEND [20613921464] delta:0
This happened more than once, and always for an off by one result. It also started happening after commit aafe104aa9096 was added.
Cc: stable@vger.kernel.org Fixes: aafe104aa9096 ("tracing: Restructure trace_clock_global() to never block") Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_clock.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/trace/trace_clock.c +++ b/kernel/trace/trace_clock.c @@ -115,9 +115,9 @@ u64 notrace trace_clock_global(void) prev_time = READ_ONCE(trace_clock_struct.prev_time); now = sched_clock_cpu(this_cpu);
- /* Make sure that now is always greater than prev_time */ + /* Make sure that now is always greater than or equal to prev_time */ if ((s64)(now - prev_time) < 0) - now = prev_time + 1; + now = prev_time;
/* * If in an NMI context then dont risk lockups and simply return @@ -131,7 +131,7 @@ u64 notrace trace_clock_global(void) /* Reread prev_time in case it was already updated */ prev_time = READ_ONCE(trace_clock_struct.prev_time); if ((s64)(now - prev_time) < 0) - now = prev_time + 1; + now = prev_time;
trace_clock_struct.prev_time = now;
From: Antti Järvinen antti.jarvinen@gmail.com
commit b5cf198e74a91073d12839a3e2db99994a39995d upstream.
Some TI KeyStone C667X devices do not support bus/hot reset. The PCIESS automatically disables LTSSM when Secondary Bus Reset is received and device stops working. Prevent bus reset for these devices. With this change, the device can be assigned to VMs with VFIO, but it will leak state between VMs.
Reference: https://e2e.ti.com/support/processors/f/791/t/954382 Link: https://lore.kernel.org/r/20210315102606.17153-1-antti.jarvinen@gmail.com Signed-off-by: Antti Järvinen antti.jarvinen@gmail.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Kishon Vijay Abraham I kishon@ti.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/quirks.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3577,6 +3577,16 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_A */ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
+/* + * Some TI KeyStone C667X devices do not support bus/hot reset. The PCIESS + * automatically disables LTSSM when Secondary Bus Reset is received and + * the device stops working. Prevent bus reset for these devices. With + * this change, the device can be assigned to VMs with VFIO, but it will + * leak state between VMs. Reference + * https://e2e.ti.com/support/processors/f/791/t/954382 + */ +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TI, 0xb005, quirk_no_bus_reset); + static void quirk_no_pm_reset(struct pci_dev *dev) { /*
From: Shanker Donthineni sdonthineni@nvidia.com
commit 4c207e7121fa92b66bf1896bf8ccb9edfb0f9731 upstream.
Some NVIDIA GPU devices do not work with SBR. Triggering SBR leaves the device inoperable for the current system boot. It requires a system hard-reboot to get the GPU device back to normal operating condition post-SBR. For the affected devices, enable NO_BUS_RESET quirk to avoid the issue.
This issue will be fixed in the next generation of hardware.
Link: https://lore.kernel.org/r/20210608054857.18963-8-ameynarkhede03@gmail.com Signed-off-by: Shanker Donthineni sdonthineni@nvidia.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Sinan Kaya okaya@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/quirks.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3558,6 +3558,18 @@ static void quirk_no_bus_reset(struct pc }
/* + * Some NVIDIA GPU devices do not work with bus reset, SBR needs to be + * prevented for those affected devices. + */ +static void quirk_nvidia_no_bus_reset(struct pci_dev *dev) +{ + if ((dev->device & 0xffc0) == 0x2340) + quirk_no_bus_reset(dev); +} +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, + quirk_nvidia_no_bus_reset); + +/* * Some Atheros AR9xxx and QCA988x chips do not behave after a bus reset. * The device will throw a Link Down error on AER-capable systems and * regardless of AER, config space of the device is never accessible again
From: Remi Pommarel repk@triplefau.lt
commit 7fbcb5da811be7d47468417c7795405058abb3da upstream.
advk_pcie_wait_pio() can be called while holding a spinlock (from pci_bus_read_config_dword()), then depends on jiffies in order to timeout while polling on PIO state registers. In the case the PIO transaction failed, the timeout will never happen and will also cause the cpu to stall.
This decrements a variable and wait instead of using jiffies.
Signed-off-by: Remi Pommarel repk@triplefau.lt Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Andrew Murray andrew.murray@arm.com Acked-by: Thomas Petazzoni thomas.petazzoni@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -175,7 +175,8 @@ (PCIE_CONF_BUS(bus) | PCIE_CONF_DEV(PCI_SLOT(devfn)) | \ PCIE_CONF_FUNC(PCI_FUNC(devfn)) | PCIE_CONF_REG(where))
-#define PIO_TIMEOUT_MS 1 +#define PIO_RETRY_CNT 500 +#define PIO_RETRY_DELAY 2 /* 2 us*/
#define LINK_WAIT_MAX_RETRIES 10 #define LINK_WAIT_USLEEP_MIN 90000 @@ -392,17 +393,16 @@ static void advk_pcie_check_pio_status(s static int advk_pcie_wait_pio(struct advk_pcie *pcie) { struct device *dev = &pcie->pdev->dev; - unsigned long timeout; + int i;
- timeout = jiffies + msecs_to_jiffies(PIO_TIMEOUT_MS); - - while (time_before(jiffies, timeout)) { + for (i = 0; i < PIO_RETRY_CNT; i++) { u32 start, isr;
start = advk_readl(pcie, PIO_START); isr = advk_readl(pcie, PIO_ISR); if (!start && isr) return 0; + udelay(PIO_RETRY_DELAY); }
dev_err(dev, "config read/write timed out\n");
From: Pali Rohár pali@kernel.org
commit f18139966d072dab8e4398c95ce955a9742e04f7 upstream.
Trying to start a new PIO transfer by writing value 0 in PIO_START register when previous transfer has not yet completed (which is indicated by value 1 in PIO_START) causes an External Abort on CPU, which results in kernel panic:
SError Interrupt on CPU0, code 0xbf000002 -- SError Kernel panic - not syncing: Asynchronous SError Interrupt
To prevent kernel panic, it is required to reject a new PIO transfer when previous one has not finished yet.
If previous PIO transfer is not finished yet, the kernel may issue a new PIO request only if the previous PIO transfer timed out.
In the past the root cause of this issue was incorrectly identified (as it often happens during link retraining or after link down event) and special hack was implemented in Trusted Firmware to catch all SError events in EL3, to ignore errors with code 0xbf000002 and not forwarding any other errors to kernel and instead throw panic from EL3 Trusted Firmware handler.
Links to discussion and patches about this issue: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dc... https://lore.kernel.org/linux-pci/20190316161243.29517-1-repk@triplefau.lt/ https://lore.kernel.org/linux-pci/971be151d24312cc533989a64bd454b4@www.loen.... https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541
But the real cause was the fact that during link retraining or after link down event the PIO transfer may take longer time, up to the 1.44s until it times out. This increased probability that a new PIO transfer would be issued by kernel while previous one has not finished yet.
After applying this change into the kernel, it is possible to revert the mentioned TF-A hack and SError events do not have to be caught in TF-A EL3.
Link: https://lore.kernel.org/r/20210608203655.31228-1-pali@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Marek Behún kabel@kernel.org Cc: stable@vger.kernel.org # 7fbcb5da811b ("PCI: aardvark: Don't rely on jiffies while holding spinlock") Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-aardvark.c | 49 +++++++++++++++++++++++++++------- 1 file changed, 40 insertions(+), 9 deletions(-)
--- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -405,7 +405,7 @@ static int advk_pcie_wait_pio(struct adv udelay(PIO_RETRY_DELAY); }
- dev_err(dev, "config read/write timed out\n"); + dev_err(dev, "PIO read/write transfer time out\n"); return -ETIMEDOUT; }
@@ -539,6 +539,35 @@ static bool advk_pcie_valid_device(struc return true; }
+static bool advk_pcie_pio_is_running(struct advk_pcie *pcie) +{ + struct device *dev = &pcie->pdev->dev; + + /* + * Trying to start a new PIO transfer when previous has not completed + * cause External Abort on CPU which results in kernel panic: + * + * SError Interrupt on CPU0, code 0xbf000002 -- SError + * Kernel panic - not syncing: Asynchronous SError Interrupt + * + * Functions advk_pcie_rd_conf() and advk_pcie_wr_conf() are protected + * by raw_spin_lock_irqsave() at pci_lock_config() level to prevent + * concurrent calls at the same time. But because PIO transfer may take + * about 1.5s when link is down or card is disconnected, it means that + * advk_pcie_wait_pio() does not always have to wait for completion. + * + * Some versions of ARM Trusted Firmware handles this External Abort at + * EL3 level and mask it to prevent kernel panic. Relevant TF-A commit: + * https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dc... + */ + if (advk_readl(pcie, PIO_START)) { + dev_err(dev, "Previous PIO read/write transfer is still running\n"); + return true; + } + + return false; +} + static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn, int where, int size, u32 *val) { @@ -555,9 +584,10 @@ static int advk_pcie_rd_conf(struct pci_ return pci_bridge_emul_conf_read(&pcie->bridge, where, size, val);
- /* Start PIO */ - advk_writel(pcie, 0, PIO_START); - advk_writel(pcie, 1, PIO_ISR); + if (advk_pcie_pio_is_running(pcie)) { + *val = 0xffffffff; + return PCIBIOS_SET_FAILED; + }
/* Program the control register */ reg = advk_readl(pcie, PIO_CTRL); @@ -576,7 +606,8 @@ static int advk_pcie_rd_conf(struct pci_ /* Program the data strobe */ advk_writel(pcie, 0xf, PIO_WR_DATA_STRB);
- /* Start the transfer */ + /* Clear PIO DONE ISR and start the transfer */ + advk_writel(pcie, 1, PIO_ISR); advk_writel(pcie, 1, PIO_START);
ret = advk_pcie_wait_pio(pcie); @@ -614,9 +645,8 @@ static int advk_pcie_wr_conf(struct pci_ if (where % size) return PCIBIOS_SET_FAILED;
- /* Start PIO */ - advk_writel(pcie, 0, PIO_START); - advk_writel(pcie, 1, PIO_ISR); + if (advk_pcie_pio_is_running(pcie)) + return PCIBIOS_SET_FAILED;
/* Program the control register */ reg = advk_readl(pcie, PIO_CTRL); @@ -643,7 +673,8 @@ static int advk_pcie_wr_conf(struct pci_ /* Program the data strobe */ advk_writel(pcie, data_strobe, PIO_WR_DATA_STRB);
- /* Start the transfer */ + /* Clear PIO DONE ISR and start the transfer */ + advk_writel(pcie, 1, PIO_ISR); advk_writel(pcie, 1, PIO_START);
ret = advk_pcie_wait_pio(pcie);
From: Sriharsha Basavapatna sriharsha.basavapatna@broadcom.com
commit db2f77e2bd99dbd2fb23ddde58f0fae392fe3338 upstream.
The Broadcom BCM57414 NIC may be a multi-function device. While it does not advertise an ACS capability, peer-to-peer transactions are not possible between the individual functions, so it is safe to treat them as fully isolated.
Add an ACS quirk for this device so the functions can be in independent IOMMU groups and attached individually to userspace applications using VFIO.
[bhelgaas: commit log] Link: https://lore.kernel.org/r/1621645997-16251-1-git-send-email-michael.chan@bro... Signed-off-by: Sriharsha Basavapatna sriharsha.basavapatna@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -4843,6 +4843,8 @@ static const struct pci_dev_acs_enabled { PCI_VENDOR_ID_AMPERE, 0xE00A, pci_quirk_xgene_acs }, { PCI_VENDOR_ID_AMPERE, 0xE00B, pci_quirk_xgene_acs }, { PCI_VENDOR_ID_AMPERE, 0xE00C, pci_quirk_xgene_acs }, + /* Broadcom multi-function device */ + { PCI_VENDOR_ID_BROADCOM, 0x16D7, pci_quirk_mf_endpoint_acs }, { PCI_VENDOR_ID_BROADCOM, 0xD714, pci_quirk_brcm_acs }, /* Amazon Annapurna Labs */ { PCI_VENDOR_ID_AMAZON_ANNAPURNA_LABS, 0x0031, pci_quirk_al_acs },
From: Chiqijun chiqijun@huawei.com
commit ce00322c2365e1f7b0312f2f493539c833465d97 upstream.
pcie_flr() starts a Function Level Reset (FLR), waits 100ms (the maximum time allowed for FLR completion by PCIe r5.0, sec 6.6.2), and waits for the FLR to complete. It assumes the FLR is complete when a config read returns valid data.
When we do an FLR on several Huawei Intelligent NIC VFs at the same time, firmware on the NIC processes them serially. The VF may respond to config reads before the firmware has completed its reset processing. If we bind a driver to the VF (e.g., by assigning the VF to a virtual machine) in the interval between the successful config read and completion of the firmware reset processing, the NIC VF driver may fail to load.
Prevent this driver failure by waiting for the NIC firmware to complete its reset processing. Not all NIC firmware supports this feature.
[bhelgaas: commit log] Link: https://support.huawei.com/enterprise/en/doc/EDOC1100063073/87950645/vm-oss-... Link: https://lore.kernel.org/r/20210414132301.1793-1-chiqijun@huawei.com Signed-off-by: Chiqijun chiqijun@huawei.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Alex Williamson alex.williamson@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/quirks.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 65 insertions(+)
--- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3991,6 +3991,69 @@ static int delay_250ms_after_flr(struct return 0; }
+#define PCI_DEVICE_ID_HINIC_VF 0x375E +#define HINIC_VF_FLR_TYPE 0x1000 +#define HINIC_VF_FLR_CAP_BIT (1UL << 30) +#define HINIC_VF_OP 0xE80 +#define HINIC_VF_FLR_PROC_BIT (1UL << 18) +#define HINIC_OPERATION_TIMEOUT 15000 /* 15 seconds */ + +/* Device-specific reset method for Huawei Intelligent NIC virtual functions */ +static int reset_hinic_vf_dev(struct pci_dev *pdev, int probe) +{ + unsigned long timeout; + void __iomem *bar; + u32 val; + + if (probe) + return 0; + + bar = pci_iomap(pdev, 0, 0); + if (!bar) + return -ENOTTY; + + /* Get and check firmware capabilities */ + val = ioread32be(bar + HINIC_VF_FLR_TYPE); + if (!(val & HINIC_VF_FLR_CAP_BIT)) { + pci_iounmap(pdev, bar); + return -ENOTTY; + } + + /* Set HINIC_VF_FLR_PROC_BIT for the start of FLR */ + val = ioread32be(bar + HINIC_VF_OP); + val = val | HINIC_VF_FLR_PROC_BIT; + iowrite32be(val, bar + HINIC_VF_OP); + + pcie_flr(pdev); + + /* + * The device must recapture its Bus and Device Numbers after FLR + * in order generate Completions. Issue a config write to let the + * device capture this information. + */ + pci_write_config_word(pdev, PCI_VENDOR_ID, 0); + + /* Firmware clears HINIC_VF_FLR_PROC_BIT when reset is complete */ + timeout = jiffies + msecs_to_jiffies(HINIC_OPERATION_TIMEOUT); + do { + val = ioread32be(bar + HINIC_VF_OP); + if (!(val & HINIC_VF_FLR_PROC_BIT)) + goto reset_complete; + msleep(20); + } while (time_before(jiffies, timeout)); + + val = ioread32be(bar + HINIC_VF_OP); + if (!(val & HINIC_VF_FLR_PROC_BIT)) + goto reset_complete; + + pci_warn(pdev, "Reset dev timeout, FLR ack reg: %#010x\n", val); + +reset_complete: + pci_iounmap(pdev, bar); + + return 0; +} + static const struct pci_dev_reset_methods pci_dev_reset_methods[] = { { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF, reset_intel_82599_sfp_virtfn }, @@ -4002,6 +4065,8 @@ static const struct pci_dev_reset_method { PCI_VENDOR_ID_INTEL, 0x0953, delay_250ms_after_flr }, { PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID, reset_chelsio_generic_dev }, + { PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HINIC_VF, + reset_hinic_vf_dev }, { 0 } };
From: Sean Christopherson seanjc@google.com
commit 78fcb2c91adfec8ce3a2ba6b4d0dda89f2f4a7c6 upstream.
Immediately reset the MMU context when the vCPU's SMM flag is cleared so that the SMM flag in the MMU role is always synchronized with the vCPU's flag. If RSM fails (which isn't correctly emulated), KVM will bail without calling post_leave_smm() and leave the MMU in a bad state.
The bad MMU role can lead to a NULL pointer dereference when grabbing a shadow page's rmap for a page fault as the initial lookups for the gfn will happen with the vCPU's SMM flag (=0), whereas the rmap lookup will use the shadow page's SMM flag, which comes from the MMU (=1). SMM has an entirely different set of memslots, and so the initial lookup can find a memslot (SMM=0) and then explode on the rmap memslot lookup (SMM=1).
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 PID: 8410 Comm: syz-executor382 Not tainted 5.13.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__gfn_to_rmap arch/x86/kvm/mmu/mmu.c:935 [inline] RIP: 0010:gfn_to_rmap+0x2b0/0x4d0 arch/x86/kvm/mmu/mmu.c:947 Code: <42> 80 3c 20 00 74 08 4c 89 ff e8 f1 79 a9 00 4c 89 fb 4d 8b 37 44 RSP: 0018:ffffc90000ffef98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888015b9f414 RCX: ffff888019669c40 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001 RBP: 0000000000000001 R08: ffffffff811d9cdb R09: ffffed10065a6002 R10: ffffed10065a6002 R11: 0000000000000000 R12: dffffc0000000000 R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000 FS: 000000000124b300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000028e31000 CR4: 00000000001526e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rmap_add arch/x86/kvm/mmu/mmu.c:965 [inline] mmu_set_spte+0x862/0xe60 arch/x86/kvm/mmu/mmu.c:2604 __direct_map arch/x86/kvm/mmu/mmu.c:2862 [inline] direct_page_fault+0x1f74/0x2b70 arch/x86/kvm/mmu/mmu.c:3769 kvm_mmu_do_page_fault arch/x86/kvm/mmu.h:124 [inline] kvm_mmu_page_fault+0x199/0x1440 arch/x86/kvm/mmu/mmu.c:5065 vmx_handle_exit+0x26/0x160 arch/x86/kvm/vmx/vmx.c:6122 vcpu_enter_guest+0x3bdd/0x9630 arch/x86/kvm/x86.c:9428 vcpu_run+0x416/0xc20 arch/x86/kvm/x86.c:9494 kvm_arch_vcpu_ioctl_run+0x4e8/0xa40 arch/x86/kvm/x86.c:9722 kvm_vcpu_ioctl+0x70f/0xbb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3460 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:1069 [inline] __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:1055 do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x440ce9
Cc: stable@vger.kernel.org Reported-by: syzbot+fb0b6a7e8713aeb0319c@syzkaller.appspotmail.com Fixes: 9ec19493fb86 ("KVM: x86: clear SMM flags before loading state while leaving SMM") Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20210609185619.992058-2-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/x86.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6279,7 +6279,10 @@ static unsigned emulator_get_hflags(stru
static void emulator_set_hflags(struct x86_emulate_ctxt *ctxt, unsigned emul_flags) { - emul_to_vcpu(ctxt)->arch.hflags = emul_flags; + struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); + + vcpu->arch.hflags = emul_flags; + kvm_mmu_reset_context(vcpu); }
static int emulator_pre_leave_smm(struct x86_emulate_ctxt *ctxt,
From: Vineet Gupta vgupta@synopsys.com
commit 96f1b00138cb8f04c742c82d0a7c460b2202e887 upstream.
ARCv2 has some configuration dependent registers (r30, r58, r59) which could be targetted by the compiler. To keep the ABI stable, these were unconditionally part of the glibc ABI (sysdeps/unix/sysv/linux/arc/sys/ucontext.h:mcontext_t) however we missed populating them (by saving/restoring them across signal handling).
This patch fixes the issue by - adding arcv2 ABI regs to kernel struct sigcontext - populating them during signal handling
Change to struct sigcontext might seem like a glibc ABI change (although it primarily uses ucontext_t:mcontext_t) but the fact is - it has only been extended (existing fields are not touched) - the old sigcontext was ABI incomplete to begin with anyways
Fixes: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/53 Cc: stable@vger.kernel.org Tested-by: kernel test robot lkp@intel.com Reported-by: Vladimir Isaev isaev@synopsys.com Signed-off-by: Vineet Gupta vgupta@synopsys.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arc/include/uapi/asm/sigcontext.h | 1 arch/arc/kernel/signal.c | 43 +++++++++++++++++++++++++++++++++ 2 files changed, 44 insertions(+)
--- a/arch/arc/include/uapi/asm/sigcontext.h +++ b/arch/arc/include/uapi/asm/sigcontext.h @@ -18,6 +18,7 @@ */ struct sigcontext { struct user_regs_struct regs; + struct user_regs_arcv2 v2abi; };
#endif /* _ASM_ARC_SIGCONTEXT_H */ --- a/arch/arc/kernel/signal.c +++ b/arch/arc/kernel/signal.c @@ -61,6 +61,41 @@ struct rt_sigframe { unsigned int sigret_magic; };
+static int save_arcv2_regs(struct sigcontext *mctx, struct pt_regs *regs) +{ + int err = 0; +#ifndef CONFIG_ISA_ARCOMPACT + struct user_regs_arcv2 v2abi; + + v2abi.r30 = regs->r30; +#ifdef CONFIG_ARC_HAS_ACCL_REGS + v2abi.r58 = regs->r58; + v2abi.r59 = regs->r59; +#else + v2abi.r58 = v2abi.r59 = 0; +#endif + err = __copy_to_user(&mctx->v2abi, &v2abi, sizeof(v2abi)); +#endif + return err; +} + +static int restore_arcv2_regs(struct sigcontext *mctx, struct pt_regs *regs) +{ + int err = 0; +#ifndef CONFIG_ISA_ARCOMPACT + struct user_regs_arcv2 v2abi; + + err = __copy_from_user(&v2abi, &mctx->v2abi, sizeof(v2abi)); + + regs->r30 = v2abi.r30; +#ifdef CONFIG_ARC_HAS_ACCL_REGS + regs->r58 = v2abi.r58; + regs->r59 = v2abi.r59; +#endif +#endif + return err; +} + static int stash_usr_regs(struct rt_sigframe __user *sf, struct pt_regs *regs, sigset_t *set) @@ -94,6 +129,10 @@ stash_usr_regs(struct rt_sigframe __user
err = __copy_to_user(&(sf->uc.uc_mcontext.regs.scratch), &uregs.scratch, sizeof(sf->uc.uc_mcontext.regs.scratch)); + + if (is_isa_arcv2()) + err |= save_arcv2_regs(&(sf->uc.uc_mcontext), regs); + err |= __copy_to_user(&sf->uc.uc_sigmask, set, sizeof(sigset_t));
return err ? -EFAULT : 0; @@ -109,6 +148,10 @@ static int restore_usr_regs(struct pt_re err |= __copy_from_user(&uregs.scratch, &(sf->uc.uc_mcontext.regs.scratch), sizeof(sf->uc.uc_mcontext.regs.scratch)); + + if (is_isa_arcv2()) + err |= restore_arcv2_regs(&(sf->uc.uc_mcontext), regs); + if (err) return -EFAULT;
From: Thomas Gleixner tglx@linutronix.de
commit 12f7764ac61200e32c916f038bdc08f884b0b604 upstream.
switch_fpu_finish() checks current->mm as indicator for kernel threads. That's wrong because kernel threads can temporarily use a mm of a user process via kthread_use_mm().
Check the task flags for PF_KTHREAD instead.
Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state") Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Dave Hansen dave.hansen@linux.intel.com Acked-by: Rik van Riel riel@surriel.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210608144345.912645927@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/fpu/internal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -607,7 +607,7 @@ static inline void switch_fpu_finish(str * PKRU state is switched eagerly because it needs to be valid before we * return to userland e.g. for a copy_to_user() operation. */ - if (current->mm) { + if (!(current->flags & PF_KTHREAD)) { pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU); if (pk) pkru_val = pk->pkru;
From: Thomas Gleixner tglx@linutronix.de
commit 510b80a6a0f1a0d114c6e33bcea64747d127973c upstream.
When user space brings PKRU into init state, then the kernel handling is broken:
T1 user space xsave(state) state.header.xfeatures &= ~XFEATURE_MASK_PKRU; xrstor(state)
T1 -> kernel schedule() XSAVE(S) -> T1->xsave.header.xfeatures[PKRU] == 0 T1->flags |= TIF_NEED_FPU_LOAD;
wrpkru();
schedule() ... pk = get_xsave_addr(&T1->fpu->state.xsave, XFEATURE_PKRU); if (pk) wrpkru(pk->pkru); else wrpkru(DEFAULT_PKRU);
Because the xfeatures bit is 0 and therefore the value in the xsave storage is not valid, get_xsave_addr() returns NULL and switch_to() writes the default PKRU. -> FAIL #1!
So that wrecks any copy_to/from_user() on the way back to user space which hits memory which is protected by the default PKRU value.
Assumed that this does not fail (pure luck) then T1 goes back to user space and because TIF_NEED_FPU_LOAD is set it ends up in
switch_fpu_return() __fpregs_load_activate() if (!fpregs_state_valid()) { load_XSTATE_from_task(); }
But if nothing touched the FPU between T1 scheduling out and back in, then the fpregs_state is still valid which means switch_fpu_return() does nothing and just clears TIF_NEED_FPU_LOAD. Back to user space with DEFAULT_PKRU loaded. -> FAIL #2!
The fix is simple: if get_xsave_addr() returns NULL then set the PKRU value to 0 instead of the restrictive default PKRU value in init_pkru_value.
[ bp: Massage in minor nitpicks from folks. ]
Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state") Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Dave Hansen dave.hansen@linux.intel.com Acked-by: Rik van Riel riel@surriel.com Tested-by: Babu Moger babu.moger@amd.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210608144346.045616965@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/fpu/internal.h | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -608,9 +608,16 @@ static inline void switch_fpu_finish(str * return to userland e.g. for a copy_to_user() operation. */ if (!(current->flags & PF_KTHREAD)) { + /* + * If the PKRU bit in xsave.header.xfeatures is not set, + * then the PKRU component was in init state, which means + * XRSTOR will set PKRU to 0. If the bit is not set then + * get_xsave_addr() will return NULL because the PKRU value + * in memory is not valid. This means pkru_val has to be + * set to 0 and not to init_pkru_value. + */ pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU); - if (pk) - pkru_val = pk->pkru; + pkru_val = pk ? pk->pkru : 0; } __write_pkru(pkru_val); }
From: Thomas Gleixner tglx@linutronix.de
commit efa165504943f2128d50f63de0c02faf6dcceb0d upstream.
If access_ok() or fpregs_soft_set() fails in __fpu__restore_sig() then the function just returns but does not clear the FPU state as it does for all other fatal failures.
Clear the FPU state for these failures as well.
Fixes: 72a671ced66d ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels") Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Borislav Petkov bp@suse.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87mtryyhhz.ffs@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/fpu/signal.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-)
--- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -289,13 +289,17 @@ static int __fpu__restore_sig(void __use return 0; }
- if (!access_ok(buf, size)) - return -EACCES; + if (!access_ok(buf, size)) { + ret = -EACCES; + goto out; + }
- if (!static_cpu_has(X86_FEATURE_FPU)) - return fpregs_soft_set(current, NULL, - 0, sizeof(struct user_i387_ia32_struct), - NULL, buf) != 0; + if (!static_cpu_has(X86_FEATURE_FPU)) { + ret = fpregs_soft_set(current, NULL, 0, + sizeof(struct user_i387_ia32_struct), + NULL, buf); + goto out; + }
if (use_xsave()) { struct _fpx_sw_bytes fx_sw_user; @@ -333,7 +337,7 @@ static int __fpu__restore_sig(void __use if (ia32_fxstate) { ret = __copy_from_user(&env, buf, sizeof(env)); if (ret) - goto err_out; + goto out; envp = &env; } else { /* @@ -369,7 +373,7 @@ static int __fpu__restore_sig(void __use ret = validate_xstate_header(&fpu->state.xsave.header); } if (ret) - goto err_out; + goto out;
sanitize_restored_xstate(&fpu->state, envp, xfeatures, fx_only);
@@ -382,7 +386,7 @@ static int __fpu__restore_sig(void __use ret = __copy_from_user(&fpu->state.fxsave, buf_fx, state_size); if (ret) { ret = -EFAULT; - goto err_out; + goto out; }
sanitize_restored_xstate(&fpu->state, envp, xfeatures, fx_only); @@ -397,7 +401,7 @@ static int __fpu__restore_sig(void __use } else { ret = __copy_from_user(&fpu->state.fsave, buf_fx, state_size); if (ret) - goto err_out; + goto out;
fpregs_lock(); ret = copy_kernel_to_fregs_err(&fpu->state.fsave); @@ -408,7 +412,7 @@ static int __fpu__restore_sig(void __use fpregs_deactivate(fpu); fpregs_unlock();
-err_out: +out: if (ret) fpu__clear(fpu); return ret;
From: Bumyong Lee bumyong.lee@samsung.com
commit 4ad5dd2d7876d79507a20f026507d1a93b8fff10 upstream.
flags varible which is the input parameter of pl330_prep_dma_cyclic() should not be used by spinlock_irq[save/restore] function.
Signed-off-by: Jongho Park jongho7.park@samsung.com Signed-off-by: Bumyong Lee bumyong.lee@samsung.com Signed-off-by: Chanho Park chanho61.park@samsung.com Link: https://lore.kernel.org/r/20210507063647.111209-1-chanho61.park@samsung.com Fixes: f6f2421c0a1c ("dmaengine: pl330: Merge dma_pl330_dmac and pl330_dmac structs") Cc: stable@vger.kernel.org Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/pl330.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/dma/pl330.c +++ b/drivers/dma/pl330.c @@ -2690,13 +2690,15 @@ static struct dma_async_tx_descriptor *p for (i = 0; i < len / period_len; i++) { desc = pl330_get_desc(pch); if (!desc) { + unsigned long iflags; + dev_err(pch->dmac->ddma.dev, "%s:%d Unable to fetch desc\n", __func__, __LINE__);
if (!first) return NULL;
- spin_lock_irqsave(&pl330->pool_lock, flags); + spin_lock_irqsave(&pl330->pool_lock, iflags);
while (!list_empty(&first->node)) { desc = list_entry(first->node.next, @@ -2706,7 +2708,7 @@ static struct dma_async_tx_descriptor *p
list_move_tail(&first->node, &pl330->desc_pool);
- spin_unlock_irqrestore(&pl330->pool_lock, flags); + spin_unlock_irqrestore(&pl330->pool_lock, iflags);
return NULL; }
From: Johannes Berg johannes.berg@intel.com
commit b5642479b0f7168fe16d156913533fe65ab4f8d5 upstream.
If all net/wireless/certs/*.hex files are deleted, the build will hang at this point since the 'cat' command will have no arguments. Do "echo | cat - ..." so that even if the "..." part is empty, the whole thing won't hang.
Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/iwlwifi.20210618133832.c989056c3664.Ic3b77531d00b3... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/wireless/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/wireless/Makefile +++ b/net/wireless/Makefile @@ -28,7 +28,7 @@ $(obj)/shipped-certs.c: $(wildcard $(src @$(kecho) " GEN $@" @(echo '#include "reg.h"'; \ echo 'const u8 shipped_regdb_certs[] = {'; \ - cat $^ ; \ + echo | cat - $^ ; \ echo '};'; \ echo 'unsigned int shipped_regdb_certs_len = sizeof(shipped_regdb_certs);'; \ ) > $@
From: Avraham Stern avraham.stern@intel.com
commit 0288e5e16a2e18f0b7e61a2b70d9037fc6e4abeb upstream.
If cfg80211_pmsr_process_abort() moves all the PMSR requests that need to be freed into a local list before aborting and freeing them. As a result, it is possible that cfg80211_pmsr_complete() will run in parallel and free the same PMSR request.
Fix it by freeing the request in cfg80211_pmsr_complete() only if it is still in the original pmsr list.
Cc: stable@vger.kernel.org Fixes: 9bb7e0f24e7e ("cfg80211: add peer measurement with FTM initiator API") Signed-off-by: Avraham Stern avraham.stern@intel.com Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/iwlwifi.20210618133832.1fbef57e269a.I00294bebdb068... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/wireless/pmsr.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
--- a/net/wireless/pmsr.c +++ b/net/wireless/pmsr.c @@ -293,6 +293,7 @@ void cfg80211_pmsr_complete(struct wirel gfp_t gfp) { struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy); + struct cfg80211_pmsr_request *tmp, *prev, *to_free = NULL; struct sk_buff *msg; void *hdr;
@@ -323,9 +324,20 @@ free_msg: nlmsg_free(msg); free_request: spin_lock_bh(&wdev->pmsr_lock); - list_del(&req->list); + /* + * cfg80211_pmsr_process_abort() may have already moved this request + * to the free list, and will free it later. In this case, don't free + * it here. + */ + list_for_each_entry_safe(tmp, prev, &wdev->pmsr_list, list) { + if (tmp == req) { + list_del(&req->list); + to_free = req; + break; + } + } spin_unlock_bh(&wdev->pmsr_lock); - kfree(req); + kfree(to_free); } EXPORT_SYMBOL_GPL(cfg80211_pmsr_complete);
From: Yifan Zhang yifan1.zhang@amd.com
commit 1c0b0efd148d5b24c4932ddb3fa03c8edd6097b3 upstream.
If GC has entered CGPG, ringing doorbell > first page doesn't wakeup GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround this issue.
Signed-off-by: Yifan Zhang yifan1.zhang@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -3416,8 +3416,12 @@ static int gfx_v10_0_kiq_init_register(s if (ring->use_doorbell) { WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_LOWER, (adev->doorbell_index.kiq * 2) << 2); + /* If GC has entered CGPG, ringing doorbell > first page doesn't + * wakeup GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround + * this issue. + */ WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_UPPER, - (adev->doorbell_index.userqueue_end * 2) << 2); + (adev->doorbell.size - 4)); }
WREG32_SOC15(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL,
From: Yifan Zhang yifan1.zhang@amd.com
commit 4cbbe34807938e6e494e535a68d5ff64edac3f20 upstream.
If GC has entered CGPG, ringing doorbell > first page doesn't wakeup GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround this issue.
Signed-off-by: Yifan Zhang yifan1.zhang@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -3593,8 +3593,12 @@ static int gfx_v9_0_kiq_init_register(st if (ring->use_doorbell) { WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_LOWER, (adev->doorbell_index.kiq * 2) << 2); + /* If GC has entered CGPG, ringing doorbell > first page doesn't + * wakeup GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround + * this issue. + */ WREG32_SOC15(GC, 0, mmCP_MEC_DOORBELL_RANGE_UPPER, - (adev->doorbell_index.userqueue_end * 2) << 2); + (adev->doorbell.size - 4)); }
WREG32_SOC15_RLC(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL,
From: Esben Haabendal esben@geanix.com
commit 6aa32217a9a446275440ee8724b1ecaf1838df47 upstream.
With the skb pointer piggy-backed on the TX BD, we have a simple and efficient way to free the skb buffer when the frame has been transmitted. But in order to avoid freeing the skb while there are still fragments from the skb in use, we need to piggy-back on the TX BD of the skb, not the first.
Without this, we are doing use-after-free on the DMA side, when the first BD of a multi TX BD packet is seen as completed in xmit_done, and the remaining BDs are still being processed.
Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Esben Haabendal esben@geanix.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/xilinx/ll_temac_main.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/xilinx/ll_temac_main.c +++ b/drivers/net/ethernet/xilinx/ll_temac_main.c @@ -873,7 +873,6 @@ temac_start_xmit(struct sk_buff *skb, st return NETDEV_TX_OK; } cur_p->phys = cpu_to_be32(skb_dma_addr); - ptr_to_txbd((void *)skb, cur_p);
for (ii = 0; ii < num_frag; ii++) { if (++lp->tx_bd_tail >= TX_BD_NUM) @@ -912,6 +911,11 @@ temac_start_xmit(struct sk_buff *skb, st } cur_p->app0 |= cpu_to_be32(STS_CTRL_APP0_EOP);
+ /* Mark last fragment with skb address, so it can be consumed + * in temac_start_xmit_done() + */ + ptr_to_txbd((void *)skb, cur_p); + tail_p = lp->tx_bd_p + sizeof(*lp->tx_bd_v) * lp->tx_bd_tail; lp->tx_bd_tail++; if (lp->tx_bd_tail >= TX_BD_NUM)
From: Esben Haabendal esben@geanix.com
commit c364df2489b8ef2f5e3159b1dff1ff1fdb16040d upstream.
Just as the initial check, we need to ensure num_frag+1 buffers available, as that is the number of buffers we are going to use.
This fixes a buffer overflow, which might be seen during heavy network load. Complete lockup of TEMAC was reproducible within about 10 minutes of a particular load.
Fixes: 84823ff80f74 ("net: ll_temac: Fix race condition causing TX hang") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Esben Haabendal esben@geanix.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/xilinx/ll_temac_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/xilinx/ll_temac_main.c +++ b/drivers/net/ethernet/xilinx/ll_temac_main.c @@ -846,7 +846,7 @@ temac_start_xmit(struct sk_buff *skb, st smp_mb();
/* Space might have just been freed - check again */ - if (temac_check_tx_bd_space(lp, num_frag)) + if (temac_check_tx_bd_space(lp, num_frag + 1)) return NETDEV_TX_BUSY;
netif_wake_queue(ndev);
From: Nikolay Aleksandrov nikolay@nvidia.com
commit 58e2071742e38f29f051b709a5cca014ba51166f upstream.
This patch fixes a tunnel_dst null pointer dereference due to lockless access in the tunnel egress path. When deleting a vlan tunnel the tunnel_dst pointer is set to NULL without waiting a grace period (i.e. while it's still usable) and packets egressing are dereferencing it without checking. Use READ/WRITE_ONCE to annotate the lockless use of tunnel_id, use RCU for accessing tunnel_dst and make sure it is read only once and checked in the egress path. The dst is already properly RCU protected so we don't need to do anything fancy than to make sure tunnel_id and tunnel_dst are read only once and checked in the egress path.
Cc: stable@vger.kernel.org Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths") Signed-off-by: Nikolay Aleksandrov nikolay@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bridge/br_private.h | 4 ++-- net/bridge/br_vlan_tunnel.c | 38 ++++++++++++++++++++++++-------------- 2 files changed, 26 insertions(+), 16 deletions(-)
--- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -96,8 +96,8 @@ struct br_vlan_stats { };
struct br_tunnel_info { - __be64 tunnel_id; - struct metadata_dst *tunnel_dst; + __be64 tunnel_id; + struct metadata_dst __rcu *tunnel_dst; };
/* private vlan flags */ --- a/net/bridge/br_vlan_tunnel.c +++ b/net/bridge/br_vlan_tunnel.c @@ -41,26 +41,33 @@ static struct net_bridge_vlan *br_vlan_t br_vlan_tunnel_rht_params); }
+static void vlan_tunnel_info_release(struct net_bridge_vlan *vlan) +{ + struct metadata_dst *tdst = rtnl_dereference(vlan->tinfo.tunnel_dst); + + WRITE_ONCE(vlan->tinfo.tunnel_id, 0); + RCU_INIT_POINTER(vlan->tinfo.tunnel_dst, NULL); + dst_release(&tdst->dst); +} + void vlan_tunnel_info_del(struct net_bridge_vlan_group *vg, struct net_bridge_vlan *vlan) { - if (!vlan->tinfo.tunnel_dst) + if (!rcu_access_pointer(vlan->tinfo.tunnel_dst)) return; rhashtable_remove_fast(&vg->tunnel_hash, &vlan->tnode, br_vlan_tunnel_rht_params); - vlan->tinfo.tunnel_id = 0; - dst_release(&vlan->tinfo.tunnel_dst->dst); - vlan->tinfo.tunnel_dst = NULL; + vlan_tunnel_info_release(vlan); }
static int __vlan_tunnel_info_add(struct net_bridge_vlan_group *vg, struct net_bridge_vlan *vlan, u32 tun_id) { - struct metadata_dst *metadata = NULL; + struct metadata_dst *metadata = rtnl_dereference(vlan->tinfo.tunnel_dst); __be64 key = key32_to_tunnel_id(cpu_to_be32(tun_id)); int err;
- if (vlan->tinfo.tunnel_dst) + if (metadata) return -EEXIST;
metadata = __ip_tun_set_dst(0, 0, 0, 0, 0, TUNNEL_KEY, @@ -69,8 +76,8 @@ static int __vlan_tunnel_info_add(struct return -EINVAL;
metadata->u.tun_info.mode |= IP_TUNNEL_INFO_TX | IP_TUNNEL_INFO_BRIDGE; - vlan->tinfo.tunnel_dst = metadata; - vlan->tinfo.tunnel_id = key; + rcu_assign_pointer(vlan->tinfo.tunnel_dst, metadata); + WRITE_ONCE(vlan->tinfo.tunnel_id, key);
err = rhashtable_lookup_insert_fast(&vg->tunnel_hash, &vlan->tnode, br_vlan_tunnel_rht_params); @@ -79,9 +86,7 @@ static int __vlan_tunnel_info_add(struct
return 0; out: - dst_release(&vlan->tinfo.tunnel_dst->dst); - vlan->tinfo.tunnel_dst = NULL; - vlan->tinfo.tunnel_id = 0; + vlan_tunnel_info_release(vlan);
return err; } @@ -181,12 +186,15 @@ int br_handle_ingress_vlan_tunnel(struct int br_handle_egress_vlan_tunnel(struct sk_buff *skb, struct net_bridge_vlan *vlan) { + struct metadata_dst *tunnel_dst; + __be64 tunnel_id; int err;
- if (!vlan || !vlan->tinfo.tunnel_id) + if (!vlan) return 0;
- if (unlikely(!skb_vlan_tag_present(skb))) + tunnel_id = READ_ONCE(vlan->tinfo.tunnel_id); + if (!tunnel_id || unlikely(!skb_vlan_tag_present(skb))) return 0;
skb_dst_drop(skb); @@ -194,7 +202,9 @@ int br_handle_egress_vlan_tunnel(struct if (err) return err;
- skb_dst_set(skb, dst_clone(&vlan->tinfo.tunnel_dst->dst)); + tunnel_dst = rcu_dereference(vlan->tinfo.tunnel_dst); + if (tunnel_dst) + skb_dst_set(skb, dst_clone(&tunnel_dst->dst));
return 0; }
From: Nikolay Aleksandrov nikolay@nvidia.com
commit cfc579f9d89af4ada58c69b03bcaa4887840f3b3 upstream.
The egress tunnel code uses dst_clone() and directly sets the result which is wrong because the entry might have 0 refcnt or be already deleted, causing number of problems. It also triggers the WARN_ON() in dst_hold()[1] when a refcnt couldn't be taken. Fix it by using dst_hold_safe() and checking if a reference was actually taken before setting the dst.
[1] dmesg WARN_ON log and following refcnt errors WARNING: CPU: 5 PID: 38 at include/net/dst.h:230 br_handle_egress_vlan_tunnel+0x10b/0x134 [bridge] Modules linked in: 8021q garp mrp bridge stp llc bonding ipv6 virtio_net CPU: 5 PID: 38 Comm: ksoftirqd/5 Kdump: loaded Tainted: G W 5.13.0-rc3+ #360 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 RIP: 0010:br_handle_egress_vlan_tunnel+0x10b/0x134 [bridge] Code: e8 85 bc 01 e1 45 84 f6 74 90 45 31 f6 85 db 48 c7 c7 a0 02 19 a0 41 0f 94 c6 31 c9 31 d2 44 89 f6 e8 64 bc 01 e1 85 db 75 02 <0f> 0b 31 c9 31 d2 44 89 f6 48 c7 c7 70 02 19 a0 e8 4b bc 01 e1 49 RSP: 0018:ffff8881003d39e8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa01902a0 RBP: ffff8881040c6700 R08: 0000000000000000 R09: 0000000000000001 R10: 2ce93d0054fe0d00 R11: 54fe0d00000e0000 R12: ffff888109515000 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000401 FS: 0000000000000000(0000) GS:ffff88822bf40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f42ba70f030 CR3: 0000000109926000 CR4: 00000000000006e0 Call Trace: br_handle_vlan+0xbc/0xca [bridge] __br_forward+0x23/0x164 [bridge] deliver_clone+0x41/0x48 [bridge] br_handle_frame_finish+0x36f/0x3aa [bridge] ? skb_dst+0x2e/0x38 [bridge] ? br_handle_ingress_vlan_tunnel+0x3e/0x1c8 [bridge] ? br_handle_frame_finish+0x3aa/0x3aa [bridge] br_handle_frame+0x2c3/0x377 [bridge] ? __skb_pull+0x33/0x51 ? vlan_do_receive+0x4f/0x36a ? br_handle_frame_finish+0x3aa/0x3aa [bridge] __netif_receive_skb_core+0x539/0x7c6 ? __list_del_entry_valid+0x16e/0x1c2 __netif_receive_skb_list_core+0x6d/0xd6 netif_receive_skb_list_internal+0x1d9/0x1fa gro_normal_list+0x22/0x3e dev_gro_receive+0x55b/0x600 ? detach_buf_split+0x58/0x140 napi_gro_receive+0x94/0x12e virtnet_poll+0x15d/0x315 [virtio_net] __napi_poll+0x2c/0x1c9 net_rx_action+0xe6/0x1fb __do_softirq+0x115/0x2d8 run_ksoftirqd+0x18/0x20 smpboot_thread_fn+0x183/0x19c ? smpboot_unregister_percpu_thread+0x66/0x66 kthread+0x10a/0x10f ? kthread_mod_delayed_work+0xb6/0xb6 ret_from_fork+0x22/0x30 ---[ end trace 49f61b07f775fd2b ]--- dst_release: dst:00000000c02d677a refcnt:-1 dst_release underflow
Cc: stable@vger.kernel.org Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths") Signed-off-by: Nikolay Aleksandrov nikolay@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bridge/br_vlan_tunnel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/bridge/br_vlan_tunnel.c +++ b/net/bridge/br_vlan_tunnel.c @@ -203,8 +203,8 @@ int br_handle_egress_vlan_tunnel(struct return err;
tunnel_dst = rcu_dereference(vlan->tinfo.tunnel_dst); - if (tunnel_dst) - skb_dst_set(skb, dst_clone(&tunnel_dst->dst)); + if (tunnel_dst && dst_hold_safe(&tunnel_dst->dst)) + skb_dst_set(skb, &tunnel_dst->dst);
return 0; }
From: Kees Cook keescook@chromium.org
commit 8669dbab2ae56085c128894b181c2aa50f97e368 upstream.
Patch series "Actually fix freelist pointer vs redzoning", v4.
This fixes redzoning vs the freelist pointer (both for middle-position and very small caches). Both are "theoretical" fixes, in that I see no evidence of such small-sized caches actually be used in the kernel, but that's no reason to let the bugs continue to exist, especially since people doing local development keep tripping over it. :)
This patch (of 3):
Instead of repeating "Redzone" and "Poison", clarify which sides of those zones got tripped. Additionally fix column alignment in the trailer.
Before:
BUG test (Tainted: G B ): Redzone overwritten ... Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........ Object (____ptrval____): f6 f4 a5 40 1d e8 ...@.. Redzone (____ptrval____): 1a aa .. Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........
After:
BUG test (Tainted: G B ): Right Redzone overwritten ... Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........ Object (____ptrval____): f6 f4 a5 40 1d e8 ...@.. Redzone (____ptrval____): 1a aa .. Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........
The earlier commits that slowly resulted in the "Before" reporting were:
d86bd1bece6f ("mm/slub: support left redzone") ffc79d288000 ("slub: use print_hex_dump") 2492268472e7 ("SLUB: change error reporting format to follow lockdep loosely")
Link: https://lkml.kernel.org/r/20210608183955.280836-1-keescook@chromium.org Link: https://lkml.kernel.org/r/20210608183955.280836-2-keescook@chromium.org Link: https://lore.kernel.org/lkml/cfdb11d7-fb8e-e578-c939-f7f5fb69a6bd@suse.cz/ Signed-off-by: Kees Cook keescook@chromium.org Acked-by: Vlastimil Babka vbabka@suse.cz Cc: Marco Elver elver@google.com Cc: "Lin, Zhenpeng" zplin@psu.edu Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Roman Gushchin guro@fb.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/vm/slub.rst | 10 +++++----- mm/slub.c | 14 +++++++------- 2 files changed, 12 insertions(+), 12 deletions(-)
--- a/Documentation/vm/slub.rst +++ b/Documentation/vm/slub.rst @@ -160,7 +160,7 @@ SLUB Debug output Here is a sample of slub debug output::
==================================================================== - BUG kmalloc-8: Redzone overwritten + BUG kmalloc-8: Right Redzone overwritten --------------------------------------------------------------------
INFO: 0xc90f6d28-0xc90f6d2b. First byte 0x00 instead of 0xcc @@ -168,10 +168,10 @@ Here is a sample of slub debug output:: INFO: Object 0xc90f6d20 @offset=3360 fp=0xc90f6d58 INFO: Allocated in get_modalias+0x61/0xf5 age=53 cpu=1 pid=554
- Bytes b4 0xc90f6d10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ - Object 0xc90f6d20: 31 30 31 39 2e 30 30 35 1019.005 - Redzone 0xc90f6d28: 00 cc cc cc . - Padding 0xc90f6d50: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ + Bytes b4 (0xc90f6d10): 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ + Object (0xc90f6d20): 31 30 31 39 2e 30 30 35 1019.005 + Redzone (0xc90f6d28): 00 cc cc cc . + Padding (0xc90f6d50): 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
[<c010523d>] dump_trace+0x63/0x1eb [<c01053df>] show_trace_log_lvl+0x1a/0x2f --- a/mm/slub.c +++ b/mm/slub.c @@ -688,15 +688,15 @@ static void print_trailer(struct kmem_ca p, p - addr, get_freepointer(s, p));
if (s->flags & SLAB_RED_ZONE) - print_section(KERN_ERR, "Redzone ", p - s->red_left_pad, + print_section(KERN_ERR, "Redzone ", p - s->red_left_pad, s->red_left_pad); else if (p > addr + 16) print_section(KERN_ERR, "Bytes b4 ", p - 16, 16);
- print_section(KERN_ERR, "Object ", p, + print_section(KERN_ERR, "Object ", p, min_t(unsigned int, s->object_size, PAGE_SIZE)); if (s->flags & SLAB_RED_ZONE) - print_section(KERN_ERR, "Redzone ", p + s->object_size, + print_section(KERN_ERR, "Redzone ", p + s->object_size, s->inuse - s->object_size);
off = get_info_end(s); @@ -708,7 +708,7 @@ static void print_trailer(struct kmem_ca
if (off != size_from_object(s)) /* Beginning of the filler is the free pointer */ - print_section(KERN_ERR, "Padding ", p + off, + print_section(KERN_ERR, "Padding ", p + off, size_from_object(s) - off);
dump_stack(); @@ -882,11 +882,11 @@ static int check_object(struct kmem_cach u8 *endobject = object + s->object_size;
if (s->flags & SLAB_RED_ZONE) { - if (!check_bytes_and_report(s, page, object, "Redzone", + if (!check_bytes_and_report(s, page, object, "Left Redzone", object - s->red_left_pad, val, s->red_left_pad)) return 0;
- if (!check_bytes_and_report(s, page, object, "Redzone", + if (!check_bytes_and_report(s, page, object, "Right Redzone", endobject, val, s->inuse - s->object_size)) return 0; } else { @@ -901,7 +901,7 @@ static int check_object(struct kmem_cach if (val != SLUB_RED_ACTIVE && (s->flags & __OBJECT_POISON) && (!check_bytes_and_report(s, page, p, "Poison", p, POISON_FREE, s->object_size - 1) || - !check_bytes_and_report(s, page, p, "Poison", + !check_bytes_and_report(s, page, p, "End Poison", p + s->object_size - 1, POISON_END, 1))) return 0; /*
From: Kees Cook keescook@chromium.org
commit 74c1d3e081533825f2611e46edea1fcdc0701985 upstream.
The redzone area for SLUB exists between s->object_size and s->inuse (which is at least the word-aligned object_size). If a cache were created with an object_size smaller than sizeof(void *), the in-object stored freelist pointer would overwrite the redzone (e.g. with boot param "slub_debug=ZF"):
BUG test (Tainted: G B ): Right Redzone overwritten -----------------------------------------------------------------------------
INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200 INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620
Redzone (____ptrval____): bb bb bb bb bb bb bb bb ........ Object (____ptrval____): f6 f4 a5 40 1d e8 ...@.. Redzone (____ptrval____): 1a aa .. Padding (____ptrval____): 00 00 00 00 00 00 00 00 ........
Store the freelist pointer out of line when object_size is smaller than sizeof(void *) and redzoning is enabled.
Additionally remove the "smaller than sizeof(void *)" check under CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant: SLAB and SLOB both handle small sizes.
(Note that no caches within this size range are known to exist in the kernel currently.)
Link: https://lkml.kernel.org/r/20210608183955.280836-3-keescook@chromium.org Fixes: 81819f0fc828 ("SLUB core") Signed-off-by: Kees Cook keescook@chromium.org Acked-by: Vlastimil Babka vbabka@suse.cz Cc: Christoph Lameter cl@linux.com Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: "Lin, Zhenpeng" zplin@psu.edu Cc: Marco Elver elver@google.com Cc: Pekka Enberg penberg@kernel.org Cc: Roman Gushchin guro@fb.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/slab_common.c | 3 +-- mm/slub.c | 8 +++++--- 2 files changed, 6 insertions(+), 5 deletions(-)
--- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -85,8 +85,7 @@ EXPORT_SYMBOL(kmem_cache_size); #ifdef CONFIG_DEBUG_VM static int kmem_cache_sanity_check(const char *name, unsigned int size) { - if (!name || in_interrupt() || size < sizeof(void *) || - size > KMALLOC_MAX_SIZE) { + if (!name || in_interrupt() || size > KMALLOC_MAX_SIZE) { pr_err("kmem_cache_create(%s) integrity check failed\n", name); return -EINVAL; } --- a/mm/slub.c +++ b/mm/slub.c @@ -3586,15 +3586,17 @@ static int calculate_sizes(struct kmem_c */ s->inuse = size;
- if (((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || - s->ctor)) { + if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || + ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *)) || + s->ctor) { /* * Relocate free pointer after the object if it is not * permitted to overwrite the first word of the object on * kmem_cache_free. * * This is the case if we do RCU, have a constructor or - * destructor or are poisoning the objects. + * destructor, are poisoning the objects, or are + * redzoning an object smaller than sizeof(void *). * * The assumption that s->offset >= s->inuse means free * pointer is outside of the object is used in the
From: Andrew Morton akpm@linux-foundation.org
commit 1b3865d016815cbd69a1879ca1c8a8901fda1072 upstream.
Fixes build with CONFIG_SLAB_FREELIST_HARDENED=y.
Hopefully. But it's the right thing to do anwyay.
Fixes: 1ad53d9fa3f61 ("slub: improve bit diffusion for freelist ptr obfuscation") Link: https://bugzilla.kernel.org/show_bug.cgi?id=213417 Reported-by: vannguye@cisco.com Acked-by: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/slub.c | 1 + 1 file changed, 1 insertion(+)
--- a/mm/slub.c +++ b/mm/slub.c @@ -15,6 +15,7 @@ #include <linux/module.h> #include <linux/bit_spinlock.h> #include <linux/interrupt.h> +#include <linux/swab.h> #include <linux/bitops.h> #include <linux/slab.h> #include "slab.h"
From: Joakim Zhang qiangqing.zhang@nxp.com
commit 8f269102baf788aecfcbbc6313b6bceb54c9b990 upstream.
Platform drivers may call stmmac_probe_config_dt() to parse dt, could call stmmac_remove_config_dt() in error handing after dt parsed, so need disable clocks in stmmac_remove_config_dt().
Go through all platforms drivers which use stmmac_probe_config_dt(), none of them disable clocks manually, so it's safe to disable them in stmmac_remove_config_dt().
Fixes: commit d2ed0a7755fe ("net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks") Signed-off-by: Joakim Zhang qiangqing.zhang@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -624,6 +624,8 @@ error_pclk_get: void stmmac_remove_config_dt(struct platform_device *pdev, struct plat_stmmacenet_data *plat) { + clk_disable_unprepare(plat->stmmac_clk); + clk_disable_unprepare(plat->pclk); of_node_put(plat->phy_node); of_node_put(plat->mdio_node); }
From: Fugang Duan fugang.duan@nxp.com
commit cb3cefe3f3f8af27c6076ef7d1f00350f502055d upstream.
Add clock rate zero check to fix coverity issue of "divide by 0".
Fixes: commit 85bd1798b24a ("net: fec: fix spin_lock dead lock") Signed-off-by: Fugang Duan fugang.duan@nxp.com Signed-off-by: Joakim Zhang qiangqing.zhang@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/freescale/fec_ptp.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/net/ethernet/freescale/fec_ptp.c +++ b/drivers/net/ethernet/freescale/fec_ptp.c @@ -597,6 +597,10 @@ void fec_ptp_init(struct platform_device fep->ptp_caps.enable = fec_ptp_enable;
fep->cycle_speed = clk_get_rate(fep->clk_ptp); + if (!fep->cycle_speed) { + fep->cycle_speed = NSEC_PER_SEC; + dev_err(&fep->pdev->dev, "clk_ptp clock rate is zero\n"); + } fep->ptp_inc = NSEC_PER_SEC / fep->cycle_speed;
spin_lock_init(&fep->tmreg_lock);
From: Arnaldo Carvalho de Melo acme@redhat.com
commit 1792a59eab9593de2eae36c40c5a22d70f52c026 upstream.
To pick the changes in:
321827477360934d ("icmp: don't send out ICMP messages with a source address of 0.0.0.0")
That don't result in any change in tooling, as INADDR_ are not used to generate id->string tables used by 'perf trace'.
This addresses this build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h' diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h
Cc: David S. Miller davem@davemloft.net Cc: Toke Høiland-Jørgensen toke@redhat.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/include/uapi/linux/in.h | 3 +++ 1 file changed, 3 insertions(+)
--- a/tools/include/uapi/linux/in.h +++ b/tools/include/uapi/linux/in.h @@ -284,6 +284,9 @@ struct sockaddr_in { /* Address indicating an error return. */ #define INADDR_NONE ((unsigned long int) 0xffffffff)
+/* Dummy address for src of ICMP replies if no real address is set (RFC7600). */ +#define INADDR_DUMMY ((unsigned long int) 0xc0000008) + /* Network number for local host loopback. */ #define IN_LOOPBACKNET 127
From: Eric Auger eric.auger@redhat.com
commit 94ac0835391efc1a30feda6fc908913ec012951e upstream.
When reading the base address of the a REDIST region through KVM_VGIC_V3_ADDR_TYPE_REDIST we expect the redistributor region list to be populated with a single element.
However list_first_entry() expects the list to be non empty. Instead we should use list_first_entry_or_null which effectively returns NULL if the list is empty.
Fixes: dbd9733ab674 ("KVM: arm/arm64: Replace the single rdist region by a list") Cc: Stable@vger.kernel.org # v4.18+ Signed-off-by: Eric Auger eric.auger@redhat.com Reported-by: Gavin Shan gshan@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20210412150034.29185-1-eric.auger@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- virt/kvm/arm/vgic/vgic-kvm-device.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/virt/kvm/arm/vgic/vgic-kvm-device.c +++ b/virt/kvm/arm/vgic/vgic-kvm-device.c @@ -87,8 +87,8 @@ int kvm_vgic_addr(struct kvm *kvm, unsig r = vgic_v3_set_redist_base(kvm, 0, *addr, 0); goto out; } - rdreg = list_first_entry(&vgic->rd_regions, - struct vgic_redist_region, list); + rdreg = list_first_entry_or_null(&vgic->rd_regions, + struct vgic_redist_region, list); if (!rdreg) addr_ptr = &undef_value; else
From: afzal mohammed afzal.mohd.ma@gmail.com
commit b75ca5217743e4d7076cf65e044e88389e44318d upstream.
request_irq() is preferred over setup_irq(). Invocations of setup_irq() occur after memory allocators are ready.
Per tglx[1], setup_irq() existed in olden days when allocators were not ready by the time early interrupts were initialized.
Hence replace setup_irq() by request_irq().
[1] https://lkml.kernel.org/r/alpine.DEB.2.20.1710191609480.1971@nanos
Cc: Daniel Lezcano daniel.lezcano@linaro.org Cc: Keerthy j-keerthy@ti.com Cc: Tero Kristo kristo@kernel.org Signed-off-by: afzal mohammed afzal.mohd.ma@gmail.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/mach-omap1/pm.c | 13 ++++++------- arch/arm/mach-omap1/time.c | 10 +++------- arch/arm/mach-omap1/timer32k.c | 10 +++------- arch/arm/mach-omap2/timer.c | 11 +++-------- 4 files changed, 15 insertions(+), 29 deletions(-)
--- a/arch/arm/mach-omap1/pm.c +++ b/arch/arm/mach-omap1/pm.c @@ -596,11 +596,6 @@ static irqreturn_t omap_wakeup_interrupt return IRQ_HANDLED; }
-static struct irqaction omap_wakeup_irq = { - .name = "peripheral wakeup", - .handler = omap_wakeup_interrupt -}; -
static const struct platform_suspend_ops omap_pm_ops = { @@ -613,6 +608,7 @@ static const struct platform_suspend_ops static int __init omap_pm_init(void) { int error = 0; + int irq;
if (!cpu_class_is_omap1()) return -ENODEV; @@ -656,9 +652,12 @@ static int __init omap_pm_init(void) arm_pm_idle = omap1_pm_idle;
if (cpu_is_omap7xx()) - setup_irq(INT_7XX_WAKE_UP_REQ, &omap_wakeup_irq); + irq = INT_7XX_WAKE_UP_REQ; else if (cpu_is_omap16xx()) - setup_irq(INT_1610_WAKE_UP_REQ, &omap_wakeup_irq); + irq = INT_1610_WAKE_UP_REQ; + if (request_irq(irq, omap_wakeup_interrupt, 0, "peripheral wakeup", + NULL)) + pr_err("Failed to request irq %d (peripheral wakeup)\n", irq);
/* Program new power ramp-up time * (0 for most boards since we don't lower voltage when in deep sleep) --- a/arch/arm/mach-omap1/time.c +++ b/arch/arm/mach-omap1/time.c @@ -155,15 +155,11 @@ static irqreturn_t omap_mpu_timer1_inter return IRQ_HANDLED; }
-static struct irqaction omap_mpu_timer1_irq = { - .name = "mpu_timer1", - .flags = IRQF_TIMER | IRQF_IRQPOLL, - .handler = omap_mpu_timer1_interrupt, -}; - static __init void omap_init_mpu_timer(unsigned long rate) { - setup_irq(INT_TIMER1, &omap_mpu_timer1_irq); + if (request_irq(INT_TIMER1, omap_mpu_timer1_interrupt, + IRQF_TIMER | IRQF_IRQPOLL, "mpu_timer1", NULL)) + pr_err("Failed to request irq %d (mpu_timer1)\n", INT_TIMER1); omap_mpu_timer_start(0, (rate / HZ) - 1, 1);
clockevent_mpu_timer1.cpumask = cpumask_of(0); --- a/arch/arm/mach-omap1/timer32k.c +++ b/arch/arm/mach-omap1/timer32k.c @@ -148,15 +148,11 @@ static irqreturn_t omap_32k_timer_interr return IRQ_HANDLED; }
-static struct irqaction omap_32k_timer_irq = { - .name = "32KHz timer", - .flags = IRQF_TIMER | IRQF_IRQPOLL, - .handler = omap_32k_timer_interrupt, -}; - static __init void omap_init_32k_timer(void) { - setup_irq(INT_OS_TIMER, &omap_32k_timer_irq); + if (request_irq(INT_OS_TIMER, omap_32k_timer_interrupt, + IRQF_TIMER | IRQF_IRQPOLL, "32KHz timer", NULL)) + pr_err("Failed to request irq %d(32KHz timer)\n", INT_OS_TIMER);
clockevent_32k_timer.cpumask = cpumask_of(0); clockevents_config_and_register(&clockevent_32k_timer, --- a/arch/arm/mach-omap2/timer.c +++ b/arch/arm/mach-omap2/timer.c @@ -91,12 +91,6 @@ static irqreturn_t omap2_gp_timer_interr return IRQ_HANDLED; }
-static struct irqaction omap2_gp_timer_irq = { - .name = "gp_timer", - .flags = IRQF_TIMER | IRQF_IRQPOLL, - .handler = omap2_gp_timer_interrupt, -}; - static int omap2_gp_timer_set_next_event(unsigned long cycles, struct clock_event_device *evt) { @@ -382,8 +376,9 @@ static void __init omap2_gp_clockevent_i &clockevent_gpt.name, OMAP_TIMER_POSTED); BUG_ON(res);
- omap2_gp_timer_irq.dev_id = &clkev; - setup_irq(clkev.irq, &omap2_gp_timer_irq); + if (request_irq(clkev.irq, omap2_gp_timer_interrupt, + IRQF_TIMER | IRQF_IRQPOLL, "gp_timer", &clkev)) + pr_err("Failed to request irq %d (gp_timer)\n", clkev.irq);
__omap_dm_timer_int_enable(&clkev, OMAP_TIMER_INT_OVERFLOW);
From: Tony Lindgren tony@atomide.com
commit 52762fbd1c4778ac9b173624ca0faacd22ef4724 upstream.
We can move the TI dmtimer clockevent and clocksource to live under drivers/clocksource if we rely only on the clock framework, and handle the module configuration directly in the clocksource driver based on the device tree data.
This removes the early dependency with system timers to the interconnect related code, and we can probe pretty much everything else later on at the module_init level.
Let's first add a new driver for timer-ti-dm-systimer based on existing arch/arm/mach-omap2/timer.c. Then let's start moving SoCs to probe with device tree data while still keeping the old timer.c. And eventually we can just drop the old timer.c.
Let's take the opportunity to switch to use readl/writel as pointed out by Daniel Lezcano daniel.lezcano@linaro.org. This allows further clean-up of the timer-ti-dm code the a lot of the shared helpers can just become static to the non-syster related code.
Note the boards can optionally configure different timer source clocks if needed with assigned-clocks and assigned-clock-parents.
Cc: Daniel Lezcano daniel.lezcano@linaro.org Cc: Keerthy j-keerthy@ti.com Cc: Tero Kristo kristo@kernel.org [tony@atomide.com: backported to 5.4.y] Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/mach-omap2/timer.c | 109 ++++++++++++++++++++++++++------------------ 1 file changed, 65 insertions(+), 44 deletions(-)
--- a/arch/arm/mach-omap2/timer.c +++ b/arch/arm/mach-omap2/timer.c @@ -63,15 +63,28 @@
/* Clockevent code */
-static struct omap_dm_timer clkev; -static struct clock_event_device clockevent_gpt; - /* Clockevent hwmod for am335x and am437x suspend */ static struct omap_hwmod *clockevent_gpt_hwmod;
/* Clockesource hwmod for am437x suspend */ static struct omap_hwmod *clocksource_gpt_hwmod;
+struct dmtimer_clockevent { + struct clock_event_device dev; + struct omap_dm_timer timer; +}; + +static struct dmtimer_clockevent clockevent; + +static struct omap_dm_timer *to_dmtimer(struct clock_event_device *clockevent) +{ + struct dmtimer_clockevent *clkevt = + container_of(clockevent, struct dmtimer_clockevent, dev); + struct omap_dm_timer *timer = &clkevt->timer; + + return timer; +} + #ifdef CONFIG_SOC_HAS_REALTIME_COUNTER static unsigned long arch_timer_freq;
@@ -83,10 +96,11 @@ void set_cntfreq(void)
static irqreturn_t omap2_gp_timer_interrupt(int irq, void *dev_id) { - struct clock_event_device *evt = &clockevent_gpt; - - __omap_dm_timer_write_status(&clkev, OMAP_TIMER_INT_OVERFLOW); + struct dmtimer_clockevent *clkevt = dev_id; + struct clock_event_device *evt = &clkevt->dev; + struct omap_dm_timer *timer = &clkevt->timer;
+ __omap_dm_timer_write_status(timer, OMAP_TIMER_INT_OVERFLOW); evt->event_handler(evt); return IRQ_HANDLED; } @@ -94,7 +108,9 @@ static irqreturn_t omap2_gp_timer_interr static int omap2_gp_timer_set_next_event(unsigned long cycles, struct clock_event_device *evt) { - __omap_dm_timer_load_start(&clkev, OMAP_TIMER_CTRL_ST, + struct omap_dm_timer *timer = to_dmtimer(evt); + + __omap_dm_timer_load_start(timer, OMAP_TIMER_CTRL_ST, 0xffffffff - cycles, OMAP_TIMER_POSTED);
return 0; @@ -102,22 +118,26 @@ static int omap2_gp_timer_set_next_event
static int omap2_gp_timer_shutdown(struct clock_event_device *evt) { - __omap_dm_timer_stop(&clkev, OMAP_TIMER_POSTED, clkev.rate); + struct omap_dm_timer *timer = to_dmtimer(evt); + + __omap_dm_timer_stop(timer, OMAP_TIMER_POSTED, timer->rate); + return 0; }
static int omap2_gp_timer_set_periodic(struct clock_event_device *evt) { + struct omap_dm_timer *timer = to_dmtimer(evt); u32 period;
- __omap_dm_timer_stop(&clkev, OMAP_TIMER_POSTED, clkev.rate); + __omap_dm_timer_stop(timer, OMAP_TIMER_POSTED, timer->rate);
- period = clkev.rate / HZ; + period = timer->rate / HZ; period -= 1; /* Looks like we need to first set the load value separately */ - __omap_dm_timer_write(&clkev, OMAP_TIMER_LOAD_REG, 0xffffffff - period, + __omap_dm_timer_write(timer, OMAP_TIMER_LOAD_REG, 0xffffffff - period, OMAP_TIMER_POSTED); - __omap_dm_timer_load_start(&clkev, + __omap_dm_timer_load_start(timer, OMAP_TIMER_CTRL_AR | OMAP_TIMER_CTRL_ST, 0xffffffff - period, OMAP_TIMER_POSTED); return 0; @@ -131,26 +151,17 @@ static void omap_clkevt_idle(struct cloc omap_hwmod_idle(clockevent_gpt_hwmod); }
-static void omap_clkevt_unidle(struct clock_event_device *unused) +static void omap_clkevt_unidle(struct clock_event_device *evt) { + struct omap_dm_timer *timer = to_dmtimer(evt); + if (!clockevent_gpt_hwmod) return;
omap_hwmod_enable(clockevent_gpt_hwmod); - __omap_dm_timer_int_enable(&clkev, OMAP_TIMER_INT_OVERFLOW); + __omap_dm_timer_int_enable(timer, OMAP_TIMER_INT_OVERFLOW); }
-static struct clock_event_device clockevent_gpt = { - .features = CLOCK_EVT_FEAT_PERIODIC | - CLOCK_EVT_FEAT_ONESHOT, - .rating = 300, - .set_next_event = omap2_gp_timer_set_next_event, - .set_state_shutdown = omap2_gp_timer_shutdown, - .set_state_periodic = omap2_gp_timer_set_periodic, - .set_state_oneshot = omap2_gp_timer_shutdown, - .tick_resume = omap2_gp_timer_shutdown, -}; - static const struct of_device_id omap_timer_match[] __initconst = { { .compatible = "ti,omap2420-timer", }, { .compatible = "ti,omap3430-timer", }, @@ -360,44 +371,54 @@ static void __init omap2_gp_clockevent_i const char *fck_source, const char *property) { + struct dmtimer_clockevent *clkevt = &clockevent; + struct omap_dm_timer *timer = &clkevt->timer; int res;
- clkev.id = gptimer_id; - clkev.errata = omap_dm_timer_get_errata(); + timer->id = gptimer_id; + timer->errata = omap_dm_timer_get_errata(); + clkevt->dev.features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT; + clkevt->dev.rating = 300; + clkevt->dev.set_next_event = omap2_gp_timer_set_next_event; + clkevt->dev.set_state_shutdown = omap2_gp_timer_shutdown; + clkevt->dev.set_state_periodic = omap2_gp_timer_set_periodic; + clkevt->dev.set_state_oneshot = omap2_gp_timer_shutdown; + clkevt->dev.tick_resume = omap2_gp_timer_shutdown;
/* * For clock-event timers we never read the timer counter and * so we are not impacted by errata i103 and i767. Therefore, * we can safely ignore this errata for clock-event timers. */ - __omap_dm_timer_override_errata(&clkev, OMAP_TIMER_ERRATA_I103_I767); + __omap_dm_timer_override_errata(timer, OMAP_TIMER_ERRATA_I103_I767);
- res = omap_dm_timer_init_one(&clkev, fck_source, property, - &clockevent_gpt.name, OMAP_TIMER_POSTED); + res = omap_dm_timer_init_one(timer, fck_source, property, + &clkevt->dev.name, OMAP_TIMER_POSTED); BUG_ON(res);
- if (request_irq(clkev.irq, omap2_gp_timer_interrupt, - IRQF_TIMER | IRQF_IRQPOLL, "gp_timer", &clkev)) - pr_err("Failed to request irq %d (gp_timer)\n", clkev.irq); - - __omap_dm_timer_int_enable(&clkev, OMAP_TIMER_INT_OVERFLOW); - - clockevent_gpt.cpumask = cpu_possible_mask; - clockevent_gpt.irq = omap_dm_timer_get_irq(&clkev); - clockevents_config_and_register(&clockevent_gpt, clkev.rate, + clkevt->dev.cpumask = cpu_possible_mask; + clkevt->dev.irq = omap_dm_timer_get_irq(timer); + + if (request_irq(timer->irq, omap2_gp_timer_interrupt, + IRQF_TIMER | IRQF_IRQPOLL, "gp_timer", clkevt)) + pr_err("Failed to request irq %d (gp_timer)\n", timer->irq); + + __omap_dm_timer_int_enable(timer, OMAP_TIMER_INT_OVERFLOW); + + clockevents_config_and_register(&clkevt->dev, timer->rate, 3, /* Timer internal resynch latency */ 0xffffffff);
if (soc_is_am33xx() || soc_is_am43xx()) { - clockevent_gpt.suspend = omap_clkevt_idle; - clockevent_gpt.resume = omap_clkevt_unidle; + clkevt->dev.suspend = omap_clkevt_idle; + clkevt->dev.resume = omap_clkevt_unidle;
clockevent_gpt_hwmod = - omap_hwmod_lookup(clockevent_gpt.name); + omap_hwmod_lookup(clkevt->dev.name); }
- pr_info("OMAP clockevent source: %s at %lu Hz\n", clockevent_gpt.name, - clkev.rate); + pr_info("OMAP clockevent source: %s at %lu Hz\n", clkevt->dev.name, + timer->rate); }
/* Clocksource code */
From: Tony Lindgren tony@atomide.com
commit 3efe7a878a11c13b5297057bfc1e5639ce1241ce upstream.
There is a timer wrap issue on dra7 for the ARM architected timer. In a typical clock configuration the timer fails to wrap after 388 days.
To work around the issue, we need to use timer-ti-dm timers instead.
Let's prepare for adding support for percpu timers by adding a common dmtimer_clkevt_init_common() and call it from __omap_sync32k_timer_init(). This patch makes no intentional functional changes.
Cc: Daniel Lezcano daniel.lezcano@linaro.org Cc: Keerthy j-keerthy@ti.com Cc: Tero Kristo kristo@kernel.org [tony@atomide.com: backported to 5.4.y] Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/mach-omap2/timer.c | 34 +++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 15 deletions(-)
--- a/arch/arm/mach-omap2/timer.c +++ b/arch/arm/mach-omap2/timer.c @@ -367,18 +367,21 @@ void tick_broadcast(const struct cpumask } #endif
-static void __init omap2_gp_clockevent_init(int gptimer_id, - const char *fck_source, - const char *property) +static void __init dmtimer_clkevt_init_common(struct dmtimer_clockevent *clkevt, + int gptimer_id, + const char *fck_source, + unsigned int features, + const struct cpumask *cpumask, + const char *property, + int rating, const char *name) { - struct dmtimer_clockevent *clkevt = &clockevent; struct omap_dm_timer *timer = &clkevt->timer; int res;
timer->id = gptimer_id; timer->errata = omap_dm_timer_get_errata(); - clkevt->dev.features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT; - clkevt->dev.rating = 300; + clkevt->dev.features = features; + clkevt->dev.rating = rating; clkevt->dev.set_next_event = omap2_gp_timer_set_next_event; clkevt->dev.set_state_shutdown = omap2_gp_timer_shutdown; clkevt->dev.set_state_periodic = omap2_gp_timer_set_periodic; @@ -396,19 +399,15 @@ static void __init omap2_gp_clockevent_i &clkevt->dev.name, OMAP_TIMER_POSTED); BUG_ON(res);
- clkevt->dev.cpumask = cpu_possible_mask; + clkevt->dev.cpumask = cpumask; clkevt->dev.irq = omap_dm_timer_get_irq(timer);
- if (request_irq(timer->irq, omap2_gp_timer_interrupt, - IRQF_TIMER | IRQF_IRQPOLL, "gp_timer", clkevt)) - pr_err("Failed to request irq %d (gp_timer)\n", timer->irq); + if (request_irq(clkevt->dev.irq, omap2_gp_timer_interrupt, + IRQF_TIMER | IRQF_IRQPOLL, name, clkevt)) + pr_err("Failed to request irq %d (gp_timer)\n", clkevt->dev.irq);
__omap_dm_timer_int_enable(timer, OMAP_TIMER_INT_OVERFLOW);
- clockevents_config_and_register(&clkevt->dev, timer->rate, - 3, /* Timer internal resynch latency */ - 0xffffffff); - if (soc_is_am33xx() || soc_is_am43xx()) { clkevt->dev.suspend = omap_clkevt_idle; clkevt->dev.resume = omap_clkevt_unidle; @@ -558,7 +557,12 @@ static void __init __omap_sync32k_timer_ { omap_clk_init(); omap_dmtimer_init(); - omap2_gp_clockevent_init(clkev_nr, clkev_src, clkev_prop); + dmtimer_clkevt_init_common(&clockevent, clkev_nr, clkev_src, + CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT, + cpu_possible_mask, clkev_prop, 300, "clockevent"); + clockevents_config_and_register(&clockevent.dev, clockevent.timer.rate, + 3, /* Timer internal resynch latency */ + 0xffffffff);
/* Enable the use of clocksource="gp_timer" kernel parameter */ if (use_gptimer_clksrc || gptimer)
From: Tony Lindgren tony@atomide.com
commit 25de4ce5ed02994aea8bc111d133308f6fd62566 upstream.
There is a timer wrap issue on dra7 for the ARM architected timer. In a typical clock configuration the timer fails to wrap after 388 days.
To work around the issue, we need to use timer-ti-dm percpu timers instead.
Let's configure dmtimer3 and 4 as percpu timers by default, and warn about the issue if the dtb is not configured properly.
For more information, please see the errata for "AM572x Sitara Processors Silicon Revisions 1.1, 2.0":
https://www.ti.com/lit/er/sprz429m/sprz429m.pdf
The concept is based on earlier reference patches done by Tero Kristo and Keerthy.
Cc: Daniel Lezcano daniel.lezcano@linaro.org Cc: Keerthy j-keerthy@ti.com Cc: Tero Kristo kristo@kernel.org [tony@atomide.com: backported to 5.4.y] Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/dra7-l4.dtsi | 4 +- arch/arm/boot/dts/dra7.dtsi | 20 +++++++++++++ arch/arm/mach-omap2/board-generic.c | 4 +- arch/arm/mach-omap2/timer.c | 53 +++++++++++++++++++++++++++++++++++- drivers/clk/ti/clk-7xx.c | 1 include/linux/cpuhotplug.h | 1 6 files changed, 78 insertions(+), 5 deletions(-)
--- a/arch/arm/boot/dts/dra7-l4.dtsi +++ b/arch/arm/boot/dts/dra7-l4.dtsi @@ -1176,7 +1176,7 @@ }; };
- target-module@34000 { /* 0x48034000, ap 7 46.0 */ + timer3_target: target-module@34000 { /* 0x48034000, ap 7 46.0 */ compatible = "ti,sysc-omap4-timer", "ti,sysc"; ti,hwmods = "timer3"; reg = <0x34000 0x4>, @@ -1204,7 +1204,7 @@ }; };
- target-module@36000 { /* 0x48036000, ap 9 4e.0 */ + timer4_target: target-module@36000 { /* 0x48036000, ap 9 4e.0 */ compatible = "ti,sysc-omap4-timer", "ti,sysc"; ti,hwmods = "timer4"; reg = <0x36000 0x4>, --- a/arch/arm/boot/dts/dra7.dtsi +++ b/arch/arm/boot/dts/dra7.dtsi @@ -46,6 +46,7 @@
timer { compatible = "arm,armv7-timer"; + status = "disabled"; /* See ARM architected timer wrap erratum i940 */ interrupts = <GIC_PPI 13 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_LEVEL_LOW)>, <GIC_PPI 14 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_LEVEL_LOW)>, <GIC_PPI 11 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_LEVEL_LOW)>, @@ -766,3 +767,22 @@
#include "dra7-l4.dtsi" #include "dra7xx-clocks.dtsi" + +/* Local timers, see ARM architected timer wrap erratum i940 */ +&timer3_target { + ti,no-reset-on-init; + ti,no-idle; + timer@0 { + assigned-clocks = <&l4per_clkctrl DRA7_L4PER_TIMER3_CLKCTRL 24>; + assigned-clock-parents = <&timer_sys_clk_div>; + }; +}; + +&timer4_target { + ti,no-reset-on-init; + ti,no-idle; + timer@0 { + assigned-clocks = <&l4per_clkctrl DRA7_L4PER_TIMER4_CLKCTRL 24>; + assigned-clock-parents = <&timer_sys_clk_div>; + }; +}; --- a/arch/arm/mach-omap2/board-generic.c +++ b/arch/arm/mach-omap2/board-generic.c @@ -327,7 +327,7 @@ DT_MACHINE_START(DRA74X_DT, "Generic DRA .init_late = dra7xx_init_late, .init_irq = omap_gic_of_init, .init_machine = omap_generic_init, - .init_time = omap5_realtime_timer_init, + .init_time = omap3_gptimer_timer_init, .dt_compat = dra74x_boards_compat, .restart = omap44xx_restart, MACHINE_END @@ -350,7 +350,7 @@ DT_MACHINE_START(DRA72X_DT, "Generic DRA .init_late = dra7xx_init_late, .init_irq = omap_gic_of_init, .init_machine = omap_generic_init, - .init_time = omap5_realtime_timer_init, + .init_time = omap3_gptimer_timer_init, .dt_compat = dra72x_boards_compat, .restart = omap44xx_restart, MACHINE_END --- a/arch/arm/mach-omap2/timer.c +++ b/arch/arm/mach-omap2/timer.c @@ -42,6 +42,7 @@ #include <linux/platform_device.h> #include <linux/platform_data/dmtimer-omap.h> #include <linux/sched_clock.h> +#include <linux/cpu.h>
#include <asm/mach/time.h>
@@ -420,6 +421,53 @@ static void __init dmtimer_clkevt_init_c timer->rate); }
+static DEFINE_PER_CPU(struct dmtimer_clockevent, dmtimer_percpu_timer); + +static int omap_gptimer_starting_cpu(unsigned int cpu) +{ + struct dmtimer_clockevent *clkevt = per_cpu_ptr(&dmtimer_percpu_timer, cpu); + struct clock_event_device *dev = &clkevt->dev; + struct omap_dm_timer *timer = &clkevt->timer; + + clockevents_config_and_register(dev, timer->rate, 3, ULONG_MAX); + irq_force_affinity(dev->irq, cpumask_of(cpu)); + + return 0; +} + +static int __init dmtimer_percpu_quirk_init(void) +{ + struct dmtimer_clockevent *clkevt; + struct clock_event_device *dev; + struct device_node *arm_timer; + struct omap_dm_timer *timer; + int cpu = 0; + + arm_timer = of_find_compatible_node(NULL, NULL, "arm,armv7-timer"); + if (of_device_is_available(arm_timer)) { + pr_warn_once("ARM architected timer wrap issue i940 detected\n"); + return 0; + } + + for_each_possible_cpu(cpu) { + clkevt = per_cpu_ptr(&dmtimer_percpu_timer, cpu); + dev = &clkevt->dev; + timer = &clkevt->timer; + + dmtimer_clkevt_init_common(clkevt, 0, "timer_sys_ck", + CLOCK_EVT_FEAT_ONESHOT, + cpumask_of(cpu), + "assigned-clock-parents", + 500, "percpu timer"); + } + + cpuhp_setup_state(CPUHP_AP_OMAP_DM_TIMER_STARTING, + "clockevents/omap/gptimer:starting", + omap_gptimer_starting_cpu, NULL); + + return 0; +} + /* Clocksource code */ static struct omap_dm_timer clksrc; static bool use_gptimer_clksrc __initdata; @@ -564,6 +612,9 @@ static void __init __omap_sync32k_timer_ 3, /* Timer internal resynch latency */ 0xffffffff);
+ if (soc_is_dra7xx()) + dmtimer_percpu_quirk_init(); + /* Enable the use of clocksource="gp_timer" kernel parameter */ if (use_gptimer_clksrc || gptimer) omap2_gptimer_clocksource_init(clksrc_nr, clksrc_src, @@ -591,7 +642,7 @@ void __init omap3_secure_sync32k_timer_i #endif /* CONFIG_ARCH_OMAP3 */
#if defined(CONFIG_ARCH_OMAP3) || defined(CONFIG_SOC_AM33XX) || \ - defined(CONFIG_SOC_AM43XX) + defined(CONFIG_SOC_AM43XX) || defined(CONFIG_SOC_DRA7XX) void __init omap3_gptimer_timer_init(void) { __omap_sync32k_timer_init(2, "timer_sys_ck", NULL, --- a/drivers/clk/ti/clk-7xx.c +++ b/drivers/clk/ti/clk-7xx.c @@ -793,6 +793,7 @@ static struct ti_dt_clk dra7xx_clks[] = DT_CLK(NULL, "timer_32k_ck", "sys_32k_ck"), DT_CLK(NULL, "sys_clkin_ck", "timer_sys_clk_div"), DT_CLK(NULL, "sys_clkin", "sys_clkin1"), + DT_CLK(NULL, "timer_sys_ck", "timer_sys_clk_div"), DT_CLK(NULL, "atl_dpll_clk_mux", "atl-clkctrl:0000:24"), DT_CLK(NULL, "atl_gfclk_mux", "atl-clkctrl:0000:26"), DT_CLK(NULL, "dcan1_sys_clk_mux", "wkupaon-clkctrl:0068:24"), --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -119,6 +119,7 @@ enum cpuhp_state { CPUHP_AP_ARM_L2X0_STARTING, CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING, CPUHP_AP_ARM_ARCH_TIMER_STARTING, + CPUHP_AP_OMAP_DM_TIMER_STARTING, CPUHP_AP_ARM_GLOBAL_TIMER_STARTING, CPUHP_AP_JCORE_TIMER_STARTING, CPUHP_AP_ARM_TWD_STARTING,
From: Jack Pham jackp@codeaurora.org
commit 8d396bb0a5b62b326f6be7594d8bd46b088296bd upstream.
The DWC3 DebugFS directory and files are currently created once during probe. This includes creation of subdirectories for each of the gadget's endpoints. This works fine for peripheral-only controllers, as dwc3_core_init_mode() calls dwc3_gadget_init() just prior to calling dwc3_debugfs_init().
However, for dual-role controllers, dwc3_core_init_mode() will instead call dwc3_drd_init() which is problematic in a few ways. First, the initial state must be determined, then dwc3_set_mode() will have to schedule drd_work and by then dwc3_debugfs_init() could have already been invoked. Even if the initial mode is peripheral, dwc3_gadget_init() happens after the DebugFS files are created, and worse so if the initial state is host and the controller switches to peripheral much later. And secondly, even if the gadget endpoints' debug entries were successfully created, if the controller exits peripheral mode, its dwc3_eps are freed so the debug files would now hold stale references.
So it is best if the DebugFS endpoint entries are created and removed dynamically at the same time the underlying dwc3_eps are. Do this by calling dwc3_debugfs_create_endpoint_dir() as each endpoint is created, and conversely remove the DebugFS entry when the endpoint is freed.
Fixes: 41ce1456e1db ("usb: dwc3: core: make dwc3_set_mode() work properly") Cc: stable stable@vger.kernel.org Reviewed-by: Peter Chen peter.chen@kernel.org Signed-off-by: Jack Pham jackp@codeaurora.org Link: https://lore.kernel.org/r/20210529192932.22912-1-jackp@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/debug.h | 3 +++ drivers/usb/dwc3/debugfs.c | 21 ++------------------- drivers/usb/dwc3/gadget.c | 3 +++ 3 files changed, 8 insertions(+), 19 deletions(-)
--- a/drivers/usb/dwc3/debug.h +++ b/drivers/usb/dwc3/debug.h @@ -409,9 +409,12 @@ static inline const char *dwc3_gadget_ge
#ifdef CONFIG_DEBUG_FS +extern void dwc3_debugfs_create_endpoint_dir(struct dwc3_ep *dep); extern void dwc3_debugfs_init(struct dwc3 *); extern void dwc3_debugfs_exit(struct dwc3 *); #else +static inline void dwc3_debugfs_create_endpoint_dir(struct dwc3_ep *dep) +{ } static inline void dwc3_debugfs_init(struct dwc3 *d) { } static inline void dwc3_debugfs_exit(struct dwc3 *d) --- a/drivers/usb/dwc3/debugfs.c +++ b/drivers/usb/dwc3/debugfs.c @@ -878,30 +878,14 @@ static void dwc3_debugfs_create_endpoint } }
-static void dwc3_debugfs_create_endpoint_dir(struct dwc3_ep *dep, - struct dentry *parent) +void dwc3_debugfs_create_endpoint_dir(struct dwc3_ep *dep) { struct dentry *dir;
- dir = debugfs_create_dir(dep->name, parent); + dir = debugfs_create_dir(dep->name, dep->dwc->root); dwc3_debugfs_create_endpoint_files(dep, dir); }
-static void dwc3_debugfs_create_endpoint_dirs(struct dwc3 *dwc, - struct dentry *parent) -{ - int i; - - for (i = 0; i < dwc->num_eps; i++) { - struct dwc3_ep *dep = dwc->eps[i]; - - if (!dep) - continue; - - dwc3_debugfs_create_endpoint_dir(dep, parent); - } -} - void dwc3_debugfs_init(struct dwc3 *dwc) { struct dentry *root; @@ -935,7 +919,6 @@ void dwc3_debugfs_init(struct dwc3 *dwc) &dwc3_testmode_fops); debugfs_create_file("link_state", S_IRUGO | S_IWUSR, root, dwc, &dwc3_link_state_fops); - dwc3_debugfs_create_endpoint_dirs(dwc, root); } }
--- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2483,6 +2483,8 @@ static int dwc3_gadget_init_endpoint(str INIT_LIST_HEAD(&dep->started_list); INIT_LIST_HEAD(&dep->cancelled_list);
+ dwc3_debugfs_create_endpoint_dir(dep); + return 0; }
@@ -2526,6 +2528,7 @@ static void dwc3_gadget_free_endpoints(s list_del(&dep->endpoint.ep_list); }
+ debugfs_remove_recursive(debugfs_lookup(dep->name, dwc->root)); kfree(dep); } }
From: Peter Chen peter.chen@kernel.org
commit 4bf584a03eec674975ee9fe36c8583d9d470dab1 upstream.
When do system reboot, it calls dwc3_shutdown and the whole debugfs for dwc3 has removed first, when the gadget tries to do deinit, and remove debugfs for its endpoints, it meets NULL pointer dereference issue when call debugfs_lookup. Fix it by removing the whole dwc3 debugfs later than dwc3_drd_exit.
[ 2924.958838] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000002 .... [ 2925.030994] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) [ 2925.037005] pc : inode_permission+0x2c/0x198 [ 2925.041281] lr : lookup_one_len_common+0xb0/0xf8 [ 2925.045903] sp : ffff80001276ba70 [ 2925.049218] x29: ffff80001276ba70 x28: ffff0000c01f0000 x27: 0000000000000000 [ 2925.056364] x26: ffff800011791e70 x25: 0000000000000008 x24: dead000000000100 [ 2925.063510] x23: dead000000000122 x22: 0000000000000000 x21: 0000000000000001 [ 2925.070652] x20: ffff8000122c6188 x19: 0000000000000000 x18: 0000000000000000 [ 2925.077797] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000004 [ 2925.084943] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000030 [ 2925.092087] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f x9 : ffff8000102b2420 [ 2925.099232] x8 : 7f7f7f7f7f7f7f7f x7 : feff73746e2f6f64 x6 : 0000000000008080 [ 2925.106378] x5 : 61c8864680b583eb x4 : 209e6ec2d263dbb7 x3 : 000074756f307065 [ 2925.113523] x2 : 0000000000000001 x1 : 0000000000000000 x0 : ffff8000122c6188 [ 2925.120671] Call trace: [ 2925.123119] inode_permission+0x2c/0x198 [ 2925.127042] lookup_one_len_common+0xb0/0xf8 [ 2925.131315] lookup_one_len_unlocked+0x34/0xb0 [ 2925.135764] lookup_positive_unlocked+0x14/0x50 [ 2925.140296] debugfs_lookup+0x68/0xa0 [ 2925.143964] dwc3_gadget_free_endpoints+0x84/0xb0 [ 2925.148675] dwc3_gadget_exit+0x28/0x78 [ 2925.152518] dwc3_drd_exit+0x100/0x1f8 [ 2925.156267] dwc3_remove+0x11c/0x120 [ 2925.159851] dwc3_shutdown+0x14/0x20 [ 2925.163432] platform_shutdown+0x28/0x38 [ 2925.167360] device_shutdown+0x15c/0x378 [ 2925.171291] kernel_restart_prepare+0x3c/0x48 [ 2925.175650] kernel_restart+0x1c/0x68 [ 2925.179316] __do_sys_reboot+0x218/0x240 [ 2925.183247] __arm64_sys_reboot+0x28/0x30 [ 2925.187262] invoke_syscall+0x48/0x100 [ 2925.191017] el0_svc_common.constprop.0+0x48/0xc8 [ 2925.195726] do_el0_svc+0x28/0x88 [ 2925.199045] el0_svc+0x20/0x30 [ 2925.202104] el0_sync_handler+0xa8/0xb0 [ 2925.205942] el0_sync+0x148/0x180 [ 2925.209270] Code: a9025bf5 2a0203f5 121f0056 370802b5 (79400660) [ 2925.215372] ---[ end trace 124254d8e485a58b ]--- [ 2925.220012] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 2925.227676] Kernel Offset: disabled [ 2925.231164] CPU features: 0x00001001,20000846 [ 2925.235521] Memory Limit: none [ 2925.238580] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
Fixes: 8d396bb0a5b6 ("usb: dwc3: debugfs: Add and remove endpoint dirs dynamically") Cc: Jack Pham jackp@codeaurora.org Tested-by: Jack Pham jackp@codeaurora.org Signed-off-by: Peter Chen peter.chen@kernel.org Link: https://lore.kernel.org/r/20210608105656.10795-1-peter.chen@kernel.org (cherry picked from commit 2a042767814bd0edf2619f06fecd374e266ea068) Link: https://lore.kernel.org/r/20210615080847.GA10432@jackp-linux.qualcomm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -1575,8 +1575,8 @@ static int dwc3_remove(struct platform_d
pm_runtime_get_sync(&pdev->dev);
- dwc3_debugfs_exit(dwc); dwc3_core_exit_mode(dwc); + dwc3_debugfs_exit(dwc);
dwc3_core_exit(dwc); dwc3_ulpi_exit(dwc);
On 6/21/21 9:14 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.128 release. There are 90 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 23 Jun 2021 15:48:46 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.128-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB, using 32-bit and 64-bit ARM kernels:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On Mon, 21 Jun 2021 at 21:48, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.4.128 release. There are 90 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 23 Jun 2021 15:48:46 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.128-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.4.128-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git branch: linux-5.4.y * git commit: 3840287eb948c813b04c456d2ae96f20a77aedee * git describe: v5.4.127-91-g3840287eb948 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.4.y/build/v5.4.12...
## No regressions (compared to v5.4.126-29-g4e778e863160)
## No fixes (compared to v5.4.126-29-g4e778e863160)
## Test result summary total: 74568, pass: 60365, fail: 1139, skip: 11818, xfail: 1246,
## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 192 total, 192 passed, 0 failed * arm64: 26 total, 26 passed, 0 failed * dragonboard-410c: 1 total, 1 passed, 0 failed * hi6220-hikey: 1 total, 1 passed, 0 failed * i386: 15 total, 15 passed, 0 failed * juno-r2: 1 total, 1 passed, 0 failed * mips: 45 total, 45 passed, 0 failed * parisc: 9 total, 9 passed, 0 failed * powerpc: 27 total, 27 passed, 0 failed * riscv: 21 total, 21 passed, 0 failed * s390: 9 total, 9 passed, 0 failed * sh: 18 total, 18 passed, 0 failed * sparc: 9 total, 9 passed, 0 failed * x15: 1 total, 1 passed, 0 failed * x86: 1 total, 1 passed, 0 failed * x86_64: 26 total, 26 passed, 0 failed
## Test suites summary * fwts * igt-gpu-tools * install-android-platform-tools-r2600 * kselftest-android * kselftest-bpf * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-lkdtm * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kvm-unit-tests * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * packetdrill * perf * rcutorture * ssuite * v4l2-compliance
-- Linaro LKFT https://lkft.linaro.org
On 2021/6/22 0:14, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.128 release. There are 90 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 23 Jun 2021 15:48:46 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.128-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Tested on arm64 and x86 for 5.4.128-rc1,
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Branch: linux-5.4.y Version: 5.4.128-rc1 Commit: 3840287eb948c813b04c456d2ae96f20a77aedee Compiler: gcc version 7.3.0 (GCC)
arm64: -------------------------------------------------------------------- Testcase Result Summary: total: 8905 passed: 8905 failed: 0 timeout: 0 --------------------------------------------------------------------
x86: -------------------------------------------------------------------- Testcase Result Summary: total: 8905 passed: 8905 failed: 0 timeout: 0 --------------------------------------------------------------------
Tested-by: Hulk Robot hulkrobot@huawei.com
On Mon, 21 Jun 2021 18:14:35 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.128 release. There are 90 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 23 Jun 2021 15:48:46 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.128-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.4: 12 builds: 12 pass, 0 fail 26 boots: 26 pass, 0 fail 59 tests: 59 pass, 0 fail
Linux version: 5.4.128-rc1-g3840287eb948 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
Hi Greg,
On Mon, Jun 21, 2021 at 06:14:35PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.128 release. There are 90 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 23 Jun 2021 15:48:46 +0000. Anything received after that time might be too late.
Build test: mips (gcc version 11.1.1 20210615): 65 configs -> no failure arm (gcc version 11.1.1 20210615): 107 configs -> no new failure arm64 (gcc version 11.1.1 20210615): 2 configs -> no failure x86_64 (gcc version 10.2.1 20210110): 2 configs -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression.
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
-- Regards Sudip
On Mon, Jun 21, 2021 at 06:14:35PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.128 release. There are 90 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 23 Jun 2021 15:48:46 +0000. Anything received after that time might be too late.
Build results: total: 157 pass: 157 fail: 0 Qemu test results: total: 428 pass: 428 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On 6/21/21 10:14 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.128 release. There are 90 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 23 Jun 2021 15:48:46 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.128-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
linux-stable-mirror@lists.linaro.org