This is the start of the stable review cycle for the 5.8.15 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Oct 2020 13:31:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.8.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.8.15-rc1
Cong Wang xiyou.wangcong@gmail.com net_sched: commit action insertions together
Cong Wang xiyou.wangcong@gmail.com net_sched: defer tcf_idr_insert() in tcf_action_init_1()
Manivannan Sadhasivam manivannan.sadhasivam@linaro.org net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks
Anant Thazhemadam anant.thazhemadam@gmail.com net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails
Xiongfeng Wang wangxiongfeng2@huawei.com Input: ati_remote2 - add missing newlines when printing module parameters
Alexey Kardashevskiy aik@ozlabs.ru tty/vt: Do not warn when huge selection requested
Aya Levin ayal@mellanox.com net/mlx5e: Fix driver's declaration to support GRE offload
Rohit Maheshwari rohitm@chelsio.com net/tls: race causes kernel panic
Nikolay Aleksandrov nikolay@nvidia.com net: bridge: fdb: don't flush ext_learn entries
Guillaume Nault gnault@redhat.com net/core: check length before updating Ethertype in skb_mpls_{push,pop}
Johannes Berg johannes.berg@intel.com netlink: fix policy dump leak
Eric Dumazet edumazet@google.com tcp: fix receive window update in tcp_add_backlog()
Vijay Balakrishna vijayb@linux.microsoft.com mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
Minchan Kim minchan@kernel.org mm: validate inode in mapping_set_error()
Coly Li colyli@suse.de mmc: core: don't set limits.discard_granularity as 0
Kajol Jain kjain@linux.ibm.com perf: Fix task_function_call() error handling
David Howells dhowells@redhat.com afs: Fix deadlock between writeback and truncate
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: divide watermark value by 60 when writing to SYS_ATOP
Maxim Kochetkov fido_max@inbox.ru net: mscc: ocelot: extend watermark encoding function
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: split writes to pause frame enable bit and to thresholds
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: rename ocelot_board.c to ocelot_vsc7514.c
David Howells dhowells@redhat.com rxrpc: Fix server keyring leak
David Howells dhowells@redhat.com rxrpc: The server keyring isn't network-namespaced
David Howells dhowells@redhat.com rxrpc: Fix some missing _bh annotations on locking conn->state_lock
David Howells dhowells@redhat.com rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read()
Marc Dionne marc.dionne@auristor.com rxrpc: Fix rxkad token xdr encoding
Tom Rix trix@redhat.com net: mvneta: fix double free of txq->buf
Si-Wei Liu si-wei.liu@oracle.com vhost-vdpa: fix page pinning leakage in error path
Si-Wei Liu si-wei.liu@oracle.com vhost-vdpa: fix vhost_vdpa_map() on error condition
Randy Dunlap rdunlap@infradead.org net: hinic: fix DEVLINK build errors
Vineetha G. Jaya Kumaran vineetha.g.jaya.kumaran@intel.com net: stmmac: Modify configuration method of EEE timers
Vlad Buslov vladbu@nvidia.com net/mlx5e: Fix race condition on nhe->n pointer in neigh update
Aya Levin ayal@nvidia.com net/mlx5e: Fix VLAN create flow
Aya Levin ayal@nvidia.com net/mlx5e: Fix VLAN cleanup flow
Aya Levin ayal@mellanox.com net/mlx5e: Fix return status when setting unsupported FEC mode
Aya Levin ayal@mellanox.com net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU
Maor Gottlieb maorg@nvidia.com net/mlx5: Fix request_irqs error flow
Eran Ben Elisha eranbe@nvidia.com net/mlx5: Add retry mechanism to the command entry index allocation
Eran Ben Elisha eranbe@mellanox.com net/mlx5: poll cmd EQ in case of command timeout
Eran Ben Elisha eranbe@mellanox.com net/mlx5: Avoid possible free of command entry while timeout comp handler
Eran Ben Elisha eranbe@mellanox.com net/mlx5: Fix a race when moving command interface to polling mode
Qian Cai cai@redhat.com pipe: Fix memory leaks in create_pipe_files()
Hariprasad Kelam hkelam@marvell.com octeontx2-pf: Fix synchnorization issue in mbox
Hariprasad Kelam hkelam@marvell.com octeontx2-pf: Fix the device state on error
Geetha sowjanya gakula@marvell.com octeontx2-pf: Fix TCP/UDP checksum offload for IPv6 frames
Subbaraya Sundeep sbhatta@marvell.com octeontx2-af: Fix enable/disable of default NPC entries
Willy Liu willy.liu@realtek.com net: phy: realtek: fix rtl8211e rx/tx delay config
Tonghao Zhang xiangxia.m.yue@gmail.com virtio-net: don't disable guest csum when disable LRO
Wilken Gottwalt wilken.gottwalt@mailbox.org net: usb: ax88179_178a: fix missing stop entry in driver_info
Heiner Kallweit hkallweit1@gmail.com r8169: fix RTL8168f/RTL8411 EPHY config
Ido Schimmel idosch@nvidia.com mlxsw: spectrum_acl: Fix mlxsw_sp_acl_tcam_group_add()'s error path
Randy Dunlap rdunlap@infradead.org mdio: fix mdio-thunder.c dependency & build error
Eric Dumazet edumazet@google.com bonding: set dev->needed_headroom in bond_setup_by_slave()
Ivan Khoronzhuk ivan.khoronzhuk@gmail.com net: ethernet: cavium: octeon_mgmt: use phy_start and phy_stop
Wong Vee Khee vee.khee.wong@intel.com net: stmmac: Fix clock handling on remove path
Ronak Doshi doshir@vmware.com vmxnet3: fix cksum offload issues for non-udp tunnels
Jacob Keller jacob.e.keller@intel.com ice: fix memory leak in ice_vsi_setup
Jacob Keller jacob.e.keller@intel.com ice: fix memory leak if register_netdev_fails
Sylwester Dziedziuch sylwesterx.dziedziuch@intel.com iavf: Fix incorrect adapter get in iavf_resume
Vaibhav Gupta vaibhavgupta40@gmail.com iavf: use generic power management
Herbert Xu herbert@gondor.apana.org.au xfrm: Use correct address family in xfrm_state_find
Xiaoliang Yang xiaoliang.yang_1@nxp.com net: dsa: felix: convert TAS link speed based on phylink speed
Luo bin luobin9@huawei.com hinic: fix wrong return value of mac-set cmd
Luo bin luobin9@huawei.com hinic: add log in exception handling processes
Necip Fazil Yildiran fazilyildiran@gmail.com platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP
Necip Fazil Yildiran fazilyildiran@gmail.com platform/x86: fix kconfig dependency warning for LG_LAPTOP
Voon Weifeng weifeng.voon@intel.com net: stmmac: removed enabling eee in EEE set callback
Magnus Karlsson magnus.karlsson@intel.com xsk: Do not discard packet when NETDEV_TX_BUSY
Antony Antony antony.antony@secunet.com xfrm: clone whole liftime_cur structure in xfrm_do_migrate
Antony Antony antony.antony@secunet.com xfrm: clone XFRMA_SEC_CTX in xfrm_do_migrate
Antony Antony antony.antony@secunet.com xfrm: clone XFRMA_REPLAY_ESN_VAL in xfrm_do_migrate
Antony Antony antony.antony@secunet.com xfrm: clone XFRMA_SET_MARK in xfrm_do_migrate
Lu Baolu baolu.lu@linux.intel.com iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb()
Josef Bacik josef@toxicpanda.com btrfs: move btrfs_rm_dev_replace_free_srcdev outside of all locks
Zack Rusin zackr@vmware.com drm/vmwgfx: Fix error handling in get_node
Flora Cui flora.cui@amd.com drm/amd/display: fix return value check for hdcp_work
Sudheesh Mavila sudheesh.mavila@amd.com drm/amd/pm: Removed fixed clock in auto mode DPM
Jens Axboe axboe@kernel.dk io_uring: fix potential ABBA deadlock in ->show_fdinfo()
Josef Bacik josef@toxicpanda.com btrfs: move btrfs_scratch_superblocks into btrfs_dev_replace_finishing
Philip Yang Philip.Yang@amd.com drm/amdgpu: prevent double kfree ttm->sg
Dumitru Ceara dceara@redhat.com openvswitch: handle DNAT tuple collision
Anant Thazhemadam anant.thazhemadam@gmail.com net: team: fix memory leak in __team_options_register
Eric Dumazet edumazet@google.com team: set dev->needed_headroom in team_setup_by_port()
Eric Dumazet edumazet@google.com sctp: fix sctp_auth_init_hmacs() error path
Cristian Ciocaltea cristian.ciocaltea@gmail.com i2c: owl: Clear NACK and BUS error bits
Nicolas Belin nbelin@baylibre.com i2c: meson: fixup rate calculation with filter delay
Jerome Brunet jbrunet@baylibre.com i2c: meson: keep peripheral clock enabled
Jerome Brunet jbrunet@baylibre.com i2c: meson: fix clock setting overwrite
Vladimir Zapolskiy vladimir@tuxera.com cifs: Fix incomplete memory allocation on setxattr path
Sabrina Dubroca sd@queasysnail.net espintcp: restore IP CB before handing the packet to xfrm
Sabrina Dubroca sd@queasysnail.net xfrmi: drop ignore_df check before updating pmtu
Coly Li colyli@suse.de nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage()
Coly Li colyli@suse.de tcp: use sendpage_ok() to detect misused .sendpage
Coly Li colyli@suse.de net: introduce helper sendpage_ok() in include/linux/net.h
Hugh Dickins hughd@google.com mm/khugepaged: fix filemap page_to_pgoff(page) != offset
Andy Shevchenko andriy.shevchenko@linux.intel.com gpiolib: Disable compat ->read() code in UML case
Atish Patra atish.patra@wdc.com RISC-V: Make sure memblock reserves the memory containing DT
Eric Dumazet edumazet@google.com macsec: avoid use-after-free in macsec_handle_frame()
Chaitanya Kulkarni chaitanya.kulkarni@wdc.com nvme-core: put ctrl ref when module ref get fail
Aaron Ma aaron.ma@canonical.com platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse
Hans de Goede hdegoede@redhat.com platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting
Heiner Kallweit hkallweit1@gmail.com r8169: consider that PHY reset may still be in progress after applying firmware
Tony Ambardar tony.ambardar@gmail.com bpf: Prevent .BTF section elimination
Tony Ambardar tony.ambardar@gmail.com bpf: Fix sysfs export of empty BTF section
Hans de Goede hdegoede@redhat.com platform/x86: asus-wmi: Fix SW_TABLET_MODE always reporting 1 on many different models
Tom Rix trix@redhat.com platform/x86: thinkpad_acpi: initialize tp_nvram_state variable
Hans de Goede hdegoede@redhat.com platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360
Dinghao Liu dinghao.liu@zju.edu.cn Platform: OLPC: Fix memleak in olpc_ec_probe
Linus Torvalds torvalds@linux-foundation.org splice: teach splice pipe reading about empty pipe buffers
Linus Torvalds torvalds@linux-foundation.org usermodehelper: reset umask to default before executing user process
Greg Kurz groug@kaod.org vhost: Use vhost_get_used_size() in vhost_vring_set_addr()
Greg Kurz groug@kaod.org vhost: Don't call access_ok() when using IOTLB
Peilin Ye yepeilin.cs@gmail.com block/scsi-ioctl: Fix kernel-infoleak in scsi_put_cdrom_generic_arg()
Christoph Hellwig hch@lst.de partitions/ibm: fix non-DASD devices
Karol Herbst kherbst@redhat.com drm/nouveau/mem: guard against NULL pointer access in mem_del
Karol Herbst kherbst@redhat.com drm/nouveau/device: return error for unknown chipsets
Anant Thazhemadam anant.thazhemadam@gmail.com net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key()
Namjae Jeon namjae.jeon@samsung.com exfat: fix use of uninitialized spinlock on error path
Jeremy Linton jeremy.linton@arm.com crypto: arm64: Use x16 with indirect branch to bti_c
Daniel Borkmann daniel@iogearbox.net bpf: Fix scalar32_min_max_or bounds tracking
Geert Uytterhoeven geert+renesas@glider.be Revert "ravb: Fixed to be able to unload modules"
Peilin Ye yepeilin.cs@gmail.com fbcon: Fix global-out-of-bounds read in fbcon_get_font()
Peilin Ye yepeilin.cs@gmail.com Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts
Peilin Ye yepeilin.cs@gmail.com fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h
-------------
Diffstat:
Makefile | 4 +- arch/arm64/crypto/aes-neonbs-core.S | 4 +- arch/riscv/mm/init.c | 1 + block/partitions/ibm.c | 7 +- block/scsi_ioctl.c | 1 + drivers/gpio/gpiolib.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 1 + .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c | 2 +- drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c | 10 +- drivers/gpu/drm/nouveau/nouveau_mem.c | 2 + drivers/gpu/drm/nouveau/nvkm/engine/device/base.c | 1 + drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_thp.c | 2 +- drivers/i2c/busses/i2c-meson.c | 52 +++--- drivers/i2c/busses/i2c-owl.c | 6 + drivers/input/misc/ati_remote2.c | 4 +- drivers/iommu/intel/iommu.c | 4 +- drivers/mmc/core/queue.c | 2 +- drivers/net/bonding/bond_main.c | 1 + drivers/net/dsa/ocelot/felix_vsc9959.c | 35 +++- drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 6 +- drivers/net/ethernet/huawei/hinic/Kconfig | 1 + .../net/ethernet/huawei/hinic/hinic_hw_api_cmd.c | 27 ++- .../net/ethernet/huawei/hinic/hinic_hw_api_cmd.h | 4 + drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c | 2 + drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h | 2 + drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c | 12 +- drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c | 39 +++++ drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.h | 6 +- drivers/net/ethernet/huawei/hinic/hinic_hw_if.c | 23 +++ drivers/net/ethernet/huawei/hinic/hinic_hw_if.h | 10 +- drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c | 2 + drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c | 1 + drivers/net/ethernet/huawei/hinic/hinic_port.c | 68 ++++---- drivers/net/ethernet/huawei/hinic/hinic_sriov.c | 18 +- drivers/net/ethernet/intel/iavf/iavf_main.c | 49 ++---- drivers/net/ethernet/intel/ice/ice_lib.c | 20 ++- drivers/net/ethernet/intel/ice/ice_lib.h | 6 - drivers/net/ethernet/intel/ice/ice_main.c | 13 +- drivers/net/ethernet/marvell/mvneta.c | 13 +- drivers/net/ethernet/marvell/octeontx2/af/mbox.c | 12 +- drivers/net/ethernet/marvell/octeontx2/af/mbox.h | 1 + drivers/net/ethernet/marvell/octeontx2/af/rvu.h | 3 +- .../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 5 +- .../net/ethernet/marvell/octeontx2/af/rvu_npc.c | 26 ++- .../net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 16 +- .../net/ethernet/marvell/octeontx2/nic/otx2_txrx.c | 1 + .../net/ethernet/marvell/octeontx2/nic/otx2_vf.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 181 +++++++++++++++------ drivers/net/ethernet/mellanox/mlx5/core/en.h | 8 +- drivers/net/ethernet/mellanox/mlx5/core/en/port.c | 3 + .../net/ethernet/mellanox/mlx5/core/en/rep/neigh.c | 81 +++++---- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 14 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 74 ++++++++- drivers/net/ethernet/mellanox/mlx5/core/en_rep.h | 6 - drivers/net/ethernet/mellanox/mlx5/core/eq.c | 42 ++++- drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h | 2 + drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 2 +- .../ethernet/mellanox/mlxsw/spectrum_acl_tcam.c | 3 +- drivers/net/ethernet/mscc/Makefile | 2 +- drivers/net/ethernet/mscc/ocelot.c | 43 ++--- .../mscc/{ocelot_board.c => ocelot_vsc7514.c} | 13 ++ drivers/net/ethernet/realtek/r8169_main.c | 11 +- drivers/net/ethernet/renesas/ravb_main.c | 110 ++++++------- drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c | 1 - drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 + .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 27 +-- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 23 ++- drivers/net/macsec.c | 4 +- drivers/net/phy/Kconfig | 1 + drivers/net/phy/realtek.c | 31 ++-- drivers/net/team/team.c | 3 +- drivers/net/usb/ax88179_178a.c | 1 + drivers/net/usb/rtl8150.c | 16 +- drivers/net/virtio_net.c | 8 +- drivers/net/vmxnet3/vmxnet3_drv.c | 5 +- drivers/net/vmxnet3/vmxnet3_ethtool.c | 28 ++++ drivers/net/vmxnet3/vmxnet3_int.h | 4 + drivers/nvme/host/core.c | 4 +- drivers/nvme/host/tcp.c | 7 +- drivers/platform/olpc/olpc-ec.c | 4 +- drivers/platform/x86/Kconfig | 2 + drivers/platform/x86/asus-nb-wmi.c | 32 ++++ drivers/platform/x86/asus-wmi.c | 16 +- drivers/platform/x86/asus-wmi.h | 1 + drivers/platform/x86/intel-vbtn.c | 64 ++++++-- drivers/platform/x86/thinkpad_acpi.c | 6 +- drivers/tty/vt/selection.c | 2 +- drivers/vhost/vdpa.c | 122 ++++++++------ drivers/vhost/vhost.c | 12 +- drivers/video/console/newport_con.c | 7 +- drivers/video/fbdev/core/fbcon.c | 12 ++ drivers/video/fbdev/core/fbcon.h | 7 - drivers/video/fbdev/core/fbcon_rotate.c | 1 + drivers/video/fbdev/core/tileblit.c | 1 + fs/afs/inode.c | 47 +++++- fs/afs/internal.h | 1 + fs/afs/write.c | 11 ++ fs/btrfs/dev-replace.c | 6 +- fs/btrfs/volumes.c | 13 +- fs/btrfs/volumes.h | 3 + fs/cifs/smb2ops.c | 2 +- fs/exfat/cache.c | 11 -- fs/exfat/exfat_fs.h | 3 +- fs/exfat/inode.c | 2 - fs/exfat/super.c | 5 +- fs/io_uring.c | 19 ++- fs/pipe.c | 11 +- fs/splice.c | 20 +++ include/asm-generic/vmlinux.lds.h | 2 +- include/linux/font.h | 13 ++ include/linux/khugepaged.h | 5 + include/linux/mlx5/driver.h | 2 + include/linux/net.h | 16 ++ include/linux/pagemap.h | 3 +- include/linux/watch_queue.h | 6 + include/net/act_api.h | 2 - include/net/netlink.h | 3 +- include/net/xfrm.h | 16 +- include/soc/mscc/ocelot.h | 1 + kernel/bpf/sysfs_btf.c | 6 +- kernel/bpf/verifier.c | 8 +- kernel/events/core.c | 5 +- kernel/umh.c | 9 + lib/fonts/font_10x18.c | 9 +- lib/fonts/font_6x10.c | 9 +- lib/fonts/font_6x11.c | 9 +- lib/fonts/font_7x14.c | 9 +- lib/fonts/font_8x16.c | 9 +- lib/fonts/font_8x8.c | 9 +- lib/fonts/font_acorn_8x8.c | 9 +- lib/fonts/font_mini_4x6.c | 8 +- lib/fonts/font_pearl_8x8.c | 9 +- lib/fonts/font_sun12x22.c | 9 +- lib/fonts/font_sun8x16.c | 7 +- lib/fonts/font_ter16x32.c | 9 +- mm/khugepaged.c | 25 ++- mm/page_alloc.c | 3 + net/bridge/br_fdb.c | 2 + net/core/skbuff.c | 4 +- net/ipv4/tcp.c | 3 +- net/ipv4/tcp_ipv4.c | 6 +- net/netlink/genetlink.c | 9 +- net/netlink/policy.c | 24 ++- net/openvswitch/conntrack.c | 22 ++- net/qrtr/ns.c | 34 +++- net/rxrpc/conn_event.c | 6 +- net/rxrpc/key.c | 20 ++- net/sched/act_api.c | 52 ++++-- net/sched/act_bpf.c | 4 +- net/sched/act_connmark.c | 1 - net/sched/act_csum.c | 3 - net/sched/act_ct.c | 2 - net/sched/act_ctinfo.c | 3 - net/sched/act_gact.c | 2 - net/sched/act_gate.c | 3 - net/sched/act_ife.c | 3 - net/sched/act_ipt.c | 2 - net/sched/act_mirred.c | 2 - net/sched/act_mpls.c | 2 - net/sched/act_nat.c | 3 - net/sched/act_pedit.c | 2 - net/sched/act_police.c | 2 - net/sched/act_sample.c | 2 - net/sched/act_simple.c | 2 - net/sched/act_skbedit.c | 2 - net/sched/act_skbmod.c | 2 - net/sched/act_tunnel_key.c | 3 - net/sched/act_vlan.c | 2 - net/sctp/auth.c | 1 + net/tls/tls_sw.c | 9 +- net/wireless/nl80211.c | 3 + net/xdp/xsk.c | 17 +- net/xfrm/espintcp.c | 6 +- net/xfrm/xfrm_interface.c | 2 +- net/xfrm/xfrm_state.c | 42 ++++- 176 files changed, 1519 insertions(+), 756 deletions(-)
From: Peilin Ye yepeilin.cs@gmail.com
commit bb0890b4cd7f8203e3aa99c6d0f062d6acdaad27 upstream.
drivers/video/console/newport_con.c is borrowing FONT_EXTRA_WORDS macros from drivers/video/fbdev/core/fbcon.h. To keep things simple, move all definitions into <linux/font.h>.
Since newport_con now uses four extra words, initialize the fourth word in newport_set_font() properly.
Cc: stable@vger.kernel.org Signed-off-by: Peilin Ye yepeilin.cs@gmail.com Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/7fb8bc9b0abc676ada6b7ac0e0bd44... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/video/console/newport_con.c | 7 +------ drivers/video/fbdev/core/fbcon.h | 7 ------- drivers/video/fbdev/core/fbcon_rotate.c | 1 + drivers/video/fbdev/core/tileblit.c | 1 + include/linux/font.h | 8 ++++++++ 5 files changed, 11 insertions(+), 13 deletions(-)
--- a/drivers/video/console/newport_con.c +++ b/drivers/video/console/newport_con.c @@ -35,12 +35,6 @@
#define FONT_DATA ((unsigned char *)font_vga_8x16.data)
-/* borrowed from fbcon.c */ -#define REFCOUNT(fd) (((int *)(fd))[-1]) -#define FNTSIZE(fd) (((int *)(fd))[-2]) -#define FNTCHARCNT(fd) (((int *)(fd))[-3]) -#define FONT_EXTRA_WORDS 3 - static unsigned char *font_data[MAX_NR_CONSOLES];
static struct newport_regs *npregs; @@ -522,6 +516,7 @@ static int newport_set_font(int unit, st FNTSIZE(new_data) = size; FNTCHARCNT(new_data) = op->charcount; REFCOUNT(new_data) = 0; /* usage counter */ + FNTSUM(new_data) = 0;
p = new_data; for (i = 0; i < op->charcount; i++) { --- a/drivers/video/fbdev/core/fbcon.h +++ b/drivers/video/fbdev/core/fbcon.h @@ -152,13 +152,6 @@ static inline int attr_col_ec(int shift, #define attr_bgcol_ec(bgshift, vc, info) attr_col_ec(bgshift, vc, info, 0) #define attr_fgcol_ec(fgshift, vc, info) attr_col_ec(fgshift, vc, info, 1)
-/* Font */ -#define REFCOUNT(fd) (((int *)(fd))[-1]) -#define FNTSIZE(fd) (((int *)(fd))[-2]) -#define FNTCHARCNT(fd) (((int *)(fd))[-3]) -#define FNTSUM(fd) (((int *)(fd))[-4]) -#define FONT_EXTRA_WORDS 4 - /* * Scroll Method */ --- a/drivers/video/fbdev/core/fbcon_rotate.c +++ b/drivers/video/fbdev/core/fbcon_rotate.c @@ -14,6 +14,7 @@ #include <linux/fb.h> #include <linux/vt_kern.h> #include <linux/console.h> +#include <linux/font.h> #include <asm/types.h> #include "fbcon.h" #include "fbcon_rotate.h" --- a/drivers/video/fbdev/core/tileblit.c +++ b/drivers/video/fbdev/core/tileblit.c @@ -13,6 +13,7 @@ #include <linux/fb.h> #include <linux/vt_kern.h> #include <linux/console.h> +#include <linux/font.h> #include <asm/types.h> #include "fbcon.h"
--- a/include/linux/font.h +++ b/include/linux/font.h @@ -59,4 +59,12 @@ extern const struct font_desc *get_defau /* Max. length for the name of a predefined font */ #define MAX_FONT_NAME 32
+/* Extra word getters */ +#define REFCOUNT(fd) (((int *)(fd))[-1]) +#define FNTSIZE(fd) (((int *)(fd))[-2]) +#define FNTCHARCNT(fd) (((int *)(fd))[-3]) +#define FNTSUM(fd) (((int *)(fd))[-4]) + +#define FONT_EXTRA_WORDS 4 + #endif /* _VIDEO_FONT_H */
From: Peilin Ye yepeilin.cs@gmail.com
commit 6735b4632def0640dbdf4eb9f99816aca18c4f16 upstream.
syzbot has reported an issue in the framebuffer layer, where a malicious user may overflow our built-in font data buffers.
In order to perform a reliable range check, subsystems need to know `FONTDATAMAX` for each built-in font. Unfortunately, our font descriptor, `struct console_font` does not contain `FONTDATAMAX`, and is part of the UAPI, making it infeasible to modify it.
For user-provided fonts, the framebuffer layer resolves this issue by reserving four extra words at the beginning of data buffers. Later, whenever a function needs to access them, it simply uses the following macros:
Recently we have gathered all the above macros to <linux/font.h>. Let us do the same thing for built-in fonts, prepend four extra words (including `FONTDATAMAX`) to their data buffers, so that subsystems can use these macros for all fonts, no matter built-in or user-provided.
This patch depends on patch "fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h".
Cc: stable@vger.kernel.org Link: https://syzkaller.appspot.com/bug?id=08b8be45afea11888776f897895aef9ad1c3ecf... Signed-off-by: Peilin Ye yepeilin.cs@gmail.com Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/ef18af00c35fb3cc826048a5f70924... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/font.h | 5 +++++ lib/fonts/font_10x18.c | 9 ++++----- lib/fonts/font_6x10.c | 9 +++++---- lib/fonts/font_6x11.c | 9 ++++----- lib/fonts/font_7x14.c | 9 ++++----- lib/fonts/font_8x16.c | 9 ++++----- lib/fonts/font_8x8.c | 9 ++++----- lib/fonts/font_acorn_8x8.c | 9 ++++++--- lib/fonts/font_mini_4x6.c | 8 ++++---- lib/fonts/font_pearl_8x8.c | 9 ++++----- lib/fonts/font_sun12x22.c | 9 ++++----- lib/fonts/font_sun8x16.c | 7 ++++--- lib/fonts/font_ter16x32.c | 9 ++++----- 13 files changed, 56 insertions(+), 54 deletions(-)
--- a/include/linux/font.h +++ b/include/linux/font.h @@ -67,4 +67,9 @@ extern const struct font_desc *get_defau
#define FONT_EXTRA_WORDS 4
+struct font_data { + unsigned int extra[FONT_EXTRA_WORDS]; + const unsigned char data[]; +} __packed; + #endif /* _VIDEO_FONT_H */ --- a/lib/fonts/font_10x18.c +++ b/lib/fonts/font_10x18.c @@ -8,8 +8,8 @@
#define FONTDATAMAX 9216
-static const unsigned char fontdata_10x18[FONTDATAMAX] = { - +static struct font_data fontdata_10x18 = { + { 0, 0, FONTDATAMAX, 0 }, { /* 0 0x00 '^@' */ 0x00, 0x00, /* 0000000000 */ 0x00, 0x00, /* 0000000000 */ @@ -5129,8 +5129,7 @@ static const unsigned char fontdata_10x1 0x00, 0x00, /* 0000000000 */ 0x00, 0x00, /* 0000000000 */ 0x00, 0x00, /* 0000000000 */ - -}; +} };
const struct font_desc font_10x18 = { @@ -5138,7 +5137,7 @@ const struct font_desc font_10x18 = { .name = "10x18", .width = 10, .height = 18, - .data = fontdata_10x18, + .data = fontdata_10x18.data, #ifdef __sparc__ .pref = 5, #else --- a/lib/fonts/font_6x10.c +++ b/lib/fonts/font_6x10.c @@ -1,8 +1,10 @@ // SPDX-License-Identifier: GPL-2.0 #include <linux/font.h>
-static const unsigned char fontdata_6x10[] = { +#define FONTDATAMAX 2560
+static struct font_data fontdata_6x10 = { + { 0, 0, FONTDATAMAX, 0 }, { /* 0 0x00 '^@' */ 0x00, /* 00000000 */ 0x00, /* 00000000 */ @@ -3074,14 +3076,13 @@ static const unsigned char fontdata_6x10 0x00, /* 00000000 */ 0x00, /* 00000000 */ 0x00, /* 00000000 */ - -}; +} };
const struct font_desc font_6x10 = { .idx = FONT6x10_IDX, .name = "6x10", .width = 6, .height = 10, - .data = fontdata_6x10, + .data = fontdata_6x10.data, .pref = 0, }; --- a/lib/fonts/font_6x11.c +++ b/lib/fonts/font_6x11.c @@ -9,8 +9,8 @@
#define FONTDATAMAX (11*256)
-static const unsigned char fontdata_6x11[FONTDATAMAX] = { - +static struct font_data fontdata_6x11 = { + { 0, 0, FONTDATAMAX, 0 }, { /* 0 0x00 '^@' */ 0x00, /* 00000000 */ 0x00, /* 00000000 */ @@ -3338,8 +3338,7 @@ static const unsigned char fontdata_6x11 0x00, /* 00000000 */ 0x00, /* 00000000 */ 0x00, /* 00000000 */ - -}; +} };
const struct font_desc font_vga_6x11 = { @@ -3347,7 +3346,7 @@ const struct font_desc font_vga_6x11 = { .name = "ProFont6x11", .width = 6, .height = 11, - .data = fontdata_6x11, + .data = fontdata_6x11.data, /* Try avoiding this font if possible unless on MAC */ .pref = -2000, }; --- a/lib/fonts/font_7x14.c +++ b/lib/fonts/font_7x14.c @@ -8,8 +8,8 @@
#define FONTDATAMAX 3584
-static const unsigned char fontdata_7x14[FONTDATAMAX] = { - +static struct font_data fontdata_7x14 = { + { 0, 0, FONTDATAMAX, 0 }, { /* 0 0x00 '^@' */ 0x00, /* 0000000 */ 0x00, /* 0000000 */ @@ -4105,8 +4105,7 @@ static const unsigned char fontdata_7x14 0x00, /* 0000000 */ 0x00, /* 0000000 */ 0x00, /* 0000000 */ - -}; +} };
const struct font_desc font_7x14 = { @@ -4114,6 +4113,6 @@ const struct font_desc font_7x14 = { .name = "7x14", .width = 7, .height = 14, - .data = fontdata_7x14, + .data = fontdata_7x14.data, .pref = 0, }; --- a/lib/fonts/font_8x16.c +++ b/lib/fonts/font_8x16.c @@ -10,8 +10,8 @@
#define FONTDATAMAX 4096
-static const unsigned char fontdata_8x16[FONTDATAMAX] = { - +static struct font_data fontdata_8x16 = { + { 0, 0, FONTDATAMAX, 0 }, { /* 0 0x00 '^@' */ 0x00, /* 00000000 */ 0x00, /* 00000000 */ @@ -4619,8 +4619,7 @@ static const unsigned char fontdata_8x16 0x00, /* 00000000 */ 0x00, /* 00000000 */ 0x00, /* 00000000 */ - -}; +} };
const struct font_desc font_vga_8x16 = { @@ -4628,7 +4627,7 @@ const struct font_desc font_vga_8x16 = { .name = "VGA8x16", .width = 8, .height = 16, - .data = fontdata_8x16, + .data = fontdata_8x16.data, .pref = 0, }; EXPORT_SYMBOL(font_vga_8x16); --- a/lib/fonts/font_8x8.c +++ b/lib/fonts/font_8x8.c @@ -9,8 +9,8 @@
#define FONTDATAMAX 2048
-static const unsigned char fontdata_8x8[FONTDATAMAX] = { - +static struct font_data fontdata_8x8 = { + { 0, 0, FONTDATAMAX, 0 }, { /* 0 0x00 '^@' */ 0x00, /* 00000000 */ 0x00, /* 00000000 */ @@ -2570,8 +2570,7 @@ static const unsigned char fontdata_8x8[ 0x00, /* 00000000 */ 0x00, /* 00000000 */ 0x00, /* 00000000 */ - -}; +} };
const struct font_desc font_vga_8x8 = { @@ -2579,6 +2578,6 @@ const struct font_desc font_vga_8x8 = { .name = "VGA8x8", .width = 8, .height = 8, - .data = fontdata_8x8, + .data = fontdata_8x8.data, .pref = 0, }; --- a/lib/fonts/font_acorn_8x8.c +++ b/lib/fonts/font_acorn_8x8.c @@ -3,7 +3,10 @@
#include <linux/font.h>
-static const unsigned char acorndata_8x8[] = { +#define FONTDATAMAX 2048 + +static struct font_data acorndata_8x8 = { +{ 0, 0, FONTDATAMAX, 0 }, { /* 00 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* ^@ */ /* 01 */ 0x7e, 0x81, 0xa5, 0x81, 0xbd, 0x99, 0x81, 0x7e, /* ^A */ /* 02 */ 0x7e, 0xff, 0xbd, 0xff, 0xc3, 0xe7, 0xff, 0x7e, /* ^B */ @@ -260,14 +263,14 @@ static const unsigned char acorndata_8x8 /* FD */ 0x38, 0x04, 0x18, 0x20, 0x3c, 0x00, 0x00, 0x00, /* FE */ 0x00, 0x00, 0x3c, 0x3c, 0x3c, 0x3c, 0x00, 0x00, /* FF */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 -}; +} };
const struct font_desc font_acorn_8x8 = { .idx = ACORN8x8_IDX, .name = "Acorn8x8", .width = 8, .height = 8, - .data = acorndata_8x8, + .data = acorndata_8x8.data, #ifdef CONFIG_ARCH_ACORN .pref = 20, #else --- a/lib/fonts/font_mini_4x6.c +++ b/lib/fonts/font_mini_4x6.c @@ -43,8 +43,8 @@ __END__;
#define FONTDATAMAX 1536
-static const unsigned char fontdata_mini_4x6[FONTDATAMAX] = { - +static struct font_data fontdata_mini_4x6 = { + { 0, 0, FONTDATAMAX, 0 }, { /*{*/ /* Char 0: ' ' */ 0xee, /*= [*** ] */ @@ -2145,14 +2145,14 @@ static const unsigned char fontdata_mini 0xee, /*= [*** ] */ 0x00, /*= [ ] */ /*}*/ -}; +} };
const struct font_desc font_mini_4x6 = { .idx = MINI4x6_IDX, .name = "MINI4x6", .width = 4, .height = 6, - .data = fontdata_mini_4x6, + .data = fontdata_mini_4x6.data, .pref = 3, };
--- a/lib/fonts/font_pearl_8x8.c +++ b/lib/fonts/font_pearl_8x8.c @@ -14,8 +14,8 @@
#define FONTDATAMAX 2048
-static const unsigned char fontdata_pearl8x8[FONTDATAMAX] = { - +static struct font_data fontdata_pearl8x8 = { + { 0, 0, FONTDATAMAX, 0 }, { /* 0 0x00 '^@' */ 0x00, /* 00000000 */ 0x00, /* 00000000 */ @@ -2575,14 +2575,13 @@ static const unsigned char fontdata_pear 0x00, /* 00000000 */ 0x00, /* 00000000 */ 0x00, /* 00000000 */ - -}; +} };
const struct font_desc font_pearl_8x8 = { .idx = PEARL8x8_IDX, .name = "PEARL8x8", .width = 8, .height = 8, - .data = fontdata_pearl8x8, + .data = fontdata_pearl8x8.data, .pref = 2, }; --- a/lib/fonts/font_sun12x22.c +++ b/lib/fonts/font_sun12x22.c @@ -3,8 +3,8 @@
#define FONTDATAMAX 11264
-static const unsigned char fontdata_sun12x22[FONTDATAMAX] = { - +static struct font_data fontdata_sun12x22 = { + { 0, 0, FONTDATAMAX, 0 }, { /* 0 0x00 '^@' */ 0x00, 0x00, /* 000000000000 */ 0x00, 0x00, /* 000000000000 */ @@ -6148,8 +6148,7 @@ static const unsigned char fontdata_sun1 0x00, 0x00, /* 000000000000 */ 0x00, 0x00, /* 000000000000 */ 0x00, 0x00, /* 000000000000 */ - -}; +} };
const struct font_desc font_sun_12x22 = { @@ -6157,7 +6156,7 @@ const struct font_desc font_sun_12x22 = .name = "SUN12x22", .width = 12, .height = 22, - .data = fontdata_sun12x22, + .data = fontdata_sun12x22.data, #ifdef __sparc__ .pref = 5, #else --- a/lib/fonts/font_sun8x16.c +++ b/lib/fonts/font_sun8x16.c @@ -3,7 +3,8 @@
#define FONTDATAMAX 4096
-static const unsigned char fontdata_sun8x16[FONTDATAMAX] = { +static struct font_data fontdata_sun8x16 = { +{ 0, 0, FONTDATAMAX, 0 }, { /* */ 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* */ 0x00,0x00,0x7e,0x81,0xa5,0x81,0x81,0xbd,0x99,0x81,0x81,0x7e,0x00,0x00,0x00,0x00, /* */ 0x00,0x00,0x7e,0xff,0xdb,0xff,0xff,0xc3,0xe7,0xff,0xff,0x7e,0x00,0x00,0x00,0x00, @@ -260,14 +261,14 @@ static const unsigned char fontdata_sun8 /* */ 0x00,0x70,0xd8,0x30,0x60,0xc8,0xf8,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* */ 0x00,0x00,0x00,0x00,0x7c,0x7c,0x7c,0x7c,0x7c,0x7c,0x7c,0x00,0x00,0x00,0x00,0x00, /* */ 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, -}; +} };
const struct font_desc font_sun_8x16 = { .idx = SUN8x16_IDX, .name = "SUN8x16", .width = 8, .height = 16, - .data = fontdata_sun8x16, + .data = fontdata_sun8x16.data, #ifdef __sparc__ .pref = 10, #else --- a/lib/fonts/font_ter16x32.c +++ b/lib/fonts/font_ter16x32.c @@ -4,8 +4,8 @@
#define FONTDATAMAX 16384
-static const unsigned char fontdata_ter16x32[FONTDATAMAX] = { - +static struct font_data fontdata_ter16x32 = { + { 0, 0, FONTDATAMAX, 0 }, { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x7f, 0xfc, 0x7f, 0xfc, 0x70, 0x1c, 0x70, 0x1c, 0x70, 0x1c, 0x70, 0x1c, @@ -2054,8 +2054,7 @@ static const unsigned char fontdata_ter1 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 255 */ - -}; +} };
const struct font_desc font_ter_16x32 = { @@ -2063,7 +2062,7 @@ const struct font_desc font_ter_16x32 = .name = "TER16x32", .width = 16, .height = 32, - .data = fontdata_ter16x32, + .data = fontdata_ter16x32.data, #ifdef __sparc__ .pref = 5, #else
From: Peilin Ye yepeilin.cs@gmail.com
commit 5af08640795b2b9a940c9266c0260455377ae262 upstream.
fbcon_get_font() is reading out-of-bounds. A malicious user may resize `vc->vc_font.height` to a large value, causing fbcon_get_font() to read out of `fontdata`.
fbcon_get_font() handles both built-in and user-provided fonts. Fortunately, recently we have added FONT_EXTRA_WORDS support for built-in fonts, so fix it by adding range checks using FNTSIZE().
This patch depends on patch "fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h", and patch "Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts".
Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+29d4ed7f3bdedf2aa2fd@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=08b8be45afea11888776f897895aef9ad1c3ecf... Signed-off-by: Peilin Ye yepeilin.cs@gmail.com Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/b34544687a1a09d6de630659eb7a77... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/video/fbdev/core/fbcon.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/drivers/video/fbdev/core/fbcon.c +++ b/drivers/video/fbdev/core/fbcon.c @@ -2299,6 +2299,9 @@ static int fbcon_get_font(struct vc_data
if (font->width <= 8) { j = vc->vc_font.height; + if (font->charcount * j > FNTSIZE(fontdata)) + return -EINVAL; + for (i = 0; i < font->charcount; i++) { memcpy(data, fontdata, j); memset(data + j, 0, 32 - j); @@ -2307,6 +2310,9 @@ static int fbcon_get_font(struct vc_data } } else if (font->width <= 16) { j = vc->vc_font.height * 2; + if (font->charcount * j > FNTSIZE(fontdata)) + return -EINVAL; + for (i = 0; i < font->charcount; i++) { memcpy(data, fontdata, j); memset(data + j, 0, 64 - j); @@ -2314,6 +2320,9 @@ static int fbcon_get_font(struct vc_data fontdata += j; } } else if (font->width <= 24) { + if (font->charcount * (vc->vc_font.height * sizeof(u32)) > FNTSIZE(fontdata)) + return -EINVAL; + for (i = 0; i < font->charcount; i++) { for (j = 0; j < vc->vc_font.height; j++) { *data++ = fontdata[0]; @@ -2326,6 +2335,9 @@ static int fbcon_get_font(struct vc_data } } else { j = vc->vc_font.height * 4; + if (font->charcount * j > FNTSIZE(fontdata)) + return -EINVAL; + for (i = 0; i < font->charcount; i++) { memcpy(data, fontdata, j); memset(data + j, 0, 128 - j);
From: Geert Uytterhoeven geert+renesas@glider.be
commit 77972b55fb9d35d4a6b0abca99abffaa4ec6a85b upstream.
This reverts commit 1838d6c62f57836639bd3d83e7855e0ee4f6defc.
This commit moved the ravb_mdio_init() call (and thus the of_mdiobus_register() call) from the ravb_probe() to the ravb_open() call. This causes a regression during system resume (s2idle/s2ram), as new PHY devices cannot be bound while suspended.
During boot, the Micrel PHY is detected like this:
Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached PHY driver [Micrel KSZ9031 Gigabit PHY] (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=228) ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
During system suspend, (A) defer_all_probes is set to true, and (B) usermodehelper_disabled is set to UMH_DISABLED, to avoid drivers being probed while suspended.
A. If CONFIG_MODULES=n, phy_device_register() calling device_add() merely adds the device, but does not probe it yet, as really_probe() returns early due to defer_all_probes being set:
dpm_resume+0x128/0x4f8 device_resume+0xcc/0x1b0 dpm_run_callback+0x74/0x340 ravb_resume+0x190/0x1b8 ravb_open+0x84/0x770 of_mdiobus_register+0x1e0/0x468 of_mdiobus_register_phy+0x1b8/0x250 of_mdiobus_phy_device_register+0x178/0x1e8 phy_device_register+0x114/0x1b8 device_add+0x3d4/0x798 bus_probe_device+0x98/0xa0 device_initial_probe+0x10/0x18 __device_attach+0xe4/0x140 bus_for_each_drv+0x64/0xc8 __device_attach_driver+0xb8/0xe0 driver_probe_device.part.11+0xc4/0xd8 really_probe+0x32c/0x3b8
Later, phy_attach_direct() notices no PHY driver has been bound, and falls back to the Generic PHY, leading to degraded operation:
Generic PHY e6800000.ethernet-ffffffff:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=POLL) ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
B. If CONFIG_MODULES=y, request_module() returns early with -EBUSY due to UMH_DISABLED, and MDIO initialization fails completely:
mdio_bus e6800000.ethernet-ffffffff:00: error -16 loading PHY driver module for ID 0x00221622 ravb e6800000.ethernet eth0: failed to initialize MDIO PM: dpm_run_callback(): ravb_resume+0x0/0x1b8 returns -16 PM: Device e6800000.ethernet failed to resume: error -16
Ignoring -EBUSY in phy_request_driver_module(), like was done for -ENOENT in commit 21e194425abd65b5 ("net: phy: fix issue with loading PHY driver w/o initramfs"), would makes it fall back to the Generic PHY, like in the CONFIG_MODULES=n case.
Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Cc: stable@vger.kernel.org Reviewed-by: Sergei Shtylyov sergei.shtylyov@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/renesas/ravb_main.c | 110 +++++++++++++++---------------- 1 file changed, 55 insertions(+), 55 deletions(-)
--- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -1342,51 +1342,6 @@ static inline int ravb_hook_irq(unsigned return error; }
-/* MDIO bus init function */ -static int ravb_mdio_init(struct ravb_private *priv) -{ - struct platform_device *pdev = priv->pdev; - struct device *dev = &pdev->dev; - int error; - - /* Bitbang init */ - priv->mdiobb.ops = &bb_ops; - - /* MII controller setting */ - priv->mii_bus = alloc_mdio_bitbang(&priv->mdiobb); - if (!priv->mii_bus) - return -ENOMEM; - - /* Hook up MII support for ethtool */ - priv->mii_bus->name = "ravb_mii"; - priv->mii_bus->parent = dev; - snprintf(priv->mii_bus->id, MII_BUS_ID_SIZE, "%s-%x", - pdev->name, pdev->id); - - /* Register MDIO bus */ - error = of_mdiobus_register(priv->mii_bus, dev->of_node); - if (error) - goto out_free_bus; - - return 0; - -out_free_bus: - free_mdio_bitbang(priv->mii_bus); - return error; -} - -/* MDIO bus release function */ -static int ravb_mdio_release(struct ravb_private *priv) -{ - /* Unregister mdio bus */ - mdiobus_unregister(priv->mii_bus); - - /* Free bitbang info */ - free_mdio_bitbang(priv->mii_bus); - - return 0; -} - /* Network device open function for Ethernet AVB */ static int ravb_open(struct net_device *ndev) { @@ -1395,13 +1350,6 @@ static int ravb_open(struct net_device * struct device *dev = &pdev->dev; int error;
- /* MDIO bus init */ - error = ravb_mdio_init(priv); - if (error) { - netdev_err(ndev, "failed to initialize MDIO\n"); - return error; - } - napi_enable(&priv->napi[RAVB_BE]); napi_enable(&priv->napi[RAVB_NC]);
@@ -1479,7 +1427,6 @@ out_free_irq: out_napi_off: napi_disable(&priv->napi[RAVB_NC]); napi_disable(&priv->napi[RAVB_BE]); - ravb_mdio_release(priv); return error; }
@@ -1789,8 +1736,6 @@ static int ravb_close(struct net_device ravb_ring_free(ndev, RAVB_BE); ravb_ring_free(ndev, RAVB_NC);
- ravb_mdio_release(priv); - return 0; }
@@ -1942,6 +1887,51 @@ static const struct net_device_ops ravb_ .ndo_set_features = ravb_set_features, };
+/* MDIO bus init function */ +static int ravb_mdio_init(struct ravb_private *priv) +{ + struct platform_device *pdev = priv->pdev; + struct device *dev = &pdev->dev; + int error; + + /* Bitbang init */ + priv->mdiobb.ops = &bb_ops; + + /* MII controller setting */ + priv->mii_bus = alloc_mdio_bitbang(&priv->mdiobb); + if (!priv->mii_bus) + return -ENOMEM; + + /* Hook up MII support for ethtool */ + priv->mii_bus->name = "ravb_mii"; + priv->mii_bus->parent = dev; + snprintf(priv->mii_bus->id, MII_BUS_ID_SIZE, "%s-%x", + pdev->name, pdev->id); + + /* Register MDIO bus */ + error = of_mdiobus_register(priv->mii_bus, dev->of_node); + if (error) + goto out_free_bus; + + return 0; + +out_free_bus: + free_mdio_bitbang(priv->mii_bus); + return error; +} + +/* MDIO bus release function */ +static int ravb_mdio_release(struct ravb_private *priv) +{ + /* Unregister mdio bus */ + mdiobus_unregister(priv->mii_bus); + + /* Free bitbang info */ + free_mdio_bitbang(priv->mii_bus); + + return 0; +} + static const struct of_device_id ravb_match_table[] = { { .compatible = "renesas,etheravb-r8a7790", .data = (void *)RCAR_GEN2 }, { .compatible = "renesas,etheravb-r8a7794", .data = (void *)RCAR_GEN2 }, @@ -2184,6 +2174,13 @@ static int ravb_probe(struct platform_de eth_hw_addr_random(ndev); }
+ /* MDIO bus init */ + error = ravb_mdio_init(priv); + if (error) { + dev_err(&pdev->dev, "failed to initialize MDIO\n"); + goto out_dma_free; + } + netif_napi_add(ndev, &priv->napi[RAVB_BE], ravb_poll, 64); netif_napi_add(ndev, &priv->napi[RAVB_NC], ravb_poll, 64);
@@ -2205,6 +2202,8 @@ static int ravb_probe(struct platform_de out_napi_del: netif_napi_del(&priv->napi[RAVB_NC]); netif_napi_del(&priv->napi[RAVB_BE]); + ravb_mdio_release(priv); +out_dma_free: dma_free_coherent(ndev->dev.parent, priv->desc_bat_size, priv->desc_bat, priv->desc_bat_dma);
@@ -2236,6 +2235,7 @@ static int ravb_remove(struct platform_d unregister_netdev(ndev); netif_napi_del(&priv->napi[RAVB_NC]); netif_napi_del(&priv->napi[RAVB_BE]); + ravb_mdio_release(priv); pm_runtime_disable(&pdev->dev); free_netdev(ndev); platform_set_drvdata(pdev, NULL);
From: Daniel Borkmann daniel@iogearbox.net
commit 5b9fbeb75b6a98955f628e205ac26689bcb1383e upstream.
Simon reported an issue with the current scalar32_min_max_or() implementation. That is, compared to the other 32 bit subreg tracking functions, the code in scalar32_min_max_or() stands out that it's using the 64 bit registers instead of 32 bit ones. This leads to bounds tracking issues, for example:
[...] 8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm 8: (79) r1 = *(u64 *)(r0 +0) R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm 9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm 9: (b7) r0 = 1 10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm 10: (18) r2 = 0x600000002 12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 12: (ad) if r1 < r2 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 13: (95) exit 14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 14: (25) if r1 > 0x0 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 15: (95) exit 16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 16: (47) r1 |= 0 17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x1; 0x700000000),s32_max_value=1,u32_max_value=1) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm [...]
The bound tests on the map value force the upper unsigned bound to be 25769803777 in 64 bit (0b11000000000000000000000000000000001) and then lower one to be 1. By using OR they are truncated and thus result in the range [1,1] for the 32 bit reg tracker. This is incorrect given the only thing we know is that the value must be positive and thus 2147483647 (0b1111111111111111111111111111111) at max for the subregs. Fix it by using the {u,s}32_{min,max}_value vars instead. This also makes sense, for example, for the case where we update dst_reg->s32_{min,max}_value in the else branch we need to use the newly computed dst_reg->u32_{min,max}_value as we know that these are positive. Previously, in the else branch the 64 bit values of umin_value=1 and umax_value=32212254719 were used and latter got truncated to be 1 as upper bound there. After the fix the subreg range is now correct:
[...] 8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm 8: (79) r1 = *(u64 *)(r0 +0) R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm 9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm 9: (b7) r0 = 1 10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm 10: (18) r2 = 0x600000002 12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 12: (ad) if r1 < r2 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 13: (95) exit 14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 14: (25) if r1 > 0x0 goto pc+1 R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 15: (95) exit 16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm 16: (47) r1 |= 0 17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm [...]
Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Reported-by: Simon Scannell scannell.smn@gmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: John Fastabend john.fastabend@gmail.com Acked-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/bpf/verifier.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5490,8 +5490,8 @@ static void scalar32_min_max_or(struct b bool src_known = tnum_subreg_is_const(src_reg->var_off); bool dst_known = tnum_subreg_is_const(dst_reg->var_off); struct tnum var32_off = tnum_subreg(dst_reg->var_off); - s32 smin_val = src_reg->smin_value; - u32 umin_val = src_reg->umin_value; + s32 smin_val = src_reg->s32_min_value; + u32 umin_val = src_reg->u32_min_value;
/* Assuming scalar64_min_max_or will be called so it is safe * to skip updating register for known case. @@ -5514,8 +5514,8 @@ static void scalar32_min_max_or(struct b /* ORing two positives gives a positive, so safe to * cast result into s64. */ - dst_reg->s32_min_value = dst_reg->umin_value; - dst_reg->s32_max_value = dst_reg->umax_value; + dst_reg->s32_min_value = dst_reg->u32_min_value; + dst_reg->s32_max_value = dst_reg->u32_max_value; } }
From: Jeremy Linton jeremy.linton@arm.com
commit 39e4716caa598a07a98598b2e7cd03055ce25fb9 upstream.
The AES code uses a 'br x7' as part of a function called by a macro. That branch needs a bti_j as a target. This results in a panic as seen below. Using x16 (or x17) with an indirect branch keeps the target bti_c.
Bad mode in Synchronous Abort handler detected on CPU1, code 0x34000003 -- BTI CPU: 1 PID: 265 Comm: cryptomgr_test Not tainted 5.8.11-300.fc33.aarch64 #1 pstate: 20400c05 (nzCv daif +PAN -UAO BTYPE=j-) pc : aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs] lr : aesbs_xts_encrypt+0x48/0xe0 [aes_neon_bs] sp : ffff80001052b730
aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs] __xts_crypt+0xb0/0x2dc [aes_neon_bs] xts_encrypt+0x28/0x3c [aes_neon_bs] crypto_skcipher_encrypt+0x50/0x84 simd_skcipher_encrypt+0xc8/0xe0 crypto_skcipher_encrypt+0x50/0x84 test_skcipher_vec_cfg+0x224/0x5f0 test_skcipher+0xbc/0x120 alg_test_skcipher+0xa0/0x1b0 alg_test+0x3dc/0x47c cryptomgr_test+0x38/0x60
Fixes: 0e89640b640d ("crypto: arm64 - Use modern annotations for assembly functions") Cc: stable@vger.kernel.org # 5.6.x- Signed-off-by: Jeremy Linton jeremy.linton@arm.com Suggested-by: Dave P Martin Dave.Martin@arm.com Reviewed-by: Ard Biesheuvel ardb@kernel.org Reviewed-by: Mark Brown broonie@kernel.org Link: https://lore.kernel.org/r/20201006163326.2780619-1-jeremy.linton@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/crypto/aes-neonbs-core.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/crypto/aes-neonbs-core.S +++ b/arch/arm64/crypto/aes-neonbs-core.S @@ -788,7 +788,7 @@ SYM_FUNC_START_LOCAL(__xts_crypt8)
0: mov bskey, x21 mov rounds, x22 - br x7 + br x16 SYM_FUNC_END(__xts_crypt8)
.macro __xts_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7 @@ -806,7 +806,7 @@ SYM_FUNC_END(__xts_crypt8) uzp1 v30.4s, v30.4s, v25.4s ld1 {v25.16b}, [x24]
-99: adr x7, \do8 +99: adr x16, \do8 bl __xts_crypt8
ldp q16, q17, [sp, #.Lframe_local_offset]
From: Namjae Jeon namjae.jeon@samsung.com
commit 8ff006e57ad3a25f909c456d053aa498b6673a39 upstream.
syzbot reported warning message:
Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1d6/0x29e lib/dump_stack.c:118 register_lock_class+0xf06/0x1520 kernel/locking/lockdep.c:893 __lock_acquire+0xfd/0x2ae0 kernel/locking/lockdep.c:4320 lock_acquire+0x148/0x720 kernel/locking/lockdep.c:5029 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:354 [inline] exfat_cache_inval_inode+0x30/0x280 fs/exfat/cache.c:226 exfat_evict_inode+0x124/0x270 fs/exfat/inode.c:660 evict+0x2bb/0x6d0 fs/inode.c:576 exfat_fill_super+0x1e07/0x27d0 fs/exfat/super.c:681 get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342 vfs_get_tree+0x88/0x270 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x179d/0x29e0 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount+0x126/0x180 fs/namespace.c:3390 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9
If exfat_read_root() returns an error, spinlock is used in exfat_evict_inode() without initialization. This patch combines exfat_cache_init_inode() with exfat_inode_init_once() to initialize spinlock by slab constructor.
Fixes: c35b6810c495 ("exfat: add exfat cache") Cc: stable@vger.kernel.org # v5.7+ Reported-by: syzbot syzbot+b91107320911a26c9a95@syzkaller.appspotmail.com Signed-off-by: Namjae Jeon namjae.jeon@samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/exfat/cache.c | 11 ----------- fs/exfat/exfat_fs.h | 3 ++- fs/exfat/inode.c | 2 -- fs/exfat/super.c | 5 ++++- 4 files changed, 6 insertions(+), 15 deletions(-)
--- a/fs/exfat/cache.c +++ b/fs/exfat/cache.c @@ -17,7 +17,6 @@ #include "exfat_raw.h" #include "exfat_fs.h"
-#define EXFAT_CACHE_VALID 0 #define EXFAT_MAX_CACHE 16
struct exfat_cache { @@ -61,16 +60,6 @@ void exfat_cache_shutdown(void) kmem_cache_destroy(exfat_cachep); }
-void exfat_cache_init_inode(struct inode *inode) -{ - struct exfat_inode_info *ei = EXFAT_I(inode); - - spin_lock_init(&ei->cache_lru_lock); - ei->nr_caches = 0; - ei->cache_valid_id = EXFAT_CACHE_VALID + 1; - INIT_LIST_HEAD(&ei->cache_lru); -} - static inline struct exfat_cache *exfat_cache_alloc(void) { return kmem_cache_alloc(exfat_cachep, GFP_NOFS); --- a/fs/exfat/exfat_fs.h +++ b/fs/exfat/exfat_fs.h @@ -250,6 +250,8 @@ struct exfat_sb_info { struct rcu_head rcu; };
+#define EXFAT_CACHE_VALID 0 + /* * EXFAT file system inode in-memory data */ @@ -429,7 +431,6 @@ extern const struct dentry_operations ex /* cache.c */ int exfat_cache_init(void); void exfat_cache_shutdown(void); -void exfat_cache_init_inode(struct inode *inode); void exfat_cache_inval_inode(struct inode *inode); int exfat_get_cluster(struct inode *inode, unsigned int cluster, unsigned int *fclus, unsigned int *dclus, --- a/fs/exfat/inode.c +++ b/fs/exfat/inode.c @@ -610,8 +610,6 @@ static int exfat_fill_inode(struct inode ei->i_crtime = info->crtime; inode->i_atime = info->atime;
- exfat_cache_init_inode(inode); - return 0; }
--- a/fs/exfat/super.c +++ b/fs/exfat/super.c @@ -361,7 +361,6 @@ static int exfat_read_root(struct inode inode->i_mtime = inode->i_atime = inode->i_ctime = ei->i_crtime = current_time(inode); exfat_truncate_atime(&inode->i_atime); - exfat_cache_init_inode(inode); return 0; }
@@ -747,6 +746,10 @@ static void exfat_inode_init_once(void * { struct exfat_inode_info *ei = (struct exfat_inode_info *)foo;
+ spin_lock_init(&ei->cache_lru_lock); + ei->nr_caches = 0; + ei->cache_valid_id = EXFAT_CACHE_VALID + 1; + INIT_LIST_HEAD(&ei->cache_lru); INIT_HLIST_NODE(&ei->i_hash_fat); inode_init_once(&ei->vfs_inode); }
From: Anant Thazhemadam anant.thazhemadam@gmail.com
commit 3dc289f8f139997f4e9d3cfccf8738f20d23e47b upstream.
In nl80211_parse_key(), key.idx is first initialized as -1. If this value of key.idx remains unmodified and gets returned, and nl80211_key_allowed() also returns 0, then rdev_del_key() gets called with key.idx = -1. This causes an out-of-bounds array access.
Handle this issue by checking if the value of key.idx after nl80211_parse_key() is called and return -EINVAL if key.idx < 0.
Cc: stable@vger.kernel.org Reported-by: syzbot+b1bb342d1d097516cbda@syzkaller.appspotmail.com Tested-by: syzbot+b1bb342d1d097516cbda@syzkaller.appspotmail.com Signed-off-by: Anant Thazhemadam anant.thazhemadam@gmail.com Link: https://lore.kernel.org/r/20201007035401.9522-1-anant.thazhemadam@gmail.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/wireless/nl80211.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -4172,6 +4172,9 @@ static int nl80211_del_key(struct sk_buf if (err) return err;
+ if (key.idx < 0) + return -EINVAL; + if (info->attrs[NL80211_ATTR_MAC]) mac_addr = nla_data(info->attrs[NL80211_ATTR_MAC]);
From: Karol Herbst kherbst@redhat.com
commit c3e0276c31ca8c7b8615da890727481260d4676f upstream.
Previously the code relied on device->pri to be NULL and to fail probing later. We really should just return an error inside nvkm_device_ctor for unsupported GPUs.
Fixes: 24d5ff40a732 ("drm/nouveau/device: rework mmio mapping code to get rid of second map")
Signed-off-by: Karol Herbst kherbst@redhat.com Cc: dann frazier dann.frazier@canonical.com Cc: dri-devel dri-devel@lists.freedesktop.org Cc: Dave Airlie airlied@redhat.com Cc: stable@vger.kernel.org Reviewed-by: Jeremy Cline jcline@redhat.com Signed-off-by: Dave Airlie airlied@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20201006220528.13925-1-kherbst... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/nouveau/nvkm/engine/device/base.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c @@ -3149,6 +3149,7 @@ nvkm_device_ctor(const struct nvkm_devic case 0x168: device->chip = &nv168_chipset; break; default: nvdev_error(device, "unknown chipset (%08x)\n", boot0); + ret = -ENODEV; goto done; }
From: Karol Herbst kherbst@redhat.com
commit d10285a25e29f13353bbf7760be8980048c1ef2f upstream.
other drivers seems to do something similar
Signed-off-by: Karol Herbst kherbst@redhat.com Cc: dri-devel dri-devel@lists.freedesktop.org Cc: Dave Airlie airlied@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie airlied@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20201006220528.13925-2-kherbst... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/nouveau/nouveau_mem.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/gpu/drm/nouveau/nouveau_mem.c +++ b/drivers/gpu/drm/nouveau/nouveau_mem.c @@ -176,6 +176,8 @@ void nouveau_mem_del(struct ttm_mem_reg *reg) { struct nouveau_mem *mem = nouveau_mem(reg); + if (!mem) + return; nouveau_mem_fini(mem); kfree(reg->mm_node); reg->mm_node = NULL;
From: Christoph Hellwig hch@lst.de
commit 7370997d48520ad923e8eb4deb59ebf290396202 upstream.
Don't error out if the dasd_biodasdinfo symbol is not available.
Cc: stable@vger.kernel.org Fixes: 26d7e28e3820 ("s390/dasd: remove ioctl_by_bdev calls") Reported-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Christoph Hellwig hch@lst.de Tested-by: Christian Borntraeger borntraeger@de.ibm.com Reviewed-by: Stefan Haberland sth@linux.ibm.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/partitions/ibm.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/block/partitions/ibm.c +++ b/block/partitions/ibm.c @@ -305,8 +305,6 @@ int ibm_partition(struct parsed_partitio if (!disk->fops->getgeo) goto out_exit; fn = symbol_get(dasd_biodasdinfo); - if (!fn) - goto out_exit; blocksize = bdev_logical_block_size(bdev); if (blocksize <= 0) goto out_symbol; @@ -326,7 +324,7 @@ int ibm_partition(struct parsed_partitio geo->start = get_start_sect(bdev); if (disk->fops->getgeo(bdev, geo)) goto out_freeall; - if (fn(disk, info)) { + if (!fn || fn(disk, info)) { kfree(info); info = NULL; } @@ -370,7 +368,8 @@ out_nolab: out_nogeo: kfree(info); out_symbol: - symbol_put(dasd_biodasdinfo); + if (fn) + symbol_put(dasd_biodasdinfo); out_exit: return res; }
From: Peilin Ye yepeilin.cs@gmail.com
commit 6d53a9fe5a1983490bc14b3a64d49fabb4ccc651 upstream.
scsi_put_cdrom_generic_arg() is copying uninitialized stack memory to userspace, since the compiler may leave a 3-byte hole in the middle of `cgc32`. Fix it by adding a padding field to `struct compat_cdrom_generic_command`.
Cc: stable@vger.kernel.org Fixes: f3ee6e63a9df ("compat_ioctl: move CDROM_SEND_PACKET handling into scsi") Suggested-by: Dan Carpenter dan.carpenter@oracle.com Suggested-by: Arnd Bergmann arnd@arndb.de Reported-by: syzbot+85433a479a646a064ab3@syzkaller.appspotmail.com Signed-off-by: Peilin Ye yepeilin.cs@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/scsi_ioctl.c | 1 + 1 file changed, 1 insertion(+)
--- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -651,6 +651,7 @@ struct compat_cdrom_generic_command { compat_int_t stat; compat_caddr_t sense; unsigned char data_direction; + unsigned char pad[3]; compat_int_t quiet; compat_int_t timeout; compat_caddr_t reserved[1];
From: Greg Kurz groug@kaod.org
commit 0210a8db2aeca393fb3067e234967877e3146266 upstream.
When the IOTLB device is enabled, the vring addresses we get from userspace are GIOVAs. It is thus wrong to pass them down to access_ok() which only takes HVAs.
Access validation is done at prefetch time with IOTLB. Teach vq_access_ok() about that by moving the (vq->iotlb) check from vhost_vq_access_ok() to vq_access_ok(). This prevents vhost_vring_set_addr() to fail when verifying the accesses. No behavior change for vhost_vq_access_ok().
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1883084 Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API") Cc: jasowang@redhat.com CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Greg Kurz groug@kaod.org Acked-by: Jason Wang jasowang@redhat.com Link: https://lore.kernel.org/r/160171931213.284610.2052489816407219136.stgit@bahi... Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/vhost/vhost.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1283,6 +1283,11 @@ static bool vq_access_ok(struct vhost_vi vring_used_t __user *used)
{ + /* If an IOTLB device is present, the vring addresses are + * GIOVAs. Access validation occurs at prefetch time. */ + if (vq->iotlb) + return true; + return access_ok(desc, vhost_get_desc_size(vq, num)) && access_ok(avail, vhost_get_avail_size(vq, num)) && access_ok(used, vhost_get_used_size(vq, num)); @@ -1376,10 +1381,6 @@ bool vhost_vq_access_ok(struct vhost_vir if (!vq_log_access_ok(vq, vq->log_base)) return false;
- /* Access validation occurs at prefetch time with IOTLB */ - if (vq->iotlb) - return true; - return vq_access_ok(vq, vq->num, vq->desc, vq->avail, vq->used); } EXPORT_SYMBOL_GPL(vhost_vq_access_ok);
From: Greg Kurz groug@kaod.org
commit 71878fa46c7e3b40fa7b3f1b6e4ba3f92f1ac359 upstream.
The open-coded computation of the used size doesn't take the event into account when the VIRTIO_RING_F_EVENT_IDX feature is present. Fix that by using vhost_get_used_size().
Fixes: 8ea8cf89e19a ("vhost: support event index") Cc: stable@vger.kernel.org Signed-off-by: Greg Kurz groug@kaod.org Link: https://lore.kernel.org/r/160171932300.284610.11846106312938909461.stgit@bah... Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/vhost/vhost.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1512,8 +1512,7 @@ static long vhost_vring_set_addr(struct /* Also validate log access for used ring if enabled. */ if ((a.flags & (0x1 << VHOST_VRING_F_LOG)) && !log_access_ok(vq->log_base, a.log_guest_addr, - sizeof *vq->used + - vq->num * sizeof *vq->used->ring)) + vhost_get_used_size(vq, vq->num))) return -EINVAL; }
From: Linus Torvalds torvalds@linux-foundation.org
commit 4013c1496c49615d90d36b9d513eee8e369778e9 upstream.
Kernel threads intentionally do CLONE_FS in order to follow any changes that 'init' does to set up the root directory (or cwd).
It is admittedly a bit odd, but it avoids the situation where 'init' does some extensive setup to initialize the system environment, and then we execute a usermode helper program, and it uses the original FS setup from boot time that may be very limited and incomplete.
[ Both Al Viro and Eric Biederman point out that 'pivot_root()' will follow the root regardless, since it fixes up other users of root (see chroot_fs_refs() for details), but overmounting root and doing a chroot() would not. ]
However, Vegard Nossum noticed that the CLONE_FS not only means that we follow the root and current working directories, it also means we share umask with whatever init changed it to. That wasn't intentional.
Just reset umask to the original default (0022) before actually starting the usermode helper program.
Reported-by: Vegard Nossum vegard.nossum@oracle.com Cc: Al Viro viro@zeniv.linux.org.uk Acked-by: Eric W. Biederman ebiederm@xmission.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/umh.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/kernel/umh.c +++ b/kernel/umh.c @@ -14,6 +14,7 @@ #include <linux/cred.h> #include <linux/file.h> #include <linux/fdtable.h> +#include <linux/fs_struct.h> #include <linux/workqueue.h> #include <linux/security.h> #include <linux/mount.h> @@ -76,6 +77,14 @@ static int call_usermodehelper_exec_asyn spin_unlock_irq(¤t->sighand->siglock);
/* + * Initial kernel threads share ther FS with init, in order to + * get the init root directory. But we've now created a new + * thread that is going to execve a user process and has its own + * 'struct fs_struct'. Reset umask to the default. + */ + current->fs->umask = 0022; + + /* * Our parent (unbound workqueue) runs with elevated scheduling * priority. Avoid propagating that into the userspace child. */
From: Linus Torvalds torvalds@linux-foundation.org
commit d1a819a2ec2d3b5e6a8f8a9f67386bda0ad315bc upstream.
Tetsuo Handa reports that splice() can return 0 before the real EOF, if the data in the splice source pipe is an empty pipe buffer. That empty pipe buffer case doesn't happen in any normal situation, but you can trigger it by doing a write to a pipe that fails due to a page fault.
Tetsuo has a test-case to show the behavior:
#define _GNU_SOURCE #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h>
int main(int argc, char *argv[]) { const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600); int pipe_fd[2] = { -1, -1 }; pipe(pipe_fd); write(pipe_fd[1], NULL, 4096); /* This splice() should wait unless interrupted. */ return !splice(pipe_fd[0], NULL, fd, NULL, 65536, 0); }
which results in
write(5, NULL, 4096) = -1 EFAULT (Bad address) splice(4, NULL, 3, NULL, 65536, 0) = 0
and this can confuse splice() users into believing they have hit EOF prematurely.
The issue was introduced when the pipe write code started pre-allocating the pipe buffers before copying data from user space.
This is modified verion of Tetsuo's original patch.
Fixes: a194dfe6e6f6 ("pipe: Rearrange sequence in pipe_write() to preallocate slot") Link:https://lore.kernel.org/linux-fsdevel/20201005121339.4063-1-penguin-kernel@I... Reported-by: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp Acked-by: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/splice.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
--- a/fs/splice.c +++ b/fs/splice.c @@ -526,6 +526,22 @@ static int splice_from_pipe_feed(struct return 1; }
+/* We know we have a pipe buffer, but maybe it's empty? */ +static inline bool eat_empty_buffer(struct pipe_inode_info *pipe) +{ + unsigned int tail = pipe->tail; + unsigned int mask = pipe->ring_size - 1; + struct pipe_buffer *buf = &pipe->bufs[tail & mask]; + + if (unlikely(!buf->len)) { + pipe_buf_release(pipe, buf); + pipe->tail = tail+1; + return true; + } + + return false; +} + /** * splice_from_pipe_next - wait for some data to splice from * @pipe: pipe to splice from @@ -545,6 +561,7 @@ static int splice_from_pipe_next(struct if (signal_pending(current)) return -ERESTARTSYS;
+repeat: while (pipe_empty(pipe->head, pipe->tail)) { if (!pipe->writers) return 0; @@ -566,6 +583,9 @@ static int splice_from_pipe_next(struct pipe_wait_readable(pipe); }
+ if (eat_empty_buffer(pipe)) + goto repeat; + return 1; }
From: Dinghao Liu dinghao.liu@zju.edu.cn
commit 4fd9ac6bd3044734a7028bd993944c3617d1eede upstream.
When devm_regulator_register() fails, ec should be freed just like when olpc_ec_cmd() fails.
Fixes: 231c0c216172a ("Platform: OLPC: Add a regulator for the DCON") Signed-off-by: Dinghao Liu dinghao.liu@zju.edu.cn Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/platform/olpc/olpc-ec.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/platform/olpc/olpc-ec.c +++ b/drivers/platform/olpc/olpc-ec.c @@ -439,7 +439,9 @@ static int olpc_ec_probe(struct platform &config); if (IS_ERR(ec->dcon_rdev)) { dev_err(&pdev->dev, "failed to register DCON regulator\n"); - return PTR_ERR(ec->dcon_rdev); + err = PTR_ERR(ec->dcon_rdev); + kfree(ec); + return err; }
ec->dbgfs_dir = olpc_ec_setup_debugfs();
From: Hans de Goede hdegoede@redhat.com
commit d823346876a970522ff9e4d2b323c9b734dcc4de upstream.
Commit cfae58ed681c ("platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type") restored SW_TABLET_MODE reporting on the HP stream x360 11 series on which it was previously broken by commit de9647efeaa9 ("platform/x86: intel-vbtn: Only activate tablet mode switch on 2-in-1's").
It turns out that enabling SW_TABLET_MODE reporting on devices with a chassis-type of 10 ("Notebook") causes SW_TABLET_MODE to always report 1 at boot on the HP Pavilion 11 x360, which causes libinput to disable the kbd and touchpad.
The HP Pavilion 11 x360's ACPI VGBS method sets bit 4 instead of bit 6 when NOT in tablet mode at boot. Inspecting all the DSDTs in my DSDT collection shows only one other model, the Medion E1239T ever setting bit 4 and it always sets this together with bit 6.
So lets treat bit 4 as a second bit which when set indicates the device not being in tablet-mode, as we already do for bit 6.
While at it also prefix all VGBS constant defines with "VGBS_".
Fixes: cfae58ed681c ("platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type") Signed-off-by: Hans de Goede hdegoede@redhat.com Acked-by: Mark Gross mgross@linux.intel.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/platform/x86/intel-vbtn.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/drivers/platform/x86/intel-vbtn.c +++ b/drivers/platform/x86/intel-vbtn.c @@ -15,9 +15,13 @@ #include <linux/platform_device.h> #include <linux/suspend.h>
+/* Returned when NOT in tablet mode on some HP Stream x360 11 models */ +#define VGBS_TABLET_MODE_FLAG_ALT 0x10 /* When NOT in tablet mode, VGBS returns with the flag 0x40 */ -#define TABLET_MODE_FLAG 0x40 -#define DOCK_MODE_FLAG 0x80 +#define VGBS_TABLET_MODE_FLAG 0x40 +#define VGBS_DOCK_MODE_FLAG 0x80 + +#define VGBS_TABLET_MODE_FLAGS (VGBS_TABLET_MODE_FLAG | VGBS_TABLET_MODE_FLAG_ALT)
MODULE_LICENSE("GPL"); MODULE_AUTHOR("AceLan Kao"); @@ -72,9 +76,9 @@ static void detect_tablet_mode(struct pl if (ACPI_FAILURE(status)) return;
- m = !(vgbs & TABLET_MODE_FLAG); + m = !(vgbs & VGBS_TABLET_MODE_FLAGS); input_report_switch(priv->input_dev, SW_TABLET_MODE, m); - m = (vgbs & DOCK_MODE_FLAG) ? 1 : 0; + m = (vgbs & VGBS_DOCK_MODE_FLAG) ? 1 : 0; input_report_switch(priv->input_dev, SW_DOCK, m); }
From: Tom Rix trix@redhat.com
commit 5f38b06db8af3ed6c2fc1b427504ca56fae2eacc upstream.
clang static analysis flags this represenative problem thinkpad_acpi.c:2523:7: warning: Branch condition evaluates to a garbage value if (!oldn->mute || ^~~~~~~~~~~
In hotkey_kthread() mute is conditionally set by hotkey_read_nvram() but unconditionally checked by hotkey_compare_and_issue_event(). So the tp_nvram_state variable s[2] needs to be initialized.
Fixes: 01e88f25985d ("ACPI: thinkpad-acpi: add CMOS NVRAM polling for hot keys (v9)") Signed-off-by: Tom Rix trix@redhat.com Reviewed-by: Hans de Goede hdegoede@redhat.com Acked-by: mark gross mgross@linux.intel.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/platform/x86/thinkpad_acpi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -2569,7 +2569,7 @@ static void hotkey_compare_and_issue_eve */ static int hotkey_kthread(void *data) { - struct tp_nvram_state s[2]; + struct tp_nvram_state s[2] = { 0 }; u32 poll_mask, event_mask; unsigned int si, so; unsigned long t;
From: Hans de Goede hdegoede@redhat.com
commit 1797d588af15174d4a4e7159dac8c800538e4f8c upstream.
Commit b0dbd97de1f1 ("platform/x86: asus-wmi: Add support for SW_TABLET_MODE") added support for reporting SW_TABLET_MODE using the Asus 0x00120063 WMI-device-id to see if various transformer models were docked into their keyboard-dock (SW_TABLET_MODE=0) or if they were being used as a tablet.
The new SW_TABLET_MODE support (naively?) assumed that non Transformer devices would either not support the 0x00120063 WMI-device-id at all, or would NOT set ASUS_WMI_DSTS_PRESENCE_BIT in their reply when querying the device-id.
Unfortunately this is not true and we have received many bug reports about this change causing the asus-wmi driver to always report SW_TABLET_MODE=1 on non Transformer devices. This causes libinput to think that these are 360 degree hinges style 2-in-1s folded into tablet-mode. Making libinput suppress keyboard and touchpad events from the builtin keyboard and touchpad. So effectively this causes the keyboard and touchpad to not work on many non Transformer Asus models.
This commit fixes this by using the existing DMI based quirk mechanism in asus-nb-wmi.c to allow using the 0x00120063 device-id for reporting SW_TABLET_MODE on Transformer models and ignoring it on all other models.
Fixes: b0dbd97de1f1 ("platform/x86: asus-wmi: Add support for SW_TABLET_MODE") Link: https://patchwork.kernel.org/patch/11780901/ BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209011 BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1876997 Reported-by: Samuel Čavoj samuel@cavoj.net Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/platform/x86/asus-nb-wmi.c | 32 ++++++++++++++++++++++++++++++++ drivers/platform/x86/asus-wmi.c | 16 +++++++++------- drivers/platform/x86/asus-wmi.h | 1 + 3 files changed, 42 insertions(+), 7 deletions(-)
--- a/drivers/platform/x86/asus-nb-wmi.c +++ b/drivers/platform/x86/asus-nb-wmi.c @@ -120,6 +120,10 @@ static struct quirk_entry quirk_asus_ga5 .wmi_backlight_set_devstate = true, };
+static struct quirk_entry quirk_asus_use_kbd_dock_devid = { + .use_kbd_dock_devid = true, +}; + static int dmi_matched(const struct dmi_system_id *dmi) { pr_info("Identified laptop model '%s'\n", dmi->ident); @@ -493,6 +497,34 @@ static const struct dmi_system_id asus_q }, .driver_data = &quirk_asus_ga502i, }, + { + .callback = dmi_matched, + .ident = "Asus Transformer T100TA / T100HA / T100CHI", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + /* Match *T100* */ + DMI_MATCH(DMI_PRODUCT_NAME, "T100"), + }, + .driver_data = &quirk_asus_use_kbd_dock_devid, + }, + { + .callback = dmi_matched, + .ident = "Asus Transformer T101HA", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_PRODUCT_NAME, "T101HA"), + }, + .driver_data = &quirk_asus_use_kbd_dock_devid, + }, + { + .callback = dmi_matched, + .ident = "Asus Transformer T200TA", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_PRODUCT_NAME, "T200TA"), + }, + .driver_data = &quirk_asus_use_kbd_dock_devid, + }, {}, };
--- a/drivers/platform/x86/asus-wmi.c +++ b/drivers/platform/x86/asus-wmi.c @@ -365,12 +365,14 @@ static int asus_wmi_input_init(struct as if (err) goto err_free_dev;
- result = asus_wmi_get_devstate_simple(asus, ASUS_WMI_DEVID_KBD_DOCK); - if (result >= 0) { - input_set_capability(asus->inputdev, EV_SW, SW_TABLET_MODE); - input_report_switch(asus->inputdev, SW_TABLET_MODE, !result); - } else if (result != -ENODEV) { - pr_err("Error checking for keyboard-dock: %d\n", result); + if (asus->driver->quirks->use_kbd_dock_devid) { + result = asus_wmi_get_devstate_simple(asus, ASUS_WMI_DEVID_KBD_DOCK); + if (result >= 0) { + input_set_capability(asus->inputdev, EV_SW, SW_TABLET_MODE); + input_report_switch(asus->inputdev, SW_TABLET_MODE, !result); + } else if (result != -ENODEV) { + pr_err("Error checking for keyboard-dock: %d\n", result); + } }
err = input_register_device(asus->inputdev); @@ -2114,7 +2116,7 @@ static void asus_wmi_handle_event_code(i return; }
- if (code == NOTIFY_KBD_DOCK_CHANGE) { + if (asus->driver->quirks->use_kbd_dock_devid && code == NOTIFY_KBD_DOCK_CHANGE) { result = asus_wmi_get_devstate_simple(asus, ASUS_WMI_DEVID_KBD_DOCK); if (result >= 0) { --- a/drivers/platform/x86/asus-wmi.h +++ b/drivers/platform/x86/asus-wmi.h @@ -33,6 +33,7 @@ struct quirk_entry { bool wmi_backlight_native; bool wmi_backlight_set_devstate; bool wmi_force_als_set; + bool use_kbd_dock_devid; int wapf; /* * For machines with AMD graphic chips, it will send out WMI event
From: Tony Ambardar tony.ambardar@gmail.com
commit e23bb04b0c938588eae41b7f4712b722290ed2b8 upstream.
If BTF data is missing or removed from the ELF section it is still exported via sysfs as a zero-length file:
root@OpenWrt:/# ls -l /sys/kernel/btf/vmlinux -r--r--r-- 1 root root 0 Jul 18 02:59 /sys/kernel/btf/vmlinux
Moreover, reads from this file succeed and leak kernel data:
root@OpenWrt:/# hexdump -C /sys/kernel/btf/vmlinux|head -10 000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 000cc0 00 00 00 00 00 00 00 00 00 00 00 00 80 83 b0 80 |................| 000cd0 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 000ce0 00 00 00 00 00 00 00 00 00 00 00 00 57 ac 6e 9d |............W.n.| 000cf0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 002650 00 00 00 00 00 00 00 10 00 00 00 01 00 00 00 01 |................| 002660 80 82 9a c4 80 85 97 80 81 a9 51 68 00 00 00 02 |..........Qh....| 002670 80 25 44 dc 80 85 97 80 81 a9 50 24 81 ab c4 60 |.%D.......P$...`|
This situation was first observed with kernel 5.4.x, cross-compiled for a MIPS target system. Fix by adding a sanity-check for export of zero-length data sections.
Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs") Signed-off-by: Tony Ambardar Tony.Ambardar@gmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: John Fastabend john.fastabend@gmail.com Acked-by: Andrii Nakryiko andriin@fb.com Link: https://lore.kernel.org/bpf/b38db205a66238f70823039a8c531535864eaac5.1600417... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/bpf/sysfs_btf.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/bpf/sysfs_btf.c +++ b/kernel/bpf/sysfs_btf.c @@ -30,15 +30,15 @@ static struct kobject *btf_kobj;
static int __init btf_vmlinux_init(void) { - if (!__start_BTF) + bin_attr_btf_vmlinux.size = __stop_BTF - __start_BTF; + + if (!__start_BTF || bin_attr_btf_vmlinux.size == 0) return 0;
btf_kobj = kobject_create_and_add("btf", kernel_kobj); if (!btf_kobj) return -ENOMEM;
- bin_attr_btf_vmlinux.size = __stop_BTF - __start_BTF; - return sysfs_create_bin_file(btf_kobj, &bin_attr_btf_vmlinux); }
From: Tony Ambardar tony.ambardar@gmail.com
commit 65c204398928f9c79f1a29912b410439f7052635 upstream.
Systems with memory or disk constraints often reduce the kernel footprint by configuring LD_DEAD_CODE_DATA_ELIMINATION. However, this can result in removal of any BTF information.
Use the KEEP() macro to preserve the BTF data as done with other important sections, while still allowing for smaller kernels.
Fixes: 90ceddcb4950 ("bpf: Support llvm-objcopy for vmlinux BTF") Signed-off-by: Tony Ambardar Tony.Ambardar@gmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: John Fastabend john.fastabend@gmail.com Acked-by: Andrii Nakryiko andriin@fb.com Link: https://lore.kernel.org/bpf/a635b5d3e2da044e7b51ec1315e8910fbce0083f.1600417... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/asm-generic/vmlinux.lds.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -641,7 +641,7 @@ #define BTF \ .BTF : AT(ADDR(.BTF) - LOAD_OFFSET) { \ __start_BTF = .; \ - *(.BTF) \ + KEEP(*(.BTF)) \ __stop_BTF = .; \ } #else
From: Heiner Kallweit hkallweit1@gmail.com
commit 47dda78671a3d5cee3fb2229e37997d2ac8a3b54 upstream.
Some firmware files trigger a PHY soft reset and don't wait for it to be finished. PHY register writes directly after applying the firmware may fail or provide unexpected results therefore. Fix this by waiting for bit BMCR_RESET to be cleared after applying firmware.
There's nothing wrong with the referenced change, it's just that the fix will apply cleanly only after this change.
Fixes: 89fbd26cca7e ("r8169: fix firmware not resetting tp->ocp_base") Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/realtek/r8169_main.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -2113,11 +2113,18 @@ static void rtl_release_firmware(struct
void r8169_apply_firmware(struct rtl8169_private *tp) { + int val; + /* TODO: release firmware if rtl_fw_write_firmware signals failure. */ if (tp->rtl_fw) { rtl_fw_write_firmware(tp, tp->rtl_fw); /* At least one firmware doesn't reset tp->ocp_base. */ tp->ocp_base = OCP_STD_PHY_BASE; + + /* PHY soft reset may still be in progress */ + phy_read_poll_timeout(tp->phydev, MII_BMCR, val, + !(val & BMCR_RESET), + 50000, 600000, true); } }
From: Hans de Goede hdegoede@redhat.com
commit 8169bd3e6e193497cab781acddcff8fde5d0c416 upstream.
2 recent commits: cfae58ed681c ("platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type") 1fac39fd0316 ("platform/x86: intel-vbtn: Also handle tablet-mode switch on "Detachable" and "Portable" chassis-types")
Enabled reporting of SW_TABLET_MODE on more devices since the vbtn ACPI interface is used by the firmware on some of those devices to report this.
Testing has shown that unconditionally enabling SW_TABLET_MODE reporting on all devices with a chassis type of 8 ("Portable") or 10 ("Notebook") which support the VGBS method is a very bad idea.
Many of these devices are normal laptops (non 2-in-1) models with a VGBS which always returns 0, which we translate to SW_TABLET_MODE=1. This in turn causes userspace (libinput) to suppress events from the builtin keyboard and touchpad, making the laptop essentially unusable.
Since the problem of wrongly reporting SW_TABLET_MODE=1 in combination with libinput, leads to a non-usable system. Where as OTOH many people will not even notice when SW_TABLET_MODE is not being reported, this commit changes intel_vbtn_has_switches() to use a DMI based allow-list.
The new DMI based allow-list matches on the 31 ("Convertible") and 32 ("Detachable") chassis-types, as these clearly are 2-in-1s and so far if they support the intel-vbtn ACPI interface they all have properly working SW_TABLET_MODE reporting.
Besides these 2 generic matches, it also contains model specific matches for 2-in-1 models which use a different chassis-type and which are known to have properly working SW_TABLET_MODE reporting.
This has been tested on the following 2-in-1 devices:
Dell Venue 11 Pro 7130 vPro HP Pavilion X2 10-p002nd HP Stream x360 Convertible PC 11 Medion E1239T
Fixes: cfae58ed681c ("platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type") BugLink: https://forum.manjaro.org/t/keyboard-and-touchpad-only-work-on-kernel-5-6/22... BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1175599 Cc: Barnabás Pőcze pobrn@protonmail.com Cc: Takashi Iwai tiwai@suse.de Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/platform/x86/intel-vbtn.c | 52 +++++++++++++++++++++++++++++++------- 1 file changed, 43 insertions(+), 9 deletions(-)
--- a/drivers/platform/x86/intel-vbtn.c +++ b/drivers/platform/x86/intel-vbtn.c @@ -171,20 +171,54 @@ static bool intel_vbtn_has_buttons(acpi_ return ACPI_SUCCESS(status); }
+/* + * There are several laptops (non 2-in-1) models out there which support VGBS, + * but simply always return 0, which we translate to SW_TABLET_MODE=1. This in + * turn causes userspace (libinput) to suppress events from the builtin + * keyboard and touchpad, making the laptop essentially unusable. + * + * Since the problem of wrongly reporting SW_TABLET_MODE=1 in combination + * with libinput, leads to a non-usable system. Where as OTOH many people will + * not even notice when SW_TABLET_MODE is not being reported, a DMI based allow + * list is used here. This list mainly matches on the chassis-type of 2-in-1s. + * + * There are also some 2-in-1s which use the intel-vbtn ACPI interface to report + * SW_TABLET_MODE with a chassis-type of 8 ("Portable") or 10 ("Notebook"), + * these are matched on a per model basis, since many normal laptops with a + * possible broken VGBS ACPI-method also use these chassis-types. + */ +static const struct dmi_system_id dmi_switches_allow_list[] = { + { + .matches = { + DMI_EXACT_MATCH(DMI_CHASSIS_TYPE, "31" /* Convertible */), + }, + }, + { + .matches = { + DMI_EXACT_MATCH(DMI_CHASSIS_TYPE, "32" /* Detachable */), + }, + }, + { + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Venue 11 Pro 7130"), + }, + }, + { + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), + DMI_MATCH(DMI_PRODUCT_NAME, "HP Stream x360 Convertible PC 11"), + }, + }, + {} /* Array terminator */ +}; + static bool intel_vbtn_has_switches(acpi_handle handle) { - const char *chassis_type = dmi_get_system_info(DMI_CHASSIS_TYPE); unsigned long long vgbs; acpi_status status;
- /* - * Some normal laptops have a VGBS method despite being non-convertible - * and their VGBS method always returns 0, causing detect_tablet_mode() - * to report SW_TABLET_MODE=1 to userspace, which causes issues. - * These laptops have a DMI chassis_type of 9 ("Laptop"), do not report - * switches on any devices with a DMI chassis_type of 9. - */ - if (chassis_type && strcmp(chassis_type, "9") == 0) + if (!dmi_check_system(dmi_switches_allow_list)) return false;
status = acpi_evaluate_integer(handle, "VGBS", NULL, &vgbs);
From: Aaron Ma aaron.ma@canonical.com
commit 720ef73d1a239e33c3ad8fac356b9b1348e68aaf upstream.
Evaluating ACPI _BCL could fail, then ACPI buffer size will be set to 0. When reuse this ACPI buffer, AE_BUFFER_OVERFLOW will be triggered.
Re-initialize buffer size will make ACPI evaluate successfully.
Fixes: 46445b6b896fd ("thinkpad-acpi: fix handle locate for video and query of _BCL") Signed-off-by: Aaron Ma aaron.ma@canonical.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/platform/x86/thinkpad_acpi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -6829,8 +6829,10 @@ static int __init tpacpi_query_bcl_level list_for_each_entry(child, &device->children, node) { acpi_status status = acpi_evaluate_object(child->handle, "_BCL", NULL, &buffer); - if (ACPI_FAILURE(status)) + if (ACPI_FAILURE(status)) { + buffer.length = ACPI_ALLOCATE_BUFFER; continue; + }
obj = (union acpi_object *)buffer.pointer; if (!obj || (obj->type != ACPI_TYPE_PACKAGE)) {
From: Chaitanya Kulkarni chaitanya.kulkarni@wdc.com
commit 4bab69093044ca81f394bd0780be1b71c5a4d308 upstream.
When try_module_get() fails in the nvme_dev_open() it returns without releasing the ctrl reference which was taken earlier.
Put the ctrl reference which is taken before calling the try_module_get() in the error return code path.
Fixes: 52a3974feb1a "nvme-core: get/put ctrl and transport module in nvme_dev_open/release()" Signed-off-by: Chaitanya Kulkarni chaitanya.kulkarni@wdc.com Reviewed-by: Logan Gunthorpe logang@deltatee.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/nvme/host/core.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -3061,8 +3061,10 @@ static int nvme_dev_open(struct inode *i }
nvme_get_ctrl(ctrl); - if (!try_module_get(ctrl->ops->module)) + if (!try_module_get(ctrl->ops->module)) { + nvme_put_ctrl(ctrl); return -EINVAL; + }
file->private_data = ctrl; return 0;
From: Eric Dumazet edumazet@google.com
commit c7cc9200e9b4a2ac172e990ef1975cd42975dad6 upstream.
De-referencing skb after call to gro_cells_receive() is not allowed. We need to fetch skb->len earlier.
Fixes: 5491e7c6b1a9 ("macsec: enable GRO and RPS on macsec devices") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Paolo Abeni pabeni@redhat.com Acked-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/macsec.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -1077,6 +1077,7 @@ static rx_handler_result_t macsec_handle struct macsec_rx_sa *rx_sa; struct macsec_rxh_data *rxd; struct macsec_dev *macsec; + unsigned int len; sci_t sci; u32 hdr_pn; bool cbit; @@ -1232,9 +1233,10 @@ deliver: macsec_rxsc_put(rx_sc);
skb_orphan(skb); + len = skb->len; ret = gro_cells_receive(&macsec->gro_cells, skb); if (ret == NET_RX_SUCCESS) - count_rx(dev, skb->len); + count_rx(dev, len); else macsec->secy.netdev->stats.rx_dropped++;
From: Atish Patra atish.patra@wdc.com
commit a78c6f5956a949b496a5b087188dde52483edf51 upstream.
Currently, the memory containing DT is not reserved. Thus, that region of memory can be reallocated or reused for other purposes. This may result in corrupted DT for nommu virt board in Qemu. We may not face any issue in kendryte as DT is embedded in the kernel image for that.
Fixes: 6bd33e1ece52 ("riscv: add nommu support") Cc: stable@vger.kernel.org Signed-off-by: Atish Patra atish.patra@wdc.com Signed-off-by: Palmer Dabbelt palmerdabbelt@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/riscv/mm/init.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -515,6 +515,7 @@ asmlinkage void __init setup_vm(uintptr_ #else dtb_early_va = (void *)dtb_pa; #endif + dtb_early_pa = dtb_pa; }
static inline void setup_vm_final(void)
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
commit 47e538d86d5776ac8152146c3ed3d22326243190 upstream.
It appears that UML (arch/um) has no compat.h header defined and hence can't compile a recently provided piece of code in GPIO library.
Disable compat ->read() code in UML case to avoid compilation errors.
While at it, use pattern which is already being used in the kernel elsewhere.
Fixes: 5ad284ab3a01 ("gpiolib: Fix line event handling in syscall compatible mode") Reported-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Link: https://lore.kernel.org/r/20201005131044.87276-1-andriy.shevchenko@linux.int... Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpio/gpiolib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -838,7 +838,7 @@ static __poll_t lineevent_poll(struct fi
static ssize_t lineevent_get_size(void) { -#ifdef __x86_64__ +#if defined(CONFIG_X86_64) && !defined(CONFIG_UML) /* i386 has no padding after 'id' */ if (in_ia32_syscall()) { struct compat_gpioeevent_data {
From: Hugh Dickins hughd@google.com
commit 033b5d77551167f8c24ca862ce83d3e0745f9245 upstream.
There have been elusive reports of filemap_fault() hitting its VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built with CONFIG_READ_ONLY_THP_FOR_FS=y.
Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged without NUMA reuses the same huge page after collapse_file() failed (whereas NUMA targets its allocation to the respective node each time). And most of us were usually testing with CONFIG_NUMA=y kernels.
collapse_file(old start) new_page = khugepaged_alloc_page(hpage) __SetPageLocked(new_page) new_page->index = start // hpage->index=old offset new_page->mapping = mapping xas_store(&xas, new_page)
filemap_fault page = find_get_page(mapping, offset) // if offset falls inside hpage then // compound_head(page) == hpage lock_page_maybe_drop_mmap() __lock_page(page)
// collapse fails xas_store(&xas, old page) new_page->mapping = NULL unlock_page(new_page)
collapse_file(new start) new_page = khugepaged_alloc_page(hpage) __SetPageLocked(new_page) new_page->index = start // hpage->index=new offset new_page->mapping = mapping // mapping becomes valid again
// since compound_head(page) == hpage // page_to_pgoff(page) got changed VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
An initial patch replaced __SetPageLocked() by lock_page(), which did fix the race which Suren illustrates above. But testing showed that it's not good enough: if the racing task's __lock_page() gets delayed long after its find_get_page(), then it may follow collapse_file(new start)'s successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.
It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a check and retry (as is done for mapping), with similar relaxations in find_lock_entry() and pagecache_get_page(): but it's not obvious what else might get caught out; and khugepaged non-NUMA appears to be unique in exposing a page to page cache, then revoking, without going through a full cycle of freeing before reuse.
Instead, non-NUMA khugepaged_prealloc_page() release the old page if anyone else has a reference to it (1% of cases when I tested).
Although never reported on huge tmpfs, I believe its find_lock_entry() has been at similar risk; but huge tmpfs does not rely on khugepaged for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.
Reported-by: Denis Lisov dennis.lissov@gmail.com Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569 Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%... Reported-by: Qian Cai cai@lca.pw Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw Reported-and-analyzed-by: Suren Baghdasaryan surenb@google.com Fixes: 87c460a0bded ("mm/khugepaged: collapse_shmem() without freezing new_page") Signed-off-by: Hugh Dickins hughd@google.com Cc: stable@vger.kernel.org # v4.9+ Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/khugepaged.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -914,6 +914,18 @@ static struct page *khugepaged_alloc_hug
static bool khugepaged_prealloc_page(struct page **hpage, bool *wait) { + /* + * If the hpage allocated earlier was briefly exposed in page cache + * before collapse_file() failed, it is possible that racing lookups + * have not yet completed, and would then be unpleasantly surprised by + * finding the hpage reused for the same mapping at a different offset. + * Just release the previous allocation if there is any danger of that. + */ + if (*hpage && page_count(*hpage) > 1) { + put_page(*hpage); + *hpage = NULL; + } + if (!*hpage) *hpage = khugepaged_alloc_hugepage(wait);
From: Coly Li colyli@suse.de
commit c381b07941adc2274ce552daf86c94701c5e265a upstream.
The original problem was from nvme-over-tcp code, who mistakenly uses kernel_sendpage() to send pages allocated by __get_free_pages() without __GFP_COMP flag. Such pages don't have refcount (page_count is 0) on tail pages, sending them by kernel_sendpage() may trigger a kernel panic from a corrupted kernel heap, because these pages are incorrectly freed in network stack as page_count 0 pages.
This patch introduces a helper sendpage_ok(), it returns true if the checking page, - is not slab page: PageSlab(page) is false. - has page refcount: page_count(page) is not zero
All drivers who want to send page to remote end by kernel_sendpage() may use this helper to check whether the page is OK. If the helper does not return true, the driver should try other non sendpage method (e.g. sock_no_sendpage()) to handle the page.
Signed-off-by: Coly Li colyli@suse.de Cc: Chaitanya Kulkarni chaitanya.kulkarni@wdc.com Cc: Christoph Hellwig hch@lst.de Cc: Hannes Reinecke hare@suse.de Cc: Jan Kara jack@suse.com Cc: Jens Axboe axboe@kernel.dk Cc: Mikhail Skorzhinskii mskorzhinskiy@solarflare.com Cc: Philipp Reisner philipp.reisner@linbit.com Cc: Sagi Grimberg sagi@grimberg.me Cc: Vlastimil Babka vbabka@suse.com Cc: stable@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/net.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
--- a/include/linux/net.h +++ b/include/linux/net.h @@ -21,6 +21,7 @@ #include <linux/rcupdate.h> #include <linux/once.h> #include <linux/fs.h> +#include <linux/mm.h>
#include <uapi/linux/net.h>
@@ -290,6 +291,21 @@ do { \ #define net_get_random_once_wait(buf, nbytes) \ get_random_once_wait((buf), (nbytes))
+/* + * E.g. XFS meta- & log-data is in slab pages, or bcache meta + * data pages, or other high order pages allocated by + * __get_free_pages() without __GFP_COMP, which have a page_count + * of 0 and/or have PageSlab() set. We cannot use send_page for + * those, as that does get_page(); put_page(); and would cause + * either a VM_BUG directly, or __page_cache_release a page that + * would actually still be referenced by someone, leading to some + * obscure delayed Oops somewhere else. + */ +static inline bool sendpage_ok(struct page *page) +{ + return !PageSlab(page) && page_count(page) >= 1; +} + int kernel_sendmsg(struct socket *sock, struct msghdr *msg, struct kvec *vec, size_t num, size_t len); int kernel_sendmsg_locked(struct sock *sk, struct msghdr *msg,
From: Coly Li colyli@suse.de
commit cf83a17edeeb36195596d2dae060a7c381db35f1 upstream.
commit a10674bf2406 ("tcp: detecting the misuse of .sendpage for Slab objects") adds the checks for Slab pages, but the pages don't have page_count are still missing from the check.
Network layer's sendpage method is not designed to send page_count 0 pages neither, therefore both PageSlab() and page_count() should be both checked for the sending page. This is exactly what sendpage_ok() does.
This patch uses sendpage_ok() in do_tcp_sendpages() to detect misused .sendpage, to make the code more robust.
Fixes: a10674bf2406 ("tcp: detecting the misuse of .sendpage for Slab objects") Suggested-by: Eric Dumazet eric.dumazet@gmail.com Signed-off-by: Coly Li colyli@suse.de Cc: Vasily Averin vvs@virtuozzo.com Cc: David S. Miller davem@davemloft.net Cc: stable@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ipv4/tcp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -970,7 +970,8 @@ ssize_t do_tcp_sendpages(struct sock *sk long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
if (IS_ENABLED(CONFIG_DEBUG_VM) && - WARN_ONCE(PageSlab(page), "page must not be a Slab one")) + WARN_ONCE(!sendpage_ok(page), + "page must not be a Slab one and have page_count > 0")) return -EINVAL;
/* Wait for a connection to finish. One exception is TCP Fast Open
From: Coly Li colyli@suse.de
commit 7d4194abfc4de13a2663c7fee6891de8360f7a52 upstream.
Currently nvme_tcp_try_send_data() doesn't use kernel_sendpage() to send slab pages. But for pages allocated by __get_free_pages() without __GFP_COMP, which also have refcount as 0, they are still sent by kernel_sendpage() to remote end, this is problematic.
The new introduced helper sendpage_ok() checks both PageSlab tag and page_count counter, and returns true if the checking page is OK to be sent by kernel_sendpage().
This patch fixes the page checking issue of nvme_tcp_try_send_data() with sendpage_ok(). If sendpage_ok() returns true, send this page by kernel_sendpage(), otherwise use sock_no_sendpage to handle this page.
Signed-off-by: Coly Li colyli@suse.de Cc: Chaitanya Kulkarni chaitanya.kulkarni@wdc.com Cc: Christoph Hellwig hch@lst.de Cc: Hannes Reinecke hare@suse.de Cc: Jan Kara jack@suse.com Cc: Jens Axboe axboe@kernel.dk Cc: Mikhail Skorzhinskii mskorzhinskiy@solarflare.com Cc: Philipp Reisner philipp.reisner@linbit.com Cc: Sagi Grimberg sagi@grimberg.me Cc: Vlastimil Babka vbabka@suse.com Cc: stable@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/nvme/host/tcp.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -889,12 +889,11 @@ static int nvme_tcp_try_send_data(struct else flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
- /* can't zcopy slab pages */ - if (unlikely(PageSlab(page))) { - ret = sock_no_sendpage(queue->sock, page, offset, len, + if (sendpage_ok(page)) { + ret = kernel_sendpage(queue->sock, page, offset, len, flags); } else { - ret = kernel_sendpage(queue->sock, page, offset, len, + ret = sock_no_sendpage(queue->sock, page, offset, len, flags); } if (ret <= 0)
From: Sabrina Dubroca sd@queasysnail.net
commit 45a36a18d01907710bad5258d81f76c18882ad88 upstream.
xfrm interfaces currently test for !skb->ignore_df when deciding whether to update the pmtu on the skb's dst. Because of this, no pmtu exception is created when we do something like:
ping -s 1438 <dest>
By dropping this check, the pmtu exception will be created and the next ping attempt will work.
Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces") Reported-by: Xiumei Mu xmu@redhat.com Signed-off-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/xfrm/xfrm_interface.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/xfrm/xfrm_interface.c +++ b/net/xfrm/xfrm_interface.c @@ -292,7 +292,7 @@ xfrmi_xmit2(struct sk_buff *skb, struct }
mtu = dst_mtu(dst); - if (!skb->ignore_df && skb->len > mtu) { + if (skb->len > mtu) { skb_dst_update_pmtu_no_confirm(skb, mtu);
if (skb->protocol == htons(ETH_P_IPV6)) {
From: Sabrina Dubroca sd@queasysnail.net
commit 4eb2e13415757a2bce5bb0d580d22bbeef1f5346 upstream.
Xiumei reported a bug with espintcp over IPv6 in transport mode, because xfrm6_transport_finish expects to find IP6CB data (struct inet6_skb_cb). Currently, espintcp zeroes the CB, but the relevant part is actually preserved by previous layers (first set up by tcp, then strparser only zeroes a small part of tcp_skb_tb), so we can just relocate it to the start of skb->cb.
Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)") Reported-by: Xiumei Mu xmu@redhat.com Signed-off-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/xfrm/espintcp.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/net/xfrm/espintcp.c +++ b/net/xfrm/espintcp.c @@ -29,8 +29,12 @@ static void handle_nonesp(struct espintc
static void handle_esp(struct sk_buff *skb, struct sock *sk) { + struct tcp_skb_cb *tcp_cb = (struct tcp_skb_cb *)skb->cb; + skb_reset_transport_header(skb); - memset(skb->cb, 0, sizeof(skb->cb)); + + /* restore IP CB, we need at least IP6CB->nhoff */ + memmove(skb->cb, &tcp_cb->header, sizeof(tcp_cb->header));
rcu_read_lock(); skb->dev = dev_get_by_index_rcu(sock_net(sk), skb->skb_iif);
From: Vladimir Zapolskiy vladimir@tuxera.com
commit 64b7f674c292207624b3d788eda2dde3dc1415df upstream.
On setxattr() syscall path due to an apprent typo the size of a dynamically allocated memory chunk for storing struct smb2_file_full_ea_info object is computed incorrectly, to be more precise the first addend is the size of a pointer instead of the wanted object size. Coincidentally it makes no difference on 64-bit platforms, however on 32-bit targets the following memcpy() writes 4 bytes of data outside of the dynamically allocated memory.
============================================================================= BUG kmalloc-16 (Not tainted): Redzone overwritten -----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint INFO: 0x79e69a6f-0x9e5cdecf @offset=368. First byte 0x73 instead of 0xcc INFO: Slab 0xd36d2454 objects=85 used=51 fp=0xf7d0fc7a flags=0x35000201 INFO: Object 0x6f171df3 @offset=352 fp=0x00000000
Redzone 5d4ff02d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................ Object 6f171df3: 00 00 00 00 00 05 06 00 73 6e 72 75 62 00 66 69 ........snrub.fi Redzone 79e69a6f: 73 68 32 0a sh2. Padding 56254d82: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ CPU: 0 PID: 8196 Comm: attr Tainted: G B 5.9.0-rc8+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 Call Trace: dump_stack+0x54/0x6e print_trailer+0x12c/0x134 check_bytes_and_report.cold+0x3e/0x69 check_object+0x18c/0x250 free_debug_processing+0xfe/0x230 __slab_free+0x1c0/0x300 kfree+0x1d3/0x220 smb2_set_ea+0x27d/0x540 cifs_xattr_set+0x57f/0x620 __vfs_setxattr+0x4e/0x60 __vfs_setxattr_noperm+0x4e/0x100 __vfs_setxattr_locked+0xae/0xd0 vfs_setxattr+0x4e/0xe0 setxattr+0x12c/0x1a0 path_setxattr+0xa4/0xc0 __ia32_sys_lsetxattr+0x1d/0x20 __do_fast_syscall_32+0x40/0x70 do_fast_syscall_32+0x29/0x60 do_SYSENTER_32+0x15/0x20 entry_SYSENTER_32+0x9f/0xf2
Fixes: 5517554e4313 ("cifs: Add support for writing attributes on SMB2+") Signed-off-by: Vladimir Zapolskiy vladimir@tuxera.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/smb2ops.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -1208,7 +1208,7 @@ smb2_set_ea(const unsigned int xid, stru rqst[1].rq_iov = si_iov; rqst[1].rq_nvec = 1;
- len = sizeof(ea) + ea_name_len + ea_value_len + 1; + len = sizeof(*ea) + ea_name_len + ea_value_len + 1; ea = kzalloc(len, GFP_KERNEL); if (ea == NULL) { rc = -ENOMEM;
From: Jerome Brunet jbrunet@baylibre.com
commit 28683e847e2f20eed22cdd24f185d7783db396d3 upstream.
When the slave address is written in do_start(), SLAVE_ADDR is written completely. This may overwrite some setting related to the clock rate or signal filtering.
Fix this by writing only the bits related to slave address. To avoid causing unexpected changed, explicitly disable filtering or high/low clock mode which may have been left over by the bootloader.
Fixes: 30021e3707a7 ("i2c: add support for Amlogic Meson I2C controller") Signed-off-by: Jerome Brunet jbrunet@baylibre.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/i2c/busses/i2c-meson.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-)
--- a/drivers/i2c/busses/i2c-meson.c +++ b/drivers/i2c/busses/i2c-meson.c @@ -5,6 +5,7 @@ * Copyright (C) 2014 Beniamino Galvani b.galvani@gmail.com */
+#include <linux/bitfield.h> #include <linux/clk.h> #include <linux/completion.h> #include <linux/i2c.h> @@ -38,6 +39,12 @@ #define REG_CTRL_CLKDIVEXT_SHIFT 28 #define REG_CTRL_CLKDIVEXT_MASK GENMASK(29, 28)
+#define REG_SLV_ADDR GENMASK(7, 0) +#define REG_SLV_SDA_FILTER GENMASK(10, 8) +#define REG_SLV_SCL_FILTER GENMASK(13, 11) +#define REG_SLV_SCL_LOW GENMASK(27, 16) +#define REG_SLV_SCL_LOW_EN BIT(28) + #define I2C_TIMEOUT_MS 500
enum { @@ -147,6 +154,9 @@ static void meson_i2c_set_clk_div(struct meson_i2c_set_mask(i2c, REG_CTRL, REG_CTRL_CLKDIVEXT_MASK, (div >> 10) << REG_CTRL_CLKDIVEXT_SHIFT);
+ /* Disable HIGH/LOW mode */ + meson_i2c_set_mask(i2c, REG_SLAVE_ADDR, REG_SLV_SCL_LOW_EN, 0); + dev_dbg(i2c->dev, "%s: clk %lu, freq %u, div %u\n", __func__, clk_rate, freq, div); } @@ -280,7 +290,10 @@ static void meson_i2c_do_start(struct me token = (msg->flags & I2C_M_RD) ? TOKEN_SLAVE_ADDR_READ : TOKEN_SLAVE_ADDR_WRITE;
- writel(msg->addr << 1, i2c->regs + REG_SLAVE_ADDR); + + meson_i2c_set_mask(i2c, REG_SLAVE_ADDR, REG_SLV_ADDR, + FIELD_PREP(REG_SLV_ADDR, msg->addr << 1)); + meson_i2c_add_token(i2c, TOKEN_START); meson_i2c_add_token(i2c, token); } @@ -461,6 +474,10 @@ static int meson_i2c_probe(struct platfo return ret; }
+ /* Disable filtering */ + meson_i2c_set_mask(i2c, REG_SLAVE_ADDR, + REG_SLV_SDA_FILTER | REG_SLV_SCL_FILTER, 0); + meson_i2c_set_clk_div(i2c, timings.bus_freq_hz);
return 0;
From: Jerome Brunet jbrunet@baylibre.com
commit 79e137b1540165f788394658442284d55a858984 upstream.
SCL rate appears to be different than what is expected. For example, We get 164kHz on i2c3 of the vim3 when 400kHz is expected. This is partially due to the peripheral clock being disabled when the clock is set.
Let's keep the peripheral clock on after probe to fix the problem. This does not affect the SCL output which is still gated when i2c is idle.
Fixes: 09af1c2fa490 ("i2c: meson: set clock divider in probe instead of setting it for each transfer") Signed-off-by: Jerome Brunet jbrunet@baylibre.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/i2c/busses/i2c-meson.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-)
--- a/drivers/i2c/busses/i2c-meson.c +++ b/drivers/i2c/busses/i2c-meson.c @@ -370,16 +370,12 @@ static int meson_i2c_xfer_messages(struc struct meson_i2c *i2c = adap->algo_data; int i, ret = 0;
- clk_enable(i2c->clk); - for (i = 0; i < num; i++) { ret = meson_i2c_xfer_msg(i2c, msgs + i, i == num - 1, atomic); if (ret) break; }
- clk_disable(i2c->clk); - return ret ?: i; }
@@ -448,7 +444,7 @@ static int meson_i2c_probe(struct platfo return ret; }
- ret = clk_prepare(i2c->clk); + ret = clk_prepare_enable(i2c->clk); if (ret < 0) { dev_err(&pdev->dev, "can't prepare clock\n"); return ret; @@ -470,7 +466,7 @@ static int meson_i2c_probe(struct platfo
ret = i2c_add_adapter(&i2c->adap); if (ret < 0) { - clk_unprepare(i2c->clk); + clk_disable_unprepare(i2c->clk); return ret; }
@@ -488,7 +484,7 @@ static int meson_i2c_remove(struct platf struct meson_i2c *i2c = platform_get_drvdata(pdev);
i2c_del_adapter(&i2c->adap); - clk_unprepare(i2c->clk); + clk_disable_unprepare(i2c->clk);
return 0; }
From: Nicolas Belin nbelin@baylibre.com
commit 1334d3b4e49e35d8912a7c37ffca4c5afb9a0516 upstream.
Apparently, 15 cycles of the peripheral clock are used by the controller for sampling and filtering. Because this was not known before, the rate calculation is slightly off.
Clean up and fix the calculation taking this filtering delay into account.
Fixes: 30021e3707a7 ("i2c: add support for Amlogic Meson I2C controller") Signed-off-by: Nicolas Belin nbelin@baylibre.com Signed-off-by: Jerome Brunet jbrunet@baylibre.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/i2c/busses/i2c-meson.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-)
--- a/drivers/i2c/busses/i2c-meson.c +++ b/drivers/i2c/busses/i2c-meson.c @@ -34,10 +34,8 @@ #define REG_CTRL_ACK_IGNORE BIT(1) #define REG_CTRL_STATUS BIT(2) #define REG_CTRL_ERROR BIT(3) -#define REG_CTRL_CLKDIV_SHIFT 12 -#define REG_CTRL_CLKDIV_MASK GENMASK(21, 12) -#define REG_CTRL_CLKDIVEXT_SHIFT 28 -#define REG_CTRL_CLKDIVEXT_MASK GENMASK(29, 28) +#define REG_CTRL_CLKDIV GENMASK(21, 12) +#define REG_CTRL_CLKDIVEXT GENMASK(29, 28)
#define REG_SLV_ADDR GENMASK(7, 0) #define REG_SLV_SDA_FILTER GENMASK(10, 8) @@ -46,6 +44,7 @@ #define REG_SLV_SCL_LOW_EN BIT(28)
#define I2C_TIMEOUT_MS 500 +#define FILTER_DELAY 15
enum { TOKEN_END = 0, @@ -140,19 +139,21 @@ static void meson_i2c_set_clk_div(struct unsigned long clk_rate = clk_get_rate(i2c->clk); unsigned int div;
- div = DIV_ROUND_UP(clk_rate, freq * i2c->data->div_factor); + div = DIV_ROUND_UP(clk_rate, freq); + div -= FILTER_DELAY; + div = DIV_ROUND_UP(div, i2c->data->div_factor);
/* clock divider has 12 bits */ - if (div >= (1 << 12)) { + if (div > GENMASK(11, 0)) { dev_err(i2c->dev, "requested bus frequency too low\n"); - div = (1 << 12) - 1; + div = GENMASK(11, 0); }
- meson_i2c_set_mask(i2c, REG_CTRL, REG_CTRL_CLKDIV_MASK, - (div & GENMASK(9, 0)) << REG_CTRL_CLKDIV_SHIFT); + meson_i2c_set_mask(i2c, REG_CTRL, REG_CTRL_CLKDIV, + FIELD_PREP(REG_CTRL_CLKDIV, div & GENMASK(9, 0)));
- meson_i2c_set_mask(i2c, REG_CTRL, REG_CTRL_CLKDIVEXT_MASK, - (div >> 10) << REG_CTRL_CLKDIVEXT_SHIFT); + meson_i2c_set_mask(i2c, REG_CTRL, REG_CTRL_CLKDIVEXT, + FIELD_PREP(REG_CTRL_CLKDIVEXT, div >> 10));
/* Disable HIGH/LOW mode */ meson_i2c_set_mask(i2c, REG_SLAVE_ADDR, REG_SLV_SCL_LOW_EN, 0);
From: Cristian Ciocaltea cristian.ciocaltea@gmail.com
commit f5b3f433641c543ebe5171285a42aa6adcdb2d22 upstream.
When the NACK and BUS error bits are set by the hardware, the driver is responsible for clearing them by writing "1" into the corresponding status registers.
Hence perform the necessary operations in owl_i2c_interrupt().
Fixes: d211e62af466 ("i2c: Add Actions Semiconductor Owl family S900 I2C driver") Reported-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Cristian Ciocaltea cristian.ciocaltea@gmail.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/i2c/busses/i2c-owl.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/i2c/busses/i2c-owl.c +++ b/drivers/i2c/busses/i2c-owl.c @@ -176,6 +176,9 @@ static irqreturn_t owl_i2c_interrupt(int fifostat = readl(i2c_dev->base + OWL_I2C_REG_FIFOSTAT); if (fifostat & OWL_I2C_FIFOSTAT_RNB) { i2c_dev->err = -ENXIO; + /* Clear NACK error bit by writing "1" */ + owl_i2c_update_reg(i2c_dev->base + OWL_I2C_REG_FIFOSTAT, + OWL_I2C_FIFOSTAT_RNB, true); goto stop; }
@@ -183,6 +186,9 @@ static irqreturn_t owl_i2c_interrupt(int stat = readl(i2c_dev->base + OWL_I2C_REG_STAT); if (stat & OWL_I2C_STAT_BEB) { i2c_dev->err = -EIO; + /* Clear BUS error bit by writing "1" */ + owl_i2c_update_reg(i2c_dev->base + OWL_I2C_REG_STAT, + OWL_I2C_STAT_BEB, true); goto stop; }
From: Eric Dumazet edumazet@google.com
commit d42ee76ecb6c49d499fc5eb32ca34468d95dbc3e upstream.
After freeing ep->auth_hmacs we have to clear the pointer or risk use-after-free as reported by syzbot:
BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline] BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline] BUG: KASAN: use-after-free in sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070 Read of size 8 at addr ffff8880a8ff52c0 by task syz-executor941/6874
CPU: 0 PID: 6874 Comm: syz-executor941 Not tainted 5.9.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline] sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline] sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070 sctp_endpoint_destroy+0x95/0x240 net/sctp/endpointola.c:203 sctp_endpoint_put net/sctp/endpointola.c:236 [inline] sctp_endpoint_free+0xd6/0x110 net/sctp/endpointola.c:183 sctp_destroy_sock+0x9c/0x3c0 net/sctp/socket.c:4981 sctp_v6_destroy_sock+0x11/0x20 net/sctp/socket.c:9415 sk_common_release+0x64/0x390 net/core/sock.c:3254 sctp_close+0x4ce/0x8b0 net/sctp/socket.c:1533 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:475 __sock_release+0xcd/0x280 net/socket.c:596 sock_close+0x18/0x20 net/socket.c:1277 __fput+0x285/0x920 fs/file_table.c:281 task_work_run+0xdd/0x190 kernel/task_work.c:141 exit_task_work include/linux/task_work.h:25 [inline] do_exit+0xb7d/0x29f0 kernel/exit.c:806 do_group_exit+0x125/0x310 kernel/exit.c:903 __do_sys_exit_group kernel/exit.c:914 [inline] __se_sys_exit_group kernel/exit.c:912 [inline] __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:912 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x43f278 Code: Bad RIP value. RSP: 002b:00007fffe0995c38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043f278 RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000 RBP: 00000000004bf068 R08: 00000000000000e7 R09: ffffffffffffffd0 R10: 0000000020000000 R11: 0000000000000246 R12: 0000000000000001 R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000
Allocated by task 6874: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461 kmem_cache_alloc_trace+0x174/0x300 mm/slab.c:3554 kmalloc include/linux/slab.h:554 [inline] kmalloc_array include/linux/slab.h:593 [inline] kcalloc include/linux/slab.h:605 [inline] sctp_auth_init_hmacs+0xdb/0x3b0 net/sctp/auth.c:464 sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049 sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline] sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631 __sys_setsockopt+0x2db/0x610 net/socket.c:2132 __do_sys_setsockopt net/socket.c:2143 [inline] __se_sys_setsockopt net/socket.c:2140 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 6874: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422 __cache_free mm/slab.c:3422 [inline] kfree+0x10e/0x2b0 mm/slab.c:3760 sctp_auth_destroy_hmacs net/sctp/auth.c:511 [inline] sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline] sctp_auth_init_hmacs net/sctp/auth.c:496 [inline] sctp_auth_init_hmacs+0x2b7/0x3b0 net/sctp/auth.c:454 sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049 sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline] sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631 __sys_setsockopt+0x2db/0x610 net/socket.c:2132 __do_sys_setsockopt net/socket.c:2143 [inline] __se_sys_setsockopt net/socket.c:2140 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 1f485649f529 ("[SCTP]: Implement SCTP-AUTH internals") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Vlad Yasevich vyasevich@gmail.com Cc: Neil Horman nhorman@tuxdriver.com Cc: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/sctp/auth.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/sctp/auth.c +++ b/net/sctp/auth.c @@ -494,6 +494,7 @@ int sctp_auth_init_hmacs(struct sctp_end out_err: /* Clean up any successful allocations */ sctp_auth_destroy_hmacs(ep->auth_hmacs); + ep->auth_hmacs = NULL; return -ENOMEM; }
From: Eric Dumazet edumazet@google.com
commit 89d01748b2354e210b5d4ea47bc25a42a1b42c82 upstream.
Some devices set needed_headroom. If we ignore it, we might end up crashing in various skb_push() for example in ipgre_header() since some layers assume enough headroom has been reserved.
Fixes: 1d76efe1577b ("team: add support for non-ethernet devices") Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/team/team.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -2112,6 +2112,7 @@ static void team_setup_by_port(struct ne dev->header_ops = port_dev->header_ops; dev->type = port_dev->type; dev->hard_header_len = port_dev->hard_header_len; + dev->needed_headroom = port_dev->needed_headroom; dev->addr_len = port_dev->addr_len; dev->mtu = port_dev->mtu; memcpy(dev->broadcast, port_dev->broadcast, port_dev->addr_len);
From: Anant Thazhemadam anant.thazhemadam@gmail.com
commit 9a9e77495958c7382b2438bc19746dd3aaaabb8e upstream.
The variable "i" isn't initialized back correctly after the first loop under the label inst_rollback gets executed.
The value of "i" is assigned to be option_count - 1, and the ensuing loop (under alloc_rollback) begins by initializing i--. Thus, the value of i when the loop begins execution will now become i = option_count - 2.
Thus, when kfree(dst_opts[i]) is called in the second loop in this order, (i.e., inst_rollback followed by alloc_rollback), dst_optsp[option_count - 2] is the first element freed, and dst_opts[option_count - 1] does not get freed, and thus, a memory leak is caused.
This memory leak can be fixed, by assigning i = option_count (instead of option_count - 1).
Fixes: 80f7c6683fe0 ("team: add support for per-port options") Reported-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com Tested-by: syzbot+69b804437cfec30deac3@syzkaller.appspotmail.com Signed-off-by: Anant Thazhemadam anant.thazhemadam@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/team/team.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -287,7 +287,7 @@ inst_rollback: for (i--; i >= 0; i--) __team_option_inst_del_option(team, dst_opts[i]);
- i = option_count - 1; + i = option_count; alloc_rollback: for (i--; i >= 0; i--) kfree(dst_opts[i]);
From: Dumitru Ceara dceara@redhat.com
commit 8aa7b526dc0b5dbf40c1b834d76a667ad672a410 upstream.
With multiple DNAT rules it's possible that after destination translation the resulting tuples collide.
For example, two openvswitch flows: nw_dst=10.0.0.10,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20)) nw_dst=10.0.0.20,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
Assuming two TCP clients initiating the following connections: 10.0.0.10:5000->10.0.0.10:10 10.0.0.10:5000->10.0.0.20:10
Both tuples would translate to 10.0.0.10:5000->20.0.0.1:20 causing nf_conntrack_confirm() to fail because of tuple collision.
Netfilter handles this case by allocating a null binding for SNAT at egress by default. Perform the same operation in openvswitch for DNAT if no explicit SNAT is requested by the user and allocate a null binding for SNAT for packets in the "original" direction.
Reported-at: https://bugzilla.redhat.com/1877128 Suggested-by: Florian Westphal fw@strlen.de Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Signed-off-by: Dumitru Ceara dceara@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/openvswitch/conntrack.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
--- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -903,15 +903,19 @@ static int ovs_ct_nat(struct net *net, s } err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range, maniptype);
- if (err == NF_ACCEPT && - ct->status & IPS_SRC_NAT && ct->status & IPS_DST_NAT) { - if (maniptype == NF_NAT_MANIP_SRC) - maniptype = NF_NAT_MANIP_DST; - else - maniptype = NF_NAT_MANIP_SRC; + if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) { + if (ct->status & IPS_SRC_NAT) { + if (maniptype == NF_NAT_MANIP_SRC) + maniptype = NF_NAT_MANIP_DST; + else + maniptype = NF_NAT_MANIP_SRC;
- err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range, - maniptype); + err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range, + maniptype); + } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) { + err = ovs_ct_nat_execute(skb, ct, ctinfo, NULL, + NF_NAT_MANIP_SRC); + } }
/* Mark NAT done if successful and update the flow key. */
From: Philip Yang Philip.Yang@amd.com
[ Upstream commit 1d0e16ac1a9e800598dcfa5b6bc53b704a103390 ]
Set ttm->sg to NULL after kfree, to avoid memory corruption backtrace:
[ 420.932812] kernel BUG at /build/linux-do9eLF/linux-4.15.0/mm/slub.c:295! [ 420.934182] invalid opcode: 0000 [#1] SMP NOPTI [ 420.935445] Modules linked in: xt_conntrack ipt_MASQUERADE [ 420.951332] Hardware name: Dell Inc. PowerEdge R7525/0PYVT1, BIOS 1.5.4 07/09/2020 [ 420.952887] RIP: 0010:__slab_free+0x180/0x2d0 [ 420.954419] RSP: 0018:ffffbe426291fa60 EFLAGS: 00010246 [ 420.955963] RAX: ffff9e29263e9c30 RBX: ffff9e29263e9c30 RCX: 000000018100004b [ 420.957512] RDX: ffff9e29263e9c30 RSI: fffff3d33e98fa40 RDI: ffff9e297e407a80 [ 420.959055] RBP: ffffbe426291fb00 R08: 0000000000000001 R09: ffffffffc0d39ade [ 420.960587] R10: ffffbe426291fb20 R11: ffff9e49ffdd4000 R12: ffff9e297e407a80 [ 420.962105] R13: fffff3d33e98fa40 R14: ffff9e29263e9c30 R15: ffff9e2954464fd8 [ 420.963611] FS: 00007fa2ea097780(0000) GS:ffff9e297e840000(0000) knlGS:0000000000000000 [ 420.965144] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 420.966663] CR2: 00007f16bfffefb8 CR3: 0000001ff0c62000 CR4: 0000000000340ee0 [ 420.968193] Call Trace: [ 420.969703] ? __page_cache_release+0x3c/0x220 [ 420.971294] ? amdgpu_ttm_tt_unpopulate+0x5e/0x80 [amdgpu] [ 420.972789] kfree+0x168/0x180 [ 420.974353] ? amdgpu_ttm_tt_set_user_pages+0x64/0xc0 [amdgpu] [ 420.975850] ? kfree+0x168/0x180 [ 420.977403] amdgpu_ttm_tt_unpopulate+0x5e/0x80 [amdgpu] [ 420.978888] ttm_tt_unpopulate.part.10+0x53/0x60 [amdttm] [ 420.980357] ttm_tt_destroy.part.11+0x4f/0x60 [amdttm] [ 420.981814] ttm_tt_destroy+0x13/0x20 [amdttm] [ 420.983273] ttm_bo_cleanup_memtype_use+0x36/0x80 [amdttm] [ 420.984725] ttm_bo_release+0x1c9/0x360 [amdttm] [ 420.986167] amdttm_bo_put+0x24/0x30 [amdttm] [ 420.987663] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 420.989165] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x9ca/0xb10 [amdgpu] [ 420.990666] kfd_ioctl_alloc_memory_of_gpu+0xef/0x2c0 [amdgpu]
Signed-off-by: Philip Yang Philip.Yang@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index e59c01a83dace..9a3267f06376f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1052,6 +1052,7 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
release_sg: kfree(ttm->sg); + ttm->sg = NULL; return r; }
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit 313b085851c13ca08320372a05a7047ea25d3dd4 ]
We need to move the closing of the src_device out of all the device replace locking, but we definitely want to zero out the superblock before we commit the last time to make sure the device is properly removed. Handle this by pushing btrfs_scratch_superblocks into btrfs_dev_replace_finishing, and then later on we'll move the src_device closing and freeing stuff where we need it to be.
Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/dev-replace.c | 3 +++ fs/btrfs/volumes.c | 12 +++--------- fs/btrfs/volumes.h | 3 +++ 3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index eb86e4b88c73a..26c9da82e6a91 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -783,6 +783,9 @@ error: /* replace the sysfs entry */ btrfs_sysfs_remove_devices_dir(fs_info->fs_devices, src_device); btrfs_sysfs_update_devid(tgt_device); + if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &src_device->dev_state)) + btrfs_scratch_superblocks(fs_info, src_device->bdev, + src_device->name->str); btrfs_rm_dev_replace_free_srcdev(src_device);
/* write back the superblocks */ diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 956eb0d6bc584..8b5f666a3ea66 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1999,9 +1999,9 @@ static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info) return num_devices; }
-static void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, - struct block_device *bdev, - const char *device_path) +void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, + struct block_device *bdev, + const char *device_path) { struct btrfs_super_block *disk_super; int copy_num; @@ -2224,12 +2224,6 @@ void btrfs_rm_dev_replace_free_srcdev(struct btrfs_device *srcdev) struct btrfs_fs_info *fs_info = srcdev->fs_info; struct btrfs_fs_devices *fs_devices = srcdev->fs_devices;
- if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &srcdev->dev_state)) { - /* zero out the old super if it is writable */ - btrfs_scratch_superblocks(fs_info, srcdev->bdev, - srcdev->name->str); - } - btrfs_close_bdev(srcdev); synchronize_rcu(); btrfs_free_device(srcdev); diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 75af2334b2e37..83862e27f5663 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -573,6 +573,9 @@ void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info); void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info); bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info, struct btrfs_device *failing_dev); +void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, + struct block_device *bdev, + const char *device_path);
int btrfs_bg_type_to_factor(u64 flags); const char *btrfs_bg_type_to_raid_name(u64 flags);
From: Jens Axboe axboe@kernel.dk
[ Upstream commit fad8e0de4426a776c9bcb060555e7c09e2d08db6 ]
syzbot reports a potential lock deadlock between the normal IO path and ->show_fdinfo():
====================================================== WARNING: possible circular locking dependency detected 5.9.0-rc6-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.2/19710 is trying to acquire lock: ffff888098ddc450 (sb_writers#4){.+.+}-{0:0}, at: io_write+0x6b5/0xb30 fs/io_uring.c:3296
but task is already holding lock: ffff8880a11b8428 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0xe9a/0x1bd0 fs/io_uring.c:8348
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&ctx->uring_lock){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 __io_uring_show_fdinfo fs/io_uring.c:8417 [inline] io_uring_show_fdinfo+0x194/0xc70 fs/io_uring.c:8460 seq_show+0x4a8/0x700 fs/proc/fd.c:65 seq_read+0x432/0x1070 fs/seq_file.c:208 do_loop_readv_writev fs/read_write.c:734 [inline] do_loop_readv_writev fs/read_write.c:721 [inline] do_iter_read+0x48e/0x6e0 fs/read_write.c:955 vfs_readv+0xe5/0x150 fs/read_write.c:1073 kernel_readv fs/splice.c:355 [inline] default_file_splice_read.constprop.0+0x4e6/0x9e0 fs/splice.c:412 do_splice_to+0x137/0x170 fs/splice.c:871 splice_direct_to_actor+0x307/0x980 fs/splice.c:950 do_splice_direct+0x1b3/0x280 fs/splice.c:1059 do_sendfile+0x55f/0xd40 fs/read_write.c:1540 __do_sys_sendfile64 fs/read_write.c:1601 [inline] __se_sys_sendfile64 fs/read_write.c:1587 [inline] __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1587 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #1 (&p->lock){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 seq_read+0x61/0x1070 fs/seq_file.c:155 pde_read fs/proc/inode.c:306 [inline] proc_reg_read+0x221/0x300 fs/proc/inode.c:318 do_loop_readv_writev fs/read_write.c:734 [inline] do_loop_readv_writev fs/read_write.c:721 [inline] do_iter_read+0x48e/0x6e0 fs/read_write.c:955 vfs_readv+0xe5/0x150 fs/read_write.c:1073 kernel_readv fs/splice.c:355 [inline] default_file_splice_read.constprop.0+0x4e6/0x9e0 fs/splice.c:412 do_splice_to+0x137/0x170 fs/splice.c:871 splice_direct_to_actor+0x307/0x980 fs/splice.c:950 do_splice_direct+0x1b3/0x280 fs/splice.c:1059 do_sendfile+0x55f/0xd40 fs/read_write.c:1540 __do_sys_sendfile64 fs/read_write.c:1601 [inline] __se_sys_sendfile64 fs/read_write.c:1587 [inline] __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1587 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (sb_writers#4){.+.+}-{0:0}: check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4441 lock_acquire+0x1f3/0xaf0 kernel/locking/lockdep.c:5029 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write+0x228/0x450 fs/super.c:1672 io_write+0x6b5/0xb30 fs/io_uring.c:3296 io_issue_sqe+0x18f/0x5c50 fs/io_uring.c:5719 __io_queue_sqe+0x280/0x1160 fs/io_uring.c:6175 io_queue_sqe+0x692/0xfa0 fs/io_uring.c:6254 io_submit_sqe fs/io_uring.c:6324 [inline] io_submit_sqes+0x1761/0x2400 fs/io_uring.c:6521 __do_sys_io_uring_enter+0xeac/0x1bd0 fs/io_uring.c:8349 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9
other info that might help us debug this:
Chain exists of: sb_writers#4 --> &p->lock --> &ctx->uring_lock
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- lock(&ctx->uring_lock); lock(&p->lock); lock(&ctx->uring_lock); lock(sb_writers#4);
*** DEADLOCK ***
1 lock held by syz-executor.2/19710: #0: ffff8880a11b8428 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0xe9a/0x1bd0 fs/io_uring.c:8348
stack backtrace: CPU: 0 PID: 19710 Comm: syz-executor.2 Not tainted 5.9.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827 check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4441 lock_acquire+0x1f3/0xaf0 kernel/locking/lockdep.c:5029 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write+0x228/0x450 fs/super.c:1672 io_write+0x6b5/0xb30 fs/io_uring.c:3296 io_issue_sqe+0x18f/0x5c50 fs/io_uring.c:5719 __io_queue_sqe+0x280/0x1160 fs/io_uring.c:6175 io_queue_sqe+0x692/0xfa0 fs/io_uring.c:6254 io_submit_sqe fs/io_uring.c:6324 [inline] io_submit_sqes+0x1761/0x2400 fs/io_uring.c:6521 __do_sys_io_uring_enter+0xeac/0x1bd0 fs/io_uring.c:8349 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45e179 Code: 3d b2 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 0b b2 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f1194e74c78 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa RAX: ffffffffffffffda RBX: 00000000000082c0 RCX: 000000000045e179 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004 RBP: 000000000118cf98 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cf4c R13: 00007ffd1aa5756f R14: 00007f1194e759c0 R15: 000000000118cf4c
Fix this by just not diving into details if we fail to trylock the io_uring mutex. We know the ctx isn't going away during this operation, but we cannot safely iterate buffers/files/personalities if we don't hold the io_uring mutex.
Reported-by: syzbot+2f8fa4e860edc3066aba@syzkaller.appspotmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index ebc3586b18795..d2bb2ae9551f0 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -7998,11 +7998,19 @@ static int io_uring_show_cred(int id, void *p, void *data)
static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m) { + bool has_lock; int i;
- mutex_lock(&ctx->uring_lock); + /* + * Avoid ABBA deadlock between the seq lock and the io_uring mutex, + * since fdinfo case grabs it in the opposite direction of normal use + * cases. If we fail to get the lock, we just don't iterate any + * structures that could be going away outside the io_uring mutex. + */ + has_lock = mutex_trylock(&ctx->uring_lock); + seq_printf(m, "UserFiles:\t%u\n", ctx->nr_user_files); - for (i = 0; i < ctx->nr_user_files; i++) { + for (i = 0; has_lock && i < ctx->nr_user_files; i++) { struct fixed_file_table *table; struct file *f;
@@ -8014,13 +8022,13 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m) seq_printf(m, "%5u: <none>\n", i); } seq_printf(m, "UserBufs:\t%u\n", ctx->nr_user_bufs); - for (i = 0; i < ctx->nr_user_bufs; i++) { + for (i = 0; has_lock && i < ctx->nr_user_bufs; i++) { struct io_mapped_ubuf *buf = &ctx->user_bufs[i];
seq_printf(m, "%5u: 0x%llx/%u\n", i, buf->ubuf, (unsigned int) buf->len); } - if (!idr_is_empty(&ctx->personality_idr)) { + if (has_lock && !idr_is_empty(&ctx->personality_idr)) { seq_printf(m, "Personalities:\n"); idr_for_each(&ctx->personality_idr, io_uring_show_cred, m); } @@ -8035,7 +8043,8 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m) req->task->task_works != NULL); } spin_unlock_irq(&ctx->completion_lock); - mutex_unlock(&ctx->uring_lock); + if (has_lock) + mutex_unlock(&ctx->uring_lock); }
static void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
From: Sudheesh Mavila sudheesh.mavila@amd.com
[ Upstream commit 97cf32996c46d9935cc133d910a75fb687dd6144 ]
SMU10_UMD_PSTATE_PEAK_FCLK value should not be used to set the DPM.
Suggested-by: Evan Quan evan.quan@amd.com Reviewed-by: Evan Quan evan.quan@amd.com Signed-off-by: Sudheesh Mavila sudheesh.mavila@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c index 9ee8cf8267c88..43f7adff6cb74 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c @@ -563,6 +563,8 @@ static int smu10_dpm_force_dpm_level(struct pp_hwmgr *hwmgr, struct smu10_hwmgr *data = hwmgr->backend; uint32_t min_sclk = hwmgr->display_config->min_core_set_clock; uint32_t min_mclk = hwmgr->display_config->min_mem_set_clock/100; + uint32_t index_fclk = data->clock_vol_info.vdd_dep_on_fclk->count - 1; + uint32_t index_socclk = data->clock_vol_info.vdd_dep_on_socclk->count - 1;
if (hwmgr->smu_version < 0x1E3700) { pr_info("smu firmware version too old, can not set dpm level\n"); @@ -676,13 +678,13 @@ static int smu10_dpm_force_dpm_level(struct pp_hwmgr *hwmgr, smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetHardMinFclkByFreq, hwmgr->display_config->num_display > 3 ? - SMU10_UMD_PSTATE_PEAK_FCLK : + data->clock_vol_info.vdd_dep_on_fclk->entries[0].clk : min_mclk, NULL);
smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetHardMinSocclkByFreq, - SMU10_UMD_PSTATE_MIN_SOCCLK, + data->clock_vol_info.vdd_dep_on_socclk->entries[0].clk, NULL); smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetHardMinVcn, @@ -695,11 +697,11 @@ static int smu10_dpm_force_dpm_level(struct pp_hwmgr *hwmgr, NULL); smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetSoftMaxFclkByFreq, - SMU10_UMD_PSTATE_PEAK_FCLK, + data->clock_vol_info.vdd_dep_on_fclk->entries[index_fclk].clk, NULL); smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetSoftMaxSocclkByFreq, - SMU10_UMD_PSTATE_PEAK_SOCCLK, + data->clock_vol_info.vdd_dep_on_socclk->entries[index_socclk].clk, NULL); smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetSoftMaxVcn,
From: Flora Cui flora.cui@amd.com
[ Upstream commit 898c7302f4de1d91065e80fc46552b3ec70894ff ]
max_caps might be 0, thus hdcp_work might be ZERO_SIZE_PTR
Signed-off-by: Flora Cui flora.cui@amd.com Reviewed-by: Feifei Xu Feifei.Xu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c index 949d10ef83040..6dd1f3f8d9903 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c @@ -568,7 +568,7 @@ struct hdcp_workqueue *hdcp_create_workqueue(struct amdgpu_device *adev, struct int i = 0;
hdcp_work = kcalloc(max_caps, sizeof(*hdcp_work), GFP_KERNEL); - if (hdcp_work == NULL) + if (ZERO_OR_NULL_PTR(hdcp_work)) return NULL;
hdcp_work->srm = kcalloc(PSP_HDCP_SRM_FIRST_GEN_MAX_SIZE, sizeof(*hdcp_work->srm), GFP_KERNEL);
From: Zack Rusin zackr@vmware.com
[ Upstream commit f54c4442893b8dfbd3aff8e903c54dfff1aef990 ]
ttm_mem_type_manager_func.get_node was changed to return -ENOSPC instead of setting the node pointer to NULL. Unfortunately vmwgfx still had two places where it was explicitly converting -ENOSPC to 0 causing regressions. This fixes those spots by allowing -ENOSPC to be returned. That seems to fix recent regressions with vmwgfx.
Signed-off-by: Zack Rusin zackr@vmware.com Reviewed-by: Roland Scheidegger sroland@vmware.com Reviewed-by: Martin Krastev krastevm@vmware.com Sigend-off-by: Roland Scheidegger sroland@vmware.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_thp.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c b/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c index 7da752ca1c34b..b93c558dd86e0 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c @@ -57,7 +57,7 @@ static int vmw_gmrid_man_get_node(struct ttm_mem_type_manager *man,
id = ida_alloc_max(&gman->gmr_ida, gman->max_gmr_ids - 1, GFP_KERNEL); if (id < 0) - return (id != -ENOMEM ? 0 : id); + return id;
spin_lock(&gman->lock);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c b/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c index b7c816ba71663..c8b9335bccd8d 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c @@ -95,7 +95,7 @@ found_unlock: mem->start = node->start; }
- return 0; + return ret; }
Hi,
this commit should NOT be applied to 5.8. It fixes a regression introduced by 58e4d686d456c3e356439ae160ff4a0728940b8e (drm/ttm: cleanup ttm_mem_type_manager_func.get_node interface v3) which is part of 5.9 but not 5.8. Applying this to 5.8 will very likely break things. I don't know why it ended up as a candidate for 5.8.
Roland
Am 12.10.20 um 15:30 schrieb Greg Kroah-Hartman:
From: Zack Rusin zackr@vmware.com
[ Upstream commit f54c4442893b8dfbd3aff8e903c54dfff1aef990 ]
ttm_mem_type_manager_func.get_node was changed to return -ENOSPC instead of setting the node pointer to NULL. Unfortunately vmwgfx still had two places where it was explicitly converting -ENOSPC to 0 causing regressions. This fixes those spots by allowing -ENOSPC to be returned. That seems to fix recent regressions with vmwgfx.
Signed-off-by: Zack Rusin zackr@vmware.com Reviewed-by: Roland Scheidegger sroland@vmware.com Reviewed-by: Martin Krastev krastevm@vmware.com Sigend-off-by: Roland Scheidegger sroland@vmware.com Signed-off-by: Sasha Levin sashal@kernel.org
drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_thp.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c b/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c index 7da752ca1c34b..b93c558dd86e0 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c @@ -57,7 +57,7 @@ static int vmw_gmrid_man_get_node(struct ttm_mem_type_manager *man, id = ida_alloc_max(&gman->gmr_ida, gman->max_gmr_ids - 1, GFP_KERNEL); if (id < 0)
return (id != -ENOMEM ? 0 : id);
return id;
spin_lock(&gman->lock); diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c b/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c index b7c816ba71663..c8b9335bccd8d 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_thp.c @@ -95,7 +95,7 @@ found_unlock: mem->start = node->start; }
- return 0;
- return ret;
}
On Tue, Oct 13, 2020 at 05:55:31PM +0200, Roland Scheidegger wrote:
Hi,
this commit should NOT be applied to 5.8. It fixes a regression introduced by 58e4d686d456c3e356439ae160ff4a0728940b8e (drm/ttm: cleanup ttm_mem_type_manager_func.get_node interface v3) which is part of 5.9 but not 5.8. Applying this to 5.8 will very likely break things. I don't know why it ended up as a candidate for 5.8.
Now dropped, thanks. And it was probably picked up due to the wording in the changelog text, along with a lack of a "Fixes:" tag that pointed at the exact change it fixed up, which would have shown that this is a 5.9-only thing.
thanks,
greg k-h
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit a466c85edc6fbe845facc8f57c408c544f42899e ]
When closing and freeing the source device we could end up doing our final blkdev_put() on the bdev, which will grab the bd_mutex. As such we want to be holding as few locks as possible, so move this call outside of the dev_replace->lock_finishing_cancel_unmount lock. Since we're modifying the fs_devices we need to make sure we're holding the uuid_mutex here, so take that as well.
There's a report from syzbot probably hitting one of the cases where the bd_mutex and device_list_mutex are taken in the wrong order, however it's not with device replace, like this patch fixes. As there's no reproducer available so far, we can't verify the fix.
https://lore.kernel.org/lkml/000000000000fc04d105afcf86d7@google.com/ dashboard link: https://syzkaller.appspot.com/bug?extid=84a0634dc5d21d488419
WARNING: possible circular locking dependency detected 5.9.0-rc5-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.0/6878 is trying to acquire lock: ffff88804c17d780 (&bdev->bd_mutex){+.+.}-{3:3}, at: blkdev_put+0x30/0x520 fs/block_dev.c:1804
but task is already holding lock: ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 btrfs_finish_chunk_alloc+0x281/0xf90 fs/btrfs/volumes.c:5255 btrfs_create_pending_block_groups+0x2f3/0x700 fs/btrfs/block-group.c:2109 __btrfs_end_transaction+0xf5/0x690 fs/btrfs/transaction.c:916 find_free_extent_update_loop fs/btrfs/extent-tree.c:3807 [inline] find_free_extent+0x23b7/0x2e60 fs/btrfs/extent-tree.c:4127 btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206 cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063 btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838 writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439 __extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653 extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249 extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370 do_writepages+0xec/0x290 mm/page-writeback.c:2352 __writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461 writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721 wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894 wb_do_writeback fs/fs-writeback.c:2039 [inline] wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
-> #3 (sb_internal#2){.+.+}-{0:0}: percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write+0x234/0x470 fs/super.c:1672 sb_start_intwrite include/linux/fs.h:1690 [inline] start_transaction+0xbe7/0x1170 fs/btrfs/transaction.c:624 find_free_extent_update_loop fs/btrfs/extent-tree.c:3789 [inline] find_free_extent+0x25e1/0x2e60 fs/btrfs/extent-tree.c:4127 btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206 cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063 btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838 writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439 __extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653 extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249 extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370 do_writepages+0xec/0x290 mm/page-writeback.c:2352 __writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461 writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721 wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894 wb_do_writeback fs/fs-writeback.c:2039 [inline] wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
-> #2 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}: __flush_work+0x60e/0xac0 kernel/workqueue.c:3041 wb_shutdown+0x180/0x220 mm/backing-dev.c:355 bdi_unregister+0x174/0x590 mm/backing-dev.c:872 del_gendisk+0x820/0xa10 block/genhd.c:933 loop_remove drivers/block/loop.c:2192 [inline] loop_control_ioctl drivers/block/loop.c:2291 [inline] loop_control_ioctl+0x3b1/0x480 drivers/block/loop.c:2257 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #1 (loop_ctl_mutex){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 lo_open+0x19/0xd0 drivers/block/loop.c:1893 __blkdev_get+0x759/0x1aa0 fs/block_dev.c:1507 blkdev_get fs/block_dev.c:1639 [inline] blkdev_open+0x227/0x300 fs/block_dev.c:1753 do_dentry_open+0x4b9/0x11b0 fs/open.c:817 do_open fs/namei.c:3251 [inline] path_openat+0x1b9a/0x2730 fs/namei.c:3368 do_filp_open+0x17e/0x3c0 fs/namei.c:3395 do_sys_openat2+0x16d/0x420 fs/open.c:1168 do_sys_open fs/open.c:1184 [inline] __do_sys_open fs/open.c:1192 [inline] __se_sys_open fs/open.c:1188 [inline] __x64_sys_open+0x119/0x1c0 fs/open.c:1188 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (&bdev->bd_mutex){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426 lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 blkdev_put+0x30/0x520 fs/block_dev.c:1804 btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline] btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline] btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline] close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161 close_fs_devices fs/btrfs/volumes.c:1193 [inline] btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179 close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149 generic_shutdown_super+0x144/0x370 fs/super.c:464 kill_anon_super+0x36/0x60 fs/super.c:1108 btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265 deactivate_locked_super+0x94/0x160 fs/super.c:335 deactivate_super+0xad/0xd0 fs/super.c:366 cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:163 [inline] exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265 entry_SYSCALL_64_after_hwframe+0x44/0xa9
other info that might help us debug this:
Chain exists of: &bdev->bd_mutex --> sb_internal#2 --> &fs_devs->device_list_mutex
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- lock(&fs_devs->device_list_mutex); lock(sb_internal#2); lock(&fs_devs->device_list_mutex); lock(&bdev->bd_mutex);
*** DEADLOCK ***
3 locks held by syz-executor.0/6878: #0: ffff88809070c0e0 (&type->s_umount_key#70){++++}-{3:3}, at: deactivate_super+0xa5/0xd0 fs/super.c:365 #1: ffffffff8a5b37a8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_close_devices+0x23/0x1f0 fs/btrfs/volumes.c:1178 #2: ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159
stack backtrace: CPU: 0 PID: 6878 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827 check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426 lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 blkdev_put+0x30/0x520 fs/block_dev.c:1804 btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline] btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline] btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline] close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161 close_fs_devices fs/btrfs/volumes.c:1193 [inline] btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179 close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149 generic_shutdown_super+0x144/0x370 fs/super.c:464 kill_anon_super+0x36/0x60 fs/super.c:1108 btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265 deactivate_locked_super+0x94/0x160 fs/super.c:335 deactivate_super+0xad/0xd0 fs/super.c:366 cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:163 [inline] exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x460027 RSP: 002b:00007fff59216328 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 0000000000076035 RCX: 0000000000460027 RDX: 0000000000403188 RSI: 0000000000000002 RDI: 00007fff592163d0 RBP: 0000000000000333 R08: 0000000000000000 R09: 000000000000000b R10: 0000000000000005 R11: 0000000000000246 R12: 00007fff59217460 R13: 0000000002df2a60 R14: 0000000000000000 R15: 00007fff59217460
Signed-off-by: Josef Bacik josef@toxicpanda.com [ add syzbot reference ] Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/dev-replace.c | 3 ++- fs/btrfs/volumes.c | 3 +++ 2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 26c9da82e6a91..e4a1c6afe35dc 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -786,7 +786,6 @@ error: if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &src_device->dev_state)) btrfs_scratch_superblocks(fs_info, src_device->bdev, src_device->name->str); - btrfs_rm_dev_replace_free_srcdev(src_device);
/* write back the superblocks */ trans = btrfs_start_transaction(root, 0); @@ -795,6 +794,8 @@ error:
mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
+ btrfs_rm_dev_replace_free_srcdev(src_device); + return 0; }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8b5f666a3ea66..79e9a80bd37a0 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2224,6 +2224,8 @@ void btrfs_rm_dev_replace_free_srcdev(struct btrfs_device *srcdev) struct btrfs_fs_info *fs_info = srcdev->fs_info; struct btrfs_fs_devices *fs_devices = srcdev->fs_devices;
+ mutex_lock(&uuid_mutex); + btrfs_close_bdev(srcdev); synchronize_rcu(); btrfs_free_device(srcdev); @@ -2252,6 +2254,7 @@ void btrfs_rm_dev_replace_free_srcdev(struct btrfs_device *srcdev) close_fs_devices(fs_devices); free_fs_devices(fs_devices); } + mutex_unlock(&uuid_mutex); }
void btrfs_destroy_dev_replace_tgtdev(struct btrfs_device *tgtdev)
From: Lu Baolu baolu.lu@linux.intel.com
[ Upstream commit 1a3f2fd7fc4e8f24510830e265de2ffb8e3300d2 ]
Lock(&iommu->lock) without disabling irq causes lockdep warnings.
[ 12.703950] ======================================================== [ 12.703962] WARNING: possible irq lock inversion dependency detected [ 12.703975] 5.9.0-rc6+ #659 Not tainted [ 12.703983] -------------------------------------------------------- [ 12.703995] systemd-udevd/284 just changed the state of lock: [ 12.704007] ffffffffbd6ff4d8 (device_domain_lock){..-.}-{2:2}, at: iommu_flush_dev_iotlb.part.57+0x2e/0x90 [ 12.704031] but this lock took another, SOFTIRQ-unsafe lock in the past: [ 12.704043] (&iommu->lock){+.+.}-{2:2} [ 12.704045]
and interrupts could create inverse lock ordering between them.
[ 12.704073] other info that might help us debug this: [ 12.704085] Possible interrupt unsafe locking scenario:
[ 12.704097] CPU0 CPU1 [ 12.704106] ---- ---- [ 12.704115] lock(&iommu->lock); [ 12.704123] local_irq_disable(); [ 12.704134] lock(device_domain_lock); [ 12.704146] lock(&iommu->lock); [ 12.704158] <Interrupt> [ 12.704164] lock(device_domain_lock); [ 12.704174] *** DEADLOCK ***
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20200927062428.13713-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/intel/iommu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index fbe0b0cc56edf..24a84d294fd01 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -2617,7 +2617,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu, }
/* Setup the PASID entry for requests without PASID: */ - spin_lock(&iommu->lock); + spin_lock_irqsave(&iommu->lock, flags); if (hw_pass_through && domain_type_is_si(domain)) ret = intel_pasid_setup_pass_through(iommu, domain, dev, PASID_RID2PASID); @@ -2627,7 +2627,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu, else ret = intel_pasid_setup_second_level(iommu, domain, dev, PASID_RID2PASID); - spin_unlock(&iommu->lock); + spin_unlock_irqrestore(&iommu->lock, flags); if (ret) { dev_err(dev, "Setup RID2PASID failed\n"); dmar_remove_one_dev_info(dev);
From: Antony Antony antony.antony@secunet.com
[ Upstream commit 545e5c571662b1cd79d9588f9d3b6e36985b8007 ]
XFRMA_SET_MARK and XFRMA_SET_MARK_MASK was not cloned from the old to the new. Migrate these two attributes during XFRMA_MSG_MIGRATE
Fixes: 9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking.") Signed-off-by: Antony Antony antony.antony@secunet.com Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/xfrm/xfrm_state.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 8be2d926acc21..8ec3a6a12dd34 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1510,6 +1510,7 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig, }
memcpy(&x->mark, &orig->mark, sizeof(x->mark)); + memcpy(&x->props.smark, &orig->props.smark, sizeof(x->props.smark));
if (xfrm_init_state(x) < 0) goto error;
From: Antony Antony antony.antony@secunet.com
[ Upstream commit 91a46c6d1b4fcbfa4773df9421b8ad3e58088101 ]
XFRMA_REPLAY_ESN_VAL was not cloned completely from the old to the new. Migrate this attribute during XFRMA_MSG_MIGRATE
v1->v2: - move curleft cloning to a separate patch
Fixes: af2f464e326e ("xfrm: Assign esn pointers when cloning a state") Signed-off-by: Antony Antony antony.antony@secunet.com Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/xfrm.h | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 51f65d23ebafa..2e32cb10ac16b 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -1767,21 +1767,17 @@ static inline unsigned int xfrm_replay_state_esn_len(struct xfrm_replay_state_es static inline int xfrm_replay_clone(struct xfrm_state *x, struct xfrm_state *orig) { - x->replay_esn = kzalloc(xfrm_replay_state_esn_len(orig->replay_esn), + + x->replay_esn = kmemdup(orig->replay_esn, + xfrm_replay_state_esn_len(orig->replay_esn), GFP_KERNEL); if (!x->replay_esn) return -ENOMEM; - - x->replay_esn->bmp_len = orig->replay_esn->bmp_len; - x->replay_esn->replay_window = orig->replay_esn->replay_window; - - x->preplay_esn = kmemdup(x->replay_esn, - xfrm_replay_state_esn_len(x->replay_esn), + x->preplay_esn = kmemdup(orig->preplay_esn, + xfrm_replay_state_esn_len(orig->preplay_esn), GFP_KERNEL); - if (!x->preplay_esn) { - kfree(x->replay_esn); + if (!x->preplay_esn) return -ENOMEM; - }
return 0; }
From: Antony Antony antony.antony@secunet.com
[ Upstream commit 7aa05d304785204703a67a6aa7f1db402889a172 ]
XFRMA_SEC_CTX was not cloned from the old to the new. Migrate this attribute during XFRMA_MSG_MIGRATE
v1->v2: - return -ENOMEM on error v2->v3: - fix return type to int
Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)") Signed-off-by: Antony Antony antony.antony@secunet.com Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/xfrm/xfrm_state.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 8ec3a6a12dd34..3a2c1f15d31dd 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1441,6 +1441,30 @@ out: EXPORT_SYMBOL(xfrm_state_add);
#ifdef CONFIG_XFRM_MIGRATE +static inline int clone_security(struct xfrm_state *x, struct xfrm_sec_ctx *security) +{ + struct xfrm_user_sec_ctx *uctx; + int size = sizeof(*uctx) + security->ctx_len; + int err; + + uctx = kmalloc(size, GFP_KERNEL); + if (!uctx) + return -ENOMEM; + + uctx->exttype = XFRMA_SEC_CTX; + uctx->len = size; + uctx->ctx_doi = security->ctx_doi; + uctx->ctx_alg = security->ctx_alg; + uctx->ctx_len = security->ctx_len; + memcpy(uctx + 1, security->ctx_str, security->ctx_len); + err = security_xfrm_state_alloc(x, uctx); + kfree(uctx); + if (err) + return err; + + return 0; +} + static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig, struct xfrm_encap_tmpl *encap) { @@ -1497,6 +1521,10 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig, goto error; }
+ if (orig->security) + if (clone_security(x, orig->security)) + goto error; + if (orig->coaddr) { x->coaddr = kmemdup(orig->coaddr, sizeof(*x->coaddr), GFP_KERNEL);
From: Antony Antony antony.antony@secunet.com
[ Upstream commit 8366685b2883e523f91e9816d7be371eb1144749 ]
When we clone state only add_time was cloned. It missed values like bytes, packets. Now clone the all members of the structure.
v1->v3: - use memcpy to copy the entire structure
Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)") Signed-off-by: Antony Antony antony.antony@secunet.com Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/xfrm/xfrm_state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 3a2c1f15d31dd..6b431a3830721 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1550,7 +1550,7 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig, x->tfcpad = orig->tfcpad; x->replay_maxdiff = orig->replay_maxdiff; x->replay_maxage = orig->replay_maxage; - x->curlft.add_time = orig->curlft.add_time; + memcpy(&x->curlft, &orig->curlft, sizeof(x->curlft)); x->km.state = orig->km.state; x->km.seq = orig->km.seq; x->replay = orig->replay;
From: Magnus Karlsson magnus.karlsson@intel.com
[ Upstream commit 642e450b6b5955f2059d0ae372183f7c6323f951 ]
In the skb Tx path, transmission of a packet is performed with dev_direct_xmit(). When NETDEV_TX_BUSY is set in the drivers, it signifies that it was not possible to send the packet right now, please try later. Unfortunately, the xsk transmit code discarded the packet and returned EBUSY to the application. Fix this unnecessary packet loss, by not discarding the packet in the Tx ring and return EAGAIN. As EAGAIN is returned to the application, it can then retry the send operation later and the packet will then likely be sent as the driver will then likely have space/resources to send the packet.
In summary, EAGAIN tells the application that the packet was not discarded from the Tx ring and that it needs to call send() again. EBUSY, on the other hand, signifies that the packet was not sent and discarded from the Tx ring. The application needs to put the packet on the Tx ring again if it wants it to be sent.
Fixes: 35fcde7f8deb ("xsk: support for Tx") Reported-by: Arkadiusz Zema A.Zema@falconvsystems.com Suggested-by: Arkadiusz Zema A.Zema@falconvsystems.com Suggested-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Magnus Karlsson magnus.karlsson@intel.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Jesse Brandeburg jesse.brandeburg@intel.com Link: https://lore.kernel.org/bpf/1600257625-2353-1-git-send-email-magnus.karlsson... Signed-off-by: Sasha Levin sashal@kernel.org --- net/xdp/xsk.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 3700266229f63..dcce888b8ef54 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -375,15 +375,30 @@ static int xsk_generic_xmit(struct sock *sk) skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr; skb->destructor = xsk_destruct_skb;
+ /* Hinder dev_direct_xmit from freeing the packet and + * therefore completing it in the destructor + */ + refcount_inc(&skb->users); err = dev_direct_xmit(skb, xs->queue_id); + if (err == NETDEV_TX_BUSY) { + /* Tell user-space to retry the send */ + skb->destructor = sock_wfree; + /* Free skb without triggering the perf drop trace */ + consume_skb(skb); + err = -EAGAIN; + goto out; + } + xskq_cons_release(xs->tx); /* Ignore NET_XMIT_CN as packet might have been sent */ - if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) { + if (err == NET_XMIT_DROP) { /* SKB completed but not sent */ + kfree_skb(skb); err = -EBUSY; goto out; }
+ consume_skb(skb); sent_frame = true; }
From: Voon Weifeng weifeng.voon@intel.com
[ Upstream commit 7241c5a697479c7d0c5a96595822cdab750d41ae ]
EEE should be only be enabled during stmmac_mac_link_up() when the link are up and being set up properly. set_eee should only do settings configuration and disabling the eee.
Without this fix, turning on EEE using ethtool will return "Operation not supported". This is due to the driver is in a dead loop waiting for eee to be advertised in the for eee to be activated but the driver will only configure the EEE advertisement after the eee is activated.
Ethtool should only return "Operation not supported" if there is no EEE capbility in the MAC controller.
Fixes: 8a7493e58ad6 ("net: stmmac: Fix a race in EEE enable callback") Signed-off-by: Voon Weifeng weifeng.voon@intel.com Acked-by: Mark Gross mgross@linux.intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c index eae11c5850251..c16d0cc3e9c44 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c @@ -662,23 +662,16 @@ static int stmmac_ethtool_op_set_eee(struct net_device *dev, struct stmmac_priv *priv = netdev_priv(dev); int ret;
- if (!edata->eee_enabled) { + if (!priv->dma_cap.eee) + return -EOPNOTSUPP; + + if (!edata->eee_enabled) stmmac_disable_eee_mode(priv); - } else { - /* We are asking for enabling the EEE but it is safe - * to verify all by invoking the eee_init function. - * In case of failure it will return an error. - */ - edata->eee_enabled = stmmac_eee_init(priv); - if (!edata->eee_enabled) - return -EOPNOTSUPP; - }
ret = phylink_ethtool_set_eee(priv->phylink, edata); if (ret) return ret;
- priv->eee_enabled = edata->eee_enabled; priv->tx_lpi_timer = edata->tx_lpi_timer; return 0; }
From: Necip Fazil Yildiran fazilyildiran@gmail.com
[ Upstream commit 8f0c01e666685c4d2e1a233e6f4d7ab16c9f8b2a ]
When LG_LAPTOP is enabled and NEW_LEDS is disabled, it results in the following Kbuild warning:
WARNING: unmet direct dependencies detected for LEDS_CLASS Depends on [n]: NEW_LEDS [=n] Selected by [y]: - LG_LAPTOP [=y] && X86 [=y] && X86_PLATFORM_DEVICES [=y] && ACPI [=y] && ACPI_WMI [=y] && INPUT [=y]
The reason is that LG_LAPTOP selects LEDS_CLASS without depending on or selecting NEW_LEDS while LEDS_CLASS is subordinate to NEW_LEDS.
Honor the kconfig menu hierarchy to remove kconfig dependency warnings.
Fixes: dbf0c5a6b1f8 ("platform/x86: Add LG Gram laptop special features driver") Signed-off-by: Necip Fazil Yildiran fazilyildiran@gmail.com Reviewed-by: Hans de Goede hdegoede@redhat.com Acked-by: mark gross mgross@linux.intel.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig index 0581a54cf562f..e1668a9538c8f 100644 --- a/drivers/platform/x86/Kconfig +++ b/drivers/platform/x86/Kconfig @@ -1091,6 +1091,7 @@ config LG_LAPTOP depends on ACPI_WMI depends on INPUT select INPUT_SPARSEKMAP + select NEW_LEDS select LEDS_CLASS help This driver adds support for hotkeys as well as control of keyboard
From: Necip Fazil Yildiran fazilyildiran@gmail.com
[ Upstream commit afdd1ebb72051e8b6b83c4d7dc542a9be0e1352d ]
When FUJITSU_LAPTOP is enabled and NEW_LEDS is disabled, it results in the following Kbuild warning:
WARNING: unmet direct dependencies detected for LEDS_CLASS Depends on [n]: NEW_LEDS [=n] Selected by [y]: - FUJITSU_LAPTOP [=y] && X86 [=y] && X86_PLATFORM_DEVICES [=y] && ACPI [=y] && INPUT [=y] && BACKLIGHT_CLASS_DEVICE [=y] && (ACPI_VIDEO [=n] || ACPI_VIDEO [=n]=n)
The reason is that FUJITSU_LAPTOP selects LEDS_CLASS without depending on or selecting NEW_LEDS while LEDS_CLASS is subordinate to NEW_LEDS.
Honor the kconfig menu hierarchy to remove kconfig dependency warnings.
Reported-by: Hans de Goede hdegoede@redhat.com Fixes: d89bcc83e709 ("platform/x86: fujitsu-laptop: select LEDS_CLASS") Signed-off-by: Necip Fazil Yildiran fazilyildiran@gmail.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig index e1668a9538c8f..a5ad36083b671 100644 --- a/drivers/platform/x86/Kconfig +++ b/drivers/platform/x86/Kconfig @@ -469,6 +469,7 @@ config FUJITSU_LAPTOP depends on BACKLIGHT_CLASS_DEVICE depends on ACPI_VIDEO || ACPI_VIDEO = n select INPUT_SPARSEKMAP + select NEW_LEDS select LEDS_CLASS help This is a driver for laptops built by Fujitsu:
From: Luo bin luobin9@huawei.com
[ Upstream commit 90f86b8a36c065286b743eed29661fc5cd00d342 ]
improve the error message when functions return failure and dump relevant registers in some exception handling processes
Signed-off-by: Luo bin luobin9@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/huawei/hinic/hinic_hw_api_cmd.c | 27 +++++++- .../ethernet/huawei/hinic/hinic_hw_api_cmd.h | 4 ++ .../net/ethernet/huawei/hinic/hinic_hw_cmdq.c | 2 + .../net/ethernet/huawei/hinic/hinic_hw_cmdq.h | 2 + .../net/ethernet/huawei/hinic/hinic_hw_dev.c | 12 ++-- .../net/ethernet/huawei/hinic/hinic_hw_eqs.c | 39 ++++++++++++ .../net/ethernet/huawei/hinic/hinic_hw_eqs.h | 6 +- .../net/ethernet/huawei/hinic/hinic_hw_if.c | 23 +++++++ .../net/ethernet/huawei/hinic/hinic_hw_if.h | 10 ++- .../net/ethernet/huawei/hinic/hinic_hw_mbox.c | 2 + .../net/ethernet/huawei/hinic/hinic_hw_mgmt.c | 1 + .../net/ethernet/huawei/hinic/hinic_port.c | 62 +++++++++---------- .../net/ethernet/huawei/hinic/hinic_sriov.c | 6 +- 13 files changed, 151 insertions(+), 45 deletions(-)
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c index 583fd24c29cf6..29e88e25a4a4f 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c @@ -112,6 +112,26 @@ static u32 get_hw_cons_idx(struct hinic_api_cmd_chain *chain) return HINIC_API_CMD_STATUS_GET(val, CONS_IDX); }
+static void dump_api_chain_reg(struct hinic_api_cmd_chain *chain) +{ + u32 addr, val; + + addr = HINIC_CSR_API_CMD_STATUS_ADDR(chain->chain_type); + val = hinic_hwif_read_reg(chain->hwif, addr); + + dev_err(&chain->hwif->pdev->dev, "Chain type: 0x%x, cpld error: 0x%x, check error: 0x%x, current fsm: 0x%x\n", + chain->chain_type, HINIC_API_CMD_STATUS_GET(val, CPLD_ERR), + HINIC_API_CMD_STATUS_GET(val, CHKSUM_ERR), + HINIC_API_CMD_STATUS_GET(val, FSM)); + + dev_err(&chain->hwif->pdev->dev, "Chain hw current ci: 0x%x\n", + HINIC_API_CMD_STATUS_GET(val, CONS_IDX)); + + addr = HINIC_CSR_API_CMD_CHAIN_PI_ADDR(chain->chain_type); + val = hinic_hwif_read_reg(chain->hwif, addr); + dev_err(&chain->hwif->pdev->dev, "Chain hw current pi: 0x%x\n", val); +} + /** * chain_busy - check if the chain is still processing last requests * @chain: chain to check @@ -131,8 +151,10 @@ static int chain_busy(struct hinic_api_cmd_chain *chain)
/* check for a space for a new command */ if (chain->cons_idx == MASKED_IDX(chain, prod_idx + 1)) { - dev_err(&pdev->dev, "API CMD chain %d is busy\n", - chain->chain_type); + dev_err(&pdev->dev, "API CMD chain %d is busy, cons_idx: %d, prod_idx: %d\n", + chain->chain_type, chain->cons_idx, + chain->prod_idx); + dump_api_chain_reg(chain); return -EBUSY; } break; @@ -332,6 +354,7 @@ static int wait_for_api_cmd_completion(struct hinic_api_cmd_chain *chain) err = wait_for_status_poll(chain); if (err) { dev_err(&pdev->dev, "API CMD Poll status timeout\n"); + dump_api_chain_reg(chain); break; } break; diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.h index 0ba00fd828dfc..6d1654b050ad5 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.h +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.h @@ -103,10 +103,14 @@ HINIC_API_CMD_STATUS_HEADER_##member##_MASK)
#define HINIC_API_CMD_STATUS_CONS_IDX_SHIFT 0 +#define HINIC_API_CMD_STATUS_FSM_SHIFT 24 #define HINIC_API_CMD_STATUS_CHKSUM_ERR_SHIFT 28 +#define HINIC_API_CMD_STATUS_CPLD_ERR_SHIFT 30
#define HINIC_API_CMD_STATUS_CONS_IDX_MASK 0xFFFFFF +#define HINIC_API_CMD_STATUS_FSM_MASK 0xFU #define HINIC_API_CMD_STATUS_CHKSUM_ERR_MASK 0x3 +#define HINIC_API_CMD_STATUS_CPLD_ERR_MASK 0x1U
#define HINIC_API_CMD_STATUS_GET(val, member) \ (((val) >> HINIC_API_CMD_STATUS_##member##_SHIFT) & \ diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c index cb5b6e5f787f2..e0eb294779ec1 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.c @@ -401,6 +401,7 @@ static int cmdq_sync_cmd_direct_resp(struct hinic_cmdq *cmdq,
spin_unlock_bh(&cmdq->cmdq_lock);
+ hinic_dump_ceq_info(cmdq->hwdev); return -ETIMEDOUT; }
@@ -807,6 +808,7 @@ static int init_cmdqs_ctxt(struct hinic_hwdev *hwdev,
cmdq_type = HINIC_CMDQ_SYNC; for (; cmdq_type < HINIC_MAX_CMDQ_TYPES; cmdq_type++) { + cmdqs->cmdq[cmdq_type].hwdev = hwdev; err = init_cmdq(&cmdqs->cmdq[cmdq_type], &cmdqs->saved_wqs[cmdq_type], cmdq_type, db_area[cmdq_type]); diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h index 3e4b0aef9fe6c..f40c31e1879f1 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_cmdq.h @@ -130,6 +130,8 @@ struct hinic_cmdq_ctxt { };
struct hinic_cmdq { + struct hinic_hwdev *hwdev; + struct hinic_wq *wq;
enum hinic_cmdq_type cmdq_type; diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c index b735bc537508f..298ceb930cc62 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c @@ -253,9 +253,9 @@ static int init_fw_ctxt(struct hinic_hwdev *hwdev) &fw_ctxt, sizeof(fw_ctxt), &fw_ctxt, &out_size); if (err || (out_size != sizeof(fw_ctxt)) || fw_ctxt.status) { - dev_err(&pdev->dev, "Failed to init FW ctxt, ret = %d\n", - fw_ctxt.status); - return -EFAULT; + dev_err(&pdev->dev, "Failed to init FW ctxt, err: %d, status: 0x%x, out size: 0x%x\n", + err, fw_ctxt.status, out_size); + return -EIO; }
return 0; @@ -420,9 +420,9 @@ static int get_base_qpn(struct hinic_hwdev *hwdev, u16 *base_qpn) &cmd_base_qpn, sizeof(cmd_base_qpn), &cmd_base_qpn, &out_size); if (err || (out_size != sizeof(cmd_base_qpn)) || cmd_base_qpn.status) { - dev_err(&pdev->dev, "Failed to get base qpn, status = %d\n", - cmd_base_qpn.status); - return -EFAULT; + dev_err(&pdev->dev, "Failed to get base qpn, err: %d, status: 0x%x, out size: 0x%x\n", + err, cmd_base_qpn.status, out_size); + return -EIO; }
*base_qpn = cmd_base_qpn.qpn; diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c index 397936cac304c..ca8cb68a8d206 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.c @@ -953,3 +953,42 @@ void hinic_ceqs_free(struct hinic_ceqs *ceqs) for (q_id = 0; q_id < ceqs->num_ceqs; q_id++) remove_eq(&ceqs->ceq[q_id]); } + +void hinic_dump_ceq_info(struct hinic_hwdev *hwdev) +{ + struct hinic_eq *eq = NULL; + u32 addr, ci, pi; + int q_id; + + for (q_id = 0; q_id < hwdev->func_to_io.ceqs.num_ceqs; q_id++) { + eq = &hwdev->func_to_io.ceqs.ceq[q_id]; + addr = EQ_CONS_IDX_REG_ADDR(eq); + ci = hinic_hwif_read_reg(hwdev->hwif, addr); + addr = EQ_PROD_IDX_REG_ADDR(eq); + pi = hinic_hwif_read_reg(hwdev->hwif, addr); + dev_err(&hwdev->hwif->pdev->dev, "Ceq id: %d, ci: 0x%08x, sw_ci: 0x%08x, pi: 0x%x, tasklet_state: 0x%lx, wrap: %d, ceqe: 0x%x\n", + q_id, ci, eq->cons_idx, pi, + eq->ceq_tasklet.state, + eq->wrapped, be32_to_cpu(*(__be32 *)(GET_CURR_CEQ_ELEM(eq)))); + } +} + +void hinic_dump_aeq_info(struct hinic_hwdev *hwdev) +{ + struct hinic_aeq_elem *aeqe_pos = NULL; + struct hinic_eq *eq = NULL; + u32 addr, ci, pi; + int q_id; + + for (q_id = 0; q_id < hwdev->aeqs.num_aeqs; q_id++) { + eq = &hwdev->aeqs.aeq[q_id]; + addr = EQ_CONS_IDX_REG_ADDR(eq); + ci = hinic_hwif_read_reg(hwdev->hwif, addr); + addr = EQ_PROD_IDX_REG_ADDR(eq); + pi = hinic_hwif_read_reg(hwdev->hwif, addr); + aeqe_pos = GET_CURR_AEQ_ELEM(eq); + dev_err(&hwdev->hwif->pdev->dev, "Aeq id: %d, ci: 0x%08x, pi: 0x%x, work_state: 0x%x, wrap: %d, desc: 0x%x\n", + q_id, ci, pi, work_busy(&eq->aeq_work.work), + eq->wrapped, be32_to_cpu(aeqe_pos->desc)); + } +} diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.h index 74b9ff90640c2..43065fc708693 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.h +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_eqs.h @@ -162,7 +162,7 @@ enum hinic_eqe_state {
struct hinic_aeq_elem { u8 data[HINIC_AEQE_DATA_SIZE]; - u32 desc; + __be32 desc; };
struct hinic_eq_work { @@ -254,4 +254,8 @@ int hinic_ceqs_init(struct hinic_ceqs *ceqs, struct hinic_hwif *hwif,
void hinic_ceqs_free(struct hinic_ceqs *ceqs);
+void hinic_dump_ceq_info(struct hinic_hwdev *hwdev); + +void hinic_dump_aeq_info(struct hinic_hwdev *hwdev); + #endif diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_if.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_if.c index cf127d896ba69..bc8925c0c982c 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_if.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_if.c @@ -21,6 +21,8 @@
#define WAIT_HWIF_READY_TIMEOUT 10000
+#define HINIC_SELFTEST_RESULT 0x883C + /** * hinic_msix_attr_set - set message attribute for msix entry * @hwif: the HW interface of a pci function device @@ -369,6 +371,26 @@ u16 hinic_pf_id_of_vf_hw(struct hinic_hwif *hwif) return HINIC_FA0_GET(attr0, PF_IDX); }
+static void __print_selftest_reg(struct hinic_hwif *hwif) +{ + u32 addr, attr0, attr1; + + addr = HINIC_CSR_FUNC_ATTR1_ADDR; + attr1 = hinic_hwif_read_reg(hwif, addr); + + if (attr1 == HINIC_PCIE_LINK_DOWN) { + dev_err(&hwif->pdev->dev, "PCIE is link down\n"); + return; + } + + addr = HINIC_CSR_FUNC_ATTR0_ADDR; + attr0 = hinic_hwif_read_reg(hwif, addr); + if (HINIC_FA0_GET(attr0, FUNC_TYPE) != HINIC_VF && + !HINIC_FA0_GET(attr0, PCI_INTF_IDX)) + dev_err(&hwif->pdev->dev, "Selftest reg: 0x%08x\n", + hinic_hwif_read_reg(hwif, HINIC_SELFTEST_RESULT)); +} + /** * hinic_init_hwif - initialize the hw interface * @hwif: the HW interface of a pci function device @@ -398,6 +420,7 @@ int hinic_init_hwif(struct hinic_hwif *hwif, struct pci_dev *pdev) err = wait_hwif_ready(hwif); if (err) { dev_err(&pdev->dev, "HW interface is not ready\n"); + __print_selftest_reg(hwif); goto err_hwif_ready; }
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_if.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_if.h index 0872e035faa11..c06f2253151e2 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_if.h +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_if.h @@ -12,6 +12,8 @@ #include <linux/types.h> #include <asm/byteorder.h>
+#define HINIC_PCIE_LINK_DOWN 0xFFFFFFFF + #define HINIC_DMA_ATTR_ST_SHIFT 0 #define HINIC_DMA_ATTR_AT_SHIFT 8 #define HINIC_DMA_ATTR_PH_SHIFT 10 @@ -249,13 +251,17 @@ struct hinic_hwif {
static inline u32 hinic_hwif_read_reg(struct hinic_hwif *hwif, u32 reg) { - return be32_to_cpu(readl(hwif->cfg_regs_bar + reg)); + u32 out = readl(hwif->cfg_regs_bar + reg); + + return be32_to_cpu(*(__be32 *)&out); }
static inline void hinic_hwif_write_reg(struct hinic_hwif *hwif, u32 reg, u32 val) { - writel(cpu_to_be32(val), hwif->cfg_regs_bar + reg); + __be32 in = cpu_to_be32(val); + + writel(*(u32 *)&in, hwif->cfg_regs_bar + reg); }
int hinic_msix_attr_set(struct hinic_hwif *hwif, u16 msix_index, diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c index bc2f87e6cb5d7..47c93f946b94d 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c @@ -650,6 +650,7 @@ wait_for_mbox_seg_completion(struct hinic_mbox_func_to_func *func_to_func, if (!wait_for_completion_timeout(done, jif)) { dev_err(&hwdev->hwif->pdev->dev, "Send mailbox segment timeout\n"); dump_mox_reg(hwdev); + hinic_dump_aeq_info(hwdev); return -ETIMEDOUT; }
@@ -897,6 +898,7 @@ int hinic_mbox_to_func(struct hinic_mbox_func_to_func *func_to_func, set_mbox_to_func_event(func_to_func, EVENT_TIMEOUT); dev_err(&func_to_func->hwif->pdev->dev, "Send mbox msg timeout, msg_id: %d\n", msg_info.msg_id); + hinic_dump_aeq_info(func_to_func->hwdev); err = -ETIMEDOUT; goto err_send_mbox; } diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c index 7fe39a155b329..070288c8b4f37 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_mgmt.c @@ -276,6 +276,7 @@ static int msg_to_mgmt_sync(struct hinic_pf_to_mgmt *pf_to_mgmt,
if (!wait_for_completion_timeout(recv_done, timeo)) { dev_err(&pdev->dev, "MGMT timeout, MSG id = %d\n", msg_id); + hinic_dump_aeq_info(pf_to_mgmt->hwdev); err = -ETIMEDOUT; goto unlock_sync_msg; } diff --git a/drivers/net/ethernet/huawei/hinic/hinic_port.c b/drivers/net/ethernet/huawei/hinic/hinic_port.c index 175c0ee000384..82d5c50630e64 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_port.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_port.c @@ -61,8 +61,8 @@ static int change_mac(struct hinic_dev *nic_dev, const u8 *addr, (port_mac_cmd.status && port_mac_cmd.status != HINIC_PF_SET_VF_ALREADY && port_mac_cmd.status != HINIC_MGMT_STATUS_EXIST)) { - dev_err(&pdev->dev, "Failed to change MAC, ret = %d\n", - port_mac_cmd.status); + dev_err(&pdev->dev, "Failed to change MAC, err: %d, status: 0x%x, out size: 0x%x\n", + err, port_mac_cmd.status, out_size); return -EFAULT; }
@@ -129,8 +129,8 @@ int hinic_port_get_mac(struct hinic_dev *nic_dev, u8 *addr) &port_mac_cmd, sizeof(port_mac_cmd), &port_mac_cmd, &out_size); if (err || (out_size != sizeof(port_mac_cmd)) || port_mac_cmd.status) { - dev_err(&pdev->dev, "Failed to get mac, ret = %d\n", - port_mac_cmd.status); + dev_err(&pdev->dev, "Failed to get mac, err: %d, status: 0x%x, out size: 0x%x\n", + err, port_mac_cmd.status, out_size); return -EFAULT; }
@@ -172,9 +172,9 @@ int hinic_port_set_mtu(struct hinic_dev *nic_dev, int new_mtu) err = hinic_port_msg_cmd(hwdev, HINIC_PORT_CMD_CHANGE_MTU, &port_mtu_cmd, sizeof(port_mtu_cmd), &port_mtu_cmd, &out_size); - if (err || (out_size != sizeof(port_mtu_cmd)) || port_mtu_cmd.status) { - dev_err(&pdev->dev, "Failed to set mtu, ret = %d\n", - port_mtu_cmd.status); + if (err || out_size != sizeof(port_mtu_cmd) || port_mtu_cmd.status) { + dev_err(&pdev->dev, "Failed to set mtu, err: %d, status: 0x%x, out size: 0x%x\n", + err, port_mtu_cmd.status, out_size); return -EFAULT; }
@@ -264,8 +264,8 @@ int hinic_port_link_state(struct hinic_dev *nic_dev, &link_cmd, sizeof(link_cmd), &link_cmd, &out_size); if (err || (out_size != sizeof(link_cmd)) || link_cmd.status) { - dev_err(&pdev->dev, "Failed to get link state, ret = %d\n", - link_cmd.status); + dev_err(&pdev->dev, "Failed to get link state, err: %d, status: 0x%x, out size: 0x%x\n", + err, link_cmd.status, out_size); return -EINVAL; }
@@ -298,8 +298,8 @@ int hinic_port_set_state(struct hinic_dev *nic_dev, enum hinic_port_state state) &port_state, sizeof(port_state), &port_state, &out_size); if (err || (out_size != sizeof(port_state)) || port_state.status) { - dev_err(&pdev->dev, "Failed to set port state, ret = %d\n", - port_state.status); + dev_err(&pdev->dev, "Failed to set port state, err: %d, status: 0x%x, out size: 0x%x\n", + err, port_state.status, out_size); return -EFAULT; }
@@ -330,8 +330,8 @@ int hinic_port_set_func_state(struct hinic_dev *nic_dev, &func_state, sizeof(func_state), &func_state, &out_size); if (err || (out_size != sizeof(func_state)) || func_state.status) { - dev_err(&pdev->dev, "Failed to set port func state, ret = %d\n", - func_state.status); + dev_err(&pdev->dev, "Failed to set port func state, err: %d, status: 0x%x, out size: 0x%x\n", + err, func_state.status, out_size); return -EFAULT; }
@@ -361,9 +361,9 @@ int hinic_port_get_cap(struct hinic_dev *nic_dev, port_cap, &out_size); if (err || (out_size != sizeof(*port_cap)) || port_cap->status) { dev_err(&pdev->dev, - "Failed to get port capabilities, ret = %d\n", - port_cap->status); - return -EINVAL; + "Failed to get port capabilities, err: %d, status: 0x%x, out size: 0x%x\n", + err, port_cap->status, out_size); + return -EIO; }
return 0; @@ -393,9 +393,9 @@ int hinic_port_set_tso(struct hinic_dev *nic_dev, enum hinic_tso_state state) &tso_cfg, &out_size); if (err || out_size != sizeof(tso_cfg) || tso_cfg.status) { dev_err(&pdev->dev, - "Failed to set port tso, ret = %d\n", - tso_cfg.status); - return -EINVAL; + "Failed to set port tso, err: %d, status: 0x%x, out size: 0x%x\n", + err, tso_cfg.status, out_size); + return -EIO; }
return 0; @@ -423,9 +423,9 @@ int hinic_set_rx_csum_offload(struct hinic_dev *nic_dev, u32 en) &rx_csum_cfg, &out_size); if (err || !out_size || rx_csum_cfg.status) { dev_err(&pdev->dev, - "Failed to set rx csum offload, ret = %d\n", - rx_csum_cfg.status); - return -EINVAL; + "Failed to set rx csum offload, err: %d, status: 0x%x, out size: 0x%x\n", + err, rx_csum_cfg.status, out_size); + return -EIO; }
return 0; @@ -480,9 +480,9 @@ int hinic_set_max_qnum(struct hinic_dev *nic_dev, u8 num_rqs) &rq_num, &out_size); if (err || !out_size || rq_num.status) { dev_err(&pdev->dev, - "Failed to rxq number, ret = %d\n", - rq_num.status); - return -EINVAL; + "Failed to set rxq number, err: %d, status: 0x%x, out size: 0x%x\n", + err, rq_num.status, out_size); + return -EIO; }
return 0; @@ -508,9 +508,9 @@ static int hinic_set_rx_lro(struct hinic_dev *nic_dev, u8 ipv4_en, u8 ipv6_en, &lro_cfg, &out_size); if (err || !out_size || lro_cfg.status) { dev_err(&pdev->dev, - "Failed to set lro offload, ret = %d\n", - lro_cfg.status); - return -EINVAL; + "Failed to set lro offload, err: %d, status: 0x%x, out size: 0x%x\n", + err, lro_cfg.status, out_size); + return -EIO; }
return 0; @@ -542,10 +542,10 @@ static int hinic_set_rx_lro_timer(struct hinic_dev *nic_dev, u32 timer_value)
if (err || !out_size || lro_timer.status) { dev_err(&pdev->dev, - "Failed to set lro timer, ret = %d\n", - lro_timer.status); + "Failed to set lro timer, err: %d, status: 0x%x, out size: 0x%x\n", + err, lro_timer.status, out_size);
- return -EINVAL; + return -EIO; }
return 0; diff --git a/drivers/net/ethernet/huawei/hinic/hinic_sriov.c b/drivers/net/ethernet/huawei/hinic/hinic_sriov.c index efab2dd2c889b..1043389754df0 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_sriov.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_sriov.c @@ -40,9 +40,9 @@ static int hinic_set_mac(struct hinic_hwdev *hwdev, const u8 *mac_addr, if (err || out_size != sizeof(mac_info) || (mac_info.status && mac_info.status != HINIC_PF_SET_VF_ALREADY && mac_info.status != HINIC_MGMT_STATUS_EXIST)) { - dev_err(&hwdev->func_to_io.hwif->pdev->dev, "Failed to change MAC, ret = %d\n", - mac_info.status); - return -EFAULT; + dev_err(&hwdev->func_to_io.hwif->pdev->dev, "Failed to set MAC, err: %d, status: 0x%x, out size: 0x%x\n", + err, mac_info.status, out_size); + return -EIO; }
return 0;
From: Luo bin luobin9@huawei.com
[ Upstream commit f68910a8056f9451ee9fe7e1b962f7d90d326ad3 ]
It should also be regarded as an error when hw return status=4 for PF's setting mac cmd. Only if PF return status=4 to VF should this cmd be taken special treatment.
Fixes: 7dd29ee12865 ("hinic: add sriov feature support") Signed-off-by: Luo bin luobin9@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/huawei/hinic/hinic_port.c | 6 +++--- drivers/net/ethernet/huawei/hinic/hinic_sriov.c | 12 ++---------- 2 files changed, 5 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_port.c b/drivers/net/ethernet/huawei/hinic/hinic_port.c index 82d5c50630e64..2be7c254cca90 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_port.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_port.c @@ -58,9 +58,9 @@ static int change_mac(struct hinic_dev *nic_dev, const u8 *addr, sizeof(port_mac_cmd), &port_mac_cmd, &out_size); if (err || out_size != sizeof(port_mac_cmd) || - (port_mac_cmd.status && - port_mac_cmd.status != HINIC_PF_SET_VF_ALREADY && - port_mac_cmd.status != HINIC_MGMT_STATUS_EXIST)) { + (port_mac_cmd.status && + (port_mac_cmd.status != HINIC_PF_SET_VF_ALREADY || !HINIC_IS_VF(hwif)) && + port_mac_cmd.status != HINIC_MGMT_STATUS_EXIST)) { dev_err(&pdev->dev, "Failed to change MAC, err: %d, status: 0x%x, out size: 0x%x\n", err, port_mac_cmd.status, out_size); return -EFAULT; diff --git a/drivers/net/ethernet/huawei/hinic/hinic_sriov.c b/drivers/net/ethernet/huawei/hinic/hinic_sriov.c index 1043389754df0..b757f7057b8fe 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_sriov.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_sriov.c @@ -38,8 +38,7 @@ static int hinic_set_mac(struct hinic_hwdev *hwdev, const u8 *mac_addr, err = hinic_port_msg_cmd(hwdev, HINIC_PORT_CMD_SET_MAC, &mac_info, sizeof(mac_info), &mac_info, &out_size); if (err || out_size != sizeof(mac_info) || - (mac_info.status && mac_info.status != HINIC_PF_SET_VF_ALREADY && - mac_info.status != HINIC_MGMT_STATUS_EXIST)) { + (mac_info.status && mac_info.status != HINIC_MGMT_STATUS_EXIST)) { dev_err(&hwdev->func_to_io.hwif->pdev->dev, "Failed to set MAC, err: %d, status: 0x%x, out size: 0x%x\n", err, mac_info.status, out_size); return -EIO; @@ -452,8 +451,7 @@ struct hinic_sriov_info *hinic_get_sriov_info_by_pcidev(struct pci_dev *pdev)
static int hinic_check_mac_info(u8 status, u16 vlan_id) { - if ((status && status != HINIC_MGMT_STATUS_EXIST && - status != HINIC_PF_SET_VF_ALREADY) || + if ((status && status != HINIC_MGMT_STATUS_EXIST) || (vlan_id & CHECK_IPSU_15BIT && status == HINIC_MGMT_STATUS_EXIST)) return -EINVAL; @@ -495,12 +493,6 @@ static int hinic_update_mac(struct hinic_hwdev *hwdev, u8 *old_mac, return -EINVAL; }
- if (mac_info.status == HINIC_PF_SET_VF_ALREADY) { - dev_warn(&hwdev->hwif->pdev->dev, - "PF has already set VF MAC. Ignore update operation\n"); - return HINIC_PF_SET_VF_ALREADY; - } - if (mac_info.status == HINIC_MGMT_STATUS_EXIST) dev_warn(&hwdev->hwif->pdev->dev, "MAC is repeated. Ignore update operation\n");
From: Xiaoliang Yang xiaoliang.yang_1@nxp.com
[ Upstream commit dba1e4660a87927bdc03c23e36fd2c81a16a7ab1 ]
state->speed holds a value of 10, 100, 1000 or 2500, but QSYS_TAG_CONFIG_LINK_SPEED expects a value of 0, 1, 2, 3. So convert the speed to a proper value.
Fixes: de143c0e274b ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload") Signed-off-by: Xiaoliang Yang xiaoliang.yang_1@nxp.com Reviewed-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/ocelot/felix_vsc9959.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c index 7c167a394b762..a83ecd1c5d6c2 100644 --- a/drivers/net/dsa/ocelot/felix_vsc9959.c +++ b/drivers/net/dsa/ocelot/felix_vsc9959.c @@ -1215,8 +1215,28 @@ static void vsc9959_mdio_bus_free(struct ocelot *ocelot) static void vsc9959_sched_speed_set(struct ocelot *ocelot, int port, u32 speed) { + u8 tas_speed; + + switch (speed) { + case SPEED_10: + tas_speed = OCELOT_SPEED_10; + break; + case SPEED_100: + tas_speed = OCELOT_SPEED_100; + break; + case SPEED_1000: + tas_speed = OCELOT_SPEED_1000; + break; + case SPEED_2500: + tas_speed = OCELOT_SPEED_2500; + break; + default: + tas_speed = OCELOT_SPEED_1000; + break; + } + ocelot_rmw_rix(ocelot, - QSYS_TAG_CONFIG_LINK_SPEED(speed), + QSYS_TAG_CONFIG_LINK_SPEED(tas_speed), QSYS_TAG_CONFIG_LINK_SPEED_M, QSYS_TAG_CONFIG, port); }
From: Herbert Xu herbert@gondor.apana.org.au
[ Upstream commit e94ee171349db84c7cfdc5fefbebe414054d0924 ]
The struct flowi must never be interpreted by itself as its size depends on the address family. Therefore it must always be grouped with its original family value.
In this particular instance, the original family value is lost in the function xfrm_state_find. Therefore we get a bogus read when it's coupled with the wrong family which would occur with inter- family xfrm states.
This patch fixes it by keeping the original family value.
Note that the same bug could potentially occur in LSM through the xfrm_state_pol_flow_match hook. I checked the current code there and it seems to be safe for now as only secid is used which is part of struct flowi_common. But that API should be changed so that so that we don't get new bugs in the future. We could do that by replacing fl with just secid or adding a family field.
Reported-by: syzbot+577fbac3145a6eb2e7a5@syzkaller.appspotmail.com Fixes: 48b8d78315bf ("[XFRM]: State selection update to use inner...") Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/xfrm/xfrm_state.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 6b431a3830721..158510cd34ae8 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1019,7 +1019,8 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x, */ if (x->km.state == XFRM_STATE_VALID) { if ((x->sel.family && - !xfrm_selector_match(&x->sel, fl, x->sel.family)) || + (x->sel.family != family || + !xfrm_selector_match(&x->sel, fl, family))) || !security_xfrm_state_pol_flow_match(x, pol, fl)) return;
@@ -1032,7 +1033,9 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x, *acq_in_progress = 1; } else if (x->km.state == XFRM_STATE_ERROR || x->km.state == XFRM_STATE_EXPIRED) { - if (xfrm_selector_match(&x->sel, fl, x->sel.family) && + if ((!x->sel.family || + (x->sel.family == family && + xfrm_selector_match(&x->sel, fl, family))) && security_xfrm_state_pol_flow_match(x, pol, fl)) *error = -ESRCH; } @@ -1072,7 +1075,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, tmpl->mode == x->props.mode && tmpl->id.proto == x->id.proto && (tmpl->id.spi == x->id.spi || !tmpl->id.spi)) - xfrm_state_look_at(pol, x, fl, encap_family, + xfrm_state_look_at(pol, x, fl, family, &best, &acquire_in_progress, &error); } if (best || acquire_in_progress) @@ -1089,7 +1092,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, tmpl->mode == x->props.mode && tmpl->id.proto == x->id.proto && (tmpl->id.spi == x->id.spi || !tmpl->id.spi)) - xfrm_state_look_at(pol, x, fl, encap_family, + xfrm_state_look_at(pol, x, fl, family, &best, &acquire_in_progress, &error); }
From: Vaibhav Gupta vaibhavgupta40@gmail.com
[ Upstream commit bc5cbd73eb493944b8665dc517f684c40eb18a4a ]
With the support of generic PM callbacks, drivers no longer need to use legacy .suspend() and .resume() in which they had to maintain PCI states changes and device's power state themselves. The required operations are done by PCI core.
PCI drivers are not expected to invoke PCI helper functions like pci_save/restore_state(), pci_enable/disable_device(), pci_set_power_state(), etc. Their tasks are completed by PCI core itself.
Compile-tested only.
Signed-off-by: Vaibhav Gupta vaibhavgupta40@gmail.com Tested-by: Andrew Bowers andrewx.bowers@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/iavf/iavf_main.c | 45 ++++++--------------- 1 file changed, 12 insertions(+), 33 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index d338efe5f3f55..b3b349ecb0a8d 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -3777,7 +3777,6 @@ err_dma: return err; }
-#ifdef CONFIG_PM /** * iavf_suspend - Power management suspend routine * @pdev: PCI device information struct @@ -3785,11 +3784,10 @@ err_dma: * * Called when the system (VM) is entering sleep/suspend. **/ -static int iavf_suspend(struct pci_dev *pdev, pm_message_t state) +static int __maybe_unused iavf_suspend(struct device *dev_d) { - struct net_device *netdev = pci_get_drvdata(pdev); + struct net_device *netdev = dev_get_drvdata(dev_d); struct iavf_adapter *adapter = netdev_priv(netdev); - int retval = 0;
netif_device_detach(netdev);
@@ -3807,12 +3805,6 @@ static int iavf_suspend(struct pci_dev *pdev, pm_message_t state)
clear_bit(__IAVF_IN_CRITICAL_TASK, &adapter->crit_section);
- retval = pci_save_state(pdev); - if (retval) - return retval; - - pci_disable_device(pdev); - return 0; }
@@ -3822,24 +3814,13 @@ static int iavf_suspend(struct pci_dev *pdev, pm_message_t state) * * Called when the system (VM) is resumed from sleep/suspend. **/ -static int iavf_resume(struct pci_dev *pdev) +static int __maybe_unused iavf_resume(struct device *dev_d) { + struct pci_dev *pdev = to_pci_dev(dev_d); struct iavf_adapter *adapter = pci_get_drvdata(pdev); struct net_device *netdev = adapter->netdev; u32 err;
- pci_set_power_state(pdev, PCI_D0); - pci_restore_state(pdev); - /* pci_restore_state clears dev->state_saved so call - * pci_save_state to restore it. - */ - pci_save_state(pdev); - - err = pci_enable_device_mem(pdev); - if (err) { - dev_err(&pdev->dev, "Cannot enable PCI device from suspend.\n"); - return err; - } pci_set_master(pdev);
rtnl_lock(); @@ -3863,7 +3844,6 @@ static int iavf_resume(struct pci_dev *pdev) return err; }
-#endif /* CONFIG_PM */ /** * iavf_remove - Device Removal Routine * @pdev: PCI device information struct @@ -3965,16 +3945,15 @@ static void iavf_remove(struct pci_dev *pdev) pci_disable_device(pdev); }
+static SIMPLE_DEV_PM_OPS(iavf_pm_ops, iavf_suspend, iavf_resume); + static struct pci_driver iavf_driver = { - .name = iavf_driver_name, - .id_table = iavf_pci_tbl, - .probe = iavf_probe, - .remove = iavf_remove, -#ifdef CONFIG_PM - .suspend = iavf_suspend, - .resume = iavf_resume, -#endif - .shutdown = iavf_shutdown, + .name = iavf_driver_name, + .id_table = iavf_pci_tbl, + .probe = iavf_probe, + .remove = iavf_remove, + .driver.pm = &iavf_pm_ops, + .shutdown = iavf_shutdown, };
/**
From: Sylwester Dziedziuch sylwesterx.dziedziuch@intel.com
[ Upstream commit 75598a8fc0e0dff2aa5d46c62531b36a595f1d4f ]
When calling iavf_resume there was a crash because wrong function was used to get iavf_adapter and net_device pointers. Changed how iavf_resume is getting iavf_adapter and net_device pointers from pci_dev.
Fixes: 5eae00c57f5e ("i40evf: main driver core") Signed-off-by: Sylwester Dziedziuch sylwesterx.dziedziuch@intel.com Reviewed-by: Aleksandr Loktionov aleksandr.loktionov@intel.com Tested-by: Aaron Brown aaron.f.brown@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/iavf/iavf_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index b3b349ecb0a8d..91343e2d3a145 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -3817,8 +3817,8 @@ static int __maybe_unused iavf_suspend(struct device *dev_d) static int __maybe_unused iavf_resume(struct device *dev_d) { struct pci_dev *pdev = to_pci_dev(dev_d); - struct iavf_adapter *adapter = pci_get_drvdata(pdev); - struct net_device *netdev = adapter->netdev; + struct net_device *netdev = pci_get_drvdata(pdev); + struct iavf_adapter *adapter = netdev_priv(netdev); u32 err;
pci_set_master(pdev);
From: Jacob Keller jacob.e.keller@intel.com
[ Upstream commit 135f4b9e9340dadb78e9737bb4eb9817b9c89dac ]
The ice_setup_pf_sw function can cause a memory leak if register_netdev fails, due to accidentally failing to free the VSI rings. Fix the memory leak by using ice_vsi_release, ensuring we actually go through the full teardown process.
This should be safe even if the netdevice is not registered because we will have set the netdev pointer to NULL, ensuring ice_vsi_release won't call unregister_netdev.
An alternative fix would be moving management of the PF VSI netdev into the main VSI setup code. This is complicated and likely requires significant refactor in how we manage VSIs
Fixes: 3a858ba392c3 ("ice: Add support for VSI allocation and deallocation") Signed-off-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Aaron Brown aaron.f.brown@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_lib.c | 6 +++--- drivers/net/ethernet/intel/ice/ice_lib.h | 6 ------ drivers/net/ethernet/intel/ice/ice_main.c | 13 +++---------- 3 files changed, 6 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 2e3a39cea2c03..133ba6f08e574 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -240,7 +240,7 @@ static int ice_get_free_slot(void *array, int size, int curr) * ice_vsi_delete - delete a VSI from the switch * @vsi: pointer to VSI being removed */ -void ice_vsi_delete(struct ice_vsi *vsi) +static void ice_vsi_delete(struct ice_vsi *vsi) { struct ice_pf *pf = vsi->back; struct ice_vsi_ctx *ctxt; @@ -307,7 +307,7 @@ static void ice_vsi_free_arrays(struct ice_vsi *vsi) * * Returns 0 on success, negative on failure */ -int ice_vsi_clear(struct ice_vsi *vsi) +static int ice_vsi_clear(struct ice_vsi *vsi) { struct ice_pf *pf = NULL; struct device *dev; @@ -557,7 +557,7 @@ static int ice_vsi_get_qs(struct ice_vsi *vsi) * ice_vsi_put_qs - Release queues from VSI to PF * @vsi: the VSI that is going to release queues */ -void ice_vsi_put_qs(struct ice_vsi *vsi) +static void ice_vsi_put_qs(struct ice_vsi *vsi) { struct ice_pf *pf = vsi->back; int i; diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h index d80e6afa45112..2954b30e6ec79 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_lib.h @@ -43,10 +43,6 @@ int ice_cfg_vlan_pruning(struct ice_vsi *vsi, bool ena, bool vlan_promisc);
void ice_cfg_sw_lldp(struct ice_vsi *vsi, bool tx, bool create);
-void ice_vsi_delete(struct ice_vsi *vsi); - -int ice_vsi_clear(struct ice_vsi *vsi); - #ifdef CONFIG_DCB int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8 ena_tc); #endif /* CONFIG_DCB */ @@ -77,8 +73,6 @@ bool ice_is_reset_in_progress(unsigned long *state); void ice_write_qrxflxp_cntxt(struct ice_hw *hw, u16 pf_q, u32 rxdid, u32 prio);
-void ice_vsi_put_qs(struct ice_vsi *vsi); - void ice_vsi_dis_irq(struct ice_vsi *vsi);
void ice_vsi_free_irq(struct ice_vsi *vsi); diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 4cbd49c87568a..4b52f1dea7f3a 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -2605,10 +2605,8 @@ static int ice_setup_pf_sw(struct ice_pf *pf) return -EBUSY;
vsi = ice_pf_vsi_setup(pf, pf->hw.port_info); - if (!vsi) { - status = -ENOMEM; - goto unroll_vsi_setup; - } + if (!vsi) + return -ENOMEM;
status = ice_cfg_netdev(vsi); if (status) { @@ -2655,12 +2653,7 @@ unroll_napi_add: }
unroll_vsi_setup: - if (vsi) { - ice_vsi_free_q_vectors(vsi); - ice_vsi_delete(vsi); - ice_vsi_put_qs(vsi); - ice_vsi_clear(vsi); - } + ice_vsi_release(vsi); return status; }
From: Jacob Keller jacob.e.keller@intel.com
[ Upstream commit f6a07271bb1535d9549380461437cc48d9e19958 ]
During ice_vsi_setup, if ice_cfg_vsi_lan fails, it does not properly release memory associated with the VSI rings. If we had used devres allocations for the rings, this would be ok. However, we use kzalloc and kfree_rcu for these ring structures.
Using the correct label to cleanup the rings during ice_vsi_setup highlights an issue in the ice_vsi_clear_rings function: it can leave behind stale ring pointers in the q_vectors structure.
When releasing rings, we must also ensure that no q_vector associated with the VSI will point to this ring again. To resolve this, loop over all q_vectors and release their ring mapping. Because we are about to free all rings, no q_vector should remain pointing to any of the rings in this VSI.
Fixes: 5513b920a4f7 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support") Signed-off-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Aaron Brown aaron.f.brown@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_lib.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 133ba6f08e574..4c5845a0965a9 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -1190,6 +1190,18 @@ static void ice_vsi_clear_rings(struct ice_vsi *vsi) { int i;
+ /* Avoid stale references by clearing map from vector to ring */ + if (vsi->q_vectors) { + ice_for_each_q_vector(vsi, i) { + struct ice_q_vector *q_vector = vsi->q_vectors[i]; + + if (q_vector) { + q_vector->tx.ring = NULL; + q_vector->rx.ring = NULL; + } + } + } + if (vsi->tx_rings) { for (i = 0; i < vsi->alloc_txq; i++) { if (vsi->tx_rings[i]) { @@ -2254,7 +2266,7 @@ ice_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi, if (status) { dev_err(dev, "VSI %d failed lan queue config, error %s\n", vsi->vsi_num, ice_stat_str(status)); - goto unroll_vector_base; + goto unroll_clear_rings; }
/* Add switch rule to drop all Tx Flow Control Frames, of look up
From: Ronak Doshi doshir@vmware.com
[ Upstream commit 1dac3b1bc66dc68dbb0c9f43adac71a7d0a0331a ]
Commit dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") added support for encapsulation offload. However, the inner offload capability is to be restrictued to UDP tunnels.
This patch fixes the issue for non-udp tunnels by adding features check capability and filtering appropriate features for non-udp tunnels.
Fixes: dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi doshir@vmware.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/vmxnet3/vmxnet3_drv.c | 5 ++--- drivers/net/vmxnet3/vmxnet3_ethtool.c | 28 +++++++++++++++++++++++++++ drivers/net/vmxnet3/vmxnet3_int.h | 4 ++++ 3 files changed, 34 insertions(+), 3 deletions(-)
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c index 2818015324b8b..336504b7531d9 100644 --- a/drivers/net/vmxnet3/vmxnet3_drv.c +++ b/drivers/net/vmxnet3/vmxnet3_drv.c @@ -1032,7 +1032,6 @@ vmxnet3_tq_xmit(struct sk_buff *skb, struct vmxnet3_tx_queue *tq, /* Use temporary descriptor to avoid touching bits multiple times */ union Vmxnet3_GenericDesc tempTxDesc; #endif - struct udphdr *udph;
count = txd_estimate(skb);
@@ -1135,8 +1134,7 @@ vmxnet3_tq_xmit(struct sk_buff *skb, struct vmxnet3_tx_queue *tq, gdesc->txd.om = VMXNET3_OM_ENCAP; gdesc->txd.msscof = ctx.mss;
- udph = udp_hdr(skb); - if (udph->check) + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM) gdesc->txd.oco = 1; } else { gdesc->txd.hlen = ctx.l4_offset + ctx.l4_hdr_size; @@ -3371,6 +3369,7 @@ vmxnet3_probe_device(struct pci_dev *pdev, .ndo_change_mtu = vmxnet3_change_mtu, .ndo_fix_features = vmxnet3_fix_features, .ndo_set_features = vmxnet3_set_features, + .ndo_features_check = vmxnet3_features_check, .ndo_get_stats64 = vmxnet3_get_stats64, .ndo_tx_timeout = vmxnet3_tx_timeout, .ndo_set_rx_mode = vmxnet3_set_mc, diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c b/drivers/net/vmxnet3/vmxnet3_ethtool.c index def27afa1c69f..3586677920046 100644 --- a/drivers/net/vmxnet3/vmxnet3_ethtool.c +++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c @@ -267,6 +267,34 @@ netdev_features_t vmxnet3_fix_features(struct net_device *netdev, return features; }
+netdev_features_t vmxnet3_features_check(struct sk_buff *skb, + struct net_device *netdev, + netdev_features_t features) +{ + struct vmxnet3_adapter *adapter = netdev_priv(netdev); + + /* Validate if the tunneled packet is being offloaded by the device */ + if (VMXNET3_VERSION_GE_4(adapter) && + skb->encapsulation && skb->ip_summed == CHECKSUM_PARTIAL) { + u8 l4_proto = 0; + + switch (vlan_get_protocol(skb)) { + case htons(ETH_P_IP): + l4_proto = ip_hdr(skb)->protocol; + break; + case htons(ETH_P_IPV6): + l4_proto = ipv6_hdr(skb)->nexthdr; + break; + default: + return features & ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK); + } + + if (l4_proto != IPPROTO_UDP) + return features & ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK); + } + return features; +} + static void vmxnet3_enable_encap_offloads(struct net_device *netdev) { struct vmxnet3_adapter *adapter = netdev_priv(netdev); diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h index 5d2b062215a27..d958b92c94299 100644 --- a/drivers/net/vmxnet3/vmxnet3_int.h +++ b/drivers/net/vmxnet3/vmxnet3_int.h @@ -470,6 +470,10 @@ vmxnet3_rq_destroy_all(struct vmxnet3_adapter *adapter); netdev_features_t vmxnet3_fix_features(struct net_device *netdev, netdev_features_t features);
+netdev_features_t +vmxnet3_features_check(struct sk_buff *skb, + struct net_device *netdev, netdev_features_t features); + int vmxnet3_set_features(struct net_device *netdev, netdev_features_t features);
From: Wong Vee Khee vee.khee.wong@intel.com
[ Upstream commit ac322f86b56cb99d1c4224c209095aa67647c967 ]
While unloading the dwmac-intel driver, clk_disable_unprepare() is being called twice in stmmac_dvr_remove() and intel_eth_pci_remove(). This causes kernel panic on the second call.
Removing the second call of clk_disable_unprepare() in intel_eth_pci_remove().
Fixes: 09f012e64e4b ("stmmac: intel: Fix clock handling on error and remove paths") Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com Reviewed-by: Voon Weifeng weifeng.voon@intel.com Signed-off-by: Wong Vee Khee vee.khee.wong@intel.com Reviewed-by: Andy Shevchenko andy.shevchenko@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c index 2ac9dfb3462c6..9e6d60e75f85d 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c @@ -653,7 +653,6 @@ static void intel_eth_pci_remove(struct pci_dev *pdev)
pci_free_irq_vectors(pdev);
- clk_disable_unprepare(priv->plat->stmmac_clk); clk_unregister_fixed_rate(priv->plat->stmmac_clk);
pcim_iounmap_regions(pdev, BIT(0));
From: Ivan Khoronzhuk ivan.khoronzhuk@gmail.com
[ Upstream commit 4663ff60257aec4ee1e2e969a7c046f0aff35ab8 ]
To start also "phy state machine", with UP state as it should be, the phy_start() has to be used, in another case machine even is not triggered. After this change negotiation is supposed to be triggered by SM workqueue.
It's not correct usage, but it appears after the following patch, so add it as a fix.
Fixes: 74a992b3598a ("net: phy: add phy_check_link_status") Signed-off-by: Ivan Khoronzhuk ikhoronz@cisco.com Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c index cbaa1924afbe1..706e959bf02ac 100644 --- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c +++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c @@ -1219,7 +1219,7 @@ static int octeon_mgmt_open(struct net_device *netdev) */ if (netdev->phydev) { netif_carrier_off(netdev); - phy_start_aneg(netdev->phydev); + phy_start(netdev->phydev); }
netif_wake_queue(netdev); @@ -1247,8 +1247,10 @@ static int octeon_mgmt_stop(struct net_device *netdev) napi_disable(&p->napi); netif_stop_queue(netdev);
- if (netdev->phydev) + if (netdev->phydev) { + phy_stop(netdev->phydev); phy_disconnect(netdev->phydev); + }
netif_carrier_off(netdev);
From: Eric Dumazet edumazet@google.com
[ Upstream commit f32f19339596b214c208c0dba716f4b6cc4f6958 ]
syzbot managed to crash a host by creating a bond with a GRE device.
For non Ethernet device, bonding calls bond_setup_by_slave() instead of ether_setup(), and unfortunately dev->needed_headroom was not copied from the new added member.
[ 171.243095] skbuff: skb_under_panic: text:ffffffffa184b9ea len:116 put:20 head:ffff883f84012dc0 data:ffff883f84012dbc tail:0x70 end:0xd00 dev:bond0 [ 171.243111] ------------[ cut here ]------------ [ 171.243112] kernel BUG at net/core/skbuff.c:112! [ 171.243117] invalid opcode: 0000 [#1] SMP KASAN PTI [ 171.243469] gsmi: Log Shutdown Reason 0x03 [ 171.243505] Call Trace: [ 171.243506] <IRQ> [ 171.243512] [<ffffffffa171be59>] skb_push+0x49/0x50 [ 171.243516] [<ffffffffa184b9ea>] ipgre_header+0x2a/0xf0 [ 171.243520] [<ffffffffa17452d7>] neigh_connected_output+0xb7/0x100 [ 171.243524] [<ffffffffa186f1d3>] ip6_finish_output2+0x383/0x490 [ 171.243528] [<ffffffffa186ede2>] __ip6_finish_output+0xa2/0x110 [ 171.243531] [<ffffffffa186acbc>] ip6_finish_output+0x2c/0xa0 [ 171.243534] [<ffffffffa186abe9>] ip6_output+0x69/0x110 [ 171.243537] [<ffffffffa186ac90>] ? ip6_output+0x110/0x110 [ 171.243541] [<ffffffffa189d952>] mld_sendpack+0x1b2/0x2d0 [ 171.243544] [<ffffffffa189d290>] ? mld_send_report+0xf0/0xf0 [ 171.243548] [<ffffffffa189c797>] mld_ifc_timer_expire+0x2d7/0x3b0 [ 171.243551] [<ffffffffa189c4c0>] ? mld_gq_timer_expire+0x50/0x50 [ 171.243556] [<ffffffffa0fea270>] call_timer_fn+0x30/0x130 [ 171.243559] [<ffffffffa0fea17c>] expire_timers+0x4c/0x110 [ 171.243563] [<ffffffffa0fea0e3>] __run_timers+0x213/0x260 [ 171.243566] [<ffffffffa0fecb7d>] ? ktime_get+0x3d/0xa0 [ 171.243570] [<ffffffffa0ff9c4e>] ? clockevents_program_event+0x7e/0xe0 [ 171.243574] [<ffffffffa0f7e5d5>] ? sched_clock_cpu+0x15/0x190 [ 171.243577] [<ffffffffa0fe973d>] run_timer_softirq+0x1d/0x40 [ 171.243581] [<ffffffffa1c00152>] __do_softirq+0x152/0x2f0 [ 171.243585] [<ffffffffa0f44e1f>] irq_exit+0x9f/0xb0 [ 171.243588] [<ffffffffa1a02e1d>] smp_apic_timer_interrupt+0xfd/0x1a0 [ 171.243591] [<ffffffffa1a01ea6>] apic_timer_interrupt+0x86/0x90
Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 500aa3e19a4c7..fddf7c502355b 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1195,6 +1195,7 @@ static void bond_setup_by_slave(struct net_device *bond_dev,
bond_dev->type = slave_dev->type; bond_dev->hard_header_len = slave_dev->hard_header_len; + bond_dev->needed_headroom = slave_dev->needed_headroom; bond_dev->addr_len = slave_dev->addr_len;
memcpy(bond_dev->broadcast, slave_dev->broadcast,
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit 7dbbcf496f2a4b6d82cfc7810a0746e160b79762 ]
Fix build error by selecting MDIO_DEVRES for MDIO_THUNDER. Fixes this build error:
ld: drivers/net/phy/mdio-thunder.o: in function `thunder_mdiobus_pci_probe': drivers/net/phy/mdio-thunder.c:78: undefined reference to `devm_mdiobus_alloc_size'
Fixes: 379d7ac7ca31 ("phy: mdio-thunder: Add driver for Cavium Thunder SoC MDIO buses.") Reported-by: kernel test robot lkp@intel.com Signed-off-by: Randy Dunlap rdunlap@infradead.org Cc: Bartosz Golaszewski bgolaszewski@baylibre.com Cc: Andrew Lunn andrew@lunn.ch Cc: Heiner Kallweit hkallweit1@gmail.com Cc: netdev@vger.kernel.org Cc: David Daney david.daney@cavium.com Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig index e351d65533aa8..06146ae4c6d8d 100644 --- a/drivers/net/phy/Kconfig +++ b/drivers/net/phy/Kconfig @@ -217,6 +217,7 @@ config MDIO_THUNDER depends on 64BIT depends on PCI select MDIO_CAVIUM + select MDIO_DEVRES help This driver supports the MDIO interfaces found on Cavium ThunderX SoCs when the MDIO bus device appears as a PCI
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit 72865028582a678be1e05240e55d452e5c258eca ]
If mlxsw_sp_acl_tcam_group_id_get() fails, the mutex initialized earlier is not destroyed.
Fix this by initializing the mutex after calling the function. This is symmetric to mlxsw_sp_acl_tcam_group_del().
Fixes: 5ec2ee28d27b ("mlxsw: spectrum_acl: Introduce a mutex to guard region list updates") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Jiri Pirko jiri@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c index 5c020403342f9..7cccc41dd69c9 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c @@ -292,13 +292,14 @@ mlxsw_sp_acl_tcam_group_add(struct mlxsw_sp_acl_tcam *tcam, int err;
group->tcam = tcam; - mutex_init(&group->lock); INIT_LIST_HEAD(&group->region_list);
err = mlxsw_sp_acl_tcam_group_id_get(tcam, &group->id); if (err) return err;
+ mutex_init(&group->lock); + return 0; }
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 709a16be0593c08190982cfbdca6df95e6d5823b ]
Mistakenly bit 2 was set instead of bit 3 as in the vendor driver.
Fixes: a7a92cf81589 ("r8169: sync PCIe PHY init with vendor driver 8.047.01") Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/realtek/r8169_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index a5d54fa012213..fe173ea894e2c 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -2958,7 +2958,7 @@ static void rtl_hw_start_8168f_1(struct rtl8169_private *tp) { 0x08, 0x0001, 0x0002 }, { 0x09, 0x0000, 0x0080 }, { 0x19, 0x0000, 0x0224 }, - { 0x00, 0x0000, 0x0004 }, + { 0x00, 0x0000, 0x0008 }, { 0x0c, 0x3df0, 0x0200 }, };
@@ -2975,7 +2975,7 @@ static void rtl_hw_start_8411(struct rtl8169_private *tp) { 0x06, 0x00c0, 0x0020 }, { 0x0f, 0xffff, 0x5200 }, { 0x19, 0x0000, 0x0224 }, - { 0x00, 0x0000, 0x0004 }, + { 0x00, 0x0000, 0x0008 }, { 0x0c, 0x3df0, 0x0200 }, };
From: Wilken Gottwalt wilken.gottwalt@mailbox.org
[ Upstream commit 9666ea66a74adfe295cb3a8760c76e1ef70f9caf ]
Adds the missing .stop entry in the Belkin driver_info structure.
Fixes: e20bd60bf62a ("net: usb: asix88179_178a: Add support for the Belkin B2B128") Signed-off-by: Wilken Gottwalt wilken.gottwalt@mailbox.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/ax88179_178a.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c index a38e868e44d46..f0ef3706aad96 100644 --- a/drivers/net/usb/ax88179_178a.c +++ b/drivers/net/usb/ax88179_178a.c @@ -1823,6 +1823,7 @@ static const struct driver_info belkin_info = { .status = ax88179_status, .link_reset = ax88179_link_reset, .reset = ax88179_reset, + .stop = ax88179_stop, .flags = FLAG_ETHER | FLAG_FRAMING_AX, .rx_fixup = ax88179_rx_fixup, .tx_fixup = ax88179_tx_fixup,
From: Tonghao Zhang xiangxia.m.yue@gmail.com
[ Upstream commit 1a03b8a35a957f9f38ecb8a97443b7380bbf6a8b ]
Open vSwitch and Linux bridge will disable LRO of the interface when this interface added to them. Now when disable the LRO, the virtio-net csum is disable too. That drops the forwarding performance.
Fixes: a02e8964eaf9 ("virtio-net: ethtool configurable LRO") Cc: Michael S. Tsirkin mst@redhat.com Cc: Jason Wang jasowang@redhat.com Cc: Willem de Bruijn willemb@google.com Signed-off-by: Tonghao Zhang xiangxia.m.yue@gmail.com Acked-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/virtio_net.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index ba38765dc4905..c34927b1d806e 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -63,6 +63,11 @@ static const unsigned long guest_offloads[] = { VIRTIO_NET_F_GUEST_CSUM };
+#define GUEST_OFFLOAD_LRO_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \ + (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \ + (1ULL << VIRTIO_NET_F_GUEST_ECN) | \ + (1ULL << VIRTIO_NET_F_GUEST_UFO)) + struct virtnet_stat_desc { char desc[ETH_GSTRING_LEN]; size_t offset; @@ -2547,7 +2552,8 @@ static int virtnet_set_features(struct net_device *dev, if (features & NETIF_F_LRO) offloads = vi->guest_offloads_capable; else - offloads = 0; + offloads = vi->guest_offloads_capable & + ~GUEST_OFFLOAD_LRO_MASK;
err = virtnet_set_guest_offloads(vi, offloads); if (err)
From: Willy Liu willy.liu@realtek.com
[ Upstream commit bbc4d71d63549bcd003a430de18a72a742d8c91e ]
There are two chip pins named TXDLY and RXDLY which actually adds the 2ns delays to TXC and RXC for TXD/RXD latching. These two pins can config via 4.7k-ohm resistor to 3.3V hw setting, but also config via software setting (extension page 0xa4 register 0x1c bit13 12 and 11).
The configuration register definitions from table 13 official PHY datasheet: PHYAD[2:0] = PHY Address AN[1:0] = Auto-Negotiation Mode = Interface Mode Select RX Delay = RX Delay TX Delay = TX Delay SELRGV = RGMII/GMII Selection
This table describes how to config these hw pins via external pull-high or pull- low resistor.
It is a misunderstanding that mapping it as register bits below: 8:6 = PHY Address 5:4 = Auto-Negotiation 3 = Interface Mode Select 2 = RX Delay 1 = TX Delay 0 = SELRGV So I removed these descriptions above and add related settings as below: 14 = reserved 13 = force Tx RX Delay controlled by bit12 bit11 12 = Tx Delay 11 = Rx Delay 10:0 = Test && debug settings reserved by realtek
Test && debug settings are not recommend to modify by default.
Fixes: f81dadbcf7fd ("net: phy: realtek: Add rtl8211e rx/tx delays config") Signed-off-by: Willy Liu willy.liu@realtek.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/realtek.c | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c index c7229d022a27b..48ba757046cea 100644 --- a/drivers/net/phy/realtek.c +++ b/drivers/net/phy/realtek.c @@ -1,6 +1,5 @@ // SPDX-License-Identifier: GPL-2.0+ -/* - * drivers/net/phy/realtek.c +/* drivers/net/phy/realtek.c * * Driver for Realtek PHYs * @@ -32,9 +31,9 @@ #define RTL8211F_TX_DELAY BIT(8) #define RTL8211F_RX_DELAY BIT(3)
-#define RTL8211E_TX_DELAY BIT(1) -#define RTL8211E_RX_DELAY BIT(2) -#define RTL8211E_MODE_MII_GMII BIT(3) +#define RTL8211E_CTRL_DELAY BIT(13) +#define RTL8211E_TX_DELAY BIT(12) +#define RTL8211E_RX_DELAY BIT(11)
#define RTL8201F_ISR 0x1e #define RTL8201F_IER 0x13 @@ -246,16 +245,16 @@ static int rtl8211e_config_init(struct phy_device *phydev) /* enable TX/RX delay for rgmii-* modes, and disable them for rgmii. */ switch (phydev->interface) { case PHY_INTERFACE_MODE_RGMII: - val = 0; + val = RTL8211E_CTRL_DELAY | 0; break; case PHY_INTERFACE_MODE_RGMII_ID: - val = RTL8211E_TX_DELAY | RTL8211E_RX_DELAY; + val = RTL8211E_CTRL_DELAY | RTL8211E_TX_DELAY | RTL8211E_RX_DELAY; break; case PHY_INTERFACE_MODE_RGMII_RXID: - val = RTL8211E_RX_DELAY; + val = RTL8211E_CTRL_DELAY | RTL8211E_RX_DELAY; break; case PHY_INTERFACE_MODE_RGMII_TXID: - val = RTL8211E_TX_DELAY; + val = RTL8211E_CTRL_DELAY | RTL8211E_TX_DELAY; break; default: /* the rest of the modes imply leaving delays as is. */ return 0; @@ -263,11 +262,12 @@ static int rtl8211e_config_init(struct phy_device *phydev)
/* According to a sample driver there is a 0x1c config register on the * 0xa4 extension page (0x7) layout. It can be used to disable/enable - * the RX/TX delays otherwise controlled by RXDLY/TXDLY pins. It can - * also be used to customize the whole configuration register: - * 8:6 = PHY Address, 5:4 = Auto-Negotiation, 3 = Interface Mode Select, - * 2 = RX Delay, 1 = TX Delay, 0 = SELRGV (see original PHY datasheet - * for details). + * the RX/TX delays otherwise controlled by RXDLY/TXDLY pins. + * The configuration register definition: + * 14 = reserved + * 13 = Force Tx RX Delay controlled by bit12 bit11, + * 12 = RX Delay, 11 = TX Delay + * 10:0 = Test && debug settings reserved by realtek */ oldpage = phy_select_page(phydev, 0x7); if (oldpage < 0) @@ -277,7 +277,8 @@ static int rtl8211e_config_init(struct phy_device *phydev) if (ret) goto err_restore_page;
- ret = __phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY, + ret = __phy_modify(phydev, 0x1c, RTL8211E_CTRL_DELAY + | RTL8211E_TX_DELAY | RTL8211E_RX_DELAY, val);
err_restore_page:
From: Subbaraya Sundeep sbhatta@marvell.com
[ Upstream commit e154b5b70368a84a19505a0be9b0096c66562b56 ]
Packet replication feature present in Octeontx2 is a hardware linked list of PF and its VF interfaces so that broadcast packets are sent to all interfaces present in the list. It is driver job to add and delete a PF/VF interface to/from the list when the interface is brought up and down. This patch fixes the npc_enadis_default_entries function to handle broadcast replication properly if packet replication feature is present.
Fixes: 40df309e4166 ("octeontx2-af: Support to enable/disable default MCAM entries") Signed-off-by: Subbaraya Sundeep sbhatta@marvell.com Signed-off-by: Geetha sowjanya gakula@marvell.com Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/marvell/octeontx2/af/rvu.h | 3 ++- .../ethernet/marvell/octeontx2/af/rvu_nix.c | 5 ++-- .../ethernet/marvell/octeontx2/af/rvu_npc.c | 26 ++++++++++++++----- 3 files changed, 23 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h index dcf25a0920084..b89dde2c8b089 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h @@ -463,6 +463,7 @@ void rvu_nix_freemem(struct rvu *rvu); int rvu_get_nixlf_count(struct rvu *rvu); void rvu_nix_lf_teardown(struct rvu *rvu, u16 pcifunc, int blkaddr, int npalf); int nix_get_nixlf(struct rvu *rvu, u16 pcifunc, int *nixlf, int *nix_blkaddr); +int nix_update_bcast_mce_list(struct rvu *rvu, u16 pcifunc, bool add);
/* NPC APIs */ int rvu_npc_init(struct rvu *rvu); @@ -477,7 +478,7 @@ void rvu_npc_disable_promisc_entry(struct rvu *rvu, u16 pcifunc, int nixlf); void rvu_npc_enable_promisc_entry(struct rvu *rvu, u16 pcifunc, int nixlf); void rvu_npc_install_bcast_match_entry(struct rvu *rvu, u16 pcifunc, int nixlf, u64 chan); -void rvu_npc_disable_bcast_entry(struct rvu *rvu, u16 pcifunc); +void rvu_npc_enable_bcast_entry(struct rvu *rvu, u16 pcifunc, bool enable); int rvu_npc_update_rxvlan(struct rvu *rvu, u16 pcifunc, int nixlf); void rvu_npc_disable_mcam_entries(struct rvu *rvu, u16 pcifunc, int nixlf); void rvu_npc_disable_default_entries(struct rvu *rvu, u16 pcifunc, int nixlf); diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c index 36953d4f51c73..3495b3a6828c0 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c @@ -17,7 +17,6 @@ #include "npc.h" #include "cgx.h"
-static int nix_update_bcast_mce_list(struct rvu *rvu, u16 pcifunc, bool add); static int rvu_nix_get_bpid(struct rvu *rvu, struct nix_bp_cfg_req *req, int type, int chan_id);
@@ -2020,7 +2019,7 @@ static int nix_update_mce_list(struct nix_mce_list *mce_list, return 0; }
-static int nix_update_bcast_mce_list(struct rvu *rvu, u16 pcifunc, bool add) +int nix_update_bcast_mce_list(struct rvu *rvu, u16 pcifunc, bool add) { int err = 0, idx, next_idx, last_idx; struct nix_mce_list *mce_list; @@ -2065,7 +2064,7 @@ static int nix_update_bcast_mce_list(struct rvu *rvu, u16 pcifunc, bool add)
/* Disable MCAM entry in NPC */ if (!mce_list->count) { - rvu_npc_disable_bcast_entry(rvu, pcifunc); + rvu_npc_enable_bcast_entry(rvu, pcifunc, false); goto end; }
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c index 0a214084406a6..fbaf9bcd83f2f 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c @@ -530,7 +530,7 @@ void rvu_npc_install_bcast_match_entry(struct rvu *rvu, u16 pcifunc, NIX_INTF_RX, &entry, true); }
-void rvu_npc_disable_bcast_entry(struct rvu *rvu, u16 pcifunc) +void rvu_npc_enable_bcast_entry(struct rvu *rvu, u16 pcifunc, bool enable) { struct npc_mcam *mcam = &rvu->hw->mcam; int blkaddr, index; @@ -543,7 +543,7 @@ void rvu_npc_disable_bcast_entry(struct rvu *rvu, u16 pcifunc) pcifunc = pcifunc & ~RVU_PFVF_FUNC_MASK;
index = npc_get_nixlf_mcam_index(mcam, pcifunc, 0, NIXLF_BCAST_ENTRY); - npc_enable_mcam_entry(rvu, mcam, blkaddr, index, false); + npc_enable_mcam_entry(rvu, mcam, blkaddr, index, enable); }
void rvu_npc_update_flowkey_alg_idx(struct rvu *rvu, u16 pcifunc, int nixlf, @@ -622,23 +622,35 @@ static void npc_enadis_default_entries(struct rvu *rvu, u16 pcifunc, nixlf, NIXLF_UCAST_ENTRY); npc_enable_mcam_entry(rvu, mcam, blkaddr, index, enable);
- /* For PF, ena/dis promisc and bcast MCAM match entries */ - if (pcifunc & RVU_PFVF_FUNC_MASK) + /* For PF, ena/dis promisc and bcast MCAM match entries. + * For VFs add/delete from bcast list when RX multicast + * feature is present. + */ + if (pcifunc & RVU_PFVF_FUNC_MASK && !rvu->hw->cap.nix_rx_multicast) return;
/* For bcast, enable/disable only if it's action is not * packet replication, incase if action is replication - * then this PF's nixlf is removed from bcast replication + * then this PF/VF's nixlf is removed from bcast replication * list. */ - index = npc_get_nixlf_mcam_index(mcam, pcifunc, + index = npc_get_nixlf_mcam_index(mcam, pcifunc & ~RVU_PFVF_FUNC_MASK, nixlf, NIXLF_BCAST_ENTRY); bank = npc_get_bank(mcam, index); *(u64 *)&action = rvu_read64(rvu, blkaddr, NPC_AF_MCAMEX_BANKX_ACTION(index & (mcam->banksize - 1), bank)); - if (action.op != NIX_RX_ACTIONOP_MCAST) + + /* VFs will not have BCAST entry */ + if (action.op != NIX_RX_ACTIONOP_MCAST && + !(pcifunc & RVU_PFVF_FUNC_MASK)) { npc_enable_mcam_entry(rvu, mcam, blkaddr, index, enable); + } else { + nix_update_bcast_mce_list(rvu, pcifunc, enable); + /* Enable PF's BCAST entry for packet replication */ + rvu_npc_enable_bcast_entry(rvu, pcifunc, enable); + } + if (enable) rvu_npc_enable_promisc_entry(rvu, pcifunc, nixlf); else
From: Geetha sowjanya gakula@marvell.com
[ Upstream commit 89eae5e87b4fa799726a3e8911c90d418cb5d2b1 ]
For TCP/UDP checksum offload feature in Octeontx2 expects L3TYPE to be set irrespective of IP header checksum is being offloaded or not. Currently for IPv6 frames L3TYPE is not being set resulting in packet drop with checksum error. This patch fixes this issue.
Fixes: 3ca6c4c88 ("octeontx2-pf: Add packet transmission support") Signed-off-by: Geetha sowjanya gakula@marvell.com Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c index b04f5429d72d9..334eab976ee4a 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c @@ -524,6 +524,7 @@ static void otx2_sqe_add_hdr(struct otx2_nic *pfvf, struct otx2_snd_queue *sq, sqe_hdr->ol3type = NIX_SENDL3TYPE_IP4_CKSUM; } else if (skb->protocol == htons(ETH_P_IPV6)) { proto = ipv6_hdr(skb)->nexthdr; + sqe_hdr->ol3type = NIX_SENDL3TYPE_IP6; }
if (proto == IPPROTO_TCP)
From: Hariprasad Kelam hkelam@marvell.com
[ Upstream commit 1ea0166da0509e987caa42c30a6a71f2c6ca1875 ]
Currently in otx2_open on failure of nix_lf_start transmit queues are not stopped which are already started in link_event. Since the tx queues are not stopped network stack still try's to send the packets leading to driver crash while access the device resources.
Fixes: 50fe6c02e ("octeontx2-pf: Register and handle link notifications") Signed-off-by: Hariprasad Kelam hkelam@marvell.com Signed-off-by: Geetha sowjanya gakula@marvell.com Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c index 75a8c407e815c..5d620a39ea802 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c @@ -1560,10 +1560,13 @@ int otx2_open(struct net_device *netdev)
err = otx2_rxtx_enable(pf, true); if (err) - goto err_free_cints; + goto err_tx_stop_queues;
return 0;
+err_tx_stop_queues: + netif_tx_stop_all_queues(netdev); + netif_carrier_off(netdev); err_free_cints: otx2_free_cints(pf, qidx); vec = pci_irq_vector(pf->pdev,
From: Hariprasad Kelam hkelam@marvell.com
[ Upstream commit 66a5209b53418111757716d71e52727b782eabd4 ]
Mbox implementation in octeontx2 driver has three states alloc, send and reset in mbox response. VF allocate and sends message to PF for processing, PF ACKs them back and reset the mbox memory. In some case we see synchronization issue where after msgs_acked is incremented and before mbox_reset API is called, if current execution is scheduled out and a different thread is scheduled in which checks for msgs_acked. Since the new thread sees msgs_acked == msgs_sent it will try to allocate a new message and to send a new mbox message to PF.Now if mbox_reset is scheduled in, PF will see '0' in msgs_send. This patch fixes the issue by calling mbox_reset before incrementing msgs_acked flag for last processing message and checks for valid message size.
Fixes: d424b6c02 ("octeontx2-pf: Enable SRIOV and added VF mbox handling") Signed-off-by: Hariprasad Kelam hkelam@marvell.com Signed-off-by: Geetha sowjanya gakula@marvell.com Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeontx2/af/mbox.c | 12 ++++++++++-- drivers/net/ethernet/marvell/octeontx2/af/mbox.h | 1 + drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 11 ++++++----- drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c | 4 ++-- 4 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.c b/drivers/net/ethernet/marvell/octeontx2/af/mbox.c index 387e33fa417aa..2718fe201c147 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.c @@ -17,7 +17,7 @@
static const u16 msgs_offset = ALIGN(sizeof(struct mbox_hdr), MBOX_MSG_ALIGN);
-void otx2_mbox_reset(struct otx2_mbox *mbox, int devid) +void __otx2_mbox_reset(struct otx2_mbox *mbox, int devid) { void *hw_mbase = mbox->hwbase + (devid * MBOX_SIZE); struct otx2_mbox_dev *mdev = &mbox->dev[devid]; @@ -26,13 +26,21 @@ void otx2_mbox_reset(struct otx2_mbox *mbox, int devid) tx_hdr = hw_mbase + mbox->tx_start; rx_hdr = hw_mbase + mbox->rx_start;
- spin_lock(&mdev->mbox_lock); mdev->msg_size = 0; mdev->rsp_size = 0; tx_hdr->num_msgs = 0; tx_hdr->msg_size = 0; rx_hdr->num_msgs = 0; rx_hdr->msg_size = 0; +} +EXPORT_SYMBOL(__otx2_mbox_reset); + +void otx2_mbox_reset(struct otx2_mbox *mbox, int devid) +{ + struct otx2_mbox_dev *mdev = &mbox->dev[devid]; + + spin_lock(&mdev->mbox_lock); + __otx2_mbox_reset(mbox, devid); spin_unlock(&mdev->mbox_lock); } EXPORT_SYMBOL(otx2_mbox_reset); diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h index 6dfd0f90cd704..ab433789d2c31 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h @@ -93,6 +93,7 @@ struct mbox_msghdr { };
void otx2_mbox_reset(struct otx2_mbox *mbox, int devid); +void __otx2_mbox_reset(struct otx2_mbox *mbox, int devid); void otx2_mbox_destroy(struct otx2_mbox *mbox); int otx2_mbox_init(struct otx2_mbox *mbox, void __force *hwbase, struct pci_dev *pdev, void __force *reg_base, diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c index 5d620a39ea802..2fb45670aca49 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c @@ -370,8 +370,8 @@ static int otx2_forward_vf_mbox_msgs(struct otx2_nic *pf, dst_mbox = &pf->mbox; dst_size = dst_mbox->mbox.tx_size - ALIGN(sizeof(*mbox_hdr), MBOX_MSG_ALIGN); - /* Check if msgs fit into destination area */ - if (mbox_hdr->msg_size > dst_size) + /* Check if msgs fit into destination area and has valid size */ + if (mbox_hdr->msg_size > dst_size || !mbox_hdr->msg_size) return -EINVAL;
dst_mdev = &dst_mbox->mbox.dev[0]; @@ -526,10 +526,10 @@ static void otx2_pfvf_mbox_up_handler(struct work_struct *work)
end: offset = mbox->rx_start + msg->next_msgoff; + if (mdev->msgs_acked == (vf_mbox->up_num_msgs - 1)) + __otx2_mbox_reset(mbox, 0); mdev->msgs_acked++; } - - otx2_mbox_reset(mbox, vf_idx); }
static irqreturn_t otx2_pfvf_mbox_intr_handler(int irq, void *pf_irq) @@ -803,10 +803,11 @@ static void otx2_pfaf_mbox_handler(struct work_struct *work) msg = (struct mbox_msghdr *)(mdev->mbase + offset); otx2_process_pfaf_mbox_msg(pf, msg); offset = mbox->rx_start + msg->next_msgoff; + if (mdev->msgs_acked == (af_mbox->num_msgs - 1)) + __otx2_mbox_reset(mbox, 0); mdev->msgs_acked++; }
- otx2_mbox_reset(mbox, 0); }
static void otx2_handle_link_event(struct otx2_nic *pf) diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c index 92a3db69a6cd6..2f90f17214415 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c @@ -99,10 +99,10 @@ static void otx2vf_vfaf_mbox_handler(struct work_struct *work) msg = (struct mbox_msghdr *)(mdev->mbase + offset); otx2vf_process_vfaf_mbox_msg(af_mbox->pfvf, msg); offset = mbox->rx_start + msg->next_msgoff; + if (mdev->msgs_acked == (af_mbox->num_msgs - 1)) + __otx2_mbox_reset(mbox, 0); mdev->msgs_acked++; } - - otx2_mbox_reset(mbox, 0); }
static int otx2vf_process_mbox_msg_up(struct otx2_nic *vf,
From: Qian Cai cai@redhat.com
[ Upstream commit 8a018eb55e3ac033592afbcb476b0ffe64465b12 ]
Calling pipe2() with O_NOTIFICATION_PIPE could results in memory leaks unless watch_queue_init() is successful.
In case of watch_queue_init() failure in pipe2() we are left with inode and pipe_inode_info instances that need to be freed. That failure exit has been introduced in commit c73be61cede5 ("pipe: Add general notification queue support") and its handling should've been identical to nearby treatment of alloc_file_pseudo() failures - it is dealing with the same situation. As it is, the mainline kernel leaks in that case.
Another problem is that CONFIG_WATCH_QUEUE and !CONFIG_WATCH_QUEUE cases are treated differently (and the former leaks just pipe_inode_info, the latter - both pipe_inode_info and inode).
Fixed by providing a dummy wacth_queue_init() in !CONFIG_WATCH_QUEUE case and by having failures of wacth_queue_init() handled the same way we handle alloc_file_pseudo() ones.
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Signed-off-by: Qian Cai cai@redhat.com Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/pipe.c | 11 +++++------ include/linux/watch_queue.h | 6 ++++++ 2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/pipe.c b/fs/pipe.c index 117db82b10af5..0ac197658a2d6 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -894,19 +894,18 @@ int create_pipe_files(struct file **res, int flags) { struct inode *inode = get_pipe_inode(); struct file *f; + int error;
if (!inode) return -ENFILE;
if (flags & O_NOTIFICATION_PIPE) { -#ifdef CONFIG_WATCH_QUEUE - if (watch_queue_init(inode->i_pipe) < 0) { + error = watch_queue_init(inode->i_pipe); + if (error) { + free_pipe_info(inode->i_pipe); iput(inode); - return -ENOMEM; + return error; } -#else - return -ENOPKG; -#endif }
f = alloc_file_pseudo(inode, pipe_mnt, "", diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h index 5e08db2adc319..c994d1b2cdbaa 100644 --- a/include/linux/watch_queue.h +++ b/include/linux/watch_queue.h @@ -122,6 +122,12 @@ static inline void remove_watch_list(struct watch_list *wlist, u64 id) */ #define watch_sizeof(STRUCT) (sizeof(STRUCT) << WATCH_INFO_LENGTH__SHIFT)
+#else +static inline int watch_queue_init(struct pipe_inode_info *pipe) +{ + return -ENOPKG; +} + #endif
#endif /* _LINUX_WATCH_QUEUE_H */
From: Eran Ben Elisha eranbe@mellanox.com
[ Upstream commit 432161ea26d6d5e5c3f7306d9407d26ed1e1953e ]
As part of driver unload, it destroys the commands EQ (via FW command). As the commands EQ is destroyed, FW will not generate EQEs for any command that driver sends afterwards. Driver should poll for later commands status.
Driver commands mode metadata is updated before the commands EQ is actually destroyed. This can lead for double completion handle by the driver (polling and interrupt), if a command is executed and completed by FW after the mode was changed, but before the EQ was destroyed.
Fix that by using the mlx5_cmd_allowed_opcode mechanism to guarantee that only DESTROY_EQ command can be executed during this time period.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha eranbe@mellanox.com Reviewed-by: Moshe Shemesh moshe@mellanox.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c index 31ef9f8420c87..1318d774b18f2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c @@ -656,8 +656,10 @@ static void destroy_async_eqs(struct mlx5_core_dev *dev)
cleanup_async_eq(dev, &table->pages_eq, "pages"); cleanup_async_eq(dev, &table->async_eq, "async"); + mlx5_cmd_allowed_opcode(dev, MLX5_CMD_OP_DESTROY_EQ); mlx5_cmd_use_polling(dev); cleanup_async_eq(dev, &table->cmd_eq, "cmd"); + mlx5_cmd_allowed_opcode(dev, CMD_ALLOWED_OPCODE_ALL); mlx5_eq_notifier_unregister(dev, &table->cq_err_nb); }
From: Eran Ben Elisha eranbe@mellanox.com
[ Upstream commit 50b2412b7e7862c5af0cbf4b10d93bc5c712d021 ]
Upon command completion timeout, driver simulates a forced command completion. In a rare case where real interrupt for that command arrives simultaneously, it might release the command entry while the forced handler might still access it.
Fix that by adding an entry refcount, to track current amount of allowed handlers. Command entry to be released only when this refcount is decremented to zero.
Command refcount is always initialized to one. For callback commands, command completion handler is the symmetric flow to decrement it. For non-callback commands, it is wait_func().
Before ringing the doorbell, increment the refcount for the real completion handler. Once the real completion handler is called, it will decrement it.
For callback commands, once the delayed work is scheduled, increment the refcount. Upon callback command completion handler, we will try to cancel the timeout callback. In case of success, we need to decrement the callback refcount as it will never run.
In addition, gather the entry index free and the entry free into a one flow for all command types release.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha eranbe@mellanox.com Reviewed-by: Moshe Shemesh moshe@mellanox.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 109 ++++++++++++------ include/linux/mlx5/driver.h | 2 + 2 files changed, 73 insertions(+), 38 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index 1d91a0d0ab1d7..c0055f5479ce0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -69,12 +69,10 @@ enum { MLX5_CMD_DELIVERY_STAT_CMD_DESCR_ERR = 0x10, };
-static struct mlx5_cmd_work_ent *alloc_cmd(struct mlx5_cmd *cmd, - struct mlx5_cmd_msg *in, - struct mlx5_cmd_msg *out, - void *uout, int uout_size, - mlx5_cmd_cbk_t cbk, - void *context, int page_queue) +static struct mlx5_cmd_work_ent * +cmd_alloc_ent(struct mlx5_cmd *cmd, struct mlx5_cmd_msg *in, + struct mlx5_cmd_msg *out, void *uout, int uout_size, + mlx5_cmd_cbk_t cbk, void *context, int page_queue) { gfp_t alloc_flags = cbk ? GFP_ATOMIC : GFP_KERNEL; struct mlx5_cmd_work_ent *ent; @@ -83,6 +81,7 @@ static struct mlx5_cmd_work_ent *alloc_cmd(struct mlx5_cmd *cmd, if (!ent) return ERR_PTR(-ENOMEM);
+ ent->idx = -EINVAL; ent->in = in; ent->out = out; ent->uout = uout; @@ -91,10 +90,16 @@ static struct mlx5_cmd_work_ent *alloc_cmd(struct mlx5_cmd *cmd, ent->context = context; ent->cmd = cmd; ent->page_queue = page_queue; + refcount_set(&ent->refcnt, 1);
return ent; }
+static void cmd_free_ent(struct mlx5_cmd_work_ent *ent) +{ + kfree(ent); +} + static u8 alloc_token(struct mlx5_cmd *cmd) { u8 token; @@ -109,7 +114,7 @@ static u8 alloc_token(struct mlx5_cmd *cmd) return token; }
-static int alloc_ent(struct mlx5_cmd *cmd) +static int cmd_alloc_index(struct mlx5_cmd *cmd) { unsigned long flags; int ret; @@ -123,7 +128,7 @@ static int alloc_ent(struct mlx5_cmd *cmd) return ret < cmd->max_reg_cmds ? ret : -ENOMEM; }
-static void free_ent(struct mlx5_cmd *cmd, int idx) +static void cmd_free_index(struct mlx5_cmd *cmd, int idx) { unsigned long flags;
@@ -132,6 +137,22 @@ static void free_ent(struct mlx5_cmd *cmd, int idx) spin_unlock_irqrestore(&cmd->alloc_lock, flags); }
+static void cmd_ent_get(struct mlx5_cmd_work_ent *ent) +{ + refcount_inc(&ent->refcnt); +} + +static void cmd_ent_put(struct mlx5_cmd_work_ent *ent) +{ + if (!refcount_dec_and_test(&ent->refcnt)) + return; + + if (ent->idx >= 0) + cmd_free_index(ent->cmd, ent->idx); + + cmd_free_ent(ent); +} + static struct mlx5_cmd_layout *get_inst(struct mlx5_cmd *cmd, int idx) { return cmd->cmd_buf + (idx << cmd->log_stride); @@ -219,11 +240,6 @@ static void poll_timeout(struct mlx5_cmd_work_ent *ent) ent->ret = -ETIMEDOUT; }
-static void free_cmd(struct mlx5_cmd_work_ent *ent) -{ - kfree(ent); -} - static int verify_signature(struct mlx5_cmd_work_ent *ent) { struct mlx5_cmd_mailbox *next = ent->out->next; @@ -842,6 +858,7 @@ static void cb_timeout_handler(struct work_struct *work) mlx5_command_str(msg_to_opcode(ent->in)), msg_to_opcode(ent->in)); mlx5_cmd_comp_handler(dev, 1UL << ent->idx, true); + cmd_ent_put(ent); /* for the cmd_ent_get() took on schedule delayed work */ }
static void free_msg(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *msg); @@ -873,14 +890,14 @@ static void cmd_work_handler(struct work_struct *work) sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem; down(sem); if (!ent->page_queue) { - alloc_ret = alloc_ent(cmd); + alloc_ret = cmd_alloc_index(cmd); if (alloc_ret < 0) { mlx5_core_err_rl(dev, "failed to allocate command entry\n"); if (ent->callback) { ent->callback(-EAGAIN, ent->context); mlx5_free_cmd_msg(dev, ent->out); free_msg(dev, ent->in); - free_cmd(ent); + cmd_ent_put(ent); } else { ent->ret = -EAGAIN; complete(&ent->done); @@ -916,8 +933,8 @@ static void cmd_work_handler(struct work_struct *work) ent->ts1 = ktime_get_ns(); cmd_mode = cmd->mode;
- if (ent->callback) - schedule_delayed_work(&ent->cb_timeout_work, cb_timeout); + if (ent->callback && schedule_delayed_work(&ent->cb_timeout_work, cb_timeout)) + cmd_ent_get(ent); set_bit(MLX5_CMD_ENT_STATE_PENDING_COMP, &ent->state);
/* Skip sending command to fw if internal error */ @@ -933,13 +950,10 @@ static void cmd_work_handler(struct work_struct *work) MLX5_SET(mbox_out, ent->out, syndrome, drv_synd);
mlx5_cmd_comp_handler(dev, 1UL << ent->idx, true); - /* no doorbell, no need to keep the entry */ - free_ent(cmd, ent->idx); - if (ent->callback) - free_cmd(ent); return; }
+ cmd_ent_get(ent); /* for the _real_ FW event on completion */ /* ring doorbell after the descriptor is valid */ mlx5_core_dbg(dev, "writing 0x%x to command doorbell\n", 1 << ent->idx); wmb(); @@ -1039,11 +1053,16 @@ static int mlx5_cmd_invoke(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *in, if (callback && page_queue) return -EINVAL;
- ent = alloc_cmd(cmd, in, out, uout, uout_size, callback, context, - page_queue); + ent = cmd_alloc_ent(cmd, in, out, uout, uout_size, + callback, context, page_queue); if (IS_ERR(ent)) return PTR_ERR(ent);
+ /* put for this ent is when consumed, depending on the use case + * 1) (!callback) blocking flow: by caller after wait_func completes + * 2) (callback) flow: by mlx5_cmd_comp_handler() when ent is handled + */ + ent->token = token; ent->polling = force_polling;
@@ -1062,12 +1081,10 @@ static int mlx5_cmd_invoke(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *in, }
if (callback) - goto out; + goto out; /* mlx5_cmd_comp_handler() will put(ent) */
err = wait_func(dev, ent); - if (err == -ETIMEDOUT) - goto out; - if (err == -ECANCELED) + if (err == -ETIMEDOUT || err == -ECANCELED) goto out_free;
ds = ent->ts2 - ent->ts1; @@ -1085,7 +1102,7 @@ static int mlx5_cmd_invoke(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *in, *status = ent->status;
out_free: - free_cmd(ent); + cmd_ent_put(ent); out: return err; } @@ -1516,14 +1533,19 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force if (!forced) { mlx5_core_err(dev, "Command completion arrived after timeout (entry idx = %d).\n", ent->idx); - free_ent(cmd, ent->idx); - free_cmd(ent); + cmd_ent_put(ent); } continue; }
- if (ent->callback) - cancel_delayed_work(&ent->cb_timeout_work); + if (ent->callback && cancel_delayed_work(&ent->cb_timeout_work)) + cmd_ent_put(ent); /* timeout work was canceled */ + + if (!forced || /* Real FW completion */ + pci_channel_offline(dev->pdev) || /* FW is inaccessible */ + dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) + cmd_ent_put(ent); + if (ent->page_queue) sem = &cmd->pages_sem; else @@ -1545,10 +1567,6 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force ent->ret, deliv_status_to_str(ent->status), ent->status); }
- /* only real completion will free the entry slot */ - if (!forced) - free_ent(cmd, ent->idx); - if (ent->callback) { ds = ent->ts2 - ent->ts1; if (ent->op < MLX5_CMD_OP_MAX) { @@ -1576,10 +1594,13 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force free_msg(dev, ent->in);
err = err ? err : ent->status; - if (!forced) - free_cmd(ent); + /* final consumer is done, release ent */ + cmd_ent_put(ent); callback(err, context); } else { + /* release wait_func() so mlx5_cmd_invoke() + * can make the final ent_put() + */ complete(&ent->done); } up(sem); @@ -1589,8 +1610,11 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force
void mlx5_cmd_trigger_completions(struct mlx5_core_dev *dev) { + struct mlx5_cmd *cmd = &dev->cmd; + unsigned long bitmask; unsigned long flags; u64 vector; + int i;
/* wait for pending handlers to complete */ mlx5_eq_synchronize_cmd_irq(dev); @@ -1599,11 +1623,20 @@ void mlx5_cmd_trigger_completions(struct mlx5_core_dev *dev) if (!vector) goto no_trig;
+ bitmask = vector; + /* we must increment the allocated entries refcount before triggering the completions + * to guarantee pending commands will not get freed in the meanwhile. + * For that reason, it also has to be done inside the alloc_lock. + */ + for_each_set_bit(i, &bitmask, (1 << cmd->log_sz)) + cmd_ent_get(cmd->ent_arr[i]); vector |= MLX5_TRIGGERED_CMD_COMP; spin_unlock_irqrestore(&dev->cmd.alloc_lock, flags);
mlx5_core_dbg(dev, "vector 0x%llx\n", vector); mlx5_cmd_comp_handler(dev, vector, true); + for_each_set_bit(i, &bitmask, (1 << cmd->log_sz)) + cmd_ent_put(cmd->ent_arr[i]); return;
no_trig: diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 1e6ca716635a9..484cd8ba869c5 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -764,6 +764,8 @@ struct mlx5_cmd_work_ent { u64 ts2; u16 op; bool polling; + /* Track the max comp handlers */ + refcount_t refcnt; };
struct mlx5_pas {
From: Eran Ben Elisha eranbe@mellanox.com
[ Upstream commit 1d5558b1f0de81f54ddee05f3793acc5260d107f ]
Once driver detects a command interface command timeout, it warns the user and returns timeout error to the caller. In such case, the entry of the command is not evacuated (because only real event interrupt is allowed to clear command interface entry). If the HW event interrupt of this entry will never arrive, this entry will be left unused forever. Command interface entries are limited and eventually we can end up without the ability to post a new command.
In addition, if driver will not consume the EQE of the lost interrupt and rearm the EQ, no new interrupts will arrive for other commands.
Add a resiliency mechanism for manually polling the command EQ in case of a command timeout. In case resiliency mechanism will find non-handled EQE, it will consume it, and the command interface will be fully functional again. Once the resiliency flow finished, wait another 5 seconds for the command interface to complete for this command entry.
Define mlx5_cmd_eq_recover() to manage the cmd EQ polling resiliency flow. Add an async EQ spinlock to avoid races between resiliency flows and real interrupts that might run simultaneously.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha eranbe@mellanox.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 53 ++++++++++++++++--- drivers/net/ethernet/mellanox/mlx5/core/eq.c | 40 +++++++++++++- .../net/ethernet/mellanox/mlx5/core/lib/eq.h | 2 + 3 files changed, 86 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index c0055f5479ce0..37dae95e61d5f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -853,11 +853,21 @@ static void cb_timeout_handler(struct work_struct *work) struct mlx5_core_dev *dev = container_of(ent->cmd, struct mlx5_core_dev, cmd);
+ mlx5_cmd_eq_recover(dev); + + /* Maybe got handled by eq recover ? */ + if (!test_bit(MLX5_CMD_ENT_STATE_PENDING_COMP, &ent->state)) { + mlx5_core_warn(dev, "cmd[%d]: %s(0x%x) Async, recovered after timeout\n", ent->idx, + mlx5_command_str(msg_to_opcode(ent->in)), msg_to_opcode(ent->in)); + goto out; /* phew, already handled */ + } + ent->ret = -ETIMEDOUT; - mlx5_core_warn(dev, "%s(0x%x) timeout. Will cause a leak of a command resource\n", - mlx5_command_str(msg_to_opcode(ent->in)), - msg_to_opcode(ent->in)); + mlx5_core_warn(dev, "cmd[%d]: %s(0x%x) Async, timeout. Will cause a leak of a command resource\n", + ent->idx, mlx5_command_str(msg_to_opcode(ent->in)), msg_to_opcode(ent->in)); mlx5_cmd_comp_handler(dev, 1UL << ent->idx, true); + +out: cmd_ent_put(ent); /* for the cmd_ent_get() took on schedule delayed work */ }
@@ -997,6 +1007,35 @@ static const char *deliv_status_to_str(u8 status) } }
+enum { + MLX5_CMD_TIMEOUT_RECOVER_MSEC = 5 * 1000, +}; + +static void wait_func_handle_exec_timeout(struct mlx5_core_dev *dev, + struct mlx5_cmd_work_ent *ent) +{ + unsigned long timeout = msecs_to_jiffies(MLX5_CMD_TIMEOUT_RECOVER_MSEC); + + mlx5_cmd_eq_recover(dev); + + /* Re-wait on the ent->done after executing the recovery flow. If the + * recovery flow (or any other recovery flow running simultaneously) + * has recovered an EQE, it should cause the entry to be completed by + * the command interface. + */ + if (wait_for_completion_timeout(&ent->done, timeout)) { + mlx5_core_warn(dev, "cmd[%d]: %s(0x%x) recovered after timeout\n", ent->idx, + mlx5_command_str(msg_to_opcode(ent->in)), msg_to_opcode(ent->in)); + return; + } + + mlx5_core_warn(dev, "cmd[%d]: %s(0x%x) No done completion\n", ent->idx, + mlx5_command_str(msg_to_opcode(ent->in)), msg_to_opcode(ent->in)); + + ent->ret = -ETIMEDOUT; + mlx5_cmd_comp_handler(dev, 1UL << ent->idx, true); +} + static int wait_func(struct mlx5_core_dev *dev, struct mlx5_cmd_work_ent *ent) { unsigned long timeout = msecs_to_jiffies(MLX5_CMD_TIMEOUT_MSEC); @@ -1008,12 +1047,10 @@ static int wait_func(struct mlx5_core_dev *dev, struct mlx5_cmd_work_ent *ent) ent->ret = -ECANCELED; goto out_err; } - if (cmd->mode == CMD_MODE_POLLING || ent->polling) { + if (cmd->mode == CMD_MODE_POLLING || ent->polling) wait_for_completion(&ent->done); - } else if (!wait_for_completion_timeout(&ent->done, timeout)) { - ent->ret = -ETIMEDOUT; - mlx5_cmd_comp_handler(dev, 1UL << ent->idx, true); - } + else if (!wait_for_completion_timeout(&ent->done, timeout)) + wait_func_handle_exec_timeout(dev, ent);
out_err: err = ent->ret; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c index 1318d774b18f2..22a19d391e179 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c @@ -189,6 +189,29 @@ u32 mlx5_eq_poll_irq_disabled(struct mlx5_eq_comp *eq) return count_eqe; }
+static void mlx5_eq_async_int_lock(struct mlx5_eq_async *eq, unsigned long *flags) + __acquires(&eq->lock) +{ + if (in_irq()) + spin_lock(&eq->lock); + else + spin_lock_irqsave(&eq->lock, *flags); +} + +static void mlx5_eq_async_int_unlock(struct mlx5_eq_async *eq, unsigned long *flags) + __releases(&eq->lock) +{ + if (in_irq()) + spin_unlock(&eq->lock); + else + spin_unlock_irqrestore(&eq->lock, *flags); +} + +enum async_eq_nb_action { + ASYNC_EQ_IRQ_HANDLER = 0, + ASYNC_EQ_RECOVER = 1, +}; + static int mlx5_eq_async_int(struct notifier_block *nb, unsigned long action, void *data) { @@ -198,11 +221,14 @@ static int mlx5_eq_async_int(struct notifier_block *nb, struct mlx5_eq_table *eqt; struct mlx5_core_dev *dev; struct mlx5_eqe *eqe; + unsigned long flags; int num_eqes = 0;
dev = eq->dev; eqt = dev->priv.eq_table;
+ mlx5_eq_async_int_lock(eq_async, &flags); + eqe = next_eqe_sw(eq); if (!eqe) goto out; @@ -223,8 +249,19 @@ static int mlx5_eq_async_int(struct notifier_block *nb,
out: eq_update_ci(eq, 1); + mlx5_eq_async_int_unlock(eq_async, &flags);
- return 0; + return unlikely(action == ASYNC_EQ_RECOVER) ? num_eqes : 0; +} + +void mlx5_cmd_eq_recover(struct mlx5_core_dev *dev) +{ + struct mlx5_eq_async *eq = &dev->priv.eq_table->cmd_eq; + int eqes; + + eqes = mlx5_eq_async_int(&eq->irq_nb, ASYNC_EQ_RECOVER, NULL); + if (eqes) + mlx5_core_warn(dev, "Recovered %d EQEs on cmd_eq\n", eqes); }
static void init_eq_buf(struct mlx5_eq *eq) @@ -569,6 +606,7 @@ setup_async_eq(struct mlx5_core_dev *dev, struct mlx5_eq_async *eq, int err;
eq->irq_nb.notifier_call = mlx5_eq_async_int; + spin_lock_init(&eq->lock);
err = create_async_eq(dev, &eq->core, param); if (err) { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h index 4aaca7400fb29..5c681e31983bc 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h @@ -37,6 +37,7 @@ struct mlx5_eq { struct mlx5_eq_async { struct mlx5_eq core; struct notifier_block irq_nb; + spinlock_t lock; /* To avoid irq EQ handle races with resiliency flows */ };
struct mlx5_eq_comp { @@ -81,6 +82,7 @@ void mlx5_cq_tasklet_cb(unsigned long data); struct cpumask *mlx5_eq_comp_cpumask(struct mlx5_core_dev *dev, int ix);
u32 mlx5_eq_poll_irq_disabled(struct mlx5_eq_comp *eq); +void mlx5_cmd_eq_recover(struct mlx5_core_dev *dev); void mlx5_eq_synchronize_async_irq(struct mlx5_core_dev *dev); void mlx5_eq_synchronize_cmd_irq(struct mlx5_core_dev *dev);
From: Eran Ben Elisha eranbe@nvidia.com
[ Upstream commit 410bd754cd73c4a2ac3856d9a03d7b08f9c906bf ]
It is possible that new command entry index allocation will temporarily fail. The new command holds the semaphore, so it means that a free entry should be ready soon. Add one second retry mechanism before returning an error.
Patch "net/mlx5: Avoid possible free of command entry while timeout comp handler" increase the possibility to bump into this temporarily failure as it delays the entry index release for non-callback commands.
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha eranbe@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 21 ++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index 37dae95e61d5f..2b597ac365f84 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -883,6 +883,25 @@ static bool opcode_allowed(struct mlx5_cmd *cmd, u16 opcode) return cmd->allowed_opcode == opcode; }
+static int cmd_alloc_index_retry(struct mlx5_cmd *cmd) +{ + unsigned long alloc_end = jiffies + msecs_to_jiffies(1000); + int idx; + +retry: + idx = cmd_alloc_index(cmd); + if (idx < 0 && time_before(jiffies, alloc_end)) { + /* Index allocation can fail on heavy load of commands. This is a temporary + * situation as the current command already holds the semaphore, meaning that + * another command completion is being handled and it is expected to release + * the entry index soon. + */ + cpu_relax(); + goto retry; + } + return idx; +} + static void cmd_work_handler(struct work_struct *work) { struct mlx5_cmd_work_ent *ent = container_of(work, struct mlx5_cmd_work_ent, work); @@ -900,7 +919,7 @@ static void cmd_work_handler(struct work_struct *work) sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem; down(sem); if (!ent->page_queue) { - alloc_ret = cmd_alloc_index(cmd); + alloc_ret = cmd_alloc_index_retry(cmd); if (alloc_ret < 0) { mlx5_core_err_rl(dev, "failed to allocate command entry\n"); if (ent->callback) {
From: Maor Gottlieb maorg@nvidia.com
[ Upstream commit 732ebfab7fe96b7ac9a3df3208f14752a4bb6db3 ]
Fix error flow handling in request_irqs which try to free irq that we failed to request. It fixes the below trace.
WARNING: CPU: 1 PID: 7587 at kernel/irq/manage.c:1684 free_irq+0x4d/0x60 CPU: 1 PID: 7587 Comm: bash Tainted: G W OE 4.15.15-1.el7MELLANOXsmp-x86_64 #1 Hardware name: Advantech SKY-6200/SKY-6200, BIOS F2.00 08/06/2020 RIP: 0010:free_irq+0x4d/0x60 RSP: 0018:ffffc9000ef47af0 EFLAGS: 00010282 RAX: ffff88001476ae00 RBX: 0000000000000655 RCX: 0000000000000000 RDX: ffff88001476ae00 RSI: ffffc9000ef47ab8 RDI: ffff8800398bb478 RBP: ffff88001476a838 R08: ffff88001476ae00 R09: 000000000000156d R10: 0000000000000000 R11: 0000000000000004 R12: ffff88001476a838 R13: 0000000000000006 R14: ffff88001476a888 R15: 00000000ffffffe4 FS: 00007efeadd32740(0000) GS:ffff88047fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc9cc010008 CR3: 00000001a2380004 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: mlx5_irq_table_create+0x38d/0x400 [mlx5_core] ? atomic_notifier_chain_register+0x50/0x60 mlx5_load_one+0x7ee/0x1130 [mlx5_core] init_one+0x4c9/0x650 [mlx5_core] pci_device_probe+0xb8/0x120 driver_probe_device+0x2a1/0x470 ? driver_allows_async_probing+0x30/0x30 bus_for_each_drv+0x54/0x80 __device_attach+0xa3/0x100 pci_bus_add_device+0x4a/0x90 pci_iov_add_virtfn+0x2dc/0x2f0 pci_enable_sriov+0x32e/0x420 mlx5_core_sriov_configure+0x61/0x1b0 [mlx5_core] ? kstrtoll+0x22/0x70 num_vf_store+0x4b/0x70 [mlx5_core] kernfs_fop_write+0x102/0x180 __vfs_write+0x26/0x140 ? rcu_all_qs+0x5/0x80 ? _cond_resched+0x15/0x30 ? __sb_start_write+0x41/0x80 vfs_write+0xad/0x1a0 SyS_write+0x42/0x90 do_syscall_64+0x60/0x110 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: 24163189da48 ("net/mlx5: Separate IRQ request/free from EQ life cycle") Signed-off-by: Maor Gottlieb maorg@nvidia.com Reviewed-by: Eran Ben Elisha eranbe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c index 373981a659c7c..6fd9749203944 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c @@ -115,7 +115,7 @@ static int request_irqs(struct mlx5_core_dev *dev, int nvec) return 0;
err_request_irq: - for (; i >= 0; i--) { + while (i--) { struct mlx5_irq *irq = mlx5_irq_get(dev, i); int irqn = pci_irq_vector(dev->pdev, i);
From: Aya Levin ayal@mellanox.com
[ Upstream commit c3c9402373fe20e2d08c04f437ce4dcd252cffb2 ]
Prior to this fix, in Striding RQ mode the driver was vulnerable when receiving packets in the range (stride size - headroom, stride size]. Where stride size is calculated by mtu+headroom+tailroom aligned to the closest power of 2. Usually, this filtering is performed by the HW, except for a few cases: - Between 2 VFs over the same PF with different MTUs - On bluefield, when the host physical function sets a larger MTU than the ARM has configured on its representor and uplink representor.
When the HW filtering is not present, packets that are larger than MTU might be harmful for the RQ's integrity, in the following impacts: 1) Overflow from one WQE to the next, causing a memory corruption that in most cases is unharmful: as the write happens to the headroom of next packet, which will be overwritten by build_skb(). In very rare cases, high stress/load, this is harmful. When the next WQE is not yet reposted and points to existing SKB head. 2) Each oversize packet overflows to the headroom of the next WQE. On the last WQE of the WQ, where addresses wrap-around, the address of the remainder headroom does not belong to the next WQE, but it is out of the memory region range. This results in a HW CQE error that moves the RQ into an error state.
Solution: Add a page buffer at the end of each WQE to absorb the leak. Actually the maximal overflow size is headroom but since all memory units must be of the same size, we use page size to comply with UMR WQEs. The increase in memory consumption is of a single page per RQ. Initialize the mkey with all MTTs pointing to a default page. When the channels are activated, UMR WQEs will redirect the RX WQEs to the actual memory from the RQ's pool, while the overflow MTTs remain mapped to the default page.
Fixes: 73281b78a37a ("net/mlx5e: Derive Striding RQ size from MTU") Signed-off-by: Aya Levin ayal@mellanox.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 8 ++- .../net/ethernet/mellanox/mlx5/core/en_main.c | 55 +++++++++++++++++-- 2 files changed, 58 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 76b23ba7a4687..cb3857e136d62 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -90,7 +90,12 @@ struct page_pool; #define MLX5_MPWRQ_PAGES_PER_WQE BIT(MLX5_MPWRQ_WQE_PAGE_ORDER)
#define MLX5_MTT_OCTW(npages) (ALIGN(npages, 8) / 2) -#define MLX5E_REQUIRED_WQE_MTTS (ALIGN(MLX5_MPWRQ_PAGES_PER_WQE, 8)) +/* Add another page to MLX5E_REQUIRED_WQE_MTTS as a buffer between + * WQEs, This page will absorb write overflow by the hardware, when + * receiving packets larger than MTU. These oversize packets are + * dropped by the driver at a later stage. + */ +#define MLX5E_REQUIRED_WQE_MTTS (ALIGN(MLX5_MPWRQ_PAGES_PER_WQE + 1, 8)) #define MLX5E_LOG_ALIGNED_MPWQE_PPW (ilog2(MLX5E_REQUIRED_WQE_MTTS)) #define MLX5E_REQUIRED_MTTS(wqes) (wqes * MLX5E_REQUIRED_WQE_MTTS) #define MLX5E_MAX_RQ_NUM_MTTS \ @@ -621,6 +626,7 @@ struct mlx5e_rq { u32 rqn; struct mlx5_core_dev *mdev; struct mlx5_core_mkey umr_mkey; + struct mlx5e_dma_info wqe_overflow;
/* XDP read-mostly */ struct xdp_rxq_info xdp_rxq; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index cccf65fc116ee..b23ad0b6761c4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -258,12 +258,17 @@ static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq,
static int mlx5e_create_umr_mkey(struct mlx5_core_dev *mdev, u64 npages, u8 page_shift, - struct mlx5_core_mkey *umr_mkey) + struct mlx5_core_mkey *umr_mkey, + dma_addr_t filler_addr) { - int inlen = MLX5_ST_SZ_BYTES(create_mkey_in); + struct mlx5_mtt *mtt; + int inlen; void *mkc; u32 *in; int err; + int i; + + inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + sizeof(*mtt) * npages;
in = kvzalloc(inlen, GFP_KERNEL); if (!in) @@ -283,6 +288,18 @@ static int mlx5e_create_umr_mkey(struct mlx5_core_dev *mdev, MLX5_SET(mkc, mkc, translations_octword_size, MLX5_MTT_OCTW(npages)); MLX5_SET(mkc, mkc, log_page_size, page_shift); + MLX5_SET(create_mkey_in, in, translations_octword_actual_size, + MLX5_MTT_OCTW(npages)); + + /* Initialize the mkey with all MTTs pointing to a default + * page (filler_addr). When the channels are activated, UMR + * WQEs will redirect the RX WQEs to the actual memory from + * the RQ's pool, while the gaps (wqe_overflow) remain mapped + * to the default page. + */ + mtt = MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt); + for (i = 0 ; i < npages ; i++) + mtt[i].ptag = cpu_to_be64(filler_addr);
err = mlx5_core_create_mkey(mdev, umr_mkey, in, inlen);
@@ -294,7 +311,8 @@ static int mlx5e_create_rq_umr_mkey(struct mlx5_core_dev *mdev, struct mlx5e_rq { u64 num_mtts = MLX5E_REQUIRED_MTTS(mlx5_wq_ll_get_size(&rq->mpwqe.wq));
- return mlx5e_create_umr_mkey(mdev, num_mtts, PAGE_SHIFT, &rq->umr_mkey); + return mlx5e_create_umr_mkey(mdev, num_mtts, PAGE_SHIFT, &rq->umr_mkey, + rq->wqe_overflow.addr); }
static inline u64 mlx5e_get_mpwqe_offset(struct mlx5e_rq *rq, u16 wqe_ix) @@ -362,6 +380,28 @@ static void mlx5e_rq_err_cqe_work(struct work_struct *recover_work) mlx5e_reporter_rq_cqe_err(rq); }
+static int mlx5e_alloc_mpwqe_rq_drop_page(struct mlx5e_rq *rq) +{ + rq->wqe_overflow.page = alloc_page(GFP_KERNEL); + if (!rq->wqe_overflow.page) + return -ENOMEM; + + rq->wqe_overflow.addr = dma_map_page(rq->pdev, rq->wqe_overflow.page, 0, + PAGE_SIZE, rq->buff.map_dir); + if (dma_mapping_error(rq->pdev, rq->wqe_overflow.addr)) { + __free_page(rq->wqe_overflow.page); + return -ENOMEM; + } + return 0; +} + +static void mlx5e_free_mpwqe_rq_drop_page(struct mlx5e_rq *rq) +{ + dma_unmap_page(rq->pdev, rq->wqe_overflow.addr, PAGE_SIZE, + rq->buff.map_dir); + __free_page(rq->wqe_overflow.page); +} + static int mlx5e_alloc_rq(struct mlx5e_channel *c, struct mlx5e_params *params, struct mlx5e_xsk_param *xsk, @@ -421,6 +461,10 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, if (err) goto err_rq_wq_destroy;
+ err = mlx5e_alloc_mpwqe_rq_drop_page(rq); + if (err) + goto err_rq_wq_destroy; + rq->mpwqe.wq.db = &rq->mpwqe.wq.db[MLX5_RCV_DBR];
wq_sz = mlx5_wq_ll_get_size(&rq->mpwqe.wq); @@ -459,7 +503,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
err = mlx5e_create_rq_umr_mkey(mdev, rq); if (err) - goto err_rq_wq_destroy; + goto err_rq_drop_page; rq->mkey_be = cpu_to_be32(rq->umr_mkey.key);
err = mlx5e_rq_alloc_mpwqe_info(rq, c); @@ -598,6 +642,8 @@ err_free: case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ: kvfree(rq->mpwqe.info); mlx5_core_destroy_mkey(mdev, &rq->umr_mkey); +err_rq_drop_page: + mlx5e_free_mpwqe_rq_drop_page(rq); break; default: /* MLX5_WQ_TYPE_CYCLIC */ kvfree(rq->wqe.frags); @@ -631,6 +677,7 @@ static void mlx5e_free_rq(struct mlx5e_rq *rq) case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ: kvfree(rq->mpwqe.info); mlx5_core_destroy_mkey(rq->mdev, &rq->umr_mkey); + mlx5e_free_mpwqe_rq_drop_page(rq); break; default: /* MLX5_WQ_TYPE_CYCLIC */ kvfree(rq->wqe.frags);
From: Aya Levin ayal@mellanox.com
[ Upstream commit 2608a2f831c47dfdf18885a7289be5af97182b05 ]
Verify the configured FEC mode is supported by at least a single link mode before applying the command. Otherwise fail the command and return "Operation not supported". Prior to this patch, the command was successful, yet it falsely set all link modes to FEC auto mode - like configuring FEC mode to auto. Auto mode is the default configuration if a link mode doesn't support the configured FEC mode.
Fixes: b5ede32d3329 ("net/mlx5e: Add support for FEC modes based on 50G per lane links") Signed-off-by: Aya Levin ayal@mellanox.com Reviewed-by: Eran Ben Elisha eranbe@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en/port.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c index 98e909bf3c1ec..3e32264cf6131 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c @@ -566,6 +566,9 @@ int mlx5e_set_fec_mode(struct mlx5_core_dev *dev, u16 fec_policy) if (fec_policy >= (1 << MLX5E_FEC_LLRS_272_257_1) && !fec_50g_per_lane) return -EOPNOTSUPP;
+ if (fec_policy && !mlx5e_fec_in_caps(dev, fec_policy)) + return -EOPNOTSUPP; + MLX5_SET(pplm_reg, in, local_port, 1); err = mlx5_core_access_reg(dev, in, sz, out, sz, MLX5_REG_PPLM, 0, 0); if (err)
From: Aya Levin ayal@nvidia.com
[ Upstream commit 8c7353b6f716436ad0bfda2b5c5524ab2dde5894 ]
Prior to this patch unloading an interface in promiscuous mode with RX VLAN filtering feature turned off - resulted in a warning. This is due to a wrong condition in the VLAN rules cleanup flow, which left the any-vid rules in the VLAN steering table. These rules prevented destroying the flow group and the flow table.
The any-vid rules are removed in 2 flows, but none of them remove it in case both promiscuous is set and VLAN filtering is off. Fix the issue by changing the condition of the VLAN table cleanup flow to clean also in case of promiscuous mode.
mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 20 wasn't destroyed, refcount > 1 mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 19 wasn't destroyed, refcount > 1 mlx5_core 0000:00:08.0: mlx5_destroy_flow_table:2112:(pid 28729): Flow table 262149 wasn't destroyed, refcount > 1 ... ... ------------[ cut here ]------------ FW pages counter is 11560 after reclaiming all pages WARNING: CPU: 1 PID: 28729 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:660 mlx5_reclaim_startup_pages+0x178/0x230 [mlx5_core] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: mlx5_function_teardown+0x2f/0x90 [mlx5_core] mlx5_unload_one+0x71/0x110 [mlx5_core] remove_one+0x44/0x80 [mlx5_core] pci_device_remove+0x3e/0xc0 device_release_driver_internal+0xfb/0x1c0 device_release_driver+0x12/0x20 pci_stop_bus_device+0x68/0x90 pci_stop_and_remove_bus_device+0x12/0x20 hv_eject_device_work+0x6f/0x170 [pci_hyperv] ? __schedule+0x349/0x790 process_one_work+0x206/0x400 worker_thread+0x34/0x3f0 ? process_one_work+0x400/0x400 kthread+0x126/0x140 ? kthread_park+0x90/0x90 ret_from_fork+0x22/0x30 ---[ end trace 6283bde8d26170dc ]---
Fixes: 9df30601c843 ("net/mlx5e: Restore vlan filter after seamless reset") Signed-off-by: Aya Levin ayal@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c index 73d3dc07331f1..c5be0cdfaf0fa 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c @@ -415,8 +415,12 @@ static void mlx5e_del_vlan_rules(struct mlx5e_priv *priv) for_each_set_bit(i, priv->fs.vlan.active_svlans, VLAN_N_VID) mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_STAG_VID, i);
- if (priv->fs.vlan.cvlan_filter_disabled && - !(priv->netdev->flags & IFF_PROMISC)) + WARN_ON_ONCE(!(test_bit(MLX5E_STATE_DESTROYING, &priv->state))); + + /* must be called after DESTROY bit is set and + * set_rx_mode is called and flushed + */ + if (priv->fs.vlan.cvlan_filter_disabled) mlx5e_del_any_vid_rules(priv); }
From: Aya Levin ayal@nvidia.com
[ Upstream commit d4a16052bccdd695982f89d815ca075825115821 ]
When interface is attached while in promiscuous mode and with VLAN filtering turned off, both configurations are not respected and VLAN filtering is performed. There are 2 flows which add the any-vid rules during interface attach: VLAN creation table and set rx mode. Each is relaying on the other to add any-vid rules, eventually non of them does.
Fix this by adding any-vid rules on VLAN creation regardless of promiscuous mode.
Fixes: 9df30601c843 ("net/mlx5e: Restore vlan filter after seamless reset") Signed-off-by: Aya Levin ayal@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c index c5be0cdfaf0fa..713dc210f710c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c @@ -217,6 +217,9 @@ static int __mlx5e_add_vlan_rule(struct mlx5e_priv *priv, break; }
+ if (WARN_ONCE(*rule_p, "VLAN rule already exists type %d", rule_type)) + return 0; + *rule_p = mlx5_add_flow_rules(ft, spec, &flow_act, &dest, 1);
if (IS_ERR(*rule_p)) { @@ -397,8 +400,7 @@ static void mlx5e_add_vlan_rules(struct mlx5e_priv *priv) for_each_set_bit(i, priv->fs.vlan.active_svlans, VLAN_N_VID) mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_STAG_VID, i);
- if (priv->fs.vlan.cvlan_filter_disabled && - !(priv->netdev->flags & IFF_PROMISC)) + if (priv->fs.vlan.cvlan_filter_disabled) mlx5e_add_any_vid_rules(priv); }
From: Vlad Buslov vladbu@nvidia.com
[ Upstream commit 1253935ad801485270194d5651acab04abc97b36 ]
Current neigh update event handler implementation takes reference to neighbour structure, assigns it to nhe->n, tries to schedule workqueue task and releases the reference if task was already enqueued. This results potentially overwriting existing nhe->n pointer with another neighbour instance, which causes double release of the instance (once in neigh update handler that failed to enqueue to workqueue and another one in neigh update workqueue task that processes updated nhe->n pointer instead of original one):
[ 3376.512806] ------------[ cut here ]------------ [ 3376.513534] refcount_t: underflow; use-after-free. [ 3376.521213] Modules linked in: act_skbedit act_mirred act_tunnel_key vxlan ip6_udp_tunnel udp_tunnel nfnetlink act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 mlx5_ib mlx5_core mlxfw pci_hyperv_intf ptp pps_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp rpcrdma rdma_ucm ib_umad ib_ipoib ib_iser rdma_cm ib_cm iw_cm rfkill ib_uverbs ib_core sunrpc kvm_intel kvm iTCO_wdt iTCO_vendor_support virtio_net irqbypass net_failover crc32_pclmul lpc_ich i2c_i801 failover pcspkr i2c_smbus mfd_core ghash_clmulni_intel sch_fq_codel drm i2c _core ip_tables crc32c_intel serio_raw [last unloaded: mlxfw] [ 3376.529468] CPU: 8 PID: 22756 Comm: kworker/u20:5 Not tainted 5.9.0-rc5+ #6 [ 3376.530399] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 3376.531975] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [ 3376.532820] RIP: 0010:refcount_warn_saturate+0xd8/0xe0 [ 3376.533589] Code: ff 48 c7 c7 e0 b8 27 82 c6 05 0b b6 09 01 01 e8 94 93 c1 ff 0f 0b c3 48 c7 c7 88 b8 27 82 c6 05 f7 b5 09 01 01 e8 7e 93 c1 ff <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13 [ 3376.536017] RSP: 0018:ffffc90002a97e30 EFLAGS: 00010286 [ 3376.536793] RAX: 0000000000000000 RBX: ffff8882de30d648 RCX: 0000000000000000 [ 3376.537718] RDX: ffff8882f5c28f20 RSI: ffff8882f5c18e40 RDI: ffff8882f5c18e40 [ 3376.538654] RBP: ffff8882cdf56c00 R08: 000000000000c580 R09: 0000000000001a4d [ 3376.539582] R10: 0000000000000731 R11: ffffc90002a97ccd R12: 0000000000000000 [ 3376.540519] R13: ffff8882de30d600 R14: ffff8882de30d640 R15: ffff88821e000900 [ 3376.541444] FS: 0000000000000000(0000) GS:ffff8882f5c00000(0000) knlGS:0000000000000000 [ 3376.542732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3376.543545] CR2: 0000556e5504b248 CR3: 00000002c6f10005 CR4: 0000000000770ee0 [ 3376.544483] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3376.545419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3376.546344] PKRU: 55555554 [ 3376.546911] Call Trace: [ 3376.547479] mlx5e_rep_neigh_update.cold+0x33/0xe2 [mlx5_core] [ 3376.548299] process_one_work+0x1d8/0x390 [ 3376.548977] worker_thread+0x4d/0x3e0 [ 3376.549631] ? rescuer_thread+0x3e0/0x3e0 [ 3376.550295] kthread+0x118/0x130 [ 3376.550914] ? kthread_create_worker_on_cpu+0x70/0x70 [ 3376.551675] ret_from_fork+0x1f/0x30 [ 3376.552312] ---[ end trace d84e8f46d2a77eec ]---
Fix the bug by moving work_struct to dedicated dynamically-allocated structure. This enabled every event handler to work on its own private neighbour pointer and removes the need for handling the case when task is already enqueued.
Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow") Signed-off-by: Vlad Buslov vladbu@nvidia.com Reviewed-by: Roi Dayan roid@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../mellanox/mlx5/core/en/rep/neigh.c | 81 ++++++++++++------- .../net/ethernet/mellanox/mlx5/core/en_rep.h | 6 -- 2 files changed, 50 insertions(+), 37 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/neigh.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/neigh.c index c3d167fa944c7..6a9d783d129b2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/neigh.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/neigh.c @@ -109,11 +109,25 @@ static void mlx5e_rep_neigh_stats_work(struct work_struct *work) rtnl_unlock(); }
+struct neigh_update_work { + struct work_struct work; + struct neighbour *n; + struct mlx5e_neigh_hash_entry *nhe; +}; + +static void mlx5e_release_neigh_update_work(struct neigh_update_work *update_work) +{ + neigh_release(update_work->n); + mlx5e_rep_neigh_entry_release(update_work->nhe); + kfree(update_work); +} + static void mlx5e_rep_neigh_update(struct work_struct *work) { - struct mlx5e_neigh_hash_entry *nhe = - container_of(work, struct mlx5e_neigh_hash_entry, neigh_update_work); - struct neighbour *n = nhe->n; + struct neigh_update_work *update_work = container_of(work, struct neigh_update_work, + work); + struct mlx5e_neigh_hash_entry *nhe = update_work->nhe; + struct neighbour *n = update_work->n; struct mlx5e_encap_entry *e; unsigned char ha[ETH_ALEN]; struct mlx5e_priv *priv; @@ -145,30 +159,42 @@ static void mlx5e_rep_neigh_update(struct work_struct *work) mlx5e_rep_update_flows(priv, e, neigh_connected, ha); mlx5e_encap_put(priv, e); } - mlx5e_rep_neigh_entry_release(nhe); rtnl_unlock(); - neigh_release(n); + mlx5e_release_neigh_update_work(update_work); }
-static void mlx5e_rep_queue_neigh_update_work(struct mlx5e_priv *priv, - struct mlx5e_neigh_hash_entry *nhe, - struct neighbour *n) +static struct neigh_update_work *mlx5e_alloc_neigh_update_work(struct mlx5e_priv *priv, + struct neighbour *n) { - /* Take a reference to ensure the neighbour and mlx5 encap - * entry won't be destructed until we drop the reference in - * delayed work. - */ - neigh_hold(n); + struct neigh_update_work *update_work; + struct mlx5e_neigh_hash_entry *nhe; + struct mlx5e_neigh m_neigh = {};
- /* This assignment is valid as long as the the neigh reference - * is taken - */ - nhe->n = n; + update_work = kzalloc(sizeof(*update_work), GFP_ATOMIC); + if (WARN_ON(!update_work)) + return NULL;
- if (!queue_work(priv->wq, &nhe->neigh_update_work)) { - mlx5e_rep_neigh_entry_release(nhe); - neigh_release(n); + m_neigh.dev = n->dev; + m_neigh.family = n->ops->family; + memcpy(&m_neigh.dst_ip, n->primary_key, n->tbl->key_len); + + /* Obtain reference to nhe as last step in order not to release it in + * atomic context. + */ + rcu_read_lock(); + nhe = mlx5e_rep_neigh_entry_lookup(priv, &m_neigh); + rcu_read_unlock(); + if (!nhe) { + kfree(update_work); + return NULL; } + + INIT_WORK(&update_work->work, mlx5e_rep_neigh_update); + neigh_hold(n); + update_work->n = n; + update_work->nhe = nhe; + + return update_work; }
static int mlx5e_rep_netevent_event(struct notifier_block *nb, @@ -180,7 +206,7 @@ static int mlx5e_rep_netevent_event(struct notifier_block *nb, struct net_device *netdev = rpriv->netdev; struct mlx5e_priv *priv = netdev_priv(netdev); struct mlx5e_neigh_hash_entry *nhe = NULL; - struct mlx5e_neigh m_neigh = {}; + struct neigh_update_work *update_work; struct neigh_parms *p; struct neighbour *n; bool found = false; @@ -195,17 +221,11 @@ static int mlx5e_rep_netevent_event(struct notifier_block *nb, #endif return NOTIFY_DONE;
- m_neigh.dev = n->dev; - m_neigh.family = n->ops->family; - memcpy(&m_neigh.dst_ip, n->primary_key, n->tbl->key_len); - - rcu_read_lock(); - nhe = mlx5e_rep_neigh_entry_lookup(priv, &m_neigh); - rcu_read_unlock(); - if (!nhe) + update_work = mlx5e_alloc_neigh_update_work(priv, n); + if (!update_work) return NOTIFY_DONE;
- mlx5e_rep_queue_neigh_update_work(priv, nhe, n); + queue_work(priv->wq, &update_work->work); break;
case NETEVENT_DELAY_PROBE_TIME_UPDATE: @@ -351,7 +371,6 @@ int mlx5e_rep_neigh_entry_create(struct mlx5e_priv *priv,
(*nhe)->priv = priv; memcpy(&(*nhe)->m_neigh, &e->m_neigh, sizeof(e->m_neigh)); - INIT_WORK(&(*nhe)->neigh_update_work, mlx5e_rep_neigh_update); spin_lock_init(&(*nhe)->encap_list_lock); INIT_LIST_HEAD(&(*nhe)->encap_list); refcount_set(&(*nhe)->refcnt, 1); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h index 1d56698014843..fcaabafb2e56d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h @@ -133,12 +133,6 @@ struct mlx5e_neigh_hash_entry { /* encap list sharing the same neigh */ struct list_head encap_list;
- /* valid only when the neigh reference is taken during - * neigh_update_work workqueue callback. - */ - struct neighbour *n; - struct work_struct neigh_update_work; - /* neigh hash entry can be deleted only when the refcount is zero. * refcount is needed to avoid neigh hash entry removal by TC, while * it's used by the neigh notification call.
From: Vineetha G. Jaya Kumaran vineetha.g.jaya.kumaran@intel.com
[ Upstream commit 388e201d41fa1ed8f2dce0f0567f56f8e919ffb0 ]
Ethtool manual stated that the tx-timer is the "the amount of time the device should stay in idle mode prior to asserting its Tx LPI". The previous implementation for "ethtool --set-eee tx-timer" sets the LPI TW timer duration which is not correct. Hence, this patch fixes the "ethtool --set-eee tx-timer" to configure the EEE LPI timer.
The LPI TW Timer will be using the defined default value instead of "ethtool --set-eee tx-timer" which follows the EEE LS timer implementation.
Changelog V2 *Not removing/modifying the eee_timer. *EEE LPI timer can be configured through ethtool and also the eee_timer module param. *EEE TW Timer will be configured with default value only, not able to be configured through ethtool or module param. This follows the implementation of the EEE LS Timer.
Fixes: d765955d2ae0 ("stmmac: add the Energy Efficient Ethernet support") Signed-off-by: Vineetha G. Jaya Kumaran vineetha.g.jaya.kumaran@intel.com Signed-off-by: Voon Weifeng weifeng.voon@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++ .../ethernet/stmicro/stmmac/stmmac_ethtool.c | 12 +++++++++- .../net/ethernet/stmicro/stmmac/stmmac_main.c | 23 ++++++++++++------- 3 files changed, 28 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index 9c02fc754bf1b..545696971f65e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -203,6 +203,8 @@ struct stmmac_priv { int eee_enabled; int eee_active; int tx_lpi_timer; + int tx_lpi_enabled; + int eee_tw_timer; unsigned int mode; unsigned int chain_mode; int extend_desc; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c index c16d0cc3e9c44..b82c6715f95f3 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c @@ -652,6 +652,7 @@ static int stmmac_ethtool_op_get_eee(struct net_device *dev, edata->eee_enabled = priv->eee_enabled; edata->eee_active = priv->eee_active; edata->tx_lpi_timer = priv->tx_lpi_timer; + edata->tx_lpi_enabled = priv->tx_lpi_enabled;
return phylink_ethtool_get_eee(priv->phylink, edata); } @@ -665,6 +666,10 @@ static int stmmac_ethtool_op_set_eee(struct net_device *dev, if (!priv->dma_cap.eee) return -EOPNOTSUPP;
+ if (priv->tx_lpi_enabled != edata->tx_lpi_enabled) + netdev_warn(priv->dev, + "Setting EEE tx-lpi is not supported\n"); + if (!edata->eee_enabled) stmmac_disable_eee_mode(priv);
@@ -672,7 +677,12 @@ static int stmmac_ethtool_op_set_eee(struct net_device *dev, if (ret) return ret;
- priv->tx_lpi_timer = edata->tx_lpi_timer; + if (edata->eee_enabled && + priv->tx_lpi_timer != edata->tx_lpi_timer) { + priv->tx_lpi_timer = edata->tx_lpi_timer; + stmmac_eee_init(priv); + } + return 0; }
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 73677c3b33b65..73465e5f5a417 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -94,7 +94,7 @@ static const u32 default_msg_level = (NETIF_MSG_DRV | NETIF_MSG_PROBE | static int eee_timer = STMMAC_DEFAULT_LPI_TIMER; module_param(eee_timer, int, 0644); MODULE_PARM_DESC(eee_timer, "LPI tx expiration time in msec"); -#define STMMAC_LPI_T(x) (jiffies + msecs_to_jiffies(x)) +#define STMMAC_LPI_T(x) (jiffies + usecs_to_jiffies(x))
/* By default the driver will use the ring mode to manage tx and rx descriptors, * but allow user to force to use the chain instead of the ring @@ -370,7 +370,7 @@ static void stmmac_eee_ctrl_timer(struct timer_list *t) struct stmmac_priv *priv = from_timer(priv, t, eee_ctrl_timer);
stmmac_enable_eee_mode(priv); - mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_T(eee_timer)); + mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_T(priv->tx_lpi_timer)); }
/** @@ -383,7 +383,7 @@ static void stmmac_eee_ctrl_timer(struct timer_list *t) */ bool stmmac_eee_init(struct stmmac_priv *priv) { - int tx_lpi_timer = priv->tx_lpi_timer; + int eee_tw_timer = priv->eee_tw_timer;
/* Using PCS we cannot dial with the phy registers at this stage * so we do not support extra feature like EEE. @@ -403,7 +403,7 @@ bool stmmac_eee_init(struct stmmac_priv *priv) if (priv->eee_enabled) { netdev_dbg(priv->dev, "disable EEE\n"); del_timer_sync(&priv->eee_ctrl_timer); - stmmac_set_eee_timer(priv, priv->hw, 0, tx_lpi_timer); + stmmac_set_eee_timer(priv, priv->hw, 0, eee_tw_timer); } mutex_unlock(&priv->lock); return false; @@ -411,11 +411,12 @@ bool stmmac_eee_init(struct stmmac_priv *priv)
if (priv->eee_active && !priv->eee_enabled) { timer_setup(&priv->eee_ctrl_timer, stmmac_eee_ctrl_timer, 0); - mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_T(eee_timer)); stmmac_set_eee_timer(priv, priv->hw, STMMAC_DEFAULT_LIT_LS, - tx_lpi_timer); + eee_tw_timer); }
+ mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_T(priv->tx_lpi_timer)); + mutex_unlock(&priv->lock); netdev_dbg(priv->dev, "Energy-Efficient Ethernet initialized\n"); return true; @@ -930,6 +931,7 @@ static void stmmac_mac_link_down(struct phylink_config *config,
stmmac_mac_set(priv, priv->ioaddr, false); priv->eee_active = false; + priv->tx_lpi_enabled = false; stmmac_eee_init(priv); stmmac_set_eee_pls(priv, priv->hw, false); } @@ -1027,6 +1029,7 @@ static void stmmac_mac_link_up(struct phylink_config *config, if (phy && priv->dma_cap.eee) { priv->eee_active = phy_init_eee(phy, 1) >= 0; priv->eee_enabled = stmmac_eee_init(priv); + priv->tx_lpi_enabled = priv->eee_enabled; stmmac_set_eee_pls(priv, priv->hw, true); } } @@ -2057,7 +2060,7 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue)
if ((priv->eee_enabled) && (!priv->tx_path_in_lpi_mode)) { stmmac_enable_eee_mode(priv); - mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_T(eee_timer)); + mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_T(priv->tx_lpi_timer)); }
/* We still have pending packets, let's call for a new scheduling */ @@ -2690,7 +2693,11 @@ static int stmmac_hw_setup(struct net_device *dev, bool init_ptp) netdev_warn(priv->dev, "PTP init failed\n"); }
- priv->tx_lpi_timer = STMMAC_DEFAULT_TWT_LS; + priv->eee_tw_timer = STMMAC_DEFAULT_TWT_LS; + + /* Convert the timer from msec to usec */ + if (!priv->tx_lpi_timer) + priv->tx_lpi_timer = eee_timer * 1000;
if (priv->use_riwt) { if (!priv->rx_riwt)
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit 1f7e877c20517735bceff1535e1b7fa846b2f215 ]
Fix many (lots deleted here) build errors in hinic by selecting NET_DEVLINK.
ld: drivers/net/ethernet/huawei/hinic/hinic_hw_dev.o: in function `mgmt_watchdog_timeout_event_handler': hinic_hw_dev.c:(.text+0x30a): undefined reference to `devlink_health_report' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_fw_reporter_dump': hinic_devlink.c:(.text+0x1c): undefined reference to `devlink_fmsg_u32_pair_put' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_fw_reporter_dump': hinic_devlink.c:(.text+0x126): undefined reference to `devlink_fmsg_binary_pair_put' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_hw_reporter_dump': hinic_devlink.c:(.text+0x1ba): undefined reference to `devlink_fmsg_string_pair_put' ld: hinic_devlink.c:(.text+0x227): undefined reference to `devlink_fmsg_u8_pair_put' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_alloc': hinic_devlink.c:(.text+0xaee): undefined reference to `devlink_alloc' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_free': hinic_devlink.c:(.text+0xb04): undefined reference to `devlink_free' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_register': hinic_devlink.c:(.text+0xb26): undefined reference to `devlink_register' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_devlink_unregister': hinic_devlink.c:(.text+0xb46): undefined reference to `devlink_unregister' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_health_reporters_create': hinic_devlink.c:(.text+0xb75): undefined reference to `devlink_health_reporter_create' ld: hinic_devlink.c:(.text+0xb95): undefined reference to `devlink_health_reporter_create' ld: hinic_devlink.c:(.text+0xbac): undefined reference to `devlink_health_reporter_destroy' ld: drivers/net/ethernet/huawei/hinic/hinic_devlink.o: in function `hinic_health_reporters_destroy':
Fixes: 51ba902a16e6 ("net-next/hinic: Initialize hw interface") Signed-off-by: Randy Dunlap rdunlap@infradead.org Cc: Bin Luo luobin9@huawei.com Cc: "David S. Miller" davem@davemloft.net Cc: Jakub Kicinski kuba@kernel.org Cc: Aviad Krawczyk aviad.krawczyk@huawei.com Cc: Zhao Chen zhaochen6@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/huawei/hinic/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/huawei/hinic/Kconfig b/drivers/net/ethernet/huawei/hinic/Kconfig index 936e2dd3bb135..b47bd5440c5f0 100644 --- a/drivers/net/ethernet/huawei/hinic/Kconfig +++ b/drivers/net/ethernet/huawei/hinic/Kconfig @@ -6,6 +6,7 @@ config HINIC tristate "Huawei Intelligent PCIE Network Interface Card" depends on (PCI_MSI && (X86 || ARM64)) + select NET_DEVLINK help This driver supports HiNIC PCIE Ethernet cards. To compile this driver as part of the kernel, choose Y here.
From: Si-Wei Liu si-wei.liu@oracle.com
[ Upstream commit 1477c8aebb94a1db398c12d929a9d27bbd678d8c ]
vhost_vdpa_map() should remove the iotlb entry just added if the corresponding mapping fails to set up properly.
Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Si-Wei Liu si-wei.liu@oracle.com Link: https://lore.kernel.org/r/1601701330-16837-2-git-send-email-si-wei.liu@oracl... Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/vdpa.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index a54b60d6623f0..5259f5210b375 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -527,6 +527,9 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, r = iommu_map(v->domain, iova, pa, size, perm_to_iommu_flags(perm));
+ if (r) + vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1); + return r; }
From: Si-Wei Liu si-wei.liu@oracle.com
[ Upstream commit 7ed9e3d97c32d969caded2dfb6e67c1a2cc5a0b1 ]
Pinned pages are not properly accounted particularly when mapping error occurs on IOTLB update. Clean up dangling pinned pages for the error path. As the inflight pinned pages, specifically for memory region that strides across multiple chunks, would need more than one free page for book keeping and accounting. For simplicity, pin pages for all memory in the IOVA range in one go rather than have multiple pin_user_pages calls to make up the entire region. This way it's easier to track and account the pages already mapped, particularly for clean-up in the error path.
Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Si-Wei Liu si-wei.liu@oracle.com Link: https://lore.kernel.org/r/1601701330-16837-3-git-send-email-si-wei.liu@oracl... Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/vdpa.c | 119 ++++++++++++++++++++++++++----------------- 1 file changed, 71 insertions(+), 48 deletions(-)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 5259f5210b375..e172c2efc663c 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -555,21 +555,19 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, struct vhost_dev *dev = &v->vdev; struct vhost_iotlb *iotlb = dev->iotlb; struct page **page_list; - unsigned long list_size = PAGE_SIZE / sizeof(struct page *); + struct vm_area_struct **vmas; unsigned int gup_flags = FOLL_LONGTERM; - unsigned long npages, cur_base, map_pfn, last_pfn = 0; - unsigned long locked, lock_limit, pinned, i; + unsigned long map_pfn, last_pfn = 0; + unsigned long npages, lock_limit; + unsigned long i, nmap = 0; u64 iova = msg->iova; + long pinned; int ret = 0;
if (vhost_iotlb_itree_first(iotlb, msg->iova, msg->iova + msg->size - 1)) return -EEXIST;
- page_list = (struct page **) __get_free_page(GFP_KERNEL); - if (!page_list) - return -ENOMEM; - if (msg->perm & VHOST_ACCESS_WO) gup_flags |= FOLL_WRITE;
@@ -577,61 +575,86 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, if (!npages) return -EINVAL;
+ page_list = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL); + vmas = kvmalloc_array(npages, sizeof(struct vm_area_struct *), + GFP_KERNEL); + if (!page_list || !vmas) { + ret = -ENOMEM; + goto free; + } + mmap_read_lock(dev->mm);
- locked = atomic64_add_return(npages, &dev->mm->pinned_vm); lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; - - if (locked > lock_limit) { + if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) { ret = -ENOMEM; - goto out; + goto unlock; }
- cur_base = msg->uaddr & PAGE_MASK; - iova &= PAGE_MASK; + pinned = pin_user_pages(msg->uaddr & PAGE_MASK, npages, gup_flags, + page_list, vmas); + if (npages != pinned) { + if (pinned < 0) { + ret = pinned; + } else { + unpin_user_pages(page_list, pinned); + ret = -ENOMEM; + } + goto unlock; + }
- while (npages) { - pinned = min_t(unsigned long, npages, list_size); - ret = pin_user_pages(cur_base, pinned, - gup_flags, page_list, NULL); - if (ret != pinned) - goto out; - - if (!last_pfn) - map_pfn = page_to_pfn(page_list[0]); - - for (i = 0; i < ret; i++) { - unsigned long this_pfn = page_to_pfn(page_list[i]); - u64 csize; - - if (last_pfn && (this_pfn != last_pfn + 1)) { - /* Pin a contiguous chunk of memory */ - csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT; - if (vhost_vdpa_map(v, iova, csize, - map_pfn << PAGE_SHIFT, - msg->perm)) - goto out; - map_pfn = this_pfn; - iova += csize; + iova &= PAGE_MASK; + map_pfn = page_to_pfn(page_list[0]); + + /* One more iteration to avoid extra vdpa_map() call out of loop. */ + for (i = 0; i <= npages; i++) { + unsigned long this_pfn; + u64 csize; + + /* The last chunk may have no valid PFN next to it */ + this_pfn = i < npages ? page_to_pfn(page_list[i]) : -1UL; + + if (last_pfn && (this_pfn == -1UL || + this_pfn != last_pfn + 1)) { + /* Pin a contiguous chunk of memory */ + csize = last_pfn - map_pfn + 1; + ret = vhost_vdpa_map(v, iova, csize << PAGE_SHIFT, + map_pfn << PAGE_SHIFT, + msg->perm); + if (ret) { + /* + * Unpin the rest chunks of memory on the + * flight with no corresponding vdpa_map() + * calls having been made yet. On the other + * hand, vdpa_unmap() in the failure path + * is in charge of accounting the number of + * pinned pages for its own. + * This asymmetrical pattern of accounting + * is for efficiency to pin all pages at + * once, while there is no other callsite + * of vdpa_map() than here above. + */ + unpin_user_pages(&page_list[nmap], + npages - nmap); + goto out; } - - last_pfn = this_pfn; + atomic64_add(csize, &dev->mm->pinned_vm); + nmap += csize; + iova += csize << PAGE_SHIFT; + map_pfn = this_pfn; } - - cur_base += ret << PAGE_SHIFT; - npages -= ret; + last_pfn = this_pfn; }
- /* Pin the rest chunk */ - ret = vhost_vdpa_map(v, iova, (last_pfn - map_pfn + 1) << PAGE_SHIFT, - map_pfn << PAGE_SHIFT, msg->perm); + WARN_ON(nmap != npages); out: - if (ret) { + if (ret) vhost_vdpa_unmap(v, msg->iova, msg->size); - atomic64_sub(npages, &dev->mm->pinned_vm); - } +unlock: mmap_read_unlock(dev->mm); - free_page((unsigned long)page_list); +free: + kvfree(vmas); + kvfree(page_list); return ret; }
From: Tom Rix trix@redhat.com
[ Upstream commit f4544e5361da5050ff5c0330ceea095cb5dbdd72 ]
clang static analysis reports this problem:
drivers/net/ethernet/marvell/mvneta.c:3465:2: warning: Attempt to free released memory kfree(txq->buf); ^~~~~~~~~~~~~~~
When mvneta_txq_sw_init() fails to alloc txq->tso_hdrs, it frees without poisoning txq->buf. The error is caught in the mvneta_setup_txqs() caller which handles the error by cleaning up all of the txqs with a call to mvneta_txq_sw_deinit which also frees txq->buf.
Since mvneta_txq_sw_deinit is a general cleaner, all of the partial cleaning in mvneta_txq_sw_deinit()'s error handling is not needed.
Fixes: 2adb719d74f6 ("net: mvneta: Implement software TSO") Signed-off-by: Tom Rix trix@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/mvneta.c | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 7d5d9d34f4e47..69a234e83b8b7 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -3372,24 +3372,15 @@ static int mvneta_txq_sw_init(struct mvneta_port *pp, txq->last_desc = txq->size - 1;
txq->buf = kmalloc_array(txq->size, sizeof(*txq->buf), GFP_KERNEL); - if (!txq->buf) { - dma_free_coherent(pp->dev->dev.parent, - txq->size * MVNETA_DESC_ALIGNED_SIZE, - txq->descs, txq->descs_phys); + if (!txq->buf) return -ENOMEM; - }
/* Allocate DMA buffers for TSO MAC/IP/TCP headers */ txq->tso_hdrs = dma_alloc_coherent(pp->dev->dev.parent, txq->size * TSO_HEADER_SIZE, &txq->tso_hdrs_phys, GFP_KERNEL); - if (!txq->tso_hdrs) { - kfree(txq->buf); - dma_free_coherent(pp->dev->dev.parent, - txq->size * MVNETA_DESC_ALIGNED_SIZE, - txq->descs, txq->descs_phys); + if (!txq->tso_hdrs) return -ENOMEM; - }
/* Setup XPS mapping */ if (txq_number > 1)
From: Marc Dionne marc.dionne@auristor.com
[ Upstream commit 56305118e05b2db8d0395bba640ac9a3aee92624 ]
The session key should be encoded with just the 8 data bytes and no length; ENCODE_DATA precedes it with a 4 byte length, which confuses some existing tools that try to parse this format.
Add an ENCODE_BYTES macro that does not include a length, and use it for the key. Also adjust the expected length.
Note that commit 774521f353e1d ("rxrpc: Fix an assertion in rxrpc_read()") had fixed a BUG by changing the length rather than fixing the encoding. The original length was correct.
Fixes: 99455153d067 ("RxRPC: Parse security index 5 keys (Kerberos 5)") Signed-off-by: Marc Dionne marc.dionne@auristor.com Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/key.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/net/rxrpc/key.c b/net/rxrpc/key.c index 0c98313dd7a8c..d77e89766406a 100644 --- a/net/rxrpc/key.c +++ b/net/rxrpc/key.c @@ -1073,7 +1073,7 @@ static long rxrpc_read(const struct key *key,
switch (token->security_index) { case RXRPC_SECURITY_RXKAD: - toksize += 9 * 4; /* viceid, kvno, key*2 + len, begin, + toksize += 8 * 4; /* viceid, kvno, key*2, begin, * end, primary, tktlen */ toksize += RND(token->kad->ticket_len); break; @@ -1139,6 +1139,14 @@ static long rxrpc_read(const struct key *key, memcpy((u8 *)xdr + _l, &zero, 4 - (_l & 3)); \ xdr += (_l + 3) >> 2; \ } while(0) +#define ENCODE_BYTES(l, s) \ + do { \ + u32 _l = (l); \ + memcpy(xdr, (s), _l); \ + if (_l & 3) \ + memcpy((u8 *)xdr + _l, &zero, 4 - (_l & 3)); \ + xdr += (_l + 3) >> 2; \ + } while(0) #define ENCODE64(x) \ do { \ __be64 y = cpu_to_be64(x); \ @@ -1166,7 +1174,7 @@ static long rxrpc_read(const struct key *key, case RXRPC_SECURITY_RXKAD: ENCODE(token->kad->vice_id); ENCODE(token->kad->kvno); - ENCODE_DATA(8, token->kad->session_key); + ENCODE_BYTES(8, token->kad->session_key); ENCODE(token->kad->start); ENCODE(token->kad->expiry); ENCODE(token->kad->primary_flag);
From: David Howells dhowells@redhat.com
[ Upstream commit 9a059cd5ca7d9c5c4ca5a6e755cf72f230176b6a ]
If rxrpc_read() (which allows KEYCTL_READ to read a key), sees a token of a type it doesn't recognise, it can BUG in a couple of places, which is unnecessary as it can easily get back to userspace.
Fix this to print an error message instead.
Fixes: 99455153d067 ("RxRPC: Parse security index 5 keys (Kerberos 5)") Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/key.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/rxrpc/key.c b/net/rxrpc/key.c index d77e89766406a..32f46edcf7c67 100644 --- a/net/rxrpc/key.c +++ b/net/rxrpc/key.c @@ -1108,7 +1108,8 @@ static long rxrpc_read(const struct key *key, break;
default: /* we have a ticket we can't encode */ - BUG(); + pr_err("Unsupported key token type (%u)\n", + token->security_index); continue; }
@@ -1224,7 +1225,6 @@ static long rxrpc_read(const struct key *key, break;
default: - BUG(); break; }
From: David Howells dhowells@redhat.com
[ Upstream commit fa1d113a0f96f9ab7e4fe4f8825753ba1e34a9d3 ]
conn->state_lock may be taken in softirq mode, but a previous patch replaced an outer lock in the response-packet event handling code, and lost the _bh from that when doing so.
Fix this by applying the _bh annotation to the state_lock locking.
Fixes: a1399f8bb033 ("rxrpc: Call channels should have separate call number spaces") Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/conn_event.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index 447f55ca68860..6e972b4823efa 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -340,18 +340,18 @@ static int rxrpc_process_event(struct rxrpc_connection *conn, return ret;
spin_lock(&conn->channel_lock); - spin_lock(&conn->state_lock); + spin_lock_bh(&conn->state_lock);
if (conn->state == RXRPC_CONN_SERVICE_CHALLENGING) { conn->state = RXRPC_CONN_SERVICE; - spin_unlock(&conn->state_lock); + spin_unlock_bh(&conn->state_lock); for (loop = 0; loop < RXRPC_MAXCALLS; loop++) rxrpc_call_is_secure( rcu_dereference_protected( conn->channels[loop].call, lockdep_is_held(&conn->channel_lock))); } else { - spin_unlock(&conn->state_lock); + spin_unlock_bh(&conn->state_lock); }
spin_unlock(&conn->channel_lock);
From: David Howells dhowells@redhat.com
[ Upstream commit fea99111244bae44e7d82a973744d27ea1567814 ]
The keyring containing the server's tokens isn't network-namespaced, so it shouldn't be looked up with a network namespace. It is expected to be owned specifically by the server, so namespacing is unnecessary.
Fixes: a58946c158a0 ("keys: Pass the network namespace into request_key mechanism") Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/key.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rxrpc/key.c b/net/rxrpc/key.c index 32f46edcf7c67..64cbbd2f16944 100644 --- a/net/rxrpc/key.c +++ b/net/rxrpc/key.c @@ -941,7 +941,7 @@ int rxrpc_server_keyring(struct rxrpc_sock *rx, char __user *optval, if (IS_ERR(description)) return PTR_ERR(description);
- key = request_key_net(&key_type_keyring, description, sock_net(&rx->sk), NULL); + key = request_key(&key_type_keyring, description, NULL); if (IS_ERR(key)) { kfree(description); _leave(" = %ld", PTR_ERR(key));
From: David Howells dhowells@redhat.com
[ Upstream commit 38b1dc47a35ba14c3f4472138ea56d014c2d609b ]
If someone calls setsockopt() twice to set a server key keyring, the first keyring is leaked.
Fix it to return an error instead if the server key keyring is already set.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/key.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rxrpc/key.c b/net/rxrpc/key.c index 64cbbd2f16944..85a9ff8cd236a 100644 --- a/net/rxrpc/key.c +++ b/net/rxrpc/key.c @@ -903,7 +903,7 @@ int rxrpc_request_key(struct rxrpc_sock *rx, char __user *optval, int optlen)
_enter("");
- if (optlen <= 0 || optlen > PAGE_SIZE - 1) + if (optlen <= 0 || optlen > PAGE_SIZE - 1 || rx->securities) return -EINVAL;
description = memdup_user_nul(optval, optlen);
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 589aa6e7c9de322d47eb33a5cee8cc38838319e6 ]
To follow the model of felix and seville where we have one platform-specific file, rename this file to the actual SoC it serves.
Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mscc/Makefile | 2 +- drivers/net/ethernet/mscc/{ocelot_board.c => ocelot_vsc7514.c} | 0 2 files changed, 1 insertion(+), 1 deletion(-) rename drivers/net/ethernet/mscc/{ocelot_board.c => ocelot_vsc7514.c} (100%)
diff --git a/drivers/net/ethernet/mscc/Makefile b/drivers/net/ethernet/mscc/Makefile index 91b33b55054e1..ad97a5cca6f99 100644 --- a/drivers/net/ethernet/mscc/Makefile +++ b/drivers/net/ethernet/mscc/Makefile @@ -2,4 +2,4 @@ obj-$(CONFIG_MSCC_OCELOT_SWITCH) += mscc_ocelot_common.o mscc_ocelot_common-y := ocelot.o ocelot_io.o mscc_ocelot_common-y += ocelot_regs.o ocelot_tc.o ocelot_police.o ocelot_ace.o ocelot_flower.o ocelot_ptp.o -obj-$(CONFIG_MSCC_OCELOT_SWITCH_OCELOT) += ocelot_board.o +obj-$(CONFIG_MSCC_OCELOT_SWITCH_OCELOT) += ocelot_vsc7514.o diff --git a/drivers/net/ethernet/mscc/ocelot_board.c b/drivers/net/ethernet/mscc/ocelot_vsc7514.c similarity index 100% rename from drivers/net/ethernet/mscc/ocelot_board.c rename to drivers/net/ethernet/mscc/ocelot_vsc7514.c
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit e8e6e73db14273464b374d49ca7242c0994945f3 ]
We don't want ocelot_port_set_maxlen to enable pause frame TX, just to adjust the pause thresholds.
Move the unconditional enabling of pause TX to ocelot_init_port. There is no good place to put such setting because it shouldn't be unconditional. But at the moment it is, we're not changing that.
Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mscc/ocelot.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index d0b79cca51840..6e68713c0ac6b 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -2012,6 +2012,7 @@ void ocelot_port_set_maxlen(struct ocelot *ocelot, int port, size_t sdu) { struct ocelot_port *ocelot_port = ocelot->ports[port]; int maxlen = sdu + ETH_HLEN + ETH_FCS_LEN; + int pause_start, pause_stop; int atop_wm;
if (port == ocelot->npi) { @@ -2025,13 +2026,13 @@ void ocelot_port_set_maxlen(struct ocelot *ocelot, int port, size_t sdu)
ocelot_port_writel(ocelot_port, maxlen, DEV_MAC_MAXLEN_CFG);
- /* Set Pause WM hysteresis - * 152 = 6 * maxlen / OCELOT_BUFFER_CELL_SZ - * 101 = 4 * maxlen / OCELOT_BUFFER_CELL_SZ - */ - ocelot_write_rix(ocelot, SYS_PAUSE_CFG_PAUSE_ENA | - SYS_PAUSE_CFG_PAUSE_STOP(101) | - SYS_PAUSE_CFG_PAUSE_START(152), SYS_PAUSE_CFG, port); + /* Set Pause watermark hysteresis */ + pause_start = 6 * maxlen / OCELOT_BUFFER_CELL_SZ; + pause_stop = 4 * maxlen / OCELOT_BUFFER_CELL_SZ; + ocelot_rmw_rix(ocelot, SYS_PAUSE_CFG_PAUSE_START(pause_start), + SYS_PAUSE_CFG_PAUSE_START_M, SYS_PAUSE_CFG, port); + ocelot_rmw_rix(ocelot, SYS_PAUSE_CFG_PAUSE_STOP(pause_stop), + SYS_PAUSE_CFG_PAUSE_STOP_M, SYS_PAUSE_CFG, port);
/* Tail dropping watermark */ atop_wm = (ocelot->shared_queue_sz - 9 * maxlen) / @@ -2094,6 +2095,10 @@ void ocelot_init_port(struct ocelot *ocelot, int port) ocelot_port_writel(ocelot_port, 0, DEV_MAC_FC_MAC_HIGH_CFG); ocelot_port_writel(ocelot_port, 0, DEV_MAC_FC_MAC_LOW_CFG);
+ /* Enable transmission of pause frames */ + ocelot_rmw_rix(ocelot, SYS_PAUSE_CFG_PAUSE_ENA, SYS_PAUSE_CFG_PAUSE_ENA, + SYS_PAUSE_CFG, port); + /* Drop frames with multicast source address */ ocelot_rmw_gix(ocelot, ANA_PORT_DROP_CFG_DROP_MC_SMAC_ENA, ANA_PORT_DROP_CFG_DROP_MC_SMAC_ENA,
From: Maxim Kochetkov fido_max@inbox.ru
[ Upstream commit aa92d836d5c40a7e21e563a272ad177f1bfd44dd ]
The ocelot_wm_encode function deals with setting thresholds for pause frame start and stop. In Ocelot and Felix the register layout is the same, but for Seville, it isn't. The easiest way to accommodate Seville hardware configuration is to introduce a function pointer for setting this up.
Signed-off-by: Maxim Kochetkov fido_max@inbox.ru Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/ocelot/felix_vsc9959.c | 13 +++++++++++++ drivers/net/ethernet/mscc/ocelot.c | 16 ++-------------- drivers/net/ethernet/mscc/ocelot_vsc7514.c | 13 +++++++++++++ include/soc/mscc/ocelot.h | 1 + 4 files changed, 29 insertions(+), 14 deletions(-)
diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c index a83ecd1c5d6c2..259a612da0030 100644 --- a/drivers/net/dsa/ocelot/felix_vsc9959.c +++ b/drivers/net/dsa/ocelot/felix_vsc9959.c @@ -1105,8 +1105,21 @@ static int vsc9959_prevalidate_phy_mode(struct ocelot *ocelot, int port, } }
+/* Watermark encode + * Bit 8: Unit; 0:1, 1:16 + * Bit 7-0: Value to be multiplied with unit + */ +static u16 vsc9959_wm_enc(u16 value) +{ + if (value >= BIT(8)) + return BIT(8) | (value / 16); + + return value; +} + static const struct ocelot_ops vsc9959_ops = { .reset = vsc9959_reset, + .wm_enc = vsc9959_wm_enc, };
static int vsc9959_mdio_bus_alloc(struct ocelot *ocelot) diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index 6e68713c0ac6b..1438839e3f6ea 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -396,18 +396,6 @@ static void ocelot_vlan_init(struct ocelot *ocelot) } }
-/* Watermark encode - * Bit 8: Unit; 0:1, 1:16 - * Bit 7-0: Value to be multiplied with unit - */ -static u16 ocelot_wm_enc(u16 value) -{ - if (value >= BIT(8)) - return BIT(8) | (value / 16); - - return value; -} - void ocelot_adjust_link(struct ocelot *ocelot, int port, struct phy_device *phydev) { @@ -2037,9 +2025,9 @@ void ocelot_port_set_maxlen(struct ocelot *ocelot, int port, size_t sdu) /* Tail dropping watermark */ atop_wm = (ocelot->shared_queue_sz - 9 * maxlen) / OCELOT_BUFFER_CELL_SZ; - ocelot_write_rix(ocelot, ocelot_wm_enc(9 * maxlen), + ocelot_write_rix(ocelot, ocelot->ops->wm_enc(9 * maxlen), SYS_ATOP, port); - ocelot_write(ocelot, ocelot_wm_enc(atop_wm), SYS_ATOP_TOT_CFG); + ocelot_write(ocelot, ocelot->ops->wm_enc(atop_wm), SYS_ATOP_TOT_CFG); } EXPORT_SYMBOL(ocelot_port_set_maxlen);
diff --git a/drivers/net/ethernet/mscc/ocelot_vsc7514.c b/drivers/net/ethernet/mscc/ocelot_vsc7514.c index 4a15d2ff8b706..66b58b242f778 100644 --- a/drivers/net/ethernet/mscc/ocelot_vsc7514.c +++ b/drivers/net/ethernet/mscc/ocelot_vsc7514.c @@ -240,8 +240,21 @@ static int ocelot_reset(struct ocelot *ocelot) return 0; }
+/* Watermark encode + * Bit 8: Unit; 0:1, 1:16 + * Bit 7-0: Value to be multiplied with unit + */ +static u16 ocelot_wm_enc(u16 value) +{ + if (value >= BIT(8)) + return BIT(8) | (value / 16); + + return value; +} + static const struct ocelot_ops ocelot_ops = { .reset = ocelot_reset, + .wm_enc = ocelot_wm_enc, };
static const struct vcap_field vsc7514_vcap_is2_keys[] = { diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index 4953e9994df34..8e174a24c5757 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -468,6 +468,7 @@ struct ocelot;
struct ocelot_ops { int (*reset)(struct ocelot *ocelot); + u16 (*wm_enc)(u16 value); };
struct ocelot_acl_block {
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 601e984f23abcaa7cf3eb078c13de4db3cf6a4f0 ]
Tail dropping is enabled for a port when:
1. A source port consumes more packet buffers than the watermark encoded in SYS:PORT:ATOP_CFG.ATOP.
AND
2. Total memory use exceeds the consumption watermark encoded in SYS:PAUSE_CFG:ATOP_TOT_CFG.
The unit of these watermarks is a 60 byte memory cell. That unit is programmed properly into ATOP_TOT_CFG, but not into ATOP. Actually when written into ATOP, it would get truncated and wrap around.
Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mscc/ocelot.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index 1438839e3f6ea..61bbb7a090042 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -2001,7 +2001,7 @@ void ocelot_port_set_maxlen(struct ocelot *ocelot, int port, size_t sdu) struct ocelot_port *ocelot_port = ocelot->ports[port]; int maxlen = sdu + ETH_HLEN + ETH_FCS_LEN; int pause_start, pause_stop; - int atop_wm; + int atop, atop_tot;
if (port == ocelot->npi) { maxlen += OCELOT_TAG_LEN; @@ -2022,12 +2022,12 @@ void ocelot_port_set_maxlen(struct ocelot *ocelot, int port, size_t sdu) ocelot_rmw_rix(ocelot, SYS_PAUSE_CFG_PAUSE_STOP(pause_stop), SYS_PAUSE_CFG_PAUSE_STOP_M, SYS_PAUSE_CFG, port);
- /* Tail dropping watermark */ - atop_wm = (ocelot->shared_queue_sz - 9 * maxlen) / + /* Tail dropping watermarks */ + atop_tot = (ocelot->shared_queue_sz - 9 * maxlen) / OCELOT_BUFFER_CELL_SZ; - ocelot_write_rix(ocelot, ocelot->ops->wm_enc(9 * maxlen), - SYS_ATOP, port); - ocelot_write(ocelot, ocelot->ops->wm_enc(atop_wm), SYS_ATOP_TOT_CFG); + atop = (9 * maxlen) / OCELOT_BUFFER_CELL_SZ; + ocelot_write_rix(ocelot, ocelot->ops->wm_enc(atop), SYS_ATOP, port); + ocelot_write(ocelot, ocelot->ops->wm_enc(atop_tot), SYS_ATOP_TOT_CFG); } EXPORT_SYMBOL(ocelot_port_set_maxlen);
From: David Howells dhowells@redhat.com
[ Upstream commit ec0fa0b659144d9c68204d23f627b6a65fa53e50 ]
The afs filesystem has a lock[*] that it uses to serialise I/O operations going to the server (vnode->io_lock), as the server will only perform one modification operation at a time on any given file or directory. This prevents the the filesystem from filling up all the call slots to a server with calls that aren't going to be executed in parallel anyway, thereby allowing operations on other files to obtain slots.
[*] Note that is probably redundant for directories at least since i_rwsem is used to serialise directory modifications and lookup/reading vs modification. The server does allow parallel non-modification ops, however.
When a file truncation op completes, we truncate the in-memory copy of the file to match - but we do it whilst still holding the io_lock, the idea being to prevent races with other operations.
However, if writeback starts in a worker thread simultaneously with truncation (whilst notify_change() is called with i_rwsem locked, writeback pays it no heed), it may manage to set PG_writeback bits on the pages that will get truncated before afs_setattr_success() manages to call truncate_pagecache(). Truncate will then wait for those pages - whilst still inside io_lock:
# cat /proc/8837/stack [<0>] wait_on_page_bit_common+0x184/0x1e7 [<0>] truncate_inode_pages_range+0x37f/0x3eb [<0>] truncate_pagecache+0x3c/0x53 [<0>] afs_setattr_success+0x4d/0x6e [<0>] afs_wait_for_operation+0xd8/0x169 [<0>] afs_do_sync_operation+0x16/0x1f [<0>] afs_setattr+0x1fb/0x25d [<0>] notify_change+0x2cf/0x3c4 [<0>] do_truncate+0x7f/0xb2 [<0>] do_sys_ftruncate+0xd1/0x104 [<0>] do_syscall_64+0x2d/0x3a [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
The writeback operation, however, stalls indefinitely because it needs to get the io_lock to proceed:
# cat /proc/5940/stack [<0>] afs_get_io_locks+0x58/0x1ae [<0>] afs_begin_vnode_operation+0xc7/0xd1 [<0>] afs_store_data+0x1b2/0x2a3 [<0>] afs_write_back_from_locked_page+0x418/0x57c [<0>] afs_writepages_region+0x196/0x224 [<0>] afs_writepages+0x74/0x156 [<0>] do_writepages+0x2d/0x56 [<0>] __writeback_single_inode+0x84/0x207 [<0>] writeback_sb_inodes+0x238/0x3cf [<0>] __writeback_inodes_wb+0x68/0x9f [<0>] wb_writeback+0x145/0x26c [<0>] wb_do_writeback+0x16a/0x194 [<0>] wb_workfn+0x74/0x177 [<0>] process_one_work+0x174/0x264 [<0>] worker_thread+0x117/0x1b9 [<0>] kthread+0xec/0xf1 [<0>] ret_from_fork+0x1f/0x30
and thus deadlock has occurred.
Note that whilst afs_setattr() calls filemap_write_and_wait(), the fact that the caller is holding i_rwsem doesn't preclude more pages being dirtied through an mmap'd region.
Fix this by:
(1) Use the vnode validate_lock to mediate access between afs_setattr() and afs_writepages():
(a) Exclusively lock validate_lock in afs_setattr() around the whole RPC operation.
(b) If WB_SYNC_ALL isn't set on entry to afs_writepages(), trying to shared-lock validate_lock and returning immediately if we couldn't get it.
(c) If WB_SYNC_ALL is set, wait for the lock.
The validate_lock is also used to validate a file and to zap its cache if the file was altered by a third party, so it's probably a good fit for this.
(2) Move the truncation outside of the io_lock in setattr, using the same hook as is used for local directory editing.
This requires the old i_size to be retained in the operation record as we commit the revised status to the inode members inside the io_lock still, but we still need to know if we reduced the file size.
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/afs/inode.c | 47 ++++++++++++++++++++++++++++++++++++++--------- fs/afs/internal.h | 1 + fs/afs/write.c | 11 +++++++++++ 3 files changed, 50 insertions(+), 9 deletions(-)
diff --git a/fs/afs/inode.c b/fs/afs/inode.c index 1d13d2e882ada..0fe8844b4bee2 100644 --- a/fs/afs/inode.c +++ b/fs/afs/inode.c @@ -810,14 +810,32 @@ void afs_evict_inode(struct inode *inode)
static void afs_setattr_success(struct afs_operation *op) { - struct inode *inode = &op->file[0].vnode->vfs_inode; + struct afs_vnode_param *vp = &op->file[0]; + struct inode *inode = &vp->vnode->vfs_inode; + loff_t old_i_size = i_size_read(inode); + + op->setattr.old_i_size = old_i_size; + afs_vnode_commit_status(op, vp); + /* inode->i_size has now been changed. */ + + if (op->setattr.attr->ia_valid & ATTR_SIZE) { + loff_t size = op->setattr.attr->ia_size; + if (size > old_i_size) + pagecache_isize_extended(inode, old_i_size, size); + } +} + +static void afs_setattr_edit_file(struct afs_operation *op) +{ + struct afs_vnode_param *vp = &op->file[0]; + struct inode *inode = &vp->vnode->vfs_inode;
- afs_vnode_commit_status(op, &op->file[0]); if (op->setattr.attr->ia_valid & ATTR_SIZE) { - loff_t i_size = inode->i_size, size = op->setattr.attr->ia_size; - if (size > i_size) - pagecache_isize_extended(inode, i_size, size); - truncate_pagecache(inode, size); + loff_t size = op->setattr.attr->ia_size; + loff_t i_size = op->setattr.old_i_size; + + if (size < i_size) + truncate_pagecache(inode, size); } }
@@ -825,6 +843,7 @@ static const struct afs_operation_ops afs_setattr_operation = { .issue_afs_rpc = afs_fs_setattr, .issue_yfs_rpc = yfs_fs_setattr, .success = afs_setattr_success, + .edit_dir = afs_setattr_edit_file, };
/* @@ -863,11 +882,16 @@ int afs_setattr(struct dentry *dentry, struct iattr *attr) if (S_ISREG(vnode->vfs_inode.i_mode)) filemap_write_and_wait(vnode->vfs_inode.i_mapping);
+ /* Prevent any new writebacks from starting whilst we do this. */ + down_write(&vnode->validate_lock); + op = afs_alloc_operation(((attr->ia_valid & ATTR_FILE) ? afs_file_key(attr->ia_file) : NULL), vnode->volume); - if (IS_ERR(op)) - return PTR_ERR(op); + if (IS_ERR(op)) { + ret = PTR_ERR(op); + goto out_unlock; + }
afs_op_set_vnode(op, 0, vnode); op->setattr.attr = attr; @@ -880,5 +904,10 @@ int afs_setattr(struct dentry *dentry, struct iattr *attr) op->file[0].update_ctime = 1;
op->ops = &afs_setattr_operation; - return afs_do_sync_operation(op); + ret = afs_do_sync_operation(op); + +out_unlock: + up_write(&vnode->validate_lock); + _leave(" = %d", ret); + return ret; } diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 792ac711985eb..e1ebead2e505a 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -810,6 +810,7 @@ struct afs_operation { } store; struct { struct iattr *attr; + loff_t old_i_size; } setattr; struct afs_acl *acl; struct yfs_acl *yacl; diff --git a/fs/afs/write.c b/fs/afs/write.c index a121c247d95a3..0a98cf36e78a3 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -738,11 +738,21 @@ static int afs_writepages_region(struct address_space *mapping, int afs_writepages(struct address_space *mapping, struct writeback_control *wbc) { + struct afs_vnode *vnode = AFS_FS_I(mapping->host); pgoff_t start, end, next; int ret;
_enter("");
+ /* We have to be careful as we can end up racing with setattr() + * truncating the pagecache since the caller doesn't take a lock here + * to prevent it. + */ + if (wbc->sync_mode == WB_SYNC_ALL) + down_read(&vnode->validate_lock); + else if (!down_read_trylock(&vnode->validate_lock)) + return 0; + if (wbc->range_cyclic) { start = mapping->writeback_index; end = -1; @@ -762,6 +772,7 @@ int afs_writepages(struct address_space *mapping, ret = afs_writepages_region(mapping, wbc, start, end, &next); }
+ up_read(&vnode->validate_lock); _leave(" = %d", ret); return ret; }
From: Kajol Jain kjain@linux.ibm.com
[ Upstream commit 6d6b8b9f4fceab7266ca03d194f60ec72bd4b654 ]
The error handling introduced by commit:
2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
looses any return value from smp_call_function_single() that is not {0, -EINVAL}. This is a problem because it will return -EXNIO when the target CPU is offline. Worse, in that case it'll turn into an infinite loop.
Fixes: 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()") Reported-by: Srikar Dronamraju srikar@linux.vnet.ibm.com Signed-off-by: Kajol Jain kjain@linux.ibm.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Barret Rhoden brho@google.com Tested-by: Srikar Dronamraju srikar@linux.vnet.ibm.com Link: https://lkml.kernel.org/r/20200827064732.20860-1-kjain@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/events/core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c index 856d98c36f562..fd8cd00099dae 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -99,7 +99,7 @@ static void remote_function(void *data) * retry due to any failures in smp_call_function_single(), such as if the * task_cpu() goes offline concurrently. * - * returns @func return value or -ESRCH when the process isn't running + * returns @func return value or -ESRCH or -ENXIO when the process isn't running */ static int task_function_call(struct task_struct *p, remote_function_f func, void *info) @@ -115,7 +115,8 @@ task_function_call(struct task_struct *p, remote_function_f func, void *info) for (;;) { ret = smp_call_function_single(task_cpu(p), remote_function, &data, 1); - ret = !ret ? data.ret : -EAGAIN; + if (!ret) + ret = data.ret;
if (ret != -EAGAIN) break;
From: Coly Li colyli@suse.de
[ Upstream commit 4243219141b67d7c2fdb2d8073c17c539b9263eb ]
In mmc_queue_setup_discard() the mmc driver queue's discard_granularity might be set as 0 (when card->pref_erase > max_discard) while the mmc device still declares to support discard operation. This is buggy and triggered the following kernel warning message,
WARNING: CPU: 0 PID: 135 at __blkdev_issue_discard+0x200/0x294 CPU: 0 PID: 135 Comm: f2fs_discard-17 Not tainted 5.9.0-rc6 #1 Hardware name: Google Kevin (DT) pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--) pc : __blkdev_issue_discard+0x200/0x294 lr : __blkdev_issue_discard+0x54/0x294 sp : ffff800011dd3b10 x29: ffff800011dd3b10 x28: 0000000000000000 x27: ffff800011dd3cc4 x26: ffff800011dd3e18 x25: 000000000004e69b x24: 0000000000000c40 x23: ffff0000f1deaaf0 x22: ffff0000f2849200 x21: 00000000002734d8 x20: 0000000000000008 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000394 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 00000000000008b0 x9 : ffff800011dd3cb0 x8 : 000000000004e69b x7 : 0000000000000000 x6 : ffff0000f1926400 x5 : ffff0000f1940800 x4 : 0000000000000000 x3 : 0000000000000c40 x2 : 0000000000000008 x1 : 00000000002734d8 x0 : 0000000000000000 Call trace: __blkdev_issue_discard+0x200/0x294 __submit_discard_cmd+0x128/0x374 __issue_discard_cmd_orderly+0x188/0x244 __issue_discard_cmd+0x2e8/0x33c issue_discard_thread+0xe8/0x2f0 kthread+0x11c/0x120 ret_from_fork+0x10/0x1c ---[ end trace e4c8023d33dfe77a ]---
This patch fixes the issue by setting discard_granularity as SECTOR_SIZE instead of 0 when (card->pref_erase > max_discard) is true. Now no more complain from __blkdev_issue_discard() for the improper value of discard granularity.
This issue is exposed after commit b35fd7422c2f ("block: check queue's limits.discard_granularity in __blkdev_issue_discard()"), a "Fixes:" tag is also added for the commit to make sure people won't miss this patch after applying the change of __blkdev_issue_discard().
Fixes: e056a1b5b67b ("mmc: queue: let host controllers specify maximum discard timeout") Fixes: b35fd7422c2f ("block: check queue's limits.discard_granularity in __blkdev_issue_discard()"). Reported-and-tested-by: Vicente Bergas vicencb@gmail.com Signed-off-by: Coly Li colyli@suse.de Acked-by: Adrian Hunter adrian.hunter@intel.com Cc: Ulf Hansson ulf.hansson@linaro.org Link: https://lore.kernel.org/r/20201002013852.51968-1-colyli@suse.de Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/core/queue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c index 4b1eb89b401d9..1ad518821157f 100644 --- a/drivers/mmc/core/queue.c +++ b/drivers/mmc/core/queue.c @@ -190,7 +190,7 @@ static void mmc_queue_setup_discard(struct request_queue *q, q->limits.discard_granularity = card->pref_erase << 9; /* granularity must not be greater than max. discard */ if (card->pref_erase > max_discard) - q->limits.discard_granularity = 0; + q->limits.discard_granularity = SECTOR_SIZE; if (mmc_can_secure_erase_trim(card)) blk_queue_flag_set(QUEUE_FLAG_SECERASE, q); }
From: Minchan Kim minchan@kernel.org
commit 8b7b2eb131d3476062ffd34358785b44be25172f upstream.
The swap address_space doesn't have host. Thus, it makes kernel crash once swap write meets error. Fix it.
Fixes: 735e4ae5ba28 ("vfs: track per-sb writeback errors and report them to syncfs") Signed-off-by: Minchan Kim minchan@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Acked-by: Jeff Layton jlayton@kernel.org Cc: Jan Kara jack@suse.cz Cc: Andres Freund andres@anarazel.de Cc: Matthew Wilcox willy@infradead.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christoph Hellwig hch@infradead.org Cc: Dave Chinner david@fromorbit.com Cc: David Howells dhowells@redhat.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20201010000650.750063-1-minchan@kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/pagemap.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -54,7 +54,8 @@ static inline void mapping_set_error(str __filemap_set_wb_err(mapping, error);
/* Record it in superblock */ - errseq_set(&mapping->host->i_sb->s_wb_err, error); + if (mapping->host) + errseq_set(&mapping->host->i_sb->s_wb_err, error);
/* Record it in flags for now, for legacy callers */ if (error == -ENOSPC)
From: Vijay Balakrishna vijayb@linux.microsoft.com
commit 4aab2be0983031a05cb4a19696c9da5749523426 upstream.
When memory is hotplug added or removed the min_free_kbytes should be recalculated based on what is expected by khugepaged. Currently after hotplug, min_free_kbytes will be set to a lower default and higher default set when THP enabled is lost.
This change restores min_free_kbytes as expected for THP consumers.
[vijayb@linux.microsoft.com: v5] Link: https://lkml.kernel.org/r/1601398153-5517-1-git-send-email-vijayb@linux.micr...
Fixes: f000565adb77 ("thp: set recommended min free kbytes") Signed-off-by: Vijay Balakrishna vijayb@linux.microsoft.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Reviewed-by: Pavel Tatashin pasha.tatashin@soleen.com Acked-by: Michal Hocko mhocko@suse.com Cc: Allen Pais apais@microsoft.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Oleg Nesterov oleg@redhat.com Cc: Song Liu songliubraving@fb.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1600305709-2319-2-git-send-email-vijayb@linux.micr... Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.mic... Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/khugepaged.h | 5 +++++ mm/khugepaged.c | 13 +++++++++++-- mm/page_alloc.c | 3 +++ 3 files changed, 19 insertions(+), 2 deletions(-)
--- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -15,6 +15,7 @@ extern int __khugepaged_enter(struct mm_ extern void __khugepaged_exit(struct mm_struct *mm); extern int khugepaged_enter_vma_merge(struct vm_area_struct *vma, unsigned long vm_flags); +extern void khugepaged_min_free_kbytes_update(void); #ifdef CONFIG_SHMEM extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr); #else @@ -85,6 +86,10 @@ static inline void collapse_pte_mapped_t unsigned long addr) { } + +static inline void khugepaged_min_free_kbytes_update(void) +{ +} #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
#endif /* _LINUX_KHUGEPAGED_H */ --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -56,6 +56,9 @@ enum scan_result { #define CREATE_TRACE_POINTS #include <trace/events/huge_memory.h>
+static struct task_struct *khugepaged_thread __read_mostly; +static DEFINE_MUTEX(khugepaged_mutex); + /* default scan 8*512 pte (or vmas) every 30 second */ static unsigned int khugepaged_pages_to_scan __read_mostly; static unsigned int khugepaged_pages_collapsed; @@ -2304,8 +2307,6 @@ static void set_recommended_min_free_kby
int start_stop_khugepaged(void) { - static struct task_struct *khugepaged_thread __read_mostly; - static DEFINE_MUTEX(khugepaged_mutex); int err = 0;
mutex_lock(&khugepaged_mutex); @@ -2332,3 +2333,11 @@ fail: mutex_unlock(&khugepaged_mutex); return err; } + +void khugepaged_min_free_kbytes_update(void) +{ + mutex_lock(&khugepaged_mutex); + if (khugepaged_enabled() && khugepaged_thread) + set_recommended_min_free_kbytes(); + mutex_unlock(&khugepaged_mutex); +} --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -69,6 +69,7 @@ #include <linux/nmi.h> #include <linux/psi.h> #include <linux/padata.h> +#include <linux/khugepaged.h>
#include <asm/sections.h> #include <asm/tlbflush.h> @@ -7884,6 +7885,8 @@ int __meminit init_per_zone_wmark_min(vo setup_min_slab_ratio(); #endif
+ khugepaged_min_free_kbytes_update(); + return 0; } postcore_initcall(init_per_zone_wmark_min)
From: Eric Dumazet edumazet@google.com
commit 86bccd0367130f481ca99ba91de1c6a5aa1c78c1 upstream.
We got reports from GKE customers flows being reset by netfilter conntrack unless nf_conntrack_tcp_be_liberal is set to 1.
Traces seemed to suggest ACK packet being dropped by the packet capture, or more likely that ACK were received in the wrong order.
wscale=7, SYN and SYNACK not shown here.
This ACK allows the sender to send 1871*128 bytes from seq 51359321 : New right edge of the window -> 51359321+1871*128=51598809
09:17:23.389210 IP A > B: Flags [.], ack 51359321, win 1871, options [nop,nop,TS val 10 ecr 999], length 0
09:17:23.389212 IP B > A: Flags [.], seq 51422681:51424089, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 1408 09:17:23.389214 IP A > B: Flags [.], ack 51422681, win 1376, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389253 IP B > A: Flags [.], seq 51424089:51488857, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 64768 09:17:23.389272 IP A > B: Flags [.], ack 51488857, win 859, options [nop,nop,TS val 10 ecr 999], length 0 09:17:23.389275 IP B > A: Flags [.], seq 51488857:51521241, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384
Receiver now allows to send 606*128=77568 from seq 51521241 : New right edge of the window -> 51521241+606*128=51598809
09:17:23.389296 IP A > B: Flags [.], ack 51521241, win 606, options [nop,nop,TS val 10 ecr 999], length 0
09:17:23.389308 IP B > A: Flags [.], seq 51521241:51553625, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384
It seems the sender exceeds RWIN allowance, since 51611353 > 51598809
09:17:23.389346 IP B > A: Flags [.], seq 51553625:51611353, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 57728 09:17:23.389356 IP B > A: Flags [.], seq 51611353:51618393, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 7040
09:17:23.389367 IP A > B: Flags [.], ack 51611353, win 0, options [nop,nop,TS val 10 ecr 999], length 0
netfilter conntrack is not happy and sends RST
09:17:23.389389 IP A > B: Flags [R], seq 92176528, win 0, length 0 09:17:23.389488 IP B > A: Flags [R], seq 174478967, win 0, length 0
Now imagine ACK were delivered out of order and tcp_add_backlog() sets window based on wrong packet. New right edge of the window -> 51521241+859*128=51631193
Normally TCP stack handles OOO packets just fine, but it turns out tcp_add_backlog() does not. It can update the window field of the aggregated packet even if the ACK sequence of the last received packet is too old.
Many thanks to Alexandre Ferrieux for independently reporting the issue and suggesting a fix.
Fixes: 4f693b55c3d2 ("tcp: implement coalescing on backlog queue") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Alexandre Ferrieux alexandre.ferrieux@orange.com Acked-by: Soheil Hassas Yeganeh soheil@google.com Acked-by: Neal Cardwell ncardwell@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ipv4/tcp_ipv4.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1787,12 +1787,12 @@ bool tcp_add_backlog(struct sock *sk, st
__skb_pull(skb, hdrlen); if (skb_try_coalesce(tail, skb, &fragstolen, &delta)) { - thtail->window = th->window; - TCP_SKB_CB(tail)->end_seq = TCP_SKB_CB(skb)->end_seq;
- if (after(TCP_SKB_CB(skb)->ack_seq, TCP_SKB_CB(tail)->ack_seq)) + if (likely(!before(TCP_SKB_CB(skb)->ack_seq, TCP_SKB_CB(tail)->ack_seq))) { TCP_SKB_CB(tail)->ack_seq = TCP_SKB_CB(skb)->ack_seq; + thtail->window = th->window; + }
/* We have to update both TCP_SKB_CB(tail)->tcp_flags and * thtail->fin, so that the fast path in tcp_rcv_established()
From: Johannes Berg johannes.berg@intel.com
commit a95bc734e60449e7b073ff7ff70c35083b290ae9 upstream.
If userspace doesn't complete the policy dump, we leak the allocated state. Fix this.
Fixes: d07dcf9aadd6 ("netlink: add infrastructure to expose policies to userspace") Signed-off-by: Johannes Berg johannes.berg@intel.com Reviewed-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/net/netlink.h | 3 ++- net/netlink/genetlink.c | 9 ++++++++- net/netlink/policy.c | 24 ++++++++++-------------- 3 files changed, 20 insertions(+), 16 deletions(-)
--- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -1936,7 +1936,8 @@ void nla_get_range_signed(const struct n int netlink_policy_dump_start(const struct nla_policy *policy, unsigned int maxtype, unsigned long *state); -bool netlink_policy_dump_loop(unsigned long *state); +bool netlink_policy_dump_loop(unsigned long state); int netlink_policy_dump_write(struct sk_buff *skb, unsigned long state); +void netlink_policy_dump_free(unsigned long state);
#endif --- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -1079,7 +1079,7 @@ static int ctrl_dumppolicy(struct sk_buf if (err) return err;
- while (netlink_policy_dump_loop(&cb->args[1])) { + while (netlink_policy_dump_loop(cb->args[1])) { void *hdr; struct nlattr *nest;
@@ -1113,6 +1113,12 @@ nla_put_failure: return skb->len; }
+static int ctrl_dumppolicy_done(struct netlink_callback *cb) +{ + netlink_policy_dump_free(cb->args[1]); + return 0; +} + static const struct genl_ops genl_ctrl_ops[] = { { .cmd = CTRL_CMD_GETFAMILY, @@ -1123,6 +1129,7 @@ static const struct genl_ops genl_ctrl_o { .cmd = CTRL_CMD_GETPOLICY, .dumpit = ctrl_dumppolicy, + .done = ctrl_dumppolicy_done, }, };
--- a/net/netlink/policy.c +++ b/net/netlink/policy.c @@ -84,7 +84,6 @@ int netlink_policy_dump_start(const stru unsigned int policy_idx; int err;
- /* also returns 0 if "*_state" is our ERR_PTR() end marker */ if (*_state) return 0;
@@ -140,21 +139,11 @@ static bool netlink_policy_dump_finished !state->policies[state->policy_idx].policy; }
-bool netlink_policy_dump_loop(unsigned long *_state) +bool netlink_policy_dump_loop(unsigned long _state) { - struct nl_policy_dump *state = (void *)*_state; - - if (IS_ERR(state)) - return false; - - if (netlink_policy_dump_finished(state)) { - kfree(state); - /* store end marker instead of freed state */ - *_state = (unsigned long)ERR_PTR(-ENOENT); - return false; - } + struct nl_policy_dump *state = (void *)_state;
- return true; + return !netlink_policy_dump_finished(state); }
int netlink_policy_dump_write(struct sk_buff *skb, unsigned long _state) @@ -309,3 +298,10 @@ nla_put_failure: nla_nest_cancel(skb, policy); return -ENOBUFS; } + +void netlink_policy_dump_free(unsigned long _state) +{ + struct nl_policy_dump *state = (void *)_state; + + kfree(state); +}
From: Guillaume Nault gnault@redhat.com
commit 4296adc3e32f5d544a95061160fe7e127be1b9ff upstream.
Openvswitch allows to drop a packet's Ethernet header, therefore skb_mpls_push() and skb_mpls_pop() might be called with ethernet=true and mac_len=0. In that case the pointer passed to skb_mod_eth_type() doesn't point to an Ethernet header and the new Ethertype is written at unexpected locations.
Fix this by verifying that mac_len is big enough to contain an Ethernet header.
Fixes: fa4e0f8855fc ("net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions") Signed-off-by: Guillaume Nault gnault@redhat.com Acked-by: Davide Caratti dcaratti@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/core/skbuff.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -5621,7 +5621,7 @@ int skb_mpls_push(struct sk_buff *skb, _ lse->label_stack_entry = mpls_lse; skb_postpush_rcsum(skb, lse, MPLS_HLEN);
- if (ethernet) + if (ethernet && mac_len >= ETH_HLEN) skb_mod_eth_type(skb, eth_hdr(skb), mpls_proto); skb->protocol = mpls_proto;
@@ -5661,7 +5661,7 @@ int skb_mpls_pop(struct sk_buff *skb, __ skb_reset_mac_header(skb); skb_set_network_header(skb, mac_len);
- if (ethernet) { + if (ethernet && mac_len >= ETH_HLEN) { struct ethhdr *hdr;
/* use mpls_hdr() to get ethertype to account for VLANs. */
From: Nikolay Aleksandrov nikolay@nvidia.com
commit f2f3729fb65c5c2e6db234e6316b71a7bdc4b30b upstream.
When a user-space software manages fdb entries externally it should set the ext_learn flag which marks the fdb entry as externally managed and avoids expiring it (they're treated as static fdbs). Unfortunately on events where fdb entries are flushed (STP down, netlink fdb flush etc) these fdbs are also deleted automatically by the bridge. That in turn causes trouble for the managing user-space software (e.g. in MLAG setups we lose remote fdb entries on port flaps). These entries are completely externally managed so we should avoid automatically deleting them, the only exception are offloaded entries (i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED). They are flushed as before.
Fixes: eb100e0e24a2 ("net: bridge: allow to add externally learned entries from user-space") Signed-off-by: Nikolay Aleksandrov nikolay@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/bridge/br_fdb.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -404,6 +404,8 @@ void br_fdb_delete_by_port(struct net_br
if (!do_all) if (test_bit(BR_FDB_STATIC, &f->flags) || + (test_bit(BR_FDB_ADDED_BY_EXT_LEARN, &f->flags) && + !test_bit(BR_FDB_OFFLOADED, &f->flags)) || (vid && f->key.vlan_id != vid)) continue;
From: Rohit Maheshwari rohitm@chelsio.com
commit 38f7e1c0c43dd25b06513137bb6fd35476f9ec6d upstream.
BUG: kernel NULL pointer dereference, address: 00000000000000b8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 80000008b6fef067 P4D 80000008b6fef067 PUD 8b6fe6067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 12 PID: 23871 Comm: kworker/12:80 Kdump: loaded Tainted: G S 5.9.0-rc3+ #1 Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.1 03/29/2018 Workqueue: events tx_work_handler [tls] RIP: 0010:tx_work_handler+0x1b/0x70 [tls] Code: dc fe ff ff e8 16 d4 a3 f6 66 0f 1f 44 00 00 0f 1f 44 00 00 55 53 48 8b 6f 58 48 8b bd a0 04 00 00 48 85 ff 74 1c 48 8b 47 28 <48> 8b 90 b8 00 00 00 83 e2 02 75 0c f0 48 0f ba b0 b8 00 00 00 00 RSP: 0018:ffffa44ace61fe88 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff91da9e45cc30 RCX: dead000000000122 RDX: 0000000000000001 RSI: ffff91da9e45cc38 RDI: ffff91d95efac200 RBP: ffff91da133fd780 R08: 0000000000000000 R09: 000073746e657665 R10: 8080808080808080 R11: 0000000000000000 R12: ffff91dad7d30700 R13: ffff91dab6561080 R14: 0ffff91dad7d3070 R15: ffff91da9e45cc38 FS: 0000000000000000(0000) GS:ffff91dad7d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000b8 CR3: 0000000906478003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: process_one_work+0x1a7/0x370 worker_thread+0x30/0x370 ? process_one_work+0x370/0x370 kthread+0x114/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x22/0x30
tls_sw_release_resources_tx() waits for encrypt_pending, which can have race, so we need similar changes as in commit 0cada33241d9de205522e3858b18e506ca5cce2c here as well.
Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Rohit Maheshwari rohitm@chelsio.com Acked-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/tls/tls_sw.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -2142,10 +2142,15 @@ void tls_sw_release_resources_tx(struct struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx); struct tls_rec *rec, *tmp; + int pending;
/* Wait for any pending async encryptions to complete */ - smp_store_mb(ctx->async_notify, true); - if (atomic_read(&ctx->encrypt_pending)) + spin_lock_bh(&ctx->encrypt_compl_lock); + ctx->async_notify = true; + pending = atomic_read(&ctx->encrypt_pending); + spin_unlock_bh(&ctx->encrypt_compl_lock); + + if (pending) crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
tls_tx_records(sk, -1);
From: Aya Levin ayal@mellanox.com
commit 3d093bc2369003b4ce6c3522d9b383e47c40045d upstream.
Declare GRE offload support with respect to the inner protocol. Add a list of supported inner protocols on which the driver can offload checksum and GSO. For other protocols, inform the stack to do the needed operations. There is no noticeable impact on GRE performance.
Fixes: 2729984149e6 ("net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels") Signed-off-by: Aya Levin ayal@mellanox.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4323,6 +4323,21 @@ void mlx5e_del_vxlan_port(struct net_dev mlx5e_vxlan_queue_work(priv, be16_to_cpu(ti->port), 0); }
+static bool mlx5e_gre_tunnel_inner_proto_offload_supported(struct mlx5_core_dev *mdev, + struct sk_buff *skb) +{ + switch (skb->inner_protocol) { + case htons(ETH_P_IP): + case htons(ETH_P_IPV6): + case htons(ETH_P_TEB): + return true; + case htons(ETH_P_MPLS_UC): + case htons(ETH_P_MPLS_MC): + return MLX5_CAP_ETH(mdev, tunnel_stateless_mpls_over_gre); + } + return false; +} + static netdev_features_t mlx5e_tunnel_features_check(struct mlx5e_priv *priv, struct sk_buff *skb, netdev_features_t features) @@ -4345,7 +4360,9 @@ static netdev_features_t mlx5e_tunnel_fe
switch (proto) { case IPPROTO_GRE: - return features; + if (mlx5e_gre_tunnel_inner_proto_offload_supported(priv->mdev, skb)) + return features; + break; case IPPROTO_IPIP: case IPPROTO_IPV6: if (mlx5e_tunnel_proto_supported(priv->mdev, IPPROTO_IPIP))
From: Alexey Kardashevskiy aik@ozlabs.ru
commit 44c413d9a51752056d606bf6f312003ac1740fab upstream.
The tty TIOCL_SETSEL ioctl allocates a memory buffer big enough for text selection area. The maximum allowed console size is VC_RESIZE_MAXCOL * VC_RESIZE_MAXROW == 32767*32767 == ~1GB and typical MAX_ORDER is set to allow allocations lot less than than (circa 16MB).
So it is quite possible to trigger huge allocation (and syzkaller just did that) which is going to fail (which is fine) with a backtrace in mm/page_alloc.c at WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)) and this may trigger panic (if panic_on_warn is enabled) and leak kernel addresses to dmesg.
This passes __GFP_NOWARN to kmalloc_array to avoid unnecessary user- triggered WARN_ON. Note that the error is not ignored and the warning is still printed.
Signed-off-by: Alexey Kardashevskiy aik@ozlabs.ru Link: https://lore.kernel.org/r/20200617070444.116704-1-aik@ozlabs.ru Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/vt/selection.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/tty/vt/selection.c +++ b/drivers/tty/vt/selection.c @@ -193,7 +193,7 @@ static int vc_selection_store_chars(stru /* Allocate a new buffer before freeing the old one ... */ /* chars can take up to 4 bytes with unicode */ bp = kmalloc_array((vc_sel.end - vc_sel.start) / 2 + 1, unicode ? 4 : 1, - GFP_KERNEL); + GFP_KERNEL | __GFP_NOWARN); if (!bp) { printk(KERN_WARNING "selection: kmalloc() failed\n"); clear_selection();
From: Xiongfeng Wang wangxiongfeng2@huawei.com
commit 37bd9e803daea816f2dc2c8f6dc264097eb3ebd2 upstream.
When I cat some module parameters by sysfs, it displays as follows. It's better to add a newline for easy reading.
root@syzkaller:~# cat /sys/module/ati_remote2/parameters/mode_mask 0x1froot@syzkaller:~# cat /sys/module/ati_remote2/parameters/channel_mask 0xffffroot@syzkaller:~#
Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com Link: https://lore.kernel.org/r/20200720092148.9320-1-wangxiongfeng2@huawei.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/misc/ati_remote2.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/input/misc/ati_remote2.c +++ b/drivers/input/misc/ati_remote2.c @@ -68,7 +68,7 @@ static int ati_remote2_get_channel_mask( { pr_debug("%s()\n", __func__);
- return sprintf(buffer, "0x%04x", *(unsigned int *)kp->arg); + return sprintf(buffer, "0x%04x\n", *(unsigned int *)kp->arg); }
static int ati_remote2_set_mode_mask(const char *val, @@ -84,7 +84,7 @@ static int ati_remote2_get_mode_mask(cha { pr_debug("%s()\n", __func__);
- return sprintf(buffer, "0x%02x", *(unsigned int *)kp->arg); + return sprintf(buffer, "0x%02x\n", *(unsigned int *)kp->arg); }
static unsigned int channel_mask = ATI_REMOTE2_MAX_CHANNEL_MASK;
From: Anant Thazhemadam anant.thazhemadam@gmail.com
commit f45a4248ea4cc13ed50618ff066849f9587226b2 upstream.
When get_registers() fails in set_ethernet_addr(),the uninitialized value of node_id gets copied over as the address. So, check the return value of get_registers().
If get_registers() executed successfully (i.e., it returns sizeof(node_id)), copy over the MAC address using ether_addr_copy() (instead of using memcpy()).
Else, if get_registers() failed instead, a randomly generated MAC address is set as the MAC address instead.
Reported-by: syzbot+abbc768b560c84d92fd3@syzkaller.appspotmail.com Tested-by: syzbot+abbc768b560c84d92fd3@syzkaller.appspotmail.com Acked-by: Petko Manolov petkan@nucleusys.com Signed-off-by: Anant Thazhemadam anant.thazhemadam@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/usb/rtl8150.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
--- a/drivers/net/usb/rtl8150.c +++ b/drivers/net/usb/rtl8150.c @@ -274,12 +274,20 @@ static int write_mii_word(rtl8150_t * de return 1; }
-static inline void set_ethernet_addr(rtl8150_t * dev) +static void set_ethernet_addr(rtl8150_t *dev) { - u8 node_id[6]; + u8 node_id[ETH_ALEN]; + int ret;
- get_registers(dev, IDR, sizeof(node_id), node_id); - memcpy(dev->netdev->dev_addr, node_id, sizeof(node_id)); + ret = get_registers(dev, IDR, sizeof(node_id), node_id); + + if (ret == sizeof(node_id)) { + ether_addr_copy(dev->netdev->dev_addr, node_id); + } else { + eth_hw_addr_random(dev->netdev); + netdev_notice(dev->netdev, "Assigned a random MAC address: %pM\n", + dev->netdev->dev_addr); + } }
static int rtl8150_set_mac_address(struct net_device *netdev, void *p)
From: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org
commit a7809ff90ce6c48598d3c4ab54eb599bec1e9c42 upstream.
The rcu read locks are needed to avoid potential race condition while dereferencing radix tree from multiple threads. The issue was identified by syzbot. Below is the crash report:
============================= WARNING: suspicious RCU usage 5.7.0-syzkaller #0 Not tainted ----------------------------- include/linux/radix-tree.h:176 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 2 locks held by kworker/u4:1/21: #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: spin_unlock_irq include/linux/spinlock.h:403 [inline] #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: process_one_work+0x6df/0xfd0 kernel/workqueue.c:2241 #1: ffffc90000dd7d80 ((work_completion)(&qrtr_ns.work)){+.+.}-{0:0}, at: process_one_work+0x71e/0xfd0 kernel/workqueue.c:2243
stack backtrace: CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.7.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: qrtr_ns_handler qrtr_ns_worker Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1e9/0x30e lib/dump_stack.c:118 radix_tree_deref_slot include/linux/radix-tree.h:176 [inline] ctrl_cmd_new_lookup net/qrtr/ns.c:558 [inline] qrtr_ns_worker+0x2aff/0x4500 net/qrtr/ns.c:674 process_one_work+0x76e/0xfd0 kernel/workqueue.c:2268 worker_thread+0xa7f/0x1450 kernel/workqueue.c:2414 kthread+0x353/0x380 kernel/kthread.c:268
Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace") Reported-and-tested-by: syzbot+0f84f6eed90503da72fc@syzkaller.appspotmail.com Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/qrtr/ns.c | 34 +++++++++++++++++++++++++--------- 1 file changed, 25 insertions(+), 9 deletions(-)
--- a/net/qrtr/ns.c +++ b/net/qrtr/ns.c @@ -193,12 +193,13 @@ static int announce_servers(struct socka struct qrtr_server *srv; struct qrtr_node *node; void __rcu **slot; - int ret; + int ret = 0;
node = node_get(qrtr_ns.local_node); if (!node) return 0;
+ rcu_read_lock(); /* Announce the list of servers registered in this node */ radix_tree_for_each_slot(slot, &node->servers, &iter, 0) { srv = radix_tree_deref_slot(slot); @@ -206,11 +207,14 @@ static int announce_servers(struct socka ret = service_announce_new(sq, srv); if (ret < 0) { pr_err("failed to announce new service\n"); - return ret; + goto err_out; } }
- return 0; +err_out: + rcu_read_unlock(); + + return ret; }
static struct qrtr_server *server_add(unsigned int service, @@ -335,7 +339,7 @@ static int ctrl_cmd_bye(struct sockaddr_ struct qrtr_node *node; void __rcu **slot; struct kvec iv; - int ret; + int ret = 0;
iv.iov_base = &pkt; iv.iov_len = sizeof(pkt); @@ -344,11 +348,13 @@ static int ctrl_cmd_bye(struct sockaddr_ if (!node) return 0;
+ rcu_read_lock(); /* Advertise removal of this client to all servers of remote node */ radix_tree_for_each_slot(slot, &node->servers, &iter, 0) { srv = radix_tree_deref_slot(slot); server_del(node, srv->port); } + rcu_read_unlock();
/* Advertise the removal of this client to all local servers */ local_node = node_get(qrtr_ns.local_node); @@ -359,6 +365,7 @@ static int ctrl_cmd_bye(struct sockaddr_ pkt.cmd = cpu_to_le32(QRTR_TYPE_BYE); pkt.client.node = cpu_to_le32(from->sq_node);
+ rcu_read_lock(); radix_tree_for_each_slot(slot, &local_node->servers, &iter, 0) { srv = radix_tree_deref_slot(slot);
@@ -372,11 +379,14 @@ static int ctrl_cmd_bye(struct sockaddr_ ret = kernel_sendmsg(qrtr_ns.sock, &msg, &iv, 1, sizeof(pkt)); if (ret < 0) { pr_err("failed to send bye cmd\n"); - return ret; + goto err_out; } }
- return 0; +err_out: + rcu_read_unlock(); + + return ret; }
static int ctrl_cmd_del_client(struct sockaddr_qrtr *from, @@ -394,7 +404,7 @@ static int ctrl_cmd_del_client(struct so struct list_head *li; void __rcu **slot; struct kvec iv; - int ret; + int ret = 0;
iv.iov_base = &pkt; iv.iov_len = sizeof(pkt); @@ -434,6 +444,7 @@ static int ctrl_cmd_del_client(struct so pkt.client.node = cpu_to_le32(node_id); pkt.client.port = cpu_to_le32(port);
+ rcu_read_lock(); radix_tree_for_each_slot(slot, &local_node->servers, &iter, 0) { srv = radix_tree_deref_slot(slot);
@@ -447,11 +458,14 @@ static int ctrl_cmd_del_client(struct so ret = kernel_sendmsg(qrtr_ns.sock, &msg, &iv, 1, sizeof(pkt)); if (ret < 0) { pr_err("failed to send del client cmd\n"); - return ret; + goto err_out; } }
- return 0; +err_out: + rcu_read_unlock(); + + return ret; }
static int ctrl_cmd_new_server(struct sockaddr_qrtr *from, @@ -554,6 +568,7 @@ static int ctrl_cmd_new_lookup(struct so filter.service = service; filter.instance = instance;
+ rcu_read_lock(); radix_tree_for_each_slot(node_slot, &nodes, &node_iter, 0) { node = radix_tree_deref_slot(node_slot);
@@ -568,6 +583,7 @@ static int ctrl_cmd_new_lookup(struct so lookup_notify(from, srv, true); } } + rcu_read_unlock();
/* Empty notification, to indicate end of listing */ lookup_notify(from, NULL, true);
From: Cong Wang xiyou.wangcong@gmail.com
commit e49d8c22f1261c43a986a7fdbf677ac309682a07 upstream.
All TC actions call tcf_idr_insert() for new action at the end of their ->init(), so we can actually move it to a central place in tcf_action_init_1().
And once the action is inserted into the global IDR, other parallel process could free it immediately as its refcnt is still 1, so we can not fail after this, we need to move it after the goto action validation to avoid handling the failure case after insertion.
This is found during code review, is not directly triggered by syzbot. And this prepares for the next patch.
Cc: Vlad Buslov vladbu@mellanox.com Cc: Jamal Hadi Salim jhs@mojatatu.com Cc: Jiri Pirko jiri@resnulli.us Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/net/act_api.h | 2 -- net/sched/act_api.c | 38 ++++++++++++++++++++------------------ net/sched/act_bpf.c | 4 +--- net/sched/act_connmark.c | 1 - net/sched/act_csum.c | 3 --- net/sched/act_ct.c | 2 -- net/sched/act_ctinfo.c | 3 --- net/sched/act_gact.c | 2 -- net/sched/act_gate.c | 3 --- net/sched/act_ife.c | 3 --- net/sched/act_ipt.c | 2 -- net/sched/act_mirred.c | 2 -- net/sched/act_mpls.c | 2 -- net/sched/act_nat.c | 3 --- net/sched/act_pedit.c | 2 -- net/sched/act_police.c | 2 -- net/sched/act_sample.c | 2 -- net/sched/act_simple.c | 2 -- net/sched/act_skbedit.c | 2 -- net/sched/act_skbmod.c | 2 -- net/sched/act_tunnel_key.c | 3 --- net/sched/act_vlan.c | 2 -- 22 files changed, 21 insertions(+), 66 deletions(-)
--- a/include/net/act_api.h +++ b/include/net/act_api.h @@ -166,8 +166,6 @@ int tcf_idr_create_from_flags(struct tc_ struct nlattr *est, struct tc_action **a, const struct tc_action_ops *ops, int bind, u32 flags); -void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a); - void tcf_idr_cleanup(struct tc_action_net *tn, u32 index); int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index, struct tc_action **a, int bind); --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -467,17 +467,6 @@ int tcf_idr_create_from_flags(struct tc_ } EXPORT_SYMBOL(tcf_idr_create_from_flags);
-void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a) -{ - struct tcf_idrinfo *idrinfo = tn->idrinfo; - - mutex_lock(&idrinfo->lock); - /* Replace ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */ - WARN_ON(!IS_ERR(idr_replace(&idrinfo->action_idr, a, a->tcfa_index))); - mutex_unlock(&idrinfo->lock); -} -EXPORT_SYMBOL(tcf_idr_insert); - /* Cleanup idr index that was allocated but not initialized. */
void tcf_idr_cleanup(struct tc_action_net *tn, u32 index) @@ -902,6 +891,16 @@ static const struct nla_policy tcf_actio [TCA_ACT_HW_STATS] = NLA_POLICY_BITFIELD32(TCA_ACT_HW_STATS_ANY), };
+static void tcf_idr_insert(struct tc_action *a) +{ + struct tcf_idrinfo *idrinfo = a->idrinfo; + + mutex_lock(&idrinfo->lock); + /* Replace ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */ + WARN_ON(!IS_ERR(idr_replace(&idrinfo->action_idr, a, a->tcfa_index))); + mutex_unlock(&idrinfo->lock); +} + struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, struct nlattr *nla, struct nlattr *est, char *name, int ovr, int bind, @@ -989,6 +988,16 @@ struct tc_action *tcf_action_init_1(stru if (err < 0) goto err_mod;
+ if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN) && + !rcu_access_pointer(a->goto_chain)) { + tcf_action_destroy_1(a, bind); + NL_SET_ERR_MSG(extack, "can't use goto chain with NULL chain"); + return ERR_PTR(-EINVAL); + } + + if (err == ACT_P_CREATED) + tcf_idr_insert(a); + if (!name && tb[TCA_ACT_COOKIE]) tcf_set_action_cookie(&a->act_cookie, cookie);
@@ -1002,13 +1011,6 @@ struct tc_action *tcf_action_init_1(stru if (err != ACT_P_CREATED) module_put(a_o->owner);
- if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN) && - !rcu_access_pointer(a->goto_chain)) { - tcf_action_destroy_1(a, bind); - NL_SET_ERR_MSG(extack, "can't use goto chain with NULL chain"); - return ERR_PTR(-EINVAL); - } - return a;
err_mod: --- a/net/sched/act_bpf.c +++ b/net/sched/act_bpf.c @@ -365,9 +365,7 @@ static int tcf_bpf_init(struct net *net, if (goto_ch) tcf_chain_put_by_act(goto_ch);
- if (res == ACT_P_CREATED) { - tcf_idr_insert(tn, *act); - } else { + if (res != ACT_P_CREATED) { /* make sure the program being replaced is no longer executing */ synchronize_rcu(); tcf_bpf_cfg_cleanup(&old); --- a/net/sched/act_connmark.c +++ b/net/sched/act_connmark.c @@ -139,7 +139,6 @@ static int tcf_connmark_init(struct net ci->net = net; ci->zone = parm->zone;
- tcf_idr_insert(tn, *a); ret = ACT_P_CREATED; } else if (ret > 0) { ci = to_connmark(*a); --- a/net/sched/act_csum.c +++ b/net/sched/act_csum.c @@ -110,9 +110,6 @@ static int tcf_csum_init(struct net *net if (params_new) kfree_rcu(params_new, rcu);
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); - return ret; put_chain: if (goto_ch) --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -1293,8 +1293,6 @@ static int tcf_ct_init(struct net *net, tcf_chain_put_by_act(goto_ch); if (params) call_rcu(¶ms->rcu, tcf_ct_params_free); - if (res == ACT_P_CREATED) - tcf_idr_insert(tn, *a);
return res;
--- a/net/sched/act_ctinfo.c +++ b/net/sched/act_ctinfo.c @@ -269,9 +269,6 @@ static int tcf_ctinfo_init(struct net *n if (cp_new) kfree_rcu(cp_new, rcu);
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); - return ret;
put_chain: --- a/net/sched/act_gact.c +++ b/net/sched/act_gact.c @@ -140,8 +140,6 @@ static int tcf_gact_init(struct net *net if (goto_ch) tcf_chain_put_by_act(goto_ch);
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); return ret; release_idr: tcf_idr_release(*a, bind); --- a/net/sched/act_gate.c +++ b/net/sched/act_gate.c @@ -437,9 +437,6 @@ static int tcf_gate_init(struct net *net if (goto_ch) tcf_chain_put_by_act(goto_ch);
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); - return ret;
chain_put: --- a/net/sched/act_ife.c +++ b/net/sched/act_ife.c @@ -627,9 +627,6 @@ static int tcf_ife_init(struct net *net, if (p) kfree_rcu(p, rcu);
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); - return ret; metadata_parse_err: if (goto_ch) --- a/net/sched/act_ipt.c +++ b/net/sched/act_ipt.c @@ -189,8 +189,6 @@ static int __tcf_ipt_init(struct net *ne ipt->tcfi_t = t; ipt->tcfi_hook = hook; spin_unlock_bh(&ipt->tcf_lock); - if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); return ret;
err3: --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -194,8 +194,6 @@ static int tcf_mirred_init(struct net *n spin_lock(&mirred_list_lock); list_add(&m->tcfm_list, &mirred_list); spin_unlock(&mirred_list_lock); - - tcf_idr_insert(tn, *a); }
return ret; --- a/net/sched/act_mpls.c +++ b/net/sched/act_mpls.c @@ -273,8 +273,6 @@ static int tcf_mpls_init(struct net *net if (p) kfree_rcu(p, rcu);
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); return ret; put_chain: if (goto_ch) --- a/net/sched/act_nat.c +++ b/net/sched/act_nat.c @@ -93,9 +93,6 @@ static int tcf_nat_init(struct net *net, if (goto_ch) tcf_chain_put_by_act(goto_ch);
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); - return ret; release_idr: tcf_idr_release(*a, bind); --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -238,8 +238,6 @@ static int tcf_pedit_init(struct net *ne spin_unlock_bh(&p->tcf_lock); if (goto_ch) tcf_chain_put_by_act(goto_ch); - if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); return ret;
put_chain: --- a/net/sched/act_police.c +++ b/net/sched/act_police.c @@ -201,8 +201,6 @@ static int tcf_police_init(struct net *n if (new) kfree_rcu(new, rcu);
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); return ret;
failure: --- a/net/sched/act_sample.c +++ b/net/sched/act_sample.c @@ -116,8 +116,6 @@ static int tcf_sample_init(struct net *n if (goto_ch) tcf_chain_put_by_act(goto_ch);
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); return ret; put_chain: if (goto_ch) --- a/net/sched/act_simple.c +++ b/net/sched/act_simple.c @@ -157,8 +157,6 @@ static int tcf_simp_init(struct net *net goto release_idr; }
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); return ret; put_chain: if (goto_ch) --- a/net/sched/act_skbedit.c +++ b/net/sched/act_skbedit.c @@ -224,8 +224,6 @@ static int tcf_skbedit_init(struct net * if (goto_ch) tcf_chain_put_by_act(goto_ch);
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); return ret; put_chain: if (goto_ch) --- a/net/sched/act_skbmod.c +++ b/net/sched/act_skbmod.c @@ -190,8 +190,6 @@ static int tcf_skbmod_init(struct net *n if (goto_ch) tcf_chain_put_by_act(goto_ch);
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); return ret; put_chain: if (goto_ch) --- a/net/sched/act_tunnel_key.c +++ b/net/sched/act_tunnel_key.c @@ -536,9 +536,6 @@ static int tunnel_key_init(struct net *n if (goto_ch) tcf_chain_put_by_act(goto_ch);
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); - return ret;
put_chain: --- a/net/sched/act_vlan.c +++ b/net/sched/act_vlan.c @@ -229,8 +229,6 @@ static int tcf_vlan_init(struct net *net if (p) kfree_rcu(p, rcu);
- if (ret == ACT_P_CREATED) - tcf_idr_insert(tn, *a); return ret; put_chain: if (goto_ch)
From: Cong Wang xiyou.wangcong@gmail.com
commit 0fedc63fadf0404a729e73a35349481c8009c02f upstream.
syzbot is able to trigger a failure case inside the loop in tcf_action_init(), and when this happens we clean up with tcf_action_destroy(). But, as these actions are already inserted into the global IDR, other parallel process could free them before tcf_action_destroy(), then we will trigger a use-after-free.
Fix this by deferring the insertions even later, after the loop, and committing all the insertions in a separate loop, so we will never fail in the middle of the insertions any more.
One side effect is that the window between alloction and final insertion becomes larger, now it is more likely that the loop in tcf_del_walker() sees the placeholder -EBUSY pointer. So we have to check for error pointer in tcf_del_walker().
Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Cc: Vlad Buslov vladbu@mellanox.com Cc: Jamal Hadi Salim jhs@mojatatu.com Cc: Jiri Pirko jiri@resnulli.us Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/sched/act_api.c | 32 +++++++++++++++++++++++--------- 1 file changed, 23 insertions(+), 9 deletions(-)
--- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -307,6 +307,8 @@ static int tcf_del_walker(struct tcf_idr
mutex_lock(&idrinfo->lock); idr_for_each_entry_ul(idr, p, tmp, id) { + if (IS_ERR(p)) + continue; ret = tcf_idr_release_unsafe(p); if (ret == ACT_P_DELETED) { module_put(ops->owner); @@ -891,14 +893,24 @@ static const struct nla_policy tcf_actio [TCA_ACT_HW_STATS] = NLA_POLICY_BITFIELD32(TCA_ACT_HW_STATS_ANY), };
-static void tcf_idr_insert(struct tc_action *a) +static void tcf_idr_insert_many(struct tc_action *actions[]) { - struct tcf_idrinfo *idrinfo = a->idrinfo; + int i;
- mutex_lock(&idrinfo->lock); - /* Replace ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */ - WARN_ON(!IS_ERR(idr_replace(&idrinfo->action_idr, a, a->tcfa_index))); - mutex_unlock(&idrinfo->lock); + for (i = 0; i < TCA_ACT_MAX_PRIO; i++) { + struct tc_action *a = actions[i]; + struct tcf_idrinfo *idrinfo; + + if (!a) + continue; + idrinfo = a->idrinfo; + mutex_lock(&idrinfo->lock); + /* Replace ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc if + * it is just created, otherwise this is just a nop. + */ + idr_replace(&idrinfo->action_idr, a, a->tcfa_index); + mutex_unlock(&idrinfo->lock); + } }
struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, @@ -995,9 +1007,6 @@ struct tc_action *tcf_action_init_1(stru return ERR_PTR(-EINVAL); }
- if (err == ACT_P_CREATED) - tcf_idr_insert(a); - if (!name && tb[TCA_ACT_COOKIE]) tcf_set_action_cookie(&a->act_cookie, cookie);
@@ -1053,6 +1062,11 @@ int tcf_action_init(struct net *net, str actions[i - 1] = act; }
+ /* We have to commit them all together, because if any error happened in + * between, we could not handle the failure gracefully. + */ + tcf_idr_insert_many(actions); + *attr_size = tcf_action_full_attrs_size(sz); return i - 1;
* On Mon, 2020-10-12 at 15:30 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.8.15 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Oct 2020 13:31:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux- stable-rc.git linux-5.8.y and the diffstat can be found below.
thanks,
greg k-h
hello,
Compiled and booted 5.8.15-rc1+ . No typical dmesg regression. I also have something to mention here. I saw a warning related in several kernels which looks like the following...
"MDS CPU bug present and SMT on, data leak possible"
But now in 5.8.15-rc1+ , that warning disappeared.
Tested-by: Jeffrin Jose T jeffrin@rajagiritech.edu.in
On Mon, Oct 12, 2020 at 11:00:07PM +0530, Jeffrin Jose T wrote:
- On Mon, 2020-10-12 at 15:30 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.8.15 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Oct 2020 13:31:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux- stable-rc.git linux-5.8.y and the diffstat can be found below.
thanks,
greg k-h
hello,
Compiled and booted 5.8.15-rc1+ . No typical dmesg regression. I also have something to mention here. I saw a warning related in several kernels which looks like the following...
"MDS CPU bug present and SMT on, data leak possible"
But now in 5.8.15-rc1+ , that warning disappeared.
Odds are your microcode/bios finally got updated on that machine, right?
thanks for testing and letting me know.
greg k-h
On Wed, 2020-10-14 at 11:56 +0200, Greg Kroah-Hartman wrote:
On Mon, Oct 12, 2020 at 11:00:07PM +0530, Jeffrin Jose T wrote:
- On Mon, 2020-10-12 at 15:30 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.8.15 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Oct 2020 13:31:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux- stable-rc.git linux-5.8.y and the diffstat can be found below.
thanks,
greg k-h
hello,
Compiled and booted 5.8.15-rc1+ . No typical dmesg regression. I also have something to mention here. I saw a warning related in several kernels which looks like the following...
"MDS CPU bug present and SMT on, data leak possible"
But now in 5.8.15-rc1+ , that warning disappeared.
Odds are your microcode/bios finally got updated on that machine, right?
i do not think thot the bios got updoted, becouse the bug is still shown to be present in 5.8.13-rc1+ . so moy be the updoted kernel is fixing the bug or gives o workround.
On Mon, 12 Oct 2020 15:30:04 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.8.15 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Oct 2020 13:31:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.8.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.8: 15 builds: 15 pass, 0 fail 26 boots: 26 pass, 0 fail 60 tests: 60 pass, 0 fail
Linux version: 5.8.15-rc1-gf4ed6fb8f168 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On Mon, Oct 12, 2020 at 06:27:49PM +0000, Jon Hunter wrote:
On Mon, 12 Oct 2020 15:30:04 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.8.15 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Oct 2020 13:31:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.8.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.8: 15 builds: 15 pass, 0 fail 26 boots: 26 pass, 0 fail 60 tests: 60 pass, 0 fail
Linux version: 5.8.15-rc1-gf4ed6fb8f168 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Thanks for testing all of these and letting me know.
greg k-h
On Mon, 12 Oct 2020 at 19:15, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.8.15 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Oct 2020 13:31:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.8.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
Summary ------------------------------------------------------------------------
kernel: 5.8.15-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-5.8.y git commit: f4ed6fb8f1680de812daf362f28342e6bf19fdcc git describe: v5.8.14-125-gf4ed6fb8f168 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.8.y/build/v5.8.14...
No regressions (compared to build v5.8.14-5-g0b030df1725b)
No fixes (compared to build v5.8.14-5-g0b030df1725b)
Ran 37577 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - hi6220-hikey - i386 - juno-r2 - juno-r2-compat - juno-r2-kasan - nxp-ls2088 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - x86 - x86-kasan
Test Suites ----------- * build * install-android-platform-tools-r2600 * kselftest * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-securebits-tests * ltp-syscalls-tests * perf * v4l2-compliance * libhugetlbfs * ltp-sched-tests * network-basic-tests * ltp-open-posix-tests * ltp-tracing-tests * ltp-quickhit-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none * ssuite
On Tue, Oct 13, 2020 at 11:14:36AM +0530, Naresh Kamboju wrote:
On Mon, 12 Oct 2020 at 19:15, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.8.15 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Oct 2020 13:31:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.8.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
Thanks for testing them all.
greg k-h
On Mon, Oct 12, 2020 at 03:30:04PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.8.15 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Oct 2020 13:31:22 +0000. Anything received after that time might be too late.
Build results: total: 154 pass: 154 fail: 0 Qemu test results: total: 430 pass: 430 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Tue, Oct 13, 2020 at 09:41:31AM -0700, Guenter Roeck wrote:
On Mon, Oct 12, 2020 at 03:30:04PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.8.15 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Oct 2020 13:31:22 +0000. Anything received after that time might be too late.
Build results: total: 154 pass: 154 fail: 0 Qemu test results: total: 430 pass: 430 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Thanks for testing all of these and letting me know.
greg k-h
On 10/12/20 7:30 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.8.15 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Oct 2020 13:31:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.8.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Tue, Oct 13, 2020 at 07:21:50PM -0600, Shuah Khan wrote:
On 10/12/20 7:30 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.8.15 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 14 Oct 2020 13:31:22 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.8.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
Thanks for testing all of them and letting me know.
greg k-h
linux-stable-mirror@lists.linaro.org