This is the start of the stable review cycle for the 6.6.17 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 15 Feb 2024 17:18:29 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.6.17-rc1
Jens Axboe axboe@kernel.dk io_uring/net: limit inline multishot retries
Jens Axboe axboe@kernel.dk io_uring/poll: add requeue return code from poll multishot handling
Jens Axboe axboe@kernel.dk io_uring/net: un-indent mshot retry path in io_recv_finish()
Jens Axboe axboe@kernel.dk io_uring/poll: move poll execution helpers higher up
Jens Axboe axboe@kernel.dk io_uring/net: fix sr->len for IORING_OP_RECV with MSG_WAITALL and buffers
Aurelien Jarno aurelien@aurel32.net media: solo6x10: replace max(a, min(b, c)) by clamp(b, a, c)
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "ASoC: amd: Add new dmi entries for acp5x platform"
Hans de Goede hdegoede@redhat.com Input: atkbd - skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID
Werner Sembach wse@tuxedocomputers.com Input: i8042 - fix strange behavior of touchpad on Clevo NS70PU
Frederic Weisbecker frederic@kernel.org hrtimer: Report offline hrtimer enqueue
Heikki Krogerus heikki.krogerus@linux.intel.com usb: dwc3: pci: add support for the Intel Arrow Lake-H
Michal Pecio michal.pecio@gmail.com xhci: handle isoc Babble and Buffer Overrun events properly
Mathias Nyman mathias.nyman@linux.intel.com xhci: process isoc TD properly when there was a transaction error mid TD.
Prashanth K quic_prashk@quicinc.com usb: host: xhci-plat: Add support for XHCI_SG_TRB_CACHE_SIZE_QUIRK
Prashanth K quic_prashk@quicinc.com usb: dwc3: host: Set XHCI_SG_TRB_CACHE_SIZE_QUIRK
Qiuxu Zhuo qiuxu.zhuo@intel.com x86/lib: Revert to _ASM_EXTABLE_UA() for {get,put}_user() fixups
Badhri Jagan Sridharan badhri@google.com Revert "usb: typec: tcpm: fix cc role at port reset"
Leonard Dallmayr leonard.dallmayr@mailbox.org USB: serial: cp210x: add ID for IMST iM871A-USB
Puliang Lu puliang.lu@fibocom.com USB: serial: option: add Fibocom FM101-GL variant
JackBB Wu wojackbb@gmail.com USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e
Sean Young sean@mess.org ALSA: usb-audio: add quirk for RODE NT-USB+
Julian Sikorski belegdol+github@gmail.com ALSA: usb-audio: Add a quirk for Yamaha YIT-W12TX transmitter
Alexander Tsoy alexander@tsoy.me ALSA: usb-audio: Add delay quirk for MOTU M Series 2nd revision
Tejun Heo tj@kernel.org blk-iocost: Fix an UBSAN shift-out-of-bounds warning
Ben Dooks ben.dooks@codethink.co.uk riscv: declare overflow_stack as exported from traps.c
Alexandre Ghiti alexghiti@rivosinc.com riscv: Fix arch_hugetlb_migration_supported() for NAPOT
Xiubo Li xiubli@redhat.com libceph: just wait for more data to be available on the socket
Xiubo Li xiubli@redhat.com libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*()
Alexandre Ghiti alexghiti@rivosinc.com riscv: Flush the tlb when a page directory is freed
Ming Lei ming.lei@redhat.com scsi: core: Move scsi_host_busy() out of host lock if it is for per-command
Alexandre Ghiti alexghiti@rivosinc.com riscv: Fix hugetlb_mask_last_page() when NAPOT is enabled
Alexandre Ghiti alexghiti@rivosinc.com riscv: Fix set_huge_pte_at() for NAPOT mapping
Vincent Chen vincent.chen@sifive.com riscv: mm: execute local TLB flush after populating vmemmap
Alexandre Ghiti alexghiti@rivosinc.com mm: Introduce flush_cache_vmap_early()
Alexandre Ghiti alexghiti@rivosinc.com riscv: Improve flush_tlb_kernel_range()
Alexandre Ghiti alexghiti@rivosinc.com riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
Alexandre Ghiti alexghiti@rivosinc.com riscv: Improve tlb_flush()
Dan Carpenter dan.carpenter@linaro.org fs/ntfs3: Fix an NULL dereference bug
Florian Westphal fw@strlen.de netfilter: nft_set_pipapo: remove scratch_aligned pointer
Florian Westphal fw@strlen.de netfilter: nft_set_pipapo: add helper to release pcpu scratch area
Florian Westphal fw@strlen.de netfilter: nft_set_pipapo: store index in scratch maps
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_ct: reject direction for ct id
Srinivasan Shanmugam srinivasan.shanmugam@amd.com drm/amd/display: Implement bounds check for stream encoder creation in DCN301
Srinivasan Shanmugam srinivasan.shanmugam@amd.com drm/amd/display: Add NULL test for 'timing generator' in 'dcn21_set_pipe()'
Srinivasan Shanmugam srinivasan.shanmugam@amd.com drm/amd/display: Fix 'panel_cntl' could be null in 'dcn21_set_backlight_level()'
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_compat: restrict match/target protocol to u16
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_compat: reject unused compat flag
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_compat: narrow down revision to unsigned 8-bits
Jakub Kicinski kuba@kernel.org selftests: cmsg_ipv6: repeat the exact packet
Eric Dumazet edumazet@google.com ppp_async: limit MRU to 64K
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.
Shigeru Yoshida syoshida@redhat.com tipc: Check the bearer type before calling tipc_udp_nl_bearer_add()
Paolo Abeni pabeni@redhat.com selftests: net: let big_tcp test cope with slow env
David Howells dhowells@redhat.com rxrpc: Fix counting of new acks and nacks
David Howells dhowells@redhat.com rxrpc: Fix response to PING RESPONSE ACKs to a dead call
David Howells dhowells@redhat.com rxrpc: Fix delayed ACKs to not set the reference serial number
David Howells dhowells@redhat.com rxrpc: Fix generation of serial numbers to skip zero
Dan Carpenter dan.carpenter@linaro.org drm/i915/gvt: Fix uninitialized variable in handle_mmio()
Eric Dumazet edumazet@google.com inet: read sk->sk_family once in inet_recv_error()
Zhang Rui rui.zhang@intel.com hwmon: (coretemp) Fix bogus core_id to attr name mapping
Zhang Rui rui.zhang@intel.com hwmon: (coretemp) Fix out-of-bounds memory access
Loic Prylli lprylli@netflix.com hwmon: (aspeed-pwm-tacho) mutex for tach reading
Zhipeng Lu alexious@zju.edu.cn octeontx2-pf: Fix a memleak otx2_sq_init
Zhipeng Lu alexious@zju.edu.cn atm: idt77252: fix a memleak in open_card_ubr0
Antoine Tenart atenart@kernel.org tunnels: fix out of bounds access when building IPv6 PMTU error
Gerhard Engleder gerhard@engleder-embedded.com tsnep: Fix mapping for zero copy XDP_TX action
Paolo Abeni pabeni@redhat.com selftests: net: avoid just another constant wait
Paolo Abeni pabeni@redhat.com selftests: net: fix tcp listener handling in pmtu.sh
Yujie Liu yujie.liu@intel.com selftests/net: change shebang to bash to support "source"
Hangbin Liu liuhangbin@gmail.com selftests/net: convert pmtu.sh to run it in unique namespace
Hangbin Liu liuhangbin@gmail.com selftests/net: convert unicast_extensions.sh to run it in unique namespace
Paolo Abeni pabeni@redhat.com selftests: net: cut more slack for gro fwd tests.
Ivan Vecera ivecera@redhat.com net: atlantic: Fix DMA mapping for PTP hwts ring
Eric Dumazet edumazet@google.com netdevsim: avoid potential loop in nsim_dev_trap_report_work()
Kees Cook keescook@chromium.org wifi: brcmfmac: Adjust n_channels usage for __counted_by
Miri Korenblit miriam.rachel.korenblit@intel.com wifi: iwlwifi: exit eSR only after the FW does
Johannes Berg johannes.berg@intel.com wifi: mac80211: fix waiting for beacons logic
Johannes Berg johannes.berg@intel.com wifi: mac80211: fix RCU use in TDLS fast-xmit
Furong Xu 0x1207@gmail.com net: stmmac: xgmac: fix handling of DPP safety error for DMA channels
Ard Biesheuvel ardb@kernel.org x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR
Ard Biesheuvel ardb@kernel.org x86/efistub: Give up if memory attribute protocol returns an error
Abhinav Kumar quic_abhinavk@quicinc.com drm/msm/dpu: check for valid hw_pp in dpu_encoder_helper_phys_cleanup
Kuogee Hsieh quic_khsieh@quicinc.com drm/msm/dp: return correct Colorimetry for DP_TEST_DYNAMIC_RANGE_CEA case
Kuogee Hsieh quic_khsieh@quicinc.com drm/msms/dp: fixed link clock divider bits be over written in BPC unknown case
Christoph Hellwig hch@lst.de xfs: respect the stable writes flag on the RT device
Christoph Hellwig hch@lst.de xfs: clean up FS_XFLAG_REALTIME handling in xfs_ioctl_setattr_xflags
Darrick J. Wong djwong@kernel.org xfs: dquot recovery does not validate the recovered dquot
Darrick J. Wong djwong@kernel.org xfs: clean up dqblk extraction
Dave Chinner dchinner@redhat.com xfs: inode recovery does not validate the recovered inode
Anthony Iliopoulos ailiop@suse.com xfs: fix again select in kconfig XFS_ONLINE_SCRUB_STATS
Omar Sandoval osandov@fb.com xfs: fix internal error from AGFL exhaustion
Leah Rumancik leah.rumancik@gmail.com xfs: up(ic_sema) if flushing data device fails
Christoph Hellwig hch@lst.de xfs: only remap the written blocks in xfs_reflink_end_cow_extent
Long Li leo.lilong@huawei.com xfs: abort intent items when recovery intents fail
Long Li leo.lilong@huawei.com xfs: factor out xfs_defer_pending_abort
Catherine Hoang catherine.hoang@oracle.com xfs: allow read IO and FICLONE to run concurrently
Christoph Hellwig hch@lst.de xfs: handle nimaps=0 from xfs_bmapi_write in xfs_alloc_file_space
Cheng Lin cheng.lin130@zte.com.cn xfs: introduce protection for drop nlink
Darrick J. Wong djwong@kernel.org xfs: make sure maxlen is still congruent with prod when rounding down
Darrick J. Wong djwong@kernel.org xfs: fix units conversion error in xfs_bmap_del_extent_delay
Darrick J. Wong djwong@kernel.org xfs: rt stubs should return negative errnos when rt disabled
Darrick J. Wong djwong@kernel.org xfs: prevent rt growfs when quota is enabled
Darrick J. Wong djwong@kernel.org xfs: hoist freeing of rt data fork extent mappings
Darrick J. Wong djwong@kernel.org xfs: bump max fsgeom struct version
Catherine Hoang catherine.hoang@oracle.com MAINTAINERS: add Catherine as xfs maintainer for 6.6.y
Miguel Ojeda ojeda@kernel.org rust: upgrade to Rust 1.73.0
Miguel Ojeda ojeda@kernel.org rust: print: use explicit link in documentation
Miguel Ojeda ojeda@kernel.org rust: task: remove redundant explicit link
Miguel Ojeda ojeda@kernel.org rust: upgrade to Rust 1.72.1
Miguel Ojeda ojeda@kernel.org rust: arc: add explicit `drop()` around `Box::from_raw()`
Shyam Prasad N sprasad@microsoft.com cifs: failure to add channel on iface should bump up weight
Shyam Prasad N sprasad@microsoft.com cifs: avoid redundant calls to disable multichannel
Tony Lindgren tony@atomide.com phy: ti: phy-omap-usb2: Fix NULL pointer dereference for SRP
Frank Li Frank.Li@nxp.com dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV
James Clark james.clark@arm.com perf evlist: Fix evlist__new_default() for > 1 core PMU
Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com phy: renesas: rcar-gen3-usb2: Fix returning wrong error code
Christophe JAILLET christophe.jaillet@wanadoo.fr dmaengine: fsl-qdma: Fix a memory leak related to the queue command DMA
Christophe JAILLET christophe.jaillet@wanadoo.fr dmaengine: fsl-qdma: Fix a memory leak related to the status queue DMA
Jai Luthra j-luthra@ti.com dmaengine: ti: k3-udma: Report short packet errors
Guanhua Gao guanhua.gao@nxp.com dmaengine: fsl-dpaa2-qdma: Fix the size of dma pools
Baokun Li libaokun1@huawei.com ext4: regenerate buddy after block freeing failed if under fc replay
-------------
Diffstat:
Documentation/process/changes.rst | 2 +- MAINTAINERS | 1 + Makefile | 4 +- arch/arc/include/asm/cacheflush.h | 1 + arch/arm/include/asm/cacheflush.h | 2 + arch/csky/abiv1/inc/abi/cacheflush.h | 1 + arch/csky/abiv2/inc/abi/cacheflush.h | 1 + arch/m68k/include/asm/cacheflush_mm.h | 1 + arch/mips/include/asm/cacheflush.h | 2 + arch/nios2/include/asm/cacheflush.h | 1 + arch/parisc/include/asm/cacheflush.h | 1 + arch/riscv/include/asm/cacheflush.h | 3 +- arch/riscv/include/asm/hugetlb.h | 3 + arch/riscv/include/asm/sbi.h | 3 - arch/riscv/include/asm/stacktrace.h | 5 + arch/riscv/include/asm/tlb.h | 8 +- arch/riscv/include/asm/tlbflush.h | 17 +- arch/riscv/kernel/sbi.c | 32 ++-- arch/riscv/mm/hugetlbpage.c | 78 +++++++- arch/riscv/mm/init.c | 4 + arch/riscv/mm/tlbflush.c | 156 +++++++++------- arch/sh/include/asm/cacheflush.h | 1 + arch/sparc/include/asm/cacheflush_32.h | 1 + arch/sparc/include/asm/cacheflush_64.h | 1 + arch/x86/lib/getuser.S | 24 +-- arch/x86/lib/putuser.S | 20 +-- arch/xtensa/include/asm/cacheflush.h | 6 +- block/blk-iocost.c | 7 + drivers/atm/idt77252.c | 2 + drivers/dma/fsl-dpaa2-qdma/dpaa2-qdma.c | 10 +- drivers/dma/fsl-qdma.c | 27 ++- drivers/dma/ti/k3-udma.c | 10 +- drivers/firmware/efi/libstub/efistub.h | 3 +- drivers/firmware/efi/libstub/kaslr.c | 2 +- drivers/firmware/efi/libstub/randomalloc.c | 12 +- drivers/firmware/efi/libstub/x86-stub.c | 25 +-- drivers/firmware/efi/libstub/x86-stub.h | 4 +- drivers/firmware/efi/libstub/zboot.c | 2 +- drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hwseq.c | 63 ++++--- .../drm/amd/display/dc/dcn301/dcn301_resource.c | 2 +- drivers/gpu/drm/i915/gvt/handlers.c | 3 +- drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 4 +- drivers/gpu/drm/msm/dp/dp_ctrl.c | 5 - drivers/gpu/drm/msm/dp/dp_link.c | 22 ++- drivers/gpu/drm/msm/dp/dp_reg.h | 3 + drivers/hwmon/aspeed-pwm-tacho.c | 7 + drivers/hwmon/coretemp.c | 40 +++-- drivers/input/keyboard/atkbd.c | 13 +- drivers/input/serio/i8042-acpipnpio.h | 6 + drivers/media/pci/solo6x10/solo6x10-offsets.h | 10 +- drivers/net/ethernet/aquantia/atlantic/aq_ptp.c | 4 +- drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 13 ++ drivers/net/ethernet/aquantia/atlantic/aq_ring.h | 1 + drivers/net/ethernet/engleder/tsnep_main.c | 16 +- .../ethernet/marvell/octeontx2/nic/otx2_common.c | 14 +- drivers/net/ethernet/stmicro/stmmac/common.h | 1 + drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h | 3 + .../net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 57 +++++- drivers/net/netdevsim/dev.c | 8 +- drivers/net/ppp/ppp_async.c | 4 + .../broadcom/brcm80211/brcmfmac/cfg80211.c | 6 +- .../net/wireless/intel/iwlwifi/mvm/mld-mac80211.c | 6 +- drivers/phy/renesas/phy-rcar-gen3-usb2.c | 4 - drivers/phy/ti/phy-omap-usb2.c | 4 +- drivers/scsi/scsi_error.c | 3 +- drivers/scsi/scsi_lib.c | 4 +- drivers/usb/dwc3/dwc3-pci.c | 4 + drivers/usb/dwc3/host.c | 4 +- drivers/usb/host/xhci-plat.c | 3 + drivers/usb/host/xhci-ring.c | 80 +++++++-- drivers/usb/host/xhci.h | 1 + drivers/usb/serial/cp210x.c | 1 + drivers/usb/serial/option.c | 1 + drivers/usb/serial/qcserial.c | 2 + drivers/usb/typec/tcpm/tcpm.c | 3 +- fs/ext4/mballoc.c | 20 +++ fs/ntfs3/ntfs_fs.h | 2 +- fs/smb/client/sess.c | 2 + fs/smb/client/smb2pdu.c | 2 +- fs/xfs/Kconfig | 2 +- fs/xfs/libxfs/xfs_alloc.c | 27 ++- fs/xfs/libxfs/xfs_bmap.c | 21 +-- fs/xfs/libxfs/xfs_defer.c | 38 ++-- fs/xfs/libxfs/xfs_defer.h | 2 +- fs/xfs/libxfs/xfs_inode_buf.c | 3 + fs/xfs/libxfs/xfs_rtbitmap.c | 33 ++++ fs/xfs/libxfs/xfs_sb.h | 2 +- fs/xfs/xfs_bmap_util.c | 24 +-- fs/xfs/xfs_dquot.c | 5 +- fs/xfs/xfs_dquot_item_recover.c | 21 ++- fs/xfs/xfs_file.c | 63 +++++-- fs/xfs/xfs_inode.c | 24 +++ fs/xfs/xfs_inode.h | 17 ++ fs/xfs/xfs_inode_item_recover.c | 14 +- fs/xfs/xfs_ioctl.c | 34 ++-- fs/xfs/xfs_iops.c | 7 + fs/xfs/xfs_log.c | 23 +-- fs/xfs/xfs_log_recover.c | 2 +- fs/xfs/xfs_reflink.c | 5 + fs/xfs/xfs_rtalloc.c | 33 +++- fs/xfs/xfs_rtalloc.h | 27 +-- include/asm-generic/cacheflush.h | 6 + include/linux/ceph/messenger.h | 2 +- include/linux/dmaengine.h | 3 +- include/linux/hrtimer.h | 4 +- include/trace/events/rxrpc.h | 8 +- include/uapi/linux/netfilter/nf_tables.h | 2 + io_uring/io_uring.h | 7 + io_uring/net.c | 54 ++++-- io_uring/poll.c | 39 ++-- kernel/time/hrtimer.c | 3 + mm/percpu.c | 8 +- net/ceph/messenger_v1.c | 33 ++-- net/ceph/messenger_v2.c | 4 +- net/ceph/osd_client.c | 9 +- net/ipv4/af_inet.c | 6 +- net/ipv4/ip_tunnel_core.c | 2 +- net/mac80211/mlme.c | 3 +- net/mac80211/tx.c | 7 +- net/netfilter/nft_compat.c | 17 +- net/netfilter/nft_ct.c | 3 + net/netfilter/nft_set_pipapo.c | 108 +++++------ net/netfilter/nft_set_pipapo.h | 18 +- net/netfilter/nft_set_pipapo_avx2.c | 17 +- net/rxrpc/ar-internal.h | 37 +++- net/rxrpc/call_event.c | 12 +- net/rxrpc/call_object.c | 1 + net/rxrpc/conn_event.c | 10 +- net/rxrpc/input.c | 115 ++++++++++-- net/rxrpc/output.c | 8 +- net/rxrpc/proc.c | 2 +- net/rxrpc/rxkad.c | 4 +- net/tipc/bearer.c | 6 + net/unix/garbage.c | 11 ++ rust/alloc/alloc.rs | 21 --- rust/alloc/boxed.rs | 56 ++++-- rust/alloc/lib.rs | 13 +- rust/alloc/raw_vec.rs | 30 ++-- rust/alloc/vec/drain_filter.rs | 199 --------------------- rust/alloc/vec/extract_if.rs | 115 ++++++++++++ rust/alloc/vec/mod.rs | 110 ++++++------ rust/alloc/vec/spec_extend.rs | 8 +- rust/compiler_builtins.rs | 1 + rust/kernel/print.rs | 1 + rust/kernel/sync/arc.rs | 2 +- rust/kernel/task.rs | 2 +- scripts/min-tool-version.sh | 2 +- sound/soc/amd/acp-config.c | 15 +- sound/usb/quirks.c | 6 + tools/perf/util/evlist.c | 9 +- tools/testing/selftests/net/big_tcp.sh | 4 +- tools/testing/selftests/net/cmsg_ipv6.sh | 4 +- tools/testing/selftests/net/pmtu.sh | 52 +++--- tools/testing/selftests/net/udpgro_fwd.sh | 14 +- tools/testing/selftests/net/udpgso_bench_rx.c | 2 +- tools/testing/selftests/net/unicast_extensions.sh | 93 +++++----- 156 files changed, 1686 insertions(+), 1003 deletions(-)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit c9b528c35795b711331ed36dc3dbee90d5812d4e ]
This mostly reverts commit 6bd97bf273bd ("ext4: remove redundant mb_regenerate_buddy()") and reintroduces mb_regenerate_buddy(). Based on code in mb_free_blocks(), fast commit replay can end up marking as free blocks that are already marked as such. This causes corruption of the buddy bitmap so we need to regenerate it in that case.
Reported-by: Jan Kara jack@suse.cz Fixes: 6bd97bf273bd ("ext4: remove redundant mb_regenerate_buddy()") Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240104142040.2835097-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/mballoc.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 1fd4ed19060d..529ca47da035 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -1232,6 +1232,24 @@ void ext4_mb_generate_buddy(struct super_block *sb, atomic64_add(period, &sbi->s_mb_generation_time); }
+static void mb_regenerate_buddy(struct ext4_buddy *e4b) +{ + int count; + int order = 1; + void *buddy; + + while ((buddy = mb_find_buddy(e4b, order++, &count))) + mb_set_bits(buddy, 0, count); + + e4b->bd_info->bb_fragments = 0; + memset(e4b->bd_info->bb_counters, 0, + sizeof(*e4b->bd_info->bb_counters) * + (e4b->bd_sb->s_blocksize_bits + 2)); + + ext4_mb_generate_buddy(e4b->bd_sb, e4b->bd_buddy, + e4b->bd_bitmap, e4b->bd_group, e4b->bd_info); +} + /* The buddy information is attached the buddy cache inode * for convenience. The information regarding each group * is loaded via ext4_mb_load_buddy. The information involve @@ -1920,6 +1938,8 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b, ext4_mark_group_bitmap_corrupted( sb, e4b->bd_group, EXT4_GROUP_INFO_BBITMAP_CORRUPT); + } else { + mb_regenerate_buddy(e4b); } goto done; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guanhua Gao guanhua.gao@nxp.com
[ Upstream commit b73e43dcd7a8be26880ef8ff336053b29e79dbc5 ]
In case of long format of qDMA command descriptor, there are one frame descriptor, three entries in the frame list and two data entries. So the size of dma_pool_create for these three fields should be the same with the total size of entries respectively, or the contents may be overwritten by the next allocated descriptor.
Fixes: 7fdf9b05c73b ("dmaengine: fsl-dpaa2-qdma: Add NXP dpaa2 qDMA controller driver for Layerscape SoCs") Signed-off-by: Guanhua Gao guanhua.gao@nxp.com Signed-off-by: Frank Li Frank.Li@nxp.com Link: https://lore.kernel.org/r/20240118162917.2951450-1-Frank.Li@nxp.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/fsl-dpaa2-qdma/dpaa2-qdma.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/dma/fsl-dpaa2-qdma/dpaa2-qdma.c b/drivers/dma/fsl-dpaa2-qdma/dpaa2-qdma.c index a42a37634881..da91bc9a8e6f 100644 --- a/drivers/dma/fsl-dpaa2-qdma/dpaa2-qdma.c +++ b/drivers/dma/fsl-dpaa2-qdma/dpaa2-qdma.c @@ -38,15 +38,17 @@ static int dpaa2_qdma_alloc_chan_resources(struct dma_chan *chan) if (!dpaa2_chan->fd_pool) goto err;
- dpaa2_chan->fl_pool = dma_pool_create("fl_pool", dev, - sizeof(struct dpaa2_fl_entry), - sizeof(struct dpaa2_fl_entry), 0); + dpaa2_chan->fl_pool = + dma_pool_create("fl_pool", dev, + sizeof(struct dpaa2_fl_entry) * 3, + sizeof(struct dpaa2_fl_entry), 0); + if (!dpaa2_chan->fl_pool) goto err_fd;
dpaa2_chan->sdd_pool = dma_pool_create("sdd_pool", dev, - sizeof(struct dpaa2_qdma_sd_d), + sizeof(struct dpaa2_qdma_sd_d) * 2, sizeof(struct dpaa2_qdma_sd_d), 0); if (!dpaa2_chan->sdd_pool) goto err_fl;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jai Luthra j-luthra@ti.com
[ Upstream commit bc9847c9ba134cfe3398011e343dcf6588c1c902 ]
Propagate the TR response status to the device using BCDMA split-channels. For example CSI-RX driver should be able to check if a frame was not transferred completely (short packet) and needs to be discarded.
Fixes: 25dcb5dd7b7c ("dmaengine: ti: New driver for K3 UDMA") Signed-off-by: Jai Luthra j-luthra@ti.com Acked-by: Peter Ujfalusi peter.ujfalusi@gmail.com Link: https://lore.kernel.org/r/20240103-tr_resp_err-v1-1-2fdf6d48ab92@ti.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/ti/k3-udma.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c index 30fd2f386f36..037f1408e798 100644 --- a/drivers/dma/ti/k3-udma.c +++ b/drivers/dma/ti/k3-udma.c @@ -3968,6 +3968,7 @@ static void udma_desc_pre_callback(struct virt_dma_chan *vc, { struct udma_chan *uc = to_udma_chan(&vc->chan); struct udma_desc *d; + u8 status;
if (!vd) return; @@ -3977,12 +3978,12 @@ static void udma_desc_pre_callback(struct virt_dma_chan *vc, if (d->metadata_size) udma_fetch_epib(uc, d);
- /* Provide residue information for the client */ if (result) { void *desc_vaddr = udma_curr_cppi5_desc_vaddr(d, d->desc_idx);
if (cppi5_desc_get_type(desc_vaddr) == CPPI5_INFO0_DESC_TYPE_VAL_HOST) { + /* Provide residue information for the client */ result->residue = d->residue - cppi5_hdesc_get_pktlen(desc_vaddr); if (result->residue) @@ -3991,7 +3992,12 @@ static void udma_desc_pre_callback(struct virt_dma_chan *vc, result->result = DMA_TRANS_NOERROR; } else { result->residue = 0; - result->result = DMA_TRANS_NOERROR; + /* Propagate TR Response errors to the client */ + status = d->hwdesc[0].tr_resp_base->status; + if (status) + result->result = DMA_TRANS_ABORTED; + else + result->result = DMA_TRANS_NOERROR; } } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 968bc1d7203d384e72afe34124a1801b7af76514 ]
This dma_alloc_coherent() is undone in the remove function, but not in the error handling path of fsl_qdma_probe().
Switch to the managed version to fix the issue in the probe and simplify the remove function.
Fixes: b092529e0aa0 ("dmaengine: fsl-qdma: Add qDMA controller driver for Layerscape SoCs") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Link: https://lore.kernel.org/r/a0ef5d0f5a47381617ef339df776ddc68ce48173.170462151... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/fsl-qdma.c | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-)
diff --git a/drivers/dma/fsl-qdma.c b/drivers/dma/fsl-qdma.c index a8cc8a4bc610..e4606ab27745 100644 --- a/drivers/dma/fsl-qdma.c +++ b/drivers/dma/fsl-qdma.c @@ -563,11 +563,11 @@ static struct fsl_qdma_queue /* * Buffer for queue command */ - status_head->cq = dma_alloc_coherent(&pdev->dev, - sizeof(struct fsl_qdma_format) * - status_size, - &status_head->bus_addr, - GFP_KERNEL); + status_head->cq = dmam_alloc_coherent(&pdev->dev, + sizeof(struct fsl_qdma_format) * + status_size, + &status_head->bus_addr, + GFP_KERNEL); if (!status_head->cq) { devm_kfree(&pdev->dev, status_head); return NULL; @@ -1268,8 +1268,6 @@ static void fsl_qdma_cleanup_vchan(struct dma_device *dmadev)
static int fsl_qdma_remove(struct platform_device *pdev) { - int i; - struct fsl_qdma_queue *status; struct device_node *np = pdev->dev.of_node; struct fsl_qdma_engine *fsl_qdma = platform_get_drvdata(pdev);
@@ -1278,11 +1276,6 @@ static int fsl_qdma_remove(struct platform_device *pdev) of_dma_controller_free(np); dma_async_device_unregister(&fsl_qdma->dma_dev);
- for (i = 0; i < fsl_qdma->block_number; i++) { - status = fsl_qdma->status[i]; - dma_free_coherent(&pdev->dev, sizeof(struct fsl_qdma_format) * - status->n_cq, status->cq, status->bus_addr); - } return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 3aa58cb51318e329d203857f7a191678e60bb714 ]
This dma_alloc_coherent() is undone neither in the remove function, nor in the error handling path of fsl_qdma_probe().
Switch to the managed version to fix both issues.
Fixes: b092529e0aa0 ("dmaengine: fsl-qdma: Add qDMA controller driver for Layerscape SoCs") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Link: https://lore.kernel.org/r/7f66aa14f59d32b13672dde28602b47deb294e1f.170462151... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/fsl-qdma.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/dma/fsl-qdma.c b/drivers/dma/fsl-qdma.c index e4606ab27745..e4c293b76e05 100644 --- a/drivers/dma/fsl-qdma.c +++ b/drivers/dma/fsl-qdma.c @@ -514,11 +514,11 @@ static struct fsl_qdma_queue queue_temp = queue_head + i + (j * queue_num);
queue_temp->cq = - dma_alloc_coherent(&pdev->dev, - sizeof(struct fsl_qdma_format) * - queue_size[i], - &queue_temp->bus_addr, - GFP_KERNEL); + dmam_alloc_coherent(&pdev->dev, + sizeof(struct fsl_qdma_format) * + queue_size[i], + &queue_temp->bus_addr, + GFP_KERNEL); if (!queue_temp->cq) return NULL; queue_temp->block_base = fsl_qdma->block_base +
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com
[ Upstream commit 249abaf3bf0dd07f5ddebbb2fe2e8f4d675f074e ]
Even if device_create_file() returns error code, rcar_gen3_phy_usb2_probe() will return zero because the "ret" is variable shadowing.
Reported-by: kernel test robot lkp@intel.com Reported-by: Dan Carpenter error27@gmail.com Closes: https://lore.kernel.org/r/202312161021.gOLDl48K-lkp@intel.com/ Fixes: 441a681b8843 ("phy: rcar-gen3-usb2: fix implementation for runtime PM") Signed-off-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://lore.kernel.org/r/20240105093703.3359949-1-yoshihiro.shimoda.uh@rene... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/renesas/phy-rcar-gen3-usb2.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/phy/renesas/phy-rcar-gen3-usb2.c b/drivers/phy/renesas/phy-rcar-gen3-usb2.c index e53eace7c91e..6387c0d34c55 100644 --- a/drivers/phy/renesas/phy-rcar-gen3-usb2.c +++ b/drivers/phy/renesas/phy-rcar-gen3-usb2.c @@ -673,8 +673,6 @@ static int rcar_gen3_phy_usb2_probe(struct platform_device *pdev) channel->irq = platform_get_irq_optional(pdev, 0); channel->dr_mode = rcar_gen3_get_dr_mode(dev->of_node); if (channel->dr_mode != USB_DR_MODE_UNKNOWN) { - int ret; - channel->is_otg_channel = true; channel->uses_otg_pins = !of_property_read_bool(dev->of_node, "renesas,no-otg-pins"); @@ -738,8 +736,6 @@ static int rcar_gen3_phy_usb2_probe(struct platform_device *pdev) ret = PTR_ERR(provider); goto error; } else if (channel->is_otg_channel) { - int ret; - ret = device_create_file(dev, &dev_attr_role); if (ret < 0) goto error;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Clark james.clark@arm.com
[ Upstream commit 7814fe24a6211a610db0b408d87420403b5b7a36 ]
The 'Session topology' test currently fails with this message when evlist__new_default() opens more than one event:
32: Session topology : --- start --- templ file: /tmp/perf-test-vv5YzZ Using CPUID 0x00000000410fd070 Opening: unknown-hardware:HG ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) config 0xb00000000 disabled 1 ------------------------------------------------------------ sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 4 Opening: unknown-hardware:HG ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) config 0xa00000000 disabled 1 ------------------------------------------------------------ sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 5 non matching sample_type FAILED tests/topology.c:73 can't get session ---- end ---- Session topology: FAILED!
This is because when re-opening the file and parsing the header, Perf expects that any file that has more than one event has the sample ID flag set. Perf record already sets the flag in a similar way when there is more than one event, so add the same logic to evlist__new_default().
evlist__new_default() is only currently used in tests, so I don't expect this change to have any other side effects. The other tests that use it don't save and re-open the file so don't hit this issue.
The session topology test has been failing on Arm big.LITTLE platforms since commit 251aa040244a3b17 ("perf parse-events: Wildcard most "numeric" events") when evlist__new_default() started opening multiple events for 'cycles'.
Fixes: 251aa040244a3b17 ("perf parse-events: Wildcard most "numeric" events") Reviewed-by: Ian Rogers irogers@google.com Signed-off-by: James Clark james.clark@arm.com [ This was failing as well on a Rocket Lake Refresh/14700k Intel hybrid system - Arnaldo ] Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Tested-by: Ian Rogers irogers@google.com Tested-by: Kan Liang kan.liang@linux.intel.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Changbin Du changbin.du@huawei.com Cc: Ingo Molnar mingo@redhat.com Cc: Jiri Olsa jolsa@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Yang Jihong yangjihong1@huawei.com Closes: https://lore.kernel.org/lkml/CAP-5=fWVQ-7ijjK3-w1q+k2WYVNHbAcejb-xY0ptbjRw47... Link: https://lore.kernel.org/r/20240124094358.489372-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/evlist.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index c779b9f2e622..8a8fe1fa0d38 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -103,7 +103,14 @@ struct evlist *evlist__new_default(void) err = parse_event(evlist, can_profile_kernel ? "cycles:P" : "cycles:Pu"); if (err) { evlist__delete(evlist); - evlist = NULL; + return NULL; + } + + if (evlist->core.nr_entries > 1) { + struct evsel *evsel; + + evlist__for_each_entry(evlist, evsel) + evsel__set_sample_id(evsel, /*can_sample_identifier=*/false); }
return evlist;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
[ Upstream commit a22fe1d6dec7e98535b97249fdc95c2be79120bb ]
is_slave_direction() should return true when direction is DMA_DEV_TO_DEV.
Fixes: 49920bc66984 ("dmaengine: add new enum dma_transfer_direction") Signed-off-by: Frank Li Frank.Li@nxp.com Link: https://lore.kernel.org/r/20240123172842.3764529-1-Frank.Li@nxp.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/dmaengine.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index c3656e590213..cff3dba65820 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -955,7 +955,8 @@ static inline int dmaengine_slave_config(struct dma_chan *chan,
static inline bool is_slave_direction(enum dma_transfer_direction direction) { - return (direction == DMA_MEM_TO_DEV) || (direction == DMA_DEV_TO_MEM); + return (direction == DMA_MEM_TO_DEV) || (direction == DMA_DEV_TO_MEM) || + (direction == DMA_DEV_TO_DEV); }
static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_single(
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tony Lindgren tony@atomide.com
[ Upstream commit 7104ba0f1958adb250319e68a15eff89ec4fd36d ]
If the external phy working together with phy-omap-usb2 does not implement send_srp(), we may still attempt to call it. This can happen on an idle Ethernet gadget triggering a wakeup for example:
configfs-gadget.g1 gadget.0: ECM Suspend configfs-gadget.g1 gadget.0: Port suspended. Triggering wakeup ... Unable to handle kernel NULL pointer dereference at virtual address 00000000 when execute ... PC is at 0x0 LR is at musb_gadget_wakeup+0x1d4/0x254 [musb_hdrc] ... musb_gadget_wakeup [musb_hdrc] from usb_gadget_wakeup+0x1c/0x3c [udc_core] usb_gadget_wakeup [udc_core] from eth_start_xmit+0x3b0/0x3d4 [u_ether] eth_start_xmit [u_ether] from dev_hard_start_xmit+0x94/0x24c dev_hard_start_xmit from sch_direct_xmit+0x104/0x2e4 sch_direct_xmit from __dev_queue_xmit+0x334/0xd88 __dev_queue_xmit from arp_solicit+0xf0/0x268 arp_solicit from neigh_probe+0x54/0x7c neigh_probe from __neigh_event_send+0x22c/0x47c __neigh_event_send from neigh_resolve_output+0x14c/0x1c0 neigh_resolve_output from ip_finish_output2+0x1c8/0x628 ip_finish_output2 from ip_send_skb+0x40/0xd8 ip_send_skb from udp_send_skb+0x124/0x340 udp_send_skb from udp_sendmsg+0x780/0x984 udp_sendmsg from __sys_sendto+0xd8/0x158 __sys_sendto from ret_fast_syscall+0x0/0x58
Let's fix the issue by checking for send_srp() and set_vbus() before calling them. For USB peripheral only cases these both could be NULL.
Fixes: 657b306a7bdf ("usb: phy: add a new driver for omap usb2 phy") Signed-off-by: Tony Lindgren tony@atomide.com Link: https://lore.kernel.org/r/20240128120556.8848-1-tony@atomide.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/ti/phy-omap-usb2.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/phy/ti/phy-omap-usb2.c b/drivers/phy/ti/phy-omap-usb2.c index 762d3de8b3c5..6bd3c7492330 100644 --- a/drivers/phy/ti/phy-omap-usb2.c +++ b/drivers/phy/ti/phy-omap-usb2.c @@ -116,7 +116,7 @@ static int omap_usb_set_vbus(struct usb_otg *otg, bool enabled) { struct omap_usb *phy = phy_to_omapusb(otg->usb_phy);
- if (!phy->comparator) + if (!phy->comparator || !phy->comparator->set_vbus) return -ENODEV;
return phy->comparator->set_vbus(phy->comparator, enabled); @@ -126,7 +126,7 @@ static int omap_usb_start_srp(struct usb_otg *otg) { struct omap_usb *phy = phy_to_omapusb(otg->usb_phy);
- if (!phy->comparator) + if (!phy->comparator || !phy->comparator->start_srp) return -ENODEV;
return phy->comparator->start_srp(phy->comparator);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
[ Upstream commit e77e15fa5eb1c830597c5ca53ea7af973bae2f78 ]
When the server reports query network interface info call as unsupported following a tree connect, it means that multichannel is unsupported, even if the server capabilities report otherwise.
When this happens, cifs_chan_skip_or_disable is called to disable multichannel on the client. However, we only need to call this when multichannel is currently setup.
Fixes: f591062bdbf4 ("cifs: handle servers that still advertise multichannel after disabling") Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/smb2pdu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index f5006aa97f5b..5d9c87d2e1e0 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -410,7 +410,7 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, rc = SMB3_request_interfaces(xid, tcon, false); free_xid(xid);
- if (rc == -EOPNOTSUPP) { + if (rc == -EOPNOTSUPP && ses->chan_count > 1) { /* * some servers like Azure SMB server do not advertise * that multichannel has been disabled with server
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
[ Upstream commit 6aac002bcfd554aff6d3ebb55e1660d078d70ab0 ]
After the interface selection policy change to do a weighted round robin, each iface maintains a weight_fulfilled. When the weight_fulfilled reaches the total weight for the iface, we know that the weights can be reset and ifaces can be allocated from scratch again.
During channel allocation failures on a particular channel, weight_fulfilled is not incremented. If a few interfaces are inactive, we could end up in a situation where the active interfaces are all allocated for the total_weight, and inactive ones are all that remain. This can cause a situation where no more channels can be allocated further.
This change fixes it by increasing weight_fulfilled, even when channel allocation failure happens. This could mean that if there are temporary failures in channel allocation, the iface weights may not strictly be adhered to. But that's still okay.
Fixes: a6d8fb54a515 ("cifs: distribute channels across interfaces based on speed") Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/sess.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/fs/smb/client/sess.c b/fs/smb/client/sess.c index 62596299a396..a20a5d0836dc 100644 --- a/fs/smb/client/sess.c +++ b/fs/smb/client/sess.c @@ -263,6 +263,8 @@ int cifs_try_adding_channels(struct cifs_ses *ses) &iface->sockaddr, rc); kref_put(&iface->refcount, release_iface); + /* failure to add chan should increase weight */ + iface->weight_fulfilled++; continue; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miguel Ojeda ojeda@kernel.org
commit 828176d037e29f813792a8b3ac1591834240e96f upstream.
`Box::from_raw()` is `#[must_use]`, which means the result cannot go unused.
In Rust 1.71.0, this was not detected because the block expression swallows the diagnostic [1]:
unsafe { Box::from_raw(self.ptr.as_ptr()) };
It would have been detected, however, if the line had been instead:
unsafe { Box::from_raw(self.ptr.as_ptr()); }
i.e. the semicolon being inside the `unsafe` block, rather than outside.
In Rust 1.72.0, the compiler started warning about this [2], so without this patch we will get:
error: unused return value of `alloc::boxed::Box::<T>::from_raw` that must be used --> rust/kernel/sync/arc.rs:302:22 | 302 | unsafe { Box::from_raw(self.ptr.as_ptr()) }; | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | = note: call `drop(Box::from_raw(ptr))` if you intend to drop the `Box` = note: `-D unused-must-use` implied by `-D warnings` help: use `let _ = ...` to ignore the resulting value | 302 | unsafe { let _ = Box::from_raw(self.ptr.as_ptr()); }; | +++++++ +
Thus add an add an explicit `drop()` as the `#[must_use]`'s annotation suggests (instead of the more general help line).
Link: https://github.com/rust-lang/rust/issues/104253 [1] Link: https://github.com/rust-lang/rust/pull/112529 [2] Reviewed-by: Martin Rodriguez Reboredo yakoyoku@gmail.com Reviewed-by: Gary Guo gary@garyguo.net Reviewed-by: Alice Ryhl aliceryhl@google.com Reviewed-by: Andreas Hindborg a.hindborg@samsung.com Reviewed-by: Björn Roy Baron bjorn3_gh@protonmail.com Link: https://lore.kernel.org/r/20230823160244.188033-2-ojeda@kernel.org Signed-off-by: Miguel Ojeda ojeda@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- rust/kernel/sync/arc.rs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/rust/kernel/sync/arc.rs +++ b/rust/kernel/sync/arc.rs @@ -302,7 +302,7 @@ impl<T: ?Sized> Drop for Arc<T> { // The count reached zero, we must free the memory. // // SAFETY: The pointer was initialised from the result of `Box::leak`. - unsafe { Box::from_raw(self.ptr.as_ptr()) }; + unsafe { drop(Box::from_raw(self.ptr.as_ptr())) }; } } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miguel Ojeda ojeda@kernel.org
commit ae6df65dabc3f8bd89663d96203963323e266d90 upstream.
This is the third upgrade to the Rust toolchain, from 1.71.1 to 1.72.1 (i.e. the latest) [1].
See the upgrade policy [2] and the comments on the first upgrade in commit 3ed03f4da06e ("rust: upgrade to Rust 1.68.2").
# Unstable features
No unstable features (that we use) were stabilized.
Therefore, the only unstable feature allowed to be used outside the `kernel` crate is still `new_uninit`, though other code to be upstreamed may increase the list.
Please see [3] for details.
# Other improvements
Previously, the compiler could incorrectly generate a `.eh_frame` section under `-Cpanic=abort`. We were hitting this bug when debug assertions were enabled (`CONFIG_RUST_DEBUG_ASSERTIONS=y`) [4]:
LD .tmp_vmlinux.kallsyms1 ld.lld: error: <internal>:(.eh_frame) is being placed in '.eh_frame'
Gary fixed the issue in Rust 1.72.0 [5].
# Required changes
For the upgrade, the following changes are required:
- A call to `Box::from_raw` in `rust/kernel/sync/arc.rs` now requires an explicit `drop()` call. See previous patch for details.
# `alloc` upgrade and reviewing
The vast majority of changes are due to our `alloc` fork being upgraded at once.
There are two kinds of changes to be aware of: the ones coming from upstream, which we should follow as closely as possible, and the updates needed in our added fallible APIs to keep them matching the newer infallible APIs coming from upstream.
Instead of taking a look at the diff of this patch, an alternative approach is reviewing a diff of the changes between upstream `alloc` and the kernel's. This allows to easily inspect the kernel additions only, especially to check if the fallible methods we already have still match the infallible ones in the new version coming from upstream.
Another approach is reviewing the changes introduced in the additions in the kernel fork between the two versions. This is useful to spot potentially unintended changes to our additions.
To apply these approaches, one may follow steps similar to the following to generate a pair of patches that show the differences between upstream Rust and the kernel (for the subset of `alloc` we use) before and after applying this patch:
# Get the difference with respect to the old version. git -C rust checkout $(linux/scripts/min-tool-version.sh rustc) git -C linux ls-tree -r --name-only HEAD -- rust/alloc | cut -d/ -f3- | grep -Fv README.md | xargs -IPATH cp rust/library/alloc/src/PATH linux/rust/alloc/PATH git -C linux diff --patch-with-stat --summary -R > old.patch git -C linux restore rust/alloc
# Apply this patch. git -C linux am rust-upgrade.patch
# Get the difference with respect to the new version. git -C rust checkout $(linux/scripts/min-tool-version.sh rustc) git -C linux ls-tree -r --name-only HEAD -- rust/alloc | cut -d/ -f3- | grep -Fv README.md | xargs -IPATH cp rust/library/alloc/src/PATH linux/rust/alloc/PATH git -C linux diff --patch-with-stat --summary -R > new.patch git -C linux restore rust/alloc
Now one may check the `new.patch` to take a look at the additions (first approach) or at the difference between those two patches (second approach). For the latter, a side-by-side tool is recommended.
Link: https://github.com/rust-lang/rust/blob/stable/RELEASES.md#version-1721-2023-... [1] Link: https://rust-for-linux.com/rust-version-policy [2] Link: https://github.com/Rust-for-Linux/linux/issues/2 [3] Closes: https://github.com/Rust-for-Linux/linux/issues/1012 [4] Link: https://github.com/rust-lang/rust/pull/112403 [5] Reviewed-by: Martin Rodriguez Reboredo yakoyoku@gmail.com Reviewed-by: Gary Guo gary@garyguo.net Reviewed-by: Alice Ryhl aliceryhl@google.com Reviewed-by: Björn Roy Baron bjorn3_gh@protonmail.com Link: https://lore.kernel.org/r/20230823160244.188033-3-ojeda@kernel.org [ Used 1.72.1 instead of .0 (no changes in `alloc`) and reworded to mention that we hit the `.eh_frame` bug under debug assertions. ] Signed-off-by: Miguel Ojeda ojeda@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/process/changes.rst | 2 rust/alloc/alloc.rs | 9 - rust/alloc/boxed.rs | 10 + rust/alloc/lib.rs | 10 - rust/alloc/vec/drain_filter.rs | 199 -------------------------------------- rust/alloc/vec/extract_if.rs | 115 +++++++++++++++++++++ rust/alloc/vec/mod.rs | 106 +++++++++----------- scripts/min-tool-version.sh | 2 8 files changed, 187 insertions(+), 266 deletions(-) delete mode 100644 rust/alloc/vec/drain_filter.rs create mode 100644 rust/alloc/vec/extract_if.rs
--- a/Documentation/process/changes.rst +++ b/Documentation/process/changes.rst @@ -31,7 +31,7 @@ you probably needn't concern yourself wi ====================== =============== ======================================== GNU C 5.1 gcc --version Clang/LLVM (optional) 11.0.0 clang --version -Rust (optional) 1.71.1 rustc --version +Rust (optional) 1.72.1 rustc --version bindgen (optional) 0.65.1 bindgen --version GNU make 3.82 make --version bash 4.2 bash --version --- a/rust/alloc/alloc.rs +++ b/rust/alloc/alloc.rs @@ -6,8 +6,10 @@
#[cfg(not(test))] use core::intrinsics; +#[cfg(all(bootstrap, not(test)))] use core::intrinsics::{min_align_of_val, size_of_val};
+#[cfg(all(bootstrap, not(test)))] use core::ptr::Unique; #[cfg(not(test))] use core::ptr::{self, NonNull}; @@ -40,7 +42,6 @@ extern "Rust" { #[rustc_nounwind] fn __rust_alloc_zeroed(size: usize, align: usize) -> *mut u8;
- #[cfg(not(bootstrap))] static __rust_no_alloc_shim_is_unstable: u8; }
@@ -98,7 +99,6 @@ pub unsafe fn alloc(layout: Layout) -> * unsafe { // Make sure we don't accidentally allow omitting the allocator shim in // stable code until it is actually stabilized. - #[cfg(not(bootstrap))] core::ptr::read_volatile(&__rust_no_alloc_shim_is_unstable);
__rust_alloc(layout.size(), layout.align()) @@ -339,14 +339,15 @@ unsafe fn exchange_malloc(size: usize, a } }
-#[cfg_attr(not(test), lang = "box_free")] +#[cfg(all(bootstrap, not(test)))] +#[lang = "box_free"] #[inline] // This signature has to be the same as `Box`, otherwise an ICE will happen. // When an additional parameter to `Box` is added (like `A: Allocator`), this has to be added here as // well. // For example if `Box` is changed to `struct Box<T: ?Sized, A: Allocator>(Unique<T>, A)`, // this function has to be changed to `fn box_free<T: ?Sized, A: Allocator>(Unique<T>, A)` as well. -pub(crate) unsafe fn box_free<T: ?Sized, A: Allocator>(ptr: Unique<T>, alloc: A) { +unsafe fn box_free<T: ?Sized, A: Allocator>(ptr: Unique<T>, alloc: A) { unsafe { let size = size_of_val(ptr.as_ref()); let align = min_align_of_val(ptr.as_ref()); --- a/rust/alloc/boxed.rs +++ b/rust/alloc/boxed.rs @@ -1215,8 +1215,16 @@ impl<T: ?Sized, A: Allocator> Box<T, A>
#[stable(feature = "rust1", since = "1.0.0")] unsafe impl<#[may_dangle] T: ?Sized, A: Allocator> Drop for Box<T, A> { + #[inline] fn drop(&mut self) { - // FIXME: Do nothing, drop is currently performed by compiler. + // the T in the Box is dropped by the compiler before the destructor is run + + let ptr = self.0; + + unsafe { + let layout = Layout::for_value_raw(ptr.as_ptr()); + self.1.deallocate(From::from(ptr.cast()), layout) + } } }
--- a/rust/alloc/lib.rs +++ b/rust/alloc/lib.rs @@ -58,6 +58,11 @@ //! [`Rc`]: rc //! [`RefCell`]: core::cell
+// To run alloc tests without x.py without ending up with two copies of alloc, Miri needs to be +// able to "empty" this crate. See https://github.com/rust-lang/miri-test-libstd/issues/4. +// rustc itself never sets the feature, so this line has no affect there. +#![cfg(any(not(feature = "miri-test-libstd"), test, doctest))] +// #![allow(unused_attributes)] #![stable(feature = "alloc", since = "1.36.0")] #![doc( @@ -77,11 +82,6 @@ ))] #![no_std] #![needs_allocator] -// To run alloc tests without x.py without ending up with two copies of alloc, Miri needs to be -// able to "empty" this crate. See https://github.com/rust-lang/miri-test-libstd/issues/4. -// rustc itself never sets the feature, so this line has no affect there. -#![cfg(any(not(feature = "miri-test-libstd"), test, doctest))] -// // Lints: #![deny(unsafe_op_in_unsafe_fn)] #![deny(fuzzy_provenance_casts)] --- a/rust/alloc/vec/drain_filter.rs +++ /dev/null @@ -1,199 +0,0 @@ -// SPDX-License-Identifier: Apache-2.0 OR MIT - -use crate::alloc::{Allocator, Global}; -use core::mem::{ManuallyDrop, SizedTypeProperties}; -use core::ptr; -use core::slice; - -use super::Vec; - -/// An iterator which uses a closure to determine if an element should be removed. -/// -/// This struct is created by [`Vec::drain_filter`]. -/// See its documentation for more. -/// -/// # Example -/// -/// ``` -/// #![feature(drain_filter)] -/// -/// let mut v = vec![0, 1, 2]; -/// let iter: std::vec::DrainFilter<'_, _, _> = v.drain_filter(|x| *x % 2 == 0); -/// ``` -#[unstable(feature = "drain_filter", reason = "recently added", issue = "43244")] -#[derive(Debug)] -pub struct DrainFilter< - 'a, - T, - F, - #[unstable(feature = "allocator_api", issue = "32838")] A: Allocator = Global, -> where - F: FnMut(&mut T) -> bool, -{ - pub(super) vec: &'a mut Vec<T, A>, - /// The index of the item that will be inspected by the next call to `next`. - pub(super) idx: usize, - /// The number of items that have been drained (removed) thus far. - pub(super) del: usize, - /// The original length of `vec` prior to draining. - pub(super) old_len: usize, - /// The filter test predicate. - pub(super) pred: F, - /// A flag that indicates a panic has occurred in the filter test predicate. - /// This is used as a hint in the drop implementation to prevent consumption - /// of the remainder of the `DrainFilter`. Any unprocessed items will be - /// backshifted in the `vec`, but no further items will be dropped or - /// tested by the filter predicate. - pub(super) panic_flag: bool, -} - -impl<T, F, A: Allocator> DrainFilter<'_, T, F, A> -where - F: FnMut(&mut T) -> bool, -{ - /// Returns a reference to the underlying allocator. - #[unstable(feature = "allocator_api", issue = "32838")] - #[inline] - pub fn allocator(&self) -> &A { - self.vec.allocator() - } - - /// Keep unyielded elements in the source `Vec`. - /// - /// # Examples - /// - /// ``` - /// #![feature(drain_filter)] - /// #![feature(drain_keep_rest)] - /// - /// let mut vec = vec!['a', 'b', 'c']; - /// let mut drain = vec.drain_filter(|_| true); - /// - /// assert_eq!(drain.next().unwrap(), 'a'); - /// - /// // This call keeps 'b' and 'c' in the vec. - /// drain.keep_rest(); - /// - /// // If we wouldn't call `keep_rest()`, - /// // `vec` would be empty. - /// assert_eq!(vec, ['b', 'c']); - /// ``` - #[unstable(feature = "drain_keep_rest", issue = "101122")] - pub fn keep_rest(self) { - // At this moment layout looks like this: - // - // _____________________/-- old_len - // / \ - // [kept] [yielded] [tail] - // _______/ ^-- idx - // -- del - // - // Normally `Drop` impl would drop [tail] (via .for_each(drop), ie still calling `pred`) - // - // 1. Move [tail] after [kept] - // 2. Update length of the original vec to `old_len - del` - // a. In case of ZST, this is the only thing we want to do - // 3. Do *not* drop self, as everything is put in a consistent state already, there is nothing to do - let mut this = ManuallyDrop::new(self); - - unsafe { - // ZSTs have no identity, so we don't need to move them around. - if !T::IS_ZST && this.idx < this.old_len && this.del > 0 { - let ptr = this.vec.as_mut_ptr(); - let src = ptr.add(this.idx); - let dst = src.sub(this.del); - let tail_len = this.old_len - this.idx; - src.copy_to(dst, tail_len); - } - - let new_len = this.old_len - this.del; - this.vec.set_len(new_len); - } - } -} - -#[unstable(feature = "drain_filter", reason = "recently added", issue = "43244")] -impl<T, F, A: Allocator> Iterator for DrainFilter<'_, T, F, A> -where - F: FnMut(&mut T) -> bool, -{ - type Item = T; - - fn next(&mut self) -> Option<T> { - unsafe { - while self.idx < self.old_len { - let i = self.idx; - let v = slice::from_raw_parts_mut(self.vec.as_mut_ptr(), self.old_len); - self.panic_flag = true; - let drained = (self.pred)(&mut v[i]); - self.panic_flag = false; - // Update the index *after* the predicate is called. If the index - // is updated prior and the predicate panics, the element at this - // index would be leaked. - self.idx += 1; - if drained { - self.del += 1; - return Some(ptr::read(&v[i])); - } else if self.del > 0 { - let del = self.del; - let src: *const T = &v[i]; - let dst: *mut T = &mut v[i - del]; - ptr::copy_nonoverlapping(src, dst, 1); - } - } - None - } - } - - fn size_hint(&self) -> (usize, Option<usize>) { - (0, Some(self.old_len - self.idx)) - } -} - -#[unstable(feature = "drain_filter", reason = "recently added", issue = "43244")] -impl<T, F, A: Allocator> Drop for DrainFilter<'_, T, F, A> -where - F: FnMut(&mut T) -> bool, -{ - fn drop(&mut self) { - struct BackshiftOnDrop<'a, 'b, T, F, A: Allocator> - where - F: FnMut(&mut T) -> bool, - { - drain: &'b mut DrainFilter<'a, T, F, A>, - } - - impl<'a, 'b, T, F, A: Allocator> Drop for BackshiftOnDrop<'a, 'b, T, F, A> - where - F: FnMut(&mut T) -> bool, - { - fn drop(&mut self) { - unsafe { - if self.drain.idx < self.drain.old_len && self.drain.del > 0 { - // This is a pretty messed up state, and there isn't really an - // obviously right thing to do. We don't want to keep trying - // to execute `pred`, so we just backshift all the unprocessed - // elements and tell the vec that they still exist. The backshift - // is required to prevent a double-drop of the last successfully - // drained item prior to a panic in the predicate. - let ptr = self.drain.vec.as_mut_ptr(); - let src = ptr.add(self.drain.idx); - let dst = src.sub(self.drain.del); - let tail_len = self.drain.old_len - self.drain.idx; - src.copy_to(dst, tail_len); - } - self.drain.vec.set_len(self.drain.old_len - self.drain.del); - } - } - } - - let backshift = BackshiftOnDrop { drain: self }; - - // Attempt to consume any remaining elements if the filter predicate - // has not yet panicked. We'll backshift any remaining elements - // whether we've already panicked or if the consumption here panics. - if !backshift.drain.panic_flag { - backshift.drain.for_each(drop); - } - } -} --- /dev/null +++ b/rust/alloc/vec/extract_if.rs @@ -0,0 +1,115 @@ +// SPDX-License-Identifier: Apache-2.0 OR MIT + +use crate::alloc::{Allocator, Global}; +use core::ptr; +use core::slice; + +use super::Vec; + +/// An iterator which uses a closure to determine if an element should be removed. +/// +/// This struct is created by [`Vec::extract_if`]. +/// See its documentation for more. +/// +/// # Example +/// +/// ``` +/// #![feature(extract_if)] +/// +/// let mut v = vec![0, 1, 2]; +/// let iter: std::vec::ExtractIf<'_, _, _> = v.extract_if(|x| *x % 2 == 0); +/// ``` +#[unstable(feature = "extract_if", reason = "recently added", issue = "43244")] +#[derive(Debug)] +#[must_use = "iterators are lazy and do nothing unless consumed"] +pub struct ExtractIf< + 'a, + T, + F, + #[unstable(feature = "allocator_api", issue = "32838")] A: Allocator = Global, +> where + F: FnMut(&mut T) -> bool, +{ + pub(super) vec: &'a mut Vec<T, A>, + /// The index of the item that will be inspected by the next call to `next`. + pub(super) idx: usize, + /// The number of items that have been drained (removed) thus far. + pub(super) del: usize, + /// The original length of `vec` prior to draining. + pub(super) old_len: usize, + /// The filter test predicate. + pub(super) pred: F, +} + +impl<T, F, A: Allocator> ExtractIf<'_, T, F, A> +where + F: FnMut(&mut T) -> bool, +{ + /// Returns a reference to the underlying allocator. + #[unstable(feature = "allocator_api", issue = "32838")] + #[inline] + pub fn allocator(&self) -> &A { + self.vec.allocator() + } +} + +#[unstable(feature = "extract_if", reason = "recently added", issue = "43244")] +impl<T, F, A: Allocator> Iterator for ExtractIf<'_, T, F, A> +where + F: FnMut(&mut T) -> bool, +{ + type Item = T; + + fn next(&mut self) -> Option<T> { + unsafe { + while self.idx < self.old_len { + let i = self.idx; + let v = slice::from_raw_parts_mut(self.vec.as_mut_ptr(), self.old_len); + let drained = (self.pred)(&mut v[i]); + // Update the index *after* the predicate is called. If the index + // is updated prior and the predicate panics, the element at this + // index would be leaked. + self.idx += 1; + if drained { + self.del += 1; + return Some(ptr::read(&v[i])); + } else if self.del > 0 { + let del = self.del; + let src: *const T = &v[i]; + let dst: *mut T = &mut v[i - del]; + ptr::copy_nonoverlapping(src, dst, 1); + } + } + None + } + } + + fn size_hint(&self) -> (usize, Option<usize>) { + (0, Some(self.old_len - self.idx)) + } +} + +#[unstable(feature = "extract_if", reason = "recently added", issue = "43244")] +impl<T, F, A: Allocator> Drop for ExtractIf<'_, T, F, A> +where + F: FnMut(&mut T) -> bool, +{ + fn drop(&mut self) { + unsafe { + if self.idx < self.old_len && self.del > 0 { + // This is a pretty messed up state, and there isn't really an + // obviously right thing to do. We don't want to keep trying + // to execute `pred`, so we just backshift all the unprocessed + // elements and tell the vec that they still exist. The backshift + // is required to prevent a double-drop of the last successfully + // drained item prior to a panic in the predicate. + let ptr = self.vec.as_mut_ptr(); + let src = ptr.add(self.idx); + let dst = src.sub(self.del); + let tail_len = self.old_len - self.idx; + src.copy_to(dst, tail_len); + } + self.vec.set_len(self.old_len - self.del); + } + } +} --- a/rust/alloc/vec/mod.rs +++ b/rust/alloc/vec/mod.rs @@ -74,10 +74,10 @@ use crate::boxed::Box; use crate::collections::{TryReserveError, TryReserveErrorKind}; use crate::raw_vec::RawVec;
-#[unstable(feature = "drain_filter", reason = "recently added", issue = "43244")] -pub use self::drain_filter::DrainFilter; +#[unstable(feature = "extract_if", reason = "recently added", issue = "43244")] +pub use self::extract_if::ExtractIf;
-mod drain_filter; +mod extract_if;
#[cfg(not(no_global_oom_handling))] #[stable(feature = "vec_splice", since = "1.21.0")] @@ -618,22 +618,20 @@ impl<T> Vec<T> { /// Using memory that was allocated elsewhere: /// /// ```rust - /// #![feature(allocator_api)] - /// - /// use std::alloc::{AllocError, Allocator, Global, Layout}; + /// use std::alloc::{alloc, Layout}; /// /// fn main() { /// let layout = Layout::array::<u32>(16).expect("overflow cannot happen"); /// /// let vec = unsafe { - /// let mem = match Global.allocate(layout) { - /// Ok(mem) => mem.cast::<u32>().as_ptr(), - /// Err(AllocError) => return, - /// }; + /// let mem = alloc(layout).cast::<u32>(); + /// if mem.is_null() { + /// return; + /// } /// /// mem.write(1_000_000); /// - /// Vec::from_raw_parts_in(mem, 1, 16, Global) + /// Vec::from_raw_parts(mem, 1, 16) /// }; /// /// assert_eq!(vec, &[1_000_000]); @@ -876,19 +874,22 @@ impl<T, A: Allocator> Vec<T, A> { /// Using memory that was allocated elsewhere: /// /// ```rust - /// use std::alloc::{alloc, Layout}; + /// #![feature(allocator_api)] + /// + /// use std::alloc::{AllocError, Allocator, Global, Layout}; /// /// fn main() { /// let layout = Layout::array::<u32>(16).expect("overflow cannot happen"); + /// /// let vec = unsafe { - /// let mem = alloc(layout).cast::<u32>(); - /// if mem.is_null() { - /// return; - /// } + /// let mem = match Global.allocate(layout) { + /// Ok(mem) => mem.cast::<u32>().as_ptr(), + /// Err(AllocError) => return, + /// }; /// /// mem.write(1_000_000); /// - /// Vec::from_raw_parts(mem, 1, 16) + /// Vec::from_raw_parts_in(mem, 1, 16, Global) /// }; /// /// assert_eq!(vec, &[1_000_000]); @@ -2507,7 +2508,7 @@ impl<T: Clone, A: Allocator> Vec<T, A> { let len = self.len();
if new_len > len { - self.extend_with(new_len - len, ExtendElement(value)) + self.extend_with(new_len - len, value) } else { self.truncate(new_len); } @@ -2545,7 +2546,7 @@ impl<T: Clone, A: Allocator> Vec<T, A> { let len = self.len();
if new_len > len { - self.try_extend_with(new_len - len, ExtendElement(value)) + self.try_extend_with(new_len - len, value) } else { self.truncate(new_len); Ok(()) @@ -2684,26 +2685,10 @@ impl<T, A: Allocator, const N: usize> Ve } }
-// This code generalizes `extend_with_{element,default}`. -trait ExtendWith<T> { - fn next(&mut self) -> T; - fn last(self) -> T; -} - -struct ExtendElement<T>(T); -impl<T: Clone> ExtendWith<T> for ExtendElement<T> { - fn next(&mut self) -> T { - self.0.clone() - } - fn last(self) -> T { - self.0 - } -} - -impl<T, A: Allocator> Vec<T, A> { +impl<T: Clone, A: Allocator> Vec<T, A> { #[cfg(not(no_global_oom_handling))] - /// Extend the vector by `n` values, using the given generator. - fn extend_with<E: ExtendWith<T>>(&mut self, n: usize, mut value: E) { + /// Extend the vector by `n` clones of value. + fn extend_with(&mut self, n: usize, value: T) { self.reserve(n);
unsafe { @@ -2715,15 +2700,15 @@ impl<T, A: Allocator> Vec<T, A> {
// Write all elements except the last one for _ in 1..n { - ptr::write(ptr, value.next()); + ptr::write(ptr, value.clone()); ptr = ptr.add(1); - // Increment the length in every step in case next() panics + // Increment the length in every step in case clone() panics local_len.increment_len(1); }
if n > 0 { // We can write the last element directly without cloning needlessly - ptr::write(ptr, value.last()); + ptr::write(ptr, value); local_len.increment_len(1); }
@@ -2731,8 +2716,8 @@ impl<T, A: Allocator> Vec<T, A> { } }
- /// Try to extend the vector by `n` values, using the given generator. - fn try_extend_with<E: ExtendWith<T>>(&mut self, n: usize, mut value: E) -> Result<(), TryReserveError> { + /// Try to extend the vector by `n` clones of value. + fn try_extend_with(&mut self, n: usize, value: T) -> Result<(), TryReserveError> { self.try_reserve(n)?;
unsafe { @@ -2744,15 +2729,15 @@ impl<T, A: Allocator> Vec<T, A> {
// Write all elements except the last one for _ in 1..n { - ptr::write(ptr, value.next()); + ptr::write(ptr, value.clone()); ptr = ptr.add(1); - // Increment the length in every step in case next() panics + // Increment the length in every step in case clone() panics local_len.increment_len(1); }
if n > 0 { // We can write the last element directly without cloning needlessly - ptr::write(ptr, value.last()); + ptr::write(ptr, value); local_len.increment_len(1); }
@@ -3210,6 +3195,12 @@ impl<T, A: Allocator> Vec<T, A> { /// If the closure returns false, the element will remain in the vector and will not be yielded /// by the iterator. /// + /// If the returned `ExtractIf` is not exhausted, e.g. because it is dropped without iterating + /// or the iteration short-circuits, then the remaining elements will be retained. + /// Use [`retain`] with a negated predicate if you do not need the returned iterator. + /// + /// [`retain`]: Vec::retain + /// /// Using this method is equivalent to the following code: /// /// ``` @@ -3228,10 +3219,10 @@ impl<T, A: Allocator> Vec<T, A> { /// # assert_eq!(vec, vec![1, 4, 5]); /// ``` /// - /// But `drain_filter` is easier to use. `drain_filter` is also more efficient, + /// But `extract_if` is easier to use. `extract_if` is also more efficient, /// because it can backshift the elements of the array in bulk. /// - /// Note that `drain_filter` also lets you mutate every element in the filter closure, + /// Note that `extract_if` also lets you mutate every element in the filter closure, /// regardless of whether you choose to keep or remove it. /// /// # Examples @@ -3239,17 +3230,17 @@ impl<T, A: Allocator> Vec<T, A> { /// Splitting an array into evens and odds, reusing the original allocation: /// /// ``` - /// #![feature(drain_filter)] + /// #![feature(extract_if)] /// let mut numbers = vec![1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 14, 15]; /// - /// let evens = numbers.drain_filter(|x| *x % 2 == 0).collect::<Vec<_>>(); + /// let evens = numbers.extract_if(|x| *x % 2 == 0).collect::<Vec<_>>(); /// let odds = numbers; /// /// assert_eq!(evens, vec![2, 4, 6, 8, 14]); /// assert_eq!(odds, vec![1, 3, 5, 9, 11, 13, 15]); /// ``` - #[unstable(feature = "drain_filter", reason = "recently added", issue = "43244")] - pub fn drain_filter<F>(&mut self, filter: F) -> DrainFilter<'_, T, F, A> + #[unstable(feature = "extract_if", reason = "recently added", issue = "43244")] + pub fn extract_if<F>(&mut self, filter: F) -> ExtractIf<'_, T, F, A> where F: FnMut(&mut T) -> bool, { @@ -3260,7 +3251,7 @@ impl<T, A: Allocator> Vec<T, A> { self.set_len(0); }
- DrainFilter { vec: self, idx: 0, del: 0, old_len, pred: filter, panic_flag: false } + ExtractIf { vec: self, idx: 0, del: 0, old_len, pred: filter } } }
@@ -3290,9 +3281,14 @@ impl<'a, T: Copy + 'a, A: Allocator + 'a
/// Implements comparison of vectors, [lexicographically](Ord#lexicographical-comparison). #[stable(feature = "rust1", since = "1.0.0")] -impl<T: PartialOrd, A: Allocator> PartialOrd for Vec<T, A> { +impl<T, A1, A2> PartialOrd<Vec<T, A2>> for Vec<T, A1> +where + T: PartialOrd, + A1: Allocator, + A2: Allocator, +{ #[inline] - fn partial_cmp(&self, other: &Self) -> Option<Ordering> { + fn partial_cmp(&self, other: &Vec<T, A2>) -> Option<Ordering> { PartialOrd::partial_cmp(&**self, &**other) } } --- a/scripts/min-tool-version.sh +++ b/scripts/min-tool-version.sh @@ -31,7 +31,7 @@ llvm) fi ;; rustc) - echo 1.71.1 + echo 1.72.1 ;; bindgen) echo 0.65.1
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miguel Ojeda ojeda@kernel.org
commit c61bcc278b1924da13fd52edbd46b08a518c11ef upstream.
Starting with Rust 1.73.0, `rustdoc` detects redundant explicit links with its new lint `redundant_explicit_links` [1]:
error: redundant explicit link target --> rust/kernel/task.rs:85:21 | 85 | /// [`current`](crate::current) macro because it is safe. | --------- ^^^^^^^^^^^^^^ explicit target is redundant | | | because label contains path that resolves to same destination | = note: when a link's destination is not specified, the label is used to resolve intra-doc links = note: `-D rustdoc::redundant-explicit-links` implied by `-D warnings` help: remove explicit link target | 85 | /// [`current`] macro because it is safe.
In order to avoid the warning in the compiler upgrade commit, make it an intra-doc link as the tool suggests.
Link: https://github.com/rust-lang/rust/pull/113167 [1] Reviewed-by: Finn Behrens me@kloenk.dev Reviewed-by: Alice Ryhl aliceryhl@google.com Reviewed-by: Martin Rodriguez Reboredo yakoyoku@gmail.com Reviewed-by: Vincenzo Palazzo vincenzopalazzodev@gmail.com Link: https://lore.kernel.org/r/20231005210556.466856-2-ojeda@kernel.org Signed-off-by: Miguel Ojeda ojeda@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- rust/kernel/task.rs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/rust/kernel/task.rs +++ b/rust/kernel/task.rs @@ -82,7 +82,7 @@ impl Task { /// Returns a task reference for the currently executing task/thread. /// /// The recommended way to get the current task/thread is to use the - /// [`current`](crate::current) macro because it is safe. + /// [`current`] macro because it is safe. /// /// # Safety ///
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miguel Ojeda ojeda@kernel.org
commit a53d8cdd5a0aec75ae32badc2d8995c59ea6e3f0 upstream.
The future `rustdoc` in the Rust 1.73.0 upgrade requires an explicit link for `pr_info!`:
error: unresolved link to `pr_info` --> rust/kernel/print.rs:395:63 | 395 | /// Use only when continuing a previous `pr_*!` macro (e.g. [`pr_info!`]). | ^^^^^^^^ no item named `pr_info` in scope | = note: `macro_rules` named `pr_info` exists in this crate, but it is not in scope at this link's location = note: `-D rustdoc::broken-intra-doc-links` implied by `-D warnings`
Thus do so to avoid a broken link while upgrading.
Reviewed-by: Alice Ryhl aliceryhl@google.com Reviewed-by: Vincenzo Palazzo vincenzopalazzodev@gmail.com Reviewed-by: Finn Behrens me@kloenk.dev Reviewed-by: Martin Rodriguez Reboredo yakoyoku@gmail.com Link: https://lore.kernel.org/r/20231005210556.466856-3-ojeda@kernel.org Signed-off-by: Miguel Ojeda ojeda@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- rust/kernel/print.rs | 1 + 1 file changed, 1 insertion(+)
--- a/rust/kernel/print.rs +++ b/rust/kernel/print.rs @@ -399,6 +399,7 @@ macro_rules! pr_debug ( /// Mimics the interface of [`std::print!`]. See [`core::fmt`] and /// `alloc::format!` for information about the formatting syntax. /// +/// [`pr_info!`]: crate::pr_info! /// [`pr_cont`]: https://www.kernel.org/doc/html/latest/core-api/printk-basics.html#c.pr_cont /// [`std::print!`]: https://doc.rust-lang.org/std/macro.print.html ///
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miguel Ojeda ojeda@kernel.org
commit e08ff622c91af997cb89bc47e90a1a383e938bd0 upstream.
This is the next upgrade to the Rust toolchain, from 1.72.1 to 1.73.0 (i.e. the latest) [1].
See the upgrade policy [2] and the comments on the first upgrade in commit 3ed03f4da06e ("rust: upgrade to Rust 1.68.2").
# Unstable features
No unstable features (that we use) were stabilized.
Therefore, the only unstable feature allowed to be used outside the `kernel` crate is still `new_uninit`, though other code to be upstreamed may increase the list.
Please see [3] for details.
# Required changes
For the upgrade, the following changes are required:
- Allow `internal_features` for `feature(compiler_builtins)` since now Rust warns about using internal compiler and standard library features (similar to how it also warns about incomplete ones) [4].
- A cleanup for a documentation link thanks to a new `rustdoc` lint. See previous commits for details.
- A need to make an intra-doc link to a macro explicit, due to a change in behavior in `rustdoc`. See previous commits for details.
# `alloc` upgrade and reviewing
The vast majority of changes are due to our `alloc` fork being upgraded at once.
There are two kinds of changes to be aware of: the ones coming from upstream, which we should follow as closely as possible, and the updates needed in our added fallible APIs to keep them matching the newer infallible APIs coming from upstream.
Instead of taking a look at the diff of this patch, an alternative approach is reviewing a diff of the changes between upstream `alloc` and the kernel's. This allows to easily inspect the kernel additions only, especially to check if the fallible methods we already have still match the infallible ones in the new version coming from upstream.
Another approach is reviewing the changes introduced in the additions in the kernel fork between the two versions. This is useful to spot potentially unintended changes to our additions.
To apply these approaches, one may follow steps similar to the following to generate a pair of patches that show the differences between upstream Rust and the kernel (for the subset of `alloc` we use) before and after applying this patch:
# Get the difference with respect to the old version. git -C rust checkout $(linux/scripts/min-tool-version.sh rustc) git -C linux ls-tree -r --name-only HEAD -- rust/alloc | cut -d/ -f3- | grep -Fv README.md | xargs -IPATH cp rust/library/alloc/src/PATH linux/rust/alloc/PATH git -C linux diff --patch-with-stat --summary -R > old.patch git -C linux restore rust/alloc
# Apply this patch. git -C linux am rust-upgrade.patch
# Get the difference with respect to the new version. git -C rust checkout $(linux/scripts/min-tool-version.sh rustc) git -C linux ls-tree -r --name-only HEAD -- rust/alloc | cut -d/ -f3- | grep -Fv README.md | xargs -IPATH cp rust/library/alloc/src/PATH linux/rust/alloc/PATH git -C linux diff --patch-with-stat --summary -R > new.patch git -C linux restore rust/alloc
Now one may check the `new.patch` to take a look at the additions (first approach) or at the difference between those two patches (second approach). For the latter, a side-by-side tool is recommended.
Link: https://github.com/rust-lang/rust/blob/stable/RELEASES.md#version-1730-2023-... [1] Link: https://rust-for-linux.com/rust-version-policy [2] Link: https://github.com/Rust-for-Linux/linux/issues/2 [3] Link: https://github.com/rust-lang/compiler-team/issues/596 [4] Reviewed-by: Martin Rodriguez Reboredo yakoyoku@gmail.com Reviewed-by: Vincenzo Palazzo vincenzopalazzodev@gmail.com Reviewed-by: Alice Ryhl aliceryhl@google.com Link: https://lore.kernel.org/r/20231005210556.466856-4-ojeda@kernel.org Signed-off-by: Miguel Ojeda ojeda@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/process/changes.rst | 2 - rust/alloc/alloc.rs | 22 ----------------- rust/alloc/boxed.rs | 48 ++++++++++++++++++++++++-------------- rust/alloc/lib.rs | 5 ++- rust/alloc/raw_vec.rs | 30 +++++++++++++++-------- rust/alloc/vec/mod.rs | 4 +-- rust/alloc/vec/spec_extend.rs | 8 +++--- rust/compiler_builtins.rs | 1 scripts/min-tool-version.sh | 2 - 9 files changed, 63 insertions(+), 59 deletions(-)
--- a/Documentation/process/changes.rst +++ b/Documentation/process/changes.rst @@ -31,7 +31,7 @@ you probably needn't concern yourself wi ====================== =============== ======================================== GNU C 5.1 gcc --version Clang/LLVM (optional) 11.0.0 clang --version -Rust (optional) 1.72.1 rustc --version +Rust (optional) 1.73.0 rustc --version bindgen (optional) 0.65.1 bindgen --version GNU make 3.82 make --version bash 4.2 bash --version --- a/rust/alloc/alloc.rs +++ b/rust/alloc/alloc.rs @@ -6,11 +6,7 @@
#[cfg(not(test))] use core::intrinsics; -#[cfg(all(bootstrap, not(test)))] -use core::intrinsics::{min_align_of_val, size_of_val};
-#[cfg(all(bootstrap, not(test)))] -use core::ptr::Unique; #[cfg(not(test))] use core::ptr::{self, NonNull};
@@ -339,23 +335,6 @@ unsafe fn exchange_malloc(size: usize, a } }
-#[cfg(all(bootstrap, not(test)))] -#[lang = "box_free"] -#[inline] -// This signature has to be the same as `Box`, otherwise an ICE will happen. -// When an additional parameter to `Box` is added (like `A: Allocator`), this has to be added here as -// well. -// For example if `Box` is changed to `struct Box<T: ?Sized, A: Allocator>(Unique<T>, A)`, -// this function has to be changed to `fn box_free<T: ?Sized, A: Allocator>(Unique<T>, A)` as well. -unsafe fn box_free<T: ?Sized, A: Allocator>(ptr: Unique<T>, alloc: A) { - unsafe { - let size = size_of_val(ptr.as_ref()); - let align = min_align_of_val(ptr.as_ref()); - let layout = Layout::from_size_align_unchecked(size, align); - alloc.deallocate(From::from(ptr.cast()), layout) - } -} - // # Allocation error handler
#[cfg(not(no_global_oom_handling))] @@ -415,7 +394,6 @@ pub mod __alloc_error_handler { static __rust_alloc_error_handler_should_panic: u8; }
- #[allow(unused_unsafe)] if unsafe { __rust_alloc_error_handler_should_panic != 0 } { panic!("memory allocation of {size} bytes failed") } else { --- a/rust/alloc/boxed.rs +++ b/rust/alloc/boxed.rs @@ -159,12 +159,12 @@ use core::hash::{Hash, Hasher}; use core::iter::FusedIterator; use core::marker::Tuple; use core::marker::Unsize; -use core::mem; +use core::mem::{self, SizedTypeProperties}; use core::ops::{ CoerceUnsized, Deref, DerefMut, DispatchFromDyn, Generator, GeneratorState, Receiver, }; use core::pin::Pin; -use core::ptr::{self, Unique}; +use core::ptr::{self, NonNull, Unique}; use core::task::{Context, Poll};
#[cfg(not(no_global_oom_handling))] @@ -483,8 +483,12 @@ impl<T, A: Allocator> Box<T, A> { where A: Allocator, { - let layout = Layout::new::<mem::MaybeUninit<T>>(); - let ptr = alloc.allocate(layout)?.cast(); + let ptr = if T::IS_ZST { + NonNull::dangling() + } else { + let layout = Layout::new::<mem::MaybeUninit<T>>(); + alloc.allocate(layout)?.cast() + }; unsafe { Ok(Box::from_raw_in(ptr.as_ptr(), alloc)) } }
@@ -553,8 +557,12 @@ impl<T, A: Allocator> Box<T, A> { where A: Allocator, { - let layout = Layout::new::<mem::MaybeUninit<T>>(); - let ptr = alloc.allocate_zeroed(layout)?.cast(); + let ptr = if T::IS_ZST { + NonNull::dangling() + } else { + let layout = Layout::new::<mem::MaybeUninit<T>>(); + alloc.allocate_zeroed(layout)?.cast() + }; unsafe { Ok(Box::from_raw_in(ptr.as_ptr(), alloc)) } }
@@ -679,14 +687,16 @@ impl<T> Box<[T]> { #[unstable(feature = "allocator_api", issue = "32838")] #[inline] pub fn try_new_uninit_slice(len: usize) -> Result<Box<[mem::MaybeUninit<T>]>, AllocError> { - unsafe { + let ptr = if T::IS_ZST || len == 0 { + NonNull::dangling() + } else { let layout = match Layout::array::<mem::MaybeUninit<T>>(len) { Ok(l) => l, Err(_) => return Err(AllocError), }; - let ptr = Global.allocate(layout)?; - Ok(RawVec::from_raw_parts_in(ptr.as_mut_ptr() as *mut _, len, Global).into_box(len)) - } + Global.allocate(layout)?.cast() + }; + unsafe { Ok(RawVec::from_raw_parts_in(ptr.as_ptr(), len, Global).into_box(len)) } }
/// Constructs a new boxed slice with uninitialized contents, with the memory @@ -711,14 +721,16 @@ impl<T> Box<[T]> { #[unstable(feature = "allocator_api", issue = "32838")] #[inline] pub fn try_new_zeroed_slice(len: usize) -> Result<Box<[mem::MaybeUninit<T>]>, AllocError> { - unsafe { + let ptr = if T::IS_ZST || len == 0 { + NonNull::dangling() + } else { let layout = match Layout::array::<mem::MaybeUninit<T>>(len) { Ok(l) => l, Err(_) => return Err(AllocError), }; - let ptr = Global.allocate_zeroed(layout)?; - Ok(RawVec::from_raw_parts_in(ptr.as_mut_ptr() as *mut _, len, Global).into_box(len)) - } + Global.allocate_zeroed(layout)?.cast() + }; + unsafe { Ok(RawVec::from_raw_parts_in(ptr.as_ptr(), len, Global).into_box(len)) } } }
@@ -1223,7 +1235,9 @@ unsafe impl<#[may_dangle] T: ?Sized, A:
unsafe { let layout = Layout::for_value_raw(ptr.as_ptr()); - self.1.deallocate(From::from(ptr.cast()), layout) + if layout.size() != 0 { + self.1.deallocate(From::from(ptr.cast()), layout); + } } } } @@ -2173,7 +2187,7 @@ impl dyn Error + Send { let err: Box<dyn Error> = self; <dyn Error>::downcast(err).map_err(|s| unsafe { // Reapply the `Send` marker. - mem::transmute::<Box<dyn Error>, Box<dyn Error + Send>>(s) + Box::from_raw(Box::into_raw(s) as *mut (dyn Error + Send)) }) } } @@ -2187,7 +2201,7 @@ impl dyn Error + Send + Sync { let err: Box<dyn Error> = self; <dyn Error>::downcast(err).map_err(|s| unsafe { // Reapply the `Send + Sync` marker. - mem::transmute::<Box<dyn Error>, Box<dyn Error + Send + Sync>>(s) + Box::from_raw(Box::into_raw(s) as *mut (dyn Error + Send + Sync)) }) } } --- a/rust/alloc/lib.rs +++ b/rust/alloc/lib.rs @@ -60,7 +60,7 @@
// To run alloc tests without x.py without ending up with two copies of alloc, Miri needs to be // able to "empty" this crate. See https://github.com/rust-lang/miri-test-libstd/issues/4. -// rustc itself never sets the feature, so this line has no affect there. +// rustc itself never sets the feature, so this line has no effect there. #![cfg(any(not(feature = "miri-test-libstd"), test, doctest))] // #![allow(unused_attributes)] @@ -90,6 +90,8 @@ #![warn(missing_docs)] #![allow(explicit_outlives_requirements)] #![warn(multiple_supertrait_upcastable)] +#![cfg_attr(not(bootstrap), allow(internal_features))] +#![cfg_attr(not(bootstrap), allow(rustdoc::redundant_explicit_links))] // // Library features: // tidy-alphabetical-start @@ -139,7 +141,6 @@ #![feature(maybe_uninit_uninit_array_transpose)] #![feature(pattern)] #![feature(pointer_byte_offsets)] -#![feature(provide_any)] #![feature(ptr_internals)] #![feature(ptr_metadata)] #![feature(ptr_sub_ptr)] --- a/rust/alloc/raw_vec.rs +++ b/rust/alloc/raw_vec.rs @@ -471,16 +471,26 @@ impl<T, A: Allocator> RawVec<T, A> { let (ptr, layout) = if let Some(mem) = self.current_memory() { mem } else { return Ok(()) }; // See current_memory() why this assert is here let _: () = const { assert!(mem::size_of::<T>() % mem::align_of::<T>() == 0) }; - let ptr = unsafe { - // `Layout::array` cannot overflow here because it would have - // overflowed earlier when capacity was larger. - let new_size = mem::size_of::<T>().unchecked_mul(cap); - let new_layout = Layout::from_size_align_unchecked(new_size, layout.align()); - self.alloc - .shrink(ptr, layout, new_layout) - .map_err(|_| AllocError { layout: new_layout, non_exhaustive: () })? - }; - self.set_ptr_and_cap(ptr, cap); + + // If shrinking to 0, deallocate the buffer. We don't reach this point + // for the T::IS_ZST case since current_memory() will have returned + // None. + if cap == 0 { + unsafe { self.alloc.deallocate(ptr, layout) }; + self.ptr = Unique::dangling(); + self.cap = 0; + } else { + let ptr = unsafe { + // `Layout::array` cannot overflow here because it would have + // overflowed earlier when capacity was larger. + let new_size = mem::size_of::<T>().unchecked_mul(cap); + let new_layout = Layout::from_size_align_unchecked(new_size, layout.align()); + self.alloc + .shrink(ptr, layout, new_layout) + .map_err(|_| AllocError { layout: new_layout, non_exhaustive: () })? + }; + self.set_ptr_and_cap(ptr, cap); + } Ok(()) } } --- a/rust/alloc/vec/mod.rs +++ b/rust/alloc/vec/mod.rs @@ -216,7 +216,7 @@ mod spec_extend; /// /// # Indexing /// -/// The `Vec` type allows to access values by index, because it implements the +/// The `Vec` type allows access to values by index, because it implements the /// [`Index`] trait. An example will be more explicit: /// /// ``` @@ -3263,7 +3263,7 @@ impl<T, A: Allocator> Vec<T, A> { /// [`copy_from_slice`]: slice::copy_from_slice #[cfg(not(no_global_oom_handling))] #[stable(feature = "extend_ref", since = "1.2.0")] -impl<'a, T: Copy + 'a, A: Allocator + 'a> Extend<&'a T> for Vec<T, A> { +impl<'a, T: Copy + 'a, A: Allocator> Extend<&'a T> for Vec<T, A> { fn extend<I: IntoIterator<Item = &'a T>>(&mut self, iter: I) { self.spec_extend(iter.into_iter()) } --- a/rust/alloc/vec/spec_extend.rs +++ b/rust/alloc/vec/spec_extend.rs @@ -77,7 +77,7 @@ impl<T, A: Allocator> TrySpecExtend<T, I }
#[cfg(not(no_global_oom_handling))] -impl<'a, T: 'a, I, A: Allocator + 'a> SpecExtend<&'a T, I> for Vec<T, A> +impl<'a, T: 'a, I, A: Allocator> SpecExtend<&'a T, I> for Vec<T, A> where I: Iterator<Item = &'a T>, T: Clone, @@ -87,7 +87,7 @@ where } }
-impl<'a, T: 'a, I, A: Allocator + 'a> TrySpecExtend<&'a T, I> for Vec<T, A> +impl<'a, T: 'a, I, A: Allocator> TrySpecExtend<&'a T, I> for Vec<T, A> where I: Iterator<Item = &'a T>, T: Clone, @@ -98,7 +98,7 @@ where }
#[cfg(not(no_global_oom_handling))] -impl<'a, T: 'a, A: Allocator + 'a> SpecExtend<&'a T, slice::Iter<'a, T>> for Vec<T, A> +impl<'a, T: 'a, A: Allocator> SpecExtend<&'a T, slice::Iter<'a, T>> for Vec<T, A> where T: Copy, { @@ -108,7 +108,7 @@ where } }
-impl<'a, T: 'a, A: Allocator + 'a> TrySpecExtend<&'a T, slice::Iter<'a, T>> for Vec<T, A> +impl<'a, T: 'a, A: Allocator> TrySpecExtend<&'a T, slice::Iter<'a, T>> for Vec<T, A> where T: Copy, { --- a/rust/compiler_builtins.rs +++ b/rust/compiler_builtins.rs @@ -19,6 +19,7 @@ //! [`compiler_builtins`]: https://github.com/rust-lang/compiler-builtins //! [`compiler-rt`]: https://compiler-rt.llvm.org/
+#![allow(internal_features)] #![feature(compiler_builtins)] #![compiler_builtins] #![no_builtins] --- a/scripts/min-tool-version.sh +++ b/scripts/min-tool-version.sh @@ -31,7 +31,7 @@ llvm) fi ;; rustc) - echo 1.72.1 + echo 1.73.0 ;; bindgen) echo 0.65.1
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Catherine Hoang catherine.hoang@oracle.com
This is an attempt to direct the bots and humans that are testing LTS 6.6.y towards the maintainer of xfs in the 6.6.y tree.
Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS index dd5de540ec0b..40312bb550f0 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -23630,6 +23630,7 @@ F: include/xen/arm/swiotlb-xen.h F: include/xen/swiotlb-xen.h
XFS FILESYSTEM +M: Catherine Hoang catherine.hoang@oracle.com M: Chandan Babu R chandan.babu@oracle.com R: Darrick J. Wong djwong@kernel.org L: linux-xfs@vger.kernel.org
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit 9488062805943c2d63350d3ef9e4dc093799789a upstream.
The latest version of the fs geometry structure is v5. Bump this constant so that xfs_db and mkfs calls to libxfs_fs_geometry will fill out all the fields.
IOWs, this commit is a no-op for the kernel, but will be useful for userspace reporting in later changes.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/libxfs/xfs_sb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h index a5e14740ec9a..19134b23c10b 100644 --- a/fs/xfs/libxfs/xfs_sb.h +++ b/fs/xfs/libxfs/xfs_sb.h @@ -25,7 +25,7 @@ extern uint64_t xfs_sb_version_to_features(struct xfs_sb *sbp);
extern int xfs_update_secondary_sbs(struct xfs_mount *mp);
-#define XFS_FS_GEOM_MAX_STRUCT_VER (4) +#define XFS_FS_GEOM_MAX_STRUCT_VER (5) extern void xfs_fs_geometry(struct xfs_mount *mp, struct xfs_fsop_geom *geo, int struct_version); extern int xfs_sb_read_secondary(struct xfs_mount *mp,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit 6c664484337b37fa0cf6e958f4019623e30d40f7 upstream.
Currently, xfs_bmap_del_extent_real contains a bunch of code to convert the physical extent of a data fork mapping for a realtime file into rt extents and pass that to the rt extent freeing function. Since the details of this aren't needed when CONFIG_XFS_REALTIME=n, move it to xfs_rtbitmap.c to reduce code size when realtime isn't enabled.
This will (one day) enable realtime EFIs to reuse the same unit-converting call with less code duplication.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/libxfs/xfs_bmap.c | 19 +++---------------- fs/xfs/libxfs/xfs_rtbitmap.c | 33 +++++++++++++++++++++++++++++++++ fs/xfs/xfs_rtalloc.h | 5 +++++ 3 files changed, 41 insertions(+), 16 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 30c931b38853..26bfa34b4bbf 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5057,33 +5057,20 @@ xfs_bmap_del_extent_real(
flags = XFS_ILOG_CORE; if (whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip)) { - xfs_filblks_t len; - xfs_extlen_t mod; - - len = div_u64_rem(del->br_blockcount, mp->m_sb.sb_rextsize, - &mod); - ASSERT(mod == 0); - if (!(bflags & XFS_BMAPI_REMAP)) { - xfs_fsblock_t bno; - - bno = div_u64_rem(del->br_startblock, - mp->m_sb.sb_rextsize, &mod); - ASSERT(mod == 0); - - error = xfs_rtfree_extent(tp, bno, (xfs_extlen_t)len); + error = xfs_rtfree_blocks(tp, del->br_startblock, + del->br_blockcount); if (error) goto done; }
do_fx = 0; - nblks = len * mp->m_sb.sb_rextsize; qfield = XFS_TRANS_DQ_RTBCOUNT; } else { do_fx = 1; - nblks = del->br_blockcount; qfield = XFS_TRANS_DQ_BCOUNT; } + nblks = del->br_blockcount;
del_endblock = del->br_startblock + del->br_blockcount; if (cur) { diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index fa180ab66b73..655108a4cd05 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -1005,6 +1005,39 @@ xfs_rtfree_extent( return 0; }
+/* + * Free some blocks in the realtime subvolume. rtbno and rtlen are in units of + * rt blocks, not rt extents; must be aligned to the rt extent size; and rtlen + * cannot exceed XFS_MAX_BMBT_EXTLEN. + */ +int +xfs_rtfree_blocks( + struct xfs_trans *tp, + xfs_fsblock_t rtbno, + xfs_filblks_t rtlen) +{ + struct xfs_mount *mp = tp->t_mountp; + xfs_rtblock_t bno; + xfs_filblks_t len; + xfs_extlen_t mod; + + ASSERT(rtlen <= XFS_MAX_BMBT_EXTLEN); + + len = div_u64_rem(rtlen, mp->m_sb.sb_rextsize, &mod); + if (mod) { + ASSERT(mod == 0); + return -EIO; + } + + bno = div_u64_rem(rtbno, mp->m_sb.sb_rextsize, &mod); + if (mod) { + ASSERT(mod == 0); + return -EIO; + } + + return xfs_rtfree_extent(tp, bno, len); +} + /* Find all the free records within a given range. */ int xfs_rtalloc_query_range( diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h index 62c7ad79cbb6..3b2f1b499a11 100644 --- a/fs/xfs/xfs_rtalloc.h +++ b/fs/xfs/xfs_rtalloc.h @@ -58,6 +58,10 @@ xfs_rtfree_extent( xfs_rtblock_t bno, /* starting block number to free */ xfs_extlen_t len); /* length of extent freed */
+/* Same as above, but in units of rt blocks. */ +int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno, + xfs_filblks_t rtlen); + /* * Initialize realtime fields in the mount structure. */ @@ -139,6 +143,7 @@ int xfs_rtalloc_reinit_frextents(struct xfs_mount *mp); #else # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb) (ENOSYS) # define xfs_rtfree_extent(t,b,l) (ENOSYS) +# define xfs_rtfree_blocks(t,rb,rl) (ENOSYS) # define xfs_rtpick_extent(m,t,l,rb) (ENOSYS) # define xfs_growfs_rt(mp,in) (ENOSYS) # define xfs_rtalloc_query_range(t,l,h,f,p) (ENOSYS)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit b73494fa9a304ab95b59f07845e8d7d36e4d23e0 upstream.
Quotas aren't (yet) supported with realtime, so we shouldn't allow userspace to set up a realtime section when quotas are enabled, even if they attached one via mount options. IOWS, you shouldn't be able to do:
# mkfs.xfs -f /dev/sda # mount /dev/sda /mnt -o rtdev=/dev/sdb,usrquota # xfs_growfs -r /mnt
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_rtalloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 16534e9873f6..31fd65b3aaa9 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -954,7 +954,7 @@ xfs_growfs_rt( return -EINVAL;
/* Unsupported realtime features. */ - if (xfs_has_rmapbt(mp) || xfs_has_reflink(mp)) + if (xfs_has_rmapbt(mp) || xfs_has_reflink(mp) || xfs_has_quota(mp)) return -EOPNOTSUPP;
nrblocks = in->newblocks;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit c2988eb5cff75c02bc57e02c323154aa08f55b78 upstream.
When realtime support is not compiled into the kernel, these functions should return negative errnos, not positive errnos. While we're at it, fix a broken macro declaration.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_rtalloc.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h index 3b2f1b499a11..65c284e9d33e 100644 --- a/fs/xfs/xfs_rtalloc.h +++ b/fs/xfs/xfs_rtalloc.h @@ -141,17 +141,17 @@ int xfs_rtalloc_extent_is_free(struct xfs_mount *mp, struct xfs_trans *tp, bool *is_free); int xfs_rtalloc_reinit_frextents(struct xfs_mount *mp); #else -# define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb) (ENOSYS) -# define xfs_rtfree_extent(t,b,l) (ENOSYS) -# define xfs_rtfree_blocks(t,rb,rl) (ENOSYS) -# define xfs_rtpick_extent(m,t,l,rb) (ENOSYS) -# define xfs_growfs_rt(mp,in) (ENOSYS) -# define xfs_rtalloc_query_range(t,l,h,f,p) (ENOSYS) -# define xfs_rtalloc_query_all(m,t,f,p) (ENOSYS) -# define xfs_rtbuf_get(m,t,b,i,p) (ENOSYS) -# define xfs_verify_rtbno(m, r) (false) -# define xfs_rtalloc_extent_is_free(m,t,s,l,i) (ENOSYS) -# define xfs_rtalloc_reinit_frextents(m) (0) +# define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb) (-ENOSYS) +# define xfs_rtfree_extent(t,b,l) (-ENOSYS) +# define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) +# define xfs_rtpick_extent(m,t,l,rb) (-ENOSYS) +# define xfs_growfs_rt(mp,in) (-ENOSYS) +# define xfs_rtalloc_query_range(m,t,l,h,f,p) (-ENOSYS) +# define xfs_rtalloc_query_all(m,t,f,p) (-ENOSYS) +# define xfs_rtbuf_get(m,t,b,i,p) (-ENOSYS) +# define xfs_verify_rtbno(m, r) (false) +# define xfs_rtalloc_extent_is_free(m,t,s,l,i) (-ENOSYS) +# define xfs_rtalloc_reinit_frextents(m) (0) static inline int /* error */ xfs_rtmount_init( xfs_mount_t *mp) /* file system mount structure */ @@ -162,7 +162,7 @@ xfs_rtmount_init( xfs_warn(mp, "Not built with CONFIG_XFS_RT"); return -ENOSYS; } -# define xfs_rtmount_inodes(m) (((mp)->m_sb.sb_rblocks == 0)? 0 : (ENOSYS)) +# define xfs_rtmount_inodes(m) (((mp)->m_sb.sb_rblocks == 0)? 0 : (-ENOSYS)) # define xfs_rtunmount_inodes(m) #endif /* CONFIG_XFS_RT */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit ddd98076d5c075c8a6c49d9e6e8ee12844137f23 upstream.
The unit conversions in this function do not make sense. First we convert a block count to bytes, then divide that bytes value by rextsize, which is in blocks, to get an rt extent count. You can't divide bytes by blocks to get a (possibly multiblock) extent value.
Fortunately nobody uses delalloc on the rt volume so this hasn't mattered.
Fixes: fa5c836ca8eb5 ("xfs: refactor xfs_bunmapi_cow") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/libxfs/xfs_bmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 26bfa34b4bbf..617cc7e78e38 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4827,7 +4827,7 @@ xfs_bmap_del_extent_delay( ASSERT(got_endoff >= del_endoff);
if (isrt) { - uint64_t rtexts = XFS_FSB_TO_B(mp, del->br_blockcount); + uint64_t rtexts = del->br_blockcount;
do_div(rtexts, mp->m_sb.sb_rextsize); xfs_mod_frextents(mp, rtexts);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit f6a2dae2a1f52ea23f649c02615d073beba4cc35 upstream.
In commit 2a6ca4baed62, we tried to fix an overflow problem in the realtime allocator that was caused by an overly large maxlen value causing xfs_rtcheck_range to run off the end of the realtime bitmap. Unfortunately, there is a subtle bug here -- maxlen (and minlen) both have to be aligned with @prod, but @prod can be larger than 1 if the user has set an extent size hint on the file, and that extent size hint is larger than the realtime extent size.
If the rt free space extents are not aligned to this file's extszhint because other files without extent size hints allocated space (or the number of rt extents is similarly not aligned), then it's possible that maxlen after clamping to sb_rextents will no longer be aligned to prod. The allocation will succeed just fine, but we still trip the assertion.
Fix the problem by reducing maxlen by any misalignment with prod. While we're at it, split the assertions into two so that we can tell which value had the bad alignment.
Fixes: 2a6ca4baed62 ("xfs: make sure the rt allocator doesn't run off the end") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_rtalloc.c | 31 ++++++++++++++++++++++++++----- 1 file changed, 26 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 31fd65b3aaa9..0e4e2df08aed 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -211,6 +211,23 @@ xfs_rtallocate_range( return error; }
+/* + * Make sure we don't run off the end of the rt volume. Be careful that + * adjusting maxlen downwards doesn't cause us to fail the alignment checks. + */ +static inline xfs_extlen_t +xfs_rtallocate_clamp_len( + struct xfs_mount *mp, + xfs_rtblock_t startrtx, + xfs_extlen_t rtxlen, + xfs_extlen_t prod) +{ + xfs_extlen_t ret; + + ret = min(mp->m_sb.sb_rextents, startrtx + rtxlen) - startrtx; + return rounddown(ret, prod); +} + /* * Attempt to allocate an extent minlen<=len<=maxlen starting from * bitmap block bbno. If we don't get maxlen then use prod to trim @@ -248,7 +265,7 @@ xfs_rtallocate_extent_block( i <= end; i++) { /* Make sure we don't scan off the end of the rt volume. */ - maxlen = min(mp->m_sb.sb_rextents, i + maxlen) - i; + maxlen = xfs_rtallocate_clamp_len(mp, i, maxlen, prod);
/* * See if there's a free extent of maxlen starting at i. @@ -355,7 +372,8 @@ xfs_rtallocate_extent_exact( int isfree; /* extent is free */ xfs_rtblock_t next; /* next block to try (dummy) */
- ASSERT(minlen % prod == 0 && maxlen % prod == 0); + ASSERT(minlen % prod == 0); + ASSERT(maxlen % prod == 0); /* * Check if the range in question (for maxlen) is free. */ @@ -438,7 +456,9 @@ xfs_rtallocate_extent_near( xfs_rtblock_t n; /* next block to try */ xfs_rtblock_t r; /* result block */
- ASSERT(minlen % prod == 0 && maxlen % prod == 0); + ASSERT(minlen % prod == 0); + ASSERT(maxlen % prod == 0); + /* * If the block number given is off the end, silently set it to * the last block. @@ -447,7 +467,7 @@ xfs_rtallocate_extent_near( bno = mp->m_sb.sb_rextents - 1;
/* Make sure we don't run off the end of the rt volume. */ - maxlen = min(mp->m_sb.sb_rextents, bno + maxlen) - bno; + maxlen = xfs_rtallocate_clamp_len(mp, bno, maxlen, prod); if (maxlen < minlen) { *rtblock = NULLRTBLOCK; return 0; @@ -638,7 +658,8 @@ xfs_rtallocate_extent_size( xfs_rtblock_t r; /* result block number */ xfs_suminfo_t sum; /* summary information for extents */
- ASSERT(minlen % prod == 0 && maxlen % prod == 0); + ASSERT(minlen % prod == 0); + ASSERT(maxlen % prod == 0); ASSERT(maxlen != 0);
/*
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cheng Lin cheng.lin130@zte.com.cn
commit 2b99e410b28f5a75ae417e6389e767c7745d6fce upstream.
When abnormal drop_nlink are detected on the inode, return error, to avoid corruption propagation.
Signed-off-by: Cheng Lin cheng.lin130@zte.com.cn Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_inode.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 4d55f58d99b7..fb85c5c81745 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -918,6 +918,13 @@ xfs_droplink( xfs_trans_t *tp, xfs_inode_t *ip) { + if (VFS_I(ip)->i_nlink == 0) { + xfs_alert(ip->i_mount, + "%s: Attempt to drop inode (%llu) with nlink zero.", + __func__, ip->i_ino); + return -EFSCORRUPTED; + } + xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
drop_nlink(VFS_I(ip));
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit 35dc55b9e80cb9ec4bcb969302000b002b2ed850 upstream.
If xfs_bmapi_write finds a delalloc extent at the requested range, it tries to convert the entire delalloc extent to a real allocation.
But if the allocator cannot find a single free extent large enough to cover the start block of the requested range, xfs_bmapi_write will return 0 but leave *nimaps set to 0.
In that case we simply need to keep looping with the same startoffset_fsb so that one of the following allocations will eventually reach the requested range.
Note that this could affect any caller of xfs_bmapi_write that covers an existing delayed allocation. As far as I can tell we do not have any other such caller, though - the regular writeback path uses xfs_bmapi_convert_delalloc to convert delayed allocations to real ones, and direct I/O invalidates the page cache first.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_bmap_util.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index fcefab687285..ad4aba5002c1 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -780,12 +780,10 @@ xfs_alloc_file_space( { xfs_mount_t *mp = ip->i_mount; xfs_off_t count; - xfs_filblks_t allocated_fsb; xfs_filblks_t allocatesize_fsb; xfs_extlen_t extsz, temp; xfs_fileoff_t startoffset_fsb; xfs_fileoff_t endoffset_fsb; - int nimaps; int rt; xfs_trans_t *tp; xfs_bmbt_irec_t imaps[1], *imapp; @@ -808,7 +806,6 @@ xfs_alloc_file_space(
count = len; imapp = &imaps[0]; - nimaps = 1; startoffset_fsb = XFS_B_TO_FSBT(mp, offset); endoffset_fsb = XFS_B_TO_FSB(mp, offset + count); allocatesize_fsb = endoffset_fsb - startoffset_fsb; @@ -819,6 +816,7 @@ xfs_alloc_file_space( while (allocatesize_fsb && !error) { xfs_fileoff_t s, e; unsigned int dblocks, rblocks, resblks; + int nimaps = 1;
/* * Determine space reservations for data/realtime. @@ -884,15 +882,19 @@ xfs_alloc_file_space( if (error) break;
- allocated_fsb = imapp->br_blockcount; - - if (nimaps == 0) { - error = -ENOSPC; - break; + /* + * If the allocator cannot find a single free extent large + * enough to cover the start block of the requested range, + * xfs_bmapi_write will return 0 but leave *nimaps set to 0. + * + * In that case we simply need to keep looping with the same + * startoffset_fsb so that one of the following allocations + * will eventually reach the requested range. + */ + if (nimaps) { + startoffset_fsb += imapp->br_blockcount; + allocatesize_fsb -= imapp->br_blockcount; } - - startoffset_fsb += allocated_fsb; - allocatesize_fsb -= allocated_fsb; }
return error;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Catherine Hoang catherine.hoang@oracle.com
commit 14a537983b228cb050ceca3a5b743d01315dc4aa upstream.
One of our VM cluster management products needs to snapshot KVM image files so that they can be restored in case of failure. Snapshotting is done by redirecting VM disk writes to a sidecar file and using reflink on the disk image, specifically the FICLONE ioctl as used by "cp --reflink". Reflink locks the source and destination files while it operates, which means that reads from the main vm disk image are blocked, causing the vm to stall. When an image file is heavily fragmented, the copy process could take several minutes. Some of the vm image files have 50-100 million extent records, and duplicating that much metadata locks the file for 30 minutes or more. Having activities suspended for such a long time in a cluster node could result in node eviction.
Clone operations and read IO do not change any data in the source file, so they should be able to run concurrently. Demote the exclusive locks taken by FICLONE to shared locks to allow reads while cloning. While a clone is in progress, writes will take the IOLOCK_EXCL, so they block until the clone completes.
Link: https://lore.kernel.org/linux-xfs/8911B94D-DD29-4D6E-B5BC-32EAF1866245@oracl... Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Reviewed-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chandan Babu R chandanbabu@kernel.org Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_file.c | 63 +++++++++++++++++++++++++++++++++++--------- fs/xfs/xfs_inode.c | 17 ++++++++++++ fs/xfs/xfs_inode.h | 9 +++++++ fs/xfs/xfs_reflink.c | 4 +++ 4 files changed, 80 insertions(+), 13 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 203700278ddb..e33e5e13b95f 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -214,6 +214,43 @@ xfs_ilock_iocb( return 0; }
+static int +xfs_ilock_iocb_for_write( + struct kiocb *iocb, + unsigned int *lock_mode) +{ + ssize_t ret; + struct xfs_inode *ip = XFS_I(file_inode(iocb->ki_filp)); + + ret = xfs_ilock_iocb(iocb, *lock_mode); + if (ret) + return ret; + + if (*lock_mode == XFS_IOLOCK_EXCL) + return 0; + if (!xfs_iflags_test(ip, XFS_IREMAPPING)) + return 0; + + xfs_iunlock(ip, *lock_mode); + *lock_mode = XFS_IOLOCK_EXCL; + return xfs_ilock_iocb(iocb, *lock_mode); +} + +static unsigned int +xfs_ilock_for_write_fault( + struct xfs_inode *ip) +{ + /* get a shared lock if no remapping in progress */ + xfs_ilock(ip, XFS_MMAPLOCK_SHARED); + if (!xfs_iflags_test(ip, XFS_IREMAPPING)) + return XFS_MMAPLOCK_SHARED; + + /* wait for remapping to complete */ + xfs_iunlock(ip, XFS_MMAPLOCK_SHARED); + xfs_ilock(ip, XFS_MMAPLOCK_EXCL); + return XFS_MMAPLOCK_EXCL; +} + STATIC ssize_t xfs_file_dio_read( struct kiocb *iocb, @@ -551,7 +588,7 @@ xfs_file_dio_write_aligned( unsigned int iolock = XFS_IOLOCK_SHARED; ssize_t ret;
- ret = xfs_ilock_iocb(iocb, iolock); + ret = xfs_ilock_iocb_for_write(iocb, &iolock); if (ret) return ret; ret = xfs_file_write_checks(iocb, from, &iolock); @@ -618,7 +655,7 @@ xfs_file_dio_write_unaligned( flags = IOMAP_DIO_FORCE_WAIT; }
- ret = xfs_ilock_iocb(iocb, iolock); + ret = xfs_ilock_iocb_for_write(iocb, &iolock); if (ret) return ret;
@@ -1180,7 +1217,7 @@ xfs_file_remap_range( if (xfs_file_sync_writes(file_in) || xfs_file_sync_writes(file_out)) xfs_log_force_inode(dest); out_unlock: - xfs_iunlock2_io_mmap(src, dest); + xfs_iunlock2_remapping(src, dest); if (ret) trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_); return remapped > 0 ? remapped : ret; @@ -1328,6 +1365,7 @@ __xfs_filemap_fault( struct inode *inode = file_inode(vmf->vma->vm_file); struct xfs_inode *ip = XFS_I(inode); vm_fault_t ret; + unsigned int lock_mode = 0;
trace_xfs_filemap_fault(ip, order, write_fault);
@@ -1336,25 +1374,24 @@ __xfs_filemap_fault( file_update_time(vmf->vma->vm_file); }
+ if (IS_DAX(inode) || write_fault) + lock_mode = xfs_ilock_for_write_fault(XFS_I(inode)); + if (IS_DAX(inode)) { pfn_t pfn;
- xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); ret = xfs_dax_fault(vmf, order, write_fault, &pfn); if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, order, pfn); - xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED); + } else if (write_fault) { + ret = iomap_page_mkwrite(vmf, &xfs_page_mkwrite_iomap_ops); } else { - if (write_fault) { - xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); - ret = iomap_page_mkwrite(vmf, - &xfs_page_mkwrite_iomap_ops); - xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED); - } else { - ret = filemap_fault(vmf); - } + ret = filemap_fault(vmf); }
+ if (lock_mode) + xfs_iunlock(XFS_I(inode), lock_mode); + if (write_fault) sb_end_pagefault(inode->i_sb); return ret; diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index fb85c5c81745..f9d29acd72b9 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -3628,6 +3628,23 @@ xfs_iunlock2_io_mmap( inode_unlock(VFS_I(ip1)); }
+/* Drop the MMAPLOCK and the IOLOCK after a remap completes. */ +void +xfs_iunlock2_remapping( + struct xfs_inode *ip1, + struct xfs_inode *ip2) +{ + xfs_iflags_clear(ip1, XFS_IREMAPPING); + + if (ip1 != ip2) + xfs_iunlock(ip1, XFS_MMAPLOCK_SHARED); + xfs_iunlock(ip2, XFS_MMAPLOCK_EXCL); + + if (ip1 != ip2) + inode_unlock_shared(VFS_I(ip1)); + inode_unlock(VFS_I(ip2)); +} + /* * Reload the incore inode list for this inode. Caller should ensure that * the link count cannot change, either by taking ILOCK_SHARED or otherwise diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 0c5bdb91152e..3dc47937da5d 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -347,6 +347,14 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip) /* Quotacheck is running but inode has not been added to quota counts. */ #define XFS_IQUOTAUNCHECKED (1 << 14)
+/* + * Remap in progress. Callers that wish to update file data while + * holding a shared IOLOCK or MMAPLOCK must drop the lock and retake + * the lock in exclusive mode. Relocking the file will block until + * IREMAPPING is cleared. + */ +#define XFS_IREMAPPING (1U << 15) + /* All inode state flags related to inode reclaim. */ #define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \ XFS_IRECLAIM | \ @@ -595,6 +603,7 @@ void xfs_end_io(struct work_struct *work);
int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2); void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2); +void xfs_iunlock2_remapping(struct xfs_inode *ip1, struct xfs_inode *ip2);
static inline bool xfs_inode_unlinked_incomplete( diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index eb9102453aff..658edee8381d 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1540,6 +1540,10 @@ xfs_reflink_remap_prep( if (ret) goto out_unlock;
+ xfs_iflags_set(src, XFS_IREMAPPING); + if (inode_in != inode_out) + xfs_ilock_demote(src, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL); + return 0; out_unlock: xfs_iunlock2_io_mmap(src, dest);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Long Li leo.lilong@huawei.com
commit 2a5db859c6825b5d50377dda9c3cc729c20cad43 upstream.
Factor out xfs_defer_pending_abort() from xfs_defer_trans_abort(), which not use transaction parameter, so it can be used after the transaction life cycle.
Signed-off-by: Long Li leo.lilong@huawei.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/libxfs/xfs_defer.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index bcfb6a4203cd..88388e12f8e7 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -245,21 +245,18 @@ xfs_defer_create_intents( return ret; }
-/* Abort all the intents that were committed. */ STATIC void -xfs_defer_trans_abort( - struct xfs_trans *tp, - struct list_head *dop_pending) +xfs_defer_pending_abort( + struct xfs_mount *mp, + struct list_head *dop_list) { struct xfs_defer_pending *dfp; const struct xfs_defer_op_type *ops;
- trace_xfs_defer_trans_abort(tp, _RET_IP_); - /* Abort intent items that don't have a done item. */ - list_for_each_entry(dfp, dop_pending, dfp_list) { + list_for_each_entry(dfp, dop_list, dfp_list) { ops = defer_op_types[dfp->dfp_type]; - trace_xfs_defer_pending_abort(tp->t_mountp, dfp); + trace_xfs_defer_pending_abort(mp, dfp); if (dfp->dfp_intent && !dfp->dfp_done) { ops->abort_intent(dfp->dfp_intent); dfp->dfp_intent = NULL; @@ -267,6 +264,16 @@ xfs_defer_trans_abort( } }
+/* Abort all the intents that were committed. */ +STATIC void +xfs_defer_trans_abort( + struct xfs_trans *tp, + struct list_head *dop_pending) +{ + trace_xfs_defer_trans_abort(tp, _RET_IP_); + xfs_defer_pending_abort(tp->t_mountp, dop_pending); +} + /* * Capture resources that the caller said not to release ("held") when the * transaction commits. Caller is responsible for zero-initializing @dres.
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Long Li leo.lilong@huawei.com
commit f8f9d952e42dd49ae534f61f2fa7ca0876cb9848 upstream.
When recovering intents, we capture newly created intent items as part of committing recovered intent items. If intent recovery fails at a later point, we forget to remove those newly created intent items from the AIL and hang:
[root@localhost ~]# cat /proc/539/stack [<0>] xfs_ail_push_all_sync+0x174/0x230 [<0>] xfs_unmount_flush_inodes+0x8d/0xd0 [<0>] xfs_mountfs+0x15f7/0x1e70 [<0>] xfs_fs_fill_super+0x10ec/0x1b20 [<0>] get_tree_bdev+0x3c8/0x730 [<0>] vfs_get_tree+0x89/0x2c0 [<0>] path_mount+0xecf/0x1800 [<0>] do_mount+0xf3/0x110 [<0>] __x64_sys_mount+0x154/0x1f0 [<0>] do_syscall_64+0x39/0x80 [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
When newly created intent items fail to commit via transaction, intent recovery hasn't created done items for these newly created intent items, so the capture structure is the sole owner of the captured intent items. We must release them explicitly or else they leak:
unreferenced object 0xffff888016719108 (size 432): comm "mount", pid 529, jiffies 4294706839 (age 144.463s) hex dump (first 32 bytes): 08 91 71 16 80 88 ff ff 08 91 71 16 80 88 ff ff ..q.......q..... 18 91 71 16 80 88 ff ff 18 91 71 16 80 88 ff ff ..q.......q..... backtrace: [<ffffffff8230c68f>] xfs_efi_init+0x18f/0x1d0 [<ffffffff8230c720>] xfs_extent_free_create_intent+0x50/0x150 [<ffffffff821b671a>] xfs_defer_create_intents+0x16a/0x340 [<ffffffff821bac3e>] xfs_defer_ops_capture_and_commit+0x8e/0xad0 [<ffffffff82322bb9>] xfs_cui_item_recover+0x819/0x980 [<ffffffff823289b6>] xlog_recover_process_intents+0x246/0xb70 [<ffffffff8233249a>] xlog_recover_finish+0x8a/0x9a0 [<ffffffff822eeafb>] xfs_log_mount_finish+0x2bb/0x4a0 [<ffffffff822c0f4f>] xfs_mountfs+0x14bf/0x1e70 [<ffffffff822d1f80>] xfs_fs_fill_super+0x10d0/0x1b20 [<ffffffff81a21fa2>] get_tree_bdev+0x3d2/0x6d0 [<ffffffff81a1ee09>] vfs_get_tree+0x89/0x2c0 [<ffffffff81a9f35f>] path_mount+0xecf/0x1800 [<ffffffff81a9fd83>] do_mount+0xf3/0x110 [<ffffffff81aa00e4>] __x64_sys_mount+0x154/0x1f0 [<ffffffff83968739>] do_syscall_64+0x39/0x80
Fix the problem above by abort intent items that don't have a done item when recovery intents fail.
Fixes: e6fff81e4870 ("xfs: proper replay of deferred ops queued during log recovery") Signed-off-by: Long Li leo.lilong@huawei.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/libxfs/xfs_defer.c | 5 +++-- fs/xfs/libxfs/xfs_defer.h | 2 +- fs/xfs/xfs_log_recover.c | 2 +- 3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index 88388e12f8e7..f71679ce23b9 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -763,12 +763,13 @@ xfs_defer_ops_capture(
/* Release all resources that we used to capture deferred ops. */ void -xfs_defer_ops_capture_free( +xfs_defer_ops_capture_abort( struct xfs_mount *mp, struct xfs_defer_capture *dfc) { unsigned short i;
+ xfs_defer_pending_abort(mp, &dfc->dfc_dfops); xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
for (i = 0; i < dfc->dfc_held.dr_bufs; i++) @@ -809,7 +810,7 @@ xfs_defer_ops_capture_and_commit( /* Commit the transaction and add the capture structure to the list. */ error = xfs_trans_commit(tp); if (error) { - xfs_defer_ops_capture_free(mp, dfc); + xfs_defer_ops_capture_abort(mp, dfc); return error; }
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h index 114a3a4930a3..8788ad5f6a73 100644 --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -121,7 +121,7 @@ int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp, struct list_head *capture_list); void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp, struct xfs_defer_resources *dres); -void xfs_defer_ops_capture_free(struct xfs_mount *mp, +void xfs_defer_ops_capture_abort(struct xfs_mount *mp, struct xfs_defer_capture *d); void xfs_defer_resources_rele(struct xfs_defer_resources *dres);
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 13b94d2e605b..a1e18b24971a 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -2511,7 +2511,7 @@ xlog_abort_defer_ops(
list_for_each_entry_safe(dfc, next, capture_list, dfc_list) { list_del_init(&dfc->dfc_list); - xfs_defer_ops_capture_free(mp, dfc); + xfs_defer_ops_capture_abort(mp, dfc); } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit 55f669f34184ecb25b8353f29c7f6f1ae5b313d1 upstream.
xfs_reflink_end_cow_extent looks up the COW extent and the data fork extent at offset_fsb, and then proceeds to remap the common subset between the two.
It does however not limit the remapped extent to the passed in [*offset_fsbm end_fsb] range and thus potentially remaps more blocks than the one handled by the current I/O completion. This means that with sufficiently large data and COW extents we could be remapping COW fork mappings that have not been written to, leading to a stale data exposure on a powerfail event.
We use to have a xfs_trim_range to make the remap fit the I/O completion range, but that got (apparently accidentally) removed in commit df2fd88f8ac7 ("xfs: rewrite xfs_reflink_end_cow to use intents").
Note that I've only found this by code inspection, and a test case would probably require very specific delay and error injection.
Fixes: df2fd88f8ac7 ("xfs: rewrite xfs_reflink_end_cow to use intents") Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_reflink.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 658edee8381d..e5b62dc28466 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -784,6 +784,7 @@ xfs_reflink_end_cow_extent( } } del = got; + xfs_trim_extent(&del, *offset_fsb, end_fsb - *offset_fsb);
/* Grab the corresponding mapping in the data fork. */ nmaps = 1;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leah Rumancik leah.rumancik@gmail.com
commit 471de20303dda0b67981e06d59cc6c4a83fd2a3c upstream.
We flush the data device cache before we issue external log IO. If the flush fails, we shut down the log immediately and return. However, the iclog->ic_sema is left in a decremented state so let's add an up(). Prior to this patch, xfs/438 would fail consistently when running with an external log device:
sync -> xfs_log_force -> xlog_write_iclog -> down(&iclog->ic_sema) -> blkdev_issue_flush (fail causes us to intiate shutdown) -> xlog_force_shutdown -> return
unmount -> xfs_log_umount -> xlog_wait_iclog_completion -> down(&iclog->ic_sema) --------> HANG
There is a second early return / shutdown. Make sure the up() happens for it as well. Also make sure we cleanup the iclog state, xlog_state_done_syncing, before dropping the iclog lock.
Fixes: b5d721eaae47 ("xfs: external logs need to flush data device") Fixes: 842a42d126b4 ("xfs: shutdown on failure to add page to log bio") Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices") Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_log.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 51c100c86177..ee206facf0dc 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -1893,9 +1893,7 @@ xlog_write_iclog( * the buffer manually, the code needs to be kept in sync * with the I/O completion path. */ - xlog_state_done_syncing(iclog); - up(&iclog->ic_sema); - return; + goto sync; }
/* @@ -1925,20 +1923,17 @@ xlog_write_iclog( * avoid shutdown re-entering this path and erroring out again. */ if (log->l_targ != log->l_mp->m_ddev_targp && - blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) { - xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); - return; - } + blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) + goto shutdown; } if (iclog->ic_flags & XLOG_ICL_NEED_FUA) iclog->ic_bio.bi_opf |= REQ_FUA;
iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA);
- if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) { - xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); - return; - } + if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) + goto shutdown; + if (is_vmalloc_addr(iclog->ic_data)) flush_kernel_vmap_range(iclog->ic_data, count);
@@ -1959,6 +1954,12 @@ xlog_write_iclog( }
submit_bio(&iclog->ic_bio); + return; +shutdown: + xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); +sync: + xlog_state_done_syncing(iclog); + up(&iclog->ic_sema); }
/*
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Omar Sandoval osandov@fb.com
commit f63a5b3769ad7659da4c0420751d78958ab97675 upstream.
We've been seeing XFS errors like the following:
XFS: Internal error i != 1 at line 3526 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_btree_insert+0x1ec/0x280 ... Call Trace: xfs_corruption_error+0x94/0xa0 xfs_btree_insert+0x221/0x280 xfs_alloc_fixup_trees+0x104/0x3e0 xfs_alloc_ag_vextent_size+0x667/0x820 xfs_alloc_fix_freelist+0x5d9/0x750 xfs_free_extent_fix_freelist+0x65/0xa0 __xfs_free_extent+0x57/0x180 ...
This is the XFS_IS_CORRUPT() check in xfs_btree_insert() when xfs_btree_insrec() fails.
After converting this into a panic and dissecting the core dump, I found that xfs_btree_insrec() is failing because it's trying to split a leaf node in the cntbt when the AG free list is empty. In particular, it's failing to get a block from the AGFL _while trying to refill the AGFL_.
If a single operation splits every level of the bnobt and the cntbt (and the rmapbt if it is enabled) at once, the free list will be empty. Then, when the next operation tries to refill the free list, it allocates space. If the allocation does not use a full extent, it will need to insert records for the remaining space in the bnobt and cntbt. And if those new records go in full leaves, the leaves (and potentially more nodes up to the old root) need to be split.
Fix it by accounting for the additional splits that may be required to refill the free list in the calculation for the minimum free list size.
P.S. As far as I can tell, this bug has existed for a long time -- maybe back to xfs-history commit afdf80ae7405 ("Add XFS_AG_MAXLEVELS macros ...") in April 1994! It requires a very unlucky sequence of events, and in fact we didn't hit it until a particular sparse mmap workload updated from 5.12 to 5.19. But this bug existed in 5.12, so it must've been exposed by some other change in allocation or writeback patterns. It's also much less likely to be hit with the rmapbt enabled, since that increases the minimum free list size and is unlikely to split at the same time as the bnobt and cntbt.
Reviewed-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Omar Sandoval osandov@fb.com Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/libxfs/xfs_alloc.c | 27 ++++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 3069194527dd..100ab5931b31 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2275,16 +2275,37 @@ xfs_alloc_min_freelist(
ASSERT(mp->m_alloc_maxlevels > 0);
+ /* + * For a btree shorter than the maximum height, the worst case is that + * every level gets split and a new level is added, then while inserting + * another entry to refill the AGFL, every level under the old root gets + * split again. This is: + * + * (full height split reservation) + (AGFL refill split height) + * = (current height + 1) + (current height - 1) + * = (new height) + (new height - 2) + * = 2 * new height - 2 + * + * For a btree of maximum height, the worst case is that every level + * under the root gets split, then while inserting another entry to + * refill the AGFL, every level under the root gets split again. This is + * also: + * + * 2 * (current height - 1) + * = 2 * (new height - 1) + * = 2 * new height - 2 + */ + /* space needed by-bno freespace btree */ min_free = min_t(unsigned int, levels[XFS_BTNUM_BNOi] + 1, - mp->m_alloc_maxlevels); + mp->m_alloc_maxlevels) * 2 - 2; /* space needed by-size freespace btree */ min_free += min_t(unsigned int, levels[XFS_BTNUM_CNTi] + 1, - mp->m_alloc_maxlevels); + mp->m_alloc_maxlevels) * 2 - 2; /* space needed reverse mapping used space btree */ if (xfs_has_rmapbt(mp)) min_free += min_t(unsigned int, levels[XFS_BTNUM_RMAPi] + 1, - mp->m_rmap_maxlevels); + mp->m_rmap_maxlevels) * 2 - 2;
return min_free; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anthony Iliopoulos ailiop@suse.com
commit a2e4388adfa44684c7c428a5a5980efe0d75e13e upstream.
Commit 57c0f4a8ea3a attempted to fix the select in the kconfig entry XFS_ONLINE_SCRUB_STATS by selecting XFS_DEBUG, but the original intention was to select DEBUG_FS, since the feature relies on debugfs to export the related scrub statistics.
Fixes: 57c0f4a8ea3a ("xfs: fix select in config XFS_ONLINE_SCRUB_STATS")
Reported-by: Holger Hoffstätte holger@applied-asynchrony.com Signed-off-by: Anthony Iliopoulos ailiop@suse.com Reviewed-by: Dave Chinner dchinner@redhat.com Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig index ed0bc8cbc703..567fb37274d3 100644 --- a/fs/xfs/Kconfig +++ b/fs/xfs/Kconfig @@ -147,7 +147,7 @@ config XFS_ONLINE_SCRUB_STATS bool "XFS online metadata check usage data collection" default y depends on XFS_ONLINE_SCRUB - select XFS_DEBUG + select DEBUG_FS help If you say Y here, the kernel will gather usage data about the online metadata check subsystem. This includes the number
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
commit 038ca189c0d2c1570b4d922f25b524007c85cf94 upstream.
Discovered when trying to track down a weird recovery corruption issue that wasn't detected at recovery time.
The specific corruption was a zero extent count field when big extent counts are in use, and it turns out the dinode verifier doesn't detect that specific corruption case, either. So fix it too.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/libxfs/xfs_inode_buf.c | 3 +++ fs/xfs/xfs_inode_item_recover.c | 14 +++++++++++++- 2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index a35781577cad..0f970a0b3382 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -508,6 +508,9 @@ xfs_dinode_verify( if (mode && nextents + naextents > nblocks) return __this_address;
+ if (nextents + naextents == 0 && nblocks != 0) + return __this_address; + if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents) return __this_address;
diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index e6609067ef26..144198a6b270 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -286,6 +286,7 @@ xlog_recover_inode_commit_pass2( struct xfs_log_dinode *ldip; uint isize; int need_free = 0; + xfs_failaddr_t fa;
if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) { in_f = item->ri_buf[0].i_addr; @@ -530,8 +531,19 @@ xlog_recover_inode_commit_pass2( (dip->di_mode != 0)) error = xfs_recover_inode_owner_change(mp, dip, in_f, buffer_list); - /* re-generate the checksum. */ + /* re-generate the checksum and validate the recovered inode. */ xfs_dinode_calc_crc(log->l_mp, dip); + fa = xfs_dinode_verify(log->l_mp, in_f->ilf_ino, dip); + if (fa) { + XFS_CORRUPTION_ERROR( + "Bad dinode after recovery", + XFS_ERRLEVEL_LOW, mp, dip, sizeof(*dip)); + xfs_alert(mp, + "Metadata corruption detected at %pS, inode 0x%llx", + fa, in_f->ilf_ino); + error = -EFSCORRUPTED; + goto out_release; + }
ASSERT(bp->b_mount == mp); bp->b_flags |= _XBF_LOGRECOVERY;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit ed17f7da5f0c8b65b7b5f7c98beb0aadbc0546ee upstream.
Since the introduction of xfs_dqblk in V5, xfs really ought to find the dqblk pointer from the dquot buffer, then compute the xfs_disk_dquot pointer from the dqblk pointer. Fix the open-coded xfs_buf_offset calls and do the type checking in the correct order.
Note that this has made no practical difference since the start of the xfs_disk_dquot is coincident with the start of the xfs_dqblk.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_dquot.c | 5 +++-- fs/xfs/xfs_dquot_item_recover.c | 7 ++++--- 2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index ac6ba646624d..a013b87ab8d5 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -562,7 +562,8 @@ xfs_dquot_from_disk( struct xfs_dquot *dqp, struct xfs_buf *bp) { - struct xfs_disk_dquot *ddqp = bp->b_addr + dqp->q_bufoffset; + struct xfs_dqblk *dqb = xfs_buf_offset(bp, dqp->q_bufoffset); + struct xfs_disk_dquot *ddqp = &dqb->dd_diskdq;
/* * Ensure that we got the type and ID we were looking for. @@ -1250,7 +1251,7 @@ xfs_qm_dqflush( }
/* Flush the incore dquot to the ondisk buffer. */ - dqblk = bp->b_addr + dqp->q_bufoffset; + dqblk = xfs_buf_offset(bp, dqp->q_bufoffset); xfs_dquot_to_disk(&dqblk->dd_diskdq, dqp);
/* diff --git a/fs/xfs/xfs_dquot_item_recover.c b/fs/xfs/xfs_dquot_item_recover.c index 8966ba842395..db2cb5e4197b 100644 --- a/fs/xfs/xfs_dquot_item_recover.c +++ b/fs/xfs/xfs_dquot_item_recover.c @@ -65,6 +65,7 @@ xlog_recover_dquot_commit_pass2( { struct xfs_mount *mp = log->l_mp; struct xfs_buf *bp; + struct xfs_dqblk *dqb; struct xfs_disk_dquot *ddq, *recddq; struct xfs_dq_logformat *dq_f; xfs_failaddr_t fa; @@ -130,14 +131,14 @@ xlog_recover_dquot_commit_pass2( return error;
ASSERT(bp); - ddq = xfs_buf_offset(bp, dq_f->qlf_boffset); + dqb = xfs_buf_offset(bp, dq_f->qlf_boffset); + ddq = &dqb->dd_diskdq;
/* * If the dquot has an LSN in it, recover the dquot only if it's less * than the lsn of the transaction we are replaying. */ if (xfs_has_crc(mp)) { - struct xfs_dqblk *dqb = (struct xfs_dqblk *)ddq; xfs_lsn_t lsn = be64_to_cpu(dqb->dd_lsn);
if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) { @@ -147,7 +148,7 @@ xlog_recover_dquot_commit_pass2(
memcpy(ddq, recddq, item->ri_buf[1].i_len); if (xfs_has_crc(mp)) { - xfs_update_cksum((char *)ddq, sizeof(struct xfs_dqblk), + xfs_update_cksum((char *)dqb, sizeof(struct xfs_dqblk), XFS_DQUOT_CRC_OFF); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong djwong@kernel.org
commit 9c235dfc3d3f901fe22acb20f2ab37ff39f2ce02 upstream.
When we're recovering ondisk quota records from the log, we need to validate the recovered buffer contents before writing them to disk.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_dquot_item_recover.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/fs/xfs/xfs_dquot_item_recover.c b/fs/xfs/xfs_dquot_item_recover.c index db2cb5e4197b..2c2720ce6923 100644 --- a/fs/xfs/xfs_dquot_item_recover.c +++ b/fs/xfs/xfs_dquot_item_recover.c @@ -19,6 +19,7 @@ #include "xfs_log.h" #include "xfs_log_priv.h" #include "xfs_log_recover.h" +#include "xfs_error.h"
STATIC void xlog_recover_dquot_ra_pass2( @@ -152,6 +153,19 @@ xlog_recover_dquot_commit_pass2( XFS_DQUOT_CRC_OFF); }
+ /* Validate the recovered dquot. */ + fa = xfs_dqblk_verify(log->l_mp, dqb, dq_f->qlf_id); + if (fa) { + XFS_CORRUPTION_ERROR("Bad dquot after recovery", + XFS_ERRLEVEL_LOW, mp, dqb, + sizeof(struct xfs_dqblk)); + xfs_alert(mp, + "Metadata corruption detected at %pS, dquot 0x%x", + fa, dq_f->qlf_id); + error = -EFSCORRUPTED; + goto out_release; + } + ASSERT(dq_f->qlf_size == 2); ASSERT(bp->b_mount == mp); bp->b_flags |= _XBF_LOGRECOVERY;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit c421df0b19430417a04f68919fc3d1943d20ac04 upstream.
Introduce a local boolean variable if FS_XFLAG_REALTIME to make the checks for it more obvious, and de-densify a few of the conditionals using it to make them more readable while at it.
Signed-off-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20231025141020.192413-4-hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_ioctl.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 55bb01173cde..be69e7be713e 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1120,23 +1120,25 @@ xfs_ioctl_setattr_xflags( struct fileattr *fa) { struct xfs_mount *mp = ip->i_mount; + bool rtflag = (fa->fsx_xflags & FS_XFLAG_REALTIME); uint64_t i_flags2;
- /* Can't change realtime flag if any extents are allocated. */ - if ((ip->i_df.if_nextents || ip->i_delayed_blks) && - XFS_IS_REALTIME_INODE(ip) != (fa->fsx_xflags & FS_XFLAG_REALTIME)) - return -EINVAL; + if (rtflag != XFS_IS_REALTIME_INODE(ip)) { + /* Can't change realtime flag if any extents are allocated. */ + if (ip->i_df.if_nextents || ip->i_delayed_blks) + return -EINVAL; + }
- /* If realtime flag is set then must have realtime device */ - if (fa->fsx_xflags & FS_XFLAG_REALTIME) { + if (rtflag) { + /* If realtime flag is set then must have realtime device */ if (mp->m_sb.sb_rblocks == 0 || mp->m_sb.sb_rextsize == 0 || (ip->i_extsize % mp->m_sb.sb_rextsize)) return -EINVAL; - }
- /* Clear reflink if we are actually able to set the rt flag. */ - if ((fa->fsx_xflags & FS_XFLAG_REALTIME) && xfs_is_reflink_inode(ip)) - ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK; + /* Clear reflink if we are actually able to set the rt flag. */ + if (xfs_is_reflink_inode(ip)) + ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK; + }
/* diflags2 only valid for v3 inodes. */ i_flags2 = xfs_flags2diflags2(ip, fa->fsx_xflags);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig hch@lst.de
commit 9c04138414c00ae61421f36ada002712c4bac94a upstream.
Update the per-folio stable writes flag dependening on which device an inode resides on.
Signed-off-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20231025141020.192413-5-hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_inode.h | 8 ++++++++ fs/xfs/xfs_ioctl.c | 8 ++++++++ fs/xfs/xfs_iops.c | 7 +++++++ 3 files changed, 23 insertions(+)
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 3dc47937da5d..3beb470f1892 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -569,6 +569,14 @@ extern void xfs_setup_inode(struct xfs_inode *ip); extern void xfs_setup_iops(struct xfs_inode *ip); extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init);
+static inline void xfs_update_stable_writes(struct xfs_inode *ip) +{ + if (bdev_stable_writes(xfs_inode_buftarg(ip)->bt_bdev)) + mapping_set_stable_writes(VFS_I(ip)->i_mapping); + else + mapping_clear_stable_writes(VFS_I(ip)->i_mapping); +} + /* * When setting up a newly allocated inode, we need to call * xfs_finish_inode_setup() once the inode is fully instantiated at diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index be69e7be713e..535f6d38cdb5 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1149,6 +1149,14 @@ xfs_ioctl_setattr_xflags( ip->i_diflags2 = i_flags2;
xfs_diflags_to_iflags(ip, false); + + /* + * Make the stable writes flag match that of the device the inode + * resides on when flipping the RT flag. + */ + if (rtflag != XFS_IS_REALTIME_INODE(ip) && S_ISREG(VFS_I(ip)->i_mode)) + xfs_update_stable_writes(ip); + xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); XFS_STATS_INC(mp, xs_ig_attrchg); diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 2b3b05c28e9e..b8ec045708c3 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -1298,6 +1298,13 @@ xfs_setup_inode( gfp_mask = mapping_gfp_mask(inode->i_mapping); mapping_set_gfp_mask(inode->i_mapping, (gfp_mask & ~(__GFP_FS)));
+ /* + * For real-time inodes update the stable write flags to that of the RT + * device instead of the data device. + */ + if (S_ISREG(inode->i_mode) && XFS_IS_REALTIME_INODE(ip)) + xfs_update_stable_writes(ip); + /* * If there is no attribute fork no ACL can exist on this inode, * and it can't have any file capabilities attached to it either.
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuogee Hsieh quic_khsieh@quicinc.com
[ Upstream commit 77e8aad5519e04f6c1e132aaec1c5f8faf41844f ]
Since the value of DP_TEST_BIT_DEPTH_8 is already left shifted, in the BPC unknown case, the additional shift causes spill over to the other bits of the [DP_CONFIGURATION_CTRL] register. Fix this by changing the return value of dp_link_get_test_bits_depth() in the BPC unknown case to (DP_TEST_BIT_DEPTH_8 >> DP_TEST_BIT_DEPTH_SHIFT).
Fixes: c943b4948b58 ("drm/msm/dp: add displayPort driver support") Signed-off-by: Kuogee Hsieh quic_khsieh@quicinc.com Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Patchwork: https://patchwork.freedesktop.org/patch/573989/ Link: https://lore.kernel.org/r/1704917931-30133-1-git-send-email-quic_khsieh@quic... [quic_abhinavk@quicinc.com: fix minor checkpatch warning to align with opening braces] Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/dp/dp_ctrl.c | 5 ----- drivers/gpu/drm/msm/dp/dp_link.c | 10 +++++++--- 2 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c index 77a8d9366ed7..fb588fde298a 100644 --- a/drivers/gpu/drm/msm/dp/dp_ctrl.c +++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c @@ -135,11 +135,6 @@ static void dp_ctrl_config_ctrl(struct dp_ctrl_private *ctrl) tbd = dp_link_get_test_bits_depth(ctrl->link, ctrl->panel->dp_mode.bpp);
- if (tbd == DP_TEST_BIT_DEPTH_UNKNOWN) { - pr_debug("BIT_DEPTH not set. Configure default\n"); - tbd = DP_TEST_BIT_DEPTH_8; - } - config |= tbd << DP_CONFIGURATION_CTRL_BPC_SHIFT;
/* Num of Lanes */ diff --git a/drivers/gpu/drm/msm/dp/dp_link.c b/drivers/gpu/drm/msm/dp/dp_link.c index 6375daaeb98e..487867979557 100644 --- a/drivers/gpu/drm/msm/dp/dp_link.c +++ b/drivers/gpu/drm/msm/dp/dp_link.c @@ -1211,6 +1211,9 @@ void dp_link_reset_phy_params_vx_px(struct dp_link *dp_link) u32 dp_link_get_test_bits_depth(struct dp_link *dp_link, u32 bpp) { u32 tbd; + struct dp_link_private *link; + + link = container_of(dp_link, struct dp_link_private, dp_link);
/* * Few simplistic rules and assumptions made here: @@ -1228,12 +1231,13 @@ u32 dp_link_get_test_bits_depth(struct dp_link *dp_link, u32 bpp) tbd = DP_TEST_BIT_DEPTH_10; break; default: - tbd = DP_TEST_BIT_DEPTH_UNKNOWN; + drm_dbg_dp(link->drm_dev, "bpp=%d not supported, use bpc=8\n", + bpp); + tbd = DP_TEST_BIT_DEPTH_8; break; }
- if (tbd != DP_TEST_BIT_DEPTH_UNKNOWN) - tbd = (tbd >> DP_TEST_BIT_DEPTH_SHIFT); + tbd = (tbd >> DP_TEST_BIT_DEPTH_SHIFT);
return tbd; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuogee Hsieh quic_khsieh@quicinc.com
[ Upstream commit fcccdafd91f8bdde568b86ff70848cf83f029add ]
MSA MISC0 bit 1 to 7 contains Colorimetry Indicator Field. dp_link_get_colorimetry_config() returns wrong colorimetry value in the DP_TEST_DYNAMIC_RANGE_CEA case in the current implementation. Hence fix this problem by having dp_link_get_colorimetry_config() return defined CEA RGB colorimetry value in the case of DP_TEST_DYNAMIC_RANGE_CEA.
Changes in V2: -- drop retrieving colorimetry from colorspace -- drop dr = link->dp_link.test_video.test_dyn_range assignment
Changes in V3: -- move defined MISCr0a Colorimetry vale to dp_reg.h -- rewording commit title -- rewording commit text to more precise describe this patch
Fixes: c943b4948b58 ("drm/msm/dp: add displayPort driver support") Signed-off-by: Kuogee Hsieh quic_khsieh@quicinc.com Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Patchwork: https://patchwork.freedesktop.org/patch/574888/ Link: https://lore.kernel.org/r/1705526010-597-1-git-send-email-quic_khsieh@quicin... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/dp/dp_link.c | 12 +++++++----- drivers/gpu/drm/msm/dp/dp_reg.h | 3 +++ 2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/msm/dp/dp_link.c b/drivers/gpu/drm/msm/dp/dp_link.c index 487867979557..25950171caf3 100644 --- a/drivers/gpu/drm/msm/dp/dp_link.c +++ b/drivers/gpu/drm/msm/dp/dp_link.c @@ -7,6 +7,7 @@
#include <drm/drm_print.h>
+#include "dp_reg.h" #include "dp_link.h" #include "dp_panel.h"
@@ -1114,7 +1115,7 @@ int dp_link_process_request(struct dp_link *dp_link)
int dp_link_get_colorimetry_config(struct dp_link *dp_link) { - u32 cc; + u32 cc = DP_MISC0_COLORIMERY_CFG_LEGACY_RGB; struct dp_link_private *link;
if (!dp_link) { @@ -1128,10 +1129,11 @@ int dp_link_get_colorimetry_config(struct dp_link *dp_link) * Unless a video pattern CTS test is ongoing, use RGB_VESA * Only RGB_VESA and RGB_CEA supported for now */ - if (dp_link_is_video_pattern_requested(link)) - cc = link->dp_link.test_video.test_dyn_range; - else - cc = DP_TEST_DYNAMIC_RANGE_VESA; + if (dp_link_is_video_pattern_requested(link)) { + if (link->dp_link.test_video.test_dyn_range & + DP_TEST_DYNAMIC_RANGE_CEA) + cc = DP_MISC0_COLORIMERY_CFG_CEA_RGB; + }
return cc; } diff --git a/drivers/gpu/drm/msm/dp/dp_reg.h b/drivers/gpu/drm/msm/dp/dp_reg.h index ea85a691e72b..78785ed4b40c 100644 --- a/drivers/gpu/drm/msm/dp/dp_reg.h +++ b/drivers/gpu/drm/msm/dp/dp_reg.h @@ -143,6 +143,9 @@ #define DP_MISC0_COLORIMETRY_CFG_SHIFT (0x00000001) #define DP_MISC0_TEST_BITS_DEPTH_SHIFT (0x00000005)
+#define DP_MISC0_COLORIMERY_CFG_LEGACY_RGB (0) +#define DP_MISC0_COLORIMERY_CFG_CEA_RGB (0x04) + #define REG_DP_VALID_BOUNDARY (0x00000030) #define REG_DP_VALID_BOUNDARY_2 (0x00000034)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Abhinav Kumar quic_abhinavk@quicinc.com
[ Upstream commit 7f3d03c48b1eb6bc45ab20ca98b8b11be25f9f52 ]
The commit 8b45a26f2ba9 ("drm/msm/dpu: reserve cdm blocks for writeback in case of YUV output") introduced a smatch warning about another conditional block in dpu_encoder_helper_phys_cleanup() which had assumed hw_pp will always be valid which may not necessarily be true.
Lets fix the other conditional block by making sure hw_pp is valid before dereferencing it.
Reported-by: Dan Carpenter dan.carpenter@linaro.org Fixes: ae4d721ce100 ("drm/msm/dpu: add an API to reset the encoder related hw blocks") Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Patchwork: https://patchwork.freedesktop.org/patch/574878/ Link: https://lore.kernel.org/r/20240117194109.21609-1-quic_abhinavk@quicinc.com Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c index 7d4cf81fd31c..ca4e5eae8e06 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c @@ -2063,7 +2063,7 @@ void dpu_encoder_helper_phys_cleanup(struct dpu_encoder_phys *phys_enc) }
/* reset the merge 3D HW block */ - if (phys_enc->hw_pp->merge_3d) { + if (phys_enc->hw_pp && phys_enc->hw_pp->merge_3d) { phys_enc->hw_pp->merge_3d->ops.setup_3d_mode(phys_enc->hw_pp->merge_3d, BLEND_3D_NONE); if (phys_enc->hw_ctl->ops.update_pending_flush_merge_3d) @@ -2085,7 +2085,7 @@ void dpu_encoder_helper_phys_cleanup(struct dpu_encoder_phys *phys_enc) if (phys_enc->hw_wb) intf_cfg.wb = phys_enc->hw_wb->idx;
- if (phys_enc->hw_pp->merge_3d) + if (phys_enc->hw_pp && phys_enc->hw_pp->merge_3d) intf_cfg.merge_3d = phys_enc->hw_pp->merge_3d->idx;
if (ctl->ops.reset_intf_cfg)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel ardb@kernel.org
[ Upstream commit a7a6a01f88e87dec4bf2365571dd2dc7403d52d0 ]
The recently introduced EFI memory attributes protocol should be used if it exists to ensure that the memory allocation created for the kernel permits execution. This is needed for compatibility with tightened requirements related to Windows logo certification for x86 PCs.
Currently, we simply strip the execute protect (XP) attribute from the entire range, but this might be rejected under some firmware security policies, and so in a subsequent patch, this will be changed to only strip XP from the executable region that runs early, and make it read-only (RO) as well.
In order to catch any issues early, ensure that the memory attribute protocol works as intended, and give up if it produces spurious errors.
Note that the DXE services based fallback was always based on best effort, so don't propagate any errors returned by that API.
Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/libstub/x86-stub.c | 24 ++++++++++++++---------- drivers/firmware/efi/libstub/x86-stub.h | 4 ++-- 2 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index 70b325a2f1f3..5d0934ae7dc8 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -223,8 +223,8 @@ static void retrieve_apple_device_properties(struct boot_params *boot_params) } }
-void efi_adjust_memory_range_protection(unsigned long start, - unsigned long size) +efi_status_t efi_adjust_memory_range_protection(unsigned long start, + unsigned long size) { efi_status_t status; efi_gcd_memory_space_desc_t desc; @@ -236,13 +236,17 @@ void efi_adjust_memory_range_protection(unsigned long start, rounded_end = roundup(start + size, EFI_PAGE_SIZE);
if (memattr != NULL) { - efi_call_proto(memattr, clear_memory_attributes, rounded_start, - rounded_end - rounded_start, EFI_MEMORY_XP); - return; + status = efi_call_proto(memattr, clear_memory_attributes, + rounded_start, + rounded_end - rounded_start, + EFI_MEMORY_XP); + if (status != EFI_SUCCESS) + efi_warn("Failed to clear EFI_MEMORY_XP attribute\n"); + return status; }
if (efi_dxe_table == NULL) - return; + return EFI_SUCCESS;
/* * Don't modify memory region attributes, they are @@ -255,7 +259,7 @@ void efi_adjust_memory_range_protection(unsigned long start, status = efi_dxe_call(get_memory_space_descriptor, start, &desc);
if (status != EFI_SUCCESS) - return; + break;
next = desc.base_address + desc.length;
@@ -280,8 +284,10 @@ void efi_adjust_memory_range_protection(unsigned long start, unprotect_start, unprotect_start + unprotect_size, status); + break; } } + return EFI_SUCCESS; }
static void setup_unaccepted_memory(void) @@ -837,9 +843,7 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
*kernel_entry = addr + entry;
- efi_adjust_memory_range_protection(addr, kernel_total_size); - - return EFI_SUCCESS; + return efi_adjust_memory_range_protection(addr, kernel_total_size); }
static void __noreturn enter_kernel(unsigned long kernel_addr, diff --git a/drivers/firmware/efi/libstub/x86-stub.h b/drivers/firmware/efi/libstub/x86-stub.h index 2748bca192df..4433d0f97441 100644 --- a/drivers/firmware/efi/libstub/x86-stub.h +++ b/drivers/firmware/efi/libstub/x86-stub.h @@ -7,8 +7,8 @@ extern struct boot_params *boot_params_pointer asm("boot_params"); extern void trampoline_32bit_src(void *, bool); extern const u16 trampoline_ljmp_imm_offset;
-void efi_adjust_memory_range_protection(unsigned long start, - unsigned long size); +efi_status_t efi_adjust_memory_range_protection(unsigned long start, + unsigned long size);
#ifdef CONFIG_X86_64 efi_status_t efi_setup_5level_paging(void);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel ardb@kernel.org
[ Upstream commit 2f77465b05b1270c832b5e2ee27037672ad2a10a ]
The EFI stub's kernel placement logic randomizes the physical placement of the kernel by taking all available memory into account, and picking a region at random, based on a random seed.
When KASLR is disabled, this seed is set to 0x0, and this results in the lowest available region of memory to be selected for loading the kernel, even if this is below LOAD_PHYSICAL_ADDR. Some of this memory is typically reserved for the GFP_DMA region, to accommodate masters that can only access the first 16 MiB of system memory.
Even if such devices are rare these days, we may still end up with a warning in the kernel log, as reported by Tom:
swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
Fix this by tweaking the random allocation logic to accept a low bound on the placement, and set it to LOAD_PHYSICAL_ADDR.
Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot") Reported-by: Tom Englund tomenglund26@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218404 Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/libstub/efistub.h | 3 ++- drivers/firmware/efi/libstub/kaslr.c | 2 +- drivers/firmware/efi/libstub/randomalloc.c | 12 +++++++----- drivers/firmware/efi/libstub/x86-stub.c | 1 + drivers/firmware/efi/libstub/zboot.c | 2 +- 5 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h index 212687c30d79..c04b82ea40f2 100644 --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h @@ -956,7 +956,8 @@ efi_status_t efi_get_random_bytes(unsigned long size, u8 *out);
efi_status_t efi_random_alloc(unsigned long size, unsigned long align, unsigned long *addr, unsigned long random_seed, - int memory_type, unsigned long alloc_limit); + int memory_type, unsigned long alloc_min, + unsigned long alloc_max);
efi_status_t efi_random_get_seed(void);
diff --git a/drivers/firmware/efi/libstub/kaslr.c b/drivers/firmware/efi/libstub/kaslr.c index 62d63f7a2645..1a9808012abd 100644 --- a/drivers/firmware/efi/libstub/kaslr.c +++ b/drivers/firmware/efi/libstub/kaslr.c @@ -119,7 +119,7 @@ efi_status_t efi_kaslr_relocate_kernel(unsigned long *image_addr, */ status = efi_random_alloc(*reserve_size, min_kimg_align, reserve_addr, phys_seed, - EFI_LOADER_CODE, EFI_ALLOC_LIMIT); + EFI_LOADER_CODE, 0, EFI_ALLOC_LIMIT); if (status != EFI_SUCCESS) efi_warn("efi_random_alloc() failed: 0x%lx\n", status); } else { diff --git a/drivers/firmware/efi/libstub/randomalloc.c b/drivers/firmware/efi/libstub/randomalloc.c index 674a064b8f7a..4e96a855fdf4 100644 --- a/drivers/firmware/efi/libstub/randomalloc.c +++ b/drivers/firmware/efi/libstub/randomalloc.c @@ -17,7 +17,7 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t *md, unsigned long size, unsigned long align_shift, - u64 alloc_limit) + u64 alloc_min, u64 alloc_max) { unsigned long align = 1UL << align_shift; u64 first_slot, last_slot, region_end; @@ -30,11 +30,11 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t *md, return 0;
region_end = min(md->phys_addr + md->num_pages * EFI_PAGE_SIZE - 1, - alloc_limit); + alloc_max); if (region_end < size) return 0;
- first_slot = round_up(md->phys_addr, align); + first_slot = round_up(max(md->phys_addr, alloc_min), align); last_slot = round_down(region_end - size + 1, align);
if (first_slot > last_slot) @@ -56,7 +56,8 @@ efi_status_t efi_random_alloc(unsigned long size, unsigned long *addr, unsigned long random_seed, int memory_type, - unsigned long alloc_limit) + unsigned long alloc_min, + unsigned long alloc_max) { unsigned long total_slots = 0, target_slot; unsigned long total_mirrored_slots = 0; @@ -78,7 +79,8 @@ efi_status_t efi_random_alloc(unsigned long size, efi_memory_desc_t *md = (void *)map->map + map_offset; unsigned long slots;
- slots = get_entry_num_slots(md, size, ilog2(align), alloc_limit); + slots = get_entry_num_slots(md, size, ilog2(align), alloc_min, + alloc_max); MD_NUM_SLOTS(md) = slots; total_slots += slots; if (md->attribute & EFI_MEMORY_MORE_RELIABLE) diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index 5d0934ae7dc8..4a11470bed5e 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -831,6 +831,7 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
status = efi_random_alloc(alloc_size, CONFIG_PHYSICAL_ALIGN, &addr, seed[0], EFI_LOADER_CODE, + LOAD_PHYSICAL_ADDR, EFI_X86_KERNEL_ALLOC_LIMIT); if (status != EFI_SUCCESS) return status; diff --git a/drivers/firmware/efi/libstub/zboot.c b/drivers/firmware/efi/libstub/zboot.c index bdb17eac0cb4..1ceace956758 100644 --- a/drivers/firmware/efi/libstub/zboot.c +++ b/drivers/firmware/efi/libstub/zboot.c @@ -119,7 +119,7 @@ efi_zboot_entry(efi_handle_t handle, efi_system_table_t *systab) }
status = efi_random_alloc(alloc_size, min_kimg_align, &image_base, - seed, EFI_LOADER_CODE, EFI_ALLOC_LIMIT); + seed, EFI_LOADER_CODE, 0, EFI_ALLOC_LIMIT); if (status != EFI_SUCCESS) { efi_err("Failed to allocate memory\n"); goto free_cmdline;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Furong Xu 0x1207@gmail.com
[ Upstream commit 46eba193d04f8bd717e525eb4110f3c46c12aec3 ]
Commit 56e58d6c8a56 ("net: stmmac: Implement Safety Features in XGMAC core") checks and reports safety errors, but leaves the Data Path Parity Errors for each channel in DMA unhandled at all, lead to a storm of interrupt. Fix it by checking and clearing the DMA_DPP_Interrupt_Status register.
Fixes: 56e58d6c8a56 ("net: stmmac: Implement Safety Features in XGMAC core") Signed-off-by: Furong Xu 0x1207@gmail.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/common.h | 1 + .../net/ethernet/stmicro/stmmac/dwxgmac2.h | 3 + .../ethernet/stmicro/stmmac/dwxgmac2_core.c | 57 ++++++++++++++++++- 3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h index 1e996c29043d..3d4f34e178a8 100644 --- a/drivers/net/ethernet/stmicro/stmmac/common.h +++ b/drivers/net/ethernet/stmicro/stmmac/common.h @@ -216,6 +216,7 @@ struct stmmac_safety_stats { unsigned long mac_errors[32]; unsigned long mtl_errors[32]; unsigned long dma_errors[32]; + unsigned long dma_dpp_errors[32]; };
/* Number of fields in Safety Stats */ diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h index a4e8b498dea9..7d7133ef4994 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h @@ -319,6 +319,8 @@ #define XGMAC_RXCEIE BIT(4) #define XGMAC_TXCEIE BIT(0) #define XGMAC_MTL_ECC_INT_STATUS 0x000010cc +#define XGMAC_MTL_DPP_CONTROL 0x000010e0 +#define XGMAC_DDPP_DISABLE BIT(0) #define XGMAC_MTL_TXQ_OPMODE(x) (0x00001100 + (0x80 * (x))) #define XGMAC_TQS GENMASK(25, 16) #define XGMAC_TQS_SHIFT 16 @@ -401,6 +403,7 @@ #define XGMAC_DCEIE BIT(1) #define XGMAC_TCEIE BIT(0) #define XGMAC_DMA_ECC_INT_STATUS 0x0000306c +#define XGMAC_DMA_DPP_INT_STATUS 0x00003074 #define XGMAC_DMA_CH_CONTROL(x) (0x00003100 + (0x80 * (x))) #define XGMAC_SPH BIT(24) #define XGMAC_PBLx8 BIT(16) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c index a74e71db79f9..e7eccc0c406f 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c @@ -830,6 +830,43 @@ static const struct dwxgmac3_error_desc dwxgmac3_dma_errors[32]= { { false, "UNKNOWN", "Unknown Error" }, /* 31 */ };
+static const char * const dpp_rx_err = "Read Rx Descriptor Parity checker Error"; +static const char * const dpp_tx_err = "Read Tx Descriptor Parity checker Error"; +static const struct dwxgmac3_error_desc dwxgmac3_dma_dpp_errors[32] = { + { true, "TDPES0", dpp_tx_err }, + { true, "TDPES1", dpp_tx_err }, + { true, "TDPES2", dpp_tx_err }, + { true, "TDPES3", dpp_tx_err }, + { true, "TDPES4", dpp_tx_err }, + { true, "TDPES5", dpp_tx_err }, + { true, "TDPES6", dpp_tx_err }, + { true, "TDPES7", dpp_tx_err }, + { true, "TDPES8", dpp_tx_err }, + { true, "TDPES9", dpp_tx_err }, + { true, "TDPES10", dpp_tx_err }, + { true, "TDPES11", dpp_tx_err }, + { true, "TDPES12", dpp_tx_err }, + { true, "TDPES13", dpp_tx_err }, + { true, "TDPES14", dpp_tx_err }, + { true, "TDPES15", dpp_tx_err }, + { true, "RDPES0", dpp_rx_err }, + { true, "RDPES1", dpp_rx_err }, + { true, "RDPES2", dpp_rx_err }, + { true, "RDPES3", dpp_rx_err }, + { true, "RDPES4", dpp_rx_err }, + { true, "RDPES5", dpp_rx_err }, + { true, "RDPES6", dpp_rx_err }, + { true, "RDPES7", dpp_rx_err }, + { true, "RDPES8", dpp_rx_err }, + { true, "RDPES9", dpp_rx_err }, + { true, "RDPES10", dpp_rx_err }, + { true, "RDPES11", dpp_rx_err }, + { true, "RDPES12", dpp_rx_err }, + { true, "RDPES13", dpp_rx_err }, + { true, "RDPES14", dpp_rx_err }, + { true, "RDPES15", dpp_rx_err }, +}; + static void dwxgmac3_handle_dma_err(struct net_device *ndev, void __iomem *ioaddr, bool correctable, struct stmmac_safety_stats *stats) @@ -841,6 +878,13 @@ static void dwxgmac3_handle_dma_err(struct net_device *ndev,
dwxgmac3_log_error(ndev, value, correctable, "DMA", dwxgmac3_dma_errors, STAT_OFF(dma_errors), stats); + + value = readl(ioaddr + XGMAC_DMA_DPP_INT_STATUS); + writel(value, ioaddr + XGMAC_DMA_DPP_INT_STATUS); + + dwxgmac3_log_error(ndev, value, false, "DMA_DPP", + dwxgmac3_dma_dpp_errors, + STAT_OFF(dma_dpp_errors), stats); }
static int @@ -881,6 +925,12 @@ dwxgmac3_safety_feat_config(void __iomem *ioaddr, unsigned int asp, value |= XGMAC_TMOUTEN; /* FSM Timeout Feature */ writel(value, ioaddr + XGMAC_MAC_FSM_CONTROL);
+ /* 5. Enable Data Path Parity Protection */ + value = readl(ioaddr + XGMAC_MTL_DPP_CONTROL); + /* already enabled by default, explicit enable it again */ + value &= ~XGMAC_DDPP_DISABLE; + writel(value, ioaddr + XGMAC_MTL_DPP_CONTROL); + return 0; }
@@ -914,7 +964,11 @@ static int dwxgmac3_safety_feat_irq_status(struct net_device *ndev, ret |= !corr; }
- err = dma & (XGMAC_DEUIS | XGMAC_DECIS); + /* DMA_DPP_Interrupt_Status is indicated by MCSIS bit in + * DMA_Safety_Interrupt_Status, so we handle DMA Data Path + * Parity Errors here + */ + err = dma & (XGMAC_DEUIS | XGMAC_DECIS | XGMAC_MCSIS); corr = dma & XGMAC_DECIS; if (err) { dwxgmac3_handle_dma_err(ndev, ioaddr, corr, stats); @@ -930,6 +984,7 @@ static const struct dwxgmac3_error { { dwxgmac3_mac_errors }, { dwxgmac3_mtl_errors }, { dwxgmac3_dma_errors }, + { dwxgmac3_dma_dpp_errors }, };
static int dwxgmac3_safety_feat_dump(struct stmmac_safety_stats *stats,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 9480adfe4e0f0319b9da04b44e4eebd5ad07e0cd ]
This looks up the link under RCU protection, but isn't guaranteed to actually have protection. Fix that.
Fixes: 8cc07265b691 ("wifi: mac80211: handle TDLS data frames with MLO") Link: https://msgid.link/20240129155348.8a9c0b1e1d89.I553f96ce953bb41b0b877d592056... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/tx.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index d45d4be63dd8..5481acbfc1d4 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -3086,10 +3086,11 @@ void ieee80211_check_fast_xmit(struct sta_info *sta) /* DA SA BSSID */ build.da_offs = offsetof(struct ieee80211_hdr, addr1); build.sa_offs = offsetof(struct ieee80211_hdr, addr2); + rcu_read_lock(); link = rcu_dereference(sdata->link[tdls_link_id]); - if (WARN_ON_ONCE(!link)) - break; - memcpy(hdr->addr3, link->u.mgd.bssid, ETH_ALEN); + if (!WARN_ON_ONCE(!link)) + memcpy(hdr->addr3, link->u.mgd.bssid, ETH_ALEN); + rcu_read_unlock(); build.hdr_len = 24; break; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit a0b4f2291319c5d47ecb196b90400814fdcfd126 ]
This should be waiting if we don't have a beacon yet, but somehow I managed to invert the logic. Fix that.
Fixes: 74e1309acedc ("wifi: mac80211: mlme: look up beacon elems only if needed") Link: https://msgid.link/20240131164856.922701229546.I239b379e7cee04608e73c016b737... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/mlme.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c index 73f8df03d159..d9e716f38b0e 100644 --- a/net/mac80211/mlme.c +++ b/net/mac80211/mlme.c @@ -7727,8 +7727,7 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
rcu_read_lock(); beacon_ies = rcu_dereference(req->bss->beacon_ies); - - if (beacon_ies) { + if (!beacon_ies) { /* * Wait up to one beacon interval ... * should this be more if we miss one?
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miri Korenblit miriam.rachel.korenblit@intel.com
[ Upstream commit 16867c38bcd3be2eb9016a3198a096f93959086e ]
Currently the driver exits eSR by calling iwl_mvm_esr_mode_inactive() before updating the FW (by deactivating one of the links), and therefore before sending the EML frame notifying that we are no longer in eSR.
This is wrong for several reasons: 1. The driver sends SMPS activation frames when we are still in eSR and SMPS should be disabled when in eSR 2. The driver restores RLC configuration as it was before eSR entering, and RLC command shouldn't be sent in eSR
Fix this by calling iwl_mvm_esr_mode_inactive() after FW update
Fixes: 12bacfc2c065 ("wifi: iwlwifi: handle eSR transitions") Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Reviewed-by: Ilan Peer ilan.peer@intel.com Reviewed-by: Gregory Greenman gregory.greenman@intel.com Link: https://msgid.link/20240201155157.d8d9dc277d4e.Ib5aee0fd05e35b1da7f18753eb3c... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c index 1e58f0234293..2d1fd7ac8577 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c @@ -435,6 +435,9 @@ __iwl_mvm_mld_unassign_vif_chanctx(struct iwl_mvm *mvm, mvmvif->ap_ibss_active = false; }
+ iwl_mvm_link_changed(mvm, vif, link_conf, + LINK_CONTEXT_MODIFY_ACTIVE, false); + if (iwl_mvm_is_esr_supported(mvm->fwrt.trans) && n_active > 1) { int ret = iwl_mvm_esr_mode_inactive(mvm, vif);
@@ -446,9 +449,6 @@ __iwl_mvm_mld_unassign_vif_chanctx(struct iwl_mvm *mvm, if (vif->type == NL80211_IFTYPE_MONITOR) iwl_mvm_mld_rm_snif_sta(mvm, vif);
- iwl_mvm_link_changed(mvm, vif, link_conf, - LINK_CONTEXT_MODIFY_ACTIVE, false); - if (switching_chanctx) return; mvmvif->link[link_id]->phy_ctxt = NULL;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
[ Upstream commit 5bdda0048c8d1bbe2019513b2d6200cc0d09c7bd ]
After commit e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by"), the compiler may enforce dynamic array indexing of req->channels to stay below n_channels. As a result, n_channels needs to be increased _before_ accessing the newly added array index. Increment it first, then use "i" for the prior index. Solves this warning in the coming GCC that has __counted_by support:
../drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c: In function 'brcmf_internal_escan_add_info': ../drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:3783:46: warning: operation on 'req-> n_channels' may be undefined [-Wsequence-point] 3783 | req->channels[req->n_channels++] = chan; | ~~~~~~~~~~~~~~~^~
Fixes: e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by") Cc: Arend van Spriel aspriel@gmail.com Cc: Franky Lin franky.lin@broadcom.com Cc: Hante Meuleman hante.meuleman@broadcom.com Cc: Kalle Valo kvalo@kernel.org Cc: Chi-hsien Lin chi-hsien.lin@infineon.com Cc: Ian Lin ian.lin@infineon.com Cc: Johannes Berg johannes.berg@intel.com Cc: Wright Feng wright.feng@cypress.com Cc: Hector Martin marcan@marcan.st Cc: linux-wireless@vger.kernel.org Cc: brcm80211-dev-list.pdl@broadcom.com Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Reviewed-by: Gustavo A. R. Silva gustavoars@kernel.org Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://msgid.link/20240126223150.work.548-kees@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c index 2a90bb24ba77..6049f9a761d9 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c @@ -3780,8 +3780,10 @@ static int brcmf_internal_escan_add_info(struct cfg80211_scan_request *req, if (req->channels[i] == chan) break; } - if (i == req->n_channels) - req->channels[req->n_channels++] = chan; + if (i == req->n_channels) { + req->n_channels++; + req->channels[i] = chan; + }
for (i = 0; i < req->n_ssids; i++) { if (req->ssids[i].ssid_len == ssid_len &&
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit ba5e1272142d051dcc57ca1d3225ad8a089f9858 ]
Many syzbot reports include the following trace [1]
If nsim_dev_trap_report_work() can not grab the mutex, it should rearm itself at least one jiffie later.
[1] Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 32383 Comm: kworker/0:2 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Workqueue: events nsim_dev_trap_report_work RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline] RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline] RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline] RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline] RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline] RIP: 0010:kasan_check_range+0x101/0x190 mm/kasan/generic.c:189 Code: 07 49 39 d1 75 0a 45 3a 11 b8 01 00 00 00 7c 0b 44 89 c2 e8 21 ed ff ff 83 f0 01 5b 5d 41 5c c3 48 85 d2 74 4f 48 01 ea eb 09 <48> 83 c0 01 48 39 d0 74 41 80 38 00 74 f2 eb b6 41 bc 08 00 00 00 RSP: 0018:ffffc90012dcf998 EFLAGS: 00000046 RAX: fffffbfff258af1e RBX: fffffbfff258af1f RCX: ffffffff8168eda3 RDX: fffffbfff258af1f RSI: 0000000000000004 RDI: ffffffff92c578f0 RBP: fffffbfff258af1e R08: 0000000000000000 R09: fffffbfff258af1e R10: ffffffff92c578f3 R11: ffffffff8acbcbc0 R12: 0000000000000002 R13: ffff88806db38400 R14: 1ffff920025b9f42 R15: ffffffff92c578e8 FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c00994e078 CR3: 000000002c250000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <NMI> </NMI> <TASK> instrument_atomic_read include/linux/instrumented.h:68 [inline] atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline] queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline] debug_spin_unlock kernel/locking/spinlock_debug.c:101 [inline] do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:141 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:150 [inline] _raw_spin_unlock_irqrestore+0x22/0x70 kernel/locking/spinlock.c:194 debug_object_activate+0x349/0x540 lib/debugobjects.c:726 debug_work_activate kernel/workqueue.c:578 [inline] insert_work+0x30/0x230 kernel/workqueue.c:1650 __queue_work+0x62e/0x11d0 kernel/workqueue.c:1802 __queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953 queue_delayed_work_on+0x106/0x130 kernel/workqueue.c:1989 queue_delayed_work include/linux/workqueue.h:563 [inline] schedule_delayed_work include/linux/workqueue.h:677 [inline] nsim_dev_trap_report_work+0x9c0/0xc80 drivers/net/netdevsim/dev.c:842 process_one_work+0x886/0x15d0 kernel/workqueue.c:2633 process_scheduled_works kernel/workqueue.c:2706 [inline] worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787 kthread+0x2c6/0x3a0 kernel/kthread.c:388 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242 </TASK>
Fixes: 012ec02ae441 ("netdevsim: convert driver to use unlocked devlink API during init/fini") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Jiri Pirko jiri@nvidia.com Link: https://lore.kernel.org/r/20240201175324.3752746-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/netdevsim/dev.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c index b4d3b9cde8bd..92a7a36b93ac 100644 --- a/drivers/net/netdevsim/dev.c +++ b/drivers/net/netdevsim/dev.c @@ -835,14 +835,14 @@ static void nsim_dev_trap_report_work(struct work_struct *work) trap_report_dw.work); nsim_dev = nsim_trap_data->nsim_dev;
- /* For each running port and enabled packet trap, generate a UDP - * packet with a random 5-tuple and report it. - */ if (!devl_trylock(priv_to_devlink(nsim_dev))) { - schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw, 0); + schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw, 1); return; }
+ /* For each running port and enabled packet trap, generate a UDP + * packet with a random 5-tuple and report it. + */ list_for_each_entry(nsim_dev_port, &nsim_dev->port_list, list) { if (!netif_running(nsim_dev_port->ns->netdev)) continue;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit 2e7d3b67630dfd8f178c41fa2217aa00e79a5887 ]
Function aq_ring_hwts_rx_alloc() maps extra AQ_CFG_RXDS_DEF bytes for PTP HWTS ring but then generic aq_ring_free() does not take this into account. Create and use a specific function to free HWTS ring to fix this issue.
Trace: [ 215.351607] ------------[ cut here ]------------ [ 215.351612] DMA-API: atlantic 0000:4b:00.0: device driver frees DMA memory with different size [device address=0x00000000fbdd0000] [map size=34816 bytes] [unmap size=32768 bytes] [ 215.351635] WARNING: CPU: 33 PID: 10759 at kernel/dma/debug.c:988 check_unmap+0xa6f/0x2360 ... [ 215.581176] Call Trace: [ 215.583632] <TASK> [ 215.585745] ? show_trace_log_lvl+0x1c4/0x2df [ 215.590114] ? show_trace_log_lvl+0x1c4/0x2df [ 215.594497] ? debug_dma_free_coherent+0x196/0x210 [ 215.599305] ? check_unmap+0xa6f/0x2360 [ 215.603147] ? __warn+0xca/0x1d0 [ 215.606391] ? check_unmap+0xa6f/0x2360 [ 215.610237] ? report_bug+0x1ef/0x370 [ 215.613921] ? handle_bug+0x3c/0x70 [ 215.617423] ? exc_invalid_op+0x14/0x50 [ 215.621269] ? asm_exc_invalid_op+0x16/0x20 [ 215.625480] ? check_unmap+0xa6f/0x2360 [ 215.629331] ? mark_lock.part.0+0xca/0xa40 [ 215.633445] debug_dma_free_coherent+0x196/0x210 [ 215.638079] ? __pfx_debug_dma_free_coherent+0x10/0x10 [ 215.643242] ? slab_free_freelist_hook+0x11d/0x1d0 [ 215.648060] dma_free_attrs+0x6d/0x130 [ 215.651834] aq_ring_free+0x193/0x290 [atlantic] [ 215.656487] aq_ptp_ring_free+0x67/0x110 [atlantic] ... [ 216.127540] ---[ end trace 6467e5964dd2640b ]--- [ 216.132160] DMA-API: Mapped at: [ 216.132162] debug_dma_alloc_coherent+0x66/0x2f0 [ 216.132165] dma_alloc_attrs+0xf5/0x1b0 [ 216.132168] aq_ring_hwts_rx_alloc+0x150/0x1f0 [atlantic] [ 216.132193] aq_ptp_ring_alloc+0x1bb/0x540 [atlantic] [ 216.132213] aq_nic_init+0x4a1/0x760 [atlantic]
Fixes: 94ad94558b0f ("net: aquantia: add PTP rings infrastructure") Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Jiri Pirko jiri@nvidia.com Link: https://lore.kernel.org/r/20240201094752.883026-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/aquantia/atlantic/aq_ptp.c | 4 ++-- drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 13 +++++++++++++ drivers/net/ethernet/aquantia/atlantic/aq_ring.h | 1 + 3 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ptp.c b/drivers/net/ethernet/aquantia/atlantic/aq_ptp.c index abd4832e4ed2..5acb3e16b567 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ptp.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ptp.c @@ -993,7 +993,7 @@ int aq_ptp_ring_alloc(struct aq_nic_s *aq_nic) return 0;
err_exit_hwts_rx: - aq_ring_free(&aq_ptp->hwts_rx); + aq_ring_hwts_rx_free(&aq_ptp->hwts_rx); err_exit_ptp_rx: aq_ring_free(&aq_ptp->ptp_rx); err_exit_ptp_tx: @@ -1011,7 +1011,7 @@ void aq_ptp_ring_free(struct aq_nic_s *aq_nic)
aq_ring_free(&aq_ptp->ptp_tx); aq_ring_free(&aq_ptp->ptp_rx); - aq_ring_free(&aq_ptp->hwts_rx); + aq_ring_hwts_rx_free(&aq_ptp->hwts_rx);
aq_ptp_skb_ring_release(&aq_ptp->skb_ring); } diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c index cda8597b4e14..f7433abd6591 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c @@ -919,6 +919,19 @@ void aq_ring_free(struct aq_ring_s *self) } }
+void aq_ring_hwts_rx_free(struct aq_ring_s *self) +{ + if (!self) + return; + + if (self->dx_ring) { + dma_free_coherent(aq_nic_get_dev(self->aq_nic), + self->size * self->dx_size + AQ_CFG_RXDS_DEF, + self->dx_ring, self->dx_ring_pa); + self->dx_ring = NULL; + } +} + unsigned int aq_ring_fill_stats_data(struct aq_ring_s *self, u64 *data) { unsigned int count; diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.h b/drivers/net/ethernet/aquantia/atlantic/aq_ring.h index 52847310740a..d627ace850ff 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.h +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.h @@ -210,6 +210,7 @@ int aq_ring_rx_fill(struct aq_ring_s *self); int aq_ring_hwts_rx_alloc(struct aq_ring_s *self, struct aq_nic_s *aq_nic, unsigned int idx, unsigned int size, unsigned int dx_size); +void aq_ring_hwts_rx_free(struct aq_ring_s *self); void aq_ring_hwts_rx_clean(struct aq_ring_s *self, struct aq_nic_s *aq_nic);
unsigned int aq_ring_fill_stats_data(struct aq_ring_s *self, u64 *data);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit cb9f4a30fb85e1f4f149ada595a67899adb3db19 ]
The udpgro_fwd.sh self-tests are somewhat unstable. There are a few timing constraints the we struggle to meet on very slow environments.
Instead of skipping the whole tests in such envs, increase the test resilience WRT very slow hosts: increase the inter-packets timeouts, avoid resetting the counters every second and finally disable reduce the background traffic noise.
Tested with:
for I in $(seq 1 100); do ./tools/testing/selftests/kselftest_install/run_kselftest.sh \ -t net:udpgro_fwd.sh || exit -1 done
in a slow environment.
Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests") Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/f4b6b11064a0d39182a9ae6a853abae3e9b4426a.170681200... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/udpgro_fwd.sh | 14 ++++++++++++-- tools/testing/selftests/net/udpgso_bench_rx.c | 2 +- 2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/udpgro_fwd.sh b/tools/testing/selftests/net/udpgro_fwd.sh index d6b9c759043c..9cd5e885e91f 100755 --- a/tools/testing/selftests/net/udpgro_fwd.sh +++ b/tools/testing/selftests/net/udpgro_fwd.sh @@ -39,6 +39,10 @@ create_ns() { for ns in $NS_SRC $NS_DST; do ip netns add $ns ip -n $ns link set dev lo up + + # disable route solicitations to decrease 'noise' traffic + ip netns exec $ns sysctl -qw net.ipv6.conf.default.router_solicitations=0 + ip netns exec $ns sysctl -qw net.ipv6.conf.all.router_solicitations=0 done
ip link add name veth$SRC type veth peer name veth$DST @@ -80,6 +84,12 @@ create_vxlan_pair() { create_vxlan_endpoint $BASE$ns veth$ns $BM_NET_V6$((3 - $ns)) vxlan6$ns 6 ip -n $BASE$ns addr add dev vxlan6$ns $OL_NET_V6$ns/24 nodad done + + # preload neighbur cache, do avoid some noisy traffic + local addr_dst=$(ip -j -n $BASE$DST link show dev vxlan6$DST |jq -r '.[]["address"]') + local addr_src=$(ip -j -n $BASE$SRC link show dev vxlan6$SRC |jq -r '.[]["address"]') + ip -n $BASE$DST neigh add dev vxlan6$DST lladdr $addr_src $OL_NET_V6$SRC + ip -n $BASE$SRC neigh add dev vxlan6$SRC lladdr $addr_dst $OL_NET_V6$DST }
is_ipv6() { @@ -119,7 +129,7 @@ run_test() { # not enable GRO ip netns exec $NS_DST $ipt -A INPUT -p udp --dport 4789 ip netns exec $NS_DST $ipt -A INPUT -p udp --dport 8000 - ip netns exec $NS_DST ./udpgso_bench_rx -C 1000 -R 10 -n 10 -l 1300 $rx_args & + ip netns exec $NS_DST ./udpgso_bench_rx -C 2000 -R 100 -n 10 -l 1300 $rx_args & local spid=$! wait_local_port_listen "$NS_DST" 8000 udp ip netns exec $NS_SRC ./udpgso_bench_tx $family -M 1 -s 13000 -S 1300 -D $dst @@ -168,7 +178,7 @@ run_bench() { # bind the sender and the receiver to different CPUs to try # get reproducible results ip netns exec $NS_DST bash -c "echo 2 > /sys/class/net/veth$DST/queues/rx-0/rps_cpus" - ip netns exec $NS_DST taskset 0x2 ./udpgso_bench_rx -C 1000 -R 10 & + ip netns exec $NS_DST taskset 0x2 ./udpgso_bench_rx -C 2000 -R 100 & local spid=$! wait_local_port_listen "$NS_DST" 8000 udp ip netns exec $NS_SRC taskset 0x1 ./udpgso_bench_tx $family -l 3 -S 1300 -D $dst diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c index f35a924d4a30..1cbadd267c96 100644 --- a/tools/testing/selftests/net/udpgso_bench_rx.c +++ b/tools/testing/selftests/net/udpgso_bench_rx.c @@ -375,7 +375,7 @@ static void do_recv(void) do_flush_udp(fd);
tnow = gettimeofday_ms(); - if (tnow > treport) { + if (!cfg_expected_pkt_nr && tnow > treport) { if (packets) fprintf(stderr, "%s rx: %6lu MB/s %8lu calls/s\n",
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 0f4765d0b48d90ede9788c7edb2e072eee20f88e ]
Here is the test result after conversion.
# ./unicast_extensions.sh /usr/bin/which: no nettest in (/root/.local/bin:/root/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin) ########################################################################### Unicast address extensions tests (behavior of reserved IPv4 addresses) ########################################################################### TEST: assign and ping within 240/4 (1 of 2) (is allowed) [ OK ] TEST: assign and ping within 240/4 (2 of 2) (is allowed) [ OK ] TEST: assign and ping within 0/8 (1 of 2) (is allowed) [ OK ]
...
TEST: assign and ping class D address (is forbidden) [ OK ] TEST: routing using class D (is forbidden) [ OK ] TEST: routing using 127/8 (is forbidden) [ OK ]
Acked-by: David Ahern dsahern@kernel.org Signed-off-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Stable-dep-of: e71e016ad0f6 ("selftests: net: fix tcp listener handling in pmtu.sh") Signed-off-by: Sasha Levin sashal@kernel.org --- .../selftests/net/unicast_extensions.sh | 99 +++++++++---------- 1 file changed, 46 insertions(+), 53 deletions(-)
diff --git a/tools/testing/selftests/net/unicast_extensions.sh b/tools/testing/selftests/net/unicast_extensions.sh index 2d10ccac898a..b7a2cb9e7477 100755 --- a/tools/testing/selftests/net/unicast_extensions.sh +++ b/tools/testing/selftests/net/unicast_extensions.sh @@ -28,8 +28,7 @@ # These tests provide an easy way to flip the expected result of any # of these behaviors for testing kernel patches that change them.
-# Kselftest framework requirement - SKIP code is 4. -ksft_skip=4 +source ./lib.sh
# nettest can be run from PATH or from same directory as this selftest if ! which nettest >/dev/null; then @@ -61,20 +60,20 @@ _do_segmenttest(){ # foo --- bar # Arguments: ip_a ip_b prefix_length test_description # - # Caller must set up foo-ns and bar-ns namespaces + # Caller must set up $foo_ns and $bar_ns namespaces # containing linked veth devices foo and bar, # respectively.
- ip -n foo-ns address add $1/$3 dev foo || return 1 - ip -n foo-ns link set foo up || return 1 - ip -n bar-ns address add $2/$3 dev bar || return 1 - ip -n bar-ns link set bar up || return 1 + ip -n $foo_ns address add $1/$3 dev foo || return 1 + ip -n $foo_ns link set foo up || return 1 + ip -n $bar_ns address add $2/$3 dev bar || return 1 + ip -n $bar_ns link set bar up || return 1
- ip netns exec foo-ns timeout 2 ping -c 1 $2 || return 1 - ip netns exec bar-ns timeout 2 ping -c 1 $1 || return 1 + ip netns exec $foo_ns timeout 2 ping -c 1 $2 || return 1 + ip netns exec $bar_ns timeout 2 ping -c 1 $1 || return 1
- nettest -B -N bar-ns -O foo-ns -r $1 || return 1 - nettest -B -N foo-ns -O bar-ns -r $2 || return 1 + nettest -B -N $bar_ns -O $foo_ns -r $1 || return 1 + nettest -B -N $foo_ns -O $bar_ns -r $2 || return 1
return 0 } @@ -88,31 +87,31 @@ _do_route_test(){ # Arguments: foo_ip foo1_ip bar1_ip bar_ip prefix_len test_description # Displays test result and returns success or failure.
- # Caller must set up foo-ns, bar-ns, and router-ns + # Caller must set up $foo_ns, $bar_ns, and $router_ns # containing linked veth devices foo-foo1, bar1-bar - # (foo in foo-ns, foo1 and bar1 in router-ns, and - # bar in bar-ns). - - ip -n foo-ns address add $1/$5 dev foo || return 1 - ip -n foo-ns link set foo up || return 1 - ip -n foo-ns route add default via $2 || return 1 - ip -n bar-ns address add $4/$5 dev bar || return 1 - ip -n bar-ns link set bar up || return 1 - ip -n bar-ns route add default via $3 || return 1 - ip -n router-ns address add $2/$5 dev foo1 || return 1 - ip -n router-ns link set foo1 up || return 1 - ip -n router-ns address add $3/$5 dev bar1 || return 1 - ip -n router-ns link set bar1 up || return 1 - - echo 1 | ip netns exec router-ns tee /proc/sys/net/ipv4/ip_forward - - ip netns exec foo-ns timeout 2 ping -c 1 $2 || return 1 - ip netns exec foo-ns timeout 2 ping -c 1 $4 || return 1 - ip netns exec bar-ns timeout 2 ping -c 1 $3 || return 1 - ip netns exec bar-ns timeout 2 ping -c 1 $1 || return 1 - - nettest -B -N bar-ns -O foo-ns -r $1 || return 1 - nettest -B -N foo-ns -O bar-ns -r $4 || return 1 + # (foo in $foo_ns, foo1 and bar1 in $router_ns, and + # bar in $bar_ns). + + ip -n $foo_ns address add $1/$5 dev foo || return 1 + ip -n $foo_ns link set foo up || return 1 + ip -n $foo_ns route add default via $2 || return 1 + ip -n $bar_ns address add $4/$5 dev bar || return 1 + ip -n $bar_ns link set bar up || return 1 + ip -n $bar_ns route add default via $3 || return 1 + ip -n $router_ns address add $2/$5 dev foo1 || return 1 + ip -n $router_ns link set foo1 up || return 1 + ip -n $router_ns address add $3/$5 dev bar1 || return 1 + ip -n $router_ns link set bar1 up || return 1 + + echo 1 | ip netns exec $router_ns tee /proc/sys/net/ipv4/ip_forward + + ip netns exec $foo_ns timeout 2 ping -c 1 $2 || return 1 + ip netns exec $foo_ns timeout 2 ping -c 1 $4 || return 1 + ip netns exec $bar_ns timeout 2 ping -c 1 $3 || return 1 + ip netns exec $bar_ns timeout 2 ping -c 1 $1 || return 1 + + nettest -B -N $bar_ns -O $foo_ns -r $1 || return 1 + nettest -B -N $foo_ns -O $bar_ns -r $4 || return 1
return 0 } @@ -121,17 +120,15 @@ segmenttest(){ # Sets up veth link and tries to connect over it. # Arguments: ip_a ip_b prefix_len test_description hide_output - ip netns add foo-ns - ip netns add bar-ns - ip link add foo netns foo-ns type veth peer name bar netns bar-ns + setup_ns foo_ns bar_ns + ip link add foo netns $foo_ns type veth peer name bar netns $bar_ns
test_result=0 _do_segmenttest "$@" || test_result=1
- ip netns pids foo-ns | xargs -r kill -9 - ip netns pids bar-ns | xargs -r kill -9 - ip netns del foo-ns - ip netns del bar-ns + ip netns pids $foo_ns | xargs -r kill -9 + ip netns pids $bar_ns | xargs -r kill -9 + cleanup_ns $foo_ns $bar_ns show_output
# inverted tests will expect failure instead of success @@ -147,21 +144,17 @@ route_test(){ # Returns success or failure.
hide_output - ip netns add foo-ns - ip netns add bar-ns - ip netns add router-ns - ip link add foo netns foo-ns type veth peer name foo1 netns router-ns - ip link add bar netns bar-ns type veth peer name bar1 netns router-ns + setup_ns foo_ns bar_ns router_ns + ip link add foo netns $foo_ns type veth peer name foo1 netns $router_ns + ip link add bar netns $bar_ns type veth peer name bar1 netns $router_ns
test_result=0 _do_route_test "$@" || test_result=1
- ip netns pids foo-ns | xargs -r kill -9 - ip netns pids bar-ns | xargs -r kill -9 - ip netns pids router-ns | xargs -r kill -9 - ip netns del foo-ns - ip netns del bar-ns - ip netns del router-ns + ip netns pids $foo_ns | xargs -r kill -9 + ip netns pids $bar_ns | xargs -r kill -9 + ip netns pids $router_ns | xargs -r kill -9 + cleanup_ns $foo_ns $bar_ns $router_ns
show_output
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 378f082eaf3760cd7430fbcb1e4f8626bb6bc0ae ]
pmtu test use /bin/sh, so we need to source ./lib.sh instead of lib.sh Here is the test result after conversion.
# ./pmtu.sh TEST: ipv4: PMTU exceptions [ OK ] TEST: ipv4: PMTU exceptions - nexthop objects [ OK ] TEST: ipv6: PMTU exceptions [ OK ] TEST: ipv6: PMTU exceptions - nexthop objects [ OK ] ... TEST: ipv4: list and flush cached exceptions - nexthop objects [ OK ] TEST: ipv6: list and flush cached exceptions [ OK ] TEST: ipv6: list and flush cached exceptions - nexthop objects [ OK ] TEST: ipv4: PMTU exception w/route replace [ OK ] TEST: ipv4: PMTU exception w/route replace - nexthop objects [ OK ] TEST: ipv6: PMTU exception w/route replace [ OK ] TEST: ipv6: PMTU exception w/route replace - nexthop objects [ OK ]
Signed-off-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: e71e016ad0f6 ("selftests: net: fix tcp listener handling in pmtu.sh") Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/pmtu.sh | 27 +++++++++------------------ 1 file changed, 9 insertions(+), 18 deletions(-)
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index 4a5f031be232..8518eaacf4b5 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -198,8 +198,7 @@ # - pmtu_ipv6_route_change # Same as above but with IPv6
-# Kselftest framework requirement - SKIP code is 4. -ksft_skip=4 +source ./lib.sh
PAUSE_ON_FAIL=no VERBOSE=0 @@ -268,16 +267,6 @@ tests=" pmtu_ipv4_route_change ipv4: PMTU exception w/route replace 1 pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1"
-NS_A="ns-A" -NS_B="ns-B" -NS_C="ns-C" -NS_R1="ns-R1" -NS_R2="ns-R2" -ns_a="ip netns exec ${NS_A}" -ns_b="ip netns exec ${NS_B}" -ns_c="ip netns exec ${NS_C}" -ns_r1="ip netns exec ${NS_R1}" -ns_r2="ip netns exec ${NS_R2}" # Addressing and routing for tests with routers: four network segments, with # index SEGMENT between 1 and 4, a common prefix (PREFIX4 or PREFIX6) and an # identifier ID, which is 1 for hosts (A and B), 2 for routers (R1 and R2). @@ -543,13 +532,17 @@ setup_ip6ip6() { }
setup_namespaces() { + setup_ns NS_A NS_B NS_C NS_R1 NS_R2 for n in ${NS_A} ${NS_B} ${NS_C} ${NS_R1} ${NS_R2}; do - ip netns add ${n} || return 1 - # Disable DAD, so that we don't have to wait to use the # configured IPv6 addresses ip netns exec ${n} sysctl -q net/ipv6/conf/default/accept_dad=0 done + ns_a="ip netns exec ${NS_A}" + ns_b="ip netns exec ${NS_B}" + ns_c="ip netns exec ${NS_C}" + ns_r1="ip netns exec ${NS_R1}" + ns_r2="ip netns exec ${NS_R2}" }
setup_veth() { @@ -839,7 +832,7 @@ setup_bridge() { run_cmd ${ns_a} ip link set br0 up
run_cmd ${ns_c} ip link add veth_C-A type veth peer name veth_A-C - run_cmd ${ns_c} ip link set veth_A-C netns ns-A + run_cmd ${ns_c} ip link set veth_A-C netns ${NS_A}
run_cmd ${ns_a} ip link set veth_A-C up run_cmd ${ns_c} ip link set veth_C-A up @@ -944,9 +937,7 @@ cleanup() { done socat_pids=
- for n in ${NS_A} ${NS_B} ${NS_C} ${NS_R1} ${NS_R2}; do - ip netns del ${n} 2> /dev/null - done + cleanup_all_ns
ip link del veth_A-C 2>/dev/null ip link del veth_A-R1 2>/dev/null
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yujie Liu yujie.liu@intel.com
[ Upstream commit 05d92cb0e919239c29b3a26da1f76f1e18fed7d3 ]
The patch set [1] added a general lib.sh in net selftests, and converted several test scripts to source the lib.sh.
unicast_extensions.sh (converted in [1]) and pmtu.sh (converted in [2]) have a /bin/sh shebang which may point to various shells in different distributions, but "source" is only available in some of them. For example, "source" is a built-it function in bash, but it cannot be used in dash.
Refer to other scripts that were converted together, simply change the shebang to bash to fix the following issues when the default /bin/sh points to other shells.
not ok 51 selftests: net: unicast_extensions.sh # exit=1
v1 -> v2: - Fix pmtu.sh which has the same issue as unicast_extensions.sh, suggested by Hangbin - Change the style of the "source" line to be consistent with other tests, suggested by Hangbin
Link: https://lore.kernel.org/all/20231202020110.362433-1-liuhangbin@gmail.com/ [1] Link: https://lore.kernel.org/all/20231219094856.1740079-1-liuhangbin@gmail.com/ [2] Reported-by: kernel test robot oliver.sang@intel.com Fixes: 378f082eaf37 ("selftests/net: convert pmtu.sh to run it in unique namespace") Fixes: 0f4765d0b48d ("selftests/net: convert unicast_extensions.sh to run it in unique namespace") Signed-off-by: Yujie Liu yujie.liu@intel.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Muhammad Usama Anjum usama.anjum@collabora.com Link: https://lore.kernel.org/r/20231229131931.3961150-1-yujie.liu@intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: e71e016ad0f6 ("selftests: net: fix tcp listener handling in pmtu.sh") Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/pmtu.sh | 4 ++-- tools/testing/selftests/net/unicast_extensions.sh | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index 8518eaacf4b5..3f118e3f1c66 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -1,4 +1,4 @@ -#!/bin/sh +#!/bin/bash # SPDX-License-Identifier: GPL-2.0 # # Check that route PMTU values match expectations, and that initial device MTU @@ -198,7 +198,7 @@ # - pmtu_ipv6_route_change # Same as above but with IPv6
-source ./lib.sh +source lib.sh
PAUSE_ON_FAIL=no VERBOSE=0 diff --git a/tools/testing/selftests/net/unicast_extensions.sh b/tools/testing/selftests/net/unicast_extensions.sh index b7a2cb9e7477..f52aa5f7da52 100755 --- a/tools/testing/selftests/net/unicast_extensions.sh +++ b/tools/testing/selftests/net/unicast_extensions.sh @@ -1,4 +1,4 @@ -#!/bin/sh +#!/bin/bash # SPDX-License-Identifier: GPL-2.0 # # By Seth Schoen (c) 2021, for the IPv4 Unicast Extensions Project @@ -28,7 +28,7 @@ # These tests provide an easy way to flip the expected result of any # of these behaviors for testing kernel patches that change them.
-source ./lib.sh +source lib.sh
# nettest can be run from PATH or from same directory as this selftest if ! which nettest >/dev/null; then
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit e71e016ad0f6e641a7898b8cda5f62f8e2beb2f1 ]
The pmtu.sh test uses a few TCP listener in a problematic way: It hard-codes a constant timeout to wait for the listener starting-up in background. That introduces unneeded latency and on very slow and busy host it can fail.
Additionally the test starts again the same listener in the same namespace on the same port, just after the previous connection completed. Fast host can attempt starting the new server before the old one really closed the socket.
Address the issues using the wait_local_port_listen helper and explicitly waiting for the background listener process exit.
Fixes: 136a1b434bbb ("selftests: net: test vxlan pmtu exceptions with tcp") Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/f8e8f6d44427d8c45e9f6a71ee1a321047452087.170681200... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/pmtu.sh | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index 3f118e3f1c66..f0febc19baae 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -199,6 +199,7 @@ # Same as above but with IPv6
source lib.sh +source net_helper.sh
PAUSE_ON_FAIL=no VERBOSE=0 @@ -1336,13 +1337,15 @@ test_pmtu_ipvX_over_bridged_vxlanY_or_geneveY_exception() { TCPDST="TCP:[${dst}]:50000" fi ${ns_b} socat -T 3 -u -6 TCP-LISTEN:50000 STDOUT > $tmpoutfile & + local socat_pid=$!
- sleep 1 + wait_local_port_listen ${NS_B} 50000 tcp
dd if=/dev/zero status=none bs=1M count=1 | ${target} socat -T 3 -u STDIN $TCPDST,connect-timeout=3
size=$(du -sb $tmpoutfile) size=${size%%/tmp/*} + wait ${socat_pid}
[ $size -ne 1048576 ] && err "File size $size mismatches exepcted value in locally bridged vxlan test" && return 1 done
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 691bb4e49c98a47bc643dd808453136ce78b15b4 ]
Using hard-coded constant timeout to wait for some expected event is deemed to fail sooner or later, especially in slow env.
Our CI has spotted another of such race: # TEST: ipv6: cleanup of cached exceptions - nexthop objects [FAIL] # can't delete veth device in a timely manner, PMTU dst likely leaked
Replace the crude sleep with a loop looking for the expected condition at low interval for a much longer range.
Fixes: b3cc4f8a8a41 ("selftests: pmtu: add explicit tests for PMTU exceptions cleanup") Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/fd5c745e9bb665b724473af6a9373a8c2a62b247.170681200... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/pmtu.sh | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh index f0febc19baae..d65fdd407d73 100755 --- a/tools/testing/selftests/net/pmtu.sh +++ b/tools/testing/selftests/net/pmtu.sh @@ -1957,6 +1957,13 @@ check_command() { return 0 }
+check_running() { + pid=${1} + cmd=${2} + + [ "$(cat /proc/${pid}/cmdline 2>/dev/null | tr -d '\0')" = "{cmd}" ] +} + test_cleanup_vxlanX_exception() { outer="${1}" encap="vxlan" @@ -1987,11 +1994,12 @@ test_cleanup_vxlanX_exception() {
${ns_a} ip link del dev veth_A-R1 & iplink_pid=$! - sleep 1 - if [ "$(cat /proc/${iplink_pid}/cmdline 2>/dev/null | tr -d '\0')" = "iplinkdeldevveth_A-R1" ]; then - err " can't delete veth device in a timely manner, PMTU dst likely leaked" - return 1 - fi + for i in $(seq 1 20); do + check_running ${iplink_pid} "iplinkdeldevveth_A-R1" || return 0 + sleep 0.1 + done + err " can't delete veth device in a timely manner, PMTU dst likely leaked" + return 1 }
test_cleanup_ipv6_exception() {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gerhard Engleder gerhard@engleder-embedded.com
[ Upstream commit d7f5fb33cf77247b7bf9a871aaeea72ca4f51ad7 ]
For XDP_TX action xdp_buff is converted to xdp_frame. The conversion is done by xdp_convert_buff_to_frame(). The memory type of the resulting xdp_frame depends on the memory type of the xdp_buff. For page pool based xdp_buff it produces xdp_frame with memory type MEM_TYPE_PAGE_POOL. For zero copy XSK pool based xdp_buff it produces xdp_frame with memory type MEM_TYPE_PAGE_ORDER0.
tsnep_xdp_xmit_back() is not prepared for that and uses always the page pool buffer type TSNEP_TX_TYPE_XDP_TX. This leads to invalid mappings and the transmission of undefined data.
Improve tsnep_xdp_xmit_back() to use the generic buffer type TSNEP_TX_TYPE_XDP_NDO for zero copy XDP_TX.
Fixes: 3fc2333933fd ("tsnep: Add XDP socket zero-copy RX support") Signed-off-by: Gerhard Engleder gerhard@engleder-embedded.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/engleder/tsnep_main.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c index 08e113e785a7..4f36b29d66c8 100644 --- a/drivers/net/ethernet/engleder/tsnep_main.c +++ b/drivers/net/ethernet/engleder/tsnep_main.c @@ -668,17 +668,25 @@ static void tsnep_xdp_xmit_flush(struct tsnep_tx *tx)
static bool tsnep_xdp_xmit_back(struct tsnep_adapter *adapter, struct xdp_buff *xdp, - struct netdev_queue *tx_nq, struct tsnep_tx *tx) + struct netdev_queue *tx_nq, struct tsnep_tx *tx, + bool zc) { struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp); bool xmit; + u32 type;
if (unlikely(!xdpf)) return false;
+ /* no page pool for zero copy */ + if (zc) + type = TSNEP_TX_TYPE_XDP_NDO; + else + type = TSNEP_TX_TYPE_XDP_TX; + __netif_tx_lock(tx_nq, smp_processor_id());
- xmit = tsnep_xdp_xmit_frame_ring(xdpf, tx, TSNEP_TX_TYPE_XDP_TX); + xmit = tsnep_xdp_xmit_frame_ring(xdpf, tx, type);
/* Avoid transmit queue timeout since we share it with the slow path */ if (xmit) @@ -1222,7 +1230,7 @@ static bool tsnep_xdp_run_prog(struct tsnep_rx *rx, struct bpf_prog *prog, case XDP_PASS: return false; case XDP_TX: - if (!tsnep_xdp_xmit_back(rx->adapter, xdp, tx_nq, tx)) + if (!tsnep_xdp_xmit_back(rx->adapter, xdp, tx_nq, tx, false)) goto out_failure; *status |= TSNEP_XDP_TX; return true; @@ -1272,7 +1280,7 @@ static bool tsnep_xdp_run_prog_zc(struct tsnep_rx *rx, struct bpf_prog *prog, case XDP_PASS: return false; case XDP_TX: - if (!tsnep_xdp_xmit_back(rx->adapter, xdp, tx_nq, tx)) + if (!tsnep_xdp_xmit_back(rx->adapter, xdp, tx_nq, tx, true)) goto out_failure; *status |= TSNEP_XDP_TX; return true;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antoine Tenart atenart@kernel.org
[ Upstream commit d75abeec401f8c86b470e7028a13fcdc87e5dd06 ]
If the ICMPv6 error is built from a non-linear skb we get the following splat,
BUG: KASAN: slab-out-of-bounds in do_csum+0x220/0x240 Read of size 4 at addr ffff88811d402c80 by task netperf/820 CPU: 0 PID: 820 Comm: netperf Not tainted 6.8.0-rc1+ #543 ... kasan_report+0xd8/0x110 do_csum+0x220/0x240 csum_partial+0xc/0x20 skb_tunnel_check_pmtu+0xeb9/0x3280 vxlan_xmit_one+0x14c2/0x4080 vxlan_xmit+0xf61/0x5c00 dev_hard_start_xmit+0xfb/0x510 __dev_queue_xmit+0x7cd/0x32a0 br_dev_queue_push_xmit+0x39d/0x6a0
Use skb_checksum instead of csum_partial who cannot deal with non-linear SKBs.
Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Antoine Tenart atenart@kernel.org Reviewed-by: Jiri Pirko jiri@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/ip_tunnel_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c index 586b1b3e35b8..80ccd6661aa3 100644 --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -332,7 +332,7 @@ static int iptunnel_pmtud_build_icmpv6(struct sk_buff *skb, int mtu) }; skb_reset_network_header(skb);
- csum = csum_partial(icmp6h, len, 0); + csum = skb_checksum(skb, skb_transport_offset(skb), len, 0); icmp6h->icmp6_cksum = csum_ipv6_magic(&nip6h->saddr, &nip6h->daddr, len, IPPROTO_ICMPV6, csum);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhipeng Lu alexious@zju.edu.cn
[ Upstream commit f3616173bf9be9bf39d131b120d6eea4e6324cb5 ]
When alloc_scq fails, card->vcs[0] (i.e. vc) should be freed. Otherwise, in the following call chain:
idt77252_init_one |-> idt77252_dev_open |-> open_card_ubr0 |-> alloc_scq [failed] |-> deinit_card |-> vfree(card->vcs);
card->vcs is freed and card->vcs[0] is leaked.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Zhipeng Lu alexious@zju.edu.cn Reviewed-by: Jiri Pirko jiri@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/atm/idt77252.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c index e327a0229dc1..e7f713cd70d3 100644 --- a/drivers/atm/idt77252.c +++ b/drivers/atm/idt77252.c @@ -2930,6 +2930,8 @@ open_card_ubr0(struct idt77252_dev *card) vc->scq = alloc_scq(card, vc->class); if (!vc->scq) { printk("%s: can't get SCQ.\n", card->name); + kfree(card->vcs[0]); + card->vcs[0] = NULL; return -ENOMEM; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhipeng Lu alexious@zju.edu.cn
[ Upstream commit b09b58e31b0f43d76f79b9943da3fb7c2843dcbb ]
When qmem_alloc and pfvf->hw_ops->sq_aq_init fails, sq->sg should be freed to prevent memleak.
Fixes: c9c12d339d93 ("octeontx2-pf: Add support for PTP clock") Signed-off-by: Zhipeng Lu alexious@zju.edu.cn Acked-by: Jiri Pirko jiri@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/marvell/octeontx2/nic/otx2_common.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c index 629cf1659e5f..e6df4e6a78ab 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c @@ -951,8 +951,11 @@ int otx2_sq_init(struct otx2_nic *pfvf, u16 qidx, u16 sqb_aura) if (pfvf->ptp && qidx < pfvf->hw.tx_queues) { err = qmem_alloc(pfvf->dev, &sq->timestamps, qset->sqe_cnt, sizeof(*sq->timestamps)); - if (err) + if (err) { + kfree(sq->sg); + sq->sg = NULL; return err; + } }
sq->head = 0; @@ -968,7 +971,14 @@ int otx2_sq_init(struct otx2_nic *pfvf, u16 qidx, u16 sqb_aura) sq->stats.bytes = 0; sq->stats.pkts = 0;
- return pfvf->hw_ops->sq_aq_init(pfvf, qidx, sqb_aura); + err = pfvf->hw_ops->sq_aq_init(pfvf, qidx, sqb_aura); + if (err) { + kfree(sq->sg); + sq->sg = NULL; + return err; + } + + return 0;
}
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Loic Prylli lprylli@netflix.com
[ Upstream commit 1168491e7f53581ba7b6014a39a49cfbbb722feb ]
the ASPEED_PTCR_RESULT Register can only hold the result for a single fan input. Adding a mutex to protect the register until the reading is done.
Signed-off-by: Loic Prylli lprylli@netflix.com Signed-off-by: Alexander Hansen alexander.hansen@9elements.com Fixes: 2d7a548a3eff ("drivers: hwmon: Support for ASPEED PWM/Fan tach") Link: https://lore.kernel.org/r/121d888762a1232ef403cf35230ccf7b3887083a.169900740... Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/aspeed-pwm-tacho.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/hwmon/aspeed-pwm-tacho.c b/drivers/hwmon/aspeed-pwm-tacho.c index 997df4b40509..b2ae2176f11f 100644 --- a/drivers/hwmon/aspeed-pwm-tacho.c +++ b/drivers/hwmon/aspeed-pwm-tacho.c @@ -193,6 +193,8 @@ struct aspeed_pwm_tacho_data { u8 fan_tach_ch_source[16]; struct aspeed_cooling_device *cdev[8]; const struct attribute_group *groups[3]; + /* protects access to shared ASPEED_PTCR_RESULT */ + struct mutex tach_lock; };
enum type { TYPEM, TYPEN, TYPEO }; @@ -527,6 +529,8 @@ static int aspeed_get_fan_tach_ch_rpm(struct aspeed_pwm_tacho_data *priv, u8 fan_tach_ch_source, type, mode, both; int ret;
+ mutex_lock(&priv->tach_lock); + regmap_write(priv->regmap, ASPEED_PTCR_TRIGGER, 0); regmap_write(priv->regmap, ASPEED_PTCR_TRIGGER, 0x1 << fan_tach_ch);
@@ -544,6 +548,8 @@ static int aspeed_get_fan_tach_ch_rpm(struct aspeed_pwm_tacho_data *priv, ASPEED_RPM_STATUS_SLEEP_USEC, usec);
+ mutex_unlock(&priv->tach_lock); + /* return -ETIMEDOUT if we didn't get an answer. */ if (ret) return ret; @@ -903,6 +909,7 @@ static int aspeed_pwm_tacho_probe(struct platform_device *pdev) priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL); if (!priv) return -ENOMEM; + mutex_init(&priv->tach_lock); priv->regmap = devm_regmap_init(dev, NULL, (__force void *)regs, &aspeed_pwm_tacho_regmap_config); if (IS_ERR(priv->regmap))
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Rui rui.zhang@intel.com
[ Upstream commit 4e440abc894585a34c2904a32cd54af1742311b3 ]
Fix a bug that pdata->cpu_map[] is set before out-of-bounds check. The problem might be triggered on systems with more than 128 cores per package.
Fixes: 7108b80a542b ("hwmon/coretemp: Handle large core ID value") Signed-off-by: Zhang Rui rui.zhang@intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240202092144.71180-2-rui.zhang@intel.com Signed-off-by: Guenter Roeck linux@roeck-us.net Stable-dep-of: fdaf0c8629d4 ("hwmon: (coretemp) Fix bogus core_id to attr name mapping") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/coretemp.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c index ba82d1e79c13..e78c76919111 100644 --- a/drivers/hwmon/coretemp.c +++ b/drivers/hwmon/coretemp.c @@ -509,18 +509,14 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu, if (pkg_flag) { attr_no = PKG_SYSFS_ATTR_NO; } else { - index = ida_alloc(&pdata->ida, GFP_KERNEL); + index = ida_alloc_max(&pdata->ida, NUM_REAL_CORES - 1, GFP_KERNEL); if (index < 0) return index; + pdata->cpu_map[index] = topology_core_id(cpu); attr_no = index + BASE_SYSFS_ATTR_NO; }
- if (attr_no > MAX_CORE_DATA - 1) { - err = -ERANGE; - goto ida_free; - } - tdata = init_temp_data(cpu, pkg_flag); if (!tdata) { err = -ENOMEM;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Rui rui.zhang@intel.com
[ Upstream commit fdaf0c8629d4524a168cb9e4ad4231875749b28c ]
Before commit 7108b80a542b ("hwmon/coretemp: Handle large core ID value"), there is a fixed mapping between 1. cpu_core_id 2. the index in pdata->core_data[] array 3. the sysfs attr name, aka "tempX_" The later two always equal cpu_core_id + 2.
After the commit, pdata->core_data[] index is got from ida so that it can handle sparse core ids and support more cores within a package.
However, the commit erroneously maps the sysfs attr name to pdata->core_data[] index instead of cpu_core_id + 2.
As a result, the code is not aligned with the comments, and brings user visible changes in hwmon sysfs on systems with sparse core id.
For example, before commit 7108b80a542b ("hwmon/coretemp: Handle large core ID value"), /sys/class/hwmon/hwmon2/temp2_label:Core 0 /sys/class/hwmon/hwmon2/temp3_label:Core 1 /sys/class/hwmon/hwmon2/temp4_label:Core 2 /sys/class/hwmon/hwmon2/temp5_label:Core 3 /sys/class/hwmon/hwmon2/temp6_label:Core 4 /sys/class/hwmon/hwmon3/temp10_label:Core 8 /sys/class/hwmon/hwmon3/temp11_label:Core 9 after commit, /sys/class/hwmon/hwmon2/temp2_label:Core 0 /sys/class/hwmon/hwmon2/temp3_label:Core 1 /sys/class/hwmon/hwmon2/temp4_label:Core 2 /sys/class/hwmon/hwmon2/temp5_label:Core 3 /sys/class/hwmon/hwmon2/temp6_label:Core 4 /sys/class/hwmon/hwmon2/temp7_label:Core 8 /sys/class/hwmon/hwmon2/temp8_label:Core 9
Restore the previous behavior and rework the code, comments and variable names to avoid future confusions.
Fixes: 7108b80a542b ("hwmon/coretemp: Handle large core ID value") Signed-off-by: Zhang Rui rui.zhang@intel.com Link: https://lore.kernel.org/r/20240202092144.71180-3-rui.zhang@intel.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/coretemp.c | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c index e78c76919111..95f4c0b00b2d 100644 --- a/drivers/hwmon/coretemp.c +++ b/drivers/hwmon/coretemp.c @@ -419,7 +419,7 @@ static ssize_t show_temp(struct device *dev, }
static int create_core_attrs(struct temp_data *tdata, struct device *dev, - int attr_no) + int index) { int i; static ssize_t (*const rd_ptr[TOTAL_ATTRS]) (struct device *dev, @@ -431,13 +431,20 @@ static int create_core_attrs(struct temp_data *tdata, struct device *dev, };
for (i = 0; i < tdata->attr_size; i++) { + /* + * We map the attr number to core id of the CPU + * The attr number is always core id + 2 + * The Pkgtemp will always show up as temp1_*, if available + */ + int attr_no = tdata->is_pkg_data ? 1 : tdata->cpu_core_id + 2; + snprintf(tdata->attr_name[i], CORETEMP_NAME_LENGTH, "temp%d_%s", attr_no, suffixes[i]); sysfs_attr_init(&tdata->sd_attrs[i].dev_attr.attr); tdata->sd_attrs[i].dev_attr.attr.name = tdata->attr_name[i]; tdata->sd_attrs[i].dev_attr.attr.mode = 0444; tdata->sd_attrs[i].dev_attr.show = rd_ptr[i]; - tdata->sd_attrs[i].index = attr_no; + tdata->sd_attrs[i].index = index; tdata->attrs[i] = &tdata->sd_attrs[i].dev_attr.attr; } tdata->attr_group.attrs = tdata->attrs; @@ -495,26 +502,25 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu, struct platform_data *pdata = platform_get_drvdata(pdev); struct cpuinfo_x86 *c = &cpu_data(cpu); u32 eax, edx; - int err, index, attr_no; + int err, index;
if (!housekeeping_cpu(cpu, HK_TYPE_MISC)) return 0;
/* - * Find attr number for sysfs: - * We map the attr number to core id of the CPU - * The attr number is always core id + 2 - * The Pkgtemp will always show up as temp1_*, if available + * Get the index of tdata in pdata->core_data[] + * tdata for package: pdata->core_data[1] + * tdata for core: pdata->core_data[2] .. pdata->core_data[NUM_REAL_CORES + 1] */ if (pkg_flag) { - attr_no = PKG_SYSFS_ATTR_NO; + index = PKG_SYSFS_ATTR_NO; } else { index = ida_alloc_max(&pdata->ida, NUM_REAL_CORES - 1, GFP_KERNEL); if (index < 0) return index;
pdata->cpu_map[index] = topology_core_id(cpu); - attr_no = index + BASE_SYSFS_ATTR_NO; + index += BASE_SYSFS_ATTR_NO; }
tdata = init_temp_data(cpu, pkg_flag); @@ -540,20 +546,20 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu, if (get_ttarget(tdata, &pdev->dev) >= 0) tdata->attr_size++;
- pdata->core_data[attr_no] = tdata; + pdata->core_data[index] = tdata;
/* Create sysfs interfaces */ - err = create_core_attrs(tdata, pdata->hwmon_dev, attr_no); + err = create_core_attrs(tdata, pdata->hwmon_dev, index); if (err) goto exit_free;
return 0; exit_free: - pdata->core_data[attr_no] = NULL; + pdata->core_data[index] = NULL; kfree(tdata); ida_free: if (!pkg_flag) - ida_free(&pdata->ida, index); + ida_free(&pdata->ida, index - BASE_SYSFS_ATTR_NO); return err; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit eef00a82c568944f113f2de738156ac591bbd5cd ]
inet_recv_error() is called without holding the socket lock.
IPv6 socket could mutate to IPv4 with IPV6_ADDRFORM socket option and trigger a KCSAN warning.
Fixes: f4713a3dfad0 ("net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Willem de Bruijn willemb@google.com Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/af_inet.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 1c58bd72e124..e59962f34caa 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1628,10 +1628,12 @@ EXPORT_SYMBOL(inet_current_timestamp);
int inet_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len) { - if (sk->sk_family == AF_INET) + unsigned int family = READ_ONCE(sk->sk_family); + + if (family == AF_INET) return ip_recv_error(sk, msg, len, addr_len); #if IS_ENABLED(CONFIG_IPV6) - if (sk->sk_family == AF_INET6) + if (family == AF_INET6) return pingv6_ops.ipv6_recv_error(sk, msg, len, addr_len); #endif return -EINVAL;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 47caa96478b99d6d1199b89467cc3e5a6cc754ee ]
This code prints the wrong variable in the warning message. It should print "i" instead of "info->offset". On the first iteration "info" is uninitialized leading to a crash and on subsequent iterations it prints the previous offset instead of the current one.
Fixes: e0f74ed4634d ("i915/gvt: Separate the MMIO tracking table from GVT-g") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com Link: http://patchwork.freedesktop.org/patch/msgid/11957c20-b178-4027-9b0a-e32e959... Reviewed-by: Zhenyu Wang zhenyuw@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gvt/handlers.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c index a9f7fa9b90bd..d30f8814d9b1 100644 --- a/drivers/gpu/drm/i915/gvt/handlers.c +++ b/drivers/gpu/drm/i915/gvt/handlers.c @@ -2850,8 +2850,7 @@ static int handle_mmio(struct intel_gvt_mmio_table_iter *iter, u32 offset, for (i = start; i < end; i += 4) { p = intel_gvt_find_mmio_info(gvt, i); if (p) { - WARN(1, "dup mmio definition offset %x\n", - info->offset); + WARN(1, "dup mmio definition offset %x\n", i);
/* We return -EEXIST here to make GVT-g load fail. * So duplicated MMIO can be found as soon as
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit f31041417bf7f4a4df8b3bfb52cb31bbe805b934 ]
In the Rx protocol, every packet generated is marked with a per-connection monotonically increasing serial number. This number can be referenced in an ACK packet generated in response to an incoming packet - thereby allowing the sender to use this for RTT determination, amongst other things.
However, if the reference field in the ACK is zero, it doesn't refer to any incoming packet (it could be a ping to find out if a packet got lost, for example) - so we shouldn't generate zero serial numbers.
Fix the generation of serial numbers to retry if it comes up with a zero.
Furthermore, since the serial numbers are only ever allocated within the I/O thread this connection is bound to, there's no need for atomics so remove that too.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells dhowells@redhat.com cc: Marc Dionne marc.dionne@auristor.com cc: "David S. Miller" davem@davemloft.net cc: Eric Dumazet edumazet@google.com cc: Jakub Kicinski kuba@kernel.org cc: Paolo Abeni pabeni@redhat.com cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/ar-internal.h | 16 +++++++++++++++- net/rxrpc/conn_event.c | 2 +- net/rxrpc/output.c | 8 ++++---- net/rxrpc/proc.c | 2 +- net/rxrpc/rxkad.c | 4 ++-- 5 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index e8b43408136a..668fdc94b299 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -506,7 +506,7 @@ struct rxrpc_connection { enum rxrpc_call_completion completion; /* Completion condition */ s32 abort_code; /* Abort code of connection abort */ int debug_id; /* debug ID for printks */ - atomic_t serial; /* packet serial number counter */ + rxrpc_serial_t tx_serial; /* Outgoing packet serial number counter */ unsigned int hi_serial; /* highest serial number received */ u32 service_id; /* Service ID, possibly upgraded */ u32 security_level; /* Security level selected */ @@ -818,6 +818,20 @@ static inline bool rxrpc_sending_to_client(const struct rxrpc_txbuf *txb)
#include <trace/events/rxrpc.h>
+/* + * Allocate the next serial number on a connection. 0 must be skipped. + */ +static inline rxrpc_serial_t rxrpc_get_next_serial(struct rxrpc_connection *conn) +{ + rxrpc_serial_t serial; + + serial = conn->tx_serial; + if (serial == 0) + serial = 1; + conn->tx_serial = serial + 1; + return serial; +} + /* * af_rxrpc.c */ diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index 95f4bc206b3d..ec5eae60ab0c 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -117,7 +117,7 @@ void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, iov[2].iov_base = &ack_info; iov[2].iov_len = sizeof(ack_info);
- serial = atomic_inc_return(&conn->serial); + serial = rxrpc_get_next_serial(conn);
pkt.whdr.epoch = htonl(conn->proto.epoch); pkt.whdr.cid = htonl(conn->proto.cid | channel); diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index a0906145e829..4a292f860ae3 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -216,7 +216,7 @@ int rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) iov[0].iov_len = sizeof(txb->wire) + sizeof(txb->ack) + n; len = iov[0].iov_len;
- serial = atomic_inc_return(&conn->serial); + serial = rxrpc_get_next_serial(conn); txb->wire.serial = htonl(serial); trace_rxrpc_tx_ack(call->debug_id, serial, ntohl(txb->ack.firstPacket), @@ -302,7 +302,7 @@ int rxrpc_send_abort_packet(struct rxrpc_call *call) iov[0].iov_base = &pkt; iov[0].iov_len = sizeof(pkt);
- serial = atomic_inc_return(&conn->serial); + serial = rxrpc_get_next_serial(conn); pkt.whdr.serial = htonl(serial);
iov_iter_kvec(&msg.msg_iter, WRITE, iov, 1, sizeof(pkt)); @@ -334,7 +334,7 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) _enter("%x,{%d}", txb->seq, txb->len);
/* Each transmission of a Tx packet needs a new serial number */ - serial = atomic_inc_return(&conn->serial); + serial = rxrpc_get_next_serial(conn); txb->wire.serial = htonl(serial);
if (test_bit(RXRPC_CONN_PROBING_FOR_UPGRADE, &conn->flags) && @@ -558,7 +558,7 @@ void rxrpc_send_conn_abort(struct rxrpc_connection *conn)
len = iov[0].iov_len + iov[1].iov_len;
- serial = atomic_inc_return(&conn->serial); + serial = rxrpc_get_next_serial(conn); whdr.serial = htonl(serial);
iov_iter_kvec(&msg.msg_iter, WRITE, iov, 2, len); diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c index 682636d3b060..208312c244f6 100644 --- a/net/rxrpc/proc.c +++ b/net/rxrpc/proc.c @@ -181,7 +181,7 @@ static int rxrpc_connection_seq_show(struct seq_file *seq, void *v) atomic_read(&conn->active), state, key_serial(conn->key), - atomic_read(&conn->serial), + conn->tx_serial, conn->hi_serial, conn->channels[0].call_id, conn->channels[1].call_id, diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c index b52dedcebce0..6b32d61d4cdc 100644 --- a/net/rxrpc/rxkad.c +++ b/net/rxrpc/rxkad.c @@ -664,7 +664,7 @@ static int rxkad_issue_challenge(struct rxrpc_connection *conn)
len = iov[0].iov_len + iov[1].iov_len;
- serial = atomic_inc_return(&conn->serial); + serial = rxrpc_get_next_serial(conn); whdr.serial = htonl(serial);
ret = kernel_sendmsg(conn->local->socket, &msg, iov, 2, len); @@ -721,7 +721,7 @@ static int rxkad_send_response(struct rxrpc_connection *conn,
len = iov[0].iov_len + iov[1].iov_len + iov[2].iov_len;
- serial = atomic_inc_return(&conn->serial); + serial = rxrpc_get_next_serial(conn); whdr.serial = htonl(serial);
rxrpc_local_dont_fragment(conn->local, false);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit e7870cf13d20f56bfc19f9c3e89707c69cf104ef ]
Fix the construction of delayed ACKs to not set the reference serial number as they can't be used as an RTT reference.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells dhowells@redhat.com cc: Marc Dionne marc.dionne@auristor.com cc: "David S. Miller" davem@davemloft.net cc: Eric Dumazet edumazet@google.com cc: Jakub Kicinski kuba@kernel.org cc: Paolo Abeni pabeni@redhat.com cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/ar-internal.h | 1 - net/rxrpc/call_event.c | 6 +----- 2 files changed, 1 insertion(+), 6 deletions(-)
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 668fdc94b299..f6375772fa93 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -692,7 +692,6 @@ struct rxrpc_call { /* Receive-phase ACK management (ACKs we send). */ u8 ackr_reason; /* reason to ACK */ u16 ackr_sack_base; /* Starting slot in SACK table ring */ - rxrpc_serial_t ackr_serial; /* serial of packet being ACK'd */ rxrpc_seq_t ackr_window; /* Base of SACK window */ rxrpc_seq_t ackr_wtop; /* Base of SACK window */ unsigned int ackr_nr_unacked; /* Number of unacked packets */ diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index e363f21a2014..c61efe08695d 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -43,8 +43,6 @@ void rxrpc_propose_delay_ACK(struct rxrpc_call *call, rxrpc_serial_t serial, unsigned long expiry = rxrpc_soft_ack_delay; unsigned long now = jiffies, ack_at;
- call->ackr_serial = serial; - if (rxrpc_soft_ack_delay < expiry) expiry = rxrpc_soft_ack_delay; if (call->peer->srtt_us != 0) @@ -373,7 +371,6 @@ static void rxrpc_send_initial_ping(struct rxrpc_call *call) bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb) { unsigned long now, next, t; - rxrpc_serial_t ackr_serial; bool resend = false, expired = false; s32 abort_code;
@@ -423,8 +420,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb) if (time_after_eq(now, t)) { trace_rxrpc_timer(call, rxrpc_timer_exp_ack, now); cmpxchg(&call->delay_ack_at, t, now + MAX_JIFFY_OFFSET); - ackr_serial = xchg(&call->ackr_serial, 0); - rxrpc_send_ACK(call, RXRPC_ACK_DELAY, ackr_serial, + rxrpc_send_ACK(call, RXRPC_ACK_DELAY, 0, rxrpc_propose_ack_ping_for_lost_ack); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit 6f769f22822aa4124b556339781b04d810f0e038 ]
Stop rxrpc from sending a DUP ACK in response to a PING RESPONSE ACK on a dead call. We may have initiated the ping but the call may have beaten the response to completion.
Fixes: 18bfeba50dfd ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor") Signed-off-by: David Howells dhowells@redhat.com cc: Marc Dionne marc.dionne@auristor.com cc: "David S. Miller" davem@davemloft.net cc: Eric Dumazet edumazet@google.com cc: Jakub Kicinski kuba@kernel.org cc: Paolo Abeni pabeni@redhat.com cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/conn_event.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index ec5eae60ab0c..1f251d758cb9 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -95,6 +95,14 @@ void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn,
_enter("%d", conn->debug_id);
+ if (sp && sp->hdr.type == RXRPC_PACKET_TYPE_ACK) { + if (skb_copy_bits(skb, sizeof(struct rxrpc_wire_header), + &pkt.ack, sizeof(pkt.ack)) < 0) + return; + if (pkt.ack.reason == RXRPC_ACK_PING_RESPONSE) + return; + } + chan = &conn->channels[channel];
/* If the last call got moved on whilst we were waiting to run, just
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit 41b7fa157ea1c8c3a575ca7f5f32034de9bee3ae ]
Fix the counting of new acks and nacks when parsing a packet - something that is used in congestion control.
As the code stands, it merely notes if there are any nacks whereas what we really should do is compare the previous SACK table to the new one, assuming we get two successive ACK packets with nacks in them. However, we really don't want to do that if we can avoid it as the tables might not correspond directly as one may be shifted from the other - something that will only get harder to deal with once extended ACK tables come into full use (with a capacity of up to 8192).
Instead, count the number of nacks shifted out of the old SACK, the number of nacks retained in the portion still active and the number of new acks and nacks in the new table then calculate what we need.
Note this ends up a bit of an estimate as the Rx protocol allows acks to be withdrawn by the receiver and packets requested to be retransmitted.
Fixes: d57a3a151660 ("rxrpc: Save last ACK's SACK table rather than marking txbufs") Signed-off-by: David Howells dhowells@redhat.com cc: Marc Dionne marc.dionne@auristor.com cc: "David S. Miller" davem@davemloft.net cc: Eric Dumazet edumazet@google.com cc: Jakub Kicinski kuba@kernel.org cc: Paolo Abeni pabeni@redhat.com cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/trace/events/rxrpc.h | 8 ++- net/rxrpc/ar-internal.h | 20 ++++-- net/rxrpc/call_event.c | 6 +- net/rxrpc/call_object.c | 1 + net/rxrpc/input.c | 115 +++++++++++++++++++++++++++++------ 5 files changed, 122 insertions(+), 28 deletions(-)
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index f7e537f64db4..0dd4a21d172d 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -128,6 +128,7 @@ EM(rxrpc_skb_eaten_by_unshare_nomem, "ETN unshar-nm") \ EM(rxrpc_skb_get_conn_secured, "GET conn-secd") \ EM(rxrpc_skb_get_conn_work, "GET conn-work") \ + EM(rxrpc_skb_get_last_nack, "GET last-nack") \ EM(rxrpc_skb_get_local_work, "GET locl-work") \ EM(rxrpc_skb_get_reject_work, "GET rej-work ") \ EM(rxrpc_skb_get_to_recvmsg, "GET to-recv ") \ @@ -141,6 +142,7 @@ EM(rxrpc_skb_put_error_report, "PUT error-rep") \ EM(rxrpc_skb_put_input, "PUT input ") \ EM(rxrpc_skb_put_jumbo_subpacket, "PUT jumbo-sub") \ + EM(rxrpc_skb_put_last_nack, "PUT last-nack") \ EM(rxrpc_skb_put_purge, "PUT purge ") \ EM(rxrpc_skb_put_rotate, "PUT rotate ") \ EM(rxrpc_skb_put_unknown, "PUT unknown ") \ @@ -1549,7 +1551,7 @@ TRACE_EVENT(rxrpc_congest, memcpy(&__entry->sum, summary, sizeof(__entry->sum)); ),
- TP_printk("c=%08x r=%08x %s q=%08x %s cw=%u ss=%u nA=%u,%u+%u r=%u b=%u u=%u d=%u l=%x%s%s%s", + TP_printk("c=%08x r=%08x %s q=%08x %s cw=%u ss=%u nA=%u,%u+%u,%u b=%u u=%u d=%u l=%x%s%s%s", __entry->call, __entry->ack_serial, __print_symbolic(__entry->sum.ack_reason, rxrpc_ack_names), @@ -1557,9 +1559,9 @@ TRACE_EVENT(rxrpc_congest, __print_symbolic(__entry->sum.mode, rxrpc_congest_modes), __entry->sum.cwnd, __entry->sum.ssthresh, - __entry->sum.nr_acks, __entry->sum.saw_nacks, + __entry->sum.nr_acks, __entry->sum.nr_retained_nacks, __entry->sum.nr_new_acks, - __entry->sum.nr_rot_new_acks, + __entry->sum.nr_new_nacks, __entry->top - __entry->hard_ack, __entry->sum.cumulative_acks, __entry->sum.dup_acks, diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index f6375772fa93..bda3f6690b32 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -198,11 +198,19 @@ struct rxrpc_host_header { */ struct rxrpc_skb_priv { struct rxrpc_connection *conn; /* Connection referred to (poke packet) */ - u16 offset; /* Offset of data */ - u16 len; /* Length of data */ - u8 flags; + union { + struct { + u16 offset; /* Offset of data */ + u16 len; /* Length of data */ + u8 flags; #define RXRPC_RX_VERIFIED 0x01 - + }; + struct { + rxrpc_seq_t first_ack; /* First packet in acks table */ + u8 nr_acks; /* Number of acks+nacks */ + u8 nr_nacks; /* Number of nacks */ + }; + }; struct rxrpc_host_header hdr; /* RxRPC packet header from this packet */ };
@@ -688,6 +696,7 @@ struct rxrpc_call { u8 cong_dup_acks; /* Count of ACKs showing missing packets */ u8 cong_cumul_acks; /* Cumulative ACK count */ ktime_t cong_tstamp; /* Last time cwnd was changed */ + struct sk_buff *cong_last_nack; /* Last ACK with nacks received */
/* Receive-phase ACK management (ACKs we send). */ u8 ackr_reason; /* reason to ACK */ @@ -725,7 +734,8 @@ struct rxrpc_call { struct rxrpc_ack_summary { u16 nr_acks; /* Number of ACKs in packet */ u16 nr_new_acks; /* Number of new ACKs in packet */ - u16 nr_rot_new_acks; /* Number of rotated new ACKs */ + u16 nr_new_nacks; /* Number of new nacks in packet */ + u16 nr_retained_nacks; /* Number of nacks retained between ACKs */ u8 ack_reason; bool saw_nacks; /* Saw NACKs in packet */ bool new_low_nack; /* T if new low NACK found */ diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index c61efe08695d..0f78544d043b 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -112,6 +112,7 @@ static void rxrpc_congestion_timeout(struct rxrpc_call *call) void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb) { struct rxrpc_ackpacket *ack = NULL; + struct rxrpc_skb_priv *sp; struct rxrpc_txbuf *txb; unsigned long resend_at; rxrpc_seq_t transmitted = READ_ONCE(call->tx_transmitted); @@ -139,14 +140,15 @@ void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb) * explicitly NAK'd packets. */ if (ack_skb) { + sp = rxrpc_skb(ack_skb); ack = (void *)ack_skb->data + sizeof(struct rxrpc_wire_header);
- for (i = 0; i < ack->nAcks; i++) { + for (i = 0; i < sp->nr_acks; i++) { rxrpc_seq_t seq;
if (ack->acks[i] & 1) continue; - seq = ntohl(ack->firstPacket) + i; + seq = sp->first_ack + i; if (after(txb->seq, transmitted)) break; if (after(txb->seq, seq)) diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index f10b37c14772..0a50341d920a 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -685,6 +685,7 @@ static void rxrpc_destroy_call(struct work_struct *work)
del_timer_sync(&call->timer);
+ rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack); rxrpc_cleanup_ring(call); while ((txb = list_first_entry_or_null(&call->tx_sendmsg, struct rxrpc_txbuf, call_link))) { diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 92495e73b869..9691de00ade7 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -45,11 +45,9 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, }
cumulative_acks += summary->nr_new_acks; - cumulative_acks += summary->nr_rot_new_acks; if (cumulative_acks > 255) cumulative_acks = 255;
- summary->mode = call->cong_mode; summary->cwnd = call->cong_cwnd; summary->ssthresh = call->cong_ssthresh; summary->cumulative_acks = cumulative_acks; @@ -151,6 +149,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call, cwnd = RXRPC_TX_MAX_WINDOW; call->cong_cwnd = cwnd; call->cong_cumul_acks = cumulative_acks; + summary->mode = call->cong_mode; trace_rxrpc_congest(call, summary, acked_serial, change); if (resend) rxrpc_resend(call, skb); @@ -213,7 +212,6 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, list_for_each_entry_rcu(txb, &call->tx_buffer, call_link, false) { if (before_eq(txb->seq, call->acks_hard_ack)) continue; - summary->nr_rot_new_acks++; if (test_bit(RXRPC_TXBUF_LAST, &txb->flags)) { set_bit(RXRPC_CALL_TX_LAST, &call->flags); rot_last = true; @@ -254,6 +252,11 @@ static void rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun, { ASSERT(test_bit(RXRPC_CALL_TX_LAST, &call->flags));
+ if (unlikely(call->cong_last_nack)) { + rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack); + call->cong_last_nack = NULL; + } + switch (__rxrpc_call_state(call)) { case RXRPC_CALL_CLIENT_SEND_REQUEST: case RXRPC_CALL_CLIENT_AWAIT_REPLY: @@ -702,6 +705,43 @@ static void rxrpc_input_ackinfo(struct rxrpc_call *call, struct sk_buff *skb, wake_up(&call->waitq); }
+/* + * Determine how many nacks from the previous ACK have now been satisfied. + */ +static rxrpc_seq_t rxrpc_input_check_prev_ack(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary, + rxrpc_seq_t seq) +{ + struct sk_buff *skb = call->cong_last_nack; + struct rxrpc_ackpacket ack; + struct rxrpc_skb_priv *sp = rxrpc_skb(skb); + unsigned int i, new_acks = 0, retained_nacks = 0; + rxrpc_seq_t old_seq = sp->first_ack; + u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(ack); + + if (after_eq(seq, old_seq + sp->nr_acks)) { + summary->nr_new_acks += sp->nr_nacks; + summary->nr_new_acks += seq - (old_seq + sp->nr_acks); + summary->nr_retained_nacks = 0; + } else if (seq == old_seq) { + summary->nr_retained_nacks = sp->nr_nacks; + } else { + for (i = 0; i < sp->nr_acks; i++) { + if (acks[i] == RXRPC_ACK_TYPE_NACK) { + if (before(old_seq + i, seq)) + new_acks++; + else + retained_nacks++; + } + } + + summary->nr_new_acks += new_acks; + summary->nr_retained_nacks = retained_nacks; + } + + return old_seq + sp->nr_acks; +} + /* * Process individual soft ACKs. * @@ -711,25 +751,51 @@ static void rxrpc_input_ackinfo(struct rxrpc_call *call, struct sk_buff *skb, * the timer on the basis that the peer might just not have processed them at * the time the ACK was sent. */ -static void rxrpc_input_soft_acks(struct rxrpc_call *call, u8 *acks, - rxrpc_seq_t seq, int nr_acks, - struct rxrpc_ack_summary *summary) +static void rxrpc_input_soft_acks(struct rxrpc_call *call, + struct rxrpc_ack_summary *summary, + struct sk_buff *skb, + rxrpc_seq_t seq, + rxrpc_seq_t since) { - unsigned int i; + struct rxrpc_skb_priv *sp = rxrpc_skb(skb); + unsigned int i, old_nacks = 0; + rxrpc_seq_t lowest_nak = seq + sp->nr_acks; + u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket);
- for (i = 0; i < nr_acks; i++) { + for (i = 0; i < sp->nr_acks; i++) { if (acks[i] == RXRPC_ACK_TYPE_ACK) { summary->nr_acks++; - summary->nr_new_acks++; + if (after_eq(seq, since)) + summary->nr_new_acks++; } else { - if (!summary->saw_nacks && - call->acks_lowest_nak != seq + i) { - call->acks_lowest_nak = seq + i; - summary->new_low_nack = true; - } summary->saw_nacks = true; + if (before(seq, since)) { + /* Overlap with previous ACK */ + old_nacks++; + } else { + summary->nr_new_nacks++; + sp->nr_nacks++; + } + + if (before(seq, lowest_nak)) + lowest_nak = seq; } + seq++; + } + + if (lowest_nak != call->acks_lowest_nak) { + call->acks_lowest_nak = lowest_nak; + summary->new_low_nack = true; } + + /* We *can* have more nacks than we did - the peer is permitted to drop + * packets it has soft-acked and re-request them. Further, it is + * possible for the nack distribution to change whilst the number of + * nacks stays the same or goes down. + */ + if (old_nacks < summary->nr_retained_nacks) + summary->nr_new_acks += summary->nr_retained_nacks - old_nacks; + summary->nr_retained_nacks = old_nacks; }
/* @@ -773,7 +839,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) struct rxrpc_skb_priv *sp = rxrpc_skb(skb); struct rxrpc_ackinfo info; rxrpc_serial_t ack_serial, acked_serial; - rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt; + rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt, since; int nr_acks, offset, ioffset;
_enter(""); @@ -789,6 +855,8 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) prev_pkt = ntohl(ack.previousPacket); hard_ack = first_soft_ack - 1; nr_acks = ack.nAcks; + sp->first_ack = first_soft_ack; + sp->nr_acks = nr_acks; summary.ack_reason = (ack.reason < RXRPC_ACK__INVALID ? ack.reason : RXRPC_ACK__INVALID);
@@ -858,6 +926,16 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) if (nr_acks > 0) skb_condense(skb);
+ if (call->cong_last_nack) { + since = rxrpc_input_check_prev_ack(call, &summary, first_soft_ack); + rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack); + call->cong_last_nack = NULL; + } else { + summary.nr_new_acks = first_soft_ack - call->acks_first_seq; + call->acks_lowest_nak = first_soft_ack + nr_acks; + since = first_soft_ack; + } + call->acks_latest_ts = skb->tstamp; call->acks_first_seq = first_soft_ack; call->acks_prev_seq = prev_pkt; @@ -866,7 +944,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) case RXRPC_ACK_PING: break; default: - if (after(acked_serial, call->acks_highest_serial)) + if (acked_serial && after(acked_serial, call->acks_highest_serial)) call->acks_highest_serial = acked_serial; break; } @@ -905,8 +983,9 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb) if (nr_acks > 0) { if (offset > (int)skb->len - nr_acks) return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_short_sack); - rxrpc_input_soft_acks(call, skb->data + offset, first_soft_ack, - nr_acks, &summary); + rxrpc_input_soft_acks(call, &summary, skb, first_soft_ack, since); + rxrpc_get_skb(skb, rxrpc_skb_get_last_nack); + call->cong_last_nack = skb; }
if (test_bit(RXRPC_CALL_TX_LAST, &call->flags) &&
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit a19747c3b9bf6476cc36d0a3a5ef0ff92999169e ]
In very slow environments, most big TCP cases including segmentation and reassembly of big TCP packets have a good chance to fail: by default the TCP client uses write size well below 64K. If the host is low enough autocorking is unable to build real big TCP packets.
Address the issue using much larger write operations.
Note that is hard to observe the issue without an extremely slow and/or overloaded environment; reduce the TCP transfer time to allow for much easier/faster reproducibility.
Fixes: 6bb382bcf742 ("selftests: add a selftest for big tcp") Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Eric Dumazet edumazet@google.com Acked-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/big_tcp.sh | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/big_tcp.sh b/tools/testing/selftests/net/big_tcp.sh index cde9a91c4797..2db9d15cd45f 100755 --- a/tools/testing/selftests/net/big_tcp.sh +++ b/tools/testing/selftests/net/big_tcp.sh @@ -122,7 +122,9 @@ do_netperf() { local netns=$1
[ "$NF" = "6" ] && serip=$SERVER_IP6 - ip net exec $netns netperf -$NF -t TCP_STREAM -H $serip 2>&1 >/dev/null + + # use large write to be sure to generate big tcp packets + ip net exec $netns netperf -$NF -t TCP_STREAM -l 1 -H $serip -- -m 262144 2>&1 >/dev/null }
do_test() {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shigeru Yoshida syoshida@redhat.com
[ Upstream commit 3871aa01e1a779d866fa9dfdd5a836f342f4eb87 ]
syzbot reported the following general protection fault [1]:
general protection fault, probably for non-canonical address 0xdffffc0000000010: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000080-0x0000000000000087] ... RIP: 0010:tipc_udp_is_known_peer+0x9c/0x250 net/tipc/udp_media.c:291 ... Call Trace: <TASK> tipc_udp_nl_bearer_add+0x212/0x2f0 net/tipc/udp_media.c:646 tipc_nl_bearer_add+0x21e/0x360 net/tipc/bearer.c:1089 genl_family_rcv_msg_doit+0x1fc/0x2e0 net/netlink/genetlink.c:972 genl_family_rcv_msg net/netlink/genetlink.c:1052 [inline] genl_rcv_msg+0x561/0x800 net/netlink/genetlink.c:1067 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2544 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x53b/0x810 net/netlink/af_netlink.c:1367 netlink_sendmsg+0x8b7/0xd70 net/netlink/af_netlink.c:1909 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0xd5/0x180 net/socket.c:745 ____sys_sendmsg+0x6ac/0x940 net/socket.c:2584 ___sys_sendmsg+0x135/0x1d0 net/socket.c:2638 __sys_sendmsg+0x117/0x1e0 net/socket.c:2667 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b
The cause of this issue is that when tipc_nl_bearer_add() is called with the TIPC_NLA_BEARER_UDP_OPTS attribute, tipc_udp_nl_bearer_add() is called even if the bearer is not UDP.
tipc_udp_is_known_peer() called by tipc_udp_nl_bearer_add() assumes that the media_ptr field of the tipc_bearer has an udp_bearer type object, so the function goes crazy for non-UDP bearers.
This patch fixes the issue by checking the bearer type before calling tipc_udp_nl_bearer_add() in tipc_nl_bearer_add().
Fixes: ef20cd4dd163 ("tipc: introduce UDP replicast") Reported-and-tested-by: syzbot+5142b87a9abc510e14fa@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5142b87a9abc510e14fa [1] Signed-off-by: Shigeru Yoshida syoshida@redhat.com Reviewed-by: Tung Nguyen tung.q.nguyen@dektech.com.au Link: https://lore.kernel.org/r/20240131152310.4089541-1-syoshida@redhat.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/bearer.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 2cde375477e3..878415c43527 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -1086,6 +1086,12 @@ int tipc_nl_bearer_add(struct sk_buff *skb, struct genl_info *info)
#ifdef CONFIG_TIPC_MEDIA_UDP if (attrs[TIPC_NLA_BEARER_UDP_OPTS]) { + if (b->media->type_id != TIPC_MEDIA_TYPE_UDP) { + rtnl_unlock(); + NL_SET_ERR_MSG(info->extack, "UDP option is unsupported"); + return -EINVAL; + } + err = tipc_udp_nl_bearer_add(b, attrs[TIPC_NLA_BEARER_UDP_OPTS]); if (err) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 1279f9d9dec2d7462823a18c29ad61359e0a007d ]
syzbot reported a warning [0] in __unix_gc() with a repro, which creates a socketpair and sends one socket's fd to itself using the peer.
socketpair(AF_UNIX, SOCK_STREAM, 0, [3, 4]) = 0 sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\360", iov_len=1}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[3]}], msg_controllen=24, msg_flags=0}, MSG_OOB|MSG_PROBE|MSG_DONTWAIT|MSG_ZEROCOPY) = 1
This forms a self-cyclic reference that GC should finally untangle but does not due to lack of MSG_OOB handling, resulting in memory leak.
Recently, commit 11498715f266 ("af_unix: Remove io_uring code for GC.") removed io_uring's dead code in GC and revealed the problem.
The code was executed at the final stage of GC and unconditionally moved all GC candidates from gc_candidates to gc_inflight_list. That papered over the reported problem by always making the following WARN_ON_ONCE(!list_empty(&gc_candidates)) false.
The problem has been there since commit 2aab4b969002 ("af_unix: fix struct pid leaks in OOB support") added full scm support for MSG_OOB while fixing another bug.
To fix this problem, we must call kfree_skb() for unix_sk(sk)->oob_skb if the socket still exists in gc_candidates after purging collected skb.
Then, we need to set NULL to oob_skb before calling kfree_skb() because it calls last fput() and triggers unix_release_sock(), where we call duplicate kfree_skb(u->oob_skb) if not NULL.
Note that the leaked socket remained being linked to a global list, so kmemleak also could not detect it. We need to check /proc/net/protocol to notice the unfreed socket.
[0]: WARNING: CPU: 0 PID: 2863 at net/unix/garbage.c:345 __unix_gc+0xc74/0xe80 net/unix/garbage.c:345 Modules linked in: CPU: 0 PID: 2863 Comm: kworker/u4:11 Not tainted 6.8.0-rc1-syzkaller-00583-g1701940b1a02 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Workqueue: events_unbound __unix_gc RIP: 0010:__unix_gc+0xc74/0xe80 net/unix/garbage.c:345 Code: 8b 5c 24 50 e9 86 f8 ff ff e8 f8 e4 22 f8 31 d2 48 c7 c6 30 6a 69 89 4c 89 ef e8 97 ef ff ff e9 80 f9 ff ff e8 dd e4 22 f8 90 <0f> 0b 90 e9 7b fd ff ff 48 89 df e8 5c e7 7c f8 e9 d3 f8 ff ff e8 RSP: 0018:ffffc9000b03fba0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffffc9000b03fc10 RCX: ffffffff816c493e RDX: ffff88802c02d940 RSI: ffffffff896982f3 RDI: ffffc9000b03fb30 RBP: ffffc9000b03fce0 R08: 0000000000000001 R09: fffff52001607f66 R10: 0000000000000003 R11: 0000000000000002 R12: dffffc0000000000 R13: ffffc9000b03fc10 R14: ffffc9000b03fc10 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005559c8677a60 CR3: 000000000d57a000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> process_one_work+0x889/0x15e0 kernel/workqueue.c:2633 process_scheduled_works kernel/workqueue.c:2706 [inline] worker_thread+0x8b9/0x12a0 kernel/workqueue.c:2787 kthread+0x2c6/0x3b0 kernel/kthread.c:388 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242 </TASK>
Reported-by: syzbot+fa3ef895554bdbfd1183@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fa3ef895554bdbfd1183 Fixes: 2aab4b969002 ("af_unix: fix struct pid leaks in OOB support") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240203183149.63573-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/garbage.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 2405f0f9af31..8f63f0b4bf01 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -314,6 +314,17 @@ void unix_gc(void) /* Here we are. Hitlist is filled. Die. */ __skb_queue_purge(&hitlist);
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB) + list_for_each_entry_safe(u, next, &gc_candidates, link) { + struct sk_buff *skb = u->oob_skb; + + if (skb) { + u->oob_skb = NULL; + kfree_skb(skb); + } + } +#endif + spin_lock(&unix_gc_lock);
/* There could be io_uring registered files, just push them back to
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit cb88cb53badb8aeb3955ad6ce80b07b598e310b8 ]
syzbot triggered a warning [1] in __alloc_pages():
WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp)
Willem fixed a similar issue in commit c0a2a1b0d631 ("ppp: limit MRU to 64K")
Adopt the same sanity check for ppp_async_ioctl(PPPIOCSMRU)
[1]:
WARNING: CPU: 1 PID: 11 at mm/page_alloc.c:4543 __alloc_pages+0x308/0x698 mm/page_alloc.c:4543 Modules linked in: CPU: 1 PID: 11 Comm: kworker/u4:0 Not tainted 6.8.0-rc2-syzkaller-g41bccc98fb79 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Workqueue: events_unbound flush_to_ldisc pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __alloc_pages+0x308/0x698 mm/page_alloc.c:4543 lr : __alloc_pages+0xc8/0x698 mm/page_alloc.c:4537 sp : ffff800093967580 x29: ffff800093967660 x28: ffff8000939675a0 x27: dfff800000000000 x26: ffff70001272ceb4 x25: 0000000000000000 x24: ffff8000939675c0 x23: 0000000000000000 x22: 0000000000060820 x21: 1ffff0001272ceb8 x20: ffff8000939675e0 x19: 0000000000000010 x18: ffff800093967120 x17: ffff800083bded5c x16: ffff80008ac97500 x15: 0000000000000005 x14: 1ffff0001272cebc x13: 0000000000000000 x12: 0000000000000000 x11: ffff70001272cec1 x10: 1ffff0001272cec0 x9 : 0000000000000001 x8 : ffff800091c91000 x7 : 0000000000000000 x6 : 000000000000003f x5 : 00000000ffffffff x4 : 0000000000000000 x3 : 0000000000000020 x2 : 0000000000000008 x1 : 0000000000000000 x0 : ffff8000939675e0 Call trace: __alloc_pages+0x308/0x698 mm/page_alloc.c:4543 __alloc_pages_node include/linux/gfp.h:238 [inline] alloc_pages_node include/linux/gfp.h:261 [inline] __kmalloc_large_node+0xbc/0x1fc mm/slub.c:3926 __do_kmalloc_node mm/slub.c:3969 [inline] __kmalloc_node_track_caller+0x418/0x620 mm/slub.c:4001 kmalloc_reserve+0x17c/0x23c net/core/skbuff.c:590 __alloc_skb+0x1c8/0x3d8 net/core/skbuff.c:651 __netdev_alloc_skb+0xb8/0x3e8 net/core/skbuff.c:715 netdev_alloc_skb include/linux/skbuff.h:3235 [inline] dev_alloc_skb include/linux/skbuff.h:3248 [inline] ppp_async_input drivers/net/ppp/ppp_async.c:863 [inline] ppp_asynctty_receive+0x588/0x186c drivers/net/ppp/ppp_async.c:341 tty_ldisc_receive_buf+0x12c/0x15c drivers/tty/tty_buffer.c:390 tty_port_default_receive_buf+0x74/0xac drivers/tty/tty_port.c:37 receive_buf drivers/tty/tty_buffer.c:444 [inline] flush_to_ldisc+0x284/0x6e4 drivers/tty/tty_buffer.c:494 process_one_work+0x694/0x1204 kernel/workqueue.c:2633 process_scheduled_works kernel/workqueue.c:2706 [inline] worker_thread+0x938/0xef4 kernel/workqueue.c:2787 kthread+0x288/0x310 kernel/kthread.c:388 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+c5da1f087c9e4ec6c933@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Willem de Bruijn willemb@google.com Link: https://lore.kernel.org/r/20240205171004.1059724-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ppp/ppp_async.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c index fbaaa8c102a1..e94a4b08fd63 100644 --- a/drivers/net/ppp/ppp_async.c +++ b/drivers/net/ppp/ppp_async.c @@ -460,6 +460,10 @@ ppp_async_ioctl(struct ppp_channel *chan, unsigned int cmd, unsigned long arg) case PPPIOCSMRU: if (get_user(val, p)) break; + if (val > U16_MAX) { + err = -EINVAL; + break; + } if (val < PPP_MRU) val = PPP_MRU; ap->mru = val;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 4b00d0c513da58b68df015968721b11396fe4ab3 ]
cmsg_ipv6 test requests tcpdump to capture 4 packets, and sends until tcpdump quits. Only the first packet is "real", however, and the rest are basic UDP packets. So if tcpdump doesn't start in time it will miss the real packet and only capture the UDP ones.
This makes the test fail on slow machine (no KVM or with debug enabled) 100% of the time, while it passes in fast environments.
Repeat the "real" / expected packet.
Fixes: 9657ad09e1fa ("selftests: net: test IPV6_TCLASS") Fixes: 05ae83d5a4a2 ("selftests: net: test IPV6_HOPLIMIT") Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/cmsg_ipv6.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/cmsg_ipv6.sh b/tools/testing/selftests/net/cmsg_ipv6.sh index 330d0b1ceced..c921750ca118 100755 --- a/tools/testing/selftests/net/cmsg_ipv6.sh +++ b/tools/testing/selftests/net/cmsg_ipv6.sh @@ -91,7 +91,7 @@ for ovr in setsock cmsg both diff; do check_result $? 0 "TCLASS $prot $ovr - pass"
while [ -d /proc/$BG ]; do - $NSEXE ./cmsg_sender -6 -p u $TGT6 1234 + $NSEXE ./cmsg_sender -6 -p $p $m $((TOS2)) $TGT6 1234 done
tcpdump -r $TMPF -v 2>&1 | grep "class $TOS2" >> /dev/null @@ -128,7 +128,7 @@ for ovr in setsock cmsg both diff; do check_result $? 0 "HOPLIMIT $prot $ovr - pass"
while [ -d /proc/$BG ]; do - $NSEXE ./cmsg_sender -6 -p u $TGT6 1234 + $NSEXE ./cmsg_sender -6 -p $p $m $LIM $TGT6 1234 done
tcpdump -r $TMPF -v 2>&1 | grep "hlim $LIM[^0-9]" >> /dev/null
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 36fa8d697132b4bed2312d700310e8a78b000c84 ]
xt_find_revision() expects u8, restrict it to this datatype.
Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_compat.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c index f0eeda97bfcd..001b6841a4b6 100644 --- a/net/netfilter/nft_compat.c +++ b/net/netfilter/nft_compat.c @@ -135,7 +135,7 @@ static void nft_target_eval_bridge(const struct nft_expr *expr,
static const struct nla_policy nft_target_policy[NFTA_TARGET_MAX + 1] = { [NFTA_TARGET_NAME] = { .type = NLA_NUL_STRING }, - [NFTA_TARGET_REV] = { .type = NLA_U32 }, + [NFTA_TARGET_REV] = NLA_POLICY_MAX(NLA_BE32, 255), [NFTA_TARGET_INFO] = { .type = NLA_BINARY }, };
@@ -419,7 +419,7 @@ static void nft_match_eval(const struct nft_expr *expr,
static const struct nla_policy nft_match_policy[NFTA_MATCH_MAX + 1] = { [NFTA_MATCH_NAME] = { .type = NLA_NUL_STRING }, - [NFTA_MATCH_REV] = { .type = NLA_U32 }, + [NFTA_MATCH_REV] = NLA_POLICY_MAX(NLA_BE32, 255), [NFTA_MATCH_INFO] = { .type = NLA_BINARY }, };
@@ -724,7 +724,7 @@ static int nfnl_compat_get_rcu(struct sk_buff *skb, static const struct nla_policy nfnl_compat_policy_get[NFTA_COMPAT_MAX+1] = { [NFTA_COMPAT_NAME] = { .type = NLA_NUL_STRING, .len = NFT_COMPAT_NAME_MAX-1 }, - [NFTA_COMPAT_REV] = { .type = NLA_U32 }, + [NFTA_COMPAT_REV] = NLA_POLICY_MAX(NLA_BE32, 255), [NFTA_COMPAT_TYPE] = { .type = NLA_U32 }, };
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 292781c3c5485ce33bd22b2ef1b2bed709b4d672 ]
Flag (1 << 0) is ignored is set, never used, reject it it with EINVAL instead.
Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/uapi/linux/netfilter/nf_tables.h | 2 ++ net/netfilter/nft_compat.c | 3 ++- 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index ca30232b7bc8..117c6a9b845b 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -285,9 +285,11 @@ enum nft_rule_attributes { /** * enum nft_rule_compat_flags - nf_tables rule compat flags * + * @NFT_RULE_COMPAT_F_UNUSED: unused * @NFT_RULE_COMPAT_F_INV: invert the check result */ enum nft_rule_compat_flags { + NFT_RULE_COMPAT_F_UNUSED = (1 << 0), NFT_RULE_COMPAT_F_INV = (1 << 1), NFT_RULE_COMPAT_F_MASK = NFT_RULE_COMPAT_F_INV, }; diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c index 001b6841a4b6..ed71d5ecbe0a 100644 --- a/net/netfilter/nft_compat.c +++ b/net/netfilter/nft_compat.c @@ -212,7 +212,8 @@ static int nft_parse_compat(const struct nlattr *attr, u16 *proto, bool *inv) return -EINVAL;
flags = ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_FLAGS])); - if (flags & ~NFT_RULE_COMPAT_F_MASK) + if (flags & NFT_RULE_COMPAT_F_UNUSED || + flags & ~NFT_RULE_COMPAT_F_MASK) return -EINVAL; if (flags & NFT_RULE_COMPAT_F_INV) *inv = true;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit d694b754894c93fb4d71a7f3699439dec111decc ]
xt_check_{match,target} expects u16, but NFTA_RULE_COMPAT_PROTO is u32.
NLA_POLICY_MAX(NLA_BE32, 65535) cannot be used because .max in nla_policy is s16, see 3e48be05f3c7 ("netlink: add attribute range validation to policy").
Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_compat.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c index ed71d5ecbe0a..1f9474fefe84 100644 --- a/net/netfilter/nft_compat.c +++ b/net/netfilter/nft_compat.c @@ -200,6 +200,7 @@ static const struct nla_policy nft_rule_compat_policy[NFTA_RULE_COMPAT_MAX + 1] static int nft_parse_compat(const struct nlattr *attr, u16 *proto, bool *inv) { struct nlattr *tb[NFTA_RULE_COMPAT_MAX+1]; + u32 l4proto; u32 flags; int err;
@@ -218,7 +219,12 @@ static int nft_parse_compat(const struct nlattr *attr, u16 *proto, bool *inv) if (flags & NFT_RULE_COMPAT_F_INV) *inv = true;
- *proto = ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_PROTO])); + l4proto = ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_PROTO])); + if (l4proto > U16_MAX) + return -EINVAL; + + *proto = l4proto; + return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivasan Shanmugam srinivasan.shanmugam@amd.com
[ Upstream commit e96fddb32931d007db12b1fce9b5e8e4c080401b ]
'panel_cntl' structure used to control the display panel could be null, dereferencing it could lead to a null pointer access.
Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn21/dcn21_hwseq.c:269 dcn21_set_backlight_level() error: we previously assumed 'panel_cntl' could be null (see line 250)
Fixes: 474ac4a875ca ("drm/amd/display: Implement some asic specific abm call backs.") Cc: Yongqiang Sun yongqiang.sun@amd.com Cc: Anthony Koo Anthony.Koo@amd.com Cc: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Cc: Aurabindo Pillai aurabindo.pillai@amd.com Signed-off-by: Srinivasan Shanmugam srinivasan.shanmugam@amd.com Reviewed-by: Anthony Koo Anthony.Koo@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../drm/amd/display/dc/dcn21/dcn21_hwseq.c | 39 ++++++++++--------- 1 file changed, 20 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hwseq.c index f99b1bc49694..7238930e6383 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hwseq.c @@ -237,34 +237,35 @@ bool dcn21_set_backlight_level(struct pipe_ctx *pipe_ctx, { struct dc_context *dc = pipe_ctx->stream->ctx; struct abm *abm = pipe_ctx->stream_res.abm; + struct timing_generator *tg = pipe_ctx->stream_res.tg; struct panel_cntl *panel_cntl = pipe_ctx->stream->link->panel_cntl; + uint32_t otg_inst; + + if (!abm && !tg && !panel_cntl) + return false; + + otg_inst = tg->inst;
if (dc->dc->res_pool->dmcu) { dce110_set_backlight_level(pipe_ctx, backlight_pwm_u16_16, frame_ramp); return true; }
- if (abm != NULL) { - uint32_t otg_inst = pipe_ctx->stream_res.tg->inst; - - if (abm && panel_cntl) { - if (abm->funcs && abm->funcs->set_pipe_ex) { - abm->funcs->set_pipe_ex(abm, - otg_inst, - SET_ABM_PIPE_NORMAL, - panel_cntl->inst, - panel_cntl->pwrseq_inst); - } else { - dmub_abm_set_pipe(abm, - otg_inst, - SET_ABM_PIPE_NORMAL, - panel_cntl->inst, - panel_cntl->pwrseq_inst); - } - } + if (abm->funcs && abm->funcs->set_pipe_ex) { + abm->funcs->set_pipe_ex(abm, + otg_inst, + SET_ABM_PIPE_NORMAL, + panel_cntl->inst, + panel_cntl->pwrseq_inst); + } else { + dmub_abm_set_pipe(abm, + otg_inst, + SET_ABM_PIPE_NORMAL, + panel_cntl->inst, + panel_cntl->pwrseq_inst); }
- if (abm && abm->funcs && abm->funcs->set_backlight_level_pwm) + if (abm->funcs && abm->funcs->set_backlight_level_pwm) abm->funcs->set_backlight_level_pwm(abm, backlight_pwm_u16_16, frame_ramp, 0, panel_cntl->inst); else
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivasan Shanmugam srinivasan.shanmugam@amd.com
[ Upstream commit 66951d98d9bf45ba25acf37fe0747253fafdf298 ]
In "u32 otg_inst = pipe_ctx->stream_res.tg->inst;" pipe_ctx->stream_res.tg could be NULL, it is relying on the caller to ensure the tg is not NULL.
Fixes: 474ac4a875ca ("drm/amd/display: Implement some asic specific abm call backs.") Cc: Yongqiang Sun yongqiang.sun@amd.com Cc: Anthony Koo Anthony.Koo@amd.com Cc: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Cc: Aurabindo Pillai aurabindo.pillai@amd.com Signed-off-by: Srinivasan Shanmugam srinivasan.shanmugam@amd.com Reviewed-by: Anthony Koo Anthony.Koo@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../drm/amd/display/dc/dcn21/dcn21_hwseq.c | 24 +++++++++++-------- 1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hwseq.c index 7238930e6383..1b08749b084b 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_hwseq.c @@ -206,28 +206,32 @@ void dcn21_set_abm_immediate_disable(struct pipe_ctx *pipe_ctx) void dcn21_set_pipe(struct pipe_ctx *pipe_ctx) { struct abm *abm = pipe_ctx->stream_res.abm; - uint32_t otg_inst = pipe_ctx->stream_res.tg->inst; + struct timing_generator *tg = pipe_ctx->stream_res.tg; struct panel_cntl *panel_cntl = pipe_ctx->stream->link->panel_cntl; struct dmcu *dmcu = pipe_ctx->stream->ctx->dc->res_pool->dmcu; + uint32_t otg_inst; + + if (!abm && !tg && !panel_cntl) + return; + + otg_inst = tg->inst;
if (dmcu) { dce110_set_pipe(pipe_ctx); return; }
- if (abm && panel_cntl) { - if (abm->funcs && abm->funcs->set_pipe_ex) { - abm->funcs->set_pipe_ex(abm, + if (abm->funcs && abm->funcs->set_pipe_ex) { + abm->funcs->set_pipe_ex(abm, otg_inst, SET_ABM_PIPE_NORMAL, panel_cntl->inst, panel_cntl->pwrseq_inst); - } else { - dmub_abm_set_pipe(abm, otg_inst, - SET_ABM_PIPE_NORMAL, - panel_cntl->inst, - panel_cntl->pwrseq_inst); - } + } else { + dmub_abm_set_pipe(abm, otg_inst, + SET_ABM_PIPE_NORMAL, + panel_cntl->inst, + panel_cntl->pwrseq_inst); } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivasan Shanmugam srinivasan.shanmugam@amd.com
[ Upstream commit 58fca355ad37dcb5f785d9095db5f748b79c5dc2 ]
'stream_enc_regs' array is an array of dcn10_stream_enc_registers structures. The array is initialized with four elements, corresponding to the four calls to stream_enc_regs() in the array initializer. This means that valid indices for this array are 0, 1, 2, and 3.
The error message 'stream_enc_regs' 4 <= 5 below, is indicating that there is an attempt to access this array with an index of 5, which is out of bounds. This could lead to undefined behavior
Here, eng_id is used as an index to access the stream_enc_regs array. If eng_id is 5, this would result in an out-of-bounds access on the stream_enc_regs array.
Thus fixing Buffer overflow error in dcn301_stream_encoder_create reported by Smatch: drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn301/dcn301_resource.c:1011 dcn301_stream_encoder_create() error: buffer overflow 'stream_enc_regs' 4 <= 5
Fixes: 3a83e4e64bb1 ("drm/amd/display: Add dcn3.01 support to DC (v2)") Cc: Roman Li Roman.Li@amd.com Cc: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Cc: Aurabindo Pillai aurabindo.pillai@amd.com Signed-off-by: Srinivasan Shanmugam srinivasan.shanmugam@amd.com Reviewed-by: Roman Li roman.li@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c index 79d6697d13b6..9485fda890cd 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c @@ -996,7 +996,7 @@ static struct stream_encoder *dcn301_stream_encoder_create(enum engine_id eng_id vpg = dcn301_vpg_create(ctx, vpg_inst); afmt = dcn301_afmt_create(ctx, afmt_inst);
- if (!enc1 || !vpg || !afmt) { + if (!enc1 || !vpg || !afmt || eng_id >= ARRAY_SIZE(stream_enc_regs)) { kfree(enc1); kfree(vpg); kfree(afmt);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 38ed1c7062ada30d7c11e7a7acc749bf27aa14aa ]
Direction attribute is ignored, reject it in case this ever needs to be supported
Fixes: 3087c3f7c23b ("netfilter: nft_ct: Add ct id support") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_ct.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c index aac98a3c966e..bfd3e5a14dab 100644 --- a/net/netfilter/nft_ct.c +++ b/net/netfilter/nft_ct.c @@ -476,6 +476,9 @@ static int nft_ct_get_init(const struct nft_ctx *ctx, break; #endif case NFT_CT_ID: + if (tb[NFTA_CT_DIRECTION]) + return -EINVAL; + len = sizeof(u32); break; default:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 76313d1a4aa9e30d5b43dee5efd8bcd4d8250006 ]
Pipapo needs a scratchpad area to keep state during matching. This state can be large and thus cannot reside on stack.
Each set preallocates percpu areas for this.
On each match stage, one scratchpad half starts with all-zero and the other is inited to all-ones.
At the end of each stage, the half that starts with all-ones is always zero. Before next field is tested, pointers to the two halves are swapped, i.e. resmap pointer turns into fill pointer and vice versa.
After the last field has been processed, pipapo stashes the index toggle in a percpu variable, with assumption that next packet will start with the all-zero half and sets all bits in the other to 1.
This isn't reliable.
There can be multiple sets and we can't be sure that the upper and lower half of all set scratch map is always in sync (lookups can be conditional), so one set might have swapped, but other might not have been queried.
Thus we need to keep the index per-set-and-cpu, just like the scratchpad.
Note that this bug fix is incomplete, there is a related issue.
avx2 and normal implementation might use slightly different areas of the map array space due to the avx2 alignment requirements, so m->scratch (generic/fallback implementation) and ->scratch_aligned (avx) may partially overlap. scratch and scratch_aligned are not distinct objects, the latter is just the aligned address of the former.
After this change, write to scratch_align->map_index may write to scratch->map, so this issue becomes more prominent, we can set to 1 a bit in the supposedly-all-zero area of scratch->map[].
A followup patch will remove the scratch_aligned and makes generic and avx code use the same (aligned) area.
Its done in a separate change to ease review.
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_pipapo.c | 41 ++++++++++++++++++----------- net/netfilter/nft_set_pipapo.h | 14 ++++++++-- net/netfilter/nft_set_pipapo_avx2.c | 15 +++++------ 3 files changed, 44 insertions(+), 26 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 3ff31043f714..58e595a84cd0 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -342,9 +342,6 @@ #include "nft_set_pipapo_avx2.h" #include "nft_set_pipapo.h"
-/* Current working bitmap index, toggled between field matches */ -static DEFINE_PER_CPU(bool, nft_pipapo_scratch_index); - /** * pipapo_refill() - For each set bit, set bits from selected mapping table item * @map: Bitmap to be scanned for set bits @@ -412,6 +409,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, const u32 *key, const struct nft_set_ext **ext) { struct nft_pipapo *priv = nft_set_priv(set); + struct nft_pipapo_scratch *scratch; unsigned long *res_map, *fill_map; u8 genmask = nft_genmask_cur(net); const u8 *rp = (const u8 *)key; @@ -422,15 +420,17 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
local_bh_disable();
- map_index = raw_cpu_read(nft_pipapo_scratch_index); - m = rcu_dereference(priv->match);
if (unlikely(!m || !*raw_cpu_ptr(m->scratch))) goto out;
- res_map = *raw_cpu_ptr(m->scratch) + (map_index ? m->bsize_max : 0); - fill_map = *raw_cpu_ptr(m->scratch) + (map_index ? 0 : m->bsize_max); + scratch = *raw_cpu_ptr(m->scratch); + + map_index = scratch->map_index; + + res_map = scratch->map + (map_index ? m->bsize_max : 0); + fill_map = scratch->map + (map_index ? 0 : m->bsize_max);
memset(res_map, 0xff, m->bsize_max * sizeof(*res_map));
@@ -460,7 +460,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, b = pipapo_refill(res_map, f->bsize, f->rules, fill_map, f->mt, last); if (b < 0) { - raw_cpu_write(nft_pipapo_scratch_index, map_index); + scratch->map_index = map_index; local_bh_enable();
return false; @@ -477,7 +477,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, * current inactive bitmap is clean and can be reused as * *next* bitmap (not initial) for the next packet. */ - raw_cpu_write(nft_pipapo_scratch_index, map_index); + scratch->map_index = map_index; local_bh_enable();
return true; @@ -1114,12 +1114,12 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone, int i;
for_each_possible_cpu(i) { - unsigned long *scratch; + struct nft_pipapo_scratch *scratch; #ifdef NFT_PIPAPO_ALIGN - unsigned long *scratch_aligned; + void *scratch_aligned; #endif - - scratch = kzalloc_node(bsize_max * sizeof(*scratch) * 2 + + scratch = kzalloc_node(struct_size(scratch, map, + bsize_max * 2) + NFT_PIPAPO_ALIGN_HEADROOM, GFP_KERNEL, cpu_to_node(i)); if (!scratch) { @@ -1138,7 +1138,16 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone, *per_cpu_ptr(clone->scratch, i) = scratch;
#ifdef NFT_PIPAPO_ALIGN - scratch_aligned = NFT_PIPAPO_LT_ALIGN(scratch); + /* Align &scratch->map (not the struct itself): the extra + * %NFT_PIPAPO_ALIGN_HEADROOM bytes passed to kzalloc_node() + * above guarantee we can waste up to those bytes in order + * to align the map field regardless of its offset within + * the struct. + */ + BUILD_BUG_ON(offsetof(struct nft_pipapo_scratch, map) > NFT_PIPAPO_ALIGN_HEADROOM); + + scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map); + scratch_aligned -= offsetof(struct nft_pipapo_scratch, map); *per_cpu_ptr(clone->scratch_aligned, i) = scratch_aligned; #endif } @@ -2130,7 +2139,7 @@ static int nft_pipapo_init(const struct nft_set *set, m->field_count = field_count; m->bsize_max = 0;
- m->scratch = alloc_percpu(unsigned long *); + m->scratch = alloc_percpu(struct nft_pipapo_scratch *); if (!m->scratch) { err = -ENOMEM; goto out_scratch; @@ -2139,7 +2148,7 @@ static int nft_pipapo_init(const struct nft_set *set, *per_cpu_ptr(m->scratch, i) = NULL;
#ifdef NFT_PIPAPO_ALIGN - m->scratch_aligned = alloc_percpu(unsigned long *); + m->scratch_aligned = alloc_percpu(struct nft_pipapo_scratch *); if (!m->scratch_aligned) { err = -ENOMEM; goto out_free; diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h index 2e164a319945..75b1340c6335 100644 --- a/net/netfilter/nft_set_pipapo.h +++ b/net/netfilter/nft_set_pipapo.h @@ -130,6 +130,16 @@ struct nft_pipapo_field { union nft_pipapo_map_bucket *mt; };
+/** + * struct nft_pipapo_scratch - percpu data used for lookup and matching + * @map_index: Current working bitmap index, toggled between field matches + * @map: store partial matching results during lookup + */ +struct nft_pipapo_scratch { + u8 map_index; + unsigned long map[]; +}; + /** * struct nft_pipapo_match - Data used for lookup and matching * @field_count Amount of fields in set @@ -142,9 +152,9 @@ struct nft_pipapo_field { struct nft_pipapo_match { int field_count; #ifdef NFT_PIPAPO_ALIGN - unsigned long * __percpu *scratch_aligned; + struct nft_pipapo_scratch * __percpu *scratch_aligned; #endif - unsigned long * __percpu *scratch; + struct nft_pipapo_scratch * __percpu *scratch; size_t bsize_max; struct rcu_head rcu; struct nft_pipapo_field f[] __counted_by(field_count); diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index 52e0d026d30a..78213c73af2e 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -71,9 +71,6 @@ #define NFT_PIPAPO_AVX2_ZERO(reg) \ asm volatile("vpxor %ymm" #reg ", %ymm" #reg ", %ymm" #reg)
-/* Current working bitmap index, toggled between field matches */ -static DEFINE_PER_CPU(bool, nft_pipapo_avx2_scratch_index); - /** * nft_pipapo_avx2_prepare() - Prepare before main algorithm body * @@ -1120,11 +1117,12 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, const u32 *key, const struct nft_set_ext **ext) { struct nft_pipapo *priv = nft_set_priv(set); - unsigned long *res, *fill, *scratch; + struct nft_pipapo_scratch *scratch; u8 genmask = nft_genmask_cur(net); const u8 *rp = (const u8 *)key; struct nft_pipapo_match *m; struct nft_pipapo_field *f; + unsigned long *res, *fill; bool map_index; int i, ret = 0;
@@ -1146,10 +1144,11 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, kernel_fpu_end(); return false; } - map_index = raw_cpu_read(nft_pipapo_avx2_scratch_index);
- res = scratch + (map_index ? m->bsize_max : 0); - fill = scratch + (map_index ? 0 : m->bsize_max); + map_index = scratch->map_index; + + res = scratch->map + (map_index ? m->bsize_max : 0); + fill = scratch->map + (map_index ? 0 : m->bsize_max);
/* Starting map doesn't need to be set for this implementation */
@@ -1221,7 +1220,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
out: if (i % 2) - raw_cpu_write(nft_pipapo_avx2_scratch_index, !map_index); + scratch->map_index = !map_index; kernel_fpu_end();
return ret >= 0;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 47b1c03c3c1a119435480a1e73f27197dc59131d ]
After next patch simple kfree() is not enough anymore, so add a helper for it.
Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: 5a8cdf6fd860 ("netfilter: nft_set_pipapo: remove scratch_aligned pointer") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_pipapo.c | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 58e595a84cd0..b6bca59b7ba6 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -1101,6 +1101,24 @@ static void pipapo_map(struct nft_pipapo_match *m, f->mt[map[i].to + j].e = e; }
+/** + * pipapo_free_scratch() - Free per-CPU map at original (not aligned) address + * @m: Matching data + * @cpu: CPU number + */ +static void pipapo_free_scratch(const struct nft_pipapo_match *m, unsigned int cpu) +{ + struct nft_pipapo_scratch *s; + void *mem; + + s = *per_cpu_ptr(m->scratch, cpu); + if (!s) + return; + + mem = s; + kfree(mem); +} + /** * pipapo_realloc_scratch() - Reallocate scratch maps for partial match results * @clone: Copy of matching data with pending insertions and deletions @@ -1133,7 +1151,7 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone, return -ENOMEM; }
- kfree(*per_cpu_ptr(clone->scratch, i)); + pipapo_free_scratch(clone, i);
*per_cpu_ptr(clone->scratch, i) = scratch;
@@ -1358,7 +1376,7 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old) } out_scratch_realloc: for_each_possible_cpu(i) - kfree(*per_cpu_ptr(new->scratch, i)); + pipapo_free_scratch(new, i); #ifdef NFT_PIPAPO_ALIGN free_percpu(new->scratch_aligned); #endif @@ -1646,7 +1664,7 @@ static void pipapo_free_match(struct nft_pipapo_match *m) int i;
for_each_possible_cpu(i) - kfree(*per_cpu_ptr(m->scratch, i)); + pipapo_free_scratch(m, i);
#ifdef NFT_PIPAPO_ALIGN free_percpu(m->scratch_aligned); @@ -2247,7 +2265,7 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx, free_percpu(m->scratch_aligned); #endif for_each_possible_cpu(cpu) - kfree(*per_cpu_ptr(m->scratch, cpu)); + pipapo_free_scratch(m, cpu); free_percpu(m->scratch); pipapo_free_fields(m); kfree(m); @@ -2264,7 +2282,7 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx, free_percpu(priv->clone->scratch_aligned); #endif for_each_possible_cpu(cpu) - kfree(*per_cpu_ptr(priv->clone->scratch, cpu)); + pipapo_free_scratch(priv->clone, cpu); free_percpu(priv->clone->scratch);
pipapo_free_fields(priv->clone);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 5a8cdf6fd860ac5e6d08d72edbcecee049a7fec4 ]
use ->scratch for both avx2 and the generic implementation.
After previous change the scratch->map member is always aligned properly for AVX2, so we can just use scratch->map in AVX2 too.
The alignoff delta is stored in the scratchpad so we can reconstruct the correct address to free the area again.
Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation") Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_pipapo.c | 41 +++++------------------------ net/netfilter/nft_set_pipapo.h | 6 ++--- net/netfilter/nft_set_pipapo_avx2.c | 2 +- 3 files changed, 10 insertions(+), 39 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index b6bca59b7ba6..8e9b20077966 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -1116,6 +1116,7 @@ static void pipapo_free_scratch(const struct nft_pipapo_match *m, unsigned int c return;
mem = s; + mem -= s->align_off; kfree(mem); }
@@ -1135,6 +1136,7 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone, struct nft_pipapo_scratch *scratch; #ifdef NFT_PIPAPO_ALIGN void *scratch_aligned; + u32 align_off; #endif scratch = kzalloc_node(struct_size(scratch, map, bsize_max * 2) + @@ -1153,8 +1155,6 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
pipapo_free_scratch(clone, i);
- *per_cpu_ptr(clone->scratch, i) = scratch; - #ifdef NFT_PIPAPO_ALIGN /* Align &scratch->map (not the struct itself): the extra * %NFT_PIPAPO_ALIGN_HEADROOM bytes passed to kzalloc_node() @@ -1166,8 +1166,12 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map); scratch_aligned -= offsetof(struct nft_pipapo_scratch, map); - *per_cpu_ptr(clone->scratch_aligned, i) = scratch_aligned; + align_off = scratch_aligned - (void *)scratch; + + scratch = scratch_aligned; + scratch->align_off = align_off; #endif + *per_cpu_ptr(clone->scratch, i) = scratch; }
return 0; @@ -1320,11 +1324,6 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old) if (!new->scratch) goto out_scratch;
-#ifdef NFT_PIPAPO_ALIGN - new->scratch_aligned = alloc_percpu(*new->scratch_aligned); - if (!new->scratch_aligned) - goto out_scratch; -#endif for_each_possible_cpu(i) *per_cpu_ptr(new->scratch, i) = NULL;
@@ -1377,9 +1376,6 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old) out_scratch_realloc: for_each_possible_cpu(i) pipapo_free_scratch(new, i); -#ifdef NFT_PIPAPO_ALIGN - free_percpu(new->scratch_aligned); -#endif out_scratch: free_percpu(new->scratch); kfree(new); @@ -1666,11 +1662,7 @@ static void pipapo_free_match(struct nft_pipapo_match *m) for_each_possible_cpu(i) pipapo_free_scratch(m, i);
-#ifdef NFT_PIPAPO_ALIGN - free_percpu(m->scratch_aligned); -#endif free_percpu(m->scratch); - pipapo_free_fields(m);
kfree(m); @@ -2165,16 +2157,6 @@ static int nft_pipapo_init(const struct nft_set *set, for_each_possible_cpu(i) *per_cpu_ptr(m->scratch, i) = NULL;
-#ifdef NFT_PIPAPO_ALIGN - m->scratch_aligned = alloc_percpu(struct nft_pipapo_scratch *); - if (!m->scratch_aligned) { - err = -ENOMEM; - goto out_free; - } - for_each_possible_cpu(i) - *per_cpu_ptr(m->scratch_aligned, i) = NULL; -#endif - rcu_head_init(&m->rcu);
nft_pipapo_for_each_field(f, i, m) { @@ -2205,9 +2187,6 @@ static int nft_pipapo_init(const struct nft_set *set, return 0;
out_free: -#ifdef NFT_PIPAPO_ALIGN - free_percpu(m->scratch_aligned); -#endif free_percpu(m->scratch); out_scratch: kfree(m); @@ -2261,9 +2240,6 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
nft_set_pipapo_match_destroy(ctx, set, m);
-#ifdef NFT_PIPAPO_ALIGN - free_percpu(m->scratch_aligned); -#endif for_each_possible_cpu(cpu) pipapo_free_scratch(m, cpu); free_percpu(m->scratch); @@ -2278,9 +2254,6 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx, if (priv->dirty) nft_set_pipapo_match_destroy(ctx, set, m);
-#ifdef NFT_PIPAPO_ALIGN - free_percpu(priv->clone->scratch_aligned); -#endif for_each_possible_cpu(cpu) pipapo_free_scratch(priv->clone, cpu); free_percpu(priv->clone->scratch); diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h index 75b1340c6335..a4a58812c108 100644 --- a/net/netfilter/nft_set_pipapo.h +++ b/net/netfilter/nft_set_pipapo.h @@ -133,10 +133,12 @@ struct nft_pipapo_field { /** * struct nft_pipapo_scratch - percpu data used for lookup and matching * @map_index: Current working bitmap index, toggled between field matches + * @align_off: Offset to get the originally allocated address * @map: store partial matching results during lookup */ struct nft_pipapo_scratch { u8 map_index; + u32 align_off; unsigned long map[]; };
@@ -144,16 +146,12 @@ struct nft_pipapo_scratch { * struct nft_pipapo_match - Data used for lookup and matching * @field_count Amount of fields in set * @scratch: Preallocated per-CPU maps for partial matching results - * @scratch_aligned: Version of @scratch aligned to NFT_PIPAPO_ALIGN bytes * @bsize_max: Maximum lookup table bucket size of all fields, in longs * @rcu Matching data is swapped on commits * @f: Fields, with lookup and mapping tables */ struct nft_pipapo_match { int field_count; -#ifdef NFT_PIPAPO_ALIGN - struct nft_pipapo_scratch * __percpu *scratch_aligned; -#endif struct nft_pipapo_scratch * __percpu *scratch; size_t bsize_max; struct rcu_head rcu; diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index 78213c73af2e..90e275bb3e5d 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -1139,7 +1139,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set, */ kernel_fpu_begin_mask(0);
- scratch = *raw_cpu_ptr(m->scratch_aligned); + scratch = *raw_cpu_ptr(m->scratch); if (unlikely(!scratch)) { kernel_fpu_end(); return false;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit b2dd7b953c25ffd5912dda17e980e7168bebcf6c ]
The issue here is when this is called from ntfs_load_attr_list(). The "size" comes from le32_to_cpu(attr->res.data_size) so it can't overflow on a 64bit systems but on 32bit systems the "+ 1023" can overflow and the result is zero. This means that the kmalloc will succeed by returning the ZERO_SIZE_PTR and then the memcpy() will crash with an Oops on the next line.
Fixes: be71b5cba2e6 ("fs/ntfs3: Add attrib operations") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/ntfs_fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ntfs3/ntfs_fs.h b/fs/ntfs3/ntfs_fs.h index 0e6a2777870c..29a9b0b29e4f 100644 --- a/fs/ntfs3/ntfs_fs.h +++ b/fs/ntfs3/ntfs_fs.h @@ -473,7 +473,7 @@ bool al_delete_le(struct ntfs_inode *ni, enum ATTR_TYPE type, CLST vcn, int al_update(struct ntfs_inode *ni, int sync); static inline size_t al_aligned(size_t size) { - return (size + 1023) & ~(size_t)1023; + return size_add(size, 1023) & ~(size_t)1023; }
/* Globals from bitfunc.c */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
[ Upstream commit c5e9b2c2ae82231d85d9650854e7b3e97dde33da ]
For now, tlb_flush() simply calls flush_tlb_mm() which results in a flush of the whole TLB. So let's use mmu_gather fields to provide a more fine-grained flush of the TLB.
Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Reviewed-by: Andrew Jones ajones@ventanamicro.com Reviewed-by: Samuel Holland samuel.holland@sifive.com Tested-by: Lad Prabhakar prabhakar.mahadev-lad.rj@bp.renesas.com # On RZ/Five SMARC Link: https://lore.kernel.org/r/20231030133027.19542-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Stable-dep-of: d9807d60c145 ("riscv: mm: execute local TLB flush after populating vmemmap") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/tlb.h | 8 +++++++- arch/riscv/include/asm/tlbflush.h | 3 +++ arch/riscv/mm/tlbflush.c | 7 +++++++ 3 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h index 120bcf2ed8a8..1eb5682b2af6 100644 --- a/arch/riscv/include/asm/tlb.h +++ b/arch/riscv/include/asm/tlb.h @@ -15,7 +15,13 @@ static void tlb_flush(struct mmu_gather *tlb);
static inline void tlb_flush(struct mmu_gather *tlb) { - flush_tlb_mm(tlb->mm); +#ifdef CONFIG_MMU + if (tlb->fullmm || tlb->need_flush_all) + flush_tlb_mm(tlb->mm); + else + flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end, + tlb_get_unmap_size(tlb)); +#endif }
#endif /* _ASM_RISCV_TLB_H */ diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h index a09196f8de68..f5c4fb0ae642 100644 --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -32,6 +32,8 @@ static inline void local_flush_tlb_page(unsigned long addr) #if defined(CONFIG_SMP) && defined(CONFIG_MMU) void flush_tlb_all(void); void flush_tlb_mm(struct mm_struct *mm); +void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start, + unsigned long end, unsigned int page_size); void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr); void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end); @@ -52,6 +54,7 @@ static inline void flush_tlb_range(struct vm_area_struct *vma, }
#define flush_tlb_mm(mm) flush_tlb_all() +#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all() #endif /* !CONFIG_SMP || !CONFIG_MMU */
/* Flush a range of kernel pages */ diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c index 77be59aadc73..fa03289853d8 100644 --- a/arch/riscv/mm/tlbflush.c +++ b/arch/riscv/mm/tlbflush.c @@ -132,6 +132,13 @@ void flush_tlb_mm(struct mm_struct *mm) __flush_tlb_range(mm, 0, -1, PAGE_SIZE); }
+void flush_tlb_mm_range(struct mm_struct *mm, + unsigned long start, unsigned long end, + unsigned int page_size) +{ + __flush_tlb_range(mm, start, end - start, page_size); +} + void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr) { __flush_tlb_range(vma->vm_mm, addr, PAGE_SIZE, PAGE_SIZE);
On 2024-02-13 11:21 AM, Greg Kroah-Hartman wrote:
6.6-stable review patch. If anyone has any objections, please let me know.
This patch has a bug that was fixed in 97cf301fa42e ("riscv: Flush the tlb when a page directory is freed"), so that fix should be backported as well.
Regards, Samuel
From: Alexandre Ghiti alexghiti@rivosinc.com
[ Upstream commit c5e9b2c2ae82231d85d9650854e7b3e97dde33da ]
For now, tlb_flush() simply calls flush_tlb_mm() which results in a flush of the whole TLB. So let's use mmu_gather fields to provide a more fine-grained flush of the TLB.
Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Reviewed-by: Andrew Jones ajones@ventanamicro.com Reviewed-by: Samuel Holland samuel.holland@sifive.com Tested-by: Lad Prabhakar prabhakar.mahadev-lad.rj@bp.renesas.com # On RZ/Five SMARC Link: https://lore.kernel.org/r/20231030133027.19542-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Stable-dep-of: d9807d60c145 ("riscv: mm: execute local TLB flush after populating vmemmap") Signed-off-by: Sasha Levin sashal@kernel.org
arch/riscv/include/asm/tlb.h | 8 +++++++- arch/riscv/include/asm/tlbflush.h | 3 +++ arch/riscv/mm/tlbflush.c | 7 +++++++ 3 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h index 120bcf2ed8a8..1eb5682b2af6 100644 --- a/arch/riscv/include/asm/tlb.h +++ b/arch/riscv/include/asm/tlb.h @@ -15,7 +15,13 @@ static void tlb_flush(struct mmu_gather *tlb); static inline void tlb_flush(struct mmu_gather *tlb) {
- flush_tlb_mm(tlb->mm);
+#ifdef CONFIG_MMU
- if (tlb->fullmm || tlb->need_flush_all)
flush_tlb_mm(tlb->mm);
- else
flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end,
tlb_get_unmap_size(tlb));
+#endif } #endif /* _ASM_RISCV_TLB_H */ diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h index a09196f8de68..f5c4fb0ae642 100644 --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -32,6 +32,8 @@ static inline void local_flush_tlb_page(unsigned long addr) #if defined(CONFIG_SMP) && defined(CONFIG_MMU) void flush_tlb_all(void); void flush_tlb_mm(struct mm_struct *mm); +void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
unsigned long end, unsigned int page_size);
void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr); void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end); @@ -52,6 +54,7 @@ static inline void flush_tlb_range(struct vm_area_struct *vma, } #define flush_tlb_mm(mm) flush_tlb_all() +#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all() #endif /* !CONFIG_SMP || !CONFIG_MMU */ /* Flush a range of kernel pages */ diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c index 77be59aadc73..fa03289853d8 100644 --- a/arch/riscv/mm/tlbflush.c +++ b/arch/riscv/mm/tlbflush.c @@ -132,6 +132,13 @@ void flush_tlb_mm(struct mm_struct *mm) __flush_tlb_range(mm, 0, -1, PAGE_SIZE); } +void flush_tlb_mm_range(struct mm_struct *mm,
unsigned long start, unsigned long end,
unsigned int page_size)
+{
- __flush_tlb_range(mm, start, end - start, page_size);
+}
void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr) { __flush_tlb_range(vma->vm_mm, addr, PAGE_SIZE, PAGE_SIZE);
On 2024-02-13 3:54 PM, Samuel Holland wrote:
On 2024-02-13 11:21 AM, Greg Kroah-Hartman wrote:
6.6-stable review patch. If anyone has any objections, please let me know.
This patch has a bug that was fixed in 97cf301fa42e ("riscv: Flush the tlb when a page directory is freed"), so that fix should be backported as well.
Sorry for the noise. I missed that this fix was included as patch 93 in the series.
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
[ Upstream commit 9d4e8d5fa7dbbb606b355f40d918a1feef821bc5 ]
Currently, when the range to flush covers more than one page (a 4K page or a hugepage), __flush_tlb_range() flushes the whole tlb. Flushing the whole tlb comes with a greater cost than flushing a single entry so we should flush single entries up to a certain threshold so that: threshold * cost of flushing a single entry < cost of flushing the whole tlb.
Co-developed-by: Mayuresh Chitale mchitale@ventanamicro.com Signed-off-by: Mayuresh Chitale mchitale@ventanamicro.com Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Reviewed-by: Andrew Jones ajones@ventanamicro.com Tested-by: Lad Prabhakar prabhakar.mahadev-lad.rj@bp.renesas.com # On RZ/Five SMARC Reviewed-by: Samuel Holland samuel.holland@sifive.com Tested-by: Samuel Holland samuel.holland@sifive.com Link: https://lore.kernel.org/r/20231030133027.19542-4-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Stable-dep-of: d9807d60c145 ("riscv: mm: execute local TLB flush after populating vmemmap") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/sbi.h | 3 - arch/riscv/include/asm/tlbflush.h | 3 + arch/riscv/kernel/sbi.c | 32 +++------ arch/riscv/mm/tlbflush.c | 115 +++++++++++++++--------------- 4 files changed, 72 insertions(+), 81 deletions(-)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h index 5b4a1bf5f439..b79d0228144f 100644 --- a/arch/riscv/include/asm/sbi.h +++ b/arch/riscv/include/asm/sbi.h @@ -273,9 +273,6 @@ void sbi_set_timer(uint64_t stime_value); void sbi_shutdown(void); void sbi_send_ipi(unsigned int cpu); int sbi_remote_fence_i(const struct cpumask *cpu_mask); -int sbi_remote_sfence_vma(const struct cpumask *cpu_mask, - unsigned long start, - unsigned long size);
int sbi_remote_sfence_vma_asid(const struct cpumask *cpu_mask, unsigned long start, diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h index f5c4fb0ae642..170a49c531c6 100644 --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -11,6 +11,9 @@ #include <asm/smp.h> #include <asm/errata_list.h>
+#define FLUSH_TLB_MAX_SIZE ((unsigned long)-1) +#define FLUSH_TLB_NO_ASID ((unsigned long)-1) + #ifdef CONFIG_MMU extern unsigned long asid_mask;
diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c index c672c8ba9a2a..5a62ed1da453 100644 --- a/arch/riscv/kernel/sbi.c +++ b/arch/riscv/kernel/sbi.c @@ -11,6 +11,7 @@ #include <linux/reboot.h> #include <asm/sbi.h> #include <asm/smp.h> +#include <asm/tlbflush.h>
/* default SBI version is 0.1 */ unsigned long sbi_spec_version __ro_after_init = SBI_SPEC_VERSION_DEFAULT; @@ -376,32 +377,15 @@ int sbi_remote_fence_i(const struct cpumask *cpu_mask) } EXPORT_SYMBOL(sbi_remote_fence_i);
-/** - * sbi_remote_sfence_vma() - Execute SFENCE.VMA instructions on given remote - * harts for the specified virtual address range. - * @cpu_mask: A cpu mask containing all the target harts. - * @start: Start of the virtual address - * @size: Total size of the virtual address range. - * - * Return: 0 on success, appropriate linux error code otherwise. - */ -int sbi_remote_sfence_vma(const struct cpumask *cpu_mask, - unsigned long start, - unsigned long size) -{ - return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA, - cpu_mask, start, size, 0, 0); -} -EXPORT_SYMBOL(sbi_remote_sfence_vma); - /** * sbi_remote_sfence_vma_asid() - Execute SFENCE.VMA instructions on given - * remote harts for a virtual address range belonging to a specific ASID. + * remote harts for a virtual address range belonging to a specific ASID or not. * * @cpu_mask: A cpu mask containing all the target harts. * @start: Start of the virtual address * @size: Total size of the virtual address range. - * @asid: The value of address space identifier (ASID). + * @asid: The value of address space identifier (ASID), or FLUSH_TLB_NO_ASID + * for flushing all address spaces. * * Return: 0 on success, appropriate linux error code otherwise. */ @@ -410,8 +394,12 @@ int sbi_remote_sfence_vma_asid(const struct cpumask *cpu_mask, unsigned long size, unsigned long asid) { - return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID, - cpu_mask, start, size, asid, 0); + if (asid == FLUSH_TLB_NO_ASID) + return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA, + cpu_mask, start, size, 0, 0); + else + return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID, + cpu_mask, start, size, asid, 0); } EXPORT_SYMBOL(sbi_remote_sfence_vma_asid);
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c index fa03289853d8..88fa8b18ca22 100644 --- a/arch/riscv/mm/tlbflush.c +++ b/arch/riscv/mm/tlbflush.c @@ -8,28 +8,50 @@
static inline void local_flush_tlb_all_asid(unsigned long asid) { - __asm__ __volatile__ ("sfence.vma x0, %0" - : - : "r" (asid) - : "memory"); + if (asid != FLUSH_TLB_NO_ASID) + __asm__ __volatile__ ("sfence.vma x0, %0" + : + : "r" (asid) + : "memory"); + else + local_flush_tlb_all(); }
static inline void local_flush_tlb_page_asid(unsigned long addr, unsigned long asid) { - __asm__ __volatile__ ("sfence.vma %0, %1" - : - : "r" (addr), "r" (asid) - : "memory"); + if (asid != FLUSH_TLB_NO_ASID) + __asm__ __volatile__ ("sfence.vma %0, %1" + : + : "r" (addr), "r" (asid) + : "memory"); + else + local_flush_tlb_page(addr); }
-static inline void local_flush_tlb_range(unsigned long start, - unsigned long size, unsigned long stride) +/* + * Flush entire TLB if number of entries to be flushed is greater + * than the threshold below. + */ +static unsigned long tlb_flush_all_threshold __read_mostly = 64; + +static void local_flush_tlb_range_threshold_asid(unsigned long start, + unsigned long size, + unsigned long stride, + unsigned long asid) { - if (size <= stride) - local_flush_tlb_page(start); - else - local_flush_tlb_all(); + unsigned long nr_ptes_in_range = DIV_ROUND_UP(size, stride); + int i; + + if (nr_ptes_in_range > tlb_flush_all_threshold) { + local_flush_tlb_all_asid(asid); + return; + } + + for (i = 0; i < nr_ptes_in_range; ++i) { + local_flush_tlb_page_asid(start, asid); + start += stride; + } }
static inline void local_flush_tlb_range_asid(unsigned long start, @@ -37,8 +59,10 @@ static inline void local_flush_tlb_range_asid(unsigned long start, { if (size <= stride) local_flush_tlb_page_asid(start, asid); - else + else if (size == FLUSH_TLB_MAX_SIZE) local_flush_tlb_all_asid(asid); + else + local_flush_tlb_range_threshold_asid(start, size, stride, asid); }
static void __ipi_flush_tlb_all(void *info) @@ -51,7 +75,7 @@ void flush_tlb_all(void) if (riscv_use_ipi_for_rfence()) on_each_cpu(__ipi_flush_tlb_all, NULL, 1); else - sbi_remote_sfence_vma(NULL, 0, -1); + sbi_remote_sfence_vma_asid(NULL, 0, FLUSH_TLB_MAX_SIZE, FLUSH_TLB_NO_ASID); }
struct flush_tlb_range_data { @@ -68,18 +92,12 @@ static void __ipi_flush_tlb_range_asid(void *info) local_flush_tlb_range_asid(d->start, d->size, d->stride, d->asid); }
-static void __ipi_flush_tlb_range(void *info) -{ - struct flush_tlb_range_data *d = info; - - local_flush_tlb_range(d->start, d->size, d->stride); -} - static void __flush_tlb_range(struct mm_struct *mm, unsigned long start, unsigned long size, unsigned long stride) { struct flush_tlb_range_data ftd; struct cpumask *cmask = mm_cpumask(mm); + unsigned long asid = FLUSH_TLB_NO_ASID; unsigned int cpuid; bool broadcast;
@@ -89,39 +107,24 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start, cpuid = get_cpu(); /* check if the tlbflush needs to be sent to other CPUs */ broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids; - if (static_branch_unlikely(&use_asid_allocator)) { - unsigned long asid = atomic_long_read(&mm->context.id) & asid_mask; - - if (broadcast) { - if (riscv_use_ipi_for_rfence()) { - ftd.asid = asid; - ftd.start = start; - ftd.size = size; - ftd.stride = stride; - on_each_cpu_mask(cmask, - __ipi_flush_tlb_range_asid, - &ftd, 1); - } else - sbi_remote_sfence_vma_asid(cmask, - start, size, asid); - } else { - local_flush_tlb_range_asid(start, size, stride, asid); - } + + if (static_branch_unlikely(&use_asid_allocator)) + asid = atomic_long_read(&mm->context.id) & asid_mask; + + if (broadcast) { + if (riscv_use_ipi_for_rfence()) { + ftd.asid = asid; + ftd.start = start; + ftd.size = size; + ftd.stride = stride; + on_each_cpu_mask(cmask, + __ipi_flush_tlb_range_asid, + &ftd, 1); + } else + sbi_remote_sfence_vma_asid(cmask, + start, size, asid); } else { - if (broadcast) { - if (riscv_use_ipi_for_rfence()) { - ftd.asid = 0; - ftd.start = start; - ftd.size = size; - ftd.stride = stride; - on_each_cpu_mask(cmask, - __ipi_flush_tlb_range, - &ftd, 1); - } else - sbi_remote_sfence_vma(cmask, start, size); - } else { - local_flush_tlb_range(start, size, stride); - } + local_flush_tlb_range_asid(start, size, stride, asid); }
put_cpu(); @@ -129,7 +132,7 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start,
void flush_tlb_mm(struct mm_struct *mm) { - __flush_tlb_range(mm, 0, -1, PAGE_SIZE); + __flush_tlb_range(mm, 0, FLUSH_TLB_MAX_SIZE, PAGE_SIZE); }
void flush_tlb_mm_range(struct mm_struct *mm,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
[ Upstream commit 5e22bfd520ea8740e9a20314d2a890baf304c9d2 ]
This function used to simply flush the whole tlb of all harts, be more subtile and try to only flush the range.
The problem is that we can only use PAGE_SIZE as stride since we don't know the size of the underlying mapping and then this function will be improved only if the size of the region to flush is < threshold * PAGE_SIZE.
Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Reviewed-by: Andrew Jones ajones@ventanamicro.com Tested-by: Lad Prabhakar prabhakar.mahadev-lad.rj@bp.renesas.com # On RZ/Five SMARC Reviewed-by: Samuel Holland samuel.holland@sifive.com Tested-by: Samuel Holland samuel.holland@sifive.com Link: https://lore.kernel.org/r/20231030133027.19542-5-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Stable-dep-of: d9807d60c145 ("riscv: mm: execute local TLB flush after populating vmemmap") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/tlbflush.h | 11 +++++----- arch/riscv/mm/tlbflush.c | 34 ++++++++++++++++++++++--------- 2 files changed, 30 insertions(+), 15 deletions(-)
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h index 170a49c531c6..8f3418c5f172 100644 --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -40,6 +40,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start, void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr); void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end); +void flush_tlb_kernel_range(unsigned long start, unsigned long end); #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start, @@ -56,15 +57,15 @@ static inline void flush_tlb_range(struct vm_area_struct *vma, local_flush_tlb_all(); }
-#define flush_tlb_mm(mm) flush_tlb_all() -#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all() -#endif /* !CONFIG_SMP || !CONFIG_MMU */ - /* Flush a range of kernel pages */ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end) { - flush_tlb_all(); + local_flush_tlb_all(); }
+#define flush_tlb_mm(mm) flush_tlb_all() +#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all() +#endif /* !CONFIG_SMP || !CONFIG_MMU */ + #endif /* _ASM_RISCV_TLBFLUSH_H */ diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c index 88fa8b18ca22..8723adc884c7 100644 --- a/arch/riscv/mm/tlbflush.c +++ b/arch/riscv/mm/tlbflush.c @@ -96,20 +96,27 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start, unsigned long size, unsigned long stride) { struct flush_tlb_range_data ftd; - struct cpumask *cmask = mm_cpumask(mm); + const struct cpumask *cmask; unsigned long asid = FLUSH_TLB_NO_ASID; - unsigned int cpuid; bool broadcast;
- if (cpumask_empty(cmask)) - return; + if (mm) { + unsigned int cpuid; + + cmask = mm_cpumask(mm); + if (cpumask_empty(cmask)) + return;
- cpuid = get_cpu(); - /* check if the tlbflush needs to be sent to other CPUs */ - broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids; + cpuid = get_cpu(); + /* check if the tlbflush needs to be sent to other CPUs */ + broadcast = cpumask_any_but(cmask, cpuid) < nr_cpu_ids;
- if (static_branch_unlikely(&use_asid_allocator)) - asid = atomic_long_read(&mm->context.id) & asid_mask; + if (static_branch_unlikely(&use_asid_allocator)) + asid = atomic_long_read(&mm->context.id) & asid_mask; + } else { + cmask = cpu_online_mask; + broadcast = true; + }
if (broadcast) { if (riscv_use_ipi_for_rfence()) { @@ -127,7 +134,8 @@ static void __flush_tlb_range(struct mm_struct *mm, unsigned long start, local_flush_tlb_range_asid(start, size, stride, asid); }
- put_cpu(); + if (mm) + put_cpu(); }
void flush_tlb_mm(struct mm_struct *mm) @@ -152,6 +160,12 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, { __flush_tlb_range(vma->vm_mm, start, end - start, PAGE_SIZE); } + +void flush_tlb_kernel_range(unsigned long start, unsigned long end) +{ + __flush_tlb_range(NULL, start, end - start, PAGE_SIZE); +} + #ifdef CONFIG_TRANSPARENT_HUGEPAGE void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
[ Upstream commit 7a92fc8b4d20680e4c20289a670d8fca2d1f2c1b ]
The pcpu setup when using the page allocator sets up a new vmalloc mapping very early in the boot process, so early that it cannot use the flush_cache_vmap() function which may depend on structures not yet initialized (for example in riscv, we currently send an IPI to flush other cpus TLB).
But on some architectures, we must call flush_cache_vmap(): for example, in riscv, some uarchs can cache invalid TLB entries so we need to flush the new established mapping to avoid taking an exception.
So fix this by introducing a new function flush_cache_vmap_early() which is called right after setting the new page table entry and before accessing this new mapping. This new function implements a local flush tlb on riscv and is no-op for other architectures (same as today).
Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Acked-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Dennis Zhou dennis@kernel.org Stable-dep-of: d9807d60c145 ("riscv: mm: execute local TLB flush after populating vmemmap") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arc/include/asm/cacheflush.h | 1 + arch/arm/include/asm/cacheflush.h | 2 ++ arch/csky/abiv1/inc/abi/cacheflush.h | 1 + arch/csky/abiv2/inc/abi/cacheflush.h | 1 + arch/m68k/include/asm/cacheflush_mm.h | 1 + arch/mips/include/asm/cacheflush.h | 2 ++ arch/nios2/include/asm/cacheflush.h | 1 + arch/parisc/include/asm/cacheflush.h | 1 + arch/riscv/include/asm/cacheflush.h | 3 ++- arch/riscv/include/asm/tlbflush.h | 1 + arch/riscv/mm/tlbflush.c | 5 +++++ arch/sh/include/asm/cacheflush.h | 1 + arch/sparc/include/asm/cacheflush_32.h | 1 + arch/sparc/include/asm/cacheflush_64.h | 1 + arch/xtensa/include/asm/cacheflush.h | 6 ++++-- include/asm-generic/cacheflush.h | 6 ++++++ mm/percpu.c | 8 +------- 17 files changed, 32 insertions(+), 10 deletions(-)
diff --git a/arch/arc/include/asm/cacheflush.h b/arch/arc/include/asm/cacheflush.h index bd5b1a9a0544..6fc74500a9f5 100644 --- a/arch/arc/include/asm/cacheflush.h +++ b/arch/arc/include/asm/cacheflush.h @@ -40,6 +40,7 @@ void dma_cache_wback(phys_addr_t start, unsigned long sz);
/* TBD: optimize this */ #define flush_cache_vmap(start, end) flush_cache_all() +#define flush_cache_vmap_early(start, end) do { } while (0) #define flush_cache_vunmap(start, end) flush_cache_all()
#define flush_cache_dup_mm(mm) /* called on fork (VIVT only) */ diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h index f6181f69577f..1075534b0a2e 100644 --- a/arch/arm/include/asm/cacheflush.h +++ b/arch/arm/include/asm/cacheflush.h @@ -340,6 +340,8 @@ static inline void flush_cache_vmap(unsigned long start, unsigned long end) dsb(ishst); }
+#define flush_cache_vmap_early(start, end) do { } while (0) + static inline void flush_cache_vunmap(unsigned long start, unsigned long end) { if (!cache_is_vipt_nonaliasing()) diff --git a/arch/csky/abiv1/inc/abi/cacheflush.h b/arch/csky/abiv1/inc/abi/cacheflush.h index 908d8b0bc4fd..d011a81575d2 100644 --- a/arch/csky/abiv1/inc/abi/cacheflush.h +++ b/arch/csky/abiv1/inc/abi/cacheflush.h @@ -43,6 +43,7 @@ static inline void flush_anon_page(struct vm_area_struct *vma, */ extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end); #define flush_cache_vmap(start, end) cache_wbinv_all() +#define flush_cache_vmap_early(start, end) do { } while (0) #define flush_cache_vunmap(start, end) cache_wbinv_all()
#define flush_icache_range(start, end) cache_wbinv_range(start, end) diff --git a/arch/csky/abiv2/inc/abi/cacheflush.h b/arch/csky/abiv2/inc/abi/cacheflush.h index 40be16907267..6513ac5d2578 100644 --- a/arch/csky/abiv2/inc/abi/cacheflush.h +++ b/arch/csky/abiv2/inc/abi/cacheflush.h @@ -41,6 +41,7 @@ void flush_icache_mm_range(struct mm_struct *mm, void flush_icache_deferred(struct mm_struct *mm);
#define flush_cache_vmap(start, end) do { } while (0) +#define flush_cache_vmap_early(start, end) do { } while (0) #define flush_cache_vunmap(start, end) do { } while (0)
#define copy_to_user_page(vma, page, vaddr, dst, src, len) \ diff --git a/arch/m68k/include/asm/cacheflush_mm.h b/arch/m68k/include/asm/cacheflush_mm.h index ed12358c4783..9a71b0148461 100644 --- a/arch/m68k/include/asm/cacheflush_mm.h +++ b/arch/m68k/include/asm/cacheflush_mm.h @@ -191,6 +191,7 @@ extern void cache_push_v(unsigned long vaddr, int len); #define flush_cache_all() __flush_cache_all()
#define flush_cache_vmap(start, end) flush_cache_all() +#define flush_cache_vmap_early(start, end) do { } while (0) #define flush_cache_vunmap(start, end) flush_cache_all()
static inline void flush_cache_mm(struct mm_struct *mm) diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h index f36c2519ed97..1f14132b3fc9 100644 --- a/arch/mips/include/asm/cacheflush.h +++ b/arch/mips/include/asm/cacheflush.h @@ -97,6 +97,8 @@ static inline void flush_cache_vmap(unsigned long start, unsigned long end) __flush_cache_vmap(); }
+#define flush_cache_vmap_early(start, end) do { } while (0) + extern void (*__flush_cache_vunmap)(void);
static inline void flush_cache_vunmap(unsigned long start, unsigned long end) diff --git a/arch/nios2/include/asm/cacheflush.h b/arch/nios2/include/asm/cacheflush.h index 348cea097792..81484a776b33 100644 --- a/arch/nios2/include/asm/cacheflush.h +++ b/arch/nios2/include/asm/cacheflush.h @@ -38,6 +38,7 @@ void flush_icache_pages(struct vm_area_struct *vma, struct page *page, #define flush_icache_pages flush_icache_pages
#define flush_cache_vmap(start, end) flush_dcache_range(start, end) +#define flush_cache_vmap_early(start, end) do { } while (0) #define flush_cache_vunmap(start, end) flush_dcache_range(start, end)
extern void copy_to_user_page(struct vm_area_struct *vma, struct page *page, diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h index b4006f2a9705..ba4c05bc24d6 100644 --- a/arch/parisc/include/asm/cacheflush.h +++ b/arch/parisc/include/asm/cacheflush.h @@ -41,6 +41,7 @@ void flush_kernel_vmap_range(void *vaddr, int size); void invalidate_kernel_vmap_range(void *vaddr, int size);
#define flush_cache_vmap(start, end) flush_cache_all() +#define flush_cache_vmap_early(start, end) do { } while (0) #define flush_cache_vunmap(start, end) flush_cache_all()
void flush_dcache_folio(struct folio *folio); diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h index 3cb53c4df27c..a129dac4521d 100644 --- a/arch/riscv/include/asm/cacheflush.h +++ b/arch/riscv/include/asm/cacheflush.h @@ -37,7 +37,8 @@ static inline void flush_dcache_page(struct page *page) flush_icache_mm(vma->vm_mm, 0)
#ifdef CONFIG_64BIT -#define flush_cache_vmap(start, end) flush_tlb_kernel_range(start, end) +#define flush_cache_vmap(start, end) flush_tlb_kernel_range(start, end) +#define flush_cache_vmap_early(start, end) local_flush_tlb_kernel_range(start, end) #endif
#ifndef CONFIG_SMP diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h index 8f3418c5f172..a60416bbe190 100644 --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -41,6 +41,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr); void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end); void flush_tlb_kernel_range(unsigned long start, unsigned long end); +void local_flush_tlb_kernel_range(unsigned long start, unsigned long end); #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start, diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c index 8723adc884c7..b1ab6cf78e9e 100644 --- a/arch/riscv/mm/tlbflush.c +++ b/arch/riscv/mm/tlbflush.c @@ -65,6 +65,11 @@ static inline void local_flush_tlb_range_asid(unsigned long start, local_flush_tlb_range_threshold_asid(start, size, stride, asid); }
+void local_flush_tlb_kernel_range(unsigned long start, unsigned long end) +{ + local_flush_tlb_range_asid(start, end, PAGE_SIZE, FLUSH_TLB_NO_ASID); +} + static void __ipi_flush_tlb_all(void *info) { local_flush_tlb_all(); diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h index 878b6b551bd2..51112f54552b 100644 --- a/arch/sh/include/asm/cacheflush.h +++ b/arch/sh/include/asm/cacheflush.h @@ -90,6 +90,7 @@ extern void copy_from_user_page(struct vm_area_struct *vma, unsigned long len);
#define flush_cache_vmap(start, end) local_flush_cache_all(NULL) +#define flush_cache_vmap_early(start, end) do { } while (0) #define flush_cache_vunmap(start, end) local_flush_cache_all(NULL)
#define flush_dcache_mmap_lock(mapping) do { } while (0) diff --git a/arch/sparc/include/asm/cacheflush_32.h b/arch/sparc/include/asm/cacheflush_32.h index f3b7270bf71b..9fee0ccfccb8 100644 --- a/arch/sparc/include/asm/cacheflush_32.h +++ b/arch/sparc/include/asm/cacheflush_32.h @@ -48,6 +48,7 @@ static inline void flush_dcache_page(struct page *page) #define flush_dcache_mmap_unlock(mapping) do { } while (0)
#define flush_cache_vmap(start, end) flush_cache_all() +#define flush_cache_vmap_early(start, end) do { } while (0) #define flush_cache_vunmap(start, end) flush_cache_all()
/* When a context switch happens we must flush all user windows so that diff --git a/arch/sparc/include/asm/cacheflush_64.h b/arch/sparc/include/asm/cacheflush_64.h index 0e879004efff..2b1261b77ecd 100644 --- a/arch/sparc/include/asm/cacheflush_64.h +++ b/arch/sparc/include/asm/cacheflush_64.h @@ -75,6 +75,7 @@ void flush_ptrace_access(struct vm_area_struct *, struct page *, #define flush_dcache_mmap_unlock(mapping) do { } while (0)
#define flush_cache_vmap(start, end) do { } while (0) +#define flush_cache_vmap_early(start, end) do { } while (0) #define flush_cache_vunmap(start, end) do { } while (0)
#endif /* !__ASSEMBLY__ */ diff --git a/arch/xtensa/include/asm/cacheflush.h b/arch/xtensa/include/asm/cacheflush.h index 785a00ce83c1..38bcecb0e457 100644 --- a/arch/xtensa/include/asm/cacheflush.h +++ b/arch/xtensa/include/asm/cacheflush.h @@ -116,8 +116,9 @@ void flush_cache_page(struct vm_area_struct*, #define flush_cache_mm(mm) flush_cache_all() #define flush_cache_dup_mm(mm) flush_cache_mm(mm)
-#define flush_cache_vmap(start,end) flush_cache_all() -#define flush_cache_vunmap(start,end) flush_cache_all() +#define flush_cache_vmap(start,end) flush_cache_all() +#define flush_cache_vmap_early(start,end) do { } while (0) +#define flush_cache_vunmap(start,end) flush_cache_all()
void flush_dcache_folio(struct folio *folio); #define flush_dcache_folio flush_dcache_folio @@ -140,6 +141,7 @@ void local_flush_cache_page(struct vm_area_struct *vma, #define flush_cache_dup_mm(mm) do { } while (0)
#define flush_cache_vmap(start,end) do { } while (0) +#define flush_cache_vmap_early(start,end) do { } while (0) #define flush_cache_vunmap(start,end) do { } while (0)
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 0 diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h index 84ec53ccc450..7ee8a179d103 100644 --- a/include/asm-generic/cacheflush.h +++ b/include/asm-generic/cacheflush.h @@ -91,6 +91,12 @@ static inline void flush_cache_vmap(unsigned long start, unsigned long end) } #endif
+#ifndef flush_cache_vmap_early +static inline void flush_cache_vmap_early(unsigned long start, unsigned long end) +{ +} +#endif + #ifndef flush_cache_vunmap static inline void flush_cache_vunmap(unsigned long start, unsigned long end) { diff --git a/mm/percpu.c b/mm/percpu.c index a7665de8485f..d287cebd58ca 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -3306,13 +3306,7 @@ int __init pcpu_page_first_chunk(size_t reserved_size, pcpu_fc_cpu_to_node_fn_t if (rc < 0) panic("failed to map percpu area, err=%d\n", rc);
- /* - * FIXME: Archs with virtual cache should flush local - * cache for the linear mapping here - something - * equivalent to flush_cache_vmap() on the local cpu. - * flush_cache_vmap() can't be used as most supporting - * data structures are not set up yet. - */ + flush_cache_vmap_early(unit_addr, unit_addr + ai->unit_size);
/* copy static data */ memcpy((void *)unit_addr, __per_cpu_load, ai->static_size);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vincent Chen vincent.chen@sifive.com
[ Upstream commit d9807d60c145836043ffa602328ea1d66dc458b1 ]
The spare_init() calls memmap_populate() many times to create VA to PA mapping for the VMEMMAP area, where all "struct page" are located once CONFIG_SPARSEMEM_VMEMMAP is defined. These "struct page" are later initialized in the zone_sizes_init() function. However, during this process, no sfence.vma instruction is executed for this VMEMMAP area. This omission may cause the hart to fail to perform page table walk because some data related to the address translation is invisible to the hart. To solve this issue, the local_flush_tlb_kernel_range() is called right after the sparse_init() to execute a sfence.vma instruction for this VMEMMAP area, ensuring that all data related to the address translation is visible to the hart.
Fixes: d95f1a542c3d ("RISC-V: Implement sparsemem") Signed-off-by: Vincent Chen vincent.chen@sifive.com Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/20240117140333.2479667-1-vincent.chen@sifive.com Fixes: 7a92fc8b4d20 ("mm: Introduce flush_cache_vmap_early()") Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/tlbflush.h | 1 + arch/riscv/mm/init.c | 4 ++++ arch/riscv/mm/tlbflush.c | 3 ++- 3 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h index a60416bbe190..51664ae4852e 100644 --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -67,6 +67,7 @@ static inline void flush_tlb_kernel_range(unsigned long start,
#define flush_tlb_mm(mm) flush_tlb_all() #define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all() +#define local_flush_tlb_kernel_range(start, end) flush_tlb_all() #endif /* !CONFIG_SMP || !CONFIG_MMU */
#endif /* _ASM_RISCV_TLBFLUSH_H */ diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index e71dd19ac801..b50faa232b5e 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -1502,6 +1502,10 @@ void __init misc_mem_init(void) early_memtest(min_low_pfn << PAGE_SHIFT, max_low_pfn << PAGE_SHIFT); arch_numa_init(); sparse_init(); +#ifdef CONFIG_SPARSEMEM_VMEMMAP + /* The entire VMEMMAP region has been populated. Flush TLB for this region */ + local_flush_tlb_kernel_range(VMEMMAP_START, VMEMMAP_END); +#endif zone_sizes_init(); reserve_crashkernel(); memblock_dump_all(); diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c index b1ab6cf78e9e..bdee5de918e0 100644 --- a/arch/riscv/mm/tlbflush.c +++ b/arch/riscv/mm/tlbflush.c @@ -65,9 +65,10 @@ static inline void local_flush_tlb_range_asid(unsigned long start, local_flush_tlb_range_threshold_asid(start, size, stride, asid); }
+/* Flush a range of kernel pages without broadcasting */ void local_flush_tlb_kernel_range(unsigned long start, unsigned long end) { - local_flush_tlb_range_asid(start, end, PAGE_SIZE, FLUSH_TLB_NO_ASID); + local_flush_tlb_range_asid(start, end - start, PAGE_SIZE, FLUSH_TLB_NO_ASID); }
static void __ipi_flush_tlb_all(void *info)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
[ Upstream commit 1458eb2c9d88ad4b35eb6d6a4aa1d43d8fbf7f62 ]
As stated by the privileged specification, we must clear a NAPOT mapping and emit a sfence.vma before setting a new translation.
Fixes: 82a1a1f3bfb6 ("riscv: mm: support Svnapot in hugetlb page") Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/20240117195741.1926459-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/mm/hugetlbpage.c | 42 +++++++++++++++++++++++++++++++++++-- 1 file changed, 40 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c index b52f0210481f..24c0179565d8 100644 --- a/arch/riscv/mm/hugetlbpage.c +++ b/arch/riscv/mm/hugetlbpage.c @@ -177,13 +177,36 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags) return entry; }
+static void clear_flush(struct mm_struct *mm, + unsigned long addr, + pte_t *ptep, + unsigned long pgsize, + unsigned long ncontig) +{ + struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0); + unsigned long i, saddr = addr; + + for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) + ptep_get_and_clear(mm, addr, ptep); + + flush_tlb_range(&vma, saddr, addr); +} + +/* + * When dealing with NAPOT mappings, the privileged specification indicates that + * "if an update needs to be made, the OS generally should first mark all of the + * PTEs invalid, then issue SFENCE.VMA instruction(s) covering all 4 KiB regions + * within the range, [...] then update the PTE(s), as described in Section + * 4.2.1.". That's the equivalent of the Break-Before-Make approach used by + * arm64. + */ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte, unsigned long sz) { - unsigned long hugepage_shift; + unsigned long hugepage_shift, pgsize; int i, pte_num;
if (sz >= PGDIR_SIZE) @@ -198,7 +221,22 @@ void set_huge_pte_at(struct mm_struct *mm, hugepage_shift = PAGE_SHIFT;
pte_num = sz >> hugepage_shift; - for (i = 0; i < pte_num; i++, ptep++, addr += (1 << hugepage_shift)) + pgsize = 1 << hugepage_shift; + + if (!pte_present(pte)) { + for (i = 0; i < pte_num; i++, ptep++, addr += pgsize) + set_ptes(mm, addr, ptep, pte, 1); + return; + } + + if (!pte_napot(pte)) { + set_ptes(mm, addr, ptep, pte, 1); + return; + } + + clear_flush(mm, addr, ptep, pgsize, pte_num); + + for (i = 0; i < pte_num; i++, ptep++, addr += pgsize) set_pte_at(mm, addr, ptep, pte); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
[ Upstream commit a179a4bfb694f80f2709a1d0398469e787acb974 ]
When NAPOT is enabled, a new hugepage size is available and then we need to make hugetlb_mask_last_page() aware of that.
Fixes: 82a1a1f3bfb6 ("riscv: mm: support Svnapot in hugetlb page") Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/20240117195741.1926459-3-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/mm/hugetlbpage.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c index 24c0179565d8..87af75ee7186 100644 --- a/arch/riscv/mm/hugetlbpage.c +++ b/arch/riscv/mm/hugetlbpage.c @@ -125,6 +125,26 @@ pte_t *huge_pte_offset(struct mm_struct *mm, return pte; }
+unsigned long hugetlb_mask_last_page(struct hstate *h) +{ + unsigned long hp_size = huge_page_size(h); + + switch (hp_size) { +#ifndef __PAGETABLE_PMD_FOLDED + case PUD_SIZE: + return P4D_SIZE - PUD_SIZE; +#endif + case PMD_SIZE: + return PUD_SIZE - PMD_SIZE; + case napot_cont_size(NAPOT_CONT64KB_ORDER): + return PMD_SIZE - napot_cont_size(NAPOT_CONT64KB_ORDER); + default: + break; + } + + return 0UL; +} + static pte_t get_clear_contig(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ming Lei ming.lei@redhat.com
[ Upstream commit 4e6c9011990726f4d175e2cdfebe5b0b8cce4839 ]
Commit 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler") intended to fix a hard lockup issue triggered by EH. The core idea was to move scsi_host_busy() out of the host lock when processing individual commands for EH. However, a suggested style change inadvertently caused scsi_host_busy() to remain under the host lock. Fix this by calling scsi_host_busy() outside the lock.
Fixes: 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler") Cc: Sathya Prakash Veerichetty safhya.prakash@broadcom.com Cc: Bart Van Assche bvanassche@acm.org Cc: Ewan D. Milne emilne@redhat.com Signed-off-by: Ming Lei ming.lei@redhat.com Link: https://lore.kernel.org/r/20240203024521.2006455-1-ming.lei@redhat.com Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_error.c | 3 ++- drivers/scsi/scsi_lib.c | 4 +++- 2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 3328b175a832..43eff1107038 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -282,11 +282,12 @@ static void scsi_eh_inc_host_failed(struct rcu_head *head) { struct scsi_cmnd *scmd = container_of(head, typeof(*scmd), rcu); struct Scsi_Host *shost = scmd->device->host; + unsigned int busy = scsi_host_busy(shost); unsigned long flags;
spin_lock_irqsave(shost->host_lock, flags); shost->host_failed++; - scsi_eh_wakeup(shost, scsi_host_busy(shost)); + scsi_eh_wakeup(shost, busy); spin_unlock_irqrestore(shost->host_lock, flags); }
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index dfdffe55c5a6..552809bca350 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -278,9 +278,11 @@ static void scsi_dec_host_busy(struct Scsi_Host *shost, struct scsi_cmnd *cmd) rcu_read_lock(); __clear_bit(SCMD_STATE_INFLIGHT, &cmd->state); if (unlikely(scsi_host_in_recovery(shost))) { + unsigned int busy = scsi_host_busy(shost); + spin_lock_irqsave(shost->host_lock, flags); if (shost->host_failed || shost->host_eh_scheduled) - scsi_eh_wakeup(shost, scsi_host_busy(shost)); + scsi_eh_wakeup(shost, busy); spin_unlock_irqrestore(shost->host_lock, flags); } rcu_read_unlock();
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
[ Upstream commit 97cf301fa42e8ea6e0a24de97bc0abcdc87d9504 ]
The riscv privileged specification mandates to flush the TLB whenever a page directory is modified, so add that to tlb_flush().
Fixes: c5e9b2c2ae82 ("riscv: Improve tlb_flush()") Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Reviewed-by: Charlie Jenkins charlie@rivosinc.com Link: https://lore.kernel.org/r/20240128120405.25876-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/tlb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h index 1eb5682b2af6..50b63b5c15bd 100644 --- a/arch/riscv/include/asm/tlb.h +++ b/arch/riscv/include/asm/tlb.h @@ -16,7 +16,7 @@ static void tlb_flush(struct mmu_gather *tlb); static inline void tlb_flush(struct mmu_gather *tlb) { #ifdef CONFIG_MMU - if (tlb->fullmm || tlb->need_flush_all) + if (tlb->fullmm || tlb->need_flush_all || tlb->freed_tables) flush_tlb_mm(tlb->mm); else flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiubo Li xiubli@redhat.com
[ Upstream commit ee97302fbc0c98a25732d736fc73aaf4d62c4128 ]
These functions are supposed to behave like other read_partial_*() handlers: the contract with messenger v1 is that the handler bails if the area of the message it's responsible for is already processed. This comes up when handling short reads from the socket.
[ idryomov: changelog ]
Signed-off-by: Xiubo Li xiubli@redhat.com Acked-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Stable-dep-of: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket") Signed-off-by: Sasha Levin sashal@kernel.org --- net/ceph/messenger_v1.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c index f9a50d7f0d20..4cb60bacf5f5 100644 --- a/net/ceph/messenger_v1.c +++ b/net/ceph/messenger_v1.c @@ -991,7 +991,7 @@ static inline int read_partial_message_section(struct ceph_connection *con, return read_partial_message_chunk(con, section, sec_len, crc); }
-static int read_sparse_msg_extent(struct ceph_connection *con, u32 *crc) +static int read_partial_sparse_msg_extent(struct ceph_connection *con, u32 *crc) { struct ceph_msg_data_cursor *cursor = &con->in_msg->cursor; bool do_bounce = ceph_test_opt(from_msgr(con->msgr), RXBOUNCE); @@ -1026,7 +1026,7 @@ static int read_sparse_msg_extent(struct ceph_connection *con, u32 *crc) return 1; }
-static int read_sparse_msg_data(struct ceph_connection *con) +static int read_partial_sparse_msg_data(struct ceph_connection *con) { struct ceph_msg_data_cursor *cursor = &con->in_msg->cursor; bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC); @@ -1043,7 +1043,7 @@ static int read_sparse_msg_data(struct ceph_connection *con) con->v1.in_sr_len, &crc); else if (cursor->sr_resid > 0) - ret = read_sparse_msg_extent(con, &crc); + ret = read_partial_sparse_msg_extent(con, &crc);
if (ret <= 0) { if (do_datacrc) @@ -1254,7 +1254,7 @@ static int read_partial_message(struct ceph_connection *con) return -EIO;
if (m->sparse_read) - ret = read_sparse_msg_data(con); + ret = read_partial_sparse_msg_data(con); else if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) ret = read_partial_msg_data_bounce(con); else
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiubo Li xiubli@redhat.com
[ Upstream commit 8e46a2d068c92a905d01cbb018b00d66991585ab ]
A short read may occur while reading the message footer from the socket. Later, when the socket is ready for another read, the messenger invokes all read_partial_*() handlers, including read_partial_sparse_msg_data(). The expectation is that read_partial_sparse_msg_data() would bail, allowing the messenger to invoke read_partial() for the footer and pick up where it left off.
However read_partial_sparse_msg_data() violates that and ends up calling into the state machine in the OSD client. The sparse-read state machine assumes that it's a new op and interprets some piece of the footer as the sparse-read header and returns bogus extents/data length, etc.
To determine whether read_partial_sparse_msg_data() should bail, let's reuse cursor->total_resid. Because once it reaches to zero that means all the extents and data have been successfully received in last read, else it could break out when partially reading any of the extents and data. And then osd_sparse_read() could continue where it left off.
[ idryomov: changelog ]
Link: https://tracker.ceph.com/issues/63586 Fixes: d396f89db39a ("libceph: add sparse read support to msgr1") Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/ceph/messenger.h | 2 +- net/ceph/messenger_v1.c | 25 +++++++++++++------------ net/ceph/messenger_v2.c | 4 ++-- net/ceph/osd_client.c | 9 +++------ 4 files changed, 19 insertions(+), 21 deletions(-)
diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index 2eaaabbe98cb..1717cc57cdac 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -283,7 +283,7 @@ struct ceph_msg { struct kref kref; bool more_to_follow; bool needs_out_seq; - bool sparse_read; + u64 sparse_read_total; int front_alloc_len;
struct ceph_msgpool *pool; diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c index 4cb60bacf5f5..0cb61c76b9b8 100644 --- a/net/ceph/messenger_v1.c +++ b/net/ceph/messenger_v1.c @@ -160,8 +160,9 @@ static size_t sizeof_footer(struct ceph_connection *con) static void prepare_message_data(struct ceph_msg *msg, u32 data_len) { /* Initialize data cursor if it's not a sparse read */ - if (!msg->sparse_read) - ceph_msg_data_cursor_init(&msg->cursor, msg, data_len); + u64 len = msg->sparse_read_total ? : data_len; + + ceph_msg_data_cursor_init(&msg->cursor, msg, len); }
/* @@ -1036,7 +1037,7 @@ static int read_partial_sparse_msg_data(struct ceph_connection *con) if (do_datacrc) crc = con->in_data_crc;
- do { + while (cursor->total_resid) { if (con->v1.in_sr_kvec.iov_base) ret = read_partial_message_chunk(con, &con->v1.in_sr_kvec, @@ -1044,23 +1045,23 @@ static int read_partial_sparse_msg_data(struct ceph_connection *con) &crc); else if (cursor->sr_resid > 0) ret = read_partial_sparse_msg_extent(con, &crc); - - if (ret <= 0) { - if (do_datacrc) - con->in_data_crc = crc; - return ret; - } + if (ret <= 0) + break;
memset(&con->v1.in_sr_kvec, 0, sizeof(con->v1.in_sr_kvec)); ret = con->ops->sparse_read(con, cursor, (char **)&con->v1.in_sr_kvec.iov_base); + if (ret <= 0) { + ret = ret ? ret : 1; /* must return > 0 to indicate success */ + break; + } con->v1.in_sr_len = ret; - } while (ret > 0); + }
if (do_datacrc) con->in_data_crc = crc;
- return ret < 0 ? ret : 1; /* must return > 0 to indicate success */ + return ret; }
static int read_partial_msg_data(struct ceph_connection *con) @@ -1253,7 +1254,7 @@ static int read_partial_message(struct ceph_connection *con) if (!m->num_data_items) return -EIO;
- if (m->sparse_read) + if (m->sparse_read_total) ret = read_partial_sparse_msg_data(con); else if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE)) ret = read_partial_msg_data_bounce(con); diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c index d09a39ff2cf0..a901cae2f106 100644 --- a/net/ceph/messenger_v2.c +++ b/net/ceph/messenger_v2.c @@ -1132,7 +1132,7 @@ static int decrypt_tail(struct ceph_connection *con) struct sg_table enc_sgt = {}; struct sg_table sgt = {}; struct page **pages = NULL; - bool sparse = con->in_msg->sparse_read; + bool sparse = !!con->in_msg->sparse_read_total; int dpos = 0; int tail_len; int ret; @@ -2064,7 +2064,7 @@ static int prepare_read_tail_plain(struct ceph_connection *con) }
if (data_len(msg)) { - if (msg->sparse_read) + if (msg->sparse_read_total) con->v2.in_state = IN_S_PREPARE_SPARSE_DATA; else con->v2.in_state = IN_S_PREPARE_READ_DATA; diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index d3a759e052c8..8d9760397b88 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -5510,7 +5510,7 @@ static struct ceph_msg *get_reply(struct ceph_connection *con, }
m = ceph_msg_get(req->r_reply); - m->sparse_read = (bool)srlen; + m->sparse_read_total = srlen;
dout("get_reply tid %lld %p\n", tid, m);
@@ -5777,11 +5777,8 @@ static int prep_next_sparse_read(struct ceph_connection *con, }
if (o->o_sparse_op_idx < 0) { - u64 srlen = sparse_data_requested(req); - - dout("%s: [%d] starting new sparse read req. srlen=0x%llx\n", - __func__, o->o_osd, srlen); - ceph_msg_data_cursor_init(cursor, con->in_msg, srlen); + dout("%s: [%d] starting new sparse read req\n", + __func__, o->o_osd); } else { u64 end;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ghiti alexghiti@rivosinc.com
[ Upstream commit ce68c035457bdd025a9961e0ba2157323090c581 ]
arch_hugetlb_migration_supported() must be reimplemented to add support for NAPOT hugepages, which is done here.
Fixes: 82a1a1f3bfb6 ("riscv: mm: support Svnapot in hugetlb page") Signed-off-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/20240130120114.106003-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/hugetlb.h | 3 +++ arch/riscv/mm/hugetlbpage.c | 16 +++++++++++++--- 2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h index 4c5b0e929890..20f9c3ba2341 100644 --- a/arch/riscv/include/asm/hugetlb.h +++ b/arch/riscv/include/asm/hugetlb.h @@ -11,6 +11,9 @@ static inline void arch_clear_hugepage_flags(struct page *page) } #define arch_clear_hugepage_flags arch_clear_hugepage_flags
+bool arch_hugetlb_migration_supported(struct hstate *h); +#define arch_hugetlb_migration_supported arch_hugetlb_migration_supported + #ifdef CONFIG_RISCV_ISA_SVNAPOT #define __HAVE_ARCH_HUGE_PTE_CLEAR void huge_pte_clear(struct mm_struct *mm, unsigned long addr, diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c index 87af75ee7186..e7b69281875b 100644 --- a/arch/riscv/mm/hugetlbpage.c +++ b/arch/riscv/mm/hugetlbpage.c @@ -364,7 +364,7 @@ void huge_pte_clear(struct mm_struct *mm, pte_clear(mm, addr, ptep); }
-static __init bool is_napot_size(unsigned long size) +static bool is_napot_size(unsigned long size) { unsigned long order;
@@ -392,7 +392,7 @@ arch_initcall(napot_hugetlbpages_init);
#else
-static __init bool is_napot_size(unsigned long size) +static bool is_napot_size(unsigned long size) { return false; } @@ -409,7 +409,7 @@ int pmd_huge(pmd_t pmd) return pmd_leaf(pmd); }
-bool __init arch_hugetlb_valid_size(unsigned long size) +static bool __hugetlb_valid_size(unsigned long size) { if (size == HPAGE_SIZE) return true; @@ -421,6 +421,16 @@ bool __init arch_hugetlb_valid_size(unsigned long size) return false; }
+bool __init arch_hugetlb_valid_size(unsigned long size) +{ + return __hugetlb_valid_size(size); +} + +bool arch_hugetlb_migration_supported(struct hstate *h) +{ + return __hugetlb_valid_size(huge_page_size(h)); +} + #ifdef CONFIG_CONTIG_ALLOC static __init int gigantic_pages_init(void) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Dooks ben.dooks@codethink.co.uk
[ Upstream commit 2cf963787529f615f7c93bdcf13a5e82029e7f38 ]
The percpu area overflow_stacks is exported from arch/riscv/kernel/traps.c for use in the entry code, but is not declared anywhere. Add the relevant declaration to arch/riscv/include/asm/stacktrace.h to silence the following sparse warning:
arch/riscv/kernel/traps.c:395:1: warning: symbol '__pcpu_scope_overflow_stack' was not declared. Should it be static?
We don't add the stackinfo_get_overflow() call as for some of the other architectures as this doesn't seem to be used yet, so just silence the warning.
Signed-off-by: Ben Dooks ben.dooks@codethink.co.uk Reviewed-by: Conor Dooley conor.dooley@microchip.com Fixes: be97d0db5f44 ("riscv: VMAP_STACK overflow detection thread-safe") Link: https://lore.kernel.org/r/20231123134214.81481-1-ben.dooks@codethink.co.uk Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/include/asm/stacktrace.h | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/arch/riscv/include/asm/stacktrace.h b/arch/riscv/include/asm/stacktrace.h index f7e8ef2418b9..b1495a7e06ce 100644 --- a/arch/riscv/include/asm/stacktrace.h +++ b/arch/riscv/include/asm/stacktrace.h @@ -21,4 +21,9 @@ static inline bool on_thread_stack(void) return !(((unsigned long)(current->stack) ^ current_stack_pointer) & ~(THREAD_SIZE - 1)); }
+ +#ifdef CONFIG_VMAP_STACK +DECLARE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], overflow_stack); +#endif /* CONFIG_VMAP_STACK */ + #endif /* _ASM_RISCV_STACKTRACE_H */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tejun Heo tj@kernel.org
[ Upstream commit 2a427b49d02995ea4a6ff93a1432c40fa4d36821 ]
When iocg_kick_delay() is called from a CPU different than the one which set the delay, @now may be in the past of @iocg->delay_at leading to the following warning:
UBSAN: shift-out-of-bounds in block/blk-iocost.c:1359:23 shift exponent 18446744073709 is too large for 64-bit type 'u64' (aka 'unsigned long long') ... Call Trace: <TASK> dump_stack_lvl+0x79/0xc0 __ubsan_handle_shift_out_of_bounds+0x2ab/0x300 iocg_kick_delay+0x222/0x230 ioc_rqos_merge+0x1d7/0x2c0 __rq_qos_merge+0x2c/0x80 bio_attempt_back_merge+0x83/0x190 blk_attempt_plug_merge+0x101/0x150 blk_mq_submit_bio+0x2b1/0x720 submit_bio_noacct_nocheck+0x320/0x3e0 __swap_writepage+0x2ab/0x9d0
The underflow itself doesn't really affect the behavior in any meaningful way; however, the past timestamp may exaggerate the delay amount calculated later in the code, which shouldn't be a material problem given the nature of the delay mechanism.
If @now is in the past, this CPU is racing another CPU which recently set up the delay and there's nothing this CPU can contribute w.r.t. the delay. Let's bail early from iocg_kick_delay() in such cases.
Reported-by: Breno Leitão leitao@debian.org Signed-off-by: Tejun Heo tj@kernel.org Fixes: 5160a5a53c0c ("blk-iocost: implement delay adjustment hysteresis") Link: https://lore.kernel.org/r/ZVvc9L_CYk5LO1fT@slm.duckdns.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-iocost.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/block/blk-iocost.c b/block/blk-iocost.c index 089fcb9cfce3..7ee8d85c2c68 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -1353,6 +1353,13 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now)
lockdep_assert_held(&iocg->waitq.lock);
+ /* + * If the delay is set by another CPU, we may be in the past. No need to + * change anything if so. This avoids decay calculation underflow. + */ + if (time_before64(now->now, iocg->delay_at)) + return false; + /* calculate the current delay in effect - 1/2 every second */ tdelta = now->now - iocg->delay_at; if (iocg->delay)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Tsoy alexander@tsoy.me
commit d915a6850e27efb383cd4400caadfe47792623df upstream.
Audio control requests that sets sampling frequency sometimes fail on this card. Adding delay between control messages eliminates that problem.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217601 Cc: stable@vger.kernel.org Signed-off-by: Alexander Tsoy alexander@tsoy.me Link: https://lore.kernel.org/r/20240124130239.358298-1-alexander@tsoy.me Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -2073,6 +2073,8 @@ static const struct usb_audio_quirk_flag QUIRK_FLAG_GENERIC_IMPLICIT_FB), DEVICE_FLG(0x0763, 0x2031, /* M-Audio Fast Track C600 */ QUIRK_FLAG_GENERIC_IMPLICIT_FB), + DEVICE_FLG(0x07fd, 0x000b, /* MOTU M Series 2nd hardware revision */ + QUIRK_FLAG_CTL_MSG_DELAY_1M), DEVICE_FLG(0x08bb, 0x2702, /* LineX FM Transmitter */ QUIRK_FLAG_IGNORE_CTL_ERROR), DEVICE_FLG(0x0951, 0x16ad, /* Kingston HyperX */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Sikorski belegdol+github@gmail.com
commit a969210066054ea109d8b7aff29a9b1c98776841 upstream.
The device fails to initialize otherwise, giving the following error: [ 3676.671641] usb 2-1.1: 1:1: cannot get freq at ep 0x1
Signed-off-by: Julian Sikorski belegdol+github@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240123084935.2745-1-belegdol+github@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -2031,6 +2031,8 @@ static const struct usb_audio_quirk_flag QUIRK_FLAG_CTL_MSG_DELAY_1M | QUIRK_FLAG_IGNORE_CTL_ERROR), DEVICE_FLG(0x0499, 0x1509, /* Steinberg UR22 */ QUIRK_FLAG_GENERIC_IMPLICIT_FB), + DEVICE_FLG(0x0499, 0x3108, /* Yamaha YIT-W12TX */ + QUIRK_FLAG_GET_SAMPLE_RATE), DEVICE_FLG(0x04d8, 0xfeea, /* Benchmark DAC1 Pre */ QUIRK_FLAG_GET_SAMPLE_RATE), DEVICE_FLG(0x04e8, 0xa051, /* Samsung USBC Headset (AKG) */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Young sean@mess.org
commit 7822baa844a87cbb93308c1032c3d47d4079bb8a upstream.
The RODE NT-USB+ is marketed as a professional usb microphone, however the usb audio interface is a mess:
[ 1.130977] usb 1-5: new full-speed USB device number 2 using xhci_hcd [ 1.503906] usb 1-5: config 1 has an invalid interface number: 5 but max is 4 [ 1.503912] usb 1-5: config 1 has no interface number 4 [ 1.519689] usb 1-5: New USB device found, idVendor=19f7, idProduct=0035, bcdDevice= 1.09 [ 1.519695] usb 1-5: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 1.519697] usb 1-5: Product: RØDE NT-USB+ [ 1.519699] usb 1-5: Manufacturer: RØDE [ 1.519700] usb 1-5: SerialNumber: 1D773A1A [ 8.327495] usb 1-5: 1:1: cannot get freq at ep 0x82 [ 8.344500] usb 1-5: 1:2: cannot get freq at ep 0x82 [ 8.365499] usb 1-5: 2:1: cannot get freq at ep 0x2
Add QUIRK_FLAG_GET_SAMPLE_RATE to work around the broken sample rate get. I have asked Rode support to fix it, but they show no interest.
Signed-off-by: Sean Young sean@mess.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240124151524.23314-1-sean@mess.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -2183,6 +2183,8 @@ static const struct usb_audio_quirk_flag QUIRK_FLAG_FIXED_RATE), DEVICE_FLG(0x1bcf, 0x2283, /* NexiGo N930AF FHD Webcam */ QUIRK_FLAG_GET_SAMPLE_RATE), + DEVICE_FLG(0x19f7, 0x0035, /* RODE NT-USB+ */ + QUIRK_FLAG_GET_SAMPLE_RATE),
/* Vendor matches */ VENDOR_FLG(0x045e, /* MS Lifecam */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: JackBB Wu wojackbb@gmail.com
commit 129690fb229a20b6e563a77a2c85266acecf20bc upstream.
Add support for Dell DW5826e with USB-id 0x413c:0x8217 & 0x413c:0x8218.
It is 0x413c:0x8217 T: Bus=02 Lev=01 Prnt=01 Port=05 Cnt=01 Dev#= 4 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=413c ProdID=8217 Rev= 5.04 S: Manufacturer=DELL S: Product=COMPAL Electronics EXM-G1A S: SerialNumber=359302940050401 C:* #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=qcserial E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=qcserial E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=qcserial E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=87(I) Atr=03(Int.) MxPS= 64 Ivl=32ms I:* If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan E: Ad=88(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
It is 0x413c:0x8218 T: Bus=02 Lev=01 Prnt=01 Port=05 Cnt=01 Dev#= 3 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=413c ProdID=8218 Rev= 0.00 S: Manufacturer=DELL S: Product=COMPAL Electronics EXM-G1A S: SerialNumber=359302940050401 C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr= 2mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=qcserial E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
Signed-off-by: JackBB Wu wojackbb@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/qcserial.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/usb/serial/qcserial.c +++ b/drivers/usb/serial/qcserial.c @@ -184,6 +184,8 @@ static const struct usb_device_id id_tab {DEVICE_SWI(0x413c, 0x81d0)}, /* Dell Wireless 5819 */ {DEVICE_SWI(0x413c, 0x81d1)}, /* Dell Wireless 5818 */ {DEVICE_SWI(0x413c, 0x81d2)}, /* Dell Wireless 5818 */ + {DEVICE_SWI(0x413c, 0x8217)}, /* Dell Wireless DW5826e */ + {DEVICE_SWI(0x413c, 0x8218)}, /* Dell Wireless DW5826e QDL */
/* Huawei devices */ {DEVICE_HWI(0x03f0, 0x581d)}, /* HP lt4112 LTE/HSPA+ Gobi 4G Modem (Huawei me906e) */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Puliang Lu puliang.lu@fibocom.com
commit b4a1f4eaf1d798066affc6ad040f76eb1a16e1c9 upstream.
Update the USB serial option driver support for the Fibocom FM101-GL LTE modules as there are actually several different variants. - VID:PID 2cb7:01a3, FM101-GL are laptop M.2 cards (with MBIM interfaces for /Linux/Chrome OS)
0x01a3:mbim,gnss
Here are the outputs of usb-devices:
T: Bus=04 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 3 Spd=5000 MxCh= 0 D: Ver= 3.20 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs= 1 P: Vendor=2cb7 ProdID=01a3 Rev=05.04 S: Manufacturer=Fibocom Wireless Inc. S: Product=Fibocom FM101-GL Module S: SerialNumber=5ccd5cd4 C: #Ifs= 3 Cfg#= 1 Atr=a0 MxPwr=896mA I: If#= 0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim E: Ad=81(I) Atr=03(Int.) MxPS= 64 Ivl=32ms I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim E: Ad=0f(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=8e(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=40 Driver=option E: Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=83(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
Signed-off-by: Puliang Lu puliang.lu@fibocom.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/option.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -2269,6 +2269,7 @@ static const struct usb_device_id option { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x0111, 0xff) }, /* Fibocom FM160 (MBIM mode) */ { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x01a0, 0xff) }, /* Fibocom NL668-AM/NL652-EU (laptop MBIM) */ { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x01a2, 0xff) }, /* Fibocom FM101-GL (laptop MBIM) */ + { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x01a3, 0xff) }, /* Fibocom FM101-GL (laptop MBIM) */ { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x01a4, 0xff), /* Fibocom FM101-GL (laptop MBIM) */ .driver_info = RSVD(4) }, { USB_DEVICE_INTERFACE_CLASS(0x2df3, 0x9d03, 0xff) }, /* LongSung M5710 */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leonard Dallmayr leonard.dallmayr@mailbox.org
commit 12b17b4eb82a41977eb848048137b5908d52845c upstream.
The device IMST USB-Stick for Smart Meter is a rebranded IMST iM871A-USB Wireless M-Bus USB-adapter. It is used to read wireless water, gas and electricity meters.
Signed-off-by: Leonard Dallmayr leonard.dallmayr@mailbox.org Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/cp210x.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -146,6 +146,7 @@ static const struct usb_device_id id_tab { USB_DEVICE(0x10C4, 0x85F8) }, /* Virtenio Preon32 */ { USB_DEVICE(0x10C4, 0x8664) }, /* AC-Services CAN-IF */ { USB_DEVICE(0x10C4, 0x8665) }, /* AC-Services OBD-IF */ + { USB_DEVICE(0x10C4, 0x87ED) }, /* IMST USB-Stick for Smart Meter */ { USB_DEVICE(0x10C4, 0x8856) }, /* CEL EM357 ZigBee USB Stick - LR */ { USB_DEVICE(0x10C4, 0x8857) }, /* CEL EM357 ZigBee USB Stick */ { USB_DEVICE(0x10C4, 0x88A4) }, /* MMB Networks ZigBee USB Device */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Badhri Jagan Sridharan badhri@google.com
commit b717dfbf73e842d15174699fe2c6ee4fdde8aa1f upstream.
This reverts commit 1e35f074399dece73d5df11847d4a0d7a6f49434.
Given that ERROR_RECOVERY calls into PORT_RESET for Hi-Zing the CC pins, setting CC pins to default state during PORT_RESET breaks error recovery.
4.5.2.2.2.1 ErrorRecovery State Requirements The port shall not drive VBUS or VCONN, and shall present a high-impedance to ground (above zOPEN) on its CC1 and CC2 pins.
Hi-Zing the CC pins is the inteded behavior for PORT_RESET. CC pins are set to default state after tErrorRecovery in PORT_RESET_WAIT_OFF.
4.5.2.2.2.2 Exiting From ErrorRecovery State A Sink shall transition to Unattached.SNK after tErrorRecovery. A Source shall transition to Unattached.SRC after tErrorRecovery.
Cc: stable@vger.kernel.org Cc: Frank Wang frank.wang@rock-chips.com Fixes: 1e35f074399d ("usb: typec: tcpm: fix cc role at port reset") Signed-off-by: Badhri Jagan Sridharan badhri@google.com Reviewed-by: Guenter Roeck linux@roeck-us.net Link: https://lore.kernel.org/r/20240117114742.2587779-1-badhri@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/tcpm/tcpm.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/usb/typec/tcpm/tcpm.c +++ b/drivers/usb/typec/tcpm/tcpm.c @@ -4862,8 +4862,7 @@ static void run_state_machine(struct tcp break; case PORT_RESET: tcpm_reset_port(port); - tcpm_set_cc(port, tcpm_default_state(port) == SNK_UNATTACHED ? - TYPEC_CC_RD : tcpm_rp_cc(port)); + tcpm_set_cc(port, TYPEC_CC_OPEN); tcpm_set_state(port, PORT_RESET_WAIT_OFF, PD_T_ERROR_RECOVERY); break;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qiuxu Zhuo qiuxu.zhuo@intel.com
commit 8eed4e00a370b37b4e5985ed983dccedd555ea9d upstream.
During memory error injection test on kernels >= v6.4, the kernel panics like below. However, this issue couldn't be reproduced on kernels <= v6.3.
mce: [Hardware Error]: CPU 296: Machine Check Exception: f Bank 1: bd80000000100134 mce: [Hardware Error]: RIP 10:<ffffffff821b9776> {__get_user_nocheck_4+0x6/0x20} mce: [Hardware Error]: TSC 411a93533ed ADDR 346a8730040 MISC 86 mce: [Hardware Error]: PROCESSOR 0:a06d0 TIME 1706000767 SOCKET 1 APIC 211 microcode 80001490 mce: [Hardware Error]: Run the above through 'mcelog --ascii' mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel Kernel panic - not syncing: Fatal local machine check
The MCA code can recover from an in-kernel #MC if the fixup type is EX_TYPE_UACCESS, explicitly indicating that the kernel is attempting to access userspace memory. However, if the fixup type is EX_TYPE_DEFAULT the only thing that is raised for an in-kernel #MC is a panic.
ex_handler_uaccess() would warn if users gave a non-canonical addresses (with bit 63 clear) to {get, put}_user(), which was unexpected.
Therefore, commit
b19b74bc99b1 ("x86/mm: Rework address range check in get_user() and put_user()")
replaced _ASM_EXTABLE_UA() with _ASM_EXTABLE() for {get, put}_user() fixups. However, the new fixup type EX_TYPE_DEFAULT results in a panic.
Commit
6014bc27561f ("x86-64: make access_ok() independent of LAM")
added the check gp_fault_address_ok() right before the WARN_ONCE() in ex_handler_uaccess() to not warn about non-canonical user addresses due to LAM.
With that in place, revert back to _ASM_EXTABLE_UA() for {get,put}_user() exception fixups in order to be able to handle in-kernel MCEs correctly again.
[ bp: Massage commit message. ]
Fixes: b19b74bc99b1 ("x86/mm: Rework address range check in get_user() and put_user()") Signed-off-by: Qiuxu Zhuo qiuxu.zhuo@intel.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240129063842.61584-1-qiuxu.zhuo@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/lib/getuser.S | 24 ++++++++++++------------ arch/x86/lib/putuser.S | 20 ++++++++++---------- 2 files changed, 22 insertions(+), 22 deletions(-)
--- a/arch/x86/lib/getuser.S +++ b/arch/x86/lib/getuser.S @@ -163,23 +163,23 @@ SYM_CODE_END(__get_user_8_handle_excepti #endif
/* get_user */ - _ASM_EXTABLE(1b, __get_user_handle_exception) - _ASM_EXTABLE(2b, __get_user_handle_exception) - _ASM_EXTABLE(3b, __get_user_handle_exception) + _ASM_EXTABLE_UA(1b, __get_user_handle_exception) + _ASM_EXTABLE_UA(2b, __get_user_handle_exception) + _ASM_EXTABLE_UA(3b, __get_user_handle_exception) #ifdef CONFIG_X86_64 - _ASM_EXTABLE(4b, __get_user_handle_exception) + _ASM_EXTABLE_UA(4b, __get_user_handle_exception) #else - _ASM_EXTABLE(4b, __get_user_8_handle_exception) - _ASM_EXTABLE(5b, __get_user_8_handle_exception) + _ASM_EXTABLE_UA(4b, __get_user_8_handle_exception) + _ASM_EXTABLE_UA(5b, __get_user_8_handle_exception) #endif
/* __get_user */ - _ASM_EXTABLE(6b, __get_user_handle_exception) - _ASM_EXTABLE(7b, __get_user_handle_exception) - _ASM_EXTABLE(8b, __get_user_handle_exception) + _ASM_EXTABLE_UA(6b, __get_user_handle_exception) + _ASM_EXTABLE_UA(7b, __get_user_handle_exception) + _ASM_EXTABLE_UA(8b, __get_user_handle_exception) #ifdef CONFIG_X86_64 - _ASM_EXTABLE(9b, __get_user_handle_exception) + _ASM_EXTABLE_UA(9b, __get_user_handle_exception) #else - _ASM_EXTABLE(9b, __get_user_8_handle_exception) - _ASM_EXTABLE(10b, __get_user_8_handle_exception) + _ASM_EXTABLE_UA(9b, __get_user_8_handle_exception) + _ASM_EXTABLE_UA(10b, __get_user_8_handle_exception) #endif --- a/arch/x86/lib/putuser.S +++ b/arch/x86/lib/putuser.S @@ -134,15 +134,15 @@ SYM_CODE_START_LOCAL(__put_user_handle_e RET SYM_CODE_END(__put_user_handle_exception)
- _ASM_EXTABLE(1b, __put_user_handle_exception) - _ASM_EXTABLE(2b, __put_user_handle_exception) - _ASM_EXTABLE(3b, __put_user_handle_exception) - _ASM_EXTABLE(4b, __put_user_handle_exception) - _ASM_EXTABLE(5b, __put_user_handle_exception) - _ASM_EXTABLE(6b, __put_user_handle_exception) - _ASM_EXTABLE(7b, __put_user_handle_exception) - _ASM_EXTABLE(9b, __put_user_handle_exception) + _ASM_EXTABLE_UA(1b, __put_user_handle_exception) + _ASM_EXTABLE_UA(2b, __put_user_handle_exception) + _ASM_EXTABLE_UA(3b, __put_user_handle_exception) + _ASM_EXTABLE_UA(4b, __put_user_handle_exception) + _ASM_EXTABLE_UA(5b, __put_user_handle_exception) + _ASM_EXTABLE_UA(6b, __put_user_handle_exception) + _ASM_EXTABLE_UA(7b, __put_user_handle_exception) + _ASM_EXTABLE_UA(9b, __put_user_handle_exception) #ifdef CONFIG_X86_32 - _ASM_EXTABLE(8b, __put_user_handle_exception) - _ASM_EXTABLE(10b, __put_user_handle_exception) + _ASM_EXTABLE_UA(8b, __put_user_handle_exception) + _ASM_EXTABLE_UA(10b, __put_user_handle_exception) #endif
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Prashanth K quic_prashk@quicinc.com
commit 817349b6d26aadd8b38283a05ce0bab106b4c765 upstream.
Upstream commit bac1ec551434 ("usb: xhci: Set quirk for XHCI_SG_TRB_CACHE_SIZE_QUIRK") introduced a new quirk in XHCI which fixes XHC timeout, which was seen on synopsys XHCs while using SG buffers. But the support for this quirk isn't present in the DWC3 layer.
We will encounter this XHCI timeout/hung issue if we run iperf loopback tests using RTL8156 ethernet adaptor on DWC3 targets with scatter-gather enabled. This gets resolved after enabling the XHCI_SG_TRB_CACHE_SIZE_QUIRK. This patch enables it using the xhci device property since its needed for DWC3 controller.
In Synopsys DWC3 databook, Table 9-3: xHCI Debug Capability Limitations Chained TRBs greater than TRB cache size: The debug capability driver must not create a multi-TRB TD that describes smaller than a 1K packet that spreads across 8 or more TRBs on either the IN TR or the OUT TR.
Cc: stable@vger.kernel.org #5.11 Signed-off-by: Prashanth K quic_prashk@quicinc.com Acked-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Link: https://lore.kernel.org/r/20240116055816.1169821-2-quic_prashk@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/host.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/usb/dwc3/host.c +++ b/drivers/usb/dwc3/host.c @@ -61,7 +61,7 @@ out:
int dwc3_host_init(struct dwc3 *dwc) { - struct property_entry props[4]; + struct property_entry props[5]; struct platform_device *xhci; int ret, irq; int prop_idx = 0; @@ -89,6 +89,8 @@ int dwc3_host_init(struct dwc3 *dwc)
memset(props, 0, sizeof(struct property_entry) * ARRAY_SIZE(props));
+ props[prop_idx++] = PROPERTY_ENTRY_BOOL("xhci-sg-trb-cache-size-quirk"); + if (dwc->usb3_lpm_capable) props[prop_idx++] = PROPERTY_ENTRY_BOOL("usb3-lpm-capable");
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Prashanth K quic_prashk@quicinc.com
commit 520b391e3e813c1dd142d1eebb3ccfa6d08c3995 upstream.
Upstream commit bac1ec551434 ("usb: xhci: Set quirk for XHCI_SG_TRB_CACHE_SIZE_QUIRK") introduced a new quirk in XHCI which fixes XHC timeout, which was seen on synopsys XHCs while using SG buffers. Currently this quirk can only be set using xhci private data. But there are some drivers like dwc3/host.c which adds adds quirks using software node for xhci device. Hence set this xhci quirk by iterating over device properties.
Cc: stable@vger.kernel.org # 5.11 Fixes: bac1ec551434 ("usb: xhci: Set quirk for XHCI_SG_TRB_CACHE_SIZE_QUIRK") Signed-off-by: Prashanth K quic_prashk@quicinc.com Link: https://lore.kernel.org/r/20240116055816.1169821-3-quic_prashk@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-plat.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -250,6 +250,9 @@ int xhci_plat_probe(struct platform_devi if (device_property_read_bool(tmpdev, "quirk-broken-port-ped")) xhci->quirks |= XHCI_BROKEN_PORT_PED;
+ if (device_property_read_bool(tmpdev, "xhci-sg-trb-cache-size-quirk")) + xhci->quirks |= XHCI_SG_TRB_CACHE_SIZE_QUIRK; + device_property_read_u32(tmpdev, "imod-interval-ns", &xhci->imod_interval); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathias Nyman mathias.nyman@linux.intel.com
commit 5372c65e1311a16351ef03dd096ff576e6477674 upstream.
The last TRB of a isoc TD might not trigger an event if there was an error event for a TRB mid TD. This is seen on a NEC Corporation uPD720200 USB 3.0 Host
After an error mid a multi-TRB TD the xHC should according to xhci 4.9.1 generate events for passed TRBs with IOC flag set if it proceeds to the next TD. This event is either a copy of the original error, or a "success" transfer event.
If that event is missing then the driver and xHC host get out of sync as the driver is still expecting a transfer event for that first TD, while xHC host is already sending events for the next TD in the list. This leads to "Transfer event TRB DMA ptr not part of current TD" messages.
As a solution we tag the isoc TDs that get error events mid TD. If an event doesn't match the first TD, then check if the tag is set, and event points to the next TD. In that case give back the fist TD and process the next TD normally
Make sure TD status and transferred length stay valid in both cases with and without final TD completion event.
Reported-by: Michał Pecio michal.pecio@gmail.com Closes: https://lore.kernel.org/linux-usb/20240112235205.1259f60c@foxbook/ Tested-by: Michał Pecio michal.pecio@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20240125152737.2983959-4-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-ring.c | 74 ++++++++++++++++++++++++++++++++++--------- drivers/usb/host/xhci.h | 1 2 files changed, 61 insertions(+), 14 deletions(-)
--- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2377,6 +2377,9 @@ static int process_isoc_td(struct xhci_h /* handle completion code */ switch (trb_comp_code) { case COMP_SUCCESS: + /* Don't overwrite status if TD had an error, see xHCI 4.9.1 */ + if (td->error_mid_td) + break; if (remaining) { frame->status = short_framestatus; if (xhci->quirks & XHCI_TRUST_TX_LENGTH) @@ -2402,8 +2405,9 @@ static int process_isoc_td(struct xhci_h break; case COMP_USB_TRANSACTION_ERROR: frame->status = -EPROTO; + sum_trbs_for_length = true; if (ep_trb != td->last_trb) - return 0; + td->error_mid_td = true; break; case COMP_STOPPED: sum_trbs_for_length = true; @@ -2423,6 +2427,9 @@ static int process_isoc_td(struct xhci_h break; }
+ if (td->urb_length_set) + goto finish_td; + if (sum_trbs_for_length) frame->actual_length = sum_trb_lengths(xhci, ep->ring, ep_trb) + ep_trb_len - remaining; @@ -2431,6 +2438,14 @@ static int process_isoc_td(struct xhci_h
td->urb->actual_length += frame->actual_length;
+finish_td: + /* Don't give back TD yet if we encountered an error mid TD */ + if (td->error_mid_td && ep_trb != td->last_trb) { + xhci_dbg(xhci, "Error mid isoc TD, wait for final completion event\n"); + td->urb_length_set = true; + return 0; + } + return finish_td(xhci, ep, ep_ring, td, trb_comp_code); }
@@ -2809,17 +2824,51 @@ static int handle_tx_event(struct xhci_h }
if (!ep_seg) { - if (!ep->skip || - !usb_endpoint_xfer_isoc(&td->urb->ep->desc)) { - /* Some host controllers give a spurious - * successful event after a short transfer. - * Ignore it. - */ - if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) && - ep_ring->last_td_was_short) { - ep_ring->last_td_was_short = false; - goto cleanup; + + if (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) { + skip_isoc_td(xhci, td, ep, status); + goto cleanup; + } + + /* + * Some hosts give a spurious success event after a short + * transfer. Ignore it. + */ + if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) && + ep_ring->last_td_was_short) { + ep_ring->last_td_was_short = false; + goto cleanup; + } + + /* + * xhci 4.10.2 states isoc endpoints should continue + * processing the next TD if there was an error mid TD. + * So host like NEC don't generate an event for the last + * isoc TRB even if the IOC flag is set. + * xhci 4.9.1 states that if there are errors in mult-TRB + * TDs xHC should generate an error for that TRB, and if xHC + * proceeds to the next TD it should genete an event for + * any TRB with IOC flag on the way. Other host follow this. + * So this event might be for the next TD. + */ + if (td->error_mid_td && + !list_is_last(&td->td_list, &ep_ring->td_list)) { + struct xhci_td *td_next = list_next_entry(td, td_list); + + ep_seg = trb_in_td(xhci, td_next->start_seg, td_next->first_trb, + td_next->last_trb, ep_trb_dma, false); + if (ep_seg) { + /* give back previous TD, start handling new */ + xhci_dbg(xhci, "Missing TD completion event after mid TD error\n"); + ep_ring->dequeue = td->last_trb; + ep_ring->deq_seg = td->last_trb_seg; + inc_deq(xhci, ep_ring); + xhci_td_cleanup(xhci, td, ep_ring, td->status); + td = td_next; } + } + + if (!ep_seg) { /* HC is busted, give up! */ xhci_err(xhci, "ERROR Transfer event TRB DMA ptr not " @@ -2831,9 +2880,6 @@ static int handle_tx_event(struct xhci_h ep_trb_dma, true); return -ESHUTDOWN; } - - skip_isoc_td(xhci, td, ep, status); - goto cleanup; } if (trb_comp_code == COMP_SHORT_PACKET) ep_ring->last_td_was_short = true; --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1573,6 +1573,7 @@ struct xhci_td { struct xhci_segment *bounce_seg; /* actual_length of the URB has already been set */ bool urb_length_set; + bool error_mid_td; unsigned int num_trbs; };
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Pecio michal.pecio@gmail.com
commit 7c4650ded49e5b88929ecbbb631efb8b0838e811 upstream.
xHCI 4.9 explicitly forbids assuming that the xHC has released its ownership of a multi-TRB TD when it reports an error on one of the early TRBs. Yet the driver makes such assumption and releases the TD, allowing the remaining TRBs to be freed or overwritten by new TDs.
The xHC should also report completion of the final TRB due to its IOC flag being set by us, regardless of prior errors. This event cannot be recognized if the TD has already been freed earlier, resulting in "Transfer event TRB DMA ptr not part of current TD" error message.
Fix this by reusing the logic for processing isoc Transaction Errors. This also handles hosts which fail to report the final completion.
Fix transfer length reporting on Babble errors. They may be caused by device malfunction, no guarantee that the buffer has been filled.
Signed-off-by: Michal Pecio michal.pecio@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20240125152737.2983959-5-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-ring.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2395,9 +2395,13 @@ static int process_isoc_td(struct xhci_h case COMP_BANDWIDTH_OVERRUN_ERROR: frame->status = -ECOMM; break; - case COMP_ISOCH_BUFFER_OVERRUN: case COMP_BABBLE_DETECTED_ERROR: + sum_trbs_for_length = true; + fallthrough; + case COMP_ISOCH_BUFFER_OVERRUN: frame->status = -EOVERFLOW; + if (ep_trb != td->last_trb) + td->error_mid_td = true; break; case COMP_INCOMPATIBLE_DEVICE_ERROR: case COMP_STALL_ERROR:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heikki Krogerus heikki.krogerus@linux.intel.com
commit de4b5b28c87ccae4da268a53c5df135437f5cfde upstream.
This patch adds the necessary PCI ID for Intel Arrow Lake-H devices.
Acked-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Signed-off-by: Heikki Krogerus heikki.krogerus@linux.intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115092820.1454492-1-heikki.krogerus@linux.int... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/dwc3-pci.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/usb/dwc3/dwc3-pci.c +++ b/drivers/usb/dwc3/dwc3-pci.c @@ -51,6 +51,8 @@ #define PCI_DEVICE_ID_INTEL_MTLP 0x7ec1 #define PCI_DEVICE_ID_INTEL_MTLS 0x7f6f #define PCI_DEVICE_ID_INTEL_MTL 0x7e7e +#define PCI_DEVICE_ID_INTEL_ARLH 0x7ec1 +#define PCI_DEVICE_ID_INTEL_ARLH_PCH 0x777e #define PCI_DEVICE_ID_INTEL_TGL 0x9a15 #define PCI_DEVICE_ID_AMD_MR 0x163a
@@ -421,6 +423,8 @@ static const struct pci_device_id dwc3_p { PCI_DEVICE_DATA(INTEL, MTLP, &dwc3_pci_intel_swnode) }, { PCI_DEVICE_DATA(INTEL, MTL, &dwc3_pci_intel_swnode) }, { PCI_DEVICE_DATA(INTEL, MTLS, &dwc3_pci_intel_swnode) }, + { PCI_DEVICE_DATA(INTEL, ARLH, &dwc3_pci_intel_swnode) }, + { PCI_DEVICE_DATA(INTEL, ARLH_PCH, &dwc3_pci_intel_swnode) }, { PCI_DEVICE_DATA(INTEL, TGL, &dwc3_pci_intel_swnode) },
{ PCI_DEVICE_DATA(AMD, NL_USB, &dwc3_pci_amd_swnode) },
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frederic Weisbecker frederic@kernel.org
commit dad6a09f3148257ac1773cd90934d721d68ab595 upstream.
The hrtimers migration on CPU-down hotplug process has been moved earlier, before the CPU actually goes to die. This leaves a small window of opportunity to queue an hrtimer in a blind spot, leaving it ignored.
For example a practical case has been reported with RCU waking up a SCHED_FIFO task right before the CPUHP_AP_IDLE_DEAD stage, queuing that way a sched/rt timer to the local offline CPU.
Make sure such situations never go unnoticed and warn when that happens.
Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") Reported-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Frederic Weisbecker frederic@kernel.org Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240129235646.3171983-4-boqun.feng@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/hrtimer.h | 4 +++- kernel/time/hrtimer.c | 3 +++ 2 files changed, 6 insertions(+), 1 deletion(-)
--- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -197,6 +197,7 @@ enum hrtimer_base_type { * @max_hang_time: Maximum time spent in hrtimer_interrupt * @softirq_expiry_lock: Lock which is taken while softirq based hrtimer are * expired + * @online: CPU is online from an hrtimers point of view * @timer_waiters: A hrtimer_cancel() invocation waits for the timer * callback to finish. * @expires_next: absolute time of the next event, is required for remote @@ -219,7 +220,8 @@ struct hrtimer_cpu_base { unsigned int hres_active : 1, in_hrtirq : 1, hang_detected : 1, - softirq_activated : 1; + softirq_activated : 1, + online : 1; #ifdef CONFIG_HIGH_RES_TIMERS unsigned int nr_events; unsigned short nr_retries; --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1085,6 +1085,7 @@ static int enqueue_hrtimer(struct hrtime enum hrtimer_mode mode) { debug_activate(timer, mode); + WARN_ON_ONCE(!base->cpu_base->online);
base->cpu_base->active_bases |= 1 << base->index;
@@ -2183,6 +2184,7 @@ int hrtimers_prepare_cpu(unsigned int cp cpu_base->softirq_next_timer = NULL; cpu_base->expires_next = KTIME_MAX; cpu_base->softirq_expires_next = KTIME_MAX; + cpu_base->online = 1; hrtimer_cpu_base_init_expiry_lock(cpu_base); return 0; } @@ -2250,6 +2252,7 @@ int hrtimers_cpu_dying(unsigned int dyin smp_call_function_single(ncpu, retrigger_next_event, NULL, 0);
raw_spin_unlock(&new_base->lock); + old_base->online = 0; raw_spin_unlock(&old_base->lock);
return 0;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Werner Sembach wse@tuxedocomputers.com
commit a60e6c3918d20848906ffcdfcf72ca6a8cfbcf2e upstream.
When closing the laptop lid with an external screen connected, the mouse pointer has a constant movement to the lower right corner. Opening the lid again stops this movement, but after that the touchpad does no longer register clicks.
The touchpad is connected both via i2c-hid and PS/2, the predecessor of this device (NS70MU) has the same layout in this regard and also strange behaviour caused by the psmouse and the i2c-hid driver fighting over touchpad control. This fix is reusing the same workaround by just disabling the PS/2 aux port, that is only used by the touchpad, to give the i2c-hid driver the lone control over the touchpad.
v2: Rebased on current master
Signed-off-by: Werner Sembach wse@tuxedocomputers.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231205163602.16106-1-wse@tuxedocomputers.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/input/serio/i8042-acpipnpio.h | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/input/serio/i8042-acpipnpio.h +++ b/drivers/input/serio/i8042-acpipnpio.h @@ -1210,6 +1210,12 @@ static const struct dmi_system_id i8042_ }, { .matches = { + DMI_MATCH(DMI_BOARD_NAME, "NS5x_7xPU"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOAUX) + }, + { + .matches = { DMI_MATCH(DMI_BOARD_NAME, "NJ50_70CU"), }, .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS |
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit 683cd8259a9b883a51973511f860976db2550a6e upstream.
After commit 936e4d49ecbc ("Input: atkbd - skip ATKBD_CMD_GETID in translated mode") the keyboard on Dell XPS 13 9350 / 9360 / 9370 models has stopped working after a suspend/resume.
The problem appears to be that atkbd_probe() fails when called from atkbd_reconnect() on resume, which on systems where ATKBD_CMD_GETID is skipped can only happen by ATKBD_CMD_SETLEDS failing. ATKBD_CMD_SETLEDS failing because ATKBD_CMD_GETID was skipped is weird, but apparently that is what is happening.
Fix this by also skipping ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID.
Fixes: 936e4d49ecbc ("Input: atkbd - skip ATKBD_CMD_GETID in translated mode") Reported-by: Paul Menzel pmenzel@molgen.mpg.de Closes: https://lore.kernel.org/linux-input/0aa4a61f-c939-46fe-a572-08022e8931c7@mol... Closes: https://bbs.archlinux.org/viewtopic.php?pid=2146300 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218424 Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2260517 Tested-by: Paul Menzel pmenzel@molgen.mpg.de Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20240126160724.13278-2-hdegoede@redhat.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/input/keyboard/atkbd.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/drivers/input/keyboard/atkbd.c +++ b/drivers/input/keyboard/atkbd.c @@ -811,7 +811,6 @@ static int atkbd_probe(struct atkbd *atk { struct ps2dev *ps2dev = &atkbd->ps2dev; unsigned char param[2]; - bool skip_getid;
/* * Some systems, where the bit-twiddling when testing the io-lines of the @@ -825,6 +824,11 @@ static int atkbd_probe(struct atkbd *atk "keyboard reset failed on %s\n", ps2dev->serio->phys);
+ if (atkbd_skip_getid(atkbd)) { + atkbd->id = 0xab83; + return 0; + } + /* * Then we check the keyboard ID. We should get 0xab83 under normal conditions. * Some keyboards report different values, but the first byte is always 0xab or @@ -833,18 +837,17 @@ static int atkbd_probe(struct atkbd *atk */
param[0] = param[1] = 0xa5; /* initialize with invalid values */ - skip_getid = atkbd_skip_getid(atkbd); - if (skip_getid || ps2_command(ps2dev, param, ATKBD_CMD_GETID)) { + if (ps2_command(ps2dev, param, ATKBD_CMD_GETID)) {
/* - * If the get ID command was skipped or failed, we check if we can at least set + * If the get ID command failed, we check if we can at least set * the LEDs on the keyboard. This should work on every keyboard out there. * It also turns the LEDs off, which we want anyway. */ param[0] = 0; if (ps2_command(ps2dev, param, ATKBD_CMD_SETLEDS)) return -1; - atkbd->id = skip_getid ? 0xab83 : 0xabba; + atkbd->id = 0xabba; return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit c87011986fad043ce31a5e749f113540a179a73f which is commit c3ab23a10771bbe06300e5374efa809789c65455 upstream.
Link: https://lore.kernel.org/r/CAD_nV8BG0t7US=+C28kQOR==712MPfZ9m-fuKksgoZCgrEByC... Reported-by: Ted Chang tedchang2010@gmail.com Cc: Takashi Iwai tiwai@suse.de Cc: Venkata Prasad Potturu venkataprasad.potturu@amd.com Cc: Mark Brown broonie@kernel.org Cc: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/amd/acp-config.c | 15 +-------------- 1 file changed, 1 insertion(+), 14 deletions(-)
--- a/sound/soc/amd/acp-config.c +++ b/sound/soc/amd/acp-config.c @@ -3,7 +3,7 @@ // This file is provided under a dual BSD/GPLv2 license. When using or // redistributing this file, you may do so under either license. // -// Copyright(c) 2021, 2023 Advanced Micro Devices, Inc. +// Copyright(c) 2021 Advanced Micro Devices, Inc. // // Authors: Ajit Kumar Pandey AjitKumar.Pandey@amd.com // @@ -45,19 +45,6 @@ static const struct config_entry config_ }, }, {} - }, - }, - { - .flags = FLAG_AMD_LEGACY, - .device = ACP_PCI_DEV_ID, - .dmi_table = (const struct dmi_system_id []) { - { - .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Valve"), - DMI_MATCH(DMI_PRODUCT_NAME, "Jupiter"), - }, - }, - {} }, }, {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aurelien Jarno aurelien@aurel32.net
commit 31e97d7c9ae3de072d7b424b2cf706a03ec10720 upstream.
This patch replaces max(a, min(b, c)) by clamp(b, a, c) in the solo6x10 driver. This improves the readability and more importantly, for the solo6x10-p2m.c file, this reduces on my system (x86-64, gcc 13):
- the preprocessed size from 121 MiB to 4.5 MiB;
- the build CPU time from 46.8 s to 1.6 s;
- the build memory from 2786 MiB to 98MiB.
In fine, this allows this relatively simple C file to be built on a 32-bit system.
Reported-by: Jiri Slaby jirislaby@gmail.com Closes: https://lore.kernel.org/lkml/18c6df0d-45ed-450c-9eda-95160a2bbb8e@gmail.com/ Cc: stable@vger.kernel.org # v6.7+ Suggested-by: David Laight David.Laight@ACULAB.COM Signed-off-by: Aurelien Jarno aurelien@aurel32.net Reviewed-by: David Laight David.Laight@ACULAB.COM Reviewed-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Cc: regressions@leemhuis.info Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/pci/solo6x10/solo6x10-offsets.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/media/pci/solo6x10/solo6x10-offsets.h +++ b/drivers/media/pci/solo6x10/solo6x10-offsets.h @@ -57,16 +57,16 @@ #define SOLO_MP4E_EXT_ADDR(__solo) \ (SOLO_EREF_EXT_ADDR(__solo) + SOLO_EREF_EXT_AREA(__solo)) #define SOLO_MP4E_EXT_SIZE(__solo) \ - max((__solo->nr_chans * 0x00080000), \ - min(((__solo->sdram_size - SOLO_MP4E_EXT_ADDR(__solo)) - \ - __SOLO_JPEG_MIN_SIZE(__solo)), 0x00ff0000)) + clamp(__solo->sdram_size - SOLO_MP4E_EXT_ADDR(__solo) - \ + __SOLO_JPEG_MIN_SIZE(__solo), \ + __solo->nr_chans * 0x00080000, 0x00ff0000)
#define __SOLO_JPEG_MIN_SIZE(__solo) (__solo->nr_chans * 0x00080000) #define SOLO_JPEG_EXT_ADDR(__solo) \ (SOLO_MP4E_EXT_ADDR(__solo) + SOLO_MP4E_EXT_SIZE(__solo)) #define SOLO_JPEG_EXT_SIZE(__solo) \ - max(__SOLO_JPEG_MIN_SIZE(__solo), \ - min((__solo->sdram_size - SOLO_JPEG_EXT_ADDR(__solo)), 0x00ff0000)) + clamp(__solo->sdram_size - SOLO_JPEG_EXT_ADDR(__solo), \ + __SOLO_JPEG_MIN_SIZE(__solo), 0x00ff0000)
#define SOLO_SDRAM_END(__solo) \ (SOLO_JPEG_EXT_ADDR(__solo) + SOLO_JPEG_EXT_SIZE(__solo))
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
commit 72bd80252feeb3bef8724230ee15d9f7ab541c6e upstream.
If we use IORING_OP_RECV with provided buffers and pass in '0' as the length of the request, the length is retrieved from the selected buffer. If MSG_WAITALL is also set and we get a short receive, then we may hit the retry path which decrements sr->len and increments the buffer for a retry. However, the length is still zero at this point, which means that sr->len now becomes huge and import_ubuf() will cap it to MAX_RW_COUNT and subsequently return -EFAULT for the range as a whole.
Fix this by always assigning sr->len once the buffer has been selected.
Cc: stable@vger.kernel.org Fixes: 7ba89d2af17a ("io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly") Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/net.c | 1 + 1 file changed, 1 insertion(+)
--- a/io_uring/net.c +++ b/io_uring/net.c @@ -902,6 +902,7 @@ retry_multishot: if (!buf) return -ENOBUFS; sr->buf = buf; + sr->len = len; }
ret = import_ubuf(ITER_DEST, sr->buf, len, &msg.msg_iter);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
Commit e84b01a880f635e3084a361afba41f95ff500d12 upstream.
In preparation for calling __io_poll_execute() higher up, move the functions to avoid forward declarations.
No functional changes in this patch.
Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/poll.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-)
--- a/io_uring/poll.c +++ b/io_uring/poll.c @@ -228,6 +228,21 @@ enum { IOU_POLL_REISSUE = 3, };
+static void __io_poll_execute(struct io_kiocb *req, int mask) +{ + io_req_set_res(req, mask, 0); + req->io_task_work.func = io_poll_task_func; + + trace_io_uring_task_add(req, mask); + io_req_task_work_add(req); +} + +static inline void io_poll_execute(struct io_kiocb *req, int res) +{ + if (io_poll_get_ownership(req)) + __io_poll_execute(req, res); +} + /* * All poll tw should go through this. Checks for poll events, manages * references, does rewait, etc. @@ -364,21 +379,6 @@ void io_poll_task_func(struct io_kiocb * } }
-static void __io_poll_execute(struct io_kiocb *req, int mask) -{ - io_req_set_res(req, mask, 0); - req->io_task_work.func = io_poll_task_func; - - trace_io_uring_task_add(req, mask); - io_req_task_work_add(req); -} - -static inline void io_poll_execute(struct io_kiocb *req, int res) -{ - if (io_poll_get_ownership(req)) - __io_poll_execute(req, res); -} - static void io_poll_cancel_req(struct io_kiocb *req) { io_poll_mark_cancelled(req);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
Commit 91e5d765a82fb2c9d0b7ad930d8953208081ddf1 upstream.
In preparation for putting some retry logic in there, have the done path just skip straight to the end rather than have too much nesting in here.
No functional changes in this patch.
Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/net.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-)
--- a/io_uring/net.c +++ b/io_uring/net.c @@ -645,23 +645,27 @@ static inline bool io_recv_finish(struct return true; }
- if (!mshot_finished) { - if (io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER, - *ret, cflags | IORING_CQE_F_MORE)) { - io_recv_prep_retry(req); - /* Known not-empty or unknown state, retry */ - if (cflags & IORING_CQE_F_SOCK_NONEMPTY || - msg->msg_inq == -1) - return false; - if (issue_flags & IO_URING_F_MULTISHOT) - *ret = IOU_ISSUE_SKIP_COMPLETE; - else - *ret = -EAGAIN; - return true; - } - /* Otherwise stop multishot but use the current result. */ - } + if (mshot_finished) + goto finish;
+ /* + * Fill CQE for this receive and see if we should keep trying to + * receive from this socket. + */ + if (io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER, + *ret, cflags | IORING_CQE_F_MORE)) { + io_recv_prep_retry(req); + /* Known not-empty or unknown state, retry */ + if (cflags & IORING_CQE_F_SOCK_NONEMPTY || msg->msg_inq == -1) + return false; + if (issue_flags & IO_URING_F_MULTISHOT) + *ret = IOU_ISSUE_SKIP_COMPLETE; + else + *ret = -EAGAIN; + return true; + } + /* Otherwise stop multishot but use the current result. */ +finish: io_req_set_res(req, *ret, cflags);
if (issue_flags & IO_URING_F_MULTISHOT)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
Commit 704ea888d646cb9d715662944cf389c823252ee0 upstream.
Since our poll handling is edge triggered, multishot handlers retry internally until they know that no more data is available. In preparation for limiting these retries, add an internal return code, IOU_REQUEUE, which can be used to inform the poll backend about the handler wanting to retry, but that this should happen through a normal task_work requeue rather than keep hammering on the issue side for this one request.
No functional changes in this patch, nobody is using this return code just yet.
Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/io_uring.h | 7 +++++++ io_uring/poll.c | 9 ++++++++- 2 files changed, 15 insertions(+), 1 deletion(-)
--- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -31,6 +31,13 @@ enum { IOU_ISSUE_SKIP_COMPLETE = -EIOCBQUEUED,
/* + * Requeue the task_work to restart operations on this request. The + * actual value isn't important, should just be not an otherwise + * valid error code, yet less than -MAX_ERRNO and valid internally. + */ + IOU_REQUEUE = -3072, + + /* * Intended only when both IO_URING_F_MULTISHOT is passed * to indicate to the poll runner that multishot should be * removed and the result is set on req->cqe.res. --- a/io_uring/poll.c +++ b/io_uring/poll.c @@ -226,6 +226,7 @@ enum { IOU_POLL_NO_ACTION = 1, IOU_POLL_REMOVE_POLL_USE_RES = 2, IOU_POLL_REISSUE = 3, + IOU_POLL_REQUEUE = 4, };
static void __io_poll_execute(struct io_kiocb *req, int mask) @@ -324,6 +325,8 @@ static int io_poll_check_events(struct i int ret = io_poll_issue(req, ts); if (ret == IOU_STOP_MULTISHOT) return IOU_POLL_REMOVE_POLL_USE_RES; + else if (ret == IOU_REQUEUE) + return IOU_POLL_REQUEUE; if (ret < 0) return ret; } @@ -346,8 +349,12 @@ void io_poll_task_func(struct io_kiocb * int ret;
ret = io_poll_check_events(req, ts); - if (ret == IOU_POLL_NO_ACTION) + if (ret == IOU_POLL_NO_ACTION) { return; + } else if (ret == IOU_POLL_REQUEUE) { + __io_poll_execute(req, 0); + return; + } io_poll_remove_entries(req); io_poll_tw_hash_eject(req, ts);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
Commit 76b367a2d83163cf19173d5cb0b562acbabc8eac upstream.
If we have multiple clients and some/all are flooding the receives to such an extent that we can retry a LOT handling multishot receives, then we can be starving some clients and hence serving traffic in an imbalanced fashion.
Limit multishot retry attempts to some arbitrary value, whose only purpose serves to ensure that we don't keep serving a single connection for way too long. We default to 32 retries, which should be more than enough to provide fairness, yet not so small that we'll spend too much time requeuing rather than handling traffic.
Cc: stable@vger.kernel.org Depends-on: 704ea888d646 ("io_uring/poll: add requeue return code from poll multishot handling") Depends-on: 1e5d765a82f ("io_uring/net: un-indent mshot retry path in io_recv_finish()") Depends-on: e84b01a880f6 ("io_uring/poll: move poll execution helpers higher up") Fixes: b3fdea6ecb55 ("io_uring: multishot recv") Fixes: 9bb66906f23e ("io_uring: support multishot in recvmsg") Link: https://github.com/axboe/liburing/issues/1043 Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/net.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-)
--- a/io_uring/net.c +++ b/io_uring/net.c @@ -60,6 +60,7 @@ struct io_sr_msg { unsigned len; unsigned done_io; unsigned msg_flags; + unsigned nr_multishot_loops; u16 flags; /* initialised and used only by !msg send variants */ u16 addr_len; @@ -70,6 +71,13 @@ struct io_sr_msg { struct io_kiocb *notif; };
+/* + * Number of times we'll try and do receives if there's more data. If we + * exceed this limit, then add us to the back of the queue and retry from + * there. This helps fairness between flooding clients. + */ +#define MULTISHOT_MAX_RETRY 32 + static inline bool io_check_multishot(struct io_kiocb *req, unsigned int issue_flags) { @@ -611,6 +619,7 @@ int io_recvmsg_prep(struct io_kiocb *req sr->msg_flags |= MSG_CMSG_COMPAT; #endif sr->done_io = 0; + sr->nr_multishot_loops = 0; return 0; }
@@ -654,12 +663,20 @@ static inline bool io_recv_finish(struct */ if (io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER, *ret, cflags | IORING_CQE_F_MORE)) { + struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); + int mshot_retry_ret = IOU_ISSUE_SKIP_COMPLETE; + io_recv_prep_retry(req); /* Known not-empty or unknown state, retry */ - if (cflags & IORING_CQE_F_SOCK_NONEMPTY || msg->msg_inq == -1) - return false; + if (cflags & IORING_CQE_F_SOCK_NONEMPTY || msg->msg_inq == -1) { + if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY) + return false; + /* mshot retries exceeded, force a requeue */ + sr->nr_multishot_loops = 0; + mshot_retry_ret = IOU_REQUEUE; + } if (issue_flags & IO_URING_F_MULTISHOT) - *ret = IOU_ISSUE_SKIP_COMPLETE; + *ret = mshot_retry_ret; else *ret = -EAGAIN; return true;
Hello,
On Tue, 13 Feb 2024 18:20:09 +0100 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.17 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 15 Feb 2024 17:18:29 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr [2] bea54b0cb986 ("Linux 6.6.17-rc1")
Thanks, SJ
[...]
---
ok 1 selftests: damon: debugfs_attrs.sh ok 2 selftests: damon: debugfs_schemes.sh ok 3 selftests: damon: debugfs_target_ids.sh ok 4 selftests: damon: debugfs_empty_targets.sh ok 5 selftests: damon: debugfs_huge_count_read_write.sh ok 6 selftests: damon: debugfs_duplicate_context_creation.sh ok 7 selftests: damon: debugfs_rm_non_contexts.sh ok 8 selftests: damon: sysfs.sh ok 9 selftests: damon: sysfs_update_removed_scheme_dir.sh ok 10 selftests: damon: reclaim.sh ok 11 selftests: damon: lru_sort.sh ok 1 selftests: damon-tests: kunit.sh ok 2 selftests: damon-tests: huge_count_read_write.sh ok 3 selftests: damon-tests: buffer_overflow.sh ok 4 selftests: damon-tests: rm_contexts.sh ok 5 selftests: damon-tests: record_null_deref.sh ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh ok 8 selftests: damon-tests: damo_tests.sh ok 9 selftests: damon-tests: masim-record.sh ok 10 selftests: damon-tests: build_i386.sh ok 11 selftests: damon-tests: build_arm64.sh ok 12 selftests: damon-tests: build_m68k.sh ok 13 selftests: damon-tests: build_i386_idle_flag.sh ok 14 selftests: damon-tests: build_i386_highpte.sh ok 15 selftests: damon-tests: build_nomemcg.sh [33m [92mPASS [39m
On Tue, 13 Feb 2024 18:20:09 +0100, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.17 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 15 Feb 2024 17:18:29 +0000. Anything received after that time might be too late.
Built and QEMU-booted for Rust:
Tested-by: Miguel Ojeda ojeda@kernel.org
Cheers, Miguel
On 2/13/24 09:20, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.17 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 15 Feb 2024 17:18:29 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
This is the start of the stable review cycle for the 6.6.17 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 15 Feb 2024 17:18:29 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my x86_64 and ARM64 test systems. No errors or regressions.
Tested-by: Allen Pais apais@linux.microsoft.com
Thanks.
On 2/13/24 10:20, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.17 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 15 Feb 2024 17:18:29 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Tue, Feb 13, 2024 at 06:20:09PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.17 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully compiled and installed the kernel on my computer (Acer Aspire E15, Intel Core i3 Haswell). No noticeable regressions.
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
On Tue, 13 Feb 2024 at 22:57, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.17 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 15 Feb 2024 17:18:29 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 6.6.17-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-6.6.y * git commit: bea54b0cb986e2c93aa5d99a24eb2255850384b1 * git describe: v6.6.16-122-gbea54b0cb986 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.6.y/build/v6.6.16...
## Test Regressions (compared to v6.6.16)
## Metric Regressions (compared to v6.6.16)
## Test Fixes (compared to v6.6.16)
## Metric Fixes (compared to v6.6.16)
## Test result summary total: 149653, pass: 129011, fail: 2047, skip: 18425, xfail: 170
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 147 total, 132 passed, 15 failed * arm64: 54 total, 47 passed, 7 failed * i386: 43 total, 42 passed, 1 failed * mips: 26 total, 25 passed, 1 failed * parisc: 4 total, 4 passed, 0 failed * powerpc: 36 total, 33 passed, 3 failed * riscv: 18 total, 18 passed, 0 failed * s390: 13 total, 12 passed, 1 failed * sh: 10 total, 10 passed, 0 failed * sparc: 8 total, 8 passed, 0 failed * x86_64: 48 total, 46 passed, 2 failed
## Test suites summary * boot * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-filesystems-epoll * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-user_events * kselftest-vDSO * kselftest-vm * kselftest-watchdog * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * perf * rcutorture * v4l2-compliance
-- Linaro LKFT https://lkft.linaro.org
On 13/02/2024 17:20, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.17 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 15 Feb 2024 17:18:29 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Builds are failing for Tegra ...
Test results for stable-v6.6: 10 builds: 3 pass, 7 fail 6 boots: 6 pass, 0 fail 18 tests: 18 pass, 0 fail
Linux version: 6.6.17-rc1-gbea54b0cb986 Boards tested: tegra124-jetson-tk1, tegra20-ventana, tegra30-cardhu-a04
Builds failed: aarch64+defconfig+jetson, arm+multi_v7
Same cause as reported here [0].
Jon
[0] https://lore.kernel.org/linux-tegra/83771838-c346-4a90-92c1-6ba592a620ac@nvi...
Hi Greg,
On 13/02/24 10:50 pm, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.17 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 15 Feb 2024 17:18:29 +0000. Anything received after that time might be too late.
No problems seen on x86_64 and aarch64 with our testing.
Tested-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com
Thanks, Harshit
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Hi Greg
On Wed, Feb 14, 2024 at 2:27 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.17 release. There are 121 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 15 Feb 2024 17:18:29 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
6.6.17-rc1 tested.
Build successfully completed. Boot successfully completed. No dmesg regressions. Video output normal. Sound output normal.
Lenovo ThinkPad X1 Carbon Gen10(Intel i7-1260P(x86_64) arch linux)
[ 0.000000] Linux version 6.6.17-rc1rv (takeshi@ThinkPadX1Gen10J0764) (gcc (GCC) 13.2.1 20230801, GNU ld (GNU Binutils) 2.42.0) #1 SMP PREEMPT_DYNAMIC Wed Feb 14 18:48:07 JST 2024
Thanks
Tested-by: Takeshi Ogasawara takeshi.ogasawara@futuring-girl.com
linux-stable-mirror@lists.linaro.org