This is the start of the stable review cycle for the 6.6.63 release. There are 82 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 22 Nov 2024 12:56:17 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.63-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.6.63-rc1
SeongJae Park sj@kernel.org mm/damon/core: copy nr_accesses when splitting region
SeongJae Park sj@kernel.org mm/damon/core: handle zero schemes apply interval
SeongJae Park sj@kernel.org mm/damon/core: check apply interval in damon_do_apply_schemes()
Lorenzo Stoakes lorenzo.stoakes@oracle.com mm: resolve faulty mmap_region() error path behaviour
Lorenzo Stoakes lorenzo.stoakes@oracle.com mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
Lorenzo Stoakes lorenzo.stoakes@oracle.com mm: refactor map_deny_write_exec()
Lorenzo Stoakes lorenzo.stoakes@oracle.com mm: unconditionally close VMAs on error
Lorenzo Stoakes lorenzo.stoakes@oracle.com mm: avoid unsafe VMA hook invocation when error arises on mmap hook
George Stark gnstark@salutedevices.com leds: mlxreg: Use devm_mutex_init() for mutex initialization
Eric Van Hensbergen ericvh@kernel.org fs/9p: fix uninitialized values during inode evict
Tvrtko Ursulin tvrtko.ursulin@igalia.com drm/amd/pm: Vangogh: Fix kernel memory out of bounds write
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: use _rcu variant under rcu_read_lock
Geliang Tang tanggeliang@kylinos.cn mptcp: drop lookup_by_id in lookup_addr
Geliang Tang tanggeliang@kylinos.cn mptcp: hold pm lock when deleting entry
Geliang Tang tanggeliang@kylinos.cn mptcp: update local address flags when setting it
Geliang Tang tanggeliang@kylinos.cn mptcp: add userspace_pm_lookup_addr_by_id helper
Geliang Tang geliang.tang@suse.com mptcp: define more local variables sk
Chuck Lever chuck.lever@oracle.com NFSD: Never decrement pending_async_copies on error
Chuck Lever chuck.lever@oracle.com NFSD: Initialize struct nfsd4_copy earlier
Chuck Lever chuck.lever@oracle.com NFSD: Limit the number of concurrent async COPY operations
Chuck Lever chuck.lever@oracle.com NFSD: Async COPY result needs to return a write verifier
Dai Ngo dai.ngo@oracle.com NFSD: initialize copy->cp_clp early in nfsd4_copy for use by trace point
Mauro Carvalho Chehab mchehab+huawei@kernel.org media: dvbdev: fix the logic when DVB_DYNAMIC_MINORS is not set
Jiri Olsa jolsa@kernel.org lib/buildid: Fix build ID parsing logic
Umang Jain umang.jain@ideasonboard.com staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation
Stefan Wahren wahrenst@gmx.net staging: vchiq_arm: Get the rid off struct vchiq_2835_state
SeongJae Park sj@kernel.org mm/damon/core: handle zero {aggregation,ops_update} intervals
SeongJae Park sj@kernel.org mm/damon/core: implement scheme-specific apply interval
Rodrigo Siqueira Rodrigo.Siqueira@amd.com drm/amd/display: Adjust VSDB parser for replay feature
Vijendar Mukunda Vijendar.Mukunda@amd.com drm/amd: Fix initialization mistake for NBIO 7.7.0
Dave Airlie airlied@redhat.com nouveau: fw: sync dma after setup is called.
Peng Fan peng.fan@nxp.com pmdomain: imx93-blk-ctrl: correct remove path
Francesco Dolcini francesco.dolcini@toradex.com drm/bridge: tc358768: Fix DSI command tx
Andre Przywara andre.przywara@arm.com mmc: sunxi-mmc: Fix A100 compatible description
Aurelien Jarno aurelien@aurel32.net Revert "mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K"
Huacai Chen chenhuacai@kernel.org LoongArch: Make KASAN work with 5-level page-tables
Huacai Chen chenhuacai@kernel.org LoongArch: Disable KASAN if PGDIR_SIZE is too large for cpu_vabits
Huacai Chen chenhuacai@kernel.org LoongArch: Fix early_numa_add_cpu() usage for FDT systems
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint
Dmitry Antipov dmantipov@yandex.ru ocfs2: fix UBSAN warning in ocfs2_verify_volume()
Maksym Glubokiy maxgl.kernel@gmail.com ALSA: hda/realtek: fix mute/micmute LEDs for a HP EliteBook 645 G10
Kailang Yang kailang@realtek.com ALSA: hda/realtek - Fixed Clevo platform headset Mic issue
Hajime Tazaki thehajime@gmail.com nommu: pass NULL argument to vma_iter_prealloc()
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint
Sean Christopherson seanjc@google.com KVM: VMX: Bury Intel PT virtualization (guest/host mode) behind CONFIG_BROKEN
Sean Christopherson seanjc@google.com KVM: x86: Unconditionally set irr_pending when updating APICv state
Sean Christopherson seanjc@google.com KVM: nVMX: Treat vpid01 as current if L2 is active, but with VPID disabled
Samasth Norway Ananda samasth.norway.ananda@oracle.com ima: fix buffer overrun in ima_eventdigest_init_common
Xiaoguang Wang lege.wang@jaguarmicro.com vp_vdpa: fix id_table array not null terminated error
Si-Wei Liu si-wei.liu@oracle.com vdpa/mlx5: Fix PA offset with unaligned starting iotlb map
Philipp Stanner pstanner@redhat.com vdpa: solidrun: Fix UB bug with devres
Andrew Morton akpm@linux-foundation.org mm: revert "mm: shmem: fix data-race in shmem_getattr()"
Dmitry Antipov dmantipov@yandex.ru ocfs2: uncache inode which has failed entering the group
Jinjiang Tu tujinjiang@huawei.com mm: fix NULL pointer dereference in alloc_pages_bulk_noprof
Baoquan He bhe@redhat.com x86/mm: Fix a kdump kernel failure on SME system when CONFIG_IMA_KEXEC=y
Motiejus Jakštys motiejus@jakstys.lt tools/mm: fix compile error
Harith G harith.g@alifsemi.com ARM: 9419/1: mm: Fix kernel memory mapping for xip kernels
Hangbin Liu liuhangbin@gmail.com bonding: add ns target multicast address to slave device
Meghana Malladi m-malladi@ti.com net: ti: icssg-prueth: Fix 1 PPS sync
Vitalii Mordan mordan@ispras.ru stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines
Jisheng Zhang jszhang@kernel.org net: stmmac: rename stmmac_pltfr_remove_no_dt to stmmac_pltfr_remove
Jisheng Zhang jszhang@kernel.org net: stmmac: dwmac-visconti: use devm_stmmac_probe_config_dt()
Jisheng Zhang jszhang@kernel.org net: stmmac: dwmac-intel-plat: use devm_stmmac_probe_config_dt()
Michal Luczaj mhal@rbox.co net: Make copy_safe_from_sockptr() match documentation
Nícolas F. R. A. Prado nfraprado@collabora.com net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol
Wei Fang wei.fang@nxp.com samples: pktgen: correct dev to DEV
Alexandre Ferrieux alexandre.ferrieux@gmail.com net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.
Pedro Tammela pctammela@mojatatu.com net/sched: cls_u32: replace int refcounts with proper refcounts
Kiran K kiran.k@intel.com Bluetooth: btintel: Direct exception event to bluetooth stack
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_core: Fix calling mgmt_device_connected
Leon Romanovsky leon@kernel.org Revert "RDMA/core: Fix ENODEV error for iWARP test over vlan"
Michal Luczaj mhal@rbox.co virtio/vsock: Fix accept_queue memory leak
Moshe Shemesh moshe@nvidia.com net/mlx5e: CT: Fix null-ptr-deref in add rule err flow
William Tu witu@nvidia.com net/mlx5e: clear xdp features on non-uplink representors
Dragos Tatulea dtatulea@nvidia.com net/mlx5e: kTLS, Fix incorrect page refcounting
Mark Bloch mbloch@nvidia.com net/mlx5: fs, lock FTE when checking if active
Paolo Abeni pabeni@redhat.com mptcp: cope racing subflow creation in mptcp_rcv_space_adjust
Paolo Abeni pabeni@redhat.com mptcp: error out earlier on disconnect
Andy Yan andy.yan@rock-chips.com drm/rockchip: vop: Fix a dereferenced before check warning
Stefan Wahren wahrenst@gmx.net net: vertexcom: mse102x: Fix tx_bytes calculation
Eric Dumazet edumazet@google.com sctp: fix possible UAF in sctp_v6_available()
Jakub Kicinski kuba@kernel.org netlink: terminate outstanding dump on socket close
-------------
Diffstat:
 Makefile                                           |   4 +-
 arch/arm/kernel/head.S                             |   8 +-
 arch/arm/mm/mmu.c                                  |  34 +++---
 arch/arm64/include/asm/mman.h                      |  10 +-
 arch/loongarch/include/asm/kasan.h                 |   2 +-
 arch/loongarch/kernel/smp.c                        |   2 +-
 arch/loongarch/mm/kasan_init.c                     |  41 ++++++-
 arch/parisc/include/asm/mman.h                     |   5 +-
 arch/x86/kvm/lapic.c                               |  29 +++--
 arch/x86/kvm/vmx/nested.c                          |  30 ++++-
 arch/x86/kvm/vmx/vmx.c                             |   6 +-
 arch/x86/mm/ioremap.c                              |   6 +-
 drivers/bluetooth/btintel.c                        |   5 +-
 drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c             |   6 +
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  |   2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c   |   7 +-
 drivers/gpu/drm/bridge/tc358768.c                  |  21 +++-
 drivers/gpu/drm/nouveau/nvkm/falcon/fw.c           |  11 +-
 drivers/gpu/drm/rockchip/rockchip_drm_vop.c        |   8 +-
 drivers/infiniband/core/addr.c                     |   2 -
 drivers/leds/leds-mlxreg.c                         |  16 +--
 drivers/media/dvb-core/dvbdev.c                    |  15 +--
 drivers/mmc/host/dw_mmc.c                          |   4 +-
 drivers/mmc/host/sunxi-mmc.c                       |   6 +-
 drivers/net/bonding/bond_main.c                    |  16 ++-
 drivers/net/bonding/bond_options.c                 |  82 ++++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c |   2 +-
 .../ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  19 ++-
 .../net/ethernet/stmicro/stmmac/dwmac-intel-plat.c |  40 +++----
 .../net/ethernet/stmicro/stmmac/dwmac-mediatek.c   |   4 +-
 .../net/ethernet/stmicro/stmmac/dwmac-visconti.c   |  18 +--
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  25 +---
 .../net/ethernet/stmicro/stmmac/stmmac_platform.h  |   1 -
 drivers/net/ethernet/ti/icssg/icssg_prueth.c       |  13 ++-
 drivers/net/ethernet/ti/icssg/icssg_prueth.h       |  12 ++
 drivers/net/ethernet/vertexcom/mse102x.c           |   4 +-
 drivers/pmdomain/imx/imx93-blk-ctrl.c              |   4 +-
 .../vc04_services/interface/vchiq_arm/vchiq_arm.c  |  25 +---
 drivers/vdpa/mlx5/core/mr.c                        |   8 +-
 drivers/vdpa/solidrun/snet_main.c                  |  14 ++-
 drivers/vdpa/virtio_pci/vp_vdpa.c                  |  10 +-
 fs/9p/vfs_inode.c                                  |  17 +--
 fs/nfsd/netns.h                                    |   1 +
 fs/nfsd/nfs4proc.c                                 |  36 +++---
 fs/nfsd/nfs4state.c                                |   1 +
 fs/nfsd/xdr4.h                                     |   1 +
 fs/nilfs2/btnode.c                                 |   2 -
 fs/nilfs2/gcinode.c                                |   4 +-
 fs/nilfs2/mdt.c                                    |   1 -
 fs/nilfs2/page.c                                   |   2 +-
 fs/ocfs2/resize.c                                  |   2 +
 fs/ocfs2/super.c                                   |  13 ++-
 include/linux/damon.h                              |  17 ++-
 include/linux/mman.h                               |  28 ++++-
 include/linux/sockptr.h                            |   4 +-
 include/net/bond_options.h                         |   2 +
 lib/buildid.c                                      |   2 +-
 mm/damon/core.c                                    |  84 ++++++++++++--
 mm/damon/dbgfs.c                                   |   3 +-
 mm/damon/lru_sort.c                                |   2 +
 mm/damon/reclaim.c                                 |   2 +
 mm/damon/sysfs-schemes.c                           |   2 +-
 mm/internal.h                                      |  45 ++++++++
 mm/mmap.c                                          | 128 ++++++++++++---
 mm/mprotect.c                                      |   2 +-
 mm/nommu.c                                         |  11 +-
 mm/page_alloc.c                                    |   3 +-
 mm/shmem.c                                         |   5 -
 net/bluetooth/hci_core.c                           |   2 -
 net/mptcp/pm_netlink.c                             |  15 ++-
 net/mptcp/pm_userspace.c                           |  77 ++++++++-----
 net/mptcp/protocol.c                               |  16 ++-
 net/netlink/af_netlink.c                           |  31 ++---
 net/netlink/af_netlink.h                           |   2 -
 net/sched/cls_u32.c                                |  54 +++++----
 net/sctp/ipv6.c                                    |  19 ++-
 net/vmw_vsock/virtio_transport_common.c            |   8 ++
 samples/pktgen/pktgen_sample01_simple.sh           |   2 +-
 security/integrity/ima/ima_template_lib.c          |  14 ++-
 sound/pci/hda/patch_realtek.c                      |   3 +
 tools/mm/page-types.c                              |   2 +-
 83 files changed, 824 insertions(+), 429 deletions(-)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 1904fb9ebf911441f90a68e96b22aa73e4410505 ]
Netlink supports iterative dumping of data. It provides the families the following ops:

 - start - (optional) kicks off the dumping process
 - dump  - actual dump helper, keeps getting called until it returns 0
 - done  - (optional) pairs with .start, can be used for cleanup

The whole process is asynchronous and the repeated calls to .dump don't actually happen in a tight loop, but rather are triggered in response to recvmsg() on the socket.
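For context, a minimal sketch of how a family typically wires up such a dump (the foo_* names are hypothetical and only for illustration; they are not part of this patch):

#include <linux/netlink.h>
#include <net/netlink.h>

static int foo_dump_start(struct netlink_callback *cb)
{
	/* optional: set up per-dump state, e.g. in cb->args[] */
	return 0;
}

static int foo_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
	/* fill skb with records; return skb->len to be called again on the
	 * next recvmsg(), or 0 once the dump is complete
	 */
	return 0;
}

static int foo_dump_done(struct netlink_callback *cb)
{
	/* optional: release whatever .start allocated */
	return 0;
}

static int foo_getfoo(struct sk_buff *skb, struct nlmsghdr *nlh)
{
	struct netlink_dump_control c = {
		.start = foo_dump_start,
		.dump  = foo_dump,
		.done  = foo_dump_done,
	};

	return netlink_dump_start(skb->sk, skb, nlh, &c);
}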
This gives the user full control over the dump, but also means that the user can close the socket without getting to the end of the dump. To make sure .start is always paired with .done we check if there is an ongoing dump before freeing the socket, and if so call .done.
The complication is that sockets can get freed from BH and .done is allowed to sleep. So we use a workqueue to defer the call, when needed.
Unfortunately this does not work correctly. What we defer is not the cleanup but rather releasing a reference on the socket. We have no guarantee that we own the last reference, if someone else holds the socket they may release it in BH and we're back to square one.
The whole dance, however, appears to be unnecessary. Only the user can interact with dumps, so we can clean up when the socket is closed. And close always happens in process context. Some async code may still access the socket after close, queue notification skbs to it, etc., but no dumps can start, end or otherwise make progress.
Delete the workqueue and flush the dump state directly from the release handler. Note that further cleanup is possible in -next, for instance we now always call .done before releasing the main module reference, so dump doesn't have to take a reference of its own.
Reported-by: syzkaller syzkaller@googlegroups.com
Fixes: ed5d7788a934 ("netlink: Do not schedule work from sk_destruct")
Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com
Reviewed-by: Eric Dumazet edumazet@google.com
Link: https://patch.msgid.link/20241106015235.2458807-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/netlink/af_netlink.c | 31 ++++++++-----------------------
 net/netlink/af_netlink.h | 2 --
 2 files changed, 8 insertions(+), 25 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 50e13207a05aa..4aa2cbe9d6fa6 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -393,15 +393,6 @@ static void netlink_skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
static void netlink_sock_destruct(struct sock *sk) { - struct netlink_sock *nlk = nlk_sk(sk); - - if (nlk->cb_running) { - if (nlk->cb.done) - nlk->cb.done(&nlk->cb); - module_put(nlk->cb.module); - kfree_skb(nlk->cb.skb); - } - skb_queue_purge(&sk->sk_receive_queue);
if (!sock_flag(sk, SOCK_DEAD)) { @@ -414,14 +405,6 @@ static void netlink_sock_destruct(struct sock *sk) WARN_ON(nlk_sk(sk)->groups); }
-static void netlink_sock_destruct_work(struct work_struct *work) -{ - struct netlink_sock *nlk = container_of(work, struct netlink_sock, - work); - - sk_free(&nlk->sk); -} - /* This lock without WQ_FLAG_EXCLUSIVE is good on UP and it is _very_ bad on * SMP. Look, when several writers sleep and reader wakes them up, all but one * immediately hit write lock and grab all the cpus. Exclusive sleep solves @@ -735,12 +718,6 @@ static void deferred_put_nlk_sk(struct rcu_head *head) if (!refcount_dec_and_test(&sk->sk_refcnt)) return;
- if (nlk->cb_running && nlk->cb.done) { - INIT_WORK(&nlk->work, netlink_sock_destruct_work); - schedule_work(&nlk->work); - return; - } - sk_free(sk); }
@@ -792,6 +769,14 @@ static int netlink_release(struct socket *sock) NETLINK_URELEASE, &n); }
+ /* Terminate any outstanding dump */ + if (nlk->cb_running) { + if (nlk->cb.done) + nlk->cb.done(&nlk->cb); + module_put(nlk->cb.module); + kfree_skb(nlk->cb.skb); + } + module_put(nlk->module);
if (netlink_is_kernel(sk)) { diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h index 9751e29d4bbb9..b1a17c0d97a10 100644 --- a/net/netlink/af_netlink.h +++ b/net/netlink/af_netlink.h @@ -4,7 +4,6 @@
#include <linux/rhashtable.h> #include <linux/atomic.h> -#include <linux/workqueue.h> #include <net/sock.h>
/* flags */ @@ -51,7 +50,6 @@ struct netlink_sock {
struct rhash_head node; struct rcu_head rcu; - struct work_struct work; };
static inline struct netlink_sock *nlk_sk(struct sock *sk)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit eb72e7fcc83987d5d5595b43222f23b295d5de7f ]
A lockdep report [1] with CONFIG_PROVE_RCU_LIST=y hints that sctp_v6_available() is calling dev_get_by_index_rcu() and ipv6_chk_addr() without holding rcu.
[1]
=============================
WARNING: suspicious RCU usage
6.12.0-rc5-virtme #1216 Tainted: G W
-----------------------------
net/core/dev.c:876 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by sctp_hello/31495:
 #0: ffff9f1ebbdb7418 (sk_lock-AF_INET6){+.+.}-{0:0}, at: sctp_bind (./arch/x86/include/asm/jump_label.h:27 net/sctp/socket.c:315) sctp
stack backtrace: CPU: 7 UID: 0 PID: 31495 Comm: sctp_hello Tainted: G W 6.12.0-rc5-virtme #1216 Tainted: [W]=WARN Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:123) lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) dev_get_by_index_rcu (net/core/dev.c:876 (discriminator 7)) sctp_v6_available (net/sctp/ipv6.c:701) sctp sctp_do_bind (net/sctp/socket.c:400 (discriminator 1)) sctp sctp_bind (net/sctp/socket.c:320) sctp inet6_bind_sk (net/ipv6/af_inet6.c:465) ? security_socket_bind (security/security.c:4581 (discriminator 1)) __sys_bind (net/socket.c:1848 net/socket.c:1869) ? do_user_addr_fault (./include/linux/rcupdate.h:347 ./include/linux/rcupdate.h:880 ./include/linux/mm.h:729 arch/x86/mm/fault.c:1340) ? do_user_addr_fault (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:98 (discriminator 13) ./include/linux/rcupdate.h:882 (discriminator 13) ./include/linux/mm.h:729 (discriminator 13) arch/x86/mm/fault.c:1340 (discriminator 13)) __x64_sys_bind (net/socket.c:1877 (discriminator 1) net/socket.c:1875 (discriminator 1) net/socket.c:1875 (discriminator 1)) do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) RIP: 0033:0x7f59b934a1e7 Code: 44 00 00 48 8b 15 39 8c 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 09 8c 0c 00 f7 d8 64 89 01 48 All code ======== 0: 44 00 00 add %r8b,(%rax) 3: 48 8b 15 39 8c 0c 00 mov 0xc8c39(%rip),%rdx # 0xc8c43 a: f7 d8 neg %eax c: 64 89 02 mov %eax,%fs:(%rdx) f: b8 ff ff ff ff mov $0xffffffff,%eax 14: eb bd jmp 0xffffffffffffffd3 16: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) 1d: 00 00 00 20: 0f 1f 00 nopl (%rax) 23: b8 31 00 00 00 mov $0x31,%eax 28: 0f 05 syscall 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction 30: 73 01 jae 0x33 32: c3 ret 33: 48 8b 0d 09 8c 0c 00 mov 0xc8c09(%rip),%rcx # 0xc8c43 3a: f7 d8 neg %eax 3c: 64 89 01 mov %eax,%fs:(%rcx) 3f: 48 rex.W
Code starting with the faulting instruction =========================================== 0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax 6: 73 01 jae 0x9 8: c3 ret 9: 48 8b 0d 09 8c 0c 00 mov 0xc8c09(%rip),%rcx # 0xc8c19 10: f7 d8 neg %eax 12: 64 89 01 mov %eax,%fs:(%rcx) 15: 48 rex.W RSP: 002b:00007ffe2d0ad398 EFLAGS: 00000202 ORIG_RAX: 0000000000000031 RAX: ffffffffffffffda RBX: 00007ffe2d0ad3d0 RCX: 00007f59b934a1e7 RDX: 000000000000001c RSI: 00007ffe2d0ad3d0 RDI: 0000000000000005 RBP: 0000000000000005 R08: 1999999999999999 R09: 0000000000000000 R10: 00007f59b9253298 R11: 0000000000000202 R12: 00007ffe2d0ada61 R13: 0000000000000000 R14: 0000562926516dd8 R15: 00007f59b9479000 </TASK>
Fixes: 6fe1e52490a9 ("sctp: check ipv6 addr with sk_bound_dev if set")
Signed-off-by: Eric Dumazet edumazet@google.com
Cc: Marcelo Ricardo Leitner marcelo.leitner@gmail.com
Acked-by: Xin Long lucien.xin@gmail.com
Link: https://patch.msgid.link/20241107192021.2579789-1-edumazet@google.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/sctp/ipv6.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index 43f2731bf590e..08acda9ecdf56 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -684,7 +684,7 @@ static int sctp_v6_available(union sctp_addr *addr, struct sctp_sock *sp) struct sock *sk = &sp->inet.sk; struct net *net = sock_net(sk); struct net_device *dev = NULL; - int type; + int type, res, bound_dev_if;
type = ipv6_addr_type(in6); if (IPV6_ADDR_ANY == type) @@ -698,14 +698,21 @@ static int sctp_v6_available(union sctp_addr *addr, struct sctp_sock *sp) if (!(type & IPV6_ADDR_UNICAST)) return 0;
- if (sk->sk_bound_dev_if) { - dev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if); + rcu_read_lock(); + bound_dev_if = READ_ONCE(sk->sk_bound_dev_if); + if (bound_dev_if) { + res = 0; + dev = dev_get_by_index_rcu(net, bound_dev_if); if (!dev) - return 0; + goto out; }
- return ipv6_can_nonlocal_bind(net, &sp->inet) || - ipv6_chk_addr(net, in6, dev, 0); + res = ipv6_can_nonlocal_bind(net, &sp->inet) || + ipv6_chk_addr(net, in6, dev, 0); + +out: + rcu_read_unlock(); + return res; }
/* This function checks if the address is a valid address to be used for
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Wahren wahrenst@gmx.net
[ Upstream commit e68da664d379f352d41d7955712c44e0a738e4ab ]
The tx_bytes should consider the actual size of the Ethernet frames without the SPI encapsulation. But we still need to take care of Ethernet padding.
Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren wahrenst@gmx.net
Link: https://patch.msgid.link/20241108114343.6174-3-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/vertexcom/mse102x.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/vertexcom/mse102x.c b/drivers/net/ethernet/vertexcom/mse102x.c index dd766e175f7db..8f67c39f479ee 100644 --- a/drivers/net/ethernet/vertexcom/mse102x.c +++ b/drivers/net/ethernet/vertexcom/mse102x.c @@ -437,13 +437,15 @@ static void mse102x_tx_work(struct work_struct *work) mse = &mses->mse102x;
while ((txb = skb_dequeue(&mse->txq))) { + unsigned int len = max_t(unsigned int, txb->len, ETH_ZLEN); + mutex_lock(&mses->lock); ret = mse102x_tx_pkt_spi(mse, txb, work_timeout); mutex_unlock(&mses->lock); if (ret) { mse->ndev->stats.tx_dropped++; } else { - mse->ndev->stats.tx_bytes += txb->len; + mse->ndev->stats.tx_bytes += len; mse->ndev->stats.tx_packets++; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Yan andy.yan@rock-chips.com
[ Upstream commit ab1c793f457f740ab7108cc0b1340a402dbf484d ]
The 'state' can't be NULL; we should check crtc_state instead.
Fix warning: drivers/gpu/drm/rockchip/rockchip_drm_vop.c:1096 vop_plane_atomic_async_check() warn: variable dereferenced before check 'state' (see line 1077)
Fixes: 5ddb0bd4ddc3 ("drm/atomic: Pass the full state to planes async atomic check and update")
Signed-off-by: Andy Yan andy.yan@rock-chips.com
Signed-off-by: Heiko Stuebner heiko@sntech.de
Link: https://patchwork.freedesktop.org/patch/msgid/20241021072818.61621-1-andyshr...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c index ee72e8c6ad69b..a34d3fc662489 100644 --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c @@ -1076,10 +1076,10 @@ static int vop_plane_atomic_async_check(struct drm_plane *plane, if (!plane->state->fb) return -EINVAL;
- if (state) - crtc_state = drm_atomic_get_existing_crtc_state(state, - new_plane_state->crtc); - else /* Special case for asynchronous cursor updates. */ + crtc_state = drm_atomic_get_existing_crtc_state(state, new_plane_state->crtc); + + /* Special case for asynchronous cursor updates. */ + if (!crtc_state) crtc_state = plane->crtc->state;
return drm_atomic_helper_check_plane_state(plane->state, crtc_state,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 581302298524e9d77c4c44ff5156a6cd112227ae ]
Eric reported a division by zero splat in the MPTCP protocol:
Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 1 UID: 0 PID: 6094 Comm: syz-executor317 Not tainted 6.12.0-rc5-syzkaller-00291-g05b92660cdfe #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 RIP: 0010:__tcp_select_window+0x5b4/0x1310 net/ipv4/tcp_output.c:3163 Code: f6 44 01 e3 89 df e8 9b 75 09 f8 44 39 f3 0f 8d 11 ff ff ff e8 0d 74 09 f8 45 89 f4 e9 04 ff ff ff e8 00 74 09 f8 44 89 f0 99 <f7> 7c 24 14 41 29 d6 45 89 f4 e9 ec fe ff ff e8 e8 73 09 f8 48 89 RSP: 0018:ffffc900041f7930 EFLAGS: 00010293 RAX: 0000000000017e67 RBX: 0000000000017e67 RCX: ffffffff8983314b RDX: 0000000000000000 RSI: ffffffff898331b0 RDI: 0000000000000004 RBP: 00000000005d6000 R08: 0000000000000004 R09: 0000000000017e67 R10: 0000000000003e80 R11: 0000000000000000 R12: 0000000000003e80 R13: ffff888031d9b440 R14: 0000000000017e67 R15: 00000000002eb000 FS: 00007feb5d7f16c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007feb5d8adbb8 CR3: 0000000074e4c000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __tcp_cleanup_rbuf+0x3e7/0x4b0 net/ipv4/tcp.c:1493 mptcp_rcv_space_adjust net/mptcp/protocol.c:2085 [inline] mptcp_recvmsg+0x2156/0x2600 net/mptcp/protocol.c:2289 inet_recvmsg+0x469/0x6a0 net/ipv4/af_inet.c:885 sock_recvmsg_nosec net/socket.c:1051 [inline] sock_recvmsg+0x1b2/0x250 net/socket.c:1073 __sys_recvfrom+0x1a5/0x2e0 net/socket.c:2265 __do_sys_recvfrom net/socket.c:2283 [inline] __se_sys_recvfrom net/socket.c:2279 [inline] __x64_sys_recvfrom+0xe0/0x1c0 net/socket.c:2279 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7feb5d857559 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007feb5d7f1208 EFLAGS: 00000246 ORIG_RAX: 000000000000002d RAX: ffffffffffffffda RBX: 00007feb5d8e1318 RCX: 00007feb5d857559 RDX: 000000800000000e RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007feb5d8e1310 R08: 0000000000000000 R09: ffffffff81000000 R10: 0000000000000100 R11: 0000000000000246 R12: 00007feb5d8e131c R13: 00007feb5d8ae074 R14: 000000800000000e R15: 00000000fffffdef
and provided a nice reproducer.
The root cause is the current bad handling of racing disconnect. After the blamed commit below, sk_wait_data() can return (with error) with the underlying socket disconnected and a zero rcv_mss.
Catch the error and return without performing any additional operations on the current socket.
Reported-by: Eric Dumazet edumazet@google.com
Fixes: 419ce133ab92 ("tcp: allow again tcp_disconnect() when threads are waiting")
Signed-off-by: Paolo Abeni pabeni@redhat.com
Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Link: https://patch.msgid.link/8c82ecf71662ecbc47bf390f9905de70884c9f2d.1731060874...
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/mptcp/protocol.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index cd6f8d655c185..e99ef1e67e957 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2168,7 +2168,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, cmsg_flags = MPTCP_CMSG_INQ;
while (copied < len) { - int bytes_read; + int err, bytes_read;
bytes_read = __mptcp_recvmsg_mskq(msk, msg, len - copied, flags, &tss, &cmsg_flags); if (unlikely(bytes_read < 0)) { @@ -2230,9 +2230,16 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, }
pr_debug("block timeout %ld\n", timeo); - sk_wait_data(sk, &timeo, NULL); + mptcp_rcv_space_adjust(msk, copied); + err = sk_wait_data(sk, &timeo, NULL); + if (err < 0) { + err = copied ? : err; + goto out_err; + } }
+ mptcp_rcv_space_adjust(msk, copied); + out_err: if (cmsg_flags && copied >= 0) { if (cmsg_flags & MPTCP_CMSG_TS) @@ -2248,8 +2255,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, pr_debug("msk=%p rx queue empty=%d:%d copied=%d\n", msk, skb_queue_empty_lockless(&sk->sk_receive_queue), skb_queue_empty(&msk->receive_queue), copied); - if (!(flags & MSG_PEEK)) - mptcp_rcv_space_adjust(msk, copied);
release_sock(sk); return copied;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit ce7356ae35943cc6494cc692e62d51a734062b7d ]
Additional active subflows - i.e. those created by the in-kernel path manager - are added to the subflow list before starting the 3whs.
A racing recvmsg() spooling data received on an already established subflow would unconditionally call tcp_cleanup_rbuf() on all the current subflows, potentially hitting a divide by zero error on the newly created ones.
Explicitly check that the subflow is in a suitable state before invoking tcp_cleanup_rbuf().
Fixes: c76c6956566f ("mptcp: call tcp_cleanup_rbuf on subflows")
Signed-off-by: Paolo Abeni pabeni@redhat.com
Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Link: https://patch.msgid.link/02374660836e1b52afc91966b7535c8c5f7bafb0.1731060874...
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/mptcp/protocol.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index e99ef1e67e957..b8357d7c6b3a1 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2045,7 +2045,8 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied) slow = lock_sock_fast(ssk); WRITE_ONCE(ssk->sk_rcvbuf, rcvbuf); WRITE_ONCE(tcp_sk(ssk)->window_clamp, window_clamp); - tcp_cleanup_rbuf(ssk, 1); + if (tcp_can_send_ack(ssk)) + tcp_cleanup_rbuf(ssk, 1); unlock_sock_fast(ssk, slow); } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Bloch mbloch@nvidia.com
[ Upstream commit 9ca314419930f9135727e39d77e66262d5f7bef6 ]
The referenced commits introduced a two-step process for deleting FTEs:
 - Lock the FTE, delete it from hardware, set the hardware deletion function
   to NULL and unlock the FTE.
 - Lock the parent flow group, delete the software copy of the FTE, and
   remove it from the xarray.
However, this approach encounters a race condition if a rule with the same match value is added simultaneously. In this scenario, fs_core may set the hardware deletion function to NULL prematurely, causing a panic during subsequent rule deletions.
To prevent this, ensure the active flag of the FTE is checked under a lock, which will prevent the fs_core layer from attaching a new steering rule to an FTE that is in the process of deletion.
[ 438.967589] MOSHE: 2496 mlx5_del_flow_rules del_hw_func [ 438.968205] ------------[ cut here ]------------ [ 438.968654] refcount_t: decrement hit 0; leaking memory. [ 438.969249] WARNING: CPU: 0 PID: 8957 at lib/refcount.c:31 refcount_warn_saturate+0xfb/0x110 [ 438.970054] Modules linked in: act_mirred cls_flower act_gact sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core zram zsmalloc fuse [last unloaded: cls_flower] [ 438.973288] CPU: 0 UID: 0 PID: 8957 Comm: tc Not tainted 6.12.0-rc1+ #8 [ 438.973888] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 438.974874] RIP: 0010:refcount_warn_saturate+0xfb/0x110 [ 438.975363] Code: 40 66 3b 82 c6 05 16 e9 4d 01 01 e8 1f 7c a0 ff 0f 0b c3 cc cc cc cc 48 c7 c7 10 66 3b 82 c6 05 fd e8 4d 01 01 e8 05 7c a0 ff <0f> 0b c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 [ 438.976947] RSP: 0018:ffff888124a53610 EFLAGS: 00010286 [ 438.977446] RAX: 0000000000000000 RBX: ffff888119d56de0 RCX: 0000000000000000 [ 438.978090] RDX: ffff88852c828700 RSI: ffff88852c81b3c0 RDI: ffff88852c81b3c0 [ 438.978721] RBP: ffff888120fa0e88 R08: 0000000000000000 R09: ffff888124a534b0 [ 438.979353] R10: 0000000000000001 R11: 0000000000000001 R12: ffff888119d56de0 [ 438.979979] R13: ffff888120fa0ec0 R14: ffff888120fa0ee8 R15: ffff888119d56de0 [ 438.980607] FS: 00007fe6dcc0f800(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000 [ 438.983984] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 438.984544] CR2: 00000000004275e0 CR3: 0000000186982001 CR4: 0000000000372eb0 [ 438.985205] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 438.985842] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 438.986507] Call Trace: [ 438.986799] <TASK> [ 438.987070] ? __warn+0x7d/0x110 [ 438.987426] ? refcount_warn_saturate+0xfb/0x110 [ 438.987877] ? report_bug+0x17d/0x190 [ 438.988261] ? prb_read_valid+0x17/0x20 [ 438.988659] ? handle_bug+0x53/0x90 [ 438.989054] ? exc_invalid_op+0x14/0x70 [ 438.989458] ? asm_exc_invalid_op+0x16/0x20 [ 438.989883] ? refcount_warn_saturate+0xfb/0x110 [ 438.990348] mlx5_del_flow_rules+0x2f7/0x340 [mlx5_core] [ 438.990932] __mlx5_eswitch_del_rule+0x49/0x170 [mlx5_core] [ 438.991519] ? mlx5_lag_is_sriov+0x3c/0x50 [mlx5_core] [ 438.992054] ? xas_load+0x9/0xb0 [ 438.992407] mlx5e_tc_rule_unoffload+0x45/0xe0 [mlx5_core] [ 438.993037] mlx5e_tc_del_fdb_flow+0x2a6/0x2e0 [mlx5_core] [ 438.993623] mlx5e_flow_put+0x29/0x60 [mlx5_core] [ 438.994161] mlx5e_delete_flower+0x261/0x390 [mlx5_core] [ 438.994728] tc_setup_cb_destroy+0xb9/0x190 [ 438.995150] fl_hw_destroy_filter+0x94/0xc0 [cls_flower] [ 438.995650] fl_change+0x11a4/0x13c0 [cls_flower] [ 438.996105] tc_new_tfilter+0x347/0xbc0 [ 438.996503] ? ___slab_alloc+0x70/0x8c0 [ 438.996929] rtnetlink_rcv_msg+0xf9/0x3e0 [ 438.997339] ? __netlink_sendskb+0x4c/0x70 [ 438.997751] ? netlink_unicast+0x286/0x2d0 [ 438.998171] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 438.998625] netlink_rcv_skb+0x54/0x100 [ 438.999020] netlink_unicast+0x203/0x2d0 [ 438.999421] netlink_sendmsg+0x1e4/0x420 [ 438.999820] __sock_sendmsg+0xa1/0xb0 [ 439.000203] ____sys_sendmsg+0x207/0x2a0 [ 439.000600] ? 
copy_msghdr_from_user+0x6d/0xa0 [ 439.001072] ___sys_sendmsg+0x80/0xc0 [ 439.001459] ? ___sys_recvmsg+0x8b/0xc0 [ 439.001848] ? generic_update_time+0x4d/0x60 [ 439.002282] __sys_sendmsg+0x51/0x90 [ 439.002658] do_syscall_64+0x50/0x110 [ 439.003040] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: 718ce4d601db ("net/mlx5: Consolidate update FTE for all removal changes")
Fixes: cefc23554fc2 ("net/mlx5: Fix FTE cleanup")
Signed-off-by: Mark Bloch mbloch@nvidia.com
Reviewed-by: Maor Gottlieb maorg@nvidia.com
Signed-off-by: Tariq Toukan tariqt@nvidia.com
Link: https://patch.msgid.link/20241107183527.676877-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index e2f7cecce6f1a..991250f44c2ed 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -1946,13 +1946,22 @@ lookup_fte_locked(struct mlx5_flow_group *g, fte_tmp = NULL; goto out; } + + nested_down_write_ref_node(&fte_tmp->node, FS_LOCK_CHILD); + if (!fte_tmp->node.active) { + up_write_ref_node(&fte_tmp->node, false); + + if (take_write) + up_write_ref_node(&g->node, false); + else + up_read_ref_node(&g->node); + tree_put_node(&fte_tmp->node, false); - fte_tmp = NULL; - goto out; + + return NULL; }
- nested_down_write_ref_node(&fte_tmp->node, FS_LOCK_CHILD); out: if (take_write) up_write_ref_node(&g->node, false);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dragos Tatulea dtatulea@nvidia.com
[ Upstream commit dd6e972cc5890d91d6749bb48e3912721c4e4b25 ]
The kTLS tx handling code is using a mix of get_page() and page_ref_inc() APIs to increment the page reference. But on the release path (mlx5e_ktls_tx_handle_resync_dump_comp()), only put_page() is used.
This is an issue when using pages from large folios: the get_page() references are stored on the folio page while the page_ref_inc() references are stored directly in the given page. On release the folio page will be dereferenced too many times.
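As a simplified illustration of the imbalance (not code from the patch; the function and its argument are hypothetical, and the page is assumed to be a tail page of a large folio):

#include <linux/mm.h>
#include <linux/page_ref.h>

/* Illustration only: 'page' is assumed to be a tail page of a large folio. */
static void refcount_imbalance_example(struct page *page)
{
	get_page(page);      /* takes a reference on the folio (head page) */
	page_ref_inc(page);  /* bumps this tail page's own _refcount */

	put_page(page);      /* drops a folio (head page) reference */
	put_page(page);      /* drops another folio reference; the tail page's
			      * extra _refcount is never dropped and the folio
			      * ends up released one time too many
			      */
}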
This was found while doing kTLS testing with sendfile() + ZC when the served file was read from NFS on a kernel with NFS large folios support (commit 49b29a573da8 ("nfs: add support for large folios")).
Fixes: 84d1bb2b139e ("net/mlx5e: kTLS, Limit DUMP wqe size")
Signed-off-by: Dragos Tatulea dtatulea@nvidia.com
Signed-off-by: Tariq Toukan tariqt@nvidia.com
Link: https://patch.msgid.link/20241107183527.676877-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 .../net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c index d61be26a4df1a..3db31cc107192 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c @@ -660,7 +660,7 @@ tx_sync_info_get(struct mlx5e_ktls_offload_context_tx *priv_tx, while (remaining > 0) { skb_frag_t *frag = &record->frags[i];
- get_page(skb_frag_page(frag)); + page_ref_inc(skb_frag_page(frag)); remaining -= skb_frag_size(frag); info->frags[i++] = *frag; } @@ -763,7 +763,7 @@ void mlx5e_ktls_tx_handle_resync_dump_comp(struct mlx5e_txqsq *sq, stats = sq->stats;
mlx5e_tx_dma_unmap(sq->pdev, dma); - put_page(wi->resync_dump_frag_page); + page_ref_dec(wi->resync_dump_frag_page); stats->tls_dump_packets++; stats->tls_dump_bytes += wi->num_bytes; } @@ -816,12 +816,12 @@ mlx5e_ktls_tx_handle_ooo(struct mlx5e_ktls_offload_context_tx *priv_tx,
err_out: for (; i < info.nr_frags; i++) - /* The put_page() here undoes the page ref obtained in tx_sync_info_get(). + /* The page_ref_dec() here undoes the page ref obtained in tx_sync_info_get(). * Page refs obtained for the DUMP WQEs above (by page_ref_add) will be * released only upon their completions (or in mlx5e_free_txqsq_descs, * if channel closes). */ - put_page(skb_frag_page(&info.frags[i])); + page_ref_dec(skb_frag_page(&info.frags[i]));
return MLX5E_KTLS_SYNC_FAIL; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: William Tu witu@nvidia.com
[ Upstream commit c079389878debf767dc4e52fe877b9117258dfe2 ]
A non-uplink representor port does not support XDP. The patch clears the xdp features by checking whether net_device_ops.ndo_bpf is set.
Verify using the netlink tool:

 $ tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml --dump dev-get

Representor netdev before the patch:

 {'ifindex': 8,
  'xdp-features': {'basic', 'ndo-xmit', 'ndo-xmit-sg', 'redirect', 'rx-sg', 'xsk-zerocopy'},
  'xdp-rx-metadata-features': set(),
  'xdp-zc-max-segs': 1,
  'xsk-features': set()},

With the patch:

 {'ifindex': 8,
  'xdp-features': set(),
  'xdp-rx-metadata-features': set(),
  'xsk-features': set()},
Fixes: 4d5ab0ad964d ("net/mlx5e: take into account device reconfiguration for xdp_features flag")
Signed-off-by: William Tu witu@nvidia.com
Signed-off-by: Tariq Toukan tariqt@nvidia.com
Link: https://patch.msgid.link/20241107183527.676877-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index a65c407aa60bd..6e431f587c233 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4067,7 +4067,8 @@ void mlx5e_set_xdp_feature(struct net_device *netdev) struct mlx5e_params *params = &priv->channels.params; xdp_features_t val;
- if (params->packet_merge.type != MLX5E_PACKET_MERGE_NONE) { + if (!netdev->netdev_ops->ndo_bpf || + params->packet_merge.type != MLX5E_PACKET_MERGE_NONE) { xdp_clear_features_flag(netdev); return; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Moshe Shemesh moshe@nvidia.com
[ Upstream commit e99c6873229fe0482e7ceb7d5600e32d623ed9d9 ]
In the error flow of mlx5_tc_ct_entry_add_rule(), in case the ct_rule_add() callback returns an error, zone_rule->attr is used uninitialized. Fix it to use attr, which holds the needed pointer value.
Kernel log: BUG: kernel NULL pointer dereference, address: 0000000000000110 RIP: 0010:mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core] … Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x150/0x3e0 ? exc_page_fault+0x74/0x140 ? asm_exc_page_fault+0x22/0x30 ? mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core] ? mlx5_tc_ct_entry_add_rule+0x1d5/0x2f0 [mlx5_core] mlx5_tc_ct_block_flow_offload+0xc6a/0xf90 [mlx5_core] ? nf_flow_offload_tuple+0xd8/0x190 [nf_flow_table] nf_flow_offload_tuple+0xd8/0x190 [nf_flow_table] flow_offload_work_handler+0x142/0x320 [nf_flow_table] ? finish_task_switch.isra.0+0x15b/0x2b0 process_one_work+0x16c/0x320 worker_thread+0x28c/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xb8/0xf0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2d/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK>
Fixes: 7fac5c2eced3 ("net/mlx5: CT: Avoid reusing modify header context for natted entries")
Signed-off-by: Moshe Shemesh moshe@nvidia.com
Reviewed-by: Cosmin Ratiu cratiu@nvidia.com
Reviewed-by: Yevgeny Kliteynik kliteyn@nvidia.com
Signed-off-by: Tariq Toukan tariqt@nvidia.com
Link: https://patch.msgid.link/20241107183527.676877-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c index 8c4e3ecef5901..65cee5c6f1dd6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c @@ -854,7 +854,7 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv, return 0;
err_rule: - mlx5_tc_ct_entry_destroy_mod_hdr(ct_priv, zone_rule->attr, zone_rule->mh); + mlx5_tc_ct_entry_destroy_mod_hdr(ct_priv, attr, zone_rule->mh); mlx5_put_label_mapping(ct_priv, attr->ct_attr.ct_labels_id); err_mod_hdr: kfree(attr);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Luczaj mhal@rbox.co
[ Upstream commit d7b0ff5a866724c3ad21f2628c22a63336deec3f ]
As the final stages of socket destruction may be delayed, it is possible that virtio_transport_recv_listen() will be called after the accept_queue has been flushed, but before the SOCK_DONE flag has been set. As a result, sockets enqueued after the flush would remain unremoved, leading to a memory leak.
vsock_release __vsock_release lock virtio_transport_release virtio_transport_close schedule_delayed_work(close_work) sk_shutdown = SHUTDOWN_MASK (!) flush accept_queue release virtio_transport_recv_pkt vsock_find_bound_socket lock if flag(SOCK_DONE) return virtio_transport_recv_listen child = vsock_create_connected (!) vsock_enqueue_accept(child) release close_work lock virtio_transport_do_close set_flag(SOCK_DONE) virtio_transport_remove_sock vsock_remove_sock vsock_remove_bound release
Introduce a sk_shutdown check to disallow vsock_enqueue_accept() during socket destruction.
unreferenced object 0xffff888109e3f800 (size 2040): comm "kworker/5:2", pid 371, jiffies 4294940105 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 28 00 0b 40 00 00 00 00 00 00 00 00 00 00 00 00 (..@............ backtrace (crc 9e5f4e84): [<ffffffff81418ff1>] kmem_cache_alloc_noprof+0x2c1/0x360 [<ffffffff81d27aa0>] sk_prot_alloc+0x30/0x120 [<ffffffff81d2b54c>] sk_alloc+0x2c/0x4b0 [<ffffffff81fe049a>] __vsock_create.constprop.0+0x2a/0x310 [<ffffffff81fe6d6c>] virtio_transport_recv_pkt+0x4dc/0x9a0 [<ffffffff81fe745d>] vsock_loopback_work+0xfd/0x140 [<ffffffff810fc6ac>] process_one_work+0x20c/0x570 [<ffffffff810fce3f>] worker_thread+0x1bf/0x3a0 [<ffffffff811070dd>] kthread+0xdd/0x110 [<ffffffff81044fdd>] ret_from_fork+0x2d/0x50 [<ffffffff8100785a>] ret_from_fork_asm+0x1a/0x30
Fixes: 3fe356d58efa ("vsock/virtio: discard packets only when socket is really closed")
Reviewed-by: Stefano Garzarella sgarzare@redhat.com
Signed-off-by: Michal Luczaj mhal@rbox.co
Signed-off-by: Paolo Abeni pabeni@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/vmw_vsock/virtio_transport_common.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index 2a44505f4a223..43495820b64fb 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -1314,6 +1314,14 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, return -ENOMEM; }
+ /* __vsock_release() might have already flushed accept_queue. + * Subsequent enqueues would lead to a memory leak. + */ + if (sk->sk_shutdown == SHUTDOWN_MASK) { + virtio_transport_reset_no_sock(t, skb); + return -ESHUTDOWN; + } + child = vsock_create_connected(sk); if (!child) { virtio_transport_reset_no_sock(t, skb);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leon Romanovsky leonro@nvidia.com
[ Upstream commit 6abe2a90808192a5a8b2825293e5f10e80fdea56 ]
The cited commit in the Fixes line caused a regression for the udaddy [1] application. It doesn't work over VLANs anymore.
Client:
 ifconfig eth2 1.1.1.1
 ip link add link eth2 name p0.3597 type vlan protocol 802.1Q id 3597
 ip link set dev p0.3597 up
 ip addr add 2.2.2.2/16 dev p0.3597
 udaddy -S 847 -C 220 -c 2 -t 0 -s 2.2.2.3 -b 2.2.2.2
Server:
 ifconfig eth2 1.1.1.3
 ip link add link eth2 name p0.3597 type vlan protocol 802.1Q id 3597
 ip link set dev p0.3597 up
 ip addr add 2.2.2.3/16 dev p0.3597
 udaddy -S 847 -C 220 -c 2 -t 0 -b 2.2.2.3
[1] https://github.com/linux-rdma/rdma-core/blob/master/librdmacm/examples/udadd...
Fixes: 5069d7e202f6 ("RDMA/core: Fix ENODEV error for iWARP test over vlan")
Reported-by: Leon Romanovsky leonro@nvidia.com
Closes: https://lore.kernel.org/all/20241110130746.GA48891@unreal
Link: https://patch.msgid.link/bb9d403419b2b9566da5b8bf0761fa8377927e49.1731401658...
Signed-off-by: Leon Romanovsky leonro@nvidia.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/infiniband/core/addr.c | 2 --
 1 file changed, 2 deletions(-)
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index fd78d678877c4..f253295795f0a 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -269,8 +269,6 @@ rdma_find_ndev_for_src_ip_rcu(struct net *net, const struct sockaddr *src_in) break; #endif } - if (!ret && dev && is_vlan_dev(dev)) - dev = vlan_dev_real_dev(dev); return ret ? ERR_PTR(ret) : dev; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 7967dc8f797f454d4f4acec15c7df0cdf4801617 ]
Since 61a939c68ee0 ("Bluetooth: Queue incoming ACL data until BT_CONNECTED state is reached") there is no longer a need to call mgmt_device_connected, as ACL data will be queued until the BT_CONNECTED state is reached.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219458
Link: https://github.com/bluez/bluez/issues/1014
Fixes: 333b4fd11e89 ("Bluetooth: L2CAP: Fix uaf in l2cap_connect")
Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/bluetooth/hci_core.c | 2 --
 1 file changed, 2 deletions(-)
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index d4e607bf35baf..3cf4dd9cad8a3 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -3752,8 +3752,6 @@ static void hci_acldata_packet(struct hci_dev *hdev, struct sk_buff *skb)
hci_dev_lock(hdev); conn = hci_conn_hash_lookup_handle(hdev, handle); - if (conn && hci_dev_test_flag(hdev, HCI_MGMT)) - mgmt_device_connected(hdev, conn, NULL, 0); hci_dev_unlock(hdev);
if (conn) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kiran K kiran.k@intel.com
[ Upstream commit d5359a7f583ab9b7706915213b54deac065bcb81 ]
Make the exception event part of the HCI traces, which helps with debugging.
snoop traces:
HCI Event: Vendor (0xff) plen 79
Vendor Prefix (0x8780) Intel Extended Telemetry (0x03) Unknown extended telemetry event type (0xde) 01 01 de Unknown extended subevent 0x07 01 01 de 07 01 de 06 1c ef be ad de ef be ad de ef be ad de ef be ad de ef be ad de ef be ad de ef be ad de 05 14 ef be ad de ef be ad de ef be ad de ef be ad de ef be ad de 43 10 ef be ad de ef be ad de ef be ad de ef be ad de
Fixes: af395330abed ("Bluetooth: btintel: Add Intel devcoredump support")
Signed-off-by: Kiran K kiran.k@intel.com
Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/bluetooth/btintel.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/bluetooth/btintel.c b/drivers/bluetooth/btintel.c index a936219aebb81..3773cd9d998d5 100644 --- a/drivers/bluetooth/btintel.c +++ b/drivers/bluetooth/btintel.c @@ -2928,13 +2928,12 @@ static int btintel_diagnostics(struct hci_dev *hdev, struct sk_buff *skb) case INTEL_TLV_TEST_EXCEPTION: /* Generate devcoredump from exception */ if (!hci_devcd_init(hdev, skb->len)) { - hci_devcd_append(hdev, skb); + hci_devcd_append(hdev, skb_clone(skb, GFP_ATOMIC)); hci_devcd_complete(hdev); } else { bt_dev_err(hdev, "Failed to generate devcoredump"); - kfree_skb(skb); } - return 0; + break; default: bt_dev_err(hdev, "Invalid exception type %02X", tlv->val[0]); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pedro Tammela pctammela@mojatatu.com
[ Upstream commit 6b78debe1c07e6aa3c91ca0b1384bf3cb8217c50 ]
Proper refcounts will always emit a warning splat when something goes wrong, be it underflow, saturation or object resurrection. As these are always a source of bugs, use the refcount API in cls_u32 as a safeguard to prevent/catch such issues. Another benefit is that the refcount API self-documents the code, making clear when transitions to dead are expected.

For such an update we had to make minor adaptations in u32 to fit the refcount API. First, the refcount is explicitly set to '1' when an object is created; the object then stays alive until a 1 -> 0 transition happens, at which point it is released appropriately.

The above made clear some redundant operations in the u32 code around the root_ht handling, which were removed. The root_ht is created with a refcnt set to 1. Then, when it's associated with tcf_proto, the refcnt is incremented to 2. Throughout the entire code the root_ht is an exceptional case and can never be referenced, therefore its refcnt is never incremented/decremented. Its lifetime is always bound to tcf_proto, meaning that deleting the tcf_proto deletes the root_ht as well. The code made up for the fact that the root_ht refcnt is 2 by doing a double decrement to free it, which does not fit the refcount API.
Even though refcount_t is implemented using atomics, we should observe a negligible control plane impact.
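As a rough sketch of the pattern being adopted (the structure and helper names below are invented for illustration and are not taken from cls_u32):

#include <linux/refcount.h>
#include <linux/slab.h>

struct foo_node {
	refcount_t refcnt;
	/* ... payload ... */
};

static struct foo_node *foo_node_alloc(void)
{
	struct foo_node *n = kzalloc(sizeof(*n), GFP_KERNEL);

	if (n)
		refcount_set(&n->refcnt, 1);	/* object starts alive at 1 */
	return n;
}

static void foo_node_get(struct foo_node *n)
{
	refcount_inc(&n->refcnt);		/* splats on 0 -> 1 resurrection */
}

static void foo_node_put(struct foo_node *n)
{
	if (refcount_dec_and_test(&n->refcnt))	/* splats on underflow */
		kfree(n);
}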
Signed-off-by: Pedro Tammela pctammela@mojatatu.com
Acked-by: Jamal Hadi Salim jhs@mojatatu.com
Link: https://lore.kernel.org/r/20231114141856.974326-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Stable-dep-of: 73af53d82076 ("net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.")
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/sched/cls_u32.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 6663e971a13e7..b3531f458adaf 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -71,7 +71,7 @@ struct tc_u_hnode { struct tc_u_hnode __rcu *next; u32 handle; u32 prio; - int refcnt; + refcount_t refcnt; unsigned int divisor; struct idr handle_idr; bool is_root; @@ -86,7 +86,7 @@ struct tc_u_hnode { struct tc_u_common { struct tc_u_hnode __rcu *hlist; void *ptr; - int refcnt; + refcount_t refcnt; struct idr handle_idr; struct hlist_node hnode; long knodes; @@ -359,7 +359,7 @@ static int u32_init(struct tcf_proto *tp) if (root_ht == NULL) return -ENOBUFS;
- root_ht->refcnt++; + refcount_set(&root_ht->refcnt, 1); root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x80000000; root_ht->prio = tp->prio; root_ht->is_root = true; @@ -371,18 +371,20 @@ static int u32_init(struct tcf_proto *tp) kfree(root_ht); return -ENOBUFS; } + refcount_set(&tp_c->refcnt, 1); tp_c->ptr = key; INIT_HLIST_NODE(&tp_c->hnode); idr_init(&tp_c->handle_idr);
hlist_add_head(&tp_c->hnode, tc_u_hash(key)); + } else { + refcount_inc(&tp_c->refcnt); }
- tp_c->refcnt++; RCU_INIT_POINTER(root_ht->next, tp_c->hlist); rcu_assign_pointer(tp_c->hlist, root_ht);
- root_ht->refcnt++; + /* root_ht must be destroyed when tcf_proto is destroyed */ rcu_assign_pointer(tp->root, root_ht); tp->data = tp_c; return 0; @@ -393,7 +395,7 @@ static void __u32_destroy_key(struct tc_u_knode *n) struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
tcf_exts_destroy(&n->exts); - if (ht && --ht->refcnt == 0) + if (ht && refcount_dec_and_test(&ht->refcnt)) kfree(ht); kfree(n); } @@ -601,8 +603,6 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht, struct tc_u_hnode __rcu **hn; struct tc_u_hnode *phn;
- WARN_ON(--ht->refcnt); - u32_clear_hnode(tp, ht, extack);
hn = &tp_c->hlist; @@ -630,10 +630,10 @@ static void u32_destroy(struct tcf_proto *tp, bool rtnl_held,
WARN_ON(root_ht == NULL);
- if (root_ht && --root_ht->refcnt == 1) + if (root_ht && refcount_dec_and_test(&root_ht->refcnt)) u32_destroy_hnode(tp, root_ht, extack);
- if (--tp_c->refcnt == 0) { + if (refcount_dec_and_test(&tp_c->refcnt)) { struct tc_u_hnode *ht;
hlist_del(&tp_c->hnode); @@ -645,7 +645,7 @@ static void u32_destroy(struct tcf_proto *tp, bool rtnl_held, /* u32_destroy_key() will later free ht for us, if it's * still referenced by some knode */ - if (--ht->refcnt == 0) + if (refcount_dec_and_test(&ht->refcnt)) kfree_rcu(ht, rcu); }
@@ -674,7 +674,7 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last, return -EINVAL; }
- if (ht->refcnt == 1) { + if (refcount_dec_if_one(&ht->refcnt)) { u32_destroy_hnode(tp, ht, extack); } else { NL_SET_ERR_MSG_MOD(extack, "Can not delete in-use filter"); @@ -682,7 +682,7 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last, }
out: - *last = tp_c->refcnt == 1 && tp_c->knodes == 0; + *last = refcount_read(&tp_c->refcnt) == 1 && tp_c->knodes == 0; return ret; }
@@ -766,14 +766,14 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp, NL_SET_ERR_MSG_MOD(extack, "Not linking to root node"); return -EINVAL; } - ht_down->refcnt++; + refcount_inc(&ht_down->refcnt); }
ht_old = rtnl_dereference(n->ht_down); rcu_assign_pointer(n->ht_down, ht_down);
if (ht_old) - ht_old->refcnt--; + refcount_dec(&ht_old->refcnt); }
if (ifindex >= 0) @@ -852,7 +852,7 @@ static struct tc_u_knode *u32_init_knode(struct net *net, struct tcf_proto *tp,
/* bump reference count as long as we hold pointer to structure */ if (ht) - ht->refcnt++; + refcount_inc(&ht->refcnt);
return new; } @@ -932,7 +932,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
ht_old = rtnl_dereference(n->ht_down); if (ht_old) - ht_old->refcnt++; + refcount_inc(&ht_old->refcnt); } __u32_destroy_key(new); return err; @@ -980,7 +980,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, return err; } } - ht->refcnt = 1; + refcount_set(&ht->refcnt, 1); ht->divisor = divisor; ht->handle = handle; ht->prio = tp->prio;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Ferrieux alexandre.ferrieux@gmail.com
[ Upstream commit 73af53d82076bbe184d9ece9e14b0dc8599e6055 ]
To generate hnode handles (in gen_new_htid()), u32 uses IDR and encodes the returned small integer into a structured 32-bit word. Unfortunately, at disposal time, the needed decoding is not done. As a result, idr_remove() fails, and the IDR fills up. Since its size is 2048, the following script ends up with "Filter already exists":
tc filter add dev myve $FILTER1
tc filter add dev myve $FILTER2
for i in {1..2048}
do
  echo $i
  tc filter del dev myve $FILTER2
  tc filter add dev myve $FILTER2
done
This patch adds the missing decoding logic for handles that deserve it.
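For illustration, here is a minimal standalone userspace sketch of the encode/decode pair, mirroring the id2handle()/handle2id() helpers added in the diff below; it shows that the value handed back to idr_remove() must be the decoded id, not the structured handle:

  #include <stdint.h>
  #include <stdio.h>

  /* Same bit layout as the id2handle()/handle2id() helpers below. */
  static uint32_t id2handle(uint32_t id)
  {
      return (id | 0x800U) << 20;
  }

  static uint32_t handle2id(uint32_t h)
  {
      return (h & 0x80000000) ? ((h >> 20) & 0x7FF) : h;
  }

  int main(void)
  {
      for (uint32_t id = 1; id < 0x7FF; id += 0x200) {
          uint32_t handle = id2handle(id);

          /* idr_remove() must see the small id again; the raw handle
           * would never match an allocated IDR entry. */
          printf("id=%u handle=0x%08x decoded=%u\n",
                 id, handle, handle2id(handle));
      }
      return 0;
  }

The round trip is lossless for ids below 0x800, which is why decoding the handle before idr_remove() is sufficient.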
Fixes: e7614370d6f0 ("net_sched: use idr to allocate u32 filter handles") Reviewed-by: Eric Dumazet edumazet@google.com Acked-by: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: Alexandre Ferrieux alexandre.ferrieux@orange.com Tested-by: Victor Nogueira victor@mojatatu.com Link: https://patch.msgid.link/20241110172836.331319-1-alexandre.ferrieux@orange.c... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/cls_u32.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index b3531f458adaf..67f27be138487 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -92,6 +92,16 @@ struct tc_u_common { long knodes; };
+static u32 handle2id(u32 h) +{ + return ((h & 0x80000000) ? ((h >> 20) & 0x7FF) : h); +} + +static u32 id2handle(u32 id) +{ + return (id | 0x800U) << 20; +} + static inline unsigned int u32_hash_fold(__be32 key, const struct tc_u32_sel *sel, u8 fshift) @@ -310,7 +320,7 @@ static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr) int id = idr_alloc_cyclic(&tp_c->handle_idr, ptr, 1, 0x7FF, GFP_KERNEL); if (id < 0) return 0; - return (id | 0x800U) << 20; + return id2handle(id); }
static struct hlist_head *tc_u_common_hash; @@ -360,7 +370,7 @@ static int u32_init(struct tcf_proto *tp) return -ENOBUFS;
refcount_set(&root_ht->refcnt, 1); - root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x80000000; + root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : id2handle(0); root_ht->prio = tp->prio; root_ht->is_root = true; idr_init(&root_ht->handle_idr); @@ -612,7 +622,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht, if (phn == ht) { u32_clear_hw_hnode(tp, ht, extack); idr_destroy(&ht->handle_idr); - idr_remove(&tp_c->handle_idr, ht->handle); + idr_remove(&tp_c->handle_idr, handle2id(ht->handle)); RCU_INIT_POINTER(*hn, ht->next); kfree_rcu(ht, rcu); return 0; @@ -989,7 +999,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
err = u32_replace_hw_hnode(tp, ht, userflags, extack); if (err) { - idr_remove(&tp_c->handle_idr, handle); + idr_remove(&tp_c->handle_idr, handle2id(handle)); kfree(ht); return err; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wei Fang wei.fang@nxp.com
[ Upstream commit 3342dc8b4623d835e7dd76a15cec2e5a94fe2f93 ]
In the pktgen_sample01_simple.sh script, the device variable is named 'DEV' (uppercase), but one line refers to it as lowercase 'dev'. Because of this typo, the script cannot enable UDP tx checksum.
Fixes: 460a9aa23de6 ("samples: pktgen: add UDP tx checksum support") Signed-off-by: Wei Fang wei.fang@nxp.com Reviewed-by: Simon Horman horms@kernel.org Acked-by: Jesper Dangaard Brouer hawk@kernel.org Link: https://patch.msgid.link/20241112030347.1849335-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- samples/pktgen/pktgen_sample01_simple.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/samples/pktgen/pktgen_sample01_simple.sh b/samples/pktgen/pktgen_sample01_simple.sh index cdb9f497f87da..66cb707479e6c 100755 --- a/samples/pktgen/pktgen_sample01_simple.sh +++ b/samples/pktgen/pktgen_sample01_simple.sh @@ -76,7 +76,7 @@ if [ -n "$DST_PORT" ]; then pg_set $DEV "udp_dst_max $UDP_DST_MAX" fi
-[ ! -z "$UDP_CSUM" ] && pg_set $dev "flag UDPCSUM" +[ ! -z "$UDP_CSUM" ] && pg_set $DEV "flag UDPCSUM"
# Setup random UDP port src range pg_set $DEV "flag UDPSRC_RND"
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nícolas F. R. A. Prado nfraprado@collabora.com
[ Upstream commit a03b18a71c128846360cc81ac6fdb0e7d41597b4 ]
The mediatek,mac-wol property is being handled backwards to what is described in the binding: it currently enables PHY WOL when the property is present and vice versa. Invert the driver logic so it matches the binding description.
Fixes: fd1d62d80ebc ("net: stmmac: replace the use_phy_wol field with a flag") Signed-off-by: Nícolas F. R. A. Prado nfraprado@collabora.com Link: https://patch.msgid.link/20241109-mediatek-mac-wol-noninverted-v2-1-0e264e21... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c index cd796ec04132d..634ea6b33ea3c 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.c @@ -589,9 +589,9 @@ static int mediatek_dwmac_common_data(struct platform_device *pdev,
plat->mac_interface = priv_plat->phy_mode; if (priv_plat->mac_wol) - plat->flags |= STMMAC_FLAG_USE_PHY_WOL; - else plat->flags &= ~STMMAC_FLAG_USE_PHY_WOL; + else + plat->flags |= STMMAC_FLAG_USE_PHY_WOL; plat->riwt_off = 1; plat->maxmtu = ETH_DATA_LEN; plat->host_dma_width = priv_plat->variant->dma_bit_mask;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Luczaj mhal@rbox.co
[ Upstream commit eb94b7bb10109a14a5431a67e5d8e31cfa06b395 ]
copy_safe_from_sockptr()
  return copy_from_sockptr()
    return copy_from_sockptr_offset()
      return copy_from_user()
copy_from_user() does not return an error on fault. Instead, it returns the number of bytes that were not copied. Handle this by returning -EFAULT when the copy is incomplete.
Patch has a side effect: it un-breaks garbage input handling of nfc_llcp_setsockopt() and mISDN's data_sock_setsockopt().
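As a rough userspace sketch of the pattern (fake_copy_from_user() below is a hypothetical stand-in for the real kernel primitive, not its actual implementation), the wrapper converts the "bytes not copied" return convention into -EFAULT:

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>

  /* Hypothetical stand-in for copy_from_user(): returns the number of
   * bytes that could NOT be copied (0 means full success). */
  static size_t fake_copy_from_user(void *dst, const void *src, size_t n,
                                    size_t faulting_tail)
  {
      size_t copied = n > faulting_tail ? n - faulting_tail : 0;

      memcpy(dst, src, copied);
      return n - copied;
  }

  /* The pattern applied by the patch: translate "bytes not copied"
   * into a proper -EFAULT error code. */
  static int copy_safe(void *dst, size_t ksize, const void *src,
                       size_t optlen, size_t faulting_tail)
  {
      if (optlen < ksize)
          return -EINVAL;
      if (fake_copy_from_user(dst, src, ksize, faulting_tail))
          return -EFAULT;
      return 0;
  }

  int main(void)
  {
      char dst[8];
      const char src[8] = "optval";

      printf("ok:    %d\n", copy_safe(dst, sizeof(dst), src, sizeof(src), 0));
      printf("fault: %d\n", copy_safe(dst, sizeof(dst), src, sizeof(src), 3));
      return 0;
  }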
Fixes: 6309863b31dd ("net: add copy_safe_from_sockptr() helper") Signed-off-by: Michal Luczaj mhal@rbox.co Link: https://patch.msgid.link/20241111-sockptr-copy-ret-fix-v1-1-a520083a93fb@rbo... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sockptr.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h index 1c1a5d926b171..0eb3a2b1f81ff 100644 --- a/include/linux/sockptr.h +++ b/include/linux/sockptr.h @@ -77,7 +77,9 @@ static inline int copy_safe_from_sockptr(void *dst, size_t ksize, { if (optlen < ksize) return -EINVAL; - return copy_from_sockptr(dst, optval, ksize); + if (copy_from_sockptr(dst, optval, ksize)) + return -EFAULT; + return 0; }
static inline int copy_to_sockptr_offset(sockptr_t dst, size_t offset,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jisheng Zhang jszhang@kernel.org
[ Upstream commit abea8fd5e801a679312479b2bf00d7b4285eca78 ]
Simplify the driver's probe() function by using the devres variant of stmmac_probe_config_dt().
The calling of stmmac_pltfr_remove() now needs to be switched to stmmac_pltfr_remove_no_dt().
Signed-off-by: Jisheng Zhang jszhang@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 5b366eae7193 ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines") Signed-off-by: Sasha Levin sashal@kernel.org --- .../stmicro/stmmac/dwmac-intel-plat.c | 27 +++++++------------ 1 file changed, 9 insertions(+), 18 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c index d352a14f9d483..d1aec2ca2b429 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c @@ -85,17 +85,15 @@ static int intel_eth_plat_probe(struct platform_device *pdev) if (ret) return ret;
- plat_dat = stmmac_probe_config_dt(pdev, stmmac_res.mac); + plat_dat = devm_stmmac_probe_config_dt(pdev, stmmac_res.mac); if (IS_ERR(plat_dat)) { dev_err(&pdev->dev, "dt configuration failed\n"); return PTR_ERR(plat_dat); }
dwmac = devm_kzalloc(&pdev->dev, sizeof(*dwmac), GFP_KERNEL); - if (!dwmac) { - ret = -ENOMEM; - goto err_remove_config_dt; - } + if (!dwmac) + return -ENOMEM;
dwmac->dev = &pdev->dev; dwmac->tx_clk = NULL; @@ -110,10 +108,8 @@ static int intel_eth_plat_probe(struct platform_device *pdev) /* Enable TX clock */ if (dwmac->data->tx_clk_en) { dwmac->tx_clk = devm_clk_get(&pdev->dev, "tx_clk"); - if (IS_ERR(dwmac->tx_clk)) { - ret = PTR_ERR(dwmac->tx_clk); - goto err_remove_config_dt; - } + if (IS_ERR(dwmac->tx_clk)) + return PTR_ERR(dwmac->tx_clk);
clk_prepare_enable(dwmac->tx_clk);
@@ -126,7 +122,7 @@ static int intel_eth_plat_probe(struct platform_device *pdev) if (ret) { dev_err(&pdev->dev, "Failed to set tx_clk\n"); - goto err_remove_config_dt; + return ret; } } } @@ -140,7 +136,7 @@ static int intel_eth_plat_probe(struct platform_device *pdev) if (ret) { dev_err(&pdev->dev, "Failed to set clk_ptp_ref\n"); - goto err_remove_config_dt; + return ret; } } } @@ -158,22 +154,17 @@ static int intel_eth_plat_probe(struct platform_device *pdev) ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res); if (ret) { clk_disable_unprepare(dwmac->tx_clk); - goto err_remove_config_dt; + return ret; }
return 0; - -err_remove_config_dt: - stmmac_remove_config_dt(pdev, plat_dat); - - return ret; }
static void intel_eth_plat_remove(struct platform_device *pdev) { struct intel_dwmac *dwmac = get_stmmac_bsp_priv(&pdev->dev);
- stmmac_pltfr_remove(pdev); + stmmac_pltfr_remove_no_dt(pdev); clk_disable_unprepare(dwmac->tx_clk); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jisheng Zhang jszhang@kernel.org
[ Upstream commit d336a117b593e96559c309bb250f06b4fc22998f ]
Simplify the driver's probe() function by using the devres variant of stmmac_probe_config_dt().
The calling of stmmac_pltfr_remove() now needs to be switched to stmmac_pltfr_remove_no_dt().
Signed-off-by: Jisheng Zhang jszhang@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 5b366eae7193 ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines") Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/stmicro/stmmac/dwmac-visconti.c | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c index 22d113fb8e09c..45f5d66a11c26 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c @@ -220,15 +220,13 @@ static int visconti_eth_dwmac_probe(struct platform_device *pdev) if (ret) return ret;
- plat_dat = stmmac_probe_config_dt(pdev, stmmac_res.mac); + plat_dat = devm_stmmac_probe_config_dt(pdev, stmmac_res.mac); if (IS_ERR(plat_dat)) return PTR_ERR(plat_dat);
dwmac = devm_kzalloc(&pdev->dev, sizeof(*dwmac), GFP_KERNEL); - if (!dwmac) { - ret = -ENOMEM; - goto remove_config; - } + if (!dwmac) + return -ENOMEM;
spin_lock_init(&dwmac->lock); dwmac->reg = stmmac_res.addr; @@ -238,7 +236,7 @@ static int visconti_eth_dwmac_probe(struct platform_device *pdev)
ret = visconti_eth_clock_probe(pdev, plat_dat); if (ret) - goto remove_config; + return ret;
visconti_eth_init_hw(pdev, plat_dat);
@@ -252,22 +250,15 @@ static int visconti_eth_dwmac_probe(struct platform_device *pdev)
remove: visconti_eth_clock_remove(pdev); -remove_config: - stmmac_remove_config_dt(pdev, plat_dat);
return ret; }
static void visconti_eth_dwmac_remove(struct platform_device *pdev) { - struct net_device *ndev = platform_get_drvdata(pdev); - struct stmmac_priv *priv = netdev_priv(ndev); - - stmmac_pltfr_remove(pdev); + stmmac_pltfr_remove_no_dt(pdev);
visconti_eth_clock_remove(pdev); - - stmmac_remove_config_dt(pdev, priv->plat); }
static const struct of_device_id visconti_eth_dwmac_match[] = {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jisheng Zhang jszhang@kernel.org
[ Upstream commit 2c9fc838067b02cb3e6057fef5cd7cf1c04a95aa ]
Now, all users of the old stmmac_pltfr_remove() are converted to the devres helper, it's time to rename stmmac_pltfr_remove_no_dt() back to stmmac_pltfr_remove() and remove the old stmmac_pltfr_remove().
Signed-off-by: Jisheng Zhang jszhang@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 5b366eae7193 ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines") Signed-off-by: Sasha Levin sashal@kernel.org --- .../stmicro/stmmac/dwmac-intel-plat.c | 2 +- .../ethernet/stmicro/stmmac/dwmac-visconti.c | 3 +-- .../ethernet/stmicro/stmmac/stmmac_platform.c | 23 +++---------------- .../ethernet/stmicro/stmmac/stmmac_platform.h | 1 - 4 files changed, 5 insertions(+), 24 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c index d1aec2ca2b429..70edc5232379f 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c @@ -164,7 +164,7 @@ static void intel_eth_plat_remove(struct platform_device *pdev) { struct intel_dwmac *dwmac = get_stmmac_bsp_priv(&pdev->dev);
- stmmac_pltfr_remove_no_dt(pdev); + stmmac_pltfr_remove(pdev); clk_disable_unprepare(dwmac->tx_clk); }
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c index 45f5d66a11c26..a5a5cfa989c6e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c @@ -256,8 +256,7 @@ static int visconti_eth_dwmac_probe(struct platform_device *pdev)
static void visconti_eth_dwmac_remove(struct platform_device *pdev) { - stmmac_pltfr_remove_no_dt(pdev); - + stmmac_pltfr_remove(pdev); visconti_eth_clock_remove(pdev); }
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index 30d5e635190e6..b4fdd40be63cb 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -810,7 +810,7 @@ static void devm_stmmac_pltfr_remove(void *data) { struct platform_device *pdev = data;
- stmmac_pltfr_remove_no_dt(pdev); + stmmac_pltfr_remove(pdev); }
/** @@ -837,12 +837,12 @@ int devm_stmmac_pltfr_probe(struct platform_device *pdev, EXPORT_SYMBOL_GPL(devm_stmmac_pltfr_probe);
/** - * stmmac_pltfr_remove_no_dt + * stmmac_pltfr_remove * @pdev: pointer to the platform device * Description: This undoes the effects of stmmac_pltfr_probe() by removing the * driver and calling the platform's exit() callback. */ -void stmmac_pltfr_remove_no_dt(struct platform_device *pdev) +void stmmac_pltfr_remove(struct platform_device *pdev) { struct net_device *ndev = platform_get_drvdata(pdev); struct stmmac_priv *priv = netdev_priv(ndev); @@ -851,23 +851,6 @@ void stmmac_pltfr_remove_no_dt(struct platform_device *pdev) stmmac_dvr_remove(&pdev->dev); stmmac_pltfr_exit(pdev, plat); } -EXPORT_SYMBOL_GPL(stmmac_pltfr_remove_no_dt); - -/** - * stmmac_pltfr_remove - * @pdev: platform device pointer - * Description: this function calls the main to free the net resources - * and calls the platforms hook and release the resources (e.g. mem). - */ -void stmmac_pltfr_remove(struct platform_device *pdev) -{ - struct net_device *ndev = platform_get_drvdata(pdev); - struct stmmac_priv *priv = netdev_priv(ndev); - struct plat_stmmacenet_data *plat = priv->plat; - - stmmac_pltfr_remove_no_dt(pdev); - stmmac_remove_config_dt(pdev, plat); -} EXPORT_SYMBOL_GPL(stmmac_pltfr_remove);
/** diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h index c5565b2a70acc..bb07a99e1248b 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h @@ -32,7 +32,6 @@ int stmmac_pltfr_probe(struct platform_device *pdev, int devm_stmmac_pltfr_probe(struct platform_device *pdev, struct plat_stmmacenet_data *plat, struct stmmac_resources *res); -void stmmac_pltfr_remove_no_dt(struct platform_device *pdev); void stmmac_pltfr_remove(struct platform_device *pdev); extern const struct dev_pm_ops stmmac_pltfr_pm_ops;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vitalii Mordan mordan@ispras.ru
[ Upstream commit 5b366eae71937ae7412365340b431064625f9617 ]
If the clock dwmac->tx_clk was not enabled in intel_eth_plat_probe, it should not be disabled in any path.
Conversely, if it was enabled in intel_eth_plat_probe, it must be disabled in all error paths to ensure proper cleanup.
Found by Linux Verification Center (linuxtesting.org) with Klever.
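A minimal sketch of the pattern the fix applies, using hypothetical stub functions rather than the real clk API: the enable is conditional, so every disable (error paths and remove) must be guarded by the same condition:

  #include <stdbool.h>
  #include <stdio.h>

  static bool clk_enabled;

  static int clk_prepare_enable_stub(void)     { clk_enabled = true;  return 0; }
  static void clk_disable_unprepare_stub(void) { clk_enabled = false; }

  /* Illustrative probe skeleton: the clock is only enabled when
   * tx_clk_en is set, so the error path must only disable it under
   * the same condition. */
  static int probe_stub(bool tx_clk_en, bool later_step_fails)
  {
      int ret = 0;

      if (tx_clk_en) {
          ret = clk_prepare_enable_stub();
          if (ret)
              return ret;
      }

      if (later_step_fails) {
          ret = -1;
          goto err_tx_clk_disable;
      }

      return 0;

  err_tx_clk_disable:
      if (tx_clk_en)
          clk_disable_unprepare_stub();
      return ret;
  }

  int main(void)
  {
      probe_stub(true, true);
      printf("clk left enabled after failed probe: %d\n", clk_enabled);
      return 0;
  }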
Fixes: 9efc9b2b04c7 ("net: stmmac: Add dwmac-intel-plat for GBE driver") Signed-off-by: Vitalii Mordan mordan@ispras.ru Link: https://patch.msgid.link/20241108173334.2973603-1-mordan@ispras.ru Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../stmicro/stmmac/dwmac-intel-plat.c | 25 +++++++++++++------ 1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c index 70edc5232379f..134f6506df99a 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c @@ -111,7 +111,12 @@ static int intel_eth_plat_probe(struct platform_device *pdev) if (IS_ERR(dwmac->tx_clk)) return PTR_ERR(dwmac->tx_clk);
- clk_prepare_enable(dwmac->tx_clk); + ret = clk_prepare_enable(dwmac->tx_clk); + if (ret) { + dev_err(&pdev->dev, + "Failed to enable tx_clk\n"); + return ret; + }
/* Check and configure TX clock rate */ rate = clk_get_rate(dwmac->tx_clk); @@ -122,7 +127,7 @@ static int intel_eth_plat_probe(struct platform_device *pdev) if (ret) { dev_err(&pdev->dev, "Failed to set tx_clk\n"); - return ret; + goto err_tx_clk_disable; } } } @@ -136,7 +141,7 @@ static int intel_eth_plat_probe(struct platform_device *pdev) if (ret) { dev_err(&pdev->dev, "Failed to set clk_ptp_ref\n"); - return ret; + goto err_tx_clk_disable; } } } @@ -152,12 +157,15 @@ static int intel_eth_plat_probe(struct platform_device *pdev) }
ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res); - if (ret) { - clk_disable_unprepare(dwmac->tx_clk); - return ret; - } + if (ret) + goto err_tx_clk_disable;
return 0; + +err_tx_clk_disable: + if (dwmac->data->tx_clk_en) + clk_disable_unprepare(dwmac->tx_clk); + return ret; }
static void intel_eth_plat_remove(struct platform_device *pdev) @@ -165,7 +173,8 @@ static void intel_eth_plat_remove(struct platform_device *pdev) struct intel_dwmac *dwmac = get_stmmac_bsp_priv(&pdev->dev);
stmmac_pltfr_remove(pdev); - clk_disable_unprepare(dwmac->tx_clk); + if (dwmac->data->tx_clk_en) + clk_disable_unprepare(dwmac->tx_clk); }
static struct platform_driver intel_eth_plat_driver = {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Meghana Malladi m-malladi@ti.com
[ Upstream commit dc065076ee7768377d7c16af7d1b0767782d8c98 ]
The first PPS latch time needs to be calculated by the driver (rounded off to whole seconds) and configured as the start time offset for the cycle. After synchronizing two PTP clocks running as master/slave, missing this would cause the master and slave to start immediately, with a drift of some milliseconds, which causes the PPS signal to never synchronize with the PTP master.
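A small standalone sketch of the rounding step, assuming (as the diff suggests) that the firmware cycle counter advances in millisecond units, so rounding up to a multiple of 1000 lands on the next full-second boundary:

  #include <stddef.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Same result as the kernel's roundup() for positive integers. */
  static uint64_t round_up_to(uint64_t x, uint64_t y)
  {
      return ((x + y - 1) / y) * y;
  }

  int main(void)
  {
      /* Hypothetical cycle counter samples, assumed to tick in ms. */
      uint64_t samples[] = { 1, 999, 1000, 1001, 123456 };

      for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
          printf("cycle=%llu -> start_offset=%llu\n",
                 (unsigned long long)samples[i],
                 (unsigned long long)round_up_to(samples[i], 1000));
      return 0;
  }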
Fixes: 186734c15886 ("net: ti: icssg-prueth: add packet timestamping and ptp support") Signed-off-by: Meghana Malladi m-malladi@ti.com Reviewed-by: Vadim Fedorenko vadim.fedorenko@linux.dev Reviewed-by: MD Danish Anwar danishanwar@ti.com Link: https://patch.msgid.link/20241111095842.478833-1-m-malladi@ti.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ti/icssg/icssg_prueth.c | 13 +++++++++++-- drivers/net/ethernet/ti/icssg/icssg_prueth.h | 12 ++++++++++++ 2 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c index fb120baee5532..7efb3e347c042 100644 --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c @@ -15,6 +15,7 @@ #include <linux/genalloc.h> #include <linux/if_vlan.h> #include <linux/interrupt.h> +#include <linux/io-64-nonatomic-hi-lo.h> #include <linux/kernel.h> #include <linux/mfd/syscon.h> #include <linux/module.h> @@ -1245,6 +1246,8 @@ static int prueth_perout_enable(void *clockops_data, struct prueth_emac *emac = clockops_data; u32 reduction_factor = 0, offset = 0; struct timespec64 ts; + u64 current_cycle; + u64 start_offset; u64 ns_period;
if (!on) @@ -1283,8 +1286,14 @@ static int prueth_perout_enable(void *clockops_data, writel(reduction_factor, emac->prueth->shram.va + TIMESYNC_FW_WC_SYNCOUT_REDUCTION_FACTOR_OFFSET);
- writel(0, emac->prueth->shram.va + - TIMESYNC_FW_WC_SYNCOUT_START_TIME_CYCLECOUNT_OFFSET); + current_cycle = icssg_read_time(emac->prueth->shram.va + + TIMESYNC_FW_WC_CYCLECOUNT_OFFSET); + + /* Rounding of current_cycle count to next second */ + start_offset = roundup(current_cycle, MSEC_PER_SEC); + + hi_lo_writeq(start_offset, emac->prueth->shram.va + + TIMESYNC_FW_WC_SYNCOUT_START_TIME_CYCLECOUNT_OFFSET);
return 0; } diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h index 3fe80a8758d30..0713ad7897b68 100644 --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h @@ -257,6 +257,18 @@ static inline int prueth_emac_slice(struct prueth_emac *emac)
extern const struct ethtool_ops icssg_ethtool_ops;
+static inline u64 icssg_read_time(const void __iomem *addr) +{ + u32 low, high; + + do { + high = readl(addr + 4); + low = readl(addr); + } while (high != readl(addr + 4)); + + return low + ((u64)high << 32); +} + /* Classifier helpers */ void icssg_class_set_mac_addr(struct regmap *miig_rt, int slice, u8 *mac); void icssg_class_set_host_mac_addr(struct regmap *miig_rt, const u8 *mac);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 8eb36164d1a6769a20ed43033510067ff3dab9ee ]
Commit 4598380f9c54 ("bonding: fix ns validation on backup slaves") tried to resolve the issue where backup slaves couldn't be brought up when receiving IPv6 Neighbor Solicitation (NS) messages. However, this fix only worked for drivers that receive all multicast messages, such as the veth interface.
For standard drivers, the NS multicast message is silently dropped because the slave device is not a member of the NS target multicast group.
To address this, we need to make the slave device join the NS target multicast group, ensuring it can receive these IPv6 NS messages to validate the slave’s status properly.
There are three policies before joining the multicast group:
1. All settings must be under active-backup mode (alb and tlb do not
   support arp_validate), with backup slaves and slaves supporting
   multicast.
2. We can add or remove multicast groups when arp_validate changes.
3. Other operations, such as enslaving, releasing, or setting NS targets,
   need to be guarded by arp_validate.
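As background, joining the group relies on mapping each NS target to a link-layer multicast address. The sketch below is a standalone userspace illustration of the Ethernet case only (RFC 2464: 33:33 followed by the low-order 32 bits of the IPv6 address); in the driver, the mapping is done by ndisc_mc_map() and the join/leave by dev_mc_add()/dev_mc_del(), as in the diff below:

  #include <arpa/inet.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  /* RFC 2464: an IPv6 multicast address maps to the Ethernet multicast
   * MAC 33:33 followed by the low-order 32 bits of the address. */
  static void ipv6_to_eth_mcast(const struct in6_addr *addr, uint8_t mac[6])
  {
      mac[0] = 0x33;
      mac[1] = 0x33;
      memcpy(&mac[2], &addr->s6_addr[12], 4);
  }

  int main(void)
  {
      struct in6_addr addr;
      uint8_t mac[6];

      /* Hypothetical solicited-node multicast group for an NS target. */
      inet_pton(AF_INET6, "ff02::1:ff28:9c5a", &addr);
      ipv6_to_eth_mcast(&addr, mac);

      printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
      return 0;
  }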
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets") Signed-off-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Nikolay Aleksandrov razor@blackwall.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 16 +++++- drivers/net/bonding/bond_options.c | 82 +++++++++++++++++++++++++++++- include/net/bond_options.h | 2 + 3 files changed, 98 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 14b4780b73c72..bee93a437f997 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -934,6 +934,8 @@ static void bond_hw_addr_swap(struct bonding *bond, struct slave *new_active,
if (bond->dev->flags & IFF_UP) bond_hw_addr_flush(bond->dev, old_active->dev); + + bond_slave_ns_maddrs_add(bond, old_active); }
if (new_active) { @@ -950,6 +952,8 @@ static void bond_hw_addr_swap(struct bonding *bond, struct slave *new_active, dev_mc_sync(new_active->dev, bond->dev); netif_addr_unlock_bh(bond->dev); } + + bond_slave_ns_maddrs_del(bond, new_active); } }
@@ -2267,6 +2271,11 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, bond_compute_features(bond); bond_set_carrier(bond);
+ /* Needs to be called before bond_select_active_slave(), which will + * remove the maddrs if the slave is selected as active slave. + */ + bond_slave_ns_maddrs_add(bond, new_slave); + if (bond_uses_primary(bond)) { block_netpoll_tx(); bond_select_active_slave(bond); @@ -2276,7 +2285,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, if (bond_mode_can_use_xmit_hash(bond)) bond_update_slave_arr(bond, NULL);
- if (!slave_dev->netdev_ops->ndo_bpf || !slave_dev->netdev_ops->ndo_xdp_xmit) { if (bond->xdp_prog) { @@ -2474,6 +2482,12 @@ static int __bond_release_one(struct net_device *bond_dev, if (oldcurrent == slave) bond_change_active_slave(bond, NULL);
+ /* Must be called after bond_change_active_slave () as the slave + * might change from an active slave to a backup slave. Then it is + * necessary to clear the maddrs on the backup slave. + */ + bond_slave_ns_maddrs_del(bond, slave); + if (bond_is_lb(bond)) { /* Must be called only after the slave has been * detached from the list and the curr_active_slave diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index d1208d058eea1..8c326e41b8d63 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -15,6 +15,7 @@ #include <linux/sched/signal.h>
#include <net/bonding.h> +#include <net/ndisc.h>
static int bond_option_active_slave_set(struct bonding *bond, const struct bond_opt_value *newval); @@ -1218,6 +1219,68 @@ static int bond_option_arp_ip_targets_set(struct bonding *bond, }
#if IS_ENABLED(CONFIG_IPV6) +static bool slave_can_set_ns_maddr(const struct bonding *bond, struct slave *slave) +{ + return BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP && + !bond_is_active_slave(slave) && + slave->dev->flags & IFF_MULTICAST; +} + +static void slave_set_ns_maddrs(struct bonding *bond, struct slave *slave, bool add) +{ + struct in6_addr *targets = bond->params.ns_targets; + char slot_maddr[MAX_ADDR_LEN]; + int i; + + if (!slave_can_set_ns_maddr(bond, slave)) + return; + + for (i = 0; i < BOND_MAX_NS_TARGETS; i++) { + if (ipv6_addr_any(&targets[i])) + break; + + if (!ndisc_mc_map(&targets[i], slot_maddr, slave->dev, 0)) { + if (add) + dev_mc_add(slave->dev, slot_maddr); + else + dev_mc_del(slave->dev, slot_maddr); + } + } +} + +void bond_slave_ns_maddrs_add(struct bonding *bond, struct slave *slave) +{ + if (!bond->params.arp_validate) + return; + slave_set_ns_maddrs(bond, slave, true); +} + +void bond_slave_ns_maddrs_del(struct bonding *bond, struct slave *slave) +{ + if (!bond->params.arp_validate) + return; + slave_set_ns_maddrs(bond, slave, false); +} + +static void slave_set_ns_maddr(struct bonding *bond, struct slave *slave, + struct in6_addr *target, struct in6_addr *slot) +{ + char target_maddr[MAX_ADDR_LEN], slot_maddr[MAX_ADDR_LEN]; + + if (!bond->params.arp_validate || !slave_can_set_ns_maddr(bond, slave)) + return; + + /* remove the previous maddr from slave */ + if (!ipv6_addr_any(slot) && + !ndisc_mc_map(slot, slot_maddr, slave->dev, 0)) + dev_mc_del(slave->dev, slot_maddr); + + /* add new maddr on slave if target is set */ + if (!ipv6_addr_any(target) && + !ndisc_mc_map(target, target_maddr, slave->dev, 0)) + dev_mc_add(slave->dev, target_maddr); +} + static void _bond_options_ns_ip6_target_set(struct bonding *bond, int slot, struct in6_addr *target, unsigned long last_rx) @@ -1227,8 +1290,10 @@ static void _bond_options_ns_ip6_target_set(struct bonding *bond, int slot, struct slave *slave;
if (slot >= 0 && slot < BOND_MAX_NS_TARGETS) { - bond_for_each_slave(bond, slave, iter) + bond_for_each_slave(bond, slave, iter) { slave->target_last_arp_rx[slot] = last_rx; + slave_set_ns_maddr(bond, slave, target, &targets[slot]); + } targets[slot] = *target; } } @@ -1280,15 +1345,30 @@ static int bond_option_ns_ip6_targets_set(struct bonding *bond, { return -EPERM; } + +static void slave_set_ns_maddrs(struct bonding *bond, struct slave *slave, bool add) {} + +void bond_slave_ns_maddrs_add(struct bonding *bond, struct slave *slave) {} + +void bond_slave_ns_maddrs_del(struct bonding *bond, struct slave *slave) {} #endif
static int bond_option_arp_validate_set(struct bonding *bond, const struct bond_opt_value *newval) { + bool changed = !!bond->params.arp_validate != !!newval->value; + struct list_head *iter; + struct slave *slave; + netdev_dbg(bond->dev, "Setting arp_validate to %s (%llu)\n", newval->string, newval->value); bond->params.arp_validate = newval->value;
+ if (changed) { + bond_for_each_slave(bond, slave, iter) + slave_set_ns_maddrs(bond, slave, !!bond->params.arp_validate); + } + return 0; }
diff --git a/include/net/bond_options.h b/include/net/bond_options.h index 69292ecc03257..f631d9f099410 100644 --- a/include/net/bond_options.h +++ b/include/net/bond_options.h @@ -160,5 +160,7 @@ void bond_option_arp_ip_targets_clear(struct bonding *bond); #if IS_ENABLED(CONFIG_IPV6) void bond_option_ns_ip6_targets_clear(struct bonding *bond); #endif +void bond_slave_ns_maddrs_add(struct bonding *bond, struct slave *slave); +void bond_slave_ns_maddrs_del(struct bonding *bond, struct slave *slave);
#endif /* _NET_BOND_OPTIONS_H */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Harith G harith.g@alifsemi.com
[ Upstream commit ed6cbe6e5563452f305e89c15846820f2874e431 ]
The patchset introducing the kernel_sec_start/end variables to separate the kernel/lowmem memory mappings broke the mapping of the kernel memory for XIP kernels.
The kernel_sec_start/end variables are in the RO area before the MMU is switched on for XIP kernels, so they cannot be set early in boot in head.S. Fix this by setting them after the MMU is switched on. XIP kernels need two different mappings for kernel text (starting at CONFIG_XIP_PHYS_ADDR) and data (starting at CONFIG_PHYS_OFFSET). Also, move the kernel code mapping from devicemaps_init() to map_kernel().
Fixes: a91da5457085 ("ARM: 9089/1: Define kernel physical section start and end") Signed-off-by: Harith George harith.g@alifsemi.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/kernel/head.S | 8 ++++++-- arch/arm/mm/mmu.c | 34 +++++++++++++++++++++------------- 2 files changed, 27 insertions(+), 15 deletions(-)
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index 1ec35f065617e..28873cda464f5 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -252,11 +252,15 @@ __create_page_tables: */ add r0, r4, #KERNEL_OFFSET >> (SECTION_SHIFT - PMD_ENTRY_ORDER) ldr r6, =(_end - 1) + + /* For XIP, kernel_sec_start/kernel_sec_end are currently in RO memory */ +#ifndef CONFIG_XIP_KERNEL adr_l r5, kernel_sec_start @ _pa(kernel_sec_start) #if defined CONFIG_CPU_ENDIAN_BE8 || defined CONFIG_CPU_ENDIAN_BE32 str r8, [r5, #4] @ Save physical start of kernel (BE) #else str r8, [r5] @ Save physical start of kernel (LE) +#endif #endif orr r3, r8, r7 @ Add the MMU flags add r6, r4, r6, lsr #(SECTION_SHIFT - PMD_ENTRY_ORDER) @@ -264,6 +268,7 @@ __create_page_tables: add r3, r3, #1 << SECTION_SHIFT cmp r0, r6 bls 1b +#ifndef CONFIG_XIP_KERNEL eor r3, r3, r7 @ Remove the MMU flags adr_l r5, kernel_sec_end @ _pa(kernel_sec_end) #if defined CONFIG_CPU_ENDIAN_BE8 || defined CONFIG_CPU_ENDIAN_BE32 @@ -271,8 +276,7 @@ __create_page_tables: #else str r3, [r5] @ Save physical end of kernel (LE) #endif - -#ifdef CONFIG_XIP_KERNEL +#else /* * Map the kernel image separately as it is not located in RAM. */ diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c index 674ed71573a84..073de5b24560d 100644 --- a/arch/arm/mm/mmu.c +++ b/arch/arm/mm/mmu.c @@ -1402,18 +1402,6 @@ static void __init devicemaps_init(const struct machine_desc *mdesc) create_mapping(&map); }
- /* - * Map the kernel if it is XIP. - * It is always first in the modulearea. - */ -#ifdef CONFIG_XIP_KERNEL - map.pfn = __phys_to_pfn(CONFIG_XIP_PHYS_ADDR & SECTION_MASK); - map.virtual = MODULES_VADDR; - map.length = ((unsigned long)_exiprom - map.virtual + ~SECTION_MASK) & SECTION_MASK; - map.type = MT_ROM; - create_mapping(&map); -#endif - /* * Map the cache flushing regions. */ @@ -1603,12 +1591,27 @@ static void __init map_kernel(void) * This will only persist until we turn on proper memory management later on * and we remap the whole kernel with page granularity. */ +#ifdef CONFIG_XIP_KERNEL + phys_addr_t kernel_nx_start = kernel_sec_start; +#else phys_addr_t kernel_x_start = kernel_sec_start; phys_addr_t kernel_x_end = round_up(__pa(__init_end), SECTION_SIZE); phys_addr_t kernel_nx_start = kernel_x_end; +#endif phys_addr_t kernel_nx_end = kernel_sec_end; struct map_desc map;
+ /* + * Map the kernel if it is XIP. + * It is always first in the modulearea. + */ +#ifdef CONFIG_XIP_KERNEL + map.pfn = __phys_to_pfn(CONFIG_XIP_PHYS_ADDR & SECTION_MASK); + map.virtual = MODULES_VADDR; + map.length = ((unsigned long)_exiprom - map.virtual + ~SECTION_MASK) & SECTION_MASK; + map.type = MT_ROM; + create_mapping(&map); +#else map.pfn = __phys_to_pfn(kernel_x_start); map.virtual = __phys_to_virt(kernel_x_start); map.length = kernel_x_end - kernel_x_start; @@ -1618,7 +1621,7 @@ static void __init map_kernel(void) /* If the nx part is small it may end up covered by the tail of the RWX section */ if (kernel_x_end == kernel_nx_end) return; - +#endif map.pfn = __phys_to_pfn(kernel_nx_start); map.virtual = __phys_to_virt(kernel_nx_start); map.length = kernel_nx_end - kernel_nx_start; @@ -1763,6 +1766,11 @@ void __init paging_init(const struct machine_desc *mdesc) { void *zero_page;
+#ifdef CONFIG_XIP_KERNEL + /* Store the kernel RW RAM region start/end in these variables */ + kernel_sec_start = CONFIG_PHYS_OFFSET & SECTION_MASK; + kernel_sec_end = round_up(__pa(_end), SECTION_SIZE); +#endif pr_debug("physical kernel sections: 0x%08llx-0x%08llx\n", kernel_sec_start, kernel_sec_end);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Motiejus Jakštys motiejus@jakstys.lt
[ Upstream commit a39326767c55c00c7c313333404cbcb502cce8fe ]
Add a missing semicolon.
Link: https://lkml.kernel.org/r/20241112171655.1662670-1-motiejus@jakstys.lt Fixes: ece5897e5a10 ("tools/mm: -Werror fixes in page-types/slabinfo") Signed-off-by: Motiejus Jakštys motiejus@jakstys.lt Closes: https://github.com/NixOS/nixpkgs/issues/355369 Reviewed-by: SeongJae Park sj@kernel.org Reviewed-by: Vishal Moola (Oracle) vishal.moola@gmail.com Acked-by: Oleksandr Natalenko oleksandr@natalenko.name Cc: Wladislav Wiebe wladislav.kw@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/mm/page-types.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/mm/page-types.c b/tools/mm/page-types.c index 2a4ca4dd2da80..69f00eab1b8c7 100644 --- a/tools/mm/page-types.c +++ b/tools/mm/page-types.c @@ -421,7 +421,7 @@ static void show_page(unsigned long voffset, unsigned long offset, if (opt_file) printf("%lx\t", voffset); if (opt_list_cgroup) - printf("@%" PRIu64 "\t", cgroup) + printf("@%" PRIu64 "\t", cgroup); if (opt_list_mapcnt) printf("%" PRIu64 "\t", mapcnt);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baoquan He bhe@redhat.com
commit 8d9ffb2fe65a6c4ef114e8d4f947958a12751bbe upstream.
The kdump kernel is broken on SME systems with CONFIG_IMA_KEXEC=y enabled. Debugging traced the issue back to
b69a2afd5afc ("x86/kexec: Carry forward IMA measurement log on kexec").
Testing was previously not conducted on SME systems with CONFIG_IMA_KEXEC enabled, which led to the oversight, with the following incarnation:
...
ima: No TPM chip found, activating TPM-bypass!
Loading compiled-in module X.509 certificates
Loaded X.509 cert 'Build time autogenerated kernel key: 18ae0bc7e79b64700122bb1d6a904b070fef2656'
ima: Allocated hash algorithm: sha256
Oops: general protection fault, probably for non-canonical address 0xcfacfdfe6660003e: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc2+ #14
Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.20.0 05/03/2023
RIP: 0010:ima_restore_measurement_list
Call Trace:
 <TASK>
 ? show_trace_log_lvl
 ? show_trace_log_lvl
 ? ima_load_kexec_buffer
 ? __die_body.cold
 ? die_addr
 ? exc_general_protection
 ? asm_exc_general_protection
 ? ima_restore_measurement_list
 ? vprintk_emit
 ? ima_load_kexec_buffer
 ima_load_kexec_buffer
 ima_init
 ? __pfx_init_ima
 init_ima
 ? __pfx_init_ima
 do_one_initcall
 do_initcalls
 ? __pfx_kernel_init
 kernel_init_freeable
 kernel_init
 ret_from_fork
 ? __pfx_kernel_init
 ret_from_fork_asm
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
...
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
Rebooting in 10 seconds..
Adding debug printks showed that the stored addr and size of ima_kexec buffer are not decrypted correctly like:
ima: ima_load_kexec_buffer, buffer:0xcfacfdfe6660003e, size:0xe48066052d5df359
Three types of setup_data info

  - SETUP_EFI,
  - SETUP_IMA, and
  - SETUP_RNG_SEED
are passed to the kexec/kdump kernel. Only the ima_kexec buffer experienced incorrect decryption. Debugging identified a bug in early_memremap_is_setup_data(): the range calculation is wrong because the len variable in struct setup_data only represents the length of the data field, excluding the struct's own size, so the computed range ends up too short.
Address a similar issue in memremap_is_setup_data() while at it.
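A standalone sketch of the corrected containment check (struct setup_data_hdr below is a simplified stand-in for the real boot header): the mapped blob covers the header plus data->len bytes, so the upper bound must include sizeof(struct setup_data):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Simplified stand-in for the x86 boot header, illustration only. */
  struct setup_data_hdr {
      uint64_t next;
      uint32_t type;
      uint32_t len;   /* length of the data[] payload only */
  };

  /* Corrected check: the blob occupies [paddr, paddr + header + len). */
  static bool addr_in_setup_data(uint64_t phys_addr, uint64_t paddr, uint32_t len)
  {
      return phys_addr > paddr &&
             phys_addr < paddr + sizeof(struct setup_data_hdr) + len;
  }

  int main(void)
  {
      uint64_t paddr = 0x1000;
      uint32_t len = 0x40;               /* payload length, excludes the header */
      uint64_t probe = paddr + len + 8;  /* inside the blob, past paddr + len */

      printf("old check: %d\n", probe > paddr && probe < paddr + len);
      printf("new check: %d\n", addr_in_setup_data(probe, paddr, len));
      return 0;
  }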
[ bp: Heavily massage. ]
Fixes: b3c72fc9a78e ("x86/boot: Introduce setup_indirect") Signed-off-by: Baoquan He bhe@redhat.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Acked-by: Tom Lendacky thomas.lendacky@amd.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240911081615.262202-3-bhe@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/mm/ioremap.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -655,7 +655,8 @@ static bool memremap_is_setup_data(resou paddr_next = data->next; len = data->len;
- if ((phys_addr > paddr) && (phys_addr < (paddr + len))) { + if ((phys_addr > paddr) && + (phys_addr < (paddr + sizeof(struct setup_data) + len))) { memunmap(data); return true; } @@ -717,7 +718,8 @@ static bool __init early_memremap_is_set paddr_next = data->next; len = data->len;
- if ((phys_addr > paddr) && (phys_addr < (paddr + len))) { + if ((phys_addr > paddr) && + (phys_addr < (paddr + sizeof(struct setup_data) + len))) { early_memunmap(data, sizeof(*data)); return true; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jinjiang Tu tujinjiang@huawei.com
commit 8ce41b0f9d77cca074df25afd39b86e2ee3aa68e upstream.
We triggered a NULL pointer dereference for ac.preferred_zoneref->zone in alloc_pages_bulk_noprof() when the task is migrated between cpusets.
When cpuset is enabled, in prepare_alloc_pages(), ac->nodemask may be &current->mems_allowed. When first_zones_zonelist() is called to find preferred_zoneref, the ac->nodemask may be modified concurrently if the task is migrated between different cpusets. Assuming we have 2 NUMA nodes, when traversing Node1 in ac->zonelist, the nodemask is 2, and when traversing Node2 in ac->zonelist, the nodemask is 1. As a result, ac->preferred_zoneref points to a NULL zone.
In alloc_pages_bulk_noprof(), for_each_zone_zonelist_nodemask() finds an allowable zone and calls zonelist_node_idx(ac.preferred_zoneref), leading to a NULL pointer dereference.
__alloc_pages_noprof() fixes this issue by checking NULL pointer in commit ea57485af8f4 ("mm, page_alloc: fix check for NULL preferred_zone") and commit df76cee6bbeb ("mm, page_alloc: remove redundant checks from alloc fastpath").
To fix it, check NULL pointer for preferred_zoneref->zone.
Link: https://lkml.kernel.org/r/20241113083235.166798-1-tujinjiang@huawei.com Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator") Signed-off-by: Jinjiang Tu tujinjiang@huawei.com Reviewed-by: Vlastimil Babka vbabka@suse.cz Cc: Alexander Lobakin alobakin@pm.me Cc: David Hildenbrand david@redhat.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Mel Gorman mgorman@techsingularity.net Cc: Nanyong Sun sunnanyong@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/page_alloc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4301,7 +4301,8 @@ unsigned long __alloc_pages_bulk(gfp_t g gfp = alloc_gfp;
/* Find an allowed local zone that meets the low watermark. */ - for_each_zone_zonelist_nodemask(zone, z, ac.zonelist, ac.highest_zoneidx, ac.nodemask) { + z = ac.preferred_zoneref; + for_next_zone_zonelist_nodemask(zone, z, ac.highest_zoneidx, ac.nodemask) { unsigned long mark;
if (cpusets_enabled() && (alloc_flags & ALLOC_CPUSET) &&
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Antipov dmantipov@yandex.ru
commit 737f34137844d6572ab7d473c998c7f977ff30eb upstream.
Syzbot has reported the following BUG:
kernel BUG at fs/ocfs2/uptodate.c:509!
...
Call Trace:
 <TASK>
 ? __die_body+0x5f/0xb0
 ? die+0x9e/0xc0
 ? do_trap+0x15a/0x3a0
 ? ocfs2_set_new_buffer_uptodate+0x145/0x160
 ? do_error_trap+0x1dc/0x2c0
 ? ocfs2_set_new_buffer_uptodate+0x145/0x160
 ? __pfx_do_error_trap+0x10/0x10
 ? handle_invalid_op+0x34/0x40
 ? ocfs2_set_new_buffer_uptodate+0x145/0x160
 ? exc_invalid_op+0x38/0x50
 ? asm_exc_invalid_op+0x1a/0x20
 ? ocfs2_set_new_buffer_uptodate+0x2e/0x160
 ? ocfs2_set_new_buffer_uptodate+0x144/0x160
 ? ocfs2_set_new_buffer_uptodate+0x145/0x160
 ocfs2_group_add+0x39f/0x15a0
 ? __pfx_ocfs2_group_add+0x10/0x10
 ? __pfx_lock_acquire+0x10/0x10
 ? mnt_get_write_access+0x68/0x2b0
 ? __pfx_lock_release+0x10/0x10
 ? rcu_read_lock_any_held+0xb7/0x160
 ? __pfx_rcu_read_lock_any_held+0x10/0x10
 ? smack_log+0x123/0x540
 ? mnt_get_write_access+0x68/0x2b0
 ? mnt_get_write_access+0x68/0x2b0
 ? mnt_get_write_access+0x226/0x2b0
 ocfs2_ioctl+0x65e/0x7d0
 ? __pfx_ocfs2_ioctl+0x10/0x10
 ? smack_file_ioctl+0x29e/0x3a0
 ? __pfx_smack_file_ioctl+0x10/0x10
 ? lockdep_hardirqs_on_prepare+0x43d/0x780
 ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
 ? __pfx_ocfs2_ioctl+0x10/0x10
 __se_sys_ioctl+0xfb/0x170
 do_syscall_64+0xf3/0x230
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
 </TASK>
When 'ioctl(OCFS2_IOC_GROUP_ADD, ...)' has failed for the particular inode in 'ocfs2_verify_group_and_input()', the corresponding buffer head remains cached, and a subsequent call to the same 'ioctl()' for the same inode hits the BUG() in 'ocfs2_set_new_buffer_uptodate()' (trying to cache the same buffer head of that inode). Fix this by uncaching the buffer head with 'ocfs2_remove_from_cache()' on the error path in 'ocfs2_group_add()'.
Link: https://lkml.kernel.org/r/20241114043844.111847-1-dmantipov@yandex.ru Fixes: 7909f2bf8353 ("[PATCH 2/2] ocfs2: Implement group add for online resize") Signed-off-by: Dmitry Antipov dmantipov@yandex.ru Reported-by: syzbot+453873f1588c2d75b447@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=453873f1588c2d75b447 Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Cc: Dmitry Antipov dmantipov@yandex.ru Cc: Joel Becker jlbec@evilplan.org Cc: Mark Fasheh mark@fasheh.com Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Changwei Ge gechangwei@live.cn Cc: Jun Piao piaojun@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ocfs2/resize.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/ocfs2/resize.c +++ b/fs/ocfs2/resize.c @@ -566,6 +566,8 @@ out_commit: ocfs2_commit_trans(osb, handle);
out_free_group_bh: + if (ret < 0) + ocfs2_remove_from_cache(INODE_CACHE(inode), group_bh); brelse(group_bh);
out_unlock:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrew Morton akpm@linux-foundation.org
commit d1aa0c04294e29883d65eac6c2f72fe95cc7c049 upstream.
Revert d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()") as suggested by Chuck [1]. It is causing deadlocks when accessing tmpfs over NFS.
As Hugh commented, "added just to silence a syzbot sanitizer splat: added where there has never been any practical problem".
Link: https://lkml.kernel.org/r/ZzdxKF39VEmXSSyN@tissot.1015granger.net [1] Fixes: d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()") Acked-by: Hugh Dickins hughd@google.com Cc: Chuck Lever chuck.lever@oracle.com Cc: Jeongjun Park aha310510@gmail.com Cc: Yu Zhao yuzhao@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/shmem.c | 2 -- 1 file changed, 2 deletions(-)
--- a/mm/shmem.c +++ b/mm/shmem.c @@ -1158,9 +1158,7 @@ static int shmem_getattr(struct mnt_idma stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); - inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); - inode_unlock_shared(inode);
if (shmem_is_huge(inode, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Philipp Stanner pstanner@redhat.com
commit 0b364cf53b20204e92bac7c6ebd1ee7d3ec62931 upstream.
In psnet_open_pf_bar() and snet_open_vf_bar() a string later passed to pcim_iomap_regions() is placed on the stack. Neither pcim_iomap_regions() nor the functions it calls copy that string.
Should the string ever be used later, this causes undefined behavior since the stack frame will by then have disappeared.
Fix the bug by allocating the strings on the heap through devm_kasprintf().
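A rough userspace analogue of the bug and the fix (register_region() below is a hypothetical stand-in for pcim_iomap_regions(), which keeps the pointer it is given rather than copying the string):

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <stdlib.h>

  static const char *stored_name;

  /* Stand-in for pcim_iomap_regions(): stores the pointer it is given. */
  static void register_region(const char *name)
  {
      stored_name = name;
  }

  /* Buggy pattern: the name lives in this function's stack frame. */
  static void open_bar_stack(const char *devname)
  {
      char name[50];

      snprintf(name, sizeof(name), "psnet[%s]-bars", devname);
      register_region(name);  /* stored_name dangles after return */
  }

  /* Fixed pattern: allocate the name on the heap so it outlives the call;
   * devm_kasprintf() additionally ties its lifetime to the device. */
  static void open_bar_heap(const char *devname)
  {
      char *name;

      if (asprintf(&name, "psnet[%s]-bars", devname) < 0)
          return;
      register_region(name);
  }

  int main(void)
  {
      open_bar_stack("0000:01:00.0");
      /* stored_name now points into a dead stack frame; using it
       * would be undefined behavior. */
      open_bar_heap("0000:01:00.0");
      printf("stored name: %s\n", stored_name);
      return 0;
  }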
Cc: stable@vger.kernel.org # v6.3 Fixes: 51a8f9d7f587 ("virtio: vdpa: new SolidNET DPU driver.") Reported-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Closes: https://lore.kernel.org/all/74e9109a-ac59-49e2-9b1d-d825c9c9f891@wanadoo.fr/ Suggested-by: Andy Shevchenko andy@kernel.org Signed-off-by: Philipp Stanner pstanner@redhat.com Reviewed-by: Stefano Garzarella sgarzare@redhat.com Message-Id: 20241028074357.9104-3-pstanner@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vdpa/solidrun/snet_main.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/vdpa/solidrun/snet_main.c b/drivers/vdpa/solidrun/snet_main.c index 99428a04068d..c8b74980dbd1 100644 --- a/drivers/vdpa/solidrun/snet_main.c +++ b/drivers/vdpa/solidrun/snet_main.c @@ -555,7 +555,7 @@ static const struct vdpa_config_ops snet_config_ops = {
static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet) { - char name[50]; + char *name; int ret, i, mask = 0; /* We don't know which BAR will be used to communicate.. * We will map every bar with len > 0. @@ -573,7 +573,10 @@ static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet) return -ENODEV; }
- snprintf(name, sizeof(name), "psnet[%s]-bars", pci_name(pdev)); + name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "psnet[%s]-bars", pci_name(pdev)); + if (!name) + return -ENOMEM; + ret = pcim_iomap_regions(pdev, mask, name); if (ret) { SNET_ERR(pdev, "Failed to request and map PCI BARs\n"); @@ -590,10 +593,13 @@ static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet)
static int snet_open_vf_bar(struct pci_dev *pdev, struct snet *snet) { - char name[50]; + char *name; int ret;
- snprintf(name, sizeof(name), "snet[%s]-bar", pci_name(pdev)); + name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "snet[%s]-bars", pci_name(pdev)); + if (!name) + return -ENOMEM; + /* Request and map BAR */ ret = pcim_iomap_regions(pdev, BIT(snet->psnet->cfg.vf_bar), name); if (ret) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Si-Wei Liu si-wei.liu@oracle.com
commit 29ce8b8a4fa74e841342c8b8f8941848a3c6f29f upstream.
When calculating the physical address range based on the iotlb and mr [start,end) ranges, the offset of mr->start relative to map->start is not taken into account. This leads to some incorrect and duplicate mappings.
For the case when mr->start < map->start the code is already correct: the range in [mr->start, map->start) was handled by a different iteration.
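A standalone arithmetic sketch of the corrected walk (maplen() below is a simplified stand-in for the driver's overlap-length helper), showing the physical range before and after accounting for the offset of mr->start into the map:

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  struct iotlb_map { uint64_t start, last, addr; }; /* IOVA range and its PA */
  struct region    { uint64_t start, end; };        /* mr being registered   */

  /* Simplified stand-in for the driver's overlap-length helper. */
  static uint64_t maplen(const struct iotlb_map *map, const struct region *mr)
  {
      uint64_t lo = map->start > mr->start ? map->start : mr->start;
      uint64_t hi = map->last + 1 < mr->end ? map->last + 1 : mr->end;

      return hi - lo;
  }

  int main(void)
  {
      struct iotlb_map map = { .start = 0x10000, .last = 0x1ffff, .addr = 0x800000 };
      struct region    mr  = { .start = 0x14000, .end = 0x18000 };

      /* Fixed walk: skip the part of the map that lies before mr->start. */
      uint64_t offset = mr.start > map.start ? mr.start - map.start : 0;
      uint64_t pa     = map.addr + offset;
      uint64_t paend  = map.addr + offset + maplen(&map, &mr);

      printf("old range: [0x%" PRIx64 ", 0x%" PRIx64 ")\n",
             map.addr, map.addr + maplen(&map, &mr));
      printf("new range: [0x%" PRIx64 ", 0x%" PRIx64 ")\n", pa, paend);
      return 0;
  }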
Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code") Cc: stable@vger.kernel.org Signed-off-by: Si-Wei Liu si-wei.liu@oracle.com Signed-off-by: Dragos Tatulea dtatulea@nvidia.com Message-Id: 20241021134040.975221-2-dtatulea@nvidia.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vdpa/mlx5/core/mr.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -232,7 +232,7 @@ static int map_direct_mr(struct mlx5_vdp struct page *pg; unsigned int nsg; int sglen; - u64 pa; + u64 pa, offset; u64 paend; struct scatterlist *sg; struct device *dma = mvdev->vdev.dma_dev; @@ -255,8 +255,10 @@ static int map_direct_mr(struct mlx5_vdp sg = mr->sg_head.sgl; for (map = vhost_iotlb_itree_first(iotlb, mr->start, mr->end - 1); map; map = vhost_iotlb_itree_next(map, mr->start, mr->end - 1)) { - paend = map->addr + maplen(map, mr); - for (pa = map->addr; pa < paend; pa += sglen) { + offset = mr->start > map->start ? mr->start - map->start : 0; + pa = map->addr + offset; + paend = map->addr + offset + maplen(map, mr); + for (; pa < paend; pa += sglen) { pg = pfn_to_page(__phys_to_pfn(pa)); if (!sg) { mlx5_vdpa_warn(mvdev, "sg null. start 0x%llx, end 0x%llx\n",
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiaoguang Wang lege.wang@jaguarmicro.com
commit 4e39ecadf1d2a08187139619f1f314b64ba7d947 upstream.
Allocate one extra virtio_device_id as null terminator, otherwise vdpa_mgmtdev_get_classes() may iterate multiple times and visit undefined memory.
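A small userspace sketch of why the terminator matters, assuming the consumer walks the table until an entry with a zero device id (struct virtio_device_id_stub and count_ids() are illustrative stand-ins, not the real kernel definitions):

  #include <stdio.h>
  #include <stdlib.h>

  struct virtio_device_id_stub {
      unsigned int device;
      unsigned int vendor;
  };

  /* Stand-in for a consumer such as vdpa_mgmtdev_get_classes(): it is
   * assumed to walk the table until it hits an entry with device == 0. */
  static int count_ids(const struct virtio_device_id_stub *table)
  {
      int n = 0;

      while (table[n].device)
          n++;
      return n;
  }

  int main(void)
  {
      /* Fixed pattern: allocate one extra zeroed entry as the terminator;
       * a single-element allocation would make count_ids() read past the
       * end of the buffer. */
      struct virtio_device_id_stub *tbl = calloc(2, sizeof(*tbl));

      if (!tbl)
          return 1;

      tbl[0].device = 1;          /* hypothetical device id */
      tbl[0].vendor = 0x1af4;

      printf("entries: %d\n", count_ids(tbl));
      free(tbl);
      return 0;
  }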
Fixes: ffbda8e9df10 ("vdpa/vp_vdpa : add vdpa tool support in vp_vdpa") Cc: stable@vger.kernel.org Suggested-by: Parav Pandit parav@nvidia.com Signed-off-by: Angus Chen angus.chen@jaguarmicro.com Signed-off-by: Xiaoguang Wang lege.wang@jaguarmicro.com Message-Id: 20241105133518.1494-1-lege.wang@jaguarmicro.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Parav Pandit parav@nvidia.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vdpa/virtio_pci/vp_vdpa.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/vdpa/virtio_pci/vp_vdpa.c +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c @@ -591,7 +591,11 @@ static int vp_vdpa_probe(struct pci_dev goto mdev_err; }
- mdev_id = kzalloc(sizeof(struct virtio_device_id), GFP_KERNEL); + /* + * id_table should be a null terminated array, so allocate one additional + * entry here, see vdpa_mgmtdev_get_classes(). + */ + mdev_id = kcalloc(2, sizeof(struct virtio_device_id), GFP_KERNEL); if (!mdev_id) { err = -ENOMEM; goto mdev_id_err; @@ -611,8 +615,8 @@ static int vp_vdpa_probe(struct pci_dev goto probe_err; }
- mdev_id->device = mdev->id.device; - mdev_id->vendor = mdev->id.vendor; + mdev_id[0].device = mdev->id.device; + mdev_id[0].vendor = mdev->id.vendor; mgtdev->id_table = mdev_id; mgtdev->max_supported_vqs = vp_modern_get_num_queues(mdev); mgtdev->supported_features = vp_modern_get_features(mdev);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Samasth Norway Ananda samasth.norway.ananda@oracle.com
commit 923168a0631bc42fffd55087b337b1b6c54dcff5 upstream.
Function ima_eventdigest_init() calls ima_eventdigest_init_common() with HASH_ALGO__LAST, which is then used to index the hash_digest_size[] array, leading to a buffer overrun. Add a conditional statement to handle this.
Fixes: 9fab303a2cb3 ("ima: fix violation measurement list record") Signed-off-by: Samasth Norway Ananda samasth.norway.ananda@oracle.com Tested-by: Enrico Bravi (PhD at polito.it) enrico.bravi@huawei.com Cc: stable@vger.kernel.org # 5.19+ Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/integrity/ima/ima_template_lib.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
--- a/security/integrity/ima/ima_template_lib.c +++ b/security/integrity/ima/ima_template_lib.c @@ -318,15 +318,21 @@ static int ima_eventdigest_init_common(c hash_algo_name[hash_algo]); }
- if (digest) + if (digest) { memcpy(buffer + offset, digest, digestsize); - else + } else { /* * If digest is NULL, the event being recorded is a violation. * Make room for the digest by increasing the offset by the - * hash algorithm digest size. + * hash algorithm digest size. If the hash algorithm is not + * specified increase the offset by IMA_DIGEST_SIZE which + * fits SHA1 or MD5 */ - offset += hash_digest_size[hash_algo]; + if (hash_algo < HASH_ALGO__LAST) + offset += hash_digest_size[hash_algo]; + else + offset += IMA_DIGEST_SIZE; + }
return ima_write_template_field_data(buffer, offset + digestsize, fmt, field_data);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
commit 2657b82a78f18528bef56dc1b017158490970873 upstream.
When getting the current VPID, e.g. to emulate a guest TLB flush, return vpid01 if L2 is running but with VPID disabled, i.e. if VPID is disabled in vmcs12. Architecturally, if VPID is disabled, then the guest and host effectively share VPID=0. KVM emulates this behavior by using vpid01 when running an L2 with VPID disabled (see prepare_vmcs02_early_rare()), and so KVM must also treat vpid01 as the current VPID while L2 is active.
Unconditionally treating vpid02 as the current VPID when L2 is active causes KVM to flush TLB entries for vpid02 instead of vpid01, which results in TLB entries from L1 being incorrectly preserved across nested VM-Enter to L2 (L2=>L1 isn't problematic, because the TLB flush after nested VM-Exit flushes vpid01).
The bug manifests as failures in the vmx_apicv_test KVM-Unit-Test, as KVM incorrectly retains TLB entries for the APIC-access page across a nested VM-Enter.
Opportunistically add comments at various touchpoints to explain the architectural requirements, and also why KVM uses vpid01 instead of vpid02.
All credit goes to Chao, who root caused the issue and identified the fix.
Link: https://lore.kernel.org/all/ZwzczkIlYGX+QXJz@intel.com Fixes: 2b4a5a5d5688 ("KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUEST") Cc: stable@vger.kernel.org Cc: Like Xu like.xu.linux@gmail.com Debugged-by: Chao Gao chao.gao@intel.com Reviewed-by: Chao Gao chao.gao@intel.com Tested-by: Chao Gao chao.gao@intel.com Link: https://lore.kernel.org/r/20241031202011.1580522-1-seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/nested.c | 30 +++++++++++++++++++++++++----- arch/x86/kvm/vmx/vmx.c | 2 +- 2 files changed, 26 insertions(+), 6 deletions(-)
--- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -1150,11 +1150,14 @@ static void nested_vmx_transition_tlb_fl kvm_make_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
/* - * If vmcs12 doesn't use VPID, L1 expects linear and combined mappings - * for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a - * full TLB flush from the guest's perspective. This is required even - * if VPID is disabled in the host as KVM may need to synchronize the - * MMU in response to the guest TLB flush. + * If VPID is disabled, then guest TLB accesses use VPID=0, i.e. the + * same VPID as the host, and so architecturally, linear and combined + * mappings for VPID=0 must be flushed at VM-Enter and VM-Exit. KVM + * emulates L2 sharing L1's VPID=0 by using vpid01 while running L2, + * and so KVM must also emulate TLB flush of VPID=0, i.e. vpid01. This + * is required if VPID is disabled in KVM, as a TLB flush (there are no + * VPIDs) still occurs from L1's perspective, and KVM may need to + * synchronize the MMU in response to the guest TLB flush. * * Note, using TLB_FLUSH_GUEST is correct even if nested EPT is in use. * EPT is a special snowflake, as guest-physical mappings aren't @@ -2229,6 +2232,17 @@ static void prepare_vmcs02_early_rare(st
vmcs_write64(VMCS_LINK_POINTER, INVALID_GPA);
+ /* + * If VPID is disabled, then guest TLB accesses use VPID=0, i.e. the + * same VPID as the host. Emulate this behavior by using vpid01 for L2 + * if VPID is disabled in vmcs12. Note, if VPID is disabled, VM-Enter + * and VM-Exit are architecturally required to flush VPID=0, but *only* + * VPID=0. I.e. using vpid02 would be ok (so long as KVM emulates the + * required flushes), but doing so would cause KVM to over-flush. E.g. + * if L1 runs L2 X with VPID12=1, then runs L2 Y with VPID12 disabled, + * and then runs L2 X again, then KVM can and should retain TLB entries + * for VPID12=1. + */ if (enable_vpid) { if (nested_cpu_has_vpid(vmcs12) && vmx->nested.vpid02) vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->nested.vpid02); @@ -5827,6 +5841,12 @@ static int handle_invvpid(struct kvm_vcp return nested_vmx_fail(vcpu, VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+ /* + * Always flush the effective vpid02, i.e. never flush the current VPID + * and never explicitly flush vpid01. INVVPID targets a VPID, not a + * VMCS, and so whether or not the current vmcs12 has VPID enabled is + * irrelevant (and there may not be a loaded vmcs12). + */ vpid02 = nested_get_vpid02(vcpu); switch (type) { case VMX_VPID_EXTENT_INDIVIDUAL_ADDR: --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3193,7 +3193,7 @@ static void vmx_flush_tlb_all(struct kvm
 static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu)
 {
-	if (is_guest_mode(vcpu))
+	if (is_guest_mode(vcpu) && nested_cpu_has_vpid(get_vmcs12(vcpu)))
 		return nested_get_vpid02(vcpu);
 	return to_vmx(vcpu)->vpid;
 }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
commit d3ddef46f22e8c3124e0df1f325bc6a18dadff39 upstream.
Always set irr_pending (to true) when updating APICv status to fix a bug where KVM fails to set irr_pending when userspace sets APIC state and APICv is disabled, which ultimately results in KVM failing to inject the pending interrupt(s) that userspace stuffed into the vIRR, until another interrupt happens to be emulated by KVM.
Only the APICv-disabled case is flawed, as KVM forces apic->irr_pending to be true if APICv is enabled, because not all vIRR updates will be visible to KVM.
Hit the bug with a big hammer, even though strictly speaking KVM can scan the vIRR and set/clear irr_pending as appropriate for this specific case. The bug was introduced by commit 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it"), which as the shortlog suggests, deleted code that updated irr_pending.
Before that commit, kvm_apic_update_apicv() did indeed scan the vIRR, with the crucial difference that kvm_apic_update_apicv() did the scan even when APICv was being *disabled*, e.g. due to an AVIC inhibition.
	struct kvm_lapic *apic = vcpu->arch.apic;

	if (vcpu->arch.apicv_active) {
		/* irr_pending is always true when apicv is activated. */
		apic->irr_pending = true;
		apic->isr_count = 1;
	} else {
		apic->irr_pending = (apic_search_irr(apic) != -1);
		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
	}
And _that_ bug (clearing irr_pending) was introduced by commit b26a695a1d78 ("kvm: lapic: Introduce APICv update helper function"), prior to which KVM unconditionally set irr_pending to true in kvm_apic_set_state(), i.e. assumed that the new virtual APIC state could have a pending IRQ.
Furthermore, in addition to introducing this issue, commit 755c2bf87860 also papered over the underlying bug: KVM doesn't ensure CPUs and devices see APICv as disabled prior to searching the IRR. Waiting until KVM emulates an EOI to update irr_pending "works", but only because KVM won't emulate EOI until after refresh_apicv_exec_ctrl(), and there are plenty of memory barriers in between. I.e. leaving irr_pending set is basically hacking around bad ordering.
So, effectively revert to the pre-b26a695a1d78 behavior for state restore, even though it's sub-optimal if no IRQs are pending, in order to provide a minimal fix, but leave behind a FIXME to document the ugliness. With luck, the ordering issue will be fixed and the mess will be cleaned up in the not-too-distant future.
Fixes: 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it") Cc: stable@vger.kernel.org Cc: Maxim Levitsky mlevitsk@redhat.com Reported-by: Yong He zhuangel570@gmail.com Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40tencent.com Signed-off-by: Sean Christopherson seanjc@google.com Message-ID: 20241106015135.2462147-1-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/lapic.c | 29 ++++++++++++++++++----------- 1 file changed, 18 insertions(+), 11 deletions(-)
--- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2603,19 +2603,26 @@ void kvm_apic_update_apicv(struct kvm_vc { struct kvm_lapic *apic = vcpu->arch.apic;
- if (apic->apicv_active) { - /* irr_pending is always true when apicv is activated. */ - apic->irr_pending = true; + /* + * When APICv is enabled, KVM must always search the IRR for a pending + * IRQ, as other vCPUs and devices can set IRR bits even if the vCPU + * isn't running. If APICv is disabled, KVM _should_ search the IRR + * for a pending IRQ. But KVM currently doesn't ensure *all* hardware, + * e.g. CPUs and IOMMUs, has seen the change in state, i.e. searching + * the IRR at this time could race with IRQ delivery from hardware that + * still sees APICv as being enabled. + * + * FIXME: Ensure other vCPUs and devices observe the change in APICv + * state prior to updating KVM's metadata caches, so that KVM + * can safely search the IRR and set irr_pending accordingly. + */ + apic->irr_pending = true; + + if (apic->apicv_active) apic->isr_count = 1; - } else { - /* - * Don't clear irr_pending, searching the IRR can race with - * updates from the CPU as APICv is still active from hardware's - * perspective. The flag will be cleared as appropriate when - * KVM injects the interrupt. - */ + else apic->isr_count = count_vectors(apic->regs + APIC_ISR); - } + apic->highest_isr_cache = -1; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson seanjc@google.com
commit aa0d42cacf093a6fcca872edc954f6f812926a17 upstream.
Hide KVM's pt_mode module param behind CONFIG_BROKEN, i.e. disable support for virtualizing Intel PT via guest/host mode unless BROKEN=y. There are myriad bugs in the implementation, some of which are fatal to the guest, and others which put the stability and health of the host at risk.
For guest fatalities, the most glaring issue is that KVM fails to ensure tracing is disabled, and *stays* disabled prior to VM-Enter, which is necessary as hardware disallows loading (the guest's) RTIT_CTL if tracing is enabled (enforced via a VMX consistency check). Per the SDM:
If the logical processor is operating with Intel PT enabled (if IA32_RTIT_CTL.TraceEn = 1) at the time of VM entry, the "load IA32_RTIT_CTL" VM-entry control must be 0.
On the host side, KVM doesn't validate the guest CPUID configuration provided by userspace, and even worse, uses the guest configuration to decide what MSRs to save/load at VM-Enter and VM-Exit. E.g. configuring guest CPUID to enumerate more address ranges than are supported in hardware will result in KVM trying to passthrough, save, and load non-existent MSRs, which generates a variety of WARNs, ToPA ERRORs in the host, a potential deadlock, etc.
Fixes: f99e3daf94ff ("KVM: x86: Add Intel PT virtualization work mode") Cc: stable@vger.kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Sean Christopherson seanjc@google.com Reviewed-by: Xiaoyao Li xiaoyao.li@intel.com Tested-by: Adrian Hunter adrian.hunter@intel.com Message-ID: 20241101185031.1799556-2-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/vmx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -212,9 +212,11 @@ module_param(ple_window_shrink, uint, 04 static unsigned int ple_window_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX; module_param(ple_window_max, uint, 0444);
-/* Default is SYSTEM mode, 1 for host-guest mode */
+/* Default is SYSTEM mode, 1 for host-guest mode (which is BROKEN) */
 int __read_mostly pt_mode = PT_MODE_SYSTEM;
+#ifdef CONFIG_BROKEN
 module_param(pt_mode, int, S_IRUGO);
+#endif
static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush); static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_cond);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit cd45e963e44b0f10d90b9e6c0e8b4f47f3c92471 upstream.
Patch series "nilfs2: fix null-ptr-deref bugs on block tracepoints".
This series fixes null pointer dereference bugs that occur when using nilfs2 and two block-related tracepoints.
This patch (of 2):
It has been reported that when using "block:block_touch_buffer" tracepoint, touch_buffer() called from __nilfs_get_folio_block() causes a NULL pointer dereference, or a general protection fault when KASAN is enabled.
This happens because, since the tracepoint was added in touch_buffer(), it references the dev_t member bh->b_bdev->bd_dev regardless of whether the buffer head has a pointer to a block_device structure. In the current implementation, the block_device structure is set after the function returns to the caller.
Here, touch_buffer() is used to mark the folio/page that owns the buffer head as accessed, but the common search helper for folio/page used by the caller function was optimized to mark the folio/page as accessed when it was reimplemented a long time ago, eliminating the need to call touch_buffer() here in the first place.
So this solves the issue by eliminating the touch_buffer() call itself.
Link: https://lkml.kernel.org/r/20241106160811.3316-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20241106160811.3316-2-konishi.ryusuke@gmail.com Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint") Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: Ubisectech Sirius bugreport@valiantsec.com Closes: https://lkml.kernel.org/r/86bd3013-887e-4e38-960f-ca45c657f032.bugreport@val... Reported-by: syzbot+9982fb8d18eba905abe2@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9982fb8d18eba905abe2 Tested-by: syzbot+9982fb8d18eba905abe2@syzkaller.appspotmail.com Cc: Tejun Heo tj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/page.c | 1 - 1 file changed, 1 deletion(-)
--- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -39,7 +39,6 @@ __nilfs_get_page_block(struct page *page first_block = (unsigned long)index << (PAGE_SHIFT - blkbits); bh = nilfs_page_get_nth_block(page, block - first_block);
-	touch_buffer(bh);
 	wait_on_buffer(bh);
 	return bh;
 }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hajime Tazaki thehajime@gmail.com
commit 247d720b2c5d22f7281437fd6054a138256986ba upstream.
When deleting a vma entry from a maple tree, it has to pass NULL to vma_iter_prealloc() in order to calculate internal state of the tree, but it passed a wrong argument. As a result, nommu kernels crashed upon accessing a vma iterator, such as acct_collect() reading the size of vma entries after do_munmap().
This commit fixes the issue by passing the right argument to the preallocation call.
Link: https://lkml.kernel.org/r/20241108222834.3625217-1-thehajime@gmail.com Fixes: b5df09226450 ("mm: set up vma iterator for vma_iter_prealloc() calls") Signed-off-by: Hajime Tazaki thehajime@gmail.com Reviewed-by: Liam R. Howlett Liam.Howlett@Oracle.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/nommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/nommu.c +++ b/mm/nommu.c @@ -584,7 +584,7 @@ static int delete_vma_from_mm(struct vm_ VMA_ITERATOR(vmi, vma->vm_mm, vma->vm_start);
 	vma_iter_config(&vmi, vma->vm_start, vma->vm_end);
-	if (vma_iter_prealloc(&vmi, vma)) {
+	if (vma_iter_prealloc(&vmi, NULL)) {
 		pr_warn("Allocation of vma tree for process %d failed\n",
 			current->pid);
 		return -ENOMEM;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kailang Yang kailang@realtek.com
commit 42ee87df8530150d637aa48363b72b22a9bbd78f upstream.
The headset mic on Clevo platforms with the ALC255 codec is disabled by default. Assigning a verb table for the mic pin enables it.
Signed-off-by: Kailang Yang kailang@realtek.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/b2dcac3e09ef4f82b36d6712194e1ea4@realtek.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -11064,6 +11064,8 @@ static const struct snd_hda_pin_quirk al
 		{0x1a, 0x40000000}),
 	SND_HDA_PIN_QUIRK(0x10ec0256, 0x1043, "ASUS", ALC2XX_FIXUP_HEADSET_MIC,
 		{0x19, 0x40000000}),
+	SND_HDA_PIN_QUIRK(0x10ec0255, 0x1558, "Clevo", ALC2XX_FIXUP_HEADSET_MIC,
+		{0x19, 0x40000000}),
 	{}
 };
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maksym Glubokiy maxgl.kernel@gmail.com
commit 96409eeab8cdd394e03ec494ea9547edc27f7ab4 upstream.
HP EliteBook 645 G10 uses the ALC236 codec and needs the ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF quirk to make the mute LED and micmute LED work.
Signed-off-by: Maksym Glubokiy maxgl.kernel@gmail.com Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20241112154815.10888-1-maxgl.kernel@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -9996,6 +9996,7 @@ static const struct snd_pci_quirk alc269
 	SND_PCI_QUIRK(0x103c, 0x8b59, "HP Elite mt645 G7 Mobile Thin Client U89", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
 	SND_PCI_QUIRK(0x103c, 0x8b5d, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
 	SND_PCI_QUIRK(0x103c, 0x8b5e, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
+	SND_PCI_QUIRK(0x103c, 0x8b5f, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
 	SND_PCI_QUIRK(0x103c, 0x8b63, "HP Elite Dragonfly 13.5 inch G4", ALC245_FIXUP_CS35L41_SPI_4_HP_GPIO_LED),
 	SND_PCI_QUIRK(0x103c, 0x8b65, "HP ProBook 455 15.6 inch G10 Notebook PC", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
 	SND_PCI_QUIRK(0x103c, 0x8b66, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Antipov dmantipov@yandex.ru
commit 23aab037106d46e6168ce1214a958ce9bf317f2e upstream.
Syzbot has reported the following splat triggered by UBSAN:
UBSAN: shift-out-of-bounds in fs/ocfs2/super.c:2336:10 shift exponent 32768 is too large for 32-bit type 'int' CPU: 2 UID: 0 PID: 5255 Comm: repro Not tainted 6.12.0-rc4-syzkaller-00047-gc2ee9f594da8 #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x241/0x360 ? __pfx_dump_stack_lvl+0x10/0x10 ? __pfx__printk+0x10/0x10 ? __asan_memset+0x23/0x50 ? lockdep_init_map_type+0xa1/0x910 __ubsan_handle_shift_out_of_bounds+0x3c8/0x420 ocfs2_fill_super+0xf9c/0x5750 ? __pfx_ocfs2_fill_super+0x10/0x10 ? __pfx_validate_chain+0x10/0x10 ? __pfx_validate_chain+0x10/0x10 ? validate_chain+0x11e/0x5920 ? __lock_acquire+0x1384/0x2050 ? __pfx_validate_chain+0x10/0x10 ? string+0x26a/0x2b0 ? widen_string+0x3a/0x310 ? string+0x26a/0x2b0 ? bdev_name+0x2b1/0x3c0 ? pointer+0x703/0x1210 ? __pfx_pointer+0x10/0x10 ? __pfx_format_decode+0x10/0x10 ? __lock_acquire+0x1384/0x2050 ? vsnprintf+0x1ccd/0x1da0 ? snprintf+0xda/0x120 ? __pfx_lock_release+0x10/0x10 ? do_raw_spin_lock+0x14f/0x370 ? __pfx_snprintf+0x10/0x10 ? set_blocksize+0x1f9/0x360 ? sb_set_blocksize+0x98/0xf0 ? setup_bdev_super+0x4e6/0x5d0 mount_bdev+0x20c/0x2d0 ? __pfx_ocfs2_fill_super+0x10/0x10 ? __pfx_mount_bdev+0x10/0x10 ? vfs_parse_fs_string+0x190/0x230 ? __pfx_vfs_parse_fs_string+0x10/0x10 legacy_get_tree+0xf0/0x190 ? __pfx_ocfs2_mount+0x10/0x10 vfs_get_tree+0x92/0x2b0 do_new_mount+0x2be/0xb40 ? __pfx_do_new_mount+0x10/0x10 __se_sys_mount+0x2d6/0x3c0 ? __pfx___se_sys_mount+0x10/0x10 ? do_syscall_64+0x100/0x230 ? __x64_sys_mount+0x20/0xc0 do_syscall_64+0xf3/0x230 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f37cae96fda Code: 48 8b 0d 51 ce 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1e ce 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007fff6c1aa228 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00007fff6c1aa240 RCX: 00007f37cae96fda RDX: 00000000200002c0 RSI: 0000000020000040 RDI: 00007fff6c1aa240 RBP: 0000000000000004 R08: 00007fff6c1aa280 R09: 0000000000000000 R10: 00000000000008c0 R11: 0000000000000206 R12: 00000000000008c0 R13: 00007fff6c1aa280 R14: 0000000000000003 R15: 0000000001000000 </TASK>
For a really damaged superblock, the value of 'i_super.s_blocksize_bits' may exceed the maximum possible shift for an underlying 'int'. So add an extra check that the aforementioned field represents a valid block size, which is 512 bytes, 1K, 2K, or 4K.
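For reference, the acceptable sizes correspond to s_blocksize_bits values of 9 through 12 (block size = 1 << bits); a minimal sketch of the range check the patch introduces (the helper name is hypothetical, kernel-style types assumed):

	/* Valid ocfs2 block sizes are 512 bytes, 1K, 2K and 4K. */
	static bool ocfs2_blocksize_bits_valid(u32 bits)
	{
		return bits >= 9 && bits <= 12;	/* 1 << 9 = 512 ... 1 << 12 = 4096 */
	}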
Link: https://lkml.kernel.org/r/20241106092100.2661330-1-dmantipov@yandex.ru Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Signed-off-by: Dmitry Antipov dmantipov@yandex.ru Reported-by: syzbot+56f7cd1abe4b8e475180@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=56f7cd1abe4b8e475180 Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Cc: Mark Fasheh mark@fasheh.com Cc: Joel Becker jlbec@evilplan.org Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Changwei Ge gechangwei@live.cn Cc: Jun Piao piaojun@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ocfs2/super.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
--- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -2322,6 +2322,7 @@ static int ocfs2_verify_volume(struct oc struct ocfs2_blockcheck_stats *stats) { int status = -EAGAIN; + u32 blksz_bits;
if (memcmp(di->i_signature, OCFS2_SUPER_BLOCK_SIGNATURE, strlen(OCFS2_SUPER_BLOCK_SIGNATURE)) == 0) { @@ -2336,11 +2337,15 @@ static int ocfs2_verify_volume(struct oc goto out; } status = -EINVAL; - if ((1 << le32_to_cpu(di->id2.i_super.s_blocksize_bits)) != blksz) { + /* Acceptable block sizes are 512 bytes, 1K, 2K and 4K. */ + blksz_bits = le32_to_cpu(di->id2.i_super.s_blocksize_bits); + if (blksz_bits < 9 || blksz_bits > 12) { mlog(ML_ERROR, "found superblock with incorrect block " - "size: found %u, should be %u\n", - 1 << le32_to_cpu(di->id2.i_super.s_blocksize_bits), - blksz); + "size bits: found %u, should be 9, 10, 11, or 12\n", + blksz_bits); + } else if ((1 << le32_to_cpu(blksz_bits)) != blksz) { + mlog(ML_ERROR, "found superblock with incorrect block " + "size: found %u, should be %u\n", 1 << blksz_bits, blksz); } else if (le16_to_cpu(di->id2.i_super.s_major_rev_level) != OCFS2_MAJOR_REV_LEVEL || le16_to_cpu(di->id2.i_super.s_minor_rev_level) !=
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 2026559a6c4ce34db117d2db8f710fe2a9420d5a upstream.
When using the "block:block_dirty_buffer" tracepoint, mark_buffer_dirty() may cause a NULL pointer dereference, or a general protection fault when KASAN is enabled.
This happens because, since the tracepoint was added in mark_buffer_dirty(), it references the dev_t member bh->b_bdev->bd_dev regardless of whether the buffer head has a pointer to a block_device structure.
In the current implementation, nilfs_grab_buffer(), which grabs a buffer to read (or create) a block of metadata, including b-tree node blocks, does not set the block device; instead, each of its caller block reading functions does so, and only if the buffer is not in the "uptodate" state. However, if the uptodate flag is set on a folio/page, and the buffer heads are detached from it by try_to_free_buffers(), and new buffer heads are then attached by create_empty_buffers(), the uptodate flag may be restored to each buffer without the block device being set to bh->b_bdev, and mark_buffer_dirty() may be called later in that state, resulting in the bug mentioned above.
Fix this issue by making nilfs_grab_buffer() always set the block device of the super block structure to the buffer head, regardless of the state of the buffer's uptodate flag.
Link: https://lkml.kernel.org/r/20241106160811.3316-3-konishi.ryusuke@gmail.com Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint") Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: Tejun Heo tj@kernel.org Cc: Ubisectech Sirius bugreport@valiantsec.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/btnode.c | 2 -- fs/nilfs2/gcinode.c | 4 +--- fs/nilfs2/mdt.c | 1 - fs/nilfs2/page.c | 1 + 4 files changed, 2 insertions(+), 6 deletions(-)
--- a/fs/nilfs2/btnode.c +++ b/fs/nilfs2/btnode.c @@ -68,7 +68,6 @@ nilfs_btnode_create_block(struct address goto failed; } memset(bh->b_data, 0, i_blocksize(inode)); - bh->b_bdev = inode->i_sb->s_bdev; bh->b_blocknr = blocknr; set_buffer_mapped(bh); set_buffer_uptodate(bh); @@ -133,7 +132,6 @@ int nilfs_btnode_submit_block(struct add goto found; } set_buffer_mapped(bh); - bh->b_bdev = inode->i_sb->s_bdev; bh->b_blocknr = pblocknr; /* set block address for read */ bh->b_end_io = end_buffer_read_sync; get_bh(bh); --- a/fs/nilfs2/gcinode.c +++ b/fs/nilfs2/gcinode.c @@ -83,10 +83,8 @@ int nilfs_gccache_submit_read_data(struc goto out; }
- if (!buffer_mapped(bh)) { - bh->b_bdev = inode->i_sb->s_bdev; + if (!buffer_mapped(bh)) set_buffer_mapped(bh); - } bh->b_blocknr = pbn; bh->b_end_io = end_buffer_read_sync; get_bh(bh); --- a/fs/nilfs2/mdt.c +++ b/fs/nilfs2/mdt.c @@ -89,7 +89,6 @@ static int nilfs_mdt_create_block(struct if (buffer_uptodate(bh)) goto failed_bh;
- bh->b_bdev = sb->s_bdev; err = nilfs_mdt_insert_new_block(inode, block, bh, init_block); if (likely(!err)) { get_bh(bh); --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -63,6 +63,7 @@ struct buffer_head *nilfs_grab_buffer(st put_page(page); return NULL; } + bh->b_bdev = inode->i_sb->s_bdev; return bh; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huacai Chen chenhuacai@loongson.cn
commit 30cec747d6bf2c3e915c075d76d9712e54cde0a6 upstream.
early_numa_add_cpu() operates on the physical CPU id rather than the logical CPU id, so use cpuid instead of cpu.
Cc: stable@vger.kernel.org Fixes: 3de9c42d02a79a5 ("LoongArch: Add all CPUs enabled by fdt to NUMA node 0") Reported-by: Bibo Mao maobibo@loongson.cn Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/loongarch/kernel/smp.c +++ b/arch/loongarch/kernel/smp.c @@ -272,7 +272,7 @@ static void __init fdt_smp_setup(void) __cpu_number_map[cpuid] = cpu; __cpu_logical_map[cpu] = cpuid;
-	early_numa_add_cpu(cpu, 0);
+	early_numa_add_cpu(cpuid, 0);
 	set_cpuid_to_node(cpuid, 0);
 }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huacai Chen chenhuacai@loongson.cn
commit 227ca9f6f6aeb8aa8f0c10430b955f1fe2aeab91 upstream.
If PGDIR_SIZE is too large for cpu_vabits, KASAN_SHADOW_END will overflow UINTPTR_MAX because KASAN_SHADOW_START/KASAN_SHADOW_END are aligned up by PGDIR_SIZE. And then the overflowed KASAN_SHADOW_END looks like a user space address.
For example, PGDIR_SIZE of CONFIG_4KB_4LEVEL is 2^39, which is too large for Loongson-2K series whose cpu_vabits = 39.
Since CONFIG_4KB_4LEVEL is completely legal for CPUs with cpu_vabits <= 39, we just disable KASAN via early return in kasan_init(). Otherwise we get a boot failure.
Moreover, we change KASAN_SHADOW_END from the first address after KASAN shadow area to the last address in KASAN shadow area, in order to avoid the end address exactly overflow to 0 (which is a legal case). We don't need to worry about alignment because pgd_addr_end() can handle it.
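As a rough illustration of the overflow (a standalone userspace sketch; the addresses are made up and not the real LoongArch layout), rounding the shadow end up by a PGDIR_SIZE of 2^39 can wrap past UINTPTR_MAX, while keeping the last byte of the area instead stays representable:

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t pgdir_size = 1ULL << 39;               /* CONFIG_4KB_4LEVEL */
		uint64_t start = 0xffffff8000000000ULL;         /* hypothetical shadow start */
		uint64_t size  = 1ULL << 38;                    /* hypothetical shadow size */
		/* old definition: first address after the area, rounded up by PGDIR_SIZE */
		uint64_t end_old = (start + size + pgdir_size - 1) & ~(pgdir_size - 1);
		/* new definition: last address inside the area */
		uint64_t end_new = end_old - 1;

		printf("end_old = 0x%016llx (wrapped below start: %d)\n",
		       (unsigned long long)end_old, end_old < start);
		printf("end_new = 0x%016llx\n", (unsigned long long)end_new);
		return 0;
	}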
Cc: stable@vger.kernel.org Reviewed-by: Jiaxun Yang jiaxun.yang@flygoat.com Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/include/asm/kasan.h | 2 +- arch/loongarch/mm/kasan_init.c | 15 +++++++++++++-- 2 files changed, 14 insertions(+), 3 deletions(-)
--- a/arch/loongarch/include/asm/kasan.h +++ b/arch/loongarch/include/asm/kasan.h @@ -51,7 +51,7 @@ /* KAsan shadow memory start right after vmalloc. */ #define KASAN_SHADOW_START round_up(KFENCE_AREA_END, PGDIR_SIZE) #define KASAN_SHADOW_SIZE (XKVRANGE_VC_SHADOW_END - XKPRANGE_CC_KASAN_OFFSET) -#define KASAN_SHADOW_END round_up(KASAN_SHADOW_START + KASAN_SHADOW_SIZE, PGDIR_SIZE) +#define KASAN_SHADOW_END (round_up(KASAN_SHADOW_START + KASAN_SHADOW_SIZE, PGDIR_SIZE) - 1)
#define XKPRANGE_CC_SHADOW_OFFSET (KASAN_SHADOW_START + XKPRANGE_CC_KASAN_OFFSET) #define XKPRANGE_UC_SHADOW_OFFSET (KASAN_SHADOW_START + XKPRANGE_UC_KASAN_OFFSET) --- a/arch/loongarch/mm/kasan_init.c +++ b/arch/loongarch/mm/kasan_init.c @@ -218,7 +218,7 @@ static void __init kasan_map_populate(un asmlinkage void __init kasan_early_init(void) { BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_START, PGDIR_SIZE)); - BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END, PGDIR_SIZE)); + BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END + 1, PGDIR_SIZE)); }
static inline void kasan_set_pgd(pgd_t *pgdp, pgd_t pgdval) @@ -233,7 +233,7 @@ static void __init clear_pgds(unsigned l * swapper_pg_dir. pgd_clear() can't be used * here because it's nop on 2,3-level pagetable setups */ - for (; start < end; start += PGDIR_SIZE) + for (; start < end; start = pgd_addr_end(start, end)) kasan_set_pgd((pgd_t *)pgd_offset_k(start), __pgd(0)); }
@@ -243,6 +243,17 @@ void __init kasan_init(void) phys_addr_t pa_start, pa_end;
/* + * If PGDIR_SIZE is too large for cpu_vabits, KASAN_SHADOW_END will + * overflow UINTPTR_MAX and then looks like a user space address. + * For example, PGDIR_SIZE of CONFIG_4KB_4LEVEL is 2^39, which is too + * large for Loongson-2K series whose cpu_vabits = 39. + */ + if (KASAN_SHADOW_END < vm_map_base) { + pr_warn("PGDIR_SIZE too large for cpu_vabits, KernelAddressSanitizer disabled.\n"); + return; + } + + /* * PGD was populated as invalid_pmd_table or invalid_pud_table * in pagetable_init() which depends on how many levels of page * table you are using, but we had to clean the gpd of kasan
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huacai Chen chenhuacai@loongson.cn
commit a410656643ce4844ba9875aa4e87a7779308259b upstream.
Make KASAN work with 5-level page-tables, including: 1. Implement and use __pgd_none() and kasan_p4d_offset(). 2. As done in kasan_pmd_populate() and kasan_pte_populate(), restrict the loop conditions of kasan_p4d_populate() and kasan_pud_populate() to avoid unnecessary population.
Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/mm/kasan_init.c | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-)
--- a/arch/loongarch/mm/kasan_init.c +++ b/arch/loongarch/mm/kasan_init.c @@ -13,6 +13,13 @@
static pgd_t kasan_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
+#ifdef __PAGETABLE_P4D_FOLDED +#define __pgd_none(early, pgd) (0) +#else +#define __pgd_none(early, pgd) (early ? (pgd_val(pgd) == 0) : \ +(__pa(pgd_val(pgd)) == (unsigned long)__pa(kasan_early_shadow_p4d))) +#endif + #ifdef __PAGETABLE_PUD_FOLDED #define __p4d_none(early, p4d) (0) #else @@ -142,6 +149,19 @@ static pud_t *__init kasan_pud_offset(p4 return pud_offset(p4dp, addr); }
+static p4d_t *__init kasan_p4d_offset(pgd_t *pgdp, unsigned long addr, int node, bool early) +{ + if (__pgd_none(early, pgdp_get(pgdp))) { + phys_addr_t p4d_phys = early ? + __pa_symbol(kasan_early_shadow_p4d) : kasan_alloc_zeroed_page(node); + if (!early) + memcpy(__va(p4d_phys), kasan_early_shadow_p4d, sizeof(kasan_early_shadow_p4d)); + pgd_populate(&init_mm, pgdp, (p4d_t *)__va(p4d_phys)); + } + + return p4d_offset(pgdp, addr); +} + static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr, unsigned long end, int node, bool early) { @@ -178,19 +198,19 @@ static void __init kasan_pud_populate(p4 do { next = pud_addr_end(addr, end); kasan_pmd_populate(pudp, addr, next, node, early); - } while (pudp++, addr = next, addr != end); + } while (pudp++, addr = next, addr != end && __pud_none(early, READ_ONCE(*pudp))); }
static void __init kasan_p4d_populate(pgd_t *pgdp, unsigned long addr, unsigned long end, int node, bool early) { unsigned long next; - p4d_t *p4dp = p4d_offset(pgdp, addr); + p4d_t *p4dp = kasan_p4d_offset(pgdp, addr, node, early);
do { next = p4d_addr_end(addr, end); kasan_pud_populate(p4dp, addr, next, node, early); - } while (p4dp++, addr = next, addr != end); + } while (p4dp++, addr = next, addr != end && __p4d_none(early, READ_ONCE(*p4dp))); }
static void __init kasan_pgd_populate(unsigned long addr, unsigned long end,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aurelien Jarno aurelien@aurel32.net
commit 1635e407a4a64d08a8517ac59ca14ad4fc785e75 upstream.
The commit 8396c793ffdf ("mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K") increased the max_req_size, even for 4K pages, causing various issues: - Panic booting the kernel/rootfs from an SD card on Rockchip RK3566 - Panic booting the kernel/rootfs from an SD card on StarFive JH7100 - "swiotlb buffer is full" and data corruption on StarFive JH7110
At this stage no fix has been found, so it's probably better to just revert the change.
This reverts commit 8396c793ffdf28bb8aee7cfe0891080f8cab7890.
Cc: stable@vger.kernel.org Cc: Sam Protsenko semen.protsenko@linaro.org Fixes: 8396c793ffdf ("mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K") Closes: https://lore.kernel.org/linux-mmc/614692b4-1dbe-31b8-a34d-cb6db1909bb7@w6rz.... Closes: https://lore.kernel.org/linux-mmc/CAC8uq=Ppnmv98mpa1CrWLawWoPnu5abtU69v-=G-P... Signed-off-by: Aurelien Jarno aurelien@aurel32.net Message-ID: 20241110114700.622372-1-aurelien@aurel32.net Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/dw_mmc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -2952,8 +2952,8 @@ static int dw_mci_init_slot(struct dw_mc
 	if (host->use_dma == TRANS_MODE_IDMAC) {
 		mmc->max_segs = host->ring_size;
 		mmc->max_blk_size = 65535;
-		mmc->max_req_size = DW_MCI_DESC_DATA_LENGTH * host->ring_size;
-		mmc->max_seg_size = mmc->max_req_size;
+		mmc->max_seg_size = 0x1000;
+		mmc->max_req_size = mmc->max_seg_size * host->ring_size;
 		mmc->max_blk_count = mmc->max_req_size / 512;
 	} else if (host->use_dma == TRANS_MODE_EDMAC) {
 		mmc->max_segs = 64;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andre Przywara andre.przywara@arm.com
commit 85b580afc2c215394e08974bf033de9face94955 upstream.
It turns out that the Allwinner A100/A133 SoC only supports 8K DMA blocks (13 bits wide), for both the SD/SDIO and eMMC instances. And while this alone would make for a trivial fix, the H616 falls back to the A100 compatible string, so we now have to match the H616 compatible string explicitly against the description advertising 64K DMA blocks.
As the A100 is now compatible with the D1 description, let the A100 compatible string point to that block instead, and introduce an explicit match against the H616 string, pointing to the old description. Also remove the redundant setting of clk_delays to NULL on the way.
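For reference on the sizes involved: a description with idma_des_size_bits = 16 advertises 64K (2^16) DMA blocks per descriptor, whereas the A100/A133 hardware only handles 8K (2^13); this is why the H616 keeps the 16-bit description (renamed sun50i_h616_cfg in the hunk below) while the A100 compatible is pointed at the D1 description instead.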
Fixes: 3536b82e5853 ("mmc: sunxi: add support for A100 mmc controller") Cc: stable@vger.kernel.org Signed-off-by: Andre Przywara andre.przywara@arm.com Tested-by: Parthiban Nallathambi parthiban@linumiz.com Reviewed-by: Chen-Yu Tsai wens@csie.org Message-ID: 20241107014240.24669-1-andre.przywara@arm.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sunxi-mmc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/mmc/host/sunxi-mmc.c +++ b/drivers/mmc/host/sunxi-mmc.c @@ -1191,10 +1191,9 @@ static const struct sunxi_mmc_cfg sun50i .needs_new_timings = true, };
-static const struct sunxi_mmc_cfg sun50i_a100_cfg = { +static const struct sunxi_mmc_cfg sun50i_h616_cfg = { .idma_des_size_bits = 16, .idma_des_shift = 2, - .clk_delays = NULL, .can_calibrate = true, .mask_data0 = true, .needs_new_timings = true, @@ -1217,8 +1216,9 @@ static const struct of_device_id sunxi_m { .compatible = "allwinner,sun20i-d1-mmc", .data = &sun20i_d1_cfg }, { .compatible = "allwinner,sun50i-a64-mmc", .data = &sun50i_a64_cfg }, { .compatible = "allwinner,sun50i-a64-emmc", .data = &sun50i_a64_emmc_cfg }, - { .compatible = "allwinner,sun50i-a100-mmc", .data = &sun50i_a100_cfg }, + { .compatible = "allwinner,sun50i-a100-mmc", .data = &sun20i_d1_cfg }, { .compatible = "allwinner,sun50i-a100-emmc", .data = &sun50i_a100_emmc_cfg }, + { .compatible = "allwinner,sun50i-h616-mmc", .data = &sun50i_h616_cfg }, { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, sunxi_mmc_of_match);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Francesco Dolcini francesco.dolcini@toradex.com
commit 32c4514455b2b8fde506f8c0962f15c7e4c26f1d upstream.
Wait for the command transmission to complete in the DSI transfer function by polling for the dc_start bit to go back to the idle state after the transmission is started.

This is documented in the datasheet, and failure to do so leads to command corruption.
Fixes: ff1ca6397b1d ("drm/bridge: Add tc358768 driver") Cc: stable@vger.kernel.org Signed-off-by: Francesco Dolcini francesco.dolcini@toradex.com Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Link: https://lore.kernel.org/r/20240926141246.48282-1-francesco@dolcini.it Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20240926141246.48282-1-frances... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/bridge/tc358768.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/bridge/tc358768.c +++ b/drivers/gpu/drm/bridge/tc358768.c @@ -125,6 +125,9 @@ #define TC358768_DSI_CONFW_MODE_CLR (6 << 29) #define TC358768_DSI_CONFW_ADDR_DSI_CONTROL (0x3 << 24)
+/* TC358768_DSICMD_TX (0x0600) register */ +#define TC358768_DSI_CMDTX_DC_START BIT(0) + static const char * const tc358768_supplies[] = { "vddc", "vddmipi", "vddio" }; @@ -229,6 +232,21 @@ static void tc358768_update_bits(struct tc358768_write(priv, reg, tmp); }
+static void tc358768_dsicmd_tx(struct tc358768_priv *priv) +{ + u32 val; + + /* start transfer */ + tc358768_write(priv, TC358768_DSICMD_TX, TC358768_DSI_CMDTX_DC_START); + if (priv->error) + return; + + /* wait transfer completion */ + priv->error = regmap_read_poll_timeout(priv->regmap, TC358768_DSICMD_TX, val, + (val & TC358768_DSI_CMDTX_DC_START) == 0, + 100, 100000); +} + static int tc358768_sw_reset(struct tc358768_priv *priv) { /* Assert Reset */ @@ -516,8 +534,7 @@ static ssize_t tc358768_dsi_host_transfe } }
-	/* start transfer */
-	tc358768_write(priv, TC358768_DSICMD_TX, 1);
+	tc358768_dsicmd_tx(priv);
ret = tc358768_clear_error(priv); if (ret)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peng Fan peng.fan@nxp.com
commit f7c7c5aa556378a2c8da72c1f7f238b6648f95fb upstream.
The check condition should be 'i < bc->onecell_data.num_domains', not 'bc->onecell_data.num_domains', which makes the loop never finish and causes a kernel panic.
Also disable runtime PM to address "imx93-blk-ctrl 4ac10000.system-controller: Unbalanced pm_runtime_enable!"
Fixes: e9aa77d413c9 ("soc: imx: add i.MX93 media blk ctrl driver") Signed-off-by: Peng Fan peng.fan@nxp.com Reviewed-by: Stefan Wahren wahrenst@gmx.net Cc: stable@vger.kernel.org Message-ID: 20241101101252.1448466-1-peng.fan@oss.nxp.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pmdomain/imx/imx93-blk-ctrl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/pmdomain/imx/imx93-blk-ctrl.c +++ b/drivers/pmdomain/imx/imx93-blk-ctrl.c @@ -313,7 +313,9 @@ static int imx93_blk_ctrl_remove(struct
of_genpd_del_provider(pdev->dev.of_node);
-	for (i = 0; bc->onecell_data.num_domains; i++) {
+	pm_runtime_disable(&pdev->dev);
+
+	for (i = 0; i < bc->onecell_data.num_domains; i++) {
 		struct imx93_blk_ctrl_domain *domain = &bc->domains[i];
pm_genpd_remove(&domain->genpd);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Airlie airlied@redhat.com
commit 21ec425eaf2cb7c0371f7683f81ad7d9679b6eb5 upstream.
When this code moved to the non-coherent allocator, the sync was placed too early for some firmwares which call the setup function; move the sync down to after the setup function is called.
Reported-by: Diogo Ivo diogo.ivo@tecnico.ulisboa.pt Tested-by: Diogo Ivo diogo.ivo@tecnico.ulisboa.pt Reviewed-by: Lyude Paul lyude@redhat.com Fixes: 9b340aeb26d5 ("nouveau/firmware: use dma non-coherent allocator") Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie airlied@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20241114004603.3095485-1-airli... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nvkm/falcon/fw.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/falcon/fw.c b/drivers/gpu/drm/nouveau/nvkm/falcon/fw.c index a1c8545f1249..cac6d64ab67d 100644 --- a/drivers/gpu/drm/nouveau/nvkm/falcon/fw.c +++ b/drivers/gpu/drm/nouveau/nvkm/falcon/fw.c @@ -89,11 +89,6 @@ nvkm_falcon_fw_boot(struct nvkm_falcon_fw *fw, struct nvkm_subdev *user, nvkm_falcon_fw_dtor_sigs(fw); }
- /* after last write to the img, sync dma mappings */ - dma_sync_single_for_device(fw->fw.device->dev, - fw->fw.phys, - sg_dma_len(&fw->fw.mem.sgl), - DMA_TO_DEVICE);
FLCNFW_DBG(fw, "resetting"); fw->func->reset(fw); @@ -105,6 +100,12 @@ nvkm_falcon_fw_boot(struct nvkm_falcon_fw *fw, struct nvkm_subdev *user, goto done; }
+ /* after last write to the img, sync dma mappings */ + dma_sync_single_for_device(fw->fw.device->dev, + fw->fw.phys, + sg_dma_len(&fw->fw.mem.sgl), + DMA_TO_DEVICE); + ret = fw->func->load(fw); if (ret) goto done;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vijendar Mukunda Vijendar.Mukunda@amd.com
commit 7013a8268d311fded6c7a6528fc1de82668e75f6 upstream.
There is a strapping issue on NBIO 7.7.0 that can lead to spurious PME events while in the D0 state.
Co-developed-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Vijendar Mukunda Vijendar.Mukunda@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Link: https://lore.kernel.org/r/20241112161142.28974-1-mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 447a54a0f79c9a409ceaa17804bdd2e0206397b9) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c @@ -247,6 +247,12 @@ static void nbio_v7_7_init_registers(str if (def != data) WREG32_SOC15(NBIO, 0, regBIF0_PCIE_MST_CTRL_3, data);
+	switch (adev->ip_versions[NBIO_HWIP][0]) {
+	case IP_VERSION(7, 7, 0):
+		data = RREG32_SOC15(NBIO, 0, regRCC_DEV0_EPF5_STRAP4) & ~BIT(23);
+		WREG32_SOC15(NBIO, 0, regRCC_DEV0_EPF5_STRAP4, data);
+		break;
+	}
 }
static void nbio_v7_7_update_medium_grain_clock_gating(struct amdgpu_device *adev,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rodrigo Siqueira Rodrigo.Siqueira@amd.com
commit 16dd2825c23530f2259fc671960a3a65d2af69bd upstream.
At some point, the IEEE ID identification for the replay check in the AMD EDID was added. However, this check causes the following out-of-bounds issues when using KASAN:
[ 27.804016] BUG: KASAN: slab-out-of-bounds in amdgpu_dm_update_freesync_caps+0xefa/0x17a0 [amdgpu] [ 27.804788] Read of size 1 at addr ffff8881647fdb00 by task systemd-udevd/383
...
[ 27.821207] Memory state around the buggy address: [ 27.821215] ffff8881647fda00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 27.821224] ffff8881647fda80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 27.821234] >ffff8881647fdb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 27.821243] ^ [ 27.821250] ffff8881647fdb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 27.821259] ffff8881647fdc00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 27.821268] ==================================================================
This is caused because the ID extraction happens outside the range of the EDID length. This commit addresses the issue by taking the amd_vsdb_block size into account.
Cc: ChiaHsuan Chung chiahsuan.chung@amd.com Reviewed-by: Leo Li sunpeng.li@amd.com Signed-off-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit b7e381b1ccd5e778e3d9c44c669ad38439a861d8) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -10725,7 +10725,7 @@ static int parse_amd_vsdb(struct amdgpu_ break; }
-	while (j < EDID_LENGTH) {
+	while (j < EDID_LENGTH - sizeof(struct amd_vsdb_block)) {
 		struct amd_vsdb_block *amd_vsdb = (struct amd_vsdb_block *)&edid_ext[j];
 		unsigned int ieeeId = (amd_vsdb->ieee_id[2] << 16) | (amd_vsdb->ieee_id[1] << 8) | (amd_vsdb->ieee_id[0]);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: SeongJae Park sj@kernel.org
[ Upstream commit 42f994b71404b17abcd6b170de7a6aa95ffe5d4a ]
DAMON-based operation schemes are applied for every aggregation interval. That was mainly because schemes were using nr_accesses, which only becomes complete and usable at every aggregation interval. However, the schemes are now using nr_accesses_bp, which is updated for each sampling interval in a way that makes it reasonable to use. Therefore, there is no reason to apply schemes only for each aggregation interval.
The unnecessary alignment with the aggregation interval was also making some use cases of DAMOS tricky. Setting quotas under a long aggregation interval is one such example. Suppose the aggregation interval is ten seconds, and there is a scheme having a CPU quota of 100ms per 1s. The scheme will actually use 100ms per ten seconds, since it cannot be applied before the next aggregation interval. The feature is working as intended, but the result might not be that intuitive for some users. This could be fixed by updating the quota to 1s per 10s. But in that case, the CPU usage of DAMOS could look like spikes, and could actually have a bad effect on other CPU-sensitive workloads.
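Worked example with the numbers above: applied only once per 10s aggregation interval, a quota of 100ms per 1s yields an effective budget of 100ms / 10s = 1% CPU instead of the intended 10%; rewriting the quota as 1s per 10s restores the 10% average, but concentrates the work into a single burst every ten seconds, which is the spiking problem described.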
Implement a dedicated timing interval for each DAMON-based operation scheme, namely apply_interval. The interval will be sampling interval aligned, and each scheme will be applied for its apply_interval. The interval is set to 0 by default, which means the scheme should use the aggregation interval instead. This keeps old users from seeing any behavioral difference.
Link: https://lkml.kernel.org/r/20230916020945.47296-5-sj@kernel.org Signed-off-by: SeongJae Park sj@kernel.org Cc: Jonathan Corbet corbet@lwn.net Cc: Shuah Khan shuah@kernel.org Cc: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Stable-dep-of: 3488af097044 ("mm/damon/core: handle zero {aggregation,ops_update} intervals") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/damon.h | 17 ++++++++-- mm/damon/core.c | 72 ++++++++++++++++++++++++++++++++++++---- mm/damon/dbgfs.c | 3 +- mm/damon/lru_sort.c | 2 ++ mm/damon/reclaim.c | 2 ++ mm/damon/sysfs-schemes.c | 2 +- 6 files changed, 87 insertions(+), 11 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h index a953d7083cd59..343132a146cf0 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -298,16 +298,19 @@ struct damos_access_pattern { * struct damos - Represents a Data Access Monitoring-based Operation Scheme. * @pattern: Access pattern of target regions. * @action: &damo_action to be applied to the target regions. + * @apply_interval_us: The time between applying the @action. * @quota: Control the aggressiveness of this scheme. * @wmarks: Watermarks for automated (in)activation of this scheme. * @filters: Additional set of &struct damos_filter for &action. * @stat: Statistics of this scheme. * @list: List head for siblings. * - * For each aggregation interval, DAMON finds regions which fit in the + * For each @apply_interval_us, DAMON finds regions which fit in the * &pattern and applies &action to those. To avoid consuming too much * CPU time or IO resources for the &action, "a is used. * + * If @apply_interval_us is zero, &damon_attrs->aggr_interval is used instead. + * * To do the work only when needed, schemes can be activated for specific * system situations using &wmarks. If all schemes that registered to the * monitoring context are inactive, DAMON stops monitoring either, and just @@ -327,6 +330,14 @@ struct damos_access_pattern { struct damos { struct damos_access_pattern pattern; enum damos_action action; + unsigned long apply_interval_us; +/* private: internal use only */ + /* + * number of sample intervals that should be passed before applying + * @action + */ + unsigned long next_apply_sis; +/* public: */ struct damos_quota quota; struct damos_watermarks wmarks; struct list_head filters; @@ -627,7 +638,9 @@ void damos_add_filter(struct damos *s, struct damos_filter *f); void damos_destroy_filter(struct damos_filter *f);
struct damos *damon_new_scheme(struct damos_access_pattern *pattern, - enum damos_action action, struct damos_quota *quota, + enum damos_action action, + unsigned long apply_interval_us, + struct damos_quota *quota, struct damos_watermarks *wmarks); void damon_add_scheme(struct damon_ctx *ctx, struct damos *s); void damon_destroy_scheme(struct damos *s); diff --git a/mm/damon/core.c b/mm/damon/core.c index ae55f20835b06..a29390fd55935 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -312,7 +312,9 @@ static struct damos_quota *damos_quota_init_priv(struct damos_quota *quota) }
struct damos *damon_new_scheme(struct damos_access_pattern *pattern, - enum damos_action action, struct damos_quota *quota, + enum damos_action action, + unsigned long apply_interval_us, + struct damos_quota *quota, struct damos_watermarks *wmarks) { struct damos *scheme; @@ -322,6 +324,13 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern, return NULL; scheme->pattern = *pattern; scheme->action = action; + scheme->apply_interval_us = apply_interval_us; + /* + * next_apply_sis will be set when kdamond starts. While kdamond is + * running, it will also updated when it is added to the DAMON context, + * or damon_attrs are updated. + */ + scheme->next_apply_sis = 0; INIT_LIST_HEAD(&scheme->filters); scheme->stat = (struct damos_stat){}; INIT_LIST_HEAD(&scheme->list); @@ -334,9 +343,21 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern, return scheme; }
+static void damos_set_next_apply_sis(struct damos *s, struct damon_ctx *ctx) +{ + unsigned long sample_interval = ctx->attrs.sample_interval ? + ctx->attrs.sample_interval : 1; + unsigned long apply_interval = s->apply_interval_us ? + s->apply_interval_us : ctx->attrs.aggr_interval; + + s->next_apply_sis = ctx->passed_sample_intervals + + apply_interval / sample_interval; +} + void damon_add_scheme(struct damon_ctx *ctx, struct damos *s) { list_add_tail(&s->list, &ctx->schemes); + damos_set_next_apply_sis(s, ctx); }
static void damon_del_scheme(struct damos *s) @@ -548,6 +569,7 @@ int damon_set_attrs(struct damon_ctx *ctx, struct damon_attrs *attrs) { unsigned long sample_interval = attrs->sample_interval ? attrs->sample_interval : 1; + struct damos *s;
if (attrs->min_nr_regions < 3) return -EINVAL; @@ -563,6 +585,10 @@ int damon_set_attrs(struct damon_ctx *ctx, struct damon_attrs *attrs)
damon_update_monitoring_results(ctx, attrs); ctx->attrs = *attrs; + + damon_for_each_scheme(s, ctx) + damos_set_next_apply_sis(s, ctx); + return 0; }
@@ -1055,14 +1081,29 @@ static void kdamond_apply_schemes(struct damon_ctx *c) struct damon_target *t; struct damon_region *r, *next_r; struct damos *s; + unsigned long sample_interval = c->attrs.sample_interval ? + c->attrs.sample_interval : 1; + bool has_schemes_to_apply = false;
damon_for_each_scheme(s, c) { + if (c->passed_sample_intervals != s->next_apply_sis) + continue; + + s->next_apply_sis += + (s->apply_interval_us ? s->apply_interval_us : + c->attrs.aggr_interval) / sample_interval; + if (!s->wmarks.activated) continue;
+ has_schemes_to_apply = true; + damos_adjust_quota(c, s); }
+ if (!has_schemes_to_apply) + return; + damon_for_each_target(t, c) { damon_for_each_region_safe(r, next_r, t) damon_do_apply_schemes(c, t, r); @@ -1348,11 +1389,19 @@ static void kdamond_init_intervals_sis(struct damon_ctx *ctx) { unsigned long sample_interval = ctx->attrs.sample_interval ? ctx->attrs.sample_interval : 1; + unsigned long apply_interval; + struct damos *scheme;
ctx->passed_sample_intervals = 0; ctx->next_aggregation_sis = ctx->attrs.aggr_interval / sample_interval; ctx->next_ops_update_sis = ctx->attrs.ops_update_interval / sample_interval; + + damon_for_each_scheme(scheme, ctx) { + apply_interval = scheme->apply_interval_us ? + scheme->apply_interval_us : ctx->attrs.aggr_interval; + scheme->next_apply_sis = apply_interval / sample_interval; + } }
/* @@ -1405,19 +1454,28 @@ static int kdamond_fn(void *data) if (ctx->ops.check_accesses) max_nr_accesses = ctx->ops.check_accesses(ctx);
- sample_interval = ctx->attrs.sample_interval ? - ctx->attrs.sample_interval : 1; if (ctx->passed_sample_intervals == next_aggregation_sis) { - ctx->next_aggregation_sis = next_aggregation_sis + - ctx->attrs.aggr_interval / sample_interval; kdamond_merge_regions(ctx, max_nr_accesses / 10, sz_limit); if (ctx->callback.after_aggregation && ctx->callback.after_aggregation(ctx)) break; - if (!list_empty(&ctx->schemes)) - kdamond_apply_schemes(ctx); + } + + /* + * do kdamond_apply_schemes() after kdamond_merge_regions() if + * possible, to reduce overhead + */ + if (!list_empty(&ctx->schemes)) + kdamond_apply_schemes(ctx); + + sample_interval = ctx->attrs.sample_interval ? + ctx->attrs.sample_interval : 1; + if (ctx->passed_sample_intervals == next_aggregation_sis) { + ctx->next_aggregation_sis = next_aggregation_sis + + ctx->attrs.aggr_interval / sample_interval; + kdamond_reset_aggregated(ctx); kdamond_split_regions(ctx); if (ctx->ops.reset_aggregated) diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c index 124f0f8c97b75..dc0ea1fc30ca5 100644 --- a/mm/damon/dbgfs.c +++ b/mm/damon/dbgfs.c @@ -278,7 +278,8 @@ static struct damos **str_to_schemes(const char *str, ssize_t len, goto fail;
pos += parsed; - scheme = damon_new_scheme(&pattern, action, "a, &wmarks); + scheme = damon_new_scheme(&pattern, action, 0, "a, + &wmarks); if (!scheme) goto fail;
diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c index e84495ab92cf3..3de2916a65c38 100644 --- a/mm/damon/lru_sort.c +++ b/mm/damon/lru_sort.c @@ -158,6 +158,8 @@ static struct damos *damon_lru_sort_new_scheme( pattern, /* (de)prioritize on LRU-lists */ action, + /* for each aggregation interval */ + 0, /* under the quota. */ "a, /* (De)activate this according to the watermarks. */ diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c index eca9d000ecc53..66e190f0374ac 100644 --- a/mm/damon/reclaim.c +++ b/mm/damon/reclaim.c @@ -142,6 +142,8 @@ static struct damos *damon_reclaim_new_scheme(void) &pattern, /* page out those, as soon as found */ DAMOS_PAGEOUT, + /* for each aggregation interval */ + 0, /* under the quota. */ &damon_reclaim_quota, /* (De)activate this according to the watermarks. */ diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c index 36dcd881a19c0..26c948f87489e 100644 --- a/mm/damon/sysfs-schemes.c +++ b/mm/damon/sysfs-schemes.c @@ -1613,7 +1613,7 @@ static struct damos *damon_sysfs_mk_scheme( .low = sysfs_wmarks->low, };
- scheme = damon_new_scheme(&pattern, sysfs_scheme->action, "a, + scheme = damon_new_scheme(&pattern, sysfs_scheme->action, 0, "a, &wmarks); if (!scheme) return NULL;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: SeongJae Park sj@kernel.org
[ Upstream commit 3488af0970445ff5532c7e8dc5e6456b877aee5e ]
Patch series "mm/damon/core: fix handling of zero non-sampling intervals".
DAMON's internal intervals accounting logic is not correctly handling non-sampling intervals of zero values due to a wrong assumption. This could cause unexpected monitoring behavior, and even result in an infinite hang of DAMON sysfs interface user threads in case of a zero aggregation interval. Fix those by updating the intervals accounting logic. For details of the root cause and the solutions, please refer to the commit messages of the fixes.
This patch (of 2):
DAMON's logic to determine if this is the time to do aggregation and ops update assumes next_{aggregation,ops_update}_sis are always set larger than the current passed_sample_intervals. It therefore further assumes that continuously incrementing passed_sample_intervals every sampling interval will make it reach next_{aggregation,ops_update}_sis in the future. The logic hence takes the action and updates next_{aggregation,ops_update}_sis only if passed_sample_intervals is equal to those counts, respectively.
If the aggregation interval or the ops update interval is zero, however, next_aggregation_sis or next_ops_update_sis are set equal to the current passed_sample_intervals, respectively. And passed_sample_intervals is incremented before doing the next_{aggregation,ops_update}_sis check. Hence, passed_sample_intervals becomes larger than next_{aggregation,ops_update}_sis, and the logic says it is not the time to do the action and update next_{aggregation,ops_update}_sis, forever, until an overflow happens. In other words, DAMON effectively stops doing aggregations or ops updates forever, and users cannot get monitoring results.
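Worked example of the stall: say the aggregation interval is zero and, at some sample, next_aggregation_sis is set equal to passed_sample_intervals == N. The counter is incremented to N + 1 before the check runs, so an equality comparison can never match again until the counter wraps around; comparing with >= (as the hunk below does) makes the check fire on the very next sample instead.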
Based on the documentation and common sense, a reasonable behavior for such inputs is to do an aggregation and an ops update for every sampling interval. Handle the case by removing the assumption.
Note that this can cause a real issue for DAMON sysfs interface users in the case of a zero aggregation interval. When a user starts DAMON with a zero aggregation interval and requests online DAMON parameter tuning via the DAMON sysfs interface, the request is handled by the aggregation callback. Until the callback finishes the work, the user who requested the online tuning just waits. Hence, the user will be stuck until passed_sample_intervals overflows.
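To make the failure mode concrete, here is a minimal standalone sketch (plain userspace C, not kernel code; the interval values are made up) that simulates the counter handling described above and compares the old '==' check with the fixed '>=' check under a zero aggregation interval:

#include <stdio.h>

static unsigned long next_sis(unsigned long passed, unsigned long aggr_us,
                              unsigned long sample_us)
{
        sample_us = sample_us ? sample_us : 1;
        return passed + aggr_us / sample_us;    /* equals 'passed' when aggr_us is 0 */
}

int main(void)
{
        unsigned long passed = 0, next_eq = 0, next_ge = 0;
        unsigned long aggr_us = 0, sample_us = 5000;    /* zero aggregation interval */
        int i, aggr_eq = 0, aggr_ge = 0;

        for (i = 0; i < 5; i++) {
                passed++;                       /* incremented before the check */
                if (passed == next_eq) {        /* old check: never true */
                        aggr_eq++;
                        next_eq = next_sis(passed, aggr_us, sample_us);
                }
                if (passed >= next_ge) {        /* fixed check: fires every pass */
                        aggr_ge++;
                        next_ge = next_sis(passed, aggr_us, sample_us);
                }
        }
        printf("aggregations with '==': %d, with '>=': %d\n", aggr_eq, aggr_ge);
        return 0;
}

With '==' the aggregation never fires; with '>=' it fires on every sampling interval, which is the behavior the fix intends.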
Link: https://lkml.kernel.org/r/20241031183757.49610-1-sj@kernel.org Link: https://lkml.kernel.org/r/20241031183757.49610-2-sj@kernel.org Fixes: 4472edf63d66 ("mm/damon/core: use number of passed access sampling as a timer") Signed-off-by: SeongJae Park sj@kernel.org Cc: stable@vger.kernel.org [6.7.x] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/damon/core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/damon/core.c b/mm/damon/core.c index a29390fd55935..d0441e24a8ed5 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -1454,7 +1454,7 @@ static int kdamond_fn(void *data) if (ctx->ops.check_accesses) max_nr_accesses = ctx->ops.check_accesses(ctx);
- if (ctx->passed_sample_intervals == next_aggregation_sis) { + if (ctx->passed_sample_intervals >= next_aggregation_sis) { kdamond_merge_regions(ctx, max_nr_accesses / 10, sz_limit); @@ -1472,7 +1472,7 @@ static int kdamond_fn(void *data)
sample_interval = ctx->attrs.sample_interval ? ctx->attrs.sample_interval : 1; - if (ctx->passed_sample_intervals == next_aggregation_sis) { + if (ctx->passed_sample_intervals >= next_aggregation_sis) { ctx->next_aggregation_sis = next_aggregation_sis + ctx->attrs.aggr_interval / sample_interval;
@@ -1482,7 +1482,7 @@ static int kdamond_fn(void *data) ctx->ops.reset_aggregated(ctx); }
- if (ctx->passed_sample_intervals == next_ops_update_sis) { + if (ctx->passed_sample_intervals >= next_ops_update_sis) { ctx->next_ops_update_sis = next_ops_update_sis + ctx->attrs.ops_update_interval / sample_interval;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Wahren wahrenst@gmx.net
[ Upstream commit 4e2766102da632f26341d5539519b0abf73df887 ]
The benefit of this encapsulating struct is questionable. It just stores a flag to signal the init state of vchiq_arm_state. Besides the fact that this flag is set too soon, access to uninitialized members should be avoided. So initialize vchiq_arm_state properly before assigning it directly to vchiq_state.
Signed-off-by: Stefan Wahren wahrenst@gmx.net Link: https://lore.kernel.org/r/20240621131958.98208-6-wahrenst@gmx.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 404b739e8955 ("staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation") Signed-off-by: Sasha Levin sashal@kernel.org --- .../interface/vchiq_arm/vchiq_arm.c | 25 +++++-------------- 1 file changed, 6 insertions(+), 19 deletions(-)
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c index aa2313f3bcab8..0a97fb237f5e7 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c @@ -115,11 +115,6 @@ struct vchiq_arm_state { int first_connect; };
-struct vchiq_2835_state { - int inited; - struct vchiq_arm_state arm_state; -}; - struct vchiq_pagelist_info { struct pagelist *pagelist; size_t pagelist_buffer_size; @@ -580,29 +575,21 @@ vchiq_arm_init_state(struct vchiq_state *state, int vchiq_platform_init_state(struct vchiq_state *state) { - struct vchiq_2835_state *platform_state; + struct vchiq_arm_state *platform_state;
- state->platform_state = kzalloc(sizeof(*platform_state), GFP_KERNEL); - if (!state->platform_state) + platform_state = kzalloc(sizeof(*platform_state), GFP_KERNEL); + if (!platform_state) return -ENOMEM;
- platform_state = (struct vchiq_2835_state *)state->platform_state; - - platform_state->inited = 1; - vchiq_arm_init_state(state, &platform_state->arm_state); + vchiq_arm_init_state(state, platform_state); + state->platform_state = (struct opaque_platform_state *)platform_state;
return 0; }
static struct vchiq_arm_state *vchiq_platform_get_arm_state(struct vchiq_state *state) { - struct vchiq_2835_state *platform_state; - - platform_state = (struct vchiq_2835_state *)state->platform_state; - - WARN_ON_ONCE(!platform_state->inited); - - return &platform_state->arm_state; + return (struct vchiq_arm_state *)state->platform_state; }
void
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Umang Jain umang.jain@ideasonboard.com
[ Upstream commit 404b739e895522838f1abdc340c554654d671dde ]
The struct vchiq_arm_state 'platform_state' is currently allocated dynamically using kzalloc(). Unfortunately, it is never freed and is subject to memory leaks in the error handling paths of the probe() function.
To address the issue, use the device resource management helper devm_kzalloc() to ensure cleanup after its allocation.
Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver") Cc: stable@vger.kernel.org Signed-off-by: Umang Jain umang.jain@ideasonboard.com Reviewed-by: Dan Carpenter dan.carpenter@linaro.org Link: https://lore.kernel.org/r/20241016130225.61024-2-umang.jain@ideasonboard.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c index 0a97fb237f5e7..92aa98bbdc662 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c @@ -577,7 +577,7 @@ vchiq_platform_init_state(struct vchiq_state *state) { struct vchiq_arm_state *platform_state;
- platform_state = kzalloc(sizeof(*platform_state), GFP_KERNEL); + platform_state = devm_kzalloc(state->dev, sizeof(*platform_state), GFP_KERNEL); if (!platform_state) return -ENOMEM;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Olsa jolsa@kernel.org
parse_build_id_buf() does not account for the Elf32_Nhdr header size when computing the build id data pointer, and as a result returns wrong build id data.
This is a problem only for stable trees that merged the c83a80d8b84f fix; the upstream build id code was refactored and returns the proper build id.
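For reference, a standalone sketch (not the kernel's lib/buildid.c; the sizes assume a 12-byte Elf32_Nhdr, a 4-byte "GNU" name and a 20-byte SHA-1 descriptor) of the note layout the fix relies on. The descriptor starts after both the header and the aligned name, so dropping sizeof(Elf32_Nhdr) lands the pointer inside the header:

#include <elf.h>
#include <stdio.h>

#define ALIGN4(x)       (((x) + 3) & ~3u)

int main(void)
{
        unsigned long note_off = 0;             /* offset of the note header */
        unsigned long name_sz = 4;              /* "GNU" plus NUL terminator */

        unsigned long name_start = note_off + sizeof(Elf32_Nhdr);
        unsigned long desc_start = name_start + ALIGN4(name_sz);
        unsigned long broken_desc = note_off + ALIGN4(name_sz); /* header size missing */

        printf("name bytes start at  %lu\n", name_start);       /* 12 */
        printf("descriptor starts at %lu\n", desc_start);       /* 16 */
        printf("broken code reads at %lu (still inside the Elf32_Nhdr)\n",
               broken_desc);                                    /* 4 */
        return 0;
}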
Acked-by: Andrii Nakryiko andrii@kernel.org Fixes: c83a80d8b84f ("lib/buildid: harden build ID parsing logic") Signed-off-by: Jiri Olsa jolsa@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/buildid.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/lib/buildid.c +++ b/lib/buildid.c @@ -40,7 +40,7 @@ static int parse_build_id_buf(unsigned c name_sz == note_name_sz && memcmp(nhdr + 1, note_name, note_name_sz) == 0 && desc_sz > 0 && desc_sz <= BUILD_ID_SIZE_MAX) { - data = note_start + note_off + ALIGN(note_name_sz, 4); + data = note_start + note_off + sizeof(Elf32_Nhdr) + ALIGN(note_name_sz, 4); memcpy(build_id, data, desc_sz); memset(build_id + desc_sz, 0, BUILD_ID_SIZE_MAX - desc_sz); if (size)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mauro Carvalho Chehab mchehab+huawei@kernel.org
commit a4aebaf6e6efff548b01a3dc49b4b9074751c15b upstream.
When CONFIG_DVB_DYNAMIC_MINORS is not set, ret is not initialized, and a semaphore is left in the wrong state in case of errors.
Make the code simpler and avoid mistakes by having just one error-check logic path, used whether DVB_DYNAMIC_MINORS is set or not.
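A standalone sketch (not drivers/media code; names and the MAX_DVB_MINORS value are illustrative) of the resulting control flow, where the minor is picked per build option but a single bounds check and error path cover both cases:

#include <stdio.h>

#define MAX_DVB_MINORS  4

static int pick_minor(int dynamic, const int *minors_in_use, int wanted)
{
        int minor;

        if (dynamic) {
                for (minor = 0; minor < MAX_DVB_MINORS; minor++)
                        if (!minors_in_use[minor])
                                break;
        } else {
                minor = wanted;         /* stands in for nums2minor() */
        }

        /* One shared check, so neither branch can return an uninitialized
         * error code or leave locks in the wrong state. */
        if (minor >= MAX_DVB_MINORS)
                return -1;
        return minor;
}

int main(void)
{
        int in_use[MAX_DVB_MINORS] = { 1, 1, 1, 1 };

        printf("dynamic, table full: %d\n", pick_minor(1, in_use, 0));
        printf("static, minor too large: %d\n", pick_minor(0, in_use, 9));
        in_use[2] = 0;
        printf("dynamic, slot free: %d\n", pick_minor(1, in_use, 0));
        return 0;
}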
Reported-by: kernel test robot lkp@intel.com Reported-by: Dan Carpenter dan.carpenter@linaro.org Closes: https://lore.kernel.org/r/202410201717.ULWWdJv8-lkp@intel.com/ Signed-off-by: Mauro Carvalho Chehab mchehab+huawei@kernel.org Link: https://lore.kernel.org/r/9e067488d8935b8cf00959764a1fa5de85d65725.173092625... Cc: Nathan Chancellor nathan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/dvb-core/dvbdev.c | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-)
--- a/drivers/media/dvb-core/dvbdev.c +++ b/drivers/media/dvb-core/dvbdev.c @@ -530,6 +530,9 @@ int dvb_register_device(struct dvb_adapt for (minor = 0; minor < MAX_DVB_MINORS; minor++) if (!dvb_minors[minor]) break; +#else + minor = nums2minor(adap->num, type, id); +#endif if (minor >= MAX_DVB_MINORS) { if (new_node) { list_del(&new_node->list_head); @@ -543,17 +546,7 @@ int dvb_register_device(struct dvb_adapt mutex_unlock(&dvbdev_register_lock); return -EINVAL; } -#else - minor = nums2minor(adap->num, type, id); - if (minor >= MAX_DVB_MINORS) { - dvb_media_device_free(dvbdev); - list_del(&dvbdev->list_head); - kfree(dvbdev); - *pdvbdev = NULL; - mutex_unlock(&dvbdev_register_lock); - return ret; - } -#endif + dvbdev->minor = minor; dvb_minors[minor] = dvb_device_get(dvbdev); up_write(&minor_rwsem);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 15d1975b7279693d6f09398e0e2e31aca2310275 ]
Prepare for adding server copy trace points.
Signed-off-by: Dai Ngo dai.ngo@oracle.com Tested-by: Chen Hanxiao chenhx.fnst@fujitsu.com Stable-dep-of: 9ed666eba4e0 ("NFSD: Async COPY result needs to return a write verifier") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfs4proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1798,6 +1798,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struc __be32 status; struct nfsd4_copy *async_copy = NULL;
+ copy->cp_clp = cstate->clp; if (nfsd4_ssc_is_inter(copy)) { if (!inter_copy_offload_enable || nfsd4_copy_is_sync(copy)) { status = nfserr_notsupp; @@ -1812,7 +1813,6 @@ nfsd4_copy(struct svc_rqst *rqstp, struc return status; }
- copy->cp_clp = cstate->clp; memcpy(©->fh, &cstate->current_fh.fh_handle, sizeof(struct knfsd_fh)); if (nfsd4_copy_is_async(copy)) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 9ed666eba4e0a2bb8ffaa3739d830b64d4f2aaad ]
Currently, when NFSD handles an asynchronous COPY, it returns a zero write verifier, relying on the subsequent CB_OFFLOAD callback to pass the write verifier and a stable_how4 value to the client.
However, if the CB_OFFLOAD never arrives at the client (for example, if a network partition occurs just as the server sends the CB_OFFLOAD operation), the client will never receive this verifier. Thus, if the client sends a follow-up COMMIT, there is no way for the client to assess the COMMIT result.
The usual recovery for a missing CB_OFFLOAD is for the client to send an OFFLOAD_STATUS operation, but that operation does not carry a write verifier in its result. Neither does it carry a stable_how4 value, so the client /must/ send a COMMIT in this case -- which will always fail because currently there's still no write verifier in the COPY result.
Thus the server needs to return a normal write verifier in its COPY result even if the COPY operation is to be performed asynchronously.
If the server recognizes the callback stateid in subsequent OFFLOAD_STATUS operations, then obviously it has not restarted, and the write verifier the client received in the COPY result is still valid and can be used to assess a COMMIT of the copied data, if one is needed.
Reviewed-by: Jeff Layton jlayton@kernel.org [ cel: adjusted to apply to origin/linux-6.6.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfs4proc.c | 23 ++++++++--------------- 1 file changed, 8 insertions(+), 15 deletions(-)
--- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -751,15 +751,6 @@ nfsd4_access(struct svc_rqst *rqstp, str &access->ac_supported); }
-static void gen_boot_verifier(nfs4_verifier *verifier, struct net *net) -{ - __be32 *verf = (__be32 *)verifier->data; - - BUILD_BUG_ON(2*sizeof(*verf) != sizeof(verifier->data)); - - nfsd_copy_write_verifier(verf, net_generic(net, nfsd_net_id)); -} - static __be32 nfsd4_commit(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) @@ -1623,7 +1614,6 @@ static void nfsd4_init_copy_res(struct n test_bit(NFSD4_COPY_F_COMMITTED, ©->cp_flags) ? NFS_FILE_SYNC : NFS_UNSTABLE; nfsd4_copy_set_sync(copy, sync); - gen_boot_verifier(©->cp_res.wr_verifier, copy->cp_clp->net); }
static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy, @@ -1794,9 +1784,14 @@ static __be32 nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { + struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); + struct nfsd4_copy *async_copy = NULL; struct nfsd4_copy *copy = &u->copy; + struct nfsd42_write_res *result; __be32 status; - struct nfsd4_copy *async_copy = NULL; + + result = ©->cp_res; + nfsd_copy_write_verifier((__be32 *)&result->wr_verifier.data, nn);
copy->cp_clp = cstate->clp; if (nfsd4_ssc_is_inter(copy)) { @@ -1816,8 +1811,6 @@ nfsd4_copy(struct svc_rqst *rqstp, struc memcpy(©->fh, &cstate->current_fh.fh_handle, sizeof(struct knfsd_fh)); if (nfsd4_copy_is_async(copy)) { - struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); - status = nfserrno(-ENOMEM); async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); if (!async_copy) @@ -1829,8 +1822,8 @@ nfsd4_copy(struct svc_rqst *rqstp, struc goto out_err; if (!nfs4_init_copy_state(nn, copy)) goto out_err; - memcpy(©->cp_res.cb_stateid, ©->cp_stateid.cs_stid, - sizeof(copy->cp_res.cb_stateid)); + memcpy(&result->cb_stateid, ©->cp_stateid.cs_stid, + sizeof(result->cb_stateid)); dup_copy_fields(copy, async_copy); async_copy->copy_task = kthread_create(nfsd4_do_async_copy, async_copy, "%s", "copy thread");
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit aadc3bbea163b6caaaebfdd2b6c4667fbc726752 ]
Nothing appears to limit the number of concurrent async COPY operations that clients can start. In addition, AFAICT each async COPY can copy an unlimited number of 4MB chunks, so can run for a long time. Thus IMO async COPY can become a DoS vector.
Add a restriction mechanism that bounds the number of concurrent background COPY operations. Start simple and try to be fair -- this patch implements a per-namespace limit.
An async COPY request that occurs while this limit is exceeded gets NFS4ERR_DELAY. The requesting client can choose to send the request again after a delay or fall back to a traditional read/write style copy.
If there is need to make the mechanism more sophisticated, we can visit that in future patches.
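A standalone sketch (plain userspace C, not fs/nfsd code; the cap value is illustrative) of the admission-control pattern this patch adds: optimistically bump an atomic counter, then undo the bump and report "delay" when the cap is exceeded.

#include <stdatomic.h>
#include <stdio.h>

static atomic_int pending_async_copies = 0;

static int start_async_copy(int cap)
{
        if (atomic_fetch_add(&pending_async_copies, 1) + 1 > cap) {
                atomic_fetch_sub(&pending_async_copies, 1);
                return -1;              /* caller maps this to NFS4ERR_DELAY */
        }
        return 0;
}

static void finish_async_copy(void)
{
        atomic_fetch_sub(&pending_async_copies, 1);
}

int main(void)
{
        int cap = 2;    /* stands in for rqstp->rq_pool->sp_nrthreads */

        for (int i = 0; i < 4; i++)
                printf("copy %d: %s\n", i,
                       start_async_copy(cap) ? "delayed" : "started");
        finish_async_copy();
        finish_async_copy();
        return 0;
}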
Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton jlayton@kernel.org Link: https://nvd.nist.gov/vuln/detail/CVE-2024-49974 Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/netns.h | 1 + fs/nfsd/nfs4proc.c | 11 +++++++++-- fs/nfsd/nfs4state.c | 1 + fs/nfsd/xdr4.h | 1 + 4 files changed, 12 insertions(+), 2 deletions(-)
--- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -153,6 +153,7 @@ struct nfsd_net { u32 s2s_cp_cl_id; struct idr s2s_cp_stateids; spinlock_t s2s_cp_lock; + atomic_t pending_async_copies;
/* * Version information --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1273,6 +1273,7 @@ static void nfs4_put_copy(struct nfsd4_c { if (!refcount_dec_and_test(©->refcount)) return; + atomic_dec(©->cp_nn->pending_async_copies); kfree(copy->cp_src); kfree(copy); } @@ -1811,10 +1812,16 @@ nfsd4_copy(struct svc_rqst *rqstp, struc memcpy(©->fh, &cstate->current_fh.fh_handle, sizeof(struct knfsd_fh)); if (nfsd4_copy_is_async(copy)) { - status = nfserrno(-ENOMEM); async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); if (!async_copy) goto out_err; + async_copy->cp_nn = nn; + /* Arbitrary cap on number of pending async copy operations */ + if (atomic_inc_return(&nn->pending_async_copies) > + (int)rqstp->rq_pool->sp_nrthreads) { + atomic_dec(&nn->pending_async_copies); + goto out_err; + } INIT_LIST_HEAD(&async_copy->copies); refcount_set(&async_copy->refcount, 1); async_copy->cp_src = kmalloc(sizeof(*async_copy->cp_src), GFP_KERNEL); @@ -1853,7 +1860,7 @@ out_err: } if (async_copy) cleanup_async_copy(async_copy); - status = nfserrno(-ENOMEM); + status = nfserr_jukebox; goto out; }
--- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -8142,6 +8142,7 @@ static int nfs4_state_create_net(struct spin_lock_init(&nn->client_lock); spin_lock_init(&nn->s2s_cp_lock); idr_init(&nn->s2s_cp_stateids); + atomic_set(&nn->pending_async_copies, 0);
spin_lock_init(&nn->blocked_locks_lock); INIT_LIST_HEAD(&nn->blocked_locks_lru); --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -574,6 +574,7 @@ struct nfsd4_copy { struct nfsd4_ssc_umount_item *ss_nsui; struct nfs_fh c_fh; nfs4_stateid stateid; + struct nfsd_net *cp_nn; };
static inline void nfsd4_copy_set_sync(struct nfsd4_copy *copy, bool sync)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 63fab04cbd0f96191b6e5beedc3b643b01c15889 ]
Ensure the refcount and async_copies fields are initialized early. cleanup_async_copy() will reference these fields if an error occurs in nfsd4_copy(). If they are not correctly initialized, at the very least, a refcount underflow occurs.
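A standalone sketch (not fs/nfsd code; the structure is made up) of why the ordering matters: releasing an object whose refcount was never initialized underflows the counter, while initializing it immediately after allocation keeps the error path safe.

#include <stdio.h>
#include <stdlib.h>

struct async_copy {
        int refcount;
        int initialized;
};

static void put_copy(struct async_copy *copy)
{
        copy->refcount--;       /* the kernel's refcount_dec_and_test() would WARN here */
        if (copy->refcount <= 0) {
                printf("releasing copy (refcount now %d%s)\n", copy->refcount,
                       copy->initialized ? "" : ", UNDERFLOW");
                free(copy);
        }
}

int main(void)
{
        /* Error path before the fix: cleanup runs before the refcount is set. */
        struct async_copy *early_err = calloc(1, sizeof(*early_err));
        put_copy(early_err);                    /* 0 -> -1: underflow */

        /* With the fix: the refcount is initialized right after allocation. */
        struct async_copy *ok = calloc(1, sizeof(*ok));
        ok->refcount = 1;
        ok->initialized = 1;
        put_copy(ok);                           /* 1 -> 0: normal release */
        return 0;
}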
Reported-by: Olga Kornievskaia okorniev@redhat.com Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") Reviewed-by: Jeff Layton jlayton@kernel.org Tested-by: Olga Kornievskaia okorniev@redhat.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfs4proc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1816,14 +1816,14 @@ nfsd4_copy(struct svc_rqst *rqstp, struc if (!async_copy) goto out_err; async_copy->cp_nn = nn; + INIT_LIST_HEAD(&async_copy->copies); + refcount_set(&async_copy->refcount, 1); /* Arbitrary cap on number of pending async copy operations */ if (atomic_inc_return(&nn->pending_async_copies) > (int)rqstp->rq_pool->sp_nrthreads) { atomic_dec(&nn->pending_async_copies); goto out_err; } - INIT_LIST_HEAD(&async_copy->copies); - refcount_set(&async_copy->refcount, 1); async_copy->cp_src = kmalloc(sizeof(*async_copy->cp_src), GFP_KERNEL); if (!async_copy->cp_src) goto out_err;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 8286f8b622990194207df9ab852e0f87c60d35e9 ]
The error flow in nfsd4_copy() calls cleanup_async_copy(), which already decrements nn->pending_async_copies.
Reported-by: Olga Kornievskaia okorniev@redhat.com Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfs4proc.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1820,10 +1820,8 @@ nfsd4_copy(struct svc_rqst *rqstp, struc refcount_set(&async_copy->refcount, 1); /* Arbitrary cap on number of pending async copy operations */ if (atomic_inc_return(&nn->pending_async_copies) > - (int)rqstp->rq_pool->sp_nrthreads) { - atomic_dec(&nn->pending_async_copies); + (int)rqstp->rq_pool->sp_nrthreads) goto out_err; - } async_copy->cp_src = kmalloc(sizeof(*async_copy->cp_src), GFP_KERNEL); if (!async_copy->cp_src) goto out_err;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang geliang.tang@suse.com
commit 14cb0e0bf39bd10429ba14e9e2f905f1144226fc upstream.
'(struct sock *)msk' is used several times in mptcp_nl_cmd_announce(), mptcp_nl_cmd_remove() and mptcp_userspace_pm_set_flags() in pm_userspace.c; it's worth adding a local variable sk to point to it.
Reviewed-by: Matthieu Baerts matttbe@kernel.org Signed-off-by: Geliang Tang geliang.tang@suse.com Signed-off-by: Mat Martineau martineau@kernel.org Link: https://lore.kernel.org/r/20231025-send-net-next-20231025-v1-8-db8f25f798eb@... Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 06afe09091ee ("mptcp: add userspace_pm_lookup_addr_by_id helper") Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_userspace.c | 31 ++++++++++++++++++++----------- 1 file changed, 20 insertions(+), 11 deletions(-)
--- a/net/mptcp/pm_userspace.c +++ b/net/mptcp/pm_userspace.c @@ -183,6 +183,7 @@ int mptcp_nl_cmd_announce(struct sk_buff struct mptcp_pm_addr_entry addr_val; struct mptcp_sock *msk; int err = -EINVAL; + struct sock *sk; u32 token_val;
if (!addr || !token) { @@ -198,6 +199,8 @@ int mptcp_nl_cmd_announce(struct sk_buff return err; }
+ sk = (struct sock *)msk; + if (!mptcp_pm_is_userspace(msk)) { GENL_SET_ERR_MSG(info, "invalid request; userspace PM not selected"); goto announce_err; @@ -221,7 +224,7 @@ int mptcp_nl_cmd_announce(struct sk_buff goto announce_err; }
- lock_sock((struct sock *)msk); + lock_sock(sk); spin_lock_bh(&msk->pm.lock);
if (mptcp_pm_alloc_anno_list(msk, &addr_val.addr)) { @@ -231,11 +234,11 @@ int mptcp_nl_cmd_announce(struct sk_buff }
spin_unlock_bh(&msk->pm.lock); - release_sock((struct sock *)msk); + release_sock(sk);
err = 0; announce_err: - sock_put((struct sock *)msk); + sock_put(sk); return err; }
@@ -282,6 +285,7 @@ int mptcp_nl_cmd_remove(struct sk_buff * struct mptcp_sock *msk; LIST_HEAD(free_list); int err = -EINVAL; + struct sock *sk; u32 token_val; u8 id_val;
@@ -299,6 +303,8 @@ int mptcp_nl_cmd_remove(struct sk_buff * return err; }
+ sk = (struct sock *)msk; + if (!mptcp_pm_is_userspace(msk)) { GENL_SET_ERR_MSG(info, "invalid request; userspace PM not selected"); goto remove_err; @@ -309,7 +315,7 @@ int mptcp_nl_cmd_remove(struct sk_buff * goto remove_err; }
- lock_sock((struct sock *)msk); + lock_sock(sk);
list_for_each_entry(entry, &msk->pm.userspace_pm_local_addr_list, list) { if (entry->addr.id == id_val) { @@ -320,7 +326,7 @@ int mptcp_nl_cmd_remove(struct sk_buff *
if (!match) { GENL_SET_ERR_MSG(info, "address with specified id not found"); - release_sock((struct sock *)msk); + release_sock(sk); goto remove_err; }
@@ -328,15 +334,15 @@ int mptcp_nl_cmd_remove(struct sk_buff *
mptcp_pm_remove_addrs(msk, &free_list);
- release_sock((struct sock *)msk); + release_sock(sk);
list_for_each_entry_safe(match, entry, &free_list, list) { - sock_kfree_s((struct sock *)msk, match, sizeof(*match)); + sock_kfree_s(sk, match, sizeof(*match)); }
err = 0; remove_err: - sock_put((struct sock *)msk); + sock_put(sk); return err; }
@@ -558,6 +564,7 @@ int mptcp_userspace_pm_set_flags(struct { struct mptcp_sock *msk; int ret = -EINVAL; + struct sock *sk; u32 token_val;
token_val = nla_get_u32(token); @@ -566,6 +573,8 @@ int mptcp_userspace_pm_set_flags(struct if (!msk) return ret;
+ sk = (struct sock *)msk; + if (!mptcp_pm_is_userspace(msk)) goto set_flags_err;
@@ -573,11 +582,11 @@ int mptcp_userspace_pm_set_flags(struct rem->addr.family == AF_UNSPEC) goto set_flags_err;
- lock_sock((struct sock *)msk); + lock_sock(sk); ret = mptcp_pm_nl_mp_prio_send_ack(msk, &loc->addr, &rem->addr, bkup); - release_sock((struct sock *)msk); + release_sock(sk);
set_flags_err: - sock_put((struct sock *)msk); + sock_put(sk); return ret; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang tanggeliang@kylinos.cn
commit 06afe09091ee69dc7ab058b4be9917ae59cc81e5 upstream.
Corresponding to the __lookup_addr_by_id() helper in the in-kernel netlink PM, this patch adds a new helper mptcp_userspace_pm_lookup_addr_by_id() to look up the address entry with the given id on the userspace PM local address list.
Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: f642c5c4d528 ("mptcp: hold pm lock when deleting entry") Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_userspace.c | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-)
--- a/net/mptcp/pm_userspace.c +++ b/net/mptcp/pm_userspace.c @@ -107,19 +107,26 @@ static int mptcp_userspace_pm_delete_loc return -EINVAL; }
+static struct mptcp_pm_addr_entry * +mptcp_userspace_pm_lookup_addr_by_id(struct mptcp_sock *msk, unsigned int id) +{ + struct mptcp_pm_addr_entry *entry; + + list_for_each_entry(entry, &msk->pm.userspace_pm_local_addr_list, list) { + if (entry->addr.id == id) + return entry; + } + return NULL; +} + int mptcp_userspace_pm_get_flags_and_ifindex_by_id(struct mptcp_sock *msk, unsigned int id, u8 *flags, int *ifindex) { - struct mptcp_pm_addr_entry *entry, *match = NULL; + struct mptcp_pm_addr_entry *match;
spin_lock_bh(&msk->pm.lock); - list_for_each_entry(entry, &msk->pm.userspace_pm_local_addr_list, list) { - if (id == entry->addr.id) { - match = entry; - break; - } - } + match = mptcp_userspace_pm_lookup_addr_by_id(msk, id); spin_unlock_bh(&msk->pm.lock); if (match) { *flags = match->flags; @@ -280,7 +287,7 @@ int mptcp_nl_cmd_remove(struct sk_buff * { struct nlattr *token = info->attrs[MPTCP_PM_ATTR_TOKEN]; struct nlattr *id = info->attrs[MPTCP_PM_ATTR_LOC_ID]; - struct mptcp_pm_addr_entry *match = NULL; + struct mptcp_pm_addr_entry *match; struct mptcp_pm_addr_entry *entry; struct mptcp_sock *msk; LIST_HEAD(free_list); @@ -317,13 +324,7 @@ int mptcp_nl_cmd_remove(struct sk_buff *
lock_sock(sk);
- list_for_each_entry(entry, &msk->pm.userspace_pm_local_addr_list, list) { - if (entry->addr.id == id_val) { - match = entry; - break; - } - } - + match = mptcp_userspace_pm_lookup_addr_by_id(msk, id_val); if (!match) { GENL_SET_ERR_MSG(info, "address with specified id not found"); release_sock(sk);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang tanggeliang@kylinos.cn
commit e0266319413d5d687ba7b6df7ca99e4b9724a4f2 upstream.
Just like the in-kernel PM, when the userspace PM does set_flags, it needs to send out an MP_PRIO signal and also modify the flags of the corresponding address entry in the local address list. This patch implements the missing logic.
Traverse all address entries on userspace_pm_local_addr_list to find the local address entry. If bkup is true, set the FLAG_BACKUP flag of this entry; otherwise, clear it.
Fixes: 892f396c8e68 ("mptcp: netlink: issue MP_PRIO signals from userspace PMs") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20241112-net-mptcp-misc-6-12-pm-v1-1-b835580cefa8@k... Signed-off-by: Jakub Kicinski kuba@kernel.org [ Conflicts in pm_userspace.c, because commit 6a42477fe449 ("mptcp: update set_flags interfaces"), is not in this version, and causes too many conflicts when backporting it. The same code can still be added at the same place, before sending the ACK. ] Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_userspace.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/net/mptcp/pm_userspace.c +++ b/net/mptcp/pm_userspace.c @@ -563,6 +563,7 @@ int mptcp_userspace_pm_set_flags(struct struct mptcp_pm_addr_entry *loc, struct mptcp_pm_addr_entry *rem, u8 bkup) { + struct mptcp_pm_addr_entry *entry; struct mptcp_sock *msk; int ret = -EINVAL; struct sock *sk; @@ -583,6 +584,17 @@ int mptcp_userspace_pm_set_flags(struct rem->addr.family == AF_UNSPEC) goto set_flags_err;
+ spin_lock_bh(&msk->pm.lock); + list_for_each_entry(entry, &msk->pm.userspace_pm_local_addr_list, list) { + if (mptcp_addresses_equal(&entry->addr, &loc->addr, false)) { + if (bkup) + entry->flags |= MPTCP_PM_ADDR_FLAG_BACKUP; + else + entry->flags &= ~MPTCP_PM_ADDR_FLAG_BACKUP; + } + } + spin_unlock_bh(&msk->pm.lock); + lock_sock(sk); ret = mptcp_pm_nl_mp_prio_send_ack(msk, &loc->addr, &rem->addr, bkup); release_sock(sk);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang tanggeliang@kylinos.cn
commit f642c5c4d528d11bd78b6c6f84f541cd3c0bea86 upstream.
When traversing userspace_pm_local_addr_list and deleting an entry from it in mptcp_pm_nl_remove_doit(), msk->pm.lock should be held.
This patch holds this lock before mptcp_userspace_pm_lookup_addr_by_id() and releases it after list_move() in mptcp_pm_nl_remove_doit().
Fixes: d9a4594edabf ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20241112-net-mptcp-misc-6-12-pm-v1-2-b835580cefa8@k... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_userspace.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/mptcp/pm_userspace.c +++ b/net/mptcp/pm_userspace.c @@ -324,14 +324,17 @@ int mptcp_nl_cmd_remove(struct sk_buff *
lock_sock(sk);
+ spin_lock_bh(&msk->pm.lock); match = mptcp_userspace_pm_lookup_addr_by_id(msk, id_val); if (!match) { GENL_SET_ERR_MSG(info, "address with specified id not found"); + spin_unlock_bh(&msk->pm.lock); release_sock(sk); goto remove_err; }
list_move(&match->list, &free_list); + spin_unlock_bh(&msk->pm.lock);
mptcp_pm_remove_addrs(msk, &free_list);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang tanggeliang@kylinos.cn
commit af250c27ea1c404e210fc3a308b20f772df584d6 upstream.
When the lookup_by_id parameter of __lookup_addr() is true, it behaves the same as __lookup_addr_by_id() and can be replaced by __lookup_addr_by_id() directly. So drop this parameter and let __lookup_addr() only look up an address on the local address list by comparing addresses in it, not address ids.
Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240305-upstream-net-next-20240304-mptcp-misc-cle... Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: db3eab8110bc ("mptcp: pm: use _rcu variant under rcu_read_lock") [ Conflicts in pm_netlink.c, because commit 6a42477fe449 ("mptcp: update set_flags interfaces") is not in this version, and causes too many conflicts when backporting it. The conflict is easy to resolve: addr is a pointer here in mptcp_pm_nl_set_flags(); the rest of the code is the same. ] Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -521,15 +521,12 @@ __lookup_addr_by_id(struct pm_nl_pernet }
static struct mptcp_pm_addr_entry * -__lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info, - bool lookup_by_id) +__lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info) { struct mptcp_pm_addr_entry *entry;
list_for_each_entry(entry, &pernet->local_addr_list, list) { - if ((!lookup_by_id && - mptcp_addresses_equal(&entry->addr, info, entry->addr.port)) || - (lookup_by_id && entry->addr.id == info->id)) + if (mptcp_addresses_equal(&entry->addr, info, entry->addr.port)) return entry; } return NULL; @@ -560,7 +557,7 @@ static void mptcp_pm_create_subflow_or_s
mptcp_local_address((struct sock_common *)msk->first, &mpc_addr); rcu_read_lock(); - entry = __lookup_addr(pernet, &mpc_addr, false); + entry = __lookup_addr(pernet, &mpc_addr); if (entry) { __clear_bit(entry->addr.id, msk->pm.id_avail_bitmap); msk->mpc_endpoint_id = entry->addr.id; @@ -2064,7 +2061,8 @@ int mptcp_pm_nl_set_flags(struct net *ne }
spin_lock_bh(&pernet->lock); - entry = __lookup_addr(pernet, &addr->addr, lookup_by_id); + entry = lookup_by_id ? __lookup_addr_by_id(pernet, addr->addr.id) : + __lookup_addr(pernet, &addr->addr); if (!entry) { spin_unlock_bh(&pernet->lock); return -EINVAL;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Matthieu Baerts (NGI0)" matttbe@kernel.org
commit db3eab8110bc0520416101b6a5b52f44a43fb4cf upstream.
In mptcp_pm_create_subflow_or_signal_addr(), rcu_read_(un)lock() are used as expected to iterate over the list of local addresses, but list_for_each_entry() was used instead of list_for_each_entry_rcu() in __lookup_addr(). It is important to use this variant which adds the required READ_ONCE() (and diagnostic checks if enabled).
Because __lookup_addr() is also used in mptcp_pm_nl_set_flags() where it is called under the pernet->lock and not rcu_read_lock(), an extra condition is then passed to help the diagnostic checks making sure either the associated spin lock or the RCU lock is held.
Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20241112-net-mptcp-misc-6-12-pm-v1-3-b835580cefa8@k... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -525,7 +525,8 @@ __lookup_addr(struct pm_nl_pernet *perne { struct mptcp_pm_addr_entry *entry;
- list_for_each_entry(entry, &pernet->local_addr_list, list) { + list_for_each_entry_rcu(entry, &pernet->local_addr_list, list, + lockdep_is_held(&pernet->lock)) { if (mptcp_addresses_equal(&entry->addr, info, entry->addr.port)) return entry; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tvrtko Ursulin tvrtko.ursulin@igalia.com
commit 4aa923a6e6406b43566ef6ac35a3d9a3197fa3e8 upstream.
KASAN reports that the GPU metrics table allocated in vangogh_tables_init() is not large enough for the memset done in smu_cmn_init_soft_gpu_metrics(). Condensed report follows:
[ 33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu] [ 33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067 ... [ 33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G W 6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544 [ 33.861816] Tainted: [W]=WARN [ 33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023 [ 33.861822] Call Trace: [ 33.861826] <TASK> [ 33.861829] dump_stack_lvl+0x66/0x90 [ 33.861838] print_report+0xce/0x620 [ 33.861853] kasan_report+0xda/0x110 [ 33.862794] kasan_check_range+0xfd/0x1a0 [ 33.862799] __asan_memset+0x23/0x40 [ 33.862803] smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.863306] vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.864257] vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.865682] amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.866160] amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.867135] dev_attr_show+0x43/0xc0 [ 33.867147] sysfs_kf_seq_show+0x1f1/0x3b0 [ 33.867155] seq_read_iter+0x3f8/0x1140 [ 33.867173] vfs_read+0x76c/0xc50 [ 33.867198] ksys_read+0xfb/0x1d0 [ 33.867214] do_syscall_64+0x90/0x160 ... [ 33.867353] Allocated by task 378 on cpu 7 at 22.794876s: [ 33.867358] kasan_save_stack+0x33/0x50 [ 33.867364] kasan_save_track+0x17/0x60 [ 33.867367] __kasan_kmalloc+0x87/0x90 [ 33.867371] vangogh_init_smc_tables+0x3f9/0x840 [amdgpu] [ 33.867835] smu_sw_init+0xa32/0x1850 [amdgpu] [ 33.868299] amdgpu_device_init+0x467b/0x8d90 [amdgpu] [ 33.868733] amdgpu_driver_load_kms+0x19/0xf0 [amdgpu] [ 33.869167] amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu] [ 33.869608] local_pci_probe+0xda/0x180 [ 33.869614] pci_device_probe+0x43f/0x6b0
Empirically we can confirm that the former allocates 152 bytes for the table, while the latter memsets a 168-byte block.
The root cause appears to be that when GPU metrics tables for v2_4 parts were added, enlarging the allocated table to fit was not considered.
The fix in this patch is rather "brute force" and perhaps should later be done in a smarter way, by extracting and consolidating the part-version-to-size logic into a common helper, instead of brute-forcing the largest possible allocation. Nevertheless, for now this works and fixes the out-of-bounds write.
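A standalone sketch (not driver code; the struct layouts are invented, only the sizing idea matches the patch) of the approach: allocate for the largest metrics layout that may be written, so a v2_4-sized memset can no longer overflow.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct metrics_v2_2 { unsigned char payload[120]; };
struct metrics_v2_3 { unsigned char payload[152]; };
struct metrics_v2_4 { unsigned char payload[168]; };

#define MAX(a, b)       ((a) > (b) ? (a) : (b))

int main(void)
{
        size_t size = sizeof(struct metrics_v2_2);

        size = MAX(size, sizeof(struct metrics_v2_3));
        size = MAX(size, sizeof(struct metrics_v2_4));

        void *table = calloc(1, size);
        if (!table)
                return 1;

        /* A v2_4 writer can now memset the full 168 bytes without overflow. */
        memset(table, 0, sizeof(struct metrics_v2_4));
        printf("allocated %zu bytes for the metrics table\n", size);
        free(table);
        return 0;
}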
v2: * Drop impossible v3_0 case. (Mario)
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@igalia.com Fixes: 41cec40bc9ba ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics") Cc: Mario Limonciello mario.limonciello@amd.com Cc: Evan Quan evan.quan@amd.com Cc: Wenyou Yang WenYou.Yang@amd.com Cc: Alex Deucher alexander.deucher@amd.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 0880f58f9609f0200483a49429af0f050d281703) Cc: stable@vger.kernel.org # v6.6+ Signed-off-by: Bin Lan bin.lan.cn@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c @@ -256,10 +256,9 @@ static int vangogh_tables_init(struct sm goto err0_out; smu_table->metrics_time = 0;
- if (smu_version >= 0x043F3E00) - smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v2_3); - else - smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v2_2); + smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v2_2); + smu_table->gpu_metrics_table_size = max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_3)); + smu_table->gpu_metrics_table_size = max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_4)); smu_table->gpu_metrics_table = kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL); if (!smu_table->gpu_metrics_table) goto err1_out;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Van Hensbergen ericvh@kernel.org
[ Upstream commit 6630036b7c228f57c7893ee0403e92c2db2cd21d ]
If an iget fails because information cannot be retrieved from the server, the inode structure is only partially initialized. When the inode gets evicted, references to uninitialized structures (like fscache cookies) were being made.
This patch checks for a bad inode before doing anything other than clearing the inode from the cache. Since the inode is bad, it shouldn't have any state associated with it that needs to be written back (and there really isn't a way to complete those writes anyway).
Reported-by: syzbot+eb83fe1cce5833cd66a0@syzkaller.appspotmail.com Signed-off-by: Eric Van Hensbergen ericvh@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org [Xiangyu: CVE-2024-36923 Minor conflict resolution ] Signed-off-by: Xiangyu Chen xiangyu.chen@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/9p/vfs_inode.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
--- a/fs/9p/vfs_inode.c +++ b/fs/9p/vfs_inode.c @@ -374,20 +374,23 @@ void v9fs_evict_inode(struct inode *inod struct v9fs_inode __maybe_unused *v9inode = V9FS_I(inode); __le32 __maybe_unused version;
- truncate_inode_pages_final(&inode->i_data); + if (!is_bad_inode(inode)) { + truncate_inode_pages_final(&inode->i_data);
#ifdef CONFIG_9P_FSCACHE - version = cpu_to_le32(v9inode->qid.version); - fscache_clear_inode_writeback(v9fs_inode_cookie(v9inode), inode, + version = cpu_to_le32(v9inode->qid.version); + fscache_clear_inode_writeback(v9fs_inode_cookie(v9inode), inode, &version); #endif - - clear_inode(inode); - filemap_fdatawrite(&inode->i_data); + clear_inode(inode); + filemap_fdatawrite(&inode->i_data);
#ifdef CONFIG_9P_FSCACHE - fscache_relinquish_cookie(v9fs_inode_cookie(v9inode), false); + if (v9fs_inode_cookie(v9inode)) + fscache_relinquish_cookie(v9fs_inode_cookie(v9inode), false); #endif + } else + clear_inode(inode); }
static int v9fs_test_inode(struct inode *inode, void *data)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: George Stark gnstark@salutedevices.com
commit efc347b9efee1c2b081f5281d33be4559fa50a16 upstream.
In this driver, LEDs are registered using devm_led_classdev_register(), so they are automatically unregistered after the module's remove() is done. led_classdev_unregister() calls the module's led_set_brightness() to turn off the LEDs, and that callback uses a mutex which was already destroyed in the module's remove(), so use the devm API instead.
Signed-off-by: George Stark gnstark@salutedevices.com Reviewed-by: Andy Shevchenko andy.shevchenko@gmail.com Link: https://lore.kernel.org/r/20240411161032.609544-8-gnstark@salutedevices.com Signed-off-by: Lee Jones lee@kernel.org [ Resolve minor conflicts to fix CVE-2024-42129 ] Signed-off-by: Bin Lan bin.lan.cn@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/leds/leds-mlxreg.c | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-)
--- a/drivers/leds/leds-mlxreg.c +++ b/drivers/leds/leds-mlxreg.c @@ -257,6 +257,7 @@ static int mlxreg_led_probe(struct platf { struct mlxreg_core_platform_data *led_pdata; struct mlxreg_led_priv_data *priv; + int err;
led_pdata = dev_get_platdata(&pdev->dev); if (!led_pdata) { @@ -268,28 +269,21 @@ static int mlxreg_led_probe(struct platf if (!priv) return -ENOMEM;
- mutex_init(&priv->access_lock); + err = devm_mutex_init(&pdev->dev, &priv->access_lock); + if (err) + return err; + priv->pdev = pdev; priv->pdata = led_pdata;
return mlxreg_led_config(priv); }
-static int mlxreg_led_remove(struct platform_device *pdev) -{ - struct mlxreg_led_priv_data *priv = dev_get_drvdata(&pdev->dev); - - mutex_destroy(&priv->access_lock); - - return 0; -} - static struct platform_driver mlxreg_led_driver = { .driver = { .name = "leds-mlxreg", }, .probe = mlxreg_led_probe, - .remove = mlxreg_led_remove, };
module_platform_driver(mlxreg_led_driver);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lorenzo Stoakes lorenzo.stoakes@oracle.com
[ Upstream commit 3dd6ed34ce1f2356a77fb88edafb5ec96784e3cf ]
Patch series "fix error handling in mmap_region() and refactor (hotfixes)", v4.
mmap_region() is somewhat terrifying, with spaghetti-like control flow and numerous means by which issues can arise and incomplete state, memory leaks and other unpleasantness can occur.
A large amount of the complexity arises from trying to handle errors late in the process of mapping a VMA, which forms the basis of recently observed issues with resource leaks and observable inconsistent state.
This series goes to great lengths to simplify how mmap_region() works and to avoid unwinding errors late on in the process of setting up the VMA for the new mapping, and equally avoids such operations occurring while the VMA is in an inconsistent state.
The patches in this series comprise the minimal changes required to resolve existing issues in mmap_region() error handling, in order that they can be hotfixed and backported. There is additionally a follow up series which goes further, separated out from the v1 series and sent and updated separately.
This patch (of 5):
After an attempted mmap() fails, we are no longer in a situation where we can safely interact with VMA hooks. This is currently not enforced, meaning that we need complicated handling to ensure we do not incorrectly call these hooks.
We can avoid the whole issue by treating the VMA as suspect the moment that the file->f_ops->mmap() function reports an error by replacing whatever VMA operations were installed with a dummy empty set of VMA operations.
We do so through a new helper function internal to mm - mmap_file() - which is both more logically named than the existing call_mmap() function and correctly isolates handling of the vm_op reassignment to mm.
All the existing invocations of call_mmap() outside of mm are ultimately nested within the call_mmap() from mm, which we now replace.
It is therefore safe to leave call_mmap() in place as a convenience function (and to avoid churn). The invokers are:
ovl_file_operations -> mmap -> ovl_mmap() -> backing_file_mmap()
coda_file_operations -> mmap -> coda_file_mmap()
shm_file_operations -> shm_mmap()
shm_file_operations_huge -> shm_mmap()
dma_buf_fops -> dma_buf_mmap_internal -> i915_dmabuf_ops -> i915_gem_dmabuf_mmap()
None of these callers interact with vm_ops or mappings in a problematic way on error, quickly exiting out.
Link: https://lkml.kernel.org/r/cover.1730224667.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/d41fd763496fd0048a962f3fd9407dc72dd4fd86.173022466... Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Reported-by: Jann Horn jannh@google.com Reviewed-by: Liam R. Howlett Liam.Howlett@oracle.com Reviewed-by: Vlastimil Babka vbabka@suse.cz Reviewed-by: Jann Horn jannh@google.com Cc: Andreas Larsson andreas@gaisler.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: David S. Miller davem@davemloft.net Cc: Helge Deller deller@gmx.de Cc: James E.J. Bottomley James.Bottomley@HansenPartnership.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mark Brown broonie@kernel.org Cc: Peter Xu peterx@redhat.com Cc: Will Deacon will@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/internal.h | 27 +++++++++++++++++++++++++++ mm/mmap.c | 4 ++-- mm/nommu.c | 4 ++-- 3 files changed, 31 insertions(+), 4 deletions(-)
--- a/mm/internal.h +++ b/mm/internal.h @@ -83,6 +83,33 @@ static inline void *folio_raw_mapping(st return (void *)(mapping & ~PAGE_MAPPING_FLAGS); }
+/* + * This is a file-backed mapping, and is about to be memory mapped - invoke its + * mmap hook and safely handle error conditions. On error, VMA hooks will be + * mutated. + * + * @file: File which backs the mapping. + * @vma: VMA which we are mapping. + * + * Returns: 0 if success, error otherwise. + */ +static inline int mmap_file(struct file *file, struct vm_area_struct *vma) +{ + int err = call_mmap(file, vma); + + if (likely(!err)) + return 0; + + /* + * OK, we tried to call the file hook for mmap(), but an error + * arose. The mapping is in an inconsistent state and we most not invoke + * any further hooks on it. + */ + vma->vm_ops = &vma_dummy_vm_ops; + + return err; +} + void __acct_reclaim_writeback(pg_data_t *pgdat, struct folio *folio, int nr_throttled); static inline void acct_reclaim_writeback(struct folio *folio) --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2779,7 +2779,7 @@ cannot_expand: }
vma->vm_file = get_file(file); - error = call_mmap(file, vma); + error = mmap_file(file, vma); if (error) goto unmap_and_free_vma;
@@ -2793,7 +2793,7 @@ cannot_expand:
vma_iter_config(&vmi, addr, end); /* - * If vm_flags changed after call_mmap(), we should try merge + * If vm_flags changed after mmap_file(), we should try merge * vma again as we may succeed this time. */ if (unlikely(vm_flags != vma->vm_flags && prev)) { --- a/mm/nommu.c +++ b/mm/nommu.c @@ -896,7 +896,7 @@ static int do_mmap_shared_file(struct vm { int ret;
- ret = call_mmap(vma->vm_file, vma); + ret = mmap_file(vma->vm_file, vma); if (ret == 0) { vma->vm_region->vm_top = vma->vm_region->vm_end; return 0; @@ -929,7 +929,7 @@ static int do_mmap_private(struct vm_are * happy. */ if (capabilities & NOMMU_MAP_DIRECT) { - ret = call_mmap(vma->vm_file, vma); + ret = mmap_file(vma->vm_file, vma); /* shouldn't return success if we're not sharing */ if (WARN_ON_ONCE(!is_nommu_shared_mapping(vma->vm_flags))) ret = -ENOSYS;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lorenzo Stoakes lorenzo.stoakes@oracle.com
[ Upstream commit 4080ef1579b2413435413988d14ac8c68e4d42c8 ]
Incorrect invocation of VMA callbacks when the VMA is no longer in a consistent state is bug prone and risky to perform.
With regard to the important vm_ops->close() callback, we have gone to great lengths to try to track whether or not we ought to close VMAs.
Rather than doing so and risking making a mistake somewhere, instead unconditionally close and reset vma->vm_ops to an empty dummy operations set with a NULL .close operator.
We introduce a new function to do so - vma_close() - and simplify existing vms logic which tracked whether we needed to close or not.
This simplifies the logic, avoids incorrect double-calling of the .close() callback and allows us to update error paths to simply call vma_close() unconditionally - making VMA closure idempotent.
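A standalone sketch (plain userspace C, not mm code; names are illustrative) of the idempotent-close pattern described above: once the real close hook has run, the ops pointer is swapped to a dummy table so any further vma_close() call is a no-op.

#include <stdio.h>
#include <stddef.h>

struct vm_ops {
        void (*close)(void *obj);
};

struct fake_vma {
        const struct vm_ops *ops;
        int close_calls;
};

static void real_close(void *obj)
{
        struct fake_vma *vma = obj;

        vma->close_calls++;
}

static const struct vm_ops real_ops = { .close = real_close };
static const struct vm_ops dummy_ops = { .close = NULL };

static void vma_close(struct fake_vma *vma)
{
        if (vma->ops && vma->ops->close) {
                vma->ops->close(vma);
                vma->ops = &dummy_ops;  /* no further hooks may run */
        }
}

int main(void)
{
        struct fake_vma vma = { .ops = &real_ops };

        vma_close(&vma);
        vma_close(&vma);        /* second call is harmless */
        printf("close hook ran %d time(s)\n", vma.close_calls);
        return 0;
}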
Link: https://lkml.kernel.org/r/28e89dda96f68c505cb6f8e9fc9b57c3e9f74b42.173022466... Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Reported-by: Jann Horn jannh@google.com Reviewed-by: Vlastimil Babka vbabka@suse.cz Reviewed-by: Liam R. Howlett Liam.Howlett@oracle.com Reviewed-by: Jann Horn jannh@google.com Cc: Andreas Larsson andreas@gaisler.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: David S. Miller davem@davemloft.net Cc: Helge Deller deller@gmx.de Cc: James E.J. Bottomley James.Bottomley@HansenPartnership.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mark Brown broonie@kernel.org Cc: Peter Xu peterx@redhat.com Cc: Will Deacon will@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/internal.h | 18 ++++++++++++++++++ mm/mmap.c | 9 +++------ mm/nommu.c | 3 +-- 3 files changed, 22 insertions(+), 8 deletions(-)
--- a/mm/internal.h +++ b/mm/internal.h @@ -110,6 +110,24 @@ static inline int mmap_file(struct file return err; }
+/* + * If the VMA has a close hook then close it, and since closing it might leave + * it in an inconsistent state which makes the use of any hooks suspect, clear + * them down by installing dummy empty hooks. + */ +static inline void vma_close(struct vm_area_struct *vma) +{ + if (vma->vm_ops && vma->vm_ops->close) { + vma->vm_ops->close(vma); + + /* + * The mapping is in an inconsistent state, and no further hooks + * may be invoked upon it. + */ + vma->vm_ops = &vma_dummy_vm_ops; + } +} + void __acct_reclaim_writeback(pg_data_t *pgdat, struct folio *folio, int nr_throttled); static inline void acct_reclaim_writeback(struct folio *folio) --- a/mm/mmap.c +++ b/mm/mmap.c @@ -137,8 +137,7 @@ void unlink_file_vma(struct vm_area_stru static void remove_vma(struct vm_area_struct *vma, bool unreachable) { might_sleep(); - if (vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); + vma_close(vma); if (vma->vm_file) fput(vma->vm_file); mpol_put(vma_policy(vma)); @@ -2899,8 +2898,7 @@ expanded: return addr;
close_and_free_vma: - if (file && vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); + vma_close(vma);
if (file || vma->vm_file) { unmap_and_free_vma: @@ -3392,8 +3390,7 @@ struct vm_area_struct *copy_vma(struct v return new_vma;
out_vma_link: - if (new_vma->vm_ops && new_vma->vm_ops->close) - new_vma->vm_ops->close(new_vma); + vma_close(new_vma);
if (new_vma->vm_file) fput(new_vma->vm_file); --- a/mm/nommu.c +++ b/mm/nommu.c @@ -600,8 +600,7 @@ static int delete_vma_from_mm(struct vm_ */ static void delete_vma(struct mm_struct *mm, struct vm_area_struct *vma) { - if (vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); + vma_close(vma); if (vma->vm_file) fput(vma->vm_file); put_nommu_region(vma->vm_region);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lorenzo Stoakes lorenzo.stoakes@oracle.com
[ Upstream commit 0fb4a7ad270b3b209e510eb9dc5b07bf02b7edaf ]
Refactor the map_deny_write_exec() to not unnecessarily require a VMA parameter but rather to accept VMA flags parameters, which allows us to use this function early in mmap_region() in a subsequent commit.
While we're here, we refactor the function to be more readable and add some additional documentation.
Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.173022466... Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Reported-by: Jann Horn jannh@google.com Reviewed-by: Liam R. Howlett Liam.Howlett@oracle.com Reviewed-by: Vlastimil Babka vbabka@suse.cz Reviewed-by: Jann Horn jannh@google.com Cc: Andreas Larsson andreas@gaisler.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: David S. Miller davem@davemloft.net Cc: Helge Deller deller@gmx.de Cc: James E.J. Bottomley James.Bottomley@HansenPartnership.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mark Brown broonie@kernel.org Cc: Peter Xu peterx@redhat.com Cc: Will Deacon will@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/mman.h | 21 ++++++++++++++++++--- mm/mmap.c | 2 +- mm/mprotect.c | 2 +- 3 files changed, 20 insertions(+), 5 deletions(-)
--- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -187,16 +187,31 @@ static inline bool arch_memory_deny_writ * * d) mmap(PROT_READ | PROT_EXEC) * mmap(PROT_READ | PROT_EXEC | PROT_BTI) + * + * This is only applicable if the user has set the Memory-Deny-Write-Execute + * (MDWE) protection mask for the current process. + * + * @old specifies the VMA flags the VMA originally possessed, and @new the ones + * we propose to set. + * + * Return: false if proposed change is OK, true if not ok and should be denied. */ -static inline bool map_deny_write_exec(struct vm_area_struct *vma, unsigned long vm_flags) +static inline bool map_deny_write_exec(unsigned long old, unsigned long new) { + /* If MDWE is disabled, we have nothing to deny. */ if (!test_bit(MMF_HAS_MDWE, ¤t->mm->flags)) return false;
- if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE)) + /* If the new VMA is not executable, we have nothing to deny. */ + if (!(new & VM_EXEC)) + return false; + + /* Under MDWE we do not accept newly writably executable VMAs... */ + if (new & VM_WRITE) return true;
- if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC)) + /* ...nor previously non-executable VMAs becoming executable. */ + if (!(old & VM_EXEC)) return true;
return false; --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2826,7 +2826,7 @@ cannot_expand: vma_set_anonymous(vma); }
- if (map_deny_write_exec(vma, vma->vm_flags)) { + if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) { error = -EACCES; goto close_and_free_vma; } --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -791,7 +791,7 @@ static int do_mprotect_pkey(unsigned lon break; }
- if (map_deny_write_exec(vma, newflags)) { + if (map_deny_write_exec(vma->vm_flags, newflags)) { error = -EACCES; break; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lorenzo Stoakes lorenzo.stoakes@oracle.com
[ Upstream commit 5baf8b037debf4ec60108ccfeccb8636d1dbad81 ]
Currently MTE is permitted in two circumstances (the desire to use MTE having been specified by the VM_MTE flag): where MAP_ANONYMOUS is specified, as checked by arch_calc_vm_flag_bits() and actualised by setting the VM_MTE_ALLOWED flag, or if the file backing the mapping is shmem, in which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap hook is activated in mmap_region().
The function that checks that, if VM_MTE is set, VM_MTE_ALLOWED is also set is the arm64 implementation of arch_validate_flags().
Unfortunately, we intend to refactor mmap_region() to perform this check earlier, meaning that in the case of a shmem backing we will not have invoked shmem_mmap() yet, causing the mapping to fail spuriously.
It is inappropriate to set this architecture-specific flag in general mm code anyway, so a sensible resolution of this issue is to instead move the check somewhere else.
We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via the arch_calc_vm_flag_bits() call.
This is an appropriate place to do this as we already check for the MAP_ANONYMOUS case here, and the shmem file case is simply a variant of the same idea - we permit RAM-backed memory.
This requires a modification to the arch_calc_vm_flag_bits() signature to pass in a pointer to the struct file associated with the mapping; however, this is not too egregious, as the hook is only used by two architectures anyway: arm64 and parisc.
So this patch performs this adjustment and removes the unnecessary assignment of VM_MTE_ALLOWED in shmem_mmap().
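In other words, after this change the VM_MTE_ALLOWED decision is folded into the flag calculation done up front in do_mmap(). A compressed sketch of the resulting flow, simplified from the hunks below and not verbatim kernel code:

	/* do_mmap(): the file pointer now participates in flag calculation. */
	vm_flags |= calc_vm_prot_bits(prot, pkey) |
		    calc_vm_flag_bits(file, flags) |	/* arm64 may set VM_MTE_ALLOWED here */
		    mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;

	/* arm64 policy: MTE only on RAM-backed (anonymous or shmem) mappings. */
	if (system_supports_mte() &&
	    ((flags & MAP_ANONYMOUS) || shmem_file(file)))
		vm_flags |= VM_MTE_ALLOWED;

This means arch_validate_flags() can reject VM_MTE without VM_MTE_ALLOWED before any driver ->mmap() hook has run.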
[akpm@linux-foundation.org: fix whitespace, per Catalin] Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.173022466... Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Suggested-by: Catalin Marinas catalin.marinas@arm.com Reported-by: Jann Horn jannh@google.com Reviewed-by: Catalin Marinas catalin.marinas@arm.com Reviewed-by: Vlastimil Babka vbabka@suse.cz Cc: Andreas Larsson andreas@gaisler.com Cc: David S. Miller davem@davemloft.net Cc: Helge Deller deller@gmx.de Cc: James E.J. Bottomley James.Bottomley@HansenPartnership.com Cc: Liam R. Howlett Liam.Howlett@oracle.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Mark Brown broonie@kernel.org Cc: Peter Xu peterx@redhat.com Cc: Will Deacon will@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/mman.h | 10 +++++++--- arch/parisc/include/asm/mman.h | 5 +++-- include/linux/mman.h | 7 ++++--- mm/mmap.c | 2 +- mm/nommu.c | 2 +- mm/shmem.c | 3 --- 6 files changed, 16 insertions(+), 13 deletions(-)
--- a/arch/arm64/include/asm/mman.h +++ b/arch/arm64/include/asm/mman.h @@ -3,6 +3,8 @@ #define __ASM_MMAN_H__
#include <linux/compiler.h> +#include <linux/fs.h> +#include <linux/shmem_fs.h> #include <linux/types.h> #include <uapi/asm/mman.h>
@@ -21,19 +23,21 @@ static inline unsigned long arch_calc_vm } #define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
-static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags) +static inline unsigned long arch_calc_vm_flag_bits(struct file *file, + unsigned long flags) { /* * Only allow MTE on anonymous mappings as these are guaranteed to be * backed by tags-capable memory. The vm_flags may be overridden by a * filesystem supporting MTE (RAM-based). */ - if (system_supports_mte() && (flags & MAP_ANONYMOUS)) + if (system_supports_mte() && + ((flags & MAP_ANONYMOUS) || shmem_file(file))) return VM_MTE_ALLOWED;
return 0; } -#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags) +#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags)
static inline bool arch_validate_prot(unsigned long prot, unsigned long addr __always_unused) --- a/arch/parisc/include/asm/mman.h +++ b/arch/parisc/include/asm/mman.h @@ -2,6 +2,7 @@ #ifndef __ASM_MMAN_H__ #define __ASM_MMAN_H__
+#include <linux/fs.h> #include <uapi/asm/mman.h>
/* PARISC cannot allow mdwe as it needs writable stacks */ @@ -11,7 +12,7 @@ static inline bool arch_memory_deny_writ } #define arch_memory_deny_write_exec_supported arch_memory_deny_write_exec_supported
-static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags) +static inline unsigned long arch_calc_vm_flag_bits(struct file *file, unsigned long flags) { /* * The stack on parisc grows upwards, so if userspace requests memory @@ -23,6 +24,6 @@ static inline unsigned long arch_calc_vm
return 0; } -#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags) +#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags)
#endif /* __ASM_MMAN_H__ */ --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -2,6 +2,7 @@ #ifndef _LINUX_MMAN_H #define _LINUX_MMAN_H
+#include <linux/fs.h> #include <linux/mm.h> #include <linux/percpu_counter.h>
@@ -94,7 +95,7 @@ static inline void vm_unacct_memory(long #endif
#ifndef arch_calc_vm_flag_bits -#define arch_calc_vm_flag_bits(flags) 0 +#define arch_calc_vm_flag_bits(file, flags) 0 #endif
#ifndef arch_validate_prot @@ -151,12 +152,12 @@ calc_vm_prot_bits(unsigned long prot, un * Combine the mmap "flags" argument into "vm_flags" used internally. */ static inline unsigned long -calc_vm_flag_bits(unsigned long flags) +calc_vm_flag_bits(struct file *file, unsigned long flags) { return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) | _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) | _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) | - arch_calc_vm_flag_bits(flags); + arch_calc_vm_flag_bits(file, flags); }
unsigned long vm_commit_limit(void); --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1273,7 +1273,7 @@ unsigned long do_mmap(struct file *file, * to. we assume access permissions have been handled by the open * of the memory object, so we don't do any here. */ - vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | + vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(file, flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
if (flags & MAP_LOCKED) --- a/mm/nommu.c +++ b/mm/nommu.c @@ -853,7 +853,7 @@ static unsigned long determine_vm_flags( { unsigned long vm_flags;
- vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(flags); + vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(file, flags);
if (!file) { /* --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2400,9 +2400,6 @@ static int shmem_mmap(struct file *file, if (ret) return ret;
- /* arm64 - allow memory tagging on RAM-based files */ - vm_flags_set(vma, VM_MTE_ALLOWED); - file_accessed(file); /* This is anonymous shared memory if it is unlinked at the time of mmap */ if (inode->i_nlink)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lorenzo Stoakes lorenzo.stoakes@oracle.com
[ Upstream commit 5de195060b2e251a835f622759550e6202167641 ]
The mmap_region() function is somewhat terrifying, with spaghetti-like control flow and numerous means by which issues can arise and incomplete state, memory leaks and other unpleasantness can occur.
A large amount of the complexity arises from trying to handle errors late in the process of mapping a VMA, which forms the basis of recently observed issues with resource leaks and observable inconsistent state.
Taking advantage of previous patches in this series we move a number of checks earlier in the code, simplifying things by moving the core of the logic into a static internal function __mmap_region().
Doing this allows us to perform a number of checks up front before we do any real work, and allows us to unwind the writable unmap check unconditionally as required and to perform a CONFIG_DEBUG_VM_MAPLE_TREE validation unconditionally also.
We move a number of things here:
1. We preallocate memory for the iterator before we call the file-backed memory hook, allowing us to exit early and avoid having to perform complicated and error-prone close/free logic. We carefully free iterator state on both success and error paths.
2. The enclosing mmap_region() function handles the mapping_map_writable() logic early. Previously the logic had the mapping_map_writable() at the point of mapping a newly allocated file-backed VMA, and a matching mapping_unmap_writable() on success and error paths.
We now do this unconditionally if this is a file-backed, shared writable mapping. A driver may change the flags to eliminate VM_MAYWRITE, but doing so does not invalidate the seal check we just performed, and we in any case always decrement the counter in the wrapper.
We perform a debug assert to ensure a driver does not attempt to do the opposite.
3. We also move arch_validate_flags() up into the mmap_region() function. This is only relevant on arm64 and sparc64, and the check is only meaningful for SPARC with ADI enabled. We explicitly add a warning for this arch if a driver invalidates this check, though the code ought eventually to be fixed to eliminate the need for this.
With all of these measures in place, we no longer need to explicitly close the VMA on error paths, as we place all checks which might fail prior to a call to any driver mmap hook.
This eliminates an entire class of errors, makes the code easier to reason about and more robust.
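In outline, the resulting split looks roughly like this (heavily abbreviated sketch; the real code is in the hunk below):

unsigned long mmap_region(struct file *file, unsigned long addr,
		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
		struct list_head *uf)
{
	unsigned long ret;
	bool writable = false;

	/* All checks that can fail happen before any driver hook runs. */
	if (map_deny_write_exec(vm_flags, vm_flags))
		return -EACCES;
	if (!arch_validate_flags(vm_flags))
		return -EINVAL;
	if (file && (vm_flags & VM_SHARED)) {
		int error = mapping_map_writable(file->f_mapping);

		if (error)
			return error;
		writable = true;
	}

	ret = __mmap_region(file, addr, len, vm_flags, pgoff, uf);

	/* Undone regardless of whether __mmap_region() succeeded. */
	if (writable)
		mapping_unmap_writable(file->f_mapping);
	return ret;
}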
Link: https://lkml.kernel.org/r/6e0becb36d2f5472053ac5d544c0edfe9b899e25.173022466... Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Reported-by: Jann Horn jannh@google.com Reviewed-by: Liam R. Howlett Liam.Howlett@oracle.com Reviewed-by: Vlastimil Babka vbabka@suse.cz Tested-by: Mark Brown broonie@kernel.org Cc: Andreas Larsson andreas@gaisler.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: David S. Miller davem@davemloft.net Cc: Helge Deller deller@gmx.de Cc: James E.J. Bottomley James.Bottomley@HansenPartnership.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Xu peterx@redhat.com Cc: Will Deacon will@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mmap.c | 117 +++++++++++++++++++++++++++++++++++--------------------------- 1 file changed, 67 insertions(+), 50 deletions(-)
--- a/mm/mmap.c +++ b/mm/mmap.c @@ -2666,14 +2666,14 @@ int do_munmap(struct mm_struct *mm, unsi return do_vmi_munmap(&vmi, mm, start, len, uf, false); }
-unsigned long mmap_region(struct file *file, unsigned long addr, +static unsigned long __mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, struct list_head *uf) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL; struct vm_area_struct *next, *prev, *merge; - pgoff_t pglen = len >> PAGE_SHIFT; + pgoff_t pglen = PHYS_PFN(len); unsigned long charged = 0; unsigned long end = addr + len; unsigned long merge_start = addr, merge_end = end; @@ -2770,25 +2770,26 @@ cannot_expand: vma->vm_page_prot = vm_get_page_prot(vm_flags); vma->vm_pgoff = pgoff;
- if (file) { - if (vm_flags & VM_SHARED) { - error = mapping_map_writable(file->f_mapping); - if (error) - goto free_vma; - } + if (vma_iter_prealloc(&vmi, vma)) { + error = -ENOMEM; + goto free_vma; + }
+ if (file) { vma->vm_file = get_file(file); error = mmap_file(file, vma); if (error) - goto unmap_and_free_vma; + goto unmap_and_free_file_vma;
+ /* Drivers cannot alter the address of the VMA. */ + WARN_ON_ONCE(addr != vma->vm_start); /* - * Expansion is handled above, merging is handled below. - * Drivers should not alter the address of the VMA. + * Drivers should not permit writability when previously it was + * disallowed. */ - error = -EINVAL; - if (WARN_ON((addr != vma->vm_start))) - goto close_and_free_vma; + VM_WARN_ON_ONCE(vm_flags != vma->vm_flags && + !(vm_flags & VM_MAYWRITE) && + (vma->vm_flags & VM_MAYWRITE));
vma_iter_config(&vmi, addr, end); /* @@ -2800,6 +2801,7 @@ cannot_expand: vma->vm_end, vma->vm_flags, NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX, NULL); + if (merge) { /* * ->mmap() can change vma->vm_file and fput @@ -2813,7 +2815,7 @@ cannot_expand: vma = merge; /* Update vm_flags to pick up the change. */ vm_flags = vma->vm_flags; - goto unmap_writable; + goto file_expanded; } }
@@ -2821,24 +2823,15 @@ cannot_expand: } else if (vm_flags & VM_SHARED) { error = shmem_zero_setup(vma); if (error) - goto free_vma; + goto free_iter_vma; } else { vma_set_anonymous(vma); }
- if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) { - error = -EACCES; - goto close_and_free_vma; - } - - /* Allow architectures to sanity-check the vm_flags */ - error = -EINVAL; - if (!arch_validate_flags(vma->vm_flags)) - goto close_and_free_vma; - - error = -ENOMEM; - if (vma_iter_prealloc(&vmi, vma)) - goto close_and_free_vma; +#ifdef CONFIG_SPARC64 + /* TODO: Fix SPARC ADI! */ + WARN_ON_ONCE(!arch_validate_flags(vm_flags)); +#endif
/* Lock the VMA since it is modified after insertion into VMA tree */ vma_start_write(vma); @@ -2861,10 +2854,7 @@ cannot_expand: */ khugepaged_enter_vma(vma, vma->vm_flags);
- /* Once vma denies write, undo our temporary denial count */ -unmap_writable: - if (file && vm_flags & VM_SHARED) - mapping_unmap_writable(file->f_mapping); +file_expanded: file = vma->vm_file; ksm_add_vma(vma); expanded: @@ -2894,33 +2884,60 @@ expanded:
vma_set_page_prot(vma);
- validate_mm(mm); return addr;
-close_and_free_vma: - vma_close(vma); - - if (file || vma->vm_file) { -unmap_and_free_vma: - fput(vma->vm_file); - vma->vm_file = NULL; - - vma_iter_set(&vmi, vma->vm_end); - /* Undo any partial mapping done by a device driver. */ - unmap_region(mm, &vmi.mas, vma, prev, next, vma->vm_start, - vma->vm_end, vma->vm_end, true); - } - if (file && (vm_flags & VM_SHARED)) - mapping_unmap_writable(file->f_mapping); +unmap_and_free_file_vma: + fput(vma->vm_file); + vma->vm_file = NULL; + + vma_iter_set(&vmi, vma->vm_end); + /* Undo any partial mapping done by a device driver. */ + unmap_region(mm, &vmi.mas, vma, prev, next, vma->vm_start, + vma->vm_end, vma->vm_end, true); +free_iter_vma: + vma_iter_free(&vmi); free_vma: vm_area_free(vma); unacct_error: if (charged) vm_unacct_memory(charged); - validate_mm(mm); return error; }
+unsigned long mmap_region(struct file *file, unsigned long addr, + unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, + struct list_head *uf) +{ + unsigned long ret; + bool writable_file_mapping = false; + + /* Check to see if MDWE is applicable. */ + if (map_deny_write_exec(vm_flags, vm_flags)) + return -EACCES; + + /* Allow architectures to sanity-check the vm_flags. */ + if (!arch_validate_flags(vm_flags)) + return -EINVAL; + + /* Map writable and ensure this isn't a sealed memfd. */ + if (file && (vm_flags & VM_SHARED)) { + int error = mapping_map_writable(file->f_mapping); + + if (error) + return error; + writable_file_mapping = true; + } + + ret = __mmap_region(file, addr, len, vm_flags, pgoff, uf); + + /* Clear our write mapping regardless of error. */ + if (writable_file_mapping) + mapping_unmap_writable(file->f_mapping); + + validate_mm(current->mm); + return ret; +} + static int __vm_munmap(unsigned long start, size_t len, bool unlock) { int ret;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: SeongJae Park sj@kernel.org
commit e9e3db69966d5e9e6f7e7d017b407c0025180fe5 upstream.
kdamond_apply_schemes() checks the apply intervals of schemes and avoids applying any schemes if no scheme has passed its apply interval. However, the subsequent scheme-applying function, damon_do_apply_schemes(), iterates over all schemes without the apply interval check. As a result, the shortest apply interval is effectively applied to all schemes. Fix the problem by checking the apply interval in damon_do_apply_schemes().
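As an illustration of the bookkeeping involved (the numbers here are hypothetical): with a sampling interval of 100000 us, a scheme whose apply_interval_us is 2000000 advances its deadline by

	s->next_apply_sis += 2000000 / 100000;	/* 20 sample intervals */

each time it is applied, so damon_do_apply_schemes() must skip any scheme whose next_apply_sis has not yet been reached; otherwise every scheme gets applied whenever the scheme with the shortest interval is due, which is the bug this patch fixes.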
Link: https://lkml.kernel.org/r/20240205201306.88562-1-sj@kernel.org Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval") Signed-off-by: SeongJae Park sj@kernel.org Cc: stable@vger.kernel.org [6.7.x] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/damon/core.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
--- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -989,6 +989,9 @@ static void damon_do_apply_schemes(struc damon_for_each_scheme(s, c) { struct damos_quota *quota = &s->quota;
+ if (c->passed_sample_intervals != s->next_apply_sis) + continue; + if (!s->wmarks.activated) continue;
@@ -1089,10 +1092,6 @@ static void kdamond_apply_schemes(struct if (c->passed_sample_intervals != s->next_apply_sis) continue;
- s->next_apply_sis += - (s->apply_interval_us ? s->apply_interval_us : - c->attrs.aggr_interval) / sample_interval; - if (!s->wmarks.activated) continue;
@@ -1108,6 +1107,14 @@ static void kdamond_apply_schemes(struct damon_for_each_region_safe(r, next_r, t) damon_do_apply_schemes(c, t, r); } + + damon_for_each_scheme(s, c) { + if (c->passed_sample_intervals != s->next_apply_sis) + continue; + s->next_apply_sis += + (s->apply_interval_us ? s->apply_interval_us : + c->attrs.aggr_interval) / sample_interval; + } }
/*
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: SeongJae Park sj@kernel.org
commit 8e7bde615f634a82a44b1f3d293c049fd3ef9ca9 upstream.
DAMON's logic for determining whether it is the time to apply DAMOS schemes assumes next_apply_sis is always set larger than the current passed_sample_intervals, and therefore that continuously incrementing passed_sample_intervals will eventually make it reach next_apply_sis. The logic hence applies the scheme and updates next_apply_sis only when passed_sample_intervals is equal to next_apply_sis.
If a scheme's apply interval is set to zero, however, next_apply_sis is set equal to the current passed_sample_intervals. And passed_sample_intervals is incremented before the next_apply_sis check is done. Hence, passed_sample_intervals becomes larger than next_apply_sis, and the logic concludes it is not the time to apply schemes and update next_apply_sis. In other words, DAMON stops applying schemes until passed_sample_intervals overflows.
Based on the documentation and common sense, a reasonable behavior for such inputs would be to apply the schemes every sampling interval. Handle the case by removing the assumption.
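A compressed illustration of the stall and the fix, using hypothetical counter values (simplified from the real code in the hunk below):

	/* apply_interval_us == 0: next_apply_sis was initialised to the
	 * then-current passed_sample_intervals, say both are 5. */
	c->passed_sample_intervals++;				/* now 6 */
	if (c->passed_sample_intervals != s->next_apply_sis)	/* 6 != 5, forever */
		continue;					/* scheme never applied again */

	/* After this patch: an overshoot no longer stalls the scheme ... */
	if (c->passed_sample_intervals < s->next_apply_sis)	/* 6 < 5 is false */
		continue;
	/* ... and the next deadline is rebased on the current counter. */
	s->next_apply_sis = c->passed_sample_intervals +
		(s->apply_interval_us ? s->apply_interval_us :
		 c->attrs.aggr_interval) / sample_interval;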
Link: https://lkml.kernel.org/r/20241031183757.49610-3-sj@kernel.org Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval") Signed-off-by: SeongJae Park sj@kernel.org Cc: stable@vger.kernel.org [6.7.x] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/damon/core.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -989,7 +989,7 @@ static void damon_do_apply_schemes(struc damon_for_each_scheme(s, c) { struct damos_quota *quota = &s->quota;
- if (c->passed_sample_intervals != s->next_apply_sis) + if (c->passed_sample_intervals < s->next_apply_sis) continue;
if (!s->wmarks.activated) @@ -1089,7 +1089,7 @@ static void kdamond_apply_schemes(struct bool has_schemes_to_apply = false;
damon_for_each_scheme(s, c) { - if (c->passed_sample_intervals != s->next_apply_sis) + if (c->passed_sample_intervals < s->next_apply_sis) continue;
if (!s->wmarks.activated) @@ -1109,9 +1109,9 @@ static void kdamond_apply_schemes(struct }
damon_for_each_scheme(s, c) { - if (c->passed_sample_intervals != s->next_apply_sis) + if (c->passed_sample_intervals < s->next_apply_sis) continue; - s->next_apply_sis += + s->next_apply_sis = c->passed_sample_intervals + (s->apply_interval_us ? s->apply_interval_us : c->attrs.aggr_interval) / sample_interval; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: SeongJae Park sj@kernel.org
commit 1f3730fd9e8d4d77fb99c60d0e6ad4b1104e7e04 upstream.
The region split function ('damon_split_region_at()') is called at the beginning of an aggregation interval, and when DAMOS is applying the actions and charging quota. Because the 'nr_accesses' fields of all regions are reset at the beginning of each aggregation interval, and DAMOS was applying the action at the end of each aggregation interval, there was no need to copy the 'nr_accesses' field to the split-out region.
However, commit 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval") made DAMOS apply actions on its own timing interval. Hence, 'nr_accesses' should also be copied to split-out regions, but the commit didn't do that. Fix it by copying it.
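To make the consequence concrete (hypothetical numbers): if a region with nr_accesses == 10 is split while a scheme is being applied mid-aggregation-interval, the split-out half would previously report nr_accesses == 0 and could be wrongly skipped or selected by an access-frequency-based scheme. The split helper therefore now mirrors the counter as well (sketch; field names are those used in DAMON core):

	new->age = r->age;
	new->last_nr_accesses = r->last_nr_accesses;
	new->nr_accesses = r->nr_accesses;	/* the line added by this patch */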
Link: https://lkml.kernel.org/r/20231119171529.66863-1-sj@kernel.org Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval") Signed-off-by: SeongJae Park sj@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/damon/core.c | 1 + 1 file changed, 1 insertion(+)
--- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -1215,6 +1215,7 @@ static void damon_split_region_at(struct
new->age = r->age; new->last_nr_accesses = r->last_nr_accesses; + new->nr_accesses = r->nr_accesses;
damon_insert_region(new, r, damon_next_region(r), t); }
On Wed, Nov 20, 2024 at 01:56:10PM +0100, Greg Kroah-Hartman wrote:
Tested-by: Mark Brown broonie@kernel.org
Hello,
On Wed, 20 Nov 2024 13:56:10 +0100 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/damonitor/damon-tests/tree/next/corr [2] 2c6a63e3d044 ("Linux 6.6.63-rc1")
Thanks, SJ
[...]
---
ok 1 selftests: damon: debugfs_attrs.sh
ok 2 selftests: damon: debugfs_schemes.sh
ok 3 selftests: damon: debugfs_target_ids.sh
ok 4 selftests: damon: debugfs_empty_targets.sh
ok 5 selftests: damon: debugfs_huge_count_read_write.sh
ok 6 selftests: damon: debugfs_duplicate_context_creation.sh
ok 7 selftests: damon: debugfs_rm_non_contexts.sh
ok 8 selftests: damon: sysfs.sh
ok 9 selftests: damon: sysfs_update_removed_scheme_dir.sh
ok 10 selftests: damon: reclaim.sh
ok 11 selftests: damon: lru_sort.sh
ok 1 selftests: damon-tests: kunit.sh
ok 2 selftests: damon-tests: huge_count_read_write.sh
ok 3 selftests: damon-tests: buffer_overflow.sh
ok 4 selftests: damon-tests: rm_contexts.sh
ok 5 selftests: damon-tests: record_null_deref.sh
ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh
ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh
ok 8 selftests: damon-tests: damo_tests.sh
ok 9 selftests: damon-tests: masim-record.sh
ok 10 selftests: damon-tests: build_i386.sh
ok 11 selftests: damon-tests: build_arm64.sh # SKIP
ok 12 selftests: damon-tests: build_m68k.sh # SKIP
ok 13 selftests: damon-tests: build_i386_idle_flag.sh
ok 14 selftests: damon-tests: build_i386_highpte.sh
ok 15 selftests: damon-tests: build_nomemcg.sh
PASS
On 11/20/24 04:56, Greg Kroah-Hartman wrote:
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On 11/20/24 05:56, Greg Kroah-Hartman wrote:
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On 11/20/24 04:56, Greg Kroah-Hartman wrote:
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On Wed, 20 Nov 2024 at 18:30, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build
* kernel: 6.6.63-rc1
* git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
* git commit: 2c6a63e3d044aa21274c98760650830a22b5d54c
* git describe: v6.6.62-83-g2c6a63e3d044
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.6.y/build/v6.6.62...
## Test Regressions (compared to v6.6.60-169-g68a649492c1f)
## Metric Regressions (compared to v6.6.60-169-g68a649492c1f)
## Test Fixes (compared to v6.6.60-169-g68a649492c1f)
## Metric Fixes (compared to v6.6.60-169-g68a649492c1f)
## Test result summary
total: 120020, pass: 98405, fail: 1453, skip: 20080, xfail: 82
## Build Summary
* arc: 5 total, 5 passed, 0 failed
* arm: 128 total, 128 passed, 0 failed
* arm64: 40 total, 40 passed, 0 failed
* i386: 27 total, 25 passed, 2 failed
* mips: 26 total, 25 passed, 1 failed
* parisc: 4 total, 4 passed, 0 failed
* powerpc: 32 total, 31 passed, 1 failed
* riscv: 19 total, 19 passed, 0 failed
* s390: 14 total, 13 passed, 1 failed
* sh: 10 total, 10 passed, 0 failed
* sparc: 7 total, 7 passed, 0 failed
* x86_64: 32 total, 32 passed, 0 failed
## Test suites summary
* boot
* commands
* kselftest-arm64
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-efivarfs
* kselftest-exec
* kselftest-filesystems
* kselftest-filesystems-binderfs
* kselftest-filesystems-epoll
* kselftest-firmware
* kselftest-fpu
* kselftest-ftrace
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-kcmp
* kselftest-kvm
* kselftest-livepatch
* kselftest-membarrier
* kselftest-memfd
* kselftest-mincore
* kselftest-mqueue
* kselftest-net
* kselftest-net-mptcp
* kselftest-openat2
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-tc-testing
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user_events
* kselftest-vDSO
* kselftest-watchdog
* kselftest-x86
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-test
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-hugetlb
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-pty
* ltp-sched
* ltp-smoke
* ltp-syscalls
* ltp-tracing
* perf
* rcutorture
-- Linaro LKFT https://lkft.linaro.org
The kernel, modules, BPF tool, and kselftest tool for 6.6.63-rc1 build successfully on both amd64 and arm64 Azure Linux VMs.
Tested-by: Hardik Garg hargar@linux.microsoft.com
Thanks, Hardik
On Wed, 20 Nov 2024 13:56:10 +0100, Greg Kroah-Hartman wrote:
All tests passing for Tegra ...
Test results for stable-v6.6: 10 builds: 10 pass, 0 fail 26 boots: 26 pass, 0 fail 111 tests: 111 pass, 0 fail
Linux version: 6.6.63-rc1-g2c6a63e3d044 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon