August 2024 - Linux-stable-mirror

[PATCH v3] x86/fpu: Avoid writing LBR bit to IA32_XSS unless supported

by Mitchell Levy via B4 Relay

From: Mitchell Levy <levymitchell0(a)gmail.com> When computing which xfeatures are available, make sure that LBR is only present if both LBR is supported in general, as well as by XSAVES. There are two distinct CPU features related to the use of XSAVES as it applies to LBR: whether LBR is itself supported (strictly speaking, I'm not sure that this is necessary to check though it's certainly a good sanity check), and whether XSAVES supports LBR (see sections 13.2 and 13.5.12 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1). Currently, the LBR subsystem correctly checks both (see intel_pmu_arch_lbr_init), however the xstate initialization subsystem does not. When calculating what value to place in the IA32_XSS MSR, xfeatures_mask_independent only checks whether LBR support is present, not whether XSAVES supports LBR. If XSAVES does not support LBR, this write causes #GP, leaving the state of IA32_XSS unchanged (i.e., set to zero, as its not written with other values, and its default value is zero out of RESET per section 13.3 of the arch manual). Then, the next time XRSTORS is used to restore supervisor state, it will fail with #GP (because the RFBM has zero for all supervisor features, which does not match the XCOMP_BV field). In particular, XFEATURE_MASK_FPSTATE includes supervisor features, so setting up the FPU will cause a #GP. This results in a call to fpu_reset_from_exception_fixup, which by the same process results in another #GP. Eventually this causes the kernel to run out of stack space and #DF. Fixes: f0dccc9da4c0 ("x86/fpu/xstate: Support dynamic supervisor feature for LBR") Cc: stable(a)vger.kernel.org Suggested-by: Thomas Gleixner <tglx(a)linutronix.de> Signed-off-by: Mitchell Levy <levymitchell0(a)gmail.com> --- Changes in v3: - Use a proper Suggested-by: (thanks tglx) - Link to v2: https://lore.kernel.org/r/20240809-xsave-lbr-fix-v2-1-04296b387380@gmail.com Changes in v2: - Corrected Fixes tag (thanks tglx) - Properly check for XSAVES support of LBR (thanks tglx) - Link to v1: https://lore.kernel.org/r/20240808-xsave-lbr-fix-v1-1-a223806c83e7@gmail.com --- arch/x86/include/asm/fpu/types.h | 7 +++++++ arch/x86/kernel/fpu/xstate.c | 3 +++ arch/x86/kernel/fpu/xstate.h | 4 ++-- 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index eb17f31b06d2..de16862bf230 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -591,6 +591,13 @@ struct fpu_state_config { * even without XSAVE support, i.e. legacy features FP + SSE */ u64 legacy_features; + /* + * @independent_features: + * + * Features that are supported by XSAVES, but not managed as part of + * the FPU core, such as LBR + */ + u64 independent_features; }; /* FPU state configuration information */ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index c5a026fee5e0..1339f8328db5 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -788,6 +788,9 @@ void __init fpu__init_system_xstate(unsigned int legacy_size) goto out_disable; } + fpu_kernel_cfg.independent_features = fpu_kernel_cfg.max_features & + XFEATURE_MASK_INDEPENDENT; + /* * Clear XSAVE features that are disabled in the normal CPUID. */ diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h index 2ee0b9c53dcc..afb404cd2059 100644 --- a/arch/x86/kernel/fpu/xstate.h +++ b/arch/x86/kernel/fpu/xstate.h @@ -62,9 +62,9 @@ static inline u64 xfeatures_mask_supervisor(void) static inline u64 xfeatures_mask_independent(void) { if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR)) - return XFEATURE_MASK_INDEPENDENT & ~XFEATURE_MASK_LBR; + return fpu_kernel_cfg.independent_features & ~XFEATURE_MASK_LBR; - return XFEATURE_MASK_INDEPENDENT; + return fpu_kernel_cfg.independent_features; } /* XSAVE/XRSTOR wrapper functions */ --- base-commit: de9c2c66ad8e787abec7c9d7eff4f8c3cdd28aed change-id: 20240807-xsave-lbr-fix-02d52f641653 Best regards, -- Mitchell Levy <levymitchell0(a)gmail.com>

1 year, 4 months

3
2
0 0

FAILED: patch "[PATCH] mptcp: pm: don't try to create sf if alloc failed" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x cd7c957f936f8cb80d03e5152f4013aae65bd986 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081245-deem-refinance-8605@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: cd7c957f936f ("mptcp: pm: don't try to create sf if alloc failed") c95eb32ced82 ("mptcp: pm: reduce indentation blocks") 528cb5f2a1e8 ("mptcp: pass addr to mptcp_pm_alloc_anno_list") 77e4b94a3de6 ("mptcp: update userspace pm infos") 24430f8bf516 ("mptcp: add address into userspace pm list") fb00ee4f3343 ("mptcp: netlink: respect v4/v6-only sockets") 80638684e840 ("mptcp: get sk from msk directly") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From cd7c957f936f8cb80d03e5152f4013aae65bd986 Mon Sep 17 00:00:00 2001 From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org> Date: Wed, 31 Jul 2024 13:05:56 +0200 Subject: [PATCH] mptcp: pm: don't try to create sf if alloc failed It sounds better to avoid wasting cycles and / or put extreme memory pressure on the system by trying to create new subflows if it was not possible to add a new item in the announce list. While at it, a warning is now printed if the entry was already in the list as it should not happen with the in-kernel path-manager. With this PM, mptcp_pm_alloc_anno_list() should only fail in case of memory pressure. Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink") Cc: stable(a)vger.kernel.org Suggested-by: Paolo Abeni <pabeni(a)redhat.com> Reviewed-by: Mat Martineau <martineau(a)kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org> Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-… Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index 780f4cca165c..2be7af377cda 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -348,7 +348,7 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk, add_entry = mptcp_lookup_anno_list_by_saddr(msk, addr); if (add_entry) { - if (mptcp_pm_is_kernel(msk)) + if (WARN_ON_ONCE(mptcp_pm_is_kernel(msk))) return false; sk_reset_timer(sk, &add_entry->add_timer, @@ -555,8 +555,6 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) /* check first for announce */ if (msk->pm.add_addr_signaled < add_addr_signal_max) { - local = select_signal_address(pernet, msk); - /* due to racing events on both ends we can reach here while * previous add address is still running: if we invoke now * mptcp_pm_announce_addr(), that will fail and the @@ -567,11 +565,15 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) if (msk->pm.addr_signal & BIT(MPTCP_ADD_ADDR_SIGNAL)) return; + local = select_signal_address(pernet, msk); if (!local) goto subflow; + /* If the alloc fails, we are on memory pressure, not worth + * continuing, and trying to create subflows. + */ if (!mptcp_pm_alloc_anno_list(msk, &local->addr)) - goto subflow; + return; __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); msk->pm.add_addr_signaled++;

1 year, 4 months

3
7
0 0

FAILED: patch "[PATCH] exec: Fix ToCToU between perm check and set-uid/gid usage" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y git checkout FETCH_HEAD git cherry-pick -x f50733b45d865f91db90919f8311e2127ce5a0cb # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081414-distance-hurler-0efd@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^.. Possible dependencies: f50733b45d86 ("exec: Fix ToCToU between perm check and set-uid/gid usage") e67fe63341b8 ("fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap") 9452e93e6dae ("fs: port privilege checking helpers to mnt_idmap") f2d40141d5d9 ("fs: port inode_init_owner() to mnt_idmap") 4609e1f18e19 ("fs: port ->permission() to pass mnt_idmap") 13e83a4923be ("fs: port ->set_acl() to pass mnt_idmap") 77435322777d ("fs: port ->get_acl() to pass mnt_idmap") 011e2b717b1b ("fs: port ->tmpfile() to pass mnt_idmap") 5ebb29bee8d5 ("fs: port ->mknod() to pass mnt_idmap") c54bd91e9eab ("fs: port ->mkdir() to pass mnt_idmap") 7a77db95511c ("fs: port ->symlink() to pass mnt_idmap") 6c960e68aaed ("fs: port ->create() to pass mnt_idmap") b74d24f7a74f ("fs: port ->getattr() to pass mnt_idmap") c1632a0f1120 ("fs: port ->setattr() to pass mnt_idmap") abf08576afe3 ("fs: port vfs_*() helpers to struct mnt_idmap") 6022ec6ee2c3 ("Merge tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f50733b45d865f91db90919f8311e2127ce5a0cb Mon Sep 17 00:00:00 2001 From: Kees Cook <kees(a)kernel.org> Date: Thu, 8 Aug 2024 11:39:08 -0700 Subject: [PATCH] exec: Fix ToCToU between perm check and set-uid/gid usage When opening a file for exec via do_filp_open(), permission checking is done against the file's metadata at that moment, and on success, a file pointer is passed back. Much later in the execve() code path, the file metadata (specifically mode, uid, and gid) is used to determine if/how to set the uid and gid. However, those values may have changed since the permissions check, meaning the execution may gain unintended privileges. For example, if a file could change permissions from executable and not set-id: ---------x 1 root root 16048 Aug 7 13:16 target to set-id and non-executable: ---S------ 1 root root 16048 Aug 7 13:16 target it is possible to gain root privileges when execution should have been disallowed. While this race condition is rare in real-world scenarios, it has been observed (and proven exploitable) when package managers are updating the setuid bits of installed programs. Such files start with being world-executable but then are adjusted to be group-exec with a set-uid bit. For example, "chmod o-x,u+s target" makes "target" executable only by uid "root" and gid "cdrom", while also becoming setuid-root: -rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target becomes: -rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target But racing the chmod means users without group "cdrom" membership can get the permission to execute "target" just before the chmod, and when the chmod finishes, the exec reaches brpm_fill_uid(), and performs the setuid to root, violating the expressed authorization of "only cdrom group members can setuid to root". Re-check that we still have execute permissions in case the metadata has changed. It would be better to keep a copy from the perm-check time, but until we can do that refactoring, the least-bad option is to do a full inode_permission() call (under inode lock). It is understood that this is safe against dead-locks, but hardly optimal. Reported-by: Marco Vanotti <mvanotti(a)google.com> Tested-by: Marco Vanotti <mvanotti(a)google.com> Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: stable(a)vger.kernel.org Cc: Eric Biederman <ebiederm(a)xmission.com> Cc: Alexander Viro <viro(a)zeniv.linux.org.uk> Cc: Christian Brauner <brauner(a)kernel.org> Signed-off-by: Kees Cook <kees(a)kernel.org> diff --git a/fs/exec.c b/fs/exec.c index a126e3d1cacb..50e76cc633c4 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1692,6 +1692,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) unsigned int mode; vfsuid_t vfsuid; vfsgid_t vfsgid; + int err; if (!mnt_may_suid(file->f_path.mnt)) return; @@ -1708,12 +1709,17 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) /* Be careful if suid/sgid is set */ inode_lock(inode); - /* reload atomically mode/uid/gid now that lock held */ + /* Atomically reload and check mode/uid/gid now that lock held. */ mode = inode->i_mode; vfsuid = i_uid_into_vfsuid(idmap, inode); vfsgid = i_gid_into_vfsgid(idmap, inode); + err = inode_permission(idmap, inode, MAY_EXEC); inode_unlock(inode); + /* Did the exec bit vanish out from under us? Give up. */ + if (err) + return; + /* We ignore suid/sgid if there are no mappings for them in the ns */ if (!vfsuid_has_mapping(bprm->cred->user_ns, vfsuid) || !vfsgid_has_mapping(bprm->cred->user_ns, vfsgid))

1 year, 4 months

1
0
0 0

FAILED: patch "[PATCH] exec: Fix ToCToU between perm check and set-uid/gid usage" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x f50733b45d865f91db90919f8311e2127ce5a0cb # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081412-caddie-manhandle-397e@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: f50733b45d86 ("exec: Fix ToCToU between perm check and set-uid/gid usage") e67fe63341b8 ("fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap") 9452e93e6dae ("fs: port privilege checking helpers to mnt_idmap") f2d40141d5d9 ("fs: port inode_init_owner() to mnt_idmap") 4609e1f18e19 ("fs: port ->permission() to pass mnt_idmap") 13e83a4923be ("fs: port ->set_acl() to pass mnt_idmap") 77435322777d ("fs: port ->get_acl() to pass mnt_idmap") 011e2b717b1b ("fs: port ->tmpfile() to pass mnt_idmap") 5ebb29bee8d5 ("fs: port ->mknod() to pass mnt_idmap") c54bd91e9eab ("fs: port ->mkdir() to pass mnt_idmap") 7a77db95511c ("fs: port ->symlink() to pass mnt_idmap") 6c960e68aaed ("fs: port ->create() to pass mnt_idmap") b74d24f7a74f ("fs: port ->getattr() to pass mnt_idmap") c1632a0f1120 ("fs: port ->setattr() to pass mnt_idmap") abf08576afe3 ("fs: port vfs_*() helpers to struct mnt_idmap") 6022ec6ee2c3 ("Merge tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f50733b45d865f91db90919f8311e2127ce5a0cb Mon Sep 17 00:00:00 2001 From: Kees Cook <kees(a)kernel.org> Date: Thu, 8 Aug 2024 11:39:08 -0700 Subject: [PATCH] exec: Fix ToCToU between perm check and set-uid/gid usage When opening a file for exec via do_filp_open(), permission checking is done against the file's metadata at that moment, and on success, a file pointer is passed back. Much later in the execve() code path, the file metadata (specifically mode, uid, and gid) is used to determine if/how to set the uid and gid. However, those values may have changed since the permissions check, meaning the execution may gain unintended privileges. For example, if a file could change permissions from executable and not set-id: ---------x 1 root root 16048 Aug 7 13:16 target to set-id and non-executable: ---S------ 1 root root 16048 Aug 7 13:16 target it is possible to gain root privileges when execution should have been disallowed. While this race condition is rare in real-world scenarios, it has been observed (and proven exploitable) when package managers are updating the setuid bits of installed programs. Such files start with being world-executable but then are adjusted to be group-exec with a set-uid bit. For example, "chmod o-x,u+s target" makes "target" executable only by uid "root" and gid "cdrom", while also becoming setuid-root: -rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target becomes: -rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target But racing the chmod means users without group "cdrom" membership can get the permission to execute "target" just before the chmod, and when the chmod finishes, the exec reaches brpm_fill_uid(), and performs the setuid to root, violating the expressed authorization of "only cdrom group members can setuid to root". Re-check that we still have execute permissions in case the metadata has changed. It would be better to keep a copy from the perm-check time, but until we can do that refactoring, the least-bad option is to do a full inode_permission() call (under inode lock). It is understood that this is safe against dead-locks, but hardly optimal. Reported-by: Marco Vanotti <mvanotti(a)google.com> Tested-by: Marco Vanotti <mvanotti(a)google.com> Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: stable(a)vger.kernel.org Cc: Eric Biederman <ebiederm(a)xmission.com> Cc: Alexander Viro <viro(a)zeniv.linux.org.uk> Cc: Christian Brauner <brauner(a)kernel.org> Signed-off-by: Kees Cook <kees(a)kernel.org> diff --git a/fs/exec.c b/fs/exec.c index a126e3d1cacb..50e76cc633c4 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1692,6 +1692,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) unsigned int mode; vfsuid_t vfsuid; vfsgid_t vfsgid; + int err; if (!mnt_may_suid(file->f_path.mnt)) return; @@ -1708,12 +1709,17 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) /* Be careful if suid/sgid is set */ inode_lock(inode); - /* reload atomically mode/uid/gid now that lock held */ + /* Atomically reload and check mode/uid/gid now that lock held. */ mode = inode->i_mode; vfsuid = i_uid_into_vfsuid(idmap, inode); vfsgid = i_gid_into_vfsgid(idmap, inode); + err = inode_permission(idmap, inode, MAY_EXEC); inode_unlock(inode); + /* Did the exec bit vanish out from under us? Give up. */ + if (err) + return; + /* We ignore suid/sgid if there are no mappings for them in the ns */ if (!vfsuid_has_mapping(bprm->cred->user_ns, vfsuid) || !vfsgid_has_mapping(bprm->cred->user_ns, vfsgid))

1 year, 4 months

1
0
0 0

FAILED: patch "[PATCH] exec: Fix ToCToU between perm check and set-uid/gid usage" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x f50733b45d865f91db90919f8311e2127ce5a0cb # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081405-fender-shortcut-a18f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: f50733b45d86 ("exec: Fix ToCToU between perm check and set-uid/gid usage") e67fe63341b8 ("fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap") 9452e93e6dae ("fs: port privilege checking helpers to mnt_idmap") f2d40141d5d9 ("fs: port inode_init_owner() to mnt_idmap") 4609e1f18e19 ("fs: port ->permission() to pass mnt_idmap") 13e83a4923be ("fs: port ->set_acl() to pass mnt_idmap") 77435322777d ("fs: port ->get_acl() to pass mnt_idmap") 011e2b717b1b ("fs: port ->tmpfile() to pass mnt_idmap") 5ebb29bee8d5 ("fs: port ->mknod() to pass mnt_idmap") c54bd91e9eab ("fs: port ->mkdir() to pass mnt_idmap") 7a77db95511c ("fs: port ->symlink() to pass mnt_idmap") 6c960e68aaed ("fs: port ->create() to pass mnt_idmap") b74d24f7a74f ("fs: port ->getattr() to pass mnt_idmap") c1632a0f1120 ("fs: port ->setattr() to pass mnt_idmap") abf08576afe3 ("fs: port vfs_*() helpers to struct mnt_idmap") 6022ec6ee2c3 ("Merge tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f50733b45d865f91db90919f8311e2127ce5a0cb Mon Sep 17 00:00:00 2001 From: Kees Cook <kees(a)kernel.org> Date: Thu, 8 Aug 2024 11:39:08 -0700 Subject: [PATCH] exec: Fix ToCToU between perm check and set-uid/gid usage When opening a file for exec via do_filp_open(), permission checking is done against the file's metadata at that moment, and on success, a file pointer is passed back. Much later in the execve() code path, the file metadata (specifically mode, uid, and gid) is used to determine if/how to set the uid and gid. However, those values may have changed since the permissions check, meaning the execution may gain unintended privileges. For example, if a file could change permissions from executable and not set-id: ---------x 1 root root 16048 Aug 7 13:16 target to set-id and non-executable: ---S------ 1 root root 16048 Aug 7 13:16 target it is possible to gain root privileges when execution should have been disallowed. While this race condition is rare in real-world scenarios, it has been observed (and proven exploitable) when package managers are updating the setuid bits of installed programs. Such files start with being world-executable but then are adjusted to be group-exec with a set-uid bit. For example, "chmod o-x,u+s target" makes "target" executable only by uid "root" and gid "cdrom", while also becoming setuid-root: -rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target becomes: -rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target But racing the chmod means users without group "cdrom" membership can get the permission to execute "target" just before the chmod, and when the chmod finishes, the exec reaches brpm_fill_uid(), and performs the setuid to root, violating the expressed authorization of "only cdrom group members can setuid to root". Re-check that we still have execute permissions in case the metadata has changed. It would be better to keep a copy from the perm-check time, but until we can do that refactoring, the least-bad option is to do a full inode_permission() call (under inode lock). It is understood that this is safe against dead-locks, but hardly optimal. Reported-by: Marco Vanotti <mvanotti(a)google.com> Tested-by: Marco Vanotti <mvanotti(a)google.com> Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: stable(a)vger.kernel.org Cc: Eric Biederman <ebiederm(a)xmission.com> Cc: Alexander Viro <viro(a)zeniv.linux.org.uk> Cc: Christian Brauner <brauner(a)kernel.org> Signed-off-by: Kees Cook <kees(a)kernel.org> diff --git a/fs/exec.c b/fs/exec.c index a126e3d1cacb..50e76cc633c4 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1692,6 +1692,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) unsigned int mode; vfsuid_t vfsuid; vfsgid_t vfsgid; + int err; if (!mnt_may_suid(file->f_path.mnt)) return; @@ -1708,12 +1709,17 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) /* Be careful if suid/sgid is set */ inode_lock(inode); - /* reload atomically mode/uid/gid now that lock held */ + /* Atomically reload and check mode/uid/gid now that lock held. */ mode = inode->i_mode; vfsuid = i_uid_into_vfsuid(idmap, inode); vfsgid = i_gid_into_vfsgid(idmap, inode); + err = inode_permission(idmap, inode, MAY_EXEC); inode_unlock(inode); + /* Did the exec bit vanish out from under us? Give up. */ + if (err) + return; + /* We ignore suid/sgid if there are no mappings for them in the ns */ if (!vfsuid_has_mapping(bprm->cred->user_ns, vfsuid) || !vfsgid_has_mapping(bprm->cred->user_ns, vfsgid))

1 year, 4 months

1
0
0 0

FAILED: patch "[PATCH] exec: Fix ToCToU between perm check and set-uid/gid usage" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x f50733b45d865f91db90919f8311e2127ce5a0cb # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081458-grumbly-glance-dff9@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: f50733b45d86 ("exec: Fix ToCToU between perm check and set-uid/gid usage") e67fe63341b8 ("fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap") 9452e93e6dae ("fs: port privilege checking helpers to mnt_idmap") f2d40141d5d9 ("fs: port inode_init_owner() to mnt_idmap") 4609e1f18e19 ("fs: port ->permission() to pass mnt_idmap") 13e83a4923be ("fs: port ->set_acl() to pass mnt_idmap") 77435322777d ("fs: port ->get_acl() to pass mnt_idmap") 011e2b717b1b ("fs: port ->tmpfile() to pass mnt_idmap") 5ebb29bee8d5 ("fs: port ->mknod() to pass mnt_idmap") c54bd91e9eab ("fs: port ->mkdir() to pass mnt_idmap") 7a77db95511c ("fs: port ->symlink() to pass mnt_idmap") 6c960e68aaed ("fs: port ->create() to pass mnt_idmap") b74d24f7a74f ("fs: port ->getattr() to pass mnt_idmap") c1632a0f1120 ("fs: port ->setattr() to pass mnt_idmap") abf08576afe3 ("fs: port vfs_*() helpers to struct mnt_idmap") 6022ec6ee2c3 ("Merge tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f50733b45d865f91db90919f8311e2127ce5a0cb Mon Sep 17 00:00:00 2001 From: Kees Cook <kees(a)kernel.org> Date: Thu, 8 Aug 2024 11:39:08 -0700 Subject: [PATCH] exec: Fix ToCToU between perm check and set-uid/gid usage When opening a file for exec via do_filp_open(), permission checking is done against the file's metadata at that moment, and on success, a file pointer is passed back. Much later in the execve() code path, the file metadata (specifically mode, uid, and gid) is used to determine if/how to set the uid and gid. However, those values may have changed since the permissions check, meaning the execution may gain unintended privileges. For example, if a file could change permissions from executable and not set-id: ---------x 1 root root 16048 Aug 7 13:16 target to set-id and non-executable: ---S------ 1 root root 16048 Aug 7 13:16 target it is possible to gain root privileges when execution should have been disallowed. While this race condition is rare in real-world scenarios, it has been observed (and proven exploitable) when package managers are updating the setuid bits of installed programs. Such files start with being world-executable but then are adjusted to be group-exec with a set-uid bit. For example, "chmod o-x,u+s target" makes "target" executable only by uid "root" and gid "cdrom", while also becoming setuid-root: -rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target becomes: -rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target But racing the chmod means users without group "cdrom" membership can get the permission to execute "target" just before the chmod, and when the chmod finishes, the exec reaches brpm_fill_uid(), and performs the setuid to root, violating the expressed authorization of "only cdrom group members can setuid to root". Re-check that we still have execute permissions in case the metadata has changed. It would be better to keep a copy from the perm-check time, but until we can do that refactoring, the least-bad option is to do a full inode_permission() call (under inode lock). It is understood that this is safe against dead-locks, but hardly optimal. Reported-by: Marco Vanotti <mvanotti(a)google.com> Tested-by: Marco Vanotti <mvanotti(a)google.com> Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: stable(a)vger.kernel.org Cc: Eric Biederman <ebiederm(a)xmission.com> Cc: Alexander Viro <viro(a)zeniv.linux.org.uk> Cc: Christian Brauner <brauner(a)kernel.org> Signed-off-by: Kees Cook <kees(a)kernel.org> diff --git a/fs/exec.c b/fs/exec.c index a126e3d1cacb..50e76cc633c4 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1692,6 +1692,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) unsigned int mode; vfsuid_t vfsuid; vfsgid_t vfsgid; + int err; if (!mnt_may_suid(file->f_path.mnt)) return; @@ -1708,12 +1709,17 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file) /* Be careful if suid/sgid is set */ inode_lock(inode); - /* reload atomically mode/uid/gid now that lock held */ + /* Atomically reload and check mode/uid/gid now that lock held. */ mode = inode->i_mode; vfsuid = i_uid_into_vfsuid(idmap, inode); vfsgid = i_gid_into_vfsgid(idmap, inode); + err = inode_permission(idmap, inode, MAY_EXEC); inode_unlock(inode); + /* Did the exec bit vanish out from under us? Give up. */ + if (err) + return; + /* We ignore suid/sgid if there are no mappings for them in the ns */ if (!vfsuid_has_mapping(bprm->cred->user_ns, vfsuid) || !vfsgid_has_mapping(bprm->cred->user_ns, vfsgid))

1 year, 4 months

1
0
0 0

[PATCH net-next 7/9] idpf: fix netdev Tx queue stop/wake

by Tony Nguyen

From: Michal Kubiak <michal.kubiak(a)intel.com> netif_txq_maybe_stop() returns -1, 0, or 1, while idpf_tx_maybe_stop_common() says it returns 0 or -EBUSY. As a result, there sometimes are Tx queue timeout warnings despite that the queue is empty or there is at least enough space to restart it. Make idpf_tx_maybe_stop_common() inline and returning true or false, handling the return of netif_txq_maybe_stop() properly. Use a correct goto in idpf_tx_maybe_stop_splitq() to avoid stopping the queue or incrementing the stops counter twice. Fixes: 6818c4d5b3c2 ("idpf: add splitq start_xmit") Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Cc: stable(a)vger.kernel.org # 6.7+ Signed-off-by: Michal Kubiak <michal.kubiak(a)intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel(a)intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin(a)intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com> --- .../ethernet/intel/idpf/idpf_singleq_txrx.c | 4 +++ drivers/net/ethernet/intel/idpf/idpf_txrx.c | 35 +++++-------------- drivers/net/ethernet/intel/idpf/idpf_txrx.h | 9 ++++- 3 files changed, 21 insertions(+), 27 deletions(-) diff --git a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c index 947d3ff9677c..5ba360abbe66 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c +++ b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c @@ -375,6 +375,10 @@ netdev_tx_t idpf_tx_singleq_frame(struct sk_buff *skb, IDPF_TX_DESCS_FOR_CTX)) { idpf_tx_buf_hw_update(tx_q, tx_q->next_to_use, false); + u64_stats_update_begin(&tx_q->stats_sync); + u64_stats_inc(&tx_q->q_stats.q_busy); + u64_stats_update_end(&tx_q->stats_sync); + return NETDEV_TX_BUSY; } diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c index c1df4341e9ab..60f875decd8a 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c +++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c @@ -2128,29 +2128,6 @@ void idpf_tx_splitq_build_flow_desc(union idpf_tx_flex_desc *desc, desc->flow.qw1.compl_tag = cpu_to_le16(params->compl_tag); } -/** - * idpf_tx_maybe_stop_common - 1st level check for common Tx stop conditions - * @tx_q: the queue to be checked - * @size: number of descriptors we want to assure is available - * - * Returns 0 if stop is not needed - */ -int idpf_tx_maybe_stop_common(struct idpf_tx_queue *tx_q, unsigned int size) -{ - struct netdev_queue *nq; - - if (likely(IDPF_DESC_UNUSED(tx_q) >= size)) - return 0; - - u64_stats_update_begin(&tx_q->stats_sync); - u64_stats_inc(&tx_q->q_stats.q_busy); - u64_stats_update_end(&tx_q->stats_sync); - - nq = netdev_get_tx_queue(tx_q->netdev, tx_q->idx); - - return netif_txq_maybe_stop(nq, IDPF_DESC_UNUSED(tx_q), size, size); -} - /** * idpf_tx_maybe_stop_splitq - 1st level check for Tx splitq stop conditions * @tx_q: the queue to be checked @@ -2162,7 +2139,7 @@ static int idpf_tx_maybe_stop_splitq(struct idpf_tx_queue *tx_q, unsigned int descs_needed) { if (idpf_tx_maybe_stop_common(tx_q, descs_needed)) - goto splitq_stop; + goto out; /* If there are too many outstanding completions expected on the * completion queue, stop the TX queue to give the device some time to @@ -2181,10 +2158,12 @@ static int idpf_tx_maybe_stop_splitq(struct idpf_tx_queue *tx_q, return 0; splitq_stop: + netif_stop_subqueue(tx_q->netdev, tx_q->idx); + +out: u64_stats_update_begin(&tx_q->stats_sync); u64_stats_inc(&tx_q->q_stats.q_busy); u64_stats_update_end(&tx_q->stats_sync); - netif_stop_subqueue(tx_q->netdev, tx_q->idx); return -EBUSY; } @@ -2207,7 +2186,11 @@ void idpf_tx_buf_hw_update(struct idpf_tx_queue *tx_q, u32 val, nq = netdev_get_tx_queue(tx_q->netdev, tx_q->idx); tx_q->next_to_use = val; - idpf_tx_maybe_stop_common(tx_q, IDPF_TX_DESC_NEEDED); + if (idpf_tx_maybe_stop_common(tx_q, IDPF_TX_DESC_NEEDED)) { + u64_stats_update_begin(&tx_q->stats_sync); + u64_stats_inc(&tx_q->q_stats.q_busy); + u64_stats_update_end(&tx_q->stats_sync); + } /* Force memory writes to complete before letting h/w * know there are new descriptors to fetch. (Only diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h index 2478f71adb95..df3574ac58c2 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h +++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h @@ -1020,7 +1020,6 @@ void idpf_tx_dma_map_error(struct idpf_tx_queue *txq, struct sk_buff *skb, struct idpf_tx_buf *first, u16 ring_idx); unsigned int idpf_tx_desc_count_required(struct idpf_tx_queue *txq, struct sk_buff *skb); -int idpf_tx_maybe_stop_common(struct idpf_tx_queue *tx_q, unsigned int size); void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue); netdev_tx_t idpf_tx_singleq_frame(struct sk_buff *skb, struct idpf_tx_queue *tx_q); @@ -1029,4 +1028,12 @@ bool idpf_rx_singleq_buf_hw_alloc_all(struct idpf_rx_queue *rxq, u16 cleaned_count); int idpf_tso(struct sk_buff *skb, struct idpf_tx_offload_params *off); +static inline bool idpf_tx_maybe_stop_common(struct idpf_tx_queue *tx_q, + u32 needed) +{ + return !netif_subqueue_maybe_stop(tx_q->netdev, tx_q->idx, + IDPF_DESC_UNUSED(tx_q), + needed, needed); +} + #endif /* !_IDPF_TXRX_H_ */ -- 2.42.0

1 year, 4 months

1
0
0 0

[PATCH v3] mm/hugetlb: fix hugetlb vs. core-mm PT locking

by David Hildenbrand

We recently made GUP's common page table walking code to also walk hugetlb VMAs without most hugetlb special-casing, preparing for the future of having less hugetlb-specific page table walking code in the codebase. Turns out that we missed one page table locking detail: page table locking for hugetlb folios that are not mapped using a single PMD/PUD. Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the page tables, will perform a pte_offset_map_lock() to grab the PTE table lock. However, hugetlb that concurrently modifies these page tables would actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the locks would differ. Something similar can happen right now with hugetlb folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS. This issue can be reproduced [1], for example triggering: [ 3105.936100] ------------[ cut here ]------------ [ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188 [ 3105.944634] Modules linked in: [...] [ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1 [ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024 [ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 3105.991108] pc : try_grab_folio+0x11c/0x188 [ 3105.994013] lr : follow_page_pte+0xd8/0x430 [ 3105.996986] sp : ffff80008eafb8f0 [ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43 [ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48 [ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978 [ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001 [ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000 [ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000 [ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0 [ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080 [ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000 [ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000 [ 3106.047957] Call trace: [ 3106.049522] try_grab_folio+0x11c/0x188 [ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0 [ 3106.055527] follow_page_mask+0x1a0/0x2b8 [ 3106.058118] __get_user_pages+0xf0/0x348 [ 3106.060647] faultin_page_range+0xb0/0x360 [ 3106.063651] do_madvise+0x340/0x598 Let's make huge_pte_lockptr() effectively use the same PT locks as any core-mm page table walker would. Add ptep_lockptr() to obtain the PTE page table lock using a pte pointer -- unfortunately we cannot convert pte_lockptr() because virt_to_page() doesn't work with kmap'ed page tables we can have with CONFIG_HIGHPTE. Take care of PTE tables possibly spanning multiple pages, and take care of CONFIG_PGTABLE_LEVELS complexity when e.g., PMD_SIZE == PUD_SIZE. For example, with CONFIG_PGTABLE_LEVELS == 2, core-mm would detect with hugepagesize==PMD_SIZE pmd_leaf() and use the pmd_lockptr(), which would end up just mapping to the per-MM PT lock. There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb folio being mapped using two PTE page tables. While hugetlb wants to take the PMD table lock, core-mm would grab the PTE table lock of one of both PTE page tables. In such corner cases, we have to make sure that both locks match, which is (fortunately!) currently guaranteed for 8xx as it does not support SMP and consequently doesn't use split PT locks. [1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/ Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code") Reviewed-by: James Houghton <jthoughton(a)google.com> Cc: <stable(a)vger.kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Oscar Salvador <osalvador(a)suse.de> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com> Signed-off-by: David Hildenbrand <david(a)redhat.com> --- Third time is the charm? Retested on arm64 and x86-64. Cross-compiled on a bunch of others. v2 -> v3: * Handle CONFIG_PGTABLE_LEVELS oddities as good as possible. It's a mess. Remove the size >= P4D_SIZE check and simply default to the &mm->page_table_lock. * Align the PTE pointer to the start of the page table to handle PTE page tables bigger than a single page (unclear if this could currently trigger). * Extend patch description v1 -> 2: * Extend patch description * Drop "mm: let pte_lockptr() consume a pte_t pointer" * Introduce ptep_lockptr() in this patch --- include/linux/hugetlb.h | 27 +++++++++++++++++++++++++-- include/linux/mm.h | 22 ++++++++++++++++++++++ 2 files changed, 47 insertions(+), 2 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index c9bf68c239a01..e6437a06e2346 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -944,9 +944,32 @@ static inline bool htlb_allow_alloc_fallback(int reason) static inline spinlock_t *huge_pte_lockptr(struct hstate *h, struct mm_struct *mm, pte_t *pte) { - if (huge_page_size(h) == PMD_SIZE) + unsigned long size = huge_page_size(h); + + VM_WARN_ON(size == PAGE_SIZE); + + /* + * hugetlb must use the exact same PT locks as core-mm page table + * walkers would. When modifying a PTE table, hugetlb must take the + * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD + * PT lock etc. + * + * The expectation is that any hugetlb folio smaller than a PMD is + * always mapped into a single PTE table and that any hugetlb folio + * smaller than a PUD (but at least as big as a PMD) is always mapped + * into a single PMD table. + * + * If that does not hold for an architecture, then that architecture + * must disable split PT locks such that all *_lockptr() functions + * will give us the same result: the per-MM PT lock. + */ + if (size < PMD_SIZE && !IS_ENABLED(CONFIG_HIGHPTE)) + /* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */ + return ptep_lockptr(mm, pte); + else if (size < PUD_SIZE || CONFIG_PGTABLE_LEVELS == 2) return pmd_lockptr(mm, (pmd_t *) pte); - VM_BUG_ON(huge_page_size(h) == PAGE_SIZE); + else if (size < P4D_SIZE || CONFIG_PGTABLE_LEVELS == 3) + return pud_lockptr(mm, (pud_t *) pte); return &mm->page_table_lock; } diff --git a/include/linux/mm.h b/include/linux/mm.h index b100df8cb5857..f6c7fe8f5746f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2926,6 +2926,24 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd) return ptlock_ptr(page_ptdesc(pmd_page(*pmd))); } +static inline struct page *ptep_pgtable_page(pte_t *pte) +{ + unsigned long mask = ~(PTRS_PER_PTE * sizeof(pte_t) - 1); + + BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE)); + return virt_to_page((void *)((unsigned long)pte & mask)); +} + +static inline struct ptdesc *ptep_ptdesc(pte_t *pte) +{ + return page_ptdesc(ptep_pgtable_page(pte)); +} + +static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte) +{ + return ptlock_ptr(ptep_ptdesc(pte)); +} + static inline bool ptlock_init(struct ptdesc *ptdesc) { /* @@ -2950,6 +2968,10 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd) { return &mm->page_table_lock; } +static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte) +{ + return &mm->page_table_lock; +} static inline void ptlock_cache_init(void) {} static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; } static inline void ptlock_free(struct ptdesc *ptdesc) {} -- 2.45.2

1 year, 4 months

4
10
0 0

[PATCH v5 bpf-next 01/10] lib/buildid: harden build ID parsing logic

by Andrii Nakryiko

Harden build ID parsing logic, adding explicit READ_ONCE() where it's important to have a consistent value read and validated just once. Also, as pointed out by Andi Kleen, we need to make sure that entire ELF note is within a page bounds, so move the overflow check up and add an extra note_size boundaries validation. Fixes tag below points to the code that moved this code into lib/buildid.c, and then subsequently was used in perf subsystem, making this code exposed to perf_event_open() users in v5.12+. Cc: stable(a)vger.kernel.org Cc: Jann Horn <jannh(a)google.com> Suggested-by: Andi Kleen <ak(a)linux.intel.com> Fixes: bd7525dacd7e ("bpf: Move stack_map_get_build_id into lib") Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org> --- lib/buildid.c | 74 ++++++++++++++++++++++++++++++--------------------- 1 file changed, 43 insertions(+), 31 deletions(-) diff --git a/lib/buildid.c b/lib/buildid.c index e02b5507418b..61bc5b767064 100644 --- a/lib/buildid.c +++ b/lib/buildid.c @@ -18,31 +18,37 @@ static int parse_build_id_buf(unsigned char *build_id, const void *note_start, Elf32_Word note_size) { - Elf32_Word note_offs = 0, new_offs; - - while (note_offs + sizeof(Elf32_Nhdr) < note_size) { - Elf32_Nhdr *nhdr = (Elf32_Nhdr *)(note_start + note_offs); + const char note_name[] = "GNU"; + const size_t note_name_sz = sizeof(note_name); + u64 note_off = 0, new_off, name_sz, desc_sz; + const char *data; + + while (note_off + sizeof(Elf32_Nhdr) < note_size && + note_off + sizeof(Elf32_Nhdr) > note_off /* overflow */) { + Elf32_Nhdr *nhdr = (Elf32_Nhdr *)(note_start + note_off); + + name_sz = READ_ONCE(nhdr->n_namesz); + desc_sz = READ_ONCE(nhdr->n_descsz); + + new_off = note_off + sizeof(Elf32_Nhdr); + if (check_add_overflow(new_off, ALIGN(name_sz, 4), &new_off) || + check_add_overflow(new_off, ALIGN(desc_sz, 4), &new_off) || + new_off > note_size) + break; if (nhdr->n_type == BUILD_ID && - nhdr->n_namesz == sizeof("GNU") && - !strcmp((char *)(nhdr + 1), "GNU") && - nhdr->n_descsz > 0 && - nhdr->n_descsz <= BUILD_ID_SIZE_MAX) { - memcpy(build_id, - note_start + note_offs + - ALIGN(sizeof("GNU"), 4) + sizeof(Elf32_Nhdr), - nhdr->n_descsz); - memset(build_id + nhdr->n_descsz, 0, - BUILD_ID_SIZE_MAX - nhdr->n_descsz); + name_sz == note_name_sz && + strcmp((char *)(nhdr + 1), note_name) == 0 && + desc_sz > 0 && desc_sz <= BUILD_ID_SIZE_MAX) { + data = note_start + note_off + ALIGN(note_name_sz, 4); + memcpy(build_id, data, desc_sz); + memset(build_id + desc_sz, 0, BUILD_ID_SIZE_MAX - desc_sz); if (size) - *size = nhdr->n_descsz; + *size = desc_sz; return 0; } - new_offs = note_offs + sizeof(Elf32_Nhdr) + - ALIGN(nhdr->n_namesz, 4) + ALIGN(nhdr->n_descsz, 4); - if (new_offs <= note_offs) /* overflow */ - break; - note_offs = new_offs; + + note_off = new_off; } return -EINVAL; @@ -71,7 +77,7 @@ static int get_build_id_32(const void *page_addr, unsigned char *build_id, { Elf32_Ehdr *ehdr = (Elf32_Ehdr *)page_addr; Elf32_Phdr *phdr; - int i; + __u32 i, phnum; /* * FIXME @@ -80,9 +86,10 @@ static int get_build_id_32(const void *page_addr, unsigned char *build_id, */ if (ehdr->e_phoff != sizeof(Elf32_Ehdr)) return -EINVAL; + + phnum = READ_ONCE(ehdr->e_phnum); /* only supports phdr that fits in one page */ - if (ehdr->e_phnum > - (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr)) + if (phnum > (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr)) return -EINVAL; phdr = (Elf32_Phdr *)(page_addr + sizeof(Elf32_Ehdr)); @@ -90,8 +97,8 @@ static int get_build_id_32(const void *page_addr, unsigned char *build_id, for (i = 0; i < ehdr->e_phnum; ++i) { if (phdr[i].p_type == PT_NOTE && !parse_build_id(page_addr, build_id, size, - page_addr + phdr[i].p_offset, - phdr[i].p_filesz)) + page_addr + READ_ONCE(phdr[i].p_offset), + READ_ONCE(phdr[i].p_filesz))) return 0; } return -EINVAL; @@ -103,7 +110,7 @@ static int get_build_id_64(const void *page_addr, unsigned char *build_id, { Elf64_Ehdr *ehdr = (Elf64_Ehdr *)page_addr; Elf64_Phdr *phdr; - int i; + __u32 i, phnum; /* * FIXME @@ -112,18 +119,19 @@ static int get_build_id_64(const void *page_addr, unsigned char *build_id, */ if (ehdr->e_phoff != sizeof(Elf64_Ehdr)) return -EINVAL; + + phnum = READ_ONCE(ehdr->e_phnum); /* only supports phdr that fits in one page */ - if (ehdr->e_phnum > - (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr)) + if (phnum > (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr)) return -EINVAL; phdr = (Elf64_Phdr *)(page_addr + sizeof(Elf64_Ehdr)); - for (i = 0; i < ehdr->e_phnum; ++i) { + for (i = 0; i < phnum; ++i) { if (phdr[i].p_type == PT_NOTE && !parse_build_id(page_addr, build_id, size, - page_addr + phdr[i].p_offset, - phdr[i].p_filesz)) + page_addr + READ_ONCE(phdr[i].p_offset), + READ_ONCE(phdr[i].p_filesz))) return 0; } return -EINVAL; @@ -152,6 +160,10 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id, page = find_get_page(vma->vm_file->f_mapping, 0); if (!page) return -EFAULT; /* page not mapped */ + if (!PageUptodate(page)) { + put_page(page); + return -EFAULT; + } ret = -EINVAL; page_addr = kmap_local_page(page); -- 2.43.5

1 year, 4 months

4
6
0 0

[PATCH] drm/xe: prevent UAF around preempt fence

by Matthew Auld

The fence lock is part of the queue, therefore in the current design anything locking the fence should then also hold a ref to the queue to prevent the queue from being freed. However, currently it looks like we signal the fence and then drop the queue ref, but if something is waiting on the fence, the waiter is kicked to wake up at some later point, where upon waking up it first grabs the lock before checking the fence state. But if we have already dropped the queue ref, then the lock might already be freed as part of the queue, leading to uaf. To prevent this, move the fence lock into the fence itself so we don't run into lifetime issues. Alternative might be to have device level lock, or only release the queue in the fence release callback, however that might require pushing to another worker to avoid locking issues. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2454 References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2342 References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2020 Signed-off-by: Matthew Auld <matthew.auld(a)intel.com> Cc: Matthew Brost <matthew.brost(a)intel.com> Cc: <stable(a)vger.kernel.org> # v6.8+ --- drivers/gpu/drm/xe/xe_exec_queue.c | 1 - drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 -- drivers/gpu/drm/xe/xe_preempt_fence.c | 3 ++- drivers/gpu/drm/xe/xe_preempt_fence_types.h | 2 ++ 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index 971e1234b8ea..0f610d273fb6 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -614,7 +614,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data, if (xe_vm_in_preempt_fence_mode(vm)) { q->lr.context = dma_fence_context_alloc(1); - spin_lock_init(&q->lr.lock); err = xe_vm_add_compute_exec_queue(vm, q); if (XE_IOCTL_DBG(xe, err)) diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h index 1408b02eea53..fc2a1a20b7e4 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h @@ -126,8 +126,6 @@ struct xe_exec_queue { u32 seqno; /** @lr.link: link into VM's list of exec queues */ struct list_head link; - /** @lr.lock: preemption fences lock */ - spinlock_t lock; } lr; /** @ops: submission backend exec queue operations */ diff --git a/drivers/gpu/drm/xe/xe_preempt_fence.c b/drivers/gpu/drm/xe/xe_preempt_fence.c index 56e709d2fb30..83fbeea5aa20 100644 --- a/drivers/gpu/drm/xe/xe_preempt_fence.c +++ b/drivers/gpu/drm/xe/xe_preempt_fence.c @@ -134,8 +134,9 @@ xe_preempt_fence_arm(struct xe_preempt_fence *pfence, struct xe_exec_queue *q, { list_del_init(&pfence->link); pfence->q = xe_exec_queue_get(q); + spin_lock_init(&pfence->lock); dma_fence_init(&pfence->base, &preempt_fence_ops, - &q->lr.lock, context, seqno); + &pfence->lock, context, seqno); return &pfence->base; } diff --git a/drivers/gpu/drm/xe/xe_preempt_fence_types.h b/drivers/gpu/drm/xe/xe_preempt_fence_types.h index b54b5c29b533..312c3372a49f 100644 --- a/drivers/gpu/drm/xe/xe_preempt_fence_types.h +++ b/drivers/gpu/drm/xe/xe_preempt_fence_types.h @@ -25,6 +25,8 @@ struct xe_preempt_fence { struct xe_exec_queue *q; /** @preempt_work: work struct which issues preemption */ struct work_struct preempt_work; + /** @lock: dma-fence fence lock */ + spinlock_t lock; /** @error: preempt fence is in error state */ int error; }; -- 2.46.0

1 year, 4 months

2
1
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror August 2024