October 2024 - Linux-stable-mirror

[PATCH 5.15.y 5.10.y 5.4.y] wifi: mac80211: fix potential key use-after-free

by Sherry Yang

From: Johannes Berg <johannes.berg(a)intel.com>

[ Upstream commit 31db78a4923ef5e2008f2eed321811ca79e7f71b ]

When ieee80211_key_link() is called by ieee80211_gtk_rekey_add()
but returns 0 due to KRACK protection (identical key reinstall),
ieee80211_gtk_rekey_add() will still return a pointer into the
key, in a potential use-after-free. This normally doesn't happen
since it's only called by iwlwifi in case of WoWLAN rekey offload
which has its own KRACK protection, but still better to fix, do
that by returning an error code and converting that to success on
the cfg80211 boundary only, leaving the error for bad callers of
ieee80211_gtk_rekey_add().

Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Fixes: fdf7cb4185b6 ("mac80211: accept key reinstall without changing anything")
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Sherry: bp to fix CVE-2023-52530, resolved minor conflicts in
net/mac80211/cfg.c because of context change due to missing commit
23a5f0af6ff4 ("wifi: mac80211: remove cipher scheme support")
ccdde7c74ffd ("wifi: mac80211: properly implement MLO key handling")]
Signed-off-by: Sherry Yang <sherry.yang(a)oracle.com>
---
 net/mac80211/cfg.c | 3 +++
 net/mac80211/key.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index f652982a106b..c54b3be62c0a 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -511,6 +511,9 @@ static int ieee80211_add_key(struct wiphy *wiphy, struct net_device *dev,
 		sta->cipher_scheme = cs;

 	err = ieee80211_key_link(key, sdata, sta);
+	/* KRACK protection, shouldn't happen but just silently accept key */
+	if (err == -EALREADY)
+		err = 0;

  out_unlock:
 	mutex_unlock(&local->sta_mtx);
diff --git a/net/mac80211/key.c b/net/mac80211/key.c
index f695fc80088b..7b427e39831b 100644
--- a/net/mac80211/key.c
+++ b/net/mac80211/key.c
@@ -843,7 +843,7 @@ int ieee80211_key_link(struct ieee80211_key *key,
 	 */
 	if (ieee80211_key_identical(sdata, old_key, key)) {
 		ieee80211_key_free_unused(key);
-		ret = 0;
+		ret = -EALREADY;
 		goto out;
 	}

--
2.46.0
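
The ownership rule behind this fix can be sketched in user space. The following is only an illustration under stated assumptions, not the mac80211 code: struct demo_key and every demo_* function are hypothetical stand-ins, and the rekey helper returns NULL where the kernel would return an error pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ieee80211_key; not the kernel layout. */
struct demo_key {
	unsigned char material[16];
};

/* Allocates a key whose material is filled with one byte value. */
static struct demo_key *demo_key_new(unsigned char fill)
{
	struct demo_key *key = malloc(sizeof(*key));

	if (key)
		memset(key->material, fill, sizeof(key->material));
	return key;
}

/*
 * Models ieee80211_key_link(): takes ownership of new_key. If new_key is
 * identical to the installed key it is freed and -EALREADY is returned, so
 * the caller learns that its pointer is no longer valid.
 */
static int demo_key_link(struct demo_key **slot, struct demo_key *new_key)
{
	if (*slot && memcmp((*slot)->material, new_key->material,
			    sizeof(new_key->material)) == 0) {
		free(new_key);
		return -EALREADY;
	}
	free(*slot);
	*slot = new_key;
	return 0;
}

/* Models the cfg80211 boundary: an identical reinstall is silently accepted. */
static int demo_add_key(struct demo_key **slot, struct demo_key *new_key)
{
	int err = demo_key_link(slot, new_key);

	if (err == -EALREADY)
		err = 0;
	return err;
}

/* Models ieee80211_gtk_rekey_add(): never hands back a freed pointer. */
static struct demo_key *demo_rekey_add(struct demo_key **slot,
				       struct demo_key *new_key)
{
	int err = demo_key_link(slot, new_key);

	return err ? NULL : new_key;
}
```

With the old "return 0" behaviour, demo_rekey_add() could not tell a fresh install from a freed duplicate and would return the dangling new_key in both cases.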

[PATCH 6.1.y] nvme: fix metadata handling in nvme-passthrough

by Puranjay Mohan

[ Upstream commit 7c2fd76048e95dd267055b5f5e0a48e6e7c81fd9 ]

On an NVMe namespace that does not support metadata, it is possible to
send an IO command with metadata through io-passthru. This allows issues
like [1] to trigger in the completion code path.
nvme_map_user_request() doesn't check if the namespace supports metadata
before sending it forward. It also allows admin commands with metadata to
be processed as it ignores metadata when bdev == NULL and may report
success.

Reject an IO command with metadata when the NVMe namespace doesn't
support it and reject an admin command if it has metadata.

[1] https://lore.kernel.org/all/mb61pcylvnym8.fsf@amazon.com/

Suggested-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Anuj Gupta <anuj20.g(a)samsung.com>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
[ Minor changes to make it work on 6.1 ]
Signed-off-by: Puranjay Mohan <pjy(a)amazon.com>
---
 drivers/nvme/host/ioctl.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index b3e322e4ade38..a02873792890e 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2011-2014, Intel Corporation.
  * Copyright (c) 2017-2021 Christoph Hellwig.
  */
+#include <linux/blk-integrity.h>
 #include <linux/ptrace.h>	/* for force_successful_syscall_return */
 #include <linux/nvme_ioctl.h>
 #include <linux/io_uring.h>
@@ -95,10 +96,15 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer,
 	struct request_queue *q = req->q;
 	struct nvme_ns *ns = q->queuedata;
 	struct block_device *bdev = ns ? ns->disk->part0 : NULL;
+	bool supports_metadata = bdev && blk_get_integrity(bdev->bd_disk);
+	bool has_metadata = meta_buffer && meta_len;
 	struct bio *bio = NULL;
 	void *meta = NULL;
 	int ret;

+	if (has_metadata && !supports_metadata)
+		return -EINVAL;
+
 	if (ioucmd && (ioucmd->flags & IORING_URING_CMD_FIXED)) {
 		struct iov_iter iter;

@@ -122,7 +128,7 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer,
 	if (bdev)
 		bio_set_dev(bio, bdev);

-	if (bdev && meta_buffer && meta_len) {
+	if (has_metadata) {
 		meta = nvme_add_user_metadata(req, meta_buffer, meta_len,
 				meta_seed);
 		if (IS_ERR(meta)) {
--
2.40.1
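
The validation rule described above can be sketched as a standalone predicate. This is a simplified model, assuming a hypothetical struct demo_passthru in place of the request, block device and integrity profile that the real nvme_map_user_request() inspects.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified inputs. In the patch, supports_metadata is
 * "bdev && blk_get_integrity(bdev->bd_disk)"; here it is two plain flags.
 */
struct demo_passthru {
	bool has_bdev;           /* false models an admin command (no namespace) */
	bool ns_has_integrity;   /* namespace exposes an integrity profile */
	const void *meta_buffer; /* user metadata pointer, may be NULL */
	unsigned int meta_len;   /* metadata length in bytes */
};

/* Returns 0 if the command may proceed, -EINVAL if metadata must be rejected. */
static int demo_check_metadata(const struct demo_passthru *cmd)
{
	bool supports_metadata = cmd->has_bdev && cmd->ns_has_integrity;
	bool has_metadata = cmd->meta_buffer && cmd->meta_len;

	if (has_metadata && !supports_metadata)
		return -EINVAL;
	return 0;
}
```

An admin command carrying metadata and an IO command to a namespace without integrity support both fail early, instead of having the metadata silently ignored or reaching the completion path.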

[PATCH stable 6.1 0/2] devlink: Fix RCU stall when unregistering a devlink instance

by Ido Schimmel

Upstream commit c2368b19807a ("net: devlink: introduce "unregistering"
mark and use it during devlinks iteration") in v6.0 introduced a race
when unregistering a devlink instance that can result in RCU stalls and
in the system completely locking up. Exact details and reproducer can be
found here [1].

The bug was inadvertently fixed in v6.3 by upstream commit d77278196441
("devlink: bump the instance index directly when iterating").

This patchset fixes the bug by backporting the second commit and a
related dependency from v6.3 to v6.1.y while adjusting them to the
devlink file structure in v6.1.y (net/devlink/{core.c,devl_internal.h}
-> net/devlink/leftover.c).

Tested by running the devlink tests under
tools/testing/selftests/drivers/net/netdevsim/ and the reproducer
mentioned in [1].

[1] https://lore.kernel.org/stable/20241001112035.973187-1-idosch@nvidia.com/

Jakub Kicinski (2):
  devlink: drop the filter argument from devlinks_xa_find_get
  devlink: bump the instance index directly when iterating

 net/devlink/leftover.c | 40 ++++++++++------------------------------
 1 file changed, 10 insertions(+), 30 deletions(-)

--
2.47.0

FAILED: patch "[PATCH] secretmem: disable memfd_secret() if arch cannot set direct" failed to apply to 5.15-stable tree

by gregkh(a)linuxfoundation.org

The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 532b53cebe58f34ce1c0f34d866f5c0e335c53c6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101412-prowling-snowflake-9fe0@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..

Possible dependencies:

532b53cebe58 ("secretmem: disable memfd_secret() if arch cannot set direct map")
f7c5b1aab5ef ("mm/secretmem: remove reduntant return value")

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 532b53cebe58f34ce1c0f34d866f5c0e335c53c6 Mon Sep 17 00:00:00 2001
From: Patrick Roy <roypat(a)amazon.co.uk>
Date: Tue, 1 Oct 2024 09:00:41 +0100
Subject: [PATCH] secretmem: disable memfd_secret() if arch cannot set direct map

Return -ENOSYS from memfd_secret() syscall if !can_set_direct_map(). This
is the case for example on some arm64 configurations, where marking 4k
PTEs in the direct map not present can only be done if the direct map is
set up at 4k granularity in the first place (as ARM's break-before-make
semantics do not easily allow breaking apart large/gigantic pages).

More precisely, on arm64 systems with !can_set_direct_map(),
set_direct_map_invalid_noflush() is a no-op, however it returns success
(0) instead of an error. This means that memfd_secret will seemingly
"work" (e.g. syscall succeeds, you can mmap the fd and fault in pages),
but it does not actually achieve its goal of removing its memory from the
direct map.

Note that with this patch, memfd_secret() will start erroring on systems
where can_set_direct_map() returns false (arm64 with
CONFIG_RODATA_FULL_DEFAULT_ENABLED=n, CONFIG_DEBUG_PAGEALLOC=n and
CONFIG_KFENCE=n), but that still seems better than the current silent
failure. Since CONFIG_RODATA_FULL_DEFAULT_ENABLED defaults to 'y', most
arm64 systems actually have a working memfd_secret() and aren't
affected.

From going through the iterations of the original memfd_secret patch
series, it seems that disabling the syscall in these scenarios was the
intended behavior [1] (preferred over having
set_direct_map_invalid_noflush return an error as that would result in
SIGBUSes at page-fault time), however the check for it got dropped between
v16 [2] and v17 [3], when secretmem moved away from CMA allocations.

[1]: https://lore.kernel.org/lkml/20201124164930.GK8537@kernel.org/
[2]: https://lore.kernel.org/lkml/20210121122723.3446-11-rppt@kernel.org/#t
[3]: https://lore.kernel.org/lkml/20201125092208.12544-10-rppt@kernel.org/

Link: https://lkml.kernel.org/r/20241001080056.784735-1-roypat@amazon.co.uk
Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
Signed-off-by: Patrick Roy <roypat(a)amazon.co.uk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Alexander Graf <graf(a)amazon.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: James Gowans <jgowans(a)amazon.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>

diff --git a/mm/secretmem.c b/mm/secretmem.c
index 3afb5ad701e1..399552814fd0 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -238,7 +238,7 @@ SYSCALL_DEFINE1(memfd_secret, unsigned int, flags)
 	/* make sure local flags do not confict with global fcntl.h */
 	BUILD_BUG_ON(SECRETMEM_FLAGS_MASK & O_CLOEXEC);

-	if (!secretmem_enable)
+	if (!secretmem_enable || !can_set_direct_map())
 		return -ENOSYS;

 	if (flags & ~(SECRETMEM_FLAGS_MASK | O_CLOEXEC))
@@ -280,7 +280,7 @@ static struct file_system_type secretmem_fs = {

 static int __init secretmem_init(void)
 {
-	if (!secretmem_enable)
+	if (!secretmem_enable || !can_set_direct_map())
 		return 0;

 	secretmem_mnt = kern_mount(&secretmem_fs);
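
From user space, the visible effect of this change is that memfd_secret() fails with ENOSYS on affected configurations instead of appearing to work. A minimal probe might look like the sketch below. The demo_* names are hypothetical, the code assumes a Linux libc that provides syscall(), and it treats a missing SYS_memfd_secret definition the same as an unavailable syscall.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

enum demo_secretmem_status {
	DEMO_SECRETMEM_OK,          /* syscall returned a file descriptor */
	DEMO_SECRETMEM_UNAVAILABLE, /* ENOSYS: disabled, or arch cannot set direct map */
	DEMO_SECRETMEM_ERROR,       /* any other failure */
};

/* Maps a raw syscall return value and the saved errno to a status. */
static enum demo_secretmem_status demo_classify(long ret, int err)
{
	if (ret >= 0)
		return DEMO_SECRETMEM_OK;
	return err == ENOSYS ? DEMO_SECRETMEM_UNAVAILABLE
			     : DEMO_SECRETMEM_ERROR;
}

/* Tries memfd_secret(0) once and closes the descriptor again on success. */
static enum demo_secretmem_status demo_probe_secretmem(void)
{
#ifdef SYS_memfd_secret
	long fd = syscall(SYS_memfd_secret, 0);
	int err = errno; /* save before close() can overwrite it */

	if (fd >= 0)
		close((int)fd);
	return demo_classify(fd, err);
#else
	return DEMO_SECRETMEM_UNAVAILABLE;
#endif
}
```

A caller that needs secret memory can run demo_probe_secretmem() at startup and fall back to ordinary allocations when it reports DEMO_SECRETMEM_UNAVAILABLE.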

[PATCH 6.6 00/21] xfs backports for 6.6.y (from 6.10)

by Catherine Hoang

Hello,

This series contains backports for 6.6 from the 6.10 release. This patchset
has gone through xfs testing and review.

Christoph Hellwig (4):
  xfs: fix error returns from xfs_bmapi_write
  xfs: fix xfs_bmap_add_extent_delay_real for partial conversions
  xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent
  xfs: fix freeing speculative preallocations for preallocated files

Darrick J. Wong (11):
  xfs: require XFS_SB_FEAT_INCOMPAT_LOG_XATTRS for attr log intent item recovery
  xfs: check opcode and iovec count match in xlog_recover_attri_commit_pass2
  xfs: fix missing check for invalid attr flags
  xfs: check shortform attr entry flags specifically
  xfs: validate recovered name buffers when recovering xattr items
  xfs: enforce one namespace per attribute
  xfs: revert commit 44af6c7e59b12
  xfs: use dontcache for grabbing inodes during scrub
  xfs: allow symlinks with short remote targets
  xfs: restrict when we try to align cow fork delalloc to cowextsz hints
  xfs: allow unlinked symlinks and dirs with zero size

Dave Chinner (1):
  xfs: fix unlink vs cluster buffer instantiation race

Wengang Wang (1):
  xfs: make sure sb_fdblocks is non-negative

Zhang Yi (4):
  xfs: match lock mode in xfs_buffered_write_iomap_begin()
  xfs: make the seq argument to xfs_bmapi_convert_delalloc() optional
  xfs: make xfs_bmapi_convert_delalloc() to allocate the target offset
  xfs: convert delayed extents to unwritten when zeroing post eof blocks

 fs/xfs/libxfs/xfs_attr.c        |  11 +++
 fs/xfs/libxfs/xfs_attr.h        |   4 +-
 fs/xfs/libxfs/xfs_attr_leaf.c   |   6 +-
 fs/xfs/libxfs/xfs_attr_remote.c |   1 -
 fs/xfs/libxfs/xfs_bmap.c        | 130 ++++++++++++++++++++++++++------
 fs/xfs/libxfs/xfs_da_btree.c    |  20 ++---
 fs/xfs/libxfs/xfs_da_format.h   |   5 ++
 fs/xfs/libxfs/xfs_inode_buf.c   |  47 ++++++++++--
 fs/xfs/libxfs/xfs_sb.c          |   7 +-
 fs/xfs/scrub/attr.c             |  47 +++++++-----
 fs/xfs/scrub/common.c           |  12 +--
 fs/xfs/scrub/scrub.h            |   7 ++
 fs/xfs/xfs_aops.c               |  54 ++++---------
 fs/xfs/xfs_attr_item.c          |  98 ++++++++++++++++++++----
 fs/xfs/xfs_attr_list.c          |  11 ++-
 fs/xfs/xfs_bmap_util.c          |  61 +++++++++------
 fs/xfs/xfs_bmap_util.h          |   2 +-
 fs/xfs/xfs_dquot.c              |   1 -
 fs/xfs/xfs_icache.c             |   2 +-
 fs/xfs/xfs_inode.c              |  37 +++++----
 fs/xfs/xfs_iomap.c              |  81 +++++++++++---------
 fs/xfs/xfs_reflink.c            |  20 -----
 fs/xfs/xfs_rtalloc.c            |   2 -
 23 files changed, 433 insertions(+), 233 deletions(-)

--
2.39.3

FAILED: patch "[PATCH] mm: don't install PMD mappings when THPs are disabled by the" failed to apply to 6.11-stable tree

by gregkh(a)linuxfoundation.org

The patch below does not apply to the 6.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y
git checkout FETCH_HEAD
git cherry-pick -x 2b0f922323ccfa76219bcaacd35cd50aeaa13592
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101841-keep-coma-4963@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^..

Possible dependencies:

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 2b0f922323ccfa76219bcaacd35cd50aeaa13592 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Fri, 11 Oct 2024 12:24:45 +0200
Subject: [PATCH] mm: don't install PMD mappings when THPs are disabled by the
 hw/process/vma

We (or rather, readahead logic :) ) might be allocating a THP in the
pagecache and then try mapping it into a process that explicitly disabled
THP: we might end up installing PMD mappings.

This is a problem for s390x KVM, which explicitly remaps all PMD-mapped
THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before
starting the VM.

For example, starting a VM backed on a file system with large folios
supported makes the VM crash when the VM tries accessing such a mapping
using KVM.

Is it also a problem when the HW disabled THP using
TRANSPARENT_HUGEPAGE_UNSUPPORTED? At least on x86 this would be the case
without X86_FEATURE_PSE.

In the future, we might be able to do better on s390x and only disallow
PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED
really wants. For now, fix it by essentially performing the same check as
would be done in __thp_vma_allowable_orders() or in shmem code, where this
works as expected, and disallow PMD mappings, making us fallback to PTE
mappings.

Link: https://lkml.kernel.org/r/20241011102445.934409-3-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Leo Fu <bfu(a)redhat.com>
Tested-by: Thomas Huth <thuth(a)redhat.com>
Cc: Thomas Huth <thuth(a)redhat.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>

diff --git a/mm/memory.c b/mm/memory.c
index c0869a962ddd..30feedabc932 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4920,6 +4920,15 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	pmd_t entry;
 	vm_fault_t ret = VM_FAULT_FALLBACK;

+	/*
+	 * It is too late to allocate a small folio, we already have a large
+	 * folio in the pagecache: especially s390 KVM cannot tolerate any
+	 * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any
+	 * PMD mappings if THPs are disabled.
+	 */
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
+		return ret;
+
 	if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
 		return ret;


FAILED: patch "[PATCH] mm: huge_memory: add vma_thp_disabled() and" failed to apply to 6.1-stable tree

by gregkh(a)linuxfoundation.org

The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 963756aac1f011d904ddd9548ae82286d3a91f96
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101848-lucid-mountain-2cdf@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..

Possible dependencies:

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 963756aac1f011d904ddd9548ae82286d3a91f96 Mon Sep 17 00:00:00 2001
From: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Date: Fri, 11 Oct 2024 12:24:44 +0200
Subject: [PATCH] mm: huge_memory: add vma_thp_disabled() and
 thp_disabled_by_hw()

Patch series "mm: don't install PMD mappings when THPs are disabled by the
hw/process/vma".

During testing, it was found that we can get PMD mappings in processes
where THP (and more precisely, PMD mappings) are supposed to be disabled.
While it works as expected for anon+shmem, the pagecache is the
problematic bit.

For s390 KVM this currently means that a VM backed by a file located on
filesystem with large folio support can crash when KVM tries accessing the
problematic page, because the readahead logic might decide to use a
PMD-sized THP and faulting it into the page tables will install a PMD
mapping, something that s390 KVM cannot tolerate.

This might also be a problem with HW that does not support PMD mappings,
but I did not try reproducing it.

Fix it by respecting the ways to disable THPs when deciding whether we can
install a PMD mapping. khugepaged should already be taking care of not
collapsing if THPs are effectively disabled for the hw/process/vma.

This patch (of 2):

Add vma_thp_disabled() and thp_disabled_by_hw() helpers to be shared by
shmem_allowable_huge_orders() and __thp_vma_allowable_orders().

[david(a)redhat.com: rename to vma_thp_disabled(), split out thp_disabled_by_hw() ]
Link: https://lkml.kernel.org/r/20241011102445.934409-2-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Leo Fu <bfu(a)redhat.com>
Tested-by: Thomas Huth <thuth(a)redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Boqiao Fu <bfu(a)redhat.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 67d0ab3c3bba..ef5b80e48599 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -322,6 +322,24 @@ struct thpsize {
 	(transparent_hugepage_flags &					\
 	 (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))

+static inline bool vma_thp_disabled(struct vm_area_struct *vma,
+		unsigned long vm_flags)
+{
+	/*
+	 * Explicitly disabled through madvise or prctl, or some
+	 * architectures may disable THP for some mappings, for
+	 * example, s390 kvm.
+	 */
+	return (vm_flags & VM_NOHUGEPAGE) ||
+	       test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags);
+}
+
+static inline bool thp_disabled_by_hw(void)
+{
+	/* If the hardware/firmware marked hugepage support disabled. */
+	return transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED);
+}
+
 unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags);
 unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 87b49ecc7b1e..2fb328880b50 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -109,18 +109,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	if (!vma->vm_mm)		/* vdso */
 		return 0;

-	/*
-	 * Explicitly disabled through madvise or prctl, or some
-	 * architectures may disable THP for some mappings, for
-	 * example, s390 kvm.
-	 * */
-	if ((vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-		return 0;
-	/*
-	 * If the hardware/firmware marked hugepage support disabled.
-	 */
-	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
 		return 0;

 	/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
diff --git a/mm/shmem.c b/mm/shmem.c
index 4f11b5506363..c5adb987b23c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1664,12 +1664,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 	loff_t i_size;
 	int order;

-	if (vma && ((vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
-		return 0;
-
-	/* If the hardware/firmware marked hugepage support disabled. */
-	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags)))
 		return 0;

 	global_huge = shmem_huge_global_enabled(inode, index, write_end,
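
The logic of the two helpers can be modelled outside the kernel as follows. This is a sketch only: the DEMO_* bit values and the demo_* structures are made up for illustration, and the last function merely mirrors the shape of the check that the companion patch places in do_set_pmd().

```c
#include <stdbool.h>

/* Hypothetical bit values chosen for this sketch; the kernel's differ. */
#define DEMO_VM_NOHUGEPAGE   (1UL << 0) /* per-VMA opt-out (madvise) */
#define DEMO_MMF_DISABLE_THP (1UL << 0) /* per-process opt-out (prctl) */
#define DEMO_THP_UNSUPPORTED (1UL << 0) /* hardware/firmware opt-out */

struct demo_mm {
	unsigned long flags;
};

struct demo_vma {
	struct demo_mm *vm_mm;
	unsigned long vm_flags;
};

/* Stand-in for the global transparent_hugepage_flags word. */
static unsigned long demo_transparent_hugepage_flags;

/* True if THP is disabled for this mapping or for its whole process. */
static bool demo_vma_thp_disabled(const struct demo_vma *vma,
				  unsigned long vm_flags)
{
	return (vm_flags & DEMO_VM_NOHUGEPAGE) ||
	       (vma->vm_mm->flags & DEMO_MMF_DISABLE_THP);
}

/* True if the hardware/firmware marked hugepage support disabled. */
static bool demo_thp_disabled_by_hw(void)
{
	return demo_transparent_hugepage_flags & DEMO_THP_UNSUPPORTED;
}

/* True when a fault handler should refuse a PMD mapping and use PTEs. */
static bool demo_must_fall_back_to_pte(const struct demo_vma *vma)
{
	return demo_thp_disabled_by_hw() ||
	       demo_vma_thp_disabled(vma, vma->vm_flags);
}
```

Keeping the three opt-out sources behind two small predicates lets every caller ask the same question in the same way, which is what allows the second patch of the series to reuse them in the page fault path.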

FAILED: patch "[PATCH] mm: huge_memory: add vma_thp_disabled() and" failed to apply to 6.6-stable tree

by gregkh(a)linuxfoundation.org

The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 963756aac1f011d904ddd9548ae82286d3a91f96
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101842-flatness-osmosis-b08e@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..

Possible dependencies:

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 963756aac1f011d904ddd9548ae82286d3a91f96 Mon Sep 17 00:00:00 2001
From: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Date: Fri, 11 Oct 2024 12:24:44 +0200
Subject: [PATCH] mm: huge_memory: add vma_thp_disabled() and
 thp_disabled_by_hw()

Patch series "mm: don't install PMD mappings when THPs are disabled by the
hw/process/vma".

During testing, it was found that we can get PMD mappings in processes
where THP (and more precisely, PMD mappings) are supposed to be disabled.
While it works as expected for anon+shmem, the pagecache is the
problematic bit.

For s390 KVM this currently means that a VM backed by a file located on
filesystem with large folio support can crash when KVM tries accessing the
problematic page, because the readahead logic might decide to use a
PMD-sized THP and faulting it into the page tables will install a PMD
mapping, something that s390 KVM cannot tolerate.

This might also be a problem with HW that does not support PMD mappings,
but I did not try reproducing it.

Fix it by respecting the ways to disable THPs when deciding whether we can
install a PMD mapping. khugepaged should already be taking care of not
collapsing if THPs are effectively disabled for the hw/process/vma.

This patch (of 2):

Add vma_thp_disabled() and thp_disabled_by_hw() helpers to be shared by
shmem_allowable_huge_orders() and __thp_vma_allowable_orders().

[david(a)redhat.com: rename to vma_thp_disabled(), split out thp_disabled_by_hw() ]
Link: https://lkml.kernel.org/r/20241011102445.934409-2-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Leo Fu <bfu(a)redhat.com>
Tested-by: Thomas Huth <thuth(a)redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Boqiao Fu <bfu(a)redhat.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 67d0ab3c3bba..ef5b80e48599 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -322,6 +322,24 @@ struct thpsize {
 	(transparent_hugepage_flags &					\
 	 (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))

+static inline bool vma_thp_disabled(struct vm_area_struct *vma,
+		unsigned long vm_flags)
+{
+	/*
+	 * Explicitly disabled through madvise or prctl, or some
+	 * architectures may disable THP for some mappings, for
+	 * example, s390 kvm.
+	 */
+	return (vm_flags & VM_NOHUGEPAGE) ||
+	       test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags);
+}
+
+static inline bool thp_disabled_by_hw(void)
+{
+	/* If the hardware/firmware marked hugepage support disabled. */
+	return transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED);
+}
+
 unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags);
 unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 87b49ecc7b1e..2fb328880b50 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -109,18 +109,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	if (!vma->vm_mm)		/* vdso */
 		return 0;

-	/*
-	 * Explicitly disabled through madvise or prctl, or some
-	 * architectures may disable THP for some mappings, for
-	 * example, s390 kvm.
-	 * */
-	if ((vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-		return 0;
-	/*
-	 * If the hardware/firmware marked hugepage support disabled.
-	 */
-	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
 		return 0;

 	/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
diff --git a/mm/shmem.c b/mm/shmem.c
index 4f11b5506363..c5adb987b23c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1664,12 +1664,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 	loff_t i_size;
 	int order;

-	if (vma && ((vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
-		return 0;
-
-	/* If the hardware/firmware marked hugepage support disabled. */
-	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags)))
 		return 0;

 	global_huge = shmem_huge_global_enabled(inode, index, write_end,

FAILED: patch "[PATCH] mm/swapfile: skip HugeTLB pages for unuse_vma" failed to apply to 4.19-stable tree

by gregkh(a)linuxfoundation.org

The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 7528c4fb1237512ee18049f852f014eba80bbe8d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101858-rewire-vocation-c981@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..

Possible dependencies:

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 7528c4fb1237512ee18049f852f014eba80bbe8d Mon Sep 17 00:00:00 2001
From: Liu Shixin <liushixin2(a)huawei.com>
Date: Tue, 15 Oct 2024 09:45:21 +0800
Subject: [PATCH] mm/swapfile: skip HugeTLB pages for unuse_vma

I got a bad pud error and lost a 1GB HugeTLB when calling swapoff. The
problem can be reproduced by the following steps:

 1. Allocate an anonymous 1GB HugeTLB and some other anonymous memory.
 2. Swapout the above anonymous memory.
 3. run swapoff and we will get a bad pud error in kernel message:

  mm/pgtable-generic.c:42: bad pud 00000000743d215d(84000001400000e7)

We can tell that pud_clear_bad is called by pud_none_or_clear_bad in
unuse_pud_range() by ftrace. And therefore the HugeTLB pages will never
be freed because we lost it from page table. We can skip HugeTLB pages
for unuse_vma to fix it.

Link: https://lkml.kernel.org/r/20241015014521.570237-1-liushixin2@huawei.com
Fixes: 0fe6e20b9c4c ("hugetlb, rmap: add reverse mapping for hugepage")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>

diff --git a/mm/swapfile.c b/mm/swapfile.c
index eb782fcd5627..b0915f3fab31 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2313,7 +2313,7 @@ static int unuse_mm(struct mm_struct *mm, unsigned int type)

 	mmap_read_lock(mm);
 	for_each_vma(vmi, vma) {
-		if (vma->anon_vma) {
+		if (vma->anon_vma && !is_vm_hugetlb_page(vma)) {
 			ret = unuse_vma(vma, type);
 			if (ret)
 				break;
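
The effect of the one-line change is a stricter filter in the VMA loop of unuse_mm(). A small model of that filter, assuming a hypothetical struct demo_swap_vma whose two flags stand in for vma->anon_vma and is_vm_hugetlb_page(vma), could look like this:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two VMA properties the loop looks at. */
struct demo_swap_vma {
	bool has_anon_vma; /* models vma->anon_vma != NULL */
	bool is_hugetlb;   /* models is_vm_hugetlb_page(vma) */
};

/* Counts the VMAs that the fixed loop would hand to unuse_vma(). */
static size_t demo_count_unuse_candidates(const struct demo_swap_vma *vmas,
					  size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		/* Anonymous HugeTLB mappings are skipped after the fix. */
		if (vmas[i].has_anon_vma && !vmas[i].is_hugetlb)
			count++;
	}
	return count;
}
```

Before the fix the second condition was absent, so an anonymous HugeTLB VMA was walked by the regular page table code, which is where the bad pud report came from.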

FAILED: patch "[PATCH] mm: khugepaged: fix the arguments order in" failed to apply to 6.6-stable tree

by gregkh(a)linuxfoundation.org

The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 37f0b47c5143c2957909ced44fc09ffb118c99f7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101803-cage-smokiness-cb8b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..

Possible dependencies:

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 37f0b47c5143c2957909ced44fc09ffb118c99f7 Mon Sep 17 00:00:00 2001
From: Yang Shi <yang(a)os.amperecomputing.com>
Date: Fri, 11 Oct 2024 18:17:02 -0700
Subject: [PATCH] mm: khugepaged: fix the arguments order in
 khugepaged_collapse_file trace point

The "addr" and "is_shmem" arguments have different order in TP_PROTO and
TP_ARGS. This resulted in the incorrect trace result:

text-hugepage-644429 [276] 392092.878683: mm_khugepaged_collapse_file:
mm=0xffff20025d52c440, hpage_pfn=0x200678c00, index=512, addr=1,
is_shmem=0, filename=text-hugepage, nr=512, result=failed

The value of "addr" is wrong because it was treated as bool value, the
type of is_shmem.

Fix the order in TP_PROTO to keep "addr" is before "is_shmem" since the
original patch review suggested this order to achieve best packing.

And use "lx" for "addr" instead of "ld" in TP_printk because address is
typically shown in hex.

After the fix, the trace result looks correct:

text-hugepage-7291 [004] 128.627251: mm_khugepaged_collapse_file:
mm=0xffff0001328f9500, hpage_pfn=0x20016ea00, index=512, addr=0x400000,
is_shmem=0, filename=text-hugepage, nr=512, result=failed

Link: https://lkml.kernel.org/r/20241012011702.1084846-1-yang@os.amperecomputing.…
Fixes: 4c9473e87e75 ("mm/khugepaged: add tracepoint to collapse_file()")
Signed-off-by: Yang Shi <yang(a)os.amperecomputing.com>
Cc: Gautam Menghani <gautammenghani201(a)gmail.com>
Cc: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Cc: <stable(a)vger.kernel.org> [6.2+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index b5f5369b6300..9d5c00b0285c 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -208,7 +208,7 @@ TRACE_EVENT(mm_khugepaged_scan_file,

 TRACE_EVENT(mm_khugepaged_collapse_file,
 	TP_PROTO(struct mm_struct *mm, struct folio *new_folio, pgoff_t index,
-			bool is_shmem, unsigned long addr, struct file *file,
+			unsigned long addr, bool is_shmem, struct file *file,
 			int nr, int result),
 	TP_ARGS(mm, new_folio, index, addr, is_shmem, file, nr, result),
 	TP_STRUCT__entry(
@@ -233,7 +233,7 @@ TRACE_EVENT(mm_khugepaged_collapse_file,
 		__entry->result = result;
 	),

-	TP_printk("mm=%p, hpage_pfn=0x%lx, index=%ld, addr=%ld, is_shmem=%d, filename=%s, nr=%d, result=%s",
+	TP_printk("mm=%p, hpage_pfn=0x%lx, index=%ld, addr=%lx, is_shmem=%d, filename=%s, nr=%d, result=%s",
 		__entry->mm,
 		__entry->hpfn,
 		__entry->index,
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f9c39898eaff..a420eff92011 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2227,7 +2227,7 @@ rollback:
 	folio_put(new_folio);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	trace_mm_khugepaged_collapse_file(mm, new_folio, index, is_shmem, addr, file, HPAGE_PMD_NR, result);
+	trace_mm_khugepaged_collapse_file(mm, new_folio, index, addr, is_shmem, file, HPAGE_PMD_NR, result);
 	return result;
 }

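
Why the broken trace showed addr=1 can be illustrated with plain C conversion rules: any non-zero value passed through a bool parameter becomes 1. The sketch below is not the tracepoint machinery itself, only the type collapse that the commit message describes. Both demo_* functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Receives a value through a bool parameter, the way a mismatched
 * prototype would receive the address, and widens it back.
 */
static unsigned long demo_through_bool(bool as_bool)
{
	return as_bool;
}

/* Formats an address with %lx, the conversion used after the fix. */
static int demo_format_addr(char *buf, size_t len, unsigned long addr)
{
	return snprintf(buf, len, "addr=%lx", addr);
}
```

Calling demo_through_bool(0x400000) yields 1, because the conversion to bool keeps only "zero or not zero". Passing the address in an unsigned long slot preserves it, and demo_format_addr() then renders it in hexadecimal.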
