If a userptr vma subject to prefetching was already invalidated
or invalidated during the prefetch operation, the operation would
repeatedly return -EAGAIN which would typically cause an infinite
loop.
Validate the userptr to ensure this doesn't happen.
v2:
- Don't fallthrough from UNMAP to PREFETCH (Matthew Brost)
Fixes: 5bd24e78829a ("drm/xe/vm: Subclass userptr vmas")
Fixes: 617eebb9c480 ("drm/xe: Fix array of binds")
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.9+
Suggested-by: Matthew Brost <matthew.brost(a)intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
---
drivers/gpu/drm/xe/xe_vm.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 996000f2424e..6fdc17be619e 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2306,8 +2306,17 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
break;
}
case DRM_GPUVA_OP_UNMAP:
+ xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
+ break;
case DRM_GPUVA_OP_PREFETCH:
- /* FIXME: Need to skip some prefetch ops */
+ vma = gpuva_to_vma(op->base.prefetch.va);
+
+ if (xe_vma_is_userptr(vma)) {
+ err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
+ if (err)
+ return err;
+ }
+
xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
break;
default:
--
2.48.1
The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x 2b90e7ace79774a3540ce569e000388f8d22c9e0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030421-disgrace-wince-4786@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2b90e7ace79774a3540ce569e000388f8d22c9e0 Mon Sep 17 00:00:00 2001
From: Peter Jones <pjones(a)redhat.com>
Date: Wed, 26 Feb 2025 15:18:39 -0500
Subject: [PATCH] efi: Don't map the entire mokvar table to determine its size
Currently, when validating the mokvar table, we (re)map the entire table
on each iteration of the loop, adding space as we discover new entries.
If the table grows over a certain size, this fails due to limitations of
early_memmap(), and we get a failure and traceback:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:139 __early_ioremap+0xef/0x220
...
Call Trace:
<TASK>
? __early_ioremap+0xef/0x220
? __warn.cold+0x93/0xfa
? __early_ioremap+0xef/0x220
? report_bug+0xff/0x140
? early_fixup_exception+0x5d/0xb0
? early_idt_handler_common+0x2f/0x3a
? __early_ioremap+0xef/0x220
? efi_mokvar_table_init+0xce/0x1d0
? setup_arch+0x864/0xc10
? start_kernel+0x6b/0xa10
? x86_64_start_reservations+0x24/0x30
? x86_64_start_kernel+0xed/0xf0
? common_startup_64+0x13e/0x141
</TASK>
---[ end trace 0000000000000000 ]---
mokvar: Failed to map EFI MOKvar config table pa=0x7c4c3000, size=265187.
Mapping the entire structure isn't actually necessary, as we don't ever
need more than one entry header mapped at once.
Changes efi_mokvar_table_init() to only map each entry header, not the
entire table, when determining the table size. Since we're not mapping
any data past the variable name, it also changes the code to enforce
that each variable name is NUL terminated, rather than attempting to
verify it in place.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Peter Jones <pjones(a)redhat.com>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
diff --git a/drivers/firmware/efi/mokvar-table.c b/drivers/firmware/efi/mokvar-table.c
index 5ed0602c2f75..d865cb1dbaad 100644
--- a/drivers/firmware/efi/mokvar-table.c
+++ b/drivers/firmware/efi/mokvar-table.c
@@ -103,7 +103,6 @@ void __init efi_mokvar_table_init(void)
void *va = NULL;
unsigned long cur_offset = 0;
unsigned long offset_limit;
- unsigned long map_size = 0;
unsigned long map_size_needed = 0;
unsigned long size;
struct efi_mokvar_table_entry *mokvar_entry;
@@ -134,48 +133,34 @@ void __init efi_mokvar_table_init(void)
*/
err = -EINVAL;
while (cur_offset + sizeof(*mokvar_entry) <= offset_limit) {
- mokvar_entry = va + cur_offset;
- map_size_needed = cur_offset + sizeof(*mokvar_entry);
- if (map_size_needed > map_size) {
- if (va)
- early_memunmap(va, map_size);
- /*
- * Map a little more than the fixed size entry
- * header, anticipating some data. It's safe to
- * do so as long as we stay within current memory
- * descriptor.
- */
- map_size = min(map_size_needed + 2*EFI_PAGE_SIZE,
- offset_limit);
- va = early_memremap(efi.mokvar_table, map_size);
- if (!va) {
- pr_err("Failed to map EFI MOKvar config table pa=0x%lx, size=%lu.\n",
- efi.mokvar_table, map_size);
- return;
- }
- mokvar_entry = va + cur_offset;
+ if (va)
+ early_memunmap(va, sizeof(*mokvar_entry));
+ va = early_memremap(efi.mokvar_table + cur_offset, sizeof(*mokvar_entry));
+ if (!va) {
+ pr_err("Failed to map EFI MOKvar config table pa=0x%lx, size=%zu.\n",
+ efi.mokvar_table + cur_offset, sizeof(*mokvar_entry));
+ return;
}
+ mokvar_entry = va;
/* Check for last sentinel entry */
if (mokvar_entry->name[0] == '\0') {
if (mokvar_entry->data_size != 0)
break;
err = 0;
+ map_size_needed = cur_offset + sizeof(*mokvar_entry);
break;
}
- /* Sanity check that the name is null terminated */
- size = strnlen(mokvar_entry->name,
- sizeof(mokvar_entry->name));
- if (size >= sizeof(mokvar_entry->name))
- break;
+ /* Enforce that the name is NUL terminated */
+ mokvar_entry->name[sizeof(mokvar_entry->name) - 1] = '\0';
/* Advance to the next entry */
- cur_offset = map_size_needed + mokvar_entry->data_size;
+ cur_offset += sizeof(*mokvar_entry) + mokvar_entry->data_size;
}
if (va)
- early_memunmap(va, map_size);
+ early_memunmap(va, sizeof(*mokvar_entry));
if (err) {
pr_err("EFI MOKvar config table is not valid\n");
return;
The pnfs that we obtain from hmm_range_fault() point to pages that
we don't have a reference on, and the guarantee that they are still
in the cpu page-tables is that the notifier lock must be held and the
notifier seqno is still valid.
So while building the sg table and marking the pages accesses / dirty
we need to hold this lock with a validated seqno.
However, the lock is reclaim tainted which makes
sg_alloc_table_from_pages_segment() unusable, since it internally
allocates memory.
Instead build the sg-table manually. For the non-iommu case
this might lead to fewer coalesces, but if that's a problem it can
be fixed up later in the resource cursor code. For the iommu case,
the whole sg-table may still be coalesced to a single contigous
device va region.
This avoids marking pages that we don't own dirty and accessed, and
it also avoid dereferencing struct pages that we don't own.
v2:
- Use assert to check whether hmm pfns are valid (Matthew Auld)
- Take into account that large pages may cross range boundaries
(Matthew Auld)
v3:
- Don't unnecessarily check for a non-freed sg-table. (Matthew Auld)
- Add a missing up_read() in an error path. (Matthew Auld)
Fixes: 81e058a3e7fd ("drm/xe: Introduce helper to populate userptr")
Cc: Oak Zeng <oak.zeng(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.10+
Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com>
---
drivers/gpu/drm/xe/xe_hmm.c | 112 +++++++++++++++++++++++++++---------
1 file changed, 86 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_hmm.c b/drivers/gpu/drm/xe/xe_hmm.c
index c56738fa713b..d33dd7bc3f41 100644
--- a/drivers/gpu/drm/xe/xe_hmm.c
+++ b/drivers/gpu/drm/xe/xe_hmm.c
@@ -42,6 +42,42 @@ static void xe_mark_range_accessed(struct hmm_range *range, bool write)
}
}
+static int xe_alloc_sg(struct xe_device *xe, struct sg_table *st,
+ struct hmm_range *range, struct rw_semaphore *notifier_sem)
+{
+ unsigned long i, npages, hmm_pfn;
+ unsigned long num_chunks = 0;
+ int ret;
+
+ /* HMM docs says this is needed. */
+ ret = down_read_interruptible(notifier_sem);
+ if (ret)
+ return ret;
+
+ if (mmu_interval_read_retry(range->notifier, range->notifier_seq)) {
+ up_read(notifier_sem);
+ return -EAGAIN;
+ }
+
+ npages = xe_npages_in_range(range->start, range->end);
+ for (i = 0; i < npages;) {
+ unsigned long len;
+
+ hmm_pfn = range->hmm_pfns[i];
+ xe_assert(xe, hmm_pfn & HMM_PFN_VALID);
+
+ len = 1UL << hmm_pfn_to_map_order(hmm_pfn);
+
+ /* If order > 0 the page may extend beyond range->start */
+ len -= (hmm_pfn & ~HMM_PFN_FLAGS) & (len - 1);
+ i += len;
+ num_chunks++;
+ }
+ up_read(notifier_sem);
+
+ return sg_alloc_table(st, num_chunks, GFP_KERNEL);
+}
+
/**
* xe_build_sg() - build a scatter gather table for all the physical pages/pfn
* in a hmm_range. dma-map pages if necessary. dma-address is save in sg table
@@ -50,6 +86,7 @@ static void xe_mark_range_accessed(struct hmm_range *range, bool write)
* @range: the hmm range that we build the sg table from. range->hmm_pfns[]
* has the pfn numbers of pages that back up this hmm address range.
* @st: pointer to the sg table.
+ * @notifier_sem: The xe notifier lock.
* @write: whether we write to this range. This decides dma map direction
* for system pages. If write we map it bi-diretional; otherwise
* DMA_TO_DEVICE
@@ -76,38 +113,41 @@ static void xe_mark_range_accessed(struct hmm_range *range, bool write)
* Returns 0 if successful; -ENOMEM if fails to allocate memory
*/
static int xe_build_sg(struct xe_device *xe, struct hmm_range *range,
- struct sg_table *st, bool write)
+ struct sg_table *st,
+ struct rw_semaphore *notifier_sem,
+ bool write)
{
+ unsigned long npages = xe_npages_in_range(range->start, range->end);
struct device *dev = xe->drm.dev;
- struct page **pages;
- u64 i, npages;
- int ret;
+ struct scatterlist *sgl;
+ struct page *page;
+ unsigned long i, j;
- npages = xe_npages_in_range(range->start, range->end);
- pages = kvmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
- if (!pages)
- return -ENOMEM;
+ lockdep_assert_held(notifier_sem);
- for (i = 0; i < npages; i++) {
- pages[i] = hmm_pfn_to_page(range->hmm_pfns[i]);
- xe_assert(xe, !is_device_private_page(pages[i]));
- }
+ i = 0;
+ for_each_sg(st->sgl, sgl, st->nents, j) {
+ unsigned long hmm_pfn, size;
- ret = sg_alloc_table_from_pages_segment(st, pages, npages, 0, npages << PAGE_SHIFT,
- xe_sg_segment_size(dev), GFP_KERNEL);
- if (ret)
- goto free_pages;
+ hmm_pfn = range->hmm_pfns[i];
+ page = hmm_pfn_to_page(hmm_pfn);
+ xe_assert(xe, !is_device_private_page(page));
+
+ size = 1UL << hmm_pfn_to_map_order(hmm_pfn);
+ size -= page_to_pfn(page) & (size - 1);
+ i += size;
- ret = dma_map_sgtable(dev, st, write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE,
- DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
- if (ret) {
- sg_free_table(st);
- st = NULL;
+ if (unlikely(j == st->nents - 1)) {
+ if (i > npages)
+ size -= (i - npages);
+ sg_mark_end(sgl);
+ }
+ sg_set_page(sgl, page, size << PAGE_SHIFT, 0);
}
+ xe_assert(xe, i == npages);
-free_pages:
- kvfree(pages);
- return ret;
+ return dma_map_sgtable(dev, st, write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
}
/**
@@ -235,16 +275,36 @@ int xe_hmm_userptr_populate_range(struct xe_userptr_vma *uvma,
if (ret)
goto free_pfns;
- ret = xe_build_sg(vm->xe, &hmm_range, &userptr->sgt, write);
+ ret = xe_alloc_sg(vm->xe, &userptr->sgt, &hmm_range, &vm->userptr.notifier_lock);
if (ret)
goto free_pfns;
+ ret = down_read_interruptible(&vm->userptr.notifier_lock);
+ if (ret)
+ goto free_st;
+
+ if (mmu_interval_read_retry(hmm_range.notifier, hmm_range.notifier_seq)) {
+ ret = -EAGAIN;
+ goto out_unlock;
+ }
+
+ ret = xe_build_sg(vm->xe, &hmm_range, &userptr->sgt,
+ &vm->userptr.notifier_lock, write);
+ if (ret)
+ goto out_unlock;
+
xe_mark_range_accessed(&hmm_range, write);
userptr->sg = &userptr->sgt;
userptr->notifier_seq = hmm_range.notifier_seq;
+ up_read(&vm->userptr.notifier_lock);
+ kvfree(pfns);
+ return 0;
+out_unlock:
+ up_read(&vm->userptr.notifier_lock);
+free_st:
+ sg_free_table(&userptr->sgt);
free_pfns:
kvfree(pfns);
return ret;
}
-
--
2.48.1
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 15b3b3254d1453a8db038b7d44b311a2d6c71f98
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030445-collected-spoken-1e75@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 15b3b3254d1453a8db038b7d44b311a2d6c71f98 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Sat, 15 Feb 2025 11:11:29 +0000
Subject: [PATCH] btrfs: do regular iput instead of delayed iput during extent
map shrinking
The extent map shrinker now runs in the system unbound workqueue and no
longer in kswapd context so it can directly do an iput() on inodes even
if that blocks or needs to acquire any lock (we aren't holding any locks
when requesting the delayed iput from the shrinker). So we don't need to
add a delayed iput, wake up the cleaner and delegate the iput() to the
cleaner, which also adds extra contention on the spinlock that protects
the delayed iputs list.
Reported-by: Ivan Shapovalov <intelfx(a)intelfx.name>
Tested-by: Ivan Shapovalov <intelfx(a)intelfx.name>
Link: https://lore.kernel.org/linux-btrfs/0414d690ac5680d0d77dfc930606cdc36e42e12…
CC: stable(a)vger.kernel.org # 6.12+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 8c6b85ffd18f..7f46abbd6311 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -1256,7 +1256,7 @@ static long btrfs_scan_root(struct btrfs_root *root, struct btrfs_em_shrink_ctx
min_ino = btrfs_ino(inode) + 1;
fs_info->em_shrinker_last_ino = btrfs_ino(inode);
- btrfs_add_delayed_iput(inode);
+ iput(&inode->vfs_inode);
if (ctx->scanned >= ctx->nr_to_scan || btrfs_fs_closing(fs_info))
break;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x fb8179ce2996bffaa36a04e2b6262843b01b7565
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030439-announcer-lard-995b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb8179ce2996bffaa36a04e2b6262843b01b7565 Mon Sep 17 00:00:00 2001
From: Rob Herring <robh(a)kernel.org>
Date: Mon, 4 Nov 2024 13:03:13 -0600
Subject: [PATCH] riscv: cacheinfo: Use of_property_present() for non-boolean
properties
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The use of of_property_read_bool() for non-boolean properties is
deprecated in favor of of_property_present() when testing for property
presence.
Signed-off-by: Rob Herring (Arm) <robh(a)kernel.org>
Reviewed-by: Clément Léger <cleger(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code")
Link: https://lore.kernel.org/r/20241104190314.270095-1-robh@kernel.org
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
index 2d40736fc37c..26b085dbdd07 100644
--- a/arch/riscv/kernel/cacheinfo.c
+++ b/arch/riscv/kernel/cacheinfo.c
@@ -108,11 +108,11 @@ int populate_cache_leaves(unsigned int cpu)
if (!np)
return -ENOENT;
- if (of_property_read_bool(np, "cache-size"))
+ if (of_property_present(np, "cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_UNIFIED, level);
- if (of_property_read_bool(np, "i-cache-size"))
+ if (of_property_present(np, "i-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level);
- if (of_property_read_bool(np, "d-cache-size"))
+ if (of_property_present(np, "d-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level);
prev = np;
@@ -125,11 +125,11 @@ int populate_cache_leaves(unsigned int cpu)
break;
if (level <= levels)
break;
- if (of_property_read_bool(np, "cache-size"))
+ if (of_property_present(np, "cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_UNIFIED, level);
- if (of_property_read_bool(np, "i-cache-size"))
+ if (of_property_present(np, "i-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level);
- if (of_property_read_bool(np, "d-cache-size"))
+ if (of_property_present(np, "d-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level);
levels = level;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 564fc8eb6f78e01292ff10801f318feae6153fdd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030452-rural-foe-f342@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 564fc8eb6f78e01292ff10801f318feae6153fdd Mon Sep 17 00:00:00 2001
From: Yong-Xuan Wang <yongxuan.wang(a)sifive.com>
Date: Fri, 20 Dec 2024 16:39:24 +0800
Subject: [PATCH] riscv: signal: fix signal_minsigstksz
The init_rt_signal_env() funciton is called before the alternative patch
is applied, so using the alternative-related API to check the availability
of an extension within this function doesn't have the intended effect.
This patch reorders the init_rt_signal_env() and apply_boot_alternatives()
to get the correct signal_minsigstksz.
Fixes: e92f469b0771 ("riscv: signal: Report signal frame size to userspace via auxv")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang(a)sifive.com>
Reviewed-by: Zong Li <zong.li(a)sifive.com>
Reviewed-by: Andy Chiu <andybnac(a)gmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20241220083926.19453-3-yongxuan.wang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index f1793630fc51..4fe45daa6281 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -322,8 +322,8 @@ void __init setup_arch(char **cmdline_p)
riscv_init_cbo_blocksizes();
riscv_fill_hwcap();
- init_rt_signal_env();
apply_boot_alternatives();
+ init_rt_signal_env();
if (IS_ENABLED(CONFIG_RISCV_ISA_ZICBOM) &&
riscv_isa_extension_available(NULL, ZICBOM))