The patch below does not apply to the 6.17-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id, to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.17.y
git checkout FETCH_HEAD
git cherry-pick -x a2fff99f92dae9c0eaf0d75de3def70ec68dad92
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to 'stable@vger.kernel.org' --in-reply-to '2025112136-panama-nape-342b@gregkh' --subject-prefix 'PATCH 6.17.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2fff99f92dae9c0eaf0d75de3def70ec68dad92 Mon Sep 17 00:00:00 2001
From: Pasha Tatashin <pasha.tatashin@soleen.com>
Date: Mon, 20 Oct 2025 20:08:51 -0400
Subject: [PATCH] kho: increase metadata bitmap size to PAGE_SIZE
KHO memory preservation metadata is kept in 512-byte chunks, which requires allocating them from the slab allocator. Slabs are not safe to use with KHO because of kfence, and because partial slabs may leak unrelated data into the next kernel. Change the chunk size to PAGE_SIZE.
kfence specifically may cause memory corruption: it randomly hands out slab objects that can lie within the scratch area. This happens because kfence allocates its object pool before the KHO scratch area is marked as a CMA region.
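This is also why the diff below adds a kho_scratch_overlap() check on the freshly allocated metadata page. That helper's implementation is not part of this patch; conceptually it is an interval test against the scratch regions, along these lines (a sketch only, assuming the kho_scratch[] array and kho_scratch_cnt counter kept by kexec_handover.c):

	/* Sketch: does [phys, phys + size) intersect any KHO scratch region? */
	static bool kho_scratch_overlap(phys_addr_t phys, size_t size)
	{
		phys_addr_t end = phys + size;
		unsigned int i;

		for (i = 0; i < kho_scratch_cnt; i++) {
			phys_addr_t s_start = kho_scratch[i].addr;
			phys_addr_t s_end = s_start + kho_scratch[i].size;

			if (phys < s_end && end > s_start)
				return true;
		}

		return false;
	}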
While this change could increase metadata overhead on systems with sparsely preserved memory, this is mitigated by ongoing work to reduce sparseness during preservation via 1G guest pages. Furthermore, this change aligns with future work on a stateless KHO, which will also use page-sized bitmaps for its radix tree metadata.
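For reference, the coverage figures quoted in the updated comment below follow directly from the bitmap size; a quick back-of-the-envelope check, assuming PAGE_SIZE = 4096 (so a 1 GiB page is order 18):

	PRESERVE_BITS = PAGE_SIZE * 8 = 32768 bits per bitmap

	order 0  (4 KiB pages): 32768 bits * 4 KiB = 128 MiB tracked per bitmap;
	                        16 GiB / 128 MiB = 128 bitmaps * 4 KiB = 512 KiB
	order 18 (1 GiB pages): 8 TiB / 1 GiB = 8192 bits, i.e. a quarter of a
	                        single 32768-bit bitmap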
Link: https://lkml.kernel.org/r/20251021000852.2924827-3-pasha.tatashin@soleen.com
Fixes: fc33e4b44b27 ("kexec: enable KHO support for memory preservation")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Matlack <dmatlack@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
index 0bc9001e532a..9217d2fdd2d3 100644
--- a/kernel/kexec_handover.c
+++ b/kernel/kexec_handover.c
@@ -69,10 +69,10 @@ early_param("kho", kho_parse_enable);
  * Keep track of memory that is to be preserved across KHO.
  *
  * The serializing side uses two levels of xarrays to manage chunks of per-order
- * 512 byte bitmaps. For instance if PAGE_SIZE = 4096, the entire 1G order of a
- * 1TB system would fit inside a single 512 byte bitmap. For order 0 allocations
- * each bitmap will cover 16M of address space. Thus, for 16G of memory at most
- * 512K of bitmap memory will be needed for order 0.
+ * PAGE_SIZE byte bitmaps. For instance if PAGE_SIZE = 4096, the entire 1G order
+ * of a 8TB system would fit inside a single 4096 byte bitmap. For order 0
+ * allocations each bitmap will cover 128M of address space. Thus, for 16G of
+ * memory at most 512K of bitmap memory will be needed for order 0.
  *
  * This approach is fully incremental, as the serialization progresses folios
  * can continue be aggregated to the tracker. The final step, immediately prior
@@ -80,12 +80,14 @@ early_param("kho", kho_parse_enable);
  * successor kernel to parse.
  */

-#define PRESERVE_BITS (512 * 8)
+#define PRESERVE_BITS (PAGE_SIZE * 8)

 struct kho_mem_phys_bits {
 	DECLARE_BITMAP(preserve, PRESERVE_BITS);
 };

+static_assert(sizeof(struct kho_mem_phys_bits) == PAGE_SIZE);
+
 struct kho_mem_phys {
 	/*
 	 * Points to kho_mem_phys_bits, a sparse bitmap array. Each bit is sized
@@ -133,19 +135,19 @@ static struct kho_out kho_out = {
 	.finalized = false,
 };

-static void *xa_load_or_alloc(struct xarray *xa, unsigned long index, size_t sz)
+static void *xa_load_or_alloc(struct xarray *xa, unsigned long index)
 {
 	void *res = xa_load(xa, index);

 	if (res)
 		return res;

-	void *elm __free(kfree) = kzalloc(sz, GFP_KERNEL);
+	void *elm __free(kfree) = kzalloc(PAGE_SIZE, GFP_KERNEL);

 	if (!elm)
 		return ERR_PTR(-ENOMEM);

-	if (WARN_ON(kho_scratch_overlap(virt_to_phys(elm), sz)))
+	if (WARN_ON(kho_scratch_overlap(virt_to_phys(elm), PAGE_SIZE)))
 		return ERR_PTR(-EINVAL);

 	res = xa_cmpxchg(xa, index, NULL, elm, GFP_KERNEL);
@@ -218,8 +220,7 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,
 		}
 	}

-	bits = xa_load_or_alloc(&physxa->phys_bits, pfn_high / PRESERVE_BITS,
-				sizeof(*bits));
+	bits = xa_load_or_alloc(&physxa->phys_bits, pfn_high / PRESERVE_BITS);
 	if (IS_ERR(bits))
 		return PTR_ERR(bits);
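Pieced together from the hunks above, the post-patch helper reduces to the following lookup-or-allocate pattern (the tail after xa_cmpxchg() is not shown in the diff; it is sketched here with the usual install/race handling and should be treated as an approximation):

	static void *xa_load_or_alloc(struct xarray *xa, unsigned long index)
	{
		void *res = xa_load(xa, index);

		if (res)
			return res;

		/* Page-sized: no partial-slab sharing with unrelated objects. */
		void *elm __free(kfree) = kzalloc(PAGE_SIZE, GFP_KERNEL);

		if (!elm)
			return ERR_PTR(-ENOMEM);

		/* Refuse metadata that kfence placed inside the scratch area. */
		if (WARN_ON(kho_scratch_overlap(virt_to_phys(elm), PAGE_SIZE)))
			return ERR_PTR(-EINVAL);

		res = xa_cmpxchg(xa, index, NULL, elm, GFP_KERNEL);
		if (xa_is_err(res))
			return ERR_PTR(xa_err(res));
		if (res)
			return res;		/* lost the race; use the winner's page */

		return no_free_ptr(elm);	/* installed; disarm the __free() cleanup */
	}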
From: "Mike Rapoport (Microsoft)" rppt@kernel.org
[ Upstream commit 469661d0d3a55a7ba1e7cb847c26baf78cace086 ]
Patch series "kho: add support for preserving vmalloc allocations", v5.
Following the discussion about preservation of memfd with LUO [1], these patches add support for preserving vmalloc allocations.
Any KHO use case presumes that there is a data structure listing the physical addresses of preserved folios (and potentially some additional metadata). Allowing vmalloc preservation with KHO enables scalable preservation of such data structures.
For instance, instead of allocating the array describing preserved folios in the fdt, memfd preservation can use vmalloc:
	preserved_folios = vmalloc_array(nr_folios, sizeof(*preserved_folios));
	memfd_luo_preserve_folios(preserved_folios, folios, nr_folios);
	kho_preserve_vmalloc(preserved_folios, &folios_info);
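The folios_info handle filled in by the last call is a small fixed-size descriptor, so only it, rather than the full array, has to be recorded in the fdt. The series pairs this with a restore step in the successor kernel; a hypothetical sketch of that direction (the exact function name and signature are assumptions here, not quoted from the series):

	preserved_folios = kho_restore_vmalloc(&folios_info);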
This patch (of 4):
Instead of checking if kho is finalized in each caller of __kho_preserve_order(), do it in the core function itself.
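Schematically, the -EBUSY check moves from the exported wrappers into the core helper, so any new preservation entry point gets it for free (a condensed sketch of the change, not the full functions):

	/* before: each exported wrapper checked on its own */
	int kho_preserve_folio(struct folio *folio)
	{
		...
		if (kho_out.finalized)
			return -EBUSY;

		return __kho_preserve_order(track, pfn, order);
	}

	/* after: the core helper checks once, for every caller */
	static int __kho_preserve_order(struct kho_mem_track *track,
					unsigned long pfn, unsigned int order)
	{
		might_sleep();

		if (kho_out.finalized)
			return -EBUSY;
		...
	}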
Link: https://lkml.kernel.org/r/20250921054458.4043761-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20250921054458.4043761-2-rppt@kernel.org
Link: https://lore.kernel.org/all/20250807014442.3829950-30-pasha.tatashin@soleen.... [1]
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: a2fff99f92da ("kho: increase metadata bitmap size to PAGE_SIZE")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/kexec_handover.c | 55 +++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 29 deletions(-)
diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
index c58afd23a241f..4e5774a6f0738 100644
--- a/kernel/kexec_handover.c
+++ b/kernel/kexec_handover.c
@@ -91,6 +91,29 @@ struct kho_serialization {
 	struct khoser_mem_chunk *preserved_mem_map;
 };

+struct kho_out {
+	struct blocking_notifier_head chain_head;
+
+	struct dentry *dir;
+
+	struct mutex lock; /* protects KHO FDT finalization */
+
+	struct kho_serialization ser;
+	bool finalized;
+};
+
+static struct kho_out kho_out = {
+	.chain_head = BLOCKING_NOTIFIER_INIT(kho_out.chain_head),
+	.lock = __MUTEX_INITIALIZER(kho_out.lock),
+	.ser = {
+		.fdt_list = LIST_HEAD_INIT(kho_out.ser.fdt_list),
+		.track = {
+			.orders = XARRAY_INIT(kho_out.ser.track.orders, 0),
+		},
+	},
+	.finalized = false,
+};
+
 static void *xa_load_or_alloc(struct xarray *xa, unsigned long index, size_t sz)
 {
 	void *elm, *res;
@@ -149,6 +172,9 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,

 	might_sleep();

+	if (kho_out.finalized)
+		return -EBUSY;
+
 	physxa = xa_load(&track->orders, order);
 	if (!physxa) {
 		int err;
@@ -631,29 +657,6 @@ int kho_add_subtree(struct kho_serialization *ser, const char *name, void *fdt)
 }
 EXPORT_SYMBOL_GPL(kho_add_subtree);

-struct kho_out {
-	struct blocking_notifier_head chain_head;
-
-	struct dentry *dir;
-
-	struct mutex lock; /* protects KHO FDT finalization */
-
-	struct kho_serialization ser;
-	bool finalized;
-};
-
-static struct kho_out kho_out = {
-	.chain_head = BLOCKING_NOTIFIER_INIT(kho_out.chain_head),
-	.lock = __MUTEX_INITIALIZER(kho_out.lock),
-	.ser = {
-		.fdt_list = LIST_HEAD_INIT(kho_out.ser.fdt_list),
-		.track = {
-			.orders = XARRAY_INIT(kho_out.ser.track.orders, 0),
-		},
-	},
-	.finalized = false,
-};
-
 int register_kho_notifier(struct notifier_block *nb)
 {
 	return blocking_notifier_chain_register(&kho_out.chain_head, nb);
@@ -681,9 +684,6 @@ int kho_preserve_folio(struct folio *folio)
 	const unsigned int order = folio_order(folio);
 	struct kho_mem_track *track = &kho_out.ser.track;

-	if (kho_out.finalized)
-		return -EBUSY;
-
 	return __kho_preserve_order(track, pfn, order);
 }
 EXPORT_SYMBOL_GPL(kho_preserve_folio);
@@ -707,9 +707,6 @@ int kho_preserve_phys(phys_addr_t phys, size_t size)
 	int err = 0;
 	struct kho_mem_track *track = &kho_out.ser.track;

-	if (kho_out.finalized)
-		return -EBUSY;
-
 	if (!PAGE_ALIGNED(phys) || !PAGE_ALIGNED(size))
 		return -EINVAL;
From: Pasha Tatashin <pasha.tatashin@soleen.com>
[ Upstream commit a2fff99f92dae9c0eaf0d75de3def70ec68dad92 ]
KHO memory preservation metadata is kept in 512-byte chunks, which requires allocating them from the slab allocator. Slabs are not safe to use with KHO because of kfence, and because partial slabs may leak unrelated data into the next kernel. Change the chunk size to PAGE_SIZE.
kfence specifically may cause memory corruption: it randomly hands out slab objects that can lie within the scratch area. This happens because kfence allocates its object pool before the KHO scratch area is marked as a CMA region.
While this change could increase metadata overhead on systems with sparsely preserved memory, this is mitigated by ongoing work to reduce sparseness during preservation via 1G guest pages. Furthermore, this change aligns with future work on a stateless KHO, which will also use page-sized bitmaps for its radix tree metadata.
Link: https://lkml.kernel.org/r/20251021000852.2924827-3-pasha.tatashin@soleen.com
Fixes: fc33e4b44b27 ("kexec: enable KHO support for memory preservation")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Matlack <dmatlack@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/kexec_handover.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
index 492e40b6b8023..040cfeb1d3fab 100644
--- a/kernel/kexec_handover.c
+++ b/kernel/kexec_handover.c
@@ -52,10 +52,10 @@ early_param("kho", kho_parse_enable);
  * Keep track of memory that is to be preserved across KHO.
  *
  * The serializing side uses two levels of xarrays to manage chunks of per-order
- * 512 byte bitmaps. For instance if PAGE_SIZE = 4096, the entire 1G order of a
- * 1TB system would fit inside a single 512 byte bitmap. For order 0 allocations
- * each bitmap will cover 16M of address space. Thus, for 16G of memory at most
- * 512K of bitmap memory will be needed for order 0.
+ * PAGE_SIZE byte bitmaps. For instance if PAGE_SIZE = 4096, the entire 1G order
+ * of a 8TB system would fit inside a single 4096 byte bitmap. For order 0
+ * allocations each bitmap will cover 128M of address space. Thus, for 16G of
+ * memory at most 512K of bitmap memory will be needed for order 0.
  *
  * This approach is fully incremental, as the serialization progresses folios
  * can continue be aggregated to the tracker. The final step, immediately prior
@@ -63,12 +63,14 @@ early_param("kho", kho_parse_enable);
  * successor kernel to parse.
  */

-#define PRESERVE_BITS (512 * 8)
+#define PRESERVE_BITS (PAGE_SIZE * 8)

 struct kho_mem_phys_bits {
 	DECLARE_BITMAP(preserve, PRESERVE_BITS);
 };

+static_assert(sizeof(struct kho_mem_phys_bits) == PAGE_SIZE);
+
 struct kho_mem_phys {
 	/*
 	 * Points to kho_mem_phys_bits, a sparse bitmap array. Each bit is sized
@@ -116,19 +118,19 @@ static struct kho_out kho_out = {
 	.finalized = false,
 };

-static void *xa_load_or_alloc(struct xarray *xa, unsigned long index, size_t sz)
+static void *xa_load_or_alloc(struct xarray *xa, unsigned long index)
 {
 	void *res = xa_load(xa, index);

 	if (res)
 		return res;

-	void *elm __free(kfree) = kzalloc(sz, GFP_KERNEL);
+	void *elm __free(kfree) = kzalloc(PAGE_SIZE, GFP_KERNEL);

 	if (!elm)
 		return ERR_PTR(-ENOMEM);

-	if (WARN_ON(kho_scratch_overlap(virt_to_phys(elm), sz)))
+	if (WARN_ON(kho_scratch_overlap(virt_to_phys(elm), PAGE_SIZE)))
 		return ERR_PTR(-EINVAL);

 	res = xa_cmpxchg(xa, index, NULL, elm, GFP_KERNEL);
@@ -201,8 +203,7 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,
 		}
 	}

-	bits = xa_load_or_alloc(&physxa->phys_bits, pfn_high / PRESERVE_BITS,
-				sizeof(*bits));
+	bits = xa_load_or_alloc(&physxa->phys_bits, pfn_high / PRESERVE_BITS);
 	if (IS_ERR(bits))
 		return PTR_ERR(bits);