Running as a Xen PV guest uncovered some bugs when ITS mitigation is active.
Juergen Gross (3): x86/execmem: don't use PAGE_KERNEL protection for code pages x86/mm/pat: don't collapse pages without PSE set x86/alternative: make kernel ITS thunks read-only
arch/x86/kernel/alternative.c | 16 ++++++++++++++++ arch/x86/mm/init.c | 2 +- arch/x86/mm/pat/set_memory.c | 3 +++ 3 files changed, 20 insertions(+), 1 deletion(-)
In case X86_FEATURE_PSE isn't available (e.g. when running as a Xen PV guest), execmem_arch_setup() will fall back to use PAGE_KERNEL protection for the EXECMEM_MODULE_TEXT range.
This will result in attempts to execute code with the NX bit set in case of ITS mitigation being applied.
Avoid this problem by using PAGE_KERNEL_EXEC protection instead, which will not set the NX bit.
Cc: stable@vger.kernel.org Reported-by: Xin Li xin@zytor.com Fixes: 5185e7f9f3bd ("x86/module: enable ROX caches for module text on 64 bit") Signed-off-by: Juergen Gross jgross@suse.com --- arch/x86/mm/init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 7456df985d96..f5012ae31d8b 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -1089,7 +1089,7 @@ struct execmem_info __init *execmem_arch_setup(void) pgprot = PAGE_KERNEL_ROX; flags = EXECMEM_KASAN_SHADOW | EXECMEM_ROX_CACHE; } else { - pgprot = PAGE_KERNEL; + pgprot = PAGE_KERNEL_EXEC; flags = EXECMEM_KASAN_SHADOW; }
On Wed, May 28, 2025 at 02:35:55PM +0200, Juergen Gross wrote:
In case X86_FEATURE_PSE isn't available (e.g. when running as a Xen PV guest), execmem_arch_setup() will fall back to use PAGE_KERNEL protection for the EXECMEM_MODULE_TEXT range.
This will result in attempts to execute code with the NX bit set in case of ITS mitigation being applied.
Avoid this problem by using PAGE_KERNEL_EXEC protection instead, which will not set the NX bit.
Cc: stable@vger.kernel.org Reported-by: Xin Li xin@zytor.com Fixes: 5185e7f9f3bd ("x86/module: enable ROX caches for module text on 64 bit") Signed-off-by: Juergen Gross jgross@suse.com
arch/x86/mm/init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 7456df985d96..f5012ae31d8b 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -1089,7 +1089,7 @@ struct execmem_info __init *execmem_arch_setup(void) pgprot = PAGE_KERNEL_ROX; flags = EXECMEM_KASAN_SHADOW | EXECMEM_ROX_CACHE; } else {
pgprot = PAGE_KERNEL;
pgprot = PAGE_KERNEL_EXEC;
Please don't. Everything except ITS can work with PAGE_KENREL so the fix should be on ITS side.
flags = EXECMEM_KASAN_SHADOW;
} -- 2.43.0
On 28.05.25 19:27, Mike Rapoport wrote:
On Wed, May 28, 2025 at 02:35:55PM +0200, Juergen Gross wrote:
In case X86_FEATURE_PSE isn't available (e.g. when running as a Xen PV guest), execmem_arch_setup() will fall back to use PAGE_KERNEL protection for the EXECMEM_MODULE_TEXT range.
This will result in attempts to execute code with the NX bit set in case of ITS mitigation being applied.
Avoid this problem by using PAGE_KERNEL_EXEC protection instead, which will not set the NX bit.
Cc: stable@vger.kernel.org Reported-by: Xin Li xin@zytor.com Fixes: 5185e7f9f3bd ("x86/module: enable ROX caches for module text on 64 bit") Signed-off-by: Juergen Gross jgross@suse.com
arch/x86/mm/init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 7456df985d96..f5012ae31d8b 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -1089,7 +1089,7 @@ struct execmem_info __init *execmem_arch_setup(void) pgprot = PAGE_KERNEL_ROX; flags = EXECMEM_KASAN_SHADOW | EXECMEM_ROX_CACHE; } else {
pgprot = PAGE_KERNEL;
pgprot = PAGE_KERNEL_EXEC;
Please don't. Everything except ITS can work with PAGE_KENREL so the fix should be on ITS side.
Hmm, maybe adding another element to execmem_info[] with the new type EXECMEM_KERNEL_TEXT, specifying PAGE_KERNEL_EXEC if !PSE?
Juergen
On Wed, May 28, 2025 at 08:27:19PM +0300, Mike Rapoport wrote:
On Wed, May 28, 2025 at 02:35:55PM +0200, Juergen Gross wrote:
In case X86_FEATURE_PSE isn't available (e.g. when running as a Xen PV guest), execmem_arch_setup() will fall back to use PAGE_KERNEL protection for the EXECMEM_MODULE_TEXT range.
This will result in attempts to execute code with the NX bit set in case of ITS mitigation being applied.
Avoid this problem by using PAGE_KERNEL_EXEC protection instead, which will not set the NX bit.
Cc: stable@vger.kernel.org Reported-by: Xin Li xin@zytor.com Fixes: 5185e7f9f3bd ("x86/module: enable ROX caches for module text on 64 bit") Signed-off-by: Juergen Gross jgross@suse.com
arch/x86/mm/init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 7456df985d96..f5012ae31d8b 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -1089,7 +1089,7 @@ struct execmem_info __init *execmem_arch_setup(void) pgprot = PAGE_KERNEL_ROX; flags = EXECMEM_KASAN_SHADOW | EXECMEM_ROX_CACHE; } else {
pgprot = PAGE_KERNEL;
pgprot = PAGE_KERNEL_EXEC;
Please don't. Everything except ITS can work with PAGE_KENREL so the fix should be on ITS side.
Well, this is early vs post make_ro again.
Does something like so work for you?
--- diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 7456df985d96..f5012ae31d8b 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -1089,7 +1089,7 @@ struct execmem_info __init *execmem_arch_setup(void) pgprot = PAGE_KERNEL_ROX; flags = EXECMEM_KASAN_SHADOW | EXECMEM_ROX_CACHE; } else { - pgprot = PAGE_KERNEL; + pgprot = PAGE_KERNEL_EXEC; flags = EXECMEM_KASAN_SHADOW; }
diff --git a/mm/execmem.c b/mm/execmem.c index 6f7a2653b280..dbe2eedea0e6 100644 --- a/mm/execmem.c +++ b/mm/execmem.c @@ -258,6 +258,7 @@ static bool execmem_cache_rox = false;
void execmem_cache_make_ro(void) { + struct execmem_range *module_text = &execmem_info->ranges[EXECMEM_MODULE_TEXT]; struct maple_tree *free_areas = &execmem_cache.free_areas; struct maple_tree *busy_areas = &execmem_cache.busy_areas; MA_STATE(mas_free, free_areas, 0, ULONG_MAX); @@ -269,6 +270,9 @@ void execmem_cache_make_ro(void)
mutex_lock(mutex);
+ if (!(module_text->flags & EXECMEM_ROX_CACHE)) + module_text->pgprot = PAGE_KERNEL; + mas_for_each(&mas_free, area, ULONG_MAX) { unsigned long pages = mas_range_len(&mas_free) >> PAGE_SHIFT; set_memory_ro(mas_free.index, pages);
Collapsing pages to a leaf PMD or PUD should be done only if X86_FEATURE_PSE is available, which is not the case when running e.g. as a Xen PV guest.
Cc: stable@vger.kernel.org Fixes: 41d88484c71c ("x86/mm/pat: restore large ROX pages after fragmentation") Signed-off-by: Juergen Gross jgross@suse.com --- arch/x86/mm/pat/set_memory.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 30ab4aced761..e6a6ac8cdc1d 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -1257,6 +1257,9 @@ static int collapse_pmd_page(pmd_t *pmd, unsigned long addr, pgprot_t pgprot; int i = 0;
+ if (!cpu_feature_enabled(X86_FEATURE_PSE)) + return 0; + addr &= PMD_MASK; pte = pte_offset_kernel(pmd, addr); first = *pte;
When allocating memory pages for kernel ITS thunks, make them read-only after having written the last thunk.
This will be needed when X86_FEATURE_PSE isn't available, as the thunk memory will have PAGE_KERNEL_EXEC protection, which is including the write permission.
Cc: stable@vger.kernel.org Fixes: 5185e7f9f3bd ("x86/module: enable ROX caches for module text on 64 bit") Signed-off-by: Juergen Gross jgross@suse.com --- arch/x86/kernel/alternative.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index ecfe7b497cad..bd974a0ac88a 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -217,6 +217,15 @@ static void *its_alloc(void) return no_free_ptr(page); }
+static void its_set_kernel_ro(void *addr) +{ +#ifdef CONFIG_MODULES + if (its_mod) + return; +#endif + execmem_restore_rox(addr, PAGE_SIZE); +} + static void *its_allocate_thunk(int reg) { int size = 3 + (reg / 8); @@ -234,6 +243,8 @@ static void *its_allocate_thunk(int reg) #endif
if (!its_page || (its_offset + size - 1) >= PAGE_SIZE) { + if (its_page) + its_set_kernel_ro(its_page); its_page = its_alloc(); if (!its_page) { pr_err("ITS page allocation failed\n"); @@ -2338,6 +2349,11 @@ void __init alternative_instructions(void) apply_retpolines(__retpoline_sites, __retpoline_sites_end); apply_returns(__return_sites, __return_sites_end);
+ /* Make potential last thunk page read-only. */ + if (its_page) + its_set_kernel_ro(its_page); + its_page = NULL; + /* * Adjust all CALL instructions to point to func()-10, including * those in .altinstr_replacement.
On Wed, May 28, 2025 at 02:35:57PM +0200, Juergen Gross wrote:
When allocating memory pages for kernel ITS thunks, make them read-only after having written the last thunk.
This will be needed when X86_FEATURE_PSE isn't available, as the thunk memory will have PAGE_KERNEL_EXEC protection, which is including the write permission.
Cc: stable@vger.kernel.org Fixes: 5185e7f9f3bd ("x86/module: enable ROX caches for module text on 64 bit") Signed-off-by: Juergen Gross jgross@suse.com
arch/x86/kernel/alternative.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index ecfe7b497cad..bd974a0ac88a 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -217,6 +217,15 @@ static void *its_alloc(void) return no_free_ptr(page); } +static void its_set_kernel_ro(void *addr) +{ +#ifdef CONFIG_MODULES
- if (its_mod)
return;
+#endif
- execmem_restore_rox(addr, PAGE_SIZE);
+}
static void *its_allocate_thunk(int reg) { int size = 3 + (reg / 8); @@ -234,6 +243,8 @@ static void *its_allocate_thunk(int reg) #endif if (!its_page || (its_offset + size - 1) >= PAGE_SIZE) {
if (its_page)
its_page = its_alloc(); if (!its_page) { pr_err("ITS page allocation failed\n");its_set_kernel_ro(its_page);
@@ -2338,6 +2349,11 @@ void __init alternative_instructions(void) apply_retpolines(__retpoline_sites, __retpoline_sites_end); apply_returns(__return_sites, __return_sites_end);
- /* Make potential last thunk page read-only. */
- if (its_page)
its_set_kernel_ro(its_page);
- its_page = NULL;
- /*
- Adjust all CALL instructions to point to func()-10, including
- those in .altinstr_replacement.
No, this is all sorts of wrong. Execmem API should ensure this.
On 28.05.25 15:10, Peter Zijlstra wrote:
On Wed, May 28, 2025 at 02:35:57PM +0200, Juergen Gross wrote:
When allocating memory pages for kernel ITS thunks, make them read-only after having written the last thunk.
This will be needed when X86_FEATURE_PSE isn't available, as the thunk memory will have PAGE_KERNEL_EXEC protection, which is including the write permission.
Cc: stable@vger.kernel.org Fixes: 5185e7f9f3bd ("x86/module: enable ROX caches for module text on 64 bit") Signed-off-by: Juergen Gross jgross@suse.com
arch/x86/kernel/alternative.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index ecfe7b497cad..bd974a0ac88a 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -217,6 +217,15 @@ static void *its_alloc(void) return no_free_ptr(page); } +static void its_set_kernel_ro(void *addr) +{ +#ifdef CONFIG_MODULES
- if (its_mod)
return;
+#endif
- execmem_restore_rox(addr, PAGE_SIZE);
+}
- static void *its_allocate_thunk(int reg) { int size = 3 + (reg / 8);
@@ -234,6 +243,8 @@ static void *its_allocate_thunk(int reg) #endif if (!its_page || (its_offset + size - 1) >= PAGE_SIZE) {
if (its_page)
its_page = its_alloc(); if (!its_page) { pr_err("ITS page allocation failed\n");its_set_kernel_ro(its_page);
@@ -2338,6 +2349,11 @@ void __init alternative_instructions(void) apply_retpolines(__retpoline_sites, __retpoline_sites_end); apply_returns(__return_sites, __return_sites_end);
- /* Make potential last thunk page read-only. */
- if (its_page)
its_set_kernel_ro(its_page);
- its_page = NULL;
- /*
- Adjust all CALL instructions to point to func()-10, including
- those in .altinstr_replacement.
No, this is all sorts of wrong. Execmem API should ensure this.
You are aware that this patch is basically mirroring the work which is already done for modules in alternative.c?
Juergen
On Wed, May 28, 2025 at 03:19:24PM +0200, Jürgen Groß wrote:
On 28.05.25 15:10, Peter Zijlstra wrote:
On Wed, May 28, 2025 at 02:35:57PM +0200, Juergen Gross wrote:
When allocating memory pages for kernel ITS thunks, make them read-only after having written the last thunk.
This will be needed when X86_FEATURE_PSE isn't available, as the thunk memory will have PAGE_KERNEL_EXEC protection, which is including the write permission.
Cc: stable@vger.kernel.org Fixes: 5185e7f9f3bd ("x86/module: enable ROX caches for module text on 64 bit") Signed-off-by: Juergen Gross jgross@suse.com
arch/x86/kernel/alternative.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index ecfe7b497cad..bd974a0ac88a 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -217,6 +217,15 @@ static void *its_alloc(void) return no_free_ptr(page); } +static void its_set_kernel_ro(void *addr) +{ +#ifdef CONFIG_MODULES
- if (its_mod)
return;
+#endif
- execmem_restore_rox(addr, PAGE_SIZE);
+}
- static void *its_allocate_thunk(int reg) { int size = 3 + (reg / 8);
@@ -234,6 +243,8 @@ static void *its_allocate_thunk(int reg) #endif if (!its_page || (its_offset + size - 1) >= PAGE_SIZE) {
if (its_page)
its_page = its_alloc(); if (!its_page) { pr_err("ITS page allocation failed\n");its_set_kernel_ro(its_page);
@@ -2338,6 +2349,11 @@ void __init alternative_instructions(void) apply_retpolines(__retpoline_sites, __retpoline_sites_end); apply_returns(__return_sites, __return_sites_end);
- /* Make potential last thunk page read-only. */
- if (its_page)
its_set_kernel_ro(its_page);
- its_page = NULL;
- /*
- Adjust all CALL instructions to point to func()-10, including
- those in .altinstr_replacement.
No, this is all sorts of wrong. Execmem API should ensure this.
You are aware that this patch is basically mirroring the work which is already done for modules in alternative.c?
I am having trouble parsing that -- where does alternative.c do this to modules?
On 28.05.25 15:22, Peter Zijlstra wrote:
On Wed, May 28, 2025 at 03:19:24PM +0200, Jürgen Groß wrote:
On 28.05.25 15:10, Peter Zijlstra wrote:
On Wed, May 28, 2025 at 02:35:57PM +0200, Juergen Gross wrote:
When allocating memory pages for kernel ITS thunks, make them read-only after having written the last thunk.
This will be needed when X86_FEATURE_PSE isn't available, as the thunk memory will have PAGE_KERNEL_EXEC protection, which is including the write permission.
Cc: stable@vger.kernel.org Fixes: 5185e7f9f3bd ("x86/module: enable ROX caches for module text on 64 bit") Signed-off-by: Juergen Gross jgross@suse.com
arch/x86/kernel/alternative.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index ecfe7b497cad..bd974a0ac88a 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -217,6 +217,15 @@ static void *its_alloc(void) return no_free_ptr(page); } +static void its_set_kernel_ro(void *addr) +{ +#ifdef CONFIG_MODULES
- if (its_mod)
return;
+#endif
- execmem_restore_rox(addr, PAGE_SIZE);
+}
- static void *its_allocate_thunk(int reg) { int size = 3 + (reg / 8);
@@ -234,6 +243,8 @@ static void *its_allocate_thunk(int reg) #endif if (!its_page || (its_offset + size - 1) >= PAGE_SIZE) {
if (its_page)
its_set_kernel_ro(its_page); its_page = its_alloc(); if (!its_page) { pr_err("ITS page allocation failed\n");
@@ -2338,6 +2349,11 @@ void __init alternative_instructions(void) apply_retpolines(__retpoline_sites, __retpoline_sites_end); apply_returns(__return_sites, __return_sites_end);
- /* Make potential last thunk page read-only. */
- if (its_page)
its_set_kernel_ro(its_page);
- its_page = NULL;
- /*
- Adjust all CALL instructions to point to func()-10, including
- those in .altinstr_replacement.
No, this is all sorts of wrong. Execmem API should ensure this.
You are aware that this patch is basically mirroring the work which is already done for modules in alternative.c?
I am having trouble parsing that -- where does alternative.c do this to modules?
Have a look at its_fini_mod().
Juergen
On Wed, May 28, 2025 at 03:30:33PM +0200, Jürgen Groß wrote:
Have a look at its_fini_mod().
Oh, that's what you mean. But this still isn't very nice, you now have restore_rox() without make_temp_rw(), which was the intended usage pattern.
Bah, I hate how execmem works different for !PSE, Mike, you see a sane way to fix this?
Anyway, if we have to do something like this, then I would prefer it shaped something like so:
--- diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index ecfe7b497cad..33d4d139cb50 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -111,9 +111,8 @@ static bool cfi_paranoid __ro_after_init;
#ifdef CONFIG_MITIGATION_ITS
-#ifdef CONFIG_MODULES static struct module *its_mod; -#endif +static struct its_array its_pages; static void *its_page; static unsigned int its_offset;
@@ -151,68 +150,78 @@ static void *its_init_thunk(void *thunk, int reg) return thunk + offset; }
-#ifdef CONFIG_MODULES void its_init_mod(struct module *mod) { if (!cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) return;
- mutex_lock(&text_mutex); - its_mod = mod; - its_page = NULL; + if (mod) { + mutex_lock(&text_mutex); + its_mod = mod; + its_page = NULL; + } }
void its_fini_mod(struct module *mod) { + struct its_array *pages = &its_pages; + if (!cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) return;
WARN_ON_ONCE(its_mod != mod);
- its_mod = NULL; - its_page = NULL; - mutex_unlock(&text_mutex); + if (mod) { + pages = &mod->arch.its_pages; + its_mod = NULL; + its_page = NULL; + mutex_unlock(&text_mutex); + }
- for (int i = 0; i < mod->its_num_pages; i++) { - void *page = mod->its_page_array[i]; + for (int i = 0; i < pages->num; i++) { + void *page = pages->pages[i]; execmem_restore_rox(page, PAGE_SIZE); } + + if (!mod) + kfree(pages->pages); }
void its_free_mod(struct module *mod) { + struct its_array *pages = &its_pages; + if (!cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) return;
- for (int i = 0; i < mod->its_num_pages; i++) { - void *page = mod->its_page_array[i]; + if (mod) + pages = &mod->arch.its_pages; + + for (int i = 0; i < pages->num; i++) { + void *page = pages->pages[i]; execmem_free(page); } - kfree(mod->its_page_array); + kfree(pages->pages); } -#endif /* CONFIG_MODULES */
static void *its_alloc(void) { - void *page __free(execmem) = execmem_alloc(EXECMEM_MODULE_TEXT, PAGE_SIZE); + struct its_array *pages = &its_pages; + void *tmp;
+ void *page __free(execmem) = execmem_alloc(EXECMEM_MODULE_TEXT, PAGE_SIZE); if (!page) return NULL;
-#ifdef CONFIG_MODULES - if (its_mod) { - void *tmp = krealloc(its_mod->its_page_array, - (its_mod->its_num_pages+1) * sizeof(void *), - GFP_KERNEL); - if (!tmp) - return NULL; + tmp = krealloc(pages->pages, (pages->num + 1) * sizeof(void *), GFP_KERNEL); + if (!tmp) + return NULL;
- its_mod->its_page_array = tmp; - its_mod->its_page_array[its_mod->its_num_pages++] = page; + pages->pages = tmp; + pages->pages[pages->num++] = page;
+ if (its_mod) execmem_make_temp_rw(page, PAGE_SIZE); - } -#endif /* CONFIG_MODULES */
return no_free_ptr(page); } @@ -2338,6 +2347,8 @@ void __init alternative_instructions(void) apply_retpolines(__retpoline_sites, __retpoline_sites_end); apply_returns(__return_sites, __return_sites_end);
+ its_fini_mod(NULL); + /* * Adjust all CALL instructions to point to func()-10, including * those in .altinstr_replacement.
On Wed, May 28, 2025 at 05:58:21PM +0200, Peter Zijlstra wrote:
On Wed, May 28, 2025 at 03:30:33PM +0200, Jürgen Groß wrote:
Have a look at its_fini_mod().
Oh, that's what you mean. But this still isn't very nice, you now have restore_rox() without make_temp_rw(), which was the intended usage pattern.
Bah, I hate how execmem works different for !PSE, Mike, you see a sane way to fix this?
Anyway, if we have to do something like this, then I would prefer it shaped something like so:
Missing file:
diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h index e988bac0a4a1..3c2de4ce3b10 100644 --- a/arch/x86/include/asm/module.h +++ b/arch/x86/include/asm/module.h @@ -5,12 +5,20 @@ #include <asm-generic/module.h> #include <asm/orc_types.h>
+struct its_array { +#ifdef CONFIG_MITIGATION_ITS + void **pages; + int num; +#endif +}; + struct mod_arch_specific { #ifdef CONFIG_UNWINDER_ORC unsigned int num_orcs; int *orc_unwind_ip; struct orc_entry *orc_unwind; #endif + struct its_array its_pages; };
#endif /* _ASM_X86_MODULE_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index ecfe7b497cad..33d4d139cb50 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -111,9 +111,8 @@ static bool cfi_paranoid __ro_after_init; #ifdef CONFIG_MITIGATION_ITS -#ifdef CONFIG_MODULES static struct module *its_mod; -#endif +static struct its_array its_pages; static void *its_page; static unsigned int its_offset; @@ -151,68 +150,78 @@ static void *its_init_thunk(void *thunk, int reg) return thunk + offset; } -#ifdef CONFIG_MODULES void its_init_mod(struct module *mod) { if (!cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) return;
- mutex_lock(&text_mutex);
- its_mod = mod;
- its_page = NULL;
- if (mod) {
mutex_lock(&text_mutex);
its_mod = mod;
its_page = NULL;
- }
} void its_fini_mod(struct module *mod) {
- struct its_array *pages = &its_pages;
- if (!cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) return;
WARN_ON_ONCE(its_mod != mod);
- its_mod = NULL;
- its_page = NULL;
- mutex_unlock(&text_mutex);
- if (mod) {
pages = &mod->arch.its_pages;
its_mod = NULL;
its_page = NULL;
mutex_unlock(&text_mutex);
- }
- for (int i = 0; i < mod->its_num_pages; i++) {
void *page = mod->its_page_array[i];
- for (int i = 0; i < pages->num; i++) {
execmem_restore_rox(page, PAGE_SIZE); }void *page = pages->pages[i];
- if (!mod)
kfree(pages->pages);
} void its_free_mod(struct module *mod) {
- struct its_array *pages = &its_pages;
- if (!cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) return;
- for (int i = 0; i < mod->its_num_pages; i++) {
void *page = mod->its_page_array[i];
- if (mod)
pages = &mod->arch.its_pages;
- for (int i = 0; i < pages->num; i++) {
execmem_free(page); }void *page = pages->pages[i];
- kfree(mod->its_page_array);
- kfree(pages->pages);
} -#endif /* CONFIG_MODULES */ static void *its_alloc(void) {
- void *page __free(execmem) = execmem_alloc(EXECMEM_MODULE_TEXT, PAGE_SIZE);
- struct its_array *pages = &its_pages;
- void *tmp;
- void *page __free(execmem) = execmem_alloc(EXECMEM_MODULE_TEXT, PAGE_SIZE); if (!page) return NULL;
-#ifdef CONFIG_MODULES
- if (its_mod) {
void *tmp = krealloc(its_mod->its_page_array,
(its_mod->its_num_pages+1) * sizeof(void *),
GFP_KERNEL);
if (!tmp)
return NULL;
- tmp = krealloc(pages->pages, (pages->num + 1) * sizeof(void *), GFP_KERNEL);
- if (!tmp)
return NULL;
its_mod->its_page_array = tmp;
its_mod->its_page_array[its_mod->its_num_pages++] = page;
- pages->pages = tmp;
- pages->pages[pages->num++] = page;
- if (its_mod) execmem_make_temp_rw(page, PAGE_SIZE);
- }
-#endif /* CONFIG_MODULES */ return no_free_ptr(page); } @@ -2338,6 +2347,8 @@ void __init alternative_instructions(void) apply_retpolines(__retpoline_sites, __retpoline_sites_end); apply_returns(__return_sites, __return_sites_end);
- its_fini_mod(NULL);
- /*
- Adjust all CALL instructions to point to func()-10, including
- those in .altinstr_replacement.
On Wed, May 28, 2025 at 05:58:21PM +0200, Peter Zijlstra wrote:
On Wed, May 28, 2025 at 03:30:33PM +0200, Jürgen Groß wrote:
Have a look at its_fini_mod().
Oh, that's what you mean. But this still isn't very nice, you now have restore_rox() without make_temp_rw(), which was the intended usage pattern.
Bah, I hate how execmem works different for !PSE, Mike, you see a sane way to fix this?
Not really :(
But just resetting permissions in the end like you did makes perfect sense to me. It's like STRICT_MODULE_RWX, somebody has to set the pages to ROX at some point and running execmem_restore_rox() on something that was already ROX won't cost much, set_memory will bail out early.
Anyway, if we have to do something like this, then I would prefer it shaped something like so:
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index ecfe7b497cad..33d4d139cb50 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -111,9 +111,8 @@ static bool cfi_paranoid __ro_after_init; #ifdef CONFIG_MITIGATION_ITS -#ifdef CONFIG_MODULES static struct module *its_mod; -#endif +static struct its_array its_pages; static void *its_page; static unsigned int its_offset; @@ -151,68 +150,78 @@ static void *its_init_thunk(void *thunk, int reg) return thunk + offset; } -#ifdef CONFIG_MODULES void its_init_mod(struct module *mod) { if (!cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) return;
- mutex_lock(&text_mutex);
- its_mod = mod;
- its_page = NULL;
- if (mod) {
mutex_lock(&text_mutex);
its_mod = mod;
its_page = NULL;
- }
} void its_fini_mod(struct module *mod) {
- struct its_array *pages = &its_pages;
- if (!cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) return;
WARN_ON_ONCE(its_mod != mod);
- its_mod = NULL;
- its_page = NULL;
- mutex_unlock(&text_mutex);
- if (mod) {
pages = &mod->arch.its_pages;
its_mod = NULL;
its_page = NULL;
mutex_unlock(&text_mutex);
- }
- for (int i = 0; i < mod->its_num_pages; i++) {
void *page = mod->its_page_array[i];
- for (int i = 0; i < pages->num; i++) {
execmem_restore_rox(page, PAGE_SIZE); }void *page = pages->pages[i];
- if (!mod)
kfree(pages->pages);
} void its_free_mod(struct module *mod) {
- struct its_array *pages = &its_pages;
- if (!cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) return;
- for (int i = 0; i < mod->its_num_pages; i++) {
void *page = mod->its_page_array[i];
- if (mod)
pages = &mod->arch.its_pages;
- for (int i = 0; i < pages->num; i++) {
execmem_free(page); }void *page = pages->pages[i];
- kfree(mod->its_page_array);
- kfree(pages->pages);
} -#endif /* CONFIG_MODULES */ static void *its_alloc(void) {
- void *page __free(execmem) = execmem_alloc(EXECMEM_MODULE_TEXT, PAGE_SIZE);
- struct its_array *pages = &its_pages;
- void *tmp;
- void *page __free(execmem) = execmem_alloc(EXECMEM_MODULE_TEXT, PAGE_SIZE); if (!page) return NULL;
-#ifdef CONFIG_MODULES
- if (its_mod) {
void *tmp = krealloc(its_mod->its_page_array,
(its_mod->its_num_pages+1) * sizeof(void *),
GFP_KERNEL);
if (!tmp)
return NULL;
- tmp = krealloc(pages->pages, (pages->num + 1) * sizeof(void *), GFP_KERNEL);
- if (!tmp)
return NULL;
its_mod->its_page_array = tmp;
its_mod->its_page_array[its_mod->its_num_pages++] = page;
- pages->pages = tmp;
- pages->pages[pages->num++] = page;
- if (its_mod) execmem_make_temp_rw(page, PAGE_SIZE);
- }
-#endif /* CONFIG_MODULES */ return no_free_ptr(page); } @@ -2338,6 +2347,8 @@ void __init alternative_instructions(void) apply_retpolines(__retpoline_sites, __retpoline_sites_end); apply_returns(__return_sites, __return_sites_end);
- its_fini_mod(NULL);
- /*
- Adjust all CALL instructions to point to func()-10, including
- those in .altinstr_replacement.
On Wed, May 28, 2025 at 05:58:21PM +0200, Peter Zijlstra wrote:
On Wed, May 28, 2025 at 03:30:33PM +0200, Jürgen Groß wrote:
Have a look at its_fini_mod().
Oh, that's what you mean. But this still isn't very nice, you now have restore_rox() without make_temp_rw(), which was the intended usage pattern.
Bah, I hate how execmem works different for !PSE, Mike, you see a sane way to fix this?
The least ugly thing I could think of is to replace the current pattern of
execmem_alloc() exemem_make_temp_rw() /* update */ execmem_restore_rox()
with
execmem_alloc_rw() /* update */ execmem_protect()
but I still haven't got to try it.
On Wed, May 28, 2025 at 05:58:21PM +0200, Peter Zijlstra wrote:
On Wed, May 28, 2025 at 03:30:33PM +0200, Jürgen Groß wrote:
Have a look at its_fini_mod().
Oh, that's what you mean. But this still isn't very nice, you now have restore_rox() without make_temp_rw(), which was the intended usage pattern.
Bah, I hate how execmem works different for !PSE, Mike, you see a sane way to fix this?
Anyway, if we have to do something like this, then I would prefer it shaped something like so:
I expanded this a bit and here's what I've got:
https://lore.kernel.org/lkml/20250603111446.2609381-1-rppt@kernel.org/
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index ecfe7b497cad..33d4d139cb50 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -111,9 +111,8 @@ static bool cfi_paranoid __ro_after_init; #ifdef CONFIG_MITIGATION_ITS -#ifdef CONFIG_MODULES static struct module *its_mod; -#endif +static struct its_array its_pages; static void *its_page; static unsigned int its_offset; @@ -151,68 +150,78 @@ static void *its_init_thunk(void *thunk, int reg) return thunk + offset; } -#ifdef CONFIG_MODULES void its_init_mod(struct module *mod) { if (!cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) return;
- mutex_lock(&text_mutex);
- its_mod = mod;
- its_page = NULL;
- if (mod) {
mutex_lock(&text_mutex);
its_mod = mod;
its_page = NULL;
- }
} void its_fini_mod(struct module *mod) {
- struct its_array *pages = &its_pages;
- if (!cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) return;
WARN_ON_ONCE(its_mod != mod);
- its_mod = NULL;
- its_page = NULL;
- mutex_unlock(&text_mutex);
- if (mod) {
pages = &mod->arch.its_pages;
its_mod = NULL;
its_page = NULL;
mutex_unlock(&text_mutex);
- }
- for (int i = 0; i < mod->its_num_pages; i++) {
void *page = mod->its_page_array[i];
- for (int i = 0; i < pages->num; i++) {
execmem_restore_rox(page, PAGE_SIZE); }void *page = pages->pages[i];
- if (!mod)
kfree(pages->pages);
} void its_free_mod(struct module *mod) {
- struct its_array *pages = &its_pages;
- if (!cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS)) return;
- for (int i = 0; i < mod->its_num_pages; i++) {
void *page = mod->its_page_array[i];
- if (mod)
pages = &mod->arch.its_pages;
- for (int i = 0; i < pages->num; i++) {
execmem_free(page); }void *page = pages->pages[i];
- kfree(mod->its_page_array);
- kfree(pages->pages);
} -#endif /* CONFIG_MODULES */ static void *its_alloc(void) {
- void *page __free(execmem) = execmem_alloc(EXECMEM_MODULE_TEXT, PAGE_SIZE);
- struct its_array *pages = &its_pages;
- void *tmp;
- void *page __free(execmem) = execmem_alloc(EXECMEM_MODULE_TEXT, PAGE_SIZE); if (!page) return NULL;
-#ifdef CONFIG_MODULES
- if (its_mod) {
void *tmp = krealloc(its_mod->its_page_array,
(its_mod->its_num_pages+1) * sizeof(void *),
GFP_KERNEL);
if (!tmp)
return NULL;
- tmp = krealloc(pages->pages, (pages->num + 1) * sizeof(void *), GFP_KERNEL);
- if (!tmp)
return NULL;
its_mod->its_page_array = tmp;
its_mod->its_page_array[its_mod->its_num_pages++] = page;
- pages->pages = tmp;
- pages->pages[pages->num++] = page;
- if (its_mod) execmem_make_temp_rw(page, PAGE_SIZE);
- }
-#endif /* CONFIG_MODULES */ return no_free_ptr(page); } @@ -2338,6 +2347,8 @@ void __init alternative_instructions(void) apply_retpolines(__retpoline_sites, __retpoline_sites_end); apply_returns(__return_sites, __return_sites_end);
- its_fini_mod(NULL);
- /*
- Adjust all CALL instructions to point to func()-10, including
- those in .altinstr_replacement.
Hi Juergen,
kernel test robot noticed the following build errors:
[auto build test ERROR on tip/master] [also build test ERROR on tip/x86/core linus/master v6.15 next-20250528] [cannot apply to tip/x86/mm tip/auto-latest] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Juergen-Gross/x86-execmem-don... base: tip/master patch link: https://lore.kernel.org/r/20250528123557.12847-4-jgross%40suse.com patch subject: [PATCH 3/3] x86/alternative: make kernel ITS thunks read-only config: x86_64-buildonly-randconfig-002-20250529 (https://download.01.org/0day-ci/archive/20250529/202505291124.eAZ7fgbG-lkp@i...) compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250529/202505291124.eAZ7fgbG-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot lkp@intel.com | Closes: https://lore.kernel.org/oe-kbuild-all/202505291124.eAZ7fgbG-lkp@intel.com/
All errors (new ones prefixed by >>):
arch/x86/kernel/alternative.c:2353:6: error: use of undeclared identifier 'its_page'
2353 | if (its_page) | ^
arch/x86/kernel/alternative.c:2354:3: error: call to undeclared function 'its_set_kernel_ro'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
2354 | its_set_kernel_ro(its_page); | ^ arch/x86/kernel/alternative.c:2354:21: error: use of undeclared identifier 'its_page' 2354 | its_set_kernel_ro(its_page); | ^ arch/x86/kernel/alternative.c:2355:2: error: use of undeclared identifier 'its_page' 2355 | its_page = NULL; | ^ 4 errors generated.
vim +/its_page +2353 arch/x86/kernel/alternative.c
2308 2309 void __init alternative_instructions(void) 2310 { 2311 u64 ibt; 2312 2313 int3_selftest(); 2314 2315 /* 2316 * The patching is not fully atomic, so try to avoid local 2317 * interruptions that might execute the to be patched code. 2318 * Other CPUs are not running. 2319 */ 2320 stop_nmi(); 2321 2322 /* 2323 * Don't stop machine check exceptions while patching. 2324 * MCEs only happen when something got corrupted and in this 2325 * case we must do something about the corruption. 2326 * Ignoring it is worse than an unlikely patching race. 2327 * Also machine checks tend to be broadcast and if one CPU 2328 * goes into machine check the others follow quickly, so we don't 2329 * expect a machine check to cause undue problems during to code 2330 * patching. 2331 */ 2332 2333 /* 2334 * Make sure to set (artificial) features depending on used paravirt 2335 * functions which can later influence alternative patching. 2336 */ 2337 paravirt_set_cap(); 2338 2339 /* Keep CET-IBT disabled until caller/callee are patched */ 2340 ibt = ibt_save(/*disable*/ true); 2341 2342 __apply_fineibt(__retpoline_sites, __retpoline_sites_end, 2343 __cfi_sites, __cfi_sites_end, true); 2344 2345 /* 2346 * Rewrite the retpolines, must be done before alternatives since 2347 * those can rewrite the retpoline thunks. 2348 */ 2349 apply_retpolines(__retpoline_sites, __retpoline_sites_end); 2350 apply_returns(__return_sites, __return_sites_end); 2351 2352 /* Make potential last thunk page read-only. */
2353 if (its_page) 2354 its_set_kernel_ro(its_page);
2355 its_page = NULL; 2356 2357 /* 2358 * Adjust all CALL instructions to point to func()-10, including 2359 * those in .altinstr_replacement. 2360 */ 2361 callthunks_patch_builtin_calls(); 2362 2363 apply_alternatives(__alt_instructions, __alt_instructions_end); 2364 2365 /* 2366 * Seal all functions that do not have their address taken. 2367 */ 2368 apply_seal_endbr(__ibt_endbr_seal, __ibt_endbr_seal_end); 2369 2370 ibt_restore(ibt); 2371
linux-stable-mirror@lists.linaro.org