The patch titled
Subject: mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-vmalloc-fix-vmalloc-which-may-return-null-if-called-with-__gfp_nofail.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Hailong.Liu" <hailong.liu(a)oppo.com>
Subject: mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
Date: Fri, 10 May 2024 18:01:31 +0800
commit a421ef303008 ("mm: allow !GFP_KERNEL allocations for kvmalloc")
includes support for __GFP_NOFAIL, but it presents a conflict with commit
dd544141b9eb ("vmalloc: back off when the current task is OOM-killed"). A
possible scenario is as follows:
process-a
__vmalloc_node_range(GFP_KERNEL | __GFP_NOFAIL)
__vmalloc_area_node()
vm_area_alloc_pages()
--> oom-killer send SIGKILL to process-a
if (fatal_signal_pending(current)) break;
--> return NULL;
To fix this, do not check fatal_signal_pending() in vm_area_alloc_pages()
if __GFP_NOFAIL set.
This issue occurred during OPLUS KASAN TEST. Below is part of the log
-> oom-killer sends signal to process
[65731.222840] [ T1308] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/apps/uid_10198,task=gs.intelligence,pid=32454,uid=10198
[65731.259685] [T32454] Call trace:
[65731.259698] [T32454] dump_backtrace+0xf4/0x118
[65731.259734] [T32454] show_stack+0x18/0x24
[65731.259756] [T32454] dump_stack_lvl+0x60/0x7c
[65731.259781] [T32454] dump_stack+0x18/0x38
[65731.259800] [T32454] mrdump_common_die+0x250/0x39c [mrdump]
[65731.259936] [T32454] ipanic_die+0x20/0x34 [mrdump]
[65731.260019] [T32454] atomic_notifier_call_chain+0xb4/0xfc
[65731.260047] [T32454] notify_die+0x114/0x198
[65731.260073] [T32454] die+0xf4/0x5b4
[65731.260098] [T32454] die_kernel_fault+0x80/0x98
[65731.260124] [T32454] __do_kernel_fault+0x160/0x2a8
[65731.260146] [T32454] do_bad_area+0x68/0x148
[65731.260174] [T32454] do_mem_abort+0x151c/0x1b34
[65731.260204] [T32454] el1_abort+0x3c/0x5c
[65731.260227] [T32454] el1h_64_sync_handler+0x54/0x90
[65731.260248] [T32454] el1h_64_sync+0x68/0x6c
[65731.260269] [T32454] z_erofs_decompress_queue+0x7f0/0x2258
--> be->decompressed_pages = kvcalloc(be->nr_pages, sizeof(struct page *), GFP_KERNEL | __GFP_NOFAIL);
kernel panic by NULL pointer dereference.
erofs assume kvmalloc with __GFP_NOFAIL never return NULL.
[65731.260293] [T32454] z_erofs_runqueue+0xf30/0x104c
[65731.260314] [T32454] z_erofs_readahead+0x4f0/0x968
[65731.260339] [T32454] read_pages+0x170/0xadc
[65731.260364] [T32454] page_cache_ra_unbounded+0x874/0xf30
[65731.260388] [T32454] page_cache_ra_order+0x24c/0x714
[65731.260411] [T32454] filemap_fault+0xbf0/0x1a74
[65731.260437] [T32454] __do_fault+0xd0/0x33c
[65731.260462] [T32454] handle_mm_fault+0xf74/0x3fe0
[65731.260486] [T32454] do_mem_abort+0x54c/0x1b34
[65731.260509] [T32454] el0_da+0x44/0x94
[65731.260531] [T32454] el0t_64_sync_handler+0x98/0xb4
[65731.260553] [T32454] el0t_64_sync+0x198/0x19c
Link: https://lkml.kernel.org/r/20240510100131.1865-1-hailong.liu@oppo.com
Fixes: 9376130c390a ("mm/vmalloc: add support for __GFP_NOFAIL")
Signed-off-by: Hailong.Liu <hailong.liu(a)oppo.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Suggested-by: Barry Song <21cnbao(a)gmail.com>
Reported-by: Oven <liyangouwen1(a)oppo.com>
Reviewed-by: Barry Song <baohua(a)kernel.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: Chao Yu <chao(a)kernel.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Gao Xiang <xiang(a)kernel.org>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/mm/vmalloc.c~mm-vmalloc-fix-vmalloc-which-may-return-null-if-called-with-__gfp_nofail
+++ a/mm/vmalloc.c
@@ -3492,7 +3492,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
{
unsigned int nr_allocated = 0;
gfp_t alloc_gfp = gfp;
- bool nofail = false;
+ bool nofail = gfp & __GFP_NOFAIL;
struct page *page;
int i;
@@ -3549,12 +3549,11 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
* and compaction etc.
*/
alloc_gfp &= ~__GFP_NOFAIL;
- nofail = true;
}
/* High-order pages or fallback path if "bulk" fails. */
while (nr_allocated < nr_pages) {
- if (fatal_signal_pending(current))
+ if (!nofail && fatal_signal_pending(current))
break;
if (nid == NUMA_NO_NODE)
_
Patches currently in -mm which might be from hailong.liu(a)oppo.com are
mm-vmalloc-fix-vmalloc-which-may-return-null-if-called-with-__gfp_nofail.patch
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6c41468c7c12d74843bb414fc00307ea8a6318c3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041134-curvature-campsite-e51b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
6c41468c7c12 ("KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
6ad75c5c99f7 ("KVM: x86: Rename kvm_x86_ops.queue_exception to inject_exception")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
9bd1f0efa859 ("KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault")
c3634d25fbee ("KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry")
1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
db663af4a001 ("kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense")
b9f3973ab3a8 ("KVM: x86: nSVM: implement nested VMLOAD/VMSAVE")
23e5092b6e2a ("KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names")
e27bc0440ebd ("KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names")
068f7ea61895 ("KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch")
e1779c2714c3 ("KVM: x86: nSVM: fix potential NULL derefernce on nested migration")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c41468c7c12d74843bb414fc00307ea8a6318c3 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 22 Mar 2023 07:32:59 -0700
Subject: [PATCH] KVM: x86: Clear "has_error_code", not "error_code", for RM
exception injection
When injecting an exception into a vCPU in Real Mode, suppress the error
code by clearing the flag that tracks whether the error code is valid, not
by clearing the error code itself. The "typo" was introduced by recent
fix for SVM's funky Paged Real Mode.
Opportunistically hoist the logic above the tracepoint so that the trace
is coherent with respect to what is actually injected (this was also the
behavior prior to the buggy commit).
Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230322143300.2209476-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 45017576ad5e..7d6f98b7635f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9908,13 +9908,20 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
static void kvm_inject_exception(struct kvm_vcpu *vcpu)
{
+ /*
+ * Suppress the error code if the vCPU is in Real Mode, as Real Mode
+ * exceptions don't report error codes. The presence of an error code
+ * is carried with the exception and only stripped when the exception
+ * is injected as intercepted #PF VM-Exits for AMD's Paged Real Mode do
+ * report an error code despite the CPU being in Real Mode.
+ */
+ vcpu->arch.exception.has_error_code &= is_protmode(vcpu);
+
trace_kvm_inj_exception(vcpu->arch.exception.vector,
vcpu->arch.exception.has_error_code,
vcpu->arch.exception.error_code,
vcpu->arch.exception.injected);
- if (vcpu->arch.exception.error_code && !is_protmode(vcpu))
- vcpu->arch.exception.error_code = false;
static_call(kvm_x86_inject_exception)(vcpu);
}
There is nothing preventing kernel memory allocators from allocating a
page that overlaps with PTR_ERR(), except for architecture-specific
code that setup memblock.
It was discovered that RISCV architecture doesn't setup memblock
corectly, leading to a page overlapping with PTR_ERR() being allocated,
and subsequently crashing the kernel (link in Close: )
The reported crash has nothing to do with PTR_ERR(): the last page
(at address 0xfffff000) being allocated leads to an unexpected
arithmetic overflow in ext4; but still, this page shouldn't be
allocated in the first place.
Because PTR_ERR() is an architecture-independent thing, we shouldn't
ask every single architecture to set this up. There may be other
architectures beside RISCV that have the same problem.
Fix this one and for all by reserving the physical memory page that
may be mapped to the last virtual memory page as part of low memory.
Unfortunately, this means if there is actual memory at this reserved
location, that memory will become inaccessible. However, if this page
is not reserved, it can only be accessed as high memory, so this
doesn't matter if high memory is not supported. Even if high memory is
supported, it is still only one page.
Closes: https://lore.kernel.org/linux-riscv/878r1ibpdn.fsf@all.your.base.are.belong…
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: <stable(a)vger.kernel.org> # all versions
---
init/main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/init/main.c b/init/main.c
index 881f6230ee59..f8d2793c4641 100644
--- a/init/main.c
+++ b/init/main.c
@@ -900,6 +900,7 @@ void start_kernel(void)
page_address_init();
pr_notice("%s", linux_banner);
early_security_init();
+ memblock_reserve(__pa(-PAGE_SIZE), PAGE_SIZE); /* reserve last page for ERR_PTR */
setup_arch(&command_line);
setup_boot_config();
setup_command_line(command_line);
--
2.39.2
The NOP op flags should have been checked from beginning like any other
opcode, otherwise NOP may not be extended with the op flags.
Given both liburing and Rust io-uring crate always zeros SQE op flags, just
ignore users which play raw NOP uring interface without zeroing SQE, because
NOP is just for test purpose. Then we can save one NOP2 opcode.
Suggested-by: Jens Axboe <axboe(a)kernel.dk>
Fixes: 2b188cc1bb85 ("Add io_uring IO interface")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
io_uring/nop.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/io_uring/nop.c b/io_uring/nop.c
index d956599a3c1b..1a4e312dfe51 100644
--- a/io_uring/nop.c
+++ b/io_uring/nop.c
@@ -12,6 +12,8 @@
int io_nop_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
+ if (READ_ONCE(sqe->rw_flags))
+ return -EINVAL;
return 0;
}
--
2.42.0