The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6c41468c7c12d74843bb414fc00307ea8a6318c3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041134-curvature-campsite-e51b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
6c41468c7c12 ("KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
6ad75c5c99f7 ("KVM: x86: Rename kvm_x86_ops.queue_exception to inject_exception")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
9bd1f0efa859 ("KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault")
c3634d25fbee ("KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry")
1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
db663af4a001 ("kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense")
b9f3973ab3a8 ("KVM: x86: nSVM: implement nested VMLOAD/VMSAVE")
23e5092b6e2a ("KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names")
e27bc0440ebd ("KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names")
068f7ea61895 ("KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch")
e1779c2714c3 ("KVM: x86: nSVM: fix potential NULL derefernce on nested migration")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c41468c7c12d74843bb414fc00307ea8a6318c3 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 22 Mar 2023 07:32:59 -0700
Subject: [PATCH] KVM: x86: Clear "has_error_code", not "error_code", for RM
exception injection
When injecting an exception into a vCPU in Real Mode, suppress the error
code by clearing the flag that tracks whether the error code is valid, not
by clearing the error code itself. The "typo" was introduced by recent
fix for SVM's funky Paged Real Mode.
Opportunistically hoist the logic above the tracepoint so that the trace
is coherent with respect to what is actually injected (this was also the
behavior prior to the buggy commit).
Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230322143300.2209476-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 45017576ad5e..7d6f98b7635f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9908,13 +9908,20 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
static void kvm_inject_exception(struct kvm_vcpu *vcpu)
{
+ /*
+ * Suppress the error code if the vCPU is in Real Mode, as Real Mode
+ * exceptions don't report error codes. The presence of an error code
+ * is carried with the exception and only stripped when the exception
+ * is injected as intercepted #PF VM-Exits for AMD's Paged Real Mode do
+ * report an error code despite the CPU being in Real Mode.
+ */
+ vcpu->arch.exception.has_error_code &= is_protmode(vcpu);
+
trace_kvm_inj_exception(vcpu->arch.exception.vector,
vcpu->arch.exception.has_error_code,
vcpu->arch.exception.error_code,
vcpu->arch.exception.injected);
- if (vcpu->arch.exception.error_code && !is_protmode(vcpu))
- vcpu->arch.exception.error_code = false;
static_call(kvm_x86_inject_exception)(vcpu);
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 64620e0a1e712a778095bd35cbb277dc2259281f Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel(a)iogearbox.net>
Date: Tue, 11 Jan 2022 14:43:41 +0000
Subject: [PATCH] bpf: Fix out of bounds access for ringbuf helpers
Both bpf_ringbuf_submit() and bpf_ringbuf_discard() have ARG_PTR_TO_ALLOC_MEM
in their bpf_func_proto definition as their first argument. They both expect
the result from a prior bpf_ringbuf_reserve() call which has a return type of
RET_PTR_TO_ALLOC_MEM_OR_NULL.
Meaning, after a NULL check in the code, the verifier will promote the register
type in the non-NULL branch to a PTR_TO_MEM and in the NULL branch to a known
zero scalar. Generally, pointer arithmetic on PTR_TO_MEM is allowed, so the
latter could have an offset.
The ARG_PTR_TO_ALLOC_MEM expects a PTR_TO_MEM register type. However, the non-
zero result from bpf_ringbuf_reserve() must be fed into either bpf_ringbuf_submit()
or bpf_ringbuf_discard() but with the original offset given it will then read
out the struct bpf_ringbuf_hdr mapping.
The verifier missed to enforce a zero offset, so that out of bounds access
can be triggered which could be used to escalate privileges if unprivileged
BPF was enabled (disabled by default in kernel).
Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: <tr3e.wang(a)gmail.com> (SecCoder Security Lab)
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: John Fastabend <john.fastabend(a)gmail.com>
Acked-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e0b3f4d683eb..c72c57a6684f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5318,9 +5318,15 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
case PTR_TO_BUF:
case PTR_TO_BUF | MEM_RDONLY:
case PTR_TO_STACK:
+ /* Some of the argument types nevertheless require a
+ * zero register offset.
+ */
+ if (arg_type == ARG_PTR_TO_ALLOC_MEM)
+ goto force_off_check;
break;
/* All the rest must be rejected: */
default:
+force_off_check:
err = __check_ptr_off_reg(env, reg, regno,
type == PTR_TO_BTF_ID);
if (err < 0)
From: Jeff Vanhoof <qjv001(a)motorola.com>
arm-smmu related crashes seen after a Missed ISOC interrupt when
no_interrupt=1 is used. This can happen if the hardware is still using
the data associated with a TRB after the usb_request's ->complete call
has been made. Instead of immediately releasing a request when a Missed
ISOC interrupt has occurred, this change will add logic to cancel the
request instead where it will eventually be released when the
END_TRANSFER command has completed. This logic is similar to some of the
cleanup done in dwc3_gadget_ep_dequeue.
Fixes: 6d8a019614f3 ("usb: dwc3: gadget: check for Missed Isoc from event status")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jeff Vanhoof <qjv001(a)motorola.com>
Co-developed-by: Dan Vacura <w36195(a)motorola.com>
Signed-off-by: Dan Vacura <w36195(a)motorola.com>
---
V1 -> V3:
- no change, new patch in series
drivers/usb/dwc3/core.h | 1 +
drivers/usb/dwc3/gadget.c | 38 ++++++++++++++++++++++++++------------
2 files changed, 27 insertions(+), 12 deletions(-)
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 8f9959ba9fd4..9b005d912241 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -943,6 +943,7 @@ struct dwc3_request {
#define DWC3_REQUEST_STATUS_DEQUEUED 3
#define DWC3_REQUEST_STATUS_STALLED 4
#define DWC3_REQUEST_STATUS_COMPLETED 5
+#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
#define DWC3_REQUEST_STATUS_UNKNOWN -1
u8 epnum;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 079cd333632e..411532c5c378 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2021,6 +2021,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
case DWC3_REQUEST_STATUS_STALLED:
dwc3_gadget_giveback(dep, req, -EPIPE);
break;
+ case DWC3_REQUEST_STATUS_MISSED_ISOC:
+ dwc3_gadget_giveback(dep, req, -EXDEV);
+ break;
default:
dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
dwc3_gadget_giveback(dep, req, -ECONNRESET);
@@ -3402,21 +3405,32 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
struct dwc3 *dwc = dep->dwc;
bool no_started_trb = true;
- dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
+ if (status == -EXDEV) {
+ struct dwc3_request *tmp;
+ struct dwc3_request *req;
- if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
- goto out;
+ if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
+ dwc3_stop_active_transfer(dep, true, true);
- if (!dep->endpoint.desc)
- return no_started_trb;
+ list_for_each_entry_safe(req, tmp, &dep->started_list, list)
+ dwc3_gadget_move_cancelled_request(req,
+ DWC3_REQUEST_STATUS_MISSED_ISOC);
+ } else {
+ dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
- if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
- list_empty(&dep->started_list) &&
- (list_empty(&dep->pending_list) || status == -EXDEV))
- dwc3_stop_active_transfer(dep, true, true);
- else if (dwc3_gadget_ep_should_continue(dep))
- if (__dwc3_gadget_kick_transfer(dep) == 0)
- no_started_trb = false;
+ if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
+ goto out;
+
+ if (!dep->endpoint.desc)
+ return no_started_trb;
+
+ if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
+ list_empty(&dep->started_list) && list_empty(&dep->pending_list))
+ dwc3_stop_active_transfer(dep, true, true);
+ else if (dwc3_gadget_ep_should_continue(dep))
+ if (__dwc3_gadget_kick_transfer(dep) == 0)
+ no_started_trb = false;
+ }
out:
/*
--
2.34.1
The special casing was originally added in pre-git history; reproducing
the commit log here:
> commit a318a92567d77
> Author: Andrew Morton <akpm(a)osdl.org>
> Date: Sun Sep 21 01:42:22 2003 -0700
>
> [PATCH] Speed up direct-io hugetlbpage handling
>
> This patch short-circuits all the direct-io page dirtying logic for
> higher-order pages. Without this, we pointlessly bounce BIOs up to
> keventd all the time.
In the last twenty years, compound pages have become used for more than
just hugetlb. Rewrite these functions to operate on folios instead
of pages and remove the special case for hugetlbfs; I don't think
it's needed any more (and if it is, we can put it back in as a call
to folio_test_hugetlb()).
This was found by inspection; as far as I can tell, this bug can lead
to pages used as the destination of a direct I/O read not being marked
as dirty. If those pages are then reclaimed by the MM without being
dirtied for some other reason, they won't be written out. Then when
they're faulted back in, they will not contain the data they should.
It'll take a pretty unusual setup to produce this problem with several
races all going the wrong way.
This problem predates the folio work; it could for example have been
triggered by mmaping a THP in tmpfs and using that as the target of an
O_DIRECT read.
Fixes: 800d8c63b2e98 ("shmem: add huge pages support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
---
block/bio.c | 46 ++++++++++++++++++++++++----------------------
1 file changed, 24 insertions(+), 22 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 8672179213b9..f46d8ec71fbd 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1171,13 +1171,22 @@ EXPORT_SYMBOL(bio_add_folio);
void __bio_release_pages(struct bio *bio, bool mark_dirty)
{
- struct bvec_iter_all iter_all;
- struct bio_vec *bvec;
+ struct folio_iter fi;
+
+ bio_for_each_folio_all(fi, bio) {
+ struct page *page;
+ size_t done = 0;
- bio_for_each_segment_all(bvec, bio, iter_all) {
- if (mark_dirty && !PageCompound(bvec->bv_page))
- set_page_dirty_lock(bvec->bv_page);
- bio_release_page(bio, bvec->bv_page);
+ if (mark_dirty) {
+ folio_lock(fi.folio);
+ folio_mark_dirty(fi.folio);
+ folio_unlock(fi.folio);
+ }
+ page = folio_page(fi.folio, fi.offset / PAGE_SIZE);
+ do {
+ bio_release_page(bio, page++);
+ done += PAGE_SIZE;
+ } while (done < fi.length);
}
}
EXPORT_SYMBOL_GPL(__bio_release_pages);
@@ -1455,18 +1464,12 @@ EXPORT_SYMBOL(bio_free_pages);
* bio_set_pages_dirty() and bio_check_pages_dirty() are support functions
* for performing direct-IO in BIOs.
*
- * The problem is that we cannot run set_page_dirty() from interrupt context
+ * The problem is that we cannot run folio_mark_dirty() from interrupt context
* because the required locks are not interrupt-safe. So what we can do is to
* mark the pages dirty _before_ performing IO. And in interrupt context,
* check that the pages are still dirty. If so, fine. If not, redirty them
* in process context.
*
- * We special-case compound pages here: normally this means reads into hugetlb
- * pages. The logic in here doesn't really work right for compound pages
- * because the VM does not uniformly chase down the head page in all cases.
- * But dirtiness of compound pages is pretty meaningless anyway: the VM doesn't
- * handle them at all. So we skip compound pages here at an early stage.
- *
* Note that this code is very hard to test under normal circumstances because
* direct-io pins the pages with get_user_pages(). This makes
* is_page_cache_freeable return false, and the VM will not clean the pages.
@@ -1482,12 +1485,12 @@ EXPORT_SYMBOL(bio_free_pages);
*/
void bio_set_pages_dirty(struct bio *bio)
{
- struct bio_vec *bvec;
- struct bvec_iter_all iter_all;
+ struct folio_iter fi;
- bio_for_each_segment_all(bvec, bio, iter_all) {
- if (!PageCompound(bvec->bv_page))
- set_page_dirty_lock(bvec->bv_page);
+ bio_for_each_folio_all(fi, bio) {
+ folio_lock(fi.folio);
+ folio_mark_dirty(fi.folio);
+ folio_unlock(fi.folio);
}
}
@@ -1530,12 +1533,11 @@ static void bio_dirty_fn(struct work_struct *work)
void bio_check_pages_dirty(struct bio *bio)
{
- struct bio_vec *bvec;
+ struct folio_iter fi;
unsigned long flags;
- struct bvec_iter_all iter_all;
- bio_for_each_segment_all(bvec, bio, iter_all) {
- if (!PageDirty(bvec->bv_page) && !PageCompound(bvec->bv_page))
+ bio_for_each_folio_all(fi, bio) {
+ if (!folio_test_dirty(fi.folio))
goto defer;
}
--
2.40.1
Hi!
The Tegra20 requires an enabled VDE power domain during startup. As the
VDE is currently not used, it is disabled during runtime.
Since 8f0c714ad9be, there is a workaround for the "normal restart path"
which enables the VDE before doing PMC's warm reboot. This workaround is
not executed in the "emergency restart path", leading to a hang-up
during start.
This series implements and registers a new pmic-based restart handler
for boards with tps6586x. This cold reboot ensures that the VDE power
domain is enabled during startup on tegra20-based boards.
Since bae1d3a05a8b, i2c transfers are non-atomic while preemption is
disabled (which is e.g. done during panic()). This could lead to
warnings ("Voluntary context switch within RCU") in i2c-based restart
handlers during emergency restart. The state of preemption should be
detected by i2c_in_atomic_xfer_mode() to use atomic i2c xfer when
required. Beside the new system_state check, the check is the same as
the one pre v5.2.
---
v7:
- 5/5: drop mode check (suggested by Dmitry)
- Link to v6: https://lore.kernel.org/r/20230327-tegra-pmic-reboot-v6-0-af44a4cd82e9@skid…
v6:
- drop 4/6 to abort restart on unexpected failure (suggested by Dmitry)
- 4,5: fix comments in handlers (suggested by Lee)
- 4,5: same delay for both handlers (suggested by Lee)
v5:
- introduce new 3 & 4, therefore 3 -> 5, 4 -> 6
- 3: provide dev to sys_off handler, if it is known
- 4: return NOTIFY_DONE from sys_off_notify, to avoid skipping
- 5: drop Reviewed-by from Dmitry, add poweroff timeout
- 5,6: return notifier value instead of direct errno from handler
- 5,6: use new dev field instead of passing dev as cb_data
- 5,6: increase timeout values based on error observations
- 6: skip unsupported reboot modes in restart handler
---
Benjamin Bara (5):
kernel/reboot: emergency_restart: set correct system_state
i2c: core: run atomic i2c xfer when !preemptible
kernel/reboot: add device to sys_off_handler
mfd: tps6586x: use devm-based power off handler
mfd: tps6586x: register restart handler
drivers/i2c/i2c-core.h | 2 +-
drivers/mfd/tps6586x.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
include/linux/reboot.h | 3 +++
kernel/reboot.c | 4 ++++
4 files changed, 50 insertions(+), 9 deletions(-)
---
base-commit: 197b6b60ae7bc51dd0814953c562833143b292aa
change-id: 20230327-tegra-pmic-reboot-4175ff814a4b
Best regards,
--
Benjamin Bara <benjamin.bara(a)skidata.com>
There are two major types of uncorrected error (UC) :
- Action Required: The error is detected and the processor already consumes the
memory. OS requires to take action (for example, offline failure page/kill
failure thread) to recover this uncorrectable error.
- Action Optional: The error is detected out of processor execution context.
Some data in the memory are corrupted. But the data have not been consumed.
OS is optional to take action to recover this uncorrectable error.
For X86 platforms, we can easily distinguish between these two types
based on the MCA Bank. While for arm64 platform, the memory failure
flags for all UCs which severity are GHES_SEV_RECOVERABLE are set as 0,
a.k.a, Action Optional now.
If UC is detected by a background scrubber, it is obviously an Action
Optional error. For other errors, we should conservatively regard them
as Action Required.
cper_sec_mem_err::error_type identifies the type of error that occurred
if CPER_MEM_VALID_ERROR_TYPE is set. So, set memory failure flags as 0
for Scrub Uncorrected Error (type 14). Otherwise, set memory failure
flags as MF_ACTION_REQUIRED.
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
---
drivers/acpi/apei/ghes.c | 10 ++++++++--
include/linux/cper.h | 3 +++
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 80ad530583c9..6c03059cbfc6 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -474,8 +474,14 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
if (sec_sev == GHES_SEV_CORRECTED &&
(gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
flags = MF_SOFT_OFFLINE;
- if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
- flags = 0;
+ if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE) {
+ if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
+ flags = mem_err->error_type == CPER_MEM_SCRUB_UC ?
+ 0 :
+ MF_ACTION_REQUIRED;
+ else
+ flags = MF_ACTION_REQUIRED;
+ }
if (flags != -1)
return ghes_do_memory_failure(mem_err->physical_addr, flags);
diff --git a/include/linux/cper.h b/include/linux/cper.h
index eacb7dd7b3af..b77ab7636614 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -235,6 +235,9 @@ enum {
#define CPER_MEM_VALID_BANK_ADDRESS 0x100000
#define CPER_MEM_VALID_CHIP_ID 0x200000
+#define CPER_MEM_SCRUB_CE 13
+#define CPER_MEM_SCRUB_UC 14
+
#define CPER_MEM_EXT_ROW_MASK 0x3
#define CPER_MEM_EXT_ROW_SHIFT 16
--
2.20.1.9.gb50a0d7
Add an erratum for versions [v0.8 to v1.3) of OpenSBI which fail to add
the "no-map" property to the reserved memory nodes for the regions it
has protected using PMPs.
Our existing fix sweeping hibernation under the carpet by marking it
NONPORTABLE is insufficient as there are other ways to generate
accesses to these reserved memory regions, as Petr discovered [1]
while testing crash kernels & kdump.
Intercede during the boot process when the afflicted versions of OpenSBI
are present & set the "no-map" property in all "mmode_resv" nodes before
the kernel does its reserved memory region initialisation.
Reported-by: Song Shuai <suagrfillet(a)gmail.com>
Link: https://lore.kernel.org/all/CAAYs2=gQvkhTeioMmqRDVGjdtNF_vhB+vm_1dHJxPNi75Y…
Reported-by: JeeHeng Sia <jeeheng.sia(a)starfivetech.com>
Link: https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/ITXwaKfA6z8
Reported-by: Petr Tesarik <petrtesarik(a)huaweicloud.com>
Closes: https://lore.kernel.org/linux-riscv/76ff0f51-d6c1-580d-f943-061e93073306@hu… [1]
CC: stable(a)vger.kernel.org
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
---
arch/riscv/include/asm/sbi.h | 5 +++++
arch/riscv/kernel/sbi.c | 42 +++++++++++++++++++++++++++++++++++-
arch/riscv/mm/init.c | 3 +++
3 files changed, 49 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 5b4a1bf5f439..5360f3476278 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -252,6 +252,9 @@ enum sbi_pmu_ctr_type {
#define SBI_ERR_ALREADY_STARTED -7
#define SBI_ERR_ALREADY_STOPPED -8
+/* SBI implementation IDs */
+#define SBI_IMP_OPENSBI 1
+
extern unsigned long sbi_spec_version;
struct sbiret {
long error;
@@ -259,6 +262,8 @@ struct sbiret {
};
void sbi_init(void);
+void sbi_apply_reserved_mem_erratum(void *dtb_va);
+
struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
unsigned long arg1, unsigned long arg2,
unsigned long arg3, unsigned long arg4,
diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index c672c8ba9a2a..aeb27263fa53 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -5,8 +5,10 @@
* Copyright (c) 2020 Western Digital Corporation or its affiliates.
*/
+#include <linux/acpi.h>
#include <linux/bits.h>
#include <linux/init.h>
+#include <linux/libfdt.h>
#include <linux/pm.h>
#include <linux/reboot.h>
#include <asm/sbi.h>
@@ -583,6 +585,40 @@ long sbi_get_mimpid(void)
}
EXPORT_SYMBOL_GPL(sbi_get_mimpid);
+static long sbi_firmware_id;
+static long sbi_firmware_version;
+
+/*
+ * For devicetrees patched by OpenSBI a "mmode_resv" node is added to cover
+ * the region OpenSBI has protected by means of a PMP. Some versions of OpenSBI,
+ * [v0.8 to v1.3), omitted the "no-map" property, but this trips up hibernation
+ * among other things.
+ */
+void __init sbi_apply_reserved_mem_erratum(void *dtb_pa)
+{
+ int child, reserved_mem;
+
+ if (sbi_firmware_id != SBI_IMP_OPENSBI)
+ return;
+
+ if (!acpi_disabled)
+ return;
+
+ if (sbi_firmware_version >= 0x10003 || sbi_firmware_version < 0x8)
+ return;
+
+ reserved_mem = fdt_path_offset((void *)dtb_pa, "/reserved-memory");
+ if (reserved_mem < 0)
+ return;
+
+ fdt_for_each_subnode(child, (void *)dtb_pa, reserved_mem) {
+ const char *name = fdt_get_name((void *)dtb_pa, child, NULL);
+
+ if (!strncmp(name, "mmode_resv", 10))
+ fdt_setprop((void *)dtb_pa, child, "no-map", NULL, 0);
+ }
+};
+
void __init sbi_init(void)
{
int ret;
@@ -596,8 +632,12 @@ void __init sbi_init(void)
sbi_major_version(), sbi_minor_version());
if (!sbi_spec_is_0_1()) {
+ sbi_firmware_id = sbi_get_firmware_id();
+ sbi_firmware_version = sbi_get_firmware_version();
+
pr_info("SBI implementation ID=0x%lx Version=0x%lx\n",
- sbi_get_firmware_id(), sbi_get_firmware_version());
+ sbi_firmware_id, sbi_firmware_version);
+
if (sbi_probe_extension(SBI_EXT_TIME)) {
__sbi_set_timer = __sbi_set_timer_v02;
pr_info("SBI TIME extension detected\n");
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 70fb31960b63..cb16bfdeacdb 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -29,6 +29,7 @@
#include <asm/tlbflush.h>
#include <asm/sections.h>
#include <asm/soc.h>
+#include <asm/sbi.h>
#include <asm/io.h>
#include <asm/ptdump.h>
#include <asm/numa.h>
@@ -253,6 +254,8 @@ static void __init setup_bootmem(void)
* in the device tree, otherwise the allocation could end up in a
* reserved region.
*/
+
+ sbi_apply_reserved_mem_erratum(dtb_early_va);
early_init_fdt_scan_reserved_mem();
/*
--
2.40.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 41a506ef71eb38d94fe133f565c87c3e06ccc072
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023080733-wife-elope-de1b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
41a506ef71eb ("powerpc/ftrace: Create a dummy stackframe to fix stack unwind")
a5f04d1f2724 ("powerpc/ftrace: Regroup PPC64 specific operations in ftrace_mprofile.S")
228216716cb5 ("powerpc/ftrace: Refactor ftrace_{regs_}caller")
9bdb2eec3dde ("powerpc/ftrace: Don't use lmw/stmw in ftrace_regs_caller()")
76b372814b08 ("powerpc/ftrace: Style cleanup in ftrace_mprofile.S")
fc75f8733798 ("powerpc/ftrace: Have arch_ftrace_get_regs() return NULL unless FL_SAVE_REGS is set")
34d8dac807f0 ("powerpc/ftrace: Also save r1 in ftrace_caller()")
4ee83a2cfbc4 ("powerpc/ftrace: Remove ftrace_32.S")
41315494beed ("powerpc/ftrace: Prepare ftrace_64_mprofile.S for reuse by PPC32")
830213786c49 ("powerpc/ftrace: directly call of function graph tracer by ftrace caller")
0c81ed5ed438 ("powerpc/ftrace: Refactor ftrace_{en/dis}able_ftrace_graph_caller")
40b035efe288 ("powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS")
c75388a8ceff ("powerpc/ftrace: Prepare PPC64's ftrace_caller() for CONFIG_DYNAMIC_FTRACE_WITH_ARGS")
d95bf254be5f ("powerpc/ftrace: Prepare PPC32's ftrace_caller() for CONFIG_DYNAMIC_FTRACE_WITH_ARGS")
7bdb478c1d15 ("powerpc/ftrace: Simplify PPC32's return_to_handler()")
7875bc9b07cd ("powerpc/ftrace: Don't save again LR in ftrace_regs_caller() on PPC32")
c545b9f040f3 ("powerpc/inst: Define ppc_inst_t")
aebd1fb45c62 ("powerpc: flexible GPR range save/restore macros")
7dfbfb87c243 ("powerpc/ftrace: Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on PPC32")
c93d4f6ecf4b ("powerpc/ftrace: Add module_trampoline_target() for PPC32")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 41a506ef71eb38d94fe133f565c87c3e06ccc072 Mon Sep 17 00:00:00 2001
From: Naveen N Rao <naveen(a)kernel.org>
Date: Wed, 21 Jun 2023 10:43:49 +0530
Subject: [PATCH] powerpc/ftrace: Create a dummy stackframe to fix stack unwind
With ppc64 -mprofile-kernel and ppc32 -pg, profiling instructions to
call into ftrace are emitted right at function entry. The instruction
sequence used is minimal to reduce overhead. Crucially, a stackframe is
not created for the function being traced. This breaks stack unwinding
since the function being traced does not have a stackframe for itself.
As such, it never shows up in the backtrace:
/sys/kernel/debug/tracing # echo 1 > /proc/sys/kernel/stack_tracer_enabled
/sys/kernel/debug/tracing # cat stack_trace
Depth Size Location (17 entries)
----- ---- --------
0) 4144 32 ftrace_call+0x4/0x44
1) 4112 432 get_page_from_freelist+0x26c/0x1ad0
2) 3680 496 __alloc_pages+0x290/0x1280
3) 3184 336 __folio_alloc+0x34/0x90
4) 2848 176 vma_alloc_folio+0xd8/0x540
5) 2672 272 __handle_mm_fault+0x700/0x1cc0
6) 2400 208 handle_mm_fault+0xf0/0x3f0
7) 2192 80 ___do_page_fault+0x3e4/0xbe0
8) 2112 160 do_page_fault+0x30/0xc0
9) 1952 256 data_access_common_virt+0x210/0x220
10) 1696 400 0xc00000000f16b100
11) 1296 384 load_elf_binary+0x804/0x1b80
12) 912 208 bprm_execve+0x2d8/0x7e0
13) 704 64 do_execveat_common+0x1d0/0x2f0
14) 640 160 sys_execve+0x54/0x70
15) 480 64 system_call_exception+0x138/0x350
16) 416 416 system_call_common+0x160/0x2c4
Fix this by having ftrace create a dummy stackframe for the function
being traced. With this, backtraces now capture the function being
traced:
/sys/kernel/debug/tracing # cat stack_trace
Depth Size Location (17 entries)
----- ---- --------
0) 3888 32 _raw_spin_trylock+0x8/0x70
1) 3856 576 get_page_from_freelist+0x26c/0x1ad0
2) 3280 64 __alloc_pages+0x290/0x1280
3) 3216 336 __folio_alloc+0x34/0x90
4) 2880 176 vma_alloc_folio+0xd8/0x540
5) 2704 416 __handle_mm_fault+0x700/0x1cc0
6) 2288 96 handle_mm_fault+0xf0/0x3f0
7) 2192 48 ___do_page_fault+0x3e4/0xbe0
8) 2144 192 do_page_fault+0x30/0xc0
9) 1952 608 data_access_common_virt+0x210/0x220
10) 1344 16 0xc0000000334bbb50
11) 1328 416 load_elf_binary+0x804/0x1b80
12) 912 64 bprm_execve+0x2d8/0x7e0
13) 848 176 do_execveat_common+0x1d0/0x2f0
14) 672 192 sys_execve+0x54/0x70
15) 480 64 system_call_exception+0x138/0x350
16) 416 416 system_call_common+0x160/0x2c4
This results in two additional stores in the ftrace entry code, but
produces reliable backtraces.
Fixes: 153086644fd1 ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
Cc: stable(a)vger.kernel.org
Signed-off-by: Naveen N Rao <naveen(a)kernel.org>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/20230621051349.759567-1-naveen@kernel.org
diff --git a/arch/powerpc/kernel/trace/ftrace_mprofile.S b/arch/powerpc/kernel/trace/ftrace_mprofile.S
index ffb1db386849..1f7d86de1538 100644
--- a/arch/powerpc/kernel/trace/ftrace_mprofile.S
+++ b/arch/powerpc/kernel/trace/ftrace_mprofile.S
@@ -33,6 +33,9 @@
* and then arrange for the ftrace function to be called.
*/
.macro ftrace_regs_entry allregs
+ /* Create a minimal stack frame for representing B */
+ PPC_STLU r1, -STACK_FRAME_MIN_SIZE(r1)
+
/* Create our stack frame + pt_regs */
PPC_STLU r1,-SWITCH_FRAME_SIZE(r1)
@@ -42,7 +45,7 @@
#ifdef CONFIG_PPC64
/* Save the original return address in A's stack frame */
- std r0, LRSAVE+SWITCH_FRAME_SIZE(r1)
+ std r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
/* Ok to continue? */
lbz r3, PACA_FTRACE_ENABLED(r13)
cmpdi r3, 0
@@ -77,6 +80,8 @@
mflr r7
/* Save it as pt_regs->nip */
PPC_STL r7, _NIP(r1)
+ /* Also save it in B's stackframe header for proper unwind */
+ PPC_STL r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
/* Save the read LR in pt_regs->link */
PPC_STL r0, _LINK(r1)
@@ -142,7 +147,7 @@
#endif
/* Pop our stack frame */
- addi r1, r1, SWITCH_FRAME_SIZE
+ addi r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
#ifdef CONFIG_LIVEPATCH_64
/* Based on the cmpd above, if the NIP was altered handle livepatch */
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 41a506ef71eb38d94fe133f565c87c3e06ccc072
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023080735-masculine-twister-ba16@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
41a506ef71eb ("powerpc/ftrace: Create a dummy stackframe to fix stack unwind")
a5f04d1f2724 ("powerpc/ftrace: Regroup PPC64 specific operations in ftrace_mprofile.S")
228216716cb5 ("powerpc/ftrace: Refactor ftrace_{regs_}caller")
9bdb2eec3dde ("powerpc/ftrace: Don't use lmw/stmw in ftrace_regs_caller()")
76b372814b08 ("powerpc/ftrace: Style cleanup in ftrace_mprofile.S")
fc75f8733798 ("powerpc/ftrace: Have arch_ftrace_get_regs() return NULL unless FL_SAVE_REGS is set")
34d8dac807f0 ("powerpc/ftrace: Also save r1 in ftrace_caller()")
4ee83a2cfbc4 ("powerpc/ftrace: Remove ftrace_32.S")
41315494beed ("powerpc/ftrace: Prepare ftrace_64_mprofile.S for reuse by PPC32")
830213786c49 ("powerpc/ftrace: directly call of function graph tracer by ftrace caller")
0c81ed5ed438 ("powerpc/ftrace: Refactor ftrace_{en/dis}able_ftrace_graph_caller")
40b035efe288 ("powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS")
c75388a8ceff ("powerpc/ftrace: Prepare PPC64's ftrace_caller() for CONFIG_DYNAMIC_FTRACE_WITH_ARGS")
d95bf254be5f ("powerpc/ftrace: Prepare PPC32's ftrace_caller() for CONFIG_DYNAMIC_FTRACE_WITH_ARGS")
7bdb478c1d15 ("powerpc/ftrace: Simplify PPC32's return_to_handler()")
7875bc9b07cd ("powerpc/ftrace: Don't save again LR in ftrace_regs_caller() on PPC32")
c545b9f040f3 ("powerpc/inst: Define ppc_inst_t")
aebd1fb45c62 ("powerpc: flexible GPR range save/restore macros")
7dfbfb87c243 ("powerpc/ftrace: Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on PPC32")
c93d4f6ecf4b ("powerpc/ftrace: Add module_trampoline_target() for PPC32")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 41a506ef71eb38d94fe133f565c87c3e06ccc072 Mon Sep 17 00:00:00 2001
From: Naveen N Rao <naveen(a)kernel.org>
Date: Wed, 21 Jun 2023 10:43:49 +0530
Subject: [PATCH] powerpc/ftrace: Create a dummy stackframe to fix stack unwind
With ppc64 -mprofile-kernel and ppc32 -pg, profiling instructions to
call into ftrace are emitted right at function entry. The instruction
sequence used is minimal to reduce overhead. Crucially, a stackframe is
not created for the function being traced. This breaks stack unwinding
since the function being traced does not have a stackframe for itself.
As such, it never shows up in the backtrace:
/sys/kernel/debug/tracing # echo 1 > /proc/sys/kernel/stack_tracer_enabled
/sys/kernel/debug/tracing # cat stack_trace
Depth Size Location (17 entries)
----- ---- --------
0) 4144 32 ftrace_call+0x4/0x44
1) 4112 432 get_page_from_freelist+0x26c/0x1ad0
2) 3680 496 __alloc_pages+0x290/0x1280
3) 3184 336 __folio_alloc+0x34/0x90
4) 2848 176 vma_alloc_folio+0xd8/0x540
5) 2672 272 __handle_mm_fault+0x700/0x1cc0
6) 2400 208 handle_mm_fault+0xf0/0x3f0
7) 2192 80 ___do_page_fault+0x3e4/0xbe0
8) 2112 160 do_page_fault+0x30/0xc0
9) 1952 256 data_access_common_virt+0x210/0x220
10) 1696 400 0xc00000000f16b100
11) 1296 384 load_elf_binary+0x804/0x1b80
12) 912 208 bprm_execve+0x2d8/0x7e0
13) 704 64 do_execveat_common+0x1d0/0x2f0
14) 640 160 sys_execve+0x54/0x70
15) 480 64 system_call_exception+0x138/0x350
16) 416 416 system_call_common+0x160/0x2c4
Fix this by having ftrace create a dummy stackframe for the function
being traced. With this, backtraces now capture the function being
traced:
/sys/kernel/debug/tracing # cat stack_trace
Depth Size Location (17 entries)
----- ---- --------
0) 3888 32 _raw_spin_trylock+0x8/0x70
1) 3856 576 get_page_from_freelist+0x26c/0x1ad0
2) 3280 64 __alloc_pages+0x290/0x1280
3) 3216 336 __folio_alloc+0x34/0x90
4) 2880 176 vma_alloc_folio+0xd8/0x540
5) 2704 416 __handle_mm_fault+0x700/0x1cc0
6) 2288 96 handle_mm_fault+0xf0/0x3f0
7) 2192 48 ___do_page_fault+0x3e4/0xbe0
8) 2144 192 do_page_fault+0x30/0xc0
9) 1952 608 data_access_common_virt+0x210/0x220
10) 1344 16 0xc0000000334bbb50
11) 1328 416 load_elf_binary+0x804/0x1b80
12) 912 64 bprm_execve+0x2d8/0x7e0
13) 848 176 do_execveat_common+0x1d0/0x2f0
14) 672 192 sys_execve+0x54/0x70
15) 480 64 system_call_exception+0x138/0x350
16) 416 416 system_call_common+0x160/0x2c4
This results in two additional stores in the ftrace entry code, but
produces reliable backtraces.
Fixes: 153086644fd1 ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
Cc: stable(a)vger.kernel.org
Signed-off-by: Naveen N Rao <naveen(a)kernel.org>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://msgid.link/20230621051349.759567-1-naveen@kernel.org
diff --git a/arch/powerpc/kernel/trace/ftrace_mprofile.S b/arch/powerpc/kernel/trace/ftrace_mprofile.S
index ffb1db386849..1f7d86de1538 100644
--- a/arch/powerpc/kernel/trace/ftrace_mprofile.S
+++ b/arch/powerpc/kernel/trace/ftrace_mprofile.S
@@ -33,6 +33,9 @@
* and then arrange for the ftrace function to be called.
*/
.macro ftrace_regs_entry allregs
+ /* Create a minimal stack frame for representing B */
+ PPC_STLU r1, -STACK_FRAME_MIN_SIZE(r1)
+
/* Create our stack frame + pt_regs */
PPC_STLU r1,-SWITCH_FRAME_SIZE(r1)
@@ -42,7 +45,7 @@
#ifdef CONFIG_PPC64
/* Save the original return address in A's stack frame */
- std r0, LRSAVE+SWITCH_FRAME_SIZE(r1)
+ std r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
/* Ok to continue? */
lbz r3, PACA_FTRACE_ENABLED(r13)
cmpdi r3, 0
@@ -77,6 +80,8 @@
mflr r7
/* Save it as pt_regs->nip */
PPC_STL r7, _NIP(r1)
+ /* Also save it in B's stackframe header for proper unwind */
+ PPC_STL r7, LRSAVE+SWITCH_FRAME_SIZE(r1)
/* Save the read LR in pt_regs->link */
PPC_STL r0, _LINK(r1)
@@ -142,7 +147,7 @@
#endif
/* Pop our stack frame */
- addi r1, r1, SWITCH_FRAME_SIZE
+ addi r1, r1, SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE
#ifdef CONFIG_LIVEPATCH_64
/* Based on the cmpd above, if the NIP was altered handle livepatch */