6.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fangyu Yu <fangyu.yu(a)linux.alibaba.com>
[ Upstream commit 974555d6e417974e63444266e495a06d06c23af5 ]
When executing HLV* instructions in HS-mode, a guest page fault
may occur if a g-stage page table migration happens between triggering
the virtual instruction exception and executing the HLV* instruction.
This is a corner case, and a simple way to handle it is to
re-execute the instruction at the point where the virtual instruction
exception occurred; the guest page fault will then be handled automatically.
Fixes: b91f0e4cb8a3 ("RISC-V: KVM: Factor-out instruction emulation into separate sources")
Signed-off-by: Fangyu Yu <fangyu.yu(a)linux.alibaba.com>
Reviewed-by: Anup Patel <anup(a)brainfault.org>
Link: https://lore.kernel.org/r/20251121133543.46822-1-fangyu.yu@linux.alibaba.com
Signed-off-by: Anup Patel <anup(a)brainfault.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/riscv/kvm/vcpu_insn.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/arch/riscv/kvm/vcpu_insn.c b/arch/riscv/kvm/vcpu_insn.c
index de1f96ea62251..4d89b94128aea 100644
--- a/arch/riscv/kvm/vcpu_insn.c
+++ b/arch/riscv/kvm/vcpu_insn.c
@@ -298,6 +298,22 @@ static int system_opcode_insn(struct kvm_vcpu *vcpu, struct kvm_run *run,
return (rc <= 0) ? rc : 1;
}
+static bool is_load_guest_page_fault(unsigned long scause)
+{
+ /**
+ * If a g-stage page fault occurs, the direct approach
+ * is to let the g-stage page fault handler handle it
+ * naturally, however, calling the g-stage page fault
+ * handler here seems rather strange.
+ * Considering this is a corner case, we can directly
+ * return to the guest and re-execute the same PC, this
+ * will trigger a g-stage page fault again and then the
+ * regular g-stage page fault handler will populate
+ * g-stage page table.
+ */
+ return (scause == EXC_LOAD_GUEST_PAGE_FAULT);
+}
+
/**
* kvm_riscv_vcpu_virtual_insn -- Handle virtual instruction trap
*
@@ -323,6 +339,8 @@ int kvm_riscv_vcpu_virtual_insn(struct kvm_vcpu *vcpu, struct kvm_run *run,
ct->sepc,
&utrap);
if (utrap.scause) {
+ if (is_load_guest_page_fault(utrap.scause))
+ return 1;
utrap.sepc = ct->sepc;
kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
return 1;
@@ -378,6 +396,8 @@ int kvm_riscv_vcpu_mmio_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
insn = kvm_riscv_vcpu_unpriv_read(vcpu, true, ct->sepc,
&utrap);
if (utrap.scause) {
+ if (is_load_guest_page_fault(utrap.scause))
+ return 1;
/* Redirect trap if we failed to read instruction */
utrap.sepc = ct->sepc;
kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
@@ -504,6 +524,8 @@ int kvm_riscv_vcpu_mmio_store(struct kvm_vcpu *vcpu, struct kvm_run *run,
insn = kvm_riscv_vcpu_unpriv_read(vcpu, true, ct->sepc,
&utrap);
if (utrap.scause) {
+ if (is_load_guest_page_fault(utrap.scause))
+ return 1;
/* Redirect trap if we failed to read instruction */
utrap.sepc = ct->sepc;
kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
--
2.51.0
6.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Teichmann <martin.teichmann(a)xfel.eu>
[ Upstream commit e3245f8990431950d20631c72236d4e8cb2dcde8 ]
A successful ebpf tail call does not return to the caller, but to the
caller-of-the-caller, often just finishing the ebpf program altogether.
Any restrictions that the verifier needs to take into account - notably
the fact that the tail call might have modified packet pointers - are to
be checked on the caller-of-the-caller. Checking them on the caller
instead made the verifier refuse perfectly fine programs that use the
packet pointers after a tail call, even though that is harmless: such
code is only executed if the tail call was unsuccessful, i.e. nothing happened.
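For illustration, a minimal sketch (hypothetical, not taken from this
patch) of the kind of program the caller-side check refused; it assumes
a prog-array map named jmp_table and reuses the validated packet pointer
only on the fall-through path, which is reached only when the tail call
failed:

/* Hypothetical example, not part of this patch. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

SEC("tc")
int cls_main(struct __sk_buff *skb)
{
	void *data = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;
	struct ethhdr *eth = data;

	if ((void *)(eth + 1) > data_end)
		return TC_ACT_SHOT;

	bpf_tail_call(skb, &jmp_table, 0);

	/* Reached only if the tail call failed, so eth is still valid;
	 * the old check nevertheless invalidated it here. */
	return eth->h_proto == bpf_htons(ETH_P_IP) ? TC_ACT_OK : TC_ACT_SHOT;
}

char _license[] SEC("license") = "GPL";

With this change the fall-through path (the failed tail call) keeps its
packet pointers, so the use of eth above is accepted again.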
This patch simulates the behavior of a tail call in the verifier. A
conditional jump to the code after the tail call is added for the case
of an unsuccessful tail call, and a return to the caller is simulated for
a successful tail call.
For the successful case we assume that the tail call returns an int,
as tail calls are currently only allowed in functions that return an
int. We always assume that the tail call modified the packet pointers,
as we do not know what the tail call did.
For the unsuccessful case we know nothing happened, so we do not need to
add new constraints.
This approach also allows other problems that may occur with tail calls
to be checked: we are now able to verify that precision is properly
propagated into subprograms using tail calls, and to check the live
slots in such a subprogram.
Fixes: 1a4607ffba35 ("bpf: consider that tail calls invalidate packet pointers")
Link: https://lore.kernel.org/bpf/20251029105828.1488347-1-martin.teichmann@xfel.…
Signed-off-by: Martin Teichmann <martin.teichmann(a)xfel.eu>
Acked-by: Eduard Zingerman <eddyz87(a)gmail.com>
Link: https://lore.kernel.org/r/20251119160355.1160932-2-martin.teichmann@xfel.eu
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/bpf/verifier.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 52c01c011c6fb..89560e455ce7b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4408,6 +4408,11 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
bt_reg_mask(bt));
return -EFAULT;
}
+ if (insn->src_reg == BPF_REG_0 && insn->imm == BPF_FUNC_tail_call
+ && subseq_idx - idx != 1) {
+ if (bt_subprog_enter(bt))
+ return -EFAULT;
+ }
} else if (opcode == BPF_EXIT) {
bool r0_precise;
@@ -10995,6 +11000,10 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
bool in_callback_fn;
int err;
+ err = bpf_update_live_stack(env);
+ if (err)
+ return err;
+
callee = state->frame[state->curframe];
r0 = &callee->regs[BPF_REG_0];
if (r0->type == PTR_TO_STACK) {
@@ -11912,6 +11921,25 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
env->prog->call_get_func_ip = true;
}
+ if (func_id == BPF_FUNC_tail_call) {
+ if (env->cur_state->curframe) {
+ struct bpf_verifier_state *branch;
+
+ mark_reg_scratched(env, BPF_REG_0);
+ branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false);
+ if (IS_ERR(branch))
+ return PTR_ERR(branch);
+ clear_all_pkt_pointers(env);
+ mark_reg_unknown(env, regs, BPF_REG_0);
+ err = prepare_func_exit(env, &env->insn_idx);
+ if (err)
+ return err;
+ env->insn_idx--;
+ } else {
+ changes_data = false;
+ }
+ }
+
if (changes_data)
clear_all_pkt_pointers(env);
return 0;
@@ -19810,9 +19838,6 @@ static int process_bpf_exit_full(struct bpf_verifier_env *env,
return PROCESS_BPF_EXIT;
if (env->cur_state->curframe) {
- err = bpf_update_live_stack(env);
- if (err)
- return err;
/* exit from nested function */
err = prepare_func_exit(env, &env->insn_idx);
if (err)
--
2.51.0
6.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matt Bobrowski <mattbobrowski(a)google.com>
[ Upstream commit ae24fc8a16b0481ea8c5acbc66453c49ec0431c4 ]
Currently, test_perf_branches_no_hw() relies on the busy loop within
test_perf_branches_common() being slow enough to allow at least one
perf event sample tick to occur before starting to tear down the
backing perf event BPF program. With a relatively small fixed
iteration count of 1,000,000, this is not guaranteed on modern fast
CPUs, causing the test run to subsequently fail with the
following:
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_perf_branches_common:PASS:test_perf_branches_load 0 nsec
test_perf_branches_common:PASS:attach_perf_event 0 nsec
test_perf_branches_common:PASS:set_affinity 0 nsec
check_good_sample:PASS:output not valid 0 nsec
check_good_sample:PASS:read_branches_size 0 nsec
check_good_sample:PASS:read_branches_stack 0 nsec
check_good_sample:PASS:read_branches_stack 0 nsec
check_good_sample:PASS:read_branches_global 0 nsec
check_good_sample:PASS:read_branches_global 0 nsec
check_good_sample:PASS:read_branches_size 0 nsec
test_perf_branches_no_hw:PASS:perf_event_open 0 nsec
test_perf_branches_common:PASS:test_perf_branches_load 0 nsec
test_perf_branches_common:PASS:attach_perf_event 0 nsec
test_perf_branches_common:PASS:set_affinity 0 nsec
check_bad_sample:FAIL:output not valid no valid sample from prog
Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED
Successfully unloaded bpf_testmod.ko.
On a modern CPU (e.g. one with a 3.5 GHz clock rate), executing 1
million increments of a volatile integer can take significantly less
than 1 millisecond. If the spin loop finishes and the perf event
BPF program is detached before the first 1 ms sampling interval
elapses, the perf event never ends up firing. Fix this by bumping the
loop iteration count within test_perf_branches_common() and by adding
another loop termination condition that is directly driven by the
backing perf event BPF program having executed. Notably, a conscious
decision was made not to adjust the sample_freq value, as that is not
a reliable way to fix the problem: it effectively still leaves the
race window open.
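As a rough back-of-the-envelope check of the timing claim above
(assuming roughly one loop iteration per cycle): 1,000,000 / 3.5 GHz
is about 0.29 ms, well under the 1 ms sampling period, so the spin loop
and the detach can both complete before the first sample is due.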
Fixes: 67306f84ca78c ("selftests/bpf: Add bpf_read_branch_records() selftest")
Signed-off-by: Matt Bobrowski <mattbobrowski(a)google.com>
Reviewed-by: Jiri Olsa <jolsa(a)kernel.org>
Link: https://lore.kernel.org/r/20251119143540.2911424-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/bpf/prog_tests/perf_branches.c | 16 ++++++++++++++--
.../selftests/bpf/progs/test_perf_branches.c | 3 +++
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/perf_branches.c b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
index 06c7986131d96..0a7ef770c487c 100644
--- a/tools/testing/selftests/bpf/prog_tests/perf_branches.c
+++ b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
@@ -15,6 +15,10 @@ static void check_good_sample(struct test_perf_branches *skel)
int pbe_size = sizeof(struct perf_branch_entry);
int duration = 0;
+ if (CHECK(!skel->bss->run_cnt, "invalid run_cnt",
+ "checked sample validity before prog run"))
+ return;
+
if (CHECK(!skel->bss->valid, "output not valid",
"no valid sample from prog"))
return;
@@ -45,6 +49,10 @@ static void check_bad_sample(struct test_perf_branches *skel)
int written_stack = skel->bss->written_stack_out;
int duration = 0;
+ if (CHECK(!skel->bss->run_cnt, "invalid run_cnt",
+ "checked sample validity before prog run"))
+ return;
+
if (CHECK(!skel->bss->valid, "output not valid",
"no valid sample from prog"))
return;
@@ -83,8 +91,12 @@ static void test_perf_branches_common(int perf_fd,
err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
if (CHECK(err, "set_affinity", "cpu #0, err %d\n", err))
goto out_destroy;
- /* spin the loop for a while (random high number) */
- for (i = 0; i < 1000000; ++i)
+
+ /* Spin the loop for a while by using a high iteration count, and by
+ * checking whether the specific run count marker has been explicitly
+ * incremented at least once by the backing perf_event BPF program.
+ */
+ for (i = 0; i < 100000000 && !*(volatile int *)&skel->bss->run_cnt; ++i)
++j;
test_perf_branches__detach(skel);
diff --git a/tools/testing/selftests/bpf/progs/test_perf_branches.c b/tools/testing/selftests/bpf/progs/test_perf_branches.c
index a1ccc831c882f..05ac9410cd68c 100644
--- a/tools/testing/selftests/bpf/progs/test_perf_branches.c
+++ b/tools/testing/selftests/bpf/progs/test_perf_branches.c
@@ -8,6 +8,7 @@
#include <bpf/bpf_tracing.h>
int valid = 0;
+int run_cnt = 0;
int required_size_out = 0;
int written_stack_out = 0;
int written_global_out = 0;
@@ -24,6 +25,8 @@ int perf_branches(void *ctx)
__u64 entries[4 * 3] = {0};
int required_size, written_stack, written_global;
+ ++run_cnt;
+
/* write to stack */
written_stack = bpf_read_branch_records(ctx, entries, sizeof(entries), 0);
/* ignore spurious events */
--
2.51.0
6.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gopi Krishna Menon <krishnagopi487(a)gmail.com>
[ Upstream commit a5160af78be7fcf3ade6caab0a14e349560c96d7 ]
The previous commit removed the PAGE_SIZE limit on the transfer length
of the raw_io buffer in order to avoid problems with emulating USB
devices whose full configuration descriptor exceeds PAGE_SIZE in
length. However, this also removed the upper bound on the user-supplied
length, allowing very large values to be passed to the allocator.
When syzbot fuzzed the transfer length with a very large value (1.81GB),
kmalloc() fell back to the page allocator, which triggered a kernel
warning because the page allocator cannot handle allocations larger
than MAX_PAGE_ORDER/KMALLOC_MAX_SIZE.
Since no limit is imposed on the buffer size for either control or
non-control transfers, cap the raw_io transfer length at
KMALLOC_MAX_SIZE and return -EINVAL for larger transfer lengths to
prevent any warnings from the page allocator.
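As a purely hypothetical userspace sketch (not part of this patch; it
assumes an already opened and initialized raw-gadget fd), an oversized
transfer length is now rejected up front with -EINVAL instead of
reaching the allocator:

/* Hypothetical sketch; 'fd' is assumed to be a set-up raw-gadget fd. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/usb/raw_gadget.h>

static void try_oversized_ep0_write(int fd)
{
	struct usb_raw_ep_io io;

	memset(&io, 0, sizeof(io));
	io.ep = 0;
	io.flags = 0;
	io.length = 0x74000000;	/* ~1.81GB, roughly the size from the report */

	if (ioctl(fd, USB_RAW_IOCTL_EP0_WRITE, &io) < 0)
		/* With the cap in place this fails with EINVAL rather
		 * than triggering a page allocator warning. */
		printf("EP0 write rejected: %s\n", strerror(errno));
}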
Fixes: 37b9dd0d114a ("usb: raw-gadget: do not limit transfer length")
Tested-by: syzbot+d8fd35fa6177afa8c92b(a)syzkaller.appspotmail.com
Reported-by: syzbot+d8fd35fa6177afa8c92b(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68fc07a0.a70a0220.3bf6c6.01ab.GAE@google.com/
Signed-off-by: Gopi Krishna Menon <krishnagopi487(a)gmail.com>
Reviewed-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Link: https://patch.msgid.link/20251028165659.50962-1-krishnagopi487@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/usb/gadget/legacy/raw_gadget.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/gadget/legacy/raw_gadget.c b/drivers/usb/gadget/legacy/raw_gadget.c
index b71680c58de6c..46f343ba48b3d 100644
--- a/drivers/usb/gadget/legacy/raw_gadget.c
+++ b/drivers/usb/gadget/legacy/raw_gadget.c
@@ -40,6 +40,7 @@ MODULE_LICENSE("GPL");
static DEFINE_IDA(driver_id_numbers);
#define DRIVER_DRIVER_NAME_LENGTH_MAX 32
+#define USB_RAW_IO_LENGTH_MAX KMALLOC_MAX_SIZE
#define RAW_EVENT_QUEUE_SIZE 16
@@ -667,6 +668,8 @@ static void *raw_alloc_io_data(struct usb_raw_ep_io *io, void __user *ptr,
return ERR_PTR(-EINVAL);
if (!usb_raw_io_flags_valid(io->flags))
return ERR_PTR(-EINVAL);
+ if (io->length > USB_RAW_IO_LENGTH_MAX)
+ return ERR_PTR(-EINVAL);
if (get_from_user)
data = memdup_user(ptr + sizeof(*io), io->length);
else {
--
2.51.0
6.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jisheng Zhang <jszhang(a)kernel.org>
[ Upstream commit 2b94b054ac4974ad2f89f7f7461840c851933adb ]
On most platforms, dwc2 needs a PHY controller, a clock, and a power
supply, all of which must be enabled for the controller to operate
properly. If dwc2 is configured in peripheral mode, all three hardware
resources are disabled at the end of probe:
	/* Gadget code manages lowlevel hw on its own */
	if (hsotg->dr_mode == USB_DR_MODE_PERIPHERAL)
		dwc2_lowlevel_hw_disable(hsotg);
But dwc2_suspend() tries to read dwc2's registers to check whether it
is in device mode, which results in a hang during suspend when dwc2 is
configured in peripheral mode.
Fix this hang by bypassing suspend/resume if the lowlevel hw isn't
enabled.
Fixes: 09a75e857790 ("usb: dwc2: refactor common low-level hw code to platform.c")
Signed-off-by: Jisheng Zhang <jszhang(a)kernel.org>
Link: https://patch.msgid.link/20251104002503.17158-3-jszhang@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/usb/dwc2/platform.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc2/platform.c b/drivers/usb/dwc2/platform.c
index b07bdf16326a4..ef0d730770347 100644
--- a/drivers/usb/dwc2/platform.c
+++ b/drivers/usb/dwc2/platform.c
@@ -649,9 +649,13 @@ static int dwc2_driver_probe(struct platform_device *dev)
static int __maybe_unused dwc2_suspend(struct device *dev)
{
struct dwc2_hsotg *dwc2 = dev_get_drvdata(dev);
- bool is_device_mode = dwc2_is_device_mode(dwc2);
+ bool is_device_mode;
int ret = 0;
+ if (!dwc2->ll_hw_enabled)
+ return 0;
+
+ is_device_mode = dwc2_is_device_mode(dwc2);
if (is_device_mode)
dwc2_hsotg_suspend(dwc2);
@@ -728,6 +732,9 @@ static int __maybe_unused dwc2_resume(struct device *dev)
struct dwc2_hsotg *dwc2 = dev_get_drvdata(dev);
int ret = 0;
+ if (!dwc2->ll_hw_enabled)
+ return 0;
+
if (dwc2->phy_off_for_suspend && dwc2->ll_hw_enabled) {
ret = __dwc2_lowlevel_hw_enable(dwc2);
if (ret)
--
2.51.0