From: Chengming Zhou <zhouchengming(a)bytedance.com>
commit e5c0ca13659e9d18f53368d651ed7e6e433ec1cf upstream.
Chuck reported [1] an IO hang problem on NFS exports that reside on SATA
devices and bisected to commit 615939a2ae73 ("blk-mq: defer to the normal
submission path for post-flush requests").
We analysed the IO hang problem, found there are two postflush requests
waiting for each other.
The first postflush request completed the REQ_FSEQ_DATA sequence, so go to
the REQ_FSEQ_POSTFLUSH sequence and added in the flush pending list, but
failed to blk_kick_flush() because of the second postflush request which
is inflight waiting in scheduler queue.
The second postflush waiting in scheduler queue can't be dispatched because
the first postflush hasn't released scheduler resource even though it has
completed by itself.
Fix it by releasing scheduler resource when the first postflush request
completed, so the second postflush can be dispatched and completed, then
make blk_kick_flush() succeed.
While at it, remove the check for e->ops.finish_request, as all
schedulers set that. Reaffirm this requirement by adding a WARN_ON_ONCE()
at scheduler registration time, just like we do for insert_requests and
dispatch_request.
[1] https://lore.kernel.org/all/7A57C7AE-A51A-4254-888B-FE15CA21F9E9@oracle.com/
Link: https://lore.kernel.org/linux-block/20230819031206.2744005-1-chengming.zhou…
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202308172100.8ce4b853-oliver.sang@intel.com
Fixes: 615939a2ae73 ("blk-mq: defer to the normal submission path for post-flush requests")
Reported-by: Chuck Lever <chuck.lever(a)oracle.com>
Signed-off-by: Chengming Zhou <zhouchengming(a)bytedance.com>
Tested-by: Chuck Lever <chuck.lever(a)oracle.com>
Link: https://lore.kernel.org/r/20230813152325.3017343-1-chengming.zhou@linux.dev
[axboe: folded in incremental fix and added tags]
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
[bvanassche: changed RQF_USE_SCHED into RQF_ELVPRIV; restored the
finish_request pointer check before calling finish_request and removed
the new warning from the elevator code. This patch fixes an I/O hang
when submitting a REQ_FUA request to a request queue for a zoned block
device for which FUA has been disabled (QUEUE_FLAG_FUA is not set).]
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
---
block/blk-mq.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7ed6b9469f97..07610505c177 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -675,6 +675,22 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
}
EXPORT_SYMBOL_GPL(blk_mq_alloc_request_hctx);
+static void blk_mq_finish_request(struct request *rq)
+{
+ struct request_queue *q = rq->q;
+
+ if ((rq->rq_flags & RQF_ELVPRIV) &&
+ q->elevator->type->ops.finish_request) {
+ q->elevator->type->ops.finish_request(rq);
+ /*
+ * For postflush request that may need to be
+ * completed twice, we should clear this flag
+ * to avoid double finish_request() on the rq.
+ */
+ rq->rq_flags &= ~RQF_ELVPRIV;
+ }
+}
+
static void __blk_mq_free_request(struct request *rq)
{
struct request_queue *q = rq->q;
@@ -701,9 +717,7 @@ void blk_mq_free_request(struct request *rq)
{
struct request_queue *q = rq->q;
- if ((rq->rq_flags & RQF_ELVPRIV) &&
- q->elevator->type->ops.finish_request)
- q->elevator->type->ops.finish_request(rq);
+ blk_mq_finish_request(rq);
if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
laptop_io_completion(q->disk->bdi);
@@ -1025,6 +1039,8 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
if (blk_mq_need_time_stamp(rq))
__blk_mq_end_request_acct(rq, ktime_get_ns());
+ blk_mq_finish_request(rq);
+
if (rq->end_io) {
rq_qos_done(rq->q, rq);
if (rq->end_io(rq, error) == RQ_END_IO_FREE)
@@ -1079,6 +1095,8 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob)
if (iob->need_ts)
__blk_mq_end_request_acct(rq, now);
+ blk_mq_finish_request(rq);
+
rq_qos_done(rq->q, rq);
/*
commit f45812cc23fb74bef62d4eb8a69fe7218f4b9f2a upstream.
Work around a quirk in a few old (2011-ish) UEFI implementations, where
a call to `GetNextVariableName` with a buffer size larger than 512 bytes
will always return EFI_INVALID_PARAMETER.
There is some lore around EFI variable names being up to 1024 bytes in
size, but this has no basis in the UEFI specification, and the upper
bounds are typically platform specific, and apply to the entire variable
(name plus payload).
Given that Linux does not permit creating files with names longer than
NAME_MAX (255) bytes, 512 bytes (== 256 UTF-16 characters) is a
reasonable limit.
Cc: <stable(a)vger.kernel.org> # 6.1+
Signed-off-by: Tim Schumacher <timschumi(a)gmx.de>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
[timschumi(a)gmx.de: adjusted diff for changed context and code move]
Signed-off-by: Tim Schumacher <timschumi(a)gmx.de>
---
Please apply this patch to stable kernel 5.15, 5.10, 5.4, and 4.19
respectively. Kernel 6.1 and upwards were already handled via CC,
5.15 and below required a separate patch due to a slight refactor of
surrounding code in bbc6d2c6ef22 ("efi: vars: Switch to new wrapper
layer") and a subsequent code move in 2d82e6227ea1 ("efi: vars: Move
efivar caching layer into efivarfs").
Please note that the upper Signed-off-by tags are remnants from the
original patch, I documented my modifications below them and added
another sign-off. As far as I was able to gather, this is the expected
format for diverged stable patches.
I'm not sure on the specifics of manual stable backports, so let me
know in case anything doesn't follow the process. The linux-efi team
and list are on CC both for documentation/review purposes and in case
a new sign-off/ack of theirs is required.
---
drivers/firmware/efi/vars.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c
index cae590bd08f2..eaed1ddcc803 100644
--- a/drivers/firmware/efi/vars.c
+++ b/drivers/firmware/efi/vars.c
@@ -415,7 +415,7 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *),
void *data, bool duplicates, struct list_head *head)
{
const struct efivar_operations *ops;
- unsigned long variable_name_size = 1024;
+ unsigned long variable_name_size = 512;
efi_char16_t *variable_name;
efi_status_t status;
efi_guid_t vendor_guid;
@@ -438,12 +438,13 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *),
}
/*
- * Per EFI spec, the maximum storage allocated for both
- * the variable name and variable data is 1024 bytes.
+ * A small set of old UEFI implementations reject sizes
+ * above a certain threshold, the lowest seen in the wild
+ * is 512.
*/
do {
- variable_name_size = 1024;
+ variable_name_size = 512;
status = ops->get_next_variable(&variable_name_size,
variable_name,
@@ -491,9 +492,13 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *),
break;
case EFI_NOT_FOUND:
break;
+ case EFI_BUFFER_TOO_SMALL:
+ pr_warn("efivars: Variable name size exceeds maximum (%lu > 512)\n",
+ variable_name_size);
+ status = EFI_NOT_FOUND;
+ break;
default:
- printk(KERN_WARNING "efivars: get_next_variable: status=%lx\n",
- status);
+ pr_warn("efivars: get_next_variable: status=%lx\n", status);
status = EFI_NOT_FOUND;
break;
}
--
2.44.0
From: Yang Jihong <yangjihong1(a)huawei.com>
commit 6b959ba22d34ca793ffdb15b5715457c78e38b1a upstream.
perf_output_read_group may respond to IPI request of other cores and invoke
__perf_install_in_context function. As a result, hwc configuration is modified.
causing inconsistency and unexpected consequences.
Interrupts are not disabled when perf_output_read_group reads PMU counter.
In this case, IPI request may be received from other cores.
As a result, PMU configuration is modified and an error occurs when
reading PMU counter:
CPU0 CPU1
__se_sys_perf_event_open
perf_install_in_context
perf_output_read_group smp_call_function_single
for_each_sibling_event(sub, leader) { generic_exec_single
if ((sub != event) && remote_function
(sub->state == PERF_EVENT_STATE_ACTIVE)) |
<enter IPI handler: __perf_install_in_context> <----RAISE IPI-----+
__perf_install_in_context
ctx_resched
event_sched_out
armpmu_del
...
hwc->idx = -1; // event->hwc.idx is set to -1
...
<exit IPI>
sub->pmu->read(sub);
armpmu_read
armv8pmu_read_counter
armv8pmu_read_hw_counter
int idx = event->hw.idx; // idx = -1
u64 val = armv8pmu_read_evcntr(idx);
u32 counter = ARMV8_IDX_TO_COUNTER(idx); // invalid counter = 30
read_pmevcntrn(counter) // undefined instruction
Signed-off-by: Yang Jihong <yangjihong1(a)huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lkml.kernel.org/r/20220902082918.179248-1-yangjihong1@huawei.com
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
---
This race may also lead to observed behavior like RCU stalls, hang tasks,
OOM. Likely due to list corruption or a similar root cause.
---
kernel/events/core.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4e5a73c7db12..e79cd0fd1d2b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7119,9 +7119,16 @@ static void perf_output_read_group(struct perf_output_handle *handle,
{
struct perf_event *leader = event->group_leader, *sub;
u64 read_format = event->attr.read_format;
+ unsigned long flags;
u64 values[6];
int n = 0;
+ /*
+ * Disabling interrupts avoids all counter scheduling
+ * (context switches, timer based rotation and IPIs).
+ */
+ local_irq_save(flags);
+
values[n++] = 1 + leader->nr_siblings;
if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
@@ -7157,6 +7164,8 @@ static void perf_output_read_group(struct perf_output_handle *handle,
__output_copy(handle, values, n * sizeof(u64));
}
+
+ local_irq_restore(flags);
}
#define PERF_FORMAT_TOTAL_TIMES (PERF_FORMAT_TOTAL_TIME_ENABLED|\
--
2.34.1
commit 38b43539d64b2fa020b3b9a752a986769f87f7a6 upstream.
Fix an incorrect number of pages being released for buffers that do not
start at the beginning of a page.
[ Tony: backport to v6.1 by replacing bio_release_page() loop with
folio_put_refs() as commits fd363244e883 and e4cc64657bec are not
present. ]
Fixes: 1b151e2435fc ("block: Remove special-casing of compound pages")
Cc: stable(a)vger.kernel.org
Signed-off-by: Tony Battersby <tonyb(a)cybernetics.com>
Tested-by: Greg Edwards <gedwards(a)ddn.com>
Link: https://lore.kernel.org/r/86e592a9-98d4-4cff-a646-0c0084328356@cybernetics.…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
This is the backport for 6.1.
The upstream patch should apply cleanly to 6.6, 6.7, and 6.8.
This patch does not need to be backported to 5.15, 5.10, 5.4, or 4.19,
since the backport of 1b151e2435fc to those kernels did not include
the bug fixed by this patch.
block/bio.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 74c2818c7ec9..3318e0022fdf 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1112,19 +1112,16 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty)
struct folio_iter fi;
bio_for_each_folio_all(fi, bio) {
- struct page *page;
- size_t done = 0;
+ size_t nr_pages;
if (mark_dirty) {
folio_lock(fi.folio);
folio_mark_dirty(fi.folio);
folio_unlock(fi.folio);
}
- page = folio_page(fi.folio, fi.offset / PAGE_SIZE);
- do {
- folio_put(fi.folio);
- done += PAGE_SIZE;
- } while (done < fi.length);
+ nr_pages = (fi.offset + fi.length - 1) / PAGE_SIZE -
+ fi.offset / PAGE_SIZE + 1;
+ folio_put_refs(fi.folio, nr_pages);
}
}
EXPORT_SYMBOL_GPL(__bio_release_pages);
base-commit: 61adba85cc40287232a539e607164f273260e0fe
--
2.25.1
This is a backport of recently upstreamed fix for XPS 9530 sound issue.
Both apply cleanly to 6.8.y, and could also be cherry-picked from upstream.
Ideally should be applied to all branches where upstream commit
d110858a6925827609d11db8513d76750483ec06 exists (6.8.y) or was backported
(6.7.y) as it adds initial yet incomplete support for this laptop. Patches
for 6.7.y require modification, will be submitted separately.
Signed-off-by: Aleksandrs Vinarskis <alex.vinarskis(a)gmail.com>
Aleksandrs Vinarskis (2):
mfd: intel-lpss: Switch to generalized quirk table
mfd: intel-lpss: Introduce QUIRK_CLOCK_DIVIDER_UNITY for XPS 9530
drivers/mfd/intel-lpss-pci.c | 28 ++++++++++++++++++++--------
drivers/mfd/intel-lpss.c | 9 ++++++++-
drivers/mfd/intel-lpss.h | 14 +++++++++++++-
3 files changed, 41 insertions(+), 10 deletions(-)
--
2.40.1
v2:
- This includes the backport of recently upstreamed mitigation of a CPU
vulnerability Register File Data Sampling (RFDS) (CVE-2023-28746).
This is because RFDS has a dependency on "Delay VERW" series, and it
is convenient to merge them together.
- rebased to v5.10.212
v1: https://lore.kernel.org/r/20240305-delay-verw-backport-5-10-y-v1-0-50bf452e…
This is the backport of recently upstreamed series that moves VERW
execution to a later point in exit-to-user path. This is needed because
in some cases it may be possible for data accessed after VERW executions
may end into MDS affected CPU buffers. Moving VERW closer to ring
transition reduces the attack surface.
- The series includes a dependency commit f87bc8dc7a7c ("x86/asm: Add
_ASM_RIP() macro for x86-64 (%rip) suffix").
- Patch 2 includes a change that adds runtime patching for jmp (instead
of verw in original series) due to lack of rip-relative relocation
support in kernels <v6.5.
- Fixed warning:
arch/x86/entry/entry.o: warning: objtool: mds_verw_sel+0x0: unreachable instruction.
- Resolved merge conflicts in:
syscall_return_via_sysret in entry_64.S
swapgs_restore_regs_and_return_to_usermode in entry_64.S.
__vmx_vcpu_run in vmenter.S.
vmx_update_fb_clear_dis in vmx.c.
- Boot tested with KASLR and KPTI enabled.
- Verified VERW being executed with mitigation ON.
To: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
H. Peter Anvin (Intel) (1):
x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix
Pawan Gupta (9):
x86/bugs: Add asm helpers for executing VERW
x86/entry_64: Add VERW just before userspace transition
x86/entry_32: Add VERW just before userspace transition
x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
KVM/VMX: Move VERW closer to VMentry for MDS mitigation
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Sean Christopherson (1):
KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/reg-file-data-sampling.rst | 104 ++++++++++++++++++++
Documentation/admin-guide/kernel-parameters.txt | 21 ++++
Documentation/x86/mds.rst | 38 +++++---
arch/x86/Kconfig | 11 +++
arch/x86/entry/entry.S | 23 +++++
arch/x86/entry/entry_32.S | 3 +
arch/x86/entry/entry_64.S | 10 ++
arch/x86/entry/entry_64_compat.S | 1 +
arch/x86/include/asm/asm.h | 5 +
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/entry-common.h | 1 -
arch/x86/include/asm/irqflags.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/include/asm/nospec-branch.h | 27 +++---
arch/x86/kernel/cpu/bugs.c | 107 ++++++++++++++++++---
arch/x86/kernel/cpu/common.c | 38 +++++++-
arch/x86/kernel/nmi.c | 3 -
arch/x86/kvm/vmx/run_flags.h | 7 +-
arch/x86/kvm/vmx/vmenter.S | 9 +-
arch/x86/kvm/vmx/vmx.c | 12 ++-
arch/x86/kvm/x86.c | 5 +-
drivers/base/cpu.c | 8 ++
include/linux/cpu.h | 2 +
25 files changed, 394 insertions(+), 54 deletions(-)
---
base-commit: 7cfcd0ed929b28ff6942c2bee15816d08d6f7266
change-id: 20240304-delay-verw-backport-5-10-y-00aad69432f4
Best regards,
--
Thanks,
Pawan