The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6c41468c7c12d74843bb414fc00307ea8a6318c3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023041134-curvature-campsite-e51b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
6c41468c7c12 ("KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
6ad75c5c99f7 ("KVM: x86: Rename kvm_x86_ops.queue_exception to inject_exception")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
9bd1f0efa859 ("KVM: nVMX: Clear IDT vectoring on nested VM-Exit for double/triple fault")
c3634d25fbee ("KVM: nVMX: Leave most VM-Exit info fields unmodified on failed VM-Entry")
1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
db663af4a001 ("kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense")
b9f3973ab3a8 ("KVM: x86: nSVM: implement nested VMLOAD/VMSAVE")
23e5092b6e2a ("KVM: SVM: Rename hook implementations to conform to kvm_x86_ops' names")
e27bc0440ebd ("KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names")
068f7ea61895 ("KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch")
e1779c2714c3 ("KVM: x86: nSVM: fix potential NULL derefernce on nested migration")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c41468c7c12d74843bb414fc00307ea8a6318c3 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 22 Mar 2023 07:32:59 -0700
Subject: [PATCH] KVM: x86: Clear "has_error_code", not "error_code", for RM
exception injection
When injecting an exception into a vCPU in Real Mode, suppress the error
code by clearing the flag that tracks whether the error code is valid, not
by clearing the error code itself. The "typo" was introduced by recent
fix for SVM's funky Paged Real Mode.
Opportunistically hoist the logic above the tracepoint so that the trace
is coherent with respect to what is actually injected (this was also the
behavior prior to the buggy commit).
Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230322143300.2209476-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 45017576ad5e..7d6f98b7635f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9908,13 +9908,20 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
static void kvm_inject_exception(struct kvm_vcpu *vcpu)
{
+ /*
+ * Suppress the error code if the vCPU is in Real Mode, as Real Mode
+ * exceptions don't report error codes. The presence of an error code
+ * is carried with the exception and only stripped when the exception
+ * is injected as intercepted #PF VM-Exits for AMD's Paged Real Mode do
+ * report an error code despite the CPU being in Real Mode.
+ */
+ vcpu->arch.exception.has_error_code &= is_protmode(vcpu);
+
trace_kvm_inj_exception(vcpu->arch.exception.vector,
vcpu->arch.exception.has_error_code,
vcpu->arch.exception.error_code,
vcpu->arch.exception.injected);
- if (vcpu->arch.exception.error_code && !is_protmode(vcpu))
- vcpu->arch.exception.error_code = false;
static_call(kvm_x86_inject_exception)(vcpu);
}
xen kbdfront registers itself as being able to deliver *any* key since
it doesn't know what keys the backend may produce.
Unfortunately, the generated modalias gets too large and uevent creation
fails with -ENOMEM.
This can lead to gdm not using the keyboard since there is no seat
associated [1] and the debian installer crashing [2].
Trim the ranges of key capabilities by removing some BTN_* ranges.
While doing this, some neighboring undefined ranges are removed to trim
it further.
An upper limit of KEY_KBD_LCD_MENU5 is still too large. Use an upper
limit of KEY_BRIGHTNESS_MENU.
This removes:
BTN_DPAD_UP(0x220)..BTN_DPAD_RIGHT(0x223)
Empty space 0x224..0x229
Empty space 0x28a..0x28f
KEY_MACRO1(0x290)..KEY_MACRO30(0x2ad)
KEY_MACRO_RECORD_START 0x2b0
KEY_MACRO_RECORD_STOP 0x2b1
KEY_MACRO_PRESET_CYCLE 0x2b2
KEY_MACRO_PRESET1(0x2b3)..KEY_MACRO_PRESET3(0xb5)
Empty space 0x2b6..0x2b7
KEY_KBD_LCD_MENU1(0x2b8)..KEY_KBD_LCD_MENU5(0x2bc)
Empty space 0x2bd..0x2bf
BTN_TRIGGER_HAPPY(0x2c0)..BTN_TRIGGER_HAPPY40(0x2e7)
Empty space 0x2e8..0x2ff
The modalias shrinks from 2082 to 1550 bytes.
A chunk of keys need to be removed to allow the keyboard to be used.
This may break some functionality, but the hope is these macro keys are
uncommon and don't affect any users.
[1] https://github.com/systemd/systemd/issues/22944
[2] https://lore.kernel.org/xen-devel/87o8dw52jc.fsf@vps.thesusis.net/T/
Cc: Phillip Susi <phill(a)thesusis.net>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jason Andryuk <jandryuk(a)gmail.com>
Reviewed-by: Mattijs Korpershoek <mkorpershoek(a)baylibre.com>
---
v3:
Add Mattijs R-b
Put /* and */ on separate lines
---
drivers/input/misc/xen-kbdfront.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/input/misc/xen-kbdfront.c b/drivers/input/misc/xen-kbdfront.c
index 67f1c7364c95..d59ba8f9852e 100644
--- a/drivers/input/misc/xen-kbdfront.c
+++ b/drivers/input/misc/xen-kbdfront.c
@@ -256,7 +256,16 @@ static int xenkbd_probe(struct xenbus_device *dev,
__set_bit(EV_KEY, kbd->evbit);
for (i = KEY_ESC; i < KEY_UNKNOWN; i++)
__set_bit(i, kbd->keybit);
- for (i = KEY_OK; i < KEY_MAX; i++)
+ /*
+ * In theory we want to go KEY_OK..KEY_MAX, but that grows the
+ * modalias line too long. There is a gap of buttons from
+ * BTN_DPAD_UP..BTN_DPAD_RIGHT and KEY_ALS_TOGGLE is the next
+ * defined. Then continue up to KEY_BRIGHTNESS_MENU as an upper
+ * limit.
+ */
+ for (i = KEY_OK; i < BTN_DPAD_UP; i++)
+ __set_bit(i, kbd->keybit);
+ for (i = KEY_ALS_TOGGLE; i <= KEY_BRIGHTNESS_MENU; i++)
__set_bit(i, kbd->keybit);
ret = input_register_device(kbd);
--
2.41.0
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 64620e0a1e712a778095bd35cbb277dc2259281f Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel(a)iogearbox.net>
Date: Tue, 11 Jan 2022 14:43:41 +0000
Subject: [PATCH] bpf: Fix out of bounds access for ringbuf helpers
Both bpf_ringbuf_submit() and bpf_ringbuf_discard() have ARG_PTR_TO_ALLOC_MEM
in their bpf_func_proto definition as their first argument. They both expect
the result from a prior bpf_ringbuf_reserve() call which has a return type of
RET_PTR_TO_ALLOC_MEM_OR_NULL.
Meaning, after a NULL check in the code, the verifier will promote the register
type in the non-NULL branch to a PTR_TO_MEM and in the NULL branch to a known
zero scalar. Generally, pointer arithmetic on PTR_TO_MEM is allowed, so the
latter could have an offset.
The ARG_PTR_TO_ALLOC_MEM expects a PTR_TO_MEM register type. However, the non-
zero result from bpf_ringbuf_reserve() must be fed into either bpf_ringbuf_submit()
or bpf_ringbuf_discard() but with the original offset given it will then read
out the struct bpf_ringbuf_hdr mapping.
The verifier missed to enforce a zero offset, so that out of bounds access
can be triggered which could be used to escalate privileges if unprivileged
BPF was enabled (disabled by default in kernel).
Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: <tr3e.wang(a)gmail.com> (SecCoder Security Lab)
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: John Fastabend <john.fastabend(a)gmail.com>
Acked-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e0b3f4d683eb..c72c57a6684f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5318,9 +5318,15 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
case PTR_TO_BUF:
case PTR_TO_BUF | MEM_RDONLY:
case PTR_TO_STACK:
+ /* Some of the argument types nevertheless require a
+ * zero register offset.
+ */
+ if (arg_type == ARG_PTR_TO_ALLOC_MEM)
+ goto force_off_check;
break;
/* All the rest must be rejected: */
default:
+force_off_check:
err = __check_ptr_off_reg(env, reg, regno,
type == PTR_TO_BTF_ID);
if (err < 0)
The commit 3a2dbc510c43 ("driver core: fw_devlink: Don't purge child
fwnode's consumer links") introduces the possibility to use the
supplier's parent device instead of the supplier itself.
In that case the supplier fwnode used is not updated and is no more
consistent with the supplier device used.
Use the fwnode consistent with the supplier device when checking flags.
Fixes: 3a2dbc510c43 ("driver core: fw_devlink: Don't purge child fwnode's consumer links")
Cc: stable(a)vger.kernel.org
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
---
Changes v2 -> v3:
Do not update the supplier handle in order to keep the original handle
for debug traces.
Changes v1 -> v2:
Remove sup_handle check and related pr_debug() call as sup_handle cannot be
invalid if sup_dev is valid.
drivers/base/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 4d8b315c48a1..440b52ec027f 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2082,7 +2082,7 @@ static int fw_devlink_create_devlink(struct device *con,
* supplier device indefinitely.
*/
if (sup_dev->links.status == DL_DEV_NO_DRIVER &&
- sup_handle->flags & FWNODE_FLAG_INITIALIZED) {
+ sup_dev->fwnode->flags & FWNODE_FLAG_INITIALIZED) {
dev_dbg(con,
"Not linking %pfwf - dev might never probe\n",
sup_handle);
--
2.41.0
From: Jeff Vanhoof <qjv001(a)motorola.com>
arm-smmu related crashes seen after a Missed ISOC interrupt when
no_interrupt=1 is used. This can happen if the hardware is still using
the data associated with a TRB after the usb_request's ->complete call
has been made. Instead of immediately releasing a request when a Missed
ISOC interrupt has occurred, this change will add logic to cancel the
request instead where it will eventually be released when the
END_TRANSFER command has completed. This logic is similar to some of the
cleanup done in dwc3_gadget_ep_dequeue.
Fixes: 6d8a019614f3 ("usb: dwc3: gadget: check for Missed Isoc from event status")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jeff Vanhoof <qjv001(a)motorola.com>
Co-developed-by: Dan Vacura <w36195(a)motorola.com>
Signed-off-by: Dan Vacura <w36195(a)motorola.com>
---
V1 -> V3:
- no change, new patch in series
drivers/usb/dwc3/core.h | 1 +
drivers/usb/dwc3/gadget.c | 38 ++++++++++++++++++++++++++------------
2 files changed, 27 insertions(+), 12 deletions(-)
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 8f9959ba9fd4..9b005d912241 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -943,6 +943,7 @@ struct dwc3_request {
#define DWC3_REQUEST_STATUS_DEQUEUED 3
#define DWC3_REQUEST_STATUS_STALLED 4
#define DWC3_REQUEST_STATUS_COMPLETED 5
+#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
#define DWC3_REQUEST_STATUS_UNKNOWN -1
u8 epnum;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 079cd333632e..411532c5c378 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2021,6 +2021,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
case DWC3_REQUEST_STATUS_STALLED:
dwc3_gadget_giveback(dep, req, -EPIPE);
break;
+ case DWC3_REQUEST_STATUS_MISSED_ISOC:
+ dwc3_gadget_giveback(dep, req, -EXDEV);
+ break;
default:
dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
dwc3_gadget_giveback(dep, req, -ECONNRESET);
@@ -3402,21 +3405,32 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
struct dwc3 *dwc = dep->dwc;
bool no_started_trb = true;
- dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
+ if (status == -EXDEV) {
+ struct dwc3_request *tmp;
+ struct dwc3_request *req;
- if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
- goto out;
+ if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
+ dwc3_stop_active_transfer(dep, true, true);
- if (!dep->endpoint.desc)
- return no_started_trb;
+ list_for_each_entry_safe(req, tmp, &dep->started_list, list)
+ dwc3_gadget_move_cancelled_request(req,
+ DWC3_REQUEST_STATUS_MISSED_ISOC);
+ } else {
+ dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
- if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
- list_empty(&dep->started_list) &&
- (list_empty(&dep->pending_list) || status == -EXDEV))
- dwc3_stop_active_transfer(dep, true, true);
- else if (dwc3_gadget_ep_should_continue(dep))
- if (__dwc3_gadget_kick_transfer(dep) == 0)
- no_started_trb = false;
+ if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
+ goto out;
+
+ if (!dep->endpoint.desc)
+ return no_started_trb;
+
+ if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
+ list_empty(&dep->started_list) && list_empty(&dep->pending_list))
+ dwc3_stop_active_transfer(dep, true, true);
+ else if (dwc3_gadget_ep_should_continue(dep))
+ if (__dwc3_gadget_kick_transfer(dep) == 0)
+ no_started_trb = false;
+ }
out:
/*
--
2.34.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d5fbeff1ab812b6c473b6924bee8748469462e2c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023100421-divisible-bacterium-18b5@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d5fbeff1ab812b6c473b6924bee8748469462e2c Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Sat, 16 Sep 2023 12:52:46 +0200
Subject: [PATCH] mptcp: move __mptcp_error_report in protocol.c
This will simplify the next patch ("mptcp: process pending subflow error
on close").
No functional change intended.
Cc: stable(a)vger.kernel.org # v5.12+
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index a7fc16f5175d..915860027b1a 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -770,6 +770,42 @@ static bool __mptcp_ofo_queue(struct mptcp_sock *msk)
return moved;
}
+void __mptcp_error_report(struct sock *sk)
+{
+ struct mptcp_subflow_context *subflow;
+ struct mptcp_sock *msk = mptcp_sk(sk);
+
+ mptcp_for_each_subflow(msk, subflow) {
+ struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+ int err = sock_error(ssk);
+ int ssk_state;
+
+ if (!err)
+ continue;
+
+ /* only propagate errors on fallen-back sockets or
+ * on MPC connect
+ */
+ if (sk->sk_state != TCP_SYN_SENT && !__mptcp_check_fallback(msk))
+ continue;
+
+ /* We need to propagate only transition to CLOSE state.
+ * Orphaned socket will see such state change via
+ * subflow_sched_work_if_closed() and that path will properly
+ * destroy the msk as needed.
+ */
+ ssk_state = inet_sk_state_load(ssk);
+ if (ssk_state == TCP_CLOSE && !sock_flag(sk, SOCK_DEAD))
+ inet_sk_state_store(sk, ssk_state);
+ WRITE_ONCE(sk->sk_err, -err);
+
+ /* This barrier is coupled with smp_rmb() in mptcp_poll() */
+ smp_wmb();
+ sk_error_report(sk);
+ break;
+ }
+}
+
/* In most cases we will be able to lock the mptcp socket. If its already
* owned, we need to defer to the work queue to avoid ABBA deadlock.
*/
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 9bf3c7bc1762..2f40c23fdb0d 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1362,42 +1362,6 @@ void mptcp_space(const struct sock *ssk, int *space, int *full_space)
*full_space = mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf));
}
-void __mptcp_error_report(struct sock *sk)
-{
- struct mptcp_subflow_context *subflow;
- struct mptcp_sock *msk = mptcp_sk(sk);
-
- mptcp_for_each_subflow(msk, subflow) {
- struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
- int err = sock_error(ssk);
- int ssk_state;
-
- if (!err)
- continue;
-
- /* only propagate errors on fallen-back sockets or
- * on MPC connect
- */
- if (sk->sk_state != TCP_SYN_SENT && !__mptcp_check_fallback(msk))
- continue;
-
- /* We need to propagate only transition to CLOSE state.
- * Orphaned socket will see such state change via
- * subflow_sched_work_if_closed() and that path will properly
- * destroy the msk as needed.
- */
- ssk_state = inet_sk_state_load(ssk);
- if (ssk_state == TCP_CLOSE && !sock_flag(sk, SOCK_DEAD))
- inet_sk_state_store(sk, ssk_state);
- WRITE_ONCE(sk->sk_err, -err);
-
- /* This barrier is coupled with smp_rmb() in mptcp_poll() */
- smp_wmb();
- sk_error_report(sk);
- break;
- }
-}
-
static void subflow_error_report(struct sock *ssk)
{
struct sock *sk = mptcp_subflow_ctx(ssk)->conn;
The special casing was originally added in pre-git history; reproducing
the commit log here:
> commit a318a92567d77
> Author: Andrew Morton <akpm(a)osdl.org>
> Date: Sun Sep 21 01:42:22 2003 -0700
>
> [PATCH] Speed up direct-io hugetlbpage handling
>
> This patch short-circuits all the direct-io page dirtying logic for
> higher-order pages. Without this, we pointlessly bounce BIOs up to
> keventd all the time.
In the last twenty years, compound pages have become used for more than
just hugetlb. Rewrite these functions to operate on folios instead
of pages and remove the special case for hugetlbfs; I don't think
it's needed any more (and if it is, we can put it back in as a call
to folio_test_hugetlb()).
This was found by inspection; as far as I can tell, this bug can lead
to pages used as the destination of a direct I/O read not being marked
as dirty. If those pages are then reclaimed by the MM without being
dirtied for some other reason, they won't be written out. Then when
they're faulted back in, they will not contain the data they should.
It'll take a pretty unusual setup to produce this problem with several
races all going the wrong way.
This problem predates the folio work; it could for example have been
triggered by mmaping a THP in tmpfs and using that as the target of an
O_DIRECT read.
Fixes: 800d8c63b2e98 ("shmem: add huge pages support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
---
block/bio.c | 46 ++++++++++++++++++++++++----------------------
1 file changed, 24 insertions(+), 22 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 8672179213b9..f46d8ec71fbd 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1171,13 +1171,22 @@ EXPORT_SYMBOL(bio_add_folio);
void __bio_release_pages(struct bio *bio, bool mark_dirty)
{
- struct bvec_iter_all iter_all;
- struct bio_vec *bvec;
+ struct folio_iter fi;
+
+ bio_for_each_folio_all(fi, bio) {
+ struct page *page;
+ size_t done = 0;
- bio_for_each_segment_all(bvec, bio, iter_all) {
- if (mark_dirty && !PageCompound(bvec->bv_page))
- set_page_dirty_lock(bvec->bv_page);
- bio_release_page(bio, bvec->bv_page);
+ if (mark_dirty) {
+ folio_lock(fi.folio);
+ folio_mark_dirty(fi.folio);
+ folio_unlock(fi.folio);
+ }
+ page = folio_page(fi.folio, fi.offset / PAGE_SIZE);
+ do {
+ bio_release_page(bio, page++);
+ done += PAGE_SIZE;
+ } while (done < fi.length);
}
}
EXPORT_SYMBOL_GPL(__bio_release_pages);
@@ -1455,18 +1464,12 @@ EXPORT_SYMBOL(bio_free_pages);
* bio_set_pages_dirty() and bio_check_pages_dirty() are support functions
* for performing direct-IO in BIOs.
*
- * The problem is that we cannot run set_page_dirty() from interrupt context
+ * The problem is that we cannot run folio_mark_dirty() from interrupt context
* because the required locks are not interrupt-safe. So what we can do is to
* mark the pages dirty _before_ performing IO. And in interrupt context,
* check that the pages are still dirty. If so, fine. If not, redirty them
* in process context.
*
- * We special-case compound pages here: normally this means reads into hugetlb
- * pages. The logic in here doesn't really work right for compound pages
- * because the VM does not uniformly chase down the head page in all cases.
- * But dirtiness of compound pages is pretty meaningless anyway: the VM doesn't
- * handle them at all. So we skip compound pages here at an early stage.
- *
* Note that this code is very hard to test under normal circumstances because
* direct-io pins the pages with get_user_pages(). This makes
* is_page_cache_freeable return false, and the VM will not clean the pages.
@@ -1482,12 +1485,12 @@ EXPORT_SYMBOL(bio_free_pages);
*/
void bio_set_pages_dirty(struct bio *bio)
{
- struct bio_vec *bvec;
- struct bvec_iter_all iter_all;
+ struct folio_iter fi;
- bio_for_each_segment_all(bvec, bio, iter_all) {
- if (!PageCompound(bvec->bv_page))
- set_page_dirty_lock(bvec->bv_page);
+ bio_for_each_folio_all(fi, bio) {
+ folio_lock(fi.folio);
+ folio_mark_dirty(fi.folio);
+ folio_unlock(fi.folio);
}
}
@@ -1530,12 +1533,11 @@ static void bio_dirty_fn(struct work_struct *work)
void bio_check_pages_dirty(struct bio *bio)
{
- struct bio_vec *bvec;
+ struct folio_iter fi;
unsigned long flags;
- struct bvec_iter_all iter_all;
- bio_for_each_segment_all(bvec, bio, iter_all) {
- if (!PageDirty(bvec->bv_page) && !PageCompound(bvec->bv_page))
+ bio_for_each_folio_all(fi, bio) {
+ if (!folio_test_dirty(fi.folio))
goto defer;
}
--
2.40.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 9f1a98813b4b686482e5ef3c9d998581cace0ba6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023100447-durable-snowiness-8b36@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9f1a98813b4b686482e5ef3c9d998581cace0ba6 Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Sat, 16 Sep 2023 12:52:47 +0200
Subject: [PATCH] mptcp: process pending subflow error on close
On incoming TCP reset, subflow closing could happen before error
propagation. That in turn could cause the socket error being ignored,
and a missing socket state transition, as reported by Daire-Byrne.
Address the issues explicitly checking for subflow socket error at
close time. To avoid code duplication, factor-out of __mptcp_error_report()
a new helper implementing the relevant bits.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/429
Fixes: 15cc10453398 ("mptcp: deliver ssk errors to msk")
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 915860027b1a..1c96b8da71df 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -770,40 +770,44 @@ static bool __mptcp_ofo_queue(struct mptcp_sock *msk)
return moved;
}
+static bool __mptcp_subflow_error_report(struct sock *sk, struct sock *ssk)
+{
+ int err = sock_error(ssk);
+ int ssk_state;
+
+ if (!err)
+ return false;
+
+ /* only propagate errors on fallen-back sockets or
+ * on MPC connect
+ */
+ if (sk->sk_state != TCP_SYN_SENT && !__mptcp_check_fallback(mptcp_sk(sk)))
+ return false;
+
+ /* We need to propagate only transition to CLOSE state.
+ * Orphaned socket will see such state change via
+ * subflow_sched_work_if_closed() and that path will properly
+ * destroy the msk as needed.
+ */
+ ssk_state = inet_sk_state_load(ssk);
+ if (ssk_state == TCP_CLOSE && !sock_flag(sk, SOCK_DEAD))
+ inet_sk_state_store(sk, ssk_state);
+ WRITE_ONCE(sk->sk_err, -err);
+
+ /* This barrier is coupled with smp_rmb() in mptcp_poll() */
+ smp_wmb();
+ sk_error_report(sk);
+ return true;
+}
+
void __mptcp_error_report(struct sock *sk)
{
struct mptcp_subflow_context *subflow;
struct mptcp_sock *msk = mptcp_sk(sk);
- mptcp_for_each_subflow(msk, subflow) {
- struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
- int err = sock_error(ssk);
- int ssk_state;
-
- if (!err)
- continue;
-
- /* only propagate errors on fallen-back sockets or
- * on MPC connect
- */
- if (sk->sk_state != TCP_SYN_SENT && !__mptcp_check_fallback(msk))
- continue;
-
- /* We need to propagate only transition to CLOSE state.
- * Orphaned socket will see such state change via
- * subflow_sched_work_if_closed() and that path will properly
- * destroy the msk as needed.
- */
- ssk_state = inet_sk_state_load(ssk);
- if (ssk_state == TCP_CLOSE && !sock_flag(sk, SOCK_DEAD))
- inet_sk_state_store(sk, ssk_state);
- WRITE_ONCE(sk->sk_err, -err);
-
- /* This barrier is coupled with smp_rmb() in mptcp_poll() */
- smp_wmb();
- sk_error_report(sk);
- break;
- }
+ mptcp_for_each_subflow(msk, subflow)
+ if (__mptcp_subflow_error_report(sk, mptcp_subflow_tcp_sock(subflow)))
+ break;
}
/* In most cases we will be able to lock the mptcp socket. If its already
@@ -2428,6 +2432,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
}
out_release:
+ __mptcp_subflow_error_report(sk, ssk);
release_sock(ssk);
sock_put(ssk);
atomisp_create_pipes_stream does not handle errors in
__create_pipes and __create_streams in versions 5.10
and 5.15. The following patch may fix it.
Found by Linux Verification Center (linuxtesting.org) with Svace.