It turned out that Aya Neo Air Plus had a different board name than
expected.
This patch changes Aya Neo Air's quirk to account for that, as both
devices share "Air" in DMI product name.
Tested on Air claiming to be an Air Pro, and on Air Plus.
Signed-off-by: Maya Matuszczyk <maccraft123mc(a)gmail.com>
---
drivers/gpu/drm/drm_panel_orientation_quirks.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_panel_orientation_quirks.c b/drivers/gpu/drm/drm_panel_orientation_quirks.c
index b1a38e6ce2f8..0cb646cb04ee 100644
--- a/drivers/gpu/drm/drm_panel_orientation_quirks.c
+++ b/drivers/gpu/drm/drm_panel_orientation_quirks.c
@@ -179,7 +179,7 @@ static const struct dmi_system_id orientation_data[] = {
}, { /* AYA NEO AIR */
.matches = {
DMI_EXACT_MATCH(DMI_SYS_VENDOR, "AYANEO"),
- DMI_MATCH(DMI_BOARD_NAME, "AIR"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "AIR"),
},
.driver_data = (void *)&lcd1080x1920_leftside_up,
}, { /* AYA NEO NEXT */
--
2.40.1
From: "Russell King (Oracle)" <rmk+kernel(a)armlinux.org.uk>
The i.MX6 CPU frequency driver sometimes fails to register at boot time
due to nvmem_cell_read_u32() sporadically returning -ENOENT.
This happens because there is a window where __nvmem_device_get() in
of_nvmem_cell_get() is able to return the nvmem device, but as cells
have been setup, nvmem_find_cell_entry_by_node() returns NULL.
The occurs because the nvmem core registration code violates one of the
fundamental principles of kernel programming: do not publish data
structures before their setup is complete.
Fix this by making nvmem core code conform with this principle.
Cc: stable(a)vger.kernel.org
Fixes: eace75cfdcf7 ("nvmem: Add a simple NVMEM framework for nvmem providers")
Signed-off-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Signed-off-by: Christian Gabriel <christian.gabriel(a)linutronix.de>
---
drivers/nvmem/core.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c
index 84f4078216a3..6aa8947c4d57 100644
--- a/drivers/nvmem/core.c
+++ b/drivers/nvmem/core.c
@@ -418,16 +418,10 @@ struct nvmem_device *nvmem_register(const struct nvmem_config *config)
device_initialize(&nvmem->dev);
- dev_dbg(&nvmem->dev, "Registering nvmem device %s\n", config->name);
-
- rval = device_add(&nvmem->dev);
- if (rval)
- goto err_put_device;
-
if (config->compat) {
rval = nvmem_sysfs_setup_compat(nvmem, config);
if (rval)
- goto err_device_del;
+ goto err_put_device;
}
if (config->cells) {
@@ -444,6 +438,12 @@ struct nvmem_device *nvmem_register(const struct nvmem_config *config)
if (rval)
goto err_remove_cells;
+ dev_dbg(&nvmem->dev, "Registering nvmem device %s\n", config->name);
+
+ rval = device_add(&nvmem->dev);
+ if (rval)
+ goto err_remove_cells;
+
blocking_notifier_call_chain(&nvmem_notifier, NVMEM_ADD, nvmem);
return nvmem;
@@ -453,8 +453,6 @@ struct nvmem_device *nvmem_register(const struct nvmem_config *config)
err_teardown_compat:
if (config->compat)
nvmem_sysfs_remove_compat(nvmem, config);
-err_device_del:
- device_del(&nvmem->dev);
err_put_device:
put_device(&nvmem->dev);
--
2.30.2
Greetings,
I am a loan Initiating officer of a UAE based loan finance company,
who are ready to fund projects outside the UAE.
We grant loans of various amounts to both Corporate and
Private entities at a low interest rate of 2% percent interest per annum.
The Loan terms are very flexible and interesting.
Kindly revert back to us if you have projects that needs loan
funding for further discussion and negotiation:
Thanks
Loan Initiating Officer
This patch series contains various bug fixes for the kvaser_pciefd driver.
Jimmy Assarsson (6):
can: kvaser_pciefd: Set CAN_STATE_STOPPED in kvaser_pciefd_stop()
can: kvaser_pciefd: Clear listen-only bit if not explicitly requested
can: kvaser_pciefd: Call request_irq() before enabling interrupts
can: kvaser_pciefd: Empty SRB buffer in probe
can: kvaser_pciefd: Do not send EFLUSH command on TFD interrupt
can: kvaser_pciefd: Disable interrupts in probe error path
drivers/net/can/kvaser_pciefd.c | 51 +++++++++++++++++++--------------
1 file changed, 29 insertions(+), 22 deletions(-)
--
2.40.0
Hi Joerg,
[Cc'ing stable(a)vger.kernel.org per Nadav's suggestion]
This bug fix seems to have gotten the necessary reviews (AMD and
previous commit author). Is it eligible to be applied?
Thanks,
Jon
On Wed, Apr 26, 2023 at 1:32 PM Jon Pan-Doh <pandoh(a)google.com> wrote:
>
> When running on an AMD vIOMMU, we observed multiple invalidations (of
> decreasing power of 2 aligned sizes) when unmapping a single page.
>
> Domain flush takes gather bounds (end-start) as size param. However,
> gather->end is defined as the last inclusive address (start + size - 1).
> This leads to an off by 1 error.
>
> With this patch, verified that 1 invalidation occurs when unmapping a
> single page.
>
> Fixes: a270be1b3fdf ("iommu/amd: Use only natural aligned flushes in a VM")
> Signed-off-by: Jon Pan-Doh <pandoh(a)google.com>
> Tested-by: Sudheer Dantuluri <dantuluris(a)google.com>
> Suggested-by: Gary Zibrat <gzibrat(a)google.com>
> ---
> drivers/iommu/amd/iommu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 5a505ba5467e..da45b1ab042d 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2378,7 +2378,7 @@ static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
> unsigned long flags;
>
> spin_lock_irqsave(&dom->lock, flags);
> - domain_flush_pages(dom, gather->start, gather->end - gather->start, 1);
> + domain_flush_pages(dom, gather->start, gather->end - gather->start + 1, 1);
> amd_iommu_domain_flush_complete(dom);
> spin_unlock_irqrestore(&dom->lock, flags);
> }
> --
> 2.40.0.634.g4ca3ef3211-goog
>
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
The control message provided by isotp support MSG_CMSG_COMPAT but
blocked recvmsg() syscalls that have set this flag, i.e. on 32bit user
space on 64 bit kernels.
Link: https://github.com/hartkopp/can-isotp/issues/59
Cc: Oleksij Rempel <o.rempel(a)pengutronix.de>
Suggested-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Fixes: 42bf50a1795a ("can: isotp: support MSG_TRUNC flag when reading from socket")
Link: https://lore.kernel.org/20230505110308.81087-2-mkl@pengutronix.de
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
net/can/isotp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/can/isotp.c b/net/can/isotp.c
index a750259cb79c..84f9aba02901 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -1139,7 +1139,7 @@ static int isotp_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
struct isotp_sock *so = isotp_sk(sk);
int ret = 0;
- if (flags & ~(MSG_DONTWAIT | MSG_TRUNC | MSG_PEEK))
+ if (flags & ~(MSG_DONTWAIT | MSG_TRUNC | MSG_PEEK | MSG_CMSG_COMPAT))
return -EINVAL;
if (!so->bound)
base-commit: df0acdc59b094cdaef19b1c8d83c9721082bab7b
--
2.39.2
fprobe_hander and fprobe_kprobe_handler has guarded ftrace recursion
detection but fprobe_exit_handler has not, which possibly introduce
recursive calls if the fprobe exit callback calls any traceable
functions. Checking in fprobe_hander or fprobe_kprobe_handler
is not enough and misses this case.
So add recursion free guard the same way as fprobe_hander. Since
ftrace recursion check does not employ ip(s), so here use entry_ip and
entry_parent_ip the same as fprobe_handler.
Fixes: 5b0ab78998e3 ("fprobe: Add exit_handler support")
Signed-off-by: Ze Gao <zegao(a)tencent.com>
Cc: stable(a)vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Link: https://lore.kernel.org/linux-trace-kernel/20230516071830.8190-4-zegao@tenc…
---
kernel/trace/fprobe.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 097c740799ba..281b58c7dd14 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -17,6 +17,7 @@
struct fprobe_rethook_node {
struct rethook_node node;
unsigned long entry_ip;
+ unsigned long entry_parent_ip;
char data[];
};
@@ -39,6 +40,7 @@ static inline void __fprobe_handler(unsigned long ip, unsigned long
}
fpr = container_of(rh, struct fprobe_rethook_node, node);
fpr->entry_ip = ip;
+ fpr->entry_parent_ip = parent_ip;
if (fp->entry_data_size)
entry_data = fpr->data;
}
@@ -114,14 +116,26 @@ static void fprobe_exit_handler(struct rethook_node *rh, void *data,
{
struct fprobe *fp = (struct fprobe *)data;
struct fprobe_rethook_node *fpr;
+ int bit;
if (!fp || fprobe_disabled(fp))
return;
fpr = container_of(rh, struct fprobe_rethook_node, node);
+ /*
+ * we need to assure no calls to traceable functions in-between the
+ * end of fprobe_handler and the beginning of fprobe_exit_handler.
+ */
+ bit = ftrace_test_recursion_trylock(fpr->entry_ip, fpr->entry_parent_ip);
+ if (bit < 0) {
+ fp->nmissed++;
+ return;
+ }
+
fp->exit_handler(fp, fpr->entry_ip, regs,
fp->entry_data_size ? (void *)fpr->data : NULL);
+ ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(fprobe_exit_handler);
--
2.40.1
Commit c145e0b47c77 ("mm: streamline COW logic in do_swap_page()") moved
the call to swap_free() before the call to set_pte_at(), which meant that
the MTE tags could end up being freed before set_pte_at() had a chance
to restore them. Fix it by adding a call to the arch_swap_restore() hook
before the call to swap_free().
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Link: https://linux-review.googlesource.com/id/I6470efa669e8bd2f841049b8c61020c51…
Cc: <stable(a)vger.kernel.org> # 6.1
Fixes: c145e0b47c77 ("mm: streamline COW logic in do_swap_page()")
Reported-by: Qun-wei Lin (林群崴) <Qun-wei.Lin(a)mediatek.com>
Link: https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.camel@…
---
v2:
- Call arch_swap_restore() directly instead of via arch_do_swap_page()
mm/memory.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index 01a23ad48a04..a2d9e6952d31 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3914,6 +3914,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
}
}
+ /*
+ * Some architectures may have to restore extra metadata to the page
+ * when reading from swap. This metadata may be indexed by swap entry
+ * so this must be called before swap_free().
+ */
+ arch_swap_restore(entry, folio);
+
/*
* Remove the swap entry and conditionally try to free up the swapcache.
* We're already holding a reference on the page but haven't mapped it
--
2.40.1.606.ga4b1b128d6-goog
In a SCSI request, storvsc pre-allocates space for up to
MAX_PAGE_BUFFER_COUNT physical frame numbers to be passed to Hyper-V.
If the size of the I/O request requires more PFNs, a separate memory
area of exactly the correct size is dynamically allocated.
But when the pre-allocated area is used, current code always passes
MAX_PAGE_BUFFER_COUNT PFNs to Hyper-V, even if fewer are needed. While
this doesn't break anything because the additional PFNs are always zero,
more bytes than necessary are copied into the VMBus channel ring buffer.
This takes CPU cycles and wastes space in the ring buffer. For a typical
4 Kbyte I/O that requires only a single PFN, 248 unnecessary bytes are
copied.
Fix this by setting the payload_sz based on the actual number of PFNs
required, not the size of the pre-allocated space.
Reported-by: John Starks <jostarks(a)microsoft.com>
Fixes: 8f43710543ef ("scsi: storvsc: Support PAGE_SIZE larger than 4K")
Signed-off-by: Michael Kelley <mikelley(a)microsoft.com>
---
drivers/scsi/storvsc_drv.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index d9ce379..e6bc622 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1780,7 +1780,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
length = scsi_bufflen(scmnd);
payload = (struct vmbus_packet_mpb_array *)&cmd_request->mpb;
- payload_sz = sizeof(cmd_request->mpb);
+ payload_sz = 0;
if (scsi_sg_count(scmnd)) {
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
@@ -1789,10 +1789,10 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
unsigned long hvpfn, hvpfns_to_add;
int j, i = 0, sg_count;
- if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
+ payload_sz = (hvpg_count * sizeof(u64) +
+ sizeof(struct vmbus_packet_mpb_array));
- payload_sz = (hvpg_count * sizeof(u64) +
- sizeof(struct vmbus_packet_mpb_array));
+ if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
payload = kzalloc(payload_sz, GFP_ATOMIC);
if (!payload)
return SCSI_MLQUEUE_DEVICE_BUSY;
--
1.8.3.1
Hi Stable Team,
This patch, ID 0627f3df95e1609693f89e7ceb4156ac5db6e358, can be
applied to stable kernels 5.4 to 5.15 AS IS.
The patch has been merged to stable 6.1 and later. Thank you for your support!
Cheers,
Ping
Hi Stable Team,
This patch, ID 94b179052f95c294d83e9c9c34f7833cf3cd4305, can be
applied to stable kernel 4.14 to 5.15 AS IS.
The patch has been merged to stable 6.1 and later. It fixes a missing
proximity out event issue.
Thank you for your support!
Ping
On Tue, May 16, 2023 at 05:31:05PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd(a)arndb.de>
>
> The DT version of this board has a custom file with the gpio
> device. However, it does nothing because the d2net_init()
> has no caller or prototype:
>
> arch/arm/mach-orion5x/board-d2net.c:101:13: error: no previous prototype for 'd2net_init'
>
> Call it from the board-dt file as intended.
>
> Fixes: 94b0bd366e36 ("ARM: orion5x: convert d2net to Device Tree")
> Cc: stable(a)vger.kernel.org
> Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
Andrew
The mte_sync_page_tags() function sets PG_mte_tagged if it initializes
page tags. Then we return to mte_sync_tags(), which sets PG_mte_tagged
again. At best, this is redundant. However, it is possible for
mte_sync_page_tags() to return without having initialized tags for the
page, i.e. in the case where check_swap is true (non-compound page),
is_swap_pte(old_pte) is false and pte_is_tagged is false. So at worst,
we set PG_mte_tagged on a page with uninitialized tags. This can happen
if, for example, page migration causes a PTE for an untagged page to
be replaced. If the userspace program subsequently uses mprotect() to
enable PROT_MTE for that page, the uninitialized tags will be exposed
to userspace.
Fix it by removing the redundant call to set_page_mte_tagged().
Fixes: e059853d14ca ("arm64: mte: Fix/clarify the PG_mte_tagged semantics")
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Cc: <stable(a)vger.kernel.org> # 6.1
Link: https://linux-review.googlesource.com/id/Ib02d004d435b2ed87603b858ef7480f7b…
---
arch/arm64/kernel/mte.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index f5bcb0dc6267..7e89968bd282 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -66,13 +66,10 @@ void mte_sync_tags(pte_t old_pte, pte_t pte)
return;
/* if PG_mte_tagged is set, tags have already been initialised */
- for (i = 0; i < nr_pages; i++, page++) {
- if (!page_mte_tagged(page)) {
+ for (i = 0; i < nr_pages; i++, page++)
+ if (!page_mte_tagged(page))
mte_sync_page_tags(page, old_pte, check_swap,
pte_is_tagged);
- set_page_mte_tagged(page);
- }
- }
/* ensure the tags are visible before the PTE is set */
smp_wmb();
--
2.40.0.634.g4ca3ef3211-goog
Consider the following sequence of events:
1) A page in a PROT_READ|PROT_WRITE VMA is faulted.
2) Page migration allocates a page with the KASAN allocator,
causing it to receive a non-match-all tag, and uses it
to replace the page faulted in 1.
3) The program uses mprotect() to enable PROT_MTE on the page faulted in 1.
As a result of step 3, we are left with a non-match-all tag for a page
with tags accessible to userspace, which can lead to the same kind of
tag check faults that commit e74a68468062 ("arm64: Reset KASAN tag in
copy_highpage with HW tags only") intended to fix.
The general invariant that we have for pages in a VMA with VM_MTE_ALLOWED
is that they cannot have a non-match-all tag. As a result of step 2, the
invariant is broken. This means that the fix in the referenced commit
was incomplete and we also need to reset the tag for pages without
PG_mte_tagged.
Fixes: e5b8d9218951 ("arm64: mte: reset the page tag in page->flags")
Cc: <stable(a)vger.kernel.org> # 5.15
Link: https://linux-review.googlesource.com/id/I7409cdd41acbcb215c2a7417c1e50d37b…
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
---
arch/arm64/mm/copypage.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
index 4aadcfb01754..a7bb20055ce0 100644
--- a/arch/arm64/mm/copypage.c
+++ b/arch/arm64/mm/copypage.c
@@ -21,9 +21,10 @@ void copy_highpage(struct page *to, struct page *from)
copy_page(kto, kfrom);
+ if (kasan_hw_tags_enabled())
+ page_kasan_tag_reset(to);
+
if (system_supports_mte() && page_mte_tagged(from)) {
- if (kasan_hw_tags_enabled())
- page_kasan_tag_reset(to);
/* It's a new page, shouldn't have been tagged yet */
WARN_ON_ONCE(!try_page_mte_tagging(to));
mte_copy_page_tags(kto, kfrom);
--
2.40.0.396.gfff15efe05-goog
Hi Greg, Sasha,
This is a batch of -stable backport fixes for 4.19. This batch includes
dependency patches which are not currently in the 4.19 branch.
The following list shows the backported patches, I am using original
commit IDs for reference:
1) 4f16d25c68ec ("netfilter: nftables: add nft_parse_register_load() and use it")
2) 345023b0db31 ("netfilter: nftables: add nft_parse_register_store() and use it")
3) 08a01c11a5bb ("netfilter: nftables: statify nft_parse_register()")
4) 6e1acfa387b9 ("netfilter: nf_tables: validate registers coming from userspace.")
5) 20a1452c3542 ("netfilter: nf_tables: add nft_setelem_parse_key()")
6) fdb9c405e35b ("netfilter: nf_tables: allow up to 64 bytes in the set element data area")
7) 7e6bc1f6cabc ("netfilter: nf_tables: stricter validation of element data")
8) 5a2f3dc31811 ("netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag")
9) 36d5b2913219 ("netfilter: nf_tables: do not allow RULE_ID to refer to another chain")
Patches #1, #2, #3, #5, #6 are dependencies.
Please apply,
Thanks,
Pablo Neira Ayuso (9):
netfilter: nftables: add nft_parse_register_load() and use it
netfilter: nftables: add nft_parse_register_store() and use it
netfilter: nftables: statify nft_parse_register()
netfilter: nf_tables: validate registers coming from userspace.
netfilter: nf_tables: add nft_setelem_parse_key()
netfilter: nf_tables: allow up to 64 bytes in the set element data area
netfilter: nf_tables: stricter validation of element data
netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag
netfilter: nf_tables: do not allow RULE_ID to refer to another chain
include/net/netfilter/nf_tables.h | 15 +-
include/net/netfilter/nf_tables_core.h | 9 +-
include/net/netfilter/nft_fib.h | 2 +-
include/net/netfilter/nft_masq.h | 4 +-
include/net/netfilter/nft_redir.h | 4 +-
net/ipv4/netfilter/nft_dup_ipv4.c | 18 +-
net/ipv6/netfilter/nft_dup_ipv6.c | 18 +-
net/netfilter/nf_tables_api.c | 228 ++++++++++++++++---------
net/netfilter/nft_bitwise.c | 14 +-
net/netfilter/nft_byteorder.c | 14 +-
net/netfilter/nft_cmp.c | 8 +-
net/netfilter/nft_ct.c | 12 +-
net/netfilter/nft_dup_netdev.c | 6 +-
net/netfilter/nft_dynset.c | 12 +-
net/netfilter/nft_exthdr.c | 14 +-
net/netfilter/nft_fib.c | 5 +-
net/netfilter/nft_fwd_netdev.c | 18 +-
net/netfilter/nft_hash.c | 25 ++-
net/netfilter/nft_immediate.c | 6 +-
net/netfilter/nft_lookup.c | 14 +-
net/netfilter/nft_masq.c | 14 +-
net/netfilter/nft_meta.c | 12 +-
net/netfilter/nft_nat.c | 35 ++--
net/netfilter/nft_numgen.c | 15 +-
net/netfilter/nft_objref.c | 6 +-
net/netfilter/nft_osf.c | 8 +-
net/netfilter/nft_payload.c | 10 +-
net/netfilter/nft_queue.c | 12 +-
net/netfilter/nft_range.c | 6 +-
net/netfilter/nft_redir.c | 14 +-
net/netfilter/nft_rt.c | 7 +-
net/netfilter/nft_socket.c | 7 +-
net/netfilter/nft_tproxy.c | 14 +-
net/netfilter/nft_tunnel.c | 8 +-
34 files changed, 328 insertions(+), 286 deletions(-)
--
2.30.2
The current uses of PageAnon in page table check functions can lead to
type confusion bugs between struct page and slab [1], if slab pages are
accidentally mapped into the user space. This is because slab reuses the
bits in struct page to store its internal states, which renders PageAnon
ineffective on slab pages.
Since slab pages are not expected to be mapped into the user space, this
patch adds BUG_ON(PageSlab(page)) checks to make sure that slab pages
are not inadvertently mapped. Otherwise, there must be some bugs in the
kernel.
Reported-by: syzbot+fcf1a817ceb50935ce99(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000258e5e05fae79fc1@google.com/ [1]
Fixes: df4e817b7108 ("mm: page table check")
Cc: <stable(a)vger.kernel.org> # 5.17
Signed-off-by: Ruihan Li <lrh2000(a)pku.edu.cn>
---
include/linux/page-flags.h | 6 ++++++
mm/page_table_check.c | 6 ++++++
2 files changed, 12 insertions(+)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 1c68d67b8..92a2063a0 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -617,6 +617,12 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
* Please note that, confusingly, "page_mapping" refers to the inode
* address_space which maps the page from disk; whereas "page_mapped"
* refers to user virtual address space into which the page is mapped.
+ *
+ * For slab pages, since slab reuses the bits in struct page to store its
+ * internal states, the page->mapping does not exist as such, nor do these
+ * flags below. So in order to avoid testing non-existent bits, please
+ * make sure that PageSlab(page) actually evaluates to false before calling
+ * the following functions (e.g., PageAnon). See mm/slab.h.
*/
#define PAGE_MAPPING_ANON 0x1
#define PAGE_MAPPING_MOVABLE 0x2
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 25d8610c0..f2baf97d5 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -71,6 +71,8 @@ static void page_table_check_clear(struct mm_struct *mm, unsigned long addr,
page = pfn_to_page(pfn);
page_ext = page_ext_get(page);
+
+ BUG_ON(PageSlab(page));
anon = PageAnon(page);
for (i = 0; i < pgcnt; i++) {
@@ -107,6 +109,8 @@ static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
page = pfn_to_page(pfn);
page_ext = page_ext_get(page);
+
+ BUG_ON(PageSlab(page));
anon = PageAnon(page);
for (i = 0; i < pgcnt; i++) {
@@ -133,6 +137,8 @@ void __page_table_check_zero(struct page *page, unsigned int order)
struct page_ext *page_ext;
unsigned long i;
+ BUG_ON(PageSlab(page));
+
page_ext = page_ext_get(page);
BUG_ON(!page_ext);
for (i = 0; i < (1ul << order); i++) {
--
2.40.1
vmbus_wait_for_unload() may be called in the panic path after other
CPUs are stopped. vmbus_wait_for_unload() currently loops through
online CPUs looking for the UNLOAD response message. But the values of
CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used
to stop the other CPUs, and in one of the paths the stopped CPUs
are removed from cpu_online_mask. This removal happens in both
x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload()
only checks the panic'ing CPU, and misses the UNLOAD response message
except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload()
eventually times out, but only after waiting 100 seconds.
Fix this by looping through *present* CPUs in vmbus_wait_for_unload().
The cpu_present_mask is not modified by stopping the other CPUs in the
panic path, nor should it be. Furthermore, the synic_message_page
being checked in vmbus_wait_for_unload() is allocated in
hv_synic_alloc() for all present CPUs. So looping through the
present CPUs is more consistent.
For additional safety, also add a check for the message_page being
NULL before looking for the UNLOAD response message.
Reported-by: John Starks <jostarks(a)microsoft.com>
Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios")
Signed-off-by: Michael Kelley <mikelley(a)microsoft.com>
---
drivers/hv/channel_mgmt.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 007f26d..df2ba20 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -829,11 +829,14 @@ static void vmbus_wait_for_unload(void)
if (completion_done(&vmbus_connection.unload_event))
goto completed;
- for_each_online_cpu(cpu) {
+ for_each_present_cpu(cpu) {
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);
page_addr = hv_cpu->synic_message_page;
+ if (!page_addr)
+ continue;
+
msg = (struct hv_message *)page_addr
+ VMBUS_MESSAGE_SINT;
@@ -867,11 +870,14 @@ static void vmbus_wait_for_unload(void)
* maybe-pending messages on all CPUs to be able to receive new
* messages after we reconnect.
*/
- for_each_online_cpu(cpu) {
+ for_each_present_cpu(cpu) {
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);
page_addr = hv_cpu->synic_message_page;
+ if (!page_addr)
+ continue;
+
msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
msg->header.message_type = HVMSG_NONE;
}
--
1.8.3.1
Without EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary
physical memory regions into the userspace via /dev/mem. At the same
time, pages may change their properties (e.g., from anonymous pages to
named pages) while they are still being mapped in the userspace, leading
to "corruption" detected by the page table check.
To avoid these false positives, this patch makes PAGE_TABLE_CHECK
depends on EXCLUSIVE_SYSTEM_RAM. This dependency is understandable
because PAGE_TABLE_CHECK is a hardening technique but /dev/mem without
STRICT_DEVMEM (i.e., !EXCLUSIVE_SYSTEM_RAM) is itself a security
problem.
Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may be still allowed to be
mapped via /dev/mem. However, these pages are always considered as named
pages, so they won't break the logic used in the page table check.
Cc: <stable(a)vger.kernel.org> # 5.17
Signed-off-by: Ruihan Li <lrh2000(a)pku.edu.cn>
---
Documentation/mm/page_table_check.rst | 19 +++++++++++++++++++
mm/Kconfig.debug | 1 +
2 files changed, 20 insertions(+)
diff --git a/Documentation/mm/page_table_check.rst b/Documentation/mm/page_table_check.rst
index cfd8f4117..c12838ce6 100644
--- a/Documentation/mm/page_table_check.rst
+++ b/Documentation/mm/page_table_check.rst
@@ -52,3 +52,22 @@ Build kernel with:
Optionally, build kernel with PAGE_TABLE_CHECK_ENFORCED in order to have page
table support without extra kernel parameter.
+
+Implementation notes
+====================
+
+We specifically decided not to use VMA information in order to avoid relying on
+MM states (except for limited "struct page" info). The page table check is a
+separate from Linux-MM state machine that verifies that the user accessible
+pages are not falsely shared.
+
+PAGE_TABLE_CHECK depends on EXCLUSIVE_SYSTEM_RAM. The reason is that without
+EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary physical memory
+regions into the userspace via /dev/mem. At the same time, pages may change
+their properties (e.g., from anonymous pages to named pages) while they are
+still being mapped in the userspace, leading to "corruption" detected by the
+page table check.
+
+Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may be still allowed to be mapped via
+/dev/mem. However, these pages are always considered as named pages, so they
+won't break the logic used in the page table check.
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index a925415b4..018a5bd2f 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -98,6 +98,7 @@ config PAGE_OWNER
config PAGE_TABLE_CHECK
bool "Check for invalid mappings in user page tables"
depends on ARCH_SUPPORTS_PAGE_TABLE_CHECK
+ depends on EXCLUSIVE_SYSTEM_RAM
select PAGE_EXTENSION
help
Check that anonymous page is not being mapped twice with read write
--
2.40.1
When hcd->localmem_pool is non-null, localmem_pool is used to allocate
DMA memory. In this case, the dma address will be properly returned (in
dma_handle), and dma_mmap_coherent should be used to map this memory
into the user space. However, the current implementation uses
pfn_remap_range, which is supposed to map normal pages.
Instead of repeating the logic in the memory allocation function, this
patch introduces a more robust solution. Here, the type of allocated
memory is checked by testing whether dma_handle is properly set. If
dma_handle is properly returned, it means some DMA pages are allocated
and dma_mmap_coherent should be used to map them. Otherwise, normal
pages are allocated and pfn_remap_range should be called. This ensures
that the correct mmap functions are used consistently, independently
with logic details that determine which type of memory gets allocated.
Fixes: a0e710a7def4 ("USB: usbfs: fix mmap dma mismatch")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ruihan Li <lrh2000(a)pku.edu.cn>
---
drivers/usb/core/devio.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 3936ca7f7..fcf68818e 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -235,7 +235,7 @@ static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
size_t size = vma->vm_end - vma->vm_start;
void *mem;
unsigned long flags;
- dma_addr_t dma_handle;
+ dma_addr_t dma_handle = DMA_MAPPING_ERROR;
int ret;
ret = usbfs_increase_memory_usage(size + sizeof(struct usb_memory));
@@ -265,7 +265,14 @@ static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
usbm->vma_use_count = 1;
INIT_LIST_HEAD(&usbm->memlist);
- if (hcd->localmem_pool || !hcd_uses_dma(hcd)) {
+ /*
+ * In DMA-unavailable cases, hcd_buffer_alloc_pages allocates
+ * normal pages and assigns DMA_MAPPING_ERROR to dma_handle. Check
+ * whether we are in such cases, and then use remap_pfn_range (or
+ * dma_mmap_coherent) to map normal (or DMA) pages into the user
+ * space, respectively.
+ */
+ if (dma_handle == DMA_MAPPING_ERROR) {
if (remap_pfn_range(vma, vma->vm_start,
virt_to_phys(usbm->mem) >> PAGE_SHIFT,
size, vma->vm_page_prot) < 0) {
--
2.40.1
When a GIC local interrupt is not routable, it's vl_map will be used
to control some internal states for core (providing IPTI, IPPCI, IPFDC
input signal for core). Overriding it will interfere core's intetrupt
controller.
Do not touch vl_map if a local interrupt is not routable, we are not
going to remap it.
Before dd098a0e0319 (" irqchip/mips-gic: Get rid of the reliance on
irq_cpu_online()"), if a local interrupt is not routable, then it won't
be requested from GIC Local domain, and thus gic_all_vpes_irq_cpu_online
won't be called for that particular interrupt.
Fixes: dd098a0e0319 (" irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
---
drivers/irqchip/irq-mips-gic.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/irqchip/irq-mips-gic.c b/drivers/irqchip/irq-mips-gic.c
index 046c355e120b..b568d55ef7c5 100644
--- a/drivers/irqchip/irq-mips-gic.c
+++ b/drivers/irqchip/irq-mips-gic.c
@@ -399,6 +399,8 @@ static void gic_all_vpes_irq_cpu_online(void)
unsigned int intr = local_intrs[i];
struct gic_all_vpes_chip_data *cd;
+ if (!gic_local_irq_is_routable(intr))
+ continue;
cd = &gic_all_vpes_chip_data[intr];
write_gic_vl_map(mips_gic_vx_map_reg(intr), cd->map);
if (cd->mask)
--
2.34.1
fprobe_hander and fprobe_kprobe_handler has guarded ftrace recursion
detection but fprobe_exit_handler has not, which possibly introduce
recursive calls if the fprobe exit callback calls any traceable
functions. Checking in fprobe_hander or fprobe_kprobe_handler
is not enough and misses this case.
So add recursion free guard the same way as fprobe_hander. Since
ftrace recursion check does not employ ip(s), so here use entry_ip and
entry_parent_ip the same as fprobe_handler.
Fixes: 5b0ab78998e3 ("fprobe: Add exit_handler support")
Signed-off-by: Ze Gao <zegao(a)tencent.com>
Cc: stable(a)vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
---
kernel/trace/fprobe.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 097c740799ba..a9580a88cc15 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -17,6 +17,7 @@
struct fprobe_rethook_node {
struct rethook_node node;
unsigned long entry_ip;
+ unsigned long entry_parent_ip;
char data[];
};
@@ -39,6 +40,7 @@ static inline void __fprobe_handler(unsigned long ip, unsigned long
}
fpr = container_of(rh, struct fprobe_rethook_node, node);
fpr->entry_ip = ip;
+ fpr->entry_parent_ip = parent_ip;
if (fp->entry_data_size)
entry_data = fpr->data;
}
@@ -114,14 +116,25 @@ static void fprobe_exit_handler(struct rethook_node *rh, void *data,
{
struct fprobe *fp = (struct fprobe *)data;
struct fprobe_rethook_node *fpr;
+ int bit;
if (!fp || fprobe_disabled(fp))
return;
fpr = container_of(rh, struct fprobe_rethook_node, node);
+ /* we need to assure no calls to traceable functions in-between the
+ * end of fprobe_handler and the beginning of fprobe_exit_handler.
+ */
+ bit = ftrace_test_recursion_trylock(fpr->entry_ip, fpr->entry_parent_ip);
+ if (bit < 0) {
+ fp->nmissed++;
+ return;
+ }
+
fp->exit_handler(fp, fpr->entry_ip, regs,
fp->entry_data_size ? (void *)fpr->data : NULL);
+ ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(fprobe_exit_handler);
--
2.40.1
This patch replaces preempt_{disable, enable} with its corresponding
notrace version in rethook_trampoline_handler so no worries about stack
recursion or overflow introduced by preempt_count_{add, sub} under
fprobe + rethook context.
Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
Signed-off-by: Ze Gao <zegao(a)tencent.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
---
kernel/trace/rethook.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
index 32c3dfdb4d6a..60f6cb2b486b 100644
--- a/kernel/trace/rethook.c
+++ b/kernel/trace/rethook.c
@@ -288,7 +288,7 @@ unsigned long rethook_trampoline_handler(struct pt_regs *regs,
* These loops must be protected from rethook_free_rcu() because those
* are accessing 'rhn->rethook'.
*/
- preempt_disable();
+ preempt_disable_notrace();
/*
* Run the handler on the shadow stack. Do not unlink the list here because
@@ -321,7 +321,7 @@ unsigned long rethook_trampoline_handler(struct pt_regs *regs,
first = first->next;
rethook_recycle(rhn);
}
- preempt_enable();
+ preempt_enable_notrace();
return correct_ret_addr;
}
--
2.40.1
Hello all,
I recently did have a regression on v6.4rc1, and it seems that the same
exact issue is now happening also on v6.1.28.
I was not able yet to bisect it (yet), but what is happening is that
libusbgx[1] that we use to configure a USB NCM gadget interface[2][3] just
hang completely at boot.
This is happening with multiple ARM32 and ARM64 i.MX SOC (i.MX6, i.MX7,
i.MX8MM).
The logs is something like that
```
[* �F] A start job is running for Load def…t schema g1.schema (6s / no limit)
M[K[** �F] A start job is running for Load def…t schema g1.schema (7s / no limit)
M[K[*** �F] A start job is running for Load def…t schema g1.schema (8s / no limit)
M[K[ *** �F] A start job is running for Load def…t schema g1.schema (8s / no limit)
```
I will try to bisect this and provide more useful feedback ASAP, I
decided to not wait for it and just send this email in case someone has
some insight on what is going on.
Francesco
[1] https://github.com/linux-usb-gadgets/libusbgx
[2] https://git.toradex.com/cgit/meta-toradex-bsp-common.git/tree/recipes-suppo…
[3] https://git.toradex.com/cgit/meta-toradex-bsp-common.git/tree/recipes-suppo…
As a result of the previous two patches, there are no circumstances
in which a swapped-in page is installed in a page table without first
having arch_swap_restore() called on it. Therefore, we no longer need
the logic in set_pte_at() that restores the tags, so remove it.
Because we can now rely on the page being locked, we no longer need to
handle the case where a page is having its tags restored by multiple tasks
concurrently, so we can slightly simplify the logic in mte_restore_tags().
This patch also fixes an issue where a page can have PG_mte_tagged set
with uninitialized tags. The issue is that the mte_sync_page_tags()
function sets PG_mte_tagged if it initializes page tags. Then we
return to mte_sync_tags(), which sets PG_mte_tagged again. At best,
this is redundant. However, it is possible for mte_sync_page_tags()
to return without having initialized tags for the page, i.e. in the
case where check_swap is true (non-compound page), is_swap_pte(old_pte)
is false and pte_is_tagged is false. So at worst, we set PG_mte_tagged
on a page with uninitialized tags. This can happen if, for example,
page migration causes a PTE for an untagged page to be replaced. If the
userspace program subsequently uses mprotect() to enable PROT_MTE for
that page, the uninitialized tags will be exposed to userspace.
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Link: https://linux-review.googlesource.com/id/I8ad54476f3b2d0144ccd8ce0c1d7a2963…
Fixes: e059853d14ca ("arm64: mte: Fix/clarify the PG_mte_tagged semantics")
Cc: <stable(a)vger.kernel.org> # 6.1
---
The Fixes: tag (and the commit message in general) are written assuming
that this patch is landed in a maintainer tree instead of
"arm64: mte: Do not set PG_mte_tagged if tags were not initialized".
arch/arm64/include/asm/mte.h | 4 ++--
arch/arm64/include/asm/pgtable.h | 14 ++------------
arch/arm64/kernel/mte.c | 32 +++-----------------------------
arch/arm64/mm/mteswap.c | 7 +++----
4 files changed, 10 insertions(+), 47 deletions(-)
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 20dd06d70af5..dfea486a6a85 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -90,7 +90,7 @@ static inline bool try_page_mte_tagging(struct page *page)
}
void mte_zero_clear_page_tags(void *addr);
-void mte_sync_tags(pte_t old_pte, pte_t pte);
+void mte_sync_tags(pte_t pte);
void mte_copy_page_tags(void *kto, const void *kfrom);
void mte_thread_init_user(void);
void mte_thread_switch(struct task_struct *next);
@@ -122,7 +122,7 @@ static inline bool try_page_mte_tagging(struct page *page)
static inline void mte_zero_clear_page_tags(void *addr)
{
}
-static inline void mte_sync_tags(pte_t old_pte, pte_t pte)
+static inline void mte_sync_tags(pte_t pte)
{
}
static inline void mte_copy_page_tags(void *kto, const void *kfrom)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b6ba466e2e8a..efdf48392026 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -337,18 +337,8 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
* don't expose tags (instruction fetches don't check tags).
*/
if (system_supports_mte() && pte_access_permitted(pte, false) &&
- !pte_special(pte)) {
- pte_t old_pte = READ_ONCE(*ptep);
- /*
- * We only need to synchronise if the new PTE has tags enabled
- * or if swapping in (in which case another mapping may have
- * set tags in the past even if this PTE isn't tagged).
- * (!pte_none() && !pte_present()) is an open coded version of
- * is_swap_pte()
- */
- if (pte_tagged(pte) || (!pte_none(old_pte) && !pte_present(old_pte)))
- mte_sync_tags(old_pte, pte);
- }
+ !pte_special(pte) && pte_tagged(pte))
+ mte_sync_tags(pte);
__check_safe_pte_update(mm, ptep, pte);
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index f5bcb0dc6267..c40728046fed 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -35,41 +35,15 @@ DEFINE_STATIC_KEY_FALSE(mte_async_or_asymm_mode);
EXPORT_SYMBOL_GPL(mte_async_or_asymm_mode);
#endif
-static void mte_sync_page_tags(struct page *page, pte_t old_pte,
- bool check_swap, bool pte_is_tagged)
-{
- if (check_swap && is_swap_pte(old_pte)) {
- swp_entry_t entry = pte_to_swp_entry(old_pte);
-
- if (!non_swap_entry(entry))
- mte_restore_tags(entry, page);
- }
-
- if (!pte_is_tagged)
- return;
-
- if (try_page_mte_tagging(page)) {
- mte_clear_page_tags(page_address(page));
- set_page_mte_tagged(page);
- }
-}
-
-void mte_sync_tags(pte_t old_pte, pte_t pte)
+void mte_sync_tags(pte_t pte)
{
struct page *page = pte_page(pte);
long i, nr_pages = compound_nr(page);
- bool check_swap = nr_pages == 1;
- bool pte_is_tagged = pte_tagged(pte);
-
- /* Early out if there's nothing to do */
- if (!check_swap && !pte_is_tagged)
- return;
/* if PG_mte_tagged is set, tags have already been initialised */
for (i = 0; i < nr_pages; i++, page++) {
- if (!page_mte_tagged(page)) {
- mte_sync_page_tags(page, old_pte, check_swap,
- pte_is_tagged);
+ if (try_page_mte_tagging(page)) {
+ mte_clear_page_tags(page_address(page));
set_page_mte_tagged(page);
}
}
diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
index cd508ba80ab1..3a78bf1b1364 100644
--- a/arch/arm64/mm/mteswap.c
+++ b/arch/arm64/mm/mteswap.c
@@ -53,10 +53,9 @@ void mte_restore_tags(swp_entry_t entry, struct page *page)
if (!tags)
return;
- if (try_page_mte_tagging(page)) {
- mte_restore_page_tags(page_address(page), tags);
- set_page_mte_tagged(page);
- }
+ WARN_ON_ONCE(!try_page_mte_tagging(page));
+ mte_restore_page_tags(page_address(page), tags);
+ set_page_mte_tagged(page);
}
void mte_invalidate_tags(int type, pgoff_t offset)
--
2.40.1.606.ga4b1b128d6-goog
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 3ce29c17dc847bf4245e16aad78a7617afa96297
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050749-deskwork-snowboard-82cf@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
3ce29c17dc84 ("igc: read before write to SRRCTL register")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ce29c17dc847bf4245e16aad78a7617afa96297 Mon Sep 17 00:00:00 2001
From: Song Yoong Siang <yoong.siang.song(a)intel.com>
Date: Tue, 2 May 2023 08:48:06 -0700
Subject: [PATCH] igc: read before write to SRRCTL register
igc_configure_rx_ring() function will be called as part of XDP program
setup. If Rx hardware timestamp is enabled prio to XDP program setup,
this timestamp enablement will be overwritten when buffer size is
written into SRRCTL register.
Thus, this commit read the register value before write to SRRCTL
register. This commit is tested by using xdp_hw_metadata bpf selftest
tool. The tool enables Rx hardware timestamp and then attach XDP program
to igc driver. It will display hardware timestamp of UDP packet with
port number 9092. Below are detail of test steps and results.
Command on DUT:
sudo ./xdp_hw_metadata <interface name>
Command on Link Partner:
echo -n skb | nc -u -q1 <destination IPv4 addr> 9092
Result before this patch:
skb hwtstamp is not found!
Result after this patch:
found skb hwtstamp = 1677800973.642836757
Optionally, read PHC to confirm the values obtained are almost the same:
Command:
sudo ./testptp -d /dev/ptp0 -g
Result:
clock time: 1677800973.913598978 or Fri Mar 3 07:49:33 2023
Fixes: fc9df2a0b520 ("igc: Enable RX via AF_XDP zero-copy")
Cc: <stable(a)vger.kernel.org> # 5.14+
Signed-off-by: Song Yoong Siang <yoong.siang.song(a)intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller(a)intel.com>
Reviewed-by: Jesper Dangaard Brouer <brouer(a)redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer(a)redhat.com>
Tested-by: Naama Meir <naamax.meir(a)linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Reviewed-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/intel/igc/igc_base.h b/drivers/net/ethernet/intel/igc/igc_base.h
index 7a992befca24..9f3827eda157 100644
--- a/drivers/net/ethernet/intel/igc/igc_base.h
+++ b/drivers/net/ethernet/intel/igc/igc_base.h
@@ -87,8 +87,13 @@ union igc_adv_rx_desc {
#define IGC_RXDCTL_SWFLUSH 0x04000000 /* Receive Software Flush */
/* SRRCTL bit definitions */
-#define IGC_SRRCTL_BSIZEPKT_SHIFT 10 /* Shift _right_ */
-#define IGC_SRRCTL_BSIZEHDRSIZE_SHIFT 2 /* Shift _left_ */
-#define IGC_SRRCTL_DESCTYPE_ADV_ONEBUF 0x02000000
+#define IGC_SRRCTL_BSIZEPKT_MASK GENMASK(6, 0)
+#define IGC_SRRCTL_BSIZEPKT(x) FIELD_PREP(IGC_SRRCTL_BSIZEPKT_MASK, \
+ (x) / 1024) /* in 1 KB resolution */
+#define IGC_SRRCTL_BSIZEHDR_MASK GENMASK(13, 8)
+#define IGC_SRRCTL_BSIZEHDR(x) FIELD_PREP(IGC_SRRCTL_BSIZEHDR_MASK, \
+ (x) / 64) /* in 64 bytes resolution */
+#define IGC_SRRCTL_DESCTYPE_MASK GENMASK(27, 25)
+#define IGC_SRRCTL_DESCTYPE_ADV_ONEBUF FIELD_PREP(IGC_SRRCTL_DESCTYPE_MASK, 1)
#endif /* _IGC_BASE_H */
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index ba49728be919..1c4676882082 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -640,8 +640,11 @@ static void igc_configure_rx_ring(struct igc_adapter *adapter,
else
buf_size = IGC_RXBUFFER_2048;
- srrctl = IGC_RX_HDR_LEN << IGC_SRRCTL_BSIZEHDRSIZE_SHIFT;
- srrctl |= buf_size >> IGC_SRRCTL_BSIZEPKT_SHIFT;
+ srrctl = rd32(IGC_SRRCTL(reg_idx));
+ srrctl &= ~(IGC_SRRCTL_BSIZEPKT_MASK | IGC_SRRCTL_BSIZEHDR_MASK |
+ IGC_SRRCTL_DESCTYPE_MASK);
+ srrctl |= IGC_SRRCTL_BSIZEHDR(IGC_RX_HDR_LEN);
+ srrctl |= IGC_SRRCTL_BSIZEPKT(buf_size);
srrctl |= IGC_SRRCTL_DESCTYPE_ADV_ONEBUF;
wr32(IGC_SRRCTL(reg_idx), srrctl);
This is a backport of the CR0.WP KVM series[1] to Linux v5.15. It
differs from the v6.1 backport as in needing additional prerequisite
patches from Lai Jiangshan (and fixes for those) to ensure the
assumption it's safe to let CR0.WP be a guest owned bit still stand.
I used 'ssdd 10 50000' from rt-tests[2] as a micro-benchmark, running on
a grsecurity L1 VM. Below table shows the results (runtime in seconds,
lower is better):
legacy TDP shadow
Linux v5.15.106 9.94s 66.1s 64.9s
+ patches 4.81s 4.79s 64.6s
It's interesting to see that using the TDP MMU is even slower than
shadow paging on a vanilla kernel, making the impact of this backport
even more significant.
The KVM unit test suite showed no regressions.
Please consider applying.
Thanks,
Mathias
[1] https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/
[2] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
Lai Jiangshan (3):
KVM: X86: Don't reset mmu context when X86_CR4_PCIDE 1->0
KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE
KVM: x86/mmu: Reconstruct shadow page root if the guest PDPTEs is
changed
Mathias Krause (3):
KVM: x86: Do not unload MMU roots when only toggling CR0.WP with TDP
enabled
KVM: x86: Make use of kvm_read_cr*_bits() when testing bits
KVM: VMX: Make CR0.WP a guest owned bit
Paolo Bonzini (1):
KVM: x86/mmu: Avoid indirect call for get_cr3
Sean Christopherson (1):
KVM: x86/mmu: Refresh CR0.WP prior to checking for emulated permission
faults
arch/x86/kvm/kvm_cache_regs.h | 2 +-
arch/x86/kvm/mmu.h | 42 ++++++++++++++++++++++++++++++----
arch/x86/kvm/mmu/mmu.c | 27 +++++++++++++++++-----
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
arch/x86/kvm/pmu.c | 4 ++--
arch/x86/kvm/vmx/nested.c | 4 ++--
arch/x86/kvm/vmx/vmx.c | 6 ++---
arch/x86/kvm/vmx/vmx.h | 18 +++++++++++++++
arch/x86/kvm/x86.c | 27 +++++++++++++++++++---
9 files changed, 110 insertions(+), 22 deletions(-)
--
2.39.2
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
The control message provided by J1939 support MSG_CMSG_COMPAT but
blocked recvmsg() syscalls that have set this flag, i.e. on 32bit user
space on 64 bit kernels.
Link: https://github.com/hartkopp/can-isotp/issues/59
Cc: Oleksij Rempel <o.rempel(a)pengutronix.de>
Suggested-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Tested-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Link: https://lore.kernel.org/20230505110308.81087-3-mkl@pengutronix.de
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
net/can/j1939/socket.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c
index 7e90f9e61d9b..1790469b2580 100644
--- a/net/can/j1939/socket.c
+++ b/net/can/j1939/socket.c
@@ -798,7 +798,7 @@ static int j1939_sk_recvmsg(struct socket *sock, struct msghdr *msg,
struct j1939_sk_buff_cb *skcb;
int ret = 0;
- if (flags & ~(MSG_DONTWAIT | MSG_ERRQUEUE))
+ if (flags & ~(MSG_DONTWAIT | MSG_ERRQUEUE | MSG_CMSG_COMPAT))
return -EINVAL;
if (flags & MSG_ERRQUEUE)
--
2.39.2
The patch titled
Subject: nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-incomplete-buffer-cleanup-in-nilfs_btnode_abort_change_key.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()
Date: Sat, 13 May 2023 19:24:28 +0900
A syzbot fault injection test reported that nilfs_btnode_create_block, a
helper function that allocates a new node block for b-trees, causes a
kernel BUG for disk images where the file system block size is smaller
than the page size.
This was due to unexpected flags on the newly allocated buffer head, and
it turned out to be because the buffer flags were not cleared by
nilfs_btnode_abort_change_key() after an error occurred during a b-tree
update operation and the buffer was later reused in that state.
Fix this issue by using nilfs_btnode_delete() to abandon the unused
preallocated buffer in nilfs_btnode_abort_change_key().
Link: https://lkml.kernel.org/r/20230513102428.10223-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+b0a35a5c1f7e846d3b09(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000d1d6c205ebc4d512@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/btnode.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
--- a/fs/nilfs2/btnode.c~nilfs2-fix-incomplete-buffer-cleanup-in-nilfs_btnode_abort_change_key
+++ a/fs/nilfs2/btnode.c
@@ -285,6 +285,14 @@ void nilfs_btnode_abort_change_key(struc
if (nbh == NULL) { /* blocksize == pagesize */
xa_erase_irq(&btnc->i_pages, newkey);
unlock_page(ctxt->bh->b_page);
- } else
- brelse(nbh);
+ } else {
+ /*
+ * When canceling a buffer that a prepare operation has
+ * allocated to copy a node block to another location, use
+ * nilfs_btnode_delete() to initialize and release the buffer
+ * so that the buffer flags will not be in an inconsistent
+ * state when it is reallocated.
+ */
+ nilfs_btnode_delete(nbh);
+ }
}
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-use-after-free-bug-of-nilfs_root-in-nilfs_evict_inode.patch
nilfs2-fix-incomplete-buffer-cleanup-in-nilfs_btnode_abort_change_key.patch
The probe() id argument may be NULL in 2 scenarios:
1. brcmf_pcie_pm_leave_D3() calling brcmf_pcie_probe() to reprobe
the device.
2. If a user tries to manually bind the driver from sysfs then the sdio /
pcie / usb probe() function gets called with NULL as id argument.
1. Is being hit by users causing the following oops on resume and causing
wifi to stop working:
BUG: kernel NULL pointer dereference, address: 0000000000000018
<snip>
Hardware name: Dell Inc. XPS 13 9350/0PWNCR, BIDS 1.13.0 02/10/2020
Workgueue: events_unbound async_run_entry_fn
RIP: 0010:brcmf_pcie_probe+Ox16b/0x7a0 [brcmfmac]
<snip>
Call Trace:
<TASK>
brcmf_pcie_pm_leave_D3+0xc5/8x1a0 [brcmfmac be3b4cefca451e190fa35be8f00db1bbec293887]
? pci_pm_resume+0x5b/0xf0
? pci_legacy_resume+0x80/0x80
dpm_run_callback+0x47/0x150
device_resume+0xa2/0x1f0
async_resume+0x1d/0x30
<snip>
Fix this by checking for id being NULL.
In the PCI and USB cases try a manual lookup of the id so that manually
binding the driver through sysfs and more importantly brcmf_pcie_probe()
on resume will work.
For the SDIO case there is no helper to do a manual sdio_device_id lookup,
so just directly error out on a NULL id there.
Fixes: da6d9c8ecd00 ("wifi: brcmfmac: add firmware vendor info in driver info")
Reported-by: Felix <nimrod4garoa(a)gmail.com>
Link: https://lore.kernel.org/regressions/4ef3f252ff530cbfa336f5a0d80710020fc5cb1…
Cc: Arend van Spriel <arend.vanspriel(a)broadcom.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
Changes in v2:
- Using BRCMF_FWVENDOR_INVALID will just lead to a probe() error later on,
if NULL error out immediately instead of using BRCMF_FWVENDOR_INVALID.
---
.../net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c | 5 +++++
.../net/wireless/broadcom/brcm80211/brcmfmac/pcie.c | 11 +++++++++++
.../net/wireless/broadcom/brcm80211/brcmfmac/usb.c | 11 +++++++++++
3 files changed, 27 insertions(+)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
index 65d4799a5658..f06684f84789 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
@@ -1032,6 +1032,11 @@ static int brcmf_ops_sdio_probe(struct sdio_func *func,
struct brcmf_sdio_dev *sdiodev;
struct brcmf_bus *bus_if;
+ if (!id) {
+ dev_err(&func->dev, "Error no sdio_device_id passed for %x:%x\n", func->vendor, func->device);
+ return -ENODEV;
+ }
+
brcmf_dbg(SDIO, "Enter\n");
brcmf_dbg(SDIO, "Class=%x\n", func->class);
brcmf_dbg(SDIO, "sdio vendor ID: 0x%04x\n", func->vendor);
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
index a9b9b2dc62d4..d9ecaa0cfdc4 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
@@ -2339,6 +2339,9 @@ static void brcmf_pcie_debugfs_create(struct device *dev)
}
#endif
+/* Forward declaration for pci_match_id() call */
+static const struct pci_device_id brcmf_pcie_devid_table[];
+
static int
brcmf_pcie_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
@@ -2349,6 +2352,14 @@ brcmf_pcie_probe(struct pci_dev *pdev, const struct pci_device_id *id)
struct brcmf_core *core;
struct brcmf_bus *bus;
+ if (!id) {
+ id = pci_match_id(brcmf_pcie_devid_table, pdev);
+ if (!id) {
+ pci_err(pdev, "Error could not find pci_device_id for %x:%x\n", pdev->vendor, pdev->device);
+ return -ENODEV;
+ }
+ }
+
brcmf_dbg(PCIE, "Enter %x:%x\n", pdev->vendor, pdev->device);
ret = -ENOMEM;
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c
index 246843aeb696..2178675ae1a4 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c
@@ -1331,6 +1331,9 @@ brcmf_usb_disconnect_cb(struct brcmf_usbdev_info *devinfo)
brcmf_usb_detach(devinfo);
}
+/* Forward declaration for usb_match_id() call */
+static const struct usb_device_id brcmf_usb_devid_table[];
+
static int
brcmf_usb_probe(struct usb_interface *intf, const struct usb_device_id *id)
{
@@ -1342,6 +1345,14 @@ brcmf_usb_probe(struct usb_interface *intf, const struct usb_device_id *id)
u32 num_of_eps;
u8 endpoint_num, ep;
+ if (!id) {
+ id = usb_match_id(intf, brcmf_usb_devid_table);
+ if (!id) {
+ dev_err(&intf->dev, "Error could not find matching usb_device_id\n");
+ return -ENODEV;
+ }
+ }
+
brcmf_dbg(USB, "Enter 0x%04x:0x%04x\n", id->idVendor, id->idProduct);
devinfo = kzalloc(sizeof(*devinfo), GFP_ATOMIC);
--
2.40.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1007843a91909a4995ee78a538f62d8665705b66
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042446-gills-morality-d566@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
1007843a9190 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1007843a91909a4995ee78a538f62d8665705b66 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Date: Tue, 4 Apr 2023 23:31:58 +0900
Subject: [PATCH] mm/page_alloc: fix potential deadlock on zonelist_update_seq
seqlock
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.
One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
CPU0
----
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
// e.g. timer interrupt handler runs at this moment
some_timer_func() {
kmalloc(GFP_ATOMIC) {
__alloc_pages_slowpath() {
read_seqbegin(&zonelist_update_seq) {
// spins forever because zonelist_update_seq.seqcount is odd
}
}
}
}
// e.g. timer interrupt handler finishes
write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
}
This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests. But Michal Hocko does not know whether we should go with this
approach.
Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.
CPU0 CPU1
---- ----
pty_write() {
tty_insert_flip_string_and_push_buffer() {
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq);
build_zonelists() {
printk() {
vprintk() {
vprintk_default() {
vprintk_emit() {
console_unlock() {
console_flush_all() {
console_emit_next_record() {
con->write() = serial8250_console_write() {
spin_lock_irqsave(&port->lock, flags);
tty_insert_flip_string() {
tty_insert_flip_string_fixed_flag() {
__tty_buffer_request_room() {
tty_buffer_alloc() {
kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
__alloc_pages_slowpath() {
zonelist_iter_begin() {
read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
}
}
}
}
}
}
}
}
spin_unlock_irqrestore(&port->lock, flags);
// message is printed to console
spin_unlock_irqrestore(&port->lock, flags);
}
}
}
}
}
}
}
}
}
write_sequnlock(&zonelist_update_seq);
}
}
}
This deadlock scenario can be eliminated by
preventing interrupt context from calling kmalloc(GFP_ATOMIC)
and
preventing printk() from calling console_flush_all()
while zonelist_update_seq.seqcount is odd.
Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by
disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)
and
disabling synchronous printk() in order to avoid console_flush_all()
.
As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...
Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKUR…
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10(a)syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Cc: John Ogness <john.ogness(a)linutronix.de>
Cc: Patrick Daly <quic_pdaly(a)quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7136c36c5d01..e8b4f294d763 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6632,7 +6632,21 @@ static void __build_all_zonelists(void *data)
int nid;
int __maybe_unused cpu;
pg_data_t *self = data;
+ unsigned long flags;
+ /*
+ * Explicitly disable this CPU's interrupts before taking seqlock
+ * to prevent any IRQ handler from calling into the page allocator
+ * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+ */
+ local_irq_save(flags);
+ /*
+ * Explicitly disable this CPU's synchronous printk() before taking
+ * seqlock to prevent any printk() from trying to hold port->lock, for
+ * tty_insert_flip_string_and_push_buffer() on other CPU might be
+ * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
+ */
+ printk_deferred_enter();
write_seqlock(&zonelist_update_seq);
#ifdef CONFIG_NUMA
@@ -6671,6 +6685,8 @@ static void __build_all_zonelists(void *data)
}
write_sequnlock(&zonelist_update_seq);
+ printk_deferred_exit();
+ local_irq_restore(flags);
}
static noinline void __init
From: Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
(cherry picked from upstream af0e6242909c3c4297392ca3e94eff1b4db71a97)
Taking one interrupt for every byte is rather slow. Since the
controller is perfectly capable of transmitting 32 bits at a time,
change t->bits_per-word to 32 when the length is divisible by 4 and
large enough that the reduced number of interrupts easily compensates
for the one or two extra fsl_spi_setup_transfer() calls this causes.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
---
drivers/spi/spi-fsl-spi.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c
index 479d10dc6cb8..946b417f2d1c 100644
--- a/drivers/spi/spi-fsl-spi.c
+++ b/drivers/spi/spi-fsl-spi.c
@@ -357,12 +357,28 @@ static int fsl_spi_bufs(struct spi_device *spi, struct spi_transfer *t,
static int fsl_spi_do_one_msg(struct spi_master *master,
struct spi_message *m)
{
+ struct mpc8xxx_spi *mpc8xxx_spi = spi_master_get_devdata(master);
struct spi_device *spi = m->spi;
struct spi_transfer *t, *first;
unsigned int cs_change;
const int nsecs = 50;
int status;
+ /*
+ * In CPU mode, optimize large byte transfers to use larger
+ * bits_per_word values to reduce number of interrupts taken.
+ */
+ if (!(mpc8xxx_spi->flags & SPI_CPM_MODE)) {
+ list_for_each_entry(t, &m->transfers, transfer_list) {
+ if (t->len < 256 || t->bits_per_word != 8)
+ continue;
+ if ((t->len & 3) == 0)
+ t->bits_per_word = 32;
+ else if ((t->len & 1) == 0)
+ t->bits_per_word = 16;
+ }
+ }
+
/* Don't allow changes if CS is active */
first = list_first_entry(&m->transfers, struct spi_transfer,
transfer_list);
--
2.40.1
Hello,
In addition to c20c57d9868d ("spi: fsl-spi: Fix CPM/QE mode Litte
Endian") that you already applied to all stable branches, could you
please also apply:
8a5299a1278e ("spi: fsl-spi: Re-organise transfer bits_per_word adaptation")
fc96ec826bce ("spi: fsl-cpm: Use 16 bit mode for large transfers with
even size")
For 4.14 and 4.19, as prerequisit you will also need
af0e6242909c ("spi: spi-fsl-spi: automatically adapt bits-per-word in
cpu mode")
Thanks
Christophe
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 6a0c637bfee69a74c104468544d9f2a6579626d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050613-slacked-gush-009c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
6a0c637bfee6 ("bus: mhi: host: Range check CHDBOFF and ERDBOFF")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6a0c637bfee69a74c104468544d9f2a6579626d0 Mon Sep 17 00:00:00 2001
From: Jeffrey Hugo <quic_jhugo(a)quicinc.com>
Date: Fri, 24 Mar 2023 10:13:04 -0600
Subject: [PATCH] bus: mhi: host: Range check CHDBOFF and ERDBOFF
If the value read from the CHDBOFF and ERDBOFF registers is outside the
range of the MHI register space then an invalid address might be computed
which later causes a kernel panic. Range check the read value to prevent
a crash due to bad data from the device.
Fixes: 6cd330ae76ff ("bus: mhi: core: Add support for ringing channel/event ring doorbells")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jeffrey Hugo <quic_jhugo(a)quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy(a)quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani(a)kernel.org>
Link: https://lore.kernel.org/r/1679674384-27209-1-git-send-email-quic_jhugo@quic…
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
index 3d779ee6396d..b46a0821adf9 100644
--- a/drivers/bus/mhi/host/init.c
+++ b/drivers/bus/mhi/host/init.c
@@ -516,6 +516,12 @@ int mhi_init_mmio(struct mhi_controller *mhi_cntrl)
return -EIO;
}
+ if (val >= mhi_cntrl->reg_len - (8 * MHI_DEV_WAKE_DB)) {
+ dev_err(dev, "CHDB offset: 0x%x is out of range: 0x%zx\n",
+ val, mhi_cntrl->reg_len - (8 * MHI_DEV_WAKE_DB));
+ return -ERANGE;
+ }
+
/* Setup wake db */
mhi_cntrl->wake_db = base + val + (8 * MHI_DEV_WAKE_DB);
mhi_cntrl->wake_set = false;
@@ -532,6 +538,12 @@ int mhi_init_mmio(struct mhi_controller *mhi_cntrl)
return -EIO;
}
+ if (val >= mhi_cntrl->reg_len - (8 * mhi_cntrl->total_ev_rings)) {
+ dev_err(dev, "ERDB offset: 0x%x is out of range: 0x%zx\n",
+ val, mhi_cntrl->reg_len - (8 * mhi_cntrl->total_ev_rings));
+ return -ERANGE;
+ }
+
/* Setup event db address for each ev_ring */
mhi_event = mhi_cntrl->mhi_event;
for (i = 0; i < mhi_cntrl->total_ev_rings; i++, val += 8, mhi_event++) {
This incorrect tracking caused unnecessary ring expansion in some
usecases which over days of use consume a lot of memory.
xhci driver tries to keep track of free transfer blocks (TRBs) on the
ring buffer, but failed to add back some cancelled transfers that were
turned into no-op operations instead of just moving past them.
This can happen if there are several queued pending transfers which
then are cancelled in reverse order.
Solve this by counting the numer of steps we move the dequeue pointer
once we complete a transfer, and add it to the number of free trbs
instead of just adding the trb number of the current transfer.
This way we ensure we count the no-op trbs on the way as well.
Fixes: 55f6153d8cc8 ("xhci: remove extra loop in interrupt context")
Cc: stable(a)vger.kernel.org
Reported-by: Miller Hunter <MillerH(a)hearthnhome.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217242
Tested-by: Miller Hunter <MillerH(a)hearthnhome.com>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-ring.c | 29 ++++++++++++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 1ad12d5a4857..2bc82b3a2f98 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -276,6 +276,26 @@ static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring *ring,
trace_xhci_inc_enq(ring);
}
+static int xhci_num_trbs_to(struct xhci_segment *start_seg, union xhci_trb *start,
+ struct xhci_segment *end_seg, union xhci_trb *end,
+ unsigned int num_segs)
+{
+ union xhci_trb *last_on_seg;
+ int num = 0;
+ int i = 0;
+
+ do {
+ if (start_seg == end_seg && end >= start)
+ return num + (end - start);
+ last_on_seg = &start_seg->trbs[TRBS_PER_SEGMENT - 1];
+ num += last_on_seg - start;
+ start_seg = start_seg->next;
+ start = start_seg->trbs;
+ } while (i++ <= num_segs);
+
+ return -EINVAL;
+}
+
/*
* Check to see if there's room to enqueue num_trbs on the ring and make sure
* enqueue pointer will not advance into dequeue segment. See rules above.
@@ -2140,6 +2160,7 @@ static int finish_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
u32 trb_comp_code)
{
struct xhci_ep_ctx *ep_ctx;
+ int trbs_freed;
ep_ctx = xhci_get_ep_ctx(xhci, ep->vdev->out_ctx, ep->ep_index);
@@ -2209,9 +2230,15 @@ static int finish_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
}
/* Update ring dequeue pointer */
+ trbs_freed = xhci_num_trbs_to(ep_ring->deq_seg, ep_ring->dequeue,
+ td->last_trb_seg, td->last_trb,
+ ep_ring->num_segs);
+ if (trbs_freed < 0)
+ xhci_dbg(xhci, "Failed to count freed trbs at TD finish\n");
+ else
+ ep_ring->num_trbs_free += trbs_freed;
ep_ring->dequeue = td->last_trb;
ep_ring->deq_seg = td->last_trb_seg;
- ep_ring->num_trbs_free += td->num_trbs - 1;
inc_deq(xhci, ep_ring);
return xhci_td_cleanup(xhci, td, ep_ring, td->status);
--
2.25.1
From: Mario Limonciello <mario.limonciello(a)amd.com>
Donghun reports that a notebook that has an AMD Ryzen 5700U but supports
S3 has problems with USB after resuming from suspend. The issue was
bisected down to commit d1658268e439 ("usb: pci-quirks: disable D3cold on
xhci suspend for s2idle on AMD Renoir").
As this issue only happens on S3, narrow the broken D3cold quirk to only
run in s2idle.
Fixes: d1658268e439 ("usb: pci-quirks: disable D3cold on xhci suspend for s2idle on AMD Renoir")
Reported-and-tested-by: Donghun Yoon <donghun.yoon(a)lge.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-pci.c | 12 ++++++++++--
drivers/usb/host/xhci.h | 2 +-
2 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index ddb79f23fb3b..79b3691f373f 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -13,6 +13,7 @@
#include <linux/module.h>
#include <linux/acpi.h>
#include <linux/reset.h>
+#include <linux/suspend.h>
#include "xhci.h"
#include "xhci-trace.h"
@@ -387,7 +388,7 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->vendor == PCI_VENDOR_ID_AMD &&
pdev->device == PCI_DEVICE_ID_AMD_RENOIR_XHCI)
- xhci->quirks |= XHCI_BROKEN_D3COLD;
+ xhci->quirks |= XHCI_BROKEN_D3COLD_S2I;
if (pdev->vendor == PCI_VENDOR_ID_INTEL) {
xhci->quirks |= XHCI_LPM_SUPPORT;
@@ -801,9 +802,16 @@ static int xhci_pci_suspend(struct usb_hcd *hcd, bool do_wakeup)
* Systems with the TI redriver that loses port status change events
* need to have the registers polled during D3, so avoid D3cold.
*/
- if (xhci->quirks & (XHCI_COMP_MODE_QUIRK | XHCI_BROKEN_D3COLD))
+ if (xhci->quirks & XHCI_COMP_MODE_QUIRK)
pci_d3cold_disable(pdev);
+#ifdef CONFIG_SUSPEND
+ /* d3cold is broken, but only when s2idle is used */
+ if (pm_suspend_target_state == PM_SUSPEND_TO_IDLE &&
+ xhci->quirks & (XHCI_BROKEN_D3COLD_S2I))
+ pci_d3cold_disable(pdev);
+#endif
+
if (xhci->quirks & XHCI_PME_STUCK_QUIRK)
xhci_pme_quirk(hcd);
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 08d721921b7b..6b690ec91ff3 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1901,7 +1901,7 @@ struct xhci_hcd {
#define XHCI_DISABLE_SPARSE BIT_ULL(38)
#define XHCI_SG_TRB_CACHE_SIZE_QUIRK BIT_ULL(39)
#define XHCI_NO_SOFT_RETRY BIT_ULL(40)
-#define XHCI_BROKEN_D3COLD BIT_ULL(41)
+#define XHCI_BROKEN_D3COLD_S2I BIT_ULL(41)
#define XHCI_EP_CTX_BROKEN_DCS BIT_ULL(42)
#define XHCI_SUSPEND_RESUME_CLKS BIT_ULL(43)
#define XHCI_RESET_TO_DEFAULT BIT_ULL(44)
--
2.25.1
This patch series backports a few VM preemption_status, steal_time and
PV TLB flushing fixes to 5.10 stable kernel.
Most of the changes backport cleanly except i had to work around a few
because of missing support/APIs in 5.10 kernel. I have captured those in
the changelog as well in the individual patches.
Earlier patch series that i'm resending for stable.
https://lore.kernel.org/all/20220909181351.23983-1-risbhat@amazon.com/
Changelog
- Use mark_page_dirty_in_slot api without kvm argument (KVM: x86: Fix
recording of guest steal time / preempted status)
- Avoid checking for xen_msr and SEV-ES conditions (KVM: x86:
do not set st->preempted when going back to user space)
- Use VCPU_STAT macro to expose preemption_reported and
preemption_other fields (KVM: x86: do not report a vCPU as preempted
outside instruction boundaries)
David Woodhouse (2):
KVM: x86: Fix recording of guest steal time / preempted status
KVM: Fix steal time asm constraints
Lai Jiangshan (1):
KVM: x86: Ensure PV TLB flush tracepoint reflects KVM behavior
Paolo Bonzini (5):
KVM: x86: do not set st->preempted when going back to user space
KVM: x86: do not report a vCPU as preempted outside instruction
boundaries
KVM: x86: revalidate steal time cache if MSR value changes
KVM: x86: do not report preemption if the steal time cache is stale
KVM: x86: move guest_pv_has out of user_access section
Sean Christopherson (1):
KVM: x86: Remove obsolete disabling of page faults in
kvm_arch_vcpu_put()
arch/x86/include/asm/kvm_host.h | 5 +-
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/vmx/vmx.c | 1 +
arch/x86/kvm/x86.c | 164 ++++++++++++++++++++++----------
4 files changed, 122 insertions(+), 50 deletions(-)
--
2.39.2
As an alternative to backporting part of a large s390 patch series to the
6.3-stable tree as dependencies, here are just couple of rebased changes.
Avoids the need for:
Patch "s390/boot: rework decompressor reserved tracking" has been added to the 6.3-stable tree
Patch "s390/boot: rename mem_detect to physmem_info" has been added to the 6.3-stable tree
Patch "s390/boot: remove non-functioning image bootable check" has been added to the 6.3-stable tree
Thank you
Heiko Carstens (2):
s390/mm: rename POPULATE_ONE2ONE to POPULATE_DIRECT
s390/mm: fix direct map accounting
arch/s390/boot/vmem.c | 27 ++++++++++++++++++++-------
arch/s390/include/asm/pgtable.h | 2 +-
arch/s390/mm/pageattr.c | 2 +-
3 files changed, 22 insertions(+), 9 deletions(-)
--
2.38.1
This triggers a -Wdeclaration-after-statement as the code has changed a
bit since upstream. It might be better to hoist the whole block up, but
this is a smaller change so I went with it.
arch/riscv/mm/init.c:755:16: warning: mixing declarations and code is a C99 extension [-Wdeclaration-after-statement]
unsigned long idx = pgd_index(__fix_to_virt(FIX_FDT));
^
1 warning generated.
Fixes: bbf94b042155 ("riscv: Move early dtb mapping into the fixmap region")
Reported-by: kernel test robot <lkp(a)intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304300429.SXZOA5up-lkp@intel.com/
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
---
v2:
- Fix rv64 warning introduced by the v1
- Add Fixes tag
arch/riscv/mm/init.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index e800d7981e99..a382623e91cf 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -716,6 +716,7 @@ static void __init setup_vm_final(void)
{
uintptr_t va, map_size;
phys_addr_t pa, start, end;
+ unsigned long idx __maybe_unused;
u64 i;
/**
@@ -735,7 +736,7 @@ static void __init setup_vm_final(void)
* directly in swapper_pg_dir in addition to the pgd entry that points
* to fixmap_pte.
*/
- unsigned long idx = pgd_index(__fix_to_virt(FIX_FDT));
+ idx = pgd_index(__fix_to_virt(FIX_FDT));
set_pgd(&swapper_pg_dir[idx], early_pg_dir[idx]);
#endif
--
2.37.2
From: Hans de Goede <hdegoede(a)redhat.com>
commit 085a9f43433f30cbe8a1ade62d9d7827c3217f4d upstream.
Use down_read_nested() and down_write_nested() when taking the
ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in
the path to the PCI root bus as lock subclass parameter.
This fixes the following false-positive lockdep report when unplugging a
Lenovo X1C8 from a Lenovo 2nd gen TB3 dock:
pcieport 0000:06:01.0: pciehp: Slot(1): Link Down
pcieport 0000:06:01.0: pciehp: Slot(1): Card not present
============================================
WARNING: possible recursive locking detected
5.16.0-rc2+ #621 Not tainted
--------------------------------------------
irq/124-pciehp/86 is trying to acquire lock:
ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80
but task is already holding lock:
ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&ctrl->reset_lock);
lock(&ctrl->reset_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by irq/124-pciehp/86:
#0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
#1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110
#2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40
stack backtrace:
CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621
Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021
Call Trace:
<TASK>
dump_stack_lvl+0x59/0x73
__lock_acquire.cold+0xc5/0x2c6
lock_acquire+0xb5/0x2b0
down_read+0x3e/0x50
pciehp_check_presence+0x23/0x80
pciehp_runtime_resume+0x5c/0xa0
device_for_each_child+0x45/0x70
pcie_port_device_runtime_resume+0x20/0x30
pci_pm_runtime_resume+0xa7/0xc0
__rpm_callback+0x41/0x110
rpm_callback+0x59/0x70
rpm_resume+0x512/0x7b0
__pm_runtime_resume+0x4a/0x90
__device_release_driver+0x28/0x240
device_release_driver+0x26/0x40
pci_stop_bus_device+0x68/0x90
pci_stop_bus_device+0x2c/0x90
pci_stop_and_remove_bus_device+0xe/0x20
pciehp_unconfigure_device+0x6c/0x110
pciehp_disable_slot+0x5b/0xe0
pciehp_handle_presence_or_link_change+0xc3/0x2f0
pciehp_ist+0x179/0x180
This lockdep warning is triggered because with Thunderbolt, hotplug ports
are nested. When removing multiple devices in a daisy-chain, each hotplug
port's reset_lock may be acquired recursively. It's never the same lock, so
the lockdep splat is a false positive.
Because locks at the same hierarchy level are never acquired recursively, a
per-level lockdep class is sufficient to fix the lockdep warning.
The choice to use one lockdep subclass per pcie-hotplug controller in the
path to the root-bus was made to conserve class keys because their number
is limited and the complexity grows quadratically with number of keys
according to Documentation/locking/lockdep-design.rst.
Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/
Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redh…
Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855
Reported-by: "Theodore Ts'o" <tytso(a)mit.edu>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Lukas Wunner <lukas(a)wunner.de>
Cc: stable(a)vger.kernel.org
[lukas: backport to v5.4-stable]
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
---
drivers/pci/hotplug/pciehp.h | 3 +++
drivers/pci/hotplug/pciehp_core.c | 2 +-
drivers/pci/hotplug/pciehp_hpc.c | 19 +++++++++++++++++--
3 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index aa61d4c219d7..79a713f5dbf4 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -72,6 +72,8 @@ extern int pciehp_poll_time;
* @reset_lock: prevents access to the Data Link Layer Link Active bit in the
* Link Status register and to the Presence Detect State bit in the Slot
* Status register during a slot reset which may cause them to flap
+ * @depth: Number of additional hotplug ports in the path to the root bus,
+ * used as lock subclass for @reset_lock
* @ist_running: flag to keep user request waiting while IRQ thread is running
* @request_result: result of last user request submitted to the IRQ thread
* @requester: wait queue to wake up on completion of user request,
@@ -102,6 +104,7 @@ struct controller {
struct hotplug_slot hotplug_slot; /* hotplug core interface */
struct rw_semaphore reset_lock;
+ unsigned int depth;
unsigned int ist_running;
int request_result;
wait_queue_head_t requester;
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 312cc45c44c7..f5255045b149 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -165,7 +165,7 @@ static void pciehp_check_presence(struct controller *ctrl)
{
int occupied;
- down_read(&ctrl->reset_lock);
+ down_read_nested(&ctrl->reset_lock, ctrl->depth);
mutex_lock(&ctrl->state_lock);
occupied = pciehp_card_present_or_link_active(ctrl);
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 13f3bc239c66..651664fe4058 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -674,7 +674,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
* Disable requests have higher priority than Presence Detect Changed
* or Data Link Layer State Changed events.
*/
- down_read(&ctrl->reset_lock);
+ down_read_nested(&ctrl->reset_lock, ctrl->depth);
if (events & DISABLE_SLOT)
pciehp_handle_disable_request(ctrl);
else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
@@ -808,7 +808,7 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, int probe)
if (probe)
return 0;
- down_write(&ctrl->reset_lock);
+ down_write_nested(&ctrl->reset_lock, ctrl->depth);
if (!ATTN_BUTTN(ctrl)) {
ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
@@ -864,6 +864,20 @@ static inline void dbg_ctrl(struct controller *ctrl)
#define FLAG(x, y) (((x) & (y)) ? '+' : '-')
+static inline int pcie_hotplug_depth(struct pci_dev *dev)
+{
+ struct pci_bus *bus = dev->bus;
+ int depth = 0;
+
+ while (bus->parent) {
+ bus = bus->parent;
+ if (bus->self && bus->self->is_hotplug_bridge)
+ depth++;
+ }
+
+ return depth;
+}
+
struct controller *pcie_init(struct pcie_device *dev)
{
struct controller *ctrl;
@@ -877,6 +891,7 @@ struct controller *pcie_init(struct pcie_device *dev)
return NULL;
ctrl->pcie = dev;
+ ctrl->depth = pcie_hotplug_depth(dev->port);
pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &slot_cap);
if (pdev->hotplug_user_indicators)
--
2.39.2
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 3899d94e3831ee07ea6821c032dc297aec80586a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050708-absently-subsidy-099a@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
3899d94e3831 ("drbd: correctly submit flush bio on barrier")
07888c665b40 ("block: pass a block_device and opf to bio_alloc")
b77c88c2100c ("block: pass a block_device and opf to bio_alloc_kiocb")
609be1066731 ("block: pass a block_device and opf to bio_alloc_bioset")
0a3140ea0fae ("block: pass a block_device and opf to blk_next_bio")
3b005bf6acf0 ("block: move blk_next_bio to bio.c")
7d8d0c658d48 ("xen-blkback: bio_alloc can't fail if it is allow to sleep")
d7b78de2b155 ("rnbd-srv: remove struct rnbd_dev_blk_io")
1fe0640ff94f ("rnbd-srv: simplify bio mapping in process_rdma")
4b1dc86d1857 ("drbd: bio_alloc can't fail if it is allow to sleep")
3f868c09ea8f ("dm-crypt: remove clone_init")
53db984e004c ("dm: bio_alloc can't fail if it is allowed to sleep")
39146b6f66ba ("ntfs3: remove ntfs_alloc_bio")
5d2ca2132f88 ("nfs/blocklayout: remove bl_alloc_init_bio")
f0d911927b3c ("nilfs2: remove nilfs_alloc_seg_bio")
d5f68a42da7a ("fs: remove mpage_alloc")
ae4c81644e91 ("RDMA/rtrs-srv: Rename rtrs_srv_sess to rtrs_srv_path")
d9372794717f ("RDMA/rtrs: Rename rtrs_sess to rtrs_path")
512b7931ad05 ("Merge branch 'akpm' (patches from Andrew)")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3899d94e3831ee07ea6821c032dc297aec80586a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoph=20B=C3=B6hmwalder?=
<christoph.boehmwalder(a)linbit.com>
Date: Wed, 3 May 2023 14:19:37 +0200
Subject: [PATCH] drbd: correctly submit flush bio on barrier
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When we receive a flush command (or "barrier" in DRBD), we currently use
a REQ_OP_FLUSH with the REQ_PREFLUSH flag set.
The correct way to submit a flush bio is by using a REQ_OP_WRITE without
any data, and set the REQ_PREFLUSH flag.
Since commit b4a6bb3a67aa ("block: add a sanity check for non-write
flush/fua bios"), this triggers a warning in the block layer, but this
has been broken for quite some time before that.
So use the correct set of flags to actually make the flush happen.
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: stable(a)vger.kernel.org
Fixes: f9ff0da56437 ("drbd: allow parallel flushes for multi-volume resources")
Reported-by: Thomas Voegtle <tv(a)lio96.de>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder(a)linbit.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/20230503121937.17232-1-christoph.boehmwalder@linb…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index e54404c632e7..34b112752ab1 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1283,7 +1283,7 @@ static void one_flush_endio(struct bio *bio)
static void submit_one_flush(struct drbd_device *device, struct issue_flush_context *ctx)
{
struct bio *bio = bio_alloc(device->ldev->backing_bdev, 0,
- REQ_OP_FLUSH | REQ_PREFLUSH, GFP_NOIO);
+ REQ_OP_WRITE | REQ_PREFLUSH, GFP_NOIO);
struct one_flush_context *octx = kmalloc(sizeof(*octx), GFP_NOIO);
if (!octx) {
[AMD Official Use Only - General]
+stable, Sasha
> > Together with this patch there are now at least two regressions if
> > -rc1 whch could have been avoided and may impact testability on
> > affected systems.
>
> Are you saying that this patch which fixes s2idle on some random box
> should've gone to Linus *immediately*?
>
> And read my mail again:
>
> "Some fixes need longer testing because there have been cases where
> a fix breaks something else."
>
> So yes, I disagree with rushing fixes immediately. If they're obvious
> - whatever that means - then sure but not all of them are such.
>
> --
Unfortunately, it looks like the broken commit got backported into 6.1.28,
but the fix still isn't in Linus' tree.
Sasha,
Can you please pick up
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/u…
for 6.1.29 to fix the regression?
Thanks,
If userspace races tcsetattr() with a write, the drained condition
might not be guaranteed by the kernel. There is a race window after
checking Tx is empty before tty_set_termios() takes termios_rwsem for
write. During that race window, more characters can be queued by a
racing writer.
Any ongoing transmission might produce garbage during HW's
->set_termios() call. The intent of TCSADRAIN/FLUSH seems to be
preventing such a character corruption. If those flags are set, take
tty's write lock to stop any writer before performing the lower layer
Tx empty check and wait for the pending characters to be sent (if any).
The initial wait for all-writers-done must be placed outside of tty's
write lock to avoid deadlock which makes it impossible to use
tty_wait_until_sent(). The write lock is retried if a racing write is
detected.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Link: https://lore.kernel.org/r/20230317113318.31327-2-ilpo.jarvinen@linux.intel.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
(cherry picked from commit 094fb49a2d0d6827c86d2e0840873e6db0c491d2)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
drivers/tty/tty_io.c | 4 ++--
drivers/tty/tty_ioctl.c | 45 ++++++++++++++++++++++++++++++-----------
include/linux/tty.h | 2 ++
3 files changed, 37 insertions(+), 14 deletions(-)
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index c37d2657308c..8cc45f2b3a17 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -941,13 +941,13 @@ static ssize_t tty_read(struct kiocb *iocb, struct iov_iter *to)
return i;
}
-static void tty_write_unlock(struct tty_struct *tty)
+void tty_write_unlock(struct tty_struct *tty)
{
mutex_unlock(&tty->atomic_write_lock);
wake_up_interruptible_poll(&tty->write_wait, EPOLLOUT);
}
-static int tty_write_lock(struct tty_struct *tty, int ndelay)
+int tty_write_lock(struct tty_struct *tty, int ndelay)
{
if (!mutex_trylock(&tty->atomic_write_lock)) {
if (ndelay)
diff --git a/drivers/tty/tty_ioctl.c b/drivers/tty/tty_ioctl.c
index 803da2d111c8..fa81ff535f92 100644
--- a/drivers/tty/tty_ioctl.c
+++ b/drivers/tty/tty_ioctl.c
@@ -397,21 +397,42 @@ static int set_termios(struct tty_struct *tty, void __user *arg, int opt)
tmp_termios.c_ispeed = tty_termios_input_baud_rate(&tmp_termios);
tmp_termios.c_ospeed = tty_termios_baud_rate(&tmp_termios);
- ld = tty_ldisc_ref(tty);
+ if (opt & (TERMIOS_FLUSH|TERMIOS_WAIT)) {
+retry_write_wait:
+ retval = wait_event_interruptible(tty->write_wait, !tty_chars_in_buffer(tty));
+ if (retval < 0)
+ return retval;
- if (ld != NULL) {
- if ((opt & TERMIOS_FLUSH) && ld->ops->flush_buffer)
- ld->ops->flush_buffer(tty);
- tty_ldisc_deref(ld);
- }
+ if (tty_write_lock(tty, 0) < 0)
+ goto retry_write_wait;
- if (opt & TERMIOS_WAIT) {
- tty_wait_until_sent(tty, 0);
- if (signal_pending(current))
- return -ERESTARTSYS;
- }
+ /* Racing writer? */
+ if (tty_chars_in_buffer(tty)) {
+ tty_write_unlock(tty);
+ goto retry_write_wait;
+ }
- tty_set_termios(tty, &tmp_termios);
+ ld = tty_ldisc_ref(tty);
+ if (ld != NULL) {
+ if ((opt & TERMIOS_FLUSH) && ld->ops->flush_buffer)
+ ld->ops->flush_buffer(tty);
+ tty_ldisc_deref(ld);
+ }
+
+ if ((opt & TERMIOS_WAIT) && tty->ops->wait_until_sent) {
+ tty->ops->wait_until_sent(tty, 0);
+ if (signal_pending(current)) {
+ tty_write_unlock(tty);
+ return -ERESTARTSYS;
+ }
+ }
+
+ tty_set_termios(tty, &tmp_termios);
+
+ tty_write_unlock(tty);
+ } else {
+ tty_set_termios(tty, &tmp_termios);
+ }
/* FIXME: Arguably if tmp_termios == tty->termios AND the
actual requested termios was not tmp_termios then we may
diff --git a/include/linux/tty.h b/include/linux/tty.h
index 5972f43b9d5a..ba33dd7c0847 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -480,6 +480,8 @@ extern void __stop_tty(struct tty_struct *tty);
extern void stop_tty(struct tty_struct *tty);
extern void __start_tty(struct tty_struct *tty);
extern void start_tty(struct tty_struct *tty);
+void tty_write_unlock(struct tty_struct *tty);
+int tty_write_lock(struct tty_struct *tty, int ndelay);
extern int tty_register_driver(struct tty_driver *driver);
extern int tty_unregister_driver(struct tty_driver *driver);
extern struct device *tty_register_device(struct tty_driver *driver,
--
2.30.2
Hi,
Could you please apply on v4.14 commit 78c855835478 ("perf bench: Share
some global variables to fix build with gcc 10") from v4.19.
Upstream commit is e4d9b04b973b ("perf bench: Share some global
variables to fix build with gcc 10")
Thanks
Christophe
While testing region autodetect with physical hardware a few fixes fell
out. The most interesting being evidence that a device is sensitive to
8-byte reads of 2 consecutive 4-byte registers. The other is a long
reported issue by Jonathan on how "passthrough" decoders are detected,
and having an example with physical hardware to reinforce the
observation from QEMU.
The rest are ancillary fixes and new debug messages.
---
Dan Williams (5):
cxl/hdm: Fail upon detecting 0-sized decoders
cxl/hdm: Use 4-byte reads to retrieve HDM decoder base+limit
cxl/core: Drop unused io-64-nonatomic-lo-hi.h
cxl/port: Scan single-target ports for decoders
cxl/hdm: Add more HDM decoder debug messages at startup
drivers/cxl/core/hdm.c | 52 ++++++++++++++++++++++++++++++++++++-----------
drivers/cxl/core/mbox.c | 1 -
drivers/cxl/core/port.c | 1 -
drivers/cxl/port.c | 18 ++++++++++++----
4 files changed, 53 insertions(+), 19 deletions(-)
base-commit: 24b18197184ac39bb8566fb82c0bf788bcd0d45b
Dear,
Please grant me permission to share a very crucial discussion with
you.I am looking forward to hearing from you at your earliest
convenience.
Mrs. Mariam Kouame
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x e8c5d45f82ce0c238a4817739892fe8897a3dcc3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050701-epileptic-unethical-f46c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
e8c5d45f82ce ("dm verity: fix error handling for check_at_most_once on FEC")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e8c5d45f82ce0c238a4817739892fe8897a3dcc3 Mon Sep 17 00:00:00 2001
From: Yeongjin Gil <youngjin.gil(a)samsung.com>
Date: Mon, 20 Mar 2023 15:59:32 +0900
Subject: [PATCH] dm verity: fix error handling for check_at_most_once on FEC
In verity_end_io(), if bi_status is not BLK_STS_OK, it can be return
directly. But if FEC configured, it is desired to correct the data page
through verity_verify_io. And the return value will be converted to
blk_status and passed to verity_finish_io().
BTW, when a bit is set in v->validated_blocks, verity_verify_io() skips
verification regardless of I/O error for the corresponding bio. In this
case, the I/O error could not be returned properly, and as a result,
there is a problem that abnormal data could be read for the
corresponding block.
To fix this problem, when an I/O error occurs, do not skip verification
even if the bit related is set in v->validated_blocks.
Fixes: 843f38d382b1 ("dm verity: add 'check_at_most_once' option to only validate hashes once")
Cc: stable(a)vger.kernel.org
Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com>
Signed-off-by: Yeongjin Gil <youngjin.gil(a)samsung.com>
Signed-off-by: Mike Snitzer <snitzer(a)kernel.org>
diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index ade83ef3b439..9316399b920e 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -523,7 +523,7 @@ static int verity_verify_io(struct dm_verity_io *io)
sector_t cur_block = io->block + b;
struct ahash_request *req = verity_io_hash_req(v, io);
- if (v->validated_blocks &&
+ if (v->validated_blocks && bio->bi_status == BLK_STS_OK &&
likely(test_bit(cur_block, v->validated_blocks))) {
verity_bv_skip_block(v, io, iter);
continue;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 949f95ff39bf188e594e7ecd8e29b82eb108f5bf
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051546-either-second-fa36@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
949f95ff39bf ("ext4: fix lockdep warning when enabling MMP")
6810fad956df ("ext4: fix ext4_error_err save negative errno into superblock")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 949f95ff39bf188e594e7ecd8e29b82eb108f5bf Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Tue, 11 Apr 2023 14:10:19 +0200
Subject: [PATCH] ext4: fix lockdep warning when enabling MMP
When we enable MMP in ext4_multi_mount_protect() during mount or
remount, we end up calling sb_start_write() from write_mmp_block(). This
triggers lockdep warning because freeze protection ranks above s_umount
semaphore we are holding during mount / remount. The problem is harmless
because we are guaranteed the filesystem is not frozen during mount /
remount but still let's fix the warning by not grabbing freeze
protection from ext4_multi_mount_protect().
Cc: stable(a)kernel.org
Reported-by: syzbot+6b7df7d5506b32467149(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=ab7e5b6f400b7778d46f01841422e5718fb818…
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christian Brauner <brauner(a)kernel.org>
Link: https://lore.kernel.org/r/20230411121019.21940-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index 4022bc713421..0aaf38ffcb6e 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -39,28 +39,36 @@ static void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp)
* Write the MMP block using REQ_SYNC to try to get the block on-disk
* faster.
*/
-static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
+static int write_mmp_block_thawed(struct super_block *sb,
+ struct buffer_head *bh)
{
struct mmp_struct *mmp = (struct mmp_struct *)(bh->b_data);
- /*
- * We protect against freezing so that we don't create dirty buffers
- * on frozen filesystem.
- */
- sb_start_write(sb);
ext4_mmp_csum_set(sb, mmp);
lock_buffer(bh);
bh->b_end_io = end_buffer_write_sync;
get_bh(bh);
submit_bh(REQ_OP_WRITE | REQ_SYNC | REQ_META | REQ_PRIO, bh);
wait_on_buffer(bh);
- sb_end_write(sb);
if (unlikely(!buffer_uptodate(bh)))
return -EIO;
-
return 0;
}
+static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
+{
+ int err;
+
+ /*
+ * We protect against freezing so that we don't create dirty buffers
+ * on frozen filesystem.
+ */
+ sb_start_write(sb);
+ err = write_mmp_block_thawed(sb, bh);
+ sb_end_write(sb);
+ return err;
+}
+
/*
* Read the MMP block. It _must_ be read from disk and hence we clear the
* uptodate flag on the buffer.
@@ -344,7 +352,11 @@ int ext4_multi_mount_protect(struct super_block *sb,
seq = mmp_new_seq();
mmp->mmp_seq = cpu_to_le32(seq);
- retval = write_mmp_block(sb, bh);
+ /*
+ * On mount / remount we are protected against fs freezing (by s_umount
+ * semaphore) and grabbing freeze protection upsets lockdep
+ */
+ retval = write_mmp_block_thawed(sb, bh);
if (retval)
goto failed;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 949f95ff39bf188e594e7ecd8e29b82eb108f5bf
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051545-occupant-cottage-bfa2@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
949f95ff39bf ("ext4: fix lockdep warning when enabling MMP")
6810fad956df ("ext4: fix ext4_error_err save negative errno into superblock")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 949f95ff39bf188e594e7ecd8e29b82eb108f5bf Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Tue, 11 Apr 2023 14:10:19 +0200
Subject: [PATCH] ext4: fix lockdep warning when enabling MMP
When we enable MMP in ext4_multi_mount_protect() during mount or
remount, we end up calling sb_start_write() from write_mmp_block(). This
triggers lockdep warning because freeze protection ranks above s_umount
semaphore we are holding during mount / remount. The problem is harmless
because we are guaranteed the filesystem is not frozen during mount /
remount but still let's fix the warning by not grabbing freeze
protection from ext4_multi_mount_protect().
Cc: stable(a)kernel.org
Reported-by: syzbot+6b7df7d5506b32467149(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=ab7e5b6f400b7778d46f01841422e5718fb818…
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christian Brauner <brauner(a)kernel.org>
Link: https://lore.kernel.org/r/20230411121019.21940-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index 4022bc713421..0aaf38ffcb6e 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -39,28 +39,36 @@ static void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp)
* Write the MMP block using REQ_SYNC to try to get the block on-disk
* faster.
*/
-static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
+static int write_mmp_block_thawed(struct super_block *sb,
+ struct buffer_head *bh)
{
struct mmp_struct *mmp = (struct mmp_struct *)(bh->b_data);
- /*
- * We protect against freezing so that we don't create dirty buffers
- * on frozen filesystem.
- */
- sb_start_write(sb);
ext4_mmp_csum_set(sb, mmp);
lock_buffer(bh);
bh->b_end_io = end_buffer_write_sync;
get_bh(bh);
submit_bh(REQ_OP_WRITE | REQ_SYNC | REQ_META | REQ_PRIO, bh);
wait_on_buffer(bh);
- sb_end_write(sb);
if (unlikely(!buffer_uptodate(bh)))
return -EIO;
-
return 0;
}
+static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
+{
+ int err;
+
+ /*
+ * We protect against freezing so that we don't create dirty buffers
+ * on frozen filesystem.
+ */
+ sb_start_write(sb);
+ err = write_mmp_block_thawed(sb, bh);
+ sb_end_write(sb);
+ return err;
+}
+
/*
* Read the MMP block. It _must_ be read from disk and hence we clear the
* uptodate flag on the buffer.
@@ -344,7 +352,11 @@ int ext4_multi_mount_protect(struct super_block *sb,
seq = mmp_new_seq();
mmp->mmp_seq = cpu_to_le32(seq);
- retval = write_mmp_block(sb, bh);
+ /*
+ * On mount / remount we are protected against fs freezing (by s_umount
+ * semaphore) and grabbing freeze protection upsets lockdep
+ */
+ retval = write_mmp_block_thawed(sb, bh);
if (retval)
goto failed;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 949f95ff39bf188e594e7ecd8e29b82eb108f5bf
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051544-refold-dosage-6c5f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
949f95ff39bf ("ext4: fix lockdep warning when enabling MMP")
6810fad956df ("ext4: fix ext4_error_err save negative errno into superblock")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 949f95ff39bf188e594e7ecd8e29b82eb108f5bf Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Tue, 11 Apr 2023 14:10:19 +0200
Subject: [PATCH] ext4: fix lockdep warning when enabling MMP
When we enable MMP in ext4_multi_mount_protect() during mount or
remount, we end up calling sb_start_write() from write_mmp_block(). This
triggers lockdep warning because freeze protection ranks above s_umount
semaphore we are holding during mount / remount. The problem is harmless
because we are guaranteed the filesystem is not frozen during mount /
remount but still let's fix the warning by not grabbing freeze
protection from ext4_multi_mount_protect().
Cc: stable(a)kernel.org
Reported-by: syzbot+6b7df7d5506b32467149(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=ab7e5b6f400b7778d46f01841422e5718fb818…
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christian Brauner <brauner(a)kernel.org>
Link: https://lore.kernel.org/r/20230411121019.21940-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index 4022bc713421..0aaf38ffcb6e 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -39,28 +39,36 @@ static void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp)
* Write the MMP block using REQ_SYNC to try to get the block on-disk
* faster.
*/
-static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
+static int write_mmp_block_thawed(struct super_block *sb,
+ struct buffer_head *bh)
{
struct mmp_struct *mmp = (struct mmp_struct *)(bh->b_data);
- /*
- * We protect against freezing so that we don't create dirty buffers
- * on frozen filesystem.
- */
- sb_start_write(sb);
ext4_mmp_csum_set(sb, mmp);
lock_buffer(bh);
bh->b_end_io = end_buffer_write_sync;
get_bh(bh);
submit_bh(REQ_OP_WRITE | REQ_SYNC | REQ_META | REQ_PRIO, bh);
wait_on_buffer(bh);
- sb_end_write(sb);
if (unlikely(!buffer_uptodate(bh)))
return -EIO;
-
return 0;
}
+static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
+{
+ int err;
+
+ /*
+ * We protect against freezing so that we don't create dirty buffers
+ * on frozen filesystem.
+ */
+ sb_start_write(sb);
+ err = write_mmp_block_thawed(sb, bh);
+ sb_end_write(sb);
+ return err;
+}
+
/*
* Read the MMP block. It _must_ be read from disk and hence we clear the
* uptodate flag on the buffer.
@@ -344,7 +352,11 @@ int ext4_multi_mount_protect(struct super_block *sb,
seq = mmp_new_seq();
mmp->mmp_seq = cpu_to_le32(seq);
- retval = write_mmp_block(sb, bh);
+ /*
+ * On mount / remount we are protected against fs freezing (by s_umount
+ * semaphore) and grabbing freeze protection upsets lockdep
+ */
+ retval = write_mmp_block_thawed(sb, bh);
if (retval)
goto failed;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 949f95ff39bf188e594e7ecd8e29b82eb108f5bf
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051543-brute-atrocious-12bc@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
949f95ff39bf ("ext4: fix lockdep warning when enabling MMP")
6810fad956df ("ext4: fix ext4_error_err save negative errno into superblock")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 949f95ff39bf188e594e7ecd8e29b82eb108f5bf Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Tue, 11 Apr 2023 14:10:19 +0200
Subject: [PATCH] ext4: fix lockdep warning when enabling MMP
When we enable MMP in ext4_multi_mount_protect() during mount or
remount, we end up calling sb_start_write() from write_mmp_block(). This
triggers lockdep warning because freeze protection ranks above s_umount
semaphore we are holding during mount / remount. The problem is harmless
because we are guaranteed the filesystem is not frozen during mount /
remount but still let's fix the warning by not grabbing freeze
protection from ext4_multi_mount_protect().
Cc: stable(a)kernel.org
Reported-by: syzbot+6b7df7d5506b32467149(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=ab7e5b6f400b7778d46f01841422e5718fb818…
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christian Brauner <brauner(a)kernel.org>
Link: https://lore.kernel.org/r/20230411121019.21940-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index 4022bc713421..0aaf38ffcb6e 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -39,28 +39,36 @@ static void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp)
* Write the MMP block using REQ_SYNC to try to get the block on-disk
* faster.
*/
-static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
+static int write_mmp_block_thawed(struct super_block *sb,
+ struct buffer_head *bh)
{
struct mmp_struct *mmp = (struct mmp_struct *)(bh->b_data);
- /*
- * We protect against freezing so that we don't create dirty buffers
- * on frozen filesystem.
- */
- sb_start_write(sb);
ext4_mmp_csum_set(sb, mmp);
lock_buffer(bh);
bh->b_end_io = end_buffer_write_sync;
get_bh(bh);
submit_bh(REQ_OP_WRITE | REQ_SYNC | REQ_META | REQ_PRIO, bh);
wait_on_buffer(bh);
- sb_end_write(sb);
if (unlikely(!buffer_uptodate(bh)))
return -EIO;
-
return 0;
}
+static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
+{
+ int err;
+
+ /*
+ * We protect against freezing so that we don't create dirty buffers
+ * on frozen filesystem.
+ */
+ sb_start_write(sb);
+ err = write_mmp_block_thawed(sb, bh);
+ sb_end_write(sb);
+ return err;
+}
+
/*
* Read the MMP block. It _must_ be read from disk and hence we clear the
* uptodate flag on the buffer.
@@ -344,7 +352,11 @@ int ext4_multi_mount_protect(struct super_block *sb,
seq = mmp_new_seq();
mmp->mmp_seq = cpu_to_le32(seq);
- retval = write_mmp_block(sb, bh);
+ /*
+ * On mount / remount we are protected against fs freezing (by s_umount
+ * semaphore) and grabbing freeze protection upsets lockdep
+ */
+ retval = write_mmp_block_thawed(sb, bh);
if (retval)
goto failed;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 949f95ff39bf188e594e7ecd8e29b82eb108f5bf
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051542-twiddling-guise-b47e@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
949f95ff39bf ("ext4: fix lockdep warning when enabling MMP")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 949f95ff39bf188e594e7ecd8e29b82eb108f5bf Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Tue, 11 Apr 2023 14:10:19 +0200
Subject: [PATCH] ext4: fix lockdep warning when enabling MMP
When we enable MMP in ext4_multi_mount_protect() during mount or
remount, we end up calling sb_start_write() from write_mmp_block(). This
triggers lockdep warning because freeze protection ranks above s_umount
semaphore we are holding during mount / remount. The problem is harmless
because we are guaranteed the filesystem is not frozen during mount /
remount but still let's fix the warning by not grabbing freeze
protection from ext4_multi_mount_protect().
Cc: stable(a)kernel.org
Reported-by: syzbot+6b7df7d5506b32467149(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=ab7e5b6f400b7778d46f01841422e5718fb818…
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christian Brauner <brauner(a)kernel.org>
Link: https://lore.kernel.org/r/20230411121019.21940-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index 4022bc713421..0aaf38ffcb6e 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -39,28 +39,36 @@ static void ext4_mmp_csum_set(struct super_block *sb, struct mmp_struct *mmp)
* Write the MMP block using REQ_SYNC to try to get the block on-disk
* faster.
*/
-static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
+static int write_mmp_block_thawed(struct super_block *sb,
+ struct buffer_head *bh)
{
struct mmp_struct *mmp = (struct mmp_struct *)(bh->b_data);
- /*
- * We protect against freezing so that we don't create dirty buffers
- * on frozen filesystem.
- */
- sb_start_write(sb);
ext4_mmp_csum_set(sb, mmp);
lock_buffer(bh);
bh->b_end_io = end_buffer_write_sync;
get_bh(bh);
submit_bh(REQ_OP_WRITE | REQ_SYNC | REQ_META | REQ_PRIO, bh);
wait_on_buffer(bh);
- sb_end_write(sb);
if (unlikely(!buffer_uptodate(bh)))
return -EIO;
-
return 0;
}
+static int write_mmp_block(struct super_block *sb, struct buffer_head *bh)
+{
+ int err;
+
+ /*
+ * We protect against freezing so that we don't create dirty buffers
+ * on frozen filesystem.
+ */
+ sb_start_write(sb);
+ err = write_mmp_block_thawed(sb, bh);
+ sb_end_write(sb);
+ return err;
+}
+
/*
* Read the MMP block. It _must_ be read from disk and hence we clear the
* uptodate flag on the buffer.
@@ -344,7 +352,11 @@ int ext4_multi_mount_protect(struct super_block *sb,
seq = mmp_new_seq();
mmp->mmp_seq = cpu_to_le32(seq);
- retval = write_mmp_block(sb, bh);
+ /*
+ * On mount / remount we are protected against fs freezing (by s_umount
+ * semaphore) and grabbing freeze protection upsets lockdep
+ */
+ retval = write_mmp_block_thawed(sb, bh);
if (retval)
goto failed;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x f4ce24f54d9cca4f09a395f3eecce20d6bec4663
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051541-whacking-until-c2b2@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
f4ce24f54d9c ("ext4: fix deadlock when converting an inline directory in nojournal mode")
f036adb39976 ("ext4: rename "dirent_csum" functions to use "dirblock"")
b886ee3e778e ("ext4: Support case-insensitive file name lookups")
ee73f9a52a34 ("ext4: convert to new i_version API")
ae5e165d855d ("fs: new API for handling inode->i_version")
5cea7647e646 ("Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f4ce24f54d9cca4f09a395f3eecce20d6bec4663 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Sat, 6 May 2023 21:04:01 -0400
Subject: [PATCH] ext4: fix deadlock when converting an inline directory in
nojournal mode
In no journal mode, ext4_finish_convert_inline_dir() can self-deadlock
by calling ext4_handle_dirty_dirblock() when it already has taken the
directory lock. There is a similar self-deadlock in
ext4_incvert_inline_data_nolock() for data files which we'll fix at
the same time.
A simple reproducer demonstrating the problem:
mke2fs -Fq -t ext2 -O inline_data -b 4k /dev/vdc 64
mount -t ext4 -o dirsync /dev/vdc /vdc
cd /vdc
mkdir file0
cd file0
touch file0
touch file1
attr -s BurnSpaceInEA -V abcde .
touch supercalifragilisticexpialidocious
Cc: stable(a)kernel.org
Link: https://lore.kernel.org/r/20230507021608.1290720-1-tytso@mit.edu
Reported-by: syzbot+91dccab7c64e2850a4e5(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=ba84cc80a9491d65416bc7877e1650c87530fe…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 859bc4e2c9b0..d3dfc51a43c5 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1175,6 +1175,7 @@ static int ext4_finish_convert_inline_dir(handle_t *handle,
ext4_initialize_dirent_tail(dir_block,
inode->i_sb->s_blocksize);
set_buffer_uptodate(dir_block);
+ unlock_buffer(dir_block);
err = ext4_handle_dirty_dirblock(handle, inode, dir_block);
if (err)
return err;
@@ -1249,6 +1250,7 @@ static int ext4_convert_inline_data_nolock(handle_t *handle,
if (!S_ISDIR(inode->i_mode)) {
memcpy(data_bh->b_data, buf, inline_size);
set_buffer_uptodate(data_bh);
+ unlock_buffer(data_bh);
error = ext4_handle_dirty_metadata(handle,
inode, data_bh);
} else {
@@ -1256,7 +1258,6 @@ static int ext4_convert_inline_data_nolock(handle_t *handle,
buf, inline_size);
}
- unlock_buffer(data_bh);
out_restore:
if (error)
ext4_restore_inline_data(handle, inode, iloc, buf, inline_size);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x f4ce24f54d9cca4f09a395f3eecce20d6bec4663
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051540-afterlife-impeding-8523@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
f4ce24f54d9c ("ext4: fix deadlock when converting an inline directory in nojournal mode")
f036adb39976 ("ext4: rename "dirent_csum" functions to use "dirblock"")
b886ee3e778e ("ext4: Support case-insensitive file name lookups")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f4ce24f54d9cca4f09a395f3eecce20d6bec4663 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Sat, 6 May 2023 21:04:01 -0400
Subject: [PATCH] ext4: fix deadlock when converting an inline directory in
nojournal mode
In no journal mode, ext4_finish_convert_inline_dir() can self-deadlock
by calling ext4_handle_dirty_dirblock() when it already has taken the
directory lock. There is a similar self-deadlock in
ext4_incvert_inline_data_nolock() for data files which we'll fix at
the same time.
A simple reproducer demonstrating the problem:
mke2fs -Fq -t ext2 -O inline_data -b 4k /dev/vdc 64
mount -t ext4 -o dirsync /dev/vdc /vdc
cd /vdc
mkdir file0
cd file0
touch file0
touch file1
attr -s BurnSpaceInEA -V abcde .
touch supercalifragilisticexpialidocious
Cc: stable(a)kernel.org
Link: https://lore.kernel.org/r/20230507021608.1290720-1-tytso@mit.edu
Reported-by: syzbot+91dccab7c64e2850a4e5(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=ba84cc80a9491d65416bc7877e1650c87530fe…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 859bc4e2c9b0..d3dfc51a43c5 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1175,6 +1175,7 @@ static int ext4_finish_convert_inline_dir(handle_t *handle,
ext4_initialize_dirent_tail(dir_block,
inode->i_sb->s_blocksize);
set_buffer_uptodate(dir_block);
+ unlock_buffer(dir_block);
err = ext4_handle_dirty_dirblock(handle, inode, dir_block);
if (err)
return err;
@@ -1249,6 +1250,7 @@ static int ext4_convert_inline_data_nolock(handle_t *handle,
if (!S_ISDIR(inode->i_mode)) {
memcpy(data_bh->b_data, buf, inline_size);
set_buffer_uptodate(data_bh);
+ unlock_buffer(data_bh);
error = ext4_handle_dirty_metadata(handle,
inode, data_bh);
} else {
@@ -1256,7 +1258,6 @@ static int ext4_convert_inline_data_nolock(handle_t *handle,
buf, inline_size);
}
- unlock_buffer(data_bh);
out_restore:
if (error)
ext4_restore_inline_data(handle, inode, iloc, buf, inline_size);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x a44be64bbecb15a452496f60db6eacfee2b59c79
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051555-reiterate-strainer-a247@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled")
3b50d5018ed0 ("ext4: reflect error codes from ext4_multi_mount_protect() to its callers")
3bbef91bdd21 ("ext4: remove an unused variable warning with CONFIG_QUOTA=n")
61bb4a1c417e ("ext4: fix possible UAF when remounting r/o a mmp-protected file system")
618f003199c6 ("ext4: fix memory leak in ext4_fill_super")
2a4ae3bcdf05 ("ext4: fix timer use-after-free on failed mount")
c92dc856848f ("ext4: defer saving error info from atomic context")
02a7780e4d2f ("ext4: simplify ext4 error translation")
4067662388f9 ("ext4: move functions in super.c")
014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()")
b08070eca9e2 ("ext4: don't remount read-only with errors=continue on reboot")
9b5f6c9b83d9 ("ext4: make s_mount_flags modifications atomic")
f8f4acb6cded ("ext4: use generic casefolding support")
ababea77bc50 ("ext4: use s_mount_flags instead of s_mount_state for fast commit state")
8016e29f4362 ("ext4: fast commit recovery path")
5b849b5f96b4 ("jbd2: fast commit recovery path")
aa75f4d3daae ("ext4: main fast-commit commit path")
ff780b91efe9 ("jbd2: add fast commit machinery")
6866d7b3f2bb ("ext4 / jbd2: add fast commit initialization")
995a3ed67fc8 ("ext4: add fast_commit feature and handling for extended mount options")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a44be64bbecb15a452496f60db6eacfee2b59c79 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Fri, 5 May 2023 21:02:30 -0400
Subject: [PATCH] ext4: don't clear SB_RDONLY when remounting r/w until quota
is re-enabled
When a file system currently mounted read/only is remounted
read/write, if we clear the SB_RDONLY flag too early, before the quota
is initialized, and there is another process/thread constantly
attempting to create a directory, it's possible to trigger the
WARN_ON_ONCE(dquot_initialize_needed(inode));
in ext4_xattr_block_set(), with the following stack trace:
WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680
RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141
Call Trace:
ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458
ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44
security_inode_init_security+0x2df/0x3f0 security/security.c:1147
__ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324
ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992
vfs_mkdir+0x29d/0x450 fs/namei.c:4038
do_mkdirat+0x264/0x520 fs/namei.c:4061
__do_sys_mkdirat fs/namei.c:4076 [inline]
__se_sys_mkdirat fs/namei.c:4074 [inline]
__x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074
Cc: stable(a)kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu
Reported-by: syzbot+6385d7d3065524c5ca6d(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c3…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 425b95a7a0ab..c7bc4a2709cc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6387,6 +6387,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
struct ext4_mount_options old_opts;
ext4_group_t g;
int err = 0;
+ int enable_rw = 0;
#ifdef CONFIG_QUOTA
int enable_quota = 0;
int i, j;
@@ -6573,7 +6574,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (err)
goto restore_opts;
- sb->s_flags &= ~SB_RDONLY;
+ enable_rw = 1;
if (ext4_has_feature_mmp(sb)) {
err = ext4_multi_mount_protect(sb,
le64_to_cpu(es->s_mmp_block));
@@ -6632,6 +6633,9 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
ext4_release_system_zone(sb);
+ if (enable_rw)
+ sb->s_flags &= ~SB_RDONLY;
+
if (!ext4_has_feature_mmp(sb) || sb_rdonly(sb))
ext4_stop_mmpd(sbi);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x a44be64bbecb15a452496f60db6eacfee2b59c79
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051554-tint-nervy-32db@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled")
3b50d5018ed0 ("ext4: reflect error codes from ext4_multi_mount_protect() to its callers")
3bbef91bdd21 ("ext4: remove an unused variable warning with CONFIG_QUOTA=n")
61bb4a1c417e ("ext4: fix possible UAF when remounting r/o a mmp-protected file system")
618f003199c6 ("ext4: fix memory leak in ext4_fill_super")
2a4ae3bcdf05 ("ext4: fix timer use-after-free on failed mount")
c92dc856848f ("ext4: defer saving error info from atomic context")
02a7780e4d2f ("ext4: simplify ext4 error translation")
4067662388f9 ("ext4: move functions in super.c")
014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()")
b08070eca9e2 ("ext4: don't remount read-only with errors=continue on reboot")
9b5f6c9b83d9 ("ext4: make s_mount_flags modifications atomic")
f8f4acb6cded ("ext4: use generic casefolding support")
ababea77bc50 ("ext4: use s_mount_flags instead of s_mount_state for fast commit state")
8016e29f4362 ("ext4: fast commit recovery path")
5b849b5f96b4 ("jbd2: fast commit recovery path")
aa75f4d3daae ("ext4: main fast-commit commit path")
ff780b91efe9 ("jbd2: add fast commit machinery")
6866d7b3f2bb ("ext4 / jbd2: add fast commit initialization")
995a3ed67fc8 ("ext4: add fast_commit feature and handling for extended mount options")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a44be64bbecb15a452496f60db6eacfee2b59c79 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Fri, 5 May 2023 21:02:30 -0400
Subject: [PATCH] ext4: don't clear SB_RDONLY when remounting r/w until quota
is re-enabled
When a file system currently mounted read/only is remounted
read/write, if we clear the SB_RDONLY flag too early, before the quota
is initialized, and there is another process/thread constantly
attempting to create a directory, it's possible to trigger the
WARN_ON_ONCE(dquot_initialize_needed(inode));
in ext4_xattr_block_set(), with the following stack trace:
WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680
RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141
Call Trace:
ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458
ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44
security_inode_init_security+0x2df/0x3f0 security/security.c:1147
__ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324
ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992
vfs_mkdir+0x29d/0x450 fs/namei.c:4038
do_mkdirat+0x264/0x520 fs/namei.c:4061
__do_sys_mkdirat fs/namei.c:4076 [inline]
__se_sys_mkdirat fs/namei.c:4074 [inline]
__x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074
Cc: stable(a)kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu
Reported-by: syzbot+6385d7d3065524c5ca6d(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c3…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 425b95a7a0ab..c7bc4a2709cc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6387,6 +6387,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
struct ext4_mount_options old_opts;
ext4_group_t g;
int err = 0;
+ int enable_rw = 0;
#ifdef CONFIG_QUOTA
int enable_quota = 0;
int i, j;
@@ -6573,7 +6574,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (err)
goto restore_opts;
- sb->s_flags &= ~SB_RDONLY;
+ enable_rw = 1;
if (ext4_has_feature_mmp(sb)) {
err = ext4_multi_mount_protect(sb,
le64_to_cpu(es->s_mmp_block));
@@ -6632,6 +6633,9 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
ext4_release_system_zone(sb);
+ if (enable_rw)
+ sb->s_flags &= ~SB_RDONLY;
+
if (!ext4_has_feature_mmp(sb) || sb_rdonly(sb))
ext4_stop_mmpd(sbi);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x a44be64bbecb15a452496f60db6eacfee2b59c79
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051553-gangrene-pliable-3d3b@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled")
3b50d5018ed0 ("ext4: reflect error codes from ext4_multi_mount_protect() to its callers")
3bbef91bdd21 ("ext4: remove an unused variable warning with CONFIG_QUOTA=n")
61bb4a1c417e ("ext4: fix possible UAF when remounting r/o a mmp-protected file system")
618f003199c6 ("ext4: fix memory leak in ext4_fill_super")
2a4ae3bcdf05 ("ext4: fix timer use-after-free on failed mount")
c92dc856848f ("ext4: defer saving error info from atomic context")
02a7780e4d2f ("ext4: simplify ext4 error translation")
4067662388f9 ("ext4: move functions in super.c")
014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()")
b08070eca9e2 ("ext4: don't remount read-only with errors=continue on reboot")
9b5f6c9b83d9 ("ext4: make s_mount_flags modifications atomic")
f8f4acb6cded ("ext4: use generic casefolding support")
ababea77bc50 ("ext4: use s_mount_flags instead of s_mount_state for fast commit state")
8016e29f4362 ("ext4: fast commit recovery path")
5b849b5f96b4 ("jbd2: fast commit recovery path")
aa75f4d3daae ("ext4: main fast-commit commit path")
ff780b91efe9 ("jbd2: add fast commit machinery")
6866d7b3f2bb ("ext4 / jbd2: add fast commit initialization")
995a3ed67fc8 ("ext4: add fast_commit feature and handling for extended mount options")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a44be64bbecb15a452496f60db6eacfee2b59c79 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Fri, 5 May 2023 21:02:30 -0400
Subject: [PATCH] ext4: don't clear SB_RDONLY when remounting r/w until quota
is re-enabled
When a file system currently mounted read/only is remounted
read/write, if we clear the SB_RDONLY flag too early, before the quota
is initialized, and there is another process/thread constantly
attempting to create a directory, it's possible to trigger the
WARN_ON_ONCE(dquot_initialize_needed(inode));
in ext4_xattr_block_set(), with the following stack trace:
WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680
RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141
Call Trace:
ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458
ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44
security_inode_init_security+0x2df/0x3f0 security/security.c:1147
__ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324
ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992
vfs_mkdir+0x29d/0x450 fs/namei.c:4038
do_mkdirat+0x264/0x520 fs/namei.c:4061
__do_sys_mkdirat fs/namei.c:4076 [inline]
__se_sys_mkdirat fs/namei.c:4074 [inline]
__x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074
Cc: stable(a)kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu
Reported-by: syzbot+6385d7d3065524c5ca6d(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c3…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 425b95a7a0ab..c7bc4a2709cc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6387,6 +6387,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
struct ext4_mount_options old_opts;
ext4_group_t g;
int err = 0;
+ int enable_rw = 0;
#ifdef CONFIG_QUOTA
int enable_quota = 0;
int i, j;
@@ -6573,7 +6574,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (err)
goto restore_opts;
- sb->s_flags &= ~SB_RDONLY;
+ enable_rw = 1;
if (ext4_has_feature_mmp(sb)) {
err = ext4_multi_mount_protect(sb,
le64_to_cpu(es->s_mmp_block));
@@ -6632,6 +6633,9 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
ext4_release_system_zone(sb);
+ if (enable_rw)
+ sb->s_flags &= ~SB_RDONLY;
+
if (!ext4_has_feature_mmp(sb) || sb_rdonly(sb))
ext4_stop_mmpd(sbi);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a44be64bbecb15a452496f60db6eacfee2b59c79
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051552-obscure-freefall-5242@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled")
3b50d5018ed0 ("ext4: reflect error codes from ext4_multi_mount_protect() to its callers")
3bbef91bdd21 ("ext4: remove an unused variable warning with CONFIG_QUOTA=n")
61bb4a1c417e ("ext4: fix possible UAF when remounting r/o a mmp-protected file system")
618f003199c6 ("ext4: fix memory leak in ext4_fill_super")
2a4ae3bcdf05 ("ext4: fix timer use-after-free on failed mount")
c92dc856848f ("ext4: defer saving error info from atomic context")
02a7780e4d2f ("ext4: simplify ext4 error translation")
4067662388f9 ("ext4: move functions in super.c")
014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()")
b08070eca9e2 ("ext4: don't remount read-only with errors=continue on reboot")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a44be64bbecb15a452496f60db6eacfee2b59c79 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Fri, 5 May 2023 21:02:30 -0400
Subject: [PATCH] ext4: don't clear SB_RDONLY when remounting r/w until quota
is re-enabled
When a file system currently mounted read/only is remounted
read/write, if we clear the SB_RDONLY flag too early, before the quota
is initialized, and there is another process/thread constantly
attempting to create a directory, it's possible to trigger the
WARN_ON_ONCE(dquot_initialize_needed(inode));
in ext4_xattr_block_set(), with the following stack trace:
WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680
RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141
Call Trace:
ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458
ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44
security_inode_init_security+0x2df/0x3f0 security/security.c:1147
__ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324
ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992
vfs_mkdir+0x29d/0x450 fs/namei.c:4038
do_mkdirat+0x264/0x520 fs/namei.c:4061
__do_sys_mkdirat fs/namei.c:4076 [inline]
__se_sys_mkdirat fs/namei.c:4074 [inline]
__x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074
Cc: stable(a)kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu
Reported-by: syzbot+6385d7d3065524c5ca6d(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c3…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 425b95a7a0ab..c7bc4a2709cc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6387,6 +6387,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
struct ext4_mount_options old_opts;
ext4_group_t g;
int err = 0;
+ int enable_rw = 0;
#ifdef CONFIG_QUOTA
int enable_quota = 0;
int i, j;
@@ -6573,7 +6574,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (err)
goto restore_opts;
- sb->s_flags &= ~SB_RDONLY;
+ enable_rw = 1;
if (ext4_has_feature_mmp(sb)) {
err = ext4_multi_mount_protect(sb,
le64_to_cpu(es->s_mmp_block));
@@ -6632,6 +6633,9 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
ext4_release_system_zone(sb);
+ if (enable_rw)
+ sb->s_flags &= ~SB_RDONLY;
+
if (!ext4_has_feature_mmp(sb) || sb_rdonly(sb))
ext4_stop_mmpd(sbi);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a44be64bbecb15a452496f60db6eacfee2b59c79
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051551-gigabyte-tabby-21cd@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled")
3b50d5018ed0 ("ext4: reflect error codes from ext4_multi_mount_protect() to its callers")
3bbef91bdd21 ("ext4: remove an unused variable warning with CONFIG_QUOTA=n")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a44be64bbecb15a452496f60db6eacfee2b59c79 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Fri, 5 May 2023 21:02:30 -0400
Subject: [PATCH] ext4: don't clear SB_RDONLY when remounting r/w until quota
is re-enabled
When a file system currently mounted read/only is remounted
read/write, if we clear the SB_RDONLY flag too early, before the quota
is initialized, and there is another process/thread constantly
attempting to create a directory, it's possible to trigger the
WARN_ON_ONCE(dquot_initialize_needed(inode));
in ext4_xattr_block_set(), with the following stack trace:
WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680
RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141
Call Trace:
ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458
ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44
security_inode_init_security+0x2df/0x3f0 security/security.c:1147
__ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324
ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992
vfs_mkdir+0x29d/0x450 fs/namei.c:4038
do_mkdirat+0x264/0x520 fs/namei.c:4061
__do_sys_mkdirat fs/namei.c:4076 [inline]
__se_sys_mkdirat fs/namei.c:4074 [inline]
__x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074
Cc: stable(a)kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu
Reported-by: syzbot+6385d7d3065524c5ca6d(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c3…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 425b95a7a0ab..c7bc4a2709cc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6387,6 +6387,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
struct ext4_mount_options old_opts;
ext4_group_t g;
int err = 0;
+ int enable_rw = 0;
#ifdef CONFIG_QUOTA
int enable_quota = 0;
int i, j;
@@ -6573,7 +6574,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (err)
goto restore_opts;
- sb->s_flags &= ~SB_RDONLY;
+ enable_rw = 1;
if (ext4_has_feature_mmp(sb)) {
err = ext4_multi_mount_protect(sb,
le64_to_cpu(es->s_mmp_block));
@@ -6632,6 +6633,9 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
ext4_release_system_zone(sb);
+ if (enable_rw)
+ sb->s_flags &= ~SB_RDONLY;
+
if (!ext4_has_feature_mmp(sb) || sb_rdonly(sb))
ext4_stop_mmpd(sbi);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a44be64bbecb15a452496f60db6eacfee2b59c79
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051550-doubling-punch-d572@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled")
3b50d5018ed0 ("ext4: reflect error codes from ext4_multi_mount_protect() to its callers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a44be64bbecb15a452496f60db6eacfee2b59c79 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Fri, 5 May 2023 21:02:30 -0400
Subject: [PATCH] ext4: don't clear SB_RDONLY when remounting r/w until quota
is re-enabled
When a file system currently mounted read/only is remounted
read/write, if we clear the SB_RDONLY flag too early, before the quota
is initialized, and there is another process/thread constantly
attempting to create a directory, it's possible to trigger the
WARN_ON_ONCE(dquot_initialize_needed(inode));
in ext4_xattr_block_set(), with the following stack trace:
WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680
RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141
Call Trace:
ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458
ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44
security_inode_init_security+0x2df/0x3f0 security/security.c:1147
__ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324
ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992
vfs_mkdir+0x29d/0x450 fs/namei.c:4038
do_mkdirat+0x264/0x520 fs/namei.c:4061
__do_sys_mkdirat fs/namei.c:4076 [inline]
__se_sys_mkdirat fs/namei.c:4074 [inline]
__x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074
Cc: stable(a)kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu
Reported-by: syzbot+6385d7d3065524c5ca6d(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c3…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 425b95a7a0ab..c7bc4a2709cc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6387,6 +6387,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
struct ext4_mount_options old_opts;
ext4_group_t g;
int err = 0;
+ int enable_rw = 0;
#ifdef CONFIG_QUOTA
int enable_quota = 0;
int i, j;
@@ -6573,7 +6574,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (err)
goto restore_opts;
- sb->s_flags &= ~SB_RDONLY;
+ enable_rw = 1;
if (ext4_has_feature_mmp(sb)) {
err = ext4_multi_mount_protect(sb,
le64_to_cpu(es->s_mmp_block));
@@ -6632,6 +6633,9 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
ext4_release_system_zone(sb);
+ if (enable_rw)
+ sb->s_flags &= ~SB_RDONLY;
+
if (!ext4_has_feature_mmp(sb) || sb_rdonly(sb))
ext4_stop_mmpd(sbi);
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x a44be64bbecb15a452496f60db6eacfee2b59c79
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051549-unguided-revolving-383c@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled")
3b50d5018ed0 ("ext4: reflect error codes from ext4_multi_mount_protect() to its callers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a44be64bbecb15a452496f60db6eacfee2b59c79 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Fri, 5 May 2023 21:02:30 -0400
Subject: [PATCH] ext4: don't clear SB_RDONLY when remounting r/w until quota
is re-enabled
When a file system currently mounted read/only is remounted
read/write, if we clear the SB_RDONLY flag too early, before the quota
is initialized, and there is another process/thread constantly
attempting to create a directory, it's possible to trigger the
WARN_ON_ONCE(dquot_initialize_needed(inode));
in ext4_xattr_block_set(), with the following stack trace:
WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680
RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141
Call Trace:
ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458
ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44
security_inode_init_security+0x2df/0x3f0 security/security.c:1147
__ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324
ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992
vfs_mkdir+0x29d/0x450 fs/namei.c:4038
do_mkdirat+0x264/0x520 fs/namei.c:4061
__do_sys_mkdirat fs/namei.c:4076 [inline]
__se_sys_mkdirat fs/namei.c:4074 [inline]
__x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074
Cc: stable(a)kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu
Reported-by: syzbot+6385d7d3065524c5ca6d(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c3…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 425b95a7a0ab..c7bc4a2709cc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6387,6 +6387,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
struct ext4_mount_options old_opts;
ext4_group_t g;
int err = 0;
+ int enable_rw = 0;
#ifdef CONFIG_QUOTA
int enable_quota = 0;
int i, j;
@@ -6573,7 +6574,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (err)
goto restore_opts;
- sb->s_flags &= ~SB_RDONLY;
+ enable_rw = 1;
if (ext4_has_feature_mmp(sb)) {
err = ext4_multi_mount_protect(sb,
le64_to_cpu(es->s_mmp_block));
@@ -6632,6 +6633,9 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
ext4_release_system_zone(sb);
+ if (enable_rw)
+ sb->s_flags &= ~SB_RDONLY;
+
if (!ext4_has_feature_mmp(sb) || sb_rdonly(sb))
ext4_stop_mmpd(sbi);
The patch below does not apply to the 6.3-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.3.y
git checkout FETCH_HEAD
git cherry-pick -x a44be64bbecb15a452496f60db6eacfee2b59c79
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051548-agenda-purgatory-e96e@gregkh' --subject-prefix 'PATCH 6.3.y' HEAD^..
Possible dependencies:
a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled")
3b50d5018ed0 ("ext4: reflect error codes from ext4_multi_mount_protect() to its callers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a44be64bbecb15a452496f60db6eacfee2b59c79 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Fri, 5 May 2023 21:02:30 -0400
Subject: [PATCH] ext4: don't clear SB_RDONLY when remounting r/w until quota
is re-enabled
When a file system currently mounted read/only is remounted
read/write, if we clear the SB_RDONLY flag too early, before the quota
is initialized, and there is another process/thread constantly
attempting to create a directory, it's possible to trigger the
WARN_ON_ONCE(dquot_initialize_needed(inode));
in ext4_xattr_block_set(), with the following stack trace:
WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680
RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141
Call Trace:
ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458
ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44
security_inode_init_security+0x2df/0x3f0 security/security.c:1147
__ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324
ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992
vfs_mkdir+0x29d/0x450 fs/namei.c:4038
do_mkdirat+0x264/0x520 fs/namei.c:4061
__do_sys_mkdirat fs/namei.c:4076 [inline]
__se_sys_mkdirat fs/namei.c:4074 [inline]
__x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074
Cc: stable(a)kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu
Reported-by: syzbot+6385d7d3065524c5ca6d(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c3…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 425b95a7a0ab..c7bc4a2709cc 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6387,6 +6387,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
struct ext4_mount_options old_opts;
ext4_group_t g;
int err = 0;
+ int enable_rw = 0;
#ifdef CONFIG_QUOTA
int enable_quota = 0;
int i, j;
@@ -6573,7 +6574,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (err)
goto restore_opts;
- sb->s_flags &= ~SB_RDONLY;
+ enable_rw = 1;
if (ext4_has_feature_mmp(sb)) {
err = ext4_multi_mount_protect(sb,
le64_to_cpu(es->s_mmp_block));
@@ -6632,6 +6633,9 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
ext4_release_system_zone(sb);
+ if (enable_rw)
+ sb->s_flags &= ~SB_RDONLY;
+
if (!ext4_has_feature_mmp(sb) || sb_rdonly(sb))
ext4_stop_mmpd(sbi);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x fa83c34e3e56b3c672af38059e066242655271b1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051521-buckshot-elliptic-0f14@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
fa83c34e3e56 ("ext4: check iomap type only if ext4_iomap_begin() does not fail")
8cd115bdda17 ("ext4: Optimize ext4 DIO overwrites")
aa9714d0e397 ("ext4: Start with shared i_rwsem in case of DIO instead of exclusive")
378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure")
0b9f230b94dd ("ext4: move inode extension check out from ext4_iomap_alloc()")
569342dc2485 ("ext4: move inode extension/truncate code out from ->iomap_end() callback")
b1b4705d54ab ("ext4: introduce direct I/O read using iomap infrastructure")
09edf4d38195 ("ext4: introduce new callback for IOMAP_REPORT")
f063db5ee989 ("ext4: split IOMAP_WRITE branch in ext4_iomap_begin() into helper")
c8fdfe294187 ("ext4: move set iomap routines into a separate helper ext4_set_iomap()")
2e9b51d78229 ("ext4: iomap that extends beyond EOF should be marked dirty")
548feebec7e9 ("ext4: update direct I/O read lock pattern for IOCB_NOWAIT")
53e5cca56795 ("ext4: reorder map.m_flags checks within ext4_iomap_begin()")
c8cc88163f40 ("ext4: Add support for blocksize < pagesize in dioread_nolock")
2943fdbc688e ("ext4: Refactor mpage_map_and_submit_buffers function")
a00713ea982b ("ext4: Add API to bring in support for unwritten io_end_vec conversion")
821ff38d192a ("ext4: keep uniform naming convention for io & io_end variables")
70cb0d02b581 ("Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fa83c34e3e56b3c672af38059e066242655271b1 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Fri, 5 May 2023 21:24:29 +0800
Subject: [PATCH] ext4: check iomap type only if ext4_iomap_begin() does not
fail
When ext4_iomap_overwrite_begin() calls ext4_iomap_begin() map blocks may
fail for some reason (e.g. memory allocation failure, bare disk write), and
later because "iomap->type ! = IOMAP_MAPPED" triggers WARN_ON(). When ext4
iomap_begin() returns an error, it is normal that the type of iomap->type
may not match the expectation. Therefore, we only determine if iomap->type
is as expected when ext4_iomap_begin() is executed successfully.
Cc: stable(a)kernel.org
Reported-by: syzbot+08106c4b7d60702dbc14(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/00000000000015760b05f9b4eee9@google.com
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230505132429.714648-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3cb774d9e3f1..ce5f21b6c2b3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3377,7 +3377,7 @@ static int ext4_iomap_overwrite_begin(struct inode *inode, loff_t offset,
*/
flags &= ~IOMAP_WRITE;
ret = ext4_iomap_begin(inode, offset, length, flags, iomap, srcmap);
- WARN_ON_ONCE(iomap->type != IOMAP_MAPPED);
+ WARN_ON_ONCE(!ret && iomap->type != IOMAP_MAPPED);
return ret;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x fa83c34e3e56b3c672af38059e066242655271b1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051520-amigo-consent-6b7c@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
fa83c34e3e56 ("ext4: check iomap type only if ext4_iomap_begin() does not fail")
8cd115bdda17 ("ext4: Optimize ext4 DIO overwrites")
aa9714d0e397 ("ext4: Start with shared i_rwsem in case of DIO instead of exclusive")
378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure")
0b9f230b94dd ("ext4: move inode extension check out from ext4_iomap_alloc()")
569342dc2485 ("ext4: move inode extension/truncate code out from ->iomap_end() callback")
b1b4705d54ab ("ext4: introduce direct I/O read using iomap infrastructure")
09edf4d38195 ("ext4: introduce new callback for IOMAP_REPORT")
f063db5ee989 ("ext4: split IOMAP_WRITE branch in ext4_iomap_begin() into helper")
c8fdfe294187 ("ext4: move set iomap routines into a separate helper ext4_set_iomap()")
2e9b51d78229 ("ext4: iomap that extends beyond EOF should be marked dirty")
548feebec7e9 ("ext4: update direct I/O read lock pattern for IOCB_NOWAIT")
53e5cca56795 ("ext4: reorder map.m_flags checks within ext4_iomap_begin()")
c8cc88163f40 ("ext4: Add support for blocksize < pagesize in dioread_nolock")
2943fdbc688e ("ext4: Refactor mpage_map_and_submit_buffers function")
a00713ea982b ("ext4: Add API to bring in support for unwritten io_end_vec conversion")
821ff38d192a ("ext4: keep uniform naming convention for io & io_end variables")
70cb0d02b581 ("Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fa83c34e3e56b3c672af38059e066242655271b1 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Fri, 5 May 2023 21:24:29 +0800
Subject: [PATCH] ext4: check iomap type only if ext4_iomap_begin() does not
fail
When ext4_iomap_overwrite_begin() calls ext4_iomap_begin() map blocks may
fail for some reason (e.g. memory allocation failure, bare disk write), and
later because "iomap->type ! = IOMAP_MAPPED" triggers WARN_ON(). When ext4
iomap_begin() returns an error, it is normal that the type of iomap->type
may not match the expectation. Therefore, we only determine if iomap->type
is as expected when ext4_iomap_begin() is executed successfully.
Cc: stable(a)kernel.org
Reported-by: syzbot+08106c4b7d60702dbc14(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/00000000000015760b05f9b4eee9@google.com
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230505132429.714648-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3cb774d9e3f1..ce5f21b6c2b3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3377,7 +3377,7 @@ static int ext4_iomap_overwrite_begin(struct inode *inode, loff_t offset,
*/
flags &= ~IOMAP_WRITE;
ret = ext4_iomap_begin(inode, offset, length, flags, iomap, srcmap);
- WARN_ON_ONCE(iomap->type != IOMAP_MAPPED);
+ WARN_ON_ONCE(!ret && iomap->type != IOMAP_MAPPED);
return ret;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x fa83c34e3e56b3c672af38059e066242655271b1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023051519-quarters-shrank-0a63@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
fa83c34e3e56 ("ext4: check iomap type only if ext4_iomap_begin() does not fail")
8cd115bdda17 ("ext4: Optimize ext4 DIO overwrites")
aa9714d0e397 ("ext4: Start with shared i_rwsem in case of DIO instead of exclusive")
378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure")
0b9f230b94dd ("ext4: move inode extension check out from ext4_iomap_alloc()")
569342dc2485 ("ext4: move inode extension/truncate code out from ->iomap_end() callback")
b1b4705d54ab ("ext4: introduce direct I/O read using iomap infrastructure")
09edf4d38195 ("ext4: introduce new callback for IOMAP_REPORT")
f063db5ee989 ("ext4: split IOMAP_WRITE branch in ext4_iomap_begin() into helper")
c8fdfe294187 ("ext4: move set iomap routines into a separate helper ext4_set_iomap()")
2e9b51d78229 ("ext4: iomap that extends beyond EOF should be marked dirty")
548feebec7e9 ("ext4: update direct I/O read lock pattern for IOCB_NOWAIT")
53e5cca56795 ("ext4: reorder map.m_flags checks within ext4_iomap_begin()")
c8cc88163f40 ("ext4: Add support for blocksize < pagesize in dioread_nolock")
2943fdbc688e ("ext4: Refactor mpage_map_and_submit_buffers function")
a00713ea982b ("ext4: Add API to bring in support for unwritten io_end_vec conversion")
821ff38d192a ("ext4: keep uniform naming convention for io & io_end variables")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fa83c34e3e56b3c672af38059e066242655271b1 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Fri, 5 May 2023 21:24:29 +0800
Subject: [PATCH] ext4: check iomap type only if ext4_iomap_begin() does not
fail
When ext4_iomap_overwrite_begin() calls ext4_iomap_begin() map blocks may
fail for some reason (e.g. memory allocation failure, bare disk write), and
later because "iomap->type ! = IOMAP_MAPPED" triggers WARN_ON(). When ext4
iomap_begin() returns an error, it is normal that the type of iomap->type
may not match the expectation. Therefore, we only determine if iomap->type
is as expected when ext4_iomap_begin() is executed successfully.
Cc: stable(a)kernel.org
Reported-by: syzbot+08106c4b7d60702dbc14(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/00000000000015760b05f9b4eee9@google.com
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230505132429.714648-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3cb774d9e3f1..ce5f21b6c2b3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3377,7 +3377,7 @@ static int ext4_iomap_overwrite_begin(struct inode *inode, loff_t offset,
*/
flags &= ~IOMAP_WRITE;
ret = ext4_iomap_begin(inode, offset, length, flags, iomap, srcmap);
- WARN_ON_ONCE(iomap->type != IOMAP_MAPPED);
+ WARN_ON_ONCE(!ret && iomap->type != IOMAP_MAPPED);
return ret;
}
Pass to dev_err_probe() PTR_ERR from actual dev_pm_opp_find_bw_floor()
call which failed, instead of previous ret which at this point is 0.
Failure of dev_pm_opp_find_bw_floor() would result in prematurely ending
the probe with success.
Fixes smatch warnings:
drivers/soc/qcom/icc-bwmon.c:776 bwmon_probe() warn: passing zero to 'dev_err_probe'
drivers/soc/qcom/icc-bwmon.c:781 bwmon_probe() warn: passing zero to 'dev_err_probe'
Reported-by: kernel test robot <lkp(a)intel.com>
Reported-by: Dan Carpenter <error27(a)gmail.com>
Link: https://lore.kernel.org/r/202305131657.76XeHDjF-lkp@intel.com/
Cc: <stable(a)vger.kernel.org>
Fixes: b9c2ae6cac40 ("soc: qcom: icc-bwmon: Add bandwidth monitoring driver")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
Code was tested previously with smatch. Just this test in smatch is new.
---
drivers/soc/qcom/icc-bwmon.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/soc/qcom/icc-bwmon.c b/drivers/soc/qcom/icc-bwmon.c
index fd58c5b69897..f65bfeca7ed6 100644
--- a/drivers/soc/qcom/icc-bwmon.c
+++ b/drivers/soc/qcom/icc-bwmon.c
@@ -773,12 +773,12 @@ static int bwmon_probe(struct platform_device *pdev)
bwmon->max_bw_kbps = UINT_MAX;
opp = dev_pm_opp_find_bw_floor(dev, &bwmon->max_bw_kbps, 0);
if (IS_ERR(opp))
- return dev_err_probe(dev, ret, "failed to find max peak bandwidth\n");
+ return dev_err_probe(dev, PTR_ERR(opp), "failed to find max peak bandwidth\n");
bwmon->min_bw_kbps = 0;
opp = dev_pm_opp_find_bw_ceil(dev, &bwmon->min_bw_kbps, 0);
if (IS_ERR(opp))
- return dev_err_probe(dev, ret, "failed to find min peak bandwidth\n");
+ return dev_err_probe(dev, PTR_ERR(opp), "failed to find min peak bandwidth\n");
bwmon->dev = dev;
--
2.34.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x e8c5d45f82ce0c238a4817739892fe8897a3dcc3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050708-verdict-proton-a5f0@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
e8c5d45f82ce ("dm verity: fix error handling for check_at_most_once on FEC")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e8c5d45f82ce0c238a4817739892fe8897a3dcc3 Mon Sep 17 00:00:00 2001
From: Yeongjin Gil <youngjin.gil(a)samsung.com>
Date: Mon, 20 Mar 2023 15:59:32 +0900
Subject: [PATCH] dm verity: fix error handling for check_at_most_once on FEC
In verity_end_io(), if bi_status is not BLK_STS_OK, it can be return
directly. But if FEC configured, it is desired to correct the data page
through verity_verify_io. And the return value will be converted to
blk_status and passed to verity_finish_io().
BTW, when a bit is set in v->validated_blocks, verity_verify_io() skips
verification regardless of I/O error for the corresponding bio. In this
case, the I/O error could not be returned properly, and as a result,
there is a problem that abnormal data could be read for the
corresponding block.
To fix this problem, when an I/O error occurs, do not skip verification
even if the bit related is set in v->validated_blocks.
Fixes: 843f38d382b1 ("dm verity: add 'check_at_most_once' option to only validate hashes once")
Cc: stable(a)vger.kernel.org
Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com>
Signed-off-by: Yeongjin Gil <youngjin.gil(a)samsung.com>
Signed-off-by: Mike Snitzer <snitzer(a)kernel.org>
diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index ade83ef3b439..9316399b920e 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -523,7 +523,7 @@ static int verity_verify_io(struct dm_verity_io *io)
sector_t cur_block = io->block + b;
struct ahash_request *req = verity_io_hash_req(v, io);
- if (v->validated_blocks &&
+ if (v->validated_blocks && bio->bi_status == BLK_STS_OK &&
likely(test_bit(cur_block, v->validated_blocks))) {
verity_bv_skip_block(v, io, iter);
continue;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x e8c5d45f82ce0c238a4817739892fe8897a3dcc3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050709-dry-stand-f81b@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
e8c5d45f82ce ("dm verity: fix error handling for check_at_most_once on FEC")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e8c5d45f82ce0c238a4817739892fe8897a3dcc3 Mon Sep 17 00:00:00 2001
From: Yeongjin Gil <youngjin.gil(a)samsung.com>
Date: Mon, 20 Mar 2023 15:59:32 +0900
Subject: [PATCH] dm verity: fix error handling for check_at_most_once on FEC
In verity_end_io(), if bi_status is not BLK_STS_OK, it can be return
directly. But if FEC configured, it is desired to correct the data page
through verity_verify_io. And the return value will be converted to
blk_status and passed to verity_finish_io().
BTW, when a bit is set in v->validated_blocks, verity_verify_io() skips
verification regardless of I/O error for the corresponding bio. In this
case, the I/O error could not be returned properly, and as a result,
there is a problem that abnormal data could be read for the
corresponding block.
To fix this problem, when an I/O error occurs, do not skip verification
even if the bit related is set in v->validated_blocks.
Fixes: 843f38d382b1 ("dm verity: add 'check_at_most_once' option to only validate hashes once")
Cc: stable(a)vger.kernel.org
Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com>
Signed-off-by: Yeongjin Gil <youngjin.gil(a)samsung.com>
Signed-off-by: Mike Snitzer <snitzer(a)kernel.org>
diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index ade83ef3b439..9316399b920e 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -523,7 +523,7 @@ static int verity_verify_io(struct dm_verity_io *io)
sector_t cur_block = io->block + b;
struct ahash_request *req = verity_io_hash_req(v, io);
- if (v->validated_blocks &&
+ if (v->validated_blocks && bio->bi_status == BLK_STS_OK &&
likely(test_bit(cur_block, v->validated_blocks))) {
verity_bv_skip_block(v, io, iter);
continue;
META Unternehmen
Facebook, Instagram und WhatsApp
1601 WILLOW ROAD MENLO PARK, CA 94025
www. facebook. com
Dem Unternehmen META gehören neben anderen Produkten und Dienstleistungen auch Facebook, Instagram und WhatsApp. Meta ist eines der wertvollsten Unternehmen der Welt.
META (die Muttergesellschaft von Facebook, Instagram und WhatsApp) vergibt einen Zuschuss in Höhe von $920.000,00USD an 10 glückliche Kunden aus allen Ländern der Welt. Wir gratulieren Ihnen, dass Sie einer der zehn Glücklichen sind, die diesen Zuschuss von $920.000,00USD erhalten.
Kontaktieren Sie uns per E-Mail: contact(a)mtagroup.info mit den folgenden Informationen, um Ihren Zuschuss zu bearbeiten.
Vollständige Namen .............
Telefonnummer ........
Adresse .............
Land ................
Geschlecht .................
Alter ....................
Beruf .............
Die Firma META schätzt Ihr Recht auf Privatsphäre! Ihre Daten sind zu 100 % sicher und werden ausschließlich für den Zweck dieses Zuschusses verwendet.
Es gelten die allgemeinen Geschäftsbedingungen.
Büro des Präsidenten von
Facebook-CEO
Herr Mark Zuckerberg
Príjemca
Dostali ste sa na ocenenie od Organizácie Spojených národov pridruženej k
medzinárodný menový fond, v ktorom bola vaša e-mailová adresa a fond
nám boli uvolnené na váš prevod, preto pošlite svoje údaje na prevod.
Dostali sme pokyn, aby sme všetky neuhradené transakcie previedli v
rámci nasledujúceho
48 hodín, alebo ste už dostali svoj fond, ak okamžite nevyhoviete. Poznámka:
Potrebujeme vašu naliehavú odpoved, toto nie je jeden z tých
internetových podvodníkov
tam vonku, je to úlava od COVID-19.
Thanks
Jennifer
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 1007843a91909a4995ee78a538f62d8665705b66
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042455-skinless-muzzle-1c50@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
1007843a9190 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1007843a91909a4995ee78a538f62d8665705b66 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Date: Tue, 4 Apr 2023 23:31:58 +0900
Subject: [PATCH] mm/page_alloc: fix potential deadlock on zonelist_update_seq
seqlock
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.
One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
CPU0
----
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
// e.g. timer interrupt handler runs at this moment
some_timer_func() {
kmalloc(GFP_ATOMIC) {
__alloc_pages_slowpath() {
read_seqbegin(&zonelist_update_seq) {
// spins forever because zonelist_update_seq.seqcount is odd
}
}
}
}
// e.g. timer interrupt handler finishes
write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
}
This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests. But Michal Hocko does not know whether we should go with this
approach.
Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.
CPU0 CPU1
---- ----
pty_write() {
tty_insert_flip_string_and_push_buffer() {
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq);
build_zonelists() {
printk() {
vprintk() {
vprintk_default() {
vprintk_emit() {
console_unlock() {
console_flush_all() {
console_emit_next_record() {
con->write() = serial8250_console_write() {
spin_lock_irqsave(&port->lock, flags);
tty_insert_flip_string() {
tty_insert_flip_string_fixed_flag() {
__tty_buffer_request_room() {
tty_buffer_alloc() {
kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
__alloc_pages_slowpath() {
zonelist_iter_begin() {
read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
}
}
}
}
}
}
}
}
spin_unlock_irqrestore(&port->lock, flags);
// message is printed to console
spin_unlock_irqrestore(&port->lock, flags);
}
}
}
}
}
}
}
}
}
write_sequnlock(&zonelist_update_seq);
}
}
}
This deadlock scenario can be eliminated by
preventing interrupt context from calling kmalloc(GFP_ATOMIC)
and
preventing printk() from calling console_flush_all()
while zonelist_update_seq.seqcount is odd.
Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by
disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)
and
disabling synchronous printk() in order to avoid console_flush_all()
.
As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...
Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKUR…
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10(a)syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Cc: John Ogness <john.ogness(a)linutronix.de>
Cc: Patrick Daly <quic_pdaly(a)quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7136c36c5d01..e8b4f294d763 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6632,7 +6632,21 @@ static void __build_all_zonelists(void *data)
int nid;
int __maybe_unused cpu;
pg_data_t *self = data;
+ unsigned long flags;
+ /*
+ * Explicitly disable this CPU's interrupts before taking seqlock
+ * to prevent any IRQ handler from calling into the page allocator
+ * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+ */
+ local_irq_save(flags);
+ /*
+ * Explicitly disable this CPU's synchronous printk() before taking
+ * seqlock to prevent any printk() from trying to hold port->lock, for
+ * tty_insert_flip_string_and_push_buffer() on other CPU might be
+ * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
+ */
+ printk_deferred_enter();
write_seqlock(&zonelist_update_seq);
#ifdef CONFIG_NUMA
@@ -6671,6 +6685,8 @@ static void __build_all_zonelists(void *data)
}
write_sequnlock(&zonelist_update_seq);
+ printk_deferred_exit();
+ local_irq_restore(flags);
}
static noinline void __init
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 1007843a91909a4995ee78a538f62d8665705b66
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042452-stopper-engross-e9da@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
1007843a9190 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1007843a91909a4995ee78a538f62d8665705b66 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Date: Tue, 4 Apr 2023 23:31:58 +0900
Subject: [PATCH] mm/page_alloc: fix potential deadlock on zonelist_update_seq
seqlock
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.
One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
CPU0
----
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
// e.g. timer interrupt handler runs at this moment
some_timer_func() {
kmalloc(GFP_ATOMIC) {
__alloc_pages_slowpath() {
read_seqbegin(&zonelist_update_seq) {
// spins forever because zonelist_update_seq.seqcount is odd
}
}
}
}
// e.g. timer interrupt handler finishes
write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
}
This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests. But Michal Hocko does not know whether we should go with this
approach.
Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.
CPU0 CPU1
---- ----
pty_write() {
tty_insert_flip_string_and_push_buffer() {
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq);
build_zonelists() {
printk() {
vprintk() {
vprintk_default() {
vprintk_emit() {
console_unlock() {
console_flush_all() {
console_emit_next_record() {
con->write() = serial8250_console_write() {
spin_lock_irqsave(&port->lock, flags);
tty_insert_flip_string() {
tty_insert_flip_string_fixed_flag() {
__tty_buffer_request_room() {
tty_buffer_alloc() {
kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
__alloc_pages_slowpath() {
zonelist_iter_begin() {
read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
}
}
}
}
}
}
}
}
spin_unlock_irqrestore(&port->lock, flags);
// message is printed to console
spin_unlock_irqrestore(&port->lock, flags);
}
}
}
}
}
}
}
}
}
write_sequnlock(&zonelist_update_seq);
}
}
}
This deadlock scenario can be eliminated by
preventing interrupt context from calling kmalloc(GFP_ATOMIC)
and
preventing printk() from calling console_flush_all()
while zonelist_update_seq.seqcount is odd.
Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by
disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)
and
disabling synchronous printk() in order to avoid console_flush_all()
.
As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...
Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKUR…
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10(a)syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Cc: John Ogness <john.ogness(a)linutronix.de>
Cc: Patrick Daly <quic_pdaly(a)quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7136c36c5d01..e8b4f294d763 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6632,7 +6632,21 @@ static void __build_all_zonelists(void *data)
int nid;
int __maybe_unused cpu;
pg_data_t *self = data;
+ unsigned long flags;
+ /*
+ * Explicitly disable this CPU's interrupts before taking seqlock
+ * to prevent any IRQ handler from calling into the page allocator
+ * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+ */
+ local_irq_save(flags);
+ /*
+ * Explicitly disable this CPU's synchronous printk() before taking
+ * seqlock to prevent any printk() from trying to hold port->lock, for
+ * tty_insert_flip_string_and_push_buffer() on other CPU might be
+ * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
+ */
+ printk_deferred_enter();
write_seqlock(&zonelist_update_seq);
#ifdef CONFIG_NUMA
@@ -6671,6 +6685,8 @@ static void __build_all_zonelists(void *data)
}
write_sequnlock(&zonelist_update_seq);
+ printk_deferred_exit();
+ local_irq_restore(flags);
}
static noinline void __init
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 1007843a91909a4995ee78a538f62d8665705b66
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042449-wobbling-putdown-13ea@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
1007843a9190 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1007843a91909a4995ee78a538f62d8665705b66 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Date: Tue, 4 Apr 2023 23:31:58 +0900
Subject: [PATCH] mm/page_alloc: fix potential deadlock on zonelist_update_seq
seqlock
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.
One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
CPU0
----
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
// e.g. timer interrupt handler runs at this moment
some_timer_func() {
kmalloc(GFP_ATOMIC) {
__alloc_pages_slowpath() {
read_seqbegin(&zonelist_update_seq) {
// spins forever because zonelist_update_seq.seqcount is odd
}
}
}
}
// e.g. timer interrupt handler finishes
write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
}
This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests. But Michal Hocko does not know whether we should go with this
approach.
Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.
CPU0 CPU1
---- ----
pty_write() {
tty_insert_flip_string_and_push_buffer() {
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq);
build_zonelists() {
printk() {
vprintk() {
vprintk_default() {
vprintk_emit() {
console_unlock() {
console_flush_all() {
console_emit_next_record() {
con->write() = serial8250_console_write() {
spin_lock_irqsave(&port->lock, flags);
tty_insert_flip_string() {
tty_insert_flip_string_fixed_flag() {
__tty_buffer_request_room() {
tty_buffer_alloc() {
kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
__alloc_pages_slowpath() {
zonelist_iter_begin() {
read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
}
}
}
}
}
}
}
}
spin_unlock_irqrestore(&port->lock, flags);
// message is printed to console
spin_unlock_irqrestore(&port->lock, flags);
}
}
}
}
}
}
}
}
}
write_sequnlock(&zonelist_update_seq);
}
}
}
This deadlock scenario can be eliminated by
preventing interrupt context from calling kmalloc(GFP_ATOMIC)
and
preventing printk() from calling console_flush_all()
while zonelist_update_seq.seqcount is odd.
Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by
disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)
and
disabling synchronous printk() in order to avoid console_flush_all()
.
As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...
Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKUR…
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10(a)syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Cc: John Ogness <john.ogness(a)linutronix.de>
Cc: Patrick Daly <quic_pdaly(a)quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7136c36c5d01..e8b4f294d763 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6632,7 +6632,21 @@ static void __build_all_zonelists(void *data)
int nid;
int __maybe_unused cpu;
pg_data_t *self = data;
+ unsigned long flags;
+ /*
+ * Explicitly disable this CPU's interrupts before taking seqlock
+ * to prevent any IRQ handler from calling into the page allocator
+ * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+ */
+ local_irq_save(flags);
+ /*
+ * Explicitly disable this CPU's synchronous printk() before taking
+ * seqlock to prevent any printk() from trying to hold port->lock, for
+ * tty_insert_flip_string_and_push_buffer() on other CPU might be
+ * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
+ */
+ printk_deferred_enter();
write_seqlock(&zonelist_update_seq);
#ifdef CONFIG_NUMA
@@ -6671,6 +6685,8 @@ static void __build_all_zonelists(void *data)
}
write_sequnlock(&zonelist_update_seq);
+ printk_deferred_exit();
+ local_irq_restore(flags);
}
static noinline void __init