The base iomux offsets for each GPIO pin line are accumulatively
calculated based off iomux width flag in rockchip_pinctrl_get_soc_data.
If the iomux width flag is one of IOMUX_WIDTH_4BIT, IOMUX_WIDTH_3BIT or
IOMUX_WIDTH_2BIT, the base offset for next pin line would increase by 8
bytes, otherwise it would increase by 4 bytes.
Despite most of GPIO2-B iomux have 2-bit data width, which can be fit
into 4 bytes space with write mask, it actually take 8 bytes width for
whole GPIO2-B line.
Commit e8448a6c817c ("pinctrl: rockchip: fix pinmux bits for RK3328
GPIO2-B pins") wrongly set iomux width flag to 0, causing all base
iomux offset for line after GPIO2-B to be calculated wrong. Fix the
iomux width flag to IOMUX_WIDTH_2BIT so the offset after GPIO2-B is
correctly increased by 8, matching the actual width of GPIO2-B iomux.
Fixes: e8448a6c817c ("pinctrl: rockchip: fix pinmux bits for RK3328 GPIO2-B pins")
Cc: stable(a)vger.kernel.org
Reported-by: Richard Kojedzinszky <richard(a)kojedz.in>
Closes: https://lore.kernel.org/linux-rockchip/4f29b743202397d60edfb3c725537415@koj…
Tested-by: Richard Kojedzinszky <richard(a)kojedz.in>
Signed-off-by: Huang-Huang Bao <i(a)eh5.me>
---
I have double checked the iomux offsets in debug message match iomux
register definitions in "GRF Register Description" section in RK3328
TRM[1].
[1]: https://opensource.rock-chips.com/images/9/97/Rockchip_RK3328TRM_V1.1-Part1…
Kernel pinctrl debug message with dyndbg="file pinctrl-rockchip.c +p":
rockchip-pinctrl pinctrl: bank 0, iomux 0 has iom_offset 0x0 drv_offset 0x0
rockchip-pinctrl pinctrl: bank 0, iomux 1 has iom_offset 0x4 drv_offset 0x0
rockchip-pinctrl pinctrl: bank 0, iomux 2 has iom_offset 0x8 drv_offset 0x0
rockchip-pinctrl pinctrl: bank 0, iomux 3 has iom_offset 0xc drv_offset 0x0
rockchip-pinctrl pinctrl: bank 1, iomux 0 has iom_offset 0x10 drv_offset 0x0
rockchip-pinctrl pinctrl: bank 1, iomux 1 has iom_offset 0x14 drv_offset 0x0
rockchip-pinctrl pinctrl: bank 1, iomux 2 has iom_offset 0x18 drv_offset 0x0
rockchip-pinctrl pinctrl: bank 1, iomux 3 has iom_offset 0x1c drv_offset 0x0
rockchip-pinctrl pinctrl: bank 2, iomux 0 has iom_offset 0x20 drv_offset 0x0
rockchip-pinctrl pinctrl: bank 2, iomux 1 has iom_offset 0x24 drv_offset 0x0
rockchip-pinctrl pinctrl: bank 2, iomux 2 has iom_offset 0x2c drv_offset 0x0
rockchip-pinctrl pinctrl: bank 2, iomux 3 has iom_offset 0x34 drv_offset 0x0
rockchip-pinctrl pinctrl: bank 3, iomux 0 has iom_offset 0x38 drv_offset 0x0
rockchip-pinctrl pinctrl: bank 3, iomux 1 has iom_offset 0x40 drv_offset 0x0
rockchip-pinctrl pinctrl: bank 3, iomux 2 has iom_offset 0x48 drv_offset 0x0
rockchip-pinctrl pinctrl: bank 3, iomux 3 has iom_offset 0x4c drv_offset 0x0
The "Closes" links to test report from original reporter with original
issue contained, which was not delivered to any mailing list thus not
available on the web.
Added CC stable as the problematic e8448a6c817c fixed by this patch was
recently merged to stable kernels.
Sorry for the inconvenience caused,
Huang-Huang
drivers/pinctrl/pinctrl-rockchip.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pinctrl/pinctrl-rockchip.c b/drivers/pinctrl/pinctrl-rockchip.c
index 3f56991f5b89..f6da91941fbd 100644
--- a/drivers/pinctrl/pinctrl-rockchip.c
+++ b/drivers/pinctrl/pinctrl-rockchip.c
@@ -3813,7 +3813,7 @@ static struct rockchip_pin_bank rk3328_pin_banks[] = {
PIN_BANK_IOMUX_FLAGS(0, 32, "gpio0", 0, 0, 0, 0),
PIN_BANK_IOMUX_FLAGS(1, 32, "gpio1", 0, 0, 0, 0),
PIN_BANK_IOMUX_FLAGS(2, 32, "gpio2", 0,
- 0,
+ IOMUX_WIDTH_2BIT,
IOMUX_WIDTH_3BIT,
0),
PIN_BANK_IOMUX_FLAGS(3, 32, "gpio3",
base-commit: 4376e966ecb78c520b0faf239d118ecfab42a119
--
2.45.2
In _emif_get_id(), of_get_address() may return NULL which is later
dereferenced. Fix this bug by adding NULL check.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 86a18ee21e5e ("EDAC, ti: Add support for TI keystone and DRA7xx EDAC")
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408160935.A6QFliqt-lkp@intel.com/
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v5:
- According to the developer's suggestion, added an inspection of function
of_translate_address(). However, kernel test robot reported a build
warning, so the inspection is removed here, reverting to the modification
solution of patch v3.
Changes in v4:
- added the check of of_translate_address() as suggestions.
Changes in v3:
- added the patch operations omitted in PATCH v2 RESEND compared to PATCH
v2. Sorry for my oversight.
Changes in v2:
- added Cc stable line.
---
drivers/edac/ti_edac.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/edac/ti_edac.c b/drivers/edac/ti_edac.c
index 29723c9592f7..6f3da8d99eab 100644
--- a/drivers/edac/ti_edac.c
+++ b/drivers/edac/ti_edac.c
@@ -207,6 +207,9 @@ static int _emif_get_id(struct device_node *node)
int my_id = 0;
addrp = of_get_address(node, 0, NULL, NULL);
+ if (!addrp)
+ return -EINVAL;
+
my_addr = (u32)of_translate_address(node, addrp);
for_each_matching_node(np, ti_edac_of_match) {
@@ -214,6 +217,9 @@ static int _emif_get_id(struct device_node *node)
continue;
addrp = of_get_address(np, 0, NULL, NULL);
+ if (!addrp)
+ return -EINVAL;
+
addr = (u32)of_translate_address(np, addrp);
edac_printk(KERN_INFO, EDAC_MOD_NAME,
--
2.25.1
Commit da87d35a6e51 ("PCI: dra7xx: Use threaded IRQ handler for
"dra7xx-pcie-main" IRQ") switched from devm_request_irq() to
devm_request_threaded_irq(). In this process, the "handler" and the
"thread_fn" parameters were erroneously interchanged, with "NULL" being
passed as the "handler" and "dra7xx_pcie_irq_handler()" being registered
as the function to be called in a threaded interrupt context.
Fix this by interchanging the "handler" and "thread_fn" parameters.
While at it, correct the indentation.
Fixes: da87d35a6e51 ("PCI: dra7xx: Use threaded IRQ handler for "dra7xx-pcie-main" IRQ")
Cc: <stable(a)vger.kernel.org>
Reported-by: Udit Kumar <u-kumar1(a)ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli(a)ti.com>
---
Hello,
This patch is based on commit
d2bafcf224f3 Merge tag 'cgroup-for-6.11-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
of Mainline Linux.
Regards,
Siddharth.
drivers/pci/controller/dwc/pci-dra7xx.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c b/drivers/pci/controller/dwc/pci-dra7xx.c
index 4fe3b0cb72ec..4c64ac27af40 100644
--- a/drivers/pci/controller/dwc/pci-dra7xx.c
+++ b/drivers/pci/controller/dwc/pci-dra7xx.c
@@ -849,8 +849,9 @@ static int dra7xx_pcie_probe(struct platform_device *pdev)
}
dra7xx->mode = mode;
- ret = devm_request_threaded_irq(dev, irq, NULL, dra7xx_pcie_irq_handler,
- IRQF_SHARED, "dra7xx-pcie-main", dra7xx);
+ ret = devm_request_threaded_irq(dev, irq, dra7xx_pcie_irq_handler, NULL,
+ IRQF_SHARED, "dra7xx-pcie-main",
+ dra7xx);
if (ret) {
dev_err(dev, "failed to request irq\n");
goto err_gpio;
--
2.40.1
The patch titled
Subject: mm/memcontrol: respect zswap.writeback setting from parent cg too
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-memcontrol-respect-zswapwriteback-setting-from-parent-cg-too.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mike Yuan <me(a)yhndnzj.com>
Subject: mm/memcontrol: respect zswap.writeback setting from parent cg too
Date: Fri, 23 Aug 2024 16:27:06 +0000
Currently, the behavior of zswap.writeback wrt. the cgroup hierarchy
seems a bit odd. Unlike zswap.max, it doesn't honor the value from parent
cgroups. This surfaced when people tried to globally disable zswap
writeback, i.e. reserve physical swap space only for hibernation [1] -
disabling zswap.writeback only for the root cgroup results in subcgroups
with zswap.writeback=3D1 still performing writeback.
The inconsistency became more noticeable after I introduced the
MemoryZSwapWriteback=3D systemd unit setting [2] for controlling the knob.
The patch assumed that the kernel would enforce the value of parent
cgroups. It could probably be workarounded from systemd's side, by going
up the slice unit tree and inheriting the value. Yet I think it's more
sensible to make it behave consistently with zswap.max and friends.
[1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate=
#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
[2] https://github.com/systemd/systemd/pull/31734
Link: https://lkml.kernel.org/r/20240823162506.12117-1-me@yhndnzj.com
Fixes: 501a06fe8e4c ("zswap: memcontrol: implement zswap writeback disablin=
g")
Signed-off-by: Mike Yuan <me(a)yhndnzj.com>
Reviewed-by: Nhat Pham <nphamcs(a)gmail.com>
Acked-by: Yosry Ahmed <yosryahmed(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Michal Koutn�� <mkoutny(a)suse.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/admin-guide/cgroup-v2.rst | 7 ++++---
mm/memcontrol.c | 12 +++++++++---
2 files changed, 13 insertions(+), 6 deletions(-)
--- a/Documentation/admin-guide/cgroup-v2.rst~mm-memcontrol-respect-zswapwriteback-setting-from-parent-cg-too
+++ a/Documentation/admin-guide/cgroup-v2.rst
@@ -1717,9 +1717,10 @@ The following nested keys are defined.
entries fault back in or are written out to disk.
memory.zswap.writeback
- A read-write single value file. The default value is "1". The
- initial value of the root cgroup is 1, and when a new cgroup is
- created, it inherits the current value of its parent.
+ A read-write single value file. The default value is "1".
+ Note that this setting is hierarchical, i.e. the writeback would be
+ implicitly disabled for child cgroups if the upper hierarchy
+ does so.
When this is set to 0, all swapping attempts to swapping devices
are disabled. This included both zswap writebacks, and swapping due
--- a/mm/memcontrol.c~mm-memcontrol-respect-zswapwriteback-setting-from-parent-cg-too
+++ a/mm/memcontrol.c
@@ -3613,8 +3613,7 @@ mem_cgroup_css_alloc(struct cgroup_subsy
memcg1_soft_limit_reset(memcg);
#ifdef CONFIG_ZSWAP
memcg->zswap_max = PAGE_COUNTER_MAX;
- WRITE_ONCE(memcg->zswap_writeback,
- !parent || READ_ONCE(parent->zswap_writeback));
+ WRITE_ONCE(memcg->zswap_writeback, true);
#endif
page_counter_set_high(&memcg->swap, PAGE_COUNTER_MAX);
if (parent) {
@@ -5320,7 +5319,14 @@ void obj_cgroup_uncharge_zswap(struct ob
bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
{
/* if zswap is disabled, do not block pages going to the swapping device */
- return !zswap_is_enabled() || !memcg || READ_ONCE(memcg->zswap_writeback);
+ if (!zswap_is_enabled())
+ return true;
+
+ for (; memcg; memcg = parent_mem_cgroup(memcg))
+ if (!READ_ONCE(memcg->zswap_writeback))
+ return false;
+
+ return true;
}
static u64 zswap_current_read(struct cgroup_subsys_state *css,
_
Patches currently in -mm which might be from me(a)yhndnzj.com are
mm-memcontrol-respect-zswapwriteback-setting-from-parent-cg-too.patch
documentation-cgroup-v2-clarify-that-zswapwriteback-is-ignored-if-zswap-is-disabled.patch
selftests-test_zswap-add-test-for-hierarchical-zswapwriteback.patch
This is a note to let you know that I've just added the patch titled
usb: cdnsp: fix for Link TRB with TC
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 740f2e2791b98e47288b3814c83a3f566518fed2 Mon Sep 17 00:00:00 2001
From: Pawel Laszczak <pawell(a)cadence.com>
Date: Wed, 21 Aug 2024 06:07:42 +0000
Subject: usb: cdnsp: fix for Link TRB with TC
Stop Endpoint command on LINK TRB with TC bit set to 1 causes that
internal cycle bit can have incorrect state after command complete.
In consequence empty transfer ring can be incorrectly detected
when EP is resumed.
NOP TRB before LINK TRB avoid such scenario. Stop Endpoint command
is then on NOP TRB and internal cycle bit is not changed and have
correct value.
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
cc: <stable(a)vger.kernel.org>
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
Reviewed-by: Peter Chen <peter.chen(a)kernel.org>
Link: https://lore.kernel.org/r/PH7PR07MB953878279F375CCCE6C6F40FDD8E2@PH7PR07MB9…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/cdns3/cdnsp-gadget.h | 3 +++
drivers/usb/cdns3/cdnsp-ring.c | 28 ++++++++++++++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/drivers/usb/cdns3/cdnsp-gadget.h b/drivers/usb/cdns3/cdnsp-gadget.h
index dbee6f085277..84887dfea763 100644
--- a/drivers/usb/cdns3/cdnsp-gadget.h
+++ b/drivers/usb/cdns3/cdnsp-gadget.h
@@ -811,6 +811,7 @@ struct cdnsp_stream_info {
* generate Missed Service Error Event.
* Set skip flag when receive a Missed Service Error Event and
* process the missed tds on the endpoint ring.
+ * @wa1_nop_trb: hold pointer to NOP trb.
*/
struct cdnsp_ep {
struct usb_ep endpoint;
@@ -838,6 +839,8 @@ struct cdnsp_ep {
#define EP_UNCONFIGURED BIT(7)
bool skip;
+ union cdnsp_trb *wa1_nop_trb;
+
};
/**
diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c
index a60c0cb991cd..dbd83d321bca 100644
--- a/drivers/usb/cdns3/cdnsp-ring.c
+++ b/drivers/usb/cdns3/cdnsp-ring.c
@@ -1904,6 +1904,23 @@ int cdnsp_queue_bulk_tx(struct cdnsp_device *pdev, struct cdnsp_request *preq)
if (ret)
return ret;
+ /*
+ * workaround 1: STOP EP command on LINK TRB with TC bit set to 1
+ * causes that internal cycle bit can have incorrect state after
+ * command complete. In consequence empty transfer ring can be
+ * incorrectly detected when EP is resumed.
+ * NOP TRB before LINK TRB avoid such scenario. STOP EP command is
+ * then on NOP TRB and internal cycle bit is not changed and have
+ * correct value.
+ */
+ if (pep->wa1_nop_trb) {
+ field = le32_to_cpu(pep->wa1_nop_trb->trans_event.flags);
+ field ^= TRB_CYCLE;
+
+ pep->wa1_nop_trb->trans_event.flags = cpu_to_le32(field);
+ pep->wa1_nop_trb = NULL;
+ }
+
/*
* Don't give the first TRB to the hardware (by toggling the cycle bit)
* until we've finished creating all the other TRBs. The ring's cycle
@@ -1999,6 +2016,17 @@ int cdnsp_queue_bulk_tx(struct cdnsp_device *pdev, struct cdnsp_request *preq)
send_addr = addr;
}
+ if (cdnsp_trb_is_link(ring->enqueue + 1)) {
+ field = TRB_TYPE(TRB_TR_NOOP) | TRB_IOC;
+ if (!ring->cycle_state)
+ field |= TRB_CYCLE;
+
+ pep->wa1_nop_trb = ring->enqueue;
+
+ cdnsp_queue_trb(pdev, ring, 0, 0x0, 0x0,
+ TRB_INTR_TARGET(0), field);
+ }
+
cdnsp_check_trb_math(preq, enqd_len);
ret = cdnsp_giveback_first_trb(pdev, pep, preq->request.stream_id,
start_cycle, start_trb);
--
2.46.0
In some cases the sink can reset itself after it was configured into MST
mode, without the driver noticing the disconnected state. For instance
the reset may happen in the middle of a modeset, or the (long) HPD pulse
generated may be not long enough for the encoder detect handler to
observe the HPD's deasserted state. In this case the sink's DPCD
register programmed to enable MST will be reset, while the driver still
assumes MST is still enabled. Detect this condition, which will tear
down and recreate/re-enable the MST topology.
v2:
- Add a code comment about adjusting the expected DP_MSTM_CTRL register
value for SST + SideBand. (Suraj, Jani)
- Print a debug message about detecting the link reset. (Jani)
- Verify the DPCD MST state only if it wasn't already determined that
the sink is disconnected.
Cc: stable(a)vger.kernel.org
Cc: Jani Nikula <jani.nikula(a)intel.com>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11195
Reviewed-by: Suraj Kandpal <suraj.kandpal(a)intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_dp.c | 12 +++++++
drivers/gpu/drm/i915/display/intel_dp_mst.c | 40 +++++++++++++++++++++
drivers/gpu/drm/i915/display/intel_dp_mst.h | 1 +
3 files changed, 53 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 6a0c7ae654f40..789c2f78826d0 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -5999,6 +5999,18 @@ intel_dp_detect(struct drm_connector *connector,
else
status = connector_status_disconnected;
+ if (status != connector_status_disconnected &&
+ !intel_dp_mst_verify_dpcd_state(intel_dp))
+ /*
+ * This requires retrying detection for instance to re-enable
+ * the MST mode that got reset via a long HPD pulse. The retry
+ * will happen either via the hotplug handler's retry logic,
+ * ensured by setting the connector here to SST/disconnected,
+ * or via a userspace connector probing in response to the
+ * hotplug uevent sent when removing the MST connectors.
+ */
+ status = connector_status_disconnected;
+
if (status == connector_status_disconnected) {
memset(&intel_dp->compliance, 0, sizeof(intel_dp->compliance));
memset(intel_connector->dp.dsc_dpcd, 0, sizeof(intel_connector->dp.dsc_dpcd));
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index 45d2230d1801b..15541932b809e 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -2062,3 +2062,43 @@ void intel_dp_mst_prepare_probe(struct intel_dp *intel_dp)
intel_mst_set_probed_link_params(intel_dp, link_rate, lane_count);
}
+
+/*
+ * intel_dp_mst_verify_dpcd_state - verify the MST SW enabled state wrt. the DPCD
+ * @intel_dp: DP port object
+ *
+ * Verify if @intel_dp's MST enabled SW state matches the corresponding DPCD
+ * state. A long HPD pulse - not long enough to be detected as a disconnected
+ * state - could've reset the DPCD state, which requires tearing
+ * down/recreating the MST topology.
+ *
+ * Returns %true if the SW MST enabled and DPCD states match, %false
+ * otherwise.
+ */
+bool intel_dp_mst_verify_dpcd_state(struct intel_dp *intel_dp)
+{
+ struct intel_display *display = to_intel_display(intel_dp);
+ struct intel_connector *connector = intel_dp->attached_connector;
+ struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
+ struct intel_encoder *encoder = &dig_port->base;
+ int ret;
+ u8 val;
+
+ if (!intel_dp->is_mst)
+ return true;
+
+ ret = drm_dp_dpcd_readb(intel_dp->mst_mgr.aux, DP_MSTM_CTRL, &val);
+
+ /* Adjust the expected register value for SST + SideBand. */
+ if (ret < 0 || val != (DP_MST_EN | DP_UP_REQ_EN | DP_UPSTREAM_IS_SRC)) {
+ drm_dbg_kms(display->drm,
+ "[CONNECTOR:%d:%s][ENCODER:%d:%s] MST mode got reset, removing topology (ret=%d, ctrl=0x%02x)\n",
+ connector->base.base.id, connector->base.name,
+ encoder->base.base.id, encoder->base.name,
+ ret, val);
+
+ return false;
+ }
+
+ return true;
+}
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.h b/drivers/gpu/drm/i915/display/intel_dp_mst.h
index fba76454fa67f..8343804ce3f8d 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.h
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.h
@@ -28,5 +28,6 @@ int intel_dp_mst_atomic_check_link(struct intel_atomic_state *state,
bool intel_dp_mst_crtc_needs_modeset(struct intel_atomic_state *state,
struct intel_crtc *crtc);
void intel_dp_mst_prepare_probe(struct intel_dp *intel_dp);
+bool intel_dp_mst_verify_dpcd_state(struct intel_dp *intel_dp);
#endif /* __INTEL_DP_MST_H__ */
--
2.44.2
Currently, the behavior of zswap.writeback wrt.
the cgroup hierarchy seems a bit odd. Unlike zswap.max,
it doesn't honor the value from parent cgroups. This
surfaced when people tried to globally disable zswap writeback,
i.e. reserve physical swap space only for hibernation [1] -
disabling zswap.writeback only for the root cgroup results
in subcgroups with zswap.writeback=1 still performing writeback.
The inconsistency became more noticeable after I introduced
the MemoryZSwapWriteback= systemd unit setting [2] for
controlling the knob. The patch assumed that the kernel would
enforce the value of parent cgroups. It could probably be
workarounded from systemd's side, by going up the slice unit
tree and inheriting the value. Yet I think it's more sensible
to make it behave consistently with zswap.max and friends.
[1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Dis…
[2] https://github.com/systemd/systemd/pull/31734
Changes in v3:
- Additionally drop inheritance of zswap.writeback setting
on cgroup creation, which is no longer needed
Link to v2: https://lore.kernel.org/linux-kernel/20240816144344.18135-1-me@yhndnzj.com/
Changes in v2:
- Actually base on latest tree (is_zswap_enabled() -> zswap_is_enabled())
- Update Documentation/admin-guide/cgroup-v2.rst to reflect the change
Link to v1: https://lore.kernel.org/linux-kernel/20240814171800.23558-1-me@yhndnzj.com/
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: Yosry Ahmed <yosryahmed(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Koutný <mkoutny(a)suse.com>
Fixes: 501a06fe8e4c ("zswap: memcontrol: implement zswap writeback disabling")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mike Yuan <me(a)yhndnzj.com>
Reviewed-by: Nhat Pham <nphamcs(a)gmail.com>
Acked-by: Yosry Ahmed <yosryahmed(a)google.com>
---
Documentation/admin-guide/cgroup-v2.rst | 7 ++++---
mm/memcontrol.c | 12 +++++++++---
2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 86311c2907cd..95c18bc17083 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1717,9 +1717,10 @@ The following nested keys are defined.
entries fault back in or are written out to disk.
memory.zswap.writeback
- A read-write single value file. The default value is "1". The
- initial value of the root cgroup is 1, and when a new cgroup is
- created, it inherits the current value of its parent.
+ A read-write single value file. The default value is "1".
+ Note that this setting is hierarchical, i.e. the writeback would be
+ implicitly disabled for child cgroups if the upper hierarchy
+ does so.
When this is set to 0, all swapping attempts to swapping devices
are disabled. This included both zswap writebacks, and swapping due
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f29157288b7d..d563fb515766 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3613,8 +3613,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
memcg1_soft_limit_reset(memcg);
#ifdef CONFIG_ZSWAP
memcg->zswap_max = PAGE_COUNTER_MAX;
- WRITE_ONCE(memcg->zswap_writeback,
- !parent || READ_ONCE(parent->zswap_writeback));
+ WRITE_ONCE(memcg->zswap_writeback, true);
#endif
page_counter_set_high(&memcg->swap, PAGE_COUNTER_MAX);
if (parent) {
@@ -5320,7 +5319,14 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
{
/* if zswap is disabled, do not block pages going to the swapping device */
- return !zswap_is_enabled() || !memcg || READ_ONCE(memcg->zswap_writeback);
+ if (!zswap_is_enabled())
+ return true;
+
+ for (; memcg; memcg = parent_mem_cgroup(memcg))
+ if (!READ_ONCE(memcg->zswap_writeback))
+ return false;
+
+ return true;
}
static u64 zswap_current_read(struct cgroup_subsys_state *css,
base-commit: 47ac09b91befbb6a235ab620c32af719f8208399
--
2.46.0
pinmux_generic_get_function() can return NULL and the pointer 'function'
was dereferenced without checking against NULL. Add checking of pointer
'function' in pcs_get_function().
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 571aec4df5b7 ("pinctrl: single: Use generic pinmux helpers for managing functions")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/pinctrl/pinctrl-single.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/pinctrl/pinctrl-single.c b/drivers/pinctrl/pinctrl-single.c
index 4c6bfabb6bd7..4da3c3f422b6 100644
--- a/drivers/pinctrl/pinctrl-single.c
+++ b/drivers/pinctrl/pinctrl-single.c
@@ -345,6 +345,8 @@ static int pcs_get_function(struct pinctrl_dev *pctldev, unsigned pin,
return -ENOTSUPP;
fselector = setting->func;
function = pinmux_generic_get_function(pctldev, fselector);
+ if (!function)
+ return -EINVAL;
*func = function->data;
if (!(*func)) {
dev_err(pcs->dev, "%s could not find function%i\n",
--
2.25.1
From: Oliver Neukum <oneukum(a)suse.com>
[ Upstream commit e5876b088ba03a62124266fa20d00e65533c7269 ]
ipheth_sndbulk_callback() can submit carrier_work
as a part of its error handling. That means that
the driver must make sure that the work is cancelled
after it has made sure that no more URB can terminate
with an error condition.
Hence the order of actions in ipheth_close() needs
to be inverted.
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
Signed-off-by: Foster Snowhill <forst(a)pen.gy>
Tested-by: Georgi Valkov <gvalkov(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/usb/ipheth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/usb/ipheth.c b/drivers/net/usb/ipheth.c
index cea005cc7b2ab..c762335587a43 100644
--- a/drivers/net/usb/ipheth.c
+++ b/drivers/net/usb/ipheth.c
@@ -407,8 +407,8 @@ static int ipheth_close(struct net_device *net)
{
struct ipheth_device *dev = netdev_priv(net);
- cancel_delayed_work_sync(&dev->carrier_work);
netif_stop_queue(net);
+ cancel_delayed_work_sync(&dev->carrier_work);
return 0;
}
--
2.43.0
From: Oliver Neukum <oneukum(a)suse.com>
[ Upstream commit e5876b088ba03a62124266fa20d00e65533c7269 ]
ipheth_sndbulk_callback() can submit carrier_work
as a part of its error handling. That means that
the driver must make sure that the work is cancelled
after it has made sure that no more URB can terminate
with an error condition.
Hence the order of actions in ipheth_close() needs
to be inverted.
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
Signed-off-by: Foster Snowhill <forst(a)pen.gy>
Tested-by: Georgi Valkov <gvalkov(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/usb/ipheth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/usb/ipheth.c b/drivers/net/usb/ipheth.c
index 73ad78f47763c..7814856636907 100644
--- a/drivers/net/usb/ipheth.c
+++ b/drivers/net/usb/ipheth.c
@@ -353,8 +353,8 @@ static int ipheth_close(struct net_device *net)
{
struct ipheth_device *dev = netdev_priv(net);
- cancel_delayed_work_sync(&dev->carrier_work);
netif_stop_queue(net);
+ cancel_delayed_work_sync(&dev->carrier_work);
return 0;
}
--
2.43.0
From: Oliver Neukum <oneukum(a)suse.com>
[ Upstream commit e5876b088ba03a62124266fa20d00e65533c7269 ]
ipheth_sndbulk_callback() can submit carrier_work
as a part of its error handling. That means that
the driver must make sure that the work is cancelled
after it has made sure that no more URB can terminate
with an error condition.
Hence the order of actions in ipheth_close() needs
to be inverted.
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
Signed-off-by: Foster Snowhill <forst(a)pen.gy>
Tested-by: Georgi Valkov <gvalkov(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/usb/ipheth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/usb/ipheth.c b/drivers/net/usb/ipheth.c
index 06d9f19ca142a..0774d753dd316 100644
--- a/drivers/net/usb/ipheth.c
+++ b/drivers/net/usb/ipheth.c
@@ -353,8 +353,8 @@ static int ipheth_close(struct net_device *net)
{
struct ipheth_device *dev = netdev_priv(net);
- cancel_delayed_work_sync(&dev->carrier_work);
netif_stop_queue(net);
+ cancel_delayed_work_sync(&dev->carrier_work);
return 0;
}
--
2.43.0
From: Oliver Neukum <oneukum(a)suse.com>
[ Upstream commit e5876b088ba03a62124266fa20d00e65533c7269 ]
ipheth_sndbulk_callback() can submit carrier_work
as a part of its error handling. That means that
the driver must make sure that the work is cancelled
after it has made sure that no more URB can terminate
with an error condition.
Hence the order of actions in ipheth_close() needs
to be inverted.
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
Signed-off-by: Foster Snowhill <forst(a)pen.gy>
Tested-by: Georgi Valkov <gvalkov(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/usb/ipheth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/usb/ipheth.c b/drivers/net/usb/ipheth.c
index d56e276e4d805..4485388dcff2e 100644
--- a/drivers/net/usb/ipheth.c
+++ b/drivers/net/usb/ipheth.c
@@ -353,8 +353,8 @@ static int ipheth_close(struct net_device *net)
{
struct ipheth_device *dev = netdev_priv(net);
- cancel_delayed_work_sync(&dev->carrier_work);
netif_stop_queue(net);
+ cancel_delayed_work_sync(&dev->carrier_work);
return 0;
}
--
2.43.0
Greetings!
I've run into a problem with an external microphone (3.5 jack headset) and
fresh kernels (6.9, 6.10)
When using the 6.9 or the 6.10 kernel system doesn't see an external mic
from the headset (3.5 jack). Instead, there's only an internal mic
plugged/available in the system.
When using the 6.8 kernel or less it's okay, I can use mic from the
headset. So currently I'm on 6.8.12.
Also opened a bug on bugzilla (
https://bugzilla.kernel.org/show_bug.cgi?id=219158)
Laptop Model: Asus ROG G14 GA402XY
Codec: Realtek ALC285 + Cirrus logic cs35l41
OS: Debian Testing
Kernel: 6.9.x, 6.10 represents the issue, 6.8.x no
DE: Gnome 46, wayland
What I've tried:
- Play with snd_hda_intel model parameters from here:
https://www.kernel.org/doc/html/latest/sound/hd-audio/models.html?highlight…
ex: options snd_hda_intel model=dell-headset-mic
- Retask jack with hdajackretask
- Install a fresh system on another ssd (latest PopOS) - after kernel
update to 6.9.x (from the repository) it's the same issue.
On Mon, Aug 19, 2024 at 10:25:08AM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> Input: bcm5974 - check endpoint type before starting traffic
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> input-bcm5974-check-endpoint-type-before-starting-tr.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Please drop it, it was reverted.
Thanks.
--
Dmitry
[Public]
> -----Original Message-----
> From: Sasha Levin <sashal(a)kernel.org>
> Sent: Wednesday, August 21, 2024 9:33 AM
> To: stable-commits(a)vger.kernel.org; Xiao, Jack <Jack.Xiao(a)amd.com>
> Cc: Deucher, Alexander <Alexander.Deucher(a)amd.com>; Koenig, Christian
> <Christian.Koenig(a)amd.com>; Pan, Xinhui <Xinhui.Pan(a)amd.com>; David
> Airlie <airlied(a)gmail.com>; Daniel Vetter <daniel(a)ffwll.ch>
> Subject: Patch "drm/amdgpu/gfx11: need acquire mutex before access
> CP_VMID_RESET v2" has been added to the 6.6-stable tree
>
> This is a note to let you know that I've just added the patch titled
>
> drm/amdgpu/gfx11: need acquire mutex before access CP_VMID_RESET v2
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-
> queue.git;a=summary
>
> The filename of the patch is:
> drm-amdgpu-gfx11-need-acquire-mutex-before-access-cp.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree, please let
> <stable(a)vger.kernel.org> know about it.
>
This patch is not stable material. Please drop for stable.
Thanks,
Alex
>
>
> commit 72516630230bee2668c491fdafcac27c565a5ad5
> Author: Jack Xiao <Jack.Xiao(a)amd.com>
> Date: Tue Dec 19 17:10:34 2023 +0800
>
> drm/amdgpu/gfx11: need acquire mutex before access CP_VMID_RESET v2
>
> [ Upstream commit 4b5c5f5ad38b9435518730cc7f8f1e8de9c5cb2f ]
>
> It's required to take the gfx mutex before access to CP_VMID_RESET,
> for there is a race condition with CP firmware to write the register.
>
> v2: add extra code to ensure the mutex releasing is successful.
>
> Signed-off-by: Jack Xiao <Jack.Xiao(a)amd.com>
> Reviewed-by: Hawking Zhang <Hawking.Zhang(a)amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index c81e98f0d17ff..17a09e96b30fc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -4430,11 +4430,43 @@ static int gfx_v11_0_wait_for_idle(void *handle)
> return -ETIMEDOUT;
> }
>
> +static int gfx_v11_0_request_gfx_index_mutex(struct amdgpu_device *adev,
> + int req)
> +{
> + u32 i, tmp, val;
> +
> + for (i = 0; i < adev->usec_timeout; i++) {
> + /* Request with MeId=2, PipeId=0 */
> + tmp = REG_SET_FIELD(0, CP_GFX_INDEX_MUTEX, REQUEST,
> req);
> + tmp = REG_SET_FIELD(tmp, CP_GFX_INDEX_MUTEX,
> CLIENTID, 4);
> + WREG32_SOC15(GC, 0, regCP_GFX_INDEX_MUTEX, tmp);
> +
> + val = RREG32_SOC15(GC, 0, regCP_GFX_INDEX_MUTEX);
> + if (req) {
> + if (val == tmp)
> + break;
> + } else {
> + tmp = REG_SET_FIELD(tmp, CP_GFX_INDEX_MUTEX,
> + REQUEST, 1);
> +
> + /* unlocked or locked by firmware */
> + if (val != tmp)
> + break;
> + }
> + udelay(1);
> + }
> +
> + if (i >= adev->usec_timeout)
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> static int gfx_v11_0_soft_reset(void *handle) {
> u32 grbm_soft_reset = 0;
> u32 tmp;
> - int i, j, k;
> + int r, i, j, k;
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
> tmp = RREG32_SOC15(GC, 0, regCP_INT_CNTL); @@ -4474,6
> +4506,13 @@ static int gfx_v11_0_soft_reset(void *handle)
> }
> }
>
> + /* Try to acquire the gfx mutex before access to CP_VMID_RESET */
> + r = gfx_v11_0_request_gfx_index_mutex(adev, 1);
> + if (r) {
> + DRM_ERROR("Failed to acquire the gfx mutex during soft
> reset\n");
> + return r;
> + }
> +
> WREG32_SOC15(GC, 0, regCP_VMID_RESET, 0xfffffffe);
>
> // Read CP_VMID_RESET register three times.
> @@ -4482,6 +4521,13 @@ static int gfx_v11_0_soft_reset(void *handle)
> RREG32_SOC15(GC, 0, regCP_VMID_RESET);
> RREG32_SOC15(GC, 0, regCP_VMID_RESET);
>
> + /* release the gfx mutex */
> + r = gfx_v11_0_request_gfx_index_mutex(adev, 0);
> + if (r) {
> + DRM_ERROR("Failed to release the gfx mutex during soft
> reset\n");
> + return r;
> + }
> +
> for (i = 0; i < adev->usec_timeout; i++) {
> if (!RREG32_SOC15(GC, 0, regCP_HQD_ACTIVE) &&
> !RREG32_SOC15(GC, 0, regCP_GFX_HQD_ACTIVE))
On 21. 08. 24, 15:37, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> serial: pch: Don't disable interrupts while acquiring lock in ISR.
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> serial-pch-don-t-disable-interrupts-while-acquiring-.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
I feel so. It does not fix anything real. It is a prep for
d47dd323bf959. So unless you take that too, this one does not make sense
on its own.
> commit 2e7194802a740ab6ef47e19e56bd1b06c03610d3
> Author: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
> Date: Fri Mar 1 22:45:28 2024 +0100
>
> serial: pch: Don't disable interrupts while acquiring lock in ISR.
>
> [ Upstream commit f8ff23ebce8c305383c8070e1ea3b08a69eb1e8d ]
>
> The interrupt service routine is always invoked with disabled
> interrupts.
>
> Remove the _irqsave() from the locking functions in the interrupts
> service routine/ pch_uart_interrupt().
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
> Link: https://lore.kernel.org/r/20240301215246.891055-16-bigeasy@linutronix.de
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/tty/serial/pch_uart.c b/drivers/tty/serial/pch_uart.c
> index abff1c6470f6a..d638e890ef6f0 100644
> --- a/drivers/tty/serial/pch_uart.c
> +++ b/drivers/tty/serial/pch_uart.c
> @@ -1023,11 +1023,10 @@ static irqreturn_t pch_uart_interrupt(int irq, void *dev_id)
> u8 lsr;
> int ret = 0;
> unsigned char iid;
> - unsigned long flags;
> int next = 1;
> u8 msr;
>
> - spin_lock_irqsave(&priv->lock, flags);
> + spin_lock(&priv->lock);
> handled = 0;
> while (next) {
> iid = pch_uart_hal_get_iid(priv);
> @@ -1087,7 +1086,7 @@ static irqreturn_t pch_uart_interrupt(int irq, void *dev_id)
> handled |= (unsigned int)ret;
> }
>
> - spin_unlock_irqrestore(&priv->lock, flags);
> + spin_unlock(&priv->lock);
> return IRQ_RETVAL(handled);
> }
>
--
js
suse labs
From: Zijun Hu <quic_zijuhu(a)quicinc.com>
An uninitialized variable @data.have_async may be used as analyzed
by the following inline comments:
static int __device_attach(struct device *dev, bool allow_async)
{
// if @allow_async is true.
...
struct device_attach_data data = {
.dev = dev,
.check_async = allow_async,
.want_async = false,
};
// @data.have_async is not initialized.
...
ret = bus_for_each_drv(dev->bus, NULL, &data,
__device_attach_driver);
// @data.have_async must not be set by __device_attach_driver() if
// @dev->bus does not have driver which allows probe asynchronously
if (!ret && allow_async && data.have_async) {
// Above @data.have_async is not uninitialized but used.
...
}
...
}
It may be unnecessary to trigger the second pass probing asynchronous
drivers for the device @dev.
Fixed by initializing @data.have_async to false.
Fixes: 765230b5f084 ("driver-core: add asynchronous probing support for drivers")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
---
drivers/base/dd.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9b745ba54de1..b0c44b0846aa 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -1021,6 +1021,7 @@ static int __device_attach(struct device *dev, bool allow_async)
.dev = dev,
.check_async = allow_async,
.want_async = false,
+ .have_async = false,
};
if (dev->parent)
---
base-commit: 87ee9981d1f86ee9b1623a46c7f9e4ac24461fe4
change-id: 20240823-fix_have_async-3a135618d91b
Best regards,
--
Zijun Hu <quic_zijuhu(a)quicinc.com>
The following commit has been merged into the irq/urgent branch of tip:
Commit-ID: 71c8e2a7c822ee557b07d9bb49028dd269c87b2e
Gitweb: https://git.kernel.org/tip/71c8e2a7c822ee557b07d9bb49028dd269c87b2e
Author: Mark Rutland <mark.rutland(a)arm.com>
AuthorDate: Thu, 22 Aug 2024 11:23:08 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Fri, 23 Aug 2024 12:45:45 +02:00
irqchip/gic-v3: Init SRE before poking sysregs
The GICv3 driver pokes GICv3 system registers in gic_prio_init() before
gic_cpu_sys_reg_init() ensures that GICv3 system registers have been
enabled by writing to ICC_SRE_EL1.SRE.
On arm64 this is benign as has_useable_gicv3_cpuif() runs earlier during
cpufeature detection, and this enables the GICv3 system registers.
On 32-bit arm when booting on an FVP using the boot-wrapper, the accesses
in gic_prio_init() end up being UNDEFINED and crashes the kernel during
boot.
This is a regression introduced by the addition of gic_prio_init().
Fix this by factoring out the SRE initialization into a new function and
calling it early in the three paths where SRE may not have been
initialized:
(1) gic_init_bases(), before the primary CPU pokes GICv3 sysregs in
gic_prio_init().
(2) gic_starting_cpu(), before secondary CPUs initialize GICv3 sysregs
in gic_cpu_init().
(3) gic_cpu_pm_notifier(), before CPUs re-initialize GICv3 sysregs in
gic_cpu_sys_reg_init().
Fixes: d447bf09a4013541 ("irqchip/gic-v3: Detect GICD_CTRL.DS and SCR_EL3.FIQ earlier")
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
drivers/irqchip/irq-gic-v3.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index c19083b..74f21e0 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1154,14 +1154,8 @@ static void gic_update_rdist_properties(void)
gic_data.rdists.has_vpend_valid_dirty ? "Valid+Dirty " : "");
}
-static void gic_cpu_sys_reg_init(void)
+static void gic_cpu_sys_reg_enable(void)
{
- int i, cpu = smp_processor_id();
- u64 mpidr = gic_cpu_to_affinity(cpu);
- u64 need_rss = MPIDR_RS(mpidr);
- bool group0;
- u32 pribits;
-
/*
* Need to check that the SRE bit has actually been set. If
* not, it means that SRE is disabled at EL2. We're going to
@@ -1172,6 +1166,16 @@ static void gic_cpu_sys_reg_init(void)
if (!gic_enable_sre())
pr_err("GIC: unable to set SRE (disabled at EL2), panic ahead\n");
+}
+
+static void gic_cpu_sys_reg_init(void)
+{
+ int i, cpu = smp_processor_id();
+ u64 mpidr = gic_cpu_to_affinity(cpu);
+ u64 need_rss = MPIDR_RS(mpidr);
+ bool group0;
+ u32 pribits;
+
pribits = gic_get_pribits();
group0 = gic_has_group0();
@@ -1333,6 +1337,7 @@ static int gic_check_rdist(unsigned int cpu)
static int gic_starting_cpu(unsigned int cpu)
{
+ gic_cpu_sys_reg_enable();
gic_cpu_init();
if (gic_dist_supports_lpis())
@@ -1498,6 +1503,7 @@ static int gic_cpu_pm_notifier(struct notifier_block *self,
if (cmd == CPU_PM_EXIT) {
if (gic_dist_security_disabled())
gic_enable_redist(true);
+ gic_cpu_sys_reg_enable();
gic_cpu_sys_reg_init();
} else if (cmd == CPU_PM_ENTER && gic_dist_security_disabled()) {
gic_write_grpen1(0);
@@ -2070,6 +2076,7 @@ static int __init gic_init_bases(phys_addr_t dist_phys_base,
gic_update_rdist_properties();
+ gic_cpu_sys_reg_enable();
gic_prio_init();
gic_dist_init();
gic_cpu_init();
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 14d069d92951a3e150c0a81f2ca3b93e54da913b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081950-amaze-wriggle-3057@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
14d069d92951 ("i2c: tegra: Do not mark ACPI devices as irq safe")
4f5d68c85914 ("i2c: tegra: allow VI support to be compiled out")
a55efa7edf37 ("i2c: tegra: allow DVC support to be compiled out")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 14d069d92951a3e150c0a81f2ca3b93e54da913b Mon Sep 17 00:00:00 2001
From: Breno Leitao <leitao(a)debian.org>
Date: Tue, 13 Aug 2024 09:12:53 -0700
Subject: [PATCH] i2c: tegra: Do not mark ACPI devices as irq safe
On ACPI machines, the tegra i2c module encounters an issue due to a
mutex being called inside a spinlock. This leads to the following bug:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
...
Call trace:
__might_sleep
__mutex_lock_common
mutex_lock_nested
acpi_subsys_runtime_resume
rpm_resume
tegra_i2c_xfer
The problem arises because during __pm_runtime_resume(), the spinlock
&dev->power.lock is acquired before rpm_resume() is called. Later,
rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
mutexes, triggering the error.
To address this issue, devices on ACPI are now marked as not IRQ-safe,
considering the dependency of acpi_subsys_runtime_resume() on mutexes.
Fixes: bd2fdedbf2ba ("i2c: tegra: Add the ACPI support")
Cc: <stable(a)vger.kernel.org> # v5.17+
Co-developed-by: Michael van der Westhuizen <rmikey(a)meta.com>
Signed-off-by: Michael van der Westhuizen <rmikey(a)meta.com>
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Reviewed-by: Dmitry Osipenko <digetx(a)gmail.com>
Reviewed-by: Andy Shevchenko <andy(a)kernel.org>
Signed-off-by: Andi Shyti <andi.shyti(a)kernel.org>
diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
index 85b31edc558d..1df5b4204142 100644
--- a/drivers/i2c/busses/i2c-tegra.c
+++ b/drivers/i2c/busses/i2c-tegra.c
@@ -1802,9 +1802,9 @@ static int tegra_i2c_probe(struct platform_device *pdev)
* domain.
*
* VI I2C device shouldn't be marked as IRQ-safe because VI I2C won't
- * be used for atomic transfers.
+ * be used for atomic transfers. ACPI device is not IRQ safe also.
*/
- if (!IS_VI(i2c_dev))
+ if (!IS_VI(i2c_dev) && !has_acpi_companion(i2c_dev->dev))
pm_runtime_irq_safe(i2c_dev->dev);
pm_runtime_enable(i2c_dev->dev);
pinmux_generic_get_function() can return NULL and the pointer 'function'
was dereferenced without checking against NULL. Add checking of pointer
'function' in pcs_get_function().
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 571aec4df5b7 ("pinctrl: single: Use generic pinmux helpers for managing functions")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/pinctrl/pinctrl-single.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/pinctrl/pinctrl-single.c b/drivers/pinctrl/pinctrl-single.c
index 4c6bfabb6bd7..4da3c3f422b6 100644
--- a/drivers/pinctrl/pinctrl-single.c
+++ b/drivers/pinctrl/pinctrl-single.c
@@ -345,6 +345,8 @@ static int pcs_get_function(struct pinctrl_dev *pctldev, unsigned pin,
return -ENOTSUPP;
fselector = setting->func;
function = pinmux_generic_get_function(pctldev, fselector);
+ if (!function)
+ return -EINVAL;
*func = function->data;
if (!(*func)) {
dev_err(pcs->dev, "%s could not find function%i\n",
--
2.25.1
Hi
While building 6.1.106 based verson for Debian I noticed that all
32bit architectures did fail to build:
https://buildd.debian.org/status/fetch.php?pkg=linux&arch=i386&ver=6.1.106-…
The problem is known as
https://lore.kernel.org/lkml/18c6df0d-45ed-450c-9eda-95160a2bbb8e@gmail.com/
This now affects as well 6.1.y as the commits 867046cc7027 ("minmax:
relax check to allow comparison between unsigned arguments and signed
constants") and 4ead534fba42 ("minmax: allow comparisons of 'int'
against 'unsigned char/short'") were backported to 6.1.106.
Thus, can you please pick as well 31e97d7c9ae3 ("media: solo6x10:
replace max(a, min(b, c)) by clamp(b, a, c)") for 6.1.y?
Note I suspect it is required as well for 5.15.164 (as the commits
were backported there as well and 31e97d7c9ae3 now missing there).
Regards,
Salvatore
It has turned out that having _set_required_opps() to recursively call
dev_pm_opp_set_opp() to set the required OPPs, doesn't really work as well
as we expected.
More precisely, at each recursive call to dev_pm_opp_set_opp() we are
changing an OPP for a required_dev that belongs to a required-OPP table.
The problem with this, is that we may have several devices sharing the same
required-OPP table, which leads to an incorrect behaviour in regards to
aggregating the per device votes.
To fix the problem for a required-OPP table belonging to a PM domain, which
is the only existing usecase for now, let's simply replace the call to
dev_pm_opp_set_opp() in _set_required_opps() by a call to _set_opp_level().
Moving forward we may potentially need to add support for other types of
required-OPP tables. In this case, the aggregation needs to be thought of.
Fixes: e37440e7e2c2 ("OPP: Call dev_pm_opp_set_opp() for required OPPs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
---
Changes in v3:
- Clarified the commitmsg.
Changes in v2:
- Clarified the commitmsg.
- Addressed some comments from Viresh.
- Drop calls to _add_opp_dev() for required_devs.
---
drivers/opp/core.c | 56 ++++++++++++++++++----------------------------
1 file changed, 22 insertions(+), 34 deletions(-)
diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 5f4598246a87..494f8860220d 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -1061,6 +1061,27 @@ static int _set_opp_bw(const struct opp_table *opp_table,
return 0;
}
+static int _set_opp_level(struct device *dev, struct dev_pm_opp *opp)
+{
+ unsigned int level = 0;
+ int ret = 0;
+
+ if (opp) {
+ if (opp->level == OPP_LEVEL_UNSET)
+ return 0;
+
+ level = opp->level;
+ }
+
+ /* Request a new performance state through the device's PM domain. */
+ ret = dev_pm_domain_set_performance_state(dev, level);
+ if (ret)
+ dev_err(dev, "Failed to set performance state %u (%d)\n", level,
+ ret);
+
+ return ret;
+}
+
/* This is only called for PM domain for now */
static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
struct dev_pm_opp *opp, bool up)
@@ -1091,7 +1112,7 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
if (devs[index]) {
required_opp = opp ? opp->required_opps[index] : NULL;
- ret = dev_pm_opp_set_opp(devs[index], required_opp);
+ ret = _set_opp_level(devs[index], required_opp);
if (ret)
return ret;
}
@@ -1102,27 +1123,6 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
return 0;
}
-static int _set_opp_level(struct device *dev, struct dev_pm_opp *opp)
-{
- unsigned int level = 0;
- int ret = 0;
-
- if (opp) {
- if (opp->level == OPP_LEVEL_UNSET)
- return 0;
-
- level = opp->level;
- }
-
- /* Request a new performance state through the device's PM domain. */
- ret = dev_pm_domain_set_performance_state(dev, level);
- if (ret)
- dev_err(dev, "Failed to set performance state %u (%d)\n", level,
- ret);
-
- return ret;
-}
-
static void _find_current_opp(struct device *dev, struct opp_table *opp_table)
{
struct dev_pm_opp *opp = ERR_PTR(-ENODEV);
@@ -2457,18 +2457,6 @@ static int _opp_attach_genpd(struct opp_table *opp_table, struct device *dev,
}
}
- /*
- * Add the virtual genpd device as a user of the OPP table, so
- * we can call dev_pm_opp_set_opp() on it directly.
- *
- * This will be automatically removed when the OPP table is
- * removed, don't need to handle that here.
- */
- if (!_add_opp_dev(virt_dev, opp_table->required_opp_tables[index])) {
- ret = -ENOMEM;
- goto err;
- }
-
opp_table->required_devs[index] = virt_dev;
index++;
name++;
--
2.34.1
Binder objects are processed and copied individually into the target
buffer during transactions. Any raw data in-between these objects is
copied as well. However, this raw data copy lacks an out-of-bounds
check. If the raw data exceeds the data section size then the copy
overwrites the offsets section. This eventually triggers an error that
attempts to unwind the processed objects. However, at this point the
offsets used to index these objects are now corrupted.
Unwinding with corrupted offsets can result in decrements of arbitrary
nodes and lead to their premature release. Other users of such nodes are
left with a dangling pointer triggering a use-after-free. This issue is
made evident by the following KASAN report (trimmed):
==================================================================
BUG: KASAN: slab-use-after-free in _raw_spin_lock+0xe4/0x19c
Write of size 4 at addr ffff47fc91598f04 by task binder-util/743
CPU: 9 UID: 0 PID: 743 Comm: binder-util Not tainted 6.11.0-rc4 #1
Hardware name: linux,dummy-virt (DT)
Call trace:
_raw_spin_lock+0xe4/0x19c
binder_free_buf+0x128/0x434
binder_thread_write+0x8a4/0x3260
binder_ioctl+0x18f0/0x258c
[...]
Allocated by task 743:
__kmalloc_cache_noprof+0x110/0x270
binder_new_node+0x50/0x700
binder_transaction+0x413c/0x6da8
binder_thread_write+0x978/0x3260
binder_ioctl+0x18f0/0x258c
[...]
Freed by task 745:
kfree+0xbc/0x208
binder_thread_read+0x1c5c/0x37d4
binder_ioctl+0x16d8/0x258c
[...]
==================================================================
To avoid this issue, let's check that the raw data copy is within the
boundaries of the data section.
Fixes: 6d98eb95b450 ("binder: avoid potential data leakage when copying txn")
Cc: Todd Kjos <tkjos(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
---
drivers/android/binder.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 905290c98c3c..e8643c69d426 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -3422,6 +3422,7 @@ static void binder_transaction(struct binder_proc *proc,
*/
copy_size = object_offset - user_offset;
if (copy_size && (user_offset > object_offset ||
+ object_offset > tr->data_size ||
binder_alloc_copy_user_to_buffer(
&target_proc->alloc,
t->buffer, user_offset,
--
2.46.0.295.g3b9ea8a38a-goog
The patch titled
Subject: revert-mm-skip-cma-pages-when-they-are-not-available-update
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
revert-mm-skip-cma-pages-when-they-are-not-available-update.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Usama Arif <usamaarif642(a)gmail.com>
Subject: revert-mm-skip-cma-pages-when-they-are-not-available-update
Date: Wed, 21 Aug 2024 20:26:07 +0100
also revert b7108d66318a ("Multi-gen LRU: skip CMA pages when they are not
eligible"), per Johannes
Link: https://lkml.kernel.org/r/9060a32d-b2d7-48c0-8626-1db535653c54@gmail.com
Link: https://lkml.kernel.org/r/357ac325-4c61-497a-92a3-bdbd230d5ec9@gmail.com
Fixes: 5da226dbfce3 ("mm: skip CMA pages when they are not available")
Signed-off-by: Usama Arif <usamaarif642(a)gmail.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Bharata B Rao <bharata(a)amd.com>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Zhaoyang Huang <huangzhaoyang(a)gmail.com>
Cc: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 21 +--------------------
1 file changed, 1 insertion(+), 20 deletions(-)
--- a/mm/vmscan.c~revert-mm-skip-cma-pages-when-they-are-not-available-update
+++ a/mm/vmscan.c
@@ -4253,25 +4253,6 @@ void lru_gen_soft_reclaim(struct mem_cgr
#endif /* CONFIG_MEMCG */
-#ifdef CONFIG_CMA
-/*
- * It is waste of effort to scan and reclaim CMA pages if it is not available
- * for current allocation context. Kswapd can not be enrolled as it can not
- * distinguish this scenario by using sc->gfp_mask = GFP_KERNEL
- */
-static bool skip_cma(struct folio *folio, struct scan_control *sc)
-{
- return !current_is_kswapd() &&
- gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE &&
- folio_migratetype(folio) == MIGRATE_CMA;
-}
-#else
-static bool skip_cma(struct folio *folio, struct scan_control *sc)
-{
- return false;
-}
-#endif
-
/******************************************************************************
* the eviction
******************************************************************************/
@@ -4319,7 +4300,7 @@ static bool sort_folio(struct lruvec *lr
}
/* ineligible */
- if (zone > sc->reclaim_idx || skip_cma(folio, sc)) {
+ if (zone > sc->reclaim_idx) {
gen = folio_inc_gen(lruvec, folio, false);
list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
return true;
_
Patches currently in -mm which might be from usamaarif642(a)gmail.com are
revert-mm-skip-cma-pages-when-they-are-not-available.patch
revert-mm-skip-cma-pages-when-they-are-not-available-update.patch
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 61119394631f219e23ce98bcc3eb993a64a8ea64
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081917-flanked-clear-e564@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
61119394631f ("riscv: entry: always initialize regs->a0 to -ENOSYS")
05d450aabd73 ("riscv: Support RANDOMIZE_KSTACK_OFFSET")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 61119394631f219e23ce98bcc3eb993a64a8ea64 Mon Sep 17 00:00:00 2001
From: Celeste Liu <coelacanthushex(a)gmail.com>
Date: Thu, 27 Jun 2024 22:23:39 +0800
Subject: [PATCH] riscv: entry: always initialize regs->a0 to -ENOSYS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Otherwise when the tracer changes syscall number to -1, the kernel fails
to initialize a0 with -ENOSYS and subsequently fails to return the error
code of the failed syscall to userspace. For example, it will break
strace syscall tampering.
Fixes: 52449c17bdd1 ("riscv: entry: set a0 = -ENOSYS only when syscall != -1")
Reported-by: "Dmitry V. Levin" <ldv(a)strace.io>
Reviewed-by: Björn Töpel <bjorn(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Celeste Liu <CoelacanthusHex(a)gmail.com>
Link: https://lore.kernel.org/r/20240627142338.5114-2-CoelacanthusHex@gmail.com
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 05a16b1f0aee..51ebfd23e007 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -319,6 +319,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
regs->epc += 4;
regs->orig_a0 = regs->a0;
+ regs->a0 = -ENOSYS;
riscv_v_vstate_discard(regs);
@@ -328,8 +329,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
if (syscall >= 0 && syscall < NR_syscalls)
syscall_handler(regs, syscall);
- else if (syscall != -1)
- regs->a0 = -ENOSYS;
+
/*
* Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
* so the maximum stack offset is 1k bytes (10 bits).
File-scope "__pmic_glink_lock" mutex protects the filke-scope
"__pmic_glink", thus reference to it should be obtained under the lock,
just like pmic_glink_rpmsg_remove() is doing. Otherwise we have a race
during if PMIC GLINK device removal: the pmic_glink_rpmsg_probe()
function could store local reference before mutex in driver removal is
acquired.
Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
drivers/soc/qcom/pmic_glink.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c
index 9606222993fd..452f30a9354d 100644
--- a/drivers/soc/qcom/pmic_glink.c
+++ b/drivers/soc/qcom/pmic_glink.c
@@ -217,10 +217,11 @@ static void pmic_glink_pdr_callback(int state, char *svc_path, void *priv)
static int pmic_glink_rpmsg_probe(struct rpmsg_device *rpdev)
{
- struct pmic_glink *pg = __pmic_glink;
+ struct pmic_glink *pg;
int ret = 0;
mutex_lock(&__pmic_glink_lock);
+ pg = __pmic_glink;
if (!pg) {
ret = dev_err_probe(&rpdev->dev, -ENODEV, "no pmic_glink device to attach to\n");
goto out_unlock;
--
2.43.0
https://syzkaller.appspot.com/bug?extid=d98fd19acd08b36ff422
[PATCH v3 1/2] bfs: prevent null pointer dereference in bfs_move_block()
v3: Changed the error handling
[PATCH v3 2/2] bfs: ensure buffer is marked uptodate before marking it dirty
v3: Replaced the buffer up-to-date check with an error exit by forcefully
setting the buffer as up-to-date before call mark_buffer_dirty()
From: Haitao Shan <hshan(a)google.com>
When running android emulator (which is based on QEMU 2.12) on
certain Intel hosts with kernel version 6.3-rc1 or above, guest
will freeze after loading a snapshot. This is almost 100%
reproducible. By default, the android emulator will use snapshot
to speed up the next launching of the same android guest. So
this breaks the android emulator badly.
I tested QEMU 8.0.4 from Debian 12 with an Ubuntu 22.04 guest by
running command "loadvm" after "savevm". The same issue is
observed. At the same time, none of our AMD platforms is impacted.
More experiments show that loading the KVM module with
"enable_apicv=false" can workaround it.
The issue started to show up after commit 8e6ed96cdd50 ("KVM: x86:
fire timer when it is migrated and expired, and in oneshot mode").
However, as is pointed out by Sean Christopherson, it is introduced
by commit 967235d32032 ("KVM: vmx: clear pending interrupts on
KVM_SET_LAPIC"). commit 8e6ed96cdd50 ("KVM: x86: fire timer when
it is migrated and expired, and in oneshot mode") just makes it
easier to hit the issue.
Having both commits, the oneshot lapic timer gets fired immediately
inside the KVM_SET_LAPIC call when loading the snapshot. On Intel
platforms with APIC virtualization and posted interrupt processing,
this eventually leads to setting the corresponding PIR bit. However,
the whole PIR bits get cleared later in the same KVM_SET_LAPIC call
by apicv_post_state_restore. This leads to timer interrupt lost.
The fix is to move vmx_apicv_post_state_restore to the beginning of
the KVM_SET_LAPIC call and rename to vmx_apicv_pre_state_restore.
What vmx_apicv_post_state_restore does is actually clearing any
former apicv state and this behavior is more suitable to carry out
in the beginning.
Fixes: 967235d32032 ("KVM: vmx: clear pending interrupts on KVM_SET_LAPIC")
Cc: stable(a)vger.kernel.org
Suggested-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Haitao Shan <hshan(a)google.com>
Link: https://lore.kernel.org/r/20230913000215.478387-1-hshan@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: David Hunter <david.hunter.linux(a)gmail.com>
---
arch/x86/kvm/vmx/vmx.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 87abf4eebf8a..4040075bbd5a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8203,6 +8203,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.apicv_pre_state_restore = vmx_apicv_pre_state_restore,
.check_apicv_inhibit_reasons = vmx_check_apicv_inhibit_reasons,
+ .required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
.hwapic_irr_update = vmx_hwapic_irr_update,
.hwapic_isr_update = vmx_hwapic_isr_update,
.guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
--
2.43.0
In aperture_remove_conflicting_pci_devices(), we currently only
call sysfb_disable() on vga class devices. This leads to the
following problem when the pimary device is not VGA compatible:
1. A PCI device with a non-VGA class is the boot display
2. That device is probed first and it is not a VGA device so
sysfb_disable() is not called, but the device resources
are freed by aperture_detach_platform_device()
3. Non-primary GPU has a VGA class and it ends up calling sysfb_disable()
4. NULL pointer dereference via sysfb_disable() since the resources
have already been freed by aperture_detach_platform_device() when
it was called by the other device.
Fix this by passing a device pointer to sysfb_disable() and checking
the device to determine if we should execute it or not.
v2: Fix build when CONFIG_SCREEN_INFO is not set
v3: Move device check into the mutex
Drop primary variable in aperture_remove_conflicting_pci_devices()
Drop __init on pci sysfb_pci_dev_is_enabled()
Fixes: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device")
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
---
drivers/firmware/sysfb.c | 19 +++++++++++++------
drivers/of/platform.c | 2 +-
drivers/video/aperture.c | 11 +++--------
include/linux/sysfb.h | 4 ++--
4 files changed, 19 insertions(+), 17 deletions(-)
diff --git a/drivers/firmware/sysfb.c b/drivers/firmware/sysfb.c
index 880ffcb50088..ac4680dc463f 100644
--- a/drivers/firmware/sysfb.c
+++ b/drivers/firmware/sysfb.c
@@ -39,6 +39,8 @@ static struct platform_device *pd;
static DEFINE_MUTEX(disable_lock);
static bool disabled;
+static struct device *sysfb_parent_dev(const struct screen_info *si);
+
static bool sysfb_unregister(void)
{
if (IS_ERR_OR_NULL(pd))
@@ -52,6 +54,7 @@ static bool sysfb_unregister(void)
/**
* sysfb_disable() - disable the Generic System Framebuffers support
+ * @dev: the device to check if non-NULL
*
* This disables the registration of system framebuffer devices that match the
* generic drivers that make use of the system framebuffer set up by firmware.
@@ -61,17 +64,21 @@ static bool sysfb_unregister(void)
* Context: The function can sleep. A @disable_lock mutex is acquired to serialize
* against sysfb_init(), that registers a system framebuffer device.
*/
-void sysfb_disable(void)
+void sysfb_disable(struct device *dev)
{
+ struct screen_info *si = &screen_info;
+
mutex_lock(&disable_lock);
- sysfb_unregister();
- disabled = true;
+ if (!dev || dev == sysfb_parent_dev(si)) {
+ sysfb_unregister();
+ disabled = true;
+ }
mutex_unlock(&disable_lock);
}
EXPORT_SYMBOL_GPL(sysfb_disable);
#if defined(CONFIG_PCI)
-static __init bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
+static bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
{
/*
* TODO: Try to integrate this code into the PCI subsystem
@@ -87,13 +94,13 @@ static __init bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
return true;
}
#else
-static __init bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
+static bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
{
return false;
}
#endif
-static __init struct device *sysfb_parent_dev(const struct screen_info *si)
+static struct device *sysfb_parent_dev(const struct screen_info *si)
{
struct pci_dev *pdev;
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 389d4ea6bfc1..ef622d41eb5b 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -592,7 +592,7 @@ static int __init of_platform_default_populate_init(void)
* This can happen for example on DT systems that do EFI
* booting and may provide a GOP handle to the EFI stub.
*/
- sysfb_disable();
+ sysfb_disable(NULL);
of_platform_device_create(node, NULL, NULL);
of_node_put(node);
}
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 561be8feca96..2b5a1e666e9b 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -293,7 +293,7 @@ int aperture_remove_conflicting_devices(resource_size_t base, resource_size_t si
* ask for this, so let's assume that a real driver for the display
* was already probed and prevent sysfb to register devices later.
*/
- sysfb_disable();
+ sysfb_disable(NULL);
aperture_detach_devices(base, size);
@@ -346,15 +346,10 @@ EXPORT_SYMBOL(__aperture_remove_legacy_vga_devices);
*/
int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *name)
{
- bool primary = false;
resource_size_t base, size;
int bar, ret = 0;
- if (pdev == vga_default_device())
- primary = true;
-
- if (primary)
- sysfb_disable();
+ sysfb_disable(&pdev->dev);
for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
@@ -370,7 +365,7 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
* that consumes the VGA framebuffer I/O range. Remove this
* device as well.
*/
- if (primary)
+ if (pdev == vga_default_device())
ret = __aperture_remove_legacy_vga_devices(pdev);
return ret;
diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h
index c9cb657dad08..bef5f06a91de 100644
--- a/include/linux/sysfb.h
+++ b/include/linux/sysfb.h
@@ -58,11 +58,11 @@ struct efifb_dmi_info {
#ifdef CONFIG_SYSFB
-void sysfb_disable(void);
+void sysfb_disable(struct device *dev);
#else /* CONFIG_SYSFB */
-static inline void sysfb_disable(void)
+static inline void sysfb_disable(struct device *dev)
{
}
--
2.46.0
Stop Endpoint command on LINK TRB with TC bit set to 1 causes that
internal cycle bit can have incorrect state after command complete.
In consequence empty transfer ring can be incorrectly detected
when EP is resumed.
NOP TRB before LINK TRB avoid such scenario. Stop Endpoint command
is then on NOP TRB and internal cycle bit is not changed and have
correct value.
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
cc: <stable(a)vger.kernel.org>
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
---
drivers/usb/cdns3/cdnsp-gadget.h | 3 +++
drivers/usb/cdns3/cdnsp-ring.c | 28 ++++++++++++++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/drivers/usb/cdns3/cdnsp-gadget.h b/drivers/usb/cdns3/cdnsp-gadget.h
index e1b5801fdddf..9a5577a772af 100644
--- a/drivers/usb/cdns3/cdnsp-gadget.h
+++ b/drivers/usb/cdns3/cdnsp-gadget.h
@@ -811,6 +811,7 @@ struct cdnsp_stream_info {
* generate Missed Service Error Event.
* Set skip flag when receive a Missed Service Error Event and
* process the missed tds on the endpoint ring.
+ * @wa1_nop_trb: hold pointer to NOP trb.
*/
struct cdnsp_ep {
struct usb_ep endpoint;
@@ -838,6 +839,8 @@ struct cdnsp_ep {
#define EP_UNCONFIGURED BIT(7)
bool skip;
+ union cdnsp_trb *wa1_nop_trb;
+
};
/**
diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c
index 275a6a2fa671..75724e60653c 100644
--- a/drivers/usb/cdns3/cdnsp-ring.c
+++ b/drivers/usb/cdns3/cdnsp-ring.c
@@ -1904,6 +1904,23 @@ int cdnsp_queue_bulk_tx(struct cdnsp_device *pdev, struct cdnsp_request *preq)
if (ret)
return ret;
+ /*
+ * workaround 1: STOP EP command on LINK TRB with TC bit set to 1
+ * causes that internal cycle bit can have incorrect state after
+ * command complete. In consequence empty transfer ring can be
+ * incorrectly detected when EP is resumed.
+ * NOP TRB before LINK TRB avoid such scenario. STOP EP command is
+ * then on NOP TRB and internal cycle bit is not changed and have
+ * correct value.
+ */
+ if (pep->wa1_nop_trb) {
+ field = le32_to_cpu(pep->wa1_nop_trb->trans_event.flags);
+ field ^= TRB_CYCLE;
+
+ pep->wa1_nop_trb->trans_event.flags = cpu_to_le32(field);
+ pep->wa1_nop_trb = NULL;
+ }
+
/*
* Don't give the first TRB to the hardware (by toggling the cycle bit)
* until we've finished creating all the other TRBs. The ring's cycle
@@ -1999,6 +2016,17 @@ int cdnsp_queue_bulk_tx(struct cdnsp_device *pdev, struct cdnsp_request *preq)
send_addr = addr;
}
+ if (cdnsp_trb_is_link(ring->enqueue + 1)) {
+ field = TRB_TYPE(TRB_TR_NOOP) | TRB_IOC;
+ if (!ring->cycle_state)
+ field |= TRB_CYCLE;
+
+ pep->wa1_nop_trb = ring->enqueue;
+
+ cdnsp_queue_trb(pdev, ring, 0, 0x0, 0x0,
+ TRB_INTR_TARGET(0), field);
+ }
+
cdnsp_check_trb_math(preq, enqd_len);
ret = cdnsp_giveback_first_trb(pdev, pep, preq->request.stream_id,
start_cycle, start_trb);
--
2.43.0
The MAC only has add the TX delay and it can not be modified.
MAC and PHY are both set the TX delay cause transmission problems.
So just disable TX delay in PHY, when use rgmii to attach to
external phy, set PHY_INTERFACE_MODE_RGMII_RXID to phy drivers.
And it is does not matter to internal phy.
Fixes: bc2426d74aa3 ("net: ngbe: convert phylib to phylink")
Signed-off-by: Mengyuan Lou <mengyuanlou(a)net-swift.com>
Cc: stable(a)vger.kernel.org # 6.3+
---
v3:
-Rebase the fix commit for net.
v2:
-Add a comment for the code modification.
-Add the problem in commit messages.
https://lore.kernel.org/netdev/E9C427FDDCF0CBC3+20240812103025.42417-1-meng…
v1:
https://lore.kernel.org/netdev/C1587837D62D1BC0+20240806082520.29193-1-meng…
drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c b/drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c
index ec54b18c5fe7..a5e9b779c44d 100644
--- a/drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c
+++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c
@@ -124,8 +124,12 @@ static int ngbe_phylink_init(struct wx *wx)
MAC_SYM_PAUSE | MAC_ASYM_PAUSE;
config->mac_managed_pm = true;
- phy_mode = PHY_INTERFACE_MODE_RGMII_ID;
- __set_bit(PHY_INTERFACE_MODE_RGMII_ID, config->supported_interfaces);
+ /* The MAC only has add the Tx delay and it can not be modified.
+ * So just disable TX delay in PHY, and it is does not matter to
+ * internal phy.
+ */
+ phy_mode = PHY_INTERFACE_MODE_RGMII_RXID;
+ __set_bit(PHY_INTERFACE_MODE_RGMII_RXID, config->supported_interfaces);
phylink = phylink_create(config, NULL, phy_mode, &ngbe_mac_ops);
if (IS_ERR(phylink))
--
2.43.2
On a system with a GICv3, if a guest hasn't been configured with
GICv3 and that the host is not capable of GICv2 emulation,
a write to any of the ICC_*SGI*_EL1 registers is trapped to EL2.
We therefore try to emulate the SGI access, only to hit a NULL
pointer as no private interrupt is allocated (no GIC, remember?).
The obvious fix is to give the guest what it deserves, in the
shape of a UNDEF exception.
Reported-by: Alexander Potapenko <glider(a)google.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
arch/arm64/kvm/sys_regs.c | 6 ++++++
arch/arm64/kvm/vgic/vgic.h | 7 +++++++
2 files changed, 13 insertions(+)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index c90324060436..31e49da867ff 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -33,6 +33,7 @@
#include <trace/events/kvm.h>
#include "sys_regs.h"
+#include "vgic/vgic.h"
#include "trace.h"
@@ -435,6 +436,11 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
{
bool g1;
+ if (!kvm_has_gicv3(vcpu->kvm)) {
+ kvm_inject_undefined(vcpu);
+ return false;
+ }
+
if (!p->is_write)
return read_from_write_only(vcpu, p, r);
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index ba8f790431bd..8532bfe3fed4 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -346,4 +346,11 @@ void vgic_v4_configure_vsgis(struct kvm *kvm);
void vgic_v4_get_vlpi_state(struct vgic_irq *irq, bool *val);
int vgic_v4_request_vpe_irq(struct kvm_vcpu *vcpu, int irq);
+static inline bool kvm_has_gicv3(struct kvm *kvm)
+{
+ return (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif) &&
+ irqchip_in_kernel(kvm) &&
+ kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3);
+}
+
#endif
--
2.39.2
Most firmware names are hardcoded strings, or are constructed from fairly
constrained format strings where the dynamic parts are just some hex
numbers or such.
However, there are a couple codepaths in the kernel where firmware file
names contain string components that are passed through from a device or
semi-privileged userspace; the ones I could find (not counting interfaces
that require root privileges) are:
- lpfc_sli4_request_firmware_update() seems to construct the firmware
filename from "ModelName", a string that was previously parsed out of
some descriptor ("Vital Product Data") in lpfc_fill_vpd()
- nfp_net_fw_find() seems to construct a firmware filename from a model
name coming from nfp_hwinfo_lookup(pf->hwinfo, "nffw.partno"), which I
think parses some descriptor that was read from the device.
(But this case likely isn't exploitable because the format string looks
like "netronome/nic_%s", and there shouldn't be any *folders* starting
with "netronome/nic_". The previous case was different because there,
the "%s" is *at the start* of the format string.)
- module_flash_fw_schedule() is reachable from the
ETHTOOL_MSG_MODULE_FW_FLASH_ACT netlink command, which is marked as
GENL_UNS_ADMIN_PERM (meaning CAP_NET_ADMIN inside a user namespace is
enough to pass the privilege check), and takes a userspace-provided
firmware name.
(But I think to reach this case, you need to have CAP_NET_ADMIN over a
network namespace that a special kind of ethernet device is mapped into,
so I think this is not a viable attack path in practice.)
For what it's worth, I went looking and haven't found any USB device
drivers that use the firmware loader dangerously.
Cc: stable(a)vger.kernel.org
Fixes: abb139e75c2c ("firmware: teach the kernel to load firmware files directly from the filesystem")
Signed-off-by: Jann Horn <jannh(a)google.com>
---
I wasn't sure whether to mark this one for stable or not - but I think
since there seems to be at least one PCI device model which could
trigger firmware loading with directory traversal, we should probably
backport the fix?
---
drivers/base/firmware_loader/main.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
index a03ee4b11134..a32be64f3bf5 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -864,7 +864,15 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
if (!firmware_p)
return -EINVAL;
- if (!name || name[0] == '\0') {
+ /*
+ * Reject firmware file names with "/../" sequences in them.
+ * There are drivers that construct firmware file names from
+ * device-supplied strings, and we don't want some device to be able
+ * to tell us "I would like to be sent my firmware from
+ * ../../../etc/shadow, please".
+ */
+ if (!name || name[0] == '\0' ||
+ strstr(name, "/../") != NULL || strncmp(name, "../", 3) == 0) {
ret = -EINVAL;
goto out;
}
---
base-commit: b0da640826ba3b6506b4996a6b23a429235e6923
change-id: 20240820-firmware-traversal-6df8501b0fe4
--
Jann Horn <jannh(a)google.com>
Hi Greg,
I think commit 76cb763e6ea62e838ccc8f7a1ea4246d690fccc9 should be
applied to 6.10 kernel to disable Adaptive Backlight Management for
OLED displays.
Thanks,
Gergo Koteles
From: Chuck Lever <chuck.lever(a)oracle.com>
Following up on
https://lore.kernel.org/linux-nfs/d4b235df-4ee5-4824-9d48-e3b3c1f1f4d1@orac…
Here is a backport series targeting origin/linux-5.15.y that closes
the information leak described in the above thread.
Review comments welcome.
Chuck Lever (6):
NFSD: Refactor nfsd_reply_cache_free_locked()
NFSD: Rename nfsd_reply_cache_alloc()
NFSD: Replace nfsd_prune_bucket()
NFSD: Refactor the duplicate reply cache shrinker
NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
NFSD: Fix frame size warning in svc_export_parse()
Jeff Layton (2):
nfsd: move reply cache initialization into nfsd startup
nfsd: move init of percpu reply_cache_stats counters back to
nfsd_init_net
Josef Bacik (10):
sunrpc: don't change ->sv_stats if it doesn't exist
nfsd: stop setting ->pg_stats for unused stats
sunrpc: pass in the sv_stats struct through svc_create_pooled
sunrpc: remove ->pg_stats from svc_program
sunrpc: use the struct net as the svc proc private
nfsd: rename NFSD_NET_* to NFSD_STATS_*
nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
nfsd: make all of the nfsd stats per-network namespace
nfsd: remove nfsd_stats, make th_cnt a global counter
nfsd: make svc_stat per-network namespace instead of global
fs/lockd/svc.c | 3 -
fs/nfs/callback.c | 3 -
fs/nfsd/export.c | 32 ++++--
fs/nfsd/export.h | 4 +-
fs/nfsd/netns.h | 25 ++++-
fs/nfsd/nfs4proc.c | 6 +-
fs/nfsd/nfscache.c | 202 ++++++++++++++++++++++---------------
fs/nfsd/nfsctl.c | 24 ++---
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsfh.c | 3 +-
fs/nfsd/nfssvc.c | 24 +++--
fs/nfsd/stats.c | 52 ++++------
fs/nfsd/stats.h | 83 ++++++---------
fs/nfsd/trace.h | 22 ++++
fs/nfsd/vfs.c | 6 +-
include/linux/sunrpc/svc.h | 5 +-
net/sunrpc/stats.c | 2 +-
net/sunrpc/svc.c | 36 ++++---
18 files changed, 302 insertions(+), 231 deletions(-)
--
2.45.2
The patch titled
Subject: padata: honor the caller's alignment in case of chunk_size 0
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
padata-honor-the-callers-alignment-in-case-of-chunk_size-0.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kamlesh Gurudasani <kamlesh(a)ti.com>
Subject: padata: honor the caller's alignment in case of chunk_size 0
Date: Thu, 22 Aug 2024 02:32:52 +0530
In the case where we are forcing the ps.chunk_size to be at least 1, we
are ignoring the caller's alignment.
Move the forcing of ps.chunk_size to be at least 1 before rounding it up
to caller's alignment, so that caller's alignment is honored.
While at it, use max() to force the ps.chunk_size to be at least 1 to
improve readability.
Link: https://lkml.kernel.org/r/20240822-max-v1-1-cb4bc5b1c101@ti.com
Fixes: 6d45e1c948a8 ("padata: Fix possible divide-by-0 panic in padata_mt_helper()")
Signed-off-by: Kamlesh Gurudasani <kamlesh(a)ti.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: Waiman Long <longman(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/padata.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
--- a/kernel/padata.c~padata-honor-the-callers-alignment-in-case-of-chunk_size-0
+++ a/kernel/padata.c
@@ -509,21 +509,17 @@ void __init padata_do_multithreaded(stru
/*
* Chunk size is the amount of work a helper does per call to the
- * thread function. Load balance large jobs between threads by
+ * thread function. Load balance large jobs between threads by
* increasing the number of chunks, guarantee at least the minimum
* chunk size from the caller, and honor the caller's alignment.
+ * Ensure chunk_size is at least 1 to prevent divide-by-0
+ * panic in padata_mt_helper().
*/
ps.chunk_size = job->size / (ps.nworks * load_balance_factor);
ps.chunk_size = max(ps.chunk_size, job->min_chunk);
+ ps.chunk_size = max(ps.chunk_size, 1ul);
ps.chunk_size = roundup(ps.chunk_size, job->align);
- /*
- * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
- * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().`
- */
- if (!ps.chunk_size)
- ps.chunk_size = 1U;
-
list_for_each_entry(pw, &works, pw_list)
if (job->numa_aware) {
int old_node = atomic_read(&last_used_nid);
_
Patches currently in -mm which might be from kamlesh(a)ti.com are
padata-honor-the-callers-alignment-in-case-of-chunk_size-0.patch
The patch titled
Subject: Revert "mm: skip CMA pages when they are not available"
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
revert-mm-skip-cma-pages-when-they-are-not-available.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Usama Arif <usamaarif642(a)gmail.com>
Subject: Revert "mm: skip CMA pages when they are not available"
Date: Wed, 21 Aug 2024 20:26:07 +0100
This reverts commit 5da226dbfce3a2f44978c2c7cf88166e69a6788b.
lruvec->lru_lock is highly contended and is held when calling
isolate_lru_folios. If the lru has a large number of CMA folios
consecutively, while the allocation type requested is not MIGRATE_MOVABLE,
isolate_lru_folios can hold the lock for a very long time while it skips
those. For FIO workload, ~150million order=0 folios were skipped to
isolate a few ZONE_DMA folios [1]. This can cause lockups [1] and high
memory pressure for extended periods of time [2].
[1] https://lore.kernel.org/all/CAOUHufbkhMZYz20aM_3rHZ3OcK4m2puji2FGpUpn_-DevG…
[2] https://lore.kernel.org/all/ZrssOrcJIDy8hacI@gmail.com/
Link: https://lkml.kernel.org/r/9060a32d-b2d7-48c0-8626-1db535653c54@gmail.com
Fixes: 5da226dbfce3 ("mm: skip CMA pages when they are not available")
Signed-off-by: Usama Arif <usamaarif642(a)gmail.com>
Cc: Bharata B Rao <bharata(a)amd.com>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Zhaoyang Huang <huangzhaoyang(a)gmail.com>
Cc: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 41 ++++++++++++++++++++---------------------
1 file changed, 20 insertions(+), 21 deletions(-)
--- a/mm/vmscan.c~revert-mm-skip-cma-pages-when-they-are-not-available
+++ a/mm/vmscan.c
@@ -1604,25 +1604,6 @@ static __always_inline void update_lru_s
}
-#ifdef CONFIG_CMA
-/*
- * It is waste of effort to scan and reclaim CMA pages if it is not available
- * for current allocation context. Kswapd can not be enrolled as it can not
- * distinguish this scenario by using sc->gfp_mask = GFP_KERNEL
- */
-static bool skip_cma(struct folio *folio, struct scan_control *sc)
-{
- return !current_is_kswapd() &&
- gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE &&
- folio_migratetype(folio) == MIGRATE_CMA;
-}
-#else
-static bool skip_cma(struct folio *folio, struct scan_control *sc)
-{
- return false;
-}
-#endif
-
/*
* Isolating page from the lruvec to fill in @dst list by nr_to_scan times.
*
@@ -1669,8 +1650,7 @@ static unsigned long isolate_lru_folios(
nr_pages = folio_nr_pages(folio);
total_scan += nr_pages;
- if (folio_zonenum(folio) > sc->reclaim_idx ||
- skip_cma(folio, sc)) {
+ if (folio_zonenum(folio) > sc->reclaim_idx) {
nr_skipped[folio_zonenum(folio)] += nr_pages;
move_to = &folios_skipped;
goto move;
@@ -4273,6 +4253,25 @@ void lru_gen_soft_reclaim(struct mem_cgr
#endif /* CONFIG_MEMCG */
+#ifdef CONFIG_CMA
+/*
+ * It is waste of effort to scan and reclaim CMA pages if it is not available
+ * for current allocation context. Kswapd can not be enrolled as it can not
+ * distinguish this scenario by using sc->gfp_mask = GFP_KERNEL
+ */
+static bool skip_cma(struct folio *folio, struct scan_control *sc)
+{
+ return !current_is_kswapd() &&
+ gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE &&
+ folio_migratetype(folio) == MIGRATE_CMA;
+}
+#else
+static bool skip_cma(struct folio *folio, struct scan_control *sc)
+{
+ return false;
+}
+#endif
+
/******************************************************************************
* the eviction
******************************************************************************/
_
Patches currently in -mm which might be from usamaarif642(a)gmail.com are
revert-mm-skip-cma-pages-when-they-are-not-available.patch
[Public]
Thank you, Jiri, for your feedback.
I've dropped this patch from DC v.3.2.297.
We will follow-up on this separately and merge it after you do confirm the issue you reported is fixed.
Thanks,
Roman
> -----Original Message-----
> From: Jiri Slaby <jirislaby(a)kernel.org>
> Sent: Monday, August 19, 2024 4:37 AM
> To: Li, Roman <Roman.Li(a)amd.com>; amd-gfx(a)lists.freedesktop.org
> Cc: Wentland, Harry <Harry.Wentland(a)amd.com>; Li, Sun peng (Leo)
> <Sunpeng.Li(a)amd.com>; Siqueira, Rodrigo <Rodrigo.Siqueira(a)amd.com>;
> Pillai, Aurabindo <Aurabindo.Pillai(a)amd.com>; Lin, Wayne
> <Wayne.Lin(a)amd.com>; Gutierrez, Agustin <Agustin.Gutierrez(a)amd.com>;
> Chung, ChiaHsuan (Tom) <ChiaHsuan.Chung(a)amd.com>; Zuo, Jerry
> <Jerry.Zuo(a)amd.com>; Mohamed, Zaeem <Zaeem.Mohamed(a)amd.com>
> Subject: Re: RE: [PATCH 12/13] drm/amd/display: Fix a typo in revert commit
>
> On 16. 08. 24, 21:30, Li, Roman wrote:
> > [Public]
> >
> > Wiil update commit message as:
> >
> > -------------
> > drm/amd/display: Fix MST DSC lightup
> >
> > [Why]
> > Secondary monitor does not come up due to MST DSC bw calculation
> regression.
>
> This patch is only related to this. It does not fix that issue on its own at all.
>
> > [How]
> > Fix bug in try_disable_dsc()
>
> This update is worse than the original, IMO.
>
> Could you write saner commit logs in the whole amdgpu overall?
>
> If you insist on those [why] and [how] parts, something like:
> """
> [Why]
> The linked commit below misreverted one hunk in try_disable_dsc().
>
> [How]
> Fix that by using proper (original) 'max_kbps' instead of bogus 'stream_kbps'.
> ""
>
> > Fixes: 4b6564cb120c ("drm/amd/display: Fix MST BW calculation
> > Regression")
> >
> > Cc: mario.limonciello(a)amd.com
> > Cc: alexander.deucher(a)amd.com
> > Cc: stable(a)vger.kernel.org
> > Reported-by: jirislaby(a)kernel.org
>
> Care to fix up your machinery so that listed people are really CCed? I received a
> copy of neither the original (4b6564cb120c), nor this one.
>
> Nor any mentions in the linked #3495 at all.
>
> I would have told you that 4b6564cb120c is bogus. Immediately when it hit
> me as it differs from our (SUSE) in-tree revert in exactly this hunk. If I have
> known about this in the first place...
>
> And you would have received a Tested-by if it had worked.
>
> Given all the above, amdgpu workflow appears to be very ill. Please fix it.
>
> > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3495
> > Closes: https://bugzilla.suse.com/show_bug.cgi?id=1228093
> > Reviewed-by: Roman Li <roman.li(a)amd.com>
> > Signed-off-by: Fangzhi Zuo <Jerry.Zuo(a)amd.com>
> > Signed-off-by: Roman Li <roman.li(a)amd.com>
> > Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
> >
> >
> >> -----Original Message-----
> >> From: Roman.Li(a)amd.com <Roman.Li(a)amd.com>
> >> Sent: Thursday, August 15, 2024 6:45 PM
> >> To: amd-gfx(a)lists.freedesktop.org
> >> Cc: Wentland, Harry <Harry.Wentland(a)amd.com>; Li, Sun peng (Leo)
> >> <Sunpeng.Li(a)amd.com>; Siqueira, Rodrigo <Rodrigo.Siqueira(a)amd.com>;
> >> Pillai, Aurabindo <Aurabindo.Pillai(a)amd.com>; Li, Roman
> >> <Roman.Li(a)amd.com>; Lin, Wayne <Wayne.Lin(a)amd.com>; Gutierrez,
> >> Agustin <Agustin.Gutierrez(a)amd.com>; Chung, ChiaHsuan (Tom)
> >> <ChiaHsuan.Chung(a)amd.com>; Zuo, Jerry <Jerry.Zuo(a)amd.com>;
> Mohamed,
> >> Zaeem <Zaeem.Mohamed(a)amd.com>; Zuo, Jerry <Jerry.Zuo(a)amd.com>
> >> Subject: [PATCH 12/13] drm/amd/display: Fix a typo in revert commit
> >>
> >> From: Fangzhi Zuo <Jerry.Zuo(a)amd.com>
> >>
> >> A typo is fixed for "drm/amd/display: Fix MST BW calculation Regression"
> >>
> >> Fixes: 4b6564cb120c ("drm/amd/display: Fix MST BW calculation
> >> Regression")
> >>
> >> Reviewed-by: Roman Li <roman.li(a)amd.com>
> >> Signed-off-by: Fangzhi Zuo <Jerry.Zuo(a)amd.com>
> >> Signed-off-by: Roman Li <roman.li(a)amd.com>
> >> ---
> >> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c |
> 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git
> >> a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> >> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> >> index 958fad0d5307..5e08ca700c3f 100644
> >> ---
> a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> >> +++
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> >> @@ -1066,7 +1066,7 @@ static int try_disable_dsc(struct
> >> drm_atomic_state *state,
> >> vars[next_index].dsc_enabled = false;
> >> vars[next_index].bpp_x16 = 0;
> >> } else {
> >> - vars[next_index].pbn =
> >> kbps_to_peak_pbn(params[next_index].bw_range.stream_kbps,
> >> fec_overhead_multiplier_x1000);
> >> + vars[next_index].pbn =
> >> kbps_to_peak_pbn(params[next_index].bw_range.max_kbps,
> >> fec_overhead_multiplier_x1000);
> >> ret = drm_dp_atomic_find_time_slots(state,
> >>
> >> params[next_index].port->mgr,
> >>
> >> params[next_index].port,
>
>
> thanks,
> --
> js
> suse labs
When gfs2_fill_super() fails, destroy_workqueue()
is called within gfs2_gl_hash_clear(), and the
subsequent code path calls destroy_workqueue()
on the same work queue again.
This issue can be fixed by setting the work
queue pointer to NULL after the first
destroy_workqueue() call and checking for
a NULL pointer before attempting to destroy
the work queue again.
Reported-by: syzbot+d34c2a269ed512c531b0(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d34c2a269ed512c531b0
Fixes: 30e388d57367 ("gfs2: Switch to a per-filesystem glock workqueue")
Cc: stable(a)vger.kernel.org
Signed-off-by: Julian Sun <sunjunchao2870(a)gmail.com>
---
fs/gfs2/glock.c | 1 +
fs/gfs2/ops_fstype.c | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 32991cb22023..5838039d78e3 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -2273,6 +2273,7 @@ void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
gfs2_free_dead_glocks(sdp);
glock_hash_walk(dump_glock_func, sdp);
destroy_workqueue(sdp->sd_glock_wq);
+ sdp->sd_glock_wq = NULL;
}
static const char *state2str(unsigned state)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 0561edd6cc86..5c0e1b24d6ec 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1308,7 +1308,8 @@ static int gfs2_fill_super(struct super_block *sb, struct fs_context *fc)
fail_delete_wq:
destroy_workqueue(sdp->sd_delete_wq);
fail_glock_wq:
- destroy_workqueue(sdp->sd_glock_wq);
+ if (sdp->sd_glock_wq)
+ destroy_workqueue(sdp->sd_glock_wq);
fail_free:
free_sbd(sdp);
sb->s_fs_info = NULL;
--
2.39.2
The existing code uses min_t(ssize_t, outarg.size, XATTR_LIST_MAX) when
parsing the FUSE daemon's response to a zero-length getxattr/listxattr
request.
On 32-bit kernels, where ssize_t and outarg.size are the same size, this is
wrong: The min_t() will pass through any size values that are negative when
interpreted as signed.
fuse_listxattr() will then return this userspace-supplied negative value,
which callers will treat as an error value.
This kind of bug pattern can lead to fairly bad security bugs because of
how error codes are used in the Linux kernel. If a caller were to convert
the numeric error into an error pointer, like so:
struct foo *func(...) {
int len = fuse_getxattr(..., NULL, 0);
if (len < 0)
return ERR_PTR(len);
...
}
then it would end up returning this userspace-supplied negative value cast
to a pointer - but the caller of this function wouldn't recognize it as an
error pointer (IS_ERR_VALUE() only detects values in the narrow range in
which legitimate errno values are), and so it would just be treated as a
kernel pointer.
I think there is at least one theoretical codepath where this could happen,
but that path would involve virtio-fs with submounts plus some weird
SELinux configuration, so I think it's probably not a concern in practice.
Cc: stable(a)vger.kernel.org
Fixes: 63401ccdb2ca ("fuse: limit xattr returned size")
Signed-off-by: Jann Horn <jannh(a)google.com>
---
fs/fuse/xattr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c
index 5b423fdbb13f..9f568d345c51 100644
--- a/fs/fuse/xattr.c
+++ b/fs/fuse/xattr.c
@@ -81,7 +81,7 @@ ssize_t fuse_getxattr(struct inode *inode, const char *name, void *value,
}
ret = fuse_simple_request(fm, &args);
if (!ret && !size)
- ret = min_t(ssize_t, outarg.size, XATTR_SIZE_MAX);
+ ret = min_t(size_t, outarg.size, XATTR_SIZE_MAX);
if (ret == -ENOSYS) {
fm->fc->no_getxattr = 1;
ret = -EOPNOTSUPP;
@@ -143,7 +143,7 @@ ssize_t fuse_listxattr(struct dentry *entry, char *list, size_t size)
}
ret = fuse_simple_request(fm, &args);
if (!ret && !size)
- ret = min_t(ssize_t, outarg.size, XATTR_LIST_MAX);
+ ret = min_t(size_t, outarg.size, XATTR_LIST_MAX);
if (ret > 0 && size)
ret = fuse_verify_xattr_list(list, ret);
if (ret == -ENOSYS) {
---
base-commit: b0da640826ba3b6506b4996a6b23a429235e6923
change-id: 20240819-fuse-oob-error-fix-664d082176d5
--
Jann Horn <jannh(a)google.com>
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 40b760cfd44566bca791c80e0720d70d75382b84
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081933-cheddar-oak-0777@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
40b760cfd445 ("mm/numa: no task_numa_fault() call if PTE is changed")
d2136d749d76 ("mm: support multi-size THP numa balancing")
6b0ed7b3c775 ("mm: factor out the numa mapping rebuilding into a new helper")
ec1778807a80 ("mm: mprotect: use a folio in change_pte_range()")
6695cf68b15c ("mm: memory: use a folio in do_numa_page()")
73eab3ca481e ("mm: migrate: convert migrate_misplaced_page() to migrate_misplaced_folio()")
2ac9e99f3b21 ("mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio()")
df57721f9a63 ("Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40b760cfd44566bca791c80e0720d70d75382b84 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy(a)nvidia.com>
Date: Fri, 9 Aug 2024 10:59:04 -0400
Subject: [PATCH] mm/numa: no task_numa_fault() call if PTE is changed
When handling a numa page fault, task_numa_fault() should be called by a
process that restores the page table of the faulted folio to avoid
duplicated stats counting. Commit b99a342d4f11 ("NUMA balancing: reduce
TLB flush via delaying mapping on hint page fault") restructured
do_numa_page() and did not avoid task_numa_fault() call in the second page
table check after a numa migration failure. Fix it by making all
!pte_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary
and lead to unexpected numa balancing results (It is hard to tell whether
the issue will cause positive or negative performance impact due to
duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-2-ziy@nvidia.com
Fixes: b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reported-by: "Huang, Ying" <ying.huang(a)intel.com>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.inte…
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory.c b/mm/memory.c
index 34f8402d2046..3c01d68065be 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5295,7 +5295,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
if (unlikely(!pte_same(old_pte, vmf->orig_pte))) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
+ return 0;
}
pte = pte_modify(old_pte, vma->vm_page_prot);
@@ -5358,23 +5358,19 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
if (!migrate_misplaced_folio(folio, vma, target_nid)) {
nid = target_nid;
flags |= TNF_MIGRATED;
- } else {
- flags |= TNF_MIGRATE_FAIL;
- vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
- vmf->address, &vmf->ptl);
- if (unlikely(!vmf->pte))
- goto out;
- if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
- pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
- }
- goto out_map;
+ task_numa_fault(last_cpupid, nid, nr_pages, flags);
+ return 0;
}
-out:
- if (nid != NUMA_NO_NODE)
- task_numa_fault(last_cpupid, nid, nr_pages, flags);
- return 0;
+ flags |= TNF_MIGRATE_FAIL;
+ vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+ vmf->address, &vmf->ptl);
+ if (unlikely(!vmf->pte))
+ return 0;
+ if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
+ pte_unmap_unlock(vmf->pte, vmf->ptl);
+ return 0;
+ }
out_map:
/*
* Make it present again, depending on how arch implements
@@ -5387,7 +5383,10 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte,
writable);
pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
+
+ if (nid != NUMA_NO_NODE)
+ task_numa_fault(last_cpupid, nid, nr_pages, flags);
+ return 0;
}
static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 40b760cfd44566bca791c80e0720d70d75382b84
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081932-vastly-ice-7932@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
40b760cfd445 ("mm/numa: no task_numa_fault() call if PTE is changed")
d2136d749d76 ("mm: support multi-size THP numa balancing")
6b0ed7b3c775 ("mm: factor out the numa mapping rebuilding into a new helper")
ec1778807a80 ("mm: mprotect: use a folio in change_pte_range()")
6695cf68b15c ("mm: memory: use a folio in do_numa_page()")
73eab3ca481e ("mm: migrate: convert migrate_misplaced_page() to migrate_misplaced_folio()")
2ac9e99f3b21 ("mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40b760cfd44566bca791c80e0720d70d75382b84 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy(a)nvidia.com>
Date: Fri, 9 Aug 2024 10:59:04 -0400
Subject: [PATCH] mm/numa: no task_numa_fault() call if PTE is changed
When handling a numa page fault, task_numa_fault() should be called by a
process that restores the page table of the faulted folio to avoid
duplicated stats counting. Commit b99a342d4f11 ("NUMA balancing: reduce
TLB flush via delaying mapping on hint page fault") restructured
do_numa_page() and did not avoid task_numa_fault() call in the second page
table check after a numa migration failure. Fix it by making all
!pte_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary
and lead to unexpected numa balancing results (It is hard to tell whether
the issue will cause positive or negative performance impact due to
duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-2-ziy@nvidia.com
Fixes: b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reported-by: "Huang, Ying" <ying.huang(a)intel.com>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.inte…
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory.c b/mm/memory.c
index 34f8402d2046..3c01d68065be 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5295,7 +5295,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
if (unlikely(!pte_same(old_pte, vmf->orig_pte))) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
+ return 0;
}
pte = pte_modify(old_pte, vma->vm_page_prot);
@@ -5358,23 +5358,19 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
if (!migrate_misplaced_folio(folio, vma, target_nid)) {
nid = target_nid;
flags |= TNF_MIGRATED;
- } else {
- flags |= TNF_MIGRATE_FAIL;
- vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
- vmf->address, &vmf->ptl);
- if (unlikely(!vmf->pte))
- goto out;
- if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
- pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
- }
- goto out_map;
+ task_numa_fault(last_cpupid, nid, nr_pages, flags);
+ return 0;
}
-out:
- if (nid != NUMA_NO_NODE)
- task_numa_fault(last_cpupid, nid, nr_pages, flags);
- return 0;
+ flags |= TNF_MIGRATE_FAIL;
+ vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+ vmf->address, &vmf->ptl);
+ if (unlikely(!vmf->pte))
+ return 0;
+ if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
+ pte_unmap_unlock(vmf->pte, vmf->ptl);
+ return 0;
+ }
out_map:
/*
* Make it present again, depending on how arch implements
@@ -5387,7 +5383,10 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte,
writable);
pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
+
+ if (nid != NUMA_NO_NODE)
+ task_numa_fault(last_cpupid, nid, nr_pages, flags);
+ return 0;
}
static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x fd8c35a92910f4829b7c99841f39b1b952c259d5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081953-corncob-gab-6fce@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
fd8c35a92910 ("mm/numa: no task_numa_fault() call if PMD is changed")
667ffc31aa95 ("mm: huge_memory: use a folio in do_huge_pmd_numa_page()")
73eab3ca481e ("mm: migrate: convert migrate_misplaced_page() to migrate_misplaced_folio()")
2ac9e99f3b21 ("mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio()")
4e096ae1801e ("mm: convert migrate_pages() to work on folios")
2ef7dbb26990 ("migrate_pages: try migrate in batch asynchronously firstly")
a21d2133215b ("migrate_pages: move split folios processing out of migrate_pages_batch()")
fb3592c41a44 ("migrate_pages: fix deadlock in batched migration")
f9366f4c2a29 ("include/linux/migrate.h: remove unneeded externs")
cd7755800eb5 ("mm: change to return bool for isolate_movable_page()")
f7f9c00dfaff ("mm: change to return bool for isolate_lru_page()")
be2d57563822 ("mm: change to return bool for folio_isolate_lru()")
6f7d760e86fa ("migrate_pages: move THP/hugetlb migration support check to simplify code")
7e12beb8ca2a ("migrate_pages: batch flushing TLB")
ebe75e475106 ("migrate_pages: share more code between _unmap and _move")
80562ba0d837 ("migrate_pages: move migrate_folio_unmap()")
5dfab109d519 ("migrate_pages: batch _unmap and _move")
64c8902ed441 ("migrate_pages: split unmap_and_move() to _unmap() and _move()")
42012e0436d4 ("migrate_pages: restrict number of pages to migrate in batch")
e5bfff8b10e4 ("migrate_pages: separate hugetlb folios migration")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fd8c35a92910f4829b7c99841f39b1b952c259d5 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy(a)nvidia.com>
Date: Fri, 9 Aug 2024 10:59:05 -0400
Subject: [PATCH] mm/numa: no task_numa_fault() call if PMD is changed
When handling a numa page fault, task_numa_fault() should be called by a
process that restores the page table of the faulted folio to avoid
duplicated stats counting. Commit c5b5a3dd2c1f ("mm: thp: refactor NUMA
fault handling") restructured do_huge_pmd_numa_page() and did not avoid
task_numa_fault() call in the second page table check after a numa
migration failure. Fix it by making all !pmd_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary
and lead to unexpected numa balancing results (It is hard to tell whether
the issue will cause positive or negative performance impact due to
duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-3-ziy@nvidia.com
Fixes: c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling")
Reported-by: "Huang, Ying" <ying.huang(a)intel.com>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.inte…
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f4be468e06a4..67c86a5d64a6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1685,7 +1685,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
spin_unlock(vmf->ptl);
- goto out;
+ return 0;
}
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1728,22 +1728,16 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
if (!migrate_misplaced_folio(folio, vma, target_nid)) {
flags |= TNF_MIGRATED;
nid = target_nid;
- } else {
- flags |= TNF_MIGRATE_FAIL;
- vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
- if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
- spin_unlock(vmf->ptl);
- goto out;
- }
- goto out_map;
+ task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+ return 0;
}
-out:
- if (nid != NUMA_NO_NODE)
- task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
-
- return 0;
-
+ flags |= TNF_MIGRATE_FAIL;
+ vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+ if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
+ spin_unlock(vmf->ptl);
+ return 0;
+ }
out_map:
/* Restore the PMD */
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1753,7 +1747,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
spin_unlock(vmf->ptl);
- goto out;
+
+ if (nid != NUMA_NO_NODE)
+ task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+ return 0;
}
/*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x fd8c35a92910f4829b7c99841f39b1b952c259d5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081952-handstand-rematch-5948@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
fd8c35a92910 ("mm/numa: no task_numa_fault() call if PMD is changed")
667ffc31aa95 ("mm: huge_memory: use a folio in do_huge_pmd_numa_page()")
73eab3ca481e ("mm: migrate: convert migrate_misplaced_page() to migrate_misplaced_folio()")
2ac9e99f3b21 ("mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio()")
4e096ae1801e ("mm: convert migrate_pages() to work on folios")
2ef7dbb26990 ("migrate_pages: try migrate in batch asynchronously firstly")
a21d2133215b ("migrate_pages: move split folios processing out of migrate_pages_batch()")
fb3592c41a44 ("migrate_pages: fix deadlock in batched migration")
f9366f4c2a29 ("include/linux/migrate.h: remove unneeded externs")
cd7755800eb5 ("mm: change to return bool for isolate_movable_page()")
f7f9c00dfaff ("mm: change to return bool for isolate_lru_page()")
be2d57563822 ("mm: change to return bool for folio_isolate_lru()")
6f7d760e86fa ("migrate_pages: move THP/hugetlb migration support check to simplify code")
7e12beb8ca2a ("migrate_pages: batch flushing TLB")
ebe75e475106 ("migrate_pages: share more code between _unmap and _move")
80562ba0d837 ("migrate_pages: move migrate_folio_unmap()")
5dfab109d519 ("migrate_pages: batch _unmap and _move")
64c8902ed441 ("migrate_pages: split unmap_and_move() to _unmap() and _move()")
42012e0436d4 ("migrate_pages: restrict number of pages to migrate in batch")
e5bfff8b10e4 ("migrate_pages: separate hugetlb folios migration")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fd8c35a92910f4829b7c99841f39b1b952c259d5 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy(a)nvidia.com>
Date: Fri, 9 Aug 2024 10:59:05 -0400
Subject: [PATCH] mm/numa: no task_numa_fault() call if PMD is changed
When handling a numa page fault, task_numa_fault() should be called by a
process that restores the page table of the faulted folio to avoid
duplicated stats counting. Commit c5b5a3dd2c1f ("mm: thp: refactor NUMA
fault handling") restructured do_huge_pmd_numa_page() and did not avoid
task_numa_fault() call in the second page table check after a numa
migration failure. Fix it by making all !pmd_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary
and lead to unexpected numa balancing results (It is hard to tell whether
the issue will cause positive or negative performance impact due to
duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-3-ziy@nvidia.com
Fixes: c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling")
Reported-by: "Huang, Ying" <ying.huang(a)intel.com>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.inte…
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f4be468e06a4..67c86a5d64a6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1685,7 +1685,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
spin_unlock(vmf->ptl);
- goto out;
+ return 0;
}
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1728,22 +1728,16 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
if (!migrate_misplaced_folio(folio, vma, target_nid)) {
flags |= TNF_MIGRATED;
nid = target_nid;
- } else {
- flags |= TNF_MIGRATE_FAIL;
- vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
- if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
- spin_unlock(vmf->ptl);
- goto out;
- }
- goto out_map;
+ task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+ return 0;
}
-out:
- if (nid != NUMA_NO_NODE)
- task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
-
- return 0;
-
+ flags |= TNF_MIGRATE_FAIL;
+ vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+ if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
+ spin_unlock(vmf->ptl);
+ return 0;
+ }
out_map:
/* Restore the PMD */
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1753,7 +1747,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
spin_unlock(vmf->ptl);
- goto out;
+
+ if (nid != NUMA_NO_NODE)
+ task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+ return 0;
}
/*
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x fd8c35a92910f4829b7c99841f39b1b952c259d5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081951-fable-brewery-9048@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
fd8c35a92910 ("mm/numa: no task_numa_fault() call if PMD is changed")
667ffc31aa95 ("mm: huge_memory: use a folio in do_huge_pmd_numa_page()")
73eab3ca481e ("mm: migrate: convert migrate_misplaced_page() to migrate_misplaced_folio()")
2ac9e99f3b21 ("mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fd8c35a92910f4829b7c99841f39b1b952c259d5 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy(a)nvidia.com>
Date: Fri, 9 Aug 2024 10:59:05 -0400
Subject: [PATCH] mm/numa: no task_numa_fault() call if PMD is changed
When handling a numa page fault, task_numa_fault() should be called by a
process that restores the page table of the faulted folio to avoid
duplicated stats counting. Commit c5b5a3dd2c1f ("mm: thp: refactor NUMA
fault handling") restructured do_huge_pmd_numa_page() and did not avoid
task_numa_fault() call in the second page table check after a numa
migration failure. Fix it by making all !pmd_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary
and lead to unexpected numa balancing results (It is hard to tell whether
the issue will cause positive or negative performance impact due to
duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-3-ziy@nvidia.com
Fixes: c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling")
Reported-by: "Huang, Ying" <ying.huang(a)intel.com>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.inte…
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f4be468e06a4..67c86a5d64a6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1685,7 +1685,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
spin_unlock(vmf->ptl);
- goto out;
+ return 0;
}
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1728,22 +1728,16 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
if (!migrate_misplaced_folio(folio, vma, target_nid)) {
flags |= TNF_MIGRATED;
nid = target_nid;
- } else {
- flags |= TNF_MIGRATE_FAIL;
- vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
- if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
- spin_unlock(vmf->ptl);
- goto out;
- }
- goto out_map;
+ task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+ return 0;
}
-out:
- if (nid != NUMA_NO_NODE)
- task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
-
- return 0;
-
+ flags |= TNF_MIGRATE_FAIL;
+ vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+ if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
+ spin_unlock(vmf->ptl);
+ return 0;
+ }
out_map:
/* Restore the PMD */
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1753,7 +1747,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
spin_unlock(vmf->ptl);
- goto out;
+
+ if (nid != NUMA_NO_NODE)
+ task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+ return 0;
}
/*
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x fd8c35a92910f4829b7c99841f39b1b952c259d5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081950-jolliness-crux-7fe1@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
fd8c35a92910 ("mm/numa: no task_numa_fault() call if PMD is changed")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fd8c35a92910f4829b7c99841f39b1b952c259d5 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy(a)nvidia.com>
Date: Fri, 9 Aug 2024 10:59:05 -0400
Subject: [PATCH] mm/numa: no task_numa_fault() call if PMD is changed
When handling a numa page fault, task_numa_fault() should be called by a
process that restores the page table of the faulted folio to avoid
duplicated stats counting. Commit c5b5a3dd2c1f ("mm: thp: refactor NUMA
fault handling") restructured do_huge_pmd_numa_page() and did not avoid
task_numa_fault() call in the second page table check after a numa
migration failure. Fix it by making all !pmd_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary
and lead to unexpected numa balancing results (It is hard to tell whether
the issue will cause positive or negative performance impact due to
duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-3-ziy@nvidia.com
Fixes: c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling")
Reported-by: "Huang, Ying" <ying.huang(a)intel.com>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.inte…
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f4be468e06a4..67c86a5d64a6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1685,7 +1685,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
spin_unlock(vmf->ptl);
- goto out;
+ return 0;
}
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1728,22 +1728,16 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
if (!migrate_misplaced_folio(folio, vma, target_nid)) {
flags |= TNF_MIGRATED;
nid = target_nid;
- } else {
- flags |= TNF_MIGRATE_FAIL;
- vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
- if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
- spin_unlock(vmf->ptl);
- goto out;
- }
- goto out_map;
+ task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+ return 0;
}
-out:
- if (nid != NUMA_NO_NODE)
- task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
-
- return 0;
-
+ flags |= TNF_MIGRATE_FAIL;
+ vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+ if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
+ spin_unlock(vmf->ptl);
+ return 0;
+ }
out_map:
/* Restore the PMD */
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1753,7 +1747,10 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
spin_unlock(vmf->ptl);
- goto out;
+
+ if (nid != NUMA_NO_NODE)
+ task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+ return 0;
}
/*
On 2024-08-21 09:34:36 [-0400], Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> serial: pch: Don't disable interrupts while acquiring lock in ISR.
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> serial-pch-don-t-disable-interrupts-while-acquiring-.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
I don't think this needs to be backported. It was part of a cleanup
series. It is not wrong, but also not needed. Unless it is needed as a
dependency for another patch, I wouldn't bother.
Sebastian
When operating in High-Speed, it is observed that DSTS[USBLNKST] doesn't
update link state immediately after receiving the wakeup interrupt. Since
wakeup event handler calls the resume callbacks, there is a chance that
function drivers can perform an ep queue, which in turn tries to perform
remote wakeup from send_gadget_ep_cmd(STARTXFER). This happens because
DSTS[[21:18] wasn't updated to U0 yet, it's observed that the latency of
DSTS can be in order of milli-seconds. Hence avoid calling gadget_wakeup
during startxfer to prevent unnecessarily issuing remote wakeup to host.
Fixes: c36d8e947a56 ("usb: dwc3: gadget: put link to U0 before Start Transfer")
Cc: <stable(a)vger.kernel.org>
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
v3: Added notes on top the function definition.
v2: Refactored the patch as suggested in v1 discussion.
drivers/usb/dwc3/gadget.c | 31 +++++++------------------------
1 file changed, 7 insertions(+), 24 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..d4f2f0e1f031 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -287,6 +287,13 @@ static int __dwc3_gadget_wakeup(struct dwc3 *dwc, bool async);
*
* Caller should handle locking. This function will issue @cmd with given
* @params to @dep and wait for its completion.
+ *
+ * According to databook, if the link is in L1/L2/U3 while issuing StartXfer command,
+ * software must bring the link back to L0/U0 by performing remote wakeup. But we don't
+ * expect ep_queue to trigger a remote wakeup; instead it should be done by wakeup ops.
+ *
+ * After receiving wakeup event, device should no longer be in U3, and any link
+ * transition afterwards needs to be adressed with wakeup ops.
*/
int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
struct dwc3_gadget_ep_cmd_params *params)
@@ -327,30 +334,6 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
}
- if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int link_state;
-
- /*
- * Initiate remote wakeup if the link state is in U3 when
- * operating in SS/SSP or L1/L2 when operating in HS/FS. If the
- * link state is in U1/U2, no remote wakeup is needed. The Start
- * Transfer command will initiate the link recovery.
- */
- link_state = dwc3_gadget_get_link_state(dwc);
- switch (link_state) {
- case DWC3_LINK_STATE_U2:
- if (dwc->gadget->speed >= USB_SPEED_SUPER)
- break;
-
- fallthrough;
- case DWC3_LINK_STATE_U3:
- ret = __dwc3_gadget_wakeup(dwc, false);
- dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
- ret);
- break;
- }
- }
-
/*
* For some commands such as Update Transfer command, DEPCMDPARn
* registers are reserved. Since the driver often sends Update Transfer
--
2.25.1
In psnet_open_pf_bar() a string later passed to pcim_iomap_regions() is
placed on the stack. Neither pcim_iomap_regions() nor the functions it
calls copy that string.
Should the string later ever be used, this, consequently, causes
undefined behavior since the stack frame will by then have disappeared.
Fix the bug by allocating the string on the heap through
devm_kasprintf().
Cc: stable(a)vger.kernel.org # v6.3
Fixes: 51a8f9d7f587 ("virtio: vdpa: new SolidNET DPU driver.")
Reported-by: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Closes: https://lore.kernel.org/all/74e9109a-ac59-49e2-9b1d-d825c9c9f891@wanadoo.fr/
Suggested-by: Andy Shevchenko <andy(a)kernel.org>
Signed-off-by: Philipp Stanner <pstanner(a)redhat.com>
---
drivers/vdpa/solidrun/snet_main.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/vdpa/solidrun/snet_main.c b/drivers/vdpa/solidrun/snet_main.c
index 99428a04068d..4d42a05d70fc 100644
--- a/drivers/vdpa/solidrun/snet_main.c
+++ b/drivers/vdpa/solidrun/snet_main.c
@@ -555,7 +555,7 @@ static const struct vdpa_config_ops snet_config_ops = {
static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet)
{
- char name[50];
+ char *name;
int ret, i, mask = 0;
/* We don't know which BAR will be used to communicate..
* We will map every bar with len > 0.
@@ -573,7 +573,10 @@ static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet)
return -ENODEV;
}
- snprintf(name, sizeof(name), "psnet[%s]-bars", pci_name(pdev));
+ name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "psnet[%s]-bars", pci_name(pdev));
+ if (!name)
+ return -ENOMEM;
+
ret = pcim_iomap_regions(pdev, mask, name);
if (ret) {
SNET_ERR(pdev, "Failed to request and map PCI BARs\n");
--
2.46.0
In parport_attach, the return value of ida_alloc is unchecked, witch leads
to the use of an invalid index value.
To address this issue, index should be checked. When the index value is
abnormal, the device should be freed.
Found by code review, compile tested only.
Cc: stable(a)vger.kernel.org
Fixes: fb56d97df70e ("pps: client: use new parport device model")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v3:
- modified Fixes tag as suggestions.
Changes in v2:
- removed error output as suggestions.
---
drivers/pps/clients/pps_parport.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/pps/clients/pps_parport.c b/drivers/pps/clients/pps_parport.c
index 63d03a0df5cc..abaffb4e1c1c 100644
--- a/drivers/pps/clients/pps_parport.c
+++ b/drivers/pps/clients/pps_parport.c
@@ -149,6 +149,9 @@ static void parport_attach(struct parport *port)
}
index = ida_alloc(&pps_client_index, GFP_KERNEL);
+ if (index < 0)
+ goto err_free_device;
+
memset(&pps_client_cb, 0, sizeof(pps_client_cb));
pps_client_cb.private = device;
pps_client_cb.irq_func = parport_irq;
@@ -159,7 +162,7 @@ static void parport_attach(struct parport *port)
index);
if (!device->pardev) {
pr_err("couldn't register with %s\n", port->name);
- goto err_free;
+ goto err_free_ida;
}
if (parport_claim_or_block(device->pardev) < 0) {
@@ -187,8 +190,9 @@ static void parport_attach(struct parport *port)
parport_release(device->pardev);
err_unregister_dev:
parport_unregister_device(device->pardev);
-err_free:
+err_free_ida:
ida_free(&pps_client_index, index);
+err_free_device:
kfree(device);
}
--
2.25.1
Here are more fixes for the MPTCP in-kernel path-manager. In this
series, the fixes are around the endpoint IDs not being reusable for
on-going connections when re-creating endpoints with previously used IDs.
- Patch 1 fixes this case for endpoints being used to send ADD_ADDR.
Patch 2 validates this fix. The issue is present since v5.10.
- Patch 3 fixes this case for endpoints being used to establish new
subflows. Patch 4 validates this fix. The issue is present since v5.10.
- Patch 5 fixes this case when all endpoints are flushed. Patch 6
validates this fix. The issue is present since v5.13.
- Patch 7 removes a helper that is confusing, and introduced in v5.10.
It helps simplifying the next patches.
- Patch 8 makes sure a 'subflow' counter is only decremented when
removing a 'subflow' endpoint. Can be backported up to v5.13.
- Patch 9 is similar, but for a 'signal' counter. Can be backported up
to v5.10.
- Patch 10 checks the last max accepted ADD_ADDR limit before accepting
new ADD_ADDR. For v5.10 as well.
- Patch 11 removes a wrong restriction for the userspace PM, added
during a refactoring in v6.5.
- Patch 12 makes sure the fullmesh mode sets the ID 0 when a new subflow
using the source address of the initial subflow is created. Patch 13
covers this case. This issue is present since v5.15.
- Patch 14 avoid possible UaF when selecting an address from the
endpoints list.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (14):
mptcp: pm: re-using ID of unused removed ADD_ADDR
selftests: mptcp: join: check re-using ID of unused ADD_ADDR
mptcp: pm: re-using ID of unused removed subflows
selftests: mptcp: join: check re-using ID of closed subflow
mptcp: pm: re-using ID of unused flushed subflows
selftests: mptcp: join: test for flush/re-add endpoints
mptcp: pm: remove mptcp_pm_remove_subflow()
mptcp: pm: only mark 'subflow' endp as available
mptcp: pm: only decrement add_addr_accepted for MPJ req
mptcp: pm: check add_addr_accept_max before accepting new ADD_ADDR
mptcp: pm: only in-kernel cannot have entries with ID 0
mptcp: pm: fullmesh: select the right ID later
selftests: mptcp: join: validate fullmesh endp on 1st sf
mptcp: pm: avoid possible UaF when selecting endp
net/mptcp/pm.c | 13 ---
net/mptcp/pm_netlink.c | 142 ++++++++++++++++--------
net/mptcp/protocol.h | 3 -
tools/testing/selftests/net/mptcp/mptcp_join.sh | 76 +++++++++++--
4 files changed, 160 insertions(+), 74 deletions(-)
---
base-commit: 565d121b69980637f040eb4d84289869cdaabedf
change-id: 20240819-net-mptcp-pm-reusing-id-eb08827b7be6
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
cxgbi_check_route() dont release the reference acquired by ip_dev_find()
which introducing a reference count leak. We could remedy this by
insuring the reference is released.ip_dev_find().
Cc: stable(a)vger.kernel.org
Fixes: 9ba682f01e2f ("[SCSI] libcxgbi: common library for cxgb3i and cxgb4i")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/scsi/cxgbi/libcxgbi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index bf75940f2be1..6b0f1e8dac40 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -670,6 +670,7 @@ cxgbi_check_route(struct sockaddr *dst_addr, int ifindex)
"route to %pI4 :%u, ndev p#%d,%s, cdev 0x%p.\n",
&daddr->sin_addr.s_addr, ntohs(daddr->sin_port),
port, ndev->name, cdev);
+ dev_put(ndev);
csk = cxgbi_sock_create(cdev);
if (!csk) {
--
2.25.1
The patch titled
Subject: maple_tree: remove rcu_read_lock() from mt_validate()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
maple_tree-remove-rcu_read_lock-from-mt_validate.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)Oracle.com>
Subject: maple_tree: remove rcu_read_lock() from mt_validate()
Date: Tue, 20 Aug 2024 13:54:17 -0400
The write lock should be held when validating the tree to avoid updates
racing with checks. Holding the rcu read lock during a large tree
validation may also cause a prolonged rcu read window and "rcu_preempt
detected stalls" warnings.
Link: https://lore.kernel.org/all/0000000000001d12d4062005aea1@google.com/
Link: https://lkml.kernel.org/r/20240820175417.2782532-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reported-by: syzbot+036af2f0c7338a33b0cd(a)syzkaller.appspotmail.com
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
--- a/lib/maple_tree.c~maple_tree-remove-rcu_read_lock-from-mt_validate
+++ a/lib/maple_tree.c
@@ -7566,14 +7566,14 @@ static void mt_validate_nulls(struct map
* 2. The gap is correctly set in the parents
*/
void mt_validate(struct maple_tree *mt)
+ __must_hold(mas->tree->ma_lock)
{
unsigned char end;
MA_STATE(mas, mt, 0, 0);
- rcu_read_lock();
mas_start(&mas);
if (!mas_is_active(&mas))
- goto done;
+ return;
while (!mte_is_leaf(mas.node))
mas_descend(&mas);
@@ -7594,9 +7594,6 @@ void mt_validate(struct maple_tree *mt)
mas_dfs_postorder(&mas, ULONG_MAX);
}
mt_validate_nulls(mt);
-done:
- rcu_read_unlock();
-
}
EXPORT_SYMBOL_GPL(mt_validate);
_
Patches currently in -mm which might be from Liam.Howlett(a)Oracle.com are
maple_tree-remove-rcu_read_lock-from-mt_validate.patch
In aperture_remove_conflicting_pci_devices(), we currently only
call sysfb_disable() on vga class devices. This leads to the
following problem when the pimary device is not VGA compatible:
1. A PCI device with a non-VGA class is the boot display
2. That device is probed first and it is not a VGA device so
sysfb_disable() is not called, but the device resources
are freed by aperture_detach_platform_device()
3. Non-primary GPU has a VGA class and it ends up calling sysfb_disable()
4. NULL pointer dereference via sysfb_disable() since the resources
have already been freed by aperture_detach_platform_device() when
it was called by the other device.
Fix this by passing a device pointer to sysfb_disable() and checking
the device to determine if we should execute it or not.
v2: Fix build when CONFIG_SCREEN_INFO is not set
Fixes: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device")
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
---
drivers/firmware/sysfb.c | 11 +++++++++--
drivers/of/platform.c | 2 +-
drivers/video/aperture.c | 5 ++---
include/linux/sysfb.h | 4 ++--
4 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/drivers/firmware/sysfb.c b/drivers/firmware/sysfb.c
index 880ffcb500887..033a044af2646 100644
--- a/drivers/firmware/sysfb.c
+++ b/drivers/firmware/sysfb.c
@@ -39,6 +39,8 @@ static struct platform_device *pd;
static DEFINE_MUTEX(disable_lock);
static bool disabled;
+static struct device *sysfb_parent_dev(const struct screen_info *si);
+
static bool sysfb_unregister(void)
{
if (IS_ERR_OR_NULL(pd))
@@ -52,6 +54,7 @@ static bool sysfb_unregister(void)
/**
* sysfb_disable() - disable the Generic System Framebuffers support
+ * @dev: the device to check if non-NULL
*
* This disables the registration of system framebuffer devices that match the
* generic drivers that make use of the system framebuffer set up by firmware.
@@ -61,8 +64,12 @@ static bool sysfb_unregister(void)
* Context: The function can sleep. A @disable_lock mutex is acquired to serialize
* against sysfb_init(), that registers a system framebuffer device.
*/
-void sysfb_disable(void)
+void sysfb_disable(struct device *dev)
{
+ struct screen_info *si = &screen_info;
+
+ if (dev && dev != sysfb_parent_dev(si))
+ return;
mutex_lock(&disable_lock);
sysfb_unregister();
disabled = true;
@@ -93,7 +100,7 @@ static __init bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
}
#endif
-static __init struct device *sysfb_parent_dev(const struct screen_info *si)
+static struct device *sysfb_parent_dev(const struct screen_info *si)
{
struct pci_dev *pdev;
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 389d4ea6bfc15..ef622d41eb5b2 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -592,7 +592,7 @@ static int __init of_platform_default_populate_init(void)
* This can happen for example on DT systems that do EFI
* booting and may provide a GOP handle to the EFI stub.
*/
- sysfb_disable();
+ sysfb_disable(NULL);
of_platform_device_create(node, NULL, NULL);
of_node_put(node);
}
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 561be8feca96c..b23d85ceea104 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -293,7 +293,7 @@ int aperture_remove_conflicting_devices(resource_size_t base, resource_size_t si
* ask for this, so let's assume that a real driver for the display
* was already probed and prevent sysfb to register devices later.
*/
- sysfb_disable();
+ sysfb_disable(NULL);
aperture_detach_devices(base, size);
@@ -353,8 +353,7 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
if (pdev == vga_default_device())
primary = true;
- if (primary)
- sysfb_disable();
+ sysfb_disable(&pdev->dev);
for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h
index c9cb657dad08a..bef5f06a91de6 100644
--- a/include/linux/sysfb.h
+++ b/include/linux/sysfb.h
@@ -58,11 +58,11 @@ struct efifb_dmi_info {
#ifdef CONFIG_SYSFB
-void sysfb_disable(void);
+void sysfb_disable(struct device *dev);
#else /* CONFIG_SYSFB */
-static inline void sysfb_disable(void)
+static inline void sysfb_disable(struct device *dev)
{
}
--
2.46.0
This commit addresses an issue where the USB core could access an
invalid event buffer address during runtime suspend, potentially causing
SMMU faults and other memory issues. The problem arises from the
following sequence.
1. In dwc3_gadget_suspend, there is a chance of a timeout when
moving the USB core to the halt state after clearing the
run/stop bit by software.
2. In dwc3_core_exit, the event buffer is cleared regardless of
the USB core's status, which may lead to an SMMU faults and
other memory issues. if the USB core tries to access the event
buffer address.
To prevent this issue, this commit ensures that the event buffer address
is not cleared by software when the USB core is active during runtime
suspend by checking its status before clearing the buffer address.
Cc: stable(a)vger.kernel.org
Fixes: 89d7f9629946 ("usb: dwc3: core: Skip setting event buffers for host only controllers")
Signed-off-by: Selvarasu Ganesan <selvarasu.g(a)samsung.com>
---
Changes in v2:
- Added separate check for USB controller status before cleaning the
event buffer.
- Link to v1: https://lore.kernel.org/lkml/20240722145617.537-1-selvarasu.g@samsung.com/
---
drivers/usb/dwc3/core.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 734de2a8bd21..5b67d9bca71b 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -564,10 +564,15 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc)
void dwc3_event_buffers_cleanup(struct dwc3 *dwc)
{
struct dwc3_event_buffer *evt;
+ u32 reg;
if (!dwc->ev_buf)
return;
+ reg = dwc3_readl(dwc->regs, DWC3_DSTS);
+ if (!(reg & DWC3_DSTS_DEVCTRLHLT))
+ return;
+
evt = dwc->ev_buf;
evt->lpos = 0;
--
2.17.1
It is done everywhere in cxgb4 code, e.g. in is_filter_exact_match()
There is no reason it should not be done here
Found by Linux Verification Center (linuxtesting.org) with SVACE
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
Cc: stable(a)vger.kernel.org
Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters")
Reviewed-by: Simon Horman <horms(a)kernel.org>
---
v2: Wrap line to 80 characters
drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
index 786ceae34488..dd9e68465e69 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
@@ -1244,7 +1244,8 @@ static u64 hash_filter_ntuple(struct ch_filter_specification *fs,
* in the Compressed Filter Tuple.
*/
if (tp->vlan_shift >= 0 && fs->mask.ivlan)
- ntuple |= (FT_VLAN_VLD_F | fs->val.ivlan) << tp->vlan_shift;
+ ntuple |= (u64)(FT_VLAN_VLD_F |
+ fs->val.ivlan) << tp->vlan_shift;
if (tp->port_shift >= 0 && fs->mask.iport)
ntuple |= (u64)fs->val.iport << tp->port_shift;
--
2.34.1
If formatting a suspended disk (such as formatting with different DIF
type), the disk will be resuming first, and then the format command will
submit to the disk through SG_IO ioctl.
When the disk is processing the format command, the system does not submit
other commands to the disk. Therefore, the system attempts to suspend the
disk again and sends the SYNC CACHE command. However, the SYNC CACHE
command will fail because the disk is in the formatting process, which
will cause the runtime_status of the disk to error and it is difficult
for user to recover it. Error info like:
[ 669.925325] sd 6:0:6:0: [sdg] Synchronizing SCSI cache
[ 670.202371] sd 6:0:6:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
[ 670.216300] sd 6:0:6:0: [sdg] Sense Key : 0x2 [current]
[ 670.221860] sd 6:0:6:0: [sdg] ASC=0x4 ASCQ=0x4
To solve the issue, ignore the error and return success/0 when formatting
in progress.
Cc: stable(a)vger.kernel.org
Signed-off-by: Yihang Li <liyihang9(a)huawei.com>
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
---
Changes since v4:
- Rename the commit title.
- Ignore the SYNC command error during formatting as suggested by Damien.
Changes since v3:
- Add Cc tag for kernel stable.
Changes since v2:
- Add Reviewed-by for Bart.
Changes since v1:
- Updated and added error information to the patch description.
---
drivers/scsi/sd.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index adeaa8ab9951..2d7240a24b52 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1823,13 +1823,15 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
(sshdr.asc == 0x74 && sshdr.ascq == 0x71)) /* drive is password locked */
/* this is no error here */
return 0;
+
/*
- * This drive doesn't support sync and there's not much
- * we can do because this is called during shutdown
- * or suspend so just return success so those operations
- * can proceed.
+ * If a format is in progress or if the drive does not
+ * support sync, there is not much we can do because
+ * this is called during shutdown or suspend so just
+ * return success so those operations can proceed.
*/
- if (sshdr.sense_key == ILLEGAL_REQUEST)
+ if ((sshdr.asc == 0x04 && sshdr.ascq == 0x04) ||
+ sshdr.sense_key == ILLEGAL_REQUEST)
return 0;
}
--
2.33.0
From: Ahmed Ehab <bottaawesome633(a)gmail.com>
Preventing lockdep_set_subclass from creating a new instance of the
string literal. Hence, we will always have the same class->name among
parent and subclasses. This prevents kernel panics when looking up a
lock class while comparing class locks and class names.
Reported-by: <syzbot+7f4a6f7f7051474e40ad(a)syzkaller.appspotmail.com>
Fixes: de8f5e4f2dc1f ("lockdep: Introduce wait-type checks")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ahmed Ehab <bottaawesome633(a)gmail.com>
---
v3->v4:
- Fixed subject line truncation.
include/linux/lockdep.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 08b0d1d9d78b..df8fa5929de7 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -173,7 +173,7 @@ static inline void lockdep_init_map(struct lockdep_map *lock, const char *name,
(lock)->dep_map.lock_type)
#define lockdep_set_subclass(lock, sub) \
- lockdep_init_map_type(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\
+ lockdep_init_map_type(&(lock)->dep_map, (lock)->dep_map.name, (lock)->dep_map.key, sub,\
(lock)->dep_map.wait_type_inner, \
(lock)->dep_map.wait_type_outer, \
(lock)->dep_map.lock_type)
--
2.45.2
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: ea72ce5da22806d5713f3ffb39a6d5ae73841f93
Gitweb: https://git.kernel.org/tip/ea72ce5da22806d5713f3ffb39a6d5ae73841f93
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Wed, 14 Aug 2024 00:29:36 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 20 Aug 2024 13:44:57 +02:00
x86/kaslr: Expose and use the end of the physical memory address space
iounmap() on x86 occasionally fails to unmap because the provided valid
ioremap address is not below high_memory. It turned out that this
happens due to KASLR.
KASLR uses the full address space between PAGE_OFFSET and vaddr_end to
randomize the starting points of the direct map, vmalloc and vmemmap
regions. It thereby limits the size of the direct map by using the
installed memory size plus an extra configurable margin for hot-plug
memory. This limitation is done to gain more randomization space
because otherwise only the holes between the direct map, vmalloc,
vmemmap and vaddr_end would be usable for randomizing.
The limited direct map size is not exposed to the rest of the kernel, so
the memory hot-plug and resource management related code paths still
operate under the assumption that the available address space can be
determined with MAX_PHYSMEM_BITS.
request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
downwards. That means the first allocation happens past the end of the
direct map and if unlucky this address is in the vmalloc space, which
causes high_memory to become greater than VMALLOC_START and consequently
causes iounmap() to fail for valid ioremap addresses.
MAX_PHYSMEM_BITS cannot be changed for that because the randomization
does not align with address bit boundaries and there are other places
which actually require to know the maximum number of address bits. All
remaining usage sites of MAX_PHYSMEM_BITS have been analyzed and found
to be correct.
Cure this by exposing the end of the direct map via PHYSMEM_END and use
that for the memory hot-plug and resource management related places
instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END
maps to a variable which is initialized by the KASLR initialization and
otherwise it is based on MAX_PHYSMEM_BITS as before.
To prevent future hickups add a check into add_pages() to catch callers
trying to add memory above PHYSMEM_END.
Fixes: 0483e1fa6e09 ("x86/mm: Implement ASLR for kernel memory regions")
Reported-by: Max Ramanouski <max8rr8(a)gmail.com>
Reported-by: Alistair Popple <apopple(a)nvidia.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-By: Max Ramanouski <max8rr8(a)gmail.com>
Tested-by: Alistair Popple <apopple(a)nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Reviewed-by: Kees Cook <kees(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/87ed6soy3z.ffs@tglx
---
arch/x86/include/asm/page_64.h | 1 +-
arch/x86/include/asm/pgtable_64_types.h | 4 +++-
arch/x86/mm/init_64.c | 4 +++-
arch/x86/mm/kaslr.c | 32 +++++++++++++++++++-----
include/linux/mm.h | 4 +++-
kernel/resource.c | 6 +----
mm/memory_hotplug.c | 2 +-
mm/sparse.c | 2 +-
8 files changed, 43 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index af4302d..f3d257c 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -17,6 +17,7 @@ extern unsigned long phys_base;
extern unsigned long page_offset_base;
extern unsigned long vmalloc_base;
extern unsigned long vmemmap_base;
+extern unsigned long physmem_end;
static __always_inline unsigned long __phys_addr_nodebug(unsigned long x)
{
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 9053dfe..a98e534 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -140,6 +140,10 @@ extern unsigned int ptrs_per_p4d;
# define VMEMMAP_START __VMEMMAP_BASE_L4
#endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
+#ifdef CONFIG_RANDOMIZE_MEMORY
+# define PHYSMEM_END physmem_end
+#endif
+
/*
* End of the region for which vmalloc page tables are pre-allocated.
* For non-KMSAN builds, this is the same as VMALLOC_END.
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index d8dbeac..ff25364 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -958,8 +958,12 @@ static void update_end_of_memory_vars(u64 start, u64 size)
int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
struct mhp_params *params)
{
+ unsigned long end = ((start_pfn + nr_pages) << PAGE_SHIFT) - 1;
int ret;
+ if (WARN_ON_ONCE(end > PHYSMEM_END))
+ return -ERANGE;
+
ret = __add_pages(nid, start_pfn, nr_pages, params);
WARN_ON_ONCE(ret);
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 37db264..230f1de 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -47,13 +47,24 @@ static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;
*/
static __initdata struct kaslr_memory_region {
unsigned long *base;
+ unsigned long *end;
unsigned long size_tb;
} kaslr_regions[] = {
- { &page_offset_base, 0 },
- { &vmalloc_base, 0 },
- { &vmemmap_base, 0 },
+ {
+ .base = &page_offset_base,
+ .end = &physmem_end,
+ },
+ {
+ .base = &vmalloc_base,
+ },
+ {
+ .base = &vmemmap_base,
+ },
};
+/* The end of the possible address space for physical memory */
+unsigned long physmem_end __ro_after_init;
+
/* Get size in bytes used by the memory region */
static inline unsigned long get_padding(struct kaslr_memory_region *region)
{
@@ -82,6 +93,8 @@ void __init kernel_randomize_memory(void)
BUILD_BUG_ON(vaddr_end != CPU_ENTRY_AREA_BASE);
BUILD_BUG_ON(vaddr_end > __START_KERNEL_map);
+ /* Preset the end of the possible address space for physical memory */
+ physmem_end = ((1ULL << MAX_PHYSMEM_BITS) - 1);
if (!kaslr_memory_enabled())
return;
@@ -128,11 +141,18 @@ void __init kernel_randomize_memory(void)
vaddr += entropy;
*kaslr_regions[i].base = vaddr;
+ /* Calculate the end of the region */
+ vaddr += get_padding(&kaslr_regions[i]);
/*
- * Jump the region and add a minimum padding based on
- * randomization alignment.
+ * KASLR trims the maximum possible size of the
+ * direct-map. Update the physmem_end boundary.
+ * No rounding required as the region starts
+ * PUD aligned and size is in units of TB.
*/
- vaddr += get_padding(&kaslr_regions[i]);
+ if (kaslr_regions[i].end)
+ *kaslr_regions[i].end = __pa_nodebug(vaddr - 1);
+
+ /* Add a minimum padding based on randomization alignment. */
vaddr = round_up(vaddr + 1, PUD_SIZE);
remain_entropy -= entropy;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c4b238a..b386415 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -97,6 +97,10 @@ extern const int mmap_rnd_compat_bits_max;
extern int mmap_rnd_compat_bits __read_mostly;
#endif
+#ifndef PHYSMEM_END
+# define PHYSMEM_END ((1ULL << MAX_PHYSMEM_BITS) - 1)
+#endif
+
#include <asm/page.h>
#include <asm/processor.h>
diff --git a/kernel/resource.c b/kernel/resource.c
index 14777af..a83040f 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1826,8 +1826,7 @@ static resource_size_t gfr_start(struct resource *base, resource_size_t size,
if (flags & GFR_DESCENDING) {
resource_size_t end;
- end = min_t(resource_size_t, base->end,
- (1ULL << MAX_PHYSMEM_BITS) - 1);
+ end = min_t(resource_size_t, base->end, PHYSMEM_END);
return end - size + 1;
}
@@ -1844,8 +1843,7 @@ static bool gfr_continue(struct resource *base, resource_size_t addr,
* @size did not wrap 0.
*/
return addr > addr - size &&
- addr <= min_t(resource_size_t, base->end,
- (1ULL << MAX_PHYSMEM_BITS) - 1);
+ addr <= min_t(resource_size_t, base->end, PHYSMEM_END);
}
static resource_size_t gfr_next(resource_size_t addr, resource_size_t size,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 66267c2..951878a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1681,7 +1681,7 @@ struct range __weak arch_get_mappable_range(void)
struct range mhp_get_pluggable_range(bool need_mapping)
{
- const u64 max_phys = (1ULL << MAX_PHYSMEM_BITS) - 1;
+ const u64 max_phys = PHYSMEM_END;
struct range mhp_range;
if (need_mapping) {
diff --git a/mm/sparse.c b/mm/sparse.c
index e4b8300..0c3bff8 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -129,7 +129,7 @@ static inline int sparse_early_nid(struct mem_section *section)
static void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
unsigned long *end_pfn)
{
- unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
+ unsigned long max_sparsemem_pfn = (PHYSMEM_END + 1) >> PAGE_SHIFT;
/*
* Sanity checks - do not allow an architecture to pass
Hi, Greg
I saw you have included commit 7697a0fe0154 ("LoongArch: Define
__ARCH_WANT_NEW_STAT in unistd.h") in your stable trees, which
actually introduced new sys calls newfstatat and fstat to Loongarch.
I wonder if it is correct to add new syscalls, which actually changes
the kernel features, in stable releases, as it might confuse downstream
developers because they may assume the existence of a certain feature
after a certain version.
Cheers,
Miao Wang
If z_erofs_gbuf_growsize() partially fails on a global buffer due to
memory allocation failure or fault injection (as reported by syzbot [1]),
new pages need to be freed by comparing to the existing pages to avoid
memory leaks.
However, the old gbuf->pages[] array may not be large enough, which can
lead to null-ptr-deref or out-of-bound access.
Fix this by checking against gbuf->nrpages in advance.
[1] https://lore.kernel.org/r/000000000000f7b96e062018c6e3@google.com
Reported-by: syzbot+242ee56aaa9585553766(a)syzkaller.appspotmail.com
Fixes: d6db47e571dc ("erofs: do not use pagepool in z_erofs_gbuf_growsize()")
Cc: <stable(a)vger.kernel.org> # 6.10+
Cc: Chunhai Guo <guochunhai(a)vivo.com>
Signed-off-by: Gao Xiang <hsiangkao(a)linux.alibaba.com>
---
RESEND:
Add missing link and reported-by.
fs/erofs/zutil.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/erofs/zutil.c b/fs/erofs/zutil.c
index 9b53883e5caf..37afe2024840 100644
--- a/fs/erofs/zutil.c
+++ b/fs/erofs/zutil.c
@@ -111,7 +111,8 @@ int z_erofs_gbuf_growsize(unsigned int nrpages)
out:
if (i < z_erofs_gbuf_count && tmp_pages) {
for (j = 0; j < nrpages; ++j)
- if (tmp_pages[j] && tmp_pages[j] != gbuf->pages[j])
+ if (tmp_pages[j] && (j >= gbuf->nrpages ||
+ tmp_pages[j] != gbuf->pages[j]))
__free_page(tmp_pages[j]);
kfree(tmp_pages);
}
--
2.43.5
This is the start of the stable review cycle for the 6.10.6 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Mon, 19 Aug 2024 08:53:52 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.6-rc3…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.10.6-rc3
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "drm/amd/display: Refactor function dm_dp_mst_is_port_support_mode()"
Niklas Cassel <cassel(a)kernel.org>
Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"
Sean Young <sean(a)mess.org>
media: Revert "media: dvb-usb: Fix unexpected infinite loop in dvb_usb_read_remote_control()"
Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
drm/amdgpu/display: Fix null pointer dereference in dc_stream_program_cursor_position
Wayne Lin <Wayne.Lin(a)amd.com>
drm/amd/display: Solve mst monitors blank out problem after resume
Kees Cook <kees(a)kernel.org>
binfmt_flat: Fix corruption when not offsetting data start
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: add a mutex to synchronize VPC commands
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: move ymc_trigger_ec from lenovo-ymc
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: introduce a generic notification chain
Shyam Sundar S K <Shyam-sundar.S-k(a)amd.com>
platform/x86/amd/pmf: Fix to Update HPD Data When ALS is Disabled
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb: Fix UBSAN warning in parse_audio_unit()
Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
fs/ntfs3: Do copy_to_user out of run_lock
Pei Li <peili.dev(a)gmail.com>
jfs: Fix shift-out-of-bounds in dbDiscardAG
Edward Adam Davis <eadavis(a)qq.com>
jfs: fix null ptr deref in dtInsertEntry
Willem de Bruijn <willemb(a)google.com>
fou: remove warn in gue_gro_receive on unsupported protocol
Chao Yu <chao(a)kernel.org>
f2fs: fix to cover read extent cache access with lock
Chao Yu <chao(a)kernel.org>
f2fs: fix to do sanity check on F2FS_INLINE_DATA flag in inode during GC
yunshui <jiangyunshui(a)kylinos.cn>
bpf, net: Use DEV_STAT_INC()
Simon Trimmer <simont(a)opensource.cirrus.com>
ASoC: cs35l56: Patch CS35L56_IRQ1_MASK_18 to the default value
WangYuli <wangyuli(a)uniontech.com>
nvme/pci: Add APST quirk for Lenovo N60z laptop
Huacai Chen <chenhuacai(a)kernel.org>
LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Fangzhi Zuo <jerry.zuo(a)amd.com>
drm/amd/display: Prevent IPX From Link Detect and Set Mode
Harry Wentland <harry.wentland(a)amd.com>
drm/amd/display: Separate setting and programming of cursor
Wayne Lin <wayne.lin(a)amd.com>
drm/amd/display: Defer handling mst up request in resume
Kees Cook <kees(a)kernel.org>
exec: Fix ToCToU between perm check and set-uid/gid usage
-------------
Diffstat:
Makefile | 4 +-
arch/loongarch/include/uapi/asm/unistd.h | 1 +
drivers/ata/libata-scsi.c | 15 +-
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +-
.../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 236 ++++++++-------------
.../drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 6 +-
drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 94 +++++---
drivers/gpu/drm/amd/display/dc/dc_stream.h | 8 +
.../drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c | 2 +-
drivers/media/usb/dvb-usb/dvb-usb-init.c | 35 +--
drivers/nvme/host/pci.c | 7 +
drivers/platform/x86/Kconfig | 1 +
drivers/platform/x86/amd/pmf/spc.c | 32 +--
drivers/platform/x86/ideapad-laptop.c | 148 +++++++++++--
drivers/platform/x86/ideapad-laptop.h | 9 +
drivers/platform/x86/lenovo-ymc.c | 60 +-----
fs/binfmt_flat.c | 4 +-
fs/exec.c | 8 +-
fs/f2fs/extent_cache.c | 50 ++---
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/gc.c | 10 +
fs/f2fs/inode.c | 10 +-
fs/jfs/jfs_dmap.c | 2 +
fs/jfs/jfs_dtree.c | 2 +
fs/ntfs3/frecord.c | 75 ++++++-
net/core/filter.c | 8 +-
net/ipv4/fou_core.c | 2 +-
sound/soc/codecs/cs35l56-shared.c | 1 +
sound/usb/mixer.c | 7 +
29 files changed, 492 insertions(+), 361 deletions(-)
When an attribute group is created with sysfs_create_group(), the
->sysfs_ops() callback is set to kobj_sysfs_ops, which sets the ->show()
and ->store() callbacks to kobj_attr_show() and kobj_attr_store()
respectively. These functions use container_of() to get the respective
callback from the passed attribute, meaning that these callbacks need to
be the same type as the callbacks in 'struct kobj_attribute'.
However, the platform_profile sysfs functions have the type of the
->show() and ->store() callbacks in 'struct device_attribute', which
results a CFI violation when accessing platform_profile or
platform_profile_choices under /sys/firmware/acpi because the types do
not match:
CFI failure at kobj_attr_show+0x19/0x30 (target: platform_profile_choices_show+0x0/0x140; expected type: 0x7a69590c)
This happens to work because the layout of 'struct kobj_attribute' and
'struct device_attribute' are the same, so the container_of() cast
happens to allow the callbacks to still work.
Change the type of platform_profile_choices_show() and
platform_profile_{show,store}() to match the callbacks in
'struct kobj_attribute' and update the attribute variables to match,
which resolves the CFI violation.
Cc: stable(a)vger.kernel.org
Fixes: a2ff95e018f1 ("ACPI: platform: Add platform profile support")
Reported-by: John Rowley <lkml(a)johnrowley.me>
Closes: https://github.com/ClangBuiltLinux/linux/issues/2047
Tested-by: John Rowley <lkml(a)johnrowley.me>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
drivers/acpi/platform_profile.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/acpi/platform_profile.c b/drivers/acpi/platform_profile.c
index d2f7fd7743a1..11278f785526 100644
--- a/drivers/acpi/platform_profile.c
+++ b/drivers/acpi/platform_profile.c
@@ -22,8 +22,8 @@ static const char * const profile_names[] = {
};
static_assert(ARRAY_SIZE(profile_names) == PLATFORM_PROFILE_LAST);
-static ssize_t platform_profile_choices_show(struct device *dev,
- struct device_attribute *attr,
+static ssize_t platform_profile_choices_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
char *buf)
{
int len = 0;
@@ -49,8 +49,8 @@ static ssize_t platform_profile_choices_show(struct device *dev,
return len;
}
-static ssize_t platform_profile_show(struct device *dev,
- struct device_attribute *attr,
+static ssize_t platform_profile_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
char *buf)
{
enum platform_profile_option profile = PLATFORM_PROFILE_BALANCED;
@@ -77,8 +77,8 @@ static ssize_t platform_profile_show(struct device *dev,
return sysfs_emit(buf, "%s\n", profile_names[profile]);
}
-static ssize_t platform_profile_store(struct device *dev,
- struct device_attribute *attr,
+static ssize_t platform_profile_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
const char *buf, size_t count)
{
int err, i;
@@ -115,12 +115,12 @@ static ssize_t platform_profile_store(struct device *dev,
return count;
}
-static DEVICE_ATTR_RO(platform_profile_choices);
-static DEVICE_ATTR_RW(platform_profile);
+static struct kobj_attribute attr_platform_profile_choices = __ATTR_RO(platform_profile_choices);
+static struct kobj_attribute attr_platform_profile = __ATTR_RW(platform_profile);
static struct attribute *platform_profile_attrs[] = {
- &dev_attr_platform_profile_choices.attr,
- &dev_attr_platform_profile.attr,
+ &attr_platform_profile_choices.attr,
+ &attr_platform_profile.attr,
NULL
};
---
base-commit: 47ac09b91befbb6a235ab620c32af719f8208399
change-id: 20240819-acpi-platform_profile-fix-cfi-violation-de278753bd5f
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
CPU hangs were reported while offlining/onlining CPUs on s390.
Analyzing the vmcore data shows `stop_one_cpu_nowait()` in `affine_move_task()`
can fail when racing with off-/on-lining resulting in a deadlock waiting for
the pending migration stop work completion which is never done.
Fix this by gracefully handling such condition.
Fixes: 9e81889c7648 ("sched: Fix affine_move_task() self-concurrency")
Cc: stable(a)vger.kernel.org
Reported-by: Bill Peters <wpeters(a)atpco.net>
Tested-by: Bill Peters <wpeters(a)atpco.net>
Signed-off-by: Daniel Vacek <neelx(a)redhat.com>
---
kernel/sched/core.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f3951e4a55e5b..40a3c9ff74077 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2871,8 +2871,25 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
preempt_disable();
task_rq_unlock(rq, p, rf);
if (!stop_pending) {
- stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
- &pending->arg, &pending->stop_work);
+ stop_pending =
+ stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
+ &pending->arg, &pending->stop_work);
+ /*
+ * The state resulting in this failure is not expected
+ * at this point. At least report a WARNING to be able
+ * to panic and further debug if reproduced.
+ */
+ if (WARN_ON(!stop_pending)) {
+ /*
+ * Then try to handle the failure gracefully
+ * to prevent the deadlock a few lines later.
+ */
+ rq = task_rq_lock(p, rf);
+ pending->stop_pending = false;
+ p->migration_pending = NULL;
+ task_rq_unlock(rq, p, rf);
+ complete_all(&pending->done);
+ }
}
preempt_enable();
--
2.43.0
The check_unaligned_access_emulated() function should have been called
during CPU hotplug to ensure that if all CPUs had emulated unaligned
accesses, the new CPU also does.
This patch adds the call to check_unaligned_access_emulated() in
the hotplug path.
Fixes: 55e0bf49a0d0 ("RISC-V: Probe misaligned access speed in parallel")
Signed-off-by: Jesse Taube <jesse(a)rivosinc.com>
Reviewed-by: Evan Green <evan(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
---
V5 -> V6:
- New patch
V6 -> V7:
- No changes
V7 -> V8:
- Rebase onto fixes
V8 -> V9:
- No changes
---
arch/riscv/kernel/unaligned_access_speed.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index 160628a2116d..f3508cc54f91 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -191,6 +191,7 @@ static int riscv_online_cpu(unsigned int cpu)
if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
goto exit;
+ check_unaligned_access_emulated(NULL);
buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
if (!buf) {
pr_warn("Allocation failure, not measuring misaligned performance\n");
--
2.45.2
From: Jan Kiszka <jan.kiszka(a)siemens.com>
When k3_r5_cluster_rproc_exit is run, core 1 is shutdown and removed
first. When core 0 should then be stopped before its removal, it will
find core1->rproc as NULL already and crashes. Happens on rmmod e.g.
Fixes: 3c8a9066d584 ("remoteproc: k3-r5: Do not allow core1 to power up before core0 via sysfs")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kiszka <jan.kiszka(a)siemens.com>
---
There might be one more because I can still make this driver crash
after an operator error. Were error scenarios tested at all?
drivers/remoteproc/ti_k3_r5_remoteproc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
index eb09d2e9b32a..9ebd7a34e638 100644
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
+++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
@@ -646,7 +646,8 @@ static int k3_r5_rproc_stop(struct rproc *rproc)
/* do not allow core 0 to stop before core 1 */
core1 = list_last_entry(&cluster->cores, struct k3_r5_core,
elem);
- if (core != core1 && core1->rproc->state != RPROC_OFFLINE) {
+ if (core != core1 && core1->rproc &&
+ core1->rproc->state != RPROC_OFFLINE) {
dev_err(dev, "%s: can not stop core 0 before core 1\n",
__func__);
ret = -EPERM;
--
2.43.0
We fail to perform an of_node_put() when of_address_to_resource()
fails, leading to a refcount leak.
Address this by moving the error handling path outside of the loop and
making it common to all failure modes.
Cc: stable(a)vger.kernel.org
Fixes: 4266ab1a8ff5 ("irqchip/gic-v2m: Refactor to prepare for ACPI support")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v3:
- modified the description information. Changed the added 'put' function
to 'of_node_put' (the previous incorrect function was 'of_put_node').
Changes in v2:
- modified the patch according to suggestions.
---
drivers/irqchip/irq-gic-v2m.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index 51af63c046ed..be35c5349986 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -407,12 +407,12 @@ static int __init gicv2m_of_init(struct fwnode_handle *parent_handle,
ret = gicv2m_init_one(&child->fwnode, spi_start, nr_spis,
&res, 0);
- if (ret) {
- of_node_put(child);
+ if (ret)
break;
- }
}
+ if (ret && child)
+ of_node_put(child);
if (!ret)
ret = gicv2m_allocate_domains(parent);
if (ret)
--
2.25.1
When gfs2_fill_super() fails, destroy_workqueue()
is called within gfs2_gl_hash_clear(), and the
subsequent code path calls destroy_workqueue()
on the same work queue again.
This issue can be fixed by setting the work
queue pointer to NULL after the first
destroy_workqueue() call and checking for
a NULL pointer before attempting to destroy
the work queue again.
Reported-by: syzbot+d34c2a269ed512c531b0(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d34c2a269ed512c531b0
Fixes: 30e388d57367 ("gfs2: Switch to a per-filesystem glock workqueue")
Cc: stable(a)vger.kernel.org
Signed-off-by: Julian Sun <sunjunchao2870(a)gmail.com>
---
fs/gfs2/glock.c | 1 +
fs/gfs2/ops_fstype.c | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 12a769077ea0..4775c2cb8ae1 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -2249,6 +2249,7 @@ void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
gfs2_free_dead_glocks(sdp);
glock_hash_walk(dump_glock_func, sdp);
destroy_workqueue(sdp->sd_glock_wq);
+ sdp->sd_glock_wq = NULL;
}
static const char *state2str(unsigned state)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index ff1f3e3dc65c..c1a7ff713c84 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1305,7 +1305,8 @@ static int gfs2_fill_super(struct super_block *sb, struct fs_context *fc)
gfs2_delete_debugfs_file(sdp);
gfs2_sys_fs_del(sdp);
fail_delete_wq:
- destroy_workqueue(sdp->sd_delete_wq);
+ if (sdp->sd_delete_wq)
+ destroy_workqueue(sdp->sd_delete_wq);
fail_glock_wq:
destroy_workqueue(sdp->sd_glock_wq);
fail_free:
--
2.39.2
On Tue, Aug 20, 2024 at 2:04 PM Sasha Levin <sashal(a)kernel.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> rust: work around `bindgen` 0.69.0 issue
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> rust-work-around-bindgen-0.69.0-issue.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
This should not hurt as a dependency of the other, but just in case:
it does not do anything and the comments may be confusing, since we
pin bindgen's version in the stable/LTS versions.
Thanks!
Cheers,
Miguel
When only the last resource is invalid, tpmi_sst_dev_add() is returing
error even if there are other valid resources before. This function
should return error when there are no valid resources.
Here tpmi_sst_dev_add() is returning "ret" variable. But this "ret"
variable contains the failure status of last call to sst_main(), which
failed for the invalid resource. But there may be other valid resources
before the last entry.
To address this, do not update "ret" variable for sst_main() return
status.
Fixes: 9d1d36268f3d ("platform/x86: ISST: Support partitioned systems")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # 6.10+
---
drivers/platform/x86/intel/speed_select_if/isst_tpmi_core.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/platform/x86/intel/speed_select_if/isst_tpmi_core.c b/drivers/platform/x86/intel/speed_select_if/isst_tpmi_core.c
index 7fa360073f6e..404582307109 100644
--- a/drivers/platform/x86/intel/speed_select_if/isst_tpmi_core.c
+++ b/drivers/platform/x86/intel/speed_select_if/isst_tpmi_core.c
@@ -1549,8 +1549,7 @@ int tpmi_sst_dev_add(struct auxiliary_device *auxdev)
goto unlock_free;
}
- ret = sst_main(auxdev, &pd_info[i]);
- if (ret) {
+ if (sst_main(auxdev, &pd_info[i])) {
/*
* This entry is not valid, hardware can partially
* populate dies. In this case MMIO will have 0xFFs.
--
2.45.0
From: Zijun Hu <quic_zijuhu(a)quicinc.com>
Device attribute group @usb3_hardware_lpm_attr_group is merged by
add_power_attributes(), but it is not unmerged explicitly, fixed by
unmerging it in remove_power_attributes().
Fixes: 655fe4effe0f ("usbcore: add sysfs support to xHCI usb3 hardware LPM")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
---
Changes in v2:
- Add stable tag
- Link to v1: https://lore.kernel.org/r/20240814-sysfs_fix-v1-1-2224a29a259b@quicinc.com
---
drivers/usb/core/sysfs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/core/sysfs.c b/drivers/usb/core/sysfs.c
index d83231d6736a..61b6d978892c 100644
--- a/drivers/usb/core/sysfs.c
+++ b/drivers/usb/core/sysfs.c
@@ -670,6 +670,7 @@ static int add_power_attributes(struct device *dev)
static void remove_power_attributes(struct device *dev)
{
+ sysfs_unmerge_group(&dev->kobj, &usb3_hardware_lpm_attr_group);
sysfs_unmerge_group(&dev->kobj, &usb2_hardware_lpm_attr_group);
sysfs_unmerge_group(&dev->kobj, &power_attr_group);
}
---
base-commit: ca7df2c7bb5f83fe46aa9ce998b7352c6b28f3a1
change-id: 20240814-sysfs_fix-2206de9b0179
Best regards,
--
Zijun Hu <quic_zijuhu(a)quicinc.com>
If z_erofs_gbuf_growsize() partially fails on a global buffer due to
memory allocation failure or fault injection (as reported by syzbot [1]),
new pages need to be freed by comparing to the existing pages to avoid
memory leaks.
However, the old gbuf->pages[] array may not be large enough, which can
lead to null-ptr-deref or out-of-bound access.
Fix this by checking against gbuf->nrpages in advance.
Fixes: d6db47e571dc ("erofs: do not use pagepool in z_erofs_gbuf_growsize()")
Cc: <stable(a)vger.kernel.org> # 6.10+
Cc: Chunhai Guo <guochunhai(a)vivo.com>
Signed-off-by: Gao Xiang <hsiangkao(a)linux.alibaba.com>
---
fs/erofs/zutil.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/erofs/zutil.c b/fs/erofs/zutil.c
index 9b53883e5caf..37afe2024840 100644
--- a/fs/erofs/zutil.c
+++ b/fs/erofs/zutil.c
@@ -111,7 +111,8 @@ int z_erofs_gbuf_growsize(unsigned int nrpages)
out:
if (i < z_erofs_gbuf_count && tmp_pages) {
for (j = 0; j < nrpages; ++j)
- if (tmp_pages[j] && tmp_pages[j] != gbuf->pages[j])
+ if (tmp_pages[j] && (j >= gbuf->nrpages ||
+ tmp_pages[j] != gbuf->pages[j]))
__free_page(tmp_pages[j]);
kfree(tmp_pages);
}
--
2.43.5
Stream endpoint can skip part of TD during next transfer initialization
after beginning stopped during active stream data transfer.
The Set TR Dequeue Pointer command does not clear all internal
transfer-related variables that position stream endpoint on transfer ring.
USB Controller stores all endpoint state information within RsvdO fields
inside endpoint context structure. For stream endpoints, all relevant
information regarding particular StreamID is stored within corresponding
Stream Endpoint context.
Whenever driver wants to stop stream endpoint traffic, it invokes
Stop Endpoint command which forces the controller to dump all endpoint
state-related variables into RsvdO spaces into endpoint context and stream
endpoint context. Whenever driver wants to reinitialize endpoint starting
point on Transfer Ring, it uses the Set TR Dequeue Pointer command
to update dequeue pointer for particular stream in Stream Endpoint
Context. When stream endpoint is forced to stop active transfer in the
middle of TD, it dumps an information about TRB bytes left in RsvdO fields
in Stream Endpoint Context which will be used in next transfer
initialization to designate starting point for XDMA. This field is not
cleared during Set TR Dequeue Pointer command which causes XDMA to skip
over transfer ring and leads to data loss on stream pipe.
Patch fixes this by clearing out all RsvdO fields before initializing new
transfer via that StreamID.
Field Rsvd0 is reserved field, so patch should not have impact for other
xHCI controllers.
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
cc: <stable(a)vger.kernel.org>
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
---
drivers/usb/host/xhci-ring.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 1dde53f6eb31..7fc1c4efcae2 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1385,7 +1385,20 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
if (ep->ep_state & EP_HAS_STREAMS) {
struct xhci_stream_ctx *ctx =
&ep->stream_info->stream_ctx_array[stream_id];
+ u32 edtl;
+
deq = le64_to_cpu(ctx->stream_ring) & SCTX_DEQ_MASK;
+ edtl = EVENT_TRB_LEN(le32_to_cpu(ctx->reserved[1]));
+
+ /*
+ * Existing Cadence xHCI controllers store some endpoint state information
+ * within Rsvd0 fields of Stream Endpoint context. This field is not
+ * cleared during Set TR Dequeue Pointer command which causes XDMA to skip
+ * over transfer ring and leads to data loss on stream pipe.
+ * To fix this issue driver must clear Rsvd0 field.
+ */
+ ctx->reserved[1] = 0;
+ ctx->reserved[0] = cpu_to_le32(edtl);
} else {
deq = le64_to_cpu(ep_ctx->deq) & ~EP_CTX_CYCLE_MASK;
}
--
2.43.0
Failing to read out rawclk makes it impossible to read out backlight,
which results in backlight not working when the backlight is off during
boot, or when reloading the module.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/display/xe_display.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
index 30dfdac9f6fa9..79add15c6c4c7 100644
--- a/drivers/gpu/drm/xe/display/xe_display.c
+++ b/drivers/gpu/drm/xe/display/xe_display.c
@@ -159,6 +159,9 @@ int xe_display_init_noirq(struct xe_device *xe)
intel_display_device_info_runtime_init(xe);
+ RUNTIME_INFO(xe)->rawclk_freq = intel_read_rawclk(xe);
+ drm_dbg(&xe->drm, "rawclk rate: %d kHz\n", RUNTIME_INFO(xe)->rawclk_freq);
+
err = intel_display_driver_probe_noirq(xe);
if (err) {
intel_opregion_cleanup(display);
--
2.45.2
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 61ebe5a747da649057c37be1c37eb934b4af79ca
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081917-rubber-cable-a9ab@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
61ebe5a747da ("mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0")
88ae5fb755b0 ("mm: vmalloc: enable memory allocation profiling")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 61ebe5a747da649057c37be1c37eb934b4af79ca Mon Sep 17 00:00:00 2001
From: Hailong Liu <hailong.liu(a)oppo.com>
Date: Thu, 8 Aug 2024 20:19:56 +0800
Subject: [PATCH] mm/vmalloc: fix page mapping if vm_area_alloc_pages() with
high order fallback to order 0
The __vmap_pages_range_noflush() assumes its argument pages** contains
pages with the same page shift. However, since commit e9c3cda4d86e ("mm,
vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes
__GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation
failed for high order, the pages** may contain two different page shifts
(high order and order-0). This could lead __vmap_pages_range_noflush() to
perform incorrect mappings, potentially resulting in memory corruption.
Users might encounter this as follows (vmap_allow_huge = true, 2M is for
PMD_SIZE):
kvmalloc(2M, __GFP_NOFAIL|GFP_X)
__vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP)
vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0
vmap_pages_range()
vmap_pages_range_noflush()
__vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens
We can remove the fallback code because if a high-order allocation fails,
__vmalloc_node_range_noprof() will retry with order-0. Therefore, it is
unnecessary to fallback to order-0 here. Therefore, fix this by removing
the fallback code.
Link: https://lkml.kernel.org/r/20240808122019.3361-1-hailong.liu@oppo.com
Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
Signed-off-by: Hailong Liu <hailong.liu(a)oppo.com>
Reported-by: Tangquan Zheng <zhengtangquan(a)oppo.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Acked-by: Barry Song <baohua(a)kernel.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6b783baf12a1..af2de36549d6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
page = alloc_pages_noprof(alloc_gfp, order);
else
page = alloc_pages_node_noprof(nid, alloc_gfp, order);
- if (unlikely(!page)) {
- if (!nofail)
- break;
-
- /* fall back to the zero order allocations */
- alloc_gfp |= __GFP_NOFAIL;
- order = 0;
- continue;
- }
+ if (unlikely(!page))
+ break;
/*
* Higher order allocations must be able to be treated as
Amit and Johan both reported a NULL pointer dereference in the
pmic_glink client code during initialization, and Stephen Boyd pointed
out the problem (race condition).
While investigating, and writing the fix, I noticed that
ucsi_unregister() is called in atomic context but tries to sleep, and I
also noticed that the condition for when to inform the pmic_glink client
drivers when the remote has gone down is just wrong.
So, let's fix all three.
As mentioned in the commit message for the UCSI fix, I have a series in
the works that makes the GLINK callback happen in a sleepable context,
which would remove the need for the clients list to be protected by a
spinlock, and removing the work scheduling. This is however not -rc
material...
In addition to the NULL pointer dereference, there is the -ECANCELED
issue reported here:
https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/
Johan reports that these fixes do not address that issue.
Signed-off-by: Bjorn Andersson <quic_bjorande(a)quicinc.com>
---
Changes in v2:
- Refer to the correct commit in the ucsi_unregister() patch.
- Updated wording in the same commit message about the new error message
in the log.
- Changed the data type of the introduced state variables, opted to go
for a bool as we only represent two states (and I would like to
further clean this up going forward)
- Initialized the spinlock
- Link to v1: https://lore.kernel.org/r/20240818-pmic-glink-v6-11-races-v1-0-f87c577e0bc9…
---
Bjorn Andersson (3):
soc: qcom: pmic_glink: Fix race during initialization
usb: typec: ucsi: Move unregister out of atomic section
soc: qcom: pmic_glink: Actually communicate with remote goes down
drivers/power/supply/qcom_battmgr.c | 16 ++++++++-----
drivers/soc/qcom/pmic_glink.c | 40 ++++++++++++++++++++++----------
drivers/soc/qcom/pmic_glink_altmode.c | 17 +++++++++-----
drivers/usb/typec/ucsi/ucsi_glink.c | 43 ++++++++++++++++++++++++++---------
include/linux/soc/qcom/pmic_glink.h | 11 +++++----
5 files changed, 87 insertions(+), 40 deletions(-)
---
base-commit: 2fd613d27928293eaa87788b10e8befb6805cd42
change-id: 20240818-pmic-glink-v6-11-races-363f5964c339
Best regards,
--
Bjorn Andersson <quic_bjorande(a)quicinc.com>
Amit and Johan both reported a NULL pointer dereference in the
pmic_glink client code during initialization, and Stephen Boyd pointed
out the problem (race condition).
While investigating, and writing the fix, I noticed that
ucsi_unregister() is called in atomic context but tries to sleep, and I
also noticed that the condition for when to inform the pmic_glink client
drivers when the remote has gone down is just wrong.
So, let's fix all three.
As mentioned in the commit message for the UCSI fix, I have a series in
the works that makes the GLINK callback happen in a sleepable context,
which would remove the need for the clients list to be protected by a
spinlock, and removing the work scheduling. This is however not -rc
material...
In addition to the NULL pointer dereference, there is the -ECANCELED
issue reported here:
https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/
I have not yet been able to either reproduce this or convince myself
that this is the same issue.
Signed-off-by: Bjorn Andersson <quic_bjorande(a)quicinc.com>
---
Bjorn Andersson (3):
soc: qcom: pmic_glink: Fix race during initialization
usb: typec: ucsi: Move unregister out of atomic section
soc: qcom: pmic_glink: Actually communicate with remote goes down
drivers/power/supply/qcom_battmgr.c | 16 ++++++++-----
drivers/soc/qcom/pmic_glink.c | 40 +++++++++++++++++++++----------
drivers/soc/qcom/pmic_glink_altmode.c | 17 +++++++++-----
drivers/usb/typec/ucsi/ucsi_glink.c | 44 ++++++++++++++++++++++++++---------
include/linux/soc/qcom/pmic_glink.h | 11 +++++----
5 files changed, 88 insertions(+), 40 deletions(-)
---
base-commit: 296c871d2904cff2b4742702ef94512ab467a8e3
change-id: 20240818-pmic-glink-v6-11-races-363f5964c339
Best regards,
--
Bjorn Andersson <quic_bjorande(a)quicinc.com>
The patch titled
Subject: mm: krealloc: consider spare memory for __GFP_ZERO
has been added to the -mm mm-unstable branch. Its filename is
mm-krealloc-consider-spare-memory-for-__gfp_zero.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Danilo Krummrich <dakr(a)kernel.org>
Subject: mm: krealloc: consider spare memory for __GFP_ZERO
Date: Tue, 13 Aug 2024 00:34:34 +0200
As long as krealloc() is called with __GFP_ZERO consistently, starting
with the initial memory allocation, __GFP_ZERO should be fully honored.
However, if for an existing allocation krealloc() is called with a
decreased size, it is not ensured that the spare portion the allocation is
zeroed. Thus, if krealloc() is subsequently called with a larger size
again, __GFP_ZERO can't be fully honored, since we don't know the previous
size, but only the bucket size.
Example:
buf = kzalloc(64, GFP_KERNEL);
memset(buf, 0xff, 64);
buf = krealloc(buf, 48, GFP_KERNEL | __GFP_ZERO);
/* After this call the last 16 bytes are still 0xff. */
buf = krealloc(buf, 64, GFP_KERNEL | __GFP_ZERO);
Fix this, by explicitly setting spare memory to zero, when shrinking an
allocation with __GFP_ZERO flag set or init_on_alloc enabled.
Link: https://lkml.kernel.org/r/20240812223707.32049-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr(a)kernel.org>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: David Rientjes <rientjes(a)google.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Hyeonggon Yoo <42.hyeyoo(a)gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slab_common.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/mm/slab_common.c~mm-krealloc-consider-spare-memory-for-__gfp_zero
+++ a/mm/slab_common.c
@@ -1273,6 +1273,13 @@ __do_krealloc(const void *p, size_t new_
/* If the object still fits, repoison it precisely. */
if (ks >= new_size) {
+ /* Zero out spare memory. */
+ if (want_init_on_alloc(flags)) {
+ kasan_disable_current();
+ memset((void *)p + new_size, 0, ks - new_size);
+ kasan_enable_current();
+ }
+
p = kasan_krealloc((void *)p, new_size, flags);
return (void *)p;
}
_
Patches currently in -mm which might be from dakr(a)kernel.org are
mm-vmalloc-implement-vrealloc.patch
mm-vmalloc-implement-vrealloc-fix.patch
mm-vmalloc-implement-vrealloc-fix-2.patch
mm-vmalloc-implement-vrealloc-fix-3.patch
mm-vmalloc-implement-vrealloc-fix-4.patch
mm-kvmalloc-align-kvrealloc-with-krealloc.patch
mm-kvmalloc-align-kvrealloc-with-krealloc-fix.patch
mm-kvmalloc-align-kvrealloc-with-krealloc-fix-2.patch
mm-kvmalloc-align-kvrealloc-with-krealloc-fix-3.patch
mm-krealloc-consider-spare-memory-for-__gfp_zero.patch
mm-krealloc-clarify-valid-usage-of-__gfp_zero.patch
The quilt patch titled
Subject: kunit/overflow: fix UB in overflow_allocation_test
has been removed from the -mm tree. Its filename was
kunit-overflow-fix-ub-in-overflow_allocation_test.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Ivan Orlov <ivan.orlov0322(a)gmail.com>
Subject: kunit/overflow: fix UB in overflow_allocation_test
Date: Thu, 15 Aug 2024 01:04:31 +0100
The 'device_name' array doesn't exist out of the
'overflow_allocation_test' function scope. However, it is being used
as a driver name when calling 'kunit_driver_create' from
'kunit_device_register'. It produces the kernel panic with KASAN
enabled due to undefined behaviour.
Since this variable is used in one place only, remove it and pass the
device name into kunit_device_register directly as an ascii string.
Link: https://lkml.kernel.org/r/20240815000431.401869-1-ivan.orlov0322@gmail.com
Fixes: ca90800a91ba ("test_overflow: Add memory allocation overflow tests")
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
Tested-by: Erhard Furtner <erhard_f(a)mailbox.org>
Reviewed-by: David Gow <davidgow(a)google.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/overflow_kunit.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/lib/overflow_kunit.c~kunit-overflow-fix-ub-in-overflow_allocation_test
+++ a/lib/overflow_kunit.c
@@ -668,7 +668,6 @@ DEFINE_TEST_ALLOC(devm_kzalloc, devm_kf
static void overflow_allocation_test(struct kunit *test)
{
- const char device_name[] = "overflow-test";
struct device *dev;
int count = 0;
@@ -678,7 +677,7 @@ static void overflow_allocation_test(str
} while (0)
/* Create dummy device for devm_kmalloc()-family tests. */
- dev = kunit_device_register(test, device_name);
+ dev = kunit_device_register(test, "overflow-test");
KUNIT_ASSERT_FALSE_MSG(test, IS_ERR(dev),
"Cannot register test device\n");
_
Patches currently in -mm which might be from ivan.orlov0322(a)gmail.com are
The check_unaligned_access_emulated() function should have been called
during CPU hotplug to ensure that if all CPUs had emulated unaligned
accesses, the new CPU also does.
This patch adds the call to check_unaligned_access_emulated() in
the hotplug path.
Fixes: 55e0bf49a0d0 ("RISC-V: Probe misaligned access speed in parallel")
Signed-off-by: Jesse Taube <jesse(a)rivosinc.com>
Reviewed-by: Evan Green <evan(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
---
V5 -> V6:
- New patch
V6 -> V7:
- No changes
V7 -> V8:
- Rebase onto fixes
---
arch/riscv/kernel/unaligned_access_speed.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index 160628a2116d..f3508cc54f91 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -191,6 +191,7 @@ static int riscv_online_cpu(unsigned int cpu)
if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
goto exit;
+ check_unaligned_access_emulated(NULL);
buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
if (!buf) {
pr_warn("Allocation failure, not measuring misaligned performance\n");
--
2.45.2
Hello Igor, hello Niklas,
on my NAS I am encountering the following issue since v6.6.44 (LTS),
when executing the hdparm command for my WD-WCC7K4NLX884 drives to get
the active or standby state:
$ hdparm -C /dev/sda
/dev/sda:
SG_IO: bad/missing sense data, sb[]: f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
drive state is: unknown
While the expected output is the following:
$ hdparm -C /dev/sda
/dev/sda:
drive state is: active/idle
I did a bisection within the stable series and found the following
commit to be the first bad one:
28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")
According to kernel.dance the same commit was also backported to the
v6.10.3 and v6.1.103 stable kernels and I could not find any commit or
pending patch with a "Fixes:" tag for the offending commit.
So far I have not been able to test with the mainline kernel as this is
a remote device which I couldn't rescue in case of a boot failure. Also
just for transparency it does have the out of tree ZFS module loaded,
but AFAIU this shouldn't be an issue here, as the commit seems clearly
related to the error. If needed I can test with an untainted mainline
kernel on Friday when I'm near the device.
I have attached the output of hdparm -I below and would be happy to
provide further debug information or test patches.
Cheers,
Christian
---
#regzbot introduced: 28ab9769117c
#regzbot title: ata: libata-scsi: Sense data errors breaking hdparm with WD drives
---
$ pacman -Q hdparm
hdparm 9.65-2
$ hdparm -I /dev/sda
/dev/sda:
ATA device, with non-removable media
Model Number: WDC WD40EFRX-68N32N0
Serial Number: WD-WCC7K4NLX884
Firmware Revision: 82.00A82
Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Used: unknown (minor revision code 0x006d)
Supported: 10 9 8 7 6 5
Likely used: 10
Configuration:
Logical max current
cylinders 16383 0
heads 16 0
sectors/track 63 0
--
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 7814037168
Logical Sector size: 512 bytes
Physical Sector size: 4096 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 3815447 MBytes
device size with M = 1000*1000: 4000787 MBytes (4000 GB)
cache/buffer size = unknown
Form Factor: 3.5 inch
Nominal Media Rotation Rate: 5400
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
Power-Up In Standby feature set
* SET_FEATURES required to spinup after power up
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* 64-bit World wide name
* IDLE_IMMEDIATE with UNLOAD
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* Idle-Unload when NCQ is active
* NCQ priority information
* READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
* DMA Setup Auto-Activate optimization
* Device-initiated interface power management
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
unknown 206[12] (vendor specific)
unknown 206[13] (vendor specific)
* DOWNLOAD MICROCODE DMA command
* WRITE BUFFER DMA command
* READ BUFFER DMA command
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
504min for SECURITY ERASE UNIT. 504min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee2647735a1
NAA : 5
IEEE OUI : 0014ee
Unique ID : 2647735a1
Checksum: correct
Coherent surfaces make only sense if the host renders to them using
accelerated apis. Without 3d the entire content of dumb buffers stays
in the guest making all of the extra work they're doing to synchronize
between guest and host useless.
Configurations without 3d also tend to run with very low graphics
memory limits. The pinned console fb, mob cursors and graphical login
manager tend to run out of 16MB graphics memory that those guests use.
Fix it by making sure the coherent dumb buffers are only used on
configs with 3d enabled.
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Fixes: d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers")
Reported-by: Christian Heusel <christian(a)heusel.eu>
Closes: https://lore.kernel.org/all/0d0330f3-2ac0-4cd5-8075-7f1cbaf72a8e@heusel.eu
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v6.9+
Cc: Maaz Mombasawala <maaz.mombasawala(a)broadcom.com>
Cc: Martin Krastev <martin.krastev(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
index 8ae6a761c900..1625b30d9970 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
@@ -2283,9 +2283,11 @@ int vmw_dumb_create(struct drm_file *file_priv,
/*
* Without mob support we're just going to use raw memory buffer
* because we wouldn't be able to support full surface coherency
- * without mobs
+ * without mobs. There also no reason to support surface coherency
+ * without 3d (i.e. gpu usage on the host) because then all the
+ * contents is going to be rendered guest side.
*/
- if (!dev_priv->has_mob) {
+ if (!dev_priv->has_mob || !vmw_supports_3d(dev_priv)) {
int cpp = DIV_ROUND_UP(args->bpp, 8);
switch (cpp) {
--
2.43.0
In aperture_remove_conflicting_pci_devices(), match the pci
device determine whether or not to call sysfb_disable(). This
fixes cases where the pimary device is not VGA compatible which
leads to the following problem:
1. A PCI device with a non-VGA class is the the boot display
2. That device is probed first and it is not a vga device so
sysfb_disable() is not called, but the device resources
are freed by aperture_detach_platform_device()
3. Non-primary GPU is vga device and it ends up calling sysfb_disable()
4. NULL pointer dereference via sysfb_disable() since the resources
have already been freed by aperture_detach_platform_device() when
it was called by the other device.
Fix this by calling sysfb_disable() on the device associated with it.
Fixes: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device")
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
---
drivers/video/aperture.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 561be8feca96..56a5a0bc2b1a 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -6,6 +6,7 @@
#include <linux/mutex.h>
#include <linux/pci.h>
#include <linux/platform_device.h>
+#include <linux/screen_info.h>
#include <linux/slab.h>
#include <linux/sysfb.h>
#include <linux/types.h>
@@ -346,6 +347,7 @@ EXPORT_SYMBOL(__aperture_remove_legacy_vga_devices);
*/
int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *name)
{
+ struct screen_info *si = &screen_info;
bool primary = false;
resource_size_t base, size;
int bar, ret = 0;
@@ -353,7 +355,7 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
if (pdev == vga_default_device())
primary = true;
- if (primary)
+ if (pdev == screen_info_pci_dev(si))
sysfb_disable();
for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
--
2.45.2
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.
Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.
v2: Update doc strings
Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.19+
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 13 +++++++++++--
drivers/gpu/drm/vmwgfx/vmwgfx_bo.h | 3 +++
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index f42ebc4a7c22..a0e433fbcba6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -360,6 +360,8 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
void *virtual;
int ret;
+ atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, ¬_used);
if (virtual)
return virtual;
@@ -383,11 +385,17 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
*/
void vmw_bo_unmap(struct vmw_bo *vbo)
{
+ int map_count;
+
if (vbo->map.bo == NULL)
return;
- ttm_bo_kunmap(&vbo->map);
- vbo->map.bo = NULL;
+ map_count = atomic_dec_return(&vbo->map_count);
+
+ if (!map_count) {
+ ttm_bo_kunmap(&vbo->map);
+ vbo->map.bo = NULL;
+ }
}
@@ -421,6 +429,7 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
vmw_bo->tbo.priority = 3;
vmw_bo->res_tree = RB_ROOT;
xa_init(&vmw_bo->detached_resources);
+ atomic_set(&vmw_bo->map_count, 0);
params->size = ALIGN(params->size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->tbo.base, params->size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 62b4342d5f7c..43b5439ec9f7 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -71,6 +71,8 @@ struct vmw_bo_params {
* @map: Kmap object for semi-persistent mappings
* @res_tree: RB tree of resources using this buffer object as a backing MOB
* @res_prios: Eviction priority counts for attached resources
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
* @cpu_writers: Number of synccpu write grabs. Protected by reservation when
* increased. May be decreased without reservation.
* @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -90,6 +92,7 @@ struct vmw_bo {
u32 res_prios[TTM_MAX_BO_PRIORITY];
struct xarray detached_resources;
+ atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted. Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
--
2.43.0
Some FSA4480-compatible chips like the OCP96011 used on Fairphone 5
return 0x00 from the CHIP_ID register. Handle that gracefully and only
fail probe when the I2C read has failed.
With this the dev_dbg will print 0 but otherwise continue working.
[ 0.251581] fsa4480 1-0042: Found FSA4480 v0.0 (Vendor ID = 0)
Cc: stable(a)vger.kernel.org
Fixes: e885f5f1f2b4 ("usb: typec: fsa4480: Check if the chip is really there")
Signed-off-by: Luca Weiss <luca.weiss(a)fairphone.com>
---
drivers/usb/typec/mux/fsa4480.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/typec/mux/fsa4480.c b/drivers/usb/typec/mux/fsa4480.c
index cd235339834b..f71dba8bf07c 100644
--- a/drivers/usb/typec/mux/fsa4480.c
+++ b/drivers/usb/typec/mux/fsa4480.c
@@ -274,7 +274,7 @@ static int fsa4480_probe(struct i2c_client *client)
return dev_err_probe(dev, PTR_ERR(fsa->regmap), "failed to initialize regmap\n");
ret = regmap_read(fsa->regmap, FSA4480_DEVICE_ID, &val);
- if (ret || !val)
+ if (ret)
return dev_err_probe(dev, -ENODEV, "FSA4480 not found\n");
dev_dbg(dev, "Found FSA4480 v%lu.%lu (Vendor ID = %lu)\n",
---
base-commit: ccdbf91fdf5a71881ef32b41797382c4edd6f670
change-id: 20240818-fsa4480-chipid-fix-2c7cf5810135
Best regards,
--
Luca Weiss <luca.weiss(a)fairphone.com>
Failing to read out rawclk makes it impossible to read out backlight,
which results in backlight not working when the backlight is off during
boot, or when reloading the module.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/display/xe_display.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
index ca4468c820788..65d7a5040270e 100644
--- a/drivers/gpu/drm/xe/display/xe_display.c
+++ b/drivers/gpu/drm/xe/display/xe_display.c
@@ -157,6 +157,9 @@ int xe_display_init_noirq(struct xe_device *xe)
intel_display_device_info_runtime_init(xe);
+ RUNTIME_INFO(xe)->rawclk_freq = intel_read_rawclk(xe);
+ drm_dbg(&xe->drm, "rawclk rate: %d kHz\n", RUNTIME_INFO(xe)->rawclk_freq);
+
err = intel_display_driver_probe_noirq(xe);
if (err) {
intel_opregion_cleanup(xe);
--
2.45.2
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081926-dumpling-ecard-9941@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 3b5bbe798b2451820e74243b738268f51901e7d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081916-hastily-apostle-7025@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
3b5bbe798b24 ("pidfd: prevent creation of pidfds for kthreads")
83b290c9e3b5 ("pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together")
64bef697d33b ("pidfd: implement PIDFD_THREAD flag for pidfd_open()")
21e25205d7f9 ("pidfd: don't do_notify_pidfd() if !thread_group_empty()")
cdefbf2324ce ("pidfd: cleanup the usage of __pidfd_prepare's flags")
932562a6045e ("rseq: Split out rseq.h from sched.h")
cba6167f0adb ("restart_block: Trim includes")
f038cc1379c0 ("locking/seqlock: Split out seqlock_types.h")
53d31ba842d9 ("posix-cpu-timers: Split out posix-timers_types.h")
f995443f01b4 ("locking/seqlock: Simplify SEQCOUNT_LOCKNAME()")
58390c8ce1bd ("Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3b5bbe798b2451820e74243b738268f51901e7d0 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Wed, 31 Jul 2024 12:01:12 +0200
Subject: [PATCH] pidfd: prevent creation of pidfds for kthreads
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.
Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/kernel/fork.c b/kernel/fork.c
index cc760491f201..18bdc87209d0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2053,11 +2053,24 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
*/
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
{
- bool thread = flags & PIDFD_THREAD;
-
- if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
+ if (!pid)
return -EINVAL;
+ scoped_guard(rcu) {
+ struct task_struct *tsk;
+
+ if (flags & PIDFD_THREAD)
+ tsk = pid_task(pid, PIDTYPE_PID);
+ else
+ tsk = pid_task(pid, PIDTYPE_TGID);
+ if (!tsk)
+ return -EINVAL;
+
+ /* Don't create pidfds for kernel threads for now. */
+ if (tsk->flags & PF_KTHREAD)
+ return -EINVAL;
+ }
+
return __pidfd_prepare(pid, flags, ret);
}
@@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
if (clone_flags & CLONE_PIDFD) {
int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+ /* Don't create pidfds for kernel threads for now. */
+ if (args->kthread) {
+ retval = -EINVAL;
+ goto bad_fork_free_pid;
+ }
+
/* Note that no task has been attached to @pid yet. */
retval = __pidfd_prepare(pid, flags, &pidfile);
if (retval < 0)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 3b5bbe798b2451820e74243b738268f51901e7d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081915-sample-happening-317a@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
3b5bbe798b24 ("pidfd: prevent creation of pidfds for kthreads")
83b290c9e3b5 ("pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together")
64bef697d33b ("pidfd: implement PIDFD_THREAD flag for pidfd_open()")
21e25205d7f9 ("pidfd: don't do_notify_pidfd() if !thread_group_empty()")
cdefbf2324ce ("pidfd: cleanup the usage of __pidfd_prepare's flags")
932562a6045e ("rseq: Split out rseq.h from sched.h")
cba6167f0adb ("restart_block: Trim includes")
f038cc1379c0 ("locking/seqlock: Split out seqlock_types.h")
53d31ba842d9 ("posix-cpu-timers: Split out posix-timers_types.h")
f995443f01b4 ("locking/seqlock: Simplify SEQCOUNT_LOCKNAME()")
58390c8ce1bd ("Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3b5bbe798b2451820e74243b738268f51901e7d0 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Wed, 31 Jul 2024 12:01:12 +0200
Subject: [PATCH] pidfd: prevent creation of pidfds for kthreads
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.
Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/kernel/fork.c b/kernel/fork.c
index cc760491f201..18bdc87209d0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2053,11 +2053,24 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
*/
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
{
- bool thread = flags & PIDFD_THREAD;
-
- if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
+ if (!pid)
return -EINVAL;
+ scoped_guard(rcu) {
+ struct task_struct *tsk;
+
+ if (flags & PIDFD_THREAD)
+ tsk = pid_task(pid, PIDTYPE_PID);
+ else
+ tsk = pid_task(pid, PIDTYPE_TGID);
+ if (!tsk)
+ return -EINVAL;
+
+ /* Don't create pidfds for kernel threads for now. */
+ if (tsk->flags & PF_KTHREAD)
+ return -EINVAL;
+ }
+
return __pidfd_prepare(pid, flags, ret);
}
@@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
if (clone_flags & CLONE_PIDFD) {
int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+ /* Don't create pidfds for kernel threads for now. */
+ if (args->kthread) {
+ retval = -EINVAL;
+ goto bad_fork_free_pid;
+ }
+
/* Note that no task has been attached to @pid yet. */
retval = __pidfd_prepare(pid, flags, &pidfile);
if (retval < 0)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 3b5bbe798b2451820e74243b738268f51901e7d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081910-grid-bobsled-6cb3@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
3b5bbe798b24 ("pidfd: prevent creation of pidfds for kthreads")
83b290c9e3b5 ("pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together")
64bef697d33b ("pidfd: implement PIDFD_THREAD flag for pidfd_open()")
21e25205d7f9 ("pidfd: don't do_notify_pidfd() if !thread_group_empty()")
cdefbf2324ce ("pidfd: cleanup the usage of __pidfd_prepare's flags")
932562a6045e ("rseq: Split out rseq.h from sched.h")
cba6167f0adb ("restart_block: Trim includes")
f038cc1379c0 ("locking/seqlock: Split out seqlock_types.h")
53d31ba842d9 ("posix-cpu-timers: Split out posix-timers_types.h")
f995443f01b4 ("locking/seqlock: Simplify SEQCOUNT_LOCKNAME()")
58390c8ce1bd ("Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3b5bbe798b2451820e74243b738268f51901e7d0 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Wed, 31 Jul 2024 12:01:12 +0200
Subject: [PATCH] pidfd: prevent creation of pidfds for kthreads
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.
Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/kernel/fork.c b/kernel/fork.c
index cc760491f201..18bdc87209d0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2053,11 +2053,24 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
*/
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
{
- bool thread = flags & PIDFD_THREAD;
-
- if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
+ if (!pid)
return -EINVAL;
+ scoped_guard(rcu) {
+ struct task_struct *tsk;
+
+ if (flags & PIDFD_THREAD)
+ tsk = pid_task(pid, PIDTYPE_PID);
+ else
+ tsk = pid_task(pid, PIDTYPE_TGID);
+ if (!tsk)
+ return -EINVAL;
+
+ /* Don't create pidfds for kernel threads for now. */
+ if (tsk->flags & PF_KTHREAD)
+ return -EINVAL;
+ }
+
return __pidfd_prepare(pid, flags, ret);
}
@@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
if (clone_flags & CLONE_PIDFD) {
int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+ /* Don't create pidfds for kernel threads for now. */
+ if (args->kthread) {
+ retval = -EINVAL;
+ goto bad_fork_free_pid;
+ }
+
/* Note that no task has been attached to @pid yet. */
retval = __pidfd_prepare(pid, flags, &pidfile);
if (retval < 0)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 3b5bbe798b2451820e74243b738268f51901e7d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081909-comment-hypocrisy-a802@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
3b5bbe798b24 ("pidfd: prevent creation of pidfds for kthreads")
83b290c9e3b5 ("pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together")
64bef697d33b ("pidfd: implement PIDFD_THREAD flag for pidfd_open()")
21e25205d7f9 ("pidfd: don't do_notify_pidfd() if !thread_group_empty()")
cdefbf2324ce ("pidfd: cleanup the usage of __pidfd_prepare's flags")
932562a6045e ("rseq: Split out rseq.h from sched.h")
cba6167f0adb ("restart_block: Trim includes")
f038cc1379c0 ("locking/seqlock: Split out seqlock_types.h")
53d31ba842d9 ("posix-cpu-timers: Split out posix-timers_types.h")
f995443f01b4 ("locking/seqlock: Simplify SEQCOUNT_LOCKNAME()")
58390c8ce1bd ("Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3b5bbe798b2451820e74243b738268f51901e7d0 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Wed, 31 Jul 2024 12:01:12 +0200
Subject: [PATCH] pidfd: prevent creation of pidfds for kthreads
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.
Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/kernel/fork.c b/kernel/fork.c
index cc760491f201..18bdc87209d0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2053,11 +2053,24 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
*/
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
{
- bool thread = flags & PIDFD_THREAD;
-
- if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
+ if (!pid)
return -EINVAL;
+ scoped_guard(rcu) {
+ struct task_struct *tsk;
+
+ if (flags & PIDFD_THREAD)
+ tsk = pid_task(pid, PIDTYPE_PID);
+ else
+ tsk = pid_task(pid, PIDTYPE_TGID);
+ if (!tsk)
+ return -EINVAL;
+
+ /* Don't create pidfds for kernel threads for now. */
+ if (tsk->flags & PF_KTHREAD)
+ return -EINVAL;
+ }
+
return __pidfd_prepare(pid, flags, ret);
}
@@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
if (clone_flags & CLONE_PIDFD) {
int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+ /* Don't create pidfds for kernel threads for now. */
+ if (args->kthread) {
+ retval = -EINVAL;
+ goto bad_fork_free_pid;
+ }
+
/* Note that no task has been attached to @pid yet. */
retval = __pidfd_prepare(pid, flags, &pidfile);
if (retval < 0)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 3b5bbe798b2451820e74243b738268f51901e7d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081908-dork-steadfast-6a68@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
3b5bbe798b24 ("pidfd: prevent creation of pidfds for kthreads")
83b290c9e3b5 ("pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together")
64bef697d33b ("pidfd: implement PIDFD_THREAD flag for pidfd_open()")
21e25205d7f9 ("pidfd: don't do_notify_pidfd() if !thread_group_empty()")
cdefbf2324ce ("pidfd: cleanup the usage of __pidfd_prepare's flags")
932562a6045e ("rseq: Split out rseq.h from sched.h")
cba6167f0adb ("restart_block: Trim includes")
f038cc1379c0 ("locking/seqlock: Split out seqlock_types.h")
53d31ba842d9 ("posix-cpu-timers: Split out posix-timers_types.h")
f995443f01b4 ("locking/seqlock: Simplify SEQCOUNT_LOCKNAME()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3b5bbe798b2451820e74243b738268f51901e7d0 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Wed, 31 Jul 2024 12:01:12 +0200
Subject: [PATCH] pidfd: prevent creation of pidfds for kthreads
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.
Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/kernel/fork.c b/kernel/fork.c
index cc760491f201..18bdc87209d0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2053,11 +2053,24 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
*/
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
{
- bool thread = flags & PIDFD_THREAD;
-
- if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
+ if (!pid)
return -EINVAL;
+ scoped_guard(rcu) {
+ struct task_struct *tsk;
+
+ if (flags & PIDFD_THREAD)
+ tsk = pid_task(pid, PIDTYPE_PID);
+ else
+ tsk = pid_task(pid, PIDTYPE_TGID);
+ if (!tsk)
+ return -EINVAL;
+
+ /* Don't create pidfds for kernel threads for now. */
+ if (tsk->flags & PF_KTHREAD)
+ return -EINVAL;
+ }
+
return __pidfd_prepare(pid, flags, ret);
}
@@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
if (clone_flags & CLONE_PIDFD) {
int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+ /* Don't create pidfds for kernel threads for now. */
+ if (args->kthread) {
+ retval = -EINVAL;
+ goto bad_fork_free_pid;
+ }
+
/* Note that no task has been attached to @pid yet. */
retval = __pidfd_prepare(pid, flags, &pidfile);
if (retval < 0)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 737222cebecbdbcdde2b69475c52bcb9ecfeb830
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081937-granular-passably-9226@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
737222cebecb ("drm/amd/display: fix cursor offset on rotation 180")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 737222cebecbdbcdde2b69475c52bcb9ecfeb830 Mon Sep 17 00:00:00 2001
From: Melissa Wen <mwen(a)igalia.com>
Date: Tue, 31 Jan 2023 15:05:46 -0100
Subject: [PATCH] drm/amd/display: fix cursor offset on rotation 180
[why & how]
Cursor gets clipped off in the middle of the screen with hw
rotation 180. Fix a miscalculation of cursor offset when it's
placed near the edges in the pipe split case.
Cursor bugs with hw rotation were reported on AMD issue
tracker:
https://gitlab.freedesktop.org/drm/amd/-/issues/2247
The issues on rotation 270 was fixed by:
https://lore.kernel.org/amd-gfx/20221118125935.4013669-22-Brian.Chang@amd.c…
that partially addressed the rotation 180 too. So, this patch is the
final bits for rotation 180.
Reported-by: Xaver Hugl <xaver.hugl(a)gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2247
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
Signed-off-by: Melissa Wen <mwen(a)igalia.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 1fd2cf090096af8a25bf85564341cfc21cec659d)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index ff03b1d98aa7..1b9ac8812f5b 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -3589,7 +3589,7 @@ void dcn10_set_cursor_position(struct pipe_ctx *pipe_ctx)
(int)hubp->curs_attr.width || pos_cpy.x
<= (int)hubp->curs_attr.width +
pipe_ctx->plane_state->src_rect.x) {
- pos_cpy.x = temp_x + viewport_width;
+ pos_cpy.x = 2 * viewport_width - temp_x;
}
}
} else {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 737222cebecbdbcdde2b69475c52bcb9ecfeb830
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081932-diabetes-etching-e527@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
737222cebecb ("drm/amd/display: fix cursor offset on rotation 180")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 737222cebecbdbcdde2b69475c52bcb9ecfeb830 Mon Sep 17 00:00:00 2001
From: Melissa Wen <mwen(a)igalia.com>
Date: Tue, 31 Jan 2023 15:05:46 -0100
Subject: [PATCH] drm/amd/display: fix cursor offset on rotation 180
[why & how]
Cursor gets clipped off in the middle of the screen with hw
rotation 180. Fix a miscalculation of cursor offset when it's
placed near the edges in the pipe split case.
Cursor bugs with hw rotation were reported on AMD issue
tracker:
https://gitlab.freedesktop.org/drm/amd/-/issues/2247
The issues on rotation 270 was fixed by:
https://lore.kernel.org/amd-gfx/20221118125935.4013669-22-Brian.Chang@amd.c…
that partially addressed the rotation 180 too. So, this patch is the
final bits for rotation 180.
Reported-by: Xaver Hugl <xaver.hugl(a)gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2247
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
Signed-off-by: Melissa Wen <mwen(a)igalia.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 1fd2cf090096af8a25bf85564341cfc21cec659d)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index ff03b1d98aa7..1b9ac8812f5b 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -3589,7 +3589,7 @@ void dcn10_set_cursor_position(struct pipe_ctx *pipe_ctx)
(int)hubp->curs_attr.width || pos_cpy.x
<= (int)hubp->curs_attr.width +
pipe_ctx->plane_state->src_rect.x) {
- pos_cpy.x = temp_x + viewport_width;
+ pos_cpy.x = 2 * viewport_width - temp_x;
}
}
} else {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 0dbb81d44108a2a1004e5b485ef3fca5bc078424
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081907-uptight-blah-bb36@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
0dbb81d44108 ("drm/amd/display: Enable otg synchronization logic for DCN321")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0dbb81d44108a2a1004e5b485ef3fca5bc078424 Mon Sep 17 00:00:00 2001
From: Loan Chen <lo-an.chen(a)amd.com>
Date: Fri, 2 Aug 2024 13:57:40 +0800
Subject: [PATCH] drm/amd/display: Enable otg synchronization logic for DCN321
[Why]
Tiled display cannot synchronize properly after S3.
The fix for commit 5f0c74915815 ("drm/amd/display: Fix for otg
synchronization logic") is not enable in DCN321, which causes
the otg is excluded from synchronization.
[How]
Enable otg synchronization logic in dcn321.
Fixes: 5f0c74915815 ("drm/amd/display: Fix for otg synchronization logic")
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Reviewed-by: Alvin Lee <alvin.lee2(a)amd.com>
Signed-off-by: Loan Chen <lo-an.chen(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit d6ed53712f583423db61fbb802606759e023bf7b)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn321/dcn321_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn321/dcn321_resource.c
index 9a3cc0514a36..8e0588b1cf30 100644
--- a/drivers/gpu/drm/amd/display/dc/resource/dcn321/dcn321_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/resource/dcn321/dcn321_resource.c
@@ -1778,6 +1778,9 @@ static bool dcn321_resource_construct(
dc->caps.color.mpc.ogam_rom_caps.hlg = 0;
dc->caps.color.mpc.ocsc = 1;
+ /* Use pipe context based otg sync logic */
+ dc->config.use_pipe_ctx_sync_logic = true;
+
dc->config.dc_mode_clk_limit_support = true;
dc->config.enable_windowed_mpo_odm = true;
/* read VBIOS LTTPR caps */
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 56fb276d0244d430496f249335a44ae114dd5f54
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081912-custodian-handgrip-81cc@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
56fb276d0244 ("drm/amd/display: Adjust cursor position")
e53524cdcc02 ("drm/amd/display: Refactor HWSS into component folder")
6e2c4941ce0c ("drm/amd/display: Move dml code under CONFIG_DRM_AMD_DC_FP guard")
1cb87e048975 ("drm/amd/display: Add DCN35 blocks to Makefile")
0fa45b6aeae4 ("drm/amd/display: Add DCN35 Resource")
ec129fa356be ("drm/amd/display: Add DCN35 init")
6f8b7565cca4 ("drm/amd/display: Add DCN35 HWSEQ")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 56fb276d0244d430496f249335a44ae114dd5f54 Mon Sep 17 00:00:00 2001
From: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Date: Thu, 1 Aug 2024 16:16:35 -0600
Subject: [PATCH] drm/amd/display: Adjust cursor position
[why & how]
When the commit 9d84c7ef8a87 ("drm/amd/display: Correct cursor position
on horizontal mirror") was introduced, it used the wrong calculation for
the position copy for X. This commit uses the correct calculation for that
based on the original patch.
Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Acked-by: Wayne Lin <wayne.lin(a)amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 8f9b23abbae5ffcd64856facd26a86b67195bc2f)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index 1b9ac8812f5b..14a902ff3b8a 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -3682,7 +3682,7 @@ void dcn10_set_cursor_position(struct pipe_ctx *pipe_ctx)
(int)hubp->curs_attr.width || pos_cpy.x
<= (int)hubp->curs_attr.width +
pipe_ctx->plane_state->src_rect.x) {
- pos_cpy.x = 2 * viewport_width - temp_x;
+ pos_cpy.x = temp_x + viewport_width;
}
}
} else {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 56fb276d0244d430496f249335a44ae114dd5f54
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081911-spending-purchase-3a01@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
56fb276d0244 ("drm/amd/display: Adjust cursor position")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 56fb276d0244d430496f249335a44ae114dd5f54 Mon Sep 17 00:00:00 2001
From: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Date: Thu, 1 Aug 2024 16:16:35 -0600
Subject: [PATCH] drm/amd/display: Adjust cursor position
[why & how]
When the commit 9d84c7ef8a87 ("drm/amd/display: Correct cursor position
on horizontal mirror") was introduced, it used the wrong calculation for
the position copy for X. This commit uses the correct calculation for that
based on the original patch.
Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Acked-by: Wayne Lin <wayne.lin(a)amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 8f9b23abbae5ffcd64856facd26a86b67195bc2f)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index 1b9ac8812f5b..14a902ff3b8a 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -3682,7 +3682,7 @@ void dcn10_set_cursor_position(struct pipe_ctx *pipe_ctx)
(int)hubp->curs_attr.width || pos_cpy.x
<= (int)hubp->curs_attr.width +
pipe_ctx->plane_state->src_rect.x) {
- pos_cpy.x = 2 * viewport_width - temp_x;
+ pos_cpy.x = temp_x + viewport_width;
}
}
} else {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 807174a93d24c456503692dc3f5af322ee0b640a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081913-outbid-skincare-1e41@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
807174a93d24 ("mm: fix endless reclaim on machines with unaccepted memory")
57c0419c5f0e ("mm, pcp: decrease PCP high if free pages < high watermark")
51a755c56dc0 ("mm: tune PCP high automatically")
90b41691b988 ("mm: add framework for PCP high auto-tuning")
c0a242394cb9 ("mm, page_alloc: scale the number of pages that are batch allocated")
52166607ecc9 ("mm: restrict the pcp batch scale factor to avoid too long latency")
362d37a106dd ("mm, pcp: reduce lock contention for draining high-order pages")
94a3bfe4073c ("cacheinfo: calculate size of per-CPU data cache slice")
ca71fe1ad922 ("mm, pcp: avoid to drain PCP when process exit")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 807174a93d24c456503692dc3f5af322ee0b640a Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Date: Fri, 9 Aug 2024 14:48:47 +0300
Subject: [PATCH] mm: fix endless reclaim on machines with unaccepted memory
Unaccepted memory is considered unusable free memory, which is not counted
as free on the zone watermark check. This causes get_page_from_freelist()
to accept more memory to hit the high watermark, but it creates problems
in the reclaim path.
The reclaim path encounters a failed zone watermark check and attempts to
reclaim memory. This is usually successful, but if there is little or no
reclaimable memory, it can result in endless reclaim with little to no
progress. This can occur early in the boot process, just after start of
the init process when the only reclaimable memory is the page cache of the
init executable and its libraries.
Make unaccepted memory free from watermark check point of view. This way
unaccepted memory will never be the trigger of memory reclaim. Accept
more memory in the get_page_from_freelist() if needed.
Link: https://lkml.kernel.org/r/20240809114854.3745464-2-kirill.shutemov@linux.in…
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reported-by: Jianxiong Gao <jxgao(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Tested-by: Jianxiong Gao <jxgao(a)google.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 875d76e8684a..8747087acee3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -287,7 +287,7 @@ EXPORT_SYMBOL(nr_online_nodes);
static bool page_contains_unaccepted(struct page *page, unsigned int order);
static void accept_page(struct page *page, unsigned int order);
-static bool try_to_accept_memory(struct zone *zone, unsigned int order);
+static bool cond_accept_memory(struct zone *zone, unsigned int order);
static inline bool has_unaccepted_memory(void);
static bool __free_unaccepted(struct page *page);
@@ -3072,9 +3072,6 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
if (!(alloc_flags & ALLOC_CMA))
unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
#endif
-#ifdef CONFIG_UNACCEPTED_MEMORY
- unusable_free += zone_page_state(z, NR_UNACCEPTED);
-#endif
return unusable_free;
}
@@ -3368,6 +3365,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
}
}
+ cond_accept_memory(zone, order);
+
/*
* Detect whether the number of free pages is below high
* watermark. If so, we will decrease pcp->high and free
@@ -3393,10 +3392,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
gfp_mask)) {
int ret;
- if (has_unaccepted_memory()) {
- if (try_to_accept_memory(zone, order))
- goto try_this_zone;
- }
+ if (cond_accept_memory(zone, order))
+ goto try_this_zone;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/*
@@ -3450,10 +3447,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
return page;
} else {
- if (has_unaccepted_memory()) {
- if (try_to_accept_memory(zone, order))
- goto try_this_zone;
- }
+ if (cond_accept_memory(zone, order))
+ goto try_this_zone;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/* Try again if zone has deferred pages */
@@ -6950,9 +6945,6 @@ static bool try_to_accept_memory_one(struct zone *zone)
struct page *page;
bool last;
- if (list_empty(&zone->unaccepted_pages))
- return false;
-
spin_lock_irqsave(&zone->lock, flags);
page = list_first_entry_or_null(&zone->unaccepted_pages,
struct page, lru);
@@ -6978,23 +6970,29 @@ static bool try_to_accept_memory_one(struct zone *zone)
return true;
}
-static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order)
{
long to_accept;
- int ret = false;
+ bool ret = false;
+
+ if (!has_unaccepted_memory())
+ return false;
+
+ if (list_empty(&zone->unaccepted_pages))
+ return false;
/* How much to accept to get to high watermark? */
to_accept = high_wmark_pages(zone) -
(zone_page_state(zone, NR_FREE_PAGES) -
- __zone_watermark_unusable_free(zone, order, 0));
+ __zone_watermark_unusable_free(zone, order, 0) -
+ zone_page_state(zone, NR_UNACCEPTED));
- /* Accept at least one page */
- do {
+ while (to_accept > 0) {
if (!try_to_accept_memory_one(zone))
break;
ret = true;
to_accept -= MAX_ORDER_NR_PAGES;
- } while (to_accept > 0);
+ }
return ret;
}
@@ -7037,7 +7035,7 @@ static void accept_page(struct page *page, unsigned int order)
{
}
-static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order)
{
return false;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081935-borax-concerned-4bcc@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
6dd1e4c045af ("selinux: add the processing of the failure of avc_add_xperms_decision()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2 Mon Sep 17 00:00:00 2001
From: Zhen Lei <thunder.leizhen(a)huawei.com>
Date: Wed, 7 Aug 2024 17:00:56 +0800
Subject: [PATCH] selinux: add the processing of the failure of
avc_add_xperms_decision()
When avc_add_xperms_decision() fails, the information recorded by the new
avc node is incomplete. In this case, the new avc node should be released
instead of replacing the old avc node.
Cc: stable(a)vger.kernel.org
Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Suggested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 7087cd2b802d..b49c44869dc4 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -907,7 +907,11 @@ static int avc_update_node(u32 event, u32 perms, u8 driver, u8 xperm, u32 ssid,
node->ae.avd.auditdeny &= ~perms;
break;
case AVC_CALLBACK_ADD_XPERMS:
- avc_add_xperms_decision(node, xpd);
+ rc = avc_add_xperms_decision(node, xpd);
+ if (rc) {
+ avc_node_kill(node);
+ goto out_unlock;
+ }
break;
}
avc_node_replace(node, orig);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081933-audience-hedging-fac5@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
6dd1e4c045af ("selinux: add the processing of the failure of avc_add_xperms_decision()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2 Mon Sep 17 00:00:00 2001
From: Zhen Lei <thunder.leizhen(a)huawei.com>
Date: Wed, 7 Aug 2024 17:00:56 +0800
Subject: [PATCH] selinux: add the processing of the failure of
avc_add_xperms_decision()
When avc_add_xperms_decision() fails, the information recorded by the new
avc node is incomplete. In this case, the new avc node should be released
instead of replacing the old avc node.
Cc: stable(a)vger.kernel.org
Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Suggested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 7087cd2b802d..b49c44869dc4 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -907,7 +907,11 @@ static int avc_update_node(u32 event, u32 perms, u8 driver, u8 xperm, u32 ssid,
node->ae.avd.auditdeny &= ~perms;
break;
case AVC_CALLBACK_ADD_XPERMS:
- avc_add_xperms_decision(node, xpd);
+ rc = avc_add_xperms_decision(node, xpd);
+ if (rc) {
+ avc_node_kill(node);
+ goto out_unlock;
+ }
break;
}
avc_node_replace(node, orig);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081934-daffodil-dingy-d012@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
6dd1e4c045af ("selinux: add the processing of the failure of avc_add_xperms_decision()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2 Mon Sep 17 00:00:00 2001
From: Zhen Lei <thunder.leizhen(a)huawei.com>
Date: Wed, 7 Aug 2024 17:00:56 +0800
Subject: [PATCH] selinux: add the processing of the failure of
avc_add_xperms_decision()
When avc_add_xperms_decision() fails, the information recorded by the new
avc node is incomplete. In this case, the new avc node should be released
instead of replacing the old avc node.
Cc: stable(a)vger.kernel.org
Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Suggested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 7087cd2b802d..b49c44869dc4 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -907,7 +907,11 @@ static int avc_update_node(u32 event, u32 perms, u8 driver, u8 xperm, u32 ssid,
node->ae.avd.auditdeny &= ~perms;
break;
case AVC_CALLBACK_ADD_XPERMS:
- avc_add_xperms_decision(node, xpd);
+ rc = avc_add_xperms_decision(node, xpd);
+ if (rc) {
+ avc_node_kill(node);
+ goto out_unlock;
+ }
break;
}
avc_node_replace(node, orig);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081933-sulphuric-enamel-7da6@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
6dd1e4c045af ("selinux: add the processing of the failure of avc_add_xperms_decision()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2 Mon Sep 17 00:00:00 2001
From: Zhen Lei <thunder.leizhen(a)huawei.com>
Date: Wed, 7 Aug 2024 17:00:56 +0800
Subject: [PATCH] selinux: add the processing of the failure of
avc_add_xperms_decision()
When avc_add_xperms_decision() fails, the information recorded by the new
avc node is incomplete. In this case, the new avc node should be released
instead of replacing the old avc node.
Cc: stable(a)vger.kernel.org
Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Suggested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 7087cd2b802d..b49c44869dc4 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -907,7 +907,11 @@ static int avc_update_node(u32 event, u32 perms, u8 driver, u8 xperm, u32 ssid,
node->ae.avd.auditdeny &= ~perms;
break;
case AVC_CALLBACK_ADD_XPERMS:
- avc_add_xperms_decision(node, xpd);
+ rc = avc_add_xperms_decision(node, xpd);
+ if (rc) {
+ avc_node_kill(node);
+ goto out_unlock;
+ }
break;
}
avc_node_replace(node, orig);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081932-idealism-parabola-3435@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
6dd1e4c045af ("selinux: add the processing of the failure of avc_add_xperms_decision()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2 Mon Sep 17 00:00:00 2001
From: Zhen Lei <thunder.leizhen(a)huawei.com>
Date: Wed, 7 Aug 2024 17:00:56 +0800
Subject: [PATCH] selinux: add the processing of the failure of
avc_add_xperms_decision()
When avc_add_xperms_decision() fails, the information recorded by the new
avc node is incomplete. In this case, the new avc node should be released
instead of replacing the old avc node.
Cc: stable(a)vger.kernel.org
Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Suggested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 7087cd2b802d..b49c44869dc4 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -907,7 +907,11 @@ static int avc_update_node(u32 event, u32 perms, u8 driver, u8 xperm, u32 ssid,
node->ae.avd.auditdeny &= ~perms;
break;
case AVC_CALLBACK_ADD_XPERMS:
- avc_add_xperms_decision(node, xpd);
+ rc = avc_add_xperms_decision(node, xpd);
+ if (rc) {
+ avc_node_kill(node);
+ goto out_unlock;
+ }
break;
}
avc_node_replace(node, orig);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e30729d4bd4001881be4d1ad4332a5d4985398f8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081951-enrich-hesitate-0db0@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
e30729d4bd40 ("btrfs: zoned: properly take lock to read/update block group's zoned variables")
8cd44dd1d17a ("btrfs: zoned: fix zone_unusable accounting on making block group read-write again")
b9fd2affe4aa ("btrfs: zoned: fix initial free space detection")
6a8ebc773ef6 ("btrfs: zoned: no longer count fresh BG region as zone unusable")
fa2068d7e922 ("btrfs: zoned: count fresh BG region as zone unusable")
3349b57fd47b ("btrfs: convert block group bit field to use bit helpers")
9d4b0a129a0d ("btrfs: simplify arguments of btrfs_update_space_info and rename")
6ca64ac27631 ("btrfs: zoned: fix mounting with conventional zones")
ced8ecf026fd ("btrfs: fix space cache corruption and potential double allocations")
b09315139136 ("btrfs: zoned: activate metadata block group on flush_space")
6a921de58992 ("btrfs: zoned: introduce space_info->active_total_bytes")
393f646e34c1 ("btrfs: zoned: finish least available block group on data bg allocation")
bb9950d3df71 ("btrfs: let can_allocate_chunk return error")
f6fca3917b4d ("btrfs: store chunk size in space-info struct")
b8bea09a456f ("btrfs: add trace event for submitted RAID56 bio")
c67c68eb57f1 ("btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and finish_pbitmap")
143823cf4d5a ("btrfs: fix typos in comments")
b3a3b0255797 ("btrfs: zoned: drop optimization of zone finish")
343d8a30851c ("btrfs: zoned: prevent allocation from previous data relocation BG")
d70cbdda75da ("btrfs: zoned: consolidate zone finish functions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e30729d4bd4001881be4d1ad4332a5d4985398f8 Mon Sep 17 00:00:00 2001
From: Naohiro Aota <naohiro.aota(a)wdc.com>
Date: Thu, 1 Aug 2024 16:47:52 +0900
Subject: [PATCH] btrfs: zoned: properly take lock to read/update block group's
zoned variables
__btrfs_add_free_space_zoned() references and modifies bg's alloc_offset,
ro, and zone_unusable, but without taking the lock. It is mostly safe
because they monotonically increase (at least for now) and this function is
mostly called by a transaction commit, which is serialized by itself.
Still, taking the lock is a safer and correct option and I'm going to add a
change to reset zone_unusable while a block group is still alive. So, add
locking around the operations.
Fixes: 169e0da91a21 ("btrfs: zoned: track unusable bytes for zones")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota(a)wdc.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index f5996a43db24..eaa1dbd31352 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2697,15 +2697,16 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
u64 offset = bytenr - block_group->start;
u64 to_free, to_unusable;
int bg_reclaim_threshold = 0;
- bool initial = ((size == block_group->length) && (block_group->alloc_offset == 0));
+ bool initial;
u64 reclaimable_unusable;
- WARN_ON(!initial && offset + size > block_group->zone_capacity);
+ spin_lock(&block_group->lock);
+ initial = ((size == block_group->length) && (block_group->alloc_offset == 0));
+ WARN_ON(!initial && offset + size > block_group->zone_capacity);
if (!initial)
bg_reclaim_threshold = READ_ONCE(sinfo->bg_reclaim_threshold);
- spin_lock(&ctl->tree_lock);
if (!used)
to_free = size;
else if (initial)
@@ -2718,7 +2719,9 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
to_free = offset + size - block_group->alloc_offset;
to_unusable = size - to_free;
+ spin_lock(&ctl->tree_lock);
ctl->free_space += to_free;
+ spin_unlock(&ctl->tree_lock);
/*
* If the block group is read-only, we should account freed space into
* bytes_readonly.
@@ -2727,11 +2730,8 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
block_group->zone_unusable += to_unusable;
WARN_ON(block_group->zone_unusable > block_group->length);
}
- spin_unlock(&ctl->tree_lock);
if (!used) {
- spin_lock(&block_group->lock);
block_group->alloc_offset -= size;
- spin_unlock(&block_group->lock);
}
reclaimable_unusable = block_group->zone_unusable -
@@ -2745,6 +2745,8 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
btrfs_mark_bg_to_reclaim(block_group);
}
+ spin_unlock(&block_group->lock);
+
return 0;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 42fac187b5c746227c92d024f1caf33bc1d337e4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081933-lyricism-manly-3272@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
42fac187b5c7 ("btrfs: check delayed refs when we're checking if a ref exists")
e094f48040cd ("btrfs: change root->root_key.objectid to btrfs_root_id()")
44cc2e38e67b ("btrfs: stop referencing btrfs_delayed_data_ref directly")
cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
12390e42b69d ("btrfs: rename ->len to ->num_bytes in btrfs_ref")
1bff6d4f8737 ("btrfs: simplify delayed ref tracepoints")
0ea4703cc27e ("btrfs: move ref specific initialization into init_delayed_ref_common")
0509cc56619d ("btrfs: initialize btrfs_delayed_ref_head with btrfs_ref")
da3c54854197 ("btrfs: pass btrfs_ref to init_delayed_ref_common")
f2e69a77aa51 ("btrfs: move ref_root into btrfs_ref")
4d09b4e942bc ("btrfs: do not use a function to initialize btrfs_ref")
d3fbb00f5e21 ("btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node")
0eea355fc0f4 ("btrfs: add a helper to get the delayed ref node from the data/tree ref")
6de3595473b0 ("btrfs: compression: add error handling for missed page cache")
01b69bf9906b ("btrfs: convert put_file_data() to folios")
073bda7a5417 ("btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling")
141fb8cd206a ("btrfs: qgroup: correctly model root qgroup rsv in convert")
ef5a05c55704 ("btrfs: remove SLAB_MEM_SPREAD flag use")
06c9564980f1 ("btrfs: use KMEM_CACHE() to create btrfs_free_space cache")
b2c7d55e4c4c ("btrfs: use KMEM_CACHE() to create delayed ref caches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fac187b5c746227c92d024f1caf33bc1d337e4 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 11 Apr 2024 16:41:20 -0400
Subject: [PATCH] btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately there is a bug, as the check would only check the on-disk
references. I made an incorrect assumption that blocks in an already
deleted snapshot that was having the deletion resume on mount wouldn't
be modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example
subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so sets parent to the bytenr of block 1. However
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to lookup the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
This bug has existed since I introduced this fix, however requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following
mount /dev/nvme4n1 /btrfs
btrfs subvol create /btrfs/s1
simoop -E -f 4k -n 200000 -z /btrfs/s1
while(true) ; do
btrfs subvol snap /btrfs/s1 /btrfs/s2
simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
btrfs subvol snap /btrfs/s2 /btrfs/s3
btrfs balance start -dusage=80 /btrfs
btrfs subvol del /btrfs/s2 /btrfs/s3
umount /btrfs
btrfsck /dev/nvme4n1 || exit 1
mount /dev/nvme4n1 /btrfs
done
On the second loop this would fail consistently, with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2ac9296edccb..06a9e0542d70 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -1134,6 +1134,73 @@ btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs, u64 byt
return find_ref_head(delayed_refs, bytenr, false);
}
+static int find_comp(struct btrfs_delayed_ref_node *entry, u64 root, u64 parent)
+{
+ int type = parent ? BTRFS_SHARED_BLOCK_REF_KEY : BTRFS_TREE_BLOCK_REF_KEY;
+
+ if (type < entry->type)
+ return -1;
+ if (type > entry->type)
+ return 1;
+
+ if (type == BTRFS_TREE_BLOCK_REF_KEY) {
+ if (root < entry->ref_root)
+ return -1;
+ if (root > entry->ref_root)
+ return 1;
+ } else {
+ if (parent < entry->parent)
+ return -1;
+ if (parent > entry->parent)
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Check to see if a given root/parent reference is attached to the head. This
+ * only checks for BTRFS_ADD_DELAYED_REF references that match, as that
+ * indicates the reference exists for the given root or parent. This is for
+ * tree blocks only.
+ *
+ * @head: the head of the bytenr we're searching.
+ * @root: the root objectid of the reference if it is a normal reference.
+ * @parent: the parent if this is a shared backref.
+ */
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent)
+{
+ struct rb_node *node;
+ bool found = false;
+
+ lockdep_assert_held(&head->mutex);
+
+ spin_lock(&head->lock);
+ node = head->ref_tree.rb_root.rb_node;
+ while (node) {
+ struct btrfs_delayed_ref_node *entry;
+ int ret;
+
+ entry = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+ ret = find_comp(entry, root, parent);
+ if (ret < 0) {
+ node = node->rb_left;
+ } else if (ret > 0) {
+ node = node->rb_right;
+ } else {
+ /*
+ * We only want to count ADD actions, as drops mean the
+ * ref doesn't exist.
+ */
+ if (entry->action == BTRFS_ADD_DELAYED_REF)
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&head->lock);
+ return found;
+}
+
void __cold btrfs_delayed_ref_exit(void)
{
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ef15e998be03..05f634eb472d 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -389,6 +389,8 @@ void btrfs_dec_delayed_refs_rsv_bg_updates(struct btrfs_fs_info *fs_info);
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush);
bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent);
static inline u64 btrfs_delayed_ref_owner(struct btrfs_delayed_ref_node *node)
{
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff9f0d41987e..feec49e6f9c8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5472,23 +5472,62 @@ static int check_ref_exists(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 bytenr, u64 parent,
int level)
{
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_head *head;
struct btrfs_path *path;
struct btrfs_extent_inline_ref *iref;
int ret;
+ bool exists = false;
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-
+again:
ret = lookup_extent_backref(trans, path, &iref, bytenr,
root->fs_info->nodesize, parent,
btrfs_root_id(root), level, 0);
+ if (ret != -ENOENT) {
+ /*
+ * If we get 0 then we found our reference, return 1, else
+ * return the error if it's not -ENOENT;
+ */
+ btrfs_free_path(path);
+ return (ret < 0 ) ? ret : 1;
+ }
+
+ /*
+ * We could have a delayed ref with this reference, so look it up while
+ * we're holding the path open to make sure we don't race with the
+ * delayed ref running.
+ */
+ delayed_refs = &trans->transaction->delayed_refs;
+ spin_lock(&delayed_refs->lock);
+ head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
+ if (!head)
+ goto out;
+ if (!mutex_trylock(&head->mutex)) {
+ /*
+ * We're contended, means that the delayed ref is running, get a
+ * reference and wait for the ref head to be complete and then
+ * try again.
+ */
+ refcount_inc(&head->refs);
+ spin_unlock(&delayed_refs->lock);
+
+ btrfs_release_path(path);
+
+ mutex_lock(&head->mutex);
+ mutex_unlock(&head->mutex);
+ btrfs_put_delayed_ref_head(head);
+ goto again;
+ }
+
+ exists = btrfs_find_delayed_tree_ref(head, root->root_key.objectid, parent);
+ mutex_unlock(&head->mutex);
+out:
+ spin_unlock(&delayed_refs->lock);
btrfs_free_path(path);
- if (ret == -ENOENT)
- return 0;
- if (ret < 0)
- return ret;
- return 1;
+ return exists ? 1 : 0;
}
/*
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 42fac187b5c746227c92d024f1caf33bc1d337e4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081931-slot-improving-a44a@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
42fac187b5c7 ("btrfs: check delayed refs when we're checking if a ref exists")
e094f48040cd ("btrfs: change root->root_key.objectid to btrfs_root_id()")
44cc2e38e67b ("btrfs: stop referencing btrfs_delayed_data_ref directly")
cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
12390e42b69d ("btrfs: rename ->len to ->num_bytes in btrfs_ref")
1bff6d4f8737 ("btrfs: simplify delayed ref tracepoints")
0ea4703cc27e ("btrfs: move ref specific initialization into init_delayed_ref_common")
0509cc56619d ("btrfs: initialize btrfs_delayed_ref_head with btrfs_ref")
da3c54854197 ("btrfs: pass btrfs_ref to init_delayed_ref_common")
f2e69a77aa51 ("btrfs: move ref_root into btrfs_ref")
4d09b4e942bc ("btrfs: do not use a function to initialize btrfs_ref")
d3fbb00f5e21 ("btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node")
0eea355fc0f4 ("btrfs: add a helper to get the delayed ref node from the data/tree ref")
6de3595473b0 ("btrfs: compression: add error handling for missed page cache")
01b69bf9906b ("btrfs: convert put_file_data() to folios")
073bda7a5417 ("btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling")
141fb8cd206a ("btrfs: qgroup: correctly model root qgroup rsv in convert")
ef5a05c55704 ("btrfs: remove SLAB_MEM_SPREAD flag use")
06c9564980f1 ("btrfs: use KMEM_CACHE() to create btrfs_free_space cache")
b2c7d55e4c4c ("btrfs: use KMEM_CACHE() to create delayed ref caches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fac187b5c746227c92d024f1caf33bc1d337e4 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 11 Apr 2024 16:41:20 -0400
Subject: [PATCH] btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately there is a bug, as the check would only check the on-disk
references. I made an incorrect assumption that blocks in an already
deleted snapshot that was having the deletion resume on mount wouldn't
be modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example
subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so sets parent to the bytenr of block 1. However
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to lookup the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
This bug has existed since I introduced this fix, however requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following
mount /dev/nvme4n1 /btrfs
btrfs subvol create /btrfs/s1
simoop -E -f 4k -n 200000 -z /btrfs/s1
while(true) ; do
btrfs subvol snap /btrfs/s1 /btrfs/s2
simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
btrfs subvol snap /btrfs/s2 /btrfs/s3
btrfs balance start -dusage=80 /btrfs
btrfs subvol del /btrfs/s2 /btrfs/s3
umount /btrfs
btrfsck /dev/nvme4n1 || exit 1
mount /dev/nvme4n1 /btrfs
done
On the second loop this would fail consistently, with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2ac9296edccb..06a9e0542d70 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -1134,6 +1134,73 @@ btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs, u64 byt
return find_ref_head(delayed_refs, bytenr, false);
}
+static int find_comp(struct btrfs_delayed_ref_node *entry, u64 root, u64 parent)
+{
+ int type = parent ? BTRFS_SHARED_BLOCK_REF_KEY : BTRFS_TREE_BLOCK_REF_KEY;
+
+ if (type < entry->type)
+ return -1;
+ if (type > entry->type)
+ return 1;
+
+ if (type == BTRFS_TREE_BLOCK_REF_KEY) {
+ if (root < entry->ref_root)
+ return -1;
+ if (root > entry->ref_root)
+ return 1;
+ } else {
+ if (parent < entry->parent)
+ return -1;
+ if (parent > entry->parent)
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Check to see if a given root/parent reference is attached to the head. This
+ * only checks for BTRFS_ADD_DELAYED_REF references that match, as that
+ * indicates the reference exists for the given root or parent. This is for
+ * tree blocks only.
+ *
+ * @head: the head of the bytenr we're searching.
+ * @root: the root objectid of the reference if it is a normal reference.
+ * @parent: the parent if this is a shared backref.
+ */
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent)
+{
+ struct rb_node *node;
+ bool found = false;
+
+ lockdep_assert_held(&head->mutex);
+
+ spin_lock(&head->lock);
+ node = head->ref_tree.rb_root.rb_node;
+ while (node) {
+ struct btrfs_delayed_ref_node *entry;
+ int ret;
+
+ entry = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+ ret = find_comp(entry, root, parent);
+ if (ret < 0) {
+ node = node->rb_left;
+ } else if (ret > 0) {
+ node = node->rb_right;
+ } else {
+ /*
+ * We only want to count ADD actions, as drops mean the
+ * ref doesn't exist.
+ */
+ if (entry->action == BTRFS_ADD_DELAYED_REF)
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&head->lock);
+ return found;
+}
+
void __cold btrfs_delayed_ref_exit(void)
{
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ef15e998be03..05f634eb472d 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -389,6 +389,8 @@ void btrfs_dec_delayed_refs_rsv_bg_updates(struct btrfs_fs_info *fs_info);
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush);
bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent);
static inline u64 btrfs_delayed_ref_owner(struct btrfs_delayed_ref_node *node)
{
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff9f0d41987e..feec49e6f9c8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5472,23 +5472,62 @@ static int check_ref_exists(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 bytenr, u64 parent,
int level)
{
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_head *head;
struct btrfs_path *path;
struct btrfs_extent_inline_ref *iref;
int ret;
+ bool exists = false;
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-
+again:
ret = lookup_extent_backref(trans, path, &iref, bytenr,
root->fs_info->nodesize, parent,
btrfs_root_id(root), level, 0);
+ if (ret != -ENOENT) {
+ /*
+ * If we get 0 then we found our reference, return 1, else
+ * return the error if it's not -ENOENT;
+ */
+ btrfs_free_path(path);
+ return (ret < 0 ) ? ret : 1;
+ }
+
+ /*
+ * We could have a delayed ref with this reference, so look it up while
+ * we're holding the path open to make sure we don't race with the
+ * delayed ref running.
+ */
+ delayed_refs = &trans->transaction->delayed_refs;
+ spin_lock(&delayed_refs->lock);
+ head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
+ if (!head)
+ goto out;
+ if (!mutex_trylock(&head->mutex)) {
+ /*
+ * We're contended, means that the delayed ref is running, get a
+ * reference and wait for the ref head to be complete and then
+ * try again.
+ */
+ refcount_inc(&head->refs);
+ spin_unlock(&delayed_refs->lock);
+
+ btrfs_release_path(path);
+
+ mutex_lock(&head->mutex);
+ mutex_unlock(&head->mutex);
+ btrfs_put_delayed_ref_head(head);
+ goto again;
+ }
+
+ exists = btrfs_find_delayed_tree_ref(head, root->root_key.objectid, parent);
+ mutex_unlock(&head->mutex);
+out:
+ spin_unlock(&delayed_refs->lock);
btrfs_free_path(path);
- if (ret == -ENOENT)
- return 0;
- if (ret < 0)
- return ret;
- return 1;
+ return exists ? 1 : 0;
}
/*
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 42fac187b5c746227c92d024f1caf33bc1d337e4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081929-drastic-shale-7639@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
42fac187b5c7 ("btrfs: check delayed refs when we're checking if a ref exists")
e094f48040cd ("btrfs: change root->root_key.objectid to btrfs_root_id()")
44cc2e38e67b ("btrfs: stop referencing btrfs_delayed_data_ref directly")
cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
12390e42b69d ("btrfs: rename ->len to ->num_bytes in btrfs_ref")
1bff6d4f8737 ("btrfs: simplify delayed ref tracepoints")
0ea4703cc27e ("btrfs: move ref specific initialization into init_delayed_ref_common")
0509cc56619d ("btrfs: initialize btrfs_delayed_ref_head with btrfs_ref")
da3c54854197 ("btrfs: pass btrfs_ref to init_delayed_ref_common")
f2e69a77aa51 ("btrfs: move ref_root into btrfs_ref")
4d09b4e942bc ("btrfs: do not use a function to initialize btrfs_ref")
d3fbb00f5e21 ("btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node")
0eea355fc0f4 ("btrfs: add a helper to get the delayed ref node from the data/tree ref")
6de3595473b0 ("btrfs: compression: add error handling for missed page cache")
01b69bf9906b ("btrfs: convert put_file_data() to folios")
073bda7a5417 ("btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling")
141fb8cd206a ("btrfs: qgroup: correctly model root qgroup rsv in convert")
ef5a05c55704 ("btrfs: remove SLAB_MEM_SPREAD flag use")
06c9564980f1 ("btrfs: use KMEM_CACHE() to create btrfs_free_space cache")
b2c7d55e4c4c ("btrfs: use KMEM_CACHE() to create delayed ref caches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fac187b5c746227c92d024f1caf33bc1d337e4 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 11 Apr 2024 16:41:20 -0400
Subject: [PATCH] btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately there is a bug, as the check would only check the on-disk
references. I made an incorrect assumption that blocks in an already
deleted snapshot that was having the deletion resume on mount wouldn't
be modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example
subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so sets parent to the bytenr of block 1. However
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to lookup the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
This bug has existed since I introduced this fix, however requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following
mount /dev/nvme4n1 /btrfs
btrfs subvol create /btrfs/s1
simoop -E -f 4k -n 200000 -z /btrfs/s1
while(true) ; do
btrfs subvol snap /btrfs/s1 /btrfs/s2
simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
btrfs subvol snap /btrfs/s2 /btrfs/s3
btrfs balance start -dusage=80 /btrfs
btrfs subvol del /btrfs/s2 /btrfs/s3
umount /btrfs
btrfsck /dev/nvme4n1 || exit 1
mount /dev/nvme4n1 /btrfs
done
On the second loop this would fail consistently, with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2ac9296edccb..06a9e0542d70 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -1134,6 +1134,73 @@ btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs, u64 byt
return find_ref_head(delayed_refs, bytenr, false);
}
+static int find_comp(struct btrfs_delayed_ref_node *entry, u64 root, u64 parent)
+{
+ int type = parent ? BTRFS_SHARED_BLOCK_REF_KEY : BTRFS_TREE_BLOCK_REF_KEY;
+
+ if (type < entry->type)
+ return -1;
+ if (type > entry->type)
+ return 1;
+
+ if (type == BTRFS_TREE_BLOCK_REF_KEY) {
+ if (root < entry->ref_root)
+ return -1;
+ if (root > entry->ref_root)
+ return 1;
+ } else {
+ if (parent < entry->parent)
+ return -1;
+ if (parent > entry->parent)
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Check to see if a given root/parent reference is attached to the head. This
+ * only checks for BTRFS_ADD_DELAYED_REF references that match, as that
+ * indicates the reference exists for the given root or parent. This is for
+ * tree blocks only.
+ *
+ * @head: the head of the bytenr we're searching.
+ * @root: the root objectid of the reference if it is a normal reference.
+ * @parent: the parent if this is a shared backref.
+ */
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent)
+{
+ struct rb_node *node;
+ bool found = false;
+
+ lockdep_assert_held(&head->mutex);
+
+ spin_lock(&head->lock);
+ node = head->ref_tree.rb_root.rb_node;
+ while (node) {
+ struct btrfs_delayed_ref_node *entry;
+ int ret;
+
+ entry = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+ ret = find_comp(entry, root, parent);
+ if (ret < 0) {
+ node = node->rb_left;
+ } else if (ret > 0) {
+ node = node->rb_right;
+ } else {
+ /*
+ * We only want to count ADD actions, as drops mean the
+ * ref doesn't exist.
+ */
+ if (entry->action == BTRFS_ADD_DELAYED_REF)
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&head->lock);
+ return found;
+}
+
void __cold btrfs_delayed_ref_exit(void)
{
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ef15e998be03..05f634eb472d 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -389,6 +389,8 @@ void btrfs_dec_delayed_refs_rsv_bg_updates(struct btrfs_fs_info *fs_info);
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush);
bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent);
static inline u64 btrfs_delayed_ref_owner(struct btrfs_delayed_ref_node *node)
{
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff9f0d41987e..feec49e6f9c8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5472,23 +5472,62 @@ static int check_ref_exists(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 bytenr, u64 parent,
int level)
{
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_head *head;
struct btrfs_path *path;
struct btrfs_extent_inline_ref *iref;
int ret;
+ bool exists = false;
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-
+again:
ret = lookup_extent_backref(trans, path, &iref, bytenr,
root->fs_info->nodesize, parent,
btrfs_root_id(root), level, 0);
+ if (ret != -ENOENT) {
+ /*
+ * If we get 0 then we found our reference, return 1, else
+ * return the error if it's not -ENOENT;
+ */
+ btrfs_free_path(path);
+ return (ret < 0 ) ? ret : 1;
+ }
+
+ /*
+ * We could have a delayed ref with this reference, so look it up while
+ * we're holding the path open to make sure we don't race with the
+ * delayed ref running.
+ */
+ delayed_refs = &trans->transaction->delayed_refs;
+ spin_lock(&delayed_refs->lock);
+ head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
+ if (!head)
+ goto out;
+ if (!mutex_trylock(&head->mutex)) {
+ /*
+ * We're contended, means that the delayed ref is running, get a
+ * reference and wait for the ref head to be complete and then
+ * try again.
+ */
+ refcount_inc(&head->refs);
+ spin_unlock(&delayed_refs->lock);
+
+ btrfs_release_path(path);
+
+ mutex_lock(&head->mutex);
+ mutex_unlock(&head->mutex);
+ btrfs_put_delayed_ref_head(head);
+ goto again;
+ }
+
+ exists = btrfs_find_delayed_tree_ref(head, root->root_key.objectid, parent);
+ mutex_unlock(&head->mutex);
+out:
+ spin_unlock(&delayed_refs->lock);
btrfs_free_path(path);
- if (ret == -ENOENT)
- return 0;
- if (ret < 0)
- return ret;
- return 1;
+ return exists ? 1 : 0;
}
/*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 42fac187b5c746227c92d024f1caf33bc1d337e4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081927-overstay-bullseye-eee1@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
42fac187b5c7 ("btrfs: check delayed refs when we're checking if a ref exists")
e094f48040cd ("btrfs: change root->root_key.objectid to btrfs_root_id()")
44cc2e38e67b ("btrfs: stop referencing btrfs_delayed_data_ref directly")
cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
12390e42b69d ("btrfs: rename ->len to ->num_bytes in btrfs_ref")
1bff6d4f8737 ("btrfs: simplify delayed ref tracepoints")
0ea4703cc27e ("btrfs: move ref specific initialization into init_delayed_ref_common")
0509cc56619d ("btrfs: initialize btrfs_delayed_ref_head with btrfs_ref")
da3c54854197 ("btrfs: pass btrfs_ref to init_delayed_ref_common")
f2e69a77aa51 ("btrfs: move ref_root into btrfs_ref")
4d09b4e942bc ("btrfs: do not use a function to initialize btrfs_ref")
d3fbb00f5e21 ("btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node")
0eea355fc0f4 ("btrfs: add a helper to get the delayed ref node from the data/tree ref")
6de3595473b0 ("btrfs: compression: add error handling for missed page cache")
01b69bf9906b ("btrfs: convert put_file_data() to folios")
073bda7a5417 ("btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling")
141fb8cd206a ("btrfs: qgroup: correctly model root qgroup rsv in convert")
ef5a05c55704 ("btrfs: remove SLAB_MEM_SPREAD flag use")
06c9564980f1 ("btrfs: use KMEM_CACHE() to create btrfs_free_space cache")
b2c7d55e4c4c ("btrfs: use KMEM_CACHE() to create delayed ref caches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fac187b5c746227c92d024f1caf33bc1d337e4 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 11 Apr 2024 16:41:20 -0400
Subject: [PATCH] btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately there is a bug, as the check would only check the on-disk
references. I made an incorrect assumption that blocks in an already
deleted snapshot that was having the deletion resume on mount wouldn't
be modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example
subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so sets parent to the bytenr of block 1. However
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to lookup the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
This bug has existed since I introduced this fix, however requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following
mount /dev/nvme4n1 /btrfs
btrfs subvol create /btrfs/s1
simoop -E -f 4k -n 200000 -z /btrfs/s1
while(true) ; do
btrfs subvol snap /btrfs/s1 /btrfs/s2
simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
btrfs subvol snap /btrfs/s2 /btrfs/s3
btrfs balance start -dusage=80 /btrfs
btrfs subvol del /btrfs/s2 /btrfs/s3
umount /btrfs
btrfsck /dev/nvme4n1 || exit 1
mount /dev/nvme4n1 /btrfs
done
On the second loop this would fail consistently, with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2ac9296edccb..06a9e0542d70 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -1134,6 +1134,73 @@ btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs, u64 byt
return find_ref_head(delayed_refs, bytenr, false);
}
+static int find_comp(struct btrfs_delayed_ref_node *entry, u64 root, u64 parent)
+{
+ int type = parent ? BTRFS_SHARED_BLOCK_REF_KEY : BTRFS_TREE_BLOCK_REF_KEY;
+
+ if (type < entry->type)
+ return -1;
+ if (type > entry->type)
+ return 1;
+
+ if (type == BTRFS_TREE_BLOCK_REF_KEY) {
+ if (root < entry->ref_root)
+ return -1;
+ if (root > entry->ref_root)
+ return 1;
+ } else {
+ if (parent < entry->parent)
+ return -1;
+ if (parent > entry->parent)
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Check to see if a given root/parent reference is attached to the head. This
+ * only checks for BTRFS_ADD_DELAYED_REF references that match, as that
+ * indicates the reference exists for the given root or parent. This is for
+ * tree blocks only.
+ *
+ * @head: the head of the bytenr we're searching.
+ * @root: the root objectid of the reference if it is a normal reference.
+ * @parent: the parent if this is a shared backref.
+ */
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent)
+{
+ struct rb_node *node;
+ bool found = false;
+
+ lockdep_assert_held(&head->mutex);
+
+ spin_lock(&head->lock);
+ node = head->ref_tree.rb_root.rb_node;
+ while (node) {
+ struct btrfs_delayed_ref_node *entry;
+ int ret;
+
+ entry = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+ ret = find_comp(entry, root, parent);
+ if (ret < 0) {
+ node = node->rb_left;
+ } else if (ret > 0) {
+ node = node->rb_right;
+ } else {
+ /*
+ * We only want to count ADD actions, as drops mean the
+ * ref doesn't exist.
+ */
+ if (entry->action == BTRFS_ADD_DELAYED_REF)
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&head->lock);
+ return found;
+}
+
void __cold btrfs_delayed_ref_exit(void)
{
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ef15e998be03..05f634eb472d 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -389,6 +389,8 @@ void btrfs_dec_delayed_refs_rsv_bg_updates(struct btrfs_fs_info *fs_info);
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush);
bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent);
static inline u64 btrfs_delayed_ref_owner(struct btrfs_delayed_ref_node *node)
{
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff9f0d41987e..feec49e6f9c8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5472,23 +5472,62 @@ static int check_ref_exists(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 bytenr, u64 parent,
int level)
{
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_head *head;
struct btrfs_path *path;
struct btrfs_extent_inline_ref *iref;
int ret;
+ bool exists = false;
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-
+again:
ret = lookup_extent_backref(trans, path, &iref, bytenr,
root->fs_info->nodesize, parent,
btrfs_root_id(root), level, 0);
+ if (ret != -ENOENT) {
+ /*
+ * If we get 0 then we found our reference, return 1, else
+ * return the error if it's not -ENOENT;
+ */
+ btrfs_free_path(path);
+ return (ret < 0 ) ? ret : 1;
+ }
+
+ /*
+ * We could have a delayed ref with this reference, so look it up while
+ * we're holding the path open to make sure we don't race with the
+ * delayed ref running.
+ */
+ delayed_refs = &trans->transaction->delayed_refs;
+ spin_lock(&delayed_refs->lock);
+ head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
+ if (!head)
+ goto out;
+ if (!mutex_trylock(&head->mutex)) {
+ /*
+ * We're contended, means that the delayed ref is running, get a
+ * reference and wait for the ref head to be complete and then
+ * try again.
+ */
+ refcount_inc(&head->refs);
+ spin_unlock(&delayed_refs->lock);
+
+ btrfs_release_path(path);
+
+ mutex_lock(&head->mutex);
+ mutex_unlock(&head->mutex);
+ btrfs_put_delayed_ref_head(head);
+ goto again;
+ }
+
+ exists = btrfs_find_delayed_tree_ref(head, root->root_key.objectid, parent);
+ mutex_unlock(&head->mutex);
+out:
+ spin_unlock(&delayed_refs->lock);
btrfs_free_path(path);
- if (ret == -ENOENT)
- return 0;
- if (ret < 0)
- return ret;
- return 1;
+ return exists ? 1 : 0;
}
/*
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 42fac187b5c746227c92d024f1caf33bc1d337e4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-knoll-dropkick-f11a@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
42fac187b5c7 ("btrfs: check delayed refs when we're checking if a ref exists")
e094f48040cd ("btrfs: change root->root_key.objectid to btrfs_root_id()")
44cc2e38e67b ("btrfs: stop referencing btrfs_delayed_data_ref directly")
cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
12390e42b69d ("btrfs: rename ->len to ->num_bytes in btrfs_ref")
1bff6d4f8737 ("btrfs: simplify delayed ref tracepoints")
0ea4703cc27e ("btrfs: move ref specific initialization into init_delayed_ref_common")
0509cc56619d ("btrfs: initialize btrfs_delayed_ref_head with btrfs_ref")
da3c54854197 ("btrfs: pass btrfs_ref to init_delayed_ref_common")
f2e69a77aa51 ("btrfs: move ref_root into btrfs_ref")
4d09b4e942bc ("btrfs: do not use a function to initialize btrfs_ref")
d3fbb00f5e21 ("btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node")
0eea355fc0f4 ("btrfs: add a helper to get the delayed ref node from the data/tree ref")
6de3595473b0 ("btrfs: compression: add error handling for missed page cache")
01b69bf9906b ("btrfs: convert put_file_data() to folios")
073bda7a5417 ("btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling")
141fb8cd206a ("btrfs: qgroup: correctly model root qgroup rsv in convert")
ef5a05c55704 ("btrfs: remove SLAB_MEM_SPREAD flag use")
06c9564980f1 ("btrfs: use KMEM_CACHE() to create btrfs_free_space cache")
b2c7d55e4c4c ("btrfs: use KMEM_CACHE() to create delayed ref caches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fac187b5c746227c92d024f1caf33bc1d337e4 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 11 Apr 2024 16:41:20 -0400
Subject: [PATCH] btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately there is a bug, as the check would only check the on-disk
references. I made an incorrect assumption that blocks in an already
deleted snapshot that was having the deletion resume on mount wouldn't
be modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example
subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so sets parent to the bytenr of block 1. However
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to lookup the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
This bug has existed since I introduced this fix, however requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following
mount /dev/nvme4n1 /btrfs
btrfs subvol create /btrfs/s1
simoop -E -f 4k -n 200000 -z /btrfs/s1
while(true) ; do
btrfs subvol snap /btrfs/s1 /btrfs/s2
simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
btrfs subvol snap /btrfs/s2 /btrfs/s3
btrfs balance start -dusage=80 /btrfs
btrfs subvol del /btrfs/s2 /btrfs/s3
umount /btrfs
btrfsck /dev/nvme4n1 || exit 1
mount /dev/nvme4n1 /btrfs
done
On the second loop this would fail consistently, with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2ac9296edccb..06a9e0542d70 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -1134,6 +1134,73 @@ btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs, u64 byt
return find_ref_head(delayed_refs, bytenr, false);
}
+static int find_comp(struct btrfs_delayed_ref_node *entry, u64 root, u64 parent)
+{
+ int type = parent ? BTRFS_SHARED_BLOCK_REF_KEY : BTRFS_TREE_BLOCK_REF_KEY;
+
+ if (type < entry->type)
+ return -1;
+ if (type > entry->type)
+ return 1;
+
+ if (type == BTRFS_TREE_BLOCK_REF_KEY) {
+ if (root < entry->ref_root)
+ return -1;
+ if (root > entry->ref_root)
+ return 1;
+ } else {
+ if (parent < entry->parent)
+ return -1;
+ if (parent > entry->parent)
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Check to see if a given root/parent reference is attached to the head. This
+ * only checks for BTRFS_ADD_DELAYED_REF references that match, as that
+ * indicates the reference exists for the given root or parent. This is for
+ * tree blocks only.
+ *
+ * @head: the head of the bytenr we're searching.
+ * @root: the root objectid of the reference if it is a normal reference.
+ * @parent: the parent if this is a shared backref.
+ */
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent)
+{
+ struct rb_node *node;
+ bool found = false;
+
+ lockdep_assert_held(&head->mutex);
+
+ spin_lock(&head->lock);
+ node = head->ref_tree.rb_root.rb_node;
+ while (node) {
+ struct btrfs_delayed_ref_node *entry;
+ int ret;
+
+ entry = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+ ret = find_comp(entry, root, parent);
+ if (ret < 0) {
+ node = node->rb_left;
+ } else if (ret > 0) {
+ node = node->rb_right;
+ } else {
+ /*
+ * We only want to count ADD actions, as drops mean the
+ * ref doesn't exist.
+ */
+ if (entry->action == BTRFS_ADD_DELAYED_REF)
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&head->lock);
+ return found;
+}
+
void __cold btrfs_delayed_ref_exit(void)
{
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ef15e998be03..05f634eb472d 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -389,6 +389,8 @@ void btrfs_dec_delayed_refs_rsv_bg_updates(struct btrfs_fs_info *fs_info);
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush);
bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent);
static inline u64 btrfs_delayed_ref_owner(struct btrfs_delayed_ref_node *node)
{
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff9f0d41987e..feec49e6f9c8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5472,23 +5472,62 @@ static int check_ref_exists(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 bytenr, u64 parent,
int level)
{
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_head *head;
struct btrfs_path *path;
struct btrfs_extent_inline_ref *iref;
int ret;
+ bool exists = false;
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-
+again:
ret = lookup_extent_backref(trans, path, &iref, bytenr,
root->fs_info->nodesize, parent,
btrfs_root_id(root), level, 0);
+ if (ret != -ENOENT) {
+ /*
+ * If we get 0 then we found our reference, return 1, else
+ * return the error if it's not -ENOENT;
+ */
+ btrfs_free_path(path);
+ return (ret < 0 ) ? ret : 1;
+ }
+
+ /*
+ * We could have a delayed ref with this reference, so look it up while
+ * we're holding the path open to make sure we don't race with the
+ * delayed ref running.
+ */
+ delayed_refs = &trans->transaction->delayed_refs;
+ spin_lock(&delayed_refs->lock);
+ head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
+ if (!head)
+ goto out;
+ if (!mutex_trylock(&head->mutex)) {
+ /*
+ * We're contended, means that the delayed ref is running, get a
+ * reference and wait for the ref head to be complete and then
+ * try again.
+ */
+ refcount_inc(&head->refs);
+ spin_unlock(&delayed_refs->lock);
+
+ btrfs_release_path(path);
+
+ mutex_lock(&head->mutex);
+ mutex_unlock(&head->mutex);
+ btrfs_put_delayed_ref_head(head);
+ goto again;
+ }
+
+ exists = btrfs_find_delayed_tree_ref(head, root->root_key.objectid, parent);
+ mutex_unlock(&head->mutex);
+out:
+ spin_unlock(&delayed_refs->lock);
btrfs_free_path(path);
- if (ret == -ENOENT)
- return 0;
- if (ret < 0)
- return ret;
- return 1;
+ return exists ? 1 : 0;
}
/*
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081915-scrubber-excretion-0f83@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
4e00422ee626 ("btrfs: replace sb::s_blocksize by fs_info::sectorsize")
3ea4dc5bf00c ("btrfs: send: send compressed extents with encoded writes")
6fe81a3a3ac8 ("btrfs: balance btree dirty pages and delayed items after clone and dedupe")
152555b39ceb ("btrfs: send: avoid trashing the page cache")
521b6803f22e ("btrfs: send: keep the current inode open while processing it")
1c6cbbbeeeca ("btrfs: remove inode_dio_wait() calls when starting reflink operations")
6b1f86f8e9c7 ("Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 12 Aug 2024 14:18:06 +0100
Subject: [PATCH] btrfs: send: allow cloning non-aligned extent if it ends at
i_size
If we a find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
to clone unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect because send can only guarantee that a file
has the same data in the source and destination snapshots, it's not
optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap
rm -f /tmp/send-test
btrfs send -f /tmp/send-test $MNT/snap
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
write bar - offset=0 length=49152
write bar - offset=49152 length=49152
write bar - offset=98304 length=49152
write bar - offset=147456 length=49152
write bar - offset=196608 length=49152
write bar - offset=245760 length=49152
write bar - offset=294912 length=49152
write bar - offset=344064 length=49152
write bar - offset=393216 length=49152
write bar - offset=442368 length=49152
write bar - offset=491520 length=49152
write bar - offset=540672 length=49152
write bar - offset=589824 length=49152
write bar - offset=638976 length=49152
write bar - offset=688128 length=49152
write bar - offset=737280 length=49152
write bar - offset=786432 length=49152
write bar - offset=835584 length=49152
write bar - offset=884736 length=49152
write bar - offset=933888 length=49152
write bar - offset=983040 length=49152
write bar - offset=1032192 length=16389
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from the file foo into file
bar and fiemap confirms there's no shared flag (0x2000).
So update send_write_or_clone() so that it proceeds with cloning if the
source and destination ranges end at the i_size of the respective files.
After this changes the result of the test is:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
clone bar - source=foo source offset=0 offset=0 length=1048581
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4ca711a773ef..7fc692fc76e1 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6157,25 +6157,51 @@ static int send_write_or_clone(struct send_ctx *sctx,
u64 offset = key->offset;
u64 end;
u64 bs = sctx->send_root->fs_info->sectorsize;
+ struct btrfs_file_extent_item *ei;
+ u64 disk_byte;
+ u64 data_offset;
+ u64 num_bytes;
+ struct btrfs_inode_info info = { 0 };
end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size);
if (offset >= end)
return 0;
- if (clone_root && IS_ALIGNED(end, bs)) {
- struct btrfs_file_extent_item *ei;
- u64 disk_byte;
- u64 data_offset;
+ num_bytes = end - offset;
- ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
- struct btrfs_file_extent_item);
- disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
- data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
- ret = clone_range(sctx, path, clone_root, disk_byte,
- data_offset, offset, end - offset);
- } else {
- ret = send_extent_data(sctx, path, offset, end - offset);
- }
+ if (!clone_root)
+ goto write_data;
+
+ if (IS_ALIGNED(end, bs))
+ goto clone_data;
+
+ /*
+ * If the extent end is not aligned, we can clone if the extent ends at
+ * the i_size of the inode and the clone range ends at the i_size of the
+ * source inode, otherwise the clone operation fails with -EINVAL.
+ */
+ if (end != sctx->cur_inode_size)
+ goto write_data;
+
+ ret = get_inode_info(clone_root->root, clone_root->ino, &info);
+ if (ret < 0)
+ return ret;
+
+ if (clone_root->offset + num_bytes == info.size)
+ goto clone_data;
+
+write_data:
+ ret = send_extent_data(sctx, path, offset, num_bytes);
+ sctx->cur_inode_next_write_offset = end;
+ return ret;
+
+clone_data:
+ ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+ struct btrfs_file_extent_item);
+ disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
+ data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
+ ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset,
+ num_bytes);
sctx->cur_inode_next_write_offset = end;
return ret;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081913-cupped-curing-f315@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
4e00422ee626 ("btrfs: replace sb::s_blocksize by fs_info::sectorsize")
3ea4dc5bf00c ("btrfs: send: send compressed extents with encoded writes")
6fe81a3a3ac8 ("btrfs: balance btree dirty pages and delayed items after clone and dedupe")
152555b39ceb ("btrfs: send: avoid trashing the page cache")
521b6803f22e ("btrfs: send: keep the current inode open while processing it")
1c6cbbbeeeca ("btrfs: remove inode_dio_wait() calls when starting reflink operations")
6b1f86f8e9c7 ("Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 12 Aug 2024 14:18:06 +0100
Subject: [PATCH] btrfs: send: allow cloning non-aligned extent if it ends at
i_size
If we a find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
to clone unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect because send can only guarantee that a file
has the same data in the source and destination snapshots, it's not
optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap
rm -f /tmp/send-test
btrfs send -f /tmp/send-test $MNT/snap
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
write bar - offset=0 length=49152
write bar - offset=49152 length=49152
write bar - offset=98304 length=49152
write bar - offset=147456 length=49152
write bar - offset=196608 length=49152
write bar - offset=245760 length=49152
write bar - offset=294912 length=49152
write bar - offset=344064 length=49152
write bar - offset=393216 length=49152
write bar - offset=442368 length=49152
write bar - offset=491520 length=49152
write bar - offset=540672 length=49152
write bar - offset=589824 length=49152
write bar - offset=638976 length=49152
write bar - offset=688128 length=49152
write bar - offset=737280 length=49152
write bar - offset=786432 length=49152
write bar - offset=835584 length=49152
write bar - offset=884736 length=49152
write bar - offset=933888 length=49152
write bar - offset=983040 length=49152
write bar - offset=1032192 length=16389
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from the file foo into file
bar and fiemap confirms there's no shared flag (0x2000).
So update send_write_or_clone() so that it proceeds with cloning if the
source and destination ranges end at the i_size of the respective files.
After this changes the result of the test is:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
clone bar - source=foo source offset=0 offset=0 length=1048581
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4ca711a773ef..7fc692fc76e1 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6157,25 +6157,51 @@ static int send_write_or_clone(struct send_ctx *sctx,
u64 offset = key->offset;
u64 end;
u64 bs = sctx->send_root->fs_info->sectorsize;
+ struct btrfs_file_extent_item *ei;
+ u64 disk_byte;
+ u64 data_offset;
+ u64 num_bytes;
+ struct btrfs_inode_info info = { 0 };
end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size);
if (offset >= end)
return 0;
- if (clone_root && IS_ALIGNED(end, bs)) {
- struct btrfs_file_extent_item *ei;
- u64 disk_byte;
- u64 data_offset;
+ num_bytes = end - offset;
- ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
- struct btrfs_file_extent_item);
- disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
- data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
- ret = clone_range(sctx, path, clone_root, disk_byte,
- data_offset, offset, end - offset);
- } else {
- ret = send_extent_data(sctx, path, offset, end - offset);
- }
+ if (!clone_root)
+ goto write_data;
+
+ if (IS_ALIGNED(end, bs))
+ goto clone_data;
+
+ /*
+ * If the extent end is not aligned, we can clone if the extent ends at
+ * the i_size of the inode and the clone range ends at the i_size of the
+ * source inode, otherwise the clone operation fails with -EINVAL.
+ */
+ if (end != sctx->cur_inode_size)
+ goto write_data;
+
+ ret = get_inode_info(clone_root->root, clone_root->ino, &info);
+ if (ret < 0)
+ return ret;
+
+ if (clone_root->offset + num_bytes == info.size)
+ goto clone_data;
+
+write_data:
+ ret = send_extent_data(sctx, path, offset, num_bytes);
+ sctx->cur_inode_next_write_offset = end;
+ return ret;
+
+clone_data:
+ ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+ struct btrfs_file_extent_item);
+ disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
+ data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
+ ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset,
+ num_bytes);
sctx->cur_inode_next_write_offset = end;
return ret;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081911-shrivel-overstay-1394@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
4e00422ee626 ("btrfs: replace sb::s_blocksize by fs_info::sectorsize")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 12 Aug 2024 14:18:06 +0100
Subject: [PATCH] btrfs: send: allow cloning non-aligned extent if it ends at
i_size
If we a find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
to clone unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect because send can only guarantee that a file
has the same data in the source and destination snapshots, it's not
optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap
rm -f /tmp/send-test
btrfs send -f /tmp/send-test $MNT/snap
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
write bar - offset=0 length=49152
write bar - offset=49152 length=49152
write bar - offset=98304 length=49152
write bar - offset=147456 length=49152
write bar - offset=196608 length=49152
write bar - offset=245760 length=49152
write bar - offset=294912 length=49152
write bar - offset=344064 length=49152
write bar - offset=393216 length=49152
write bar - offset=442368 length=49152
write bar - offset=491520 length=49152
write bar - offset=540672 length=49152
write bar - offset=589824 length=49152
write bar - offset=638976 length=49152
write bar - offset=688128 length=49152
write bar - offset=737280 length=49152
write bar - offset=786432 length=49152
write bar - offset=835584 length=49152
write bar - offset=884736 length=49152
write bar - offset=933888 length=49152
write bar - offset=983040 length=49152
write bar - offset=1032192 length=16389
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from the file foo into file
bar and fiemap confirms there's no shared flag (0x2000).
So update send_write_or_clone() so that it proceeds with cloning if the
source and destination ranges end at the i_size of the respective files.
After this changes the result of the test is:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
clone bar - source=foo source offset=0 offset=0 length=1048581
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4ca711a773ef..7fc692fc76e1 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6157,25 +6157,51 @@ static int send_write_or_clone(struct send_ctx *sctx,
u64 offset = key->offset;
u64 end;
u64 bs = sctx->send_root->fs_info->sectorsize;
+ struct btrfs_file_extent_item *ei;
+ u64 disk_byte;
+ u64 data_offset;
+ u64 num_bytes;
+ struct btrfs_inode_info info = { 0 };
end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size);
if (offset >= end)
return 0;
- if (clone_root && IS_ALIGNED(end, bs)) {
- struct btrfs_file_extent_item *ei;
- u64 disk_byte;
- u64 data_offset;
+ num_bytes = end - offset;
- ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
- struct btrfs_file_extent_item);
- disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
- data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
- ret = clone_range(sctx, path, clone_root, disk_byte,
- data_offset, offset, end - offset);
- } else {
- ret = send_extent_data(sctx, path, offset, end - offset);
- }
+ if (!clone_root)
+ goto write_data;
+
+ if (IS_ALIGNED(end, bs))
+ goto clone_data;
+
+ /*
+ * If the extent end is not aligned, we can clone if the extent ends at
+ * the i_size of the inode and the clone range ends at the i_size of the
+ * source inode, otherwise the clone operation fails with -EINVAL.
+ */
+ if (end != sctx->cur_inode_size)
+ goto write_data;
+
+ ret = get_inode_info(clone_root->root, clone_root->ino, &info);
+ if (ret < 0)
+ return ret;
+
+ if (clone_root->offset + num_bytes == info.size)
+ goto clone_data;
+
+write_data:
+ ret = send_extent_data(sctx, path, offset, num_bytes);
+ sctx->cur_inode_next_write_offset = end;
+ return ret;
+
+clone_data:
+ ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+ struct btrfs_file_extent_item);
+ disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
+ data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
+ ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset,
+ num_bytes);
sctx->cur_inode_next_write_offset = end;
return ret;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081909-blunt-glass-4ea5@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
4e00422ee626 ("btrfs: replace sb::s_blocksize by fs_info::sectorsize")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 12 Aug 2024 14:18:06 +0100
Subject: [PATCH] btrfs: send: allow cloning non-aligned extent if it ends at
i_size
If we a find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
to clone unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect because send can only guarantee that a file
has the same data in the source and destination snapshots, it's not
optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap
rm -f /tmp/send-test
btrfs send -f /tmp/send-test $MNT/snap
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
write bar - offset=0 length=49152
write bar - offset=49152 length=49152
write bar - offset=98304 length=49152
write bar - offset=147456 length=49152
write bar - offset=196608 length=49152
write bar - offset=245760 length=49152
write bar - offset=294912 length=49152
write bar - offset=344064 length=49152
write bar - offset=393216 length=49152
write bar - offset=442368 length=49152
write bar - offset=491520 length=49152
write bar - offset=540672 length=49152
write bar - offset=589824 length=49152
write bar - offset=638976 length=49152
write bar - offset=688128 length=49152
write bar - offset=737280 length=49152
write bar - offset=786432 length=49152
write bar - offset=835584 length=49152
write bar - offset=884736 length=49152
write bar - offset=933888 length=49152
write bar - offset=983040 length=49152
write bar - offset=1032192 length=16389
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from the file foo into file
bar and fiemap confirms there's no shared flag (0x2000).
So update send_write_or_clone() so that it proceeds with cloning if the
source and destination ranges end at the i_size of the respective files.
After this changes the result of the test is:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
clone bar - source=foo source offset=0 offset=0 length=1048581
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4ca711a773ef..7fc692fc76e1 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6157,25 +6157,51 @@ static int send_write_or_clone(struct send_ctx *sctx,
u64 offset = key->offset;
u64 end;
u64 bs = sctx->send_root->fs_info->sectorsize;
+ struct btrfs_file_extent_item *ei;
+ u64 disk_byte;
+ u64 data_offset;
+ u64 num_bytes;
+ struct btrfs_inode_info info = { 0 };
end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size);
if (offset >= end)
return 0;
- if (clone_root && IS_ALIGNED(end, bs)) {
- struct btrfs_file_extent_item *ei;
- u64 disk_byte;
- u64 data_offset;
+ num_bytes = end - offset;
- ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
- struct btrfs_file_extent_item);
- disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
- data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
- ret = clone_range(sctx, path, clone_root, disk_byte,
- data_offset, offset, end - offset);
- } else {
- ret = send_extent_data(sctx, path, offset, end - offset);
- }
+ if (!clone_root)
+ goto write_data;
+
+ if (IS_ALIGNED(end, bs))
+ goto clone_data;
+
+ /*
+ * If the extent end is not aligned, we can clone if the extent ends at
+ * the i_size of the inode and the clone range ends at the i_size of the
+ * source inode, otherwise the clone operation fails with -EINVAL.
+ */
+ if (end != sctx->cur_inode_size)
+ goto write_data;
+
+ ret = get_inode_info(clone_root->root, clone_root->ino, &info);
+ if (ret < 0)
+ return ret;
+
+ if (clone_root->offset + num_bytes == info.size)
+ goto clone_data;
+
+write_data:
+ ret = send_extent_data(sctx, path, offset, num_bytes);
+ sctx->cur_inode_next_write_offset = end;
+ return ret;
+
+clone_data:
+ ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+ struct btrfs_file_extent_item);
+ disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
+ data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
+ ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset,
+ num_bytes);
sctx->cur_inode_next_write_offset = end;
return ret;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 31723c9542dba1681cc3720571fdf12ffe0eddd9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081954-purplish-scarecrow-7568@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
31723c9542db ("btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type")
94a48aef49f2 ("btrfs: extend btrfs_dir_item type to store encryption status")
e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 31723c9542dba1681cc3720571fdf12ffe0eddd9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 12 Aug 2024 08:52:44 +0930
Subject: [PATCH] btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that kernel is rejecting a mismatching inode mode
and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX), to
(0, BTRFS_FT_MAX).
Although the existing corruption can not be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip for dir type to
reach disk in the future.
Reported-by: Kota <nospam(a)kota.moe>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc…
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..6f1e2f2215d9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -569,9 +569,10 @@ static int check_dir_item(struct extent_buffer *leaf,
/* dir type check */
dir_type = btrfs_dir_ftype(leaf, di);
- if (unlikely(dir_type >= BTRFS_FT_MAX)) {
+ if (unlikely(dir_type <= BTRFS_FT_UNKNOWN ||
+ dir_type >= BTRFS_FT_MAX)) {
dir_item_err(leaf, slot,
- "invalid dir item type, have %u expect [0, %u)",
+ "invalid dir item type, have %u expect (0, %u)",
dir_type, BTRFS_FT_MAX);
return -EUCLEAN;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 31723c9542dba1681cc3720571fdf12ffe0eddd9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081953-calm-granola-8c1b@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
31723c9542db ("btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type")
94a48aef49f2 ("btrfs: extend btrfs_dir_item type to store encryption status")
e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 31723c9542dba1681cc3720571fdf12ffe0eddd9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 12 Aug 2024 08:52:44 +0930
Subject: [PATCH] btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that kernel is rejecting a mismatching inode mode
and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX), to
(0, BTRFS_FT_MAX).
Although the existing corruption can not be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip for dir type to
reach disk in the future.
Reported-by: Kota <nospam(a)kota.moe>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc…
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..6f1e2f2215d9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -569,9 +569,10 @@ static int check_dir_item(struct extent_buffer *leaf,
/* dir type check */
dir_type = btrfs_dir_ftype(leaf, di);
- if (unlikely(dir_type >= BTRFS_FT_MAX)) {
+ if (unlikely(dir_type <= BTRFS_FT_UNKNOWN ||
+ dir_type >= BTRFS_FT_MAX)) {
dir_item_err(leaf, slot,
- "invalid dir item type, have %u expect [0, %u)",
+ "invalid dir item type, have %u expect (0, %u)",
dir_type, BTRFS_FT_MAX);
return -EUCLEAN;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 31723c9542dba1681cc3720571fdf12ffe0eddd9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081952-clatter-tribute-f81c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
31723c9542db ("btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type")
94a48aef49f2 ("btrfs: extend btrfs_dir_item type to store encryption status")
e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 31723c9542dba1681cc3720571fdf12ffe0eddd9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 12 Aug 2024 08:52:44 +0930
Subject: [PATCH] btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that kernel is rejecting a mismatching inode mode
and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX), to
(0, BTRFS_FT_MAX).
Although the existing corruption can not be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip for dir type to
reach disk in the future.
Reported-by: Kota <nospam(a)kota.moe>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc…
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..6f1e2f2215d9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -569,9 +569,10 @@ static int check_dir_item(struct extent_buffer *leaf,
/* dir type check */
dir_type = btrfs_dir_ftype(leaf, di);
- if (unlikely(dir_type >= BTRFS_FT_MAX)) {
+ if (unlikely(dir_type <= BTRFS_FT_UNKNOWN ||
+ dir_type >= BTRFS_FT_MAX)) {
dir_item_err(leaf, slot,
- "invalid dir item type, have %u expect [0, %u)",
+ "invalid dir item type, have %u expect (0, %u)",
dir_type, BTRFS_FT_MAX);
return -EUCLEAN;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 31723c9542dba1681cc3720571fdf12ffe0eddd9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081951-anyhow-fool-5756@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
31723c9542db ("btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type")
94a48aef49f2 ("btrfs: extend btrfs_dir_item type to store encryption status")
e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 31723c9542dba1681cc3720571fdf12ffe0eddd9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 12 Aug 2024 08:52:44 +0930
Subject: [PATCH] btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that kernel is rejecting a mismatching inode mode
and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX), to
(0, BTRFS_FT_MAX).
Although the existing corruption can not be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip for dir type to
reach disk in the future.
Reported-by: Kota <nospam(a)kota.moe>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc…
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..6f1e2f2215d9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -569,9 +569,10 @@ static int check_dir_item(struct extent_buffer *leaf,
/* dir type check */
dir_type = btrfs_dir_ftype(leaf, di);
- if (unlikely(dir_type >= BTRFS_FT_MAX)) {
+ if (unlikely(dir_type <= BTRFS_FT_UNKNOWN ||
+ dir_type >= BTRFS_FT_MAX)) {
dir_item_err(leaf, slot,
- "invalid dir item type, have %u expect [0, %u)",
+ "invalid dir item type, have %u expect (0, %u)",
dir_type, BTRFS_FT_MAX);
return -EUCLEAN;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 7c5e8d212d7d81991a580e7de3904ea213d9a852
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081942-rippling-relieving-c8d1@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
7c5e8d212d7d ("selftests: memfd_secret: don't build memfd_secret test on unsupported arches")
a3c5cc5129ef ("selftests/mm: log run_vmtests.sh results in TAP format")
2ffc27b15b11 ("tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions")
05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
000303329752 ("selftests/mm: make migration test robust to failure")
f6dd4e223d87 ("selftests/mm: skip soft-dirty tests on arm64")
ba91e7e5d15a ("selftests/mm: add tests for HWPOISON hugetlbfs read")
2bc481362245 ("selftests/mm: add -a to run_vmtests.sh")
63773d2b593d ("Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7c5e8d212d7d81991a580e7de3904ea213d9a852 Mon Sep 17 00:00:00 2001
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Date: Fri, 9 Aug 2024 12:56:42 +0500
Subject: [PATCH] selftests: memfd_secret: don't build memfd_secret test on
unsupported arches
[1] mentions that memfd_secret is only supported on arm64, riscv, x86 and
x86_64 for now. It doesn't support other architectures. I found the
build error on arm and decided to send the fix as it was creating noise on
KernelCI:
memfd_secret.c: In function 'memfd_secret':
memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function);
did you mean 'memfd_secret'?
42 | return syscall(__NR_memfd_secret, flags);
| ^~~~~~~~~~~~~~~~~
| memfd_secret
Hence I'm adding condition that memfd_secret should only be compiled on
supported architectures.
Also check in run_vmtests script if memfd_secret binary is present before
executing it.
Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com
Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1]
Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: James Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 7b8a5def54a1..cfad627e8d94 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -53,7 +53,9 @@ TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
TEST_GEN_FILES += memfd_secret
+endif
TEST_GEN_FILES += migration
TEST_GEN_FILES += mkdirty
TEST_GEN_FILES += mlock-random-test
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 03ac4f2e1cce..36045edb10de 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -374,8 +374,11 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ]
+then
(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
CATEGORY="memfd_secret" run_test ./memfd_secret
+fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
CATEGORY="ksm" run_test ./ksm_tests -H -s 100
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 7c5e8d212d7d81991a580e7de3904ea213d9a852
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081941-agility-fable-8749@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
7c5e8d212d7d ("selftests: memfd_secret: don't build memfd_secret test on unsupported arches")
a3c5cc5129ef ("selftests/mm: log run_vmtests.sh results in TAP format")
2ffc27b15b11 ("tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions")
05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
000303329752 ("selftests/mm: make migration test robust to failure")
f6dd4e223d87 ("selftests/mm: skip soft-dirty tests on arm64")
ba91e7e5d15a ("selftests/mm: add tests for HWPOISON hugetlbfs read")
2bc481362245 ("selftests/mm: add -a to run_vmtests.sh")
63773d2b593d ("Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7c5e8d212d7d81991a580e7de3904ea213d9a852 Mon Sep 17 00:00:00 2001
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Date: Fri, 9 Aug 2024 12:56:42 +0500
Subject: [PATCH] selftests: memfd_secret: don't build memfd_secret test on
unsupported arches
[1] mentions that memfd_secret is only supported on arm64, riscv, x86 and
x86_64 for now. It doesn't support other architectures. I found the
build error on arm and decided to send the fix as it was creating noise on
KernelCI:
memfd_secret.c: In function 'memfd_secret':
memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function);
did you mean 'memfd_secret'?
42 | return syscall(__NR_memfd_secret, flags);
| ^~~~~~~~~~~~~~~~~
| memfd_secret
Hence I'm adding condition that memfd_secret should only be compiled on
supported architectures.
Also check in run_vmtests script if memfd_secret binary is present before
executing it.
Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com
Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1]
Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: James Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 7b8a5def54a1..cfad627e8d94 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -53,7 +53,9 @@ TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
TEST_GEN_FILES += memfd_secret
+endif
TEST_GEN_FILES += migration
TEST_GEN_FILES += mkdirty
TEST_GEN_FILES += mlock-random-test
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 03ac4f2e1cce..36045edb10de 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -374,8 +374,11 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ]
+then
(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
CATEGORY="memfd_secret" run_test ./memfd_secret
+fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
CATEGORY="ksm" run_test ./ksm_tests -H -s 100
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 7c5e8d212d7d81991a580e7de3904ea213d9a852
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081940-stopped-knickers-3a22@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
7c5e8d212d7d ("selftests: memfd_secret: don't build memfd_secret test on unsupported arches")
a3c5cc5129ef ("selftests/mm: log run_vmtests.sh results in TAP format")
2ffc27b15b11 ("tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7c5e8d212d7d81991a580e7de3904ea213d9a852 Mon Sep 17 00:00:00 2001
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Date: Fri, 9 Aug 2024 12:56:42 +0500
Subject: [PATCH] selftests: memfd_secret: don't build memfd_secret test on
unsupported arches
[1] mentions that memfd_secret is only supported on arm64, riscv, x86 and
x86_64 for now. It doesn't support other architectures. I found the
build error on arm and decided to send the fix as it was creating noise on
KernelCI:
memfd_secret.c: In function 'memfd_secret':
memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function);
did you mean 'memfd_secret'?
42 | return syscall(__NR_memfd_secret, flags);
| ^~~~~~~~~~~~~~~~~
| memfd_secret
Hence I'm adding condition that memfd_secret should only be compiled on
supported architectures.
Also check in run_vmtests script if memfd_secret binary is present before
executing it.
Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com
Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1]
Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: James Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 7b8a5def54a1..cfad627e8d94 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -53,7 +53,9 @@ TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
TEST_GEN_FILES += memfd_secret
+endif
TEST_GEN_FILES += migration
TEST_GEN_FILES += mkdirty
TEST_GEN_FILES += mlock-random-test
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 03ac4f2e1cce..36045edb10de 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -374,8 +374,11 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ]
+then
(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
CATEGORY="memfd_secret" run_test ./memfd_secret
+fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
CATEGORY="ksm" run_test ./ksm_tests -H -s 100
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d75abd0d0bc29e6ebfebbf76d11b4067b35844af
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-stardust-create-e577@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
d75abd0d0bc2 ("mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu")
96f96763de26 ("mm: memory-failure: convert to pr_fmt()")
98931dd95fd4 ("Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d75abd0d0bc29e6ebfebbf76d11b4067b35844af Mon Sep 17 00:00:00 2001
From: Waiman Long <longman(a)redhat.com>
Date: Tue, 6 Aug 2024 12:41:07 -0400
Subject: [PATCH] mm/memory-failure: use raw_spinlock_t in struct
memory_failure_cpu
The memory_failure_cpu structure is a per-cpu structure. Access to its
content requires the use of get_cpu_var() to lock in the current CPU and
disable preemption. The use of a regular spinlock_t for locking purpose
is fine for a non-RT kernel.
Since the integration of RT spinlock support into the v5.15 kernel, a
spinlock_t in a RT kernel becomes a sleeping lock and taking a sleeping
lock in a preemption disabled context is illegal resulting in the
following kind of warning.
[12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0
[12135.732252] preempt_count: 1, expected: 0
[12135.732255] RCU nest depth: 2, expected: 2
:
[12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021
[12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred
[12135.732433] Call Trace:
[12135.732436] <TASK>
[12135.732450] dump_stack_lvl+0x57/0x81
[12135.732461] __might_resched.cold+0xf4/0x12f
[12135.732479] rt_spin_lock+0x4c/0x100
[12135.732491] memory_failure_queue+0x40/0xe0
[12135.732503] ghes_do_memory_failure+0x53/0x390
[12135.732516] ghes_do_proc.constprop.0+0x229/0x3e0
[12135.732575] ghes_proc+0xf9/0x1a0
[12135.732591] ghes_notify_hed+0x6a/0x150
[12135.732602] notifier_call_chain+0x43/0xb0
[12135.732626] blocking_notifier_call_chain+0x43/0x60
[12135.732637] acpi_ev_notify_dispatch+0x47/0x70
[12135.732648] acpi_os_execute_deferred+0x13/0x20
[12135.732654] process_one_work+0x41f/0x500
[12135.732695] worker_thread+0x192/0x360
[12135.732715] kthread+0x111/0x140
[12135.732733] ret_from_fork+0x29/0x50
[12135.732779] </TASK>
Fix it by using a raw_spinlock_t for locking instead.
Also move the pr_err() out of the lock critical section and after
put_cpu_ptr() to avoid indeterminate latency and the possibility of sleep
with this call.
[longman(a)redhat.com: don't hold percpu ref across pr_err(), per Miaohe]
Link: https://lkml.kernel.org/r/20240807181130.1122660-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20240806164107.1044956-1-longman@redhat.com
Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Len Brown <len.brown(a)intel.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 581d3e5c9117..7066fc84f351 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2417,7 +2417,7 @@ struct memory_failure_entry {
struct memory_failure_cpu {
DECLARE_KFIFO(fifo, struct memory_failure_entry,
MEMORY_FAILURE_FIFO_SIZE);
- spinlock_t lock;
+ raw_spinlock_t lock;
struct work_struct work;
};
@@ -2443,20 +2443,22 @@ void memory_failure_queue(unsigned long pfn, int flags)
{
struct memory_failure_cpu *mf_cpu;
unsigned long proc_flags;
+ bool buffer_overflow;
struct memory_failure_entry entry = {
.pfn = pfn,
.flags = flags,
};
mf_cpu = &get_cpu_var(memory_failure_cpu);
- spin_lock_irqsave(&mf_cpu->lock, proc_flags);
- if (kfifo_put(&mf_cpu->fifo, entry))
+ raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
+ buffer_overflow = !kfifo_put(&mf_cpu->fifo, entry);
+ if (!buffer_overflow)
schedule_work_on(smp_processor_id(), &mf_cpu->work);
- else
+ raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
+ put_cpu_var(memory_failure_cpu);
+ if (buffer_overflow)
pr_err("buffer overflow when queuing memory failure at %#lx\n",
pfn);
- spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
- put_cpu_var(memory_failure_cpu);
}
EXPORT_SYMBOL_GPL(memory_failure_queue);
@@ -2469,9 +2471,9 @@ static void memory_failure_work_func(struct work_struct *work)
mf_cpu = container_of(work, struct memory_failure_cpu, work);
for (;;) {
- spin_lock_irqsave(&mf_cpu->lock, proc_flags);
+ raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
gotten = kfifo_get(&mf_cpu->fifo, &entry);
- spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
+ raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
if (!gotten)
break;
if (entry.flags & MF_SOFT_OFFLINE)
@@ -2501,7 +2503,7 @@ static int __init memory_failure_init(void)
for_each_possible_cpu(cpu) {
mf_cpu = &per_cpu(memory_failure_cpu, cpu);
- spin_lock_init(&mf_cpu->lock);
+ raw_spin_lock_init(&mf_cpu->lock);
INIT_KFIFO(mf_cpu->fifo);
INIT_WORK(&mf_cpu->work, memory_failure_work_func);
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 58a63729c957621f1990c3494c702711188ca347
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081950-tweed-tattle-7f2c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
58a63729c957 ("net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings")
18010ff776fa ("net: mana: Fix race on per-CQ variable napi work_done")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 58a63729c957621f1990c3494c702711188ca347 Mon Sep 17 00:00:00 2001
From: Long Li <longli(a)microsoft.com>
Date: Fri, 9 Aug 2024 08:58:58 -0700
Subject: [PATCH] net: mana: Fix doorbell out of order violation and avoid
unnecessary doorbell rings
After napi_complete_done() is called when NAPI is polling in the current
process context, another NAPI may be scheduled and start running in
softirq on another CPU and may ring the doorbell before the current CPU
does. When combined with unnecessary rings when there is no need to arm
the CQ, it triggers error paths in the hardware.
This patch fixes this by calling napi_complete_done() after doorbell
rings. It limits the number of unnecessary rings when there is
no need to arm. MANA hardware specifies that there must be one doorbell
ring every 8 CQ wraparounds. This driver guarantees one doorbell ring as
soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical
workloads, the 4 CQ wraparounds proves to be big enough that it rarely
exceeds this limit before all the napi weight is consumed.
To implement this, add a per-CQ counter cq->work_done_since_doorbell,
and make sure the CQ is armed as soon as passing 4 wraparounds of the CQ.
Cc: stable(a)vger.kernel.org
Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Reviewed-by: Haiyang Zhang <haiyangz(a)microsoft.com>
Signed-off-by: Long Li <longli(a)microsoft.com>
Link: https://patch.msgid.link/1723219138-29887-1-git-send-email-longli@linuxonhy…
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index ae717d06e66f..39f56973746d 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1792,7 +1792,6 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
{
struct mana_cq *cq = context;
- u8 arm_bit;
int w;
WARN_ON_ONCE(cq->gdma_cq != gdma_queue);
@@ -1803,16 +1802,23 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
mana_poll_tx_cq(cq);
w = cq->work_done;
+ cq->work_done_since_doorbell += w;
- if (w < cq->budget &&
- napi_complete_done(&cq->napi, w)) {
- arm_bit = SET_ARM_BIT;
- } else {
- arm_bit = 0;
+ if (w < cq->budget) {
+ mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
+ cq->work_done_since_doorbell = 0;
+ napi_complete_done(&cq->napi, w);
+ } else if (cq->work_done_since_doorbell >
+ cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) {
+ /* MANA hardware requires at least one doorbell ring every 8
+ * wraparounds of CQ even if there is no need to arm the CQ.
+ * This driver rings the doorbell as soon as we have exceeded
+ * 4 wraparounds.
+ */
+ mana_gd_ring_cq(gdma_queue, 0);
+ cq->work_done_since_doorbell = 0;
}
- mana_gd_ring_cq(gdma_queue, arm_bit);
-
return w;
}
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 6439fd8b437b..7caa334f4888 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -275,6 +275,7 @@ struct mana_cq {
/* NAPI data */
struct napi_struct napi;
int work_done;
+ int work_done_since_doorbell;
int budget;
};
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 58a63729c957621f1990c3494c702711188ca347
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081949-strategy-gatherer-758e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
58a63729c957 ("net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 58a63729c957621f1990c3494c702711188ca347 Mon Sep 17 00:00:00 2001
From: Long Li <longli(a)microsoft.com>
Date: Fri, 9 Aug 2024 08:58:58 -0700
Subject: [PATCH] net: mana: Fix doorbell out of order violation and avoid
unnecessary doorbell rings
After napi_complete_done() is called when NAPI is polling in the current
process context, another NAPI may be scheduled and start running in
softirq on another CPU and may ring the doorbell before the current CPU
does. When combined with unnecessary rings when there is no need to arm
the CQ, it triggers error paths in the hardware.
This patch fixes this by calling napi_complete_done() after doorbell
rings. It limits the number of unnecessary rings when there is
no need to arm. MANA hardware specifies that there must be one doorbell
ring every 8 CQ wraparounds. This driver guarantees one doorbell ring as
soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical
workloads, the 4 CQ wraparounds proves to be big enough that it rarely
exceeds this limit before all the napi weight is consumed.
To implement this, add a per-CQ counter cq->work_done_since_doorbell,
and make sure the CQ is armed as soon as passing 4 wraparounds of the CQ.
Cc: stable(a)vger.kernel.org
Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Reviewed-by: Haiyang Zhang <haiyangz(a)microsoft.com>
Signed-off-by: Long Li <longli(a)microsoft.com>
Link: https://patch.msgid.link/1723219138-29887-1-git-send-email-longli@linuxonhy…
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index ae717d06e66f..39f56973746d 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1792,7 +1792,6 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
{
struct mana_cq *cq = context;
- u8 arm_bit;
int w;
WARN_ON_ONCE(cq->gdma_cq != gdma_queue);
@@ -1803,16 +1802,23 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
mana_poll_tx_cq(cq);
w = cq->work_done;
+ cq->work_done_since_doorbell += w;
- if (w < cq->budget &&
- napi_complete_done(&cq->napi, w)) {
- arm_bit = SET_ARM_BIT;
- } else {
- arm_bit = 0;
+ if (w < cq->budget) {
+ mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
+ cq->work_done_since_doorbell = 0;
+ napi_complete_done(&cq->napi, w);
+ } else if (cq->work_done_since_doorbell >
+ cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) {
+ /* MANA hardware requires at least one doorbell ring every 8
+ * wraparounds of CQ even if there is no need to arm the CQ.
+ * This driver rings the doorbell as soon as we have exceeded
+ * 4 wraparounds.
+ */
+ mana_gd_ring_cq(gdma_queue, 0);
+ cq->work_done_since_doorbell = 0;
}
- mana_gd_ring_cq(gdma_queue, arm_bit);
-
return w;
}
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 6439fd8b437b..7caa334f4888 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -275,6 +275,7 @@ struct mana_cq {
/* NAPI data */
struct napi_struct napi;
int work_done;
+ int work_done_since_doorbell;
int budget;
};
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 38055789d15155109b41602ad719d770af507030
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081920-gating-yummy-71e0@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
38055789d151 ("wifi: ath12k: use 128 bytes aligned iova in transmit path for WCN7850")
26dd8ccdba4d ("wifi: ath12k: dynamic VLAN support")
97b7cbb7a3cb ("wifi: ath12k: support SMPS configuration for 6 GHz")
f0e61dc7ecf9 ("wifi: ath12k: refactor SMPS configuration")
112dbc6af807 ("wifi: ath12k: add 6 GHz params in peer assoc command")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 38055789d15155109b41602ad719d770af507030 Mon Sep 17 00:00:00 2001
From: Baochen Qiang <quic_bqiang(a)quicinc.com>
Date: Thu, 1 Aug 2024 18:04:07 +0300
Subject: [PATCH] wifi: ath12k: use 128 bytes aligned iova in transmit path for
WCN7850
In transmit path, it is likely that the iova is not aligned to PCIe TLP
max payload size, which is 128 for WCN7850. Normally in such cases hardware
is expected to split the packet into several parts in a manner such that
they, other than the first one, have aligned iova. However due to hardware
limitations, WCN7850 does not behave like that properly with some specific
unaligned iova in transmit path. This easily results in target hang in a
KPI transmit test: packet send/receive failure, WMI command send timeout
etc. Also fatal error seen in PCIe level:
...
Capabilities: ...
...
DevSta: ... FatalErr+ ...
...
...
Work around this by manually moving/reallocating payload buffer such that
we can map it to a 128 bytes aligned iova. The moving requires sufficient
head room or tail room in skb: for the former we can do ourselves a favor
by asking some extra bytes when registering with mac80211, while for the
latter we can do nothing.
Moving/reallocating buffer consumes additional CPU cycles, but the good news
is that an aligned iova increases PCIe efficiency. In my tests on some X86
platforms the KPI results are almost consistent.
Since this is seen only with WCN7850, add a new hardware parameter to
differentiate from others.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
Signed-off-by: Baochen Qiang <quic_bqiang(a)quicinc.com>
Cc: <stable(a)vger.kernel.org>
Tested-by: Mark Pearson <mpearson-lenovo(a)squebb.ca>
Signed-off-by: Kalle Valo <quic_kvalo(a)quicinc.com>
Link: https://patch.msgid.link/20240715023814.20242-1-quic_bqiang@quicinc.com
diff --git a/drivers/net/wireless/ath/ath12k/dp_tx.c b/drivers/net/wireless/ath/ath12k/dp_tx.c
index d08c04343e90..44406e0b4a34 100644
--- a/drivers/net/wireless/ath/ath12k/dp_tx.c
+++ b/drivers/net/wireless/ath/ath12k/dp_tx.c
@@ -162,6 +162,60 @@ static int ath12k_dp_prepare_htt_metadata(struct sk_buff *skb)
return 0;
}
+static void ath12k_dp_tx_move_payload(struct sk_buff *skb,
+ unsigned long delta,
+ bool head)
+{
+ unsigned long len = skb->len;
+
+ if (head) {
+ skb_push(skb, delta);
+ memmove(skb->data, skb->data + delta, len);
+ skb_trim(skb, len);
+ } else {
+ skb_put(skb, delta);
+ memmove(skb->data + delta, skb->data, len);
+ skb_pull(skb, delta);
+ }
+}
+
+static int ath12k_dp_tx_align_payload(struct ath12k_base *ab,
+ struct sk_buff **pskb)
+{
+ u32 iova_mask = ab->hw_params->iova_mask;
+ unsigned long offset, delta1, delta2;
+ struct sk_buff *skb2, *skb = *pskb;
+ unsigned int headroom = skb_headroom(skb);
+ int tailroom = skb_tailroom(skb);
+ int ret = 0;
+
+ offset = (unsigned long)skb->data & iova_mask;
+ delta1 = offset;
+ delta2 = iova_mask - offset + 1;
+
+ if (headroom >= delta1) {
+ ath12k_dp_tx_move_payload(skb, delta1, true);
+ } else if (tailroom >= delta2) {
+ ath12k_dp_tx_move_payload(skb, delta2, false);
+ } else {
+ skb2 = skb_realloc_headroom(skb, iova_mask);
+ if (!skb2) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ dev_kfree_skb_any(skb);
+
+ offset = (unsigned long)skb2->data & iova_mask;
+ if (offset)
+ ath12k_dp_tx_move_payload(skb2, offset, true);
+ *pskb = skb2;
+ }
+
+out:
+ return ret;
+}
+
int ath12k_dp_tx(struct ath12k *ar, struct ath12k_vif *arvif,
struct sk_buff *skb)
{
@@ -184,6 +238,7 @@ int ath12k_dp_tx(struct ath12k *ar, struct ath12k_vif *arvif,
bool tcl_ring_retry;
bool msdu_ext_desc = false;
bool add_htt_metadata = false;
+ u32 iova_mask = ab->hw_params->iova_mask;
if (test_bit(ATH12K_FLAG_CRASH_FLUSH, &ar->ab->dev_flags))
return -ESHUTDOWN;
@@ -279,6 +334,23 @@ int ath12k_dp_tx(struct ath12k *ar, struct ath12k_vif *arvif,
goto fail_remove_tx_buf;
}
+ if (iova_mask &&
+ (unsigned long)skb->data & iova_mask) {
+ ret = ath12k_dp_tx_align_payload(ab, &skb);
+ if (ret) {
+ ath12k_warn(ab, "failed to align TX buffer %d\n", ret);
+ /* don't bail out, give original buffer
+ * a chance even unaligned.
+ */
+ goto map;
+ }
+
+ /* hdr is pointing to a wrong place after alignment,
+ * so refresh it for later use.
+ */
+ hdr = (void *)skb->data;
+ }
+map:
ti.paddr = dma_map_single(ab->dev, skb->data, skb->len, DMA_TO_DEVICE);
if (dma_mapping_error(ab->dev, ti.paddr)) {
atomic_inc(&ab->soc_stats.tx_err.misc_fail);
diff --git a/drivers/net/wireless/ath/ath12k/hw.c b/drivers/net/wireless/ath/ath12k/hw.c
index 2e11ea763574..7b0b6a7f4701 100644
--- a/drivers/net/wireless/ath/ath12k/hw.c
+++ b/drivers/net/wireless/ath/ath12k/hw.c
@@ -924,6 +924,8 @@ static const struct ath12k_hw_params ath12k_hw_params[] = {
.acpi_guid = NULL,
.supports_dynamic_smps_6ghz = true,
+
+ .iova_mask = 0,
},
{
.name = "wcn7850 hw2.0",
@@ -1000,6 +1002,8 @@ static const struct ath12k_hw_params ath12k_hw_params[] = {
.acpi_guid = &wcn7850_uuid,
.supports_dynamic_smps_6ghz = false,
+
+ .iova_mask = ATH12K_PCIE_MAX_PAYLOAD_SIZE - 1,
},
{
.name = "qcn9274 hw2.0",
@@ -1072,6 +1076,8 @@ static const struct ath12k_hw_params ath12k_hw_params[] = {
.acpi_guid = NULL,
.supports_dynamic_smps_6ghz = true,
+
+ .iova_mask = 0,
},
};
diff --git a/drivers/net/wireless/ath/ath12k/hw.h b/drivers/net/wireless/ath/ath12k/hw.h
index e792eb6b249b..b1d302c48326 100644
--- a/drivers/net/wireless/ath/ath12k/hw.h
+++ b/drivers/net/wireless/ath/ath12k/hw.h
@@ -96,6 +96,8 @@
#define ATH12K_M3_FILE "m3.bin"
#define ATH12K_REGDB_FILE_NAME "regdb.bin"
+#define ATH12K_PCIE_MAX_PAYLOAD_SIZE 128
+
enum ath12k_hw_rate_cck {
ATH12K_HW_RATE_CCK_LP_11M = 0,
ATH12K_HW_RATE_CCK_LP_5_5M,
@@ -215,6 +217,8 @@ struct ath12k_hw_params {
const guid_t *acpi_guid;
bool supports_dynamic_smps_6ghz;
+
+ u32 iova_mask;
};
struct ath12k_hw_ops {
diff --git a/drivers/net/wireless/ath/ath12k/mac.c b/drivers/net/wireless/ath/ath12k/mac.c
index 8106297f0bc1..ce41c8153080 100644
--- a/drivers/net/wireless/ath/ath12k/mac.c
+++ b/drivers/net/wireless/ath/ath12k/mac.c
@@ -9193,6 +9193,7 @@ static int ath12k_mac_hw_register(struct ath12k_hw *ah)
hw->vif_data_size = sizeof(struct ath12k_vif);
hw->sta_data_size = sizeof(struct ath12k_sta);
+ hw->extra_tx_headroom = ab->hw_params->iova_mask;
wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_CQM_RSSI_LIST);
wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_STA_TX_PWR);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081926-thrower-salaried-4605@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
4e91fa1ef3ce ("i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume")
14d02fbadb5d ("i2c: qcom-geni: add desc struct to prepare support for I2C Master Hub variant")
d8703554f4de ("i2c: qcom-geni: Add support for GPI DMA")
357ee8841d0b ("i2c: qcom-geni: Store DMA mapping data in geni_i2c_dev struct")
9cb4c67d7717 ("Revert "i2c: i2c-qcom-geni: Fix DMA transfer race"")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8 Mon Sep 17 00:00:00 2001
From: Andi Shyti <andi.shyti(a)kernel.org>
Date: Mon, 12 Aug 2024 21:40:28 +0200
Subject: [PATCH] i2c: qcom-geni: Add missing geni_icc_disable in
geni_i2c_runtime_resume
Add the missing geni_icc_disable() call before returning in the
geni_i2c_runtime_resume() function.
Commit 9ba48db9f77c ("i2c: qcom-geni: Add missing
geni_icc_disable in geni_i2c_runtime_resume") by Gaosheng missed
disabling the interconnect in one case.
Fixes: bf225ed357c6 ("i2c: i2c-qcom-geni: Add interconnect support")
Cc: Gaosheng Cui <cuigaosheng1(a)huawei.com>
Cc: stable(a)vger.kernel.org # v5.9+
Signed-off-by: Andi Shyti <andi.shyti(a)kernel.org>
diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
index 365e37bba0f3..06e836e3e877 100644
--- a/drivers/i2c/busses/i2c-qcom-geni.c
+++ b/drivers/i2c/busses/i2c-qcom-geni.c
@@ -986,8 +986,10 @@ static int __maybe_unused geni_i2c_runtime_resume(struct device *dev)
return ret;
ret = clk_prepare_enable(gi2c->core_clk);
- if (ret)
+ if (ret) {
+ geni_icc_disable(&gi2c->se);
return ret;
+ }
ret = geni_se_resources_on(&gi2c->se);
if (ret) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-editor-flinch-ca97@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
4e91fa1ef3ce ("i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume")
14d02fbadb5d ("i2c: qcom-geni: add desc struct to prepare support for I2C Master Hub variant")
d8703554f4de ("i2c: qcom-geni: Add support for GPI DMA")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8 Mon Sep 17 00:00:00 2001
From: Andi Shyti <andi.shyti(a)kernel.org>
Date: Mon, 12 Aug 2024 21:40:28 +0200
Subject: [PATCH] i2c: qcom-geni: Add missing geni_icc_disable in
geni_i2c_runtime_resume
Add the missing geni_icc_disable() call before returning in the
geni_i2c_runtime_resume() function.
Commit 9ba48db9f77c ("i2c: qcom-geni: Add missing
geni_icc_disable in geni_i2c_runtime_resume") by Gaosheng missed
disabling the interconnect in one case.
Fixes: bf225ed357c6 ("i2c: i2c-qcom-geni: Add interconnect support")
Cc: Gaosheng Cui <cuigaosheng1(a)huawei.com>
Cc: stable(a)vger.kernel.org # v5.9+
Signed-off-by: Andi Shyti <andi.shyti(a)kernel.org>
diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
index 365e37bba0f3..06e836e3e877 100644
--- a/drivers/i2c/busses/i2c-qcom-geni.c
+++ b/drivers/i2c/busses/i2c-qcom-geni.c
@@ -986,8 +986,10 @@ static int __maybe_unused geni_i2c_runtime_resume(struct device *dev)
return ret;
ret = clk_prepare_enable(gi2c->core_clk);
- if (ret)
+ if (ret) {
+ geni_icc_disable(&gi2c->se);
return ret;
+ }
ret = geni_se_resources_on(&gi2c->se);
if (ret) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081927-lapping-sedation-f06f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
85067747cf98 ("dm: do not use waitqueue for request-based DM")
087615bf3acd ("dm mpath: pass IO start time to path selector")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-voting-handwrite-258d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-comic-charcoal-8b91@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081924-fidelity-leverage-7546@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081923-move-improving-38ad@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 74c2ab6d653b4c2354df65a7f7f2df1925a40a51
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081927-sustained-humbly-8aaf@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
74c2ab6d653b ("smb/client: avoid possible NULL dereference in cifs_free_subrequest()")
519be989717c ("cifs: Add a tracepoint to track credits involved in R/W requests")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 74c2ab6d653b4c2354df65a7f7f2df1925a40a51 Mon Sep 17 00:00:00 2001
From: Su Hui <suhui(a)nfschina.com>
Date: Thu, 8 Aug 2024 20:23:32 +0800
Subject: [PATCH] smb/client: avoid possible NULL dereference in
cifs_free_subrequest()
Clang static checker (scan-build) warning:
cifsglob.h:line 890, column 3
Access to field 'ops' results in a dereference of a null pointer.
Commit 519be989717c ("cifs: Add a tracepoint to track credits involved in
R/W requests") adds a check for 'rdata->server', and let clang throw this
warning about NULL dereference.
When 'rdata->credits.value != 0 && rdata->server == NULL' happens,
add_credits_and_wake_if() will call rdata->server->ops->add_credits().
This will cause NULL dereference problem. Add a check for 'rdata->server'
to avoid NULL dereference.
Cc: stable(a)vger.kernel.org
Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks")
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Su Hui <suhui(a)nfschina.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index b2405dd4d4d4..45459af5044d 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -315,7 +315,7 @@ static void cifs_free_subrequest(struct netfs_io_subrequest *subreq)
#endif
}
- if (rdata->credits.value != 0)
+ if (rdata->credits.value != 0) {
trace_smb3_rw_credits(rdata->rreq->debug_id,
rdata->subreq.debug_index,
rdata->credits.value,
@@ -323,8 +323,12 @@ static void cifs_free_subrequest(struct netfs_io_subrequest *subreq)
rdata->server ? rdata->server->in_flight : 0,
-rdata->credits.value,
cifs_trace_rw_credits_free_subreq);
+ if (rdata->server)
+ add_credits_and_wake_if(rdata->server, &rdata->credits, 0);
+ else
+ rdata->credits.value = 0;
+ }
- add_credits_and_wake_if(rdata->server, &rdata->credits, 0);
if (rdata->have_xid)
free_xid(rdata->xid);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081915-antitoxic-kennel-6f2e@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (ie we will
only fail the write in this case, if the user mounts with
"forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081914-divided-overhaul-0e25@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (ie we will
only fail the write in this case, if the user mounts with
"forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081913-dominoes-octagon-edcc@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (ie we will
only fail the write in this case, if the user mounts with
"forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081911-flame-used-1927@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (ie we will
only fail the write in this case, if the user mounts with
"forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081912-hesitate-manor-84c0@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (ie we will
only fail the write in this case, if the user mounts with
"forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081910-dawn-handoff-e37b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (ie we will
only fail the write in this case, if the user mounts with
"forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The return value of drm_atomic_get_crtc_state() needs to be
checked. To avoid use of error pointer 'crtc_state' in case
of the failure.
Cc: stable(a)vger.kernel.org
Fixes: dec92020671c ("drm: Use the state pointer directly in planes atomic_check")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/gpu/drm/sti/sti_cursor.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/sti/sti_cursor.c b/drivers/gpu/drm/sti/sti_cursor.c
index db0a1eb53532..e460f5ba2d87 100644
--- a/drivers/gpu/drm/sti/sti_cursor.c
+++ b/drivers/gpu/drm/sti/sti_cursor.c
@@ -200,6 +200,8 @@ static int sti_cursor_atomic_check(struct drm_plane *drm_plane,
return 0;
crtc_state = drm_atomic_get_crtc_state(state, crtc);
+ if (IS_ERR(crtc_state))
+ return PTR_ERR(crtc_state);
mode = &crtc_state->mode;
dst_x = new_plane_state->crtc_x;
dst_y = new_plane_state->crtc_y;
--
2.25.1
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 6e6f58a170ea98e44075b761f2da42a5aec47dfb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081916-anguished-snooze-7b66@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
6e6f58a170ea ("thermal: gov_bang_bang: Use governor_data to reduce overhead")
5f64b4a1ab1b ("thermal: gov_bang_bang: Add .manage() callback")
84248e35d9b6 ("thermal: gov_bang_bang: Split bang_bang_control()")
b9b6ee6fe258 ("thermal: gov_bang_bang: Call __thermal_cdev_update() directly")
2c637af8a74d ("thermal: gov_bang_bang: Drop unnecessary cooling device target state checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6e6f58a170ea98e44075b761f2da42a5aec47dfb Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Tue, 13 Aug 2024 16:29:11 +0200
Subject: [PATCH] thermal: gov_bang_bang: Use governor_data to reduce overhead
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
After running once, the for_each_trip_desc() loop in
bang_bang_manage() is pure needless overhead because it is not going to
make any changes unless a new cooling device has been bound to one of
the trips in the thermal zone or the system is resuming from sleep.
For this reason, make bang_bang_manage() set governor_data for the
thermal zone and check it upfront to decide whether or not it needs to
do anything.
However, governor_data needs to be reset in some cases to let
bang_bang_manage() know that it should walk the trips again, so add an
.update_tz() callback to the governor and make the core additionally
invoke it during system resume.
To avoid affecting the other users of that callback unnecessarily, add
a special notification reason for system resume, THERMAL_TZ_RESUME, and
also pass it to __thermal_zone_device_update() called during system
resume for consistency.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Peter Kästle <peter(a)piie.net>
Reviewed-by: Zhang Rui <rui.zhang(a)intel.com>
Cc: 6.10+ <stable(a)vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/2285575.iZASKD2KPV@rjwysocki.net
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c
index bc55e0698bfa..daed67d19efb 100644
--- a/drivers/thermal/gov_bang_bang.c
+++ b/drivers/thermal/gov_bang_bang.c
@@ -86,6 +86,10 @@ static void bang_bang_manage(struct thermal_zone_device *tz)
const struct thermal_trip_desc *td;
struct thermal_instance *instance;
+ /* If the code below has run already, nothing needs to be done. */
+ if (tz->governor_data)
+ return;
+
for_each_trip_desc(tz, td) {
const struct thermal_trip *trip = &td->trip;
@@ -107,11 +111,25 @@ static void bang_bang_manage(struct thermal_zone_device *tz)
bang_bang_set_instance_target(instance, 0);
}
}
+
+ tz->governor_data = (void *)true;
+}
+
+static void bang_bang_update_tz(struct thermal_zone_device *tz,
+ enum thermal_notify_event reason)
+{
+ /*
+ * Let bang_bang_manage() know that it needs to walk trips after binding
+ * a new cdev and after system resume.
+ */
+ if (reason == THERMAL_TZ_BIND_CDEV || reason == THERMAL_TZ_RESUME)
+ tz->governor_data = NULL;
}
static struct thermal_governor thermal_gov_bang_bang = {
.name = "bang_bang",
.trip_crossed = bang_bang_control,
.manage = bang_bang_manage,
+ .update_tz = bang_bang_update_tz,
};
THERMAL_GOVERNOR_DECLARE(thermal_gov_bang_bang);
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 95c399f94744..e6669aeda1ff 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1728,7 +1728,8 @@ static void thermal_zone_device_resume(struct work_struct *work)
thermal_debug_tz_resume(tz);
thermal_zone_device_init(tz);
- __thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+ thermal_governor_update_tz(tz, THERMAL_TZ_RESUME);
+ __thermal_zone_device_update(tz, THERMAL_TZ_RESUME);
complete(&tz->resume);
tz->resuming = false;
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 25fbf960b474..b86ddca46b9e 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -55,6 +55,7 @@ enum thermal_notify_event {
THERMAL_TZ_BIND_CDEV, /* Cooling dev is bind to the thermal zone */
THERMAL_TZ_UNBIND_CDEV, /* Cooling dev is unbind from the thermal zone */
THERMAL_INSTANCE_WEIGHT_CHANGED, /* Thermal instance weight changed */
+ THERMAL_TZ_RESUME, /* Thermal zone is resuming after system sleep */
};
/**
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 5f64b4a1ab1b0412446d42e1fc2964c2cdb60b27
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081911-parlor-reformer-542a@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
5f64b4a1ab1b ("thermal: gov_bang_bang: Add .manage() callback")
84248e35d9b6 ("thermal: gov_bang_bang: Split bang_bang_control()")
b9b6ee6fe258 ("thermal: gov_bang_bang: Call __thermal_cdev_update() directly")
2c637af8a74d ("thermal: gov_bang_bang: Drop unnecessary cooling device target state checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5f64b4a1ab1b0412446d42e1fc2964c2cdb60b27 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Tue, 13 Aug 2024 16:27:33 +0200
Subject: [PATCH] thermal: gov_bang_bang: Add .manage() callback
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
After recent changes, the Bang-bang governor may not adjust the
initial configuration of cooling devices to the actual situation.
Namely, if a cooling device bound to a certain trip point starts in
the "on" state and the thermal zone temperature is below the threshold
of that trip point, the trip point may never be crossed on the way up
in which case the state of the cooling device will never be adjusted
because the thermal core will never invoke the governor's
.trip_crossed() callback. [Note that there is no issue if the zone
temperature is at the trip threshold or above it to start with because
.trip_crossed() will be invoked then to indicate the start of thermal
mitigation for the given trip.]
To address this, add a .manage() callback to the Bang-bang governor
and use it to ensure that all of the thermal instances managed by the
governor have been initialized properly and the states of all of the
cooling devices involved have been adjusted to the current zone
temperature as appropriate.
Fixes: 530c932bdf75 ("thermal: gov_bang_bang: Use .trip_crossed() instead of .throttle()")
Link: https://lore.kernel.org/linux-pm/1bfbbae5-42b0-4c7d-9544-e98855715294@piie.…
Cc: 6.10+ <stable(a)vger.kernel.org> # 6.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Peter Kästle <peter(a)piie.net>
Reviewed-by: Zhang Rui <rui.zhang(a)intel.com>
Link: https://patch.msgid.link/8419356.T7Z3S40VBb@rjwysocki.net
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c
index 87cff3ea77a9..bc55e0698bfa 100644
--- a/drivers/thermal/gov_bang_bang.c
+++ b/drivers/thermal/gov_bang_bang.c
@@ -26,6 +26,7 @@ static void bang_bang_set_instance_target(struct thermal_instance *instance,
* when the trip is crossed on the way down.
*/
instance->target = target;
+ instance->initialized = true;
dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target);
@@ -80,8 +81,37 @@ static void bang_bang_control(struct thermal_zone_device *tz,
}
}
+static void bang_bang_manage(struct thermal_zone_device *tz)
+{
+ const struct thermal_trip_desc *td;
+ struct thermal_instance *instance;
+
+ for_each_trip_desc(tz, td) {
+ const struct thermal_trip *trip = &td->trip;
+
+ if (tz->temperature >= td->threshold ||
+ trip->temperature == THERMAL_TEMP_INVALID ||
+ trip->type == THERMAL_TRIP_CRITICAL ||
+ trip->type == THERMAL_TRIP_HOT)
+ continue;
+
+ /*
+ * If the initial cooling device state is "on", but the zone
+ * temperature is not above the trip point, the core will not
+ * call bang_bang_control() until the zone temperature reaches
+ * the trip point temperature which may be never. In those
+ * cases, set the initial state of the cooling device to 0.
+ */
+ list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
+ if (!instance->initialized && instance->trip == trip)
+ bang_bang_set_instance_target(instance, 0);
+ }
+ }
+}
+
static struct thermal_governor thermal_gov_bang_bang = {
.name = "bang_bang",
.trip_crossed = bang_bang_control,
+ .manage = bang_bang_manage,
};
THERMAL_GOVERNOR_DECLARE(thermal_gov_bang_bang);
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 84248e35d9b60e03df7276627e4e91fbaf80f73d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081904-sprite-urologist-d8e9@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
84248e35d9b6 ("thermal: gov_bang_bang: Split bang_bang_control()")
b9b6ee6fe258 ("thermal: gov_bang_bang: Call __thermal_cdev_update() directly")
2c637af8a74d ("thermal: gov_bang_bang: Drop unnecessary cooling device target state checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 84248e35d9b60e03df7276627e4e91fbaf80f73d Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Tue, 13 Aug 2024 16:26:42 +0200
Subject: [PATCH] thermal: gov_bang_bang: Split bang_bang_control()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Move the setting of the thermal instance target state from
bang_bang_control() into a separate function that will be also called
in a different place going forward.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Peter Kästle <peter(a)piie.net>
Reviewed-by: Zhang Rui <rui.zhang(a)intel.com>
Cc: 6.10+ <stable(a)vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/3313587.aeNJFYEL58@rjwysocki.net
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c
index b9474c6af72b..87cff3ea77a9 100644
--- a/drivers/thermal/gov_bang_bang.c
+++ b/drivers/thermal/gov_bang_bang.c
@@ -13,6 +13,27 @@
#include "thermal_core.h"
+static void bang_bang_set_instance_target(struct thermal_instance *instance,
+ unsigned int target)
+{
+ if (instance->target != 0 && instance->target != 1 &&
+ instance->target != THERMAL_NO_TARGET)
+ pr_debug("Unexpected state %ld of thermal instance %s in bang-bang\n",
+ instance->target, instance->name);
+
+ /*
+ * Enable the fan when the trip is crossed on the way up and disable it
+ * when the trip is crossed on the way down.
+ */
+ instance->target = target;
+
+ dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target);
+
+ mutex_lock(&instance->cdev->lock);
+ __thermal_cdev_update(instance->cdev);
+ mutex_unlock(&instance->cdev->lock);
+}
+
/**
* bang_bang_control - controls devices associated with the given zone
* @tz: thermal_zone_device
@@ -54,25 +75,8 @@ static void bang_bang_control(struct thermal_zone_device *tz,
tz->temperature, trip->hysteresis);
list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
- if (instance->trip != trip)
- continue;
-
- if (instance->target != 0 && instance->target != 1 &&
- instance->target != THERMAL_NO_TARGET)
- pr_debug("Unexpected state %ld of thermal instance %s in bang-bang\n",
- instance->target, instance->name);
-
- /*
- * Enable the fan when the trip is crossed on the way up and
- * disable it when the trip is crossed on the way down.
- */
- instance->target = crossed_up;
-
- dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target);
-
- mutex_lock(&instance->cdev->lock);
- __thermal_cdev_update(instance->cdev);
- mutex_unlock(&instance->cdev->lock);
+ if (instance->trip == trip)
+ bang_bang_set_instance_target(instance, crossed_up);
}
}