Set the node limit of the root node so that the last pivot of all nodes
is the node limit (if the node is not full).
This patch also fixes a bug in mas_rev_awalk(). Effectively, always
setting a maximum makes mas_logical_pivot() behave as mas_safe_pivot().
Without this fix, it is possible that very small tasks would fail to
find the correct gap. Although this has not been observed with real
tasks, it has been reported to happen in m68k nommu running the maple
tree tests.
Link: https://lore.kernel.org/linux-mm/CAMuHMdV4T53fOw7VPoBgPR7fP6RYqf=CBhD_y_vOg…
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
lib/maple_tree.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index d3072858c280..f55e59bd9122 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -3692,7 +3692,8 @@ static inline int mas_root_expand(struct ma_state *mas, void *entry)
mas->offset = slot;
pivots[slot] = mas->last;
if (mas->last != ULONG_MAX)
- slot++;
+ pivots[++slot] = ULONG_MAX;
+
mas->depth = 1;
mas_set_height(mas);
ma_set_meta(node, maple_leaf_64, 0, slot);
--
2.20.1
On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote:
> Hi,
>
> This box is newly booted into linux v6.1.35 (2 days ago), it was previously
> running v5.15.118 without any problems (other than that fixed by
> "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason for the
> upgrade).
>
> I have rm operations on two files that have been stuck for in excess of 22
> hours and 18 hours respectively:
>
> $ ps -opid,lstart,state,wchan=WCHAN-xxxxxxxxxxxxxxx,cmd -C rm
> PID STARTED S WCHAN-xxxxxxxxxxxxxxx CMD
> 2379355 Mon Jul 10 09:07:57 2023 D vfs_unlink /bin/rm -rf /aaa/5539_tmp
> 2392421 Mon Jul 10 09:18:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
> 2485728 Mon Jul 10 09:28:57 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
> 2488254 Mon Jul 10 09:39:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
> 2491180 Mon Jul 10 09:49:58 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
> 3014914 Mon Jul 10 13:00:33 2023 D vfs_unlink /bin/rm -rf /bbb/5541_tmp
> 3095893 Mon Jul 10 13:11:03 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
> 3098809 Mon Jul 10 13:21:35 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
> 3101387 Mon Jul 10 13:32:06 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
> 3195017 Mon Jul 10 13:42:37 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
>
> The "rm"s are run from a process that's obviously tried a few times to get
> rid of these files.
>
> There's nothing extraordinary about the files in terms of size:
>
> $ ls -ltrn --full-time /aaa/5539_tmp /bbb/5541_tmp
> -rw-rw-rw- 1 1482 1482 7870643 2023-07-10 06:07:58.684036505 +1000 /aaa/5539_tmp
> -rw-rw-rw- 1 1482 1482 701240 2023-07-10 10:00:34.181064549 +1000 /bbb/5541_tmp
>
> As hinted by the WCHAN in the ps output above, each "primary" rm (i.e. the
> first one run on each file) stack trace looks like:
>
> [<0>] vfs_unlink+0x48/0x270
> [<0>] do_unlinkat+0x1f5/0x290
> [<0>] __x64_sys_unlinkat+0x3b/0x60
> [<0>] do_syscall_64+0x34/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> And each "secondary" rm (i.e. the subsequent ones on each file) stack trace
> looks like:
>
> == blog-230710-xfs-rm-stuckd
> [<0>] down_write_nested+0xdc/0x100
> [<0>] do_unlinkat+0x10d/0x290
> [<0>] __x64_sys_unlinkat+0x3b/0x60
> [<0>] do_syscall_64+0x34/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> Multiple kernel strack traces don't show vfs_unlink or anything related that
> I can see, or anything else consistent or otherwise interesting. Most cores
> are idle.
>
> Each of /aaa and /bbb are separate XFS filesystems:
>
> $ xfs_info /aaa
> meta-data=/dev/mapper/aaa isize=512 agcount=2, agsize=268434432 blks
> = sectsz=4096 attr=2, projid32bit=1
> = crc=1 finobt=1, sparse=1, rmapbt=1
> = reflink=1 bigtime=1 inobtcount=1
> data = bsize=4096 blocks=536868864, imaxpct=5
> = sunit=256 swidth=256 blks
> naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> log =internal log bsize=4096 blocks=262143, version=2
> = sectsz=4096 sunit=1 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
> $ xfs_info /bbb
> meta-data=/dev/mapper/bbb isize=512 agcount=8, agsize=268434432 blks
> = sectsz=4096 attr=2, projid32bit=1
> = crc=1 finobt=1, sparse=1, rmapbt=1
> = reflink=1 bigtime=1 inobtcount=1
> data = bsize=4096 blocks=1879047168, imaxpct=5
> = sunit=256 swidth=256 blks
> naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> log =internal log bsize=4096 blocks=521728, version=2
> = sectsz=4096 sunit=1 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
> There's plenty of free space at the fs level:
>
> $ df -h /aaa /bbb
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/aaa 2.0T 551G 1.5T 27% /aaa
> /dev/mapper/bbb 7.0T 3.6T 3.5T 52% /bbb
>
> The fses are on sparse ceph/rbd volumes, the underlying storage tells me
> they're 50-60% utilised:
>
> aaa: provisioned="2048G" used="1015.9G"
> bbb: provisioned="7168G" used="4925.3G"
>
> Where to from here?
>
> I'm guessing only a reboot is going to unstick this. Anything I should be
> looking at before reverting to v5.15.118?
>
> ...subsequent to starting writing all this down I have another two sets of
> rms stuck, again on unremarkable files, and on two more separate
> filesystems.
>
> ...oh. And an 'ls' on those files is hanging. The reboot has become more
> urgent.
>
Smells like regression resurfaced, right? I mean, does 5e672cd69f0a53 not
completely fix your reported blocking regression earlier?
I'm kinda confused...
--
An old man doll... just what I always wanted! - Clara
In below thousands of screen rotation loop tests with virtual display
enabled, a CPU hard lockup issue may happen, leading system to unresponsive
and crash.
do {
xrandr --output Virtual --rotate inverted
xrandr --output Virtual --rotate right
xrandr --output Virtual --rotate left
xrandr --output Virtual --rotate normal
} while (1);
NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
? hrtimer_run_softirq+0x140/0x140
? store_vblank+0xe0/0xe0 [drm]
hrtimer_cancel+0x15/0x30
amdgpu_vkms_disable_vblank+0x15/0x30 [amdgpu]
drm_vblank_disable_and_save+0x185/0x1f0 [drm]
drm_crtc_vblank_off+0x159/0x4c0 [drm]
? record_print_text.cold+0x11/0x11
? wait_for_completion_timeout+0x232/0x280
? drm_crtc_wait_one_vblank+0x40/0x40 [drm]
? bit_wait_io_timeout+0xe0/0xe0
? wait_for_completion_interruptible+0x1d7/0x320
? mutex_unlock+0x81/0xd0
amdgpu_vkms_crtc_atomic_disable
It's caused by a stuck in lock dependency in such scenario on different
CPUs.
CPU1 CPU2
drm_crtc_vblank_off hrtimer_interrupt
grab event_lock (irq disabled) __hrtimer_run_queues
grab vbl_lock/vblank_time_block amdgpu_vkms_vblank_simulate
amdgpu_vkms_disable_vblank drm_handle_vblank
hrtimer_cancel grab dev->event_lock
So CPU1 stucks in hrtimer_cancel as timer callback is running endless on
current clock base, as that timer queue on CPU2 has no chance to finish it
because of failing to hold the lock. So NMI watchdog will throw the errors
after its threshold, and all later CPUs are impacted/blocked.
So use hrtimer_try_to_cancel to fix this, as disable_vblank callback
does not need to wait the handler to finish. And also it's not necessary
to check the return value of hrtimer_try_to_cancel, because even if it's
-1 which means current timer callback is running, it will be reprogrammed
in hrtimer_start with calling enable_vblank to make it works.
v2: only re-arm timer when vblank is enabled (Christian) and add a Fixes
tag as well
Fixes: 84ec374bd580("drm/amdgpu: create amdgpu_vkms (v4)")
Cc: stable(a)vger.kernel.org
Suggested-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Guchun Chen <guchun.chen(a)amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index 53ff91fc6cf6..44d704306f44 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -46,7 +46,10 @@ static enum hrtimer_restart amdgpu_vkms_vblank_simulate(struct hrtimer *timer)
struct amdgpu_crtc *amdgpu_crtc = container_of(timer, struct amdgpu_crtc, vblank_timer);
struct drm_crtc *crtc = &amdgpu_crtc->base;
struct amdgpu_vkms_output *output = drm_crtc_to_amdgpu_vkms_output(crtc);
+ struct drm_vblank_crtc *vblank;
+ struct drm_device *dev;
u64 ret_overrun;
+ unsigned int pipe;
bool ret;
ret_overrun = hrtimer_forward_now(&amdgpu_crtc->vblank_timer,
@@ -54,9 +57,15 @@ static enum hrtimer_restart amdgpu_vkms_vblank_simulate(struct hrtimer *timer)
if (ret_overrun != 1)
DRM_WARN("%s: vblank timer overrun\n", __func__);
+ dev = crtc->dev;
+ pipe = drm_crtc_index(crtc);
+ vblank = &dev->vblank[pipe];
ret = drm_crtc_handle_vblank(crtc);
- if (!ret)
- DRM_ERROR("amdgpu_vkms failure on handling vblank");
+ if (!ret && !READ_ONCE(vblank->enabled)) {
+ /* Don't queue timer again when vblank is disabled. */
+ DRM_WARN("amdgpu_vkms failure on handling vblank\n");
+ return HRTIMER_NORESTART;
+ }
return HRTIMER_RESTART;
}
@@ -81,7 +90,7 @@ static void amdgpu_vkms_disable_vblank(struct drm_crtc *crtc)
{
struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
- hrtimer_cancel(&amdgpu_crtc->vblank_timer);
+ hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer);
}
static bool amdgpu_vkms_get_vblank_timestamp(struct drm_crtc *crtc,
--
2.25.1
commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
sets the policy that all PCIe ports are allowed to use D3. When
the system is suspended if the port is not power manageable by the
platform and won't be used for wakeup via a PME this sets up the
policy for these ports to go into D3hot.
This policy generally makes sense from an OSPM perspective but it leads
to problems with wakeup from suspend on laptops with AMD chips:
- On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
interrupt.
- On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
Add a quirk for the PCI device ID used by the problematic root port on
both chips to ensure that these root ports are not put into D3hot at
suspend.
Cc: stable(a)vger.kernel.org # 6.1+
Reported-by: Iain Lane <iain(a)orangesquash.org.uk>
Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-exte…
Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/pci/quirks.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 321156ca273d5..e0346073e5855 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3867,6 +3867,22 @@ static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
quirk_apple_poweroff_thunderbolt);
+
+/*
+ * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3hot
+ * may cause problems when the system attempts wake up from s2idle.
+ *
+ * On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
+ * interrupt.
+ * On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
+ */
+static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
+{
+ if (!acpi_pci_power_manageable(pdev))
+ pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, quirk_ryzen_rp_d3);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, quirk_ryzen_rp_d3);
#endif
/*
--
2.34.1
The HP Elite mt645 thin client exhibits resume times of over one
minute when using the normal nvme resume path. BIOS has tried to work
around this by setting the "StorageD3Enable" ACPI property, but only
if it detected the "Linux-HPI-Hybrid-Graphics" _OSI() flag. This flag
does not exist, so the BIOS workaround can't work.
Instead, just set NVME_QUIRK_SIMPLE_SUSPEND when running on an mt645.
The DMI_PRODUCT_NAME cannot be used because this string can be changed
in the field. Match against DMI_BOARD_NAME, which should be immutable.
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexandru Gagniuc <alexandru.gagniuc(a)hp.com>
---
drivers/nvme/host/pci.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 492f319ebdf3..25b59f5ce874 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2897,6 +2897,15 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
if ((dmi_match(DMI_BOARD_VENDOR, "LENOVO")) &&
dmi_match(DMI_BOARD_NAME, "LNVNB161216"))
return NVME_QUIRK_SIMPLE_SUSPEND;
+ } else if (dmi_match(DMI_SYS_VENDOR, "HP") &&
+ (dmi_match(DMI_BOARD_NAME, "8B0F") ||
+ dmi_match(DMI_BOARD_NAME, "8B59"))) {
+ /*
+ * Force simple suspend to work around long resume latencies
+ * (1 minute or longer).
+ */
+ dev_info(&pdev->dev, "simple suspend quirk for HP mt645\n");
+ return NVME_QUIRK_SIMPLE_SUSPEND;
}
return 0;
--
2.39.1
From: Souradeep Chakrabarti <schakrabarti(a)linux.microsoft.com>
When unloading the MANA driver, mana_dealloc_queues() waits for the MANA
hardware to complete any inflight packets and set the pending send count
to zero. But if the hardware has failed, mana_dealloc_queues()
could wait forever.
Fix this by adding a timeout to the wait. Set the timeout to 120 seconds,
which is a somewhat arbitrary value that is more than long enough for
functional hardware to complete any sends.
Signed-off-by: Souradeep Chakrabarti <schakrabarti(a)linux.microsoft.com>
---
V3 -> V4:
* Fixed the commit message to describe the context.
* Removed the vf_unload_timeout, as it is not required.
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 26 ++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index a499e460594b..d26f1da70411 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2346,7 +2346,10 @@ static int mana_dealloc_queues(struct net_device *ndev)
{
struct mana_port_context *apc = netdev_priv(ndev);
struct gdma_dev *gd = apc->ac->gdma_dev;
+ unsigned long timeout;
struct mana_txq *txq;
+ struct sk_buff *skb;
+ struct mana_cq *cq;
int i, err;
if (apc->port_is_up)
@@ -2363,15 +2366,32 @@ static int mana_dealloc_queues(struct net_device *ndev)
* to false, but it doesn't matter since mana_start_xmit() drops any
* new packets due to apc->port_is_up being false.
*
- * Drain all the in-flight TX packets
+ * Drain all the in-flight TX packets.
+ * A timeout of 120 seconds for all the queues is used.
+ * This will break the while loop when h/w is not responding.
+ * This value of 120 has been decided here considering max
+ * number of queues.
*/
+
+ timeout = jiffies + 120 * HZ;
for (i = 0; i < apc->num_queues; i++) {
txq = &apc->tx_qp[i].txq;
-
- while (atomic_read(&txq->pending_sends) > 0)
+ while (atomic_read(&txq->pending_sends) > 0 &&
+ time_before(jiffies, timeout)) {
usleep_range(1000, 2000);
+ }
}
+ for (i = 0; i < apc->num_queues; i++) {
+ txq = &apc->tx_qp[i].txq;
+ cq = &apc->tx_qp[i].tx_cq;
+ while (atomic_read(&txq->pending_sends)) {
+ skb = skb_dequeue(&txq->pending_skbs);
+ mana_unmap_skb(skb, apc);
+ napi_consume_skb(skb, cq->budget);
+ atomic_sub(1, &txq->pending_sends);
+ }
+ }
/* We're 100% sure the queues can no longer be woken up, because
* we're sure now mana_poll_tx_cq() can't be running.
*/
--
2.34.1
recv_data either returns the number of received bytes, or a negative value
representing an error code. Adding the return value directly to the total
number of received bytes therefore looks a little weird, since it might add
a negative error code to a sum of bytes.
The following check for size < expected usually makes the function return
ETIME in that case, so it does not cause too many problems in practice. But
to make the code look cleaner and because the caller might still be
interested in the original error code, explicitly check for the presence of
an error code and pass that through.
Cc: stable(a)vger.kernel.org
Fixes: cb5354253af2 ("[PATCH] tpm: spacing cleanups 2")
Signed-off-by: Alexander Steffen <Alexander.Steffen(a)infineon.com>
---
drivers/char/tpm/tpm_tis_core.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 558144fa707a..aaaa136044ae 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -363,8 +363,13 @@ static int tpm_tis_recv(struct tpm_chip *chip, u8 *buf, size_t count)
goto out;
}
- size += recv_data(chip, &buf[TPM_HEADER_SIZE],
- expected - TPM_HEADER_SIZE);
+ rc = recv_data(chip, &buf[TPM_HEADER_SIZE],
+ expected - TPM_HEADER_SIZE);
+ if (rc < 0) {
+ size = rc;
+ goto out;
+ }
+ size += rc;
if (size < expected) {
dev_err(&chip->dev, "Unable to read remainder of result\n");
size = -ETIME;
--
2.25.1
Since commit 241d2fb56a18 ("of: Make OF framebuffer device names unique"),
as spotted by Frédéric Bonnard, the historical "of-display" device is
gone: the updated logic creates "of-display.0" instead, then as many
"of-display.N" as required.
This means that offb no longer finds the expected device, which prevents
the Debian Installer from setting up its interface, at least on ppc64el.
Given the code similarity it is likely to affect ofdrm in the same way.
It might be better to iterate on all possible nodes, but updating the
hardcoded device from "of-display" to "of-display.0" is likely to help
as a first step.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217328
Link: https://bugs.debian.org/1033058
Fixes: 241d2fb56a18 ("of: Make OF framebuffer device names unique")
Cc: stable(a)vger.kernel.org # v6.2+
Signed-off-by: Cyril Brulebois <cyril(a)debamax.com>
---
drivers/gpu/drm/tiny/ofdrm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index 6e349ca42485..92df021d71df 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -1390,7 +1390,7 @@ MODULE_DEVICE_TABLE(of, ofdrm_of_match_display);
static struct platform_driver ofdrm_platform_driver = {
.driver = {
- .name = "of-display",
+ .name = "of-display.0",
.of_match_table = ofdrm_of_match_display,
},
.probe = ofdrm_probe,
--
2.30.2