This reverts commit 71598a5a7797f0052aaa7bcff0b8d4b8f20f1441.
This commit introduced a regression, however the fix for the
regression:
aa5fc4362fac ("drm/amdgpu: fix task hang from failed job submission during process kill")
depends on things not yet present in 6.12.y and older kernels. Since
this commit is more of an optimization, just revert it for
6.12.y and older stable kernels.
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.1.x - 6.12.x
---
Please apply this revert to 6.1.x to 6.12.x stable trees. The newer
stable trees and Linus' tree already have the regression fix.
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 0adb106e2c42..37d53578825b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2292,11 +2292,13 @@ void amdgpu_vm_adjust_size(struct amdgpu_device *adev, uint32_t min_vm_size,
*/
long amdgpu_vm_wait_idle(struct amdgpu_vm *vm, long timeout)
{
- timeout = drm_sched_entity_flush(&vm->immediate, timeout);
+ timeout = dma_resv_wait_timeout(vm->root.bo->tbo.base.resv,
+ DMA_RESV_USAGE_BOOKKEEP,
+ true, timeout);
if (timeout <= 0)
return timeout;
- return drm_sched_entity_flush(&vm->delayed, timeout);
+ return dma_fence_wait_timeout(vm->last_unlocked, true, timeout);
}
static void amdgpu_vm_destroy_task_info(struct kref *kref)
--
2.51.0
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 1f403699c40f0806a707a9a6eed3b8904224021a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025090146-playback-kinsman-373c@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1f403699c40f0806a707a9a6eed3b8904224021a Mon Sep 17 00:00:00 2001
From: Ma Ke <make24(a)iscas.ac.cn>
Date: Tue, 12 Aug 2025 15:19:32 +0800
Subject: [PATCH] drm/mediatek: Fix device/node reference count leaks in
mtk_drm_get_all_drm_priv
Using device_find_child() and of_find_device_by_node() to locate
devices could cause an imbalance in the device's reference count.
device_find_child() and of_find_device_by_node() both call
get_device() to increment the reference count of the found device
before returning the pointer. In mtk_drm_get_all_drm_priv(), these
references are never released through put_device(), resulting in
permanent reference count increments. Additionally, the
for_each_child_of_node() iterator fails to release node references in
all code paths. This leaks device node references when loop
termination occurs before reaching MAX_CRTC. These reference count
leaks may prevent device/node resources from being properly released
during driver unbind operations.
As comment of device_find_child() says, 'NOTE: you will need to drop
the reference with put_device() after use'.
Cc: stable(a)vger.kernel.org
Fixes: 1ef7ed48356c ("drm/mediatek: Modify mediatek-drm for mt8195 multi mmsys support")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
Reviewed-by: CK Hu <ck.hu(a)mediatek.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20250812071932.471730-…
Signed-off-by: Chun-Kuang Hu <chunkuang.hu(a)kernel.org>
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index d5e6bab36414..f8a817689e16 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -387,19 +387,19 @@ static bool mtk_drm_get_all_drm_priv(struct device *dev)
of_id = of_match_node(mtk_drm_of_ids, node);
if (!of_id)
- continue;
+ goto next_put_node;
pdev = of_find_device_by_node(node);
if (!pdev)
- continue;
+ goto next_put_node;
drm_dev = device_find_child(&pdev->dev, NULL, mtk_drm_match);
if (!drm_dev)
- continue;
+ goto next_put_device_pdev_dev;
temp_drm_priv = dev_get_drvdata(drm_dev);
if (!temp_drm_priv)
- continue;
+ goto next_put_device_drm_dev;
if (temp_drm_priv->data->main_len)
all_drm_priv[CRTC_MAIN] = temp_drm_priv;
@@ -411,10 +411,17 @@ static bool mtk_drm_get_all_drm_priv(struct device *dev)
if (temp_drm_priv->mtk_drm_bound)
cnt++;
- if (cnt == MAX_CRTC) {
- of_node_put(node);
+next_put_device_drm_dev:
+ put_device(drm_dev);
+
+next_put_device_pdev_dev:
+ put_device(&pdev->dev);
+
+next_put_node:
+ of_node_put(node);
+
+ if (cnt == MAX_CRTC)
break;
- }
}
if (drm_priv->data->mmsys_dev_num == cnt) {
The current implementation of CXL memory hotplug notifier gets called
before the HMAT memory hotplug notifier. The CXL driver calculates the
access coordinates (bandwidth and latency values) for the CXL end to
end path (i.e. CPU to endpoint). When the CXL region is onlined, the CXL
memory hotplug notifier writes the access coordinates to the HMAT target
structs. Then the HMAT memory hotplug notifier is called and it creates
the access coordinates for the node sysfs attributes.
During testing on an Intel platform, it was found that although the
newly calculated coordinates were pushed to sysfs, the sysfs attributes for
the access coordinates showed up with the wrong initiator. The system has
4 nodes (0, 1, 2, 3) where node 0 and 1 are CPU nodes and node 2 and 3 are
CXL nodes. The expectation is that node 2 would show up as a target to node
0:
/sys/devices/system/node/node2/access0/initiators/node0
However it was observed that node 2 showed up as a target under node 1:
/sys/devices/system/node/node2/access0/initiators/node1
The original intent of the 'ext_updated' flag in HMAT handling code was to
stop HMAT memory hotplug callback from clobbering the access coordinates
after CXL has injected its calculated coordinates and replaced the generic
target access coordinates provided by the HMAT table in the HMAT target
structs. However the flag is hacky at best and blocks the updates from
other CXL regions that are onlined in the same node later on. Remove the
'ext_updated' flag usage and just update the access coordinates for the
nodes directly without touching HMAT target data.
The hotplug memory callback ordering is changed. Instead of changing CXL,
move HMAT back so there's room for the levels rather than have CXL share
the same level as SLAB_CALLBACK_PRI. The change will resulting in the CXL
callback to be executed after the HMAT callback.
With the change, the CXL hotplug memory notifier runs after the HMAT
callback. The HMAT callback will create the node sysfs attributes for
access coordinates. The CXL callback will write the access coordinates to
the now created node sysfs attributes directly and will not pollute the
HMAT target values.
A nodemask is introduced to keep track if a node has been updated and
prevents further updates.
Fixes: 067353a46d8c ("cxl/region: Add memory hotplug notifier for cxl region")
Cc: stable(a)vger.kernel.org
Tested-by: Marc Herbert <marc.herbert(a)linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Dave Jiang <dave.jiang(a)intel.com>
---
v3:
- Use nodemask instead of xarray to keep track of node updates (Jonathan)
---
drivers/acpi/numa/hmat.c | 6 ------
drivers/cxl/core/cdat.c | 5 -----
drivers/cxl/core/core.h | 1 -
drivers/cxl/core/region.c | 20 ++++++++++++--------
include/linux/memory.h | 2 +-
5 files changed, 13 insertions(+), 21 deletions(-)
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 4958301f5417..5d32490dc4ab 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -74,7 +74,6 @@ struct memory_target {
struct node_cache_attrs cache_attrs;
u8 gen_port_device_handle[ACPI_SRAT_DEVICE_HANDLE_SIZE];
bool registered;
- bool ext_updated; /* externally updated */
};
struct memory_initiator {
@@ -391,7 +390,6 @@ int hmat_update_target_coordinates(int nid, struct access_coordinate *coord,
coord->read_bandwidth, access);
hmat_update_target_access(target, ACPI_HMAT_WRITE_BANDWIDTH,
coord->write_bandwidth, access);
- target->ext_updated = true;
return 0;
}
@@ -773,10 +771,6 @@ static void hmat_update_target_attrs(struct memory_target *target,
u32 best = 0;
int i;
- /* Don't update if an external agent has changed the data. */
- if (target->ext_updated)
- return;
-
/* Don't update for generic port if there's no device handle */
if ((access == NODE_ACCESS_CLASS_GENPORT_SINK_LOCAL ||
access == NODE_ACCESS_CLASS_GENPORT_SINK_CPU) &&
diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index c0af645425f4..c891fd618cfd 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -1081,8 +1081,3 @@ int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr,
{
return hmat_update_target_coordinates(nid, &cxlr->coord[access], access);
}
-
-bool cxl_need_node_perf_attrs_update(int nid)
-{
- return !acpi_node_backed_by_real_pxm(nid);
-}
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 2669f251d677..a253d308f3c9 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -139,7 +139,6 @@ long cxl_pci_get_latency(struct pci_dev *pdev);
int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c);
int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr,
enum access_coordinate_class access);
-bool cxl_need_node_perf_attrs_update(int nid);
int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
struct access_coordinate *c);
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 71cc42d05248..0ed95cbc5d5b 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -30,6 +30,12 @@
* 3. Decoder targets
*/
+/*
+ * nodemask that sets per node when the access_coordinates for the node has
+ * been updated by the CXL memory hotplug notifier.
+ */
+static nodemask_t nodemask_region_seen = NODE_MASK_NONE;
+
static struct cxl_region *to_cxl_region(struct device *dev);
#define __ACCESS_ATTR_RO(_level, _name) { \
@@ -2442,14 +2448,8 @@ static bool cxl_region_update_coordinates(struct cxl_region *cxlr, int nid)
for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
if (cxlr->coord[i].read_bandwidth) {
- rc = 0;
- if (cxl_need_node_perf_attrs_update(nid))
- node_set_perf_attrs(nid, &cxlr->coord[i], i);
- else
- rc = cxl_update_hmat_access_coordinates(nid, cxlr, i);
-
- if (rc == 0)
- cset++;
+ node_update_perf_attrs(nid, &cxlr->coord[i], i);
+ cset++;
}
}
@@ -2487,6 +2487,10 @@ static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
if (nid != region_nid)
return NOTIFY_DONE;
+ /* No action needed if node bit already set */
+ if (node_test_and_set(nid, nodemask_region_seen))
+ return NOTIFY_DONE;
+
if (!cxl_region_update_coordinates(cxlr, nid))
return NOTIFY_DONE;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 1305102688d0..0b755d1ef1ec 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -120,8 +120,8 @@ struct mem_section;
*/
#define DEFAULT_CALLBACK_PRI 0
#define SLAB_CALLBACK_PRI 1
-#define HMAT_CALLBACK_PRI 2
#define CXL_CALLBACK_PRI 5
+#define HMAT_CALLBACK_PRI 6
#define MM_COMPUTE_BATCH_PRI 10
#define CPUSET_CALLBACK_PRI 10
#define MEMTIER_HOTPLUG_PRI 100
--
2.50.1
Hi Dr. Sam Lavi Cosmetic And Implant Dentistry ,
AI adoption in healthcare isn’t coming — it’s already here.
Some implant clinics are quietly using AI to handle patient calls, consultations, and scheduling.
Their growth isn’t a coincidence.
Being first gives them an edge — being late means playing catch-up.
Should I share a 2-min demo video so you can see how it works?
Reply 'Video' and I’ll send it.
—
Best regards,
Mohanish Ved
AI Growth Specialist
EditRage Solutions
VIRQs come in 3 flavors, per-VPU, per-domain, and global, and the VIRQs
are tracked in per-cpu virq_to_irq arrays.
Per-domain and global VIRQs must be bound on CPU 0, and
bind_virq_to_irq() sets the per_cpu virq_to_irq at registration time
Later, the interrupt can migrate, and info->cpu is updated. When
calling __unbind_from_irq(), the per-cpu virq_to_irq is cleared for a
different cpu. If bind_virq_to_irq() is called again with CPU 0, the
stale irq is returned. There won't be any irq_info for the irq, so
things break.
Make xen_rebind_evtchn_to_cpu() update the per_cpu virq_to_irq mappings
to keep them update to date with the current cpu. This ensures the
correct virq_to_irq is cleared in __unbind_from_irq().
Fixes: e46cdb66c8fc ("xen: event channels")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jason Andryuk <jason.andryuk(a)amd.com>
---
v3:
Kernel style brace placement
Delay setting old_cpu and tighten scope of variable
v2:
Different approach changing virq_to_irq
---
drivers/xen/events/events_base.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index b060b5a95f45..9478fae014e5 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1797,9 +1797,20 @@ static int xen_rebind_evtchn_to_cpu(struct irq_info *info, unsigned int tcpu)
* virq or IPI channel, which don't actually need to be rebound. Ignore
* it, but don't do the xenlinux-level rebind in that case.
*/
- if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_vcpu, &bind_vcpu) >= 0)
+ if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_vcpu, &bind_vcpu) >= 0) {
+ int old_cpu = info->cpu;
+
bind_evtchn_to_cpu(info, tcpu, false);
+ if (info->type == IRQT_VIRQ) {
+ int virq = info->u.virq;
+ int irq = per_cpu(virq_to_irq, old_cpu)[virq];
+
+ per_cpu(virq_to_irq, old_cpu)[virq] = -1;
+ per_cpu(virq_to_irq, tcpu)[virq] = irq;
+ }
+ }
+
do_unmask(info, EVT_MASK_REASON_TEMPORARY);
return 0;
--
2.34.1
Change find_virq() to return -EEXIST when a VIRQ is bound to a
different CPU than the one passed in. With that, remove the BUG_ON()
from bind_virq_to_irq() to propogate the error upwards.
Some VIRQs are per-cpu, but others are per-domain or global. Those must
be bound to CPU0 and can then migrate elsewhere. The lookup for
per-domain and global will probably fail when migrated off CPU 0,
especially when the current CPU is tracked. This now returns -EEXIST
instead of BUG_ON().
A second call to bind a per-domain or global VIRQ is not expected, but
make it non-fatal to avoid trying to look up the irq, since we don't
know which per_cpu(virq_to_irq) it will be in.
Cc: stable(a)vger.kernel.org
Signed-off-by: Jason Andryuk <jason.andryuk(a)amd.com>
---
v3:
Cc: stable as a pre-req for the subsequent virg tracking change
Call __unbind_from_irq() on error ro avoid leaking info
v2:
New
---
drivers/xen/events/events_base.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 374231d84e4f..b060b5a95f45 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1314,10 +1314,12 @@ int bind_interdomain_evtchn_to_irq_lateeoi(struct xenbus_device *dev,
}
EXPORT_SYMBOL_GPL(bind_interdomain_evtchn_to_irq_lateeoi);
-static int find_virq(unsigned int virq, unsigned int cpu, evtchn_port_t *evtchn)
+static int find_virq(unsigned int virq, unsigned int cpu, evtchn_port_t *evtchn,
+ bool percpu)
{
struct evtchn_status status;
evtchn_port_t port;
+ bool exists = false;
memset(&status, 0, sizeof(status));
for (port = 0; port < xen_evtchn_max_channels(); port++) {
@@ -1330,12 +1332,16 @@ static int find_virq(unsigned int virq, unsigned int cpu, evtchn_port_t *evtchn)
continue;
if (status.status != EVTCHNSTAT_virq)
continue;
- if (status.u.virq == virq && status.vcpu == xen_vcpu_nr(cpu)) {
+ if (status.u.virq != virq)
+ continue;
+ if (status.vcpu == xen_vcpu_nr(cpu)) {
*evtchn = port;
return 0;
+ } else if (!percpu) {
+ exists = true;
}
}
- return -ENOENT;
+ return exists ? -EEXIST : -ENOENT;
}
/**
@@ -1382,8 +1388,11 @@ int bind_virq_to_irq(unsigned int virq, unsigned int cpu, bool percpu)
evtchn = bind_virq.port;
else {
if (ret == -EEXIST)
- ret = find_virq(virq, cpu, &evtchn);
- BUG_ON(ret < 0);
+ ret = find_virq(virq, cpu, &evtchn, percpu);
+ if (ret) {
+ __unbind_from_irq(info, info->irq);
+ goto out;
+ }
}
ret = xen_irq_info_virq_setup(info, cpu, evtchn, virq);
--
2.34.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ae668cd567a6a7622bc813ee0bb61c42bed61ba7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025090112-skillet-muskiness-5948@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ae668cd567a6a7622bc813ee0bb61c42bed61ba7 Mon Sep 17 00:00:00 2001
From: Eric Sandeen <sandeen(a)redhat.com>
Date: Fri, 22 Aug 2025 12:55:56 -0500
Subject: [PATCH] xfs: do not propagate ENODATA disk errors into xattr code
ENODATA (aka ENOATTR) has a very specific meaning in the xfs xattr code;
namely, that the requested attribute name could not be found.
However, a medium error from disk may also return ENODATA. At best,
this medium error may escape to userspace as "attribute not found"
when in fact it's an IO (disk) error.
At worst, we may oops in xfs_attr_leaf_get() when we do:
error = xfs_attr_leaf_hasname(args, &bp);
if (error == -ENOATTR) {
xfs_trans_brelse(args->trans, bp);
return error;
}
because an ENODATA/ENOATTR error from disk leaves us with a null bp,
and the xfs_trans_brelse will then null-deref it.
As discussed on the list, we really need to modify the lower level
IO functions to trap all disk errors and ensure that we don't let
unique errors like this leak up into higher xfs functions - many
like this should be remapped to EIO.
However, this patch directly addresses a reported bug in the xattr
code, and should be safe to backport to stable kernels. A larger-scope
patch to handle more unique errors at lower levels can follow later.
(Note, prior to 07120f1abdff we did not oops, but we did return the
wrong error code to userspace.)
Signed-off-by: Eric Sandeen <sandeen(a)redhat.com>
Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines")
Cc: stable(a)vger.kernel.org # v5.9+
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Carlos Maiolino <cem(a)kernel.org>
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 4c44ce1c8a64..bff3dc226f81 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -435,6 +435,13 @@ xfs_attr_rmtval_get(
0, &bp, &xfs_attr3_rmt_buf_ops);
if (xfs_metadata_is_sick(error))
xfs_dirattr_mark_sick(args->dp, XFS_ATTR_FORK);
+ /*
+ * ENODATA from disk implies a disk medium failure;
+ * ENODATA for xattrs means attribute not found, so
+ * disambiguate that here.
+ */
+ if (error == -ENODATA)
+ error = -EIO;
if (error)
return error;
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index 17d9e6154f19..723a0643b838 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -2833,6 +2833,12 @@ xfs_da_read_buf(
&bp, ops);
if (xfs_metadata_is_sick(error))
xfs_dirattr_mark_sick(dp, whichfork);
+ /*
+ * ENODATA from disk implies a disk medium failure; ENODATA for
+ * xattrs means attribute not found, so disambiguate that here.
+ */
+ if (error == -ENODATA && whichfork == XFS_ATTR_FORK)
+ error = -EIO;
if (error)
goto out_free;