This reverts commit b3b274bc9d3d7307308aeaf75f70731765ac999a.
On the DragonBoard 820c (which uses APQ8096/MSM8996) this change causes
the CPUs to downclock to roughly half speed under sustained load. The
regression is visible both during boot and when running CPU stress
workloads such as stress-ng: the CPUs initially ramp up to the expected
frequency, then drop to a lower OPP even though the system is clearly
CPU-bound.
Bisecting points to this commit and reverting it restores the expected
behaviour on the DragonBoard 820c - the CPUs track the cpufreq policy
and run at full performance under load.
The exact interaction with the ACD is not yet fully understood and we
would like to keep ACD in use to avoid possible SoC reliability issues.
Until we have a better fix that preserves ACD while avoiding this
performance regression, revert the bisected patch to restore the
previous behaviour.
Fixes: b3b274bc9d3d ("clk: qcom: cpu-8996: simplify the cpu_clk_notifier_cb")
Cc: stable(a)vger.kernel.org # v6.3+
Link: https://lore.kernel.org/linux-arm-msm/20230113120544.59320-8-dmitry.baryshk…
Cc: Dmitry Baryshkov <dmitry.baryshkov(a)oss.qualcomm.com>
Signed-off-by: Christopher Obbard <christopher.obbard(a)linaro.org>
---
Hi all,
This series contains a single revert for a regression affecting the
APQ8096/MSM8996 (DragonBoard 820c).
The commit being reverted, b3b274bc9d3d ("clk: qcom: cpu-8996: simplify the cpu_clk_notifier_cb"),
introduces a significant performance issue where the CPUs downclock to
~50% of their expected frequency under sustained load. The problem is
reproducible both at boot and when running CPU-bound workloads such as
stress-ng.
Bisecting the issue pointed directly to this commit and reverting it
restores correct cpufreq behaviour.
The root cause appears to be related to the interaction between the
simplified notifier callback and ACD (Adaptive Clock Distribution).
Since we would prefer to keep ACD enabled for SoC reliability reasons,
a revert is the safest option until a proper fix is identified.
Full details are included in the commit message.
Feedback & suggestions welcome.
Cheers!
Christopher Obbard
---
drivers/clk/qcom/clk-cpu-8996.c | 30 +++++++++++-------------------
1 file changed, 11 insertions(+), 19 deletions(-)
diff --git a/drivers/clk/qcom/clk-cpu-8996.c b/drivers/clk/qcom/clk-cpu-8996.c
index 21d13c0841ed..028476931747 100644
--- a/drivers/clk/qcom/clk-cpu-8996.c
+++ b/drivers/clk/qcom/clk-cpu-8996.c
@@ -547,35 +547,27 @@ static int cpu_clk_notifier_cb(struct notifier_block *nb, unsigned long event,
{
struct clk_cpu_8996_pmux *cpuclk = to_clk_cpu_8996_pmux_nb(nb);
struct clk_notifier_data *cnd = data;
+ int ret;
switch (event) {
case PRE_RATE_CHANGE:
+ ret = clk_cpu_8996_pmux_set_parent(&cpuclk->clkr.hw, ALT_INDEX);
qcom_cpu_clk_msm8996_acd_init(cpuclk->clkr.regmap);
-
- /*
- * Avoid overvolting. clk_core_set_rate_nolock() walks from top
- * to bottom, so it will change the rate of the PLL before
- * chaging the parent of PMUX. This can result in pmux getting
- * clocked twice the expected rate.
- *
- * Manually switch to PLL/2 here.
- */
- if (cnd->new_rate < DIV_2_THRESHOLD &&
- cnd->old_rate > DIV_2_THRESHOLD)
- clk_cpu_8996_pmux_set_parent(&cpuclk->clkr.hw, SMUX_INDEX);
-
break;
- case ABORT_RATE_CHANGE:
- /* Revert manual change */
- if (cnd->new_rate < DIV_2_THRESHOLD &&
- cnd->old_rate > DIV_2_THRESHOLD)
- clk_cpu_8996_pmux_set_parent(&cpuclk->clkr.hw, ACD_INDEX);
+ case POST_RATE_CHANGE:
+ if (cnd->new_rate < DIV_2_THRESHOLD)
+ ret = clk_cpu_8996_pmux_set_parent(&cpuclk->clkr.hw,
+ SMUX_INDEX);
+ else
+ ret = clk_cpu_8996_pmux_set_parent(&cpuclk->clkr.hw,
+ ACD_INDEX);
break;
default:
+ ret = 0;
break;
}
- return NOTIFY_OK;
+ return notifier_from_errno(ret);
};
static int qcom_cpu_clk_msm8996_driver_probe(struct platform_device *pdev)
---
base-commit: c17e270dfb342a782d69c4a7c4c32980455afd9c
change-id: 20251202-wip-obbardc-qcom-msm8096-clk-cpu-fix-downclock-b7561da4cb95
Best regards,
--
Christopher Obbard <christopher.obbard(a)linaro.org>
As reported by Athul upstream in [1], there is a userspace regression caused
by commit 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb
tree") where if there is a bug in a fuse server that causes the server to
never complete writeback, it will make wait_sb_inodes() wait forever, causing
sync paths to hang.
This is a resubmission of this patch [2] that was dropped from the original
series due to a buggy/malicious server still being able to hold up sync() /
the system in other ways if they wanted to, but the wait_sb_inodes() path is
particularly common and easier to hit for malfunctioning servers.
Thanks,
Joanne
[1] https://lore.kernel.org/regressions/CAJnrk1ZjQ8W8NzojsvJPRXiv9TuYPNdj8Ye7=C…
[2] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-4-joannelkoong@…
Joanne Koong (2):
mm: rename AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM to
AS_WRITEBACK_MAY_HANG
fs/writeback: skip inodes with potential writeback hang in
wait_sb_inodes()
fs/fs-writeback.c | 3 +++
fs/fuse/file.c | 2 +-
include/linux/pagemap.h | 10 +++++-----
mm/vmscan.c | 3 +--
4 files changed, 10 insertions(+), 8 deletions(-)
--
2.47.3
New kernel version, new warnings.
This series only introduces a new patch:
media: iris: Document difference in size during allocation
The other two have been already sent to linux-media or linux-next ML,
but they have not found their way into the tree.
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Jacopo Mondi (1):
media: uapi: c3-isp: Fix documentation warning
Ricardo Ribalda (2):
media: iris: Document difference in size during allocation
media: iris: Fix fps calculation
drivers/media/platform/qcom/iris/iris_hfi_gen2_command.c | 10 +++++++++-
drivers/media/platform/qcom/iris/iris_venc.c | 5 ++---
include/uapi/linux/media/amlogic/c3-isp-config.h | 2 +-
3 files changed, 12 insertions(+), 5 deletions(-)
---
base-commit: 47b7b5e32bb7264b51b89186043e1ada4090b558
change-id: 20251202-warnings-6-19-960d9b686cff
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
Currently, kvfree_rcu_barrier() flushes RCU sheaves across all slab
caches when a cache is destroyed. This is unnecessary; only the RCU
sheaves belonging to the cache being destroyed need to be flushed.
As suggested by Vlastimil Babka, introduce a weaker form of
kvfree_rcu_barrier() that operates on a specific slab cache.
Factor out flush_rcu_sheaves_on_cache() from flush_all_rcu_sheaves() and
call it from flush_all_rcu_sheaves() and kvfree_rcu_barrier_on_cache().
Call kvfree_rcu_barrier_on_cache() instead of kvfree_rcu_barrier() on
cache destruction.
The performance benefit is evaluated on a 12 core 24 threads AMD Ryzen
5900X machine (1 socket), by loading slub_kunit module.
Before:
Total calls: 19
Average latency (us): 18127
Total time (us): 344414
After:
Total calls: 19
Average latency (us): 10066
Total time (us): 191264
Two performance regression have been reported:
- stress module loader test's runtime increases by 50-60% (Daniel)
- internal graphics test's runtime on Tegra23 increases by 35% (Jon)
They are fixed by this change.
Suggested-by: Vlastimil Babka <vbabka(a)suse.cz>
Fixes: ec66e0d59952 ("slab: add sheaf support for batching kfree_rcu() operations")
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/linux-mm/1bda09da-93be-4737-aef0-d47f8c5c9301@suse.…
Reported-and-tested-by: Daniel Gomez <da.gomez(a)samsung.com>
Closes: https://lore.kernel.org/linux-mm/0406562e-2066-4cf8-9902-b2b0616dd742@kerne…
Reported-and-tested-by: Jon Hunter <jonathanh(a)nvidia.com>
Closes: https://lore.kernel.org/linux-mm/e988eff6-1287-425e-a06c-805af5bbf262@nvidi…
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
---
No code change, added proper tags and updated changelog.
include/linux/slab.h | 5 ++++
mm/slab.h | 1 +
mm/slab_common.c | 52 +++++++++++++++++++++++++++++------------
mm/slub.c | 55 ++++++++++++++++++++++++--------------------
4 files changed, 73 insertions(+), 40 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index cf443f064a66..937c93d44e8c 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -1149,6 +1149,10 @@ static inline void kvfree_rcu_barrier(void)
{
rcu_barrier();
}
+static inline void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
+{
+ rcu_barrier();
+}
static inline void kfree_rcu_scheduler_running(void) { }
#else
@@ -1156,6 +1160,7 @@ void kvfree_rcu_barrier(void);
void kfree_rcu_scheduler_running(void);
#endif
+void kvfree_rcu_barrier_on_cache(struct kmem_cache *s);
/**
* kmalloc_size_roundup - Report allocation bucket size for the given size
diff --git a/mm/slab.h b/mm/slab.h
index f730e012553c..e767aa7e91b0 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -422,6 +422,7 @@ static inline bool is_kmalloc_normal(struct kmem_cache *s)
bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj);
void flush_all_rcu_sheaves(void);
+void flush_rcu_sheaves_on_cache(struct kmem_cache *s);
#define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
SLAB_CACHE_DMA32 | SLAB_PANIC | \
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 84dfff4f7b1f..dd8a49d6f9cc 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -492,7 +492,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
return;
/* in-flight kfree_rcu()'s may include objects from our cache */
- kvfree_rcu_barrier();
+ kvfree_rcu_barrier_on_cache(s);
if (IS_ENABLED(CONFIG_SLUB_RCU_DEBUG) &&
(s->flags & SLAB_TYPESAFE_BY_RCU)) {
@@ -2038,25 +2038,13 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
}
EXPORT_SYMBOL_GPL(kvfree_call_rcu);
-/**
- * kvfree_rcu_barrier - Wait until all in-flight kvfree_rcu() complete.
- *
- * Note that a single argument of kvfree_rcu() call has a slow path that
- * triggers synchronize_rcu() following by freeing a pointer. It is done
- * before the return from the function. Therefore for any single-argument
- * call that will result in a kfree() to a cache that is to be destroyed
- * during module exit, it is developer's responsibility to ensure that all
- * such calls have returned before the call to kmem_cache_destroy().
- */
-void kvfree_rcu_barrier(void)
+static inline void __kvfree_rcu_barrier(void)
{
struct kfree_rcu_cpu_work *krwp;
struct kfree_rcu_cpu *krcp;
bool queued;
int i, cpu;
- flush_all_rcu_sheaves();
-
/*
* Firstly we detach objects and queue them over an RCU-batch
* for all CPUs. Finally queued works are flushed for each CPU.
@@ -2118,8 +2106,43 @@ void kvfree_rcu_barrier(void)
}
}
}
+
+/**
+ * kvfree_rcu_barrier - Wait until all in-flight kvfree_rcu() complete.
+ *
+ * Note that a single argument of kvfree_rcu() call has a slow path that
+ * triggers synchronize_rcu() following by freeing a pointer. It is done
+ * before the return from the function. Therefore for any single-argument
+ * call that will result in a kfree() to a cache that is to be destroyed
+ * during module exit, it is developer's responsibility to ensure that all
+ * such calls have returned before the call to kmem_cache_destroy().
+ */
+void kvfree_rcu_barrier(void)
+{
+ flush_all_rcu_sheaves();
+ __kvfree_rcu_barrier();
+}
EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
+/**
+ * kvfree_rcu_barrier_on_cache - Wait for in-flight kvfree_rcu() calls on a
+ * specific slab cache.
+ * @s: slab cache to wait for
+ *
+ * See the description of kvfree_rcu_barrier() for details.
+ */
+void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
+{
+ if (s->cpu_sheaves)
+ flush_rcu_sheaves_on_cache(s);
+ /*
+ * TODO: Introduce a version of __kvfree_rcu_barrier() that works
+ * on a specific slab cache.
+ */
+ __kvfree_rcu_barrier();
+}
+EXPORT_SYMBOL_GPL(kvfree_rcu_barrier_on_cache);
+
static unsigned long
kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
{
@@ -2215,4 +2238,3 @@ void __init kvfree_rcu_init(void)
}
#endif /* CONFIG_KVFREE_RCU_BATCHED */
-
diff --git a/mm/slub.c b/mm/slub.c
index 785e25a14999..7cec2220712b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4118,42 +4118,47 @@ static void flush_rcu_sheaf(struct work_struct *w)
/* needed for kvfree_rcu_barrier() */
-void flush_all_rcu_sheaves(void)
+void flush_rcu_sheaves_on_cache(struct kmem_cache *s)
{
struct slub_flush_work *sfw;
- struct kmem_cache *s;
unsigned int cpu;
- cpus_read_lock();
- mutex_lock(&slab_mutex);
+ mutex_lock(&flush_lock);
- list_for_each_entry(s, &slab_caches, list) {
- if (!s->cpu_sheaves)
- continue;
+ for_each_online_cpu(cpu) {
+ sfw = &per_cpu(slub_flush, cpu);
- mutex_lock(&flush_lock);
+ /*
+ * we don't check if rcu_free sheaf exists - racing
+ * __kfree_rcu_sheaf() might have just removed it.
+ * by executing flush_rcu_sheaf() on the cpu we make
+ * sure the __kfree_rcu_sheaf() finished its call_rcu()
+ */
- for_each_online_cpu(cpu) {
- sfw = &per_cpu(slub_flush, cpu);
+ INIT_WORK(&sfw->work, flush_rcu_sheaf);
+ sfw->s = s;
+ queue_work_on(cpu, flushwq, &sfw->work);
+ }
- /*
- * we don't check if rcu_free sheaf exists - racing
- * __kfree_rcu_sheaf() might have just removed it.
- * by executing flush_rcu_sheaf() on the cpu we make
- * sure the __kfree_rcu_sheaf() finished its call_rcu()
- */
+ for_each_online_cpu(cpu) {
+ sfw = &per_cpu(slub_flush, cpu);
+ flush_work(&sfw->work);
+ }
- INIT_WORK(&sfw->work, flush_rcu_sheaf);
- sfw->s = s;
- queue_work_on(cpu, flushwq, &sfw->work);
- }
+ mutex_unlock(&flush_lock);
+}
- for_each_online_cpu(cpu) {
- sfw = &per_cpu(slub_flush, cpu);
- flush_work(&sfw->work);
- }
+void flush_all_rcu_sheaves(void)
+{
+ struct kmem_cache *s;
+
+ cpus_read_lock();
+ mutex_lock(&slab_mutex);
- mutex_unlock(&flush_lock);
+ list_for_each_entry(s, &slab_caches, list) {
+ if (!s->cpu_sheaves)
+ continue;
+ flush_rcu_sheaves_on_cache(s);
}
mutex_unlock(&slab_mutex);
--
2.43.0
The driver_override_show function reads the driver_override string
without holding the device_lock. However, the store function modifies
and frees the string while holding the device_lock. This creates a race
condition where the string can be freed by the store function while
being read by the show function, leading to a use-after-free.
To fix this, replace the rpmsg_string_attr macro with explicit show and
store functions. The new driver_override_store uses the standard
driver_set_override helper. Since the introduction of
driver_set_override, the comments in include/linux/rpmsg.h have stated
that this helper must be used to set or clear driver_override, but the
implementation was not updated until now.
Because driver_set_override modifies and frees the string while holding
the device_lock, the new driver_override_show now correctly holds the
device_lock during the read operation to prevent the race.
Additionally, since rpmsg_string_attr has only ever been used for
driver_override, removing the macro simplifies the code.
Fixes: 39e47767ec9b ("rpmsg: Add driver_override device attribute for rpmsg_device")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02(a)gmail.com>
---
I verified this with a stress test that continuously writes/reads the
attribute. It triggered KASAN and leaked bytes like a0 f4 81 9f a3 ff ff
(likely kernel pointers). Since driver_override is world-readable (0644),
this allows unprivileged users to leak kernel pointers and bypass KASLR.
Similar races were fixed in other buses (e.g., commits 9561475db680 and
91d44c1afc61). Currently, 9 of 11 buses handle this correctly; this patch
fixes one of the remaining two.
---
drivers/rpmsg/rpmsg_core.c | 66 ++++++++++++++++----------------------
1 file changed, 27 insertions(+), 39 deletions(-)
diff --git a/drivers/rpmsg/rpmsg_core.c b/drivers/rpmsg/rpmsg_core.c
index 5d661681a9b6..96964745065b 100644
--- a/drivers/rpmsg/rpmsg_core.c
+++ b/drivers/rpmsg/rpmsg_core.c
@@ -352,50 +352,38 @@ field##_show(struct device *dev, \
} \
static DEVICE_ATTR_RO(field);
-#define rpmsg_string_attr(field, member) \
-static ssize_t \
-field##_store(struct device *dev, struct device_attribute *attr, \
- const char *buf, size_t sz) \
-{ \
- struct rpmsg_device *rpdev = to_rpmsg_device(dev); \
- const char *old; \
- char *new; \
- \
- new = kstrndup(buf, sz, GFP_KERNEL); \
- if (!new) \
- return -ENOMEM; \
- new[strcspn(new, "\n")] = '\0'; \
- \
- device_lock(dev); \
- old = rpdev->member; \
- if (strlen(new)) { \
- rpdev->member = new; \
- } else { \
- kfree(new); \
- rpdev->member = NULL; \
- } \
- device_unlock(dev); \
- \
- kfree(old); \
- \
- return sz; \
-} \
-static ssize_t \
-field##_show(struct device *dev, \
- struct device_attribute *attr, char *buf) \
-{ \
- struct rpmsg_device *rpdev = to_rpmsg_device(dev); \
- \
- return sprintf(buf, "%s\n", rpdev->member); \
-} \
-static DEVICE_ATTR_RW(field)
-
/* for more info, see Documentation/ABI/testing/sysfs-bus-rpmsg */
rpmsg_show_attr(name, id.name, "%s\n");
rpmsg_show_attr(src, src, "0x%x\n");
rpmsg_show_attr(dst, dst, "0x%x\n");
rpmsg_show_attr(announce, announce ? "true" : "false", "%s\n");
-rpmsg_string_attr(driver_override, driver_override);
+
+static ssize_t driver_override_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct rpmsg_device *rpdev = to_rpmsg_device(dev);
+ int ret;
+
+ ret = driver_set_override(dev, &rpdev->driver_override, buf, count);
+ if (ret)
+ return ret;
+
+ return count;
+}
+
+static ssize_t driver_override_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct rpmsg_device *rpdev = to_rpmsg_device(dev);
+ ssize_t len;
+
+ device_lock(dev);
+ len = sysfs_emit(buf, "%s\n", rpdev->driver_override);
+ device_unlock(dev);
+ return len;
+}
+static DEVICE_ATTR_RW(driver_override);
static ssize_t modalias_show(struct device *dev,
struct device_attribute *attr, char *buf)
--
2.43.0
An invalid pointer dereference bug was reported on arm64 cpu, and has
not yet been seen on x86. A partial oops looks like:
Call trace:
update_cfs_rq_h_load+0x80/0xb0
wake_affine+0x158/0x168
select_task_rq_fair+0x364/0x3a8
try_to_wake_up+0x154/0x648
wake_up_q+0x68/0xd0
futex_wake_op+0x280/0x4c8
do_futex+0x198/0x1c0
__arm64_sys_futex+0x11c/0x198
Link: https://lore.kernel.org/all/20251013071820.1531295-1-CruzZhao@linux.alibaba…
We found that the task_group corresponding to the problematic se
is not in the parent task_group’s children list, indicating that
h_load_next points to an invalid address. Consider the following
cgroup and task hierarchy:
A
/ \
/ \
B E
/ \ |
/ \ t2
C D
| |
t0 t1
Here follows a timing sequence that may be responsible for triggering
the problem:
CPU X CPU Y CPU Z
wakeup t0
set list A->B->C
traverse A->B->C
t0 exits
destroy C
wakeup t2
set list A->E wakeup t1
set list A->B->D
traverse A->B->C
panic
CPU Z sets ->h_load_next list to A->B->D, but due to arm64 weaker memory
ordering, Y may observe A->B before it sees B->D, then in this time window,
it can traverse A->B->C and reach an invalid se.
We can avoid stale pointer accesses by clearing ->h_load_next when
unregistering cgroup.
Suggested-by: Vincent Guittot <vincent.guittot(a)linaro.org>
Fixes: 685207963be9 ("sched: Move h_load calculation to task_h_load()")
Cc: <stable(a)vger.kernel.org>
Co-developed-by: Cruz Zhao <CruzZhao(a)linux.alibaba.com>
Signed-off-by: Cruz Zhao <CruzZhao(a)linux.alibaba.com>
Signed-off-by: Peng Wang <peng_wang(a)linux.alibaba.com>
---
kernel/sched/fair.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cee1793e8277..a5fce15093d3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -13427,6 +13427,14 @@ void unregister_fair_sched_group(struct task_group *tg)
list_del_leaf_cfs_rq(cfs_rq);
}
remove_entity_load_avg(se);
+ /*
+ * Clear parent's h_load_next if it points to the
+ * sched_entity being freed to avoid stale pointer.
+ */
+ struct cfs_rq *parent_cfs_rq = cfs_rq_of(se);
+
+ if (READ_ONCE(parent_cfs_rq->h_load_next) == se)
+ WRITE_ONCE(parent_cfs_rq->h_load_next, NULL);
}
/*
--
2.27.0
From: Doug Berger <opendmb(a)gmail.com>
[ Upstream commit 382748c05e58a9f1935f5a653c352422375566ea ]
Commit 16b269436b72 ("sched/deadline: Modify cpudl::free_cpus
to reflect rd->online") introduced the cpudl_set/clear_freecpu
functions to allow the cpu_dl::free_cpus mask to be manipulated
by the deadline scheduler class rq_on/offline callbacks so the
mask would also reflect this state.
Commit 9659e1eeee28 ("sched/deadline: Remove cpu_active_mask
from cpudl_find()") removed the check of the cpu_active_mask to
save some processing on the premise that the cpudl::free_cpus
mask already reflected the runqueue online state.
Unfortunately, there are cases where it is possible for the
cpudl_clear function to set the free_cpus bit for a CPU when the
deadline runqueue is offline. When this occurs while a CPU is
connected to the default root domain the flag may retain the bad
state after the CPU has been unplugged. Later, a different CPU
that is transitioning through the default root domain may push a
deadline task to the powered down CPU when cpudl_find sees its
free_cpus bit is set. If this happens the task will not have the
opportunity to run.
One example is outlined here:
https://lore.kernel.org/lkml/20250110233010.2339521-1-opendmb@gmail.com
Another occurs when the last deadline task is migrated from a
CPU that has an offlined runqueue. The dequeue_task member of
the deadline scheduler class will eventually call cpudl_clear
and set the free_cpus bit for the CPU.
This commit modifies the cpudl_clear function to be aware of the
online state of the deadline runqueue so that the free_cpus mask
can be updated appropriately.
It is no longer necessary to manage the mask outside of the
cpudl_set/clear functions so the cpudl_set/clear_freecpu
functions are removed. In addition, since the free_cpus mask is
now only updated under the cpudl lock the code was changed to
use the non-atomic __cpumask functions.
Signed-off-by: Doug Berger <opendmb(a)gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
---
kernel/sched/cpudeadline.c | 34 +++++++++-------------------------
kernel/sched/cpudeadline.h | 4 +---
kernel/sched/deadline.c | 8 ++++----
3 files changed, 14 insertions(+), 32 deletions(-)
diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 8cb06c8c7eb1..fd28ba4e1a28 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -166,12 +166,13 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
* cpudl_clear - remove a CPU from the cpudl max-heap
* @cp: the cpudl max-heap context
* @cpu: the target CPU
+ * @online: the online state of the deadline runqueue
*
* Notes: assumes cpu_rq(cpu)->lock is locked
*
* Returns: (void)
*/
-void cpudl_clear(struct cpudl *cp, int cpu)
+void cpudl_clear(struct cpudl *cp, int cpu, bool online)
{
int old_idx, new_cpu;
unsigned long flags;
@@ -184,7 +185,7 @@ void cpudl_clear(struct cpudl *cp, int cpu)
if (old_idx == IDX_INVALID) {
/*
* Nothing to remove if old_idx was invalid.
- * This could happen if a rq_offline_dl is
+ * This could happen if rq_online_dl or rq_offline_dl is
* called for a CPU without -dl tasks running.
*/
} else {
@@ -195,9 +196,12 @@ void cpudl_clear(struct cpudl *cp, int cpu)
cp->elements[new_cpu].idx = old_idx;
cp->elements[cpu].idx = IDX_INVALID;
cpudl_heapify(cp, old_idx);
-
- cpumask_set_cpu(cpu, cp->free_cpus);
}
+ if (likely(online))
+ __cpumask_set_cpu(cpu, cp->free_cpus);
+ else
+ __cpumask_clear_cpu(cpu, cp->free_cpus);
+
raw_spin_unlock_irqrestore(&cp->lock, flags);
}
@@ -228,7 +232,7 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
cp->elements[new_idx].cpu = cpu;
cp->elements[cpu].idx = new_idx;
cpudl_heapify_up(cp, new_idx);
- cpumask_clear_cpu(cpu, cp->free_cpus);
+ __cpumask_clear_cpu(cpu, cp->free_cpus);
} else {
cp->elements[old_idx].dl = dl;
cpudl_heapify(cp, old_idx);
@@ -237,26 +241,6 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
raw_spin_unlock_irqrestore(&cp->lock, flags);
}
-/*
- * cpudl_set_freecpu - Set the cpudl.free_cpus
- * @cp: the cpudl max-heap context
- * @cpu: rd attached CPU
- */
-void cpudl_set_freecpu(struct cpudl *cp, int cpu)
-{
- cpumask_set_cpu(cpu, cp->free_cpus);
-}
-
-/*
- * cpudl_clear_freecpu - Clear the cpudl.free_cpus
- * @cp: the cpudl max-heap context
- * @cpu: rd attached CPU
- */
-void cpudl_clear_freecpu(struct cpudl *cp, int cpu)
-{
- cpumask_clear_cpu(cpu, cp->free_cpus);
-}
-
/*
* cpudl_init - initialize the cpudl structure
* @cp: the cpudl max-heap context
diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
index 0adeda93b5fb..ecff718d94ae 100644
--- a/kernel/sched/cpudeadline.h
+++ b/kernel/sched/cpudeadline.h
@@ -18,9 +18,7 @@ struct cpudl {
#ifdef CONFIG_SMP
int cpudl_find(struct cpudl *cp, struct task_struct *p, struct cpumask *later_mask);
void cpudl_set(struct cpudl *cp, int cpu, u64 dl);
-void cpudl_clear(struct cpudl *cp, int cpu);
+void cpudl_clear(struct cpudl *cp, int cpu, bool online);
int cpudl_init(struct cpudl *cp);
-void cpudl_set_freecpu(struct cpudl *cp, int cpu);
-void cpudl_clear_freecpu(struct cpudl *cp, int cpu);
void cpudl_cleanup(struct cpudl *cp);
#endif /* CONFIG_SMP */
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 6548bd90c5c3..85e4ef476686 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1414,7 +1414,7 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
if (!dl_rq->dl_nr_running) {
dl_rq->earliest_dl.curr = 0;
dl_rq->earliest_dl.next = 0;
- cpudl_clear(&rq->rd->cpudl, rq->cpu);
+ cpudl_clear(&rq->rd->cpudl, rq->cpu, rq->online);
} else {
struct rb_node *leftmost = dl_rq->root.rb_leftmost;
struct sched_dl_entity *entry;
@@ -2349,9 +2349,10 @@ static void rq_online_dl(struct rq *rq)
if (rq->dl.overloaded)
dl_set_overload(rq);
- cpudl_set_freecpu(&rq->rd->cpudl, rq->cpu);
if (rq->dl.dl_nr_running > 0)
cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr);
+ else
+ cpudl_clear(&rq->rd->cpudl, rq->cpu, true);
}
/* Assumes rq->lock is held */
@@ -2360,8 +2361,7 @@ static void rq_offline_dl(struct rq *rq)
if (rq->dl.overloaded)
dl_clear_overload(rq);
- cpudl_clear(&rq->rd->cpudl, rq->cpu);
- cpudl_clear_freecpu(&rq->rd->cpudl, rq->cpu);
+ cpudl_clear(&rq->rd->cpudl, rq->cpu, false);
}
void __init init_sched_dl_class(void)
--
2.34.1
On QCS9075 and QCA8275 platforms, the BT_EN pin is always pulled up by hw
and cannot be controlled by the host. As a result, in case of a firmware
crash, the host cannot trigger a cold reset. Instead, the BT controller
performs a warm restart on its own, without reloading the firmware.
This leads to the controller remaining in IBS_WAKE state, while the host
expects it to be in sleep mode. The mismatch causes HCI reset commands
to time out. Additionally, the driver does not clear internal flags
QCA_SSR_TRIGGERED and QCA_IBS_DISABLED, which blocks the reset sequence.
If the SSR duration exceeds 2 seconds, the host may enter TX sleep mode
due to tx_idle_timeout, further preventing recovery. Also, memcoredump_flag
is not cleared, so only the first SSR generates a coredump.
Tell driver that BT controller has undergone a proper restart sequence:
- Clear QCA_SSR_TRIGGERED and QCA_IBS_DISABLED flags after SSR.
- Add a 50ms delay to allow the controller to complete its warm reset.
- Reset tx_idle_timer to prevent the host from entering TX sleep mode.
- Clear memcoredump_flag to allow multiple coredump captures.
Apply these steps only when HCI_QUIRK_NON_PERSISTENT_SETUP is not set,
which indicates that BT_EN is defined in DTS and cannot be toggled.
Refer to the comment in include/net/bluetooth/hci.h for details on
HCI_QUIRK_NON_PERSISTENT_SETUP.
Changes in v12:
- Rewrote commit to clarify the actual issue and affected platforms.
- Used imperative language to describe the fix.
- Explained the role of HCI_QUIRK_NON_PERSISTENT_SETUP.
Signed-off-by: Shuai Zhang <quic_shuaz(a)quicinc.com>
---
drivers/bluetooth/hci_qca.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 4cff4d9be..2d6560482 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -1653,6 +1653,39 @@ static void qca_hw_error(struct hci_dev *hdev, u8 code)
skb_queue_purge(&qca->rx_memdump_q);
}
+ /*
+ * If the BT chip's bt_en pin is connected to a 3.3V power supply via
+ * hardware and always stays high, driver cannot control the bt_en pin.
+ * As a result, during SSR (SubSystem Restart), QCA_SSR_TRIGGERED and
+ * QCA_IBS_DISABLED flags cannot be cleared, which leads to a reset
+ * command timeout.
+ * Add an msleep delay to ensure controller completes the SSR process.
+ *
+ * Host will not download the firmware after SSR, controller to remain
+ * in the IBS_WAKE state, and the host needs to synchronize with it
+ *
+ * Since the bluetooth chip has been reset, clear the memdump state.
+ */
+ if (!hci_test_quirk(hu->hdev, HCI_QUIRK_NON_PERSISTENT_SETUP)) {
+ /*
+ * When the SSR (SubSystem Restart) duration exceeds 2 seconds,
+ * it triggers host tx_idle_delay, which sets host TX state
+ * to sleep. Reset tx_idle_timer after SSR to prevent
+ * host enter TX IBS_Sleep mode.
+ */
+ mod_timer(&qca->tx_idle_timer, jiffies +
+ msecs_to_jiffies(qca->tx_idle_delay));
+
+ /* Controller reset completion time is 50ms */
+ msleep(50);
+
+ clear_bit(QCA_SSR_TRIGGERED, &qca->flags);
+ clear_bit(QCA_IBS_DISABLED, &qca->flags);
+
+ qca->tx_ibs_state = HCI_IBS_TX_AWAKE;
+ qca->memdump_state = QCA_MEMDUMP_IDLE;
+ }
+
clear_bit(QCA_HW_ERROR_EVENT, &qca->flags);
}
--
2.34.1
The temperature was never clamped to SPRD_THM_TEMP_LOW or
SPRD_THM_TEMP_HIGH because the return value of clamp() was not used. Fix
this by assigning the clamped value to 'temp'.
Casting SPRD_THM_TEMP_LOW and SPRD_THM_TEMP_HIGH to int is also
redundant and can be removed.
Cc: stable(a)vger.kernel.org
Fixes: 554fdbaf19b1 ("thermal: sprd: Add Spreadtrum thermal driver support")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
drivers/thermal/sprd_thermal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/thermal/sprd_thermal.c b/drivers/thermal/sprd_thermal.c
index f7fa83b2428e..44fa45f74da7 100644
--- a/drivers/thermal/sprd_thermal.c
+++ b/drivers/thermal/sprd_thermal.c
@@ -192,7 +192,7 @@ static int sprd_thm_temp_to_rawdata(int temp, struct sprd_thermal_sensor *sen)
{
u32 val;
- clamp(temp, (int)SPRD_THM_TEMP_LOW, (int)SPRD_THM_TEMP_HIGH);
+ temp = clamp(temp, SPRD_THM_TEMP_LOW, SPRD_THM_TEMP_HIGH);
/*
* According to the thermal datasheet, the formula of converting
--
Thorsten Blum <thorsten.blum(a)linux.dev>
GPG: 1D60 735E 8AEF 3BE4 73B6 9D84 7336 78FD 8DFE EAD4
The raw temperature data was never clamped to SPRD_THM_RAW_DATA_LOW or
SPRD_THM_RAW_DATA_HIGH because the return value of clamp() was not used.
Fix this by assigning the clamped value to 'rawdata'.
Casting SPRD_THM_RAW_DATA_LOW and SPRD_THM_RAW_DATA_HIGH to u32 is also
redundant and can be removed.
Cc: stable(a)vger.kernel.org
Fixes: 554fdbaf19b1 ("thermal: sprd: Add Spreadtrum thermal driver support")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
drivers/thermal/sprd_thermal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/thermal/sprd_thermal.c b/drivers/thermal/sprd_thermal.c
index e546067c9621..f7fa83b2428e 100644
--- a/drivers/thermal/sprd_thermal.c
+++ b/drivers/thermal/sprd_thermal.c
@@ -178,7 +178,7 @@ static int sprd_thm_sensor_calibration(struct device_node *np,
static int sprd_thm_rawdata_to_temp(struct sprd_thermal_sensor *sen,
u32 rawdata)
{
- clamp(rawdata, (u32)SPRD_THM_RAW_DATA_LOW, (u32)SPRD_THM_RAW_DATA_HIGH);
+ rawdata = clamp(rawdata, SPRD_THM_RAW_DATA_LOW, SPRD_THM_RAW_DATA_HIGH);
/*
* According to the thermal datasheet, the formula of converting
--
Thorsten Blum <thorsten.blum(a)linux.dev>
GPG: 1D60 735E 8AEF 3BE4 73B6 9D84 7336 78FD 8DFE EAD4
Mejora tus contrataciones con PsicoSmart
body {
margin: 0;
padding: 0;
font-family: Arial, Helvetica, sans-serif;
font-size: 14px;
color: #333;
background-color: #ffffff;
}
table {
border-spacing: 0;
width: 100%;
max-width: 600px;
margin: auto;
}
td {
padding: 12px 20px;
}
a {
color: #1a73e8;
text-decoration: none;
}
.footer {
font-size: 12px;
color: #888888;
text-align: center;
}
Evalúa talento de forma objetiva y mejora tus contrataciones con PsicoSmart.
Hola, ,
¿Te ha pasado que un candidato luce perfecto en entrevista, pero en el trabajo no encaja como esperabas?
En selección, confiar solo en la percepción puede llevar a decisiones costosas. Por eso quiero presentarte PsicoSmart, una herramienta creada para que los equipos de Recursos Humanos tomen decisiones más objetivas y acertadas.
Con PsicoSmart puedes:
Aplicar 31 pruebas psicométricas que evalúan liderazgo, honestidad, comunicación e inteligencia.
Validar conocimientos técnicos con más de 2,500 exámenes especializados.
Supervisar la identidad de quien responde mediante captura fotográfica automática durante la evaluación.
Gestionar todo desde una sola plataforma, accesible desde cualquier dispositivo.
Si estás buscando mejorar tus contrataciones, podría ser una muy buena opción. Si quieres conocer más puedes responder este correo o simplemente contactarme, mis datos están abajo.
Saludos,
--------------
Atte.: Valeria Pérez
Ciudad de México: (55) 5018 0565
WhatsApp: +52 33 1607 2089
Si no deseas recibir más correos, haz clic aquí para darte de baja.
Para remover su dirección de esta lista haga <a href="https://s1.arrobamail.com/unsuscribe.php?id=yiwtsrewiswqrtseup">click aquí</a>
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 0f80e21bf6229637e193248fbd284c0ec44bc0fd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025120249-dormitory-filtrate-415a@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0f80e21bf6229637e193248fbd284c0ec44bc0fd Mon Sep 17 00:00:00 2001
From: "Bastien Curutchet (Schneider Electric)" <bastien.curutchet(a)bootlin.com>
Date: Thu, 20 Nov 2025 10:12:03 +0100
Subject: [PATCH] net: dsa: microchip: Free previously initialized ports on
init failures
If a port interrupt setup fails after at least one port has already been
successfully initialized, the gotos miss some resource releasing:
- the already initialized PTP IRQs aren't released
- the already initialized port IRQs aren't released if the failure
occurs in ksz_pirq_setup().
Merge 'out_girq' and 'out_ptpirq' into a single 'port_release' label.
Behind this label, use the reverse loop to release all IRQ resources
for all initialized ports.
Jump in the middle of the reverse loop if an error occurs in
ksz_ptp_irq_setup() to only release the port IRQ of the current
iteration.
Cc: stable(a)vger.kernel.org
Fixes: c9cd961c0d43 ("net: dsa: microchip: lan937x: add interrupt support for port phy link")
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet(a)bootlin.com>
Link: https://patch.msgid.link/20251120-ksz-fix-v6-4-891f80ae7f8f@bootlin.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index a927055423f3..0c10351fe5eb 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -3038,12 +3038,12 @@ static int ksz_setup(struct dsa_switch *ds)
dsa_switch_for_each_user_port(dp, dev->ds) {
ret = ksz_pirq_setup(dev, dp->index);
if (ret)
- goto out_girq;
+ goto port_release;
if (dev->info->ptp_capable) {
ret = ksz_ptp_irq_setup(ds, dp->index);
if (ret)
- goto out_pirq;
+ goto pirq_release;
}
}
}
@@ -3053,7 +3053,7 @@ static int ksz_setup(struct dsa_switch *ds)
if (ret) {
dev_err(dev->dev, "Failed to register PTP clock: %d\n",
ret);
- goto out_ptpirq;
+ goto port_release;
}
}
@@ -3076,17 +3076,16 @@ static int ksz_setup(struct dsa_switch *ds)
out_ptp_clock_unregister:
if (dev->info->ptp_capable)
ksz_ptp_clock_unregister(ds);
-out_ptpirq:
- if (dev->irq > 0 && dev->info->ptp_capable)
- dsa_switch_for_each_user_port(dp, dev->ds)
- ksz_ptp_irq_free(ds, dp->index);
-out_pirq:
- if (dev->irq > 0)
- dsa_switch_for_each_user_port_continue_reverse(dp, dev->ds)
+port_release:
+ if (dev->irq > 0) {
+ dsa_switch_for_each_user_port_continue_reverse(dp, dev->ds) {
+ if (dev->info->ptp_capable)
+ ksz_ptp_irq_free(ds, dp->index);
+pirq_release:
ksz_irq_free(&dev->ports[dp->index].pirq);
-out_girq:
- if (dev->irq > 0)
+ }
ksz_irq_free(&dev->girq);
+ }
return ret;
}
get_user/put_user change didn't spend time in next and
seems a bit too risky to rush. I'm keeping it in my tree
and we'll get it in the next cycle.
The following changes since commit ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d:
Linux 6.18-rc7 (2025-11-23 14:53:16 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 205dd7a5d6ad6f4c8e8fcd3c3b95a7c0e7067fee:
virtio_pci: drop kernel.h (2025-11-30 18:02:43 -0500)
----------------------------------------------------------------
virtio,vhost: fixes, cleanups
Just a bunch of fixes and cleanups, mostly very simple. Several
features are merged through net-next this time around.
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
----------------------------------------------------------------
Alok Tiwari (3):
virtio_vdpa: fix misleading return in void function
vdpa/mlx5: Fix incorrect error code reporting in query_virtqueues
vdpa/pds: use %pe for ERR_PTR() in event handler registration
Kriish Sharma (1):
virtio: fix kernel-doc for mapping/free_coherent functions
Marco Crivellari (2):
virtio_balloon: add WQ_PERCPU to alloc_workqueue users
vduse: add WQ_PERCPU to alloc_workqueue users
Miaoqian Lin (1):
virtio: vdpa: Fix reference count leak in octep_sriov_enable()
Michael S. Tsirkin (11):
virtio: fix typo in virtio_device_ready() comment
virtio: fix whitespace in virtio_config_ops
virtio: fix grammar in virtio_queue_info docs
virtio: fix grammar in virtio_map_ops docs
virtio: standardize Returns documentation style
virtio: fix virtqueue_set_affinity() docs
virtio: fix map ops comment
virtio: clean up features qword/dword terms
vhost/test: add test specific macro for features
vhost: switch to arrays of feature bits
virtio_pci: drop kernel.h
Mike Christie (1):
vhost: Fix kthread worker cgroup failure handling
drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
drivers/vdpa/octeon_ep/octep_vdpa_main.c | 1 +
drivers/vdpa/pds/vdpa_dev.c | 2 +-
drivers/vdpa/vdpa_user/vduse_dev.c | 3 ++-
drivers/vhost/net.c | 29 +++++++++++-----------
drivers/vhost/scsi.c | 9 ++++---
drivers/vhost/test.c | 10 ++++++--
drivers/vhost/vhost.c | 4 ++-
drivers/vhost/vhost.h | 42 ++++++++++++++++++++++++++------
drivers/vhost/vsock.c | 10 +++++---
drivers/virtio/virtio.c | 12 ++++-----
drivers/virtio/virtio_balloon.c | 3 ++-
drivers/virtio/virtio_debug.c | 10 ++++----
drivers/virtio/virtio_pci_modern_dev.c | 6 ++---
drivers/virtio/virtio_ring.c | 7 +++---
drivers/virtio/virtio_vdpa.c | 2 +-
include/linux/virtio.h | 2 +-
include/linux/virtio_config.h | 24 +++++++++---------
include/linux/virtio_features.h | 29 +++++++++++-----------
include/linux/virtio_pci_modern.h | 8 +++---
include/uapi/linux/virtio_pci.h | 2 +-
21 files changed, 131 insertions(+), 86 deletions(-)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x d0b8fec8ae50525b57139393d0bb1f446e82ff7e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025120217-pelvis-rotunda-cb95@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d0b8fec8ae50525b57139393d0bb1f446e82ff7e Mon Sep 17 00:00:00 2001
From: "Bastien Curutchet (Schneider Electric)" <bastien.curutchet(a)bootlin.com>
Date: Thu, 20 Nov 2025 10:12:04 +0100
Subject: [PATCH] net: dsa: microchip: Fix symetry in
ksz_ptp_msg_irq_{setup/free}()
The IRQ numbers created through irq_create_mapping() are only assigned
to ptpmsg_irq[n].num at the end of the IRQ setup. So if an error occurs
between their creation and their assignment (for instance during the
request_threaded_irq() step), we enter the error path and fail to
release the newly created virtual IRQs because they aren't yet assigned
to ptpmsg_irq[n].num.
Move the mapping creation to ksz_ptp_msg_irq_setup() to ensure symetry
with what's released by ksz_ptp_msg_irq_free().
In the error path, move the irq_dispose_mapping to the out_ptp_msg label
so it will be called only on created IRQs.
Cc: stable(a)vger.kernel.org
Fixes: cc13ab18b201 ("net: dsa: microchip: ptp: enable interrupt for timestamping")
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet(a)bootlin.com>
Link: https://patch.msgid.link/20251120-ksz-fix-v6-5-891f80ae7f8f@bootlin.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/dsa/microchip/ksz_ptp.c b/drivers/net/dsa/microchip/ksz_ptp.c
index c8bfbe5e2157..997e4a76d0a6 100644
--- a/drivers/net/dsa/microchip/ksz_ptp.c
+++ b/drivers/net/dsa/microchip/ksz_ptp.c
@@ -1093,19 +1093,19 @@ static int ksz_ptp_msg_irq_setup(struct ksz_port *port, u8 n)
static const char * const name[] = {"pdresp-msg", "xdreq-msg",
"sync-msg"};
const struct ksz_dev_ops *ops = port->ksz_dev->dev_ops;
+ struct ksz_irq *ptpirq = &port->ptpirq;
struct ksz_ptp_irq *ptpmsg_irq;
ptpmsg_irq = &port->ptpmsg_irq[n];
+ ptpmsg_irq->num = irq_create_mapping(ptpirq->domain, n);
+ if (!ptpmsg_irq->num)
+ return -EINVAL;
ptpmsg_irq->port = port;
ptpmsg_irq->ts_reg = ops->get_port_addr(port->num, ts_reg[n]);
strscpy(ptpmsg_irq->name, name[n]);
- ptpmsg_irq->num = irq_find_mapping(port->ptpirq.domain, n);
- if (ptpmsg_irq->num < 0)
- return ptpmsg_irq->num;
-
return request_threaded_irq(ptpmsg_irq->num, NULL,
ksz_ptp_msg_thread_fn, IRQF_ONESHOT,
ptpmsg_irq->name, ptpmsg_irq);
@@ -1135,9 +1135,6 @@ int ksz_ptp_irq_setup(struct dsa_switch *ds, u8 p)
if (!ptpirq->domain)
return -ENOMEM;
- for (irq = 0; irq < ptpirq->nirqs; irq++)
- irq_create_mapping(ptpirq->domain, irq);
-
ptpirq->irq_num = irq_find_mapping(port->pirq.domain, PORT_SRC_PTP_INT);
if (!ptpirq->irq_num) {
ret = -EINVAL;
@@ -1159,12 +1156,11 @@ int ksz_ptp_irq_setup(struct dsa_switch *ds, u8 p)
out_ptp_msg:
free_irq(ptpirq->irq_num, ptpirq);
- while (irq--)
+ while (irq--) {
free_irq(port->ptpmsg_irq[irq].num, &port->ptpmsg_irq[irq]);
-out:
- for (irq = 0; irq < ptpirq->nirqs; irq++)
irq_dispose_mapping(port->ptpmsg_irq[irq].num);
-
+ }
+out:
irq_domain_remove(ptpirq->domain);
return ret;
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 0f80e21bf6229637e193248fbd284c0ec44bc0fd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025120246-talcum-rectangle-f99d@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0f80e21bf6229637e193248fbd284c0ec44bc0fd Mon Sep 17 00:00:00 2001
From: "Bastien Curutchet (Schneider Electric)" <bastien.curutchet(a)bootlin.com>
Date: Thu, 20 Nov 2025 10:12:03 +0100
Subject: [PATCH] net: dsa: microchip: Free previously initialized ports on
init failures
If a port interrupt setup fails after at least one port has already been
successfully initialized, the gotos miss some resource releasing:
- the already initialized PTP IRQs aren't released
- the already initialized port IRQs aren't released if the failure
occurs in ksz_pirq_setup().
Merge 'out_girq' and 'out_ptpirq' into a single 'port_release' label.
Behind this label, use the reverse loop to release all IRQ resources
for all initialized ports.
Jump in the middle of the reverse loop if an error occurs in
ksz_ptp_irq_setup() to only release the port IRQ of the current
iteration.
Cc: stable(a)vger.kernel.org
Fixes: c9cd961c0d43 ("net: dsa: microchip: lan937x: add interrupt support for port phy link")
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet(a)bootlin.com>
Link: https://patch.msgid.link/20251120-ksz-fix-v6-4-891f80ae7f8f@bootlin.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index a927055423f3..0c10351fe5eb 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -3038,12 +3038,12 @@ static int ksz_setup(struct dsa_switch *ds)
dsa_switch_for_each_user_port(dp, dev->ds) {
ret = ksz_pirq_setup(dev, dp->index);
if (ret)
- goto out_girq;
+ goto port_release;
if (dev->info->ptp_capable) {
ret = ksz_ptp_irq_setup(ds, dp->index);
if (ret)
- goto out_pirq;
+ goto pirq_release;
}
}
}
@@ -3053,7 +3053,7 @@ static int ksz_setup(struct dsa_switch *ds)
if (ret) {
dev_err(dev->dev, "Failed to register PTP clock: %d\n",
ret);
- goto out_ptpirq;
+ goto port_release;
}
}
@@ -3076,17 +3076,16 @@ static int ksz_setup(struct dsa_switch *ds)
out_ptp_clock_unregister:
if (dev->info->ptp_capable)
ksz_ptp_clock_unregister(ds);
-out_ptpirq:
- if (dev->irq > 0 && dev->info->ptp_capable)
- dsa_switch_for_each_user_port(dp, dev->ds)
- ksz_ptp_irq_free(ds, dp->index);
-out_pirq:
- if (dev->irq > 0)
- dsa_switch_for_each_user_port_continue_reverse(dp, dev->ds)
+port_release:
+ if (dev->irq > 0) {
+ dsa_switch_for_each_user_port_continue_reverse(dp, dev->ds) {
+ if (dev->info->ptp_capable)
+ ksz_ptp_irq_free(ds, dp->index);
+pirq_release:
ksz_irq_free(&dev->ports[dp->index].pirq);
-out_girq:
- if (dev->irq > 0)
+ }
ksz_irq_free(&dev->girq);
+ }
return ret;
}
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x d0b8fec8ae50525b57139393d0bb1f446e82ff7e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025120214-musky-conjure-b8b9@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d0b8fec8ae50525b57139393d0bb1f446e82ff7e Mon Sep 17 00:00:00 2001
From: "Bastien Curutchet (Schneider Electric)" <bastien.curutchet(a)bootlin.com>
Date: Thu, 20 Nov 2025 10:12:04 +0100
Subject: [PATCH] net: dsa: microchip: Fix symetry in
ksz_ptp_msg_irq_{setup/free}()
The IRQ numbers created through irq_create_mapping() are only assigned
to ptpmsg_irq[n].num at the end of the IRQ setup. So if an error occurs
between their creation and their assignment (for instance during the
request_threaded_irq() step), we enter the error path and fail to
release the newly created virtual IRQs because they aren't yet assigned
to ptpmsg_irq[n].num.
Move the mapping creation to ksz_ptp_msg_irq_setup() to ensure symetry
with what's released by ksz_ptp_msg_irq_free().
In the error path, move the irq_dispose_mapping to the out_ptp_msg label
so it will be called only on created IRQs.
Cc: stable(a)vger.kernel.org
Fixes: cc13ab18b201 ("net: dsa: microchip: ptp: enable interrupt for timestamping")
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet(a)bootlin.com>
Link: https://patch.msgid.link/20251120-ksz-fix-v6-5-891f80ae7f8f@bootlin.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/dsa/microchip/ksz_ptp.c b/drivers/net/dsa/microchip/ksz_ptp.c
index c8bfbe5e2157..997e4a76d0a6 100644
--- a/drivers/net/dsa/microchip/ksz_ptp.c
+++ b/drivers/net/dsa/microchip/ksz_ptp.c
@@ -1093,19 +1093,19 @@ static int ksz_ptp_msg_irq_setup(struct ksz_port *port, u8 n)
static const char * const name[] = {"pdresp-msg", "xdreq-msg",
"sync-msg"};
const struct ksz_dev_ops *ops = port->ksz_dev->dev_ops;
+ struct ksz_irq *ptpirq = &port->ptpirq;
struct ksz_ptp_irq *ptpmsg_irq;
ptpmsg_irq = &port->ptpmsg_irq[n];
+ ptpmsg_irq->num = irq_create_mapping(ptpirq->domain, n);
+ if (!ptpmsg_irq->num)
+ return -EINVAL;
ptpmsg_irq->port = port;
ptpmsg_irq->ts_reg = ops->get_port_addr(port->num, ts_reg[n]);
strscpy(ptpmsg_irq->name, name[n]);
- ptpmsg_irq->num = irq_find_mapping(port->ptpirq.domain, n);
- if (ptpmsg_irq->num < 0)
- return ptpmsg_irq->num;
-
return request_threaded_irq(ptpmsg_irq->num, NULL,
ksz_ptp_msg_thread_fn, IRQF_ONESHOT,
ptpmsg_irq->name, ptpmsg_irq);
@@ -1135,9 +1135,6 @@ int ksz_ptp_irq_setup(struct dsa_switch *ds, u8 p)
if (!ptpirq->domain)
return -ENOMEM;
- for (irq = 0; irq < ptpirq->nirqs; irq++)
- irq_create_mapping(ptpirq->domain, irq);
-
ptpirq->irq_num = irq_find_mapping(port->pirq.domain, PORT_SRC_PTP_INT);
if (!ptpirq->irq_num) {
ret = -EINVAL;
@@ -1159,12 +1156,11 @@ int ksz_ptp_irq_setup(struct dsa_switch *ds, u8 p)
out_ptp_msg:
free_irq(ptpirq->irq_num, ptpirq);
- while (irq--)
+ while (irq--) {
free_irq(port->ptpmsg_irq[irq].num, &port->ptpmsg_irq[irq]);
-out:
- for (irq = 0; irq < ptpirq->nirqs; irq++)
irq_dispose_mapping(port->ptpmsg_irq[irq].num);
-
+ }
+out:
irq_domain_remove(ptpirq->domain);
return ret;