Overview
========
When a CPU chooses to call push_rt_task and picks a task to push to
another CPU's runqueue, it calls find_lock_lowest_rq, which takes a
double lock on both CPUs' runqueues. If one of the locks isn't readily
available, the current runqueue lock may be dropped and both locks
reacquired at once. During this window it is possible that the task has
already migrated and is running on some other CPU. These cases are
already handled. However, if the task has migrated, has already run,
and another CPU is now trying to wake it up (ttwu) such that it is
queued again on a runqueue (on_rq is 1), and if the task last ran on
the same CPU, then the current checks pass even though the task has
migrated out and is no longer in the pushable tasks list.
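For illustration, below is the pre-fix revalidation in
find_lock_lowest_rq(), condensed from the diff at the end of this
message. In the race described here every sub-check reads stale but
plausible state, so the bail-out never fires:

	/*
	 * Condensed pre-fix check: each condition below evaluates false
	 * for a task that migrated away, ran elsewhere, and was woken up
	 * again, so the push wrongly proceeds.
	 */
	if (unlikely(task_rq(task) != rq ||	    /* p.cpu still reads as this rq */
		     task_on_cpu(rq, task) ||	    /* p is not running here now   */
		     !rt_task(task) ||
		     is_migration_disabled(task) ||
		     !task_on_rq_queued(task))) {   /* on_rq is 1 again after ttwu */
		double_unlock_balance(rq, lowest_rq);
		lowest_rq = NULL;
	}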
Crashes
=======
This bug resulted in quite a few flavors of crashes triggering kernel
panics with various crash signatures, such as assert failures, page
faults, NULL pointer dereferences, and queue corruption errors, all
coming from the scheduler itself.
Some of the crashes:
-> kernel BUG at kernel/sched/rt.c:1616! BUG_ON(idx >= MAX_RT_PRIO)
Call Trace:
? __die_body+0x1a/0x60
? die+0x2a/0x50
? do_trap+0x85/0x100
? pick_next_task_rt+0x6e/0x1d0
? do_error_trap+0x64/0xa0
? pick_next_task_rt+0x6e/0x1d0
? exc_invalid_op+0x4c/0x60
? pick_next_task_rt+0x6e/0x1d0
? asm_exc_invalid_op+0x12/0x20
? pick_next_task_rt+0x6e/0x1d0
__schedule+0x5cb/0x790
? update_ts_time_stats+0x55/0x70
schedule_idle+0x1e/0x40
do_idle+0x15e/0x200
cpu_startup_entry+0x19/0x20
start_secondary+0x117/0x160
secondary_startup_64_no_verify+0xb0/0xbb
-> BUG: kernel NULL pointer dereference, address: 00000000000000c0
Call Trace:
? __die_body+0x1a/0x60
? no_context+0x183/0x350
? __warn+0x8a/0xe0
? exc_page_fault+0x3d6/0x520
? asm_exc_page_fault+0x1e/0x30
? pick_next_task_rt+0xb5/0x1d0
? pick_next_task_rt+0x8c/0x1d0
__schedule+0x583/0x7e0
? update_ts_time_stats+0x55/0x70
schedule_idle+0x1e/0x40
do_idle+0x15e/0x200
cpu_startup_entry+0x19/0x20
start_secondary+0x117/0x160
secondary_startup_64_no_verify+0xb0/0xbb
-> BUG: unable to handle page fault for address: ffff9464daea5900
kernel BUG at kernel/sched/rt.c:1861! BUG_ON(rq->cpu != task_cpu(p))
-> kernel BUG at kernel/sched/rt.c:1055! BUG_ON(!rq->nr_running)
Call Trace:
? __die_body+0x1a/0x60
? die+0x2a/0x50
? do_trap+0x85/0x100
? dequeue_top_rt_rq+0xa2/0xb0
? do_error_trap+0x64/0xa0
? dequeue_top_rt_rq+0xa2/0xb0
? exc_invalid_op+0x4c/0x60
? dequeue_top_rt_rq+0xa2/0xb0
? asm_exc_invalid_op+0x12/0x20
? dequeue_top_rt_rq+0xa2/0xb0
dequeue_rt_entity+0x1f/0x70
dequeue_task_rt+0x2d/0x70
__schedule+0x1a8/0x7e0
? blk_finish_plug+0x25/0x40
schedule+0x3c/0xb0
futex_wait_queue_me+0xb6/0x120
futex_wait+0xd9/0x240
do_futex+0x344/0xa90
? get_mm_exe_file+0x30/0x60
? audit_exe_compare+0x58/0x70
? audit_filter_rules.constprop.26+0x65e/0x1220
__x64_sys_futex+0x148/0x1f0
do_syscall_64+0x30/0x80
entry_SYSCALL_64_after_hwframe+0x62/0xc7
-> BUG: unable to handle page fault for address: ffff8cf3608bc2c0
Call Trace:
? __die_body+0x1a/0x60
? no_context+0x183/0x350
? spurious_kernel_fault+0x171/0x1c0
? exc_page_fault+0x3b6/0x520
? plist_check_list+0x15/0x40
? plist_check_list+0x2e/0x40
? asm_exc_page_fault+0x1e/0x30
? _cond_resched+0x15/0x30
? futex_wait_queue_me+0xc8/0x120
? futex_wait+0xd9/0x240
? try_to_wake_up+0x1b8/0x490
? futex_wake+0x78/0x160
? do_futex+0xcd/0xa90
? plist_check_list+0x15/0x40
? plist_check_list+0x2e/0x40
? plist_del+0x6a/0xd0
? plist_check_list+0x15/0x40
? plist_check_list+0x2e/0x40
? dequeue_pushable_task+0x20/0x70
? __schedule+0x382/0x7e0
? asm_sysvec_reschedule_ipi+0xa/0x20
? schedule+0x3c/0xb0
? exit_to_user_mode_prepare+0x9e/0x150
? irqentry_exit_to_user_mode+0x5/0x30
? asm_sysvec_reschedule_ipi+0x12/0x20
Above are some of the common examples of the crashes that were observed
due to this issue.
Details
=======
Let's look at the following scenario to understand this race.
1) CPU A enters push_rt_task
a) CPU A has chosen next_task = task p.
b) CPU A calls find_lock_lowest_rq(Task p, CPU Z’s rq).
c) CPU A identifies CPU X as a destination CPU (X < Z).
d) CPU A enters double_lock_balance(CPU Z’s rq, CPU X’s rq).
e) Since X is lower than Z, CPU A unlocks CPU Z’s rq. Someone else has
locked CPU X’s rq, and thus, CPU A must wait.
2) At CPU Z
a) The previous task has completed execution, so CPU Z enters
schedule and locks its own rq after CPU A releases it.
b) CPU Z dequeues previous task and begins executing task p.
c) CPU Z unlocks its rq.
d) Task p yields the CPU (e.g. by doing IO or waiting to acquire a
lock), which triggers the schedule function on CPU Z.
e) CPU Z enters schedule again, locks its own rq, and dequeues task p.
f) As part of dequeue, it sets p.on_rq = 0 and unlocks its rq.
3) At CPU B
a) CPU B enters try_to_wake_up with input task p.
b) Since CPU Z dequeued task p, p.on_rq = 0, and CPU B updates task
p.state = WAKING.
c) CPU B via select_task_rq determines CPU Y as the target CPU.
4) The race
a) CPU A acquires CPU X’s lock and relocks CPU Z.
b) CPU A reads task p.cpu = Z and incorrectly concludes task p is
still on CPU Z.
c) CPU A fails to notice that task p was dequeued from CPU Z while
CPU A was waiting for the locks in double_lock_balance. Had CPU A
known that task p had been dequeued, it would return NULL, forcing
push_rt_task to give up on migrating task p.
d) CPU B updates task p.cpu = Y and calls ttwu_queue.
e) CPU B locks CPU Y's rq. CPU B enqueues task p onto Y and sets task
p.on_rq = 1.
f) CPU B unlocks CPU Y, triggering memory synchronization.
g) CPU A reads task p.on_rq = 1, cementing its assumption that task p
has not migrated.
h) CPU A decides to migrate p to CPU X.
This leads to CPU A dequeuing p from CPU Y's queue and various crashes
down the line.
Solution
========
The solution here is fairly simple. After obtaining the lock (at 4a),
the check is enhanced to make sure that the task is still at the head
of the pushable tasks list. If it is not, then it is not suitable for
being pushed anyway.
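Condensed from the diff below, the revalidation then becomes:

	if (unlikely(is_migration_disabled(task) ||
		     !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
		     task != pick_next_pushable_task(rq))) {
		double_unlock_balance(rq, lowest_rq);
		lowest_rq = NULL;
	}

A task that is no longer at the head of rq's pushable list has either
migrated, been dequeued, or become unsuitable for pushing, so the push
is abandoned.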
Testing
=======
The fix was tested on a cluster of 3 nodes, where the panics due to
this issue were hit every couple of days. A fix similar to this one
was deployed on such a cluster and remained stable for more than 30
days.
Co-developed-by: Jon Kohler <jon(a)nutanix.com>
Signed-off-by: Jon Kohler <jon(a)nutanix.com>
Co-developed-by: Gauri Patwardhan <gauri.patwardhan(a)nutanix.com>
Signed-off-by: Gauri Patwardhan <gauri.patwardhan(a)nutanix.com>
Co-developed-by: Rahul Chunduru <rahul.chunduru(a)nutanix.com>
Signed-off-by: Rahul Chunduru <rahul.chunduru(a)nutanix.com>
Signed-off-by: Harshit Agarwal <harshit(a)nutanix.com>
Tested-by: Will Ton <william.ton(a)nutanix.com>
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Cc: stable(a)vger.kernel.org
---
Changes in v2:
- As per Steve's suggestion, removed some checks that are done after
obtaining the lock and that are no longer needed with the addition of
the new check.
- Moved up is_migration_disabled check.
- Link to v1:
https://lore.kernel.org/lkml/20250211054646.23987-1-harshit@nutanix.com/
Changes in v3:
- Updated commit message to add stable maintainers and reviewed-by tag.
- Link to v2:
https://lore.kernel.org/lkml/20250214170844.201692-1-harshit@nutanix.com/
---
kernel/sched/rt.c | 54 +++++++++++++++++++++++------------------------
1 file changed, 26 insertions(+), 28 deletions(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4b8e33c615b1..4762dd3f50c5 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1885,6 +1885,27 @@ static int find_lowest_rq(struct task_struct *task)
return -1;
}
+static struct task_struct *pick_next_pushable_task(struct rq *rq)
+{
+ struct task_struct *p;
+
+ if (!has_pushable_tasks(rq))
+ return NULL;
+
+ p = plist_first_entry(&rq->rt.pushable_tasks,
+ struct task_struct, pushable_tasks);
+
+ BUG_ON(rq->cpu != task_cpu(p));
+ BUG_ON(task_current(rq, p));
+ BUG_ON(task_current_donor(rq, p));
+ BUG_ON(p->nr_cpus_allowed <= 1);
+
+ BUG_ON(!task_on_rq_queued(p));
+ BUG_ON(!rt_task(p));
+
+ return p;
+}
+
/* Will lock the rq it finds */
static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
{
@@ -1915,18 +1936,16 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
/*
* We had to unlock the run queue. In
* the mean time, task could have
- * migrated already or had its affinity changed.
- * Also make sure that it wasn't scheduled on its rq.
+ * migrated already or had its affinity changed,
+ * therefore check if the task is still at the
+ * head of the pushable tasks list.
* It is possible the task was scheduled, set
* "migrate_disabled" and then got preempted, so we must
* check the task migration disable flag here too.
*/
- if (unlikely(task_rq(task) != rq ||
+ if (unlikely(is_migration_disabled(task) ||
!cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
- task_on_cpu(rq, task) ||
- !rt_task(task) ||
- is_migration_disabled(task) ||
- !task_on_rq_queued(task))) {
+ task != pick_next_pushable_task(rq))) {
double_unlock_balance(rq, lowest_rq);
lowest_rq = NULL;
@@ -1946,27 +1965,6 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
return lowest_rq;
}
-static struct task_struct *pick_next_pushable_task(struct rq *rq)
-{
- struct task_struct *p;
-
- if (!has_pushable_tasks(rq))
- return NULL;
-
- p = plist_first_entry(&rq->rt.pushable_tasks,
- struct task_struct, pushable_tasks);
-
- BUG_ON(rq->cpu != task_cpu(p));
- BUG_ON(task_current(rq, p));
- BUG_ON(task_current_donor(rq, p));
- BUG_ON(p->nr_cpus_allowed <= 1);
-
- BUG_ON(!task_on_rq_queued(p));
- BUG_ON(!rt_task(p));
-
- return p;
-}
-
/*
* If the current CPU has more than one RT task, see if the non
* running task can migrate over to a CPU that is running a task
--
2.22.3
[ resent due to a wrong address for regression reporting, sorry! ]
Hi,
we received a bug report showing a regression in the 6.13.1 kernel
against 6.13.0. The symptom is that Chrome and VSCode stopped working
with Gnome Scaling, as reported on the openSUSE Tumbleweed bug tracker:
https://bugzilla.suse.com/show_bug.cgi?id=1236943
Quoting from there:
"""
I use the latest TW on Gnome with a 4K display and 150%
scaling. Everything has been working fine, but recently both Chrome
and VSCode (installed from official non-openSUSE channels) stopped
working with Scaling.
....
I am using VSCode with:
`--enable-features=UseOzonePlatform --enable-features=WaylandWindowDecorations --ozone-platform-hint=auto` and for Chrome, I select `Preferred Ozone platform` == `Wayland`.
"""
Surprisingly, the bisection pointed to the backport of the commit
b9b588f22a0c049a14885399e27625635ae6ef91 ("libfs: Use d_children list
to iterate simple_offset directories").
Indeed, reverting this patch on the latest 6.13.4 was confirmed to fix
the issue. The reporter also verified that the latest 6.14-rc release
is still affected.
For now I have no concrete idea how the patch could break the behavior
of a graphical application like the above. Let us know if you need
anything for debugging. (Or, easiest, join the bugzilla entry and ask
there; or open another bug report wherever you like.)
BTW, I'll be traveling tomorrow, so my reply will be delayed.
thanks,
Takashi
#regzbot introduced: b9b588f22a0c049a14885399e27625635ae6ef91
#regzbot monitor: https://bugzilla.suse.com/show_bug.cgi?id=1236943
Disabling HPD polling from i915_hpd_poll_init_work() involves probing
all display connectors explicitly to account for lost hotplug
interrupts. On some platforms (mostly pre-ICL) with HDMI connectors the
I2C EDID bit-banging using udelay() triggers in turn the
workqueue: i915_hpd_poll_init_work [i915] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
warning.
Fix the above by scheduling i915_hpd_poll_init_work() on a WQ_UNBOUND
workqueue. It's ok to use a system WQ, since i915_hpd_poll_init_work()
is properly flushed in intel_hpd_cancel_work().
The connector probing from drm_mode_config::output_poll_work resulting
in the same warning is fixed by the next patch.
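As a minimal sketch (hypothetical names, not the i915 code) of what
moving a CPU-heavy work item to the unbound system workqueue looks like:

	#include <linux/workqueue.h>

	/* long-running work, e.g. bit-banged I2C EDID reads using udelay() */
	static void probe_connectors_fn(struct work_struct *work)
	{
		/* may hog the CPU for more than 10 ms */
	}
	static DECLARE_WORK(probe_connectors_work, probe_connectors_fn);

	static void kick_connector_probe(void)
	{
		/*
		 * system_unbound_wq workers are not bound to a CPU, so a
		 * long-running item does not trip the per-CPU hog warning.
		 */
		queue_work(system_unbound_wq, &probe_connectors_work);
	}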
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Heiner Kallweit <hkallweit1(a)gmail.com>
CC: stable(a)vger.kernel.org # 6.5
Suggested-by: Tejun Heo <tj(a)kernel.org>
Suggested-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Reported-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/9245
Link: https://lore.kernel.org/all/f7e21caa-e98d-e5b5-932a-fe12d27fde9b@gmail.com
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_hotplug.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_hotplug.c b/drivers/gpu/drm/i915/display/intel_hotplug.c
index e8562f6f8bb44..accc2fec562a0 100644
--- a/drivers/gpu/drm/i915/display/intel_hotplug.c
+++ b/drivers/gpu/drm/i915/display/intel_hotplug.c
@@ -774,7 +774,7 @@ void intel_hpd_poll_enable(struct drm_i915_private *dev_priv)
* As well, there's no issue if we race here since we always reschedule
* this worker anyway
*/
- queue_work(dev_priv->unordered_wq,
+ queue_work(system_unbound_wq,
&dev_priv->display.hotplug.poll_init_work);
}
@@ -803,7 +803,7 @@ void intel_hpd_poll_disable(struct drm_i915_private *dev_priv)
return;
WRITE_ONCE(dev_priv->display.hotplug.poll_enabled, false);
- queue_work(dev_priv->unordered_wq,
+ queue_work(system_unbound_wq,
&dev_priv->display.hotplug.poll_init_work);
}
--
2.37.2
In the function "wled_probe", the "wled->name" is dynamically allocated
(wled_probe -> wled_configure -> devm_kasprintf), which is possible
to be null.
In the call trace: wled_probe -> devm_backlight_device_register
-> backlight_device_register, this "name" variable is directly
dereferenced without checking. We add a null-check statement.
Fixes: f86b77583d88 ("backlight: pm8941: Convert to using %pOFn instead of device_node.name")
Signed-off-by: Haoyu Li <lihaoyu499(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
drivers/video/backlight/backlight.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/video/backlight/backlight.c b/drivers/video/backlight/backlight.c
index f699e5827ccb..b21670bd86de 100644
--- a/drivers/video/backlight/backlight.c
+++ b/drivers/video/backlight/backlight.c
@@ -414,6 +414,8 @@ struct backlight_device *backlight_device_register(const char *name,
struct backlight_device *new_bd;
int rc;
+ if (!name)
+ return ERR_PTR(-EINVAL);
pr_debug("backlight_device_register: name=%s\n", name);
new_bd = kzalloc(sizeof(struct backlight_device), GFP_KERNEL);
--
2.34.1
This series corrects two issues found on the stm32mp135f-dk: a missing
clock-names property within stm32mp135.dtsi and an st-mipid02
device-tree bindings issue.
Signed-off-by: Alain Volmat <alain.volmat(a)foss.st.com>
---
Alain Volmat (2):
ARM: dts: stm32: add missing dcmipp kclk clock-names in stm32mp135.dtsi
dt-bindings: media: st,stmipid02: correct lane-polarities maxItems
Documentation/devicetree/bindings/media/i2c/st,st-mipid02.yaml | 2 +-
arch/arm/boot/dts/st/stm32mp135.dtsi | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
---
base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
change-id: 20250210-6-14-stm32-media-fixes-5810b4feb917
Best regards,
--
Alain Volmat <alain.volmat(a)foss.st.com>
When I converted rk808 to device managed resources I converted the rk808
specific pm_power_off handler to devm_register_sys_off_handler() using
SYS_OFF_MODE_POWER_OFF_PREPARE, which is allowed to sleep. I did this
because the driver's poweroff function makes use of regmap and the backend
of that might sleep.
But the PMIC poweroff function will kill off the board power and the
kernel does some extra steps after the prepare handler. Thus the prepare
handler should not be used for the PMIC's poweroff routine. Instead the
normal SYS_OFF_MODE_POWER_OFF phase should be used. The old pm_power_off
method is also called from there, so this would have been a cleaner
conversion anyway.
But it still makes sense to investigate the sleep handling and check
if there are any issues. Apparently the Rockchip and Meson I2C drivers
(the only platforms using the PMICs handled by this driver) both have
support for atomic transfers and thus may be called from the proper
poweroff context.
Things are different on the SPI side. That is so far only used by rk806
and that one is only used by Rockchip RK3588. Unfortunately the Rockchip
SPI driver does not support atomic transfers. That means using the
normal POWER_OFF handler would introduce the following error splash
during shutdown on all RK3588 boards currently supported upstream:
[ 13.761353] ------------[ cut here ]------------
[ 13.761764] Voluntary context switch within RCU read-side critical section!
[ 13.761776] WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_plugin.h:330 rcu_note_context_switch+0x3ac/0x404
[ 13.763219] Modules linked in:
[ 13.763498] CPU: 0 UID: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.10.0-12284-g2818a9a19514 #1499
[ 13.764297] Hardware name: Rockchip RK3588 EVB1 V10 Board (DT)
[ 13.764812] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 13.765427] pc : rcu_note_context_switch+0x3ac/0x404
[ 13.765871] lr : rcu_note_context_switch+0x3ac/0x404
[ 13.766314] sp : ffff800084f4b5b0
[ 13.766609] x29: ffff800084f4b5b0 x28: ffff00040139b800 x27: 00007dfb4439ae80
[ 13.767245] x26: ffff00040139bc80 x25: 0000000000000000 x24: ffff800082118470
[ 13.767880] x23: 0000000000000000 x22: ffff000400300000 x21: ffff000400300000
[ 13.768515] x20: ffff800083a9d600 x19: ffff0004fee48600 x18: fffffffffffed448
[ 13.769151] x17: 000000040044ffff x16: 005000f2b5503510 x15: 0000000000000048
[ 13.769787] x14: fffffffffffed490 x13: ffff80008473b3c0 x12: 0000000000000900
[ 13.770421] x11: 0000000000000300 x10: ffff800084797bc0 x9 : ffff80008473b3c0
[ 13.771057] x8 : 00000000ffffefff x7 : ffff8000847933c0 x6 : 0000000000000300
[ 13.771692] x5 : 0000000000000301 x4 : 40000000fffff300 x3 : 0000000000000000
[ 13.772328] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000400300000
[ 13.772964] Call trace:
[ 13.773184] rcu_note_context_switch+0x3ac/0x404
[ 13.773598] __schedule+0x94/0xb0c
[ 13.773907] schedule+0x34/0x104
[ 13.774198] schedule_timeout+0x84/0xfc
[ 13.774544] wait_for_completion_timeout+0x78/0x14c
[ 13.774980] spi_transfer_one_message+0x588/0x690
[ 13.775403] __spi_pump_transfer_message+0x19c/0x4ec
[ 13.775846] __spi_sync+0x2a8/0x3c4
[ 13.776161] spi_write_then_read+0x120/0x208
[ 13.776543] rk806_spi_bus_read+0x54/0x88
[ 13.776905] _regmap_raw_read+0xec/0x16c
[ 13.777257] _regmap_bus_read+0x44/0x7c
[ 13.777601] _regmap_read+0x60/0xd8
[ 13.777915] _regmap_update_bits+0xf4/0x13c
[ 13.778289] regmap_update_bits_base+0x64/0x98
[ 13.778686] rk808_power_off+0x70/0xfc
[ 13.779024] sys_off_notify+0x40/0x6c
[ 13.779356] atomic_notifier_call_chain+0x60/0x90
[ 13.779776] do_kernel_power_off+0x54/0x6c
[ 13.780146] machine_power_off+0x18/0x24
[ 13.780499] kernel_power_off+0x70/0x7c
[ 13.780845] __do_sys_reboot+0x210/0x270
[ 13.781198] __arm64_sys_reboot+0x24/0x30
[ 13.781558] invoke_syscall+0x48/0x10c
[ 13.781897] el0_svc_common+0x3c/0xe8
[ 13.782228] do_el0_svc+0x20/0x2c
[ 13.782528] el0_svc+0x34/0xd8
[ 13.782806] el0t_64_sync_handler+0x120/0x12c
[ 13.783197] el0t_64_sync+0x190/0x194
[ 13.783527] ---[ end trace 0000000000000000 ]---
To avoid this we keep the SYS_OFF_MODE_POWER_OFF_PREPARE handler for the
SPI backend. This is not great, but at least avoids regressions and the
fix should be small enough to allow backporting.
As a side-effect this also works around a shutdown problem on the Asus
C201, which for unknown reasons skips the prepare handler and directly
calls the final shutdown handler.
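Condensed from the diff below, the registration now picks the power-off
phase based on the transport:

	enum sys_off_mode pwr_off_mode = SYS_OFF_MODE_POWER_OFF;

	/*
	 * The Rockchip SPI driver cannot do atomic transfers yet, so
	 * SPI-connected PMICs keep powering off from the sleepable
	 * prepare phase.
	 */
	if (is_spi)
		pwr_off_mode = SYS_OFF_MODE_POWER_OFF_PREPARE;

	ret = devm_register_sys_off_handler(dev, pwr_off_mode, SYS_OFF_PRIO_HIGH,
					    &rk808_power_off, rk808);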
Fixes: 4fec8a5a85c49 ("mfd: rk808: Convert to device managed resources")
Cc: stable(a)vger.kernel.org
Reported-by: Urja <urja(a)urja.dev>
Signed-off-by: Sebastian Reichel <sebastian.reichel(a)collabora.com>
---
drivers/mfd/rk8xx-core.c | 15 +++++++++++++--
drivers/mfd/rk8xx-i2c.c | 2 +-
drivers/mfd/rk8xx-spi.c | 2 +-
include/linux/mfd/rk808.h | 2 +-
4 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/mfd/rk8xx-core.c b/drivers/mfd/rk8xx-core.c
index 5eda3c0dbbdf..757ef8181328 100644
--- a/drivers/mfd/rk8xx-core.c
+++ b/drivers/mfd/rk8xx-core.c
@@ -692,10 +692,11 @@ void rk8xx_shutdown(struct device *dev)
}
EXPORT_SYMBOL_GPL(rk8xx_shutdown);
-int rk8xx_probe(struct device *dev, int variant, unsigned int irq, struct regmap *regmap)
+int rk8xx_probe(struct device *dev, int variant, unsigned int irq, struct regmap *regmap, bool is_spi)
{
struct rk808 *rk808;
const struct rk808_reg_data *pre_init_reg;
+ enum sys_off_mode pwr_off_mode = SYS_OFF_MODE_POWER_OFF;
const struct mfd_cell *cells;
int dual_support = 0;
int nr_pre_init_regs;
@@ -785,10 +786,20 @@ int rk8xx_probe(struct device *dev, int variant, unsigned int irq, struct regmap
if (ret)
return dev_err_probe(dev, ret, "failed to add MFD devices\n");
+ /*
+ * Currently the Rockchip SPI driver always sleeps when doing SPI
+ * transfers. This is not allowed in the SYS_OFF_MODE_POWER_OFF
+ * handler, so we are using the prepare handler as a workaround.
+ * This should be removed once the Rockchip SPI driver has been
+ * adapted.
+ */
+ if (is_spi)
+ pwr_off_mode = SYS_OFF_MODE_POWER_OFF_PREPARE;
+
if (device_property_read_bool(dev, "rockchip,system-power-controller") ||
device_property_read_bool(dev, "system-power-controller")) {
ret = devm_register_sys_off_handler(dev,
- SYS_OFF_MODE_POWER_OFF_PREPARE, SYS_OFF_PRIO_HIGH,
+ pwr_off_mode, SYS_OFF_PRIO_HIGH,
&rk808_power_off, rk808);
if (ret)
return dev_err_probe(dev, ret,
diff --git a/drivers/mfd/rk8xx-i2c.c b/drivers/mfd/rk8xx-i2c.c
index 69a6b297d723..a2029decd654 100644
--- a/drivers/mfd/rk8xx-i2c.c
+++ b/drivers/mfd/rk8xx-i2c.c
@@ -189,7 +189,7 @@ static int rk8xx_i2c_probe(struct i2c_client *client)
return dev_err_probe(&client->dev, PTR_ERR(regmap),
"regmap initialization failed\n");
- return rk8xx_probe(&client->dev, data->variant, client->irq, regmap);
+ return rk8xx_probe(&client->dev, data->variant, client->irq, regmap, false);
}
static void rk8xx_i2c_shutdown(struct i2c_client *client)
diff --git a/drivers/mfd/rk8xx-spi.c b/drivers/mfd/rk8xx-spi.c
index 3405fb82ff9f..20f9428f94bb 100644
--- a/drivers/mfd/rk8xx-spi.c
+++ b/drivers/mfd/rk8xx-spi.c
@@ -94,7 +94,7 @@ static int rk8xx_spi_probe(struct spi_device *spi)
return dev_err_probe(&spi->dev, PTR_ERR(regmap),
"Failed to init regmap\n");
- return rk8xx_probe(&spi->dev, RK806_ID, spi->irq, regmap);
+ return rk8xx_probe(&spi->dev, RK806_ID, spi->irq, regmap, true);
}
static const struct of_device_id rk8xx_spi_of_match[] = {
diff --git a/include/linux/mfd/rk808.h b/include/linux/mfd/rk808.h
index 69cbea78b430..be15b84cff9e 100644
--- a/include/linux/mfd/rk808.h
+++ b/include/linux/mfd/rk808.h
@@ -1349,7 +1349,7 @@ struct rk808 {
};
void rk8xx_shutdown(struct device *dev);
-int rk8xx_probe(struct device *dev, int variant, unsigned int irq, struct regmap *regmap);
+int rk8xx_probe(struct device *dev, int variant, unsigned int irq, struct regmap *regmap, bool is_spi);
int rk8xx_suspend(struct device *dev);
int rk8xx_resume(struct device *dev);
--
2.43.0
During High-Speed Isochronous audio transfers, the xHCI
controller on certain AMD platforms experiences momentary data
loss. This results in Missed Service Errors (MSE) being
generated by the xHCI.
The root cause of the MSE is attributed to the ISOC OUT endpoint
being omitted from scheduling. This can happen either when an IN
endpoint with a 64ms service interval is pre-scheduled prior to
the ISOC OUT endpoint or when the interval of the ISOC OUT
endpoint is shorter than that of the IN endpoint. Consequently,
the OUT service is neglected when an IN endpoint with a service
interval exceeding 32ms is scheduled concurrently (every 64ms in
this scenario).
This issue is particularly seen on certain older AMD platforms.
To mitigate this problem, it is recommended to adjust the service
interval of the IN endpoint to not exceed 32ms (interval 8). This
adjustment ensures that the OUT endpoint will not be bypassed,
even if a smaller interval value is utilized.
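Condensed from the xhci_endpoint_init() hunk below; the interval value
is the exponent of a 125us base, so capping it at 8 yields
2^8 * 125us = 32 ms:

	if ((xhci->quirks & XHCI_LIMIT_ENDPOINT_INTERVAL_9) &&
	    usb_endpoint_xfer_int(&ep->desc) &&
	    interval >= 9)
		interval = 8;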
Cc: stable(a)vger.kernel.org
Signed-off-by: Raju Rangoju <Raju.Rangoju(a)amd.com>
---
Changes since v3:
- Bump up the enum number XHCI_LIMIT_ENDPOINT_INTERVAL_9
Changes since v2:
- added stable tag to backport to all stable kernels
Changes since v1:
- replaced hex values with pci device names
- corrected the commit message
drivers/usb/host/xhci-mem.c | 5 +++++
drivers/usb/host/xhci-pci.c | 25 +++++++++++++++++++++++++
drivers/usb/host/xhci.h | 1 +
3 files changed, 31 insertions(+)
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 92703efda1f7..d3182ba98788 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1420,6 +1420,11 @@ int xhci_endpoint_init(struct xhci_hcd *xhci,
/* Periodic endpoint bInterval limit quirk */
if (usb_endpoint_xfer_int(&ep->desc) ||
usb_endpoint_xfer_isoc(&ep->desc)) {
+ if ((xhci->quirks & XHCI_LIMIT_ENDPOINT_INTERVAL_9) &&
+ usb_endpoint_xfer_int(&ep->desc) &&
+ interval >= 9) {
+ interval = 8;
+ }
if ((xhci->quirks & XHCI_LIMIT_ENDPOINT_INTERVAL_7) &&
udev->speed >= USB_SPEED_HIGH &&
interval >= 7) {
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 2d1e205c14c6..d23884afdf3f 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -69,12 +69,22 @@
#define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_XHCI 0x15ec
#define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_DD_XHCI 0x15f0
+#define PCI_DEVICE_ID_AMD_ARIEL_TYPEC_XHCI 0x13ed
+#define PCI_DEVICE_ID_AMD_ARIEL_TYPEA_XHCI 0x13ee
+#define PCI_DEVICE_ID_AMD_STARSHIP_XHCI 0x148c
+#define PCI_DEVICE_ID_AMD_FIREFLIGHT_15D4_XHCI 0x15d4
+#define PCI_DEVICE_ID_AMD_FIREFLIGHT_15D5_XHCI 0x15d5
+#define PCI_DEVICE_ID_AMD_RAVEN_15E0_XHCI 0x15e0
+#define PCI_DEVICE_ID_AMD_RAVEN_15E1_XHCI 0x15e1
+#define PCI_DEVICE_ID_AMD_RAVEN2_XHCI 0x15e5
#define PCI_DEVICE_ID_AMD_RENOIR_XHCI 0x1639
#define PCI_DEVICE_ID_AMD_PROMONTORYA_4 0x43b9
#define PCI_DEVICE_ID_AMD_PROMONTORYA_3 0x43ba
#define PCI_DEVICE_ID_AMD_PROMONTORYA_2 0x43bb
#define PCI_DEVICE_ID_AMD_PROMONTORYA_1 0x43bc
+#define PCI_DEVICE_ID_ATI_NAVI10_7316_XHCI 0x7316
+
#define PCI_DEVICE_ID_ASMEDIA_1042_XHCI 0x1042
#define PCI_DEVICE_ID_ASMEDIA_1042A_XHCI 0x1142
#define PCI_DEVICE_ID_ASMEDIA_1142_XHCI 0x1242
@@ -278,6 +288,21 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->vendor == PCI_VENDOR_ID_NEC)
xhci->quirks |= XHCI_NEC_HOST;
+ if (pdev->vendor == PCI_VENDOR_ID_AMD &&
+ (pdev->device == PCI_DEVICE_ID_AMD_ARIEL_TYPEC_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_ARIEL_TYPEA_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_STARSHIP_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_FIREFLIGHT_15D4_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_FIREFLIGHT_15D5_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_RAVEN_15E0_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_RAVEN_15E1_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_RAVEN2_XHCI))
+ xhci->quirks |= XHCI_LIMIT_ENDPOINT_INTERVAL_9;
+
+ if (pdev->vendor == PCI_VENDOR_ID_ATI &&
+ pdev->device == PCI_DEVICE_ID_ATI_NAVI10_7316_XHCI)
+ xhci->quirks |= XHCI_LIMIT_ENDPOINT_INTERVAL_9;
+
if (pdev->vendor == PCI_VENDOR_ID_AMD && xhci->hci_version == 0x96)
xhci->quirks |= XHCI_AMD_0x96_HOST;
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 4914f0a10cff..36b77d3c0e7b 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1633,6 +1633,7 @@ struct xhci_hcd {
#define XHCI_WRITE_64_HI_LO BIT_ULL(47)
#define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48)
#define XHCI_ETRON_HOST BIT_ULL(49)
+#define XHCI_LIMIT_ENDPOINT_INTERVAL_9 BIT_ULL(50)
unsigned int num_active_eps;
unsigned int limit_active_eps;
--
2.34.1
From: Kairui Song <kasong(a)tencent.com>
This series fixes the page cache corruption issue reported by Christian
Theune [1]. The reported issue affects kernels back to 5.19.
The currently maintained affected branches are 6.1 and 6.6; the fix is
already included in 6.10.
This series can be applied to both 6.1 and 6.6.
Patch 3/3 is the fixing patch. It was initially submitted and merged as
an optimization but was found to fix the corruption by handling the
race correctly.
Patches 1/3 and 2/3 are required for 3/3.
Patch 3/3 includes some unit-test code, which makes the LOC of the
backport a bit higher, but it should be OK to keep since it is just
test code.
Note that there still seems to be some unresolved problem at Link [1],
but that should be a different issue; the commits being backported have
been well tested and fix the corruption issue just fine.
Link: https://lore.kernel.org/linux-mm/A5A976CB-DB57-4513-A700-656580488AB6@flyin… [1]
Kairui Song (3):
mm/filemap: return early if failed to allocate memory for split
lib/xarray: introduce a new helper xas_get_order
mm/filemap: optimize filemap folio adding
include/linux/xarray.h | 6 +++
lib/test_xarray.c | 93 ++++++++++++++++++++++++++++++++++++++++++
lib/xarray.c | 49 ++++++++++++++--------
mm/filemap.c | 50 ++++++++++++++++++-----
4 files changed, 169 insertions(+), 29 deletions(-)
--
2.46.1
Currently, zswap_cpu_comp_dead() calls crypto_free_acomp() while holding
the per-CPU acomp_ctx mutex. crypto_free_acomp() then holds scomp_lock
(through crypto_exit_scomp_ops_async()).
On the other hand, crypto_alloc_acomp_node() holds the scomp_lock
(through crypto_scomp_init_tfm()), and then allocates memory.
If the allocation results in reclaim, we may attempt to hold the per-CPU
acomp_ctx mutex.
The above dependencies can cause an ABBA deadlock. For example in the
following scenario:
(1) Task A running on CPU #1:
crypto_alloc_acomp_node()
Holds scomp_lock
Enters reclaim
Reads per_cpu_ptr(pool->acomp_ctx, 1)
(2) Task A is descheduled
(3) CPU #1 goes offline
zswap_cpu_comp_dead(CPU #1)
Holds per_cpu_ptr(pool->acomp_ctx, 1))
Calls crypto_free_acomp()
Waits for scomp_lock
(4) Task A running on CPU #2:
Waits for per_cpu_ptr(pool->acomp_ctx, 1) // Read on CPU #1
DEADLOCK
Since there is no requirement to call crypto_free_acomp() with the
per-CPU acomp_ctx mutex held in zswap_cpu_comp_dead(), move it after the
mutex is unlocked. Also move the acomp_request_free() and kfree() calls
for consistency and to avoid any potential subtle locking dependencies
in the future.
With this, only setting acomp_ctx fields to NULL occurs with the mutex
held. This is similar to how zswap_cpu_comp_prepare() only initializes
acomp_ctx fields with the mutex held, after performing all allocations
before holding the mutex.
Opportunistically, move the NULL check on acomp_ctx so that it takes
place before the mutex dereference.
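Condensed from the diff below, the resulting ordering detaches
everything under the mutex and frees it only after unlocking:

	mutex_lock(&acomp_ctx->mutex);
	req = acomp_ctx->req;
	acomp = acomp_ctx->acomp;
	buffer = acomp_ctx->buffer;
	acomp_ctx->req = NULL;
	acomp_ctx->acomp = NULL;
	acomp_ctx->buffer = NULL;
	mutex_unlock(&acomp_ctx->mutex);

	/* crypto_free_acomp() may take scomp_lock, so never call it under the mutex */
	if (!IS_ERR_OR_NULL(acomp))
		crypto_free_acomp(acomp);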
Fixes: 12dcb0ef5406 ("mm: zswap: properly synchronize freeing resources during CPU hotunplug")
Reported-by: syzbot+1a517ccfcbc6a7ab0f82(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67bcea51.050a0220.bbfd1.0096.GAE@google.com/
Cc: <stable(a)vger.kernel.org>
Co-developed-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Acked-by: Herbert Xu <herbert(a)gondor.apana.org.au>
---
v1 -> v2:
- Explained the problem more clearly in the commit message.
- Moved all freeing calls outside the lock critical section.
v1: https://lore.kernel.org/all/Z72FJnbA39zWh4zS@gondor.apana.org.au/
---
mm/zswap.c | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index ac9d299e7d0c1..adf745c66aa1d 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -881,18 +881,32 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
{
struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
+ struct acomp_req *req;
+ struct crypto_acomp *acomp;
+ u8 *buffer;
+
+ if (IS_ERR_OR_NULL(acomp_ctx))
+ return 0;
mutex_lock(&acomp_ctx->mutex);
- if (!IS_ERR_OR_NULL(acomp_ctx)) {
- if (!IS_ERR_OR_NULL(acomp_ctx->req))
- acomp_request_free(acomp_ctx->req);
- acomp_ctx->req = NULL;
- if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
- crypto_free_acomp(acomp_ctx->acomp);
- kfree(acomp_ctx->buffer);
- }
+ req = acomp_ctx->req;
+ acomp = acomp_ctx->acomp;
+ buffer = acomp_ctx->buffer;
+ acomp_ctx->req = NULL;
+ acomp_ctx->acomp = NULL;
+ acomp_ctx->buffer = NULL;
mutex_unlock(&acomp_ctx->mutex);
+ /*
+ * Do the actual freeing after releasing the mutex to avoid subtle
+ * locking dependencies causing deadlocks.
+ */
+ if (!IS_ERR_OR_NULL(req))
+ acomp_request_free(req);
+ if (!IS_ERR_OR_NULL(acomp))
+ crypto_free_acomp(acomp);
+ kfree(buffer);
+
return 0;
}
--
2.48.1.658.g4767266eb4-goog
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Turns out LNL+ and BMG+ no longer have the weird extra scanline
offset for HDMI outputs. Fix intel_crtc_scanline_offset()
accordingly so that scanline evasion/etc. works correctly on
HDMI outputs on these new platforms.
Cc: stable(a)vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_vblank.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_vblank.c b/drivers/gpu/drm/i915/display/intel_vblank.c
index 4efd4f7d497a..7b240ce681a0 100644
--- a/drivers/gpu/drm/i915/display/intel_vblank.c
+++ b/drivers/gpu/drm/i915/display/intel_vblank.c
@@ -222,7 +222,9 @@ int intel_crtc_scanline_offset(const struct intel_crtc_state *crtc_state)
* However if queried just before the start of vblank we'll get an
* answer that's slightly in the future.
*/
- if (DISPLAY_VER(display) == 2)
+ if (DISPLAY_VER(display) >= 20 || display->platform.battlemage)
+ return 1;
+ else if (DISPLAY_VER(display) == 2)
return -1;
else if (HAS_DDI(display) && intel_crtc_has_type(crtc_state, INTEL_OUTPUT_HDMI))
return 2;
--
2.45.3
From: Zhang Lixu <lixu.zhang(a)intel.com>
[ Upstream commit 4b54ae69197b9f416baa0fceadff7e89075f8454 ]
The timestamps in the Firmware log and HID sensor samples are incorrect.
They show 1970-01-01 because the current IPC driver only uses the first
8 bytes of bootup time when synchronizing time with the firmware. The
firmware converts the bootup time to UTC time, which results in the
display of 1970-01-01.
In write_ipc_from_queue(), when sending the MNG_SYNC_FW_CLOCK message,
the clock is updated according to the definition of ipc_time_update_msg.
However, in _ish_sync_fw_clock(), the message length is specified as the
size of uint64_t when building the doorbell. As a result, the firmware
only receives the first 8 bytes of struct ipc_time_update_msg.
This patch corrects the length in the doorbell to ensure the entire
ipc_time_update_msg is sent, fixing the timestamp issue.
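Condensed from the diff below: pass the whole structure so the doorbell
length covers more than the first 8 bytes:

	struct ipc_time_update_msg time = {};

	/* the fields of time are filled in while the message is sent */
	ipc_send_mng_msg(dev, MNG_SYNC_FW_CLOCK, &time, sizeof(time));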
Signed-off-by: Zhang Lixu <lixu.zhang(a)intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hid/intel-ish-hid/ipc/ipc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/hid/intel-ish-hid/ipc/ipc.c b/drivers/hid/intel-ish-hid/ipc/ipc.c
index dd5fc60874ba1..b1a41c90c5741 100644
--- a/drivers/hid/intel-ish-hid/ipc/ipc.c
+++ b/drivers/hid/intel-ish-hid/ipc/ipc.c
@@ -577,14 +577,14 @@ static void fw_reset_work_fn(struct work_struct *unused)
static void _ish_sync_fw_clock(struct ishtp_device *dev)
{
static unsigned long prev_sync;
- uint64_t usec;
+ struct ipc_time_update_msg time = {};
if (prev_sync && time_before(jiffies, prev_sync + 20 * HZ))
return;
prev_sync = jiffies;
- usec = ktime_to_us(ktime_get_boottime());
- ipc_send_mng_msg(dev, MNG_SYNC_FW_CLOCK, &usec, sizeof(uint64_t));
+ /* The fields of time would be updated while sending message */
+ ipc_send_mng_msg(dev, MNG_SYNC_FW_CLOCK, &time, sizeof(time));
}
/**
--
2.39.5
The retain_ff bit of a GDSC should be updated while the GDSC is under
SW control and ON. The current sequence needs to be fixed: retention
must be updated before the GDSC is moved to HW control, since HW
control does not guarantee that the GDSC is in the enabled state.
While in the GDSC FSM state, the GDSC hardware waits for an ACK; per
design requirements the ACK timeout is 2000us.
Signed-off-by: Taniya Das <quic_tdas(a)quicinc.com>
---
Taniya Das (2):
clk: qcom: gdsc: Set retain_ff before moving to HW CTRL
clk: qcom: gdsc: Update the status poll timeout for GDSC
drivers/clk/qcom/gdsc.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
---
base-commit: c674aa7c289e51659e40dda0f954886ef7f80042
change-id: 20250212-gdsc_fixes-77e8b8e27e2f
Best regards,
--
Taniya Das <quic_tdas(a)quicinc.com>