March 2025 - Linux-stable-mirror

[PATCH 6.1.y 1/1] hrtimer: Use and report correct timerslack values for realtime tasks

by Felix Moessbauer

commit ed4fb6d7ef68111bb539283561953e5c6e9a6e38 upstream. The timerslack_ns setting is used to specify how much the hardware timers should be delayed, to potentially dispatch multiple timers in a single interrupt. This is a performance optimization. Timers of realtime tasks (having a realtime scheduling policy) should not be delayed. This logic was inconsitently applied to the hrtimers, leading to delays of realtime tasks which used timed waits for events (e.g. condition variables). Due to the downstream override of the slack for rt tasks, the procfs reported incorrect (non-zero) timerslack_ns values. This is changed by setting the timer_slack_ns task attribute to 0 for all tasks with a rt policy. By that, downstream users do not need to specially handle rt tasks (w.r.t. the slack), and the procfs entry shows the correct value of "0". Setting non-zero slack values (either via procfs or PR_SET_TIMERSLACK) on tasks with a rt policy is ignored, as stated in "man 2 PR_SET_TIMERSLACK": Timer slack is not applied to threads that are scheduled under a real-time scheduling policy (see sched_setscheduler(2)). The special handling of timerslack on rt tasks in downstream users is removed as well. Signed-off-by: Felix Moessbauer <felix.moessbauer(a)siemens.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Link: https://lore.kernel.org/all/20240814121032.368444-2-felix.moessbauer@siemen… --- fs/proc/base.c | 9 +++++---- fs/select.c | 11 ++++------- kernel/sched/core.c | 8 ++++++++ kernel/sys.c | 2 ++ kernel/time/hrtimer.c | 18 +++--------------- 5 files changed, 22 insertions(+), 26 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index ecc45389ea793..82e4a8805bae6 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2633,10 +2633,11 @@ static ssize_t timerslack_ns_write(struct file *file, const char __user *buf, } task_lock(p); - if (slack_ns == 0) - p->timer_slack_ns = p->default_timer_slack_ns; - else - p->timer_slack_ns = slack_ns; + if (task_is_realtime(p)) + slack_ns = 0; + else if (slack_ns == 0) + slack_ns = p->default_timer_slack_ns; + p->timer_slack_ns = slack_ns; task_unlock(p); out: diff --git a/fs/select.c b/fs/select.c index 3f730b8581f65..e66b6189845ea 100644 --- a/fs/select.c +++ b/fs/select.c @@ -77,19 +77,16 @@ u64 select_estimate_accuracy(struct timespec64 *tv) { u64 ret; struct timespec64 now; + u64 slack = current->timer_slack_ns; - /* - * Realtime tasks get a slack of 0 for obvious reasons. - */ - - if (rt_task(current)) + if (slack == 0) return 0; ktime_get_ts64(&now); now = timespec64_sub(*tv, now); ret = __estimate_accuracy(&now); - if (ret < current->timer_slack_ns) - return current->timer_slack_ns; + if (ret < slack) + return slack; return ret; } diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 0a483fd9f5de5..9be8a509b5f3f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7380,6 +7380,14 @@ static void __setscheduler_params(struct task_struct *p, else if (fair_policy(policy)) p->static_prio = NICE_TO_PRIO(attr->sched_nice); + /* rt-policy tasks do not have a timerslack */ + if (task_is_realtime(p)) { + p->timer_slack_ns = 0; + } else if (p->timer_slack_ns == 0) { + /* when switching back to non-rt policy, restore timerslack */ + p->timer_slack_ns = p->default_timer_slack_ns; + } + /* * __sched_setscheduler() ensures attr->sched_priority == 0 when * !rt_policy. Always setting this ensures that things like diff --git a/kernel/sys.c b/kernel/sys.c index d06eda1387b69..06a9a87a8d3e0 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -2477,6 +2477,8 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, error = current->timer_slack_ns; break; case PR_SET_TIMERSLACK: + if (task_is_realtime(current)) + break; if (arg2 <= 0) current->timer_slack_ns = current->default_timer_slack_ns; diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 8db65e2db14c7..f6d799646dd9c 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -2090,14 +2090,9 @@ long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode, struct restart_block *restart; struct hrtimer_sleeper t; int ret = 0; - u64 slack; - - slack = current->timer_slack_ns; - if (dl_task(current) || rt_task(current)) - slack = 0; hrtimer_init_sleeper_on_stack(&t, clockid, mode); - hrtimer_set_expires_range_ns(&t.timer, rqtp, slack); + hrtimer_set_expires_range_ns(&t.timer, rqtp, current->timer_slack_ns); ret = do_nanosleep(&t, mode); if (ret != -ERESTART_RESTARTBLOCK) goto out; @@ -2278,7 +2273,7 @@ void __init hrtimers_init(void) /** * schedule_hrtimeout_range_clock - sleep until timeout * @expires: timeout value (ktime_t) - * @delta: slack in expires timeout (ktime_t) for SCHED_OTHER tasks + * @delta: slack in expires timeout (ktime_t) * @mode: timer mode * @clock_id: timer clock to be used */ @@ -2305,13 +2300,6 @@ schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta, return -EINTR; } - /* - * Override any slack passed by the user if under - * rt contraints. - */ - if (rt_task(current)) - delta = 0; - hrtimer_init_sleeper_on_stack(&t, clock_id, mode); hrtimer_set_expires_range_ns(&t.timer, *expires, delta); hrtimer_sleeper_start_expires(&t, mode); @@ -2331,7 +2319,7 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock); /** * schedule_hrtimeout_range - sleep until timeout * @expires: timeout value (ktime_t) - * @delta: slack in expires timeout (ktime_t) for SCHED_OTHER tasks + * @delta: slack in expires timeout (ktime_t) * @mode: timer mode * * Make the current task sleep until the given expiry time has -- 2.47.2

8 months

2
1
0 0

[PATCH 5.4 0/4] sctp: sysctl: fix argument passed to container_of

by Magali Lemes

Patches "sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy" and "sctp: sysctl: auth_enable: avoid using current->nsproxy" have been mixed up when backported to 5.4. The `member` argument passed to `container_of` has been swapped in both proc_sctp_do_auth() and proc_sctp_do_hmac_alg(). For instance, accessing /proc/sys/net/sctp/cookie_hmac_alg can now cause a kernel oops. Fix this by reverting the wrong backports and re-applying them correctly. Magali Lemes (2): Revert "sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy" Revert "sctp: sysctl: auth_enable: avoid using current->nsproxy" Matthieu Baerts (NGI0) (2): sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy sctp: sysctl: auth_enable: avoid using current->nsproxy net/sctp/sysctl.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- 2.48.1

8 months

2
8
0 0

[PATCH stable 5.15] lib/buildid: Handle memfd_secret() files in build_id_parse()

by Chen Linxuan

Backport of a similar change from commit 5ac9b4e935df ("lib/buildid: Handle memfd_secret() files in build_id_parse()") to address an issue where accessing secret memfd contents through build_id_parse() would trigger faults. Original report and repro can be found in [0]. [0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/ This repro will cause BUG: unable to handle kernel paging request in build_id_parse in 5.15/6.1/6.6. Some other discussions can be found in [1]. [1] https://lore.kernel.org/bpf/20241104175256.2327164-1-jolsa@kernel.org/T/#u Cc: stable(a)vger.kernel.org Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event") Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org> --- lib/buildid.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/lib/buildid.c b/lib/buildid.c index 9fc46366597e..9db35305f257 100644 --- a/lib/buildid.c +++ b/lib/buildid.c @@ -5,6 +5,7 @@ #include <linux/elf.h> #include <linux/kernel.h> #include <linux/pagemap.h> +#include <linux/secretmem.h> #define BUILD_ID 3 @@ -157,6 +158,12 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id, if (!vma->vm_file) return -EINVAL; +#ifdef CONFIG_SECRETMEM + /* reject secretmem folios created with memfd_secret() */ + if (vma->vm_file->f_mapping->a_ops == &secretmem_aops) + return -EFAULT; +#endif + page = find_get_page(vma->vm_file->f_mapping, 0); if (!page) return -EFAULT; /* page not mapped */ -- 2.48.1

8 months

2
1
0 0

[PATCH 6.6.y 1/1] hrtimer: Use and report correct timerslack values for realtime tasks

by Felix Moessbauer

commit ed4fb6d7ef68111bb539283561953e5c6e9a6e38 upstream. The timerslack_ns setting is used to specify how much the hardware timers should be delayed, to potentially dispatch multiple timers in a single interrupt. This is a performance optimization. Timers of realtime tasks (having a realtime scheduling policy) should not be delayed. This logic was inconsitently applied to the hrtimers, leading to delays of realtime tasks which used timed waits for events (e.g. condition variables). Due to the downstream override of the slack for rt tasks, the procfs reported incorrect (non-zero) timerslack_ns values. This is changed by setting the timer_slack_ns task attribute to 0 for all tasks with a rt policy. By that, downstream users do not need to specially handle rt tasks (w.r.t. the slack), and the procfs entry shows the correct value of "0". Setting non-zero slack values (either via procfs or PR_SET_TIMERSLACK) on tasks with a rt policy is ignored, as stated in "man 2 PR_SET_TIMERSLACK": Timer slack is not applied to threads that are scheduled under a real-time scheduling policy (see sched_setscheduler(2)). The special handling of timerslack on rt tasks in downstream users is removed as well. Signed-off-by: Felix Moessbauer <felix.moessbauer(a)siemens.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Link: https://lore.kernel.org/all/20240814121032.368444-2-felix.moessbauer@siemen… --- fs/proc/base.c | 9 +++++---- fs/select.c | 11 ++++------- kernel/sched/core.c | 8 ++++++++ kernel/sys.c | 2 ++ kernel/time/hrtimer.c | 18 +++--------------- 5 files changed, 22 insertions(+), 26 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 699f085d4de7d..91fe20b7657c0 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2633,10 +2633,11 @@ static ssize_t timerslack_ns_write(struct file *file, const char __user *buf, } task_lock(p); - if (slack_ns == 0) - p->timer_slack_ns = p->default_timer_slack_ns; - else - p->timer_slack_ns = slack_ns; + if (task_is_realtime(p)) + slack_ns = 0; + else if (slack_ns == 0) + slack_ns = p->default_timer_slack_ns; + p->timer_slack_ns = slack_ns; task_unlock(p); out: diff --git a/fs/select.c b/fs/select.c index 3f730b8581f65..e66b6189845ea 100644 --- a/fs/select.c +++ b/fs/select.c @@ -77,19 +77,16 @@ u64 select_estimate_accuracy(struct timespec64 *tv) { u64 ret; struct timespec64 now; + u64 slack = current->timer_slack_ns; - /* - * Realtime tasks get a slack of 0 for obvious reasons. - */ - - if (rt_task(current)) + if (slack == 0) return 0; ktime_get_ts64(&now); now = timespec64_sub(*tv, now); ret = __estimate_accuracy(&now); - if (ret < current->timer_slack_ns) - return current->timer_slack_ns; + if (ret < slack) + return slack; return ret; } diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 784a4f8409453..3d6dc03e4a7d3 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7530,6 +7530,14 @@ static void __setscheduler_params(struct task_struct *p, else if (fair_policy(policy)) p->static_prio = NICE_TO_PRIO(attr->sched_nice); + /* rt-policy tasks do not have a timerslack */ + if (task_is_realtime(p)) { + p->timer_slack_ns = 0; + } else if (p->timer_slack_ns == 0) { + /* when switching back to non-rt policy, restore timerslack */ + p->timer_slack_ns = p->default_timer_slack_ns; + } + /* * __sched_setscheduler() ensures attr->sched_priority == 0 when * !rt_policy. Always setting this ensures that things like diff --git a/kernel/sys.c b/kernel/sys.c index 44b5759903332..355de0b65c235 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -2535,6 +2535,8 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, error = current->timer_slack_ns; break; case PR_SET_TIMERSLACK: + if (task_is_realtime(current)) + break; if (arg2 <= 0) current->timer_slack_ns = current->default_timer_slack_ns; diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index e99b1305e1a5f..5db6912b8f6e1 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -2093,14 +2093,9 @@ long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode, struct restart_block *restart; struct hrtimer_sleeper t; int ret = 0; - u64 slack; - - slack = current->timer_slack_ns; - if (rt_task(current)) - slack = 0; hrtimer_init_sleeper_on_stack(&t, clockid, mode); - hrtimer_set_expires_range_ns(&t.timer, rqtp, slack); + hrtimer_set_expires_range_ns(&t.timer, rqtp, current->timer_slack_ns); ret = do_nanosleep(&t, mode); if (ret != -ERESTART_RESTARTBLOCK) goto out; @@ -2281,7 +2276,7 @@ void __init hrtimers_init(void) /** * schedule_hrtimeout_range_clock - sleep until timeout * @expires: timeout value (ktime_t) - * @delta: slack in expires timeout (ktime_t) for SCHED_OTHER tasks + * @delta: slack in expires timeout (ktime_t) * @mode: timer mode * @clock_id: timer clock to be used */ @@ -2308,13 +2303,6 @@ schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta, return -EINTR; } - /* - * Override any slack passed by the user if under - * rt contraints. - */ - if (rt_task(current)) - delta = 0; - hrtimer_init_sleeper_on_stack(&t, clock_id, mode); hrtimer_set_expires_range_ns(&t.timer, *expires, delta); hrtimer_sleeper_start_expires(&t, mode); @@ -2334,7 +2322,7 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock); /** * schedule_hrtimeout_range - sleep until timeout * @expires: timeout value (ktime_t) - * @delta: slack in expires timeout (ktime_t) for SCHED_OTHER tasks + * @delta: slack in expires timeout (ktime_t) * @mode: timer mode * * Make the current task sleep until the given expiry time has -- 2.47.2

8 months

2
1
0 0

[PATCH v2] x86/microcode/AMD: Fix out-of-bounds on systems with CPU-less NUMA nodes

by Florent Revest

Currently, load_microcode_amd() iterates over all NUMA nodes, retrieves their CPU masks and unconditonally accesses per-CPU data for the first CPU of each mask. According to Documentation/admin-guide/mm/numaperf.rst: "Some memory may share the same node as a CPU, and others are provided as memory only nodes." Therefore, some node CPU masks may be empty and wouldn't have a "first CPU". On a machine with far memory (and therefore CPU-less NUMA nodes): - cpumask_of_node(nid) is 0 - cpumask_first(0) is CONFIG_NR_CPUS - cpu_data(CONFIG_NR_CPUS) accesses the cpu_info per-CPU array at an index that is 1 out of bounds This does not have any security implications since flashing microcode is a privileged operation but I believe this has reliability implications by potentially corrupting memory while flashing a microcode update. When booting with CONFIG_UBSAN_BOUNDS=y on an AMD machine that flashes a microcode update. I get the following splat: UBSAN: array-index-out-of-bounds in arch/x86/kernel/cpu/microcode/amd.c:X:Y index 512 is out of range for type 'unsigned long[512]' [...] Call Trace: dump_stack+0xdb/0x143 __ubsan_handle_out_of_bounds+0xf5/0x120 load_microcode_amd+0x58f/0x6b0 request_microcode_amd+0x17c/0x250 reload_store+0x174/0x2b0 kernfs_fop_write_iter+0x227/0x2d0 vfs_write+0x322/0x510 ksys_write+0xb5/0x160 do_syscall_64+0x6b/0xa0 entry_SYSCALL_64_after_hwframe+0x67/0xd1 This changes the iteration to only loop on NUMA nodes which have CPUs before attempting to update their microcodes. Fixes: 7ff6edf4fef3 ("x86/microcode/AMD: Fix mixed steppings support") Signed-off-by: Florent Revest <revest(a)chromium.org> Cc: stable(a)vger.kernel.org --- arch/x86/kernel/cpu/microcode/amd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c index 95ac1c6a84fbe..d1e74bfe130f8 100644 --- a/arch/x86/kernel/cpu/microcode/amd.c +++ b/arch/x86/kernel/cpu/microcode/amd.c @@ -1068,7 +1068,7 @@ static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t siz if (ret != UCODE_OK) return ret; - for_each_node(nid) { + for_each_node_with_cpus(nid) { cpu = cpumask_first(cpumask_of_node(nid)); c = &cpu_data(cpu); -- 2.49.0.rc0.332.g42c0ae87b1-goog

8 months

3
2
0 0

[PATCH 6.12.y 6.13.y] mm/slab/kvfree_rcu: Switch to WQ_MEM_RECLAIM wq

by Uladzislau Rezki (Sony)

commit dfd3df31c9db752234d7d2e09bef2aeabb643ce4 upstream. Currently kvfree_rcu() APIs use a system workqueue which is "system_unbound_wq" to driver RCU machinery to reclaim a memory. Recently, it has been noted that the following kernel warning can be observed: <snip> workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_scan_work is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work WARNING: CPU: 21 PID: 330 at kernel/workqueue.c:3719 check_flush_dependency+0x112/0x120 Modules linked in: intel_uncore_frequency(E) intel_uncore_frequency_common(E) skx_edac(E) ... CPU: 21 UID: 0 PID: 330 Comm: kworker/u144:6 Tainted: G E 6.13.2-0_g925d379822da #1 Hardware name: Wiwynn Twin Lakes MP/Twin Lakes Passive MP, BIOS YMM20 02/01/2023 Workqueue: nvme-wq nvme_scan_work RIP: 0010:check_flush_dependency+0x112/0x120 Code: 05 9a 40 14 02 01 48 81 c6 c0 00 00 00 48 8b 50 18 48 81 c7 c0 00 00 00 48 89 f9 48 ... RSP: 0018:ffffc90000df7bd8 EFLAGS: 00010082 RAX: 000000000000006a RBX: ffffffff81622390 RCX: 0000000000000027 RDX: 00000000fffeffff RSI: 000000000057ffa8 RDI: ffff88907f960c88 RBP: 0000000000000000 R08: ffffffff83068e50 R09: 000000000002fffd R10: 0000000000000004 R11: 0000000000000000 R12: ffff8881001a4400 R13: 0000000000000000 R14: ffff88907f420fb8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88907f940000(0000) knlGS:0000000000000000 CR2: 00007f60c3001000 CR3: 000000107d010005 CR4: 00000000007726f0 PKRU: 55555554 Call Trace: <TASK> ? __warn+0xa4/0x140 ? check_flush_dependency+0x112/0x120 ? report_bug+0xe1/0x140 ? check_flush_dependency+0x112/0x120 ? handle_bug+0x5e/0x90 ? exc_invalid_op+0x16/0x40 ? asm_exc_invalid_op+0x16/0x20 ? timer_recalc_next_expiry+0x190/0x190 ? check_flush_dependency+0x112/0x120 ? check_flush_dependency+0x112/0x120 __flush_work.llvm.1643880146586177030+0x174/0x2c0 flush_rcu_work+0x28/0x30 kvfree_rcu_barrier+0x12f/0x160 kmem_cache_destroy+0x18/0x120 bioset_exit+0x10c/0x150 disk_release.llvm.6740012984264378178+0x61/0xd0 device_release+0x4f/0x90 kobject_put+0x95/0x180 nvme_put_ns+0x23/0xc0 nvme_remove_invalid_namespaces+0xb3/0xd0 nvme_scan_work+0x342/0x490 process_scheduled_works+0x1a2/0x370 worker_thread+0x2ff/0x390 ? pwq_release_workfn+0x1e0/0x1e0 kthread+0xb1/0xe0 ? __kthread_parkme+0x70/0x70 ret_from_fork+0x30/0x40 ? __kthread_parkme+0x70/0x70 ret_from_fork_asm+0x11/0x20 </TASK> ---[ end trace 0000000000000000 ]--- <snip> To address this switch to use of independent WQ_MEM_RECLAIM workqueue, so the rules are not violated from workqueue framework point of view. Apart of that, since kvfree_rcu() does reclaim memory it is worth to go with WQ_MEM_RECLAIM type of wq because it is designed for this purpose. Fixes: 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"), Reported-by: Keith Busch <kbusch(a)kernel.org> Closes: https://lore.kernel.org/all/Z7iqJtCjHKfo8Kho@kbusch-mbp/ Cc: stable(a)vger.kernel.org Signed-off-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com> Reviewed-by: Joel Fernandes <joelagnelf(a)nvidia.com> Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz> Signed-off-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com> --- kernel/rcu/tree.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 3e486ccaa4ca..8e52c1dd0628 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3191,6 +3191,8 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func) } EXPORT_SYMBOL_GPL(call_rcu); +static struct workqueue_struct *rcu_reclaim_wq; + /* Maximum number of jiffies to wait before draining a batch. */ #define KFREE_DRAIN_JIFFIES (5 * HZ) #define KFREE_N_BATCHES 2 @@ -3519,10 +3521,10 @@ __schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp) if (delayed_work_pending(&krcp->monitor_work)) { delay_left = krcp->monitor_work.timer.expires - jiffies; if (delay < delay_left) - mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay); + mod_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay); return; } - queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay); + queue_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay); } static void @@ -3620,7 +3622,7 @@ kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp) // "free channels", the batch can handle. Break // the loop since it is done with this CPU thus // queuing an RCU work is _always_ success here. - queued = queue_rcu_work(system_unbound_wq, &krwp->rcu_work); + queued = queue_rcu_work(rcu_reclaim_wq, &krwp->rcu_work); WARN_ON_ONCE(!queued); break; } @@ -3708,7 +3710,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp) if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING && !atomic_xchg(&krcp->work_in_progress, 1)) { if (atomic_read(&krcp->backoff_page_cache_fill)) { - queue_delayed_work(system_unbound_wq, + queue_delayed_work(rcu_reclaim_wq, &krcp->page_cache_work, msecs_to_jiffies(rcu_delay_page_cache_fill_msec)); } else { @@ -5662,6 +5664,10 @@ static void __init kfree_rcu_batch_init(void) int i, j; struct shrinker *kfree_rcu_shrinker; + rcu_reclaim_wq = alloc_workqueue("kvfree_rcu_reclaim", + WQ_UNBOUND | WQ_MEM_RECLAIM, 0); + WARN_ON(!rcu_reclaim_wq); + /* Clamp it to [0:100] seconds interval. */ if (rcu_delay_page_cache_fill_msec < 0 || rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) { -- 2.39.5

8 months

2
1
0 0

FAILED: patch "[PATCH] usb: xhci: Enable the TRB overfetch quirk on VIA VL805" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x c133ec0e5717868c9967fa3df92a55e537b1aead # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030901-banshee-unwomanly-f19e@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From c133ec0e5717868c9967fa3df92a55e537b1aead Mon Sep 17 00:00:00 2001 From: Michal Pecio <michal.pecio(a)gmail.com> Date: Tue, 25 Feb 2025 11:59:27 +0200 Subject: [PATCH] usb: xhci: Enable the TRB overfetch quirk on VIA VL805 Raspberry Pi is a major user of those chips and they discovered a bug - when the end of a transfer ring segment is reached, up to four TRBs can be prefetched from the next page even if the segment ends with link TRB and on page boundary (the chip claims to support standard 4KB pages). It also appears that if the prefetched TRBs belong to a different ring whose doorbell is later rung, they may be used without refreshing from system RAM and the endpoint will stay idle if their cycle bit is stale. Other users complain about IOMMU faults on x86 systems, unsurprisingly. Deal with it by using existing quirk which allocates a dummy page after each transfer ring segment. This was seen to resolve both problems. RPi came up with a more efficient solution, shortening each segment by four TRBs, but it complicated the driver and they ditched it for this quirk. Also rename the quirk and add VL805 device ID macro. Signed-off-by: Michal Pecio <michal.pecio(a)gmail.com> Link: https://github.com/raspberrypi/linux/issues/4685 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215906 CC: stable(a)vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> Link: https://lore.kernel.org/r/20250225095927.2512358-2-mathias.nyman@linux.inte… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c index 92703efda1f7..fdf0c1008225 100644 --- a/drivers/usb/host/xhci-mem.c +++ b/drivers/usb/host/xhci-mem.c @@ -2437,7 +2437,8 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags) * and our use of dma addresses in the trb_address_map radix tree needs * TRB_SEGMENT_SIZE alignment, so we pick the greater alignment need. */ - if (xhci->quirks & XHCI_ZHAOXIN_TRB_FETCH) + if (xhci->quirks & XHCI_TRB_OVERFETCH) + /* Buggy HC prefetches beyond segment bounds - allocate dummy space at the end */ xhci->segment_pool = dma_pool_create("xHCI ring segments", dev, TRB_SEGMENT_SIZE * 2, TRB_SEGMENT_SIZE * 2, xhci->page_size * 2); else diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index ad0ff356f6fa..54460d11f7ee 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -38,6 +38,8 @@ #define PCI_DEVICE_ID_ETRON_EJ168 0x7023 #define PCI_DEVICE_ID_ETRON_EJ188 0x7052 +#define PCI_DEVICE_ID_VIA_VL805 0x3483 + #define PCI_DEVICE_ID_INTEL_LYNXPOINT_XHCI 0x8c31 #define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI 0x9c31 #define PCI_DEVICE_ID_INTEL_WILDCATPOINT_LP_XHCI 0x9cb1 @@ -418,8 +420,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) pdev->device == 0x3432) xhci->quirks |= XHCI_BROKEN_STREAMS; - if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483) + if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == PCI_DEVICE_ID_VIA_VL805) { xhci->quirks |= XHCI_LPM_SUPPORT; + xhci->quirks |= XHCI_TRB_OVERFETCH; + } if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA && pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI) { @@ -467,11 +471,11 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) if (pdev->device == 0x9202) { xhci->quirks |= XHCI_RESET_ON_RESUME; - xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH; + xhci->quirks |= XHCI_TRB_OVERFETCH; } if (pdev->device == 0x9203) - xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH; + xhci->quirks |= XHCI_TRB_OVERFETCH; } if (pdev->vendor == PCI_VENDOR_ID_CDNS && diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 8c164340a2c3..779b01dee068 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1632,7 +1632,7 @@ struct xhci_hcd { #define XHCI_EP_CTX_BROKEN_DCS BIT_ULL(42) #define XHCI_SUSPEND_RESUME_CLKS BIT_ULL(43) #define XHCI_RESET_TO_DEFAULT BIT_ULL(44) -#define XHCI_ZHAOXIN_TRB_FETCH BIT_ULL(45) +#define XHCI_TRB_OVERFETCH BIT_ULL(45) #define XHCI_ZHAOXIN_HOST BIT_ULL(46) #define XHCI_WRITE_64_HI_LO BIT_ULL(47) #define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48)

8 months

3
2
0 0

[PATCH RFC 5.15] smb: client: fix potential UAF in cifs_stats_proc_write()

by Xiangyu Chen

From: Paulo Alcantara <pc(a)manguebit.com> commit d3da25c5ac84430f89875ca7485a3828150a7e0a upstream. Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF. Cc: stable(a)vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.com> Signed-off-by: Steve French <stfrench(a)microsoft.com> [ cifs_debug.c was moved from fs/cifs to fs/smb/client since 38c8a9a52082 ("smb: move client and server files to common directory fs/smb"). The cifs_ses_exiting() was introduced to cifs_debug.c since ca545b7f0823 ("smb: client: fix potential UAF in cifs_debug_files_proc_show()"). and the SES_EXITING in cifs_ses_exiting() instead of CifsExiting since dd3cd8709ed5 ("cifs: use new enum for ses_status"). The ses_lock in cifs_ses_exiting() was introduced in commmit d7d7a66aacd6 ("cifs: avoid use of global locks for high contention data"), on 5.15/5.10, there is a global lock take care of ses->status. So use "if (ses->status == CifsExiting)" instead of "if (cifs_ses_exiting(ses))" ] Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com> Signed-off-by: He Zhe <zhe.he(a)windriver.com> --- Try to merge commit d3da25c5ac84430f89875ca7485a3828150a7e0a to 5.15 Verified the code compile on linux 5.15 --- fs/cifs/cifs_debug.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c index e7501533c2ec..ce02cc71e117 100644 --- a/fs/cifs/cifs_debug.c +++ b/fs/cifs/cifs_debug.c @@ -528,6 +528,8 @@ static ssize_t cifs_stats_proc_write(struct file *file, list_for_each(tmp2, &server->smb_ses_list) { ses = list_entry(tmp2, struct cifs_ses, smb_ses_list); + if (ses->status == CifsExiting) + continue; list_for_each(tmp3, &ses->tcon_list) { tcon = list_entry(tmp3, struct cifs_tcon, -- 2.25.1

8 months

2
1
0 0

[PATCH for-stable-6.x 0/2] Stable backport of c00b413a96261fae

by Ard Biesheuvel

From: Ard Biesheuvel <ardb(a)kernel.org> This is the stable backport of commit c00b413a96261fae ("x86/boot: Sanitize boot params before parsing command line") to the v6.6 and v6.1 trees. Patch #2 can be applied to both v6.1 and v6.6. Patch #1 is a prerequisite that is already present in v6.1 but was not backported to v6.6 yet for reasons that are unclear to me, and so it needs to be applied to v6.6 first. Ard Biesheuvel (2): x86/boot: Rename conflicting 'boot_params' pointer to 'boot_params_ptr' x86/boot: Sanitize boot params before parsing command line arch/x86/boot/compressed/acpi.c | 14 +++++------ arch/x86/boot/compressed/cmdline.c | 4 +-- arch/x86/boot/compressed/ident_map_64.c | 7 +++--- arch/x86/boot/compressed/kaslr.c | 26 ++++++++++---------- arch/x86/boot/compressed/mem.c | 6 ++--- arch/x86/boot/compressed/misc.c | 26 ++++++++++---------- arch/x86/boot/compressed/misc.h | 1 - arch/x86/boot/compressed/pgtable_64.c | 11 +++++---- arch/x86/boot/compressed/sev.c | 2 +- arch/x86/include/asm/boot.h | 2 ++ drivers/firmware/efi/libstub/x86-stub.c | 2 +- drivers/firmware/efi/libstub/x86-stub.h | 2 -- 12 files changed, 52 insertions(+), 51 deletions(-) -- 2.49.0.rc0.332.g42c0ae87b1-goog

8 months

3
4
0 0

[PATCH 6.1.y] fs/ntfs3: Add rough attr alloc_size check

by bin.lan.cn＠windriver.com

From: Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> [ Upstream commit c4a8ba334262e9a5c158d618a4820e1b9c12495c ] Reported-by: syzbot+c6d94bedd910a8216d25(a)syzkaller.appspotmail.com Signed-off-by: Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com> Signed-off-by: He Zhe <zhe.he(a)windriver.com> --- Verified the build test. --- fs/ntfs3/record.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/ntfs3/record.c b/fs/ntfs3/record.c index b2b98631a000..bfb1f4c2f271 100644 --- a/fs/ntfs3/record.c +++ b/fs/ntfs3/record.c @@ -325,6 +325,9 @@ struct ATTRIB *mi_enum_attr(struct mft_inode *mi, struct ATTRIB *attr) } else { if (attr->nres.c_unit) return NULL; + + if (alloc_size > mi->sbi->volume.size) + return NULL; } return attr; -- 2.34.1

8 months

2
1
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror March 2025