From: Alexander Aring aahringo@redhat.com
[ Upstream commit 30ea3257e8766027c4d8d609dcbd256ff9a76073 ]
This patch fixes a race between queue_work() in _dlm_lowcomms_commit_msg() and srcu_read_unlock(). The queue_work() can take the final reference of a dlm_msg and so msg->idx can contain garbage which is signaled by the following warning:
[ 676.237050] ------------[ cut here ]------------ [ 676.237052] WARNING: CPU: 0 PID: 1060 at include/linux/srcu.h:189 dlm_lowcomms_commit_msg+0x41/0x50 [ 676.238945] Modules linked in: dlm_locktorture torture rpcsec_gss_krb5 intel_rapl_msr intel_rapl_common iTCO_wdt iTCO_vendor_support qxl kvm_intel drm_ttm_helper vmw_vsock_virtio_transport kvm vmw_vsock_virtio_transport_common ttm irqbypass crc32_pclmul joydev crc32c_intel serio_raw drm_kms_helper vsock virtio_scsi virtio_console virtio_balloon snd_pcm drm syscopyarea sysfillrect sysimgblt snd_timer fb_sys_fops i2c_i801 lpc_ich snd i2c_smbus soundcore pcspkr [ 676.244227] CPU: 0 PID: 1060 Comm: lock_torture_wr Not tainted 5.19.0-rc3+ #1546 [ 676.245216] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-2.module+el8.7.0+15506+033991b0 04/01/2014 [ 676.246460] RIP: 0010:dlm_lowcomms_commit_msg+0x41/0x50 [ 676.247132] Code: fe ff ff ff 75 24 48 c7 c6 bd 0f 49 bb 48 c7 c7 38 7c 01 bd e8 00 e7 ca ff 89 de 48 c7 c7 60 78 01 bd e8 42 3d cd ff 5b 5d c3 <0f> 0b eb d8 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 [ 676.249253] RSP: 0018:ffffa401c18ffc68 EFLAGS: 00010282 [ 676.249855] RAX: 0000000000000001 RBX: 00000000ffff8b76 RCX: 0000000000000006 [ 676.250713] RDX: 0000000000000000 RSI: ffffffffbccf3a10 RDI: ffffffffbcc7b62e [ 676.251610] RBP: ffffa401c18ffc70 R08: 0000000000000001 R09: 0000000000000001 [ 676.252481] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000005 [ 676.253421] R13: ffff8b76786ec370 R14: ffff8b76786ec370 R15: ffff8b76786ec480 [ 676.254257] FS: 0000000000000000(0000) GS:ffff8b7777800000(0000) knlGS:0000000000000000 [ 676.255239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 676.255897] CR2: 00005590205d88b8 CR3: 000000017656c003 CR4: 0000000000770ee0 [ 676.256734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 676.257567] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 676.258397] PKRU: 55555554 [ 676.258729] Call Trace: [ 676.259063] <TASK> [ 676.259354] dlm_midcomms_commit_mhandle+0xcc/0x110 [ 676.259964] queue_bast+0x8b/0xb0 [ 676.260423] grant_pending_locks+0x166/0x1b0 [ 676.261007] _unlock_lock+0x75/0x90 [ 676.261469] unlock_lock.isra.57+0x62/0xa0 [ 676.262009] dlm_unlock+0x21e/0x330 [ 676.262457] ? lock_torture_stats+0x80/0x80 [dlm_locktorture] [ 676.263183] torture_unlock+0x5a/0x90 [dlm_locktorture] [ 676.263815] ? preempt_count_sub+0xba/0x100 [ 676.264361] ? complete+0x1d/0x60 [ 676.264777] lock_torture_writer+0xb8/0x150 [dlm_locktorture] [ 676.265555] kthread+0x10a/0x130 [ 676.266007] ? kthread_complete_and_exit+0x20/0x20 [ 676.266616] ret_from_fork+0x22/0x30 [ 676.267097] </TASK> [ 676.267381] irq event stamp: 9579855 [ 676.267824] hardirqs last enabled at (9579863): [<ffffffffbb14e6f8>] __up_console_sem+0x58/0x60 [ 676.268896] hardirqs last disabled at (9579872): [<ffffffffbb14e6dd>] __up_console_sem+0x3d/0x60 [ 676.270008] softirqs last enabled at (9579798): [<ffffffffbc200349>] __do_softirq+0x349/0x4c7 [ 676.271438] softirqs last disabled at (9579897): [<ffffffffbb0d54c0>] irq_exit_rcu+0xb0/0xf0 [ 676.272796] ---[ end trace 0000000000000000 ]---
I reproduced this warning with dlm_locktorture test which is currently not upstream. However this patch fix the issue by make a additional refcount between dlm_lowcomms_new_msg() and dlm_lowcomms_commit_msg(). In case of the race the kref_put() in dlm_lowcomms_commit_msg() will be the final put.
Signed-off-by: Alexander Aring aahringo@redhat.com Signed-off-by: David Teigland teigland@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/dlm/lowcomms.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c index b11f695261f5..9d7078a1dc8b 100644 --- a/fs/dlm/lowcomms.c +++ b/fs/dlm/lowcomms.c @@ -1319,6 +1319,8 @@ struct dlm_msg *dlm_lowcomms_new_msg(int nodeid, int len, gfp_t allocation, return NULL; }
+ /* for dlm_lowcomms_commit_msg() */ + kref_get(&msg->ref); /* we assume if successful commit must called */ msg->idx = idx; return msg; @@ -1353,6 +1355,8 @@ void dlm_lowcomms_commit_msg(struct dlm_msg *msg) { _dlm_lowcomms_commit_msg(msg); srcu_read_unlock(&connections_srcu, msg->idx); + /* because dlm_lowcomms_new_msg() */ + kref_put(&msg->ref, dlm_msg_release); }
void dlm_lowcomms_put_msg(struct dlm_msg *msg)
From: Zqiang qiang1.zhang@intel.com
[ Upstream commit 621189a1fe93cb2b34d62c5cdb9e258bca044813 ]
Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger irq-work from rcu_read_unlock(), and the resulting irq-work handler invokes rcu_preempt_deferred_qs_handle(). The point of this triggering is to force grace periods to end quickly in order to give tools like KASAN a better chance of detecting RCU usage bugs such as leaking RCU-protected pointers out of an RCU read-side critical section.
However, this irq-work triggering is unconditional. This works, but there is no point in doing this irq-work unless the current grace period is waiting on the running CPU or task, which is not the common case. After all, in the common case there are many rcu_read_unlock() calls per CPU per grace period.
This commit therefore triggers the irq-work only when the current grace period is waiting on the running CPU or task.
This change was tested as follows on a four-CPU system:
echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter echo 1 > /sys/kernel/debug/tracing/function_profile_enabled insmod rcutorture.ko sleep 20 rmmod rcutorture.ko echo 0 > /sys/kernel/debug/tracing/function_profile_enabled echo > /sys/kernel/debug/tracing/set_ftrace_filter
This procedure produces results in this per-CPU set of files:
/sys/kernel/debug/tracing/trace_stat/function*
Sample output from one of these files is as follows:
Function Hit Time Avg s^2 -------- --- ---- --- --- rcu_preempt_deferred_qs_handle 838746 182650.3 us 0.217 us 0.004 us
The baseline sum of the "Hit" values (the number of calls to this function) was 3,319,015. With this commit, that sum was 1,140,359, for a 2.9x reduction. The worst-case variance across the CPUs was less than 25%, so this large effect size is statistically significant.
The raw data is available in the Link: URL.
Link: https://lore.kernel.org/all/20220808022626.12825-1-qiang1.zhang@intel.com/ Signed-off-by: Zqiang qiang1.zhang@intel.com Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/rcu/tree_plugin.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index ef2dd131e955..f1a73a1f8472 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -638,7 +638,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
expboost = (t->rcu_blocked_node && READ_ONCE(t->rcu_blocked_node->exp_tasks)) || (rdp->grpmask & READ_ONCE(rnp->expmask)) || - IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) || + (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) && + ((rdp->grpmask & READ_ONCE(rnp->qsmask)) || t->rcu_blocked_node)) || (IS_ENABLED(CONFIG_RCU_BOOST) && irqs_were_disabled && t->rcu_blocked_node); // Need to defer quiescent state until everything is enabled.
From: Michal Hocko mhocko@suse.com
[ Upstream commit 093590c16b447f53e66771c8579ae66c96f6ef61 ]
The fill_page_cache_func() function allocates couple of pages to store kvfree_rcu_bulk_data structures. This is a lightweight (GFP_NORETRY) allocation which can fail under memory pressure. The function will, however keep retrying even when the previous attempt has failed.
This retrying is in theory correct, but in practice the allocation is invoked from workqueue context, which means that if the memory reclaim gets stuck, these retries can hog the worker for quite some time. Although the workqueues subsystem automatically adjusts concurrency, such adjustment is not guaranteed to happen until the worker context sleeps. And the fill_page_cache_func() function's retry loop is not guaranteed to sleep (see the should_reclaim_retry() function).
And we have seen this function cause workqueue lockups:
kernel: BUG: workqueue lockup - pool cpus=93 node=1 flags=0x1 nice=0 stuck for 32s! [...] kernel: pool 74: cpus=37 node=0 flags=0x1 nice=0 hung=32s workers=2 manager: 2146 kernel: pwq 498: cpus=249 node=1 flags=0x1 nice=0 active=4/256 refcnt=5 kernel: in-flight: 1917:fill_page_cache_func kernel: pending: dbs_work_handler, free_work, kfree_rcu_monitor
Originally, we thought that the root cause of this lockup was several retries with direct reclaim, but this is not yet confirmed. Furthermore, we have seen similar lockups without any heavy memory pressure. This suggests that there are other factors contributing to these lockups. However, it is not really clear that endless retries are desireable.
So let's make the fill_page_cache_func() function back off after allocation failure.
Cc: Uladzislau Rezki (Sony) urezki@gmail.com Cc: "Paul E. McKenney" paulmck@kernel.org Cc: Frederic Weisbecker frederic@kernel.org Cc: Neeraj Upadhyay quic_neeraju@quicinc.com Cc: Josh Triplett josh@joshtriplett.org Cc: Steven Rostedt rostedt@goodmis.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Lai Jiangshan jiangshanlai@gmail.com Cc: Joel Fernandes joel@joelfernandes.org Signed-off-by: Michal Hocko mhocko@suse.com Reviewed-by: Uladzislau Rezki (Sony) urezki@gmail.com Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/rcu/tree.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index a4a9d68b1fdc..63f7ce228cc3 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3419,15 +3419,16 @@ static void fill_page_cache_func(struct work_struct *work) bnode = (struct kvfree_rcu_bulk_data *) __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
- if (bnode) { - raw_spin_lock_irqsave(&krcp->lock, flags); - pushed = put_cached_bnode(krcp, bnode); - raw_spin_unlock_irqrestore(&krcp->lock, flags); + if (!bnode) + break;
- if (!pushed) { - free_page((unsigned long) bnode); - break; - } + raw_spin_lock_irqsave(&krcp->lock, flags); + pushed = put_cached_bnode(krcp, bnode); + raw_spin_unlock_irqrestore(&krcp->lock, flags); + + if (!pushed) { + free_page((unsigned long) bnode); + break; } }
From: Zqiang qiang1.zhang@intel.com
[ Upstream commit fcd53c8a4dfa38bafb89efdd0b0f718f3a03f884 ]
Kernels built with CONFIG_PROVE_RCU=y and CONFIG_DEBUG_LOCK_ALLOC=y attempt to emit a warning when the synchronize_rcu_tasks_generic() function is called during early boot while the rcu_scheduler_active variable is RCU_SCHEDULER_INACTIVE. However the warnings is not actually be printed because the debug_lockdep_rcu_enabled() returns false, exactly because the rcu_scheduler_active variable is still equal to RCU_SCHEDULER_INACTIVE.
This commit therefore replaces RCU_LOCKDEP_WARN() with WARN_ONCE() to force these warnings to actually be printed.
Signed-off-by: Zqiang qiang1.zhang@intel.com Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/rcu/tasks.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index 60c9eacac25b..ae8396032b5d 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -171,7 +171,7 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func, static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp) { /* Complain if the scheduler has not started. */ - RCU_LOCKDEP_WARN(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE, + WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE, "synchronize_rcu_tasks called too soon");
/* Wait for the grace period. */
From: Arvid Norlander lkml@vorpal.se
[ Upstream commit 574160b8548deff8b80b174f03201e94ab8431e2 ]
Toshiba Satellite Z830 needs the quirk video_disable_backlight_sysfs_if for proper backlight control after suspend/resume cycles.
Toshiba Portege Z830 is simply the same laptop rebranded for certain markets (I looked through the manual to other language sections to confirm this) and thus also needs this quirk.
Thanks to Hans de Goede for suggesting this fix.
Link: https://www.spinics.net/lists/platform-driver-x86/msg34394.html Suggested-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Arvid Norlander lkml@vorpal.se Reviewed-by: Hans de Goede hdegoede@redhat.com Tested-by: Arvid Norlander lkml@vorpal.se Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/acpi_video.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c index 390af28f6faf..2b18b51f6351 100644 --- a/drivers/acpi/acpi_video.c +++ b/drivers/acpi/acpi_video.c @@ -496,6 +496,22 @@ static const struct dmi_system_id video_dmi_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "SATELLITE R830"), }, }, + { + .callback = video_disable_backlight_sysfs_if, + .ident = "Toshiba Satellite Z830", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"), + DMI_MATCH(DMI_PRODUCT_NAME, "SATELLITE Z830"), + }, + }, + { + .callback = video_disable_backlight_sysfs_if, + .ident = "Toshiba Portege Z830", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"), + DMI_MATCH(DMI_PRODUCT_NAME, "PORTEGE Z830"), + }, + }, /* * Some machine's _DOD IDs don't have bit 31(Device ID Scheme) set * but the IDs actually follow the Device ID Scheme.
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 211391bf04b3c74e250c566eeff9cf808156c693 ]
On a Packard Bell Dot SC (Intel Atom N2600 model) there is a FPDT table which contains invalid physical addresses, with high bits set which fall outside the range of the CPU-s supported physical address range.
Calling acpi_os_map_memory() on such an invalid phys address leads to the below WARN_ON in ioremap triggering resulting in an oops/stacktrace.
Add code to verify the physical address before calling acpi_os_map_memory() to fix / avoid the oops.
[ 1.226900] ioremap: invalid physical address 3001000000000000 [ 1.226949] ------------[ cut here ]------------ [ 1.226962] WARNING: CPU: 1 PID: 1 at arch/x86/mm/ioremap.c:200 __ioremap_caller.cold+0x43/0x5f [ 1.226996] Modules linked in: [ 1.227016] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.0.0-rc3+ #490 [ 1.227029] Hardware name: Packard Bell dot s/SJE01_CT, BIOS V1.10 07/23/2013 [ 1.227038] RIP: 0010:__ioremap_caller.cold+0x43/0x5f [ 1.227054] Code: 96 00 00 e9 f8 af 24 ff 89 c6 48 c7 c7 d8 0c 84 99 e8 6a 96 00 00 e9 76 af 24 ff 48 89 fe 48 c7 c7 a8 0c 84 99 e8 56 96 00 00 <0f> 0b e9 60 af 24 ff 48 8b 34 24 48 c7 c7 40 0d 84 99 e8 3f 96 00 [ 1.227067] RSP: 0000:ffffb18c40033d60 EFLAGS: 00010286 [ 1.227084] RAX: 0000000000000032 RBX: 3001000000000000 RCX: 0000000000000000 [ 1.227095] RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00000000ffffffff [ 1.227105] RBP: 3001000000000000 R08: 0000000000000000 R09: ffffb18c40033c18 [ 1.227115] R10: 0000000000000003 R11: ffffffff99d62fe8 R12: 0000000000000008 [ 1.227124] R13: 0003001000000000 R14: 0000000000001000 R15: 3001000000000000 [ 1.227135] FS: 0000000000000000(0000) GS:ffff913a3c080000(0000) knlGS:0000000000000000 [ 1.227146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.227156] CR2: 0000000000000000 CR3: 0000000018c26000 CR4: 00000000000006e0 [ 1.227167] Call Trace: [ 1.227176] <TASK> [ 1.227185] ? acpi_os_map_iomem+0x1c9/0x1e0 [ 1.227215] ? kmem_cache_alloc_trace+0x187/0x370 [ 1.227254] acpi_os_map_iomem+0x1c9/0x1e0 [ 1.227288] acpi_init_fpdt+0xa8/0x253 [ 1.227308] ? acpi_debugfs_init+0x1f/0x1f [ 1.227339] do_one_initcall+0x5a/0x300 [ 1.227406] ? rcu_read_lock_sched_held+0x3f/0x80 [ 1.227442] kernel_init_freeable+0x28b/0x2cc [ 1.227512] ? rest_init+0x170/0x170 [ 1.227538] kernel_init+0x16/0x140 [ 1.227552] ret_from_fork+0x1f/0x30 [ 1.227639] </TASK> [ 1.227647] irq event stamp: 186819 [ 1.227656] hardirqs last enabled at (186825): [<ffffffff98184a6e>] __up_console_sem+0x5e/0x70 [ 1.227672] hardirqs last disabled at (186830): [<ffffffff98184a53>] __up_console_sem+0x43/0x70 [ 1.227686] softirqs last enabled at (186576): [<ffffffff980fbc9d>] __irq_exit_rcu+0xed/0x160 [ 1.227701] softirqs last disabled at (186569): [<ffffffff980fbc9d>] __irq_exit_rcu+0xed/0x160 [ 1.227715] ---[ end trace 0000000000000000 ]---
Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/acpi_fpdt.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/acpi/acpi_fpdt.c b/drivers/acpi/acpi_fpdt.c index 6922a44b3ce7..a2056c4c8cb7 100644 --- a/drivers/acpi/acpi_fpdt.c +++ b/drivers/acpi/acpi_fpdt.c @@ -143,6 +143,23 @@ static const struct attribute_group boot_attr_group = {
static struct kobject *fpdt_kobj;
+#if defined CONFIG_X86 && defined CONFIG_PHYS_ADDR_T_64BIT +#include <linux/processor.h> +static bool fpdt_address_valid(u64 address) +{ + /* + * On some systems the table contains invalid addresses + * with unsuppored high address bits set, check for this. + */ + return !(address >> boot_cpu_data.x86_phys_bits); +} +#else +static bool fpdt_address_valid(u64 address) +{ + return true; +} +#endif + static int fpdt_process_subtable(u64 address, u32 subtable_type) { struct fpdt_subtable_header *subtable_header; @@ -151,6 +168,11 @@ static int fpdt_process_subtable(u64 address, u32 subtable_type) u32 length, offset; int result;
+ if (!fpdt_address_valid(address)) { + pr_info(FW_BUG "invalid physical address: 0x%llx!\n", address); + return -EINVAL; + } + subtable_header = acpi_os_map_memory(address, sizeof(*subtable_header)); if (!subtable_header) return -ENOMEM;
From: Doug Smythies dsmythies@telus.net
[ Upstream commit 71bb5c82aaaea007167f3ba68d3a669c74d7d55d ]
Users may disable HWP in firmware, in which case intel_pstate wouldn't load unless the CPU model is explicitly supported.
Add TIGERLAKE to the list of CPUs that can register intel_pstate while not advertising the HWP capability. Without this change, an TIGERLAKE in no-HWP mode could only use the acpi_cpufreq frequency scaling driver.
See also commits: d8de7a44e11f: cpufreq: intel_pstate: Add Skylake servers support fbdc21e9b038: cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode 706c5328851d: cpufreq: intel_pstate: Add Cometlake support in no-HWP mode
Reported by: M. Cargi Ari cagriari@pm.me Signed-off-by: Doug Smythies dsmythies@telus.net Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpufreq/intel_pstate.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 8a2c6b58b652..c57229c108a7 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -2257,6 +2257,7 @@ static const struct x86_cpu_id intel_pstate_cpu_ids[] = { X86_MATCH(SKYLAKE_X, core_funcs), X86_MATCH(COMETLAKE, core_funcs), X86_MATCH(ICELAKE_X, core_funcs), + X86_MATCH(TIGERLAKE, core_funcs), {} }; MODULE_DEVICE_TABLE(x86cpu, intel_pstate_cpu_ids);
From: Kees Cook keescook@chromium.org
[ Upstream commit 0dedcf6e3301836eb70cfa649052e7ce4fcd13ba ]
Clang is especially sensitive about argument type matching when using __overloaded functions (like memcmp(), etc). Help it see that function pointers are just "void *". Avoids this error:
arch/mips/bcm47xx/prom.c:89:8: error: no matching function for call to 'memcmp' if (!memcmp(prom_init, prom_init + mem, 32)) ^~~~~~ include/linux/string.h:156:12: note: candidate function not viable: no known conversion from 'void (void)' to 'const void *' for 1st argument extern int memcmp(const void *,const void *,__kernel_size_t);
Cc: Hauke Mehrtens hauke@hauke-m.de Cc: "Rafał Miłecki" zajec5@gmail.com Cc: Thomas Bogendoerfer tsbogend@alpha.franken.de Cc: linux-mips@vger.kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: llvm@lists.linux.dev Reported-by: kernel test robot lkp@intel.com Link: https://lore.kernel.org/lkml/202209080652.sz2d68e5-lkp@intel.com Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/bcm47xx/prom.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/mips/bcm47xx/prom.c b/arch/mips/bcm47xx/prom.c index 0a63721d0fbf..5a33d6b48d77 100644 --- a/arch/mips/bcm47xx/prom.c +++ b/arch/mips/bcm47xx/prom.c @@ -86,7 +86,7 @@ static __init void prom_init_mem(void) pr_debug("Assume 128MB RAM\n"); break; } - if (!memcmp(prom_init, prom_init + mem, 32)) + if (!memcmp((void *)prom_init, (void *)prom_init + mem, 32)) break; } lowmem = mem; @@ -159,7 +159,7 @@ void __init bcm47xx_prom_highmem_init(void)
off = EXTVBASE + __pa(off); for (extmem = 128 << 20; extmem < 512 << 20; extmem <<= 1) { - if (!memcmp(prom_init, (void *)(off + extmem), 16)) + if (!memcmp((void *)prom_init, (void *)(off + extmem), 16)) break; } extmem -= lowmem;
From: Chao Qin chao.qin@intel.com
[ Upstream commit 2d93540014387d1c73b9ccc4d7895320df66d01b ]
When value < time_unit, the parameter of ilog2() will be zero and the return value is -1. u64(-1) is too large for shift exponent and then will trigger shift-out-of-bounds:
shift exponent 18446744073709551615 is too large for 32-bit type 'int' Call Trace: rapl_compute_time_window_core rapl_write_data_raw set_time_window store_constraint_time_window_us
Signed-off-by: Chao Qin chao.qin@intel.com Acked-by: Zhang Rui rui.zhang@intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/powercap/intel_rapl_common.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 7c0099e7a6d7..0f0a7191d040 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -938,6 +938,9 @@ static u64 rapl_compute_time_window_core(struct rapl_package *rp, u64 value, y = value & 0x1f; value = (1 << y) * (4 + f) * rp->time_unit / 4; } else { + if (value < rp->time_unit) + return 0; + do_div(value, rp->time_unit); y = ilog2(value); f = div64_u64(4 * (value - (1 << y)), 1 << y);
From: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com
[ Upstream commit 68b99e94a4a2db6ba9b31fe0485e057b9354a640 ]
When CPU 0 is offline and intel_powerclamp is used to inject idle, it generates kernel BUG:
BUG: using smp_processor_id() in preemptible [00000000] code: bash/15687 caller is debug_smp_processor_id+0x17/0x20 CPU: 4 PID: 15687 Comm: bash Not tainted 5.19.0-rc7+ #57 Call Trace: <TASK> dump_stack_lvl+0x49/0x63 dump_stack+0x10/0x16 check_preemption_disabled+0xdd/0xe0 debug_smp_processor_id+0x17/0x20 powerclamp_set_cur_state+0x7f/0xf9 [intel_powerclamp] ... ...
Here CPU 0 is the control CPU by default and changed to the current CPU, if CPU 0 offlined. This check has to be performed under cpus_read_lock(), hence the above warning.
Use get_cpu() instead of smp_processor_id() to avoid this BUG.
Suggested-by: Chen Yu yu.c.chen@intel.com Signed-off-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/intel/intel_powerclamp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/intel/intel_powerclamp.c b/drivers/thermal/intel/intel_powerclamp.c index a5b58ea89cc6..9121ae4f5068 100644 --- a/drivers/thermal/intel/intel_powerclamp.c +++ b/drivers/thermal/intel/intel_powerclamp.c @@ -532,8 +532,10 @@ static int start_power_clamp(void)
/* prefer BSP */ control_cpu = 0; - if (!cpu_online(control_cpu)) - control_cpu = smp_processor_id(); + if (!cpu_online(control_cpu)) { + control_cpu = get_cpu(); + put_cpu(); + }
clamping = true; schedule_delayed_work(&poll_pkg_cstate_work, 0);
From: Kees Cook keescook@chromium.org
[ Upstream commit 1b64daf413acd86c2c13f5443f6b4ef3690c8061 ]
The .data.rel.ro.local section has the same semantics as .data.rel.ro here, so include it in the .rodata section of the decompressor. Additionally since the .printk_index section isn't usable outside of the core kernel, discard it in the decompressor. Avoids these warnings:
arm-linux-gnueabi-ld: warning: orphan section `.data.rel.ro.local' from `arch/arm/boot/compressed/fdt_rw.o' being placed in section `.data.rel.ro.local' arm-linux-gnueabi-ld: warning: orphan section `.printk_index' from `arch/arm/boot/compressed/fdt_rw.o' being placed in section `.printk_index'
Reported-by: kernel test robot lkp@intel.com Link: https://lore.kernel.org/linux-mm/202209080545.qMIVj7YM-lkp@intel.com Cc: Russell King linux@armlinux.org.uk Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/compressed/vmlinux.lds.S | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S index 1bcb68ac4b01..3fcb3e62dc56 100644 --- a/arch/arm/boot/compressed/vmlinux.lds.S +++ b/arch/arm/boot/compressed/vmlinux.lds.S @@ -23,6 +23,7 @@ SECTIONS *(.ARM.extab*) *(.note.*) *(.rel.*) + *(.printk_index) /* * Discard any r/w data - this produces a link error if we have any, * which is required for PIC decompression. Local data generates @@ -57,6 +58,7 @@ SECTIONS *(.rodata) *(.rodata.*) *(.data.rel.ro) + *(.data.rel.ro.*) } .piggydata : { *(.piggydata)
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit 018d6711c26e4bd26e20a819fcc7f8ab902608f3 ]
Dell Inspiron 14 2-in-1 has two ACPI nodes under GPP1 both with _ADR of 0, both without _HID. It's ambiguous which the kernel should take, but it seems to take "DEV0". Unfortunately "DEV0" is missing the device property `StorageD3Enable` which is present on "NVME".
To avoid this causing problems for suspend, add a quirk for this system to behave like `StorageD3Enable` property was found.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216440 Reported-and-tested-by: Luya Tshimbalanga luya@fedoraproject.org Signed-off-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/x86/utils.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/x86/utils.c b/drivers/acpi/x86/utils.c index b3fb428461c6..3a3f09b6cbfc 100644 --- a/drivers/acpi/x86/utils.c +++ b/drivers/acpi/x86/utils.c @@ -198,7 +198,24 @@ static const struct x86_cpu_id storage_d3_cpu_ids[] = { {} };
+static const struct dmi_system_id force_storage_d3_dmi[] = { + { + /* + * _ADR is ambiguous between GPP1.DEV0 and GPP1.NVME + * but .NVME is needed to get StorageD3Enable node + * https://bugzilla.kernel.org/show_bug.cgi?id=216440 + */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Inspiron 14 7425 2-in-1"), + } + }, + {} +}; + bool force_storage_d3(void) { - return x86_match_cpu(storage_d3_cpu_ids); + const struct dmi_system_id *dmi_id = dmi_first_match(force_storage_d3_dmi); + + return dmi_id || x86_match_cpu(storage_d3_cpu_ids); }
From: Kees Cook keescook@chromium.org
[ Upstream commit 3e1730842f142add55dc658929221521a9ea62b6 ]
Clang produces a false positive when building with CONFIG_FORTIFY_SOURCE=y and CONFIG_UBSAN_BOUNDS=y when operating on an array with a dynamic offset. Work around this by using a direct assignment of an empty instance. Avoids this warning:
../include/linux/fortify-string.h:309:4: warning: call to __write_overflow_field declared with 'warn ing' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wat tribute-warning] __write_overflow_field(p_size_field, size); ^
which was isolated to the memset() call in xen_load_idt().
Note that this looks very much like another bug that was worked around: https://github.com/ClangBuiltLinux/linux/issues/1592
Cc: Juergen Gross jgross@suse.com Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Dave Hansen dave.hansen@linux.intel.com Cc: x86@kernel.org Cc: "H. Peter Anvin" hpa@zytor.com Cc: xen-devel@lists.xenproject.org Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Link: https://lore.kernel.org/lkml/41527d69-e8ab-3f86-ff37-6b298c01d5bc@oracle.com Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/xen/enlighten_pv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 133ef31639df..561aad13412f 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -759,6 +759,7 @@ static void xen_load_idt(const struct desc_ptr *desc) { static DEFINE_SPINLOCK(lock); static struct trap_info traps[257]; + static const struct trap_info zero = { }; unsigned out;
trace_xen_cpu_load_idt(desc); @@ -768,7 +769,7 @@ static void xen_load_idt(const struct desc_ptr *desc) memcpy(this_cpu_ptr(&idt_desc), desc, sizeof(idt_desc));
out = xen_convert_trap_info(desc, traps, false); - memset(&traps[out], 0, sizeof(traps[0])); + traps[out] = zero;
xen_mc_flush(); if (HYPERVISOR_set_trap_table(traps))
From: Anna Schumaker Anna.Schumaker@Netapp.com
[ Upstream commit 06981d560606ac48d61e5f4fff6738b925c93173 ]
This was discussed with Chuck as part of this patch set. Returning nfserr_resource was decided to not be the best error message here, and he suggested changing to nfserr_serverfault instead.
Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Link: https://lore.kernel.org/linux-nfs/20220907195259.926736-1-anna@kernel.org/T/... Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index f6d385e0efec..f559405772c5 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3993,7 +3993,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, if (resp->xdr->buf->page_len && test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)) { WARN_ON_ONCE(1); - return nfserr_resource; + return nfserr_serverfault; } xdr_commit_encode(xdr);
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 019805fea91599b22dfa62ffb29c022f35abeb06 ]
Use-after-free occurred when the laundromat tried to free expired cpntf_state entry on the s2s_cp_stateids list after inter-server copy completed. The sc_cp_list that the expired copy state was inserted on was already freed.
When COPY completes, the Linux client normally sends LOCKU(lock_state x), FREE_STATEID(lock_state x) and CLOSE(open_state y) to the source server. The nfs4_put_stid call from nfsd4_free_stateid cleans up the copy state from the s2s_cp_stateids list before freeing the lock state's stid.
However, sometimes the CLOSE was sent before the FREE_STATEID request. When this happens, the nfsd4_close_open_stateid call from nfsd4_close frees all lock states on its st_locks list without cleaning up the copy state on the sc_cp_list list. When the time the FREE_STATEID arrives the server returns BAD_STATEID since the lock state was freed. This causes the use-after-free error to occur when the laundromat tries to free the expired cpntf_state.
This patch adds a call to nfs4_free_cpntf_statelist in nfsd4_close_open_stateid to clean up the copy state before calling free_ol_stateid_reaplist to free the lock state's stid on the reaplist.
Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index f9e2fa9cfbec..7b763f146b62 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -961,6 +961,7 @@ static struct nfs4_ol_stateid * nfs4_alloc_open_stateid(struct nfs4_client *clp)
static void nfs4_free_deleg(struct nfs4_stid *stid) { + WARN_ON(!list_empty(&stid->sc_cp_list)); kmem_cache_free(deleg_slab, stid); atomic_long_dec(&num_delegations); } @@ -1374,6 +1375,7 @@ static void nfs4_free_ol_stateid(struct nfs4_stid *stid) release_all_access(stp); if (stp->st_stateowner) nfs4_put_stateowner(stp->st_stateowner); + WARN_ON(!list_empty(&stid->sc_cp_list)); kmem_cache_free(stateid_slab, stid); }
@@ -6389,6 +6391,7 @@ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s) struct nfs4_client *clp = s->st_stid.sc_client; bool unhashed; LIST_HEAD(reaplist); + struct nfs4_ol_stateid *stp;
spin_lock(&clp->cl_lock); unhashed = unhash_open_stateid(s, &reaplist); @@ -6397,6 +6400,8 @@ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s) if (unhashed) put_ol_stateid_locked(s, &reaplist); spin_unlock(&clp->cl_lock); + list_for_each_entry(stp, &reaplist, st_locks) + nfs4_free_cpntf_statelist(clp->net, &stp->st_stid); free_ol_stateid_reaplist(&reaplist); } else { spin_unlock(&clp->cl_lock);
linux-stable-mirror@lists.linaro.org