Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
exceptions/interrupts, to reduce speculation attack surface")
inadvertently broke Xen PV virtual machines by clearing the %rbx
register at the end of xen_failsafe_callback, before the latter jumps
to error_exit. error_exit expects the %rbx register to be a flag
indicating whether there should be a return to kernel mode.
This commit ensures that the PUSH_AND_CLEAR_REGS macro does not clear
the %rbx register when it is instantiated via xen_failsafe_callback,
which avoids the issue.
The impact of this bug includes (and is most likely not limited to)
kernel BUGs when Wine is run in Xen PV virtual machines. One example is
included below; it depicts the bug triggered when Wine runs the
portable version of TeamViewer 13 in a Xen PV virtual machine on an
up-to-date installation of Qubes OS R3.2.
[...........] kernel tried to execute NX-protected page - exploit attempt? (uid: 1000)
[ +0.000021] BUG: unable to handle kernel paging request at ffffffff82012480
[ +0.000020] IP: init_task+0x0/0x2ac0
[ +0.000009] PGD 200e067 P4D 200e067 PUD 200f067 PMD 2d5e067 PTE 8010000002012067
[ +0.000019] Oops: 0011 [#1] SMP DEBUG_PAGEALLOC NOPTI
[ +0.000015] Modules linked in: fuse bluetooth ecdh_generic rfkill ip6table_filter ip6_tables xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c intel_rapl x86_pkg_temp_thermal crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel xen_netfront intel_rapl_perf pcspkr binfmt_misc dummy_hcd udc_core xen_gntdev xen_gntalloc u2mfn(O) xen_blkback xenfs xen_evtchn xen_privcmd xen_blkfront
[ +0.000091] CPU: 0 PID: 1350 Comm: TeamViewer.exe Tainted: G O 4.14.20-11.d10d0bb86d #15
[ +0.000018] task: ffff8800042fda00 task.stack: ffffc900006f8000
[ +0.000015] RIP: e030:init_task+0x0/0x2ac0
[ +0.000009] RSP: e02b:ffffffff82003df8 EFLAGS: 00010002
[ +0.000012] RAX: 0000000000000040 RBX: 0000000000000004 RCX: 0000000000000000
[ +0.000015] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffffffff810011aa
[ +0.000015] RBP: 000000000033eb88 R08: 0000000000000000 R09: 0000000000000000
[ +0.000015] R10: 0000000000000000 R11: ffff88000d772600 R12: 0000000000000000
[ +0.000015] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ +0.000027] FS: 000000007ffd8000(0063) GS:ffff88000f400000(006b) knlGS:0000000000000000
[ +0.000016] CS: e033 DS: 002b ES: 002b CR0: 0000000080050033
[ +0.000014] CR2: ffffffff82012480 CR3: 0000000004880000 CR4: 0000000000042660
[ +0.000025] Call Trace:
[ +0.000007] Code: ff ff ff d0 5a 1e 81 ff ff ff ff 00 00 00 00 00 00 00 00 e0 d7 04 82 ff ff ff ff e8 98 c0 0d 00 88 ff ff 01 80 00 00 00 00 00 00 <00> 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ +0.000068] RIP: init_task+0x0/0x2ac0 RSP: ffffffff82003df8
[ +0.000011] CR2: ffffffff82012480
[ +0.000010] ---[ end trace 7fb0075d4247b2a2 ]---
The commit that introduced this bug was found with git-bisect, in
conjunction with some scripts and Qubes OS's cross-VM RPC services for
automation, and the correction was prepared after a manual inspection
of the changes introduced by the commit in question.
To the best of my recollection, this issue is also reproducible in an
Ubuntu 18.04 LTS dom0 instance with the Xen hypervisor version from
Ubuntu's package repositories.
Signed-off-by: M. Vefa Bicakci <m.v.b(a)runbox.com>
Cc: Dominik Brodowski <linux(a)dominikbrodowski.net>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: xen-devel(a)lists.xenproject.org
Cc: x86(a)kernel.org
Cc: stable(a)vger.kernel.org # for v4.14 and up
Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")
---
arch/x86/entry/calling.h | 4 +++-
arch/x86/entry/entry_64.S | 2 +-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 20d0885b00fb..71795767da94 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -97,7 +97,7 @@ For 32-bit we have the following conventions - kernel is built with
#define SIZEOF_PTREGS 21*8
-.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
+.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0 clear_rbx=1
/*
* Push registers and sanitize registers of values that a
* speculation attack might otherwise want to exploit. The
@@ -127,7 +127,9 @@ For 32-bit we have the following conventions - kernel is built with
pushq %r11 /* pt_regs->r11 */
xorl %r11d, %r11d /* nospec r11*/
pushq %rbx /* pt_regs->rbx */
+ .if \clear_rbx
xorl %ebx, %ebx /* nospec rbx*/
+ .endif
pushq %rbp /* pt_regs->rbp */
xorl %ebp, %ebp /* nospec rbp*/
pushq %r12 /* pt_regs->r12 */
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c7449f377a77..96e8ff34129e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
addq $0x30, %rsp
UNWIND_HINT_IRET_REGS
pushq $-1 /* orig_ax = -1 => not a system call */
- PUSH_AND_CLEAR_REGS
+ PUSH_AND_CLEAR_REGS clear_rbx=0
ENCODE_FRAME_POINTER
jmp error_exit
END(xen_failsafe_callback)
--
2.17.1
I'm sure I don't need to tell you that fb_helper's locking is a mess.
That being said, fb_helper's locking mess can seriously complicate the
runtime suspend/resume operations of drivers because it can invoke
atomic commits and connector probing from anywhere that calls
drm_fb_helper_hotplug_event(). Since most drivers use
drm_fb_helper_output_poll_changed() as their output_poll_changed
handler, this can happen in every single context that can fire off a
hotplug event. An example:
[ 246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds.
[ 246.673398] Not tainted 4.18.0-rc5Lyude-Test+ #2
[ 246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.676527] kworker/4:0 D 0 37 2 0x80000000
[ 246.677580] Workqueue: events output_poll_execute [drm_kms_helper]
[ 246.678704] Call Trace:
[ 246.679753] __schedule+0x322/0xaf0
[ 246.680916] schedule+0x33/0x90
[ 246.681924] schedule_preempt_disabled+0x15/0x20
[ 246.683023] __mutex_lock+0x569/0x9a0
[ 246.684035] ? kobject_uevent_env+0x117/0x7b0
[ 246.685132] ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[ 246.686179] mutex_lock_nested+0x1b/0x20
[ 246.687278] ? mutex_lock_nested+0x1b/0x20
[ 246.688307] drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[ 246.689420] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[ 246.690462] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[ 246.691570] output_poll_execute+0x198/0x1c0 [drm_kms_helper]
[ 246.692611] process_one_work+0x231/0x620
[ 246.693725] worker_thread+0x214/0x3a0
[ 246.694756] kthread+0x12b/0x150
[ 246.695856] ? wq_pool_ids_show+0x140/0x140
[ 246.696888] ? kthread_create_worker_on_cpu+0x70/0x70
[ 246.697998] ret_from_fork+0x3a/0x50
[ 246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds.
[ 246.700153] Not tainted 4.18.0-rc5Lyude-Test+ #2
[ 246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.702278] kworker/0:1 D 0 60 2 0x80000000
[ 246.703293] Workqueue: pm pm_runtime_work
[ 246.704393] Call Trace:
[ 246.705403] __schedule+0x322/0xaf0
[ 246.706439] ? wait_for_completion+0x104/0x190
[ 246.707393] schedule+0x33/0x90
[ 246.708375] schedule_timeout+0x3a5/0x590
[ 246.709289] ? mark_held_locks+0x58/0x80
[ 246.710208] ? _raw_spin_unlock_irq+0x2c/0x40
[ 246.711222] ? wait_for_completion+0x104/0x190
[ 246.712134] ? trace_hardirqs_on_caller+0xf4/0x190
[ 246.713094] ? wait_for_completion+0x104/0x190
[ 246.713964] wait_for_completion+0x12c/0x190
[ 246.714895] ? wake_up_q+0x80/0x80
[ 246.715727] ? get_work_pool+0x90/0x90
[ 246.716649] flush_work+0x1c9/0x280
[ 246.717483] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[ 246.718442] __cancel_work_timer+0x146/0x1d0
[ 246.719247] cancel_delayed_work_sync+0x13/0x20
[ 246.720043] drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper]
[ 246.721123] nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau]
[ 246.721897] pci_pm_runtime_suspend+0x6b/0x190
[ 246.722825] ? pci_has_legacy_pm_support+0x70/0x70
[ 246.723737] __rpm_callback+0x7a/0x1d0
[ 246.724721] ? pci_has_legacy_pm_support+0x70/0x70
[ 246.725607] rpm_callback+0x24/0x80
[ 246.726553] ? pci_has_legacy_pm_support+0x70/0x70
[ 246.727376] rpm_suspend+0x142/0x6b0
[ 246.728185] pm_runtime_work+0x97/0xc0
[ 246.728938] process_one_work+0x231/0x620
[ 246.729796] worker_thread+0x44/0x3a0
[ 246.730614] kthread+0x12b/0x150
[ 246.731395] ? wq_pool_ids_show+0x140/0x140
[ 246.732202] ? kthread_create_worker_on_cpu+0x70/0x70
[ 246.732878] ret_from_fork+0x3a/0x50
[ 246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds.
[ 246.734587] Not tainted 4.18.0-rc5Lyude-Test+ #2
[ 246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.736113] kworker/4:2 D 0 422 2 0x80000080
[ 246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[ 246.737665] Call Trace:
[ 246.738490] __schedule+0x322/0xaf0
[ 246.739250] schedule+0x33/0x90
[ 246.739908] rpm_resume+0x19c/0x850
[ 246.740750] ? finish_wait+0x90/0x90
[ 246.741541] __pm_runtime_resume+0x4e/0x90
[ 246.742370] nv50_disp_atomic_commit+0x31/0x210 [nouveau]
[ 246.743124] drm_atomic_commit+0x4a/0x50 [drm]
[ 246.743775] restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper]
[ 246.744603] restore_fbdev_mode+0x31/0x140 [drm_kms_helper]
[ 246.745373] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper]
[ 246.746220] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
[ 246.746884] drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper]
[ 246.747675] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[ 246.748544] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[ 246.749439] nv50_mstm_hotplug+0x15/0x20 [nouveau]
[ 246.750111] drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper]
[ 246.750764] drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper]
[ 246.751602] drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper]
[ 246.752314] process_one_work+0x231/0x620
[ 246.752979] worker_thread+0x44/0x3a0
[ 246.753838] kthread+0x12b/0x150
[ 246.754619] ? wq_pool_ids_show+0x140/0x140
[ 246.755386] ? kthread_create_worker_on_cpu+0x70/0x70
[ 246.756162] ret_from_fork+0x3a/0x50
[ 246.756847]
Showing all locks held in the system:
[ 246.758261] 3 locks held by kworker/4:0/37:
[ 246.759016] #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.759856] #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.760670] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[ 246.761516] 2 locks held by kworker/0:1/60:
[ 246.762274] #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.762982] #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.763890] 1 lock held by khungtaskd/64:
[ 246.764664] #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[ 246.765588] 5 locks held by kworker/4:2/422:
[ 246.766440] #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.767390] #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.768154] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper]
[ 246.768966] #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper]
[ 246.769921] #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm]
[ 246.770839] 1 lock held by dmesg/1038:
[ 246.771739] 2 locks held by zsh/1172:
[ 246.772650] #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[ 246.773680] #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870
[ 246.775522] =============================================
Because of this, there's an unreasonable number of places that drm
drivers would need to insert special handling to prevent trying to
resume the device from all of these contexts that can deadlock. It's
difficult even to try synchronizing with fb_helper in these contexts as
well, since any of them could introduce a deadlock by waiting to acquire
the top-level fb_helper mutex, while it's being held by another thread
that might potentially call down to pm_runtime_get_sync().
Luckily, there's no actual reason we need to allow fb_helper to handle
hotplugging at all while runtime suspending a device. If a hotplug
happens during a runtime suspend operation, there's no reason the driver
can't just re-enable fbcon's hotplug handling afterwards and bring it up
to speed with the hotplug events it may have missed by calling
drm_fb_helper_hotplug_event().
So, let's make this easy and just add helpers for disabling and
enabling fb_helper connector probing, without having to potentially
wait on fb_helper to finish its work. This will let us fix the runtime
suspend/resume deadlocks that we've been experiencing with nouveau,
along with being able to fix some of the incorrect runtime PM core
interaction that other DRM drivers currently perform to work around
these issues.
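As an illustration (not part of this patch), a driver's runtime PM
callbacks could use the new helpers roughly as in the following sketch;
the example_drm structure and function names are hypothetical:

	#include <drm/drm_fb_helper.h>
	#include <linux/device.h>
	#include <linux/errno.h>

	struct example_drm {
		struct drm_fb_helper fbcon;
		/* ... driver state ... */
	};

	static int example_runtime_suspend(struct device *dev)
	{
		struct example_drm *drm = dev_get_drvdata(dev);

		/*
		 * Inhibit fb_helper connector probing before touching the
		 * hardware. If fb_helper is busy, bail out instead of
		 * waiting, so runtime PM retries the suspend later rather
		 * than deadlocking as in the backtraces above.
		 */
		if (!drm_fb_helper_suspend_hotplug(&drm->fbcon))
			return -EBUSY;

		/* ... power down the device ... */
		return 0;
	}

	static int example_runtime_resume(struct device *dev)
	{
		struct example_drm *drm = dev_get_drvdata(dev);

		/* ... power the device back up ... */

		/* Re-enable hotplug handling and catch up on missed events. */
		drm_fb_helper_resume_hotplug(&drm->fbcon);
		return 0;
	}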
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org
Cc: Lukas Wunner <lukas(a)wunner.de>
Cc: Karol Herbst <karolherbst(a)gmail.com>
---
drivers/gpu/drm/drm_fb_helper.c | 65 +++++++++++++++++++++++++++++++++
include/drm/drm_fb_helper.h | 21 +++++++++++
2 files changed, 86 insertions(+)
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index 2ee1eaa66188..28d59befbc92 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -2733,6 +2733,66 @@ int drm_fb_helper_initial_config(struct drm_fb_helper *fb_helper, int bpp_sel)
}
EXPORT_SYMBOL(drm_fb_helper_initial_config);
+/**
+ * drm_fb_helper_resume_hotplug - Allow fb_helper to handle probing new
+ * connectors again, and bring it up to date
+ * with the current device's connector status
+ * @fb_helper: driver-allocated fbdev helper
+ *
+ * Allow fb_helper to react to connector status changes again after having
+ * been disabled through drm_fb_helper_suspend_hotplug(). This also schedules
+ * a fb_helper hotplug event to bring fb_helper up to date with the current
+ * status of the DRM device's connectors.
+ */
+void
+drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper)
+{
+ fb_helper->hotplug_suspended = false;
+ drm_fb_helper_hotplug_event(fb_helper);
+}
+EXPORT_SYMBOL(drm_fb_helper_resume_hotplug);
+
+/**
+ * drm_fb_helper_suspend_hotplug - Attempt to temporarily inhibit fb_helper's
+ * ability to respond to connector changes
+ * without waiting
+ * @fb_helper: driver-allocated fbdev helper
+ *
+ * Temporarily prevent fb_helper from being able to handle connector changes,
+ * but only if it isn't busy doing something else. The connector probing
+ * routines in fb_helper can potentially grab any modesetting lock imaginable,
+ * which dramatically complicates the runtime suspend process. This helper can
+ * be used to simplify the process of runtime suspend by allowing the driver
+ * to disable unpredictable fb_helper operations as early as possible, without
+ * requiring that we try waiting on fb_helper (as this could lead to very
+ * difficult to solve deadlocks with runtime suspend code if fb_helper ends up
+ * needing to acquire a runtime PM reference).
+ *
+ * This call should be put at the very start of a driver's runtime suspend
+ * operation if desired. The driver must be responsible for re-enabling
+ * fb_helper hotplug handling when normal hotplug detection becomes available
+ * on the device again. This will usually happen if runtime suspend is
+ * aborted, or when the device is runtime resumed.
+ *
+ * Returns: true if hotplug handling was disabled, false if disabling hotplug
+ * handling would mean waiting on fb_helper.
+ */
+bool
+drm_fb_helper_suspend_hotplug(struct drm_fb_helper *fb_helper)
+{
+ int ret;
+
+ ret = mutex_trylock(&fb_helper->lock);
+ if (!ret)
+ return false;
+
+ fb_helper->hotplug_suspended = true;
+ mutex_unlock(&fb_helper->lock);
+
+ return true;
+}
+EXPORT_SYMBOL(drm_fb_helper_suspend_hotplug);
+
/**
* drm_fb_helper_hotplug_event - respond to a hotplug notification by
* probing all the outputs attached to the fb
@@ -2774,6 +2834,11 @@ int drm_fb_helper_hotplug_event(struct drm_fb_helper *fb_helper)
return err;
}
+ if (fb_helper->hotplug_suspended) {
+ mutex_unlock(&fb_helper->lock);
+ return err;
+ }
+
DRM_DEBUG_KMS("\n");
drm_setup_crtcs(fb_helper, fb_helper->fb->width, fb_helper->fb->height);
diff --git a/include/drm/drm_fb_helper.h b/include/drm/drm_fb_helper.h
index b069433e7fc1..30881131075c 100644
--- a/include/drm/drm_fb_helper.h
+++ b/include/drm/drm_fb_helper.h
@@ -232,6 +232,14 @@ struct drm_fb_helper {
* See also: @deferred_setup
*/
int preferred_bpp;
+
+ /**
+ * @hotplug_suspended:
+ *
+ * Whether or not fb_helper hotplug handling has been temporarily
+ * suspended via drm_fb_helper_suspend_hotplug(); hotplug events are
+ * ignored until the driver calls drm_fb_helper_resume_hotplug().
+ bool hotplug_suspended;
};
/**
@@ -330,6 +338,10 @@ void drm_fb_helper_fbdev_teardown(struct drm_device *dev);
void drm_fb_helper_lastclose(struct drm_device *dev);
void drm_fb_helper_output_poll_changed(struct drm_device *dev);
+
+void drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper);
+bool drm_fb_helper_suspend_hotplug(struct drm_fb_helper *fb_helper);
+
#else
static inline void drm_fb_helper_prepare(struct drm_device *dev,
struct drm_fb_helper *helper,
@@ -564,6 +576,15 @@ static inline void drm_fb_helper_output_poll_changed(struct drm_device *dev)
{
}
+static inline void
+drm_fb_helper_resume_hotplug(struct drm_fb_helper *fb_helper)
+{
+}
+static inline bool
+drm_fb_helper_suspend_hotplug(struct drm_fb_helper *fb_helper)
+{
+ return true;
+}
#endif
static inline int
--
2.17.1
The patch below does not apply to the 4.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4c612add7b18844ddd733ebdcbe754520155999b Mon Sep 17 00:00:00 2001
From: Eugeniy Paltsev <Eugeniy.Paltsev(a)synopsys.com>
Date: Tue, 24 Jul 2018 17:13:02 +0300
Subject: [PATCH] ARC: dma [non IOC]: fix arc_dma_sync_single_for_(device|cpu)
ARC backend for dma_sync_single_for_(device|cpu) was broken as it was
not honoring the @dir argument and simply forcing it based on the call:
- arc_dma_sync_single_for_device(dir) assumed DMA_TO_DEVICE (cache wback)
- arc_dma_sync_single_for_cpu(dir) assumed DMA_FROM_DEVICE (cache inv)
This is not true given the DMA API programming model and has been
discussed here [1] in some detail.
Interestingly, while the deficiency has been there forever, it only
started showing up after the 4.17 common DMA ops rework, commit
a8eb92d02dd7 ("arc: fix arc_dma_{map,unmap}_page"), which wired these
calls up under the more commonly used dma_map_page() API, triggering
the issue.
[1]: https://lkml.org/lkml/2018/5/18/979
Fixes: commit a8eb92d02dd7 ("arc: fix arc_dma_{map,unmap}_page")
Cc: stable(a)kernel.org # v4.17+
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev(a)synopsys.com>
Signed-off-by: Vineet Gupta <vgupta(a)synopsys.com>
[vgupta: reworked changelog]
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 8c1071840979..ec47e6079f5d 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -129,14 +129,59 @@ int arch_dma_mmap(struct device *dev, struct vm_area_struct *vma,
return ret;
}
+/*
+ * Cache operations depending on function and direction argument, inspired by
+ * https://lkml.org/lkml/2018/5/18/979
+ * "dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20]
+ * dma-mapping: provide a generic dma-noncoherent implementation)"
+ *
+ * | map == for_device | unmap == for_cpu
+ * |----------------------------------------------------------------
+ * TO_DEV | writeback writeback | none none
+ * FROM_DEV | invalidate invalidate | invalidate* invalidate*
+ * BIDIR | writeback+inv writeback+inv | invalidate invalidate
+ *
+ * [*] needed for CPU speculative prefetches
+ *
+ * NOTE: we don't check the validity of direction argument as it is done in
+ * upper layer functions (in include/linux/dma-mapping.h)
+ */
+
void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
size_t size, enum dma_data_direction dir)
{
- dma_cache_wback(paddr, size);
+ switch (dir) {
+ case DMA_TO_DEVICE:
+ dma_cache_wback(paddr, size);
+ break;
+
+ case DMA_FROM_DEVICE:
+ dma_cache_inv(paddr, size);
+ break;
+
+ case DMA_BIDIRECTIONAL:
+ dma_cache_wback_inv(paddr, size);
+ break;
+
+ default:
+ break;
+ }
}
void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
size_t size, enum dma_data_direction dir)
{
- dma_cache_inv(paddr, size);
+ switch (dir) {
+ case DMA_TO_DEVICE:
+ break;
+
+ /* FROM_DEVICE invalidate needed if speculative CPU prefetch only */
+ case DMA_FROM_DEVICE:
+ case DMA_BIDIRECTIONAL:
+ dma_cache_inv(paddr, size);
+ break;
+
+ default:
+ break;
+ }
}
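For context, here is a minimal driver-side sketch (not part of the
patch) of a streaming DMA receive path that relies on the @dir argument
being honored; dev, buf and len are assumed to be provided by the
driver:

	#include <linux/dma-mapping.h>

	static void example_rx(struct device *dev, void *buf, size_t len)
	{
		dma_addr_t handle;

		handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
		if (dma_mapping_error(dev, handle))
			return;

		/* ... the device DMAs into buf here ... */

		/*
		 * Per the table above, a FROM_DEVICE sync for the CPU must
		 * invalidate the CPU caches (speculative prefetches may
		 * have pulled in stale lines), not write them back.
		 */
		dma_sync_single_for_cpu(dev, handle, len, DMA_FROM_DEVICE);

		/* ... consume the received data ... */

		dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);
	}

Before the fix, arch_sync_dma_for_cpu() always invalidated regardless
of direction, which happened to match DMA_FROM_DEVICE but was wrong for
the other directions.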
I'm announcing the release of the 4.17.13 kernel.
All users of the 4.17 kernel series must upgrade.
The updated 4.17.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.17.y
and can be browsed at the normal kernel.org git web browser:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/entry/entry_64.S | 18 ----
arch/x86/kernel/apic/apic.c | 3
arch/x86/kvm/vmx.c | 7 -
arch/x86/platform/efi/efi_64.c | 2
drivers/crypto/padlock-aes.c | 8 +
drivers/gpu/drm/drm_atomic_helper.c | 8 +
drivers/gpu/drm/vc4/vc4_plane.c | 3
drivers/infiniband/core/uverbs_cmd.c | 59 ++++++++++++-
drivers/net/bonding/bond_main.c | 14 ++-
drivers/net/can/usb/ems_usb.c | 1
drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 2
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 4
drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 4
drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 40 ++++++++-
drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c | 3
drivers/net/wireless/intel/iwlwifi/cfg/9000.c | 69 ++++++++++++++++
drivers/net/wireless/intel/iwlwifi/iwl-config.h | 5 +
drivers/net/wireless/intel/iwlwifi/pcie/drv.c | 22 +++++
drivers/scsi/sg.c | 1
drivers/virtio/virtio_balloon.c | 2
fs/squashfs/block.c | 2
fs/squashfs/fragment.c | 13 ++-
fs/squashfs/squashfs_fs_sb.h | 1
fs/squashfs/super.c | 5 -
fs/userfaultfd.c | 4
ipc/shm.c | 12 ++
kernel/auditsc.c | 13 ++-
mm/hugetlb.c | 7 +
net/dsa/slave.c | 6 +
net/ipv4/inet_fragment.c | 6 -
net/ipv4/ip_fragment.c | 5 +
net/netlink/af_netlink.c | 2
net/rxrpc/call_accept.c | 4
net/socket.c | 5 -
35 files changed, 308 insertions(+), 54 deletions(-)
Andy Lutomirski (1):
x86/entry/64: Remove %ebx handling from error_entry/exit
Anton Vasilyev (1):
can: ems_usb: Fix memory leak on ems_usb_disconnect()
Boris Brezillon (3):
drm/vc4: Reset ->{x, y}_scaling[1] when dealing with uniplanar formats
drm/atomic: Check old_plane_state->crtc in drm_atomic_helper_async_check()
drm/atomic: Initialize variables in drm_atomic_helper_async_check() to make gcc happy
Brijesh Singh (1):
x86/efi: Access EFI MMIO data as unencrypted when SEV is active
Eli Cohen (1):
net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager
Emmanuel Grumbach (1):
iwlwifi: add more card IDs for 9000 series
Eric Dumazet (3):
bonding: avoid lockdep confusion in bond_get_stats()
inet: frag: enforce memory limits earlier
ipv4: frags: handle possible skb truesize change
Feras Daoud (1):
net/mlx5e: IPoIB, Set the netdevice sw mtu in ipoib enhanced flow
Florian Fainelli (1):
net: dsa: Do not suspend/resume closed slave_dev
Greg Kroah-Hartman (1):
Linux 4.17.13
Herbert Xu (1):
crypto: padlock-aes - Fix Nano workaround data corruption
Jack Morgenstein (1):
RDMA/uverbs: Expand primary and alt AV port checks
Jane Chu (1):
ipc/shm.c add ->pagesize function to shm_vm_ops
Jeremy Cline (3):
netlink: Fix spectre v1 gadget in netlink_create()
net: socket: fix potential spectre v1 gadget in socketcall
net: socket: Fix potential spectre v1 gadget in sock_is_registered
Jiang Biao (1):
virtio_balloon: fix another race between migration and ballooning
Jose Abreu (1):
net: stmmac: Fix WoL for PCI-based setups
Len Brown (1):
x86/apic: Future-proof the TSC_DEADLINE quirk for SKX
Linus Torvalds (2):
squashfs: more metadata hardening
squashfs: more metadata hardenings
Mike Rapoport (1):
userfaultfd: remove uffd flags from vma->vm_flags if UFFD_EVENT_FORK fails
Or Gerlitz (1):
net/mlx5e: Set port trust mode to PCP as default
Rafał Miłecki (1):
brcmfmac: fix regression in parsing NVRAM for multiple devices
Roman Kagan (1):
kvm: x86: vmx: fix vpid leak
Tony Battersby (1):
scsi: sg: fix minor memory leak in error path
Yi Wang (1):
audit: fix potential null dereference 'context->module.name'
YueHaibing (1):
rxrpc: Fix user call ID check in rxrpc_service_prealloc_one
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
Some of the ME clients are available only for BIOS operation and are
removed during the hand-off to the OS. However, the removal is not
instant. A client may still be visible in the client list when the mei
driver requests enumeration, while the subsequent request for its
properties is answered with a 'client not found' error value. The
default behavior on such an error is to perform a link reset, but this
particular error is harmless and the link reset should be prevented.
The issue became visible due to suspend/resume timing changes, and is
currently reported only on Haswell-based systems.
Fixes:
[33.564957] mei_me 0000:00:16.0: hbm: properties response: wrong status = 1 CLIENT_NOT_FOUND
[33.564978] mei_me 0000:00:16.0: mei_irq_read_handler ret = -71.
[33.565270] mei_me 0000:00:16.0: unexpected reset: dev_state = INIT_CLIENTS fw status = 1E000255 60002306 00000200 00004401 00000000 00000010
Cc: <stable(a)vger.kernel.org>
Reported-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
---
drivers/misc/mei/hbm.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/misc/mei/hbm.c b/drivers/misc/mei/hbm.c
index f15d44bda28e..fa0f8e91e86f 100644
--- a/drivers/misc/mei/hbm.c
+++ b/drivers/misc/mei/hbm.c
@@ -1244,15 +1244,18 @@ int mei_hbm_dispatch(struct mei_device *dev, struct mei_msg_hdr *hdr)
props_res = (struct hbm_props_response *)mei_msg;
- if (props_res->status) {
+ if (props_res->status == MEI_HBMS_CLIENT_NOT_FOUND) {
+ dev_dbg(dev->dev, "hbm: properties response: %d CLIENT_NOT_FOUND\n",
+ props_res->me_addr);
+ } else if (props_res->status) {
dev_err(dev->dev, "hbm: properties response: wrong status = %d %s\n",
props_res->status,
mei_hbm_status_str(props_res->status));
return -EPROTO;
+ } else {
+ mei_hbm_me_cl_add(dev, props_res);
}
- mei_hbm_me_cl_add(dev, props_res);
-
/* request property for the next client */
if (mei_hbm_prop_req(dev, props_res->me_addr + 1))
return -EIO;
--
2.14.4
From: Peter Zijlstra <peterz(a)infradead.org>
4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.
--- 8< --- 8< --- 8< ---
[ Upstream commit c1e2f0eaf015fb7076d51a339011f2383e6dd389 ]
Julia reported futex state corruption in the following scenario:
waiter waker stealer (prio > waiter)
futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
timeout=[N ms])
futex_wait_requeue_pi()
futex_wait_queue_me()
freezable_schedule()
<scheduled out>
futex(LOCK_PI, uaddr2)
futex(CMP_REQUEUE_PI, uaddr,
uaddr2, 1, 0)
/* requeues waiter to uaddr2 */
futex(UNLOCK_PI, uaddr2)
wake_futex_pi()
cmp_futex_value_locked(uaddr2, waiter)
wake_up_q()
<woken by waker>
<hrtimer_wakeup() fires,
clears sleeper->task>
futex(LOCK_PI, uaddr2)
__rt_mutex_start_proxy_lock()
try_to_take_rt_mutex() /* steals lock */
rt_mutex_set_owner(lock, stealer)
<preempted>
<scheduled in>
rt_mutex_wait_proxy_lock()
__rt_mutex_slowlock()
try_to_take_rt_mutex() /* fails, lock held by stealer */
if (timeout && !timeout->task)
return -ETIMEDOUT;
fixup_owner()
/* lock wasn't acquired, so,
fixup_pi_state_owner skipped */
return -ETIMEDOUT;
/* At this point, we've returned -ETIMEDOUT to userspace, but the
* futex word shows waiter to be the owner, and the pi_mutex has
* stealer as the owner */
futex_lock(LOCK_PI, uaddr2)
-> bails with EDEADLK, futex word says we're owner.
And suggested that what commit:
73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
removes from fixup_owner() looks to be just what is needed. And indeed
it is -- I completely missed that requeue_pi could also result in this
case. So we need to restore that, except that subsequent patches, like
commit:
16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")
changed all the locking rules. Even without that, the sequence:
- if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
- locked = 1;
- goto out;
- }
- raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
- owner = rt_mutex_owner(&q->pi_state->pi_mutex);
- if (!owner)
- owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
- raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
- ret = fixup_pi_state_owner(uaddr, q, owner);
already suggests there were races; otherwise we'd never have to look
at next_owner.
So instead of doing 3 consecutive wait_lock sections with who knows
what races, we do it all in a single section. Additionally, the usage
of pi_state->owner in fixup_owner() was only safe because only the
rt_mutex owner would modify it, which this additional case wrecks.
Luckily the values can only change away from, and not to, the value
we're testing; this means we can do a speculative test and double check
once we have the wait_lock.
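As a generic illustration of that pattern (a small userspace sketch,
not the kernel code), the unlocked read only acts as a hint and the
real decision is re-made under the lock:

	#include <pthread.h>
	#include <stdbool.h>
	#include <stddef.h>

	struct pi_like {
		pthread_mutex_t wait_lock;
		void *owner;
	};

	static bool fixup_if_owner(struct pi_like *s, void *me)
	{
		bool was_owner;

		/*
		 * Speculative, unlocked read. Because ->owner only ever
		 * changes away from 'me' (never to it), a mismatch here is
		 * final and we can skip taking the lock.
		 */
		if (s->owner != me)
			return false;

		pthread_mutex_lock(&s->wait_lock);
		/* Double check now that ->owner is stable under wait_lock. */
		was_owner = (s->owner == me);
		if (was_owner)
			s->owner = NULL;	/* the actual fixup goes here */
		pthread_mutex_unlock(&s->wait_lock);
		return was_owner;
	}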
Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
Reported-by: Julia Cartwright <julia(a)ni.com>
Reported-by: Gratian Crisan <gratian.crisan(a)ni.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Julia Cartwright <julia(a)ni.com>
Tested-by: Gratian Crisan <gratian.crisan(a)ni.com>
Cc: Darren Hart <dvhart(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20171208124939.7livp7no2ov65rrc@hirez.programming…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Julia Cartwright <julia(a)ni.com>
---
kernel/futex.c | 83 ++++++++++++++++++++++++++-------
kernel/locking/rtmutex.c | 26 ++++++++---
kernel/locking/rtmutex_common.h | 1 +
3 files changed, 87 insertions(+), 23 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index 270148be5647..cdd68ba6e3a6 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2287,21 +2287,17 @@ static void unqueue_me_pi(struct futex_q *q)
spin_unlock(q->lock_ptr);
}
-/*
- * Fixup the pi_state owner with the new owner.
- *
- * Must be called with hash bucket lock held and mm->sem held for non
- * private futexes.
- */
static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
- struct task_struct *newowner)
+ struct task_struct *argowner)
{
- u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
struct futex_pi_state *pi_state = q->pi_state;
u32 uval, uninitialized_var(curval), newval;
- struct task_struct *oldowner;
+ struct task_struct *oldowner, *newowner;
+ u32 newtid;
int ret;
+ lockdep_assert_held(q->lock_ptr);
+
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
oldowner = pi_state->owner;
@@ -2310,11 +2306,17 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
newtid |= FUTEX_OWNER_DIED;
/*
- * We are here either because we stole the rtmutex from the
- * previous highest priority waiter or we are the highest priority
- * waiter but have failed to get the rtmutex the first time.
+ * We are here because either:
+ *
+ * - we stole the lock and pi_state->owner needs updating to reflect
+ * that (@argowner == current),
*
- * We have to replace the newowner TID in the user space variable.
+ * or:
+ *
+ * - someone stole our lock and we need to fix things to point to the
+ * new owner (@argowner == NULL).
+ *
+ * Either way, we have to replace the TID in the user space variable.
* This must be atomic as we have to preserve the owner died bit here.
*
* Note: We write the user space value _before_ changing the pi_state
@@ -2327,6 +2329,42 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
* in the PID check in lookup_pi_state.
*/
retry:
+ if (!argowner) {
+ if (oldowner != current) {
+ /*
+ * We raced against a concurrent self; things are
+ * already fixed up. Nothing to do.
+ */
+ ret = 0;
+ goto out_unlock;
+ }
+
+ if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) {
+ /* We got the lock after all, nothing to fix. */
+ ret = 0;
+ goto out_unlock;
+ }
+
+ /*
+ * Since we just failed the trylock; there must be an owner.
+ */
+ newowner = rt_mutex_owner(&pi_state->pi_mutex);
+ BUG_ON(!newowner);
+ } else {
+ WARN_ON_ONCE(argowner != current);
+ if (oldowner == current) {
+ /*
+ * We raced against a concurrent self; things are
+ * already fixed up. Nothing to do.
+ */
+ ret = 0;
+ goto out_unlock;
+ }
+ newowner = argowner;
+ }
+
+ newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
+
if (get_futex_value_locked(&uval, uaddr))
goto handle_fault;
@@ -2427,15 +2465,28 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
* Got the lock. We might not be the anticipated owner if we
* did a lock-steal - fix up the PI-state in that case:
*
- * We can safely read pi_state->owner without holding wait_lock
- * because we now own the rt_mutex, only the owner will attempt
- * to change it.
+ * Speculative pi_state->owner read (we don't hold wait_lock);
+ * since we own the lock pi_state->owner == current is the
+ * stable state, anything else needs more attention.
*/
if (q->pi_state->owner != current)
ret = fixup_pi_state_owner(uaddr, q, current);
goto out;
}
+ /*
+ * If we didn't get the lock; check if anybody stole it from us. In
+ * that case, we need to fix up the uval to point to them instead of
+ * us, otherwise bad things happen. [10]
+ *
+ * Another speculative read; pi_state->owner == current is unstable
+ * but needs our attention.
+ */
+ if (q->pi_state->owner == current) {
+ ret = fixup_pi_state_owner(uaddr, q, NULL);
+ goto out;
+ }
+
/*
* Paranoia check. If we did not take the lock, then we should not be
* the owner of the rt_mutex.
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 3a8b5d44aaf8..57361d631749 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1849,6 +1849,19 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
return ret;
}
+static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock)
+{
+ int ret = try_to_take_rt_mutex(lock, current, NULL);
+
+ /*
+ * try_to_take_rt_mutex() sets the lock waiters bit
+ * unconditionally. Clean this up.
+ */
+ fixup_rt_mutex_waiters(lock);
+
+ return ret;
+}
+
/*
* Slow path try-lock function:
*/
@@ -1871,13 +1884,7 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
*/
raw_spin_lock_irqsave(&lock->wait_lock, flags);
- ret = try_to_take_rt_mutex(lock, current, NULL);
-
- /*
- * try_to_take_rt_mutex() sets the lock waiters bit
- * unconditionally. Clean this up.
- */
- fixup_rt_mutex_waiters(lock);
+ ret = __rt_mutex_slowtrylock(lock);
raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
@@ -2102,6 +2109,11 @@ int __sched rt_mutex_futex_trylock(struct rt_mutex *lock)
return rt_mutex_slowtrylock(lock);
}
+int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock)
+{
+ return __rt_mutex_slowtrylock(lock);
+}
+
/**
* rt_mutex_timed_lock - lock a rt_mutex interruptible
* the timeout structure is provided
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 64d89d780059..50c0a1043556 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -122,6 +122,7 @@ extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
struct rt_mutex_waiter *waiter);
extern int rt_mutex_futex_trylock(struct rt_mutex *l);
+extern int __rt_mutex_futex_trylock(struct rt_mutex *l);
extern void rt_mutex_futex_unlock(struct rt_mutex *lock);
extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,
--
2.18.0
From: Peter Zijlstra <peterz(a)infradead.org>
4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.
--- 8< --- 8< --- 8< ---
[ Upstream commit 51d00899f7e6ded15c89cb4e2cb11a35283bac81 ]
Dmitry (through syzbot) reported being able to trigger the WARN in
get_pi_state() and a use-after-free on:
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
Both are due to this race:
exit_pi_state_list() put_pi_state()
lock(&curr->pi_lock)
while() {
pi_state = list_first_entry(head);
hb = hash_futex(&pi_state->key);
unlock(&curr->pi_lock);
dec_and_test(&pi_state->refcount);
lock(&hb->lock)
lock(&pi_state->pi_mutex.wait_lock) // uaf if pi_state free'd
lock(&curr->pi_lock);
....
unlock(&curr->pi_lock);
get_pi_state(); // WARN; refcount==0
The problem is that we take the reference count too late, and don't
allow it to be 0. Fix it by using inc_not_zero() and simply retrying the loop
when we fail to get a refcount. In that case put_pi_state() should
remove the entry from the list.
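The inc_not_zero() primitive itself can be illustrated with a small
userspace sketch (C11 atomics, not the kernel's implementation):

	#include <stdatomic.h>
	#include <stdbool.h>

	/*
	 * Take a reference only if the count is not already zero; a zero
	 * count means the object is on its way to being freed and must
	 * not be touched.
	 */
	static bool ref_get_unless_zero(atomic_int *refcount)
	{
		int old = atomic_load(refcount);

		while (old != 0) {
			/* On CAS failure, 'old' is reloaded and retested. */
			if (atomic_compare_exchange_weak(refcount, &old, old + 1))
				return true;
		}
		return false;
	}

A caller that fails this acquire drops its locks and retries the list
walk, exactly as the loop in the diff below does.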
Reported-by: Dmitry Vyukov <dvyukov(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Gratian Crisan <gratian.crisan(a)ni.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: dvhart(a)infradead.org
Cc: syzbot <bot+2af19c9e1ffe4d4ee1d16c56ae7580feaee75765(a)syzkaller.appspotmail.com>
Cc: syzkaller-bugs(a)googlegroups.com
Cc: <stable(a)vger.kernel.org>
Fixes: c74aef2d06a9 ("futex: Fix pi_state->owner serialization")
Link: http://lkml.kernel.org/r/20171031101853.xpfh72y643kdfhjs@hirez.programming.…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Julia Cartwright <julia(a)ni.com>
---
kernel/futex.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index 47e42faad6c5..270148be5647 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -899,11 +899,27 @@ void exit_pi_state_list(struct task_struct *curr)
*/
raw_spin_lock_irq(&curr->pi_lock);
while (!list_empty(head)) {
-
next = head->next;
pi_state = list_entry(next, struct futex_pi_state, list);
key = pi_state->key;
hb = hash_futex(&key);
+
+ /*
+ * We can race against put_pi_state() removing itself from the
+ * list (a waiter going away). put_pi_state() will first
+ * decrement the reference count and then modify the list, so
+ * its possible to see the list entry but fail this reference
+ * acquire.
+ *
+ * In that case; drop the locks to let put_pi_state() make
+ * progress and retry the loop.
+ */
+ if (!atomic_inc_not_zero(&pi_state->refcount)) {
+ raw_spin_unlock_irq(&curr->pi_lock);
+ cpu_relax();
+ raw_spin_lock_irq(&curr->pi_lock);
+ continue;
+ }
raw_spin_unlock_irq(&curr->pi_lock);
spin_lock(&hb->lock);
@@ -914,10 +930,12 @@ void exit_pi_state_list(struct task_struct *curr)
* task still owns the PI-state:
*/
if (head->next != next) {
+ /* retain curr->pi_lock for the loop invariant */
raw_spin_unlock(&curr->pi_lock);
raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
spin_unlock(&hb->lock);
raw_spin_lock_irq(&curr->pi_lock);
+ put_pi_state(pi_state);
continue;
}
@@ -925,9 +943,8 @@ void exit_pi_state_list(struct task_struct *curr)
WARN_ON(list_empty(&pi_state->list));
list_del_init(&pi_state->list);
pi_state->owner = NULL;
- raw_spin_unlock(&curr->pi_lock);
- get_pi_state(pi_state);
+ raw_spin_unlock(&curr->pi_lock);
raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
spin_unlock(&hb->lock);
--
2.18.0
From: Peter Zijlstra <peterz(a)infradead.org>
4.9.115-rt94-rc1 stable review patch.
If you have any objection to the inclusion of this patch, let me know.
--- 8< --- 8< --- 8< ---
[ Upstream commit c74aef2d06a9f59cece89093eecc552933cba72a ]
There was a reported suspicion about a race between exit_pi_state_list()
and put_pi_state(). The same report mentioned that the comment on
put_pi_state() says it should be called with hb->lock held, and that
this no longer holds in all places.
As it turns out, the pi_state->owner serialization is indeed broken. As per
the new rules:
734009e96d19 ("futex: Change locking rules")
pi_state->owner should be serialized by pi_state->pi_mutex.wait_lock.
For the sites setting pi_state->owner we already hold wait_lock (where
required) but exit_pi_state_list() and put_pi_state() were not and
raced on clearing it.
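A condensed userspace sketch of the fixed serialization (illustrative
names, not the kernel code) looks like this:

	#include <pthread.h>
	#include <stddef.h>

	struct task_like {
		pthread_mutex_t pi_lock;
	};

	struct pi_state_like {
		pthread_mutex_t wait_lock;	/* now serializes ->owner */
		struct task_like *owner;
	};

	static void clear_owner(struct pi_state_like *ps)
	{
		struct task_like *owner;

		pthread_mutex_lock(&ps->wait_lock);
		owner = ps->owner;	/* stable while wait_lock is held */
		if (owner) {
			pthread_mutex_lock(&owner->pi_lock);
			/* unlink from the owner's pi_state list here */
			pthread_mutex_unlock(&owner->pi_lock);
		}
		ps->owner = NULL;
		pthread_mutex_unlock(&ps->wait_lock);
	}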
Fixes: 734009e96d19 ("futex: Change locking rules")
Reported-by: Gratian Crisan <gratian.crisan(a)ni.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: dvhart(a)infradead.org
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20170922154806.jd3ffltfk24m4o4y@hirez.programming…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Julia Cartwright <julia(a)ni.com>
---
kernel/futex.c | 34 ++++++++++++++++++++++------------
1 file changed, 22 insertions(+), 12 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index 8ab0ddd4cf8f..47e42faad6c5 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -819,8 +819,6 @@ static void get_pi_state(struct futex_pi_state *pi_state)
/*
* Drops a reference to the pi_state object and frees or caches it
* when the last reference is gone.
- *
- * Must be called with the hb lock held.
*/
static void put_pi_state(struct futex_pi_state *pi_state)
{
@@ -835,16 +833,22 @@ static void put_pi_state(struct futex_pi_state *pi_state)
* and has cleaned up the pi_state already
*/
if (pi_state->owner) {
- raw_spin_lock_irq(&pi_state->owner->pi_lock);
- list_del_init(&pi_state->list);
- raw_spin_unlock_irq(&pi_state->owner->pi_lock);
+ struct task_struct *owner;
- rt_mutex_proxy_unlock(&pi_state->pi_mutex, pi_state->owner);
+ raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+ owner = pi_state->owner;
+ if (owner) {
+ raw_spin_lock(&owner->pi_lock);
+ list_del_init(&pi_state->list);
+ raw_spin_unlock(&owner->pi_lock);
+ }
+ rt_mutex_proxy_unlock(&pi_state->pi_mutex, owner);
+ raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
}
- if (current->pi_state_cache)
+ if (current->pi_state_cache) {
kfree(pi_state);
- else {
+ } else {
/*
* pi_state->list is already empty.
* clear pi_state->owner.
@@ -903,14 +907,15 @@ void exit_pi_state_list(struct task_struct *curr)
raw_spin_unlock_irq(&curr->pi_lock);
spin_lock(&hb->lock);
-
- raw_spin_lock_irq(&curr->pi_lock);
+ raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+ raw_spin_lock(&curr->pi_lock);
/*
* We dropped the pi-lock, so re-check whether this
* task still owns the PI-state:
*/
if (head->next != next) {
- raw_spin_unlock_irq(&curr->pi_lock);
+ raw_spin_unlock(&curr->pi_lock);
+ raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
spin_unlock(&hb->lock);
raw_spin_lock_irq(&curr->pi_lock);
continue;
@@ -920,9 +925,10 @@ void exit_pi_state_list(struct task_struct *curr)
WARN_ON(list_empty(&pi_state->list));
list_del_init(&pi_state->list);
pi_state->owner = NULL;
- raw_spin_unlock_irq(&curr->pi_lock);
+ raw_spin_unlock(&curr->pi_lock);
get_pi_state(pi_state);
+ raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
spin_unlock(&hb->lock);
rt_mutex_futex_unlock(&pi_state->pi_mutex);
@@ -1204,6 +1210,10 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
WARN_ON(!list_empty(&pi_state->list));
list_add(&pi_state->list, &p->pi_state_list);
+ /*
+ * Assignment without holding pi_state->pi_mutex.wait_lock is safe
+ * because there is no concurrency as the object is not published yet.
+ */
pi_state->owner = p;
raw_spin_unlock_irq(&p->pi_lock);
--
2.18.0
This commit:
9fb8d5dc4b64 ("stop_machine, Disable preemption when
waking two stopper threads")
does not fully address the race condition that can occur
as follows:
On one CPU, call it CPU 3, thread 1 invokes
cpu_stop_queue_two_works(2, 3,...), and the execution is such
that thread 1 queues the works for migration/2 and migration/3,
and is preempted after releasing the locks for migration/2 and
migration/3, but before waking the threads.
Then, on CPU 2, a kworker, call it thread 2, is running,
and it invokes cpu_stop_queue_two_works(1, 2,...), such that
thread 2 queues the works for migration/1 and migration/2.
Meanwhile, on CPU 3, thread 1 resumes execution, and wakes
migration/2 and migration/3. This means that when CPU 2
releases the locks for migration/1 and migration/2, but before
it wakes those threads, it can be preempted by migration/2.
If thread 2 is preempted by migration/2, then migration/2 will
execute the first work item successfully, since migration/3
was woken up by CPU 3. But when migration/2 goes on to execute
the second work item, it disables preemption and calls
multi_cpu_stop(), and thus CPU 2 will wait forever for
migration/1, which should have been woken up by thread 2.
However, migration/1 cannot be woken up by thread 2: thread 2
is a kworker affine to CPU 2, and CPU 2 is running migration/2
with preemption disabled, so thread 2 will never run.
Disable preemption after queueing works for stopper threads
to ensure that the operation of queueing the works and waking
the stopper threads is atomic.
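Condensed, the fixed flow keeps the queueing and the wakeups in one
non-preemptible region (a sketch with names simplified from
cpu_stop_queue_two_works(), unrelated details elided):

	#include <linux/preempt.h>
	#include <linux/sched/wake_q.h>
	#include <linux/spinlock.h>

	static void queue_two_and_wake(raw_spinlock_t *lock1,
				       raw_spinlock_t *lock2,
				       struct wake_q_head *wakeq)
	{
		raw_spin_lock_irq(lock1);
		raw_spin_lock(lock2);
		/* ... __cpu_stop_queue_work() for both stoppers, filling wakeq ... */
		preempt_disable();	/* taken before the locks are dropped */
		raw_spin_unlock(lock2);
		raw_spin_unlock_irq(lock1);
		wake_up_q(wakeq);	/* both wakeups happen before we can be preempted */
		preempt_enable();
	}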
Fixes: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")
Co-developed-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Co-developed-by: Pavankumar Kondeti <pkondeti(a)codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm(a)codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti(a)codeaurora.org>
Cc: stable(a)vger.kernel.org
---
kernel/stop_machine.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 1ff523d..e190d1e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -260,6 +260,15 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
err = 0;
__cpu_stop_queue_work(stopper1, work1, &wakeq);
__cpu_stop_queue_work(stopper2, work2, &wakeq);
+ /*
+ * The waking up of stopper threads has to happen
+ * in the same scheduling context as the queueing.
+ * Otherwise, there is a possibility of one of the
+ * above stoppers being woken up by another CPU,
+ * and preempting us. This will cause us to not
+ * wake up the other stopper forever.
+ */
+ preempt_disable();
unlock:
raw_spin_unlock(&stopper2->lock);
raw_spin_unlock_irq(&stopper1->lock);
@@ -271,7 +280,6 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
}
if (!err) {
- preempt_disable();
wake_up_q(&wakeq);
preempt_enable();
}
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project