Turns out this part is my fault for not noticing when reviewing
9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling"). Currently
we call drm_kms_helper_poll_enable() from nouveau_display_hpd_work().
This makes basically no sense however, because that means we're calling
drm_kms_helper_poll_enable() every time we schedule the hotplug
detection work. This is also against the advice mentioned in
drm_kms_helper_poll_enable()'s documentation:
Note that calls to enable and disable polling must be strictly ordered,
which is automatically the case when they're only call from
suspend/resume callbacks.
Of course, hotplugs can't really be ordered. They could even happen
immediately after we called drm_kms_helper_poll_disable() in
nouveau_display_fini(), which can lead to all sorts of issues.
Additionally; enabling polling /after/ we call
drm_helper_hpd_irq_event() could also mean that we'd miss a hotplug
event anyway, since drm_helper_hpd_irq_event() wouldn't bother trying to
probe connectors so long as polling is disabled.
So; simply move this back into nouveau_display_init() again. The race
condition that both of these patches attempted to work around has
already been fixed properly in
d61a5c106351 ("drm/nouveau: Fix deadlock on runtime suspend")
Fixes: 9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling")
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: Lukas Wunner <lukas(a)wunner.de>
Cc: Peter Ujfalusi <peter.ujfalusi(a)ti.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/nouveau/nouveau_display.c | 7 +++++--
drivers/gpu/drm/nouveau/nouveau_drm.c | 1 -
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index ec7861457b84..1d36ab5d4796 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -355,8 +355,6 @@ nouveau_display_hpd_work(struct work_struct *work)
pm_runtime_get_sync(drm->dev->dev);
drm_helper_hpd_irq_event(drm->dev);
- /* enable polling for external displays */
- drm_kms_helper_poll_enable(drm->dev);
pm_runtime_mark_last_busy(drm->dev->dev);
pm_runtime_put_sync(drm->dev->dev);
@@ -411,6 +409,11 @@ nouveau_display_init(struct drm_device *dev)
if (ret)
return ret;
+ /* enable connector detection and polling for connectors without HPD
+ * support
+ */
+ drm_kms_helper_poll_enable(dev);
+
/* enable hotplug interrupts */
drm_connector_list_iter_begin(dev, &conn_iter);
nouveau_for_each_non_mst_connector_iter(connector, &conn_iter) {
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index c7ec86d6c3c9..5fdc1fbe2ee5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -835,7 +835,6 @@ nouveau_pmops_runtime_suspend(struct device *dev)
return -EBUSY;
}
- drm_kms_helper_poll_disable(drm_dev);
nouveau_switcheroo_optimus_dsm();
ret = nouveau_do_suspend(drm_dev, true);
pci_save_state(pdev);
--
2.17.1
A long time ago the unfortunate decision was taken to add a self-
deletion attribute to the sysfs SCSI device directory. That decision
was unfortunate because self-deletion is really tricky. We can't drop
that attribute because widely used user space software depends on it,
namely the rescan-scsi-bus.sh script. Hence this patch that avoids
that writing into that attribute triggers a deadlock. See also commit
7973cbd9fbd9 ("[PATCH] add sysfs attributes to scan and delete
scsi_devices").
This patch avoids that self-removal triggers the following deadlock:
======================================================
WARNING: possible circular locking dependency detected
4.18.0-rc2-dbg+ #5 Not tainted
------------------------------------------------------
modprobe/6539 is trying to acquire lock:
000000008323c4cd (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x45/0x90
but task is already holding lock:
00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&shost->scan_mutex){+.+.}:
__mutex_lock+0xfe/0xc70
mutex_lock_nested+0x1b/0x20
scsi_remove_device+0x26/0x40 [scsi_mod]
sdev_store_delete+0x27/0x30 [scsi_mod]
dev_attr_store+0x3e/0x50
sysfs_kf_write+0x87/0xa0
kernfs_fop_write+0x190/0x230
__vfs_write+0xd2/0x3b0
vfs_write+0x101/0x270
ksys_write+0xab/0x120
__x64_sys_write+0x43/0x50
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (kn->count#202){++++}:
lock_acquire+0xd2/0x260
__kernfs_remove+0x424/0x4a0
kernfs_remove_by_name_ns+0x45/0x90
remove_files.isra.1+0x3a/0x90
sysfs_remove_group+0x5c/0xc0
sysfs_remove_groups+0x39/0x60
device_remove_attrs+0x82/0xb0
device_del+0x251/0x580
__scsi_remove_device+0x19f/0x1d0 [scsi_mod]
scsi_forget_host+0x37/0xb0 [scsi_mod]
scsi_remove_host+0x9b/0x150 [scsi_mod]
sdebug_driver_remove+0x4b/0x150 [scsi_debug]
device_release_driver_internal+0x241/0x360
device_release_driver+0x12/0x20
bus_remove_device+0x1bc/0x290
device_del+0x259/0x580
device_unregister+0x1a/0x70
sdebug_remove_adapter+0x8b/0xf0 [scsi_debug]
scsi_debug_exit+0x76/0xe8 [scsi_debug]
__x64_sys_delete_module+0x1c1/0x280
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&shost->scan_mutex);
lock(kn->count#202);
lock(&shost->scan_mutex);
lock(kn->count#202);
*** DEADLOCK ***
2 locks held by modprobe/6539:
#0: 00000000efaf9298 (&dev->mutex){....}, at: device_release_driver_internal+0x68/0x360
#1: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod]
stack backtrace:
CPU: 10 PID: 6539 Comm: modprobe Not tainted 4.18.0-rc2-dbg+ #5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0xa4/0xf5
print_circular_bug.isra.34+0x213/0x221
__lock_acquire+0x1a7e/0x1b50
lock_acquire+0xd2/0x260
__kernfs_remove+0x424/0x4a0
kernfs_remove_by_name_ns+0x45/0x90
remove_files.isra.1+0x3a/0x90
sysfs_remove_group+0x5c/0xc0
sysfs_remove_groups+0x39/0x60
device_remove_attrs+0x82/0xb0
device_del+0x251/0x580
__scsi_remove_device+0x19f/0x1d0 [scsi_mod]
scsi_forget_host+0x37/0xb0 [scsi_mod]
scsi_remove_host+0x9b/0x150 [scsi_mod]
sdebug_driver_remove+0x4b/0x150 [scsi_debug]
device_release_driver_internal+0x241/0x360
device_release_driver+0x12/0x20
bus_remove_device+0x1bc/0x290
device_del+0x259/0x580
device_unregister+0x1a/0x70
sdebug_remove_adapter+0x8b/0xf0 [scsi_debug]
scsi_debug_exit+0x76/0xe8 [scsi_debug]
__x64_sys_delete_module+0x1c1/0x280
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
See also https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg54525.html.
Fixes: ac0ece9174ac ("scsi: use device_remove_file_self() instead of device_schedule_callback()")
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Johannes Thumshirn <jthumshirn(a)suse.de>
Cc: <stable(a)vger.kernel.org>
---
drivers/scsi/scsi_sysfs.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index de122354d09a..3aee9464a7bf 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -722,8 +722,24 @@ static ssize_t
sdev_store_delete(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
- if (device_remove_file_self(dev, attr))
- scsi_remove_device(to_scsi_device(dev));
+ struct kernfs_node *kn;
+
+ kn = sysfs_break_active_protection(&dev->kobj, &attr->attr);
+ WARN_ON_ONCE(!kn);
+ /*
+ * Concurrent writes into the "delete" sysfs attribute may trigger
+ * concurrent calls to device_remove_file() and scsi_remove_device().
+ * device_remove_file() handles concurrent removal calls by
+ * serializing these and by ignoring the second and later removal
+ * attempts. Concurrent calls of scsi_remove_device() are
+ * serialized. The second and later calls of scsi_remove_device() are
+ * ignored because the first call of that function changes the device
+ * state into SDEV_DEL.
+ */
+ device_remove_file(dev, attr);
+ scsi_remove_device(to_scsi_device(dev));
+ if (kn)
+ sysfs_unbreak_active_protection(kn);
return count;
};
static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete);
--
2.18.0
A long time ago the unfortunate decision was taken to add a self-
deletion attribute to the sysfs SCSI device directory. That decision
was unfortunate because self-deletion is really tricky. We can't drop
that attribute because widely used user space software depends on it,
namely the rescan-scsi-bus.sh script. Hence this patch that avoids
that writing into that attribute triggers a deadlock. See also commit
7973cbd9fbd9 ("[PATCH] add sysfs attributes to scan and delete
scsi_devices").
This patch avoids that self-removal triggers the following deadlock:
======================================================
WARNING: possible circular locking dependency detected
4.18.0-rc2-dbg+ #5 Not tainted
------------------------------------------------------
modprobe/6539 is trying to acquire lock:
000000008323c4cd (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x45/0x90
but task is already holding lock:
00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&shost->scan_mutex){+.+.}:
__mutex_lock+0xfe/0xc70
mutex_lock_nested+0x1b/0x20
scsi_remove_device+0x26/0x40 [scsi_mod]
sdev_store_delete+0x27/0x30 [scsi_mod]
dev_attr_store+0x3e/0x50
sysfs_kf_write+0x87/0xa0
kernfs_fop_write+0x190/0x230
__vfs_write+0xd2/0x3b0
vfs_write+0x101/0x270
ksys_write+0xab/0x120
__x64_sys_write+0x43/0x50
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (kn->count#202){++++}:
lock_acquire+0xd2/0x260
__kernfs_remove+0x424/0x4a0
kernfs_remove_by_name_ns+0x45/0x90
remove_files.isra.1+0x3a/0x90
sysfs_remove_group+0x5c/0xc0
sysfs_remove_groups+0x39/0x60
device_remove_attrs+0x82/0xb0
device_del+0x251/0x580
__scsi_remove_device+0x19f/0x1d0 [scsi_mod]
scsi_forget_host+0x37/0xb0 [scsi_mod]
scsi_remove_host+0x9b/0x150 [scsi_mod]
sdebug_driver_remove+0x4b/0x150 [scsi_debug]
device_release_driver_internal+0x241/0x360
device_release_driver+0x12/0x20
bus_remove_device+0x1bc/0x290
device_del+0x259/0x580
device_unregister+0x1a/0x70
sdebug_remove_adapter+0x8b/0xf0 [scsi_debug]
scsi_debug_exit+0x76/0xe8 [scsi_debug]
__x64_sys_delete_module+0x1c1/0x280
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&shost->scan_mutex);
lock(kn->count#202);
lock(&shost->scan_mutex);
lock(kn->count#202);
*** DEADLOCK ***
2 locks held by modprobe/6539:
#0: 00000000efaf9298 (&dev->mutex){....}, at: device_release_driver_internal+0x68/0x360
#1: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod]
stack backtrace:
CPU: 10 PID: 6539 Comm: modprobe Not tainted 4.18.0-rc2-dbg+ #5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0xa4/0xf5
print_circular_bug.isra.34+0x213/0x221
__lock_acquire+0x1a7e/0x1b50
lock_acquire+0xd2/0x260
__kernfs_remove+0x424/0x4a0
kernfs_remove_by_name_ns+0x45/0x90
remove_files.isra.1+0x3a/0x90
sysfs_remove_group+0x5c/0xc0
sysfs_remove_groups+0x39/0x60
device_remove_attrs+0x82/0xb0
device_del+0x251/0x580
__scsi_remove_device+0x19f/0x1d0 [scsi_mod]
scsi_forget_host+0x37/0xb0 [scsi_mod]
scsi_remove_host+0x9b/0x150 [scsi_mod]
sdebug_driver_remove+0x4b/0x150 [scsi_debug]
device_release_driver_internal+0x241/0x360
device_release_driver+0x12/0x20
bus_remove_device+0x1bc/0x290
device_del+0x259/0x580
device_unregister+0x1a/0x70
sdebug_remove_adapter+0x8b/0xf0 [scsi_debug]
scsi_debug_exit+0x76/0xe8 [scsi_debug]
__x64_sys_delete_module+0x1c1/0x280
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
See also https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg54525.html.
Fixes: ac0ece9174ac ("scsi: use device_remove_file_self() instead of device_schedule_callback()")
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Johannes Thumshirn <jthumshirn(a)suse.de>
Cc: <stable(a)vger.kernel.org>
---
drivers/scsi/scsi_sysfs.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index de122354d09a..3aee9464a7bf 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -722,8 +722,24 @@ static ssize_t
sdev_store_delete(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
- if (device_remove_file_self(dev, attr))
- scsi_remove_device(to_scsi_device(dev));
+ struct kernfs_node *kn;
+
+ kn = sysfs_break_active_protection(&dev->kobj, &attr->attr);
+ WARN_ON_ONCE(!kn);
+ /*
+ * Concurrent writes into the "delete" sysfs attribute may trigger
+ * concurrent calls to device_remove_file() and scsi_remove_device().
+ * device_remove_file() handles concurrent removal calls by
+ * serializing these and by ignoring the second and later removal
+ * attempts. Concurrent calls of scsi_remove_device() are
+ * serialized. The second and later calls of scsi_remove_device() are
+ * ignored because the first call of that function changes the device
+ * state into SDEV_DEL.
+ */
+ device_remove_file(dev, attr);
+ scsi_remove_device(to_scsi_device(dev));
+ if (kn)
+ sysfs_unbreak_active_protection(kn);
return count;
};
static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete);
--
2.18.0
Runtime PM isn't ready for blk-mq yet, and commit 765e40b675a9 ("block:
disable runtime-pm for blk-mq") tried to disable it. Unfortunately,
it can't take effect in that way since user space still can switch
it on via 'echo auto > /sys/block/sdN/device/power/control'.
This patch disables runtime-pm for blk-mq really by pm_runtime_disable()
and fixes all kinds of PM related kernel crash.
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Patrick Steinhardt <ps(a)pks.im>
Cc: Bart Van Assche <Bart.VanAssche(a)wdc.com>
Cc: Tomas Janousek <tomi(a)nomi.cz>
Cc: Przemek Socha <soprwa(a)gmail.com>
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
block/blk-core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 03a4ea93a5f3..090b782df129 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3769,9 +3769,11 @@ EXPORT_SYMBOL(blk_finish_plug);
*/
void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
{
- /* not support for RQF_PM and ->rpm_status in blk-mq yet */
- if (q->mq_ops)
+ /* Don't enable runtime PM for blk-mq until it is ready */
+ if (q->mq_ops) {
+ pm_runtime_disable(dev);
return;
+ }
q->dev = dev;
q->rpm_status = RPM_ACTIVE;
--
2.9.5
commit 4437ba7ee7de7e71d11deb91c87a370e4ffd2601 upstream.
Our Transmit Frame Descriptor (TFD) is a DMA descriptor that
includes several pointers to be able to transmit a packet
which is not physically contiguous.
Depending on the hardware being use, we can have 20 or 25
pointers in a single TFD. In both cases, it is more than
enough and it is quite hard to hit this limit.
It has been reported that when using specific applications
(Ktorrent), we can actually use all the pointers and then
a long standing bug showed up.
When we free the TFD, we check its number of valid pointers
and make sure it doesn't exceed the number of pointers the
hardware support.
This check had an off by one bug: it is perfectly valid to
free the 20 pointers if the TFD has 20 pointers.
Fix that.
https://bugzilla.kernel.org/show_bug.cgi?id=197981
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
---
drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/tx.c b/drivers/net/wireless/intel/iwlwifi/pcie/tx.c
index 4704137..4c3b134 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/tx.c
@@ -378,7 +378,7 @@ static void iwl_pcie_tfd_unmap(struct iwl_trans *trans,
/* Sanity check on number of chunks */
num_tbs = iwl_pcie_tfd_get_num_tbs(trans, tfd);
- if (num_tbs >= trans_pcie->max_tbs) {
+ if (num_tbs > trans_pcie->max_tbs) {
IWL_ERR(trans, "Too many chunks: %i\n", num_tbs);
/* @todo issue fatal error, it is quite serious situation */
return;
--
2.7.4
commit 4437ba7ee7de7e71d11deb91c87a370e4ffd2601 upstream.
Our Transmit Frame Descriptor (TFD) is a DMA descriptor that
includes several pointers to be able to transmit a packet
which is not physically contiguous.
Depending on the hardware being use, we can have 20 or 25
pointers in a single TFD. In both cases, it is more than
enough and it is quite hard to hit this limit.
It has been reported that when using specific applications
(Ktorrent), we can actually use all the pointers and then
a long standing bug showed up.
When we free the TFD, we check its number of valid pointers
and make sure it doesn't exceed the number of pointers the
hardware support.
This check had an off by one bug: it is perfectly valid to
free the 20 pointers if the TFD has 20 pointers.
Fix that.
https://bugzilla.kernel.org/show_bug.cgi?id=197981
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
---
drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/tx.c b/drivers/net/wireless/intel/iwlwifi/pcie/tx.c
index e1bfc95..2c98784 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/tx.c
@@ -416,7 +416,7 @@ static void iwl_pcie_tfd_unmap(struct iwl_trans *trans,
/* Sanity check on number of chunks */
num_tbs = iwl_pcie_tfd_get_num_tbs(trans, tfd);
- if (num_tbs >= trans_pcie->max_tbs) {
+ if (num_tbs > trans_pcie->max_tbs) {
IWL_ERR(trans, "Too many chunks: %i\n", num_tbs);
/* @todo issue fatal error, it is quite serious situation */
return;
--
2.7.4