From: Alan Jenkins <alan.christopher.jenkins(a)gmail.com>
BugLink: http://bugs.launchpad.net/bugs/1776887
When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.
The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.
So that commit exposed the bug as a regression in v4.15. A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed. Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]
[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979
Without this fix, I get an IO error in this test:
# dd if=/dev/sda of=/dev/null iflag=direct & \
while killall -SIGUSR1 dd; do sleep 0.1; done & \
echo mem > /sys/power/state ; \
sleep 5; killall dd # stop after 5 seconds
The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab15 ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.
Reviewed-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Alan Jenkins <alan.christopher.jenkins(a)gmail.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
(cherry-picked from 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428)
Signed-off-by: Khalid Elmously <khalid.elmously(a)canonical.com>
---
block/blk-core.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index fc0666354af3..59c91e345eea 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -821,7 +821,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
while (true) {
bool success = false;
- int ret;
rcu_read_lock();
if (percpu_ref_tryget_live(&q->q_usage_counter)) {
@@ -853,14 +852,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
*/
smp_rmb();
- ret = wait_event_interruptible(q->mq_freeze_wq,
- (atomic_read(&q->mq_freeze_depth) == 0 &&
- (preempt || !blk_queue_preempt_only(q))) ||
- blk_queue_dying(q));
+ wait_event(q->mq_freeze_wq,
+ (atomic_read(&q->mq_freeze_depth) == 0 &&
+ (preempt || !blk_queue_preempt_only(q))) ||
+ blk_queue_dying(q));
if (blk_queue_dying(q))
return -ENODEV;
- if (ret)
- return ret;
}
}
--
2.17.1
From: Alan Jenkins <alan.christopher.jenkins(a)gmail.com>
BugLink: http://bugs.launchpad.net/bugs/1776887
When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.
The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.
So that commit exposed the bug as a regression in v4.15. A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed. Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]
[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979
Without this fix, I get an IO error in this test:
# dd if=/dev/sda of=/dev/null iflag=direct & \
while killall -SIGUSR1 dd; do sleep 0.1; done & \
echo mem > /sys/power/state ; \
sleep 5; killall dd # stop after 5 seconds
The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab15 ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.
Reviewed-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Alan Jenkins <alan.christopher.jenkins(a)gmail.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
(cherry-picked from 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428)
Signed-off-by: Khalid Elmously <khalid.elmously(a)canonical.com>
---
block/blk-core.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index fc0666354af3..59c91e345eea 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -821,7 +821,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
while (true) {
bool success = false;
- int ret;
rcu_read_lock();
if (percpu_ref_tryget_live(&q->q_usage_counter)) {
@@ -853,14 +852,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
*/
smp_rmb();
- ret = wait_event_interruptible(q->mq_freeze_wq,
- (atomic_read(&q->mq_freeze_depth) == 0 &&
- (preempt || !blk_queue_preempt_only(q))) ||
- blk_queue_dying(q));
+ wait_event(q->mq_freeze_wq,
+ (atomic_read(&q->mq_freeze_depth) == 0 &&
+ (preempt || !blk_queue_preempt_only(q))) ||
+ blk_queue_dying(q));
if (blk_queue_dying(q))
return -ENODEV;
- if (ret)
- return ret;
}
}
--
2.17.1
On Wed, 2018-02-28 at 09:19 +0000, Roger Pau Monne wrote:
> From: Roger Pau Monne <roger.pau(a)citrix.com>
>
> [ Upstream commit 910f8befdf5bccf25287d9f1743e3e546bcb7ce0 ]
>
> Current cleanup in the error path of xen_bind_pirq_msi_to_irq is
> wrong. First of all there's an off-by-one in the cleanup loop, which
> can lead to unbinding wrong IRQs.
>
> Secondly IRQs not bound won't be freed, thus leaking IRQ numbers.
>
> Note that there's no need to differentiate between bound and unbound
> IRQs when freeing them, __unbind_from_irq will deal with both of them
> correctly.
It appears to me that it is safe to call __unbind_from_irq() after
xen_irq_info_common_setup() fails, but *not* if the latter hasn't been
called at all. In that case the IRQ type will still be set to
IRQT_UNBOUND and this will trigger the BUG_ON() in __unbind_from_irq().
[...]
> --- a/drivers/xen/events/events_base.c
> +++ b/drivers/xen/events/events_base.c
> @@ -764,8 +764,8 @@ out:
> mutex_unlock(&irq_mapping_update_lock);
> return irq;
> error_irq:
> - for (; i >= 0; i--)
> - __unbind_from_irq(irq + i);
> + while (nvec--)
> + __unbind_from_irq(irq + nvec);
If nvec > 1, and xen_irq_info_pirq_setup() fails for i != nvec - 1,
then we reach here without having called xen_irq_info_common_setup()
for all these IRQs.
In that case, I think we will still want to call xen_free_irq() for all
IRQs. So maybe the fix would be to remove the BUG_ON() in
__unbind_from_irq()?
Ben.
> mutex_unlock(&irq_mapping_update_lock);
> return ret;
> }
--
Ben Hutchings, Software Developer Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4a7e625ce50412a7711efa0f2ef0b96ce3826759 Mon Sep 17 00:00:00 2001
From: Dave Martin <Dave.Martin(a)arm.com>
Date: Thu, 10 May 2018 18:08:23 +0100
Subject: [PATCH] tty: pl011: Avoid spuriously stuck-off interrupts
Commit 9b96fbacda34 ("serial: PL011: clear pending interrupts")
clears the RX and receive timeout interrupts on pl011 startup, to
avoid a screaming-interrupt scenario that can occur when the
firmware or bootloader leaves these interrupts asserted.
This has been noted as an issue when running Linux on qemu [1].
Unfortunately, the above fix seems to lead to potential
misbehaviour if the RX FIFO interrupt is asserted _non_ spuriously
on driver startup, if the RX FIFO is also already full to the
trigger level.
Clearing the RX FIFO interrupt does not change the FIFO fill level.
In this scenario, because the interrupt is now clear and because
the FIFO is already full to the trigger level, no new assertion of
the RX FIFO interrupt can occur unless the FIFO is drained back
below the trigger level. This never occurs because the pl011
driver is waiting for an RX FIFO interrupt to tell it that there is
something to read, and does not read the FIFO at all until that
interrupt occurs.
Thus, simply clearing "spurious" interrupts on startup may be
misguided, since there is no way to be sure that the interrupts are
truly spurious, and things can go wrong if they are not.
This patch instead clears the interrupt condition by draining the
RX FIFO during UART startup, after clearing any potentially
spurious interrupt. This should ensure that an interrupt will
definitely be asserted if the RX FIFO subsequently becomes
sufficiently full.
The drain is done at the point of enabling interrupts only. This
means that it will occur any time the UART is newly opened through
the tty layer. It will not apply to polled-mode use of the UART by
kgdboc: since that scenario cannot use interrupts by design, this
should not matter. kgdboc will interact badly with "normal" use of
the UART in any case: this patch makes no attempt to paper over
such issues.
This patch does not attempt to address the case where the RX FIFO
fills faster than it can be drained: that is a pathological
hardware design problem that is beyond the scope of the driver to
work around. As a failsafe, the number of poll iterations for
draining the FIFO is limited to twice the FIFO size. This will
ensure that the kernel at least boots even if it is impossible to
drain the FIFO for some reason.
[1] [Qemu-devel] [Qemu-arm] [PATCH] pl011: do not put into fifo
before enabled the interruption
https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg06446.html
Reported-by: Wei Xu <xuwei5(a)hisilicon.com>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: Linus Walleij <linus.walleij(a)linaro.org>
Cc: Peter Maydell <peter.maydell(a)linaro.org>
Fixes: 9b96fbacda34 ("serial: PL011: clear pending interrupts")
Signed-off-by: Dave Martin <Dave.Martin(a)arm.com>
Cc: stable <stable(a)vger.kernel.org>
Tested-by: Wei Xu <xuwei5(a)hisilicon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 4b40a5b449ee..ebd33c0232e6 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -1727,10 +1727,26 @@ static int pl011_allocate_irq(struct uart_amba_port *uap)
*/
static void pl011_enable_interrupts(struct uart_amba_port *uap)
{
+ unsigned int i;
+
spin_lock_irq(&uap->port.lock);
/* Clear out any spuriously appearing RX interrupts */
pl011_write(UART011_RTIS | UART011_RXIS, uap, REG_ICR);
+
+ /*
+ * RXIS is asserted only when the RX FIFO transitions from below
+ * to above the trigger threshold. If the RX FIFO is already
+ * full to the threshold this can't happen and RXIS will now be
+ * stuck off. Drain the RX FIFO explicitly to fix this:
+ */
+ for (i = 0; i < uap->fifosize * 2; ++i) {
+ if (pl011_read(uap, REG_FR) & UART01x_FR_RXFE)
+ break;
+
+ pl011_read(uap, REG_DR);
+ }
+
uap->im = UART011_RTIM;
if (!pl011_dma_rx_running(uap))
uap->im |= UART011_RXIM;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4a7e625ce50412a7711efa0f2ef0b96ce3826759 Mon Sep 17 00:00:00 2001
From: Dave Martin <Dave.Martin(a)arm.com>
Date: Thu, 10 May 2018 18:08:23 +0100
Subject: [PATCH] tty: pl011: Avoid spuriously stuck-off interrupts
Commit 9b96fbacda34 ("serial: PL011: clear pending interrupts")
clears the RX and receive timeout interrupts on pl011 startup, to
avoid a screaming-interrupt scenario that can occur when the
firmware or bootloader leaves these interrupts asserted.
This has been noted as an issue when running Linux on qemu [1].
Unfortunately, the above fix seems to lead to potential
misbehaviour if the RX FIFO interrupt is asserted _non_ spuriously
on driver startup, if the RX FIFO is also already full to the
trigger level.
Clearing the RX FIFO interrupt does not change the FIFO fill level.
In this scenario, because the interrupt is now clear and because
the FIFO is already full to the trigger level, no new assertion of
the RX FIFO interrupt can occur unless the FIFO is drained back
below the trigger level. This never occurs because the pl011
driver is waiting for an RX FIFO interrupt to tell it that there is
something to read, and does not read the FIFO at all until that
interrupt occurs.
Thus, simply clearing "spurious" interrupts on startup may be
misguided, since there is no way to be sure that the interrupts are
truly spurious, and things can go wrong if they are not.
This patch instead clears the interrupt condition by draining the
RX FIFO during UART startup, after clearing any potentially
spurious interrupt. This should ensure that an interrupt will
definitely be asserted if the RX FIFO subsequently becomes
sufficiently full.
The drain is done at the point of enabling interrupts only. This
means that it will occur any time the UART is newly opened through
the tty layer. It will not apply to polled-mode use of the UART by
kgdboc: since that scenario cannot use interrupts by design, this
should not matter. kgdboc will interact badly with "normal" use of
the UART in any case: this patch makes no attempt to paper over
such issues.
This patch does not attempt to address the case where the RX FIFO
fills faster than it can be drained: that is a pathological
hardware design problem that is beyond the scope of the driver to
work around. As a failsafe, the number of poll iterations for
draining the FIFO is limited to twice the FIFO size. This will
ensure that the kernel at least boots even if it is impossible to
drain the FIFO for some reason.
[1] [Qemu-devel] [Qemu-arm] [PATCH] pl011: do not put into fifo
before enabled the interruption
https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg06446.html
Reported-by: Wei Xu <xuwei5(a)hisilicon.com>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: Linus Walleij <linus.walleij(a)linaro.org>
Cc: Peter Maydell <peter.maydell(a)linaro.org>
Fixes: 9b96fbacda34 ("serial: PL011: clear pending interrupts")
Signed-off-by: Dave Martin <Dave.Martin(a)arm.com>
Cc: stable <stable(a)vger.kernel.org>
Tested-by: Wei Xu <xuwei5(a)hisilicon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 4b40a5b449ee..ebd33c0232e6 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -1727,10 +1727,26 @@ static int pl011_allocate_irq(struct uart_amba_port *uap)
*/
static void pl011_enable_interrupts(struct uart_amba_port *uap)
{
+ unsigned int i;
+
spin_lock_irq(&uap->port.lock);
/* Clear out any spuriously appearing RX interrupts */
pl011_write(UART011_RTIS | UART011_RXIS, uap, REG_ICR);
+
+ /*
+ * RXIS is asserted only when the RX FIFO transitions from below
+ * to above the trigger threshold. If the RX FIFO is already
+ * full to the threshold this can't happen and RXIS will now be
+ * stuck off. Drain the RX FIFO explicitly to fix this:
+ */
+ for (i = 0; i < uap->fifosize * 2; ++i) {
+ if (pl011_read(uap, REG_FR) & UART01x_FR_RXFE)
+ break;
+
+ pl011_read(uap, REG_DR);
+ }
+
uap->im = UART011_RTIM;
if (!pl011_dma_rx_running(uap))
uap->im |= UART011_RXIM;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 13dc04d0e5fdc25c8f713ad23fdce51cf2bf96ba Mon Sep 17 00:00:00 2001
From: Tony Lindgren <tony(a)atomide.com>
Date: Fri, 4 May 2018 10:44:09 -0700
Subject: [PATCH] serial: 8250: omap: Fix idling of clocks for unused uarts
I noticed that unused UARTs won't necessarily idle properly always
unless at least one byte tx transfer is done first.
After some debugging I narrowed down the problem to the scr register
dma configuration bits that need to be set before softreset for the
clocks to idle. Unless we do this, the module clkctrl idlest bits
may be set to 1 instead of 3 meaning the clock will never idle and
is blocking deeper idle states for the whole domain.
This might be related to the configuration done by the bootloader
or kexec booting where certain configurations cause the 8250 or
the clkctrl clock to jam in a way where setting of the scr bits
and reset is needed to clear it. I've tried diffing the 8250
registers for the various modes, but did not see anything specific.
So far I've only seen this on omap4 but I'm suspecting this might
also happen on the other clkctrl using SoCs considering they
already have a quirk enabled for UART_ERRATA_CLOCK_DISABLE.
Let's fix the issue by configuring scr before reset for basic dma
even if we don't use it. The scr register will be reset when we do
softreset few lines after, and we restore scr on resume. We should
do this for all the SoCs with UART_ERRATA_CLOCK_DISABLE quirk flag
set since the ones with UART_ERRATA_CLOCK_DISABLE are all based
using clkctrl similar to omap4.
Looks like both OMAP_UART_SCR_DMAMODE_1 | OMAP_UART_SCR_DMAMODE_CTL
bits are needed for the clkctrl to idle after a softreset.
And we need to add omap4 to also use the UART_ERRATA_CLOCK_DISABLE
for the related workaround to be enabled. This same compatible
value will also be used for omap5.
Fixes: cdb929e4452a ("serial: 8250_omap: workaround errata around idling UART after using DMA")
Cc: Keerthy <j-keerthy(a)ti.com>
Cc: Matthijs van Duin <matthijsvanduin(a)gmail.com>
Cc: Sekhar Nori <nsekhar(a)ti.com>
Cc: Tero Kristo <t-kristo(a)ti.com>
Signed-off-by: Tony Lindgren <tony(a)atomide.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index 6aaa84355fd1..1b337fee07ed 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -1110,13 +1110,14 @@ static int omap8250_no_handle_irq(struct uart_port *port)
return 0;
}
+static const u8 omap4_habit = UART_ERRATA_CLOCK_DISABLE;
static const u8 am3352_habit = OMAP_DMA_TX_KICK | UART_ERRATA_CLOCK_DISABLE;
static const u8 dra742_habit = UART_ERRATA_CLOCK_DISABLE;
static const struct of_device_id omap8250_dt_ids[] = {
{ .compatible = "ti,omap2-uart" },
{ .compatible = "ti,omap3-uart" },
- { .compatible = "ti,omap4-uart" },
+ { .compatible = "ti,omap4-uart", .data = &omap4_habit, },
{ .compatible = "ti,am3352-uart", .data = &am3352_habit, },
{ .compatible = "ti,am4372-uart", .data = &am3352_habit, },
{ .compatible = "ti,dra742-uart", .data = &dra742_habit, },
@@ -1362,6 +1363,19 @@ static int omap8250_soft_reset(struct device *dev)
int sysc;
int syss;
+ /*
+ * At least on omap4, unused uarts may not idle after reset without
+ * a basic scr dma configuration even with no dma in use. The
+ * module clkctrl status bits will be 1 instead of 3 blocking idle
+ * for the whole clockdomain. The softreset below will clear scr,
+ * and we restore it on resume so this is safe to do on all SoCs
+ * needing omap8250_soft_reset() quirk. Do it in two writes as
+ * recommended in the comment for omap8250_update_scr().
+ */
+ serial_out(up, UART_OMAP_SCR, OMAP_UART_SCR_DMAMODE_1);
+ serial_out(up, UART_OMAP_SCR,
+ OMAP_UART_SCR_DMAMODE_1 | OMAP_UART_SCR_DMAMODE_CTL);
+
sysc = serial_in(up, UART_OMAP_SYSC);
/* softreset the UART */