When an md raid device (e.g. raid456) is used as the backing device,
read-ahead requests on a degrading and recovering md raid device might
be failed immediately by the md raid code, even though the md raid array
can still serve normal read and write requests. Therefore such failed
read-ahead requests are not real hardware failures. Furthermore, once
degrading and recovering have completed, read-ahead requests will be
handled by the md raid array again.
In this condition, I/O failures of read-ahead requests don't indicate
the real health status of the device (because normal I/O is still
served), so they should not be counted in the I/O error counter
dc->io_errors.
Since there is no simple way to detect whether the backing device is an
md raid device, this patch simply ignores I/O failures for read-ahead
bios on the backing device, to avoid a bogus backing device failure on a
degrading md raid array.
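The accounting rule described above can be modelled in a few lines of
userspace C. This is only a sketch under stated assumptions, not the
bcache code: struct sketch_backing_dev, SKETCH_REQ_RAHEAD and
sketch_count_backing_io_error() are hypothetical names that stand in for
struct cached_dev, REQ_RAHEAD and bch_count_backing_io_errors().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's REQ_RAHEAD flag bit. */
#define SKETCH_REQ_RAHEAD (1u << 0)

/* Hypothetical, simplified model of the backing device state. */
struct sketch_backing_dev {
	unsigned int io_errors;   /* counted failures, like dc->io_errors */
	unsigned int error_limit; /* threshold, like dc->error_limit */
};

/*
 * Account for one failed I/O. Returns true when this failure brought the
 * device to its error limit, false otherwise. Failed read-ahead requests
 * are ignored and never touch the counter.
 */
static bool sketch_count_backing_io_error(struct sketch_backing_dev *dev,
					  unsigned int opf)
{
	assert(dev != NULL);

	if (opf & SKETCH_REQ_RAHEAD)
		return false; /* not evidence of a real media failure */

	dev->io_errors++;
	return dev->io_errors >= dev->error_limit;
}
```

With error_limit set to 2, any number of failed read-ahead requests
leaves io_errors at 0, while the second failed normal request reports
that the limit has been reached.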
Suggested-and-tested-by: Thorsten Knabe <linux(a)thorsten-knabe.de>
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
---
drivers/md/bcache/io.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index c25097968319..4d93f07f63e5 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -58,6 +58,18 @@ void bch_count_backing_io_errors(struct cached_dev *dc, struct bio *bio)
WARN_ONCE(!dc, "NULL pointer of struct cached_dev");
+ /*
+ * Read-ahead requests on a degrading and recovering md raid
+ * (e.g. raid6) device might be failured immediately by md
+ * raid code, which is not a real hardware media failure. So
+ * we shouldn't count failed REQ_RAHEAD bio to dc->io_errors.
+ */
+ if (bio->bi_opf & REQ_RAHEAD) {
+ pr_warn_ratelimited("%s: Read-ahead I/O failed on backing device, ignore",
+ dc->backing_dev_name);
+ return;
+ }
+
errors = atomic_add_return(1, &dc->io_errors);
if (errors < dc->error_limit)
pr_err("%s: IO error on backing device, unrecoverable",
--
2.16.4
This reverts commit 6147305c73e4511ca1a975b766b97a779d442567.
Although this patch helps the failed bcache device stop faster when
too many I/O errors are detected on the corresponding cached device,
setting the CACHE_SET_IO_DISABLE bit in cache set c->flags was not a
good idea. This operation disables all I/O on the cache set, which means
other attached bcache devices won't work either.
Without this patch, the failed bcache device can still be stopped
eventually once its internal I/O (e.g. writeback) has completed.
Therefore I revert it here.
Fixes: 6147305c73e4 ("bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()")
Reported-by: Yong Li <mr.liyong(a)qq.com>
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
---
drivers/md/bcache/super.c | 17 -----------------
1 file changed, 17 deletions(-)
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 64d9de89a63f..ba2ad093bc80 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1437,8 +1437,6 @@ int bch_flash_dev_create(struct cache_set *c, uint64_t size)
bool bch_cached_dev_error(struct cached_dev *dc)
{
- struct cache_set *c;
-
if (!dc || test_bit(BCACHE_DEV_CLOSING, &dc->disk.flags))
return false;
@@ -1449,21 +1447,6 @@ bool bch_cached_dev_error(struct cached_dev *dc)
pr_err("stop %s: too many IO errors on backing device %s\n",
dc->disk.disk->disk_name, dc->backing_dev_name);
- /*
- * If the cached device is still attached to a cache set,
- * even dc->io_disable is true and no more I/O requests
- * accepted, cache device internal I/O (writeback scan or
- * garbage collection) may still prevent bcache device from
- * being stopped. So here CACHE_SET_IO_DISABLE should be
- * set to c->flags too, to make the internal I/O to cache
- * device rejected and stopped immediately.
- * If c is NULL, that means the bcache device is not attached
- * to any cache set, then no CACHE_SET_IO_DISABLE bit to set.
- */
- c = dc->disk.c;
- if (c && test_and_set_bit(CACHE_SET_IO_DISABLE, &c->flags))
- pr_info("CACHE_SET_IO_DISABLE already set");
-
bcache_device_stop(&dc->disk);
return true;
}
--
2.16.4
This is the start of the stable review cycle for the 4.14.131 release.
There is 1 patch in this series; it will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri 28 Jun 2019 08:35:42 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.131-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.131-rc1
Eric Dumazet <edumazet(a)google.com>
tcp: refine memory limit test in tcp_fragment()
-------------
Diffstat:
Makefile | 4 ++--
net/ipv4/tcp_output.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
This is the start of the stable review cycle for the 4.4.184 release.
There is 1 patch in this series; it will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri 28 Jun 2019 08:35:42 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.184-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.184-rc1
Eric Dumazet <edumazet(a)google.com>
tcp: refine memory limit test in tcp_fragment()
-------------
Diffstat:
Makefile | 4 ++--
net/ipv4/tcp_output.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
If the last page read contains bitflips (ret > 0), spinand_mtd_read()
returns the number of bitflips for that last page only. It looks like it
should instead return max_bitflips, as it already does when the last
page read returns 0.
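The intended return value can be illustrated with a small userspace
model of the read loop. This is a sketch under stated assumptions, not
the driver code: sketch_read_pages() and its page_results array are
hypothetical, with each entry standing in for what one page read
returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model: page_results[i] is what reading page i returned,
 * i.e. a bitflip count (>= 0), -EBADMSG for an uncorrectable page, or
 * another negative errno for a hard failure.
 */
static int sketch_read_pages(const int *page_results, size_t npages)
{
	unsigned int max_bitflips = 0;
	int ecc_failed = 0;
	int ret = 0;
	size_t i;

	assert(page_results != NULL || npages == 0);

	for (i = 0; i < npages; i++) {
		ret = page_results[i];
		if (ret < 0 && ret != -EBADMSG)
			break; /* hard error: stop and report it */

		if (ret == -EBADMSG)
			ecc_failed = 1;
		else if ((unsigned int)ret > max_bitflips)
			max_bitflips = (unsigned int)ret;

		/* Reset on every iteration so a per-page count never leaks out. */
		ret = 0;
	}

	if (!ret && ecc_failed)
		ret = -EBADMSG;

	return ret ? ret : (int)max_bitflips;
}
```

For per-page results of 1, 3 and 2 bitflips the model returns 3. If ret
were only reset in the -EBADMSG branch, the same input would return 2,
the count of the last page.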
Signed-off-by: liaoweixiong <liaoweixiong(a)allwinnertech.com>
Reviewed-by: Boris Brezillon <boris.brezillon(a)collabora.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf(a)kontron.de>
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs")
---
drivers/mtd/nand/spi/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/spi/core.c b/drivers/mtd/nand/spi/core.c
index 556bfdb..6b9388d 100644
--- a/drivers/mtd/nand/spi/core.c
+++ b/drivers/mtd/nand/spi/core.c
@@ -511,12 +511,12 @@ static int spinand_mtd_read(struct mtd_info *mtd, loff_t from,
if (ret == -EBADMSG) {
ecc_failed = true;
mtd->ecc_stats.failed++;
- ret = 0;
} else {
mtd->ecc_stats.corrected += ret;
max_bitflips = max_t(unsigned int, max_bitflips, ret);
}
+ ret = 0;
ops->retlen += iter.req.datalen;
ops->oobretlen += iter.req.ooblen;
}
--
1.9.1
Hi Linus,
Could you pull this please?
There are four patches:
(1) Fix the printing of the "vnode modified" warning to exclude checks on
files for which we don't have a callback promise from the server (and
so don't expect the server to tell us when it changes).
Without this, for every file or directory for which we still have an
in-core inode that gets changed on the server, we may get a message
logged when we next look at it. This can happen in bulk if, for
instance, someone does "vos release" to update a R/O volume from a R/W
volume and a whole set of files are all changed together.
We only really want to log a message if the file changed and the
server didn't tell us about it or we failed to track the state
internally.
(2) Fix accidental corruption of either afs_vlserver struct objects or
the following memory locations (which could hold anything). The issue
is caused by a union that points to two different structs in struct
afs_call (to save space in the struct). The call cleanup code assumes
that it can simply call the cleanup for one of those structs if not
NULL - when it might be actually pointing to the other struct.
This means that every Volume Location RPC op is going to corrupt
something.
(3) Fix an uninitialised spinlock. This isn't too bad, it just causes a
one-off warning if lockdep is enabled when "vos release" is called,
but the spinlock still behaves correctly.
(4) Fix the setting of i_blocks in the inode. This causes du, for example,
to produce incorrect results, but otherwise should not be dangerous to
the kernel.
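The hazard behind item (2) can be shown with a small userspace model.
This is a sketch under stated assumptions: all sketch_* names are
hypothetical and are not the afs_call layout, and the split-member
variant is only one possible way to avoid the problem, not necessarily
what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_server   { int refcount; }; /* hypothetical */
struct sketch_vlserver { int refcount; }; /* hypothetical */

/* Hazard: two unrelated pointers overlaid in a union to save space. */
struct sketch_call_union {
	union {
		struct sketch_server   *server;
		struct sketch_vlserver *vlserver;
	};
};

/* One alternative: keep the two pointers as separate members. */
struct sketch_call_split {
	struct sketch_server   *server;
	struct sketch_vlserver *vlserver;
};

/* A naive cleanup check: "release the server if the pointer is set". */
static bool sketch_union_cleanup_sees_server(const struct sketch_call_union *call)
{
	assert(call != NULL);
	return call->server != NULL;
}

static bool sketch_split_cleanup_sees_server(const struct sketch_call_split *call)
{
	assert(call != NULL);
	return call->server != NULL;
}
```

When only the vlserver pointer has been set, the union variant still
reports a non-NULL server, so a cleanup path keyed on that test would
apply the wrong release routine to the object. The split variant
reports NULL for the same situation.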
The in-kernel AFS client has been undergoing testing on opendev.org on one
of their mirror machines. They are using AFS to hold data that is then
served via apache, and Ian Wienand had reported seeing oopses, spontaneous
machine reboots and updates to volumes going missing. This patch series
appears to have fixed the problem, very probably due to patch (2), but it's
not 100% certain.
Reviewed-by: Jeffrey Altman <jaltman(a)auristor.com>
Tested-by: Marc Dionne <marc.dionne(a)auristor.com>
Tested-by: Ian Wienand <iwienand(a)redhat.com>
---
The following changes since commit a188339ca5a396acc588e5851ed7e19f66b0ebd9:
Linux 5.2-rc1 (2019-05-19 15:47:09 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/afs-fixes-20190620
for you to fetch changes up to 2cd42d19cffa0ec3dfb57b1b3e1a07a9bf4ed80a:
afs: Fix setting of i_blocks (2019-06-20 18:12:02 +0100)
----------------------------------------------------------------
AFS fixes
----------------------------------------------------------------
David Howells (4):
afs: Fix over zealous "vnode modified" warnings
afs: Fix vlserver record corruption
afs: Fix uninitialised spinlock afs_volume::cb_break_lock
afs: Fix setting of i_blocks
fs/afs/callback.c | 4 ++--
fs/afs/inode.c | 31 +++++++++++++++++++------------
fs/afs/internal.h | 8 +++-----
fs/afs/volume.c | 1 +
4 files changed, 25 insertions(+), 19 deletions(-)
The patch titled
Subject: fs/userfaultfd.c: disable irqs for fault_pending and event locks
has been added to the -mm tree. Its filename is
userfaultfd-disable-irqs-for-fault_pending-and-event-locks.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/userfaultfd-disable-irqs-for-fault…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/userfaultfd-disable-irqs-for-fault…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Eric Biggers <ebiggers(a)google.com>
Subject: fs/userfaultfd.c: disable irqs for fault_pending and event locks
When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs and
takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock. This may have
to wait for userfaultfd_ctx::fd_wqh.lock to be released by
userfaultfd_ctx_read(), which can be waiting for
userfaultfd_ctx::fault_pending_wqh.lock or
userfaultfd_ctx::event_wqh.lock. But elsewhere the fault_pending_wqh and
event_wqh locks are taken with IRQs enabled. Since the IRQ handler may
take kioctx::ctx_lock, lockdep reports that a deadlock is possible.
Fix it by always disabling IRQs when taking the fault_pending_wqh and
event_wqh locks.
ae62c16e105a ("userfaultfd: disable irqs when taking the waitqueue lock")
didn't fix this because it only accounted for the fd_wqh lock, not the
other locks nested inside it.
Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org
Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Reported-by: syzbot+fab6de82892b6b9c6191(a)syzkaller.appspotmail.com
Reported-by: syzbot+53c0b767f7ca0dc0c451(a)syzkaller.appspotmail.com
Reported-by: syzbot+a3accb352f9c22041cfa(a)syzkaller.appspotmail.com
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 42 ++++++++++++++++++++++++++----------------
1 file changed, 26 insertions(+), 16 deletions(-)
--- a/fs/userfaultfd.c~userfaultfd-disable-irqs-for-fault_pending-and-event-locks
+++ a/fs/userfaultfd.c
@@ -40,6 +40,16 @@ enum userfaultfd_state {
/*
* Start with fault_pending_wqh and fault_wqh so they're more likely
* to be in the same cacheline.
+ *
+ * Locking order:
+ * fd_wqh.lock
+ * fault_pending_wqh.lock
+ * fault_wqh.lock
+ * event_wqh.lock
+ *
+ * To avoid deadlocks, IRQs must be disabled when taking any of the above locks,
+ * since fd_wqh.lock is taken by aio_poll() while it's holding a lock that's
+ * also taken in IRQ context.
*/
struct userfaultfd_ctx {
/* waitqueue head for the pending (i.e. not read) userfaults */
@@ -458,7 +468,7 @@ vm_fault_t handle_userfault(struct vm_fa
blocking_state = return_to_userland ? TASK_INTERRUPTIBLE :
TASK_KILLABLE;
- spin_lock(&ctx->fault_pending_wqh.lock);
+ spin_lock_irq(&ctx->fault_pending_wqh.lock);
/*
* After the __add_wait_queue the uwq is visible to userland
* through poll/read().
@@ -470,7 +480,7 @@ vm_fault_t handle_userfault(struct vm_fa
* __add_wait_queue.
*/
set_current_state(blocking_state);
- spin_unlock(&ctx->fault_pending_wqh.lock);
+ spin_unlock_irq(&ctx->fault_pending_wqh.lock);
if (!is_vm_hugetlb_page(vmf->vma))
must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
@@ -552,13 +562,13 @@ vm_fault_t handle_userfault(struct vm_fa
* kernel stack can be released after the list_del_init.
*/
if (!list_empty_careful(&uwq.wq.entry)) {
- spin_lock(&ctx->fault_pending_wqh.lock);
+ spin_lock_irq(&ctx->fault_pending_wqh.lock);
/*
* No need of list_del_init(), the uwq on the stack
* will be freed shortly anyway.
*/
list_del(&uwq.wq.entry);
- spin_unlock(&ctx->fault_pending_wqh.lock);
+ spin_unlock_irq(&ctx->fault_pending_wqh.lock);
}
/*
@@ -583,7 +593,7 @@ static void userfaultfd_event_wait_compl
init_waitqueue_entry(&ewq->wq, current);
release_new_ctx = NULL;
- spin_lock(&ctx->event_wqh.lock);
+ spin_lock_irq(&ctx->event_wqh.lock);
/*
* After the __add_wait_queue the uwq is visible to userland
* through poll/read().
@@ -613,15 +623,15 @@ static void userfaultfd_event_wait_compl
break;
}
- spin_unlock(&ctx->event_wqh.lock);
+ spin_unlock_irq(&ctx->event_wqh.lock);
wake_up_poll(&ctx->fd_wqh, EPOLLIN);
schedule();
- spin_lock(&ctx->event_wqh.lock);
+ spin_lock_irq(&ctx->event_wqh.lock);
}
__set_current_state(TASK_RUNNING);
- spin_unlock(&ctx->event_wqh.lock);
+ spin_unlock_irq(&ctx->event_wqh.lock);
if (release_new_ctx) {
struct vm_area_struct *vma;
@@ -918,10 +928,10 @@ wakeup:
* the last page faults that may have been already waiting on
* the fault_*wqh.
*/
- spin_lock(&ctx->fault_pending_wqh.lock);
+ spin_lock_irq(&ctx->fault_pending_wqh.lock);
__wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL, &range);
__wake_up(&ctx->fault_wqh, TASK_NORMAL, 1, &range);
- spin_unlock(&ctx->fault_pending_wqh.lock);
+ spin_unlock_irq(&ctx->fault_pending_wqh.lock);
/* Flush pending events that may still wait on event_wqh */
wake_up_all(&ctx->event_wqh);
@@ -1134,7 +1144,7 @@ static ssize_t userfaultfd_ctx_read(stru
if (!ret && msg->event == UFFD_EVENT_FORK) {
ret = resolve_userfault_fork(ctx, fork_nctx, msg);
- spin_lock(&ctx->event_wqh.lock);
+ spin_lock_irq(&ctx->event_wqh.lock);
if (!list_empty(&fork_event)) {
/*
* The fork thread didn't abort, so we can
@@ -1180,7 +1190,7 @@ static ssize_t userfaultfd_ctx_read(stru
if (ret)
userfaultfd_ctx_put(fork_nctx);
}
- spin_unlock(&ctx->event_wqh.lock);
+ spin_unlock_irq(&ctx->event_wqh.lock);
}
return ret;
@@ -1219,14 +1229,14 @@ static ssize_t userfaultfd_read(struct f
static void __wake_userfault(struct userfaultfd_ctx *ctx,
struct userfaultfd_wake_range *range)
{
- spin_lock(&ctx->fault_pending_wqh.lock);
+ spin_lock_irq(&ctx->fault_pending_wqh.lock);
/* wake all in the range and autoremove */
if (waitqueue_active(&ctx->fault_pending_wqh))
__wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL,
range);
if (waitqueue_active(&ctx->fault_wqh))
__wake_up(&ctx->fault_wqh, TASK_NORMAL, 1, range);
- spin_unlock(&ctx->fault_pending_wqh.lock);
+ spin_unlock_irq(&ctx->fault_pending_wqh.lock);
}
static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
@@ -1881,7 +1891,7 @@ static void userfaultfd_show_fdinfo(stru
wait_queue_entry_t *wq;
unsigned long pending = 0, total = 0;
- spin_lock(&ctx->fault_pending_wqh.lock);
+ spin_lock_irq(&ctx->fault_pending_wqh.lock);
list_for_each_entry(wq, &ctx->fault_pending_wqh.head, entry) {
pending++;
total++;
@@ -1889,7 +1899,7 @@ static void userfaultfd_show_fdinfo(stru
list_for_each_entry(wq, &ctx->fault_wqh.head, entry) {
total++;
}
- spin_unlock(&ctx->fault_pending_wqh.lock);
+ spin_unlock_irq(&ctx->fault_pending_wqh.lock);
/*
* If more protocols will be added, there will be all shown
_
Patches currently in -mm which might be from ebiggers(a)google.com are
userfaultfd-disable-irqs-for-fault_pending-and-event-locks.patch
The vsyscall=native feature is gone -- remove the docs.
Fixes: 076ca272a14c ("x86/vsyscall/64: Drop "native" vsyscalls")
Cc: stable(a)vger.kernel.org
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Kernel Hardening <kernel-hardening(a)lists.openwall.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
---
Documentation/admin-guide/kernel-parameters.txt | 6 ------
1 file changed, 6 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f6664b2e2..0082d1e56999 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5102,12 +5102,6 @@
emulate [default] Vsyscalls turn into traps and are
emulated reasonably safely.
- native Vsyscalls are native syscall instructions.
- This is a little bit faster than trapping
- and makes a few dynamic recompilers work
- better than they would in emulation mode.
- It also makes exploits much easier to write.
-
none Vsyscalls don't work at all. This makes
them quite hard to use for exploits but
might break your system.
--
2.21.0
Switch to the "marvell,armada-38x-uart" driver variant to empty
the UART buffer before writing to the UART_LCR register.
Signed-off-by: Joshua Scott <joshua.scott(a)alliedtelesis.co.nz>
Tested-by: Andrew Lunn <andrew(a)lunn.ch>
Acked-by: Gregory CLEMENT <gregory.clement(a)bootlin.com>
Cc: stable(a)vger.kernel.org
Fixes: 43e28ba87708 ("ARM: dts: Use armada-370-xp as a base for armada-xp-98dx3236")
---
Changes in v3:
Updated title, added tested-by, and Fixes tag
Changes in v2:
Andrew Lunn was able to test on a Marvell 370RD reference design, and
the character loss issue did not occur.
The fix has now been changed to only affect the following SOCs:
* 98DX323x
* 98DX3333
* 98DX4251
v1 message:
We have found that like the armada 38x, other Marvell SOCs can lose
characters when the UART_LCR register is written to without first
waiting for the buffer to empty.
We have observed this behaviour on the following Marvell switch SOCs:
* 98DX323x
* 98DX3333
* 98DX4251
However, we do not currently have access to non-switch SOCs which share
the same parent device-tree.
The question we have is: should the fix be applied to the common
armada-370-xp device-tree, or should it be restricted to only affect the
SOCs listed above?
If anybody is able to check, we would like to find out if the issue
affects other armada-xp / armada-370 based SOCs.
The issue can be reproduced, if logging in using the serial port, with:
resize && echo "hello world"
On affected devices, the first couple letters of "hello world" are
lost. On some SOCs this can be seen at 115200 baud, and on others
we have had to slow down to 9600 to see the issue.
Cheers,
Joshua Scott
---
arch/arm/boot/dts/armada-xp-98dx3236.dtsi | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/arm/boot/dts/armada-xp-98dx3236.dtsi b/arch/arm/boot/dts/armada-xp-98dx3236.dtsi
index 59753470cd34..267d0c178e55 100644
--- a/arch/arm/boot/dts/armada-xp-98dx3236.dtsi
+++ b/arch/arm/boot/dts/armada-xp-98dx3236.dtsi
@@ -336,3 +336,11 @@
status = "disabled";
};
+&uart0 {
+ compatible = "marvell,armada-38x-uart";
+};
+
+&uart1 {
+ compatible = "marvell,armada-38x-uart";
+};
+
--
2.21.0