From: Robin Gong <yibin.gong(a)nxp.com>
It is possible for an irq triggered by channel0 to be received after
the clks have been disabled, once the firmware has been loaded during
sdma probe. If that happens, clearing the interrupt by writing to
SDMA_H_INTR won't work and the kernel will hang processing an endless
stream of interrupts. The current code does not need an interrupt on
channel0 at all, since it polls SDMA_H_STATSTOP to learn that channel0
is done. So just clear BD_INTR to disable the channel0 interrupt and
avoid the above case.
This issue was introduced by commit 1d069bfa3c78 ("dmaengine: imx-sdma:
ack channel 0 IRQ in the interrupt handler") which didn't take care
of the above case.
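A minimal sketch of the idea, not the driver code: the descriptor status
is a bit mask, and leaving BD_INTR out means completion of the channel0
descriptor raises no interrupt. The flag values below are assumed for
illustration and should be checked against drivers/dma/imx-sdma.c;
bd0_status() and bd_raises_irq() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values, for illustration only. */
#define BD_DONE 0x01u
#define BD_WRAP 0x02u
#define BD_INTR 0x08u
#define BD_EXTD 0x80u

/* Hypothetical helper: build the status byte for the channel0 descriptor. */
static uint8_t bd0_status(bool want_irq)
{
	uint8_t status = BD_DONE | BD_WRAP | BD_EXTD;

	if (want_irq)
		status |= BD_INTR;	/* behaviour before this patch */
	return status;
}

/* Hypothetical helper: does this descriptor ask for a completion irq? */
static bool bd_raises_irq(uint8_t status)
{
	return (status & BD_INTR) != 0;
}
```

With want_irq false, completion has to be detected by polling, which is
what the channel0 path already does.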
Fixes: 1d069bfa3c78 ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler")
Cc: stable(a)vger.kernel.org #5.0+
Signed-off-by: Robin Gong <yibin.gong(a)nxp.com>
Reported-by: Sven Van Asbroeck <thesven73(a)gmail.com>
Tested-by: Sven Van Asbroeck <thesven73(a)gmail.com>
---
drivers/dma/imx-sdma.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index deea9aa..b5a1ee2 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -742,7 +742,7 @@ static int sdma_load_script(struct sdma_engine *sdma, void *buf, int size,
spin_lock_irqsave(&sdma->channel_0_lock, flags);
bd0->mode.command = C0_SETPM;
- bd0->mode.status = BD_DONE | BD_INTR | BD_WRAP | BD_EXTD;
+ bd0->mode.status = BD_DONE | BD_WRAP | BD_EXTD;
bd0->mode.count = size / 2;
bd0->buffer_addr = buf_phys;
bd0->ext_buffer_addr = address;
@@ -1064,7 +1064,7 @@ static int sdma_load_context(struct sdma_channel *sdmac)
context->gReg[7] = sdmac->watermark_level;
bd0->mode.command = C0_SETDM;
- bd0->mode.status = BD_DONE | BD_INTR | BD_WRAP | BD_EXTD;
+ bd0->mode.status = BD_DONE | BD_WRAP | BD_EXTD;
bd0->mode.count = sizeof(*context) / 4;
bd0->buffer_addr = sdma->context_phys;
bd0->ext_buffer_addr = 2048 + (sizeof(*context) / 4) * channel;
--
2.7.4
From: Oleg Nesterov <oleg(a)redhat.com>
Subject: swap_readpage(): avoid blk_wake_io_task() if !synchronous
swap_readpage() sets waiter = bio->bi_private even if synchronous is
false; this means that the caller can get a spurious wakeup after return.
This can be fatal if blk_wake_io_task() does
set_current_state(TASK_RUNNING) after the caller does set_special_state();
in the worst case the kernel can crash in do_task_dead().
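A user-space sketch of the fixed logic, assuming simplified stand-ins for
the kernel types (task_model, bio_model, submit_read() and end_read() are
hypothetical names): only a synchronous reader registers itself as the
waiter, and the completion side wakes and drops the reference only when a
waiter was registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and bio; not kernel types. */
struct task_model {
	int refcount;
	int wakeups;
};

struct bio_model {
	struct task_model *bi_private;
};

/* Submission side: only a synchronous reader registers as the waiter. */
static void submit_read(struct bio_model *bio, struct task_model *cur,
			bool synchronous)
{
	bio->bi_private = NULL;
	if (synchronous) {
		cur->refcount++;	/* models get_task_struct(current) */
		bio->bi_private = cur;
	}
}

/* Completion side: wake and drop the reference only if a waiter exists. */
static void end_read(struct bio_model *bio)
{
	struct task_model *waiter = bio->bi_private;

	bio->bi_private = NULL;
	if (waiter) {
		waiter->wakeups++;	/* models blk_wake_io_task(waiter) */
		waiter->refcount--;	/* models put_task_struct(waiter) */
	}
}
```

In this model an asynchronous read never touches the submitting task, so
no wakeup can arrive after swap_readpage() has returned.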
Link: http://lkml.kernel.org/r/20190704160301.GA5956@redhat.com
Fixes: 0619317ff8baa2d ("block: add polled wakeup task helper")
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Reported-by: Qian Cai <cai(a)lca.pw>
Acked-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_io.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
--- a/mm/page_io.c~swap_readpage-avoid-blk_wake_io_task-if-synchronous
+++ a/mm/page_io.c
@@ -137,8 +137,10 @@ out:
unlock_page(page);
WRITE_ONCE(bio->bi_private, NULL);
bio_put(bio);
- blk_wake_io_task(waiter);
- put_task_struct(waiter);
+ if (waiter) {
+ blk_wake_io_task(waiter);
+ put_task_struct(waiter);
+ }
}
int generic_swapfile_activate(struct swap_info_struct *sis,
@@ -395,11 +397,12 @@ int swap_readpage(struct page *page, boo
* Keep this task valid during swap readpage because the oom killer may
* attempt to access it in the page fault retry time check.
*/
- get_task_struct(current);
- bio->bi_private = current;
bio_set_op_attrs(bio, REQ_OP_READ, 0);
- if (synchronous)
+ if (synchronous) {
bio->bi_opf |= REQ_HIPRI;
+ get_task_struct(current);
+ bio->bi_private = current;
+ }
count_vm_event(PSWPIN);
bio_get(bio);
qc = submit_bio(bio);
_
From: Eric Biggers <ebiggers(a)google.com>
Subject: fs/userfaultfd.c: disable irqs for fault_pending and event locks
When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs and
takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock. This may have
to wait for userfaultfd_ctx::fd_wqh.lock to be released by
userfaultfd_ctx_read(), which can be waiting for
userfaultfd_ctx::fault_pending_wqh.lock or
userfaultfd_ctx::event_wqh.lock. But elsewhere the fault_pending_wqh and
event_wqh locks are taken with IRQs enabled. Since the IRQ handler may
take kioctx::ctx_lock, lockdep reports that a deadlock is possible.
Fix it by always disabling IRQs when taking the fault_pending_wqh and
event_wqh locks.
ae62c16e105a ("userfaultfd: disable irqs when taking the waitqueue lock")
didn't fix this because it only accounted for the fd_wqh lock, not the
other locks nested inside it.
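The reasoning above can be sketched with a toy model that is far simpler
than lockdep: a deadlock is considered possible when a lock used from IRQ
context can reach, through the "held while acquiring" relation, a lock
that is somewhere taken with IRQs enabled. The struct, the enum names and
all functions here are hypothetical, and fault_wqh.lock is left out for
brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { CTX_LOCK, FD_WQH, FAULT_PENDING_WQH, EVENT_WQH, NR_LOCKS };

struct lock_model {
	bool used_in_irq;		/* acquired from an IRQ handler */
	bool taken_with_irqs_on;	/* acquired somewhere with IRQs enabled */
	bool nests[NR_LOCKS];		/* nests[b]: b is taken while this is held */
};

/* Depth-first search for an IRQ-unsafe lock nested below 'from'. */
static bool reaches_irq_unsafe(const struct lock_model *locks, int from,
			       bool *seen)
{
	if (seen[from])
		return false;
	seen[from] = true;
	for (int to = 0; to < NR_LOCKS; to++) {
		if (!locks[from].nests[to])
			continue;
		if (locks[to].taken_with_irqs_on ||
		    reaches_irq_unsafe(locks, to, seen))
			return true;
	}
	return false;
}

static bool deadlock_possible(const struct lock_model *locks)
{
	for (int i = 0; i < NR_LOCKS; i++) {
		bool seen[NR_LOCKS] = { false };

		if (locks[i].used_in_irq &&
		    reaches_irq_unsafe(locks, i, seen))
			return true;
	}
	return false;
}

/* Fill in the nesting described in the changelog, before or after the fix. */
static void build_model(struct lock_model *locks, bool with_fix)
{
	memset(locks, 0, NR_LOCKS * sizeof(*locks));
	locks[CTX_LOCK].used_in_irq = true;
	locks[CTX_LOCK].nests[FD_WQH] = true;		/* aio_poll() */
	locks[FD_WQH].nests[FAULT_PENDING_WQH] = true;	/* userfaultfd_ctx_read() */
	locks[FD_WQH].nests[EVENT_WQH] = true;
	locks[FAULT_PENDING_WQH].taken_with_irqs_on = !with_fix;
	locks[EVENT_WQH].taken_with_irqs_on = !with_fix;
}
```

The model reports a possible deadlock before the fix and none once both
inner locks are always taken with IRQs disabled.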
Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org
Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Reported-by: syzbot+fab6de82892b6b9c6191(a)syzkaller.appspotmail.com
Reported-by: syzbot+53c0b767f7ca0dc0c451(a)syzkaller.appspotmail.com
Reported-by: syzbot+a3accb352f9c22041cfa(a)syzkaller.appspotmail.com
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 42 ++++++++++++++++++++++++++----------------
1 file changed, 26 insertions(+), 16 deletions(-)
--- a/fs/userfaultfd.c~userfaultfd-disable-irqs-for-fault_pending-and-event-locks
+++ a/fs/userfaultfd.c
@@ -40,6 +40,16 @@ enum userfaultfd_state {
/*
* Start with fault_pending_wqh and fault_wqh so they're more likely
* to be in the same cacheline.
+ *
+ * Locking order:
+ * fd_wqh.lock
+ * fault_pending_wqh.lock
+ * fault_wqh.lock
+ * event_wqh.lock
+ *
+ * To avoid deadlocks, IRQs must be disabled when taking any of the above locks,
+ * since fd_wqh.lock is taken by aio_poll() while it's holding a lock that's
+ * also taken in IRQ context.
*/
struct userfaultfd_ctx {
/* waitqueue head for the pending (i.e. not read) userfaults */
@@ -458,7 +468,7 @@ vm_fault_t handle_userfault(struct vm_fa
blocking_state = return_to_userland ? TASK_INTERRUPTIBLE :
TASK_KILLABLE;
- spin_lock(&ctx->fault_pending_wqh.lock);
+ spin_lock_irq(&ctx->fault_pending_wqh.lock);
/*
* After the __add_wait_queue the uwq is visible to userland
* through poll/read().
@@ -470,7 +480,7 @@ vm_fault_t handle_userfault(struct vm_fa
* __add_wait_queue.
*/
set_current_state(blocking_state);
- spin_unlock(&ctx->fault_pending_wqh.lock);
+ spin_unlock_irq(&ctx->fault_pending_wqh.lock);
if (!is_vm_hugetlb_page(vmf->vma))
must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
@@ -552,13 +562,13 @@ vm_fault_t handle_userfault(struct vm_fa
* kernel stack can be released after the list_del_init.
*/
if (!list_empty_careful(&uwq.wq.entry)) {
- spin_lock(&ctx->fault_pending_wqh.lock);
+ spin_lock_irq(&ctx->fault_pending_wqh.lock);
/*
* No need of list_del_init(), the uwq on the stack
* will be freed shortly anyway.
*/
list_del(&uwq.wq.entry);
- spin_unlock(&ctx->fault_pending_wqh.lock);
+ spin_unlock_irq(&ctx->fault_pending_wqh.lock);
}
/*
@@ -583,7 +593,7 @@ static void userfaultfd_event_wait_compl
init_waitqueue_entry(&ewq->wq, current);
release_new_ctx = NULL;
- spin_lock(&ctx->event_wqh.lock);
+ spin_lock_irq(&ctx->event_wqh.lock);
/*
* After the __add_wait_queue the uwq is visible to userland
* through poll/read().
@@ -613,15 +623,15 @@ static void userfaultfd_event_wait_compl
break;
}
- spin_unlock(&ctx->event_wqh.lock);
+ spin_unlock_irq(&ctx->event_wqh.lock);
wake_up_poll(&ctx->fd_wqh, EPOLLIN);
schedule();
- spin_lock(&ctx->event_wqh.lock);
+ spin_lock_irq(&ctx->event_wqh.lock);
}
__set_current_state(TASK_RUNNING);
- spin_unlock(&ctx->event_wqh.lock);
+ spin_unlock_irq(&ctx->event_wqh.lock);
if (release_new_ctx) {
struct vm_area_struct *vma;
@@ -918,10 +928,10 @@ wakeup:
* the last page faults that may have been already waiting on
* the fault_*wqh.
*/
- spin_lock(&ctx->fault_pending_wqh.lock);
+ spin_lock_irq(&ctx->fault_pending_wqh.lock);
__wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL, &range);
__wake_up(&ctx->fault_wqh, TASK_NORMAL, 1, &range);
- spin_unlock(&ctx->fault_pending_wqh.lock);
+ spin_unlock_irq(&ctx->fault_pending_wqh.lock);
/* Flush pending events that may still wait on event_wqh */
wake_up_all(&ctx->event_wqh);
@@ -1134,7 +1144,7 @@ static ssize_t userfaultfd_ctx_read(stru
if (!ret && msg->event == UFFD_EVENT_FORK) {
ret = resolve_userfault_fork(ctx, fork_nctx, msg);
- spin_lock(&ctx->event_wqh.lock);
+ spin_lock_irq(&ctx->event_wqh.lock);
if (!list_empty(&fork_event)) {
/*
* The fork thread didn't abort, so we can
@@ -1180,7 +1190,7 @@ static ssize_t userfaultfd_ctx_read(stru
if (ret)
userfaultfd_ctx_put(fork_nctx);
}
- spin_unlock(&ctx->event_wqh.lock);
+ spin_unlock_irq(&ctx->event_wqh.lock);
}
return ret;
@@ -1219,14 +1229,14 @@ static ssize_t userfaultfd_read(struct f
static void __wake_userfault(struct userfaultfd_ctx *ctx,
struct userfaultfd_wake_range *range)
{
- spin_lock(&ctx->fault_pending_wqh.lock);
+ spin_lock_irq(&ctx->fault_pending_wqh.lock);
/* wake all in the range and autoremove */
if (waitqueue_active(&ctx->fault_pending_wqh))
__wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL,
range);
if (waitqueue_active(&ctx->fault_wqh))
__wake_up(&ctx->fault_wqh, TASK_NORMAL, 1, range);
- spin_unlock(&ctx->fault_pending_wqh.lock);
+ spin_unlock_irq(&ctx->fault_pending_wqh.lock);
}
static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
@@ -1881,7 +1891,7 @@ static void userfaultfd_show_fdinfo(stru
wait_queue_entry_t *wq;
unsigned long pending = 0, total = 0;
- spin_lock(&ctx->fault_pending_wqh.lock);
+ spin_lock_irq(&ctx->fault_pending_wqh.lock);
list_for_each_entry(wq, &ctx->fault_pending_wqh.head, entry) {
pending++;
total++;
@@ -1889,7 +1899,7 @@ static void userfaultfd_show_fdinfo(stru
list_for_each_entry(wq, &ctx->fault_wqh.head, entry) {
total++;
}
- spin_unlock(&ctx->fault_pending_wqh.lock);
+ spin_unlock_irq(&ctx->fault_pending_wqh.lock);
/*
* If more protocols will be added, there will be all shown
_
The patch titled
Subject: mm/memcontrol: fix wrong statistics in memory.stat
has been added to the -mm tree. Its filename is
mm-memcontrol-fix-wrong-statistics-in-memorystat.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-fix-wrong-statistics…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-fix-wrong-statistics…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Yafang Shao <laoar.shao(a)gmail.com>
Subject: mm/memcontrol: fix wrong statistics in memory.stat
When we calculate total statistics for memcg1_stats and memcg1_events, we
use the index 'i' in the for loop as the events index. Actually we
should use memcg1_stats[i] and memcg1_events[i] as the events index.
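A small sketch of the mix-up, with hypothetical names and a made-up
counter layout: the v1 table holds counter IDs, so the loop index selects
a table entry, and only that entry is a valid counter ID.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global counter IDs; the real enum is much larger. */
enum stat_item { STAT_A, STAT_B, STAT_CACHE, STAT_RSS, STAT_SWAP, NR_STATS };

/* Subset exported by the v1 interface, in display order. */
static const enum stat_item v1_stats[] = { STAT_CACHE, STAT_RSS, STAT_SWAP };

static unsigned long read_counter(const unsigned long *counters,
				  enum stat_item item)
{
	return counters[item];
}

/* Buggy variant: treats the loop index itself as the counter ID. */
static unsigned long total_buggy(const unsigned long *counters, size_t i)
{
	return read_counter(counters, (enum stat_item)i);
}

/* Fixed variant: looks the counter ID up in the table first. */
static unsigned long total_fixed(const unsigned long *counters, size_t i)
{
	return read_counter(counters, v1_stats[i]);
}
```

For the first table entry the buggy variant reads STAT_A, while the fixed
variant reads STAT_CACHE as intended.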
Link: http://lkml.kernel.org/r/1562116978-19539-1-git-send-email-laoar.shao@gmail…
Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty").
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Yafang Shao <shaoyafang(a)didiglobal.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-wrong-statistics-in-memorystat
+++ a/mm/memcontrol.c
@@ -3530,12 +3530,13 @@ static int memcg_stat_show(struct seq_fi
if (memcg1_stats[i] == MEMCG_SWAP && !do_memsw_account())
continue;
seq_printf(m, "total_%s %llu\n", memcg1_stat_names[i],
- (u64)memcg_page_state(memcg, i) * PAGE_SIZE);
+ (u64)memcg_page_state(memcg, memcg1_stats[i]) *
+ PAGE_SIZE);
}
for (i = 0; i < ARRAY_SIZE(memcg1_events); i++)
seq_printf(m, "total_%s %llu\n", memcg1_event_names[i],
- (u64)memcg_events(memcg, i));
+ (u64)memcg_events(memcg, memcg1_events[i]));
for (i = 0; i < NR_LRU_LISTS; i++)
seq_printf(m, "total_%s %llu\n", mem_cgroup_lru_names[i],
_
Patches currently in -mm which might be from laoar.shao(a)gmail.com are
mm-memcontrol-fix-wrong-statistics-in-memorystat.patch
mm-vmscan-expose-cgroup_ino-for-memcg-reclaim-tracepoints.patch
mm-vmscan-add-a-new-member-reclaim_state-in-struct-shrink_control.patch
mm-vmscan-add-a-new-member-reclaim_state-in-struct-shrink_control-fix.patch
mm-vmscan-calculate-reclaimed-slab-caches-in-all-reclaim-paths.patch
The patch titled
Subject: mm/z3fold.c: lock z3fold page before __SetPageMovable()
has been removed from the -mm tree. Its filename was
mm-z3foldc-lock-z3fold-page-before-__setpagemovable.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Henry Burns <henryburns(a)google.com>
Subject: mm/z3fold.c: lock z3fold page before __SetPageMovable()
__SetPageMovable() expects its page to be locked, but z3fold.c doesn't
lock the page. This triggers the VM_BUG_ON_PAGE(!PageLocked(page), page)
in __SetPageMovable().
Following zsmalloc.c's example we call trylock_page() and unlock_page().
Also make z3fold_page_migrate() assert that newpage is passed in locked,
as per the documentation.
Link: http://lkml.kernel.org/r/20190702005122.41036-1-henryburns@google.com
Signed-off-by: Henry Burns <henryburns(a)google.com>
Suggested-by: Vitaly Wool <vitalywool(a)gmail.com>
Acked-by: Vitaly Wool <vitalywool(a)gmail.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Vitaly Vul <vitaly.vul(a)sony.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Xidong Wang <wangxidong_97(a)163.com>
Cc: Jonathan Adams <jwadams(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/z3fold.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/mm/z3fold.c~mm-z3foldc-lock-z3fold-page-before-__setpagemovable
+++ a/mm/z3fold.c
@@ -919,7 +919,10 @@ retry:
set_bit(PAGE_HEADLESS, &page->private);
goto headless;
}
- __SetPageMovable(page, pool->inode->i_mapping);
+ if (!WARN_ON(!trylock_page(page))) {
+ __SetPageMovable(page, pool->inode->i_mapping);
+ unlock_page(page);
+ }
z3fold_page_lock(zhdr);
found:
@@ -1326,6 +1329,7 @@ static int z3fold_page_migrate(struct ad
VM_BUG_ON_PAGE(!PageMovable(page), page);
VM_BUG_ON_PAGE(!PageIsolated(page), page);
+ VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
zhdr = page_address(page);
pool = zhdr_to_pool(zhdr);
_
Patches currently in -mm which might be from henryburns(a)google.com are
mm-z3fold-fix-z3fold_buddy_slots-use-after-free.patch
Going through Olof's build report for 4.19.y:
On Thu, Jul 4, 2019 at 12:14 PM Olof's autobuilder <build(a)lixom.net> wrote:
>
> arm.rpc_defconfig:
> arm-unknown-linux-gnueabi-gcc: error: unrecognized -march target: armv3m
> arm-unknown-linux-gnueabi-gcc: error: missing argument to '-march='
> arm-unknown-linux-gnueabi-gcc: error: unrecognized -march target: armv3m
> arm-unknown-linux-gnueabi-gcc: error: missing argument to '-march='
No mainline patch yet, this happens with gcc-9, which cannot build an
rpc kernel any more as armv3 support got dropped:
> arch/arm/mm/init.c:471:13: warning: unused variable 'itcm_end' [-Wunused-variable]
> arch/arm/mm/init.c:470:13: warning: unused variable 'dtcm_end' [-Wunused-variable]
Please backport this to 5.1-stable:
e6c4375f7c92 ("ARM: 8865/1: mm: remove unused variables")
> /tmp/ccUhzzYK.s:18119: Warning: using r15 results in unpredictable behaviour
> /tmp/ccUhzzYK.s:18191: Warning: using r15 results in unpredictable behaviour
I have a patch but have not mainlined it yet.
> sound/pci/echoaudio/echoaudio_dsp.c:647:9: warning: iteration 1073741824 invokes undefined behavior [-Waggressive-loop-optimizations]
> sound/pci/echoaudio/echoaudio_dsp.c:658:9: warning: iteration 1073741824 invokes undefined behavior [-Waggressive-loop-optimizations]
> sound/pci/echoaudio/echoaudio_dsp.c:647:9: warning: iteration 1073741824 invokes undefined behavior [-Waggressive-loop-optimizations]
Have not seen this one yet, sorry.
> include/linux/string.h:340:9: warning: '__builtin_memset' offset [321, 344] from the object at 'buf' is out of the bounds of referenced subobject 'rdata' with type 'struct fc_rport_priv' at offset 0 [-Warray-bounds]
> include/linux/string.h:340:9: warning: '__builtin_memset' offset [321, 344] from the object at 'buf' is out of the bounds of referenced subobject 'rdata' with type 'struct fc_rport_priv' at offset 0 [-Warray-bounds]
Looks like a harmless warning from an unusual coding style. The issue is
still present in mainline and should be trivial to address by anyone using
gcc-9.
> include/linux/module.h:132:6: warning: 'init_module' specifies less restrictive attribute than its target 'rp_init': 'cold' [-Wmissing-attributes]
Please backport this to all stable kernels (2.6.39+):
423ea3255424 ("tty: rocket: fix incorrect forward declaration of 'rp_init()'")
> arm64.allmodconfig:
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16563:1: warning: the frame size of 2592 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16905:1: warning: the frame size of 2560 bytes is larger than 2048 bytes [-Wframe-larger-than=]
My patch is waiting for mainline acceptance:
https://patchwork.kernel.org/patch/11022355/
> aarch64-unknown-linux-gnu-ld: warning: creating a DT_TEXTREL in object
> aarch64-unknown-linux-gnu-ld: warning: creating a DT_TEXTREL in object
> aarch64-unknown-linux-gnu-ld: warning: creating a DT_TEXTREL in object
no idea, I don't see this here.
>
> i386.allmodconfig:
> drivers/iio/adc/rcar-gyroadc.c:510:5: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized]
Yep, that's a bug, just sent a fix now:
https://lore.kernel.org/lkml/20190704113800.3299636-1-arnd@arndb.de/
Arnd
The patch titled
Subject: swap_readpage(): avoid blk_wake_io_task() if !synchronous
has been added to the -mm tree. Its filename is
swap_readpage-avoid-blk_wake_io_task-if-synchronous.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/swap_readpage-avoid-blk_wake_io_ta…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/swap_readpage-avoid-blk_wake_io_ta…
------------------------------------------------------
From: Oleg Nesterov <oleg(a)redhat.com>
Subject: swap_readpage(): avoid blk_wake_io_task() if !synchronous
swap_readpage() sets waiter = bio->bi_private even if synchronous is
false; this means that the caller can get a spurious wakeup after return.
This can be fatal if blk_wake_io_task() does
set_current_state(TASK_RUNNING) after the caller does set_special_state();
in the worst case the kernel can crash in do_task_dead().
Link: http://lkml.kernel.org/r/20190704160301.GA5956@redhat.com
Fixes: 0619317ff8baa2d ("block: add polled wakeup task helper")
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Reported-by: Qian Cai <cai(a)lca.pw>
Acked-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_io.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
--- a/mm/page_io.c~swap_readpage-avoid-blk_wake_io_task-if-synchronous
+++ a/mm/page_io.c
@@ -137,8 +137,10 @@ out:
unlock_page(page);
WRITE_ONCE(bio->bi_private, NULL);
bio_put(bio);
- blk_wake_io_task(waiter);
- put_task_struct(waiter);
+ if (waiter) {
+ blk_wake_io_task(waiter);
+ put_task_struct(waiter);
+ }
}
int generic_swapfile_activate(struct swap_info_struct *sis,
@@ -395,11 +397,12 @@ int swap_readpage(struct page *page, boo
* Keep this task valid during swap readpage because the oom killer may
* attempt to access it in the page fault retry time check.
*/
- get_task_struct(current);
- bio->bi_private = current;
bio_set_op_attrs(bio, REQ_OP_READ, 0);
- if (synchronous)
+ if (synchronous) {
bio->bi_opf |= REQ_HIPRI;
+ get_task_struct(current);
+ bio->bi_private = current;
+ }
count_vm_event(PSWPIN);
bio_get(bio);
qc = submit_bio(bio);
_
Patches currently in -mm which might be from oleg(a)redhat.com are
swap_readpage-avoid-blk_wake_io_task-if-synchronous.patch
signal-simplify-set_user_sigmask-restore_user_sigmask.patch
select-change-do_poll-to-return-erestartnohand-rather-than-eintr.patch
select-shift-restore_saved_sigmask_unless-into-poll_select_copy_remaining.patch
aio-simplify-read_events.patch
The patch titled
Subject: libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
has been removed from the -mm tree. Its filename was
libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields.patch
This patch was dropped because other changes were merged, which wrecked this patch
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
At namespace creation time there is the potential for the "expected to be
zero" fields of a 'pfn' info-block to be filled with indeterminate data.
While the kernel buffer is zeroed on allocation it is immediately
overwritten by nd_pfn_validate() filling it with the current contents of
the on-media info-block location. For fields like 'flags' and 'padding'
it potentially means that future implementations cannot rely on those
fields being zero.
In preparation to stop using the 'start_pad' and 'end_trunc' fields for
section alignment, arrange for fields that are not explicitly initialized
to be guaranteed zero. Bump the minor version to indicate it is safe to
assume the 'padding' and 'flags' are zero. Otherwise, this corruption is
expected to be benign since all other critical fields are explicitly
initialized.
Note: the cc: stable is about spreading this new policy to as many kernels
as possible, not fixing an issue in those kernels. It is not until the
change titled "libnvdimm/pfn: Stop padding pmem namespaces to section
alignment" where this improper initialization becomes a problem. So if
someone decides to backport "libnvdimm/pfn: Stop padding pmem namespaces
to section alignment" (which is not tagged for stable), make sure this
pre-requisite is flagged.
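The ordering problem can be sketched in user space as follows. The struct
layout, the "PFN" signature string and the function names are hypothetical
stand-ins, not the nvdimm code: validation copies whatever is on media
into the buffer, so the buffer has to be cleared again before a fresh
info-block is initialized.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, much reduced info-block layout. */
struct info_block_model {
	unsigned char sig[4];
	unsigned int flags;
	unsigned char padding[8];
};

/*
 * Models the validate step: the buffer is overwritten with the media
 * contents, and -1 is returned when no valid signature is found.
 */
static int validate_model(struct info_block_model *sb,
			  const struct info_block_model *media)
{
	memcpy(sb, media, sizeof(*sb));
	return memcmp(sb->sig, "PFN", 4) == 0 ? 0 : -1;
}

/* Models the init step: returns 0 if a block existed, 1 if one was created. */
static int init_model(struct info_block_model *sb,
		      const struct info_block_model *media)
{
	if (validate_model(sb, media) == 0)
		return 0;

	/* No info block: discard the stale media bytes before initializing. */
	memset(sb, 0, sizeof(*sb));
	memcpy(sb->sig, "PFN", 4);
	return 1;
}
```

Without the memset() in init_model(), 'flags' and 'padding' would keep
whatever bytes happened to be on media.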
Link: http://lkml.kernel.org/r/156092356065.979959.6681003754765958296.stgit@dwil…
Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com> [ppc64]
Cc: <stable(a)vger.kernel.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Jérôme Glisse <jglisse(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Wei Yang <richardw.yang(a)linux.intel.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/nvdimm/dax_devs.c | 2 +-
drivers/nvdimm/pfn.h | 1 +
drivers/nvdimm/pfn_devs.c | 18 +++++++++++++++---
3 files changed, 17 insertions(+), 4 deletions(-)
--- a/drivers/nvdimm/dax_devs.c~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/dax_devs.c
@@ -118,7 +118,7 @@ int nd_dax_probe(struct device *dev, str
nvdimm_bus_unlock(&ndns->dev);
if (!dax_dev)
return -ENOMEM;
- pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, DAX_SIG);
dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : "<none>");
--- a/drivers/nvdimm/pfn_devs.c~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/pfn_devs.c
@@ -412,6 +412,15 @@ static int nd_pfn_clear_memmap_errors(st
return 0;
}
+/**
+ * nd_pfn_validate - read and validate info-block
+ * @nd_pfn: fsdax namespace runtime state / properties
+ * @sig: 'devdax' or 'fsdax' signature
+ *
+ * Upon return the info-block buffer contents (->pfn_sb) are
+ * indeterminate when validation fails, and a coherent info-block
+ * otherwise.
+ */
int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
{
u64 checksum, offset;
@@ -557,7 +566,7 @@ int nd_pfn_probe(struct device *dev, str
nvdimm_bus_unlock(&ndns->dev);
if (!pfn_dev)
return -ENOMEM;
- pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn = to_nd_pfn(pfn_dev);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, PFN_SIG);
@@ -694,7 +703,7 @@ static int nd_pfn_init(struct nd_pfn *nd
u64 checksum;
int rc;
- pfn_sb = devm_kzalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
if (!pfn_sb)
return -ENOMEM;
@@ -703,11 +712,14 @@ static int nd_pfn_init(struct nd_pfn *nd
sig = DAX_SIG;
else
sig = PFN_SIG;
+
rc = nd_pfn_validate(nd_pfn, sig);
if (rc != -ENODEV)
return rc;
/* no info block, do init */;
+ memset(pfn_sb, 0, sizeof(*pfn_sb));
+
nd_region = to_nd_region(nd_pfn->dev.parent);
if (nd_region->ro) {
dev_info(&nd_pfn->dev,
@@ -760,7 +772,7 @@ static int nd_pfn_init(struct nd_pfn *nd
memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
pfn_sb->version_major = cpu_to_le16(1);
- pfn_sb->version_minor = cpu_to_le16(2);
+ pfn_sb->version_minor = cpu_to_le16(3);
pfn_sb->start_pad = cpu_to_le32(start_pad);
pfn_sb->end_trunc = cpu_to_le32(end_trunc);
pfn_sb->align = cpu_to_le32(nd_pfn->align);
--- a/drivers/nvdimm/pfn.h~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/pfn.h
@@ -28,6 +28,7 @@ struct nd_pfn_sb {
__le32 end_trunc;
/* minor-version-2 record the base alignment of the mapping */
__le32 align;
+ /* minor-version-3 guarantee the padding and flags are zero */
u8 padding[4000];
__le64 checksum;
};
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
Ever since the conversion of DAX to the Xarray a RocksDB benchmark has
been encountering intermittent lockups. The backtraces always include
the filesystem-DAX PMD path, multi-order entries have been a source of
bugs in the past, and disabling the PMD path allows a test that fails in
minutes to run for an hour.
The regression has been bisected to "dax: Convert page fault handlers to
XArray", but little progress has been made on the root cause debug.
Unless / until the root cause can be identified, mark CONFIG_FS_DAX_PMD
broken to preclude intermittent lockups. Reverting the Xarray conversion
also works, but that change is too big to backport. The implementation
is committed to Xarray at this point.
Link: https://lore.kernel.org/linux-fsdevel/CAPcyv4hwHpX-MkUEqxwdTj7wCCZCN4RV-L4j…
Fixes: b15cd800682f ("dax: Convert page fault handlers to XArray")
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Reported-by: Robert Barror <robert.barror(a)intel.com>
Reported-by: Seema Pandit <seema.pandit(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
fs/Kconfig | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/Kconfig b/fs/Kconfig
index f1046cf6ad85..85eecd0d4c5d 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -66,6 +66,9 @@ config FS_DAX_PMD
depends on FS_DAX
depends on ZONE_DEVICE
depends on TRANSPARENT_HUGEPAGE
+ # intermittent lockups since commit b15cd800682f "dax: Convert
+ # page fault handlers to XArray"
+ depends on BROKEN
# Selected by DAX drivers that do not expect filesystem DAX to support
# get_user_pages() of DAX mappings. I.e. "limited" indicates no support
Hi,
we encountered an issue that blocked us from sending out proper reports for
some pipelines so I want to make sure the results get sent out.
In that time, we tested:
stable linux-5.1.y:8584aaf1c326
stable linux-5.1.y:57f5b343cdf9
queue-4.19 c775271c438ccaad33f025bb5027c573bd7d8c35
queue-4.19 d13157b55a88eed3505bb42a249dd721d2837cff
queue-4.19 715a0203f375147f679bb92e052676380efadcff
queue-5.1 715a0203f375147f679bb92e052676380efadcff
queue-5.1 d13157b55a88eed3505bb42a249dd721d2837cff
All of the testing passed. The regular set of tests was executed with all
of these.
Sorry for the inconvenience and lack of the usual reports,
Veronika