On Tue, May 1, 2018 at 4:52 PM <gregkh(a)linuxfoundation.org> wrote:
> This is a note to let you know that I've just added the patch titled
> mm, mlock, vmscan: no more skipping pagevecs
> to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
> The filename of the patch is:
> mm-mlock-vmscan-no-more-skipping-pagevecs.patch
> and it can be found in the queue-4.14 subdirectory.
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
I would advise not to include this patch in the older stable trees. This
patch does not fix any critical bug, and for some very specific
microbenchmarks on very large machines it has shown a performance
regression. So, I don't think it is worth the risk.
> From foo@baz Tue May 1 16:18:19 PDT 2018
> From: Shakeel Butt <shakeelb(a)google.com>
> Date: Wed, 21 Feb 2018 14:45:28 -0800
> Subject: mm, mlock, vmscan: no more skipping pagevecs
> From: Shakeel Butt <shakeelb(a)google.com>
> [ Upstream commit 9c4e6b1a7027f102990c0395296015a812525f4d ]
> When a thread mlocks an address space backed either by file pages which
> are currently not present in memory or swapped out anon pages (not in
> swapcache), a new page is allocated and added to the local pagevec
> (lru_add_pvec), I/O is triggered and the thread then sleeps on the page.
> On I/O completion, the thread can wake on a different CPU, the mlock
> syscall will then set the PageMlocked() bit of the page but will not be
> able to put that page in unevictable LRU as the page is on the pagevec
> of a different CPU. Even on drain, that page will go to evictable LRU
> because the PageMlocked() bit is not checked on pagevec drain.
> The page will eventually go to the right LRU on reclaim, but the LRU stats
> will remain skewed for a long time.
> This patch puts all the pages, even unevictable ones, on the pagevecs; on
> drain, the pages are added to their correct LRUs by checking their
> evictability. This resolves the issue of mlocked pages sitting on another
> CPU's pagevec, because when those pagevecs are drained the mlocked file
> pages will go to the unevictable LRU. This also makes the race with munlock
> easier to resolve, because the pagevec drains happen under the LRU lock.
> However, there is still one place which makes a page evictable and does a
> PageLRU check on that page without the LRU lock, and so needs special
> attention: TestClearPageMlocked() and isolate_lru_page() in clear_page_mlock().
> #0: __pagevec_lru_add_fn              #1: clear_page_mlock
>
> SetPageLRU()                          if (!TestClearPageMlocked())
>                                         return
> smp_mb() // <--required
>                                       // inside does PageLRU
> if (!PageMlocked())                   if (isolate_lru_page())
>   move to evictable LRU                 putback_lru_page()
> else
>   move to unevictable LRU
> In '#1', TestClearPageMlocked() provides full memory barrier semantics and
> thus the PageLRU check (inside isolate_lru_page) cannot be reordered before
> it.
> In '#0', without an explicit memory barrier, the PageMlocked() check can be
> reordered before SetPageLRU(). If that happens, '#0' can put a page on the
> unevictable LRU while '#1', having just cleared the Mlocked bit of that page,
> fails to isolate it because the PageLRU check fails ('#0' still hasn't set the
> PageLRU bit of that page). That page will be stranded on the unevictable LRU.
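[Aside, not part of the quoted commit message: the ordering requirement above is the classic store-buffering pattern. Below is a small, self-contained userspace C11 sketch of it, using plain atomics as stand-ins for the PG_lru and PG_mlocked bits rather than the kernel's page-flag API. With seq_cst fences modelling smp_mb() and the full barrier implied by TestClearPageMlocked(), the "stranded" outcome can never be observed; dropping the '#0' fence would permit it on weakly ordered CPUs. Build with: gcc -O2 -pthread sb_model.c]

/* sb_model.c: userspace model of the '#0' / '#1' ordering above.
 * "lru" and "mlocked" are stand-ins for PG_lru and PG_mlocked; this is
 * an illustration only, not kernel code.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int lru, mlocked;
static int r_pagevec, r_munlock;

/* models __pagevec_lru_add_fn() */
static void *pagevec_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&lru, 1, memory_order_relaxed);     /* SetPageLRU()  */
	atomic_thread_fence(memory_order_seq_cst);                 /* smp_mb()      */
	r_pagevec = atomic_load_explicit(&mlocked, memory_order_relaxed); /* PageMlocked() */
	return NULL;
}

/* models clear_page_mlock() */
static void *munlock_side(void *arg)
{
	(void)arg;
	atomic_exchange_explicit(&mlocked, 0, memory_order_relaxed); /* TestClearPageMlocked() */
	atomic_thread_fence(memory_order_seq_cst);                    /* ...implies a full barrier */
	r_munlock = atomic_load_explicit(&lru, memory_order_relaxed); /* PageLRU() in isolate_lru_page() */
	return NULL;
}

int main(void)
{
	int stranded = 0;

	for (int i = 0; i < 100000; i++) {
		pthread_t a, b;

		atomic_store(&lru, 0);
		atomic_store(&mlocked, 1);
		pthread_create(&a, NULL, pagevec_side, NULL);
		pthread_create(&b, NULL, munlock_side, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		/*
		 * Stranded outcome: '#0' still saw the page as mlocked (and
		 * would file it as unevictable) while '#1' did not see PG_lru
		 * (isolation fails, nobody moves it back).  With both fences
		 * in place this never triggers.
		 */
		if (r_pagevec == 1 && r_munlock == 0)
			stranded++;
	}
	printf("stranded outcomes: %d\n", stranded);
	return 0;
}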
> There is one (good) side effect though. Without this patch, the pages
> allocated for System V shared memory segment are added to evictable LRUs
> even after shmctl(SHM_LOCK) on that segment. This patch will correctly
> put such pages to unevictable LRU.
> Link: http://lkml.kernel.org/r/20171121211241.18877-1-shakeelb@google.com
> Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
> Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
> Cc: Jérôme Glisse <jglisse(a)redhat.com>
> Cc: Huang Ying <ying.huang(a)intel.com>
> Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
> Cc: Michal Hocko <mhocko(a)kernel.org>
> Cc: Greg Thelen <gthelen(a)google.com>
> Cc: Johannes Weiner <hannes(a)cmpxchg.org>
> Cc: Balbir Singh <bsingharora(a)gmail.com>
> Cc: Minchan Kim <minchan(a)kernel.org>
> Cc: Shaohua Li <shli(a)fb.com>
> Cc: Jan Kara <jack(a)suse.cz>
> Cc: Nicholas Piggin <npiggin(a)gmail.com>
> Cc: Dan Williams <dan.j.williams(a)intel.com>
> Cc: Mel Gorman <mgorman(a)suse.de>
> Cc: Hugh Dickins <hughd(a)google.com>
> Cc: Vlastimil Babka <vbabka(a)suse.cz>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
> Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> include/linux/swap.h | 2 -
> mm/mlock.c | 6 +++
> mm/swap.c | 82 +++++++++++++++++++++++++++++----------------------
> mm/vmscan.c | 59 ------------------------------------
> 4 files changed, 54 insertions(+), 95 deletions(-)
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -326,8 +326,6 @@ extern void deactivate_file_page(struct
> extern void mark_page_lazyfree(struct page *page);
> extern void swap_setup(void);
> -extern void add_page_to_unevictable_list(struct page *page);
> -
> extern void lru_cache_add_active_or_unevictable(struct page *page,
> struct vm_area_struct *vma);
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -64,6 +64,12 @@ void clear_page_mlock(struct page *page)
> mod_zone_page_state(page_zone(page), NR_MLOCK,
> -hpage_nr_pages(page));
> count_vm_event(UNEVICTABLE_PGCLEARED);
> + /*
> + * The previous TestClearPageMlocked() corresponds to the smp_mb()
> + * in __pagevec_lru_add_fn().
> + *
> + * See __pagevec_lru_add_fn for more explanation.
> + */
> if (!isolate_lru_page(page)) {
> putback_lru_page(page);
> } else {
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -446,30 +446,6 @@ void lru_cache_add(struct page *page)
> }
> /**
> - * add_page_to_unevictable_list - add a page to the unevictable list
> - * @page: the page to be added to the unevictable list
> - *
> - * Add page directly to its zone's unevictable list. To avoid races with
> - * tasks that might be making the page evictable, through eg. munlock,
> - * munmap or exit, while it's not on the lru, we want to add the page
> - * while it's locked or otherwise "invisible" to other tasks. This is
> - * difficult to do when using the pagevec cache, so bypass that.
> - */
> -void add_page_to_unevictable_list(struct page *page)
> -{
> - struct pglist_data *pgdat = page_pgdat(page);
> - struct lruvec *lruvec;
> -
> - spin_lock_irq(&pgdat->lru_lock);
> - lruvec = mem_cgroup_page_lruvec(page, pgdat);
> - ClearPageActive(page);
> - SetPageUnevictable(page);
> - SetPageLRU(page);
> - add_page_to_lru_list(page, lruvec, LRU_UNEVICTABLE);
> - spin_unlock_irq(&pgdat->lru_lock);
> -}
> -
> -/**
> * lru_cache_add_active_or_unevictable
> * @page: the page to be added to LRU
> * @vma: vma in which page is mapped for determining reclaimability
> @@ -484,13 +460,9 @@ void lru_cache_add_active_or_unevictable
> {
> VM_BUG_ON_PAGE(PageLRU(page), page);
> - if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) {
> + if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED))
> SetPageActive(page);
> - lru_cache_add(page);
> - return;
> - }
> -
> - if (!TestSetPageMlocked(page)) {
> + else if (!TestSetPageMlocked(page)) {
> /*
> * We use the irq-unsafe __mod_zone_page_stat because this
> * counter is not modified from interrupt context, and the pte
> @@ -500,7 +472,7 @@ void lru_cache_add_active_or_unevictable
> hpage_nr_pages(page));
> count_vm_event(UNEVICTABLE_PGMLOCKED);
> }
> - add_page_to_unevictable_list(page);
> + lru_cache_add(page);
> }
> /*
> @@ -883,15 +855,55 @@ void lru_add_page_tail(struct page *page
> static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
> void *arg)
> {
> - int file = page_is_file_cache(page);
> - int active = PageActive(page);
> - enum lru_list lru = page_lru(page);
> + enum lru_list lru;
> + int was_unevictable = TestClearPageUnevictable(page);
> VM_BUG_ON_PAGE(PageLRU(page), page);
> SetPageLRU(page);
> + /*
> + * Page becomes evictable in two ways:
> + * 1) Within LRU lock [munlock_vma_pages() and __munlock_pagevec()].
> + * 2) Before acquiring LRU lock to put the page to correct LRU and then
> + *    a) do PageLRU check with lock [check_move_unevictable_pages]
> + *    b) do PageLRU check before lock [clear_page_mlock]
> + *
> + * (1) & (2a) are ok as LRU lock will serialize them. For (2b), we need
> + * following strict ordering:
> + *
> + * #0: __pagevec_lru_add_fn            #1: clear_page_mlock
> + *
> + * SetPageLRU()                        TestClearPageMlocked()
> + * smp_mb() // explicit ordering       // above provides strict
> + *                                     // ordering
> + * PageMlocked()                       PageLRU()
> + *
> + *
> + * if '#1' does not observe setting of PG_lru by '#0' and fails
> + * isolation, the explicit barrier will make sure that page_evictable
> + * check will put the page in correct LRU. Without smp_mb(), SetPageLRU
> + * can be reordered after PageMlocked check and can make '#1' to fail
> + * the isolation of the page whose Mlocked bit is cleared (#0 is also
> + * looking at the same page) and the evictable page will be stranded
> + * in an unevictable LRU.
> + */
> + smp_mb();
> +
> + if (page_evictable(page)) {
> + lru = page_lru(page);
> + update_page_reclaim_stat(lruvec, page_is_file_cache(page),
> + PageActive(page));
> + if (was_unevictable)
> + count_vm_event(UNEVICTABLE_PGRESCUED);
> + } else {
> + lru = LRU_UNEVICTABLE;
> + ClearPageActive(page);
> + SetPageUnevictable(page);
> + if (!was_unevictable)
> + count_vm_event(UNEVICTABLE_PGCULLED);
> + }
> +
> add_page_to_lru_list(page, lruvec, lru);
> - update_page_reclaim_stat(lruvec, file, active);
> trace_mm_lru_insertion(page, lru);
> }
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -790,64 +790,7 @@ int remove_mapping(struct address_space
> */
> void putback_lru_page(struct page *page)
> {
> - bool is_unevictable;
> - int was_unevictable = PageUnevictable(page);
> -
> - VM_BUG_ON_PAGE(PageLRU(page), page);
> -
> -redo:
> - ClearPageUnevictable(page);
> -
> - if (page_evictable(page)) {
> - /*
> - * For evictable pages, we can use the cache.
> - * In event of a race, worst case is we end up with an
> - * unevictable page on [in]active list.
> - * We know how to handle that.
> - */
> - is_unevictable = false;
> - lru_cache_add(page);
> - } else {
> - /*
> - * Put unevictable pages directly on zone's unevictable
> - * list.
> - */
> - is_unevictable = true;
> - add_page_to_unevictable_list(page);
> - /*
> - * When racing with an mlock or AS_UNEVICTABLE clearing
> - * (page is unlocked) make sure that if the other thread
> - * does not observe our setting of PG_lru and fails
> - * isolation/check_move_unevictable_pages,
> - * we see PG_mlocked/AS_UNEVICTABLE cleared below and move
> - * the page back to the evictable list.
> - *
> - * The other side is TestClearPageMlocked() or shmem_lock().
> - */
> - smp_mb();
> - }
> -
> - /*
> - * page's status can change while we move it among lru. If an evictable
> - * page is on unevictable list, it never be freed. To avoid that,
> - * check after we added it to the list, again.
> - */
> - if (is_unevictable && page_evictable(page)) {
> - if (!isolate_lru_page(page)) {
> - put_page(page);
> - goto redo;
> - }
> - /* This means someone else dropped this page from LRU
> - * So, it will be freed or putback to LRU again. There is
> - * nothing to do here.
> - */
> - }
> -
> - if (was_unevictable && !is_unevictable)
> - count_vm_event(UNEVICTABLE_PGRESCUED);
> - else if (!was_unevictable && is_unevictable)
> - count_vm_event(UNEVICTABLE_PGCULLED);
> -
> + lru_cache_add(page);
> put_page(page); /* drop ref from isolate */
> }
Patches currently in stable-queue which might be from shakeelb(a)google.com are
> queue-4.14/mm-slab-memcg_link-the-slab-s-kmem_cache.patch
> queue-4.14/mm-mlock-vmscan-no-more-skipping-pagevecs.patch
The latest binutils release (2.29.1) no longer allows proper computation of
GDT entries on 32-bit, with the warning:
arch/x86/xen/xen-pvh.S: Assembler messages:
arch/x86/xen/xen-pvh.S:150: Warning: shift count out of range (32 is not between 0 and 31)
arch/x86/xen/xen-pvh.S:150: Warning: shift count out of range (40 is not between 0 and 31)
arch/x86/xen/xen-pvh.S:150: Warning: shift count out of range (32 is not between 0 and 31)
arch/x86/xen/xen-pvh.S:152: Warning: shift count out of range (32 is not between 0 and 31)
arch/x86/xen/xen-pvh.S:152: Warning: shift count out of range (40 is not between 0 and 31)
arch/x86/xen/xen-pvh.S:152: Warning: shift count out of range (32 is not between 0 and 31)
Use explicit value of the entry instead of using GDT_ENTRY() macro.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/xen/xen-pvh.S | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
index e1a5fbe..934f7d4 100644
--- a/arch/x86/xen/xen-pvh.S
+++ b/arch/x86/xen/xen-pvh.S
@@ -145,11 +145,11 @@ gdt_start:
.quad 0x0000000000000000 /* NULL descriptor */
.quad 0x0000000000000000 /* reserved */
#ifdef CONFIG_X86_64
- .quad GDT_ENTRY(0xa09a, 0, 0xfffff) /* __KERNEL_CS */
+ .quad 0x00af9a000000ffff /* __BOOT_CS */
#else
- .quad GDT_ENTRY(0xc09a, 0, 0xfffff) /* __KERNEL_CS */
+ .quad 0x00cf9a000000ffff /* __BOOT_CS */
#endif
- .quad GDT_ENTRY(0xc092, 0, 0xfffff) /* __KERNEL_DS */
+ .quad 0x00cf92000000ffff /* __BOOT_DS */
gdt_end:
.balign 4
--
2.9.3
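As a quick cross-check of the substitution above (commentary only, not part of the submitted patch): the open-coded descriptor values match what GDT_ENTRY() evaluates to when computed in C rather than by the assembler. The macro below mirrors the definition in arch/x86/include/asm/segment.h as I understand it, so treat it as an assumption to verify against the tree:

/* gdt_check.c: print the descriptors GDT_ENTRY() would have produced. */
#include <stdio.h>
#include <stdint.h>

#define GDT_ENTRY(flags, base, limit)				\
	((((uint64_t)(base)  & 0xff000000ULL) << (56 - 24)) |	\
	 (((uint64_t)(flags) & 0x0000f0ffULL) << 40)	     |	\
	 (((uint64_t)(limit) & 0x000f0000ULL) << (48 - 16)) |	\
	 (((uint64_t)(base)  & 0x00ffffffULL) << 16)	     |	\
	 ((uint64_t)(limit)  & 0x0000ffffULL))

int main(void)
{
	/* Expected: 0x00af9a000000ffff, 0x00cf9a000000ffff, 0x00cf92000000ffff */
	printf("%#018llx\n", (unsigned long long)GDT_ENTRY(0xa09a, 0, 0xfffff));
	printf("%#018llx\n", (unsigned long long)GDT_ENTRY(0xc09a, 0, 0xfffff));
	printf("%#018llx\n", (unsigned long long)GDT_ENTRY(0xc092, 0, 0xfffff));
	return 0;
}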
The patches in the original Xen Security Advisory 155 cared only about backend
drivers, leaving frontend patches to be "developed and released (publicly) after
the embargo date". This is that series.
Marek Marczykowski-Górecki (6):
xen: Add RING_COPY_RESPONSE()
xen-netfront: copy response out of shared buffer before accessing it
xen-netfront: do not use data already exposed to backend
xen-netfront: add range check for Tx response id
xen-blkfront: make local copy of response before using it
xen-blkfront: prepare request locally, only then put it on the shared ring
drivers/block/xen-blkfront.c | 110 ++++++++++++++++++---------------
drivers/net/xen-netfront.c | 61 +++++++++---------
include/xen/interface/io/ring.h | 14 ++++-
3 files changed, 106 insertions(+), 79 deletions(-)
base-commit: 6d08b06e67cd117f6992c46611dfb4ce267cd71e
--
git-series 0.9.1
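For readers who only see the cover letter: the common thread of the netfront/blkfront patches above is to snapshot anything the backend can rewrite before validating or using it, so it is fetched exactly once. A toy, self-contained C sketch of that discipline (illustrative names only, not the Xen ring API):

#include <stdio.h>
#include <stdint.h>

struct resp {
	uint16_t id;
	int16_t  status;
};

/* Pretend this array lives in a page that is also mapped by an
 * untrusted backend, which may rewrite entries at any time. */
static struct resp shared_ring[32];

static int handle_response(unsigned int idx, unsigned int nr_requests)
{
	/* RING_COPY_RESPONSE()-style: take a private snapshot first. */
	struct resp local = shared_ring[idx & 31];

	/* Range-check the private copy, not the shared slot. */
	if (local.id >= nr_requests)
		return -1;

	/* From here on only 'local' is used; a concurrent rewrite of the
	 * shared slot can no longer change the id we just validated. */
	return local.status;
}

int main(void)
{
	shared_ring[0].id = 3;
	shared_ring[0].status = 0;
	printf("status=%d\n", handle_response(0, 16));
	return 0;
}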
The patch titled
Subject: mm, oom: fix concurrent munlock and oom reaper unmap, v3
has been added to the -mm tree. Its filename is
mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-oom-fix-concurrent-munlock-and-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-oom-fix-concurrent-munlock-and-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: David Rientjes <rientjes(a)google.com>
Subject: mm, oom: fix concurrent munlock and oom reaper unmap, v3
Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.
This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma. Since munlock_vma_pages_range() depends on
clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.
This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires serialize_against_pte_lookup(). If the pmd
is zapped by the oom reaper during follow_page_mask() after the check for
pmd_none() is bypassed, this ends up dereferencing a NULL ptl or triggering a
kernel oops.
Fix this by manually freeing all possible memory from the mm before doing
the munlock and then setting MMF_OOM_SKIP. The oom reaper can not run on
the mm anymore so the munlock is safe to do in exit_mmap(). It also
matches the logic that the oom reaper currently uses for determining when
to set MMF_OOM_SKIP itself, so there's no new risk of excessive oom
killing.
This fixes CVE-2018-1000200.
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.cor…
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes(a)google.com>
Suggested-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/oom.h | 2 +
mm/mmap.c | 44 +++++++++++++---------
mm/oom_kill.c | 81 ++++++++++++++++++++++--------------------
3 files changed, 71 insertions(+), 56 deletions(-)
diff -puN include/linux/oom.h~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap include/linux/oom.h
--- a/include/linux/oom.h~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/include/linux/oom.h
@@ -95,6 +95,8 @@ static inline int check_stable_address_s
return 0;
}
+void __oom_reap_task_mm(struct mm_struct *mm);
+
extern unsigned long oom_badness(struct task_struct *p,
struct mem_cgroup *memcg, const nodemask_t *nodemask,
unsigned long totalpages);
diff -puN mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/mmap.c
--- a/mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/mm/mmap.c
@@ -3024,6 +3024,32 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
+ if (unlikely(mm_is_oom_victim(mm))) {
+ /*
+ * Manually reap the mm to free as much memory as possible.
+ * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
+ * this mm from further consideration. Taking mm->mmap_sem for
+ * write after setting MMF_OOM_SKIP will guarantee that the oom
+ * reaper will not run on this mm again after mmap_sem is
+ * dropped.
+ *
+ * Nothing can be holding mm->mmap_sem here and the above call
+ * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
+ * __oom_reap_task_mm() will not block.
+ *
+ * This needs to be done before calling munlock_vma_pages_all(),
+ * which clears VM_LOCKED, otherwise the oom reaper cannot
+ * reliably test it.
+ */
+ mutex_lock(&oom_lock);
+ __oom_reap_task_mm(mm);
+ mutex_unlock(&oom_lock);
+
+ set_bit(MMF_OOM_SKIP, &mm->flags);
+ down_write(&mm->mmap_sem);
+ up_write(&mm->mmap_sem);
+ }
+
if (mm->locked_vm) {
vma = mm->mmap;
while (vma) {
@@ -3045,24 +3071,6 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
-
- if (unlikely(mm_is_oom_victim(mm))) {
- /*
- * Wait for oom_reap_task() to stop working on this
- * mm. Because MMF_OOM_SKIP is already set before
- * calling down_read(), oom_reap_task() will not run
- * on this "mm" post up_write().
- *
- * mm_is_oom_victim() cannot be set from under us
- * either because victim->mm is already set to NULL
- * under task_lock before calling mmput and oom_mm is
- * set not NULL by the OOM killer only if victim->mm
- * is found not NULL while holding the task_lock.
- */
- set_bit(MMF_OOM_SKIP, &mm->flags);
- down_write(&mm->mmap_sem);
- up_write(&mm->mmap_sem);
- }
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb, 0, -1);
diff -puN mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/oom_kill.c
--- a/mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/mm/oom_kill.c
@@ -469,7 +469,6 @@ bool process_shares_mm(struct task_struc
return false;
}
-
#ifdef CONFIG_MMU
/*
* OOM Reaper kernel thread which tries to reap the memory used by the OOM
@@ -480,16 +479,54 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reape
static struct task_struct *oom_reaper_list;
static DEFINE_SPINLOCK(oom_reaper_lock);
-static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+void __oom_reap_task_mm(struct mm_struct *mm)
{
- struct mmu_gather tlb;
struct vm_area_struct *vma;
+
+ /*
+ * Tell all users of get_user/copy_from_user etc... that the content
+ * is no longer stable. No barriers really needed because unmapping
+ * should imply barriers already and the reader would hit a page fault
+ * if it stumbled over a reaped memory.
+ */
+ set_bit(MMF_UNSTABLE, &mm->flags);
+
+ for (vma = mm->mmap ; vma; vma = vma->vm_next) {
+ if (!can_madv_dontneed_vma(vma))
+ continue;
+
+ /*
+ * Only anonymous pages have a good chance to be dropped
+ * without additional steps which we cannot afford as we
+ * are OOM already.
+ *
+ * We do not even care about fs backed pages because all
+ * which are reclaimable have already been reclaimed and
+ * we do not want to block exit_mmap by keeping mm ref
+ * count elevated without a good reason.
+ */
+ if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
+ const unsigned long start = vma->vm_start;
+ const unsigned long end = vma->vm_end;
+ struct mmu_gather tlb;
+
+ tlb_gather_mmu(&tlb, mm, start, end);
+ mmu_notifier_invalidate_range_start(mm, start, end);
+ unmap_page_range(&tlb, vma, start, end, NULL);
+ mmu_notifier_invalidate_range_end(mm, start, end);
+ tlb_finish_mmu(&tlb, start, end);
+ }
+ }
+}
+
+static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+{
bool ret = true;
/*
* We have to make sure to not race with the victim exit path
* and cause premature new oom victim selection:
- * __oom_reap_task_mm exit_mm
+ * oom_reap_task_mm exit_mm
* mmget_not_zero
* mmput
* atomic_dec_and_test
@@ -534,39 +571,8 @@ static bool __oom_reap_task_mm(struct ta
trace_start_task_reaping(tsk->pid);
- /*
- * Tell all users of get_user/copy_from_user etc... that the content
- * is no longer stable. No barriers really needed because unmapping
- * should imply barriers already and the reader would hit a page fault
- * if it stumbled over a reaped memory.
- */
- set_bit(MMF_UNSTABLE, &mm->flags);
-
- for (vma = mm->mmap ; vma; vma = vma->vm_next) {
- if (!can_madv_dontneed_vma(vma))
- continue;
+ __oom_reap_task_mm(mm);
- /*
- * Only anonymous pages have a good chance to be dropped
- * without additional steps which we cannot afford as we
- * are OOM already.
- *
- * We do not even care about fs backed pages because all
- * which are reclaimable have already been reclaimed and
- * we do not want to block exit_mmap by keeping mm ref
- * count elevated without a good reason.
- */
- if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
- const unsigned long start = vma->vm_start;
- const unsigned long end = vma->vm_end;
-
- tlb_gather_mmu(&tlb, mm, start, end);
- mmu_notifier_invalidate_range_start(mm, start, end);
- unmap_page_range(&tlb, vma, start, end, NULL);
- mmu_notifier_invalidate_range_end(mm, start, end);
- tlb_finish_mmu(&tlb, start, end);
- }
- }
pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
task_pid_nr(tsk), tsk->comm,
K(get_mm_counter(mm, MM_ANONPAGES)),
@@ -587,14 +593,13 @@ static void oom_reap_task(struct task_st
struct mm_struct *mm = tsk->signal->oom_mm;
/* Retry the down_read_trylock(mmap_sem) a few times */
- while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
+ while (attempts++ < MAX_OOM_REAP_RETRIES && !oom_reap_task_mm(tsk, mm))
schedule_timeout_idle(HZ/10);
if (attempts <= MAX_OOM_REAP_RETRIES ||
test_bit(MMF_OOM_SKIP, &mm->flags))
goto done;
-
pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
task_pid_nr(tsk), tsk->comm);
debug_show_all_locks();
_
Patches currently in -mm which might be from rientjes(a)google.com are
mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap.patch
The patch titled
Subject: mm, oom: fix concurrent munlock and oom reaper unmap
has been removed from the -mm tree. Its filename was
mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: David Rientjes <rientjes(a)google.com>
Subject: mm, oom: fix concurrent munlock and oom reaper unmap
Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.
This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma. Since munlock_vma_pages_range() depends on
clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.
This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires serialize_against_pte_lookup(). If the pmd
is zapped by the oom reaper during follow_page_mask() after the check for
pmd_none() is bypassed, this ends up dereferencing a NULL ptl.
Fix this by reusing MMF_UNSTABLE to specify that an mm should not be
reaped. This prevents the concurrent munlock_vma_pages_range() and
unmap_page_range(). The oom reaper will simply not operate on an mm that
has the bit set and leave the unmapping to exit_mmap().
[rientjes(a)google.com: v2]
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804171951440.105401@chino.kir.cor…
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804171545460.53786@chino.kir.corp…
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 38 ++++++++++++++++++++------------------
mm/oom_kill.c | 28 +++++++++++++---------------
2 files changed, 33 insertions(+), 33 deletions(-)
diff -puN mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/mmap.c
--- a/mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/mm/mmap.c
@@ -3024,6 +3024,25 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
+ if (unlikely(mm_is_oom_victim(mm))) {
+ /*
+ * Wait for oom_reap_task() to stop working on this mm. Because
+ * MMF_UNSTABLE is already set before calling down_read(),
+ * oom_reap_task() will not run on this mm after up_write().
+ * oom_reap_task() also depends on a stable VM_LOCKED flag to
+ * indicate it should not unmap during munlock_vma_pages_all().
+ *
+ * mm_is_oom_victim() cannot be set from under us because
+ * victim->mm is already set to NULL under task_lock before
+ * calling mmput() and victim->signal->oom_mm is set by the oom
+ * killer only if victim->mm is non-NULL while holding
+ * task_lock().
+ */
+ set_bit(MMF_UNSTABLE, &mm->flags);
+ down_write(&mm->mmap_sem);
+ up_write(&mm->mmap_sem);
+ }
+
if (mm->locked_vm) {
vma = mm->mmap;
while (vma) {
@@ -3045,26 +3064,9 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
-
- if (unlikely(mm_is_oom_victim(mm))) {
- /*
- * Wait for oom_reap_task() to stop working on this
- * mm. Because MMF_OOM_SKIP is already set before
- * calling down_read(), oom_reap_task() will not run
- * on this "mm" post up_write().
- *
- * mm_is_oom_victim() cannot be set from under us
- * either because victim->mm is already set to NULL
- * under task_lock before calling mmput and oom_mm is
- * set not NULL by the OOM killer only if victim->mm
- * is found not NULL while holding the task_lock.
- */
- set_bit(MMF_OOM_SKIP, &mm->flags);
- down_write(&mm->mmap_sem);
- up_write(&mm->mmap_sem);
- }
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb, 0, -1);
+ set_bit(MMF_OOM_SKIP, &mm->flags);
/*
* Walk the list again, actually closing and freeing it,
diff -puN mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/oom_kill.c
--- a/mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/mm/oom_kill.c
@@ -521,12 +521,17 @@ static bool __oom_reap_task_mm(struct ta
}
/*
- * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
- * work on the mm anymore. The check for MMF_OOM_SKIP must run
+ * Tell all users of get_user/copy_from_user etc... that the content
+ * is no longer stable. No barriers really needed because unmapping
+ * should imply barriers already and the reader would hit a page fault
+ * if it stumbled over reaped memory.
+ *
+ * MMF_UNSTABLE is also set by exit_mmap when the OOM reaper shouldn't
+ * work on the mm anymore. The check for MMF_OOM_UNSTABLE must run
* under mmap_sem for reading because it serializes against the
* down_write();up_write() cycle in exit_mmap().
*/
- if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
+ if (test_and_set_bit(MMF_UNSTABLE, &mm->flags)) {
up_read(&mm->mmap_sem);
trace_skip_task_reaping(tsk->pid);
goto unlock_oom;
@@ -534,14 +539,6 @@ static bool __oom_reap_task_mm(struct ta
trace_start_task_reaping(tsk->pid);
- /*
- * Tell all users of get_user/copy_from_user etc... that the content
- * is no longer stable. No barriers really needed because unmapping
- * should imply barriers already and the reader would hit a page fault
- * if it stumbled over a reaped memory.
- */
- set_bit(MMF_UNSTABLE, &mm->flags);
-
for (vma = mm->mmap ; vma; vma = vma->vm_next) {
if (!can_madv_dontneed_vma(vma))
continue;
@@ -567,6 +564,7 @@ static bool __oom_reap_task_mm(struct ta
tlb_finish_mmu(&tlb, start, end);
}
}
+ set_bit(MMF_OOM_SKIP, &mm->flags);
pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
task_pid_nr(tsk), tsk->comm,
K(get_mm_counter(mm, MM_ANONPAGES)),
@@ -594,7 +592,6 @@ static void oom_reap_task(struct task_st
test_bit(MMF_OOM_SKIP, &mm->flags))
goto done;
-
pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
task_pid_nr(tsk), tsk->comm);
debug_show_all_locks();
@@ -603,10 +600,11 @@ done:
tsk->oom_reaper_list = NULL;
/*
- * Hide this mm from OOM killer because it has been either reaped or
- * somebody can't call up_write(mmap_sem).
+ * If the oom reaper could not get started on this mm and it has not yet
+ * reached exit_mmap(), set MMF_OOM_SKIP to disregard.
*/
- set_bit(MMF_OOM_SKIP, &mm->flags);
+ if (!test_bit(MMF_UNSTABLE, &mm->flags))
+ set_bit(MMF_OOM_SKIP, &mm->flags);
/* Drop a reference taken by wake_oom_reaper */
put_task_struct(tsk);
_
Patches currently in -mm which might be from rientjes(a)google.com are
On 5/1/2018 6:32 PM, gregkh(a)linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> net: qlge: Eliminate duplicate barriers on weakly-ordered archs
>
> to the 4.16-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> net-qlge-eliminate-duplicate-barriers-on-weakly-ordered-archs.patch
> and it can be found in the queue-4.16 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
I don't think these should be pulled at this moment. There is still more barrier
work to be done, and the patch below goes in the wrong direction after discussion
with Linus.
I'm waiting for all architectures to reach functional parity at the end of 4.17
to pick up from where I left off.
There are still a few architectures violating the writel() guarantee.
>
> From foo@baz Tue May 1 14:59:17 PDT 2018
> From: Sinan Kaya <okaya(a)codeaurora.org>
> Date: Sun, 25 Mar 2018 10:39:19 -0400
> Subject: net: qlge: Eliminate duplicate barriers on weakly-ordered archs
>
> From: Sinan Kaya <okaya(a)codeaurora.org>
>
> [ Upstream commit e42d8cee343a545ac2d9557a3b28708bbca2bd31 ]
>
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
>
> This ends up with the CPU observing two barriers back to back before executing the
> register write.
>
> Create a new wrapper function with relaxed write operator. Use the new
> wrapper when a write is following a wmb().
>
> Signed-off-by: Sinan Kaya <okaya(a)codeaurora.org>
> Signed-off-by: David S. Miller <davem(a)davemloft.net>
> Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> drivers/net/ethernet/qlogic/qlge/qlge.h | 16 ++++++++++++++++
> drivers/net/ethernet/qlogic/qlge/qlge_main.c | 3 ++-
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> --- a/drivers/net/ethernet/qlogic/qlge/qlge.h
> +++ b/drivers/net/ethernet/qlogic/qlge/qlge.h
> @@ -2185,6 +2185,22 @@ static inline void ql_write_db_reg(u32 v
> }
>
> /*
> + * Doorbell Registers:
> + * Doorbell registers are virtual registers in the PCI memory space.
> + * The space is allocated by the chip during PCI initialization. The
> + * device driver finds the doorbell address in BAR 3 in PCI config space.
> + * The registers are used to control outbound and inbound queues. For
> + * example, the producer index for an outbound queue. Each queue uses
> + * 1 4k chunk of memory. The lower half of the space is for outbound
> + * queues. The upper half is for inbound queues.
> + * Caller has to guarantee ordering.
> + */
> +static inline void ql_write_db_reg_relaxed(u32 val, void __iomem *addr)
> +{
> + writel_relaxed(val, addr);
> +}
> +
> +/*
> * Shadow Registers:
> * Outbound queues have a consumer index that is maintained by the chip.
> * Inbound queues have a producer index that is maintained by the chip.
> --- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
> +++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
> @@ -2700,7 +2700,8 @@ static netdev_tx_t qlge_send(struct sk_b
> tx_ring->prod_idx = 0;
> wmb();
>
> - ql_write_db_reg(tx_ring->prod_idx, tx_ring->prod_idx_db_reg);
> + ql_write_db_reg_relaxed(tx_ring->prod_idx, tx_ring->prod_idx_db_reg);
> + mmiowb();
> netif_printk(qdev, tx_queued, KERN_DEBUG, qdev->ndev,
> "tx queued, slot %d, len %d\n",
> tx_ring->prod_idx, skb->len);
>
>
> Patches currently in stable-queue which might be from okaya(a)codeaurora.org are
>
> queue-4.16/net-qlge-eliminate-duplicate-barriers-on-weakly-ordered-archs.patch
>
--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Dear Greg,
Looking at the change-log of the stable series, there are two blank lines
between the commit message summary [1] and the first line of the commit
message body.
> commit 4c8e0270dc7afb10d4e06be4adfd177db5cb3cb8
> Author: Tang Junhui <tang.junhui(a)zte.com.cn>
> Date: Wed Feb 7 11:41:46 2018 -0800
>
> bcache: fix for data collapse after re-attaching an attached device
>
>
> [ Upstream commit 73ac105be390c1de42a2f21643c9778a5e002930 ]
I assume your scripts generate that. Could they be changed to just add
one blank line?
Kind regards,
Paul
[1] https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.14.37
The patch
ASoC: cirrus: i2s: Fix LRCLK configuration
has been applied to the asoc tree at
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
From 2d534113be9a2aa532a1ae127a57e83558aed358 Mon Sep 17 00:00:00 2001
From: Alexander Sverdlin <alexander.sverdlin(a)gmail.com>
Date: Sat, 28 Apr 2018 22:51:38 +0200
Subject: [PATCH] ASoC: cirrus: i2s: Fix LRCLK configuration
The bit responsible for LRCLK polarity is i2s_tlrs (0), not i2s_trel (2)
(refer to "EP93xx User's Guide").
Previously, card drivers which specified SND_SOC_DAIFMT_NB_IF actually got
SND_SOC_DAIFMT_NB_NF; an adaptation is necessary to retain the old
behavior.
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)gmail.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
sound/soc/cirrus/edb93xx.c | 2 +-
sound/soc/cirrus/ep93xx-i2s.c | 8 ++++----
sound/soc/cirrus/snappercl15.c | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/sound/soc/cirrus/edb93xx.c b/sound/soc/cirrus/edb93xx.c
index c53bd6f2c2d7..3d011abaa266 100644
--- a/sound/soc/cirrus/edb93xx.c
+++ b/sound/soc/cirrus/edb93xx.c
@@ -67,7 +67,7 @@ static struct snd_soc_dai_link edb93xx_dai = {
.cpu_dai_name = "ep93xx-i2s",
.codec_name = "spi0.0",
.codec_dai_name = "cs4271-hifi",
- .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_IF |
+ .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF |
SND_SOC_DAIFMT_CBS_CFS,
.ops = &edb93xx_ops,
};
diff --git a/sound/soc/cirrus/ep93xx-i2s.c b/sound/soc/cirrus/ep93xx-i2s.c
index 934f8aefdd90..38c240c97041 100644
--- a/sound/soc/cirrus/ep93xx-i2s.c
+++ b/sound/soc/cirrus/ep93xx-i2s.c
@@ -213,24 +213,24 @@ static int ep93xx_i2s_set_dai_fmt(struct snd_soc_dai *cpu_dai,
switch (fmt & SND_SOC_DAIFMT_INV_MASK) {
case SND_SOC_DAIFMT_NB_NF:
/* Negative bit clock, lrclk low on left word */
- clk_cfg &= ~(EP93XX_I2S_CLKCFG_CKP | EP93XX_I2S_CLKCFG_REL);
+ clk_cfg &= ~(EP93XX_I2S_CLKCFG_CKP | EP93XX_I2S_CLKCFG_LRS);
break;
case SND_SOC_DAIFMT_NB_IF:
/* Negative bit clock, lrclk low on right word */
clk_cfg &= ~EP93XX_I2S_CLKCFG_CKP;
- clk_cfg |= EP93XX_I2S_CLKCFG_REL;
+ clk_cfg |= EP93XX_I2S_CLKCFG_LRS;
break;
case SND_SOC_DAIFMT_IB_NF:
/* Positive bit clock, lrclk low on left word */
clk_cfg |= EP93XX_I2S_CLKCFG_CKP;
- clk_cfg &= ~EP93XX_I2S_CLKCFG_REL;
+ clk_cfg &= ~EP93XX_I2S_CLKCFG_LRS;
break;
case SND_SOC_DAIFMT_IB_IF:
/* Positive bit clock, lrclk low on right word */
- clk_cfg |= EP93XX_I2S_CLKCFG_CKP | EP93XX_I2S_CLKCFG_REL;
+ clk_cfg |= EP93XX_I2S_CLKCFG_CKP | EP93XX_I2S_CLKCFG_LRS;
break;
}
diff --git a/sound/soc/cirrus/snappercl15.c b/sound/soc/cirrus/snappercl15.c
index 2334ec19e7eb..11ff7b2672b2 100644
--- a/sound/soc/cirrus/snappercl15.c
+++ b/sound/soc/cirrus/snappercl15.c
@@ -72,7 +72,7 @@ static struct snd_soc_dai_link snappercl15_dai = {
.codec_dai_name = "tlv320aic23-hifi",
.codec_name = "tlv320aic23-codec.0-001a",
.platform_name = "ep93xx-i2s",
- .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_IF |
+ .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF |
SND_SOC_DAIFMT_CBS_CFS,
.ops = &snappercl15_ops,
};
--
2.17.0
The patch
ASoC: cirrus: i2s: Fix {TX|RX}LinCtrlData setup
has been applied to the asoc tree at
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
From 5d302ed3cc80564fb835bed5fdba1e1250ecc9e5 Mon Sep 17 00:00:00 2001
From: Alexander Sverdlin <alexander.sverdlin(a)gmail.com>
Date: Sat, 28 Apr 2018 22:51:39 +0200
Subject: [PATCH] ASoC: cirrus: i2s: Fix {TX|RX}LinCtrlData setup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
According to the "EP93xx User’s Guide", the I2STXLinCtrlData and I2SRXLinCtrlData
registers actually have different formats. The only currently used bit
(Left_Right_Justify) is in a different position in each. Fix this and simplify the
whole setup, taking into account the fact that both registers have a zero
default value.
The practical effect of the above is repaired SND_SOC_DAIFMT_RIGHT_J
support (currently unused).
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)gmail.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
sound/soc/cirrus/ep93xx-i2s.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/sound/soc/cirrus/ep93xx-i2s.c b/sound/soc/cirrus/ep93xx-i2s.c
index 38c240c97041..0dc3852c4621 100644
--- a/sound/soc/cirrus/ep93xx-i2s.c
+++ b/sound/soc/cirrus/ep93xx-i2s.c
@@ -51,7 +51,9 @@
#define EP93XX_I2S_WRDLEN_24 (1 << 0)
#define EP93XX_I2S_WRDLEN_32 (2 << 0)
-#define EP93XX_I2S_LINCTRLDATA_R_JUST (1 << 2) /* Right justify */
+#define EP93XX_I2S_RXLINCTRLDATA_R_JUST BIT(1) /* Right justify */
+
+#define EP93XX_I2S_TXLINCTRLDATA_R_JUST BIT(2) /* Right justify */
#define EP93XX_I2S_CLKCFG_LRS (1 << 0) /* lrclk polarity */
#define EP93XX_I2S_CLKCFG_CKP (1 << 1) /* Bit clock polarity */
@@ -170,25 +172,25 @@ static int ep93xx_i2s_set_dai_fmt(struct snd_soc_dai *cpu_dai,
unsigned int fmt)
{
struct ep93xx_i2s_info *info = snd_soc_dai_get_drvdata(cpu_dai);
- unsigned int clk_cfg, lin_ctrl;
+ unsigned int clk_cfg;
+ unsigned int txlin_ctrl = 0;
+ unsigned int rxlin_ctrl = 0;
clk_cfg = ep93xx_i2s_read_reg(info, EP93XX_I2S_RXCLKCFG);
- lin_ctrl = ep93xx_i2s_read_reg(info, EP93XX_I2S_RXLINCTRLDATA);
switch (fmt & SND_SOC_DAIFMT_FORMAT_MASK) {
case SND_SOC_DAIFMT_I2S:
clk_cfg |= EP93XX_I2S_CLKCFG_REL;
- lin_ctrl &= ~EP93XX_I2S_LINCTRLDATA_R_JUST;
break;
case SND_SOC_DAIFMT_LEFT_J:
clk_cfg &= ~EP93XX_I2S_CLKCFG_REL;
- lin_ctrl &= ~EP93XX_I2S_LINCTRLDATA_R_JUST;
break;
case SND_SOC_DAIFMT_RIGHT_J:
clk_cfg &= ~EP93XX_I2S_CLKCFG_REL;
- lin_ctrl |= EP93XX_I2S_LINCTRLDATA_R_JUST;
+ rxlin_ctrl |= EP93XX_I2S_RXLINCTRLDATA_R_JUST;
+ txlin_ctrl |= EP93XX_I2S_TXLINCTRLDATA_R_JUST;
break;
default:
@@ -237,8 +239,8 @@ static int ep93xx_i2s_set_dai_fmt(struct snd_soc_dai *cpu_dai,
/* Write new register values */
ep93xx_i2s_write_reg(info, EP93XX_I2S_RXCLKCFG, clk_cfg);
ep93xx_i2s_write_reg(info, EP93XX_I2S_TXCLKCFG, clk_cfg);
- ep93xx_i2s_write_reg(info, EP93XX_I2S_RXLINCTRLDATA, lin_ctrl);
- ep93xx_i2s_write_reg(info, EP93XX_I2S_TXLINCTRLDATA, lin_ctrl);
+ ep93xx_i2s_write_reg(info, EP93XX_I2S_RXLINCTRLDATA, rxlin_ctrl);
+ ep93xx_i2s_write_reg(info, EP93XX_I2S_TXLINCTRLDATA, txlin_ctrl);
return 0;
}
--
2.17.0
This is the start of the stable review cycle for the 3.18.108 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed May 2 18:39:02 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.108-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 3.18.108-rc1
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "perf tests: Decompress kernel module before objdump"
Ilya Dryomov <idryomov(a)gmail.com>
libceph: validate con->state at the top of try_write()
Nicolin Chen <nicoleotsuka(a)gmail.com>
ASoC: fsl_esai: Fix divisor calculation failure at lower ratio
Mahesh Rajashekhara <mahesh.rajashekhara(a)microsemi.com>
scsi: sd: Defer spinning up drive while SANITIZE is in progress
Dmitry Vyukov <dvyukov(a)google.com>
kobject: don't use WARN for registration failures
Joakim Tjernlund <joakim.tjernlund(a)infinera.com>
mtd: cfi: cmdset_0002: Do not allow read/write to suspend erase block.
Joakim Tjernlund <joakim.tjernlund(a)transmode.se>
mtd: cfi: cmdset_0001: Workaround Micron Erase suspend bug.
Joakim Tjernlund <joakim.tjernlund(a)transmode.se>
mtd: cfi: cmdset_0001: Do not allow read/write to suspend erase block.
Takashi Iwai <tiwai(a)suse.de>
ALSA: seq: oss: Fix unbalanced use lock for synth MIDI device
David Henningsson <diwic(a)ubuntu.com>
ALSA: core: Report audio_tstamp in snd_pcm_sync_ptr
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
tty: Use __GFP_NOFAIL for tty_ldisc_get()
Tony Lindgren <tony(a)atomide.com>
tty: n_gsm: Fix DLCI handling for ADM mode if debug & 2 is not set
Tony Lindgren <tony(a)atomide.com>
tty: n_gsm: Fix long delays with control frame timeouts in ADM mode
Michael S. Tsirkin <mst(a)redhat.com>
virtio_console: free buffers after reset
Michael S. Tsirkin <mst(a)redhat.com>
virtio: add ability to iterate over vqs
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Skip broken EU on Dell dock USB-audio
Ravi Chandra Sadineni <ravisadineni(a)chromium.org>
USB: Increment wakeup count on remote wakeup.
Kamil Lulko <kamilx.lulko(a)intel.com>
usb: core: Add quirk for HP v222w 16GB Mini
Kyle Roeschley <kyle.roeschley(a)ni.com>
USB: serial: cp210x: add ID for NI USB serial console
Vasyl Vavrychuk <vvavrychuk(a)gmail.com>
USB: serial: ftdi_sio: use jtag quirk for Arrow USB Blaster
Shuah Khan <shuahkh(a)osg.samsung.com>
usbip: vhci_hcd: Fix usb device and sockfd leaks
Shuah Khan <shuahkh(a)osg.samsung.com>
usbip: usbip_host: fix to hold parent lock for device_attach() calls
Lukas Czerner <lczerner(a)redhat.com>
ext4: fix bitmap position validation
Theodore Ts'o <tytso(a)mit.edu>
ext4: add validity checks for bitmap block numbers
Theodore Ts'o <tytso(a)mit.edu>
ext4: set h_journal if there is a failure starting a reserved handle
-------------
Diffstat:
Makefile | 4 +--
drivers/char/virtio_console.c | 49 ++++++++++++++++++-------------------
drivers/mtd/chips/cfi_cmdset_0001.c | 33 +++++++++++++++++++++----
drivers/mtd/chips/cfi_cmdset_0002.c | 9 ++++---
drivers/scsi/sd.c | 2 ++
drivers/tty/n_gsm.c | 23 ++++++++++++++++-
drivers/tty/tty_ldisc.c | 11 ++++-----
drivers/usb/core/hcd.c | 1 +
drivers/usb/core/hub.c | 10 +++++++-
drivers/usb/core/quirks.c | 3 +++
drivers/usb/serial/cp210x.c | 1 +
drivers/usb/serial/ftdi_sio.c | 3 ++-
drivers/usb/usbip/stub_main.c | 5 ++++
drivers/usb/usbip/usbip_common.h | 2 +-
fs/ext4/balloc.c | 17 +++++++++++--
fs/ext4/ialloc.c | 8 +++++-
fs/jbd2/transaction.c | 1 +
include/linux/mtd/flashchip.h | 1 +
include/linux/virtio.h | 3 +++
lib/kobject.c | 12 ++++-----
net/ceph/messenger.c | 7 ++++++
sound/core/pcm_native.c | 1 +
sound/core/seq/oss/seq_oss_synth.c | 12 ++++++---
sound/soc/fsl/fsl_esai.c | 7 ++++++
sound/usb/mixer_maps.c | 3 +++
tools/perf/tests/code-reading.c | 20 +--------------
26 files changed, 170 insertions(+), 78 deletions(-)
Hi Greg,
These two patches should probably be part of 4.4 because of the
ce59e48fdbad ("serial: mctrl_gpio: implement interrupt handling")
backport. 4.9, 4.14 and 4.16 already have these patches.
Romain Izard (1):
serial: mctrl_gpio: Add missing module license
Uwe Kleine-König (1):
serial: mctrl_gpio: export mctrl_gpio_disable_ms and mctrl_gpio_init
drivers/tty/serial/serial_mctrl_gpio.c | 5 +++++
1 file changed, 5 insertions(+)
--
2.14.3
Hi,
please add
> From f15ca723c1ebe6c1a06bc95fda6b62cd87b44559 Mon Sep 17 00:00:00 2001
> From: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
> Date: Thu, 25 Jan 2018 19:03:03 +0100
> Subject: net: don't call update_pmtu unconditionally
>
> Some dst_ops (e.g. md_dst_ops) don't set this handler. It may result in:
> "BUG: unable to handle kernel NULL pointer dereference at (null)"
>
> Let's add a helper to check if update_pmtu is available before calling it.
>
> Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
> Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
> CC: Roman Kapl <code(a)rkapl.cz>
> CC: Xin Long <lucien.xin(a)gmail.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
> Signed-off-by: David S. Miller <davem(a)davemloft.net>
to 4.14.x.
This fixes NULL derefs caused by a93bf0ff4490 ("vxlan: update
skb dst pmtu on tx path"), which was backported to 4.14.24.
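For context on what "add a helper to check if update_pmtu is available" boils down to: guard the optional function pointer before the indirect call. A minimal, self-contained model of the pattern (illustrative struct and function names, not the kernel's actual dst API):

#include <stdio.h>
#include <stddef.h>

struct dst_ops {
	void (*update_pmtu)(void *dst, unsigned int mtu);
};

struct dst_entry {
	const struct dst_ops *ops;
};

/* Before the fix, the unconditional dst->ops->update_pmtu(...) call
 * oopses when the handler is NULL; the guarded variant just skips it. */
static void dst_update_pmtu_guarded(struct dst_entry *dst, unsigned int mtu)
{
	if (dst && dst->ops && dst->ops->update_pmtu)
		dst->ops->update_pmtu(dst, mtu);
}

static void print_pmtu(void *dst, unsigned int mtu)
{
	(void)dst;
	printf("update_pmtu(%u)\n", mtu);
}

int main(void)
{
	const struct dst_ops with_op = { .update_pmtu = print_pmtu };
	const struct dst_ops without_op = { 0 };
	struct dst_entry a = { .ops = &with_op };
	struct dst_entry b = { .ops = &without_op };

	dst_update_pmtu_guarded(&a, 1400);  /* calls the handler */
	dst_update_pmtu_guarded(&b, 1400);  /* no-op instead of a NULL deref */
	return 0;
}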
--
Regards,
Thomas Deutschmann / Gentoo Linux Developer
C4DD 695F A713 8F24 2AA1 5638 5849 7EE5 1D5D 74A5
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0803d7befa15cab5717d667a97a66214d2a4c083 Mon Sep 17 00:00:00 2001
From: Chris Chiu <chiu(a)endlessm.com>
Date: Tue, 20 Mar 2018 15:36:40 +0800
Subject: [PATCH] tpm: self test failure should not cause suspend to fail
The Acer Veriton X4110G has a TPM device detected as:
tpm_tis 00:0b: 1.2 TPM (device-id 0xFE, rev-id 71)
After the first S3 suspend, the following error appears during resume:
tpm tpm0: A TPM error(38) occurred continue selftest
Any following S3 suspend attempts will now fail with this error:
tpm tpm0: Error (38) sending savestate before suspend
PM: Device 00:0b failed to suspend: error 38
Error 38 is TPM_ERR_INVALID_POSTINIT which means the TPM is
not in the correct state. This indicates that the platform BIOS
is not sending the usual TPM_Startup command during S3 resume.
From this point onwards, all TPM commands will fail.
The same issue was previously reported on Foxconn 6150BK8MC and
Sony Vaio TX3.
The platform behaviour seems broken here, but we should not break
suspend/resume because of this.
When the unexpected TPM state is encountered, set a flag to skip the
affected TPM_SaveState command on later suspends.
Cc: stable(a)vger.kernel.org
Signed-off-by: Chris Chiu <chiu(a)endlessm.com>
Signed-off-by: Daniel Drake <drake(a)endlessm.com>
Link: http://lkml.kernel.org/r/CAB4CAwfSCvj1cudi+MWaB5g2Z67d9DwY1o475YOZD64ma23Ui…
Link: https://lkml.org/lkml/2011/3/28/192
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=591031
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 47aacecdc85c..22288ff70a0b 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -1002,6 +1002,10 @@ int tpm_do_selftest(struct tpm_chip *chip)
loops = jiffies_to_msecs(duration) / delay_msec;
rc = tpm_continue_selftest(chip);
+ if (rc == TPM_ERR_INVALID_POSTINIT) {
+ chip->flags |= TPM_CHIP_FLAG_ALWAYS_POWERED;
+ dev_info(&chip->dev, "TPM not ready (%d)\n", rc);
+ }
/* This may fail if there was no TPM driver during a suspend/resume
* cycle; some may return 10 (BAD_ORDINAL), others 28 (FAILEDSELFTEST)
*/
"tpm: add retry logic" caused merge conflicts so I picked couple of
other fixes in order to get it apply cleanly.
James Bottomley (1):
tpm: add retry logic
Tomas Winkler (1):
tpm: cmd_ready command can be issued only after granting locality
Winkler, Tomas (1):
tpm: tpm-interface: fix tpm_transmit/_cmd kdoc
drivers/char/tpm/tpm-interface.c | 131 ++++++++++++++++++++++++-------
drivers/char/tpm/tpm.h | 1 +
drivers/char/tpm/tpm_crb.c | 108 +++++++++++++++++--------
drivers/char/tpm/tpm_tis_core.c | 4 +-
include/linux/tpm.h | 2 +-
5 files changed, 183 insertions(+), 63 deletions(-)
--
2.17.0
The patch titled
Subject: mm: sections are not offlined during memory hotremove
has been added to the -mm tree. Its filename is
mm-sections-are-not-offlined-during-memory-hotremove.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-sections-are-not-offlined-durin…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-sections-are-not-offlined-durin…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Subject: mm: sections are not offlined during memory hotremove
Memory hotplug and hotremove operate with per-block granularity. If the
machine has a large amount of memory (more than 64G), the size of a memory
block can span multiple sections. By mistake, during hotremove we set
only the first section to offline state.
The bug was discovered because the kernel selftest started to fail:
https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop
after commit "mm/memory_hotplug: optimize probe routine". The bug itself,
however, is older than that commit; that optimization only added a check for
sections to be in a proper state during hotplug operations.
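To make the one-character fix below concrete, here is a toy userspace model of the loop in offline_mem_sections() (the constants are made up for illustration; real section and block sizes depend on the architecture and configuration):

#include <stdio.h>

#define PAGES_PER_SECTION	32768UL		/* illustrative only            */
#define SECTIONS_PER_BLOCK	16		/* one block spans 16 sections  */

static unsigned long pfn_to_section_nr(unsigned long pfn)
{
	return pfn / PAGES_PER_SECTION;
}

int main(void)
{
	unsigned long start_pfn = 0;
	unsigned long end_pfn = start_pfn + SECTIONS_PER_BLOCK * PAGES_PER_SECTION;
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
		/* buggy variant: pfn_to_section_nr(start_pfn) names section 0
		 * on every iteration, so only the first section is offlined */
		unsigned long buggy = pfn_to_section_nr(start_pfn);
		unsigned long fixed = pfn_to_section_nr(pfn);

		printf("pfn %8lu: buggy offlines section %2lu, fixed offlines section %2lu\n",
		       pfn, buggy, fixed);
	}
	return 0;
}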
Link: http://lkml.kernel.org/r/20180427145257.15222-1-pasha.tatashin@oracle.com
Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Steven Sistare <steven.sistare(a)oracle.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/sparse.c~mm-sections-are-not-offlined-during-memory-hotremove mm/sparse.c
--- a/mm/sparse.c~mm-sections-are-not-offlined-during-memory-hotremove
+++ a/mm/sparse.c
@@ -629,7 +629,7 @@ void offline_mem_sections(unsigned long
unsigned long pfn;
for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
- unsigned long section_nr = pfn_to_section_nr(start_pfn);
+ unsigned long section_nr = pfn_to_section_nr(pfn);
struct mem_section *ms;
/*
_
Patches currently in -mm which might be from pasha.tatashin(a)oracle.com are
mm-sections-are-not-offlined-during-memory-hotremove.patch
sparc64-ng4-memset-32-bits-overflow.patch
The patch titled
Subject: z3fold: fix reclaim lock-ups
has been added to the -mm tree. Its filename is
z3fold-fix-reclaim-lock-ups.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/z3fold-fix-reclaim-lock-ups.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/z3fold-fix-reclaim-lock-ups.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Vitaly Wool <vitalywool(a)gmail.com>
Subject: z3fold: fix reclaim lock-ups
Do not try to optimize in-page object layout while the page is under
reclaim. This fixes lock-ups on reclaim and improves reclaim performance
at the same time.
Link: http://lkml.kernel.org/r/20180430125800.444cae9706489f412ad12621@gmail.com
Signed-off-by: Vitaly Wool <vitaly.vul(a)sony.com>
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Cc: <Oleksiy.Avramchenko(a)sony.com>
Cc: Matthew Wilcox <mawilcox(a)microsoft.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/z3fold.c | 42 ++++++++++++++++++++++++++++++------------
1 file changed, 30 insertions(+), 12 deletions(-)
diff -puN mm/z3fold.c~z3fold-fix-reclaim-lock-ups mm/z3fold.c
--- a/mm/z3fold.c~z3fold-fix-reclaim-lock-ups
+++ a/mm/z3fold.c
@@ -144,7 +144,8 @@ enum z3fold_page_flags {
PAGE_HEADLESS = 0,
MIDDLE_CHUNK_MAPPED,
NEEDS_COMPACTING,
- PAGE_STALE
+ PAGE_STALE,
+ UNDER_RECLAIM
};
/*****************
@@ -173,6 +174,7 @@ static struct z3fold_header *init_z3fold
clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
clear_bit(NEEDS_COMPACTING, &page->private);
clear_bit(PAGE_STALE, &page->private);
+ clear_bit(UNDER_RECLAIM, &page->private);
spin_lock_init(&zhdr->page_lock);
kref_init(&zhdr->refcount);
@@ -756,6 +758,10 @@ static void z3fold_free(struct z3fold_po
atomic64_dec(&pool->pages_nr);
return;
}
+ if (test_bit(UNDER_RECLAIM, &page->private)) {
+ z3fold_page_unlock(zhdr);
+ return;
+ }
if (test_and_set_bit(NEEDS_COMPACTING, &page->private)) {
z3fold_page_unlock(zhdr);
return;
@@ -840,6 +846,8 @@ static int z3fold_reclaim_page(struct z3
kref_get(&zhdr->refcount);
list_del_init(&zhdr->buddy);
zhdr->cpu = -1;
+ set_bit(UNDER_RECLAIM, &page->private);
+ break;
}
list_del_init(&page->lru);
@@ -887,25 +895,35 @@ static int z3fold_reclaim_page(struct z3
goto next;
}
next:
- spin_lock(&pool->lock);
if (test_bit(PAGE_HEADLESS, &page->private)) {
if (ret == 0) {
- spin_unlock(&pool->lock);
free_z3fold_page(page);
return 0;
}
- } else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
- atomic64_dec(&pool->pages_nr);
+ spin_lock(&pool->lock);
+ list_add(&page->lru, &pool->lru);
+ spin_unlock(&pool->lock);
+ } else {
+ z3fold_page_lock(zhdr);
+ clear_bit(UNDER_RECLAIM, &page->private);
+ if (kref_put(&zhdr->refcount,
+ release_z3fold_page_locked)) {
+ atomic64_dec(&pool->pages_nr);
+ return 0;
+ }
+ /*
+ * if we are here, the page is still not completely
+ * free. Take the global pool lock then to be able
+ * to add it back to the lru list
+ */
+ spin_lock(&pool->lock);
+ list_add(&page->lru, &pool->lru);
spin_unlock(&pool->lock);
- return 0;
+ z3fold_page_unlock(zhdr);
}
- /*
- * Add to the beginning of LRU.
- * Pool lock has to be kept here to ensure the page has
- * not already been released
- */
- list_add(&page->lru, &pool->lru);
+ /* We started off locked to we need to lock the pool back */
+ spin_lock(&pool->lock);
}
spin_unlock(&pool->lock);
return -EAGAIN;
_
Patches currently in -mm which might be from vitalywool(a)gmail.com are
z3fold-fix-reclaim-lock-ups.patch
Commit b9f19259b84d ("drm/vc4: Add the DRM_IOCTL_VC4_GEM_MADVISE ioctl")
introduced a mechanism to mark some BOs as purgeable to allow the driver
to drop them under memory pressure. In order to implement this feature
we had to add a mechanism to mark BOs as currently used by a piece of
hardware which materialized through the ->usecnt counter.
Plane code is supposed to increment usecnt when it attaches a BO to a
plane and decrement it when it's done with this BO, which was done in
the ->prepare_fb() and ->cleanup_fb() hooks. The problem is, async page
flip logic does not go through the regular atomic update path, and
->prepare_fb() and ->cleanup_fb() are not called in this case.
Fix that by manually calling vc4_bo_{inc,dec}_usecnt() in the
async-page-flip path.
Note that all this should go away as soon as we get generic async page
flip support in the core; in the meantime, this fix should do the trick.
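[A toy accounting model of the balance this patch restores, purely illustrative and not the driver code: whichever path attaches a framebuffer must take a usecnt reference, and whichever path retires it must drop one. The async flip path now does this by hand because ->prepare_fb()/->cleanup_fb() are bypassed.]

#include <stdio.h>

static int usecnt;

static void attach_fb(const char *path) { usecnt++; printf("%s: attach, usecnt=%d\n", path, usecnt); }
static void retire_fb(const char *path) { usecnt--; printf("%s: retire, usecnt=%d\n", path, usecnt); }

int main(void)
{
	/* Regular atomic update: prepare_fb()/cleanup_fb() pair up. */
	attach_fb("prepare_fb");
	retire_fb("cleanup_fb");

	/* Async page flip bypasses those hooks, so the flip submission and
	 * completion paths have to do the inc/dec themselves (this fix). */
	attach_fb("vc4_async_page_flip");
	retire_fb("vc4_async_page_flip_complete");

	printf("balanced: %s\n", usecnt == 0 ? "yes" : "no");
	return 0;
}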
Fixes: b9f19259b84d ("drm/vc4: Add the DRM_IOCTL_VC4_GEM_MADVISE ioctl")
Reported-by: Peter Robinson <pbrobinson(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon(a)bootlin.com>
---
drivers/gpu/drm/vc4/vc4_crtc.c | 46 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 45 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index 83d3b7912fc2..c8650bbcbcb3 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -741,6 +741,7 @@ static irqreturn_t vc4_crtc_irq_handler(int irq, void *data)
struct vc4_async_flip_state {
struct drm_crtc *crtc;
struct drm_framebuffer *fb;
+ struct drm_framebuffer *old_fb;
struct drm_pending_vblank_event *event;
struct vc4_seqno_cb cb;
@@ -770,6 +771,23 @@ vc4_async_page_flip_complete(struct vc4_seqno_cb *cb)
drm_crtc_vblank_put(crtc);
drm_framebuffer_put(flip_state->fb);
+
+ /* Decrement the BO usecnt in order to keep the inc/dec calls balanced
+ * when the planes are updated through the async update path.
+ * FIXME: we should move to generic async-page-flip when it's
+ * available, so that we can get rid of this hand-made cleanup_fb()
+ * logic.
+ */
+ if (flip_state->old_fb) {
+ struct drm_gem_cma_object *cma_bo;
+ struct vc4_bo *bo;
+
+ cma_bo = drm_fb_cma_get_gem_obj(flip_state->old_fb, 0);
+ bo = to_vc4_bo(&cma_bo->base);
+ vc4_bo_dec_usecnt(bo);
+ drm_framebuffer_put(flip_state->old_fb);
+ }
+
kfree(flip_state);
up(&vc4->async_modeset);
@@ -794,9 +812,22 @@ static int vc4_async_page_flip(struct drm_crtc *crtc,
struct drm_gem_cma_object *cma_bo = drm_fb_cma_get_gem_obj(fb, 0);
struct vc4_bo *bo = to_vc4_bo(&cma_bo->base);
+ /* Increment the BO usecnt here, so that we never end up with an
+ * unbalanced number of vc4_bo_{dec,inc}_usecnt() calls when the
+ * plane is later updated through the non-async path.
+ * FIXME: we should move to generic async-page-flip when it's
+ * available, so that we can get rid of this hand-made prepare_fb()
+ * logic.
+ */
+ ret = vc4_bo_inc_usecnt(bo);
+ if (ret)
+ return ret;
+
flip_state = kzalloc(sizeof(*flip_state), GFP_KERNEL);
- if (!flip_state)
+ if (!flip_state) {
+ vc4_bo_dec_usecnt(bo);
return -ENOMEM;
+ }
drm_framebuffer_get(fb);
flip_state->fb = fb;
@@ -807,10 +838,23 @@ static int vc4_async_page_flip(struct drm_crtc *crtc,
ret = down_interruptible(&vc4->async_modeset);
if (ret) {
drm_framebuffer_put(fb);
+ vc4_bo_dec_usecnt(bo);
kfree(flip_state);
return ret;
}
+ /* Save the current FB before it's replaced by the new one in
+ * drm_atomic_set_fb_for_plane(). We'll need the old FB in
+ * vc4_async_page_flip_complete() to decrement the BO usecnt and keep
+ * it consistent.
+ * FIXME: we should move to generic async-page-flip when it's
+ * available, so that we can get rid of this hand-made cleanup_fb()
+ * logic.
+ */
+ flip_state->old_fb = plane->state->fb;
+ if (flip_state->old_fb)
+ drm_framebuffer_get(flip_state->old_fb);
+
WARN_ON(drm_crtc_vblank_get(crtc) != 0);
/* Immediately update the plane's legacy fb pointer, so that later
--
2.14.1