The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00ee8b60102862f4daf0814d12a2ea2744fc0b9b Mon Sep 17 00:00:00 2001
From: Richard Weinberger <richard@nod.at>
Date: Mon, 11 Jun 2018 23:41:09 +0200
Subject: [PATCH] ubifs: Fix directory size calculation for symlinks
We have to account for the length of the symlink's name, not the length
of its target.
Fixes: ca7f85be8d6c ("ubifs: Add support for encrypted symlinks")
Cc: <stable@vger.kernel.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 9da224d4f2da..e8616040bffc 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -1123,8 +1123,7 @@ static int ubifs_symlink(struct inode *dir, struct dentry *dentry,
struct ubifs_inode *ui;
struct ubifs_inode *dir_ui = ubifs_inode(dir);
struct ubifs_info *c = dir->i_sb->s_fs_info;
- int err, len = strlen(symname);
- int sz_change = CALC_DENT_SIZE(len);
+ int err, sz_change, len = strlen(symname);
struct fscrypt_str disk_link;
struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1,
.new_ino_d = ALIGN(len, 8),
@@ -1151,6 +1150,8 @@ static int ubifs_symlink(struct inode *dir, struct dentry *dentry,
if (err)
goto out_budg;
+ sz_change = CALC_DENT_SIZE(fname_len(&nm));
+
inode = ubifs_new_inode(c, dir, S_IFLNK | S_IRWXUGO);
if (IS_ERR(inode)) {
err = PTR_ERR(inode);
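To illustrate what the fix changes, here is a small userspace sketch of
the budgeting difference (the 56-byte header and the mirrored macro are
placeholder assumptions, not UBIFS's real constants):

#include <stdio.h>
#include <string.h>

/* Placeholder mirroring the idea of UBIFS's CALC_DENT_SIZE(): a fixed
 * dent header plus the name and its NUL byte, rounded up to 8 bytes. */
#define DENT_HDR_SZ 56
#define ALIGN8(x) (((x) + 7UL) & ~7UL)
#define CALC_DENT_SIZE(name_len) ALIGN8(DENT_HDR_SZ + (name_len) + 1)

int main(void)
{
	const char *name = "mylink";                    /* the dentry name */
	const char *symname = "/some/long/target/path"; /* the link target */

	/* buggy budget: sized from the target length */
	printf("target-based: %lu bytes\n", CALC_DENT_SIZE(strlen(symname)));
	/* fixed budget: sized from the name actually stored in the dent */
	printf("name-based:   %lu bytes\n", CALC_DENT_SIZE(strlen(name)));
	return 0;
}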
From: Dave Chinner <dchinner@redhat.com>
This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
The reverted commit added page reference counting to iomap page
structures that are used to track block size < page size state. This
was supposed to align the code with page migration page accounting
assumptions, but what it has done instead is break XFS filesystems.
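For reference, this is the pairing convention the reverted commit had
introduced, distilled from the diff below (the helper names are mine,
not the kernel's):

/* Attaching private state took an extra page reference so that
 * migrate_page_move_mapping()'s accounting assumption would hold. */
static void iomap_attach_page_private(struct page *page, void *data)
{
	get_page(page);		/* the get_page() this revert removes */
	set_page_private(page, (unsigned long)data);
	SetPagePrivate(page);
}

static void iomap_detach_page_private(struct page *page)
{
	ClearPagePrivate(page);
	set_page_private(page, 0);
	put_page(page);		/* pairs with the get_page() above */
}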
Every fstests run I've done on sub-page block size XFS filesystems
since picking up this commit 2 days ago has failed with bad page
state errors such as:
# ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
....
SECTION -- xfs
FSTYP -- xfs (debug)
PLATFORM -- Linux/x86_64 test1 4.20.0-rc6-dgc+
MKFS_OPTIONS -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /mnt/scratch
generic/038 454s ...
run fstests generic/038 at 2018-12-20 18:43:05
XFS (sdc): Unmounting Filesystem
XFS (sdc): Mounting V5 Filesystem
XFS (sdc): Ending clean mount
BUG: Bad page state in process kswapd0 pfn:3a7fa
page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
flags: 0xfffffc0000000()
raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
raw: 0000000000000001 0000000000000000 00000000ffffffff
page dumped because: non-NULL mapping
CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
Call Trace:
dump_stack+0x67/0x90
bad_page.cold.116+0x8a/0xbd
free_pcppages_bulk+0x4bf/0x6a0
free_unref_page_list+0x10f/0x1f0
shrink_page_list+0x49d/0xf50
shrink_inactive_list+0x19d/0x3b0
shrink_node_memcg.constprop.77+0x398/0x690
? shrink_slab.constprop.81+0x278/0x3f0
shrink_node+0x7a/0x2f0
kswapd+0x34b/0x6d0
? node_reclaim+0x240/0x240
kthread+0x11f/0x140
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x24/0x30
Disabling lock debugging due to kernel taint
....
The failures come from anywhere that frees pages and empties the
per-cpu page magazines, so it's not a predictable failure or an easy
to debug failure.
generic/038 is a reliable reproducer of this problem - it has a 9 in
10 failure rate on one of my test machines. Failures on other
machines have been at random points in fstests runs, but every run
has ended up tripping this problem. Hence generic/038 was used to
bisect the failure because it was the most reliable failure.
It is too close to the 4.20 release (not to mention holidays) to
try to diagnose, fix and test the underlying cause of the problem,
so reverting the commit is the only option we have right now. The
revert has been tested against a current TOT 4.20-rc7+ kernel across
multiple machines running sub-page block size XFS filesystems and
none of the bad page state failures have been seen.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/iomap.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/fs/iomap.c b/fs/iomap.c
index 5bc172f3dfe8..d6bc98ae8d35 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -116,12 +116,6 @@ iomap_page_create(struct inode *inode, struct page *page)
atomic_set(&iop->read_count, 0);
atomic_set(&iop->write_count, 0);
bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
-
- /*
- * migrate_page_move_mapping() assumes that pages with private data have
- * their count elevated by 1.
- */
- get_page(page);
set_page_private(page, (unsigned long)iop);
SetPagePrivate(page);
return iop;
@@ -138,7 +132,6 @@ iomap_page_release(struct page *page)
WARN_ON_ONCE(atomic_read(&iop->write_count));
ClearPagePrivate(page);
set_page_private(page, 0);
- put_page(page);
kfree(iop);
}
--
2.19.1
From: Peter Xu <peterx@redhat.com>
Subject: mm: thp: fix flags for pmd migration when split
When splitting a huge migrating PMD, we'll transfer all the existing PMD
bits and apply them again onto the small PTEs. However, we are fetching
the bits unconditionally via pmd_soft_dirty(), pmd_write() or pmd_young(),
while they don't make sense at all when the PMD is a migration entry.
Fix them up. While at it, drop the ifdef altogether as it is not needed.
Note that, if my understanding of the problem is correct, without this
patch there is a chance of losing some of the dirty bits in the migrating
PMD pages (on x86_64 we're fetching bit 11, which is part of the swap
offset, instead of bit 2), which could potentially corrupt the memory of
a userspace program that depends on the dirty bit.
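Condensed, the two decode paths the patch separates look like this (a
fragment distilled from the diff below, not the full body of
__split_huge_pmd_locked()):

	if (unlikely(is_pmd_migration_entry(old_pmd))) {
		swp_entry_t entry = pmd_to_swp_entry(old_pmd);

		/* old_pmd encodes a swap entry: only swp helpers apply */
		page = pfn_to_page(swp_offset(entry));
		write = is_write_migration_entry(entry);
		young = false;
		soft_dirty = pmd_swp_soft_dirty(old_pmd); /* bit 2 on x86_64 */
	} else {
		/* old_pmd encodes a present mapping: pmd helpers apply */
		page = pmd_page(old_pmd);
		if (pmd_dirty(old_pmd))
			SetPageDirty(page);
		write = pmd_write(old_pmd);
		young = pmd_young(old_pmd);
		soft_dirty = pmd_soft_dirty(old_pmd);   /* bit 11 on x86_64 */
	}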
Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/huge_memory.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
--- a/mm/huge_memory.c~mm-thp-fix-flags-for-pmd-migration-when-split
+++ a/mm/huge_memory.c
@@ -2144,23 +2144,25 @@ static void __split_huge_pmd_locked(stru
*/
old_pmd = pmdp_invalidate(vma, haddr, pmd);
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
pmd_migration = is_pmd_migration_entry(old_pmd);
- if (pmd_migration) {
+ if (unlikely(pmd_migration)) {
swp_entry_t entry;
entry = pmd_to_swp_entry(old_pmd);
page = pfn_to_page(swp_offset(entry));
- } else
-#endif
+ write = is_write_migration_entry(entry);
+ young = false;
+ soft_dirty = pmd_swp_soft_dirty(old_pmd);
+ } else {
page = pmd_page(old_pmd);
+ if (pmd_dirty(old_pmd))
+ SetPageDirty(page);
+ write = pmd_write(old_pmd);
+ young = pmd_young(old_pmd);
+ soft_dirty = pmd_soft_dirty(old_pmd);
+ }
VM_BUG_ON_PAGE(!page_count(page), page);
page_ref_add(page, HPAGE_PMD_NR - 1);
- if (pmd_dirty(old_pmd))
- SetPageDirty(page);
- write = pmd_write(old_pmd);
- young = pmd_young(old_pmd);
- soft_dirty = pmd_soft_dirty(old_pmd);
/*
* Withdraw the table only after we mark the pmd entry invalid.
_
From: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Subject: mm, memory_hotplug: initialize struct pages for the full memory section
If memory end is not aligned with the sparse memory section boundary, the
mapping of such a section is only partly initialized. This may lead to
VM_BUG_ON due to uninitialized struct page access from
is_mem_section_removable() or test_pages_in_a_zone(), triggered by
memory_hotplug sysfs handlers:
Here are the panic examples:
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y
kernel parameter mem=2050M
--------------------------
page:000003d082008000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
[<00000000008f15c4>] show_valid_zones+0x5c/0x190
[<00000000008cf9c4>] dev_attr_show+0x34/0x70
[<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
[<00000000003e4194>] seq_read+0x204/0x480
[<00000000003b53ea>] __vfs_read+0x32/0x178
[<00000000003b55b2>] vfs_read+0x82/0x138
[<00000000003b5be2>] ksys_read+0x5a/0xb0
[<0000000000b86ba0>] system_call+0xdc/0x2d8
Last Breaking-Event-Address:
[<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
Kernel panic - not syncing: Fatal exception: panic_on_oops
kernel parameter mem=3075M
--------------------------
page:000003d08300c000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
[<00000000008f12fa>] show_mem_removable+0x9a/0xd8
[<00000000008cf9c4>] dev_attr_show+0x34/0x70
[<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
[<00000000003e4194>] seq_read+0x204/0x480
[<00000000003b53ea>] __vfs_read+0x32/0x178
[<00000000003b55b2>] vfs_read+0x82/0x138
[<00000000003b5be2>] ksys_read+0x5a/0xb0
[<0000000000b86ba0>] system_call+0xdc/0x2d8
Last Breaking-Event-Address:
[<000000000038596c>] is_mem_section_removable+0xb4/0x190
Kernel panic - not syncing: Fatal exception: panic_on_oops
Fix the problem by initializing the last memory section of each zone in
memmap_init_zone() till the very end, even if it goes beyond the zone end.
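To make the misalignment concrete, here is a small userspace sketch of
the arithmetic (256 MB sections and 4 KB pages are assumptions taken
from s390 defaults, matching the mem=2050M example above):

#include <stdio.h>

#define SECTION_SIZE_BITS 28	/* assumed: 256 MB sections */
#define PAGE_SHIFT 12		/* assumed: 4 KB pages */
#define PAGES_PER_SECTION (1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))

int main(void)
{
	unsigned long end_pfn = (2050UL << 20) >> PAGE_SHIFT; /* mem=2050M */
	unsigned long uninit = 0;

	/* the same alignment test the fix loops on */
	while ((end_pfn + uninit) % PAGES_PER_SECTION)
		uninit++;

	printf("memory ends %lu pages into its section; %lu struct pages\n"
	       "of that section were left uninitialized\n",
	       end_pfn % PAGES_PER_SECTION, uninit);
	return 0;
}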
Michal said:
: This has always been a problem AFAIU. It just went unnoticed because we
: have zeroed memmaps during allocation before f7f99100d8d9 ("mm: stop
: zeroing memory during allocation in vmemmap") and so the above test
: would simply skip these ranges as belonging to zone 0 or provided
: garbage.
:
: So I guess we do care for post f7f99100d8d9 kernels mostly and
: therefore Fixes: f7f99100d8d9 ("mm: stop zeroing memory during
: allocation in vmemmap")
Link: http://lkml.kernel.org/r/20181212172712.34019-2-zaslonko@linux.ibm.com
Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/mm/page_alloc.c~mm-memory_hotplug-initialize-struct-pages-for-the-full-memory-section
+++ a/mm/page_alloc.c
@@ -5542,6 +5542,18 @@ void __meminit memmap_init_zone(unsigned
cond_resched();
}
}
+#ifdef CONFIG_SPARSEMEM
+ /*
+ * If the zone does not span the rest of the section then
+ * we should at least initialize those pages. Otherwise we
+ * could blow up on a poisoned page in some paths which depend
+ * on full sections being initialized (e.g. memory hotplug).
+ */
+ while (end_pfn % PAGES_PER_SECTION) {
+ __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
+ end_pfn++;
+ }
+#endif
}
#ifdef CONFIG_ZONE_DEVICE
_