From: Richard Weinberger <richard(a)nod.at>
commit e58725d51fa8da9133f3f1c54170aa2e43056b91 upstream.
UBIFS's recovery code strictly assumes that a deleted inode will never
come back, therefore it removes all data which belongs to that inode
as soon it faces an inode with link count 0 in the replay list.
Before O_TMPFILE this assumption was perfectly fine. With O_TMPFILE
it can lead to data loss upon a power-cut.
Consider a journal with entries like:
0: inode X (nlink = 0) /* O_TMPFILE was created */
1: data for inode X /* Someone writes to the temp file */
2: inode X (nlink = 0) /* inode was changed, xattr, chmod, … */
3: inode X (nlink = 1) /* inode was re-linked via linkat() */
Upon replay of entry #2 UBIFS will drop all data that belongs to inode X,
this will lead to an empty file after mounting.
As solution for this problem, scan the replay list for a re-link entry
before dropping data.
Fixes: 474b93704f32 ("ubifs: Implement O_TMPFILE")
Cc: stable(a)vger.kernel.org # 4.9-4.18
Cc: Russell Senior <russell(a)personaltelco.net>
Cc: Rafał Miłecki <zajec5(a)gmail.com>
Reported-by: Russell Senior <russell(a)personaltelco.net>
Reported-by: Rafał Miłecki <zajec5(a)gmail.com>
Tested-by: Rafał Miłecki <rafal(a)milecki.pl>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
[rmilecki: update ubifs_assert() calls to compile with 4.18 and older]
Signed-off-by: Rafał Miłecki <rafal(a)milecki.pl>
(cherry picked from commit e58725d51fa8da9133f3f1c54170aa2e43056b91)
---
fs/ubifs/replay.c | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index ae5c02f22f3e..d998fbf7de30 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -210,6 +210,38 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r)
}
/**
+ * inode_still_linked - check whether inode in question will be re-linked.
+ * @c: UBIFS file-system description object
+ * @rino: replay entry to test
+ *
+ * O_TMPFILE files can be re-linked, this means link count goes from 0 to 1.
+ * This case needs special care, otherwise all references to the inode will
+ * be removed upon the first replay entry of an inode with link count 0
+ * is found.
+ */
+static bool inode_still_linked(struct ubifs_info *c, struct replay_entry *rino)
+{
+ struct replay_entry *r;
+
+ ubifs_assert(rino->deletion);
+ ubifs_assert(key_type(c, &rino->key) == UBIFS_INO_KEY);
+
+ /*
+ * Find the most recent entry for the inode behind @rino and check
+ * whether it is a deletion.
+ */
+ list_for_each_entry_reverse(r, &c->replay_list, list) {
+ ubifs_assert(r->sqnum >= rino->sqnum);
+ if (key_inum(c, &r->key) == key_inum(c, &rino->key))
+ return r->deletion == 0;
+
+ }
+
+ ubifs_assert(0);
+ return false;
+}
+
+/**
* apply_replay_entry - apply a replay entry to the TNC.
* @c: UBIFS file-system description object
* @r: replay entry to apply
@@ -239,6 +271,11 @@ static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r)
{
ino_t inum = key_inum(c, &r->key);
+ if (inode_still_linked(c, r)) {
+ err = 0;
+ break;
+ }
+
err = ubifs_tnc_remove_ino(c, inum);
break;
}
--
2.13.7
Hello,
even though the subject sounds harmless it fixes a real bug.
(Yes, I'm aware that commit isn't in Linus' tree yet, but I assume your
tracking of patches targetting stable is better than mine, so I didn't
wait :-)
For backporting you also need its parent commit (i.e. e697271c4e29
("spi: imx: add a device specific prepare_message callback")). I have a
local backport to v4.14 here which isn't entirely trivial. So just tell
me if you need help.
Best regards
Uwe
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |
在 2018-12-27四的 22:34 +0800,Icenowy Zheng写道:
> The SMI SM3350 USB-UFS bridge controller cannot handle long sense
> request
> correctly and will make the chip refuse to do read/write when
> requested
> long sense.
>
> Add a bad sense quirk for it.
>
> Signed-off-by: Icenowy Zheng <icenowy(a)aosc.io>
> ---
I forgot to:
Cc: stable(a)vger.kernel.org
> drivers/usb/storage/unusual_devs.h | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/usb/storage/unusual_devs.h
> b/drivers/usb/storage/unusual_devs.h
> index f7f83b21dc74..ea0d27a94afe 100644
> --- a/drivers/usb/storage/unusual_devs.h
> +++ b/drivers/usb/storage/unusual_devs.h
> @@ -1265,6 +1265,18 @@ UNUSUAL_DEV( 0x090c, 0x1132, 0x0000, 0xffff,
> USB_SC_DEVICE, USB_PR_DEVICE, NULL,
> US_FL_FIX_CAPACITY ),
>
> +/*
> + * Reported by Icenowy Zheng <icenowy(a)aosc.io>
> + * The SMI SM3350 USB-UFS bridge controller will enter a wrong state
> + * that do not process read/write command if a long sense is
> requested,
> + * so force to use 18-byte sense.
> + */
> +UNUSUAL_DEV( 0x090c, 0x3350, 0x0000, 0xffff,
> + "SMI",
> + "SM3350 UFS-to-USB-Mass-Storage bridge",
> + USB_SC_DEVICE, USB_PR_DEVICE, NULL,
> + US_FL_BAD_SENSE ),
> +
> /*
> * Reported by Paul Hartman <paul.hartman+linux(a)gmail.com>
> * This card reader returns "Illegal Request, Logical Block Address
在 2018-12-27四的 22:34 +0800,Icenowy Zheng写道:
> Currently the code will set US_FL_SANE_SENSE flag unconditionally if
> device claims SPC3+, however we should allow US_FL_BAD_SENSE flag to
> prevent this behavior, because SMI SM3350 UFS-USB bridge controller,
> which claims SPC4, will show strange behavior with 96-byte sense
> (put the chip into a wrong state that cannot read/write anything).
>
> Check the presence of US_FL_BAD_SENSE when assuming US_FL_SANE_SENSE
> on
> SPC4+ devices.
>
> Signed-off-by: Icenowy Zheng <icenowy(a)aosc.io>
> ---
I forgot to:
Cc: stable(a)vger.kernel.org
> drivers/usb/storage/scsiglue.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/storage/scsiglue.c
> b/drivers/usb/storage/scsiglue.c
> index fde2e71a6ade..699fe9557127 100644
> --- a/drivers/usb/storage/scsiglue.c
> +++ b/drivers/usb/storage/scsiglue.c
> @@ -236,7 +236,8 @@ static int slave_configure(struct scsi_device
> *sdev)
> sdev->try_rc_10_first = 1;
>
> /* assume SPC3 or latter devices support sense size >
> 18 */
> - if (sdev->scsi_level > SCSI_SPC_2)
> + if (sdev->scsi_level > SCSI_SPC_2 &&
> + !(us->fflags & US_FL_BAD_SENSE))
> us->fflags |= US_FL_SANE_SENSE;
>
> /*
The patch titled
Subject: fork,memcg: fix crash in free_thread_stack on memcg charge fail
has been removed from the -mm tree. Its filename was
forkmemcg-fix-crash-in-free_thread_stack-on-memcg-charge-fail.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Rik van Riel <riel(a)surriel.com>
Subject: fork,memcg: fix crash in free_thread_stack on memcg charge fail
Changeset 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
will result in fork failing if allocating a kernel stack for a task
in dup_task_struct exceeds the kernel memory allowance for that cgroup.
Unfortunately, it also results in a crash.
This is due to the code jumping to free_stack and calling free_thread_stack
when the memcg kernel stack charge fails, but without tsk->stack pointing
at the freshly allocated stack.
This in turn results in the vfree_atomic in free_thread_stack oopsing
with a backtrace like this:
#5 [ffffc900244efc88] die at ffffffff8101f0ab
#6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86
#7 [ffffc900244efce0] general_protection at ffffffff818ff082
[exception RIP: llist_add_batch+7]
RIP: ffffffff8150d487 RSP: ffffc900244efd98 RFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88085ef55980 RCX: 0000000000000000
RDX: ffff88085ef55980 RSI: 343834343531203a RDI: 343834343531203a
RBP: ffffc900244efd98 R8: 0000000000000001 R9: ffff8808578c3600
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88029f6c21c0
R13: 0000000000000286 R14: ffff880147759b00 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7
#9 [ffffc900244efdb8] copy_process at ffffffff81086e37
#10 [ffffc900244efe98] _do_fork at ffffffff810884e0
#11 [ffffc900244eff10] sys_vfork at ffffffff810887ff
#12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43
RIP: 000000000049b948 RSP: 00007ffcdb307830 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000896030 RCX: 000000000049b948
RDX: 0000000000000000 RSI: 00007ffcdb307790 RDI: 00000000005d7421
RBP: 000000000067370f R8: 00007ffcdb3077b0 R9: 000000000001ed00
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000040
R13: 000000000000000f R14: 0000000000000000 R15: 000000000088d018
ORIG_RAX: 000000000000003a CS: 0033 SS: 002b
The simplest fix is to assign tsk->stack right where it is allocated.
Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com
Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/fork.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/kernel/fork.c~forkmemcg-fix-crash-in-free_thread_stack-on-memcg-charge-fail
+++ a/kernel/fork.c
@@ -240,8 +240,10 @@ static unsigned long *alloc_thread_stack
* free_thread_stack() can be called in interrupt context,
* so cache the vm_struct.
*/
- if (stack)
+ if (stack) {
tsk->stack_vm_area = find_vm_area(stack);
+ tsk->stack = stack;
+ }
return stack;
#else
struct page *page = alloc_pages_node(node, THREADINFO_GFP,
@@ -288,7 +290,10 @@ static struct kmem_cache *thread_stack_c
static unsigned long *alloc_thread_stack_node(struct task_struct *tsk,
int node)
{
- return kmem_cache_alloc_node(thread_stack_cache, THREADINFO_GFP, node);
+ unsigned long *stack;
+ stack = kmem_cache_alloc_node(thread_stack_cache, THREADINFO_GFP, node);
+ tsk->stack = stack;
+ return stack;
}
static void free_thread_stack(struct task_struct *tsk)
_
Patches currently in -mm which might be from riel(a)surriel.com are
The patch titled
Subject: mm: thp: fix flags for pmd migration when split
has been removed from the -mm tree. Its filename was
mm-thp-fix-flags-for-pmd-migration-when-split.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm: thp: fix flags for pmd migration when split
When splitting a huge migrating PMD, we'll transfer all the existing PMD
bits and apply them again onto the small PTEs. However we are fetching
the bits unconditionally via pmd_soft_dirty(), pmd_write() or pmd_yound()
while actually they don't make sense at all when it's a migration entry.
Fix them up. Since at it, drop the ifdef together as not needed.
Note that if my understanding is correct about the problem then if without
the patch there is chance to lose some of the dirty bits in the migrating
pmd pages (on x86_64 we're fetching bit 11 which is part of swap offset
instead of bit 2) and it could potentially corrupt the memory of an
userspace program which depends on the dirty bit.
Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Souptick Joarder <jrdr.linux(a)gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Zi Yan <zi.yan(a)cs.rutgers.edu>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
--- a/mm/huge_memory.c~mm-thp-fix-flags-for-pmd-migration-when-split
+++ a/mm/huge_memory.c
@@ -2144,23 +2144,25 @@ static void __split_huge_pmd_locked(stru
*/
old_pmd = pmdp_invalidate(vma, haddr, pmd);
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
pmd_migration = is_pmd_migration_entry(old_pmd);
- if (pmd_migration) {
+ if (unlikely(pmd_migration)) {
swp_entry_t entry;
entry = pmd_to_swp_entry(old_pmd);
page = pfn_to_page(swp_offset(entry));
- } else
-#endif
+ write = is_write_migration_entry(entry);
+ young = false;
+ soft_dirty = pmd_swp_soft_dirty(old_pmd);
+ } else {
page = pmd_page(old_pmd);
+ if (pmd_dirty(old_pmd))
+ SetPageDirty(page);
+ write = pmd_write(old_pmd);
+ young = pmd_young(old_pmd);
+ soft_dirty = pmd_soft_dirty(old_pmd);
+ }
VM_BUG_ON_PAGE(!page_count(page), page);
page_ref_add(page, HPAGE_PMD_NR - 1);
- if (pmd_dirty(old_pmd))
- SetPageDirty(page);
- write = pmd_write(old_pmd);
- young = pmd_young(old_pmd);
- soft_dirty = pmd_soft_dirty(old_pmd);
/*
* Withdraw the table only after we mark the pmd entry invalid.
_
Patches currently in -mm which might be from peterx(a)redhat.com are
userfaultfd-clear-flag-if-remap-event-not-enabled.patch
The patch titled
Subject: mm, memory_hotplug: initialize struct pages for the full memory section
has been removed from the -mm tree. Its filename was
mm-memory_hotplug-initialize-struct-pages-for-the-full-memory-section.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mikhail Zaslonko <zaslonko(a)linux.ibm.com>
Subject: mm, memory_hotplug: initialize struct pages for the full memory section
If memory end is not aligned with the sparse memory section boundary, the
mapping of such a section is only partly initialized. This may lead to
VM_BUG_ON due to uninitialized struct page access from
is_mem_section_removable() or test_pages_in_a_zone() function triggered by
memory_hotplug sysfs handlers:
Here are the the panic examples:
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y
kernel parameter mem=2050M
--------------------------
page:000003d082008000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
[<00000000008f15c4>] show_valid_zones+0x5c/0x190
[<00000000008cf9c4>] dev_attr_show+0x34/0x70
[<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
[<00000000003e4194>] seq_read+0x204/0x480
[<00000000003b53ea>] __vfs_read+0x32/0x178
[<00000000003b55b2>] vfs_read+0x82/0x138
[<00000000003b5be2>] ksys_read+0x5a/0xb0
[<0000000000b86ba0>] system_call+0xdc/0x2d8
Last Breaking-Event-Address:
[<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
Kernel panic - not syncing: Fatal exception: panic_on_oops
kernel parameter mem=3075M
--------------------------
page:000003d08300c000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
[<00000000008f12fa>] show_mem_removable+0x9a/0xd8
[<00000000008cf9c4>] dev_attr_show+0x34/0x70
[<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
[<00000000003e4194>] seq_read+0x204/0x480
[<00000000003b53ea>] __vfs_read+0x32/0x178
[<00000000003b55b2>] vfs_read+0x82/0x138
[<00000000003b5be2>] ksys_read+0x5a/0xb0
[<0000000000b86ba0>] system_call+0xdc/0x2d8
Last Breaking-Event-Address:
[<000000000038596c>] is_mem_section_removable+0xb4/0x190
Kernel panic - not syncing: Fatal exception: panic_on_oops
Fix the problem by initializing the last memory section of each zone in
memmap_init_zone() till the very end, even if it goes beyond the zone end.
Michal said:
: This has alwways been problem AFAIU. It just went unnoticed because we
: have zeroed memmaps during allocation before f7f99100d8d9 ("mm: stop
: zeroing memory during allocation in vmemmap") and so the above test
: would simply skip these ranges as belonging to zone 0 or provided a
: garbage.
:
: So I guess we do care for post f7f99100d8d9 kernels mostly and
: therefore Fixes: f7f99100d8d9 ("mm: stop zeroing memory during
: allocation in vmemmap")
Link: http://lkml.kernel.org/r/20181212172712.34019-2-zaslonko@linux.ibm.com
Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Mikhail Zaslonko <zaslonko(a)linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer(a)de.ibm.com>
Suggested-by: Michal Hocko <mhocko(a)kernel.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Alexander Duyck <alexander.h.duyck(a)linux.intel.com>
Cc: Pasha Tatashin <Pavel.Tatashin(a)microsoft.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/mm/page_alloc.c~mm-memory_hotplug-initialize-struct-pages-for-the-full-memory-section
+++ a/mm/page_alloc.c
@@ -5542,6 +5542,18 @@ void __meminit memmap_init_zone(unsigned
cond_resched();
}
}
+#ifdef CONFIG_SPARSEMEM
+ /*
+ * If the zone does not span the rest of the section then
+ * we should at least initialize those pages. Otherwise we
+ * could blow up on a poisoned page in some paths which depend
+ * on full sections being initialized (e.g. memory hotplug).
+ */
+ while (end_pfn % PAGES_PER_SECTION) {
+ __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
+ end_pfn++;
+ }
+#endif
}
#ifdef CONFIG_ZONE_DEVICE
_
Patches currently in -mm which might be from zaslonko(a)linux.ibm.com are