The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 60d9f50308e5df19bc18c2fefab0eba4a843900a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 16 May 2019 15:48:55 +0100
Subject: [PATCH] Btrfs: fix fsync not persisting changed attributes of a
directory
While logging an inode we follow its ancestors and for each one we mark
it as logged in the current transaction, even if we have not logged it.
As a consequence if we change an attribute of an ancestor, such as the
UID or GID for example, and then explicitly fsync it, we end up not
logging the inode at all despite returning success to user space, which
results in the attribute being lost if a power failure happens after
the fsync.
Sample reproducer:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ chown 6007:6007 /mnt/dir
$ sync
$ chown 9003:9003 /mnt/dir
$ touch /mnt/dir/file
$ xfs_io -c fsync /mnt/dir/file
# fsync our directory after fsync'ing the new file, should persist the
# new values for the uid and gid.
$ xfs_io -c fsync /mnt/dir
<power failure>
$ mount /dev/sdb /mnt
$ stat -c %u:%g /mnt/dir
6007:6007
--> should be 9003:9003, the uid and gid were not persisted, despite
the explicit fsync on the directory prior to the power failure
Fix this by not updating the logged_trans field of ancestor inodes when
logging an inode, since we have not logged them. Let only future calls to
btrfs_log_inode() mark inodes as logged.
This could be triggered by my recent fsync fuzz tester for fstests, for
which an fstests patch exists titled "fstests: generic, fsync fuzz tester
with fsstress".
Fixes: 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 6c47f6ed3e94..de729acee738 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5478,7 +5478,6 @@ static noinline int check_parent_dirs_for_sync(struct btrfs_trans_handle *trans,
{
int ret = 0;
struct dentry *old_parent = NULL;
- struct btrfs_inode *orig_inode = inode;
/*
* for regular files, if its inode is already on disk, we don't
@@ -5498,16 +5497,6 @@ static noinline int check_parent_dirs_for_sync(struct btrfs_trans_handle *trans,
}
while (1) {
- /*
- * If we are logging a directory then we start with our inode,
- * not our parent's inode, so we need to skip setting the
- * logged_trans so that further down in the log code we don't
- * think this inode has already been logged.
- */
- if (inode != orig_inode)
- inode->logged_trans = trans->transid;
- smp_mb();
-
if (btrfs_must_commit_transaction(trans, inode)) {
ret = 1;
break;
@@ -6384,7 +6373,6 @@ void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans,
* if this directory was already logged any new
* names for this file/dir will get recorded
*/
- smp_mb();
if (dir->logged_trans == trans->transid)
return;
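To make the failure mode described above concrete, here is a small userspace model in C. It is a sketch under stated assumptions, not the btrfs code. struct model_inode, model_walk_parents() and model_fsync() are hypothetical. The only behaviour taken from the patch is that the old ancestor walk stamped logged_trans on parents it never logged. The model also assumes that fsync takes an "already logged in this transaction" shortcut whenever logged_trans equals the current transid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct btrfs_inode. */
struct model_inode {
	struct model_inode *parent; /* NULL for the top-level directory */
	uint64_t logged_trans;      /* transaction it was last logged in */
	bool attrs_dirty;           /* e.g. uid/gid changed, not yet logged */
	bool attrs_in_log;          /* true once the change reached the log */
};

/*
 * Models the ancestor walk in check_parent_dirs_for_sync(). With
 * old_behavior set it mimics the removed lines: every ancestor is
 * stamped with the current transid although nothing was logged for it.
 */
static void model_walk_parents(struct model_inode *inode, uint64_t transid,
			       bool old_behavior)
{
	struct model_inode *orig = inode;

	for (; inode != NULL; inode = inode->parent) {
		if (old_behavior && inode != orig)
			inode->logged_trans = transid;
	}
}

/*
 * Models an fsync. Returns true if the inode was really written to the
 * log, false if the "already logged" shortcut was taken instead.
 */
static bool model_fsync(struct model_inode *inode, uint64_t transid,
			bool old_behavior)
{
	assert(inode != NULL);
	model_walk_parents(inode, transid, old_behavior);
	if (inode->logged_trans == transid)
		return false; /* success reported, nothing persisted */
	if (inode->attrs_dirty) {
		inode->attrs_in_log = true;
		inode->attrs_dirty = false;
	}
	inode->logged_trans = transid;
	return true;
}
```

Replaying the reproducer (chown the directory, fsync a new file in it, then fsync the directory) with old_behavior set, the file's fsync stamps the directory, so the directory's fsync returns without writing the uid/gid change. With old_behavior cleared, the directory is logged and the change is captured.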
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5338e43abbab13791144d37fd8846847062351c6 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 15 May 2019 16:02:47 +0100
Subject: [PATCH] Btrfs: fix wrong ctime and mtime of a directory after log
replay
When replaying a log that contains a new file or directory name that needs
to be added to its parent directory, we end up updating the mtime and the
ctime of the parent directory to the current time after we have set their
values to the correct ones (set at fsync time), effectively losing them.
Sample reproducer:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ touch /mnt/dir/file
# fsync of the directory is optional, not needed
$ xfs_io -c fsync /mnt/dir
$ xfs_io -c fsync /mnt/dir/file
$ stat -c %Y /mnt/dir
1557856079
<power failure>
$ sleep 3
$ mount /dev/sdb /mnt
$ stat -c %Y /mnt/dir
1557856082
--> should have been 1557856079, the mtime is updated to the current
time when replaying the log
Fix this by not updating the mtime and ctime to the current time at
btrfs_add_link() when we are replaying a log tree.
This could be triggered by my recent fsync fuzz tester for fstests, for
which an fstests patch exists titled "fstests: generic, fsync fuzz tester
with fsstress".
Fixes: e02119d5a7b43 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b6d549c993f6..6bebc0ca751d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6433,8 +6433,18 @@ int btrfs_add_link(struct btrfs_trans_handle *trans,
btrfs_i_size_write(parent_inode, parent_inode->vfs_inode.i_size +
name_len * 2);
inode_inc_iversion(&parent_inode->vfs_inode);
- parent_inode->vfs_inode.i_mtime = parent_inode->vfs_inode.i_ctime =
- current_time(&parent_inode->vfs_inode);
+ /*
+ * If we are replaying a log tree, we do not want to update the mtime
+ * and ctime of the parent directory with the current time, since the
+ * log replay procedure is responsible for setting them to their correct
+ * values (the ones it had when the fsync was done).
+ */
+ if (!test_bit(BTRFS_FS_LOG_RECOVERING, &root->fs_info->flags)) {
+ struct timespec64 now = current_time(&parent_inode->vfs_inode);
+
+ parent_inode->vfs_inode.i_mtime = now;
+ parent_inode->vfs_inode.i_ctime = now;
+ }
ret = btrfs_update_inode(trans, root, &parent_inode->vfs_inode);
if (ret)
btrfs_abort_transaction(trans, ret);
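A similarly simplified model of the replay ordering that the commit message describes: the logged directory inode restores the fsync-time timestamps first, and re-adding the logged name afterwards must not overwrite them. struct model_dir, model_add_link() and model_replay() are hypothetical. The log_recovering argument stands in for the BTRFS_FS_LOG_RECOVERING flag tested in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the parent directory's timestamps. */
struct model_dir {
	int64_t mtime;
	int64_t ctime;
};

/* Fixed behaviour: only bump the timestamps when not replaying a log. */
static void model_add_link(struct model_dir *parent, int64_t now,
			   bool log_recovering)
{
	assert(parent != NULL);
	if (!log_recovering) {
		parent->mtime = now;
		parent->ctime = now;
	}
}

static void model_replay(struct model_dir *parent, int64_t logged_mtime,
			 int64_t logged_ctime, int64_t now, bool with_fix)
{
	/* Step 1: the logged inode item restores the fsync-time values. */
	parent->mtime = logged_mtime;
	parent->ctime = logged_ctime;
	/*
	 * Step 2: the logged name is re-added. The pre-fix code never
	 * consulted the recovery flag, modeled by passing false for
	 * log_recovering when with_fix is false.
	 */
	model_add_link(parent, now, with_fix);
}
```

Using the values from the reproducer (fsync at 1557856079, replay at 1557856082), the unfixed path leaves the replay time in mtime, while the fixed path keeps 1557856079.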
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5338e43abbab13791144d37fd8846847062351c6 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 15 May 2019 16:02:47 +0100
Subject: [PATCH] Btrfs: fix wrong ctime and mtime of a directory after log
replay
When replaying a log that contains a new file or directory name that needs
to be added to its parent directory, we end up updating the mtime and the
ctime of the parent directory to the current time after we have set their
values to the correct ones (set at fsync time), effectively losing them.
Sample reproducer:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/dir
$ touch /mnt/dir/file
# fsync of the directory is optional, not needed
$ xfs_io -c fsync /mnt/dir
$ xfs_io -c fsync /mnt/dir/file
$ stat -c %Y /mnt/dir
1557856079
<power failure>
$ sleep 3
$ mount /dev/sdb /mnt
$ stat -c %Y /mnt/dir
1557856082
--> should have been 1557856079, the mtime is updated to the current
time when replaying the log
Fix this by not updating the mtime and ctime to the current time at
btrfs_add_link() when we are replaying a log tree.
This could be triggered by my recent fsync fuzz tester for fstests, for
which an fstests patch exists titled "fstests: generic, fsync fuzz tester
with fsstress".
Fixes: e02119d5a7b43 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b6d549c993f6..6bebc0ca751d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6433,8 +6433,18 @@ int btrfs_add_link(struct btrfs_trans_handle *trans,
btrfs_i_size_write(parent_inode, parent_inode->vfs_inode.i_size +
name_len * 2);
inode_inc_iversion(&parent_inode->vfs_inode);
- parent_inode->vfs_inode.i_mtime = parent_inode->vfs_inode.i_ctime =
- current_time(&parent_inode->vfs_inode);
+ /*
+ * If we are replaying a log tree, we do not want to update the mtime
+ * and ctime of the parent directory with the current time, since the
+ * log replay procedure is responsible for setting them to their correct
+ * values (the ones it had when the fsync was done).
+ */
+ if (!test_bit(BTRFS_FS_LOG_RECOVERING, &root->fs_info->flags)) {
+ struct timespec64 now = current_time(&parent_inode->vfs_inode);
+
+ parent_inode->vfs_inode.i_mtime = now;
+ parent_inode->vfs_inode.i_ctime = now;
+ }
ret = btrfs_update_inode(trans, root, &parent_inode->vfs_inode);
if (ret)
btrfs_abort_transaction(trans, ret);
I'm announcing the release of the 5.1.7 kernel.
All users of the 5.1 kernel series must upgrade.
The updated 5.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
drivers/crypto/vmx/ghash.c | 212 ++++++-----------
drivers/net/bonding/bond_main.c | 15 -
drivers/net/dsa/mv88e6xxx/chip.c | 2
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 19 -
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 6
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 2
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 2
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 5
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 11
drivers/net/ethernet/freescale/fec_main.c | 2
drivers/net/ethernet/marvell/mvneta.c | 4
drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 10
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 13 +
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 6
drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c | 11
drivers/net/ethernet/realtek/r8169.c | 3
drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 4
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 8
drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 3
drivers/net/phy/marvell10g.c | 13 +
drivers/net/usb/usbnet.c | 6
include/linux/siphash.h | 5
include/net/netns/ipv4.h | 2
include/uapi/linux/tipc_config.h | 10
net/core/dev.c | 2
net/core/ethtool.c | 8
net/core/skbuff.c | 6
net/ipv4/igmp.c | 47 ++-
net/ipv4/ip_output.c | 4
net/ipv4/route.c | 12
net/ipv6/ip6_output.c | 4
net/ipv6/output_core.c | 30 +-
net/ipv6/raw.c | 2
net/ipv6/route.c | 6
net/llc/llc_output.c | 2
net/sched/act_api.c | 3
net/tipc/core.c | 32 +-
net/tipc/subscr.h | 5
net/tipc/topsrv.c | 14 -
net/tls/tls_device.c | 9
net/tls/tls_sw.c | 19 -
tools/testing/selftests/net/tls.c | 34 ++
43 files changed, 359 insertions(+), 256 deletions(-)
Andy Duan (1):
net: fec: fix the clk mismatch in failed_reset path
Antoine Tenart (1):
net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value
Chris Packham (1):
tipc: Avoid copying bytes beyond the supplied data
Daniel Axtens (1):
crypto: vmx - ghash: do nosimd fallback manually
David Ahern (1):
ipv6: Fix redirect with VRF
David S. Miller (1):
Revert "tipc: fix modprobe tipc failed after switch order of device registration"
Eric Dumazet (5):
inet: switch IP ID generator to siphash
ipv4/igmp: fix another memory leak in igmpv3_del_delrec()
ipv4/igmp: fix build error if !CONFIG_IP_MULTICAST
llc: fix skb leak in llc_build_and_send_ui_pkt()
net-gro: fix use-after-free read in napi_gro_frags()
Greg Kroah-Hartman (1):
Linux 5.1.7
Heiner Kallweit (1):
r8169: fix MAC address being lost in PCI D3
Jakub Kicinski (6):
net/tls: fix lowat calculation if some data came from previous record
selftests/tls: test for lowat overshoot with multiple records
net/tls: fix no wakeup on partial reads
selftests/tls: add test for sleeping even though there is data
net/tls: fix state removal with feature flags off
net/tls: don't ignore netdev notifications if no TLS features
Jarod Wilson (1):
bonding/802.3ad: fix slave link initialization transition states
Jiri Pirko (1):
mlxsw: spectrum_acl: Avoid warning after identical rules insertion
Jisheng Zhang (2):
net: mvneta: Fix err code path of probe
net: stmmac: fix reset gpio free missing
Junwei Hu (1):
tipc: fix modprobe tipc failed after switch order of device registration
Kloetzke Jan (1):
usbnet: fix kernel crash after disconnect
Maxime Chevallier (1):
ethtool: Check for vlan etype or vlan tci when parsing flow_rule
Michael Chan (3):
bnxt_en: Fix aggregation buffer leak under OOM condition.
bnxt_en: Fix possible BUG() condition when calling pci_disable_msix().
bnxt_en: Reduce memory usage when running in kdump kernel.
Mike Manning (1):
ipv6: Consider sk_bound_dev_if when binding a raw socket to an address
Parav Pandit (2):
net/mlx5: Avoid double free in fs init error unwinding path
net/mlx5: Allocate root ns memory using kzalloc to match kfree
Raju Rangoju (1):
cxgb4: offload VLAN flows regardless of VLAN ethtype
Rasmus Villemoes (1):
net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT
Russell King (1):
net: phy: marvell10g: report if the PHY fails to boot firmware
Saeed Mahameed (1):
net/mlx5e: Disable rxhash when CQE compress is enabled
Tan, Tee Min (1):
net: stmmac: fix ethtool flow control not able to get/set
Vishal Kulkarni (1):
cxgb4: Revert "cxgb4: Remove SGE_HOST_PAGE_SIZE dependency on page size"
Vlad Buslov (1):
net: sched: don't use tc_action->order during action dump
Weifeng Voon (1):
net: stmmac: dma channel control register need to be init first
Willem de Bruijn (1):
net: correct zerocopy refcnt with udp MSG_MORE
Hi Sasha,
On Wed, May 29, 2019 at 07:52:27PM -0400, Sasha Levin wrote:
> brcmfmac: fix Oops when bringing up interface during USB disconnect
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
I see you have taken the brcmfmac fixes to stable. If I recall correctly, the
following commit was also quite relevant, and I don't think it was picked up.
It's in the upstream:
commit 5cdb0ef6144f47440850553579aa923c20a63f23
Author: Piotr Figiel <p.figiel(a)camlintechnologies.com>
Date: Mon Mar 4 15:42:52 2019 +0000
brcmfmac: fix NULL pointer derefence during USB disconnect
Maybe you could consider also taking this.
Best regards,
--
Piotr Figiel
Changes since v7 [1]:
- Make subsection helpers pfn based rather than physical-address based
(Oscar and Pavel)
- Make subsection bitmap definition scalable for different section and
sub-section sizes across architectures. As a result:
unsigned long map_active
...is converted to:
DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION)
...and the helpers are renamed with a 'subsection' prefix. (Pavel)
- New in this version is a touch of arch/powerpc/include/asm/sparsemem.h
in "[PATCH v8 01/12] mm/sparsemem: Introduce struct mem_section_usage"
to define ARCH_SUBSECTION_SHIFT.
- Drop "mm/sparsemem: Introduce common definitions for the size and mask
of a section" in favor of Robin's "mm/memremap: Rename and consolidate
SECTION_SIZE" (Pavel)
- Collect some more Reviewed-by tags. Patches that still lack review
tags: 1, 3, 9 - 12
[1]: https://lore.kernel.org/lkml/155677652226.2336373.8700273400832001094.stgit…
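For readers new to the series, a rough userspace sketch of how a pfn-based sub-section bitmap might be sized and populated. The geometry (4 KiB pages, 128 MiB sections, 2 MiB sub-sections) is assumed from the x86 numbers in the cover letter; struct model_section_usage, subsection_index(), subsection_mask_set() and subsection_test() are hypothetical names for this illustration and are not claimed to match the patches.

```c
#include <assert.h>
#include <limits.h>

/* Assumed x86-64 geometry: 4 KiB pages, 128 MiB sections, 2 MiB sub-sections. */
#define MODEL_PAGE_SHIFT        12
#define MODEL_SECTION_SIZE_BITS 27
#define MODEL_SUBSECTION_SHIFT  21

#define PAGES_PER_SECTION    (1UL << (MODEL_SECTION_SIZE_BITS - MODEL_PAGE_SHIFT))
#define PAGES_PER_SUBSECTION (1UL << (MODEL_SUBSECTION_SHIFT - MODEL_PAGE_SHIFT))
#define SUBSECTIONS_PER_SECTION \
	(1UL << (MODEL_SECTION_SIZE_BITS - MODEL_SUBSECTION_SHIFT))

#define BITS_PER_LONG   (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Userspace analogue of DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION). */
struct model_section_usage {
	unsigned long subsection_map[BITMAP_LONGS(SUBSECTIONS_PER_SECTION)];
};

/* Index of the sub-section containing pfn, relative to its section. */
static unsigned long subsection_index(unsigned long pfn)
{
	return (pfn % PAGES_PER_SECTION) / PAGES_PER_SUBSECTION;
}

/* Mark [pfn, pfn + nr_pages) active; the range must stay within one section. */
static void subsection_mask_set(struct model_section_usage *usage,
				unsigned long pfn, unsigned long nr_pages)
{
	unsigned long first, last, i;

	assert(nr_pages > 0);
	assert(pfn / PAGES_PER_SECTION ==
	       (pfn + nr_pages - 1) / PAGES_PER_SECTION);
	first = subsection_index(pfn);
	last = subsection_index(pfn + nr_pages - 1);
	for (i = first; i <= last; i++)
		usage->subsection_map[i / BITS_PER_LONG] |=
			1UL << (i % BITS_PER_LONG);
}

/* Returns 1 if the sub-section containing pfn is marked active. */
static int subsection_test(const struct model_section_usage *usage,
			   unsigned long pfn)
{
	unsigned long i = subsection_index(pfn);

	return (usage->subsection_map[i / BITS_PER_LONG] >>
		(i % BITS_PER_LONG)) & 1UL;
}
```

With this geometry a section holds 64 sub-sections of 512 pages each, so the whole map fits in one 64-bit word. A 4 MiB range starting 2 MiB into a section sets bits 1 and 2.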
---
[merge logistics]
Hi Andrew,
These are too late for v5.2, I'm posting this v8 during the merge window
to maintain the review momentum.
---
[cover letter]
The memory hotplug section is an arbitrary / convenient unit for memory
hotplug. 'Section-size' units have bled into the user interface
('memblock' sysfs) and cannot be changed without breaking existing
userspace. The section-size constraint, while mostly benign for typical
memory hotplug, has wreaked and continues to wreak havoc with 'device-memory'
use cases, persistent memory (pmem) in particular. Recall that pmem uses
devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
'struct page' memmap for pmem. However, it does not use the 'bottom
half' of memory hotplug, i.e. never marks pmem pages online and never
exposes the userspace memblock interface for pmem. This leaves an
opening to redress the section-size constraint.
To date, the libnvdimm subsystem has attempted to inject padding to
satisfy the internal constraints of arch_add_memory(). Beyond
complicating the code, leading to bugs [2], wasting memory, and limiting
configuration flexibility, the padding hack is broken when the platform
changes the physical memory alignment of pmem from one boot to the
next. Device failure (intermittent or permanent) and physical
reconfiguration are events that can cause the platform firmware to
change the physical placement of pmem on a subsequent boot, and device
failure is an everyday event in a data-center.
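As a rough illustration of why section-sized padding wastes capacity, the arithmetic below compares the padding needed to align a range start to 128 MiB versus 2 MiB. This is illustrative arithmetic only: pad_to_align() and the example addresses are hypothetical, and the actual libnvdimm padding logic (which also deals with the end of the range and the info-block reservation) is more involved.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M   (UINT64_C(2) << 20)
#define SZ_128M (UINT64_C(128) << 20)

/* Bytes needed to move start up to the next multiple of align (a power of two). */
static uint64_t pad_to_align(uint64_t start, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (align - (start & (align - 1))) & (align - 1);
}
```

For a hypothetical range starting at 0x104000000 (4 GiB + 64 MiB), aligning to a 128 MiB section costs 64 MiB of padding, while 2 MiB sub-section granularity needs none. If firmware later shifts the start by even 1 MiB, the 128 MiB padding changes too, which is the instability the cover letter describes.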
It turns out that sections are only a hard requirement of the
user-facing interface for memory hotplug, and with a bit more
infrastructure, sub-section arch_add_memory() support can be added for
kernel-internal usages like devm_memremap_pages(). Here is an analysis
of the design assumptions in the current code and how they are
addressed in the new implementation:
Current design assumptions:
- Sections that describe boot memory (early sections) are never
unplugged / removed.
- pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y case, devolves to a
  valid_section() check
- __add_pages() and helper routines assume all operations occur in
PAGES_PER_SECTION units.
- The memblock sysfs interface only comprehends full sections
New design assumptions:
- Sections are instrumented with a sub-section bitmask to track (on x86)
individual 2MB sub-divisions of a 128MB section.
- Partially populated early sections can be extended with additional
sub-sections, and those sub-sections can be removed with
arch_remove_memory(). With this in place we no longer lose usable memory
capacity to padding.
- pfn_valid() is updated to look deeper than valid_section() to also check the
  active-sub-section mask. This indication is in the same cacheline as
  the valid_section() state, so the performance impact is expected to be
  negligible. So far the lkp robot has not reported any regressions.
- Outside of the core vmemmap population routines which are replaced,
other helper routines like shrink_{zone,pgdat}_span() are updated to
handle the smaller granularity. Core memory hotplug routines that deal
with online memory are not touched.
- The existing memblock sysfs user API guarantees / assumptions are
not touched since this capability is limited to !online
!memblock-sysfs-accessible sections.
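The pfn_valid() change in the list above can be sketched in userspace as follows. The sketch assumes the x86 geometry from the cover letter (128 MiB sections, 2 MiB sub-sections, 4 KiB pages) and ignores the special handling of boot-time (early) sections. struct model_mem_section, the sections[] table and the helper names are hypothetical and do not reproduce the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 geometry: 4 KiB pages, 128 MiB sections, 2 MiB sub-sections. */
#define PAGES_PER_SECTION    (1UL << 15)
#define PAGES_PER_SUBSECTION (1UL << 9)
#define NR_MODEL_SECTIONS    8 /* tiny hypothetical physical address space */

/* Hypothetical stand-in for struct mem_section plus its usage bitmap. */
struct model_mem_section {
	bool valid;              /* what valid_section() would report */
	uint64_t subsection_map; /* 64 sub-sections fit in one word */
};

static struct model_mem_section sections[NR_MODEL_SECTIONS];

/* Old rule: any pfn inside a valid section is considered valid. */
static bool pfn_valid_section_only(unsigned long pfn)
{
	unsigned long nr = pfn / PAGES_PER_SECTION;

	return nr < NR_MODEL_SECTIONS && sections[nr].valid;
}

/* New rule: additionally require the pfn's sub-section to be active. */
static bool pfn_valid_subsection(unsigned long pfn)
{
	unsigned long nr = pfn / PAGES_PER_SECTION;
	unsigned long idx = (pfn % PAGES_PER_SECTION) / PAGES_PER_SUBSECTION;

	if (!pfn_valid_section_only(pfn))
		return false;
	return (sections[nr].subsection_map >> idx) & 1;
}

/* Hot-add nr_subsections 2 MiB chunks starting at sub-section first. */
static void model_add_subsections(unsigned long section_nr, unsigned first,
				  unsigned nr_subsections)
{
	assert(section_nr < NR_MODEL_SECTIONS);
	assert(first + nr_subsections <= 64);
	sections[section_nr].valid = true;
	for (unsigned i = first; i < first + nr_subsections; i++)
		sections[section_nr].subsection_map |= UINT64_C(1) << i;
}
```

After hot-adding only sub-sections 2 and 3 of a section, the section-only check reports every pfn in that section as valid. The sub-section check accepts only the 4 MiB that actually has a memmap behind it.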
Meanwhile the issue reports continue to roll in from users that do not
understand when and how the 128MB constraint will bite them. The current
implementation relied on being able to support at least one misaligned
namespace, but that immediately falls over on any moderately complex
namespace creation attempt. Beyond the initial problem of 'System RAM'
colliding with pmem, and the unsolvable problem of physical alignment
changes, Linux is now being exposed to platforms that collide pmem
ranges with other pmem ranges by default [3]. In short,
devm_memremap_pages() has pushed the venerable section-size constraint
past the breaking point, and the simplicity of section-aligned
arch_add_memory() is no longer tenable.
These patches are exposed to the kbuild robot on my libnvdimm-pending
branch [4], and a preview of the unit test for this functionality is
available on the 'subsection-pending' branch of ndctl [5].
[2]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwi…
[3]: https://github.com/pmem/ndctl/issues/76
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=libn…
[5]: https://github.com/pmem/ndctl/commit/7c59b4867e1c
---
Dan Williams (11):
mm/sparsemem: Introduce struct mem_section_usage
mm/sparsemem: Add helpers track active portions of a section at boot
mm/hotplug: Prepare shrink_{zone,pgdat}_span for sub-section removal
mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap()
mm/hotplug: Kill is_dev_zone() usage in __remove_pages()
mm: Kill is_dev_zone() helper
mm/sparsemem: Prepare for sub-section ranges
mm/sparsemem: Support sub-section hotplug
mm/devm_memremap_pages: Enable sub-section remap
libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields
libnvdimm/pfn: Stop padding pmem namespaces to section alignment
Robin Murphy (1):
mm/memremap: Rename and consolidate SECTION_SIZE
arch/powerpc/include/asm/sparsemem.h | 3
arch/x86/mm/init_64.c | 4
drivers/nvdimm/dax_devs.c | 2
drivers/nvdimm/pfn.h | 15 -
drivers/nvdimm/pfn_devs.c | 95 +++------
include/linux/memory_hotplug.h | 7 -
include/linux/mm.h | 4
include/linux/mmzone.h | 93 +++++++--
kernel/memremap.c | 63 ++----
mm/hmm.c | 2
mm/memory_hotplug.c | 172 +++++++++-------
mm/page_alloc.c | 8 -
mm/sparse-vmemmap.c | 21 +-
mm/sparse.c | 369 +++++++++++++++++++++++-----------
14 files changed, 511 insertions(+), 347 deletions(-)