This is the start of the stable review cycle for the 4.14.217 release. There are 50 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 24 Jan 2021 13:57:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.217-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.217-rc1
Michael Hennerich michael.hennerich@analog.com spi: cadence: cache reference clock rate during probe
Aya Levin ayal@nvidia.com net: ipv6: Validate GSO SKB before finish IPv6 processing
Jason A. Donenfeld Jason@zx2c4.com net: skbuff: disambiguate argument and member for skb_list_walk_safe helper
Jason A. Donenfeld Jason@zx2c4.com net: introduce skb_list_walk_safe for skb segment walking
Edward Cree ecree@solarflare.com net: use skb_list_del_init() to remove from RX sublists
Hoang Le hoang.h.le@dektech.com.au tipc: fix NULL deref in tipc_link_xmit()
David Howells dhowells@redhat.com rxrpc: Fix handling of an unsupported token type in rxrpc_read()
Eric Dumazet edumazet@google.com net: avoid 32 x truesize under-estimation for tiny skbs
Jakub Kicinski kuba@kernel.org net: sit: unregister_netdevice on newlink's error path
David Wu david.wu@rock-chips.com net: stmmac: Fixed mtu channged by cache aligned
Petr Machata petrm@nvidia.com net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands
Petr Machata me@pmachata.org net: dcb: Validate netlink message in DCB handler
Willem de Bruijn willemb@google.com esp: avoid unneeded kmap_atomic call
Andrey Zhizhikin andrey.zhizhikin@leica-geosystems.com rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM request
Manish Chopra manishc@marvell.com netxen_nic: fix MSI/MSI-x interrupts
J. Bruce Fields bfields@redhat.com nfsd4: readdirplus shouldn't return parent of export
Will Deacon will@kernel.org compiler.h: Raise minimum version of GCC to 5.1 for arm64
Hamish Martin hamish.martin@alliedtelesis.co.nz usb: ohci: Make distrust_firmware param default to false
Jesper Dangaard Brouer brouer@redhat.com netfilter: conntrack: fix reading nf_conntrack_buckets
Geert Uytterhoeven geert+renesas@glider.be ALSA: fireface: Fix integer overflow in transmit_midi_msg()
Geert Uytterhoeven geert+renesas@glider.be ALSA: firewire-tascam: Fix integer overflow in midi_port_work()
Mike Snitzer snitzer@redhat.com dm: eliminate potential source of excessive kernel log noise
j.nixdorf@avm.de j.nixdorf@avm.de net: sunrpc: interpret the return value of kstrtou32 correctly
Jann Horn jannh@google.com mm, slub: consider rest of partial list if acquire_slab() fails
Dinghao Liu dinghao.liu@zju.edu.cn RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grp
Jan Kara jack@suse.cz ext4: fix superblock checksum failure when setting password salt
Trond Myklebust trond.myklebust@hammerspace.com NFS: nfs_igrab_and_active must first reference the superblock
Trond Myklebust trond.myklebust@hammerspace.com pNFS: Mark layout for return if return-on-close was not sent
Dave Wysochanski dwysocha@redhat.com NFS4: Fix use-after-free in trace_event_raw_event_nfs4_set_lock
Dan Carpenter dan.carpenter@oracle.com ASoC: Intel: fix error code cnl_set_dsp_D0()
Al Viro viro@zeniv.linux.org.uk dump_common_audit_data(): fix racy accesses to ->d_name
Arnd Bergmann arnd@arndb.de ARM: picoxcell: fix missing interrupt-parent properties
Shawn Guo shawn.guo@linaro.org ACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPI
Michael Ellerman mpe@ellerman.id.au net: ethernet: fs_enet: Add missing MODULE_LICENSE
Arnd Bergmann arnd@arndb.de misdn: dsp: select CONFIG_BITREVERSE
Randy Dunlap rdunlap@infradead.org arch/arc: add copy_user_page() to <asm/page.h> to fix build error on ARC
Rasmus Villemoes rasmus.villemoes@prevas.dk ethernet: ucc_geth: fix definition and size of ucc_geth_tx_global_pram
Filipe Manana fdmanana@suse.com btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan
Masahiro Yamada masahiroy@kernel.org ARC: build: add boot_targets to PHONY
Masahiro Yamada masahiroy@kernel.org ARC: build: add uImage.lzma to the top-level target
Masahiro Yamada masahiroy@kernel.org ARC: build: remove non-existing bootpImage from KBUILD_IMAGE
yangerkun yangerkun@huawei.com ext4: fix bug for rename with RENAME_WHITEOUT
Leon Schuermann leon@is.currently.online r8152: Add Lenovo Powered USB-C Travel Hub
Akilesh Kailash akailash@google.com dm snapshot: flush merged data before committing metadata
Miaohe Lin linmiaohe@huawei.com mm/hugetlb: fix potential missing huge page size info
Dexuan Cui decui@microsoft.com ACPI: scan: Harden acpi_device_add() against device ID overflows
Alexander Lobakin alobakin@pm.me MIPS: relocatable: fix possible boot hangup with KASLR enabled
Al Viro viro@zeniv.linux.org.uk MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
Paul Cercueil paul@crapouillou.net MIPS: boot: Fix unaligned access with CONFIG_MIPS_RAW_APPENDED_DTB
Thomas Hebb tommyhebb@gmail.com ASoC: dapm: remove widget from dirty list on free
-------------
Diffstat:
Makefile | 4 +-- arch/arc/Makefile | 9 ++--- arch/arc/include/asm/page.h | 1 + arch/arm/boot/dts/picoxcell-pc3x2.dtsi | 4 +++ arch/mips/boot/compressed/decompress.c | 3 +- arch/mips/kernel/binfmt_elfn32.c | 7 ++++ arch/mips/kernel/binfmt_elfo32.c | 7 ++++ arch/mips/kernel/relocate.c | 10 ++++-- drivers/acpi/internal.h | 2 +- drivers/acpi/scan.c | 15 +++++++- drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 3 ++ drivers/isdn/mISDN/Kconfig | 1 + drivers/md/dm-snap.c | 24 +++++++++++++ drivers/md/dm.c | 2 +- .../net/ethernet/freescale/fs_enet/mii-bitbang.c | 1 + drivers/net/ethernet/freescale/fs_enet/mii-fec.c | 1 + drivers/net/ethernet/freescale/ucc_geth.h | 9 ++++- .../net/ethernet/qlogic/netxen/netxen_nic_main.c | 7 +--- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +- drivers/net/usb/cdc_ether.c | 7 ++++ drivers/net/usb/r8152.c | 1 + drivers/net/usb/rndis_host.c | 2 +- drivers/spi/spi-cadence.c | 6 ++-- drivers/usb/host/ohci-hcd.c | 2 +- fs/btrfs/qgroup.c | 13 +++++-- fs/btrfs/super.c | 8 +++++ fs/ext4/ioctl.c | 3 ++ fs/ext4/namei.c | 16 +++++---- fs/nfs/internal.h | 12 ++++--- fs/nfs/nfs4proc.c | 2 +- fs/nfs/pnfs.c | 6 ++++ fs/nfsd/nfs3xdr.c | 7 +++- include/linux/acpi.h | 7 ++++ include/linux/compiler-gcc.h | 6 ++++ include/linux/skbuff.h | 16 +++++++++ mm/hugetlb.c | 2 +- mm/slub.c | 2 +- net/core/skbuff.c | 9 +++-- net/dcb/dcbnl.c | 2 ++ net/ipv4/esp4.c | 7 +--- net/ipv6/esp6.c | 7 +--- net/ipv6/ip6_output.c | 40 +++++++++++++++++++++- net/ipv6/sit.c | 5 ++- net/netfilter/nf_conntrack_standalone.c | 3 ++ net/rxrpc/key.c | 6 ++-- net/sunrpc/addr.c | 2 +- net/tipc/link.c | 9 +++-- security/lsm_audit.c | 7 ++-- sound/firewire/fireface/ff-transaction.c | 2 +- sound/firewire/tascam/tascam-transaction.c | 2 +- sound/soc/intel/skylake/cnl-sst.c | 1 + sound/soc/soc-dapm.c | 1 + 52 files changed, 263 insertions(+), 71 deletions(-)
From: Thomas Hebb tommyhebb@gmail.com
commit 5c6679b5cb120f07652418524ab186ac47680b49 upstream.
A widget's "dirty" list_head, much like its "list" list_head, eventually chains back to a list_head on the snd_soc_card itself. This means that the list can stick around even after the widget (or all widgets) have been freed. Currently, however, widgets that are in the dirty list when freed remain there, corrupting the entire list and leading to memory errors and undefined behavior when the list is next accessed or modified.
I encountered this issue when a component failed to probe relatively late in snd_soc_bind_card(), causing it to bail out and call soc_cleanup_card_resources(), which eventually called snd_soc_dapm_free() with widgets that were still dirty from when they'd been added.
Fixes: db432b414e20 ("ASoC: Do DAPM power checks only for widgets changed since last run") Cc: stable@vger.kernel.org Signed-off-by: Thomas Hebb tommyhebb@gmail.com Reviewed-by: Charles Keepax ckeepax@opensource.cirrus.com Link: https://lore.kernel.org/r/f8b5f031d50122bf1a9bfc9cae046badf4a7a31a.160782241... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/soc-dapm.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/soc/soc-dapm.c +++ b/sound/soc/soc-dapm.c @@ -2434,6 +2434,7 @@ void snd_soc_dapm_free_widget(struct snd enum snd_soc_dapm_direction dir;
list_del(&w->list); + list_del(&w->dirty); /* * remove source and sink paths associated to this widget. * While removing the path, remove reference to it from both
From: Paul Cercueil paul@crapouillou.net
commit 4d4f9c1a17a3480f8fe523673f7232b254d724b7 upstream.
The compressed payload is not necesarily 4-byte aligned, at least when compiling with Clang. In that case, the 4-byte value appended to the compressed payload that corresponds to the uncompressed kernel image size must be read using get_unaligned_le32().
This fixes Clang-built kernels not booting on MIPS (tested on a Ingenic JZ4770 board).
Fixes: b8f54f2cde78 ("MIPS: ZBOOT: copy appended dtb to the end of the kernel") Cc: stable@vger.kernel.org # v4.7 Signed-off-by: Paul Cercueil paul@crapouillou.net Reviewed-by: Nick Desaulniers ndesaulniers@google.com Reviewed-by: Philippe Mathieu-Daudé f4bug@amsat.org Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/boot/compressed/decompress.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/mips/boot/compressed/decompress.c +++ b/arch/mips/boot/compressed/decompress.c @@ -17,6 +17,7 @@ #include <linux/libfdt.h>
#include <asm/addrspace.h> +#include <asm/unaligned.h>
/* * These two variables specify the free mem region @@ -124,7 +125,7 @@ void decompress_kernel(unsigned long boo dtb_size = fdt_totalsize((void *)&__appended_dtb);
/* last four bytes is always image size in little endian */ - image_size = le32_to_cpup((void *)&__image_end - 4); + image_size = get_unaligned_le32((void *)&__image_end - 4);
/* copy dtb to where the booted kernel will expect it */ memcpy((void *)VMLINUX_LOAD_ADDRESS_ULL + image_size,
From: Al Viro viro@zeniv.linux.org.uk
commit 698222457465ce343443be81c5512edda86e5914 upstream.
Patches that introduced NT_FILE and NT_SIGINFO notes back in 2012 had taken care of native (fs/binfmt_elf.c) and compat (fs/compat_binfmt_elf.c) coredumps; unfortunately, compat on mips (which does not go through the usual compat_binfmt_elf.c) had not been noticed.
As the result, both N32 and O32 coredumps on 64bit mips kernels have those sections malformed enough to confuse the living hell out of all gdb and readelf versions (up to and including the tip of binutils-gdb.git).
Longer term solution is to make both O32 and N32 compat use the regular compat_binfmt_elf.c, but that's too much for backports. The minimal solution is to do in arch/mips/kernel/binfmt_elf[on]32.c the same thing those patches have done in fs/compat_binfmt_elf.c
Cc: stable@kernel.org # v3.7+ Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/kernel/binfmt_elfn32.c | 7 +++++++ arch/mips/kernel/binfmt_elfo32.c | 7 +++++++ 2 files changed, 14 insertions(+)
--- a/arch/mips/kernel/binfmt_elfn32.c +++ b/arch/mips/kernel/binfmt_elfn32.c @@ -103,4 +103,11 @@ jiffies_to_compat_timeval(unsigned long #undef ns_to_timeval #define ns_to_timeval ns_to_compat_timeval
+/* + * Some data types as stored in coredump. + */ +#define user_long_t compat_long_t +#define user_siginfo_t compat_siginfo_t +#define copy_siginfo_to_external copy_siginfo_to_external32 + #include "../../../fs/binfmt_elf.c" --- a/arch/mips/kernel/binfmt_elfo32.c +++ b/arch/mips/kernel/binfmt_elfo32.c @@ -106,4 +106,11 @@ jiffies_to_compat_timeval(unsigned long #undef ns_to_timeval #define ns_to_timeval ns_to_compat_timeval
+/* + * Some data types as stored in coredump. + */ +#define user_long_t compat_long_t +#define user_siginfo_t compat_siginfo_t +#define copy_siginfo_to_external copy_siginfo_to_external32 + #include "../../../fs/binfmt_elf.c"
From: Alexander Lobakin alobakin@pm.me
commit 69e976831cd53f9ba304fd20305b2025ecc78eab upstream.
LLVM-built Linux triggered a boot hangup with KASLR enabled.
arch/mips/kernel/relocate.c:get_random_boot() uses linux_banner, which is a string constant, as a random seed, but accesses it as an array of unsigned long (in rotate_xor()). When the address of linux_banner is not aligned to sizeof(long), such access emits unaligned access exception and hangs the kernel.
Use PTR_ALIGN() to align input address to sizeof(long) and also align down the input length to prevent possible access-beyond-end.
Fixes: 405bc8fd12f5 ("MIPS: Kernel: Implement KASLR using CONFIG_RELOCATABLE") Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Alexander Lobakin alobakin@pm.me Tested-by: Nathan Chancellor natechancellor@gmail.com Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/kernel/relocate.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/arch/mips/kernel/relocate.c +++ b/arch/mips/kernel/relocate.c @@ -187,8 +187,14 @@ static int __init relocate_exception_tab static inline __init unsigned long rotate_xor(unsigned long hash, const void *area, size_t size) { - size_t i; - unsigned long *ptr = (unsigned long *)area; + const typeof(hash) *ptr = PTR_ALIGN(area, sizeof(hash)); + size_t diff, i; + + diff = (void *)ptr - area; + if (unlikely(size < diff + sizeof(hash))) + return hash; + + size = ALIGN_DOWN(size - diff, sizeof(hash));
for (i = 0; i < size / sizeof(hash); i++) { /* Rotate by odd number of bits and XOR. */
From: Dexuan Cui decui@microsoft.com
commit a58015d638cd4e4555297b04bec9b49028369075 upstream.
Linux VM on Hyper-V crashes with the latest mainline:
[ 4.069624] detected buffer overflow in strcpy [ 4.077733] kernel BUG at lib/string.c:1149! .. [ 4.085819] RIP: 0010:fortify_panic+0xf/0x11 ... [ 4.085819] Call Trace: [ 4.085819] acpi_device_add.cold.15+0xf2/0xfb [ 4.085819] acpi_add_single_object+0x2a6/0x690 [ 4.085819] acpi_bus_check_add+0xc6/0x280 [ 4.085819] acpi_ns_walk_namespace+0xda/0x1aa [ 4.085819] acpi_walk_namespace+0x9a/0xc2 [ 4.085819] acpi_bus_scan+0x78/0x90 [ 4.085819] acpi_scan_init+0xfa/0x248 [ 4.085819] acpi_init+0x2c1/0x321 [ 4.085819] do_one_initcall+0x44/0x1d0 [ 4.085819] kernel_init_freeable+0x1ab/0x1f4
This is because of the recent buffer overflow detection in the commit 6a39e62abbaf ("lib: string.h: detect intra-object overflow in fortified string functions")
Here acpi_device_bus_id->bus_id can only hold 14 characters, while the the acpi_device_hid(device) returns a 22-char string "HYPER_V_GEN_COUNTER_V1".
Per ACPI Spec v6.2, Section 6.1.5 _HID (Hardware ID), if the ID is a string, it must be of the form AAA#### or NNNN####, i.e. 7 chars or 8 chars.
The field bus_id in struct acpi_device_bus_id was originally defined as char bus_id[9], and later was enlarged to char bus_id[15] in 2007 in the commit bb0958544f3c ("ACPI: use more understandable bus_id for ACPI devices")
Fix the issue by changing the field bus_id to const char *, and use kstrdup_const() to initialize it.
Signed-off-by: Dexuan Cui decui@microsoft.com Tested-By: Jethro Beekman jethro@fortanix.com [ rjw: Subject change, whitespace adjustment ] Cc: All applicable stable@vger.kernel.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/internal.h | 2 +- drivers/acpi/scan.c | 15 ++++++++++++++- 2 files changed, 15 insertions(+), 2 deletions(-)
--- a/drivers/acpi/internal.h +++ b/drivers/acpi/internal.h @@ -98,7 +98,7 @@ void acpi_scan_table_handler(u32 event, extern struct list_head acpi_bus_id_list;
struct acpi_device_bus_id { - char bus_id[15]; + const char *bus_id; unsigned int instance_no; struct list_head node; }; --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -485,6 +485,7 @@ static void acpi_device_del(struct acpi_ acpi_device_bus_id->instance_no--; else { list_del(&acpi_device_bus_id->node); + kfree_const(acpi_device_bus_id->bus_id); kfree(acpi_device_bus_id); } break; @@ -673,7 +674,14 @@ int acpi_device_add(struct acpi_device * } if (!found) { acpi_device_bus_id = new_bus_id; - strcpy(acpi_device_bus_id->bus_id, acpi_device_hid(device)); + acpi_device_bus_id->bus_id = + kstrdup_const(acpi_device_hid(device), GFP_KERNEL); + if (!acpi_device_bus_id->bus_id) { + pr_err(PREFIX "Memory allocation error for bus id\n"); + result = -ENOMEM; + goto err_free_new_bus_id; + } + acpi_device_bus_id->instance_no = 0; list_add_tail(&acpi_device_bus_id->node, &acpi_bus_id_list); } @@ -708,6 +716,11 @@ int acpi_device_add(struct acpi_device * if (device->parent) list_del(&device->node); list_del(&device->wakeup_list); + + err_free_new_bus_id: + if (!found) + kfree(new_bus_id); + mutex_unlock(&acpi_device_lock);
err_detach:
From: Miaohe Lin linmiaohe@huawei.com
commit 0eb98f1588c2cc7a79816d84ab18a55d254f481c upstream.
The huge page size is encoded for VM_FAULT_HWPOISON errors only. So if we return VM_FAULT_HWPOISON, huge page size would just be ignored.
Link: https://lkml.kernel.org/r/20210107123449.38481-1-linmiaohe@huawei.com Fixes: aa50d3a7aa81 ("Encode huge page size for VM_FAULT_HWPOISON errors") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/hugetlb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3797,7 +3797,7 @@ retry: * So we need to block hugepage fault by PG_hwpoison bit check. */ if (unlikely(PageHWPoison(page))) { - ret = VM_FAULT_HWPOISON | + ret = VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(hstate_index(h)); goto backout_unlocked; }
From: Akilesh Kailash akailash@google.com
commit fcc42338375a1e67b8568dbb558f8b784d0f3b01 upstream.
If the origin device has a volatile write-back cache and the following events occur:
1: After finishing merge operation of one set of exceptions, merge_callback() is invoked. 2: Update the metadata in COW device tracking the merge completion. This update to COW device is flushed cleanly. 3: System crashes and the origin device's cache where the recent merge was completed has not been flushed.
During the next cycle when we read the metadata from the COW device, we will skip reading those metadata whose merge was completed in step (1). This will lead to data loss/corruption.
To address this, flush the origin device post merge IO before updating the metadata.
Cc: stable@vger.kernel.org Signed-off-by: Akilesh Kailash akailash@google.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-snap.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
--- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -137,6 +137,11 @@ struct dm_snapshot { * for them to be committed. */ struct bio_list bios_queued_during_merge; + + /* + * Flush data after merge. + */ + struct bio flush_bio; };
/* @@ -1060,6 +1065,17 @@ shut:
static void error_bios(struct bio *bio);
+static int flush_data(struct dm_snapshot *s) +{ + struct bio *flush_bio = &s->flush_bio; + + bio_reset(flush_bio); + bio_set_dev(flush_bio, s->origin->bdev); + flush_bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH; + + return submit_bio_wait(flush_bio); +} + static void merge_callback(int read_err, unsigned long write_err, void *context) { struct dm_snapshot *s = context; @@ -1073,6 +1089,11 @@ static void merge_callback(int read_err, goto shut; }
+ if (flush_data(s) < 0) { + DMERR("Flush after merge failed: shutting down merge"); + goto shut; + } + if (s->store->type->commit_merge(s->store, s->num_merging_chunks) < 0) { DMERR("Write error in exception store: shutting down merge"); @@ -1197,6 +1218,7 @@ static int snapshot_ctr(struct dm_target s->first_merging_chunk = 0; s->num_merging_chunks = 0; bio_list_init(&s->bios_queued_during_merge); + bio_init(&s->flush_bio, NULL, 0);
/* Allocate hash table for COW data */ if (init_hash_tables(s)) { @@ -1391,6 +1413,8 @@ static void snapshot_dtr(struct dm_targe
mutex_destroy(&s->lock);
+ bio_uninit(&s->flush_bio); + dm_put_device(ti, s->cow);
dm_put_device(ti, s->origin);
From: Leon Schuermann leon@is.currently.online
commit cb82a54904a99df9e8f9e9d282046055dae5a730 upstream.
This USB-C Hub (17ef:721e) based on the Realtek RTL8153B chip used to use the cdc_ether driver. However, using this driver, with the system suspended the device constantly sends pause-frames as soon as the receive buffer fills up. This causes issues with other devices, where some Ethernet switches stop forwarding packets altogether.
Using the Realtek driver (r8152) fixes this issue. Pause frames are no longer sent while the host system is suspended.
Signed-off-by: Leon Schuermann leon@is.currently.online Tested-by: Leon Schuermann leon@is.currently.online Link: https://lore.kernel.org/r/20210111190312.12589-2-leon@is.currently.online Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/usb/cdc_ether.c | 7 +++++++ drivers/net/usb/r8152.c | 1 + 2 files changed, 8 insertions(+)
--- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -800,6 +800,13 @@ static const struct usb_device_id produc .driver_info = 0, },
+/* Lenovo Powered USB-C Travel Hub (4X90S92381, based on Realtek RTL8153) */ +{ + USB_DEVICE_AND_INTERFACE_INFO(LENOVO_VENDOR_ID, 0x721e, USB_CLASS_COMM, + USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), + .driver_info = 0, +}, + /* ThinkPad USB-C Dock Gen 2 (based on Realtek RTL8153) */ { USB_DEVICE_AND_INTERFACE_INFO(LENOVO_VENDOR_ID, 0xa387, USB_CLASS_COMM, --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -5337,6 +5337,7 @@ static const struct usb_device_id rtl815 {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO, 0x7205)}, {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO, 0x720c)}, {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO, 0x7214)}, + {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO, 0x721e)}, {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO, 0xa387)}, {REALTEK_USB_DEVICE(VENDOR_ID_LINKSYS, 0x0041)}, {REALTEK_USB_DEVICE(VENDOR_ID_NVIDIA, 0x09ff)},
From: yangerkun yangerkun@huawei.com
[ Upstream commit 6b4b8e6b4ad8553660421d6360678b3811d5deb9 ]
We got a "deleted inode referenced" warning cross our fsstress test. The bug can be reproduced easily with following steps:
cd /dev/shm mkdir test/ fallocate -l 128M img mkfs.ext4 -b 1024 img mount img test/ dd if=/dev/zero of=test/foo bs=1M count=128 mkdir test/dir/ && cd test/dir/ for ((i=0;i<1000;i++)); do touch file$i; done # consume all block cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD, /dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in ext4_rename will return ENOSPC!! cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1 We will get the output: "ls: cannot access 'test/dir/file1': Structure needs cleaning" and the dmesg show: "EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls: deleted inode referenced: 139"
ext4_rename will create a special inode for whiteout and use this 'ino' to replace the source file's dir entry 'ino'. Once error happens latter(the error above was the ENOSPC return from ext4_add_entry in ext4_rename since all space has been consumed), the cleanup do drop the nlink for whiteout, but forget to restore 'ino' with source file. This will trigger the bug describle as above.
Signed-off-by: yangerkun yangerkun@huawei.com Reviewed-by: Jan Kara jack@suse.cz Cc: stable@vger.kernel.org Fixes: cd808deced43 ("ext4: support RENAME_WHITEOUT") Link: https://lore.kernel.org/r/20210105062857.3566-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/namei.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 6936de30fcf0d..a4301fa4719ff 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -3442,8 +3442,6 @@ static int ext4_setent(handle_t *handle, struct ext4_renament *ent, return retval; } } - brelse(ent->bh); - ent->bh = NULL;
return 0; } @@ -3656,6 +3654,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry, } }
+ old_file_type = old.de->file_type; if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir)) ext4_handle_sync(handle);
@@ -3683,7 +3682,6 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry, force_reread = (new.dir->i_ino == old.dir->i_ino && ext4_test_inode_flag(new.dir, EXT4_INODE_INLINE_DATA));
- old_file_type = old.de->file_type; if (whiteout) { /* * Do this before adding a new entry, so the old entry is sure @@ -3755,15 +3753,19 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry, retval = 0;
end_rename: - brelse(old.dir_bh); - brelse(old.bh); - brelse(new.bh); if (whiteout) { - if (retval) + if (retval) { + ext4_setent(handle, &old, + old.inode->i_ino, old_file_type); drop_nlink(whiteout); + } unlock_new_inode(whiteout); iput(whiteout); + } + brelse(old.dir_bh); + brelse(old.bh); + brelse(new.bh); if (handle) ext4_journal_stop(handle); return retval;
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit 9836720911cfec25d3fbdead1c438bf87e0f2841 ]
The deb-pkg builds for ARCH=arc fail.
$ export CROSS_COMPILE=<your-arc-compiler-prefix> $ make -s ARCH=arc defconfig $ make ARCH=arc bindeb-pkg SORTTAB vmlinux SYSMAP System.map MODPOST Module.symvers make KERNELRELEASE=5.10.0-rc4 ARCH=arc KBUILD_BUILD_VERSION=2 -f ./Makefile intdeb-pkg sh ./scripts/package/builddeb cp: cannot stat 'arch/arc/boot/bootpImage': No such file or directory make[4]: *** [scripts/Makefile.package:87: intdeb-pkg] Error 1 make[3]: *** [Makefile:1527: intdeb-pkg] Error 2 make[2]: *** [debian/rules:13: binary-arch] Error 2 dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2 make[1]: *** [scripts/Makefile.package:83: bindeb-pkg] Error 2 make: *** [Makefile:1527: bindeb-pkg] Error 2
The reason is obvious; arch/arc/Makefile sets $(boot)/bootpImage as the default image, but there is no rule to build it.
Remove the meaningless KBUILD_IMAGE assignment so it will fallback to the default vmlinux. With this change, you can build the deb package.
I removed the 'bootpImage' target as well. At best, it provides 'make bootpImage' as an alias of 'make vmlinux', but I do not see much sense in doing so.
Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Vineet Gupta vgupta@synopsys.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arc/Makefile | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/arch/arc/Makefile b/arch/arc/Makefile index 2917f56f0ea43..98d31b701a97c 100644 --- a/arch/arc/Makefile +++ b/arch/arc/Makefile @@ -99,12 +99,6 @@ libs-y += arch/arc/lib/ $(LIBGCC)
boot := arch/arc/boot
-#default target for make without any arguments. -KBUILD_IMAGE := $(boot)/bootpImage - -all: bootpImage -bootpImage: vmlinux - boot_targets += uImage uImage.bin uImage.gz
$(boot_targets): vmlinux
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit f2712ec76a5433e5ec9def2bd52a95df1f96d050 ]
arch/arc/boot/Makefile supports uImage.lzma, but you cannot do 'make uImage.lzma' because the corresponding target is missing in arch/arc/Makefile. Add it.
I also changed the assignment operator '+=' to ':=' since this is the only place where we expect this variable to be set.
Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Vineet Gupta vgupta@synopsys.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arc/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arc/Makefile b/arch/arc/Makefile index 98d31b701a97c..1146ca5fc349b 100644 --- a/arch/arc/Makefile +++ b/arch/arc/Makefile @@ -99,7 +99,7 @@ libs-y += arch/arc/lib/ $(LIBGCC)
boot := arch/arc/boot
-boot_targets += uImage uImage.bin uImage.gz +boot_targets := uImage uImage.bin uImage.gz uImage.lzma
$(boot_targets): vmlinux $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit 0cfccb3c04934cdef42ae26042139f16e805b5f7 ]
The top-level boot_targets (uImage and uImage.*) should be phony targets. They just let Kbuild descend into arch/arc/boot/ and create files there.
If a file exists in the top directory with the same name, the boot image will not be created.
You can confirm it by the following steps:
$ export CROSS_COMPILE=<your-arc-compiler-prefix> $ make -s ARCH=arc defconfig all # vmlinux will be built $ touch uImage.gz $ make ARCH=arc uImage.gz CALL scripts/atomic/check-atomics.sh CALL scripts/checksyscalls.sh CHK include/generated/compile.h # arch/arc/boot/uImage.gz is not created
Specify the targets as PHONY to fix this.
Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Vineet Gupta vgupta@synopsys.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arc/Makefile | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arc/Makefile b/arch/arc/Makefile index 1146ca5fc349b..ef5e8ea042158 100644 --- a/arch/arc/Makefile +++ b/arch/arc/Makefile @@ -101,6 +101,7 @@ boot := arch/arc/boot
boot_targets := uImage uImage.bin uImage.gz uImage.lzma
+PHONY += $(boot_targets) $(boot_targets): vmlinux $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
From: Filipe Manana fdmanana@suse.com
[ Upstream commit cb13eea3b49055bd78e6ddf39defd6340f7379fc ]
If we remount a filesystem in RO mode while the qgroup rescan worker is running, we can end up having it still running after the remount is done, and at unmount time we may end up with an open transaction that ends up never getting committed. If that happens we end up with several memory leaks and can crash when hardware acceleration is unavailable for crc32c. Possibly it can lead to other nasty surprises too, due to use-after-free issues.
The following steps explain how the problem happens.
1) We have a filesystem mounted in RW mode and the qgroup rescan worker is running;
2) We remount the filesystem in RO mode, and never stop/pause the rescan worker, so after the remount the rescan worker is still running. The important detail here is that the rescan task is still running after the remount operation committed any ongoing transaction through its call to btrfs_commit_super();
3) The rescan is still running, and after the remount completed, the rescan worker started a transaction, after it finished iterating all leaves of the extent tree, to update the qgroup status item in the quotas tree. It does not commit the transaction, it only releases its handle on the transaction;
4) A filesystem unmount operation starts shortly after;
5) The unmount task, at close_ctree(), stops the transaction kthread, which had not had a chance to commit the open transaction since it was sleeping and the commit interval (default of 30 seconds) has not yet elapsed since the last time it committed a transaction;
6) So after stopping the transaction kthread we still have the transaction used to update the qgroup status item open. At close_ctree(), when the filesystem is in RO mode and no transaction abort happened (or the filesystem is in error mode), we do not expect to have any transaction open, so we do not call btrfs_commit_super();
7) We then proceed to destroy the work queues, free the roots and block groups, etc. After that we drop the last reference on the btree inode by calling iput() on it. Since there are dirty pages for the btree inode, corresponding to the COWed extent buffer for the quotas btree, btree_write_cache_pages() is invoked to flush those dirty pages. This results in creating a bio and submitting it, which makes us end up at btrfs_submit_metadata_bio();
8) At btrfs_submit_metadata_bio() we end up at the if-then-else branch that calls btrfs_wq_submit_bio(), because check_async_write() returned a value of 1. This value of 1 is because we did not have hardware acceleration available for crc32c, so BTRFS_FS_CSUM_IMPL_FAST was not set in fs_info->flags;
9) Then at btrfs_wq_submit_bio() we call btrfs_queue_work() against the workqueue at fs_info->workers, which was already freed before by the call to btrfs_stop_all_workers() at close_ctree(). This results in an invalid memory access due to a use-after-free, leading to a crash.
When this happens, before the crash there are several warnings triggered, since we have reserved metadata space in a block group, the delayed refs reservation, etc:
------------[ cut here ]------------ WARNING: CPU: 4 PID: 1729896 at fs/btrfs/block-group.c:125 btrfs_put_block_group+0x63/0xa0 [btrfs] Modules linked in: btrfs dm_snapshot dm_thin_pool (...) CPU: 4 PID: 1729896 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_put_block_group+0x63/0xa0 [btrfs] Code: f0 01 00 00 48 39 c2 75 (...) RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206 RAX: 0000000000000001 RBX: ffff947ed73e4000 RCX: ffff947ebc8b29c8 RDX: 0000000000000001 RSI: ffffffffc0b150a0 RDI: ffff947ebc8b2800 RBP: ffff947ebc8b2800 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110 R13: ffff947ed73e4160 R14: ffff947ebc8b2988 R15: dead000000000100 FS: 00007f15edfea840(0000) GS:ffff9481ad600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f37e2893320 CR3: 0000000138f68001 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_free_block_groups+0x17f/0x2f0 [btrfs] close_ctree+0x2ba/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f15ee221ee7 Code: ff 0b 00 f7 d8 64 89 01 48 (...) RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7 RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000 RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0 R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000 R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace dd74718fef1ed5c6 ]--- ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-rsv.c:459 btrfs_release_global_block_rsv+0x70/0xc0 [btrfs] Modules linked in: btrfs dm_snapshot dm_thin_pool (...) CPU: 2 PID: 1729896 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_release_global_block_rsv+0x70/0xc0 [btrfs] Code: 48 83 bb b0 03 00 00 00 (...) RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206 RAX: 000000000033c000 RBX: ffff947ed73e4000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffffc0b0d8c1 RDI: 00000000ffffffff RBP: ffff947ebc8b7000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110 R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100 FS: 00007f15edfea840(0000) GS:ffff9481aca00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000561a79f76e20 CR3: 0000000138f68006 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_free_block_groups+0x24c/0x2f0 [btrfs] close_ctree+0x2ba/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f15ee221ee7 Code: ff 0b 00 f7 d8 64 89 01 (...) RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7 RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000 RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0 R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000 R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace dd74718fef1ed5c7 ]--- ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-group.c:3377 btrfs_free_block_groups+0x25d/0x2f0 [btrfs] Modules linked in: btrfs dm_snapshot dm_thin_pool (...) CPU: 5 PID: 1729896 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_free_block_groups+0x25d/0x2f0 [btrfs] Code: ad de 49 be 22 01 00 (...) RSP: 0018:ffffb270826bbde8 EFLAGS: 00010206 RAX: ffff947ebeae1d08 RBX: ffff947ed73e4000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff947e9d823ae8 RDI: 0000000000000246 RBP: ffff947ebeae1d08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ebeae1c00 R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100 FS: 00007f15edfea840(0000) GS:ffff9481ad200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1475d98ea8 CR3: 0000000138f68005 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: close_ctree+0x2ba/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f15ee221ee7 Code: ff 0b 00 f7 d8 64 89 (...) RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7 RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000 RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0 R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000 R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace dd74718fef1ed5c8 ]--- BTRFS info (device sdc): space_info 4 has 268238848 free, is not full BTRFS info (device sdc): space_info total=268435456, used=114688, pinned=0, reserved=16384, may_use=0, readonly=65536 BTRFS info (device sdc): global_block_rsv: size 0 reserved 0 BTRFS info (device sdc): trans_block_rsv: size 0 reserved 0 BTRFS info (device sdc): chunk_block_rsv: size 0 reserved 0 BTRFS info (device sdc): delayed_block_rsv: size 0 reserved 0 BTRFS info (device sdc): delayed_refs_rsv: size 524288 reserved 0
And the crash, which only happens when we do not have crc32c hardware acceleration, produces the following trace immediately after those warnings:
stack segment: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 2 PID: 1749129 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_queue_work+0x36/0x190 [btrfs] Code: 54 55 53 48 89 f3 (...) RSP: 0018:ffffb27082443ae8 EFLAGS: 00010282 RAX: 0000000000000004 RBX: ffff94810ee9ad90 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff94810ee9ad90 RDI: ffff947ed8ee75a0 RBP: a56b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000007 R11: 0000000000000001 R12: ffff947fa9b435a8 R13: ffff94810ee9ad90 R14: 0000000000000000 R15: ffff947e93dc0000 FS: 00007f3cfe974840(0000) GS:ffff9481ac600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1b42995a70 CR3: 0000000127638003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_wq_submit_bio+0xb3/0xd0 [btrfs] btrfs_submit_metadata_bio+0x44/0xc0 [btrfs] submit_one_bio+0x61/0x70 [btrfs] btree_write_cache_pages+0x414/0x450 [btrfs] ? kobject_put+0x9a/0x1d0 ? trace_hardirqs_on+0x1b/0xf0 ? _raw_spin_unlock_irqrestore+0x3c/0x60 ? free_debug_processing+0x1e1/0x2b0 do_writepages+0x43/0xe0 ? lock_acquired+0x199/0x490 __writeback_single_inode+0x59/0x650 writeback_single_inode+0xaf/0x120 write_inode_now+0x94/0xd0 iput+0x187/0x2b0 close_ctree+0x2c6/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f3cfebabee7 Code: ff 0b 00 f7 d8 64 89 01 (...) RSP: 002b:00007ffc9c9a05f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007f3cfecd1264 RCX: 00007f3cfebabee7 RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000562b6b478000 RBP: 0000562b6b473a30 R08: 0000000000000000 R09: 00007f3cfec6cbe0 R10: 0000562b6b479fe0 R11: 0000000000000246 R12: 0000000000000000 R13: 0000562b6b478000 R14: 0000562b6b473b40 R15: 0000562b6b473c60 Modules linked in: btrfs dm_snapshot dm_thin_pool (...) ---[ end trace dd74718fef1ed5cc ]---
Finally when we remove the btrfs module (rmmod btrfs), there are several warnings about objects that were allocated from our slabs but were never freed, consequence of the transaction that was never committed and got leaked:
============================================================================= BUG btrfs_delayed_ref_head (Tainted: G B W ): Objects remaining in btrfs_delayed_ref_head on __kmem_cache_shutdown() -----------------------------------------------------------------------------
INFO: Slab 0x0000000094c2ae56 objects=24 used=2 fp=0x000000002bfa2521 flags=0x17fffc000010200 CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 slab_err+0xb7/0xdc ? lock_acquired+0x199/0x490 __kmem_cache_shutdown+0x1ac/0x3c0 ? lock_release+0x20e/0x4c0 kmem_cache_destroy+0x55/0x120 btrfs_delayed_ref_exit+0x11/0x35 [btrfs] exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 INFO: Object 0x0000000050cbdd61 @offset=12104 INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1894 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] btrfs_free_tree_block+0x128/0x360 [btrfs] __btrfs_cow_block+0x489/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=4292 cpu=2 pid=1729526 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] commit_cowonly_roots+0xfb/0x300 [btrfs] btrfs_commit_transaction+0x367/0xc40 [btrfs] sync_filesystem+0x74/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 INFO: Object 0x0000000086e9b0ff @offset=12776 INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1900 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] btrfs_alloc_tree_block+0x2bf/0x360 [btrfs] alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs] __btrfs_cow_block+0x12d/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=3141 cpu=6 pid=1729803 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] btrfs_write_dirty_block_groups+0x17d/0x3d0 [btrfs] commit_cowonly_roots+0x248/0x300 [btrfs] btrfs_commit_transaction+0x367/0xc40 [btrfs] close_ctree+0x113/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 kmem_cache_destroy btrfs_delayed_ref_head: Slab cache still has objects CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 kmem_cache_destroy+0x119/0x120 btrfs_delayed_ref_exit+0x11/0x35 [btrfs] exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 0b (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 ============================================================================= BUG btrfs_delayed_tree_ref (Tainted: G B W ): Objects remaining in btrfs_delayed_tree_ref on __kmem_cache_shutdown() -----------------------------------------------------------------------------
INFO: Slab 0x0000000011f78dc0 objects=37 used=2 fp=0x0000000032d55d91 flags=0x17fffc000010200 CPU: 3 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 slab_err+0xb7/0xdc ? lock_acquired+0x199/0x490 __kmem_cache_shutdown+0x1ac/0x3c0 ? lock_release+0x20e/0x4c0 kmem_cache_destroy+0x55/0x120 btrfs_delayed_ref_exit+0x1d/0x35 [btrfs] exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 INFO: Object 0x000000001a340018 @offset=4408 INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1917 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] btrfs_free_tree_block+0x128/0x360 [btrfs] __btrfs_cow_block+0x489/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=4167 cpu=4 pid=1729795 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] btrfs_commit_transaction+0x60/0xc40 [btrfs] create_subvol+0x56a/0x990 [btrfs] btrfs_mksubvol+0x3fb/0x4a0 [btrfs] __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs] btrfs_ioctl_snap_create+0x58/0x80 [btrfs] btrfs_ioctl+0x1a92/0x36f0 [btrfs] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 INFO: Object 0x000000002b46292a @offset=13648 INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1923 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] btrfs_alloc_tree_block+0x2bf/0x360 [btrfs] alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs] __btrfs_cow_block+0x12d/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=3164 cpu=6 pid=1729803 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] commit_cowonly_roots+0xfb/0x300 [btrfs] btrfs_commit_transaction+0x367/0xc40 [btrfs] close_ctree+0x113/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 kmem_cache_destroy btrfs_delayed_tree_ref: Slab cache still has objects CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 kmem_cache_destroy+0x119/0x120 btrfs_delayed_ref_exit+0x1d/0x35 [btrfs] exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 ============================================================================= BUG btrfs_delayed_extent_op (Tainted: G B W ): Objects remaining in btrfs_delayed_extent_op on __kmem_cache_shutdown() -----------------------------------------------------------------------------
INFO: Slab 0x00000000f145ce2f objects=22 used=1 fp=0x00000000af0f92cf flags=0x17fffc000010200 CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 slab_err+0xb7/0xdc ? lock_acquired+0x199/0x490 __kmem_cache_shutdown+0x1ac/0x3c0 ? __mutex_unlock_slowpath+0x45/0x2a0 kmem_cache_destroy+0x55/0x120 exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 INFO: Object 0x000000004cf95ea8 @offset=6264 INFO: Allocated in btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] age=1931 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs] __btrfs_cow_block+0x12d/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] INFO: Freed in __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] age=3173 cpu=6 pid=1729803 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] commit_cowonly_roots+0xfb/0x300 [btrfs] btrfs_commit_transaction+0x367/0xc40 [btrfs] close_ctree+0x113/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 kmem_cache_destroy btrfs_delayed_extent_op: Slab cache still has objects CPU: 3 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 kmem_cache_destroy+0x119/0x120 exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 BTRFS: state leak: start 30408704 end 30425087 state 1 in tree 1 refs 1
Fix this issue by having the remount path stop the qgroup rescan worker when we are remounting RO and teach the rescan worker to stop when a remount is in progress. If later a remount in RW mode happens, we are already resuming the qgroup rescan worker through the call to btrfs_qgroup_rescan_resume(), so we do not need to worry about that.
Tested-by: Fabian Vogt fvogt@suse.com Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/qgroup.c | 13 ++++++++++--- fs/btrfs/super.c | 8 ++++++++ 2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 53f6bb5d0b72c..47c28983fd01f 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2614,6 +2614,12 @@ out: return ret; }
+static bool rescan_should_stop(struct btrfs_fs_info *fs_info) +{ + return btrfs_fs_closing(fs_info) || + test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state); +} + static void btrfs_qgroup_rescan_worker(struct btrfs_work *work) { struct btrfs_fs_info *fs_info = container_of(work, struct btrfs_fs_info, @@ -2622,13 +2628,14 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work) struct btrfs_trans_handle *trans = NULL; int err = -ENOMEM; int ret = 0; + bool stopped = false;
path = btrfs_alloc_path(); if (!path) goto out;
err = 0; - while (!err && !btrfs_fs_closing(fs_info)) { + while (!err && !(stopped = rescan_should_stop(fs_info))) { trans = btrfs_start_transaction(fs_info->fs_root, 0); if (IS_ERR(trans)) { err = PTR_ERR(trans); @@ -2671,7 +2678,7 @@ out: }
mutex_lock(&fs_info->qgroup_rescan_lock); - if (!btrfs_fs_closing(fs_info)) + if (!stopped) fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN; if (trans) { ret = update_qgroup_status_item(trans); @@ -2690,7 +2697,7 @@ out:
btrfs_end_transaction(trans);
- if (btrfs_fs_closing(fs_info)) { + if (stopped) { btrfs_info(fs_info, "qgroup scan paused"); } else if (err >= 0) { btrfs_info(fs_info, "qgroup scan completed%s", diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index eb64d4b159e07..4edfd26594eda 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1784,6 +1784,14 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data) btrfs_scrub_cancel(fs_info); btrfs_pause_balance(fs_info);
+ /* + * Pause the qgroup rescan worker if it is running. We don't want + * it to be still running after we are in RO mode, as after that, + * by the time we unmount, it might have left a transaction open, + * so we would leak the transaction and/or crash. + */ + btrfs_qgroup_wait_for_completion(fs_info, false); + ret = btrfs_commit_super(fs_info); if (ret) goto restore;
From: Rasmus Villemoes rasmus.villemoes@prevas.dk
[ Upstream commit 887078de2a23689e29d6fa1b75d7cbc544c280be ]
Table 8-53 in the QUICC Engine Reference manual shows definitions of fields up to a size of 192 bytes, not just 128. But in table 8-111, one does find the text
Base Address of the Global Transmitter Parameter RAM Page. [...] The user needs to allocate 128 bytes for this page. The address must be aligned to the page size.
I've checked both rev. 7 (11/2015) and rev. 9 (05/2018) of the manual; they both have this inconsistency (and the table numbers are the same).
Adding a bit of debug printing, on my board the struct ucc_geth_tx_global_pram is allocated at offset 0x880, while the (opaque) ucc_geth_thread_data_tx gets allocated immediately afterwards, at 0x900. So whatever the engine writes into the thread data overlaps with the tail of the global tx pram (and devmem says that something does get written during a simple ping).
I haven't observed any failure that could be attributed to this, but it seems to be the kind of thing that would be extremely hard to debug. So extend the struct definition so that we do allocate 192 bytes.
Signed-off-by: Rasmus Villemoes rasmus.villemoes@prevas.dk Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/ucc_geth.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/freescale/ucc_geth.h b/drivers/net/ethernet/freescale/ucc_geth.h index 5da19b440a6a8..bf25e49d4fe34 100644 --- a/drivers/net/ethernet/freescale/ucc_geth.h +++ b/drivers/net/ethernet/freescale/ucc_geth.h @@ -580,7 +580,14 @@ struct ucc_geth_tx_global_pram { u32 vtagtable[0x8]; /* 8 4-byte VLAN tags */ u32 tqptr; /* a base pointer to the Tx Queues Memory Region */ - u8 res2[0x80 - 0x74]; + u8 res2[0x78 - 0x74]; + u64 snums_en; + u32 l2l3baseptr; /* top byte consists of a few other bit fields */ + + u16 mtu[8]; + u8 res3[0xa8 - 0x94]; + u32 wrrtablebase; /* top byte is reserved */ + u8 res4[0xc0 - 0xac]; } __packed;
/* structure representing Extended Filtering Global Parameters in PRAM */
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit 8a48c0a3360bf2bf4f40c980d0ec216e770e58ee ]
fs/dax.c uses copy_user_page() but ARC does not provide that interface, resulting in a build error.
Provide copy_user_page() in <asm/page.h>.
../fs/dax.c: In function 'copy_cow_page_dax': ../fs/dax.c:702:2: error: implicit declaration of function 'copy_user_page'; did you mean 'copy_to_user_page'? [-Werror=implicit-function-declaration]
Reported-by: kernel test robot lkp@intel.com Signed-off-by: Randy Dunlap rdunlap@infradead.org Cc: Vineet Gupta vgupta@synopsys.com Cc: linux-snps-arc@lists.infradead.org Cc: Dan Williams dan.j.williams@intel.com #Acked-by: Vineet Gupta vgupta@synopsys.com # v1 Cc: Andrew Morton akpm@linux-foundation.org Cc: Matthew Wilcox willy@infradead.org Cc: Jan Kara jack@suse.cz Cc: linux-fsdevel@vger.kernel.org Cc: linux-nvdimm@lists.01.org #Reviewed-by: Ira Weiny ira.weiny@intel.com # v2 Signed-off-by: Vineet Gupta vgupta@synopsys.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arc/include/asm/page.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arc/include/asm/page.h b/arch/arc/include/asm/page.h index 09ddddf71cc50..a70fef79c4055 100644 --- a/arch/arc/include/asm/page.h +++ b/arch/arc/include/asm/page.h @@ -13,6 +13,7 @@ #ifndef __ASSEMBLY__
#define clear_page(paddr) memset((paddr), 0, PAGE_SIZE) +#define copy_user_page(to, from, vaddr, pg) copy_page(to, from) #define copy_page(to, from) memcpy((to), (from), PAGE_SIZE)
struct vm_area_struct;
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 51049bd903a81307f751babe15a1df8d197884e8 ]
Without this, we run into a link error
arm-linux-gnueabi-ld: drivers/isdn/mISDN/dsp_audio.o: in function `dsp_audio_generate_law_tables': (.text+0x30c): undefined reference to `byte_rev_table' arm-linux-gnueabi-ld: drivers/isdn/mISDN/dsp_audio.o:(.text+0x5e4): more undefined references to `byte_rev_table' follow
Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/isdn/mISDN/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/isdn/mISDN/Kconfig b/drivers/isdn/mISDN/Kconfig index c0730d5c734d6..fb61181a5c4f7 100644 --- a/drivers/isdn/mISDN/Kconfig +++ b/drivers/isdn/mISDN/Kconfig @@ -12,6 +12,7 @@ if MISDN != n config MISDN_DSP tristate "Digital Audio Processing of transparent data" depends on MISDN + select BITREVERSE help Enable support for digital audio processing capability.
From: Michael Ellerman mpe@ellerman.id.au
[ Upstream commit 445c6198fe7be03b7d38e66fe8d4b3187bc251d4 ]
Since commit 1d6cd3929360 ("modpost: turn missing MODULE_LICENSE() into error") the ppc32_allmodconfig build fails with:
ERROR: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/freescale/fs_enet/mii-fec.o ERROR: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/freescale/fs_enet/mii-bitbang.o
Add the missing MODULE_LICENSEs to fix the build. Both files include a copyright header indicating they are GPL v2.
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c | 1 + drivers/net/ethernet/freescale/fs_enet/mii-fec.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c b/drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c index c8e5d889bd81f..21de56345503f 100644 --- a/drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c +++ b/drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c @@ -223,3 +223,4 @@ static struct platform_driver fs_enet_bb_mdio_driver = { };
module_platform_driver(fs_enet_bb_mdio_driver); +MODULE_LICENSE("GPL"); diff --git a/drivers/net/ethernet/freescale/fs_enet/mii-fec.c b/drivers/net/ethernet/freescale/fs_enet/mii-fec.c index 1582d82483eca..4e6a9c5d8af55 100644 --- a/drivers/net/ethernet/freescale/fs_enet/mii-fec.c +++ b/drivers/net/ethernet/freescale/fs_enet/mii-fec.c @@ -224,3 +224,4 @@ static struct platform_driver fs_enet_fec_mdio_driver = { };
module_platform_driver(fs_enet_fec_mdio_driver); +MODULE_LICENSE("GPL");
From: Shawn Guo shawn.guo@linaro.org
[ Upstream commit ee61cfd955a64a58ed35cbcfc54068fcbd486945 ]
It adds a stub acpi_create_platform_device() for !CONFIG_ACPI build, so that caller doesn't have to deal with !CONFIG_ACPI build issue.
Reported-by: kernel test robot lkp@intel.com Signed-off-by: Shawn Guo shawn.guo@linaro.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/acpi.h | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 4bb3bca75004d..37f0b8515c1cf 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -787,6 +787,13 @@ static inline int acpi_device_modalias(struct device *dev, return -ENODEV; }
+static inline struct platform_device * +acpi_create_platform_device(struct acpi_device *adev, + struct property_entry *properties) +{ + return NULL; +} + static inline bool acpi_dma_supported(struct acpi_device *adev) { return false;
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit bac717171971176b78c72d15a8b6961764ab197f ]
dtc points out that the interrupts for some devices are not parsable:
picoxcell-pc3x2.dtsi:45.19-49.5: Warning (interrupts_property): /paxi/gem@30000: Missing interrupt-parent picoxcell-pc3x2.dtsi:51.21-55.5: Warning (interrupts_property): /paxi/dmac@40000: Missing interrupt-parent picoxcell-pc3x2.dtsi:57.21-61.5: Warning (interrupts_property): /paxi/dmac@50000: Missing interrupt-parent picoxcell-pc3x2.dtsi:233.21-237.5: Warning (interrupts_property): /rwid-axi/axi2pico@c0000000: Missing interrupt-parent
There are two VIC instances, so it's not clear which one needs to be used. I found the BSP sources that reference VIC0, so use that:
https://github.com/r1mikey/meta-picoxcell/blob/master/recipes-kernel/linux/l...
Acked-by: Jamie Iles jamie@jamieiles.com Link: https://lore.kernel.org/r/20201230152010.3914962-1-arnd@kernel.org' Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/picoxcell-pc3x2.dtsi | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/arch/arm/boot/dts/picoxcell-pc3x2.dtsi b/arch/arm/boot/dts/picoxcell-pc3x2.dtsi index 533919e96eaee..f22a6b4363177 100644 --- a/arch/arm/boot/dts/picoxcell-pc3x2.dtsi +++ b/arch/arm/boot/dts/picoxcell-pc3x2.dtsi @@ -54,18 +54,21 @@ emac: gem@30000 { compatible = "cadence,gem"; reg = <0x30000 0x10000>; + interrupt-parent = <&vic0>; interrupts = <31>; };
dmac1: dmac@40000 { compatible = "snps,dw-dmac"; reg = <0x40000 0x10000>; + interrupt-parent = <&vic0>; interrupts = <25>; };
dmac2: dmac@50000 { compatible = "snps,dw-dmac"; reg = <0x50000 0x10000>; + interrupt-parent = <&vic0>; interrupts = <26>; };
@@ -243,6 +246,7 @@ axi2pico@c0000000 { compatible = "picochip,axi2pico-pc3x2"; reg = <0xc0000000 0x10000>; + interrupt-parent = <&vic0>; interrupts = <13 14 15 16 17 18 19 20 21>; }; };
From: Al Viro viro@zeniv.linux.org.uk
commit d36a1dd9f77ae1e72da48f4123ed35627848507d upstream.
We are not guaranteed the locking environment that would prevent dentry getting renamed right under us. And it's possible for old long name to be freed after rename, leading to UAF here.
Cc: stable@kernel.org # v2.6.2+ Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- security/lsm_audit.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/security/lsm_audit.c +++ b/security/lsm_audit.c @@ -277,7 +277,9 @@ static void dump_common_audit_data(struc struct inode *inode;
audit_log_format(ab, " name="); + spin_lock(&a->u.dentry->d_lock); audit_log_untrustedstring(ab, a->u.dentry->d_name.name); + spin_unlock(&a->u.dentry->d_lock);
inode = d_backing_inode(a->u.dentry); if (inode) { @@ -295,8 +297,9 @@ static void dump_common_audit_data(struc dentry = d_find_alias(inode); if (dentry) { audit_log_format(ab, " name="); - audit_log_untrustedstring(ab, - dentry->d_name.name); + spin_lock(&dentry->d_lock); + audit_log_untrustedstring(ab, dentry->d_name.name); + spin_unlock(&dentry->d_lock); dput(dentry); } audit_log_format(ab, " dev=");
From: Dan Carpenter dan.carpenter@oracle.com
commit f373a811fd9a69fc8bafb9bcb41d2cfa36c62665 upstream.
Return -ETIMEDOUT if the dsp boot times out instead of returning success.
Fixes: cb6a55284629 ("ASoC: Intel: cnl: Add sst library functions for cnl platform") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Cezary Rojewski cezary.rojewski@intel.com Link: https://lore.kernel.org/r/X9NEvCzuN+IObnTN@mwanda Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/soc/intel/skylake/cnl-sst.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/soc/intel/skylake/cnl-sst.c +++ b/sound/soc/intel/skylake/cnl-sst.c @@ -212,6 +212,7 @@ static int cnl_set_dsp_D0(struct sst_dsp "dsp boot timeout, status=%#x error=%#x\n", sst_dsp_shim_read(ctx, CNL_ADSP_FW_STATUS), sst_dsp_shim_read(ctx, CNL_ADSP_ERROR_CODE)); + ret = -ETIMEDOUT; goto err; } } else {
From: Dave Wysochanski dwysocha@redhat.com
commit 3d1a90ab0ed93362ec8ac85cf291243c87260c21 upstream.
It is only safe to call the tracepoint before rpc_put_task() because 'data' is freed inside nfs4_lock_release (rpc_release).
Fixes: 48c9579a1afe ("Adding stateid information to tracepoints") Signed-off-by: Dave Wysochanski dwysocha@redhat.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfs/nfs4proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -6395,9 +6395,9 @@ static int _nfs4_do_setlk(struct nfs4_st data->arg.new_lock_owner, ret); } else data->cancelled = true; + trace_nfs4_set_lock(fl, state, &data->res.stateid, cmd, ret); rpc_put_task(task); dprintk("%s: done, ret = %d!\n", __func__, ret); - trace_nfs4_set_lock(fl, state, &data->res.stateid, cmd, ret); return ret; }
From: Trond Myklebust trond.myklebust@hammerspace.com
commit 67bbceedc9bb8ad48993a8bd6486054756d711f4 upstream.
If the layout return-on-close failed because the layoutreturn was never sent, then we should mark the layout for return again.
Fixes: 9c47b18cf722 ("pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfs/pnfs.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -1328,12 +1328,18 @@ void pnfs_roc_release(struct nfs4_layout int ret) { struct pnfs_layout_hdr *lo = args->layout; + struct inode *inode = args->inode; const nfs4_stateid *arg_stateid = NULL; const nfs4_stateid *res_stateid = NULL; struct nfs4_xdr_opaque_data *ld_private = args->ld_private;
switch (ret) { case -NFS4ERR_NOMATCHING_LAYOUT: + spin_lock(&inode->i_lock); + if (pnfs_layout_is_valid(lo) && + nfs4_stateid_match_other(&args->stateid, &lo->plh_stateid)) + pnfs_set_plh_return_info(lo, args->range.iomode, 0); + spin_unlock(&inode->i_lock); break; case 0: if (res->lrs_present)
From: Trond Myklebust trond.myklebust@hammerspace.com
commit 896567ee7f17a8a736cda8a28cc987228410a2ac upstream.
Before referencing the inode, we must ensure that the superblock can be referenced. Otherwise, we can end up with iput() calling superblock operations that are no longer valid or accessible.
Fixes: ea7c38fef0b7 ("NFSv4: Ensure we reference the inode for return-on-close in delegreturn") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfs/internal.h | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -575,12 +575,14 @@ extern int nfs4_test_session_trunk(struc
static inline struct inode *nfs_igrab_and_active(struct inode *inode) { - inode = igrab(inode); - if (inode != NULL && !nfs_sb_active(inode->i_sb)) { - iput(inode); - inode = NULL; + struct super_block *sb = inode->i_sb; + + if (sb && nfs_sb_active(sb)) { + if (igrab(inode)) + return inode; + nfs_sb_deactive(sb); } - return inode; + return NULL; }
static inline void nfs_iput_and_deactive(struct inode *inode)
From: Jan Kara jack@suse.cz
commit dfd56c2c0c0dbb11be939b804ddc8d5395ab3432 upstream.
When setting password salt in the superblock, we forget to recompute the superblock checksum so it will not match until the next superblock modification which recomputes the checksum. Fix it.
CC: Michael Halcrow mhalcrow@google.com Reported-by: Andreas Dilger adilger@dilger.ca Fixes: 9bd8212f981e ("ext4 crypto: add encryption policy and password salt support") Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20201216101844.22917-8-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/ioctl.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -1032,7 +1032,10 @@ resizefs_out: err = ext4_journal_get_write_access(handle, sbi->s_sbh); if (err) goto pwsalt_err_journal; + lock_buffer(sbi->s_sbh); generate_random_uuid(sbi->s_es->s_encrypt_pw_salt); + ext4_superblock_csum_set(sb); + unlock_buffer(sbi->s_sbh); err = ext4_handle_dirty_metadata(handle, NULL, sbi->s_sbh); pwsalt_err_journal:
From: Dinghao Liu dinghao.liu@zju.edu.cn
commit a306aba9c8d869b1fdfc8ad9237f1ed718ea55e6 upstream.
If usnic_ib_qp_grp_create() fails at the first call, dev_list will not be freed on error, which leads to memleak.
Fixes: e3cf00d0a87f ("IB/usnic: Add Cisco VIC low-level hardware driver") Link: https://lore.kernel.org/r/20201226074248.2893-1-dinghao.liu@zju.edu.cn Signed-off-by: Dinghao Liu dinghao.liu@zju.edu.cn Reviewed-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c @@ -188,6 +188,7 @@ find_free_vf_and_create_qp_grp(struct us
} usnic_uiom_free_dev_list(dev_list); + dev_list = NULL; }
/* Try to find resources on an unused vf */ @@ -212,6 +213,8 @@ find_free_vf_and_create_qp_grp(struct us qp_grp_check: if (IS_ERR_OR_NULL(qp_grp)) { usnic_err("Failed to allocate qp_grp\n"); + if (usnic_ib_share_vf) + usnic_uiom_free_dev_list(dev_list); return ERR_PTR(qp_grp ? PTR_ERR(qp_grp) : -ENOMEM); } return qp_grp;
From: Jann Horn jannh@google.com
commit 8ff60eb052eeba95cfb3efe16b08c9199f8121cf upstream.
acquire_slab() fails if there is contention on the freelist of the page (probably because some other CPU is concurrently freeing an object from the page). In that case, it might make sense to look for a different page (since there might be more remote frees to the page from other CPUs, and we don't want contention on struct page).
However, the current code accidentally stops looking at the partial list completely in that case. Especially on kernels without CONFIG_NUMA set, this means that get_partial() fails and new_slab_objects() falls back to new_slab(), allocating new pages. This could lead to an unnecessary increase in memory fragmentation.
Link: https://lkml.kernel.org/r/20201228130853.1871516-1-jannh@google.com Fixes: 7ced37197196 ("slub: Acquire_slab() avoid loop") Signed-off-by: Jann Horn jannh@google.com Acked-by: David Rientjes rientjes@google.com Acked-by: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/slub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/slub.c +++ b/mm/slub.c @@ -1846,7 +1846,7 @@ static void *get_partial_node(struct kme
t = acquire_slab(s, n, page, object == NULL, &objects); if (!t) - break; + continue; /* cmpxchg raced */
available += objects; if (!object) {
Just a note to the stable tree: this commit has been reverted upstream, because it causes a huge performance drop (admittedly on a load and setup that may not be all that relevant to most people).
It was applied to 4.4, 4.9 and 4.12, because the commit it was marked as "fixing" is from 2012, but it turns out that the early exit from the loop in that commit was very much intentional, and very much shows up on scalability benchmarks.
I don't think this is likely to be a big deal for the stable kernels - we're basically talking tuning for special cases, and while it is reverted in my tree now, the "correct" thing to do is likely to be a bit more flexible than either "exit loop immediately" or "loop for as long as we have contention".
In practice, most machines probably won't see either case - or it will at least be rare enough that you can't tell.
The machine that reports a huge performance drop was a multi-socket machine under fairly extreme conditions, and these contention issues are often close to exponential - a smaller machine (or a slighly less extreme load) would never see the issue at all either way.
See
https://lore.kernel.org/lkml/20210301080404.GF12822@xsang-OptiPlex-9020/
for details if you care. I don't think this has to necessarily be undone in the stable trees, this email is more of an incidental note just as a heads-up.
Linus
On Fri, Jan 22, 2021 at 6:14 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
From: j.nixdorf@avm.de j.nixdorf@avm.de
commit 86b53fbf08f48d353a86a06aef537e78e82ba721 upstream.
A return value of 0 means success. This is documented in lib/kstrtox.c.
This was found by trying to mount an NFS share from a link-local IPv6 address with the interface specified by its index:
mount("[fe80::1%1]:/srv/nfs", "/mnt", "nfs", 0, "nolock,addr=fe80::1%1")
Before this commit this failed with EINVAL and also caused the following message in dmesg:
[...] NFS: bad IP address specified: addr=fe80::1%1
The syscall using the same address based on the interface name instead of its index succeeds.
Credits for this patch go to my colleague Christian Speich, who traced the origin of this bug to this line of code.
Signed-off-by: Johannes Nixdorf j.nixdorf@avm.de Fixes: 00cfaa943ec3 ("replace strict_strto calls") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/sunrpc/addr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sunrpc/addr.c +++ b/net/sunrpc/addr.c @@ -184,7 +184,7 @@ static int rpc_parse_scope_id(struct net scope_id = dev->ifindex; dev_put(dev); } else { - if (kstrtou32(p, 10, &scope_id) == 0) { + if (kstrtou32(p, 10, &scope_id) != 0) { kfree(p); return 0; }
From: Mike Snitzer snitzer@redhat.com
commit 0378c625afe80eb3f212adae42cc33c9f6f31abf upstream.
There wasn't ever a real need to log an error in the kernel log for ioctls issued with insufficient permissions. Simply return an error and if an admin/user is sufficiently motivated they can enable DM's dynamic debugging to see an explanation for why the ioctls were disallowed.
Reported-by: Nir Soffer nsoffer@redhat.com Fixes: e980f62353c6 ("dm: don't allow ioctls to targets that don't map to whole devices") Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -472,7 +472,7 @@ static int dm_blk_ioctl(struct block_dev * subset of the parent bdev; require extra privileges. */ if (!capable(CAP_SYS_RAWIO)) { - DMWARN_LIMIT( + DMDEBUG_LIMIT( "%s: sending ioctl %x to DM device without required privilege.", current->comm, cmd); r = -ENOIOCTLCMD;
From: Geert Uytterhoeven geert+renesas@glider.be
commit 9f65df9c589f249435255da37a5dd11f1bc86f4d upstream.
As snd_fw_async_midi_port.consume_bytes is unsigned int, and NSEC_PER_SEC is 1000000000L, the second multiplication in
port->consume_bytes * 8 * NSEC_PER_SEC / 31250
always overflows on 32-bit platforms, truncating the result. Fix this by precalculating "NSEC_PER_SEC / 31250", which is an integer constant.
Note that this assumes port->consume_bytes <= 16777.
Fixes: 531f471834227d03 ("ALSA: firewire-lib/firewire-tascam: localize async midi port") Reviewed-by: Takashi Sakamoto o-takashi@sakamocchi.jp Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://lore.kernel.org/r/20210111130251.361335-3-geert+renesas@glider.be Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/firewire/tascam/tascam-transaction.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/firewire/tascam/tascam-transaction.c +++ b/sound/firewire/tascam/tascam-transaction.c @@ -210,7 +210,7 @@ static void midi_port_work(struct work_s
/* Set interval to next transaction. */ port->next_ktime = ktime_add_ns(ktime_get(), - port->consume_bytes * 8 * NSEC_PER_SEC / 31250); + port->consume_bytes * 8 * (NSEC_PER_SEC / 31250));
/* Start this transaction. */ port->idling = false;
From: Geert Uytterhoeven geert+renesas@glider.be
commit e7c22eeaff8565d9a8374f320238c251ca31480b upstream.
As snd_ff.rx_bytes[] is unsigned int, and NSEC_PER_SEC is 1000000000L, the second multiplication in
ff->rx_bytes[port] * 8 * NSEC_PER_SEC / 31250
always overflows on 32-bit platforms, truncating the result. Fix this by precalculating "NSEC_PER_SEC / 31250", which is an integer constant.
Note that this assumes ff->rx_bytes[port] <= 16777.
Fixes: 19174295788de77d ("ALSA: fireface: add transaction support") Reviewed-by: Takashi Sakamoto o-takashi@sakamocchi.jp Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://lore.kernel.org/r/20210111130251.361335-2-geert+renesas@glider.be Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/firewire/fireface/ff-transaction.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/firewire/fireface/ff-transaction.c +++ b/sound/firewire/fireface/ff-transaction.c @@ -99,7 +99,7 @@ static void transmit_midi_msg(struct snd
/* Set interval to next transaction. */ ff->next_ktime[port] = ktime_add_ns(ktime_get(), - len * 8 * NSEC_PER_SEC / 31250); + len * 8 * (NSEC_PER_SEC / 31250)); ff->rx_bytes[port] = len;
/*
From: Jesper Dangaard Brouer brouer@redhat.com
commit f6351c3f1c27c80535d76cac2299aec44c36291e upstream.
The old way of changing the conntrack hashsize runtime was through changing the module param via file /sys/module/nf_conntrack/parameters/hashsize. This was extended to sysctl change in commit 3183ab8997a4 ("netfilter: conntrack: allow increasing bucket size via sysctl too").
The commit introduced second "user" variable nf_conntrack_htable_size_user which shadow actual variable nf_conntrack_htable_size. When hashsize is changed via module param this "user" variable isn't updated. This results in sysctl net/netfilter/nf_conntrack_buckets shows the wrong value when users update via the old way.
This patch fix the issue by always updating "user" variable when reading the proc file. This will take care of changes to the actual variable without sysctl need to be aware.
Fixes: 3183ab8997a4 ("netfilter: conntrack: allow increasing bucket size via sysctl too") Reported-by: Yoel Caspersen yoel@kviknet.dk Signed-off-by: Jesper Dangaard Brouer brouer@redhat.com Acked-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netfilter/nf_conntrack_standalone.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/netfilter/nf_conntrack_standalone.c +++ b/net/netfilter/nf_conntrack_standalone.c @@ -537,6 +537,9 @@ nf_conntrack_hash_sysctl(struct ctl_tabl { int ret;
+ /* module_param hashsize could have changed value */ + nf_conntrack_htable_size_user = nf_conntrack_htable_size; + ret = proc_dointvec(table, write, buffer, lenp, ppos); if (ret < 0 || !write) return ret;
From: Hamish Martin hamish.martin@alliedtelesis.co.nz
commit c4005a8f65edc55fb1700dfc5c1c3dc58be80209 upstream.
The 'distrust_firmware' module parameter dates from 2004 and the USB subsystem is a lot more mature and reliable now than it was then. Alter the default to false now.
Suggested-by: Alan Stern stern@rowland.harvard.edu Acked-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Hamish Martin hamish.martin@alliedtelesis.co.nz Link: https://lore.kernel.org/r/20200910212512.16670-2-hamish.martin@alliedtelesis... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/host/ohci-hcd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/host/ohci-hcd.c +++ b/drivers/usb/host/ohci-hcd.c @@ -100,7 +100,7 @@ static void io_watchdog_func(unsigned lo
/* Some boards misreport power switching/overcurrent */ -static bool distrust_firmware = true; +static bool distrust_firmware; module_param (distrust_firmware, bool, 0); MODULE_PARM_DESC (distrust_firmware, "true to distrust firmware power/overcurrent setup");
From: Will Deacon will@kernel.org
commit dca5244d2f5b94f1809f0c02a549edf41ccd5493 upstream.
GCC versions >= 4.9 and < 5.1 have been shown to emit memory references beyond the stack pointer, resulting in memory corruption if an interrupt is taken after the stack pointer has been adjusted but before the reference has been executed. This leads to subtle, infrequent data corruption such as the EXT4 problems reported by Russell King at the link below.
Life is too short for buggy compilers, so raise the minimum GCC version required by arm64 to 5.1.
Reported-by: Russell King linux@armlinux.org.uk Suggested-by: Arnd Bergmann arnd@kernel.org Signed-off-by: Will Deacon will@kernel.org Tested-by: Nathan Chancellor natechancellor@gmail.com Reviewed-by: Nick Desaulniers ndesaulniers@google.com Reviewed-by: Nathan Chancellor natechancellor@gmail.com Acked-by: Linus Torvalds torvalds@linux-foundation.org Cc: stable@vger.kernel.org Cc: Theodore Ts'o tytso@mit.edu Cc: Florian Weimer fweimer@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Nick Desaulniers ndesaulniers@google.com Link: https://lore.kernel.org/r/20210105154726.GD1551@shell.armlinux.org.uk Link: https://lore.kernel.org/r/20210112224832.10980-1-will@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com [will: backport to 4.4.y/4.9.y/4.14.y] Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/compiler-gcc.h | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -152,6 +152,12 @@
#if GCC_VERSION < 30200 # error Sorry, your compiler is too old - please upgrade it. +#elif defined(CONFIG_ARM64) && GCC_VERSION < 50100 +/* + * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63293 + * https://lore.kernel.org/r/20210107111841.GN1551@shell.armlinux.org.uk + */ +# error Sorry, your version of GCC is too old - please use 5.1 or newer. #endif
#if GCC_VERSION < 30300
From: J. Bruce Fields bfields@redhat.com
commit 51b2ee7d006a736a9126e8111d1f24e4fd0afaa6 upstream.
If you export a subdirectory of a filesystem, a READDIRPLUS on the root of that export will return the filehandle of the parent with the ".." entry.
The filehandle is optional, so let's just not return the filehandle for ".." if we're at the root of an export.
Note that once the client learns one filehandle outside of the export, they can trivially access the rest of the export using further lookups.
However, it is also not very difficult to guess filehandles outside of the export. So exporting a subdirectory of a filesystem should considered equivalent to providing access to the entire filesystem. To avoid confusion, we recommend only exporting entire filesystems.
Reported-by: Youjipeng wangzhibei1999@gmail.com Signed-off-by: J. Bruce Fields bfields@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfsd/nfs3xdr.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -845,9 +845,14 @@ compose_entry_fh(struct nfsd3_readdirres if (isdotent(name, namlen)) { if (namlen == 2) { dchild = dget_parent(dparent); - /* filesystem root - cannot return filehandle for ".." */ + /* + * Don't return filehandle for ".." if we're at + * the filesystem or export root: + */ if (dchild == dparent) goto out; + if (dparent == exp->ex_path.dentry) + goto out; } else dchild = dget(dparent); } else
From: Manish Chopra manishc@marvell.com
[ Upstream commit a2bc221b972db91e4be1970e776e98f16aa87904 ]
For all PCI functions on the netxen_nic adapter, interrupt mode (INTx or MSI) configuration is dependent on what has been configured by the PCI function zero in the shared interrupt register, as these adapters do not support mixed mode interrupts among the functions of a given adapter.
Logic for setting MSI/MSI-x interrupt mode in the shared interrupt register based on PCI function id zero check is not appropriate for all family of netxen adapters, as for some of the netxen family adapters PCI function zero is not really meant to be probed/loaded in the host but rather just act as a management function on the device, which caused all the other PCI functions on the adapter to always use legacy interrupt (INTx) mode instead of choosing MSI/MSI-x interrupt mode.
This patch replaces that check with port number so that for all type of adapters driver attempts for MSI/MSI-x interrupt modes.
Fixes: b37eb210c076 ("netxen_nic: Avoid mixed mode interrupts") Signed-off-by: Manish Chopra manishc@marvell.com Signed-off-by: Igor Russkikh irusskikh@marvell.com Link: https://lore.kernel.org/r/20210107101520.6735-1-manishc@marvell.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c +++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c @@ -586,11 +586,6 @@ static const struct net_device_ops netxe #endif };
-static inline bool netxen_function_zero(struct pci_dev *pdev) -{ - return (PCI_FUNC(pdev->devfn) == 0) ? true : false; -} - static inline void netxen_set_interrupt_mode(struct netxen_adapter *adapter, u32 mode) { @@ -686,7 +681,7 @@ static int netxen_setup_intr(struct netx netxen_initialize_interrupt_registers(adapter); netxen_set_msix_bit(pdev, 0);
- if (netxen_function_zero(pdev)) { + if (adapter->portnum == 0) { if (!netxen_setup_msi_interrupts(adapter, num_msix)) netxen_set_interrupt_mode(adapter, NETXEN_MSI_MODE); else
From: Andrey Zhizhikin andrey.zhizhikin@leica-geosystems.com
[ Upstream commit e56b3d94d939f52d46209b9e1b6700c5bfff3123 ]
MSFT ActiveSync implementation requires that the size of the response for incoming query is to be provided in the request input length. Failure to set the input size proper results in failed request transfer, where the ActiveSync counterpart reports the NDIS_STATUS_INVALID_LENGTH (0xC0010014L) error.
Set the input size for OID_GEN_PHYSICAL_MEDIUM query to the expected size of the response in order for the ActiveSync to properly respond to the request.
Fixes: 039ee17d1baa ("rndis_host: Add RNDIS physical medium checking into generic_rndis_bind()") Signed-off-by: Andrey Zhizhikin andrey.zhizhikin@leica-geosystems.com Link: https://lore.kernel.org/r/20210108095839.3335-1-andrey.zhizhikin@leica-geosy... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/rndis_host.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/usb/rndis_host.c +++ b/drivers/net/usb/rndis_host.c @@ -399,7 +399,7 @@ generic_rndis_bind(struct usbnet *dev, s reply_len = sizeof *phym; retval = rndis_query(dev, intf, u.buf, RNDIS_OID_GEN_PHYSICAL_MEDIUM, - 0, (void **) &phym, &reply_len); + reply_len, (void **)&phym, &reply_len); if (retval != 0 || !phym) { /* OID is optional so don't fail here. */ phym_unspec = cpu_to_le32(RNDIS_PHYSICAL_MEDIUM_UNSPECIFIED);
From: Willem de Bruijn willemb@google.com
[ Upstream commit 9bd6b629c39e3fa9e14243a6d8820492be1a5b2e ]
esp(6)_output_head uses skb_page_frag_refill to allocate a buffer for the esp trailer.
It accesses the page with kmap_atomic to handle highmem. But skb_page_frag_refill can return compound pages, of which kmap_atomic only maps the first underlying page.
skb_page_frag_refill does not return highmem, because flag __GFP_HIGHMEM is not set. ESP uses it in the same manner as TCP. That also does not call kmap_atomic, but directly uses page_address, in skb_copy_to_page_nocache. Do the same for ESP.
This issue has become easier to trigger with recent kmap local debugging feature CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP.
Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: Willem de Bruijn willemb@google.com Acked-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/esp4.c | 7 +------ net/ipv6/esp6.c | 7 +------ 2 files changed, 2 insertions(+), 12 deletions(-)
--- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -252,7 +252,6 @@ static int esp_output_udp_encap(struct x int esp_output_head(struct xfrm_state *x, struct sk_buff *skb, struct esp_info *esp) { u8 *tail; - u8 *vaddr; int nfrags; int esph_offset; struct page *page; @@ -294,14 +293,10 @@ int esp_output_head(struct xfrm_state *x page = pfrag->page; get_page(page);
- vaddr = kmap_atomic(page); - - tail = vaddr + pfrag->offset; + tail = page_address(page) + pfrag->offset;
esp_output_fill_trailer(tail, esp->tfclen, esp->plen, esp->proto);
- kunmap_atomic(vaddr); - nfrags = skb_shinfo(skb)->nr_frags;
__skb_fill_page_desc(skb, nfrags, page, pfrag->offset, --- a/net/ipv6/esp6.c +++ b/net/ipv6/esp6.c @@ -219,7 +219,6 @@ static void esp_output_fill_trailer(u8 * int esp6_output_head(struct xfrm_state *x, struct sk_buff *skb, struct esp_info *esp) { u8 *tail; - u8 *vaddr; int nfrags; struct page *page; struct sk_buff *trailer; @@ -252,14 +251,10 @@ int esp6_output_head(struct xfrm_state * page = pfrag->page; get_page(page);
- vaddr = kmap_atomic(page); - - tail = vaddr + pfrag->offset; + tail = page_address(page) + pfrag->offset;
esp_output_fill_trailer(tail, esp->tfclen, esp->plen, esp->proto);
- kunmap_atomic(vaddr); - nfrags = skb_shinfo(skb)->nr_frags;
__skb_fill_page_desc(skb, nfrags, page, pfrag->offset,
From: Petr Machata me@pmachata.org
[ Upstream commit 826f328e2b7e8854dd42ea44e6519cd75018e7b1 ]
DCB uses the same handler function for both RTM_GETDCB and RTM_SETDCB messages. dcb_doit() bounces RTM_SETDCB mesasges if the user does not have the CAP_NET_ADMIN capability.
However, the operation to be performed is not decided from the DCB message type, but from the DCB command. Thus DCB_CMD_*_GET commands are used for reading DCB objects, the corresponding SET and DEL commands are used for manipulation.
The assumption is that set-like commands will be sent via an RTM_SETDCB message, and get-like ones via RTM_GETDCB. However, this assumption is not enforced.
It is therefore possible to manipulate DCB objects without CAP_NET_ADMIN capability by sending the corresponding command in an RTM_GETDCB message. That is a bug. Fix it by validating the type of the request message against the type used for the response.
Fixes: 2f90b8657ec9 ("ixgbe: this patch adds support for DCB to the kernel and ixgbe driver") Signed-off-by: Petr Machata me@pmachata.org Link: https://lore.kernel.org/r/a2a9b88418f3a58ef211b718f2970128ef9e3793.160867364... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dcb/dcbnl.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/dcb/dcbnl.c +++ b/net/dcb/dcbnl.c @@ -1727,6 +1727,8 @@ static int dcb_doit(struct sk_buff *skb, fn = &reply_funcs[dcb->cmd]; if (!fn->cb) return -EOPNOTSUPP; + if (fn->type != nlh->nlmsg_type) + return -EPERM;
if (!tb[DCB_ATTR_IFNAME]) return -EINVAL;
From: Petr Machata petrm@nvidia.com
[ Upstream commit df85bc140a4d6cbaa78d8e9c35154e1a2f0622c7 ]
In commit 826f328e2b7e ("net: dcb: Validate netlink message in DCB handler"), Linux started rejecting RTM_GETDCB netlink messages if they contained a set-like DCB_CMD_ command.
The reason was that privileges were only verified for RTM_SETDCB messages, but the value that determined the action to be taken is the command, not the message type. And validation of message type against the DCB command was the obvious missing piece.
Unfortunately it turns out that mlnx_qos, a somewhat widely deployed tool for configuration of DCB, accesses the DCB set-like APIs through RTM_GETDCB.
Therefore do not bounce the discrepancy between message type and command. Instead, in addition to validating privileges based on the actual message type, validate them also based on the expected message type. This closes the loophole of allowing DCB configuration on non-admin accounts, while maintaining backward compatibility.
Fixes: 2f90b8657ec9 ("ixgbe: this patch adds support for DCB to the kernel and ixgbe driver") Fixes: 826f328e2b7e ("net: dcb: Validate netlink message in DCB handler") Signed-off-by: Petr Machata petrm@nvidia.com Link: https://lore.kernel.org/r/a3edcfda0825f2aa2591801c5232f2bbf2d8a554.161038480... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dcb/dcbnl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/dcb/dcbnl.c +++ b/net/dcb/dcbnl.c @@ -1727,7 +1727,7 @@ static int dcb_doit(struct sk_buff *skb, fn = &reply_funcs[dcb->cmd]; if (!fn->cb) return -EOPNOTSUPP; - if (fn->type != nlh->nlmsg_type) + if (fn->type == RTM_SETDCB && !netlink_capable(skb, CAP_NET_ADMIN)) return -EPERM;
if (!tb[DCB_ATTR_IFNAME])
From: David Wu david.wu@rock-chips.com
[ Upstream commit 5b55299eed78538cc4746e50ee97103a1643249c ]
Since the original mtu is not used when the mtu is updated, the mtu is aligned with cache, this will get an incorrect. For example, if you want to configure the mtu to be 1500, but mtu 1536 is configured in fact.
Fixed: eaf4fac478077 ("net: stmmac: Do not accept invalid MTU values") Signed-off-by: David Wu david.wu@rock-chips.com Link: https://lore.kernel.org/r/20210113034109.27865-1-david.wu@rock-chips.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -3613,6 +3613,7 @@ static int stmmac_change_mtu(struct net_ { struct stmmac_priv *priv = netdev_priv(dev); int txfifosz = priv->plat->tx_fifo_size; + const int mtu = new_mtu;
if (txfifosz == 0) txfifosz = priv->dma_cap.tx_fifo_size; @@ -3630,7 +3631,7 @@ static int stmmac_change_mtu(struct net_ if ((txfifosz < new_mtu) || (new_mtu > BUF_SIZE_16KiB)) return -EINVAL;
- dev->mtu = new_mtu; + dev->mtu = mtu;
netdev_update_features(dev);
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 47e4bb147a96f1c9b4e7691e7e994e53838bfff8 ]
We need to unregister the netdevice if config failed. .ndo_uninit takes care of most of the heavy lifting.
This was uncovered by recent commit c269a24ce057 ("net: make free_netdev() more lenient with unregistering devices"). Previously the partially-initialized device would be left in the system.
Reported-and-tested-by: syzbot+2393580080a2da190f04@syzkaller.appspotmail.com Fixes: e2f1f072db8d ("sit: allow to configure 6rd tunnels via netlink") Acked-by: Nicolas Dichtel nicolas.dichtel@6wind.com Link: https://lore.kernel.org/r/20210114012947.2515313-1-kuba@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/sit.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -1582,8 +1582,11 @@ static int ipip6_newlink(struct net *src }
#ifdef CONFIG_IPV6_SIT_6RD - if (ipip6_netlink_6rd_parms(data, &ip6rd)) + if (ipip6_netlink_6rd_parms(data, &ip6rd)) { err = ipip6_tunnel_update_6rd(nt, &ip6rd); + if (err < 0) + unregister_netdevice_queue(dev, NULL); + } #endif
return err;
From: Eric Dumazet edumazet@google.com
[ Upstream commit 3226b158e67cfaa677fd180152bfb28989cb2fac ]
Both virtio net and napi_get_frags() allocate skbs with a very small skb->head
While using page fragments instead of a kmalloc backed skb->head might give a small performance improvement in some cases, there is a huge risk of under estimating memory usage.
For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations per page (order-3 page in x86), or even 64 on PowerPC
We have been tracking OOM issues on GKE hosts hitting tcp_mem limits but consuming far more memory for TCP buffers than instructed in tcp_mem[2]
Even if we force napi_alloc_skb() to only use order-0 pages, the issue would still be there on arches with PAGE_SIZE >= 32768
This patch makes sure that small skb head are kmalloc backed, so that other objects in the slab page can be reused instead of being held as long as skbs are sitting in socket queues.
Note that we might in the future use the sk_buff napi cache, instead of going through a more expensive __alloc_skb()
Another idea would be to use separate page sizes depending on the allocated length (to never have more than 4 frags per page)
I would like to thank Greg Thelen for his precious help on this matter, analysing crash dumps is always a time consuming task.
Fixes: fd11a83dd363 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Paolo Abeni pabeni@redhat.com Cc: Greg Thelen gthelen@google.com Reviewed-by: Alexander Duyck alexanderduyck@fb.com Acked-by: Michael S. Tsirkin mst@redhat.com Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/skbuff.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -459,13 +459,17 @@ EXPORT_SYMBOL(__netdev_alloc_skb); struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len, gfp_t gfp_mask) { - struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache); + struct napi_alloc_cache *nc; struct sk_buff *skb; void *data;
len += NET_SKB_PAD + NET_IP_ALIGN;
- if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) || + /* If requested length is either too small or too big, + * we use kmalloc() for skb->head allocation. + */ + if (len <= SKB_WITH_OVERHEAD(1024) || + len > SKB_WITH_OVERHEAD(PAGE_SIZE) || (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) { skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE); if (!skb) @@ -473,6 +477,7 @@ struct sk_buff *__napi_alloc_skb(struct goto skb_success; }
+ nc = this_cpu_ptr(&napi_alloc_cache); len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); len = SKB_DATA_ALIGN(len);
From: David Howells dhowells@redhat.com
[ Upstream commit d52e419ac8b50c8bef41b398ed13528e75d7ad48 ]
Clang static analysis reports the following:
net/rxrpc/key.c:657:11: warning: Assigned value is garbage or undefined toksize = toksizes[tok++]; ^ ~~~~~~~~~~~~~~~
rxrpc_read() contains two consecutive loops. The first loop calculates the token sizes and stores the results in toksizes[] and the second one uses the array. When there is an error in identifying the token in the first loop, the token is skipped, no change is made to the toksizes[] array. When the same error happens in the second loop, the token is not skipped. This will cause the toksizes[] array to be out of step and will overrun past the calculated sizes.
Fix this by making both loops log a message and return an error in this case. This should only happen if a new token type is incompletely implemented, so it should normally be impossible to trigger this.
Fixes: 9a059cd5ca7d ("rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read()") Reported-by: Tom Rix trix@redhat.com Signed-off-by: David Howells dhowells@redhat.com Reviewed-by: Tom Rix trix@redhat.com Link: https://lore.kernel.org/r/161046503122.2445787.16714129930607546635.stgit@wa... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rxrpc/key.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/net/rxrpc/key.c +++ b/net/rxrpc/key.c @@ -1112,7 +1112,7 @@ static long rxrpc_read(const struct key default: /* we have a ticket we can't encode */ pr_err("Unsupported key token type (%u)\n", token->security_index); - continue; + return -ENOPKG; }
_debug("token[%u]: toksize=%u", ntoks, toksize); @@ -1227,7 +1227,9 @@ static long rxrpc_read(const struct key break;
default: - break; + pr_err("Unsupported key token type (%u)\n", + token->security_index); + return -ENOPKG; }
ASSERTCMP((unsigned long)xdr - (unsigned long)oldxdr, ==,
From: Hoang Le hoang.h.le@dektech.com.au
[ Upstream commit b77413446408fdd256599daf00d5be72b5f3e7c6 ]
The buffer list can have zero skb as following path: tipc_named_node_up()->tipc_node_xmit()->tipc_link_xmit(), so we need to check the list before casting an &sk_buff.
Fault report: [] tipc: Bulk publication failure [] general protection fault, probably for non-canonical [#1] PREEMPT [...] [] KASAN: null-ptr-deref in range [0x00000000000000c8-0x00000000000000cf] [] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.10.0-rc4+ #2 [] Hardware name: Bochs ..., BIOS Bochs 01/01/2011 [] RIP: 0010:tipc_link_xmit+0xc1/0x2180 [] Code: 24 b8 00 00 00 00 4d 39 ec 4c 0f 44 e8 e8 d7 0a 10 f9 48 [...] [] RSP: 0018:ffffc90000006ea0 EFLAGS: 00010202 [] RAX: dffffc0000000000 RBX: ffff8880224da000 RCX: 1ffff11003d3cc0d [] RDX: 0000000000000019 RSI: ffffffff886007b9 RDI: 00000000000000c8 [] RBP: ffffc90000007018 R08: 0000000000000001 R09: fffff52000000ded [] R10: 0000000000000003 R11: fffff52000000dec R12: ffffc90000007148 [] R13: 0000000000000000 R14: 0000000000000000 R15: ffffc90000007018 [] FS: 0000000000000000(0000) GS:ffff888037400000(0000) knlGS:000[...] [] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [] CR2: 00007fffd2db5000 CR3: 000000002b08f000 CR4: 00000000000006f0
Fixes: af9b028e270fd ("tipc: make media xmit call outside node spinlock context") Acked-by: Jon Maloy jmaloy@redhat.com Signed-off-by: Hoang Le hoang.h.le@dektech.com.au Link: https://lore.kernel.org/r/20210108071337.3598-1-hoang.h.le@dektech.com.au Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tipc/link.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -882,9 +882,7 @@ void tipc_link_reset(struct tipc_link *l int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list, struct sk_buff_head *xmitq) { - struct tipc_msg *hdr = buf_msg(skb_peek(list)); unsigned int maxwin = l->window; - int imp = msg_importance(hdr); unsigned int mtu = l->mtu; u16 ack = l->rcv_nxt - 1; u16 seqno = l->snd_nxt; @@ -893,13 +891,20 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *backlogq = &l->backlogq; struct sk_buff *skb, *_skb, **tskb; int pkt_cnt = skb_queue_len(list); + struct tipc_msg *hdr; int rc = 0; + int imp;
+ if (pkt_cnt <= 0) + return 0; + + hdr = buf_msg(skb_peek(list)); if (unlikely(msg_size(hdr) > mtu)) { skb_queue_purge(list); return -EMSGSIZE; }
+ imp = msg_importance(hdr); /* Allow oversubscription of one data msg per source at congestion */ if (unlikely(l->backlog[imp].len >= l->backlog[imp].limit)) { if (imp == TIPC_SYSTEM_IMPORTANCE) {
From: Edward Cree ecree@solarflare.com
[ Upstream commit 22f6bbb7bcfcef0b373b0502a7ff390275c575dd ]
list_del() leaves the skb->next pointer poisoned, which can then lead to a crash in e.g. OVS forwarding. For example, setting up an OVS VXLAN forwarding bridge on sfc as per:
======== $ ovs-vsctl show 5dfd9c47-f04b-4aaa-aa96-4fbb0a522a30 Bridge "br0" Port "br0" Interface "br0" type: internal Port "enp6s0f0" Interface "enp6s0f0" Port "vxlan0" Interface "vxlan0" type: vxlan options: {key="1", local_ip="10.0.0.5", remote_ip="10.0.0.4"} ovs_version: "2.5.0" ======== (where 10.0.0.5 is an address on enp6s0f1) and sending traffic across it will lead to the following panic: ======== general protection fault: 0000 [#1] SMP PTI CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc3-ehc+ #701 Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 6.4.0 07/23/2013 RIP: 0010:dev_hard_start_xmit+0x38/0x200 Code: 53 48 89 fb 48 83 ec 20 48 85 ff 48 89 54 24 08 48 89 4c 24 18 0f 84 ab 01 00 00 48 8d 86 90 00 00 00 48 89 f5 48 89 44 24 10 <4c> 8b 33 48 c7 03 00 00 00 00 48 8b 05 c7 d1 b3 00 4d 85 f6 0f 95 RSP: 0018:ffff888627b437e0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: dead000000000100 RCX: ffff88862279c000 RDX: ffff888614a342c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff888618a88000 R08: 0000000000000001 R09: 00000000000003e8 R10: 0000000000000000 R11: ffff888614a34140 R12: 0000000000000000 R13: 0000000000000062 R14: dead000000000100 R15: ffff888616430000 FS: 0000000000000000(0000) GS:ffff888627b40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6d2bc6d000 CR3: 000000000200a000 CR4: 00000000000006e0 Call Trace: <IRQ> __dev_queue_xmit+0x623/0x870 ? masked_flow_lookup+0xf7/0x220 [openvswitch] ? ep_poll_callback+0x101/0x310 do_execute_actions+0xaba/0xaf0 [openvswitch] ? __wake_up_common+0x8a/0x150 ? __wake_up_common_lock+0x87/0xc0 ? queue_userspace_packet+0x31c/0x5b0 [openvswitch] ovs_execute_actions+0x47/0x120 [openvswitch] ovs_dp_process_packet+0x7d/0x110 [openvswitch] ovs_vport_receive+0x6e/0xd0 [openvswitch] ? dst_alloc+0x64/0x90 ? rt_dst_alloc+0x50/0xd0 ? ip_route_input_slow+0x19a/0x9a0 ? __udp_enqueue_schedule_skb+0x198/0x1b0 ? __udp4_lib_rcv+0x856/0xa30 ? __udp4_lib_rcv+0x856/0xa30 ? cpumask_next_and+0x19/0x20 ? find_busiest_group+0x12d/0xcd0 netdev_frame_hook+0xce/0x150 [openvswitch] __netif_receive_skb_core+0x205/0xae0 __netif_receive_skb_list_core+0x11e/0x220 netif_receive_skb_list+0x203/0x460 ? __efx_rx_packet+0x335/0x5e0 [sfc] efx_poll+0x182/0x320 [sfc] net_rx_action+0x294/0x3c0 __do_softirq+0xca/0x297 irq_exit+0xa6/0xb0 do_IRQ+0x54/0xd0 common_interrupt+0xf/0xf </IRQ> ======== So, in all listified-receive handling, instead pull skbs off the lists with skb_list_del_init().
Fixes: 9af86f933894 ("net: core: fix use-after-free in __netif_receive_skb_list_core") Fixes: 7da517a3bc52 ("net: core: Another step of skb receive list processing") Fixes: a4ca8b7df73c ("net: ipv4: fix drop handling in ip_list_rcv() and ip_list_rcv_finish()") Fixes: d8269e2cbf90 ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()") Signed-off-by: Edward Cree ecree@solarflare.com Signed-off-by: David S. Miller davem@davemloft.net [ for 4.14.y and older, just take the skbuff.h change - gregkh ] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/skbuff.h | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1335,6 +1335,17 @@ static inline void skb_zcopy_abort(struc } }
+static inline void skb_mark_not_on_list(struct sk_buff *skb) +{ + skb->next = NULL; +} + +static inline void skb_list_del_init(struct sk_buff *skb) +{ + __list_del_entry(&skb->list); + skb_mark_not_on_list(skb); +} + /** * skb_queue_empty - check if a queue is empty * @list: queue head
From: Jason A. Donenfeld Jason@zx2c4.com
commit dcfea72e79b0aa7a057c8f6024169d86a1bbc84b upstream.
As part of the continual effort to remove direct usage of skb->next and skb->prev, this patch adds a helper for iterating through the singly-linked variant of skb lists, which are used for lists of GSO packet. The name "skb_list_..." has been chosen to match the existing function, "kfree_skb_list, which also operates on these singly-linked lists, and the "..._walk_safe" part is the same idiom as elsewhere in the kernel.
This patch removes the helper from wireguard and puts it into linux/skbuff.h, while making it a bit more robust for general usage. In particular, parenthesis are added around the macro argument usage, and it now accounts for trying to iterate through an already-null skb pointer, which will simply run the iteration zero times. This latter enhancement means it can be used to replace both do { ... } while and while (...) open-coded idioms.
This should take care of these three possible usages, which match all current methods of iterations.
skb_list_walk_safe(segs, skb, next) { ... } skb_list_walk_safe(skb, skb, next) { ... } skb_list_walk_safe(segs, skb, segs) { ... }
Gcc appears to generate efficient code for each of these.
Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: David S. Miller davem@davemloft.net [ Just the skbuff.h changes for backporting - gregkh] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/skbuff.h | 5 +++++ 1 file changed, 5 insertions(+)
--- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1340,6 +1340,11 @@ static inline void skb_mark_not_on_list( skb->next = NULL; }
+/* Iterate through singly-linked GSO fragments of an skb. */ +#define skb_list_walk_safe(first, skb, next) \ + for ((skb) = (first), (next) = (skb) ? (skb)->next : NULL; (skb); \ + (skb) = (next), (next) = (skb) ? (skb)->next : NULL) + static inline void skb_list_del_init(struct sk_buff *skb) { __list_del_entry(&skb->list);
From: Jason A. Donenfeld Jason@zx2c4.com
commit 5eee7bd7e245914e4e050c413dfe864e31805207 upstream.
This worked before, because we made all callers name their next pointer "next". But in trying to be more "drop-in" ready, the silliness here is revealed. This commit fixes the problem by making the macro argument and the member use different names.
Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/skbuff.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1341,9 +1341,9 @@ static inline void skb_mark_not_on_list( }
/* Iterate through singly-linked GSO fragments of an skb. */ -#define skb_list_walk_safe(first, skb, next) \ - for ((skb) = (first), (next) = (skb) ? (skb)->next : NULL; (skb); \ - (skb) = (next), (next) = (skb) ? (skb)->next : NULL) +#define skb_list_walk_safe(first, skb, next_skb) \ + for ((skb) = (first), (next_skb) = (skb) ? (skb)->next : NULL; (skb); \ + (skb) = (next_skb), (next_skb) = (skb) ? (skb)->next : NULL)
static inline void skb_list_del_init(struct sk_buff *skb) {
From: Aya Levin ayal@nvidia.com
[ Upstream commit b210de4f8c97d57de051e805686248ec4c6cfc52 ]
There are cases where GSO segment's length exceeds the egress MTU: - Forwarding of a TCP GRO skb, when DF flag is not set. - Forwarding of an skb that arrived on a virtualisation interface (virtio-net/vhost/tap) with TSO/GSO size set by other network stack. - Local GSO skb transmitted on an NETIF_F_TSO tunnel stacked over an interface with a smaller MTU. - Arriving GRO skb (or GSO skb in a virtualised environment) that is bridged to a NETIF_F_TSO tunnel stacked over an interface with an insufficient MTU.
If so: - Consume the SKB and its segments. - Issue an ICMP packet with 'Packet Too Big' message containing the MTU, allowing the source host to reduce its Path MTU appropriately.
Note: These cases are handled in the same manner in IPv4 output finish. This patch aligns the behavior of IPv6 and the one of IPv4.
Fixes: 9e50849054a4 ("netfilter: ipv6: move POSTROUTING invocation before fragmentation") Signed-off-by: Aya Levin ayal@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Link: https://lore.kernel.org/r/1610027418-30438-1-git-send-email-ayal@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_output.c | 40 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 39 insertions(+), 1 deletion(-)
--- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -128,8 +128,42 @@ static int ip6_finish_output2(struct net return -EINVAL; }
+static int +ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk, + struct sk_buff *skb, unsigned int mtu) +{ + struct sk_buff *segs, *nskb; + netdev_features_t features; + int ret = 0; + + /* Please see corresponding comment in ip_finish_output_gso + * describing the cases where GSO segment length exceeds the + * egress MTU. + */ + features = netif_skb_features(skb); + segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK); + if (IS_ERR_OR_NULL(segs)) { + kfree_skb(skb); + return -ENOMEM; + } + + consume_skb(skb); + + skb_list_walk_safe(segs, segs, nskb) { + int err; + + skb_mark_not_on_list(segs); + err = ip6_fragment(net, sk, segs, ip6_finish_output2); + if (err && ret == 0) + ret = err; + } + + return ret; +} + static int ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff *skb) { + unsigned int mtu; int ret;
ret = BPF_CGROUP_RUN_PROG_INET_EGRESS(sk, skb); @@ -146,7 +180,11 @@ static int ip6_finish_output(struct net } #endif
- if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) || + mtu = ip6_skb_dst_mtu(skb); + if (skb_is_gso(skb) && !skb_gso_validate_mtu(skb, mtu)) + return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu); + + if ((skb->len > mtu && !skb_is_gso(skb)) || dst_allfrag(skb_dst(skb)) || (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size)) return ip6_fragment(net, sk, skb, ip6_finish_output2);
From: Michael Hennerich michael.hennerich@analog.com
commit 4d163ad79b155c71bf30366dc38f8d2502f78844 upstream.
The issue is that using SPI from a callback under the CCF lock will deadlock, since this code uses clk_get_rate().
Fixes: c474b38665463 ("spi: Add driver for Cadence SPI controller") Signed-off-by: Michael Hennerich michael.hennerich@analog.com Signed-off-by: Alexandru Ardelean alexandru.ardelean@analog.com Link: https://lore.kernel.org/r/20210114154217.51996-1-alexandru.ardelean@analog.c... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/spi/spi-cadence.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/spi/spi-cadence.c +++ b/drivers/spi/spi-cadence.c @@ -119,6 +119,7 @@ struct cdns_spi { void __iomem *regs; struct clk *ref_clk; struct clk *pclk; + unsigned int clk_rate; u32 speed_hz; const u8 *txbuf; u8 *rxbuf; @@ -258,7 +259,7 @@ static void cdns_spi_config_clock_freq(s u32 ctrl_reg, baud_rate_val; unsigned long frequency;
- frequency = clk_get_rate(xspi->ref_clk); + frequency = xspi->clk_rate;
ctrl_reg = cdns_spi_read(xspi, CDNS_SPI_CR);
@@ -628,8 +629,9 @@ static int cdns_spi_probe(struct platfor master->auto_runtime_pm = true; master->mode_bits = SPI_CPOL | SPI_CPHA;
+ xspi->clk_rate = clk_get_rate(xspi->ref_clk); /* Set to default valid value */ - master->max_speed_hz = clk_get_rate(xspi->ref_clk) / 4; + master->max_speed_hz = xspi->clk_rate / 4; xspi->speed_hz = master->max_speed_hz;
master->bits_per_word_mask = SPI_BPW_MASK(8);
On Fri, 22 Jan 2021 at 19:45, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
arm64 clang-10 builds breaks due to this patch on - stable-rc 4.14 - stable-rc 4.9 - stable-rc 4.4
Will Deacon will@kernel.org compiler.h: Raise minimum version of GCC to 5.1 for arm64
arm64 (defconfig) with clang-10 - FAILED
On Fri, 22 Jan 2021 at 20:38, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
make --silent --keep-going --jobs=8 O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- 'HOSTCC=sccache clang' 'CC=sccache clang' In file included from <built-in>:1: include/linux/kconfig.h:74: include/linux/compiler_types.h:58: include/linux/compiler-gcc.h:160:3: error: Sorry, your version of GCC is too old - please use 5.1 or newer. # error Sorry, your version of GCC is too old - please use 5.1 or newer. ^ 1 error generated.
build error link: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc/-/jobs/9804890...
- Naresh
On Fri, Jan 22, 2021 at 08:43:18PM +0530, Naresh Kamboju wrote:
Urgh, looks like we need backports of 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive") then.
Greg -- please drop my changes from 4.14, 4.9 and 4.4 for now and I'll look at this next week.
Cheers,
Will
On Fri, Jan 22, 2021 at 03:36:04PM +0000, Will Deacon wrote:
That backport is going to be pretty gnarly, there was a pretty decent tailwind of fixes around that patch IIRC.
The simple solution would be to stick a !defined(__clang__) in that preprocessor conditional so that it truly fires only for GCC.
Cheers, Nathan
On Fri, Jan 22, 2021 at 7:42 AM Nathan Chancellor natechancellor@gmail.com wrote:
I agree with that approach.
On Fri, 22 Jan 2021 at 19:45, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
MIPS: cavium_octeon_defconfig and nlm_xlp_defconfig builds breaks due to this patch on 4.14, 4.9 and 4.4
Al Viro viro@zeniv.linux.org.uk MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
Build error: arch/mips/kernel/binfmt_elfo32.c:116: /arch/mips/kernel/../../../fs/binfmt_elf.c: In function 'fill_siginfo_note': /arch/mips/kernel/../../../fs/binfmt_elf.c:1575:23: error: passing argument 1 of 'copy_siginfo_to_user' from incompatible pointer type [-Werror=incompatible-pointer-types] copy_siginfo_to_user((user_siginfo_t __user *) csigdata, siginfo); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MIPS build failed * gcc-10-cavium_octeon_defconfig - FAILED * gcc-10-nlm_xlp_defconfig - FAILED * gcc-8-cavium_octeon_defconfig - FAILED * gcc-8-nlm_xlp_defconfig - FAILED * gcc-9-cavium_octeon_defconfig - FAILED * gcc-9-nlm_xlp_defconfig - FAILED
Build log link, https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc/-/jobs/9804890...
linux-stable-mirror@lists.linaro.org