This is the start of the stable review cycle for the 4.14.49 release. There are 41 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Mon Jun 11 15:29:07 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.49-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.49-rc1
Dexuan Cui decui@microsoft.com PCI: hv: Do not wait forever on a device that has disappeared
Paul Blakey paulb@mellanox.com cls_flower: Fix incorrect idr release when failing to modify rule
Eric Dumazet edumazet@google.com rtnetlink: validate attributes in do_setlink()
Jason Wang jasowang@redhat.com virtio-net: fix leaking page for gso packet during mergeable XDP
Eran Ben Elisha eranbe@mellanox.com net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
Jason Wang jasowang@redhat.com virtio-net: correctly check num_buf during err path
Toshiaki Makita makita.toshiaki@lab.ntt.co.jp tun: Fix NULL pointer dereference in XDP redirect
Jack Morgenstein jackm@dev.mellanox.co.il net/mlx4: Fix irq-unsafe spinlock usage
Jason Wang jasowang@redhat.com virtio-net: correctly transmit XDP buff after linearizing
Alexander Duyck alexander.h.duyck@intel.com net-sysfs: Fix memory leak in XPS configuration
Florian Fainelli f.fainelli@gmail.com net: phy: broadcom: Fix auxiliary control register reads
Mathieu Xhonneux m.xhonneux@gmail.com ipv6: sr: fix memory OOB access in seg6_do_srh_encap/inline
Stephen Suryaputra ssuryaextr@gmail.com vrf: check the original netdevice for generating redirect
Jason Wang jasowang@redhat.com vhost: synchronize IOTLB message with dev cleanup
Dan Carpenter dan.carpenter@oracle.com team: use netdev_features_t instead of u32
Xin Long lucien.xin@gmail.com sctp: not allow transport timeout value less than HZ/5 for hb_timer
Shahed Shaikh shahed.shaikh@cavium.com qed: Fix mask for physical address in ILT entry
Willem de Bruijn willemb@google.com packet: fix reserve calculation
Daniele Palmas dnlplm@gmail.com net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
Florian Fainelli f.fainelli@gmail.com net: phy: broadcom: Fix bcm_write_exp()
Eric Dumazet edumazet@google.com net/packet: refine check for priv area size
Eric Dumazet edumazet@google.com net: metrics: add proper netlink validation
Roopa Prabhu roopa@cumulusnetworks.com net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
Cong Wang xiyou.wangcong@gmail.com netdev-FAQ: clarify DaveM's position for stable backports
Kirill Tkhai ktkhai@virtuozzo.com kcm: Fix use-after-free caused by clonned sockets
Wenwen Wang wang6495@umn.edu isdn: eicon: fix a missing-check bug
Michal Kubecek mkubecek@suse.cz ipv6: omit traffic class when calculating flow hash
Willem de Bruijn willemb@google.com ipv4: remove warning in ip_recv_error
Eric Dumazet edumazet@google.com ipmr: properly check rhltable_init() return value
Nicolas Dichtel nicolas.dichtel@6wind.com ip6_tunnel: remove magic mtu value 0xFFF8
Sabrina Dubroca sd@queasysnail.net ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
Govindarajulu Varadarajan gvaradar@cisco.com enic: set DMA mask to 47 bit
Alexey Kodanev alexey.kodanev@oracle.com dccp: don't free ccid2_hc_tx_sock struct in dccp_disconnect()
Julia Lawall Julia.Lawall@lip6.fr bnx2x: use the right constant
Suresh Reddy suresh.reddy@broadcom.com be2net: Fix error detection logic for BE3
Nathan Chancellor natechancellor@gmail.com kconfig: Avoid format overflow warning from GCC 8.1
Anand Jain Anand.Jain@oracle.com btrfs: define SUPER_FLAG_METADUMP_V2
Linus Torvalds torvalds@linux-foundation.org mmap: relax file size limit for regular files
Linus Torvalds torvalds@linux-foundation.org mmap: introduce sane default mmap limits
Bart Van Assche bart.vanassche@wdc.com scsi: sd_zbc: Avoid that resetting a zone fails sporadically
Damien Le Moal damien.lemoal@wdc.com scsi: sd_zbc: Fix potential memory leak
-------------
Diffstat:
Documentation/networking/netdev-FAQ.txt | 9 ++ Makefile | 4 +- drivers/isdn/hardware/eicon/diva.c | 22 ++-- drivers/isdn/hardware/eicon/diva.h | 5 +- drivers/isdn/hardware/eicon/divasmain.c | 18 ++-- drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 2 +- drivers/net/ethernet/cisco/enic/enic_main.c | 8 +- drivers/net/ethernet/emulex/benet/be_main.c | 4 +- drivers/net/ethernet/mellanox/mlx4/qp.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 42 ++++++++ drivers/net/ethernet/qlogic/qed/qed_cxt.c | 2 +- drivers/net/phy/bcm-cygnus.c | 6 +- drivers/net/phy/bcm-phy-lib.c | 2 +- drivers/net/phy/bcm-phy-lib.h | 7 ++ drivers/net/phy/bcm7xxx.c | 4 +- drivers/net/team/team.c | 3 +- drivers/net/tun.c | 15 +-- drivers/net/usb/cdc_mbim.c | 2 +- drivers/net/virtio_net.c | 19 ++-- drivers/pci/host/pci-hyperv.c | 46 +++++--- drivers/scsi/sd_zbc.c | 128 ++++++++++++++--------- drivers/vhost/vhost.c | 3 + fs/btrfs/disk-io.c | 3 +- include/net/ipv6.h | 5 + include/uapi/linux/btrfs_tree.h | 1 + mm/mmap.c | 32 ++++++ net/core/flow_dissector.c | 2 +- net/core/net-sysfs.c | 6 +- net/core/rtnetlink.c | 8 +- net/dccp/proto.c | 2 - net/ipv4/fib_frontend.c | 1 + net/ipv4/fib_semantics.c | 4 + net/ipv4/ip_sockglue.c | 2 - net/ipv4/ipmr.c | 7 +- net/ipv6/ip6_output.c | 3 +- net/ipv6/ip6_tunnel.c | 11 +- net/ipv6/ip6mr.c | 3 +- net/ipv6/ndisc.c | 6 ++ net/ipv6/route.c | 2 +- net/ipv6/seg6_iptunnel.c | 4 +- net/ipv6/sit.c | 5 +- net/kcm/kcmsock.c | 2 +- net/packet/af_packet.c | 4 +- net/sched/cls_flower.c | 2 +- net/sctp/transport.c | 2 +- scripts/kconfig/confdata.c | 2 +- 46 files changed, 329 insertions(+), 145 deletions(-)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal damien.lemoal@wdc.com
commit 4b433924b2755a94f99258c178684a0e05c344de upstream.
Rework sd_zbc_check_zone_size() to avoid a memory leak due to an early return if sd_zbc_report_zones() fails.
Reported-by: David.butterfield david.butterfield@wdc.com Signed-off-by: Damien Le Moal damien.lemoal@wdc.com Cc: stable@vger.kernel.org Reviewed-by: Bart Van Assche bart.vanassche@wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/sd_zbc.c | 34 +++++++++++++++------------------- 1 file changed, 15 insertions(+), 19 deletions(-)
--- a/drivers/scsi/sd_zbc.c +++ b/drivers/scsi/sd_zbc.c @@ -425,7 +425,7 @@ static int sd_zbc_check_capacity(struct
static int sd_zbc_check_zone_size(struct scsi_disk *sdkp) { - u64 zone_blocks; + u64 zone_blocks = 0; sector_t block = 0; unsigned char *buf; unsigned char *rec; @@ -443,10 +443,8 @@ static int sd_zbc_check_zone_size(struct
/* Do a report zone to get the same field */ ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, 0); - if (ret) { - zone_blocks = 0; - goto out; - } + if (ret) + goto out_free;
same = buf[4] & 0x0f; if (same > 0) { @@ -489,7 +487,7 @@ static int sd_zbc_check_zone_size(struct ret = sd_zbc_report_zones(sdkp, buf, SD_ZBC_BUF_SIZE, block); if (ret) - return ret; + goto out_free; }
} while (block < sdkp->capacity); @@ -497,34 +495,32 @@ static int sd_zbc_check_zone_size(struct zone_blocks = sdkp->zone_blocks;
out: - kfree(buf); - if (!zone_blocks) { if (sdkp->first_scan) sd_printk(KERN_NOTICE, sdkp, "Devices with non constant zone " "size are not supported\n"); - return -ENODEV; - } - - if (!is_power_of_2(zone_blocks)) { + ret = -ENODEV; + } else if (!is_power_of_2(zone_blocks)) { if (sdkp->first_scan) sd_printk(KERN_NOTICE, sdkp, "Devices with non power of 2 zone " "size are not supported\n"); - return -ENODEV; - } - - if (logical_to_sectors(sdkp->device, zone_blocks) > UINT_MAX) { + ret = -ENODEV; + } else if (logical_to_sectors(sdkp->device, zone_blocks) > UINT_MAX) { if (sdkp->first_scan) sd_printk(KERN_NOTICE, sdkp, "Zone size too large\n"); - return -ENODEV; + ret = -ENODEV; + } else { + sdkp->zone_blocks = zone_blocks; + sdkp->zone_shift = ilog2(zone_blocks); }
- sdkp->zone_blocks = zone_blocks; +out_free: + kfree(buf);
- return 0; + return ret; }
static int sd_zbc_setup(struct scsi_disk *sdkp)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bart Van Assche bart.vanassche@wdc.com
commit ccce20fc7968d546fb1e8e147bf5cdc8afc4278a upstream.
Since SCSI scanning occurs asynchronously, since sd_revalidate_disk() is called from sd_probe_async() and since sd_revalidate_disk() calls sd_zbc_read_zones() it can happen that sd_zbc_read_zones() is called concurrently with blkdev_report_zones() and/or blkdev_reset_zones(). That can cause these functions to fail with -EIO because sd_zbc_read_zones() e.g. sets q->nr_zones to zero before restoring it to the actual value, even if no drive characteristics have changed. Avoid that this can happen by making the following changes:
- Protect the code that updates zone information with blk_queue_enter() and blk_queue_exit(). - Modify sd_zbc_setup_seq_zones_bitmap() and sd_zbc_setup() such that these functions do not modify struct scsi_disk before all zone information has been obtained.
Note: since commit 055f6e18e08f ("block: Make q_usage_counter also track legacy requests"; kernel v4.15) the request queue freezing mechanism also affects legacy request queues.
Fixes: 89d947561077 ("sd: Implement support for ZBC devices") Signed-off-by: Bart Van Assche bart.vanassche@wdc.com Cc: Jens Axboe axboe@kernel.dk Cc: Damien Le Moal damien.lemoal@wdc.com Cc: Christoph Hellwig hch@lst.de Cc: Hannes Reinecke hare@suse.com Cc: stable@vger.kernel.org # v4.16 Reviewed-by: Damien Le Moal damien.lemoal@wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Damien Le Moal damien.lemoal@wdc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/sd_zbc.c | 98 ++++++++++++++++++++++++++++++++------------------ 1 file changed, 63 insertions(+), 35 deletions(-)
--- a/drivers/scsi/sd_zbc.c +++ b/drivers/scsi/sd_zbc.c @@ -423,7 +423,16 @@ static int sd_zbc_check_capacity(struct
#define SD_ZBC_BUF_SIZE 131072
-static int sd_zbc_check_zone_size(struct scsi_disk *sdkp) +/** + * sd_zbc_check_zone_size - Check the device zone sizes + * @sdkp: Target disk + * + * Check that all zones of the device are equal. The last zone can however + * be smaller. The zone size must also be a power of two number of LBAs. + * + * Returns the zone size in bytes upon success or an error code upon failure. + */ +static s64 sd_zbc_check_zone_size(struct scsi_disk *sdkp) { u64 zone_blocks = 0; sector_t block = 0; @@ -434,8 +443,6 @@ static int sd_zbc_check_zone_size(struct int ret; u8 same;
- sdkp->zone_blocks = 0; - /* Get a buffer */ buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL); if (!buf) @@ -470,16 +477,17 @@ static int sd_zbc_check_zone_size(struct
/* Parse zone descriptors */ while (rec < buf + buf_len) { - zone_blocks = get_unaligned_be64(&rec[8]); - if (sdkp->zone_blocks == 0) { - sdkp->zone_blocks = zone_blocks; - } else if (zone_blocks != sdkp->zone_blocks && - (block + zone_blocks < sdkp->capacity - || zone_blocks > sdkp->zone_blocks)) { + u64 this_zone_blocks = get_unaligned_be64(&rec[8]); + + if (zone_blocks == 0) { + zone_blocks = this_zone_blocks; + } else if (this_zone_blocks != zone_blocks && + (block + this_zone_blocks < sdkp->capacity + || this_zone_blocks > zone_blocks)) { zone_blocks = 0; goto out; } - block += zone_blocks; + block += this_zone_blocks; rec += 64; }
@@ -492,8 +500,6 @@ static int sd_zbc_check_zone_size(struct
} while (block < sdkp->capacity);
- zone_blocks = sdkp->zone_blocks; - out: if (!zone_blocks) { if (sdkp->first_scan) @@ -513,8 +519,7 @@ out: "Zone size too large\n"); ret = -ENODEV; } else { - sdkp->zone_blocks = zone_blocks; - sdkp->zone_shift = ilog2(zone_blocks); + ret = zone_blocks; }
out_free: @@ -523,23 +528,44 @@ out_free: return ret; }
-static int sd_zbc_setup(struct scsi_disk *sdkp) +static int sd_zbc_setup(struct scsi_disk *sdkp, u32 zone_blocks) { + struct request_queue *q = sdkp->disk->queue; + u32 zone_shift = ilog2(zone_blocks); + u32 nr_zones;
/* chunk_sectors indicates the zone size */ - blk_queue_chunk_sectors(sdkp->disk->queue, - logical_to_sectors(sdkp->device, sdkp->zone_blocks)); - sdkp->zone_shift = ilog2(sdkp->zone_blocks); - sdkp->nr_zones = sdkp->capacity >> sdkp->zone_shift; - if (sdkp->capacity & (sdkp->zone_blocks - 1)) - sdkp->nr_zones++; - - if (!sdkp->zones_wlock) { - sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones), - sizeof(unsigned long), - GFP_KERNEL); - if (!sdkp->zones_wlock) - return -ENOMEM; + blk_queue_chunk_sectors(q, + logical_to_sectors(sdkp->device, zone_blocks)); + nr_zones = round_up(sdkp->capacity, zone_blocks) >> zone_shift; + + /* + * Initialize the disk zone write lock bitmap if the number + * of zones changed. + */ + if (nr_zones != sdkp->nr_zones) { + unsigned long *zones_wlock = NULL; + + if (nr_zones) { + zones_wlock = kcalloc(BITS_TO_LONGS(nr_zones), + sizeof(unsigned long), + GFP_KERNEL); + if (!zones_wlock) + return -ENOMEM; + } + + blk_mq_freeze_queue(q); + sdkp->zone_blocks = zone_blocks; + sdkp->zone_shift = zone_shift; + sdkp->nr_zones = nr_zones; + swap(sdkp->zones_wlock, zones_wlock); + blk_mq_unfreeze_queue(q); + + kfree(zones_wlock); + + /* READ16/WRITE16 is mandatory for ZBC disks */ + sdkp->device->use_16_for_rw = 1; + sdkp->device->use_10_for_rw = 0; }
return 0; @@ -548,6 +574,7 @@ static int sd_zbc_setup(struct scsi_disk int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf) { + int64_t zone_blocks; int ret;
if (!sd_is_zoned(sdkp)) @@ -585,19 +612,19 @@ int sd_zbc_read_zones(struct scsi_disk * * Check zone size: only devices with a constant zone size (except * an eventual last runt zone) that is a power of 2 are supported. */ - ret = sd_zbc_check_zone_size(sdkp); - if (ret) + zone_blocks = sd_zbc_check_zone_size(sdkp); + ret = -EFBIG; + if (zone_blocks != (u32)zone_blocks) + goto err; + ret = zone_blocks; + if (ret < 0) goto err;
/* The drive satisfies the kernel restrictions: set it up */ - ret = sd_zbc_setup(sdkp); + ret = sd_zbc_setup(sdkp, zone_blocks); if (ret) goto err;
- /* READ16/WRITE16 is mandatory for ZBC disks */ - sdkp->device->use_16_for_rw = 1; - sdkp->device->use_10_for_rw = 0; - return 0;
err: @@ -610,6 +637,7 @@ void sd_zbc_remove(struct scsi_disk *sdk { kfree(sdkp->zones_wlock); sdkp->zones_wlock = NULL; + sdkp->nr_zones = 0; }
void sd_zbc_print_zones(struct scsi_disk *sdkp)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit be83bbf806822b1b89e0a0f23cd87cddc409e429 upstream.
The internal VM "mmap()" interfaces are based on the mmap target doing everything using page indexes rather than byte offsets, because traditionally (ie 32-bit) we had the situation that the byte offset didn't fit in a register. So while the mmap virtual address was limited by the word size of the architecture, the backing store was not.
So we're basically passing "pgoff" around as a page index, in order to be able to describe backing store locations that are much bigger than the word size (think files larger than 4GB etc).
But while this all makes a ton of sense conceptually, we've been dogged by various drivers that don't really understand this, and internally work with byte offsets, and then try to work with the page index by turning it into a byte offset with "pgoff << PAGE_SHIFT".
Which obviously can overflow.
Adding the size of the mapping to it to get the byte offset of the end of the backing store just exacerbates the problem, and if you then use this overflow-prone value to check various limits of your device driver mmap capability, you're just setting yourself up for problems.
The correct thing for drivers to do is to do their limit math in page indices, the way the interface is designed. Because the generic mmap code _does_ test that the index doesn't overflow, since that's what the mmap code really cares about.
HOWEVER.
Finding and fixing various random drivers is a sisyphean task, so let's just see if we can just make the core mmap() code do the limiting for us. Realistically, the only "big" backing stores we need to care about are regular files and block devices, both of which are known to do this properly, and which have nice well-defined limits for how much data they can access.
So let's special-case just those two known cases, and then limit other random mmap users to a backing store that still fits in "unsigned long". Realistically, that's not much of a limit at all on 64-bit, and on 32-bit architectures the only worry might be the GPU drivers, which can have big physical address spaces.
To make it possible for drivers like that to say that they are 64-bit clean, this patch does repurpose the "FMODE_UNSIGNED_OFFSET" bit in the file flags to allow drivers to mark their file descriptors as safe in the full 64-bit mmap address space.
[ The timing for doing this is less than optimal, and this should really go in a merge window. But realistically, this needs wide testing more than it needs anything else, and being main-line is the only way to do that.
So the earlier the better, even if it's outside the proper development cycle - Linus ]
Cc: Kees Cook keescook@chromium.org Cc: Dan Carpenter dan.carpenter@oracle.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Willy Tarreau w@1wt.eu Cc: Dave Airlie airlied@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/mmap.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+)
--- a/mm/mmap.c +++ b/mm/mmap.c @@ -1315,6 +1315,35 @@ static inline int mlock_future_check(str return 0; }
+static inline u64 file_mmap_size_max(struct file *file, struct inode *inode) +{ + if (S_ISREG(inode->i_mode)) + return inode->i_sb->s_maxbytes; + + if (S_ISBLK(inode->i_mode)) + return MAX_LFS_FILESIZE; + + /* Special "we do even unsigned file positions" case */ + if (file->f_mode & FMODE_UNSIGNED_OFFSET) + return 0; + + /* Yes, random drivers might want more. But I'm tired of buggy drivers */ + return ULONG_MAX; +} + +static inline bool file_mmap_ok(struct file *file, struct inode *inode, + unsigned long pgoff, unsigned long len) +{ + u64 maxsize = file_mmap_size_max(file, inode); + + if (maxsize && len > maxsize) + return false; + maxsize -= len; + if (pgoff > maxsize >> PAGE_SHIFT) + return false; + return true; +} + /* * The caller must hold down_write(¤t->mm->mmap_sem). */ @@ -1388,6 +1417,9 @@ unsigned long do_mmap(struct file *file, if (file) { struct inode *inode = file_inode(file);
+ if (!file_mmap_ok(file, inode, pgoff, len)) + return -EOVERFLOW; + switch (flags & MAP_TYPE) { case MAP_SHARED: if ((prot&PROT_WRITE) && !(file->f_mode&FMODE_WRITE))
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit 423913ad4ae5b3e8fb8983f70969fb522261ba26 upstream.
Commit be83bbf80682 ("mmap: introduce sane default mmap limits") was introduced to catch problems in various ad-hoc character device drivers doing mmap and getting the size limits wrong. In the process, it used "known good" limits for the normal cases of mapping regular files and block device drivers.
It turns out that the "s_maxbytes" limit was less "known good" than I thought. In particular, /proc doesn't set it, but exposes one regular file to mmap: /proc/vmcore. As a result, that file got limited to the default MAX_INT s_maxbytes value.
This went unnoticed for a while, because apparently the only thing that needs it is the s390 kernel zfcpdump, but there might be other tools that use this too.
Vasily suggested just changing s_maxbytes for all of /proc, which isn't wrong, but makes me nervous at this stage. So instead, just make the new mmap limit always be MAX_LFS_FILESIZE for regular files, which won't affect anything else. It wasn't the regular file case I was worried about.
I'd really prefer for maxsize to have been per-inode, but that is not how things are today.
Fixes: be83bbf80682 ("mmap: introduce sane default mmap limits") Reported-by: Vasily Gorbik gor@linux.ibm.com Cc: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/mmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/mmap.c +++ b/mm/mmap.c @@ -1318,7 +1318,7 @@ static inline int mlock_future_check(str static inline u64 file_mmap_size_max(struct file *file, struct inode *inode) { if (S_ISREG(inode->i_mode)) - return inode->i_sb->s_maxbytes; + return MAX_LFS_FILESIZE;
if (S_ISBLK(inode->i_mode)) return MAX_LFS_FILESIZE;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anand Jain Anand.Jain@oracle.com
commit e2731e55884f2138a252b0a3d7b24d57e49c3c59 upstream.
btrfs-progs uses super flag bit BTRFS_SUPER_FLAG_METADUMP_V2 (1ULL << 34). So just define that in kernel so that we know its been used.
Signed-off-by: Anand Jain anand.jain@oracle.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/disk-io.c | 3 ++- include/uapi/linux/btrfs_tree.h | 1 + 2 files changed, 3 insertions(+), 1 deletion(-)
--- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -59,7 +59,8 @@ BTRFS_HEADER_FLAG_RELOC |\ BTRFS_SUPER_FLAG_ERROR |\ BTRFS_SUPER_FLAG_SEEDING |\ - BTRFS_SUPER_FLAG_METADUMP) + BTRFS_SUPER_FLAG_METADUMP |\ + BTRFS_SUPER_FLAG_METADUMP_V2)
static const struct extent_io_ops btree_extent_io_ops; static void end_workqueue_fn(struct btrfs_work *work); --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -456,6 +456,7 @@ struct btrfs_free_space_header {
#define BTRFS_SUPER_FLAG_SEEDING (1ULL << 32) #define BTRFS_SUPER_FLAG_METADUMP (1ULL << 33) +#define BTRFS_SUPER_FLAG_METADUMP_V2 (1ULL << 34)
/*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Suresh Reddy suresh.reddy@broadcom.com
[ Upstream commit d2c2725c2cdbcc108a191f50953d31c7b6556761 ]
Check for 0xE00 (RECOVERABLE_ERR) along with ARMFW UE (0x0) in be_detect_error() to know whether the error is valid error or not
Fixes: 673c96e5a ("be2net: Fix UE detection logic for BE3") Signed-off-by: Suresh Reddy suresh.reddy@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/emulex/benet/be_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -3294,7 +3294,9 @@ void be_detect_error(struct be_adapter * if ((val & POST_STAGE_FAT_LOG_START) != POST_STAGE_FAT_LOG_START && (val & POST_STAGE_ARMFW_UE) - != POST_STAGE_ARMFW_UE) + != POST_STAGE_ARMFW_UE && + (val & POST_STAGE_RECOVERABLE_ERR) + != POST_STAGE_RECOVERABLE_ERR) return; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julia Lawall Julia.Lawall@lip6.fr
[ Upstream commit dd612f18a49b63af8b3a5f572d999bdb197385bc ]
Nearby code that also tests port suggests that the P0 constant should be used when port is zero.
The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/)
// <smpl> @@ expression e,e1; @@
* e ? e1 : e1 // </smpl>
Fixes: 6c3218c6f7e5 ("bnx2x: Adjust ETS to 578xx") Signed-off-by: Julia Lawall Julia.Lawall@lip6.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c @@ -588,7 +588,7 @@ static void bnx2x_ets_e3b0_nig_disabled( * slots for the highest priority. */ REG_WR(bp, (port) ? NIG_REG_P1_TX_ARB_NUM_STRICT_ARB_SLOTS : - NIG_REG_P1_TX_ARB_NUM_STRICT_ARB_SLOTS, 0x100); + NIG_REG_P0_TX_ARB_NUM_STRICT_ARB_SLOTS, 0x100); /* Mapping between the CREDIT_WEIGHT registers and actual client * numbers */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Kodanev alexey.kodanev@oracle.com
[ Upstream commit 2677d20677314101293e6da0094ede7b5526d2b1 ]
Syzbot reported the use-after-free in timer_is_static_object() [1].
This can happen because the structure for the rto timer (ccid2_hc_tx_sock) is removed in dccp_disconnect(), and ccid2_hc_tx_rto_expire() can be called after that.
The report [1] is similar to the one in commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time"). And the fix is the same, delay freeing ccid2_hc_tx_sock structure, so that it is freed in dccp_sk_destruct().
[1]
================================================================== BUG: KASAN: use-after-free in timer_is_static_object+0x80/0x90 kernel/time/timer.c:607 Read of size 8 at addr ffff8801bebb5118 by task syz-executor2/25299
CPU: 1 PID: 25299 Comm: syz-executor2 Not tainted 4.17.0-rc5+ #54 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 timer_is_static_object+0x80/0x90 kernel/time/timer.c:607 debug_object_activate+0x2d9/0x670 lib/debugobjects.c:508 debug_timer_activate kernel/time/timer.c:709 [inline] debug_activate kernel/time/timer.c:764 [inline] __mod_timer kernel/time/timer.c:1041 [inline] mod_timer+0x4d3/0x13b0 kernel/time/timer.c:1102 sk_reset_timer+0x22/0x60 net/core/sock.c:2742 ccid2_hc_tx_rto_expire+0x587/0x680 net/dccp/ccids/ccid2.c:147 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x79e/0xc50 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1d1/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:525 [inline] smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863 </IRQ> ... Allocated by task 25374: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554 ccid_new+0x25b/0x3e0 net/dccp/ccid.c:151 dccp_hdlr_ccid+0x27/0x150 net/dccp/feat.c:44 __dccp_feat_activate+0x184/0x270 net/dccp/feat.c:344 dccp_feat_activate_values+0x3a7/0x819 net/dccp/feat.c:1538 dccp_create_openreq_child+0x472/0x610 net/dccp/minisocks.c:128 dccp_v4_request_recv_sock+0x12c/0xca0 net/dccp/ipv4.c:408 dccp_v6_request_recv_sock+0x125d/0x1f10 net/dccp/ipv6.c:415 dccp_check_req+0x455/0x6a0 net/dccp/minisocks.c:197 dccp_v4_rcv+0x7b8/0x1f3f net/dccp/ipv4.c:841 ip_local_deliver_finish+0x2e3/0xd80 net/ipv4/ip_input.c:215 NF_HOOK include/linux/netfilter.h:288 [inline] ip_local_deliver+0x1e1/0x720 net/ipv4/ip_input.c:256 dst_input include/net/dst.h:450 [inline] ip_rcv_finish+0x81b/0x2200 net/ipv4/ip_input.c:396 NF_HOOK include/linux/netfilter.h:288 [inline] ip_rcv+0xb70/0x143d net/ipv4/ip_input.c:492 __netif_receive_skb_core+0x26f5/0x3630 net/core/dev.c:4592 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4657 process_backlog+0x219/0x760 net/core/dev.c:5337 napi_poll net/core/dev.c:5735 [inline] net_rx_action+0x7b7/0x1930 net/core/dev.c:5801 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
Freed by task 25374: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x86/0x2d0 mm/slab.c:3756 ccid_hc_tx_delete+0xc3/0x100 net/dccp/ccid.c:190 dccp_disconnect+0x130/0xc66 net/dccp/proto.c:286 dccp_close+0x3bc/0xe60 net/dccp/proto.c:1045 inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:460 sock_release+0x96/0x1b0 net/socket.c:594 sock_close+0x16/0x20 net/socket.c:1149 __fput+0x34d/0x890 fs/file_table.c:209 ____fput+0x15/0x20 fs/file_table.c:243 task_work_run+0x1e4/0x290 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:191 [inline] exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline] syscall_return_slowpath arch/x86/entry/common.c:265 [inline] do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff8801bebb4cc0 which belongs to the cache ccid2_hc_tx_sock of size 1240 The buggy address is located 1112 bytes inside of 1240-byte region [ffff8801bebb4cc0, ffff8801bebb5198) The buggy address belongs to the page: page:ffffea0006faed00 count:1 mapcount:0 mapping:ffff8801bebb41c0 index:0xffff8801bebb5240 compound_mapcount: 0 flags: 0x2fffc0000008100(slab|head) raw: 02fffc0000008100 ffff8801bebb41c0 ffff8801bebb5240 0000000100000003 raw: ffff8801cdba3138 ffffea0007634120 ffff8801cdbaab40 0000000000000000 page dumped because: kasan: bad access detected ... ==================================================================
Reported-by: syzbot+5d47e9ec91a6f15dbd6f@syzkaller.appspotmail.com Signed-off-by: Alexey Kodanev alexey.kodanev@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dccp/proto.c | 2 -- 1 file changed, 2 deletions(-)
--- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -280,9 +280,7 @@ int dccp_disconnect(struct sock *sk, int
dccp_clear_xmit_timers(sk); ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk); - ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk); dp->dccps_hc_rx_ccid = NULL; - dp->dccps_hc_tx_ccid = NULL;
__skb_queue_purge(&sk->sk_receive_queue); __skb_queue_purge(&sk->sk_write_queue);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Govindarajulu Varadarajan gvaradar@cisco.com
[ Upstream commit 322eaa06d55ebc1402a4a8d140945cff536638b4 ]
In commit 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then failover to DMA") DMA mask was changed from 40 bits to 64 bits. Hardware actually supports only 47 bits.
Fixes: 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then failover to DMA") Signed-off-by: Govindarajulu Varadarajan gvaradar@cisco.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/cisco/enic/enic_main.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/cisco/enic/enic_main.c +++ b/drivers/net/ethernet/cisco/enic/enic_main.c @@ -2703,11 +2703,11 @@ static int enic_probe(struct pci_dev *pd pci_set_master(pdev);
/* Query PCI controller on system for DMA addressing - * limitation for the device. Try 64-bit first, and + * limitation for the device. Try 47-bit first, and * fail to 32-bit. */
- err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64)); + err = pci_set_dma_mask(pdev, DMA_BIT_MASK(47)); if (err) { err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32)); if (err) { @@ -2721,10 +2721,10 @@ static int enic_probe(struct pci_dev *pd goto err_out_release_regions; } } else { - err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64)); + err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(47)); if (err) { dev_err(dev, "Unable to obtain %u-bit DMA " - "for consistent allocations, aborting\n", 64); + "for consistent allocations, aborting\n", 47); goto err_out_release_regions; } using_dac = 1;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 848235edb5c93ed086700584c8ff64f6d7fc778d ]
Currently, raw6_sk(sk)->ip6mr_table is set unconditionally during ip6_mroute_setsockopt(MRT6_TABLE). A subsequent attempt at the same setsockopt will fail with -ENOENT, since we haven't actually created that table.
A similar fix for ipv4 was included in commit 5e1859fbcc3c ("ipv4: ipmr: various fixes and cleanups").
Fixes: d1db275dd3f6 ("ipv6: ip6mr: support multiple tables") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6mr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -1795,7 +1795,8 @@ int ip6_mroute_setsockopt(struct sock *s ret = 0; if (!ip6mr_new_table(net, v)) ret = -ENOMEM; - raw6_sk(sk)->ip6mr_table = v; + else + raw6_sk(sk)->ip6mr_table = v; rtnl_unlock(); return ret; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicolas Dichtel nicolas.dichtel@6wind.com
[ Upstream commit f7ff1fde9441b4fcc8ffb6e66e6e5a00d008937e ]
I don't know where this value comes from (probably a copy and paste and paste and paste ...). Let's use standard values which are a bit greater.
Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/co... Signed-off-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_tunnel.c | 11 ++++++++--- net/ipv6/sit.c | 5 +++-- 2 files changed, 11 insertions(+), 5 deletions(-)
--- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1693,8 +1693,13 @@ int ip6_tnl_change_mtu(struct net_device if (new_mtu < ETH_MIN_MTU) return -EINVAL; } - if (new_mtu > 0xFFF8 - dev->hard_header_len) - return -EINVAL; + if (tnl->parms.proto == IPPROTO_IPV6 || tnl->parms.proto == 0) { + if (new_mtu > IP6_MAX_MTU - dev->hard_header_len) + return -EINVAL; + } else { + if (new_mtu > IP_MAX_MTU - dev->hard_header_len) + return -EINVAL; + } dev->mtu = new_mtu; return 0; } @@ -1842,7 +1847,7 @@ ip6_tnl_dev_init_gen(struct net_device * if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) dev->mtu -= 8; dev->min_mtu = ETH_MIN_MTU; - dev->max_mtu = 0xFFF8 - dev->hard_header_len; + dev->max_mtu = IP6_MAX_MTU - dev->hard_header_len;
return 0;
--- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -1360,7 +1360,7 @@ static void ipip6_tunnel_setup(struct ne dev->hard_header_len = LL_MAX_HEADER + t_hlen; dev->mtu = ETH_DATA_LEN - t_hlen; dev->min_mtu = IPV6_MIN_MTU; - dev->max_mtu = 0xFFF8 - t_hlen; + dev->max_mtu = IP6_MAX_MTU - t_hlen; dev->flags = IFF_NOARP; netif_keep_dst(dev); dev->addr_len = 4; @@ -1572,7 +1572,8 @@ static int ipip6_newlink(struct net *src if (tb[IFLA_MTU]) { u32 mtu = nla_get_u32(tb[IFLA_MTU]);
- if (mtu >= IPV6_MIN_MTU && mtu <= 0xFFF8 - dev->hard_header_len) + if (mtu >= IPV6_MIN_MTU && + mtu <= IP6_MAX_MTU - dev->hard_header_len) dev->mtu = mtu; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 66fb33254f45df4b049f487aff1cbde1ef919390 ]
commit 8fb472c09b9d ("ipmr: improve hash scalability") added a call to rhltable_init() without checking its return value.
This problem was then later copied to IPv6 and factorized in commit 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 31552 Comm: syz-executor7 Not tainted 4.17.0-rc5+ #60 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:rht_key_hashfn include/linux/rhashtable.h:277 [inline] RIP: 0010:__rhashtable_lookup include/linux/rhashtable.h:630 [inline] RIP: 0010:rhltable_lookup include/linux/rhashtable.h:716 [inline] RIP: 0010:mr_mfc_find_parent+0x2ad/0xbb0 net/ipv4/ipmr_base.c:63 RSP: 0018:ffff8801826aef70 EFLAGS: 00010203 RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffc90001ea0000 RDX: 0000000000000079 RSI: ffffffff8661e859 RDI: 000000000000000c RBP: ffff8801826af1c0 R08: ffff8801b2212000 R09: ffffed003b5e46c2 R10: ffffed003b5e46c2 R11: ffff8801daf23613 R12: dffffc0000000000 R13: ffff8801826af198 R14: ffff8801cf8225c0 R15: ffff8801826af658 FS: 00007ff7fa732700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000003ffffff9c CR3: 00000001b0210000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip6mr_cache_find_parent net/ipv6/ip6mr.c:981 [inline] ip6mr_mfc_delete+0x1fe/0x6b0 net/ipv6/ip6mr.c:1221 ip6_mroute_setsockopt+0x15c6/0x1d70 net/ipv6/ip6mr.c:1698 do_ipv6_setsockopt.isra.9+0x422/0x4660 net/ipv6/ipv6_sockglue.c:163 ipv6_setsockopt+0xbd/0x170 net/ipv6/ipv6_sockglue.c:922 rawv6_setsockopt+0x59/0x140 net/ipv6/raw.c:1060 sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3039 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903 __do_sys_setsockopt net/socket.c:1914 [inline] __se_sys_setsockopt net/socket.c:1911 [inline] __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: 8fb472c09b9d ("ipmr: improve hash scalability") Fixes: 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Nikolay Aleksandrov nikolay@cumulusnetworks.com Cc: Yuval Mintz yuvalm@mellanox.com Reported-by: syzbot syzkaller@googlegroups.com Acked-by: Nikolay Aleksandrov nikolay@cumulusnetworks.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/ipmr.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/ipv4/ipmr.c +++ b/net/ipv4/ipmr.c @@ -323,6 +323,7 @@ static const struct rhashtable_params ip static struct mr_table *ipmr_new_table(struct net *net, u32 id) { struct mr_table *mrt; + int err;
/* "pimreg%u" should not exceed 16 bytes (IFNAMSIZ) */ if (id != RT_TABLE_DEFAULT && id >= 1000000000) @@ -338,7 +339,11 @@ static struct mr_table *ipmr_new_table(s write_pnet(&mrt->net, net); mrt->id = id;
- rhltable_init(&mrt->mfc_hash, &ipmr_rht_params); + err = rhltable_init(&mrt->mfc_hash, &ipmr_rht_params); + if (err) { + kfree(mrt); + return ERR_PTR(err); + } INIT_LIST_HEAD(&mrt->mfc_cache_list); INIT_LIST_HEAD(&mrt->mfc_unres_queue);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
[ Upstream commit 730c54d59403658a62af6517338fa8d4922c1b28 ]
A precondition check in ip_recv_error triggered on an otherwise benign race. Remove the warning.
The warning triggers when passing an ipv6 socket to this ipv4 error handling function. RaceFuzzer was able to trigger it due to a race in setsockopt IPV6_ADDRFORM.
--- CPU0 do_ipv6_setsockopt sk->sk_socket->ops = &inet_dgram_ops;
--- CPU1 sk->sk_prot->recvmsg udp_recvmsg ip_recv_error WARN_ON_ONCE(sk->sk_family == AF_INET6);
--- CPU0 do_ipv6_setsockopt sk->sk_family = PF_INET;
This socket option converts a v6 socket that is connected to a v4 peer to an v4 socket. It updates the socket on the fly, changing fields in sk as well as other structs. This is inherently non-atomic. It races with the lockless udp_recvmsg path.
No other code makes an assumption that these fields are updated atomically. It is benign here, too, as ip_recv_error cares only about the protocol of the skbs enqueued on the error queue, for which sk_family is not a precise predictor (thanks to another isue with IPV6_ADDRFORM).
Link: http://lkml.kernel.org/r/20180518120826.GA19515@dragonet.kaist.ac.kr Fixes: 7ce875e5ecb8 ("ipv4: warn once on passing AF_INET6 socket to ip_recv_error") Reported-by: DaeRyong Jeong threeearcat@gmail.com Suggested-by: Eric Dumazet edumazet@google.com Signed-off-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/ip_sockglue.c | 2 -- 1 file changed, 2 deletions(-)
--- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -511,8 +511,6 @@ int ip_recv_error(struct sock *sk, struc int err; int copied;
- WARN_ON_ONCE(sk->sk_family == AF_INET6); - err = -EAGAIN; skb = sock_dequeue_err_skb(sk); if (!skb)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Kubecek mkubecek@suse.cz
[ Upstream commit fa1be7e01ea863e911349e30456706749518eeab ]
Some of the code paths calculating flow hash for IPv6 use flowlabel member of struct flowi6 which, despite its name, encodes both flow label and traffic class. If traffic class changes within a TCP connection (as e.g. ssh does), ECMP route can switch between path. It's also inconsistent with other code paths where ip6_flowlabel() (returning only flow label) is used to feed the key.
Use only flow label everywhere, including one place where hash key is set using ip6_flowinfo().
Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)") Fixes: f70ea018da06 ("net: Add functions to get skb->hash based on flow structures") Signed-off-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ipv6.h | 5 +++++ net/core/flow_dissector.c | 2 +- net/ipv6/route.c | 2 +- 3 files changed, 7 insertions(+), 2 deletions(-)
--- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -861,6 +861,11 @@ static inline __be32 ip6_make_flowinfo(u return htonl(tclass << IPV6_TCLASS_SHIFT) | flowlabel; }
+static inline __be32 flowi6_get_flowlabel(const struct flowi6 *fl6) +{ + return fl6->flowlabel & IPV6_FLOWLABEL_MASK; +} + /* * Prototypes exported by ipv6 */ --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -1179,7 +1179,7 @@ __u32 __get_hash_from_flowi6(const struc keys->ports.src = fl6->fl6_sport; keys->ports.dst = fl6->fl6_dport; keys->keyid.keyid = fl6->fl6_gre_key; - keys->tags.flow_label = (__force u32)fl6->flowlabel; + keys->tags.flow_label = (__force u32)flowi6_get_flowlabel(fl6); keys->basic.ip_proto = fl6->flowi6_proto;
return flow_hash_from_keys(keys); --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1250,7 +1250,7 @@ out: keys->control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS; keys->addrs.v6addrs.src = key_iph->saddr; keys->addrs.v6addrs.dst = key_iph->daddr; - keys->tags.flow_label = ip6_flowinfo(key_iph); + keys->tags.flow_label = ip6_flowlabel(key_iph); keys->basic.ip_proto = key_iph->nexthdr; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wenwen Wang wang6495@umn.edu
[ Upstream commit 6009d1fe6ba3bb2dab55921da60465329cc1cd89 ]
In divasmain.c, the function divas_write() firstly invokes the function diva_xdi_open_adapter() to open the adapter that matches with the adapter number provided by the user, and then invokes the function diva_xdi_write() to perform the write operation using the matched adapter. The two functions diva_xdi_open_adapter() and diva_xdi_write() are located in diva.c.
In diva_xdi_open_adapter(), the user command is copied to the object 'msg' from the userspace pointer 'src' through the function pointer 'cp_fn', which eventually calls copy_from_user() to do the copy. Then, the adapter number 'msg.adapter' is used to find out a matched adapter from the 'adapter_queue'. A matched adapter will be returned if it is found. Otherwise, NULL is returned to indicate the failure of the verification on the adapter number.
As mentioned above, if a matched adapter is returned, the function diva_xdi_write() is invoked to perform the write operation. In this function, the user command is copied once again from the userspace pointer 'src', which is the same as the 'src' pointer in diva_xdi_open_adapter() as both of them are from the 'buf' pointer in divas_write(). Similarly, the copy is achieved through the function pointer 'cp_fn', which finally calls copy_from_user(). After the successful copy, the corresponding command processing handler of the matched adapter is invoked to perform the write operation.
It is obvious that there are two copies here from userspace, one is in diva_xdi_open_adapter(), and one is in diva_xdi_write(). Plus, both of these two copies share the same source userspace pointer, i.e., the 'buf' pointer in divas_write(). Given that a malicious userspace process can race to change the content pointed by the 'buf' pointer, this can pose potential security issues. For example, in the first copy, the user provides a valid adapter number to pass the verification process and a valid adapter can be found. Then the user can modify the adapter number to an invalid number. This way, the user can bypass the verification process of the adapter number and inject inconsistent data.
This patch reuses the data copied in diva_xdi_open_adapter() and passes it to diva_xdi_write(). This way, the above issues can be avoided.
Signed-off-by: Wenwen Wang wang6495@umn.edu Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/isdn/hardware/eicon/diva.c | 22 +++++++++++++++------- drivers/isdn/hardware/eicon/diva.h | 5 +++-- drivers/isdn/hardware/eicon/divasmain.c | 18 +++++++++++------- 3 files changed, 29 insertions(+), 16 deletions(-)
--- a/drivers/isdn/hardware/eicon/diva.c +++ b/drivers/isdn/hardware/eicon/diva.c @@ -388,10 +388,10 @@ void divasa_xdi_driver_unload(void) ** Receive and process command from user mode utility */ void *diva_xdi_open_adapter(void *os_handle, const void __user *src, - int length, + int length, void *mptr, divas_xdi_copy_from_user_fn_t cp_fn) { - diva_xdi_um_cfg_cmd_t msg; + diva_xdi_um_cfg_cmd_t *msg = (diva_xdi_um_cfg_cmd_t *)mptr; diva_os_xdi_adapter_t *a = NULL; diva_os_spin_lock_magic_t old_irql; struct list_head *tmp; @@ -401,21 +401,21 @@ void *diva_xdi_open_adapter(void *os_han length, sizeof(diva_xdi_um_cfg_cmd_t))) return NULL; } - if ((*cp_fn) (os_handle, &msg, src, sizeof(msg)) <= 0) { + if ((*cp_fn) (os_handle, msg, src, sizeof(*msg)) <= 0) { DBG_ERR(("A: A(?) open, write error")) return NULL; } diva_os_enter_spin_lock(&adapter_lock, &old_irql, "open_adapter"); list_for_each(tmp, &adapter_queue) { a = list_entry(tmp, diva_os_xdi_adapter_t, link); - if (a->controller == (int)msg.adapter) + if (a->controller == (int)msg->adapter) break; a = NULL; } diva_os_leave_spin_lock(&adapter_lock, &old_irql, "open_adapter");
if (!a) { - DBG_ERR(("A: A(%d) open, adapter not found", msg.adapter)) + DBG_ERR(("A: A(%d) open, adapter not found", msg->adapter)) }
return (a); @@ -437,8 +437,10 @@ void diva_xdi_close_adapter(void *adapte
int diva_xdi_write(void *adapter, void *os_handle, const void __user *src, - int length, divas_xdi_copy_from_user_fn_t cp_fn) + int length, void *mptr, + divas_xdi_copy_from_user_fn_t cp_fn) { + diva_xdi_um_cfg_cmd_t *msg = (diva_xdi_um_cfg_cmd_t *)mptr; diva_os_xdi_adapter_t *a = (diva_os_xdi_adapter_t *) adapter; void *data;
@@ -459,7 +461,13 @@ diva_xdi_write(void *adapter, void *os_h return (-2); }
- length = (*cp_fn) (os_handle, data, src, length); + if (msg) { + *(diva_xdi_um_cfg_cmd_t *)data = *msg; + length = (*cp_fn) (os_handle, (char *)data + sizeof(*msg), + src + sizeof(*msg), length - sizeof(*msg)); + } else { + length = (*cp_fn) (os_handle, data, src, length); + } if (length > 0) { if ((*(a->interface.cmd_proc)) (a, (diva_xdi_um_cfg_cmd_t *) data, length)) { --- a/drivers/isdn/hardware/eicon/diva.h +++ b/drivers/isdn/hardware/eicon/diva.h @@ -20,10 +20,11 @@ int diva_xdi_read(void *adapter, void *o int max_length, divas_xdi_copy_to_user_fn_t cp_fn);
int diva_xdi_write(void *adapter, void *os_handle, const void __user *src, - int length, divas_xdi_copy_from_user_fn_t cp_fn); + int length, void *msg, + divas_xdi_copy_from_user_fn_t cp_fn);
void *diva_xdi_open_adapter(void *os_handle, const void __user *src, - int length, + int length, void *msg, divas_xdi_copy_from_user_fn_t cp_fn);
void diva_xdi_close_adapter(void *adapter, void *os_handle); --- a/drivers/isdn/hardware/eicon/divasmain.c +++ b/drivers/isdn/hardware/eicon/divasmain.c @@ -591,19 +591,22 @@ static int divas_release(struct inode *i static ssize_t divas_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { + diva_xdi_um_cfg_cmd_t msg; int ret = -EINVAL;
if (!file->private_data) { file->private_data = diva_xdi_open_adapter(file, buf, - count, + count, &msg, xdi_copy_from_user); - } - if (!file->private_data) { - return (-ENODEV); + if (!file->private_data) + return (-ENODEV); + ret = diva_xdi_write(file->private_data, file, + buf, count, &msg, xdi_copy_from_user); + } else { + ret = diva_xdi_write(file->private_data, file, + buf, count, NULL, xdi_copy_from_user); }
- ret = diva_xdi_write(file->private_data, file, - buf, count, xdi_copy_from_user); switch (ret) { case -1: /* Message should be removed from rx mailbox first */ ret = -EBUSY; @@ -622,11 +625,12 @@ static ssize_t divas_write(struct file * static ssize_t divas_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) { + diva_xdi_um_cfg_cmd_t msg; int ret = -EINVAL;
if (!file->private_data) { file->private_data = diva_xdi_open_adapter(file, buf, - count, + count, &msg, xdi_copy_from_user); } if (!file->private_data) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kirill Tkhai ktkhai@virtuozzo.com
[ Upstream commit eb7f54b90bd8f469834c5e86dcf72ebf9a629811 ]
(resend for properly queueing in patchwork)
kcm_clone() creates kernel socket, which does not take net counter. Thus, the net may die before the socket is completely destructed, i.e. kcm_exit_net() is executed before kcm_done().
Reported-by: syzbot+5f1a04e374a635efc426@syzkaller.appspotmail.com Signed-off-by: Kirill Tkhai ktkhai@virtuozzo.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/kcm/kcmsock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -1672,7 +1672,7 @@ static struct file *kcm_clone(struct soc __module_get(newsock->ops->owner);
newsk = sk_alloc(sock_net(osock->sk), PF_KCM, GFP_KERNEL, - &kcm_proto, true); + &kcm_proto, false); if (!newsk) { sock_release(newsock); return ERR_PTR(-ENOMEM);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 75d4e704fa8d2cf33ff295e5b441317603d7f9fd ]
Per discussion with David at netconf 2018, let's clarify DaveM's position of handling stable backports in netdev-FAQ.
This is important for people relying on upstream -stable releases.
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/networking/netdev-FAQ.txt | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/Documentation/networking/netdev-FAQ.txt +++ b/Documentation/networking/netdev-FAQ.txt @@ -176,6 +176,15 @@ A: No. See above answer. In short, if dash marker line as described in Documentation/process/submitting-patches.rst to temporarily embed that information into the patch that you send.
+Q: Are all networking bug fixes backported to all stable releases? + +A: Due to capacity, Dave could only take care of the backports for the last + 2 stable releases. For earlier stable releases, each stable branch maintainer + is supposed to take care of them. If you find any patch is missing from an + earlier stable branch, please notify stable@vger.kernel.org with either a + commit ID or a formal patch backported, and CC Dave and other relevant + networking developers. + Q: Someone said that the comment style and coding convention is different for the networking content. Is this true?
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Roopa Prabhu roopa@cumulusnetworks.com
[ Upstream commit 2eabd764cb5512f1338d06ffc054c8bc9fbe9104 ]
Signed-off-by: Roopa Prabhu roopa@cumulusnetworks.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/fib_frontend.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -625,6 +625,7 @@ const struct nla_policy rtm_ipv4_policy[ [RTA_ENCAP] = { .type = NLA_NESTED }, [RTA_UID] = { .type = NLA_U32 }, [RTA_MARK] = { .type = NLA_U32 }, + [RTA_TABLE] = { .type = NLA_U32 }, };
static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 5b5e7a0de2bbf2a1afcd9f49e940010e9fb80d53 ]
Before using nla_get_u32(), better make sure the attribute is of the proper size.
Code recently was changed, but bug has been there from beginning of git.
BUG: KMSAN: uninit-value in rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746 CPU: 1 PID: 14139 Comm: syz-executor6 Not tainted 4.17.0-rc5+ #103 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084 __msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686 rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746 fib_dump_info+0xc42/0x2190 net/ipv4/fib_semantics.c:1361 rtmsg_fib+0x65f/0x8c0 net/ipv4/fib_semantics.c:419 fib_table_insert+0x2314/0x2b50 net/ipv4/fib_trie.c:1287 inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x455a09 RSP: 002b:00007faae5fd8c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007faae5fd96d4 RCX: 0000000000455a09 RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000013 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000005d0 R14: 00000000006fdc20 R15: 0000000000000000
Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_save_stack mm/kmsan/kmsan.c:294 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685 __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:529 fib_convert_metrics net/ipv4/fib_semantics.c:1056 [inline] fib_create_info+0x2d46/0x9dc0 net/ipv4/fib_semantics.c:1150 fib_table_insert+0x3e4/0x2b50 net/ipv4/fib_trie.c:1146 inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315 kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322 slab_post_alloc_hook mm/slab.h:446 [inline] slab_alloc_node mm/slub.c:2753 [inline] __kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:988 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline] netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: a919525ad832 ("net: Move fib_convert_metrics to metrics file") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/fib_semantics.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -725,6 +725,8 @@ bool fib_metrics_match(struct fib_config nla_strlcpy(tmp, nla, sizeof(tmp)); val = tcp_ca_get_key_by_name(tmp, &ecn_ca); } else { + if (nla_len(nla) != sizeof(u32)) + return false; val = nla_get_u32(nla); }
@@ -1051,6 +1053,8 @@ fib_convert_metrics(struct fib_info *fi, if (val == TCP_CA_UNSPEC) return -EINVAL; } else { + if (nla_len(nla) != sizeof(u32)) + return -EINVAL; val = nla_get_u32(nla); } if (type == RTAX_ADVMSS && val > 65535 - 40)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit eb73190f4fbeedf762394e92d6a4ec9ace684c88 ]
syzbot was able to trick af_packet again [1]
Various commits tried to address the problem in the past, but failed to take into account V3 header size.
[1]
tpacket_rcv: packet too big, clamped from 72 to 4294967224. macoff=96 BUG: KASAN: use-after-free in prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline] BUG: KASAN: use-after-free in prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039 Write of size 2 at addr ffff8801cb62000e by task kworker/1:2/2106
CPU: 1 PID: 2106 Comm: kworker/1:2 Not tainted 4.17.0-rc7+ #77 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_store2_noabort+0x17/0x20 mm/kasan/report.c:436 prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline] prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039 __packet_lookup_frame_in_block net/packet/af_packet.c:1094 [inline] packet_current_rx_frame net/packet/af_packet.c:1117 [inline] tpacket_rcv+0x1866/0x3340 net/packet/af_packet.c:2282 dev_queue_xmit_nit+0x891/0xb90 net/core/dev.c:2018 xmit_one net/core/dev.c:3049 [inline] dev_hard_start_xmit+0x16b/0xc10 net/core/dev.c:3069 __dev_queue_xmit+0x2724/0x34c0 net/core/dev.c:3584 dev_queue_xmit+0x17/0x20 net/core/dev.c:3617 neigh_resolve_output+0x679/0xad0 net/core/neighbour.c:1358 neigh_output include/net/neighbour.h:482 [inline] ip6_finish_output2+0xc9c/0x2810 net/ipv6/ip6_output.c:120 ip6_finish_output+0x5fe/0xbc0 net/ipv6/ip6_output.c:154 NF_HOOK_COND include/linux/netfilter.h:277 [inline] ip6_output+0x227/0x9b0 net/ipv6/ip6_output.c:171 dst_output include/net/dst.h:444 [inline] NF_HOOK include/linux/netfilter.h:288 [inline] ndisc_send_skb+0x100d/0x1570 net/ipv6/ndisc.c:491 ndisc_send_ns+0x3c1/0x8d0 net/ipv6/ndisc.c:633 addrconf_dad_work+0xbef/0x1340 net/ipv6/addrconf.c:4033 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279 kthread+0x345/0x410 kernel/kthread.c:240 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
The buggy address belongs to the page: page:ffffea00072d8800 count:0 mapcount:-127 mapping:0000000000000000 index:0xffff8801cb620e80 flags: 0x2fffc0000000000() raw: 02fffc0000000000 0000000000000000 ffff8801cb620e80 00000000ffffff80 raw: ffffea00072e3820 ffffea0007132d20 0000000000000002 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8801cb61ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8801cb61ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff8801cb620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^ ffff8801cb620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8801cb620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Fixes: 2b6867c2ce76 ("net/packet: fix overflow in check for priv area size") Fixes: dc808110bb62 ("packet: handle too big packets for PACKET_V3") Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -4293,7 +4293,7 @@ static int packet_set_ring(struct sock * goto out; if (po->tp_version >= TPACKET_V3 && req->tp_block_size <= - BLK_PLUS_PRIV((u64)req_u->req3.tp_sizeof_priv)) + BLK_PLUS_PRIV((u64)req_u->req3.tp_sizeof_priv) + sizeof(struct tpacket3_hdr)) goto out; if (unlikely(req->tp_frame_size < po->tp_hdrlen + po->tp_reserve))
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Fainelli f.fainelli@gmail.com
[ Upstream commit 79fb218d97980d4fee9a64f4c8ff05289364ba25 ]
On newer PHYs, we need to select the expansion register to write with setting bits [11:8] to 0xf. This was done correctly by bcm7xxx.c prior to being migrated to generic code under bcm-phy-lib.c which unfortunately used the older implementation from the BCM54xx days.
Fix this by creating an inline stub: bcm_write_exp_sel() which adds the correct value (MII_BCM54XX_EXP_SEL_ER) and update both the Cygnus PHY and BCM7xxx PHY drivers which require setting these bits.
broadcom.c is unchanged because some PHYs even use a different selector method, so let them specify it directly (e.g: SerDes secondary selector).
Fixes: a1cba5613edf ("net: phy: Add Broadcom phy library for common interfaces") Signed-off-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/phy/bcm-cygnus.c | 6 +++--- drivers/net/phy/bcm-phy-lib.h | 7 +++++++ drivers/net/phy/bcm7xxx.c | 4 ++-- 3 files changed, 12 insertions(+), 5 deletions(-)
--- a/drivers/net/phy/bcm-cygnus.c +++ b/drivers/net/phy/bcm-cygnus.c @@ -61,17 +61,17 @@ static int bcm_cygnus_afe_config(struct return rc;
/* make rcal=100, since rdb default is 000 */ - rc = bcm_phy_write_exp(phydev, MII_BRCM_CORE_EXPB1, 0x10); + rc = bcm_phy_write_exp_sel(phydev, MII_BRCM_CORE_EXPB1, 0x10); if (rc < 0) return rc;
/* CORE_EXPB0, Reset R_CAL/RC_CAL Engine */ - rc = bcm_phy_write_exp(phydev, MII_BRCM_CORE_EXPB0, 0x10); + rc = bcm_phy_write_exp_sel(phydev, MII_BRCM_CORE_EXPB0, 0x10); if (rc < 0) return rc;
/* CORE_EXPB0, Disable Reset R_CAL/RC_CAL Engine */ - rc = bcm_phy_write_exp(phydev, MII_BRCM_CORE_EXPB0, 0x00); + rc = bcm_phy_write_exp_sel(phydev, MII_BRCM_CORE_EXPB0, 0x00);
return 0; } --- a/drivers/net/phy/bcm-phy-lib.h +++ b/drivers/net/phy/bcm-phy-lib.h @@ -14,11 +14,18 @@ #ifndef _LINUX_BCM_PHY_LIB_H #define _LINUX_BCM_PHY_LIB_H
+#include <linux/brcmphy.h> #include <linux/phy.h>
int bcm_phy_write_exp(struct phy_device *phydev, u16 reg, u16 val); int bcm_phy_read_exp(struct phy_device *phydev, u16 reg);
+static inline int bcm_phy_write_exp_sel(struct phy_device *phydev, + u16 reg, u16 val) +{ + return bcm_phy_write_exp(phydev, reg | MII_BCM54XX_EXP_SEL_ER, val); +} + int bcm54xx_auxctl_write(struct phy_device *phydev, u16 regnum, u16 val); int bcm54xx_auxctl_read(struct phy_device *phydev, u16 regnum);
--- a/drivers/net/phy/bcm7xxx.c +++ b/drivers/net/phy/bcm7xxx.c @@ -65,10 +65,10 @@ struct bcm7xxx_phy_priv { static void r_rc_cal_reset(struct phy_device *phydev) { /* Reset R_CAL/RC_CAL Engine */ - bcm_phy_write_exp(phydev, 0x00b0, 0x0010); + bcm_phy_write_exp_sel(phydev, 0x00b0, 0x0010);
/* Disable Reset R_AL/RC_CAL Engine */ - bcm_phy_write_exp(phydev, 0x00b0, 0x0000); + bcm_phy_write_exp_sel(phydev, 0x00b0, 0x0000); }
static int bcm7xxx_28nm_b0_afe_config_init(struct phy_device *phydev)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
[ Upstream commit 9aad13b087ab0a588cd68259de618f100053360e ]
Commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation") ensures that packet_snd always starts writing the link layer header in reserved headroom allocated for this purpose.
This is needed because packets may be shorter than hard_header_len, in which case the space up to hard_header_len may be zeroed. But that necessary padding is not accounted for in skb->len.
The fix, however, is buggy. It calls skb_push, which grows skb->len when moving skb->data back. But in this case packet length should not change.
Instead, call skb_reserve, which moves both skb->data and skb->tail back, without changing length.
Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation") Reported-by: Tariq Toukan tariqt@mellanox.com Signed-off-by: Willem de Bruijn willemb@google.com Acked-by: Soheil Hassas Yeganeh soheil@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2920,7 +2920,7 @@ static int packet_snd(struct socket *soc if (unlikely(offset < 0)) goto out_free; } else if (reserve) { - skb_push(skb, reserve); + skb_reserve(skb, -reserve); }
/* Returns -EFAULT on error */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shahed Shaikh shahed.shaikh@cavium.com
[ Upstream commit fdd13dd350dda1826579eb5c333d76b14513b812 ]
ILT entry requires 12 bit right shifted physical address. Existing mask for ILT entry of physical address i.e. ILT_ENTRY_PHY_ADDR_MASK is not sufficient to handle 64bit address because upper 8 bits of 64 bit address were getting masked which resulted in completer abort error on PCIe bus due to invalid address.
Fix that mask to handle 64bit physical address.
Fixes: fe56b9e6a8d9 ("qed: Add module with basic common support") Signed-off-by: Shahed Shaikh shahed.shaikh@cavium.com Signed-off-by: Ariel Elior ariel.elior@cavium.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qed/qed_cxt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c +++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c @@ -77,7 +77,7 @@ #define ILT_CFG_REG(cli, reg) PSWRQ2_REG_ ## cli ## _ ## reg ## _RT_OFFSET
/* ILT entry structure */ -#define ILT_ENTRY_PHY_ADDR_MASK 0x000FFFFFFFFFFFULL +#define ILT_ENTRY_PHY_ADDR_MASK (~0ULL >> 12) #define ILT_ENTRY_PHY_ADDR_SHIFT 0 #define ILT_ENTRY_VALID_MASK 0x1ULL #define ILT_ENTRY_VALID_SHIFT 52
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 1d88ba1ebb2763aa86172cd7ca05dedbeccc0d35 ]
syzbot reported a rcu_sched self-detected stall on CPU which is caused by too small value set on rto_min with SCTP_RTOINFO sockopt. With this value, hb_timer will get stuck there, as in its timer handler it starts this timer again with this value, then goes to the timer handler again.
This problem is there since very beginning, and thanks to Eric for the reproducer shared from a syzbot mail.
This patch fixes it by not allowing sctp_transport_timeout to return a smaller value than HZ/5 for hb_timer, which is based on TCP's min rto.
Note that it doesn't fix this issue by limiting rto_min, as some users are still using small rto and no proper value was found for it yet.
Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com Suggested-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/transport.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sctp/transport.c +++ b/net/sctp/transport.c @@ -637,7 +637,7 @@ unsigned long sctp_transport_timeout(str trans->state != SCTP_PF) timeout += trans->hbinterval;
- return timeout; + return max_t(unsigned long, timeout, HZ / 5); }
/* Reset transport variables to their initial values */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 25ea66544bfd1d9df1b7e1502f8717e85fa1e6e6 ]
This code was introduced in 2011 around the same time that we made netdev_features_t a u64 type. These days a u32 is not big enough to hold all the potential features.
Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/team/team.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -1004,7 +1004,8 @@ static void team_port_disable(struct tea static void __team_compute_features(struct team *team) { struct team_port *port; - u32 vlan_features = TEAM_VLAN_FEATURES & NETIF_F_ALL_FOR_ALL; + netdev_features_t vlan_features = TEAM_VLAN_FEATURES & + NETIF_F_ALL_FOR_ALL; netdev_features_t enc_features = TEAM_ENC_FEATURES; unsigned short max_hard_header_len = ETH_HLEN; unsigned int dst_release_flag = IFF_XMIT_DST_RELEASE |
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang jasowang@redhat.com
[ Upstream commit 1b15ad683ab42a203f98b67045b40720e99d0e9a ]
DaeRyong Jeong reports a race between vhost_dev_cleanup() and vhost_process_iotlb_msg():
Thread interleaving: CPU0 (vhost_process_iotlb_msg) CPU1 (vhost_dev_cleanup) (In the case of both VHOST_IOTLB_UPDATE and VHOST_IOTLB_INVALIDATE)
===== ===== vhost_umem_clean(dev->iotlb); if (!dev->iotlb) { ret = -EFAULT; break; } dev->iotlb = NULL;
The reason is we don't synchronize between them, fixing by protecting vhost_process_iotlb_msg() with dev mutex.
Reported-by: DaeRyong Jeong threeearcat@gmail.com Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API") Signed-off-by: Jason Wang jasowang@redhat.com Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vhost/vhost.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -993,6 +993,7 @@ static int vhost_process_iotlb_msg(struc { int ret = 0;
+ mutex_lock(&dev->mutex); vhost_dev_lock_vqs(dev); switch (msg->type) { case VHOST_IOTLB_UPDATE: @@ -1024,6 +1025,8 @@ static int vhost_process_iotlb_msg(struc }
vhost_dev_unlock_vqs(dev); + mutex_unlock(&dev->mutex); + return ret; } ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephen Suryaputra ssuryaextr@gmail.com
[ Upstream commit 2f17becfbea5e9a0529b51da7345783e96e69516 ]
Use the right device to determine if redirect should be sent especially when using vrf. Same as well as when sending the redirect.
Signed-off-by: Stephen Suryaputra ssuryaextr@gmail.com Acked-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_output.c | 3 ++- net/ipv6/ndisc.c | 6 ++++++ 2 files changed, 8 insertions(+), 1 deletion(-)
--- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -506,7 +506,8 @@ int ip6_forward(struct sk_buff *skb) send redirects to source routed frames. We don't send redirects to frames decapsulated from IPsec. */ - if (skb->dev == dst->dev && opt->srcrt == 0 && !skb_sec_path(skb)) { + if (IP6CB(skb)->iif == dst->dev->ifindex && + opt->srcrt == 0 && !skb_sec_path(skb)) { struct in6_addr *target = NULL; struct inet_peer *peer; struct rt6_info *rt; --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -1568,6 +1568,12 @@ void ndisc_send_redirect(struct sk_buff ops_data_buf[NDISC_OPS_REDIRECT_DATA_SPACE], *ops_data = NULL; bool ret;
+ if (netif_is_l3_master(skb->dev)) { + dev = __dev_get_by_index(dev_net(skb->dev), IPCB(skb)->iif); + if (!dev) + return; + } + if (ipv6_get_lladdr(dev, &saddr_buf, IFA_F_TENTATIVE)) { ND_PRINTK(2, warn, "Redirect: no link-local address on %s\n", dev->name);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathieu Xhonneux m.xhonneux@gmail.com
[ Upstream commit bbb40a0b75209734ff9286f3326171638c9f6569 ]
seg6_do_srh_encap and seg6_do_srh_inline can possibly do an out-of-bounds access when adding the SRH to the packet. This no longer happen when expanding the skb not only by the size of the SRH (+ outer IPv6 header), but also by skb->mac_len.
[ 53.793056] BUG: KASAN: use-after-free in seg6_do_srh_encap+0x284/0x620 [ 53.794564] Write of size 14 at addr ffff88011975ecfa by task ping/674
[ 53.796665] CPU: 0 PID: 674 Comm: ping Not tainted 4.17.0-rc3-ARCH+ #90 [ 53.796670] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 [ 53.796673] Call Trace: [ 53.796679] <IRQ> [ 53.796689] dump_stack+0x71/0xab [ 53.796700] print_address_description+0x6a/0x270 [ 53.796707] kasan_report+0x258/0x380 [ 53.796715] ? seg6_do_srh_encap+0x284/0x620 [ 53.796722] memmove+0x34/0x50 [ 53.796730] seg6_do_srh_encap+0x284/0x620 [ 53.796741] ? seg6_do_srh+0x29b/0x360 [ 53.796747] seg6_do_srh+0x29b/0x360 [ 53.796756] seg6_input+0x2e/0x2e0 [ 53.796765] lwtunnel_input+0x93/0xd0 [ 53.796774] ipv6_rcv+0x690/0x920 [ 53.796783] ? ip6_input+0x170/0x170 [ 53.796791] ? eth_gro_receive+0x2d0/0x2d0 [ 53.796800] ? ip6_input+0x170/0x170 [ 53.796809] __netif_receive_skb_core+0xcc0/0x13f0 [ 53.796820] ? netdev_info+0x110/0x110 [ 53.796827] ? napi_complete_done+0xb6/0x170 [ 53.796834] ? e1000_clean+0x6da/0xf70 [ 53.796845] ? process_backlog+0x129/0x2a0 [ 53.796853] process_backlog+0x129/0x2a0 [ 53.796862] net_rx_action+0x211/0x5c0 [ 53.796870] ? napi_complete_done+0x170/0x170 [ 53.796887] ? run_rebalance_domains+0x11f/0x150 [ 53.796891] __do_softirq+0x10e/0x39e [ 53.796894] do_softirq_own_stack+0x2a/0x40 [ 53.796895] </IRQ> [ 53.796898] do_softirq.part.16+0x54/0x60 [ 53.796900] __local_bh_enable_ip+0x5b/0x60 [ 53.796903] ip6_finish_output2+0x416/0x9f0 [ 53.796906] ? ip6_dst_lookup_flow+0x110/0x110 [ 53.796909] ? ip6_sk_dst_lookup_flow+0x390/0x390 [ 53.796911] ? __rcu_read_unlock+0x66/0x80 [ 53.796913] ? ip6_mtu+0x44/0xf0 [ 53.796916] ? ip6_output+0xfc/0x220 [ 53.796918] ip6_output+0xfc/0x220 [ 53.796921] ? ip6_finish_output+0x2b0/0x2b0 [ 53.796923] ? memcpy+0x34/0x50 [ 53.796926] ip6_send_skb+0x43/0xc0 [ 53.796929] rawv6_sendmsg+0x1216/0x1530 [ 53.796932] ? __orc_find+0x6b/0xc0 [ 53.796934] ? rawv6_rcv_skb+0x160/0x160 [ 53.796937] ? __rcu_read_unlock+0x66/0x80 [ 53.796939] ? __rcu_read_unlock+0x66/0x80 [ 53.796942] ? is_bpf_text_address+0x1e/0x30 [ 53.796944] ? kernel_text_address+0xec/0x100 [ 53.796946] ? __kernel_text_address+0xe/0x30 [ 53.796948] ? unwind_get_return_address+0x2f/0x50 [ 53.796950] ? __save_stack_trace+0x92/0x100 [ 53.796954] ? save_stack+0x89/0xb0 [ 53.796956] ? kasan_kmalloc+0xa0/0xd0 [ 53.796958] ? kmem_cache_alloc+0xd2/0x1f0 [ 53.796961] ? prepare_creds+0x23/0x160 [ 53.796963] ? __x64_sys_capset+0x252/0x3e0 [ 53.796966] ? do_syscall_64+0x69/0x160 [ 53.796968] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.796971] ? __alloc_pages_nodemask+0x170/0x380 [ 53.796973] ? __alloc_pages_slowpath+0x12c0/0x12c0 [ 53.796977] ? tty_vhangup+0x20/0x20 [ 53.796979] ? policy_nodemask+0x1a/0x90 [ 53.796982] ? __mod_node_page_state+0x8d/0xa0 [ 53.796986] ? __check_object_size+0xe7/0x240 [ 53.796989] ? __sys_sendto+0x229/0x290 [ 53.796991] ? rawv6_rcv_skb+0x160/0x160 [ 53.796993] __sys_sendto+0x229/0x290 [ 53.796996] ? __ia32_sys_getpeername+0x50/0x50 [ 53.796999] ? commit_creds+0x2de/0x520 [ 53.797002] ? security_capset+0x57/0x70 [ 53.797004] ? __x64_sys_capset+0x29f/0x3e0 [ 53.797007] ? __x64_sys_rt_sigsuspend+0xe0/0xe0 [ 53.797011] ? __do_page_fault+0x664/0x770 [ 53.797014] __x64_sys_sendto+0x74/0x90 [ 53.797017] do_syscall_64+0x69/0x160 [ 53.797019] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.797022] RIP: 0033:0x7f43b7a6714a [ 53.797023] RSP: 002b:00007ffd891bd368 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 53.797026] RAX: ffffffffffffffda RBX: 00000000006129c0 RCX: 00007f43b7a6714a [ 53.797028] RDX: 0000000000000040 RSI: 00000000006129c0 RDI: 0000000000000004 [ 53.797029] RBP: 00007ffd891be640 R08: 0000000000610940 R09: 000000000000001c [ 53.797030] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 [ 53.797032] R13: 000000000060e6a0 R14: 0000000000008004 R15: 000000000040b661
[ 53.797171] Allocated by task 642: [ 53.797460] kasan_kmalloc+0xa0/0xd0 [ 53.797463] kmem_cache_alloc+0xd2/0x1f0 [ 53.797465] getname_flags+0x40/0x210 [ 53.797467] user_path_at_empty+0x1d/0x40 [ 53.797469] do_faccessat+0x12a/0x320 [ 53.797471] do_syscall_64+0x69/0x160 [ 53.797473] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.797607] Freed by task 642: [ 53.797869] __kasan_slab_free+0x130/0x180 [ 53.797871] kmem_cache_free+0xa8/0x230 [ 53.797872] filename_lookup+0x15b/0x230 [ 53.797874] do_faccessat+0x12a/0x320 [ 53.797876] do_syscall_64+0x69/0x160 [ 53.797878] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.798014] The buggy address belongs to the object at ffff88011975e600 which belongs to the cache names_cache of size 4096 [ 53.799043] The buggy address is located 1786 bytes inside of 4096-byte region [ffff88011975e600, ffff88011975f600) [ 53.800013] The buggy address belongs to the page: [ 53.800414] page:ffffea000465d600 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0 [ 53.801259] flags: 0x17fff0000008100(slab|head) [ 53.801640] raw: 017fff0000008100 0000000000000000 0000000000000000 0000000100070007 [ 53.803147] raw: dead000000000100 dead000000000200 ffff88011b185a40 0000000000000000 [ 53.803787] page dumped because: kasan: bad access detected
[ 53.804384] Memory state around the buggy address: [ 53.804788] ffff88011975eb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.805384] ffff88011975ec00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.805979] >ffff88011975ec80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.806577] ^ [ 53.807165] ffff88011975ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.807762] ffff88011975ed80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.808356] ================================================================== [ 53.808949] Disabling lock debugging due to kernel taint
Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: David Lebrun dlebrun@google.com Signed-off-by: Mathieu Xhonneux m.xhonneux@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/seg6_iptunnel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/ipv6/seg6_iptunnel.c +++ b/net/ipv6/seg6_iptunnel.c @@ -103,7 +103,7 @@ int seg6_do_srh_encap(struct sk_buff *sk hdrlen = (osrh->hdrlen + 1) << 3; tot_len = hdrlen + sizeof(*hdr);
- err = skb_cow_head(skb, tot_len); + err = skb_cow_head(skb, tot_len + skb->mac_len); if (unlikely(err)) return err;
@@ -161,7 +161,7 @@ int seg6_do_srh_inline(struct sk_buff *s
hdrlen = (osrh->hdrlen + 1) << 3;
- err = skb_cow_head(skb, hdrlen); + err = skb_cow_head(skb, hdrlen + skb->mac_len); if (unlikely(err)) return err;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Fainelli f.fainelli@gmail.com
[ Upstream commit 733a969a7ed14fc5786bcc59c1bdda83c7ddb46e ]
We are currently doing auxiliary control register reads with the shadow register value 0b111 (0x7) which incidentally is also the selector value that should be present in bits [2:0]. Fix this by using the appropriate selector mask which is defined (MII_BCM54XX_AUXCTL_SHDWSEL_MASK).
This does not have a functional impact yet because we always access the MII_BCM54XX_AUXCTL_SHDWSEL_MISC (0x7) register in the current code. This might change at some point though.
Fixes: 5b4e29005123 ("net: phy: broadcom: add bcm54xx_auxctl_read") Signed-off-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/phy/bcm-phy-lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/phy/bcm-phy-lib.c +++ b/drivers/net/phy/bcm-phy-lib.c @@ -56,7 +56,7 @@ int bcm54xx_auxctl_read(struct phy_devic /* The register must be written to both the Shadow Register Select and * the Shadow Read Register Selector */ - phy_write(phydev, MII_BCM54XX_AUX_CTL, regnum | + phy_write(phydev, MII_BCM54XX_AUX_CTL, MII_BCM54XX_AUXCTL_SHDWSEL_MASK | regnum << MII_BCM54XX_AUXCTL_SHDWSEL_READ_SHIFT); return phy_read(phydev, MII_BCM54XX_AUX_CTL); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Duyck alexander.h.duyck@intel.com
[ Upstream commit 664088f8d68178809b848ca450f2797efb34e8e7 ]
This patch reorders the error cases in showing the XPS configuration so that we hold off on memory allocation until after we have verified that we can support XPS on a given ring.
Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes") Signed-off-by: Alexander Duyck alexander.h.duyck@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/net-sysfs.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -1207,9 +1207,6 @@ static ssize_t xps_cpus_show(struct netd cpumask_var_t mask; unsigned long index;
- if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) - return -ENOMEM; - index = get_netdev_queue_index(queue);
if (dev->num_tc) { @@ -1219,6 +1216,9 @@ static ssize_t xps_cpus_show(struct netd return -EINVAL; }
+ if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) + return -ENOMEM; + rcu_read_lock(); dev_maps = rcu_dereference(dev->xps_maps); if (dev_maps) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang jasowang@redhat.com
[ Upstream commit 5d458a13dd59d04b4d6658a6d5b94d42732b15ae ]
We should not go for the error path after successfully transmitting a XDP buffer after linearizing. Since the error path may try to pop and drop next packet and increase the drop counters. Fixing this by simply drop the refcnt of original page and go for xmit path.
Fixes: 72979a6c3590 ("virtio_net: xdp, add slowpath case for non contiguous buffers") Cc: John Fastabend john.fastabend@gmail.com Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -688,7 +688,7 @@ static struct sk_buff *receive_mergeable trace_xdp_exception(vi->dev, xdp_prog, act); ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len); if (unlikely(xdp_page != page)) - goto err_xdp; + put_page(page); rcu_read_unlock(); goto xdp_xmit; default:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jack Morgenstein jackm@dev.mellanox.co.il
[ Upstream commit d546b67cda015fb92bfee93d5dc0ceadb91deaee ]
spin_lock/unlock was used instead of spin_un/lock_irq in a procedure used in process space, on a spinlock which can be grabbed in an interrupt.
This caused the stack trace below to be displayed (on kernel 4.17.0-rc1 compiled with Lock Debugging enabled):
[ 154.661474] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected [ 154.668909] 4.17.0-rc1-rdma_rc_mlx+ #3 Tainted: G I [ 154.675856] ----------------------------------------------------- [ 154.682706] modprobe/10159 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 154.690254] 00000000f3b0e495 (&(&qp_table->lock)->rlock){+.+.}, at: mlx4_qp_remove+0x20/0x50 [mlx4_core] [ 154.700927] and this task is already holding: [ 154.707461] 0000000094373b5d (&(&cq->lock)->rlock/1){....}, at: destroy_qp_common+0x111/0x560 [mlx4_ib] [ 154.718028] which would create a new lock dependency: [ 154.723705] (&(&cq->lock)->rlock/1){....} -> (&(&qp_table->lock)->rlock){+.+.} [ 154.731922] but this new dependency connects a SOFTIRQ-irq-safe lock: [ 154.740798] (&(&cq->lock)->rlock){..-.} [ 154.740800] ... which became SOFTIRQ-irq-safe at: [ 154.752163] _raw_spin_lock_irqsave+0x3e/0x50 [ 154.757163] mlx4_ib_poll_cq+0x36/0x900 [mlx4_ib] [ 154.762554] ipoib_tx_poll+0x4a/0xf0 [ib_ipoib] ... to a SOFTIRQ-irq-unsafe lock: [ 154.815603] (&(&qp_table->lock)->rlock){+.+.} [ 154.815604] ... which became SOFTIRQ-irq-unsafe at: [ 154.827718] ... [ 154.827720] _raw_spin_lock+0x35/0x50 [ 154.833912] mlx4_qp_lookup+0x1e/0x50 [mlx4_core] [ 154.839302] mlx4_flow_attach+0x3f/0x3d0 [mlx4_core]
Since mlx4_qp_lookup() is called only in process space, we can simply replace the spin_un/lock calls with spin_un/lock_irq calls.
Fixes: 6dc06c08bef1 ("net/mlx4: Fix the check in attaching steering rules") Signed-off-by: Jack Morgenstein jackm@dev.mellanox.co.il Signed-off-by: Tariq Toukan tariqt@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx4/qp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx4/qp.c +++ b/drivers/net/ethernet/mellanox/mlx4/qp.c @@ -393,11 +393,11 @@ struct mlx4_qp *mlx4_qp_lookup(struct ml struct mlx4_qp_table *qp_table = &mlx4_priv(dev)->qp_table; struct mlx4_qp *qp;
- spin_lock(&qp_table->lock); + spin_lock_irq(&qp_table->lock);
qp = __mlx4_qp_lookup(dev, qpn);
- spin_unlock(&qp_table->lock); + spin_unlock_irq(&qp_table->lock); return qp; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp
[ Upstream commit 6547e387d7f52f2ba681a229de3c13e5b9e01ee1 ]
Calling XDP redirection requires bh disabled. Softirq can call another XDP function and redirection functions, then the percpu static variable ri->map can be overwritten to NULL.
This is a generic XDP case called from tun.
[ 3535.736058] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 3535.743974] PGD 0 P4D 0 [ 3535.746530] Oops: 0000 [#1] SMP PTI [ 3535.750049] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel crypto_simd cryptd enclosure hpwdt hpilo glue_helper ipmi_si pcspkr wmi mei_me ioatdma mei ipmi_devintf shpchp dca ipmi_msghandler lpc_ich acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm smartpqi i40e crc32c_intel scsi_transport_sas tg3 i2c_core ptp pps_core [ 3535.813456] CPU: 5 PID: 1630 Comm: vhost-1614 Not tainted 4.17.0-rc4 #2 [ 3535.820127] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017 [ 3535.828732] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30 [ 3535.833740] RSP: 0018:ffffb4bc47bf7c58 EFLAGS: 00010246 [ 3535.839009] RAX: ffff9fdfcfea1c40 RBX: 0000000000000000 RCX: ffff9fdf27fe3100 [ 3535.846205] RDX: ffff9fdfca769200 RSI: 0000000000000000 RDI: 0000000000000000 [ 3535.853402] RBP: ffffb4bc491d9000 R08: 00000000000045ad R09: 0000000000000ec0 [ 3535.860597] R10: 0000000000000001 R11: ffff9fdf26c3ce4e R12: ffff9fdf9e72c000 [ 3535.867794] R13: 0000000000000000 R14: fffffffffffffff2 R15: ffff9fdfc82cdd00 [ 3535.874990] FS: 0000000000000000(0000) GS:ffff9fdfcfe80000(0000) knlGS:0000000000000000 [ 3535.883152] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3535.888948] CR2: 0000000000000018 CR3: 0000000bde724004 CR4: 00000000007626e0 [ 3535.896145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3535.903342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3535.910538] PKRU: 55555554 [ 3535.913267] Call Trace: [ 3535.915736] xdp_do_generic_redirect+0x7a/0x310 [ 3535.920310] do_xdp_generic.part.117+0x285/0x370 [ 3535.924970] tun_get_user+0x5b9/0x1260 [tun] [ 3535.929279] tun_sendmsg+0x52/0x70 [tun] [ 3535.933237] handle_tx+0x2ad/0x5f0 [vhost_net] [ 3535.937721] vhost_worker+0xa5/0x100 [vhost] [ 3535.942030] kthread+0xf5/0x130 [ 3535.945198] ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost] [ 3535.950031] ? kthread_bind+0x10/0x10 [ 3535.953727] ret_from_fork+0x35/0x40 [ 3535.957334] Code: 0e 74 15 83 f8 10 75 05 e9 49 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 29 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 49 a9 b3 ff 31 c0 c3 [ 3535.976387] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffffb4bc47bf7c58 [ 3535.982883] CR2: 0000000000000018 [ 3535.987096] ---[ end trace 383b299dd1430240 ]--- [ 3536.131325] Kernel panic - not syncing: Fatal exception [ 3536.137484] Kernel Offset: 0x26a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 3536.281406] ---[ end Kernel panic - not syncing: Fatal exception ]---
And a kernel with generic case fixed still panics in tun driver XDP redirect, because it disabled only preemption, but not bh.
[ 2055.128746] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 2055.136662] PGD 0 P4D 0 [ 2055.139219] Oops: 0000 [#1] SMP PTI [ 2055.142736] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel ipmi_ssif crypto_simd enclosure cryptd hpwdt glue_helper ioatdma hpilo wmi dca pcspkr ipmi_si acpi_power_meter ipmi_devintf shpchp mei_me ipmi_msghandler mei lpc_ich sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i40e smartpqi tg3 scsi_transport_sas crc32c_intel i2c_core ptp pps_core [ 2055.206142] CPU: 6 PID: 1693 Comm: vhost-1683 Tainted: G W 4.17.0-rc5-fix-tun+ #1 [ 2055.215011] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017 [ 2055.223617] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30 [ 2055.228624] RSP: 0018:ffff998b07607cc0 EFLAGS: 00010246 [ 2055.233892] RAX: ffff8dbd8e235700 RBX: ffff8dbd8ff21c40 RCX: 0000000000000004 [ 2055.241089] RDX: ffff998b097a9000 RSI: 0000000000000000 RDI: 0000000000000000 [ 2055.248286] RBP: 0000000000000000 R08: 00000000000065a8 R09: 0000000000005d80 [ 2055.255483] R10: 0000000000000040 R11: ffff8dbcf0100000 R12: ffff998b097a9000 [ 2055.262681] R13: ffff8dbd8c98c000 R14: 0000000000000000 R15: ffff998b07607d78 [ 2055.269879] FS: 0000000000000000(0000) GS:ffff8dbd8ff00000(0000) knlGS:0000000000000000 [ 2055.278039] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2055.283834] CR2: 0000000000000018 CR3: 0000000c0c8cc005 CR4: 00000000007626e0 [ 2055.291030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2055.298227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2055.305424] PKRU: 55555554 [ 2055.308153] Call Trace: [ 2055.310624] xdp_do_redirect+0x7b/0x380 [ 2055.314499] tun_get_user+0x10fe/0x12a0 [tun] [ 2055.318895] tun_sendmsg+0x52/0x70 [tun] [ 2055.322852] handle_tx+0x2ad/0x5f0 [vhost_net] [ 2055.327337] vhost_worker+0xa5/0x100 [vhost] [ 2055.331646] kthread+0xf5/0x130 [ 2055.334813] ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost] [ 2055.339646] ? kthread_bind+0x10/0x10 [ 2055.343343] ret_from_fork+0x35/0x40 [ 2055.346950] Code: 0e 74 15 83 f8 10 75 05 e9 e9 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 c9 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 e9 a9 b3 ff 31 c0 c3 [ 2055.366004] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffff998b07607cc0 [ 2055.372500] CR2: 0000000000000018 [ 2055.375856] ---[ end trace 2a2dcc5e9e174268 ]--- [ 2055.523626] Kernel panic - not syncing: Fatal exception [ 2055.529796] Kernel Offset: 0x2e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 2055.677539] ---[ end Kernel panic - not syncing: Fatal exception ]---
v2: - Removed preempt_disable/enable since local_bh_disable will prevent preemption as well, feedback from Jason Wang.
Fixes: 761876c857cb ("tap: XDP support") Signed-off-by: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/tun.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
--- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1315,7 +1315,7 @@ static struct sk_buff *tun_build_skb(str else *skb_xdp = 0;
- preempt_disable(); + local_bh_disable(); rcu_read_lock(); xdp_prog = rcu_dereference(tun->xdp_prog); if (xdp_prog && !*skb_xdp) { @@ -1338,7 +1338,7 @@ static struct sk_buff *tun_build_skb(str if (err) goto err_redirect; rcu_read_unlock(); - preempt_enable(); + local_bh_enable(); return NULL; case XDP_TX: xdp_xmit = true; @@ -1360,7 +1360,7 @@ static struct sk_buff *tun_build_skb(str skb = build_skb(buf, buflen); if (!skb) { rcu_read_unlock(); - preempt_enable(); + local_bh_enable(); return ERR_PTR(-ENOMEM); }
@@ -1373,12 +1373,12 @@ static struct sk_buff *tun_build_skb(str skb->dev = tun->dev; generic_xdp_tx(skb, xdp_prog); rcu_read_unlock(); - preempt_enable(); + local_bh_enable(); return NULL; }
rcu_read_unlock(); - preempt_enable(); + local_bh_enable();
return skb;
@@ -1386,7 +1386,7 @@ err_redirect: put_page(alloc_frag->page); err_xdp: rcu_read_unlock(); - preempt_enable(); + local_bh_enable(); this_cpu_inc(tun->pcpu_stats->rx_dropped); return NULL; } @@ -1556,16 +1556,19 @@ static ssize_t tun_get_user(struct tun_s struct bpf_prog *xdp_prog; int ret;
+ local_bh_disable(); rcu_read_lock(); xdp_prog = rcu_dereference(tun->xdp_prog); if (xdp_prog) { ret = do_xdp_generic(xdp_prog, skb); if (ret != XDP_PASS) { rcu_read_unlock(); + local_bh_enable(); return total_len; } } rcu_read_unlock(); + local_bh_enable(); }
rxhash = __skb_get_hash_symmetric(skb);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang jasowang@redhat.com
[ Upstream commit 850e088d5bbb333342fd4def08d0a4035f2b7126 ]
If we successfully linearize the packet, num_buf will be set to zero which may confuse error handling path which assumes num_buf is at least 1 and this can lead the code tries to pop the descriptor of next buffer. Fixing this by checking num_buf against 1 before decreasing.
Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set") Signed-off-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -777,7 +777,7 @@ err_xdp: rcu_read_unlock(); err_skb: put_page(page); - while (--num_buf) { + while (num_buf-- > 1) { buf = virtqueue_get_buf(rq->vq, &len); if (unlikely(!buf)) { pr_debug("%s: rx error: %d buffers missing\n",
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eran Ben Elisha eranbe@mellanox.com
[ Upstream commit 902a545904c71d719ed144234d67df75f31db63b ]
When RXFCS feature is enabled, the HW do not strip the FCS data, however it is not present in the checksum calculated by the HW.
Fix that by manually calculating the FCS checksum and adding it to the SKB checksum field.
Add helper function to find the FCS data for all SKB forms (linear, one fragment or more).
Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag") Signed-off-by: Eran Ben Elisha eranbe@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 42 ++++++++++++++++++++++++ 1 file changed, 42 insertions(+)
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -635,6 +635,45 @@ static inline bool is_first_ethertype_ip return (ethertype == htons(ETH_P_IP) || ethertype == htons(ETH_P_IPV6)); }
+static __be32 mlx5e_get_fcs(struct sk_buff *skb) +{ + int last_frag_sz, bytes_in_prev, nr_frags; + u8 *fcs_p1, *fcs_p2; + skb_frag_t *last_frag; + __be32 fcs_bytes; + + if (!skb_is_nonlinear(skb)) + return *(__be32 *)(skb->data + skb->len - ETH_FCS_LEN); + + nr_frags = skb_shinfo(skb)->nr_frags; + last_frag = &skb_shinfo(skb)->frags[nr_frags - 1]; + last_frag_sz = skb_frag_size(last_frag); + + /* If all FCS data is in last frag */ + if (last_frag_sz >= ETH_FCS_LEN) + return *(__be32 *)(skb_frag_address(last_frag) + + last_frag_sz - ETH_FCS_LEN); + + fcs_p2 = (u8 *)skb_frag_address(last_frag); + bytes_in_prev = ETH_FCS_LEN - last_frag_sz; + + /* Find where the other part of the FCS is - Linear or another frag */ + if (nr_frags == 1) { + fcs_p1 = skb_tail_pointer(skb); + } else { + skb_frag_t *prev_frag = &skb_shinfo(skb)->frags[nr_frags - 2]; + + fcs_p1 = skb_frag_address(prev_frag) + + skb_frag_size(prev_frag); + } + fcs_p1 -= bytes_in_prev; + + memcpy(&fcs_bytes, fcs_p1, bytes_in_prev); + memcpy(((u8 *)&fcs_bytes) + bytes_in_prev, fcs_p2, last_frag_sz); + + return fcs_bytes; +} + static inline void mlx5e_handle_csum(struct net_device *netdev, struct mlx5_cqe64 *cqe, struct mlx5e_rq *rq, @@ -653,6 +692,9 @@ static inline void mlx5e_handle_csum(str if (is_first_ethertype_ip(skb)) { skb->ip_summed = CHECKSUM_COMPLETE; skb->csum = csum_unfold((__force __sum16)cqe->check_sum); + if (unlikely(netdev->features & NETIF_F_RXFCS)) + skb->csum = csum_add(skb->csum, + (__force __wsum)mlx5e_get_fcs(skb)); rq->stats.csum_complete++; return; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang jasowang@redhat.com
[ Upstream commit 3d62b2a0db505bbf9ed0755f254e45d775f9807f ]
We need to drop refcnt to xdp_page if we see a gso packet. Otherwise it will be leaked. Fixing this by moving the check of gso packet above the linearizing logic. While at it, remove useless comment as well.
Cc: John Fastabend john.fastabend@gmail.com Fixes: 72979a6c3590 ("virtio_net: xdp, add slowpath case for non contiguous buffers") Signed-off-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/virtio_net.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -632,6 +632,13 @@ static struct sk_buff *receive_mergeable void *data; u32 act;
+ /* Transient failure which in theory could occur if + * in-flight packets from before XDP was enabled reach + * the receive path after XDP is loaded. + */ + if (unlikely(hdr->hdr.gso_type)) + goto err_xdp; + /* This happens when rx buffer size is underestimated */ if (unlikely(num_buf > 1 || headroom < virtnet_get_headroom(vi))) { @@ -647,14 +654,6 @@ static struct sk_buff *receive_mergeable xdp_page = page; }
- /* Transient failure which in theory could occur if - * in-flight packets from before XDP was enabled reach - * the receive path after XDP is loaded. In practice I - * was not able to create this condition. - */ - if (unlikely(hdr->hdr.gso_type)) - goto err_xdp; - /* Allow consuming headroom but reserve enough space to push * the descriptor on if we get an XDP_TX return code. */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 644c7eebbfd59e72982d11ec6cc7d39af12450ae ]
It seems that rtnl_group_changelink() can call do_setlink while a prior call to validate_linkmsg(dev = NULL, ...) could not validate IFLA_ADDRESS / IFLA_BROADCAST
Make sure do_setlink() calls validate_linkmsg() instead of letting its callers having this responsibility.
With help from Dmitry Vyukov, thanks a lot !
BUG: KMSAN: uninit-value in is_valid_ether_addr include/linux/etherdevice.h:199 [inline] BUG: KMSAN: uninit-value in eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline] BUG: KMSAN: uninit-value in eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308 CPU: 1 PID: 8695 Comm: syz-executor3 Not tainted 4.17.0-rc5+ #103 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084 __msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686 is_valid_ether_addr include/linux/etherdevice.h:199 [inline] eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline] eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308 dev_set_mac_address+0x261/0x530 net/core/dev.c:7157 do_setlink+0xbc3/0x5fc0 net/core/rtnetlink.c:2317 rtnl_group_changelink net/core/rtnetlink.c:2824 [inline] rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x455a09 RSP: 002b:00007fc07480ec68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fc07480f6d4 RCX: 0000000000455a09 RDX: 0000000000000000 RSI: 00000000200003c0 RDI: 0000000000000014 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000005d0 R14: 00000000006fdc20 R15: 0000000000000000
Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_save_stack mm/kmsan/kmsan.c:294 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685 kmsan_memcpy_origins+0x11d/0x170 mm/kmsan/kmsan.c:527 __msan_memcpy+0x109/0x160 mm/kmsan/kmsan_instr.c:478 do_setlink+0xb84/0x5fc0 net/core/rtnetlink.c:2315 rtnl_group_changelink net/core/rtnetlink.c:2824 [inline] rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315 kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322 slab_post_alloc_hook mm/slab.h:446 [inline] slab_alloc_node mm/slub.c:2753 [inline] __kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:988 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline] netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: e7ed828f10bd ("netlink: support setting devgroup parameters") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Dmitry Vyukov dvyukov@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/rtnetlink.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1958,6 +1958,10 @@ static int do_setlink(const struct sk_bu const struct net_device_ops *ops = dev->netdev_ops; int err;
+ err = validate_linkmsg(dev, tb); + if (err < 0) + return err; + if (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD]) { struct net *net = rtnl_link_get_net(dev_net(dev), tb); if (IS_ERR(net)) { @@ -2296,10 +2300,6 @@ static int rtnl_setlink(struct sk_buff * goto errout; }
- err = validate_linkmsg(dev, tb); - if (err < 0) - goto errout; - err = do_setlink(skb, dev, ifm, extack, tb, ifname, 0); errout: return err;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Blakey paulb@mellanox.com
[ Upstream commit 8258d2da9f9f521dce7019e018360c28d116354e ]
When we fail to modify a rule, we incorrectly release the idr handle of the unmodified old rule.
Fix that by checking if we need to release it.
Fixes: fe2502e49b58 ("net_sched: remove cls_flower idr on failure") Reported-by: Vlad Buslov vladbu@mellanox.com Reviewed-by: Roi Dayan roid@mellanox.com Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: Paul Blakey paulb@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/cls_flower.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -1007,7 +1007,7 @@ static int fl_change(struct net *net, st return 0;
errout_idr: - if (fnew->handle) + if (!fold) idr_remove_ext(&head->handle_idr, fnew->handle); errout: tcf_exts_destroy(&fnew->exts);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dexuan Cui decui@microsoft.com
commit c3635da2a336441253c33298b87b3042db100725 upstream.
Before the guest finishes the device initialization, the device can be removed anytime by the host, and after that the host won't respond to the guest's request, so the guest should be prepared to handle this case.
Add a polling mechanism to detect device presence.
Signed-off-by: Dexuan Cui decui@microsoft.com [lorenzo.pieralisi@arm.com: edited commit log] Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Haiyang Zhang haiyangz@microsoft.com Cc: Stephen Hemminger sthemmin@microsoft.com Cc: K. Y. Srinivasan kys@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/host/pci-hyperv.c | 46 +++++++++++++++++++++++++++++++----------- 1 file changed, 34 insertions(+), 12 deletions(-)
--- a/drivers/pci/host/pci-hyperv.c +++ b/drivers/pci/host/pci-hyperv.c @@ -566,6 +566,26 @@ static void put_pcichild(struct hv_pci_d static void get_hvpcibus(struct hv_pcibus_device *hv_pcibus); static void put_hvpcibus(struct hv_pcibus_device *hv_pcibus);
+/* + * There is no good way to get notified from vmbus_onoffer_rescind(), + * so let's use polling here, since this is not a hot path. + */ +static int wait_for_response(struct hv_device *hdev, + struct completion *comp) +{ + while (true) { + if (hdev->channel->rescind) { + dev_warn_once(&hdev->device, "The device is gone.\n"); + return -ENODEV; + } + + if (wait_for_completion_timeout(comp, HZ / 10)) + break; + } + + return 0; +} + /** * devfn_to_wslot() - Convert from Linux PCI slot to Windows * @devfn: The Linux representation of PCI slot @@ -1582,7 +1602,8 @@ static struct hv_pci_dev *new_pcichild_d if (ret) goto error;
- wait_for_completion(&comp_pkt.host_event); + if (wait_for_response(hbus->hdev, &comp_pkt.host_event)) + goto error;
hpdev->desc = *desc; refcount_set(&hpdev->refs, 1); @@ -2075,15 +2096,16 @@ static int hv_pci_protocol_negotiation(s sizeof(struct pci_version_request), (unsigned long)pkt, VM_PKT_DATA_INBAND, VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); + if (!ret) + ret = wait_for_response(hdev, &comp_pkt.host_event); + if (ret) { dev_err(&hdev->device, - "PCI Pass-through VSP failed sending version reqquest: %#x", + "PCI Pass-through VSP failed to request version: %d", ret); goto exit; }
- wait_for_completion(&comp_pkt.host_event); - if (comp_pkt.completion_status >= 0) { pci_protocol_version = pci_protocol_versions[i]; dev_info(&hdev->device, @@ -2292,11 +2314,12 @@ static int hv_pci_enter_d0(struct hv_dev ret = vmbus_sendpacket(hdev->channel, d0_entry, sizeof(*d0_entry), (unsigned long)pkt, VM_PKT_DATA_INBAND, VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); + if (!ret) + ret = wait_for_response(hdev, &comp_pkt.host_event); + if (ret) goto exit;
- wait_for_completion(&comp_pkt.host_event); - if (comp_pkt.completion_status < 0) { dev_err(&hdev->device, "PCI Pass-through VSP failed D0 Entry with status %x\n", @@ -2336,11 +2359,10 @@ static int hv_pci_query_relations(struct
ret = vmbus_sendpacket(hdev->channel, &message, sizeof(message), 0, VM_PKT_DATA_INBAND, 0); - if (ret) - return ret; + if (!ret) + ret = wait_for_response(hdev, &comp);
- wait_for_completion(&comp); - return 0; + return ret; }
/** @@ -2410,11 +2432,11 @@ static int hv_send_resources_allocated(s size_res, (unsigned long)pkt, VM_PKT_DATA_INBAND, VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); + if (!ret) + ret = wait_for_response(hdev, &comp_pkt.host_event); if (ret) break;
- wait_for_completion(&comp_pkt.host_event); - if (comp_pkt.completion_status < 0) { ret = -EPROTO; dev_err(&hdev->device,
On 9 June 2018 at 20:59, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.14.49 release. There are 41 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Mon Jun 11 15:29:07 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.49-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary ------------------------------------------------------------------------
kernel: 4.14.49-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: 8a073119a25f3a91734d482e869c6f8203cd6171 git describe: v4.14.48-42-g8a073119a25f Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.48-42...
No regressions (compared to build v4.14.48-43-gcb411db6da07) ------------------------------------------------------------------------
Ran 11405 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * boot * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-containers-tests * ltp-cve-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
On Sat, Jun 09, 2018 at 05:29:32PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.49 release. There are 41 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Mon Jun 11 15:29:07 UTC 2018. Anything received after that time might be too late.
Build results: total: 148 pass: 148 fail: 0 Qemu test results: total: 146 pass: 146 fail: 0
Details are available at http://kerneltests.org/builders/.
Guenter
On 06/09/2018 09:29 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.49 release. There are 41 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Mon Jun 11 15:29:07 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.49-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
linux-stable-mirror@lists.linaro.org