This is the start of the stable review cycle for the 4.9.79 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Jan 31 12:38:12 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.79-rc1.gz or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.9.79-rc1
Daniel Borkmann daniel@iogearbox.net bpf: reject stores into ctx via st and xadd
Alexei Starovoitov ast@kernel.org bpf: fix 32-bit divide by zero
Eric Dumazet edumazet@google.com bpf: fix divides by zero
Daniel Borkmann daniel@iogearbox.net bpf: avoid false sharing of map refcount with max_entries
Daniel Borkmann daniel@iogearbox.net bpf: arsh is not supported in 32 bit alu thus reject it
Alexei Starovoitov ast@kernel.org bpf: introduce BPF_JIT_ALWAYS_ON config
Alexei Starovoitov ast@fb.com bpf: fix bpf_tail_call() x64 JIT
Eric Dumazet edumazet@google.com x86: bpf_jit: small optimization in emit_bpf_tail_call()
Thomas Gleixner tglx@linutronix.de hrtimer: Reset hrtimer cpu base proper on CPU hotplug
Jia Zhang zhang.jia@linux.alibaba.com x86/microcode/intel: Extend BDW late-loading further with LLC size check
Xiao Liang xiliang@redhat.com perf/x86/amd/power: Do not load AMD power module on !AMD platforms
Eric Dumazet edumazet@google.com flow_dissector: properly cap thoff field
Cong Wang xiyou.wangcong@gmail.com tun: fix a memory leak for tfile->tx_array
Yuval Mintz yuvalm@mellanox.com mlxsw: spectrum_router: Don't log an error on missing neighbor
Willem de Bruijn willemb@google.com gso: validate gso_type in GSO handlers
Alexey Kodanev alexey.kodanev@oracle.com ip6_gre: init dev->mtu and dev->hard_header_len correctly
Ivan Vecera cera@cera.cz be2net: restore properly promisc mode after queues reconfiguration
Guillaume Nault g.nault@alphalink.fr ppp: unlock all_ppp_mutex before registering device
Jim Westfall jwestfall@surrealistic.net ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
Jim Westfall jwestfall@surrealistic.net net: Allow neigh contructor functions ability to modify the primary_key
Neil Horman nhorman@tuxdriver.com vmxnet3: repair memory leak
Cong Wang xiyou.wangcong@gmail.com tipc: fix a memory leak in tipc_nl_node_get_link()
Xin Long lucien.xin@gmail.com sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
Xin Long lucien.xin@gmail.com sctp: do not allow the v4 socket to bind a v4mapped v6 address
Francois Romieu romieu@fr.zoreil.com r8169: fix memory corruption on retrieval of hardware statistics.
Guillaume Nault g.nault@alphalink.fr pppoe: take ->needed_headroom of lower device into account on xmit
Dan Streetman ddstreet@ieee.org net: tcp: close sock if net namespace is exiting
Eric Dumazet edumazet@google.com net: qdisc_pkt_len_init() should be more robust
Felix Fietkau nbd@nbd.name net: igmp: fix source address check for IGMPv3 reports
Yuiko Oshino yuiko.oshino@microchip.com lan78xx: Fix failure in USB Full Speed
Eric Dumazet edumazet@google.com ipv6: ip6_make_skb() needs to clear cork.base.dst
Mike Maloney maloney@google.com ipv6: fix udpv6 sendmsg crash caused by too small MTU
Ben Hutchings ben.hutchings@codethink.co.uk ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
Alexey Kodanev alexey.kodanev@oracle.com dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
Greg KH gregkh@linuxfoundation.org eventpoll.h: add missing epoll event masks
Ben Hutchings ben.hutchings@codethink.co.uk vsyscall: Fix permissions for emulate mode with KAISER/PTI
Thomas Meyer thomas@m3y3r.de um: link vmlinux with -no-pie
Martin Brandenburg martin@omnibond.com orangefs: fix deadlock; do not write i_size in read_iter
Aaron Ma aaron.ma@canonical.com Input: trackpoint - force 3 buttons if 0 button is reported
Johannes Weiner hannes@cmpxchg.org mm: fix 100% CPU kswapd busyloop on unreclaimable nodes
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "module: Add retpoline tag to VERMAGIC"
Johannes Thumshirn jthumshirn@suse.de scsi: libiscsi: fix shifting of DID_REQUEUE host byte
Jiri Slaby jslaby@suse.cz fs/fcntl: f_setown, avoid undefined behaviour
Jeff Mahoney jeffm@suse.com reiserfs: don't preallocate blocks for extended attributes
Jeff Mahoney jeffm@suse.com reiserfs: fix race in prealloc discard
Kevin Cernekee cernekee@chromium.org netfilter: xt_osf: Add missing permission checks
Kevin Cernekee cernekee@chromium.org netfilter: nfnetlink_cthelper: Add missing permission checks
Seunghun Han kkamagui@gmail.com ACPICA: Namespace: fix operand cache leak
Rafael J. Wysocki rafael.j.wysocki@intel.com ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
Jiri Slaby jslaby@suse.cz ipc: msg, make msgrcv work with LONG_MIN
Vlastimil Babka vbabka@suse.cz mm, page_alloc: fix potential false positive in __zone_watermark_ok
Doug Berger opendmb@gmail.com cma: fix calculation of aligned offset
Michal Hocko mhocko@suse.com hwpoison, memcg: forcibly uncharge LRU pages
Michal Hocko mhocko@suse.com mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
Sudeep Holla sudeep.holla@arm.com drivers: base: cacheinfo: fix boot error message when acpi is enabled
Sudeep Holla sudeep.holla@arm.com drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
Janakarajan Natarajan Janakarajan.Natarajan@amd.com Prevent timer value 0 for MWAITX
Punit Agrawal punit.agrawal@arm.com KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2
Marc Kleine-Budde mkl@pengutronix.de can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
Marc Kleine-Budde mkl@pengutronix.de can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
Jonathan Dieter jdieter@lesbg.com usbip: Fix potential format overflow in userspace tools
Jonathan Dieter jdieter@lesbg.com usbip: Fix implicit fallthrough warning
Shuah Khan shuah@kernel.org usbip: prevent vhci_hcd driver from leaking a socket pointer address
Martin Brandenburg martin@omnibond.com orangefs: initialize op on loop restart in orangefs_devreq_read
Martin Brandenburg martin@omnibond.com orangefs: use list_for_each_entry_safe in purge_waiting_ops
Andy Lutomirski luto@kernel.org x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
-------------
Diffstat:
Makefile | 4 +- arch/arm/kvm/mmu.c | 2 +- arch/um/Makefile | 2 +- arch/x86/entry/vsyscall/vsyscall_64.c | 7 ++-- arch/x86/events/amd/power.c | 2 +- arch/x86/include/asm/processor.h | 2 +- arch/x86/include/asm/vsyscall.h | 1 + arch/x86/kernel/cpu/intel_cacheinfo.c | 2 + arch/x86/kernel/cpu/microcode/intel.c | 20 ++++++++- arch/x86/lib/delay.c | 7 ++++ arch/x86/mm/kaiser.c | 2 +- arch/x86/net/bpf_jit_comp.c | 13 +++--- drivers/acpi/acpica/nsutils.c | 23 +++++------ drivers/acpi/glue.c | 12 +++--- drivers/base/cacheinfo.c | 15 +++++-- drivers/input/mouse/trackpoint.c | 3 ++ drivers/net/ethernet/emulex/benet/be_main.c | 9 +++++ .../net/ethernet/mellanox/mlxsw/spectrum_router.c | 5 +-- drivers/net/ethernet/realtek/r8169.c | 9 +---- drivers/net/ppp/ppp_generic.c | 5 ++- drivers/net/ppp/pppoe.c | 11 ++--- drivers/net/tun.c | 15 ++++++- drivers/net/usb/lan78xx.c | 1 + drivers/net/vmxnet3/vmxnet3_drv.c | 2 +- drivers/scsi/libiscsi.c | 2 +- drivers/usb/usbip/usbip_common.h | 1 + drivers/usb/usbip/vhci_sysfs.c | 25 +++++++----- fs/fcntl.c | 4 ++ fs/orangefs/devorangefs-req.c | 3 +- fs/orangefs/file.c | 7 +--- fs/orangefs/orangefs-kernel.h | 11 ----- fs/orangefs/waitqueue.c | 4 +- fs/reiserfs/bitmap.c | 14 +++++-- include/linux/bpf.h | 16 ++++++-- include/linux/cacheinfo.h | 1 + include/linux/mmzone.h | 2 + include/linux/vermagic.h | 8 +--- include/net/arp.h | 3 ++ include/net/ipv6.h | 1 + include/net/net_namespace.h | 10 +++++ include/uapi/linux/eventpoll.h | 13 ++++++ init/Kconfig | 7 ++++ ipc/msg.c | 5 ++- kernel/bpf/core.c | 24 +++++++++-- kernel/bpf/verifier.c | 42 +++++++++++++++++++ kernel/time/hrtimer.c | 3 ++ lib/test_bpf.c | 11 +++-- mm/cma.c | 15 +++---- mm/internal.h | 6 +++ mm/memcontrol.c | 2 +- mm/memory-failure.c | 7 ++++ mm/mmap.c | 6 ++- mm/page_alloc.c | 15 +++---- mm/vmscan.c | 47 +++++++++++++++------- mm/vmstat.c | 2 +- net/can/af_can.c | 22 +++++----- net/core/dev.c | 19 +++++++-- net/core/filter.c | 10 +++-- net/core/flow_dissector.c | 3 +- net/core/neighbour.c | 4 +- net/core/sysctl_net_core.c | 6 +++ net/dccp/ccids/ccid2.c | 3 ++ net/ipv4/arp.c | 7 +++- net/ipv4/igmp.c | 2 +- net/ipv4/tcp.c | 3 ++ net/ipv4/tcp_offload.c | 3 ++ net/ipv4/tcp_timer.c | 15 +++++++ net/ipv4/udp_offload.c | 3 ++ net/ipv6/ip6_gre.c | 14 +++---- net/ipv6/ip6_output.c | 9 +++-- net/ipv6/ipv6_sockglue.c | 2 +- net/ipv6/tcpv6_offload.c | 3 ++ net/ipv6/udp_offload.c | 3 ++ net/netfilter/nfnetlink_cthelper.c | 10 +++++ net/netfilter/xt_osf.c | 7 ++++ net/sctp/offload.c | 3 ++ net/sctp/socket.c | 30 ++++++-------- net/socket.c | 9 +++++ net/tipc/node.c | 26 ++++++------ tools/usb/usbip/libsrc/usbip_common.c | 9 ++++- tools/usb/usbip/libsrc/usbip_host_common.c | 28 ++++++++++--- tools/usb/usbip/libsrc/vhci_driver.c | 8 ++-- tools/usb/usbip/src/usbip.c | 2 + 83 files changed, 523 insertions(+), 231 deletions(-)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 1c52d859cb2d417e7216d3e56bb7fea88444cec9 upstream.
We support various non-Intel CPUs that don't have the CPUID instruction, so the M486 test was wrong. For now, fix it with a big hammer: handle missing CPUID on all 32-bit CPUs.
Reported-by: One Thousand Gnomes gnomes@lxorguk.ukuu.org.uk Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Juergen Gross jgross@suse.com Cc: Peter Zijlstra peterz@infradead.org Cc: Brian Gerst brgerst@gmail.com Cc: Matthew Whitehead tedheadster@gmail.com Cc: Borislav Petkov bp@alien8.de Cc: Henrique de Moraes Holschuh hmh@hmh.eng.br Cc: Andrew Cooper andrew.cooper3@citrix.com Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: xen-devel Xen-devel@lists.xen.org Link: http://lkml.kernel.org/r/685bd083a7c036f7769510b6846315b17d6ba71f.1481307769... Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: "Zhang, Ning A" ning.a.zhang@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/processor.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -596,7 +596,7 @@ static inline void sync_core(void) { int tmp;
-#ifdef CONFIG_M486 +#ifdef CONFIG_X86_32 /* * Do a CPUID if available, otherwise do a jump. The jump * can conveniently enough be the jump around CPUID.
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Brandenburg martin@omnibond.com
commit 0afc0decf247f65b7aba666a76a0a68adf4bc435 upstream.
set_op_state_purged can delete the op.
Signed-off-by: Martin Brandenburg martin@omnibond.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/orangefs/waitqueue.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/orangefs/waitqueue.c +++ b/fs/orangefs/waitqueue.c @@ -28,10 +28,10 @@ static void orangefs_clean_up_interrupte */ void purge_waiting_ops(void) { - struct orangefs_kernel_op_s *op; + struct orangefs_kernel_op_s *op, *tmp;
spin_lock(&orangefs_request_list_lock); - list_for_each_entry(op, &orangefs_request_list, list) { + list_for_each_entry_safe(op, tmp, &orangefs_request_list, list) { gossip_debug(GOSSIP_WAIT_DEBUG, "pvfs2-client-core: purging op tag %llu %s\n", llu(op->tag),
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Brandenburg martin@omnibond.com
commit a0ec1ded22e6a6bc41981fae22406835b006a66e upstream.
In orangefs_devreq_read, there is a loop which picks an op off the list of pending ops. If the loop fails to find an op, there is nothing to read, and it returns EAGAIN. If the op has been given up on, the loop is restarted via a goto. The bug is that the variable which the found op is written to is not reinitialized, so if there are no more eligible ops on the list, the code runs again on the already handled op.
This is triggered by interrupting a process while the op is being copied to the client-core. It's a fairly small window, but it's there.
Signed-off-by: Martin Brandenburg martin@omnibond.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/orangefs/devorangefs-req.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/orangefs/devorangefs-req.c +++ b/fs/orangefs/devorangefs-req.c @@ -161,7 +161,7 @@ static ssize_t orangefs_devreq_read(stru struct orangefs_kernel_op_s *op, *temp; __s32 proto_ver = ORANGEFS_KERNEL_PROTO_VERSION; static __s32 magic = ORANGEFS_DEVREQ_MAGIC; - struct orangefs_kernel_op_s *cur_op = NULL; + struct orangefs_kernel_op_s *cur_op; unsigned long ret;
/* We do not support blocking IO. */ @@ -181,6 +181,7 @@ static ssize_t orangefs_devreq_read(stru }
restart: + cur_op = NULL; /* Get next op (if any) from top of list. */ spin_lock(&orangefs_request_list_lock); list_for_each_entry_safe(op, temp, &orangefs_request_list, list) {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan shuahkh@osg.samsung.com
commit 2f2d0088eb93db5c649d2a5e34a3800a8a935fc5 upstream.
When a client has a USB device attached over IP, the vhci_hcd driver is locally leaking a socket pointer address via the
/sys/devices/platform/vhci_hcd/status file (world-readable) and in debug output when "usbip --debug port" is run.
Fix it to not leak. The socket pointer address is not used at the moment and it was made visible as a convenient way to find IP address from socket pointer address by looking up /proc/net/{tcp,tcp6}.
As this opens a security hole, the fix replaces socket pointer address with sockfd.
Reported-by: Secunia Research vuln@secunia.com Signed-off-by: Shuah Khan shuahkh@osg.samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/usbip/usbip_common.h | 1 + drivers/usb/usbip/vhci_sysfs.c | 25 +++++++++++++++---------- tools/usb/usbip/libsrc/vhci_driver.c | 8 ++++---- 3 files changed, 20 insertions(+), 14 deletions(-)
--- a/drivers/usb/usbip/usbip_common.h +++ b/drivers/usb/usbip/usbip_common.h @@ -271,6 +271,7 @@ struct usbip_device { /* lock for status */ spinlock_t lock;
+ int sockfd; struct socket *tcp_socket;
struct task_struct *tcp_rx; --- a/drivers/usb/usbip/vhci_sysfs.c +++ b/drivers/usb/usbip/vhci_sysfs.c @@ -49,13 +49,17 @@ static ssize_t status_show_vhci(int pdev
/* * output example: - * port sta spd dev socket local_busid - * 0000 004 000 00000000 c5a7bb80 1-2.3 - * 0001 004 000 00000000 d8cee980 2-3.4 + * port sta spd dev sockfd local_busid + * 0000 004 000 00000000 000003 1-2.3 + * 0001 004 000 00000000 000004 2-3.4 * - * IP address can be retrieved from a socket pointer address by looking - * up /proc/net/{tcp,tcp6}. Also, a userland program may remember a - * port number and its peer IP address. + * Output includes socket fd instead of socket pointer address to + * avoid leaking kernel memory address in: + * /sys/devices/platform/vhci_hcd.0/status and in debug output. + * The socket pointer address is not used at the moment and it was + * made visible as a convenient way to find IP address from socket + * pointer address by looking up /proc/net/{tcp,tcp6}. As this opens + * a security hole, the change is made to use sockfd instead. */ for (i = 0; i < VHCI_HC_PORTS; i++) { struct vhci_device *vdev = &vhci->vdev[i]; @@ -68,13 +72,13 @@ static ssize_t status_show_vhci(int pdev if (vdev->ud.status == VDEV_ST_USED) { out += sprintf(out, "%03u %08x ", vdev->speed, vdev->devid); - out += sprintf(out, "%16p %s", - vdev->ud.tcp_socket, + out += sprintf(out, "%06u %s", + vdev->ud.sockfd, dev_name(&vdev->udev->dev));
} else { out += sprintf(out, "000 00000000 "); - out += sprintf(out, "0000000000000000 0-0"); + out += sprintf(out, "000000 0-0"); }
out += sprintf(out, "\n"); @@ -125,7 +129,7 @@ static ssize_t status_show(struct device int pdev_nr;
out += sprintf(out, - "port sta spd dev socket local_busid\n"); + "port sta spd dev sockfd local_busid\n");
pdev_nr = status_name_to_id(attr->attr.name); if (pdev_nr < 0) @@ -324,6 +328,7 @@ static ssize_t store_attach(struct devic
vdev->devid = devid; vdev->speed = speed; + vdev->ud.sockfd = sockfd; vdev->ud.tcp_socket = socket; vdev->ud.status = VDEV_ST_NOTASSIGNED;
--- a/tools/usb/usbip/libsrc/vhci_driver.c +++ b/tools/usb/usbip/libsrc/vhci_driver.c @@ -55,12 +55,12 @@ static int parse_status(const char *valu
while (*c != '\0') { int port, status, speed, devid; - unsigned long socket; + int sockfd; char lbusid[SYSFS_BUS_ID_SIZE];
- ret = sscanf(c, "%d %d %d %x %lx %31s\n", + ret = sscanf(c, "%d %d %d %x %u %31s\n", &port, &status, &speed, - &devid, &socket, lbusid); + &devid, &sockfd, lbusid);
if (ret < 5) { dbg("sscanf failed: %d", ret); @@ -69,7 +69,7 @@ static int parse_status(const char *valu
dbg("port %d status %d speed %d devid %x", port, status, speed, devid); - dbg("socket %lx lbusid %s", socket, lbusid); + dbg("sockfd %u lbusid %s", sockfd, lbusid);
/* if a device is connected, look at it */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonathan Dieter jdieter@lesbg.com
commit cfd6ed4537a9e938fa76facecd4b9cd65b6d1563 upstream.
GCC 7 now warns when switch statements fall through implicitly, and with -Werror enabled in configure.ac, that makes these tools unbuildable.
We fix this by notifying the compiler that this particular case statement is meant to fall through.
Reviewed-by: Peter Senna Tschudin peter.senna@gmail.com Signed-off-by: Jonathan Dieter jdieter@lesbg.com Signed-off-by: Shuah Khan shuahkh@osg.samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/usb/usbip/src/usbip.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/tools/usb/usbip/src/usbip.c +++ b/tools/usb/usbip/src/usbip.c @@ -176,6 +176,8 @@ int main(int argc, char *argv[]) break; case '?': printf("usbip: invalid option\n"); + /* Terminate after printing error */ + /* FALLTHRU */ default: usbip_usage(); goto out;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonathan Dieter jdieter@lesbg.com
commit e5dfa3f902b9a642ae8c6997d57d7c41e384a90b upstream.
The usbip userspace tools call sprintf()/snprintf() and don't check for the return value which can lead the paths to overflow, truncating the final file in the path.
More urgently, GCC 7 now warns that these aren't checked with -Wformat-overflow, and with -Werror enabled in configure.ac, that makes these tools unbuildable.
This patch fixes these problems by replacing sprintf() with snprintf() in one place and adding checks for the return value of snprintf().
Reviewed-by: Peter Senna Tschudin peter.senna@gmail.com Signed-off-by: Jonathan Dieter jdieter@lesbg.com Acked-by: Shuah Khan shuahkh@osg.samsung.com Signed-off-by: Shuah Khan shuahkh@osg.samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/usb/usbip/libsrc/usbip_common.c | 9 ++++++++- tools/usb/usbip/libsrc/usbip_host_common.c | 28 +++++++++++++++++++++++----- 2 files changed, 31 insertions(+), 6 deletions(-)
--- a/tools/usb/usbip/libsrc/usbip_common.c +++ b/tools/usb/usbip/libsrc/usbip_common.c @@ -215,9 +215,16 @@ int read_usb_interface(struct usbip_usb_ struct usbip_usb_interface *uinf) { char busid[SYSFS_BUS_ID_SIZE]; + int size; struct udev_device *sif;
- sprintf(busid, "%s:%d.%d", udev->busid, udev->bConfigurationValue, i); + size = snprintf(busid, sizeof(busid), "%s:%d.%d", + udev->busid, udev->bConfigurationValue, i); + if (size < 0 || (unsigned int)size >= sizeof(busid)) { + err("busid length %i >= %lu or < 0", size, + (long unsigned)sizeof(busid)); + return -1; + }
sif = udev_device_new_from_subsystem_sysname(udev_context, "usb", busid); if (!sif) { --- a/tools/usb/usbip/libsrc/usbip_host_common.c +++ b/tools/usb/usbip/libsrc/usbip_host_common.c @@ -40,13 +40,20 @@ struct udev *udev_context; static int32_t read_attr_usbip_status(struct usbip_usb_device *udev) { char status_attr_path[SYSFS_PATH_MAX]; + int size; int fd; int length; char status; int value = 0;
- snprintf(status_attr_path, SYSFS_PATH_MAX, "%s/usbip_status", - udev->path); + size = snprintf(status_attr_path, sizeof(status_attr_path), + "%s/usbip_status", udev->path); + if (size < 0 || (unsigned int)size >= sizeof(status_attr_path)) { + err("usbip_status path length %i >= %lu or < 0", size, + (long unsigned)sizeof(status_attr_path)); + return -1; + } +
fd = open(status_attr_path, O_RDONLY); if (fd < 0) { @@ -218,6 +225,7 @@ int usbip_export_device(struct usbip_exp { char attr_name[] = "usbip_sockfd"; char sockfd_attr_path[SYSFS_PATH_MAX]; + int size; char sockfd_buff[30]; int ret;
@@ -237,10 +245,20 @@ int usbip_export_device(struct usbip_exp }
/* only the first interface is true */ - snprintf(sockfd_attr_path, sizeof(sockfd_attr_path), "%s/%s", - edev->udev.path, attr_name); + size = snprintf(sockfd_attr_path, sizeof(sockfd_attr_path), "%s/%s", + edev->udev.path, attr_name); + if (size < 0 || (unsigned int)size >= sizeof(sockfd_attr_path)) { + err("exported device path length %i >= %lu or < 0", size, + (long unsigned)sizeof(sockfd_attr_path)); + return -1; + }
- snprintf(sockfd_buff, sizeof(sockfd_buff), "%d\n", sockfd); + size = snprintf(sockfd_buff, sizeof(sockfd_buff), "%d\n", sockfd); + if (size < 0 || (unsigned int)size >= sizeof(sockfd_buff)) { + err("socket length %i >= %lu or < 0", size, + (long unsigned)sizeof(sockfd_buff)); + return -1; + }
ret = write_sysfs_attribute(sockfd_attr_path, sockfd_buff, strlen(sockfd_buff));
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Kleine-Budde mkl@pengutronix.de
commit 8cb68751c115d176ec851ca56ecfbb411568c9e8 upstream.
If an invalid CAN frame is received, from a driver or from a tun interface, a Kernel warning is generated.
This patch replaces the WARN_ONCE by a simple pr_warn_once, so that a kernel, bootet with panic_on_warn, does not panic. A printk seems to be more appropriate here.
Reported-by: syzbot+4386709c0c1284dca827@syzkaller.appspotmail.com Suggested-by: Dmitry Vyukov dvyukov@google.com Acked-by: Oliver Hartkopp socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Oliver Hartkopp socketcan@hartkopp.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/can/af_can.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/net/can/af_can.c +++ b/net/can/af_can.c @@ -722,13 +722,12 @@ static int can_rcv(struct sk_buff *skb, if (unlikely(!net_eq(dev_net(dev), &init_net))) goto drop;
- if (WARN_ONCE(dev->type != ARPHRD_CAN || - skb->len != CAN_MTU || - cfd->len > CAN_MAX_DLEN, - "PF_CAN: dropped non conform CAN skbuf: " - "dev type %d, len %d, datalen %d\n", - dev->type, skb->len, cfd->len)) + if (unlikely(dev->type != ARPHRD_CAN || skb->len != CAN_MTU || + cfd->len > CAN_MAX_DLEN)) { + pr_warn_once("PF_CAN: dropped non conform CAN skbuf: dev type %d, len %d, datalen %d\n", + dev->type, skb->len, cfd->len); goto drop; + }
can_receive(skb, dev); return NET_RX_SUCCESS;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Kleine-Budde mkl@pengutronix.de
commit d4689846881d160a4d12a514e991a740bcb5d65a upstream.
If an invalid CANFD frame is received, from a driver or from a tun interface, a Kernel warning is generated.
This patch replaces the WARN_ONCE by a simple pr_warn_once, so that a kernel, bootet with panic_on_warn, does not panic. A printk seems to be more appropriate here.
Reported-by: syzbot+e3b775f40babeff6e68b@syzkaller.appspotmail.com Suggested-by: Dmitry Vyukov dvyukov@google.com Acked-by: Oliver Hartkopp socketcan@hartkopp.net Cc: linux-stable stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Oliver Hartkopp socketcan@hartkopp.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/can/af_can.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/net/can/af_can.c +++ b/net/can/af_can.c @@ -745,13 +745,12 @@ static int canfd_rcv(struct sk_buff *skb if (unlikely(!net_eq(dev_net(dev), &init_net))) goto drop;
- if (WARN_ONCE(dev->type != ARPHRD_CAN || - skb->len != CANFD_MTU || - cfd->len > CANFD_MAX_DLEN, - "PF_CAN: dropped non conform CAN FD skbuf: " - "dev type %d, len %d, datalen %d\n", - dev->type, skb->len, cfd->len)) + if (unlikely(dev->type != ARPHRD_CAN || skb->len != CANFD_MTU || + cfd->len > CANFD_MAX_DLEN)) { + pr_warn_once("PF_CAN: dropped non conform CAN FD skbuf: dev type %d, len %d, datalen %d\n", + dev->type, skb->len, cfd->len); goto drop; + }
can_receive(skb, dev); return NET_RX_SUCCESS;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Punit Agrawal punit.agrawal@arm.com
commit c507babf10ead4d5c8cca704539b170752a8ac84 upstream.
KVM only supports PMD hugepages at stage 2 but doesn't actually check that the provided hugepage memory pagesize is PMD_SIZE before populating stage 2 entries.
In cases where the backing hugepage size is smaller than PMD_SIZE (such as when using contiguous hugepages), KVM can end up creating stage 2 mappings that extend beyond the supplied memory.
Fix this by checking for the pagesize of userspace vma before creating PMD hugepage at stage 2.
Fixes: 66b3923a1a0f77a ("arm64: hugetlb: add support for PTE contiguous bit") Signed-off-by: Punit Agrawal punit.agrawal@arm.com Cc: Marc Zyngier marc.zyngier@arm.com Reviewed-by: Christoffer Dall christoffer.dall@linaro.org Signed-off-by: Christoffer Dall christoffer.dall@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/kvm/mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -1284,7 +1284,7 @@ static int user_mem_abort(struct kvm_vcp return -EFAULT; }
- if (is_vm_hugetlb_page(vma) && !logging_active) { + if (vma_kernel_pagesize(vma) && !logging_active) { hugetlb = true; gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; } else {
4.9-stable review patch. If anyone has any objections, please let me know.
I know that my response comes late but I just found out that the version of the patch applied in 4.9-stable, the one in this email, is different than the one applied on 4.14-stable. This is the one applied on 4.14: https://patchwork.kernel.org/patch/10177715/
As you can see, the check against PMD_SIZE which is present in 4.14 is not present in this version of the patch.
Am I missing something or should I send a patch to rectify this?
Thanks, Ioana Ciornei
From: Punit Agrawal punit.agrawal@arm.com
commit c507babf10ead4d5c8cca704539b170752a8ac84 upstream.
KVM only supports PMD hugepages at stage 2 but doesn't actually check that the provided hugepage memory pagesize is PMD_SIZE before populating stage 2 entries.
In cases where the backing hugepage size is smaller than PMD_SIZE (such as when using contiguous hugepages), KVM can end up creating stage 2 mappings that extend beyond the supplied memory.
Fix this by checking for the pagesize of userspace vma before creating PMD hugepage at stage 2.
Fixes: 66b3923a1a0f77a ("arm64: hugetlb: add support for PTE contiguous bit") Signed-off-by: Punit Agrawal punit.agrawal@arm.com Cc: Marc Zyngier marc.zyngier@arm.com Reviewed-by: Christoffer Dall christoffer.dall@linaro.org Signed-off-by: Christoffer Dall christoffer.dall@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
arch/arm/kvm/mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -1284,7 +1284,7 @@ static int user_mem_abort(struct kvm_vcp return -EFAULT; }
- if (is_vm_hugetlb_page(vma) && !logging_active) {
- if (vma_kernel_pagesize(vma) && !logging_active) { hugetlb = true; gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; } else {
Hi Ioana,
Ioana Ciornei ioana.ciornei@nxp.com writes:
4.9-stable review patch. If anyone has any objections, please let me know.
I know that my response comes late but I just found out that the version of the patch applied in 4.9-stable, the one in this email, is different than the one applied on 4.14-stable. This is the one applied on 4.14: https://patchwork.kernel.org/patch/10177715/
As you can see, the check against PMD_SIZE which is present in 4.14 is not present in this version of the patch.
Am I missing something or should I send a patch to rectify this?
You're right. The check against PMD_SIZE was lost during the rebase to v4.9. I should've noticed this before it got sent.
Please send a patch to fix this. Alternately, I can fix this with your reported by.
Thanks, Punit
Thanks, Ioana Ciornei
From: Punit Agrawal punit.agrawal@arm.com
commit c507babf10ead4d5c8cca704539b170752a8ac84 upstream.
KVM only supports PMD hugepages at stage 2 but doesn't actually check that the provided hugepage memory pagesize is PMD_SIZE before populating stage 2 entries.
In cases where the backing hugepage size is smaller than PMD_SIZE (such as when using contiguous hugepages), KVM can end up creating stage 2 mappings that extend beyond the supplied memory.
Fix this by checking for the pagesize of userspace vma before creating PMD hugepage at stage 2.
Fixes: 66b3923a1a0f77a ("arm64: hugetlb: add support for PTE contiguous bit") Signed-off-by: Punit Agrawal punit.agrawal@arm.com Cc: Marc Zyngier marc.zyngier@arm.com Reviewed-by: Christoffer Dall christoffer.dall@linaro.org Signed-off-by: Christoffer Dall christoffer.dall@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
arch/arm/kvm/mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -1284,7 +1284,7 @@ static int user_mem_abort(struct kvm_vcp return -EFAULT; }
- if (is_vm_hugetlb_page(vma) && !logging_active) {
- if (vma_kernel_pagesize(vma) && !logging_active) { hugetlb = true; gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; } else {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Janakarajan Natarajan Janakarajan.Natarajan@amd.com
commit 88d879d29f9cc0de2d930b584285638cdada6625 upstream.
Newer hardware has uncovered a bug in the software implementation of using MWAITX for the delay function. A value of 0 for the timer is meant to indicate that a timeout will not be used to exit MWAITX. On newer hardware this can result in MWAITX never returning, resulting in NMI soft lockup messages being printed. On older hardware, some of the other conditions under which MWAITX can exit masked this issue. The AMD APM does not currently document this and will be updated.
Please refer to http://marc.info/?l=kvm&m=148950623231140 for information regarding NMI soft lockup messages on an AMD Ryzen 1800X. This has been root-caused as a 0 passed to MWAITX causing it to wait indefinitely.
This change has the added benefit of avoiding the unnecessary setup of MONITORX/MWAITX when the delay value is zero.
Signed-off-by: Janakarajan Natarajan Janakarajan.Natarajan@amd.com Link: http://lkml.kernel.org/r/1493156643-29366-1-git-send-email-Janakarajan.Natar... Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Davidlohr Bueso dbueso@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/lib/delay.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/arch/x86/lib/delay.c +++ b/arch/x86/lib/delay.c @@ -93,6 +93,13 @@ static void delay_mwaitx(unsigned long _ { u64 start, end, delay, loops = __loops;
+ /* + * Timer value of 0 causes MWAITX to wait indefinitely, unless there + * is a store on the memory monitored by MONITORX. + */ + if (loops == 0) + return; + start = rdtsc_ordered();
for (;;) {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudeep Holla sudeep.holla@arm.com
commit fac51482577d5e05bbb0efa8d602a3c2111098bf upstream.
With CONFIG_OF enabled on x86, we get the following error on boot: " Failed to find cpu0 device node Unable to detect cache hierarchy from DT for CPU 0 " and the cacheinfo fails to get populated in the corresponding sysfs entries. This is because cache_setup_of_node looks for of_node for setting up the shared cpu_map without checking that it's already populated in the architecture specific callback.
In order to indicate that the shared cpu_map is already populated, this patch introduces a boolean `cpu_map_populated` in struct cpu_cacheinfo that can be used by the generic code to skip cache_shared_cpu_map_setup.
This patch also sets that boolean for x86.
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Mian Yousaf Kaukab yousaf.kaukab@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/intel_cacheinfo.c | 2 ++ drivers/base/cacheinfo.c | 3 +++ include/linux/cacheinfo.h | 1 + 3 files changed, 6 insertions(+)
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c +++ b/arch/x86/kernel/cpu/intel_cacheinfo.c @@ -934,6 +934,8 @@ static int __populate_cache_leaves(unsig ci_leaf_init(this_leaf++, &id4_regs); __cache_cpumap_setup(cpu, idx, &id4_regs); } + this_cpu_ci->cpu_map_populated = true; + return 0; }
--- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -106,6 +106,9 @@ static int cache_shared_cpu_map_setup(un unsigned int index; int ret;
+ if (this_cpu_ci->cpu_map_populated) + return 0; + ret = cache_setup_of_node(cpu); if (ret) return ret; --- a/include/linux/cacheinfo.h +++ b/include/linux/cacheinfo.h @@ -71,6 +71,7 @@ struct cpu_cacheinfo { struct cacheinfo *info_list; unsigned int num_levels; unsigned int num_leaves; + bool cpu_map_populated; };
/*
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudeep Holla sudeep.holla@arm.com
commit 55877ef45fbd7f975d078426866b7d1a2435dcc3 upstream.
ARM64 enables both CONFIG_OF and CONFIG_ACPI and the firmware can pass both ACPI tables and the device tree. Based on the kernel parameter, one of the two will be chosen. If acpi is enabled, then device tree is not unflattened.
Currently ARM64 platforms report: " Failed to find cpu0 device node Unable to detect cache hierarchy from DT for CPU 0 " which is incorrect when booting with ACPI. Also latest ACPI v6.1 has no support for cache properties/hierarchy.
This patch adds check for unflattened device tree and also returns as "not supported" if ACPI is runtime enabled.
It also removes the reference to DT from the error message as the cache hierarchy can be detected from the firmware(OF/DT/ACPI)
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Mian Yousaf Kaukab yousaf.kaukab@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/base/cacheinfo.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -16,6 +16,7 @@ * You should have received a copy of the GNU General Public License * along with this program. If not, see http://www.gnu.org/licenses/. */ +#include <linux/acpi.h> #include <linux/bitops.h> #include <linux/cacheinfo.h> #include <linux/compiler.h> @@ -104,12 +105,16 @@ static int cache_shared_cpu_map_setup(un struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); struct cacheinfo *this_leaf, *sib_leaf; unsigned int index; - int ret; + int ret = 0;
if (this_cpu_ci->cpu_map_populated) return 0;
- ret = cache_setup_of_node(cpu); + if (of_have_populated_dt()) + ret = cache_setup_of_node(cpu); + else if (!acpi_disabled) + /* No cache property/hierarchy support yet in ACPI */ + ret = -ENOTSUPP; if (ret) return ret;
@@ -206,8 +211,7 @@ static int detect_cache_attributes(unsig */ ret = cache_shared_cpu_map_setup(cpu); if (ret) { - pr_warn("Unable to detect cache hierarchy from DT for CPU %d\n", - cpu); + pr_warn("Unable to detect cache hierarchy for CPU %d\n", cpu); goto free_ci; } return 0;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Hocko mhocko@suse.com
commit 561b5e0709e4a248c67d024d4d94b6e31e3edf2f upstream.
Commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas") has introduced a regression in some rust and Java environments which are trying to implement their own stack guard page. They are punching a new MAP_FIXED mapping inside the existing stack Vma.
This will confuse expand_{downwards,upwards} into thinking that the stack expansion would in fact get us too close to an existing non-stack vma which is a correct behavior wrt safety. It is a real regression on the other hand.
Let's work around the problem by considering PROT_NONE mapping as a part of the stack. This is a gros hack but overflowing to such a mapping would trap anyway an we only can hope that usespace knows what it is doing and handle it propely.
Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas") Link: http://lkml.kernel.org/r/20170705182849.GA18027@dhcp22.suse.cz Signed-off-by: Michal Hocko mhocko@suse.com Debugged-by: Vlastimil Babka vbabka@suse.cz Cc: Ben Hutchings ben@decadent.org.uk Cc: Willy Tarreau w@1wt.eu Cc: Oleg Nesterov oleg@redhat.com Cc: Rik van Riel riel@redhat.com Cc: Hugh Dickins hughd@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/mmap.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/mmap.c +++ b/mm/mmap.c @@ -2240,7 +2240,8 @@ int expand_upwards(struct vm_area_struct gap_addr = TASK_SIZE;
next = vma->vm_next; - if (next && next->vm_start < gap_addr) { + if (next && next->vm_start < gap_addr && + (next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) { if (!(next->vm_flags & VM_GROWSUP)) return -ENOMEM; /* Check that both stack segments have the same anon_vma? */ @@ -2324,7 +2325,8 @@ int expand_downwards(struct vm_area_stru if (gap_addr > address) return -ENOMEM; prev = vma->vm_prev; - if (prev && prev->vm_end > gap_addr) { + if (prev && prev->vm_end > gap_addr && + (prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) { if (!(prev->vm_flags & VM_GROWSDOWN)) return -ENOMEM; /* Check that both stack segments have the same anon_vma? */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Hocko mhocko@suse.com
commit 18365225f0440d09708ad9daade2ec11275c3df9 upstream.
Laurent Dufour has noticed that hwpoinsoned pages are kept charged. In his particular case he has hit a bad_page("page still charged to cgroup") when onlining a hwpoison page. While this looks like something that shouldn't happen in the first place because onlining hwpages and returning them to the page allocator makes only little sense it shows a real problem.
hwpoison pages do not get freed usually so we do not uncharge them (at least not since commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")). Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol: take a css reference for each charged page")) as well and so the mem_cgroup and the associated state will never go away. Fix this leak by forcibly uncharging a LRU hwpoisoned page in delete_from_lru_cache(). We also have to tweak uncharge_list because it cannot rely on zero ref count for these pages.
[akpm@linux-foundation.org: coding-style fixes] Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") Link: http://lkml.kernel.org/r/20170502185507.GB19165@dhcp22.suse.cz Signed-off-by: Michal Hocko mhocko@suse.com Reported-by: Laurent Dufour ldufour@linux.vnet.ibm.com Tested-by: Laurent Dufour ldufour@linux.vnet.ibm.com Reviewed-by: Balbir Singh bsingharora@gmail.com Reviewed-by: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/memcontrol.c | 2 +- mm/memory-failure.c | 7 +++++++ 2 files changed, 8 insertions(+), 1 deletion(-)
--- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5531,7 +5531,7 @@ static void uncharge_list(struct list_he next = page->lru.next;
VM_BUG_ON_PAGE(PageLRU(page), page); - VM_BUG_ON_PAGE(page_count(page), page); + VM_BUG_ON_PAGE(!PageHWPoison(page) && page_count(page), page);
if (!page->mem_cgroup) continue; --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -535,6 +535,13 @@ static int delete_from_lru_cache(struct */ ClearPageActive(p); ClearPageUnevictable(p); + + /* + * Poisoned page might never drop its ref count to 0 so we have + * to uncharge it manually from its memcg. + */ + mem_cgroup_uncharge(p); + /* * drop the page count elevated by isolate_lru_page() */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Doug Berger opendmb@gmail.com
commit e048cb32f69038aa1c8f11e5c1b331be4181659d upstream.
The align_offset parameter is used by bitmap_find_next_zero_area_off() to represent the offset of map's base from the previous alignment boundary; the function ensures that the returned index, plus the align_offset, honors the specified align_mask.
The logic introduced by commit b5be83e308f7 ("mm: cma: align to physical address, not CMA region position") has the cma driver calculate the offset to the *next* alignment boundary. In most cases, the base alignment is greater than that specified when making allocations, resulting in a zero offset whether we align up or down. In the example given with the commit, the base alignment (8MB) was half the requested alignment (16MB) so the math also happened to work since the offset is 8MB in both directions. However, when requesting allocations with an alignment greater than twice that of the base, the returned index would not be correctly aligned.
Also, the align_order arguments of cma_bitmap_aligned_mask() and cma_bitmap_aligned_offset() should not be negative so the argument type was made unsigned.
Fixes: b5be83e308f7 ("mm: cma: align to physical address, not CMA region position") Link: http://lkml.kernel.org/r/20170628170742.2895-1-opendmb@gmail.com Signed-off-by: Angus Clark angus@angusclark.org Signed-off-by: Doug Berger opendmb@gmail.com Acked-by: Gregory Fong gregory.0xf0@gmail.com Cc: Doug Berger opendmb@gmail.com Cc: Angus Clark angus@angusclark.org Cc: Laura Abbott labbott@redhat.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Lucas Stach l.stach@pengutronix.de Cc: Catalin Marinas catalin.marinas@arm.com Cc: Shiraz Hashim shashim@codeaurora.org Cc: Jaewon Kim jaewon31.kim@samsung.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Vlastimil Babka vbabka@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/cma.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-)
--- a/mm/cma.c +++ b/mm/cma.c @@ -54,7 +54,7 @@ unsigned long cma_get_size(const struct }
static unsigned long cma_bitmap_aligned_mask(const struct cma *cma, - int align_order) + unsigned int align_order) { if (align_order <= cma->order_per_bit) return 0; @@ -62,17 +62,14 @@ static unsigned long cma_bitmap_aligned_ }
/* - * Find a PFN aligned to the specified order and return an offset represented in - * order_per_bits. + * Find the offset of the base PFN from the specified align_order. + * The value returned is represented in order_per_bits. */ static unsigned long cma_bitmap_aligned_offset(const struct cma *cma, - int align_order) + unsigned int align_order) { - if (align_order <= cma->order_per_bit) - return 0; - - return (ALIGN(cma->base_pfn, (1UL << align_order)) - - cma->base_pfn) >> cma->order_per_bit; + return (cma->base_pfn & ((1UL << align_order) - 1)) + >> cma->order_per_bit; }
static unsigned long cma_bitmap_pages_to_bits(const struct cma *cma,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vlastimil Babka vbabka@suse.cz
commit b050e3769c6b4013bb937e879fc43bf1847ee819 upstream.
Since commit 97a16fc82a7c ("mm, page_alloc: only enforce watermarks for order-0 allocations"), __zone_watermark_ok() check for high-order allocations will shortcut per-migratetype free list checks for ALLOC_HARDER allocations, and return true as long as there's free page of any migratetype. The intention is that ALLOC_HARDER can allocate from MIGRATE_HIGHATOMIC free lists, while normal allocations can't.
However, as a side effect, the watermark check will then also return true when there are pages only on the MIGRATE_ISOLATE list, or (prior to CMA conversion to ZONE_MOVABLE) on the MIGRATE_CMA list. Since the allocation cannot actually obtain isolated pages, and might not be able to obtain CMA pages, this can result in a false positive.
The condition should be rare and perhaps the outcome is not a fatal one. Still, it's better if the watermark check is correct. There also shouldn't be a performance tradeoff here.
Link: http://lkml.kernel.org/r/20171102125001.23708-1-vbabka@suse.cz Fixes: 97a16fc82a7c ("mm, page_alloc: only enforce watermarks for order-0 allocations") Signed-off-by: Vlastimil Babka vbabka@suse.cz Acked-by: Mel Gorman mgorman@techsingularity.net Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Rik van Riel riel@redhat.com Cc: David Rientjes rientjes@google.com Cc: Johannes Weiner hannes@cmpxchg.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/page_alloc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2821,9 +2821,6 @@ bool __zone_watermark_ok(struct zone *z, if (!area->nr_free) continue;
- if (alloc_harder) - return true; - for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) { if (!list_empty(&area->free_list[mt])) return true; @@ -2835,6 +2832,9 @@ bool __zone_watermark_ok(struct zone *z, return true; } #endif + if (alloc_harder && + !list_empty(&area->free_list[MIGRATE_HIGHATOMIC])) + return true; } return false; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby jslaby@suse.cz
commit 999898355e08ae3b92dfd0a08db706e0c6703d30 upstream.
When LONG_MIN is passed to msgrcv, one would expect to recieve any message. But convert_mode does *msgtyp = -*msgtyp and -LONG_MIN is undefined. In particular, with my gcc -LONG_MIN produces -LONG_MIN again.
So handle this case properly by assigning LONG_MAX to *msgtyp if LONG_MIN was specified as msgtyp to msgrcv.
This code: long msg[] = { 100, 200 }; int m = msgget(IPC_PRIVATE, IPC_CREAT | 0644); msgsnd(m, &msg, sizeof(msg), 0); msgrcv(m, &msg, sizeof(msg), LONG_MIN, 0);
produces currently nothing:
msgget(IPC_PRIVATE, IPC_CREAT|0644) = 65538 msgsnd(65538, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0 msgrcv(65538, ...
Except a UBSAN warning:
UBSAN: Undefined behaviour in ipc/msg.c:745:13 negation of -9223372036854775808 cannot be represented in type 'long int':
With the patch, I see what I expect:
msgget(IPC_PRIVATE, IPC_CREAT|0644) = 0 msgsnd(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0 msgrcv(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, -9223372036854775808, 0) = 16
Link: http://lkml.kernel.org/r/20161024082633.10148-1-jslaby@suse.cz Signed-off-by: Jiri Slaby jslaby@suse.cz Cc: Davidlohr Bueso dave@stgolabs.net Cc: Manfred Spraul manfred@colorfullife.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- ipc/msg.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/ipc/msg.c +++ b/ipc/msg.c @@ -763,7 +763,10 @@ static inline int convert_mode(long *msg if (*msgtyp == 0) return SEARCH_ANY; if (*msgtyp < 0) { - *msgtyp = -*msgtyp; + if (*msgtyp == LONG_MIN) /* -LONG_MIN is undefined */ + *msgtyp = LONG_MAX; + else + *msgtyp = -*msgtyp; return SEARCH_LESSEQUAL; } if (msgflg & MSG_EXCEPT)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit c2a6bbaf0c5f90463a7011a295bbdb7e33c80b51 upstream.
The way acpi_find_child_device() works currently is that, if there are two (or more) devices with the same _ADR value in the same namespace scope (which is not specifically allowed by the spec and the OS behavior in that case is not defined), the first one of them found to be present (with the help of _STA) will be returned.
This covers the majority of cases, but is not sufficient if some of the devices in question have a _HID (or _CID) returning some valid ACPI/PNP device IDs (which is disallowed by the spec) and the ASL writers' expectation appears to be that the OS will match devices without a valid ACPI/PNP device ID against a given bus address first.
To cover this special case as well, modify find_child_checks() to prefer devices without ACPI/PNP device IDs over devices that have them.
Suggested-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Tested-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Jiri Slaby jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/glue.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/drivers/acpi/glue.c +++ b/drivers/acpi/glue.c @@ -99,13 +99,13 @@ static int find_child_checks(struct acpi return -ENODEV;
/* - * If the device has a _HID (or _CID) returning a valid ACPI/PNP - * device ID, it is better to make it look less attractive here, so that - * the other device with the same _ADR value (that may not have a valid - * device ID) can be matched going forward. [This means a second spec - * violation in a row, so whatever we do here is best effort anyway.] + * If the device has a _HID returning a valid ACPI/PNP device ID, it is + * better to make it look less attractive here, so that the other device + * with the same _ADR value (that may not have a valid device ID) can be + * matched going forward. [This means a second spec violation in a row, + * so whatever we do here is best effort anyway.] */ - return sta_present && list_empty(&adev->pnp.ids) ? + return sta_present && !adev->pnp.type.platform_id ? FIND_CHILD_MAX_SCORE : FIND_CHILD_MIN_SCORE; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Seunghun Han kkamagui@gmail.com
commit 3b2d69114fefa474fca542e51119036dceb4aa6f upstream.
ACPICA commit a23325b2e583556eae88ed3f764e457786bf4df6
I found some ACPI operand cache leaks in ACPI early abort cases.
Boot log of ACPI operand cache leak is as follows:
[ 0.174332] ACPI: Added _OSI(Module Device) [ 0.175504] ACPI: Added _OSI(Processor Device) [ 0.176010] ACPI: Added _OSI(3.0 _SCP Extensions) [ 0.177032] ACPI: Added _OSI(Processor Aggregator Device) [ 0.178284] ACPI: SCI (IRQ16705) allocation failed [ 0.179352] ACPI Exception: AE_NOT_ACQUIRED, Unable to install
System Control Interrupt handler (20160930/evevent-131)
[ 0.180008] ACPI: Unable to start the ACPI Interpreter [ 0.181125] ACPI Error: Could not remove SCI handler
(20160930/evmisc-281)
[ 0.184068] kmem_cache_destroy Acpi-Operand: Slab cache still has
objects
[ 0.185358] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc3 #2 [ 0.186820] Hardware name: innotek gmb_h virtual_box/virtual_box, BIOS
virtual_box 12/01/2006
[ 0.188000] Call Trace: [ 0.188000] ? dump_stack+0x5c/0x7d [ 0.188000] ? kmem_cache_destroy+0x224/0x230 [ 0.188000] ? acpi_sleep_proc_init+0x22/0x22 [ 0.188000] ? acpi_os_delete_cache+0xa/0xd [ 0.188000] ? acpi_ut_delete_caches+0x3f/0x7b [ 0.188000] ? acpi_terminate+0x5/0xf [ 0.188000] ? acpi_init+0x288/0x32e [ 0.188000] ? __class_create+0x4c/0x80 [ 0.188000] ? video_setup+0x7a/0x7a [ 0.188000] ? do_one_initcall+0x4e/0x1b0 [ 0.188000] ? kernel_init_freeable+0x194/0x21a [ 0.188000] ? rest_init+0x80/0x80 [ 0.188000] ? kernel_init+0xa/0x100 [ 0.188000] ? ret_from_fork+0x25/0x30
When early abort is occurred due to invalid ACPI information, Linux kernel terminates ACPI by calling acpi_terminate() function. The function calls acpi_ns_terminate() function to delete namespace data and ACPI operand cache (acpi_gbl_module_code_list).
But the deletion code in acpi_ns_terminate() function is wrapped in ACPI_EXEC_APP definition, therefore the code is only executed when the definition exists. If the define doesn't exist, ACPI operand cache (acpi_gbl_module_code_list) is leaked, and stack dump is shown in kernel log.
This causes a security threat because the old kernel (<= 4.9) shows memory locations of kernel functions in stack dump, therefore kernel ASLR can be neutralized.
To fix ACPI operand leak for enhancing security, I made a patch which removes the ACPI_EXEC_APP define in acpi_ns_terminate() function for executing the deletion code unconditionally.
Link: https://github.com/acpica/acpica/commit/a23325b2 Signed-off-by: Seunghun Han kkamagui@gmail.com Signed-off-by: Lv Zheng lv.zheng@intel.com Signed-off-by: Bob Moore robert.moore@intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Lee, Chun-Yi jlee@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/acpica/nsutils.c | 23 +++++++++-------------- 1 file changed, 9 insertions(+), 14 deletions(-)
--- a/drivers/acpi/acpica/nsutils.c +++ b/drivers/acpi/acpica/nsutils.c @@ -594,25 +594,20 @@ struct acpi_namespace_node *acpi_ns_vali void acpi_ns_terminate(void) { acpi_status status; + union acpi_operand_object *prev; + union acpi_operand_object *next;
ACPI_FUNCTION_TRACE(ns_terminate);
-#ifdef ACPI_EXEC_APP - { - union acpi_operand_object *prev; - union acpi_operand_object *next; + /* Delete any module-level code blocks */
- /* Delete any module-level code blocks */ - - next = acpi_gbl_module_code_list; - while (next) { - prev = next; - next = next->method.mutex; - prev->method.mutex = NULL; /* Clear the Mutex (cheated) field */ - acpi_ut_remove_reference(prev); - } + next = acpi_gbl_module_code_list; + while (next) { + prev = next; + next = next->method.mutex; + prev->method.mutex = NULL; /* Clear the Mutex (cheated) field */ + acpi_ut_remove_reference(prev); } -#endif
/* * Free the entire namespace -- all nodes and all objects
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kevin Cernekee cernekee@chromium.org
commit 4b380c42f7d00a395feede754f0bc2292eebe6e5 upstream.
The capability check in nfnetlink_rcv() verifies that the caller has CAP_NET_ADMIN in the namespace that "owns" the netlink socket. However, nfnl_cthelper_list is shared by all net namespaces on the system. An unprivileged user can create user and net namespaces in which he holds CAP_NET_ADMIN to bypass the netlink_net_capable() check:
$ nfct helper list nfct v1.4.4: netlink error: Operation not permitted $ vpnns -- nfct helper list { .name = ftp, .queuenum = 0, .l3protonum = 2, .l4protonum = 6, .priv_data_len = 24, .status = enabled, };
Add capable() checks in nfnetlink_cthelper, as this is cleaner than trying to generalize the solution.
Signed-off-by: Kevin Cernekee cernekee@chromium.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Acked-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netfilter/nfnetlink_cthelper.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/net/netfilter/nfnetlink_cthelper.c +++ b/net/netfilter/nfnetlink_cthelper.c @@ -17,6 +17,7 @@ #include <linux/types.h> #include <linux/list.h> #include <linux/errno.h> +#include <linux/capability.h> #include <net/netlink.h> #include <net/sock.h>
@@ -392,6 +393,9 @@ static int nfnl_cthelper_new(struct net struct nfnl_cthelper *nlcth; int ret = 0;
+ if (!capable(CAP_NET_ADMIN)) + return -EPERM; + if (!tb[NFCTH_NAME] || !tb[NFCTH_TUPLE]) return -EINVAL;
@@ -595,6 +599,9 @@ static int nfnl_cthelper_get(struct net struct nfnl_cthelper *nlcth; bool tuple_set = false;
+ if (!capable(CAP_NET_ADMIN)) + return -EPERM; + if (nlh->nlmsg_flags & NLM_F_DUMP) { struct netlink_dump_control c = { .dump = nfnl_cthelper_dump_table, @@ -661,6 +668,9 @@ static int nfnl_cthelper_del(struct net struct nfnl_cthelper *nlcth, *n; int j = 0, ret;
+ if (!capable(CAP_NET_ADMIN)) + return -EPERM; + if (tb[NFCTH_NAME]) helper_name = nla_data(tb[NFCTH_NAME]);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kevin Cernekee cernekee@chromium.org
commit 916a27901de01446bcf57ecca4783f6cff493309 upstream.
The capability check in nfnetlink_rcv() verifies that the caller has CAP_NET_ADMIN in the namespace that "owns" the netlink socket. However, xt_osf_fingers is shared by all net namespaces on the system. An unprivileged user can create user and net namespaces in which he holds CAP_NET_ADMIN to bypass the netlink_net_capable() check:
vpnns -- nfnl_osf -f /tmp/pf.os
vpnns -- nfnl_osf -f /tmp/pf.os -d
These non-root operations successfully modify the systemwide OS fingerprint list. Add new capable() checks so that they can't.
Signed-off-by: Kevin Cernekee cernekee@chromium.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Acked-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netfilter/xt_osf.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/net/netfilter/xt_osf.c +++ b/net/netfilter/xt_osf.c @@ -19,6 +19,7 @@ #include <linux/module.h> #include <linux/kernel.h>
+#include <linux/capability.h> #include <linux/if.h> #include <linux/inetdevice.h> #include <linux/ip.h> @@ -69,6 +70,9 @@ static int xt_osf_add_callback(struct ne struct xt_osf_finger *kf = NULL, *sf; int err = 0;
+ if (!capable(CAP_NET_ADMIN)) + return -EPERM; + if (!osf_attrs[OSF_ATTR_FINGER]) return -EINVAL;
@@ -113,6 +117,9 @@ static int xt_osf_remove_callback(struct struct xt_osf_finger *sf; int err = -ENOENT;
+ if (!capable(CAP_NET_ADMIN)) + return -EPERM; + if (!osf_attrs[OSF_ATTR_FINGER]) return -EINVAL;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Mahoney jeffm@suse.com
commit 08db141b5313ac2f64b844fb5725b8d81744b417 upstream.
The main loop in __discard_prealloc is protected by the reiserfs write lock which is dropped across schedules like the BKL it replaced. The problem is that it checks the value, calls a routine that schedules, and then adjusts the state. As a result, two threads that are calling reiserfs_prealloc_discard at the same time can race when one calls reiserfs_free_prealloc_block, the lock is dropped, and the other calls reiserfs_free_prealloc_block with the same block number. In the right circumstances, it can cause the prealloc count to go negative.
Signed-off-by: Jeff Mahoney jeffm@suse.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/reiserfs/bitmap.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/fs/reiserfs/bitmap.c +++ b/fs/reiserfs/bitmap.c @@ -513,9 +513,17 @@ static void __discard_prealloc(struct re "inode has negative prealloc blocks count."); #endif while (ei->i_prealloc_count > 0) { - reiserfs_free_prealloc_block(th, inode, ei->i_prealloc_block); - ei->i_prealloc_block++; + b_blocknr_t block_to_free; + + /* + * reiserfs_free_prealloc_block can drop the write lock, + * which could allow another caller to free the same block. + * We can protect against it by modifying the prealloc + * state before calling it. + */ + block_to_free = ei->i_prealloc_block++; ei->i_prealloc_count--; + reiserfs_free_prealloc_block(th, inode, block_to_free); dirty = 1; } if (dirty)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeff Mahoney jeffm@suse.com
commit 54930dfeb46e978b447af0fb8ab4e181c1bf9d7a upstream.
Most extended attributes will fit in a single block. More importantly, we drop the reference to the inode while holding the transaction open so the preallocated blocks aren't released. As a result, the inode may be evicted before it's removed from the transaction's prealloc list which can cause memory corruption.
Signed-off-by: Jeff Mahoney jeffm@suse.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/reiserfs/bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/reiserfs/bitmap.c +++ b/fs/reiserfs/bitmap.c @@ -1136,7 +1136,7 @@ static int determine_prealloc_size(reise hint->prealloc_size = 0;
if (!hint->formatted_node && hint->preallocate) { - if (S_ISREG(hint->inode->i_mode) + if (S_ISREG(hint->inode->i_mode) && !IS_PRIVATE(hint->inode) && hint->inode->i_size >= REISERFS_SB(hint->th->t_super)->s_alloc_options. preallocmin * hint->inode->i_sb->s_blocksize)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby jslaby@suse.cz
commit fc3dc67471461c0efcb1ed22fb7595121d65fad9 upstream.
fcntl(0, F_SETOWN, 0x80000000) triggers: UBSAN: Undefined behaviour in fs/fcntl.c:118:7 negation of -2147483648 cannot be represented in type 'int': CPU: 1 PID: 18261 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1 ... Call Trace: ... [<ffffffffad8f0868>] ? f_setown+0x1d8/0x200 [<ffffffffad8f19a9>] ? SyS_fcntl+0x999/0xf30 [<ffffffffaed1fb00>] ? entry_SYSCALL_64_fastpath+0x23/0xc1
Fix that by checking the arg parameter properly (against INT_MAX) before "who = -who". And return immediatelly with -EINVAL in case it is wrong. Note that according to POSIX we can return EINVAL: http://pubs.opengroup.org/onlinepubs/9699919799/functions/fcntl.html
[EINVAL] The cmd argument is F_SETOWN and the value of the argument is not valid as a process or process group identifier.
[v2] returns an error, v1 used to fail silently [v3] implement proper check for the bad value INT_MIN
Signed-off-by: Jiri Slaby jslaby@suse.cz Cc: Jeff Layton jlayton@poochiereds.net Cc: "J. Bruce Fields" bfields@fieldses.org Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Jeff Layton jlayton@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/fcntl.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -114,6 +114,10 @@ void f_setown(struct file *filp, unsigne int who = arg; type = PIDTYPE_PID; if (who < 0) { + /* avoid overflow below */ + if (who == INT_MIN) + return; + type = PIDTYPE_PGID; who = -who; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Thumshirn jthumshirn@suse.de
commit eef9ffdf9cd39b2986367bc8395e2772bc1284ba upstream.
The SCSI host byte should be shifted left by 16 in order to have scsi_decide_disposition() do the right thing (.i.e. requeue the command).
Signed-off-by: Johannes Thumshirn jthumshirn@suse.de Fixes: 661134ad3765 ("[SCSI] libiscsi, bnx2i: make bound ep check common") Cc: Lee Duncan lduncan@suse.com Cc: Hannes Reinecke hare@suse.de Cc: Bart Van Assche Bart.VanAssche@sandisk.com Cc: Chris Leech cleech@redhat.com Acked-by: Lee Duncan lduncan@suse.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/libiscsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1727,7 +1727,7 @@ int iscsi_queuecommand(struct Scsi_Host
if (test_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx)) { reason = FAILURE_SESSION_IN_RECOVERY; - sc->result = DID_REQUEUE; + sc->result = DID_REQUEUE << 16; goto fault; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit 5132ede0fe8092b043dae09a7cc32b8ae7272baa upstream.
This reverts commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12.
Turns out distros do not want to make retpoline as part of their "ABI", so this patch should not have been merged. Sorry Andi, this was my fault, I suggested it when your original patch was the "correct" way of doing this instead.
Reported-by: Jiri Kosina jikos@kernel.org Fixes: 6cfb521ac0d5 ("module: Add retpoline tag to VERMAGIC") Acked-by: Andi Kleen ak@linux.intel.com Cc: Thomas Gleixner tglx@linutronix.de Cc: David Woodhouse dwmw@amazon.co.uk Cc: rusty@rustcorp.com.au Cc: arjan.van.de.ven@intel.com Cc: jeyu@kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/vermagic.h | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
--- a/include/linux/vermagic.h +++ b/include/linux/vermagic.h @@ -24,16 +24,10 @@ #ifndef MODULE_ARCH_VERMAGIC #define MODULE_ARCH_VERMAGIC "" #endif -#ifdef RETPOLINE -#define MODULE_VERMAGIC_RETPOLINE "retpoline " -#else -#define MODULE_VERMAGIC_RETPOLINE "" -#endif
#define VERMAGIC_STRING \ UTS_RELEASE " " \ MODULE_VERMAGIC_SMP MODULE_VERMAGIC_PREEMPT \ MODULE_VERMAGIC_MODULE_UNLOAD MODULE_VERMAGIC_MODVERSIONS \ - MODULE_ARCH_VERMAGIC \ - MODULE_VERMAGIC_RETPOLINE + MODULE_ARCH_VERMAGIC
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Weiner hannes@cmpxchg.org
commit c73322d098e4b6f5f0f0fa1330bf57e218775539 upstream.
Patch series "mm: kswapd spinning on unreclaimable nodes - fixes and cleanups".
Jia reported a scenario in which the kswapd of a node indefinitely spins at 100% CPU usage. We have seen similar cases at Facebook.
The kernel's current method of judging its ability to reclaim a node (or whether to back off and sleep) is based on the amount of scanned pages in proportion to the amount of reclaimable pages. In Jia's and our scenarios, there are no reclaimable pages in the node, however, and the condition for backing off is never met. Kswapd busyloops in an attempt to restore the watermarks while having nothing to work with.
This series reworks the definition of an unreclaimable node based not on scanning but on whether kswapd is able to actually reclaim pages in MAX_RECLAIM_RETRIES (16) consecutive runs. This is the same criteria the page allocator uses for giving up on direct reclaim and invoking the OOM killer. If it cannot free any pages, kswapd will go to sleep and leave further attempts to direct reclaim invocations, which will either make progress and re-enable kswapd, or invoke the OOM killer.
Patch #1 fixes the immediate problem Jia reported, the remainder are smaller fixlets, cleanups, and overall phasing out of the old method.
Patch #6 is the odd one out. It's a nice cleanup to get_scan_count(), and directly related to #5, but in itself not relevant to the series.
If the whole series is too ambitious for 4.11, I would consider the first three patches fixes, the rest cleanups.
This patch (of 9):
Jia He reports a problem with kswapd spinning at 100% CPU when requesting more hugepages than memory available in the system:
$ echo 4000 >/proc/sys/vm/nr_hugepages
top - 13:42:59 up 3:37, 1 user, load average: 1.09, 1.03, 1.01 Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.0 us, 12.5 sy, 0.0 ni, 85.5 id, 2.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 31371520 total, 30915136 used, 456384 free, 320 buffers KiB Swap: 6284224 total, 115712 used, 6168512 free. 48192 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 76 root 20 0 0 0 0 R 100.0 0.000 217:17.29 kswapd3
At that time, there are no reclaimable pages left in the node, but as kswapd fails to restore the high watermarks it refuses to go to sleep.
Kswapd needs to back away from nodes that fail to balance. Up until commit 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes") kswapd had such a mechanism. It considered zones whose theoretically reclaimable pages it had reclaimed six times over as unreclaimable and backed away from them. This guard was erroneously removed as the patch changed the definition of a balanced node.
However, simply restoring this code wouldn't help in the case reported here: there *are* no reclaimable pages that could be scanned until the threshold is met. Kswapd would stay awake anyway.
Introduce a new and much simpler way of backing off. If kswapd runs through MAX_RECLAIM_RETRIES (16) cycles without reclaiming a single page, make it back off from the node. This is the same number of shots direct reclaim takes before declaring OOM. Kswapd will go to sleep on that node until a direct reclaimer manages to reclaim some pages, thus proving the node reclaimable again.
[hannes@cmpxchg.org: check kswapd failure against the cumulative nr_reclaimed count] Link: http://lkml.kernel.org/r/20170306162410.GB2090@cmpxchg.org [shakeelb@google.com: fix condition for throttle_direct_reclaim] Link: http://lkml.kernel.org/r/20170314183228.20152-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20170228214007.5621-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner hannes@cmpxchg.org Signed-off-by: Shakeel Butt shakeelb@google.com Reported-by: Jia He hejianet@gmail.com Tested-by: Jia He hejianet@gmail.com Acked-by: Michal Hocko mhocko@suse.com Acked-by: Hillf Danton hillf.zj@alibaba-inc.com Acked-by: Minchan Kim minchan@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Cc: Dmitry Shmidt dimitrysh@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/mmzone.h | 2 ++ mm/internal.h | 6 ++++++ mm/page_alloc.c | 9 ++------- mm/vmscan.c | 47 ++++++++++++++++++++++++++++++++--------------- mm/vmstat.c | 2 +- 5 files changed, 43 insertions(+), 23 deletions(-)
--- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -633,6 +633,8 @@ typedef struct pglist_data { int kswapd_order; enum zone_type kswapd_classzone_idx;
+ int kswapd_failures; /* Number of 'reclaimed == 0' runs */ + #ifdef CONFIG_COMPACTION int kcompactd_max_order; enum zone_type kcompactd_classzone_idx; --- a/mm/internal.h +++ b/mm/internal.h @@ -74,6 +74,12 @@ static inline void set_page_refcounted(s extern unsigned long highest_memmap_pfn;
/* + * Maximum number of reclaim retries without progress before the OOM + * killer is consider the only way forward. + */ +#define MAX_RECLAIM_RETRIES 16 + +/* * in mm/vmscan.c: */ extern int isolate_lru_page(struct page *page); --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3422,12 +3422,6 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_ma }
/* - * Maximum number of reclaim retries without any progress before OOM killer - * is consider as the only way to move forward. - */ -#define MAX_RECLAIM_RETRIES 16 - -/* * Checks whether it makes sense to retry the reclaim to make a forward progress * for the given allocation request. * The reclaim feedback represented by did_some_progress (any progress during @@ -4385,7 +4379,8 @@ void show_free_areas(unsigned int filter K(node_page_state(pgdat, NR_WRITEBACK_TEMP)), K(node_page_state(pgdat, NR_UNSTABLE_NFS)), node_page_state(pgdat, NR_PAGES_SCANNED), - !pgdat_reclaimable(pgdat) ? "yes" : "no"); + pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ? + "yes" : "no"); }
for_each_populated_zone(zone) { --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2606,6 +2606,15 @@ static bool shrink_node(pg_data_t *pgdat } while (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed, sc->nr_scanned - nr_scanned, sc));
+ /* + * Kswapd gives up on balancing particular nodes after too + * many failures to reclaim anything from them and goes to + * sleep. On reclaim progress, reset the failure counter. A + * successful direct reclaim run will revive a dormant kswapd. + */ + if (reclaimable) + pgdat->kswapd_failures = 0; + return reclaimable; }
@@ -2680,10 +2689,6 @@ static void shrink_zones(struct zonelist GFP_KERNEL | __GFP_HARDWALL)) continue;
- if (sc->priority != DEF_PRIORITY && - !pgdat_reclaimable(zone->zone_pgdat)) - continue; /* Let kswapd poll it */ - /* * If we already have plenty of memory free for * compaction in this zone, don't free any more. @@ -2820,7 +2825,7 @@ retry: return 0; }
-static bool pfmemalloc_watermark_ok(pg_data_t *pgdat) +static bool allow_direct_reclaim(pg_data_t *pgdat) { struct zone *zone; unsigned long pfmemalloc_reserve = 0; @@ -2828,6 +2833,9 @@ static bool pfmemalloc_watermark_ok(pg_d int i; bool wmark_ok;
+ if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES) + return true; + for (i = 0; i <= ZONE_NORMAL; i++) { zone = &pgdat->node_zones[i]; if (!managed_zone(zone) || @@ -2908,7 +2916,7 @@ static bool throttle_direct_reclaim(gfp_
/* Throttle based on the first usable node */ pgdat = zone->zone_pgdat; - if (pfmemalloc_watermark_ok(pgdat)) + if (allow_direct_reclaim(pgdat)) goto out; break; } @@ -2930,14 +2938,14 @@ static bool throttle_direct_reclaim(gfp_ */ if (!(gfp_mask & __GFP_FS)) { wait_event_interruptible_timeout(pgdat->pfmemalloc_wait, - pfmemalloc_watermark_ok(pgdat), HZ); + allow_direct_reclaim(pgdat), HZ);
goto check_pending; }
/* Throttle until kswapd wakes the process */ wait_event_killable(zone->zone_pgdat->pfmemalloc_wait, - pfmemalloc_watermark_ok(pgdat)); + allow_direct_reclaim(pgdat));
check_pending: if (fatal_signal_pending(current)) @@ -3116,7 +3124,7 @@ static bool prepare_kswapd_sleep(pg_data
/* * The throttled processes are normally woken up in balance_pgdat() as - * soon as pfmemalloc_watermark_ok() is true. But there is a potential + * soon as allow_direct_reclaim() is true. But there is a potential * race between when kswapd checks the watermarks and a process gets * throttled. There is also a potential race if processes get * throttled, kswapd wakes, a large process exits thereby balancing the @@ -3130,6 +3138,10 @@ static bool prepare_kswapd_sleep(pg_data if (waitqueue_active(&pgdat->pfmemalloc_wait)) wake_up_all(&pgdat->pfmemalloc_wait);
+ /* Hopeless node, leave it to direct reclaim */ + if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES) + return true; + for (i = 0; i <= classzone_idx; i++) { struct zone *zone = pgdat->node_zones + i;
@@ -3216,9 +3228,9 @@ static int balance_pgdat(pg_data_t *pgda count_vm_event(PAGEOUTRUN);
do { + unsigned long nr_reclaimed = sc.nr_reclaimed; bool raise_priority = true;
- sc.nr_reclaimed = 0; sc.reclaim_idx = classzone_idx;
/* @@ -3297,7 +3309,7 @@ static int balance_pgdat(pg_data_t *pgda * able to safely make forward progress. Wake them */ if (waitqueue_active(&pgdat->pfmemalloc_wait) && - pfmemalloc_watermark_ok(pgdat)) + allow_direct_reclaim(pgdat)) wake_up_all(&pgdat->pfmemalloc_wait);
/* Check if kswapd should be suspending */ @@ -3308,10 +3320,14 @@ static int balance_pgdat(pg_data_t *pgda * Raise priority if scanning rate is too low or there was no * progress in reclaiming pages */ - if (raise_priority || !sc.nr_reclaimed) + nr_reclaimed = sc.nr_reclaimed - nr_reclaimed; + if (raise_priority || !nr_reclaimed) sc.priority--; } while (sc.priority >= 1);
+ if (!sc.nr_reclaimed) + pgdat->kswapd_failures++; + out: /* * Return the order kswapd stopped reclaiming at as @@ -3511,6 +3527,10 @@ void wakeup_kswapd(struct zone *zone, in if (!waitqueue_active(&pgdat->kswapd_wait)) return;
+ /* Hopeless node, leave it to direct reclaim */ + if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES) + return; + /* Only wake kswapd if all zones are unbalanced */ for (z = 0; z <= classzone_idx; z++) { zone = pgdat->node_zones + z; @@ -3781,9 +3801,6 @@ int node_reclaim(struct pglist_data *pgd sum_zone_node_page_state(pgdat->node_id, NR_SLAB_RECLAIMABLE) <= pgdat->min_slab_pages) return NODE_RECLAIM_FULL;
- if (!pgdat_reclaimable(pgdat)) - return NODE_RECLAIM_FULL; - /* * Do not scan if the allocation should not be delayed. */ --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1421,7 +1421,7 @@ static void zoneinfo_show_print(struct s "\n node_unreclaimable: %u" "\n start_pfn: %lu" "\n node_inactive_ratio: %u", - !pgdat_reclaimable(zone->zone_pgdat), + pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES, zone->zone_start_pfn, zone->zone_pgdat->inactive_ratio); seq_putc(m, '\n');
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aaron Ma aaron.ma@canonical.com
commit f5d07b9e98022d50720e38aa936fc11c67868ece upstream.
Lenovo introduced trackpoint compatible sticks with minimum PS/2 commands. They supposed to reply with 0x02, 0x03, or 0x04 in response to the "Read Extended ID" command, so we would know not to try certain extended commands. Unfortunately even some trackpoints reporting the original IBM version (0x01 firmware 0x0e) now respond with incorrect data to the "Get Extended Buttons" command:
thinkpad_acpi: ThinkPad BIOS R0DET87W (1.87 ), EC unknown thinkpad_acpi: Lenovo ThinkPad E470, model 20H1004SGE
psmouse serio2: trackpoint: IBM TrackPoint firmware: 0x0e, buttons: 0/0
Since there are no trackpoints without buttons, let's assume the trackpoint has 3 buttons when we get 0 response to the extended buttons query.
Signed-off-by: Aaron Ma aaron.ma@canonical.com Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196253 Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/mouse/trackpoint.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/input/mouse/trackpoint.c +++ b/drivers/input/mouse/trackpoint.c @@ -383,6 +383,9 @@ int trackpoint_detect(struct psmouse *ps if (trackpoint_read(&psmouse->ps2dev, TP_EXT_BTN, &button_info)) { psmouse_warn(psmouse, "failed to get extended button data, assuming 3 buttons\n"); button_info = 0x33; + } else if (!button_info) { + psmouse_warn(psmouse, "got 0 in extended button data, assuming 3 buttons\n"); + button_info = 0x33; }
psmouse->private = kzalloc(sizeof(struct trackpoint_data), GFP_KERNEL);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Brandenburg martin@omnibond.com
commit 6793f1c450b1533a5e9c2493490de771d38b24f9 upstream.
After do_readv_writev, the inode cache is invalidated anyway, so i_size will never be read. It will be fetched from the server which will also know about updates from other machines.
Fixes deadlock on 32-bit SMP.
See https://marc.info/?l=linux-fsdevel&m=151268557427760&w=2
Signed-off-by: Martin Brandenburg martin@omnibond.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Mike Marshall hubcap@omnibond.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/orangefs/file.c | 7 ++----- fs/orangefs/orangefs-kernel.h | 11 ----------- 2 files changed, 2 insertions(+), 16 deletions(-)
--- a/fs/orangefs/file.c +++ b/fs/orangefs/file.c @@ -446,7 +446,7 @@ ssize_t orangefs_inode_read(struct inode static ssize_t orangefs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) { struct file *file = iocb->ki_filp; - loff_t pos = *(&iocb->ki_pos); + loff_t pos = iocb->ki_pos; ssize_t rc = 0;
BUG_ON(iocb->private); @@ -485,9 +485,6 @@ static ssize_t orangefs_file_write_iter( } }
- if (file->f_pos > i_size_read(file->f_mapping->host)) - orangefs_i_size_write(file->f_mapping->host, file->f_pos); - rc = generic_write_checks(iocb, iter);
if (rc <= 0) { @@ -501,7 +498,7 @@ static ssize_t orangefs_file_write_iter( * pos to the end of the file, so we will wait till now to set * pos... */ - pos = *(&iocb->ki_pos); + pos = iocb->ki_pos;
rc = do_readv_writev(ORANGEFS_IO_WRITE, file, --- a/fs/orangefs/orangefs-kernel.h +++ b/fs/orangefs/orangefs-kernel.h @@ -570,17 +570,6 @@ do { \ sys_attr.mask = ORANGEFS_ATTR_SYS_ALL_SETABLE; \ } while (0)
-static inline void orangefs_i_size_write(struct inode *inode, loff_t i_size) -{ -#if BITS_PER_LONG == 32 && defined(CONFIG_SMP) - inode_lock(inode); -#endif - i_size_write(inode, i_size); -#if BITS_PER_LONG == 32 && defined(CONFIG_SMP) - inode_unlock(inode); -#endif -} - static inline void orangefs_set_timeout(struct dentry *dentry) { unsigned long time = jiffies + orangefs_dcache_timeout_msecs*HZ/1000;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Meyer thomas@m3y3r.de
commit 883354afbc109c57f925ccc19840055193da0cc0 upstream.
Debian's gcc defaults to pie. The global Makefile already defines the -fno-pie option. Link UML dynamic kernel image also with -no-pie to fix the build.
Signed-off-by: Thomas Meyer thomas@m3y3r.de Signed-off-by: Richard Weinberger richard@nod.at Cc: Bernie Innocenti codewiz@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/um/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/um/Makefile +++ b/arch/um/Makefile @@ -117,7 +117,7 @@ archheaders: archprepare: include/generated/user_constants.h
LINK-$(CONFIG_LD_SCRIPT_STATIC) += -static -LINK-$(CONFIG_LD_SCRIPT_DYN) += -Wl,-rpath,/lib +LINK-$(CONFIG_LD_SCRIPT_DYN) += -Wl,-rpath,/lib $(call cc-option, -no-pie)
CFLAGS_NO_HARDENING := $(call cc-option, -fno-PIC,) $(call cc-option, -fno-pic,) \ $(call cc-option, -fno-stack-protector,) \
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Hutchings ben.hutchings@codethink.co.uk
The backport of KAISER to 4.4 turned vsyscall emulate mode into native mode. Add a vsyscall_pgprot variable to hold the correct page protections, like Borislav and Hugh did for 3.2 and 3.18.
Cc: Borislav Petkov bp@suse.de Cc: Hugh Dickins hughd@google.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/vsyscall/vsyscall_64.c | 7 ++++--- arch/x86/include/asm/vsyscall.h | 1 + arch/x86/mm/kaiser.c | 2 +- 3 files changed, 6 insertions(+), 4 deletions(-)
--- a/arch/x86/entry/vsyscall/vsyscall_64.c +++ b/arch/x86/entry/vsyscall/vsyscall_64.c @@ -46,6 +46,7 @@ static enum { EMULATE, NATIVE, NONE } vs #else EMULATE; #endif +unsigned long vsyscall_pgprot = __PAGE_KERNEL_VSYSCALL;
static int __init vsyscall_setup(char *str) { @@ -336,11 +337,11 @@ void __init map_vsyscall(void) extern char __vsyscall_page; unsigned long physaddr_vsyscall = __pa_symbol(&__vsyscall_page);
+ if (vsyscall_mode != NATIVE) + vsyscall_pgprot = __PAGE_KERNEL_VVAR; if (vsyscall_mode != NONE) __set_fixmap(VSYSCALL_PAGE, physaddr_vsyscall, - vsyscall_mode == NATIVE - ? PAGE_KERNEL_VSYSCALL - : PAGE_KERNEL_VVAR); + __pgprot(vsyscall_pgprot));
BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) != (unsigned long)VSYSCALL_ADDR); --- a/arch/x86/include/asm/vsyscall.h +++ b/arch/x86/include/asm/vsyscall.h @@ -13,6 +13,7 @@ extern void map_vsyscall(void); */ extern bool emulate_vsyscall(struct pt_regs *regs, unsigned long address); extern bool vsyscall_enabled(void); +extern unsigned long vsyscall_pgprot; #else static inline void map_vsyscall(void) {} static inline bool emulate_vsyscall(struct pt_regs *regs, unsigned long address) --- a/arch/x86/mm/kaiser.c +++ b/arch/x86/mm/kaiser.c @@ -344,7 +344,7 @@ void __init kaiser_init(void) if (vsyscall_enabled()) kaiser_add_user_map_early((void *)VSYSCALL_ADDR, PAGE_SIZE, - __PAGE_KERNEL_VSYSCALL); + vsyscall_pgprot);
for_each_possible_cpu(cpu) { void *percpu_vaddr = __per_cpu_user_mapped_start +
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg KH gregkh@linuxfoundation.org
commit 7e040726850a106587485c21bdacc0bfc8a0cbed upstream.
[resend due to me forgetting to cc: linux-api the first time around I posted these back on Feb 23]
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
For some reason these values are not in the uapi header file, so any libc has to define it themselves. To prevent them from needing to do this, just have the kernel provide the correct values.
Reported-by: Elliott Hughes enh@google.com Signed-off-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/uapi/linux/eventpoll.h | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/include/uapi/linux/eventpoll.h +++ b/include/uapi/linux/eventpoll.h @@ -26,6 +26,19 @@ #define EPOLL_CTL_DEL 2 #define EPOLL_CTL_MOD 3
+/* Epoll event masks */ +#define EPOLLIN 0x00000001 +#define EPOLLPRI 0x00000002 +#define EPOLLOUT 0x00000004 +#define EPOLLERR 0x00000008 +#define EPOLLHUP 0x00000010 +#define EPOLLRDNORM 0x00000040 +#define EPOLLRDBAND 0x00000080 +#define EPOLLWRNORM 0x00000100 +#define EPOLLWRBAND 0x00000200 +#define EPOLLMSG 0x00000400 +#define EPOLLRDHUP 0x00002000 + /* Set exclusive wakeup mode for the target file descriptor */ #define EPOLLEXCLUSIVE (1 << 28)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Kodanev alexey.kodanev@oracle.com
[ Upstream commit dd5684ecae3bd8e44b644f50e2c12c7e57fdfef5 ]
ccid2_hc_tx_rto_expire() timer callback always restarts the timer again and can run indefinitely (unless it is stopped outside), and after commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time"), which moved ccid_hc_tx_delete() (also includes sk_stop_timer()) from dccp_destroy_sock() to sk_destruct(), this started to happen quite often. The timer prevents releasing the socket, as a result, sk_destruct() won't be called.
Found with LTP/dccp_ipsec tests running on the bonding device, which later couldn't be unloaded after the tests were completed:
unregister_netdevice: waiting for bond0 to become free. Usage count = 148
Fixes: 2a91aa396739 ("[DCCP] CCID2: Initial CCID2 (TCP-Like) implementation") Signed-off-by: Alexey Kodanev alexey.kodanev@oracle.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dccp/ccids/ccid2.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/dccp/ccids/ccid2.c +++ b/net/dccp/ccids/ccid2.c @@ -140,6 +140,9 @@ static void ccid2_hc_tx_rto_expire(unsig
ccid2_pr_debug("RTO_EXPIRE\n");
+ if (sk->sk_state == DCCP_CLOSED) + goto out; + /* back-off timer */ hc->tx_rto <<= 1; if (hc->tx_rto > DCCP_RTO_MAX)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Hutchings ben.hutchings@codethink.co.uk
[ Upstream commit e9191ffb65d8e159680ce0ad2224e1acbde6985c ]
Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl setting") removed the initialisation of ipv6_pinfo::autoflowlabel and added a second flag to indicate whether this field or the net namespace default should be used.
The getsockopt() handling for this case was not updated, so it currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is not explicitly enabled. Fix it to return the effective value, whether that has been set at the socket or net namespace level.
Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...") Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ipv6.h | 1 + net/ipv6/ip6_output.c | 2 +- net/ipv6/ipv6_sockglue.c | 2 +- 3 files changed, 3 insertions(+), 2 deletions(-)
--- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -290,6 +290,7 @@ int ipv6_flowlabel_opt_get(struct sock * int flags); int ip6_flowlabel_init(void); void ip6_flowlabel_cleanup(void); +bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np);
static inline void fl6_sock_release(struct ip6_flowlabel *fl) { --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -156,7 +156,7 @@ int ip6_output(struct net *net, struct s !(IP6CB(skb)->flags & IP6SKB_REROUTED)); }
-static bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np) +bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np) { if (!np->autoflowlabel_set) return ip6_default_np_autolabel(net); --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -1316,7 +1316,7 @@ static int do_ipv6_getsockopt(struct soc break;
case IPV6_AUTOFLOWLABEL: - val = np->autoflowlabel; + val = ip6_autoflowlabel(sock_net(sk), np); break;
default:
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mike Maloney maloney@google.com
[ Upstream commit 749439bfac6e1a2932c582e2699f91d329658196 ]
The logic in __ip6_append_data() assumes that the MTU is at least large enough for the headers. A device's MTU may be adjusted after being added while sendmsg() is processing data, resulting in __ip6_append_data() seeing any MTU. For an mtu smaller than the size of the fragmentation header, the math results in a negative 'maxfraglen', which causes problems when refragmenting any previous skb in the skb_write_queue, leaving it possibly malformed.
Instead sendmsg returns EINVAL when the mtu is calculated to be less than IPV6_MIN_MTU.
Found by syzkaller: kernel BUG at ./include/linux/skbuff.h:2064! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801d0b68580 task.stack: ffff8801ac6b8000 RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline] RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216 RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000 RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0 RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000 R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8 R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000 FS: 00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip6_finish_skb include/net/ipv6.h:911 [inline] udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093 udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 SYSC_sendto+0x352/0x5a0 net/socket.c:1750 SyS_sendto+0x40/0x50 net/socket.c:1718 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x4512e9 RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9 RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005 RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69 R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000 Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570 RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570
Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Mike Maloney maloney@google.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_output.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1260,14 +1260,16 @@ static int ip6_setup_cork(struct sock *s v6_cork->tclass = ipc6->tclass; if (rt->dst.flags & DST_XFRM_TUNNEL) mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ? - rt->dst.dev->mtu : dst_mtu(&rt->dst); + READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst); else mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ? - rt->dst.dev->mtu : dst_mtu(rt->dst.path); + READ_ONCE(rt->dst.dev->mtu) : dst_mtu(rt->dst.path); if (np->frag_size < mtu) { if (np->frag_size) mtu = np->frag_size; } + if (mtu < IPV6_MIN_MTU) + return -EINVAL; cork->base.fragsize = mtu; if (dst_allfrag(rt->dst.path)) cork->base.flags |= IPCORK_ALLFRAG;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 95ef498d977bf44ac094778fd448b98af158a3e6 ]
In my last patch, I missed fact that cork.base.dst was not initialized in ip6_make_skb() :
If ip6_setup_cork() returns an error, we might attempt a dst_release() on some random pointer.
Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_output.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1800,6 +1800,7 @@ struct sk_buff *ip6_make_skb(struct sock cork.base.flags = 0; cork.base.addr = 0; cork.base.opt = NULL; + cork.base.dst = NULL; v6_cork.opt = NULL; err = ip6_setup_cork(sk, &cork, &v6_cork, ipc6, rt, fl6); if (err) {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuiko Oshino yuiko.oshino@microchip.com
[ Upstream commit a5b1379afbfabf91e3a689e82ac619a7157336b3 ]
Fix initialize the uninitialized tx_qlen to an appropriate value when USB Full Speed is used.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Yuiko Oshino yuiko.oshino@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/lan78xx.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -2197,6 +2197,7 @@ static int lan78xx_reset(struct lan78xx_ buf = DEFAULT_BURST_CAP_SIZE / FS_USB_PKT_SIZE; dev->rx_urb_size = DEFAULT_BURST_CAP_SIZE; dev->rx_qlen = 4; + dev->tx_qlen = 4; }
ret = lan78xx_write_reg(dev, BURST_CAP, buf);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
[ Upstream commit ad23b750933ea7bf962678972a286c78a8fa36aa ]
Commit "net: igmp: Use correct source address on IGMPv3 reports" introduced a check to validate the source address of locally generated IGMPv3 packets. Instead of checking the local interface address directly, it uses inet_ifa_match(fl4->saddr, ifa), which checks if the address is on the local subnet (or equal to the point-to-point address if used).
This breaks for point-to-point interfaces, so check against ifa->ifa_local directly.
Cc: Kevin Cernekee cernekee@chromium.org Fixes: a46182b00290 ("net: igmp: Use correct source address on IGMPv3 reports") Reported-by: Sebastian Gottschall s.gottschall@dd-wrt.com Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/igmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -332,7 +332,7 @@ static __be32 igmpv3_get_srcaddr(struct return htonl(INADDR_ANY);
for_ifa(in_dev) { - if (inet_ifa_match(fl4->saddr, ifa)) + if (fl4->saddr == ifa->ifa_local) return fl4->saddr; } endfor_ifa(in_dev);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 7c68d1a6b4db9012790af7ac0f0fdc0d2083422a ]
Without proper validation of DODGY packets, we might very well feed qdisc_pkt_len_init() with invalid GSO packets.
tcp_hdrlen() might access out-of-bound data, so let's use skb_header_pointer() and proper checks.
Whole story is described in commit d0c081b49137 ("flow_dissector: properly cap thoff field")
We have the goal of validating DODGY packets earlier in the stack, so we might very well revert this fix in the future.
Signed-off-by: Eric Dumazet edumazet@google.com Cc: Willem de Bruijn willemb@google.com Cc: Jason Wang jasowang@redhat.com Reported-by: syzbot+9da69ebac7dddd804552@syzkaller.appspotmail.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/dev.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-)
--- a/net/core/dev.c +++ b/net/core/dev.c @@ -3083,10 +3083,21 @@ static void qdisc_pkt_len_init(struct sk hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
/* + transport layer */ - if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) - hdr_len += tcp_hdrlen(skb); - else - hdr_len += sizeof(struct udphdr); + if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) { + const struct tcphdr *th; + struct tcphdr _tcphdr; + + th = skb_header_pointer(skb, skb_transport_offset(skb), + sizeof(_tcphdr), &_tcphdr); + if (likely(th)) + hdr_len += __tcp_hdrlen(th); + } else { + struct udphdr _udphdr; + + if (skb_header_pointer(skb, skb_transport_offset(skb), + sizeof(_udphdr), &_udphdr)) + hdr_len += sizeof(struct udphdr); + }
if (shinfo->gso_type & SKB_GSO_DODGY) gso_segs = DIV_ROUND_UP(skb->len - hdr_len,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Streetman ddstreet@ieee.org
[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]
When a tcp socket is closed, if it detects that its net namespace is exiting, close immediately and do not wait for FIN sequence.
For normal sockets, a reference is taken to their net namespace, so it will never exit while the socket is open. However, kernel sockets do not take a reference to their net namespace, so it may begin exiting while the kernel socket is still open. In this case if the kernel socket is a tcp socket, it will stay open trying to complete its close sequence. The sock's dst(s) hold a reference to their interface, which are all transferred to the namespace's loopback interface when the real interfaces are taken down. When the namespace tries to take down its loopback interface, it hangs waiting for all references to the loopback interface to release, which results in messages like:
unregister_netdevice: waiting for lo to become free. Usage count = 1
These messages continue until the socket finally times out and closes. Since the net namespace cleanup holds the net_mutex while calling its registered pernet callbacks, any new net namespace initialization is blocked until the current net namespace finishes exiting.
After this change, the tcp socket notices the exiting net namespace, and closes immediately, releasing its dst(s) and their reference to the loopback interface, which lets the net namespace continue exiting.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811 Signed-off-by: Dan Streetman ddstreet@canonical.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/net_namespace.h | 10 ++++++++++ net/ipv4/tcp.c | 3 +++ net/ipv4/tcp_timer.c | 15 +++++++++++++++ 3 files changed, 28 insertions(+)
--- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -213,6 +213,11 @@ int net_eq(const struct net *net1, const return net1 == net2; }
+static inline int check_net(const struct net *net) +{ + return atomic_read(&net->count) != 0; +} + void net_drop_ns(void *);
#else @@ -236,6 +241,11 @@ int net_eq(const struct net *net1, const { return 1; } + +static inline int check_net(const struct net *net) +{ + return 1; +}
#define net_drop_ns NULL #endif --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2215,6 +2215,9 @@ adjudge_to_death: tcp_send_active_reset(sk, GFP_ATOMIC); __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY); + } else if (!check_net(sock_net(sk))) { + /* Not possible to send reset; just close */ + tcp_set_state(sk, TCP_CLOSE); } }
--- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -50,11 +50,19 @@ static void tcp_write_err(struct sock *s * to prevent DoS attacks. It is called when a retransmission timeout * or zero probe timeout occurs on orphaned socket. * + * Also close if our net namespace is exiting; in that case there is no + * hope of ever communicating again since all netns interfaces are already + * down (or about to be down), and we need to release our dst references, + * which have been moved to the netns loopback interface, so the namespace + * can finish exiting. This condition is only possible if we are a kernel + * socket, as those do not hold references to the namespace. + * * Criteria is still not confirmed experimentally and may change. * We kill the socket, if: * 1. If number of orphaned sockets exceeds an administratively configured * limit. * 2. If we have strong memory pressure. + * 3. If our net namespace is exiting. */ static int tcp_out_of_resources(struct sock *sk, bool do_reset) { @@ -83,6 +91,13 @@ static int tcp_out_of_resources(struct s __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY); return 1; } + + if (!check_net(sock_net(sk))) { + /* Not possible to send reset; just close */ + tcp_done(sk); + return 1; + } + return 0; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guillaume Nault g.nault@alphalink.fr
[ Upstream commit 02612bb05e51df8489db5e94d0cf8d1c81f87b0c ]
In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom was probably fine before the introduction of ->needed_headroom in commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom").
But now, virtual devices typically advertise the size of their overhead in dev->needed_headroom, so we must also take it into account in skb_reserve(). Allocation size of skb is also updated to take dev->needed_tailroom into account and replace the arbitrary 32 bytes with the real size of a PPPoE header.
This issue was discovered by syzbot, who connected a pppoe socket to a gre device which had dev->header_ops->create == ipgre_header and dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any headroom, and dev_hard_header() crashed when ipgre_header() tried to prepend its header to skb->data.
skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24 head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:104! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted 4.15.0-rc7-next-20180115+ #97 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100 RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282 RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000 RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0 R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180 FS: 00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: skb_under_panic net/core/skbuff.c:114 [inline] skb_push+0xce/0xf0 net/core/skbuff.c:1714 ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879 dev_hard_header include/linux/netdevice.h:2723 [inline] pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg+0xca/0x110 net/socket.c:640 sock_write_iter+0x31a/0x5d0 net/socket.c:909 call_write_iter include/linux/fs.h:1775 [inline] do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653 do_iter_write+0x154/0x540 fs/read_write.c:932 vfs_writev+0x18a/0x340 fs/read_write.c:977 do_writev+0xfc/0x2a0 fs/read_write.c:1012 SYSC_writev fs/read_write.c:1085 [inline] SyS_writev+0x27/0x30 fs/read_write.c:1082 entry_SYSCALL_64_fastpath+0x29/0xa0
Admittedly PPPoE shouldn't be allowed to run on non Ethernet-like interfaces, but reserving space for ->needed_headroom is a more fundamental issue that needs to be addressed first.
Same problem exists for __pppoe_xmit(), which also needs to take dev->needed_headroom into account in skb_cow_head().
Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom") Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault g.nault@alphalink.fr Reviewed-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ppp/pppoe.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
--- a/drivers/net/ppp/pppoe.c +++ b/drivers/net/ppp/pppoe.c @@ -842,6 +842,7 @@ static int pppoe_sendmsg(struct socket * struct pppoe_hdr *ph; struct net_device *dev; char *start; + int hlen;
lock_sock(sk); if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) { @@ -860,16 +861,16 @@ static int pppoe_sendmsg(struct socket * if (total_len > (dev->mtu + dev->hard_header_len)) goto end;
- - skb = sock_wmalloc(sk, total_len + dev->hard_header_len + 32, - 0, GFP_KERNEL); + hlen = LL_RESERVED_SPACE(dev); + skb = sock_wmalloc(sk, hlen + sizeof(*ph) + total_len + + dev->needed_tailroom, 0, GFP_KERNEL); if (!skb) { error = -ENOMEM; goto end; }
/* Reserve space for headers. */ - skb_reserve(skb, dev->hard_header_len); + skb_reserve(skb, hlen); skb_reset_network_header(skb);
skb->dev = dev; @@ -930,7 +931,7 @@ static int __pppoe_xmit(struct sock *sk, /* Copy the data if there is no space for the header or if it's * read-only. */ - if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len)) + if (skb_cow_head(skb, LL_RESERVED_SPACE(dev) + sizeof(*ph))) goto abort;
__skb_push(skb, sizeof(*ph));
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Francois Romieu romieu@fr.zoreil.com
[ Upstream commit a78e93661c5fd30b9e1dee464b2f62f966883ef7 ]
Hardware statistics retrieval hurts in tight invocation loops.
Avoid extraneous write and enforce strict ordering of writes targeted to the tally counters dump area address registers.
Signed-off-by: Francois Romieu romieu@fr.zoreil.com Tested-by: Oliver Freyermuth o.freyermuth@googlemail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/realtek/r8169.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
--- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -2222,19 +2222,14 @@ static bool rtl8169_do_counters(struct n void __iomem *ioaddr = tp->mmio_addr; dma_addr_t paddr = tp->counters_phys_addr; u32 cmd; - bool ret;
RTL_W32(CounterAddrHigh, (u64)paddr >> 32); + RTL_R32(CounterAddrHigh); cmd = (u64)paddr & DMA_BIT_MASK(32); RTL_W32(CounterAddrLow, cmd); RTL_W32(CounterAddrLow, cmd | counter_cmd);
- ret = rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000); - - RTL_W32(CounterAddrLow, 0); - RTL_W32(CounterAddrHigh, 0); - - return ret; + return rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000); }
static bool rtl8169_reset_counters(struct net_device *dev)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit c5006b8aa74599ce19104b31d322d2ea9ff887cc ]
The check in sctp_sockaddr_af is not robust enough to forbid binding a v4mapped v6 addr on a v4 socket.
The worse thing is that v4 socket's bind_verify would not convert this v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4 socket bound a v6 addr.
This patch is to fix it by doing the common sa.sa_family check first, then AF_INET check for v4mapped v6 addrs.
Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.") Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/socket.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
--- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -332,16 +332,14 @@ static struct sctp_af *sctp_sockaddr_af( if (len < sizeof (struct sockaddr)) return NULL;
+ if (!opt->pf->af_supported(addr->sa.sa_family, opt)) + return NULL; + /* V4 mapped address are really of AF_INET family */ if (addr->sa.sa_family == AF_INET6 && - ipv6_addr_v4mapped(&addr->v6.sin6_addr)) { - if (!opt->pf->af_supported(AF_INET, opt)) - return NULL; - } else { - /* Does this PF support this AF? */ - if (!opt->pf->af_supported(addr->sa.sa_family, opt)) - return NULL; - } + ipv6_addr_v4mapped(&addr->v6.sin6_addr) && + !opt->pf->af_supported(AF_INET, opt)) + return NULL;
/* If we get this far, af is valid. */ af = sctp_get_af_specific(addr->sa.sa_family);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit a0ff660058b88d12625a783ce9e5c1371c87951f ]
After commit cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep"), it may change to lock another sk if the asoc has been peeled off in sctp_wait_for_sndbuf.
However, the asoc's new sk could be already closed elsewhere, as it's in the sendmsg context of the old sk that can't avoid the new sk's closing. If the sk's last one refcnt is held by this asoc, later on after putting this asoc, the new sk will be freed, while under it's own lock.
This patch is to revert that commit, but fix the old issue by returning error under the old sk's lock.
Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep") Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/socket.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-)
--- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -83,7 +83,7 @@ static int sctp_writeable(struct sock *sk); static void sctp_wfree(struct sk_buff *skb); static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p, - size_t msg_len, struct sock **orig_sk); + size_t msg_len); static int sctp_wait_for_packet(struct sock *sk, int *err, long *timeo_p); static int sctp_wait_for_connect(struct sctp_association *, long *timeo_p); static int sctp_wait_for_accept(struct sock *sk, long timeo); @@ -1956,7 +1956,7 @@ static int sctp_sendmsg(struct sock *sk, timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); if (!sctp_wspace(asoc)) { /* sk can be changed by peel off when waiting for buf. */ - err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len, &sk); + err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len); if (err) { if (err == -ESRCH) { /* asoc is already dead. */ @@ -7439,12 +7439,12 @@ void sctp_sock_rfree(struct sk_buff *skb
/* Helper function to wait for space in the sndbuf. */ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p, - size_t msg_len, struct sock **orig_sk) + size_t msg_len) { struct sock *sk = asoc->base.sk; - int err = 0; long current_timeo = *timeo_p; DEFINE_WAIT(wait); + int err = 0;
pr_debug("%s: asoc:%p, timeo:%ld, msg_len:%zu\n", __func__, asoc, *timeo_p, msg_len); @@ -7473,17 +7473,13 @@ static int sctp_wait_for_sndbuf(struct s release_sock(sk); current_timeo = schedule_timeout(current_timeo); lock_sock(sk); - if (sk != asoc->base.sk) { - release_sock(sk); - sk = asoc->base.sk; - lock_sock(sk); - } + if (sk != asoc->base.sk) + goto do_error;
*timeo_p = current_timeo; }
out: - *orig_sk = sk; finish_wait(&asoc->wait, &wait);
/* Release the association's refcnt. */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 59b36613e85fb16ebf9feaf914570879cd5c2a21 ]
When tipc_node_find_by_name() fails, the nlmsg is not freed.
While on it, switch to a goto label to properly free it.
Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node") Reported-by: Dmitry Vyukov dvyukov@google.com Cc: Jon Maloy jon.maloy@ericsson.com Cc: Ying Xue ying.xue@windriver.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Acked-by: Ying Xue ying.xue@windriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tipc/node.c | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-)
--- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -1848,36 +1848,38 @@ int tipc_nl_node_get_link(struct sk_buff
if (strcmp(name, tipc_bclink_name) == 0) { err = tipc_nl_add_bc_link(net, &msg); - if (err) { - nlmsg_free(msg.skb); - return err; - } + if (err) + goto err_free; } else { int bearer_id; struct tipc_node *node; struct tipc_link *link;
node = tipc_node_find_by_name(net, name, &bearer_id); - if (!node) - return -EINVAL; + if (!node) { + err = -EINVAL; + goto err_free; + }
tipc_node_read_lock(node); link = node->links[bearer_id].link; if (!link) { tipc_node_read_unlock(node); - nlmsg_free(msg.skb); - return -EINVAL; + err = -EINVAL; + goto err_free; }
err = __tipc_nl_add_link(net, &msg, link, 0); tipc_node_read_unlock(node); - if (err) { - nlmsg_free(msg.skb); - return err; - } + if (err) + goto err_free; }
return genlmsg_reply(msg.skb, info); + +err_free: + nlmsg_free(msg.skb); + return err; }
int tipc_nl_node_reset_link_stats(struct sk_buff *skb, struct genl_info *info)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Neil Horman nhorman@tuxdriver.com
[ Upstream commit 848b159835ddef99cc4193083f7e786c3992f580 ]
with the introduction of commit b0eb57cb97e7837ebb746404c2c58c6f536f23fa, it appears that rq->buf_info is improperly handled. While it is heap allocated when an rx queue is setup, and freed when torn down, an old line of code in vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0] being set to NULL prior to its being freed, causing a memory leak, which eventually exhausts the system on repeated create/destroy operations (for example, when the mtu of a vmxnet3 interface is changed frequently.
Fix is pretty straight forward, just move the NULL set to after the free.
Tested by myself with successful results
Applies to net, and should likely be queued for stable, please
Signed-off-by: Neil Horman nhorman@tuxdriver.com Reported-By: boyang@redhat.com CC: boyang@redhat.com CC: Shrikrishna Khare skhare@vmware.com CC: "VMware, Inc." pv-drivers@vmware.com CC: David S. Miller davem@davemloft.net Acked-by: Shrikrishna Khare skhare@vmware.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/vmxnet3/vmxnet3_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/vmxnet3/vmxnet3_drv.c +++ b/drivers/net/vmxnet3/vmxnet3_drv.c @@ -1616,7 +1616,6 @@ static void vmxnet3_rq_destroy(struct vm rq->rx_ring[i].basePA); rq->rx_ring[i].base = NULL; } - rq->buf_info[i] = NULL; }
if (rq->data_ring.base) { @@ -1638,6 +1637,7 @@ static void vmxnet3_rq_destroy(struct vm (rq->rx_ring[0].size + rq->rx_ring[1].size); dma_free_coherent(&adapter->pdev->dev, sz, rq->buf_info[0], rq->buf_info_pa); + rq->buf_info[0] = rq->buf_info[1] = NULL; } }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jim Westfall jwestfall@surrealistic.net
[ Upstream commit 096b9854c04df86f03b38a97d40b6506e5730919 ]
Use n->primary_key instead of pkey to account for the possibility that a neigh constructor function may have modified the primary_key value.
Signed-off-by: Jim Westfall jwestfall@surrealistic.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/neighbour.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -496,7 +496,7 @@ struct neighbour *__neigh_create(struct if (atomic_read(&tbl->entries) > (1 << nht->hash_shift)) nht = neigh_hash_grow(tbl, nht->hash_shift + 1);
- hash_val = tbl->hash(pkey, dev, nht->hash_rnd) >> (32 - nht->hash_shift); + hash_val = tbl->hash(n->primary_key, dev, nht->hash_rnd) >> (32 - nht->hash_shift);
if (n->parms->dead) { rc = ERR_PTR(-EINVAL); @@ -508,7 +508,7 @@ struct neighbour *__neigh_create(struct n1 != NULL; n1 = rcu_dereference_protected(n1->next, lockdep_is_held(&tbl->lock))) { - if (dev == n1->dev && !memcmp(n1->primary_key, pkey, key_len)) { + if (dev == n1->dev && !memcmp(n1->primary_key, n->primary_key, key_len)) { if (want_ref) neigh_hold(n1); rc = n1;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jim Westfall jwestfall@surrealistic.net
[ Upstream commit cd9ff4de0107c65d69d02253bb25d6db93c3dbc1 ]
Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices to avoid making an entry for every remote ip the device needs to talk to.
This used the be the old behavior but became broken in a263b3093641f (ipv4: Make neigh lookups directly in output packet path) and later removed in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point devices) because it was broken.
Signed-off-by: Jim Westfall jwestfall@surrealistic.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/arp.h | 3 +++ net/ipv4/arp.c | 7 ++++++- 2 files changed, 9 insertions(+), 1 deletion(-)
--- a/include/net/arp.h +++ b/include/net/arp.h @@ -19,6 +19,9 @@ static inline u32 arp_hashfn(const void
static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev, u32 key) { + if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT)) + key = INADDR_ANY; + return ___neigh_lookup_noref(&arp_tbl, neigh_key_eq32, arp_hashfn, &key, dev); }
--- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -223,11 +223,16 @@ static bool arp_key_eq(const struct neig
static int arp_constructor(struct neighbour *neigh) { - __be32 addr = *(__be32 *)neigh->primary_key; + __be32 addr; struct net_device *dev = neigh->dev; struct in_device *in_dev; struct neigh_parms *parms; + u32 inaddr_any = INADDR_ANY;
+ if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT)) + memcpy(neigh->primary_key, &inaddr_any, arp_tbl.key_len); + + addr = *(__be32 *)neigh->primary_key; rcu_read_lock(); in_dev = __in_dev_get_rcu(dev); if (!in_dev) {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guillaume Nault g.nault@alphalink.fr
[ Upstream commit 0171c41835591e9aa2e384b703ef9a6ae367c610 ]
ppp_dev_uninit(), which is the .ndo_uninit() handler of PPP devices, needs to lock pn->all_ppp_mutex. Therefore we mustn't call register_netdevice() with pn->all_ppp_mutex already locked, or we'd deadlock in case register_netdevice() fails and calls .ndo_uninit().
Fortunately, we can unlock pn->all_ppp_mutex before calling register_netdevice(). This lock protects pn->units_idr, which isn't used in the device registration process.
However, keeping pn->all_ppp_mutex locked during device registration did ensure that no device in transient state would be published in pn->units_idr. In practice, unlocking it before calling register_netdevice() doesn't change this property: ppp_unit_register() is called with 'ppp_mutex' locked and all searches done in pn->units_idr hold this lock too.
Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion") Reported-and-tested-by: syzbot+367889b9c9e279219175@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault g.nault@alphalink.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ppp/ppp_generic.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -1002,17 +1002,18 @@ static int ppp_unit_register(struct ppp if (!ifname_is_set) snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
+ mutex_unlock(&pn->all_ppp_mutex); + ret = register_netdevice(ppp->dev); if (ret < 0) goto err_unit;
atomic_inc(&ppp_unit_count);
- mutex_unlock(&pn->all_ppp_mutex); - return 0;
err_unit: + mutex_lock(&pn->all_ppp_mutex); unit_put(&pn->units_idr, ppp->file.index); err: mutex_unlock(&pn->all_ppp_mutex);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera cera@cera.cz
[ Upstream commit 52acf06451930eb4cefabd5ecea56e2d46c32f76 ]
The commit 622190669403 ("be2net: Request RSS capability of Rx interface depending on number of Rx rings") modified be_update_queues() so the IFACE (HW representation of the netdevice) is destroyed and then re-created. This causes a regression because potential promiscuous mode is not restored properly during be_open() because the driver thinks that the HW has promiscuous mode already enabled.
Note that Lancer is not affected by this bug because RX-filter flags are disabled during be_close() for this chipset.
Cc: Sathya Perla sathya.perla@broadcom.com Cc: Ajit Khaparde ajit.khaparde@broadcom.com Cc: Sriharsha Basavapatna sriharsha.basavapatna@broadcom.com Cc: Somnath Kotur somnath.kotur@broadcom.com
Fixes: 622190669403 ("be2net: Request RSS capability of Rx interface depending on number of Rx rings") Signed-off-by: Ivan Vecera ivecera@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/emulex/benet/be_main.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -4733,6 +4733,15 @@ int be_update_queues(struct be_adapter *
be_schedule_worker(adapter);
+ /* + * The IF was destroyed and re-created. We need to clear + * all promiscuous flags valid for the destroyed IF. + * Without this promisc mode is not restored during + * be_open() because the driver thinks that it is + * already enabled in HW. + */ + adapter->if_flags &= ~BE_IF_FLAGS_ALL_PROMISCUOUS; + if (netif_running(netdev)) status = be_open(netdev);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Kodanev alexey.kodanev@oracle.com
[ Upstream commit 128bb975dc3c25d00de04e503e2fe0a780d04459 ]
Commit b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions") moved dev->mtu initialization from ip6gre_tunnel_setup() to ip6gre_tunnel_init(), as a result, the previously set values, before ndo_init(), are reset in the following cases:
* rtnl_create_link() can update dev->mtu from IFLA_MTU parameter.
* ip6gre_tnl_link_config() is invoked before ndo_init() in netlink and ioctl setup, so ndo_init() can reset MTU adjustments with the lower device MTU as well, dev->mtu and dev->hard_header_len.
Not applicable for ip6gretap because it has one more call to ip6gre_tnl_link_config(tunnel, 1) in ip6gre_tap_init().
Fix the first case by updating dev->mtu with 'tb[IFLA_MTU]' parameter if a user sets it manually on a device creation, and fix the second one by moving ip6gre_tnl_link_config() call after register_netdevice().
Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions") Fixes: db2ec95d1ba4 ("ip6_gre: Fix MTU setting") Signed-off-by: Alexey Kodanev alexey.kodanev@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_gre.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
--- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -337,11 +337,12 @@ static struct ip6_tnl *ip6gre_tunnel_loc
nt->dev = dev; nt->net = dev_net(dev); - ip6gre_tnl_link_config(nt, 1);
if (register_netdevice(dev) < 0) goto failed_free;
+ ip6gre_tnl_link_config(nt, 1); + /* Can use a lockless transmit, unless we generate output sequences */ if (!(nt->parms.o_flags & TUNNEL_SEQ)) dev->features |= NETIF_F_LLTX; @@ -1263,7 +1264,6 @@ static void ip6gre_netlink_parms(struct
static int ip6gre_tap_init(struct net_device *dev) { - struct ip6_tnl *tunnel; int ret;
ret = ip6gre_tunnel_init_common(dev); @@ -1272,10 +1272,6 @@ static int ip6gre_tap_init(struct net_de
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
- tunnel = netdev_priv(dev); - - ip6gre_tnl_link_config(tunnel, 1); - return 0; }
@@ -1370,7 +1366,6 @@ static int ip6gre_newlink(struct net *sr
nt->dev = dev; nt->net = dev_net(dev); - ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
dev->features |= GRE6_FEATURES; dev->hw_features |= GRE6_FEATURES; @@ -1396,6 +1391,11 @@ static int ip6gre_newlink(struct net *sr if (err) goto out;
+ ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]); + + if (tb[IFLA_MTU]) + ip6_tnl_change_mtu(dev, nla_get_u32(tb[IFLA_MTU])); + dev_hold(dev); ip6gre_tunnel_link(ign, nt);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
[ Upstream commit 121d57af308d0cf943f08f4738d24d3966c38cd9 ]
Validate gso_type during segmentation as SKB_GSO_DODGY sources may pass packets where the gso_type does not match the contents.
Syzkaller was able to enter the SCTP gso handler with a packet of gso_type SKB_GSO_TCPV4.
On entry of transport layer gso handlers, verify that the gso_type matches the transport protocol.
Fixes: 90017accff61 ("sctp: Add GSO support") Link: http://lkml.kernel.org/r/001a1137452496ffc305617e5fe0@google.com Reported-by: syzbot+fee64147a25aecd48055@syzkaller.appspotmail.com Signed-off-by: Willem de Bruijn willemb@google.com Acked-by: Jason Wang jasowang@redhat.com Reviewed-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_offload.c | 3 +++ net/ipv4/udp_offload.c | 3 +++ net/ipv6/tcpv6_offload.c | 3 +++ net/ipv6/udp_offload.c | 3 +++ net/sctp/offload.c | 3 +++ 5 files changed, 15 insertions(+)
--- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -32,6 +32,9 @@ static void tcp_gso_tstamp(struct sk_buf static struct sk_buff *tcp4_gso_segment(struct sk_buff *skb, netdev_features_t features) { + if (!(skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)) + return ERR_PTR(-EINVAL); + if (!pskb_may_pull(skb, sizeof(struct tcphdr))) return ERR_PTR(-EINVAL);
--- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -205,6 +205,9 @@ static struct sk_buff *udp4_ufo_fragment goto out; }
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP)) + goto out; + if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto out;
--- a/net/ipv6/tcpv6_offload.c +++ b/net/ipv6/tcpv6_offload.c @@ -46,6 +46,9 @@ static struct sk_buff *tcp6_gso_segment( { struct tcphdr *th;
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)) + return ERR_PTR(-EINVAL); + if (!pskb_may_pull(skb, sizeof(*th))) return ERR_PTR(-EINVAL);
--- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -55,6 +55,9 @@ static struct sk_buff *udp6_ufo_fragment const struct ipv6hdr *ipv6h; struct udphdr *uh;
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP)) + goto out; + if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto out;
--- a/net/sctp/offload.c +++ b/net/sctp/offload.c @@ -44,6 +44,9 @@ static struct sk_buff *sctp_gso_segment( struct sk_buff *segs = ERR_PTR(-EINVAL); struct sctphdr *sh;
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_SCTP)) + goto out; + sh = sctp_hdr(skb); if (!pskb_may_pull(skb, sizeof(*sh))) goto out;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuval Mintz yuvalm@mellanox.com
[ Upstream commit 1ecdaea02ca6bfacf2ecda500dc1af51e9780c42 ]
Driver periodically samples all neighbors configured in device in order to update the kernel regarding their state. When finding an entry configured in HW that doesn't show in neigh_lookup() driver logs an error message. This introduces a race when removing multiple neighbors - it's possible that a given entry would still be configured in HW as its removal is still being processed but is already removed from the kernel's neighbor tables.
Simply remove the error message and gracefully accept such events.
Fixes: c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table") Fixes: 60f040ca11b9 ("mlxsw: spectrum_router: Periodically dump active IPv6 neighbours") Signed-off-by: Yuval Mintz yuvalm@mellanox.com Reviewed-by: Ido Schimmel idosch@mellanox.com Signed-off-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -765,11 +765,8 @@ static void mlxsw_sp_router_neigh_ent_ip dipn = htonl(dip); dev = mlxsw_sp->rifs[rif]->dev; n = neigh_lookup(&arp_tbl, &dipn, dev); - if (!n) { - netdev_err(dev, "Failed to find matching neighbour for IP=%pI4h\n", - &dip); + if (!n) return; - }
netdev_dbg(dev, "Updating neighbour with IP=%pI4h\n", &dip); neigh_event_send(n, NULL);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 4df0bfc79904b7169dc77dcce44598b1545721f9 ]
tfile->tun could be detached before we close the tun fd, via tun_detach_all(), so it should not be used to check for tfile->tx_array.
As Jason suggested, we probably have to clean it up unconditionally both in __tun_deatch() and tun_detach_all(), but this requires to check if it is initialized or not. Currently skb_array_cleanup() doesn't have such a check, so I check it in the caller and introduce a helper function, it is a bit ugly but we can always improve it in net-next.
Reported-by: Dmitry Vyukov dvyukov@google.com Fixes: 1576d9860599 ("tun: switch to use skb array for tx") Cc: Jason Wang jasowang@redhat.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/tun.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
--- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -525,6 +525,14 @@ static void tun_queue_purge(struct tun_f skb_queue_purge(&tfile->sk.sk_error_queue); }
+static void tun_cleanup_tx_array(struct tun_file *tfile) +{ + if (tfile->tx_array.ring.queue) { + skb_array_cleanup(&tfile->tx_array); + memset(&tfile->tx_array, 0, sizeof(tfile->tx_array)); + } +} + static void __tun_detach(struct tun_file *tfile, bool clean) { struct tun_file *ntfile; @@ -566,8 +574,7 @@ static void __tun_detach(struct tun_file tun->dev->reg_state == NETREG_REGISTERED) unregister_netdevice(tun->dev); } - if (tun) - skb_array_cleanup(&tfile->tx_array); + tun_cleanup_tx_array(tfile); sock_put(&tfile->sk); } } @@ -606,11 +613,13 @@ static void tun_detach_all(struct net_de /* Drop read queue */ tun_queue_purge(tfile); sock_put(&tfile->sk); + tun_cleanup_tx_array(tfile); } list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) { tun_enable_queue(tfile); tun_queue_purge(tfile); sock_put(&tfile->sk); + tun_cleanup_tx_array(tfile); } BUG_ON(tun->numdisabled != 0);
@@ -2363,6 +2372,8 @@ static int tun_chr_open(struct inode *in
sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
+ memset(&tfile->tx_array, 0, sizeof(tfile->tx_array)); + return 0; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit d0c081b49137cd3200f2023c0875723be66e7ce5 ]
syzbot reported yet another crash [1] that is caused by insufficient validation of DODGY packets.
Two bugs are happening here to trigger the crash.
1) Flow dissection leaves with incorrect thoff field.
2) skb_probe_transport_header() sets transport header to this invalid thoff, even if pointing after skb valid data.
3) qdisc_pkt_len_init() reads out-of-bound data because it trusts tcp_hdrlen(skb)
Possible fixes :
- Full flow dissector validation before injecting bad DODGY packets in the stack. This approach was attempted here : https://patchwork.ozlabs.org/patch/ 861874/
- Have more robust functions in the core. This might be needed anyway for stable versions.
This patch fixes the flow dissection issue.
[1] CPU: 1 PID: 3144 Comm: syzkaller271204 Not tainted 4.15.0-rc4-mm1+ #49 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:355 [inline] kasan_report+0x23b/0x360 mm/kasan/report.c:413 __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:432 __tcp_hdrlen include/linux/tcp.h:35 [inline] tcp_hdrlen include/linux/tcp.h:40 [inline] qdisc_pkt_len_init net/core/dev.c:3160 [inline] __dev_queue_xmit+0x20d3/0x2200 net/core/dev.c:3465 dev_queue_xmit+0x17/0x20 net/core/dev.c:3554 packet_snd net/packet/af_packet.c:2943 [inline] packet_sendmsg+0x3ad5/0x60a0 net/packet/af_packet.c:2968 sock_sendmsg_nosec net/socket.c:628 [inline] sock_sendmsg+0xca/0x110 net/socket.c:638 sock_write_iter+0x31a/0x5d0 net/socket.c:907 call_write_iter include/linux/fs.h:1776 [inline] new_sync_write fs/read_write.c:469 [inline] __vfs_write+0x684/0x970 fs/read_write.c:482 vfs_write+0x189/0x510 fs/read_write.c:544 SYSC_write fs/read_write.c:589 [inline] SyS_write+0xef/0x220 fs/read_write.c:581 entry_SYSCALL_64_fastpath+0x1f/0x96
Fixes: 34fad54c2537 ("net: __skb_flow_dissect() must cap its return value") Fixes: a6e544b0a88b ("flow_dissector: Jump to exit code in __skb_flow_dissect") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Willem de Bruijn willemb@google.com Reported-by: syzbot syzkaller@googlegroups.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/flow_dissector.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -550,8 +550,8 @@ ip_proto_again: out_good: ret = true;
- key_control->thoff = (u16)nhoff; out: + key_control->thoff = min_t(u16, nhoff, skb ? skb->len : hlen); key_basic->n_proto = proto; key_basic->ip_proto = ip_proto;
@@ -559,7 +559,6 @@ out:
out_bad: ret = false; - key_control->thoff = min_t(u16, nhoff, skb ? skb->len : hlen); goto out; } EXPORT_SYMBOL(__skb_flow_dissect);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiao Liang xiliang@redhat.com
commit 40d4071ce2d20840d224b4a77b5dc6f752c9ab15 upstream.
The AMD power module can be loaded on non AMD platforms, but unload fails with the following Oops:
BUG: unable to handle kernel NULL pointer dereference at (null) IP: __list_del_entry_valid+0x29/0x90 Call Trace: perf_pmu_unregister+0x25/0xf0 amd_power_pmu_exit+0x1c/0xd23 [power] SyS_delete_module+0x1a8/0x2b0 ? exit_to_usermode_loop+0x8f/0xb0 entry_SYSCALL_64_fastpath+0x20/0x83
Return -ENODEV instead of 0 from the module init function if the CPU does not match.
Fixes: c7ab62bfbe0e ("perf/x86/amd/power: Add AMD accumulated power reporting mechanism") Signed-off-by: Xiao Liang xiliang@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lkml.kernel.org/r/20180122061252.6394-1-xiliang@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/amd/power.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/events/amd/power.c +++ b/arch/x86/events/amd/power.c @@ -277,7 +277,7 @@ static int __init amd_power_pmu_init(voi int ret;
if (!x86_match_cpu(cpu_match)) - return 0; + return -ENODEV;
if (!boot_cpu_has(X86_FEATURE_ACC_POWER)) return -ENODEV;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jia Zhang zhang.jia@linux.alibaba.com
commit 7e702d17ed138cf4ae7c00e8c00681ed464587c7 upstream.
Commit b94b73733171 ("x86/microcode/intel: Extend BDW late-loading with a revision check") reduced the impact of erratum BDF90 for Broadwell model 79.
The impact can be reduced further by checking the size of the last level cache portion per core.
Tony: "The erratum says the problem only occurs on the large-cache SKUs. So we only need to avoid the update if we are on a big cache SKU that is also running old microcode."
For more details, see erratum BDF90 in document #334165 (Intel Xeon Processor E7-8800/4800 v4 Product Family Specification Update) from September 2017.
Fixes: b94b73733171 ("x86/microcode/intel: Extend BDW late-loading with a revision check") Signed-off-by: Jia Zhang zhang.jia@linux.alibaba.com Signed-off-by: Borislav Petkov bp@suse.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Tony Luck tony.luck@intel.com Link: https://lkml.kernel.org/r/1516321542-31161-1-git-send-email-zhang.jia@linux.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/microcode/intel.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/microcode/intel.c +++ b/arch/x86/kernel/cpu/microcode/intel.c @@ -40,6 +40,9 @@ #include <asm/setup.h> #include <asm/msr.h>
+/* last level cache size per core */ +static int llc_size_per_core; + /* * Temporary microcode blobs pointers storage. We note here during early load * the pointers to microcode blobs we've got from whatever storage (detached @@ -1053,12 +1056,14 @@ static bool is_blacklisted(unsigned int
/* * Late loading on model 79 with microcode revision less than 0x0b000021 - * may result in a system hang. This behavior is documented in item - * BDF90, #334165 (Intel Xeon Processor E7-8800/4800 v4 Product Family). + * and LLC size per core bigger than 2.5MB may result in a system hang. + * This behavior is documented in item BDF90, #334165 (Intel Xeon + * Processor E7-8800/4800 v4 Product Family). */ if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X && c->x86_mask == 0x01 && + llc_size_per_core > 2621440 && c->microcode < 0x0b000021) { pr_err_once("Erratum BDF90: late loading with revision < 0x0b000021 (0x%x) disabled.\n", c->microcode); pr_err_once("Please consider either early loading through initrd/built-in or a potential BIOS update.\n"); @@ -1125,6 +1130,15 @@ static struct microcode_ops microcode_in .microcode_fini_cpu = microcode_fini_cpu, };
+static int __init calc_llc_size_per_core(struct cpuinfo_x86 *c) +{ + u64 llc_size = c->x86_cache_size * 1024; + + do_div(llc_size, c->x86_max_cores); + + return (int)llc_size; +} + struct microcode_ops * __init init_intel_microcode(void) { struct cpuinfo_x86 *c = &boot_cpu_data; @@ -1135,6 +1149,8 @@ struct microcode_ops * __init init_intel return NULL; }
+ llc_size_per_core = calc_llc_size_per_core(c); + return µcode_intel_ops; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit d5421ea43d30701e03cadc56a38854c36a8b4433 upstream.
The hrtimer interrupt code contains a hang detection and mitigation mechanism, which prevents that a long delayed hrtimer interrupt causes a continous retriggering of interrupts which prevent the system from making progress. If a hang is detected then the timer hardware is programmed with a certain delay into the future and a flag is set in the hrtimer cpu base which prevents newly enqueued timers from reprogramming the timer hardware prior to the chosen delay. The subsequent hrtimer interrupt after the delay clears the flag and resumes normal operation.
If such a hang happens in the last hrtimer interrupt before a CPU is unplugged then the hang_detected flag is set and stays that way when the CPU is plugged in again. At that point the timer hardware is not armed and it cannot be armed because the hang_detected flag is still active, so nothing clears that flag. As a consequence the CPU does not receive hrtimer interrupts and no timers expire on that CPU which results in RCU stalls and other malfunctions.
Clear the flag along with some other less critical members of the hrtimer cpu base to ensure starting from a clean state when a CPU is plugged in.
Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the root cause of that hard to reproduce heisenbug. Once understood it's trivial and certainly justifies a brown paperbag.
Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic") Reported-by: Paul E. McKenney paulmck@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra peterz@infradead.org Cc: Sebastian Sewior bigeasy@linutronix.de Cc: Anna-Maria Gleixner anna-maria@linutronix.de Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/time/hrtimer.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -652,7 +652,9 @@ static void hrtimer_reprogram(struct hrt static inline void hrtimer_init_hres(struct hrtimer_cpu_base *base) { base->expires_next.tv64 = KTIME_MAX; + base->hang_detected = 0; base->hres_active = 0; + base->next_timer = NULL; }
/* @@ -1610,6 +1612,7 @@ int hrtimers_prepare_cpu(unsigned int cp timerqueue_init_head(&cpu_base->clock_base[i].active); }
+ cpu_base->active_bases = 0; cpu_base->cpu = cpu; hrtimer_init_hres(cpu_base); return 0;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ upstream commit 84ccac6e7854ebbfb56d2fc6d5bef9be49bb304c ]
Saves 4 bytes replacing following instructions :
lea rax, [rsi + rdx * 8 + offsetof(...)] mov rax, qword ptr [rax] cmp rax, 0
by :
mov rax, [rsi + rdx * 8 + offsetof(...)] test rax, rax
Signed-off-by: Eric Dumazet edumazet@google.com Cc: Alexei Starovoitov ast@kernel.org Cc: Daniel Borkmann daniel@iogearbox.net Acked-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Alexei Starovoitov ast@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/net/bpf_jit_comp.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
--- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -281,7 +281,7 @@ static void emit_bpf_tail_call(u8 **ppro EMIT4(0x48, 0x8B, 0x46, /* mov rax, qword ptr [rsi + 16] */ offsetof(struct bpf_array, map.max_entries)); EMIT3(0x48, 0x39, 0xD0); /* cmp rax, rdx */ -#define OFFSET1 47 /* number of bytes to jump */ +#define OFFSET1 43 /* number of bytes to jump */ EMIT2(X86_JBE, OFFSET1); /* jbe out */ label1 = cnt;
@@ -290,21 +290,20 @@ static void emit_bpf_tail_call(u8 **ppro */ EMIT2_off32(0x8B, 0x85, -STACKSIZE + 36); /* mov eax, dword ptr [rbp - 516] */ EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT); /* cmp eax, MAX_TAIL_CALL_CNT */ -#define OFFSET2 36 +#define OFFSET2 32 EMIT2(X86_JA, OFFSET2); /* ja out */ label2 = cnt; EMIT3(0x83, 0xC0, 0x01); /* add eax, 1 */ EMIT2_off32(0x89, 0x85, -STACKSIZE + 36); /* mov dword ptr [rbp - 516], eax */
/* prog = array->ptrs[index]; */ - EMIT4_off32(0x48, 0x8D, 0x84, 0xD6, /* lea rax, [rsi + rdx * 8 + offsetof(...)] */ + EMIT4_off32(0x48, 0x8B, 0x84, 0xD6, /* mov rax, [rsi + rdx * 8 + offsetof(...)] */ offsetof(struct bpf_array, ptrs)); - EMIT3(0x48, 0x8B, 0x00); /* mov rax, qword ptr [rax] */
/* if (prog == NULL) * goto out; */ - EMIT4(0x48, 0x83, 0xF8, 0x00); /* cmp rax, 0 */ + EMIT3(0x48, 0x85, 0xC0); /* test rax,rax */ #define OFFSET3 10 EMIT2(X86_JE, OFFSET3); /* je out */ label3 = cnt;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexei Starovoitov ast@fb.com
[ upstream commit 90caccdd8cc0215705f18b92771b449b01e2474a ]
- bpf prog_array just like all other types of bpf array accepts 32-bit index. Clarify that in the comment. - fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes - tighten corresponding check in the interpreter to stay consistent
The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag in commit 96eabe7a40aa in 4.14. Before that the map_flags would stay zero and though JIT code is wrong it will check bounds correctly. Hence two fixes tags. All other JITs don't have this problem.
Signed-off-by: Alexei Starovoitov ast@kernel.org Fixes: 96eabe7a40aa ("bpf: Allow selecting numa node during map creation") Fixes: b52f00e6a715 ("x86: bpf_jit: implement bpf_tail_call() helper") Acked-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Martin KaFai Lau kafai@fb.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/net/bpf_jit_comp.c | 4 ++-- kernel/bpf/core.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
--- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -278,9 +278,9 @@ static void emit_bpf_tail_call(u8 **ppro /* if (index >= array->map.max_entries) * goto out; */ - EMIT4(0x48, 0x8B, 0x46, /* mov rax, qword ptr [rsi + 16] */ + EMIT2(0x89, 0xD2); /* mov edx, edx */ + EMIT3(0x39, 0x56, /* cmp dword ptr [rsi + 16], edx */ offsetof(struct bpf_array, map.max_entries)); - EMIT3(0x48, 0x39, 0xD0); /* cmp rax, rdx */ #define OFFSET1 43 /* number of bytes to jump */ EMIT2(X86_JBE, OFFSET1); /* jbe out */ label1 = cnt; --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -715,7 +715,7 @@ select_insn: struct bpf_map *map = (struct bpf_map *) (unsigned long) BPF_R2; struct bpf_array *array = container_of(map, struct bpf_array, map); struct bpf_prog *prog; - u64 index = BPF_R3; + u32 index = BPF_R3;
if (unlikely(index >= array->map.max_entries)) goto out;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexei Starovoitov ast@kernel.org
[ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]
The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
A quote from goolge project zero blog: "At this point, it would normally be necessary to locate gadgets in the host kernel code that can be used to actually leak data by reading from an attacker-controlled location, shifting and masking the result appropriately and then using the result of that as offset to an attacker-controlled address for a load. But piecing gadgets together and figuring out which ones work in a speculation context seems annoying. So instead, we decided to use the eBPF interpreter, which is built into the host kernel - while there is no legitimate way to invoke it from inside a VM, the presence of the code in the host kernel's text section is sufficient to make it usable for the attack, just like with ordinary ROP gadgets."
To make attacker job harder introduce BPF_JIT_ALWAYS_ON config option that removes interpreter from the kernel in favor of JIT-only mode. So far eBPF JIT is supported by: x64, arm64, arm32, sparc64, s390, powerpc64, mips64
The start of JITed program is randomized and code page is marked as read-only. In addition "constant blinding" can be turned on with net.core.bpf_jit_harden
v2->v3: - move __bpf_prog_ret0 under ifdef (Daniel)
v1->v2: - fix init order, test_bpf and cBPF (Daniel's feedback) - fix offloaded bpf (Jakub's feedback) - add 'return 0' dummy in case something can invoke prog->bpf_func - retarget bpf tree. For bpf-next the patch would need one extra hunk. It will be sent when the trees are merged back to net-next
Considered doing: int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT; but it seems better to land the patch as-is and in bpf-next remove bpf_jit_enable global variable from all JITs, consolidate in one place and remove this jit_init() function.
Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- init/Kconfig | 7 +++++++ kernel/bpf/core.c | 18 ++++++++++++++++++ lib/test_bpf.c | 11 +++++++---- net/core/filter.c | 6 ++---- net/core/sysctl_net_core.c | 6 ++++++ net/socket.c | 9 +++++++++ 6 files changed, 49 insertions(+), 8 deletions(-)
--- a/init/Kconfig +++ b/init/Kconfig @@ -1609,6 +1609,13 @@ config BPF_SYSCALL Enable the bpf() system call that allows to manipulate eBPF programs and maps via file descriptors.
+config BPF_JIT_ALWAYS_ON + bool "Permanently enable BPF JIT and remove BPF interpreter" + depends on BPF_SYSCALL && HAVE_EBPF_JIT && BPF_JIT + help + Enables BPF JIT and removes BPF interpreter to avoid + speculative execution of BPF instructions by the interpreter + config SHMEM bool "Use full shmem filesystem" if EXPERT default y --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -458,6 +458,7 @@ noinline u64 __bpf_call_base(u64 r1, u64 } EXPORT_SYMBOL_GPL(__bpf_call_base);
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON /** * __bpf_prog_run - run eBPF program on a given context * @ctx: is the data we are operating on @@ -923,6 +924,13 @@ load_byte: } STACK_FRAME_NON_STANDARD(__bpf_prog_run); /* jump table */
+#else +static unsigned int __bpf_prog_ret0(void *ctx, const struct bpf_insn *insn) +{ + return 0; +} +#endif + bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp) { @@ -970,7 +978,11 @@ static int bpf_check_tail_call(const str */ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err) { +#ifndef CONFIG_BPF_JIT_ALWAYS_ON fp->bpf_func = (void *) __bpf_prog_run; +#else + fp->bpf_func = (void *) __bpf_prog_ret0; +#endif
/* eBPF JITs can rewrite the program in case constant * blinding is active. However, in case of error during @@ -979,6 +991,12 @@ struct bpf_prog *bpf_prog_select_runtime * be JITed, but falls back to the interpreter. */ fp = bpf_int_jit_compile(fp); +#ifdef CONFIG_BPF_JIT_ALWAYS_ON + if (!fp->jited) { + *err = -ENOTSUPP; + return fp; + } +#endif bpf_prog_lock_ro(fp);
/* The tail call compatibility check can only be done at --- a/lib/test_bpf.c +++ b/lib/test_bpf.c @@ -5646,9 +5646,8 @@ static struct bpf_prog *generate_filter( return NULL; } } - /* We don't expect to fail. */ if (*err) { - pr_cont("FAIL to attach err=%d len=%d\n", + pr_cont("FAIL to prog_create err=%d len=%d\n", *err, fprog.len); return NULL; } @@ -5671,6 +5670,10 @@ static struct bpf_prog *generate_filter( * checks. */ fp = bpf_prog_select_runtime(fp, err); + if (*err) { + pr_cont("FAIL to select_runtime err=%d\n", *err); + return NULL; + } break; }
@@ -5856,8 +5859,8 @@ static __init int test_bpf(void) pass_cnt++; continue; } - - return err; + err_cnt++; + continue; }
pr_cont("jited:%u ", fp->jited); --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1005,11 +1005,9 @@ static struct bpf_prog *bpf_migrate_filt */ goto out_err_free;
- /* We are guaranteed to never error here with cBPF to eBPF - * transitions, since there's no issue with type compatibility - * checks on program arrays. - */ fp = bpf_prog_select_runtime(fp, &err); + if (err) + goto out_err_free;
kfree(old_prog); return fp; --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -292,7 +292,13 @@ static struct ctl_table net_core_table[] .data = &bpf_jit_enable, .maxlen = sizeof(int), .mode = 0644, +#ifndef CONFIG_BPF_JIT_ALWAYS_ON .proc_handler = proc_dointvec +#else + .proc_handler = proc_dointvec_minmax, + .extra1 = &one, + .extra2 = &one, +#endif }, # ifdef CONFIG_HAVE_EBPF_JIT { --- a/net/socket.c +++ b/net/socket.c @@ -2548,6 +2548,15 @@ out_fs:
core_initcall(sock_init); /* early initcall */
+static int __init jit_init(void) +{ +#ifdef CONFIG_BPF_JIT_ALWAYS_ON + bpf_jit_enable = 1; +#endif + return 0; +} +pure_initcall(jit_init); + #ifdef CONFIG_PROC_FS void socket_seq_show(struct seq_file *seq) {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ upstream commit 7891a87efc7116590eaba57acc3c422487802c6f ]
The following snippet was throwing an 'unknown opcode cc' warning in BPF interpreter:
0: (18) r0 = 0x0 2: (7b) *(u64 *)(r10 -16) = r0 3: (cc) (u32) r0 s>>= (u32) r0 4: (95) exit
Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X} generation, not all of them do and interpreter does neither. We can leave existing ones and implement it later in bpf-next for the remaining ones, but reject this properly in verifier for the time being.
Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1843,6 +1843,11 @@ static int check_alu_op(struct bpf_verif return -EINVAL; }
+ if (opcode == BPF_ARSH && BPF_CLASS(insn->code) != BPF_ALU64) { + verbose("BPF_ARSH not supported for 32 bit ALU\n"); + return -EINVAL; + } + if ((opcode == BPF_LSH || opcode == BPF_RSH || opcode == BPF_ARSH) && BPF_SRC(insn->code) == BPF_K) { int size = BPF_CLASS(insn->code) == BPF_ALU64 ? 64 : 32;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ upstream commit be95a845cc4402272994ce290e3ad928aff06cb9 ]
In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds speculation") also change the layout of struct bpf_map such that false sharing of fast-path members like max_entries is avoided when the maps reference counter is altered. Therefore enforce them to be placed into separate cachelines.
pahole dump after change:
struct bpf_map { const struct bpf_map_ops * ops; /* 0 8 */ struct bpf_map * inner_map_meta; /* 8 8 */ void * security; /* 16 8 */ enum bpf_map_type map_type; /* 24 4 */ u32 key_size; /* 28 4 */ u32 value_size; /* 32 4 */ u32 max_entries; /* 36 4 */ u32 map_flags; /* 40 4 */ u32 pages; /* 44 4 */ u32 id; /* 48 4 */ int numa_node; /* 52 4 */ bool unpriv_array; /* 56 1 */
/* XXX 7 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */ struct user_struct * user; /* 64 8 */ atomic_t refcnt; /* 72 4 */ atomic_t usercnt; /* 76 4 */ struct work_struct work; /* 80 32 */ char name[16]; /* 112 16 */ /* --- cacheline 2 boundary (128 bytes) --- */
/* size: 128, cachelines: 2, members: 17 */ /* sum members: 121, holes: 1, sum holes: 7 */ };
Now all entries in the first cacheline are read only throughout the life time of the map, set up once during map creation. Overall struct size and number of cachelines doesn't change from the reordering. struct bpf_map is usually first member and embedded in map structs in specific map implementations, so also avoid those members to sit at the end where it could potentially share the cacheline with first map values e.g. in the array since remote CPUs could trigger map updates just as well for those (easily dirtying members like max_entries intentionally as well) while having subsequent values in cache.
Quoting from Google's Project Zero blog [1]:
Additionally, at least on the Intel machine on which this was tested, bouncing modified cache lines between cores is slow, apparently because the MESI protocol is used for cache coherence [8]. Changing the reference counter of an eBPF array on one physical CPU core causes the cache line containing the reference counter to be bounced over to that CPU core, making reads of the reference counter on all other CPU cores slow until the changed reference counter has been written back to memory. Because the length and the reference counter of an eBPF array are stored in the same cache line, this also means that changing the reference counter on one physical CPU core causes reads of the eBPF array's length to be slow on other physical CPU cores (intentional false sharing).
While this doesn't 'control' the out-of-bounds speculation through masking the index as in commit b2157399cc98, triggering a manipulation of the map's reference counter is really trivial, so lets not allow to easily affect max_entries from it.
Splitting to separate cachelines also generally makes sense from a performance perspective anyway in that fast-path won't have a cache miss if the map gets pinned, reused in other progs, etc out of control path, thus also avoids unintentional false sharing.
[1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with...
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/bpf.h | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
--- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -36,7 +36,10 @@ struct bpf_map_ops { };
struct bpf_map { - atomic_t refcnt; + /* 1st cacheline with read-mostly members of which some + * are also accessed in fast-path (e.g. ops, max_entries). + */ + const struct bpf_map_ops *ops ____cacheline_aligned; enum bpf_map_type map_type; u32 key_size; u32 value_size; @@ -44,10 +47,15 @@ struct bpf_map { u32 map_flags; u32 pages; bool unpriv_array; - struct user_struct *user; - const struct bpf_map_ops *ops; - struct work_struct work; + /* 7 bytes hole */ + + /* 2nd cacheline with misc members to avoid false sharing + * particularly with refcounting. + */ + struct user_struct *user ____cacheline_aligned; + atomic_t refcnt; atomic_t usercnt; + struct work_struct work; };
struct bpf_map_type_list {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ upstream commit c366287ebd698ef5e3de300d90cd62ee9ee7373e ]
Divides by zero are not nice, lets avoid them if possible.
Also do_div() seems not needed when dealing with 32bit operands, but this seems a minor detail.
Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -642,7 +642,7 @@ select_insn: DST = tmp; CONT; ALU_MOD_X: - if (unlikely(SRC == 0)) + if (unlikely((u32)SRC == 0)) return 0; tmp = (u32) DST; DST = do_div(tmp, (u32) SRC); @@ -661,7 +661,7 @@ select_insn: DST = div64_u64(DST, SRC); CONT; ALU_DIV_X: - if (unlikely(SRC == 0)) + if (unlikely((u32)SRC == 0)) return 0; tmp = (u32) DST; do_div(tmp, (u32) SRC);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexei Starovoitov ast@kernel.org
[ upstream commit 68fda450a7df51cff9e5a4d4a4d9d0d5f2589153 ]
due to some JITs doing if (src_reg == 0) check in 64-bit mode for div/mod operations mask upper 32-bits of src register before doing the check
Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT") Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.") Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 18 ++++++++++++++++++ net/core/filter.c | 4 ++++ 2 files changed, 22 insertions(+)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3391,6 +3391,24 @@ static int fixup_bpf_calls(struct bpf_ve
for (i = 0; i < insn_cnt; i++, insn++) { + if (insn->code == (BPF_ALU | BPF_MOD | BPF_X) || + insn->code == (BPF_ALU | BPF_DIV | BPF_X)) { + /* due to JIT bugs clear upper 32-bits of src register + * before div/mod operation + */ + insn_buf[0] = BPF_MOV32_REG(insn->src_reg, insn->src_reg); + insn_buf[1] = *insn; + cnt = 2; + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt); + if (!new_prog) + return -ENOMEM; + + delta += cnt - 1; + env->prog = prog = new_prog; + insn = new_prog->insnsi + i + delta; + continue; + } + if (insn->code != (BPF_JMP | BPF_CALL)) continue;
--- a/net/core/filter.c +++ b/net/core/filter.c @@ -441,6 +441,10 @@ do_pass: convert_bpf_extensions(fp, &insn)) break;
+ if (fp->code == (BPF_ALU | BPF_DIV | BPF_X) || + fp->code == (BPF_ALU | BPF_MOD | BPF_X)) + *insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X); + *insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k); break;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
[ upstream commit f37a8cb84cce18762e8f86a70bd6a49a66ab964c ]
Alexei found that verifier does not reject stores into context via BPF_ST instead of BPF_STX. And while looking at it, we also should not allow XADD variant of BPF_STX.
The context rewriter is only assuming either BPF_LDX_MEM- or BPF_STX_MEM-type operations, thus reject anything other than that so that assumptions in the rewriter properly hold. Add test cases as well for BPF selftests.
Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields") Reported-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -702,6 +702,13 @@ static bool is_pointer_value(struct bpf_ return __is_pointer_value(env->allow_ptr_leaks, &env->cur_state.regs[regno]); }
+static bool is_ctx_reg(struct bpf_verifier_env *env, int regno) +{ + const struct bpf_reg_state *reg = &env->cur_state.regs[regno]; + + return reg->type == PTR_TO_CTX; +} + static int check_ptr_alignment(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int size) { @@ -896,6 +903,12 @@ static int check_xadd(struct bpf_verifie return -EACCES; }
+ if (is_ctx_reg(env, insn->dst_reg)) { + verbose("BPF_XADD stores into R%d context is not allowed\n", + insn->dst_reg); + return -EACCES; + } + /* check whether atomic_add can read the memory */ err = check_mem_access(env, insn->dst_reg, insn->off, BPF_SIZE(insn->code), BPF_READ, -1); @@ -3012,6 +3025,12 @@ static int do_check(struct bpf_verifier_ if (err) return err;
+ if (is_ctx_reg(env, insn->dst_reg)) { + verbose("BPF_ST stores into R%d context is not allowed\n", + insn->dst_reg); + return -EACCES; + } + /* check that memory (dst_reg + off) is writeable */ err = check_mem_access(env, insn->dst_reg, insn->off, BPF_SIZE(insn->code), BPF_WRITE,
On 01/29/2018 05:56 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.79 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Jan 31 12:38:12 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.79-rc1.gz or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On 29 January 2018 at 18:26, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.9.79 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Jan 31 12:38:12 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.79-rc1.gz or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary ------------------------------------------------------------------------
kernel: 4.9.79-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.9.y git commit: 0d0e61f289f38f6c09f044b35c1823c6ce573fe4 git describe: v4.9.78-68-g0d0e61f289f3 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.78-68-g...
No regressions (compared to build v4.9.78-67-g4265b9ca3e45)
Boards, architectures and test suites: -------------------------------------
hi6220-hikey - arm64 * boot - pass: 20, * kselftest - pass: 40, skip: 23 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 60, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 21, skip: 1 * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 14, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 983, skip: 121 * ltp-timers-tests - pass: 12,
juno-r2 - arm64 * boot - pass: 20, * kselftest - pass: 41, skip: 22 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 60, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 14, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 987, skip: 121 * ltp-timers-tests - pass: 12,
x15 - arm * boot - pass: 20, * kselftest - pass: 38, skip: 24 * libhugetlbfs - pass: 87, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 60, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 20, skip: 2 * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 13, skip: 1 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1037, skip: 66 * ltp-timers-tests - pass: 12,
x86_64 * boot - pass: 20, * kselftest - pass: 51, skip: 26 * libhugetlbfs - pass: 88, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 61, skip: 1 * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 9, skip: 1 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1016, skip: 116 * ltp-timers-tests - pass: 12,
Documentation - https://collaborate.linaro.org/display/LKFT/Email+Reports Tested-by: Naresh Kamboju naresh.kamboju@linaro.org
On 01/29/2018 04:56 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.79 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Jan 31 12:38:12 UTC 2018. Anything received after that time might be too late.
Build results: total: 145 pass: 145 fail: 0 Qemu test results: total: 126 pass: 126 fail: 0
Details are available at http://kerneltests.org/builders.
Guenter
linux-stable-mirror@lists.linaro.org