This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all of which will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.133-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.4.133-rc1
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp x86/kexec: Avoid double free_page() upon do_kexec_load() failure
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp hfsplus: stop workqueue when fill_super() failed
Johannes Berg johannes.berg@intel.com cfg80211: limit wiphy names to 128 bytes
Geert Uytterhoeven geert+renesas@glider.be gpio: rcar: Add Runtime PM handling for interrupts
John Stultz john.stultz@linaro.org time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
Vinod Koul vinod.koul@intel.com dmaengine: ensure dmaengine helpers check valid callback
Jens Remus jremus@linux.ibm.com scsi: zfcp: fix infinite iteration on ERP ready list
Alexander Potapenko glider@google.com scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()
Jason Yan yanaijie@huawei.com scsi: libsas: defer ata device eh commands to libata
Martin Schwidefsky schwidefsky@de.ibm.com s390: use expoline thunks in the BPF JIT
Martin Schwidefsky schwidefsky@de.ibm.com s390: extend expoline to BC instructions
Martin Schwidefsky schwidefsky@de.ibm.com s390: move spectre sysfs attribute code
Martin Schwidefsky schwidefsky@de.ibm.com s390/kernel: use expoline for indirect branches
Martin Schwidefsky schwidefsky@de.ibm.com s390/ftrace: use expoline for indirect branches
Martin Schwidefsky schwidefsky@de.ibm.com s390/lib: use expoline for indirect branches
Martin Schwidefsky schwidefsky@de.ibm.com s390: move expoline assembler macros to a header
Martin Schwidefsky schwidefsky@de.ibm.com s390: add assembler macros for CPU alternatives
Al Viro viro@zeniv.linux.org.uk ext2: fix a block leak
Eric Dumazet edumazet@google.com tcp: purge write queue in tcp_connect_init()
Eric Dumazet edumazet@google.com sock_diag: fix use-after-free read in __sk_free
Willem de Bruijn willemb@google.com packet: in packet_snd start writing at link layer allocation
Willem de Bruijn willemb@google.com net: test tailroom before appending to linear skb
Liu Bo bo.liu@linux.alibaba.com btrfs: fix reading stale metadata blocks after degraded raid1 mounts
Anand Jain anand.jain@oracle.com btrfs: fix crash when trying to resume balance without the resume flag
Filipe Manana fdmanana@suse.com Btrfs: fix xattr loss after power failure
Masami Hiramatsu mhiramat@kernel.org ARM: 8772/1: kprobes: Prohibit kprobes on get_user functions
Masami Hiramatsu mhiramat@kernel.org ARM: 8770/1: kprobes: Prohibit probing on optimized_callback
Masami Hiramatsu mhiramat@kernel.org ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed
Dexuan Cui decui@microsoft.com tick/broadcast: Use for_each_cpu() specially on UP kernels
Masami Hiramatsu mhiramat@kernel.org ARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstr
Ard Biesheuvel ard.biesheuvel@linaro.org efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode
Martin Schwidefsky schwidefsky@de.ibm.com s390: remove indirect branch from do_softirq_own_stack
Julian Wiedmann jwi@linux.ibm.com s390/qdio: don't release memory in qdio_setup_irq()
Hendrik Brueckner brueckner@linux.ibm.com s390/cpum_sf: ensure sample frequency of perf event attributes is non-zero
Julian Wiedmann jwi@linux.ibm.com s390/qdio: fix access to uninitialized qdio_q fields
Pavel Tatashin pasha.tatashin@oracle.com mm: don't allow deferred pages with NEED_PER_CPU_KM
Nicholas Piggin npiggin@gmail.com powerpc/powernv: Fix NVRAM sleep in invalid context when crashing
Janis Danisevskis jdanis@google.com procfs: fix pthread cross-thread naming if !PR_DUMPABLE
Mateusz Guzik mguzik@redhat.com proc read mm's {arg,env}_{start,end} with mmap semaphore taken.
Steven Rostedt (VMware) rostedt@goodmis.org tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
Srinivas Pandruvada srinivas.pandruvada@linux.intel.com cpufreq: intel_pstate: Enable HWP by default
Waiman Long Waiman.Long@hpe.com signals: avoid unnecessary taking of sighand->siglock
Mel Gorman mgorman@techsingularity.net mm: filemap: avoid unnecessary calls to lock_page when waiting for IO to complete during a read
Mel Gorman mgorman@techsingularity.net mm: filemap: remove redundant code in do_read_cache_page
Johannes Weiner hannes@cmpxchg.org proc: meminfo: estimate available memory more conservatively
Vladimir Davydov vdavydov@virtuozzo.com vmscan: do not force-scan file lru if its absolute size is small
Benjamin Herrenschmidt benh@kernel.crashing.org powerpc: Don't preempt_disable() in show_cpuinfo()
Anders Roxell anders.roxell@linaro.org cpuidle: coupled: remove unused define cpuidle_coupled_lock
Stewart Smith stewart@linux.vnet.ibm.com powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL
Stewart Smith stewart@linux.vnet.ibm.com powerpc/powernv: Remove OPALv2 firmware define and references
Stewart Smith stewart@linux.vnet.ibm.com powerpc/powernv: panic() on OPAL < V3
Andy Shevchenko andriy.shevchenko@linux.intel.com spi: pxa2xx: Allow 64-bit DMA
Wenwen Wang wang6495@umn.edu ALSA: control: fix a redundant-copy issue
Hans de Goede hdegoede@redhat.com ALSA: hda: Add Lenovo C50 All in one to the power_save blacklist
Federico Cuello fedux@fedux.com.ar ALSA: usb: mixer: volume quirk for CM102-A+/102S+
Shuah Khan (Samsung OSG) shuah@kernel.org usbip: usbip_host: fix bad unlock balance during stub_probe()
Shuah Khan (Samsung OSG) shuah@kernel.org usbip: usbip_host: fix NULL-ptr deref and use-after-free errors
Shuah Khan (Samsung OSG) shuah@kernel.org usbip: usbip_host: run rebind from exit when module is removed
Shuah Khan (Samsung OSG) shuah@kernel.org usbip: usbip_host: delete device from busid_table after rebind
Shuah Khan shuahkh@osg.samsung.com usbip: usbip_host: refine probe and disconnect debug msgs to be useful
zhongjiang zhongjiang@huawei.com kernel/exit.c: avoid undefined behaviour when calling wait4()
Jiri Slaby jslaby@suse.cz futex: futex_wake_op, fix sign_extend32 sign bits
Michael Kerrisk (man-pages) mtk.manpages@gmail.com pipe: cap initial pipe capacity according to pipe-max-size limit
James Chapman jchapman@katalix.com l2tp: revert "l2tp: fix missing print session offset info"
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "ARM: dts: imx6qdl-wandboard: Fix audio channel swap"
Vasily Averin vvs@virtuozzo.com lockd: lost rollback of set_grace_period() in lockd_down_net()
Antony Antony antony@phenome.org xfrm: fix xfrm_do_migrate() with AEAD e.g(AES-GCM)
Jiri Slaby jslaby@suse.cz futex: Remove duplicated code and fix undefined behaviour
Mel Gorman mgorman@suse.de futex: Remove unnecessary warning from get_futex_key
Suzuki K Poulose suzuki.poulose@arm.com arm64: Add work around for Arm Cortex-A55 Erratum 1024718
Ard Biesheuvel ard.biesheuvel@linaro.org arm64: introduce mov_q macro to move a constant into a 64-bit register
Richard Guy Briggs rgb@redhat.com audit: move calcs after alloc and check when logging set loginuid
Takashi Iwai tiwai@suse.de ALSA: timer: Call notifier in the same spinlock
Xin Long lucien.xin@gmail.com sctp: delay the authentication for the duplicated cookie-echo chunk
Xin Long lucien.xin@gmail.com sctp: fix the issue that the cookie-ack with auth can't get processed
Yuchung Cheng ycheng@google.com tcp: ignore Fast Open on repair mode
Debabrata Banerjee dbanerje@akamai.com bonding: do not allow rlb updates to invalid mac
Michael Chan michael.chan@broadcom.com tg3: Fix vunmap() BUG_ON() triggered from tg3_free_consistent().
Xin Long lucien.xin@gmail.com sctp: use the old asoc when making the cookie-ack chunk in dupcook_d
Xin Long lucien.xin@gmail.com sctp: handle two v4 addrs comparison in sctp_inet6_cmp_addr
Heiner Kallweit hkallweit1@gmail.com r8169: fix powering up RTL8168h
Bjørn Mork bjorn@mork.no qmi_wwan: do not steal interfaces from class drivers
Stefano Brivio sbrivio@redhat.com openvswitch: Don't swap table in nlattr_set() after OVS_ATTR_NESTED is found
Lance Richardson lance.richardson.net@gmail.com net: support compat 64-bit time in {s,g}etsockopt
Eric Dumazet edumazet@google.com net_sched: fq: take care of throttled flows before reuse
Moshe Shemesh moshe@mellanox.com net/mlx4_en: Verify coalescing parameters are in range
Rob Taglang rob@taglang.io net: ethernet: sun: niu set correct packet size in skb
Eric Dumazet edumazet@google.com llc: better deal with too small mtu
Andrey Ignatov rdna@fb.com ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
Eric Dumazet edumazet@google.com dccp: fix tasklet usage
Hangbin Liu liuhangbin@gmail.com bridge: check iface upper dev when setting master via ioctl
Ingo Molnar mingo@elte.hu 8139too: Use disable_irq_nosync() in rtl8139_poll_controller()
-------------
Diffstat:
 Makefile                                        |   4 +-
 arch/alpha/include/asm/futex.h                  |  26 +--
 arch/arc/include/asm/futex.h                    |  40 +----
 arch/arm/boot/dts/imx6qdl-wandboard.dtsi        |   1 -
 arch/arm/include/asm/assembler.h                |  10 ++
 arch/arm/include/asm/futex.h                    |  26 +--
 arch/arm/kernel/traps.c                         |   5 +-
 arch/arm/lib/getuser.S                          |  10 ++
 arch/arm/probes/kprobes/opt-arm.c               |   4 +-
 arch/arm64/Kconfig                              |  14 ++
 arch/arm64/include/asm/assembler.h              |  60 +++++++
 arch/arm64/include/asm/cputype.h                |  11 ++
 arch/arm64/include/asm/futex.h                  |  26 +--
 arch/arm64/mm/proc.S                            |   5 +
 arch/frv/include/asm/futex.h                    |   3 +-
 arch/frv/kernel/futex.c                         |  27 +--
 arch/hexagon/include/asm/futex.h                |  38 +---
 arch/ia64/include/asm/futex.h                   |  25 +--
 arch/microblaze/include/asm/futex.h             |  38 +---
 arch/mips/include/asm/futex.h                   |  25 +--
 arch/parisc/include/asm/futex.h                 |  25 +--
 arch/powerpc/include/asm/firmware.h             |   5 +-
 arch/powerpc/include/asm/futex.h                |  26 +--
 arch/powerpc/kernel/setup-common.c              |  11 --
 arch/powerpc/platforms/powernv/eeh-powernv.c    |   4 +-
 arch/powerpc/platforms/powernv/idle.c           |   2 +-
 arch/powerpc/platforms/powernv/opal-nvram.c     |  14 +-
 arch/powerpc/platforms/powernv/opal-xscom.c     |   2 +-
 arch/powerpc/platforms/powernv/opal.c           |  36 ++--
 arch/powerpc/platforms/powernv/pci-ioda.c       |   2 +-
 arch/powerpc/platforms/powernv/setup.c          |  12 +-
 arch/powerpc/platforms/powernv/smp.c            |  74 ++++----
 arch/s390/include/asm/alternative-asm.h         | 108 ++++++++++++
 arch/s390/include/asm/futex.h                   |  23 +--
 arch/s390/include/asm/nospec-insn.h             | 193 +++++++++++++++++++++
 arch/s390/kernel/Makefile                       |   1 +
 arch/s390/kernel/asm-offsets.c                  |   1 +
 arch/s390/kernel/base.S                         |  24 +--
 arch/s390/kernel/entry.S                        | 105 +++--------
 arch/s390/kernel/irq.c                          |   5 +-
 arch/s390/kernel/mcount.S                       |  14 +-
 arch/s390/kernel/nospec-branch.c                |  43 +++--
 arch/s390/kernel/nospec-sysfs.c                 |  21 +++
 arch/s390/kernel/perf_cpum_sf.c                 |   4 +
 arch/s390/kernel/reipl.S                        |   5 +-
 arch/s390/kernel/swsusp.S                       |  10 +-
 arch/s390/lib/mem.S                             |   9 +-
 arch/s390/net/bpf_jit.S                         |  16 +-
 arch/s390/net/bpf_jit_comp.c                    |  63 ++++++-
 arch/sh/include/asm/futex.h                     |  26 +--
 arch/sparc/include/asm/futex_64.h               |  26 +--
 arch/tile/include/asm/futex.h                   |  40 +----
 arch/x86/boot/compressed/eboot.c                |   6 +-
 arch/x86/include/asm/futex.h                    |  40 +----
 arch/x86/kernel/machine_kexec_32.c              |   6 +-
 arch/x86/kernel/machine_kexec_64.c              |   4 +-
 arch/x86/xen/mmu.c                              |   4 -
 arch/xtensa/include/asm/futex.h                 |  27 +--
 drivers/cpufreq/intel_pstate.c                  |  34 ++--
 drivers/cpufreq/powernv-cpufreq.c               |   2 +-
 drivers/cpuidle/coupled.c                       |   1 -
 drivers/cpuidle/cpuidle-powernv.c               |   2 +-
 drivers/gpio/gpio-rcar.c                        |  46 +++++
 drivers/net/bonding/bond_alb.c                  |   2 +-
 drivers/net/ethernet/broadcom/tg3.c             |   9 +-
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |  16 ++
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |   7 +-
 drivers/net/ethernet/realtek/8139too.c          |   2 +-
 drivers/net/ethernet/realtek/r8169.c            |   3 +
 drivers/net/ethernet/sun/niu.c                  |   5 +-
 drivers/net/usb/qmi_wwan.c                      |  12 ++
 drivers/s390/cio/qdio_setup.c                   |  12 +-
 drivers/s390/scsi/zfcp_dbf.c                    |  23 ++-
 drivers/s390/scsi/zfcp_ext.h                    |   5 +-
 drivers/s390/scsi/zfcp_scsi.c                   |  14 +-
 drivers/scsi/libsas/sas_scsi_host.c             |  33 ++--
 drivers/scsi/sg.c                               |   2 +-
 drivers/spi/spi-pxa2xx.h                        |   2 +-
 drivers/usb/usbip/stub.h                        |   2 +
 drivers/usb/usbip/stub_dev.c                    |  43 +++--
 drivers/usb/usbip/stub_main.c                   | 105 +++++++++--
 fs/btrfs/ctree.c                                |   6 +-
 fs/btrfs/tree-log.c                             |   7 +
 fs/btrfs/volumes.c                              |   9 +
 fs/ext2/inode.c                                 |  10 --
 fs/hfsplus/super.c                              |   1 +
 fs/lockd/svc.c                                  |   2 +
 fs/pipe.c                                       |   3 +
 fs/proc/base.c                                  |  55 +++++-
 fs/proc/meminfo.c                               |   5 +-
 include/asm-generic/futex.h                     |  50 +-----
 include/linux/dmaengine.h                       |  20 ++-
 include/linux/efi.h                             |   8 +-
 include/linux/signal.h                          |  17 ++
 include/linux/timekeeper_internal.h             |   4 +-
 include/trace/events/xen.h                      |  16 --
 include/uapi/linux/nl80211.h                    |   2 +
 kernel/auditsc.c                                |   7 +-
 kernel/exit.c                                   |   4 +
 kernel/futex.c                                  |  44 ++++-
 kernel/signal.c                                 |   7 +
 kernel/time/tick-broadcast.c                    |   8 +
 kernel/time/timekeeping.c                       |  20 +--
 mm/Kconfig                                      |   1 +
 mm/filemap.c                                    |  90 ++++++----
 mm/util.c                                       |  16 +-
 mm/vmscan.c                                     |  12 +-
 net/bridge/br_if.c                              |   4 +-
 net/compat.c                                    |   6 +-
 net/core/sock.c                                 |   2 +-
 net/dccp/ccids/ccid2.c                          |  14 +-
 net/dccp/timer.c                                |   2 +-
 net/ipv4/ip_output.c                            |   3 +-
 net/ipv4/ping.c                                 |   7 +-
 net/ipv4/tcp.c                                  |   2 +-
 net/ipv4/tcp_output.c                           |   7 +-
 net/ipv4/udp.c                                  |   7 +-
 net/ipv6/ip6_output.c                           |   3 +-
 net/l2tp/l2tp_netlink.c                         |   2 -
 net/llc/af_llc.c                                |   3 +
 net/openvswitch/flow_netlink.c                  |   9 +-
 net/packet/af_packet.c                          |   4 +-
 net/sched/sch_fq.c                              |  37 ++--
 net/sctp/associola.c                            |  30 +++-
 net/sctp/inqueue.c                              |   2 +-
 net/sctp/ipv6.c                                 |   3 +
 net/sctp/sm_statefuns.c                         |  89 +++++-----
 net/wireless/core.c                             |   3 +
 net/xfrm/xfrm_state.c                           |   1 +
 sound/core/control_compat.c                     |   3 +-
 sound/core/timer.c                              | 220 +++++++++++-------------
 sound/pci/hda/hda_intel.c                       |   2 +
 sound/usb/mixer.c                               |   8 +
 133 files changed, 1605 insertions(+), 1109 deletions(-)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ingo Molnar mingo@elte.hu
[ Upstream commit af3e0fcf78879f718c5f73df0814951bd7057d34 ]
Use disable_irq_nosync() instead of disable_irq() as this might be called in atomic context with netpoll.
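For context, a conceptual sketch of why the swap matters (this is not the kernel's literal implementation, only the documented semantics): disable_irq() additionally waits for any in-flight handler to finish, which may sleep, while disable_irq_nosync() only masks the line and returns immediately.

/* Conceptual sketch only -- not the kernel's actual code. */
static inline void disable_irq_conceptually(unsigned int irq)
{
	disable_irq_nosync(irq);	/* mask the IRQ line, returns immediately */
	synchronize_irq(irq);		/* wait for running handlers; may sleep,
					 * which is not allowed in netpoll's
					 * atomic context */
}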
Signed-off-by: Ingo Molnar mingo@elte.hu
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/realtek/8139too.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/realtek/8139too.c
+++ b/drivers/net/ethernet/realtek/8139too.c
@@ -2229,7 +2229,7 @@ static void rtl8139_poll_controller(stru
 	struct rtl8139_private *tp = netdev_priv(dev);
 	const int irq = tp->pci_dev->irq;
 
-	disable_irq(irq);
+	disable_irq_nosync(irq);
 	rtl8139_interrupt(irq, dev);
 	enable_irq(irq);
 }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit e8238fc2bd7b4c3c7554fa2df067e796610212fc ]
When we set a bond slave's master to a bridge via ioctl, we only check the IFF_BRIDGE_PORT flag. Although netdev_master_upper_dev_link() will later find the slave's real master, by that point some settings have already been applied and some resources allocated. It is better to return as early as possible.
v1 -> v2: use netdev_master_upper_dev_get() instead of netdev_has_any_upper_dev() to check if we have a master, because not all upper devs are masters, e.g. vlan device.
Reported-by: syzbot+de73361ee4971b6e6f75@syzkaller.appspotmail.com
Signed-off-by: Hangbin Liu liuhangbin@gmail.com
Acked-by: Nikolay Aleksandrov nikolay@cumulusnetworks.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/bridge/br_if.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -456,8 +456,8 @@ int br_add_if(struct net_bridge *br, str
 	if (dev->netdev_ops->ndo_start_xmit == br_dev_xmit)
 		return -ELOOP;
 
-	/* Device is already being bridged */
-	if (br_port_exists(dev))
+	/* Device has master upper dev */
+	if (netdev_master_upper_dev_get(dev))
 		return -EBUSY;
 
 	/* No bridging devices that dislike that (e.g. wireless) */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit a8d7aa17bbc970971ccdf71988ea19230ab368b1 ]
syzbot reported a crash in tasklet_action_common() caused by dccp.
dccp needs to make sure the socket won't disappear before the tasklet handler has completed.
This patch takes a reference on the socket when arming the tasklet, and moves the sock_put() from dccp_write_xmit_timer() to dccp_write_xmitlet().
kernel BUG at kernel/softirq.c:514! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 17 Comm: ksoftirqd/1 Not tainted 4.17.0-rc3+ #30 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:tasklet_action_common.isra.19+0x6db/0x700 kernel/softirq.c:515 RSP: 0018:ffff8801d9b3faf8 EFLAGS: 00010246 dccp_close: ABORT with 65423 bytes unread RAX: 1ffff1003b367f6b RBX: ffff8801daf1f3f0 RCX: 0000000000000000 RDX: ffff8801cf895498 RSI: 0000000000000004 RDI: 0000000000000000 RBP: ffff8801d9b3fc40 R08: ffffed0039f12a95 R09: ffffed0039f12a94 dccp_close: ABORT with 65423 bytes unread R10: ffffed0039f12a94 R11: ffff8801cf8954a3 R12: 0000000000000000 R13: ffff8801d9b3fc18 R14: dffffc0000000000 R15: ffff8801cf895490 FS: 0000000000000000(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2bc28000 CR3: 00000001a08a9000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tasklet_action+0x1d/0x20 kernel/softirq.c:533 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285 dccp_close: ABORT with 65423 bytes unread run_ksoftirqd+0x86/0x100 kernel/softirq.c:646 smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164 kthread+0x345/0x410 kernel/kthread.c:238 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412 Code: 48 8b 85 e8 fe ff ff 48 8b 95 f0 fe ff ff e9 94 fb ff ff 48 89 95 f0 fe ff ff e8 81 53 6e 00 48 8b 95 f0 fe ff ff e9 62 fb ff ff <0f> 0b 48 89 cf 48 89 8d e8 fe ff ff e8 64 53 6e 00 48 8b 8d e8 RIP: tasklet_action_common.isra.19+0x6db/0x700 kernel/softirq.c:515 RSP: ffff8801d9b3faf8
Fixes: dc841e30eaea ("dccp: Extend CCID packet dequeueing interface")
Signed-off-by: Eric Dumazet edumazet@google.com
Reported-by: syzbot syzkaller@googlegroups.com
Cc: Gerrit Renker gerrit@erg.abdn.ac.uk
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/dccp/ccids/ccid2.c | 14 ++++++++++++--
 net/dccp/timer.c       |  2 +-
 2 files changed, 13 insertions(+), 3 deletions(-)
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -126,6 +126,16 @@ static void ccid2_change_l_seq_window(st
 				  DCCPF_SEQ_WMAX));
 }
 
+static void dccp_tasklet_schedule(struct sock *sk)
+{
+	struct tasklet_struct *t = &dccp_sk(sk)->dccps_xmitlet;
+
+	if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
+		sock_hold(sk);
+		__tasklet_schedule(t);
+	}
+}
+
 static void ccid2_hc_tx_rto_expire(unsigned long data)
 {
 	struct sock *sk = (struct sock *)data;
@@ -166,7 +176,7 @@ static void ccid2_hc_tx_rto_expire(unsig
 
 	/* if we were blocked before, we may now send cwnd=1 packet */
 	if (sender_was_blocked)
-		tasklet_schedule(&dccp_sk(sk)->dccps_xmitlet);
+		dccp_tasklet_schedule(sk);
 	/* restart backed-off timer */
 	sk_reset_timer(sk, &hc->tx_rtotimer, jiffies + hc->tx_rto);
 out:
@@ -706,7 +716,7 @@ static void ccid2_hc_tx_packet_recv(stru
 done:
 	/* check if incoming Acks allow pending packets to be sent */
 	if (sender_was_blocked && !ccid2_cwnd_network_limited(hc))
-		tasklet_schedule(&dccp_sk(sk)->dccps_xmitlet);
+		dccp_tasklet_schedule(sk);
 	dccp_ackvec_parsed_cleanup(&hc->tx_av_chunks);
 }
 
--- a/net/dccp/timer.c
+++ b/net/dccp/timer.c
@@ -230,12 +230,12 @@ static void dccp_write_xmitlet(unsigned
 	else
 		dccp_write_xmit(sk);
 	bh_unlock_sock(sk);
+	sock_put(sk);
 }
 
 static void dccp_write_xmit_timer(unsigned long data)
 {
 	dccp_write_xmitlet(data);
-	sock_put((struct sock *)data);
 }
 
 void dccp_init_xmit_timers(struct sock *sk)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrey Ignatov rdna@fb.com
[ Upstream commit 1b97013bfb11d66f041de691de6f0fec748ce016 ]
Fix more memory leaks in ip_cmsg_send() callers. Some of them were fixed earlier in 919483096bfe.
* the udp_sendmsg one has been there since the beginning, when the linux sources were first added to git;
* the ping_v4_sendmsg one was copy/pasted in c319b4d76b9e.
Whenever a return happens in udp_sendmsg() or ping_v4_sendmsg(), the IP options have to be freed if they were allocated previously.
Add a label so that future callers (if any) can use it instead of a kfree() before the return, which is easy to forget.
Fixes: c319b4d76b9e (net: ipv4: add IPPROTO_ICMP socket kind)
Signed-off-by: Andrey Ignatov rdna@fb.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ipv4/ping.c | 7 +++++--
 net/ipv4/udp.c  | 7 +++++--
 2 files changed, 10 insertions(+), 4 deletions(-)
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -777,8 +777,10 @@ static int ping_v4_sendmsg(struct sock *
 		ipc.addr = faddr = daddr;
 
 	if (ipc.opt && ipc.opt->opt.srr) {
-		if (!daddr)
-			return -EINVAL;
+		if (!daddr) {
+			err = -EINVAL;
+			goto out_free;
+		}
 		faddr = ipc.opt->opt.faddr;
 	}
 	tos = get_rttos(&ipc, inet);
@@ -843,6 +845,7 @@ back_from_confirm:
 
 out:
 	ip_rt_put(rt);
+out_free:
 	if (free)
 		kfree(ipc.opt);
 	if (!err) {
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -991,8 +991,10 @@ int udp_sendmsg(struct sock *sk, struct
 		ipc.addr = faddr = daddr;
 
 	if (ipc.opt && ipc.opt->opt.srr) {
-		if (!daddr)
-			return -EINVAL;
+		if (!daddr) {
+			err = -EINVAL;
+			goto out_free;
+		}
 		faddr = ipc.opt->opt.faddr;
 		connected = 0;
 	}
@@ -1105,6 +1107,7 @@ do_append_data:
 
 out:
 	ip_rt_put(rt);
+out_free:
 	if (free)
 		kfree(ipc.opt);
 	if (!err)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 2c5d5b13c6eb79f5677e206b8aad59b3a2097f60 ]
syzbot loves to set a very small mtu on devices, since it brings joy. We must make llc_ui_sendmsg() foolproof.
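As a rough illustration of the failure mode (hypothetical numbers; the real header length depends on the LLC/SNAP setup), the sendmsg path caps the requested size to the device mtu and then subtracts the header length, so a tiny mtu makes the copy length negative and it wraps when treated as an unsigned size:

#include <stdio.h>

int main(void)
{
	int len = 100, mtu = 4, hdrlen = 8;	/* hypothetical values */
	int size = len > mtu ? mtu : len;	/* capped to the device mtu */
	int copied = size - hdrlen;		/* 4 - 8 = -4 */

	/* a negative length interpreted as size_t becomes enormous,
	 * which is what the usercopy check below catches */
	printf("copied = %d -> as size_t = %zu\n", copied, (size_t)copied);
	return 0;
}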
usercopy: Kernel memory overwrite attempt detected to wrapped address (offset 0, size 18446612139802320068)!
kernel BUG at mm/usercopy.c:100! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 17464 Comm: syz-executor1 Not tainted 4.17.0-rc3+ #36 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:usercopy_abort+0xbb/0xbd mm/usercopy.c:88 RSP: 0018:ffff8801868bf800 EFLAGS: 00010282 RAX: 000000000000006c RBX: ffffffff87d2fb00 RCX: 0000000000000000 RDX: 000000000000006c RSI: ffffffff81610731 RDI: ffffed0030d17ef6 RBP: ffff8801868bf858 R08: ffff88018daa4200 R09: ffffed003b5c4fb0 R10: ffffed003b5c4fb0 R11: ffff8801dae27d87 R12: ffffffff87d2f8e0 R13: ffffffff87d2f7a0 R14: ffffffff87d2f7a0 R15: ffffffff87d2f7a0 FS: 00007f56a14ac700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2bc21000 CR3: 00000001abeb1000 CR4: 00000000001426f0 DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000030602 Call Trace: check_bogus_address mm/usercopy.c:153 [inline] __check_object_size+0x5d9/0x5d9 mm/usercopy.c:256 check_object_size include/linux/thread_info.h:108 [inline] check_copy_size include/linux/thread_info.h:139 [inline] copy_from_iter_full include/linux/uio.h:121 [inline] memcpy_from_msg include/linux/skbuff.h:3305 [inline] llc_ui_sendmsg+0x4b1/0x1530 net/llc/af_llc.c:941 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:639 __sys_sendto+0x3d7/0x670 net/socket.c:1789 __do_sys_sendto net/socket.c:1801 [inline] __se_sys_sendto net/socket.c:1797 [inline] __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x455979 RSP: 002b:00007f56a14abc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f56a14ac6d4 RCX: 0000000000455979 RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000018 RBP: 000000000072bea0 R08: 00000000200012c0 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000000548 R14: 00000000006fbf60 R15: 0000000000000000 Code: 55 c0 e8 c0 55 bb ff ff 75 c8 48 8b 55 c0 4d 89 f9 ff 75 d0 4d 89 e8 48 89 d9 4c 89 e6 41 56 48 c7 c7 80 fa d2 87 e8 a0 0b a3 ff <0f> 0b e8 95 55 bb ff e8 c0 a8 f7 ff 8b 95 14 ff ff ff 4d 89 e8 RIP: usercopy_abort+0xbb/0xbd mm/usercopy.c:88 RSP: ffff8801868bf800
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet edumazet@google.com
Reported-by: syzbot syzkaller@googlegroups.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/llc/af_llc.c | 3 +++
 1 file changed, 3 insertions(+)
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -926,6 +926,9 @@ static int llc_ui_sendmsg(struct socket
 	if (size > llc->dev->mtu)
 		size = llc->dev->mtu;
 	copied = size - hdrlen;
+	rc = -EINVAL;
+	if (copied < 0)
+		goto release;
 	release_sock(sk);
 	skb = sock_alloc_send_skb(sk, size, noblock, &rc);
 	lock_sock(sk);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Taglang rob@taglang.io
[ Upstream commit 14224923c3600bae2ac4dcae3bf0c3d4dc2812be ]
Currently, skb->len and skb->data_len are set to the page size, not the packet size. This causes the frame check sequence not to be located at the "end" of the packet, resulting in ethernet frame check errors. The driver does work currently, but stricter kernel-facing networking solutions like OpenVSwitch will drop these packets as invalid.
These changes set the packet size correctly so that these errors no longer occur. The length does not include the frame check sequence, so that subtraction was removed.
Tested on Oracle/SUN Multithreaded 10-Gigabit Ethernet Network Controller [108e:abcd] and validated in wireshark.
Signed-off-by: Rob Taglang rob@taglang.io
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/sun/niu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/sun/niu.c
+++ b/drivers/net/ethernet/sun/niu.c
@@ -3442,7 +3442,7 @@ static int niu_process_rx_pkt(struct nap
 
 		len = (val & RCR_ENTRY_L2_LEN) >>
 			RCR_ENTRY_L2_LEN_SHIFT;
-		len -= ETH_FCS_LEN;
+		append_size = len + ETH_HLEN + ETH_FCS_LEN;
 
 		addr = (val & RCR_ENTRY_PKT_BUF_ADDR) <<
 			RCR_ENTRY_PKT_BUF_ADDR_SHIFT;
@@ -3452,7 +3452,6 @@ static int niu_process_rx_pkt(struct nap
 			RCR_ENTRY_PKTBUFSZ_SHIFT];
 
 		off = addr & ~PAGE_MASK;
-		append_size = rcr_size;
 		if (num_rcr == 1) {
 			int ptype;
 
@@ -3465,7 +3464,7 @@ static int niu_process_rx_pkt(struct nap
 			else
 				skb_checksum_none_assert(skb);
 		} else if (!(val & RCR_ENTRY_MULTI))
-			append_size = len - skb->len;
+			append_size = append_size - skb->len;
 
 		niu_rx_skb_append(skb, page, off, append_size, rcr_size);
 		if ((page->index + rp->rbr_block_size) - rcr_size == addr) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Moshe Shemesh moshe@mellanox.com
[ Upstream commit 6ad4e91c6d796b38a7f0e724db1de28eeb122bad ]
Add a check that the coalescing parameters received through ethtool are within the range of values supported by the HW. The driver gets the coalescing rx/tx-usecs and rx/tx-frames as set by the user through ethtool. ethtool supports up to 32-bit values for each. However, the mlx4 modify cq command limits the coalescing time and coalescing frames parameters to 16 bits. Return an out-of-range error if the user tries to set these parameters to higher values. Change the type of the sample-interval and adaptive_rx_coal parameters in the mlx4 driver to u32, as ethtool holds them as u32 and these parameters are not limited by the mlx4 HW.
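A minimal userspace sketch of the silent truncation the new check guards against (illustrative values only; the real limit comes from the 16-bit fields described above):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t requested_usecs = 70000;	/* 32-bit value from ethtool */
	uint16_t programmed = requested_usecs;	/* silently truncated to 16 bits */

	/* prints: requested 70000, hardware would be programmed with 4464 */
	printf("requested %u, hardware would be programmed with %u\n",
	       requested_usecs, programmed);
	return 0;
}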
Fixes: c27a02cd94d6 ('mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC')
Signed-off-by: Moshe Shemesh moshe@mellanox.com
Signed-off-by: Tariq Toukan tariqt@mellanox.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 16 ++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |  7 +++++--
 2 files changed, 21 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -967,6 +967,22 @@ static int mlx4_en_set_coalesce(struct n
 	if (!coal->tx_max_coalesced_frames_irq)
 		return -EINVAL;
 
+	if (coal->tx_coalesce_usecs > MLX4_EN_MAX_COAL_TIME ||
+	    coal->rx_coalesce_usecs > MLX4_EN_MAX_COAL_TIME ||
+	    coal->rx_coalesce_usecs_low > MLX4_EN_MAX_COAL_TIME ||
+	    coal->rx_coalesce_usecs_high > MLX4_EN_MAX_COAL_TIME) {
+		netdev_info(dev, "%s: maximum coalesce time supported is %d usecs\n",
+			    __func__, MLX4_EN_MAX_COAL_TIME);
+		return -ERANGE;
+	}
+
+	if (coal->tx_max_coalesced_frames > MLX4_EN_MAX_COAL_PKTS ||
+	    coal->rx_max_coalesced_frames > MLX4_EN_MAX_COAL_PKTS) {
+		netdev_info(dev, "%s: maximum coalesced frames supported is %d\n",
+			    __func__, MLX4_EN_MAX_COAL_PKTS);
+		return -ERANGE;
+	}
+
 	priv->rx_frames = (coal->rx_max_coalesced_frames ==
 			   MLX4_EN_AUTO_CONF) ?
 				MLX4_EN_RX_COAL_TARGET :
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -140,6 +140,9 @@ enum {
 #define MLX4_EN_TX_COAL_PKTS	16
 #define MLX4_EN_TX_COAL_TIME	0x10
 
+#define MLX4_EN_MAX_COAL_PKTS	U16_MAX
+#define MLX4_EN_MAX_COAL_TIME	U16_MAX
+
 #define MLX4_EN_RX_RATE_LOW		400000
 #define MLX4_EN_RX_COAL_TIME_LOW	0
 #define MLX4_EN_RX_RATE_HIGH		450000
@@ -518,8 +521,8 @@ struct mlx4_en_priv {
 	u16 rx_usecs_low;
 	u32 pkt_rate_high;
 	u16 rx_usecs_high;
-	u16 sample_interval;
-	u16 adaptive_rx_coal;
+	u32 sample_interval;
+	u32 adaptive_rx_coal;
 	u32 msg_enable;
 	u32 loopback_ok;
 	u32 validate_loopback;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 7df40c2673a1307c3260aab6f9d4b9bf97ca8fd7 ]
Normally, a socket cannot be freed/reused until all of its TX packets have left the qdisc and been TX-completed. However, connect(AF_UNSPEC) allows this to happen.
With commit fc59d5bdf1e3 ("pkt_sched: fq: clear time_next_packet for reused flows") we cleared f->time_next_packet but took no special action if the flow was still in the throttled rb-tree.
Since f->time_next_packet is the key used in the rb-tree searches, blindly clearing it might break rb-tree integrity. We need to make sure the flow is no longer in the rb-tree to avoid this problem.
Fixes: fc59d5bdf1e3 ("pkt_sched: fq: clear time_next_packet for reused flows")
Signed-off-by: Eric Dumazet edumazet@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sched/sch_fq.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -126,6 +126,28 @@ static bool fq_flow_is_detached(const st
 	return f->next == &detached;
 }
 
+static bool fq_flow_is_throttled(const struct fq_flow *f)
+{
+	return f->next == &throttled;
+}
+
+static void fq_flow_add_tail(struct fq_flow_head *head, struct fq_flow *flow)
+{
+	if (head->first)
+		head->last->next = flow;
+	else
+		head->first = flow;
+	head->last = flow;
+	flow->next = NULL;
+}
+
+static void fq_flow_unset_throttled(struct fq_sched_data *q, struct fq_flow *f)
+{
+	rb_erase(&f->rate_node, &q->delayed);
+	q->throttled_flows--;
+	fq_flow_add_tail(&q->old_flows, f);
+}
+
 static void fq_flow_set_throttled(struct fq_sched_data *q, struct fq_flow *f)
 {
 	struct rb_node **p = &q->delayed.rb_node, *parent = NULL;
@@ -153,15 +175,6 @@ static void fq_flow_set_throttled(struct
 
 static struct kmem_cache *fq_flow_cachep __read_mostly;
 
-static void fq_flow_add_tail(struct fq_flow_head *head, struct fq_flow *flow)
-{
-	if (head->first)
-		head->last->next = flow;
-	else
-		head->first = flow;
-	head->last = flow;
-	flow->next = NULL;
-}
 
 /* limit number of collected flows per round */
 #define FQ_GC_MAX 8
@@ -265,6 +278,8 @@ static struct fq_flow *fq_classify(struc
 			     f->socket_hash != sk->sk_hash)) {
 			f->credit = q->initial_quantum;
 			f->socket_hash = sk->sk_hash;
+			if (fq_flow_is_throttled(f))
+				fq_flow_unset_throttled(q, f);
 			f->time_next_packet = 0ULL;
 		}
 		return f;
@@ -419,9 +434,7 @@ static void fq_check_throttled(struct fq
 			q->time_next_delayed_flow = f->time_next_packet;
 			break;
 		}
-		rb_erase(p, &q->delayed);
-		q->throttled_flows--;
-		fq_flow_add_tail(&q->old_flows, f);
+		fq_flow_unset_throttled(q, f);
 	}
 }
 
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lance Richardson lance.richardson.net@gmail.com
[ Upstream commit 988bf7243e03ef69238381594e0334a79cef74a6 ]
For the x32 ABI, struct timeval has two 64-bit fields. However the kernel currently interprets the user-space values used for the SO_RCVTIMEO and SO_SNDTIMEO socket options as having a pair of 32-bit fields.
When the seconds portion of the requested timeout is less than 2**32, the seconds portion of the effective timeout is correct but the microseconds portion is zero. When the seconds portion of the requested timeout is zero and the microseconds portion is non-zero, the kernel interprets the timeout as zero (never timeout).
Fix by using 64-bit time for SO_RCVTIMEO/SO_SNDTIMEO as required for the ABI.
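To make the mismatch concrete, here is an illustrative sketch (the type names are made up for the example, not the kernel's definitions) of what an x32 caller passes versus what the old compat path read:

/* Illustrative layouts only -- not the kernel's structures. */
struct x32_timeval {		/* what x32 user space hands to setsockopt() */
	long long tv_sec;	/* 64-bit */
	long long tv_usec;	/* 64-bit */
};

struct old_compat_timeval {	/* what the old compat path assumed */
	int tv_sec;		/* 32-bit */
	int tv_usec;		/* 32-bit */
};

/* For tv_sec = 1, tv_usec = 999999 on a little-endian machine, the first
 * 32-bit field the kernel read was 1 (the seconds look right) and the
 * second was the high half of tv_sec, i.e. 0, so the microseconds were
 * lost -- matching the ~1.0 s timeouts measured with -mx32 below. */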
The code included below demonstrates the problem.
Results before patch:

$ gcc -m64 -Wall -O2 -o socktmo socktmo.c && ./socktmo
recv time: 2.008181 seconds
send time: 2.015985 seconds

$ gcc -m32 -Wall -O2 -o socktmo socktmo.c && ./socktmo
recv time: 2.016763 seconds
send time: 2.016062 seconds

$ gcc -mx32 -Wall -O2 -o socktmo socktmo.c && ./socktmo
recv time: 1.007239 seconds
send time: 1.023890 seconds

Results after patch:

$ gcc -m64 -O2 -Wall -o socktmo socktmo.c && ./socktmo
recv time: 2.010062 seconds
send time: 2.015836 seconds

$ gcc -m32 -O2 -Wall -o socktmo socktmo.c && ./socktmo
recv time: 2.013974 seconds
send time: 2.015981 seconds

$ gcc -mx32 -O2 -Wall -o socktmo socktmo.c && ./socktmo
recv time: 2.030257 seconds
send time: 2.013383 seconds
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/time.h>

void checkrc(char *str, int rc)
{
	if (rc >= 0)
		return;

	perror(str);
	exit(1);
}

static char buf[1024];
int main(int argc, char **argv)
{
	int rc;
	int socks[2];
	struct timeval tv;
	struct timeval start, end, delta;

	rc = socketpair(AF_UNIX, SOCK_STREAM, 0, socks);
	checkrc("socketpair", rc);

	/* set timeout to 1.999999 seconds */
	tv.tv_sec = 1;
	tv.tv_usec = 999999;
	rc = setsockopt(socks[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
	rc = setsockopt(socks[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
	checkrc("setsockopt", rc);

	/* measure actual receive timeout */
	gettimeofday(&start, NULL);
	rc = recv(socks[0], buf, sizeof buf, 0);
	gettimeofday(&end, NULL);
	timersub(&end, &start, &delta);

	printf("recv time: %ld.%06ld seconds\n",
	       (long)delta.tv_sec, (long)delta.tv_usec);

	/* fill send buffer */
	do {
		rc = send(socks[0], buf, sizeof buf, 0);
	} while (rc > 0);

	/* measure actual send timeout */
	gettimeofday(&start, NULL);
	rc = send(socks[0], buf, sizeof buf, 0);
	gettimeofday(&end, NULL);
	timersub(&end, &start, &delta);

	printf("send time: %ld.%06ld seconds\n",
	       (long)delta.tv_sec, (long)delta.tv_usec);
	exit(0);
}
Fixes: 515c7af85ed9 ("x32: Use compat shims for {g,s}etsockopt")
Reported-by: Gopal RajagopalSai gopalsr83@gmail.com
Signed-off-by: Lance Richardson lance.richardson.net@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/compat.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/net/compat.c
+++ b/net/compat.c
@@ -358,7 +358,8 @@ static int compat_sock_setsockopt(struct
 	if (optname == SO_ATTACH_FILTER)
 		return do_set_attach_filter(sock, level, optname,
 					    optval, optlen);
-	if (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)
+	if (!COMPAT_USE_64BIT_TIME &&
+	    (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
 		return do_set_sock_timeout(sock, level, optname, optval, optlen);
 
 	return sock_setsockopt(sock, level, optname, optval, optlen);
@@ -423,7 +424,8 @@ static int do_get_sock_timeout(struct so
 static int compat_sock_getsockopt(struct socket *sock, int level, int optname,
 				  char __user *optval, int __user *optlen)
 {
-	if (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO)
+	if (!COMPAT_USE_64BIT_TIME &&
+	    (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
 		return do_get_sock_timeout(sock, level, optname, optval, optlen);
 	return sock_getsockopt(sock, level, optname, optval, optlen);
 }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefano Brivio sbrivio@redhat.com
[ Upstream commit 72f17baf2352ded6a1d3f4bb2d15da8c678cd2cb ]
If an OVS_ATTR_NESTED attribute type is found while walking through netlink attributes, we call nlattr_set() recursively passing the length table for the following nested attributes, if different from the current one.
However, once we're done with those sub-nested attributes, we should continue walking through attributes using the current table, instead of using the one related to the sub-nested attributes.
For example, given this sequence:
 1  OVS_KEY_ATTR_PRIORITY
 2  OVS_KEY_ATTR_TUNNEL
 3      OVS_TUNNEL_KEY_ATTR_ID
 4      OVS_TUNNEL_KEY_ATTR_IPV4_SRC
 5      OVS_TUNNEL_KEY_ATTR_IPV4_DST
 6      OVS_TUNNEL_KEY_ATTR_TTL
 7      OVS_TUNNEL_KEY_ATTR_TP_SRC
 8      OVS_TUNNEL_KEY_ATTR_TP_DST
 9  OVS_KEY_ATTR_IN_PORT
10  OVS_KEY_ATTR_SKB_MARK
11  OVS_KEY_ATTR_MPLS
we switch to the 'ovs_tunnel_key_lens' table on attribute #3, and we don't switch back to 'ovs_key_lens' while setting attributes #9 to #11 in the sequence. As OVS_KEY_ATTR_MPLS evaluates to 21, and the array size of 'ovs_tunnel_key_lens' is 15, we also get this kind of KASan splat while accessing the wrong table:
[ 7654.586496] ================================================================== [ 7654.594573] BUG: KASAN: global-out-of-bounds in nlattr_set+0x164/0xde9 [openvswitch] [ 7654.603214] Read of size 4 at addr ffffffffc169ecf0 by task handler29/87430 [ 7654.610983] [ 7654.612644] CPU: 21 PID: 87430 Comm: handler29 Kdump: loaded Not tainted 3.10.0-866.el7.test.x86_64 #1 [ 7654.623030] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016 [ 7654.631379] Call Trace: [ 7654.634108] [<ffffffffb65a7c50>] dump_stack+0x19/0x1b [ 7654.639843] [<ffffffffb53ff373>] print_address_description+0x33/0x290 [ 7654.647129] [<ffffffffc169b37b>] ? nlattr_set+0x164/0xde9 [openvswitch] [ 7654.654607] [<ffffffffb53ff812>] kasan_report.part.3+0x242/0x330 [ 7654.661406] [<ffffffffb53ff9b4>] __asan_report_load4_noabort+0x34/0x40 [ 7654.668789] [<ffffffffc169b37b>] nlattr_set+0x164/0xde9 [openvswitch] [ 7654.676076] [<ffffffffc167ef68>] ovs_nla_get_match+0x10c8/0x1900 [openvswitch] [ 7654.684234] [<ffffffffb61e9cc8>] ? genl_rcv+0x28/0x40 [ 7654.689968] [<ffffffffb61e7733>] ? netlink_unicast+0x3f3/0x590 [ 7654.696574] [<ffffffffc167dea0>] ? ovs_nla_put_tunnel_info+0xb0/0xb0 [openvswitch] [ 7654.705122] [<ffffffffb4f41b50>] ? unwind_get_return_address+0xb0/0xb0 [ 7654.712503] [<ffffffffb65d9355>] ? system_call_fastpath+0x1c/0x21 [ 7654.719401] [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370 [ 7654.726298] [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370 [ 7654.733195] [<ffffffffb53fe4b5>] ? kasan_unpoison_shadow+0x35/0x50 [ 7654.740187] [<ffffffffb53fe62a>] ? kasan_kmalloc+0xaa/0xe0 [ 7654.746406] [<ffffffffb53fec32>] ? kasan_slab_alloc+0x12/0x20 [ 7654.752914] [<ffffffffb53fe711>] ? memset+0x31/0x40 [ 7654.758456] [<ffffffffc165bf92>] ovs_flow_cmd_new+0x2b2/0xf00 [openvswitch]
[snip]
[ 7655.132484] The buggy address belongs to the variable: [ 7655.138226] ovs_tunnel_key_lens+0xf0/0xffffffffffffd400 [openvswitch] [ 7655.145507] [ 7655.147166] Memory state around the buggy address: [ 7655.152514] ffffffffc169eb80: 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa [ 7655.160585] ffffffffc169ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 7655.168644] >ffffffffc169ec80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa [ 7655.176701] ^ [ 7655.184372] ffffffffc169ed00: fa fa fa fa 00 00 00 00 fa fa fa fa 00 00 00 05 [ 7655.192431] ffffffffc169ed80: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00 [ 7655.200490] ==================================================================
Reported-by: Hangbin Liu liuhangbin@gmail.com
Fixes: 982b52700482 ("openvswitch: Fix mask generation for nested attributes.")
Signed-off-by: Stefano Brivio sbrivio@redhat.com
Reviewed-by: Sabrina Dubroca sd@queasysnail.net
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/openvswitch/flow_netlink.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1141,13 +1141,10 @@ static void nlattr_set(struct nlattr *at
 
 	/* The nlattr stream should already have been validated */
 	nla_for_each_nested(nla, attr, rem) {
-		if (tbl[nla_type(nla)].len == OVS_ATTR_NESTED) {
-			if (tbl[nla_type(nla)].next)
-				tbl = tbl[nla_type(nla)].next;
-			nlattr_set(nla, val, tbl);
-		} else {
+		if (tbl[nla_type(nla)].len == OVS_ATTR_NESTED)
+			nlattr_set(nla, val, tbl[nla_type(nla)].next ? : tbl);
+		else
 			memset(nla_data(nla), val, nla_len(nla));
-		}
 
 		if (nla_type(nla) == OVS_KEY_ATTR_CT_STATE)
 			*(u32 *)nla_data(nla) &= CT_SUPPORTED_MASK;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 3148dedfe79e422f448a10250d3e2cdf8b7ee617 ]
Since commit a92a08499b1f "r8169: improve runtime pm in general and suspend unused ports", interfaces without a link are runtime-suspended after 10s. On systems where drivers take longer to load, this can result in the interface already being runtime-suspended when it is initially brought up. This shouldn't be a problem because rtl_open() resumes MAC/PHY. However, with at least one chip version the interface doesn't properly come up, as reported here: https://bugzilla.kernel.org/show_bug.cgi?id=199549
The vendor driver uses a delay to give certain chip versions some time to resume before starting the PHY configuration. So let's do the same. I don't know which chip versions may be affected, so apply this delay always.
This patch was reported to fix the issue for RTL8168h. I was able to reproduce the issue on an Asus H310I-Plus which also uses a RTL8168h. Also in my case the patch fixed the issue.
Reported-by: Slava Kardakov ojab@ojab.ru
Tested-by: Slava Kardakov ojab@ojab.ru
Signed-off-by: Heiner Kallweit hkallweit1@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/realtek/r8169.c | 3 +++
 1 file changed, 3 insertions(+)
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4832,6 +4832,9 @@ static void rtl_pll_power_down(struct rt
 static void rtl_pll_power_up(struct rtl8169_private *tp)
 {
 	rtl_generic_op(tp, tp->pll_power_ops.up);
+
+	/* give MAC/PHY some time to resume */
+	msleep(20);
 }
 
 static void rtl_init_pll_power_ops(struct rtl8169_private *tp)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit d625329b06e46bd20baf9ee40847d11982569204 ]
Since sctp ipv6 socket also supports v4 addrs, it's possible to compare two v4 addrs in pf v6 .cmp_addr, sctp_inet6_cmp_addr.
However, after commit 1071ec9d453a ("sctp: do not check port in sctp_inet6_cmp_addr"), it no longer calls af1->cmp_addr, which in this case is sctp_v4_cmp_addr, but calls __sctp_v6_cmp_addr, where it handles them as two v6 addrs. This would cause an out-of-bounds crash.
syzbot found this crash when trying to bind two v4 addrs to a v6 socket.
This patch fixes it by adding the process for two v4 addrs in sctp_inet6_cmp_addr.
Fixes: 1071ec9d453a ("sctp: do not check port in sctp_inet6_cmp_addr")
Reported-by: syzbot+cd494c1dd681d4d93ebb@syzkaller.appspotmail.com
Signed-off-by: Xin Long lucien.xin@gmail.com
Acked-by: Neil Horman nhorman@tuxdriver.com
Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sctp/ipv6.c | 3 +++
 1 file changed, 3 insertions(+)
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -863,6 +863,9 @@ static int sctp_inet6_cmp_addr(const uni
 	if (sctp_is_any(sk, addr1) || sctp_is_any(sk, addr2))
 		return 1;
 
+	if (addr1->sa.sa_family == AF_INET && addr2->sa.sa_family == AF_INET)
+		return addr1->v4.sin_addr.s_addr == addr2->v4.sin_addr.s_addr;
+
 	return __sctp_v6_cmp_addr(addr1, addr2);
 }
 
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 46e16d4b956867013e0bbd7f2bad206f4aa55752 ]
When processing a duplicate cookie-echo chunk, for case 'D', sctp will not process the param from this chunk. It means the old asoc has nothing to update, and the new temporary asoc doesn't have the complete info.
So there's no reason to use the new asoc when creating the cookie-ack chunk. Otherwise, for example when auth is enabled for cookie-ack, the chunk cannot be set up with auth, and it will definitely be dropped by the peer.
This issue has been there since the very beginning, and we fix it by using the old asoc instead.
Signed-off-by: Xin Long lucien.xin@gmail.com
Acked-by: Neil Horman nhorman@tuxdriver.com
Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sctp/sm_statefuns.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -1959,7 +1959,7 @@ static sctp_disposition_t sctp_sf_do_dup
 		}
 	}
 
-	repl = sctp_make_cookie_ack(new_asoc, chunk);
+	repl = sctp_make_cookie_ack(asoc, chunk);
 	if (!repl)
 		goto nomem;
 
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit d89a2adb8bfe6f8949ff389acdb9fa298b6e8e12 ]
tg3_free_consistent() calls dma_free_coherent() to free tp->hw_stats under spinlock and can trigger BUG_ON() in vunmap() because vunmap() may sleep. Fix it by removing the spinlock and relying on the TG3_FLAG_INIT_COMPLETE flag to prevent race conditions between tg3_get_stats64() and tg3_free_consistent(). TG3_FLAG_INIT_COMPLETE is always cleared under tp->lock before tg3_free_consistent() and therefore tg3_get_stats64() can safely access tp->hw_stats under tp->lock if TG3_FLAG_INIT_COMPLETE is set.
Fixes: f5992b72ebe0 ("tg3: Fix race condition in tg3_get_stats64().")
Reported-by: Zumeng Chen zumeng.chen@gmail.com
Signed-off-by: Michael Chan michael.chan@broadcom.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/broadcom/tg3.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -8722,14 +8722,15 @@ static void tg3_free_consistent(struct t
 	tg3_mem_rx_release(tp);
 	tg3_mem_tx_release(tp);
 
-	/* Protect tg3_get_stats64() from reading freed tp->hw_stats. */
-	tg3_full_lock(tp, 0);
+	/* tp->hw_stats can be referenced safely:
+	 * 1. under rtnl_lock
+	 * 2. or under tp->lock if TG3_FLAG_INIT_COMPLETE is set.
+	 */
 	if (tp->hw_stats) {
 		dma_free_coherent(&tp->pdev->dev, sizeof(struct tg3_hw_stats),
 				  tp->hw_stats, tp->stats_mapping);
 		tp->hw_stats = NULL;
 	}
-	tg3_full_unlock(tp);
 }
 
 /*
@@ -14163,7 +14164,7 @@ static struct rtnl_link_stats64 *tg3_get
 	struct tg3 *tp = netdev_priv(dev);
 
 	spin_lock_bh(&tp->lock);
-	if (!tp->hw_stats) {
+	if (!tp->hw_stats || !tg3_flag(tp, INIT_COMPLETE)) {
 		*stats = tp->net_stats_prev;
 		spin_unlock_bh(&tp->lock);
 		return stats;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Debabrata Banerjee dbanerje@akamai.com
[ Upstream commit 4fa8667ca3989ce14cf66301fa251544fbddbdd0 ]
Make sure multicast, broadcast, and zero MACs cannot be the output of rlb updates, which should all be directed ARPs. Receive load balancing will collapse if any of these happen, as the switch will broadcast.
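For reference, a conceptual sketch of the filter that is_valid_ether_addr() applies (simplified; the kernel helper is implemented differently but rejects the same classes of address named above):

#include <stdbool.h>

/* Simplified sketch, not the kernel helper itself. */
static bool sketch_is_valid_ether_addr(const unsigned char *addr)
{
	bool multicast = addr[0] & 0x01;	/* broadcast is a multicast address */
	bool zero = !(addr[0] | addr[1] | addr[2] |
		      addr[3] | addr[4] | addr[5]);

	return !multicast && !zero;		/* only unicast, non-zero MACs pass */
}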
Signed-off-by: Debabrata Banerjee dbanerje@akamai.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/bonding/bond_alb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -453,7 +453,7 @@ static void rlb_update_client(struct rlb
 {
 	int i;
 
-	if (!client_info->slave)
+	if (!client_info->slave || !is_valid_ether_addr(client_info->mac_dst))
 		return;
 
 	for (i = 0; i < RLB_ARP_BURST_SIZE; i++) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuchung Cheng ycheng@google.com
[ Upstream commit 16ae6aa1705299789f71fdea59bfb119c1fbd9c0 ]
The TCP repair sequence of operation is to first set the socket in repair mode, then inject the TCP stats into the socket with repair socket options, then call connect() to re-activate the socket. The connect syscall simply returns and sets the state to ESTABLISHED. As a result, Fast Open is meaningless for TCP repair.
However, allowing a sendto() system call with the MSG_FASTOPEN flag halfway through the repair operation could unexpectedly cause data to be sent before the operation finishes changing the internal TCP stats (e.g. MSS). This in turn triggers TCP warnings about inconsistent packet accounting.
The fix is to simply disallow Fast Open operation once the socket is in the repair mode.
Reported-by: syzbot syzkaller@googlegroups.com
Signed-off-by: Yuchung Cheng ycheng@google.com
Reviewed-by: Neal Cardwell ncardwell@google.com
Reviewed-by: Eric Dumazet edumazet@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/ipv4/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1108,7 +1108,7 @@ int tcp_sendmsg(struct sock *sk, struct
 	lock_sock(sk);
 
 	flags = msg->msg_flags;
-	if (flags & MSG_FASTOPEN) {
+	if ((flags & MSG_FASTOPEN) && !tp->repair) {
 		err = tcp_sendmsg_fastopen(sk, msg, &copied_syn, size);
 		if (err == -EINPROGRESS && copied_syn > 0)
 			goto out;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit ce402f044e4e432c296f90eaabb8dbe8f3624391 ]
When auth is enabled for cookie-ack chunk, in sctp_inq_pop, sctp processes auth chunk first, then continues to the next chunk in this packet if chunk_end + chunk_hdr size < skb_tail_pointer(). Otherwise, it will go to the next packet or discard this chunk.
However, it missed the fact that the cookie-ack chunk's size is equal to the chunk_hdr size, so it can never satisfy that check, and thus this chunk would not get processed.
This patch fixes it by changing the check to chunk_end + chunk_hdr size <= skb_tail_pointer().
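A small pointer-arithmetic sketch of the boundary case described above (illustrative buffer sizes; 4 bytes is the size of the SCTP chunk header):

#include <stdio.h>

int main(void)
{
	char packet[24];			/* pretend: AUTH chunk, then a bare COOKIE-ACK */
	char *tail = packet + sizeof(packet);	/* skb_tail_pointer() equivalent */
	char *chunk_end = tail - 4;		/* end of the AUTH chunk just processed */

	/* the trailing COOKIE-ACK is nothing but a 4-byte chunk header */
	printf("old test (<):  %d\n", chunk_end + 4 <  tail);	/* 0: chunk missed */
	printf("new test (<=): %d\n", chunk_end + 4 <= tail);	/* 1: chunk processed */
	return 0;
}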
Fixes: 26b87c788100 ("net: sctp: fix remote memory pressure from excessive queueing")
Signed-off-by: Xin Long lucien.xin@gmail.com
Acked-by: Neil Horman nhorman@tuxdriver.com
Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sctp/inqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -178,7 +178,7 @@ struct sctp_chunk *sctp_inq_pop(struct s
 	skb_pull(chunk->skb, sizeof(sctp_chunkhdr_t));
 	chunk->subh.v = NULL; /* Subheader is no longer valid.  */
 
-	if (chunk->chunk_end + sizeof(sctp_chunkhdr_t) <
+	if (chunk->chunk_end + sizeof(sctp_chunkhdr_t) <=
 	    skb_tail_pointer(chunk->skb)) {
 		/* This is not a singleton */
 		chunk->singleton = 0;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 59d8d4434f429b4fa8a346fd889058bda427a837 ]
Now sctp only delays the authentication for the normal cookie-echo chunk by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But for the duplicated one with auth, in sctp_assoc_bh_rcv(), it does authentication first based on the old asoc, which will definitely fail due to the different auth info in the old asoc.
The duplicated cookie-echo chunk will create a new asoc with the auth info from this chunk, and the authentication should also be done with the new asoc's auth info for all of the collision 'A', 'B' and 'D'. Otherwise, the duplicated cookie-echo chunk with auth will never pass the authentication and create the new connection.
This issue has existed since the very beginning, and this fix is to make sctp_assoc_bh_rcv() follow the way sctp_endpoint_bh_rcv() does for the normal cookie-echo chunk to delay the authentication.
While at it, remove the unused params from sctp_sf_authenticate() and define sctp_auth_chunk_verify() used for all the places that do the delayed authentication.
v1->v2: fix the typo in changelog as Marcelo noticed.
Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com
Signed-off-by: Xin Long lucien.xin@gmail.com
Acked-by: Neil Horman nhorman@tuxdriver.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sctp/associola.c    | 30 ++++++++++++++++
 net/sctp/sm_statefuns.c | 87 ++++++++++++++++++++++++++----------------------
 2 files changed, 77 insertions(+), 40 deletions(-)
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1000,9 +1000,10 @@ static void sctp_assoc_bh_rcv(struct wor
 	struct sctp_endpoint *ep;
 	struct sctp_chunk *chunk;
 	struct sctp_inq *inqueue;
-	int state;
 	sctp_subtype_t subtype;
+	int first_time = 1;	/* is this the first time through the loop */
 	int error = 0;
+	int state;
 
 	/* The association should be held so we should be safe. */
 	ep = asoc->ep;
@@ -1013,6 +1014,30 @@ static void sctp_assoc_bh_rcv(struct wor
 		state = asoc->state;
 		subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
 
+		/* If the first chunk in the packet is AUTH, do special
+		 * processing specified in Section 6.3 of SCTP-AUTH spec
+		 */
+		if (first_time && subtype.chunk == SCTP_CID_AUTH) {
+			struct sctp_chunkhdr *next_hdr;
+
+			next_hdr = sctp_inq_peek(inqueue);
+			if (!next_hdr)
+				goto normal;
+
+			/* If the next chunk is COOKIE-ECHO, skip the AUTH
+			 * chunk while saving a pointer to it so we can do
+			 * Authentication later (during cookie-echo
+			 * processing).
+			 */
+			if (next_hdr->type == SCTP_CID_COOKIE_ECHO) {
+				chunk->auth_chunk = skb_clone(chunk->skb,
+							      GFP_ATOMIC);
+				chunk->auth = 1;
+				continue;
+			}
+		}
+
+normal:
 		/* SCTP-AUTH, Section 6.3:
 		 *    The receiver has a list of chunk types which it expects
 		 *    to be received only after an AUTH-chunk.  This list has
@@ -1051,6 +1076,9 @@ static void sctp_assoc_bh_rcv(struct wor
 		/* If there is an error on chunk, discard this packet. */
 		if (error && chunk)
 			chunk->pdiscard = 1;
+
+		if (first_time)
+			first_time = 0;
 	}
 	sctp_association_put(asoc);
 }
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -144,10 +144,8 @@ static sctp_disposition_t sctp_sf_violat
 				     void *arg,
 				     sctp_cmd_seq_t *commands);
 
-static sctp_ierror_t sctp_sf_authenticate(struct net *net,
-				    const struct sctp_endpoint *ep,
+static sctp_ierror_t sctp_sf_authenticate(
 				    const struct sctp_association *asoc,
-				    const sctp_subtype_t type,
 				    struct sctp_chunk *chunk);
 
 static sctp_disposition_t __sctp_sf_do_9_1_abort(struct net *net,
@@ -615,6 +613,38 @@ sctp_disposition_t sctp_sf_do_5_1C_ack(s
 	return SCTP_DISPOSITION_CONSUME;
 }
 
+static bool sctp_auth_chunk_verify(struct net *net, struct sctp_chunk *chunk,
+				   const struct sctp_association *asoc)
+{
+	struct sctp_chunk auth;
+
+	if (!chunk->auth_chunk)
+		return true;
+
+	/* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
+	 * is supposed to be authenticated and we have to do delayed
+	 * authentication.  We've just recreated the association using
+	 * the information in the cookie and now it's much easier to
+	 * do the authentication.
+	 */
+
+	/* Make sure that we and the peer are AUTH capable */
+	if (!net->sctp.auth_enable || !asoc->peer.auth_capable)
+		return false;
+
+	/* set-up our fake chunk so that we can process it */
+	auth.skb = chunk->auth_chunk;
+	auth.asoc = chunk->asoc;
+	auth.sctp_hdr = chunk->sctp_hdr;
+	auth.chunk_hdr = (struct sctp_chunkhdr *)
+				skb_push(chunk->auth_chunk,
+					 sizeof(struct sctp_chunkhdr));
+	skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
+	auth.transport = chunk->transport;
+
+	return sctp_sf_authenticate(asoc, &auth) == SCTP_IERROR_NO_ERROR;
+}
+
 /*
  * Respond to a normal COOKIE ECHO chunk.
  * We are the side that is being asked for an association.
@@ -751,36 +781,9 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(st
 	if (error)
 		goto nomem_init;
 
-	/* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
-	 * is supposed to be authenticated and we have to do delayed
-	 * authentication.  We've just recreated the association using
-	 * the information in the cookie and now it's much easier to
-	 * do the authentication.
-	 */
-	if (chunk->auth_chunk) {
-		struct sctp_chunk auth;
-		sctp_ierror_t ret;
-
-		/* Make sure that we and the peer are AUTH capable */
-		if (!net->sctp.auth_enable || !new_asoc->peer.auth_capable) {
-			sctp_association_free(new_asoc);
-			return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
-		}
-
-		/* set-up our fake chunk so that we can process it */
-		auth.skb = chunk->auth_chunk;
-		auth.asoc = chunk->asoc;
-		auth.sctp_hdr = chunk->sctp_hdr;
-		auth.chunk_hdr = (sctp_chunkhdr_t *)skb_push(chunk->auth_chunk,
-					    sizeof(sctp_chunkhdr_t));
-		skb_pull(chunk->auth_chunk, sizeof(sctp_chunkhdr_t));
-		auth.transport = chunk->transport;
-
-		ret = sctp_sf_authenticate(net, ep, new_asoc, type, &auth);
-		if (ret != SCTP_IERROR_NO_ERROR) {
-			sctp_association_free(new_asoc);
-			return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
-		}
+	if (!sctp_auth_chunk_verify(net, chunk, new_asoc)) {
+		sctp_association_free(new_asoc);
+		return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
 	}
 
 	repl = sctp_make_cookie_ack(new_asoc, chunk);
@@ -1717,13 +1720,15 @@ static sctp_disposition_t sctp_sf_do_dup
 			       GFP_ATOMIC))
 		goto nomem;
 
+	if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
+		return SCTP_DISPOSITION_DISCARD;
+
 	/* Make sure no new addresses are being added during the
 	 * restart.  Though this is a pretty complicated attack
 	 * since you'd have to get inside the cookie.
 	 */
-	if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands)) {
+	if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands))
 		return SCTP_DISPOSITION_CONSUME;
-	}
 
 	/* If the endpoint is in the SHUTDOWN-ACK-SENT state and recognizes
 	 * the peer has restarted (Action A), it MUST NOT setup a new
@@ -1828,6 +1833,9 @@ static sctp_disposition_t sctp_sf_do_dup
 			       GFP_ATOMIC))
 		goto nomem;
 
+	if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
+		return SCTP_DISPOSITION_DISCARD;
+
 	/* Update the content of current association. */
 	sctp_add_cmd_sf(commands, SCTP_CMD_UPDATE_ASSOC, SCTP_ASOC(new_asoc));
 	sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
@@ -1920,6 +1928,9 @@ static sctp_disposition_t sctp_sf_do_dup
 	 * a COOKIE ACK.
 	 */
 
+	if (!sctp_auth_chunk_verify(net, chunk, asoc))
+		return SCTP_DISPOSITION_DISCARD;
+
 	/* Don't accidentally move back into established state. */
 	if (asoc->state < SCTP_STATE_ESTABLISHED) {
 		sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_STOP,
@@ -3985,10 +3996,8 @@ gen_shutdown:
  *
  * The return value is the disposition of the chunk.
  */
-static sctp_ierror_t sctp_sf_authenticate(struct net *net,
-				    const struct sctp_endpoint *ep,
+static sctp_ierror_t sctp_sf_authenticate(
 				    const struct sctp_association *asoc,
-				    const sctp_subtype_t type,
 				    struct sctp_chunk *chunk)
 {
 	struct sctp_authhdr *auth_hdr;
@@ -4087,7 +4096,7 @@ sctp_disposition_t sctp_sf_eat_auth(stru
 				    commands);
 
 	auth_hdr = (struct sctp_authhdr *)chunk->skb->data;
-	error = sctp_sf_authenticate(net, ep, asoc, type, chunk);
+	error = sctp_sf_authenticate(asoc, chunk);
 	switch (error) {
 	case SCTP_IERROR_AUTH_BAD_HMAC:
 		/* Generate the ERROR chunk and discard the rest
On Thu, 2018-05-24 at 11:37 +0200, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 59d8d4434f429b4fa8a346fd889058bda427a837 ]
Now SCTP only delays the authentication for the normal cookie-echo chunk, by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But for a duplicated cookie-echo bundled with auth, sctp_assoc_bh_rcv() does the authentication first based on the old asoc, which will definitely fail due to the different auth info in the old asoc.
[...]
--- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -1000,9 +1000,10 @@ static void sctp_assoc_bh_rcv(struct wor struct sctp_endpoint *ep; struct sctp_chunk *chunk; struct sctp_inq *inqueue;
- int state;
sctp_subtype_t subtype;
+ int first_time = 1; /* is this the first time through the loop */
int error = 0;
+ int state;
/* The association should be held so we should be safe. */ ep = asoc->ep; @@ -1013,6 +1014,30 @@ static void sctp_assoc_bh_rcv(struct wor state = asoc->state; subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
/* If the first chunk in the packet is AUTH, do special
* processing specified in Section 6.3 of SCTP-AUTH spec
*/
if (first_time && subtype.chunk == SCTP_CID_AUTH) {
struct sctp_chunkhdr *next_hdr;
next_hdr = sctp_inq_peek(inqueue);
if (!next_hdr)
goto normal;
/* If the next chunk is COOKIE-ECHO, skip the AUTH
* chunk while saving a pointer to it so we can do
* Authentication later (during cookie-echo
* processing).
*/
if (next_hdr->type == SCTP_CID_COOKIE_ECHO) {
chunk->auth_chunk = skb_clone(chunk->skb,
GFP_ATOMIC);
chunk->auth = 1;
Doesn't the first_time flag need to be cleared here (and before the other continue statement in this loop)?
Ben.
continue;
}
}
+normal:
[...]
On Wed, Jun 06, 2018 at 11:31:47PM +0100, Ben Hutchings wrote:
On Thu, 2018-05-24 at 11:37 +0200, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 59d8d4434f429b4fa8a346fd889058bda427a837 ]
Now SCTP only delays the authentication for the normal cookie-echo chunk, by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But for a duplicated cookie-echo bundled with auth, sctp_assoc_bh_rcv() does the authentication first based on the old asoc, which will definitely fail due to the different auth info in the old asoc.
[...]
--- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -1000,9 +1000,10 @@ static void sctp_assoc_bh_rcv(struct wor struct sctp_endpoint *ep; struct sctp_chunk *chunk; struct sctp_inq *inqueue;
- int state;
sctp_subtype_t subtype;
+ int first_time = 1; /* is this the first time through the loop */
int error = 0;
+ int state;
/* The association should be held so we should be safe. */ ep = asoc->ep; @@ -1013,6 +1014,30 @@ static void sctp_assoc_bh_rcv(struct wor state = asoc->state; subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
/* If the first chunk in the packet is AUTH, do special
* processing specified in Section 6.3 of SCTP-AUTH spec
*/
if (first_time && subtype.chunk == SCTP_CID_AUTH) {
struct sctp_chunkhdr *next_hdr;
next_hdr = sctp_inq_peek(inqueue);
if (!next_hdr)
goto normal;
/* If the next chunk is COOKIE-ECHO, skip the AUTH
* chunk while saving a pointer to it so we can do
* Authentication later (during cookie-echo
* processing).
*/
if (next_hdr->type == SCTP_CID_COOKIE_ECHO) {
chunk->auth_chunk = skb_clone(chunk->skb,
GFP_ATOMIC);
chunk->auth = 1;
Doesn't the first_time flag need to be cleared here (and before the other continue statement in this loop)?
It seems the description doesn't match the code very closely. As is, first_time is about the first time an AUTH chunk followed by a COOKIE-ECHO chunk is handled (which is what we wanted, in the end), and not strictly about the 'first chunk in the packet', as the comment says.
We should rename first_time to a chunk counter instead. It may even help with debugging crashes.
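Roughly along these lines (untested sketch, just to illustrate the idea):

	int chunks = 0;		/* chunks handled so far in this packet */

	while (NULL != (chunk = sctp_inq_pop(inqueue))) {
		state = asoc->state;
		subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
		chunks++;

		/* Only the first chunk of the packet gets the
		 * AUTH + COOKIE-ECHO treatment; the counter is bumped
		 * before any continue, so it can't be left stale.
		 */
		if (chunks == 1 && subtype.chunk == SCTP_CID_AUTH) {
			...
		}
		...
	}

With that, the question about clearing the flag before each continue goes away.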
Thanks for reviewing this, btw.
Marcelo
Ben.
continue;
}
}
+normal:
[...]
-- Ben Hutchings, Software Developer Codethink Ltd https://www.codethink.co.uk/ Dale House, 35 Dale Street Manchester, M1 2HF, United Kingdom
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit f65e0d299807d8a11812845c972493c3f9a18e10 upstream.
snd_timer_notify1() is called outside the spinlock and it retakes the lock after the unlock. This is rather racy, and it's safer to move the snd_timer_notify1() call inside the main spinlock.
The patch also contains a slight refactoring / cleanup of the code. Now all of start/stop/continue/pause look more symmetric and a bit more readable.
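The shape of the change, reduced to its essence (sketch for illustration only; the full conversion is in the diff below):

	/* before: the lock is dropped, then snd_timer_notify1() is called
	 * and takes timer->lock again by itself
	 */
	spin_lock_irqsave(&timer->lock, flags);
	/* ... start the timer ... */
	spin_unlock_irqrestore(&timer->lock, flags);
	if (result >= 0)
		snd_timer_notify1(timeri, SNDRV_TIMER_EVENT_START);

	/* after: the notification is issued while timer->lock is still held,
	 * and snd_timer_notify1() no longer takes the lock itself
	 */
	spin_lock_irqsave(&timer->lock, flags);
	/* ... start the timer ... */
	snd_timer_notify1(timeri, start ? SNDRV_TIMER_EVENT_START :
					  SNDRV_TIMER_EVENT_CONTINUE);
	spin_unlock_irqrestore(&timer->lock, flags);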
Signed-off-by: Takashi Iwai tiwai@suse.de Cc: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/core/timer.c | 220 ++++++++++++++++++++++++----------------------------- 1 file changed, 102 insertions(+), 118 deletions(-)
--- a/sound/core/timer.c +++ b/sound/core/timer.c @@ -318,8 +318,6 @@ int snd_timer_open(struct snd_timer_inst return 0; }
-static int _snd_timer_stop(struct snd_timer_instance *timeri, int event); - /* * close a timer instance */ @@ -408,7 +406,6 @@ unsigned long snd_timer_resolution(struc static void snd_timer_notify1(struct snd_timer_instance *ti, int event) { struct snd_timer *timer; - unsigned long flags; unsigned long resolution = 0; struct snd_timer_instance *ts; struct timespec tstamp; @@ -432,34 +429,66 @@ static void snd_timer_notify1(struct snd return; if (timer->hw.flags & SNDRV_TIMER_HW_SLAVE) return; - spin_lock_irqsave(&timer->lock, flags); list_for_each_entry(ts, &ti->slave_active_head, active_list) if (ts->ccallback) ts->ccallback(ts, event + 100, &tstamp, resolution); - spin_unlock_irqrestore(&timer->lock, flags); }
-static int snd_timer_start1(struct snd_timer *timer, struct snd_timer_instance *timeri, - unsigned long sticks) +/* start/continue a master timer */ +static int snd_timer_start1(struct snd_timer_instance *timeri, + bool start, unsigned long ticks) { + struct snd_timer *timer; + int result; + unsigned long flags; + + timer = timeri->timer; + if (!timer) + return -EINVAL; + + spin_lock_irqsave(&timer->lock, flags); + if (timer->card && timer->card->shutdown) { + result = -ENODEV; + goto unlock; + } + if (timeri->flags & (SNDRV_TIMER_IFLG_RUNNING | + SNDRV_TIMER_IFLG_START)) { + result = -EBUSY; + goto unlock; + } + + if (start) + timeri->ticks = timeri->cticks = ticks; + else if (!timeri->cticks) + timeri->cticks = 1; + timeri->pticks = 0; + list_move_tail(&timeri->active_list, &timer->active_list_head); if (timer->running) { if (timer->hw.flags & SNDRV_TIMER_HW_SLAVE) goto __start_now; timer->flags |= SNDRV_TIMER_FLG_RESCHED; timeri->flags |= SNDRV_TIMER_IFLG_START; - return 1; /* delayed start */ + result = 1; /* delayed start */ } else { - timer->sticks = sticks; + if (start) + timer->sticks = ticks; timer->hw.start(timer); __start_now: timer->running++; timeri->flags |= SNDRV_TIMER_IFLG_RUNNING; - return 0; + result = 0; } + snd_timer_notify1(timeri, start ? SNDRV_TIMER_EVENT_START : + SNDRV_TIMER_EVENT_CONTINUE); + unlock: + spin_unlock_irqrestore(&timer->lock, flags); + return result; }
-static int snd_timer_start_slave(struct snd_timer_instance *timeri) +/* start/continue a slave timer */ +static int snd_timer_start_slave(struct snd_timer_instance *timeri, + bool start) { unsigned long flags;
@@ -473,88 +502,37 @@ static int snd_timer_start_slave(struct spin_lock(&timeri->timer->lock); list_add_tail(&timeri->active_list, &timeri->master->slave_active_head); + snd_timer_notify1(timeri, start ? SNDRV_TIMER_EVENT_START : + SNDRV_TIMER_EVENT_CONTINUE); spin_unlock(&timeri->timer->lock); } spin_unlock_irqrestore(&slave_active_lock, flags); return 1; /* delayed start */ }
-/* - * start the timer instance - */ -int snd_timer_start(struct snd_timer_instance *timeri, unsigned int ticks) -{ - struct snd_timer *timer; - int result = -EINVAL; - unsigned long flags; - - if (timeri == NULL || ticks < 1) - return -EINVAL; - if (timeri->flags & SNDRV_TIMER_IFLG_SLAVE) { - result = snd_timer_start_slave(timeri); - if (result >= 0) - snd_timer_notify1(timeri, SNDRV_TIMER_EVENT_START); - return result; - } - timer = timeri->timer; - if (timer == NULL) - return -EINVAL; - if (timer->card && timer->card->shutdown) - return -ENODEV; - spin_lock_irqsave(&timer->lock, flags); - if (timeri->flags & (SNDRV_TIMER_IFLG_RUNNING | - SNDRV_TIMER_IFLG_START)) { - result = -EBUSY; - goto unlock; - } - timeri->ticks = timeri->cticks = ticks; - timeri->pticks = 0; - result = snd_timer_start1(timer, timeri, ticks); - unlock: - spin_unlock_irqrestore(&timer->lock, flags); - if (result >= 0) - snd_timer_notify1(timeri, SNDRV_TIMER_EVENT_START); - return result; -} - -static int _snd_timer_stop(struct snd_timer_instance *timeri, int event) +/* stop/pause a master timer */ +static int snd_timer_stop1(struct snd_timer_instance *timeri, bool stop) { struct snd_timer *timer; + int result = 0; unsigned long flags;
- if (snd_BUG_ON(!timeri)) - return -ENXIO; - - if (timeri->flags & SNDRV_TIMER_IFLG_SLAVE) { - spin_lock_irqsave(&slave_active_lock, flags); - if (!(timeri->flags & SNDRV_TIMER_IFLG_RUNNING)) { - spin_unlock_irqrestore(&slave_active_lock, flags); - return -EBUSY; - } - if (timeri->timer) - spin_lock(&timeri->timer->lock); - timeri->flags &= ~SNDRV_TIMER_IFLG_RUNNING; - list_del_init(&timeri->ack_list); - list_del_init(&timeri->active_list); - if (timeri->timer) - spin_unlock(&timeri->timer->lock); - spin_unlock_irqrestore(&slave_active_lock, flags); - goto __end; - } timer = timeri->timer; if (!timer) return -EINVAL; spin_lock_irqsave(&timer->lock, flags); if (!(timeri->flags & (SNDRV_TIMER_IFLG_RUNNING | SNDRV_TIMER_IFLG_START))) { - spin_unlock_irqrestore(&timer->lock, flags); - return -EBUSY; + result = -EBUSY; + goto unlock; } list_del_init(&timeri->ack_list); list_del_init(&timeri->active_list); - if (timer->card && timer->card->shutdown) { - spin_unlock_irqrestore(&timer->lock, flags); - return 0; + if (timer->card && timer->card->shutdown) + goto unlock; + if (stop) { + timeri->cticks = timeri->ticks; + timeri->pticks = 0; } if ((timeri->flags & SNDRV_TIMER_IFLG_RUNNING) && !(--timer->running)) { @@ -569,35 +547,60 @@ static int _snd_timer_stop(struct snd_ti } } timeri->flags &= ~(SNDRV_TIMER_IFLG_RUNNING | SNDRV_TIMER_IFLG_START); + snd_timer_notify1(timeri, stop ? SNDRV_TIMER_EVENT_STOP : + SNDRV_TIMER_EVENT_CONTINUE); + unlock: spin_unlock_irqrestore(&timer->lock, flags); - __end: - if (event != SNDRV_TIMER_EVENT_RESOLUTION) - snd_timer_notify1(timeri, event); + return result; +} + +/* stop/pause a slave timer */ +static int snd_timer_stop_slave(struct snd_timer_instance *timeri, bool stop) +{ + unsigned long flags; + + spin_lock_irqsave(&slave_active_lock, flags); + if (!(timeri->flags & SNDRV_TIMER_IFLG_RUNNING)) { + spin_unlock_irqrestore(&slave_active_lock, flags); + return -EBUSY; + } + timeri->flags &= ~SNDRV_TIMER_IFLG_RUNNING; + if (timeri->timer) { + spin_lock(&timeri->timer->lock); + list_del_init(&timeri->ack_list); + list_del_init(&timeri->active_list); + snd_timer_notify1(timeri, stop ? SNDRV_TIMER_EVENT_STOP : + SNDRV_TIMER_EVENT_CONTINUE); + spin_unlock(&timeri->timer->lock); + } + spin_unlock_irqrestore(&slave_active_lock, flags); return 0; }
/* + * start the timer instance + */ +int snd_timer_start(struct snd_timer_instance *timeri, unsigned int ticks) +{ + if (timeri == NULL || ticks < 1) + return -EINVAL; + if (timeri->flags & SNDRV_TIMER_IFLG_SLAVE) + return snd_timer_start_slave(timeri, true); + else + return snd_timer_start1(timeri, true, ticks); +} + +/* * stop the timer instance. * * do not call this from the timer callback! */ int snd_timer_stop(struct snd_timer_instance *timeri) { - struct snd_timer *timer; - unsigned long flags; - int err; - - err = _snd_timer_stop(timeri, SNDRV_TIMER_EVENT_STOP); - if (err < 0) - return err; - timer = timeri->timer; - if (!timer) - return -EINVAL; - spin_lock_irqsave(&timer->lock, flags); - timeri->cticks = timeri->ticks; - timeri->pticks = 0; - spin_unlock_irqrestore(&timer->lock, flags); - return 0; + if (timeri->flags & SNDRV_TIMER_IFLG_SLAVE) + return snd_timer_stop_slave(timeri, true); + else + return snd_timer_stop1(timeri, true); }
/* @@ -605,32 +608,10 @@ int snd_timer_stop(struct snd_timer_inst */ int snd_timer_continue(struct snd_timer_instance *timeri) { - struct snd_timer *timer; - int result = -EINVAL; - unsigned long flags; - - if (timeri == NULL) - return result; if (timeri->flags & SNDRV_TIMER_IFLG_SLAVE) - return snd_timer_start_slave(timeri); - timer = timeri->timer; - if (! timer) - return -EINVAL; - if (timer->card && timer->card->shutdown) - return -ENODEV; - spin_lock_irqsave(&timer->lock, flags); - if (timeri->flags & SNDRV_TIMER_IFLG_RUNNING) { - result = -EBUSY; - goto unlock; - } - if (!timeri->cticks) - timeri->cticks = 1; - timeri->pticks = 0; - result = snd_timer_start1(timer, timeri, timer->sticks); - unlock: - spin_unlock_irqrestore(&timer->lock, flags); - snd_timer_notify1(timeri, SNDRV_TIMER_EVENT_CONTINUE); - return result; + return snd_timer_start_slave(timeri, false); + else + return snd_timer_start1(timeri, false, 0); }
/* @@ -638,7 +619,10 @@ int snd_timer_continue(struct snd_timer_ */ int snd_timer_pause(struct snd_timer_instance * timeri) { - return _snd_timer_stop(timeri, SNDRV_TIMER_EVENT_PAUSE); + if (timeri->flags & SNDRV_TIMER_IFLG_SLAVE) + return snd_timer_stop_slave(timeri, false); + else + return snd_timer_stop1(timeri, false); }
/*
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Guy Briggs rgb@redhat.com
commit 76a658c20efd541a62838d9ff68ce94170d7a549 upstream.
Move the calculation of the values to after the allocation, in case the allocation fails. This avoids wasting effort in the rare case that it fails, but more importantly saves us the extra logic to release the tty ref.
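In other words (sketch of the resulting order only; the actual diff follows):

	ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_LOGIN);
	if (!ab)
		return;		/* nothing to unwind: no tty reference taken yet */

	uid = from_kuid(&init_user_ns, task_uid(current));
	/* ... compute the remaining values ... */
	tty = audit_get_tty(current);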
Signed-off-by: Richard Guy Briggs rgb@redhat.com Signed-off-by: Paul Moore paul@paul-moore.com Cc: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/auditsc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -1981,14 +1981,15 @@ static void audit_log_set_loginuid(kuid_ if (!audit_enabled) return;
+ ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_LOGIN); + if (!ab) + return; + uid = from_kuid(&init_user_ns, task_uid(current)); oldloginuid = from_kuid(&init_user_ns, koldloginuid); loginuid = from_kuid(&init_user_ns, kloginuid), tty = audit_get_tty(current);
- ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_LOGIN); - if (!ab) - return; audit_log_format(ab, "pid=%d uid=%u", task_pid_nr(current), uid); audit_log_task_context(ab); audit_log_format(ab, " old-auid=%u auid=%u tty=%s old-ses=%u ses=%u res=%d",
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel ard.biesheuvel@linaro.org
commit 30b5ba5cf333cc650e474eaf2cc1ae91bc7cf89f upstream.
Implement a macro mov_q that can be used to move an immediate constant into a 64-bit register, using between 2 and 4 movz/movk instructions (depending on the operand)
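As an illustration of the selection logic (not part of the patch; the helper name below is made up), the same checks can be written in C:

#include <stdint.h>
#include <stdio.h>

/* Mirror of the .if checks in mov_q: 2 instructions when bits 63..31
 * are all zero or all one (value fits a signed 32-bit immediate),
 * 3 when bits 63..47 are all zero or all one, otherwise 4.
 */
static int mov_q_insn_count(uint64_t val)
{
	if ((val >> 31) == 0 || (val >> 31) == 0x1ffffffffULL)
		return 2;	/* movz :abs_g1_s: + movk :abs_g0_nc: */
	if ((val >> 47) == 0 || (val >> 47) == 0x1ffffULL)
		return 3;	/* movz :abs_g2_s: + two movk */
	return 4;		/* movz :abs_g3: + three movk */
}

int main(void)
{
	printf("%d\n", mov_q_insn_count(0x1234ULL));             /* 2 */
	printf("%d\n", mov_q_insn_count(0x123456780000ULL));     /* 3 */
	printf("%d\n", mov_q_insn_count(0x123456789abcdef0ULL)); /* 4 */
	return 0;
}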
Acked-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/include/asm/assembler.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
--- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -204,4 +204,24 @@ lr .req x30 // link register .size __pi_##x, . - x; \ ENDPROC(x)
+ /* + * mov_q - move an immediate constant into a 64-bit register using + * between 2 and 4 movz/movk instructions (depending on the + * magnitude and sign of the operand) + */ + .macro mov_q, reg, val + .if (((\val) >> 31) == 0 || ((\val) >> 31) == 0x1ffffffff) + movz \reg, :abs_g1_s:\val + .else + .if (((\val) >> 47) == 0 || ((\val) >> 47) == 0x1ffff) + movz \reg, :abs_g2_s:\val + .else + movz \reg, :abs_g3:\val + movk \reg, :abs_g2_nc:\val + .endif + movk \reg, :abs_g1_nc:\val + .endif + movk \reg, :abs_g0_nc:\val + .endm + #endif /* __ASM_ASSEMBLER_H */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Suzuki K Poulose suzuki.poulose@arm.com
commit ece1397cbc89c51914fae1aec729539cfd8bd62b upstream.
Some variants of the Arm Cortex-A55 core (r0p0, r0p1, r1p0) suffer from erratum 1024718, which causes incorrect updates when the DBM/AP bits in a page table entry are modified without a break-before-make sequence. The workaround is to skip enabling the hardware DBM feature on the affected cores. The hardware Access Flag management feature is not affected. There are some other cores suffering from this erratum, which could be added to the midr_list to trigger the workaround.
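The check added by the new cpu_midr_match assembler macro is, in C terms, roughly the following (illustration only; the function name is made up and the real check runs in assembly in __cpu_setup):

#include <stdbool.h>
#include <stdint.h>

/* Masks/macros as in arch/arm64/include/asm/cputype.h */
#define MIDR_REVISION_MASK	0xf
#define MIDR_VARIANT_SHIFT	20
#define MIDR_VARIANT_MASK	(0xf << MIDR_VARIANT_SHIFT)
#define MIDR_CPU_VAR_REV(var, rev)	(((var) << MIDR_VARIANT_SHIFT) | (rev))

/* Does this MIDR match the given part and fall within [rv_min, rv_max]? */
static bool midr_in_range(uint32_t midr, uint32_t part_mask, uint32_t model,
			  uint32_t rv_min, uint32_t rv_max)
{
	uint32_t rv = midr & (MIDR_REVISION_MASK | MIDR_VARIANT_MASK);

	if ((midr & part_mask) != model)
		return false;
	return rv >= rv_min && rv <= rv_max;
}

/* The Cortex-A55 r0p0..r1p0 case would then read:
 *   midr_in_range(midr, MIDR_CPU_PART_MASK, MIDR_CORTEX_A55,
 *                 MIDR_CPU_VAR_REV(0, 0), MIDR_CPU_VAR_REV(1, 0))
 */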
Cc: Catalin Marinas catalin.marinas@arm.com Cc: ckadabi@codeaurora.org Reviewed-by: Dave Martin dave.martin@arm.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/Kconfig | 14 ++++++++++++ arch/arm64/include/asm/assembler.h | 40 +++++++++++++++++++++++++++++++++++++ arch/arm64/include/asm/cputype.h | 11 ++++++++++ arch/arm64/mm/proc.S | 5 ++++ 4 files changed, 70 insertions(+)
--- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -375,6 +375,20 @@ config ARM64_ERRATUM_843419
If unsure, say Y.
+config ARM64_ERRATUM_1024718 + bool "Cortex-A55: 1024718: Update of DBM/AP bits without break before make might result in incorrect update" + default y + help + This option adds work around for Arm Cortex-A55 Erratum 1024718. + + Affected Cortex-A55 cores (r0p0, r0p1, r1p0) could cause incorrect + update of the hardware dirty bit when the DBM/AP bits are updated + without a break-before-make. The work around is to disable the usage + of hardware DBM locally on the affected cores. CPUs not affected by + erratum will continue to use the feature. + + If unsure, say Y. + config CAVIUM_ERRATUM_22375 bool "Cavium erratum 22375, 24313" default y --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -23,6 +23,7 @@ #ifndef __ASM_ASSEMBLER_H #define __ASM_ASSEMBLER_H
+#include <asm/cputype.h> #include <asm/ptrace.h> #include <asm/thread_info.h>
@@ -224,4 +225,43 @@ lr .req x30 // link register movk \reg, :abs_g0_nc:\val .endm
+/* + * Check the MIDR_EL1 of the current CPU for a given model and a range of + * variant/revision. See asm/cputype.h for the macros used below. + * + * model: MIDR_CPU_PART of CPU + * rv_min: Minimum of MIDR_CPU_VAR_REV() + * rv_max: Maximum of MIDR_CPU_VAR_REV() + * res: Result register. + * tmp1, tmp2, tmp3: Temporary registers + * + * Corrupts: res, tmp1, tmp2, tmp3 + * Returns: 0, if the CPU id doesn't match. Non-zero otherwise + */ + .macro cpu_midr_match model, rv_min, rv_max, res, tmp1, tmp2, tmp3 + mrs \res, midr_el1 + mov_q \tmp1, (MIDR_REVISION_MASK | MIDR_VARIANT_MASK) + mov_q \tmp2, MIDR_CPU_PART_MASK + and \tmp3, \res, \tmp2 // Extract model + and \tmp1, \res, \tmp1 // rev & variant + mov_q \tmp2, \model + cmp \tmp3, \tmp2 + cset \res, eq + cbz \res, .Ldone@ // Model matches ? + + .if (\rv_min != 0) // Skip min check if rv_min == 0 + mov_q \tmp3, \rv_min + cmp \tmp1, \tmp3 + cset \res, ge + .endif // \rv_min != 0 + /* Skip rv_max check if rv_min == rv_max && rv_min != 0 */ + .if ((\rv_min != \rv_max) || \rv_min == 0) + mov_q \tmp2, \rv_max + cmp \tmp1, \tmp2 + cset \tmp2, le + and \res, \res, \tmp2 + .endif +.Ldone@: + .endm + #endif /* __ASM_ASSEMBLER_H */ --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -57,6 +57,14 @@ #define MIDR_IMPLEMENTOR(midr) \ (((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
+#define MIDR_CPU_VAR_REV(var, rev) \ + (((var) << MIDR_VARIANT_SHIFT) | (rev)) + +#define MIDR_CPU_PART_MASK \ + (MIDR_IMPLEMENTOR_MASK | \ + MIDR_ARCHITECTURE_MASK | \ + MIDR_PARTNUM_MASK) + #define MIDR_CPU_PART(imp, partnum) \ (((imp) << MIDR_IMPLEMENTOR_SHIFT) | \ (0xf << MIDR_ARCHITECTURE_SHIFT) | \ @@ -70,11 +78,14 @@ #define ARM_CPU_PART_FOUNDATION 0xD00 #define ARM_CPU_PART_CORTEX_A57 0xD07 #define ARM_CPU_PART_CORTEX_A53 0xD03 +#define ARM_CPU_PART_CORTEX_A55 0xD05
#define APM_CPU_PART_POTENZA 0x000
#define CAVIUM_CPU_PART_THUNDERX 0x0A1
+#define MIDR_CORTEX_A55 MIDR_CPU_PART(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A55) + #ifndef __ASSEMBLY__
/* --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -221,6 +221,11 @@ ENTRY(__cpu_setup) cbz x9, 2f cmp x9, #2 b.lt 1f +#ifdef CONFIG_ARM64_ERRATUM_1024718 + /* Disable hardware DBM on Cortex-A55 r0p0, r0p1 & r1p0 */ + cpu_midr_match MIDR_CORTEX_A55, MIDR_CPU_VAR_REV(0, 0), MIDR_CPU_VAR_REV(1, 0), x1, x2, x3, x4 + cbnz x1, 1f +#endif orr x10, x10, #TCR_HD // hardware Dirty flag update 1: orr x10, x10, #TCR_HA // hardware Access flag update 2:
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mel Gorman mgorman@suse.de
commit 48fb6f4db940e92cfb16cd878cddd59ea6120d06 upstream.
Commit 65d8fc777f6d ("futex: Remove requirement for lock_page() in get_futex_key()") removed an unnecessary lock_page() with the side-effect that page->mapping needed to be treated very carefully.
Two defensive warnings were added in case any assumption was missed; the first warning assumed a correct application would not alter a mapping backing a futex key. Since merging, it has not triggered for any unexpected case, but Mark Rutland reported the following bug triggering due to the first warning.
kernel BUG at kernel/futex.c:679!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
Hardware name: linux,dummy-virt (DT)
task: ffff80001e271780 task.stack: ffff000010908000
PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
pc : [<ffff00000821ac14>] lr : [<ffff00000821ac14>] pstate: 80000145
The fact that it's a bug instead of a warning was due to an unrelated arm64 problem, but the warning itself triggered because the underlying mapping changed.
This is an application issue, but from a kernel perspective it's a recoverable situation and the warning is unnecessary, so this patch removes the warning. The warning may potentially be triggered with the following test program from Mark, although it may be necessary to adjust NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the system.
#include <linux/futex.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

#define NR_FUTEX_THREADS 16
pthread_t threads[NR_FUTEX_THREADS];

void *mem;

#define MEM_PROT  (PROT_READ | PROT_WRITE)
#define MEM_SIZE  65536

static int futex_wrapper(int *uaddr, int op, int val,
                         const struct timespec *timeout,
                         int *uaddr2, int val3)
{
    syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
}

void *poll_futex(void *unused)
{
    for (;;) {
        futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
    }
}

int main(int argc, char *argv[])
{
    int i;

    mem = mmap(NULL, MEM_SIZE, MEM_PROT,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    printf("Mapping @ %p\n", mem);

    printf("Creating futex threads...\n");

    for (i = 0; i < NR_FUTEX_THREADS; i++)
        pthread_create(&threads[i], NULL, poll_futex, NULL);

    printf("Flipping mapping...\n");
    for (;;) {
        mmap(mem, MEM_SIZE, MEM_PROT,
             MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    }

    return 0;
}
Reported-and-tested-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Mel Gorman mgorman@suse.de Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Cc: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/futex.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/kernel/futex.c +++ b/kernel/futex.c @@ -666,13 +666,14 @@ again: * this reference was taken by ihold under the page lock * pinning the inode in place so i_lock was unnecessary. The * only way for this check to fail is if the inode was - * truncated in parallel so warn for now if this happens. + * truncated in parallel which is almost certainly an + * application bug. In such a case, just retry. * * We are not calling into get_futex_key_refs() in file-backed * cases, therefore a successful atomic_inc return below will * guarantee that get_futex_key() will still imply smp_mb(); (B). */ - if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) { + if (!atomic_inc_not_zero(&inode->i_count)) { rcu_read_unlock(); put_page(page_head);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby jslaby@suse.cz
commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 upstream.
There is code duplicated over all architectures' headers for futex_atomic_op_inuser, namely op decoding, the access_ok check for uaddr, and comparison of the result.
Remove this duplication and leave to the arches only the needed assembly, which is now in arch_futex_atomic_op_inuser.
This effectively distributes Will Deacon's arm64 fix for undefined behaviour reported by UBSAN to all architectures. The fix was done in commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.
And as suggested by Thomas, check for a negative oparg too, because it was also reported to cause an undefined behaviour report.
Note that s390 removed the access_ok check in d12a29703 ("s390/uaccess: remove pointless access_ok() checks") as access_ok there returns true. We introduce it back in the helper for the sake of simplicity (it gets optimized away anyway).
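For cross-checking the new common decode against the removed per-arch shifts, this is how an encoded_op built with the uapi FUTEX_OP() macro breaks down (user-space illustration only, not part of the patch):

#include <linux/futex.h>
#include <stdio.h>

/* oparg and cmparg are 12-bit signed fields */
static int sext12(unsigned int v)
{
	return (int)((v & 0xfff) ^ 0x800) - 0x800;
}

static void decode(unsigned int encoded_op)
{
	unsigned int op  = (encoded_op >> 28) & 7;	/* bit 31 is the FUTEX_OP_OPARG_SHIFT flag */
	unsigned int cmp = (encoded_op >> 24) & 0xf;
	int oparg  = sext12(encoded_op >> 12);
	int cmparg = sext12(encoded_op);

	printf("op=%u cmp=%u oparg=%d cmparg=%d\n", op, cmp, oparg, cmparg);
}

int main(void)
{
	/* "add 1 to the futex word, report whether the old value was > 0" */
	decode(FUTEX_OP(FUTEX_OP_ADD, 1, FUTEX_OP_CMP_GT, 0));
	return 0;
}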
Signed-off-by: Jiri Slaby jslaby@suse.cz Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Russell King rmk+kernel@armlinux.org.uk Acked-by: Michael Ellerman mpe@ellerman.id.au (powerpc) Acked-by: Heiko Carstens heiko.carstens@de.ibm.com [s390] Acked-by: Chris Metcalf cmetcalf@mellanox.com [for tile] Reviewed-by: Darren Hart (VMware) dvhart@infradead.org Reviewed-by: Will Deacon will.deacon@arm.com [core/arm64] Cc: linux-mips@linux-mips.org Cc: Rich Felker dalias@libc.org Cc: linux-ia64@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: peterz@infradead.org Cc: Benjamin Herrenschmidt benh@kernel.crashing.org Cc: Max Filippov jcmvbkbc@gmail.com Cc: Paul Mackerras paulus@samba.org Cc: sparclinux@vger.kernel.org Cc: Jonas Bonn jonas@southpole.se Cc: linux-s390@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: Yoshinori Sato ysato@users.sourceforge.jp Cc: linux-hexagon@vger.kernel.org Cc: Helge Deller deller@gmx.de Cc: "James E.J. Bottomley" jejb@parisc-linux.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Matt Turner mattst88@gmail.com Cc: linux-snps-arc@lists.infradead.org Cc: Fenghua Yu fenghua.yu@intel.com Cc: Arnd Bergmann arnd@arndb.de Cc: linux-xtensa@linux-xtensa.org Cc: Stefan Kristiansson stefan.kristiansson@saunalahti.fi Cc: openrisc@lists.librecores.org Cc: Ivan Kokshaysky ink@jurassic.park.msu.ru Cc: Stafford Horne shorne@gmail.com Cc: linux-arm-kernel@lists.infradead.org Cc: Richard Henderson rth@twiddle.net Cc: Chris Zankel chris@zankel.net Cc: Michal Simek monstr@monstr.eu Cc: Tony Luck tony.luck@intel.com Cc: linux-parisc@vger.kernel.org Cc: Vineet Gupta vgupta@synopsys.com Cc: Ralf Baechle ralf@linux-mips.org Cc: Richard Kuo rkuo@codeaurora.org Cc: linux-alpha@vger.kernel.org Cc: Martin Schwidefsky schwidefsky@de.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Cc: "David S. Miller" davem@davemloft.net Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz Cc: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/alpha/include/asm/futex.h | 26 +++--------------- arch/arc/include/asm/futex.h | 40 +++------------------------- arch/arm/include/asm/futex.h | 26 ++---------------- arch/arm64/include/asm/futex.h | 26 ++---------------- arch/frv/include/asm/futex.h | 3 +- arch/frv/kernel/futex.c | 27 ++----------------- arch/hexagon/include/asm/futex.h | 38 ++------------------------- arch/ia64/include/asm/futex.h | 25 ++---------------- arch/microblaze/include/asm/futex.h | 38 ++------------------------- arch/mips/include/asm/futex.h | 25 ++---------------- arch/parisc/include/asm/futex.h | 25 ++---------------- arch/powerpc/include/asm/futex.h | 26 +++--------------- arch/s390/include/asm/futex.h | 23 +++------------- arch/sh/include/asm/futex.h | 26 ++---------------- arch/sparc/include/asm/futex_64.h | 26 +++--------------- arch/tile/include/asm/futex.h | 40 +++------------------------- arch/x86/include/asm/futex.h | 40 +++------------------------- arch/xtensa/include/asm/futex.h | 27 +++---------------- include/asm-generic/futex.h | 50 ++++++------------------------------ kernel/futex.c | 39 ++++++++++++++++++++++++++++ 20 files changed, 126 insertions(+), 470 deletions(-)
--- a/arch/alpha/include/asm/futex.h +++ b/arch/alpha/include/asm/futex.h @@ -29,18 +29,10 @@ : "r" (uaddr), "r"(oparg) \ : "memory")
-static inline int futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr) +static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, + u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret; - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) - return -EFAULT;
pagefault_disable();
@@ -66,17 +58,9 @@ static inline int futex_atomic_op_inuser
pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/arch/arc/include/asm/futex.h +++ b/arch/arc/include/asm/futex.h @@ -73,20 +73,11 @@
#endif
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr) +static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, + u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) - return -EFAULT; - #ifndef CONFIG_ARC_HAS_LLSC preempt_disable(); /* to guarantee atomic r-m-w of futex op */ #endif @@ -118,30 +109,9 @@ static inline int futex_atomic_op_inuser preempt_enable(); #endif
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: - ret = (oldval == cmparg); - break; - case FUTEX_OP_CMP_NE: - ret = (oldval != cmparg); - break; - case FUTEX_OP_CMP_LT: - ret = (oldval < cmparg); - break; - case FUTEX_OP_CMP_GE: - ret = (oldval >= cmparg); - break; - case FUTEX_OP_CMP_LE: - ret = (oldval <= cmparg); - break; - case FUTEX_OP_CMP_GT: - ret = (oldval > cmparg); - break; - default: - ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/arch/arm/include/asm/futex.h +++ b/arch/arm/include/asm/futex.h @@ -128,20 +128,10 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, #endif /* !SMP */
static inline int -futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr) +arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret, tmp;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) - return -EFAULT; - #ifndef CONFIG_SMP preempt_disable(); #endif @@ -172,17 +162,9 @@ futex_atomic_op_inuser (int encoded_op, preempt_enable(); #endif
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/arch/arm64/include/asm/futex.h +++ b/arch/arm64/include/asm/futex.h @@ -53,20 +53,10 @@ : "memory")
static inline int -futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr) +arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (int)(encoded_op << 8) >> 20; - int cmparg = (int)(encoded_op << 20) >> 20; int oldval = 0, ret, tmp;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1U << (oparg & 0x1f); - - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) - return -EFAULT; - pagefault_disable();
switch (op) { @@ -96,17 +86,9 @@ futex_atomic_op_inuser(unsigned int enco
pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/arch/frv/include/asm/futex.h +++ b/arch/frv/include/asm/futex.h @@ -7,7 +7,8 @@ #include <asm/errno.h> #include <asm/uaccess.h>
-extern int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr); +extern int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, + u32 __user *uaddr);
static inline int futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, --- a/arch/frv/kernel/futex.c +++ b/arch/frv/kernel/futex.c @@ -186,20 +186,10 @@ static inline int atomic_futex_op_xchg_x /* * do the futex operations */ -int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr) +int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) - return -EFAULT; - pagefault_disable();
switch (op) { @@ -225,18 +215,9 @@ int futex_atomic_op_inuser(int encoded_o
pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; break; - } - } + if (!ret) + *oval = oldval;
return ret;
-} /* end futex_atomic_op_inuser() */ +} /* end arch_futex_atomic_op_inuser() */ --- a/arch/hexagon/include/asm/futex.h +++ b/arch/hexagon/include/asm/futex.h @@ -31,18 +31,9 @@
static inline int -futex_atomic_op_inuser(int encoded_op, int __user *uaddr) +arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret; - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) - return -EFAULT;
pagefault_disable();
@@ -72,30 +63,9 @@ futex_atomic_op_inuser(int encoded_op, i
pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: - ret = (oldval == cmparg); - break; - case FUTEX_OP_CMP_NE: - ret = (oldval != cmparg); - break; - case FUTEX_OP_CMP_LT: - ret = (oldval < cmparg); - break; - case FUTEX_OP_CMP_GE: - ret = (oldval >= cmparg); - break; - case FUTEX_OP_CMP_LE: - ret = (oldval <= cmparg); - break; - case FUTEX_OP_CMP_GT: - ret = (oldval > cmparg); - break; - default: - ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/arch/ia64/include/asm/futex.h +++ b/arch/ia64/include/asm/futex.h @@ -45,18 +45,9 @@ do { \ } while (0)
static inline int -futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr) +arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret; - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (! access_ok (VERIFY_WRITE, uaddr, sizeof(u32))) - return -EFAULT;
pagefault_disable();
@@ -84,17 +75,9 @@ futex_atomic_op_inuser (int encoded_op,
pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/arch/microblaze/include/asm/futex.h +++ b/arch/microblaze/include/asm/futex.h @@ -29,18 +29,9 @@ })
static inline int -futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr) +arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret; - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) - return -EFAULT;
pagefault_disable();
@@ -66,30 +57,9 @@ futex_atomic_op_inuser(int encoded_op, u
pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: - ret = (oldval == cmparg); - break; - case FUTEX_OP_CMP_NE: - ret = (oldval != cmparg); - break; - case FUTEX_OP_CMP_LT: - ret = (oldval < cmparg); - break; - case FUTEX_OP_CMP_GE: - ret = (oldval >= cmparg); - break; - case FUTEX_OP_CMP_LE: - ret = (oldval <= cmparg); - break; - case FUTEX_OP_CMP_GT: - ret = (oldval > cmparg); - break; - default: - ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/arch/mips/include/asm/futex.h +++ b/arch/mips/include/asm/futex.h @@ -83,18 +83,9 @@ }
static inline int -futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr) +arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret; - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (! access_ok (VERIFY_WRITE, uaddr, sizeof(u32))) - return -EFAULT;
pagefault_disable();
@@ -125,17 +116,9 @@ futex_atomic_op_inuser(int encoded_op, u
pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/arch/parisc/include/asm/futex.h +++ b/arch/parisc/include/asm/futex.h @@ -32,20 +32,11 @@ _futex_spin_unlock_irqrestore(u32 __user }
static inline int -futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr) +arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr) { unsigned long int flags; u32 val; - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret; - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr))) - return -EFAULT;
pagefault_disable();
@@ -98,17 +89,9 @@ futex_atomic_op_inuser (int encoded_op,
pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/arch/powerpc/include/asm/futex.h +++ b/arch/powerpc/include/asm/futex.h @@ -31,18 +31,10 @@ : "b" (uaddr), "i" (-EFAULT), "r" (oparg) \ : "cr0", "memory")
-static inline int futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr) +static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, + u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret; - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (! access_ok (VERIFY_WRITE, uaddr, sizeof(u32))) - return -EFAULT;
pagefault_disable();
@@ -68,17 +60,9 @@ static inline int futex_atomic_op_inuser
pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/arch/s390/include/asm/futex.h +++ b/arch/s390/include/asm/futex.h @@ -21,17 +21,12 @@ : "0" (-EFAULT), "d" (oparg), "a" (uaddr), \ "m" (*uaddr) : "cc");
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr) +static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, + u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, newval, ret;
load_kernel_asce(); - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg;
pagefault_disable(); switch (op) { @@ -60,17 +55,9 @@ static inline int futex_atomic_op_inuser } pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/arch/sh/include/asm/futex.h +++ b/arch/sh/include/asm/futex.h @@ -10,20 +10,11 @@ /* XXX: UP variants, fix for SH-4A and SMP.. */ #include <asm/futex-irq.h>
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr) +static inline int arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval, + u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) - return -EFAULT; - pagefault_disable();
switch (op) { @@ -49,17 +40,8 @@ static inline int futex_atomic_op_inuser
pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval;
return ret; } --- a/arch/sparc/include/asm/futex_64.h +++ b/arch/sparc/include/asm/futex_64.h @@ -29,22 +29,14 @@ : "r" (uaddr), "r" (oparg), "i" (-EFAULT) \ : "memory")
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr) +static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, + u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret, tem;
- if (unlikely(!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))) - return -EFAULT; if (unlikely((((unsigned long) uaddr) & 0x3UL))) return -EINVAL;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - pagefault_disable();
switch (op) { @@ -69,17 +61,9 @@ static inline int futex_atomic_op_inuser
pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/arch/tile/include/asm/futex.h +++ b/arch/tile/include/asm/futex.h @@ -106,12 +106,9 @@ lock = __atomic_hashed_lock((int __force *)uaddr) #endif
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr) +static inline int arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval, + u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int uninitialized_var(val), ret;
__futex_prolog(); @@ -119,12 +116,6 @@ static inline int futex_atomic_op_inuser /* The 32-bit futex code makes this assumption, so validate it here. */ BUILD_BUG_ON(sizeof(atomic_t) != sizeof(int));
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) - return -EFAULT; - pagefault_disable(); switch (op) { case FUTEX_OP_SET: @@ -148,30 +139,9 @@ static inline int futex_atomic_op_inuser } pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: - ret = (val == cmparg); - break; - case FUTEX_OP_CMP_NE: - ret = (val != cmparg); - break; - case FUTEX_OP_CMP_LT: - ret = (val < cmparg); - break; - case FUTEX_OP_CMP_GE: - ret = (val >= cmparg); - break; - case FUTEX_OP_CMP_LE: - ret = (val <= cmparg); - break; - case FUTEX_OP_CMP_GT: - ret = (val > cmparg); - break; - default: - ret = -ENOSYS; - } - } + if (!ret) + *oval = val; + return ret; }
--- a/arch/x86/include/asm/futex.h +++ b/arch/x86/include/asm/futex.h @@ -41,20 +41,11 @@ "+m" (*uaddr), "=&r" (tem) \ : "r" (oparg), "i" (-EFAULT), "1" (0))
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr) +static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, + u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret, tem;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) - return -EFAULT; - pagefault_disable();
switch (op) { @@ -80,30 +71,9 @@ static inline int futex_atomic_op_inuser
pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: - ret = (oldval == cmparg); - break; - case FUTEX_OP_CMP_NE: - ret = (oldval != cmparg); - break; - case FUTEX_OP_CMP_LT: - ret = (oldval < cmparg); - break; - case FUTEX_OP_CMP_GE: - ret = (oldval >= cmparg); - break; - case FUTEX_OP_CMP_LE: - ret = (oldval <= cmparg); - break; - case FUTEX_OP_CMP_GT: - ret = (oldval > cmparg); - break; - default: - ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/arch/xtensa/include/asm/futex.h +++ b/arch/xtensa/include/asm/futex.h @@ -44,18 +44,10 @@ : "r" (uaddr), "I" (-EFAULT), "r" (oparg) \ : "memory")
-static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr) +static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, + u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret; - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) - return -EFAULT;
#if !XCHAL_HAVE_S32C1I return -ENOSYS; @@ -89,19 +81,10 @@ static inline int futex_atomic_op_inuser
pagefault_enable();
- if (ret) - return ret; - - switch (cmp) { - case FUTEX_OP_CMP_EQ: return (oldval == cmparg); - case FUTEX_OP_CMP_NE: return (oldval != cmparg); - case FUTEX_OP_CMP_LT: return (oldval < cmparg); - case FUTEX_OP_CMP_GE: return (oldval >= cmparg); - case FUTEX_OP_CMP_LE: return (oldval <= cmparg); - case FUTEX_OP_CMP_GT: return (oldval > cmparg); - } + if (!ret) + *oval = oldval;
- return -ENOSYS; + return ret; }
static inline int --- a/include/asm-generic/futex.h +++ b/include/asm-generic/futex.h @@ -13,7 +13,7 @@ */
/** - * futex_atomic_op_inuser() - Atomic arithmetic operation with constant + * arch_futex_atomic_op_inuser() - Atomic arithmetic operation with constant * argument and comparison of the previous * futex value with another constant. * @@ -25,18 +25,11 @@ * <0 - On error */ static inline int -futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr) +arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval, u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval, ret; u32 tmp;
- if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - preempt_disable(); pagefault_disable();
@@ -74,17 +67,9 @@ out_pagefault_enable: pagefault_enable(); preempt_enable();
- if (ret == 0) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } + if (ret == 0) + *oval = oldval; + return ret; }
@@ -126,18 +111,9 @@ futex_atomic_cmpxchg_inatomic(u32 *uval,
#else static inline int -futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr) +arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval, u32 __user *uaddr) { - int op = (encoded_op >> 28) & 7; - int cmp = (encoded_op >> 24) & 15; - int oparg = (encoded_op << 8) >> 20; - int cmparg = (encoded_op << 20) >> 20; int oldval = 0, ret; - if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) - oparg = 1 << oparg; - - if (! access_ok (VERIFY_WRITE, uaddr, sizeof(u32))) - return -EFAULT;
pagefault_disable();
@@ -153,17 +129,9 @@ futex_atomic_op_inuser (int encoded_op,
pagefault_enable();
- if (!ret) { - switch (cmp) { - case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; - case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; - case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; - case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; - case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; - case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; - default: ret = -ENOSYS; - } - } + if (!ret) + *oval = oldval; + return ret; }
--- a/kernel/futex.c +++ b/kernel/futex.c @@ -1453,6 +1453,45 @@ out: return ret; }
+static int futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr) +{ + unsigned int op = (encoded_op & 0x70000000) >> 28; + unsigned int cmp = (encoded_op & 0x0f000000) >> 24; + int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 12); + int cmparg = sign_extend32(encoded_op & 0x00000fff, 12); + int oldval, ret; + + if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) { + if (oparg < 0 || oparg > 31) + return -EINVAL; + oparg = 1 << oparg; + } + + if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) + return -EFAULT; + + ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr); + if (ret) + return ret; + + switch (cmp) { + case FUTEX_OP_CMP_EQ: + return oldval == cmparg; + case FUTEX_OP_CMP_NE: + return oldval != cmparg; + case FUTEX_OP_CMP_LT: + return oldval < cmparg; + case FUTEX_OP_CMP_GE: + return oldval >= cmparg; + case FUTEX_OP_CMP_LE: + return oldval <= cmparg; + case FUTEX_OP_CMP_GT: + return oldval > cmparg; + default: + return -ENOSYS; + } +} + /* * Wake up all waiters hashed on the physical page that is mapped * to this virtual address:
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Antony Antony antony@phenome.org
commit 75bf50f4aaa1c78d769d854ab3d975884909e4fb upstream.
copy geniv when cloning the xfrm state.
x->geniv was not copied to the new state and migration would fail.
xfrm_do_migrate
  ..
  xfrm_state_clone()
    ..
    ..
    esp_init_aead()
      crypto_alloc_aead()
        crypto_alloc_tfm()
          crypto_find_alg()
            return EAGAIN and failed
Signed-off-by: Antony Antony antony@phenome.org Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Cc: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/xfrm/xfrm_state.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1159,6 +1159,7 @@ static struct xfrm_state *xfrm_state_clo
if (orig->aead) { x->aead = xfrm_algo_aead_clone(orig->aead); + x->geniv = orig->geniv; if (!x->aead) goto error; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Averin vvs@virtuozzo.com
commit 3a2b19d1ee5633f76ae8a88da7bc039a5d1732aa upstream.
Commit efda760fe95ea ("lockd: fix lockd shutdown race") is incorrect: it removes lockd_manager and disarms grace_period_end for init_net only.

If nfsd was started from another net namespace, lockd_up_net() calls set_grace_period(), which adds lockd_manager to the per-netns list and queues the grace_period_end delayed work.

These actions should be reverted in lockd_down_net(). Otherwise they can lead to a double list_add after nfsd is restarted in the netns, and to a use-after-free if the non-disarmed delayed work is executed after the netns is destroyed.
Fixes: efda760fe95e ("lockd: fix lockd shutdown race") Cc: stable@vger.kernel.org Signed-off-by: Vasily Averin vvs@virtuozzo.com Signed-off-by: J. Bruce Fields bfields@redhat.com Cc: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/lockd/svc.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -271,6 +271,8 @@ static void lockd_down_net(struct svc_se if (ln->nlmsvc_users) { if (--ln->nlmsvc_users == 0) { nlm_shutdown_hosts_net(net); + cancel_delayed_work_sync(&ln->grace_period_end); + locks_end_grace(&ln->lockd_manager); svc_shutdown_net(serv, net); dprintk("lockd_down_net: per-net data destroyed; net=%p\n", net); }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 9de3a3bfed892608dc30a6bc3fd8bdbeae5b51a5 which was commit 79935915300c5eb88a0e94fa9148a7505c14a02a upstream.
As Ben points out: This depends on:
commit 570c70a60f53ca737ead4e5966c446bf0d39fac9
Author: Fabio Estevam fabio.estevam@nxp.com
Date:   Wed Apr 5 11:32:34 2017 -0300
ASoC: sgtl5000: Allow LRCLK pad drive strength to be changed
which did not show up until 4.13, so this makes no sense to have in this stable branch.
Reported-by: Ben Hutchings ben.hutchings@codethink.co.uk Cc: Fabio Estevam fabio.estevam@nxp.com Cc: Shawn Guo shawnguo@kernel.org Cc: Sasha Levin alexander.levin@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/imx6qdl-wandboard.dtsi | 1 - 1 file changed, 1 deletion(-)
--- a/arch/arm/boot/dts/imx6qdl-wandboard.dtsi +++ b/arch/arm/boot/dts/imx6qdl-wandboard.dtsi @@ -88,7 +88,6 @@ clocks = <&clks 201>; VDDA-supply = <®_2p5v>; VDDIO-supply = <®_3p3v>; - lrclk-strength = <3>; }; };
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Chapman jchapman@katalix.com
commit de3b58bc359a861d5132300f53f95e83f71954b3 upstream.
Revert commit 820da5357572 ("l2tp: fix missing print session offset info"). The peer_offset parameter is removed.
Signed-off-by: James Chapman jchapman@katalix.com Signed-off-by: David S. Miller davem@davemloft.net Cc: Guillaume Nault g.nault@alphalink.fr Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/l2tp/l2tp_netlink.c | 2 -- 1 file changed, 2 deletions(-)
--- a/net/l2tp/l2tp_netlink.c +++ b/net/l2tp/l2tp_netlink.c @@ -732,8 +732,6 @@ static int l2tp_nl_session_send(struct s
if ((session->ifname[0] && nla_put_string(skb, L2TP_ATTR_IFNAME, session->ifname)) || - (session->offset && - nla_put_u16(skb, L2TP_ATTR_OFFSET, session->offset)) || (session->cookie_len && nla_put(skb, L2TP_ATTR_COOKIE, session->cookie_len, &session->cookie[0])) ||
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Kerrisk (man-pages) mtk.manpages@gmail.com
commit 086e774a57fba4695f14383c0818994c0b31da7c upstream.
This patch provides behavior that is more consistent and probably less surprising to users. I consider the change optional, and welcome opinions about whether it should be applied.

By default, pipes are created with a capacity of 64 kiB. However, /proc/sys/fs/pipe-max-size may be set smaller than this value. In that case, an unprivileged user could create a pipe whose initial capacity exceeds the limit. Therefore, it seems logical to cap the initial pipe capacity according to the value of pipe-max-size.
The test program shown earlier in this patch series can be used to demonstrate the effect of the change brought about with this patch:
# cat /proc/sys/fs/pipe-max-size
1048576
# sudo -u mtk ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 65536
# echo 10000 > /proc/sys/fs/pipe-max-size
# cat /proc/sys/fs/pipe-max-size
16384
# sudo -u mtk ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 16384
# ./test_F_SETPIPE_SZ 1
Initial pipe capacity: 65536
The last two executions of 'test_F_SETPIPE_SZ' show that pipe-max-size caps the initial allocation for a new pipe for unprivileged users, but not for privileged users.
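A minimal way to observe the initial capacity is fcntl(F_GETPIPE_SZ); the sketch below is an independent illustration and not the test_F_SETPIPE_SZ program referenced above:

#define _GNU_SOURCE          /* for F_GETPIPE_SZ */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fds[2];

        if (pipe(fds) == -1) {
                perror("pipe");
                return 1;
        }
        /* F_GETPIPE_SZ reports the pipe's current capacity in bytes */
        printf("Initial pipe capacity: %d\n", fcntl(fds[0], F_GETPIPE_SZ));
        return 0;
}

Run as an unprivileged user before and after lowering pipe-max-size, it should show the capped value once this patch is applied.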
Link: http://lkml.kernel.org/r/31dc7064-2a17-9c5b-1df1-4e3012ee992c@gmail.com Signed-off-by: Michael Kerrisk mtk.manpages@gmail.com Reviewed-by: Vegard Nossum vegard.nossum@oracle.com Cc: Willy Tarreau w@1wt.eu Cc: socketpair@gmail.com Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Cc: Jens Axboe axboe@fb.com Cc: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Daniel Sangorrin daniel.sangorrin@toshiba.co.jp Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/pipe.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/pipe.c +++ b/fs/pipe.c @@ -616,6 +616,9 @@ struct pipe_inode_info *alloc_pipe_info( unsigned long pipe_bufs = PIPE_DEF_BUFFERS; struct user_struct *user = get_current_user();
+ if (pipe_bufs * PAGE_SIZE > pipe_max_size && !capable(CAP_SYS_RESOURCE)) + pipe_bufs = pipe_max_size >> PAGE_SHIFT; + if (!too_many_pipe_buffers_hard(user)) { if (too_many_pipe_buffers_soft(user)) pipe_bufs = 1;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby jslaby@suse.cz
commit d70ef22892ed6c066e51e118b225923c9b74af34 upstream.
sign_extend32() counts the sign bit parameter from 0, not from 1. So we have to use "11" for the 12th bit, not "12".

This mistake means we have not allowed negative op and cmp args since commit 30d6e0a4190d ("futex: Remove duplicated code and fix undefined behaviour") until now.
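To see why the index matters, here is a userspace sketch that mirrors the sign_extend32() helper from include/linux/bitops.h and applies it to a 12-bit field of all ones (which encodes -1):

#include <stdint.h>
#include <stdio.h>

/* userspace copy of the kernel helper: 'index' is the sign bit, counted from 0 */
static int32_t sign_extend32(uint32_t value, int index)
{
        uint8_t shift = 31 - index;

        return (int32_t)(value << shift) >> shift;
}

int main(void)
{
        uint32_t field = 0xfff;   /* a 12-bit oparg/cmparg of all ones, i.e. -1 */

        printf("index 11: %d\n", sign_extend32(field, 11));   /* -1, correct  */
        printf("index 12: %d\n", sign_extend32(field, 12));   /* 4095, wrong  */
        return 0;
}

With index 11 the field decodes to -1 as intended; with index 12 it stays positive, which is exactly the bug being fixed.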
Fixes: 30d6e0a4190d ("futex: Remove duplicated code and fix undefined behaviour") Signed-off-by: Jiri Slaby jslaby@suse.cz Cc: Ingo Molnar mingo@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Darren Hart dvhart@infradead.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/futex.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/futex.c +++ b/kernel/futex.c @@ -1457,8 +1457,8 @@ static int futex_atomic_op_inuser(unsign { unsigned int op = (encoded_op & 0x70000000) >> 28; unsigned int cmp = (encoded_op & 0x0f000000) >> 24; - int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 12); - int cmparg = sign_extend32(encoded_op & 0x00000fff, 12); + int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 11); + int cmparg = sign_extend32(encoded_op & 0x00000fff, 11); int oldval, ret;
if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: zhongjiang zhongjiang@huawei.com
commit dd83c161fbcc5d8be637ab159c0de015cbff5ba4 upstream.
wait4(-2147483648, 0x20, 0, 0xdd0000) triggers:
UBSAN: Undefined behaviour in kernel/exit.c:1651:9
The related calltrace is as follows:
negation of -2147483648 cannot be represented in type 'int':
CPU: 9 PID: 16482 Comm: zj Tainted: G    B   ---- -------   3.10.0-327.53.58.71.x86_64+ #66
Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA, BIOS CTSAV036 04/27/2011
Call Trace:
 dump_stack+0x19/0x1b
 ubsan_epilogue+0xd/0x50
 __ubsan_handle_negate_overflow+0x109/0x14e
 SyS_wait4+0x1cb/0x1e0
 system_call_fastpath+0x16/0x1b
Exclude the overflow to avoid the UBSAN warning.
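The undefined negation is easy to reproduce in userspace; this small sketch (not kernel code) mirrors the guard added below:

#include <limits.h>
#include <stdio.h>

int main(void)
{
        int upid = INT_MIN;   /* the pid argument from the reproducer above */

        /* same guard as the fix: -upid would overflow, since INT_MAX is
         * one smaller in magnitude than INT_MIN */
        if (upid == INT_MIN) {
                printf("rejecting %d: its negation is not representable\n", upid);
                return 1;
        }
        if (upid < -1)
                printf("would wait for process group %d\n", -upid);
        return 0;
}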
Link: http://lkml.kernel.org/r/1497264618-20212-1-git-send-email-zhongjiang@huawei... Signed-off-by: zhongjiang zhongjiang@huawei.com Cc: Oleg Nesterov oleg@redhat.com Cc: David Rientjes rientjes@google.com Cc: Aneesh Kumar K.V aneesh.kumar@linux.vnet.ibm.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Xishi Qiu qiuxishi@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/exit.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/kernel/exit.c +++ b/kernel/exit.c @@ -1608,6 +1608,10 @@ SYSCALL_DEFINE4(wait4, pid_t, upid, int __WNOTHREAD|__WCLONE|__WALL)) return -EINVAL;
+ /* -INT_MIN is not defined */ + if (upid == INT_MIN) + return -ESRCH; + if (upid == -1) type = PIDTYPE_MAX; else if (upid < 0) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan shuahkh@osg.samsung.com
commit 28b68acc4a88dcf91fd1dcf2577371dc9bf574cc upstream.
Refine the probe and disconnect debug messages to be useful and say what is in progress.
Signed-off-by: Shuah Khan shuahkh@osg.samsung.com Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/usbip/stub_dev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/usb/usbip/stub_dev.c +++ b/drivers/usb/usbip/stub_dev.c @@ -316,7 +316,7 @@ static int stub_probe(struct usb_device struct bus_id_priv *busid_priv; int rc;
- dev_dbg(&udev->dev, "Enter\n"); + dev_dbg(&udev->dev, "Enter probe\n");
/* check we should claim or not by busid_table */ busid_priv = get_busid_priv(udev_busid); @@ -419,7 +419,7 @@ static void stub_disconnect(struct usb_d struct bus_id_priv *busid_priv; int rc;
- dev_dbg(&udev->dev, "Enter\n"); + dev_dbg(&udev->dev, "Enter disconnect\n");
busid_priv = get_busid_priv(udev_busid); if (!busid_priv) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan (Samsung OSG) shuah@kernel.org
commit 1e180f167d4e413afccbbb4a421b48b2de832549 upstream.
The device is left in the busid_table after unbind and rebind. Rebind initiates a USB bus scan, and the original driver claims the device. After the rescan, the device should be deleted from the busid_table as it no longer belongs to usbip_host.
Fix it to delete the device after device_attach() succeeds.
Signed-off-by: Shuah Khan (Samsung OSG) shuah@kernel.org Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/usbip/stub_main.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/usb/usbip/stub_main.c +++ b/drivers/usb/usbip/stub_main.c @@ -201,6 +201,9 @@ static ssize_t rebind_store(struct devic if (!bid) return -ENODEV;
+ /* mark the device for deletion so probe ignores it during rescan */ + bid->status = STUB_BUSID_OTHER; + /* device_attach() callers should hold parent lock for USB */ if (bid->udev->dev.parent) device_lock(bid->udev->dev.parent); @@ -212,6 +215,9 @@ static ssize_t rebind_store(struct devic return ret; }
+ /* delete device from busid_table */ + del_match_busid((char *) buf); + return count; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan (Samsung OSG) shuah@kernel.org
commit 7510df3f29d44685bab7b1918b61a8ccd57126a9 upstream.
After removing the usbip_host module, the devices it releases are left without a driver. For example, if a keyboard or a mass storage device is bound to usbip_host when the module is removed, these devices are no longer bound to any driver.
Fix it to run device_attach() from the module exit routine to restore the devices to their original drivers. This includes cleanup changes and moving device_attach() code to a common routine to be called from rebind_store() and usbip_host_exit().
Signed-off-by: Shuah Khan (Samsung OSG) shuah@kernel.org Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/usbip/stub_dev.c  |  6 ----
 drivers/usb/usbip/stub_main.c | 60 +++++++++++++++++++++++++++++++++++-------
 2 files changed, 52 insertions(+), 14 deletions(-)
--- a/drivers/usb/usbip/stub_dev.c +++ b/drivers/usb/usbip/stub_dev.c @@ -463,12 +463,8 @@ static void stub_disconnect(struct usb_d busid_priv->sdev = NULL; stub_device_free(sdev);
- if (busid_priv->status == STUB_BUSID_ALLOC) { + if (busid_priv->status == STUB_BUSID_ALLOC) busid_priv->status = STUB_BUSID_ADDED; - } else { - busid_priv->status = STUB_BUSID_OTHER; - del_match_busid((char *)udev_busid); - } }
#ifdef CONFIG_PM --- a/drivers/usb/usbip/stub_main.c +++ b/drivers/usb/usbip/stub_main.c @@ -28,6 +28,7 @@ #define DRIVER_DESC "USB/IP Host Driver"
struct kmem_cache *stub_priv_cache; + /* * busid_tables defines matching busids that usbip can grab. A user can change * dynamically what device is locally used and what device is exported to a @@ -184,6 +185,51 @@ static ssize_t store_match_busid(struct static DRIVER_ATTR(match_busid, S_IRUSR | S_IWUSR, show_match_busid, store_match_busid);
+static int do_rebind(char *busid, struct bus_id_priv *busid_priv) +{ + int ret; + + /* device_attach() callers should hold parent lock for USB */ + if (busid_priv->udev->dev.parent) + device_lock(busid_priv->udev->dev.parent); + ret = device_attach(&busid_priv->udev->dev); + if (busid_priv->udev->dev.parent) + device_unlock(busid_priv->udev->dev.parent); + if (ret < 0) { + dev_err(&busid_priv->udev->dev, "rebind failed\n"); + return ret; + } + return 0; +} + +static void stub_device_rebind(void) +{ +#if IS_MODULE(CONFIG_USBIP_HOST) + struct bus_id_priv *busid_priv; + int i; + + /* update status to STUB_BUSID_OTHER so probe ignores the device */ + spin_lock(&busid_table_lock); + for (i = 0; i < MAX_BUSID; i++) { + if (busid_table[i].name[0] && + busid_table[i].shutdown_busid) { + busid_priv = &(busid_table[i]); + busid_priv->status = STUB_BUSID_OTHER; + } + } + spin_unlock(&busid_table_lock); + + /* now run rebind */ + for (i = 0; i < MAX_BUSID; i++) { + if (busid_table[i].name[0] && + busid_table[i].shutdown_busid) { + busid_priv = &(busid_table[i]); + do_rebind(busid_table[i].name, busid_priv); + } + } +#endif +} + static ssize_t rebind_store(struct device_driver *dev, const char *buf, size_t count) { @@ -204,16 +250,9 @@ static ssize_t rebind_store(struct devic /* mark the device for deletion so probe ignores it during rescan */ bid->status = STUB_BUSID_OTHER;
- /* device_attach() callers should hold parent lock for USB */ - if (bid->udev->dev.parent) - device_lock(bid->udev->dev.parent); - ret = device_attach(&bid->udev->dev); - if (bid->udev->dev.parent) - device_unlock(bid->udev->dev.parent); - if (ret < 0) { - dev_err(&bid->udev->dev, "rebind failed\n"); + ret = do_rebind((char *) buf, bid); + if (ret < 0) return ret; - }
/* delete device from busid_table */ del_match_busid((char *) buf); @@ -339,6 +378,9 @@ static void __exit usbip_host_exit(void) */ usb_deregister_device_driver(&stub_driver);
+ /* initiate scan to attach devices */ + stub_device_rebind(); + kmem_cache_destroy(stub_priv_cache); }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan (Samsung OSG) shuah@kernel.org
commit 22076557b07c12086eeb16b8ce2b0b735f7a27e7 upstream.
usbip_host updates the device status without holding a lock in the stub probe, disconnect and rebind code paths. When multiple requests to import a device are received, these unprotected code paths step all over each other and the driver fails with NULL-ptr deref and use-after-free errors.
The driver uses a table lock to protect the busid array for adding and deleting busids to the table. However, the probe, disconnect and rebind paths get the busid table entry and update the status without holding the busid table lock. Add a new finer grain lock to protect the busid entry. This new lock will be held to search and update the busid entry fields from get_busid_idx(), add_match_busid() and del_match_busid().
match_busid_show() does the same to access the busid entry fields.
get_busid_priv() is changed to return a pointer to the busid entry with the busid lock held. stub_probe(), stub_disconnect() and stub_device_rebind() call put_busid_priv() to release the busid lock before returning. This change fixes the unprotected code paths, eliminating the race conditions in updating the busid entries.
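As a rough analogue of the scheme described above (a coarse table lock for add/remove plus a per-entry lock, with the lookup returning the entry locked), here is a self-contained userspace sketch; the names and types are illustrative, not the driver's:

/* build with: cc -pthread entry_lock_demo.c */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 8

struct entry {
        char name[32];
        int status;
        pthread_mutex_t lock;          /* per-entry lock, like busid_lock */
};

static struct entry table[MAX_ENTRIES];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* returns with the entry's lock held, in the spirit of get_busid_priv() */
static struct entry *get_entry(const char *name)
{
        struct entry *e = NULL;
        int i;

        pthread_mutex_lock(&table_lock);
        for (i = 0; i < MAX_ENTRIES; i++) {
                pthread_mutex_lock(&table[i].lock);
                if (!strcmp(table[i].name, name)) {
                        e = &table[i];  /* keep e->lock held for the caller */
                        break;
                }
                pthread_mutex_unlock(&table[i].lock);
        }
        pthread_mutex_unlock(&table_lock);
        return e;
}

/* drops the entry's lock, in the spirit of put_busid_priv() */
static void put_entry(struct entry *e)
{
        if (e)
                pthread_mutex_unlock(&e->lock);
}

int main(void)
{
        struct entry *e;
        int i;

        for (i = 0; i < MAX_ENTRIES; i++)
                pthread_mutex_init(&table[i].lock, NULL);
        snprintf(table[2].name, sizeof(table[2].name), "3-10.2");

        e = get_entry("3-10.2");
        if (e) {
                e->status = 1;  /* update the entry safely under its own lock */
                printf("updated %s, status=%d\n", e->name, e->status);
                put_entry(e);
        }
        return 0;
}

The point of the pattern is that the caller can inspect and update the entry it looked up without racing against probe/disconnect/rebind style paths, while add and remove still serialise on the table lock.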
Reported-by: Jakub Jirasek Signed-off-by: Shuah Khan (Samsung OSG) shuah@kernel.org Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/usbip/stub.h      |  2 ++
 drivers/usb/usbip/stub_dev.c  | 33 +++++++++++++++++++++++----------
 drivers/usb/usbip/stub_main.c | 40 +++++++++++++++++++++++++++++++++++-----
 3 files changed, 60 insertions(+), 15 deletions(-)
--- a/drivers/usb/usbip/stub.h +++ b/drivers/usb/usbip/stub.h @@ -88,6 +88,7 @@ struct bus_id_priv { struct stub_device *sdev; struct usb_device *udev; char shutdown_busid; + spinlock_t busid_lock; };
/* stub_priv is allocated from stub_priv_cache */ @@ -98,6 +99,7 @@ extern struct usb_device_driver stub_dri
/* stub_main.c */ struct bus_id_priv *get_busid_priv(const char *busid); +void put_busid_priv(struct bus_id_priv *bid); int del_match_busid(char *busid); void stub_device_cleanup_urbs(struct stub_device *sdev);
--- a/drivers/usb/usbip/stub_dev.c +++ b/drivers/usb/usbip/stub_dev.c @@ -314,7 +314,7 @@ static int stub_probe(struct usb_device struct stub_device *sdev = NULL; const char *udev_busid = dev_name(&udev->dev); struct bus_id_priv *busid_priv; - int rc; + int rc = 0;
dev_dbg(&udev->dev, "Enter probe\n");
@@ -331,13 +331,15 @@ static int stub_probe(struct usb_device * other matched drivers by the driver core. * See driver_probe_device() in driver/base/dd.c */ - return -ENODEV; + rc = -ENODEV; + goto call_put_busid_priv; }
if (udev->descriptor.bDeviceClass == USB_CLASS_HUB) { dev_dbg(&udev->dev, "%s is a usb hub device... skip!\n", udev_busid); - return -ENODEV; + rc = -ENODEV; + goto call_put_busid_priv; }
if (!strcmp(udev->bus->bus_name, "vhci_hcd")) { @@ -345,13 +347,16 @@ static int stub_probe(struct usb_device "%s is attached on vhci_hcd... skip!\n", udev_busid);
- return -ENODEV; + rc = -ENODEV; + goto call_put_busid_priv; }
/* ok, this is my device */ sdev = stub_device_alloc(udev); - if (!sdev) - return -ENOMEM; + if (!sdev) { + rc = -ENOMEM; + goto call_put_busid_priv; + }
dev_info(&udev->dev, "usbip-host: register new device (bus %u dev %u)\n", @@ -383,7 +388,9 @@ static int stub_probe(struct usb_device } busid_priv->status = STUB_BUSID_ALLOC;
- return 0; + rc = 0; + goto call_put_busid_priv; + err_files: usb_hub_release_port(udev->parent, udev->portnum, (struct usb_dev_state *) udev); @@ -394,6 +401,9 @@ err_port:
busid_priv->sdev = NULL; stub_device_free(sdev); + +call_put_busid_priv: + put_busid_priv(busid_priv); return rc; }
@@ -432,7 +442,7 @@ static void stub_disconnect(struct usb_d /* get stub_device */ if (!sdev) { dev_err(&udev->dev, "could not get device"); - return; + goto call_put_busid_priv; }
dev_set_drvdata(&udev->dev, NULL); @@ -447,12 +457,12 @@ static void stub_disconnect(struct usb_d (struct usb_dev_state *) udev); if (rc) { dev_dbg(&udev->dev, "unable to release port\n"); - return; + goto call_put_busid_priv; }
/* If usb reset is called from event handler */ if (busid_priv->sdev->ud.eh == current) - return; + goto call_put_busid_priv;
/* shutdown the current connection */ shutdown_busid(busid_priv); @@ -465,6 +475,9 @@ static void stub_disconnect(struct usb_d
if (busid_priv->status == STUB_BUSID_ALLOC) busid_priv->status = STUB_BUSID_ADDED; + +call_put_busid_priv: + put_busid_priv(busid_priv); }
#ifdef CONFIG_PM --- a/drivers/usb/usbip/stub_main.c +++ b/drivers/usb/usbip/stub_main.c @@ -40,6 +40,8 @@ static spinlock_t busid_table_lock;
static void init_busid_table(void) { + int i; + /* * This also sets the bus_table[i].status to * STUB_BUSID_OTHER, which is 0. @@ -47,6 +49,9 @@ static void init_busid_table(void) memset(busid_table, 0, sizeof(busid_table));
spin_lock_init(&busid_table_lock); + + for (i = 0; i < MAX_BUSID; i++) + spin_lock_init(&busid_table[i].busid_lock); }
/* @@ -58,15 +63,20 @@ static int get_busid_idx(const char *bus int i; int idx = -1;
- for (i = 0; i < MAX_BUSID; i++) + for (i = 0; i < MAX_BUSID; i++) { + spin_lock(&busid_table[i].busid_lock); if (busid_table[i].name[0]) if (!strncmp(busid_table[i].name, busid, BUSID_SIZE)) { idx = i; + spin_unlock(&busid_table[i].busid_lock); break; } + spin_unlock(&busid_table[i].busid_lock); + } return idx; }
+/* Returns holding busid_lock. Should call put_busid_priv() to unlock */ struct bus_id_priv *get_busid_priv(const char *busid) { int idx; @@ -74,13 +84,21 @@ struct bus_id_priv *get_busid_priv(const
spin_lock(&busid_table_lock); idx = get_busid_idx(busid); - if (idx >= 0) + if (idx >= 0) { bid = &(busid_table[idx]); + /* get busid_lock before returning */ + spin_lock(&bid->busid_lock); + } spin_unlock(&busid_table_lock);
return bid; }
+void put_busid_priv(struct bus_id_priv *bid) +{ + spin_unlock(&bid->busid_lock); +} + static int add_match_busid(char *busid) { int i; @@ -93,15 +111,19 @@ static int add_match_busid(char *busid) goto out; }
- for (i = 0; i < MAX_BUSID; i++) + for (i = 0; i < MAX_BUSID; i++) { + spin_lock(&busid_table[i].busid_lock); if (!busid_table[i].name[0]) { strlcpy(busid_table[i].name, busid, BUSID_SIZE); if ((busid_table[i].status != STUB_BUSID_ALLOC) && (busid_table[i].status != STUB_BUSID_REMOV)) busid_table[i].status = STUB_BUSID_ADDED; ret = 0; + spin_unlock(&busid_table[i].busid_lock); break; } + spin_unlock(&busid_table[i].busid_lock); + }
out: spin_unlock(&busid_table_lock); @@ -122,6 +144,8 @@ int del_match_busid(char *busid) /* found */ ret = 0;
+ spin_lock(&busid_table[idx].busid_lock); + if (busid_table[idx].status == STUB_BUSID_OTHER) memset(busid_table[idx].name, 0, BUSID_SIZE);
@@ -129,6 +153,7 @@ int del_match_busid(char *busid) (busid_table[idx].status != STUB_BUSID_ADDED)) busid_table[idx].status = STUB_BUSID_REMOV;
+ spin_unlock(&busid_table[idx].busid_lock); out: spin_unlock(&busid_table_lock);
@@ -141,9 +166,12 @@ static ssize_t show_match_busid(struct d char *out = buf;
spin_lock(&busid_table_lock); - for (i = 0; i < MAX_BUSID; i++) + for (i = 0; i < MAX_BUSID; i++) { + spin_lock(&busid_table[i].busid_lock); if (busid_table[i].name[0]) out += sprintf(out, "%s ", busid_table[i].name); + spin_unlock(&busid_table[i].busid_lock); + } spin_unlock(&busid_table_lock); out += sprintf(out, "\n");
@@ -219,7 +247,7 @@ static void stub_device_rebind(void) } spin_unlock(&busid_table_lock);
- /* now run rebind */ + /* now run rebind - no need to hold locks. driver files are removed */ for (i = 0; i < MAX_BUSID; i++) { if (busid_table[i].name[0] && busid_table[i].shutdown_busid) { @@ -249,6 +277,8 @@ static ssize_t rebind_store(struct devic
/* mark the device for deletion so probe ignores it during rescan */ bid->status = STUB_BUSID_OTHER; + /* release the busid lock */ + put_busid_priv(bid);
ret = do_rebind((char *) buf, bid); if (ret < 0)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan (Samsung OSG) shuah@kernel.org
commit c171654caa875919be3c533d3518da8be5be966e upstream.
stub_probe() calls put_busid_priv() in an error path when the device isn't found in the busid_table. Fix it by making put_busid_priv() safe to call with a null struct bus_id_priv pointer.

This problem happens when "usbip bind" is run without loading the usbip_host driver and then running modprobe. The first failed bind attempt unbinds the device from the original driver, and when usbip_host is modprobed, stub_probe() runs, doesn't find the device in its busid table, and calls put_busid_priv() with a null bus_id_priv pointer.
usbip-host 3-10.2: 3-10.2 is not in match_busid table... skip!
[ 367.359679] =====================================
[ 367.359681] WARNING: bad unlock balance detected!
[ 367.359683] 4.17.0-rc4+ #5 Not tainted
[ 367.359685] -------------------------------------
[ 367.359688] modprobe/2768 is trying to release lock (
[ 367.359689] ==================================================================
[ 367.359696] BUG: KASAN: null-ptr-deref in print_unlock_imbalance_bug+0x99/0x110
[ 367.359699] Read of size 8 at addr 0000000000000058 by task modprobe/2768
[ 367.359705] CPU: 4 PID: 2768 Comm: modprobe Not tainted 4.17.0-rc4+ #5
Fixes: 22076557b07c ("usbip: usbip_host: fix NULL-ptr deref and use-after-free errors") in usb-linus Signed-off-by: Shuah Khan (Samsung OSG) shuah@kernel.org Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/usbip/stub_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/usb/usbip/stub_main.c +++ b/drivers/usb/usbip/stub_main.c @@ -96,7 +96,8 @@ struct bus_id_priv *get_busid_priv(const
void put_busid_priv(struct bus_id_priv *bid) { - spin_unlock(&bid->busid_lock); + if (bid) + spin_unlock(&bid->busid_lock); }
static int add_match_busid(char *busid)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Federico Cuello fedux@fedux.com.ar
commit 21493316a3c4598f308d5a9fa31cc74639c4caff upstream.
Currently it's not possible to set volume lower than 26% (it just mutes).
Also fixes this warning:
Warning! Unlikely big volume range (=9472), cval->res is probably wrong.
[13] FU [PCM Playback Volume] ch = 2, val = -9473/-1/1
With this change applied, the volume works fine for the full range.
Signed-off-by: Federico Cuello fedux@fedux.com.ar Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/mixer.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -904,6 +904,14 @@ static void volume_control_quirks(struct } break;
+ case USB_ID(0x0d8c, 0x0103): + if (!strcmp(kctl->id.name, "PCM Playback Volume")) { + usb_audio_info(chip, + "set volume quirk for CM102-A+/102S+\n"); + cval->min = -256; + } + break; + case USB_ID(0x0471, 0x0101): case USB_ID(0x0471, 0x0104): case USB_ID(0x0471, 0x0105):
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit c8beccc19b92f5172994c0732db689c08f4f98e5 upstream.
Power-saving causes loud plops on the Lenovo C50 All in one; add it to the blacklist.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1572975 Signed-off-by: Hans de Goede hdegoede@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/hda_intel.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2072,6 +2072,8 @@ static struct snd_pci_quirk power_save_b SND_PCI_QUIRK(0x1849, 0x0c0c, "Asrock B85M-ITX", 0), /* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */ SND_PCI_QUIRK(0x1043, 0x8733, "Asus Prime X370-Pro", 0), + /* https://bugzilla.redhat.com/show_bug.cgi?id=1572975 */ + SND_PCI_QUIRK(0x17aa, 0x36a7, "Lenovo C50 All in one", 0), /* https://bugzilla.kernel.org/show_bug.cgi?id=198611 */ SND_PCI_QUIRK(0x17aa, 0x2227, "Lenovo X1 Carbon 3rd Gen", 0), {}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wenwen Wang wang6495@umn.edu
commit 3f12888dfae2a48741c4caa9214885b3aaf350f9 upstream.
In snd_ctl_elem_add_compat(), the fields of the struct 'data' need to be copied from the corresponding fields of the struct 'data32' in userspace. This is achieved by invoking the copy_from_user() and get_user() functions. The problem here is that the 'type' field is copied twice: once by copy_from_user() and once by get_user(). Given that the 'type' field is not used between the two copies, the second copy is completely redundant and should be removed for better performance and cleaner code. The two copies can also produce inconsistent data: because the struct 'data32' resides in userspace, a malicious userspace process can race to change the 'type' field between the two copies. Depending on how the data is used later on, such an inconsistency may cause potential security risks.
For above reasons, we should take out the second copy.
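The race is a classic double fetch. The following userspace simulation (illustrative only, not the ALSA code path) shows how a value read twice from memory shared with another thread can differ between the two reads:

/* build with: cc -O2 -pthread double_fetch_demo.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned int shared_type = 1;   /* stands in for data32->type */

static void *flipper(void *arg)
{
        (void)arg;
        /* the "malicious userspace" side keeps changing the field */
        for (;;)
                atomic_fetch_xor(&shared_type, 3);
        return NULL;
}

int main(void)
{
        pthread_t t;
        unsigned long i;

        pthread_create(&t, NULL, flipper, NULL);
        for (i = 0; i < 100000000UL; i++) {
                unsigned int first  = atomic_load(&shared_type);  /* first copy */
                unsigned int second = atomic_load(&shared_type);  /* redundant second copy */

                if (first != second) {
                        printf("inconsistent snapshot at iteration %lu: %u vs %u\n",
                               i, first, second);
                        return 0;
                }
        }
        printf("no inconsistency observed this run (timing dependent)\n");
        return 0;
}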
Signed-off-by: Wenwen Wang wang6495@umn.edu Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/core/control_compat.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/sound/core/control_compat.c +++ b/sound/core/control_compat.c @@ -400,8 +400,7 @@ static int snd_ctl_elem_add_compat(struc if (copy_from_user(&data->id, &data32->id, sizeof(data->id)) || copy_from_user(&data->type, &data32->type, 3 * sizeof(u32))) goto error; - if (get_user(data->owner, &data32->owner) || - get_user(data->type, &data32->type)) + if (get_user(data->owner, &data32->owner)) goto error; switch (data->type) { case SNDRV_CTL_ELEM_TYPE_BOOLEAN:
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
commit efc4a13724b852ddaa3358402a8dec024ffbcb17 upstream.
Currently only a 32-bit device address is supported for DMA. However, starting from the Intel Sunrisepoint PCH, the DMA address of the device FIFO can be 64-bit.
Change the respective variable to be compatible with DMA engine expectations, i.e. to phys_addr_t.
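A small sketch of why the wider type matters (the address below is made up, not taken from real hardware):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
        /* hypothetical FIFO address above 4 GiB */
        uint64_t fifo_phys = 0x100001000ULL;
        uint32_t truncated = (uint32_t)fifo_phys;   /* what a u32 field would keep */

        printf("full address:  0x%" PRIx64 "\n", fifo_phys);
        printf("stored in u32: 0x%" PRIx32 " (upper bits lost)\n", truncated);
        return 0;
}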
Fixes: 34cadd9c1bcb ("spi: pxa2xx: Add support for Intel Sunrisepoint") Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown broonie@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/spi/spi-pxa2xx.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/spi/spi-pxa2xx.h +++ b/drivers/spi/spi-pxa2xx.h @@ -38,7 +38,7 @@ struct driver_data {
/* SSP register addresses */ void __iomem *ioaddr; - u32 ssdr_physical; + phys_addr_t ssdr_physical;
/* SSP masks*/ u32 dma_cr1;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stewart Smith stewart@linux.vnet.ibm.com
commit 786842b62f81f20d14894925e8c225328ee8144b upstream.
The OpenPower Abstraction Layer firmware went through a couple of iterations in the lab before being released. What we now know as OPAL advertises itself as OPALv3.
OPALv2 and OPALv1 never made it outside the lab, and the possibility of anyone at all ever building a mainline kernel today and expecting it to boot on such hardware is zero.
Signed-off-by: Stewart Smith stewart@linux.vnet.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Mike Galbraith mgalbraith@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/platforms/powernv/opal.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/arch/powerpc/platforms/powernv/opal.c +++ b/arch/powerpc/platforms/powernv/opal.c @@ -103,11 +103,8 @@ int __init early_init_dt_scan_opal(unsig powerpc_firmware_features |= FW_FEATURE_OPALv2; powerpc_firmware_features |= FW_FEATURE_OPALv3; pr_info("OPAL V3 detected !\n"); - } else if (of_flat_dt_is_compatible(node, "ibm,opal-v2")) { - powerpc_firmware_features |= FW_FEATURE_OPALv2; - pr_info("OPAL V2 detected !\n"); } else { - pr_info("OPAL V1 detected !\n"); + panic("OPAL != V3 detected, no longer supported.\n"); }
/* Reinit all cores with the right endian */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stewart Smith stewart@linux.vnet.ibm.com
commit 7261aafc095763b119136a562540dea7b1ccf657 upstream.
OPALv2 only ever existed in the lab and didn't escape to the world. All OPAL systems in the wild are OPALv3.
The probability of there being an OPALv2 system still powered on anywhere inside IBM is approximately zero, let alone anyone expecting to run mainline kernels.
So, start to remove references to OPALv2.
Signed-off-by: Stewart Smith stewart@linux.vnet.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Mike Galbraith mgalbraith@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/include/asm/firmware.h    | 4 +---
 arch/powerpc/platforms/powernv/opal.c  | 8 ++------
 arch/powerpc/platforms/powernv/setup.c | 4 ----
 arch/powerpc/platforms/powernv/smp.c   | 4 ++--
 4 files changed, 5 insertions(+), 15 deletions(-)
--- a/arch/powerpc/include/asm/firmware.h +++ b/arch/powerpc/include/asm/firmware.h @@ -47,7 +47,6 @@ #define FW_FEATURE_VPHN ASM_CONST(0x0000000004000000) #define FW_FEATURE_XCMO ASM_CONST(0x0000000008000000) #define FW_FEATURE_OPAL ASM_CONST(0x0000000010000000) -#define FW_FEATURE_OPALv2 ASM_CONST(0x0000000020000000) #define FW_FEATURE_SET_MODE ASM_CONST(0x0000000040000000) #define FW_FEATURE_BEST_ENERGY ASM_CONST(0x0000000080000000) #define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0000000100000000) @@ -70,8 +69,7 @@ enum { FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY | FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN, FW_FEATURE_PSERIES_ALWAYS = 0, - FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_OPALv2 | - FW_FEATURE_OPALv3, + FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_OPALv3, FW_FEATURE_POWERNV_ALWAYS = 0, FW_FEATURE_PS3_POSSIBLE = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1, FW_FEATURE_PS3_ALWAYS = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1, --- a/arch/powerpc/platforms/powernv/opal.c +++ b/arch/powerpc/platforms/powernv/opal.c @@ -100,7 +100,6 @@ int __init early_init_dt_scan_opal(unsig
powerpc_firmware_features |= FW_FEATURE_OPAL; if (of_flat_dt_is_compatible(node, "ibm,opal-v3")) { - powerpc_firmware_features |= FW_FEATURE_OPALv2; powerpc_firmware_features |= FW_FEATURE_OPALv3; pr_info("OPAL V3 detected !\n"); } else { @@ -349,7 +348,7 @@ int opal_put_chars(uint32_t vtermno, con * enough room and be done with it */ spin_lock_irqsave(&opal_write_lock, flags); - if (firmware_has_feature(FW_FEATURE_OPALv2)) { + if (firmware_has_feature(FW_FEATURE_OPALv3)) { rc = opal_console_write_buffer_space(vtermno, &olen); len = be64_to_cpu(olen); if (rc || len < total_len) { @@ -693,10 +692,7 @@ static int __init opal_init(void) }
/* Register OPAL consoles if any ports */ - if (firmware_has_feature(FW_FEATURE_OPALv2)) - consoles = of_find_node_by_path("/ibm,opal/consoles"); - else - consoles = of_node_get(opal_node); + consoles = of_find_node_by_path("/ibm,opal/consoles"); if (consoles) { for_each_child_of_node(consoles, np) { if (strcmp(np->name, "serial")) --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -142,10 +142,6 @@ static void pnv_show_cpuinfo(struct seq_ seq_printf(m, "machine\t\t: PowerNV %s\n", model); if (firmware_has_feature(FW_FEATURE_OPALv3)) seq_printf(m, "firmware\t: OPAL v3\n"); - else if (firmware_has_feature(FW_FEATURE_OPALv2)) - seq_printf(m, "firmware\t: OPAL v2\n"); - else if (firmware_has_feature(FW_FEATURE_OPAL)) - seq_printf(m, "firmware\t: OPAL v1\n"); else seq_printf(m, "firmware\t: BML\n"); of_node_put(root); --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -65,10 +65,10 @@ static int pnv_smp_kick_cpu(int nr) BUG_ON(nr < 0 || nr >= NR_CPUS);
/* - * If we already started or OPALv2 is not supported, we just + * If we already started or OPALv3 is not supported, we just * kick the CPU via the PACA */ - if (paca[nr].cpu_start || !firmware_has_feature(FW_FEATURE_OPALv2)) + if (paca[nr].cpu_start || !firmware_has_feature(FW_FEATURE_OPALv3)) goto kick;
/*
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stewart Smith stewart@linux.vnet.ibm.com
commit e4d54f71d29997344b4c4c8d47708240f9f23a5c upstream.
Long ago, only in the lab, there was OPALv1 and OPALv2. Now there is just OPALv3, with nobody ever expecting anything on pre-OPALv3 to be cared about or supported by mainline kernels.
So, let's remove FW_FEATURE_OPALv3 and instead use FW_FEATURE_OPAL exclusively.
Signed-off-by: Stewart Smith stewart@linux.vnet.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Mike Galbraith mgalbraith@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/powerpc/include/asm/firmware.h          |  3 -
 arch/powerpc/platforms/powernv/eeh-powernv.c |  4 -
 arch/powerpc/platforms/powernv/idle.c        |  2
 arch/powerpc/platforms/powernv/opal-xscom.c  |  2
 arch/powerpc/platforms/powernv/opal.c        | 25 ++-----
 arch/powerpc/platforms/powernv/pci-ioda.c    |  2
 arch/powerpc/platforms/powernv/setup.c       |  8 +-
 arch/powerpc/platforms/powernv/smp.c         | 74 +++++++++++----------------
 drivers/cpufreq/powernv-cpufreq.c            |  2
 drivers/cpuidle/cpuidle-powernv.c            |  2
 10 files changed, 54 insertions(+), 70 deletions(-)
--- a/arch/powerpc/include/asm/firmware.h +++ b/arch/powerpc/include/asm/firmware.h @@ -51,7 +51,6 @@ #define FW_FEATURE_BEST_ENERGY ASM_CONST(0x0000000080000000) #define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0000000100000000) #define FW_FEATURE_PRRN ASM_CONST(0x0000000200000000) -#define FW_FEATURE_OPALv3 ASM_CONST(0x0000000400000000)
#ifndef __ASSEMBLY__
@@ -69,7 +68,7 @@ enum { FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY | FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN, FW_FEATURE_PSERIES_ALWAYS = 0, - FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_OPALv3, + FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL, FW_FEATURE_POWERNV_ALWAYS = 0, FW_FEATURE_PS3_POSSIBLE = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1, FW_FEATURE_PS3_ALWAYS = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1, --- a/arch/powerpc/platforms/powernv/eeh-powernv.c +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c @@ -48,8 +48,8 @@ static int pnv_eeh_init(void) struct pci_controller *hose; struct pnv_phb *phb;
- if (!firmware_has_feature(FW_FEATURE_OPALv3)) { - pr_warn("%s: OPALv3 is required !\n", + if (!firmware_has_feature(FW_FEATURE_OPAL)) { + pr_warn("%s: OPAL is required !\n", __func__); return -EINVAL; } --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -242,7 +242,7 @@ static int __init pnv_init_idle_states(v if (cpuidle_disable != IDLE_NO_OVERRIDE) goto out;
- if (!firmware_has_feature(FW_FEATURE_OPALv3)) + if (!firmware_has_feature(FW_FEATURE_OPAL)) goto out;
power_mgt = of_find_node_by_path("/ibm,opal/power-mgt"); --- a/arch/powerpc/platforms/powernv/opal-xscom.c +++ b/arch/powerpc/platforms/powernv/opal-xscom.c @@ -126,7 +126,7 @@ static const struct scom_controller opal
static int opal_xscom_init(void) { - if (firmware_has_feature(FW_FEATURE_OPALv3)) + if (firmware_has_feature(FW_FEATURE_OPAL)) scom_init(&opal_scom_controller); return 0; } --- a/arch/powerpc/platforms/powernv/opal.c +++ b/arch/powerpc/platforms/powernv/opal.c @@ -98,10 +98,9 @@ int __init early_init_dt_scan_opal(unsig pr_debug("OPAL Entry = 0x%llx (sizep=%p runtimesz=%d)\n", opal.size, sizep, runtimesz);
- powerpc_firmware_features |= FW_FEATURE_OPAL; if (of_flat_dt_is_compatible(node, "ibm,opal-v3")) { - powerpc_firmware_features |= FW_FEATURE_OPALv3; - pr_info("OPAL V3 detected !\n"); + powerpc_firmware_features |= FW_FEATURE_OPAL; + pr_info("OPAL detected !\n"); } else { panic("OPAL != V3 detected, no longer supported.\n"); } @@ -348,17 +347,15 @@ int opal_put_chars(uint32_t vtermno, con * enough room and be done with it */ spin_lock_irqsave(&opal_write_lock, flags); - if (firmware_has_feature(FW_FEATURE_OPALv3)) { - rc = opal_console_write_buffer_space(vtermno, &olen); - len = be64_to_cpu(olen); - if (rc || len < total_len) { - spin_unlock_irqrestore(&opal_write_lock, flags); - /* Closed -> drop characters */ - if (rc) - return total_len; - opal_poll_events(NULL); - return -EAGAIN; - } + rc = opal_console_write_buffer_space(vtermno, &olen); + len = be64_to_cpu(olen); + if (rc || len < total_len) { + spin_unlock_irqrestore(&opal_write_lock, flags); + /* Closed -> drop characters */ + if (rc) + return total_len; + opal_poll_events(NULL); + return -EAGAIN; }
/* We still try to handle partial completions, though they --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -344,7 +344,7 @@ static void __init pnv_ioda_parse_m64_wi return; }
- if (!firmware_has_feature(FW_FEATURE_OPALv3)) { + if (!firmware_has_feature(FW_FEATURE_OPAL)) { pr_info(" Firmware too old to support M64 window\n"); return; } --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -140,8 +140,8 @@ static void pnv_show_cpuinfo(struct seq_ if (root) model = of_get_property(root, "model", NULL); seq_printf(m, "machine\t\t: PowerNV %s\n", model); - if (firmware_has_feature(FW_FEATURE_OPALv3)) - seq_printf(m, "firmware\t: OPAL v3\n"); + if (firmware_has_feature(FW_FEATURE_OPAL)) + seq_printf(m, "firmware\t: OPAL\n"); else seq_printf(m, "firmware\t: BML\n"); of_node_put(root); @@ -270,9 +270,9 @@ static void pnv_kexec_cpu_down(int crash { xics_kexec_teardown_cpu(secondary);
- /* On OPAL v3, we return all CPUs to firmware */ + /* On OPAL, we return all CPUs to firmware */
- if (!firmware_has_feature(FW_FEATURE_OPALv3)) + if (!firmware_has_feature(FW_FEATURE_OPAL)) return;
if (secondary) { --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -61,14 +61,15 @@ static int pnv_smp_kick_cpu(int nr) unsigned long start_here = __pa(ppc_function_entry(generic_secondary_smp_init)); long rc; + uint8_t status;
BUG_ON(nr < 0 || nr >= NR_CPUS);
/* - * If we already started or OPALv3 is not supported, we just + * If we already started or OPAL is not supported, we just * kick the CPU via the PACA */ - if (paca[nr].cpu_start || !firmware_has_feature(FW_FEATURE_OPALv3)) + if (paca[nr].cpu_start || !firmware_has_feature(FW_FEATURE_OPAL)) goto kick;
/* @@ -77,55 +78,42 @@ static int pnv_smp_kick_cpu(int nr) * first time. OPAL v3 allows us to query OPAL to know if it * has the CPUs, so we do that */ - if (firmware_has_feature(FW_FEATURE_OPALv3)) { - uint8_t status; - - rc = opal_query_cpu_status(pcpu, &status); - if (rc != OPAL_SUCCESS) { - pr_warn("OPAL Error %ld querying CPU %d state\n", - rc, nr); - return -ENODEV; - } + rc = opal_query_cpu_status(pcpu, &status); + if (rc != OPAL_SUCCESS) { + pr_warn("OPAL Error %ld querying CPU %d state\n", rc, nr); + return -ENODEV; + }
- /* - * Already started, just kick it, probably coming from - * kexec and spinning - */ - if (status == OPAL_THREAD_STARTED) - goto kick; + /* + * Already started, just kick it, probably coming from + * kexec and spinning + */ + if (status == OPAL_THREAD_STARTED) + goto kick;
- /* - * Available/inactive, let's kick it - */ - if (status == OPAL_THREAD_INACTIVE) { - pr_devel("OPAL: Starting CPU %d (HW 0x%x)...\n", - nr, pcpu); - rc = opal_start_cpu(pcpu, start_here); - if (rc != OPAL_SUCCESS) { - pr_warn("OPAL Error %ld starting CPU %d\n", - rc, nr); - return -ENODEV; - } - } else { - /* - * An unavailable CPU (or any other unknown status) - * shouldn't be started. It should also - * not be in the possible map but currently it can - * happen - */ - pr_devel("OPAL: CPU %d (HW 0x%x) is unavailable" - " (status %d)...\n", nr, pcpu, status); + /* + * Available/inactive, let's kick it + */ + if (status == OPAL_THREAD_INACTIVE) { + pr_devel("OPAL: Starting CPU %d (HW 0x%x)...\n", nr, pcpu); + rc = opal_start_cpu(pcpu, start_here); + if (rc != OPAL_SUCCESS) { + pr_warn("OPAL Error %ld starting CPU %d\n", rc, nr); return -ENODEV; } } else { /* - * On OPAL v2, we just kick it and hope for the best, - * we must not test the error from opal_start_cpu() or - * we would fail to get CPUs from kexec. + * An unavailable CPU (or any other unknown status) + * shouldn't be started. It should also + * not be in the possible map but currently it can + * happen */ - opal_start_cpu(pcpu, start_here); + pr_devel("OPAL: CPU %d (HW 0x%x) is unavailable" + " (status %d)...\n", nr, pcpu, status); + return -ENODEV; } - kick: + +kick: return smp_generic_kick_cpu(nr); }
--- a/drivers/cpufreq/powernv-cpufreq.c +++ b/drivers/cpufreq/powernv-cpufreq.c @@ -592,7 +592,7 @@ static int __init powernv_cpufreq_init(v int rc = 0;
/* Don't probe on pseries (guest) platforms */ - if (!firmware_has_feature(FW_FEATURE_OPALv3)) + if (!firmware_has_feature(FW_FEATURE_OPAL)) return -ENODEV;
/* Discover pstates from device tree and init */ --- a/drivers/cpuidle/cpuidle-powernv.c +++ b/drivers/cpuidle/cpuidle-powernv.c @@ -282,7 +282,7 @@ static int powernv_idle_probe(void) if (cpuidle_disable != IDLE_NO_OVERRIDE) return -ENODEV;
- if (firmware_has_feature(FW_FEATURE_OPALv3)) { + if (firmware_has_feature(FW_FEATURE_OPAL)) { cpuidle_state_table = powernv_states; /* Device tree can indicate more idle states */ max_idle_state = powernv_add_idle_states();
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anders Roxell anders.roxell@linaro.org
commit 75274b33e779ae40a750bcb4bd0b07c4dfef4746 upstream.
This was found with the -RT patch enabled, but the fix should apply to non-RT also.
Building multi_v7_defconfig with PREEMPT_RT_FULL=y caused a compilation warning without this fix:

../drivers/cpuidle/coupled.c:122:21: warning: 'cpuidle_coupled_lock' defined but not used [-Wunused-variable]
Signed-off-by: Anders Roxell anders.roxell@linaro.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Mike Galbraith mgalbraith@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/cpuidle/coupled.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/cpuidle/coupled.c +++ b/drivers/cpuidle/coupled.c @@ -119,7 +119,6 @@ struct cpuidle_coupled {
#define CPUIDLE_COUPLED_NOT_IDLE (-1)
-static DEFINE_MUTEX(cpuidle_coupled_lock); static DEFINE_PER_CPU(struct call_single_data, cpuidle_coupled_poke_cb);
/*
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Herrenschmidt benh@kernel.crashing.org
commit 349524bc0da698ec77f2057cf4a4948eb6349265 upstream.
Disabling preemption in show_cpuinfo() causes warnings from the cpufreq mutex code. It is also rather unnecessary and ineffective. If we really want to prevent concurrent unplug, we could take the unplug read lock, but I don't see this as being critical.
Fixes: cd77b5ce208c ("powerpc/powernv/cpufreq: Fix the frequency read by /proc/cpuinfo") Signed-off-by: Benjamin Herrenschmidt benh@kernel.crashing.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Acked-by: Michal Suchanek msuchanek@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/setup-common.c | 11 ----------- 1 file changed, 11 deletions(-)
--- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -217,14 +217,6 @@ static int show_cpuinfo(struct seq_file unsigned short maj; unsigned short min;
- /* We only show online cpus: disable preempt (overzealous, I - * knew) to prevent cpu going down. */ - preempt_disable(); - if (!cpu_online(cpu_id)) { - preempt_enable(); - return 0; - } - #ifdef CONFIG_SMP pvr = per_cpu(cpu_pvr, cpu_id); #else @@ -329,9 +321,6 @@ static int show_cpuinfo(struct seq_file #ifdef CONFIG_SMP seq_printf(m, "\n"); #endif - - preempt_enable(); - /* If this is the last cpu, print the summary */ if (cpumask_next(cpu_id, cpu_online_mask) >= nr_cpu_ids) show_cpuinfo_summary(m);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Davydov vdavydov@virtuozzo.com
commit 316bda0e6cc5f36f94b4af8bded16d642c90ad75 upstream.
We assume there is enough inactive page cache if the size of the inactive file lru is greater than the size of the active file lru, in which case we force-scan the file lru, ignoring anonymous pages. While this logic works fine when there are plenty of page cache pages, it fails if the size of the file lru is small (several MB): in this case (lru_size >> prio) will be 0 for normal scan priorities. As a result, if the inactive file lru happens to be larger than the active file lru, the anonymous pages of a cgroup will never get evicted unless the system experiences severe memory pressure, even if there are gigabytes of unused anonymous memory there. This is unfair with respect to other cgroups, whose workloads might be page cache oriented.
This patch attempts to fix this by elaborating the "enough inactive page cache" check: it makes it not only check that inactive lru size > active lru size, but also that we will scan something from the cgroup at the current scan priority. If these conditions do not hold, we proceed to SCAN_FRACT as usual.
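A quick sketch of the arithmetic with illustrative numbers (not taken from the patch):

#include <stdio.h>

int main(void)
{
        /* a small file lru: roughly 4 MB of page cache in 4 KiB pages */
        unsigned long inactive_file = 1024;
        int priority;

        /* scan target at each priority, as in lru_size >> sc->priority */
        for (priority = 12; priority >= 0; priority--)
                printf("priority %2d -> scan target %4lu pages\n",
                       priority, inactive_file >> priority);
        return 0;
}

At the default priority of 12 and the next couple of priorities the scan target is 0, which is exactly the situation the added second condition detects.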
Signed-off-by: Vladimir Davydov vdavydov@virtuozzo.com Acked-by: Johannes Weiner hannes@cmpxchg.org Acked-by: Michal Hocko mhocko@suse.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Mel Gorman mgorman@techsingularity.net Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Mel Gorman mgorman@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/vmscan.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2057,10 +2057,16 @@ static void get_scan_count(struct lruvec }
/* - * There is enough inactive page cache, do not reclaim - * anything from the anonymous working set right now. + * If there is enough inactive page cache, i.e. if the size of the + * inactive list is greater than that of the active list *and* the + * inactive list actually has some pages to scan on this priority, we + * do not reclaim anything from the anonymous working set right now. + * Without the second condition we could end up never scanning an + * lruvec even if it has plenty of old anonymous pages unless the + * system is under heavy pressure. */ - if (!inactive_file_is_low(lruvec)) { + if (!inactive_file_is_low(lruvec) && + get_lru_size(lruvec, LRU_INACTIVE_FILE) >> sc->priority) { scan_balance = SCAN_FILE; goto out; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Weiner hannes@cmpxchg.org
commit 84ad5802a33a4964a49b8f7d24d80a214a096b19 upstream.
The MemAvailable item in /proc/meminfo is to give users a hint of how much memory is allocatable without causing swapping, so it excludes the zones' low watermarks as unavailable to userspace.
However, for a userspace allocation, kswapd will actually reclaim until the free pages hit a combination of the high watermark and the page allocator's lowmem protection that keeps a certain amount of DMA and DMA32 memory from userspace as well.
Subtract the full amount we know to be unavailable to userspace from the number of free pages when calculating MemAvailable.
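For context, the resulting estimate (simplified, with made-up numbers; the pagecache and slab terms roughly mirror the surrounding code that this hunk does not change) looks like:

#include <stdio.h>

static unsigned long min_ul(unsigned long a, unsigned long b)
{
        return a < b ? a : b;
}

int main(void)
{
        /* illustrative values in 4 KiB pages, not from a real system */
        unsigned long freeram = 200000, totalreserve_pages = 12000;
        unsigned long pagecache = 300000, slab_reclaimable = 40000, wmark_low = 8000;
        long available;

        available  = (long)(freeram - totalreserve_pages);   /* the change above */
        available += pagecache - min_ul(pagecache / 2, wmark_low);
        available += slab_reclaimable - min_ul(slab_reclaimable / 2, wmark_low);
        if (available < 0)
                available = 0;

        printf("MemAvailable estimate: %ld pages (~%ld MiB)\n",
               available, available * 4 / 1024);
        return 0;
}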
Signed-off-by: Johannes Weiner hannes@cmpxchg.org Cc: Rik van Riel riel@redhat.com Cc: Mel Gorman mgorman@suse.de Acked-by: Michal Hocko mhocko@suse.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Mel Gorman mgorman@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/proc/meminfo.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -57,11 +57,8 @@ static int meminfo_proc_show(struct seq_ /* * Estimate the amount of memory available for userspace allocations, * without causing swapping. - * - * Free memory cannot be taken below the low watermark, before the - * system starts swapping. */ - available = i.freeram - wmark_low; + available = i.freeram - totalreserve_pages;
/* * Not all the page cache can be freed, otherwise the system will
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mel Gorman mgorman@techsingularity.net
commit 32b635298ff4e991d8d8f64dc23782b02eec29c3 upstream.
do_read_cache_page and __read_cache_page duplicate page filler code when filling the page for the first time. This patch simply removes the duplicate logic.
Signed-off-by: Mel Gorman mgorman@techsingularity.net Reviewed-by: Jan Kara jack@suse.cz Cc: Hugh Dickins hughd@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Mel Gorman mgorman@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/filemap.c | 43 ++++++++++++------------------------------- 1 file changed, 12 insertions(+), 31 deletions(-)
--- a/mm/filemap.c +++ b/mm/filemap.c @@ -2215,7 +2215,7 @@ static struct page *wait_on_page_read(st return page; }
-static struct page *__read_cache_page(struct address_space *mapping, +static struct page *do_read_cache_page(struct address_space *mapping, pgoff_t index, int (*filler)(void *, struct page *), void *data, @@ -2237,31 +2237,19 @@ repeat: /* Presumably ENOMEM for radix tree node */ return ERR_PTR(err); } + +filler: err = filler(data, page); if (err < 0) { page_cache_release(page); - page = ERR_PTR(err); - } else { - page = wait_on_page_read(page); + return ERR_PTR(err); } - } - return page; -} - -static struct page *do_read_cache_page(struct address_space *mapping, - pgoff_t index, - int (*filler)(void *, struct page *), - void *data, - gfp_t gfp)
-{ - struct page *page; - int err; - -retry: - page = __read_cache_page(mapping, index, filler, data, gfp); - if (IS_ERR(page)) - return page; + page = wait_on_page_read(page); + if (IS_ERR(page)) + return page; + goto out; + } if (PageUptodate(page)) goto out;
@@ -2269,21 +2257,14 @@ retry: if (!page->mapping) { unlock_page(page); page_cache_release(page); - goto retry; + goto repeat; } if (PageUptodate(page)) { unlock_page(page); goto out; } - err = filler(data, page); - if (err < 0) { - page_cache_release(page); - return ERR_PTR(err); - } else { - page = wait_on_page_read(page); - if (IS_ERR(page)) - return page; - } + goto filler; + out: mark_page_accessed(page); return page;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mel Gorman mgorman@techsingularity.net
commit ebded02788b5d7c7600f8cff26ae07896d568649 upstream.
In the generic read paths the kernel looks up a page in the page cache and if it's up to date, it is used. If not, the page lock is acquired to wait for IO to complete and then check the page. If multiple processes are waiting on IO, they all serialise against the lock and duplicate the checks. This is unnecessary.
The page lock in itself does not give any guarantees to the callers about the page state as it can be immediately truncated or reclaimed after the page is unlocked. It's sufficient to wait_on_page_locked and then continue if the page is up to date on wakeup.
It is possible that a truncated but up-to-date page is returned, but the reference taken during read prevents it from disappearing underneath the caller, and the data is still valid if PageUptodate.
The overall impact is small as even if processes serialise on the lock, the lock section is tiny once the IO is complete. Profiles indicated that unlock_page and friends are generally a tiny portion of a read-intensive workload. An artificial test was created that had instances of dd access a cache-cold file on an ext4 filesystem and measure how long the read took.
                                 paralleldd
                                      4.4.0                 4.4.0
                                    vanilla             avoidlock
Amean    Elapsd-1        5.28 (  0.00%)        5.15 (  2.50%)
Amean    Elapsd-4        5.29 (  0.00%)        5.17 (  2.12%)
Amean    Elapsd-7        5.28 (  0.00%)        5.18 (  1.78%)
Amean    Elapsd-12       5.20 (  0.00%)        5.33 ( -2.50%)
Amean    Elapsd-21       5.14 (  0.00%)        5.21 ( -1.41%)
Amean    Elapsd-30       5.30 (  0.00%)        5.12 (  3.38%)
Amean    Elapsd-48       5.78 (  0.00%)        5.42 (  6.21%)
Amean    Elapsd-79       6.78 (  0.00%)        6.62 (  2.46%)
Amean    Elapsd-110      9.09 (  0.00%)        8.99 (  1.15%)
Amean    Elapsd-128     10.60 (  0.00%)       10.43 (  1.66%)
The impact is small but intuitively, it makes sense to avoid unnecessary calls to lock_page.
Signed-off-by: Mel Gorman mgorman@techsingularity.net Reviewed-by: Jan Kara jack@suse.cz Cc: Hugh Dickins hughd@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Mel Gorman mgorman@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/filemap.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+)
--- a/mm/filemap.c +++ b/mm/filemap.c @@ -1581,6 +1581,15 @@ find_page: index, last_index - index); } if (!PageUptodate(page)) { + /* + * See comment in do_read_cache_page on why + * wait_on_page_locked is used to avoid unnecessarily + * serialisations and why it's safe. + */ + wait_on_page_locked_killable(page); + if (PageUptodate(page)) + goto page_ok; + if (inode->i_blkbits == PAGE_CACHE_SHIFT || !mapping->a_ops->is_partially_uptodate) goto page_not_up_to_date; @@ -2253,12 +2262,52 @@ filler: if (PageUptodate(page)) goto out;
+ /* + * Page is not up to date and may be locked due one of the following + * case a: Page is being filled and the page lock is held + * case b: Read/write error clearing the page uptodate status + * case c: Truncation in progress (page locked) + * case d: Reclaim in progress + * + * Case a, the page will be up to date when the page is unlocked. + * There is no need to serialise on the page lock here as the page + * is pinned so the lock gives no additional protection. Even if the + * the page is truncated, the data is still valid if PageUptodate as + * it's a race vs truncate race. + * Case b, the page will not be up to date + * Case c, the page may be truncated but in itself, the data may still + * be valid after IO completes as it's a read vs truncate race. The + * operation must restart if the page is not uptodate on unlock but + * otherwise serialising on page lock to stabilise the mapping gives + * no additional guarantees to the caller as the page lock is + * released before return. + * Case d, similar to truncation. If reclaim holds the page lock, it + * will be a race with remove_mapping that determines if the mapping + * is valid on unlock but otherwise the data is valid and there is + * no need to serialise with page lock. + * + * As the page lock gives no additional guarantee, we optimistically + * wait on the page to be unlocked and check if it's up to date and + * use the page if it is. Otherwise, the page lock is required to + * distinguish between the different cases. The motivation is that we + * avoid spurious serialisations and wakeups when multiple processes + * wait on the same page for IO to complete. + */ + wait_on_page_locked(page); + if (PageUptodate(page)) + goto out; + + /* Distinguish between all the cases under the safety of the lock */ lock_page(page); + + /* Case c or d, restart the operation */ if (!page->mapping) { unlock_page(page); page_cache_release(page); goto repeat; } + + /* Someone else locked and filled the page in a very small window */ if (PageUptodate(page)) { unlock_page(page); goto out;
On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
Just one objection: Why does stable care about this (and the previous patch)? I've checked the stable queue and I don't see anything that would have these patches as a prerequisite. And on their own, they are only cleanups without substantial gains.
Honza
On Thu, May 24, 2018 at 12:50:11PM +0200, Jan Kara wrote:
On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
Just one objection: Why does stable care about this (and the previous patch)? I've checked the stable queue and I don't see anything that would have these patches as a prerequisite. And on their own, they are only cleanups without substantial gains.
There's a small gain here:
paralleldd                        4.4.0                 4.4.0
                                vanilla             avoidlock
Amean    Elapsd-1        5.28 (  0.00%)        5.15 (  2.50%)
Amean    Elapsd-4        5.29 (  0.00%)        5.17 (  2.12%)
Amean    Elapsd-7        5.28 (  0.00%)        5.18 (  1.78%)
Amean    Elapsd-12       5.20 (  0.00%)        5.33 ( -2.50%)
Amean    Elapsd-21       5.14 (  0.00%)        5.21 ( -1.41%)
Amean    Elapsd-30       5.30 (  0.00%)        5.12 (  3.38%)
Amean    Elapsd-48       5.78 (  0.00%)        5.42 (  6.21%)
Amean    Elapsd-79       6.78 (  0.00%)        6.62 (  2.46%)
Amean    Elapsd-110      9.09 (  0.00%)        8.99 (  1.15%)
Amean    Elapsd-128     10.60 (  0.00%)       10.43 (  1.66%)
The impact is small but intuitively, it makes sense to avoid unnecessary calls to lock_page.
Yes, it's small, but it's marked in the SLES kernel as "needs to be merged into stable", so obviously it matters to someone :)
thanks,
greg k-h
On Thu, May 24, 2018 at 4:06 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, May 24, 2018 at 12:50:11PM +0200, Jan Kara wrote:
On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
Just one objection: Why does stable care about this (and the previous patch)? I've checked the stable queue and I don't see anything that would have these patches as a prerequisite. And on their own, they are only cleanups without substantial gains.
There's a small gain here:
paralleldd                        4.4.0                 4.4.0
                                vanilla             avoidlock
Amean    Elapsd-1        5.28 (  0.00%)        5.15 (  2.50%)
Amean    Elapsd-4        5.29 (  0.00%)        5.17 (  2.12%)
Amean    Elapsd-7        5.28 (  0.00%)        5.18 (  1.78%)
Amean    Elapsd-12       5.20 (  0.00%)        5.33 ( -2.50%)
Amean    Elapsd-21       5.14 (  0.00%)        5.21 ( -1.41%)
Amean    Elapsd-30       5.30 (  0.00%)        5.12 (  3.38%)
Amean    Elapsd-48       5.78 (  0.00%)        5.42 (  6.21%)
Amean    Elapsd-79       6.78 (  0.00%)        6.62 (  2.46%)
Amean    Elapsd-110      9.09 (  0.00%)        8.99 (  1.15%)
Amean    Elapsd-128     10.60 (  0.00%)       10.43 (  1.66%)
The impact is small but intuitively, it makes sense to avoid unnecessary calls to lock_page.
Yes, it's small, but it's marked in the SLES kernel as "needs to be merged into stable", so obviously it matters to someone :)
Hmm. I had the same reaction to these two as Jan, but assumed that they made applying later patches easier, and didn't take the trouble he did to find that's not so.
I've no wish to be disputatious, but it does seem that the definition of "stable" has changed, and not necessarily for the better, if it's now a home for small gains: I thought we left those to upstream.
Hugh
On Thu, May 24, 2018 at 04:17:12AM -0700, Hugh Dickins wrote:
On Thu, May 24, 2018 at 4:06 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, May 24, 2018 at 12:50:11PM +0200, Jan Kara wrote:
On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
Just one objection: Why does stable care about this (and the previous patch)? I've checked the stable queue and I don't see anything that would have these patches as a prerequisite. And on their own, they are only cleanups without substantial gains.
There's a small gain here:
paralleldd                        4.4.0                 4.4.0
                                vanilla             avoidlock
Amean    Elapsd-1        5.28 (  0.00%)        5.15 (  2.50%)
Amean    Elapsd-4        5.29 (  0.00%)        5.17 (  2.12%)
Amean    Elapsd-7        5.28 (  0.00%)        5.18 (  1.78%)
Amean    Elapsd-12       5.20 (  0.00%)        5.33 ( -2.50%)
Amean    Elapsd-21       5.14 (  0.00%)        5.21 ( -1.41%)
Amean    Elapsd-30       5.30 (  0.00%)        5.12 (  3.38%)
Amean    Elapsd-48       5.78 (  0.00%)        5.42 (  6.21%)
Amean    Elapsd-79       6.78 (  0.00%)        6.62 (  2.46%)
Amean    Elapsd-110      9.09 (  0.00%)        8.99 (  1.15%)
Amean    Elapsd-128     10.60 (  0.00%)       10.43 (  1.66%)
The impact is small but intuitively, it makes sense to avoid unnecessary calls to lock_page.
Yes, it's small, but it's marked in the SLES kernel as "needs to be merged into stable", so obviously it matters to someone :)
Hmm. I had the same reaction to these two as Jan, but assumed that they made applying later patches easier, and didn't take the trouble he did to find that's not so.
I've no wish to be disputatious, but it does seem that the definition of "stable" has changed, and not necessarily for the better, if it's now a home for small gains: I thought we left those to upstream.
This is in the SLES kernel for a reason, and again, it's in the section that says "this should be pushed to stable". So if it's good enough for the SLES kernel, why isn't it good enough for all users of this kernel tree?
If you all think it should be dropped in both places, that's fine with me :)
thanks,
greg k-h
On Thu 24-05-18 13:28:41, Greg KH wrote:
On Thu, May 24, 2018 at 04:17:12AM -0700, Hugh Dickins wrote:
On Thu, May 24, 2018 at 4:06 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, May 24, 2018 at 12:50:11PM +0200, Jan Kara wrote:
On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
Just one objection: Why does stable care about this (and the previous patch)? I've checked the stable queue and I don't see anything that would have these patches as a prerequisite. And on their own, they are only cleanups without substantial gains.
There's a small gain here:
paralleldd                        4.4.0                 4.4.0
                                vanilla             avoidlock
Amean    Elapsd-1        5.28 (  0.00%)        5.15 (  2.50%)
Amean    Elapsd-4        5.29 (  0.00%)        5.17 (  2.12%)
Amean    Elapsd-7        5.28 (  0.00%)        5.18 (  1.78%)
Amean    Elapsd-12       5.20 (  0.00%)        5.33 ( -2.50%)
Amean    Elapsd-21       5.14 (  0.00%)        5.21 ( -1.41%)
Amean    Elapsd-30       5.30 (  0.00%)        5.12 (  3.38%)
Amean    Elapsd-48       5.78 (  0.00%)        5.42 (  6.21%)
Amean    Elapsd-79       6.78 (  0.00%)        6.62 (  2.46%)
Amean    Elapsd-110      9.09 (  0.00%)        8.99 (  1.15%)
Amean    Elapsd-128     10.60 (  0.00%)       10.43 (  1.66%)
The impact is small but intuitively, it makes sense to avoid unnecessary calls to lock_page.
Yes, it's small, but it's marked in the SLES kernel as "needs to be merged into stable", so obviously it matters to someone :)
Hmm. I had the same reaction to these two as Jan, but assumed that they made applying later patches easier, and didn't take the trouble he did to find that's not so.
I've no wish to be disputatious, but it does seem that the definition of "stable" has changed, and not necessarily for the better, if it's now a home for small gains: I thought we left those to upstream.
This is in the SLES kernel for a reason, and again, it's in the section that says "this should be pushed to stable". So if it's good enough for the SLES kernel, why isn't it good enough for all users of this kernel tree?
Heh, fair enough. I guess Mel in the end didn't find patches worthy enough to be pushed to stable tree. But at least now I know they are well tested with 4.4 base so they should do no harm in the stable tree so my stance is closer to neutral.
Honza
On Thu, May 24, 2018 at 02:02:37PM +0200, Jan Kara wrote:
This is in the SLES kernel for a reason, and again, it's in the section that says "this should be pushed to stable". So if it's good enough for the SLES kernel, why isn't it good enough for all users of this kernel tree?
Heh, fair enough. I guess Mel in the end didn't find patches worthy enough to be pushed to stable tree. But at least now I know they are well tested with 4.4 base so they should do no harm in the stable tree so my stance is closer to neutral.
Early on, I backported a number of performance patches to -stable with the view to having a good baseline to compare a new mainline release with. However, after a while some of them required unrelated backports that would be excessive for -stable. While I could have continued backporting some patches, I stopped as the time required to run all of the performance tests is excessive and I was already tracking too many kernels.
On Thu, May 24, 2018 at 4:28 AM Greg KH gregkh@linuxfoundation.org wrote:
On Thu, May 24, 2018 at 04:17:12AM -0700, Hugh Dickins wrote:
On Thu, May 24, 2018 at 4:06 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, May 24, 2018 at 12:50:11PM +0200, Jan Kara wrote:
On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
Just one objection: Why does stable care about this (and the previous patch)? I've checked the stable queue and I don't see anything that would have these patches as a prerequisite. And on their own, they are only cleanups without substantial gains.
There's a small gain here:
paralleldd                        4.4.0                 4.4.0
                                vanilla             avoidlock
Amean    Elapsd-1        5.28 (  0.00%)        5.15 (  2.50%)
Amean    Elapsd-4        5.29 (  0.00%)        5.17 (  2.12%)
Amean    Elapsd-7        5.28 (  0.00%)        5.18 (  1.78%)
Amean    Elapsd-12       5.20 (  0.00%)        5.33 ( -2.50%)
Amean    Elapsd-21       5.14 (  0.00%)        5.21 ( -1.41%)
Amean    Elapsd-30       5.30 (  0.00%)        5.12 (  3.38%)
Amean    Elapsd-48       5.78 (  0.00%)        5.42 (  6.21%)
Amean    Elapsd-79       6.78 (  0.00%)        6.62 (  2.46%)
Amean    Elapsd-110      9.09 (  0.00%)        8.99 (  1.15%)
Amean    Elapsd-128     10.60 (  0.00%)       10.43 (  1.66%)
The impact is small but intuitively, it makes sense to avoid unnecessary calls to lock_page.
Yes, it's small, but it's marked in the SLES kernel as "needs to be merged into stable", so obviously it matters to someone :)
Hmm. I had the same reaction to these two as Jan, but assumed that they made applying later patches easier, and didn't take the trouble he did to find that's not so.
I've no wish to be disputatious, but it does seem that the definition of "stable" has changed, and not necessarily for the better, if it's now a home for small gains: I thought we left those to upstream.
This is in the SLES kernel for a reason, and again, it's in the section that says "this should be pushed to stable". So if it's good enough for the SLES kernel, why isn't it good enough for all users of this kernel tree?
If you all think it should be dropped in both places, that's fine with me :)
I think they are perfectly fine in SLES: folding in good work is a part of what distros are about.
But I cannot find anything in stable-kernel-rules.rst that would admit them - perhaps that's just out of date?
If -stable is to be a compendium of "this looks nice, you might like to include it", so be it: but the rules should then be updated.
Hugh
On Thu, May 24, 2018 at 10:27:59AM -0700, Hugh Dickins wrote:
On Thu, May 24, 2018 at 4:28 AM Greg KH gregkh@linuxfoundation.org wrote:
On Thu, May 24, 2018 at 04:17:12AM -0700, Hugh Dickins wrote:
On Thu, May 24, 2018 at 4:06 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, May 24, 2018 at 12:50:11PM +0200, Jan Kara wrote:
On Thu 24-05-18 11:38:27, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
Just one objection: Why does stable care about this (and the previous patch)? I've checked the stable queue and I don't see anything that would have these patches as a prerequisite. And on their own, they are only cleanups without substantial gains.
There's a small gain here:
paralleldd                        4.4.0                 4.4.0
                                vanilla             avoidlock
Amean    Elapsd-1        5.28 (  0.00%)        5.15 (  2.50%)
Amean    Elapsd-4        5.29 (  0.00%)        5.17 (  2.12%)
Amean    Elapsd-7        5.28 (  0.00%)        5.18 (  1.78%)
Amean    Elapsd-12       5.20 (  0.00%)        5.33 ( -2.50%)
Amean    Elapsd-21       5.14 (  0.00%)        5.21 ( -1.41%)
Amean    Elapsd-30       5.30 (  0.00%)        5.12 (  3.38%)
Amean    Elapsd-48       5.78 (  0.00%)        5.42 (  6.21%)
Amean    Elapsd-79       6.78 (  0.00%)        6.62 (  2.46%)
Amean    Elapsd-110      9.09 (  0.00%)        8.99 (  1.15%)
Amean    Elapsd-128     10.60 (  0.00%)       10.43 (  1.66%)
The impact is small but intuitively, it makes sense to avoid unnecessary calls to lock_page.
Yes, it's small, but it's marked in the SLES kernel as "needs to be merged into stable", so obviously it matters to someone :)
Hmm. I had the same reaction to these two as Jan, but assumed that they made applying later patches easier, and didn't take the trouble he did to find that's not so.
I've no wish to be disputatious, but it does seem that the definition of "stable" has changed, and not necessarily for the better, if it's now a home for small gains: I thought we left those to upstream.
This is in the SLES kernel for a reason, and again, it's in the section that says "this should be pushed to stable". So if it's good enough for the SLES kernel, why isn't it good enough for all users of this kernel tree?
If you all think it should be dropped in both places, that's fine with me :)
I think they are perfectly fine in SLES: folding in good work is a part of what distros are about.
And it's also what stable is for. We have had backports of performance improvements in the past, along with lots of other things over the years. This is a performance improvement. A tiny one, yes, but getting rid of a lock is a good thing, and I picked it up as part of my review of what a distro decided was worth adding for their users, as that's a huge signal that might be of value to others.
But I cannot find anything in stable-kernel-rules.rst that would admit them - perhaps that's just out of date?
Nope, that's the list I use to say "no" to. You can't describe everything in that file, it's a judgement call.
If -stable is to be a compendium of "this looks nice, you might like to include it", so be it: but the rules should then be updated.
This is a "a bunch of people I trust took it in their kernel, and it's been running on zillion of machines for a while and causes no harm and a slight benefit, so let's add it!" type of thing. It's not the only patch in this series that was like that, but for some reason this one is the one the triggered the debate, which is funny to me as this does have numbers in it showing that it is an improvement :)
thanks,
greg k-h
On Thu, May 24, 2018 at 12:06 PM Greg KH gregkh@linuxfoundation.org wrote:
On Thu, May 24, 2018 at 10:27:59AM -0700, Hugh Dickins wrote:
I think they are perfectly fine in SLES: folding in good work is a part of what distros are about.
And it's also what stable is for. We have had backports of performance improvements in the past, along with lots of other things over the years. This is a performance improvement. A tiny one, yes, but getting rid of a lock is a good thing, and I picked it up as part of my review of what a distro decided was worth adding for their users, as that's a huge signal that might be of value to others.
But I cannot find anything in stable-kernel-rules.rst that would admit them - perhaps that's just out of date?
Nope, that's the list I use to say "no" to. You can't describe everything in that file, it's a judgement call.
If -stable is to be a compendium of "this looks nice, you might like to include it", so be it: but the rules should then be updated.
This is a "a bunch of people I trust took it in their kernel, and it's been running on zillion of machines for a while and causes no harm and a slight benefit, so let's add it!" type of thing. It's not the only patch in this series that was like that, but for some reason this one is the one the triggered the debate, which is funny to me as this does have numbers in it showing that it is an improvement :)
Thank you for looking after the -stable trees: please let me not waste your time any further. I have no specific objection to the two patches, which are certainly not egregious offenders. But I do still find the disconnect between stable-kernel-rules.rst and reality confusing - or perhaps I just find reality confusing :)
Hugh
Hi!
I've no wish to be disputatious, but it does seem that the definition of "stable" has changed, and not necessarily for the better, if it's now a home for small gains: I thought we left those to upstream.
This is in the SLES kernel for a reason, and again, it's in the section that says "this should be pushed to stable". So if it's good enough for the SLES kernel, why isn't it good enough for all users of this kernel tree?
If you all think it should be dropped in both places, that's fine with me :)
I think they are perfectly fine in SLES: folding in good work is a part of what distros are about.
And it's also what stable is for. We have had backports of performance improvements in the past, along with lots of other things over the years. This is a performance improvement. A tiny one, yes, but getting rid of a lock is a good thing, and I picked it up as part of my review of what a distro decided was worth adding for their users, as that's a huge signal that might be of value to others.
But I cannot find anything in stable-kernel-rules.rst that would admit them - perhaps that's just out of date?
Nope, that's the list I use to say "no" to. You can't describe everything in that file, it's a judgement call.
Well, it would be good if the documentation matched reality, because other people use the documentation to decide, too.
For example, documentation says bug has to be fixed in mainline, but in actual practice you try to have exactly the same patch.
Pavel
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Waiman Long Waiman.Long@hpe.com
commit c7be96af89d4b53211862d8599b2430e8900ed92 upstream.
When running certain database workload on a high-end system with many CPUs, it was found that spinlock contention in the sigprocmask syscalls became a significant portion of the overall CPU cycles as shown below.
  9.30%  9.30%  905387  dataserver  /proc/kcore  0x7fff8163f4d2  [k] _raw_spin_lock_irq
          |
          ---_raw_spin_lock_irq
             |
             |--99.34%-- __set_current_blocked
             |          sigprocmask
             |          sys_rt_sigprocmask
             |          system_call_fastpath
             |          |
             |          |--50.63%-- __swapcontext
             |          |          |
             |          |          |--99.91%-- upsleepgeneric
             |          |
             |          |--49.36%-- __setcontext
             |                     ktskRun
Looking further into the swapcontext function in glibc, it was found that the function always calls sigprocmask() without checking if there are changes in the signal mask.
A check was added to the __set_current_blocked() function to avoid taking the sighand->siglock spinlock if there is no change in the signal mask. This will prevent unneeded spinlock contention when many threads are trying to call sigprocmask().
With this patch applied, the spinlock contention in sigprocmask() was gone.
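As a rough userspace illustration (an assumed test program, not part of the patch), this is the kind of call pattern glibc's swapcontext() generates: every rt_sigprocmask() call passes a mask identical to the current one, and before this change each of those calls still took sighand->siglock.

#include <pthread.h>
#include <signal.h>

int main(void)
{
	sigset_t set;

	pthread_sigmask(SIG_SETMASK, NULL, &set);	/* read the current mask */
	for (int i = 0; i < 1000000; i++)
		/* identical mask every iteration: nothing actually changes */
		pthread_sigmask(SIG_SETMASK, &set, NULL);
	return 0;
}

With sigequalsets() in place (see the hunk below), the kernel returns before touching the lock for every one of those redundant calls.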
Link: http://lkml.kernel.org/r/1474979209-11867-1-git-send-email-Waiman.Long@hpe.c... Signed-off-by: Waiman Long Waiman.Long@hpe.com Acked-by: Oleg Nesterov oleg@redhat.com Cc: Ingo Molnar mingo@kernel.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Stas Sergeev stsp@list.ru Cc: Scott J Norton scott.norton@hpe.com Cc: Douglas Hatch doug.hatch@hpe.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Mel Gorman mgorman@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/signal.h | 17 +++++++++++++++++ kernel/signal.c | 7 +++++++ 2 files changed, 24 insertions(+)
--- a/include/linux/signal.h +++ b/include/linux/signal.h @@ -97,6 +97,23 @@ static inline int sigisemptyset(sigset_t } }
+static inline int sigequalsets(const sigset_t *set1, const sigset_t *set2) +{ + switch (_NSIG_WORDS) { + case 4: + return (set1->sig[3] == set2->sig[3]) && + (set1->sig[2] == set2->sig[2]) && + (set1->sig[1] == set2->sig[1]) && + (set1->sig[0] == set2->sig[0]); + case 2: + return (set1->sig[1] == set2->sig[1]) && + (set1->sig[0] == set2->sig[0]); + case 1: + return set1->sig[0] == set2->sig[0]; + } + return 0; +} + #define sigmask(sig) (1UL << ((sig) - 1))
#ifndef __HAVE_ARCH_SIG_SETOPS --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2495,6 +2495,13 @@ void __set_current_blocked(const sigset_ { struct task_struct *tsk = current;
+ /* + * In case the signal mask hasn't changed, there is nothing we need + * to do. The current->blocked shouldn't be modified by other task. + */ + if (sigequalsets(&tsk->blocked, newset)) + return; + spin_lock_irq(&tsk->sighand->siglock); __set_task_blocked(tsk, newset); spin_unlock_irq(&tsk->sighand->siglock);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com
commit 7791e4aa59ad724e0b4c8b4dea547a5735108972 upstream.
If the processor supports HWP, enable it by default without checking for the cpu model. This will allow HWP to be enabled on all supported processors without a driver change.
Signed-off-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Thomas Renninger trenn@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/cpufreq/intel_pstate.c | 34 ++++++++++++++++++++++------------ 1 file changed, 22 insertions(+), 12 deletions(-)
--- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -1361,6 +1361,11 @@ static inline bool intel_pstate_platform static inline bool intel_pstate_has_acpi_ppc(void) { return false; } #endif /* CONFIG_ACPI */
+static const struct x86_cpu_id hwp_support_ids[] __initconst = { + { X86_VENDOR_INTEL, 6, X86_MODEL_ANY, X86_FEATURE_HWP }, + {} +}; + static int __init intel_pstate_init(void) { int cpu, rc = 0; @@ -1370,17 +1375,16 @@ static int __init intel_pstate_init(void if (no_load) return -ENODEV;
+ if (x86_match_cpu(hwp_support_ids) && !no_hwp) { + copy_cpu_funcs(&core_params.funcs); + hwp_active++; + goto hwp_cpu_matched; + } + id = x86_match_cpu(intel_pstate_cpu_ids); if (!id) return -ENODEV;
- /* - * The Intel pstate driver will be ignored if the platform - * firmware has its own power management modes. - */ - if (intel_pstate_platform_pwr_mgmt_exists()) - return -ENODEV; - cpu_def = (struct cpu_defaults *)id->driver_data;
copy_pid_params(&cpu_def->pid_policy); @@ -1389,17 +1393,20 @@ static int __init intel_pstate_init(void if (intel_pstate_msrs_not_valid()) return -ENODEV;
+hwp_cpu_matched: + /* + * The Intel pstate driver will be ignored if the platform + * firmware has its own power management modes. + */ + if (intel_pstate_platform_pwr_mgmt_exists()) + return -ENODEV; + pr_info("Intel P-state driver initializing.\n");
all_cpu_data = vzalloc(sizeof(void *) * num_possible_cpus()); if (!all_cpu_data) return -ENOMEM;
- if (static_cpu_has_safe(X86_FEATURE_HWP) && !no_hwp) { - pr_info("intel_pstate: HWP enabled\n"); - hwp_active++; - } - if (!hwp_active && hwp_only) goto out;
@@ -1410,6 +1417,9 @@ static int __init intel_pstate_init(void intel_pstate_debug_expose_params(); intel_pstate_sysfs_expose_params();
+ if (hwp_active) + pr_info("intel_pstate: HWP enabled\n"); + return rc; out: get_online_cpus();
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 45dd9b0666a162f8e4be76096716670cf1741f0e upstream.
Doing an audit of trace events, I discovered two trace events in the xen subsystem that use a hack to create zero data size trace events. This is not what trace events are for. Trace events add memory footprint overhead, and if all you need to do is see if a function is hit or not, simply make that function noinline and use function tracer filtering.
Worse yet, the hack used was:
__array(char, x, 0)
Which creates a static string of zero length. There are assumptions in ftrace that such a construct is a dynamic, nul-terminated string. This is not the case with these tracepoints and can cause problems in various parts of ftrace.
Nuke the trace events!
Link: http://lkml.kernel.org/r/20180509144605.5a220327@gandalf.local.home
Cc: stable@vger.kernel.org Fixes: 95a7d76897c1e ("xen/mmu: Use Xen specific TLB flush instead of the generic one.") Reviewed-by: Juergen Gross jgross@suse.com Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/xen/mmu.c | 4 ---- include/trace/events/xen.h | 16 ---------------- 2 files changed, 20 deletions(-)
--- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -1316,8 +1316,6 @@ void xen_flush_tlb_all(void) struct mmuext_op *op; struct multicall_space mcs;
- trace_xen_mmu_flush_tlb_all(0); - preempt_disable();
mcs = xen_mc_entry(sizeof(*op)); @@ -1335,8 +1333,6 @@ static void xen_flush_tlb(void) struct mmuext_op *op; struct multicall_space mcs;
- trace_xen_mmu_flush_tlb(0); - preempt_disable();
mcs = xen_mc_entry(sizeof(*op)); --- a/include/trace/events/xen.h +++ b/include/trace/events/xen.h @@ -377,22 +377,6 @@ DECLARE_EVENT_CLASS(xen_mmu_pgd, DEFINE_XEN_MMU_PGD_EVENT(xen_mmu_pgd_pin); DEFINE_XEN_MMU_PGD_EVENT(xen_mmu_pgd_unpin);
-TRACE_EVENT(xen_mmu_flush_tlb_all, - TP_PROTO(int x), - TP_ARGS(x), - TP_STRUCT__entry(__array(char, x, 0)), - TP_fast_assign((void)x), - TP_printk("%s", "") - ); - -TRACE_EVENT(xen_mmu_flush_tlb, - TP_PROTO(int x), - TP_ARGS(x), - TP_STRUCT__entry(__array(char, x, 0)), - TP_fast_assign((void)x), - TP_printk("%s", "") - ); - TRACE_EVENT(xen_mmu_flush_tlb_single, TP_PROTO(unsigned long addr), TP_ARGS(addr),
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mateusz Guzik mguzik@redhat.com
commit a3b609ef9f8b1dbfe97034ccad6cd3fe71fbe7ab upstream.
Only functions doing more than one read are modified. Consumers happened to deal with possibly changing data, but it does not seem like a good thing to rely on.
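A minimal sketch of the pattern being applied (hypothetical helper name; the real hunks below inline it): take one snapshot of the range under mmap_sem and use only the snapshot afterwards, instead of re-reading mm fields that another thread may change concurrently (e.g. via prctl(PR_SET_MM)).

#include <linux/mm_types.h>
#include <linux/rwsem.h>

/* Hypothetical helper: copy env_start/env_end once, under the lock. */
static void snapshot_env_range(struct mm_struct *mm,
			       unsigned long *start, unsigned long *end)
{
	down_read(&mm->mmap_sem);
	*start = mm->env_start;
	*end = mm->env_end;
	up_read(&mm->mmap_sem);
}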
Signed-off-by: Mateusz Guzik mguzik@redhat.com Acked-by: Cyrill Gorcunov gorcunov@openvz.org Cc: Alexey Dobriyan adobriyan@gmail.com Cc: Jarod Wilson jarod@redhat.com Cc: Jan Stancek jstancek@redhat.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Anshuman Khandual anshuman.linux@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/proc/base.c | 13 ++++++++++--- mm/util.c | 16 ++++++++++++---- 2 files changed, 22 insertions(+), 7 deletions(-)
--- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -953,6 +953,7 @@ static ssize_t environ_read(struct file unsigned long src = *ppos; int ret = 0; struct mm_struct *mm = file->private_data; + unsigned long env_start, env_end;
/* Ensure the process spawned far enough to have an environment. */ if (!mm || !mm->env_end) @@ -965,19 +966,25 @@ static ssize_t environ_read(struct file ret = 0; if (!atomic_inc_not_zero(&mm->mm_users)) goto free; + + down_read(&mm->mmap_sem); + env_start = mm->env_start; + env_end = mm->env_end; + up_read(&mm->mmap_sem); + while (count > 0) { size_t this_len, max_len; int retval;
- if (src >= (mm->env_end - mm->env_start)) + if (src >= (env_end - env_start)) break;
- this_len = mm->env_end - (mm->env_start + src); + this_len = env_end - (env_start + src);
max_len = min_t(size_t, PAGE_SIZE, count); this_len = min(max_len, this_len);
- retval = access_remote_vm(mm, (mm->env_start + src), + retval = access_remote_vm(mm, (env_start + src), page, this_len, 0);
if (retval <= 0) { --- a/mm/util.c +++ b/mm/util.c @@ -428,17 +428,25 @@ int get_cmdline(struct task_struct *task int res = 0; unsigned int len; struct mm_struct *mm = get_task_mm(task); + unsigned long arg_start, arg_end, env_start, env_end; if (!mm) goto out; if (!mm->arg_end) goto out_mm; /* Shh! No looking before we're done */
- len = mm->arg_end - mm->arg_start; + down_read(&mm->mmap_sem); + arg_start = mm->arg_start; + arg_end = mm->arg_end; + env_start = mm->env_start; + env_end = mm->env_end; + up_read(&mm->mmap_sem); + + len = arg_end - arg_start;
if (len > buflen) len = buflen;
- res = access_process_vm(task, mm->arg_start, buffer, len, 0); + res = access_process_vm(task, arg_start, buffer, len, 0);
/* * If the nul at the end of args has been overwritten, then @@ -449,10 +457,10 @@ int get_cmdline(struct task_struct *task if (len < res) { res = len; } else { - len = mm->env_end - mm->env_start; + len = env_end - env_start; if (len > buflen - res) len = buflen - res; - res += access_process_vm(task, mm->env_start, + res += access_process_vm(task, env_start, buffer+res, len, 0); res = strnlen(buffer, res); }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Janis Danisevskis jdanis@google.com
commit 1b3044e39a89cb1d4d5313da477e8dfea2b5232d upstream.
The PR_DUMPABLE flag causes the pid related paths of the proc file system to be owned by ROOT.
The implementation of pthread_set/getname_np however needs access to /proc/<pid>/task/<tid>/comm. If PR_DUMPABLE is false this implementation is locked out.
This patch installs a special permission function for the file "comm" that grants read and write access to all threads of the same group regardless of the ownership of the inode. For all other threads the function falls back to the generic inode permission check.
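A small userspace illustration (an assumed program, relying on glibc routing pthread_set/getname_np() through /proc/self/task/<tid>/comm): once the process clears its dumpable flag, these calls on a sibling thread may fail with EACCES for an unprivileged process without the permission hook added below.

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/prctl.h>

static void *worker(void *arg)
{
	(void)arg;
	pause();			/* just keep the thread alive */
	return NULL;
}

int main(void)
{
	pthread_t t;
	char name[16];

	prctl(PR_SET_DUMPABLE, 0);	/* /proc/<pid>/* now owned by root */
	pthread_create(&t, NULL, worker, NULL);

	/* glibc implements these via /proc/self/task/<tid>/comm */
	pthread_setname_np(t, "worker");
	pthread_getname_np(t, name, sizeof(name));
	printf("sibling thread name: %s\n", name);
	return 0;
}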
[akpm@linux-foundation.org: fix spello in comment] Signed-off-by: Janis Danisevskis jdanis@google.com Acked-by: Kees Cook keescook@chromium.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Cyrill Gorcunov gorcunov@openvz.org Cc: Alexey Dobriyan adobriyan@gmail.com Cc: Colin Ian King colin.king@canonical.com Cc: David Rientjes rientjes@google.com Cc: Minfei Huang mnfhuang@gmail.com Cc: John Stultz john.stultz@linaro.org Cc: Calvin Owens calvinowens@fb.com Cc: Jann Horn jann@thejh.net Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/proc/base.c | 42 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 41 insertions(+), 1 deletion(-)
--- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3083,6 +3083,44 @@ int proc_pid_readdir(struct file *file, }
/* + * proc_tid_comm_permission is a special permission function exclusively + * used for the node /proc/<pid>/task/<tid>/comm. + * It bypasses generic permission checks in the case where a task of the same + * task group attempts to access the node. + * The rationale behind this is that glibc and bionic access this node for + * cross thread naming (pthread_set/getname_np(!self)). However, if + * PR_SET_DUMPABLE gets set to 0 this node among others becomes uid=0 gid=0, + * which locks out the cross thread naming implementation. + * This function makes sure that the node is always accessible for members of + * same thread group. + */ +static int proc_tid_comm_permission(struct inode *inode, int mask) +{ + bool is_same_tgroup; + struct task_struct *task; + + task = get_proc_task(inode); + if (!task) + return -ESRCH; + is_same_tgroup = same_thread_group(current, task); + put_task_struct(task); + + if (likely(is_same_tgroup && !(mask & MAY_EXEC))) { + /* This file (/proc/<pid>/task/<tid>/comm) can always be + * read or written by the members of the corresponding + * thread group. + */ + return 0; + } + + return generic_permission(inode, mask); +} + +static const struct inode_operations proc_tid_comm_inode_operations = { + .permission = proc_tid_comm_permission, +}; + +/* * Tasks */ static const struct pid_entry tid_base_stuff[] = { @@ -3100,7 +3138,9 @@ static const struct pid_entry tid_base_s #ifdef CONFIG_SCHED_DEBUG REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations), #endif - REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations), + NOD("comm", S_IFREG|S_IRUGO|S_IWUSR, + &proc_tid_comm_inode_operations, + &proc_pid_set_comm_operations, {}), #ifdef CONFIG_HAVE_ARCH_TRACEHOOK ONE("syscall", S_IRUSR, proc_pid_syscall), #endif
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicholas Piggin npiggin@gmail.com
commit c1d2a31397ec51f0370f6bd17b19b39152c263cb upstream.
Similarly to opal_event_shutdown, opal_nvram_write can be called in the crash path with irqs disabled. Special case the delay to avoid sleeping in invalid context.
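The fix boils down to the following pattern, shown here as a hypothetical helper (the actual diff open-codes it in both OPAL_BUSY branches): busy-wait when sleeping is illegal, sleep otherwise.

#include <linux/delay.h>
#include <linux/hardirq.h>
#include <linux/irqflags.h>

/* Hypothetical helper: delay without sleeping if we are in the crash
 * path with interrupts off, where msleep() would be invalid. */
static void opal_busy_delay(unsigned int ms)
{
	if (in_interrupt() || irqs_disabled())
		mdelay(ms);	/* spin: the scheduler cannot be used here */
	else
		msleep(ms);	/* normal context: really sleep */
}

mdelay() burns CPU for the whole interval, which is tolerable here because the panic path cannot schedule anyway.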
Fixes: 3b8070335f75 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops") Cc: stable@vger.kernel.org # v3.2 Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/platforms/powernv/opal-nvram.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
--- a/arch/powerpc/platforms/powernv/opal-nvram.c +++ b/arch/powerpc/platforms/powernv/opal-nvram.c @@ -44,6 +44,10 @@ static ssize_t opal_nvram_read(char *buf return count; }
+/* + * This can be called in the panic path with interrupts off, so use + * mdelay in that case. + */ static ssize_t opal_nvram_write(char *buf, size_t count, loff_t *index) { s64 rc = OPAL_BUSY; @@ -58,10 +62,16 @@ static ssize_t opal_nvram_write(char *bu while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) { rc = opal_write_nvram(__pa(buf), count, off); if (rc == OPAL_BUSY_EVENT) { - msleep(OPAL_BUSY_DELAY_MS); + if (in_interrupt() || irqs_disabled()) + mdelay(OPAL_BUSY_DELAY_MS); + else + msleep(OPAL_BUSY_DELAY_MS); opal_poll_events(NULL); } else if (rc == OPAL_BUSY) { - msleep(OPAL_BUSY_DELAY_MS); + if (in_interrupt() || irqs_disabled()) + mdelay(OPAL_BUSY_DELAY_MS); + else + msleep(OPAL_BUSY_DELAY_MS); } }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Tatashin pasha.tatashin@oracle.com
commit ab1e8d8960b68f54af42b6484b5950bd13a4054b upstream.
It is unsafe to do virtual to physical translations before mm_init() is called if struct page is needed in order to determine the memory section number (see SECTION_IN_PAGE_FLAGS). This is because only in mm_init() we initialize struct pages for all the allocated memory when deferred struct pages are used.
My recent fix in commit c9e97a1997 ("mm: initialize pages on demand during boot") exposed this problem, because it greatly reduced the number of pages that are initialized before mm_init(), but the problem existed even before my fix, as Fengguang Wu found.
Below is a more detailed explanation of the problem.
We initialize struct pages in four places:
1. Early in boot a small set of struct pages is initialized to fill the first section, and lower zones.
2. During mm_init() we initialize "struct pages" for all the memory that is allocated, i.e reserved in memblock.
3. Using on-demand logic when pages are allocated after mm_init call (when memblock is finished)
4. After smp_init(), when the rest of the free deferred pages are initialized.
The problem occurs if we try to do a va-to-phys translation of memory between steps 1 and 2. Because we have not yet initialized struct pages for all the reserved pages, it is inherently unsafe to do va to phys if the translation itself requires access to "struct page", as in the case of this combination: CONFIG_SPARSE && !CONFIG_SPARSE_VMEMMAP
The following path exposes the problem:
start_kernel()
 trap_init()
  setup_cpu_entry_areas()
   setup_cpu_entry_area(cpu)
    get_cpu_gdt_paddr(cpu)
     per_cpu_ptr_to_phys(addr)
      pcpu_addr_to_page(addr)
       virt_to_page(addr)
        pfn_to_page(__pa(addr) >> PAGE_SHIFT)
We disable this path by not allowing NEED_PER_CPU_KM with deferred struct pages feature.
The problems are discussed in these threads: http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel.... http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.... http://lkml.kernel.org/r/20180426202619.2768-1-pasha.tatashin@oracle.com
Link: http://lkml.kernel.org/r/20180515175124.1770-1-pasha.tatashin@oracle.com Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set") Signed-off-by: Pavel Tatashin pasha.tatashin@oracle.com Acked-by: Michal Hocko mhocko@suse.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Cc: Steven Sistare steven.sistare@oracle.com Cc: Daniel Jordan daniel.m.jordan@oracle.com Cc: Mel Gorman mgorman@techsingularity.net Cc: Fengguang Wu fengguang.wu@intel.com Cc: Dennis Zhou dennisszhou@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/Kconfig | 1 + 1 file changed, 1 insertion(+)
--- a/mm/Kconfig +++ b/mm/Kconfig @@ -628,6 +628,7 @@ config DEFERRED_STRUCT_PAGE_INIT default n depends on ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT depends on MEMORY_HOTPLUG + depends on !NEED_PER_CPU_KM help Ordinarily all struct pages are initialised during early boot in a single thread. On very large machines this can take a considerable
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.ibm.com
commit e521813468f786271a87e78e8644243bead48fad upstream.
Ever since CQ/QAOB support was added, calling qdio_free() straight after qdio_alloc() results in qdio_release_memory() accessing uninitialized memory (i.e. q->u.out.use_cq and q->u.out.aobs), followed by a kmem_cache_free() on the random AOB addresses.
For older kernels that don't have 6e30c549f6ca, the same applies if qdio_establish() fails in the DEV_STATE_ONLINE check.
While initializing q->u.out.use_cq would be enough to fix this particular bug, the more future-proof change is to just zero-alloc the whole struct.
Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks") Cc: stable@vger.kernel.org #v3.2+ Signed-off-by: Julian Wiedmann jwi@linux.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/s390/cio/qdio_setup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/s390/cio/qdio_setup.c +++ b/drivers/s390/cio/qdio_setup.c @@ -140,7 +140,7 @@ static int __qdio_allocate_qs(struct qdi int i;
for (i = 0; i < nr_queues; i++) { - q = kmem_cache_alloc(qdio_q_cache, GFP_KERNEL); + q = kmem_cache_zalloc(qdio_q_cache, GFP_KERNEL); if (!q) return -ENOMEM;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hendrik Brueckner brueckner@linux.ibm.com
commit 4bbaf2584b86b0772413edeac22ff448f36351b1 upstream.
Correct a trinity finding for the perf_event_open() system call with a perf event attribute structure that uses a frequency but has the sampling frequency set to zero. This causes an FP divide exception during the sample rate initialization for the hardware sampling facility.
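Roughly, the attribute combination the fuzzer generated looks like this from userspace (sketch only; the type/config values here are placeholders, and on s390 the event would actually target the cpum_sf sampling PMU):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;		/* placeholder event type */
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.freq = 1;				/* frequency mode requested... */
	attr.sample_freq = 0;			/* ...with a zero frequency */

	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0)
		perror("perf_event_open");	/* the s390 driver now returns -EINVAL here */
	return 0;
}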
Fixes: 8c069ff4bd606 ("s390/perf: add support for the CPU-Measurement Sampling Facility") Cc: stable@vger.kernel.org # 3.14+ Reviewed-by: Heiko Carstens heiko.carstens@de.ibm.com Signed-off-by: Hendrik Brueckner brueckner@linux.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/kernel/perf_cpum_sf.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/arch/s390/kernel/perf_cpum_sf.c +++ b/arch/s390/kernel/perf_cpum_sf.c @@ -744,6 +744,10 @@ static int __hw_perf_event_init(struct p */ rate = 0; if (attr->freq) { + if (!attr->sample_freq) { + err = -EINVAL; + goto out; + } rate = freq_to_sample_rate(&si, attr->sample_freq); rate = hw_limit_rate(&si, rate); attr->freq = 0;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.ibm.com
commit 2e68adcd2fb21b7188ba449f0fab3bee2910e500 upstream.
Calling qdio_release_memory() on error is just plain wrong. It frees the main qdio_irq struct while the following code still uses it.
Also, no other error path in qdio_establish() does this. So trust callers to clean up via qdio_free() if some step of the QDIO initialization fails.
Fixes: 779e6e1c724d ("[S390] qdio: new qdio driver.") Cc: stable@vger.kernel.org #v2.6.27+ Signed-off-by: Julian Wiedmann jwi@linux.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/s390/cio/qdio_setup.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
--- a/drivers/s390/cio/qdio_setup.c +++ b/drivers/s390/cio/qdio_setup.c @@ -456,7 +456,6 @@ int qdio_setup_irq(struct qdio_initializ { struct ciw *ciw; struct qdio_irq *irq_ptr = init_data->cdev->private->qdio_data; - int rc;
memset(&irq_ptr->qib, 0, sizeof(irq_ptr->qib)); memset(&irq_ptr->siga_flag, 0, sizeof(irq_ptr->siga_flag)); @@ -493,16 +492,14 @@ int qdio_setup_irq(struct qdio_initializ ciw = ccw_device_get_ciw(init_data->cdev, CIW_TYPE_EQUEUE); if (!ciw) { DBF_ERROR("%4x NO EQ", irq_ptr->schid.sch_no); - rc = -EINVAL; - goto out_err; + return -EINVAL; } irq_ptr->equeue = *ciw;
ciw = ccw_device_get_ciw(init_data->cdev, CIW_TYPE_AQUEUE); if (!ciw) { DBF_ERROR("%4x NO AQ", irq_ptr->schid.sch_no); - rc = -EINVAL; - goto out_err; + return -EINVAL; } irq_ptr->aqueue = *ciw;
@@ -510,9 +507,6 @@ int qdio_setup_irq(struct qdio_initializ irq_ptr->orig_handler = init_data->cdev->handler; init_data->cdev->handler = qdio_int_handler; return 0; -out_err: - qdio_release_memory(irq_ptr); - return rc; }
void qdio_print_subchannel_info(struct qdio_irq *irq_ptr,
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
commit 9f18fff63cfd6f559daa1eaae60640372c65f84b upstream.
The inline assembly to call __do_softirq on the irq stack uses an indirect branch. This can be replaced with a normal relative branch.
Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Reviewed-by: Hendrik Brueckner brueckner@linux.vnet.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/kernel/irq.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/arch/s390/kernel/irq.c +++ b/arch/s390/kernel/irq.c @@ -173,10 +173,9 @@ void do_softirq_own_stack(void) new -= STACK_FRAME_OVERHEAD; ((struct stack_frame *) new)->back_chain = old; asm volatile(" la 15,0(%0)\n" - " basr 14,%2\n" + " brasl 14,__do_softirq\n" " la 15,0(%1)\n" - : : "a" (new), "a" (old), - "a" (__do_softirq) + : : "a" (new), "a" (old) : "0", "1", "2", "3", "4", "5", "14", "cc", "memory" ); } else {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ard Biesheuvel ard.biesheuvel@linaro.org
commit 0b3225ab9407f557a8e20f23f37aa7236c10a9b1 upstream.
Mixed mode allows a kernel built for x86_64 to interact with 32-bit EFI firmware, but requires all struct definitions to be laid out carefully when it comes to pointer sizes.
'struct efi_pci_io_protocol_32' currently uses a 'void *' for the 'romimage' field, which will be interpreted as a 64-bit field on such kernels, potentially resulting in bogus memory references and subsequent crashes.
Tested-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Cc: stable@vger.kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Matt Fleming matt@codeblueprint.co.uk Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20180504060003.19618-13-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/boot/compressed/eboot.c | 6 ++++-- include/linux/efi.h | 8 ++++---- 2 files changed, 8 insertions(+), 6 deletions(-)
--- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -364,7 +364,8 @@ __setup_efi_pci32(efi_pci_io_protocol_32 if (status != EFI_SUCCESS) goto free_struct;
- memcpy(rom->romdata, pci->romimage, pci->romsize); + memcpy(rom->romdata, (void *)(unsigned long)pci->romimage, + pci->romsize); return status;
free_struct: @@ -470,7 +471,8 @@ __setup_efi_pci64(efi_pci_io_protocol_64 if (status != EFI_SUCCESS) goto free_struct;
- memcpy(rom->romdata, pci->romimage, pci->romsize); + memcpy(rom->romdata, (void *)(unsigned long)pci->romimage, + pci->romsize); return status;
free_struct: --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -364,8 +364,8 @@ typedef struct { u32 attributes; u32 get_bar_attributes; u32 set_bar_attributes; - uint64_t romsize; - void *romimage; + u64 romsize; + u32 romimage; } efi_pci_io_protocol_32;
typedef struct { @@ -384,8 +384,8 @@ typedef struct { u64 attributes; u64 get_bar_attributes; u64 set_bar_attributes; - uint64_t romsize; - void *romimage; + u64 romsize; + u64 romimage; } efi_pci_io_protocol_64;
typedef struct {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu mhiramat@kernel.org
commit eb0146daefdde65665b7f076fbff7b49dade95b9 upstream.
Prohibit kprobes on do_undefinstr because kprobes on ARM is implemented via the undefined instruction. This means that if we probe do_undefinstr(), it can cause an infinite recursive exception.
Fixes: 24ba613c9d6c ("ARM kprobes: core code") Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/kernel/traps.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/arm/kernel/traps.c +++ b/arch/arm/kernel/traps.c @@ -19,6 +19,7 @@ #include <linux/uaccess.h> #include <linux/hardirq.h> #include <linux/kdebug.h> +#include <linux/kprobes.h> #include <linux/module.h> #include <linux/kexec.h> #include <linux/bug.h> @@ -395,7 +396,8 @@ void unregister_undef_hook(struct undef_ raw_spin_unlock_irqrestore(&undef_lock, flags); }
-static int call_undef_hook(struct pt_regs *regs, unsigned int instr) +static nokprobe_inline +int call_undef_hook(struct pt_regs *regs, unsigned int instr) { struct undef_hook *hook; unsigned long flags; @@ -468,6 +470,7 @@ die_sig:
arm_notify_die("Oops - undefined instruction", regs, &info, 0, 6); } +NOKPROBE_SYMBOL(do_undefinstr)
/* * Handle FIQ similarly to NMI on x86 systems.
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dexuan Cui decui@microsoft.com
commit 5596fe34495cf0f645f417eb928ef224df3e3cb4 upstream.
for_each_cpu() unintuitively reports CPU0 as set independent of the actual cpumask content on UP kernels. This causes an unexpected PIT interrupt storm on a UP kernel running in an SMP virtual machine on Hyper-V, and as a result, the virtual machine can suffer from a strange random delay of 1~20 minutes during boot-up, and sometimes it can hang forever.
Protect it by checking whether the cpumask is empty before entering the for_each_cpu() loop.
[ tglx: Use !IS_ENABLED(CONFIG_SMP) instead of #ifdeffery ]
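The unintuitive part comes from the UP stub of the iterator itself, which (approximately, quoting the NR_CPUS == 1 variant from include/linux/cpumask.h) ignores the mask argument entirely and always visits CPU 0:

/* UP (NR_CPUS == 1) definition, approximately: the mask is only
 * evaluated for its side effects and never tested, so the loop body
 * runs once for CPU 0 even when the mask is empty. */
#define for_each_cpu(cpu, mask)			\
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)

Hence the explicit cpumask_empty() check in the hunk below, compiled in only for !CONFIG_SMP.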
Signed-off-by: Dexuan Cui decui@microsoft.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Josh Poulson jopoulso@microsoft.com Cc: "Michael Kelley (EOSG)" Michael.H.Kelley@microsoft.com Cc: Peter Zijlstra peterz@infradead.org Cc: Frederic Weisbecker fweisbec@gmail.com Cc: stable@vger.kernel.org Cc: Rakib Mullick rakib.mullick@gmail.com Cc: Jork Loeser Jork.Loeser@microsoft.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Andrew Morton akpm@linux-foundation.org Cc: KY Srinivasan kys@microsoft.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Alexey Dobriyan adobriyan@gmail.com Cc: Dmitry Vyukov dvyukov@google.com Link: https://lkml.kernel.org/r/KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301... Link: https://lkml.kernel.org/r/KL1P15301MB0006FA63BC22BEB64902EAA0BF930@KL1P15301... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/time/tick-broadcast.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/kernel/time/tick-broadcast.c +++ b/kernel/time/tick-broadcast.c @@ -610,6 +610,14 @@ static void tick_handle_oneshot_broadcas now = ktime_get(); /* Find all expired events */ for_each_cpu(cpu, tick_broadcast_oneshot_mask) { + /* + * Required for !SMP because for_each_cpu() reports + * unconditionally CPU0 as set on UP kernels. + */ + if (!IS_ENABLED(CONFIG_SMP) && + cpumask_empty(tick_broadcast_oneshot_mask)) + break; + td = &per_cpu(tick_cpu_device, cpu); if (td->evtdev->next_event.tv64 <= now.tv64) { cpumask_set_cpu(cpu, tmpmask);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu mhiramat@kernel.org
commit 69af7e23a6870df2ea6fa79ca16493d59b3eebeb upstream.
Since get_kprobe_ctlblk() uses smp_processor_id() to access a per-cpu variable, it trips the smp_processor_id() sanity check shown below.
[ 7.006928] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 [ 7.007859] caller is debug_smp_processor_id+0x20/0x24 [ 7.008438] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00192-g4eb17253e4b5 #1 [ 7.008890] Hardware name: Generic DT based system [ 7.009917] [<c0313f0c>] (unwind_backtrace) from [<c030e6d8>] (show_stack+0x20/0x24) [ 7.010473] [<c030e6d8>] (show_stack) from [<c0c64694>] (dump_stack+0x84/0x98) [ 7.010990] [<c0c64694>] (dump_stack) from [<c071ca5c>] (check_preemption_disabled+0x138/0x13c) [ 7.011592] [<c071ca5c>] (check_preemption_disabled) from [<c071ca80>] (debug_smp_processor_id+0x20/0x24) [ 7.012214] [<c071ca80>] (debug_smp_processor_id) from [<c03335e0>] (optimized_callback+0x2c/0xe4) [ 7.013077] [<c03335e0>] (optimized_callback) from [<bf0021b0>] (0xbf0021b0)
To fix this issue, call get_kprobe_ctlblk() right after interrupts are disabled, since that also disables preemption.
Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32") Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/probes/kprobes/opt-arm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arm/probes/kprobes/opt-arm.c +++ b/arch/arm/probes/kprobes/opt-arm.c @@ -165,13 +165,14 @@ optimized_callback(struct optimized_kpro { unsigned long flags; struct kprobe *p = &op->kp; - struct kprobe_ctlblk *kcb = get_kprobe_ctlblk(); + struct kprobe_ctlblk *kcb;
/* Save skipped registers */ regs->ARM_pc = (unsigned long)op->kp.addr; regs->ARM_ORIG_r0 = ~0UL;
local_irq_save(flags); + kcb = get_kprobe_ctlblk();
if (kprobe_running()) { kprobes_inc_nmissed_count(&op->kp);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu mhiramat@kernel.org
commit 70948c05fdde0aac32f9667856a88725c192fa40 upstream.
Prohibit probing on optimized_callback() because it is called from kprobes itself. If we put a kprobe on it, that will cause a recursive call loop. Mark it NOKPROBE_SYMBOL.
Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32") Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/probes/kprobes/opt-arm.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/arm/probes/kprobes/opt-arm.c +++ b/arch/arm/probes/kprobes/opt-arm.c @@ -192,6 +192,7 @@ optimized_callback(struct optimized_kpro
local_irq_restore(flags); } +NOKPROBE_SYMBOL(optimized_callback)
int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *orig) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu mhiramat@kernel.org
commit 0d73c3f8e7f6ee2aab1bb350f60c180f5ae21a2c upstream.
Since do_undefinstr() uses get_user to fetch the undefined instruction, it can be called before kprobes performs its recursion check. This can cause an infinite recursive exception. Prohibit probing on the get_user functions.
Fixes: 24ba613c9d6c ("ARM kprobes: core code") Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/include/asm/assembler.h | 10 ++++++++++ arch/arm/lib/getuser.S | 10 ++++++++++ 2 files changed, 20 insertions(+)
--- a/arch/arm/include/asm/assembler.h +++ b/arch/arm/include/asm/assembler.h @@ -530,4 +530,14 @@ THUMB( orr \reg , \reg , #PSR_T_BIT ) #endif .endm
+#ifdef CONFIG_KPROBES +#define _ASM_NOKPROBE(entry) \ + .pushsection "_kprobe_blacklist", "aw" ; \ + .balign 4 ; \ + .long entry; \ + .popsection +#else +#define _ASM_NOKPROBE(entry) +#endif + #endif /* __ASM_ASSEMBLER_H__ */ --- a/arch/arm/lib/getuser.S +++ b/arch/arm/lib/getuser.S @@ -38,6 +38,7 @@ ENTRY(__get_user_1) mov r0, #0 ret lr ENDPROC(__get_user_1) +_ASM_NOKPROBE(__get_user_1)
ENTRY(__get_user_2) check_uaccess r0, 2, r1, r2, __get_user_bad @@ -58,6 +59,7 @@ rb .req r0 mov r0, #0 ret lr ENDPROC(__get_user_2) +_ASM_NOKPROBE(__get_user_2)
ENTRY(__get_user_4) check_uaccess r0, 4, r1, r2, __get_user_bad @@ -65,6 +67,7 @@ ENTRY(__get_user_4) mov r0, #0 ret lr ENDPROC(__get_user_4) +_ASM_NOKPROBE(__get_user_4)
ENTRY(__get_user_8) check_uaccess r0, 8, r1, r2, __get_user_bad8 @@ -78,6 +81,7 @@ ENTRY(__get_user_8) mov r0, #0 ret lr ENDPROC(__get_user_8) +_ASM_NOKPROBE(__get_user_8)
#ifdef __ARMEB__ ENTRY(__get_user_32t_8) @@ -91,6 +95,7 @@ ENTRY(__get_user_32t_8) mov r0, #0 ret lr ENDPROC(__get_user_32t_8) +_ASM_NOKPROBE(__get_user_32t_8)
ENTRY(__get_user_64t_1) check_uaccess r0, 1, r1, r2, __get_user_bad8 @@ -98,6 +103,7 @@ ENTRY(__get_user_64t_1) mov r0, #0 ret lr ENDPROC(__get_user_64t_1) +_ASM_NOKPROBE(__get_user_64t_1)
ENTRY(__get_user_64t_2) check_uaccess r0, 2, r1, r2, __get_user_bad8 @@ -114,6 +120,7 @@ rb .req r0 mov r0, #0 ret lr ENDPROC(__get_user_64t_2) +_ASM_NOKPROBE(__get_user_64t_2)
ENTRY(__get_user_64t_4) check_uaccess r0, 4, r1, r2, __get_user_bad8 @@ -121,6 +128,7 @@ ENTRY(__get_user_64t_4) mov r0, #0 ret lr ENDPROC(__get_user_64t_4) +_ASM_NOKPROBE(__get_user_64t_4) #endif
__get_user_bad8: @@ -131,6 +139,8 @@ __get_user_bad: ret lr ENDPROC(__get_user_bad) ENDPROC(__get_user_bad8) +_ASM_NOKPROBE(__get_user_bad) +_ASM_NOKPROBE(__get_user_bad8)
.pushsection __ex_table, "a" .long 1b, __get_user_bad
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit 9a8fca62aacc1599fea8e813d01e1955513e4fad upstream.
If a file has xattrs and we fsync it (to ensure the flags BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING are cleared from its inode), the current transaction commits, and we then fsync it again (with neither of those bits set in its inode), we end up not logging all of its xattrs. This results in deleting all xattrs when replaying the log after a power failure.
Trivial reproducer
$ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt
$ touch /mnt/foobar $ setfattr -n user.xa -v qwerty /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar
$ sync
$ xfs_io -c "pwrite -S 0xab 0 64K" /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar <power failure>
$ mount /dev/sdb /mnt $ getfattr --absolute-names --dump /mnt/foobar <empty output> $
So fix this by making sure all xattrs are logged when we log a file's inode item and neither of the flags BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING is set in the inode.
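As a rough illustration of the flag-based fallback the diff below adds (toy names, not btrfs code), the idea is to remember whether the xattrs were already copied on the fast path and, if the inode item gets logged without them, copy them at that point:

    #include <stdbool.h>
    #include <stdio.h>

    struct toy_log_ctx {
            bool xattrs_logged;
            bool need_log_inode_item;
    };

    static void log_all_xattrs(struct toy_log_ctx *ctx)
    {
            printf("copying xattrs into the log\n");
            ctx->xattrs_logged = true;
    }

    /* fast_path_copied_xattrs models whether the item-copy loop already
     * reached the xattr items for this inode. */
    static void log_inode(struct toy_log_ctx *ctx, bool fast_path_copied_xattrs)
    {
            if (fast_path_copied_xattrs)
                    log_all_xattrs(ctx);

            if (ctx->need_log_inode_item) {
                    printf("logging inode item\n");
                    if (!ctx->xattrs_logged)        /* the added fallback */
                            log_all_xattrs(ctx);
            }
    }

    int main(void)
    {
            struct toy_log_ctx ctx = { .need_log_inode_item = true };

            /* A fast fsync that skipped the xattr items used to lose them;
             * with the fallback they are still copied. */
            log_inode(&ctx, false);
            return 0;
    }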
Fixes: 36283bf777d9 ("Btrfs: fix fsync xattr loss in the fast fsync path") Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/tree-log.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -4568,6 +4568,7 @@ static int btrfs_log_inode(struct btrfs_ struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree; u64 logged_isize = 0; bool need_log_inode_item = true; + bool xattrs_logged = false;
path = btrfs_alloc_path(); if (!path) @@ -4808,6 +4809,7 @@ next_slot: err = btrfs_log_all_xattrs(trans, root, inode, path, dst_path); if (err) goto out_unlock; + xattrs_logged = true; if (max_key.type >= BTRFS_EXTENT_DATA_KEY && !fast_search) { btrfs_release_path(path); btrfs_release_path(dst_path); @@ -4820,6 +4822,11 @@ log_extents: btrfs_release_path(dst_path); if (need_log_inode_item) { err = log_inode_item(trans, log, dst_path, inode); + if (!err && !xattrs_logged) { + err = btrfs_log_all_xattrs(trans, root, inode, path, + dst_path); + btrfs_release_path(path); + } if (err) goto out_unlock; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anand Jain anand.jain@oracle.com
commit 02ee654d3a04563c67bfe658a05384548b9bb105 upstream.
We set the BTRFS_BALANCE_RESUME flag only in btrfs_recover_balance(), which isn't called during a remount. So when resuming from a paused balance we hit the bug:
kernel: kernel BUG at fs/btrfs/volumes.c:3890! :: kernel: balance_kthread+0x51/0x60 [btrfs] kernel: kthread+0x111/0x130 :: kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8
Reproducer: On a mounted filesystem:
btrfs balance start --full-balance /btrfs btrfs balance pause /btrfs mount -o remount,ro /dev/sdb /btrfs mount -o remount,rw /dev/sdb /btrfs
To fix this, set the BTRFS_BALANCE_RESUME flag in btrfs_resume_balance_async().
CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Anand Jain anand.jain@oracle.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/volumes.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3850,6 +3850,15 @@ int btrfs_resume_balance_async(struct bt return 0; }
+ /* + * A ro->rw remount sequence should continue with the paused balance + * regardless of who pauses it, system or the user as of now, so set + * the resume flag. + */ + spin_lock(&fs_info->balance_lock); + fs_info->balance_ctl->flags |= BTRFS_BALANCE_RESUME; + spin_unlock(&fs_info->balance_lock); + tsk = kthread_run(balance_kthread, fs_info, "btrfs-balance"); return PTR_ERR_OR_ZERO(tsk); }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liu Bo bo.liu@linux.alibaba.com
commit 02a3307aa9c20b4f6626255b028f07f6cfa16feb upstream.
If a btree block, a.k.a. an extent buffer, is not available in the extent buffer cache, it'll be read from disk instead, i.e.
btrfs_search_slot() read_block_for_search() # hold parent and its lock, go to read child btrfs_release_path() read_tree_block() # read child
Unfortunately, the parent lock is released before reading the child, so commit 5bdd3536cbbe ("Btrfs: Fix block generation verification race") used 0 as the parent transid to read the child block. This forces read_tree_block() not to check whether the parent transid differs from the generation id of the child it reads from disk.
A simple PoC is included in btrfs/124,
0. A two-disk raid1 btrfs,
1. Right after mkfs.btrfs, block A is allocated to be device tree's root.
2. Mount this filesystem and put it in use; after a while, the device tree's root gets COWed, but block A hasn't been allocated/overwritten yet.
3. Umount it and reload the btrfs module to remove both disks from the global @fs_devices list.
4. mount -odegraded dev1 and write some data, so now block A is allocated to be a leaf in checksum tree. Note that only dev1 has the latest metadata of this filesystem.
5. Umount it and mount it again normally (with both disks). Since raid1 can pick up one disk by the writer task's pid, if btrfs_search_slot() needs to read block A, dev2, which does NOT have the latest metadata, might be read for block A, and then we get a stale block A.
6. As the parent transid is not checked, block A is marked as uptodate and put into the extent buffer cache, so future searches won't bother to read the disk again, which means changes will be made on this stale block, making it dirty and flushing it onto disk.
To avoid the problem, parent transid needs to be passed to read_tree_block().
In order to get a valid parent transid, we need to hold the parent's lock until we finish reading the child.
This patch needs to be slightly adapted for stable kernels: the &first_key parameter added to read_tree_block() is from 4.16+ (581c1760415c4). The fix is to replace 0 with 'gen'.
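To make the role of the parent transid concrete, here is a small self-contained sketch (toy types, not btrfs code) of the check that passing 'gen' instead of 0 re-enables: a mirror whose generation does not match the one recorded in the parent is rejected rather than cached:

    #include <stdbool.h>
    #include <stdio.h>

    struct toy_block {
            unsigned long long generation;  /* stamped when the block is COWed */
            const char *payload;
    };

    /* Model of the verification read_tree_block() performs when a
     * non-zero expected generation (the parent transid) is passed in;
     * 0 means "skip the check", which is what allowed stale copies in. */
    static bool block_is_current(const struct toy_block *b,
                                 unsigned long long expected_gen)
    {
            if (expected_gen == 0)
                    return true;
            return b->generation == expected_gen;
    }

    int main(void)
    {
            struct toy_block fresh = { .generation = 7, .payload = "dev1 copy" };
            struct toy_block stale = { .generation = 5, .payload = "dev2 copy" };

            printf("fresh copy accepted: %d\n", block_is_current(&fresh, 7));
            printf("stale copy accepted: %d\n", block_is_current(&stale, 7));
            printf("stale copy accepted with transid 0: %d\n",
                   block_is_current(&stale, 0));
            return 0;
    }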
Fixes: 5bdd3536cbbe ("Btrfs: Fix block generation verification race") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Liu Bo bo.liu@linux.alibaba.com Reviewed-by: Filipe Manana fdmanana@suse.com Reviewed-by: Qu Wenruo wqu@suse.com [ update changelog ] Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/ctree.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2497,10 +2497,8 @@ read_block_for_search(struct btrfs_trans if (p->reada) reada_for_search(root, p, level, slot, key->objectid);
- btrfs_release_path(p); - ret = -EAGAIN; - tmp = read_tree_block(root, blocknr, 0); + tmp = read_tree_block(root, blocknr, gen); if (!IS_ERR(tmp)) { /* * If the read above didn't mark this buffer up to date, @@ -2512,6 +2510,8 @@ read_block_for_search(struct btrfs_trans ret = -EIO; free_extent_buffer(tmp); } + + btrfs_release_path(p); return ret; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
[ Upstream commit 113f99c3358564a0647d444c2ae34e8b1abfd5b9 ]
Device features may change during transmission. In particular with corking, a device may toggle scatter-gather in between allocating and writing to an skb.
Do not unconditionally assume that !NETIF_F_SG at write time implies that the same was true at alloc time, and thus that the skb has sufficient tailroom.
This issue predates git history.
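The invariant being restored is simple: only append into the linear area if the buffer actually has enough tailroom left, otherwise take the non-linear path. A minimal userspace sketch of that guard (illustrative only, not the skb API):

    #include <stdio.h>
    #include <string.h>

    struct toy_buf {
            char data[64];
            size_t len;             /* bytes used so far */
    };

    static size_t toy_tailroom(const struct toy_buf *b)
    {
            return sizeof(b->data) - b->len;
    }

    /* Append only when there is room; a real caller would fall back to
     * another path (e.g. paged fragments) instead of overflowing. */
    static int toy_append(struct toy_buf *b, const char *src, size_t copy)
    {
            if (toy_tailroom(b) < copy)
                    return -1;      /* would overflow: refuse */
            memcpy(b->data + b->len, src, copy);
            b->len += copy;
            return 0;
    }

    int main(void)
    {
            struct toy_buf b = { .len = 60 };

            printf("append 3 bytes: %d\n", toy_append(&b, "abc", 3));
            printf("append 8 bytes: %d\n", toy_append(&b, "abcdefgh", 8));
            return 0;
    }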
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Eric Dumazet edumazet@google.com Signed-off-by: Willem de Bruijn willemb@google.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/ip_output.c | 3 ++- net/ipv6/ip6_output.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)
--- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1062,7 +1062,8 @@ alloc_new_skb: if (copy > length) copy = length;
- if (!(rt->dst.dev->features&NETIF_F_SG)) { + if (!(rt->dst.dev->features&NETIF_F_SG) && + skb_tailroom(skb) >= copy) { unsigned int off;
off = skb->len; --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1529,7 +1529,8 @@ alloc_new_skb: if (copy > length) copy = length;
- if (!(rt->dst.dev->features&NETIF_F_SG)) { + if (!(rt->dst.dev->features&NETIF_F_SG) && + skb_tailroom(skb) >= copy) { unsigned int off;
off = skb->len;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
[ Upstream commit b84bbaf7a6c8cca24f8acf25a2c8e46913a947ba ]
Packet sockets allow construction of packets shorter than dev->hard_header_len to accommodate protocols with variable length link layer headers. These packets are padded to dev->hard_header_len, because some device drivers interpret that as a minimum packet size.
packet_snd reserves dev->hard_header_len bytes on allocation. SOCK_DGRAM sockets call skb_push in dev_hard_header() to ensure that link layer headers are stored in the reserved range. SOCK_RAW sockets do the same in tpacket_snd, but not in packet_snd.
Syzbot was able to send a zero-byte packet to a device with a massive 116-byte link layer header, causing the padding to cross over into skb_shinfo. Fix this by writing from the start of the reserved link layer header range in the packet_snd/SOCK_RAW case as well.
Update skb_set_network_header to the new offset. This also corrects it for SOCK_DGRAM, where it incorrectly double counted reserve due to the skb_push in dev_hard_header.
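For illustration, a toy model of the headroom arithmetic involved (illustrative structure, not the real skb API): the allocation reserves dev->hard_header_len bytes of headroom, and pushing moves the data pointer back into that reserved range so the link layer header, plus any padding up to hard_header_len, is written at the front of the buffer rather than past the tail:

    #include <stdio.h>
    #include <string.h>

    struct toy_skb {
            char buf[256];
            size_t data;    /* offset where data currently starts */
            size_t tail;    /* offset one past the data */
    };

    static void toy_reserve(struct toy_skb *skb, size_t headroom)
    {
            skb->data = skb->tail = headroom;
    }

    static char *toy_push(struct toy_skb *skb, size_t len)
    {
            skb->data -= len;       /* start writing inside the headroom */
            return skb->buf + skb->data;
    }

    int main(void)
    {
            struct toy_skb skb;
            size_t hard_header_len = 116;   /* the oversized header from the report */
            char *hdr;

            toy_reserve(&skb, hard_header_len);

            /* SOCK_RAW path after the fix: push back over the whole
             * reserved range and build the header there, so padding to
             * hard_header_len stays inside the buffer. */
            hdr = toy_push(&skb, hard_header_len);
            memset(hdr, 0, hard_header_len);

            printf("header starts at offset %zu, data tail at %zu\n",
                   skb.data, skb.tail);
            return 0;
    }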
Fixes: 9ed988cd5915 ("packet: validate variable length ll headers") Reported-by: syzbot+71d74a5406d02057d559@syzkaller.appspotmail.com Signed-off-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2771,13 +2771,15 @@ static int packet_snd(struct socket *soc if (skb == NULL) goto out_unlock;
- skb_set_network_header(skb, reserve); + skb_reset_network_header(skb);
err = -EINVAL; if (sock->type == SOCK_DGRAM) { offset = dev_hard_header(skb, dev, ntohs(proto), addr, NULL, len); if (unlikely(offset < 0)) goto out_free; + } else if (reserve) { + skb_push(skb, reserve); }
/* Returns -EFAULT on error */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 9709020c86f6bf8439ca3effc58cfca49a5de192 ]
We must not call sock_diag_has_destroy_listeners(sk) on a socket that holds no reference on the net structure.
BUG: KASAN: use-after-free in sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline] BUG: KASAN: use-after-free in __sk_free+0x329/0x340 net/core/sock.c:1609 Read of size 8 at addr ffff88018a02e3a0 by task swapper/1/0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.17.0-rc5+ #54 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline] __sk_free+0x329/0x340 net/core/sock.c:1609 sk_free+0x42/0x50 net/core/sock.c:1623 sock_put include/net/sock.h:1664 [inline] reqsk_free include/net/request_sock.h:116 [inline] reqsk_put include/net/request_sock.h:124 [inline] inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:672 [inline] reqsk_timer_handler+0xe27/0x10e0 net/ipv4/inet_connection_sock.c:739 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x79e/0xc50 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1d1/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:525 [inline] smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863 </IRQ> RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54 RSP: 0018:ffff8801d9ae7c38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 RAX: dffffc0000000000 RBX: 1ffff1003b35cf8a RCX: 0000000000000000 RDX: 1ffffffff11a30d0 RSI: 0000000000000001 RDI: ffffffff88d18680 RBP: ffff8801d9ae7c38 R08: ffffed003b5e46c3 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: ffff8801d9ae7cf0 R14: ffffffff897bef20 R15: 0000000000000000 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] default_idle+0xc2/0x440 arch/x86/kernel/process.c:354 arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345 default_idle_call+0x6d/0x90 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x395/0x560 kernel/sched/idle.c:262 cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:368 start_secondary+0x426/0x5b0 arch/x86/kernel/smpboot.c:269 secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
Allocated by task 4557: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554 kmem_cache_zalloc include/linux/slab.h:691 [inline] net_alloc net/core/net_namespace.c:383 [inline] copy_net_ns+0x159/0x4c0 net/core/net_namespace.c:423 create_new_namespaces+0x69d/0x8f0 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xc3/0x1f0 kernel/nsproxy.c:206 ksys_unshare+0x708/0xf90 kernel/fork.c:2408 __do_sys_unshare kernel/fork.c:2476 [inline] __se_sys_unshare kernel/fork.c:2474 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:2474 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 69: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x86/0x2d0 mm/slab.c:3756 net_free net/core/net_namespace.c:399 [inline] net_drop_ns.part.14+0x11a/0x130 net/core/net_namespace.c:406 net_drop_ns net/core/net_namespace.c:405 [inline] cleanup_net+0x6a1/0xb20 net/core/net_namespace.c:541 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279 kthread+0x345/0x410 kernel/kthread.c:240 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
The buggy address belongs to the object at ffff88018a02c140 which belongs to the cache net_namespace of size 8832 The buggy address is located 8800 bytes inside of 8832-byte region [ffff88018a02c140, ffff88018a02e3c0) The buggy address belongs to the page: page:ffffea0006280b00 count:1 mapcount:0 mapping:ffff88018a02c140 index:0x0 compound_mapcount: 0 flags: 0x2fffc0000008100(slab|head) raw: 02fffc0000008100 ffff88018a02c140 0000000000000000 0000000100000001 raw: ffffea00062a1320 ffffea0006268020 ffff8801d9bdde40 0000000000000000 page dumped because: kasan: bad access detected
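The one-line fix in the diff below relies on C's left-to-right short-circuit evaluation of &&: test the cheap sk_net_refcnt flag first so the listener lookup, which dereferences the net structure, is never reached for a socket that holds no reference on it. A minimal illustration of the pattern with toy types (not the real socket structures):

    #include <stdbool.h>
    #include <stdio.h>

    struct toy_net  { bool has_destroy_listeners; };
    struct toy_sock { bool net_refcnt; struct toy_net *net; };

    /* Dangerous accessor: only valid when sk->net is known to be alive. */
    static bool has_destroy_listeners(const struct toy_sock *sk)
    {
            return sk->net->has_destroy_listeners;
    }

    int main(void)
    {
            /* A kernel socket without a net reference; its net pointer may
             * already point at freed memory, modelled here as NULL. */
            struct toy_sock sk = { .net_refcnt = false, .net = NULL };

            /* Fixed ordering: net_refcnt short-circuits the &&, so the
             * accessor is never called and nothing is dereferenced. */
            if (sk.net_refcnt && has_destroy_listeners(&sk))
                    printf("broadcast destroy event\n");
            else
                    printf("plain destruct\n");
            return 0;
    }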
Fixes: b922622ec6ef ("sock_diag: don't broadcast kernel sockets") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Craig Gallek kraig@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/core/sock.c +++ b/net/core/sock.c @@ -1474,7 +1474,7 @@ void sk_destruct(struct sock *sk)
static void __sk_free(struct sock *sk) { - if (unlikely(sock_diag_has_destroy_listeners(sk) && sk->sk_net_refcnt)) + if (unlikely(sk->sk_net_refcnt && sock_diag_has_destroy_listeners(sk))) sock_diag_broadcast_destroy(sk); else sk_destruct(sk);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 7f582b248d0a86bae5788c548d7bb5bca6f7691a ]
syzkaller found a reliable way to crash the host, hitting a BUG() in __tcp_retransmit_skb()
A malicious MSG_FASTOPEN is the root cause. We need to purge the write queue in tcp_connect_init(), at the point where we initialize snd_una/write_seq.
This patch also replaces the BUG() with a less intrusive WARN_ON_ONCE().
kernel BUG at net/ipv4/tcp_output.c:2837! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 5276 Comm: syz-executor0 Not tainted 4.17.0-rc3+ #51 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__tcp_retransmit_skb+0x2992/0x2eb0 net/ipv4/tcp_output.c:2837 RSP: 0000:ffff8801dae06ff8 EFLAGS: 00010206 RAX: ffff8801b9fe61c0 RBX: 00000000ffc18a16 RCX: ffffffff864e1a49 RDX: 0000000000000100 RSI: ffffffff864e2e12 RDI: 0000000000000005 RBP: ffff8801dae073a0 R08: ffff8801b9fe61c0 R09: ffffed0039c40dd2 R10: ffffed0039c40dd2 R11: ffff8801ce206e93 R12: 00000000421eeaad R13: ffff8801ce206d4e R14: ffff8801ce206cc0 R15: ffff8801cd4f4a80 FS: 0000000000000000(0000) GS:ffff8801dae00000(0063) knlGS:00000000096bc900 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000020000000 CR3: 00000001c47b6000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> tcp_retransmit_skb+0x2e/0x250 net/ipv4/tcp_output.c:2923 tcp_retransmit_timer+0xc50/0x3060 net/ipv4/tcp_timer.c:488 tcp_write_timer_handler+0x339/0x960 net/ipv4/tcp_timer.c:573 tcp_write_timer+0x111/0x1d0 net/ipv4/tcp_timer.c:593 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x79e/0xc50 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1d1/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:525 [inline] smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Yuchung Cheng ycheng@google.com Cc: Neal Cardwell ncardwell@google.com Reported-by: syzbot syzkaller@googlegroups.com Acked-by: Neal Cardwell ncardwell@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_output.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2587,8 +2587,10 @@ int __tcp_retransmit_skb(struct sock *sk return -EBUSY;
if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) { - if (before(TCP_SKB_CB(skb)->end_seq, tp->snd_una)) - BUG(); + if (unlikely(before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))) { + WARN_ON_ONCE(1); + return -EINVAL; + } if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq)) return -ENOMEM; } @@ -3117,6 +3119,7 @@ static void tcp_connect_init(struct sock sock_reset_flag(sk, SOCK_DONE); tp->snd_wnd = 0; tcp_init_wl(tp, 0); + tcp_write_queue_purge(sk); tp->snd_una = tp->write_seq; tp->snd_sml = tp->write_seq; tp->snd_up = tp->write_seq;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Al Viro viro@zeniv.linux.org.uk
commit 5aa1437d2d9a068c0334bd7c9dafa8ec4f97f13b upstream.
Open a file, unlink it, then use ioctl(2) to make it immutable or append-only. Now close it and watch the blocks *not* get freed...
Immutable/append-only checks belong in ->setattr(). Note: the bug is old, and a backport to anything prior to 737f2e93b972 ("ext2: convert to use the new truncate convention") will need these checks lifted into ext2_setattr().
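To spell out where the check belongs, here is a hedged toy model (invented names, not ext2 code): the immutable/append-only policy is enforced at the setattr level where a user asks for a size change, while the low-level block-freeing path runs unconditionally, so an unlinked-but-open file still gives its blocks back on the last close:

    #include <stdio.h>

    #define FL_IMMUTABLE    0x1
    #define FL_APPEND       0x2

    struct toy_inode {
            unsigned int flags;
            unsigned int nr_blocks;
    };

    /* Policy check at the ->setattr() level: user-requested truncation of
     * an immutable or append-only file is refused here. */
    static int toy_setattr_size(struct toy_inode *inode, unsigned int new_blocks)
    {
            if (inode->flags & (FL_IMMUTABLE | FL_APPEND))
                    return -1;      /* would be -EPERM in the kernel */
            inode->nr_blocks = new_blocks;
            return 0;
    }

    /* Mechanism: freeing blocks of a dead inode must not look at the
     * flags, or an unlinked-but-open immutable file leaks its blocks. */
    static void toy_evict_inode(struct toy_inode *inode)
    {
            inode->nr_blocks = 0;
    }

    int main(void)
    {
            struct toy_inode inode = { .flags = FL_IMMUTABLE, .nr_blocks = 8 };

            printf("user truncate: %d\n", toy_setattr_size(&inode, 0));
            toy_evict_inode(&inode);        /* unlink + last close */
            printf("blocks after eviction: %u\n", inode.nr_blocks);
            return 0;
    }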
Cc: stable@kernel.org Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext2/inode.c | 10 ---------- 1 file changed, 10 deletions(-)
--- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -1175,21 +1175,11 @@ do_indirects:
static void ext2_truncate_blocks(struct inode *inode, loff_t offset) { - /* - * XXX: it seems like a bug here that we don't allow - * IS_APPEND inode to have blocks-past-i_size trimmed off. - * review and fix this. - * - * Also would be nice to be able to handle IO errors and such, - * but that's probably too much to ask. - */ if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))) return; if (ext2_inode_is_fast_symlink(inode)) return; - if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) - return;
dax_sem_down_write(EXT2_I(inode)); __ext2_truncate_blocks(inode, offset);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit fba9eb7946251d6e420df3bdf7bc45195be7be9a ]
Add a header with macros usable in assembler files to emit alternative code sequences. It works analogously to the alternatives for inline assemblies in C files, with the same restrictions and capabilities. The syntax is
ALTERNATIVE "<default instructions sequence>", \ "<alternative instructions sequence>", \ "<features-bit>" and
ALTERNATIVE_2 "<default instructions sequence>", \ "<alternative instructions sqeuence #1>", \ "<feature-bit #1>", "<alternative instructions sqeuence #2>", \ "<feature-bit #2>"
Reviewed-by: Vasily Gorbik gor@linux.vnet.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/alternative-asm.h | 108 ++++++++++++++++++++++++++++++++ 1 file changed, 108 insertions(+) create mode 100644 arch/s390/include/asm/alternative-asm.h
--- /dev/null +++ b/arch/s390/include/asm/alternative-asm.h @@ -0,0 +1,108 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_S390_ALTERNATIVE_ASM_H +#define _ASM_S390_ALTERNATIVE_ASM_H + +#ifdef __ASSEMBLY__ + +/* + * Check the length of an instruction sequence. The length may not be larger + * than 254 bytes and it has to be divisible by 2. + */ +.macro alt_len_check start,end + .if ( \end - \start ) > 254 + .error "cpu alternatives does not support instructions blocks > 254 bytes\n" + .endif + .if ( \end - \start ) % 2 + .error "cpu alternatives instructions length is odd\n" + .endif +.endm + +/* + * Issue one struct alt_instr descriptor entry (need to put it into + * the section .altinstructions, see below). This entry contains + * enough information for the alternatives patching code to patch an + * instruction. See apply_alternatives(). + */ +.macro alt_entry orig_start, orig_end, alt_start, alt_end, feature + .long \orig_start - . + .long \alt_start - . + .word \feature + .byte \orig_end - \orig_start + .byte \alt_end - \alt_start +.endm + +/* + * Fill up @bytes with nops. The macro emits 6-byte nop instructions + * for the bulk of the area, possibly followed by a 4-byte and/or + * a 2-byte nop if the size of the area is not divisible by 6. + */ +.macro alt_pad_fill bytes + .fill ( \bytes ) / 6, 6, 0xc0040000 + .fill ( \bytes ) % 6 / 4, 4, 0x47000000 + .fill ( \bytes ) % 6 % 4 / 2, 2, 0x0700 +.endm + +/* + * Fill up @bytes with nops. If the number of bytes is larger + * than 6, emit a jg instruction to branch over all nops, then + * fill an area of size (@bytes - 6) with nop instructions. + */ +.macro alt_pad bytes + .if ( \bytes > 0 ) + .if ( \bytes > 6 ) + jg . + \bytes + alt_pad_fill \bytes - 6 + .else + alt_pad_fill \bytes + .endif + .endif +.endm + +/* + * Define an alternative between two instructions. If @feature is + * present, early code in apply_alternatives() replaces @oldinstr with + * @newinstr. ".skip" directive takes care of proper instruction padding + * in case @newinstr is longer than @oldinstr. + */ +.macro ALTERNATIVE oldinstr, newinstr, feature + .pushsection .altinstr_replacement,"ax" +770: \newinstr +771: .popsection +772: \oldinstr +773: alt_len_check 770b, 771b + alt_len_check 772b, 773b + alt_pad ( ( 771b - 770b ) - ( 773b - 772b ) ) +774: .pushsection .altinstructions,"a" + alt_entry 772b, 774b, 770b, 771b, \feature + .popsection +.endm + +/* + * Define an alternative between two instructions. If @feature is + * present, early code in apply_alternatives() replaces @oldinstr with + * @newinstr. ".skip" directive takes care of proper instruction padding + * in case @newinstr is longer than @oldinstr. + */ +.macro ALTERNATIVE_2 oldinstr, newinstr1, feature1, newinstr2, feature2 + .pushsection .altinstr_replacement,"ax" +770: \newinstr1 +771: \newinstr2 +772: .popsection +773: \oldinstr +774: alt_len_check 770b, 771b + alt_len_check 771b, 772b + alt_len_check 773b, 774b + .if ( 771b - 770b > 772b - 771b ) + alt_pad ( ( 771b - 770b ) - ( 774b - 773b ) ) + .else + alt_pad ( ( 772b - 771b ) - ( 774b - 773b ) ) + .endif +775: .pushsection .altinstructions,"a" + alt_entry 773b, 775b, 770b, 771b,\feature1 + alt_entry 773b, 775b, 771b, 772b,\feature2 + .popsection +.endm + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_S390_ALTERNATIVE_ASM_H */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 6dd85fbb87d1d6b87a3b1f02ca28d7b2abd2e7ba ]
To be able to use the expoline branches in different assembler files, move the associated macros from entry.S to a new header, nospec-insn.h.
While we are at it, make the macros a bit nicer to use.
Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/nospec-insn.h | 125 ++++++++++++++++++++++++++++++++++++ arch/s390/kernel/entry.S | 105 ++++++------------------------ 2 files changed, 149 insertions(+), 81 deletions(-) create mode 100644 arch/s390/include/asm/nospec-insn.h
--- /dev/null +++ b/arch/s390/include/asm/nospec-insn.h @@ -0,0 +1,125 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_S390_NOSPEC_ASM_H +#define _ASM_S390_NOSPEC_ASM_H + +#ifdef __ASSEMBLY__ + +#ifdef CONFIG_EXPOLINE + +/* + * The expoline macros are used to create thunks in the same format + * as gcc generates them. The 'comdat' section flag makes sure that + * the various thunks are merged into a single copy. + */ + .macro __THUNK_PROLOG_NAME name + .pushsection .text.\name,"axG",@progbits,\name,comdat + .globl \name + .hidden \name + .type \name,@function +\name: + .cfi_startproc + .endm + + .macro __THUNK_EPILOG + .cfi_endproc + .popsection + .endm + + .macro __THUNK_PROLOG_BR r1,r2 + __THUNK_PROLOG_NAME __s390x_indirect_jump_r\r2()use_r\r1 + .endm + + .macro __THUNK_BR r1,r2 + jg __s390x_indirect_jump_r\r2()use_r\r1 + .endm + + .macro __THUNK_BRASL r1,r2,r3 + brasl \r1,__s390x_indirect_jump_r\r3()use_r\r2 + .endm + + .macro __DECODE_RR expand,reg,ruse + .set __decode_fail,1 + .irp r1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + .ifc \reg,%r\r1 + .irp r2,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + .ifc \ruse,%r\r2 + \expand \r1,\r2 + .set __decode_fail,0 + .endif + .endr + .endif + .endr + .if __decode_fail == 1 + .error "__DECODE_RR failed" + .endif + .endm + + .macro __DECODE_RRR expand,rsave,rtarget,ruse + .set __decode_fail,1 + .irp r1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + .ifc \rsave,%r\r1 + .irp r2,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + .ifc \rtarget,%r\r2 + .irp r3,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + .ifc \ruse,%r\r3 + \expand \r1,\r2,\r3 + .set __decode_fail,0 + .endif + .endr + .endif + .endr + .endif + .endr + .if __decode_fail == 1 + .error "__DECODE_RRR failed" + .endif + .endm + + .macro __THUNK_EX_BR reg,ruse +#ifdef CONFIG_HAVE_MARCH_Z10_FEATURES + exrl 0,555f + j . +#else + larl \ruse,555f + ex 0,0(\ruse) + j . +#endif +555: br \reg + .endm + + .macro GEN_BR_THUNK reg,ruse=%r1 + __DECODE_RR __THUNK_PROLOG_BR,\reg,\ruse + __THUNK_EX_BR \reg,\ruse + __THUNK_EPILOG + .endm + + .macro BR_EX reg,ruse=%r1 +557: __DECODE_RR __THUNK_BR,\reg,\ruse + .pushsection .s390_indirect_branches,"a",@progbits + .long 557b-. + .popsection + .endm + + .macro BASR_EX rsave,rtarget,ruse=%r1 +559: __DECODE_RRR __THUNK_BRASL,\rsave,\rtarget,\ruse + .pushsection .s390_indirect_branches,"a",@progbits + .long 559b-. + .popsection + .endm + +#else + .macro GEN_BR_THUNK reg,ruse=%r1 + .endm + + .macro BR_EX reg,ruse=%r1 + br \reg + .endm + + .macro BASR_EX rsave,rtarget,ruse=%r1 + basr \rsave,\rtarget + .endm +#endif + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_S390_NOSPEC_ASM_H */ --- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -23,6 +23,7 @@ #include <asm/vx-insn.h> #include <asm/setup.h> #include <asm/nmi.h> +#include <asm/nospec-insn.h>
__PT_R0 = __PT_GPRS __PT_R1 = __PT_GPRS + 8 @@ -225,74 +226,16 @@ _PIF_WORK = (_PIF_PER_TRAP) .popsection .endm
-#ifdef CONFIG_EXPOLINE - - .macro GEN_BR_THUNK name,reg,tmp - .section .text.\name,"axG",@progbits,\name,comdat - .globl \name - .hidden \name - .type \name,@function -\name: - .cfi_startproc -#ifdef CONFIG_HAVE_MARCH_Z10_FEATURES - exrl 0,0f -#else - larl \tmp,0f - ex 0,0(\tmp) -#endif - j . -0: br \reg - .cfi_endproc - .endm - - GEN_BR_THUNK __s390x_indirect_jump_r1use_r9,%r9,%r1 - GEN_BR_THUNK __s390x_indirect_jump_r1use_r14,%r14,%r1 - GEN_BR_THUNK __s390x_indirect_jump_r11use_r14,%r14,%r11 - - .macro BASR_R14_R9 -0: brasl %r14,__s390x_indirect_jump_r1use_r9 - .pushsection .s390_indirect_branches,"a",@progbits - .long 0b-. - .popsection - .endm - - .macro BR_R1USE_R14 -0: jg __s390x_indirect_jump_r1use_r14 - .pushsection .s390_indirect_branches,"a",@progbits - .long 0b-. - .popsection - .endm - - .macro BR_R11USE_R14 -0: jg __s390x_indirect_jump_r11use_r14 - .pushsection .s390_indirect_branches,"a",@progbits - .long 0b-. - .popsection - .endm - -#else /* CONFIG_EXPOLINE */ - - .macro BASR_R14_R9 - basr %r14,%r9 - .endm - - .macro BR_R1USE_R14 - br %r14 - .endm - - .macro BR_R11USE_R14 - br %r14 - .endm - -#endif /* CONFIG_EXPOLINE */ - + GEN_BR_THUNK %r9 + GEN_BR_THUNK %r14 + GEN_BR_THUNK %r14,%r11
.section .kprobes.text, "ax"
ENTRY(__bpon) .globl __bpon BPON - BR_R1USE_R14 + BR_EX %r14
/* * Scheduler resume function, called by switch_to @@ -322,7 +265,7 @@ ENTRY(__switch_to) TSTMSK __LC_MACHINE_FLAGS,MACHINE_FLAG_LPP jz 0f .insn s,0xb2800000,__LC_LPP # set program parameter -0: BR_R1USE_R14 +0: BR_EX %r14
.L__critical_start:
@@ -388,7 +331,7 @@ sie_exit: xgr %r5,%r5 lmg %r6,%r14,__SF_GPRS(%r15) # restore kernel registers lg %r2,__SF_EMPTY+16(%r15) # return exit reason code - BR_R1USE_R14 + BR_EX %r14 .Lsie_fault: lghi %r14,-EFAULT stg %r14,__SF_EMPTY+16(%r15) # set exit reason code @@ -445,7 +388,7 @@ ENTRY(system_call) lgf %r9,0(%r8,%r10) # get system call add. TSTMSK __TI_flags(%r12),_TIF_TRACE jnz .Lsysc_tracesys - BASR_R14_R9 # call sys_xxxx + BASR_EX %r14,%r9 # call sys_xxxx stg %r2,__PT_R2(%r11) # store return value
.Lsysc_return: @@ -585,7 +528,7 @@ ENTRY(system_call) lmg %r3,%r7,__PT_R3(%r11) stg %r7,STACK_FRAME_OVERHEAD(%r15) lg %r2,__PT_ORIG_GPR2(%r11) - BASR_R14_R9 # call sys_xxx + BASR_EX %r14,%r9 # call sys_xxx stg %r2,__PT_R2(%r11) # store return value .Lsysc_tracenogo: TSTMSK __TI_flags(%r12),_TIF_TRACE @@ -609,7 +552,7 @@ ENTRY(ret_from_fork) lmg %r9,%r10,__PT_R9(%r11) # load gprs ENTRY(kernel_thread_starter) la %r2,0(%r10) - BASR_R14_R9 + BASR_EX %r14,%r9 j .Lsysc_tracenogo
/* @@ -685,7 +628,7 @@ ENTRY(pgm_check_handler) je .Lpgm_return lgf %r9,0(%r10,%r1) # load address of handler routine lgr %r2,%r11 # pass pointer to pt_regs - BASR_R14_R9 # branch to interrupt-handler + BASR_EX %r14,%r9 # branch to interrupt-handler .Lpgm_return: LOCKDEP_SYS_EXIT tm __PT_PSW+1(%r11),0x01 # returning to user ? @@ -962,7 +905,7 @@ ENTRY(psw_idle) stpt __TIMER_IDLE_ENTER(%r2) .Lpsw_idle_lpsw: lpswe __SF_EMPTY(%r15) - BR_R1USE_R14 + BR_EX %r14 .Lpsw_idle_end:
/* @@ -1007,7 +950,7 @@ ENTRY(save_fpu_regs) .Lsave_fpu_regs_done: oi __LC_CPU_FLAGS+7,_CIF_FPU .Lsave_fpu_regs_exit: - BR_R1USE_R14 + BR_EX %r14 .Lsave_fpu_regs_end:
/* @@ -1054,7 +997,7 @@ load_fpu_regs: .Lload_fpu_regs_done: ni __LC_CPU_FLAGS+7,255-_CIF_FPU .Lload_fpu_regs_exit: - BR_R1USE_R14 + BR_EX %r14 .Lload_fpu_regs_end:
.L__critical_end: @@ -1227,7 +1170,7 @@ cleanup_critical: jl 0f clg %r9,BASED(.Lcleanup_table+104) # .Lload_fpu_regs_end jl .Lcleanup_load_fpu_regs -0: BR_R11USE_R14 +0: BR_EX %r14
.align 8 .Lcleanup_table: @@ -1257,7 +1200,7 @@ cleanup_critical: ni __SIE_PROG0C+3(%r9),0xfe # no longer in SIE lctlg %c1,%c1,__LC_USER_ASCE # load primary asce larl %r9,sie_exit # skip forward to sie_exit - BR_R11USE_R14 + BR_EX %r14 #endif
.Lcleanup_system_call: @@ -1315,7 +1258,7 @@ cleanup_critical: stg %r15,56(%r11) # r15 stack pointer # set new psw address and exit larl %r9,.Lsysc_do_svc - BR_R11USE_R14 + BR_EX %r14,%r11 .Lcleanup_system_call_insn: .quad system_call .quad .Lsysc_stmg @@ -1325,7 +1268,7 @@ cleanup_critical:
.Lcleanup_sysc_tif: larl %r9,.Lsysc_tif - BR_R11USE_R14 + BR_EX %r14,%r11
.Lcleanup_sysc_restore: # check if stpt has been executed @@ -1342,14 +1285,14 @@ cleanup_critical: mvc 0(64,%r11),__PT_R8(%r9) lmg %r0,%r7,__PT_R0(%r9) 1: lmg %r8,%r9,__LC_RETURN_PSW - BR_R11USE_R14 + BR_EX %r14,%r11 .Lcleanup_sysc_restore_insn: .quad .Lsysc_exit_timer .quad .Lsysc_done - 4
.Lcleanup_io_tif: larl %r9,.Lio_tif - BR_R11USE_R14 + BR_EX %r14,%r11
.Lcleanup_io_restore: # check if stpt has been executed @@ -1363,7 +1306,7 @@ cleanup_critical: mvc 0(64,%r11),__PT_R8(%r9) lmg %r0,%r7,__PT_R0(%r9) 1: lmg %r8,%r9,__LC_RETURN_PSW - BR_R11USE_R14 + BR_EX %r14,%r11 .Lcleanup_io_restore_insn: .quad .Lio_exit_timer .quad .Lio_done - 4 @@ -1415,17 +1358,17 @@ cleanup_critical: # prepare return psw nihh %r8,0xfcfd # clear irq & wait state bits lg %r9,48(%r11) # return from psw_idle - BR_R11USE_R14 + BR_EX %r14,%r11 .Lcleanup_idle_insn: .quad .Lpsw_idle_lpsw
.Lcleanup_save_fpu_regs: larl %r9,save_fpu_regs - BR_R11USE_R14 + BR_EX %r14,%r11
.Lcleanup_load_fpu_regs: larl %r9,load_fpu_regs - BR_R11USE_R14 + BR_EX %r14,%r11
/* * Integer constants
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 97489e0663fa700d6e7febddc43b58df98d7bcda ]
The return from the memmove, memset, memcpy, __memset16, __memset32 and __memset64 functions is done with "br %r14". These are indirect branches as well and need to use execute trampolines for CONFIG_EXPOLINE=y.
Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Reviewed-by: Hendrik Brueckner brueckner@linux.vnet.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/lib/mem.S | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/arch/s390/lib/mem.S +++ b/arch/s390/lib/mem.S @@ -5,6 +5,9 @@ */
#include <linux/linkage.h> +#include <asm/nospec-insn.h> + + GEN_BR_THUNK %r14
/* * memset implementation @@ -38,7 +41,7 @@ ENTRY(memset) .Lmemset_clear_rest: larl %r3,.Lmemset_xc ex %r4,0(%r3) - br %r14 + BR_EX %r14 .Lmemset_fill: stc %r3,0(%r2) cghi %r4,1 @@ -55,7 +58,7 @@ ENTRY(memset) .Lmemset_fill_rest: larl %r3,.Lmemset_mvc ex %r4,0(%r3) - br %r14 + BR_EX %r14 .Lmemset_xc: xc 0(1,%r1),0(%r1) .Lmemset_mvc: @@ -77,7 +80,7 @@ ENTRY(memcpy) .Lmemcpy_rest: larl %r5,.Lmemcpy_mvc ex %r4,0(%r5) - br %r14 + BR_EX %r14 .Lmemcpy_loop: mvc 0(256,%r1),0(%r3) la %r1,256(%r1)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 23a4d7fd34856da8218c4cfc23dba7a6ec0a423a ]
The return from the ftrace_stub, _mcount, ftrace_caller and return_to_handler functions is done with "br %r14" and "br %r1". These are indirect branches as well and need to use execute trampolines for CONFIG_EXPOLINE=y.
The ftrace_caller function is a special case as it returns to the start of a function and may only use %r0 and %r1. On a pre z10 machine the standard execute trampoline uses a LARL + EX to do this, but this requires *two* registers in the range %r1..%r15. To get around this, the 'br %r1' located in the lowcore is used; then the EX instruction does not need an address register. But the lowcore trick may only be used on pre z14 machines: with noexec=on the mapping for the first page may not contain instructions. The solution is an ALTERNATIVE in the expoline THUNK generated by 'GEN_BR_THUNK %r1' that switches to EXRL; this relies on the fact that a machine that supports noexec=on has EXRL as well.
Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/nospec-insn.h | 11 +++++++++++ arch/s390/kernel/asm-offsets.c | 1 + arch/s390/kernel/mcount.S | 14 +++++++++----- 3 files changed, 21 insertions(+), 5 deletions(-)
--- a/arch/s390/include/asm/nospec-insn.h +++ b/arch/s390/include/asm/nospec-insn.h @@ -2,6 +2,9 @@ #ifndef _ASM_S390_NOSPEC_ASM_H #define _ASM_S390_NOSPEC_ASM_H
+#include <asm/alternative-asm.h> +#include <asm/asm-offsets.h> + #ifdef __ASSEMBLY__
#ifdef CONFIG_EXPOLINE @@ -76,13 +79,21 @@ .endm
.macro __THUNK_EX_BR reg,ruse + # Be very careful when adding instructions to this macro! + # The ALTERNATIVE replacement code has a .+10 which targets + # the "br \reg" after the code has been patched. #ifdef CONFIG_HAVE_MARCH_Z10_FEATURES exrl 0,555f j . #else + .ifc \reg,%r1 + ALTERNATIVE "ex %r0,_LC_BR_R1", ".insn ril,0xc60000000000,0,.+10", 35 + j . + .else larl \ruse,555f ex 0,0(\ruse) j . + .endif #endif 555: br \reg .endm --- a/arch/s390/kernel/asm-offsets.c +++ b/arch/s390/kernel/asm-offsets.c @@ -170,6 +170,7 @@ int main(void) OFFSET(__LC_MACHINE_FLAGS, _lowcore, machine_flags); OFFSET(__LC_GMAP, _lowcore, gmap); OFFSET(__LC_PASTE, _lowcore, paste); + OFFSET(__LC_BR_R1, _lowcore, br_r1_trampoline); /* software defined ABI-relevant lowcore locations 0xe00 - 0xe20 */ OFFSET(__LC_DUMP_REIPL, _lowcore, ipib); /* hardware defined lowcore locations 0x1000 - 0x18ff */ --- a/arch/s390/kernel/mcount.S +++ b/arch/s390/kernel/mcount.S @@ -8,12 +8,16 @@ #include <linux/linkage.h> #include <asm/asm-offsets.h> #include <asm/ftrace.h> +#include <asm/nospec-insn.h> #include <asm/ptrace.h>
+ GEN_BR_THUNK %r1 + GEN_BR_THUNK %r14 + .section .kprobes.text, "ax"
ENTRY(ftrace_stub) - br %r14 + BR_EX %r14
#define STACK_FRAME_SIZE (STACK_FRAME_OVERHEAD + __PT_SIZE) #define STACK_PTREGS (STACK_FRAME_OVERHEAD) @@ -21,7 +25,7 @@ ENTRY(ftrace_stub) #define STACK_PTREGS_PSW (STACK_PTREGS + __PT_PSW)
ENTRY(_mcount) - br %r14 + BR_EX %r14
ENTRY(ftrace_caller) .globl ftrace_regs_caller @@ -49,7 +53,7 @@ ENTRY(ftrace_caller) #endif lgr %r3,%r14 la %r5,STACK_PTREGS(%r15) - basr %r14,%r1 + BASR_EX %r14,%r1 #ifdef CONFIG_FUNCTION_GRAPH_TRACER # The j instruction gets runtime patched to a nop instruction. # See ftrace_enable_ftrace_graph_caller. @@ -64,7 +68,7 @@ ftrace_graph_caller_end: #endif lg %r1,(STACK_PTREGS_PSW+8)(%r15) lmg %r2,%r15,(STACK_PTREGS_GPRS+2*8)(%r15) - br %r1 + BR_EX %r1
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
@@ -77,6 +81,6 @@ ENTRY(return_to_handler) aghi %r15,STACK_FRAME_OVERHEAD lgr %r14,%r2 lmg %r2,%r5,32(%r15) - br %r14 + BR_EX %r14
#endif
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit c50c84c3ac4d5db683904bdb3257798b6ef980ae ]
The assembler code in arch/s390/kernel uses a few more indirect branches which need to be done with execute trampolines for CONFIG_EXPOLINE=y.
Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Reviewed-by: Hendrik Brueckner brueckner@linux.vnet.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/base.S | 24 ++++++++++++++---------- arch/s390/kernel/reipl.S | 5 ++++- arch/s390/kernel/swsusp.S | 10 ++++++---- 3 files changed, 24 insertions(+), 15 deletions(-)
--- a/arch/s390/kernel/base.S +++ b/arch/s390/kernel/base.S @@ -8,18 +8,22 @@
#include <linux/linkage.h> #include <asm/asm-offsets.h> +#include <asm/nospec-insn.h> #include <asm/ptrace.h> #include <asm/sigp.h>
+ GEN_BR_THUNK %r9 + GEN_BR_THUNK %r14 + ENTRY(s390_base_mcck_handler) basr %r13,0 0: lg %r15,__LC_PANIC_STACK # load panic stack aghi %r15,-STACK_FRAME_OVERHEAD larl %r1,s390_base_mcck_handler_fn - lg %r1,0(%r1) - ltgr %r1,%r1 + lg %r9,0(%r1) + ltgr %r9,%r9 jz 1f - basr %r14,%r1 + BASR_EX %r14,%r9 1: la %r1,4095 lmg %r0,%r15,__LC_GPREGS_SAVE_AREA-4095(%r1) lpswe __LC_MCK_OLD_PSW @@ -36,10 +40,10 @@ ENTRY(s390_base_ext_handler) basr %r13,0 0: aghi %r15,-STACK_FRAME_OVERHEAD larl %r1,s390_base_ext_handler_fn - lg %r1,0(%r1) - ltgr %r1,%r1 + lg %r9,0(%r1) + ltgr %r9,%r9 jz 1f - basr %r14,%r1 + BASR_EX %r14,%r9 1: lmg %r0,%r15,__LC_SAVE_AREA_ASYNC ni __LC_EXT_OLD_PSW+1,0xfd # clear wait state bit lpswe __LC_EXT_OLD_PSW @@ -56,10 +60,10 @@ ENTRY(s390_base_pgm_handler) basr %r13,0 0: aghi %r15,-STACK_FRAME_OVERHEAD larl %r1,s390_base_pgm_handler_fn - lg %r1,0(%r1) - ltgr %r1,%r1 + lg %r9,0(%r1) + ltgr %r9,%r9 jz 1f - basr %r14,%r1 + BASR_EX %r14,%r9 lmg %r0,%r15,__LC_SAVE_AREA_SYNC lpswe __LC_PGM_OLD_PSW 1: lpswe disabled_wait_psw-0b(%r13) @@ -116,7 +120,7 @@ ENTRY(diag308_reset) larl %r4,.Lcontinue_psw # Restore PSW flags lpswe 0(%r4) .Lcontinue: - br %r14 + BR_EX %r14 .align 16 .Lrestart_psw: .long 0x00080000,0x80000000 + .Lrestart_part2 --- a/arch/s390/kernel/reipl.S +++ b/arch/s390/kernel/reipl.S @@ -6,8 +6,11 @@
#include <linux/linkage.h> #include <asm/asm-offsets.h> +#include <asm/nospec-insn.h> #include <asm/sigp.h>
+ GEN_BR_THUNK %r14 + # # store_status # @@ -62,7 +65,7 @@ ENTRY(store_status) st %r3,__LC_PSW_SAVE_AREA-SAVE_AREA_BASE + 4(%r1) larl %r2,store_status stg %r2,__LC_PSW_SAVE_AREA-SAVE_AREA_BASE + 8(%r1) - br %r14 + BR_EX %r14
.section .bss .align 8 --- a/arch/s390/kernel/swsusp.S +++ b/arch/s390/kernel/swsusp.S @@ -12,6 +12,7 @@ #include <asm/ptrace.h> #include <asm/thread_info.h> #include <asm/asm-offsets.h> +#include <asm/nospec-insn.h> #include <asm/sigp.h>
/* @@ -23,6 +24,8 @@ * (see below) in the resume process. * This function runs with disabled interrupts. */ + GEN_BR_THUNK %r14 + .section .text ENTRY(swsusp_arch_suspend) stmg %r6,%r15,__SF_GPRS(%r15) @@ -102,7 +105,7 @@ ENTRY(swsusp_arch_suspend) spx 0x318(%r1) lmg %r6,%r15,STACK_FRAME_OVERHEAD + __SF_GPRS(%r15) lghi %r2,0 - br %r14 + BR_EX %r14
/* * Restore saved memory image to correct place and restore register context. @@ -196,11 +199,10 @@ pgm_check_entry: larl %r15,init_thread_union ahi %r15,1<<(PAGE_SHIFT+THREAD_ORDER) larl %r2,.Lpanic_string - larl %r3,_sclp_print_early lghi %r1,0 sam31 sigp %r1,%r0,SIGP_SET_ARCHITECTURE - basr %r14,%r3 + brasl %r14,_sclp_print_early larl %r3,.Ldisabled_wait_31 lpsw 0(%r3) 4: @@ -266,7 +268,7 @@ restore_registers: /* Return 0 */ lmg %r6,%r15,STACK_FRAME_OVERHEAD + __SF_GPRS(%r15) lghi %r2,0 - br %r14 + BR_EX %r14
.section .data..nosave,"aw",@progbits .align 8
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 4253b0e0627ee3461e64c2495c616f1c8f6b127b ]
The nospec-branch.c file is compiled without the gcc options to generate expoline thunks. The return branch of the sysfs show functions cpu_show_spectre_v1 and cpu_show_spectre_v2 is an indirect branch as well. These need to be compiled with expolines.
Move the sysfs functions for spectre reporting to a separate file and lose a '.' in one of the messages.
Cc: stable@vger.kernel.org # 4.16 Fixes: d424986f1d ("s390: add sysfs attributes for spectre") Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/Makefile | 1 + arch/s390/kernel/nospec-branch.c | 18 ------------------ arch/s390/kernel/nospec-sysfs.c | 21 +++++++++++++++++++++ 3 files changed, 22 insertions(+), 18 deletions(-) create mode 100644 arch/s390/kernel/nospec-sysfs.c
--- a/arch/s390/kernel/Makefile +++ b/arch/s390/kernel/Makefile @@ -49,6 +49,7 @@ obj-y += nospec-branch.o
extra-y += head.o head64.o vmlinux.lds
+obj-$(CONFIG_SYSFS) += nospec-sysfs.o CFLAGS_REMOVE_nospec-branch.o += $(CC_FLAGS_EXPOLINE)
obj-$(CONFIG_MODULES) += s390_ksyms.o module.o --- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -44,24 +44,6 @@ static int __init nospec_report(void) } arch_initcall(nospec_report);
-#ifdef CONFIG_SYSFS -ssize_t cpu_show_spectre_v1(struct device *dev, - struct device_attribute *attr, char *buf) -{ - return sprintf(buf, "Mitigation: __user pointer sanitization\n"); -} - -ssize_t cpu_show_spectre_v2(struct device *dev, - struct device_attribute *attr, char *buf) -{ - if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable) - return sprintf(buf, "Mitigation: execute trampolines\n"); - if (__test_facility(82, S390_lowcore.alt_stfle_fac_list)) - return sprintf(buf, "Mitigation: limited branch prediction.\n"); - return sprintf(buf, "Vulnerable\n"); -} -#endif - #ifdef CONFIG_EXPOLINE
int nospec_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF); --- /dev/null +++ b/arch/s390/kernel/nospec-sysfs.c @@ -0,0 +1,21 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/device.h> +#include <linux/cpu.h> +#include <asm/facility.h> +#include <asm/nospec-branch.h> + +ssize_t cpu_show_spectre_v1(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "Mitigation: __user pointer sanitization\n"); +} + +ssize_t cpu_show_spectre_v2(struct device *dev, + struct device_attribute *attr, char *buf) +{ + if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable) + return sprintf(buf, "Mitigation: execute trampolines\n"); + if (__test_facility(82, S390_lowcore.alt_stfle_fac_list)) + return sprintf(buf, "Mitigation: limited branch prediction\n"); + return sprintf(buf, "Vulnerable\n"); +}
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 6deaa3bbca804b2a3627fd685f75de64da7be535 ]
The BPF JIT uses a 'b <disp>(%r<x>)' instruction in the definition of the sk_load_word and sk_load_half functions.
Add support for branch-on-condition instructions contained in the thunk code of an expoline.
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/nospec-insn.h | 57 ++++++++++++++++++++++++++++++++++++ arch/s390/kernel/nospec-branch.c | 25 ++++++++++++--- 2 files changed, 77 insertions(+), 5 deletions(-)
--- a/arch/s390/include/asm/nospec-insn.h +++ b/arch/s390/include/asm/nospec-insn.h @@ -32,10 +32,18 @@ __THUNK_PROLOG_NAME __s390x_indirect_jump_r\r2()use_r\r1 .endm
+ .macro __THUNK_PROLOG_BC d0,r1,r2 + __THUNK_PROLOG_NAME __s390x_indirect_branch_\d0()_\r2()use_\r1 + .endm + .macro __THUNK_BR r1,r2 jg __s390x_indirect_jump_r\r2()use_r\r1 .endm
+ .macro __THUNK_BC d0,r1,r2 + jg __s390x_indirect_branch_\d0()_\r2()use_\r1 + .endm + .macro __THUNK_BRASL r1,r2,r3 brasl \r1,__s390x_indirect_jump_r\r3()use_r\r2 .endm @@ -78,6 +86,23 @@ .endif .endm
+ .macro __DECODE_DRR expand,disp,reg,ruse + .set __decode_fail,1 + .irp r1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + .ifc \reg,%r\r1 + .irp r2,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + .ifc \ruse,%r\r2 + \expand \disp,\r1,\r2 + .set __decode_fail,0 + .endif + .endr + .endif + .endr + .if __decode_fail == 1 + .error "__DECODE_DRR failed" + .endif + .endm + .macro __THUNK_EX_BR reg,ruse # Be very careful when adding instructions to this macro! # The ALTERNATIVE replacement code has a .+10 which targets @@ -98,12 +123,30 @@ 555: br \reg .endm
+ .macro __THUNK_EX_BC disp,reg,ruse +#ifdef CONFIG_HAVE_MARCH_Z10_FEATURES + exrl 0,556f + j . +#else + larl \ruse,556f + ex 0,0(\ruse) + j . +#endif +556: b \disp(\reg) + .endm + .macro GEN_BR_THUNK reg,ruse=%r1 __DECODE_RR __THUNK_PROLOG_BR,\reg,\ruse __THUNK_EX_BR \reg,\ruse __THUNK_EPILOG .endm
+ .macro GEN_B_THUNK disp,reg,ruse=%r1 + __DECODE_DRR __THUNK_PROLOG_BC,\disp,\reg,\ruse + __THUNK_EX_BC \disp,\reg,\ruse + __THUNK_EPILOG + .endm + .macro BR_EX reg,ruse=%r1 557: __DECODE_RR __THUNK_BR,\reg,\ruse .pushsection .s390_indirect_branches,"a",@progbits @@ -111,6 +154,13 @@ .popsection .endm
+ .macro B_EX disp,reg,ruse=%r1 +558: __DECODE_DRR __THUNK_BC,\disp,\reg,\ruse + .pushsection .s390_indirect_branches,"a",@progbits + .long 558b-. + .popsection + .endm + .macro BASR_EX rsave,rtarget,ruse=%r1 559: __DECODE_RRR __THUNK_BRASL,\rsave,\rtarget,\ruse .pushsection .s390_indirect_branches,"a",@progbits @@ -122,10 +172,17 @@ .macro GEN_BR_THUNK reg,ruse=%r1 .endm
+ .macro GEN_B_THUNK disp,reg,ruse=%r1 + .endm + .macro BR_EX reg,ruse=%r1 br \reg .endm
+ .macro B_EX disp,reg,ruse=%r1 + b \disp(\reg) + .endm + .macro BASR_EX rsave,rtarget,ruse=%r1 basr \rsave,\rtarget .endm --- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -94,7 +94,6 @@ static void __init_or_module __nospec_re s32 *epo;
/* Second part of the instruction replace is always a nop */ - memcpy(insnbuf + 2, (char[]) { 0x47, 0x00, 0x00, 0x00 }, 4); for (epo = start; epo < end; epo++) { instr = (u8 *) epo + *epo; if (instr[0] == 0xc0 && (instr[1] & 0x0f) == 0x04) @@ -115,18 +114,34 @@ static void __init_or_module __nospec_re br = thunk + (*(int *)(thunk + 2)) * 2; else continue; - if (br[0] != 0x07 || (br[1] & 0xf0) != 0xf0) + /* Check for unconditional branch 0x07f? or 0x47f???? */ + if ((br[0] & 0xbf) != 0x07 || (br[1] & 0xf0) != 0xf0) continue; + + memcpy(insnbuf + 2, (char[]) { 0x47, 0x00, 0x07, 0x00 }, 4); switch (type) { case BRCL_EXPOLINE: - /* brcl to thunk, replace with br + nop */ insnbuf[0] = br[0]; insnbuf[1] = (instr[1] & 0xf0) | (br[1] & 0x0f); + if (br[0] == 0x47) { + /* brcl to b, replace with bc + nopr */ + insnbuf[2] = br[2]; + insnbuf[3] = br[3]; + } else { + /* brcl to br, replace with bcr + nop */ + } break; case BRASL_EXPOLINE: - /* brasl to thunk, replace with basr + nop */ - insnbuf[0] = 0x0d; insnbuf[1] = (instr[1] & 0xf0) | (br[1] & 0x0f); + if (br[0] == 0x47) { + /* brasl to b, replace with bas + nopr */ + insnbuf[0] = 0x4d; + insnbuf[2] = br[2]; + insnbuf[3] = br[3]; + } else { + /* brasl to br, replace with basr + nop */ + insnbuf[0] = 0x0d; + } break; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit de5cb6eb514ebe241e3edeb290cb41deb380b81d ]
The BPF JIT needs safeguarding against Spectre v2 in the sk_load_xxx assembler stubs, and the indirect branches generated by the JIT itself need to be converted to expolines.
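For orientation, the EMIT6_PCREL_RILB/EMIT6_PCREL_RIL macros added in the diff below compute their displacement as (target - jit->prg) / 2 because s390 RIL-format relative instructions (brasl, larl, exrl) count the offset in 2-byte halfwords from the instruction address. A minimal sketch of that arithmetic, with a hypothetical helper name, just to make the division explicit:

	/*
	 * Sketch only (hypothetical helper, not part of the patch): convert a
	 * byte distance between the branch instruction and its target into the
	 * signed halfword displacement that RIL-format relative instructions
	 * such as brasl/larl/exrl encode.
	 */
	static int ril_halfword_displacement(int target, int insn_addr)
	{
		return (target - insn_addr) / 2;	/* halfwords, not bytes */
	}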
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/net/bpf_jit.S | 16 ++++++---- arch/s390/net/bpf_jit_comp.c | 63 +++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 71 insertions(+), 8 deletions(-)
--- a/arch/s390/net/bpf_jit.S +++ b/arch/s390/net/bpf_jit.S @@ -8,6 +8,7 @@ */
#include <linux/linkage.h> +#include <asm/nospec-insn.h> #include "bpf_jit.h"
/* @@ -53,7 +54,7 @@ ENTRY(sk_load_##NAME##_pos); \ clg %r3,STK_OFF_HLEN(%r15); /* Offset + SIZE > hlen? */ \ jh sk_load_##NAME##_slow; \ LOAD %r14,-SIZE(%r3,%r12); /* Get data from skb */ \ - b OFF_OK(%r6); /* Return */ \ + B_EX OFF_OK,%r6; /* Return */ \ \ sk_load_##NAME##_slow:; \ lgr %r2,%r7; /* Arg1 = skb pointer */ \ @@ -63,11 +64,14 @@ sk_load_##NAME##_slow:; \ brasl %r14,skb_copy_bits; /* Get data from skb */ \ LOAD %r14,STK_OFF_TMP(%r15); /* Load from temp bufffer */ \ ltgr %r2,%r2; /* Set cc to (%r2 != 0) */ \ - br %r6; /* Return */ + BR_EX %r6; /* Return */
sk_load_common(word, 4, llgf) /* r14 = *(u32 *) (skb->data+offset) */ sk_load_common(half, 2, llgh) /* r14 = *(u16 *) (skb->data+offset) */
+ GEN_BR_THUNK %r6 + GEN_B_THUNK OFF_OK,%r6 + /* * Load 1 byte from SKB (optimized version) */ @@ -79,7 +83,7 @@ ENTRY(sk_load_byte_pos) clg %r3,STK_OFF_HLEN(%r15) # Offset >= hlen? jnl sk_load_byte_slow llgc %r14,0(%r3,%r12) # Get byte from skb - b OFF_OK(%r6) # Return OK + B_EX OFF_OK,%r6 # Return OK
sk_load_byte_slow: lgr %r2,%r7 # Arg1 = skb pointer @@ -89,7 +93,7 @@ sk_load_byte_slow: brasl %r14,skb_copy_bits # Get data from skb llgc %r14,STK_OFF_TMP(%r15) # Load result from temp buffer ltgr %r2,%r2 # Set cc to (%r2 != 0) - br %r6 # Return cc + BR_EX %r6 # Return cc
#define sk_negative_common(NAME, SIZE, LOAD) \ sk_load_##NAME##_slow_neg:; \ @@ -103,7 +107,7 @@ sk_load_##NAME##_slow_neg:; \ jz bpf_error; \ LOAD %r14,0(%r2); /* Get data from pointer */ \ xr %r3,%r3; /* Set cc to zero */ \ - br %r6; /* Return cc */ + BR_EX %r6; /* Return cc */
sk_negative_common(word, 4, llgf) sk_negative_common(half, 2, llgh) @@ -112,4 +116,4 @@ sk_negative_common(byte, 1, llgc) bpf_error: # force a return 0 from jit handler ltgr %r15,%r15 # Set condition code - br %r6 + BR_EX %r6 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -24,6 +24,8 @@ #include <linux/bpf.h> #include <asm/cacheflush.h> #include <asm/dis.h> +#include <asm/facility.h> +#include <asm/nospec-branch.h> #include "bpf_jit.h"
int bpf_jit_enable __read_mostly; @@ -41,6 +43,8 @@ struct bpf_jit { int base_ip; /* Base address for literal pool */ int ret0_ip; /* Address of return 0 */ int exit_ip; /* Address of exit */ + int r1_thunk_ip; /* Address of expoline thunk for 'br %r1' */ + int r14_thunk_ip; /* Address of expoline thunk for 'br %r14' */ int tail_call_start; /* Tail call start offset */ int labels[1]; /* Labels for local jumps */ }; @@ -248,6 +252,19 @@ static inline void reg_set_seen(struct b REG_SET_SEEN(b2); \ })
+#define EMIT6_PCREL_RILB(op, b, target) \ +({ \ + int rel = (target - jit->prg) / 2; \ + _EMIT6(op | reg_high(b) << 16 | rel >> 16, rel & 0xffff); \ + REG_SET_SEEN(b); \ +}) + +#define EMIT6_PCREL_RIL(op, target) \ +({ \ + int rel = (target - jit->prg) / 2; \ + _EMIT6(op | rel >> 16, rel & 0xffff); \ +}) + #define _EMIT6_IMM(op, imm) \ ({ \ unsigned int __imm = (imm); \ @@ -475,8 +492,45 @@ static void bpf_jit_epilogue(struct bpf_ EMIT4(0xb9040000, REG_2, BPF_REG_0); /* Restore registers */ save_restore_regs(jit, REGS_RESTORE); + if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable) { + jit->r14_thunk_ip = jit->prg; + /* Generate __s390_indirect_jump_r14 thunk */ + if (test_facility(35)) { + /* exrl %r0,.+10 */ + EMIT6_PCREL_RIL(0xc6000000, jit->prg + 10); + } else { + /* larl %r1,.+14 */ + EMIT6_PCREL_RILB(0xc0000000, REG_1, jit->prg + 14); + /* ex 0,0(%r1) */ + EMIT4_DISP(0x44000000, REG_0, REG_1, 0); + } + /* j . */ + EMIT4_PCREL(0xa7f40000, 0); + } /* br %r14 */ _EMIT2(0x07fe); + + if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable && + (jit->seen & SEEN_FUNC)) { + jit->r1_thunk_ip = jit->prg; + /* Generate __s390_indirect_jump_r1 thunk */ + if (test_facility(35)) { + /* exrl %r0,.+10 */ + EMIT6_PCREL_RIL(0xc6000000, jit->prg + 10); + /* j . */ + EMIT4_PCREL(0xa7f40000, 0); + /* br %r1 */ + _EMIT2(0x07f1); + } else { + /* larl %r1,.+14 */ + EMIT6_PCREL_RILB(0xc0000000, REG_1, jit->prg + 14); + /* ex 0,S390_lowcore.br_r1_tampoline */ + EMIT4_DISP(0x44000000, REG_0, REG_0, + offsetof(struct _lowcore, br_r1_trampoline)); + /* j . */ + EMIT4_PCREL(0xa7f40000, 0); + } + } }
/* @@ -980,8 +1034,13 @@ static noinline int bpf_jit_insn(struct /* lg %w1,<d(imm)>(%l) */ EMIT6_DISP_LH(0xe3000000, 0x0004, REG_W1, REG_0, REG_L, EMIT_CONST_U64(func)); - /* basr %r14,%w1 */ - EMIT2(0x0d00, REG_14, REG_W1); + if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable) { + /* brasl %r14,__s390_indirect_jump_r1 */ + EMIT6_PCREL_RILB(0xc0050000, REG_14, jit->r1_thunk_ip); + } else { + /* basr %r14,%w1 */ + EMIT2(0x0d00, REG_14, REG_W1); + } /* lgr %b0,%r2: load return value into %b0 */ EMIT4(0xb9040000, BPF_REG_0, REG_2); if (bpf_helper_changes_skb_data((void *)func)) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Yan yanaijie@huawei.com
commit 318aaf34f1179b39fa9c30fa0f3288b645beee39 upstream.
When an ATA device is doing EH, some commands still attached to sas tasks are not passed to libata when abort or recovery fails, so libata does not handle these commands. After these commands are done, the sas task is freed, but the ata qc is not freed. This causes an ata qc leak and triggers a warning like the one below:
WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037 ata_eh_finish+0xb4/0xcc CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W OE 4.14.0#1 ...... Call trace: [<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc [<ffff0000088b8420>] ata_do_eh+0xc4/0xd8 [<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c [<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694 [<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80 [<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170 [<ffff0000080ebd70>] process_one_work+0x144/0x390 [<ffff0000080ec100>] worker_thread+0x144/0x418 [<ffff0000080f2c98>] kthread+0x10c/0x138 [<ffff0000080855dc>] ret_from_fork+0x10/0x18
If too many ata qcs are leaked, ata tag allocation will fail and I/O will be blocked forever.
As suggested by Dan Williams, defer ata device commands to libata and merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle ata qcs correctly after this.
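Restated outside the diff context (a sketch only, using the same helpers the patch touches), the merged sas_eh_finish_cmd() now boils down to:

	/* Sketch of the logic the patch folds into sas_eh_finish_cmd():
	 * finish the sas task first, then either defer SATA commands to
	 * libata's EH queue so libata can complete the ata qc, or put
	 * everything else on the normal SCSI EH done queue. */
	sas_end_task(cmd, task);

	if (dev_is_sata(dev)) {
		list_move_tail(&cmd->eh_entry, &sas_ha->eh_ata_q);
		return;
	}
	scsi_eh_finish_cmd(cmd, &sas_ha->eh_done_q);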
Signed-off-by: Jason Yan yanaijie@huawei.com CC: Xiaofei Tan tanxiaofei@huawei.com CC: John Garry john.garry@huawei.com CC: Dan Williams dan.j.williams@intel.com Reviewed-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Cc: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/libsas/sas_scsi_host.c | 33 +++++++++++++-------------------- 1 file changed, 13 insertions(+), 20 deletions(-)
--- a/drivers/scsi/libsas/sas_scsi_host.c +++ b/drivers/scsi/libsas/sas_scsi_host.c @@ -222,6 +222,7 @@ out_done: static void sas_eh_finish_cmd(struct scsi_cmnd *cmd) { struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(cmd->device->host); + struct domain_device *dev = cmd_to_domain_dev(cmd); struct sas_task *task = TO_SAS_TASK(cmd);
/* At this point, we only get called following an actual abort @@ -230,6 +231,14 @@ static void sas_eh_finish_cmd(struct scs */ sas_end_task(cmd, task);
+ if (dev_is_sata(dev)) { + /* defer commands to libata so that libata EH can + * handle ata qcs correctly + */ + list_move_tail(&cmd->eh_entry, &sas_ha->eh_ata_q); + return; + } + /* now finish the command and move it on to the error * handler done list, this also takes it off the * error handler pending list. @@ -237,22 +246,6 @@ static void sas_eh_finish_cmd(struct scs scsi_eh_finish_cmd(cmd, &sas_ha->eh_done_q); }
-static void sas_eh_defer_cmd(struct scsi_cmnd *cmd) -{ - struct domain_device *dev = cmd_to_domain_dev(cmd); - struct sas_ha_struct *ha = dev->port->ha; - struct sas_task *task = TO_SAS_TASK(cmd); - - if (!dev_is_sata(dev)) { - sas_eh_finish_cmd(cmd); - return; - } - - /* report the timeout to libata */ - sas_end_task(cmd, task); - list_move_tail(&cmd->eh_entry, &ha->eh_ata_q); -} - static void sas_scsi_clear_queue_lu(struct list_head *error_q, struct scsi_cmnd *my_cmd) { struct scsi_cmnd *cmd, *n; @@ -260,7 +253,7 @@ static void sas_scsi_clear_queue_lu(stru list_for_each_entry_safe(cmd, n, error_q, eh_entry) { if (cmd->device->sdev_target == my_cmd->device->sdev_target && cmd->device->lun == my_cmd->device->lun) - sas_eh_defer_cmd(cmd); + sas_eh_finish_cmd(cmd); } }
@@ -622,12 +615,12 @@ static void sas_eh_handle_sas_errors(str case TASK_IS_DONE: SAS_DPRINTK("%s: task 0x%p is done\n", __func__, task); - sas_eh_defer_cmd(cmd); + sas_eh_finish_cmd(cmd); continue; case TASK_IS_ABORTED: SAS_DPRINTK("%s: task 0x%p is aborted\n", __func__, task); - sas_eh_defer_cmd(cmd); + sas_eh_finish_cmd(cmd); continue; case TASK_IS_AT_LU: SAS_DPRINTK("task 0x%p is at LU: lu recover\n", task); @@ -638,7 +631,7 @@ static void sas_eh_handle_sas_errors(str "recovered\n", SAS_ADDR(task->dev), cmd->device->lun); - sas_eh_defer_cmd(cmd); + sas_eh_finish_cmd(cmd); sas_scsi_clear_queue_lu(work_q, cmd); goto Again; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Potapenko glider@google.com
commit a45b599ad808c3c982fdcdc12b0b8611c2f92824 upstream.
This helps avoid copying uninitialized memory to userspace when calling ioctl(fd, SG_IO) with an empty command.
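The change itself is a one-liner; a short annotated sketch of the allocation path in sg_build_indirect() (variable names as in the driver) shows why zeroing at allocation time closes the leak:

	/* Pages in the indirect scatter list may later be copied back to user
	 * space even when the SCSI command filled only part of them (or nothing
	 * at all, as with an empty SG_IO command), so allocate them zeroed to
	 * avoid exposing stale kernel memory. */
	schp->pages[k] = alloc_pages(gfp_mask | __GFP_ZERO, order);
	if (!schp->pages[k])
		goto out;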
Reported-by: syzbot+7d26fc1eea198488deab@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Alexander Potapenko glider@google.com Acked-by: Douglas Gilbert dgilbert@interlog.com Reviewed-by: Johannes Thumshirn jthumshirn@suse.de Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/sg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -1903,7 +1903,7 @@ retry: num = (rem_sz > scatter_elem_sz_prev) ? scatter_elem_sz_prev : rem_sz;
- schp->pages[k] = alloc_pages(gfp_mask, order); + schp->pages[k] = alloc_pages(gfp_mask | __GFP_ZERO, order); if (!schp->pages[k]) goto out;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Remus jremus@linux.ibm.com
commit fa89adba1941e4f3b213399b81732a5c12fd9131 upstream.
zfcp_erp_adapter_reopen() schedules blocking of all of the adapter's rports via zfcp_scsi_schedule_rports_block() and enqueues a reopen adapter ERP action via zfcp_erp_action_enqueue(). Both are separately processed asynchronously and concurrently.
Blocking of rports is done in a kworker by zfcp_scsi_rport_work(). It calls zfcp_scsi_rport_block(), which then traces a DBF REC "scpdely" via zfcp_dbf_rec_trig(). zfcp_dbf_rec_trig() acquires the DBF REC spin lock and then iterates with list_for_each() over the adapter's ERP ready list without holding the ERP lock. This opens a race window in which the current list entry can be moved to another list, causing list_for_each() to iterate forever on the wrong list, as the terminating erp_ready_head is never encountered.
Meanwhile the ERP action can be processed in the ERP thread by zfcp_erp_thread(). It calls zfcp_erp_strategy(), which acquires the ERP lock and then calls zfcp_erp_action_to_running() to move the ERP action from the ready to the running list. zfcp_erp_action_to_running() can move the ERP action using list_move() just during the aforementioned race window. It then traces a REC RUN "erator1" via zfcp_dbf_rec_run(). zfcp_dbf_rec_run() tries to acquire the DBF REC spin lock. If this is held by the infinitely looping kworker, it effectively spins forever.
Example Sequence Diagram:
(Columns flattened in this copy; the three contexts are the process issuing the reopen, the ERP thread, and rport_work.) The process context runs zfcp_erp_adapter_reopen(), blocks the adapter, schedules rport blocking and, under the ERP lock, enqueues the ZFCP_ERP_ACTION_REOPEN_ADAPTER action on the ready list and wakes the ERP thread. rport_work then runs zfcp_scsi_rport_block() -> zfcp_dbf_rec_trig(), takes the DBF REC lock and starts list_for_each() over the ready list without the ERP lock. Meanwhile the ERP thread, in zfcp_erp_strategy() under the ERP lock, list_move()s the current entry from the ready to the running list and calls zfcp_dbf_rec_run(), which then waits forever on the DBF REC lock while rport_work loops endlessly over the running list.
Any adapter recovery can trigger this, such as setting the device offline or rebooting.
V4.9 commit 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport during rport gone") introduced additional tracing of (un)blocking of rports. It missed that the adapter->erp_lock must be held when calling zfcp_dbf_rec_trig().
This fix uses the approach formerly introduced by commit aa0fec62391c ("[SCSI] zfcp: Fix sparse warning by providing new entry in dbf") that got later removed by commit ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.").
Introduce zfcp_dbf_rec_trig_lock(), a wrapper for zfcp_dbf_rec_trig() that acquires and releases the adapter->erp_lock for read.
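A minimal sketch of the rule the wrapper enforces (simplified; only the fields relevant here): the ERP ready list may only be walked while holding adapter->erp_lock for read, because the ERP thread moves entries between the ready and running lists under that same lock.

	/* Sketch only - simplified from zfcp_dbf_rec_trig() and the new
	 * zfcp_dbf_rec_trig_lock() wrapper. Walking adapter->erp_ready_head
	 * without erp_lock races with zfcp_erp_action_to_running() doing
	 * list_move() under erp_lock, which is how the endless
	 * list_for_each() loop arises. */
	unsigned long flags;
	struct list_head *entry;

	read_lock_irqsave(&adapter->erp_lock, flags);	/* what the fix adds */
	list_for_each(entry, &adapter->erp_ready_head) {
		/* ... inspect the pending ERP actions ... */
	}
	read_unlock_irqrestore(&adapter->erp_lock, flags);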
Reported-by: Sebastian Ott sebott@linux.ibm.com Signed-off-by: Jens Remus jremus@linux.ibm.com Fixes: 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport during rport gone") Cc: stable@vger.kernel.org # 2.6.32+ Reviewed-by: Benjamin Block bblock@linux.vnet.ibm.com Signed-off-by: Steffen Maier maier@linux.ibm.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/s390/scsi/zfcp_dbf.c | 23 ++++++++++++++++++++++- drivers/s390/scsi/zfcp_ext.h | 5 ++++- drivers/s390/scsi/zfcp_scsi.c | 14 +++++++------- 3 files changed, 33 insertions(+), 9 deletions(-)
--- a/drivers/s390/scsi/zfcp_dbf.c +++ b/drivers/s390/scsi/zfcp_dbf.c @@ -3,7 +3,7 @@ * * Debug traces for zfcp. * - * Copyright IBM Corp. 2002, 2017 + * Copyright IBM Corp. 2002, 2018 */
#define KMSG_COMPONENT "zfcp" @@ -287,6 +287,27 @@ void zfcp_dbf_rec_trig(char *tag, struct spin_unlock_irqrestore(&dbf->rec_lock, flags); }
+/** + * zfcp_dbf_rec_trig_lock - trace event related to triggered recovery with lock + * @tag: identifier for event + * @adapter: adapter on which the erp_action should run + * @port: remote port involved in the erp_action + * @sdev: scsi device involved in the erp_action + * @want: wanted erp_action + * @need: required erp_action + * + * The adapter->erp_lock must not be held. + */ +void zfcp_dbf_rec_trig_lock(char *tag, struct zfcp_adapter *adapter, + struct zfcp_port *port, struct scsi_device *sdev, + u8 want, u8 need) +{ + unsigned long flags; + + read_lock_irqsave(&adapter->erp_lock, flags); + zfcp_dbf_rec_trig(tag, adapter, port, sdev, want, need); + read_unlock_irqrestore(&adapter->erp_lock, flags); +}
/** * zfcp_dbf_rec_run_lvl - trace event related to running recovery --- a/drivers/s390/scsi/zfcp_ext.h +++ b/drivers/s390/scsi/zfcp_ext.h @@ -3,7 +3,7 @@ * * External function declarations. * - * Copyright IBM Corp. 2002, 2016 + * Copyright IBM Corp. 2002, 2018 */
#ifndef ZFCP_EXT_H @@ -34,6 +34,9 @@ extern int zfcp_dbf_adapter_register(str extern void zfcp_dbf_adapter_unregister(struct zfcp_adapter *); extern void zfcp_dbf_rec_trig(char *, struct zfcp_adapter *, struct zfcp_port *, struct scsi_device *, u8, u8); +extern void zfcp_dbf_rec_trig_lock(char *tag, struct zfcp_adapter *adapter, + struct zfcp_port *port, + struct scsi_device *sdev, u8 want, u8 need); extern void zfcp_dbf_rec_run(char *, struct zfcp_erp_action *); extern void zfcp_dbf_rec_run_lvl(int level, char *tag, struct zfcp_erp_action *erp); --- a/drivers/s390/scsi/zfcp_scsi.c +++ b/drivers/s390/scsi/zfcp_scsi.c @@ -3,7 +3,7 @@ * * Interface to Linux SCSI midlayer. * - * Copyright IBM Corp. 2002, 2017 + * Copyright IBM Corp. 2002, 2018 */
#define KMSG_COMPONENT "zfcp" @@ -616,9 +616,9 @@ static void zfcp_scsi_rport_register(str ids.port_id = port->d_id; ids.roles = FC_RPORT_ROLE_FCP_TARGET;
- zfcp_dbf_rec_trig("scpaddy", port->adapter, port, NULL, - ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD, - ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD); + zfcp_dbf_rec_trig_lock("scpaddy", port->adapter, port, NULL, + ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD, + ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD); rport = fc_remote_port_add(port->adapter->scsi_host, 0, &ids); if (!rport) { dev_err(&port->adapter->ccw_device->dev, @@ -640,9 +640,9 @@ static void zfcp_scsi_rport_block(struct struct fc_rport *rport = port->rport;
if (rport) { - zfcp_dbf_rec_trig("scpdely", port->adapter, port, NULL, - ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL, - ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL); + zfcp_dbf_rec_trig_lock("scpdely", port->adapter, port, NULL, + ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL, + ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL); fc_remote_port_delete(rport); port->rport = NULL; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vinod Koul vinod.koul@intel.com
commit 757d12e5849be549076901b0d33c60d5f360269c upstream.
dmaengine has various device callbacks and exposes helper functions to invoke them. These helpers should check that the channel, device and callback are valid before invoking them.
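From a client driver's point of view, the net effect is that the prep helpers now return NULL instead of dereferencing a missing channel or callback; a hedged sketch of the usual calling pattern (my_chan, buf and len are placeholders, not taken from the patch):

	/* Sketch: a slave-DMA client preparing a single-buffer transfer.
	 * With this patch the helper returns NULL - rather than crashing -
	 * when the channel, device or prep callback is absent, so the
	 * client can fall back gracefully. */
	struct dma_async_tx_descriptor *desc;

	desc = dmaengine_prep_slave_single(my_chan, buf, len, DMA_MEM_TO_DEV,
					   DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
	if (!desc)
		return -EIO;	/* e.g. fall back to PIO */

	dmaengine_submit(desc);
	dma_async_issue_pending(my_chan);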
Reported-by: Jon Hunter jonathanh@nvidia.com Signed-off-by: Vinod Koul vinod.koul@intel.com Signed-off-by: Fabrizio Castro fabrizio.castro@bp.renesas.com Signed-off-by: Jianming Qiao jianming.qiao@bp.renesas.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/dmaengine.h | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-)
--- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -767,6 +767,9 @@ static inline struct dma_async_tx_descri sg_dma_address(&sg) = buf; sg_dma_len(&sg) = len;
+ if (!chan || !chan->device || !chan->device->device_prep_slave_sg) + return NULL; + return chan->device->device_prep_slave_sg(chan, &sg, 1, dir, flags, NULL); } @@ -775,6 +778,9 @@ static inline struct dma_async_tx_descri struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction dir, unsigned long flags) { + if (!chan || !chan->device || !chan->device->device_prep_slave_sg) + return NULL; + return chan->device->device_prep_slave_sg(chan, sgl, sg_len, dir, flags, NULL); } @@ -786,6 +792,9 @@ static inline struct dma_async_tx_descri enum dma_transfer_direction dir, unsigned long flags, struct rio_dma_ext *rio_ext) { + if (!chan || !chan->device || !chan->device->device_prep_slave_sg) + return NULL; + return chan->device->device_prep_slave_sg(chan, sgl, sg_len, dir, flags, rio_ext); } @@ -796,6 +805,9 @@ static inline struct dma_async_tx_descri size_t period_len, enum dma_transfer_direction dir, unsigned long flags) { + if (!chan || !chan->device || !chan->device->device_prep_dma_cyclic) + return NULL; + return chan->device->device_prep_dma_cyclic(chan, buf_addr, buf_len, period_len, dir, flags); } @@ -804,6 +816,9 @@ static inline struct dma_async_tx_descri struct dma_chan *chan, struct dma_interleaved_template *xt, unsigned long flags) { + if (!chan || !chan->device || !chan->device->device_prep_interleaved_dma) + return NULL; + return chan->device->device_prep_interleaved_dma(chan, xt, flags); }
@@ -811,7 +826,7 @@ static inline struct dma_async_tx_descri struct dma_chan *chan, dma_addr_t dest, int value, size_t len, unsigned long flags) { - if (!chan || !chan->device) + if (!chan || !chan->device || !chan->device->device_prep_dma_memset) return NULL;
return chan->device->device_prep_dma_memset(chan, dest, value, @@ -824,6 +839,9 @@ static inline struct dma_async_tx_descri struct scatterlist *src_sg, unsigned int src_nents, unsigned long flags) { + if (!chan || !chan->device || !chan->device->device_prep_dma_sg) + return NULL; + return chan->device->device_prep_dma_sg(chan, dst_sg, dst_nents, src_sg, src_nents, flags); }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Stultz john.stultz@linaro.org
commit 3d88d56c5873f6eebe23e05c3da701960146b801 upstream.
Due to how the MONOTONIC_RAW accumulation logic was handled, there is the potential for a 1ns discontinuity when we do accumulations. This small discontinuity has for the most part gone unnoticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW in its vDSO clock_gettime implementation, we've seen failures with the inconsistency-check test in kselftest.
This patch addresses the issue by using the same sub-ns accumulation handling that CLOCK_MONOTONIC uses, which avoids the issue for in-kernel users.
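In formula form (a summary of the diff below, using its variable names; $s$ is the logarithmic accumulation shift passed to logarithmic_accumulation()):

\[
\text{before: } \Delta_{\text{raw}} = \left\lfloor \frac{\texttt{cycle\_interval}\cdot\texttt{mult}}{2^{\texttt{shift}}} \right\rfloor \cdot 2^{s}
\qquad
\text{after: } \texttt{xtime\_nsec} \mathrel{+}= \texttt{cycle\_interval}\cdot\texttt{mult}\cdot 2^{s},\;\;
\texttt{tv\_nsec} = \texttt{xtime\_nsec} \gg \texttt{shift}
\]

so the sub-nanosecond remainder stays in tkr_raw.xtime_nsec and carries into the next accumulation instead of being truncated each time.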
Since the ARM64 vDSO implementation has its own clock_gettime calculation logic, this patch reduces the frequency of errors, but failures are still seen. The ARM64 vDSO will need to be updated to include the sub-nanosecond xtime_nsec values in its calculation for this issue to be completely fixed.
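A minimal user-space check in the spirit of the kselftest inconsistency-check test mentioned above (a sketch, not the actual selftest):

	/* Build with: gcc -O2 -o rawcheck rawcheck.c  (add -lrt on older glibc).
	 * Repeatedly samples CLOCK_MONOTONIC_RAW and reports any backward step,
	 * which is the symptom the sub-ns accounting fix addresses. */
	#include <stdio.h>
	#include <time.h>

	int main(void)
	{
		struct timespec prev, cur;
		long i;

		clock_gettime(CLOCK_MONOTONIC_RAW, &prev);
		for (i = 0; i < 10000000; i++) {
			clock_gettime(CLOCK_MONOTONIC_RAW, &cur);
			if (cur.tv_sec < prev.tv_sec ||
			    (cur.tv_sec == prev.tv_sec && cur.tv_nsec < prev.tv_nsec)) {
				printf("time went backwards: %lld.%09ld -> %lld.%09ld\n",
				       (long long)prev.tv_sec, prev.tv_nsec,
				       (long long)cur.tv_sec, cur.tv_nsec);
				return 1;
			}
			prev = cur;
		}
		puts("no inconsistency observed");
		return 0;
	}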
Signed-off-by: John Stultz john.stultz@linaro.org Tested-by: Daniel Mentz danielmentz@google.com Cc: Prarit Bhargava prarit@redhat.com Cc: Kevin Brodsky kevin.brodsky@arm.com Cc: Richard Cochran richardcochran@gmail.com Cc: Stephen Boyd stephen.boyd@linaro.org Cc: Will Deacon will.deacon@arm.com Cc: "stable #4 . 8+" stable@vger.kernel.org Cc: Miroslav Lichvar mlichvar@redhat.com Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linar... Signed-off-by: Thomas Gleixner tglx@linutronix.de [fabrizio: cherry-pick to 4.4. Kept cycle_t type for function logarithmic_accumulation local variable "interval". Dropped casting of "interval" variable] Signed-off-by: Fabrizio Castro fabrizio.castro@bp.renesas.com Signed-off-by: Biju Das biju.das@bp.renesas.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/timekeeper_internal.h | 4 ++-- kernel/time/timekeeping.c | 20 ++++++++++---------- 2 files changed, 12 insertions(+), 12 deletions(-)
--- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -56,7 +56,7 @@ struct tk_read_base { * interval. * @xtime_remainder: Shifted nano seconds left over when rounding * @cycle_interval - * @raw_interval: Raw nano seconds accumulated per NTP interval. + * @raw_interval: Shifted raw nano seconds accumulated per NTP interval. * @ntp_error: Difference between accumulated time and NTP time in ntp * shifted nano seconds. * @ntp_error_shift: Shift conversion between clock shifted nano seconds and @@ -97,7 +97,7 @@ struct timekeeper { cycle_t cycle_interval; u64 xtime_interval; s64 xtime_remainder; - u32 raw_interval; + u64 raw_interval; /* The ntp_tick_length() value currently being used. * This cached copy ensures we consistently apply the tick * length for an entire tick, as ntp_tick_length may change --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -277,8 +277,7 @@ static void tk_setup_internals(struct ti /* Go back from cycles -> shifted ns */ tk->xtime_interval = (u64) interval * clock->mult; tk->xtime_remainder = ntpinterval - tk->xtime_interval; - tk->raw_interval = - ((u64) interval * clock->mult) >> clock->shift; + tk->raw_interval = interval * clock->mult;
/* if changing clocks, convert xtime_nsec shift units */ if (old_clock) { @@ -1767,7 +1766,7 @@ static cycle_t logarithmic_accumulation( unsigned int *clock_set) { cycle_t interval = tk->cycle_interval << shift; - u64 raw_nsecs; + u64 snsec_per_sec;
/* If the offset is smaller than a shifted interval, do nothing */ if (offset < interval) @@ -1782,14 +1781,15 @@ static cycle_t logarithmic_accumulation( *clock_set |= accumulate_nsecs_to_secs(tk);
/* Accumulate raw time */ - raw_nsecs = (u64)tk->raw_interval << shift; - raw_nsecs += tk->raw_time.tv_nsec; - if (raw_nsecs >= NSEC_PER_SEC) { - u64 raw_secs = raw_nsecs; - raw_nsecs = do_div(raw_secs, NSEC_PER_SEC); - tk->raw_time.tv_sec += raw_secs; + tk->tkr_raw.xtime_nsec += (u64)tk->raw_time.tv_nsec << tk->tkr_raw.shift; + tk->tkr_raw.xtime_nsec += tk->raw_interval << shift; + snsec_per_sec = (u64)NSEC_PER_SEC << tk->tkr_raw.shift; + while (tk->tkr_raw.xtime_nsec >= snsec_per_sec) { + tk->tkr_raw.xtime_nsec -= snsec_per_sec; + tk->raw_time.tv_sec++; } - tk->raw_time.tv_nsec = raw_nsecs; + tk->raw_time.tv_nsec = tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift; + tk->tkr_raw.xtime_nsec -= (u64)tk->raw_time.tv_nsec << tk->tkr_raw.shift;
/* Accumulate error between NTP and clock interval */ tk->ntp_error += tk->ntp_tick << shift;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven geert+renesas@glider.be
commit b26a719bdba9aa926ceaadecc66e07623d2b8a53 upstream.
The R-Car GPIO driver handles Runtime PM for requested GPIOs only.
When using a GPIO purely as an interrupt source, no Runtime PM handling is done, and the GPIO module's clock may not be enabled.
To fix this:
- Add .irq_request_resources() and .irq_release_resources() callbacks to handle Runtime PM when an interrupt is requested,
- Add irq_bus_lock() and sync_unlock() callbacks to handle Runtime PM when e.g. disabling/enabling an interrupt, or configuring the interrupt type.
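The situation being addressed is, roughly, a consumer that uses one of these GPIOs only through its interrupt number and never requests the GPIO itself; a hedged sketch (pdev, my_handler and priv are placeholders, not taken from the patch):

	/* Sketch: the line is used purely as an interrupt source, so nothing on
	 * this path calls gpio_request()/gpiod_get(), and without the new
	 * irq_request_resources()/irq_bus_lock() callbacks the GPIO block's
	 * module clock could remain disabled while the IRQ is armed. */
	int irq = platform_get_irq(pdev, 0);

	if (irq < 0)
		return irq;

	return devm_request_irq(&pdev->dev, irq, my_handler,
				IRQF_TRIGGER_FALLING, "gpio-irq-consumer", priv);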
Fixes: d5c3d84657db57bd "net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS" Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Signed-off-by: Linus Walleij linus.walleij@linaro.org [fabrizio: cherry-pick to v4.4.y. Use container_of instead of gpiochip_get_data.] Signed-off-by: Fabrizio Castro fabrizio.castro@bp.renesas.com Reviewed-by: Biju Das biju.das@bp.renesas.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpio-rcar.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+)
--- a/drivers/gpio/gpio-rcar.c +++ b/drivers/gpio/gpio-rcar.c @@ -200,6 +200,48 @@ static int gpio_rcar_irq_set_wake(struct return 0; }
+static void gpio_rcar_irq_bus_lock(struct irq_data *d) +{ + struct gpio_chip *gc = irq_data_get_irq_chip_data(d); + struct gpio_rcar_priv *p = container_of(gc, struct gpio_rcar_priv, + gpio_chip); + + pm_runtime_get_sync(&p->pdev->dev); +} + +static void gpio_rcar_irq_bus_sync_unlock(struct irq_data *d) +{ + struct gpio_chip *gc = irq_data_get_irq_chip_data(d); + struct gpio_rcar_priv *p = container_of(gc, struct gpio_rcar_priv, + gpio_chip); + + pm_runtime_put(&p->pdev->dev); +} + + +static int gpio_rcar_irq_request_resources(struct irq_data *d) +{ + struct gpio_chip *gc = irq_data_get_irq_chip_data(d); + struct gpio_rcar_priv *p = container_of(gc, struct gpio_rcar_priv, + gpio_chip); + int error; + + error = pm_runtime_get_sync(&p->pdev->dev); + if (error < 0) + return error; + + return 0; +} + +static void gpio_rcar_irq_release_resources(struct irq_data *d) +{ + struct gpio_chip *gc = irq_data_get_irq_chip_data(d); + struct gpio_rcar_priv *p = container_of(gc, struct gpio_rcar_priv, + gpio_chip); + + pm_runtime_put(&p->pdev->dev); +} + static irqreturn_t gpio_rcar_irq_handler(int irq, void *dev_id) { struct gpio_rcar_priv *p = dev_id; @@ -460,6 +502,10 @@ static int gpio_rcar_probe(struct platfo irq_chip->irq_unmask = gpio_rcar_irq_enable; irq_chip->irq_set_type = gpio_rcar_irq_set_type; irq_chip->irq_set_wake = gpio_rcar_irq_set_wake; + irq_chip->irq_bus_lock = gpio_rcar_irq_bus_lock; + irq_chip->irq_bus_sync_unlock = gpio_rcar_irq_bus_sync_unlock; + irq_chip->irq_request_resources = gpio_rcar_irq_request_resources; + irq_chip->irq_release_resources = gpio_rcar_irq_release_resources; irq_chip->flags = IRQCHIP_SET_TYPE_MASKED | IRQCHIP_MASK_ON_SUSPEND;
ret = gpiochip_add(gpio_chip);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
commit a7cfebcb7594a24609268f91299ab85ba064bf82 upstream.
There's currently no limit on wiphy names, other than netlink message size and memory limitations, but that causes issues when, for example, the wiphy name is used in a uevent, e.g. in rfkill where we use the same name for the rfkill instance, and then the buffer there is "only" 2k for the environment variables.
This was reported by syzkaller, which used a 4k name.
Limit the name to something reasonable, I randomly picked 128.
Reported-by: syzbot+230d9e642a85d3fec29c@syzkaller.appspotmail.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/uapi/linux/nl80211.h | 2 ++ net/wireless/core.c | 3 +++ 2 files changed, 5 insertions(+)
--- a/include/uapi/linux/nl80211.h +++ b/include/uapi/linux/nl80211.h @@ -2195,6 +2195,8 @@ enum nl80211_attrs { #define NL80211_ATTR_KEYS NL80211_ATTR_KEYS #define NL80211_ATTR_FEATURE_FLAGS NL80211_ATTR_FEATURE_FLAGS
+#define NL80211_WIPHY_NAME_MAXLEN 128 + #define NL80211_MAX_SUPP_RATES 32 #define NL80211_MAX_SUPP_HT_RATES 77 #define NL80211_MAX_SUPP_REG_RULES 64 --- a/net/wireless/core.c +++ b/net/wireless/core.c @@ -94,6 +94,9 @@ static int cfg80211_dev_check_name(struc
ASSERT_RTNL();
+ if (strlen(newname) > NL80211_WIPHY_NAME_MAXLEN) + return -EINVAL; + /* prohibit calling the thing phy%d when %d is not its number */ sscanf(newname, PHY_NAME "%d%n", &wiphy_idx, &taken); if (taken == strlen(newname) && wiphy_idx != rdev->wiphy_idx) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 66072c29328717072fd84aaff3e070e3f008ba77 upstream.
syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1]. This is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync().
As far as I can see, it is hfsplus_mark_mdb_dirty() from hfsplus_new_inode() in hfsplus_fill_super() that calls queue_delayed_work(). Therefore, I assume that hfsplus_new_inode() does not fail if queue_delayed_work() was called, and the out_put_hidden_dir label is the appropriate location to call cancel_delayed_work_sync().
[1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c...
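A sketch of the general pattern being applied (create_hidden_dir() is a hypothetical stand-in; the real change is the one-line diff below): if fill_super() may already have queued the delayed work, the error path has to cancel it synchronously before the structures the work uses are torn down.

	/* Sketch only - create_hidden_dir() stands in for the part of
	 * hfsplus_fill_super() that ends up calling hfsplus_mark_mdb_dirty()
	 * and thereby queue_delayed_work(). */
	static int fill_super_sketch(struct super_block *sb)
	{
		struct hfsplus_sb_info *sbi = sb->s_fs_info;
		int err;

		err = create_hidden_dir(sb);	/* may queue sbi->sync_work */
		if (err)
			goto out_put_hidden_dir;
		return 0;

	out_put_hidden_dir:
		/* added by this patch: wait for any queued sync work first */
		cancel_delayed_work_sync(&sbi->sync_work);
		iput(sbi->hidden_dir);
		return err;
	}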
Link: http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056201@I-love.SAKURA.... Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Reported-by: syzbot syzbot+4f2e5f086147d543ab03@syzkaller.appspotmail.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: David Howells dhowells@redhat.com Cc: Ernesto A. Fernandez ernesto.mnd.fernandez@gmail.com Cc: Vyacheslav Dubeyko slava@dubeyko.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/hfsplus/super.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/hfsplus/super.c +++ b/fs/hfsplus/super.c @@ -585,6 +585,7 @@ static int hfsplus_fill_super(struct sup return 0;
out_put_hidden_dir: + cancel_delayed_work_sync(&sbi->sync_work); iput(sbi->hidden_dir); out_put_root: dput(sb->s_root);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit a466ef76b815b86748d9870ef2a430af7b39c710 upstream.
syzbot is reporting crashes after memory allocation failure inside do_kexec_load() [1]. This is because free_transition_pgtable() is called by both init_transition_pgtable() and machine_kexec_cleanup() when memory allocation failed inside init_transition_pgtable().
Regarding 32bit code, machine_kexec_free_page_tables() is called by both machine_kexec_alloc_page_tables() and machine_kexec_cleanup() when memory allocation failed inside machine_kexec_alloc_page_tables().
Fix this by leaving the error handling to machine_kexec_cleanup() (and optionally setting NULL after free_page()).
[1] https://syzkaller.appspot.com/bug?id=91e52396168cf2bdd572fe1e1bc0bc645c1c6b4...
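Why a single cleanup path is enough once the pointers are reset (a sketch of the 64-bit variant as it looks after the patch; free_page() on a zero address is a no-op in the kernel, so a second call is harmless):

	/* Sketch of the idempotent-free pattern the patch relies on: the
	 * allocation error path no longer frees anything itself;
	 * machine_kexec_cleanup() calls this unconditionally, and because each
	 * pointer is cleared after free_page(), running it again (or on pages
	 * that were never allocated) does nothing. */
	static void free_transition_pgtable(struct kimage *image)
	{
		free_page((unsigned long)image->arch.pud);	/* no-op if NULL */
		image->arch.pud = NULL;
		free_page((unsigned long)image->arch.pmd);
		image->arch.pmd = NULL;
		free_page((unsigned long)image->arch.pte);
		image->arch.pte = NULL;
	}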
Fixes: f5deb79679af6eb4 ("x86: kexec: Use one page table in x86_64 machine_kexec") Fixes: 92be3d6bdf2cb349 ("kexec/i386: allocate page table pages dynamically") Reported-by: syzbot syzbot+d96f60296ef613fe1d69@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Baoquan He bhe@redhat.com Cc: thomas.lendacky@amd.com Cc: prudo@linux.vnet.ibm.com Cc: Huang Ying ying.huang@intel.com Cc: syzkaller-bugs@googlegroups.com Cc: takahiro.akashi@linaro.org Cc: H. Peter Anvin hpa@zytor.com Cc: akpm@linux-foundation.org Cc: dyoung@redhat.com Cc: kirill.shutemov@linux.intel.com Link: https://lkml.kernel.org/r/201805091942.DGG12448.tMFVFSJFQOOLHO@I-love.SAKURA... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/machine_kexec_32.c | 6 +++++- arch/x86/kernel/machine_kexec_64.c | 4 +++- 2 files changed, 8 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/machine_kexec_32.c +++ b/arch/x86/kernel/machine_kexec_32.c @@ -71,12 +71,17 @@ static void load_segments(void) static void machine_kexec_free_page_tables(struct kimage *image) { free_page((unsigned long)image->arch.pgd); + image->arch.pgd = NULL; #ifdef CONFIG_X86_PAE free_page((unsigned long)image->arch.pmd0); + image->arch.pmd0 = NULL; free_page((unsigned long)image->arch.pmd1); + image->arch.pmd1 = NULL; #endif free_page((unsigned long)image->arch.pte0); + image->arch.pte0 = NULL; free_page((unsigned long)image->arch.pte1); + image->arch.pte1 = NULL; }
static int machine_kexec_alloc_page_tables(struct kimage *image) @@ -93,7 +98,6 @@ static int machine_kexec_alloc_page_tabl !image->arch.pmd0 || !image->arch.pmd1 || #endif !image->arch.pte0 || !image->arch.pte1) { - machine_kexec_free_page_tables(image); return -ENOMEM; } return 0; --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -37,8 +37,11 @@ static struct kexec_file_ops *kexec_file static void free_transition_pgtable(struct kimage *image) { free_page((unsigned long)image->arch.pud); + image->arch.pud = NULL; free_page((unsigned long)image->arch.pmd); + image->arch.pmd = NULL; free_page((unsigned long)image->arch.pte); + image->arch.pte = NULL; }
static int init_transition_pgtable(struct kimage *image, pgd_t *pgd) @@ -79,7 +82,6 @@ static int init_transition_pgtable(struc set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC)); return 0; err: - free_transition_pgtable(image); return result; }
On 05/24/2018 02:37 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
Building s390:allmodconfig ... failed
arch/s390/built-in.o: In function `__s390x_indirect_jump_r1use_r1':
(.text.__s390x_indirect_jump_r1use_r1[__s390x_indirect_jump_r1use_r1]+0x2): undefined reference to `_LC_BR_R1'
make[1]: *** [vmlinux] Error 1
make: *** [sub-make] Error 2
Complete report follows later.
Guenter
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.133-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
Merged, compiled with -Werror, and installed onto my Pixel 2 XL and OnePlus 5.
No initial issues noticed in dmesg or general usage.
Thanks! Nathan
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
Build results:
	total: 146 pass: 145 fail: 1
Failed builds:
	s390:allmodconfig
Qemu test results:
	total: 127 pass: 127 fail: 0
Build error (s390:allmodconfig):
arch/s390/built-in.o: In function `__s390x_indirect_jump_r1use_r1': (.text.__s390x_indirect_jump_r1use_r1[__s390x_indirect_jump_r1use_r1]+0x2): undefined reference to `_LC_BR_R1'
Details are available at http://kerneltests.org/builders/.
Guenter
On Thu, May 24, 2018 at 10:32:08AM -0700, Guenter Roeck wrote:
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
Build results: total: 146 pass: 145 fail: 1 Failed builds: s390:allmodconfig Qemu test results: total: 127 pass: 127 fail: 0
Build error (s390:allmodconfig):
arch/s390/built-in.o: In function `__s390x_indirect_jump_r1use_r1': (.text.__s390x_indirect_jump_r1use_r1[__s390x_indirect_jump_r1use_r1]+0x2): undefined reference to `_LC_BR_R1'
I'll look into the s390 stuff in the morning, I think I know what I messed up there...
thanks,
greg k-h
On Thu, May 24, 2018 at 09:47:42PM +0200, Greg Kroah-Hartman wrote:
On Thu, May 24, 2018 at 10:32:08AM -0700, Guenter Roeck wrote:
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
Build results: total: 146 pass: 145 fail: 1 Failed builds: s390:allmodconfig Qemu test results: total: 127 pass: 127 fail: 0
Build error (s390:allmodconfig):
arch/s390/built-in.o: In function `__s390x_indirect_jump_r1use_r1': (.text.__s390x_indirect_jump_r1use_r1[__s390x_indirect_jump_r1use_r1]+0x2): undefined reference to `_LC_BR_R1'
I'll look into the s390 stuff in the morning, I think I know what I messed up there...
Nope, I don't think this was my fault :)
Oddly 'defconfig' worked for s390 with the offending patch, but not allmodconfig. I've now dropped the patch I think was causing the problem and pushed out a -rc2.
thanks,
greg k-h
On Fri, May 25, 2018 at 04:11:45PM +0200, Greg Kroah-Hartman wrote:
On Thu, May 24, 2018 at 09:47:42PM +0200, Greg Kroah-Hartman wrote:
On Thu, May 24, 2018 at 10:32:08AM -0700, Guenter Roeck wrote:
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
Build results: total: 146 pass: 145 fail: 1 Failed builds: s390:allmodconfig Qemu test results: total: 127 pass: 127 fail: 0
Build error (s390:allmodconfig):
arch/s390/built-in.o: In function `__s390x_indirect_jump_r1use_r1': (.text.__s390x_indirect_jump_r1use_r1[__s390x_indirect_jump_r1use_r1]+0x2): undefined reference to `_LC_BR_R1'
I'll look into the s390 stuff in the morning, I think I know what I messed up there...
Nope, I don't think this was my fault :)
Oddly 'defconfig' worked for s390 with the offending patch, but not allmodconfig. I've now dropped the patch I think was causing the problem and pushed out a -rc2.
Confirmed to build with v4.4.132-92-g8330f2b.
Guenter
On Fri, May 25, 2018 at 09:39:19AM -0700, Guenter Roeck wrote:
On Fri, May 25, 2018 at 04:11:45PM +0200, Greg Kroah-Hartman wrote:
On Thu, May 24, 2018 at 09:47:42PM +0200, Greg Kroah-Hartman wrote:
On Thu, May 24, 2018 at 10:32:08AM -0700, Guenter Roeck wrote:
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
Build results: total: 146 pass: 145 fail: 1 Failed builds: s390:allmodconfig Qemu test results: total: 127 pass: 127 fail: 0
Build error (s390:allmodconfig):
arch/s390/built-in.o: In function `__s390x_indirect_jump_r1use_r1': (.text.__s390x_indirect_jump_r1use_r1[__s390x_indirect_jump_r1use_r1]+0x2): undefined reference to `_LC_BR_R1'
I'll look into the s390 stuff in the morning, I think I know what I messed up there...
Nope, I don't think this was my fault :)
Oddly 'defconfig' worked for s390 with the offending patch, but not allmodconfig. I've now dropped the patch I think was causing the problem and pushed out a -rc2.
Confirmed to build with v4.4.132-92-g8330f2b.
Wonderful, thanks for testing and letting me know.
greg k-h
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary
------------------------------------------------------------------------
kernel: 4.4.133-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.4.y
git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4
git describe: v4.4.132-93-g915a3d7cdea9
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-...
No regressions (compared to build v4.4.132-71-g180635995c36)
Ran 6863 total tests in the following environments and test suites.
Environments
--------------
- juno-r2 - arm64
- qemu_arm
- qemu_x86_64
- x15 - arm
- x86_64
Test Suites
-----------
* boot
* kselftest
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* libhugetlbfs
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
Summary
------------------------------------------------------------------------
kernel: 4.4.133-rc1
git repo: https://git.linaro.org/lkft/arm64-stable-rc.git
git branch: 4.4.133-rc1-hikey-20180524-200
git commit: 9a1fc06621f38daaaa47feeaa1eeaf38db532433
git describe: 4.4.133-rc1-hikey-20180524-200
Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.1...
No regressions (compared to build 4.4.133-rc1-hikey-20180521-198)
Ran 2610 total tests in the following environments and test suites.
Environments
--------------
- hi6220-hikey - arm64
- qemu_arm64
Test Suites
-----------
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
On Thu, May 24, 2018 at 01:06:52PM -0500, Dan Rue wrote:
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary
kernel: 4.4.133-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.4.y git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4 git describe: v4.4.132-93-g915a3d7cdea9 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-...
No regressions (compared to build v4.4.132-71-g180635995c36)
Shouldn't this compare against v4.4.132 ?
I looked into kselftest/rtnetlink.sh test results. The test history is a bit confusing.
- v4.4.131-57-ge33795f7a573 and earlier passed
- v4.4.132 failed
- v4.4.132-30-ga102378c6551 passed
...
- v4.4.132-70-gaa7ab28e9c5e passed
- v4.4.132-71-g180635995c36 and later failed
Does that mean that this specific test is unreliable ?
Thanks, Guenter
Ran 6863 total tests in the following environments and test suites.
Environments
- juno-r2 - arm64
- qemu_arm
- qemu_x86_64
- x15 - arm
- x86_64
Test Suites
- boot
- kselftest
- ltp-cap_bounds-tests
- ltp-containers-tests
- ltp-fcntl-locktests-tests
- ltp-filecaps-tests
- ltp-fs-tests
- ltp-fs_bind-tests
- ltp-fs_perms_simple-tests
- ltp-fsx-tests
- ltp-hugetlb-tests
- ltp-io-tests
- ltp-ipc-tests
- ltp-math-tests
- ltp-nptl-tests
- ltp-pty-tests
- ltp-sched-tests
- ltp-securebits-tests
- ltp-syscalls-tests
- ltp-timers-tests
- libhugetlbfs
- kselftest-vsyscall-mode-native
- kselftest-vsyscall-mode-none
Summary
kernel: 4.4.133-rc1 git repo: https://git.linaro.org/lkft/arm64-stable-rc.git git branch: 4.4.133-rc1-hikey-20180524-200 git commit: 9a1fc06621f38daaaa47feeaa1eeaf38db532433 git describe: 4.4.133-rc1-hikey-20180524-200 Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.1...
No regressions (compared to build 4.4.133-rc1-hikey-20180521-198)
Ran 2610 total tests in the following environments and test suites.
Environments
- hi6220-hikey - arm64
- qemu_arm64
Test Suites
- boot
- kselftest
- libhugetlbfs
- ltp-cap_bounds-tests
- ltp-containers-tests
- ltp-fcntl-locktests-tests
- ltp-filecaps-tests
- ltp-fs-tests
- ltp-fs_bind-tests
- ltp-fs_perms_simple-tests
- ltp-fsx-tests
- ltp-hugetlb-tests
- ltp-io-tests
- ltp-ipc-tests
- ltp-math-tests
- ltp-nptl-tests
- ltp-pty-tests
- ltp-sched-tests
- ltp-securebits-tests
- ltp-syscalls-tests
- ltp-timers-tests
-- Linaro LKFT https://lkft.linaro.org
On 24 May 2018 at 23:47, Guenter Roeck linux@roeck-us.net wrote:
On Thu, May 24, 2018 at 01:06:52PM -0500, Dan Rue wrote:
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary
kernel: 4.4.133-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.4.y git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4 git describe: v4.4.132-93-g915a3d7cdea9 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-...
No regressions (compared to build v4.4.132-71-g180635995c36)
Shouldn't this compare against v4.4.132 ?
I looked into kselftest/rtnetlink.sh test results. The test history is a bit confusing.
- v4.4.131-57-ge33795f7a573 and earlier passed
- v4.4.132 failed
- v4.4.132-30-ga102378c6551 passed
...
- v4.4.132-70-gaa7ab28e9c5e passed
- v4.4.132-71-g180635995c36 and later failed
Does that mean that this specific test is unreliable ?
The kselftest rtnetlink.sh test case failure is not a regression on the 4.16, 4.14, 4.9 and 4.4 builds, because the test used to be skipped due to the missing tc tool.

Now that the 'tc' tool has been added to the OpenEmbedded build, the test runs and reports a failure. This is not a regression in the kernel; it is a change in user space.
Old output
============
SKIP: Could not run test without the tc tool
selftests: rtnetlink.sh [PASS]
https://lkft.validation.linaro.org/scheduler/job/225015#L2255
New output
=============
RTNETLINK answers: Operation not supported
Cannot find device "test-dummy0"
FAIL: cannot add dummy interface
selftests: rtnetlink.sh [FAIL]
https://lkft.validation.linaro.org/scheduler/job/226352#L3375
Ref bug link: https://bugs.linaro.org/show_bug.cgi?id=3834
- Naresh
On 05/24/2018 03:34 PM, Naresh Kamboju wrote:
On 24 May 2018 at 23:47, Guenter Roeck linux@roeck-us.net wrote:
On Thu, May 24, 2018 at 01:06:52PM -0500, Dan Rue wrote:
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary
kernel: 4.4.133-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.4.y git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4 git describe: v4.4.132-93-g915a3d7cdea9 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-...
No regressions (compared to build v4.4.132-71-g180635995c36)
Shouldn't this compare against v4.4.132 ?
I looked into kselftest/rtnetlink.sh test results. The test history is a bit confusing.
- v4.4.131-57-ge33795f7a573 and earlier passed
- v4.4.132 failed
- v4.4.132-30-ga102378c6551 passed
...
- v4.4.132-70-gaa7ab28e9c5e passed
- v4.4.132-71-g180635995c36 and later failed
Does that mean that this specific test is unreliable ?
kselftest rtnetlink.sh test case failure is not a regression on 4.16, 4.14, 4.9 and 4.4 builds. Because it used to skip due to missing tc tool.
Now the ''tc' tool added to Open Embedded build and test running and reported failed. This is not a regression in the kernel. It is a change in the user space.
Old output
SKIP: Could not run test without the tc tool selftests: rtnetlink.sh [PASS] https://lkft.validation.linaro.org/scheduler/job/225015#L2255
New output
RTNETLINK answers: Operation not supported Cannot find device "test-dummy0" FAIL: cannot add dummy interface selftests: rtnetlink.sh [FAIL] https://lkft.validation.linaro.org/scheduler/job/226352#L3375
Ref bug link: https://bugs.linaro.org/show_bug.cgi?id=3834
- Naresh
Which kselftest version do you run? Is this from linux-next?
thanks, -- Shuah
On Thu, May 24, 2018 at 03:52:49PM -0600, Shuah Khan wrote:
On 05/24/2018 03:34 PM, Naresh Kamboju wrote:
On 24 May 2018 at 23:47, Guenter Roeck linux@roeck-us.net wrote:
On Thu, May 24, 2018 at 01:06:52PM -0500, Dan Rue wrote:
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary
kernel: 4.4.133-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.4.y git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4 git describe: v4.4.132-93-g915a3d7cdea9 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-...
No regressions (compared to build v4.4.132-71-g180635995c36)
Shouldn't this compare against v4.4.132 ?
I looked into kselftest/rtnetlink.sh test results. The test history is a bit confusing.
- v4.4.131-57-ge33795f7a573 and earlier passed
- v4.4.132 failed
- v4.4.132-30-ga102378c6551 passed
...
- v4.4.132-70-gaa7ab28e9c5e passed
- v4.4.132-71-g180635995c36 and later failed
Does that mean that this specific test is unreliable ?
kselftest rtnetlink.sh test case failure is not a regression on 4.16, 4.14, 4.9 and 4.4 builds. Because it used to skip due to missing tc tool.
Now the ''tc' tool added to Open Embedded build and test running and reported failed. This is not a regression in the kernel. It is a change in the user space.
Old output
SKIP: Could not run test without the tc tool selftests: rtnetlink.sh [PASS] https://lkft.validation.linaro.org/scheduler/job/225015#L2255
New output
RTNETLINK answers: Operation not supported Cannot find device "test-dummy0" FAIL: cannot add dummy interface selftests: rtnetlink.sh [FAIL] https://lkft.validation.linaro.org/scheduler/job/226352#L3375
Ref bug link: https://bugs.linaro.org/show_bug.cgi?id=3834
- Naresh
Which kselftest versdion do you run? Is this from linux-next?
It's using kselftest from 4.16 (latest stable, as a rule). I think this patch will fix the 'Operation not supported' issue: https://patchwork.kernel.org/patch/10424807/
Dan
On Thu, May 24, 2018 at 01:06:52PM -0500, Dan Rue wrote:
On Thu, May 24, 2018 at 11:37:37AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.133 release. There are 92 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat May 26 09:31:28 UTC 2018. Anything received after that time might be too late.
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary
kernel: 4.4.133-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.4.y git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4 git describe: v4.4.132-93-g915a3d7cdea9 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-...
No regressions (compared to build v4.4.132-71-g180635995c36)
It should have gotten better, as there was a fix in here for at least 2 LTP tests that we previously were not passing. I don't know why you all were not reporting that in the past, it took someone else randomly deciding to run LTP to report it to me...
Why did an improvement in results not show up?
thanks,
greg k-h
kernel: 4.4.133-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.4.y git commit: 915a3d7cdea9daa9e9fb6b855f10c1312e6910c4 git describe: v4.4.132-93-g915a3d7cdea9 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-...
No regressions (compared to build v4.4.132-71-g180635995c36)
It should have gotten better, as there was a fix in here for at least 2 LTP tests that we previously were not passing. I don't know why you all were not reporting that in the past, it took someone else randomly deciding to run LTP to report it to me...
Why did an improvement in results not show up?
Are you referring to the CLOCK_MONOTONIC_RAW fix for the arm64 vDSO? I think CLOCK_MONOTONIC_RAW support in the vDSO wasn't backported to 4.4.y (commit 49eea433b326 in mainline), so this "fix" changes the core timekeeping code (which would fix MONOTONIC_RAW) but not the arm64 vDSO path in 4.4.y. Needs review though :\
Hello Rafael,
The tests fcntl35 and fcntl35_64 should have gone from FAIL to PASS. https://www.spinics.net/lists/stable/msg239475.html
Looking at https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-... I see that these two tests (and other important tests as well) are being SKIPPED.
By the way, I see that select04 FAILS in your case. But in my setup, select04 was working fine (x86_64) in 4.4.132. I will confirm that it still works in 4.4.133
Thanks, Daniel Sangorrin
Thank you Daniel! Will investigate those.
Meanwhile, Greg, I referred to:
time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
We're not using this type of clock in arm64's 4.4 kernel vDSO functions. The commit description calls attention to it being responsible for fixing flaky kselftest cases, but we wouldn't get that improvement on 4.4, according to the backport status below:
stable-rc-linux-4.14.y
  dbb236c1ceb6 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
  49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
  82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
  9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW

stable-rc-linux-4.16.y
  dbb236c1ceb6 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
  49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
  82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
  9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW

stable-rc-linux-4.4.y
  <none>

stable-rc-linux-4.9.y
  99f66b5182a4 arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
  49eea433b326 arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
  82e88ff1ea94 hrtimer: Revert CLOCK_MONOTONIC_RAW support
  9c808765e88e hrtimer: Add support for CLOCK_MONOTONIC_RAW
Yet, the second fix was backported to all (including 4.4.y):
stable-rc-linux-4.14.y  3d88d56c5873 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
stable-rc-linux-4.16.y  3d88d56c5873 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
stable-rc-linux-4.4.y   7c8bd6e07430 time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
stable-rc-linux-4.9.y   a53bfdda06ac time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
Not sure you want to keep it in 4.4, but I thought it was worth mentioning.
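For reference, the user-space interface being discussed is just clock_gettime(CLOCK_MONOTONIC_RAW). Below is a minimal illustrative sketch (added here for clarity, not taken from the thread, kselftest or LTP) that reads the clock twice and prints the delta; on a 4.4.y arm64 kernel without 49eea433b326, this request presumably goes through the clock_gettime() syscall rather than the vDSO fast path, as noted above.

/* Minimal sketch, for illustration only: read CLOCK_MONOTONIC_RAW twice
 * and print the elapsed nanoseconds. Not taken from kselftest or LTP. */
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>

int main(void)
{
	struct timespec a, b;

	if (clock_gettime(CLOCK_MONOTONIC_RAW, &a)) {
		perror("clock_gettime(CLOCK_MONOTONIC_RAW)");
		return 1;
	}
	clock_gettime(CLOCK_MONOTONIC_RAW, &b);

	printf("delta: %lld ns\n",
	       (long long)(b.tv_sec - a.tv_sec) * 1000000000LL +
	       (long long)(b.tv_nsec - a.tv_nsec));
	return 0;
}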
Cheers.
-----Original Message----- From: Rafael Tinoco [mailto:rafael.tinoco@linaro.org]
Thank you Daniel! Will investigate those.
OK, thank you :). Notice that I did those tests on x86_64. It seems you are testing on arm, so there may be some differences.
I just checked these tests on 4.4.133 (on x86_64):
  fcntl35: PASS
  fcntl35_64: PASS
  select04: PASS
I am currently investigating other tests that are failing as well. They are not regressions; some patches just have not been backported yet.
Thanks, Daniel
Hi Daniel,
On 25 May 2018 at 07:04, Daniel Sangorrin daniel.sangorrin@toshiba.co.jp wrote:
Hello Rafael,
The tests fcntl35 and fcntl35_64 should have gone from FAIL to PASS. https://www.spinics.net/lists/stable/msg239475.html
Thanks for the patch.
Now I have manually tested the LTP syscalls and can confirm that fcntl35 and fcntl35_64 pass on qemu_x86_64, (arm64) HiKey and Juno, and (arm32) X15.
Linux version 4.4.133-rc1 (buildslave@x86-64-07) (gcc version 6.2.1 20161016 (Linaro GCC 6.2-2016.11) ) #1 SMP Thu May 24 10:24:11 UTC 2018
[ 131.873912] LTP: starting fcntl35
tst_test.c:980: INFO: Timeout per run is 0h 05m 00s
fcntl35.c:101: PASS: an unprivileged user init the capacity of a pipe to 4096 successfully
fcntl35.c:101: PASS: a privileged user init the capacity of a pipe to 65536 successfully
Summary: passed 2 failed 0 skipped 0

[ 132.090096] LTP: starting fcntl35_64
incrementing stop
tst_test.c:980: INFO: Timeout per run is 0h 05m 00s
fcntl35.c:101: PASS: an unprivileged user init the capacity of a pipe to 4096 successfully
fcntl35.c:101: PASS: a privileged user init the capacity of a pipe to 65536 successfully
Summary: passed 2 failed 0 skipped 0 warnings 0
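As a side note for anyone reproducing this by hand: judging from the log above, fcntl35 exercises the pipe-capacity fcntl interface (F_SETPIPE_SZ / F_GETPIPE_SZ). The following is a minimal standalone sketch of that operation, added here for illustration and not taken from the LTP source.

/* Illustrative sketch (not the LTP test): set and read back the capacity
 * of a pipe with F_SETPIPE_SZ / F_GETPIPE_SZ, roughly the operation the
 * fcntl35 log above reports for the unprivileged 4096-byte case. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fds[2];

	if (pipe(fds)) {
		perror("pipe");
		return 1;
	}

	if (fcntl(fds[1], F_SETPIPE_SZ, 4096) < 0) {
		perror("fcntl(F_SETPIPE_SZ)");
		return 1;
	}

	printf("pipe capacity now: %d bytes\n", fcntl(fds[1], F_GETPIPE_SZ));

	close(fds[0]);
	close(fds[1]);
	return 0;
}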
Looking at https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.132-93-... I see that these two tests (and other important tests as well) are being SKIPPED.
fcntl35 and fcntl35_64 will be unskipped from now on.
By the way, I see that select04 FAILS in your case. But in my setup, select04 was working fine (x86_64) in 4.4.132. I will confirm that it still works in 4.4.133
select04 failed only on (slow) qemu_arm and passed on real hardware: arm32 X15, arm64 Juno and x86_64. Test case verdict comparison across all boards and QEMU: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/tests/ltp-syscalls...
LTP select04 failure log on slow qemu_arm32:
tst_test.c:980: INFO: Timeout per run is 0h 15m 00s
tst_timer_test.c:356: INFO: CLOCK_MONOTONIC resolution 1ns
tst_timer_test.c:368: INFO: prctl(PR_GET_TIMERSLACK) = 50us
tst_timer_test.c:275: INFO: select() sleeping for 1000us 500 iterations, threshold 450.01us
tst_timer_test.c:318: INFO: min 1336us, max 2282us, median 1522us, trunc mean 1542.82us (discarded 25)
tst_timer_test.c:321: FAIL: select() slept for too long
Full log: https://lkft.validation.linaro.org/scheduler/job/228569#L8792
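As the log shows, select04 is a timer-accuracy test: it times select() with a 1000us timeout over 500 iterations and compares the truncated mean against a threshold. The sketch below (added for illustration, not taken from LTP) times a single such call, which only gives a rough idea but shows the measurement being made.

/* Illustrative sketch (not the LTP test): measure how long a single
 * select() call with a 1000us timeout actually sleeps. */
#include <stdio.h>
#include <sys/select.h>
#include <time.h>

int main(void)
{
	struct timeval tv = { .tv_sec = 0, .tv_usec = 1000 };
	struct timespec a, b;
	long long slept_us;

	clock_gettime(CLOCK_MONOTONIC, &a);
	select(0, NULL, NULL, NULL, &tv);	/* returns 0 on timeout */
	clock_gettime(CLOCK_MONOTONIC, &b);

	slept_us = (long long)(b.tv_sec - a.tv_sec) * 1000000LL +
		   (b.tv_nsec - a.tv_nsec) / 1000;
	printf("select() slept for %lld us (asked for 1000 us)\n", slept_us);
	return 0;
}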
Best regards Naresh Kamboju
On Thu, May 24, 2018 at 09:08:06PM +0200, Greg Kroah-Hartman wrote:
It should have gotten better, as there was a fix in here for at least 2 LTP tests that we previously were not passing. I don't know why you all were not reporting that in the past, it took someone else randomly deciding to run LTP to report it to me...
Why did an improvement in results not show up?
Our normal process for failing tests is to skip them and then fix/report/etc separately until they're fixed, at which time we re-enable them. Otherwise, we would have to wade through too many failures in every result set, and we could miss actual regressions. We've never really reported things as they've been fixed - just immediate regressions.
However! We are working on 'known issue' support in qa-reports (SQUAD), so that we can run failing tests and the system will mark them as a known failure. This will allow us to keep our baseline 'green', and also let us run the failing tests so that we can see in realtime as issues get fixed. Once we have that, the only tests that we'll carry on our skiplists are those that cause boards to crash or reboot.
If you have the test cases in mind, I can check them. Otherwise, I can run our full set of tests in staging (without a skiplist) tomorrow and see if any are now passing.
Dan
On 05/24/2018 03:37 AM, Greg Kroah-Hartman wrote:
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah