This is the start of the stable review cycle for the 4.4.130 release. There are 50 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Apr 29 13:56:42 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.130-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.4.130-rc1
Heiko Carstens heiko.carstens@de.ibm.com s390/uprobes: implement arch_uretprobe_is_alive()
Sebastian Ott sebott@linux.ibm.com s390/cio: update chpid descriptor after resource accessibility event
Dan Carpenter dan.carpenter@oracle.com cdrom: information leak in cdrom_ioctl_media_changed()
Martin K. Petersen martin.petersen@oracle.com scsi: mptsas: Disable WRITE SAME
Eric Dumazet edumazet@google.com ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy
Eric Dumazet edumazet@google.com net: af_packet: fix race in PACKET_{R|T}X_RING
Eric Dumazet edumazet@google.com tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets
Wolfgang Bumiller w.bumiller@proxmox.com net: fix deadlock while clearing neighbor proxy table
Eric Dumazet edumazet@google.com tipc: add policy for TIPC_NLA_NET_ADDR
Cong Wang xiyou.wangcong@gmail.com llc: fix NULL pointer deref for SOCK_ZAPPED
Cong Wang xiyou.wangcong@gmail.com llc: hold llc_sap before release_sock()
Xin Long lucien.xin@gmail.com sctp: do not check port in sctp_inet6_cmp_addr
Toshiaki Makita makita.toshiaki@lab.ntt.co.jp vlan: Fix reading memory beyond skb->tail in skb_vlan_tagged_multi
Guillaume Nault g.nault@alphalink.fr pppoe: check sockaddr length in pppoe_connect()
Willem de Bruijn willemb@google.com packet: fix bitfield update race
Xin Long lucien.xin@gmail.com team: fix netconsole setup over team
Paolo Abeni pabeni@redhat.com team: avoid adding twice the same option to the event list
Jann Horn jannh@google.com tcp: don't read out-of-bounds opsize
Cong Wang xiyou.wangcong@gmail.com llc: delete timers synchronously in llc_sk_free()
Eric Dumazet edumazet@google.com net: validate attribute sizes in neigh_dump_table()
Guillaume Nault g.nault@alphalink.fr l2tp: check sockaddr length in pppol2tp_connect()
Eric Biggers ebiggers@google.com KEYS: DNS: limit the length of option strings
Xin Long lucien.xin@gmail.com bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave
Martin Schwidefsky schwidefsky@de.ibm.com s390: correct module section names for expoline code revert
Martin Schwidefsky schwidefsky@de.ibm.com s390: correct nospec auto detection init order
Martin Schwidefsky schwidefsky@de.ibm.com s390: add sysfs attributes for spectre
Martin Schwidefsky schwidefsky@de.ibm.com s390: report spectre mitigation via syslog
Martin Schwidefsky schwidefsky@de.ibm.com s390: add automatic detection of the spectre defense
Martin Schwidefsky schwidefsky@de.ibm.com s390: move nobp parameter functions to nospec-branch.c
Martin Schwidefsky schwidefsky@de.ibm.com s390/entry.S: fix spurious zeroing of r0
Martin Schwidefsky schwidefsky@de.ibm.com s390: do not bypass BPENTER for interrupt system calls
Martin Schwidefsky schwidefsky@de.ibm.com s390: Replace IS_ENABLED(EXPOLINE_*) with IS_ENABLED(CONFIG_EXPOLINE_*)
Martin Schwidefsky schwidefsky@de.ibm.com s390: introduce execute-trampolines for branches
Martin Schwidefsky schwidefsky@de.ibm.com s390: run user space and KVM guests with modified branch prediction
Martin Schwidefsky schwidefsky@de.ibm.com s390: add options to change branch prediction behaviour for the kernel
Martin Schwidefsky schwidefsky@de.ibm.com s390/alternative: use a copy of the facility bit mask
Martin Schwidefsky schwidefsky@de.ibm.com s390: add optimized array_index_mask_nospec
Martin Schwidefsky schwidefsky@de.ibm.com s390: scrub registers on kernel entry and KVM exit
Martin Schwidefsky schwidefsky@de.ibm.com KVM: s390: wire up bpb feature
Martin Schwidefsky schwidefsky@de.ibm.com s390: enable CPU alternatives unconditionally
Martin Schwidefsky schwidefsky@de.ibm.com s390: introduce CPU alternatives
Karthikeyan Periyasamy periyasa@codeaurora.org Revert "ath10k: send (re)assoc peer command when NSS changed"
Sahitya Tummala stummala@codeaurora.org jbd2: fix use after free in kjournald2()
Felix Fietkau nbd@nbd.name ath9k_hw: check if the chip failed to wake up
Dmitry Torokhov dmitry.torokhov@gmail.com Input: drv260x - fix initializing overdrive voltage
Grant Grundler grundler@chromium.org r8152: add Linksys USB3GIGV1 id
Chen Feng puck.chen@hisilicon.com staging: ion : Donnot wakeup kswapd in ion system alloc
Jiri Olsa jolsa@kernel.org perf: Return proper values for user stack errors
Xiaoming Gao gxm.linux.kernel@gmail.com x86/tsc: Prevent 32bit truncation in calc_hpet_ref()
Steve French smfrench@gmail.com cifs: do not allow creating sockets except with SMB1 posix exensions
-------------
Diffstat:
Documentation/kernel-parameters.txt | 3 + Makefile | 4 +- arch/s390/Kconfig | 47 +++++ arch/s390/Makefile | 10 ++ arch/s390/include/asm/alternative.h | 149 +++++++++++++++ arch/s390/include/asm/barrier.h | 24 +++ arch/s390/include/asm/facility.h | 18 ++ arch/s390/include/asm/kvm_host.h | 3 +- arch/s390/include/asm/lowcore.h | 7 +- arch/s390/include/asm/nospec-branch.h | 17 ++ arch/s390/include/asm/processor.h | 4 + arch/s390/include/asm/thread_info.h | 4 + arch/s390/include/uapi/asm/kvm.h | 3 + arch/s390/kernel/Makefile | 5 +- arch/s390/kernel/alternative.c | 112 ++++++++++++ arch/s390/kernel/early.c | 5 + arch/s390/kernel/entry.S | 250 +++++++++++++++++++++++--- arch/s390/kernel/ipl.c | 1 + arch/s390/kernel/module.c | 65 ++++++- arch/s390/kernel/nospec-branch.c | 169 +++++++++++++++++ arch/s390/kernel/processor.c | 18 ++ arch/s390/kernel/setup.c | 14 +- arch/s390/kernel/smp.c | 7 +- arch/s390/kernel/uprobes.c | 9 + arch/s390/kernel/vmlinux.lds.S | 37 ++++ arch/s390/kvm/kvm-s390.c | 13 +- arch/x86/kernel/tsc.c | 2 +- drivers/cdrom/cdrom.c | 2 +- drivers/input/misc/drv260x.c | 2 +- drivers/message/fusion/mptsas.c | 1 + drivers/net/bonding/bond_main.c | 3 +- drivers/net/ppp/pppoe.c | 4 + drivers/net/team/team.c | 38 +++- drivers/net/usb/cdc_ether.c | 10 ++ drivers/net/usb/r8152.c | 2 + drivers/net/wireless/ath/ath10k/mac.c | 5 +- drivers/net/wireless/ath/ath9k/hw.c | 4 + drivers/s390/char/Makefile | 2 + drivers/s390/cio/chsc.c | 14 +- drivers/staging/android/ion/ion_system_heap.c | 2 +- fs/cifs/dir.c | 9 +- fs/jbd2/journal.c | 2 +- include/linux/if_vlan.h | 7 +- include/net/llc_conn.h | 1 + include/uapi/linux/kvm.h | 1 + kernel/events/core.c | 4 +- net/core/dev.c | 2 +- net/core/neighbour.c | 40 +++-- net/dns_resolver/dns_key.c | 13 +- net/ipv4/tcp.c | 6 +- net/ipv4/tcp_input.c | 7 +- net/ipv6/route.c | 2 + net/l2tp/l2tp_ppp.c | 7 + net/llc/af_llc.c | 14 +- net/llc/llc_c_ac.c | 9 +- net/llc/llc_conn.c | 22 ++- net/packet/af_packet.c | 88 ++++++--- net/packet/internal.h | 10 +- net/sctp/ipv6.c | 60 +++---- net/tipc/net.c | 3 +- 60 files changed, 1228 insertions(+), 168 deletions(-)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steve French smfrench@gmail.com
commit 1d0cffa674cfa7d185a302c8c6850fc50b893bed upstream.
RHBZ: 1453123
Since at least the 3.10 kernel and likely a lot earlier we have not been able to create unix domain sockets in a cifs share when mounted using the SFU mount option (except when mounted with the cifs unix extensions to Samba e.g.) Trying to create a socket, for example using the af_unix command from xfstests will cause : BUG: unable to handle kernel NULL pointer dereference at 00000000 00000040
Since no one uses or depends on being able to create unix domains sockets on a cifs share the easiest fix to stop this vulnerability is to simply not allow creation of any other special files than char or block devices when sfu is used.
Added update to Ronnie's patch to handle a tcon link leak, and to address a buf leak noticed by Gustavo and Colin.
Acked-by: Gustavo A. R. Silva gustavo@embeddedor.com CC: Colin Ian King colin.king@canonical.com Reviewed-by: Pavel Shilovsky pshilov@microsoft.com Reported-by: Eryu Guan eguan@redhat.com Signed-off-by: Ronnie Sahlberg lsahlber@redhat.com Signed-off-by: Steve French smfrench@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/dir.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/fs/cifs/dir.c +++ b/fs/cifs/dir.c @@ -673,6 +673,9 @@ int cifs_mknod(struct inode *inode, stru goto mknod_out; }
+ if (!S_ISCHR(mode) && !S_ISBLK(mode)) + goto mknod_out; + if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_UNX_EMUL)) goto mknod_out;
@@ -681,10 +684,8 @@ int cifs_mknod(struct inode *inode, stru
buf = kmalloc(sizeof(FILE_ALL_INFO), GFP_KERNEL); if (buf == NULL) { - kfree(full_path); rc = -ENOMEM; - free_xid(xid); - return rc; + goto mknod_out; }
if (backup_cred(cifs_sb)) @@ -731,7 +732,7 @@ int cifs_mknod(struct inode *inode, stru pdev->minor = cpu_to_le64(MINOR(device_number)); rc = tcon->ses->server->ops->sync_write(xid, &fid, &io_parms, &bytes_written, iov, 1); - } /* else if (S_ISFIFO) */ + } tcon->ses->server->ops->close(xid, tcon, &fid); d_drop(direntry);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiaoming Gao gxm.linux.kernel@gmail.com
commit d3878e164dcd3925a237a20e879432400e369172 upstream.
The TSC calibration code uses HPET as reference. The conversion normalizes the delta of two HPET timestamps:
hpetref = ((tshpet1 - tshpet2) * HPET_PERIOD) / 1e6
and then divides the normalized delta of the corresponding TSC timestamps by the result to calulate the TSC frequency.
tscfreq = ((tstsc1 - tstsc2 ) * 1e6) / hpetref
This uses do_div() which takes an u32 as the divisor, which worked so far because the HPET frequency was low enough that 'hpetref' never exceeded 32bit.
On Skylake machines the HPET frequency increased so 'hpetref' can exceed 32bit. do_div() truncates the divisor, which causes the calibration to fail.
Use div64_u64() to avoid the problem.
[ tglx: Fixes whitespace mangled patch and rewrote changelog ]
Signed-off-by: Xiaoming Gao newtongao@tencent.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Cc: peterz@infradead.org Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/38894564-4fc9-b8ec-353f-de702839e44e@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/tsc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -408,7 +408,7 @@ static unsigned long calc_hpet_ref(u64 d hpet2 -= hpet1; tmp = ((u64)hpet2 * hpet_readl(HPET_PERIOD)); do_div(tmp, 1000000); - do_div(deltatsc, tmp); + deltatsc = div64_u64(deltatsc, tmp);
return (unsigned long) deltatsc; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Olsa jolsa@kernel.org
commit 78b562fbfa2cf0a9fcb23c3154756b690f4905c1 upstream.
Return immediately when we find issue in the user stack checks. The error value could get overwritten by following check for PERF_SAMPLE_REGS_INTR.
Signed-off-by: Jiri Olsa jolsa@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Andi Kleen andi@firstfloor.org Cc: H. Peter Anvin hpa@zytor.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: syzkaller-bugs@googlegroups.com Cc: x86@kernel.org Fixes: 60e2364e60e8 ("perf: Add ability to sample machine state on interrupt") Link: http://lkml.kernel.org/r/20180415092352.12403-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/events/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -8133,9 +8133,9 @@ static int perf_copy_attr(struct perf_ev * __u16 sample size limit. */ if (attr->sample_stack_user >= USHRT_MAX) - ret = -EINVAL; + return -EINVAL; else if (!IS_ALIGNED(attr->sample_stack_user, sizeof(u64))) - ret = -EINVAL; + return -EINVAL; }
if (attr->sample_type & PERF_SAMPLE_REGS_INTR)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen Feng puck.chen@hisilicon.com
commit 2ef230531ee171a475fc3ddad5516dd7e09a8a77 upstream.
Since ion alloc can be called by userspace,eg gralloc. When it is called frequently, the efficiency of kswapd is to low. And the reclaimed memory is too lower. In this way, the kswapd can use to much cpu resources.
With 3.5GB DMA Zone and 0.5 Normal Zone.
pgsteal_kswapd_dma 9364140 pgsteal_kswapd_normal 7071043 pgscan_kswapd_dma 10428250 pgscan_kswapd_normal 37840094
With this change the reclaim ratio has greatly improved 18.9% -> 72.5%
Signed-off-by: Chen Feng puck.chen@hisilicon.com Signed-off-by: Lu bing albert.lubing@hisilicon.com Reviewed-by: Laura Abbott labbott@redhat.com Cc: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/staging/android/ion/ion_system_heap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/staging/android/ion/ion_system_heap.c +++ b/drivers/staging/android/ion/ion_system_heap.c @@ -27,7 +27,7 @@ #include "ion_priv.h"
static gfp_t high_order_gfp_flags = (GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN | - __GFP_NORETRY) & ~__GFP_DIRECT_RECLAIM; + __GFP_NORETRY) & ~__GFP_RECLAIM; static gfp_t low_order_gfp_flags = (GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN); static const unsigned int orders[] = {8, 4, 0}; static const int num_orders = ARRAY_SIZE(orders);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Grant Grundler grundler@chromium.org
commit 90841047a01b452cc8c3f9b990698b264143334a upstream.
This linksys dongle by default comes up in cdc_ether mode. This patch allows r8152 to claim the device: Bus 002 Device 002: ID 13b1:0041 Linksys
Signed-off-by: Grant Grundler grundler@chromium.org Reviewed-by: Douglas Anderson dianders@chromium.org Signed-off-by: David S. Miller davem@davemloft.net [krzk: Rebase on v4.4] Signed-off-by: Krzysztof Kozlowski krzk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/cdc_ether.c | 10 ++++++++++ drivers/net/usb/r8152.c | 2 ++ 2 files changed, 12 insertions(+)
--- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -461,6 +461,7 @@ static const struct driver_info wwan_inf #define REALTEK_VENDOR_ID 0x0bda #define SAMSUNG_VENDOR_ID 0x04e8 #define LENOVO_VENDOR_ID 0x17ef +#define LINKSYS_VENDOR_ID 0x13b1 #define NVIDIA_VENDOR_ID 0x0955 #define HP_VENDOR_ID 0x03f0
@@ -650,6 +651,15 @@ static const struct usb_device_id produc .driver_info = 0, },
+#if IS_ENABLED(CONFIG_USB_RTL8152) +/* Linksys USB3GIGV1 Ethernet Adapter */ +{ + USB_DEVICE_AND_INTERFACE_INFO(LINKSYS_VENDOR_ID, 0x0041, USB_CLASS_COMM, + USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), + .driver_info = 0, +}, +#endif + /* Lenovo Thinkpad USB 3.0 Ethernet Adapters (based on Realtek RTL8153) */ { USB_DEVICE_AND_INTERFACE_INFO(LENOVO_VENDOR_ID, 0x7205, USB_CLASS_COMM, --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -506,6 +506,7 @@ enum rtl8152_flags { #define VENDOR_ID_REALTEK 0x0bda #define VENDOR_ID_SAMSUNG 0x04e8 #define VENDOR_ID_LENOVO 0x17ef +#define VENDOR_ID_LINKSYS 0x13b1 #define VENDOR_ID_NVIDIA 0x0955
#define MCU_TYPE_PLA 0x0100 @@ -4376,6 +4377,7 @@ static struct usb_device_id rtl8152_tabl {REALTEK_USB_DEVICE(VENDOR_ID_SAMSUNG, 0xa101)}, {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO, 0x7205)}, {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO, 0x304f)}, + {REALTEK_USB_DEVICE(VENDOR_ID_LINKSYS, 0x0041)}, {REALTEK_USB_DEVICE(VENDOR_ID_NVIDIA, 0x09ff)}, {} };
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Torokhov dmitry.torokhov@gmail.com
commit 74c82dae6c474933f2be401976e1530b5f623221 upstream.
We were accidentally initializing haptics->rated_voltage twice, and did not initialize overdrive voltage.
Acked-by: Dan Murphy dmurphy@ti.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Cc: Amit Pundir amit.pundir@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/misc/drv260x.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/input/misc/drv260x.c +++ b/drivers/input/misc/drv260x.c @@ -521,7 +521,7 @@ static int drv260x_probe(struct i2c_clie if (!haptics) return -ENOMEM;
- haptics->rated_voltage = DRV260X_DEF_OD_CLAMP_VOLT; + haptics->overdrive_voltage = DRV260X_DEF_OD_CLAMP_VOLT; haptics->rated_voltage = DRV260X_DEF_RATED_VOLT;
if (pdata) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
commit a34d0a0da1abae46a5f6ebd06fb0ec484ca099d9 upstream.
In an RFC patch, Sven Eckelmann and Simon Wunderlich reported:
"QCA 802.11n chips (especially AR9330/AR9340) sometimes end up in a state in which a read of AR_CFG always returns 0xdeadbeef. This should not happen when when the power_mode of the device is ATH9K_PM_AWAKE."
Include the check for the default register state in the existing MAC hang check.
Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Kalle Valo kvalo@qca.qualcomm.com Cc: Amit Pundir amit.pundir@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/ath/ath9k/hw.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/net/wireless/ath/ath9k/hw.c +++ b/drivers/net/wireless/ath/ath9k/hw.c @@ -1595,6 +1595,10 @@ bool ath9k_hw_check_alive(struct ath_hw int count = 50; u32 reg, last_val;
+ /* Check if chip failed to wake up */ + if (REG_READ(ah, AR_CFG) == 0xdeadbeef) + return false; + if (AR_SREV_9300(ah)) return !ath9k_hw_detect_mac_hang(ah);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sahitya Tummala stummala@codeaurora.org
commit dbfcef6b0f4012c57bc0b6e0e660d5ed12a5eaed upstream.
Below is the synchronization issue between unmount and kjournald2 contexts, which results into use after free issue in kjournald2(). Fix this issue by using journal->j_state_lock to synchronize the wait_event() done in journal_kill_thread() and the wake_up() done in kjournald2().
TASK 1: umount cmd: |--jbd2_journal_destroy() { |--journal_kill_thread() { write_lock(&journal->j_state_lock); journal->j_flags |= JBD2_UNMOUNT; ... write_unlock(&journal->j_state_lock); wake_up(&journal->j_wait_commit); TASK 2 wakes up here: kjournald2() { ... checks JBD2_UNMOUNT flag and calls goto end-loop; ... end_loop: write_unlock(&journal->j_state_lock); journal->j_task = NULL; --> If this thread gets pre-empted here, then TASK 1 wait_event will exit even before this thread is completely done. wait_event(journal->j_wait_done_commit, journal->j_task == NULL); ... write_lock(&journal->j_state_lock); write_unlock(&journal->j_state_lock); } |--kfree(journal); } } wake_up(&journal->j_wait_done_commit); --> this step now results into use after free issue. }
Signed-off-by: Sahitya Tummala stummala@codeaurora.org Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: Amit Pundir amit.pundir@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/jbd2/journal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -275,11 +275,11 @@ loop: goto loop;
end_loop: - write_unlock(&journal->j_state_lock); del_timer_sync(&journal->j_commit_timer); journal->j_task = NULL; wake_up(&journal->j_wait_done_commit); jbd_debug(1, "Journal thread exiting.\n"); + write_unlock(&journal->j_state_lock); return 0; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Karthikeyan Periyasamy periyasa@codeaurora.org
commit 55cc11da69895a680940c1733caabc37be685f5e upstream.
This reverts commit 55884c045d31a29cf69db8332d1064a1b61dd159.
When Ath10k is in AP mode and an unassociated STA sends a VHT action frame (Operating Mode Notification for the NSS change) periodically to AP this causes ath10k to call ath10k_station_assoc() which sends WMI_PEER_ASSOC_CMDID during NSS update. Over the time (with a certain client it can happen within 15 mins when there are over 500 of these VHT action frames) continuous calls of WMI_PEER_ASSOC_CMDID cause firmware to assert due to resource exhaust.
To my knowledge setting WMI_PEER_NSS peer param itself enough to handle NSS updates and no need to call ath10k_station_assoc(). So revert the original commit from 2014 as it's unclear why the change was really needed. Now the firmware assert doesn't happen anymore.
Issue observed in QCA9984 platform with firmware version:10.4-3.5.3-00053. This Change tested in QCA9984 with firmware version: 10.4-3.5.3-00053 and QCA988x platform with firmware version: 10.2.4-1.0-00036.
Firmware Assert log:
ath10k_pci 0002:01:00.0: firmware crashed! (guid e61f1274-9acd-4c5b-bcca-e032ea6e723c) ath10k_pci 0002:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe ath10k_pci 0002:01:00.0: kconfig debug 1 debugfs 1 tracing 0 dfs 1 testmode 1 ath10k_pci 0002:01:00.0: firmware ver 10.4-3.5.3-00053 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast crc32 4c56a386 ath10k_pci 0002:01:00.0: board_file api 2 bmi_id 0:4 crc32 c2271344 ath10k_pci 0002:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal otp max-sta 512 raw 0 hwcrypto 1 ath10k_pci 0002:01:00.0: firmware register dump: ath10k_pci 0002:01:00.0: [00]: 0x0000000A 0x000015B3 0x00981E5F 0x00975B31 ath10k_pci 0002:01:00.0: [04]: 0x00981E5F 0x00060530 0x00000011 0x00446C60 ath10k_pci 0002:01:00.0: [08]: 0x0042F1FC 0x00458080 0x00000017 0x00000000 ath10k_pci 0002:01:00.0: [12]: 0x00000009 0x00000000 0x00973ABC 0x00973AD2 ath10k_pci 0002:01:00.0: [16]: 0x00973AB0 0x00960E62 0x009606CA 0x00000000 ath10k_pci 0002:01:00.0: [20]: 0x40981E5F 0x004066DC 0x00400000 0x00981E34 ath10k_pci 0002:01:00.0: [24]: 0x80983B48 0x0040673C 0x000000C0 0xC0981E5F ath10k_pci 0002:01:00.0: [28]: 0x80993DEB 0x0040676C 0x00431AB8 0x0045D0C4 ath10k_pci 0002:01:00.0: [32]: 0x80993E5C 0x004067AC 0x004303C0 0x0045D0C4 ath10k_pci 0002:01:00.0: [36]: 0x80994AAB 0x004067DC 0x00000000 0x0045D0C4 ath10k_pci 0002:01:00.0: [40]: 0x809971A0 0x0040681C 0x004303C0 0x00441B00 ath10k_pci 0002:01:00.0: [44]: 0x80991904 0x0040688C 0x004303C0 0x0045D0C4 ath10k_pci 0002:01:00.0: [48]: 0x80963AD3 0x00406A7C 0x004303C0 0x009918FC ath10k_pci 0002:01:00.0: [52]: 0x80960E80 0x00406A9C 0x0000001F 0x00400000 ath10k_pci 0002:01:00.0: [56]: 0x80960E51 0x00406ACC 0x00400000 0x00000000 ath10k_pci 0002:01:00.0: Copy Engine register dump: ath10k_pci 0002:01:00.0: index: addr: sr_wr_idx: sr_r_idx: dst_wr_idx: dst_r_idx: ath10k_pci 0002:01:00.0: [00]: 0x0004a000 15 15 3 3 ath10k_pci 0002:01:00.0: [01]: 0x0004a400 17 17 212 213 ath10k_pci 0002:01:00.0: [02]: 0x0004a800 21 21 20 21 ath10k_pci 0002:01:00.0: [03]: 0x0004ac00 25 25 27 25 ath10k_pci 0002:01:00.0: [04]: 0x0004b000 515 515 144 104 ath10k_pci 0002:01:00.0: [05]: 0x0004b400 28 28 155 156 ath10k_pci 0002:01:00.0: [06]: 0x0004b800 12 12 12 12 ath10k_pci 0002:01:00.0: [07]: 0x0004bc00 1 1 1 1 ath10k_pci 0002:01:00.0: [08]: 0x0004c000 0 0 127 0 ath10k_pci 0002:01:00.0: [09]: 0x0004c400 1 1 1 1 ath10k_pci 0002:01:00.0: [10]: 0x0004c800 0 0 0 0 ath10k_pci 0002:01:00.0: [11]: 0x0004cc00 0 0 0 0 ath10k_pci 0002:01:00.0: CE[1] write_index 212 sw_index 213 hw_index 0 nentries_mask 0x000001ff ath10k_pci 0002:01:00.0: CE[2] write_index 20 sw_index 21 hw_index 0 nentries_mask 0x0000007f ath10k_pci 0002:01:00.0: CE[5] write_index 155 sw_index 156 hw_index 0 nentries_mask 0x000001ff ath10k_pci 0002:01:00.0: DMA addr: nbytes: meta data: byte swap: gather: ath10k_pci 0002:01:00.0: [455]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [456]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [457]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [458]: 0x594a0038 0 0 0 1 ath10k_pci 0002:01:00.0: [459]: 0x580c0a42 0 0 0 0 ath10k_pci 0002:01:00.0: [460]: 0x594a0060 0 0 0 1 ath10k_pci 0002:01:00.0: [461]: 0x580c0c42 0 0 0 0 ath10k_pci 0002:01:00.0: [462]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [463]: 0x580c0c42 0 0 0 0 ath10k_pci 0002:01:00.0: [464]: 0x594a0038 0 0 0 1 ath10k_pci 0002:01:00.0: [465]: 0x580c0a42 0 0 0 0 ath10k_pci 0002:01:00.0: [466]: 0x594a0060 0 0 0 1 ath10k_pci 0002:01:00.0: [467]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [468]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [469]: 0x580c1c42 0 0 0 0 ath10k_pci 0002:01:00.0: [470]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [471]: 0x580c1c42 0 0 0 0 ath10k_pci 0002:01:00.0: [472]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [473]: 0x580c1c42 0 0 0 0 ath10k_pci 0002:01:00.0: [474]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [475]: 0x580c0642 0 0 0 0 ath10k_pci 0002:01:00.0: [476]: 0x594a0038 0 0 0 1 ath10k_pci 0002:01:00.0: [477]: 0x580c0842 0 0 0 0 ath10k_pci 0002:01:00.0: [478]: 0x594a0060 0 0 0 1 ath10k_pci 0002:01:00.0: [479]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [480]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [481]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [482]: 0x594a0038 0 0 0 1 ath10k_pci 0002:01:00.0: [483]: 0x580c0842 0 0 0 0 ath10k_pci 0002:01:00.0: [484]: 0x594a0060 0 0 0 1 ath10k_pci 0002:01:00.0: [485]: 0x580c0642 0 0 0 0 ath10k_pci 0002:01:00.0: [486]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [487]: 0x580c0642 0 0 0 0 ath10k_pci 0002:01:00.0: [488]: 0x594a0038 0 0 0 1 ath10k_pci 0002:01:00.0: [489]: 0x580c0842 0 0 0 0 ath10k_pci 0002:01:00.0: [490]: 0x594a0060 0 0 0 1 ath10k_pci 0002:01:00.0: [491]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [492]: 0x58174040 0 1 0 0 ath10k_pci 0002:01:00.0: [493]: 0x5a946040 0 1 0 0 ath10k_pci 0002:01:00.0: [494]: 0x59909040 0 1 0 0 ath10k_pci 0002:01:00.0: [495]: 0x5ae5a040 0 1 0 0 ath10k_pci 0002:01:00.0: [496]: 0x58096040 0 1 0 0 ath10k_pci 0002:01:00.0: [497]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [498]: 0x580c0642 0 0 0 0 ath10k_pci 0002:01:00.0: [499]: 0x5c1e0040 0 1 0 0 ath10k_pci 0002:01:00.0: [500]: 0x58153040 0 1 0 0 ath10k_pci 0002:01:00.0: [501]: 0x58129040 0 1 0 0 ath10k_pci 0002:01:00.0: [502]: 0x5952f040 0 1 0 0 ath10k_pci 0002:01:00.0: [503]: 0x59535040 0 1 0 0 ath10k_pci 0002:01:00.0: [504]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [505]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [506]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [507]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [508]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [509]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [510]: 0x594a0010 0 0 0 1 ath10k_pci 0002:01:00.0: [511]: 0x580c0042 0 0 0 0 ath10k_pci 0002:01:00.0: [512]: 0x5adcc040 0 1 0 0 ath10k_pci 0002:01:00.0: [513]: 0x5cf3d040 0 1 0 0 ath10k_pci 0002:01:00.0: [514]: 0x5c1e9040 64 1 0 0 ath10k_pci 0002:01:00.0: [515]: 0x00000000 0 0 0 0
Signed-off-by: Karthikeyan Periyasamy periyasa@codeaurora.org Signed-off-by: Kalle Valo kvalo@codeaurora.org Cc: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/ath/ath10k/mac.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/drivers/net/wireless/ath/ath10k/mac.c +++ b/drivers/net/wireless/ath/ath10k/mac.c @@ -5285,9 +5285,8 @@ static void ath10k_sta_rc_update_wk(stru sta->addr, smps, err); }
- if (changed & IEEE80211_RC_SUPP_RATES_CHANGED || - changed & IEEE80211_RC_NSS_CHANGED) { - ath10k_dbg(ar, ATH10K_DBG_MAC, "mac update sta %pM supp rates/nss\n", + if (changed & IEEE80211_RC_SUPP_RATES_CHANGED) { + ath10k_dbg(ar, ATH10K_DBG_MAC, "mac update sta %pM supp rates\n", sta->addr);
err = ath10k_station_assoc(ar, arvif->vif, sta, true);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
From: Vasily Gorbik gor@linux.vnet.ibm.com
[ Upstream commit 686140a1a9c41d85a4212a1c26d671139b76404b ]
Implement CPU alternatives, which allows to optionally patch newer instructions at runtime, based on CPU facilities availability.
A new kernel boot parameter "noaltinstr" disables patching.
Current implementation is derived from x86 alternatives. Although ideal instructions padding (when altinstr is longer then oldinstr) is added at compile time, and no oldinstr nops optimization has to be done at runtime. Also couple of compile time sanity checks are done: 1. oldinstr and altinstr must be <= 254 bytes long, 2. oldinstr and altinstr must not have an odd length.
alternative(oldinstr, altinstr, facility); alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2);
Both compile time and runtime padding consists of either 6/4/2 bytes nop or a jump (brcl) + 2 bytes nop filler if padding is longer then 6 bytes.
.altinstructions and .altinstr_replacement sections are part of __init_begin : __init_end region and are freed after initialization.
Signed-off-by: Vasily Gorbik gor@linux.vnet.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/kernel-parameters.txt | 3 arch/s390/Kconfig | 17 +++ arch/s390/include/asm/alternative.h | 163 ++++++++++++++++++++++++++++++++++++ arch/s390/kernel/Makefile | 1 arch/s390/kernel/alternative.c | 110 ++++++++++++++++++++++++ arch/s390/kernel/module.c | 17 +++ arch/s390/kernel/setup.c | 3 arch/s390/kernel/vmlinux.lds.S | 23 +++++ 8 files changed, 337 insertions(+) create mode 100644 arch/s390/include/asm/alternative.h create mode 100644 arch/s390/kernel/alternative.c
--- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2402,6 +2402,9 @@ bytes respectively. Such letter suffixes
noalign [KNL,ARM]
+ noaltinstr [S390] Disables alternative instructions patching + (CPU alternatives feature). + noapic [SMP,APIC] Tells the kernel to not make use of any IOAPICs that may be present in the system.
--- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -705,6 +705,22 @@ config SECCOMP
If unsure, say Y.
+config ALTERNATIVES + def_bool y + prompt "Patch optimized instructions for running CPU type" + help + When enabled the kernel code is compiled with additional + alternative instructions blocks optimized for newer CPU types. + These alternative instructions blocks are patched at kernel boot + time when running CPU supports them. This mechanism is used to + optimize some critical code paths (i.e. spinlocks) for newer CPUs + even if kernel is build to support older machine generations. + + This mechanism could be disabled by appending "noaltinstr" + option to the kernel command line. + + If unsure, say Y. + endmenu
menu "Power Management" @@ -754,6 +770,7 @@ config PFAULT config SHARED_KERNEL bool "VM shared kernel support" depends on !JUMP_LABEL + depends on !ALTERNATIVES help Select this option, if you want to share the text segment of the Linux kernel between different VM guests. This reduces memory --- /dev/null +++ b/arch/s390/include/asm/alternative.h @@ -0,0 +1,163 @@ +#ifndef _ASM_S390_ALTERNATIVE_H +#define _ASM_S390_ALTERNATIVE_H + +#ifndef __ASSEMBLY__ + +#include <linux/types.h> +#include <linux/stddef.h> +#include <linux/stringify.h> + +struct alt_instr { + s32 instr_offset; /* original instruction */ + s32 repl_offset; /* offset to replacement instruction */ + u16 facility; /* facility bit set for replacement */ + u8 instrlen; /* length of original instruction */ + u8 replacementlen; /* length of new instruction */ +} __packed; + +#ifdef CONFIG_ALTERNATIVES +extern void apply_alternative_instructions(void); +extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end); +#else +static inline void apply_alternative_instructions(void) {}; +static inline void apply_alternatives(struct alt_instr *start, + struct alt_instr *end) {}; +#endif +/* + * |661: |662: |6620 |663: + * +-----------+---------------------+ + * | oldinstr | oldinstr_padding | + * | +----------+----------+ + * | | | | + * | | >6 bytes |6/4/2 nops| + * | |6 bytes jg-----------> + * +-----------+---------------------+ + * ^^ static padding ^^ + * + * .altinstr_replacement section + * +---------------------+-----------+ + * |6641: |6651: + * | alternative instr 1 | + * +-----------+---------+- - - - - -+ + * |6642: |6652: | + * | alternative instr 2 | padding + * +---------------------+- - - - - -+ + * ^ runtime ^ + * + * .altinstructions section + * +---------------------------------+ + * | alt_instr entries for each | + * | alternative instr | + * +---------------------------------+ + */ + +#define b_altinstr(num) "664"#num +#define e_altinstr(num) "665"#num + +#define e_oldinstr_pad_end "663" +#define oldinstr_len "662b-661b" +#define oldinstr_total_len e_oldinstr_pad_end"b-661b" +#define altinstr_len(num) e_altinstr(num)"b-"b_altinstr(num)"b" +#define oldinstr_pad_len(num) \ + "-(((" altinstr_len(num) ")-(" oldinstr_len ")) > 0) * " \ + "((" altinstr_len(num) ")-(" oldinstr_len "))" + +#define INSTR_LEN_SANITY_CHECK(len) \ + ".if " len " > 254\n" \ + "\t.error "cpu alternatives does not support instructions " \ + "blocks > 254 bytes"\n" \ + ".endif\n" \ + ".if (" len ") %% 2\n" \ + "\t.error "cpu alternatives instructions length is odd"\n" \ + ".endif\n" + +#define OLDINSTR_PADDING(oldinstr, num) \ + ".if " oldinstr_pad_len(num) " > 6\n" \ + "\tjg " e_oldinstr_pad_end "f\n" \ + "6620:\n" \ + "\t.fill (" oldinstr_pad_len(num) " - (6620b-662b)) / 2, 2, 0x0700\n" \ + ".else\n" \ + "\t.fill " oldinstr_pad_len(num) " / 6, 6, 0xc0040000\n" \ + "\t.fill " oldinstr_pad_len(num) " %% 6 / 4, 4, 0x47000000\n" \ + "\t.fill " oldinstr_pad_len(num) " %% 6 %% 4 / 2, 2, 0x0700\n" \ + ".endif\n" + +#define OLDINSTR(oldinstr, num) \ + "661:\n\t" oldinstr "\n662:\n" \ + OLDINSTR_PADDING(oldinstr, num) \ + e_oldinstr_pad_end ":\n" \ + INSTR_LEN_SANITY_CHECK(oldinstr_len) + +#define OLDINSTR_2(oldinstr, num1, num2) \ + "661:\n\t" oldinstr "\n662:\n" \ + ".if " altinstr_len(num1) " < " altinstr_len(num2) "\n" \ + OLDINSTR_PADDING(oldinstr, num2) \ + ".else\n" \ + OLDINSTR_PADDING(oldinstr, num1) \ + ".endif\n" \ + e_oldinstr_pad_end ":\n" \ + INSTR_LEN_SANITY_CHECK(oldinstr_len) + +#define ALTINSTR_ENTRY(facility, num) \ + "\t.long 661b - .\n" /* old instruction */ \ + "\t.long " b_altinstr(num)"b - .\n" /* alt instruction */ \ + "\t.word " __stringify(facility) "\n" /* facility bit */ \ + "\t.byte " oldinstr_total_len "\n" /* source len */ \ + "\t.byte " altinstr_len(num) "\n" /* alt instruction len */ + +#define ALTINSTR_REPLACEMENT(altinstr, num) /* replacement */ \ + b_altinstr(num)":\n\t" altinstr "\n" e_altinstr(num) ":\n" \ + INSTR_LEN_SANITY_CHECK(altinstr_len(num)) + +#ifdef CONFIG_ALTERNATIVES +/* alternative assembly primitive: */ +#define ALTERNATIVE(oldinstr, altinstr, facility) \ + ".pushsection .altinstr_replacement, "ax"\n" \ + ALTINSTR_REPLACEMENT(altinstr, 1) \ + ".popsection\n" \ + OLDINSTR(oldinstr, 1) \ + ".pushsection .altinstructions,"a"\n" \ + ALTINSTR_ENTRY(facility, 1) \ + ".popsection\n" + +#define ALTERNATIVE_2(oldinstr, altinstr1, facility1, altinstr2, facility2)\ + ".pushsection .altinstr_replacement, "ax"\n" \ + ALTINSTR_REPLACEMENT(altinstr1, 1) \ + ALTINSTR_REPLACEMENT(altinstr2, 2) \ + ".popsection\n" \ + OLDINSTR_2(oldinstr, 1, 2) \ + ".pushsection .altinstructions,"a"\n" \ + ALTINSTR_ENTRY(facility1, 1) \ + ALTINSTR_ENTRY(facility2, 2) \ + ".popsection\n" +#else +/* Alternative instructions are disabled, let's put just oldinstr in */ +#define ALTERNATIVE(oldinstr, altinstr, facility) \ + oldinstr "\n" + +#define ALTERNATIVE_2(oldinstr, altinstr1, facility1, altinstr2, facility2) \ + oldinstr "\n" +#endif + +/* + * Alternative instructions for different CPU types or capabilities. + * + * This allows to use optimized instructions even on generic binary + * kernels. + * + * oldinstr is padded with jump and nops at compile time if altinstr is + * longer. altinstr is padded with jump and nops at run-time during patching. + * + * For non barrier like inlines please define new variants + * without volatile and memory clobber. + */ +#define alternative(oldinstr, altinstr, facility) \ + asm volatile(ALTERNATIVE(oldinstr, altinstr, facility) : : : "memory") + +#define alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2) \ + asm volatile(ALTERNATIVE_2(oldinstr, altinstr1, facility1, \ + altinstr2, facility2) ::: "memory") + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_S390_ALTERNATIVE_H */ --- a/arch/s390/kernel/Makefile +++ b/arch/s390/kernel/Makefile @@ -62,6 +62,7 @@ obj-$(CONFIG_KPROBES) += kprobes.o obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_UPROBES) += uprobes.o +obj-$(CONFIG_ALTERNATIVES) += alternative.o
obj-$(CONFIG_PERF_EVENTS) += perf_event.o perf_cpum_cf.o perf_cpum_sf.o obj-$(CONFIG_PERF_EVENTS) += perf_cpum_cf_events.o --- /dev/null +++ b/arch/s390/kernel/alternative.c @@ -0,0 +1,110 @@ +#include <linux/module.h> +#include <asm/alternative.h> +#include <asm/facility.h> + +#define MAX_PATCH_LEN (255 - 1) + +static int __initdata_or_module alt_instr_disabled; + +static int __init disable_alternative_instructions(char *str) +{ + alt_instr_disabled = 1; + return 0; +} + +early_param("noaltinstr", disable_alternative_instructions); + +struct brcl_insn { + u16 opc; + s32 disp; +} __packed; + +static u16 __initdata_or_module nop16 = 0x0700; +static u32 __initdata_or_module nop32 = 0x47000000; +static struct brcl_insn __initdata_or_module nop48 = { + 0xc004, 0 +}; + +static const void *nops[] __initdata_or_module = { + &nop16, + &nop32, + &nop48 +}; + +static void __init_or_module add_jump_padding(void *insns, unsigned int len) +{ + struct brcl_insn brcl = { + 0xc0f4, + len / 2 + }; + + memcpy(insns, &brcl, sizeof(brcl)); + insns += sizeof(brcl); + len -= sizeof(brcl); + + while (len > 0) { + memcpy(insns, &nop16, 2); + insns += 2; + len -= 2; + } +} + +static void __init_or_module add_padding(void *insns, unsigned int len) +{ + if (len > 6) + add_jump_padding(insns, len); + else if (len >= 2) + memcpy(insns, nops[len / 2 - 1], len); +} + +static void __init_or_module __apply_alternatives(struct alt_instr *start, + struct alt_instr *end) +{ + struct alt_instr *a; + u8 *instr, *replacement; + u8 insnbuf[MAX_PATCH_LEN]; + + /* + * The scan order should be from start to end. A later scanned + * alternative code can overwrite previously scanned alternative code. + */ + for (a = start; a < end; a++) { + int insnbuf_sz = 0; + + instr = (u8 *)&a->instr_offset + a->instr_offset; + replacement = (u8 *)&a->repl_offset + a->repl_offset; + + if (!test_facility(a->facility)) + continue; + + if (unlikely(a->instrlen % 2 || a->replacementlen % 2)) { + WARN_ONCE(1, "cpu alternatives instructions length is " + "odd, skipping patching\n"); + continue; + } + + memcpy(insnbuf, replacement, a->replacementlen); + insnbuf_sz = a->replacementlen; + + if (a->instrlen > a->replacementlen) { + add_padding(insnbuf + a->replacementlen, + a->instrlen - a->replacementlen); + insnbuf_sz += a->instrlen - a->replacementlen; + } + + s390_kernel_write(instr, insnbuf, insnbuf_sz); + } +} + +void __init_or_module apply_alternatives(struct alt_instr *start, + struct alt_instr *end) +{ + if (!alt_instr_disabled) + __apply_alternatives(start, end); +} + +extern struct alt_instr __alt_instructions[], __alt_instructions_end[]; +void __init apply_alternative_instructions(void) +{ + apply_alternatives(__alt_instructions, __alt_instructions_end); +} --- a/arch/s390/kernel/module.c +++ b/arch/s390/kernel/module.c @@ -31,6 +31,7 @@ #include <linux/kernel.h> #include <linux/moduleloader.h> #include <linux/bug.h> +#include <asm/alternative.h>
#if 0 #define DEBUGP printk @@ -424,6 +425,22 @@ int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs, struct module *me) { + const Elf_Shdr *s; + char *secstrings; + + if (IS_ENABLED(CONFIG_ALTERNATIVES)) { + secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset; + for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { + if (!strcmp(".altinstructions", + secstrings + s->sh_name)) { + /* patch .altinstructions */ + void *aseg = (void *)s->sh_addr; + + apply_alternatives(aseg, aseg + s->sh_size); + } + } + } + jump_label_apply_nops(me); vfree(me->arch.syminfo); me->arch.syminfo = NULL; --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -63,6 +63,7 @@ #include <asm/sclp.h> #include <asm/sysinfo.h> #include <asm/numa.h> +#include <asm/alternative.h> #include "entry.h"
/* @@ -893,6 +894,8 @@ void __init setup_arch(char **cmdline_p) conmode_default(); set_preferred_console();
+ apply_alternative_instructions(); + /* Setup zfcpdump support */ setup_zfcpdump();
--- a/arch/s390/kernel/vmlinux.lds.S +++ b/arch/s390/kernel/vmlinux.lds.S @@ -78,6 +78,29 @@ SECTIONS EXIT_DATA }
+ /* + * struct alt_inst entries. From the header (alternative.h): + * "Alternative instructions for different CPU types or capabilities" + * Think locking instructions on spinlocks. + * Note, that it is a part of __init region. + */ + . = ALIGN(8); + .altinstructions : { + __alt_instructions = .; + *(.altinstructions) + __alt_instructions_end = .; + } + + /* + * And here are the replacement instructions. The linker sticks + * them as binary blobs. The .altinstructions has enough data to + * get the address and the length of them to patch the kernel safely. + * Note, that it is a part of __init region. + */ + .altinstr_replacement : { + *(.altinstr_replacement) + } + /* early.c uses stsi, which requires page aligned data. */ . = ALIGN(PAGE_SIZE); INIT_DATA_SECTION(0x100)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
From: Heiko Carstens heiko.carstens@de.ibm.com
[ Upstream commit 049a2c2d486e8cc82c5cd79fa479c5b105b109e9 ]
Remove the CPU_ALTERNATIVES config option and enable the code unconditionally. The config option was only added to avoid a conflict with the named saved segment support. Since that code is gone there is no reason to keep the CPU_ALTERNATIVES config option.
Just enable it unconditionally to also reduce the number of config options and make it less likely that something breaks.
Signed-off-by: Heiko Carstens heiko.carstens@de.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/Kconfig | 16 ---------------- arch/s390/include/asm/alternative.h | 20 +++----------------- arch/s390/kernel/Makefile | 3 +-- arch/s390/kernel/module.c | 15 ++++++--------- 4 files changed, 10 insertions(+), 44 deletions(-)
--- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -705,22 +705,6 @@ config SECCOMP
If unsure, say Y.
-config ALTERNATIVES - def_bool y - prompt "Patch optimized instructions for running CPU type" - help - When enabled the kernel code is compiled with additional - alternative instructions blocks optimized for newer CPU types. - These alternative instructions blocks are patched at kernel boot - time when running CPU supports them. This mechanism is used to - optimize some critical code paths (i.e. spinlocks) for newer CPUs - even if kernel is build to support older machine generations. - - This mechanism could be disabled by appending "noaltinstr" - option to the kernel command line. - - If unsure, say Y. - endmenu
menu "Power Management" --- a/arch/s390/include/asm/alternative.h +++ b/arch/s390/include/asm/alternative.h @@ -15,14 +15,9 @@ struct alt_instr { u8 replacementlen; /* length of new instruction */ } __packed;
-#ifdef CONFIG_ALTERNATIVES -extern void apply_alternative_instructions(void); -extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end); -#else -static inline void apply_alternative_instructions(void) {}; -static inline void apply_alternatives(struct alt_instr *start, - struct alt_instr *end) {}; -#endif +void apply_alternative_instructions(void); +void apply_alternatives(struct alt_instr *start, struct alt_instr *end); + /* * |661: |662: |6620 |663: * +-----------+---------------------+ @@ -109,7 +104,6 @@ static inline void apply_alternatives(st b_altinstr(num)":\n\t" altinstr "\n" e_altinstr(num) ":\n" \ INSTR_LEN_SANITY_CHECK(altinstr_len(num))
-#ifdef CONFIG_ALTERNATIVES /* alternative assembly primitive: */ #define ALTERNATIVE(oldinstr, altinstr, facility) \ ".pushsection .altinstr_replacement, "ax"\n" \ @@ -130,14 +124,6 @@ static inline void apply_alternatives(st ALTINSTR_ENTRY(facility1, 1) \ ALTINSTR_ENTRY(facility2, 2) \ ".popsection\n" -#else -/* Alternative instructions are disabled, let's put just oldinstr in */ -#define ALTERNATIVE(oldinstr, altinstr, facility) \ - oldinstr "\n" - -#define ALTERNATIVE_2(oldinstr, altinstr1, facility1, altinstr2, facility2) \ - oldinstr "\n" -#endif
/* * Alternative instructions for different CPU types or capabilities. --- a/arch/s390/kernel/Makefile +++ b/arch/s390/kernel/Makefile @@ -44,7 +44,7 @@ obj-y += processor.o sys_s390.o ptrace.o obj-y += debug.o irq.o ipl.o dis.o diag.o sclp.o vdso.o obj-y += sysinfo.o jump_label.o lgr.o os_info.o machine_kexec.o pgm_check.o obj-y += runtime_instr.o cache.o dumpstack.o -obj-y += entry.o reipl.o relocate_kernel.o +obj-y += entry.o reipl.o relocate_kernel.o alternative.o
extra-y += head.o head64.o vmlinux.lds
@@ -62,7 +62,6 @@ obj-$(CONFIG_KPROBES) += kprobes.o obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_UPROBES) += uprobes.o -obj-$(CONFIG_ALTERNATIVES) += alternative.o
obj-$(CONFIG_PERF_EVENTS) += perf_event.o perf_cpum_cf.o perf_cpum_sf.o obj-$(CONFIG_PERF_EVENTS) += perf_cpum_cf_events.o --- a/arch/s390/kernel/module.c +++ b/arch/s390/kernel/module.c @@ -428,16 +428,13 @@ int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *s; char *secstrings;
- if (IS_ENABLED(CONFIG_ALTERNATIVES)) { - secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset; - for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { - if (!strcmp(".altinstructions", - secstrings + s->sh_name)) { - /* patch .altinstructions */ - void *aseg = (void *)s->sh_addr; + secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset; + for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { + if (!strcmp(".altinstructions", secstrings + s->sh_name)) { + /* patch .altinstructions */ + void *aseg = (void *)s->sh_addr;
- apply_alternatives(aseg, aseg + s->sh_size); - } + apply_alternatives(aseg, aseg + s->sh_size); } }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 7041d28115e91f2144f811ffe8a195c696b1e1d0 ]
Clear all user space registers on entry to the kernel and all KVM guest registers on KVM guest exit if the register does not contain either a parameter or a result value.
Reviewed-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/entry.S | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+)
--- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -244,6 +244,12 @@ ENTRY(sie64a) sie_exit: lg %r14,__SF_EMPTY+8(%r15) # load guest register save area stmg %r0,%r13,0(%r14) # save guest gprs 0-13 + xgr %r0,%r0 # clear guest registers to + xgr %r1,%r1 # prevent speculative use + xgr %r2,%r2 + xgr %r3,%r3 + xgr %r4,%r4 + xgr %r5,%r5 lmg %r6,%r14,__SF_GPRS(%r15) # restore kernel registers lg %r2,__SF_EMPTY+16(%r15) # return exit reason code br %r14 @@ -277,6 +283,8 @@ ENTRY(system_call) .Lsysc_vtime: UPDATE_VTIME %r10,%r13,__LC_SYNC_ENTER_TIMER stmg %r0,%r7,__PT_R0(%r11) + # clear user controlled register to prevent speculative use + xgr %r0,%r0 mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC @@ -504,6 +512,15 @@ ENTRY(pgm_check_handler) mvc __THREAD_trap_tdb(256,%r14),0(%r13) 3: la %r11,STACK_FRAME_OVERHEAD(%r15) stmg %r0,%r7,__PT_R0(%r11) + # clear user controlled registers to prevent speculative use + xgr %r0,%r0 + xgr %r1,%r1 + xgr %r2,%r2 + xgr %r3,%r3 + xgr %r4,%r4 + xgr %r5,%r5 + xgr %r6,%r6 + xgr %r7,%r7 mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC stmg %r8,%r9,__PT_PSW(%r11) mvc __PT_INT_CODE(4,%r11),__LC_PGM_ILC @@ -567,6 +584,16 @@ ENTRY(io_int_handler) lmg %r8,%r9,__LC_IO_OLD_PSW SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER stmg %r0,%r7,__PT_R0(%r11) + # clear user controlled registers to prevent speculative use + xgr %r0,%r0 + xgr %r1,%r1 + xgr %r2,%r2 + xgr %r3,%r3 + xgr %r4,%r4 + xgr %r5,%r5 + xgr %r6,%r6 + xgr %r7,%r7 + xgr %r10,%r10 mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC stmg %r8,%r9,__PT_PSW(%r11) mvc __PT_INT_CODE(12,%r11),__LC_SUBCHANNEL_ID @@ -742,6 +769,16 @@ ENTRY(ext_int_handler) lmg %r8,%r9,__LC_EXT_OLD_PSW SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER stmg %r0,%r7,__PT_R0(%r11) + # clear user controlled registers to prevent speculative use + xgr %r0,%r0 + xgr %r1,%r1 + xgr %r2,%r2 + xgr %r3,%r3 + xgr %r4,%r4 + xgr %r5,%r5 + xgr %r6,%r6 + xgr %r7,%r7 + xgr %r10,%r10 mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC stmg %r8,%r9,__PT_PSW(%r11) lghi %r1,__LC_EXT_PARAMS2 @@ -908,6 +945,16 @@ ENTRY(mcck_int_handler) .Lmcck_skip: lghi %r14,__LC_GPREGS_SAVE_AREA+64 stmg %r0,%r7,__PT_R0(%r11) + # clear user controlled registers to prevent speculative use + xgr %r0,%r0 + xgr %r1,%r1 + xgr %r2,%r2 + xgr %r3,%r3 + xgr %r4,%r4 + xgr %r5,%r5 + xgr %r6,%r6 + xgr %r7,%r7 + xgr %r10,%r10 mvc __PT_R8(64,%r11),0(%r14) stmg %r8,%r9,__PT_PSW(%r11) xc __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit e2dd833389cc4069a96b57bdd24227b5f52288f5 ]
Add an optimized version of the array_index_mask_nospec function for s390 based on a compare and a subtract with borrow.
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/barrier.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
--- a/arch/s390/include/asm/barrier.h +++ b/arch/s390/include/asm/barrier.h @@ -53,4 +53,28 @@ do { \ ___p1; \ })
+/** + * array_index_mask_nospec - generate a mask for array_idx() that is + * ~0UL when the bounds check succeeds and 0 otherwise + * @index: array element index + * @size: number of elements in array + */ +#define array_index_mask_nospec array_index_mask_nospec +static inline unsigned long array_index_mask_nospec(unsigned long index, + unsigned long size) +{ + unsigned long mask; + + if (__builtin_constant_p(size) && size > 0) { + asm(" clgr %2,%1\n" + " slbgr %0,%0\n" + :"=d" (mask) : "d" (size-1), "d" (index) :"cc"); + return mask; + } + asm(" clgr %1,%2\n" + " slbgr %0,%0\n" + :"=d" (mask) : "d" (size), "d" (index) :"cc"); + return ~mask; +} + #endif /* __ASM_BARRIER_H */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit cf1489984641369611556bf00c48f945c77bcf02 ]
To be able to switch off specific CPU alternatives with kernel parameters make a copy of the facility bit mask provided by STFLE and use the copy for the decision to apply an alternative.
Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Cornelia Huck cohuck@redhat.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/facility.h | 18 ++++++++++++++++++ arch/s390/include/asm/lowcore.h | 3 ++- arch/s390/kernel/alternative.c | 3 ++- arch/s390/kernel/early.c | 3 +++ arch/s390/kernel/setup.c | 4 +++- arch/s390/kernel/smp.c | 4 +++- 6 files changed, 31 insertions(+), 4 deletions(-)
--- a/arch/s390/include/asm/facility.h +++ b/arch/s390/include/asm/facility.h @@ -13,6 +13,24 @@
#define MAX_FACILITY_BIT (256*8) /* stfle_fac_list has 256 bytes */
+static inline void __set_facility(unsigned long nr, void *facilities) +{ + unsigned char *ptr = (unsigned char *) facilities; + + if (nr >= MAX_FACILITY_BIT) + return; + ptr[nr >> 3] |= 0x80 >> (nr & 7); +} + +static inline void __clear_facility(unsigned long nr, void *facilities) +{ + unsigned char *ptr = (unsigned char *) facilities; + + if (nr >= MAX_FACILITY_BIT) + return; + ptr[nr >> 3] &= ~(0x80 >> (nr & 7)); +} + static inline int __test_facility(unsigned long nr, void *facilities) { unsigned char *ptr; --- a/arch/s390/include/asm/lowcore.h +++ b/arch/s390/include/asm/lowcore.h @@ -170,7 +170,8 @@ struct _lowcore { __u8 pad_0x0e20[0x0f00-0x0e20]; /* 0x0e20 */
/* Extended facility list */ - __u64 stfle_fac_list[32]; /* 0x0f00 */ + __u64 stfle_fac_list[16]; /* 0x0f00 */ + __u64 alt_stfle_fac_list[16]; /* 0x0f80 */ __u8 pad_0x1000[0x11b0-0x1000]; /* 0x1000 */
/* Pointer to vector register save area */ --- a/arch/s390/kernel/alternative.c +++ b/arch/s390/kernel/alternative.c @@ -74,7 +74,8 @@ static void __init_or_module __apply_alt instr = (u8 *)&a->instr_offset + a->instr_offset; replacement = (u8 *)&a->repl_offset + a->repl_offset;
- if (!test_facility(a->facility)) + if (!__test_facility(a->facility, + S390_lowcore.alt_stfle_fac_list)) continue;
if (unlikely(a->instrlen % 2 || a->replacementlen % 2)) { --- a/arch/s390/kernel/early.c +++ b/arch/s390/kernel/early.c @@ -279,6 +279,9 @@ static noinline __init void setup_facili { stfle(S390_lowcore.stfle_fac_list, ARRAY_SIZE(S390_lowcore.stfle_fac_list)); + memcpy(S390_lowcore.alt_stfle_fac_list, + S390_lowcore.stfle_fac_list, + sizeof(S390_lowcore.alt_stfle_fac_list)); }
static __init void detect_diag9c(void) --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -334,7 +334,9 @@ static void __init setup_lowcore(void) lc->machine_flags = S390_lowcore.machine_flags; lc->stfl_fac_list = S390_lowcore.stfl_fac_list; memcpy(lc->stfle_fac_list, S390_lowcore.stfle_fac_list, - MAX_FACILITY_BIT/8); + sizeof(lc->stfle_fac_list)); + memcpy(lc->alt_stfle_fac_list, S390_lowcore.alt_stfle_fac_list, + sizeof(lc->alt_stfle_fac_list)); if (MACHINE_HAS_VX) lc->vector_save_area_addr = (unsigned long) &lc->vector_save_area; --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -250,7 +250,9 @@ static void pcpu_prepare_secondary(struc __ctl_store(lc->cregs_save_area, 0, 15); save_access_regs((unsigned int *) lc->access_regs_save_area); memcpy(lc->stfle_fac_list, S390_lowcore.stfle_fac_list, - MAX_FACILITY_BIT/8); + sizeof(lc->stfle_fac_list)); + memcpy(lc->alt_stfle_fac_list, S390_lowcore.alt_stfle_fac_list, + sizeof(lc->alt_stfle_fac_list)); }
static void pcpu_attach_task(struct pcpu *pcpu, struct task_struct *tsk)
On 04/27/2018, 03:58 PM, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit cf1489984641369611556bf00c48f945c77bcf02 ]
To be able to switch off specific CPU alternatives with kernel parameters make a copy of the facility bit mask provided by STFLE and use the copy for the decision to apply an alternative.
...
--- a/arch/s390/include/asm/facility.h +++ b/arch/s390/include/asm/facility.h @@ -13,6 +13,24 @@ #define MAX_FACILITY_BIT (256*8) /* stfle_fac_list has 256 bytes */
I wonder if the below (plus __test_facility) is correct in 4.4, given MAX_FACILITY_BIT is defined as such and not as sizeof(stfle_fac_list * 8) as in upstream?
+static inline void __set_facility(unsigned long nr, void *facilities) +{
- unsigned char *ptr = (unsigned char *) facilities;
- if (nr >= MAX_FACILITY_BIT)
return;
- ptr[nr >> 3] |= 0x80 >> (nr & 7);
+}
+static inline void __clear_facility(unsigned long nr, void *facilities) +{
- unsigned char *ptr = (unsigned char *) facilities;
- if (nr >= MAX_FACILITY_BIT)
return;
- ptr[nr >> 3] &= ~(0x80 >> (nr & 7));
+}
static inline int __test_facility(unsigned long nr, void *facilities) { unsigned char *ptr; --- a/arch/s390/include/asm/lowcore.h +++ b/arch/s390/include/asm/lowcore.h @@ -170,7 +170,8 @@ struct _lowcore { __u8 pad_0x0e20[0x0f00-0x0e20]; /* 0x0e20 */ /* Extended facility list */
- __u64 stfle_fac_list[32]; /* 0x0f00 */
- __u64 stfle_fac_list[16]; /* 0x0f00 */
- __u64 alt_stfle_fac_list[16]; /* 0x0f80 */ __u8 pad_0x1000[0x11b0-0x1000]; /* 0x1000 */
thanks,
On 05/04/2018, 09:37 AM, Jiri Slaby wrote:
On 04/27/2018, 03:58 PM, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit cf1489984641369611556bf00c48f945c77bcf02 ]
To be able to switch off specific CPU alternatives with kernel parameters make a copy of the facility bit mask provided by STFLE and use the copy for the decision to apply an alternative.
...
--- a/arch/s390/include/asm/facility.h +++ b/arch/s390/include/asm/facility.h @@ -13,6 +13,24 @@ #define MAX_FACILITY_BIT (256*8) /* stfle_fac_list has 256 bytes */
I wonder if the below (plus __test_facility) is correct in 4.4, given MAX_FACILITY_BIT is defined as such and not as sizeof(stfle_fac_list * 8) as in upstream?
IOW, we should include this 4.12-commit in 4.4 too: commit 6f5165e864d240d15675cc2fb5a369d57e1f60d0 Author: Heiko Carstens heiko.carstens@de.ibm.com Date: Mon Mar 20 14:29:50 2017 +0100
s390/facilites: use stfle_fac_list array size for MAX_FACILITY_BIT
thanks,
On Fri, May 04, 2018 at 09:37:20AM +0200, Jiri Slaby wrote:
On 04/27/2018, 03:58 PM, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit cf1489984641369611556bf00c48f945c77bcf02 ]
To be able to switch off specific CPU alternatives with kernel parameters make a copy of the facility bit mask provided by STFLE and use the copy for the decision to apply an alternative.
...
--- a/arch/s390/include/asm/facility.h +++ b/arch/s390/include/asm/facility.h @@ -13,6 +13,24 @@ #define MAX_FACILITY_BIT (256*8) /* stfle_fac_list has 256 bytes */
I wonder if the below (plus __test_facility) is correct in 4.4, given MAX_FACILITY_BIT is defined as such and not as sizeof(stfle_fac_list * 8) as in upstream?
I'm going to defer to Marin here, as he did the backport... Martin?
thanks,
greg k-h
On Fri, 4 May 2018 15:18:08 -0700 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Fri, May 04, 2018 at 09:37:20AM +0200, Jiri Slaby wrote:
On 04/27/2018, 03:58 PM, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit cf1489984641369611556bf00c48f945c77bcf02 ]
To be able to switch off specific CPU alternatives with kernel parameters make a copy of the facility bit mask provided by STFLE and use the copy for the decision to apply an alternative.
...
--- a/arch/s390/include/asm/facility.h +++ b/arch/s390/include/asm/facility.h @@ -13,6 +13,24 @@ #define MAX_FACILITY_BIT (256*8) /* stfle_fac_list has 256 bytes */
I wonder if the below (plus __test_facility) is correct in 4.4, given MAX_FACILITY_BIT is defined as such and not as sizeof(stfle_fac_list * 8) as in upstream?
I'm going to defer to Marin here, as he did the backport... Martin?
Good catch. With MAX_FACILITY_BIT == 2048 and the patch applied the result for a test_facility/__test_facility call with a facility number >= 1024 would give an incorrect result. Fortunately there are no such calls in the current 4.4 kernel source. And there are no facilities defined with bit numbers this large, so even out-of-tree code would not do this if it is sane.
To correct this the MAX_FACILITY_BIT define needs to be reduced to 1024 which would require the patch pointed out be Heiko:
commit 6f5165e864d240d15675cc2fb5a369d57e1f60d0 Author: Heiko Carstens heiko.carstens@de.ibm.com Date: Mon Mar 20 14:29:50 2017 +0100
s390/facilites: use stfle_fac_list array size for MAX_FACILITY_BIT
I would say yes, it *does* make sense to include this patch even if it does not fix anything.
On Mon, May 07, 2018 at 08:07:07AM +0200, Martin Schwidefsky wrote:
On Fri, 4 May 2018 15:18:08 -0700 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Fri, May 04, 2018 at 09:37:20AM +0200, Jiri Slaby wrote:
On 04/27/2018, 03:58 PM, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit cf1489984641369611556bf00c48f945c77bcf02 ]
To be able to switch off specific CPU alternatives with kernel parameters make a copy of the facility bit mask provided by STFLE and use the copy for the decision to apply an alternative.
...
--- a/arch/s390/include/asm/facility.h +++ b/arch/s390/include/asm/facility.h @@ -13,6 +13,24 @@ #define MAX_FACILITY_BIT (256*8) /* stfle_fac_list has 256 bytes */
I wonder if the below (plus __test_facility) is correct in 4.4, given MAX_FACILITY_BIT is defined as such and not as sizeof(stfle_fac_list * 8) as in upstream?
I'm going to defer to Marin here, as he did the backport... Martin?
Good catch. With MAX_FACILITY_BIT == 2048 and the patch applied the result for a test_facility/__test_facility call with a facility number >= 1024 would give an incorrect result. Fortunately there are no such calls in the current 4.4 kernel source. And there are no facilities defined with bit numbers this large, so even out-of-tree code would not do this if it is sane.
To correct this the MAX_FACILITY_BIT define needs to be reduced to 1024 which would require the patch pointed out be Heiko:
commit 6f5165e864d240d15675cc2fb5a369d57e1f60d0 Author: Heiko Carstens heiko.carstens@de.ibm.com Date: Mon Mar 20 14:29:50 2017 +0100
s390/facilites: use stfle_fac_list array size for MAX_FACILITY_BIT
I would say yes, it *does* make sense to include this patch even if it does not fix anything.
Ok, I've now queued this up for 4.4.y and 4.9.y, thanks!
greg k-h
On Tue, May 08, 2018 at 09:20:55AM +0200, Greg Kroah-Hartman wrote:
On Mon, May 07, 2018 at 08:07:07AM +0200, Martin Schwidefsky wrote:
On Fri, 4 May 2018 15:18:08 -0700 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Fri, May 04, 2018 at 09:37:20AM +0200, Jiri Slaby wrote:
On 04/27/2018, 03:58 PM, Greg Kroah-Hartman wrote:
4.4-stable review patch. If anyone has any objections, please let me know.
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit cf1489984641369611556bf00c48f945c77bcf02 ]
To be able to switch off specific CPU alternatives with kernel parameters make a copy of the facility bit mask provided by STFLE and use the copy for the decision to apply an alternative.
...
--- a/arch/s390/include/asm/facility.h +++ b/arch/s390/include/asm/facility.h @@ -13,6 +13,24 @@ #define MAX_FACILITY_BIT (256*8) /* stfle_fac_list has 256 bytes */
I wonder if the below (plus __test_facility) is correct in 4.4, given MAX_FACILITY_BIT is defined as such and not as sizeof(stfle_fac_list * 8) as in upstream?
I'm going to defer to Marin here, as he did the backport... Martin?
Good catch. With MAX_FACILITY_BIT == 2048 and the patch applied the result for a test_facility/__test_facility call with a facility number >= 1024 would give an incorrect result. Fortunately there are no such calls in the current 4.4 kernel source. And there are no facilities defined with bit numbers this large, so even out-of-tree code would not do this if it is sane.
To correct this the MAX_FACILITY_BIT define needs to be reduced to 1024 which would require the patch pointed out be Heiko:
commit 6f5165e864d240d15675cc2fb5a369d57e1f60d0 Author: Heiko Carstens heiko.carstens@de.ibm.com Date: Mon Mar 20 14:29:50 2017 +0100
s390/facilites: use stfle_fac_list array size for MAX_FACILITY_BIT
I would say yes, it *does* make sense to include this patch even if it does not fix anything.
Ok, I've now queued this up for 4.4.y and 4.9.y, thanks!
It breaks the build on 4.4.y, so I'm dropping it :(
If you think it's needed there, a working version would be nice :)
thanks,
greg k-h
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit d768bd892fc8f066cd3aa000eb1867bcf32db0ee ]
Add the PPA instruction to the system entry and exit path to switch the kernel to a different branch prediction behaviour. The instructions are added via CPU alternatives and can be disabled with the "nospec" or the "nobp=0" kernel parameter. If the default behaviour selected with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be used to enable the changed kernel branch prediction.
Acked-by: Cornelia Huck cohuck@redhat.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/Kconfig | 17 +++++++++++++ arch/s390/include/asm/processor.h | 1 arch/s390/kernel/alternative.c | 23 ++++++++++++++++++ arch/s390/kernel/early.c | 2 + arch/s390/kernel/entry.S | 48 ++++++++++++++++++++++++++++++++++++++ arch/s390/kernel/ipl.c | 1 arch/s390/kernel/smp.c | 2 + 7 files changed, 94 insertions(+)
--- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -705,6 +705,23 @@ config SECCOMP
If unsure, say Y.
+config KERNEL_NOBP + def_bool n + prompt "Enable modified branch prediction for the kernel by default" + help + If this option is selected the kernel will switch to a modified + branch prediction mode if the firmware interface is available. + The modified branch prediction mode improves the behaviour in + regard to speculative execution. + + With the option enabled the kernel parameter "nobp=0" or "nospec" + can be used to run the kernel in the normal branch prediction mode. + + With the option disabled the modified branch prediction mode is + enabled with the "nobp=1" kernel parameter. + + If unsure, say N. + endmenu
menu "Power Management" --- a/arch/s390/include/asm/processor.h +++ b/arch/s390/include/asm/processor.h @@ -69,6 +69,7 @@ extern void s390_adjust_jiffies(void); extern const struct seq_operations cpuinfo_op; extern int sysctl_ieee_emulation_warnings; extern void execve_tail(void); +extern void __bpon(void);
/* * User space process size: 2GB for 31 bit, 4TB or 8PT for 64 bit. --- a/arch/s390/kernel/alternative.c +++ b/arch/s390/kernel/alternative.c @@ -14,6 +14,29 @@ static int __init disable_alternative_in
early_param("noaltinstr", disable_alternative_instructions);
+static int __init nobp_setup_early(char *str) +{ + bool enabled; + int rc; + + rc = kstrtobool(str, &enabled); + if (rc) + return rc; + if (enabled && test_facility(82)) + __set_facility(82, S390_lowcore.alt_stfle_fac_list); + else + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); + return 0; +} +early_param("nobp", nobp_setup_early); + +static int __init nospec_setup_early(char *str) +{ + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); + return 0; +} +early_param("nospec", nospec_setup_early); + struct brcl_insn { u16 opc; s32 disp; --- a/arch/s390/kernel/early.c +++ b/arch/s390/kernel/early.c @@ -282,6 +282,8 @@ static noinline __init void setup_facili memcpy(S390_lowcore.alt_stfle_fac_list, S390_lowcore.stfle_fac_list, sizeof(S390_lowcore.alt_stfle_fac_list)); + if (!IS_ENABLED(CONFIG_KERNEL_NOBP)) + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); }
static __init void detect_diag9c(void) --- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -162,8 +162,41 @@ _PIF_WORK = (_PIF_PER_TRAP) tm off+\addr, \mask .endm
+ .macro BPOFF + .pushsection .altinstr_replacement, "ax" +660: .long 0xb2e8c000 + .popsection +661: .long 0x47000000 + .pushsection .altinstructions, "a" + .long 661b - . + .long 660b - . + .word 82 + .byte 4 + .byte 4 + .popsection + .endm + + .macro BPON + .pushsection .altinstr_replacement, "ax" +662: .long 0xb2e8d000 + .popsection +663: .long 0x47000000 + .pushsection .altinstructions, "a" + .long 663b - . + .long 662b - . + .word 82 + .byte 4 + .byte 4 + .popsection + .endm + .section .kprobes.text, "ax"
+ENTRY(__bpon) + .globl __bpon + BPON + br %r14 + /* * Scheduler resume function, called by switch_to * gpr2 = (task_struct *) prev @@ -223,7 +256,10 @@ ENTRY(sie64a) jnz .Lsie_skip TSTMSK __LC_CPU_FLAGS,_CIF_FPU jo .Lsie_skip # exit if fp/vx regs changed + BPON sie 0(%r14) +.Lsie_exit: + BPOFF .Lsie_skip: ni __SIE_PROG0C+3(%r14),0xfe # no longer in SIE lctlg %c1,%c1,__LC_USER_ASCE # load primary asce @@ -273,6 +309,7 @@ ENTRY(system_call) stpt __LC_SYNC_ENTER_TIMER .Lsysc_stmg: stmg %r8,%r15,__LC_SAVE_AREA_SYNC + BPOFF lg %r10,__LC_LAST_BREAK lg %r12,__LC_THREAD_INFO lghi %r14,_PIF_SYSCALL @@ -319,6 +356,7 @@ ENTRY(system_call) jnz .Lsysc_work # check for work TSTMSK __LC_CPU_FLAGS,_CIF_WORK jnz .Lsysc_work + BPON .Lsysc_restore: lg %r14,__LC_VDSO_PER_CPU lmg %r0,%r10,__PT_R0(%r11) @@ -479,6 +517,7 @@ ENTRY(kernel_thread_starter)
ENTRY(pgm_check_handler) stpt __LC_SYNC_ENTER_TIMER + BPOFF stmg %r8,%r15,__LC_SAVE_AREA_SYNC lg %r10,__LC_LAST_BREAK lg %r12,__LC_THREAD_INFO @@ -577,6 +616,7 @@ ENTRY(pgm_check_handler) ENTRY(io_int_handler) STCK __LC_INT_CLOCK stpt __LC_ASYNC_ENTER_TIMER + BPOFF stmg %r8,%r15,__LC_SAVE_AREA_ASYNC lg %r10,__LC_LAST_BREAK lg %r12,__LC_THREAD_INFO @@ -628,9 +668,13 @@ ENTRY(io_int_handler) lg %r14,__LC_VDSO_PER_CPU lmg %r0,%r10,__PT_R0(%r11) mvc __LC_RETURN_PSW(16),__PT_PSW(%r11) + tm __PT_PSW+1(%r11),0x01 # returning to user ? + jno .Lio_exit_kernel + BPON .Lio_exit_timer: stpt __LC_EXIT_TIMER mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER +.Lio_exit_kernel: lmg %r11,%r15,__PT_R11(%r11) lpswe __LC_RETURN_PSW .Lio_done: @@ -762,6 +806,7 @@ ENTRY(io_int_handler) ENTRY(ext_int_handler) STCK __LC_INT_CLOCK stpt __LC_ASYNC_ENTER_TIMER + BPOFF stmg %r8,%r15,__LC_SAVE_AREA_ASYNC lg %r10,__LC_LAST_BREAK lg %r12,__LC_THREAD_INFO @@ -810,6 +855,7 @@ ENTRY(psw_idle) .insn rsy,0xeb0000000017,%r1,5,__SF_EMPTY+16(%r15) .Lpsw_idle_stcctm: #endif + BPON STCK __CLOCK_IDLE_ENTER(%r2) stpt __TIMER_IDLE_ENTER(%r2) .Lpsw_idle_lpsw: @@ -914,6 +960,7 @@ load_fpu_regs: */ ENTRY(mcck_int_handler) STCK __LC_MCCK_CLOCK + BPOFF la %r1,4095 # revalidate r1 spt __LC_CPU_TIMER_SAVE_AREA-4095(%r1) # revalidate cpu timer lmg %r0,%r15,__LC_GPREGS_SAVE_AREA-4095(%r1)# revalidate gprs @@ -980,6 +1027,7 @@ ENTRY(mcck_int_handler) mvc __LC_RETURN_MCCK_PSW(16),__PT_PSW(%r11) # move return PSW tm __LC_RETURN_MCCK_PSW+1,0x01 # returning to user ? jno 0f + BPON stpt __LC_EXIT_TIMER mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER 0: lmg %r11,%r15,__PT_R11(%r11) --- a/arch/s390/kernel/ipl.c +++ b/arch/s390/kernel/ipl.c @@ -563,6 +563,7 @@ static struct kset *ipl_kset;
static void __ipl_run(void *unused) { + __bpon(); diag308(DIAG308_IPL, NULL); if (MACHINE_IS_VM) __cpcmd("IPL", NULL, 0, NULL); --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -301,6 +301,7 @@ static void pcpu_delegate(struct pcpu *p mem_assign_absolute(lc->restart_fn, (unsigned long) func); mem_assign_absolute(lc->restart_data, (unsigned long) data); mem_assign_absolute(lc->restart_source, source_cpu); + __bpon(); asm volatile( "0: sigp 0,%0,%2 # sigp restart to target cpu\n" " brc 2,0b # busy, try again\n" @@ -890,6 +891,7 @@ void __cpu_die(unsigned int cpu) void __noreturn cpu_die(void) { idle_task_exit(); + __bpon(); pcpu_sigp_retry(pcpu_devices + smp_processor_id(), SIGP_STOP, 0); for (;;) ; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 6b73044b2b0081ee3dd1cd6eaab7dee552601efb ]
Define TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST and add the necessary plumbing in entry.S to be able to run user space and KVM guests with limited branch prediction.
To switch a user space process to limited branch prediction the s390_isolate_bp() function has to be call, and to run a vCPU of a KVM guest associated with the current task with limited branch prediction call s390_isolate_bp_guest().
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/processor.h | 3 ++ arch/s390/include/asm/thread_info.h | 4 ++ arch/s390/kernel/entry.S | 49 +++++++++++++++++++++++++++++++++--- arch/s390/kernel/processor.c | 18 +++++++++++++ 4 files changed, 70 insertions(+), 4 deletions(-)
--- a/arch/s390/include/asm/processor.h +++ b/arch/s390/include/asm/processor.h @@ -316,6 +316,9 @@ extern void memcpy_absolute(void *, void memcpy_absolute(&(dest), &__tmp, sizeof(__tmp)); \ }
+extern int s390_isolate_bp(void); +extern int s390_isolate_bp_guest(void); + #endif /* __ASSEMBLY__ */
#endif /* __ASM_S390_PROCESSOR_H */ --- a/arch/s390/include/asm/thread_info.h +++ b/arch/s390/include/asm/thread_info.h @@ -78,6 +78,8 @@ void arch_release_task_struct(struct tas #define TIF_SECCOMP 5 /* secure computing */ #define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint instrumentation */ #define TIF_UPROBE 7 /* breakpointed or single-stepping */ +#define TIF_ISOLATE_BP 8 /* Run process with isolated BP */ +#define TIF_ISOLATE_BP_GUEST 9 /* Run KVM guests with isolated BP */ #define TIF_31BIT 16 /* 32bit process */ #define TIF_MEMDIE 17 /* is terminating due to OOM killer */ #define TIF_RESTORE_SIGMASK 18 /* restore signal mask in do_signal() */ @@ -93,6 +95,8 @@ void arch_release_task_struct(struct tas #define _TIF_SECCOMP _BITUL(TIF_SECCOMP) #define _TIF_SYSCALL_TRACEPOINT _BITUL(TIF_SYSCALL_TRACEPOINT) #define _TIF_UPROBE _BITUL(TIF_UPROBE) +#define _TIF_ISOLATE_BP _BITUL(TIF_ISOLATE_BP) +#define _TIF_ISOLATE_BP_GUEST _BITUL(TIF_ISOLATE_BP_GUEST) #define _TIF_31BIT _BITUL(TIF_31BIT) #define _TIF_SINGLE_STEP _BITUL(TIF_SINGLE_STEP)
--- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -104,6 +104,7 @@ _PIF_WORK = (_PIF_PER_TRAP) j 3f 1: LAST_BREAK %r14 UPDATE_VTIME %r14,%r15,\timer + BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP 2: lg %r15,__LC_ASYNC_STACK # load async stack 3: la %r11,STACK_FRAME_OVERHEAD(%r15) .endm @@ -190,6 +191,40 @@ _PIF_WORK = (_PIF_PER_TRAP) .popsection .endm
+ .macro BPENTER tif_ptr,tif_mask + .pushsection .altinstr_replacement, "ax" +662: .word 0xc004, 0x0000, 0x0000 # 6 byte nop + .word 0xc004, 0x0000, 0x0000 # 6 byte nop + .popsection +664: TSTMSK \tif_ptr,\tif_mask + jz . + 8 + .long 0xb2e8d000 + .pushsection .altinstructions, "a" + .long 664b - . + .long 662b - . + .word 82 + .byte 12 + .byte 12 + .popsection + .endm + + .macro BPEXIT tif_ptr,tif_mask + TSTMSK \tif_ptr,\tif_mask + .pushsection .altinstr_replacement, "ax" +662: jnz . + 8 + .long 0xb2e8d000 + .popsection +664: jz . + 8 + .long 0xb2e8c000 + .pushsection .altinstructions, "a" + .long 664b - . + .long 662b - . + .word 82 + .byte 8 + .byte 8 + .popsection + .endm + .section .kprobes.text, "ax"
ENTRY(__bpon) @@ -237,9 +272,11 @@ ENTRY(__switch_to) */ ENTRY(sie64a) stmg %r6,%r14,__SF_GPRS(%r15) # save kernel registers + lg %r12,__LC_CURRENT stg %r2,__SF_EMPTY(%r15) # save control block pointer stg %r3,__SF_EMPTY+8(%r15) # save guest register save area xc __SF_EMPTY+16(8,%r15),__SF_EMPTY+16(%r15) # reason code = 0 + mvc __SF_EMPTY+24(8,%r15),__TI_flags(%r12) # copy thread flags TSTMSK __LC_CPU_FLAGS,_CIF_FPU # load guest fp/vx registers ? jno .Lsie_load_guest_gprs brasl %r14,load_fpu_regs # load guest fp/vx regs @@ -256,10 +293,11 @@ ENTRY(sie64a) jnz .Lsie_skip TSTMSK __LC_CPU_FLAGS,_CIF_FPU jo .Lsie_skip # exit if fp/vx regs changed - BPON + BPEXIT __SF_EMPTY+24(%r15),(_TIF_ISOLATE_BP|_TIF_ISOLATE_BP_GUEST) sie 0(%r14) .Lsie_exit: BPOFF + BPENTER __SF_EMPTY+24(%r15),(_TIF_ISOLATE_BP|_TIF_ISOLATE_BP_GUEST) .Lsie_skip: ni __SIE_PROG0C+3(%r14),0xfe # no longer in SIE lctlg %c1,%c1,__LC_USER_ASCE # load primary asce @@ -319,6 +357,7 @@ ENTRY(system_call) LAST_BREAK %r13 .Lsysc_vtime: UPDATE_VTIME %r10,%r13,__LC_SYNC_ENTER_TIMER + BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP stmg %r0,%r7,__PT_R0(%r11) # clear user controlled register to prevent speculative use xgr %r0,%r0 @@ -356,7 +395,7 @@ ENTRY(system_call) jnz .Lsysc_work # check for work TSTMSK __LC_CPU_FLAGS,_CIF_WORK jnz .Lsysc_work - BPON + BPEXIT __TI_flags(%r12),_TIF_ISOLATE_BP .Lsysc_restore: lg %r14,__LC_VDSO_PER_CPU lmg %r0,%r10,__PT_R0(%r11) @@ -542,6 +581,7 @@ ENTRY(pgm_check_handler) j 3f 2: LAST_BREAK %r14 UPDATE_VTIME %r14,%r15,__LC_SYNC_ENTER_TIMER + BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP lg %r15,__LC_KERNEL_STACK lg %r14,__TI_task(%r12) aghi %r14,__TASK_thread # pointer to thread_struct @@ -670,7 +710,7 @@ ENTRY(io_int_handler) mvc __LC_RETURN_PSW(16),__PT_PSW(%r11) tm __PT_PSW+1(%r11),0x01 # returning to user ? jno .Lio_exit_kernel - BPON + BPEXIT __TI_flags(%r12),_TIF_ISOLATE_BP .Lio_exit_timer: stpt __LC_EXIT_TIMER mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER @@ -1027,7 +1067,7 @@ ENTRY(mcck_int_handler) mvc __LC_RETURN_MCCK_PSW(16),__PT_PSW(%r11) # move return PSW tm __LC_RETURN_MCCK_PSW+1,0x01 # returning to user ? jno 0f - BPON + BPEXIT __TI_flags(%r12),_TIF_ISOLATE_BP stpt __LC_EXIT_TIMER mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER 0: lmg %r11,%r15,__PT_R11(%r11) @@ -1148,6 +1188,7 @@ cleanup_critical: .quad .Lsie_done
.Lcleanup_sie: + BPENTER __SF_EMPTY+24(%r15),(_TIF_ISOLATE_BP|_TIF_ISOLATE_BP_GUEST) lg %r9,__SF_EMPTY(%r15) # get control block pointer ni __SIE_PROG0C+3(%r9),0xfe # no longer in SIE lctlg %c1,%c1,__LC_USER_ASCE # load primary asce --- a/arch/s390/kernel/processor.c +++ b/arch/s390/kernel/processor.c @@ -13,6 +13,7 @@ #include <linux/cpu.h> #include <asm/diag.h> #include <asm/elf.h> +#include <asm/facility.h> #include <asm/lowcore.h> #include <asm/param.h> #include <asm/smp.h> @@ -113,3 +114,20 @@ const struct seq_operations cpuinfo_op = .show = show_cpuinfo, };
+int s390_isolate_bp(void) +{ + if (!test_facility(82)) + return -EOPNOTSUPP; + set_thread_flag(TIF_ISOLATE_BP); + return 0; +} +EXPORT_SYMBOL(s390_isolate_bp); + +int s390_isolate_bp_guest(void) +{ + if (!test_facility(82)) + return -EOPNOTSUPP; + set_thread_flag(TIF_ISOLATE_BP_GUEST); + return 0; +} +EXPORT_SYMBOL(s390_isolate_bp_guest);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit f19fbd5ed642dc31c809596412dab1ed56f2f156 ]
Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and -mfunction_return= compiler options to create a kernel fortified against the specte v2 attack.
With CONFIG_EXPOLINE=y all indirect branches will be issued with an execute type instruction. For z10 or newer the EXRL instruction will be used, for older machines the EX instruction. The typical indirect call
basr %r14,%r1
is replaced with a PC relative call to a new thunk
brasl %r14,__s390x_indirect_jump_r1
The thunk contains the EXRL/EX instruction to the indirect branch
__s390x_indirect_jump_r1: exrl 0,0f j . 0: br %r1
The detour via the execute type instruction has a performance impact. To get rid of the detour the new kernel parameter "nospectre_v2" and "spectre_v2=[on,off,auto]" can be used. If the parameter is specified the kernel and module code will be patched at runtime.
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/Kconfig | 28 ++++++++ arch/s390/Makefile | 10 +++ arch/s390/include/asm/lowcore.h | 4 - arch/s390/include/asm/nospec-branch.h | 18 +++++ arch/s390/kernel/Makefile | 3 arch/s390/kernel/entry.S | 113 ++++++++++++++++++++++++++-------- arch/s390/kernel/module.c | 62 +++++++++++++++--- arch/s390/kernel/nospec-branch.c | 101 ++++++++++++++++++++++++++++++ arch/s390/kernel/setup.c | 4 + arch/s390/kernel/smp.c | 1 arch/s390/kernel/vmlinux.lds.S | 14 ++++ drivers/s390/char/Makefile | 2 12 files changed, 325 insertions(+), 35 deletions(-) create mode 100644 arch/s390/include/asm/nospec-branch.h create mode 100644 arch/s390/kernel/nospec-branch.c
--- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -722,6 +722,34 @@ config KERNEL_NOBP
If unsure, say N.
+config EXPOLINE + def_bool n + prompt "Avoid speculative indirect branches in the kernel" + help + Compile the kernel with the expoline compiler options to guard + against kernel-to-user data leaks by avoiding speculative indirect + branches. + Requires a compiler with -mindirect-branch=thunk support for full + protection. The kernel may run slower. + + If unsure, say N. + +choice + prompt "Expoline default" + depends on EXPOLINE + default EXPOLINE_FULL + +config EXPOLINE_OFF + bool "spectre_v2=off" + +config EXPOLINE_MEDIUM + bool "spectre_v2=auto" + +config EXPOLINE_FULL + bool "spectre_v2=on" + +endchoice + endmenu
menu "Power Management" --- a/arch/s390/Makefile +++ b/arch/s390/Makefile @@ -77,6 +77,16 @@ ifeq ($(call cc-option-yn,-mwarn-dynamic cflags-$(CONFIG_WARN_DYNAMIC_STACK) += -mwarn-dynamicstack endif
+ifdef CONFIG_EXPOLINE + ifeq ($(call cc-option-yn,$(CC_FLAGS_MARCH) -mindirect-branch=thunk),y) + CC_FLAGS_EXPOLINE := -mindirect-branch=thunk + CC_FLAGS_EXPOLINE += -mfunction-return=thunk + CC_FLAGS_EXPOLINE += -mindirect-branch-table + export CC_FLAGS_EXPOLINE + cflags-y += $(CC_FLAGS_EXPOLINE) + endif +endif + ifdef CONFIG_FUNCTION_TRACER # make use of hotpatch feature if the compiler supports it cc_hotpatch := -mhotpatch=0,3 --- a/arch/s390/include/asm/lowcore.h +++ b/arch/s390/include/asm/lowcore.h @@ -155,7 +155,9 @@ struct _lowcore { /* Per cpu primary space access list */ __u32 paste[16]; /* 0x0400 */
- __u8 pad_0x04c0[0x0e00-0x0440]; /* 0x0440 */ + /* br %r1 trampoline */ + __u16 br_r1_trampoline; /* 0x0440 */ + __u8 pad_0x0442[0x0e00-0x0442]; /* 0x0442 */
/* * 0xe00 contains the address of the IPL Parameter Information --- /dev/null +++ b/arch/s390/include/asm/nospec-branch.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_S390_EXPOLINE_H +#define _ASM_S390_EXPOLINE_H + +#ifndef __ASSEMBLY__ + +#include <linux/types.h> + +extern int nospec_call_disable; +extern int nospec_return_disable; + +void nospec_init_branches(void); +void nospec_call_revert(s32 *start, s32 *end); +void nospec_return_revert(s32 *start, s32 *end); + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_S390_EXPOLINE_H */ --- a/arch/s390/kernel/Makefile +++ b/arch/s390/kernel/Makefile @@ -48,6 +48,9 @@ obj-y += entry.o reipl.o relocate_kernel
extra-y += head.o head64.o vmlinux.lds
+obj-$(CONFIG_EXPOLINE) += nospec-branch.o +CFLAGS_REMOVE_expoline.o += $(CC_FLAGS_EXPOLINE) + obj-$(CONFIG_MODULES) += s390_ksyms.o module.o obj-$(CONFIG_SMP) += smp.o obj-$(CONFIG_SCHED_BOOK) += topology.o --- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -225,12 +225,74 @@ _PIF_WORK = (_PIF_PER_TRAP) .popsection .endm
+#ifdef CONFIG_EXPOLINE + + .macro GEN_BR_THUNK name,reg,tmp + .section .text.\name,"axG",@progbits,\name,comdat + .globl \name + .hidden \name + .type \name,@function +\name: + .cfi_startproc +#ifdef CONFIG_HAVE_MARCH_Z10_FEATURES + exrl 0,0f +#else + larl \tmp,0f + ex 0,0(\tmp) +#endif + j . +0: br \reg + .cfi_endproc + .endm + + GEN_BR_THUNK __s390x_indirect_jump_r1use_r9,%r9,%r1 + GEN_BR_THUNK __s390x_indirect_jump_r1use_r14,%r14,%r1 + GEN_BR_THUNK __s390x_indirect_jump_r11use_r14,%r14,%r11 + + .macro BASR_R14_R9 +0: brasl %r14,__s390x_indirect_jump_r1use_r9 + .pushsection .s390_indirect_branches,"a",@progbits + .long 0b-. + .popsection + .endm + + .macro BR_R1USE_R14 +0: jg __s390x_indirect_jump_r1use_r14 + .pushsection .s390_indirect_branches,"a",@progbits + .long 0b-. + .popsection + .endm + + .macro BR_R11USE_R14 +0: jg __s390x_indirect_jump_r11use_r14 + .pushsection .s390_indirect_branches,"a",@progbits + .long 0b-. + .popsection + .endm + +#else /* CONFIG_EXPOLINE */ + + .macro BASR_R14_R9 + basr %r14,%r9 + .endm + + .macro BR_R1USE_R14 + br %r14 + .endm + + .macro BR_R11USE_R14 + br %r14 + .endm + +#endif /* CONFIG_EXPOLINE */ + + .section .kprobes.text, "ax"
ENTRY(__bpon) .globl __bpon BPON - br %r14 + BR_R1USE_R14
/* * Scheduler resume function, called by switch_to @@ -258,9 +320,9 @@ ENTRY(__switch_to) mvc __LC_CURRENT_PID(4,%r0),__TASK_pid(%r3) # store pid of next lmg %r6,%r15,__SF_GPRS(%r15) # load gprs of next task TSTMSK __LC_MACHINE_FLAGS,MACHINE_FLAG_LPP - bzr %r14 + jz 0f .insn s,0xb2800000,__LC_LPP # set program parameter - br %r14 +0: BR_R1USE_R14
.L__critical_start:
@@ -326,7 +388,7 @@ sie_exit: xgr %r5,%r5 lmg %r6,%r14,__SF_GPRS(%r15) # restore kernel registers lg %r2,__SF_EMPTY+16(%r15) # return exit reason code - br %r14 + BR_R1USE_R14 .Lsie_fault: lghi %r14,-EFAULT stg %r14,__SF_EMPTY+16(%r15) # set exit reason code @@ -383,7 +445,7 @@ ENTRY(system_call) lgf %r9,0(%r8,%r10) # get system call add. TSTMSK __TI_flags(%r12),_TIF_TRACE jnz .Lsysc_tracesys - basr %r14,%r9 # call sys_xxxx + BASR_R14_R9 # call sys_xxxx stg %r2,__PT_R2(%r11) # store return value
.Lsysc_return: @@ -523,7 +585,7 @@ ENTRY(system_call) lmg %r3,%r7,__PT_R3(%r11) stg %r7,STACK_FRAME_OVERHEAD(%r15) lg %r2,__PT_ORIG_GPR2(%r11) - basr %r14,%r9 # call sys_xxx + BASR_R14_R9 # call sys_xxx stg %r2,__PT_R2(%r11) # store return value .Lsysc_tracenogo: TSTMSK __TI_flags(%r12),_TIF_TRACE @@ -547,7 +609,7 @@ ENTRY(ret_from_fork) lmg %r9,%r10,__PT_R9(%r11) # load gprs ENTRY(kernel_thread_starter) la %r2,0(%r10) - basr %r14,%r9 + BASR_R14_R9 j .Lsysc_tracenogo
/* @@ -621,9 +683,9 @@ ENTRY(pgm_check_handler) nill %r10,0x007f sll %r10,2 je .Lpgm_return - lgf %r1,0(%r10,%r1) # load address of handler routine + lgf %r9,0(%r10,%r1) # load address of handler routine lgr %r2,%r11 # pass pointer to pt_regs - basr %r14,%r1 # branch to interrupt-handler + BASR_R14_R9 # branch to interrupt-handler .Lpgm_return: LOCKDEP_SYS_EXIT tm __PT_PSW+1(%r11),0x01 # returning to user ? @@ -900,7 +962,7 @@ ENTRY(psw_idle) stpt __TIMER_IDLE_ENTER(%r2) .Lpsw_idle_lpsw: lpswe __SF_EMPTY(%r15) - br %r14 + BR_R1USE_R14 .Lpsw_idle_end:
/* @@ -914,7 +976,7 @@ ENTRY(save_fpu_regs) lg %r2,__LC_CURRENT aghi %r2,__TASK_thread TSTMSK __LC_CPU_FLAGS,_CIF_FPU - bor %r14 + jo .Lsave_fpu_regs_exit stfpc __THREAD_FPU_fpc(%r2) .Lsave_fpu_regs_fpc_end: lg %r3,__THREAD_FPU_regs(%r2) @@ -944,7 +1006,8 @@ ENTRY(save_fpu_regs) std 15,120(%r3) .Lsave_fpu_regs_done: oi __LC_CPU_FLAGS+7,_CIF_FPU - br %r14 +.Lsave_fpu_regs_exit: + BR_R1USE_R14 .Lsave_fpu_regs_end:
/* @@ -961,7 +1024,7 @@ load_fpu_regs: lg %r4,__LC_CURRENT aghi %r4,__TASK_thread TSTMSK __LC_CPU_FLAGS,_CIF_FPU - bnor %r14 + jno .Lload_fpu_regs_exit lfpc __THREAD_FPU_fpc(%r4) TSTMSK __LC_MACHINE_FLAGS,MACHINE_FLAG_VX lg %r4,__THREAD_FPU_regs(%r4) # %r4 <- reg save area @@ -990,7 +1053,8 @@ load_fpu_regs: ld 15,120(%r4) .Lload_fpu_regs_done: ni __LC_CPU_FLAGS+7,255-_CIF_FPU - br %r14 +.Lload_fpu_regs_exit: + BR_R1USE_R14 .Lload_fpu_regs_end:
.L__critical_end: @@ -1163,7 +1227,7 @@ cleanup_critical: jl 0f clg %r9,BASED(.Lcleanup_table+104) # .Lload_fpu_regs_end jl .Lcleanup_load_fpu_regs -0: br %r14 +0: BR_R11USE_R14
.align 8 .Lcleanup_table: @@ -1193,7 +1257,7 @@ cleanup_critical: ni __SIE_PROG0C+3(%r9),0xfe # no longer in SIE lctlg %c1,%c1,__LC_USER_ASCE # load primary asce larl %r9,sie_exit # skip forward to sie_exit - br %r14 + BR_R11USE_R14 #endif
.Lcleanup_system_call: @@ -1250,7 +1314,7 @@ cleanup_critical: stg %r15,56(%r11) # r15 stack pointer # set new psw address and exit larl %r9,.Lsysc_do_svc - br %r14 + BR_R11USE_R14 .Lcleanup_system_call_insn: .quad system_call .quad .Lsysc_stmg @@ -1260,7 +1324,7 @@ cleanup_critical:
.Lcleanup_sysc_tif: larl %r9,.Lsysc_tif - br %r14 + BR_R11USE_R14
.Lcleanup_sysc_restore: # check if stpt has been executed @@ -1277,14 +1341,14 @@ cleanup_critical: mvc 0(64,%r11),__PT_R8(%r9) lmg %r0,%r7,__PT_R0(%r9) 1: lmg %r8,%r9,__LC_RETURN_PSW - br %r14 + BR_R11USE_R14 .Lcleanup_sysc_restore_insn: .quad .Lsysc_exit_timer .quad .Lsysc_done - 4
.Lcleanup_io_tif: larl %r9,.Lio_tif - br %r14 + BR_R11USE_R14
.Lcleanup_io_restore: # check if stpt has been executed @@ -1298,7 +1362,7 @@ cleanup_critical: mvc 0(64,%r11),__PT_R8(%r9) lmg %r0,%r7,__PT_R0(%r9) 1: lmg %r8,%r9,__LC_RETURN_PSW - br %r14 + BR_R11USE_R14 .Lcleanup_io_restore_insn: .quad .Lio_exit_timer .quad .Lio_done - 4 @@ -1350,17 +1414,17 @@ cleanup_critical: # prepare return psw nihh %r8,0xfcfd # clear irq & wait state bits lg %r9,48(%r11) # return from psw_idle - br %r14 + BR_R11USE_R14 .Lcleanup_idle_insn: .quad .Lpsw_idle_lpsw
.Lcleanup_save_fpu_regs: larl %r9,save_fpu_regs - br %r14 + BR_R11USE_R14
.Lcleanup_load_fpu_regs: larl %r9,load_fpu_regs - br %r14 + BR_R11USE_R14
/* * Integer constants @@ -1376,7 +1440,6 @@ cleanup_critical: .Lsie_critical_length: .quad .Lsie_done - .Lsie_gmap #endif - .section .rodata, "a" #define SYSCALL(esame,emu) .long esame .globl sys_call_table --- a/arch/s390/kernel/module.c +++ b/arch/s390/kernel/module.c @@ -32,6 +32,8 @@ #include <linux/moduleloader.h> #include <linux/bug.h> #include <asm/alternative.h> +#include <asm/nospec-branch.h> +#include <asm/facility.h>
#if 0 #define DEBUGP printk @@ -164,7 +166,11 @@ int module_frob_arch_sections(Elf_Ehdr * me->arch.got_offset = me->core_size; me->core_size += me->arch.got_size; me->arch.plt_offset = me->core_size; - me->core_size += me->arch.plt_size; + if (me->arch.plt_size) { + if (IS_ENABLED(CONFIG_EXPOLINE) && !nospec_call_disable) + me->arch.plt_size += PLT_ENTRY_SIZE; + me->core_size += me->arch.plt_size; + } return 0; }
@@ -318,9 +324,21 @@ static int apply_rela(Elf_Rela *rela, El unsigned int *ip; ip = me->module_core + me->arch.plt_offset + info->plt_offset; - ip[0] = 0x0d10e310; /* basr 1,0; lg 1,10(1); br 1 */ - ip[1] = 0x100a0004; - ip[2] = 0x07f10000; + ip[0] = 0x0d10e310; /* basr 1,0 */ + ip[1] = 0x100a0004; /* lg 1,10(1) */ + if (IS_ENABLED(CONFIG_EXPOLINE) && + !nospec_call_disable) { + unsigned int *ij; + ij = me->module_core + + me->arch.plt_offset + + me->arch.plt_size - PLT_ENTRY_SIZE; + ip[2] = 0xa7f40000 + /* j __jump_r1 */ + (unsigned int)(u16) + (((unsigned long) ij - 8 - + (unsigned long) ip) / 2); + } else { + ip[2] = 0x07f10000; /* br %r1 */ + } ip[3] = (unsigned int) (val >> 32); ip[4] = (unsigned int) val; info->plt_initialized = 1; @@ -426,16 +444,42 @@ int module_finalize(const Elf_Ehdr *hdr, struct module *me) { const Elf_Shdr *s; - char *secstrings; + char *secstrings, *secname; + void *aseg; + + if (IS_ENABLED(CONFIG_EXPOLINE) && + !nospec_call_disable && me->arch.plt_size) { + unsigned int *ij; + + ij = me->module_core + me->arch.plt_offset + + me->arch.plt_size - PLT_ENTRY_SIZE; + if (test_facility(35)) { + ij[0] = 0xc6000000; /* exrl %r0,.+10 */ + ij[1] = 0x0005a7f4; /* j . */ + ij[2] = 0x000007f1; /* br %r1 */ + } else { + ij[0] = 0x44000000 | (unsigned int) + offsetof(struct _lowcore, br_r1_trampoline); + ij[1] = 0xa7f40000; /* j . */ + } + }
secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset; for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { - if (!strcmp(".altinstructions", secstrings + s->sh_name)) { - /* patch .altinstructions */ - void *aseg = (void *)s->sh_addr; + aseg = (void *) s->sh_addr; + secname = secstrings + s->sh_name;
+ if (!strcmp(".altinstructions", secname)) + /* patch .altinstructions */ apply_alternatives(aseg, aseg + s->sh_size); - } + + if (IS_ENABLED(CONFIG_EXPOLINE) && + (!strcmp(".nospec_call_table", secname))) + nospec_call_revert(aseg, aseg + s->sh_size); + + if (IS_ENABLED(CONFIG_EXPOLINE) && + (!strcmp(".nospec_return_table", secname))) + nospec_return_revert(aseg, aseg + s->sh_size); }
jump_label_apply_nops(me); --- /dev/null +++ b/arch/s390/kernel/nospec-branch.c @@ -0,0 +1,101 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/module.h> +#include <asm/facility.h> +#include <asm/nospec-branch.h> + +int nospec_call_disable = IS_ENABLED(EXPOLINE_OFF); +int nospec_return_disable = !IS_ENABLED(EXPOLINE_FULL); + +static int __init nospectre_v2_setup_early(char *str) +{ + nospec_call_disable = 1; + nospec_return_disable = 1; + return 0; +} +early_param("nospectre_v2", nospectre_v2_setup_early); + +static int __init spectre_v2_setup_early(char *str) +{ + if (str && !strncmp(str, "on", 2)) { + nospec_call_disable = 0; + nospec_return_disable = 0; + } + if (str && !strncmp(str, "off", 3)) { + nospec_call_disable = 1; + nospec_return_disable = 1; + } + if (str && !strncmp(str, "auto", 4)) { + nospec_call_disable = 0; + nospec_return_disable = 1; + } + return 0; +} +early_param("spectre_v2", spectre_v2_setup_early); + +static void __init_or_module __nospec_revert(s32 *start, s32 *end) +{ + enum { BRCL_EXPOLINE, BRASL_EXPOLINE } type; + u8 *instr, *thunk, *br; + u8 insnbuf[6]; + s32 *epo; + + /* Second part of the instruction replace is always a nop */ + memcpy(insnbuf + 2, (char[]) { 0x47, 0x00, 0x00, 0x00 }, 4); + for (epo = start; epo < end; epo++) { + instr = (u8 *) epo + *epo; + if (instr[0] == 0xc0 && (instr[1] & 0x0f) == 0x04) + type = BRCL_EXPOLINE; /* brcl instruction */ + else if (instr[0] == 0xc0 && (instr[1] & 0x0f) == 0x05) + type = BRASL_EXPOLINE; /* brasl instruction */ + else + continue; + thunk = instr + (*(int *)(instr + 2)) * 2; + if (thunk[0] == 0xc6 && thunk[1] == 0x00) + /* exrl %r0,<target-br> */ + br = thunk + (*(int *)(thunk + 2)) * 2; + else if (thunk[0] == 0xc0 && (thunk[1] & 0x0f) == 0x00 && + thunk[6] == 0x44 && thunk[7] == 0x00 && + (thunk[8] & 0x0f) == 0x00 && thunk[9] == 0x00 && + (thunk[1] & 0xf0) == (thunk[8] & 0xf0)) + /* larl %rx,<target br> + ex %r0,0(%rx) */ + br = thunk + (*(int *)(thunk + 2)) * 2; + else + continue; + if (br[0] != 0x07 || (br[1] & 0xf0) != 0xf0) + continue; + switch (type) { + case BRCL_EXPOLINE: + /* brcl to thunk, replace with br + nop */ + insnbuf[0] = br[0]; + insnbuf[1] = (instr[1] & 0xf0) | (br[1] & 0x0f); + break; + case BRASL_EXPOLINE: + /* brasl to thunk, replace with basr + nop */ + insnbuf[0] = 0x0d; + insnbuf[1] = (instr[1] & 0xf0) | (br[1] & 0x0f); + break; + } + + s390_kernel_write(instr, insnbuf, 6); + } +} + +void __init_or_module nospec_call_revert(s32 *start, s32 *end) +{ + if (nospec_call_disable) + __nospec_revert(start, end); +} + +void __init_or_module nospec_return_revert(s32 *start, s32 *end) +{ + if (nospec_return_disable) + __nospec_revert(start, end); +} + +extern s32 __nospec_call_start[], __nospec_call_end[]; +extern s32 __nospec_return_start[], __nospec_return_end[]; +void __init nospec_init_branches(void) +{ + nospec_call_revert(__nospec_call_start, __nospec_call_end); + nospec_return_revert(__nospec_return_start, __nospec_return_end); +} --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -64,6 +64,7 @@ #include <asm/sysinfo.h> #include <asm/numa.h> #include <asm/alternative.h> +#include <asm/nospec-branch.h> #include "entry.h"
/* @@ -373,6 +374,7 @@ static void __init setup_lowcore(void) #ifdef CONFIG_SMP lc->spinlock_lockval = arch_spin_lockval(0); #endif + lc->br_r1_trampoline = 0x07f1; /* br %r1 */
set_prefix((u32)(unsigned long) lc); lowcore_ptr[0] = lc; @@ -897,6 +899,8 @@ void __init setup_arch(char **cmdline_p) set_preferred_console();
apply_alternative_instructions(); + if (IS_ENABLED(CONFIG_EXPOLINE)) + nospec_init_branches();
/* Setup zfcpdump support */ setup_zfcpdump(); --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -200,6 +200,7 @@ static int pcpu_alloc_lowcore(struct pcp lc->panic_stack = panic_stack + PANIC_FRAME_OFFSET; lc->cpu_nr = cpu; lc->spinlock_lockval = arch_spin_lockval(cpu); + lc->br_r1_trampoline = 0x07f1; /* br %r1 */ if (MACHINE_HAS_VX) lc->vector_save_area_addr = (unsigned long) &lc->vector_save_area; --- a/arch/s390/kernel/vmlinux.lds.S +++ b/arch/s390/kernel/vmlinux.lds.S @@ -101,6 +101,20 @@ SECTIONS *(.altinstr_replacement) }
+ /* + * Table with the patch locations to undo expolines + */ + .nospec_call_table : { + __nospec_call_start = . ; + *(.s390_indirect*) + __nospec_call_end = . ; + } + .nospec_return_table : { + __nospec_return_start = . ; + *(.s390_return*) + __nospec_return_end = . ; + } + /* early.c uses stsi, which requires page aligned data. */ . = ALIGN(PAGE_SIZE); INIT_DATA_SECTION(0x100) --- a/drivers/s390/char/Makefile +++ b/drivers/s390/char/Makefile @@ -2,6 +2,8 @@ # S/390 character devices #
+CFLAGS_REMOVE_sclp_early_core.o += $(CC_FLAGS_EXPOLINE) + obj-y += ctrlchar.o keyboard.o defkeymap.o sclp.o sclp_rw.o sclp_quiesce.o \ sclp_cmd.o sclp_config.o sclp_cpi_sys.o sclp_ocf.o sclp_ctl.o \ sclp_early.o
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
From: Eugeniu Rosca erosca@de.adit-jv.com
[ Upstream commit 2cb370d615e9fbed9e95ed222c2c8f337181aa90 ]
I've accidentally stumbled upon the IS_ENABLED(EXPOLINE_*) lines, which obviously always evaluate to false. Fix this.
Fixes: f19fbd5ed642 ("s390: introduce execute-trampolines for branches") Signed-off-by: Eugeniu Rosca erosca@de.adit-jv.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/nospec-branch.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -3,8 +3,8 @@ #include <asm/facility.h> #include <asm/nospec-branch.h>
-int nospec_call_disable = IS_ENABLED(EXPOLINE_OFF); -int nospec_return_disable = !IS_ENABLED(EXPOLINE_FULL); +int nospec_call_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF); +int nospec_return_disable = !IS_ENABLED(CONFIG_EXPOLINE_FULL);
static int __init nospectre_v2_setup_early(char *str) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit d5feec04fe578c8dbd9e2e1439afc2f0af761ed4 ]
The system call path can be interrupted before the switch back to the standard branch prediction with BPENTER has been done. The critical section cleanup code skips forward to .Lsysc_do_svc and bypasses the BPENTER. In this case the kernel and all subsequent code will run with the limited branch prediction.
Fixes: eacf67eb9b32 ("s390: run user space and KVM guests with modified branch prediction") Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/entry.S | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -1299,7 +1299,8 @@ cleanup_critical: srag %r9,%r9,23 jz 0f mvc __TI_last_break(8,%r12),16(%r11) -0: # set up saved register r11 +0: BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP + # set up saved register r11 lg %r15,__LC_KERNEL_STACK la %r9,STACK_FRAME_OVERHEAD(%r15) stg %r9,24(%r11) # r11 pt_regs pointer
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
From: Christian Borntraeger borntraeger@de.ibm.com
[ Upstream commit d3f468963cd6fd6d2aa5e26aed8b24232096d0e1 ]
when a system call is interrupted we might call the critical section cleanup handler that re-does some of the operations. When we are between .Lsysc_vtime and .Lsysc_do_svc we might also redo the saving of the problem state registers r0-r7:
.Lcleanup_system_call: [...] 0: # update accounting time stamp mvc __LC_LAST_UPDATE_TIMER(8),__LC_SYNC_ENTER_TIMER # set up saved register r11 lg %r15,__LC_KERNEL_STACK la %r9,STACK_FRAME_OVERHEAD(%r15) stg %r9,24(%r11) # r11 pt_regs pointer # fill pt_regs mvc __PT_R8(64,%r9),__LC_SAVE_AREA_SYNC ---> stmg %r0,%r7,__PT_R0(%r9)
The problem is now, that we might have already zeroed out r0. The fix is to move the zeroing of r0 after sysc_do_svc.
Reported-by: Farhan Ali alifm@linux.vnet.ibm.com Fixes: 7041d28115e91 ("s390: scrub registers on kernel entry and KVM exit") Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/entry.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -421,13 +421,13 @@ ENTRY(system_call) UPDATE_VTIME %r10,%r13,__LC_SYNC_ENTER_TIMER BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP stmg %r0,%r7,__PT_R0(%r11) - # clear user controlled register to prevent speculative use - xgr %r0,%r0 mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC stg %r14,__PT_FLAGS(%r11) .Lsysc_do_svc: + # clear user controlled register to prevent speculative use + xgr %r0,%r0 lg %r10,__TI_sysc_table(%r12) # address of system call table llgh %r8,__PT_INT_CODE+2(%r11) slag %r8,%r8,2 # shift and test for svc 0
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit b2e2f43a01bace1a25bdbae04c9f9846882b727a ]
Keep the code for the nobp parameter handling with the code for expolines. Both are related to the spectre v2 mitigation.
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/Makefile | 4 ++-- arch/s390/kernel/alternative.c | 23 ----------------------- arch/s390/kernel/nospec-branch.c | 27 +++++++++++++++++++++++++++ 3 files changed, 29 insertions(+), 25 deletions(-)
--- a/arch/s390/kernel/Makefile +++ b/arch/s390/kernel/Makefile @@ -45,11 +45,11 @@ obj-y += debug.o irq.o ipl.o dis.o diag. obj-y += sysinfo.o jump_label.o lgr.o os_info.o machine_kexec.o pgm_check.o obj-y += runtime_instr.o cache.o dumpstack.o obj-y += entry.o reipl.o relocate_kernel.o alternative.o +obj-y += nospec-branch.o
extra-y += head.o head64.o vmlinux.lds
-obj-$(CONFIG_EXPOLINE) += nospec-branch.o -CFLAGS_REMOVE_expoline.o += $(CC_FLAGS_EXPOLINE) +CFLAGS_REMOVE_nospec-branch.o += $(CC_FLAGS_EXPOLINE)
obj-$(CONFIG_MODULES) += s390_ksyms.o module.o obj-$(CONFIG_SMP) += smp.o --- a/arch/s390/kernel/alternative.c +++ b/arch/s390/kernel/alternative.c @@ -14,29 +14,6 @@ static int __init disable_alternative_in
early_param("noaltinstr", disable_alternative_instructions);
-static int __init nobp_setup_early(char *str) -{ - bool enabled; - int rc; - - rc = kstrtobool(str, &enabled); - if (rc) - return rc; - if (enabled && test_facility(82)) - __set_facility(82, S390_lowcore.alt_stfle_fac_list); - else - __clear_facility(82, S390_lowcore.alt_stfle_fac_list); - return 0; -} -early_param("nobp", nobp_setup_early); - -static int __init nospec_setup_early(char *str) -{ - __clear_facility(82, S390_lowcore.alt_stfle_fac_list); - return 0; -} -early_param("nospec", nospec_setup_early); - struct brcl_insn { u16 opc; s32 disp; --- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -3,6 +3,31 @@ #include <asm/facility.h> #include <asm/nospec-branch.h>
+static int __init nobp_setup_early(char *str) +{ + bool enabled; + int rc; + + rc = kstrtobool(str, &enabled); + if (rc) + return rc; + if (enabled && test_facility(82)) + __set_facility(82, S390_lowcore.alt_stfle_fac_list); + else + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); + return 0; +} +early_param("nobp", nobp_setup_early); + +static int __init nospec_setup_early(char *str) +{ + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); + return 0; +} +early_param("nospec", nospec_setup_early); + +#ifdef CONFIG_EXPOLINE + int nospec_call_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF); int nospec_return_disable = !IS_ENABLED(CONFIG_EXPOLINE_FULL);
@@ -99,3 +124,5 @@ void __init nospec_init_branches(void) nospec_call_revert(__nospec_call_start, __nospec_call_end); nospec_return_revert(__nospec_return_start, __nospec_return_end); } + +#endif /* CONFIG_EXPOLINE */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 6e179d64126b909f0b288fa63cdbf07c531e9b1d ]
Automatically decide between nobp vs. expolines if the spectre_v2=auto kernel parameter is specified or CONFIG_EXPOLINE_AUTO=y is set.
The decision made at boot time due to CONFIG_EXPOLINE_AUTO=y being set can be overruled with the nobp, nospec and spectre_v2 kernel parameters.
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/Kconfig | 2 - arch/s390/Makefile | 2 - arch/s390/include/asm/nospec-branch.h | 6 +-- arch/s390/kernel/alternative.c | 1 arch/s390/kernel/module.c | 11 ++--- arch/s390/kernel/nospec-branch.c | 68 +++++++++++++++++++++------------- 6 files changed, 52 insertions(+), 38 deletions(-)
--- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -742,7 +742,7 @@ choice config EXPOLINE_OFF bool "spectre_v2=off"
-config EXPOLINE_MEDIUM +config EXPOLINE_AUTO bool "spectre_v2=auto"
config EXPOLINE_FULL --- a/arch/s390/Makefile +++ b/arch/s390/Makefile @@ -83,7 +83,7 @@ ifdef CONFIG_EXPOLINE CC_FLAGS_EXPOLINE += -mfunction-return=thunk CC_FLAGS_EXPOLINE += -mindirect-branch-table export CC_FLAGS_EXPOLINE - cflags-y += $(CC_FLAGS_EXPOLINE) + cflags-y += $(CC_FLAGS_EXPOLINE) -DCC_USING_EXPOLINE endif endif
--- a/arch/s390/include/asm/nospec-branch.h +++ b/arch/s390/include/asm/nospec-branch.h @@ -6,12 +6,10 @@
#include <linux/types.h>
-extern int nospec_call_disable; -extern int nospec_return_disable; +extern int nospec_disable;
void nospec_init_branches(void); -void nospec_call_revert(s32 *start, s32 *end); -void nospec_return_revert(s32 *start, s32 *end); +void nospec_revert(s32 *start, s32 *end);
#endif /* __ASSEMBLY__ */
--- a/arch/s390/kernel/alternative.c +++ b/arch/s390/kernel/alternative.c @@ -1,6 +1,7 @@ #include <linux/module.h> #include <asm/alternative.h> #include <asm/facility.h> +#include <asm/nospec-branch.h>
#define MAX_PATCH_LEN (255 - 1)
--- a/arch/s390/kernel/module.c +++ b/arch/s390/kernel/module.c @@ -167,7 +167,7 @@ int module_frob_arch_sections(Elf_Ehdr * me->core_size += me->arch.got_size; me->arch.plt_offset = me->core_size; if (me->arch.plt_size) { - if (IS_ENABLED(CONFIG_EXPOLINE) && !nospec_call_disable) + if (IS_ENABLED(CONFIG_EXPOLINE) && !nospec_disable) me->arch.plt_size += PLT_ENTRY_SIZE; me->core_size += me->arch.plt_size; } @@ -326,8 +326,7 @@ static int apply_rela(Elf_Rela *rela, El info->plt_offset; ip[0] = 0x0d10e310; /* basr 1,0 */ ip[1] = 0x100a0004; /* lg 1,10(1) */ - if (IS_ENABLED(CONFIG_EXPOLINE) && - !nospec_call_disable) { + if (IS_ENABLED(CONFIG_EXPOLINE) && !nospec_disable) { unsigned int *ij; ij = me->module_core + me->arch.plt_offset + @@ -448,7 +447,7 @@ int module_finalize(const Elf_Ehdr *hdr, void *aseg;
if (IS_ENABLED(CONFIG_EXPOLINE) && - !nospec_call_disable && me->arch.plt_size) { + !nospec_disable && me->arch.plt_size) { unsigned int *ij;
ij = me->module_core + me->arch.plt_offset + @@ -475,11 +474,11 @@ int module_finalize(const Elf_Ehdr *hdr,
if (IS_ENABLED(CONFIG_EXPOLINE) && (!strcmp(".nospec_call_table", secname))) - nospec_call_revert(aseg, aseg + s->sh_size); + nospec_revert(aseg, aseg + s->sh_size);
if (IS_ENABLED(CONFIG_EXPOLINE) && (!strcmp(".nospec_return_table", secname))) - nospec_return_revert(aseg, aseg + s->sh_size); + nospec_revert(aseg, aseg + s->sh_size); }
jump_label_apply_nops(me); --- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -11,10 +11,17 @@ static int __init nobp_setup_early(char rc = kstrtobool(str, &enabled); if (rc) return rc; - if (enabled && test_facility(82)) + if (enabled && test_facility(82)) { + /* + * The user explicitely requested nobp=1, enable it and + * disable the expoline support. + */ __set_facility(82, S390_lowcore.alt_stfle_fac_list); - else + if (IS_ENABLED(CONFIG_EXPOLINE)) + nospec_disable = 1; + } else { __clear_facility(82, S390_lowcore.alt_stfle_fac_list); + } return 0; } early_param("nobp", nobp_setup_early); @@ -28,31 +35,46 @@ early_param("nospec", nospec_setup_early
#ifdef CONFIG_EXPOLINE
-int nospec_call_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF); -int nospec_return_disable = !IS_ENABLED(CONFIG_EXPOLINE_FULL); +int nospec_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF);
static int __init nospectre_v2_setup_early(char *str) { - nospec_call_disable = 1; - nospec_return_disable = 1; + nospec_disable = 1; return 0; } early_param("nospectre_v2", nospectre_v2_setup_early);
+static int __init spectre_v2_auto_early(void) +{ + if (IS_ENABLED(CC_USING_EXPOLINE)) { + /* + * The kernel has been compiled with expolines. + * Keep expolines enabled and disable nobp. + */ + nospec_disable = 0; + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); + } + /* + * If the kernel has not been compiled with expolines the + * nobp setting decides what is done, this depends on the + * CONFIG_KERNEL_NP option and the nobp/nospec parameters. + */ + return 0; +} +#ifdef CONFIG_EXPOLINE_AUTO +early_initcall(spectre_v2_auto_early); +#endif + static int __init spectre_v2_setup_early(char *str) { if (str && !strncmp(str, "on", 2)) { - nospec_call_disable = 0; - nospec_return_disable = 0; - } - if (str && !strncmp(str, "off", 3)) { - nospec_call_disable = 1; - nospec_return_disable = 1; - } - if (str && !strncmp(str, "auto", 4)) { - nospec_call_disable = 0; - nospec_return_disable = 1; + nospec_disable = 0; + __clear_facility(82, S390_lowcore.alt_stfle_fac_list); } + if (str && !strncmp(str, "off", 3)) + nospec_disable = 1; + if (str && !strncmp(str, "auto", 4)) + spectre_v2_auto_early(); return 0; } early_param("spectre_v2", spectre_v2_setup_early); @@ -105,15 +127,9 @@ static void __init_or_module __nospec_re } }
-void __init_or_module nospec_call_revert(s32 *start, s32 *end) -{ - if (nospec_call_disable) - __nospec_revert(start, end); -} - -void __init_or_module nospec_return_revert(s32 *start, s32 *end) +void __init_or_module nospec_revert(s32 *start, s32 *end) { - if (nospec_return_disable) + if (nospec_disable) __nospec_revert(start, end); }
@@ -121,8 +137,8 @@ extern s32 __nospec_call_start[], __nosp extern s32 __nospec_return_start[], __nospec_return_end[]; void __init nospec_init_branches(void) { - nospec_call_revert(__nospec_call_start, __nospec_call_end); - nospec_return_revert(__nospec_return_start, __nospec_return_end); + nospec_revert(__nospec_call_start, __nospec_call_end); + nospec_revert(__nospec_return_start, __nospec_return_end); }
#endif /* CONFIG_EXPOLINE */
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit bc035599718412cfba9249aa713f90ef13f13ee9 ]
Add a boot message if either of the spectre defenses is active. The message is "Spectre V2 mitigation: execute trampolines." or "Spectre V2 mitigation: limited branch prediction."
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/nospec-branch.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -33,6 +33,16 @@ static int __init nospec_setup_early(cha } early_param("nospec", nospec_setup_early);
+static int __init nospec_report(void) +{ + if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable) + pr_info("Spectre V2 mitigation: execute trampolines.\n"); + if (__test_facility(82, S390_lowcore.alt_stfle_fac_list)) + pr_info("Spectre V2 mitigation: limited branch prediction.\n"); + return 0; +} +arch_initcall(nospec_report); + #ifdef CONFIG_EXPOLINE
int nospec_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit d424986f1d6b16079b3231db0314923f4f8deed1 ]
Set CONFIG_GENERIC_CPU_VULNERABILITIES and provide the two functions cpu_show_spectre_v1 and cpu_show_spectre_v2 to report the spectre mitigations.
Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/Kconfig | 1 + arch/s390/kernel/nospec-branch.c | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+)
--- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -111,6 +111,7 @@ config S390 select GENERIC_CLOCKEVENTS select GENERIC_CPU_AUTOPROBE select GENERIC_CPU_DEVICES if !SMP + select GENERIC_CPU_VULNERABILITIES select GENERIC_FIND_FIRST_BIT select GENERIC_SMP_IDLE_THREAD select GENERIC_TIME_VSYSCALL --- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #include <linux/module.h> +#include <linux/device.h> #include <asm/facility.h> #include <asm/nospec-branch.h>
@@ -43,6 +44,24 @@ static int __init nospec_report(void) } arch_initcall(nospec_report);
+#ifdef CONFIG_SYSFS +ssize_t cpu_show_spectre_v1(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "Mitigation: __user pointer sanitization\n"); +} + +ssize_t cpu_show_spectre_v2(struct device *dev, + struct device_attribute *attr, char *buf) +{ + if (IS_ENABLED(CC_USING_EXPOLINE) && !nospec_disable) + return sprintf(buf, "Mitigation: execute trampolines\n"); + if (__test_facility(82, S390_lowcore.alt_stfle_fac_list)) + return sprintf(buf, "Mitigation: limited branch prediction.\n"); + return sprintf(buf, "Vulnerable\n"); +} +#endif + #ifdef CONFIG_EXPOLINE
int nospec_disable = IS_ENABLED(CONFIG_EXPOLINE_OFF);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 6a3d1e81a434fc311f224b8be77258bafc18ccc6 ]
With CONFIG_EXPOLINE_AUTO=y the call of spectre_v2_auto_early() via early_initcall is done *after* the early_param functions. This overwrites any settings done with the nobp/no_spectre_v2/spectre_v2 parameters. The code patching for the kernel is done after the evaluation of the early parameters but before the early_initcall is done. The end result is a kernel image that is patched correctly but the kernel modules are not.
Make sure that the nospec auto detection function is called before the early parameters are evaluated and before the code patching is done.
Fixes: 6e179d64126b ("s390: add automatic detection of the spectre defense") Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/nospec-branch.h | 1 + arch/s390/kernel/nospec-branch.c | 8 ++------ arch/s390/kernel/setup.c | 3 +++ 3 files changed, 6 insertions(+), 6 deletions(-)
--- a/arch/s390/include/asm/nospec-branch.h +++ b/arch/s390/include/asm/nospec-branch.h @@ -9,6 +9,7 @@ extern int nospec_disable;
void nospec_init_branches(void); +void nospec_auto_detect(void); void nospec_revert(s32 *start, s32 *end);
#endif /* __ASSEMBLY__ */ --- a/arch/s390/kernel/nospec-branch.c +++ b/arch/s390/kernel/nospec-branch.c @@ -73,7 +73,7 @@ static int __init nospectre_v2_setup_ear } early_param("nospectre_v2", nospectre_v2_setup_early);
-static int __init spectre_v2_auto_early(void) +void __init nospec_auto_detect(void) { if (IS_ENABLED(CC_USING_EXPOLINE)) { /* @@ -88,11 +88,7 @@ static int __init spectre_v2_auto_early( * nobp setting decides what is done, this depends on the * CONFIG_KERNEL_NP option and the nobp/nospec parameters. */ - return 0; } -#ifdef CONFIG_EXPOLINE_AUTO -early_initcall(spectre_v2_auto_early); -#endif
static int __init spectre_v2_setup_early(char *str) { @@ -103,7 +99,7 @@ static int __init spectre_v2_setup_early if (str && !strncmp(str, "off", 3)) nospec_disable = 1; if (str && !strncmp(str, "auto", 4)) - spectre_v2_auto_early(); + nospec_auto_detect(); return 0; } early_param("spectre_v2", spectre_v2_setup_early); --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -846,6 +846,9 @@ void __init setup_arch(char **cmdline_p) init_mm.end_data = (unsigned long) &_edata; init_mm.brk = (unsigned long) &_end;
+ if (IS_ENABLED(CONFIG_EXPOLINE_AUTO)) + nospec_auto_detect(); + parse_early_param(); os_info_init(); setup_ipl();
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Schwidefsky schwidefsky@de.ibm.com
[ Upstream commit 6cf09958f32b9667bb3ebadf74367c791112771b ]
The main linker script vmlinux.lds.S for the kernel image merges the expoline code patch tables into two section ".nospec_call_table" and ".nospec_return_table". This is *not* done for the modules, there the sections retain their original names as generated by gcc: ".s390_indirect_call", ".s390_return_mem" and ".s390_return_reg".
The module_finalize code has to check for the compiler generated section names, otherwise no code patching is done. This slows down the module code in case of "spectre_v2=off".
Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/module.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/s390/kernel/module.c +++ b/arch/s390/kernel/module.c @@ -473,11 +473,11 @@ int module_finalize(const Elf_Ehdr *hdr, apply_alternatives(aseg, aseg + s->sh_size);
if (IS_ENABLED(CONFIG_EXPOLINE) && - (!strcmp(".nospec_call_table", secname))) + (!strncmp(".s390_indirect", secname, 14))) nospec_revert(aseg, aseg + s->sh_size);
if (IS_ENABLED(CONFIG_EXPOLINE) && - (!strcmp(".nospec_return_table", secname))) + (!strncmp(".s390_return", secname, 12))) nospec_revert(aseg, aseg + s->sh_size); }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit ddea788c63094f7c483783265563dd5b50052e28 ]
After Commit 8a8efa22f51b ("bonding: sync netpoll code with bridge"), it would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev if bond->dev->npinfo was set.
However now slave_dev npinfo is set with bond->dev->npinfo before calling slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup(). It causes that the lower dev of this slave dev can't set its npinfo.
One way to reproduce it:
# modprobe bonding # brctl addbr br0 # brctl addif br0 eth1 # ifconfig bond0 192.168.122.1/24 up # ifenslave bond0 eth2 # systemctl restart netconsole # ifenslave bond0 br0 # ifconfig eth2 down # systemctl restart netconsole
The netpoll won't really work.
This patch is to remove that slave_dev npinfo setting in bond_enslave().
Fixes: 8a8efa22f51b ("bonding: sync netpoll code with bridge") Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/bonding/bond_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1614,8 +1614,7 @@ int bond_enslave(struct net_device *bond } /* switch(bond_mode) */
#ifdef CONFIG_NET_POLL_CONTROLLER - slave_dev->npinfo = bond->dev->npinfo; - if (slave_dev->npinfo) { + if (bond->dev->npinfo) { if (slave_enable_netpoll(new_slave)) { netdev_info(bond_dev, "master_dev is using netpoll, but new slave device does not support netpoll\n"); res = -EBUSY;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
[ Upstream commit 9c438d7a3a52dcc2b9ed095cb87d3a5e83cf7e60 ]
Adding a dns_resolver key whose payload contains a very long option name resulted in that string being printed in full. This hit the WARN_ONCE() in set_precision() during the printk(), because printk() only supports a precision of up to 32767 bytes:
precision 1000000 too large WARNING: CPU: 0 PID: 752 at lib/vsprintf.c:2189 vsnprintf+0x4bc/0x5b0
Fix it by limiting option strings (combined name + value) to a much more reasonable 128 bytes. The exact limit is arbitrary, but currently the only recognized option is formatted as "dnserror=%lu" which fits well within this limit.
Also ratelimit the printks.
Reproducer:
perl -e 'print "#", "A" x 1000000, "\x00"' | keyctl padd dns_resolver desc @s
This bug was found using syzkaller.
Reported-by: Mark Rutland mark.rutland@arm.com Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]") Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dns_resolver/dns_key.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
--- a/net/dns_resolver/dns_key.c +++ b/net/dns_resolver/dns_key.c @@ -25,6 +25,7 @@ #include <linux/moduleparam.h> #include <linux/slab.h> #include <linux/string.h> +#include <linux/ratelimit.h> #include <linux/kernel.h> #include <linux/keyctl.h> #include <linux/err.h> @@ -91,9 +92,9 @@ dns_resolver_preparse(struct key_prepars
next_opt = memchr(opt, '#', end - opt) ?: end; opt_len = next_opt - opt; - if (!opt_len) { - printk(KERN_WARNING - "Empty option to dns_resolver key\n"); + if (opt_len <= 0 || opt_len > 128) { + pr_warn_ratelimited("Invalid option length (%d) for dns_resolver key\n", + opt_len); return -EINVAL; }
@@ -127,10 +128,8 @@ dns_resolver_preparse(struct key_prepars }
bad_option_value: - printk(KERN_WARNING - "Option '%*.*s' to dns_resolver key:" - " bad/missing value\n", - opt_nlen, opt_nlen, opt); + pr_warn_ratelimited("Option '%*.*s' to dns_resolver key: bad/missing value\n", + opt_nlen, opt_nlen, opt); return -EINVAL; } while (opt = next_opt + 1, opt < end); }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guillaume Nault g.nault@alphalink.fr
[ Upstream commit eb1c28c05894a4b1f6b56c5bf072205e64cfa280 ]
Check sockaddr_len before dereferencing sp->sa_protocol, to ensure that it actually points to valid data.
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Reported-by: syzbot+a70ac890b23b1bf29f5c@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault g.nault@alphalink.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/l2tp/l2tp_ppp.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -606,6 +606,13 @@ static int pppol2tp_connect(struct socke lock_sock(sk);
error = -EINVAL; + + if (sockaddr_len != sizeof(struct sockaddr_pppol2tp) && + sockaddr_len != sizeof(struct sockaddr_pppol2tpv3) && + sockaddr_len != sizeof(struct sockaddr_pppol2tpin6) && + sockaddr_len != sizeof(struct sockaddr_pppol2tpv3in6)) + goto end; + if (sp->sa_protocol != PX_PROTO_OL2TP) goto end;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 7dd07c143a4b54d050e748bee4b4b9e94a7b1744 ]
Since neigh_dump_table() calls nlmsg_parse() without giving policy constraints, attributes can have arbirary size that we must validate
Reported by syzbot/KMSAN :
BUG: KMSAN: uninit-value in neigh_master_filtered net/core/neighbour.c:2292 [inline] BUG: KMSAN: uninit-value in neigh_dump_table net/core/neighbour.c:2348 [inline] BUG: KMSAN: uninit-value in neigh_dump_info+0x1af0/0x2250 net/core/neighbour.c:2438 CPU: 1 PID: 3575 Comm: syzkaller268891 Not tainted 4.16.0+ #83 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 neigh_master_filtered net/core/neighbour.c:2292 [inline] neigh_dump_table net/core/neighbour.c:2348 [inline] neigh_dump_info+0x1af0/0x2250 net/core/neighbour.c:2438 netlink_dump+0x9ad/0x1540 net/netlink/af_netlink.c:2225 __netlink_dump_start+0x1167/0x12a0 net/netlink/af_netlink.c:2322 netlink_dump_start include/linux/netlink.h:214 [inline] rtnetlink_rcv_msg+0x1435/0x1560 net/core/rtnetlink.c:4598 netlink_rcv_skb+0x355/0x5f0 net/netlink/af_netlink.c:2447 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4653 netlink_unicast_kernel net/netlink/af_netlink.c:1311 [inline] netlink_unicast+0x1672/0x1750 net/netlink/af_netlink.c:1337 netlink_sendmsg+0x1048/0x1310 net/netlink/af_netlink.c:1900 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046 __sys_sendmsg net/socket.c:2080 [inline] SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091 SyS_sendmsg+0x54/0x80 net/socket.c:2087 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x43fed9 RSP: 002b:00007ffddbee2798 EFLAGS: 00000213 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fed9 RDX: 0000000000000000 RSI: 0000000020005000 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8 R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401800 R13: 0000000000401890 R14: 0000000000000000 R15: 0000000000000000
Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2737 [inline] __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:984 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1183 [inline] netlink_sendmsg+0x9a6/0x1310 net/netlink/af_netlink.c:1875 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046 __sys_sendmsg net/socket.c:2080 [inline] SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091 SyS_sendmsg+0x54/0x80 net/socket.c:2087 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: 21fdd092acc7 ("net: Add support for filtering neigh dump by master device") Signed-off-by: Eric Dumazet edumazet@google.com Cc: David Ahern dsa@cumulusnetworks.com Reported-by: syzbot syzkaller@googlegroups.com Acked-by: David Ahern dsa@cumulusnetworks.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/neighbour.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -2280,12 +2280,16 @@ static int neigh_dump_table(struct neigh
err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, NULL); if (!err) { - if (tb[NDA_IFINDEX]) + if (tb[NDA_IFINDEX]) { + if (nla_len(tb[NDA_IFINDEX]) != sizeof(u32)) + return -EINVAL; filter_idx = nla_get_u32(tb[NDA_IFINDEX]); - - if (tb[NDA_MASTER]) + } + if (tb[NDA_MASTER]) { + if (nla_len(tb[NDA_MASTER]) != sizeof(u32)) + return -EINVAL; filter_master_idx = nla_get_u32(tb[NDA_MASTER]); - + } if (filter_idx || filter_master_idx) flags |= NLM_F_DUMP_FILTERED; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit b905ef9ab90115d001c1658259af4b1c65088779 ]
The connection timers of an llc sock could be still flying after we delete them in llc_sk_free(), and even possibly after we free the sock. We could just wait synchronously here in case of troubles.
Note, I leave other call paths as they are, since they may not have to wait, at least we can change them to synchronously when needed.
Also, move the code to net/llc/llc_conn.c, which is apparently a better place.
Reported-by: syzbot+f922284c18ea23a8e457@syzkaller.appspotmail.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/llc_conn.h | 1 + net/llc/llc_c_ac.c | 9 +-------- net/llc/llc_conn.c | 22 +++++++++++++++++++++- 3 files changed, 23 insertions(+), 9 deletions(-)
--- a/include/net/llc_conn.h +++ b/include/net/llc_conn.h @@ -97,6 +97,7 @@ static __inline__ char llc_backlog_type(
struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority, struct proto *prot, int kern); +void llc_sk_stop_all_timers(struct sock *sk, bool sync); void llc_sk_free(struct sock *sk);
void llc_sk_reset(struct sock *sk); --- a/net/llc/llc_c_ac.c +++ b/net/llc/llc_c_ac.c @@ -1096,14 +1096,7 @@ int llc_conn_ac_inc_tx_win_size(struct s
int llc_conn_ac_stop_all_timers(struct sock *sk, struct sk_buff *skb) { - struct llc_sock *llc = llc_sk(sk); - - del_timer(&llc->pf_cycle_timer.timer); - del_timer(&llc->ack_timer.timer); - del_timer(&llc->rej_sent_timer.timer); - del_timer(&llc->busy_state_timer.timer); - llc->ack_must_be_send = 0; - llc->ack_pf = 0; + llc_sk_stop_all_timers(sk, false); return 0; }
--- a/net/llc/llc_conn.c +++ b/net/llc/llc_conn.c @@ -951,6 +951,26 @@ out: return sk; }
+void llc_sk_stop_all_timers(struct sock *sk, bool sync) +{ + struct llc_sock *llc = llc_sk(sk); + + if (sync) { + del_timer_sync(&llc->pf_cycle_timer.timer); + del_timer_sync(&llc->ack_timer.timer); + del_timer_sync(&llc->rej_sent_timer.timer); + del_timer_sync(&llc->busy_state_timer.timer); + } else { + del_timer(&llc->pf_cycle_timer.timer); + del_timer(&llc->ack_timer.timer); + del_timer(&llc->rej_sent_timer.timer); + del_timer(&llc->busy_state_timer.timer); + } + + llc->ack_must_be_send = 0; + llc->ack_pf = 0; +} + /** * llc_sk_free - Frees a LLC socket * @sk - socket to free @@ -963,7 +983,7 @@ void llc_sk_free(struct sock *sk)
llc->state = LLC_CONN_OUT_OF_SVC; /* Stop all (possibly) running timers */ - llc_conn_ac_stop_all_timers(sk, NULL); + llc_sk_stop_all_timers(sk, true); #ifdef DEBUG_LLC_CONN_ALLOC printk(KERN_INFO "%s: unackq=%d, txq=%d\n", __func__, skb_queue_len(&llc->pdu_unack_q),
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jann Horn jannh@google.com
[ Upstream commit 7e5a206ab686f098367b61aca989f5cdfa8114a3 ]
The old code reads the "opsize" variable from out-of-bounds memory (first byte behind the segment) if a broken TCP segment ends directly after an opcode that is neither EOL nor NOP.
The result of the read isn't used for anything, so the worst thing that could theoretically happen is a pagefault; and since the physmap is usually mostly contiguous, even that seems pretty unlikely.
The following C reproducer triggers the uninitialized read - however, you can't actually see anything happen unless you put something like a pr_warn() in tcp_parse_md5sig_option() to print the opsize.
==================================== #define _GNU_SOURCE #include <arpa/inet.h> #include <stdlib.h> #include <errno.h> #include <stdarg.h> #include <net/if.h> #include <linux/if.h> #include <linux/ip.h> #include <linux/tcp.h> #include <linux/in.h> #include <linux/if_tun.h> #include <err.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <string.h> #include <stdio.h> #include <unistd.h> #include <sys/ioctl.h> #include <assert.h>
void systemf(const char *command, ...) { char *full_command; va_list ap; va_start(ap, command); if (vasprintf(&full_command, command, ap) == -1) err(1, "vasprintf"); va_end(ap); printf("systemf: <<<%s>>>\n", full_command); system(full_command); }
char *devname;
int tun_alloc(char *name) { int fd = open("/dev/net/tun", O_RDWR); if (fd == -1) err(1, "open tun dev"); static struct ifreq req = { .ifr_flags = IFF_TUN|IFF_NO_PI }; strcpy(req.ifr_name, name); if (ioctl(fd, TUNSETIFF, &req)) err(1, "TUNSETIFF"); devname = req.ifr_name; printf("device name: %s\n", devname); return fd; }
#define IPADDR(a,b,c,d) (((a)<<0)+((b)<<8)+((c)<<16)+((d)<<24))
void sum_accumulate(unsigned int *sum, void *data, int len) { assert((len&2)==0); for (int i=0; i<len/2; i++) { *sum += ntohs(((unsigned short *)data)[i]); } }
unsigned short sum_final(unsigned int sum) { sum = (sum >> 16) + (sum & 0xffff); sum = (sum >> 16) + (sum & 0xffff); return htons(~sum); }
void fix_ip_sum(struct iphdr *ip) { unsigned int sum = 0; sum_accumulate(&sum, ip, sizeof(*ip)); ip->check = sum_final(sum); }
void fix_tcp_sum(struct iphdr *ip, struct tcphdr *tcp) { unsigned int sum = 0; struct { unsigned int saddr; unsigned int daddr; unsigned char pad; unsigned char proto_num; unsigned short tcp_len; } fakehdr = { .saddr = ip->saddr, .daddr = ip->daddr, .proto_num = ip->protocol, .tcp_len = htons(ntohs(ip->tot_len) - ip->ihl*4) }; sum_accumulate(&sum, &fakehdr, sizeof(fakehdr)); sum_accumulate(&sum, tcp, tcp->doff*4); tcp->check = sum_final(sum); }
int main(void) { int tun_fd = tun_alloc("inject_dev%d"); systemf("ip link set %s up", devname); systemf("ip addr add 192.168.42.1/24 dev %s", devname);
struct { struct iphdr ip; struct tcphdr tcp; unsigned char tcp_opts[20]; } __attribute__((packed)) syn_packet = { .ip = { .ihl = sizeof(struct iphdr)/4, .version = 4, .tot_len = htons(sizeof(syn_packet)), .ttl = 30, .protocol = IPPROTO_TCP, /* FIXUP check */ .saddr = IPADDR(192,168,42,2), .daddr = IPADDR(192,168,42,1) }, .tcp = { .source = htons(1), .dest = htons(1337), .seq = 0x12345678, .doff = (sizeof(syn_packet.tcp)+sizeof(syn_packet.tcp_opts))/4, .syn = 1, .window = htons(64), .check = 0 /*FIXUP*/ }, .tcp_opts = { /* INVALID: trailing MD5SIG opcode after NOPs */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 19 } }; fix_ip_sum(&syn_packet.ip); fix_tcp_sum(&syn_packet.ip, &syn_packet.tcp); while (1) { int write_res = write(tun_fd, &syn_packet, sizeof(syn_packet)); if (write_res != sizeof(syn_packet)) err(1, "packet write failed"); } } ====================================
Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.") Signed-off-by: Jann Horn jannh@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_input.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
--- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3869,11 +3869,8 @@ const u8 *tcp_parse_md5sig_option(const int length = (th->doff << 2) - sizeof(*th); const u8 *ptr = (const u8 *)(th + 1);
- /* If the TCP option is too short, we can short cut */ - if (length < TCPOLEN_MD5SIG) - return NULL; - - while (length > 0) { + /* If not enough data remaining, we can short cut */ + while (length >= TCPOLEN_MD5SIG) { int opcode = *ptr++; int opsize;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 4fb0534fb7bbc2346ba7d3a072b538007f4135a5 ]
When parsing the options provided by the user space, team_nl_cmd_options_set() insert them in a temporary list to send multiple events with a single message. While each option's attribute is correctly validated, the code does not check for duplicate entries before inserting into the event list.
Exploiting the above, the syzbot was able to trigger the following splat:
kernel BUG at lib/list_debug.c:31! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 4466 Comm: syzkaller556835 Not tainted 4.16.0+ #17 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__list_add_valid+0xaa/0xb0 lib/list_debug.c:29 RSP: 0018:ffff8801b04bf248 EFLAGS: 00010286 RAX: 0000000000000058 RBX: ffff8801c8fc7a90 RCX: 0000000000000000 RDX: 0000000000000058 RSI: ffffffff815fbf41 RDI: ffffed0036097e3f RBP: ffff8801b04bf260 R08: ffff8801b0b2a700 R09: ffffed003b604f90 R10: ffffed003b604f90 R11: ffff8801db027c87 R12: ffff8801c8fc7a90 R13: ffff8801c8fc7a90 R14: dffffc0000000000 R15: 0000000000000000 FS: 0000000000b98880(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000043fc30 CR3: 00000001afe8e000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __list_add include/linux/list.h:60 [inline] list_add include/linux/list.h:79 [inline] team_nl_cmd_options_set+0x9ff/0x12b0 drivers/net/team/team.c:2571 genl_family_rcv_msg+0x889/0x1120 net/netlink/genetlink.c:599 genl_rcv_msg+0xc6/0x170 net/netlink/genetlink.c:624 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448 genl_rcv+0x28/0x40 net/netlink/genetlink.c:635 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:639 ___sys_sendmsg+0x805/0x940 net/socket.c:2117 __sys_sendmsg+0x115/0x270 net/socket.c:2155 SYSC_sendmsg net/socket.c:2164 [inline] SyS_sendmsg+0x29/0x30 net/socket.c:2162 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x4458b9 RSP: 002b:00007ffd1d4a7278 EFLAGS: 00000213 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000000001b RCX: 00000000004458b9 RDX: 0000000000000010 RSI: 0000000020000d00 RDI: 0000000000000004 RBP: 00000000004a74ed R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000213 R12: 00007ffd1d4a7348 R13: 0000000000402a60 R14: 0000000000000000 R15: 0000000000000000 Code: 75 e8 eb a9 48 89 f7 48 89 75 e8 e8 d1 85 7b fe 48 8b 75 e8 eb bb 48 89 f2 48 89 d9 4c 89 e6 48 c7 c7 a0 84 d8 87 e8 ea 67 28 fe <0f> 0b 0f 1f 40 00 48 b8 00 00 00 00 00 fc ff df 55 48 89 e5 41 RIP: __list_add_valid+0xaa/0xb0 lib/list_debug.c:29 RSP: ffff8801b04bf248
This changeset addresses the avoiding list_add() if the current option is already present in the event list.
Reported-and-tested-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Fixes: 2fcdb2c9e659 ("team: allow to send multiple set events in one message") Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/team/team.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
--- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -247,6 +247,17 @@ static void __team_option_inst_mark_remo } }
+static bool __team_option_inst_tmp_find(const struct list_head *opts, + const struct team_option_inst *needle) +{ + struct team_option_inst *opt_inst; + + list_for_each_entry(opt_inst, opts, tmp_list) + if (opt_inst == needle) + return true; + return false; +} + static int __team_options_register(struct team *team, const struct team_option *option, size_t option_count) @@ -2544,6 +2555,14 @@ static int team_nl_cmd_options_set(struc if (err) goto team_put; opt_inst->changed = true; + + /* dumb/evil user-space can send us duplicate opt, + * keep only the last one + */ + if (__team_option_inst_tmp_find(&opt_inst_list, + opt_inst)) + continue; + list_add(&opt_inst->tmp_list, &opt_inst_list); } if (!opt_found) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
[ Upstream commit a6361f0ca4b25460f2cdf3235ebe8115f622901e ]
Updates to the bitfields in struct packet_sock are not atomic. Serialize these read-modify-write cycles.
Move po->running into a separate variable. Its writes are protected by po->bind_lock (except for one startup case at packet_create). Also replace a textual precondition warning with lockdep annotation.
All others are set only in packet_setsockopt. Serialize these updates by holding the socket lock. Analogous to other field updates, also hold the lock when testing whether a ring is active (pg_vec).
Fixes: 8dc419447415 ("[PACKET]: Add optional checksum computation for recvmsg") Reported-by: DaeRyong Jeong threeearcat@gmail.com Reported-by: Byoungyoung Lee byoungyoung@purdue.edu Signed-off-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 60 +++++++++++++++++++++++++++++++++++-------------- net/packet/internal.h | 10 ++++---- 2 files changed, 49 insertions(+), 21 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -332,11 +332,11 @@ static void packet_pick_tx_queue(struct skb_set_queue_mapping(skb, queue_index); }
-/* register_prot_hook must be invoked with the po->bind_lock held, +/* __register_prot_hook must be invoked through register_prot_hook * or from a context in which asynchronous accesses to the packet * socket is not possible (packet_create()). */ -static void register_prot_hook(struct sock *sk) +static void __register_prot_hook(struct sock *sk) { struct packet_sock *po = pkt_sk(sk);
@@ -351,8 +351,13 @@ static void register_prot_hook(struct so } }
-/* {,__}unregister_prot_hook() must be invoked with the po->bind_lock - * held. If the sync parameter is true, we will temporarily drop +static void register_prot_hook(struct sock *sk) +{ + lockdep_assert_held_once(&pkt_sk(sk)->bind_lock); + __register_prot_hook(sk); +} + +/* If the sync parameter is true, we will temporarily drop * the po->bind_lock and do a synchronize_net to make sure no * asynchronous packet processing paths still refer to the elements * of po->prot_hook. If the sync parameter is false, it is the @@ -362,6 +367,8 @@ static void __unregister_prot_hook(struc { struct packet_sock *po = pkt_sk(sk);
+ lockdep_assert_held_once(&po->bind_lock); + po->running = 0;
if (po->fanout) @@ -3134,7 +3141,7 @@ static int packet_create(struct net *net
if (proto) { po->prot_hook.type = proto; - register_prot_hook(sk); + __register_prot_hook(sk); }
mutex_lock(&net->packet.sklist_lock); @@ -3653,12 +3660,18 @@ packet_setsockopt(struct socket *sock, i
if (optlen != sizeof(val)) return -EINVAL; - if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) - return -EBUSY; if (copy_from_user(&val, optval, sizeof(val))) return -EFAULT; - po->tp_loss = !!val; - return 0; + + lock_sock(sk); + if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) { + ret = -EBUSY; + } else { + po->tp_loss = !!val; + ret = 0; + } + release_sock(sk); + return ret; } case PACKET_AUXDATA: { @@ -3669,7 +3682,9 @@ packet_setsockopt(struct socket *sock, i if (copy_from_user(&val, optval, sizeof(val))) return -EFAULT;
+ lock_sock(sk); po->auxdata = !!val; + release_sock(sk); return 0; } case PACKET_ORIGDEV: @@ -3681,7 +3696,9 @@ packet_setsockopt(struct socket *sock, i if (copy_from_user(&val, optval, sizeof(val))) return -EFAULT;
+ lock_sock(sk); po->origdev = !!val; + release_sock(sk); return 0; } case PACKET_VNET_HDR: @@ -3690,15 +3707,20 @@ packet_setsockopt(struct socket *sock, i
if (sock->type != SOCK_RAW) return -EINVAL; - if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) - return -EBUSY; if (optlen < sizeof(val)) return -EINVAL; if (copy_from_user(&val, optval, sizeof(val))) return -EFAULT;
- po->has_vnet_hdr = !!val; - return 0; + lock_sock(sk); + if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) { + ret = -EBUSY; + } else { + po->has_vnet_hdr = !!val; + ret = 0; + } + release_sock(sk); + return ret; } case PACKET_TIMESTAMP: { @@ -3736,11 +3758,17 @@ packet_setsockopt(struct socket *sock, i
if (optlen != sizeof(val)) return -EINVAL; - if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) - return -EBUSY; if (copy_from_user(&val, optval, sizeof(val))) return -EFAULT; - po->tp_tx_has_off = !!val; + + lock_sock(sk); + if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) { + ret = -EBUSY; + } else { + po->tp_tx_has_off = !!val; + ret = 0; + } + release_sock(sk); return 0; } case PACKET_QDISC_BYPASS: --- a/net/packet/internal.h +++ b/net/packet/internal.h @@ -109,10 +109,12 @@ struct packet_sock { int copy_thresh; spinlock_t bind_lock; struct mutex pg_vec_lock; - unsigned int running:1, /* prot_hook is attached*/ - auxdata:1, + unsigned int running; /* bind_lock must be held */ + unsigned int auxdata:1, /* writer must hold sock lock */ origdev:1, - has_vnet_hdr:1; + has_vnet_hdr:1, + tp_loss:1, + tp_tx_has_off:1; int pressure; int ifindex; /* bound device */ __be16 num; @@ -122,8 +124,6 @@ struct packet_sock { enum tpacket_versions tp_version; unsigned int tp_hdrlen; unsigned int tp_reserve; - unsigned int tp_loss:1; - unsigned int tp_tx_has_off:1; unsigned int tp_tstamp; struct net_device __rcu *cached_dev; int (*xmit)(struct sk_buff *skb);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guillaume Nault g.nault@alphalink.fr
[ Upstream commit a49e2f5d5fb141884452ddb428f551b123d436b5 ]
We must validate sockaddr_len, otherwise userspace can pass fewer data than we expect and we end up accessing invalid data.
Fixes: 224cf5ad14c0 ("ppp: Move the PPP drivers") Reported-by: syzbot+4f03bdf92fdf9ef5ddab@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault g.nault@alphalink.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ppp/pppoe.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/net/ppp/pppoe.c +++ b/drivers/net/ppp/pppoe.c @@ -638,6 +638,10 @@ static int pppoe_connect(struct socket * lock_sock(sk);
error = -EINVAL; + + if (sockaddr_len != sizeof(struct sockaddr_pppox)) + goto end; + if (sp->sa_protocol != PX_PROTO_OE) goto end;
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp
[ Upstream commit 7ce2367254e84753bceb07327aaf5c953cfce117 ]
Syzkaller spotted an old bug which leads to reading skb beyond tail by 4 bytes on vlan tagged packets. This is caused because skb_vlan_tagged_multi() did not check skb_headlen.
BUG: KMSAN: uninit-value in eth_type_vlan include/linux/if_vlan.h:283 [inline] BUG: KMSAN: uninit-value in skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline] BUG: KMSAN: uninit-value in vlan_features_check include/linux/if_vlan.h:672 [inline] BUG: KMSAN: uninit-value in dflt_features_check net/core/dev.c:2949 [inline] BUG: KMSAN: uninit-value in netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009 CPU: 1 PID: 3582 Comm: syzkaller435149 Not tainted 4.16.0+ #82 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 eth_type_vlan include/linux/if_vlan.h:283 [inline] skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline] vlan_features_check include/linux/if_vlan.h:672 [inline] dflt_features_check net/core/dev.c:2949 [inline] netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009 validate_xmit_skb+0x89/0x1320 net/core/dev.c:3084 __dev_queue_xmit+0x1cb2/0x2b60 net/core/dev.c:3549 dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590 packet_snd net/packet/af_packet.c:2944 [inline] packet_sendmsg+0x7c57/0x8a10 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] sock_write_iter+0x3b9/0x470 net/socket.c:909 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776 do_iter_write+0x30d/0xd40 fs/read_write.c:932 vfs_writev fs/read_write.c:977 [inline] do_writev+0x3c9/0x830 fs/read_write.c:1012 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085 SyS_writev+0x56/0x80 fs/read_write.c:1082 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x43ffa9 RSP: 002b:00007fff2cff3948 EFLAGS: 00000217 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ffa9 RDX: 0000000000000001 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00000000006cb018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004018d0 R13: 0000000000401960 R14: 0000000000000000 R15: 0000000000000000
Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2737 [inline] __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:984 [inline] alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234 sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085 packet_alloc_skb net/packet/af_packet.c:2803 [inline] packet_snd net/packet/af_packet.c:2894 [inline] packet_sendmsg+0x6444/0x8a10 net/packet/af_packet.c:2969 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] sock_write_iter+0x3b9/0x470 net/socket.c:909 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776 do_iter_write+0x30d/0xd40 fs/read_write.c:932 vfs_writev fs/read_write.c:977 [inline] do_writev+0x3c9/0x830 fs/read_write.c:1012 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085 SyS_writev+0x56/0x80 fs/read_write.c:1082 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: 58e998c6d239 ("offloading: Force software GSO for multiple vlan tags.") Reported-and-tested-by: syzbot+0bbe42c764feafa82c5a@syzkaller.appspotmail.com Signed-off-by: Toshiaki Makita makita.toshiaki@lab.ntt.co.jp Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/if_vlan.h | 7 +++++-- net/core/dev.c | 2 +- 2 files changed, 6 insertions(+), 3 deletions(-)
--- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -585,7 +585,7 @@ static inline bool skb_vlan_tagged(const * Returns true if the skb is tagged with multiple vlan headers, regardless * of whether it is hardware accelerated or not. */ -static inline bool skb_vlan_tagged_multi(const struct sk_buff *skb) +static inline bool skb_vlan_tagged_multi(struct sk_buff *skb) { __be16 protocol = skb->protocol;
@@ -596,6 +596,9 @@ static inline bool skb_vlan_tagged_multi protocol != htons(ETH_P_8021AD))) return false;
+ if (unlikely(!pskb_may_pull(skb, VLAN_ETH_HLEN))) + return false; + veh = (struct vlan_ethhdr *)skb->data; protocol = veh->h_vlan_encapsulated_proto; } @@ -613,7 +616,7 @@ static inline bool skb_vlan_tagged_multi * * Returns features without unsafe ones if the skb has multiple tags. */ -static inline netdev_features_t vlan_features_check(const struct sk_buff *skb, +static inline netdev_features_t vlan_features_check(struct sk_buff *skb, netdev_features_t features) { if (skb_vlan_tagged_multi(skb)) { --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2706,7 +2706,7 @@ netdev_features_t passthru_features_chec } EXPORT_SYMBOL(passthru_features_check);
-static netdev_features_t dflt_features_check(const struct sk_buff *skb, +static netdev_features_t dflt_features_check(struct sk_buff *skb, struct net_device *dev, netdev_features_t features) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 1071ec9d453a38023579714b64a951a2fb982071 ]
pf->cmp_addr() is called before binding a v6 address to the sock. It should not check ports, like in sctp_inet_cmp_addr.
But sctp_inet6_cmp_addr checks the addr by invoking af(6)->cmp_addr, sctp_v6_cmp_addr where it also compares the ports.
This would cause that setsockopt(SCTP_SOCKOPT_BINDX_ADD) could bind multiple duplicated IPv6 addresses after Commit 40b4f0fd74e4 ("sctp: lack the check for ports in sctp_v6_cmp_addr").
This patch is to remove af->cmp_addr called in sctp_inet6_cmp_addr, but do the proper check for both v6 addrs and v4mapped addrs.
v1->v2: - define __sctp_v6_cmp_addr to do the common address comparison used for both pf and af v6 cmp_addr.
Fixes: 40b4f0fd74e4 ("sctp: lack the check for ports in sctp_v6_cmp_addr") Reported-by: Jianwen Ji jiji@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/ipv6.c | 60 ++++++++++++++++++++++++++++---------------------------- 1 file changed, 30 insertions(+), 30 deletions(-)
--- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -519,46 +519,49 @@ static void sctp_v6_to_addr(union sctp_a addr->v6.sin6_scope_id = 0; }
-/* Compare addresses exactly. - * v4-mapped-v6 is also in consideration. - */ -static int sctp_v6_cmp_addr(const union sctp_addr *addr1, - const union sctp_addr *addr2) +static int __sctp_v6_cmp_addr(const union sctp_addr *addr1, + const union sctp_addr *addr2) { if (addr1->sa.sa_family != addr2->sa.sa_family) { if (addr1->sa.sa_family == AF_INET && addr2->sa.sa_family == AF_INET6 && - ipv6_addr_v4mapped(&addr2->v6.sin6_addr)) { - if (addr2->v6.sin6_port == addr1->v4.sin_port && - addr2->v6.sin6_addr.s6_addr32[3] == - addr1->v4.sin_addr.s_addr) - return 1; - } + ipv6_addr_v4mapped(&addr2->v6.sin6_addr) && + addr2->v6.sin6_addr.s6_addr32[3] == + addr1->v4.sin_addr.s_addr) + return 1; + if (addr2->sa.sa_family == AF_INET && addr1->sa.sa_family == AF_INET6 && - ipv6_addr_v4mapped(&addr1->v6.sin6_addr)) { - if (addr1->v6.sin6_port == addr2->v4.sin_port && - addr1->v6.sin6_addr.s6_addr32[3] == - addr2->v4.sin_addr.s_addr) - return 1; - } + ipv6_addr_v4mapped(&addr1->v6.sin6_addr) && + addr1->v6.sin6_addr.s6_addr32[3] == + addr2->v4.sin_addr.s_addr) + return 1; + return 0; } - if (addr1->v6.sin6_port != addr2->v6.sin6_port) - return 0; + if (!ipv6_addr_equal(&addr1->v6.sin6_addr, &addr2->v6.sin6_addr)) return 0; + /* If this is a linklocal address, compare the scope_id. */ - if (ipv6_addr_type(&addr1->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) { - if (addr1->v6.sin6_scope_id && addr2->v6.sin6_scope_id && - (addr1->v6.sin6_scope_id != addr2->v6.sin6_scope_id)) { - return 0; - } - } + if ((ipv6_addr_type(&addr1->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) && + addr1->v6.sin6_scope_id && addr2->v6.sin6_scope_id && + addr1->v6.sin6_scope_id != addr2->v6.sin6_scope_id) + return 0;
return 1; }
+/* Compare addresses exactly. + * v4-mapped-v6 is also in consideration. + */ +static int sctp_v6_cmp_addr(const union sctp_addr *addr1, + const union sctp_addr *addr2) +{ + return __sctp_v6_cmp_addr(addr1, addr2) && + addr1->v6.sin6_port == addr2->v6.sin6_port; +} + /* Initialize addr struct to INADDR_ANY. */ static void sctp_v6_inaddr_any(union sctp_addr *addr, __be16 port) { @@ -843,8 +846,8 @@ static int sctp_inet6_cmp_addr(const uni const union sctp_addr *addr2, struct sctp_sock *opt) { - struct sctp_af *af1, *af2; struct sock *sk = sctp_opt2sk(opt); + struct sctp_af *af1, *af2;
af1 = sctp_get_af_specific(addr1->sa.sa_family); af2 = sctp_get_af_specific(addr2->sa.sa_family); @@ -860,10 +863,7 @@ static int sctp_inet6_cmp_addr(const uni if (sctp_is_any(sk, addr1) || sctp_is_any(sk, addr2)) return 1;
- if (addr1->sa.sa_family != addr2->sa.sa_family) - return 0; - - return af1->cmp_addr(addr1, addr2); + return __sctp_v6_cmp_addr(addr1, addr2); }
/* Verify that the provided sockaddr looks bindable. Common verification,
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit f7e43672683b097bb074a8fe7af9bc600a23f231 ]
syzbot reported we still access llc->sap in llc_backlog_rcv() after it is freed in llc_sap_remove_socket():
Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430 llc_conn_ac_send_sabme_cmd_p_set_x+0x3a8/0x460 net/llc/llc_c_ac.c:785 llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline] llc_conn_service net/llc/llc_conn.c:400 [inline] llc_conn_state_process+0x4e1/0x13a0 net/llc/llc_conn.c:75 llc_backlog_rcv+0x195/0x1e0 net/llc/llc_conn.c:891 sk_backlog_rcv include/net/sock.h:909 [inline] __release_sock+0x12f/0x3a0 net/core/sock.c:2335 release_sock+0xa4/0x2b0 net/core/sock.c:2850 llc_ui_release+0xc8/0x220 net/llc/af_llc.c:204
llc->sap is refcount'ed and llc_sap_remove_socket() is paired with llc_sap_add_socket(). This can be amended by holding its refcount before llc_sap_remove_socket() and releasing it after release_sock().
Reported-by: syzbot+6e181fc95081c2cf9051@syzkaller.appspotmail.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/llc/af_llc.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/net/llc/af_llc.c +++ b/net/llc/af_llc.c @@ -187,6 +187,7 @@ static int llc_ui_release(struct socket { struct sock *sk = sock->sk; struct llc_sock *llc; + struct llc_sap *sap;
if (unlikely(sk == NULL)) goto out; @@ -197,9 +198,15 @@ static int llc_ui_release(struct socket llc->laddr.lsap, llc->daddr.lsap); if (!llc_send_disc(sk)) llc_ui_wait_for_disc(sk, sk->sk_rcvtimeo); + sap = llc->sap; + /* Hold this for release_sock(), so that llc_backlog_rcv() could still + * use it. + */ + llc_sap_hold(sap); if (!sock_flag(sk, SOCK_ZAPPED)) llc_sap_remove_socket(llc->sap, sk); release_sock(sk); + llc_sap_put(sap); if (llc->dev) dev_put(llc->dev); sock_put(sk);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 3a04ce7130a7e5dad4e78d45d50313747f8c830f ]
For SOCK_ZAPPED socket, we don't need to care about llc->sap, so we should just skip these refcount functions in this case.
Fixes: f7e43672683b ("llc: hold llc_sap before release_sock()") Reported-by: kernel test robot lkp@intel.com Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/llc/af_llc.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-)
--- a/net/llc/af_llc.c +++ b/net/llc/af_llc.c @@ -187,7 +187,6 @@ static int llc_ui_release(struct socket { struct sock *sk = sock->sk; struct llc_sock *llc; - struct llc_sap *sap;
if (unlikely(sk == NULL)) goto out; @@ -198,15 +197,19 @@ static int llc_ui_release(struct socket llc->laddr.lsap, llc->daddr.lsap); if (!llc_send_disc(sk)) llc_ui_wait_for_disc(sk, sk->sk_rcvtimeo); - sap = llc->sap; - /* Hold this for release_sock(), so that llc_backlog_rcv() could still - * use it. - */ - llc_sap_hold(sap); - if (!sock_flag(sk, SOCK_ZAPPED)) + if (!sock_flag(sk, SOCK_ZAPPED)) { + struct llc_sap *sap = llc->sap; + + /* Hold this for release_sock(), so that llc_backlog_rcv() + * could still use it. + */ + llc_sap_hold(sap); llc_sap_remove_socket(llc->sap, sk); - release_sock(sk); - llc_sap_put(sap); + release_sock(sk); + llc_sap_put(sap); + } else { + release_sock(sk); + } if (llc->dev) dev_put(llc->dev); sock_put(sk);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit ec518f21cb1a1b1f8a516499ea05c60299e04963 ]
Before syzbot/KMSAN bites, add the missing policy for TIPC_NLA_NET_ADDR
Fixes: 27c21416727a ("tipc: add net set to new netlink api") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Jon Maloy jon.maloy@ericsson.com Cc: Ying Xue ying.xue@windriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tipc/net.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/tipc/net.c +++ b/net/tipc/net.c @@ -44,7 +44,8 @@
static const struct nla_policy tipc_nl_net_policy[TIPC_NLA_NET_MAX + 1] = { [TIPC_NLA_NET_UNSPEC] = { .type = NLA_UNSPEC }, - [TIPC_NLA_NET_ID] = { .type = NLA_U32 } + [TIPC_NLA_NET_ID] = { .type = NLA_U32 }, + [TIPC_NLA_NET_ADDR] = { .type = NLA_U32 }, };
/*
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wolfgang Bumiller w.bumiller@proxmox.com
[ Upstream commit 53b76cdf7e8fecec1d09e38aad2f8579882591a8 ]
When coming from ndisc_netdev_event() in net/ipv6/ndisc.c, neigh_ifdown() is called with &nd_tbl, locking this while clearing the proxy neighbor entries when eg. deleting an interface. Calling the table's pndisc_destructor() with the lock still held, however, can cause a deadlock: When a multicast listener is available an IGMP packet of type ICMPV6_MGM_REDUCTION may be sent out. When reaching ip6_finish_output2(), if no neighbor entry for the target address is found, __neigh_create() is called with &nd_tbl, which it'll want to lock.
Move the elements into their own list, then unlock the table and perform the destruction.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199289 Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().") Signed-off-by: Wolfgang Bumiller w.bumiller@proxmox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/neighbour.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-)
--- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -54,7 +54,8 @@ do { \ static void neigh_timer_handler(unsigned long arg); static void __neigh_notify(struct neighbour *n, int type, int flags); static void neigh_update_notify(struct neighbour *neigh); -static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev); +static int pneigh_ifdown_and_unlock(struct neigh_table *tbl, + struct net_device *dev);
#ifdef CONFIG_PROC_FS static const struct file_operations neigh_stat_seq_fops; @@ -254,8 +255,7 @@ int neigh_ifdown(struct neigh_table *tbl { write_lock_bh(&tbl->lock); neigh_flush_dev(tbl, dev); - pneigh_ifdown(tbl, dev); - write_unlock_bh(&tbl->lock); + pneigh_ifdown_and_unlock(tbl, dev);
del_timer_sync(&tbl->proxy_timer); pneigh_queue_purge(&tbl->proxy_queue); @@ -645,9 +645,10 @@ int pneigh_delete(struct neigh_table *tb return -ENOENT; }
-static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev) +static int pneigh_ifdown_and_unlock(struct neigh_table *tbl, + struct net_device *dev) { - struct pneigh_entry *n, **np; + struct pneigh_entry *n, **np, *freelist = NULL; u32 h;
for (h = 0; h <= PNEIGH_HASHMASK; h++) { @@ -655,16 +656,23 @@ static int pneigh_ifdown(struct neigh_ta while ((n = *np) != NULL) { if (!dev || n->dev == dev) { *np = n->next; - if (tbl->pdestructor) - tbl->pdestructor(n); - if (n->dev) - dev_put(n->dev); - kfree(n); + n->next = freelist; + freelist = n; continue; } np = &n->next; } } + write_unlock_bh(&tbl->lock); + while ((n = freelist)) { + freelist = n->next; + n->next = NULL; + if (tbl->pdestructor) + tbl->pdestructor(n); + if (n->dev) + dev_put(n->dev); + kfree(n); + } return -ENOENT; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 7212303268918b9a203aebeacfdbd83b5e87b20d ]
syzbot/KMSAN reported an uninit-value in tcp_parse_options() [1]
I believe this was caused by a TCP_MD5SIG being set on live flow.
This is highly unexpected, since TCP option space is limited.
For instance, presence of TCP MD5 option automatically disables TCP TimeStamp option at SYN/SYNACK time, which we can not do once flow has been established.
Really, adding/deleting an MD5 key only makes sense on sockets in CLOSE or LISTEN state.
[1] BUG: KMSAN: uninit-value in tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720 CPU: 1 PID: 6177 Comm: syzkaller192004 Not tainted 4.16.0+ #83 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720 tcp_fast_parse_options net/ipv4/tcp_input.c:3858 [inline] tcp_validate_incoming+0x4f1/0x2790 net/ipv4/tcp_input.c:5184 tcp_rcv_established+0xf60/0x2bb0 net/ipv4/tcp_input.c:5453 tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469 sk_backlog_rcv include/net/sock.h:908 [inline] __release_sock+0x2d6/0x680 net/core/sock.c:2271 release_sock+0x97/0x2a0 net/core/sock.c:2786 tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464 inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747 SyS_sendto+0x8a/0xb0 net/socket.c:1715 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x448fe9 RSP: 002b:00007fd472c64d38 EFLAGS: 00000216 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000006e5a30 RCX: 0000000000448fe9 RDX: 000000000000029f RSI: 0000000020a88f88 RDI: 0000000000000004 RBP: 00000000006e5a34 R08: 0000000020e68000 R09: 0000000000000010 R10: 00000000200007fd R11: 0000000000000216 R12: 0000000000000000 R13: 00007fff074899ef R14: 00007fd472c659c0 R15: 0000000000000009
Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2737 [inline] __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:984 [inline] tcp_send_ack+0x18c/0x910 net/ipv4/tcp_output.c:3624 __tcp_ack_snd_check net/ipv4/tcp_input.c:5040 [inline] tcp_ack_snd_check net/ipv4/tcp_input.c:5053 [inline] tcp_rcv_established+0x2103/0x2bb0 net/ipv4/tcp_input.c:5469 tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469 sk_backlog_rcv include/net/sock.h:908 [inline] __release_sock+0x2d6/0x680 net/core/sock.c:2271 release_sock+0x97/0x2a0 net/core/sock.c:2786 tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464 inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747 SyS_sendto+0x8a/0xb0 net/socket.c:1715 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Acked-by: Yuchung Cheng ycheng@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2589,8 +2589,10 @@ static int do_tcp_setsockopt(struct sock
#ifdef CONFIG_TCP_MD5SIG case TCP_MD5SIG: - /* Read the IP->Key mappings from userspace */ - err = tp->af_specific->md5_parse(sk, optval, optlen); + if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)) + err = tp->af_specific->md5_parse(sk, optval, optlen); + else + err = -EINVAL; break; #endif case TCP_USER_TIMEOUT:
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 5171b37d959641bbc619781caf62e61f7b940871 ]
In order to remove the race caught by syzbot [1], we need to lock the socket before using po->tp_version as this could change under us otherwise.
This means lock_sock() and release_sock() must be done by packet_set_ring() callers.
[1] : BUG: KMSAN: uninit-value in packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249 CPU: 0 PID: 20195 Comm: syzkaller707632 Not tainted 4.16.0+ #83 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249 packet_setsockopt+0x12c6/0x5a90 net/packet/af_packet.c:3662 SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849 SyS_setsockopt+0x76/0xa0 net/socket.c:1828 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x449099 RSP: 002b:00007f42b5307ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 000000000070003c RCX: 0000000000449099 RDX: 0000000000000005 RSI: 0000000000000107 RDI: 0000000000000003 RBP: 0000000000700038 R08: 000000000000001c R09: 0000000000000000 R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000080eecf R14: 00007f42b53089c0 R15: 0000000000000001
Local variable description: ----req_u@packet_setsockopt Variable was created at: packet_setsockopt+0x13f/0x5a90 net/packet/af_packet.c:3612 SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2899,6 +2899,7 @@ static int packet_release(struct socket
packet_flush_mclist(sk);
+ lock_sock(sk); if (po->rx_ring.pg_vec) { memset(&req_u, 0, sizeof(req_u)); packet_set_ring(sk, &req_u, 1, 0); @@ -2908,6 +2909,7 @@ static int packet_release(struct socket memset(&req_u, 0, sizeof(req_u)); packet_set_ring(sk, &req_u, 1, 1); } + release_sock(sk);
f = fanout_release(sk);
@@ -3577,6 +3579,7 @@ packet_setsockopt(struct socket *sock, i union tpacket_req_u req_u; int len;
+ lock_sock(sk); switch (po->tp_version) { case TPACKET_V1: case TPACKET_V2: @@ -3587,14 +3590,21 @@ packet_setsockopt(struct socket *sock, i len = sizeof(req_u.req3); break; } - if (optlen < len) - return -EINVAL; - if (pkt_sk(sk)->has_vnet_hdr) - return -EINVAL; - if (copy_from_user(&req_u.req, optval, len)) - return -EFAULT; - return packet_set_ring(sk, &req_u, 0, - optname == PACKET_TX_RING); + if (optlen < len) { + ret = -EINVAL; + } else { + if (pkt_sk(sk)->has_vnet_hdr) { + ret = -EINVAL; + } else { + if (copy_from_user(&req_u.req, optval, len)) + ret = -EFAULT; + else + ret = packet_set_ring(sk, &req_u, 0, + optname == PACKET_TX_RING); + } + } + release_sock(sk); + return ret; } case PACKET_COPY_THRESH: { @@ -4144,7 +4154,6 @@ static int packet_set_ring(struct sock * /* Added to avoid minimal code churn */ struct tpacket_req *req = &req_u->req;
- lock_sock(sk); /* Opening a Tx-ring is NOT supported in TPACKET_V3 */ if (!closing && tx_ring && (po->tp_version > TPACKET_V2)) { WARN(1, "Tx-ring is not supported.\n"); @@ -4280,7 +4289,6 @@ static int packet_set_ring(struct sock * if (pg_vec) free_pg_vec(pg_vec, order, req->tp_block_nr); out: - release_sock(sk); return err; }
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit aa8f8778493c85fff480cdf8b349b1e1dcb5f243 ]
KMSAN reported use of uninit-value that I tracked to lack of proper size check on RTA_TABLE attribute.
I also believe RTA_PREFSRC lacks a similar check.
Fixes: 86872cb57925 ("[IPv6] route: FIB6 configuration using struct fib6_config") Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Acked-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/route.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2711,6 +2711,7 @@ void rt6_mtu_change(struct net_device *d
static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = { [RTA_GATEWAY] = { .len = sizeof(struct in6_addr) }, + [RTA_PREFSRC] = { .len = sizeof(struct in6_addr) }, [RTA_OIF] = { .type = NLA_U32 }, [RTA_IIF] = { .type = NLA_U32 }, [RTA_PRIORITY] = { .type = NLA_U32 }, @@ -2719,6 +2720,7 @@ static const struct nla_policy rtm_ipv6_ [RTA_PREF] = { .type = NLA_U8 }, [RTA_ENCAP_TYPE] = { .type = NLA_U16 }, [RTA_ENCAP] = { .type = NLA_NESTED }, + [RTA_TABLE] = { .type = NLA_U32 }, };
static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin K. Petersen martin.petersen@oracle.com
commit 94e5395d2403c8bc2504a7cbe4c4caaacb7b8b84 upstream.
First generation MPT Fusion controllers can not translate WRITE SAME when the attached device is a SATA drive. Disable WRITE SAME support.
Reported-by: Nikola Ciprich nikola.ciprich@linuxbox.cz Cc: stable@vger.kernel.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/message/fusion/mptsas.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -1994,6 +1994,7 @@ static struct scsi_host_template mptsas_ .cmd_per_lun = 7, .use_clustering = ENABLE_CLUSTERING, .shost_attrs = mptscsih_host_attrs, + .no_write_same = 1, };
static int mptsas_get_linkerrors(struct sas_phy *phy)
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@oracle.com
commit 9de4ee40547fd315d4a0ed1dd15a2fa3559ad707 upstream.
This cast is wrong. "cdi->capacity" is an int and "arg" is an unsigned long. The way the check is written now, if one of the high 32 bits is set then we could read outside the info->slots[] array.
This bug is pretty old and it predates git.
Reviewed-by: Christoph Hellwig hch@lst.de Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/cdrom/cdrom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -2358,7 +2358,7 @@ static int cdrom_ioctl_media_changed(str if (!CDROM_CAN(CDC_SELECT_DISC) || arg == CDSL_CURRENT) return media_changed(cdi, 1);
- if ((unsigned int)arg >= cdi->capacity) + if (arg >= cdi->capacity) return -EINVAL;
info = kmalloc(sizeof(*info), GFP_KERNEL);
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Ott sebott@linux.ibm.com
commit af2e460ade0b0180d0f3812ca4f4f59cc9597f3e upstream.
Channel path descriptors have been seen as something stable (as long as the chpid is configured). Recent tests have shown that the descriptor can also be altered when the link state of a channel path changes. Thus it is necessary to update the descriptor during handling of resource accessibility events.
Cc: stable@vger.kernel.org Signed-off-by: Sebastian Ott sebott@linux.ibm.com Reviewed-by: Peter Oberparleiter oberpar@linux.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/s390/cio/chsc.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
--- a/drivers/s390/cio/chsc.c +++ b/drivers/s390/cio/chsc.c @@ -451,6 +451,7 @@ static void chsc_process_sei_link_incide
static void chsc_process_sei_res_acc(struct chsc_sei_nt0_area *sei_area) { + struct channel_path *chp; struct chp_link link; struct chp_id chpid; int status; @@ -463,10 +464,17 @@ static void chsc_process_sei_res_acc(str chpid.id = sei_area->rsid; /* allocate a new channel path structure, if needed */ status = chp_get_status(chpid); - if (status < 0) - chp_new(chpid); - else if (!status) + if (!status) return; + + if (status < 0) { + chp_new(chpid); + } else { + chp = chpid_to_chp(chpid); + mutex_lock(&chp->lock); + chp_update_desc(chp); + mutex_unlock(&chp->lock); + } memset(&link, 0, sizeof(struct chp_link)); link.chpid = chpid; if ((sei_area->vf & 0xc0) != 0) {
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiko Carstens heiko.carstens@de.ibm.com
commit 783c3b53b9506db3e05daacfe34e0287eebb09d8 upstream.
Implement s390 specific arch_uretprobe_is_alive() to avoid SIGSEGVs observed with uretprobes in combination with setjmp/longjmp.
See commit 2dea1d9c38e4 ("powerpc/uprobes: Implement arch_uretprobe_is_alive()") for more details.
With this implemented all test cases referenced in the above commit pass.
Reported-by: Ziqian SUN zsun@redhat.com Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: Heiko Carstens heiko.carstens@de.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/kernel/uprobes.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/arch/s390/kernel/uprobes.c +++ b/arch/s390/kernel/uprobes.c @@ -147,6 +147,15 @@ unsigned long arch_uretprobe_hijack_retu return orig; }
+bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check ctx, + struct pt_regs *regs) +{ + if (ctx == RP_CHECK_CHAIN_CALL) + return user_stack_pointer(regs) <= ret->stack; + else + return user_stack_pointer(regs) < ret->stack; +} + /* Instruction Emulation */
static void adjust_psw_addr(psw_t *psw, unsigned long len)
On 04/27/2018 07:58 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.130 release. There are 50 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Apr 29 13:56:42 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.130-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On Fri, Apr 27, 2018 at 03:58:02PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.130 release. There are 50 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Apr 29 13:56:42 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.130-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
Merged, compiled, and installed onto my Pixel 2 XL and OnePlus 5.
No initial issues noticed in either dmesg or general usage.
Thanks! Nathan
On Fri, Apr 27, 2018 at 03:58:02PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.130 release. There are 50 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Apr 29 13:56:42 UTC 2018. Anything received after that time might be too late.
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary ------------------------------------------------------------------------
kernel: 4.4.130-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.4.y git commit: aedfcc63a1a9aca5a263e7733131589117195778 git describe: v4.4.129-51-gaedfcc63a1a9 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.129-51-...
No regressions (compared to build v4.4.129)
Boards, architectures and test suites: -------------------------------------
juno-r2 - arm64 * boot - pass: 20, * kselftest - skip: 37, pass: 29, * libhugetlbfs - skip: 1, pass: 90, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - skip: 53, pass: 28, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - skip: 6, pass: 57, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-pty-tests - pass: 4, * ltp-sched-tests - skip: 4, pass: 10, * ltp-securebits-tests - pass: 4, * ltp-timers-tests - pass: 13,
qemu_arm * boot - pass: 10, fail: 24 * ltp-cap_bounds-tests - pass: 2, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-hugetlb-tests - skip: 1, pass: 21, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-pty-tests - pass: 4, * ltp-securebits-tests - pass: 4,
qemu_x86_64 * boot - pass: 22, * kselftest - skip: 40, pass: 40, * kselftest-vsyscall-mode-native - skip: 40, pass: 40, * kselftest-vsyscall-mode-none - skip: 40, pass: 40, * libhugetlbfs - skip: 1, pass: 90, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - skip: 17, pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - skip: 6, pass: 57, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - skip: 1, pass: 13, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - skip: 155, pass: 995, * ltp-timers-tests - pass: 13,
x15 - arm * boot - pass: 20, * kselftest - skip: 36, pass: 29, * libhugetlbfs - skip: 1, pass: 87, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - skip: 18, pass: 63, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - skip: 5, pass: 58, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - skip: 2, pass: 20, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - skip: 1, pass: 13, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - skip: 78, pass: 1072, * ltp-timers-tests - pass: 13,
x86_64 * boot - pass: 22, * kselftest - skip: 37, pass: 41, * kselftest-vsyscall-mode-native - skip: 37, pass: 40, fail: 1 * kselftest-vsyscall-mode-none - skip: 37, pass: 41, * libhugetlbfs - skip: 1, pass: 90, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - skip: 17, pass: 64, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - skip: 5, pass: 58, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - skip: 5, pass: 9, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - skip: 119, pass: 1031, * ltp-timers-tests - pass: 13,
Summary ------------------------------------------------------------------------
kernel: 4.4.130-rc1 git repo: https://git.linaro.org/lkft/arm64-stable-rc.git git tag: 4.4.130-rc1-hikey-20180427-178 git commit: 2f9fbcd225029f66acc57b4fe3f037e1b65e1804 git describe: 4.4.130-rc1-hikey-20180427-178 Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.1...
No regressions (compared to build 4.4.129-rc2-hikey-20180423-176)
Boards, architectures and test suites: -------------------------------------
hi6220-hikey - arm64 * boot - pass: 19, * kselftest - skip: 38, pass: 27, * libhugetlbfs - skip: 1, pass: 90, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - skip: 53, pass: 28, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - skip: 6, pass: 57, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - skip: 1, pass: 21, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - skip: 4, pass: 10, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - skip: 142, pass: 1008, * ltp-timers-tests - pass: 13,
qemu_arm64 * boot - pass: 3, fail: 17 * ltp-cap_bounds-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-syscalls-tests - skip: 162, pass: 986, fail: 2
On 04/27/2018 06:58 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.130 release. There are 50 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Apr 29 13:56:42 UTC 2018. Anything received after that time might be too late.
Build results: total: 146 pass: 146 fail: 0 Qemu test results: total: 127 pass: 127 fail: 0
Details are available at http://kerneltests.org/builders/.
Guenter
linux-stable-mirror@lists.linaro.org