This is the start of the stable review cycle for the 4.4.189 release. There are 21 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun 11 Aug 2019 01:42:28 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.189-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.4.189-rc1
Thomas Gleixner tglx@linutronix.de x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS
Josh Poimboeuf jpoimboe@redhat.com x86/entry/64: Use JMP instead of JMPQ
Josh Poimboeuf jpoimboe@redhat.com x86/speculation: Enable Spectre v1 swapgs mitigations
Josh Poimboeuf jpoimboe@redhat.com x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations
Wanpeng Li wanpeng.li@hotmail.com x86/entry/64: Fix context tracking state warning when load_gs_index fails
Ben Hutchings ben@decadent.org.uk x86: cpufeatures: Sort feature word 7
Lukas Wunner lukas@wunner.de spi: bcm2835: Fix 3-wire mode if DMA is enabled
xiao jin jin.xiao@intel.com block: blk_init_allocated_queue() set q->fq as NULL in the fail case
Arnd Bergmann arnd@arndb.de compat_ioctl: pppoe: fix PPPOEIOCSFWD handling
Sudarsana Reddy Kalluru skalluru@marvell.com bnx2x: Disable multi-cos feature.
Mark Zhang markz@mellanox.com net/mlx5: Use reversed order when unregister devices
Jia-Ju Bai baijiaju1990@gmail.com net: sched: Fix a possible null-pointer dereference in dequeue_func()
Taras Kondratiuk takondra@cisco.com tipc: compat: allow tipc commands without arguments
Jiri Pirko jiri@mellanox.com net: fix ifindex collision during namespace removal
Nikolay Aleksandrov nikolay@cumulusnetworks.com net: bridge: delete local fdb on device init failure
Gustavo A. R. Silva gustavo@embeddedor.com atm: iphase: Fix Spectre v1 vulnerability
Eric Dumazet edumazet@google.com tcp: be more careful in tcp_fragment()
Sebastian Parschauer s.parschauer@gmx.de HID: Add quirk for HP X1200 PIXART OEM mouse
Phil Turnbull phil.turnbull@oracle.com netfilter: nfnetlink_acct: validate NFACCT_QUOTA parameter
Will Deacon will@kernel.org arm64: cpufeature: Fix feature comparison for CTR_EL0.{CWG,ERG}
Will Deacon will.deacon@arm.com arm64: cpufeature: Fix CTR_EL0 field definitions
-------------
Diffstat:
Documentation/kernel-parameters.txt | 7 +- Makefile | 4 +- arch/arm64/include/asm/cpufeature.h | 7 +- arch/arm64/kernel/cpufeature.c | 14 +++- arch/x86/entry/calling.h | 19 +++++ arch/x86/entry/entry_64.S | 25 +++++- arch/x86/include/asm/cpufeatures.h | 10 ++- arch/x86/kernel/cpu/bugs.c | 105 ++++++++++++++++++++++-- arch/x86/kernel/cpu/common.c | 42 ++++++---- block/blk-core.c | 1 + drivers/atm/iphase.c | 8 +- drivers/hid/hid-ids.h | 1 + drivers/hid/usbhid/hid-quirks.c | 1 + drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +- drivers/net/ppp/pppoe.c | 3 + drivers/net/ppp/pppox.c | 13 +++ drivers/net/ppp/pptp.c | 3 + drivers/spi/spi-bcm2835.c | 3 +- fs/compat_ioctl.c | 3 - include/linux/if_pppox.h | 3 + include/net/tcp.h | 17 ++++ net/bridge/br_vlan.c | 5 ++ net/core/dev.c | 2 + net/ipv4/tcp_output.c | 11 ++- net/l2tp/l2tp_ppp.c | 3 + net/netfilter/nfnetlink_acct.c | 2 + net/sched/sch_codel.c | 3 +- net/tipc/netlink_compat.c | 11 ++- 29 files changed, 272 insertions(+), 58 deletions(-)
commit be68a8aaf925aaf35574260bf820bb09d2f9e07f upstream.
Our field definitions for CTR_EL0 suffer from a number of problems:
- The IDC and DIC fields are missing, which causes us to enable CTR trapping on CPUs with either of these returning non-zero values.
- The ERG is FTR_LOWER_SAFE, whereas it should be treated like CWG as FTR_HIGHER_SAFE so that applications can use it to avoid false sharing.
- [nit] A RES1 field is described as "RAO"
This patch updates the CTR_EL0 field definitions to fix these issues.
Cc: stable@vger.kernel.org # 4.4.y only Cc: Shanker Donthineni shankerd@codeaurora.org Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/kernel/cpufeature.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index c1eddc07d9968..fff0bf2f889e1 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -126,10 +126,12 @@ static struct arm64_ftr_bits ftr_id_aa64mmfr1[] = { };
static struct arm64_ftr_bits ftr_ctr[] = { - U_ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 31, 1, 1), /* RAO */ - ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 28, 3, 0), + U_ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 31, 1, 1), /* RES1 */ + ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 30, 1, 0), + U_ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 29, 1, 1), /* DIC */ + U_ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 28, 1, 1), /* IDC */ U_ARM64_FTR_BITS(FTR_STRICT, FTR_HIGHER_SAFE, 24, 4, 0), /* CWG */ - U_ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 20, 4, 0), /* ERG */ + U_ARM64_FTR_BITS(FTR_STRICT, FTR_HIGHER_SAFE, 20, 4, 0), /* ERG */ U_ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1), /* DminLine */ /* * Linux can handle differing I-cache policies. Userspace JITs will
commit 147b9635e6347104b91f48ca9dca61eb0fbf2a54 upstream.
If CTR_EL0.{CWG,ERG} are 0b0000 then they must be interpreted to have their architecturally maximum values, which defeats the use of FTR_HIGHER_SAFE when sanitising CPU ID registers on heterogeneous machines.
Introduce FTR_HIGHER_OR_ZERO_SAFE so that these fields effectively saturate at zero.
Fixes: 3c739b571084 ("arm64: Keep track of CPU feature registers") Cc: stable@vger.kernel.org # 4.4.y only Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Acked-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/include/asm/cpufeature.h | 7 ++++--- arch/arm64/kernel/cpufeature.c | 8 ++++++-- 2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index ad83c245781c3..0a66f8241f185 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -41,9 +41,10 @@
/* CPU feature register tracking */ enum ftr_type { - FTR_EXACT, /* Use a predefined safe value */ - FTR_LOWER_SAFE, /* Smaller value is safe */ - FTR_HIGHER_SAFE,/* Bigger value is safe */ + FTR_EXACT, /* Use a predefined safe value */ + FTR_LOWER_SAFE, /* Smaller value is safe */ + FTR_HIGHER_SAFE, /* Bigger value is safe */ + FTR_HIGHER_OR_ZERO_SAFE, /* Bigger value is safe, but 0 is biggest */ };
#define FTR_STRICT true /* SANITY check strict matching required */ diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index fff0bf2f889e1..062484d344509 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -130,8 +130,8 @@ static struct arm64_ftr_bits ftr_ctr[] = { ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 30, 1, 0), U_ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 29, 1, 1), /* DIC */ U_ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 28, 1, 1), /* IDC */ - U_ARM64_FTR_BITS(FTR_STRICT, FTR_HIGHER_SAFE, 24, 4, 0), /* CWG */ - U_ARM64_FTR_BITS(FTR_STRICT, FTR_HIGHER_SAFE, 20, 4, 0), /* ERG */ + U_ARM64_FTR_BITS(FTR_STRICT, FTR_HIGHER_OR_ZERO_SAFE, 24, 4, 0), /* CWG */ + U_ARM64_FTR_BITS(FTR_STRICT, FTR_HIGHER_OR_ZERO_SAFE, 20, 4, 0), /* ERG */ U_ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1), /* DminLine */ /* * Linux can handle differing I-cache policies. Userspace JITs will @@ -341,6 +341,10 @@ static s64 arm64_ftr_safe_value(struct arm64_ftr_bits *ftrp, s64 new, s64 cur) case FTR_LOWER_SAFE: ret = new < cur ? new : cur; break; + case FTR_HIGHER_OR_ZERO_SAFE: + if (!cur || !new) + break; + /* Fallthrough */ case FTR_HIGHER_SAFE: ret = new > cur ? new : cur; break;
[ Upstream commit eda3fc50daa93b08774a18d51883c5a5d8d85e15 ]
If a quota bit is set in NFACCT_FLAGS but the NFACCT_QUOTA parameter is missing then a NULL pointer dereference is triggered. CAP_NET_ADMIN is required to trigger the bug.
Signed-off-by: Phil Turnbull phil.turnbull@oracle.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nfnetlink_acct.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/netfilter/nfnetlink_acct.c +++ b/net/netfilter/nfnetlink_acct.c @@ -97,6 +97,8 @@ nfnl_acct_new(struct sock *nfnl, struct return -EINVAL; if (flags & NFACCT_F_OVERQUOTA) return -EINVAL; + if ((flags & NFACCT_F_QUOTA) && !tb[NFACCT_QUOTA]) + return -EINVAL;
size += sizeof(u64); }
From: Sebastian Parschauer s.parschauer@gmx.de
commit 49869d2ea9eecc105a10724c1abf035151a3c4e2 upstream.
The PixArt OEM mice are known for disconnecting every minute in runlevel 1 or 3 if they are not always polled. So add quirk ALWAYS_POLL for this one as well.
Jonathan Teh (@jonathan-teh) reported and tested the quirk. Reference: https://github.com/sriemer/fix-linux-mouse/issues/15
Signed-off-by: Sebastian Parschauer s.parschauer@gmx.de CC: stable@vger.kernel.org Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hid/hid-ids.h | 1 + drivers/hid/usbhid/hid-quirks.c | 1 + 2 files changed, 2 insertions(+)
--- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -470,6 +470,7 @@ #define USB_PRODUCT_ID_HP_LOGITECH_OEM_USB_OPTICAL_MOUSE_0A4A 0x0a4a #define USB_PRODUCT_ID_HP_LOGITECH_OEM_USB_OPTICAL_MOUSE_0B4A 0x0b4a #define USB_PRODUCT_ID_HP_PIXART_OEM_USB_OPTICAL_MOUSE 0x134a +#define USB_PRODUCT_ID_HP_PIXART_OEM_USB_OPTICAL_MOUSE_0641 0x0641
#define USB_VENDOR_ID_HUION 0x256c #define USB_DEVICE_ID_HUION_TABLET 0x006e --- a/drivers/hid/usbhid/hid-quirks.c +++ b/drivers/hid/usbhid/hid-quirks.c @@ -82,6 +82,7 @@ static const struct hid_blacklist { { USB_VENDOR_ID_HP, USB_PRODUCT_ID_HP_LOGITECH_OEM_USB_OPTICAL_MOUSE_0A4A, HID_QUIRK_ALWAYS_POLL }, { USB_VENDOR_ID_HP, USB_PRODUCT_ID_HP_LOGITECH_OEM_USB_OPTICAL_MOUSE_0B4A, HID_QUIRK_ALWAYS_POLL }, { USB_VENDOR_ID_HP, USB_PRODUCT_ID_HP_PIXART_OEM_USB_OPTICAL_MOUSE, HID_QUIRK_ALWAYS_POLL }, + { USB_VENDOR_ID_HP, USB_PRODUCT_ID_HP_PIXART_OEM_USB_OPTICAL_MOUSE_0641, HID_QUIRK_ALWAYS_POLL }, { USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_C077, HID_QUIRK_ALWAYS_POLL }, { USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_KEYBOARD_G710_PLUS, HID_QUIRK_NOGET }, { USB_VENDOR_ID_LOGITECH, USB_DEVICE_ID_LOGITECH_MOUSE_C01A, HID_QUIRK_ALWAYS_POLL },
[ Upstream commit b617158dc096709d8600c53b6052144d12b89fab ]
Some applications set tiny SO_SNDBUF values and expect TCP to just work. Recent patches to address CVE-2019-11478 broke them in case of losses, since retransmits might be prevented.
We should allow these flows to make progress.
This patch allows the first and last skb in retransmit queue to be split even if memory limits are hit.
It also adds the some room due to the fact that tcp_sendmsg() and tcp_sendpage() might overshoot sk_wmem_queued by about one full TSO skb (64KB size). Note this allowance was already present in stable backports for kernels < 4.15
Note for < 4.15 backports : tcp_rtx_queue_tail() will probably look like :
static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk) { struct sk_buff *skb = tcp_send_head(sk);
return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk); }
Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Andrew Prout aprout@ll.mit.edu Tested-by: Andrew Prout aprout@ll.mit.edu Tested-by: Jonathan Lemon jonathan.lemon@gmail.com Tested-by: Michal Kubecek mkubecek@suse.cz Acked-by: Neal Cardwell ncardwell@google.com Acked-by: Yuchung Cheng ycheng@google.com Acked-by: Christoph Paasch cpaasch@apple.com Cc: Jonathan Looney jtl@netflix.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/tcp.h | 17 +++++++++++++++++ net/ipv4/tcp_output.c | 11 ++++++++++- 2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h index 77438a8406ecf..0410fd29d5695 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1526,6 +1526,23 @@ static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unli tcp_sk(sk)->highest_sack = NULL; }
+static inline struct sk_buff *tcp_rtx_queue_head(const struct sock *sk) +{ + struct sk_buff *skb = tcp_write_queue_head(sk); + + if (skb == tcp_send_head(sk)) + skb = NULL; + + return skb; +} + +static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk) +{ + struct sk_buff *skb = tcp_send_head(sk); + + return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk); +} + static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb) { __skb_queue_tail(&sk->sk_write_queue, skb); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 53edd60fd3817..76ffce0c18aeb 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1151,6 +1151,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *buff; int nsize, old_factor; + long limit; int nlen; u8 flags;
@@ -1161,7 +1162,15 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, if (nsize < 0) nsize = 0;
- if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf + 0x20000)) { + /* tcp_sendmsg() can overshoot sk_wmem_queued by one full size skb. + * We need some allowance to not penalize applications setting small + * SO_SNDBUF values. + * Also allow first and last skb in retransmit queue to be split. + */ + limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE); + if (unlikely((sk->sk_wmem_queued >> 1) > limit && + skb != tcp_rtx_queue_head(sk) && + skb != tcp_rtx_queue_tail(sk))) { NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG); return -ENOMEM; }
From: "Gustavo A. R. Silva" gustavo@embeddedor.com
[ Upstream commit ea443e5e98b5b74e317ef3d26bcaea54931ccdee ]
board is controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/atm/iphase.c:2765 ia_ioctl() warn: potential spectre issue 'ia_dev' [r] (local cap) drivers/atm/iphase.c:2774 ia_ioctl() warn: possible spectre second half. 'iadev' drivers/atm/iphase.c:2782 ia_ioctl() warn: possible spectre second half. 'iadev' drivers/atm/iphase.c:2816 ia_ioctl() warn: possible spectre second half. 'iadev' drivers/atm/iphase.c:2823 ia_ioctl() warn: possible spectre second half. 'iadev' drivers/atm/iphase.c:2830 ia_ioctl() warn: potential spectre issue '_ia_dev' [r] (local cap) drivers/atm/iphase.c:2845 ia_ioctl() warn: possible spectre second half. 'iadev' drivers/atm/iphase.c:2856 ia_ioctl() warn: possible spectre second half. 'iadev'
Fix this by sanitizing board before using it to index ia_dev and _ia_dev
Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].
[1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/
Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/atm/iphase.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/atm/iphase.c +++ b/drivers/atm/iphase.c @@ -63,6 +63,7 @@ #include <asm/byteorder.h> #include <linux/vmalloc.h> #include <linux/jiffies.h> +#include <linux/nospec.h> #include "iphase.h" #include "suni.h" #define swap_byte_order(x) (((x & 0xff) << 8) | ((x & 0xff00) >> 8)) @@ -2755,8 +2756,11 @@ static int ia_ioctl(struct atm_dev *dev, } if (copy_from_user(&ia_cmds, arg, sizeof ia_cmds)) return -EFAULT; board = ia_cmds.status; - if ((board < 0) || (board > iadev_count)) - board = 0; + + if ((board < 0) || (board > iadev_count)) + board = 0; + board = array_index_nospec(board, iadev_count + 1); + iadev = ia_dev[board]; switch (ia_cmds.cmd) { case MEMDUMP:
From: Nikolay Aleksandrov nikolay@cumulusnetworks.com
[ Upstream commit d7bae09fa008c6c9a489580db0a5a12063b97f97 ]
On initialization failure we have to delete the local fdb which was inserted due to the default pvid creation. This problem has been present since the inception of default_pvid. Note that currently there are 2 cases: 1) in br_dev_init() when br_multicast_init() fails 2) if register_netdevice() fails after calling ndo_init()
This patch takes care of both since br_vlan_flush() is called on both occasions. Also the new fdb delete would be a no-op on normal bridge device destruction since the local fdb would've been already flushed by br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is called last when adding a port thus nothing can fail after it.
Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid") Signed-off-by: Nikolay Aleksandrov nikolay@cumulusnetworks.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bridge/br_vlan.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/net/bridge/br_vlan.c +++ b/net/bridge/br_vlan.c @@ -580,6 +580,11 @@ void br_vlan_flush(struct net_bridge *br
ASSERT_RTNL();
+ /* delete auto-added default pvid local fdb before flushing vlans + * otherwise it will be leaked on bridge device init failure + */ + br_fdb_delete_by_port(br, NULL, 0, 1); + vg = br_vlan_group(br); __vlan_flush(vg); RCU_INIT_POINTER(br->vlgrp, NULL);
From: Jiri Pirko jiri@mellanox.com
[ Upstream commit 55b40dbf0e76b4bfb9d8b3a16a0208640a9a45df ]
Commit aca51397d014 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.") introduced a possibility to hit a BUG in case device is returning back to init_net and two following conditions are met: 1) dev->ifindex value is used in a name of another "dev%d" device in init_net. 2) dev->name is used by another device in init_net.
Under real life circumstances this is hard to get. Therefore this has been present happily for over 10 years. To reproduce:
$ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff 3: enp0s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff $ ip netns add ns1 $ ip -n ns1 link add dummy1ns1 type dummy $ ip -n ns1 link add dummy2ns1 type dummy $ ip link set enp0s2 netns ns1 $ ip -n ns1 link set enp0s2 name dummy0 [ 100.858894] virtio_net virtio0 dummy0: renamed from enp0s2 $ ip link add dev4 type dummy $ ip -n ns1 a 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: dummy1ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 16:63:4c:38:3e:ff brd ff:ff:ff:ff:ff:ff 3: dummy2ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether aa:9e:86:dd:6b:5d brd ff:ff:ff:ff:ff:ff 4: dummy0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff $ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff 4: dev4: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 5a:e1:4a:b6:ec:f8 brd ff:ff:ff:ff:ff:ff $ ip netns del ns1 [ 158.717795] default_device_exit: failed to move dummy0 to init_net: -17 [ 158.719316] ------------[ cut here ]------------ [ 158.720591] kernel BUG at net/core/dev.c:9824! [ 158.722260] invalid opcode: 0000 [#1] SMP KASAN PTI [ 158.723728] CPU: 0 PID: 56 Comm: kworker/u2:1 Not tainted 5.3.0-rc1+ #18 [ 158.725422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 [ 158.727508] Workqueue: netns cleanup_net [ 158.728915] RIP: 0010:default_device_exit.cold+0x1d/0x1f [ 158.730683] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e [ 158.736854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282 [ 158.738752] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000 [ 158.741369] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64 [ 158.743418] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c [ 158.745626] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000 [ 158.748405] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72 [ 158.750638] FS: 0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000 [ 158.752944] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 158.755245] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0 [ 158.757654] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 158.760012] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 158.762758] Call Trace: [ 158.763882] ? dev_change_net_namespace+0xbb0/0xbb0 [ 158.766148] ? devlink_nl_cmd_set_doit+0x520/0x520 [ 158.768034] ? dev_change_net_namespace+0xbb0/0xbb0 [ 158.769870] ops_exit_list.isra.0+0xa8/0x150 [ 158.771544] cleanup_net+0x446/0x8f0 [ 158.772945] ? unregister_pernet_operations+0x4a0/0x4a0 [ 158.775294] process_one_work+0xa1a/0x1740 [ 158.776896] ? pwq_dec_nr_in_flight+0x310/0x310 [ 158.779143] ? do_raw_spin_lock+0x11b/0x280 [ 158.780848] worker_thread+0x9e/0x1060 [ 158.782500] ? process_one_work+0x1740/0x1740 [ 158.784454] kthread+0x31b/0x420 [ 158.786082] ? __kthread_create_on_node+0x3f0/0x3f0 [ 158.788286] ret_from_fork+0x3a/0x50 [ 158.789871] ---[ end trace defd6c657c71f936 ]--- [ 158.792273] RIP: 0010:default_device_exit.cold+0x1d/0x1f [ 158.795478] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e [ 158.804854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282 [ 158.807865] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000 [ 158.811794] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64 [ 158.816652] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c [ 158.820930] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000 [ 158.825113] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72 [ 158.829899] FS: 0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000 [ 158.834923] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 158.838164] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0 [ 158.841917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 158.845149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fix this by checking if a device with the same name exists in init_net and fallback to original code - dev%d to allocate name - in case it does.
This was found using syzkaller.
Fixes: aca51397d014 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.") Signed-off-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/dev.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/core/dev.c +++ b/net/core/dev.c @@ -7768,6 +7768,8 @@ static void __net_exit default_device_ex
/* Push remaining network devices to init_net */ snprintf(fb_name, IFNAMSIZ, "dev%d", dev->ifindex); + if (__dev_get_by_name(&init_net, fb_name)) + snprintf(fb_name, IFNAMSIZ, "dev%%d"); err = dev_change_net_namespace(dev, &init_net, fb_name); if (err) { pr_emerg("%s: failed to move %s to init_net: %d\n",
From: Taras Kondratiuk takondra@cisco.com
[ Upstream commit 4da5f0018eef4c0de31675b670c80e82e13e99d1 ]
Commit 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit") broke older tipc tools that use compat interface (e.g. tipc-config from tipcutils package):
% tipc-config -p operation not supported
The commit started to reject TIPC netlink compat messages that do not have attributes. It is too restrictive because some of such messages are valid (they don't need any arguments):
% grep 'tx none' include/uapi/linux/tipc_config.h #define TIPC_CMD_NOOP 0x0000 /* tx none, rx none */ #define TIPC_CMD_GET_MEDIA_NAMES 0x0002 /* tx none, rx media_name(s) */ #define TIPC_CMD_GET_BEARER_NAMES 0x0003 /* tx none, rx bearer_name(s) */ #define TIPC_CMD_SHOW_PORTS 0x0006 /* tx none, rx ultra_string */ #define TIPC_CMD_GET_REMOTE_MNG 0x4003 /* tx none, rx unsigned */ #define TIPC_CMD_GET_MAX_PORTS 0x4004 /* tx none, rx unsigned */ #define TIPC_CMD_GET_NETID 0x400B /* tx none, rx unsigned */ #define TIPC_CMD_NOT_NET_ADMIN 0xC001 /* tx none, rx none */
This patch relaxes the original fix and rejects messages without arguments only if such arguments are expected by a command (reg_type is non zero).
Fixes: 2753ca5d9009 ("tipc: fix uninit-value in tipc_nl_compat_doit") Cc: stable@vger.kernel.org Signed-off-by: Taras Kondratiuk takondra@cisco.com Acked-by: Ying Xue ying.xue@windriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tipc/netlink_compat.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/net/tipc/netlink_compat.c +++ b/net/tipc/netlink_compat.c @@ -55,6 +55,7 @@ struct tipc_nl_compat_msg { int rep_type; int rep_size; int req_type; + int req_size; struct net *net; struct sk_buff *rep; struct tlv_desc *req; @@ -252,7 +253,8 @@ static int tipc_nl_compat_dumpit(struct int err; struct sk_buff *arg;
- if (msg->req_type && !TLV_CHECK_TYPE(msg->req, msg->req_type)) + if (msg->req_type && (!msg->req_size || + !TLV_CHECK_TYPE(msg->req, msg->req_type))) return -EINVAL;
msg->rep = tipc_tlv_alloc(msg->rep_size); @@ -345,7 +347,8 @@ static int tipc_nl_compat_doit(struct ti { int err;
- if (msg->req_type && !TLV_CHECK_TYPE(msg->req, msg->req_type)) + if (msg->req_type && (!msg->req_size || + !TLV_CHECK_TYPE(msg->req, msg->req_type))) return -EINVAL;
err = __tipc_nl_compat_doit(cmd, msg); @@ -1192,8 +1195,8 @@ static int tipc_nl_compat_recv(struct sk goto send; }
- len = nlmsg_attrlen(req_nlh, GENL_HDRLEN + TIPC_GENL_HDRLEN); - if (!len || !TLV_OK(msg.req, len)) { + msg.req_size = nlmsg_attrlen(req_nlh, GENL_HDRLEN + TIPC_GENL_HDRLEN); + if (msg.req_size && !TLV_OK(msg.req, msg.req_size)) { msg.rep = tipc_get_err_tlv(TIPC_CFG_NOT_SUPPORTED); err = -EOPNOTSUPP; goto send;
From: Jia-Ju Bai baijiaju1990@gmail.com
[ Upstream commit 051c7b39be4a91f6b7d8c4548444e4b850f1f56c ]
In dequeue_func(), there is an if statement on line 74 to check whether skb is NULL: if (skb)
When skb is NULL, it is used on line 77: prefetch(&skb->end);
Thus, a possible null-pointer dereference may occur.
To fix this bug, skb->end is used when skb is not NULL.
This bug is found by a static analysis tool STCheck written by us.
Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM") Signed-off-by: Jia-Ju Bai baijiaju1990@gmail.com Reviewed-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/sch_codel.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/sched/sch_codel.c +++ b/net/sched/sch_codel.c @@ -68,7 +68,8 @@ static struct sk_buff *dequeue(struct co { struct sk_buff *skb = __skb_dequeue(&sch->q);
- prefetch(&skb->end); /* we'll need skb_shinfo() */ + if (skb) + prefetch(&skb->end); /* we'll need skb_shinfo() */ return skb; }
From: Mark Zhang markz@mellanox.com
[ Upstream commit 08aa5e7da6bce1a1963f63cf32c2e7ad434ad578 ]
When lag is active, which is controlled by the bonded mlx5e netdev, mlx5 interface unregestering must happen in the reverse order where rdma is unregistered (unloaded) first, to guarantee all references to the lag context in hardware is removed, then remove mlx5e netdev interface which will cleanup the lag context from hardware.
Without this fix during destroy of LAG interface, we observed following errors: * mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xe4ac33) * mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xa5aee8).
Fixes: a31208b1e11d ("net/mlx5_core: New init and exit flow for mlx5_core") Reviewed-by: Parav Pandit parav@mellanox.com Reviewed-by: Leon Romanovsky leonro@mellanox.com Signed-off-by: Mark Zhang markz@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -778,7 +778,7 @@ static void mlx5_unregister_device(struc struct mlx5_interface *intf;
mutex_lock(&intf_mutex); - list_for_each_entry(intf, &intf_list, list) + list_for_each_entry_reverse(intf, &intf_list, list) mlx5_remove_device(intf, priv); list_del(&priv->dev_list); mutex_unlock(&intf_mutex);
From: Sudarsana Reddy Kalluru skalluru@marvell.com
[ Upstream commit d1f0b5dce8fda09a7f5f04c1878f181d548e42f5 ]
Commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") which enabled multi-cos feature after prolonged time in driver added some regression causing numerous issues (sudden reboots, tx timeout etc.) reported by customers. We plan to backout this commit and submit proper fix once we have root cause of issues reported with this feature enabled.
Fixes: 3968d38917eb ("bnx2x: Fix Multi-Cos.") Signed-off-by: Sudarsana Reddy Kalluru skalluru@marvell.com Signed-off-by: Manish Chopra manishc@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c @@ -1957,7 +1957,7 @@ u16 bnx2x_select_queue(struct net_device }
/* select a non-FCoE queue */ - return fallback(dev, skb) % (BNX2X_NUM_ETH_QUEUES(bp) * bp->max_cos); + return fallback(dev, skb) % (BNX2X_NUM_ETH_QUEUES(bp)); }
void bnx2x_set_num_queues(struct bnx2x *bp)
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 055d88242a6046a1ceac3167290f054c72571cd9 ]
Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in linux-2.5.69 along with hundreds of other commands, but was always broken sincen only the structure is compatible, but the command number is not, due to the size being sizeof(size_t), or at first sizeof(sizeof((struct sockaddr_pppox)), which is different on 64-bit architectures.
Guillaume Nault adds:
And the implementation was broken until 2016 (see 29e73269aa4d ("pppoe: fix reference counting in PPPoE proxy")), and nobody ever noticed. I should probably have removed this ioctl entirely instead of fixing it. Clearly, it has never been used.
Fix it by adding a compat_ioctl handler for all pppoe variants that translates the command number and then calls the regular ioctl function.
All other ioctl commands handled by pppoe are compatible between 32-bit and 64-bit, and require compat_ptr() conversion.
This should apply to all stable kernels.
Acked-by: Guillaume Nault g.nault@alphalink.fr Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ppp/pppoe.c | 3 +++ drivers/net/ppp/pppox.c | 13 +++++++++++++ drivers/net/ppp/pptp.c | 3 +++ fs/compat_ioctl.c | 3 --- include/linux/if_pppox.h | 3 +++ net/l2tp/l2tp_ppp.c | 3 +++ 6 files changed, 25 insertions(+), 3 deletions(-)
--- a/drivers/net/ppp/pppoe.c +++ b/drivers/net/ppp/pppoe.c @@ -1152,6 +1152,9 @@ static const struct proto_ops pppoe_ops .recvmsg = pppoe_recvmsg, .mmap = sock_no_mmap, .ioctl = pppox_ioctl, +#ifdef CONFIG_COMPAT + .compat_ioctl = pppox_compat_ioctl, +#endif };
static const struct pppox_proto pppoe_proto = { --- a/drivers/net/ppp/pppox.c +++ b/drivers/net/ppp/pppox.c @@ -22,6 +22,7 @@ #include <linux/string.h> #include <linux/module.h> #include <linux/kernel.h> +#include <linux/compat.h> #include <linux/errno.h> #include <linux/netdevice.h> #include <linux/net.h> @@ -103,6 +104,18 @@ int pppox_ioctl(struct socket *sock, uns
EXPORT_SYMBOL(pppox_ioctl);
+#ifdef CONFIG_COMPAT +int pppox_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) +{ + if (cmd == PPPOEIOCSFWD32) + cmd = PPPOEIOCSFWD; + + return pppox_ioctl(sock, cmd, (unsigned long)compat_ptr(arg)); +} + +EXPORT_SYMBOL(pppox_compat_ioctl); +#endif + static int pppox_create(struct net *net, struct socket *sock, int protocol, int kern) { --- a/drivers/net/ppp/pptp.c +++ b/drivers/net/ppp/pptp.c @@ -674,6 +674,9 @@ static const struct proto_ops pptp_ops = .recvmsg = sock_no_recvmsg, .mmap = sock_no_mmap, .ioctl = pppox_ioctl, +#ifdef CONFIG_COMPAT + .compat_ioctl = pppox_compat_ioctl, +#endif };
static const struct pppox_proto pppox_pptp_proto = { --- a/fs/compat_ioctl.c +++ b/fs/compat_ioctl.c @@ -1016,9 +1016,6 @@ COMPATIBLE_IOCTL(PPPIOCDISCONN) COMPATIBLE_IOCTL(PPPIOCATTCHAN) COMPATIBLE_IOCTL(PPPIOCGCHAN) COMPATIBLE_IOCTL(PPPIOCGL2TPSTATS) -/* PPPOX */ -COMPATIBLE_IOCTL(PPPOEIOCSFWD) -COMPATIBLE_IOCTL(PPPOEIOCDFWD) /* ppdev */ COMPATIBLE_IOCTL(PPSETMODE) COMPATIBLE_IOCTL(PPRSTATUS) --- a/include/linux/if_pppox.h +++ b/include/linux/if_pppox.h @@ -84,6 +84,9 @@ extern int register_pppox_proto(int prot extern void unregister_pppox_proto(int proto_num); extern void pppox_unbind_sock(struct sock *sk);/* delete ppp-channel binding */ extern int pppox_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg); +extern int pppox_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg); + +#define PPPOEIOCSFWD32 _IOW(0xB1 ,0, compat_size_t)
/* PPPoX socket states */ enum { --- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -1805,6 +1805,9 @@ static const struct proto_ops pppol2tp_o .recvmsg = pppol2tp_recvmsg, .mmap = sock_no_mmap, .ioctl = pppox_ioctl, +#ifdef CONFIG_COMPAT + .compat_ioctl = pppox_compat_ioctl, +#endif };
static const struct pppox_proto pppol2tp_proto = {
From: xiao jin jin.xiao@intel.com
commit 54648cf1ec2d7f4b6a71767799c45676a138ca24 upstream.
We find the memory use-after-free issue in __blk_drain_queue() on the kernel 4.14. After read the latest kernel 4.18-rc6 we think it has the same problem.
Memory is allocated for q->fq in the blk_init_allocated_queue(). If the elevator init function called with error return, it will run into the fail case to free the q->fq.
Then the __blk_drain_queue() uses the same memory after the free of the q->fq, it will lead to the unpredictable event.
The patch is to set q->fq as NULL in the fail case of blk_init_allocated_queue().
Fixes: commit 7c94e1c157a2 ("block: introduce blk_flush_queue to drive flush machinery") Cc: stable@vger.kernel.org Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Bart Van Assche bart.vanassche@wdc.com Signed-off-by: xiao jin jin.xiao@intel.com Signed-off-by: Jens Axboe axboe@kernel.dk [groeck: backport to v4.4.y/v4.9.y (context change)] Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Alessio Balsini balsini@android.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/blk-core.c | 1 + 1 file changed, 1 insertion(+)
--- a/block/blk-core.c +++ b/block/blk-core.c @@ -870,6 +870,7 @@ blk_init_allocated_queue(struct request_
fail: blk_free_flush_queue(q->fq); + q->fq = NULL; return NULL; } EXPORT_SYMBOL(blk_init_allocated_queue);
From: Lukas Wunner lukas@wunner.de
commit 8d8bef50365847134b51c1ec46786bc2873e4e47 upstream.
Commit 6935224da248 ("spi: bcm2835: enable support of 3-wire mode") added 3-wire support to the BCM2835 SPI driver by setting the REN bit (Read Enable) in the CS register when receiving data. The REN bit puts the transmitter in high-impedance state. The driver recognizes that data is to be received by checking whether the rx_buf of a transfer is non-NULL.
Commit 3ecd37edaa2a ("spi: bcm2835: enable dma modes for transfers meeting certain conditions") subsequently broke 3-wire support because it set the SPI_MASTER_MUST_RX flag which causes spi_map_msg() to replace rx_buf with a dummy buffer if it is NULL. As a result, rx_buf is *always* non-NULL if DMA is enabled.
Reinstate 3-wire support by not only checking whether rx_buf is non-NULL, but also checking that it is not the dummy buffer.
Fixes: 3ecd37edaa2a ("spi: bcm2835: enable dma modes for transfers meeting certain conditions") Reported-by: Nuno Sá nuno.sa@analog.com Signed-off-by: Lukas Wunner lukas@wunner.de Cc: stable@vger.kernel.org # v4.2+ Cc: Martin Sperl kernel@martin.sperl.org Acked-by: Stefan Wahren wahrenst@gmx.net Link: https://lore.kernel.org/r/328318841455e505370ef8ecad97b646c033dc8a.156214852... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/spi/spi-bcm2835.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/spi/spi-bcm2835.c +++ b/drivers/spi/spi-bcm2835.c @@ -554,7 +554,8 @@ static int bcm2835_spi_transfer_one(stru bcm2835_wr(bs, BCM2835_SPI_CLK, cdiv);
/* handle all the 3-wire mode */ - if ((spi->mode & SPI_3WIRE) && (tfr->rx_buf)) + if (spi->mode & SPI_3WIRE && tfr->rx_buf && + tfr->rx_buf != master->dummy_rx) cs |= BCM2835_SPI_CS_REN; else cs &= ~BCM2835_SPI_CS_REN;
From: Ben Hutchings ben@decadent.org.uk
This will make it clearer which bits are allocated, in case we need to assign more feature bits for later backports.
Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/cpufeatures.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -196,13 +196,10 @@ #define X86_FEATURE_RETPOLINE_AMD ( 7*32+13) /* "" AMD Retpoline mitigation for Spectre variant 2 */
#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ -#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* "" Fill RSB on context switches */ - #define X86_FEATURE_MSR_SPEC_CTRL ( 7*32+16) /* "" MSR SPEC_CTRL is implemented */ #define X86_FEATURE_SSBD ( 7*32+17) /* Speculative Store Bypass Disable */
-/* Because the ALTERNATIVE scheme is for members of the X86_FEATURE club... */ -#define X86_FEATURE_KAISER ( 7*32+31) /* CONFIG_PAGE_TABLE_ISOLATION w/o nokaiser */ +#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* "" Fill RSB on context switches */
#define X86_FEATURE_USE_IBPB ( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled*/ #define X86_FEATURE_USE_IBRS_FW ( 7*32+22) /* "" Use IBRS during runtime firmware calls */ @@ -215,6 +212,7 @@ #define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */ #define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */ #define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */ +#define X86_FEATURE_KAISER ( 7*32+31) /* CONFIG_PAGE_TABLE_ISOLATION w/o nokaiser */
/* Virtualization flags: Linux defined, word 8 */ #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
From: Wanpeng Li wanpeng.li@hotmail.com
commit 2fa5f04f85730d0c4f49f984b7efeb4f8d5bd1fc upstream.
This warning:
WARNING: CPU: 0 PID: 3331 at arch/x86/entry/common.c:45 enter_from_user_mode+0x32/0x50 CPU: 0 PID: 3331 Comm: ldt_gdt_64 Not tainted 4.8.0-rc7+ #13 Call Trace: dump_stack+0x99/0xd0 __warn+0xd1/0xf0 warn_slowpath_null+0x1d/0x20 enter_from_user_mode+0x32/0x50 error_entry+0x6d/0xc0 ? general_protection+0x12/0x30 ? native_load_gs_index+0xd/0x20 ? do_set_thread_area+0x19c/0x1f0 SyS_set_thread_area+0x24/0x30 do_int80_syscall_32+0x7c/0x220 entry_INT80_compat+0x38/0x50
... can be reproduced by running the GS testcase of the ldt_gdt test unit in the x86 selftests.
do_int80_syscall_32() will call enter_form_user_mode() to convert context tracking state from user state to kernel state. The load_gs_index() call can fail with user gsbase, gsbase will be fixed up and proceed if this happen.
However, enter_from_user_mode() will be called again in the fixed up path though it is context tracking kernel state currently.
This patch fixes it by just fixing up gsbase and telling lockdep that IRQs are off once load_gs_index() failed with user gsbase.
Signed-off-by: Wanpeng Li wanpeng.li@hotmail.com Acked-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Brian Gerst brgerst@gmail.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: H. Peter Anvin hpa@zytor.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/1475197266-3440-1-git-send-email-wanpeng.li@hotmail... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/entry/entry_64.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1133,7 +1133,6 @@ ENTRY(error_entry) testb $3, CS+8(%rsp) jz .Lerror_kernelspace
-.Lerror_entry_from_usermode_swapgs: /* * We entered from user mode or we're pretending to have entered * from user mode due to an IRET fault. @@ -1177,7 +1176,8 @@ ENTRY(error_entry) * gsbase and proceed. We'll fix up the exception and land in * gs_change's error handler with kernel gsbase. */ - jmp .Lerror_entry_from_usermode_swapgs + SWAPGS + jmp .Lerror_entry_done
.Lbstep_iret: /* Fix truncated RIP */
From: Josh Poimboeuf jpoimboe@redhat.com
commit 18ec54fdd6d18d92025af097cd042a75cf0ea24c upstream.
Spectre v1 isn't only about array bounds checks. It can affect any conditional checks. The kernel entry code interrupt, exception, and NMI handlers all have conditional swapgs checks. Those may be problematic in the context of Spectre v1, as kernel code can speculatively run with a user GS.
For example:
if (coming from user space) swapgs mov %gs:<percpu_offset>, %reg mov (%reg), %reg1
When coming from user space, the CPU can speculatively skip the swapgs, and then do a speculative percpu load using the user GS value. So the user can speculatively force a read of any kernel value. If a gadget exists which uses the percpu value as an address in another load/store, then the contents of the kernel value may become visible via an L1 side channel attack.
A similar attack exists when coming from kernel space. The CPU can speculatively do the swapgs, causing the user GS to get used for the rest of the speculative window.
The mitigation is similar to a traditional Spectre v1 mitigation, except:
a) index masking isn't possible; because the index (percpu offset) isn't user-controlled; and
b) an lfence is needed in both the "from user" swapgs path and the "from kernel" non-swapgs path (because of the two attacks described above).
The user entry swapgs paths already have SWITCH_TO_KERNEL_CR3, which has a CR3 write when PTI is enabled. Since CR3 writes are serializing, the lfences can be skipped in those cases.
On the other hand, the kernel entry swapgs paths don't depend on PTI.
To avoid unnecessary lfences for the user entry case, create two separate features for alternative patching:
X86_FEATURE_FENCE_SWAPGS_USER X86_FEATURE_FENCE_SWAPGS_KERNEL
Use these features in entry code to patch in lfences where needed.
The features aren't enabled yet, so there's no functional change.
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Dave Hansen dave.hansen@intel.com [bwh: Backported to 4.4: - Assign the CPU feature bits from word 7 - Add FENCE_SWAPGS_KERNEL_ENTRY to NMI entry, since it does not use paranoid_entry - Include <asm/cpufeatures.h> in calling.h - Adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/entry/calling.h | 19 +++++++++++++++++++ arch/x86/entry/entry_64.S | 21 +++++++++++++++++++-- arch/x86/include/asm/cpufeatures.h | 3 +++ 3 files changed, 41 insertions(+), 2 deletions(-)
--- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -1,3 +1,5 @@ +#include <asm/cpufeatures.h> + /*
x86 function call convention, 64-bit: @@ -199,6 +201,23 @@ For 32-bit we have the following convent .byte 0xf1 .endm
+/* + * Mitigate Spectre v1 for conditional swapgs code paths. + * + * FENCE_SWAPGS_USER_ENTRY is used in the user entry swapgs code path, to + * prevent a speculative swapgs when coming from kernel space. + * + * FENCE_SWAPGS_KERNEL_ENTRY is used in the kernel entry non-swapgs code path, + * to prevent the swapgs from getting speculatively skipped when coming from + * user space. + */ +.macro FENCE_SWAPGS_USER_ENTRY + ALTERNATIVE "", "lfence", X86_FEATURE_FENCE_SWAPGS_USER +.endm +.macro FENCE_SWAPGS_KERNEL_ENTRY + ALTERNATIVE "", "lfence", X86_FEATURE_FENCE_SWAPGS_KERNEL +.endm + #else /* CONFIG_X86_64 */
/* --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -551,6 +551,7 @@ END(irq_entries_start) * tracking that we're in kernel mode. */ SWAPGS + FENCE_SWAPGS_USER_ENTRY SWITCH_KERNEL_CR3
/* @@ -566,8 +567,10 @@ END(irq_entries_start) #ifdef CONFIG_CONTEXT_TRACKING call enter_from_user_mode #endif - + jmpq 2f 1: + FENCE_SWAPGS_KERNEL_ENTRY +2: /* * Save previous stack pointer, optionally switch to interrupt stack. * irq_count is used to check if a CPU is already on an interrupt stack @@ -1077,6 +1080,13 @@ ENTRY(paranoid_entry) movq %rax, %cr3 2: #endif + /* + * The above doesn't do an unconditional CR3 write, even in the PTI + * case. So do an lfence to prevent GS speculation, regardless of + * whether PTI is enabled. + */ + FENCE_SWAPGS_KERNEL_ENTRY + ret END(paranoid_entry)
@@ -1138,6 +1148,7 @@ ENTRY(error_entry) * from user mode due to an IRET fault. */ SWAPGS + FENCE_SWAPGS_USER_ENTRY
.Lerror_entry_from_usermode_after_swapgs: /* @@ -1151,6 +1162,8 @@ ENTRY(error_entry) #endif ret
+.Lerror_entry_done_lfence: + FENCE_SWAPGS_KERNEL_ENTRY .Lerror_entry_done: TRACE_IRQS_OFF ret @@ -1169,7 +1182,7 @@ ENTRY(error_entry) cmpq %rax, RIP+8(%rsp) je .Lbstep_iret cmpq $gs_change, RIP+8(%rsp) - jne .Lerror_entry_done + jne .Lerror_entry_done_lfence
/* * hack: gs_change can fail with user gsbase. If this happens, fix up @@ -1177,6 +1190,7 @@ ENTRY(error_entry) * gs_change's error handler with kernel gsbase. */ SWAPGS + FENCE_SWAPGS_USER_ENTRY jmp .Lerror_entry_done
.Lbstep_iret: @@ -1190,6 +1204,7 @@ ENTRY(error_entry) * Switch to kernel gsbase: */ SWAPGS + FENCE_SWAPGS_USER_ENTRY
/* * Pretend that the exception came from user mode: set up pt_regs @@ -1286,6 +1301,7 @@ ENTRY(nmi) * to switch CR3 here. */ cld + FENCE_SWAPGS_USER_ENTRY movq %rsp, %rdx movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp pushq 5*8(%rdx) /* pt_regs->ss */ @@ -1574,6 +1590,7 @@ end_repeat_nmi: movq %rax, %cr3 2: #endif + FENCE_SWAPGS_KERNEL_ENTRY
/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */ call do_nmi --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -192,6 +192,9 @@ #define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */ #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
+#define X86_FEATURE_FENCE_SWAPGS_USER ( 7*32+10) /* "" LFENCE in user entry SWAPGS path */ +#define X86_FEATURE_FENCE_SWAPGS_KERNEL ( 7*32+11) /* "" LFENCE in kernel entry SWAPGS path */ + #define X86_FEATURE_RETPOLINE ( 7*32+12) /* "" Generic Retpoline mitigation for Spectre variant 2 */ #define X86_FEATURE_RETPOLINE_AMD ( 7*32+13) /* "" AMD Retpoline mitigation for Spectre variant 2 */
From: Josh Poimboeuf jpoimboe@redhat.com
commit a2059825986a1c8143fd6698774fa9d83733bb11 upstream.
The previous commit added macro calls in the entry code which mitigate the Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are enabled. Enable those features where applicable.
The mitigations may be disabled with "nospectre_v1" or "mitigations=off".
There are different features which can affect the risk of attack:
- When FSGSBASE is enabled, unprivileged users are able to place any value in GS, using the wrgsbase instruction. This means they can write a GS value which points to any value in kernel space, which can be useful with the following gadget in an interrupt/exception/NMI handler:
if (coming from user space) swapgs mov %gs:<percpu_offset>, %reg1 // dependent load or store based on the value of %reg // for example: mov %(reg1), %reg2
If an interrupt is coming from user space, and the entry code speculatively skips the swapgs (due to user branch mistraining), it may speculatively execute the GS-based load and a subsequent dependent load or store, exposing the kernel data to an L1 side channel leak.
Note that, on Intel, a similar attack exists in the above gadget when coming from kernel space, if the swapgs gets speculatively executed to switch back to the user GS. On AMD, this variant isn't possible because swapgs is serializing with respect to future GS-based accesses.
NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case doesn't exist quite yet.
- When FSGSBASE is disabled, the issue is mitigated somewhat because unprivileged users must use prctl(ARCH_SET_GS) to set GS, which restricts GS values to user space addresses only. That means the gadget would need an additional step, since the target kernel address needs to be read from user space first. Something like:
if (coming from user space) swapgs mov %gs:<percpu_offset>, %reg1 mov (%reg1), %reg2 // dependent load or store based on the value of %reg2 // for example: mov %(reg2), %reg3
It's difficult to audit for this gadget in all the handlers, so while there are no known instances of it, it's entirely possible that it exists somewhere (or could be introduced in the future). Without tooling to analyze all such code paths, consider it vulnerable.
Effects of SMAP on the !FSGSBASE case:
- If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not susceptible to Meltdown), the kernel is prevented from speculatively reading user space memory, even L1 cached values. This effectively disables the !FSGSBASE attack vector.
- If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP still prevents the kernel from speculatively reading user space memory. But it does *not* prevent the kernel from reading the user value from L1, if it has already been cached. This is probably only a small hurdle for an attacker to overcome.
Thanks to Dave Hansen for contributing the speculative_smap() function.
Thanks to Andrew Cooper for providing the inside scoop on whether swapgs is serializing on AMD.
[ tglx: Fixed the USER fence decision and polished the comment as suggested by Dave Hansen ]
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Dave Hansen dave.hansen@intel.com [bwh: Backported to 4.4: - Check for X86_FEATURE_KAISER instead of X86_FEATURE_PTI - mitigations= parameter is x86-only here - Don't use __ro_after_init - Adjust filename, context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/kernel-parameters.txt | 7 +- arch/x86/kernel/cpu/bugs.c | 115 +++++++++++++++++++++++++++++++++--- 2 files changed, 110 insertions(+), 12 deletions(-)
--- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2184,6 +2184,7 @@ bytes respectively. Such letter suffixes improves system performance, but it may also expose users to several CPU vulnerabilities. Equivalent to: nopti [X86] + nospectre_v1 [X86] nospectre_v2 [X86] spectre_v2_user=off [X86] spec_store_bypass_disable=off [X86] @@ -2498,9 +2499,9 @@ bytes respectively. Such letter suffixes
nohugeiomap [KNL,x86] Disable kernel huge I/O mappings.
- nospectre_v1 [PPC] Disable mitigations for Spectre Variant 1 (bounds - check bypass). With this option data leaks are possible - in the system. + nospectre_v1 [X86,PPC] Disable mitigations for Spectre Variant 1 + (bounds check bypass). With this option data leaks are + possible in the system.
nospectre_v2 [X86,PPC_FSL_BOOK3E] Disable all mitigations for the Spectre variant 2 (indirect branch prediction) vulnerability. System may --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -30,6 +30,7 @@ #include <asm/intel-family.h> #include <asm/e820.h>
+static void __init spectre_v1_select_mitigation(void); static void __init spectre_v2_select_mitigation(void); static void __init ssb_select_mitigation(void); static void __init l1tf_select_mitigation(void); @@ -87,17 +88,11 @@ void __init check_bugs(void) if (boot_cpu_has(X86_FEATURE_STIBP)) x86_spec_ctrl_mask |= SPEC_CTRL_STIBP;
- /* Select the proper spectre mitigation before patching alternatives */ + /* Select the proper CPU mitigations before patching alternatives: */ + spectre_v1_select_mitigation(); spectre_v2_select_mitigation(); - - /* - * Select proper mitigation for any exposure to the Speculative Store - * Bypass vulnerability. - */ ssb_select_mitigation(); - l1tf_select_mitigation(); - mds_select_mitigation();
arch_smt_update(); @@ -252,6 +247,108 @@ static int __init mds_cmdline(char *str) early_param("mds", mds_cmdline);
#undef pr_fmt +#define pr_fmt(fmt) "Spectre V1 : " fmt + +enum spectre_v1_mitigation { + SPECTRE_V1_MITIGATION_NONE, + SPECTRE_V1_MITIGATION_AUTO, +}; + +static enum spectre_v1_mitigation spectre_v1_mitigation = + SPECTRE_V1_MITIGATION_AUTO; + +static const char * const spectre_v1_strings[] = { + [SPECTRE_V1_MITIGATION_NONE] = "Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers", + [SPECTRE_V1_MITIGATION_AUTO] = "Mitigation: usercopy/swapgs barriers and __user pointer sanitization", +}; + +static bool is_swapgs_serializing(void) +{ + /* + * Technically, swapgs isn't serializing on AMD (despite it previously + * being documented as such in the APM). But according to AMD, %gs is + * updated non-speculatively, and the issuing of %gs-relative memory + * operands will be blocked until the %gs update completes, which is + * good enough for our purposes. + */ + return boot_cpu_data.x86_vendor == X86_VENDOR_AMD; +} + +/* + * Does SMAP provide full mitigation against speculative kernel access to + * userspace? + */ +static bool smap_works_speculatively(void) +{ + if (!boot_cpu_has(X86_FEATURE_SMAP)) + return false; + + /* + * On CPUs which are vulnerable to Meltdown, SMAP does not + * prevent speculative access to user data in the L1 cache. + * Consider SMAP to be non-functional as a mitigation on these + * CPUs. + */ + if (boot_cpu_has(X86_BUG_CPU_MELTDOWN)) + return false; + + return true; +} + +static void __init spectre_v1_select_mitigation(void) +{ + if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1) || cpu_mitigations_off()) { + spectre_v1_mitigation = SPECTRE_V1_MITIGATION_NONE; + return; + } + + if (spectre_v1_mitigation == SPECTRE_V1_MITIGATION_AUTO) { + /* + * With Spectre v1, a user can speculatively control either + * path of a conditional swapgs with a user-controlled GS + * value. The mitigation is to add lfences to both code paths. + * + * If FSGSBASE is enabled, the user can put a kernel address in + * GS, in which case SMAP provides no protection. + * + * [ NOTE: Don't check for X86_FEATURE_FSGSBASE until the + * FSGSBASE enablement patches have been merged. ] + * + * If FSGSBASE is disabled, the user can only put a user space + * address in GS. That makes an attack harder, but still + * possible if there's no SMAP protection. + */ + if (!smap_works_speculatively()) { + /* + * Mitigation can be provided from SWAPGS itself or + * PTI as the CR3 write in the Meltdown mitigation + * is serializing. + * + * If neither is there, mitigate with an LFENCE. + */ + if (!is_swapgs_serializing() && !boot_cpu_has(X86_FEATURE_KAISER)) + setup_force_cpu_cap(X86_FEATURE_FENCE_SWAPGS_USER); + + /* + * Enable lfences in the kernel entry (non-swapgs) + * paths, to prevent user entry from speculatively + * skipping swapgs. + */ + setup_force_cpu_cap(X86_FEATURE_FENCE_SWAPGS_KERNEL); + } + } + + pr_info("%s\n", spectre_v1_strings[spectre_v1_mitigation]); +} + +static int __init nospectre_v1_cmdline(char *str) +{ + spectre_v1_mitigation = SPECTRE_V1_MITIGATION_NONE; + return 0; +} +early_param("nospectre_v1", nospectre_v1_cmdline); + +#undef pr_fmt #define pr_fmt(fmt) "Spectre V2 : " fmt
static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE; @@ -1154,7 +1251,7 @@ static ssize_t cpu_show_common(struct de break;
case X86_BUG_SPECTRE_V1: - return sprintf(buf, "Mitigation: __user pointer sanitization\n"); + return sprintf(buf, "%s\n", spectre_v1_strings[spectre_v1_mitigation]);
case X86_BUG_SPECTRE_V2: return sprintf(buf, "%s%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
From: Josh Poimboeuf jpoimboe@redhat.com
commit 64dbc122b20f75183d8822618c24f85144a5a94d upstream.
Somehow the swapgs mitigation entry code patch ended up with a JMPQ instruction instead of JMP, where only the short jump is needed. Some assembler versions apparently fail to optimize JMPQ into a two-byte JMP when possible, instead always using a 7-byte JMP with relocation. For some reason that makes the entry code explode with a #GP during boot.
Change it back to "JMP" as originally intended.
Fixes: 18ec54fdd6d1 ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations") Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/entry/entry_64.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -567,7 +567,7 @@ END(irq_entries_start) #ifdef CONFIG_CONTEXT_TRACKING call enter_from_user_mode #endif - jmpq 2f + jmp 2f 1: FENCE_SWAPGS_KERNEL_ENTRY 2:
From: Thomas Gleixner tglx@linutronix.de
commit f36cf386e3fec258a341d446915862eded3e13d8 upstream.
Intel provided the following information:
On all current Atom processors, instructions that use a segment register value (e.g. a load or store) will not speculatively execute before the last writer of that segment retires. Thus they will not use a speculatively written segment value.
That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS entry paths can be excluded from the extra LFENCE if PTI is disabled.
Create a separate bug flag for the through SWAPGS speculation and mark all out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs are excluded from the whole mitigation mess anyway.
Reported-by: Andrew Cooper andrew.cooper3@citrix.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Tyler Hicks tyhicks@canonical.com Reviewed-by: Josh Poimboeuf jpoimboe@redhat.com [bwh: Backported to 4.4: - There's no whitelist entry (or any support) for Hygon CPUs - Adjust context, indentation] Signed-off-by: Ben Hutchings ben@decadent.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/cpufeatures.h | 1 arch/x86/kernel/cpu/bugs.c | 18 +++------------ arch/x86/kernel/cpu/common.c | 42 +++++++++++++++++++++++-------------- 3 files changed, 32 insertions(+), 29 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -339,5 +339,6 @@ #define X86_BUG_L1TF X86_BUG(18) /* CPU is affected by L1 Terminal Fault */ #define X86_BUG_MDS X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */ #define X86_BUG_MSBDS_ONLY X86_BUG(20) /* CPU is only affected by the MSDBS variant of BUG_MDS */ +#define X86_BUG_SWAPGS X86_BUG(21) /* CPU is affected by speculation through SWAPGS */
#endif /* _ASM_X86_CPUFEATURES_H */ --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -262,18 +262,6 @@ static const char * const spectre_v1_str [SPECTRE_V1_MITIGATION_AUTO] = "Mitigation: usercopy/swapgs barriers and __user pointer sanitization", };
-static bool is_swapgs_serializing(void) -{ - /* - * Technically, swapgs isn't serializing on AMD (despite it previously - * being documented as such in the APM). But according to AMD, %gs is - * updated non-speculatively, and the issuing of %gs-relative memory - * operands will be blocked until the %gs update completes, which is - * good enough for our purposes. - */ - return boot_cpu_data.x86_vendor == X86_VENDOR_AMD; -} - /* * Does SMAP provide full mitigation against speculative kernel access to * userspace? @@ -324,9 +312,11 @@ static void __init spectre_v1_select_mit * PTI as the CR3 write in the Meltdown mitigation * is serializing. * - * If neither is there, mitigate with an LFENCE. + * If neither is there, mitigate with an LFENCE to + * stop speculation through swapgs. */ - if (!is_swapgs_serializing() && !boot_cpu_has(X86_FEATURE_KAISER)) + if (boot_cpu_has_bug(X86_BUG_SWAPGS) && + !boot_cpu_has(X86_FEATURE_KAISER)) setup_force_cpu_cap(X86_FEATURE_FENCE_SWAPGS_USER);
/* --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -853,6 +853,7 @@ static void identify_cpu_without_cpuid(s #define NO_L1TF BIT(3) #define NO_MDS BIT(4) #define MSBDS_ONLY BIT(5) +#define NO_SWAPGS BIT(6)
#define VULNWL(_vendor, _family, _model, _whitelist) \ { X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist } @@ -876,29 +877,37 @@ static const __initconst struct x86_cpu_ VULNWL_INTEL(ATOM_BONNELL, NO_SPECULATION), VULNWL_INTEL(ATOM_BONNELL_MID, NO_SPECULATION),
- VULNWL_INTEL(ATOM_SILVERMONT, NO_SSB | NO_L1TF | MSBDS_ONLY), - VULNWL_INTEL(ATOM_SILVERMONT_X, NO_SSB | NO_L1TF | MSBDS_ONLY), - VULNWL_INTEL(ATOM_SILVERMONT_MID, NO_SSB | NO_L1TF | MSBDS_ONLY), - VULNWL_INTEL(ATOM_AIRMONT, NO_SSB | NO_L1TF | MSBDS_ONLY), - VULNWL_INTEL(XEON_PHI_KNL, NO_SSB | NO_L1TF | MSBDS_ONLY), - VULNWL_INTEL(XEON_PHI_KNM, NO_SSB | NO_L1TF | MSBDS_ONLY), + VULNWL_INTEL(ATOM_SILVERMONT, NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS), + VULNWL_INTEL(ATOM_SILVERMONT_X, NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS), + VULNWL_INTEL(ATOM_SILVERMONT_MID, NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS), + VULNWL_INTEL(ATOM_AIRMONT, NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS), + VULNWL_INTEL(XEON_PHI_KNL, NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS), + VULNWL_INTEL(XEON_PHI_KNM, NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
VULNWL_INTEL(CORE_YONAH, NO_SSB),
- VULNWL_INTEL(ATOM_AIRMONT_MID, NO_L1TF | MSBDS_ONLY), + VULNWL_INTEL(ATOM_AIRMONT_MID, NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
- VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF), - VULNWL_INTEL(ATOM_GOLDMONT_X, NO_MDS | NO_L1TF), - VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF), + VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF | NO_SWAPGS), + VULNWL_INTEL(ATOM_GOLDMONT_X, NO_MDS | NO_L1TF | NO_SWAPGS), + VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF | NO_SWAPGS), + + /* + * Technically, swapgs isn't serializing on AMD (despite it previously + * being documented as such in the APM). But according to AMD, %gs is + * updated non-speculatively, and the issuing of %gs-relative memory + * operands will be blocked until the %gs update completes, which is + * good enough for our purposes. + */
/* AMD Family 0xf - 0x12 */ - VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS), - VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS), - VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS), - VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS), + VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS), + VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS), + VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS), + VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
/* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */ - VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS), + VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS), {} };
@@ -935,6 +944,9 @@ static void __init cpu_set_bug_bits(stru setup_force_cpu_bug(X86_BUG_MSBDS_ONLY); }
+ if (!cpu_matches(NO_SWAPGS)) + setup_force_cpu_bug(X86_BUG_SWAPGS); + if (cpu_matches(NO_MELTDOWN)) return;
On 8/9/19 7:45 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.189 release. There are 21 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun 11 Aug 2019 01:42:28 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.189-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
stable-rc/linux-4.4.y boot: 94 boots: 2 failed, 82 passed with 9 offline, 1 conflict (v4.4.188-22-gab9a14a0618d)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.4.y/kernel/v4.4.1... Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.4.y/kernel/v4.4.188-22-g...
Tree: stable-rc Branch: linux-4.4.y Git Describe: v4.4.188-22-gab9a14a0618d Git Commit: ab9a14a0618d99ad7e0b7e589a97f3421ac4d662 Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Tested: 46 unique boards, 20 SoC families, 14 builds out of 190
Boot Regressions Detected:
arm:
bcm2835_defconfig: gcc-8: bcm2835-rpi-b: lab-baylibre-seattle: failing since 1 day (last pass: v4.4.187-23-g462a4b2bd3bf - first fail: v4.4.187-40-geae076a61a51)
sama5_defconfig: gcc-8: at91-sama5d4_xplained: lab-baylibre-seattle: failing since 1 day (last pass: v4.4.187-23-g462a4b2bd3bf - first fail: v4.4.187-40-geae076a61a51)
arm64:
defconfig: gcc-8: apq8016-sbc: lab-baylibre-seattle: failing since 1 day (last pass: v4.4.187-23-g462a4b2bd3bf - first fail: v4.4.187-40-geae076a61a51)
Boot Failures Detected:
arm64: defconfig: gcc-8: qcom-qdf2400: 1 failed lab
arm: multi_v7_defconfig: gcc-8: stih410-b2120: 1 failed lab
Offline Platforms:
arm64:
defconfig: gcc-8 apq8016-sbc: 1 offline lab
arm:
bcm2835_defconfig: gcc-8 bcm2835-rpi-b: 1 offline lab
sama5_defconfig: gcc-8 at91-sama5d4_xplained: 1 offline lab
multi_v7_defconfig: gcc-8 alpine-db: 1 offline lab at91-sama5d4_xplained: 1 offline lab bcm4708-smartrg-sr400ac: 1 offline lab socfpga_cyclone5_de0_sockit: 1 offline lab sun5i-r8-chip: 1 offline lab
sunxi_defconfig: gcc-8 sun5i-r8-chip: 1 offline lab
Conflicting Boot Failure Detected: (These likely are not failures as other labs are reporting PASS. Needs review.)
x86_64: x86_64_defconfig: qemu: lab-baylibre: PASS (gcc-8) lab-mhart: PASS (gcc-8) lab-linaro-lkft: FAIL (gcc-8) lab-drue: PASS (gcc-8) lab-collabora: PASS (gcc-8)
--- For more info write to info@kernelci.org
On Fri, Aug 09, 2019 at 03:45:04PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.189 release. There are 21 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun 11 Aug 2019 01:42:28 PM UTC. Anything received after that time might be too late.
Build results: total: 170 pass: 170 fail: 0 Qemu test results: total: 324 pass: 324 fail: 0
Guenter
On Fri, Aug 09, 2019 at 03:45:04PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.189 release. There are 21 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun 11 Aug 2019 01:42:28 PM UTC. Anything received after that time might be too late.
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Note that test counts are a bit lower than previous because we are having some infrastructure/lab issue with our qemu/x86 environments. There is no evidence that it's kernel related.
Summary ------------------------------------------------------------------------
kernel: 4.4.189-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.4.y git commit: ab9a14a0618d99ad7e0b7e589a97f3421ac4d662 git describe: v4.4.187-45-gab9a14a0618d Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.187-45-...
No regressions (compared to build v4.4.187)
No fixes (compared to build v4.4.187)
Ran 16774 total tests in the following environments and test suites.
Environments -------------- - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * build * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * network-basic-tests * perf * prep-tmp-disk * spectre-meltdown-checker-test * kvm-unit-tests * v4l2-compliance * install-android-platform-tools-r2600 * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none * ssuite
Summary ------------------------------------------------------------------------
kernel: 4.4.189-rc1 git repo: https://git.linaro.org/lkft/arm64-stable-rc.git git branch: 4.4.189-rc1-hikey-20190809-523 git commit: ffbfd13890f25f989c107e0a79063ff644d02753 git describe: 4.4.189-rc1-hikey-20190809-523 Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.1...
No regressions (compared to build 4.4.189-rc1-hikey-20190809-522)
No fixes (compared to build 4.4.189-rc1-hikey-20190809-522)
Ran 1550 total tests in the following environments and test suites.
Environments -------------- - hi6220-hikey - arm64
Test Suites ----------- * build * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * perf * spectre-meltdown-checker-test * v4l2-compliance
On Sat, Aug 10, 2019 at 11:02:47AM -0500, Dan Rue wrote:
On Fri, Aug 09, 2019 at 03:45:04PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.4.189 release. There are 21 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun 11 Aug 2019 01:42:28 PM UTC. Anything received after that time might be too late.
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Note that test counts are a bit lower than previous because we are having some infrastructure/lab issue with our qemu/x86 environments. There is no evidence that it's kernel related.
thanks for testing and letting me know, and good luck with your infrastructure issues :)
greg k-h
linux-stable-mirror@lists.linaro.org