This is the start of the stable review cycle for the 4.14.125 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue 11 Jun 2019 04:40:01 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.125-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.125-rc1
Kirill Smelkov kirr@nexedi.com fuse: Add FOPEN_STREAM to use stream_open()
Kirill Smelkov kirr@nexedi.com fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
Kristian Evensen kristian.evensen@gmail.com qmi_wwan: Add quirk for Quectel dynamic config
Jiri Slaby jslaby@suse.cz TTY: serial_core, add ->install
Daniel Drake drake@endlessm.com drm/i915/fbc: disable framebuffer compression on GeminiLake
Chris Wilson chris@chris-wilson.co.uk drm/i915: Fix I915_EXEC_RING_MASK
Christian König christian.koenig@amd.com drm/radeon: prefer lower reference dividers
Alex Deucher alexander.deucher@amd.com drm/amdgpu/psp: move psp version specific function pointers to early_init
Dave Airlie airlied@redhat.com drm/nouveau: add kconfig option to turn off nouveau legacy contexts. (v3)
Patrik Jakobsson patrik.r.jakobsson@gmail.com drm/gma500/cdv: Check vbt config bits when detecting lvds panels
Dan Carpenter dan.carpenter@oracle.com test_firmware: Use correct snprintf() limit
Dan Carpenter dan.carpenter@oracle.com genwqe: Prevent an integer overflow in the ioctl
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "MIPS: perf: ath79: Fix perfcount IRQ assignment"
Paul Burton paul.burton@mips.com MIPS: pistachio: Build uImage.gz by default
Paul Burton paul.burton@mips.com MIPS: Bounds check virt_addr_valid
Robert Hancock hancock@sedsystems.ca i2c: xiic: Add max_read_len quirk
Jiri Kosina jkosina@suse.cz x86/power: Fix 'nosmt' vs hibernation triple fault during resume
Kees Cook keescook@chromium.org pstore/ram: Run without kernel crash dump region
Kees Cook keescook@chromium.org pstore: Convert buf_lock to semaphore
Kees Cook keescook@chromium.org pstore: Remove needless lock during console writes
Miklos Szeredi mszeredi@redhat.com fuse: fallocate: fix return with locked inode
John David Anglin dave.anglin@bell.net parisc: Use implicit space register selection for loading the coherence index of I/O pdirs
Linus Torvalds torvalds@linux-foundation.org rcu: locking and unlocking need to always be at least barriers
Hangbin Liu liuhangbin@gmail.com Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "fib_rules: fix error in backport of e9919a24d302 ("fib_rules: return 0...")"
Xin Long lucien.xin@gmail.com ipv6: fix the check before getting the cookie in rt6_get_cookie
Russell King rmk+kernel@armlinux.org.uk net: sfp: read eeprom in maximum 16 byte increments
Olivier Matz olivier.matz@6wind.com ipv6: use READ_ONCE() for inet->hdrincl as in ipv4
Olivier Matz olivier.matz@6wind.com ipv6: fix EFAULT on sendto with icmpv6 and hdrincl
Paolo Abeni pabeni@redhat.com pktgen: do not sleep with the thread lock held.
Zhu Yanjun yanjun.zhu@oracle.com net: rds: fix memory leak in rds_ib_flush_mr_pool
Erez Alfasi ereza@mellanox.com net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query
David Ahern dsahern@gmail.com neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit
Neil Horman nhorman@tuxdriver.com Fix memory leak in sctp_process_init
Vivien Didelot vivien.didelot@gmail.com ethtool: fix potential userspace buffer overflow
-------------
Diffstat:
Makefile | 4 +- arch/mips/ath79/setup.c | 6 + arch/mips/mm/mmap.c | 5 + arch/mips/pistachio/Platform | 1 + arch/powerpc/kernel/nvram_64.c | 2 - arch/x86/power/cpu.c | 10 + arch/x86/power/hibernate_64.c | 33 +++ drivers/acpi/apei/erst.c | 1 - drivers/firmware/efi/efi-pstore.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 19 +- drivers/gpu/drm/gma500/cdv_intel_lvds.c | 3 + drivers/gpu/drm/gma500/intel_bios.c | 3 + drivers/gpu/drm/gma500/psb_drv.h | 1 + drivers/gpu/drm/i915/intel_fbc.c | 4 + drivers/gpu/drm/nouveau/Kconfig | 13 +- drivers/gpu/drm/nouveau/nouveau_drm.c | 7 +- drivers/gpu/drm/radeon/radeon_display.c | 4 +- drivers/i2c/busses/i2c-xiic.c | 5 + drivers/irqchip/irq-ath79-misc.c | 11 - drivers/misc/genwqe/card_dev.c | 2 + drivers/misc/genwqe/card_utils.c | 4 + drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 4 +- drivers/net/ethernet/mellanox/mlx4/port.c | 5 - drivers/net/phy/sfp.c | 24 +- drivers/net/usb/qmi_wwan.c | 39 ++- drivers/parisc/ccio-dma.c | 4 +- drivers/parisc/sba_iommu.c | 3 +- drivers/tty/serial/serial_core.c | 24 +- drivers/xen/xenbus/xenbus_dev_frontend.c | 4 +- fs/fuse/file.c | 6 +- fs/open.c | 18 ++ fs/pstore/platform.c | 76 ++--- fs/pstore/ram.c | 37 ++- fs/read_write.c | 5 +- include/linux/cpu.h | 4 + include/linux/fs.h | 4 + include/linux/pstore.h | 7 +- include/linux/rcupdate.h | 6 +- include/net/ip6_fib.h | 3 +- include/uapi/drm/i915_drm.h | 2 +- include/uapi/linux/fuse.h | 2 + kernel/cpu.c | 4 +- kernel/power/hibernate.c | 9 + lib/test_firmware.c | 14 +- net/core/ethtool.c | 5 +- net/core/fib_rules.c | 7 +- net/core/neighbour.c | 9 +- net/core/pktgen.c | 11 + net/ipv6/raw.c | 25 +- net/rds/ib_rdma.c | 10 +- net/sctp/sm_make_chunk.c | 13 +- net/sctp/sm_sideeffect.c | 5 + scripts/coccinelle/api/stream_open.cocci | 363 ++++++++++++++++++++++++ 53 files changed, 720 insertions(+), 174 deletions(-)
From: Vivien Didelot vivien.didelot@gmail.com
[ Upstream commit 0ee4e76937d69128a6a66861ba393ebdc2ffc8a2 ]
ethtool_get_regs() allocates a buffer of size ops->get_regs_len(), and pass it to the kernel driver via ops->get_regs() for filling.
There is no restriction about what the kernel drivers can or cannot do with the open ethtool_regs structure. They usually set regs->version and ignore regs->len or set it to the same size as ops->get_regs_len().
But if userspace allocates a smaller buffer for the registers dump, we would cause a userspace buffer overflow in the final copy_to_user() call, which uses the regs.len value potentially reset by the driver.
To fix this, make this case obvious and store regs.len before calling ops->get_regs(), to only copy as much data as requested by userspace, up to the value returned by ops->get_regs_len().
While at it, remove the redundant check for non-null regbuf.
Signed-off-by: Vivien Didelot vivien.didelot@gmail.com Reviewed-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/ethtool.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -1402,13 +1402,16 @@ static int ethtool_get_regs(struct net_d return -ENOMEM; }
+ if (regs.len < reglen) + reglen = regs.len; + ops->get_regs(dev, ®s, regbuf);
ret = -EFAULT; if (copy_to_user(useraddr, ®s, sizeof(regs))) goto out; useraddr += offsetof(struct ethtool_regs, data); - if (regbuf && copy_to_user(useraddr, regbuf, regs.len)) + if (copy_to_user(useraddr, regbuf, reglen)) goto out; ret = 0;
From: Neil Horman nhorman@tuxdriver.com
[ Upstream commit 0a8dd9f67cd0da7dc284f48b032ce00db1a68791 ]
syzbot found the following leak in sctp_process_init BUG: memory leak unreferenced object 0xffff88810ef68400 (size 1024): comm "syz-executor273", pid 7046, jiffies 4294945598 (age 28.770s) hex dump (first 32 bytes): 1d de 28 8d de 0b 1b e3 b5 c2 f9 68 fd 1a 97 25 ..(........h...% 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000a02cebbd>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline] [<00000000a02cebbd>] slab_post_alloc_hook mm/slab.h:439 [inline] [<00000000a02cebbd>] slab_alloc mm/slab.c:3326 [inline] [<00000000a02cebbd>] __do_kmalloc mm/slab.c:3658 [inline] [<00000000a02cebbd>] __kmalloc_track_caller+0x15d/0x2c0 mm/slab.c:3675 [<000000009e6245e6>] kmemdup+0x27/0x60 mm/util.c:119 [<00000000dfdc5d2d>] kmemdup include/linux/string.h:432 [inline] [<00000000dfdc5d2d>] sctp_process_init+0xa7e/0xc20 net/sctp/sm_make_chunk.c:2437 [<00000000b58b62f8>] sctp_cmd_process_init net/sctp/sm_sideeffect.c:682 [inline] [<00000000b58b62f8>] sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1384 [inline] [<00000000b58b62f8>] sctp_side_effects net/sctp/sm_sideeffect.c:1194 [inline] [<00000000b58b62f8>] sctp_do_sm+0xbdc/0x1d60 net/sctp/sm_sideeffect.c:1165 [<0000000044e11f96>] sctp_assoc_bh_rcv+0x13c/0x200 net/sctp/associola.c:1074 [<00000000ec43804d>] sctp_inq_push+0x7f/0xb0 net/sctp/inqueue.c:95 [<00000000726aa954>] sctp_backlog_rcv+0x5e/0x2a0 net/sctp/input.c:354 [<00000000d9e249a8>] sk_backlog_rcv include/net/sock.h:950 [inline] [<00000000d9e249a8>] __release_sock+0xab/0x110 net/core/sock.c:2418 [<00000000acae44fa>] release_sock+0x37/0xd0 net/core/sock.c:2934 [<00000000963cc9ae>] sctp_sendmsg+0x2c0/0x990 net/sctp/socket.c:2122 [<00000000a7fc7565>] inet_sendmsg+0x64/0x120 net/ipv4/af_inet.c:802 [<00000000b732cbd3>] sock_sendmsg_nosec net/socket.c:652 [inline] [<00000000b732cbd3>] sock_sendmsg+0x54/0x70 net/socket.c:671 [<00000000274c57ab>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2292 [<000000008252aedb>] __sys_sendmsg+0x80/0xf0 net/socket.c:2330 [<00000000f7bf23d1>] __do_sys_sendmsg net/socket.c:2339 [inline] [<00000000f7bf23d1>] __se_sys_sendmsg net/socket.c:2337 [inline] [<00000000f7bf23d1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2337 [<00000000a8b4131f>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:3
The problem was that the peer.cookie value points to an skb allocated area on the first pass through this function, at which point it is overwritten with a heap allocated value, but in certain cases, where a COOKIE_ECHO chunk is included in the packet, a second pass through sctp_process_init is made, where the cookie value is re-allocated, leaking the first allocation.
Fix is to always allocate the cookie value, and free it when we are done using it.
Signed-off-by: Neil Horman nhorman@tuxdriver.com Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com CC: Marcelo Ricardo Leitner marcelo.leitner@gmail.com CC: "David S. Miller" davem@davemloft.net CC: netdev@vger.kernel.org Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/sm_make_chunk.c | 13 +++---------- net/sctp/sm_sideeffect.c | 5 +++++ 2 files changed, 8 insertions(+), 10 deletions(-)
--- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -2318,7 +2318,6 @@ int sctp_process_init(struct sctp_associ union sctp_addr addr; struct sctp_af *af; int src_match = 0; - char *cookie;
/* We must include the address that the INIT packet came from. * This is the only address that matters for an INIT packet. @@ -2422,14 +2421,6 @@ int sctp_process_init(struct sctp_associ /* Peer Rwnd : Current calculated value of the peer's rwnd. */ asoc->peer.rwnd = asoc->peer.i.a_rwnd;
- /* Copy cookie in case we need to resend COOKIE-ECHO. */ - cookie = asoc->peer.cookie; - if (cookie) { - asoc->peer.cookie = kmemdup(cookie, asoc->peer.cookie_len, gfp); - if (!asoc->peer.cookie) - goto clean_up; - } - /* RFC 2960 7.2.1 The initial value of ssthresh MAY be arbitrarily * high (for example, implementations MAY use the size of the receiver * advertised window). @@ -2595,7 +2586,9 @@ do_addr_param: case SCTP_PARAM_STATE_COOKIE: asoc->peer.cookie_len = ntohs(param.p->length) - sizeof(struct sctp_paramhdr); - asoc->peer.cookie = param.cookie->body; + asoc->peer.cookie = kmemdup(param.cookie->body, asoc->peer.cookie_len, gfp); + if (!asoc->peer.cookie) + retval = 0; break;
case SCTP_PARAM_HEARTBEAT_INFO: --- a/net/sctp/sm_sideeffect.c +++ b/net/sctp/sm_sideeffect.c @@ -878,6 +878,11 @@ static void sctp_cmd_new_state(struct sc asoc->rto_initial; }
+ if (sctp_state(asoc, ESTABLISHED)) { + kfree(asoc->peer.cookie); + asoc->peer.cookie = NULL; + } + if (sctp_state(asoc, ESTABLISHED) || sctp_state(asoc, CLOSED) || sctp_state(asoc, SHUTDOWN_RECEIVED)) {
From: David Ahern dsahern@gmail.com
[ Upstream commit 4b2a2bfeb3f056461a90bd621e8bd7d03fa47f60 ]
Commit cd9ff4de0107 changed the key for IFF_POINTOPOINT devices to INADDR_ANY but neigh_xmit which is used for MPLS encapsulations was not updated to use the altered key. The result is that every packet Tx does a lookup on the gateway address which does not find an entry, a new one is created only to find the existing one in the table right before the insert since arp_constructor was updated to reset the primary key. This is seen in the allocs and destroys counters: ip -s -4 ntable show | head -10 | grep alloc
which increase for each packet showing the unnecessary overhread.
Fix by having neigh_xmit use __ipv4_neigh_lookup_noref for NEIGH_ARP_TABLE.
Fixes: cd9ff4de0107 ("ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY") Reported-by: Alan Maguire alan.maguire@oracle.com Signed-off-by: David Ahern dsahern@gmail.com Tested-by: Alan Maguire alan.maguire@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/neighbour.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -30,6 +30,7 @@ #include <linux/times.h> #include <net/net_namespace.h> #include <net/neighbour.h> +#include <net/arp.h> #include <net/dst.h> #include <net/sock.h> #include <net/netevent.h> @@ -2528,7 +2529,13 @@ int neigh_xmit(int index, struct net_dev if (!tbl) goto out; rcu_read_lock_bh(); - neigh = __neigh_lookup_noref(tbl, addr, dev); + if (index == NEIGH_ARP_TABLE) { + u32 key = *((u32 *)addr); + + neigh = __ipv4_neigh_lookup_noref(dev, key); + } else { + neigh = __neigh_lookup_noref(tbl, addr, dev); + } if (!neigh) neigh = __neigh_create(tbl, addr, dev, false); err = PTR_ERR(neigh);
Hi,
On Sun, Jun 09, 2019 at 06:42:09PM +0200, Greg Kroah-Hartman wrote:
From: David Ahern dsahern@gmail.com
[ Upstream commit 4b2a2bfeb3f056461a90bd621e8bd7d03fa47f60 ]
Commit cd9ff4de0107 changed the key for IFF_POINTOPOINT devices to INADDR_ANY but neigh_xmit which is used for MPLS encapsulations was not updated to use the altered key. The result is that every packet Tx does a lookup on the gateway address which does not find an entry, a new one is created only to find the existing one in the table right before the insert since arp_constructor was updated to reset the primary key. This is seen in the allocs and destroys counters: ip -s -4 ntable show | head -10 | grep alloc
which increase for each packet showing the unnecessary overhread.
Fix by having neigh_xmit use __ipv4_neigh_lookup_noref for NEIGH_ARP_TABLE.
Fixes: cd9ff4de0107 ("ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY") Reported-by: Alan Maguire alan.maguire@oracle.com Signed-off-by: David Ahern dsahern@gmail.com Tested-by: Alan Maguire alan.maguire@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
This commit also requires the following commit:
commit 9b3040a6aafd7898ece7fc7efcbca71e42aa8069 Author: David Ahern dsahern@gmail.com Date: Sun May 5 11:16:20 2019 -0700
ipv4: Define __ipv4_neigh_lookup_noref when CONFIG_INET is disabled
Define __ipv4_neigh_lookup_noref to return NULL when CONFIG_INET is disabled.
Fixes: 4b2a2bfeb3f0 ("neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit") Reported-by: kbuild test robot lkp@intel.com Signed-off-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net
And this is also necessary for 4.4.y, 4.14.y, 4.19.y and 5.1.y. Please apply this commit.
Best regards, Nobuhiro
Hi again.
-----Original Message----- From: stable-owner@vger.kernel.org [mailto:stable-owner@vger.kernel.org] On Behalf Of Nobuhiro Iwamatsu Sent: Monday, June 10, 2019 10:10 AM To: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-kernel@vger.kernel.org; stable@vger.kernel.org; Alan Maguire alan.maguire@oracle.com; David Ahern dsahern@gmail.com; David S. Miller davem@davemloft.net Subject: Re: [PATCH 4.14 03/35] neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit
Hi,
On Sun, Jun 09, 2019 at 06:42:09PM +0200, Greg Kroah-Hartman wrote:
From: David Ahern dsahern@gmail.com
[ Upstream commit 4b2a2bfeb3f056461a90bd621e8bd7d03fa47f60 ]
Commit cd9ff4de0107 changed the key for IFF_POINTOPOINT devices to INADDR_ANY but neigh_xmit which is used for MPLS encapsulations was not updated to use the altered key. The result is that every packet
Tx
does a lookup on the gateway address which does not find an entry, a new one is created only to find the existing one in the table right before the insert since arp_constructor was updated to reset the primary key. This is seen in the allocs and destroys counters: ip -s -4 ntable show | head -10 | grep alloc
which increase for each packet showing the unnecessary overhread.
Fix by having neigh_xmit use __ipv4_neigh_lookup_noref for
NEIGH_ARP_TABLE.
Fixes: cd9ff4de0107 ("ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY") Reported-by: Alan Maguire alan.maguire@oracle.com Signed-off-by: David Ahern dsahern@gmail.com Tested-by: Alan Maguire alan.maguire@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
This commit also requires the following commit:
commit 9b3040a6aafd7898ece7fc7efcbca71e42aa8069 Author: David Ahern dsahern@gmail.com Date: Sun May 5 11:16:20 2019 -0700
ipv4: Define __ipv4_neigh_lookup_noref when CONFIG_INET is disabled Define __ipv4_neigh_lookup_noref to return NULL when CONFIG_INET is
disabled.
Fixes: 4b2a2bfeb3f0 ("neighbor: Call __ipv4_neigh_lookup_noref in
neigh_xmit") Reported-by: kbuild test robot lkp@intel.com Signed-off-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net
And this is also necessary for 4.4.y, 4.14.y, 4.19.y and 5.1.y.
4.4.y, 4.9.y, 4.19.y and 5.1.y.
Please apply this commit.
Best regards, Nobuhiro
On Mon, Jun 10, 2019 at 01:13:16AM +0000, nobuhiro1.iwamatsu@toshiba.co.jp wrote:
Hi again.
-----Original Message----- From: stable-owner@vger.kernel.org [mailto:stable-owner@vger.kernel.org] On Behalf Of Nobuhiro Iwamatsu Sent: Monday, June 10, 2019 10:10 AM To: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-kernel@vger.kernel.org; stable@vger.kernel.org; Alan Maguire alan.maguire@oracle.com; David Ahern dsahern@gmail.com; David S. Miller davem@davemloft.net Subject: Re: [PATCH 4.14 03/35] neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit
Hi,
On Sun, Jun 09, 2019 at 06:42:09PM +0200, Greg Kroah-Hartman wrote:
From: David Ahern dsahern@gmail.com
[ Upstream commit 4b2a2bfeb3f056461a90bd621e8bd7d03fa47f60 ]
Commit cd9ff4de0107 changed the key for IFF_POINTOPOINT devices to INADDR_ANY but neigh_xmit which is used for MPLS encapsulations was not updated to use the altered key. The result is that every packet
Tx
does a lookup on the gateway address which does not find an entry, a new one is created only to find the existing one in the table right before the insert since arp_constructor was updated to reset the primary key. This is seen in the allocs and destroys counters: ip -s -4 ntable show | head -10 | grep alloc
which increase for each packet showing the unnecessary overhread.
Fix by having neigh_xmit use __ipv4_neigh_lookup_noref for
NEIGH_ARP_TABLE.
Fixes: cd9ff4de0107 ("ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY") Reported-by: Alan Maguire alan.maguire@oracle.com Signed-off-by: David Ahern dsahern@gmail.com Tested-by: Alan Maguire alan.maguire@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
This commit also requires the following commit:
commit 9b3040a6aafd7898ece7fc7efcbca71e42aa8069 Author: David Ahern dsahern@gmail.com Date: Sun May 5 11:16:20 2019 -0700
ipv4: Define __ipv4_neigh_lookup_noref when CONFIG_INET is disabled Define __ipv4_neigh_lookup_noref to return NULL when CONFIG_INET is
disabled.
Fixes: 4b2a2bfeb3f0 ("neighbor: Call __ipv4_neigh_lookup_noref in
neigh_xmit") Reported-by: kbuild test robot lkp@intel.com Signed-off-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net
And this is also necessary for 4.4.y, 4.14.y, 4.19.y and 5.1.y.
4.4.y, 4.9.y, 4.19.y and 5.1.y.
Thanks for the information, now queued up everywhere.
greg k-h
From: Erez Alfasi ereza@mellanox.com
[ Upstream commit 135dd9594f127c8a82d141c3c8430e9e2143216a ]
Querying EEPROM high pages data for SFP module is currently not supported by our driver but is still tried, resulting in invalid FW queries.
Set the EEPROM ethtool data length to 256 for SFP module to limit the reading for page 0 only and prevent invalid FW queries.
Fixes: 7202da8b7f71 ("ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support") Signed-off-by: Erez Alfasi ereza@mellanox.com Signed-off-by: Tariq Toukan tariqt@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 4 +++- drivers/net/ethernet/mellanox/mlx4/port.c | 5 ----- 2 files changed, 3 insertions(+), 6 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c @@ -1982,6 +1982,8 @@ static int mlx4_en_set_tunable(struct ne return ret; }
+#define MLX4_EEPROM_PAGE_LEN 256 + static int mlx4_en_get_module_info(struct net_device *dev, struct ethtool_modinfo *modinfo) { @@ -2016,7 +2018,7 @@ static int mlx4_en_get_module_info(struc break; case MLX4_MODULE_ID_SFP: modinfo->type = ETH_MODULE_SFF_8472; - modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN; + modinfo->eeprom_len = MLX4_EEPROM_PAGE_LEN; break; default: return -EINVAL; --- a/drivers/net/ethernet/mellanox/mlx4/port.c +++ b/drivers/net/ethernet/mellanox/mlx4/port.c @@ -2077,11 +2077,6 @@ int mlx4_get_module_info(struct mlx4_dev size -= offset + size - I2C_PAGE_SIZE;
i2c_addr = I2C_ADDR_LOW; - if (offset >= I2C_PAGE_SIZE) { - /* Reset offset to high page */ - i2c_addr = I2C_ADDR_HIGH; - offset -= I2C_PAGE_SIZE; - }
cable_info = (struct mlx4_cable_info *)inmad->data; cable_info->dev_mem_address = cpu_to_be16(offset);
From: Zhu Yanjun yanjun.zhu@oracle.com
[ Upstream commit 85cb928787eab6a2f4ca9d2a798b6f3bed53ced1 ]
When the following tests last for several hours, the problem will occur.
Server: rds-stress -r 1.1.1.16 -D 1M Client: rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30
The following will occur.
" Starting up.... tsks tx/s rx/s tx+rx K/s mbi K/s mbo K/s tx us/c rtt us cpu % 1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00 1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00 1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00 1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00 "
From vmcore, we can find that clean_list is NULL.
From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker.
Then rds_ib_mr_pool_flush_worker calls " rds_ib_flush_mr_pool(pool, 0, NULL); " Then in function " int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool, int free_all, struct rds_ib_mr **ibmr_ret) " ibmr_ret is NULL.
In the source code, " ... list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail); if (ibmr_ret) *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);
/* more than one entry in llist nodes */ if (clean_nodes->next) llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list); ... " When ibmr_ret is NULL, llist_entry is not executed. clean_nodes->next instead of clean_nodes is added in clean_list. So clean_nodes is discarded. It can not be used again. The workqueue is executed periodically. So more and more clean_nodes are discarded. Finally the clean_list is NULL. Then this problem will occur.
Fixes: 1bc144b62524 ("net, rds, Replace xlist in net/rds/xlist.h with llist") Signed-off-by: Zhu Yanjun yanjun.zhu@oracle.com Acked-by: Santosh Shilimkar santosh.shilimkar@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rds/ib_rdma.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/net/rds/ib_rdma.c +++ b/net/rds/ib_rdma.c @@ -416,12 +416,14 @@ int rds_ib_flush_mr_pool(struct rds_ib_m wait_clean_list_grace();
list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail); - if (ibmr_ret) + if (ibmr_ret) { *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode); - + clean_nodes = clean_nodes->next; + } /* more than one entry in llist nodes */ - if (clean_nodes->next) - llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list); + if (clean_nodes) + llist_add_batch(clean_nodes, clean_tail, + &pool->clean_list);
}
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 720f1de4021f09898b8c8443f3b3e995991b6e3a ]
Currently, the process issuing a "start" command on the pktgen procfs interface, acquires the pktgen thread lock and never release it, until all pktgen threads are completed. The above can blocks indefinitely any other pktgen command and any (even unrelated) netdevice removal - as the pktgen netdev notifier acquires the same lock.
The issue is demonstrated by the following script, reported by Matteo:
ip -b - <<'EOF' link add type dummy link add type veth link set dummy0 up EOF modprobe pktgen echo reset >/proc/net/pktgen/pgctrl { echo rem_device_all echo add_device dummy0 } >/proc/net/pktgen/kpktgend_0 echo count 0 >/proc/net/pktgen/dummy0 echo start >/proc/net/pktgen/pgctrl & sleep 1 rmmod veth
Fix the above releasing the thread lock around the sleep call.
Additionally we must prevent racing with forcefull rmmod - as the thread lock no more protects from them. Instead, acquire a self-reference before waiting for any thread. As a side effect, running
rmmod pktgen
while some thread is running now fails with "module in use" error, before this patch such command hanged indefinitely.
Note: the issue predates the commit reported in the fixes tag, but this fix can't be applied before the mentioned commit.
v1 -> v2: - no need to check for thread existence after flipping the lock, pktgen threads are freed only at net exit time -
Fixes: 6146e6a43b35 ("[PKTGEN]: Removes thread_{un,}lock() macros.") Reported-and-tested-by: Matteo Croce mcroce@redhat.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/pktgen.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -3149,7 +3149,13 @@ static int pktgen_wait_thread_run(struct { while (thread_is_running(t)) {
+ /* note: 't' will still be around even after the unlock/lock + * cycle because pktgen_thread threads are only cleared at + * net exit + */ + mutex_unlock(&pktgen_thread_lock); msleep_interruptible(100); + mutex_lock(&pktgen_thread_lock);
if (signal_pending(current)) goto signal; @@ -3164,6 +3170,10 @@ static int pktgen_wait_all_threads_run(s struct pktgen_thread *t; int sig = 1;
+ /* prevent from racing with rmmod */ + if (!try_module_get(THIS_MODULE)) + return sig; + mutex_lock(&pktgen_thread_lock);
list_for_each_entry(t, &pn->pktgen_threads, th_list) { @@ -3177,6 +3187,7 @@ static int pktgen_wait_all_threads_run(s t->control |= (T_STOP);
mutex_unlock(&pktgen_thread_lock); + module_put(THIS_MODULE); return sig; }
From: Olivier Matz olivier.matz@6wind.com
[ Upstream commit b9aa52c4cb457e7416cc0c95f475e72ef4a61336 ]
The following code returns EFAULT (Bad address):
s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6); setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1); sendto(ipv6_icmp6_packet, addr); /* returns -1, errno = EFAULT */
The IPv4 equivalent code works. A workaround is to use IPPROTO_RAW instead of IPPROTO_ICMPV6.
The failure happens because 2 bytes are eaten from the msghdr by rawv6_probe_proto_opt() starting from commit 19e3c66b52ca ("ipv6 equivalent of "ipv4: Avoid reading user iov twice after raw_probe_proto_opt""), but at that time it was not a problem because IPV6_HDRINCL was not yet introduced.
Only eat these 2 bytes if hdrincl == 0.
Fixes: 715f504b1189 ("ipv6: add IPV6_HDRINCL option for raw sockets") Signed-off-by: Olivier Matz olivier.matz@6wind.com Acked-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/raw.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -889,11 +889,14 @@ static int rawv6_sendmsg(struct sock *sk opt = ipv6_fixup_options(&opt_space, opt);
fl6.flowi6_proto = proto; - rfv.msg = msg; - rfv.hlen = 0; - err = rawv6_probe_proto_opt(&rfv, &fl6); - if (err) - goto out; + + if (!hdrincl) { + rfv.msg = msg; + rfv.hlen = 0; + err = rawv6_probe_proto_opt(&rfv, &fl6); + if (err) + goto out; + }
if (!ipv6_addr_any(daddr)) fl6.daddr = *daddr;
From: Olivier Matz olivier.matz@6wind.com
[ Upstream commit 59e3e4b52663a9d97efbce7307f62e4bc5c9ce91 ]
As it was done in commit 8f659a03a0ba ("net: ipv4: fix for a race condition in raw_sendmsg") and commit 20b50d79974e ("net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the value of inet->hdrincl in a local variable, to avoid introducing a race condition in the next commit.
Signed-off-by: Olivier Matz olivier.matz@6wind.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/raw.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -782,6 +782,7 @@ static int rawv6_sendmsg(struct sock *sk struct sockcm_cookie sockc; struct ipcm6_cookie ipc6; int addr_len = msg->msg_namelen; + int hdrincl; u16 proto; int err;
@@ -795,6 +796,13 @@ static int rawv6_sendmsg(struct sock *sk if (msg->msg_flags & MSG_OOB) return -EOPNOTSUPP;
+ /* hdrincl should be READ_ONCE(inet->hdrincl) + * but READ_ONCE() doesn't work with bit fields. + * Doing this indirectly yields the same result. + */ + hdrincl = inet->hdrincl; + hdrincl = READ_ONCE(hdrincl); + /* * Get and verify the address. */ @@ -913,7 +921,7 @@ static int rawv6_sendmsg(struct sock *sk fl6.flowi6_oif = np->ucast_oif; security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
- if (inet->hdrincl) + if (hdrincl) fl6.flowi6_flags |= FLOWI_FLAG_KNOWN_NH;
if (ipc6.tclass < 0) @@ -936,7 +944,7 @@ static int rawv6_sendmsg(struct sock *sk goto do_confirm;
back_from_confirm: - if (inet->hdrincl) + if (hdrincl) err = rawv6_send_hdrinc(sk, msg, len, &fl6, &dst, msg->msg_flags); else { ipc6.opt = opt;
From: Russell King rmk+kernel@armlinux.org.uk
[ Upstream commit 28e74a7cfd6403f0d1c0f8b10b45d6fae37b227e ]
Some SFP modules do not like reads longer than 16 bytes, so read the EEPROM in chunks of 16 bytes at a time. This behaviour is not specified in the SFP MSAs, which specifies:
"The serial interface uses the 2-wire serial CMOS E2PROM protocol defined for the ATMEL AT24C01A/02/04 family of components."
and
"As long as the SFP+ receives an acknowledge, it shall serially clock out sequential data words. The sequence is terminated when the host responds with a NACK and a STOP instead of an acknowledge."
We must avoid breaking a read across a 16-bit quantity in the diagnostic page, thankfully all 16-bit quantities in that page are naturally aligned.
Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/phy/sfp.c | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-)
--- a/drivers/net/phy/sfp.c +++ b/drivers/net/phy/sfp.c @@ -168,6 +168,7 @@ static int sfp__i2c_read(struct i2c_adap void *buf, size_t len) { struct i2c_msg msgs[2]; + size_t this_len; int ret;
msgs[0].addr = bus_addr; @@ -179,11 +180,26 @@ static int sfp__i2c_read(struct i2c_adap msgs[1].len = len; msgs[1].buf = buf;
- ret = i2c_transfer(i2c, msgs, ARRAY_SIZE(msgs)); - if (ret < 0) - return ret; + while (len) { + this_len = len; + if (this_len > 16) + this_len = 16;
- return ret == ARRAY_SIZE(msgs) ? len : 0; + msgs[1].len = this_len; + + ret = i2c_transfer(i2c, msgs, ARRAY_SIZE(msgs)); + if (ret < 0) + return ret; + + if (ret != ARRAY_SIZE(msgs)) + break; + + msgs[1].buf += this_len; + dev_addr += this_len; + len -= this_len; + } + + return msgs[1].buf - (u8 *)buf; }
static int sfp_i2c_read(struct sfp *sfp, bool a2, u8 addr, void *buf,
From: Xin Long lucien.xin@gmail.com
[ Upstream commit b7999b07726c16974ba9ca3bb9fe98ecbec5f81c ]
In Jianlin's testing, netperf was broken with 'Connection reset by peer', as the cookie check failed in rt6_check() and ip6_dst_check() always returned NULL.
It's caused by Commit 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes"), where the cookie can be got only when 'c1'(see below) for setting dst_cookie whereas rt6_check() is called when !'c1' for checking dst_cookie, as we can see in ip6_dst_check().
Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check() (!c1) will check the 'from' cookie, this patch is to remove the c1 check in rt6_get_cookie(), so that the dst_cookie can always be set properly.
c1: (rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))
Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Jianlin Shi jishi@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ip6_fib.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -199,8 +199,7 @@ static inline u32 rt6_get_cookie(const s { u32 cookie = 0;
- if (rt->rt6i_flags & RTF_PCPU || - (unlikely(!list_empty(&rt->rt6i_uncached)) && rt->dst.from)) + if (rt->dst.from) rt = (struct rt6_info *)(rt->dst.from);
rt6_get_cookie_safe(rt, &cookie);
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 691306ebd18f945e44b4552a4bfcca3475e5d957 as the patch that this "fixes" is about to be reverted...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/fib_rules.c | 1 - 1 file changed, 1 deletion(-)
--- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -564,7 +564,6 @@ int fib_nl_newrule(struct sk_buff *skb, }
if (rule_exists(ops, frh, tb, rule)) { - err = 0; if (nlh->nlmsg_flags & NLM_F_EXCL) err = -EEXIST; goto errout_free;
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 4970b42d5c362bf873982db7d93245c5281e58f4 ]
This reverts commit e9919a24d3022f72bcadc407e73a6ef17093a849.
Nathan reported the new behaviour breaks Android, as Android just add new rules and delete old ones.
If we return 0 without adding dup rules, Android will remove the new added rules and causing system to soft-reboot.
Fixes: e9919a24d302 ("fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied") Reported-by: Nathan Chancellor natechancellor@gmail.com Reported-by: Yaro Slav yaro330@gmail.com Reported-by: Maciej Żenczykowski zenczykowski@gmail.com Signed-off-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Nathan Chancellor natechancellor@gmail.com Tested-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/fib_rules.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -563,9 +563,9 @@ int fib_nl_newrule(struct sk_buff *skb, rule->uid_range = fib_kuid_range_unset; }
- if (rule_exists(ops, frh, tb, rule)) { - if (nlh->nlmsg_flags & NLM_F_EXCL) - err = -EEXIST; + if ((nlh->nlmsg_flags & NLM_F_EXCL) && + rule_exists(ops, frh, tb, rule)) { + err = -EEXIST; goto errout_free; }
From: Linus Torvalds torvalds@linux-foundation.org
commit 66be4e66a7f422128748e3c3ef6ee72b20a6197b upstream.
Herbert Xu pointed out that commit bb73c52bad36 ("rcu: Don't disable preemption for Tiny and Tree RCU readers") was incorrect in making the preempt_disable/enable() be conditional on CONFIG_PREEMPT_COUNT.
If CONFIG_PREEMPT_COUNT isn't enabled, the preemption enable/disable is a no-op, but still is a compiler barrier.
And RCU locking still _needs_ that compiler barrier.
It is simply fundamentally not true that RCU locking would be a complete no-op: we still need to guarantee (for example) that things that can trap and cause preemption cannot migrate into the RCU locked region.
The way we do that is by making it a barrier.
See for example commit 386afc91144b ("spinlocks and preemption points need to be at least compiler barriers") from back in 2013 that had similar issues with spinlocks that become no-ops on UP: they must still constrain the compiler from moving other operations into the critical region.
Now, it is true that a lot of RCU operations already use READ_ONCE() and WRITE_ONCE() (which in practice likely would never be re-ordered wrt anything remotely interesting), but it is also true that that is not globally the case, and that it's not even necessarily always possible (ie bitfields etc).
Reported-by: Herbert Xu herbert@gondor.apana.org.au Fixes: bb73c52bad36 ("rcu: Don't disable preemption for Tiny and Tree RCU readers") Cc: stable@kernel.org Cc: Boqun Feng boqun.feng@gmail.com Cc: Paul E. McKenney paulmck@linux.vnet.ibm.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/rcupdate.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -79,14 +79,12 @@ void synchronize_rcu(void);
static inline void __rcu_read_lock(void) { - if (IS_ENABLED(CONFIG_PREEMPT_COUNT)) - preempt_disable(); + preempt_disable(); }
static inline void __rcu_read_unlock(void) { - if (IS_ENABLED(CONFIG_PREEMPT_COUNT)) - preempt_enable(); + preempt_enable(); }
static inline void synchronize_rcu(void)
From: John David Anglin dave.anglin@bell.net
commit 63923d2c3800919774f5c651d503d1dd2adaddd5 upstream.
We only support I/O to kernel space. Using %sr1 to load the coherence index may be racy unless interrupts are disabled. This patch changes the code used to load the coherence index to use implicit space register selection. This saves one instruction and eliminates the race.
Tested on rp3440, c8000 and c3750.
Signed-off-by: John David Anglin dave.anglin@bell.net Cc: stable@vger.kernel.org Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/parisc/ccio-dma.c | 4 +--- drivers/parisc/sba_iommu.c | 3 +-- 2 files changed, 2 insertions(+), 5 deletions(-)
--- a/drivers/parisc/ccio-dma.c +++ b/drivers/parisc/ccio-dma.c @@ -565,8 +565,6 @@ ccio_io_pdir_entry(u64 *pdir_ptr, space_ /* We currently only support kernel addresses */ BUG_ON(sid != KERNEL_SPACE);
- mtsp(sid,1); - /* ** WORD 1 - low order word ** "hints" parm includes the VALID bit! @@ -597,7 +595,7 @@ ccio_io_pdir_entry(u64 *pdir_ptr, space_ ** Grab virtual index [0:11] ** Deposit virt_idx bits into I/O PDIR word */ - asm volatile ("lci %%r0(%%sr1, %1), %0" : "=r" (ci) : "r" (vba)); + asm volatile ("lci %%r0(%1), %0" : "=r" (ci) : "r" (vba)); asm volatile ("extru %1,19,12,%0" : "+r" (ci) : "r" (ci)); asm volatile ("depw %1,15,12,%0" : "+r" (pa) : "r" (ci));
--- a/drivers/parisc/sba_iommu.c +++ b/drivers/parisc/sba_iommu.c @@ -575,8 +575,7 @@ sba_io_pdir_entry(u64 *pdir_ptr, space_t pa = virt_to_phys(vba); pa &= IOVP_MASK;
- mtsp(sid,1); - asm("lci 0(%%sr1, %1), %0" : "=r" (ci) : "r" (vba)); + asm("lci 0(%1), %0" : "=r" (ci) : "r" (vba)); pa |= (ci >> PAGE_SHIFT) & 0xff; /* move CI (8 bits) into lowest byte */
pa |= SBA_PDIR_VALID_BIT; /* set "valid" bit */
From: Miklos Szeredi mszeredi@redhat.com
commit 35d6fcbb7c3e296a52136347346a698a35af3fda upstream.
Do the proper cleanup in case the size check fails.
Tested with xfstests:generic/228
Reported-by: kbuild test robot lkp@intel.com Reported-by: Dan Carpenter dan.carpenter@oracle.com Fixes: 0cbade024ba5 ("fuse: honor RLIMIT_FSIZE in fuse_file_fallocate") Cc: Liu Bo bo.liu@linux.alibaba.com Cc: stable@vger.kernel.org # v3.5 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/fuse/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2978,7 +2978,7 @@ static long fuse_file_fallocate(struct f offset + length > i_size_read(inode)) { err = inode_newsize_ok(inode, offset + length); if (err) - return err; + goto out; }
if (!(mode & FALLOC_FL_KEEP_SIZE))
From: Kees Cook keescook@chromium.org
commit b77fa617a2ff4d6beccad3d3d4b3a1f2d10368aa upstream.
Since the console writer does not use the preallocated crash dump buffer any more, there is no reason to perform locking around it.
Fixes: 70ad35db3321 ("pstore: Convert console write to use ->write_buf") Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Joel Fernandes (Google) joel@joelfernandes.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/pstore/platform.c | 29 ++++++----------------------- 1 file changed, 6 insertions(+), 23 deletions(-)
--- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -597,31 +597,14 @@ static void pstore_unregister_kmsg(void) #ifdef CONFIG_PSTORE_CONSOLE static void pstore_console_write(struct console *con, const char *s, unsigned c) { - const char *e = s + c; + struct pstore_record record;
- while (s < e) { - struct pstore_record record; - unsigned long flags; + pstore_record_init(&record, psinfo); + record.type = PSTORE_TYPE_CONSOLE;
- pstore_record_init(&record, psinfo); - record.type = PSTORE_TYPE_CONSOLE; - - if (c > psinfo->bufsize) - c = psinfo->bufsize; - - if (oops_in_progress) { - if (!spin_trylock_irqsave(&psinfo->buf_lock, flags)) - break; - } else { - spin_lock_irqsave(&psinfo->buf_lock, flags); - } - record.buf = (char *)s; - record.size = c; - psinfo->write(&record); - spin_unlock_irqrestore(&psinfo->buf_lock, flags); - s += c; - c = e - s; - } + record.buf = (char *)s; + record.size = c; + psinfo->write(&record); }
static struct console pstore_console = {
From: Kees Cook keescook@chromium.org
commit ea84b580b95521644429cc6748b6c2bf27c8b0f3 upstream.
Instead of running with interrupts disabled, use a semaphore. This should make it easier for backends that may need to sleep (e.g. EFI) when performing a write:
|BUG: sleeping function called from invalid context at kernel/sched/completion.c:99 |in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum |Preemption disabled at: |[<ffffffff99d60512>] pstore_dump+0x72/0x330 |CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G D 4.20.0-rc3 #45 |Call Trace: | dump_stack+0x4f/0x6a | ___might_sleep.cold.91+0xd3/0xe4 | __might_sleep+0x50/0x90 | wait_for_completion+0x32/0x130 | virt_efi_query_variable_info+0x14e/0x160 | efi_query_variable_store+0x51/0x1a0 | efivar_entry_set_safe+0xa3/0x1b0 | efi_pstore_write+0x109/0x140 | pstore_dump+0x11c/0x330 | kmsg_dump+0xa4/0xd0 | oops_exit+0x22/0x30 ...
Reported-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Fixes: 21b3ddd39fee ("efi: Don't use spinlocks for efi vars") Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/nvram_64.c | 2 - drivers/acpi/apei/erst.c | 1 drivers/firmware/efi/efi-pstore.c | 4 --- fs/pstore/platform.c | 44 +++++++++++++++++++------------------- fs/pstore/ram.c | 1 include/linux/pstore.h | 7 ++---- 6 files changed, 27 insertions(+), 32 deletions(-)
--- a/arch/powerpc/kernel/nvram_64.c +++ b/arch/powerpc/kernel/nvram_64.c @@ -566,8 +566,6 @@ static int nvram_pstore_init(void) nvram_pstore_info.buf = oops_data; nvram_pstore_info.bufsize = oops_data_sz;
- spin_lock_init(&nvram_pstore_info.buf_lock); - rc = pstore_register(&nvram_pstore_info); if (rc && (rc != -EPERM)) /* Print error only when pstore.backend == nvram */ --- a/drivers/acpi/apei/erst.c +++ b/drivers/acpi/apei/erst.c @@ -1175,7 +1175,6 @@ static int __init erst_init(void) "Error Record Serialization Table (ERST) support is initialized.\n");
buf = kmalloc(erst_erange.size, GFP_KERNEL); - spin_lock_init(&erst_info.buf_lock); if (buf) { erst_info.buf = buf + sizeof(struct cper_pstore_record); erst_info.bufsize = erst_erange.size - --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -258,8 +258,7 @@ static int efi_pstore_write(struct pstor efi_name[i] = name[i];
ret = efivar_entry_set_safe(efi_name, vendor, PSTORE_EFI_ATTRIBUTES, - !pstore_cannot_block_path(record->reason), - record->size, record->psi->buf); + preemptible(), record->size, record->psi->buf);
if (record->reason == KMSG_DUMP_OOPS) efivar_run_worker(); @@ -368,7 +367,6 @@ static __init int efivars_pstore_init(vo return -ENOMEM;
efi_pstore_info.bufsize = 1024; - spin_lock_init(&efi_pstore_info.buf_lock);
if (pstore_register(&efi_pstore_info)) { kfree(efi_pstore_info.buf); --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -129,26 +129,27 @@ static const char *get_reason_str(enum k } }
-bool pstore_cannot_block_path(enum kmsg_dump_reason reason) +/* + * Should pstore_dump() wait for a concurrent pstore_dump()? If + * not, the current pstore_dump() will report a failure to dump + * and return. + */ +static bool pstore_cannot_wait(enum kmsg_dump_reason reason) { - /* - * In case of NMI path, pstore shouldn't be blocked - * regardless of reason. - */ + /* In NMI path, pstore shouldn't block regardless of reason. */ if (in_nmi()) return true;
switch (reason) { /* In panic case, other cpus are stopped by smp_send_stop(). */ case KMSG_DUMP_PANIC: - /* Emergency restart shouldn't be blocked by spin lock. */ + /* Emergency restart shouldn't be blocked. */ case KMSG_DUMP_EMERG: return true; default: return false; } } -EXPORT_SYMBOL_GPL(pstore_cannot_block_path);
#ifdef CONFIG_PSTORE_ZLIB_COMPRESS /* Derived from logfs_compress() */ @@ -499,23 +500,23 @@ static void pstore_dump(struct kmsg_dump unsigned long total = 0; const char *why; unsigned int part = 1; - unsigned long flags = 0; - int is_locked; int ret;
why = get_reason_str(reason);
- if (pstore_cannot_block_path(reason)) { - is_locked = spin_trylock_irqsave(&psinfo->buf_lock, flags); - if (!is_locked) { - pr_err("pstore dump routine blocked in %s path, may corrupt error record\n" - , in_nmi() ? "NMI" : why); + if (down_trylock(&psinfo->buf_lock)) { + /* Failed to acquire lock: give up if we cannot wait. */ + if (pstore_cannot_wait(reason)) { + pr_err("dump skipped in %s path: may corrupt error record\n", + in_nmi() ? "NMI" : why); + return; + } + if (down_interruptible(&psinfo->buf_lock)) { + pr_err("could not grab semaphore?!\n"); return; } - } else { - spin_lock_irqsave(&psinfo->buf_lock, flags); - is_locked = 1; } + oopscount++; while (total < kmsg_bytes) { char *dst; @@ -532,7 +533,7 @@ static void pstore_dump(struct kmsg_dump record.part = part; record.buf = psinfo->buf;
- if (big_oops_buf && is_locked) { + if (big_oops_buf) { dst = big_oops_buf; dst_size = big_oops_buf_sz; } else { @@ -550,7 +551,7 @@ static void pstore_dump(struct kmsg_dump dst_size, &dump_size)) break;
- if (big_oops_buf && is_locked) { + if (big_oops_buf) { zipped_len = pstore_compress(dst, psinfo->buf, header_size + dump_size, psinfo->bufsize); @@ -573,8 +574,8 @@ static void pstore_dump(struct kmsg_dump total += record.size; part++; } - if (is_locked) - spin_unlock_irqrestore(&psinfo->buf_lock, flags); + + up(&psinfo->buf_lock); }
static struct kmsg_dumper pstore_dumper = { @@ -693,6 +694,7 @@ int pstore_register(struct pstore_info * psi->write_user = pstore_write_user_compat; psinfo = psi; mutex_init(&psinfo->read_mutex); + sema_init(&psinfo->buf_lock, 1); spin_unlock(&pstore_lock);
if (owner && !try_module_get(owner)) { --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -812,7 +812,6 @@ static int ramoops_probe(struct platform err = -ENOMEM; goto fail_clear; } - spin_lock_init(&cxt->pstore.buf_lock);
cxt->pstore.flags = PSTORE_FLAGS_DMESG; if (cxt->console_size) --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -26,7 +26,7 @@ #include <linux/errno.h> #include <linux/kmsg_dump.h> #include <linux/mutex.h> -#include <linux/spinlock.h> +#include <linux/semaphore.h> #include <linux/time.h> #include <linux/types.h>
@@ -88,7 +88,7 @@ struct pstore_record { * @owner: module which is repsonsible for this backend driver * @name: name of the backend driver * - * @buf_lock: spinlock to serialize access to @buf + * @buf_lock: semaphore to serialize access to @buf * @buf: preallocated crash dump buffer * @bufsize: size of @buf available for crash dump bytes (must match * smallest number of bytes available for writing to a @@ -173,7 +173,7 @@ struct pstore_info { struct module *owner; char *name;
- spinlock_t buf_lock; + struct semaphore buf_lock; char *buf; size_t bufsize;
@@ -199,7 +199,6 @@ struct pstore_info {
extern int pstore_register(struct pstore_info *); extern void pstore_unregister(struct pstore_info *); -extern bool pstore_cannot_block_path(enum kmsg_dump_reason reason);
struct pstore_ftrace_record { unsigned long ip;
From: Kees Cook keescook@chromium.org
commit 8880fa32c557600f5f624084152668ed3c2ea51e upstream.
The ram pstore backend has always had the crash dumper frontend enabled unconditionally. However, it was possible to effectively disable it by setting a record_size=0. All the machinery would run (storing dumps to the temporary crash buffer), but 0 bytes would ultimately get stored due to there being no przs allocated for dumps. Commit 89d328f637b9 ("pstore/ram: Correctly calculate usable PRZ bytes"), however, assumed that there would always be at least one allocated dprz for calculating the size of the temporary crash buffer. This was, of course, not the case when record_size=0, and would lead to a NULL deref trying to find the dprz buffer size:
BUG: unable to handle kernel NULL pointer dereference at (null) ... IP: ramoops_probe+0x285/0x37e (fs/pstore/ram.c:808)
cxt->pstore.bufsize = cxt->dprzs[0]->buffer_size;
Instead, we need to only enable the frontends based on the success of the prz initialization and only take the needed actions when those zones are available. (This also fixes a possible error in detecting if the ftrace frontend should be enabled.)
Reported-and-tested-by: Yaro Slav yaro330@gmail.com Fixes: 89d328f637b9 ("pstore/ram: Correctly calculate usable PRZ bytes") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/pstore/platform.c | 3 ++- fs/pstore/ram.c | 36 +++++++++++++++++++++++------------- 2 files changed, 25 insertions(+), 14 deletions(-)
--- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -702,7 +702,8 @@ int pstore_register(struct pstore_info * return -EINVAL; }
- allocate_buf_for_compression(); + if (psi->flags & PSTORE_FLAGS_DMESG) + allocate_buf_for_compression();
if (pstore_is_mounted()) pstore_get_records(0); --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -801,26 +801,36 @@ static int ramoops_probe(struct platform
cxt->pstore.data = cxt; /* - * Since bufsize is only used for dmesg crash dumps, it - * must match the size of the dprz record (after PRZ header - * and ECC bytes have been accounted for). + * Prepare frontend flags based on which areas are initialized. + * For ramoops_init_przs() cases, the "max count" variable tells + * if there are regions present. For ramoops_init_prz() cases, + * the single region size is how to check. */ - cxt->pstore.bufsize = cxt->dprzs[0]->buffer_size; - cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL); - if (!cxt->pstore.buf) { - pr_err("cannot allocate pstore crash dump buffer\n"); - err = -ENOMEM; - goto fail_clear; - } - - cxt->pstore.flags = PSTORE_FLAGS_DMESG; + cxt->pstore.flags = 0; + if (cxt->max_dump_cnt) + cxt->pstore.flags |= PSTORE_FLAGS_DMESG; if (cxt->console_size) cxt->pstore.flags |= PSTORE_FLAGS_CONSOLE; - if (cxt->ftrace_size) + if (cxt->max_ftrace_cnt) cxt->pstore.flags |= PSTORE_FLAGS_FTRACE; if (cxt->pmsg_size) cxt->pstore.flags |= PSTORE_FLAGS_PMSG;
+ /* + * Since bufsize is only used for dmesg crash dumps, it + * must match the size of the dprz record (after PRZ header + * and ECC bytes have been accounted for). + */ + if (cxt->pstore.flags & PSTORE_FLAGS_DMESG) { + cxt->pstore.bufsize = cxt->dprzs[0]->buffer_size; + cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL); + if (!cxt->pstore.buf) { + pr_err("cannot allocate pstore crash dump buffer\n"); + err = -ENOMEM; + goto fail_clear; + } + } + err = pstore_register(&cxt->pstore); if (err) { pr_err("registering with pstore failed\n");
From: Jiri Kosina jkosina@suse.cz
commit ec527c318036a65a083ef68d8ba95789d2212246 upstream.
As explained in
0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
we always, no matter what, have to bring up x86 HT siblings during boot at least once in order to avoid first MCE bringing the system to its knees.
That means that whenever 'nosmt' is supplied on the kernel command-line, all the HT siblings are as a result sitting in mwait or cpudile after going through the online-offline cycle at least once.
This causes a serious issue though when a kernel, which saw 'nosmt' on its commandline, is going to perform resume from hibernation: if the resume from the hibernated image is successful, cr3 is flipped in order to point to the address space of the kernel that is being resumed, which in turn means that all the HT siblings are all of a sudden mwaiting on address which is no longer valid.
That results in triple fault shortly after cr3 is switched, and machine reboots.
Fix this by always waking up all the SMT siblings before initiating the 'restore from hibernation' process; this guarantees that all the HT siblings will be properly carried over to the resumed kernel waiting in resume_play_dead(), and acted upon accordingly afterwards, based on the target kernel configuration.
Symmetricaly, the resumed kernel has to push the SMT siblings to mwait again in case it has SMT disabled; this means it has to online all the siblings when resuming (so that they come out of hlt) and offline them again to let them reach mwait.
Cc: 4.19+ stable@vger.kernel.org # v4.19+ Debugged-by: Thomas Gleixner tglx@linutronix.de Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") Signed-off-by: Jiri Kosina jkosina@suse.cz Acked-by: Pavel Machek pavel@ucw.cz Reviewed-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/power/cpu.c | 10 ++++++++++ arch/x86/power/hibernate_64.c | 33 +++++++++++++++++++++++++++++++++ include/linux/cpu.h | 4 ++++ kernel/cpu.c | 4 ++-- kernel/power/hibernate.c | 9 +++++++++ 5 files changed, 58 insertions(+), 2 deletions(-)
--- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -299,7 +299,17 @@ int hibernate_resume_nonboot_cpu_disable * address in its instruction pointer may not be possible to resolve * any more at that point (the page tables used by it previously may * have been overwritten by hibernate image data). + * + * First, make sure that we wake up all the potentially disabled SMT + * threads which have been initially brought up and then put into + * mwait/cpuidle sleep. + * Those will be put to proper (not interfering with hibernation + * resume) sleep afterwards, and the resumed kernel will decide itself + * what to do with them. */ + ret = cpuhp_smt_enable(); + if (ret) + return ret; smp_ops.play_dead = resume_play_dead; ret = disable_nonboot_cpus(); smp_ops.play_dead = play_dead; --- a/arch/x86/power/hibernate_64.c +++ b/arch/x86/power/hibernate_64.c @@ -13,6 +13,7 @@ #include <linux/suspend.h> #include <linux/scatterlist.h> #include <linux/kdebug.h> +#include <linux/cpu.h>
#include <crypto/hash.h>
@@ -347,3 +348,35 @@ int arch_hibernation_header_restore(void
return 0; } + +int arch_resume_nosmt(void) +{ + int ret = 0; + /* + * We reached this while coming out of hibernation. This means + * that SMT siblings are sleeping in hlt, as mwait is not safe + * against control transition during resume (see comment in + * hibernate_resume_nonboot_cpu_disable()). + * + * If the resumed kernel has SMT disabled, we have to take all the + * SMT siblings out of hlt, and offline them again so that they + * end up in mwait proper. + * + * Called with hotplug disabled. + */ + cpu_hotplug_enable(); + if (cpu_smt_control == CPU_SMT_DISABLED || + cpu_smt_control == CPU_SMT_FORCE_DISABLED) { + enum cpuhp_smt_control old = cpu_smt_control; + + ret = cpuhp_smt_enable(); + if (ret) + goto out; + ret = cpuhp_smt_disable(old); + if (ret) + goto out; + } +out: + cpu_hotplug_disable(); + return ret; +} --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -191,10 +191,14 @@ enum cpuhp_smt_control { extern enum cpuhp_smt_control cpu_smt_control; extern void cpu_smt_disable(bool force); extern void cpu_smt_check_topology(void); +extern int cpuhp_smt_enable(void); +extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval); #else # define cpu_smt_control (CPU_SMT_ENABLED) static inline void cpu_smt_disable(bool force) { } static inline void cpu_smt_check_topology(void) { } +static inline int cpuhp_smt_enable(void) { return 0; } +static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; } #endif
/* --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2054,7 +2054,7 @@ static void cpuhp_online_cpu_device(unsi kobject_uevent(&dev->kobj, KOBJ_ONLINE); }
-static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) +int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { int cpu, ret = 0;
@@ -2088,7 +2088,7 @@ static int cpuhp_smt_disable(enum cpuhp_ return ret; }
-static int cpuhp_smt_enable(void) +int cpuhp_smt_enable(void) { int cpu, ret = 0;
--- a/kernel/power/hibernate.c +++ b/kernel/power/hibernate.c @@ -258,6 +258,11 @@ void swsusp_show_speed(ktime_t start, kt (kps % 1000) / 10); }
+__weak int arch_resume_nosmt(void) +{ + return 0; +} + /** * create_image - Create a hibernation image. * @platform_mode: Whether or not to use the platform driver. @@ -322,6 +327,10 @@ static int create_image(int platform_mod Enable_cpus: enable_nonboot_cpus();
+ /* Allow architectures to do nosmt-specific post-resume dances */ + if (!in_suspend) + error = arch_resume_nosmt(); + Platform_finish: platform_finish(platform_mode);
From: Robert Hancock hancock@sedsystems.ca
commit 49b809586730a77b57ce620b2f9689de765d790b upstream.
This driver does not support reading more than 255 bytes at once because the register for storing the number of bytes to read is only 8 bits. Add a max_read_len quirk to enforce this.
This was found when using this driver with the SFP driver, which was previously reading all 256 bytes in the SFP EEPROM in one transaction. This caused a bunch of hard-to-debug errors in the xiic driver since the driver/logic was treating the number of bytes to read as zero. Rejecting transactions that aren't supported at least allows the problem to be diagnosed more easily.
Signed-off-by: Robert Hancock hancock@sedsystems.ca Reviewed-by: Michal Simek michal.simek@xilinx.com Signed-off-by: Wolfram Sang wsa@the-dreams.de Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/i2c/busses/i2c-xiic.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/i2c/busses/i2c-xiic.c +++ b/drivers/i2c/busses/i2c-xiic.c @@ -725,11 +725,16 @@ static const struct i2c_algorithm xiic_a .functionality = xiic_func, };
+static const struct i2c_adapter_quirks xiic_quirks = { + .max_read_len = 255, +}; + static const struct i2c_adapter xiic_adapter = { .owner = THIS_MODULE, .name = DRIVER_NAME, .class = I2C_CLASS_DEPRECATED, .algo = &xiic_algorithm, + .quirks = &xiic_quirks, };
From: Paul Burton paul.burton@mips.com
commit 074a1e1167afd82c26f6d03a9a8b997d564bb241 upstream.
The virt_addr_valid() function is meant to return true iff virt_to_page() will return a valid struct page reference. This is true iff the address provided is found within the unmapped address range between PAGE_OFFSET & MAP_BASE, but we don't currently check for that condition. Instead we simply mask the address to obtain what will be a physical address if the virtual address is indeed in the desired range, shift it to form a PFN & then call pfn_valid(). This can incorrectly return true if called with a virtual address which, after masking, happens to form a physical address corresponding to a valid PFN.
For example we may vmalloc an address in the kernel mapped region starting a MAP_BASE & obtain the virtual address:
addr = 0xc000000000002000
When masked by virt_to_phys(), which uses __pa() & in turn CPHYSADDR(), we obtain the following (bogus) physical address:
addr = 0x2000
In a common system with PHYS_OFFSET=0 this will correspond to a valid struct page which should really be accessed by virtual address PAGE_OFFSET+0x2000, causing virt_addr_valid() to incorrectly return 1 indicating that the original address corresponds to a struct page.
This is equivalent to the ARM64 change made in commit ca219452c6b8 ("arm64: Correctly bounds check virt_addr_valid").
This fixes fallout when hardened usercopy is enabled caused by the related commit 517e1fbeb65f ("mm/usercopy: Drop extra is_vmalloc_or_module() check") which removed a check for the vmalloc range that was present from the introduction of the hardened usercopy feature.
Signed-off-by: Paul Burton paul.burton@mips.com Reported-by: Julien Cristau jcristau@debian.org Reviewed-by: Philippe Mathieu-Daudé f4bug@amsat.org Tested-by: YunQiang Su ysu@wavecomp.com URL: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=929366 Cc: stable@vger.kernel.org # v4.12+ Cc: linux-mips@vger.kernel.org Cc: Yunqiang Su ysu@wavecomp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/mm/mmap.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/arch/mips/mm/mmap.c +++ b/arch/mips/mm/mmap.c @@ -203,6 +203,11 @@ unsigned long arch_randomize_brk(struct
int __virt_addr_valid(const volatile void *kaddr) { + unsigned long vaddr = (unsigned long)vaddr; + + if ((vaddr < PAGE_OFFSET) || (vaddr >= MAP_BASE)) + return 0; + return pfn_valid(PFN_DOWN(virt_to_phys(kaddr))); } EXPORT_SYMBOL_GPL(__virt_addr_valid);
From: Paul Burton paul.burton@mips.com
commit e4f2d1af7163becb181419af9dece9206001e0a6 upstream.
The pistachio platform uses the U-Boot bootloader & generally boots a kernel in the uImage format. As such it's useful to build one when building the kernel, but to do so currently requires the user to manually specify a uImage target on the make command line.
Make uImage.gz the pistachio platform's default build target, so that the default is to build a kernel image that we can actually boot on a board such as the MIPS Creator Ci40.
Marked for stable backport as far as v4.1 where pistachio support was introduced. This is primarily useful for CI systems such as kernelci.org which will benefit from us building a suitable image which can then be booted as part of automated testing, extending our test coverage to the affected stable branches.
Signed-off-by: Paul Burton paul.burton@mips.com Reviewed-by: Philippe Mathieu-Daudé f4bug@amsat.org Reviewed-by: Kevin Hilman khilman@baylibre.com Tested-by: Kevin Hilman khilman@baylibre.com URL: https://groups.io/g/kernelci/message/388 Cc: stable@vger.kernel.org # v4.1+ Cc: linux-mips@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/pistachio/Platform | 1 + 1 file changed, 1 insertion(+)
--- a/arch/mips/pistachio/Platform +++ b/arch/mips/pistachio/Platform @@ -6,3 +6,4 @@ cflags-$(CONFIG_MACH_PISTACHIO) += \ -I$(srctree)/arch/mips/include/asm/mach-pistachio load-$(CONFIG_MACH_PISTACHIO) += 0xffffffff80400000 zload-$(CONFIG_MACH_PISTACHIO) += 0xffffffff81000000 +all-$(CONFIG_MACH_PISTACHIO) := uImage.gz
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 9547d81ac3bc0d2b9729a28e7dd610007144a837 which is commit a1e8783db8e0d58891681bc1e6d9ada66eae8e20 upstream.
Petr writes: Karl has reported to me today, that he's experiencing weird reboot hang on his devices with 4.9.180 kernel and that he has bisected it down to my backported patch.
I would like to kindly ask you for removal of this patch. This patch should be reverted from all stable kernels up to 5.1, because perf counters were not broken on those kernels, and this patch won't work on the ath79 legacy IRQ code anyway, it needs new irqchip driver which was enabled on ath79 with commit 51fa4f8912c0 ("MIPS: ath79: drop legacy IRQ code").
Reported-by: Petr Štetiar ynezz@true.cz Cc: Kevin 'ldir' Darbyshire-Bryant ldir@darbyshire-bryant.me.uk Cc: John Crispin john@phrozen.org Cc: Marc Zyngier marc.zyngier@arm.com Cc: Paul Burton paul.burton@mips.com Cc: linux-mips@vger.kernel.org Cc: Ralf Baechle ralf@linux-mips.org Cc: James Hogan jhogan@kernel.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Jason Cooper jason@lakedaemon.net Cc: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/mips/ath79/setup.c | 6 ++++++ drivers/irqchip/irq-ath79-misc.c | 11 ----------- 2 files changed, 6 insertions(+), 11 deletions(-)
--- a/arch/mips/ath79/setup.c +++ b/arch/mips/ath79/setup.c @@ -183,6 +183,12 @@ const char *get_system_type(void) return ath79_sys_type; }
+int get_c0_perfcount_int(void) +{ + return ATH79_MISC_IRQ(5); +} +EXPORT_SYMBOL_GPL(get_c0_perfcount_int); + unsigned int get_c0_compare_int(void) { return CP0_LEGACY_COMPARE_IRQ; --- a/drivers/irqchip/irq-ath79-misc.c +++ b/drivers/irqchip/irq-ath79-misc.c @@ -22,15 +22,6 @@ #define AR71XX_RESET_REG_MISC_INT_ENABLE 4
#define ATH79_MISC_IRQ_COUNT 32 -#define ATH79_MISC_PERF_IRQ 5 - -static int ath79_perfcount_irq; - -int get_c0_perfcount_int(void) -{ - return ath79_perfcount_irq; -} -EXPORT_SYMBOL_GPL(get_c0_perfcount_int);
static void ath79_misc_irq_handler(struct irq_desc *desc) { @@ -122,8 +113,6 @@ static void __init ath79_misc_intc_domai { void __iomem *base = domain->host_data;
- ath79_perfcount_irq = irq_create_mapping(domain, ATH79_MISC_PERF_IRQ); - /* Disable and clear all interrupts */ __raw_writel(0, base + AR71XX_RESET_REG_MISC_INT_ENABLE); __raw_writel(0, base + AR71XX_RESET_REG_MISC_INT_STATUS);
From: Dan Carpenter dan.carpenter@oracle.com
commit 110080cea0d0e4dfdb0b536e7f8a5633ead6a781 upstream.
There are a couple potential integer overflows here.
round_up(m->size + (m->addr & ~PAGE_MASK), PAGE_SIZE);
The first thing is that the "m->size + (...)" addition could overflow, and the second is that round_up() overflows to zero if the result is within PAGE_SIZE of the type max.
In this code, the "m->size" variable is an u64 but we're saving the result in "map_size" which is an unsigned long and genwqe_user_vmap() takes an unsigned long as well. So I have used ULONG_MAX as the upper bound. From a practical perspective unsigned long is fine/better than trying to change all the types to u64.
Fixes: eaf4722d4645 ("GenWQE Character device and DDCB queue") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/misc/genwqe/card_dev.c | 2 ++ drivers/misc/genwqe/card_utils.c | 4 ++++ 2 files changed, 6 insertions(+)
--- a/drivers/misc/genwqe/card_dev.c +++ b/drivers/misc/genwqe/card_dev.c @@ -782,6 +782,8 @@ static int genwqe_pin_mem(struct genwqe_
if ((m->addr == 0x0) || (m->size == 0)) return -EINVAL; + if (m->size > ULONG_MAX - PAGE_SIZE - (m->addr & ~PAGE_MASK)) + return -EINVAL;
map_addr = (m->addr & PAGE_MASK); map_size = round_up(m->size + (m->addr & ~PAGE_MASK), PAGE_SIZE); --- a/drivers/misc/genwqe/card_utils.c +++ b/drivers/misc/genwqe/card_utils.c @@ -582,6 +582,10 @@ int genwqe_user_vmap(struct genwqe_dev * /* determine space needed for page_list. */ data = (unsigned long)uaddr; offs = offset_in_page(data); + if (size > ULONG_MAX - PAGE_SIZE - offs) { + m->size = 0; /* mark unused and not added */ + return -EINVAL; + } m->nr_pages = DIV_ROUND_UP(offs + size, PAGE_SIZE);
m->page_list = kcalloc(m->nr_pages,
From: Dan Carpenter dan.carpenter@oracle.com
commit bd17cc5a20ae9aaa3ed775f360b75ff93cd66a1d upstream.
The limit here is supposed to be how much of the page is left, but it's just using PAGE_SIZE as the limit.
The other thing to remember is that snprintf() returns the number of bytes which would have been copied if we had had enough room. So that means that if we run out of space then this code would end up passing a negative value as the limit and the kernel would print an error message. I have change the code to use scnprintf() which returns the number of bytes that were successfully printed (not counting the NUL terminator).
Fixes: c92316bf8e94 ("test_firmware: add batched firmware tests") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- lib/test_firmware.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
--- a/lib/test_firmware.c +++ b/lib/test_firmware.c @@ -222,30 +222,30 @@ static ssize_t config_show(struct device
mutex_lock(&test_fw_mutex);
- len += snprintf(buf, PAGE_SIZE, + len += scnprintf(buf, PAGE_SIZE - len, "Custom trigger configuration for: %s\n", dev_name(dev));
if (test_fw_config->name) - len += snprintf(buf+len, PAGE_SIZE, + len += scnprintf(buf+len, PAGE_SIZE - len, "name:\t%s\n", test_fw_config->name); else - len += snprintf(buf+len, PAGE_SIZE, + len += scnprintf(buf+len, PAGE_SIZE - len, "name:\tEMTPY\n");
- len += snprintf(buf+len, PAGE_SIZE, + len += scnprintf(buf+len, PAGE_SIZE - len, "num_requests:\t%u\n", test_fw_config->num_requests);
- len += snprintf(buf+len, PAGE_SIZE, + len += scnprintf(buf+len, PAGE_SIZE - len, "send_uevent:\t\t%s\n", test_fw_config->send_uevent ? "FW_ACTION_HOTPLUG" : "FW_ACTION_NOHOTPLUG"); - len += snprintf(buf+len, PAGE_SIZE, + len += scnprintf(buf+len, PAGE_SIZE - len, "sync_direct:\t\t%s\n", test_fw_config->sync_direct ? "true" : "false"); - len += snprintf(buf+len, PAGE_SIZE, + len += scnprintf(buf+len, PAGE_SIZE - len, "read_fw_idx:\t%u\n", test_fw_config->read_fw_idx);
mutex_unlock(&test_fw_mutex);
From: Patrik Jakobsson patrik.r.jakobsson@gmail.com
commit 7c420636860a719049fae9403e2c87804f53bdde upstream.
Some machines have an lvds child device in vbt even though a panel is not attached. To make detection more reliable we now also check the lvds config bits available in the vbt.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1665766 Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Patrik Jakobsson patrik.r.jakobsson@gmail.com Link: https://patchwork.freedesktop.org/patch/msgid/20190416114607.1072-1-patrik.r... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/gma500/cdv_intel_lvds.c | 3 +++ drivers/gpu/drm/gma500/intel_bios.c | 3 +++ drivers/gpu/drm/gma500/psb_drv.h | 1 + 3 files changed, 7 insertions(+)
--- a/drivers/gpu/drm/gma500/cdv_intel_lvds.c +++ b/drivers/gpu/drm/gma500/cdv_intel_lvds.c @@ -594,6 +594,9 @@ void cdv_intel_lvds_init(struct drm_devi int pipe; u8 pin;
+ if (!dev_priv->lvds_enabled_in_vbt) + return; + pin = GMBUS_PORT_PANEL; if (!lvds_is_present_in_vbt(dev, &pin)) { DRM_DEBUG_KMS("LVDS is not present in VBT\n"); --- a/drivers/gpu/drm/gma500/intel_bios.c +++ b/drivers/gpu/drm/gma500/intel_bios.c @@ -436,6 +436,9 @@ parse_driver_features(struct drm_psb_pri if (driver->lvds_config == BDB_DRIVER_FEATURE_EDP) dev_priv->edp.support = 1;
+ dev_priv->lvds_enabled_in_vbt = driver->lvds_config != 0; + DRM_DEBUG_KMS("LVDS VBT config bits: 0x%x\n", driver->lvds_config); + /* This bit means to use 96Mhz for DPLL_A or not */ if (driver->primary_lfp_id) dev_priv->dplla_96mhz = true; --- a/drivers/gpu/drm/gma500/psb_drv.h +++ b/drivers/gpu/drm/gma500/psb_drv.h @@ -538,6 +538,7 @@ struct drm_psb_private { int lvds_ssc_freq; bool is_lvds_on; bool is_mipi_on; + bool lvds_enabled_in_vbt; u32 mipi_ctrl_display;
unsigned int core_freq;
From: Dave Airlie airlied@redhat.com
commit b30a43ac7132cdda833ac4b13dd1ebd35ace14b7 upstream.
There was a nouveau DDX that relied on legacy context ioctls to work, but we fixed it years ago, give distros that have a modern DDX the option to break the uAPI and close the mess of holes that legacy context support is.
Full context of the story:
commit 0e975980d435d58df2d430d688b8c18778b42218 Author: Peter Antoine peter.antoine@intel.com Date: Tue Jun 23 08:18:49 2015 +0100
drm: Turn off Legacy Context Functions
The context functions are not used by the i915 driver and should not be used by modeset drivers. These driver functions contain several bugs and security holes. This change makes these functions optional can be turned on by a setting, they are turned off by default for modeset driver with the exception of the nouvea driver that may require them with an old version of libdrm.
The previous attempt was
commit 7c510133d93dd6f15ca040733ba7b2891ed61fd1 Author: Daniel Vetter daniel.vetter@ffwll.ch Date: Thu Aug 8 15:41:21 2013 +0200
drm: mark context support as a legacy subsystem
but this had to be reverted
commit c21eb21cb50d58e7cbdcb8b9e7ff68b85cfa5095 Author: Dave Airlie airlied@redhat.com Date: Fri Sep 20 08:32:59 2013 +1000
Revert "drm: mark context support as a legacy subsystem"
v2: remove returns from void function, and formatting (Daniel Vetter)
v3: - s/Nova/nouveau/ in the commit message, and add references to the previous attempts - drop the part touching the drm hw lock, that should be a separate patch.
Signed-off-by: Peter Antoine peter.antoine@intel.com (v2) Cc: Peter Antoine peter.antoine@intel.com (v2) Reviewed-by: Peter Antoine peter.antoine@intel.com Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
v2: move DRM_VM dependency into legacy config. v3: fix missing dep (kbuild robot)
Cc: stable@vger.kernel.org Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Signed-off-by: Dave Airlie airlied@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/nouveau/Kconfig | 13 ++++++++++++- drivers/gpu/drm/nouveau/nouveau_drm.c | 7 +++++-- 2 files changed, 17 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/nouveau/Kconfig +++ b/drivers/gpu/drm/nouveau/Kconfig @@ -16,10 +16,21 @@ config DRM_NOUVEAU select INPUT if ACPI && X86 select THERMAL if ACPI && X86 select ACPI_VIDEO if ACPI && X86 - select DRM_VM help Choose this option for open-source NVIDIA support.
+config NOUVEAU_LEGACY_CTX_SUPPORT + bool "Nouveau legacy context support" + depends on DRM_NOUVEAU + select DRM_VM + default y + help + There was a version of the nouveau DDX that relied on legacy + ctx ioctls not erroring out. But that was back in time a long + ways, so offer a way to disable it now. For uapi compat with + old nouveau ddx this should be on by default, but modern distros + should consider turning it off. + config NOUVEAU_PLATFORM_DRIVER bool "Nouveau (NVIDIA) SoC GPUs" depends on DRM_NOUVEAU && ARCH_TEGRA --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -967,8 +967,11 @@ nouveau_driver_fops = { static struct drm_driver driver_stub = { .driver_features = - DRIVER_GEM | DRIVER_MODESET | DRIVER_PRIME | DRIVER_RENDER | - DRIVER_KMS_LEGACY_CONTEXT, + DRIVER_GEM | DRIVER_MODESET | DRIVER_PRIME | DRIVER_RENDER +#if defined(CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT) + | DRIVER_KMS_LEGACY_CONTEXT +#endif + ,
.load = nouveau_drm_load, .unload = nouveau_drm_unload,
From: Alex Deucher alexander.deucher@amd.com
commit 9d6fea5744d6798353f37ac42a8a653a2607ca69 upstream.
In case we need to use them for GPU reset prior initializing the asic. Fixes a crash if the driver attempts to reset the GPU at driver load time.
Acked-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -37,18 +37,10 @@ static void psp_set_funcs(struct amdgpu_ static int psp_early_init(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct psp_context *psp = &adev->psp;
psp_set_funcs(adev);
- return 0; -} - -static int psp_sw_init(void *handle) -{ - struct amdgpu_device *adev = (struct amdgpu_device *)handle; - struct psp_context *psp = &adev->psp; - int ret; - switch (adev->asic_type) { case CHIP_VEGA10: psp->init_microcode = psp_v3_1_init_microcode; @@ -79,6 +71,15 @@ static int psp_sw_init(void *handle)
psp->adev = adev;
+ return 0; +} + +static int psp_sw_init(void *handle) +{ + struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct psp_context *psp = &adev->psp; + int ret; + ret = psp_init_microcode(psp); if (ret) { DRM_ERROR("Failed to load psp firmware!\n");
From: Christian König christian.koenig@amd.com
commit 2e26ccb119bde03584be53406bbd22e711b0d6e6 upstream.
Instead of the closest reference divider prefer the lowest, this fixes flickering issues on HP Compaq nx9420.
Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=108514 Suggested-by: Paul Dufresne dufresnep@gmail.com Signed-off-by: Christian König christian.koenig@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/radeon/radeon_display.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -923,12 +923,12 @@ static void avivo_get_fb_ref_div(unsigne ref_div_max = max(min(100 / post_div, ref_div_max), 1u);
/* get matching reference and feedback divider */ - *ref_div = min(max(DIV_ROUND_CLOSEST(den, post_div), 1u), ref_div_max); + *ref_div = min(max(den/post_div, 1u), ref_div_max); *fb_div = DIV_ROUND_CLOSEST(nom * *ref_div * post_div, den);
/* limit fb divider to its maximum */ if (*fb_div > fb_div_max) { - *ref_div = DIV_ROUND_CLOSEST(*ref_div * fb_div_max, *fb_div); + *ref_div = (*ref_div * fb_div_max)/(*fb_div); *fb_div = fb_div_max; } }
From: Chris Wilson chris@chris-wilson.co.uk
commit d90c06d57027203f73021bb7ddb30b800d65c636 upstream.
This was supposed to be a mask of all known rings, but it is being used by execbuffer to filter out invalid rings, and so is instead mapping high unused values onto valid rings. Instead of a mask of all known rings, we need it to be the mask of all possible rings.
Fixes: 549f7365820a ("drm/i915: Enable SandyBridge blitter ring") Fixes: de1add360522 ("drm/i915: Decouple execbuf uAPI from internal implementation") Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com Cc: stable@vger.kernel.org # v4.6+ Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20190301140404.26690-21-chris@... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/uapi/drm/i915_drm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -853,7 +853,7 @@ struct drm_i915_gem_execbuffer2 { * struct drm_i915_gem_exec_fence *fences. */ __u64 cliprects_ptr; -#define I915_EXEC_RING_MASK (7<<0) +#define I915_EXEC_RING_MASK (0x3f) #define I915_EXEC_DEFAULT (0<<0) #define I915_EXEC_RENDER (1<<0) #define I915_EXEC_BSD (2<<0)
From: Daniel Drake drake@endlessm.com
commit 396dd8143bdd94bd1c358a228a631c8c895a1126 upstream.
On many (all?) the Gemini Lake systems we work with, there is frequent momentary graphical corruption at the top of the screen, and it seems that disabling framebuffer compression can avoid this.
The ticket was reported 6 months ago and has already affected a multitude of users, without any real progress being made. So, lets disable framebuffer compression on GeminiLake until a solution is found.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108085 Fixes: fd7d6c5c8f3e ("drm/i915: enable FBC on gen9+ too") Cc: Paulo Zanoni paulo.r.zanoni@intel.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Jani Nikula jani.nikula@linux.intel.com Cc: stable@vger.kernel.org # v4.11+ Reviewed-by: Paulo Zanoni paulo.r.zanoni@intel.com Signed-off-by: Daniel Drake drake@endlessm.com Signed-off-by: Jian-Hong Pan jian-hong@endlessm.com Signed-off-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20190423092810.28359-1-jian-ho... (cherry picked from commit 1d25724b41fad7eeb2c3058a5c8190d6ece73e08) Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/intel_fbc.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/gpu/drm/i915/intel_fbc.c +++ b/drivers/gpu/drm/i915/intel_fbc.c @@ -1299,6 +1299,10 @@ static int intel_sanitize_fbc_option(str if (!HAS_FBC(dev_priv)) return 0;
+ /* https://bugs.freedesktop.org/show_bug.cgi?id=108085 */ + if (IS_GEMINILAKE(dev_priv)) + return 0; + if (IS_BROADWELL(dev_priv) || INTEL_GEN(dev_priv) >= 9) return 1;
From: Jiri Slaby jslaby@suse.cz
commit 4cdd17ba1dff20ffc99fdbd2e6f0201fc7fe67df upstream.
We need to compute the uart state only on the first open. This is usually what is done in the ->install hook. serial_core used to do this in ->open on every open. So move it to ->install.
As a side effect, it ensures the state is set properly in the window after tty_init_dev is called, but before uart_open. This fixes a bunch of races between tty_open and flush_to_ldisc we were dealing with recently.
One of such bugs was attempted to fix in commit fedb5760648a (serial: fix race between flush_to_ldisc and tty_open), but it only took care of a couple of functions (uart_start and uart_unthrottle). I was able to reproduce the crash on a SLE system, but in uart_write_room which is also called from flush_to_ldisc via process_echoes. I was *unable* to reproduce the bug locally. It is due to having this patch in my queue since 2012!
general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 5 Comm: kworker/u4:0 Tainted: G L 4.12.14-396-default #1 SLE15-SP1 (unreleased) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound flush_to_ldisc task: ffff8800427d8040 task.stack: ffff8800427f0000 RIP: 0010:uart_write_room+0xc4/0x590 RSP: 0018:ffff8800427f7088 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 000000000000002f RSI: 00000000000000ee RDI: ffff88003888bd90 RBP: ffffffffb9545850 R08: 0000000000000001 R09: 0000000000000400 R10: ffff8800427d825c R11: 000000000000006e R12: 1ffff100084fee12 R13: ffffc900004c5000 R14: ffff88003888bb28 R15: 0000000000000178 FS: 0000000000000000(0000) GS:ffff880043300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000561da0794148 CR3: 000000000ebf4000 CR4: 00000000000006e0 Call Trace: tty_write_room+0x6d/0xc0 __process_echoes+0x55/0x870 n_tty_receive_buf_common+0x105e/0x26d0 tty_ldisc_receive_buf+0xb7/0x1c0 tty_port_default_receive_buf+0x107/0x180 flush_to_ldisc+0x35d/0x5c0 ...
0 in rbx means tty->driver_data is NULL in uart_write_room. 0x178 is tried to be dereferenced (0x178 >> 3 is 0x2f in rdx) at uart_write_room+0xc4. 0x178 is exactly (struct uart_state *)NULL->refcount used in uart_port_lock from uart_write_room.
So revert the upstream commit here as my local patch should fix the whole family.
Signed-off-by: Jiri Slaby jslaby@suse.cz Cc: Li RongQing lirongqing@baidu.com Cc: Wang Li wangli39@baidu.com Cc: Zhang Yu zhangyu31@baidu.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/serial/serial_core.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)
--- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -143,9 +143,6 @@ static void uart_start(struct tty_struct struct uart_port *port; unsigned long flags;
- if (!state) - return; - port = uart_port_lock(state, flags); __uart_start(tty); uart_port_unlock(port, flags); @@ -1731,11 +1728,8 @@ static void uart_dtr_rts(struct tty_port */ static int uart_open(struct tty_struct *tty, struct file *filp) { - struct uart_driver *drv = tty->driver->driver_state; - int retval, line = tty->index; - struct uart_state *state = drv->state + line; - - tty->driver_data = state; + struct uart_state *state = tty->driver_data; + int retval;
retval = tty_port_open(&state->port, tty, filp); if (retval > 0) @@ -2418,9 +2412,6 @@ static void uart_poll_put_char(struct tt struct uart_state *state = drv->state + line; struct uart_port *port;
- if (!state) - return; - port = uart_port_ref(state); if (!port) return; @@ -2432,7 +2423,18 @@ static void uart_poll_put_char(struct tt } #endif
+static int uart_install(struct tty_driver *driver, struct tty_struct *tty) +{ + struct uart_driver *drv = driver->driver_state; + struct uart_state *state = drv->state + tty->index; + + tty->driver_data = state; + + return tty_standard_install(driver, tty); +} + static const struct tty_operations uart_ops = { + .install = uart_install, .open = uart_open, .close = uart_close, .write = uart_write,
From: Kristian Evensen kristian.evensen@gmail.com
commit e4bf63482c309287ca84d91770ffa7dcc18e37eb upstream.
Most, if not all, Quectel devices use dynamic interface numbers, and users are able to change the USB configuration at will. Matching on for example interface number is therefore not possible.
Instead, the QMI device can be identified by looking at the interface class, subclass and protocol (all 0xff), as well as the number of endpoints. The reason we need to look at the number of endpoints, is that the diagnostic port interface has the same class, subclass and protocol as QMI. However, the diagnostic port only has two endpoints, while QMI has three.
Until now, we have identified the QMI device by combining a match on class, subclass and protocol, with a call to the function quectel_diag_detect(). In quectel_diag_detect(), we check if the number of endpoints matches for known Quectel vendor/product ids.
Adding new vendor/product ids to quectel_diag_detect() is not a good long-term solution. This commit replaces the function with a quirk, and applies the quirk to affected Quectel devices that I have been able to test the change with (EP06, EM12 and EC25). If the quirk is set and the number of endpoints equal two, we return from qmi_wwan_probe() with -ENODEV.
[In order for this patch to apply cleanly to 4.14, two minor changes had to be made. First, the original work-around (quectel_diag_detect()) for the dynamic interface numbers was never backported to 4.14, so there is no need to remove this code. Second, support for the EM12 was also not backported to 4.14. Since supporting EM12 is a trivial change (just another VID/PID match), and the match for EM12 is changed by this patch, I chose to not submit adding EM12-support as a separate patch.]
Signed-off-by: Kristian Evensen kristian.evensen@gmail.com Acked-by: Bjørn Mork bjorn@mork.no Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/usb/qmi_wwan.c | 39 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 2 deletions(-)
--- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -63,6 +63,7 @@ enum qmi_wwan_flags {
enum qmi_wwan_quirks { QMI_WWAN_QUIRK_DTR = 1 << 0, /* needs "set DTR" request */ + QMI_WWAN_QUIRK_QUECTEL_DYNCFG = 1 << 1, /* check num. endpoints */ };
struct qmimux_hdr { @@ -845,6 +846,16 @@ static const struct driver_info qmi_wwan .data = QMI_WWAN_QUIRK_DTR, };
+static const struct driver_info qmi_wwan_info_quirk_quectel_dyncfg = { + .description = "WWAN/QMI device", + .flags = FLAG_WWAN | FLAG_SEND_ZLP, + .bind = qmi_wwan_bind, + .unbind = qmi_wwan_unbind, + .manage_power = qmi_wwan_manage_power, + .rx_fixup = qmi_wwan_rx_fixup, + .data = QMI_WWAN_QUIRK_DTR | QMI_WWAN_QUIRK_QUECTEL_DYNCFG, +}; + #define HUAWEI_VENDOR_ID 0x12D1
/* map QMI/wwan function by a fixed interface number */ @@ -865,6 +876,15 @@ static const struct driver_info qmi_wwan #define QMI_GOBI_DEVICE(vend, prod) \ QMI_FIXED_INTF(vend, prod, 0)
+/* Quectel does not use fixed interface numbers on at least some of their + * devices. We need to check the number of endpoints to ensure that we bind to + * the correct interface. + */ +#define QMI_QUIRK_QUECTEL_DYNCFG(vend, prod) \ + USB_DEVICE_AND_INTERFACE_INFO(vend, prod, USB_CLASS_VENDOR_SPEC, \ + USB_SUBCLASS_VENDOR_SPEC, 0xff), \ + .driver_info = (unsigned long)&qmi_wwan_info_quirk_quectel_dyncfg + static const struct usb_device_id products[] = { /* 1. CDC ECM like devices match on the control interface */ { /* Huawei E392, E398 and possibly others sharing both device id and more... */ @@ -969,6 +989,9 @@ static const struct usb_device_id produc USB_DEVICE_AND_INTERFACE_INFO(0x03f0, 0x581d, USB_CLASS_VENDOR_SPEC, 1, 7), .driver_info = (unsigned long)&qmi_wwan_info, }, + {QMI_QUIRK_QUECTEL_DYNCFG(0x2c7c, 0x0125)}, /* Quectel EC25, EC20 R2.0 Mini PCIe */ + {QMI_QUIRK_QUECTEL_DYNCFG(0x2c7c, 0x0306)}, /* Quectel EP06/EG06/EM06 */ + {QMI_QUIRK_QUECTEL_DYNCFG(0x2c7c, 0x0512)}, /* Quectel EG12/EM12 */
/* 3. Combined interface devices matching on interface number */ {QMI_FIXED_INTF(0x0408, 0xea42, 4)}, /* Yota / Megafon M100-1 */ @@ -1258,11 +1281,9 @@ static const struct usb_device_id produc {QMI_FIXED_INTF(0x03f0, 0x9d1d, 1)}, /* HP lt4120 Snapdragon X5 LTE */ {QMI_FIXED_INTF(0x22de, 0x9061, 3)}, /* WeTelecom WPD-600N */ {QMI_QUIRK_SET_DTR(0x1e0e, 0x9001, 5)}, /* SIMCom 7100E, 7230E, 7600E ++ */ - {QMI_QUIRK_SET_DTR(0x2c7c, 0x0125, 4)}, /* Quectel EC25, EC20 R2.0 Mini PCIe */ {QMI_QUIRK_SET_DTR(0x2c7c, 0x0121, 4)}, /* Quectel EC21 Mini PCIe */ {QMI_QUIRK_SET_DTR(0x2c7c, 0x0191, 4)}, /* Quectel EG91 */ {QMI_FIXED_INTF(0x2c7c, 0x0296, 4)}, /* Quectel BG96 */ - {QMI_QUIRK_SET_DTR(0x2c7c, 0x0306, 4)}, /* Quectel EP06 Mini PCIe */ {QMI_QUIRK_SET_DTR(0x2cb7, 0x0104, 4)}, /* Fibocom NL678 series */
/* 4. Gobi 1000 devices */ @@ -1344,6 +1365,7 @@ static int qmi_wwan_probe(struct usb_int { struct usb_device_id *id = (struct usb_device_id *)prod; struct usb_interface_descriptor *desc = &intf->cur_altsetting->desc; + const struct driver_info *info;
/* Workaround to enable dynamic IDs. This disables usbnet * blacklisting functionality. Which, if required, can be @@ -1373,6 +1395,19 @@ static int qmi_wwan_probe(struct usb_int return -ENODEV; }
+ info = (void *)&id->driver_info; + + /* Several Quectel modems supports dynamic interface configuration, so + * we need to match on class/subclass/protocol. These values are + * identical for the diagnostic- and QMI-interface, but bNumEndpoints is + * different. Ignore the current interface if the number of endpoints + * equals the number for the diag interface (two). + */ + if (info->data & QMI_WWAN_QUIRK_QUECTEL_DYNCFG) { + if (desc->bNumEndpoints == 2) + return -ENODEV; + } + return usbnet_probe(intf, id); }
From: Kirill Smelkov kirr@nexedi.com
commit 10dce8af34226d90fa56746a934f8da5dcdba3df upstream.
Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added locking for file.f_pos access and in particular made concurrent read and write not possible - now both those functions take f_pos lock for the whole run, and so if e.g. a read is blocked waiting for data, write will deadlock waiting for that read to complete.
This caused regression for stream-like files where previously read and write could run simultaneously, but after that patch could not do so anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") which fixes such regression for particular case of /proc/xen/xenbus.
The patch that added f_pos lock in 2014 did so to guarantee POSIX thread safety for read/write/lseek and added the locking to file descriptors of all regular files. In 2014 that thread-safety problem was not new as it was already discussed earlier in 2006.
However even though 2006'th version of Linus's patch was adding f_pos locking "only for files that are marked seekable with FMODE_LSEEK (thus avoiding the stream-like objects like pipes and sockets)", the 2014 version - the one that actually made it into the tree as 9c225f2655e3 - is doing so irregardless of whether a file is seekable or not.
See
https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/ https://lwn.net/Articles/180387 https://lwn.net/Articles/180396
for historic context.
The reason that it did so is, probably, that there are many files that are marked non-seekable, but e.g. their read implementation actually depends on knowing current position to correctly handle the read. Some examples:
kernel/power/user.c snapshot_read fs/debugfs/file.c u32_array_read fs/fuse/control.c fuse_conn_waiting_read + ... drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read arch/s390/hypfs/inode.c hypfs_read_iter ...
Despite that, many nonseekable_open users implement read and write with pure stream semantics - they don't depend on passed ppos at all. And for those cases where read could wait for something inside, it creates a situation similar to xenbus - the write could be never made to go until read is done, and read is waiting for some, potentially external, event, for potentially unbounded time -> deadlock.
Besides xenbus, there are 14 such places in the kernel that I've found with semantic patch (see below):
drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write() drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write() drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write() drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write() net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write() drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write() drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write() drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write() net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write() drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write() drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write() drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write() drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write() drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()
In addition to the cases above another regression caused by f_pos locking is that now FUSE filesystems that implement open with FOPEN_NONSEEKABLE flag, can no longer implement bidirectional stream-like files - for the same reason as above e.g. read can deadlock write locking on file.f_pos in the kernel.
FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse: implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and write routines not depending on current position at all, and with both read and write being potentially blocking operations:
See
https://github.com/libfuse/osspd https://lwn.net/Articles/308445
https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477 https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510
Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as "somewhat pipe-like files ..." with read handler not using offset. However that test implements only read without write and cannot exercise the deadlock scenario:
https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c... https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c... https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c...
I've actually hit the read vs write deadlock for real while implementing my FUSE filesystem where there is /head/watch file, for which open creates separate bidirectional socket-like stream in between filesystem and its user with both read and write being later performed simultaneously. And there it is semantically not easy to split the stream into two separate read-only and write-only channels:
https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169
Let's fix this regression. The plan is:
1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS - doing so would break many in-kernel nonseekable_open users which actually use ppos in read/write handlers.
2. Add stream_open() to kernel to open stream-like non-seekable file descriptors. Read and write on such file descriptors would never use nor change ppos. And with that property on stream-like files read and write will be running without taking f_pos lock - i.e. read and write could be running simultaneously.
3. With semantic patch search and convert to stream_open all in-kernel nonseekable_open users for which read and write actually do not depend on ppos and where there is no other methods in file_operations which assume @offset access.
4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via steam_open if that bit is present in filesystem open reply.
It was tempting to change fs/fuse/ open handler to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers
https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfused... https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfused... https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfused...
so if we would do such a change it will break a real user.
5. Add stream_open and FOPEN_STREAM handling to stable kernels starting from v3.14+ (the kernel where 9c225f2655 first appeared).
This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in their open handler and this way avoid the deadlock on all kernel versions. This should work because fs/fuse/ ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock.
This patch adds stream_open, converts /proc/xen/xenbus to it and adds semantic patch to automatically locate in-kernel places that are either required to be converted due to read vs write deadlock, or that are just safe to be converted because read and write do not use ppos and there are no other funky methods in file_operations.
Regarding semantic patch I've verified each generated change manually - that it is correct to convert - and each other nonseekable_open instance left - that it is either not correct to convert there, or that it is not converted due to current stream_open.cocci limitations.
The script also does not convert files that should be valid to convert, but that currently have .llseek = noop_llseek or generic_file_llseek for unknown reason despite file being opened with nonseekable_open (e.g. drivers/input/mousedev.c)
Cc: Michael Kerrisk mtk.manpages@gmail.com Cc: Yongzhi Pan panyongzhi@gmail.com Cc: Jonathan Corbet corbet@lwn.net Cc: David Vrabel david.vrabel@citrix.com Cc: Juergen Gross jgross@suse.com Cc: Miklos Szeredi miklos@szeredi.hu Cc: Tejun Heo tj@kernel.org Cc: Kirill Tkhai ktkhai@virtuozzo.com Cc: Arnd Bergmann arnd@arndb.de Cc: Christoph Hellwig hch@lst.de Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Julia Lawall Julia.Lawall@lip6.fr Cc: Nikolaus Rath Nikolaus@rath.org Cc: Han-Wen Nienhuys hanwen@google.com Signed-off-by: Kirill Smelkov kirr@nexedi.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/xen/xenbus/xenbus_dev_frontend.c | 4 fs/open.c | 18 + fs/read_write.c | 5 include/linux/fs.h | 4 scripts/coccinelle/api/stream_open.cocci | 363 +++++++++++++++++++++++++++++++ 5 files changed, 389 insertions(+), 5 deletions(-)
--- a/drivers/xen/xenbus/xenbus_dev_frontend.c +++ b/drivers/xen/xenbus/xenbus_dev_frontend.c @@ -614,9 +614,7 @@ static int xenbus_file_open(struct inode if (xen_store_evtchn == 0) return -ENOENT;
- nonseekable_open(inode, filp); - - filp->f_mode &= ~FMODE_ATOMIC_POS; /* cdev-style semantics */ + stream_open(inode, filp);
u = kzalloc(sizeof(*u), GFP_KERNEL); if (u == NULL) --- a/fs/open.c +++ b/fs/open.c @@ -1212,3 +1212,21 @@ int nonseekable_open(struct inode *inode }
EXPORT_SYMBOL(nonseekable_open); + +/* + * stream_open is used by subsystems that want stream-like file descriptors. + * Such file descriptors are not seekable and don't have notion of position + * (file.f_pos is always 0). Contrary to file descriptors of other regular + * files, .read() and .write() can run simultaneously. + * + * stream_open never fails and is marked to return int so that it could be + * directly used as file_operations.open . + */ +int stream_open(struct inode *inode, struct file *filp) +{ + filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE | FMODE_ATOMIC_POS); + filp->f_mode |= FMODE_STREAM; + return 0; +} + +EXPORT_SYMBOL(stream_open); --- a/fs/read_write.c +++ b/fs/read_write.c @@ -555,12 +555,13 @@ ssize_t vfs_write(struct file *file, con
static inline loff_t file_pos_read(struct file *file) { - return file->f_pos; + return file->f_mode & FMODE_STREAM ? 0 : file->f_pos; }
static inline void file_pos_write(struct file *file, loff_t pos) { - file->f_pos = pos; + if ((file->f_mode & FMODE_STREAM) == 0) + file->f_pos = pos; }
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count) --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -148,6 +148,9 @@ typedef int (dio_iodone_t)(struct kiocb /* Has write method(s) */ #define FMODE_CAN_WRITE ((__force fmode_t)0x40000)
+/* File is stream-like */ +#define FMODE_STREAM ((__force fmode_t)0x200000) + /* File was opened by fanotify and shouldn't generate fanotify events */ #define FMODE_NONOTIFY ((__force fmode_t)0x4000000)
@@ -2945,6 +2948,7 @@ extern loff_t no_seek_end_llseek_size(st extern loff_t no_seek_end_llseek(struct file *, loff_t, int); extern int generic_file_open(struct inode * inode, struct file * filp); extern int nonseekable_open(struct inode * inode, struct file * filp); +extern int stream_open(struct inode * inode, struct file * filp);
#ifdef CONFIG_BLOCK typedef void (dio_submit_t)(struct bio *bio, struct inode *inode, --- /dev/null +++ b/scripts/coccinelle/api/stream_open.cocci @@ -0,0 +1,363 @@ +// SPDX-License-Identifier: GPL-2.0 +// Author: Kirill Smelkov (kirr@nexedi.com) +// +// Search for stream-like files that are using nonseekable_open and convert +// them to stream_open. A stream-like file is a file that does not use ppos in +// its read and write. Rationale for the conversion is to avoid deadlock in +// between read and write. + +virtual report +virtual patch +virtual explain // explain decisions in the patch (SPFLAGS="-D explain") + +// stream-like reader & writer - ones that do not depend on f_pos. +@ stream_reader @ +identifier readstream, ppos; +identifier f, buf, len; +type loff_t; +@@ + ssize_t readstream(struct file *f, char *buf, size_t len, loff_t *ppos) + { + ... when != ppos + } + +@ stream_writer @ +identifier writestream, ppos; +identifier f, buf, len; +type loff_t; +@@ + ssize_t writestream(struct file *f, const char *buf, size_t len, loff_t *ppos) + { + ... when != ppos + } + + +// a function that blocks +@ blocks @ +identifier block_f; +identifier wait_event =~ "^wait_event_.*"; +@@ + block_f(...) { + ... when exists + wait_event(...) + ... when exists + } + +// stream_reader that can block inside. +// +// XXX wait_* can be called not directly from current function (e.g. func -> f -> g -> wait()) +// XXX currently reader_blocks supports only direct and 1-level indirect cases. +@ reader_blocks_direct @ +identifier stream_reader.readstream; +identifier wait_event =~ "^wait_event_.*"; +@@ + readstream(...) + { + ... when exists + wait_event(...) + ... when exists + } + +@ reader_blocks_1 @ +identifier stream_reader.readstream; +identifier blocks.block_f; +@@ + readstream(...) + { + ... when exists + block_f(...) + ... when exists + } + +@ reader_blocks depends on reader_blocks_direct || reader_blocks_1 @ +identifier stream_reader.readstream; +@@ + readstream(...) { + ... + } + + +// file_operations + whether they have _any_ .read, .write, .llseek ... at all. +// +// XXX add support for file_operations xxx[N] = ... (sound/core/pcm_native.c) +@ fops0 @ +identifier fops; +@@ + struct file_operations fops = { + ... + }; + +@ has_read @ +identifier fops0.fops; +identifier read_f; +@@ + struct file_operations fops = { + .read = read_f, + }; + +@ has_read_iter @ +identifier fops0.fops; +identifier read_iter_f; +@@ + struct file_operations fops = { + .read_iter = read_iter_f, + }; + +@ has_write @ +identifier fops0.fops; +identifier write_f; +@@ + struct file_operations fops = { + .write = write_f, + }; + +@ has_write_iter @ +identifier fops0.fops; +identifier write_iter_f; +@@ + struct file_operations fops = { + .write_iter = write_iter_f, + }; + +@ has_llseek @ +identifier fops0.fops; +identifier llseek_f; +@@ + struct file_operations fops = { + .llseek = llseek_f, + }; + +@ has_no_llseek @ +identifier fops0.fops; +@@ + struct file_operations fops = { + .llseek = no_llseek, + }; + +@ has_mmap @ +identifier fops0.fops; +identifier mmap_f; +@@ + struct file_operations fops = { + .mmap = mmap_f, + }; + +@ has_copy_file_range @ +identifier fops0.fops; +identifier copy_file_range_f; +@@ + struct file_operations fops = { + .copy_file_range = copy_file_range_f, + }; + +@ has_remap_file_range @ +identifier fops0.fops; +identifier remap_file_range_f; +@@ + struct file_operations fops = { + .remap_file_range = remap_file_range_f, + }; + +@ has_splice_read @ +identifier fops0.fops; +identifier splice_read_f; +@@ + struct file_operations fops = { + .splice_read = splice_read_f, + }; + +@ has_splice_write @ +identifier fops0.fops; +identifier splice_write_f; +@@ + struct file_operations fops = { + .splice_write = splice_write_f, + }; + + +// file_operations that is candidate for stream_open conversion - it does not +// use mmap and other methods that assume @offset access to file. +// +// XXX for simplicity require no .{read/write}_iter and no .splice_{read/write} for now. +// XXX maybe_steam.fops cannot be used in other rules - it gives "bad rule maybe_stream or bad variable fops". +@ maybe_stream depends on (!has_llseek || has_no_llseek) && !has_mmap && !has_copy_file_range && !has_remap_file_range && !has_read_iter && !has_write_iter && !has_splice_read && !has_splice_write @ +identifier fops0.fops; +@@ + struct file_operations fops = { + }; + + +// ---- conversions ---- + +// XXX .open = nonseekable_open -> .open = stream_open +// XXX .open = func -> openfunc -> nonseekable_open + +// read & write +// +// if both are used in the same file_operations together with an opener - +// under that conditions we can use stream_open instead of nonseekable_open. +@ fops_rw depends on maybe_stream @ +identifier fops0.fops, openfunc; +identifier stream_reader.readstream; +identifier stream_writer.writestream; +@@ + struct file_operations fops = { + .open = openfunc, + .read = readstream, + .write = writestream, + }; + +@ report_rw depends on report @ +identifier fops_rw.openfunc; +position p1; +@@ + openfunc(...) { + <... + nonseekable_open@p1 + ...> + } + +@ script:python depends on report && reader_blocks @ +fops << fops0.fops; +p << report_rw.p1; +@@ +coccilib.report.print_report(p[0], + "ERROR: %s: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix." % (fops,)) + +@ script:python depends on report && !reader_blocks @ +fops << fops0.fops; +p << report_rw.p1; +@@ +coccilib.report.print_report(p[0], + "WARNING: %s: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open." % (fops,)) + + +@ explain_rw_deadlocked depends on explain && reader_blocks @ +identifier fops_rw.openfunc; +@@ + openfunc(...) { + <... +- nonseekable_open ++ nonseekable_open /* read & write (was deadlock) */ + ...> + } + + +@ explain_rw_nodeadlock depends on explain && !reader_blocks @ +identifier fops_rw.openfunc; +@@ + openfunc(...) { + <... +- nonseekable_open ++ nonseekable_open /* read & write (no direct deadlock) */ + ...> + } + +@ patch_rw depends on patch @ +identifier fops_rw.openfunc; +@@ + openfunc(...) { + <... +- nonseekable_open ++ stream_open + ...> + } + + +// read, but not write +@ fops_r depends on maybe_stream && !has_write @ +identifier fops0.fops, openfunc; +identifier stream_reader.readstream; +@@ + struct file_operations fops = { + .open = openfunc, + .read = readstream, + }; + +@ report_r depends on report @ +identifier fops_r.openfunc; +position p1; +@@ + openfunc(...) { + <... + nonseekable_open@p1 + ...> + } + +@ script:python depends on report @ +fops << fops0.fops; +p << report_r.p1; +@@ +coccilib.report.print_report(p[0], + "WARNING: %s: .read() has stream semantic; safe to change nonseekable_open -> stream_open." % (fops,)) + +@ explain_r depends on explain @ +identifier fops_r.openfunc; +@@ + openfunc(...) { + <... +- nonseekable_open ++ nonseekable_open /* read only */ + ...> + } + +@ patch_r depends on patch @ +identifier fops_r.openfunc; +@@ + openfunc(...) { + <... +- nonseekable_open ++ stream_open + ...> + } + + +// write, but not read +@ fops_w depends on maybe_stream && !has_read @ +identifier fops0.fops, openfunc; +identifier stream_writer.writestream; +@@ + struct file_operations fops = { + .open = openfunc, + .write = writestream, + }; + +@ report_w depends on report @ +identifier fops_w.openfunc; +position p1; +@@ + openfunc(...) { + <... + nonseekable_open@p1 + ...> + } + +@ script:python depends on report @ +fops << fops0.fops; +p << report_w.p1; +@@ +coccilib.report.print_report(p[0], + "WARNING: %s: .write() has stream semantic; safe to change nonseekable_open -> stream_open." % (fops,)) + +@ explain_w depends on explain @ +identifier fops_w.openfunc; +@@ + openfunc(...) { + <... +- nonseekable_open ++ nonseekable_open /* write only */ + ...> + } + +@ patch_w depends on patch @ +identifier fops_w.openfunc; +@@ + openfunc(...) { + <... +- nonseekable_open ++ stream_open + ...> + } + + +// no read, no write - don't change anything
From: Kirill Smelkov kirr@nexedi.com
commit bbd84f33652f852ce5992d65db4d020aba21f882 upstream.
Starting from commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") files opened even via nonseekable_open gate read and write via lock and do not allow them to be run simultaneously. This can create read vs write deadlock if a filesystem is trying to implement a socket-like file which is intended to be simultaneously used for both read and write from filesystem client. See commit 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") for a similar deadlock example on /proc/xen/xenbus.
To avoid such deadlock it was tempting to adjust fuse_finish_open to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers
https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfused... https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfused... https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfused...
so if we would do such a change it will break a real user.
Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the opened handler is having stream-like semantics; does not use file position and thus the kernel is free to issue simultaneous read and write request on opened file handle.
This patch together with stream_open() should be added to stable kernels starting from v3.14+. This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all kernel versions. This should work because fuse_finish_open ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock.
Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Kirill Smelkov kirr@nexedi.com Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/fuse/file.c | 4 +++- include/uapi/linux/fuse.h | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-)
--- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -178,7 +178,9 @@ void fuse_finish_open(struct inode *inod file->f_op = &fuse_direct_io_file_operations; if (!(ff->open_flags & FOPEN_KEEP_CACHE)) invalidate_inode_pages2(inode->i_mapping); - if (ff->open_flags & FOPEN_NONSEEKABLE) + if (ff->open_flags & FOPEN_STREAM) + stream_open(inode, file); + else if (ff->open_flags & FOPEN_NONSEEKABLE) nonseekable_open(inode, file); if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) { struct fuse_inode *fi = get_fuse_inode(inode); --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -216,10 +216,12 @@ struct fuse_file_lock { * FOPEN_DIRECT_IO: bypass page cache for this open file * FOPEN_KEEP_CACHE: don't invalidate the data cache on open * FOPEN_NONSEEKABLE: the file is not seekable + * FOPEN_STREAM: the file is stream-like (no file position at all) */ #define FOPEN_DIRECT_IO (1 << 0) #define FOPEN_KEEP_CACHE (1 << 1) #define FOPEN_NONSEEKABLE (1 << 2) +#define FOPEN_STREAM (1 << 4)
/** * INIT request/reply flags
On Sun, 9 Jun 2019 at 22:20, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.14.125 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue 11 Jun 2019 04:40:01 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.125-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 4.14.125-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: 396ea3538ca4ce6f760fff7a837e10f2450c5526 git describe: v4.14.123-106-g396ea3538ca4 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.123-1...
No regressions (compared to build v4.14.123)
No fixes (compared to build v4.14.123)
Ran 23749 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * build * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-io-tests * ltp-ipc-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * perf * spectre-meltdown-checker-test * v4l2-compliance * ltp-commands-tests * ltp-hugetlb-tests * ltp-math-tests * ltp-mm-tests * network-basic-tests * ltp-open-posix-tests * kvm-unit-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
On 09/06/2019 17:42, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.125 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue 11 Jun 2019 04:40:01 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.125-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v4.14: 8 builds: 8 pass, 0 fail 16 boots: 16 pass, 0 fail 24 tests: 24 pass, 0 fail
Linux version: 4.14.125-rc1-g396ea3538ca4 Boards tested: tegra124-jetson-tk1, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Cheers Jon
On Sun, Jun 09, 2019 at 06:42:06PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.125 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue 11 Jun 2019 04:40:01 PM UTC. Anything received after that time might be too late.
Build results: total: 172 pass: 172 fail: 0 Qemu test results: total: 335 pass: 335 fail: 0
Guenter
On 6/9/19 10:42 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.125 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue 11 Jun 2019 04:40:01 PM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.125-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
linux-stable-mirror@lists.linaro.org