This is the start of the stable review cycle for the 4.14.41 release. There are 62 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed May 16 06:47:52 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.41-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.41-rc1
Anthoine Bourgeois anthoine.bourgeois@blade-group.com KVM: x86: remove APIC Timer periodic/oneshot spikes
Paul Mackerras paulus@ozlabs.org KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler
Peter Zijlstra peterz@infradead.org perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map()
Peter Zijlstra peterz@infradead.org perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[]
Peter Zijlstra peterz@infradead.org perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver
Peter Zijlstra peterz@infradead.org perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr
Peter Zijlstra peterz@infradead.org perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_*
Masami Hiramatsu mhiramat@kernel.org tracing/uprobe_event: Fix strncpy corner case
Peter Zijlstra peterz@infradead.org sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
Steve French smfrench@gmail.com smb3: directory sync should not return an error
Jens Axboe axboe@kernel.dk nvme: add quirk to force medium priority for SQ creation
Marek Szyprowski m.szyprowski@samsung.com thermal: exynos: Propagate error value from tmu_read()
Marek Szyprowski m.szyprowski@samsung.com thermal: exynos: Reading temperature makes sense only when TMU is turned on
Hans de Goede hdegoede@redhat.com Bluetooth: btusb: Only check needs_reset_resume DMI table for QCA rome chipsets
Hans de Goede hdegoede@redhat.com Bluetooth: btusb: Add Dell XPS 13 9360 to btusb_needs_reset_resume_table
Hans de Goede hdegoede@redhat.com Revert "Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174"
Rafael J. Wysocki rafael.j.wysocki@intel.com cpufreq: schedutil: Avoid using invalid next_freq
Rafael J. Wysocki rafael.j.wysocki@intel.com PCI / PM: Check device_may_wakeup() in pci_enable_wake()
Kai Heng Feng kai.heng.feng@canonical.com PCI / PM: Always check PME wakeup capability for runtime wakeup support
Gustavo A. R. Silva gustavo@embeddedor.com atm: zatm: Fix potential Spectre v1
Gustavo A. R. Silva gustavo@embeddedor.com net: atm: Fix potential Spectre v1
Ville Syrjälä ville.syrjala@linux.intel.com drm/atomic: Clean private obj old_state/new_state in drm_atomic_state_default_clear()
Ville Syrjälä ville.syrjala@linux.intel.com drm/atomic: Clean old_state/new_state in drm_atomic_state_default_clear()
Lyude Paul lyude@redhat.com drm/nouveau: Fix deadlock in nv50_mstm_register_connector()
Florent Flament contact@florentflament.com drm/i915: Fix drm:intel_enable_lvds ERROR message in kernel log
Boris Brezillon boris.brezillon@bootlin.com drm/vc4: Fix scaling of uni-planar formats
Lukas Wunner lukas@wunner.de can: hi311x: Work around TX complete interrupt erratum
Lukas Wunner lukas@wunner.de can: hi311x: Acquire SPI lock on ->do_get_berr_counter
Jimmy Assarsson extja@kvaser.com can: kvaser_usb: Increase correct stats counter in kvaser_usb_rx_can_msg()
Ilya Dryomov idryomov@gmail.com ceph: fix rsize/wsize capping in ceph_direct_read_write()
David Rientjes rientjes@google.com mm, oom: fix concurrent munlock and oom reaper unmap, v3
Pavel Tatashin pasha.tatashin@oracle.com mm: sections are not offlined during memory hotremove
Vitaly Wool vitalywool@gmail.com z3fold: fix reclaim lock-ups
Steven Rostedt (VMware) rostedt@goodmis.org tracing: Fix regex_match_front() to not over compare the test string
Mikulas Patocka mpatocka@redhat.com dm integrity: use kvfree for kvmalloc'd memory
Hans de Goede hdegoede@redhat.com libata: Apply NOLPM quirk for SanDisk SD7UB3Q*G1001 SSDs
Johan Hovold johan@kernel.org rfkill: gpio: fix memory leak in probe error path
Uwe Kleine-König u.kleine-koenig@pengutronix.de gpio: fix error path in lineevent_create
Govert Overgaauw govert.overgaauw@prodrive-technologies.com gpio: fix aspeed_gpio unmask irq
Timur Tabi timur@codeaurora.org gpioib: do not free unrequested descriptors
Jann Horn jannh@google.com compat: fix 4-byte infoleak via uninitialized struct field
Suzuki K Poulose suzuki.poulose@arm.com arm64: Add work around for Arm Cortex-A55 Erratum 1024718
Paul Mackerras paulus@ozlabs.org KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
Laurent Vivier lvivier@redhat.com KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN
Paul Mackerras paulus@ozlabs.org KVM: PPC: Book3S HV: Fix trap number return from __kvmppc_vcore_entry
Jan Kara jack@suse.cz bdi: Fix oops in wb_workfn()
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp bdi: wake up concurrent wb_shutdown() callers.
Eric Dumazet edumazet@google.com tcp: fix TCP_REPAIR_QUEUE bound checking
Jiri Olsa jolsa@kernel.org perf: Remove superfluous allocation error check
Michal Hocko mhocko@suse.com memcg: fix per_node_info cleanup
Eric Dumazet edumazet@google.com inetpeer: fix uninit-value in inet_getpeer
Eric Dumazet edumazet@google.com soreuseport: initialise timewait reuseport field
Eric Dumazet edumazet@google.com ipv4: fix uninit-value in ip_route_output_key_hash_rcu()
Eric Dumazet edumazet@google.com dccp: initialize ireq->ir_mark
Eric Dumazet edumazet@google.com net: fix uninit-value in __hw_addr_add_ex()
Eric Dumazet edumazet@google.com net: initialize skb->peeked when cloning
Eric Dumazet edumazet@google.com net: fix rtnh_ok()
Eric Dumazet edumazet@google.com netlink: fix uninit-value in netlink_sendmsg
Eric Dumazet edumazet@google.com crypto: af_alg - fix possible uninit-value in alg_bind()
Tom Herbert tom@quantonium.net kcm: Call strp_stop before strp_done in kcm_attach
Florian Westphal fw@strlen.de netfilter: ebtables: don't attempt to allocate 0-sized compat array
Julian Anastasov ja@ssi.bg ipvs: fix rtnl_lock lockups caused by start_sync_thread
-------------
Diffstat:
Documentation/arm64/silicon-errata.txt | 1 + Makefile | 4 +- arch/arm64/Kconfig | 14 +++ arch/arm64/include/asm/assembler.h | 40 +++++++++ arch/arm64/include/asm/cputype.h | 2 + arch/arm64/mm/proc.S | 5 ++ arch/powerpc/kvm/book3s_64_mmu_radix.c | 72 +++++++++------ arch/powerpc/kvm/book3s_hv.c | 17 ++-- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 +- arch/x86/events/core.c | 8 +- arch/x86/events/intel/cstate.c | 2 + arch/x86/events/msr.c | 9 +- arch/x86/kvm/lapic.c | 37 ++++---- crypto/af_alg.c | 8 +- drivers/ata/libata-core.c | 3 + drivers/atm/zatm.c | 3 + drivers/bluetooth/btusb.c | 19 +++- drivers/gpio/gpio-aspeed.c | 2 +- drivers/gpio/gpiolib.c | 7 +- drivers/gpu/drm/drm_atomic.c | 8 ++ drivers/gpu/drm/i915/intel_lvds.c | 3 +- drivers/gpu/drm/nouveau/nv50_display.c | 7 +- drivers/gpu/drm/vc4/vc4_plane.c | 2 +- drivers/md/dm-integrity.c | 2 +- drivers/net/can/spi/hi311x.c | 11 ++- drivers/net/can/usb/kvaser_usb.c | 2 +- drivers/nvme/host/nvme.h | 5 ++ drivers/nvme/host/pci.c | 12 ++- drivers/pci/pci.c | 37 +++++--- drivers/thermal/samsung/exynos_tmu.c | 14 ++- fs/ceph/file.c | 10 +-- fs/cifs/cifsfs.c | 13 +++ fs/fs-writeback.c | 2 +- include/linux/oom.h | 2 + include/linux/wait_bit.h | 17 ++++ include/net/inet_timewait_sock.h | 1 + include/net/nexthop.h | 2 +- kernel/compat.c | 1 + kernel/events/callchain.c | 10 +-- kernel/events/ring_buffer.c | 7 +- kernel/sched/autogroup.c | 7 +- kernel/sched/cpufreq_schedutil.c | 3 +- kernel/trace/trace_events_filter.c | 3 + kernel/trace/trace_uprobe.c | 2 + mm/backing-dev.c | 2 +- mm/memcontrol.c | 3 + mm/mmap.c | 44 +++++---- mm/oom_kill.c | 74 ++++++++------- mm/sparse.c | 2 +- mm/z3fold.c | 42 ++++++--- net/atm/lec.c | 9 +- net/bridge/netfilter/ebtables.c | 11 +-- net/core/dev_addr_lists.c | 4 +- net/core/skbuff.c | 1 + net/dccp/ipv4.c | 1 + net/dccp/ipv6.c | 1 + net/ipv4/inet_timewait_sock.c | 1 + net/ipv4/inetpeer.c | 1 + net/ipv4/route.c | 11 +-- net/ipv4/tcp.c | 2 +- net/kcm/kcmsock.c | 1 + net/netfilter/ipvs/ip_vs_ctl.c | 8 -- net/netfilter/ipvs/ip_vs_sync.c | 155 ++++++++++++++++---------------- net/netlink/af_netlink.c | 2 + net/rfkill/rfkill-gpio.c | 7 +- 65 files changed, 543 insertions(+), 283 deletions(-)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Anastasov ja@ssi.bg
commit 5c64576a77894a50be80be0024bed27171b55989 upstream.
syzkaller reports for wrong rtnl_lock usage in sync code [1] and [2]
We have 2 problems in start_sync_thread if error path is taken, eg. on memory allocation error or failure to configure sockets for mcast group or addr/port binding:
1. recursive locking: holding rtnl_lock while calling sock_release which in turn calls again rtnl_lock in ip_mc_drop_socket to leave the mcast group, as noticed by Florian Westphal. Additionally, sock_release can not be called while holding sync_mutex (ABBA deadlock).
2. task hung: holding rtnl_lock while calling kthread_stop to stop the running kthreads. As the kthreads do the same to leave the mcast group (sock_release -> ip_mc_drop_socket -> rtnl_lock) they hang.
Fix the problems by calling rtnl_unlock early in the error path, now sock_release is called after unlocking both mutexes.
Problem 3 (task hung reported by syzkaller [2]) is variant of problem 2: use _trylock to prevent one user to call rtnl_lock and then while waiting for sync_mutex to block kthreads that execute sock_release when they are stopped by stop_sync_thread.
[1] IPVS: stopping backup sync thread 4500 ... WARNING: possible recursive locking detected 4.16.0-rc7+ #3 Not tainted -------------------------------------------- syzkaller688027/4497 is trying to acquire lock: (rtnl_mutex){+.+.}, at: [<00000000bb14d7fb>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
but task is already holding lock: IPVS: stopping backup sync thread 4495 ... (rtnl_mutex){+.+.}, at: [<00000000bb14d7fb>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
other info that might help us debug this: Possible unsafe locking scenario:
CPU0 ---- lock(rtnl_mutex); lock(rtnl_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by syzkaller688027/4497: #0: (rtnl_mutex){+.+.}, at: [<00000000bb14d7fb>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 #1: (ipvs->sync_mutex){+.+.}, at: [<00000000703f78e3>] do_ip_vs_set_ctl+0x10f8/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2388
stack backtrace: CPU: 1 PID: 4497 Comm: syzkaller688027 Not tainted 4.16.0-rc7+ #3 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_deadlock_bug kernel/locking/lockdep.c:1761 [inline] check_deadlock kernel/locking/lockdep.c:1805 [inline] validate_chain kernel/locking/lockdep.c:2401 [inline] __lock_acquire+0xe8f/0x3e00 kernel/locking/lockdep.c:3431 lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920 __mutex_lock_common kernel/locking/mutex.c:756 [inline] __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 ip_mc_drop_socket+0x88/0x230 net/ipv4/igmp.c:2643 inet_release+0x4e/0x1c0 net/ipv4/af_inet.c:413 sock_release+0x8d/0x1e0 net/socket.c:595 start_sync_thread+0x2213/0x2b70 net/netfilter/ipvs/ip_vs_sync.c:1924 do_ip_vs_set_ctl+0x1139/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2389 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115 ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1261 udp_setsockopt+0x45/0x80 net/ipv4/udp.c:2406 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2975 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x446a69 RSP: 002b:00007fa1c3a64da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000446a69 RDX: 000000000000048b RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00000000006e29fc R08: 0000000000000018 R09: 0000000000000000 R10: 00000000200000c0 R11: 0000000000000246 R12: 00000000006e29f8 R13: 00676e697279656b R14: 00007fa1c3a659c0 R15: 00000000006e2b60
[2] IPVS: sync thread started: state = BACKUP, mcast_ifn = syz_tun, syncid = 4, id = 0 IPVS: stopping backup sync thread 25415 ... INFO: task syz-executor7:25421 blocked for more than 120 seconds. Not tainted 4.16.0-rc6+ #284 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. syz-executor7 D23688 25421 4408 0x00000004 Call Trace: context_switch kernel/sched/core.c:2862 [inline] __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 schedule+0xf5/0x430 kernel/sched/core.c:3499 schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 do_wait_for_common kernel/sched/completion.c:86 [inline] __wait_for_common kernel/sched/completion.c:107 [inline] wait_for_common kernel/sched/completion.c:118 [inline] wait_for_completion+0x415/0x770 kernel/sched/completion.c:139 kthread_stop+0x14a/0x7a0 kernel/kthread.c:530 stop_sync_thread+0x3d9/0x740 net/netfilter/ipvs/ip_vs_sync.c:1996 do_ip_vs_set_ctl+0x2b1/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2394 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115 ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1253 sctp_setsockopt+0x2ca/0x63e0 net/sctp/socket.c:4154 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:3039 SYSC_setsockopt net/socket.c:1850 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1829 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x454889 RSP: 002b:00007fc927626c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00007fc9276276d4 RCX: 0000000000454889 RDX: 000000000000048c RSI: 0000000000000000 RDI: 0000000000000017 RBP: 000000000072bf58 R08: 0000000000000018 R09: 0000000000000000 R10: 0000000020000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 000000000000051c R14: 00000000006f9b40 R15: 0000000000000001
Showing all locks held in the system: 2 locks held by khungtaskd/868: #0: (rcu_read_lock){....}, at: [<00000000a1a8f002>] check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline] #0: (rcu_read_lock){....}, at: [<00000000a1a8f002>] watchdog+0x1c5/0xd60 kernel/hung_task.c:249 #1: (tasklist_lock){.+.+}, at: [<0000000037c2f8f9>] debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470 1 lock held by rsyslogd/4247: #0: (&f->f_pos_lock){+.+.}, at: [<000000000d8d6983>] __fdget_pos+0x12b/0x190 fs/file.c:765 2 locks held by getty/4338: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4339: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4340: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4341: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4342: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4343: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4344: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 3 locks held by kworker/0:5/6494: #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000a062b18e>] work_static include/linux/workqueue.h:198 [inline] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000a062b18e>] set_work_data kernel/workqueue.c:619 [inline] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000a062b18e>] set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000a062b18e>] process_one_work+0xb12/0x1bb0 kernel/workqueue.c:2084 #1: ((addr_chk_work).work){+.+.}, at: [<00000000278427d5>] process_one_work+0xb89/0x1bb0 kernel/workqueue.c:2088 #2: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 1 lock held by syz-executor7/25421: #0: (ipvs->sync_mutex){+.+.}, at: [<00000000d414a689>] do_ip_vs_set_ctl+0x277/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2393 2 locks held by syz-executor7/25427: #0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 #1: (ipvs->sync_mutex){+.+.}, at: [<00000000e6d48489>] do_ip_vs_set_ctl+0x10f8/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2388 1 lock held by syz-executor7/25435: #0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 1 lock held by ipvs-b:2:0/25415: #0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
Reported-and-tested-by: syzbot+a46d6abf9d56b1365a72@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+5fe074c01b2032ce9618@syzkaller.appspotmail.com Fixes: e0b26cc997d5 ("ipvs: call rtnl_lock early") Signed-off-by: Julian Anastasov ja@ssi.bg Signed-off-by: Simon Horman horms@verge.net.au Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Cc: Zubin Mithra zsm@chromium.org Cc: Guenter Roeck groeck@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netfilter/ipvs/ip_vs_ctl.c | 8 -- net/netfilter/ipvs/ip_vs_sync.c | 155 ++++++++++++++++++++-------------------- 2 files changed, 80 insertions(+), 83 deletions(-)
--- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -2387,11 +2387,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cm strlcpy(cfg.mcast_ifn, dm->mcast_ifn, sizeof(cfg.mcast_ifn)); cfg.syncid = dm->syncid; - rtnl_lock(); - mutex_lock(&ipvs->sync_mutex); ret = start_sync_thread(ipvs, &cfg, dm->state); - mutex_unlock(&ipvs->sync_mutex); - rtnl_unlock(); } else { mutex_lock(&ipvs->sync_mutex); ret = stop_sync_thread(ipvs, dm->state); @@ -3484,12 +3480,8 @@ static int ip_vs_genl_new_daemon(struct if (ipvs->mixed_address_family_dests > 0) return -EINVAL;
- rtnl_lock(); - mutex_lock(&ipvs->sync_mutex); ret = start_sync_thread(ipvs, &c, nla_get_u32(attrs[IPVS_DAEMON_ATTR_STATE])); - mutex_unlock(&ipvs->sync_mutex); - rtnl_unlock(); return ret; }
--- a/net/netfilter/ipvs/ip_vs_sync.c +++ b/net/netfilter/ipvs/ip_vs_sync.c @@ -49,6 +49,7 @@ #include <linux/kthread.h> #include <linux/wait.h> #include <linux/kernel.h> +#include <linux/sched/signal.h>
#include <asm/unaligned.h> /* Used for ntoh_seq and hton_seq */
@@ -1360,15 +1361,9 @@ static void set_mcast_pmtudisc(struct so /* * Specifiy default interface for outgoing multicasts */ -static int set_mcast_if(struct sock *sk, char *ifname) +static int set_mcast_if(struct sock *sk, struct net_device *dev) { - struct net_device *dev; struct inet_sock *inet = inet_sk(sk); - struct net *net = sock_net(sk); - - dev = __dev_get_by_name(net, ifname); - if (!dev) - return -ENODEV;
if (sk->sk_bound_dev_if && dev->ifindex != sk->sk_bound_dev_if) return -EINVAL; @@ -1396,19 +1391,14 @@ static int set_mcast_if(struct sock *sk, * in the in_addr structure passed in as a parameter. */ static int -join_mcast_group(struct sock *sk, struct in_addr *addr, char *ifname) +join_mcast_group(struct sock *sk, struct in_addr *addr, struct net_device *dev) { - struct net *net = sock_net(sk); struct ip_mreqn mreq; - struct net_device *dev; int ret;
memset(&mreq, 0, sizeof(mreq)); memcpy(&mreq.imr_multiaddr, addr, sizeof(struct in_addr));
- dev = __dev_get_by_name(net, ifname); - if (!dev) - return -ENODEV; if (sk->sk_bound_dev_if && dev->ifindex != sk->sk_bound_dev_if) return -EINVAL;
@@ -1423,15 +1413,10 @@ join_mcast_group(struct sock *sk, struct
#ifdef CONFIG_IP_VS_IPV6 static int join_mcast_group6(struct sock *sk, struct in6_addr *addr, - char *ifname) + struct net_device *dev) { - struct net *net = sock_net(sk); - struct net_device *dev; int ret;
- dev = __dev_get_by_name(net, ifname); - if (!dev) - return -ENODEV; if (sk->sk_bound_dev_if && dev->ifindex != sk->sk_bound_dev_if) return -EINVAL;
@@ -1443,24 +1428,18 @@ static int join_mcast_group6(struct sock } #endif
-static int bind_mcastif_addr(struct socket *sock, char *ifname) +static int bind_mcastif_addr(struct socket *sock, struct net_device *dev) { - struct net *net = sock_net(sock->sk); - struct net_device *dev; __be32 addr; struct sockaddr_in sin;
- dev = __dev_get_by_name(net, ifname); - if (!dev) - return -ENODEV; - addr = inet_select_addr(dev, 0, RT_SCOPE_UNIVERSE); if (!addr) pr_err("You probably need to specify IP address on " "multicast interface.\n");
IP_VS_DBG(7, "binding socket with (%s) %pI4\n", - ifname, &addr); + dev->name, &addr);
/* Now bind the socket with the address of multicast interface */ sin.sin_family = AF_INET; @@ -1493,7 +1472,8 @@ static void get_mcast_sockaddr(union ipv /* * Set up sending multicast socket over UDP */ -static struct socket *make_send_sock(struct netns_ipvs *ipvs, int id) +static int make_send_sock(struct netns_ipvs *ipvs, int id, + struct net_device *dev, struct socket **sock_ret) { /* multicast addr */ union ipvs_sockaddr mcast_addr; @@ -1505,9 +1485,10 @@ static struct socket *make_send_sock(str IPPROTO_UDP, &sock); if (result < 0) { pr_err("Error during creation of socket; terminating\n"); - return ERR_PTR(result); + goto error; } - result = set_mcast_if(sock->sk, ipvs->mcfg.mcast_ifn); + *sock_ret = sock; + result = set_mcast_if(sock->sk, dev); if (result < 0) { pr_err("Error setting outbound mcast interface\n"); goto error; @@ -1522,7 +1503,7 @@ static struct socket *make_send_sock(str set_sock_size(sock->sk, 1, result);
if (AF_INET == ipvs->mcfg.mcast_af) - result = bind_mcastif_addr(sock, ipvs->mcfg.mcast_ifn); + result = bind_mcastif_addr(sock, dev); else result = 0; if (result < 0) { @@ -1538,19 +1519,18 @@ static struct socket *make_send_sock(str goto error; }
- return sock; + return 0;
error: - sock_release(sock); - return ERR_PTR(result); + return result; }
/* * Set up receiving multicast socket over UDP */ -static struct socket *make_receive_sock(struct netns_ipvs *ipvs, int id, - int ifindex) +static int make_receive_sock(struct netns_ipvs *ipvs, int id, + struct net_device *dev, struct socket **sock_ret) { /* multicast addr */ union ipvs_sockaddr mcast_addr; @@ -1562,8 +1542,9 @@ static struct socket *make_receive_sock( IPPROTO_UDP, &sock); if (result < 0) { pr_err("Error during creation of socket; terminating\n"); - return ERR_PTR(result); + goto error; } + *sock_ret = sock; /* it is equivalent to the REUSEADDR option in user-space */ sock->sk->sk_reuse = SK_CAN_REUSE; result = sysctl_sync_sock_size(ipvs); @@ -1571,7 +1552,7 @@ static struct socket *make_receive_sock( set_sock_size(sock->sk, 0, result);
get_mcast_sockaddr(&mcast_addr, &salen, &ipvs->bcfg, id); - sock->sk->sk_bound_dev_if = ifindex; + sock->sk->sk_bound_dev_if = dev->ifindex; result = sock->ops->bind(sock, (struct sockaddr *)&mcast_addr, salen); if (result < 0) { pr_err("Error binding to the multicast addr\n"); @@ -1582,21 +1563,20 @@ static struct socket *make_receive_sock( #ifdef CONFIG_IP_VS_IPV6 if (ipvs->bcfg.mcast_af == AF_INET6) result = join_mcast_group6(sock->sk, &mcast_addr.in6.sin6_addr, - ipvs->bcfg.mcast_ifn); + dev); else #endif result = join_mcast_group(sock->sk, &mcast_addr.in.sin_addr, - ipvs->bcfg.mcast_ifn); + dev); if (result < 0) { pr_err("Error joining to the multicast group\n"); goto error; }
- return sock; + return 0;
error: - sock_release(sock); - return ERR_PTR(result); + return result; }
@@ -1781,13 +1761,12 @@ static int sync_thread_backup(void *data int start_sync_thread(struct netns_ipvs *ipvs, struct ipvs_sync_daemon_cfg *c, int state) { - struct ip_vs_sync_thread_data *tinfo; + struct ip_vs_sync_thread_data *tinfo = NULL; struct task_struct **array = NULL, *task; - struct socket *sock; struct net_device *dev; char *name; int (*threadfn)(void *data); - int id, count, hlen; + int id = 0, count, hlen; int result = -ENOMEM; u16 mtu, min_mtu;
@@ -1795,6 +1774,18 @@ int start_sync_thread(struct netns_ipvs IP_VS_DBG(7, "Each ip_vs_sync_conn entry needs %zd bytes\n", sizeof(struct ip_vs_sync_conn_v0));
+ /* Do not hold one mutex and then to block on another */ + for (;;) { + rtnl_lock(); + if (mutex_trylock(&ipvs->sync_mutex)) + break; + rtnl_unlock(); + mutex_lock(&ipvs->sync_mutex); + if (rtnl_trylock()) + break; + mutex_unlock(&ipvs->sync_mutex); + } + if (!ipvs->sync_state) { count = clamp(sysctl_sync_ports(ipvs), 1, IPVS_SYNC_PORTS_MAX); ipvs->threads_mask = count - 1; @@ -1813,7 +1804,8 @@ int start_sync_thread(struct netns_ipvs dev = __dev_get_by_name(ipvs->net, c->mcast_ifn); if (!dev) { pr_err("Unknown mcast interface: %s\n", c->mcast_ifn); - return -ENODEV; + result = -ENODEV; + goto out_early; } hlen = (AF_INET6 == c->mcast_af) ? sizeof(struct ipv6hdr) + sizeof(struct udphdr) : @@ -1830,26 +1822,30 @@ int start_sync_thread(struct netns_ipvs c->sync_maxlen = mtu - hlen;
if (state == IP_VS_STATE_MASTER) { + result = -EEXIST; if (ipvs->ms) - return -EEXIST; + goto out_early;
ipvs->mcfg = *c; name = "ipvs-m:%d:%d"; threadfn = sync_thread_master; } else if (state == IP_VS_STATE_BACKUP) { + result = -EEXIST; if (ipvs->backup_threads) - return -EEXIST; + goto out_early;
ipvs->bcfg = *c; name = "ipvs-b:%d:%d"; threadfn = sync_thread_backup; } else { - return -EINVAL; + result = -EINVAL; + goto out_early; }
if (state == IP_VS_STATE_MASTER) { struct ipvs_master_sync_state *ms;
+ result = -ENOMEM; ipvs->ms = kcalloc(count, sizeof(ipvs->ms[0]), GFP_KERNEL); if (!ipvs->ms) goto out; @@ -1865,39 +1861,38 @@ int start_sync_thread(struct netns_ipvs } else { array = kcalloc(count, sizeof(struct task_struct *), GFP_KERNEL); + result = -ENOMEM; if (!array) goto out; }
- tinfo = NULL; for (id = 0; id < count; id++) { - if (state == IP_VS_STATE_MASTER) - sock = make_send_sock(ipvs, id); - else - sock = make_receive_sock(ipvs, id, dev->ifindex); - if (IS_ERR(sock)) { - result = PTR_ERR(sock); - goto outtinfo; - } + result = -ENOMEM; tinfo = kmalloc(sizeof(*tinfo), GFP_KERNEL); if (!tinfo) - goto outsocket; + goto out; tinfo->ipvs = ipvs; - tinfo->sock = sock; + tinfo->sock = NULL; if (state == IP_VS_STATE_BACKUP) { tinfo->buf = kmalloc(ipvs->bcfg.sync_maxlen, GFP_KERNEL); if (!tinfo->buf) - goto outtinfo; + goto out; } else { tinfo->buf = NULL; } tinfo->id = id; + if (state == IP_VS_STATE_MASTER) + result = make_send_sock(ipvs, id, dev, &tinfo->sock); + else + result = make_receive_sock(ipvs, id, dev, &tinfo->sock); + if (result < 0) + goto out;
task = kthread_run(threadfn, tinfo, name, ipvs->gen, id); if (IS_ERR(task)) { result = PTR_ERR(task); - goto outtinfo; + goto out; } tinfo = NULL; if (state == IP_VS_STATE_MASTER) @@ -1914,20 +1909,20 @@ int start_sync_thread(struct netns_ipvs ipvs->sync_state |= state; spin_unlock_bh(&ipvs->sync_buff_lock);
+ mutex_unlock(&ipvs->sync_mutex); + rtnl_unlock(); + /* increase the module use count */ ip_vs_use_count_inc();
return 0;
-outsocket: - sock_release(sock); - -outtinfo: - if (tinfo) { - sock_release(tinfo->sock); - kfree(tinfo->buf); - kfree(tinfo); - } +out: + /* We do not need RTNL lock anymore, release it here so that + * sock_release below and in the kthreads can use rtnl_lock + * to leave the mcast group. + */ + rtnl_unlock(); count = id; while (count-- > 0) { if (state == IP_VS_STATE_MASTER) @@ -1935,13 +1930,23 @@ outtinfo: else kthread_stop(array[count]); } - kfree(array); - -out: if (!(ipvs->sync_state & IP_VS_STATE_MASTER)) { kfree(ipvs->ms); ipvs->ms = NULL; } + mutex_unlock(&ipvs->sync_mutex); + if (tinfo) { + if (tinfo->sock) + sock_release(tinfo->sock); + kfree(tinfo->buf); + kfree(tinfo); + } + kfree(array); + return result; + +out_early: + mutex_unlock(&ipvs->sync_mutex); + rtnl_unlock(); return result; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 3f1e53abff84cf40b1adb3455d480dd295bf42e8 upstream.
Dmitry reports 32bit ebtables on 64bit kernel got broken by a recent change that returns -EINVAL when ruleset has no entries.
ebtables however only counts user-defined chains, so for the initial table nentries will be 0.
Don't try to allocate the compat array in this case, as no user defined rules exist no rule will need 64bit translation.
Reported-by: Dmitry Vyukov dvyukov@google.com Fixes: 7d7d7e02111e9 ("netfilter: compat: reject huge allocation requests") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/bridge/netfilter/ebtables.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
--- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -1819,13 +1819,14 @@ static int compat_table_info(const struc { unsigned int size = info->entries_size; const void *entries = info->entries; - int ret;
newinfo->entries_size = size; - - ret = xt_compat_init_offsets(NFPROTO_BRIDGE, info->nentries); - if (ret) - return ret; + if (info->nentries) { + int ret = xt_compat_init_offsets(NFPROTO_BRIDGE, + info->nentries); + if (ret) + return ret; + }
return EBT_ENTRY_ITERATE(entries, size, compat_calc_entry, info, entries, newinfo);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Herbert tom@quantonium.net
commit dff8baa261174de689a44572d0ea182d7aa70598 upstream.
In kcm_attach strp_done is called when sk_user_data is already set to fail the attach. strp_done needs the strp to be stopped and warns if it isn't. Call strp_stop in this case to eliminate the warning message.
Reported-by: syzbot+88dfb55e4c8b770d86e3@syzkaller.appspotmail.com Fixes: e5571240236c5652f ("kcm: Check if sk_user_data already set in kcm_attach" Signed-off-by: Tom Herbert tom@quantonium.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/kcm/kcmsock.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -1425,6 +1425,7 @@ static int kcm_attach(struct socket *soc */ if (csk->sk_user_data) { write_unlock_bh(&csk->sk_callback_lock); + strp_stop(&psock->strp); strp_done(&psock->strp); kmem_cache_free(kcm_psockp, psock); err = -EALREADY;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit a466856e0b7ab269cdf9461886d007e88ff575b0 upstream.
syzbot reported :
BUG: KMSAN: uninit-value in alg_bind+0xe3/0xd90 crypto/af_alg.c:162
We need to check addr_len before dereferencing sa (or uaddr)
Fixes: bb30b8848c85 ("crypto: af_alg - whitelist mask and type") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Stephan Mueller smueller@chronox.de Cc: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/af_alg.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -158,16 +158,16 @@ static int alg_bind(struct socket *sock, void *private; int err;
- /* If caller uses non-allowed flag, return error. */ - if ((sa->salg_feat & ~allowed) || (sa->salg_mask & ~allowed)) - return -EINVAL; - if (sock->state == SS_CONNECTED) return -EINVAL;
if (addr_len < sizeof(*sa)) return -EINVAL;
+ /* If caller uses non-allowed flag, return error. */ + if ((sa->salg_feat & ~allowed) || (sa->salg_mask & ~allowed)) + return -EINVAL; + sa->salg_type[sizeof(sa->salg_type) - 1] = 0; sa->salg_name[sizeof(sa->salg_name) + addr_len - sizeof(*sa) - 1] = 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit 6091f09c2f79730d895149bcfe3d66140288cd0e upstream.
syzbot reported :
BUG: KMSAN: uninit-value in ffs arch/x86/include/asm/bitops.h:432 [inline] BUG: KMSAN: uninit-value in netlink_sendmsg+0xb26/0x1310 net/netlink/af_netlink.c:1851
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netlink/af_netlink.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1813,6 +1813,8 @@ static int netlink_sendmsg(struct socket
if (msg->msg_namelen) { err = -EINVAL; + if (msg->msg_namelen < sizeof(struct sockaddr_nl)) + goto out; if (addr->nl_family != AF_NETLINK) goto out; dst_portid = addr->nl_pid;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit b1993a2de12c9e75c35729e2ffbc3a92d50c0d31 upstream.
syzbot reported :
BUG: KMSAN: uninit-value in rtnh_ok include/net/nexthop.h:11 [inline] BUG: KMSAN: uninit-value in fib_count_nexthops net/ipv4/fib_semantics.c:469 [inline] BUG: KMSAN: uninit-value in fib_create_info+0x554/0x8d20 net/ipv4/fib_semantics.c:1091
@remaining is an integer, coming from user space. If it is negative we want rtnh_ok() to return false.
Fixes: 4e902c57417c ("[IPv4]: FIB configuration using struct fib_config") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/net/nexthop.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/net/nexthop.h +++ b/include/net/nexthop.h @@ -7,7 +7,7 @@
static inline int rtnh_ok(const struct rtnexthop *rtnh, int remaining) { - return remaining >= sizeof(*rtnh) && + return remaining >= (int)sizeof(*rtnh) && rtnh->rtnh_len >= sizeof(*rtnh) && rtnh->rtnh_len <= remaining; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit b13dda9f9aa7caceeee61c080c2e544d5f5d85e5 upstream.
syzbot reported __skb_try_recv_from_queue() was using skb->peeked while it was potentially unitialized.
We need to clear it in __skb_clone()
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/core/skbuff.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -857,6 +857,7 @@ static struct sk_buff *__skb_clone(struc n->hdr_len = skb->nohdr ? skb_headroom(skb) : skb->hdr_len; n->cloned = 1; n->nohdr = 0; + n->peeked = 0; n->destructor = NULL; C(tail); C(end);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit 77d36398d99f2565c0a8d43a86fd520a82e64bb8 upstream.
syzbot complained :
BUG: KMSAN: uninit-value in memcmp+0x119/0x180 lib/string.c:861 CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.16.0+ #82 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 memcmp+0x119/0x180 lib/string.c:861 __hw_addr_add_ex net/core/dev_addr_lists.c:60 [inline] __dev_mc_add+0x1c2/0x8e0 net/core/dev_addr_lists.c:670 dev_mc_add+0x6d/0x80 net/core/dev_addr_lists.c:687 igmp6_group_added+0x2db/0xa00 net/ipv6/mcast.c:662 ipv6_dev_mc_inc+0xe9e/0x1130 net/ipv6/mcast.c:914 addrconf_join_solict net/ipv6/addrconf.c:2078 [inline] addrconf_dad_begin net/ipv6/addrconf.c:3828 [inline] addrconf_dad_work+0x427/0x2150 net/ipv6/addrconf.c:3954 process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2113 worker_thread+0x113c/0x24f0 kernel/workqueue.c:2247 kthread+0x539/0x720 kernel/kthread.c:239
Fixes: f001fde5eadd ("net: introduce a list of device addresses dev_addr_list (v6)") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/core/dev_addr_lists.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/core/dev_addr_lists.c +++ b/net/core/dev_addr_lists.c @@ -57,8 +57,8 @@ static int __hw_addr_add_ex(struct netde return -EINVAL;
list_for_each_entry(ha, &list->list, list) { - if (!memcmp(ha->addr, addr, addr_len) && - ha->type == addr_type) { + if (ha->type == addr_type && + !memcmp(ha->addr, addr, addr_len)) { if (global) { /* check if addr is already used as global */ if (ha->global_use)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit b855ff827476adbdc2259e9895681d82b7b26065 upstream.
syzbot reported an uninit-value read of skb->mark in iptable_mangle_hook()
Thanks to the nice report, I tracked the problem to dccp not caring of ireq->ir_mark for passive sessions.
BUG: KMSAN: uninit-value in ipt_mangle_out net/ipv4/netfilter/iptable_mangle.c:66 [inline] BUG: KMSAN: uninit-value in iptable_mangle_hook+0x5e5/0x720 net/ipv4/netfilter/iptable_mangle.c:84 CPU: 0 PID: 5300 Comm: syz-executor3 Not tainted 4.16.0+ #81 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 ipt_mangle_out net/ipv4/netfilter/iptable_mangle.c:66 [inline] iptable_mangle_hook+0x5e5/0x720 net/ipv4/netfilter/iptable_mangle.c:84 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0x158/0x3d0 net/netfilter/core.c:483 nf_hook include/linux/netfilter.h:243 [inline] __ip_local_out net/ipv4/ip_output.c:113 [inline] ip_local_out net/ipv4/ip_output.c:122 [inline] ip_queue_xmit+0x1d21/0x21c0 net/ipv4/ip_output.c:504 dccp_transmit_skb+0x15eb/0x1900 net/dccp/output.c:142 dccp_xmit_packet+0x814/0x9e0 net/dccp/output.c:281 dccp_write_xmit+0x20f/0x480 net/dccp/output.c:363 dccp_sendmsg+0x12ca/0x12d0 net/dccp/proto.c:818 inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046 __sys_sendmsg net/socket.c:2080 [inline] SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091 SyS_sendmsg+0x54/0x80 net/socket.c:2087 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x455259 RSP: 002b:00007f1a4473dc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f1a4473e6d4 RCX: 0000000000455259 RDX: 0000000000000000 RSI: 0000000020b76fc8 RDI: 0000000000000015 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000004f0 R14: 00000000006fa720 R15: 0000000000000000
Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_save_stack mm/kmsan/kmsan.c:293 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684 __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521 ip_queue_xmit+0x1e35/0x21c0 net/ipv4/ip_output.c:502 dccp_transmit_skb+0x15eb/0x1900 net/dccp/output.c:142 dccp_xmit_packet+0x814/0x9e0 net/dccp/output.c:281 dccp_write_xmit+0x20f/0x480 net/dccp/output.c:363 dccp_sendmsg+0x12ca/0x12d0 net/dccp/proto.c:818 inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046 __sys_sendmsg net/socket.c:2080 [inline] SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091 SyS_sendmsg+0x54/0x80 net/socket.c:2087 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_save_stack mm/kmsan/kmsan.c:293 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684 __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521 inet_csk_clone_lock+0x503/0x580 net/ipv4/inet_connection_sock.c:797 dccp_create_openreq_child+0x7f/0x890 net/dccp/minisocks.c:92 dccp_v4_request_recv_sock+0x22c/0xe90 net/dccp/ipv4.c:408 dccp_v6_request_recv_sock+0x290/0x2000 net/dccp/ipv6.c:414 dccp_check_req+0x7b9/0x8f0 net/dccp/minisocks.c:197 dccp_v4_rcv+0x12e4/0x2630 net/dccp/ipv4.c:840 ip_local_deliver_finish+0x6ed/0xd40 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:288 [inline] ip_local_deliver+0x43c/0x4e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:449 [inline] ip_rcv_finish+0x1253/0x16d0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:288 [inline] ip_rcv+0x119d/0x16f0 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x47cf/0x4a80 net/core/dev.c:4562 __netif_receive_skb net/core/dev.c:4627 [inline] process_backlog+0x62d/0xe20 net/core/dev.c:5307 napi_poll net/core/dev.c:5705 [inline] net_rx_action+0x7c1/0x1a70 net/core/dev.c:5771 __do_softirq+0x56d/0x93d kernel/softirq.c:285 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmem_cache_alloc+0xaab/0xb90 mm/slub.c:2756 reqsk_alloc include/net/request_sock.h:88 [inline] inet_reqsk_alloc+0xc4/0x7f0 net/ipv4/tcp_input.c:6145 dccp_v4_conn_request+0x5cc/0x1770 net/dccp/ipv4.c:600 dccp_v6_conn_request+0x299/0x1880 net/dccp/ipv6.c:317 dccp_rcv_state_process+0x2ea/0x2410 net/dccp/input.c:612 dccp_v4_do_rcv+0x229/0x340 net/dccp/ipv4.c:682 dccp_v6_do_rcv+0x16d/0x1220 net/dccp/ipv6.c:578 sk_backlog_rcv include/net/sock.h:908 [inline] __sk_receive_skb+0x60e/0xf20 net/core/sock.c:513 dccp_v4_rcv+0x24d4/0x2630 net/dccp/ipv4.c:874 ip_local_deliver_finish+0x6ed/0xd40 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:288 [inline] ip_local_deliver+0x43c/0x4e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:449 [inline] ip_rcv_finish+0x1253/0x16d0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:288 [inline] ip_rcv+0x119d/0x16f0 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x47cf/0x4a80 net/core/dev.c:4562 __netif_receive_skb net/core/dev.c:4627 [inline] process_backlog+0x62d/0xe20 net/core/dev.c:5307 napi_poll net/core/dev.c:5705 [inline] net_rx_action+0x7c1/0x1a70 net/core/dev.c:5771 __do_softirq+0x56d/0x93d kernel/softirq.c:285
Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/dccp/ipv4.c | 1 + net/dccp/ipv6.c | 1 + 2 files changed, 2 insertions(+)
--- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -614,6 +614,7 @@ int dccp_v4_conn_request(struct sock *sk ireq = inet_rsk(req); sk_rcv_saddr_set(req_to_sk(req), ip_hdr(skb)->daddr); sk_daddr_set(req_to_sk(req), ip_hdr(skb)->saddr); + ireq->ir_mark = inet_request_mark(sk, skb); ireq->ireq_family = AF_INET; ireq->ir_iif = sk->sk_bound_dev_if;
--- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -351,6 +351,7 @@ static int dccp_v6_conn_request(struct s ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr; ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr; ireq->ireq_family = AF_INET6; + ireq->ir_mark = inet_request_mark(sk, skb);
if (ipv6_opt_accepted(sk, skb, IP6CB(skb)) || np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo ||
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit d0ea2b12500543535be3f54e17920fffc9bb45f6 upstream.
syzbot complained that res.type could be used while not initialized.
Using RTN_UNSPEC as initial value seems better than using garbage.
BUG: KMSAN: uninit-value in __mkroute_output net/ipv4/route.c:2200 [inline] BUG: KMSAN: uninit-value in ip_route_output_key_hash_rcu+0x31f0/0x3940 net/ipv4/route.c:2493 CPU: 1 PID: 12207 Comm: syz-executor0 Not tainted 4.16.0+ #81 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 __mkroute_output net/ipv4/route.c:2200 [inline] ip_route_output_key_hash_rcu+0x31f0/0x3940 net/ipv4/route.c:2493 ip_route_output_key_hash net/ipv4/route.c:2322 [inline] __ip_route_output_key include/net/route.h:126 [inline] ip_route_output_flow+0x1eb/0x3c0 net/ipv4/route.c:2577 raw_sendmsg+0x1861/0x3ed0 net/ipv4/raw.c:653 inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747 SyS_sendto+0x8a/0xb0 net/socket.c:1715 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x455259 RSP: 002b:00007fdc0625dc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007fdc0625e6d4 RCX: 0000000000455259 RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000013 RBP: 000000000072bea0 R08: 0000000020000080 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000004f7 R14: 00000000006fa7c8 R15: 0000000000000000
Local variable description: ----res.i.i@ip_route_output_flow Variable was created at: ip_route_output_flow+0x75/0x3c0 net/ipv4/route.c:2576 raw_sendmsg+0x1861/0x3ed0 net/ipv4/raw.c:653
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ipv4/route.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
--- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2288,13 +2288,14 @@ struct rtable *ip_route_output_key_hash( const struct sk_buff *skb) { __u8 tos = RT_FL_TOS(fl4); - struct fib_result res; + struct fib_result res = { + .type = RTN_UNSPEC, + .fi = NULL, + .table = NULL, + .tclassid = 0, + }; struct rtable *rth;
- res.tclassid = 0; - res.fi = NULL; - res.table = NULL; - fl4->flowi4_iif = LOOPBACK_IFINDEX; fl4->flowi4_tos = tos & IPTOS_RT_MASK; fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit 3099a52918937ab86ec47038ad80d377ba16c531 upstream.
syzbot reported an uninit-value in inet_csk_bind_conflict() [1]
It turns out we never propagated sk->sk_reuseport into timewait socket.
[1] BUG: KMSAN: uninit-value in inet_csk_bind_conflict+0x5f9/0x990 net/ipv4/inet_connection_sock.c:151 CPU: 1 PID: 3589 Comm: syzkaller008242 Not tainted 4.16.0+ #82 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 inet_csk_bind_conflict+0x5f9/0x990 net/ipv4/inet_connection_sock.c:151 inet_csk_get_port+0x1d28/0x1e40 net/ipv4/inet_connection_sock.c:320 inet6_bind+0x121c/0x1820 net/ipv6/af_inet6.c:399 SYSC_bind+0x3f2/0x4b0 net/socket.c:1474 SyS_bind+0x54/0x80 net/socket.c:1460 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x4416e9 RSP: 002b:00007ffce6d15c88 EFLAGS: 00000217 ORIG_RAX: 0000000000000031 RAX: ffffffffffffffda RBX: 0100000000000000 RCX: 00000000004416e9 RDX: 000000000000001c RSI: 0000000020402000 RDI: 0000000000000004 RBP: 0000000000000000 R08: 00000000e6d15e08 R09: 00000000e6d15e08 R10: 0000000000000004 R11: 0000000000000217 R12: 0000000000009478 R13: 00000000006cd448 R14: 0000000000000000 R15: 0000000000000000
Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_save_stack mm/kmsan/kmsan.c:293 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684 __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521 tcp_time_wait+0xf17/0xf50 net/ipv4/tcp_minisocks.c:283 tcp_rcv_state_process+0xebe/0x6490 net/ipv4/tcp_input.c:6003 tcp_v6_do_rcv+0x11dd/0x1d90 net/ipv6/tcp_ipv6.c:1331 sk_backlog_rcv include/net/sock.h:908 [inline] __release_sock+0x2d6/0x680 net/core/sock.c:2271 release_sock+0x97/0x2a0 net/core/sock.c:2786 tcp_close+0x277/0x18f0 net/ipv4/tcp.c:2269 inet_release+0x240/0x2a0 net/ipv4/af_inet.c:427 inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:435 sock_release net/socket.c:595 [inline] sock_close+0xe0/0x300 net/socket.c:1149 __fput+0x49e/0xa10 fs/file_table.c:209 ____fput+0x37/0x40 fs/file_table.c:243 task_work_run+0x243/0x2c0 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x10e1/0x38d0 kernel/exit.c:867 do_group_exit+0x1a0/0x360 kernel/exit.c:970 SYSC_exit_group+0x21/0x30 kernel/exit.c:981 SyS_exit_group+0x25/0x30 kernel/exit.c:979 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_save_stack mm/kmsan/kmsan.c:293 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684 __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521 inet_twsk_alloc+0xaef/0xc00 net/ipv4/inet_timewait_sock.c:182 tcp_time_wait+0xd9/0xf50 net/ipv4/tcp_minisocks.c:258 tcp_rcv_state_process+0xebe/0x6490 net/ipv4/tcp_input.c:6003 tcp_v6_do_rcv+0x11dd/0x1d90 net/ipv6/tcp_ipv6.c:1331 sk_backlog_rcv include/net/sock.h:908 [inline] __release_sock+0x2d6/0x680 net/core/sock.c:2271 release_sock+0x97/0x2a0 net/core/sock.c:2786 tcp_close+0x277/0x18f0 net/ipv4/tcp.c:2269 inet_release+0x240/0x2a0 net/ipv4/af_inet.c:427 inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:435 sock_release net/socket.c:595 [inline] sock_close+0xe0/0x300 net/socket.c:1149 __fput+0x49e/0xa10 fs/file_table.c:209 ____fput+0x37/0x40 fs/file_table.c:243 task_work_run+0x243/0x2c0 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x10e1/0x38d0 kernel/exit.c:867 do_group_exit+0x1a0/0x360 kernel/exit.c:970 SYSC_exit_group+0x21/0x30 kernel/exit.c:981 SyS_exit_group+0x25/0x30 kernel/exit.c:979 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmem_cache_alloc+0xaab/0xb90 mm/slub.c:2756 inet_twsk_alloc+0x13b/0xc00 net/ipv4/inet_timewait_sock.c:163 tcp_time_wait+0xd9/0xf50 net/ipv4/tcp_minisocks.c:258 tcp_rcv_state_process+0xebe/0x6490 net/ipv4/tcp_input.c:6003 tcp_v6_do_rcv+0x11dd/0x1d90 net/ipv6/tcp_ipv6.c:1331 sk_backlog_rcv include/net/sock.h:908 [inline] __release_sock+0x2d6/0x680 net/core/sock.c:2271 release_sock+0x97/0x2a0 net/core/sock.c:2786 tcp_close+0x277/0x18f0 net/ipv4/tcp.c:2269 inet_release+0x240/0x2a0 net/ipv4/af_inet.c:427 inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:435 sock_release net/socket.c:595 [inline] sock_close+0xe0/0x300 net/socket.c:1149 __fput+0x49e/0xa10 fs/file_table.c:209 ____fput+0x37/0x40 fs/file_table.c:243 task_work_run+0x243/0x2c0 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x10e1/0x38d0 kernel/exit.c:867 do_group_exit+0x1a0/0x360 kernel/exit.c:970 SYSC_exit_group+0x21/0x30 kernel/exit.c:981 SyS_exit_group+0x25/0x30 kernel/exit.c:979 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: da5e36308d9f ("soreuseport: TCP/IPv4 implementation") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/net/inet_timewait_sock.h | 1 + net/ipv4/inet_timewait_sock.c | 1 + 2 files changed, 2 insertions(+)
--- a/include/net/inet_timewait_sock.h +++ b/include/net/inet_timewait_sock.h @@ -43,6 +43,7 @@ struct inet_timewait_sock { #define tw_family __tw_common.skc_family #define tw_state __tw_common.skc_state #define tw_reuse __tw_common.skc_reuse +#define tw_reuseport __tw_common.skc_reuseport #define tw_ipv6only __tw_common.skc_ipv6only #define tw_bound_dev_if __tw_common.skc_bound_dev_if #define tw_node __tw_common.skc_nulls_node --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -179,6 +179,7 @@ struct inet_timewait_sock *inet_twsk_all tw->tw_dport = inet->inet_dport; tw->tw_family = sk->sk_family; tw->tw_reuse = sk->sk_reuse; + tw->tw_reuseport = sk->sk_reuseport; tw->tw_hash = sk->sk_hash; tw->tw_ipv6only = 0; tw->tw_transparent = inet->transparent;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit b6a37e5e25414df4b8e9140a5c6f5ee0ec6f3b90 upstream.
syzbot/KMSAN reported that p->dtime was read while it was not yet initialized in :
delta = (__u32)jiffies - p->dtime; if (delta < ttl || !refcount_dec_if_one(&p->refcnt)) gc_stack[i] = NULL;
This is a false positive, because the inetpeer wont be erased from rb-tree if the refcount_dec_if_one(&p->refcnt) does not succeed. And this wont happen before first inet_putpeer() call for this inetpeer has been done, and ->dtime field is written exactly before the refcount_dec_and_test(&p->refcnt).
The KMSAN report was :
BUG: KMSAN: uninit-value in inet_peer_gc net/ipv4/inetpeer.c:163 [inline] BUG: KMSAN: uninit-value in inet_getpeer+0x1567/0x1e70 net/ipv4/inetpeer.c:228 CPU: 0 PID: 9494 Comm: syz-executor5 Not tainted 4.16.0+ #82 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 inet_peer_gc net/ipv4/inetpeer.c:163 [inline] inet_getpeer+0x1567/0x1e70 net/ipv4/inetpeer.c:228 inet_getpeer_v4 include/net/inetpeer.h:110 [inline] icmpv4_xrlim_allow net/ipv4/icmp.c:330 [inline] icmp_send+0x2b44/0x3050 net/ipv4/icmp.c:725 ip_options_compile+0x237c/0x29f0 net/ipv4/ip_options.c:472 ip_rcv_options net/ipv4/ip_input.c:284 [inline] ip_rcv_finish+0xda8/0x16d0 net/ipv4/ip_input.c:365 NF_HOOK include/linux/netfilter.h:288 [inline] ip_rcv+0x119d/0x16f0 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x47cf/0x4a80 net/core/dev.c:4562 __netif_receive_skb net/core/dev.c:4627 [inline] netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701 netif_receive_skb+0x230/0x240 net/core/dev.c:4725 tun_rx_batched drivers/net/tun.c:1555 [inline] tun_get_user+0x6d88/0x7580 drivers/net/tun.c:1962 tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776 do_iter_write+0x30d/0xd40 fs/read_write.c:932 vfs_writev fs/read_write.c:977 [inline] do_writev+0x3c9/0x830 fs/read_write.c:1012 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085 SyS_writev+0x56/0x80 fs/read_write.c:1082 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x455111 RSP: 002b:00007fae0365cba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 000000000000002e RCX: 0000000000455111 RDX: 0000000000000001 RSI: 00007fae0365cbf0 RDI: 00000000000000fc RBP: 0000000020000040 R08: 00000000000000fc R09: 0000000000000000 R10: 000000000000002e R11: 0000000000000293 R12: 00000000ffffffff R13: 0000000000000658 R14: 00000000006fc8e0 R15: 0000000000000000
Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmem_cache_alloc+0xaab/0xb90 mm/slub.c:2756 inet_getpeer+0xed8/0x1e70 net/ipv4/inetpeer.c:210 inet_getpeer_v4 include/net/inetpeer.h:110 [inline] ip4_frag_init+0x4d1/0x740 net/ipv4/ip_fragment.c:153 inet_frag_alloc net/ipv4/inet_fragment.c:369 [inline] inet_frag_create net/ipv4/inet_fragment.c:385 [inline] inet_frag_find+0x7da/0x1610 net/ipv4/inet_fragment.c:418 ip_find net/ipv4/ip_fragment.c:275 [inline] ip_defrag+0x448/0x67a0 net/ipv4/ip_fragment.c:676 ip_check_defrag+0x775/0xda0 net/ipv4/ip_fragment.c:724 packet_rcv_fanout+0x2a8/0x8d0 net/packet/af_packet.c:1447 deliver_skb net/core/dev.c:1897 [inline] deliver_ptype_list_skb net/core/dev.c:1912 [inline] __netif_receive_skb_core+0x314a/0x4a80 net/core/dev.c:4545 __netif_receive_skb net/core/dev.c:4627 [inline] netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701 netif_receive_skb+0x230/0x240 net/core/dev.c:4725 tun_rx_batched drivers/net/tun.c:1555 [inline] tun_get_user+0x6d88/0x7580 drivers/net/tun.c:1962 tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776 do_iter_write+0x30d/0xd40 fs/read_write.c:932 vfs_writev fs/read_write.c:977 [inline] do_writev+0x3c9/0x830 fs/read_write.c:1012 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085 SyS_writev+0x56/0x80 fs/read_write.c:1082 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ipv4/inetpeer.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/ipv4/inetpeer.c +++ b/net/ipv4/inetpeer.c @@ -210,6 +210,7 @@ struct inet_peer *inet_getpeer(struct in p = kmem_cache_alloc(peer_cachep, GFP_ATOMIC); if (p) { p->daddr = *daddr; + p->dtime = (__u32)jiffies; refcount_set(&p->refcnt, 2); atomic_set(&p->rid, 0); p->metrics[RTAX_LOCK-1] = INETPEER_METRICS_NEW;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Hocko mhocko@suse.com
commit 4eaf431f6f71bbed40a4c733ffe93a7e8cedf9d9 upstream.
syzbot has triggered a NULL ptr dereference when allocation fault injection enforces a failure and alloc_mem_cgroup_per_node_info initializes memcg->nodeinfo only half way through.
But __mem_cgroup_free still tries to free all per-node data and dereferences pn->lruvec_stat_cpu unconditioanlly even if the specific per-node data hasn't been initialized.
The bug is quite unlikely to hit because small allocations do not fail and we would need quite some numa nodes to make struct mem_cgroup_per_node large enough to cross the costly order.
Link: http://lkml.kernel.org/r/20180406100906.17790-1-mhocko@kernel.org Reported-by: syzbot+8a5de3cce7cdc70e9ebe@syzkaller.appspotmail.com Fixes: 00f3ca2c2d66 ("mm: memcontrol: per-lruvec stats infrastructure") Signed-off-by: Michal Hocko mhocko@suse.com Reviewed-by: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Johannes Weiner hannes@cmpxchg.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/memcontrol.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4187,6 +4187,9 @@ static void free_mem_cgroup_per_node_inf { struct mem_cgroup_per_node *pn = memcg->nodeinfo[node];
+ if (!pn) + return; + free_percpu(pn->lruvec_stat); kfree(pn); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Olsa jolsa@kernel.org
commit bfb3d7b8b906b66551424d7636182126e1d134c8 upstream.
If the get_callchain_buffers fails to allocate the buffer it will decrease the nr_callchain_events right away.
There's no point of checking the allocation error for nr_callchain_events > 1. Removing that check.
Signed-off-by: Jiri Olsa jolsa@kernel.org Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Andi Kleen andi@firstfloor.org Cc: H. Peter Anvin hpa@zytor.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: syzkaller-bugs@googlegroups.com Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20180415092352.12403-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/events/callchain.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
--- a/kernel/events/callchain.c +++ b/kernel/events/callchain.c @@ -131,14 +131,8 @@ int get_callchain_buffers(int event_max_ goto exit; }
- if (count > 1) { - /* If the allocation failed, give up */ - if (!callchain_cpus_entries) - err = -ENOMEM; - goto exit; - } - - err = alloc_callchain_buffers(); + if (count == 1) + err = alloc_callchain_buffers(); exit: if (err) atomic_dec(&nr_callchain_events);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit bf2acc943a45d2b2e8a9f1a5ddff6b6e43cc69d9 upstream.
syzbot is able to produce a nasty WARN_ON() in tcp_verify_left_out() with following C-repro :
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3 setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0 setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [-1], 4) = 0 bind(3, {sa_family=AF_INET, sin_port=htons(20002), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 sendto(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1242, MSG_FASTOPEN, {sa_family=AF_INET, sin_port=htons(20002), sin_addr=inet_addr("127.0.0.1")}, 16) = 1242 setsockopt(3, SOL_TCP, TCP_REPAIR_WINDOW, "\4\0\0@+\205\0\0\377\377\0\0\377\377\377\177\0\0\0\0", 20) = 0 writev(3, [{"\270", 1}], 1) = 1 setsockopt(3, SOL_TCP, TCP_REPAIR_OPTIONS, "\10\0\0\0\0\0\0\0\0\0\0\0|\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 386) = 0 writev(3, [{"\210v\r[\226\320t\231qwQ\204\264l\254\t\1\20\245\214p\350H\223\254;\\37\345\307p$"..., 3144}], 1) = 3144
The 3rd system call looks odd : setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [-1], 4) = 0
This patch makes sure bound checking is using an unsigned compare.
Fixes: ee9952831cfd ("tcp: Initial repair mode") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Pavel Emelyanov xemul@parallels.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ipv4/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2601,7 +2601,7 @@ static int do_tcp_setsockopt(struct sock case TCP_REPAIR_QUEUE: if (!tp->repair) err = -EPERM; - else if (val < TCP_QUEUES_NR) + else if ((unsigned int)val < TCP_QUEUES_NR) tp->repair_queue = val; else err = -EINVAL;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 8236b0ae31c837d2b3a2565c5f8d77f637e824cc upstream.
syzbot is reporting hung tasks at wait_on_bit(WB_shutting_down) in wb_shutdown() [1]. This seems to be because commit 5318ce7d46866e1d ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") forgot to call wake_up_bit(WB_shutting_down) after clear_bit(WB_shutting_down).
Introduce a helper function clear_and_wake_up_bit() and use it, in order to avoid similar errors in future.
[1] https://syzkaller.appspot.com/bug?id=b297474817af98d5796bc544e1bb806fc3da0e5...
Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Reported-by: syzbot syzbot+c0cf869505e03bdf1a24@syzkaller.appspotmail.com Fixes: 5318ce7d46866e1d ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") Cc: Tejun Heo tj@kernel.org Reviewed-by: Jan Kara jack@suse.cz Suggested-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/wait_bit.h | 17 +++++++++++++++++ mm/backing-dev.c | 2 +- 2 files changed, 18 insertions(+), 1 deletion(-)
--- a/include/linux/wait_bit.h +++ b/include/linux/wait_bit.h @@ -259,4 +259,21 @@ int wait_on_atomic_t(atomic_t *val, int return out_of_line_wait_on_atomic_t(val, action, mode); }
+/** + * clear_and_wake_up_bit - clear a bit and wake up anyone waiting on that bit + * + * @bit: the bit of the word being waited on + * @word: the word being waited on, a kernel virtual address + * + * You can use this helper if bitflags are manipulated atomically rather than + * non-atomically under a lock. + */ +static inline void clear_and_wake_up_bit(int bit, void *word) +{ + clear_bit_unlock(bit, word); + /* See wake_up_bit() for which memory barrier you need to use. */ + smp_mb__after_atomic(); + wake_up_bit(word, bit); +} + #endif /* _LINUX_WAIT_BIT_H */ --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -381,7 +381,7 @@ static void wb_shutdown(struct bdi_write * the barrier provided by test_and_clear_bit() above. */ smp_wmb(); - clear_bit(WB_shutting_down, &wb->state); + clear_and_wake_up_bit(WB_shutting_down, &wb->state); }
static void wb_exit(struct bdi_writeback *wb)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Kara jack@suse.cz
commit b8b784958eccbf8f51ebeee65282ca3fd59ea391 upstream.
Syzbot has reported that it can hit a NULL pointer dereference in wb_workfn() due to wb->bdi->dev being NULL. This indicates that wb_workfn() was called for an already unregistered bdi which should not happen as wb_shutdown() called from bdi_unregister() should make sure all pending writeback works are completed before bdi is unregistered. Except that wb_workfn() itself can requeue the work with:
mod_delayed_work(bdi_wq, &wb->dwork, 0);
and if this happens while wb_shutdown() is waiting in:
flush_delayed_work(&wb->dwork);
the dwork can get executed after wb_shutdown() has finished and bdi_unregister() has cleared wb->bdi->dev.
Make wb_workfn() use wakeup_wb() for requeueing the work which takes all the necessary precautions against racing with bdi unregistration.
CC: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp CC: Tejun Heo tj@kernel.org Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977 Reported-by: syzbot syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/fs-writeback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1940,7 +1940,7 @@ void wb_workfn(struct work_struct *work) }
if (!list_empty(&wb->work_list)) - mod_delayed_work(bdi_wq, &wb->dwork, 0); + wb_wakeup(wb); else if (wb_has_dirty_io(wb) && dirty_writeback_interval) wb_wakeup_delayed(wb);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Mackerras paulus@ozlabs.org
commit a8b48a4dccea77e29462e59f1dbf0d5aa1ff167c upstream.
This fixes a bug where the trap number that is returned by __kvmppc_vcore_entry gets corrupted. The effect of the corruption is that IPIs get ignored on POWER9 systems when the IPI is sent via a doorbell interrupt to a CPU which is executing in a KVM guest. The effect of the IPI being ignored is often that another CPU locks up inside smp_call_function_many() (and if that CPU is holding a spinlock, other CPUs then lock up inside raw_spin_lock()).
The trap number is currently held in register r12 for most of the assembly-language part of the guest exit path. In that path, we call kvmppc_subcore_exit_guest(), which is a C function, without restoring r12 afterwards. Depending on the kernel config and the compiler, it may modify r12 or it may not, so some config/compiler combinations see the bug and others don't.
To fix this, we arrange for the trap number to be stored on the stack from the 'guest_bypass:' label until the end of the function, then the trap number is loaded and returned in r12 as before.
Cc: stable@vger.kernel.org # v4.8+ Fixes: fd7bacbca47a ("KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt") Signed-off-by: Paul Mackerras paulus@ozlabs.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -308,7 +308,6 @@ kvm_novcpu_exit: stw r12, STACK_SLOT_TRAP(r1) bl kvmhv_commence_exit nop - lwz r12, STACK_SLOT_TRAP(r1) b kvmhv_switch_to_host
/* @@ -1136,6 +1135,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
secondary_too_late: li r12, 0 + stw r12, STACK_SLOT_TRAP(r1) cmpdi r4, 0 beq 11f stw r12, VCPU_TRAP(r4) @@ -1445,12 +1445,12 @@ mc_cont: 1: #endif /* CONFIG_KVM_XICS */
+ stw r12, STACK_SLOT_TRAP(r1) mr r3, r12 /* Increment exit count, poke other threads to exit */ bl kvmhv_commence_exit nop ld r9, HSTATE_KVM_VCPU(r13) - lwz r12, VCPU_TRAP(r9)
/* Stop others sending VCPU interrupts to this physical CPU */ li r0, -1 @@ -1816,6 +1816,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1 * POWER7/POWER8 guest -> host partition switch code. * We don't have to lock against tlbies but we do * have to coordinate the hardware threads. + * Here STACK_SLOT_TRAP(r1) contains the trap number. */ kvmhv_switch_to_host: /* Secondary threads wait for primary to do partition switch */ @@ -1868,11 +1869,11 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
/* If HMI, call kvmppc_realmode_hmi_handler() */ + lwz r12, STACK_SLOT_TRAP(r1) cmpwi r12, BOOK3S_INTERRUPT_HMI bne 27f bl kvmppc_realmode_hmi_handler nop - li r12, BOOK3S_INTERRUPT_HMI /* * At this point kvmppc_realmode_hmi_handler would have resync-ed * the TB. Hence it is not required to subtract guest timebase @@ -1950,6 +1951,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_R li r0, KVM_GUEST_MODE_NONE stb r0, HSTATE_IN_GUEST(r13)
+ lwz r12, STACK_SLOT_TRAP(r1) /* return trap # in r12 */ ld r0, SFS+PPC_LR_STKOFF(r1) addi r1, r1, SFS mtlr r0
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Laurent Vivier lvivier@redhat.com
commit 61bd0f66ff92d5ce765ff9850fd3cbfec773c560 upstream.
Since commit 8b24e69fc47e ("KVM: PPC: Book3S HV: Close race with testing for signals on guest entry"), if CONFIG_VIRT_CPU_ACCOUNTING_GEN is set, the guest time is not accounted to guest time and user time, but instead to system time.
This is because guest_enter()/guest_exit() are called while interrupts are disabled and the tick counter cannot be updated between them.
To fix that, move guest_exit() after local_irq_enable(), and as guest_enter() is called with IRQ disabled, call guest_enter_irqoff() instead.
Fixes: 8b24e69fc47e ("KVM: PPC: Book3S HV: Close race with testing for signals on guest entry") Signed-off-by: Laurent Vivier lvivier@redhat.com Reviewed-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Paul Mackerras paulus@ozlabs.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kvm/book3s_hv.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -2847,7 +2847,7 @@ static noinline void kvmppc_run_core(str */ trace_hardirqs_on();
- guest_enter(); + guest_enter_irqoff();
srcu_idx = srcu_read_lock(&vc->kvm->srcu);
@@ -2855,8 +2855,6 @@ static noinline void kvmppc_run_core(str
srcu_read_unlock(&vc->kvm->srcu, srcu_idx);
- guest_exit(); - trace_hardirqs_off(); set_irq_happened(trap);
@@ -2890,6 +2888,7 @@ static noinline void kvmppc_run_core(str kvmppc_set_host_core(pcpu);
local_irq_enable(); + guest_exit();
/* Let secondaries go back to the offline loop */ for (i = 0; i < controlled_threads; ++i) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Mackerras paulus@ozlabs.org
commit debd574f4195e205ba505b25e19b2b797f4bcd94 upstream.
The current code for initializing the VRMA (virtual real memory area) for HPT guests requires the page size of the backing memory to be one of 4kB, 64kB or 16MB. With a radix host we have the possibility that the backing memory page size can be 2MB or 1GB. In these cases, if the guest switches to HPT mode, KVM will not initialize the VRMA and the guest will fail to run.
In fact it is not necessary that the VRMA page size is the same as the backing memory page size; any VRMA page size less than or equal to the backing memory page size is acceptable. Therefore we now choose the largest page size out of the set {4k, 64k, 16M} which is not larger than the backing memory page size.
Signed-off-by: Paul Mackerras paulus@ozlabs.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kvm/book3s_hv.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3618,15 +3618,17 @@ static int kvmppc_hv_setup_htab_rma(stru goto up_out;
psize = vma_kernel_pagesize(vma); - porder = __ilog2(psize);
up_read(¤t->mm->mmap_sem);
/* We can handle 4k, 64k or 16M pages in the VRMA */ - err = -EINVAL; - if (!(psize == 0x1000 || psize == 0x10000 || - psize == 0x1000000)) - goto out_srcu; + if (psize >= 0x1000000) + psize = 0x1000000; + else if (psize >= 0x10000) + psize = 0x10000; + else + psize = 0x1000; + porder = __ilog2(psize);
senc = slb_pgsize_encoding(psize); kvm->arch.vrma_slb_v = senc | SLB_VSID_B_1T |
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Suzuki K Poulose suzuki.poulose@arm.com
commit ece1397cbc89c51914fae1aec729539cfd8bd62b upstream.
Some variants of the Arm Cortex-55 cores (r0p0, r0p1, r1p0) suffer from an erratum 1024718, which causes incorrect updates when DBM/AP bits in a page table entry is modified without a break-before-make sequence. The work around is to skip enabling the hardware DBM feature on the affected cores. The hardware Access Flag management features is not affected. There are some other cores suffering from this errata, which could be added to the midr_list to trigger the work around.
Cc: Catalin Marinas catalin.marinas@arm.com Cc: ckadabi@codeaurora.org Reviewed-by: Dave Martin dave.martin@arm.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/arm64/silicon-errata.txt | 1 arch/arm64/Kconfig | 14 +++++++++++ arch/arm64/include/asm/assembler.h | 40 +++++++++++++++++++++++++++++++++ arch/arm64/include/asm/cputype.h | 2 + arch/arm64/mm/proc.S | 5 ++++ 5 files changed, 62 insertions(+)
--- a/Documentation/arm64/silicon-errata.txt +++ b/Documentation/arm64/silicon-errata.txt @@ -55,6 +55,7 @@ stable kernels. | ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 | | ARM | Cortex-A72 | #853709 | N/A | | ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 | +| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 | | ARM | MMU-500 | #841119,#826419 | N/A | | | | | | | Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 | --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -443,6 +443,20 @@ config ARM64_ERRATUM_843419
If unsure, say Y.
+config ARM64_ERRATUM_1024718 + bool "Cortex-A55: 1024718: Update of DBM/AP bits without break before make might result in incorrect update" + default y + help + This option adds work around for Arm Cortex-A55 Erratum 1024718. + + Affected Cortex-A55 cores (r0p0, r0p1, r1p0) could cause incorrect + update of the hardware dirty bit when the DBM/AP bits are updated + without a break-before-make. The work around is to disable the usage + of hardware DBM locally on the affected cores. CPUs not affected by + erratum will continue to use the feature. + + If unsure, say Y. + config CAVIUM_ERRATUM_22375 bool "Cavium erratum 22375, 24313" default y --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -25,6 +25,7 @@
#include <asm/asm-offsets.h> #include <asm/cpufeature.h> +#include <asm/cputype.h> #include <asm/page.h> #include <asm/pgtable-hwdef.h> #include <asm/ptrace.h> @@ -495,4 +496,43 @@ alternative_endif and \phys, \pte, #(((1 << (48 - PAGE_SHIFT)) - 1) << PAGE_SHIFT) .endm
+/* + * Check the MIDR_EL1 of the current CPU for a given model and a range of + * variant/revision. See asm/cputype.h for the macros used below. + * + * model: MIDR_CPU_MODEL of CPU + * rv_min: Minimum of MIDR_CPU_VAR_REV() + * rv_max: Maximum of MIDR_CPU_VAR_REV() + * res: Result register. + * tmp1, tmp2, tmp3: Temporary registers + * + * Corrupts: res, tmp1, tmp2, tmp3 + * Returns: 0, if the CPU id doesn't match. Non-zero otherwise + */ + .macro cpu_midr_match model, rv_min, rv_max, res, tmp1, tmp2, tmp3 + mrs \res, midr_el1 + mov_q \tmp1, (MIDR_REVISION_MASK | MIDR_VARIANT_MASK) + mov_q \tmp2, MIDR_CPU_MODEL_MASK + and \tmp3, \res, \tmp2 // Extract model + and \tmp1, \res, \tmp1 // rev & variant + mov_q \tmp2, \model + cmp \tmp3, \tmp2 + cset \res, eq + cbz \res, .Ldone@ // Model matches ? + + .if (\rv_min != 0) // Skip min check if rv_min == 0 + mov_q \tmp3, \rv_min + cmp \tmp1, \tmp3 + cset \res, ge + .endif // \rv_min != 0 + /* Skip rv_max check if rv_min == rv_max && rv_min != 0 */ + .if ((\rv_min != \rv_max) || \rv_min == 0) + mov_q \tmp2, \rv_max + cmp \tmp1, \tmp2 + cset \tmp2, le + and \res, \res, \tmp2 + .endif +.Ldone@: + .endm + #endif /* __ASM_ASSEMBLER_H */ --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -78,6 +78,7 @@
#define ARM_CPU_PART_AEM_V8 0xD0F #define ARM_CPU_PART_FOUNDATION 0xD00 +#define ARM_CPU_PART_CORTEX_A55 0xD05 #define ARM_CPU_PART_CORTEX_A57 0xD07 #define ARM_CPU_PART_CORTEX_A72 0xD08 #define ARM_CPU_PART_CORTEX_A53 0xD03 @@ -98,6 +99,7 @@ #define QCOM_CPU_PART_KRYO 0x200
#define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53) +#define MIDR_CORTEX_A55 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A55) #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57) #define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72) #define MIDR_CORTEX_A73 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A73) --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -438,6 +438,11 @@ ENTRY(__cpu_setup) cbz x9, 2f cmp x9, #2 b.lt 1f +#ifdef CONFIG_ARM64_ERRATUM_1024718 + /* Disable hardware DBM on Cortex-A55 r0p0, r0p1 & r1p0 */ + cpu_midr_match MIDR_CORTEX_A55, MIDR_CPU_VAR_REV(0, 0), MIDR_CPU_VAR_REV(1, 0), x1, x2, x3, x4 + cbnz x1, 1f +#endif orr x10, x10, #TCR_HD // hardware Dirty flag update 1: orr x10, x10, #TCR_HA // hardware Access flag update 2:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jann Horn jannh@google.com
commit 0a0b98734479aa5b3c671d5190e86273372cab95 upstream.
Commit 3a4d44b61625 ("ntp: Move adjtimex related compat syscalls to native counterparts") removed the memset() in compat_get_timex(). Since then, the compat adjtimex syscall can invoke do_adjtimex() with an uninitialized ->tai.
If do_adjtimex() doesn't write to ->tai (e.g. because the arguments are invalid), compat_put_timex() then copies the uninitialized ->tai field to userspace.
Fix it by adding the memset() back.
Fixes: 3a4d44b61625 ("ntp: Move adjtimex related compat syscalls to native counterparts") Signed-off-by: Jann Horn jannh@google.com Acked-by: Kees Cook keescook@chromium.org Acked-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/compat.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/compat.c +++ b/kernel/compat.c @@ -34,6 +34,7 @@ int compat_get_timex(struct timex *txc, { struct compat_timex tx32;
+ memset(txc, 0, sizeof(struct timex)); if (copy_from_user(&tx32, utp, sizeof(struct compat_timex))) return -EFAULT;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Timur Tabi timur@codeaurora.org
commit ab3dbcf78f60f46d6a0ad63b1f4b690b7a427140 upstream.
If the main loop in linehandle_create() encounters an error, it unwinds completely by freeing all previously requested GPIO descriptors. However, if the error occurs in the beginning of the loop before that GPIO is requested, then the exit code attempts to free a null descriptor. If extrachecks is enabled, gpiod_free() triggers a WARN_ON.
Instead, keep a separate count of legitimate GPIOs so that only those are freed.
Cc: stable@vger.kernel.org Fixes: d7c51b47ac11 ("gpio: userspace ABI for reading/writing GPIO lines") Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Signed-off-by: Timur Tabi timur@codeaurora.org Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpio/gpiolib.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -443,7 +443,7 @@ static int linehandle_create(struct gpio struct gpiohandle_request handlereq; struct linehandle_state *lh; struct file *file; - int fd, i, ret; + int fd, i, count = 0, ret;
if (copy_from_user(&handlereq, ip, sizeof(handlereq))) return -EFAULT; @@ -489,6 +489,7 @@ static int linehandle_create(struct gpio if (ret) goto out_free_descs; lh->descs[i] = desc; + count = i;
if (lflags & GPIOHANDLE_REQUEST_ACTIVE_LOW) set_bit(FLAG_ACTIVE_LOW, &desc->flags); @@ -555,7 +556,7 @@ static int linehandle_create(struct gpio out_put_unused_fd: put_unused_fd(fd); out_free_descs: - for (; i >= 0; i--) + for (i = 0; i < count; i++) gpiod_free(lh->descs[i]); kfree(lh->label); out_free_lh:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Govert Overgaauw govert.overgaauw@prodrive-technologies.com
commit f241632fd087d3d9fbd5450f4d8c8604badd8348 upstream.
The unmask function disables all interrupts in a bank when unmasking an interrupt. Only disable the given interrupt.
Cc: stable@vger.kernel.org Signed-off-by: Govert Overgaauw govert.overgaauw@prodrive-technologies.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpio/gpio-aspeed.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpio/gpio-aspeed.c +++ b/drivers/gpio/gpio-aspeed.c @@ -375,7 +375,7 @@ static void aspeed_gpio_irq_set_mask(str if (set) reg |= bit; else - reg &= bit; + reg &= ~bit; iowrite32(reg, addr);
spin_unlock_irqrestore(&gpio->lock, flags);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan@kernel.org
commit 4bf01ca21e2e0e4561d1a03c48c3d740418702db upstream.
Make sure to free the rfkill device in case registration fails during probe.
Fixes: 5e7ca3937fbe ("net: rfkill: gpio: convert to resource managed allocation") Cc: stable stable@vger.kernel.org # 3.13 Cc: Heikki Krogerus heikki.krogerus@linux.intel.com Signed-off-by: Johan Hovold johan@kernel.org Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/rfkill/rfkill-gpio.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/rfkill/rfkill-gpio.c +++ b/net/rfkill/rfkill-gpio.c @@ -137,13 +137,18 @@ static int rfkill_gpio_probe(struct plat
ret = rfkill_register(rfkill->rfkill_dev); if (ret < 0) - return ret; + goto err_destroy;
platform_set_drvdata(pdev, rfkill);
dev_info(&pdev->dev, "%s device registered.\n", rfkill->name);
return 0; + +err_destroy: + rfkill_destroy(rfkill->rfkill_dev); + + return ret; }
static int rfkill_gpio_remove(struct platform_device *pdev)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit 184add2ca23ce5edcac0ab9c3b9be13f91e7b567 upstream.
Richard Jones has reported that using med_power_with_dipm on a T450s with a Sandisk SD7UB3Q256G1001 SSD (firmware version X2180501) is causing the machine to hang.
Switching the LPM to max_performance fixes this, so it seems that this Sandisk SSD does not handle LPM well.
Note in the past there have been bug-reports about the following Sandisk models not working with min_power, so we may need to extend the quirk list in the future: name - firmware Sandisk SD6SB2M512G1022I - X210400 Sandisk SD6PP4M-256G-1006 - A200906
Cc: stable@vger.kernel.org Cc: Richard W.M. Jones rjones@redhat.com Reported-and-tested-by: Richard W.M. Jones rjones@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/ata/libata-core.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -4539,6 +4539,9 @@ static const struct ata_blacklist_entry ATA_HORKAGE_ZERO_AFTER_TRIM | ATA_HORKAGE_NOLPM, },
+ /* Sandisk devices which are known to not handle LPM well */ + { "SanDisk SD7UB3Q*G1001", NULL, ATA_HORKAGE_NOLPM, }, + /* devices that don't properly handle queued TRIM commands */ { "Micron_M500_*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, },
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikulas Patocka mpatocka@redhat.com
commit fc8cec113904a47396bf0a1afc62920d66319d36 upstream.
Use kvfree instead of kfree because the array is allocated with kvmalloc.
Fixes: 7eada909bfd7a ("dm: add integrity target") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-integrity.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -2439,7 +2439,7 @@ static void dm_integrity_free_journal_sc unsigned i; for (i = 0; i < ic->journal_sections; i++) kvfree(sl[i]); - kfree(sl); + kvfree(sl); }
static struct scatterlist **dm_integrity_alloc_journal_scatterlist(struct dm_integrity_c *ic, struct page_list *pl)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit dc432c3d7f9bceb3de6f5b44fb9c657c9810ed6d upstream.
The regex match function regex_match_front() in the tracing filter logic, was fixed to test just the pattern length from testing the entire test string. That is, it went from strncmp(str, r->pattern, len) to strcmp(str, r->pattern, r->len).
The issue is that str is not guaranteed to be nul terminated, and if r->len is greater than the length of str, it can access more memory than is allocated.
The solution is to add a simple test if (len < r->len) return 0.
Cc: stable@vger.kernel.org Fixes: 285caad415f45 ("tracing/filters: Fix MATCH_FRONT_ONLY filter matching") Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/trace/trace_events_filter.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -338,6 +338,9 @@ static int regex_match_full(char *str, s
static int regex_match_front(char *str, struct regex *r, int len) { + if (len < r->len) + return 0; + if (strncmp(str, r->pattern, r->len) == 0) return 1; return 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vitaly Wool vitalywool@gmail.com
commit 6098d7e136692f9c6e23ae362c62ec822343e4d5 upstream.
Do not try to optimize in-page object layout while the page is under reclaim. This fixes lock-ups on reclaim and improves reclaim performance at the same time.
[akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20180430125800.444cae9706489f412ad12621@gmail.com Signed-off-by: Vitaly Wool vitaly.vul@sony.com Reported-by: Guenter Roeck linux@roeck-us.net Tested-by: Guenter Roeck linux@roeck-us.net Cc: Oleksiy.Avramchenko@sony.com Cc: Matthew Wilcox mawilcox@microsoft.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/z3fold.c | 42 ++++++++++++++++++++++++++++++------------ 1 file changed, 30 insertions(+), 12 deletions(-)
--- a/mm/z3fold.c +++ b/mm/z3fold.c @@ -144,7 +144,8 @@ enum z3fold_page_flags { PAGE_HEADLESS = 0, MIDDLE_CHUNK_MAPPED, NEEDS_COMPACTING, - PAGE_STALE + PAGE_STALE, + UNDER_RECLAIM };
/***************** @@ -173,6 +174,7 @@ static struct z3fold_header *init_z3fold clear_bit(MIDDLE_CHUNK_MAPPED, &page->private); clear_bit(NEEDS_COMPACTING, &page->private); clear_bit(PAGE_STALE, &page->private); + clear_bit(UNDER_RECLAIM, &page->private);
spin_lock_init(&zhdr->page_lock); kref_init(&zhdr->refcount); @@ -748,6 +750,10 @@ static void z3fold_free(struct z3fold_po atomic64_dec(&pool->pages_nr); return; } + if (test_bit(UNDER_RECLAIM, &page->private)) { + z3fold_page_unlock(zhdr); + return; + } if (test_and_set_bit(NEEDS_COMPACTING, &page->private)) { z3fold_page_unlock(zhdr); return; @@ -832,6 +838,8 @@ static int z3fold_reclaim_page(struct z3 kref_get(&zhdr->refcount); list_del_init(&zhdr->buddy); zhdr->cpu = -1; + set_bit(UNDER_RECLAIM, &page->private); + break; }
list_del_init(&page->lru); @@ -879,25 +887,35 @@ static int z3fold_reclaim_page(struct z3 goto next; } next: - spin_lock(&pool->lock); if (test_bit(PAGE_HEADLESS, &page->private)) { if (ret == 0) { - spin_unlock(&pool->lock); free_z3fold_page(page); return 0; } - } else if (kref_put(&zhdr->refcount, release_z3fold_page)) { - atomic64_dec(&pool->pages_nr); + spin_lock(&pool->lock); + list_add(&page->lru, &pool->lru); + spin_unlock(&pool->lock); + } else { + z3fold_page_lock(zhdr); + clear_bit(UNDER_RECLAIM, &page->private); + if (kref_put(&zhdr->refcount, + release_z3fold_page_locked)) { + atomic64_dec(&pool->pages_nr); + return 0; + } + /* + * if we are here, the page is still not completely + * free. Take the global pool lock then to be able + * to add it back to the lru list + */ + spin_lock(&pool->lock); + list_add(&page->lru, &pool->lru); spin_unlock(&pool->lock); - return 0; + z3fold_page_unlock(zhdr); }
- /* - * Add to the beginning of LRU. - * Pool lock has to be kept here to ensure the page has - * not already been released - */ - list_add(&page->lru, &pool->lru); + /* We started off locked to we need to lock the pool back */ + spin_lock(&pool->lock); } spin_unlock(&pool->lock); return -EAGAIN;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Tatashin pasha.tatashin@oracle.com
commit 27227c733852f71008e9bf165950bb2edaed3a90 upstream.
Memory hotplug and hotremove operate with per-block granularity. If the machine has a large amount of memory (more than 64G), the size of a memory block can span multiple sections. By mistake, during hotremove we set only the first section to offline state.
The bug was discovered because kernel selftest started to fail: https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop
After commit, "mm/memory_hotplug: optimize probe routine". But, the bug is older than this commit. In this optimization we also added a check for sections to be in a proper state during hotplug operation.
Link: http://lkml.kernel.org/r/20180427145257.15222-1-pasha.tatashin@oracle.com Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes") Signed-off-by: Pavel Tatashin pasha.tatashin@oracle.com Acked-by: Michal Hocko mhocko@suse.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Steven Sistare steven.sistare@oracle.com Cc: Daniel Jordan daniel.m.jordan@oracle.com Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/sparse.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/sparse.c +++ b/mm/sparse.c @@ -661,7 +661,7 @@ void offline_mem_sections(unsigned long unsigned long pfn;
for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) { - unsigned long section_nr = pfn_to_section_nr(start_pfn); + unsigned long section_nr = pfn_to_section_nr(pfn); struct mem_section *ms;
/*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Rientjes rientjes@google.com
commit 27ae357fa82be5ab73b2ef8d39dcb8ca2563483a upstream.
Since exit_mmap() is done without the protection of mm->mmap_sem, it is possible for the oom reaper to concurrently operate on an mm until MMF_OOM_SKIP is set.
This allows munlock_vma_pages_all() to concurrently run while the oom reaper is operating on a vma. Since munlock_vma_pages_range() depends on clearing VM_LOCKED from vm_flags before actually doing the munlock to determine if any other vmas are locking the same memory, the check for VM_LOCKED in the oom reaper is racy.
This is especially noticeable on architectures such as powerpc where clearing a huge pmd requires serialize_against_pte_lookup(). If the pmd is zapped by the oom reaper during follow_page_mask() after the check for pmd_none() is bypassed, this ends up deferencing a NULL ptl or a kernel oops.
Fix this by manually freeing all possible memory from the mm before doing the munlock and then setting MMF_OOM_SKIP. The oom reaper can not run on the mm anymore so the munlock is safe to do in exit_mmap(). It also matches the logic that the oom reaper currently uses for determining when to set MMF_OOM_SKIP itself, so there's no new risk of excessive oom killing.
This issue fixes CVE-2018-1000200.
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.corp... Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: David Rientjes rientjes@google.com Suggested-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Acked-by: Michal Hocko mhocko@suse.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: stable@vger.kernel.org [4.14+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/oom.h | 2 + mm/mmap.c | 44 ++++++++++++++++++------------ mm/oom_kill.c | 74 ++++++++++++++++++++++++++++------------------------ 3 files changed, 68 insertions(+), 52 deletions(-)
--- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -95,6 +95,8 @@ static inline int check_stable_address_s return 0; }
+void __oom_reap_task_mm(struct mm_struct *mm); + extern unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg, const nodemask_t *nodemask, unsigned long totalpages); --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2982,6 +2982,32 @@ void exit_mmap(struct mm_struct *mm) /* mm's last user has gone, and its about to be pulled down */ mmu_notifier_release(mm);
+ if (unlikely(mm_is_oom_victim(mm))) { + /* + * Manually reap the mm to free as much memory as possible. + * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard + * this mm from further consideration. Taking mm->mmap_sem for + * write after setting MMF_OOM_SKIP will guarantee that the oom + * reaper will not run on this mm again after mmap_sem is + * dropped. + * + * Nothing can be holding mm->mmap_sem here and the above call + * to mmu_notifier_release(mm) ensures mmu notifier callbacks in + * __oom_reap_task_mm() will not block. + * + * This needs to be done before calling munlock_vma_pages_all(), + * which clears VM_LOCKED, otherwise the oom reaper cannot + * reliably test it. + */ + mutex_lock(&oom_lock); + __oom_reap_task_mm(mm); + mutex_unlock(&oom_lock); + + set_bit(MMF_OOM_SKIP, &mm->flags); + down_write(&mm->mmap_sem); + up_write(&mm->mmap_sem); + } + if (mm->locked_vm) { vma = mm->mmap; while (vma) { @@ -3003,24 +3029,6 @@ void exit_mmap(struct mm_struct *mm) /* update_hiwater_rss(mm) here? but nobody should be looking */ /* Use -1 here to ensure all VMAs in the mm are unmapped */ unmap_vmas(&tlb, vma, 0, -1); - - if (unlikely(mm_is_oom_victim(mm))) { - /* - * Wait for oom_reap_task() to stop working on this - * mm. Because MMF_OOM_SKIP is already set before - * calling down_read(), oom_reap_task() will not run - * on this "mm" post up_write(). - * - * mm_is_oom_victim() cannot be set from under us - * either because victim->mm is already set to NULL - * under task_lock before calling mmput and oom_mm is - * set not NULL by the OOM killer only if victim->mm - * is found not NULL while holding the task_lock. - */ - set_bit(MMF_OOM_SKIP, &mm->flags); - down_write(&mm->mmap_sem); - up_write(&mm->mmap_sem); - } free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING); tlb_finish_mmu(&tlb, 0, -1);
--- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -456,7 +456,6 @@ bool process_shares_mm(struct task_struc return false; }
- #ifdef CONFIG_MMU /* * OOM Reaper kernel thread which tries to reap the memory used by the OOM @@ -467,16 +466,51 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reape static struct task_struct *oom_reaper_list; static DEFINE_SPINLOCK(oom_reaper_lock);
-static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) +void __oom_reap_task_mm(struct mm_struct *mm) { - struct mmu_gather tlb; struct vm_area_struct *vma; + + /* + * Tell all users of get_user/copy_from_user etc... that the content + * is no longer stable. No barriers really needed because unmapping + * should imply barriers already and the reader would hit a page fault + * if it stumbled over a reaped memory. + */ + set_bit(MMF_UNSTABLE, &mm->flags); + + for (vma = mm->mmap ; vma; vma = vma->vm_next) { + if (!can_madv_dontneed_vma(vma)) + continue; + + /* + * Only anonymous pages have a good chance to be dropped + * without additional steps which we cannot afford as we + * are OOM already. + * + * We do not even care about fs backed pages because all + * which are reclaimable have already been reclaimed and + * we do not want to block exit_mmap by keeping mm ref + * count elevated without a good reason. + */ + if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) { + struct mmu_gather tlb; + + tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end); + unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end, + NULL); + tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end); + } + } +} + +static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) +{ bool ret = true;
/* * We have to make sure to not race with the victim exit path * and cause premature new oom victim selection: - * __oom_reap_task_mm exit_mm + * oom_reap_task_mm exit_mm * mmget_not_zero * mmput * atomic_dec_and_test @@ -524,35 +558,8 @@ static bool __oom_reap_task_mm(struct ta
trace_start_task_reaping(tsk->pid);
- /* - * Tell all users of get_user/copy_from_user etc... that the content - * is no longer stable. No barriers really needed because unmapping - * should imply barriers already and the reader would hit a page fault - * if it stumbled over a reaped memory. - */ - set_bit(MMF_UNSTABLE, &mm->flags); - - for (vma = mm->mmap ; vma; vma = vma->vm_next) { - if (!can_madv_dontneed_vma(vma)) - continue; + __oom_reap_task_mm(mm);
- /* - * Only anonymous pages have a good chance to be dropped - * without additional steps which we cannot afford as we - * are OOM already. - * - * We do not even care about fs backed pages because all - * which are reclaimable have already been reclaimed and - * we do not want to block exit_mmap by keeping mm ref - * count elevated without a good reason. - */ - if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) { - tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end); - unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end, - NULL); - tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end); - } - } pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n", task_pid_nr(tsk), tsk->comm, K(get_mm_counter(mm, MM_ANONPAGES)), @@ -573,13 +580,12 @@ static void oom_reap_task(struct task_st struct mm_struct *mm = tsk->signal->oom_mm;
/* Retry the down_read_trylock(mmap_sem) a few times */ - while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm)) + while (attempts++ < MAX_OOM_REAP_RETRIES && !oom_reap_task_mm(tsk, mm)) schedule_timeout_idle(HZ/10);
if (attempts <= MAX_OOM_REAP_RETRIES) goto done;
- pr_info("oom_reaper: unable to reap pid:%d (%s)\n", task_pid_nr(tsk), tsk->comm); debug_show_all_locks();
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Dryomov idryomov@gmail.com
commit 3a15b38fd2efc1d648cb33186bf71e9138c93491 upstream.
rsize/wsize cap should be applied before ceph_osdc_new_request() is called. Otherwise, if the size is limited by the cap instead of the stripe unit, ceph_osdc_new_request() would setup an extent op that is bigger than what dio_get_pages_alloc() would pin and add to the page vector, triggering asserts in the messenger.
Cc: stable@vger.kernel.org Fixes: 95cca2b44e54 ("ceph: limit osd write size") Signed-off-by: Ilya Dryomov idryomov@gmail.com Reviewed-by: "Yan, Zheng" zyan@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ceph/file.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -873,6 +873,11 @@ ceph_direct_read_write(struct kiocb *ioc size_t start = 0; ssize_t len;
+ if (write) + size = min_t(u64, size, fsc->mount_options->wsize); + else + size = min_t(u64, size, fsc->mount_options->rsize); + vino = ceph_vino(inode); req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout, vino, pos, &size, 0, @@ -888,11 +893,6 @@ ceph_direct_read_write(struct kiocb *ioc break; }
- if (write) - size = min_t(u64, size, fsc->mount_options->wsize); - else - size = min_t(u64, size, fsc->mount_options->rsize); - len = size; pages = dio_get_pages_alloc(iter, len, &start, &num_pages); if (IS_ERR(pages)) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jimmy Assarsson extja@kvaser.com
commit 6ee00865ffe4e8c8ba4a68d26db53c7ec09bbb89 upstream.
Increase rx_dropped, if alloc_can_skb() fails, not tx_dropped.
Signed-off-by: Jimmy Assarsson extja@kvaser.com Cc: linux-stable stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/can/usb/kvaser_usb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/can/usb/kvaser_usb.c +++ b/drivers/net/can/usb/kvaser_usb.c @@ -1179,7 +1179,7 @@ static void kvaser_usb_rx_can_msg(const
skb = alloc_can_skb(priv->netdev, &cf); if (!skb) { - stats->tx_dropped++; + stats->rx_dropped++; return; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lukas Wunner lukas@wunner.de
commit 5cec9425b41dcf834c3d48776900d6acb7e96f38 upstream.
hi3110_get_berr_counter() may run concurrently to the rest of the driver but neglects to acquire the lock protecting access to the SPI device. As a result, it and the rest of the driver may clobber each other's tx and rx buffers.
We became aware of this issue because transmission of packets with "cangen -g 0 -i -x" frequently hung. It turns out that agetty executes ->do_get_berr_counter every few seconds via the following call stack:
CPU: 2 PID: 1605 Comm: agetty [<7f3f7500>] (hi3110_get_berr_counter [hi311x]) [<7f130204>] (can_fill_info [can_dev]) [<80693bc0>] (rtnl_fill_ifinfo) [<806949ec>] (rtnl_dump_ifinfo) [<806b4834>] (netlink_dump) [<806b4bc8>] (netlink_recvmsg) [<8065f180>] (sock_recvmsg) [<80660f90>] (___sys_recvmsg) [<80661e7c>] (__sys_recvmsg) [<80661ec0>] (SyS_recvmsg) [<80108b20>] (ret_fast_syscall+0x0/0x1c)
agetty listens to netlink messages in order to update the login prompt when IP addresses change (if /etc/issue contains \4 or \6 escape codes): https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?id=e3...
It's a useful feature, though it seems questionable that it causes CAN bit error statistics to be queried.
Be that as it may, if hi3110_get_berr_counter() is invoked while a frame is sent by hi3110_hw_tx(), bogus SPI transfers like the following may occur:
=> 12 00 (hi3110_get_berr_counter() wanted to transmit EC 00 to query the transmit error counter, but the first byte was overwritten by hi3110_hw_tx_frame())
=> EA 00 3E 80 01 FB (hi3110_hw_tx_frame() wanted to transmit a frame, but the first byte was overwritten by hi3110_get_berr_counter() because it wanted to query the receive error counter)
This sequence hangs the transmission because the driver believes it has sent a frame and waits for the interrupt signaling completion, but in reality the chip has never sent away the frame since the commands it received were malformed.
Fix by acquiring the SPI lock in hi3110_get_berr_counter().
I've scrutinized the entire driver for further unlocked SPI accesses but found no others.
Cc: Mathias Duckeck m.duckeck@kunbus.de Cc: Akshay Bhat akshay.bhat@timesys.com Cc: Casey Fitzpatrick casey.fitzpatrick@timesys.com Cc: Stef Walter stefw@redhat.com Cc: Karel Zak kzak@redhat.com Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Lukas Wunner lukas@wunner.de Reviewed-by: Akshay Bhat akshay.bhat@timesys.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/can/spi/hi311x.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/can/spi/hi311x.c +++ b/drivers/net/can/spi/hi311x.c @@ -427,8 +427,10 @@ static int hi3110_get_berr_counter(const struct hi3110_priv *priv = netdev_priv(net); struct spi_device *spi = priv->spi;
+ mutex_lock(&priv->hi3110_lock); bec->txerr = hi3110_read(spi, HI3110_READ_TEC); bec->rxerr = hi3110_read(spi, HI3110_READ_REC); + mutex_unlock(&priv->hi3110_lock);
return 0; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lukas Wunner lukas@wunner.de
commit 32bee8f48fa048a3198109de50e51c092507ff52 upstream.
When sending packets as fast as possible using "cangen -g 0 -i -x", the HI-3110 occasionally latches the interrupt pin high on completion of a packet, but doesn't set the TXCPLT bit in the INTF register. The INTF register contains 0x00 as if no interrupt has occurred. Even waiting for a few milliseconds after the interrupt doesn't help.
Work around this apparent erratum by instead checking the TXMTY bit in the STATF register ("TX FIFO empty"). We know that we've queued up a packet for transmission if priv->tx_len is nonzero. If the TX FIFO is empty, transmission of that packet must have completed.
Note that this is congruent with our handling of received packets, which likewise gleans from the STATF register whether a packet is waiting in the RX FIFO, instead of looking at the INTF register.
Cc: Mathias Duckeck m.duckeck@kunbus.de Cc: Akshay Bhat akshay.bhat@timesys.com Cc: Casey Fitzpatrick casey.fitzpatrick@timesys.com Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Lukas Wunner lukas@wunner.de Acked-by: Akshay Bhat akshay.bhat@timesys.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/can/spi/hi311x.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/drivers/net/can/spi/hi311x.c +++ b/drivers/net/can/spi/hi311x.c @@ -91,6 +91,7 @@ #define HI3110_STAT_BUSOFF BIT(2) #define HI3110_STAT_ERRP BIT(3) #define HI3110_STAT_ERRW BIT(4) +#define HI3110_STAT_TXMTY BIT(7)
#define HI3110_BTR0_SJW_SHIFT 6 #define HI3110_BTR0_BRP_SHIFT 0 @@ -737,10 +738,7 @@ static irqreturn_t hi3110_can_ist(int ir } }
- if (intf == 0) - break; - - if (intf & HI3110_INT_TXCPLT) { + if (priv->tx_len && statf & HI3110_STAT_TXMTY) { net->stats.tx_packets++; net->stats.tx_bytes += priv->tx_len - 1; can_led_event(net, CAN_LED_EVENT_TX); @@ -750,6 +748,9 @@ static irqreturn_t hi3110_can_ist(int ir } netif_wake_queue(net); } + + if (intf == 0) + break; } mutex_unlock(&priv->hi3110_lock); return IRQ_HANDLED;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Brezillon boris.brezillon@bootlin.com
commit 9a0e9802217291e54c4dd1fc5462f189a4be14ec upstream.
When using uni-planar formats (like RGB), the scaling parameters are stored in plane 0, not plane 1.
Fixes: fc04023fafec ("drm/vc4: Add support for YUV planes.") Cc: stable@vger.kernel.org Signed-off-by: Boris Brezillon boris.brezillon@bootlin.com Reviewed-by: Eric Anholt eric@anholt.net Link: https://patchwork.freedesktop.org/patch/msgid/20180507121303.5610-1-boris.br... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/vc4/vc4_plane.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/vc4/vc4_plane.c +++ b/drivers/gpu/drm/vc4/vc4_plane.c @@ -535,7 +535,7 @@ static int vc4_plane_mode_set(struct drm * the scl fields here. */ if (num_planes == 1) { - scl0 = vc4_get_scl_field(state, 1); + scl0 = vc4_get_scl_field(state, 0); scl1 = scl0; } else { scl0 = vc4_get_scl_field(state, 1);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lyude Paul lyude@redhat.com
commit 352672db857290ab5b0e2b6a99c414f92bee024c upstream.
Currently; we're grabbing all of the modesetting locks before adding MST connectors to fbdev. This isn't actually necessary, and causes a deadlock as well:
====================================================== WARNING: possible circular locking dependency detected 4.17.0-rc3Lyude-Test+ #1 Tainted: G O ------------------------------------------------------ kworker/1:0/18 is trying to acquire lock: 00000000c832f62d (&helper->lock){+.+.}, at: drm_fb_helper_add_one_connector+0x2a/0x60 [drm_kms_helper]
but task is already holding lock: 00000000942e28e2 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8e/0x1c0 [drm]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (crtc_ww_class_mutex){+.+.}: ww_mutex_lock+0x43/0x80 drm_modeset_lock+0x71/0x130 [drm] drm_helper_probe_single_connector_modes+0x7d/0x6b0 [drm_kms_helper] drm_setup_crtcs+0x15e/0xc90 [drm_kms_helper] __drm_fb_helper_initial_config_and_unlock+0x29/0x480 [drm_kms_helper] nouveau_fbcon_init+0x138/0x1a0 [nouveau] nouveau_drm_load+0x173/0x7e0 [nouveau] drm_dev_register+0x134/0x1c0 [drm] drm_get_pci_dev+0x8e/0x160 [drm] nouveau_drm_probe+0x1a9/0x230 [nouveau] pci_device_probe+0xcd/0x150 driver_probe_device+0x30b/0x480 __driver_attach+0xbc/0xe0 bus_for_each_dev+0x67/0x90 bus_add_driver+0x164/0x260 driver_register+0x57/0xc0 do_one_initcall+0x4d/0x323 do_init_module+0x5b/0x1f8 load_module+0x20e5/0x2ac0 __do_sys_finit_module+0xb7/0xd0 do_syscall_64+0x60/0x1b0 entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #2 (crtc_ww_class_acquire){+.+.}: drm_helper_probe_single_connector_modes+0x58/0x6b0 [drm_kms_helper] drm_setup_crtcs+0x15e/0xc90 [drm_kms_helper] __drm_fb_helper_initial_config_and_unlock+0x29/0x480 [drm_kms_helper] nouveau_fbcon_init+0x138/0x1a0 [nouveau] nouveau_drm_load+0x173/0x7e0 [nouveau] drm_dev_register+0x134/0x1c0 [drm] drm_get_pci_dev+0x8e/0x160 [drm] nouveau_drm_probe+0x1a9/0x230 [nouveau] pci_device_probe+0xcd/0x150 driver_probe_device+0x30b/0x480 __driver_attach+0xbc/0xe0 bus_for_each_dev+0x67/0x90 bus_add_driver+0x164/0x260 driver_register+0x57/0xc0 do_one_initcall+0x4d/0x323 do_init_module+0x5b/0x1f8 load_module+0x20e5/0x2ac0 __do_sys_finit_module+0xb7/0xd0 do_syscall_64+0x60/0x1b0 entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #1 (&dev->mode_config.mutex){+.+.}: drm_setup_crtcs+0x10c/0xc90 [drm_kms_helper] __drm_fb_helper_initial_config_and_unlock+0x29/0x480 [drm_kms_helper] nouveau_fbcon_init+0x138/0x1a0 [nouveau] nouveau_drm_load+0x173/0x7e0 [nouveau] drm_dev_register+0x134/0x1c0 [drm] drm_get_pci_dev+0x8e/0x160 [drm] nouveau_drm_probe+0x1a9/0x230 [nouveau] pci_device_probe+0xcd/0x150 driver_probe_device+0x30b/0x480 __driver_attach+0xbc/0xe0 bus_for_each_dev+0x67/0x90 bus_add_driver+0x164/0x260 driver_register+0x57/0xc0 do_one_initcall+0x4d/0x323 do_init_module+0x5b/0x1f8 load_module+0x20e5/0x2ac0 __do_sys_finit_module+0xb7/0xd0 do_syscall_64+0x60/0x1b0 entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (&helper->lock){+.+.}: __mutex_lock+0x70/0x9d0 drm_fb_helper_add_one_connector+0x2a/0x60 [drm_kms_helper] nv50_mstm_register_connector+0x2c/0x50 [nouveau] drm_dp_add_port+0x2f5/0x420 [drm_kms_helper] drm_dp_send_link_address+0x155/0x1e0 [drm_kms_helper] drm_dp_add_port+0x33f/0x420 [drm_kms_helper] drm_dp_send_link_address+0x155/0x1e0 [drm_kms_helper] drm_dp_check_and_send_link_address+0x87/0xd0 [drm_kms_helper] drm_dp_mst_link_probe_work+0x4d/0x80 [drm_kms_helper] process_one_work+0x20d/0x650 worker_thread+0x3a/0x390 kthread+0x11e/0x140 ret_from_fork+0x3a/0x50
other info that might help us debug this: Chain exists of: &helper->lock --> crtc_ww_class_acquire --> crtc_ww_class_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(crtc_ww_class_mutex); lock(crtc_ww_class_acquire); lock(crtc_ww_class_mutex); lock(&helper->lock);
*** DEADLOCK *** 5 locks held by kworker/1:0/18: #0: 000000004a05cd50 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x187/0x650 #1: 00000000601c11d1 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x187/0x650 #2: 00000000586ca0df (&dev->mode_config.mutex){+.+.}, at: drm_modeset_lock_all+0x3a/0x1b0 [drm] #3: 00000000d3ca0ffa (crtc_ww_class_acquire){+.+.}, at: drm_modeset_lock_all+0x44/0x1b0 [drm] #4: 00000000942e28e2 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8e/0x1c0 [drm]
stack backtrace: CPU: 1 PID: 18 Comm: kworker/1:0 Tainted: G O 4.17.0-rc3Lyude-Test+ #1 Hardware name: Gateway FX6840/FX6840, BIOS P01-A3 05/17/2010 Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper] Call Trace: dump_stack+0x85/0xcb print_circular_bug.isra.38+0x1ce/0x1db __lock_acquire+0x128f/0x1350 ? lock_acquire+0x9f/0x200 ? lock_acquire+0x9f/0x200 ? __ww_mutex_lock.constprop.13+0x8f/0x1000 lock_acquire+0x9f/0x200 ? drm_fb_helper_add_one_connector+0x2a/0x60 [drm_kms_helper] ? drm_fb_helper_add_one_connector+0x2a/0x60 [drm_kms_helper] __mutex_lock+0x70/0x9d0 ? drm_fb_helper_add_one_connector+0x2a/0x60 [drm_kms_helper] ? ww_mutex_lock+0x43/0x80 ? _cond_resched+0x15/0x30 ? ww_mutex_lock+0x43/0x80 ? drm_modeset_lock+0xb2/0x130 [drm] ? drm_fb_helper_add_one_connector+0x2a/0x60 [drm_kms_helper] drm_fb_helper_add_one_connector+0x2a/0x60 [drm_kms_helper] nv50_mstm_register_connector+0x2c/0x50 [nouveau] drm_dp_add_port+0x2f5/0x420 [drm_kms_helper] ? mark_held_locks+0x50/0x80 ? kfree+0xcf/0x2a0 ? drm_dp_check_mstb_guid+0xd6/0x120 [drm_kms_helper] ? trace_hardirqs_on_caller+0xed/0x180 ? drm_dp_check_mstb_guid+0xd6/0x120 [drm_kms_helper] drm_dp_send_link_address+0x155/0x1e0 [drm_kms_helper] drm_dp_add_port+0x33f/0x420 [drm_kms_helper] ? nouveau_connector_aux_xfer+0x7c/0xb0 [nouveau] ? find_held_lock+0x2d/0x90 ? drm_dp_dpcd_access+0xd9/0xf0 [drm_kms_helper] ? __mutex_unlock_slowpath+0x3b/0x280 ? drm_dp_dpcd_access+0xd9/0xf0 [drm_kms_helper] drm_dp_send_link_address+0x155/0x1e0 [drm_kms_helper] drm_dp_check_and_send_link_address+0x87/0xd0 [drm_kms_helper] drm_dp_mst_link_probe_work+0x4d/0x80 [drm_kms_helper] process_one_work+0x20d/0x650 worker_thread+0x3a/0x390 ? process_one_work+0x650/0x650 kthread+0x11e/0x140 ? kthread_create_worker_on_cpu+0x50/0x50 ret_from_fork+0x3a/0x50
Taking example from i915, the only time we need to hold any modesetting locks is when changing the port on the mstc, and in that case we only need to hold the connection mutex.
Signed-off-by: Lyude Paul lyude@redhat.com Cc: Karol Herbst kherbst@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Lyude Paul lyude@redhat.com Signed-off-by: Ben Skeggs bskeggs@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/nouveau/nv50_display.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/nouveau/nv50_display.c +++ b/drivers/gpu/drm/nouveau/nv50_display.c @@ -3216,10 +3216,11 @@ nv50_mstm_destroy_connector(struct drm_d
drm_connector_unregister(&mstc->connector);
- drm_modeset_lock_all(drm->dev); drm_fb_helper_remove_one_connector(&drm->fbcon->helper, &mstc->connector); + + drm_modeset_lock(&drm->dev->mode_config.connection_mutex, NULL); mstc->port = NULL; - drm_modeset_unlock_all(drm->dev); + drm_modeset_unlock(&drm->dev->mode_config.connection_mutex);
drm_connector_unreference(&mstc->connector); } @@ -3229,9 +3230,7 @@ nv50_mstm_register_connector(struct drm_ { struct nouveau_drm *drm = nouveau_drm(connector->dev);
- drm_modeset_lock_all(drm->dev); drm_fb_helper_add_one_connector(&drm->fbcon->helper, connector); - drm_modeset_unlock_all(drm->dev);
drm_connector_register(connector); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gustavo A. R. Silva gustavo@embeddedor.com
commit acf784bd0ce257fe43da7ca266f7a10b837479d2 upstream.
ioc_data.dev_num can be controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch: net/atm/lec.c:702 lec_vcc_attach() warn: potential spectre issue 'dev_lec'
Fix this by sanitizing ioc_data.dev_num before using it to index dev_lec. Also, notice that there is another instance in which array dev_lec is being indexed using ioc_data.dev_num at line 705: lec_vcc_added(netdev_priv(dev_lec[ioc_data.dev_num]),
Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/atm/lec.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/net/atm/lec.c +++ b/net/atm/lec.c @@ -41,6 +41,9 @@ static unsigned char bridge_ula_lec[] = #include <linux/module.h> #include <linux/init.h>
+/* Hardening for Spectre-v1 */ +#include <linux/nospec.h> + #include "lec.h" #include "lec_arpc.h" #include "resources.h" @@ -687,8 +690,10 @@ static int lec_vcc_attach(struct atm_vcc bytes_left = copy_from_user(&ioc_data, arg, sizeof(struct atmlec_ioc)); if (bytes_left != 0) pr_info("copy from user failed for %d bytes\n", bytes_left); - if (ioc_data.dev_num < 0 || ioc_data.dev_num >= MAX_LEC_ITF || - !dev_lec[ioc_data.dev_num]) + if (ioc_data.dev_num < 0 || ioc_data.dev_num >= MAX_LEC_ITF) + return -EINVAL; + ioc_data.dev_num = array_index_nospec(ioc_data.dev_num, MAX_LEC_ITF); + if (!dev_lec[ioc_data.dev_num]) return -EINVAL; vpriv = kmalloc(sizeof(struct lec_vcc_priv), GFP_KERNEL); if (!vpriv)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gustavo A. R. Silva gustavo@embeddedor.com
commit 2be147f7459db5bbf292e0a6f135037b55e20b39 upstream.
pool can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/atm/zatm.c:1462 zatm_ioctl() warn: potential spectre issue 'zatm_dev->pool_info' (local cap)
Fix this by sanitizing pool before using it to index zatm_dev->pool_info
Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/atm/zatm.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/atm/zatm.c +++ b/drivers/atm/zatm.c @@ -28,6 +28,7 @@ #include <asm/io.h> #include <linux/atomic.h> #include <linux/uaccess.h> +#include <linux/nospec.h>
#include "uPD98401.h" #include "uPD98402.h" @@ -1458,6 +1459,8 @@ static int zatm_ioctl(struct atm_dev *de return -EFAULT; if (pool < 0 || pool > ZATM_LAST_POOL) return -EINVAL; + pool = array_index_nospec(pool, + ZATM_LAST_POOL + 1); spin_lock_irqsave(&zatm_dev->lock, flags); info = zatm_dev->pool_info[pool]; if (cmd == ZATM_GETPOOLZ) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kai Heng Feng kai.heng.feng@canonical.com
commit 8feaec33b9868582654cd3d5355225dcb79aeca6 upstream.
USB controller ASM1042 stops working after commit de3ef1eb1cd0 (PM / core: Drop run_wake flag from struct dev_pm_info).
The device in question is not power managed by platform firmware, furthermore, it only supports PME# from D3cold: Capabilities: [78] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Before commit de3ef1eb1cd0, the device never gets runtime suspended. After that commit, the device gets runtime suspended to D3hot, which can not generate any PME#.
usb_hcd_pci_probe() unconditionally calls device_wakeup_enable(), hence device_can_wakeup() in pci_dev_run_wake() always returns true.
So pci_dev_run_wake() needs to check PME wakeup capability as its first condition.
In addition, change wakeup flag passed to pci_target_state() from false to true, because we want to find the deepest state different from D3cold that the device can still generate PME#. In this case, it's D0 for the device in question.
Fixes: de3ef1eb1cd0 (PM / core: Drop run_wake flag from struct dev_pm_info) Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Cc: 4.13+ stable@vger.kernel.org # 4.13+ Acked-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/pci.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2120,16 +2120,16 @@ bool pci_dev_run_wake(struct pci_dev *de { struct pci_bus *bus = dev->bus;
- if (device_can_wakeup(&dev->dev)) - return true; - if (!dev->pme_support) return false;
/* PME-capable in principle, but not from the target power state */ - if (!pci_pme_capable(dev, pci_target_state(dev, false))) + if (!pci_pme_capable(dev, pci_target_state(dev, true))) return false;
+ if (device_can_wakeup(&dev->dev)) + return true; + while (bus->parent) { struct pci_dev *bridge = bus->self;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit cfcadfaad7251d8b640713724b388164d75465b2 upstream.
Commit 0847684cfc5f0 (PCI / PM: Simplify device wakeup settings code) went too far and dropped the device_may_wakeup() check from pci_enable_wake() which causes wakeup to be enabled during system suspend, hibernation or shutdown for some PCI devices that are not allowed by user space to wake up the system from sleep (or power off).
As a result of this, excessive power is drawn by some of the affected systems while in sleep states or off.
Restore the device_may_wakeup() check in pci_enable_wake(), but make sure that the PCI bus type's runtime suspend callback will not call device_may_wakeup() which is about system wakeup from sleep and not about device wakeup from runtime suspend.
Fixes: 0847684cfc5f0 (PCI / PM: Simplify device wakeup settings code) Reported-by: Joseph Salisbury joseph.salisbury@canonical.com Cc: 4.13+ stable@vger.kernel.org # 4.13+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/pci.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-)
--- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1892,7 +1892,7 @@ void pci_pme_active(struct pci_dev *dev, EXPORT_SYMBOL(pci_pme_active);
/** - * pci_enable_wake - enable PCI device as wakeup event source + * __pci_enable_wake - enable PCI device as wakeup event source * @dev: PCI device affected * @state: PCI state from which device will issue wakeup events * @enable: True to enable event generation; false to disable @@ -1910,7 +1910,7 @@ EXPORT_SYMBOL(pci_pme_active); * Error code depending on the platform is returned if both the platform and * the native mechanism fail to enable the generation of wake-up events */ -int pci_enable_wake(struct pci_dev *dev, pci_power_t state, bool enable) +static int __pci_enable_wake(struct pci_dev *dev, pci_power_t state, bool enable) { int ret = 0;
@@ -1951,6 +1951,23 @@ int pci_enable_wake(struct pci_dev *dev,
return ret; } + +/** + * pci_enable_wake - change wakeup settings for a PCI device + * @pci_dev: Target device + * @state: PCI state from which device will issue wakeup events + * @enable: Whether or not to enable event generation + * + * If @enable is set, check device_may_wakeup() for the device before calling + * __pci_enable_wake() for it. + */ +int pci_enable_wake(struct pci_dev *pci_dev, pci_power_t state, bool enable) +{ + if (enable && !device_may_wakeup(&pci_dev->dev)) + return -EINVAL; + + return __pci_enable_wake(pci_dev, state, enable); +} EXPORT_SYMBOL(pci_enable_wake);
/** @@ -1963,9 +1980,9 @@ EXPORT_SYMBOL(pci_enable_wake); * should not be called twice in a row to enable wake-up due to PCI PM vs ACPI * ordering constraints. * - * This function only returns error code if the device is not capable of - * generating PME# from both D3_hot and D3_cold, and the platform is unable to - * enable wake-up power for it. + * This function only returns error code if the device is not allowed to wake + * up the system from sleep or it is not capable of generating PME# from both + * D3_hot and D3_cold and the platform is unable to enable wake-up power for it. */ int pci_wake_from_d3(struct pci_dev *dev, bool enable) { @@ -2096,7 +2113,7 @@ int pci_finish_runtime_suspend(struct pc
dev->runtime_d3cold = target_state == PCI_D3cold;
- pci_enable_wake(dev, target_state, pci_dev_run_wake(dev)); + __pci_enable_wake(dev, target_state, pci_dev_run_wake(dev));
error = pci_set_power_state(dev, target_state);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 97739501f207efe33145b918817f305b822987f8 upstream.
If the next_freq field of struct sugov_policy is set to UINT_MAX, it shouldn't be used for updating the CPU frequency (this is a special "invalid" value), but after commit b7eaf1aab9f8 (cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely) it may be passed as the new frequency to sugov_update_commit() in sugov_update_single().
Fix that by adding an extra check for the special UINT_MAX value of next_freq to sugov_update_single().
Fixes: b7eaf1aab9f8 (cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely) Reported-by: Viresh Kumar viresh.kumar@linaro.org Cc: 4.12+ stable@vger.kernel.org # 4.12+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Viresh Kumar viresh.kumar@linaro.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/sched/cpufreq_schedutil.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -282,7 +282,8 @@ static void sugov_update_single(struct u * Do not reduce the frequency if the CPU has not been idle * recently, as the reduction is likely to be premature then. */ - if (busy && next_f < sg_policy->next_freq) { + if (busy && next_f < sg_policy->next_freq && + sg_policy->next_freq != UINT_MAX) { next_f = sg_policy->next_freq;
/* Reset cached freq as next_freq has changed */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit 544a591668813583021474fa5c7ff4942244d654 upstream.
Commit f44cb4b19ed4 ("Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174") is causing bluetooth to no longer work for several people, see: https://bugzilla.redhat.com/show_bug.cgi?id=1568911
So lets revert it for now and try to find another solution for devices which need the modified quirk.
Cc: stable@vger.kernel.org Cc: Takashi Iwai tiwai@suse.de Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/bluetooth/btusb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -235,6 +235,7 @@ static const struct usb_device_id blackl { USB_DEVICE(0x0930, 0x0227), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0b05, 0x17d0), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0x0036), .driver_info = BTUSB_ATH3012 }, + { USB_DEVICE(0x0cf3, 0x3004), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0x3008), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0x311d), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0x311e), .driver_info = BTUSB_ATH3012 }, @@ -267,7 +268,6 @@ static const struct usb_device_id blackl { USB_DEVICE(0x0489, 0xe03c), .driver_info = BTUSB_ATH3012 },
/* QCA ROME chipset */ - { USB_DEVICE(0x0cf3, 0x3004), .driver_info = BTUSB_QCA_ROME }, { USB_DEVICE(0x0cf3, 0xe007), .driver_info = BTUSB_QCA_ROME }, { USB_DEVICE(0x0cf3, 0xe009), .driver_info = BTUSB_QCA_ROME }, { USB_DEVICE(0x0cf3, 0xe300), .driver_info = BTUSB_QCA_ROME },
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit 596b07a9a22656493726edf1739569102bd3e136 upstream.
The Dell XPS 13 9360 uses a QCA Rome chip which needs to be reset (and have its firmware reloaded) for bluetooth to work after suspend/resume.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1514836 Cc: stable@vger.kernel.org Cc: Garrett LeSage glesage@redhat.com Reported-and-tested-by: Garrett LeSage glesage@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/bluetooth/btusb.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -395,6 +395,13 @@ static const struct dmi_system_id btusb_ DMI_MATCH(DMI_PRODUCT_NAME, "OptiPlex 3060"), }, }, + { + /* Dell XPS 9360 (QCA ROME device 0cf3:e300) */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "XPS 13 9360"), + }, + }, {} };
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit fc54910280eb38bde923cdf0898e74687d8e6989 upstream.
Jeremy Cline correctly points out in rhbz#1514836 that a device where the QCA rome chipset needs the USB_QUIRK_RESET_RESUME quirk, may also ship with a different wifi/bt chipset in some configurations.
If that is the case then we are needlessly penalizing those other chipsets with a reset-resume quirk, typically causing 0.4W extra power use because this disables runtime-pm.
This commit moves the DMI table check to a btusb_check_needs_reset_resume() helper (so that we can easily also call it for other chipsets) and calls this new helper only for QCA_ROME chipsets for now.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1514836 Cc: stable@vger.kernel.org Cc: Jeremy Cline jcline@redhat.com Suggested-by: Jeremy Cline jcline@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/bluetooth/btusb.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -2902,6 +2902,12 @@ static int btusb_config_oob_wake(struct } #endif
+static void btusb_check_needs_reset_resume(struct usb_interface *intf) +{ + if (dmi_check_system(btusb_needs_reset_resume_table)) + interface_to_usbdev(intf)->quirks |= USB_QUIRK_RESET_RESUME; +} + static int btusb_probe(struct usb_interface *intf, const struct usb_device_id *id) { @@ -3037,9 +3043,6 @@ static int btusb_probe(struct usb_interf hdev->send = btusb_send_frame; hdev->notify = btusb_notify;
- if (dmi_check_system(btusb_needs_reset_resume_table)) - interface_to_usbdev(intf)->quirks |= USB_QUIRK_RESET_RESUME; - #ifdef CONFIG_PM err = btusb_config_oob_wake(hdev); if (err) @@ -3177,6 +3180,7 @@ static int btusb_probe(struct usb_interf hdev->setup = btusb_setup_csr;
set_bit(HCI_QUIRK_SIMULTANEOUS_DISCOVERY, &hdev->quirks); + btusb_check_needs_reset_resume(intf); }
if (id->driver_info & BTUSB_SNIFFER) {
+ Guenter
On Mon, May 14, 2018 at 08:49:05AM +0200, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
FYI, this backport is wrong. See below.
From: Hans de Goede hdegoede@redhat.com
commit fc54910280eb38bde923cdf0898e74687d8e6989 upstream.
Jeremy Cline correctly points out in rhbz#1514836 that a device where the QCA rome chipset needs the USB_QUIRK_RESET_RESUME quirk, may also ship with a different wifi/bt chipset in some configurations.
If that is the case then we are needlessly penalizing those other chipsets with a reset-resume quirk, typically causing 0.4W extra power use because this disables runtime-pm.
This commit moves the DMI table check to a btusb_check_needs_reset_resume() helper (so that we can easily also call it for other chipsets) and calls this new helper only for QCA_ROME chipsets for now.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1514836 Cc: stable@vger.kernel.org Cc: Jeremy Cline jcline@redhat.com Suggested-by: Jeremy Cline jcline@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
drivers/bluetooth/btusb.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -2902,6 +2902,12 @@ static int btusb_config_oob_wake(struct } #endif +static void btusb_check_needs_reset_resume(struct usb_interface *intf) +{
- if (dmi_check_system(btusb_needs_reset_resume_table))
interface_to_usbdev(intf)->quirks |= USB_QUIRK_RESET_RESUME;
+}
static int btusb_probe(struct usb_interface *intf, const struct usb_device_id *id) { @@ -3037,9 +3043,6 @@ static int btusb_probe(struct usb_interf hdev->send = btusb_send_frame; hdev->notify = btusb_notify;
- if (dmi_check_system(btusb_needs_reset_resume_table))
interface_to_usbdev(intf)->quirks |= USB_QUIRK_RESET_RESUME;
#ifdef CONFIG_PM err = btusb_config_oob_wake(hdev); if (err) @@ -3177,6 +3180,7 @@ static int btusb_probe(struct usb_interf hdev->setup = btusb_setup_csr; set_bit(HCI_QUIRK_SIMULTANEOUS_DISCOVERY, &hdev->quirks);
btusb_check_needs_reset_resume(intf);
The original code puts this under the BTUSB_QCA_ROME section, but this is getting placed under the BTUSB_CSR section.
Brian
} if (id->driver_info & BTUSB_SNIFFER) {
On Tue, May 15, 2018 at 10:07:02AM -0700, Brian Norris wrote:
- Guenter
On Mon, May 14, 2018 at 08:49:05AM +0200, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
FYI, this backport is wrong. See below.
From: Hans de Goede hdegoede@redhat.com
commit fc54910280eb38bde923cdf0898e74687d8e6989 upstream.
Jeremy Cline correctly points out in rhbz#1514836 that a device where the QCA rome chipset needs the USB_QUIRK_RESET_RESUME quirk, may also ship with a different wifi/bt chipset in some configurations.
If that is the case then we are needlessly penalizing those other chipsets with a reset-resume quirk, typically causing 0.4W extra power use because this disables runtime-pm.
This commit moves the DMI table check to a btusb_check_needs_reset_resume() helper (so that we can easily also call it for other chipsets) and calls this new helper only for QCA_ROME chipsets for now.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1514836 Cc: stable@vger.kernel.org Cc: Jeremy Cline jcline@redhat.com Suggested-by: Jeremy Cline jcline@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
drivers/bluetooth/btusb.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -2902,6 +2902,12 @@ static int btusb_config_oob_wake(struct } #endif +static void btusb_check_needs_reset_resume(struct usb_interface *intf) +{
- if (dmi_check_system(btusb_needs_reset_resume_table))
interface_to_usbdev(intf)->quirks |= USB_QUIRK_RESET_RESUME;
+}
static int btusb_probe(struct usb_interface *intf, const struct usb_device_id *id) { @@ -3037,9 +3043,6 @@ static int btusb_probe(struct usb_interf hdev->send = btusb_send_frame; hdev->notify = btusb_notify;
- if (dmi_check_system(btusb_needs_reset_resume_table))
interface_to_usbdev(intf)->quirks |= USB_QUIRK_RESET_RESUME;
#ifdef CONFIG_PM err = btusb_config_oob_wake(hdev); if (err) @@ -3177,6 +3180,7 @@ static int btusb_probe(struct usb_interf hdev->setup = btusb_setup_csr; set_bit(HCI_QUIRK_SIMULTANEOUS_DISCOVERY, &hdev->quirks);
btusb_check_needs_reset_resume(intf);
The original code puts this under the BTUSB_QCA_ROME section, but this is getting placed under the BTUSB_CSR section.
Turns out 4.16.y has the same problem. Maybe git got confused because mainline also has set_bit(HCI_QUIRK_SIMULTANEOUS_DISCOVERY, &hdev->quirks); in the BTUSB_QCA_ROME section. This is missing in 4.14.y and 4.16.y.
Guenter
On Tue, May 15, 2018 at 12:42:36PM -0700, Guenter Roeck wrote:
On Tue, May 15, 2018 at 10:07:02AM -0700, Brian Norris wrote:
- Guenter
On Mon, May 14, 2018 at 08:49:05AM +0200, Greg Kroah-Hartman wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
FYI, this backport is wrong. See below.
From: Hans de Goede hdegoede@redhat.com
commit fc54910280eb38bde923cdf0898e74687d8e6989 upstream.
Jeremy Cline correctly points out in rhbz#1514836 that a device where the QCA rome chipset needs the USB_QUIRK_RESET_RESUME quirk, may also ship with a different wifi/bt chipset in some configurations.
If that is the case then we are needlessly penalizing those other chipsets with a reset-resume quirk, typically causing 0.4W extra power use because this disables runtime-pm.
This commit moves the DMI table check to a btusb_check_needs_reset_resume() helper (so that we can easily also call it for other chipsets) and calls this new helper only for QCA_ROME chipsets for now.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1514836 Cc: stable@vger.kernel.org Cc: Jeremy Cline jcline@redhat.com Suggested-by: Jeremy Cline jcline@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
drivers/bluetooth/btusb.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -2902,6 +2902,12 @@ static int btusb_config_oob_wake(struct } #endif +static void btusb_check_needs_reset_resume(struct usb_interface *intf) +{
- if (dmi_check_system(btusb_needs_reset_resume_table))
interface_to_usbdev(intf)->quirks |= USB_QUIRK_RESET_RESUME;
+}
static int btusb_probe(struct usb_interface *intf, const struct usb_device_id *id) { @@ -3037,9 +3043,6 @@ static int btusb_probe(struct usb_interf hdev->send = btusb_send_frame; hdev->notify = btusb_notify;
- if (dmi_check_system(btusb_needs_reset_resume_table))
interface_to_usbdev(intf)->quirks |= USB_QUIRK_RESET_RESUME;
#ifdef CONFIG_PM err = btusb_config_oob_wake(hdev); if (err) @@ -3177,6 +3180,7 @@ static int btusb_probe(struct usb_interf hdev->setup = btusb_setup_csr; set_bit(HCI_QUIRK_SIMULTANEOUS_DISCOVERY, &hdev->quirks);
btusb_check_needs_reset_resume(intf);
The original code puts this under the BTUSB_QCA_ROME section, but this is getting placed under the BTUSB_CSR section.
Turns out 4.16.y has the same problem. Maybe git got confused because mainline also has set_bit(HCI_QUIRK_SIMULTANEOUS_DISCOVERY, &hdev->quirks); in the BTUSB_QCA_ROME section. This is missing in 4.14.y and 4.16.y.
Ah, thanks for catching this, I'll go fix this up. Looks like patch got confused by the use of set_bit() here and fuzzed it into this location instead.
greg k-h
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marek Szyprowski m.szyprowski@samsung.com
commit 88fc6f73fddf64eb507b04f7b2bd01d7291db514 upstream.
When thermal sensor is not yet enabled, reading temperature might return random value. This might even result in stopping system booting when such temperature is higher than the critical value. Fix this by checking if TMU has been actually enabled before reading the temperature.
This change fixes booting of Exynos4210-based board with TMU enabled (for example Samsung Trats board), which was broken since v4.4 kernel release.
Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com Fixes: 9e4249b40340 ("thermal: exynos: Fix first temperature read after registering sensor") CC: stable@vger.kernel.org # v4.6+ Signed-off-by: Bartlomiej Zolnierkiewicz b.zolnierkie@samsung.com Signed-off-by: Eduardo Valentin edubezval@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/thermal/samsung/exynos_tmu.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/thermal/samsung/exynos_tmu.c +++ b/drivers/thermal/samsung/exynos_tmu.c @@ -185,6 +185,7 @@ * @regulator: pointer to the TMU regulator structure. * @reg_conf: pointer to structure to register with core thermal. * @ntrip: number of supported trip points. + * @enabled: current status of TMU device * @tmu_initialize: SoC specific TMU initialization method * @tmu_control: SoC specific TMU control method * @tmu_read: SoC specific TMU temperature read method @@ -205,6 +206,7 @@ struct exynos_tmu_data { struct regulator *regulator; struct thermal_zone_device *tzd; unsigned int ntrip; + bool enabled;
int (*tmu_initialize)(struct platform_device *pdev); void (*tmu_control)(struct platform_device *pdev, bool on); @@ -398,6 +400,7 @@ static void exynos_tmu_control(struct pl mutex_lock(&data->lock); clk_enable(data->clk); data->tmu_control(pdev, on); + data->enabled = on; clk_disable(data->clk); mutex_unlock(&data->lock); } @@ -890,7 +893,7 @@ static int exynos_get_temp(void *p, int { struct exynos_tmu_data *data = p;
- if (!data || !data->tmu_read) + if (!data || !data->tmu_read || !data->enabled) return -EINVAL;
mutex_lock(&data->lock);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marek Szyprowski m.szyprowski@samsung.com
commit c8da6cdef57b459ac0fd5d9d348f8460a575ae90 upstream.
tmu_read() in case of Exynos4210 might return error for out of bound values. Current code ignores such value, what leads to reporting critical temperature value. Add proper error code propagation to exynos_get_temp() function.
Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com CC: stable@vger.kernel.org # v4.6+ Signed-off-by: Bartlomiej Zolnierkiewicz b.zolnierkie@samsung.com Signed-off-by: Eduardo Valentin edubezval@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/thermal/samsung/exynos_tmu.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/drivers/thermal/samsung/exynos_tmu.c +++ b/drivers/thermal/samsung/exynos_tmu.c @@ -892,6 +892,7 @@ static void exynos7_tmu_control(struct p static int exynos_get_temp(void *p, int *temp) { struct exynos_tmu_data *data = p; + int value, ret = 0;
if (!data || !data->tmu_read || !data->enabled) return -EINVAL; @@ -899,12 +900,16 @@ static int exynos_get_temp(void *p, int mutex_lock(&data->lock); clk_enable(data->clk);
- *temp = code_to_temp(data, data->tmu_read(data)) * MCELSIUS; + value = data->tmu_read(data); + if (value < 0) + ret = value; + else + *temp = code_to_temp(data, value) * MCELSIUS;
clk_disable(data->clk); mutex_unlock(&data->lock);
- return 0; + return ret; }
#ifdef CONFIG_THERMAL_EMULATION
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
commit 9abd68ef454c824bfd18629033367b4382b5f390 upstream.
Some P3100 drives have a bug where they think WRRU (weighted round robin) is always enabled, even though the host doesn't set it. Since they think it's enabled, they also look at the submission queue creation priority. We used to set that to MEDIUM by default, but that was removed in commit 81c1cd98351b. This causes various issues on that drive. Add a quirk to still set MEDIUM priority for that controller.
Fixes: 81c1cd98351b ("nvme/pci: Don't set reserved SQ create flags") Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Keith Busch keith.busch@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/nvme/host/nvme.h | 5 +++++ drivers/nvme/host/pci.c | 12 +++++++++++- 2 files changed, 16 insertions(+), 1 deletion(-)
--- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -80,6 +80,11 @@ enum nvme_quirks { * Supports the LighNVM command set if indicated in vs[1]. */ NVME_QUIRK_LIGHTNVM = (1 << 6), + + /* + * Set MEDIUM priority on SQ creation + */ + NVME_QUIRK_MEDIUM_PRIO_SQ = (1 << 7), };
/* --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -947,10 +947,19 @@ static int adapter_alloc_cq(struct nvme_ static int adapter_alloc_sq(struct nvme_dev *dev, u16 qid, struct nvme_queue *nvmeq) { + struct nvme_ctrl *ctrl = &dev->ctrl; struct nvme_command c; int flags = NVME_QUEUE_PHYS_CONTIG;
/* + * Some drives have a bug that auto-enables WRRU if MEDIUM isn't + * set. Since URGENT priority is zeroes, it makes all queues + * URGENT. + */ + if (ctrl->quirks & NVME_QUIRK_MEDIUM_PRIO_SQ) + flags |= NVME_SQ_PRIO_MEDIUM; + + /* * Note: we (ab)use the fact the the prp fields survive if no data * is attached to the request. */ @@ -2523,7 +2532,8 @@ static const struct pci_device_id nvme_i .driver_data = NVME_QUIRK_STRIPE_SIZE | NVME_QUIRK_DEALLOCATE_ZEROES, }, { PCI_VDEVICE(INTEL, 0xf1a5), /* Intel 600P/P3100 */ - .driver_data = NVME_QUIRK_NO_DEEPEST_PS }, + .driver_data = NVME_QUIRK_NO_DEEPEST_PS | + NVME_QUIRK_MEDIUM_PRIO_SQ }, { PCI_VDEVICE(INTEL, 0x5845), /* Qemu emulated controller */ .driver_data = NVME_QUIRK_IDENTIFY_CNS, }, { PCI_DEVICE(0x1c58, 0x0003), /* HGST adapter */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steve French smfrench@gmail.com
commit 6e70c267e68d77679534dcf4aaf84e66f2cf1425 upstream.
As with NFS, which ignores sync on directory handles, fsync on a directory handle is a noop for CIFS/SMB3. Do not return an error on it. It breaks some database apps otherwise.
Signed-off-by: Steve French smfrench@gmail.com CC: Stable stable@vger.kernel.org Reviewed-by: Ronnie Sahlberg lsahlber@redhat.com Reviewed-by: Pavel Shilovsky pshilov@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/cifsfs.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -1045,6 +1045,18 @@ out: return rc; }
+/* + * Directory operations under CIFS/SMB2/SMB3 are synchronous, so fsync() + * is a dummy operation. + */ +static int cifs_dir_fsync(struct file *file, loff_t start, loff_t end, int datasync) +{ + cifs_dbg(FYI, "Sync directory - name: %pD datasync: 0x%x\n", + file, datasync); + + return 0; +} + static ssize_t cifs_copy_file_range(struct file *src_file, loff_t off, struct file *dst_file, loff_t destoff, size_t len, unsigned int flags) @@ -1173,6 +1185,7 @@ const struct file_operations cifs_dir_op .copy_file_range = cifs_copy_file_range, .clone_file_range = cifs_clone_file_range, .llseek = generic_file_llseek, + .fsync = cifs_dir_fsync, };
static void
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
commit 354d7793070611b4df5a79fbb0f12752d0ed0cc5 upstream.
kernel/sched/autogroup.c:230 proc_sched_autogroup_set_nice() warn: potential spectre issue 'sched_prio_to_weight'
Userspace controls @nice, sanitize the array index.
Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/sched/autogroup.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/kernel/sched/autogroup.c +++ b/kernel/sched/autogroup.c @@ -7,6 +7,7 @@ #include <linux/utsname.h> #include <linux/security.h> #include <linux/export.h> +#include <linux/nospec.h>
unsigned int __read_mostly sysctl_sched_autogroup_enabled = 1; static struct autogroup autogroup_default; @@ -213,7 +214,7 @@ int proc_sched_autogroup_set_nice(struct static unsigned long next = INITIAL_JIFFIES; struct autogroup *ag; unsigned long shares; - int err; + int err, idx;
if (nice < MIN_NICE || nice > MAX_NICE) return -EINVAL; @@ -231,7 +232,9 @@ int proc_sched_autogroup_set_nice(struct
next = HZ / 10 + jiffies; ag = autogroup_task_get(p); - shares = scale_load(sched_prio_to_weight[nice + 20]); + + idx = array_index_nospec(nice + 20, 40); + shares = scale_load(sched_prio_to_weight[idx]);
down_write(&ag->lock); err = sched_group_set_shares(ag->tg, shares);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
commit ef9ee4ad38445a30909c48998624861716f2a994 upstream.
arch/x86/events/core.c:319 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_event_ids[cache_type]' (local cap) arch/x86/events/core.c:319 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_event_ids' (local cap) arch/x86/events/core.c:328 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_extra_regs[cache_type]' (local cap) arch/x86/events/core.c:328 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_extra_regs' (local cap)
Userspace controls @config which contains 3 (byte) fields used for a 3 dimensional array deref.
Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/core.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -304,17 +304,20 @@ set_ext_hw_attr(struct hw_perf_event *hw
config = attr->config;
- cache_type = (config >> 0) & 0xff; + cache_type = (config >> 0) & 0xff; if (cache_type >= PERF_COUNT_HW_CACHE_MAX) return -EINVAL; + cache_type = array_index_nospec(cache_type, PERF_COUNT_HW_CACHE_MAX);
cache_op = (config >> 8) & 0xff; if (cache_op >= PERF_COUNT_HW_CACHE_OP_MAX) return -EINVAL; + cache_op = array_index_nospec(cache_op, PERF_COUNT_HW_CACHE_OP_MAX);
cache_result = (config >> 16) & 0xff; if (cache_result >= PERF_COUNT_HW_CACHE_RESULT_MAX) return -EINVAL; + cache_result = array_index_nospec(cache_result, PERF_COUNT_HW_CACHE_RESULT_MAX);
val = hw_cache_event_ids[cache_type][cache_op][cache_result];
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
commit a5f81290ce475489fa2551c01a07470c1a4c932e upstream.
arch/x86/events/intel/cstate.c:307 cstate_pmu_event_init() warn: potential spectre issue 'pkg_msr' (local cap)
Userspace controls @attr, sanitize cfg (attr->config) before using it to index an array.
Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/intel/cstate.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/x86/events/intel/cstate.c +++ b/arch/x86/events/intel/cstate.c @@ -91,6 +91,7 @@ #include <linux/module.h> #include <linux/slab.h> #include <linux/perf_event.h> +#include <linux/nospec.h> #include <asm/cpu_device_id.h> #include <asm/intel-family.h> #include "../perf_event.h" @@ -301,6 +302,7 @@ static int cstate_pmu_event_init(struct } else if (event->pmu == &cstate_pkg_pmu) { if (cfg >= PERF_CSTATE_PKG_EVENT_MAX) return -EINVAL; + cfg = array_index_nospec((unsigned long)cfg, PERF_CSTATE_PKG_EVENT_MAX); if (!pkg_msr[cfg].attr) return -EINVAL; event->hw.event_base = pkg_msr[cfg].msr;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
commit 06ce6e9b6d6c09d4129c6e24a1314a395d816c10 upstream.
arch/x86/events/msr.c:178 msr_event_init() warn: potential spectre issue 'msr' (local cap)
Userspace controls @attr, sanitize cfg (attr->config) before using it to index an array.
Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/msr.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/arch/x86/events/msr.c +++ b/arch/x86/events/msr.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #include <linux/perf_event.h> +#include <linux/nospec.h> #include <asm/intel-family.h>
enum perf_msr_id { @@ -145,9 +146,6 @@ static int msr_event_init(struct perf_ev if (event->attr.type != event->pmu->type) return -ENOENT;
- if (cfg >= PERF_MSR_EVENT_MAX) - return -EINVAL; - /* unsupported modes and filters */ if (event->attr.exclude_user || event->attr.exclude_kernel || @@ -158,6 +156,11 @@ static int msr_event_init(struct perf_ev event->attr.sample_period) /* no sampling */ return -EINVAL;
+ if (cfg >= PERF_MSR_EVENT_MAX) + return -EINVAL; + + cfg = array_index_nospec((unsigned long)cfg, PERF_MSR_EVENT_MAX); + if (!msr[cfg].attr) return -EINVAL;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
commit 4411ec1d1993e8dbff2898390e3fed280d88e446 upstream.
kernel/events/ring_buffer.c:871 perf_mmap_to_page() warn: potential spectre issue 'rb->aux_pages'
Userspace controls @pgoff through the fault address. Sanitize the array index before doing the array dereference.
Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/events/ring_buffer.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/kernel/events/ring_buffer.c +++ b/kernel/events/ring_buffer.c @@ -14,6 +14,7 @@ #include <linux/slab.h> #include <linux/circ_buf.h> #include <linux/poll.h> +#include <linux/nospec.h>
#include "internal.h"
@@ -863,8 +864,10 @@ perf_mmap_to_page(struct ring_buffer *rb return NULL;
/* AUX space */ - if (pgoff >= rb->aux_pgoff) - return virt_to_page(rb->aux_pages[pgoff - rb->aux_pgoff]); + if (pgoff >= rb->aux_pgoff) { + int aux_pgoff = array_index_nospec(pgoff - rb->aux_pgoff, rb->aux_nr_pages); + return virt_to_page(rb->aux_pages[aux_pgoff]); + } }
return __perf_mmap_to_page(rb, pgoff);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
commit 46b1b577229a091b137831becaa0fae8690ee15a upstream.
arch/x86/events/intel/cstate.c:307 cstate_pmu_event_init() warn: potential spectre issue 'pkg_msr' (local cap) arch/x86/events/intel/core.c:337 intel_pmu_event_map() warn: potential spectre issue 'intel_perfmon_event_map' arch/x86/events/intel/knc.c:122 knc_pmu_event_map() warn: potential spectre issue 'knc_perfmon_event_map' arch/x86/events/intel/p4.c:722 p4_pmu_event_map() warn: potential spectre issue 'p4_general_events' arch/x86/events/intel/p6.c:116 p6_pmu_event_map() warn: potential spectre issue 'p6_perfmon_event_map' arch/x86/events/amd/core.c:132 amd_pmu_event_map() warn: potential spectre issue 'amd_perfmon_event_map'
Userspace controls @attr, sanitize @attr->config before passing it on to x86_pmu::event_map().
Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/core.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -27,6 +27,7 @@ #include <linux/cpu.h> #include <linux/bitops.h> #include <linux/device.h> +#include <linux/nospec.h>
#include <asm/apic.h> #include <asm/stacktrace.h> @@ -424,6 +425,8 @@ int x86_setup_perfctr(struct perf_event if (attr->config >= x86_pmu.max_events) return -EINVAL;
+ attr->config = array_index_nospec((unsigned long)attr->config, x86_pmu.max_events); + /* * The generic map: */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Mackerras paulus@ozlabs.org
commit c3856aeb29402e94ad9b3879030165cc6a4fdc56 upstream.
This fixes several bugs in the radix page fault handler relating to the way large pages in the memory backing the guest were handled. First, the check for large pages only checked for explicit huge pages and missed transparent huge pages. Then the check that the addresses (host virtual vs. guest physical) had appropriate alignment was wrong, meaning that the code never put a large page in the partition scoped radix tree; it was always demoted to a small page.
Fixing this exposed bugs in kvmppc_create_pte(). We were never invalidating a 2MB PTE, which meant that if a page was initially faulted in without write permission and the guest then attempted to store to it, we would never update the PTE to have write permission. If we find a valid 2MB PTE in the PMD, we need to clear it and do a TLB invalidation before installing either the new 2MB PTE or a pointer to a page table page.
This also corrects an assumption that get_user_pages_fast would set the _PAGE_DIRTY bit if we are writing, which is not true. Instead we mark the page dirty explicitly with set_page_dirty_lock(). This also means we don't need the dirty bit set on the host PTE when providing write access on a read fault.
[paulus@ozlabs.org - use mark_pages_dirty instead of kvmppc_update_dirty_map]
Signed-off-by: Paul Mackerras paulus@ozlabs.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kvm/book3s_64_mmu_radix.c | 72 +++++++++++++++++++++------------ 1 file changed, 46 insertions(+), 26 deletions(-)
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -19,6 +19,9 @@ #include <asm/pgalloc.h> #include <asm/pte-walk.h>
+static void mark_pages_dirty(struct kvm *kvm, struct kvm_memory_slot *memslot, + unsigned long gfn, unsigned int order); + /* * Supported radix tree geometry. * Like p9, we support either 5 or 9 bits at the first (lowest) level, @@ -195,6 +198,12 @@ static void kvmppc_pte_free(pte_t *ptep) kmem_cache_free(kvm_pte_cache, ptep); }
+/* Like pmd_huge() and pmd_large(), but works regardless of config options */ +static inline int pmd_is_leaf(pmd_t pmd) +{ + return !!(pmd_val(pmd) & _PAGE_PTE); +} + static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa, unsigned int level, unsigned long mmu_seq) { @@ -219,7 +228,7 @@ static int kvmppc_create_pte(struct kvm else new_pmd = pmd_alloc_one(kvm->mm, gpa);
- if (level == 0 && !(pmd && pmd_present(*pmd))) + if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd))) new_ptep = kvmppc_pte_alloc();
/* Check if we might have been invalidated; let the guest retry if so */ @@ -244,12 +253,30 @@ static int kvmppc_create_pte(struct kvm new_pmd = NULL; } pmd = pmd_offset(pud, gpa); - if (pmd_large(*pmd)) { - /* Someone else has instantiated a large page here; retry */ - ret = -EAGAIN; - goto out_unlock; - } - if (level == 1 && !pmd_none(*pmd)) { + if (pmd_is_leaf(*pmd)) { + unsigned long lgpa = gpa & PMD_MASK; + + /* + * If we raced with another CPU which has just put + * a 2MB pte in after we saw a pte page, try again. + */ + if (level == 0 && !new_ptep) { + ret = -EAGAIN; + goto out_unlock; + } + /* Valid 2MB page here already, remove it */ + old = kvmppc_radix_update_pte(kvm, pmdp_ptep(pmd), + ~0UL, 0, lgpa, PMD_SHIFT); + kvmppc_radix_tlbie_page(kvm, lgpa, PMD_SHIFT); + if (old & _PAGE_DIRTY) { + unsigned long gfn = lgpa >> PAGE_SHIFT; + struct kvm_memory_slot *memslot; + memslot = gfn_to_memslot(kvm, gfn); + if (memslot) + mark_pages_dirty(kvm, memslot, gfn, + PMD_SHIFT - PAGE_SHIFT); + } + } else if (level == 1 && !pmd_none(*pmd)) { /* * There's a page table page here, but we wanted * to install a large page. Tell the caller and let @@ -412,28 +439,24 @@ int kvmppc_book3s_radix_page_fault(struc } else { page = pages[0]; pfn = page_to_pfn(page); - if (PageHuge(page)) { - page = compound_head(page); - pte_size <<= compound_order(page); + if (PageCompound(page)) { + pte_size <<= compound_order(compound_head(page)); /* See if we can insert a 2MB large-page PTE here */ if (pte_size >= PMD_SIZE && - (gpa & PMD_MASK & PAGE_MASK) == - (hva & PMD_MASK & PAGE_MASK)) { + (gpa & (PMD_SIZE - PAGE_SIZE)) == + (hva & (PMD_SIZE - PAGE_SIZE))) { level = 1; pfn &= ~((PMD_SIZE >> PAGE_SHIFT) - 1); } } /* See if we can provide write access */ if (writing) { - /* - * We assume gup_fast has set dirty on the host PTE. - */ pgflags |= _PAGE_WRITE; } else { local_irq_save(flags); ptep = find_current_mm_pte(current->mm->pgd, hva, NULL, NULL); - if (ptep && pte_write(*ptep) && pte_dirty(*ptep)) + if (ptep && pte_write(*ptep)) pgflags |= _PAGE_WRITE; local_irq_restore(flags); } @@ -459,18 +482,15 @@ int kvmppc_book3s_radix_page_fault(struc pte = pfn_pte(pfn, __pgprot(pgflags)); ret = kvmppc_create_pte(kvm, pte, gpa, level, mmu_seq); } - if (ret == 0 || ret == -EAGAIN) - ret = RESUME_GUEST;
if (page) { - /* - * We drop pages[0] here, not page because page might - * have been set to the head page of a compound, but - * we have to drop the reference on the correct tail - * page to match the get inside gup() - */ - put_page(pages[0]); + if (!ret && (pgflags & _PAGE_WRITE)) + set_page_dirty_lock(page); + put_page(page); } + + if (ret == 0 || ret == -EAGAIN) + ret = RESUME_GUEST; return ret; }
@@ -676,7 +696,7 @@ void kvmppc_free_radix(struct kvm *kvm) continue; pmd = pmd_offset(pud, 0); for (im = 0; im < PTRS_PER_PMD; ++im, ++pmd) { - if (pmd_huge(*pmd)) { + if (pmd_is_leaf(*pmd)) { pmd_clear(pmd); continue; }
On Mon, May 14, 2018 at 08:48:16AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.41 release. There are 62 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed May 16 06:47:52 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.41-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Merged, compiled, and installed on to my Raspberry Pi 3.
No initial issues noticed in general usage or dmesg.
Thanks! Nathan
On Mon, May 14, 2018 at 08:48:16AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.41 release. There are 62 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed May 16 06:47:52 UTC 2018. Anything received after that time might be too late.
Build results: total: 146 pass: 146 fail: 0 Qemu test results: total: 141 pass: 141 fail: 0
Details are available at http://kerneltests.org/builders/.
Guenter
On 05/14/2018 12:48 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.41 release. There are 62 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed May 16 06:47:52 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.41-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On 14 May 2018 at 12:18, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.14.41 release. There are 62 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed May 16 06:47:52 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.41-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary ------------------------------------------------------------------------
kernel: 4.14.41-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: 13b3d94ce25df5cc037e4d28af3ef61933281cc5 git describe: v4.14.40-63-g13b3d94ce25d Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.40-63...
No regressions (compared to build v4.14.40-22-g70a593c357b7)
Boards, architectures and test suites: -------------------------------------
dragonboard-410c * boot - pass: 20, * kselftest - pass: 41, skip: 27 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17 * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 57, skip: 6 * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 21, skip: 1 * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 14, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1016, skip: 134 * ltp-timers-tests - pass: 13,
hi6220-hikey - arm64 * boot - pass: 20, * kselftest - pass: 44, skip: 23 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17 * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 57, skip: 6 * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 21, skip: 1 * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 10, skip: 4 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1015, skip: 135 * ltp-timers-tests - pass: 13,
juno-r2 - arm64 * boot - pass: 27, * kselftest - pass: 43, skip: 25 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17 * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 57, skip: 6 * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 12, * ltp-sched-tests - pass: 10, skip: 4 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 2032, skip: 268 * ltp-timers-tests - pass: 13,
qemu_arm64 * boot - pass: 21, fail: 1, * kselftest - pass: 41, skip: 29 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17 * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 57, skip: 6 * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 8, skip: 6 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 992, fail: 2, skip: 156 * ltp-timers-tests - pass: 13,
qemu_x86_64 * boot - pass: 20, * kselftest - pass: 50, skip: 30 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17 * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 57, skip: 6 * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 13, skip: 1 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 998, skip: 152 * ltp-timers-tests - pass: 13,
x15 - arm * boot - pass: 20, * kselftest - pass: 37, fail: 2, skip: 26 * libhugetlbfs - pass: 87, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 63, skip: 18 * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 58, skip: 5 * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 20, skip: 2 * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 13, skip: 1 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1075, skip: 75 * ltp-timers-tests - pass: 13,
x86_64 * boot - pass: 22, * kselftest - pass: 54, skip: 24 * kselftest-vsyscall-mode-native - pass: 52, fail: 1, skip: 24 * kselftest-vsyscall-mode-none - pass: 54, skip: 22 * libhugetlbfs - pass: 90, skip: 1 * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 63, skip: 17 * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 58, skip: 5 * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 9, skip: 5 * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1034, skip: 116 * ltp-timers-tests - pass: 13,
linux-stable-mirror@lists.linaro.org