This is the start of the stable review cycle for the 4.9.114 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 22 12:13:47 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.114-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.9.114-rc1
Tejun Heo tj@kernel.org string: drop __must_check from strscpy() and restore strscpy() usages in cgroup
Marc Zyngier marc.zyngier@arm.com arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID
Marc Zyngier marc.zyngier@arm.com arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests
Marc Zyngier marc.zyngier@arm.com arm64: KVM: Add ARCH_WORKAROUND_2 support for guests
Marc Zyngier marc.zyngier@arm.com arm64: KVM: Add HYP per-cpu accessors
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Add prctl interface for per-thread mitigation
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Introduce thread flag to control userspace mitigation
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Restore mitigation status on CPU resume
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation
Marc Zyngier marc.zyngier@arm.com arm64: ssbd: Add global mitigation state accessor
Marc Zyngier marc.zyngier@arm.com arm64: Add 'ssbd' command-line option
Marc Zyngier marc.zyngier@arm.com arm64: Add ARCH_WORKAROUND_2 probing
Marc Zyngier marc.zyngier@arm.com arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2
Marc Zyngier marc.zyngier@arm.com arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
Marc Zyngier marc.zyngier@arm.com arm/arm64: smccc: Add SMCCC-specific return codes
Christoffer Dall christoffer.dall@linaro.org KVM: arm64: Avoid storing the vcpu pointer on the stack
Marc Zyngier marc.zyngier@arm.com KVM: arm/arm64: Do not use kern_hyp_va() with kvm_vgic_global_state
Marc Zyngier marc.zyngier@arm.com arm64: alternatives: Add dynamic patching feature
James Morse james.morse@arm.com KVM: arm64: Stop save/restoring host tpidr_el1 on VHE
James Morse james.morse@arm.com arm64: alternatives: use tpidr_el2 on VHE hosts
James Morse james.morse@arm.com KVM: arm64: Change hyp_panic()s dependency on tpidr_el2
James Morse james.morse@arm.com KVM: arm/arm64: Convert kvm_host_cpu_state to a static per-cpu allocation
James Morse james.morse@arm.com KVM: arm64: Store vcpu on the stack during __guest_enter()
Mark Rutland mark.rutland@arm.com arm64: assembler: introduce ldr_this_cpu
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
Santosh Shilimkar santosh.shilimkar@oracle.com rds: avoid unenecessary cong_update in loop transport
Florian Westphal fw@strlen.de netfilter: ipv6: nf_defrag: drop skb dst before queueing
Eric Biggers ebiggers@google.com KEYS: DNS: fix parsing multiple options
Eric Biggers ebiggers@google.com reiserfs: fix buffer overflow with long warning messages
Florian Westphal fw@strlen.de netfilter: ebtables: reject non-bridge targets
Stefan Wahren stefan.wahren@i2se.com net: lan78xx: Fix race in tx pending skb size calculation
Ping-Ke Shih pkshih@realtek.com rtlwifi: rtl8821ae: fix firmware is not ready to run
Gustavo A. R. Silva gustavo@embeddedor.com net: cxgb3_main: fix potential Spectre v1
Alex Vesker valex@mellanox.com net/mlx5: Fix command interface race in polling mode
Eric Dumazet edumazet@google.com net/packet: fix use-after-free
Jason Wang jasowang@redhat.com vhost_net: validate sock before trying to put its fd
Ilpo Järvinen ilpo.jarvinen@helsinki.fi tcp: prevent bogus FRTO undos with non-SACK flows
Yuchung Cheng ycheng@google.com tcp: fix Fast Open key endianness
Jiri Slaby jslaby@suse.cz r8152: napi hangup fix after disconnect
Aleksander Morgado aleksander@aleksander.es qmi_wwan: add support for the Dell Wireless 5821e module
Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com qed: Limit msix vectors in kdump kernel to the minimum required count.
Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com qed: Fix use of incorrect size in memcpy call.
Eric Dumazet edumazet@google.com net: sungem: fix rx checksum support
Konstantin Khlebnikov khlebnikov@yandex-team.ru net_sched: blackhole: tell upper qdisc about dropped packets
Shay Agroskin shayag@mellanox.com net/mlx5: Fix wrong size allocation for QoS ETC TC regitster
Alex Vesker valex@mellanox.com net/mlx5: Fix incorrect raw command length parsing
Eric Dumazet edumazet@google.com net: dccp: switch rx_tstamp_last_feedback to monotonic clock
Eric Dumazet edumazet@google.com net: dccp: avoid crash in ccid3_hc_rx_send_feedback()
Xin Long lucien.xin@gmail.com ipvlan: fix IFLA_MTU ignored on NEWLINK
Gustavo A. R. Silva gustavo@embeddedor.com atm: zatm: Fix potential Spectre v1
Christian Lamparter chunkeey@googlemail.com crypto: crypto4xx - fix crypto4xx_build_pdr, crypto4xx_build_sdr leak
Christian Lamparter chunkeey@googlemail.com crypto: crypto4xx - remove bad list_del
Jonas Gorski jonas.gorski@gmail.com bcm63xx_enet: do not write to random DMA channel on BCM6345
Jonas Gorski jonas.gorski@gmail.com bcm63xx_enet: correct clock usage
Jonas Gorski jonas.gorski@gmail.com spi/bcm63xx: fix typo in bcm63xx_spi_max_length breaking compilation
Jonas Gorski jonas.gorski@gmail.com spi/bcm63xx: make spi subsystem aware of message size limits
Heiner Kallweit hkallweit1@gmail.com mtd: m25p80: consider max message size in m25p80_read
alex chen alex.chen@huawei.com ocfs2: ip_alloc_sem should be taken in ocfs2_get_block()
alex chen alex.chen@huawei.com ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent
Nick Desaulniers ndesaulniers@google.com x86/paravirt: Make native_save_fl() extern inline
H. Peter Anvin hpa@linux.intel.com x86/asm: Add _ASM_ARG* constants for argument registers to <asm/asm.h>
Nick Desaulniers ndesaulniers@google.com compiler-gcc.h: Add __attribute__((gnu_inline)) to all inline declarations
David Rientjes rientjes@google.com compiler, clang: always inline when CONFIG_OPTIMIZE_INLINING is disabled
Linus Torvalds torvalds@linux-foundation.org compiler, clang: properly override 'inline' for clang
David Rientjes rientjes@google.com compiler, clang: suppress warning for unused static inline functions
Paul Burton paul.burton@mips.com MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
-------------
Diffstat:
 Documentation/kernel-parameters.txt             |  17 +++
 Makefile                                        |   4 +-
 arch/arm/include/asm/kvm_host.h                 |  12 ++
 arch/arm/include/asm/kvm_mmu.h                  |  12 ++
 arch/arm/kvm/arm.c                              |  24 ++--
 arch/arm/kvm/psci.c                             |  18 ++-
 arch/arm64/Kconfig                              |   9 ++
 arch/arm64/include/asm/alternative.h            |  43 +++++-
 arch/arm64/include/asm/assembler.h              |  27 +++-
 arch/arm64/include/asm/cpucaps.h                |   3 +-
 arch/arm64/include/asm/cpufeature.h             |  22 +++
 arch/arm64/include/asm/kvm_asm.h                |  41 ++++++
 arch/arm64/include/asm/kvm_host.h               |  43 ++++++
 arch/arm64/include/asm/kvm_mmu.h                |  44 ++++++
 arch/arm64/include/asm/percpu.h                 |  12 +-
 arch/arm64/include/asm/thread_info.h            |   1 +
 arch/arm64/kernel/Makefile                      |   1 +
 arch/arm64/kernel/alternative.c                 |  54 ++++---
 arch/arm64/kernel/asm-offsets.c                 |   2 +
 arch/arm64/kernel/cpu_errata.c                  | 180 ++++++++++++++++++++++++
 arch/arm64/kernel/cpufeature.c                  |  17 +++
 arch/arm64/kernel/entry.S                       |  32 ++++-
 arch/arm64/kernel/hibernate.c                   |  11 ++
 arch/arm64/kernel/ssbd.c                        | 108 ++++++++++++++
 arch/arm64/kernel/suspend.c                     |   8 ++
 arch/arm64/kvm/hyp-init.S                       |   4 +
 arch/arm64/kvm/hyp/entry.S                      |  12 +-
 arch/arm64/kvm/hyp/hyp-entry.S                  |  62 ++++++--
 arch/arm64/kvm/hyp/switch.c                     |  64 +++++++--
 arch/arm64/kvm/hyp/sysreg-sr.c                  |  21 +--
 arch/arm64/kvm/reset.c                          |   4 +
 arch/mips/kernel/process.c                      |  45 ++++--
 arch/x86/include/asm/asm.h                      |  59 ++++++++
 arch/x86/include/asm/irqflags.h                 |   2 +-
 arch/x86/kernel/Makefile                        |   1 +
 arch/x86/kernel/irqflags.S                      |  26 ++++
 drivers/atm/zatm.c                              |   2 +
 drivers/crypto/amcc/crypto4xx_core.c            |  23 ++-
 drivers/mtd/devices/m25p80.c                    |   3 +-
 drivers/net/ethernet/broadcom/bcm63xx_enet.c    |  34 +++--
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c   |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/port.c  |   4 +-
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c      |   8 +-
 drivers/net/ethernet/qlogic/qed/qed_main.c      |   9 ++
 drivers/net/ethernet/sun/sungem.c               |  22 +--
 drivers/net/ipvlan/ipvlan_main.c                |   3 +-
 drivers/net/usb/lan78xx.c                       |   5 +-
 drivers/net/usb/qmi_wwan.c                      |   1 +
 drivers/net/usb/r8152.c                         |   3 +-
 drivers/net/wireless/realtek/rtlwifi/core.c     |   1 -
 drivers/spi/spi-bcm63xx.c                       |   9 ++
 drivers/vhost/net.c                             |   3 +-
 fs/ocfs2/aops.c                                 |  26 ++--
 fs/ocfs2/cluster/nodemanager.c                  |  63 +++++++--
 fs/reiserfs/prints.c                            | 141 +++++++++++--------
 include/linux/arm-smccc.h                       |  10 ++
 include/linux/compiler-gcc.h                    |  35 +++--
 include/linux/string.h                          |   2 +-
 net/bridge/netfilter/ebtables.c                 |  13 ++
 net/dccp/ccids/ccid3.c                          |  16 ++-
 net/dns_resolver/dns_key.c                      |  28 ++--
 net/ipv4/sysctl_net_ipv4.c                      |  18 ++-
 net/ipv4/tcp_input.c                            |   9 ++
 net/ipv6/netfilter/nf_conntrack_reasm.c         |   2 +
 net/nfc/llcp_commands.c                         |   9 +-
 net/packet/af_packet.c                          |  14 +-
 net/rds/loop.c                                  |   1 +
 net/rds/rds.h                                   |   5 +
 net/rds/recv.c                                  |   5 +
 net/sched/sch_blackhole.c                       |   2 +-
 virt/kvm/arm/hyp/vgic-v2-sr.c                   |   2 +-
 72 files changed, 1315 insertions(+), 271 deletions(-)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Burton paul.burton@mips.com
commit b63e132b6433a41cf311e8bc382d33fd2b73b505 upstream.
The current MIPS implementation of arch_trigger_cpumask_backtrace() is broken because it attempts to use synchronous IPIs despite the fact that it may be run with interrupts disabled.
This means that when arch_trigger_cpumask_backtrace() is invoked, for example by the RCU CPU stall watchdog, we may:
- Deadlock due to use of synchronous IPIs with interrupts disabled, causing the CPU that's attempting to generate the backtrace output to hang itself.
- Not succeed in generating the desired output from remote CPUs.
- Produce warnings about this from smp_call_function_many(), for example:
[42760.526910] INFO: rcu_sched detected stalls on CPUs/tasks:
[42760.535755]  0-...!: (1 GPs behind) idle=ade/140000000000000/0 softirq=526944/526945 fqs=0
[42760.547874]  1-...!: (0 ticks this GP) idle=e4a/140000000000000/0 softirq=547885/547885 fqs=0
[42760.559869]  (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33)
[42760.568927] ------------[ cut here ]------------
[42760.576146] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c
[42760.587839] Modules linked in:
[42760.593152] CPU: 2 PID: 1216 Comm: sh Not tainted 4.15.4-00373-gee058bb4d0c2 #2
[42760.603767] Stack : 8e09bd20 8e09bd20 8e09bd20 fffffff0 00000007 00000006 00000000 8e09bca8
[42760.616937]         95b2b379 95b2b379 807a0080 00000007 81944518 0000018a 00000032 00000000
[42760.630095]         00000000 00000030 80000000 00000000 806eca74 00000009 8017e2b8 000001a0
[42760.643169]         00000000 00000002 00000000 8e09baa4 00000008 808b8008 86d69080 8e09bca0
[42760.656282]         8e09ad50 805e20aa 00000000 00000000 00000000 8017e2b8 00000009 801070ca
[42760.669424]         ...
[42760.673919] Call Trace:
[42760.678672] [<27fde568>] show_stack+0x70/0xf0
[42760.685417] [<84751641>] dump_stack+0xaa/0xd0
[42760.692188] [<699d671c>] __warn+0x80/0x92
[42760.698549] [<68915d41>] warn_slowpath_null+0x28/0x36
[42760.705912] [<f7c76c1c>] smp_call_function_many+0x88/0x20c
[42760.713696] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a
[42760.722216] [<f845bd33>] rcu_dump_cpu_stacks+0x6a/0x98
[42760.729580] [<796e7629>] rcu_check_callbacks+0x672/0x6ac
[42760.737476] [<059b3b43>] update_process_times+0x18/0x34
[42760.744981] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38
[42760.752793] [<478d3d70>] tick_sched_timer+0x1c/0x50
[42760.759882] [<e56ea39f>] __hrtimer_run_queues+0xc6/0x226
[42760.767418] [<e88bbcae>] hrtimer_interrupt+0x88/0x19a
[42760.775031] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a
[42760.782761] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168
[42760.790795] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.798117] [<1b6d462c>] gic_handle_local_int+0x38/0x86
[42760.805545] [<b2ada1c7>] gic_irq_dispatch+0xa/0x14
[42760.812534] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.820086] [<c7521934>] do_IRQ+0x16/0x20
[42760.826274] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94
[42760.833458] [<6a94b53c>] except_vec_vi_end+0x70/0x78
[42760.840655] [<22284043>] smp_call_function_many+0x1ba/0x20c
[42760.848501] [<54022b58>] smp_call_function+0x1e/0x2c
[42760.855693] [<ab9fc705>] flush_tlb_mm+0x2a/0x98
[42760.862730] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44
[42760.869628] [<cb259b74>] arch_tlb_finish_mmu+0x26/0x3e
[42760.877021] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66
[42760.883907] [<b3fce717>] exit_mmap+0x76/0xea
[42760.890428] [<c4c8a2f6>] mmput+0x80/0x11a
[42760.896632] [<a41a08f4>] do_exit+0x1f4/0x80c
[42760.903158] [<ee01cef6>] do_group_exit+0x20/0x7e
[42760.909990] [<13fa8d54>] __wake_up_parent+0x0/0x1e
[42760.917045] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c
[42760.924893] [<8c21a93b>] syscall_common+0x14/0x1c
[42760.931765] ---[ end trace 02aa09da9dc52a60 ]---
[42760.938342] ------------[ cut here ]------------
[42760.945311] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8
...
This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async IPIs & smp_call_function_single_async() in order to resolve this problem. We ensure use of the pre-allocated call_single_data_t structures is serialized by maintaining a cpumask indicating that they're busy, and refusing to attempt to send an IPI when a CPU's bit is set in this mask. This should only happen if a CPU hasn't responded to a previous backtrace IPI - ie. if it's hung - and we print a warning to the console in this case.
I've marked this for stable branches as far back as v4.9, to which it applies cleanly. Strictly speaking the faulty MIPS implementation can be traced further back to commit 856839b76836 ("MIPS: Add arch_trigger_all_cpu_backtrace() function") in v3.19, but kernel versions v3.19 through v4.8 will require further work to backport due to the rework performed in commit 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods").
Signed-off-by: Paul Burton paul.burton@mips.com
Patchwork: https://patchwork.linux-mips.org/patch/19597/
Cc: James Hogan jhogan@kernel.org
Cc: Ralf Baechle ralf@linux-mips.org
Cc: Huacai Chen chenhc@lemote.com
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.9+
Fixes: 856839b76836 ("MIPS: Add arch_trigger_all_cpu_backtrace() function")
Fixes: 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods")
[ Huacai: backported to 4.9: Replace "call_single_data_t" with "struct call_single_data" ]
Signed-off-by: Huacai Chen chenhc@lemote.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/mips/kernel/process.c | 45 ++++++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 15 deletions(-)

--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -26,6 +26,7 @@
 #include <linux/kallsyms.h>
 #include <linux/random.h>
 #include <linux/prctl.h>
+#include <linux/nmi.h>
 
 #include <asm/asm.h>
 #include <asm/bootinfo.h>
@@ -633,28 +634,42 @@ unsigned long arch_align_stack(unsigned
 	return sp & ALMASK;
 }
 
-static void arch_dump_stack(void *info)
-{
-	struct pt_regs *regs;
+static DEFINE_PER_CPU(struct call_single_data, backtrace_csd);
+static struct cpumask backtrace_csd_busy;
 
-	regs = get_irq_regs();
-
-	if (regs)
-		show_regs(regs);
-	else
-		dump_stack();
+static void handle_backtrace(void *info)
+{
+	nmi_cpu_backtrace(get_irq_regs());
+	cpumask_clear_cpu(smp_processor_id(), &backtrace_csd_busy);
 }
 
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+static void raise_backtrace(cpumask_t *mask)
 {
-	long this_cpu = get_cpu();
+	struct call_single_data *csd;
+	int cpu;
 
-	if (cpumask_test_cpu(this_cpu, mask) && !exclude_self)
-		dump_stack();
+	for_each_cpu(cpu, mask) {
+		/*
+		 * If we previously sent an IPI to the target CPU & it hasn't
+		 * cleared its bit in the busy cpumask then it didn't handle
+		 * our previous IPI & it's not safe for us to reuse the
+		 * call_single_data_t.
+		 */
+		if (cpumask_test_and_set_cpu(cpu, &backtrace_csd_busy)) {
+			pr_warn("Unable to send backtrace IPI to CPU%u - perhaps it hung?\n",
+				cpu);
+			continue;
+		}
 
-	smp_call_function_many(mask, arch_dump_stack, NULL, 1);
+		csd = &per_cpu(backtrace_csd, cpu);
+		csd->func = handle_backtrace;
+		smp_call_function_single_async(cpu, csd);
+	}
+}
 
-	put_cpu();
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+{
+	nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace);
 }
 
 int mips_get_process_fp_mode(struct task_struct *task)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Rientjes rientjes@google.com
commit abb2ea7dfd82451d85ce669b811310c05ab5ca46 upstream.
GCC explicitly does not warn for unused static inline functions for -Wunused-function. The manual states:
Warn whenever a static function is declared but not defined or a non-inline static function is unused.
Clang does warn for static inline functions that are unused.
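As a toy illustration of the difference (hypothetical function, not taken from the kernel), a translation unit like the following is silent under gcc's -Wunused-function but warns under clang:

	/* defined but never referenced in this file */
	static inline int add_one(int x)
	{
		return x + 1;
	}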
It turns out that suppressing the warnings avoids potentially complex #ifdef directives, which also reduces LOC.
Suppress the warning for clang.
Signed-off-by: David Rientjes rientjes@google.com
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/compiler-clang.h | 7 +++++++
 1 file changed, 7 insertions(+)

--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -16,6 +16,13 @@
  */
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
 
+/*
+ * GCC does not warn about unused static inline functions for
+ * -Wunused-function. This turns out to avoid the need for complex #ifdef
+ * directives. Suppress the warning in clang as well.
+ */
+#define inline inline __attribute__((unused))
+
 /* Clang doesn't have a way to turn it off per-function, yet. */
 #ifdef __noretpoline
 #undef __noretpoline
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit 6d53cefb18e4646fb4bf62ccb6098fb3808486df upstream.
Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused static inline functions") just caused more warnings due to re-defining the 'inline' macro.
So undef it before re-defining it, and also add the 'notrace' attribute like the gcc version that this is overriding does.
Maybe this makes clang happier.
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/compiler-clang.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -21,7 +21,8 @@
  * -Wunused-function. This turns out to avoid the need for complex #ifdef
  * directives. Suppress the warning in clang as well.
  */
-#define inline inline __attribute__((unused))
+#undef inline
+#define inline inline __attribute__((unused)) notrace
 
 /* Clang doesn't have a way to turn it off per-function, yet. */
 #ifdef __noretpoline
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Rientjes rientjes@google.com
commit 9a04dbcfb33b4012d0ce8c0282f1e3ca694675b1 upstream.
The motivation for commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused static inline functions") was to suppress clang's warnings about unused static inline functions.
For configs without CONFIG_OPTIMIZE_INLINING enabled, such as any non-x86 architecture, `inline' in the kernel implies that __attribute__((always_inline)) is used.
Some code depends on that behavior, see https://lkml.org/lkml/2017/6/13/918:
net/built-in.o: In function `__xchg_mb':
arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99'
arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99'
The full fix would be to identify these breakages and annotate the functions with __always_inline instead of `inline'. But since we are late in the 4.12-rc cycle, simply carry forward the forced inlining behavior and work toward moving arm64, and other architectures, toward CONFIG_OPTIMIZE_INLINING behavior.
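To make the breakage concrete, here is a minimal sketch (a hypothetical reduction of the arm64 cmpxchg pattern; BUILD_BUG() is the real kernel helper) of code that only links when the compiler is forced to inline:

	/*
	 * If 'inline' is not forced, the compiler may emit this out of line;
	 * the BUILD_BUG() call then survives as an undefined reference such
	 * as __compiletime_assert_99 and the link fails, as in the errors
	 * quoted above.
	 */
	static inline unsigned long example_xchg(unsigned long x,
						 volatile void *ptr, int size)
	{
		switch (size) {
		case 4:
			return x;	/* 32-bit exchange elided for brevity */
		case 8:
			return x;	/* 64-bit exchange elided for brevity */
		default:
			BUILD_BUG();	/* must be optimized away */
		}
		return 0;
	}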
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1706261552200.1075@chino.kir.corp.g...
Signed-off-by: David Rientjes rientjes@google.com
Reported-by: Sodagudi Prasad psodagud@codeaurora.org
Tested-by: Sodagudi Prasad psodagud@codeaurora.org
Tested-by: Matthias Kaehlcke mka@chromium.org
Cc: Mark Rutland mark.rutland@arm.com
Cc: Will Deacon will.deacon@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Ingo Molnar mingo@kernel.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/compiler-clang.h |  8 --------
 include/linux/compiler-gcc.h   | 18 +++++++++++-------
 2 files changed, 11 insertions(+), 15 deletions(-)

--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -16,14 +16,6 @@
  */
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
 
-/*
- * GCC does not warn about unused static inline functions for
- * -Wunused-function. This turns out to avoid the need for complex #ifdef
- * directives. Suppress the warning in clang as well.
- */
-#undef inline
-#define inline inline __attribute__((unused)) notrace
-
 /* Clang doesn't have a way to turn it off per-function, yet. */
 #ifdef __noretpoline
 #undef __noretpoline
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -66,18 +66,22 @@
 
 /*
  * Force always-inline if the user requests it so via the .config,
- * or if gcc is too old:
+ * or if gcc is too old.
+ * GCC does not warn about unused static inline functions for
+ * -Wunused-function. This turns out to avoid the need for complex #ifdef
+ * directives. Suppress the warning in clang as well by using "unused"
+ * function attribute, which is redundant but not harmful for gcc.
  */
 #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) ||		\
     !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)
-#define inline inline __attribute__((always_inline)) notrace
-#define __inline__ __inline__ __attribute__((always_inline)) notrace
-#define __inline __inline __attribute__((always_inline)) notrace
+#define inline inline __attribute__((always_inline,unused)) notrace
+#define __inline__ __inline__ __attribute__((always_inline,unused)) notrace
+#define __inline __inline __attribute__((always_inline,unused)) notrace
 #else
 /* A lot of inline functions can cause havoc with function tracing */
-#define inline inline notrace
-#define __inline__ __inline__ notrace
-#define __inline __inline notrace
+#define inline inline __attribute__((unused)) notrace
+#define __inline__ __inline__ __attribute__((unused)) notrace
+#define __inline __inline __attribute__((unused)) notrace
 #endif
 
 #define __always_inline	inline __attribute__((always_inline))
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Desaulniers ndesaulniers@google.com
commit d03db2bc26f0e4a6849ad649a09c9c73fccdc656 upstream.
Functions marked extern inline do not emit an externally visible function when the gnu89 C standard is used. Some KBUILD Makefiles overwrite KBUILD_CFLAGS. This is an issue for GCC 5.1+ users as without an explicit C standard specified, the default is gnu11. Since c99, the semantics of extern inline have changed such that an externally visible function is always emitted. This can lead to multiple definition errors of extern inline functions at link time of compilation units whose build files have removed an explicit C standard compiler flag for users of GCC 5.1+ or Clang.
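For illustration only (hypothetical function in a header shared by several translation units):

	/*
	 * With gnu89 semantics -- or with __attribute__((gnu_inline)) -- no
	 * externally visible definition of twice() is emitted, so including
	 * this header from two .c files links fine. With C99/gnu11 semantics
	 * each unit emits an external definition and the link fails with
	 * "multiple definition of `twice'".
	 */
	extern inline __attribute__((gnu_inline)) int twice(int x)
	{
		return 2 * x;
	}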
Suggested-by: Arnd Bergmann arnd@arndb.de
Suggested-by: H. Peter Anvin hpa@zytor.com
Suggested-by: Joe Perches joe@perches.com
Signed-off-by: Nick Desaulniers ndesaulniers@google.com
Acked-by: Juergen Gross jgross@suse.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: acme@redhat.com
Cc: akataria@vmware.com
Cc: akpm@linux-foundation.org
Cc: andrea.parri@amarulasolutions.com
Cc: ard.biesheuvel@linaro.org
Cc: aryabinin@virtuozzo.com
Cc: astrachan@google.com
Cc: boris.ostrovsky@oracle.com
Cc: brijesh.singh@amd.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: geert@linux-m68k.org
Cc: ghackmann@google.com
Cc: gregkh@linuxfoundation.org
Cc: jan.kiszka@siemens.com
Cc: jarkko.sakkinen@linux.intel.com
Cc: jpoimboe@redhat.com
Cc: keescook@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: kstewart@linuxfoundation.org
Cc: linux-efi@vger.kernel.org
Cc: linux-kbuild@vger.kernel.org
Cc: manojgupta@google.com
Cc: mawilcox@microsoft.com
Cc: michal.lkml@markovi.net
Cc: mjg59@google.com
Cc: mka@chromium.org
Cc: pombredanne@nexb.com
Cc: rientjes@google.com
Cc: rostedt@goodmis.org
Cc: sedat.dilek@gmail.com
Cc: thomas.lendacky@amd.com
Cc: tstellar@redhat.com
Cc: tweek@google.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: yamada.masahiro@socionext.com
Link: http://lkml.kernel.org/r/20180621162324.36656-2-ndesaulniers@google.com
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/compiler-gcc.h | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -65,25 +65,40 @@
 #endif
 
 /*
+ * Feature detection for gnu_inline (gnu89 extern inline semantics). Either
+ * __GNUC_STDC_INLINE__ is defined (not using gnu89 extern inline semantics,
+ * and we opt in to the gnu89 semantics), or __GNUC_STDC_INLINE__ is not
+ * defined so the gnu89 semantics are the default.
+ */
+#ifdef __GNUC_STDC_INLINE__
+# define __gnu_inline __attribute__((gnu_inline))
+#else
+# define __gnu_inline
+#endif
+
+/*
  * Force always-inline if the user requests it so via the .config,
  * or if gcc is too old.
  * GCC does not warn about unused static inline functions for
  * -Wunused-function. This turns out to avoid the need for complex #ifdef
  * directives. Suppress the warning in clang as well by using "unused"
  * function attribute, which is redundant but not harmful for gcc.
+ * Prefer gnu_inline, so that extern inline functions do not emit an
+ * externally visible function. This makes extern inline behave as per gnu89
+ * semantics rather than c99. This prevents multiple symbol definition errors
+ * of extern inline functions at link time.
+ * A lot of inline functions can cause havoc with function tracing.
 */
 #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) ||		\
     !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)
-#define inline inline __attribute__((always_inline,unused)) notrace
-#define __inline__ __inline__ __attribute__((always_inline,unused)) notrace
-#define __inline __inline __attribute__((always_inline,unused)) notrace
+#define inline \
+	inline __attribute__((always_inline, unused)) notrace __gnu_inline
 #else
-/* A lot of inline functions can cause havoc with function tracing */
-#define inline inline __attribute__((unused)) notrace
-#define __inline__ __inline__ __attribute__((unused)) notrace
-#define __inline __inline __attribute__((unused)) notrace
+#define inline inline __attribute__((unused)) notrace __gnu_inline
 #endif
 
+#define __inline__ inline
+#define __inline inline
 #define __always_inline	inline __attribute__((always_inline))
 #define  noinline	__attribute__((noinline))
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: H. Peter Anvin hpa@linux.intel.com
commit 0e2e160033283e20f688d8bad5b89460cc5bfcc4 upstream.
i386 and x86-64 use different registers for arguments; make them available so we don't have to #ifdef in the actual code.
Native size and specified size (q, l, w, b) versions are provided.
Signed-off-by: H. Peter Anvin hpa@linux.intel.com
Signed-off-by: Nick Desaulniers ndesaulniers@google.com
Reviewed-by: Sedat Dilek sedat.dilek@gmail.com
Acked-by: Juergen Gross jgross@suse.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: acme@redhat.com
Cc: akataria@vmware.com
Cc: akpm@linux-foundation.org
Cc: andrea.parri@amarulasolutions.com
Cc: ard.biesheuvel@linaro.org
Cc: arnd@arndb.de
Cc: aryabinin@virtuozzo.com
Cc: astrachan@google.com
Cc: boris.ostrovsky@oracle.com
Cc: brijesh.singh@amd.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: geert@linux-m68k.org
Cc: ghackmann@google.com
Cc: gregkh@linuxfoundation.org
Cc: jan.kiszka@siemens.com
Cc: jarkko.sakkinen@linux.intel.com
Cc: joe@perches.com
Cc: jpoimboe@redhat.com
Cc: keescook@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: kstewart@linuxfoundation.org
Cc: linux-efi@vger.kernel.org
Cc: linux-kbuild@vger.kernel.org
Cc: manojgupta@google.com
Cc: mawilcox@microsoft.com
Cc: michal.lkml@markovi.net
Cc: mjg59@google.com
Cc: mka@chromium.org
Cc: pombredanne@nexb.com
Cc: rientjes@google.com
Cc: rostedt@goodmis.org
Cc: thomas.lendacky@amd.com
Cc: tstellar@redhat.com
Cc: tweek@google.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: yamada.masahiro@socionext.com
Link: http://lkml.kernel.org/r/20180621162324.36656-3-ndesaulniers@google.com
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/asm.h | 59 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -45,6 +45,65 @@
 #define _ASM_SI		__ASM_REG(si)
 #define _ASM_DI		__ASM_REG(di)
 
+#ifndef __x86_64__
+/* 32 bit */
+
+#define _ASM_ARG1	_ASM_AX
+#define _ASM_ARG2	_ASM_DX
+#define _ASM_ARG3	_ASM_CX
+
+#define _ASM_ARG1L	eax
+#define _ASM_ARG2L	edx
+#define _ASM_ARG3L	ecx
+
+#define _ASM_ARG1W	ax
+#define _ASM_ARG2W	dx
+#define _ASM_ARG3W	cx
+
+#define _ASM_ARG1B	al
+#define _ASM_ARG2B	dl
+#define _ASM_ARG3B	cl
+
+#else
+/* 64 bit */
+
+#define _ASM_ARG1	_ASM_DI
+#define _ASM_ARG2	_ASM_SI
+#define _ASM_ARG3	_ASM_DX
+#define _ASM_ARG4	_ASM_CX
+#define _ASM_ARG5	r8
+#define _ASM_ARG6	r9
+
+#define _ASM_ARG1Q	rdi
+#define _ASM_ARG2Q	rsi
+#define _ASM_ARG3Q	rdx
+#define _ASM_ARG4Q	rcx
+#define _ASM_ARG5Q	r8
+#define _ASM_ARG6Q	r9
+
+#define _ASM_ARG1L	edi
+#define _ASM_ARG2L	esi
+#define _ASM_ARG3L	edx
+#define _ASM_ARG4L	ecx
+#define _ASM_ARG5L	r8d
+#define _ASM_ARG6L	r9d
+
+#define _ASM_ARG1W	di
+#define _ASM_ARG2W	si
+#define _ASM_ARG3W	dx
+#define _ASM_ARG4W	cx
+#define _ASM_ARG5W	r8w
+#define _ASM_ARG6W	r9w
+
+#define _ASM_ARG1B	dil
+#define _ASM_ARG2B	sil
+#define _ASM_ARG3B	dl
+#define _ASM_ARG4B	cl
+#define _ASM_ARG5B	r8b
+#define _ASM_ARG6B	r9b
+
+#endif
+
 /*
  * Macros to generate condition code outputs from inline assembly,
  * The output operand must be type "bool".
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nick Desaulniers ndesaulniers@google.com
commit d0a8d9378d16eb3c69bd8e6d23779fbdbee3a8c7 upstream.
native_save_fl() is marked static inline, but by using it as a function pointer in arch/x86/kernel/paravirt.c, it MUST be outlined.
paravirt's use of native_save_fl() also requires that no GPRs other than %rax are clobbered.
Compilers have different heuristics which they use to emit stack guard code, the emission of which can break paravirt's callee-saved assumption by clobbering %rcx.
Marking a function definition extern inline means that if this version cannot be inlined, then the out-of-line version will be preferred. By having the out-of-line version be implemented in assembly, it cannot be instrumented with a stack protector, which might violate custom calling conventions that code like paravirt rely on.
The semantics of extern inline has changed since gnu89. This means that folks using GCC versions >= 5.1 may see symbol redefinition errors at link time for subdirs that override KBUILD_CFLAGS (making the C standard used implicit) regardless of this patch. This has been cleaned up earlier in the patch set, but is left as a note in the commit message for future travelers.
Reports: https://lkml.org/lkml/2018/5/7/534 https://github.com/ClangBuiltLinux/linux/issues/16
Discussion: https://bugs.llvm.org/show_bug.cgi?id=37512 https://lkml.org/lkml/2018/5/24/1371
Thanks to the many folks that participated in the discussion.
Debugged-by: Alistair Strachan astrachan@google.com
Debugged-by: Matthias Kaehlcke mka@chromium.org
Suggested-by: Arnd Bergmann arnd@arndb.de
Suggested-by: H. Peter Anvin hpa@zytor.com
Suggested-by: Tom Stellar tstellar@redhat.com
Reported-by: Sedat Dilek sedat.dilek@gmail.com
Tested-by: Sedat Dilek sedat.dilek@gmail.com
Signed-off-by: Nick Desaulniers ndesaulniers@google.com
Acked-by: Juergen Gross jgross@suse.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: acme@redhat.com
Cc: akataria@vmware.com
Cc: akpm@linux-foundation.org
Cc: andrea.parri@amarulasolutions.com
Cc: ard.biesheuvel@linaro.org
Cc: aryabinin@virtuozzo.com
Cc: astrachan@google.com
Cc: boris.ostrovsky@oracle.com
Cc: brijesh.singh@amd.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: geert@linux-m68k.org
Cc: ghackmann@google.com
Cc: gregkh@linuxfoundation.org
Cc: jan.kiszka@siemens.com
Cc: jarkko.sakkinen@linux.intel.com
Cc: joe@perches.com
Cc: jpoimboe@redhat.com
Cc: keescook@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: kstewart@linuxfoundation.org
Cc: linux-efi@vger.kernel.org
Cc: linux-kbuild@vger.kernel.org
Cc: manojgupta@google.com
Cc: mawilcox@microsoft.com
Cc: michal.lkml@markovi.net
Cc: mjg59@google.com
Cc: mka@chromium.org
Cc: pombredanne@nexb.com
Cc: rientjes@google.com
Cc: rostedt@goodmis.org
Cc: thomas.lendacky@amd.com
Cc: tweek@google.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: yamada.masahiro@socionext.com
Link: http://lkml.kernel.org/r/20180621162324.36656-4-ndesaulniers@google.com
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/include/asm/irqflags.h |  2 +-
 arch/x86/kernel/Makefile        |  1 +
 arch/x86/kernel/irqflags.S      | 26 ++++++++++++++++++++++++++
 3 files changed, 28 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -12,7 +12,7 @@
  * Interrupt control:
  */
 
-static inline unsigned long native_save_fl(void)
+extern inline unsigned long native_save_fl(void)
 {
 	unsigned long flags;
 
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -56,6 +56,7 @@ obj-y			+= alternative.o i8253.o pci-nom
 obj-y			+= tsc.o tsc_msr.o io_delay.o rtc.o
 obj-y			+= pci-iommu_table.o
 obj-y			+= resource.o
+obj-y			+= irqflags.o
 
 obj-y				+= process.o
 obj-y				+= fpu/
--- /dev/null
+++ b/arch/x86/kernel/irqflags.S
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <asm/asm.h>
+#include <asm/export.h>
+#include <linux/linkage.h>
+
+/*
+ * unsigned long native_save_fl(void)
+ */
+ENTRY(native_save_fl)
+	pushf
+	pop %_ASM_AX
+	ret
+ENDPROC(native_save_fl)
+EXPORT_SYMBOL(native_save_fl)
+
+/*
+ * void native_restore_fl(unsigned long flags)
+ * %eax/%rdi: flags
+ */
+ENTRY(native_restore_fl)
+	push %_ASM_ARG1
+	popf
+	ret
+ENDPROC(native_restore_fl)
+EXPORT_SYMBOL(native_restore_fl)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: alex chen alex.chen@huawei.com
commit 853bc26a7ea39e354b9f8889ae7ad1492ffa28d2 upstream.
The subsystem.su_mutex is required while accessing the item->ci_parent; otherwise a NULL pointer dereference of item->ci_parent will be triggered in the following situation:
add node                          delete node
sys_write
 vfs_write
  configfs_write_file
   o2nm_node_store
    o2nm_node_local_write
                                  do_rmdir
                                   vfs_rmdir
                                    configfs_rmdir
                                     mutex_lock(&subsys->su_mutex);
                                     unlink_obj
                                      item->ci_group = NULL;
                                      item->ci_parent = NULL;
    to_o2nm_cluster_from_node
     node->nd_item.ci_parent->ci_parent
     BUG due to NULL pointer dereference of nd_item.ci_parent
Moreover, the o2nm_cluster should also be protected by the subsystem.su_mutex.
[alex.chen@huawei.com: v2]
  Link: http://lkml.kernel.org/r/59EEAA69.9080703@huawei.com
Link: http://lkml.kernel.org/r/59E9B36A.10700@huawei.com
Signed-off-by: Alex Chen alex.chen@huawei.com
Reviewed-by: Jun Piao piaojun@huawei.com
Reviewed-by: Joseph Qi jiangqi903@gmail.com
Cc: Mark Fasheh mfasheh@versity.com
Cc: Joel Becker jlbec@evilplan.org
Cc: Junxiao Bi junxiao.bi@oracle.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Cc: Salvatore Bonaccorso carnil@debian.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ocfs2/cluster/nodemanager.c | 63 +++++++++++++++++++++++++++++++++++------
 1 file changed, 55 insertions(+), 8 deletions(-)
--- a/fs/ocfs2/cluster/nodemanager.c
+++ b/fs/ocfs2/cluster/nodemanager.c
@@ -40,6 +40,9 @@ char *o2nm_fence_method_desc[O2NM_FENCE_
 		"panic",	/* O2NM_FENCE_PANIC */
 };
 
+static inline void o2nm_lock_subsystem(void);
+static inline void o2nm_unlock_subsystem(void);
+
 struct o2nm_node *o2nm_get_node_by_num(u8 node_num)
 {
 	struct o2nm_node *node = NULL;
@@ -181,7 +184,10 @@ static struct o2nm_cluster *to_o2nm_clus
 {
 	/* through the first node_set .parent
	 * mycluster/nodes/mynode == o2nm_cluster->o2nm_node_group->o2nm_node */
-	return to_o2nm_cluster(node->nd_item.ci_parent->ci_parent);
+	if (node->nd_item.ci_parent)
+		return to_o2nm_cluster(node->nd_item.ci_parent->ci_parent);
+	else
+		return NULL;
 }
 
 enum {
@@ -194,7 +200,7 @@ static ssize_t o2nm_node_num_store(struc
 				   size_t count)
 {
 	struct o2nm_node *node = to_o2nm_node(item);
-	struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node);
+	struct o2nm_cluster *cluster;
 	unsigned long tmp;
 	char *p = (char *)page;
 	int ret = 0;
@@ -214,6 +220,13 @@ static ssize_t o2nm_node_num_store(struc
 	    !test_bit(O2NM_NODE_ATTR_PORT, &node->nd_set_attributes))
 		return -EINVAL; /* XXX */
 
+	o2nm_lock_subsystem();
+	cluster = to_o2nm_cluster_from_node(node);
+	if (!cluster) {
+		o2nm_unlock_subsystem();
+		return -EINVAL;
+	}
+
 	write_lock(&cluster->cl_nodes_lock);
 	if (cluster->cl_nodes[tmp])
 		ret = -EEXIST;
@@ -226,6 +239,8 @@ static ssize_t o2nm_node_num_store(struc
 		set_bit(tmp, cluster->cl_nodes_bitmap);
 	}
 	write_unlock(&cluster->cl_nodes_lock);
+	o2nm_unlock_subsystem();
+
 	if (ret)
 		return ret;
 
@@ -269,7 +284,7 @@ static ssize_t o2nm_node_ipv4_address_st
 					 size_t count)
 {
 	struct o2nm_node *node = to_o2nm_node(item);
-	struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node);
+	struct o2nm_cluster *cluster;
 	int ret, i;
 	struct rb_node **p, *parent;
 	unsigned int octets[4];
@@ -286,6 +301,13 @@ static ssize_t o2nm_node_ipv4_address_st
 		be32_add_cpu(&ipv4_addr, octets[i] << (i * 8));
 	}
 
+	o2nm_lock_subsystem();
+	cluster = to_o2nm_cluster_from_node(node);
+	if (!cluster) {
+		o2nm_unlock_subsystem();
+		return -EINVAL;
+	}
+
 	ret = 0;
 	write_lock(&cluster->cl_nodes_lock);
 	if (o2nm_node_ip_tree_lookup(cluster, ipv4_addr, &p, &parent))
@@ -298,6 +320,8 @@ static ssize_t o2nm_node_ipv4_address_st
 		rb_insert_color(&node->nd_ip_node, &cluster->cl_node_ip_tree);
 	}
 	write_unlock(&cluster->cl_nodes_lock);
+	o2nm_unlock_subsystem();
+
 	if (ret)
 		return ret;
 
@@ -315,7 +339,7 @@ static ssize_t o2nm_node_local_store(str
 				     size_t count)
 {
 	struct o2nm_node *node = to_o2nm_node(item);
-	struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node);
+	struct o2nm_cluster *cluster;
 	unsigned long tmp;
 	char *p = (char *)page;
 	ssize_t ret;
@@ -333,17 +357,26 @@ static ssize_t o2nm_node_local_store(str
 	    !test_bit(O2NM_NODE_ATTR_PORT, &node->nd_set_attributes))
 		return -EINVAL; /* XXX */
 
+	o2nm_lock_subsystem();
+	cluster = to_o2nm_cluster_from_node(node);
+	if (!cluster) {
+		ret = -EINVAL;
+		goto out;
+	}
+
 	/* the only failure case is trying to set a new local node
	 * when a different one is already set */
 	if (tmp && tmp == cluster->cl_has_local &&
-	    cluster->cl_local_node != node->nd_num)
-		return -EBUSY;
+	    cluster->cl_local_node != node->nd_num) {
+		ret = -EBUSY;
+		goto out;
+	}
 
 	/* bring up the rx thread if we're setting the new local node. */
 	if (tmp && !cluster->cl_has_local) {
 		ret = o2net_start_listening(node);
 		if (ret)
-			return ret;
+			goto out;
 	}
 
 	if (!tmp && cluster->cl_has_local &&
@@ -358,7 +391,11 @@ static ssize_t o2nm_node_local_store(str
 		cluster->cl_local_node = node->nd_num;
 	}
 
-	return count;
+	ret = count;
+
+out:
+	o2nm_unlock_subsystem();
+	return ret;
 }
 
 CONFIGFS_ATTR(o2nm_node_, num);
@@ -738,6 +775,16 @@ static struct o2nm_cluster_group o2nm_cl
 	},
 };
 
+static inline void o2nm_lock_subsystem(void)
+{
+	mutex_lock(&o2nm_cluster_group.cs_subsys.su_mutex);
+}
+
+static inline void o2nm_unlock_subsystem(void)
+{
+	mutex_unlock(&o2nm_cluster_group.cs_subsys.su_mutex);
+}
+
 int o2nm_depend_item(struct config_item *item)
 {
 	return configfs_depend_item(&o2nm_cluster_group.cs_subsys, item);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: alex chen alex.chen@huawei.com
commit 3e4c56d41eef5595035872a2ec5a483f42e8917f upstream.
ip_alloc_sem should be taken in ocfs2_get_block() when reading a file in DIRECT mode, to prevent concurrent access to the extent tree with ocfs2_dio_end_io_write(), which may trigger a BUG_ON() in the following situation:
read file 'A'                                end_io of writing file 'A'
vfs_read
 __vfs_read
  ocfs2_file_read_iter
   generic_file_read_iter
    ocfs2_direct_IO
     __blockdev_direct_IO
      do_blockdev_direct_IO
       do_direct_IO
        get_more_blocks
         ocfs2_get_block
          ocfs2_extent_map_get_blocks
           ocfs2_get_clusters
            ocfs2_get_clusters_nocache()
             ocfs2_search_extent_list
              return the index of the record which
              contains the v_cluster, that is
              v_cluster > rec[i]->e_cpos.
                                             ocfs2_dio_end_io
                                              ocfs2_dio_end_io_write
                                               down_write(&oi->ip_alloc_sem);
                                               ocfs2_mark_extent_written
                                                ocfs2_change_extent_flag
                                                 ocfs2_split_extent
                                                  ...
                                                  --> modify the rec[i]->e_cpos,
                                                      resulting in
                                                      v_cluster < rec[i]->e_cpos.
             BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos))
[alex.chen@huawei.com: v3]
  Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com
Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com
Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Alex Chen alex.chen@huawei.com
Reviewed-by: Jun Piao piaojun@huawei.com
Reviewed-by: Joseph Qi jiangqi903@gmail.com
Reviewed-by: Gang He ghe@suse.com
Acked-by: Changwei Ge ge.changwei@h3c.com
Cc: Mark Fasheh mfasheh@versity.com
Cc: Joel Becker jlbec@evilplan.org
Cc: Junxiao Bi junxiao.bi@oracle.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Cc: Salvatore Bonaccorso carnil@debian.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ocfs2/aops.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -134,6 +134,19 @@ bail:
 	return err;
 }
 
+static int ocfs2_lock_get_block(struct inode *inode, sector_t iblock,
+		    struct buffer_head *bh_result, int create)
+{
+	int ret = 0;
+	struct ocfs2_inode_info *oi = OCFS2_I(inode);
+
+	down_read(&oi->ip_alloc_sem);
+	ret = ocfs2_get_block(inode, iblock, bh_result, create);
+	up_read(&oi->ip_alloc_sem);
+
+	return ret;
+}
+
 int ocfs2_get_block(struct inode *inode, sector_t iblock,
 		    struct buffer_head *bh_result, int create)
 {
@@ -2120,7 +2133,7 @@ static void ocfs2_dio_free_write_ctx(str
  * called like this: dio->get_blocks(dio->inode, fs_startblk,
  *					fs_count, map_bh, dio->rw == WRITE);
  */
-static int ocfs2_dio_get_block(struct inode *inode, sector_t iblock,
+static int ocfs2_dio_wr_get_block(struct inode *inode, sector_t iblock,
 			       struct buffer_head *bh_result, int create)
 {
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
@@ -2146,12 +2159,9 @@ static int ocfs2_dio_get_block(struct in
 	 * while file size will be changed.
 	 */
 	if (pos + total_len <= i_size_read(inode)) {
-		down_read(&oi->ip_alloc_sem);
-		/* This is the fast path for re-write. */
-		ret = ocfs2_get_block(inode, iblock, bh_result, create);
-
-		up_read(&oi->ip_alloc_sem);
 
+		/* This is the fast path for re-write. */
+		ret = ocfs2_lock_get_block(inode, iblock, bh_result, create);
 		if (buffer_mapped(bh_result) &&
 		    !buffer_new(bh_result) &&
 		    ret == 0)
@@ -2416,9 +2426,9 @@ static ssize_t ocfs2_direct_IO(struct ki
 		return 0;
 
 	if (iov_iter_rw(iter) == READ)
-		get_block = ocfs2_get_block;
+		get_block = ocfs2_lock_get_block;
 	else
-		get_block = ocfs2_dio_get_block;
+		get_block = ocfs2_dio_wr_get_block;
 
 	return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
 				    iter, get_block,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiner Kallweit hkallweit1@gmail.com
commit 9e276de6a367cde07c1a63522152985d4e5cca8b upstream.
Consider a message size limit when calculating the maximum amount of data that can be read.
The message size limit has been introduced with 4.9, so cc it to stable.
Signed-off-by: Heiner Kallweit hkallweit1@gmail.com
Signed-off-by: Cyrille Pitchen cyrille.pitchen@atmel.com
Signed-off-by: Amit Pundir amit.pundir@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/mtd/devices/m25p80.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/mtd/devices/m25p80.c
+++ b/drivers/mtd/devices/m25p80.c
@@ -172,7 +172,8 @@ static ssize_t m25p80_read(struct spi_no
 
 	t[1].rx_buf = buf;
 	t[1].rx_nbits = m25p80_rx_nbits(nor);
-	t[1].len = min(len, spi_max_transfer_size(spi));
+	t[1].len = min3(len, spi_max_transfer_size(spi),
+			spi_max_message_size(spi) - t[0].len);
 	spi_message_add_tail(&t[1], &m);
 
 	ret = spi_sync(spi, &m);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonas Gorski jonas.gorski@gmail.com
commit 0135c03df914f0481c61f097c78d37cece84f330 upstream.
The bcm63xx SPI controller does not allow manual control of the CS lines and will toggle it automatically before and after sending data, so we are limited to messages that fit in the FIFO buffer. Since the CS lines aren't available as GPIOs either, we will need to make slave drivers aware of this limitation so they can handle them accordingly.
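As a hedged sketch of what such awareness looks like in a slave driver (hypothetical helper, not part of this patch), the driver queries the limit and splits large accesses into acceptable chunks:

	/* split a large read into messages the controller can accept */
	static int example_read_chunked(struct spi_device *spi, void *buf,
					size_t len)
	{
		size_t limit = min(spi_max_transfer_size(spi),
				   spi_max_message_size(spi));

		while (len) {
			size_t chunk = min(len, limit);
			int ret = spi_read(spi, buf, chunk); /* one message per chunk */

			if (ret)
				return ret;
			buf += chunk;
			len -= chunk;
		}

		return 0;
	}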
Signed-off-by: Jonas Gorski jonas.gorski@gmail.com
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Amit Pundir amit.pundir@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/spi/spi-bcm63xx.c | 9 +++++++++
 1 file changed, 9 insertions(+)
--- a/drivers/spi/spi-bcm63xx.c
+++ b/drivers/spi/spi-bcm63xx.c
@@ -428,6 +428,13 @@ static irqreturn_t bcm63xx_spi_interrupt
 	return IRQ_HANDLED;
 }
 
+static size_t bcm63xx_spi_max_length(struct spi_device *dev)
+{
+	struct bcm63xx_spi *bs = spi_master_get_devdata(spi->master);
+
+	return bs->fifo_size;
+}
+
 static const unsigned long bcm6348_spi_reg_offsets[] = {
 	[SPI_CMD]		= SPI_6348_CMD,
 	[SPI_INT_STATUS]	= SPI_6348_INT_STATUS,
@@ -541,6 +548,8 @@ static int bcm63xx_spi_probe(struct plat
 	master->transfer_one_message = bcm63xx_spi_transfer_one;
 	master->mode_bits = MODEBITS;
 	master->bits_per_word_mask = SPI_BPW_MASK(8);
+	master->max_transfer_size = bcm63xx_spi_max_length;
+	master->max_message_size = bcm63xx_spi_max_length;
 	master->auto_runtime_pm = true;
 	bs->msg_type_shift = bs->reg_offsets[SPI_MSG_TYPE_SHIFT];
 	bs->msg_ctl_width = bs->reg_offsets[SPI_MSG_CTL_WIDTH];
On Fri, Jul 20, 2018 at 02:13:28PM +0200, Greg Kroah-Hartman wrote:

> The bcm63xx SPI controller does not allow manual control of the CS
> lines and will toggle it automatically before and after sending data,
> so we are limited to messages that fit in the FIFO buffer. Since the
> CS lines aren't available as GPIOs either, we will need to make slave
> drivers aware of this limitation so they can handle them accordingly.

Same review comment as before: this will ask other code (which presumably never worked?) to change how they use the hardware in order to cope with the limitations, which seems a bit risky to me given that limits like this are fairly rare, so that code isn't super well exercised in general.
On Fri, Jul 20, 2018 at 01:34:43PM +0100, Mark Brown wrote:

> On Fri, Jul 20, 2018 at 02:13:28PM +0200, Greg Kroah-Hartman wrote:
> > The bcm63xx SPI controller does not allow manual control of the CS
> > lines and will toggle it automatically before and after sending data,
> > so we are limited to messages that fit in the FIFO buffer. Since the
> > CS lines aren't available as GPIOs either, we will need to make slave
> > drivers aware of this limitation so they can handle them accordingly.
>
> Same review comment as before: this will ask other code (which
> presumably never worked?) to change how they use the hardware in order
> to cope with the limitations, which seems a bit risky to me given that
> limits like this are fairly rare, so that code isn't super well
> exercised in general.

Ok, now I think I understand this better. I've now dropped this, and the follow-on fixup patch, from the 4.9 queue. Thanks for being patient with me.
greg k-h
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonas Gorski jonas.gorski@gmail.com
commit 9c86b846ce02f7e35d7234cf090b80553eba5389 upstream.
Check the return code of clk_prepare_enable() and change one last instance of plain clk_enable() to clk_prepare_enable(). Also properly disable and release the clock in error paths and on remove for enetsw.
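For reference, a generic sketch of the pattern the patch enforces (hypothetical device setup, "mac" clock name assumed): clk_enable() without a prior clk_prepare() is not allowed by the clk API, and clk_prepare_enable() can fail, so its return value must be checked and unwound:

	static int example_clk_setup(struct device *dev, struct clk **out)
	{
		struct clk *clk = clk_get(dev, "mac");	/* hypothetical clock name */
		int ret;

		if (IS_ERR(clk))
			return PTR_ERR(clk);

		ret = clk_prepare_enable(clk);	/* prepare + enable, may fail */
		if (ret) {
			clk_put(clk);		/* unwind on the error path */
			return ret;
		}

		*out = clk;
		return 0;
	}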
Signed-off-by: Jonas Gorski jonas.gorski@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Amit Pundir amit.pundir@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/broadcom/bcm63xx_enet.c | 31 ++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 8 deletions(-)
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -1790,7 +1790,9 @@ static int bcm_enet_probe(struct platfor
 		ret = PTR_ERR(priv->mac_clk);
 		goto out;
 	}
-	clk_prepare_enable(priv->mac_clk);
+	ret = clk_prepare_enable(priv->mac_clk);
+	if (ret)
+		goto out_put_clk_mac;
 
 	/* initialize default and fetch platform data */
 	priv->rx_ring_size = BCMENET_DEF_RX_DESC;
@@ -1822,9 +1824,11 @@ static int bcm_enet_probe(struct platfor
 		if (IS_ERR(priv->phy_clk)) {
 			ret = PTR_ERR(priv->phy_clk);
 			priv->phy_clk = NULL;
-			goto out_put_clk_mac;
+			goto out_disable_clk_mac;
 		}
-		clk_prepare_enable(priv->phy_clk);
+		ret = clk_prepare_enable(priv->phy_clk);
+		if (ret)
+			goto out_put_clk_phy;
 	}
 
 	/* do minimal hardware init to be able to probe mii bus */
@@ -1915,13 +1919,16 @@ out_free_mdio:
 out_uninit_hw:
 	/* turn off mdc clock */
 	enet_writel(priv, 0, ENET_MIISC_REG);
-	if (priv->phy_clk) {
+	if (priv->phy_clk)
 		clk_disable_unprepare(priv->phy_clk);
+
+out_put_clk_phy:
+	if (priv->phy_clk)
 		clk_put(priv->phy_clk);
-	}
 
-out_put_clk_mac:
+out_disable_clk_mac:
 	clk_disable_unprepare(priv->mac_clk);
+out_put_clk_mac:
 	clk_put(priv->mac_clk);
 out:
 	free_netdev(dev);
@@ -2766,7 +2773,9 @@ static int bcm_enetsw_probe(struct platf
 		ret = PTR_ERR(priv->mac_clk);
 		goto out_unmap;
 	}
-	clk_enable(priv->mac_clk);
+	ret = clk_prepare_enable(priv->mac_clk);
+	if (ret)
+		goto out_put_clk;
 
 	priv->rx_chan = 0;
 	priv->tx_chan = 1;
@@ -2787,7 +2796,7 @@ static int bcm_enetsw_probe(struct platf
 
 	ret = register_netdev(dev);
 	if (ret)
-		goto out_put_clk;
+		goto out_disable_clk;
 
 	netif_carrier_off(dev);
 	platform_set_drvdata(pdev, dev);
@@ -2796,6 +2805,9 @@ static int bcm_enetsw_probe(struct platf
 
 	return 0;
 
+out_disable_clk:
+	clk_disable_unprepare(priv->mac_clk);
+
 out_put_clk:
 	clk_put(priv->mac_clk);
 
@@ -2827,6 +2839,9 @@ static int bcm_enetsw_remove(struct plat
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	release_mem_region(res->start, resource_size(res));
 
+	clk_disable_unprepare(priv->mac_clk);
+	clk_put(priv->mac_clk);
+
 	free_netdev(dev);
 	return 0;
 }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonas Gorski jonas.gorski@gmail.com
commit d6213c1f2ad54a964b77471690264ed685718928 upstream.
The DMA controller regs actually point to DMA channel 0, so the write to ENETDMA_CFG_REG will actually modify a random DMA channel.
Since DMA controller registers do not exist on BCM6345, guard the write with the usual check for dma_has_sram.
Signed-off-by: Jonas Gorski jonas.gorski@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Amit Pundir amit.pundir@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/broadcom/bcm63xx_enet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -1063,7 +1063,8 @@ static int bcm_enet_open(struct net_devi
 	val = enet_readl(priv, ENET_CTL_REG);
 	val |= ENET_CTL_ENABLE_MASK;
 	enet_writel(priv, val, ENET_CTL_REG);
-	enet_dma_writel(priv, ENETDMA_CFG_EN_MASK, ENETDMA_CFG_REG);
+	if (priv->dma_has_sram)
+		enet_dma_writel(priv, ENETDMA_CFG_EN_MASK, ENETDMA_CFG_REG);
 	enet_dmac_writel(priv, priv->dma_chan_en_mask,
 			 ENETDMAC_CHANCFG, priv->rx_chan);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Lamparter chunkeey@googlemail.com
commit a728a196d253530f17da5c86dc7dfbe58c5f7094 upstream.
alg entries are only added to the list after the registration was successful. If the registration failed, the entry was never added to the list in the first place.
Signed-off-by: Christian Lamparter chunkeey@googlemail.com
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Amit Pundir amit.pundir@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/crypto/amcc/crypto4xx_core.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/drivers/crypto/amcc/crypto4xx_core.c
+++ b/drivers/crypto/amcc/crypto4xx_core.c
@@ -1034,12 +1034,10 @@ int crypto4xx_register_alg(struct crypto
 			break;
 		}
 
-		if (rc) {
-			list_del(&alg->entry);
+		if (rc)
 			kfree(alg);
-		} else {
+		else
 			list_add_tail(&alg->entry, &sec_dev->alg_list);
-		}
 	}
 
 	return 0;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Lamparter chunkeey@googlemail.com
commit 5d59ad6eea82ef8df92b4109615a0dde9d8093e9 upstream.
If one of the later memory allocations in crypto4xx_build_pdr() fails, dev->pdr (and/or) dev->pdr_uinfo wouldn't be freed.
crypto4xx_build_sdr() has the same issue with dev->sdr.
Signed-off-by: Christian Lamparter chunkeey@googlemail.com
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Amit Pundir amit.pundir@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/crypto/amcc/crypto4xx_core.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)
--- a/drivers/crypto/amcc/crypto4xx_core.c
+++ b/drivers/crypto/amcc/crypto4xx_core.c
@@ -208,7 +208,7 @@ static u32 crypto4xx_build_pdr(struct cr
 				  dev->pdr_pa);
 		return -ENOMEM;
 	}
-	memset(dev->pdr, 0,  sizeof(struct ce_pd) * PPC4XX_NUM_PD);
+	memset(dev->pdr, 0, sizeof(struct ce_pd) * PPC4XX_NUM_PD);
 	dev->shadow_sa_pool = dma_alloc_coherent(dev->core_dev->device,
 				   256 * PPC4XX_NUM_PD,
 				   &dev->shadow_sa_pool_pa,
@@ -241,13 +241,15 @@ static u32 crypto4xx_build_pdr(struct cr
 
 static void crypto4xx_destroy_pdr(struct crypto4xx_device *dev)
 {
-	if (dev->pdr != NULL)
+	if (dev->pdr)
 		dma_free_coherent(dev->core_dev->device,
 				  sizeof(struct ce_pd) * PPC4XX_NUM_PD,
 				  dev->pdr, dev->pdr_pa);
+
 	if (dev->shadow_sa_pool)
 		dma_free_coherent(dev->core_dev->device, 256 * PPC4XX_NUM_PD,
 				  dev->shadow_sa_pool, dev->shadow_sa_pool_pa);
+
 	if (dev->shadow_sr_pool)
 		dma_free_coherent(dev->core_dev->device,
 			sizeof(struct sa_state_record) * PPC4XX_NUM_PD,
@@ -417,12 +419,12 @@ static u32 crypto4xx_build_sdr(struct cr
 
 static void crypto4xx_destroy_sdr(struct crypto4xx_device *dev)
 {
-	if (dev->sdr != NULL)
+	if (dev->sdr)
 		dma_free_coherent(dev->core_dev->device,
 				  sizeof(struct ce_sd) * PPC4XX_NUM_SD,
 				  dev->sdr, dev->sdr_pa);
 
-	if (dev->scatter_buffer_va != NULL)
+	if (dev->scatter_buffer_va)
 		dma_free_coherent(dev->core_dev->device,
 				  dev->scatter_buffer_size * PPC4XX_NUM_SD,
 				  dev->scatter_buffer_va,
@@ -1191,7 +1193,7 @@ static int crypto4xx_probe(struct platfo
 
 	rc = crypto4xx_build_gdr(core_dev->dev);
 	if (rc)
-		goto err_build_gdr;
+		goto err_build_pdr;
 
 	rc = crypto4xx_build_sdr(core_dev->dev);
 	if (rc)
@@ -1234,12 +1236,11 @@ err_iomap:
 err_request_irq:
 	irq_dispose_mapping(core_dev->irq);
 	tasklet_kill(&core_dev->tasklet);
-	crypto4xx_destroy_sdr(core_dev->dev);
 err_build_sdr:
+	crypto4xx_destroy_sdr(core_dev->dev);
 	crypto4xx_destroy_gdr(core_dev->dev);
-err_build_gdr:
-	crypto4xx_destroy_pdr(core_dev->dev);
 err_build_pdr:
+	crypto4xx_destroy_pdr(core_dev->dev);
 	kfree(core_dev->dev);
 err_alloc_dev:
 	kfree(core_dev);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Gustavo A. R. Silva" gustavo@embeddedor.com
[ Upstream commit ced9e191501e52b95e1b57b8e0db00943869eed0 ]
pool can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/atm/zatm.c:1491 zatm_ioctl() warn: potential spectre issue 'zatm_dev->pool_info' (local cap)
Fix this by sanitizing pool before using it to index zatm_dev->pool_info.
Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
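For reference, the general hardening pattern looks like this (a minimal sketch with illustrative names -- idx, table, NR_ENTRIES -- not the driver code itself): clamp the index under speculation immediately after the bounds check.

	#include <linux/nospec.h>

	/* idx comes from user space; table has NR_ENTRIES entries. */
	if (idx < 0 || idx >= NR_ENTRIES)
		return -EINVAL;
	/* Under speculation, array_index_nospec() forces idx to 0 when it
	 * is out of range, so the CPU cannot load table[idx] past the check.
	 */
	idx = array_index_nospec(idx, NR_ENTRIES);
	val = table[idx];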
Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/atm/zatm.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/atm/zatm.c +++ b/drivers/atm/zatm.c @@ -1483,6 +1483,8 @@ static int zatm_ioctl(struct atm_dev *de return -EFAULT; if (pool < 0 || pool > ZATM_LAST_POOL) return -EINVAL; + pool = array_index_nospec(pool, + ZATM_LAST_POOL + 1); if (copy_from_user(&info, &((struct zatm_pool_req __user *) arg)->info, sizeof(info))) return -EFAULT;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 30877961b1cdd6fdca783c2e8c4f0f47e95dc58c ]
Commit 296d48568042 ("ipvlan: inherit MTU from master device") adjusted the mtu from the master device when creating an ipvlan device, but it would also override the mtu value set in rtnl_create_link. This causes the IFLA_MTU param not to take effect.

So this patch does not adjust the mtu if the IFLA_MTU param is set when creating an ipvlan device.
Fixes: 296d48568042 ("ipvlan: inherit MTU from master device") Reported-by: Jianlin Shi jishi@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ipvlan/ipvlan_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -525,7 +525,8 @@ static int ipvlan_link_new(struct net *s ipvlan->dev = dev; ipvlan->port = port; ipvlan->sfeatures = IPVLAN_FEATURES; - ipvlan_adjust_mtu(ipvlan, phy_dev); + if (!tb[IFLA_MTU]) + ipvlan_adjust_mtu(ipvlan, phy_dev); INIT_LIST_HEAD(&ipvlan->addrs);
/* TODO Probably put random address here to be presented to the
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 74174fe5634ffbf645a7ca5a261571f700b2f332 ]
On fast hosts or malicious bots, we trigger a DCCP_BUG() which seems excessive.
syzbot reported:
BUG: delta (-6195) <= 0 at net/dccp/ccids/ccid3.c:628/ccid3_hc_rx_send_feedback() CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.18.0-rc1+ #112 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 ccid3_hc_rx_send_feedback net/dccp/ccids/ccid3.c:628 [inline] ccid3_hc_rx_packet_recv.cold.16+0x38/0x71 net/dccp/ccids/ccid3.c:793 ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline] dccp_deliver_input_to_ccids+0xf0/0x280 net/dccp/input.c:180 dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378 dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654 sk_backlog_rcv include/net/sock.h:914 [inline] __sk_receive_skb+0x3ba/0xd80 net/core/sock.c:517 dccp_v4_rcv+0x10f9/0x1f58 net/dccp/ipv4.c:875 ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215 NF_HOOK include/linux/netfilter.h:287 [inline] ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256 dst_input include/net/dst.h:450 [inline] ip_rcv_finish+0x823/0x2220 net/ipv4/ip_input.c:396 NF_HOOK include/linux/netfilter.h:287 [inline] ip_rcv+0xa18/0x1284 net/ipv4/ip_input.c:492 __netif_receive_skb_core+0x2488/0x3680 net/core/dev.c:4628 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 process_backlog+0x219/0x760 net/core/dev.c:5373 napi_poll net/core/dev.c:5771 [inline] net_rx_action+0x7da/0x1980 net/core/dev.c:5837 __do_softirq+0x2e8/0xb17 kernel/softirq.c:284 run_ksoftirqd+0x86/0x100 kernel/softirq.c:645 smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164 kthread+0x345/0x410 kernel/kthread.c:240 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Gerrit Renker gerrit@erg.abdn.ac.uk Cc: dccp@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dccp/ccids/ccid3.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -624,9 +624,8 @@ static void ccid3_hc_rx_send_feedback(st case CCID3_FBACK_PERIODIC: delta = ktime_us_delta(now, hc->rx_tstamp_last_feedback); if (delta <= 0) - DCCP_BUG("delta (%ld) <= 0", (long)delta); - else - hc->rx_x_recv = scaled_div32(hc->rx_bytes_recv, delta); + delta = 1; + hc->rx_x_recv = scaled_div32(hc->rx_bytes_recv, delta); break; default: return;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 0ce4e70ff00662ad7490e545ba0cd8c1fa179fca ]
To compute delays, it is better not to use the time of day, which can be changed by admins or malicious programs.
Also change ccid3_first_li() to use s64 type for delta variable to avoid potential overflows.
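The resulting pattern, as a minimal sketch (variable names are illustrative), takes both timestamps from the monotonic clock and clamps the difference before it is used as a divisor:

	ktime_t now = ktime_get();	/* monotonic; settimeofday() can't move it */
	s64 delta = ktime_us_delta(now, last_feedback);

	if (delta <= 0)			/* still possible on very fast hosts */
		delta = 1;		/* avoids a divide-by-zero below */
	x_recv = scaled_div32(bytes_recv, delta);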
Signed-off-by: Eric Dumazet edumazet@google.com Cc: Gerrit Renker gerrit@erg.abdn.ac.uk Cc: dccp@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/dccp/ccids/ccid3.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -599,7 +599,7 @@ static void ccid3_hc_rx_send_feedback(st { struct ccid3_hc_rx_sock *hc = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - ktime_t now = ktime_get_real(); + ktime_t now = ktime_get(); s64 delta = 0;
switch (fbtype) { @@ -631,7 +631,7 @@ static void ccid3_hc_rx_send_feedback(st return; }
- ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta, + ccid3_pr_debug("Interval %lldusec, X_recv=%u, 1/p=%u\n", delta, hc->rx_x_recv, hc->rx_pinv);
hc->rx_tstamp_last_feedback = now; @@ -678,7 +678,8 @@ static int ccid3_hc_rx_insert_options(st static u32 ccid3_first_li(struct sock *sk) { struct ccid3_hc_rx_sock *hc = ccid3_hc_rx_sk(sk); - u32 x_recv, p, delta; + u32 x_recv, p; + s64 delta; u64 fval;
if (hc->rx_rtt == 0) { @@ -686,7 +687,9 @@ static u32 ccid3_first_li(struct sock *s hc->rx_rtt = DCCP_FALLBACK_RTT; }
- delta = ktime_to_us(net_timedelta(hc->rx_tstamp_last_feedback)); + delta = ktime_us_delta(ktime_get(), hc->rx_tstamp_last_feedback); + if (delta <= 0) + delta = 1; x_recv = scaled_div32(hc->rx_bytes_recv, delta); if (x_recv == 0) { /* would also trigger divide-by-zero */ DCCP_WARN("X_recv==0\n");
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Vesker valex@mellanox.com
[ Upstream commit 603b7bcff824740500ddfa001d7a7168b0b38542 ]
The NUL character was not set correctly for the string containing the command length; this caused failures when reading the output of the command due to a random length. The fix is to initialize the output length string.
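The underlying idiom, sketched independently of the driver (buffer size and names are illustrative), guarantees the parser always sees a NUL-terminated string:

	char buf[8] = {0};		/* zero-initialized: always NUL-terminated */

	if (count > sizeof(buf) - 1)	/* keep at least one byte for the NUL */
		return -EFAULT;
	if (copy_from_user(buf, user_buf, count))
		return -EFAULT;
	err = sscanf(buf, "%d", &outlen);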
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Alex Vesker valex@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -1256,7 +1256,7 @@ static ssize_t outlen_write(struct file { struct mlx5_core_dev *dev = filp->private_data; struct mlx5_cmd_debug *dbg = &dev->cmd.dbg; - char outlen_str[8]; + char outlen_str[8] = {0}; int outlen; void *ptr; int err; @@ -1271,8 +1271,6 @@ static ssize_t outlen_write(struct file if (copy_from_user(outlen_str, buf, count)) return -EFAULT;
- outlen_str[7] = 0; - err = sscanf(outlen_str, "%d", &outlen); if (err < 0) return err;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shay Agroskin shayag@mellanox.com
[ Upstream commit d14fcb8d877caf1b8d6bd65d444bf62b21f2070c ]
The driver allocates the wrong size (due to a wrong struct name) when issuing a query/set request to the NIC's register.
Fixes: d8880795dabf ("net/mlx5e: Implement DCBNL IEEE max rate") Signed-off-by: Shay Agroskin shayag@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/port.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -575,7 +575,7 @@ EXPORT_SYMBOL_GPL(mlx5_set_port_prio_tc) static int mlx5_set_port_qetcr_reg(struct mlx5_core_dev *mdev, u32 *in, int inlen) { - u32 out[MLX5_ST_SZ_DW(qtct_reg)]; + u32 out[MLX5_ST_SZ_DW(qetc_reg)];
if (!MLX5_CAP_GEN(mdev, ets)) return -ENOTSUPP; @@ -587,7 +587,7 @@ static int mlx5_set_port_qetcr_reg(struc static int mlx5_query_port_qetcr_reg(struct mlx5_core_dev *mdev, u32 *out, int outlen) { - u32 in[MLX5_ST_SZ_DW(qtct_reg)]; + u32 in[MLX5_ST_SZ_DW(qetc_reg)];
if (!MLX5_CAP_GEN(mdev, ets)) return -ENOTSUPP;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Konstantin Khlebnikov khlebnikov@yandex-team.ru
[ Upstream commit 7e85dc8cb35abf16455f1511f0670b57c1a84608 ]
When blackhole is used on top of a classful qdisc like hfsc, it breaks the qlen and backlog counters because packets disappear without notice.

In HFSC, a non-zero qlen while all classes are inactive triggers the warning: WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc] and schedules the watchdog work endlessly.

This patch returns __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS; this flag tells the upper layer that the packet is gone and isn't queued.
Signed-off-by: Konstantin Khlebnikov khlebnikov@yandex-team.ru Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/sch_blackhole.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sched/sch_blackhole.c +++ b/net/sched/sch_blackhole.c @@ -21,7 +21,7 @@ static int blackhole_enqueue(struct sk_b struct sk_buff **to_free) { qdisc_drop(skb, sch, to_free); - return NET_XMIT_SUCCESS; + return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; }
static struct sk_buff *blackhole_dequeue(struct Qdisc *sch)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f ]
After commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"), sungem owners reported the infamous "eth0: hw csum failure" message.
CHECKSUM_COMPLETE has in fact never worked for this driver, but this was masked by the fact that upper stacks had to strip the FCS, and therefore skb->ip_summed was set back to CHECKSUM_NONE before my recent change.
The driver configures a number of bytes to skip when the chip computes the checksum, and for some reason only half of the Ethernet header was being skipped.

A second problem is that we should strip the FCS by default, unless the driver is updated to eventually support NETIF_F_RXFCS in the future.

Finally, the driver should check whether the NETIF_F_RXCSUM feature is enabled, so that the admin can turn off rx checksumming if wanted.
Many thanks to Andreas Schwab and Mathieu Malaterre for their help in debugging this issue.
Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Meelis Roos mroos@linux.ee Reported-by: Mathieu Malaterre malat@debian.org Reported-by: Andreas Schwab schwab@linux-m68k.org Tested-by: Andreas Schwab schwab@linux-m68k.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/sun/sungem.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-)
--- a/drivers/net/ethernet/sun/sungem.c +++ b/drivers/net/ethernet/sun/sungem.c @@ -59,8 +59,7 @@ #include <linux/sungem_phy.h> #include "sungem.h"
-/* Stripping FCS is causing problems, disabled for now */ -#undef STRIP_FCS +#define STRIP_FCS
#define DEFAULT_MSG (NETIF_MSG_DRV | \ NETIF_MSG_PROBE | \ @@ -434,7 +433,7 @@ static int gem_rxmac_reset(struct gem *g writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW); writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK); val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); writel(val, gp->regs + RXDMA_CFG); if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN) writel(((5 & RXDMA_BLANK_IPKTS) | @@ -759,7 +758,6 @@ static int gem_rx(struct gem *gp, int wo struct net_device *dev = gp->dev; int entry, drops, work_done = 0; u32 done; - __sum16 csum;
if (netif_msg_rx_status(gp)) printk(KERN_DEBUG "%s: rx interrupt, done: %d, rx_new: %d\n", @@ -854,9 +852,13 @@ static int gem_rx(struct gem *gp, int wo skb = copy_skb; }
- csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff); - skb->csum = csum_unfold(csum); - skb->ip_summed = CHECKSUM_COMPLETE; + if (likely(dev->features & NETIF_F_RXCSUM)) { + __sum16 csum; + + csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff); + skb->csum = csum_unfold(csum); + skb->ip_summed = CHECKSUM_COMPLETE; + } skb->protocol = eth_type_trans(skb, gp->dev);
napi_gro_receive(&gp->napi, skb); @@ -1754,7 +1756,7 @@ static void gem_init_dma(struct gem *gp) writel(0, gp->regs + TXDMA_KICK);
val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); writel(val, gp->regs + RXDMA_CFG);
writel(desc_dma >> 32, gp->regs + RXDMA_DBHI); @@ -2972,8 +2974,8 @@ static int gem_init_one(struct pci_dev * pci_set_drvdata(pdev, dev);
/* We can do scatter/gather and HW checksum */ - dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM; - dev->features |= dev->hw_features | NETIF_F_RXCSUM; + dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM; + dev->features = dev->hw_features; if (pci_using_dac) dev->features |= NETIF_F_HIGHDMA;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com
[ Upstream commit cc9b27cdf7bd3c86df73439758ac1564bc8f5bbe ]
Use the correct size value while copying chassis/port id values.
Fixes: 6ad8c632e ("qed: Add support for query/config dcbx.") Signed-off-by: Sudarsana Reddy Kalluru Sudarsana.Kalluru@cavium.com Signed-off-by: Michal Kalderon Michal.Kalderon@cavium.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c @@ -677,9 +677,9 @@ qed_dcbx_get_local_lldp_params(struct qe p_local = &p_hwfn->p_dcbx_info->lldp_local[LLDP_NEAREST_BRIDGE];
memcpy(params->lldp_local.local_chassis_id, p_local->local_chassis_id, - ARRAY_SIZE(p_local->local_chassis_id)); + sizeof(p_local->local_chassis_id)); memcpy(params->lldp_local.local_port_id, p_local->local_port_id, - ARRAY_SIZE(p_local->local_port_id)); + sizeof(p_local->local_port_id)); }
static void @@ -692,9 +692,9 @@ qed_dcbx_get_remote_lldp_params(struct q p_remote = &p_hwfn->p_dcbx_info->lldp_remote[LLDP_NEAREST_BRIDGE];
memcpy(params->lldp_remote.peer_chassis_id, p_remote->peer_chassis_id, - ARRAY_SIZE(p_remote->peer_chassis_id)); + sizeof(p_remote->peer_chassis_id)); memcpy(params->lldp_remote.peer_port_id, p_remote->peer_port_id, - ARRAY_SIZE(p_remote->peer_port_id)); + sizeof(p_remote->peer_port_id)); }
static int
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudarsana Reddy Kalluru sudarsana.kalluru@cavium.com
[ Upstream commit bb7858ba1102f82470a917e041fd23e6385c31be ]
Memory size is limited in the kdump kernel environment. Allocating more msix-vectors (or queues) consumes a few tens of MBs of memory, which might lead to kdump kernel failure. This patch limits the number of MSI-X vectors in the kdump kernel to the minimum required value (i.e., 2 per engine).
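As a sketch of the approach (names are illustrative; the real values come from the patch below), the kdump environment is detected at setup time and the vector request is clamped to the already-computed minimum:

	#include <linux/crash_dump.h>

	if (is_kdump_kernel())			/* booted as a crash/capture kernel */
		num_vectors = min_msix_cnt;	/* e.g. 2 per engine (hwfn) */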
Fixes: fe56b9e6a ("qed: Add module with basic common support") Signed-off-by: Sudarsana Reddy Kalluru Sudarsana.Kalluru@cavium.com Signed-off-by: Michal Kalderon Michal.Kalderon@cavium.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qed/qed_main.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -23,6 +23,7 @@ #include <linux/vmalloc.h> #include <linux/qed/qed_if.h> #include <linux/qed/qed_ll2_if.h> +#include <linux/crash_dump.h>
#include "qed.h" #include "qed_sriov.h" @@ -701,6 +702,14 @@ static int qed_slowpath_setup_int(struct /* We want a minimum of one slowpath and one fastpath vector per hwfn */ cdev->int_params.in.min_msix_cnt = cdev->num_hwfns * 2;
+ if (is_kdump_kernel()) { + DP_INFO(cdev, + "Kdump kernel: Limit the max number of requested MSI-X vectors to %hd\n", + cdev->int_params.in.min_msix_cnt); + cdev->int_params.in.num_vectors = + cdev->int_params.in.min_msix_cnt; + } + rc = qed_set_int_mode(cdev, false); if (rc) { DP_ERR(cdev, "qed_slowpath_setup_int ERR\n");
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby jslaby@suse.cz
[ Upstream commit 0ee1f4734967af8321ecebaf9c74221ace34f2d5 ]
When unplugging an r8152 adapter while the interface is UP, the NIC becomes unusable. usb->disconnect (aka rtl8152_disconnect) deletes the napi. Then, rtl8152_disconnect calls unregister_netdev and that invokes netdev->ndo_stop (aka rtl8152_close). rtl8152_close tries to napi_disable, but the napi was already deleted by disconnect above, so the first while loop in napi_disable never finishes. This results in a complete deadlock of the network layer, as rtnl_mutex is held by unregister_netdev.
So avoid the call to napi_disable in rtl8152_close when the device is already gone.
The other calls to usb_kill_urb, cancel_delayed_work_sync, netif_stop_queue etc. seem to be fine; the urb and netdev are not destroyed yet.
Signed-off-by: Jiri Slaby jslaby@suse.cz Cc: linux-usb@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/r8152.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -3327,7 +3327,8 @@ static int rtl8152_close(struct net_devi #ifdef CONFIG_PM_SLEEP unregister_pm_notifier(&tp->pm_notifier); #endif - napi_disable(&tp->napi); + if (!test_bit(RTL8152_UNPLUG, &tp->flags)) + napi_disable(&tp->napi); clear_bit(WORK_ENABLE, &tp->flags); usb_kill_urb(tp->intr_urb); cancel_delayed_work_sync(&tp->schedule);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuchung Cheng ycheng@google.com
[ Upstream commit c860e997e9170a6d68f9d1e6e2cf61f572191aaf ]
The Fast Open key could be stored in a different endianness depending on the CPU. Previously, hosts of different endianness in a server farm using the same key config (sysctl value) would produce different cookies. This patch fixes it by always storing the key as little endian, keeping the same API for LE hosts.
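The conversion boundary can be sketched in isolation (simplified; the real code also parses and prints the sysctl string): the stored copy has a fixed little-endian layout, and host byte order appears only at the user-visible edges.

	__le32 key[4];		/* persistent copy, fixed little-endian layout */
	u32 user_key[4];	/* host order, only for printing/parsing */
	int i;

	for (i = 0; i < 4; i++)		/* export: stored LE -> host order */
		user_key[i] = le32_to_cpu(key[i]);

	for (i = 0; i < 4; i++)		/* import: host order -> stored LE */
		key[i] = cpu_to_le32(user_key[i]);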
Reported-by: Daniele Iamartino danielei@google.com Signed-off-by: Yuchung Cheng ycheng@google.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: Neal Cardwell ncardwell@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/sysctl_net_ipv4.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
--- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -212,8 +212,9 @@ static int proc_tcp_fastopen_key(struct { struct ctl_table tbl = { .maxlen = (TCP_FASTOPEN_KEY_LENGTH * 2 + 10) }; struct tcp_fastopen_context *ctxt; - int ret; u32 user_key[4]; /* 16 bytes, matching TCP_FASTOPEN_KEY_LENGTH */ + __le32 key[4]; + int ret, i;
tbl.data = kmalloc(tbl.maxlen, GFP_KERNEL); if (!tbl.data) @@ -222,11 +223,14 @@ static int proc_tcp_fastopen_key(struct rcu_read_lock(); ctxt = rcu_dereference(tcp_fastopen_ctx); if (ctxt) - memcpy(user_key, ctxt->key, TCP_FASTOPEN_KEY_LENGTH); + memcpy(key, ctxt->key, TCP_FASTOPEN_KEY_LENGTH); else - memset(user_key, 0, sizeof(user_key)); + memset(key, 0, sizeof(key)); rcu_read_unlock();
+ for (i = 0; i < ARRAY_SIZE(key); i++) + user_key[i] = le32_to_cpu(key[i]); + snprintf(tbl.data, tbl.maxlen, "%08x-%08x-%08x-%08x", user_key[0], user_key[1], user_key[2], user_key[3]); ret = proc_dostring(&tbl, write, buffer, lenp, ppos); @@ -242,12 +246,16 @@ static int proc_tcp_fastopen_key(struct * first invocation of tcp_fastopen_cookie_gen */ tcp_fastopen_init_key_once(false); - tcp_fastopen_reset_cipher(user_key, TCP_FASTOPEN_KEY_LENGTH); + + for (i = 0; i < ARRAY_SIZE(user_key); i++) + key[i] = cpu_to_le32(user_key[i]); + + tcp_fastopen_reset_cipher(key, TCP_FASTOPEN_KEY_LENGTH); }
bad_key: pr_debug("proc FO key set 0x%x-%x-%x-%x <- 0x%s: %u\n", - user_key[0], user_key[1], user_key[2], user_key[3], + user_key[0], user_key[1], user_key[2], user_key[3], (char *)tbl.data, ret); kfree(tbl.data); return ret;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang jasowang@redhat.com
[ Upstream commit b8f1f65882f07913157c44673af7ec0b308d03eb ]
Sock will be NULL if we pass -1 to vhost_net_set_backend(), but when we hit errors during ubuf allocation, the code does not check for NULL before calling sockfd_put(); this leads to a NULL dereference. Fix it by checking the sock pointer first.
Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support") Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vhost/net.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1052,7 +1052,8 @@ err_used: if (ubufs) vhost_net_ubuf_put_wait_and_free(ubufs); err_ubufs: - sockfd_put(sock); + if (sock) + sockfd_put(sock); err_vq: mutex_unlock(&vq->mutex); err:
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 945d015ee0c3095d2290e845565a23dedfd8027c ]
We should put copy_skb in receive_queue only after a successful call to virtio_net_hdr_from_skb().
syzbot report :
BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline] BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline] BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815 Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553
CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 __skb_unlink include/linux/skbuff.h:1843 [inline] __skb_dequeue include/linux/skbuff.h:1863 [inline] skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815 skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852 packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331 packet_release+0x630/0xd90 net/packet/af_packet.c:2991 __sock_release+0xd7/0x260 net/socket.c:603 sock_close+0x19/0x20 net/socket.c:1186 __fput+0x35b/0x8b0 fs/file_table.c:209 ____fput+0x15/0x20 fs/file_table.c:243 task_work_run+0x1ec/0x2a0 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x1b08/0x2750 kernel/exit.c:865 do_group_exit+0x177/0x440 kernel/exit.c:968 __do_sys_exit_group kernel/exit.c:979 [inline] __se_sys_exit_group kernel/exit.c:977 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4448e9 Code: Bad RIP value. RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9 RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001 RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000 R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0 R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000
Allocated by task 4553: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554 skb_clone+0x1f5/0x500 net/core/skbuff.c:1282 tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221 deliver_skb net/core/dev.c:1925 [inline] deliver_ptype_list_skb net/core/dev.c:1940 [inline] __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009 call_write_iter include/linux/fs.h:1795 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6c6/0x9f0 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 4553: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x86/0x2d0 mm/slab.c:3756 kfree_skbmem+0x154/0x230 net/core/skbuff.c:582 __kfree_skb net/core/skbuff.c:642 [inline] kfree_skb+0x1a5/0x580 net/core/skbuff.c:659 tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385 deliver_skb net/core/dev.c:1925 [inline] deliver_ptype_list_skb net/core/dev.c:1940 [inline] __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009 call_write_iter include/linux/fs.h:1795 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6c6/0x9f0 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff8801b044ecc0 which belongs to the cache skbuff_head_cache of size 232 The buggy address is located 0 bytes inside of 232-byte region [ffff8801b044ecc0, ffff8801b044eda8) The buggy address belongs to the page: page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0 flags: 0x2fffc0000000100(slab) raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0 raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
^ ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2265,6 +2265,12 @@ static int tpacket_rcv(struct sk_buff *s if (po->stats.stats1.tp_drops) status |= TP_STATUS_LOSING; } + + if (do_vnet && + __packet_rcv_vnet(skb, h.raw + macoff - + sizeof(struct virtio_net_hdr))) + goto drop_n_account; + po->stats.stats1.tp_packets++; if (copy_skb) { status |= TP_STATUS_COPY; @@ -2272,14 +2278,6 @@ static int tpacket_rcv(struct sk_buff *s } spin_unlock(&sk->sk_receive_queue.lock);
- if (do_vnet) { - if (__packet_rcv_vnet(skb, h.raw + macoff - - sizeof(struct virtio_net_hdr))) { - spin_lock(&sk->sk_receive_queue.lock); - goto drop_n_account; - } - } - skb_copy_bits(skb, 0, h.raw + macoff, snaplen);
if (!(ts_status = tpacket_get_timestamp(skb, &ts, po->tp_tstamp)))
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Vesker valex@mellanox.com
[ Upstream commit d412c31dae053bf30a1bc15582a9990df297a660 ]
The command interface can work in two modes: Events and Polling. In the general case, each time we invoke a command, a work is queued to handle it.
When working in events mode, the interrupt handler completes the command execution. On the other hand, when working in polling mode, the work itself completes it.

Due to a bug in the work handler, a command could be completed by the interrupt handler while the work handler hadn't finished yet, causing it to be completed once again if the command interface mode was changed from events to polling after the interrupt handler was called.
mlx5_unload_one()
  mlx5_stop_eqs()
    // Destroy the EQ before cmd EQ
    ...cmd_work_handler()
        write_doorbell()
        --> EVENT_TYPE_CMD
            mlx5_cmd_comp_handler() // First free
                free_ent(cmd, ent->idx)
                complete(&ent->done)

  <-- mlx5_stop_eqs //cmd was complete
  // move to polling before destroying the last cmd EQ
  mlx5_cmd_use_polling()
      cmd->mode = POLL;

  --> cmd_work_handler (continues)
      if (cmd->mode == POLL)
          mlx5_cmd_comp_handler() // Double free
The solution is to store the cmd->mode before writing the doorbell.
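In sketch form (ring_doorbell() here stands in for the real iowrite32be() doorbell write), the rule is: snapshot any shared state that is still needed before the doorbell, because the entry may be completed and freed from that point on.

	int cmd_mode = cmd->mode;	/* read before ownership can change */

	ring_doorbell(ent);		/* HW may complete and free ent now */

	if (cmd_mode == CMD_MODE_POLLING)	/* safe: uses the snapshot */
		poll_timeout(ent);		/* in polling mode we still own ent */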
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Alex Vesker valex@mellanox.com Signed-off-by: Saeed Mahameed saeedm@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -784,6 +784,7 @@ static void cmd_work_handler(struct work struct semaphore *sem; unsigned long flags; int alloc_ret; + int cmd_mode;
sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem; down(sem); @@ -830,6 +831,7 @@ static void cmd_work_handler(struct work set_signature(ent, !cmd->checksum_disabled); dump_command(dev, ent, 1); ent->ts1 = ktime_get_ns(); + cmd_mode = cmd->mode;
if (ent->callback) schedule_delayed_work(&ent->cb_timeout_work, cb_timeout); @@ -854,7 +856,7 @@ static void cmd_work_handler(struct work iowrite32be(1 << ent->idx, &dev->iseg->cmd_dbell); mmiowb(); /* if not in polling don't use ent after this point */ - if (cmd->mode == CMD_MODE_POLLING) { + if (cmd_mode == CMD_MODE_POLLING) { poll_timeout(ent); /* make sure we read the descriptor after ownership is SW */ rmb();
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gustavo A. R. Silva gustavo@embeddedor.com
commit 676bcfece19f83621e905aa55b5ed2d45cc4f2d3 upstream.
t.qset_idx can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:2286 cxgb_extension_ioctl() warn: potential spectre issue 'adapter->msix_info'
Fix this by sanitizing t.qset_idx before using it to index adapter->msix_info.
Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c @@ -50,6 +50,7 @@ #include <linux/stringify.h> #include <linux/sched.h> #include <linux/slab.h> +#include <linux/nospec.h> #include <asm/uaccess.h>
#include "common.h" @@ -2259,6 +2260,7 @@ static int cxgb_extension_ioctl(struct n
if (t.qset_idx >= nqsets) return -EINVAL; + t.qset_idx = array_index_nospec(t.qset_idx, nqsets);
q = &adapter->params.sge.qset[q1 + t.qset_idx]; t.rspq_size = q->rspq_size;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ping-Ke Shih pkshih@realtek.com
commit 9a98302de19991d51e067b88750585203b2a3ab6 upstream.
Without this patch, firmware will not run properly on rtl8821ae, and it causes bad user experience. For example, bad connection performance with low rate, higher power consumption, and so on.
rtl8821ae uses two kinds of firmware, for the normal and WoWLAN cases, and each firmware has its own data buffer and size. The original code always overwrote the size of the normal firmware, rtlpriv->rtlhal.fwsize, and this mismatch causes a firmware checksum error, so the firmware can't start.

In this situation, the driver gives the message "Firmware is not ready to run!".
Fixes: fe89707f0afa ("rtlwifi: rtl8821ae: Simplify loading of WOWLAN firmware") Signed-off-by: Ping-Ke Shih pkshih@realtek.com Cc: Stable stable@vger.kernel.org # 4.0+ Reviewed-by: Larry Finger Larry.Finger@lwfinger.net Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/realtek/rtlwifi/core.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/net/wireless/realtek/rtlwifi/core.c +++ b/drivers/net/wireless/realtek/rtlwifi/core.c @@ -131,7 +131,6 @@ found_alt: firmware->size); rtlpriv->rtlhal.wowlan_fwsize = firmware->size; } - rtlpriv->rtlhal.fwsize = firmware->size; release_firmware(firmware); }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Wahren stefan.wahren@i2se.com
commit dea39aca1d7aef1e2b95b07edeacf04cc8863a2e upstream.
The skb size calculation in lan78xx_tx_bh races with start_xmit, which could lead to rare kernel oopses. So protect the whole skb walk with a spin lock. As a benefit, we can unlink the skb directly.

This patch was tested on a Raspberry Pi 3B+.
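A condensed sketch of the locking pattern (the predicate name is illustrative): the queue's own lock must cover both the walk and the unlink, and __skb_unlink() is the right variant since the lock is already held.

	unsigned long flags;
	struct sk_buff *skb;

	spin_lock_irqsave(&tqp->lock, flags);
	for (skb = tqp->next; skb != (struct sk_buff *)tqp; skb = skb->next) {
		if (want_this_skb(skb)) {	/* illustrative predicate */
			__skb_unlink(skb, tqp);	/* lock already held */
			break;
		}
	}
	spin_unlock_irqrestore(&tqp->lock, flags);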
Link: https://github.com/raspberrypi/linux/issues/2608 Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet") Cc: stable stable@vger.kernel.org Signed-off-by: Floris Bos bos@je-eigen-domein.nl Signed-off-by: Stefan Wahren stefan.wahren@i2se.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/usb/lan78xx.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -2964,6 +2964,7 @@ static void lan78xx_tx_bh(struct lan78xx pkt_cnt = 0; count = 0; length = 0; + spin_lock_irqsave(&tqp->lock, flags); for (skb = tqp->next; pkt_cnt < tqp->qlen; skb = skb->next) { if (skb_is_gso(skb)) { if (pkt_cnt) { @@ -2972,7 +2973,8 @@ static void lan78xx_tx_bh(struct lan78xx } count = 1; length = skb->len - TX_OVERHEAD; - skb2 = skb_dequeue(tqp); + __skb_unlink(skb, tqp); + spin_unlock_irqrestore(&tqp->lock, flags); goto gso_skb; }
@@ -2981,6 +2983,7 @@ static void lan78xx_tx_bh(struct lan78xx skb_totallen = skb->len + roundup(skb_totallen, sizeof(u32)); pkt_cnt++; } + spin_unlock_irqrestore(&tqp->lock, flags);
/* copy to a single skb */ skb = alloc_skb(skb_totallen, GFP_ATOMIC);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 11ff7288beb2b7da889a014aff0a7b80bf8efcf3 upstream.
The ebtables evaluation loop expects targets to return positive values (jumps), or negative values (absolute verdicts).
This is completely different from what xtables does. In xtables, targets are expected to return the standard netfilter verdicts, i.e. NF_DROP, NF_ACCEPT, etc.
ebtables will consider these as jumps.
Therefore, reject any target found via the unspec fallback. v2: also reject watchers. ebtables ignores their return value, so a target that assumes skb ownership (and returns NF_STOLEN) causes a use-after-free.
The only watchers in the 'ebtables' front-end are log and nflog; both have AF_BRIDGE specific wrappers on kernel side.
Reported-by: syzbot+2b43f681169a2a0d306a@syzkaller.appspotmail.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/bridge/netfilter/ebtables.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -406,6 +406,12 @@ ebt_check_watcher(struct ebt_entry_watch watcher = xt_request_find_target(NFPROTO_BRIDGE, w->u.name, 0); if (IS_ERR(watcher)) return PTR_ERR(watcher); + + if (watcher->family != NFPROTO_BRIDGE) { + module_put(watcher->me); + return -ENOENT; + } + w->u.watcher = watcher;
par->target = watcher; @@ -727,6 +733,13 @@ ebt_check_entry(struct ebt_entry *e, str goto cleanup_watchers; }
+ /* Reject UNSPEC, xtables verdicts/return values are incompatible */ + if (target->family != NFPROTO_BRIDGE) { + module_put(target->me); + ret = -ENOENT; + goto cleanup_watchers; + } + t->u.target = target; if (t->u.target == &ebt_standard_target) { if (gap < sizeof(struct ebt_standard_target)) {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
commit fe10e398e860955bac4d28ec031b701d358465e4 upstream.
ReiserFS prepares log messages into a 1024-byte buffer with no bounds checks. Long messages, such as the "unknown mount option" warning when userspace passes a crafted mount options string, overflow this buffer. This causes KASAN to report a global-out-of-bounds write.
Fix it by truncating messages to the buffer size.
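The key property, sketched in isolation (field names are illustrative): scnprintf() returns the number of characters actually written, excluding the trailing NUL, so increments can be chained without ever overrunning the buffer.

	char buf[1024];
	char *p = buf;
	char * const end = buf + sizeof(buf);

	p += scnprintf(p, end - p, "[%d %d]", dir_id, objectid);
	p += scnprintf(p, end - p, ", item_len %d", item_len);
	/* scnprintf() never writes more than (end - p) bytes including the
	 * NUL, so long messages truncate instead of overflowing. */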
Link: http://lkml.kernel.org/r/20180707203621.30922-1-ebiggers3@gmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+b890b3335a4d8c608963@syzkaller.appspotmail.com Signed-off-by: Eric Biggers ebiggers@google.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/reiserfs/prints.c | 141 +++++++++++++++++++++++++++++---------------------- 1 file changed, 81 insertions(+), 60 deletions(-)
--- a/fs/reiserfs/prints.c +++ b/fs/reiserfs/prints.c @@ -76,83 +76,99 @@ static char *le_type(struct reiserfs_key }
/* %k */ -static void sprintf_le_key(char *buf, struct reiserfs_key *key) +static int scnprintf_le_key(char *buf, size_t size, struct reiserfs_key *key) { if (key) - sprintf(buf, "[%d %d %s %s]", le32_to_cpu(key->k_dir_id), - le32_to_cpu(key->k_objectid), le_offset(key), - le_type(key)); + return scnprintf(buf, size, "[%d %d %s %s]", + le32_to_cpu(key->k_dir_id), + le32_to_cpu(key->k_objectid), le_offset(key), + le_type(key)); else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]"); }
/* %K */ -static void sprintf_cpu_key(char *buf, struct cpu_key *key) +static int scnprintf_cpu_key(char *buf, size_t size, struct cpu_key *key) { if (key) - sprintf(buf, "[%d %d %s %s]", key->on_disk_key.k_dir_id, - key->on_disk_key.k_objectid, reiserfs_cpu_offset(key), - cpu_type(key)); + return scnprintf(buf, size, "[%d %d %s %s]", + key->on_disk_key.k_dir_id, + key->on_disk_key.k_objectid, + reiserfs_cpu_offset(key), cpu_type(key)); else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]"); }
-static void sprintf_de_head(char *buf, struct reiserfs_de_head *deh) +static int scnprintf_de_head(char *buf, size_t size, + struct reiserfs_de_head *deh) { if (deh) - sprintf(buf, - "[offset=%d dir_id=%d objectid=%d location=%d state=%04x]", - deh_offset(deh), deh_dir_id(deh), deh_objectid(deh), - deh_location(deh), deh_state(deh)); + return scnprintf(buf, size, + "[offset=%d dir_id=%d objectid=%d location=%d state=%04x]", + deh_offset(deh), deh_dir_id(deh), + deh_objectid(deh), deh_location(deh), + deh_state(deh)); else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]");
}
-static void sprintf_item_head(char *buf, struct item_head *ih) +static int scnprintf_item_head(char *buf, size_t size, struct item_head *ih) { if (ih) { - strcpy(buf, - (ih_version(ih) == KEY_FORMAT_3_6) ? "*3.6* " : "*3.5*"); - sprintf_le_key(buf + strlen(buf), &(ih->ih_key)); - sprintf(buf + strlen(buf), ", item_len %d, item_location %d, " - "free_space(entry_count) %d", - ih_item_len(ih), ih_location(ih), ih_free_space(ih)); + char *p = buf; + char * const end = buf + size; + + p += scnprintf(p, end - p, "%s", + (ih_version(ih) == KEY_FORMAT_3_6) ? + "*3.6* " : "*3.5*"); + + p += scnprintf_le_key(p, end - p, &ih->ih_key); + + p += scnprintf(p, end - p, + ", item_len %d, item_location %d, free_space(entry_count) %d", + ih_item_len(ih), ih_location(ih), + ih_free_space(ih)); + return p - buf; } else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]"); }
-static void sprintf_direntry(char *buf, struct reiserfs_dir_entry *de) +static int scnprintf_direntry(char *buf, size_t size, + struct reiserfs_dir_entry *de) { char name[20];
memcpy(name, de->de_name, de->de_namelen > 19 ? 19 : de->de_namelen); name[de->de_namelen > 19 ? 19 : de->de_namelen] = 0; - sprintf(buf, "\"%s\"==>[%d %d]", name, de->de_dir_id, de->de_objectid); + return scnprintf(buf, size, "\"%s\"==>[%d %d]", + name, de->de_dir_id, de->de_objectid); }
-static void sprintf_block_head(char *buf, struct buffer_head *bh) +static int scnprintf_block_head(char *buf, size_t size, struct buffer_head *bh) { - sprintf(buf, "level=%d, nr_items=%d, free_space=%d rdkey ", - B_LEVEL(bh), B_NR_ITEMS(bh), B_FREE_SPACE(bh)); + return scnprintf(buf, size, + "level=%d, nr_items=%d, free_space=%d rdkey ", + B_LEVEL(bh), B_NR_ITEMS(bh), B_FREE_SPACE(bh)); }
-static void sprintf_buffer_head(char *buf, struct buffer_head *bh) +static int scnprintf_buffer_head(char *buf, size_t size, struct buffer_head *bh) { - sprintf(buf, - "dev %pg, size %zd, blocknr %llu, count %d, state 0x%lx, page %p, (%s, %s, %s)", - bh->b_bdev, bh->b_size, - (unsigned long long)bh->b_blocknr, atomic_read(&(bh->b_count)), - bh->b_state, bh->b_page, - buffer_uptodate(bh) ? "UPTODATE" : "!UPTODATE", - buffer_dirty(bh) ? "DIRTY" : "CLEAN", - buffer_locked(bh) ? "LOCKED" : "UNLOCKED"); + return scnprintf(buf, size, + "dev %pg, size %zd, blocknr %llu, count %d, state 0x%lx, page %p, (%s, %s, %s)", + bh->b_bdev, bh->b_size, + (unsigned long long)bh->b_blocknr, + atomic_read(&(bh->b_count)), + bh->b_state, bh->b_page, + buffer_uptodate(bh) ? "UPTODATE" : "!UPTODATE", + buffer_dirty(bh) ? "DIRTY" : "CLEAN", + buffer_locked(bh) ? "LOCKED" : "UNLOCKED"); }
-static void sprintf_disk_child(char *buf, struct disk_child *dc) +static int scnprintf_disk_child(char *buf, size_t size, struct disk_child *dc) { - sprintf(buf, "[dc_number=%d, dc_size=%u]", dc_block_number(dc), - dc_size(dc)); + return scnprintf(buf, size, "[dc_number=%d, dc_size=%u]", + dc_block_number(dc), dc_size(dc)); }
static char *is_there_reiserfs_struct(char *fmt, int *what) @@ -189,55 +205,60 @@ static void prepare_error_buf(const char char *fmt1 = fmt_buf; char *k; char *p = error_buf; + char * const end = &error_buf[sizeof(error_buf)]; int what;
spin_lock(&error_lock);
- strcpy(fmt1, fmt); + if (WARN_ON(strscpy(fmt_buf, fmt, sizeof(fmt_buf)) < 0)) { + strscpy(error_buf, "format string too long", end - error_buf); + goto out_unlock; + }
while ((k = is_there_reiserfs_struct(fmt1, &what)) != NULL) { *k = 0;
- p += vsprintf(p, fmt1, args); + p += vscnprintf(p, end - p, fmt1, args);
switch (what) { case 'k': - sprintf_le_key(p, va_arg(args, struct reiserfs_key *)); + p += scnprintf_le_key(p, end - p, + va_arg(args, struct reiserfs_key *)); break; case 'K': - sprintf_cpu_key(p, va_arg(args, struct cpu_key *)); + p += scnprintf_cpu_key(p, end - p, + va_arg(args, struct cpu_key *)); break; case 'h': - sprintf_item_head(p, va_arg(args, struct item_head *)); + p += scnprintf_item_head(p, end - p, + va_arg(args, struct item_head *)); break; case 't': - sprintf_direntry(p, - va_arg(args, - struct reiserfs_dir_entry *)); + p += scnprintf_direntry(p, end - p, + va_arg(args, struct reiserfs_dir_entry *)); break; case 'y': - sprintf_disk_child(p, - va_arg(args, struct disk_child *)); + p += scnprintf_disk_child(p, end - p, + va_arg(args, struct disk_child *)); break; case 'z': - sprintf_block_head(p, - va_arg(args, struct buffer_head *)); + p += scnprintf_block_head(p, end - p, + va_arg(args, struct buffer_head *)); break; case 'b': - sprintf_buffer_head(p, - va_arg(args, struct buffer_head *)); + p += scnprintf_buffer_head(p, end - p, + va_arg(args, struct buffer_head *)); break; case 'a': - sprintf_de_head(p, - va_arg(args, - struct reiserfs_de_head *)); + p += scnprintf_de_head(p, end - p, + va_arg(args, struct reiserfs_de_head *)); break; }
- p += strlen(p); fmt1 = k + 2; } - vsprintf(p, fmt1, args); + p += vscnprintf(p, end - p, fmt1, args); +out_unlock: spin_unlock(&error_lock);
}
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
commit c604cb767049b78b3075497b80ebb8fd530ea2cc upstream.
My recent fix for dns_resolver_preparse() printing very long strings was incomplete, as shown by syzbot which still managed to hit the WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key:
precision 50001 too large WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0
The bug this time isn't just a printing bug, but also a logical error when multiple options ("#"-separated strings) are given in the key payload. Specifically, when separating an option string into name and value, if there is no value then the name is incorrectly considered to end at the end of the key payload, rather than the end of the current option. This bypasses validation of the option length, and also means that specifying multiple options is broken -- which presumably has gone unnoticed as there is currently only one valid option anyway.
A similar problem also applied to option values: the kstrtoul() call when parsing the "dnserror" option would read past the end of the current option and into the next option.
Fix these bugs by correctly computing the length of the option name and by copying the option value, null-terminated, into a temporary buffer.
Reproducer for the WARN_ONCE() that syzbot hit:
perl -e 'print "#A#", "\0" x 50000' | keyctl padd dns_resolver desc @s
Reproducer for "dnserror" option being parsed incorrectly (expected behavior is to fail when seeing the unknown option "foo", actual behavior was to read the dnserror value as "1#foo" and fail there):
perl -e 'print "#dnserror=1#foo\0"' | keyctl padd dns_resolver desc @s
Reported-by: syzbot syzkaller@googlegroups.com Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]") Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/dns_resolver/dns_key.c | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)
--- a/net/dns_resolver/dns_key.c +++ b/net/dns_resolver/dns_key.c @@ -87,35 +87,39 @@ dns_resolver_preparse(struct key_prepars opt++; kdebug("options: '%s'", opt); do { + int opt_len, opt_nlen; const char *eq; - int opt_len, opt_nlen, opt_vlen, tmp; + char optval[128];
next_opt = memchr(opt, '#', end - opt) ?: end; opt_len = next_opt - opt; - if (opt_len <= 0 || opt_len > 128) { + if (opt_len <= 0 || opt_len > sizeof(optval)) { pr_warn_ratelimited("Invalid option length (%d) for dns_resolver key\n", opt_len); return -EINVAL; }
- eq = memchr(opt, '=', opt_len) ?: end; - opt_nlen = eq - opt; - eq++; - opt_vlen = next_opt - eq; /* will be -1 if no value */ - - tmp = opt_vlen >= 0 ? opt_vlen : 0; - kdebug("option '%*.*s' val '%*.*s'", - opt_nlen, opt_nlen, opt, tmp, tmp, eq); + eq = memchr(opt, '=', opt_len); + if (eq) { + opt_nlen = eq - opt; + eq++; + memcpy(optval, eq, next_opt - eq); + optval[next_opt - eq] = '\0'; + } else { + opt_nlen = opt_len; + optval[0] = '\0'; + } + + kdebug("option '%*.*s' val '%s'", + opt_nlen, opt_nlen, opt, optval);
/* see if it's an error number representing a DNS error * that's to be recorded as the result in this key */ if (opt_nlen == sizeof(DNS_ERRORNO_OPTION) - 1 && memcmp(opt, DNS_ERRORNO_OPTION, opt_nlen) == 0) { kdebug("dns error number option"); - if (opt_vlen <= 0) - goto bad_option_value;
- ret = kstrtoul(eq, 10, &derrno); + ret = kstrtoul(optval, 10, &derrno); if (ret < 0) goto bad_option_value;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 84379c9afe011020e797e3f50a662b08a6355dcf upstream.
Eric Dumazet reports:
 Here is a reproducer of an annoying bug detected by syzkaller on our production kernel
 [..]
 ./b78305423 enable_conntrack
 Then:
 sleep 60
 dmesg | tail -10
 [ 171.599093] unregister_netdevice: waiting for lo to become free. Usage count = 2
 [ 181.631024] unregister_netdevice: waiting for lo to become free. Usage count = 2
 [ 191.687076] unregister_netdevice: waiting for lo to become free. Usage count = 2
 [ 201.703037] unregister_netdevice: waiting for lo to become free. Usage count = 2
 [ 211.711072] unregister_netdevice: waiting for lo to become free. Usage count = 2
 [ 221.959070] unregister_netdevice: waiting for lo to become free. Usage count = 2
The reproducer sends an ipv6 fragment that hits nfct defrag via the LOCAL_OUT hook. The skb gets queued until frag timer expiry -- 1 minute.

Normally nf_conntrack_reasm gets called during prerouting, so the skb has no dst yet, which might explain why this wasn't spotted earlier.
Reported-by: Eric Dumazet eric.dumazet@gmail.com Reported-by: John Sperbeck jsperbeck@google.com Signed-off-by: Florian Westphal fw@strlen.de Tested-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/ipv6/netfilter/nf_conntrack_reasm.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -618,6 +618,8 @@ int nf_ct_frag6_gather(struct net *net, fq->q.meat == fq->q.len && nf_ct_frag6_reasm(fq, skb, dev)) ret = 0; + else + skb_dst_drop(skb);
out_unlock: spin_unlock_bh(&fq->q.lock);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Santosh Shilimkar santosh.shilimkar@oracle.com
commit f1693c63ab133d16994cc50f773982b5905af264 upstream.
For the loop transport, which is self-loopback, remote port congestion updates aren't relevant. In fact, the xmit path already ignores them; the receive path needs to do the same.
Reported-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com Reviewed-by: Sowmini Varadhan sowmini.varadhan@oracle.com Signed-off-by: Santosh Shilimkar santosh.shilimkar@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/rds/loop.c | 1 + net/rds/rds.h | 5 +++++ net/rds/recv.c | 5 +++++ 3 files changed, 11 insertions(+)
--- a/net/rds/loop.c +++ b/net/rds/loop.c @@ -193,4 +193,5 @@ struct rds_transport rds_loop_transport .inc_copy_to_user = rds_message_inc_copy_to_user, .inc_free = rds_loop_inc_free, .t_name = "loopback", + .t_type = RDS_TRANS_LOOP, }; --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -440,6 +440,11 @@ struct rds_notifier { int n_status; };
+/* Available as part of RDS core, so doesn't need to participate + * in get_preferred transport etc + */ +#define RDS_TRANS_LOOP 3 + /** * struct rds_transport - transport specific behavioural hooks * --- a/net/rds/recv.c +++ b/net/rds/recv.c @@ -94,6 +94,11 @@ static void rds_recv_rcvbuf_delta(struct return;
rs->rs_rcv_bytes += delta; + + /* loop transport doesn't send/recv congestion updates */ + if (rs->rs_transport->t_type == RDS_TRANS_LOOP) + return; + now_congested = rs->rs_rcv_bytes > rds_sk_rcvbuf(rs);
rdsdebug("rs %p (%pI4:%u) recv bytes %d buf %d "
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 3bc53be9db21040b5d2de4d455f023c8c494aa68 upstream.
syzbot is reporting stalls at nfc_llcp_send_ui_frame() [1]. This is because nfc_llcp_send_ui_frame() retries the loop without any delay when the nonblocking nfc_alloc_send_skb() returns NULL.

Since there is no need to use MSG_DONTWAIT if we retry until sock_alloc_send_pskb() succeeds, let's use a blocking call. Also, in case an unexpected error occurs, let's break the loop if the blocking nfc_alloc_send_skb() fails.
[1] https://syzkaller.appspot.com/bug?id=4a131cc571c3733e0eff6bc673f4e36ae48f19c...
Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Reported-by: syzbot syzbot+d29d18215e477cfbfbdd@syzkaller.appspotmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/nfc/llcp_commands.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/net/nfc/llcp_commands.c +++ b/net/nfc/llcp_commands.c @@ -753,11 +753,14 @@ int nfc_llcp_send_ui_frame(struct nfc_ll pr_debug("Fragment %zd bytes remaining %zd", frag_len, remaining_len);
- pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT, + pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, 0, frag_len + LLCP_HEADER_SIZE, &err); if (pdu == NULL) { - pr_err("Could not allocate PDU\n"); - continue; + pr_err("Could not allocate PDU (error=%d)\n", err); + len -= remaining_len; + if (len == 0) + len = err; + break; }
pdu = llcp_add_header(pdu, dsap, ssap, LLCP_PDU_UI);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
Commit 1b7e2296a822dfd2349960addc42a139360ce769 upstream.
Shortly we will want to load a percpu variable in the return from userspace path. We can save an instruction by folding the addition of the percpu offset into the load instruction, and this patch adds a new helper to do so.
At the same time, we clean up this_cpu_ptr for consistency. First, as with {adr,ldr,str}_l, we change the template to take the destination register first, and name it dst. Second, we rename the macro to adr_this_cpu, following the scheme of adr_l and matching the newly added ldr_this_cpu.
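As a rough C analogue of what the two macros compute (illustrative only; the per-cpu offset is assumed to live in tpidr_el1, as in the assembly below):

    static inline unsigned long *adr_this_cpu_c(unsigned long *sym)
    {
            unsigned long off;

            /* read the per-cpu offset, as adr_this_cpu does */
            asm("mrs %0, tpidr_el1" : "=r" (off));
            return (unsigned long *)((unsigned long)sym + off);
    }

    static inline unsigned long ldr_this_cpu_c(unsigned long *sym)
    {
            /* ldr_this_cpu folds the add into the load's addressing mode */
            return *adr_this_cpu_c(sym);
    }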
Signed-off-by: Mark Rutland mark.rutland@arm.com Tested-by: Laura Abbott labbott@redhat.com Cc: Ard Biesheuvel ard.biesheuvel@linaro.org Cc: James Morse james.morse@arm.com Cc: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/assembler.h | 19 +++++++++++++++---- arch/arm64/kernel/entry.S | 2 +- 2 files changed, 16 insertions(+), 5 deletions(-)
--- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -239,14 +239,25 @@ lr .req x30 // link register .endm
/* + * @dst: Result of per_cpu(sym, smp_processor_id()) * @sym: The name of the per-cpu variable - * @reg: Result of per_cpu(sym, smp_processor_id()) * @tmp: scratch register */ - .macro this_cpu_ptr, sym, reg, tmp - adr_l \reg, \sym + .macro adr_this_cpu, dst, sym, tmp + adr_l \dst, \sym mrs \tmp, tpidr_el1 - add \reg, \reg, \tmp + add \dst, \dst, \tmp + .endm + + /* + * @dst: Result of READ_ONCE(per_cpu(sym, smp_processor_id())) + * @sym: The name of the per-cpu variable + * @tmp: scratch register + */ + .macro ldr_this_cpu dst, sym, tmp + adr_l \dst, \sym + mrs \tmp, tpidr_el1 + ldr \dst, [\dst, \tmp] .endm
/* --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -243,7 +243,7 @@ alternative_insn eret, nop, ARM64_UNMAP_ cmp x25, tsk b.ne 9998f
- this_cpu_ptr irq_stack, x25, x26 + adr_this_cpu x25, irq_stack, x26 mov x26, #IRQ_STACK_START_SP add x26, x25, x26
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
Commit 32b03d1059667a39e089c45ee38ec9c16332430f upstream.
KVM uses tpidr_el2 as its private vcpu register, which makes sense for non-vhe world switch as only KVM can access this register. This means vhe Linux has to use tpidr_el1, which KVM has to save/restore as part of the host context.
If the SDEI handler code runs behind KVM's back, it mustn't access any per-cpu variables. To allow this on systems with vhe we need to make the host use tpidr_el2, saving KVM from save/restoring it.
__guest_enter() stores the host_ctxt on the stack; do the same with the vcpu.
Signed-off-by: James Morse james.morse@arm.com Reviewed-by: Christoffer Dall cdall@linaro.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/hyp/entry.S | 10 +++++++--- arch/arm64/kvm/hyp/hyp-entry.S | 6 +++--- 2 files changed, 10 insertions(+), 6 deletions(-)
--- a/arch/arm64/kvm/hyp/entry.S +++ b/arch/arm64/kvm/hyp/entry.S @@ -62,8 +62,8 @@ ENTRY(__guest_enter) // Store the host regs save_callee_saved_regs x1
- // Store the host_ctxt for use at exit time - str x1, [sp, #-16]! + // Store host_ctxt and vcpu for use at exit time + stp x1, x0, [sp, #-16]!
add x18, x0, #VCPU_CONTEXT
@@ -159,6 +159,10 @@ abort_guest_exit_end: ENDPROC(__guest_exit)
ENTRY(__fpsimd_guest_restore) + // x0: esr + // x1: vcpu + // x2-x29,lr: vcpu regs + // vcpu x0-x1 on the stack stp x2, x3, [sp, #-16]! stp x4, lr, [sp, #-16]!
@@ -173,7 +177,7 @@ alternative_else alternative_endif isb
- mrs x3, tpidr_el2 + mov x3, x1
ldr x0, [x3, #VCPU_HOST_CONTEXT] kern_hyp_va x0 --- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -121,24 +121,24 @@ el1_trap: /* * x0: ESR_EC */ + ldr x1, [sp, #16 + 8] // vcpu stored by __guest_enter
/* Guest accessed VFP/SIMD registers, save host, restore Guest */ cmp x0, #ESR_ELx_EC_FP_ASIMD b.eq __fpsimd_guest_restore
- mrs x1, tpidr_el2 mov x0, #ARM_EXCEPTION_TRAP b __guest_exit
el1_irq: stp x0, x1, [sp, #-16]! - mrs x1, tpidr_el2 + ldr x1, [sp, #16 + 8] mov x0, #ARM_EXCEPTION_IRQ b __guest_exit
el1_error: stp x0, x1, [sp, #-16]! - mrs x1, tpidr_el2 + ldr x1, [sp, #16 + 8] mov x0, #ARM_EXCEPTION_EL1_SERROR b __guest_exit
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
Commit 36989e7fd386a9a5822c48691473863f8fbb404d upstream.
kvm_host_cpu_state is a per-cpu allocation made from kvm_arch_init() used to store the host EL1 registers when KVM switches to a guest.
Make it easier for ASM to generate pointers into this per-cpu memory by making it a static allocation.
Signed-off-by: James Morse james.morse@arm.com Acked-by: Christoffer Dall cdall@linaro.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/kvm/arm.c | 18 +++--------------- 1 file changed, 3 insertions(+), 15 deletions(-)
--- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -51,8 +51,8 @@ __asm__(".arch_extension virt"); #endif
+DEFINE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state); static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page); -static kvm_cpu_context_t __percpu *kvm_host_cpu_state; static unsigned long hyp_default_vectors;
/* Per-CPU variable containing the currently running vcpu. */ @@ -338,7 +338,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu }
vcpu->cpu = cpu; - vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state); + vcpu->arch.host_cpu_context = this_cpu_ptr(&kvm_host_cpu_state);
kvm_arm_set_running_vcpu(vcpu); } @@ -1199,19 +1199,8 @@ static inline void hyp_cpu_pm_exit(void) } #endif
-static void teardown_common_resources(void) -{ - free_percpu(kvm_host_cpu_state); -} - static int init_common_resources(void) { - kvm_host_cpu_state = alloc_percpu(kvm_cpu_context_t); - if (!kvm_host_cpu_state) { - kvm_err("Cannot allocate host CPU state\n"); - return -ENOMEM; - } - /* set size of VMID supported by CPU */ kvm_vmid_bits = kvm_get_vmid_bits(); kvm_info("%d-bit VMID\n", kvm_vmid_bits); @@ -1369,7 +1358,7 @@ static int init_hyp_mode(void) for_each_possible_cpu(cpu) { kvm_cpu_context_t *cpu_ctxt;
- cpu_ctxt = per_cpu_ptr(kvm_host_cpu_state, cpu); + cpu_ctxt = per_cpu_ptr(&kvm_host_cpu_state, cpu); err = create_hyp_mappings(cpu_ctxt, cpu_ctxt + 1, PAGE_HYP);
if (err) { @@ -1447,7 +1436,6 @@ int kvm_arch_init(void *opaque) out_hyp: teardown_hyp_mode(); out_err: - teardown_common_resources(); return err; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
Commit c97e166e54b662717d20ec2e36761758d2b6a7c2 upstream.
Make tpidr_el2 a cpu-offset for per-cpu variables in the same way the host uses tpidr_el1. This lets tpidr_el{1,2} have the same value, and on VHE they can be the same register.
KVM calls hyp_panic() when anything unexpected happens. This may occur while a guest owns the EL1 registers. KVM stashes the vcpu pointer in tpidr_el2, which it uses to find the host context in order to restore the host EL1 registers before parachuting into the host's panic().
The host context is a struct kvm_cpu_context allocated in the per-cpu area, and mapped to hyp. Given the per-cpu offset for this CPU, this is easy to find. Change hyp_panic() to take a pointer to the struct kvm_cpu_context. Wrap these calls with an asm function that retrieves the struct kvm_cpu_context from the host's per-cpu area.
Copy the per-cpu offset from the host's tpidr_el1 into tpidr_el2 during kvm init. (Later patches will make this unnecessary for VHE hosts)
We print out the vcpu pointer as part of the panic message. Add a back reference to the 'running vcpu' in the host cpu context to preserve this.
Signed-off-by: James Morse james.morse@arm.com Reviewed-by: Christoffer Dall cdall@linaro.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/kvm_host.h | 2 ++ arch/arm64/kvm/hyp/hyp-entry.S | 12 ++++++++++++ arch/arm64/kvm/hyp/s2-setup.c | 3 +++ arch/arm64/kvm/hyp/switch.c | 25 +++++++++++++------------ 4 files changed, 30 insertions(+), 12 deletions(-)
--- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -197,6 +197,8 @@ struct kvm_cpu_context { u64 sys_regs[NR_SYS_REGS]; u32 copro[NR_COPRO_REGS]; }; + + struct kvm_vcpu *__hyp_running_vcpu; };
typedef struct kvm_cpu_context kvm_cpu_context_t; --- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -173,6 +173,18 @@ ENTRY(__hyp_do_panic) eret ENDPROC(__hyp_do_panic)
+ENTRY(__hyp_panic) + /* + * '=kvm_host_cpu_state' is a host VA from the constant pool, it may + * not be accessible by this address from EL2, hyp_panic() converts + * it with kern_hyp_va() before use. + */ + ldr x0, =kvm_host_cpu_state + mrs x1, tpidr_el2 + add x0, x0, x1 + b hyp_panic +ENDPROC(__hyp_panic) + .macro invalid_vector label, target = __hyp_panic .align 2 \label: --- a/arch/arm64/kvm/hyp/s2-setup.c +++ b/arch/arm64/kvm/hyp/s2-setup.c @@ -84,5 +84,8 @@ u32 __hyp_text __init_stage2_translation
write_sysreg(val, vtcr_el2);
+ /* copy tpidr_el1 into tpidr_el2 for use by HYP */ + write_sysreg(read_sysreg(tpidr_el1), tpidr_el2); + return parange; } --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -275,9 +275,9 @@ int __hyp_text __kvm_vcpu_run(struct kvm u64 exit_code;
vcpu = kern_hyp_va(vcpu); - write_sysreg(vcpu, tpidr_el2);
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context); + host_ctxt->__hyp_running_vcpu = vcpu; guest_ctxt = &vcpu->arch.ctxt;
__sysreg_save_host_state(host_ctxt); @@ -364,7 +364,8 @@ again:
static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
-static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par) +static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par, + struct kvm_vcpu *vcpu) { unsigned long str_va;
@@ -378,35 +379,35 @@ static void __hyp_text __hyp_call_panic_ __hyp_do_panic(str_va, spsr, elr, read_sysreg(esr_el2), read_sysreg_el2(far), - read_sysreg(hpfar_el2), par, - (void *)read_sysreg(tpidr_el2)); + read_sysreg(hpfar_el2), par, vcpu); }
-static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par) +static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par, + struct kvm_vcpu *vcpu) { panic(__hyp_panic_string, spsr, elr, read_sysreg_el2(esr), read_sysreg_el2(far), - read_sysreg(hpfar_el2), par, - (void *)read_sysreg(tpidr_el2)); + read_sysreg(hpfar_el2), par, vcpu); }
static hyp_alternate_select(__hyp_call_panic, __hyp_call_panic_nvhe, __hyp_call_panic_vhe, ARM64_HAS_VIRT_HOST_EXTN);
-void __hyp_text __noreturn __hyp_panic(void) +void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *__host_ctxt) { + struct kvm_vcpu *vcpu = NULL; + u64 spsr = read_sysreg_el2(spsr); u64 elr = read_sysreg_el2(elr); u64 par = read_sysreg(par_el1);
if (read_sysreg(vttbr_el2)) { - struct kvm_vcpu *vcpu; struct kvm_cpu_context *host_ctxt;
- vcpu = (struct kvm_vcpu *)read_sysreg(tpidr_el2); - host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context); + host_ctxt = kern_hyp_va(__host_ctxt); + vcpu = host_ctxt->__hyp_running_vcpu; __timer_save_state(vcpu); __deactivate_traps(vcpu); __deactivate_vm(vcpu); @@ -414,7 +415,7 @@ void __hyp_text __noreturn __hyp_panic(v }
/* Call panic for real */ - __hyp_call_panic()(spsr, elr, par); + __hyp_call_panic()(spsr, elr, par, vcpu);
unreachable(); }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
Commit 6d99b68933fbcf51f84fcbba49246ce1209ec193 upstream.
Now that KVM uses tpidr_el2 in the same way as Linux's cpu_offset in tpidr_el1, merge the two. This saves KVM from save/restoring tpidr_el1 on VHE hosts, and allows future code to blindly access per-cpu variables without triggering world-switch.
Signed-off-by: James Morse james.morse@arm.com Reviewed-by: Christoffer Dall cdall@linaro.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/alternative.h | 2 ++ arch/arm64/include/asm/assembler.h | 8 ++++++++ arch/arm64/include/asm/percpu.h | 12 ++++++++++-- arch/arm64/kernel/alternative.c | 9 +++++---- arch/arm64/kernel/cpufeature.c | 17 +++++++++++++++++ 5 files changed, 42 insertions(+), 6 deletions(-)
--- a/arch/arm64/include/asm/alternative.h +++ b/arch/arm64/include/asm/alternative.h @@ -11,6 +11,8 @@ #include <linux/stddef.h> #include <linux/stringify.h>
+extern int alternatives_applied; + struct alt_instr { s32 orig_offset; /* offset to original instruction */ s32 alt_offset; /* offset to replacement instruction */ --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -245,7 +245,11 @@ lr .req x30 // link register */ .macro adr_this_cpu, dst, sym, tmp adr_l \dst, \sym +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN mrs \tmp, tpidr_el1 +alternative_else + mrs \tmp, tpidr_el2 +alternative_endif add \dst, \dst, \tmp .endm
@@ -256,7 +260,11 @@ lr .req x30 // link register */ .macro ldr_this_cpu dst, sym, tmp adr_l \dst, \sym +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN mrs \tmp, tpidr_el1 +alternative_else + mrs \tmp, tpidr_el2 +alternative_endif ldr \dst, [\dst, \tmp] .endm
--- a/arch/arm64/include/asm/percpu.h +++ b/arch/arm64/include/asm/percpu.h @@ -16,9 +16,14 @@ #ifndef __ASM_PERCPU_H #define __ASM_PERCPU_H
+#include <asm/alternative.h> + static inline void set_my_cpu_offset(unsigned long off) { - asm volatile("msr tpidr_el1, %0" :: "r" (off) : "memory"); + asm volatile(ALTERNATIVE("msr tpidr_el1, %0", + "msr tpidr_el2, %0", + ARM64_HAS_VIRT_HOST_EXTN) + :: "r" (off) : "memory"); }
static inline unsigned long __my_cpu_offset(void) @@ -29,7 +34,10 @@ static inline unsigned long __my_cpu_off * We want to allow caching the value, so avoid using volatile and * instead use a fake stack read to hazard against barrier(). */ - asm("mrs %0, tpidr_el1" : "=r" (off) : + asm(ALTERNATIVE("mrs %0, tpidr_el1", + "mrs %0, tpidr_el2", + ARM64_HAS_VIRT_HOST_EXTN) + : "=r" (off) : "Q" (*(const unsigned long *)current_stack_pointer));
return off; --- a/arch/arm64/kernel/alternative.c +++ b/arch/arm64/kernel/alternative.c @@ -32,6 +32,8 @@ #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset) #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset)
+int alternatives_applied; + struct alt_region { struct alt_instr *begin; struct alt_instr *end; @@ -142,7 +144,6 @@ static void __apply_alternatives(void *a */ static int __apply_alternatives_multi_stop(void *unused) { - static int patched = 0; struct alt_region region = { .begin = (struct alt_instr *)__alt_instructions, .end = (struct alt_instr *)__alt_instructions_end, @@ -150,14 +151,14 @@ static int __apply_alternatives_multi_st
/* We always have a CPU 0 at this point (__init) */ if (smp_processor_id()) { - while (!READ_ONCE(patched)) + while (!READ_ONCE(alternatives_applied)) cpu_relax(); isb(); } else { - BUG_ON(patched); + BUG_ON(alternatives_applied); __apply_alternatives(&region); /* Barriers provided by the cache flushing */ - WRITE_ONCE(patched, 1); + WRITE_ONCE(alternatives_applied, 1); }
return 0; --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -829,6 +829,22 @@ static int __init parse_kpti(char *str) early_param("kpti", parse_kpti); #endif /* CONFIG_UNMAP_KERNEL_AT_EL0 */
+static int cpu_copy_el2regs(void *__unused) +{ + /* + * Copy register values that aren't redirected by hardware. + * + * Before code patching, we only set tpidr_el1, all CPUs need to copy + * this value to tpidr_el2 before we patch the code. Once we've done + * that, freshly-onlined CPUs will set tpidr_el2, so we don't need to + * do anything here. + */ + if (!alternatives_applied) + write_sysreg(read_sysreg(tpidr_el1), tpidr_el2); + + return 0; +} + static const struct arm64_cpu_capabilities arm64_features[] = { { .desc = "GIC system register CPU interface", @@ -895,6 +911,7 @@ static const struct arm64_cpu_capabiliti .capability = ARM64_HAS_VIRT_HOST_EXTN, .def_scope = SCOPE_SYSTEM, .matches = runs_at_el2, + .enable = cpu_copy_el2regs, }, { .desc = "32-bit EL0 Support",
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Morse james.morse@arm.com
Commit 1f742679c33bc083722cb0b442a95d458c491b56 upstream.
Now that a VHE host uses tpidr_el2 for the cpu offset we no longer need KVM to save/restore tpidr_el1. Move this from the 'common' code into the non-vhe code. While we're at it, on VHE we don't need to save the ELR or SPSR as kernel_entry in entry.S will have pushed these onto the kernel stack, and will restore them from there. Move these to the non-vhe code as we need them to get back to the host.
Finally, remove the always-copy-tpidr we hid in the stage2 setup code; cpufeature's enable callback will do this for VHE, and we only need KVM to do it for non-vhe. Add the copy into kvm-init instead.
Signed-off-by: James Morse james.morse@arm.com Reviewed-by: Christoffer Dall cdall@linaro.org Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/hyp-init.S | 4 ++++ arch/arm64/kvm/hyp/s2-setup.c | 3 --- arch/arm64/kvm/hyp/sysreg-sr.c | 16 ++++++++-------- 3 files changed, 12 insertions(+), 11 deletions(-)
--- a/arch/arm64/kvm/hyp-init.S +++ b/arch/arm64/kvm/hyp-init.S @@ -118,6 +118,10 @@ CPU_BE( orr x4, x4, #SCTLR_ELx_EE) kern_hyp_va x2 msr vbar_el2, x2
+ /* copy tpidr_el1 into tpidr_el2 for use by HYP */ + mrs x1, tpidr_el1 + msr tpidr_el2, x1 + /* Hello, World! */ eret ENDPROC(__kvm_hyp_init) --- a/arch/arm64/kvm/hyp/s2-setup.c +++ b/arch/arm64/kvm/hyp/s2-setup.c @@ -84,8 +84,5 @@ u32 __hyp_text __init_stage2_translation
write_sysreg(val, vtcr_el2);
- /* copy tpidr_el1 into tpidr_el2 for use by HYP */ - write_sysreg(read_sysreg(tpidr_el1), tpidr_el2); - return parange; } --- a/arch/arm64/kvm/hyp/sysreg-sr.c +++ b/arch/arm64/kvm/hyp/sysreg-sr.c @@ -27,8 +27,8 @@ static void __hyp_text __sysreg_do_nothi /* * Non-VHE: Both host and guest must save everything. * - * VHE: Host must save tpidr*_el[01], actlr_el1, mdscr_el1, sp0, pc, - * pstate, and guest must save everything. + * VHE: Host must save tpidr*_el0, actlr_el1, mdscr_el1, sp_el0, + * and guest must save everything. */
static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt) @@ -36,11 +36,8 @@ static void __hyp_text __sysreg_save_com ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1); ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0); ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0); - ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1); ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1); ctxt->gp_regs.regs.sp = read_sysreg(sp_el0); - ctxt->gp_regs.regs.pc = read_sysreg_el2(elr); - ctxt->gp_regs.regs.pstate = read_sysreg_el2(spsr); }
static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt) @@ -62,10 +59,13 @@ static void __hyp_text __sysreg_save_sta ctxt->sys_regs[AMAIR_EL1] = read_sysreg_el1(amair); ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg_el1(cntkctl); ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1); + ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
ctxt->gp_regs.sp_el1 = read_sysreg(sp_el1); ctxt->gp_regs.elr_el1 = read_sysreg_el1(elr); ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr); + ctxt->gp_regs.regs.pc = read_sysreg_el2(elr); + ctxt->gp_regs.regs.pstate = read_sysreg_el2(spsr); }
static hyp_alternate_select(__sysreg_call_save_host_state, @@ -89,11 +89,8 @@ static void __hyp_text __sysreg_restore_ write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1); write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0); write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0); - write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1); write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1); write_sysreg(ctxt->gp_regs.regs.sp, sp_el0); - write_sysreg_el2(ctxt->gp_regs.regs.pc, elr); - write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr); }
static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt) @@ -115,10 +112,13 @@ static void __hyp_text __sysreg_restore_ write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1], amair); write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], cntkctl); write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1); + write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1);
write_sysreg(ctxt->gp_regs.sp_el1, sp_el1); write_sysreg_el1(ctxt->gp_regs.elr_el1, elr); write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr); + write_sysreg_el2(ctxt->gp_regs.regs.pc, elr); + write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr); }
static hyp_alternate_select(__sysreg_call_restore_host_state,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
Commit dea5e2a4c5bcf196f879a66cebdcca07793e8ba4 upstream.
We've so far relied on a patching infrastructure that only gave us a single alternative, without any way to provide a range of potential replacement instructions. For a single feature, this is an all-or-nothing thing.
It would be interesting to have a more finely grained way of patching the kernel, though, where we could dynamically tune the code that gets injected.
In order to achieve this, let's introduce a new form of dynamic patching, associating a callback with a patching site. This callback gets the source and target locations of the patching request, as well as the number of instructions to be patched.
Dynamic patching is declared with the new ALTERNATIVE_CB and alternative_cb directives:
    asm volatile(ALTERNATIVE_CB("mov %0, #0\n", callback) : "r" (v));

or

    alternative_cb callback
        mov x0, #0
    alternative_cb_end
where callback is the C function computing the alternative.
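As an illustration (a hypothetical callback, not one from this series), a function sharing the alternative_cb_t signature could overwrite the patched site with NOPs:

    static void nop_all_cb(struct alt_instr *alt, __le32 *origptr,
                           __le32 *updptr, int nr_inst)
    {
            int i;

            /* write one NOP per instruction slot at the patching site */
            for (i = 0; i < nr_inst; i++)
                    updptr[i] = cpu_to_le32(aarch64_insn_gen_nop());
    }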
Reviewed-by: Christoffer Dall christoffer.dall@linaro.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/alternative.h | 41 ++++++++++++++++++++++++++++--- arch/arm64/kernel/alternative.c | 45 +++++++++++++++++++++++++---------- 2 files changed, 70 insertions(+), 16 deletions(-)
--- a/arch/arm64/include/asm/alternative.h +++ b/arch/arm64/include/asm/alternative.h @@ -4,6 +4,8 @@ #include <asm/cpucaps.h> #include <asm/insn.h>
+#define ARM64_CB_PATCH ARM64_NCAPS + #ifndef __ASSEMBLY__
#include <linux/init.h> @@ -21,12 +23,19 @@ struct alt_instr { u8 alt_len; /* size of new instruction(s), <= orig_len */ };
+typedef void (*alternative_cb_t)(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, int nr_inst); + void __init apply_alternatives_all(void); void apply_alternatives(void *start, size_t length);
-#define ALTINSTR_ENTRY(feature) \ +#define ALTINSTR_ENTRY(feature,cb) \ " .word 661b - .\n" /* label */ \ + " .if " __stringify(cb) " == 0\n" \ " .word 663f - .\n" /* new instruction */ \ + " .else\n" \ + " .word " __stringify(cb) "- .\n" /* callback */ \ + " .endif\n" \ " .hword " __stringify(feature) "\n" /* feature bit */ \ " .byte 662b-661b\n" /* source len */ \ " .byte 664f-663f\n" /* replacement len */ @@ -44,15 +53,18 @@ void apply_alternatives(void *start, siz * but most assemblers die if insn1 or insn2 have a .inst. This should * be fixed in a binutils release posterior to 2.25.51.0.2 (anything * containing commit 4e4d08cf7399b606 or c1baaddf8861). + * + * Alternatives with callbacks do not generate replacement instructions. */ -#define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled) \ +#define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled, cb) \ ".if "__stringify(cfg_enabled)" == 1\n" \ "661:\n\t" \ oldinstr "\n" \ "662:\n" \ ".pushsection .altinstructions,"a"\n" \ - ALTINSTR_ENTRY(feature) \ + ALTINSTR_ENTRY(feature,cb) \ ".popsection\n" \ + " .if " __stringify(cb) " == 0\n" \ ".pushsection .altinstr_replacement, "a"\n" \ "663:\n\t" \ newinstr "\n" \ @@ -60,11 +72,17 @@ void apply_alternatives(void *start, siz ".popsection\n\t" \ ".org . - (664b-663b) + (662b-661b)\n\t" \ ".org . - (662b-661b) + (664b-663b)\n" \ + ".else\n\t" \ + "663:\n\t" \ + "664:\n\t" \ + ".endif\n" \ ".endif\n"
#define _ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg, ...) \ - __ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg)) + __ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg), 0)
+#define ALTERNATIVE_CB(oldinstr, cb) \ + __ALTERNATIVE_CFG(oldinstr, "NOT_AN_INSTRUCTION", ARM64_CB_PATCH, 1, cb) #else
#include <asm/assembler.h> @@ -131,6 +149,14 @@ void apply_alternatives(void *start, siz 661: .endm
+.macro alternative_cb cb + .set .Lasm_alt_mode, 0 + .pushsection .altinstructions, "a" + altinstruction_entry 661f, \cb, ARM64_CB_PATCH, 662f-661f, 0 + .popsection +661: +.endm + /* * Provide the other half of the alternative code sequence. */ @@ -157,6 +183,13 @@ void apply_alternatives(void *start, siz .endm
/* + * Callback-based alternative epilogue + */ +.macro alternative_cb_end +662: +.endm + +/* * Provides a trivial alternative or default sequence consisting solely * of NOPs. The number of NOPs is chosen automatically to match the * previous case. --- a/arch/arm64/kernel/alternative.c +++ b/arch/arm64/kernel/alternative.c @@ -28,7 +28,7 @@ #include <asm/sections.h> #include <linux/stop_machine.h>
-#define __ALT_PTR(a,f) (u32 *)((void *)&(a)->f + (a)->f) +#define __ALT_PTR(a,f) ((void *)&(a)->f + (a)->f) #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset) #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset)
@@ -107,31 +107,52 @@ static u32 get_alt_insn(struct alt_instr return insn; }
+static void patch_alternative(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, int nr_inst) +{ + __le32 *replptr; + int i; + + replptr = ALT_REPL_PTR(alt); + for (i = 0; i < nr_inst; i++) { + u32 insn; + + insn = get_alt_insn(alt, origptr + i, replptr + i); + updptr[i] = cpu_to_le32(insn); + } +} + static void __apply_alternatives(void *alt_region) { struct alt_instr *alt; struct alt_region *region = alt_region; - u32 *origptr, *replptr; + __le32 *origptr; + alternative_cb_t alt_cb;
for (alt = region->begin; alt < region->end; alt++) { - u32 insn; - int i, nr_inst; + int nr_inst;
- if (!cpus_have_cap(alt->cpufeature)) + /* Use ARM64_CB_PATCH as an unconditional patch */ + if (alt->cpufeature < ARM64_CB_PATCH && + !cpus_have_cap(alt->cpufeature)) continue;
- BUG_ON(alt->alt_len != alt->orig_len); + if (alt->cpufeature == ARM64_CB_PATCH) + BUG_ON(alt->alt_len != 0); + else + BUG_ON(alt->alt_len != alt->orig_len);
pr_info_once("patching kernel code\n");
origptr = ALT_ORIG_PTR(alt); - replptr = ALT_REPL_PTR(alt); - nr_inst = alt->alt_len / sizeof(insn); + nr_inst = alt->orig_len / AARCH64_INSN_SIZE;
- for (i = 0; i < nr_inst; i++) { - insn = get_alt_insn(alt, origptr + i, replptr + i); - *(origptr + i) = cpu_to_le32(insn); - } + if (alt->cpufeature < ARM64_CB_PATCH) + alt_cb = patch_alternative; + else + alt_cb = ALT_REPL_PTR(alt); + + alt_cb(alt, origptr, origptr, nr_inst);
flush_icache_range((uintptr_t)origptr, (uintptr_t)(origptr + nr_inst));
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
Commit 44a497abd621a71c645f06d3d545ae2f46448830 upstream.
kvm_vgic_global_state is part of the read-only section, and is usually accessed using a PC-relative address generation (adrp + add).
It is thus useless to use kern_hyp_va() on it, and actively problematic if kern_hyp_va() becomes non-idempotent. On the other hand, there is no way that the compiler is going to guarantee that such access is always PC relative.
So let's bite the bullet and provide our own accessor.
Acked-by: Catalin Marinas catalin.marinas@arm.com Reviewed-by: James Morse james.morse@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/include/asm/kvm_mmu.h | 7 +++++++ arch/arm64/include/asm/kvm_mmu.h | 20 ++++++++++++++++++++ virt/kvm/arm/hyp/vgic-v2-sr.c | 2 +- 3 files changed, 28 insertions(+), 1 deletion(-)
--- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -28,6 +28,13 @@ */ #define kern_hyp_va(kva) (kva)
+/* Contrary to arm64, there is no need to generate a PC-relative address */ +#define hyp_symbol_addr(s) \ + ({ \ + typeof(s) *addr = &(s); \ + addr; \ + }) + /* * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels. */ --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -131,6 +131,26 @@ static inline unsigned long __kern_hyp_v #define kern_hyp_va(v) ((typeof(v))(__kern_hyp_va((unsigned long)(v))))
/* + * Obtain the PC-relative address of a kernel symbol + * s: symbol + * + * The goal of this macro is to return a symbol's address based on a + * PC-relative computation, as opposed to a loading the VA from a + * constant pool or something similar. This works well for HYP, as an + * absolute VA is guaranteed to be wrong. Only use this if trying to + * obtain the address of a symbol (i.e. not something you obtained by + * following a pointer). + */ +#define hyp_symbol_addr(s) \ + ({ \ + typeof(s) *addr; \ + asm("adrp %0, %1\n" \ + "add %0, %0, :lo12:%1\n" \ + : "=r" (addr) : "S" (&s)); \ + addr; \ + }) + +/* * We currently only support a 40bit IPA. */ #define KVM_PHYS_SHIFT (40) --- a/virt/kvm/arm/hyp/vgic-v2-sr.c +++ b/virt/kvm/arm/hyp/vgic-v2-sr.c @@ -203,7 +203,7 @@ int __hyp_text __vgic_v2_perform_cpuif_a return -1;
rd = kvm_vcpu_dabt_get_rd(vcpu); - addr = kern_hyp_va((kern_hyp_va(&kvm_vgic_global_state))->vcpu_base_va); + addr = kern_hyp_va(hyp_symbol_addr(kvm_vgic_global_state)->vcpu_base_va); addr += fault_ipa - vgic->vgic_cpu_base;
if (kvm_vcpu_dabt_iswrite(vcpu)) {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoffer Dall christoffer.dall@linaro.org
Commit 4464e210de9e80e38de59df052fe09ea2ff80b1b upstream.
We already have the percpu area for the host cpu state, which points to the VCPU, so there's no need to store the VCPU pointer on the stack on every context switch. We can be a little more clever and just use tpidr_el2 for the percpu offset and load the VCPU pointer from the host context.
This has the benefit of being able to retrieve the host context even when our stack is corrupted, and it has a potential performance benefit because we trade a store plus a load for an mrs and a load on a round trip to the guest.
This does require us to calculate the percpu offset without including the offset from the kernel mapping of the percpu array to the linear mapping of the array (which is what we store in tpidr_el1), because a PC-relative generated address in EL2 is already giving us the hyp alias of the linear mapping of a kernel address. We do this in __cpu_init_hyp_mode() by using kvm_ksym_ref().
The code that accesses ESR_EL2 was previously using an alternative to use the _EL1 accessor on VHE systems, but this was actually unnecessary as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2 accessor does the same thing on both systems.
Cc: Ard Biesheuvel ard.biesheuvel@linaro.org Reviewed-by: Marc Zyngier marc.zyngier@arm.com Reviewed-by: Andrew Jones drjones@redhat.com Signed-off-by: Christoffer Dall christoffer.dall@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/kvm_asm.h | 15 +++++++++++++++ arch/arm64/include/asm/kvm_host.h | 15 +++++++++++++++ arch/arm64/kernel/asm-offsets.c | 1 + arch/arm64/kvm/hyp/entry.S | 6 +----- arch/arm64/kvm/hyp/hyp-entry.S | 28 ++++++++++------------------ arch/arm64/kvm/hyp/switch.c | 5 +---- arch/arm64/kvm/hyp/sysreg-sr.c | 5 +++++ 7 files changed, 48 insertions(+), 27 deletions(-)
--- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -33,6 +33,7 @@ #define KVM_ARM64_DEBUG_DIRTY_SHIFT 0 #define KVM_ARM64_DEBUG_DIRTY (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
+/* Translate a kernel address of @sym into its equivalent linear mapping */ #define kvm_ksym_ref(sym) \ ({ \ void *val = &sym; \ @@ -65,6 +66,20 @@ extern u32 __kvm_get_mdcr_el2(void);
extern u32 __init_stage2_translation(void);
+#else /* __ASSEMBLY__ */ + +.macro get_host_ctxt reg, tmp + adr_l \reg, kvm_host_cpu_state + mrs \tmp, tpidr_el2 + add \reg, \reg, \tmp +.endm + +.macro get_vcpu_ptr vcpu, ctxt + get_host_ctxt \ctxt, \vcpu + ldr \vcpu, [\ctxt, #HOST_CONTEXT_VCPU] + kern_hyp_va \vcpu +.endm + #endif
#endif /* __ARM_KVM_ASM_H__ */ --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -356,10 +356,15 @@ int kvm_perf_teardown(void);
struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
+void __kvm_set_tpidr_el2(u64 tpidr_el2); +DECLARE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state); + static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr, unsigned long hyp_stack_ptr, unsigned long vector_ptr) { + u64 tpidr_el2; + /* * Call initialization code, and switch to the full blown HYP code. * If the cpucaps haven't been finalized yet, something has gone very @@ -368,6 +373,16 @@ static inline void __cpu_init_hyp_mode(p */ BUG_ON(!static_branch_likely(&arm64_const_caps_ready)); __kvm_call_hyp((void *)pgd_ptr, hyp_stack_ptr, vector_ptr); + + /* + * Calculate the raw per-cpu offset without a translation from the + * kernel's mapping to the linear mapping, and store it in tpidr_el2 + * so that we can use adr_l to access per-cpu variables in EL2. + */ + tpidr_el2 = (u64)this_cpu_ptr(&kvm_host_cpu_state) + - (u64)kvm_ksym_ref(kvm_host_cpu_state); + + kvm_call_hyp(__kvm_set_tpidr_el2, tpidr_el2); }
void __kvm_hyp_teardown(void); --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -132,6 +132,7 @@ int main(void) DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs)); DEFINE(VCPU_FPEXC32_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2])); DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, arch.host_cpu_context)); + DEFINE(HOST_CONTEXT_VCPU, offsetof(struct kvm_cpu_context, __hyp_running_vcpu)); #endif #ifdef CONFIG_CPU_PM DEFINE(CPU_SUSPEND_SZ, sizeof(struct cpu_suspend_ctx)); --- a/arch/arm64/kvm/hyp/entry.S +++ b/arch/arm64/kvm/hyp/entry.S @@ -62,9 +62,6 @@ ENTRY(__guest_enter) // Store the host regs save_callee_saved_regs x1
- // Store host_ctxt and vcpu for use at exit time - stp x1, x0, [sp, #-16]! - add x18, x0, #VCPU_CONTEXT
// Restore guest regs x0-x17 @@ -118,8 +115,7 @@ ENTRY(__guest_exit) // Store the guest regs x19-x29, lr save_callee_saved_regs x1
- // Restore the host_ctxt from the stack - ldr x2, [sp], #16 + get_host_ctxt x2, x3
// Now restore the host regs restore_callee_saved_regs x2 --- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -72,13 +72,8 @@ ENDPROC(__kvm_hyp_teardown) el1_sync: // Guest trapped into EL2 stp x0, x1, [sp, #-16]!
-alternative_if_not ARM64_HAS_VIRT_HOST_EXTN - mrs x1, esr_el2 -alternative_else - mrs x1, esr_el1 -alternative_endif - lsr x0, x1, #ESR_ELx_EC_SHIFT - + mrs x0, esr_el2 + lsr x0, x0, #ESR_ELx_EC_SHIFT cmp x0, #ESR_ELx_EC_HVC64 ccmp x0, #ESR_ELx_EC_HVC32, #4, ne b.ne el1_trap @@ -118,10 +113,14 @@ el1_hvc_guest: eret
el1_trap: + get_vcpu_ptr x1, x0 + + mrs x0, esr_el2 + lsr x0, x0, #ESR_ELx_EC_SHIFT /* * x0: ESR_EC + * x1: vcpu pointer */ - ldr x1, [sp, #16 + 8] // vcpu stored by __guest_enter
/* Guest accessed VFP/SIMD registers, save host, restore Guest */ cmp x0, #ESR_ELx_EC_FP_ASIMD @@ -132,13 +131,13 @@ el1_trap:
el1_irq: stp x0, x1, [sp, #-16]! - ldr x1, [sp, #16 + 8] + get_vcpu_ptr x1, x0 mov x0, #ARM_EXCEPTION_IRQ b __guest_exit
el1_error: stp x0, x1, [sp, #-16]! - ldr x1, [sp, #16 + 8] + get_vcpu_ptr x1, x0 mov x0, #ARM_EXCEPTION_EL1_SERROR b __guest_exit
@@ -174,14 +173,7 @@ ENTRY(__hyp_do_panic) ENDPROC(__hyp_do_panic)
ENTRY(__hyp_panic) - /* - * '=kvm_host_cpu_state' is a host VA from the constant pool, it may - * not be accessible by this address from EL2, hyp_panic() converts - * it with kern_hyp_va() before use. - */ - ldr x0, =kvm_host_cpu_state - mrs x1, tpidr_el2 - add x0, x0, x1 + get_host_ctxt x0, x1 b hyp_panic ENDPROC(__hyp_panic)
--- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -395,7 +395,7 @@ static hyp_alternate_select(__hyp_call_p __hyp_call_panic_nvhe, __hyp_call_panic_vhe, ARM64_HAS_VIRT_HOST_EXTN);
-void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *__host_ctxt) +void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt) { struct kvm_vcpu *vcpu = NULL;
@@ -404,9 +404,6 @@ void __hyp_text __noreturn hyp_panic(str u64 par = read_sysreg(par_el1);
if (read_sysreg(vttbr_el2)) { - struct kvm_cpu_context *host_ctxt; - - host_ctxt = kern_hyp_va(__host_ctxt); vcpu = host_ctxt->__hyp_running_vcpu; __timer_save_state(vcpu); __deactivate_traps(vcpu); --- a/arch/arm64/kvm/hyp/sysreg-sr.c +++ b/arch/arm64/kvm/hyp/sysreg-sr.c @@ -183,3 +183,8 @@ void __hyp_text __sysreg32_restore_state if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY) write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2); } + +void __hyp_text __kvm_set_tpidr_el2(u64 tpidr_el2) +{ + asm("msr tpidr_el2, %0": : "r" (tpidr_el2)); +}
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit eff0e9e1078ea7dc1d794dc50e31baef984c46d7 upstream.
We've so far used the PSCI return codes for SMCCC because they were extremely similar. But with the new ARM DEN 0070A specification, "NOT_REQUIRED" (-2) is clashing with PSCI's "PSCI_RET_INVALID_PARAMS".
Let's bite the bullet and add SMCCC specific return codes. Users can be repainted as and when required.
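A caller probing for a workaround might then decode the result against these constants; a sketch, anticipating the ARCH_WORKAROUND_2 probing added later in this series:

    struct arm_smccc_res res;

    arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
                      ARM_SMCCC_ARCH_WORKAROUND_2, &res);
    switch ((int)res.a0) {
    case SMCCC_RET_SUCCESS:        /* workaround needed and available */
            break;
    case SMCCC_RET_NOT_REQUIRED:   /* platform is already mitigated */
            break;
    case SMCCC_RET_NOT_SUPPORTED:  /* firmware can't help us */
            break;
    }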
Acked-by: Will Deacon will.deacon@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/arm-smccc.h | 5 +++++ 1 file changed, 5 insertions(+)
--- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -291,5 +291,10 @@ asmlinkage void __arm_smccc_hvc(unsigned */ #define arm_smccc_1_1_hvc(...) __arm_smccc_1_1(SMCCC_HVC_INST, __VA_ARGS__)
+/* Return codes defined in ARM DEN 0070A */ +#define SMCCC_RET_SUCCESS 0 +#define SMCCC_RET_NOT_SUPPORTED -1 +#define SMCCC_RET_NOT_REQUIRED -2 + #endif /*__ASSEMBLY__*/ #endif /*__LINUX_ARM_SMCCC_H*/
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 8e2906245f1e3b0d027169d9f2e55ce0548cb96e upstream.
In order for the kernel to protect itself, let's call the SSBD mitigation implemented by the higher exception level (either hypervisor or firmware) on each transition between userspace and kernel.
We must take the PSCI conduit into account in order to target the right exception level, hence the introduction of a runtime patching callback.
Reviewed-by: Mark Rutland mark.rutland@arm.com Reviewed-by: Julien Grall julien.grall@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpu_errata.c | 24 ++++++++++++++++++++++++ arch/arm64/kernel/entry.S | 22 ++++++++++++++++++++++ include/linux/arm-smccc.h | 5 +++++ 3 files changed, 51 insertions(+)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -187,6 +187,30 @@ static int enable_smccc_arch_workaround_ } #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
+#ifdef CONFIG_ARM64_SSBD +void __init arm64_update_smccc_conduit(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, + int nr_inst) +{ + u32 insn; + + BUG_ON(nr_inst != 1); + + switch (psci_ops.conduit) { + case PSCI_CONDUIT_HVC: + insn = aarch64_insn_get_hvc_value(); + break; + case PSCI_CONDUIT_SMC: + insn = aarch64_insn_get_smc_value(); + break; + default: + return; + } + + *updptr = cpu_to_le32(insn); +} +#endif /* CONFIG_ARM64_SSBD */ + #define MIDR_RANGE(model, min, max) \ .def_scope = SCOPE_LOCAL_CPU, \ .matches = is_affected_midr_range, \ --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -18,6 +18,7 @@ * along with this program. If not, see http://www.gnu.org/licenses/. */
+#include <linux/arm-smccc.h> #include <linux/init.h> #include <linux/linkage.h>
@@ -95,6 +96,18 @@ alternative_else_nop_endif add \dst, \dst, #(\sym - .entry.tramp.text) .endm
+ // This macro corrupts x0-x3. It is the caller's duty + // to save/restore them if required. + .macro apply_ssbd, state +#ifdef CONFIG_ARM64_SSBD + mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 + mov w1, #\state +alternative_cb arm64_update_smccc_conduit + nop // Patched to SMC/HVC #0 +alternative_cb_end +#endif + .endm + .macro kernel_entry, el, regsize = 64 .if \regsize == 32 mov w0, w0 // zero upper 32 bits of x0 @@ -122,6 +135,13 @@ alternative_else_nop_endif ldr x19, [tsk, #TI_FLAGS] // since we can unmask debug disable_step_tsk x19, x20 // exceptions when scheduling.
+ apply_ssbd 1 + +#ifdef CONFIG_ARM64_SSBD + ldp x0, x1, [sp, #16 * 0] + ldp x2, x3, [sp, #16 * 1] +#endif + mov x29, xzr // fp pointed to user-space .else add x21, sp, #S_FRAME_SIZE @@ -190,6 +210,8 @@ alternative_if ARM64_WORKAROUND_845719 alternative_else_nop_endif #endif 3: + apply_ssbd 0 + .endif msr elr_el1, x21 // set up the return data msr spsr_el1, x22 --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -80,6 +80,11 @@ ARM_SMCCC_SMC_32, \ 0, 0x8000)
+#define ARM_SMCCC_ARCH_WORKAROUND_2 \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ + ARM_SMCCC_SMC_32, \ + 0, 0x7fff) + #ifndef __ASSEMBLY__
#include <linux/linkage.h>
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 5cf9ce6e5ea50f805c6188c04ed0daaec7b6887d upstream.
In a heterogeneous system, we can end up with both affected and unaffected CPUs. Let's check their status before calling into the firmware.
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpu_errata.c | 2 ++ arch/arm64/kernel/entry.S | 11 +++++++---- 2 files changed, 9 insertions(+), 4 deletions(-)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -188,6 +188,8 @@ static int enable_smccc_arch_workaround_ #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
#ifdef CONFIG_ARM64_SSBD +DEFINE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required); + void __init arm64_update_smccc_conduit(struct alt_instr *alt, __le32 *origptr, __le32 *updptr, int nr_inst) --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -98,8 +98,10 @@ alternative_else_nop_endif
// This macro corrupts x0-x3. It is the caller's duty // to save/restore them if required. - .macro apply_ssbd, state + .macro apply_ssbd, state, targ, tmp1, tmp2 #ifdef CONFIG_ARM64_SSBD + ldr_this_cpu \tmp2, arm64_ssbd_callback_required, \tmp1 + cbz \tmp2, \targ mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 mov w1, #\state alternative_cb arm64_update_smccc_conduit @@ -135,12 +137,13 @@ alternative_cb_end ldr x19, [tsk, #TI_FLAGS] // since we can unmask debug disable_step_tsk x19, x20 // exceptions when scheduling.
- apply_ssbd 1 + apply_ssbd 1, 1f, x22, x23
#ifdef CONFIG_ARM64_SSBD ldp x0, x1, [sp, #16 * 0] ldp x2, x3, [sp, #16 * 1] #endif +1:
mov x29, xzr // fp pointed to user-space .else @@ -210,8 +213,8 @@ alternative_if ARM64_WORKAROUND_845719 alternative_else_nop_endif #endif 3: - apply_ssbd 0 - + apply_ssbd 0, 5f, x0, x1 +5: .endif msr elr_el1, x21 // set up the return data msr spsr_el1, x22
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit a725e3dda1813ed306734823ac4c65ca04e38500 upstream.
As for Spectre variant-2, we rely on SMCCC 1.1 to provide the discovery mechanism for detecting the SSBD mitigation.
A new capability is also allocated for that purpose, and a config option.
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/Kconfig | 9 +++++ arch/arm64/include/asm/cpucaps.h | 3 + arch/arm64/kernel/cpu_errata.c | 69 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 80 insertions(+), 1 deletion(-)
--- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -776,6 +776,15 @@ config HARDEN_BRANCH_PREDICTOR
If unsure, say Y.
+config ARM64_SSBD + bool "Speculative Store Bypass Disable" if EXPERT + default y + help + This enables mitigation of the bypassing of previous stores + by speculative loads. + + If unsure, say Y. + menuconfig ARMV8_DEPRECATED bool "Emulate deprecated/obsolete ARMv8 instructions" depends on COMPAT --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -36,7 +36,8 @@ #define ARM64_MISMATCHED_CACHE_LINE_SIZE 15 #define ARM64_UNMAP_KERNEL_AT_EL0 16 #define ARM64_HARDEN_BRANCH_PREDICTOR 17 +#define ARM64_SSBD 18
-#define ARM64_NCAPS 18 +#define ARM64_NCAPS 19
#endif /* __ASM_CPUCAPS_H */ --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -211,6 +211,67 @@ void __init arm64_update_smccc_conduit(s
*updptr = cpu_to_le32(insn); } + +static void arm64_set_ssbd_mitigation(bool state) +{ + switch (psci_ops.conduit) { + case PSCI_CONDUIT_HVC: + arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL); + break; + + case PSCI_CONDUIT_SMC: + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL); + break; + + default: + WARN_ON_ONCE(1); + break; + } +} + +static bool has_ssbd_mitigation(const struct arm64_cpu_capabilities *entry, + int scope) +{ + struct arm_smccc_res res; + bool supported = true; + + WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible()); + + if (psci_ops.smccc_version == SMCCC_VERSION_1_0) + return false; + + /* + * The probe function return value is either negative + * (unsupported or mitigated), positive (unaffected), or zero + * (requires mitigation). We only need to do anything in the + * last case. + */ + switch (psci_ops.conduit) { + case PSCI_CONDUIT_HVC: + arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, + ARM_SMCCC_ARCH_WORKAROUND_2, &res); + if ((int)res.a0 != 0) + supported = false; + break; + + case PSCI_CONDUIT_SMC: + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, + ARM_SMCCC_ARCH_WORKAROUND_2, &res); + if ((int)res.a0 != 0) + supported = false; + break; + + default: + supported = false; + } + + if (supported) { + __this_cpu_write(arm64_ssbd_callback_required, 1); + arm64_set_ssbd_mitigation(true); + } + + return supported; +} #endif /* CONFIG_ARM64_SSBD */
#define MIDR_RANGE(model, min, max) \ @@ -336,6 +397,14 @@ const struct arm64_cpu_capabilities arm6 .enable = enable_smccc_arch_workaround_1, }, #endif +#ifdef CONFIG_ARM64_SSBD + { + .desc = "Speculative Store Bypass Disable", + .def_scope = SCOPE_LOCAL_CPU, + .capability = ARM64_SSBD, + .matches = has_ssbd_mitigation, + }, +#endif { } };
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit a43ae4dfe56a01f5b98ba0cb2f784b6a43bafcc6 upstream.
On a system where the firmware implements ARCH_WORKAROUND_2, it may be useful to either permanently enable or disable the workaround for cases where the user decides that they'd rather not get a trap overhead, and keep the mitigation permanently on or off instead of switching it on exception entry/exit.
In any case, default to the mitigation being enabled.
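For example, a user who would rather keep the mitigation permanently on than pay the trap cost on every kernel entry/exit would boot with:

    ssbd=force-on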
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/kernel-parameters.txt | 17 +++++ arch/arm64/include/asm/cpufeature.h | 6 ++ arch/arm64/kernel/cpu_errata.c | 103 ++++++++++++++++++++++++++++++------ 3 files changed, 110 insertions(+), 16 deletions(-)
--- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -4023,6 +4023,23 @@ bytes respectively. Such letter suffixes spia_pedr= spia_peddr=
+ ssbd= [ARM64,HW] + Speculative Store Bypass Disable control + + On CPUs that are vulnerable to the Speculative + Store Bypass vulnerability and offer a + firmware based mitigation, this parameter + indicates how the mitigation should be used: + + force-on: Unconditionally enable mitigation for + for both kernel and userspace + force-off: Unconditionally disable mitigation for + for both kernel and userspace + kernel: Always enable mitigation in the + kernel, and offer a prctl interface + to allow userspace to register its + interest in being mitigated too. + stack_guard_gap= [MM] override the default stack gap protection. The value is in page units and it defines how many pages prior --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -221,6 +221,12 @@ static inline bool system_supports_mixed return id_aa64mmfr0_mixed_endian_el0(read_system_reg(SYS_ID_AA64MMFR0_EL1)); }
+#define ARM64_SSBD_UNKNOWN -1 +#define ARM64_SSBD_FORCE_DISABLE 0 +#define ARM64_SSBD_KERNEL 1 +#define ARM64_SSBD_FORCE_ENABLE 2 +#define ARM64_SSBD_MITIGATED 3 + #endif /* __ASSEMBLY__ */
#endif --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -190,6 +190,38 @@ static int enable_smccc_arch_workaround_ #ifdef CONFIG_ARM64_SSBD DEFINE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required);
+int ssbd_state __read_mostly = ARM64_SSBD_KERNEL; + +static const struct ssbd_options { + const char *str; + int state; +} ssbd_options[] = { + { "force-on", ARM64_SSBD_FORCE_ENABLE, }, + { "force-off", ARM64_SSBD_FORCE_DISABLE, }, + { "kernel", ARM64_SSBD_KERNEL, }, +}; + +static int __init ssbd_cfg(char *buf) +{ + int i; + + if (!buf || !buf[0]) + return -EINVAL; + + for (i = 0; i < ARRAY_SIZE(ssbd_options); i++) { + int len = strlen(ssbd_options[i].str); + + if (strncmp(buf, ssbd_options[i].str, len)) + continue; + + ssbd_state = ssbd_options[i].state; + return 0; + } + + return -EINVAL; +} +early_param("ssbd", ssbd_cfg); + void __init arm64_update_smccc_conduit(struct alt_instr *alt, __le32 *origptr, __le32 *updptr, int nr_inst) @@ -233,44 +265,83 @@ static bool has_ssbd_mitigation(const st int scope) { struct arm_smccc_res res; - bool supported = true; + bool required = true; + s32 val;
WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
- if (psci_ops.smccc_version == SMCCC_VERSION_1_0) + if (psci_ops.smccc_version == SMCCC_VERSION_1_0) { + ssbd_state = ARM64_SSBD_UNKNOWN; return false; + }
- /* - * The probe function return value is either negative - * (unsupported or mitigated), positive (unaffected), or zero - * (requires mitigation). We only need to do anything in the - * last case. - */ switch (psci_ops.conduit) { case PSCI_CONDUIT_HVC: arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_ARCH_WORKAROUND_2, &res); - if ((int)res.a0 != 0) - supported = false; break;
case PSCI_CONDUIT_SMC: arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_ARCH_WORKAROUND_2, &res); - if ((int)res.a0 != 0) - supported = false; break;
default: - supported = false; + ssbd_state = ARM64_SSBD_UNKNOWN; + return false; + } + + val = (s32)res.a0; + + switch (val) { + case SMCCC_RET_NOT_SUPPORTED: + ssbd_state = ARM64_SSBD_UNKNOWN; + return false; + + case SMCCC_RET_NOT_REQUIRED: + pr_info_once("%s mitigation not required\n", entry->desc); + ssbd_state = ARM64_SSBD_MITIGATED; + return false; + + case SMCCC_RET_SUCCESS: + required = true; + break; + + case 1: /* Mitigation not required on this CPU */ + required = false; + break; + + default: + WARN_ON(1); + return false; }
- if (supported) { - __this_cpu_write(arm64_ssbd_callback_required, 1); + switch (ssbd_state) { + case ARM64_SSBD_FORCE_DISABLE: + pr_info_once("%s disabled from command-line\n", entry->desc); + arm64_set_ssbd_mitigation(false); + required = false; + break; + + case ARM64_SSBD_KERNEL: + if (required) { + __this_cpu_write(arm64_ssbd_callback_required, 1); + arm64_set_ssbd_mitigation(true); + } + break; + + case ARM64_SSBD_FORCE_ENABLE: + pr_info_once("%s forced from command-line\n", entry->desc); arm64_set_ssbd_mitigation(true); + required = true; + break; + + default: + WARN_ON(1); + break; }
- return supported; + return required; } #endif /* CONFIG_ARM64_SSBD */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit c32e1736ca03904c03de0e4459a673be194f56fd upstream.
We're about to need the mitigation state in various parts of the kernel in order to do the right thing for userspace and guests.
Let's expose an accessor that will let other subsystems know about the state.
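As a sketch of the kind of consumer this enables (mirroring what the resume paths do in a later patch of this series):

    /* re-arm the mitigation only when it is meant to be on */
    switch (arm64_get_ssbd_state()) {
    case ARM64_SSBD_FORCE_ENABLE:
    case ARM64_SSBD_KERNEL:
            arm64_set_ssbd_mitigation(true);
    }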
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/cpufeature.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -227,6 +227,16 @@ static inline bool system_supports_mixed #define ARM64_SSBD_FORCE_ENABLE 2 #define ARM64_SSBD_MITIGATED 3
+static inline int arm64_get_ssbd_state(void) +{ +#ifdef CONFIG_ARM64_SSBD + extern int ssbd_state; + return ssbd_state; +#else + return ARM64_SSBD_UNKNOWN; +#endif +} + #endif /* __ASSEMBLY__ */
#endif
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 986372c4367f46b34a3c0f6918d7fb95cbdf39d6 upstream.
In order to avoid checking arm64_ssbd_callback_required on each kernel entry/exit even if no mitigation is required, let's add yet another alternative that by default jumps over the mitigation, and that gets nop'ed out if we're doing dynamic mitigation.
Think of it as a poor man's static key...
Reviewed-by: Julien Grall julien.grall@arm.com Reviewed-by: Mark Rutland mark.rutland@arm.com Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpu_errata.c | 14 ++++++++++++++ arch/arm64/kernel/entry.S | 3 +++ 2 files changed, 17 insertions(+)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -244,6 +244,20 @@ void __init arm64_update_smccc_conduit(s *updptr = cpu_to_le32(insn); }
+void __init arm64_enable_wa2_handling(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, + int nr_inst) +{ + BUG_ON(nr_inst != 1); + /* + * Only allow mitigation on EL1 entry/exit and guest + * ARCH_WORKAROUND_2 handling if the SSBD state allows it to + * be flipped. + */ + if (arm64_get_ssbd_state() == ARM64_SSBD_KERNEL) + *updptr = cpu_to_le32(aarch64_insn_gen_nop()); +} + static void arm64_set_ssbd_mitigation(bool state) { switch (psci_ops.conduit) { --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -100,6 +100,9 @@ alternative_else_nop_endif // to save/restore them if required. .macro apply_ssbd, state, targ, tmp1, tmp2 #ifdef CONFIG_ARM64_SSBD +alternative_cb arm64_enable_wa2_handling + b \targ +alternative_cb_end ldr_this_cpu \tmp2, arm64_ssbd_callback_required, \tmp1 cbz \tmp2, \targ mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 647d0519b53f440a55df163de21c52a8205431cc upstream.
On a system where firmware can dynamically change the state of the mitigation, the CPU will always come up with the mitigation enabled, including when coming back from suspend.
If the user has requested "no mitigation" via a command line option, let's enforce it by calling into the firmware again to disable it.
Similarly, for a resume from hibernate, the mitigation could have been disabled by the boot kernel. Let's ensure that it is set back on in that case.
Acked-by: Will Deacon will.deacon@arm.com
Reviewed-by: Mark Rutland mark.rutland@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/include/asm/cpufeature.h |  6 ++++++
 arch/arm64/kernel/cpu_errata.c      |  2 +-
 arch/arm64/kernel/hibernate.c       | 11 +++++++++++
 arch/arm64/kernel/suspend.c         |  8 ++++++++
 4 files changed, 26 insertions(+), 1 deletion(-)
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -237,6 +237,12 @@ static inline int arm64_get_ssbd_state(v
 #endif
 }

+#ifdef CONFIG_ARM64_SSBD
+void arm64_set_ssbd_mitigation(bool state);
+#else
+static inline void arm64_set_ssbd_mitigation(bool state) {}
+#endif
+
 #endif /* __ASSEMBLY__ */

 #endif
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -258,7 +258,7 @@ void __init arm64_enable_wa2_handling(st
 	*updptr = cpu_to_le32(aarch64_insn_gen_nop());
 }

-static void arm64_set_ssbd_mitigation(bool state)
+void arm64_set_ssbd_mitigation(bool state)
 {
 	switch (psci_ops.conduit) {
 	case PSCI_CONDUIT_HVC:
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -308,6 +308,17 @@ int swsusp_arch_suspend(void)

 		sleep_cpu = -EINVAL;
 		__cpu_suspend_exit();
+
+		/*
+		 * Just in case the boot kernel did turn the SSBD
+		 * mitigation off behind our back, let's set the state
+		 * to what we expect it to be.
+		 */
+		switch (arm64_get_ssbd_state()) {
+		case ARM64_SSBD_FORCE_ENABLE:
+		case ARM64_SSBD_KERNEL:
+			arm64_set_ssbd_mitigation(true);
+		}
 	}

 	local_dbg_restore(flags);
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -67,6 +67,14 @@ void notrace __cpu_suspend_exit(void)
 	 */
 	if (hw_breakpoint_restore)
 		hw_breakpoint_restore(cpu);
+
+	/*
+	 * On resume, firmware implementing dynamic mitigation will
+	 * have turned the mitigation on. If the user has forcefully
+	 * disabled it, make sure their wishes are obeyed.
+	 */
+	if (arm64_get_ssbd_state() == ARM64_SSBD_FORCE_DISABLE)
+		arm64_set_ssbd_mitigation(false);
 }

 /*
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 9dd9614f5476687abbff8d4b12cd08ae70d7c2ad upstream.
In order to allow userspace to be mitigated on demand, let's introduce a new thread flag that prevents the mitigation from being turned off when exiting to userspace, and doesn't turn it on on entry into the kernel (with the assumption that the mitigation is always enabled in the kernel itself).
This will be used by a prctl interface introduced in a later patch.
Reviewed-by: Mark Rutland mark.rutland@arm.com
Acked-by: Will Deacon will.deacon@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/include/asm/thread_info.h | 1 +
 arch/arm64/kernel/entry.S            | 2 ++
 2 files changed, 3 insertions(+)
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -122,6 +122,7 @@ static inline struct thread_info *curren
 #define TIF_RESTORE_SIGMASK	20
 #define TIF_SINGLESTEP		21
 #define TIF_32BIT		22	/* 32bit process */
+#define TIF_SSBD		23	/* Wants SSB mitigation */

 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -105,6 +105,8 @@ alternative_cb	arm64_enable_wa2_handling
 alternative_cb_end
 	ldr_this_cpu	\tmp2, arm64_ssbd_callback_required, \tmp1
 	cbz	\tmp2, \targ
+	ldr	\tmp2, [tsk, #TI_FLAGS]
+	tbnz	\tmp2, #TIF_SSBD, \targ
 	mov	w0, #ARM_SMCCC_ARCH_WORKAROUND_2
 	mov	w1, #\state
 alternative_cb	arm64_update_smccc_conduit
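The two new instructions are just a per-task flag test done in assembly; an equivalent C predicate would be roughly the following (sketch only, nothing here is in the patch):

#include <linux/sched.h>

/* Mirrors "tbnz \tmp2, #TIF_SSBD, \targ": a task that has requested its
 * own SSB mitigation keeps it on across kernel/user transitions, so the
 * per-entry firmware call is skipped. */
static bool ssbd_skip_toggle(struct task_struct *tsk)
{
	return test_tsk_thread_flag(tsk, TIF_SSBD);
}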
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 9cdc0108baa8ef87c76ed834619886a46bd70cbe upstream.
If running on a system that performs dynamic SSBD mitigation, allow userspace to request the mitigation for itself. This is implemented as a prctl call, allowing the mitigation to be enabled or disabled at will for this particular thread.
Acked-by: Will Deacon will.deacon@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/kernel/Makefile |    1 
 arch/arm64/kernel/ssbd.c   |  108 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 109 insertions(+)
 create mode 100644 arch/arm64/kernel/ssbd.c
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -50,6 +50,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE)	+= ka
 arm64-obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 arm64-obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o	\
 					   cpu-reset.o
+arm64-obj-$(CONFIG_ARM64_SSBD)		+= ssbd.o

 ifeq ($(CONFIG_KVM),y)
 arm64-obj-$(CONFIG_HARDEN_BRANCH_PREDICTOR)	+= bpi.o
--- /dev/null
+++ b/arch/arm64/kernel/ssbd.c
@@ -0,0 +1,108 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018 ARM Ltd, All Rights Reserved.
+ */
+
+#include <linux/errno.h>
+#include <linux/prctl.h>
+#include <linux/sched.h>
+#include <linux/thread_info.h>
+
+#include <asm/cpufeature.h>
+
+/*
+ * prctl interface for SSBD
+ */
+static int ssbd_prctl_set(struct task_struct *task, unsigned long ctrl)
+{
+	int state = arm64_get_ssbd_state();
+
+	/* Unsupported */
+	if (state == ARM64_SSBD_UNKNOWN)
+		return -EINVAL;
+
+	/* Treat the unaffected/mitigated state separately */
+	if (state == ARM64_SSBD_MITIGATED) {
+		switch (ctrl) {
+		case PR_SPEC_ENABLE:
+			return -EPERM;
+		case PR_SPEC_DISABLE:
+		case PR_SPEC_FORCE_DISABLE:
+			return 0;
+		}
+	}
+
+	/*
+	 * Things are a bit backward here: the arm64 internal API
+	 * *enables the mitigation* when the userspace API *disables
+	 * speculation*. So much fun.
+	 */
+	switch (ctrl) {
+	case PR_SPEC_ENABLE:
+		/* If speculation is force disabled, enable is not allowed */
+		if (state == ARM64_SSBD_FORCE_ENABLE ||
+		    task_spec_ssb_force_disable(task))
+			return -EPERM;
+		task_clear_spec_ssb_disable(task);
+		clear_tsk_thread_flag(task, TIF_SSBD);
+		break;
+	case PR_SPEC_DISABLE:
+		if (state == ARM64_SSBD_FORCE_DISABLE)
+			return -EPERM;
+		task_set_spec_ssb_disable(task);
+		set_tsk_thread_flag(task, TIF_SSBD);
+		break;
+	case PR_SPEC_FORCE_DISABLE:
+		if (state == ARM64_SSBD_FORCE_DISABLE)
+			return -EPERM;
+		task_set_spec_ssb_disable(task);
+		task_set_spec_ssb_force_disable(task);
+		set_tsk_thread_flag(task, TIF_SSBD);
+		break;
+	default:
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which,
+			     unsigned long ctrl)
+{
+	switch (which) {
+	case PR_SPEC_STORE_BYPASS:
+		return ssbd_prctl_set(task, ctrl);
+	default:
+		return -ENODEV;
+	}
+}
+
+static int ssbd_prctl_get(struct task_struct *task)
+{
+	switch (arm64_get_ssbd_state()) {
+	case ARM64_SSBD_UNKNOWN:
+		return -EINVAL;
+	case ARM64_SSBD_FORCE_ENABLE:
+		return PR_SPEC_DISABLE;
+	case ARM64_SSBD_KERNEL:
+		if (task_spec_ssb_force_disable(task))
+			return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE;
+		if (task_spec_ssb_disable(task))
+			return PR_SPEC_PRCTL | PR_SPEC_DISABLE;
+		return PR_SPEC_PRCTL | PR_SPEC_ENABLE;
+	case ARM64_SSBD_FORCE_DISABLE:
+		return PR_SPEC_ENABLE;
+	default:
+		return PR_SPEC_NOT_AFFECTED;
+	}
+}
+
+int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which)
+{
+	switch (which) {
+	case PR_SPEC_STORE_BYPASS:
+		return ssbd_prctl_get(task);
+	default:
+		return -ENODEV;
+	}
+}
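With this wired up, userspace can use the generic speculation-control prctl on arm64. A typical consumer would look roughly like this (standard prctl(2) usage; the PR_* constants come from linux/prctl.h, so reasonably recent UAPI headers are assumed):

#include <stdio.h>
#include <sys/prctl.h>
#include <linux/prctl.h>

int main(void)
{
	/* Request the SSB mitigation for this thread, i.e. disable
	 * store-bypass speculation (note the inverted sense the commit
	 * message warns about). */
	if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
		  PR_SPEC_DISABLE, 0, 0))
		perror("PR_SET_SPECULATION_CTRL");

	/* On an ARM64_SSBD_KERNEL system this should now report
	 * PR_SPEC_PRCTL | PR_SPEC_DISABLE. */
	printf("state: %ld\n", (long)prctl(PR_GET_SPECULATION_CTRL,
					   PR_SPEC_STORE_BYPASS, 0, 0, 0));
	return 0;
}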
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 85478bab409171de501b719971fd25a3d5d639f9 upstream.
As we're going to need to access per-cpu variables at EL2, let's craft the minimum set of accessors required to implement reading a per-cpu variable, relying on tpidr_el2 to contain the per-cpu offset.
Reviewed-by: Christoffer Dall christoffer.dall@arm.com
Reviewed-by: Mark Rutland mark.rutland@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/include/asm/kvm_asm.h | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -66,14 +66,37 @@ extern u32 __kvm_get_mdcr_el2(void);

 extern u32 __init_stage2_translation(void);

+/* Home-grown __this_cpu_{ptr,read} variants that always work at HYP */
+#define __hyp_this_cpu_ptr(sym)						\
+	({								\
+		void *__ptr = hyp_symbol_addr(sym);			\
+		__ptr += read_sysreg(tpidr_el2);			\
+		(typeof(&sym))__ptr;					\
+	 })
+
+#define __hyp_this_cpu_read(sym)					\
+	({								\
+		*__hyp_this_cpu_ptr(sym);				\
+	 })
+
 #else /* __ASSEMBLY__ */

-.macro get_host_ctxt reg, tmp
-	adr_l	\reg, kvm_host_cpu_state
+.macro hyp_adr_this_cpu reg, sym, tmp
+	adr_l	\reg, \sym
 	mrs	\tmp, tpidr_el2
 	add	\reg, \reg, \tmp
 .endm

+.macro hyp_ldr_this_cpu reg, sym, tmp
+	adr_l	\reg, \sym
+	mrs	\tmp, tpidr_el2
+	ldr	\reg, [\reg, \tmp]
+.endm
+
+.macro get_host_ctxt reg, tmp
+	hyp_adr_this_cpu \reg, kvm_host_cpu_state, \tmp
+.endm
+
 .macro get_vcpu_ptr vcpu, ctxt
	get_host_ctxt \ctxt, \vcpu
	ldr	\vcpu, [\ctxt, #HOST_CONTEXT_VCPU]
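Usage-wise (hypothetical helper shown; the first real user is the guest ARCH_WORKAROUND_2 support later in this series), reading a per-cpu variable from HYP text then becomes:

DECLARE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required);

/* A sketch: resolve the symbol's HYP-visible address and add the
 * per-cpu offset that tpidr_el2 holds at EL2. */
static bool __hyp_text hyp_needs_wa2_call(void)
{
	return __hyp_this_cpu_read(arm64_ssbd_callback_required) != 0;
}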
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 55e3748e8902ff641e334226bdcb432f9a5d78d3 upstream.
In order to offer ARCH_WORKAROUND_2 support to guests, we need a bit of infrastructure.
Let's add a flag indicating whether or not the guest uses SSBD mitigation. Depending on the state of this flag, allow KVM to disable ARCH_WORKAROUND_2 before entering the guest, and enable it when exiting it.
Reviewed-by: Christoffer Dall christoffer.dall@arm.com
Reviewed-by: Mark Rutland mark.rutland@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm/include/asm/kvm_mmu.h    |  5 +++++
 arch/arm/kvm/arm.c                |  6 ++++++
 arch/arm64/include/asm/kvm_asm.h  |  3 +++
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/include/asm/kvm_mmu.h  | 24 ++++++++++++++++++++++++
 arch/arm64/kvm/hyp/switch.c       | 38 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 79 insertions(+)
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -256,6 +256,11 @@ static inline int kvm_map_vectors(void)
 	return 0;
 }

+static inline int hyp_map_aux_data(void)
+{
+	return 0;
+}
+
 #endif	/* !__ASSEMBLY__ */

 #endif /* __ARM_KVM_MMU_H__ */
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1367,6 +1367,12 @@ static int init_hyp_mode(void)
 		}
 	}

+	err = hyp_map_aux_data();
+	if (err) {
+		kvm_err("Cannot map host auxilary data: %d\n", err);
+		goto out_err;
+	}
+
 	kvm_info("Hyp mode initialized successfully\n");

 	return 0;
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -33,6 +33,9 @@
 #define KVM_ARM64_DEBUG_DIRTY_SHIFT	0
 #define KVM_ARM64_DEBUG_DIRTY		(1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
+#define	VCPU_WORKAROUND_2_FLAG_SHIFT	0
+#define	VCPU_WORKAROUND_2_FLAG		(_AC(1, UL) << VCPU_WORKAROUND_2_FLAG_SHIFT)
+
 /* Translate a kernel address of @sym into its equivalent linear mapping */
 #define kvm_ksym_ref(sym)						\
	({								\
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -213,6 +213,9 @@ struct kvm_vcpu_arch {
 	/* Exception Information */
 	struct kvm_vcpu_fault_info fault;

+	/* State of various workarounds, see kvm_asm.h for bit assignment */
+	u64 workaround_flags;
+
 	/* Guest debug state */
 	u64 debug_flags;

--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -387,5 +387,29 @@ static inline int kvm_map_vectors(void)
 }
 #endif

+#ifdef CONFIG_ARM64_SSBD
+DECLARE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required);
+
+static inline int hyp_map_aux_data(void)
+{
+	int cpu, err;
+
+	for_each_possible_cpu(cpu) {
+		u64 *ptr;
+
+		ptr = per_cpu_ptr(&arm64_ssbd_callback_required, cpu);
+		err = create_hyp_mappings(ptr, ptr + 1, PAGE_HYP);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+#else
+static inline int hyp_map_aux_data(void)
+{
+	return 0;
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -15,6 +15,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
+#include <linux/arm-smccc.h>
 #include <linux/types.h>
 #include <linux/jump_label.h>
 #include <uapi/linux/psci.h>
@@ -267,6 +268,39 @@ static void __hyp_text __skip_instr(stru
 	write_sysreg_el2(*vcpu_pc(vcpu), elr);
 }

+static inline bool __hyp_text __needs_ssbd_off(struct kvm_vcpu *vcpu)
+{
+	if (!cpus_have_cap(ARM64_SSBD))
+		return false;
+
+	return !(vcpu->arch.workaround_flags & VCPU_WORKAROUND_2_FLAG);
+}
+
+static void __hyp_text __set_guest_arch_workaround_state(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_ARM64_SSBD
+	/*
+	 * The host runs with the workaround always present. If the
+	 * guest wants it disabled, so be it...
+	 */
+	if (__needs_ssbd_off(vcpu) &&
+	    __hyp_this_cpu_read(arm64_ssbd_callback_required))
+		arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2, 0, NULL);
+#endif
+}
+
+static void __hyp_text __set_host_arch_workaround_state(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_ARM64_SSBD
+	/*
+	 * If the guest has disabled the workaround, bring it back on.
+	 */
+	if (__needs_ssbd_off(vcpu) &&
+	    __hyp_this_cpu_read(arm64_ssbd_callback_required))
+		arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2, 1, NULL);
+#endif
+}
+
 int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *host_ctxt;
@@ -297,6 +331,8 @@ int __hyp_text __kvm_vcpu_run(struct kvm
 	__sysreg_restore_guest_state(guest_ctxt);
 	__debug_restore_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);

+	__set_guest_arch_workaround_state(vcpu);
+
 	/* Jump in the fire! */
again:
 	exit_code = __guest_enter(vcpu, host_ctxt);
@@ -339,6 +375,8 @@ again:
 		}
 	}

+	__set_host_arch_workaround_state(vcpu);
+
 	fp_enabled = __fpsimd_enabled();

 	__sysreg_save_guest_state(guest_ctxt);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit b4f18c063a13dfb33e3a63fe1844823e19c2265e upstream.
In order to forward the guest's ARCH_WORKAROUND_2 calls to EL3, add a small(-ish) sequence to handle it at EL2. Special care must be taken to track the state of the guest itself by updating the workaround flags. We also rely on patching to enable calls into the firmware.
Note that since we need to execute branches, this always executes after the Spectre-v2 mitigation has been applied.
Reviewed-by: Mark Rutland mark.rutland@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/kernel/asm-offsets.c |  1 +
 arch/arm64/kvm/hyp/hyp-entry.S  | 38 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 38 insertions(+), 1 deletion(-)
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -127,6 +127,7 @@ int main(void)
   BLANK();
 #ifdef CONFIG_KVM_ARM_HOST
   DEFINE(VCPU_CONTEXT,		offsetof(struct kvm_vcpu, arch.ctxt));
+  DEFINE(VCPU_WORKAROUND_FLAGS,	offsetof(struct kvm_vcpu, arch.workaround_flags));
   DEFINE(CPU_GP_REGS,		offsetof(struct kvm_cpu_context, gp_regs));
   DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_regs, regs));
   DEFINE(CPU_FP_REGS,		offsetof(struct kvm_regs, fp_regs));
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -107,8 +107,44 @@ el1_hvc_guest:
	 */
	ldr	x1, [sp]				// Guest's x0
	eor	w1, w1, #ARM_SMCCC_ARCH_WORKAROUND_1
+	cbz	w1, wa_epilogue
+
+	/* ARM_SMCCC_ARCH_WORKAROUND_2 handling */
+	eor	w1, w1, #(ARM_SMCCC_ARCH_WORKAROUND_1 ^ \
+			  ARM_SMCCC_ARCH_WORKAROUND_2)
	cbnz	w1, el1_trap
-	mov	x0, x1
+
+#ifdef CONFIG_ARM64_SSBD
+alternative_cb	arm64_enable_wa2_handling
+	b	wa2_end
+alternative_cb_end
+	get_vcpu_ptr	x2, x0
+	ldr	x0, [x2, #VCPU_WORKAROUND_FLAGS]
+
+	// Sanitize the argument and update the guest flags
+	ldr	x1, [sp, #8]			// Guest's x1
+	clz	w1, w1				// Murphy's device:
+	lsr	w1, w1, #5			// w1 = !!w1 without using
+	eor	w1, w1, #1			// the flags...
+	bfi	x0, x1, #VCPU_WORKAROUND_2_FLAG_SHIFT, #1
+	str	x0, [x2, #VCPU_WORKAROUND_FLAGS]
+
+	/* Check that we actually need to perform the call */
+	hyp_ldr_this_cpu x0, arm64_ssbd_callback_required, x2
+	cbz	x0, wa2_end
+
+	mov	w0, #ARM_SMCCC_ARCH_WORKAROUND_2
+	smc	#0
+
+	/* Don't leak data from the SMC call */
+	mov	x3, xzr
+wa2_end:
+	mov	x2, xzr
+	mov	x1, xzr
+#endif
+
+wa_epilogue:
+	mov	x0, xzr
	add	sp, sp, #16
	eret
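The "Murphy's device" trick is worth spelling out: the clz/lsr/eor triplet normalizes the guest's x1 to exactly 0 or 1 without touching the condition flags, which must be preserved here. A C rendering of the same computation (a sketch; hardware clz of zero yields 32, while __builtin_clz(0) is undefined, hence the guard):

static inline unsigned int normalize_to_bool(unsigned int w1)
{
	/* w1 != 0: clz is 0..31 -> >>5 gives 0 -> ^1 gives 1
	 * w1 == 0: clz is 32    -> >>5 gives 1 -> ^1 gives 0 */
	unsigned int clz = w1 ? __builtin_clz(w1) : 32;

	return (clz >> 5) ^ 1;	/* i.e. !!w1 */
}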
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier marc.zyngier@arm.com
commit 5d81f7dc9bca4f4963092433e27b508cbe524a32 upstream.
Now that all our infrastructure is in place, let's expose the availability of ARCH_WORKAROUND_2 to guests. We take this opportunity to tidy up a couple of SMCCC constants.
Acked-by: Christoffer Dall christoffer.dall@arm.com
Reviewed-by: Mark Rutland mark.rutland@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Marc Zyngier marc.zyngier@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm/include/asm/kvm_host.h   | 12 ++++++++++++
 arch/arm/kvm/psci.c               | 18 ++++++++++++++++--
 arch/arm64/include/asm/kvm_host.h | 23 +++++++++++++++++++++++
 arch/arm64/kvm/reset.c            |  4 ++++
 4 files changed, 55 insertions(+), 2 deletions(-)
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -327,4 +327,16 @@ static inline bool kvm_arm_harden_branch
 	return false;
 }

+#define KVM_SSBD_UNKNOWN		-1
+#define KVM_SSBD_FORCE_DISABLE		0
+#define KVM_SSBD_KERNEL			1
+#define KVM_SSBD_FORCE_ENABLE		2
+#define KVM_SSBD_MITIGATED		3
+
+static inline int kvm_arm_have_ssbd(void)
+{
+	/* No way to detect it yet, pretend it is not there. */
+	return KVM_SSBD_UNKNOWN;
+}
+
 #endif /* __ARM_KVM_HOST_H__ */
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -403,7 +403,7 @@ static int kvm_psci_call(struct kvm_vcpu
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
 	u32 func_id = smccc_get_function(vcpu);
-	u32 val = PSCI_RET_NOT_SUPPORTED;
+	u32 val = SMCCC_RET_NOT_SUPPORTED;
 	u32 feature;

 	switch (func_id) {
@@ -415,7 +415,21 @@ int kvm_hvc_call_handler(struct kvm_vcpu
 		switch(feature) {
 		case ARM_SMCCC_ARCH_WORKAROUND_1:
 			if (kvm_arm_harden_branch_predictor())
-				val = 0;
+				val = SMCCC_RET_SUCCESS;
+			break;
+		case ARM_SMCCC_ARCH_WORKAROUND_2:
+			switch (kvm_arm_have_ssbd()) {
+			case KVM_SSBD_FORCE_DISABLE:
+			case KVM_SSBD_UNKNOWN:
+				break;
+			case KVM_SSBD_KERNEL:
+				val = SMCCC_RET_SUCCESS;
+				break;
+			case KVM_SSBD_FORCE_ENABLE:
+			case KVM_SSBD_MITIGATED:
+				val = SMCCC_RET_NOT_REQUIRED;
+				break;
+			}
 			break;
 		}
 		break;
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -425,4 +425,27 @@ static inline bool kvm_arm_harden_branch
 	return cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR);
 }

+#define KVM_SSBD_UNKNOWN		-1
+#define KVM_SSBD_FORCE_DISABLE		0
+#define KVM_SSBD_KERNEL			1
+#define KVM_SSBD_FORCE_ENABLE		2
+#define KVM_SSBD_MITIGATED		3
+
+static inline int kvm_arm_have_ssbd(void)
+{
+	switch (arm64_get_ssbd_state()) {
+	case ARM64_SSBD_FORCE_DISABLE:
+		return KVM_SSBD_FORCE_DISABLE;
+	case ARM64_SSBD_KERNEL:
+		return KVM_SSBD_KERNEL;
+	case ARM64_SSBD_FORCE_ENABLE:
+		return KVM_SSBD_FORCE_ENABLE;
+	case ARM64_SSBD_MITIGATED:
+		return KVM_SSBD_MITIGATED;
+	case ARM64_SSBD_UNKNOWN:
+	default:
+		return KVM_SSBD_UNKNOWN;
+	}
+}
+
 #endif /* __ARM64_KVM_HOST_H__ */
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -135,6 +135,10 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu
 	/* Reset PMU */
 	kvm_pmu_vcpu_reset(vcpu);

+	/* Default workaround setup is enabled (if supported) */
+	if (kvm_arm_have_ssbd() == KVM_SSBD_KERNEL)
+		vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG;
+
 	/* Reset timer */
 	return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
 }
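From the guest side, availability then becomes discoverable through the standard SMCCC feature probe. A guest kernel would do roughly the following (a sketch relying on the SMCCC 1.1 primitives and the return codes this series builds on):

#include <linux/arm-smccc.h>

static bool guest_has_arch_wa2(void)
{
	struct arm_smccc_res res;

	arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
			  ARM_SMCCC_ARCH_WORKAROUND_2, &res);

	/* SMCCC_RET_SUCCESS: dynamic mitigation is available;
	 * SMCCC_RET_NOT_REQUIRED: the host is always safe. */
	return (long)res.a0 == SMCCC_RET_SUCCESS ||
	       (long)res.a0 == SMCCC_RET_NOT_REQUIRED;
}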
On Fri, Jul 20, 2018 at 02:13:17PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.114 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 22 12:13:47 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.114-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Merged, compiled with -Werror, and installed onto my OnePlus 6.
No initial issues noticed in dmesg or general usage.
Thanks!
Nathan
On Fri, Jul 20, 2018 at 06:34:42AM -0700, Nathan Chancellor wrote:
On Fri, Jul 20, 2018 at 02:13:17PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.114 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 22 12:13:47 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.114-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Merged, compiled with -Werror, and installed onto my OnePlus 6.
No initial issues noticed in dmesg or general usage.
Wonderful, thanks for testing all 3 of these and letting me know.
greg k-h
On 20 July 2018 at 17:43, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.9.114 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 22 12:13:47 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.114-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary
------------------------------------------------------------------------
kernel: 4.9.114-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.9.y
git commit: 97637db053bf8323a8b81dfbc2d32b9111c65377
git describe: v4.9.113-67-g97637db053bf
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.113-67-...
No regressions (compared to build v4.9.113-67-g8c66f4ccc9b1)
Ran 16396 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_x86_64
- x15 - arm
- x86_64
Test Suites
-----------
* boot
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
On 07/20/2018 05:13 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.114 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 22 12:13:47 UTC 2018. Anything received after that time might be too late.
Build results:
	total: 148 pass: 148 fail: 0
Qemu test results:
	total: 166 pass: 166 fail: 0
Details are available at http://kerneltests.org/builders/.
Guenter