This is the start of the stable review cycle for the 4.9.95 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Apr 19 15:56:27 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.95-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.9.95-rc1
Phil Elwell phil@raspberrypi.org lan78xx: Correctly indicate invalid OTP
Stefan Hajnoczi stefanha@redhat.com vhost: fix vhost_vq_access_ok() log check
Tejaswi Tanikella tejaswit@codeaurora.org slip: Check if rstate is initialized before uncompressing
Ka-Cheong Poon ka-cheong.poon@oracle.com rds: MP-RDS may use an invalid c_path
Bassem Boubaker bassem.boubaker@actia.fr cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
Marek Szyprowski m.szyprowski@samsung.com hwmon: (ina2xx) Fix access to uninitialized mutex
Sudhir Sreedharan ssreedharan@mvista.com rtl8187: Fix NULL pointer dereference in priv->conf_mutex
Szymon Janc szymon.janc@codecoup.pl Bluetooth: Fix connection if directed advertising and privacy is used
Al Viro viro@zeniv.linux.org.uk getname_kernel() needs to make sure that ->name != ->iname in long case
Vasily Gorbik gor@linux.ibm.com s390/ipl: ensure loadparm valid flag is set
Julian Wiedmann jwi@linux.vnet.ibm.com s390/qdio: don't merge ERROR output buffers
Julian Wiedmann jwi@linux.vnet.ibm.com s390/qdio: don't retry EQBS after CCQ 96
Dan Williams dan.j.williams@intel.com nfit: fix region registration vs block-data-window ranges
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp block/loop: fix deadlock after loop_set_status
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "perf tests: Decompress kernel module before objdump"
Eric Biggers ebiggers@google.com sunrpc: remove incorrect HMAC request initialization
Mark Rutland mark.rutland@arm.com arm64: Kill PSCI_GET_VERSION as a variant-2 workaround
Mark Rutland mark.rutland@arm.com arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support
Mark Rutland mark.rutland@arm.com arm/arm64: smccc: Implement SMCCC v1.1 inline primitive
Mark Rutland mark.rutland@arm.com arm/arm64: smccc: Make function identifiers an unsigned quantity
Mark Rutland mark.rutland@arm.com firmware/psci: Expose SMCCC version through psci_ops
Mark Rutland mark.rutland@arm.com firmware/psci: Expose PSCI conduit
Mark Rutland mark.rutland@arm.com arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling
Mark Rutland mark.rutland@arm.com arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support
Mark Rutland mark.rutland@arm.com arm/arm64: KVM: Turn kvm_psci_version into a static inline
Mark Rutland mark.rutland@arm.com arm64: KVM: Make PSCI_VERSION a fast path
Mark Rutland mark.rutland@arm.com arm/arm64: KVM: Advertise SMCCC v1.1
Mark Rutland mark.rutland@arm.com arm/arm64: KVM: Implement PSCI 1.0 support
Mark Rutland mark.rutland@arm.com arm/arm64: KVM: Add smccc accessors to PSCI code
Mark Rutland mark.rutland@arm.com arm/arm64: KVM: Add PSCI_VERSION helper
Mark Rutland mark.rutland@arm.com arm/arm64: KVM: Consolidate the PSCI include files
Mark Rutland mark.rutland@arm.com arm64: KVM: Increment PC after handling an SMC trap
Mark Rutland mark.rutland@arm.com arm64: Branch predictor hardening for Cavium ThunderX2
Mark Rutland mark.rutland@arm.com arm64: Implement branch predictor hardening for affected Cortex-A CPUs
Mark Rutland mark.rutland@arm.com arm64: cpu_errata: Allow an erratum to be match for all revisions of a core
Mark Rutland mark.rutland@arm.com arm64: cputype: Add missing MIDR values for Cortex-A72 and Cortex-A75
Mark Rutland mark.rutland@arm.com arm64: entry: Apply BP hardening for suspicious interrupts from EL0
Mark Rutland mark.rutland@arm.com arm64: entry: Apply BP hardening for high-priority synchronous exceptions
Mark Rutland mark.rutland@arm.com arm64: KVM: Use per-CPU vector when BP hardening is enabled
Mark Rutland mark.rutland@arm.com mm: Introduce lm_alias
Mark Rutland mark.rutland@arm.com arm64: Move BP hardening to check_and_switch_context
Mark Rutland mark.rutland@arm.com arm64: Add skeleton to harden the branch predictor against aliasing attacks
Mark Rutland mark.rutland@arm.com arm64: Move post_ttbr_update_workaround to C code
Mark Rutland mark.rutland@arm.com arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macro
Mark Rutland mark.rutland@arm.com drivers/firmware: Expose psci_get_version through psci_ops structure
Mark Rutland mark.rutland@arm.com arm64: cpufeature: Pass capability structure to ->enable callback
Mark Rutland mark.rutland@arm.com arm64: Run enable method for errata work arounds on late CPUs
Mark Rutland mark.rutland@arm.com arm64: cpufeature: __this_cpu_has_cap() shouldn't stop early
Mark Rutland mark.rutland@arm.com arm64: uaccess: Mask __user pointers for __arch_{clear, copy_*}_user
Mark Rutland mark.rutland@arm.com arm64: uaccess: Don't bother eliding access_ok checks in __{get, put}_user
Mark Rutland mark.rutland@arm.com arm64: uaccess: Prevent speculative use of the current addr_limit
Mark Rutland mark.rutland@arm.com arm64: entry: Ensure branch through syscall table is bounded under speculation
Mark Rutland mark.rutland@arm.com arm64: Use pointer masking to limit uaccess speculation
Mark Rutland mark.rutland@arm.com arm64: Make USER_DS an inclusive limit
Mark Rutland mark.rutland@arm.com arm64: move TASK_* definitions to <asm/processor.h>
Mark Rutland mark.rutland@arm.com arm64: Implement array_index_mask_nospec()
Mark Rutland mark.rutland@arm.com arm64: barrier: Add CSDB macros to control data-value prediction
Arnd Bergmann arnd@arndb.de radeon: hide pointless #warning when compile testing
Prashant Bhole bhole_prashant_q7@lab.ntt.co.jp perf/core: Fix use-after-free in uprobe_perf_close()
Adrian Hunter adrian.hunter@intel.com perf intel-pt: Fix timestamp following overflow
Adrian Hunter adrian.hunter@intel.com perf intel-pt: Fix error recovery from missing TIP packet
Adrian Hunter adrian.hunter@intel.com perf intel-pt: Fix sync_switch
Adrian Hunter adrian.hunter@intel.com perf intel-pt: Fix overlap detection to identify consecutive buffers correctly
Dexuan Cui decui@microsoft.com Drivers: hv: vmbus: do not mark HV_PCIE as perf_device
Helge Deller deller@gmx.de parisc: Fix out of array access in match_pci_device()
Mauro Carvalho Chehab mchehab@kernel.org media: v4l2-compat-ioctl32: don't oops on overlay
-------------
Diffstat:
Makefile | 4 +- arch/arm/include/asm/kvm_host.h | 6 + arch/arm/include/asm/kvm_mmu.h | 10 + arch/arm/include/asm/kvm_psci.h | 27 - arch/arm/kvm/arm.c | 11 +- arch/arm/kvm/handle_exit.c | 4 +- arch/arm/kvm/psci.c | 143 +- arch/arm64/Kconfig | 17 + arch/arm64/crypto/sha256-core.S | 2061 ++++++++++++++++++++ arch/arm64/crypto/sha512-core.S | 1085 +++++++++++ arch/arm64/include/asm/assembler.h | 19 + arch/arm64/include/asm/barrier.h | 23 + arch/arm64/include/asm/cpucaps.h | 3 +- arch/arm64/include/asm/cputype.h | 6 + arch/arm64/include/asm/kvm_host.h | 5 + arch/arm64/include/asm/kvm_mmu.h | 38 + arch/arm64/include/asm/kvm_psci.h | 27 - arch/arm64/include/asm/memory.h | 15 - arch/arm64/include/asm/mmu.h | 39 + arch/arm64/include/asm/processor.h | 24 + arch/arm64/include/asm/sysreg.h | 2 + arch/arm64/include/asm/uaccess.h | 153 +- arch/arm64/kernel/Makefile | 4 + arch/arm64/kernel/arm64ksyms.c | 4 +- arch/arm64/kernel/bpi.S | 75 + arch/arm64/kernel/cpu_errata.c | 189 +- arch/arm64/kernel/cpufeature.c | 10 +- arch/arm64/kernel/entry.S | 25 +- arch/arm64/kvm/handle_exit.c | 16 +- arch/arm64/kvm/hyp/hyp-entry.S | 20 +- arch/arm64/kvm/hyp/switch.c | 5 +- arch/arm64/lib/clear_user.S | 6 +- arch/arm64/lib/copy_in_user.S | 4 +- arch/arm64/mm/context.c | 12 + arch/arm64/mm/fault.c | 34 +- arch/arm64/mm/proc.S | 7 +- arch/parisc/kernel/drivers.c | 4 + arch/s390/kernel/ipl.c | 1 + drivers/acpi/nfit/core.c | 22 +- drivers/block/loop.c | 12 +- drivers/firmware/psci.c | 57 +- drivers/gpu/drm/radeon/radeon_object.c | 3 +- drivers/hv/channel_mgmt.c | 2 +- drivers/hwmon/ina2xx.c | 3 +- drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 4 +- drivers/net/slip/slhc.c | 5 + drivers/net/usb/cdc_ether.c | 6 + drivers/net/usb/lan78xx.c | 3 +- drivers/net/wireless/realtek/rtl818x/rtl8187/dev.c | 2 +- drivers/s390/cio/qdio_main.c | 42 +- drivers/vhost/vhost.c | 8 +- fs/namei.c | 3 +- include/kvm/arm_psci.h | 51 + include/linux/arm-smccc.h | 165 +- include/linux/mm.h | 4 + include/linux/psci.h | 14 + include/net/bluetooth/hci_core.h | 2 +- include/net/slhc_vj.h | 1 + include/uapi/linux/psci.h | 3 + kernel/events/core.c | 6 + net/bluetooth/hci_conn.c | 29 +- net/bluetooth/hci_event.c | 15 +- net/bluetooth/l2cap_core.c | 2 +- net/rds/send.c | 15 +- net/sunrpc/auth_gss/gss_krb5_crypto.c | 3 - tools/perf/tests/code-reading.c | 20 +- .../perf/util/intel-pt-decoder/intel-pt-decoder.c | 64 +- .../perf/util/intel-pt-decoder/intel-pt-decoder.h | 2 +- tools/perf/util/intel-pt.c | 37 +- 69 files changed, 4423 insertions(+), 320 deletions(-)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mauro Carvalho Chehab mchehab@s-opensource.com
commit 85ea29f19eab56ec16ec6b92bc67305998706afa upstream.
At put_v4l2_window32(), it tries to access kp->clips. However, kp points to an userspace pointer. So, it should be obtained via get_user(), otherwise it can OOPS:
vivid-000: ================== END STATUS ================== BUG: unable to handle kernel paging request at 00000000fffb18e0 IP: [<ffffffffc05468d9>] __put_v4l2_format32+0x169/0x220 [videodev] PGD 3f5776067 PUD 3f576f067 PMD 3f5769067 PTE 800000042548f067 Oops: 0001 [#1] SMP Modules linked in: vivid videobuf2_vmalloc videobuf2_memops v4l2_dv_timings videobuf2_core v4l2_common videodev media xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables bluetooth rfkill binfmt_misc snd_hda_codec_hdmi i915 snd_hda_intel snd_hda_controller snd_hda_codec intel_rapl x86_pkg_temp_thermal snd_hwdep intel_powerclamp snd_pcm coretemp snd_seq_midi kvm_intel kvm snd_seq_midi_event snd_rawmidi i2c_algo_bit drm_kms_helper snd_seq drm crct10dif_pclmul e1000e snd_seq_device crc32_pclmul snd_timer ghash_clmulni_intel snd mei_me mei ptp pps_core soundcore lpc_ich video crc32c_intel [last unloaded: media] CPU: 2 PID: 28332 Comm: v4l2-compliance Not tainted 3.18.102+ #107 Hardware name: /NUC5i7RYB, BIOS RYBDWi35.86A.0364.2017.0511.0949 05/11/2017 task: ffff8804293f8000 ti: ffff8803f5640000 task.ti: ffff8803f5640000 RIP: 0010:[<ffffffffc05468d9>] [<ffffffffc05468d9>] __put_v4l2_format32+0x169/0x220 [videodev] RSP: 0018:ffff8803f5643e28 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000fffb1ab4 RDX: 00000000fffb1a68 RSI: 00000000fffb18d8 RDI: 00000000fffb1aa8 RBP: ffff8803f5643e48 R08: 0000000000000001 R09: ffff8803f54b0378 R10: 0000000000000000 R11: 0000000000000168 R12: 00000000fffb18c0 R13: 00000000fffb1a94 R14: 00000000fffb18c8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880456d00000(0063) knlGS:00000000f7100980 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 00000000fffb18e0 CR3: 00000003f552b000 CR4: 00000000003407e0 Stack: 00000000fffb1a94 00000000c0cc5640 0000000000000056 ffff8804274f3600 ffff8803f5643ed0 ffffffffc0547e16 0000000000000003 ffff8803f5643eb0 ffffffff81301460 ffff88009db44b01 ffff880441942520 ffff8800c0d05640 Call Trace: [<ffffffffc0547e16>] v4l2_compat_ioctl32+0x12d6/0x1b1d [videodev] [<ffffffff81301460>] ? file_has_perm+0x70/0xc0 [<ffffffff81252a2c>] compat_SyS_ioctl+0xec/0x1200 [<ffffffff8173241a>] sysenter_dispatch+0x7/0x21 Code: 00 00 48 8b 80 48 c0 ff ff 48 83 e8 38 49 39 c6 0f 87 2b ff ff ff 49 8d 45 1c e8 a3 ce e3 c0 85 c0 0f 85 1a ff ff ff 41 8d 40 ff <4d> 8b 64 24 20 41 89 d5 48 8d 44 40 03 4d 8d 34 c4 eb 15 0f 1f RIP [<ffffffffc05468d9>] __put_v4l2_format32+0x169/0x220 [videodev] RSP <ffff8803f5643e28> CR2: 00000000fffb18e0
Tested with vivid driver on Kernel v3.18.102.
Same bug happens upstream too:
BUG: KASAN: user-memory-access in __put_v4l2_format32+0x98/0x4d0 [videodev] Read of size 8 at addr 00000000ffe48400 by task v4l2-compliance/8713
CPU: 0 PID: 8713 Comm: v4l2-compliance Not tainted 4.16.0-rc4+ #108 Hardware name: /NUC5i7RYB, BIOS RYBDWi35.86A.0364.2017.0511.0949 05/11/2017 Call Trace: dump_stack+0x5c/0x7c kasan_report+0x164/0x380 ? __put_v4l2_format32+0x98/0x4d0 [videodev] __put_v4l2_format32+0x98/0x4d0 [videodev] v4l2_compat_ioctl32+0x1aec/0x27a0 [videodev] ? __fsnotify_inode_delete+0x20/0x20 ? __put_v4l2_format32+0x4d0/0x4d0 [videodev] compat_SyS_ioctl+0x646/0x14d0 ? do_ioctl+0x30/0x30 do_fast_syscall_32+0x191/0x3f4 entry_SYSENTER_compat+0x6b/0x7a ================================================================== Disabling lock debugging due to kernel taint BUG: unable to handle kernel paging request at 00000000ffe48400 IP: __put_v4l2_format32+0x98/0x4d0 [videodev] PGD 3a22fb067 P4D 3a22fb067 PUD 39b6f0067 PMD 39b6f1067 PTE 80000003256af067 Oops: 0001 [#1] SMP KASAN Modules linked in: vivid videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops v4l2_tpg v4l2_dv_timings videobuf2_v4l2 videobuf2_common v4l2_common videodev xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables bluetooth rfkill ecdh_generic binfmt_misc snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp i915 coretemp snd_hda_intel snd_hda_codec kvm_intel snd_hwdep snd_hda_core kvm snd_pcm irqbypass crct10dif_pclmul crc32_pclmul snd_seq_midi ghash_clmulni_intel snd_seq_midi_event i2c_algo_bit intel_cstate snd_rawmidi intel_uncore snd_seq drm_kms_helper e1000e snd_seq_device snd_timer intel_rapl_perf drm ptp snd mei_me mei lpc_ich pps_core soundcore video crc32c_intel CPU: 0 PID: 8713 Comm: v4l2-compliance Tainted: G B 4.16.0-rc4+ #108 Hardware name: /NUC5i7RYB, BIOS RYBDWi35.86A.0364.2017.0511.0949 05/11/2017 RIP: 0010:__put_v4l2_format32+0x98/0x4d0 [videodev] RSP: 0018:ffff8803b9be7d30 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8803ac983e80 RCX: ffffffff8cd929f2 RDX: 1ffffffff1d0a149 RSI: 0000000000000297 RDI: 0000000000000297 RBP: 00000000ffe485c0 R08: fffffbfff1cf5123 R09: ffffffff8e7a8948 R10: 0000000000000001 R11: fffffbfff1cf5122 R12: 00000000ffe483e0 R13: 00000000ffe485c4 R14: ffff8803ac985918 R15: 00000000ffe483e8 FS: 0000000000000000(0000) GS:ffff880407400000(0063) knlGS:00000000f7a46980 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 00000000ffe48400 CR3: 00000003a83f2003 CR4: 00000000003606f0 Call Trace: v4l2_compat_ioctl32+0x1aec/0x27a0 [videodev] ? __fsnotify_inode_delete+0x20/0x20 ? __put_v4l2_format32+0x4d0/0x4d0 [videodev] compat_SyS_ioctl+0x646/0x14d0 ? do_ioctl+0x30/0x30 do_fast_syscall_32+0x191/0x3f4 entry_SYSENTER_compat+0x6b/0x7a Code: 4c 89 f7 4d 8d 7c 24 08 e8 e6 a4 69 cb 48 8b 83 98 1a 00 00 48 83 e8 10 49 39 c7 0f 87 9d 01 00 00 49 8d 7c 24 20 e8 c8 a4 69 cb <4d> 8b 74 24 20 4c 89 ef 4c 89 fe ba 10 00 00 00 e8 23 d9 08 cc RIP: __put_v4l2_format32+0x98/0x4d0 [videodev] RSP: ffff8803b9be7d30 CR2: 00000000ffe48400
cc: stable@vger.kernel.org Signed-off-by: Mauro Carvalho Chehab mchehab@s-opensource.com Reviewed-by: Sakari Ailus sakari.ailus@linux.intel.com Reviewed-by: Hans Verkuil hans.verkuil@cisco.com Signed-off-by: Mauro Carvalho Chehab mchehab@s-opensource.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c +++ b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c @@ -101,7 +101,7 @@ static int get_v4l2_window32(struct v4l2 static int put_v4l2_window32(struct v4l2_window __user *kp, struct v4l2_window32 __user *up) { - struct v4l2_clip __user *kclips = kp->clips; + struct v4l2_clip __user *kclips; struct v4l2_clip32 __user *uclips; compat_caddr_t p; u32 clipcount; @@ -116,6 +116,8 @@ static int put_v4l2_window32(struct v4l2 if (!clipcount) return 0;
+ if (get_user(kclips, &kp->clips)) + return -EFAULT; if (get_user(p, &up->clips)) return -EFAULT; uclips = compat_ptr(p);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit 615b2665fd20c327b631ff1e79426775de748094 upstream.
As found by the ubsan checker, the value of the 'index' variable can be out of range for the bc[] array:
UBSAN: Undefined behaviour in arch/parisc/kernel/drivers.c:655:21 index 6 is out of range for type 'char [6]' Backtrace: [<104fa850>] __ubsan_handle_out_of_bounds+0x68/0x80 [<1019d83c>] check_parent+0xc0/0x170 [<1019d91c>] descend_children+0x30/0x6c [<1059e164>] device_for_each_child+0x60/0x98 [<1019cd54>] parse_tree_node+0x40/0x54 [<1019d86c>] check_parent+0xf0/0x170 [<1019d91c>] descend_children+0x30/0x6c [<1059e164>] device_for_each_child+0x60/0x98 [<1019d938>] descend_children+0x4c/0x6c [<1059e164>] device_for_each_child+0x60/0x98 [<1019cd54>] parse_tree_node+0x40/0x54 [<1019cffc>] hwpath_to_device+0xa4/0xc4
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/parisc/kernel/drivers.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/arch/parisc/kernel/drivers.c +++ b/arch/parisc/kernel/drivers.c @@ -648,6 +648,10 @@ static int match_pci_device(struct devic (modpath->mod == PCI_FUNC(devfn))); }
+ /* index might be out of bounds for bc[] */ + if (index >= 6) + return 0; + id = PCI_SLOT(pdev->devfn) | (PCI_FUNC(pdev->devfn) << 5); return (modpath->bc[index] == id); }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dexuan Cui decui@microsoft.com
commit 238064f13d057390a8c5e1a6a80f4f0a0ec46499 upstream.
The pci-hyperv driver's channel callback hv_pci_onchannelcallback() is not really a hot path, so we don't need to mark it as a perf_device, meaning with this patch all HV_PCIE channels' target_cpu will be CPU0.
Signed-off-by: Dexuan Cui decui@microsoft.com Cc: stable@vger.kernel.org Cc: Stephen Hemminger sthemmin@microsoft.com Signed-off-by: K. Y. Srinivasan kys@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hv/channel_mgmt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hv/channel_mgmt.c +++ b/drivers/hv/channel_mgmt.c @@ -70,7 +70,7 @@ static const struct vmbus_device vmbus_d /* PCIE */ { .dev_type = HV_PCIE, HV_PCIE_GUID, - .perf_device = true, + .perf_device = false, },
/* Synthetic Frame Buffer */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adrian Hunter adrian.hunter@intel.com
commit 117db4b27bf08dba412faf3924ba55fe970c57b8 upstream.
Overlap detection was not not updating the buffer's 'consecutive' flag. Marking buffers consecutive has the advantage that decoding begins from the start of the buffer instead of the first PSB. Fix overlap detection to identify consecutive buffers correctly.
Signed-off-by: Adrian Hunter adrian.hunter@intel.com Cc: Jiri Olsa jolsa@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1520431349-30689-2-git-send-email-adrian.hunter@int... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 62 +++++++++----------- tools/perf/util/intel-pt-decoder/intel-pt-decoder.h | 2 tools/perf/util/intel-pt.c | 5 + 3 files changed, 34 insertions(+), 35 deletions(-)
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -2182,14 +2182,6 @@ const struct intel_pt_state *intel_pt_de return &decoder->state; }
-static bool intel_pt_at_psb(unsigned char *buf, size_t len) -{ - if (len < INTEL_PT_PSB_LEN) - return false; - return memmem(buf, INTEL_PT_PSB_LEN, INTEL_PT_PSB_STR, - INTEL_PT_PSB_LEN); -} - /** * intel_pt_next_psb - move buffer pointer to the start of the next PSB packet. * @buf: pointer to buffer pointer @@ -2278,6 +2270,7 @@ static unsigned char *intel_pt_last_psb( * @buf: buffer * @len: size of buffer * @tsc: TSC value returned + * @rem: returns remaining size when TSC is found * * Find a TSC packet in @buf and return the TSC value. This function assumes * that @buf starts at a PSB and that PSB+ will contain TSC and so stops if a @@ -2285,7 +2278,8 @@ static unsigned char *intel_pt_last_psb( * * Return: %true if TSC is found, false otherwise. */ -static bool intel_pt_next_tsc(unsigned char *buf, size_t len, uint64_t *tsc) +static bool intel_pt_next_tsc(unsigned char *buf, size_t len, uint64_t *tsc, + size_t *rem) { struct intel_pt_pkt packet; int ret; @@ -2296,6 +2290,7 @@ static bool intel_pt_next_tsc(unsigned c return false; if (packet.type == INTEL_PT_TSC) { *tsc = packet.payload; + *rem = len; return true; } if (packet.type == INTEL_PT_PSBEND) @@ -2346,6 +2341,8 @@ static int intel_pt_tsc_cmp(uint64_t tsc * @len_a: size of first buffer * @buf_b: second buffer * @len_b: size of second buffer + * @consecutive: returns true if there is data in buf_b that is consecutive + * to buf_a * * If the trace contains TSC we can look at the last TSC of @buf_a and the * first TSC of @buf_b in order to determine if the buffers overlap, and then @@ -2358,33 +2355,41 @@ static int intel_pt_tsc_cmp(uint64_t tsc static unsigned char *intel_pt_find_overlap_tsc(unsigned char *buf_a, size_t len_a, unsigned char *buf_b, - size_t len_b) + size_t len_b, bool *consecutive) { uint64_t tsc_a, tsc_b; unsigned char *p; - size_t len; + size_t len, rem_a, rem_b;
p = intel_pt_last_psb(buf_a, len_a); if (!p) return buf_b; /* No PSB in buf_a => no overlap */
len = len_a - (p - buf_a); - if (!intel_pt_next_tsc(p, len, &tsc_a)) { + if (!intel_pt_next_tsc(p, len, &tsc_a, &rem_a)) { /* The last PSB+ in buf_a is incomplete, so go back one more */ len_a -= len; p = intel_pt_last_psb(buf_a, len_a); if (!p) return buf_b; /* No full PSB+ => assume no overlap */ len = len_a - (p - buf_a); - if (!intel_pt_next_tsc(p, len, &tsc_a)) + if (!intel_pt_next_tsc(p, len, &tsc_a, &rem_a)) return buf_b; /* No TSC in buf_a => assume no overlap */ }
while (1) { /* Ignore PSB+ with no TSC */ - if (intel_pt_next_tsc(buf_b, len_b, &tsc_b) && - intel_pt_tsc_cmp(tsc_a, tsc_b) < 0) - return buf_b; /* tsc_a < tsc_b => no overlap */ + if (intel_pt_next_tsc(buf_b, len_b, &tsc_b, &rem_b)) { + int cmp = intel_pt_tsc_cmp(tsc_a, tsc_b); + + /* Same TSC, so buffers are consecutive */ + if (!cmp && rem_b >= rem_a) { + *consecutive = true; + return buf_b + len_b - (rem_b - rem_a); + } + if (cmp < 0) + return buf_b; /* tsc_a < tsc_b => no overlap */ + }
if (!intel_pt_step_psb(&buf_b, &len_b)) return buf_b + len_b; /* No PSB in buf_b => no data */ @@ -2398,6 +2403,8 @@ static unsigned char *intel_pt_find_over * @buf_b: second buffer * @len_b: size of second buffer * @have_tsc: can use TSC packets to detect overlap + * @consecutive: returns true if there is data in buf_b that is consecutive + * to buf_a * * When trace samples or snapshots are recorded there is the possibility that * the data overlaps. Note that, for the purposes of decoding, data is only @@ -2408,7 +2415,7 @@ static unsigned char *intel_pt_find_over */ unsigned char *intel_pt_find_overlap(unsigned char *buf_a, size_t len_a, unsigned char *buf_b, size_t len_b, - bool have_tsc) + bool have_tsc, bool *consecutive) { unsigned char *found;
@@ -2420,7 +2427,8 @@ unsigned char *intel_pt_find_overlap(uns return buf_b; /* No overlap */
if (have_tsc) { - found = intel_pt_find_overlap_tsc(buf_a, len_a, buf_b, len_b); + found = intel_pt_find_overlap_tsc(buf_a, len_a, buf_b, len_b, + consecutive); if (found) return found; } @@ -2435,28 +2443,16 @@ unsigned char *intel_pt_find_overlap(uns }
/* Now len_b >= len_a */ - if (len_b > len_a) { - /* The leftover buffer 'b' must start at a PSB */ - while (!intel_pt_at_psb(buf_b + len_a, len_b - len_a)) { - if (!intel_pt_step_psb(&buf_a, &len_a)) - return buf_b; /* No overlap */ - } - } - while (1) { /* Potential overlap so check the bytes */ found = memmem(buf_a, len_a, buf_b, len_a); - if (found) + if (found) { + *consecutive = true; return buf_b + len_a; + }
/* Try again at next PSB in buffer 'a' */ if (!intel_pt_step_psb(&buf_a, &len_a)) return buf_b; /* No overlap */ - - /* The leftover buffer 'b' must start at a PSB */ - while (!intel_pt_at_psb(buf_b + len_a, len_b - len_a)) { - if (!intel_pt_step_psb(&buf_a, &len_a)) - return buf_b; /* No overlap */ - } } } --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h @@ -103,7 +103,7 @@ const struct intel_pt_state *intel_pt_de
unsigned char *intel_pt_find_overlap(unsigned char *buf_a, size_t len_a, unsigned char *buf_b, size_t len_b, - bool have_tsc); + bool have_tsc, bool *consecutive);
int intel_pt__strerror(int code, char *buf, size_t buflen);
--- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -194,14 +194,17 @@ static void intel_pt_dump_event(struct i static int intel_pt_do_fix_overlap(struct intel_pt *pt, struct auxtrace_buffer *a, struct auxtrace_buffer *b) { + bool consecutive = false; void *start;
start = intel_pt_find_overlap(a->data, a->size, b->data, b->size, - pt->have_tsc); + pt->have_tsc, &consecutive); if (!start) return -EINVAL; b->use_size = b->data + b->size - start; b->use_data = start; + if (b->use_size && consecutive) + b->consecutive = true; return 0; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adrian Hunter adrian.hunter@intel.com
commit 63d8e38f6ae6c36dd5b5ba0e8c112e8861532ea2 upstream.
sync_switch is a facility to synchronize decoding more closely with the point in the kernel when the context actually switched.
The flag when sync_switch is enabled was global to the decoding, whereas it is really specific to the CPU.
The trace data for different CPUs is put on different queues, so add sync_switch to the intel_pt_queue structure and use that in preference to the global setting in the intel_pt structure.
That fixes problems decoding one CPU's trace because sync_switch was disabled on a different CPU's queue.
Signed-off-by: Adrian Hunter adrian.hunter@intel.com Cc: Jiri Olsa jolsa@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1520431349-30689-3-git-send-email-adrian.hunter@int... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/perf/util/intel-pt.c | 32 +++++++++++++++++++++++++------- 1 file changed, 25 insertions(+), 7 deletions(-)
--- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -131,6 +131,7 @@ struct intel_pt_queue { bool stop; bool step_through_buffers; bool use_buffer_pid_tid; + bool sync_switch; pid_t pid, tid; int cpu; int switch_state; @@ -931,10 +932,12 @@ static int intel_pt_setup_queue(struct i if (pt->timeless_decoding || !pt->have_sched_switch) ptq->use_buffer_pid_tid = true; } + + ptq->sync_switch = pt->sync_switch; }
if (!ptq->on_heap && - (!pt->sync_switch || + (!ptq->sync_switch || ptq->switch_state != INTEL_PT_SS_EXPECTING_SWITCH_EVENT)) { const struct intel_pt_state *state; int ret; @@ -1336,7 +1339,7 @@ static int intel_pt_sample(struct intel_ if (pt->synth_opts.last_branch) intel_pt_update_last_branch_rb(ptq);
- if (!pt->sync_switch) + if (!ptq->sync_switch) return 0;
if (intel_pt_is_switch_ip(ptq, state->to_ip)) { @@ -1417,6 +1420,21 @@ static u64 intel_pt_switch_ip(struct int return switch_ip; }
+static void intel_pt_enable_sync_switch(struct intel_pt *pt) +{ + unsigned int i; + + pt->sync_switch = true; + + for (i = 0; i < pt->queues.nr_queues; i++) { + struct auxtrace_queue *queue = &pt->queues.queue_array[i]; + struct intel_pt_queue *ptq = queue->priv; + + if (ptq) + ptq->sync_switch = true; + } +} + static int intel_pt_run_decoder(struct intel_pt_queue *ptq, u64 *timestamp) { const struct intel_pt_state *state = ptq->state; @@ -1433,7 +1451,7 @@ static int intel_pt_run_decoder(struct i if (pt->switch_ip) { intel_pt_log("switch_ip: %"PRIx64" ptss_ip: %"PRIx64"\n", pt->switch_ip, pt->ptss_ip); - pt->sync_switch = true; + intel_pt_enable_sync_switch(pt); } } } @@ -1449,9 +1467,9 @@ static int intel_pt_run_decoder(struct i if (state->err) { if (state->err == INTEL_PT_ERR_NODATA) return 1; - if (pt->sync_switch && + if (ptq->sync_switch && state->from_ip >= pt->kernel_start) { - pt->sync_switch = false; + ptq->sync_switch = false; intel_pt_next_tid(pt, ptq); } if (pt->synth_opts.errors) { @@ -1477,7 +1495,7 @@ static int intel_pt_run_decoder(struct i state->timestamp, state->est_timestamp); ptq->timestamp = state->est_timestamp; /* Use estimated TSC in unknown switch state */ - } else if (pt->sync_switch && + } else if (ptq->sync_switch && ptq->switch_state == INTEL_PT_SS_UNKNOWN && intel_pt_is_switch_ip(ptq, state->to_ip) && ptq->next_tid == -1) { @@ -1624,7 +1642,7 @@ static int intel_pt_sync_switch(struct i return 1;
ptq = intel_pt_cpu_to_ptq(pt, cpu); - if (!ptq) + if (!ptq || !ptq->sync_switch) return 1;
switch (ptq->switch_state) {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adrian Hunter adrian.hunter@intel.com
commit 1c196a6c771c47a2faa63d38d913e03284f73a16 upstream.
When a TIP packet is expected but there is a different packet, it is an error. However the unexpected packet might be something important like a TSC packet, so after the error, it is necessary to continue from there, rather than the next packet. That is achieved by setting pkt_step to zero.
Signed-off-by: Adrian Hunter adrian.hunter@intel.com Cc: Jiri Olsa jolsa@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1520431349-30689-4-git-send-email-adrian.hunter@int... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 1 + 1 file changed, 1 insertion(+)
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -1522,6 +1522,7 @@ static int intel_pt_walk_fup_tip(struct case INTEL_PT_PSBEND: intel_pt_log("ERROR: Missing TIP after FUP\n"); decoder->pkt_state = INTEL_PT_STATE_ERR3; + decoder->pkt_step = 0; return -ENOENT;
case INTEL_PT_OVF:
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adrian Hunter adrian.hunter@intel.com
commit 91d29b288aed3406caf7c454bf2b898c96cfd177 upstream.
timestamp_insn_cnt is used to estimate the timestamp based on the number of instructions since the last known timestamp.
If the estimate is not accurate enough decoding might not be correctly synchronized with side-band events causing more trace errors.
However there are always timestamps following an overflow, so the estimate is not needed and can indeed result in more errors.
Suppress the estimate by setting timestamp_insn_cnt to zero.
Signed-off-by: Adrian Hunter adrian.hunter@intel.com Cc: Jiri Olsa jolsa@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1520431349-30689-5-git-send-email-adrian.hunter@int... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 1 + 1 file changed, 1 insertion(+)
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -1300,6 +1300,7 @@ static int intel_pt_overflow(struct inte intel_pt_clear_tx_flags(decoder); decoder->have_tma = false; decoder->cbr = 0; + decoder->timestamp_insn_cnt = 0; decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC; decoder->overflow = true; return -EOVERFLOW;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Prashant Bhole bhole_prashant_q7@lab.ntt.co.jp
commit 621b6d2ea297d0fb6030452c5bcd221f12165fcf upstream.
A use-after-free bug was caught by KASAN while running usdt related code (BCC project. bcc/tests/python/test_usdt2.py):
================================================================== BUG: KASAN: use-after-free in uprobe_perf_close+0x222/0x3b0 Read of size 4 at addr ffff880384f9b4a4 by task test_usdt2.py/870
CPU: 4 PID: 870 Comm: test_usdt2.py Tainted: G W 4.16.0-next-20180409 #215 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: dump_stack+0xc7/0x15b ? show_regs_print_info+0x5/0x5 ? printk+0x9c/0xc3 ? kmsg_dump_rewind_nolock+0x6e/0x6e ? uprobe_perf_close+0x222/0x3b0 print_address_description+0x83/0x3a0 ? uprobe_perf_close+0x222/0x3b0 kasan_report+0x1dd/0x460 ? uprobe_perf_close+0x222/0x3b0 uprobe_perf_close+0x222/0x3b0 ? probes_open+0x180/0x180 ? free_filters_list+0x290/0x290 trace_uprobe_register+0x1bb/0x500 ? perf_event_attach_bpf_prog+0x310/0x310 ? probe_event_disable+0x4e0/0x4e0 perf_uprobe_destroy+0x63/0xd0 _free_event+0x2bc/0xbd0 ? lockdep_rcu_suspicious+0x100/0x100 ? ring_buffer_attach+0x550/0x550 ? kvm_sched_clock_read+0x1a/0x30 ? perf_event_release_kernel+0x3e4/0xc00 ? __mutex_unlock_slowpath+0x12e/0x540 ? wait_for_completion+0x430/0x430 ? lock_downgrade+0x3c0/0x3c0 ? lock_release+0x980/0x980 ? do_raw_spin_trylock+0x118/0x150 ? do_raw_spin_unlock+0x121/0x210 ? do_raw_spin_trylock+0x150/0x150 perf_event_release_kernel+0x5d4/0xc00 ? put_event+0x30/0x30 ? fsnotify+0xd2d/0xea0 ? sched_clock_cpu+0x18/0x1a0 ? __fsnotify_update_child_dentry_flags.part.0+0x1b0/0x1b0 ? pvclock_clocksource_read+0x152/0x2b0 ? pvclock_read_flags+0x80/0x80 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock_cpu+0x18/0x1a0 ? pvclock_clocksource_read+0x152/0x2b0 ? locks_remove_file+0xec/0x470 ? pvclock_read_flags+0x80/0x80 ? fcntl_setlk+0x880/0x880 ? ima_file_free+0x8d/0x390 ? lockdep_rcu_suspicious+0x100/0x100 ? ima_file_check+0x110/0x110 ? fsnotify+0xea0/0xea0 ? kvm_sched_clock_read+0x1a/0x30 ? rcu_note_context_switch+0x600/0x600 perf_release+0x21/0x40 __fput+0x264/0x620 ? fput+0xf0/0xf0 ? do_raw_spin_unlock+0x121/0x210 ? do_raw_spin_trylock+0x150/0x150 ? SyS_fchdir+0x100/0x100 ? fsnotify+0xea0/0xea0 task_work_run+0x14b/0x1e0 ? task_work_cancel+0x1c0/0x1c0 ? copy_fd_bitmaps+0x150/0x150 ? vfs_read+0xe5/0x260 exit_to_usermode_loop+0x17b/0x1b0 ? trace_event_raw_event_sys_exit+0x1a0/0x1a0 do_syscall_64+0x3f6/0x490 ? syscall_return_slowpath+0x2c0/0x2c0 ? lockdep_sys_exit+0x1f/0xaa ? syscall_return_slowpath+0x1a3/0x2c0 ? lockdep_sys_exit+0x1f/0xaa ? prepare_exit_to_usermode+0x11c/0x1e0 ? enter_from_user_mode+0x30/0x30 random: crng init done ? __put_user_4+0x1c/0x30 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7f41d95f9340 RSP: 002b:00007fffe71e4268 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 000000000000000d RCX: 00007f41d95f9340 RDX: 0000000000000000 RSI: 0000000000002401 RDI: 000000000000000d RBP: 0000000000000000 R08: 00007f41ca8ff700 R09: 00007f41d996dd1f R10: 00007fffe71e41e0 R11: 0000000000000246 R12: 00007fffe71e4330 R13: 0000000000000000 R14: fffffffffffffffc R15: 00007fffe71e4290
Allocated by task 870: kasan_kmalloc+0xa0/0xd0 kmem_cache_alloc_node+0x11a/0x430 copy_process.part.19+0x11a0/0x41c0 _do_fork+0x1be/0xa20 do_syscall_64+0x198/0x490 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Freed by task 0: __kasan_slab_free+0x12e/0x180 kmem_cache_free+0x102/0x4d0 free_task+0xfe/0x160 __put_task_struct+0x189/0x290 delayed_put_task_struct+0x119/0x250 rcu_process_callbacks+0xa6c/0x1b60 __do_softirq+0x238/0x7ae
The buggy address belongs to the object at ffff880384f9b480 which belongs to the cache task_struct of size 12928
It occurs because task_struct is freed before perf_event which refers to the task and task flags are checked while teardown of the event. perf_event_alloc() assigns task_struct to hw.target of perf_event, but there is no reference counting for it.
As a fix we get_task_struct() in perf_event_alloc() at above mentioned assignment and put_task_struct() in _free_event().
Signed-off-by: Prashant Bhole bhole_prashant_q7@lab.ntt.co.jp Reviewed-by: Oleg Nesterov oleg@redhat.com Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Fixes: 63b6da39bb38e8f1a1ef3180d32a39d6 ("perf: Fix perf_event_exit_task() race") Link: http://lkml.kernel.org/r/20180409100346.6416-1-bhole_prashant_q7@lab.ntt.co.... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/events/core.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4091,6 +4091,9 @@ static void _free_event(struct perf_even if (event->ctx) put_ctx(event->ctx);
+ if (event->hw.target) + put_task_struct(event->hw.target); + exclusive_event_destroy(event); module_put(event->pmu->module);
@@ -9214,6 +9217,7 @@ perf_event_alloc(struct perf_event_attr * and we cannot use the ctx information because we need the * pmu before we get a ctx. */ + get_task_struct(task); event->hw.target = task; }
@@ -9331,6 +9335,8 @@ err_ns: perf_detach_cgroup(event); if (event->ns) put_pid_ns(event->ns); + if (event->hw.target) + put_task_struct(event->hw.target); kfree(event);
return ERR_PTR(err);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Will Deacon will.deacon@arm.com
commit 669474e772b952b14f4de4845a1558fd4c0414a4 upstream.
For CPUs capable of data value prediction, CSDB waits for any outstanding predictions to architecturally resolve before allowing speculative execution to continue. Provide macros to expose it to the arch code.
Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/assembler.h | 7 +++++++ arch/arm64/include/asm/barrier.h | 2 ++ 2 files changed, 9 insertions(+)
--- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -87,6 +87,13 @@ .endm
/* + * Value prediction barrier + */ + .macro csdb + hint #20 + .endm + +/* * NOP sequence */ .macro nops, num --- a/arch/arm64/include/asm/barrier.h +++ b/arch/arm64/include/asm/barrier.h @@ -31,6 +31,8 @@ #define dmb(opt) asm volatile("dmb " #opt : : : "memory") #define dsb(opt) asm volatile("dsb " #opt : : : "memory")
+#define csdb() asm volatile("hint #20" : : : "memory") + #define mb() dsb(sy) #define rmb() dsb(ld) #define wmb() dsb(st)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Robin Murphy robin.murphy@arm.com
commit 022620eed3d0bc4bf2027326f599f5ad71c2ea3f upstream.
Provide an optimised, assembly implementation of array_index_mask_nospec() for arm64 so that the compiler is not in a position to transform the code in ways which affect its ability to inhibit speculation (e.g. by introducing conditional branches).
This is similar to the sequence used by x86, modulo architectural differences in the carry/borrow flags.
Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/barrier.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
--- a/arch/arm64/include/asm/barrier.h +++ b/arch/arm64/include/asm/barrier.h @@ -40,6 +40,27 @@ #define dma_rmb() dmb(oshld) #define dma_wmb() dmb(oshst)
+/* + * Generate a mask for array_index__nospec() that is ~0UL when 0 <= idx < sz + * and 0 otherwise. + */ +#define array_index_mask_nospec array_index_mask_nospec +static inline unsigned long array_index_mask_nospec(unsigned long idx, + unsigned long sz) +{ + unsigned long mask; + + asm volatile( + " cmp %1, %2\n" + " sbc %0, xzr, xzr\n" + : "=r" (mask) + : "r" (idx), "Ir" (sz) + : "cc"); + + csdb(); + return mask; +} + #define __smp_mb() dmb(ish) #define __smp_rmb() dmb(ishld) #define __smp_wmb() dmb(ishst)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Yury Norov ynorov@caviumnetworks.com
commit eef94a3d09aab437c8c254de942d8b1aa76455e2 upstream.
ILP32 series [1] introduces the dependency on <asm/is_compat.h> for TASK_SIZE macro. Which in turn requires <asm/thread_info.h>, and <asm/thread_info.h> include <asm/memory.h>, giving a circular dependency, because TASK_SIZE is currently located in <asm/memory.h>.
In other architectures, TASK_SIZE is defined in <asm/processor.h>, and moving TASK_SIZE there fixes the problem.
Discussion: https://patchwork.kernel.org/patch/9929107/
[1] https://github.com/norov/linux/tree/ilp32-next
CC: Will Deacon will.deacon@arm.com CC: Laura Abbott labbott@redhat.com Cc: Ard Biesheuvel ard.biesheuvel@linaro.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: James Morse james.morse@arm.com Suggested-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Yury Norov ynorov@caviumnetworks.com Signed-off-by: Will Deacon will.deacon@arm.com [v4.9: necessary for making USER_DS an inclusive limit] Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/memory.h | 15 --------------- arch/arm64/include/asm/processor.h | 21 +++++++++++++++++++++ arch/arm64/kernel/entry.S | 1 + 3 files changed, 22 insertions(+), 15 deletions(-)
--- a/arch/arm64/include/asm/memory.h +++ b/arch/arm64/include/asm/memory.h @@ -60,8 +60,6 @@ * KIMAGE_VADDR - the virtual address of the start of the kernel image * VA_BITS - the maximum number of bits for virtual addresses. * VA_START - the first kernel virtual address. - * TASK_SIZE - the maximum size of a user space task. - * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area. */ #define VA_BITS (CONFIG_ARM64_VA_BITS) #define VA_START (UL(0xffffffffffffffff) - \ @@ -76,19 +74,6 @@ #define PCI_IO_END (VMEMMAP_START - SZ_2M) #define PCI_IO_START (PCI_IO_END - PCI_IO_SIZE) #define FIXADDR_TOP (PCI_IO_START - SZ_2M) -#define TASK_SIZE_64 (UL(1) << VA_BITS) - -#ifdef CONFIG_COMPAT -#define TASK_SIZE_32 UL(0x100000000) -#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \ - TASK_SIZE_32 : TASK_SIZE_64) -#define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT) ? \ - TASK_SIZE_32 : TASK_SIZE_64) -#else -#define TASK_SIZE TASK_SIZE_64 -#endif /* CONFIG_COMPAT */ - -#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 4))
#define KERNEL_START _text #define KERNEL_END _end --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -19,6 +19,10 @@ #ifndef __ASM_PROCESSOR_H #define __ASM_PROCESSOR_H
+#define TASK_SIZE_64 (UL(1) << VA_BITS) + +#ifndef __ASSEMBLY__ + /* * Default implementation of macro that returns current * instruction pointer ("program counter"). @@ -37,6 +41,22 @@ #include <asm/ptrace.h> #include <asm/types.h>
+/* + * TASK_SIZE - the maximum size of a user space task. + * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area. + */ +#ifdef CONFIG_COMPAT +#define TASK_SIZE_32 UL(0x100000000) +#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \ + TASK_SIZE_32 : TASK_SIZE_64) +#define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT) ? \ + TASK_SIZE_32 : TASK_SIZE_64) +#else +#define TASK_SIZE TASK_SIZE_64 +#endif /* CONFIG_COMPAT */ + +#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 4)) + #define STACK_TOP_MAX TASK_SIZE_64 #ifdef CONFIG_COMPAT #define AARCH32_VECTORS_BASE 0xffff0000 @@ -192,4 +212,5 @@ int cpu_enable_pan(void *__unused); int cpu_enable_uao(void *__unused); int cpu_enable_cache_maint_trap(void *__unused);
+#endif /* __ASSEMBLY__ */ #endif /* __ASM_PROCESSOR_H */ --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -30,6 +30,7 @@ #include <asm/irq.h> #include <asm/memory.h> #include <asm/mmu.h> +#include <asm/processor.h> #include <asm/thread_info.h> #include <asm/asm-uaccess.h> #include <asm/unistd.h>
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Robin Murphy robin.murphy@arm.com
commit 51369e398d0d33e8f524314e672b07e8cf870e79 upstream.
Currently, USER_DS represents an exclusive limit while KERNEL_DS is inclusive. In order to do some clever trickery for speculation-safe masking, we need them both to behave equivalently - there aren't enough bits to make KERNEL_DS exclusive, so we have precisely one option. This also happens to correct a longstanding false negative for a range ending on the very top byte of kernel memory.
Mark Rutland points out that we've actually got the semantics of addresses vs. segments muddled up in most of the places we need to amend, so shuffle the {USER,KERNEL}_DS definitions around such that we can correct those properly instead of just pasting "-1"s everywhere.
Signed-off-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com [v4.9: avoid dependence on TTBR0 SW PAN and THREAD_INFO_IN_TASK_STRUCT] Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/processor.h | 3 ++ arch/arm64/include/asm/uaccess.h | 46 +++++++++++++++++++++---------------- arch/arm64/kernel/entry.S | 4 +-- arch/arm64/mm/fault.c | 2 - 4 files changed, 33 insertions(+), 22 deletions(-)
--- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -21,6 +21,9 @@
#define TASK_SIZE_64 (UL(1) << VA_BITS)
+#define KERNEL_DS UL(-1) +#define USER_DS (TASK_SIZE_64 - 1) + #ifndef __ASSEMBLY__
/* --- a/arch/arm64/include/asm/uaccess.h +++ b/arch/arm64/include/asm/uaccess.h @@ -28,6 +28,7 @@
#include <asm/alternative.h> #include <asm/cpufeature.h> +#include <asm/processor.h> #include <asm/ptrace.h> #include <asm/sysreg.h> #include <asm/errno.h> @@ -59,10 +60,7 @@ struct exception_table_entry
extern int fixup_exception(struct pt_regs *regs);
-#define KERNEL_DS (-1UL) #define get_ds() (KERNEL_DS) - -#define USER_DS TASK_SIZE_64 #define get_fs() (current_thread_info()->addr_limit)
static inline void set_fs(mm_segment_t fs) @@ -87,22 +85,32 @@ static inline void set_fs(mm_segment_t f * Returns 1 if the range is valid, 0 otherwise. * * This is equivalent to the following test: - * (u65)addr + (u65)size <= current->addr_limit - * - * This needs 65-bit arithmetic. + * (u65)addr + (u65)size <= (u65)current->addr_limit + 1 */ -#define __range_ok(addr, size) \ -({ \ - unsigned long __addr = (unsigned long __force)(addr); \ - unsigned long flag, roksum; \ - __chk_user_ptr(addr); \ - asm("adds %1, %1, %3; ccmp %1, %4, #2, cc; cset %0, ls" \ - : "=&r" (flag), "=&r" (roksum) \ - : "1" (__addr), "Ir" (size), \ - "r" (current_thread_info()->addr_limit) \ - : "cc"); \ - flag; \ -}) +static inline unsigned long __range_ok(unsigned long addr, unsigned long size) +{ + unsigned long limit = current_thread_info()->addr_limit; + + __chk_user_ptr(addr); + asm volatile( + // A + B <= C + 1 for all A,B,C, in four easy steps: + // 1: X = A + B; X' = X % 2^64 + " adds %0, %0, %2\n" + // 2: Set C = 0 if X > 2^64, to guarantee X' > C in step 4 + " csel %1, xzr, %1, hi\n" + // 3: Set X' = ~0 if X >= 2^64. For X == 2^64, this decrements X' + // to compensate for the carry flag being set in step 4. For + // X > 2^64, X' merely has to remain nonzero, which it does. + " csinv %0, %0, xzr, cc\n" + // 4: For X < 2^64, this gives us X' - C - 1 <= 0, where the -1 + // comes from the carry in being clear. Otherwise, we are + // testing X' - C == 0, subject to the previous adjustments. + " sbcs xzr, %0, %1\n" + " cset %0, ls\n" + : "+r" (addr), "+r" (limit) : "Ir" (size) : "cc"); + + return addr; +}
/* * When dealing with data aborts, watchpoints, or instruction traps we may end @@ -111,7 +119,7 @@ static inline void set_fs(mm_segment_t f */ #define untagged_addr(addr) sign_extend64(addr, 55)
-#define access_ok(type, addr, size) __range_ok(addr, size) +#define access_ok(type, addr, size) __range_ok((unsigned long)(addr), size) #define user_addr_max get_fs
#define _ASM_EXTABLE(from, to) \ --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -126,10 +126,10 @@ alternative_else_nop_endif .else add x21, sp, #S_FRAME_SIZE get_thread_info tsk - /* Save the task's original addr_limit and set USER_DS (TASK_SIZE_64) */ + /* Save the task's original addr_limit and set USER_DS */ ldr x20, [tsk, #TI_ADDR_LIMIT] str x20, [sp, #S_ORIG_ADDR_LIMIT] - mov x20, #TASK_SIZE_64 + mov x20, #USER_DS str x20, [tsk, #TI_ADDR_LIMIT] /* No need to reset PSTATE.UAO, hardware's already set it to 0 for us */ .endif /* \el == 0 */ --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -332,7 +332,7 @@ static int __kprobes do_page_fault(unsig mm_flags |= FAULT_FLAG_WRITE; }
- if (is_permission_fault(esr) && (addr < USER_DS)) { + if (is_permission_fault(esr) && (addr < TASK_SIZE)) { /* regs->orig_addr_limit may be 0 if we entered from EL0 */ if (regs->orig_addr_limit == KERNEL_DS) die("Accessing user space memory with fs=KERNEL_DS", regs, esr);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Robin Murphy robin.murphy@arm.com
commit 4d8efc2d5ee4c9ccfeb29ee8afd47a8660d0c0ce upstream.
Similarly to x86, mitigate speculation past an access_ok() check by masking the pointer against the address limit before use.
Even if we don't expect speculative writes per se, it is plausible that a CPU may still speculate at least as far as fetching a cache line for writing, hence we also harden put_user() and clear_user() for peace of mind.
Signed-off-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/uaccess.h | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-)
--- a/arch/arm64/include/asm/uaccess.h +++ b/arch/arm64/include/asm/uaccess.h @@ -129,6 +129,26 @@ static inline unsigned long __range_ok(u " .popsection\n"
/* + * Sanitise a uaccess pointer such that it becomes NULL if above the + * current addr_limit. + */ +#define uaccess_mask_ptr(ptr) (__typeof__(ptr))__uaccess_mask_ptr(ptr) +static inline void __user *__uaccess_mask_ptr(const void __user *ptr) +{ + void __user *safe_ptr; + + asm volatile( + " bics xzr, %1, %2\n" + " csel %0, %1, xzr, eq\n" + : "=&r" (safe_ptr) + : "r" (ptr), "r" (current_thread_info()->addr_limit) + : "cc"); + + csdb(); + return safe_ptr; +} + +/* * The "__xxx" versions of the user access functions do not verify the address * space - it must have been done previously with a separate "access_ok()" * call. @@ -202,7 +222,7 @@ do { \ __typeof__(*(ptr)) __user *__p = (ptr); \ might_fault(); \ access_ok(VERIFY_READ, __p, sizeof(*__p)) ? \ - __get_user((x), __p) : \ + __p = uaccess_mask_ptr(__p), __get_user((x), __p) : \ ((x) = 0, -EFAULT); \ })
@@ -270,7 +290,7 @@ do { \ __typeof__(*(ptr)) __user *__p = (ptr); \ might_fault(); \ access_ok(VERIFY_WRITE, __p, sizeof(*__p)) ? \ - __put_user((x), __p) : \ + __p = uaccess_mask_ptr(__p), __put_user((x), __p) : \ -EFAULT; \ })
@@ -331,7 +351,7 @@ static inline unsigned long __must_check static inline unsigned long __must_check clear_user(void __user *to, unsigned long n) { if (access_ok(VERIFY_WRITE, to, n)) - n = __clear_user(to, n); + n = __clear_user(__uaccess_mask_ptr(to), n); return n; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Will Deacon will.deacon@arm.com
commit 6314d90e64936c584f300a52ef173603fb2461b5 upstream.
In a similar manner to array_index_mask_nospec, this patch introduces an assembly macro (mask_nospec64) which can be used to bound a value under speculation. This macro is then used to ensure that the indirect branch through the syscall table is bounded under speculation, with out-of-range addresses speculating as calls to sys_io_setup (0).
Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com [v4.9: use existing scno & sc_nr definitions] Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/assembler.h | 11 +++++++++++ arch/arm64/kernel/entry.S | 1 + 2 files changed, 12 insertions(+)
--- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -94,6 +94,17 @@ .endm
/* + * Sanitise a 64-bit bounded index wrt speculation, returning zero if out + * of bounds. + */ + .macro mask_nospec64, idx, limit, tmp + sub \tmp, \idx, \limit + bic \tmp, \tmp, \idx + and \idx, \idx, \tmp, asr #63 + csdb + .endm + +/* * NOP sequence */ .macro nops, num --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -795,6 +795,7 @@ el0_svc_naked: // compat entry point b.ne __sys_trace cmp scno, sc_nr // check upper syscall limit b.hs ni_sys + mask_nospec64 scno, sc_nr, x19 // enforce bounds for syscall number ldr x16, [stbl, scno, lsl #3] // address in the syscall table blr x16 // call sys_* routine b ret_fast_syscall
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Will Deacon will.deacon@arm.com
commit c2f0ad4fc089cff81cef6a13d04b399980ecbfcc upstream.
A mispredicted conditional call to set_fs could result in the wrong addr_limit being forwarded under speculation to a subsequent access_ok check, potentially forming part of a spectre-v1 attack using uaccess routines.
This patch prevents this forwarding from taking place, but putting heavy barriers in set_fs after writing the addr_limit.
Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/uaccess.h | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/arch/arm64/include/asm/uaccess.h +++ b/arch/arm64/include/asm/uaccess.h @@ -68,6 +68,13 @@ static inline void set_fs(mm_segment_t f current_thread_info()->addr_limit = fs;
/* + * Prevent a mispredicted conditional call to set_fs from forwarding + * the wrong address limit to access_ok under speculation. + */ + dsb(nsh); + isb(); + + /* * Enable/disable UAO so that copy_to_user() etc can access * kernel memory with the unprivileged instructions. */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Will Deacon will.deacon@arm.com
commit 84624087dd7e3b482b7b11c170ebc1f329b3a218 upstream.
access_ok isn't an expensive operation once the addr_limit for the current thread has been loaded into the cache. Given that the initial access_ok check preceding a sequence of __{get,put}_user operations will take the brunt of the miss, we can make the __* variants identical to the full-fat versions, which brings with it the benefits of address masking.
The likely cost in these sequences will be from toggling PAN/UAO, which we can address later by implementing the *_unsafe versions.
Reviewed-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/uaccess.h | 62 ++++++++++++++++++++++----------------- 1 file changed, 36 insertions(+), 26 deletions(-)
--- a/arch/arm64/include/asm/uaccess.h +++ b/arch/arm64/include/asm/uaccess.h @@ -209,30 +209,35 @@ do { \ CONFIG_ARM64_PAN)); \ } while (0)
-#define __get_user(x, ptr) \ +#define __get_user_check(x, ptr, err) \ ({ \ - int __gu_err = 0; \ - __get_user_err((x), (ptr), __gu_err); \ - __gu_err; \ + __typeof__(*(ptr)) __user *__p = (ptr); \ + might_fault(); \ + if (access_ok(VERIFY_READ, __p, sizeof(*__p))) { \ + __p = uaccess_mask_ptr(__p); \ + __get_user_err((x), __p, (err)); \ + } else { \ + (x) = 0; (err) = -EFAULT; \ + } \ })
#define __get_user_error(x, ptr, err) \ ({ \ - __get_user_err((x), (ptr), (err)); \ + __get_user_check((x), (ptr), (err)); \ (void)0; \ })
-#define __get_user_unaligned __get_user - -#define get_user(x, ptr) \ +#define __get_user(x, ptr) \ ({ \ - __typeof__(*(ptr)) __user *__p = (ptr); \ - might_fault(); \ - access_ok(VERIFY_READ, __p, sizeof(*__p)) ? \ - __p = uaccess_mask_ptr(__p), __get_user((x), __p) : \ - ((x) = 0, -EFAULT); \ + int __gu_err = 0; \ + __get_user_check((x), (ptr), __gu_err); \ + __gu_err; \ })
+#define __get_user_unaligned __get_user + +#define get_user __get_user + #define __put_user_asm(instr, alt_instr, reg, x, addr, err, feature) \ asm volatile( \ "1:"ALTERNATIVE(instr " " reg "1, [%2]\n", \ @@ -277,30 +282,35 @@ do { \ CONFIG_ARM64_PAN)); \ } while (0)
-#define __put_user(x, ptr) \ +#define __put_user_check(x, ptr, err) \ ({ \ - int __pu_err = 0; \ - __put_user_err((x), (ptr), __pu_err); \ - __pu_err; \ + __typeof__(*(ptr)) __user *__p = (ptr); \ + might_fault(); \ + if (access_ok(VERIFY_WRITE, __p, sizeof(*__p))) { \ + __p = uaccess_mask_ptr(__p); \ + __put_user_err((x), __p, (err)); \ + } else { \ + (err) = -EFAULT; \ + } \ })
#define __put_user_error(x, ptr, err) \ ({ \ - __put_user_err((x), (ptr), (err)); \ + __put_user_check((x), (ptr), (err)); \ (void)0; \ })
-#define __put_user_unaligned __put_user - -#define put_user(x, ptr) \ +#define __put_user(x, ptr) \ ({ \ - __typeof__(*(ptr)) __user *__p = (ptr); \ - might_fault(); \ - access_ok(VERIFY_WRITE, __p, sizeof(*__p)) ? \ - __p = uaccess_mask_ptr(__p), __put_user((x), __p) : \ - -EFAULT; \ + int __pu_err = 0; \ + __put_user_check((x), (ptr), __pu_err); \ + __pu_err; \ })
+#define __put_user_unaligned __put_user + +#define put_user __put_user + extern unsigned long __must_check __arch_copy_from_user(void *to, const void __user *from, unsigned long n); extern unsigned long __must_check __arch_copy_to_user(void __user *to, const void *from, unsigned long n); extern unsigned long __must_check __copy_in_user(void __user *to, const void __user *from, unsigned long n);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Will Deacon will.deacon@arm.com
commit f71c2ffcb20dd8626880747557014bb9a61eb90e upstream.
Like we've done for get_user and put_user, ensure that user pointers are masked before invoking the underlying __arch_{clear,copy_*}_user operations.
Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com [v4.9: fixup for v4.9-style uaccess primitives] Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/uaccess.h | 18 ++++++++++-------- arch/arm64/kernel/arm64ksyms.c | 4 ++-- arch/arm64/lib/clear_user.S | 6 +++--- arch/arm64/lib/copy_in_user.S | 4 ++-- 4 files changed, 17 insertions(+), 15 deletions(-)
--- a/arch/arm64/include/asm/uaccess.h +++ b/arch/arm64/include/asm/uaccess.h @@ -313,21 +313,20 @@ do { \
extern unsigned long __must_check __arch_copy_from_user(void *to, const void __user *from, unsigned long n); extern unsigned long __must_check __arch_copy_to_user(void __user *to, const void *from, unsigned long n); -extern unsigned long __must_check __copy_in_user(void __user *to, const void __user *from, unsigned long n); -extern unsigned long __must_check __clear_user(void __user *addr, unsigned long n); +extern unsigned long __must_check __arch_copy_in_user(void __user *to, const void __user *from, unsigned long n);
static inline unsigned long __must_check __copy_from_user(void *to, const void __user *from, unsigned long n) { kasan_check_write(to, n); check_object_size(to, n, false); - return __arch_copy_from_user(to, from, n); + return __arch_copy_from_user(to, __uaccess_mask_ptr(from), n); }
static inline unsigned long __must_check __copy_to_user(void __user *to, const void *from, unsigned long n) { kasan_check_read(from, n); check_object_size(from, n, true); - return __arch_copy_to_user(to, from, n); + return __arch_copy_to_user(__uaccess_mask_ptr(to), from, n); }
static inline unsigned long __must_check copy_from_user(void *to, const void __user *from, unsigned long n) @@ -355,22 +354,25 @@ static inline unsigned long __must_check return n; }
-static inline unsigned long __must_check copy_in_user(void __user *to, const void __user *from, unsigned long n) +static inline unsigned long __must_check __copy_in_user(void __user *to, const void __user *from, unsigned long n) { if (access_ok(VERIFY_READ, from, n) && access_ok(VERIFY_WRITE, to, n)) - n = __copy_in_user(to, from, n); + n = __arch_copy_in_user(__uaccess_mask_ptr(to), __uaccess_mask_ptr(from), n); return n; } +#define copy_in_user __copy_in_user
#define __copy_to_user_inatomic __copy_to_user #define __copy_from_user_inatomic __copy_from_user
-static inline unsigned long __must_check clear_user(void __user *to, unsigned long n) +extern unsigned long __must_check __arch_clear_user(void __user *to, unsigned long n); +static inline unsigned long __must_check __clear_user(void __user *to, unsigned long n) { if (access_ok(VERIFY_WRITE, to, n)) - n = __clear_user(__uaccess_mask_ptr(to), n); + n = __arch_clear_user(__uaccess_mask_ptr(to), n); return n; } +#define clear_user __clear_user
extern long strncpy_from_user(char *dest, const char __user *src, long count);
--- a/arch/arm64/kernel/arm64ksyms.c +++ b/arch/arm64/kernel/arm64ksyms.c @@ -37,8 +37,8 @@ EXPORT_SYMBOL(clear_page); /* user mem (segment) */ EXPORT_SYMBOL(__arch_copy_from_user); EXPORT_SYMBOL(__arch_copy_to_user); -EXPORT_SYMBOL(__clear_user); -EXPORT_SYMBOL(__copy_in_user); +EXPORT_SYMBOL(__arch_clear_user); +EXPORT_SYMBOL(__arch_copy_in_user);
/* physical memory */ EXPORT_SYMBOL(memstart_addr); --- a/arch/arm64/lib/clear_user.S +++ b/arch/arm64/lib/clear_user.S @@ -24,7 +24,7 @@
.text
-/* Prototype: int __clear_user(void *addr, size_t sz) +/* Prototype: int __arch_clear_user(void *addr, size_t sz) * Purpose : clear some user memory * Params : addr - user memory address to clear * : sz - number of bytes to clear @@ -32,7 +32,7 @@ * * Alignment fixed up by hardware. */ -ENTRY(__clear_user) +ENTRY(__arch_clear_user) ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(0)), ARM64_ALT_PAN_NOT_UAO, \ CONFIG_ARM64_PAN) mov x2, x1 // save the size for fixup return @@ -57,7 +57,7 @@ uao_user_alternative 9f, strb, sttrb, wz ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(1)), ARM64_ALT_PAN_NOT_UAO, \ CONFIG_ARM64_PAN) ret -ENDPROC(__clear_user) +ENDPROC(__arch_clear_user)
.section .fixup,"ax" .align 2 --- a/arch/arm64/lib/copy_in_user.S +++ b/arch/arm64/lib/copy_in_user.S @@ -67,7 +67,7 @@ .endm
end .req x5 -ENTRY(__copy_in_user) +ENTRY(__arch_copy_in_user) ALTERNATIVE("nop", __stringify(SET_PSTATE_PAN(0)), ARM64_ALT_PAN_NOT_UAO, \ CONFIG_ARM64_PAN) add end, x0, x2 @@ -76,7 +76,7 @@ ALTERNATIVE("nop", __stringify(SET_PSTAT CONFIG_ARM64_PAN) mov x0, #0 ret -ENDPROC(__copy_in_user) +ENDPROC(__arch_copy_in_user)
.section .fixup,"ax" .align 2
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: James Morse james.morse@arm.com
commit edf298cfce47ab7279d03b5203ae2ef3a58e49db upstream.
this_cpu_has_cap() tests caps->desc not caps->matches, so it stops walking the list when it finds a 'silent' feature, instead of walking to the end of the list.
Prior to v4.6's 644c2ae198412 ("arm64: cpufeature: Test 'matches' pointer to find the end of the list") we always tested desc to find the end of a capability list. This was changed for dubious things like PAN_NOT_UAO. v4.7's e3661b128e53e ("arm64: Allow a capability to be checked on single CPU") added this_cpu_has_cap() using the old desc style test.
CC: Suzuki K Poulose suzuki.poulose@arm.com Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Acked-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: James Morse james.morse@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpufeature.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1024,9 +1024,8 @@ static bool __this_cpu_has_cap(const str if (WARN_ON(preemptible())) return false;
- for (caps = cap_array; caps->desc; caps++) + for (caps = cap_array; caps->matches; caps++) if (caps->capability == cap && - caps->matches && caps->matches(caps, SCOPE_LOCAL_CPU)) return true; return false;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Suzuki K Poulose suzuki.poulose@arm.com
commit 55b35d070c2534dfb714b883f3c3ae05d02032da upstream.
When a CPU is brought up after we have finalised the system wide capabilities (i.e, features and errata), we make sure the new CPU doesn't need a new errata work around which has not been detected already. However we don't run enable() method on the new CPU for the errata work arounds already detected. This could cause the new CPU running without potential work arounds. It is upto the "enable()" method to decide if this CPU should do something about the errata.
Fixes: commit 6a6efbb45b7d95c84 ("arm64: Verify CPU errata work arounds on hotplugged CPU") Cc: Will Deacon will.deacon@arm.com Cc: Mark Rutland mark.rutland@arm.com Cc: Andre Przywara andre.przywara@arm.com Cc: Dave Martin dave.martin@arm.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpu_errata.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -143,15 +143,18 @@ void verify_local_cpu_errata_workarounds { const struct arm64_cpu_capabilities *caps = arm64_errata;
- for (; caps->matches; caps++) - if (!cpus_have_cap(caps->capability) && - caps->matches(caps, SCOPE_LOCAL_CPU)) { + for (; caps->matches; caps++) { + if (cpus_have_cap(caps->capability)) { + if (caps->enable) + caps->enable((void *)caps); + } else if (caps->matches(caps, SCOPE_LOCAL_CPU)) { pr_crit("CPU%d: Requires work around for %s, not detected" " at boot time\n", smp_processor_id(), caps->desc ? : "an erratum"); cpu_die_early(); } + } }
void update_cpu_errata_workarounds(void)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Will Deacon will.deacon@arm.com
commit 0a0d111d40fd1dc588cc590fab6b55d86ddc71d3 upstream.
In order to invoke the CPU capability ->matches callback from the ->enable callback for applying local-CPU workarounds, we need a handle on the capability structure.
This patch passes a pointer to the capability structure to the ->enable callback.
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpufeature.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1058,7 +1058,7 @@ void __init enable_cpu_capabilities(cons * uses an IPI, giving us a PSTATE that disappears when * we return. */ - stop_machine(caps->enable, NULL, cpu_online_mask); + stop_machine(caps->enable, (void *)caps, cpu_online_mask); }
/* @@ -1115,7 +1115,7 @@ verify_local_cpu_features(const struct a cpu_die_early(); } if (caps->enable) - caps->enable(NULL); + caps->enable((void *)caps); } }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Will Deacon will.deacon@arm.com
commit d68e3ba5303f7e1099f51fdcd155f5263da8569b upstream.
Entry into recent versions of ARM Trusted Firmware will invalidate the CPU branch predictor state in order to protect against aliasing attacks.
This patch exposes the PSCI "VERSION" function via psci_ops, so that it can be invoked outside of the PSCI driver where necessary.
Acked-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/psci.c | 2 ++ include/linux/psci.h | 1 + 2 files changed, 3 insertions(+)
--- a/drivers/firmware/psci.c +++ b/drivers/firmware/psci.c @@ -496,6 +496,8 @@ static void __init psci_init_migrate(voi static void __init psci_0_2_set_functions(void) { pr_info("Using standard PSCI v0.2 function IDs\n"); + psci_ops.get_version = psci_get_version; + psci_function_id[PSCI_FN_CPU_SUSPEND] = PSCI_FN_NATIVE(0_2, CPU_SUSPEND); psci_ops.cpu_suspend = psci_cpu_suspend; --- a/include/linux/psci.h +++ b/include/linux/psci.h @@ -26,6 +26,7 @@ int psci_cpu_init_idle(unsigned int cpu) int psci_cpu_suspend_enter(unsigned long index);
struct psci_operations { + u32 (*get_version)(void); int (*cpu_suspend)(u32 state, unsigned long entry_point); int (*cpu_off)(u32 state); int (*cpu_on)(unsigned long cpuid, unsigned long entry_point);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Catalin Marinas catalin.marinas@arm.com
commit f33bcf03e6079668da6bf4eec4a7dcf9289131d0 upstream.
This patch takes the errata workaround code out of cpu_do_switch_mm into a dedicated post_ttbr0_update_workaround macro which will be reused in a subsequent patch.
Cc: Will Deacon will.deacon@arm.com Cc: James Morse james.morse@arm.com Cc: Kees Cook keescook@chromium.org Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/assembler.h | 14 ++++++++++++++ arch/arm64/mm/proc.S | 6 +----- 2 files changed, 15 insertions(+), 5 deletions(-)
--- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -434,4 +434,18 @@ alternative_endif .macro pte_to_phys, phys, pte and \phys, \pte, #(((1 << (48 - PAGE_SHIFT)) - 1) << PAGE_SHIFT) .endm + +/* + * Errata workaround post TTBR0_EL1 update. + */ + .macro post_ttbr0_update_workaround +#ifdef CONFIG_CAVIUM_ERRATUM_27456 +alternative_if ARM64_WORKAROUND_CAVIUM_27456 + ic iallu + dsb nsh + isb +alternative_else_nop_endif +#endif + .endm + #endif /* __ASM_ASSEMBLER_H */ --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -139,11 +139,7 @@ ENTRY(cpu_do_switch_mm) isb msr ttbr0_el1, x0 // now update TTBR0 isb -alternative_if ARM64_WORKAROUND_CAVIUM_27456 - ic iallu - dsb nsh - isb -alternative_else_nop_endif + post_ttbr0_update_workaround ret ENDPROC(cpu_do_switch_mm)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit 95e3de3590e3f2358bb13f013911bc1bfa5d3f53 upstream.
We will soon need to invoke a CPU-specific function pointer after changing page tables, so move post_ttbr_update_workaround out into C code to make this possible.
Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/assembler.h | 13 ------------- arch/arm64/mm/context.c | 9 +++++++++ arch/arm64/mm/proc.S | 3 +-- 3 files changed, 10 insertions(+), 15 deletions(-)
--- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -435,17 +435,4 @@ alternative_endif and \phys, \pte, #(((1 << (48 - PAGE_SHIFT)) - 1) << PAGE_SHIFT) .endm
-/* - * Errata workaround post TTBR0_EL1 update. - */ - .macro post_ttbr0_update_workaround -#ifdef CONFIG_CAVIUM_ERRATUM_27456 -alternative_if ARM64_WORKAROUND_CAVIUM_27456 - ic iallu - dsb nsh - isb -alternative_else_nop_endif -#endif - .endm - #endif /* __ASM_ASSEMBLER_H */ --- a/arch/arm64/mm/context.c +++ b/arch/arm64/mm/context.c @@ -233,6 +233,15 @@ switch_mm_fastpath: cpu_switch_mm(mm->pgd, mm); }
+/* Errata workaround post TTBRx_EL1 update. */ +asmlinkage void post_ttbr_update_workaround(void) +{ + asm(ALTERNATIVE("nop; nop; nop", + "ic iallu; dsb nsh; isb", + ARM64_WORKAROUND_CAVIUM_27456, + CONFIG_CAVIUM_ERRATUM_27456)); +} + static int asids_init(void) { asid_bits = get_cpu_asid_bits(); --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -139,8 +139,7 @@ ENTRY(cpu_do_switch_mm) isb msr ttbr0_el1, x0 // now update TTBR0 isb - post_ttbr0_update_workaround - ret + b post_ttbr_update_workaround // Back to C code... ENDPROC(cpu_do_switch_mm)
.pushsection ".idmap.text", "awx"
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Will Deacon will.deacon@arm.com
commit 0f15adbb2861ce6f75ccfc5a92b19eae0ef327d0 upstream.
Aliasing attacks against CPU branch predictors can allow an attacker to redirect speculative control flow on some CPUs and potentially divulge information from one context to another.
This patch adds initial skeleton code behind a new Kconfig option to enable implementation-specific mitigations against these attacks for CPUs that are affected.
Co-developed-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com [v4.9: copy bp hardening cb via text mapping] Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/Kconfig | 17 ++++++++ arch/arm64/include/asm/cpucaps.h | 3 + arch/arm64/include/asm/mmu.h | 39 ++++++++++++++++++++ arch/arm64/include/asm/sysreg.h | 2 + arch/arm64/kernel/Makefile | 4 ++ arch/arm64/kernel/bpi.S | 55 ++++++++++++++++++++++++++++ arch/arm64/kernel/cpu_errata.c | 74 +++++++++++++++++++++++++++++++++++++++ arch/arm64/kernel/cpufeature.c | 3 + arch/arm64/kernel/entry.S | 8 ++-- arch/arm64/mm/context.c | 2 + arch/arm64/mm/fault.c | 17 ++++++++ 11 files changed, 219 insertions(+), 5 deletions(-) create mode 100644 arch/arm64/kernel/bpi.S
--- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -745,6 +745,23 @@ config UNMAP_KERNEL_AT_EL0
If unsure, say Y.
+config HARDEN_BRANCH_PREDICTOR + bool "Harden the branch predictor against aliasing attacks" if EXPERT + default y + help + Speculation attacks against some high-performance processors rely on + being able to manipulate the branch predictor for a victim context by + executing aliasing branches in the attacker context. Such attacks + can be partially mitigated against by clearing internal branch + predictor state and limiting the prediction logic in some situations. + + This config option will take CPU-specific actions to harden the + branch predictor against aliasing attacks and may rely on specific + instruction sequences or control bits being set by the system + firmware. + + If unsure, say Y. + menuconfig ARMV8_DEPRECATED bool "Emulate deprecated/obsolete ARMv8 instructions" depends on COMPAT --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -35,7 +35,8 @@ #define ARM64_HYP_OFFSET_LOW 14 #define ARM64_MISMATCHED_CACHE_LINE_SIZE 15 #define ARM64_UNMAP_KERNEL_AT_EL0 16 +#define ARM64_HARDEN_BRANCH_PREDICTOR 17
-#define ARM64_NCAPS 17 +#define ARM64_NCAPS 18
#endif /* __ASM_CPUCAPS_H */ --- a/arch/arm64/include/asm/mmu.h +++ b/arch/arm64/include/asm/mmu.h @@ -20,6 +20,8 @@
#ifndef __ASSEMBLY__
+#include <linux/percpu.h> + typedef struct { atomic64_t id; void *vdso; @@ -38,6 +40,43 @@ static inline bool arm64_kernel_unmapped cpus_have_cap(ARM64_UNMAP_KERNEL_AT_EL0); }
+typedef void (*bp_hardening_cb_t)(void); + +struct bp_hardening_data { + int hyp_vectors_slot; + bp_hardening_cb_t fn; +}; + +#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR +extern char __bp_harden_hyp_vecs_start[], __bp_harden_hyp_vecs_end[]; + +DECLARE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data); + +static inline struct bp_hardening_data *arm64_get_bp_hardening_data(void) +{ + return this_cpu_ptr(&bp_hardening_data); +} + +static inline void arm64_apply_bp_hardening(void) +{ + struct bp_hardening_data *d; + + if (!cpus_have_cap(ARM64_HARDEN_BRANCH_PREDICTOR)) + return; + + d = arm64_get_bp_hardening_data(); + if (d->fn) + d->fn(); +} +#else +static inline struct bp_hardening_data *arm64_get_bp_hardening_data(void) +{ + return NULL; +} + +static inline void arm64_apply_bp_hardening(void) { } +#endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */ + extern void paging_init(void); extern void bootmem_init(void); extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt); --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -118,6 +118,8 @@
/* id_aa64pfr0 */ #define ID_AA64PFR0_CSV3_SHIFT 60 +#define ID_AA64PFR0_CSV2_SHIFT 56 +#define ID_AA64PFR0_SVE_SHIFT 32 #define ID_AA64PFR0_GIC_SHIFT 24 #define ID_AA64PFR0_ASIMD_SHIFT 20 #define ID_AA64PFR0_FP_SHIFT 16 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -51,6 +51,10 @@ arm64-obj-$(CONFIG_HIBERNATION) += hibe arm64-obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o \ cpu-reset.o
+ifeq ($(CONFIG_KVM),y) +arm64-obj-$(CONFIG_HARDEN_BRANCH_PREDICTOR) += bpi.o +endif + obj-y += $(arm64-obj-y) vdso/ probes/ obj-m += $(arm64-obj-m) head-y := head.o --- /dev/null +++ b/arch/arm64/kernel/bpi.S @@ -0,0 +1,55 @@ +/* + * Contains CPU specific branch predictor invalidation sequences + * + * Copyright (C) 2018 ARM Ltd. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#include <linux/linkage.h> + +.macro ventry target + .rept 31 + nop + .endr + b \target +.endm + +.macro vectors target + ventry \target + 0x000 + ventry \target + 0x080 + ventry \target + 0x100 + ventry \target + 0x180 + + ventry \target + 0x200 + ventry \target + 0x280 + ventry \target + 0x300 + ventry \target + 0x380 + + ventry \target + 0x400 + ventry \target + 0x480 + ventry \target + 0x500 + ventry \target + 0x580 + + ventry \target + 0x600 + ventry \target + 0x680 + ventry \target + 0x700 + ventry \target + 0x780 +.endm + + .align 11 +ENTRY(__bp_harden_hyp_vecs_start) + .rept 4 + vectors __kvm_hyp_vector + .endr +ENTRY(__bp_harden_hyp_vecs_end) --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -46,6 +46,80 @@ static int cpu_enable_trap_ctr_access(vo return 0; }
+#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR +#include <asm/mmu_context.h> +#include <asm/cacheflush.h> + +DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data); + +#ifdef CONFIG_KVM +static void __copy_hyp_vect_bpi(int slot, const char *hyp_vecs_start, + const char *hyp_vecs_end) +{ + void *dst = __bp_harden_hyp_vecs_start + slot * SZ_2K; + int i; + + for (i = 0; i < SZ_2K; i += 0x80) + memcpy(dst + i, hyp_vecs_start, hyp_vecs_end - hyp_vecs_start); + + flush_icache_range((uintptr_t)dst, (uintptr_t)dst + SZ_2K); +} + +static void __install_bp_hardening_cb(bp_hardening_cb_t fn, + const char *hyp_vecs_start, + const char *hyp_vecs_end) +{ + static int last_slot = -1; + static DEFINE_SPINLOCK(bp_lock); + int cpu, slot = -1; + + spin_lock(&bp_lock); + for_each_possible_cpu(cpu) { + if (per_cpu(bp_hardening_data.fn, cpu) == fn) { + slot = per_cpu(bp_hardening_data.hyp_vectors_slot, cpu); + break; + } + } + + if (slot == -1) { + last_slot++; + BUG_ON(((__bp_harden_hyp_vecs_end - __bp_harden_hyp_vecs_start) + / SZ_2K) <= last_slot); + slot = last_slot; + __copy_hyp_vect_bpi(slot, hyp_vecs_start, hyp_vecs_end); + } + + __this_cpu_write(bp_hardening_data.hyp_vectors_slot, slot); + __this_cpu_write(bp_hardening_data.fn, fn); + spin_unlock(&bp_lock); +} +#else +static void __install_bp_hardening_cb(bp_hardening_cb_t fn, + const char *hyp_vecs_start, + const char *hyp_vecs_end) +{ + __this_cpu_write(bp_hardening_data.fn, fn); +} +#endif /* CONFIG_KVM */ + +static void install_bp_hardening_cb(const struct arm64_cpu_capabilities *entry, + bp_hardening_cb_t fn, + const char *hyp_vecs_start, + const char *hyp_vecs_end) +{ + u64 pfr0; + + if (!entry->matches(entry, SCOPE_LOCAL_CPU)) + return; + + pfr0 = read_cpuid(ID_AA64PFR0_EL1); + if (cpuid_feature_extract_unsigned_field(pfr0, ID_AA64PFR0_CSV2_SHIFT)) + return; + + __install_bp_hardening_cb(fn, hyp_vecs_start, hyp_vecs_end); +} +#endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */ + #define MIDR_RANGE(model, min, max) \ .def_scope = SCOPE_LOCAL_CPU, \ .matches = is_affected_midr_range, \ --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -94,7 +94,8 @@ static const struct arm64_ftr_bits ftr_i
static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = { ARM64_FTR_BITS(FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV3_SHIFT, 4, 0), - ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 32, 28, 0), + ARM64_FTR_BITS(FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV2_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 32, 24, 0), ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, 28, 4, 0), ARM64_FTR_BITS(FTR_STRICT, FTR_EXACT, ID_AA64PFR0_GIC_SHIFT, 4, 0), S_ARM64_FTR_BITS(FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_ASIMD_SHIFT, 4, ID_AA64PFR0_ASIMD_NI), --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -589,13 +589,15 @@ el0_ia: * Instruction abort handling */ mrs x26, far_el1 - // enable interrupts before calling the main handler - enable_dbg_and_irq + msr daifclr, #(8 | 4 | 1) +#ifdef CONFIG_TRACE_IRQFLAGS + bl trace_hardirqs_off +#endif ct_user_exit mov x0, x26 mov x1, x25 mov x2, sp - bl do_mem_abort + bl do_el0_ia_bp_hardening b ret_to_user el0_fpsimd_acc: /* --- a/arch/arm64/mm/context.c +++ b/arch/arm64/mm/context.c @@ -240,6 +240,8 @@ asmlinkage void post_ttbr_update_workaro "ic iallu; dsb nsh; isb", ARM64_WORKAROUND_CAVIUM_27456, CONFIG_CAVIUM_ERRATUM_27456)); + + arm64_apply_bp_hardening(); }
static int asids_init(void) --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -590,6 +590,23 @@ asmlinkage void __exception do_mem_abort arm64_notify_die("", regs, &info, esr); }
+asmlinkage void __exception do_el0_ia_bp_hardening(unsigned long addr, + unsigned int esr, + struct pt_regs *regs) +{ + /* + * We've taken an instruction abort from userspace and not yet + * re-enabled IRQs. If the address is a kernel address, apply + * BP hardening prior to enabling IRQs and pre-emption. + */ + if (addr > TASK_SIZE) + arm64_apply_bp_hardening(); + + local_irq_enable(); + do_mem_abort(addr, esr, regs); +} + + /* * Handle stack alignment exceptions. */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit a8e4c0a919ae310944ed2c9ace11cf3ccd8a609b upstream.
We call arm64_apply_bp_hardening() from post_ttbr_update_workaround, which has the unexpected consequence of being triggered on every exception return to userspace when ARM64_SW_TTBR0_PAN is selected, even if no context switch actually occured.
This is a bit suboptimal, and it would be more logical to only invalidate the branch predictor when we actually switch to a different mm.
In order to solve this, move the call to arm64_apply_bp_hardening() into check_and_switch_context(), where we're guaranteed to pick a different mm context.
Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/mm/context.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/arch/arm64/mm/context.c +++ b/arch/arm64/mm/context.c @@ -230,6 +230,9 @@ void check_and_switch_context(struct mm_ raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
switch_mm_fastpath: + + arm64_apply_bp_hardening(); + cpu_switch_mm(mm->pgd, mm); }
@@ -240,8 +243,6 @@ asmlinkage void post_ttbr_update_workaro "ic iallu; dsb nsh; isb", ARM64_WORKAROUND_CAVIUM_27456, CONFIG_CAVIUM_ERRATUM_27456)); - - arm64_apply_bp_hardening(); }
static int asids_init(void)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Laura Abbott labbott@redhat.com
commit 568c5fe5a54f2654f5a4c599c45b8a62ed9a2013 upstream.
Certain architectures may have the kernel image mapped separately to alias the linear map. Introduce a macro lm_alias to translate a kernel image symbol into its linear alias. This is used in part with work to add CONFIG_DEBUG_VIRTUAL support for arm64.
Reviewed-by: Mark Rutland mark.rutland@arm.com Tested-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Laura Abbott labbott@redhat.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/mm.h | 4 ++++ 1 file changed, 4 insertions(+)
--- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -76,6 +76,10 @@ extern int mmap_rnd_compat_bits __read_m #define page_to_virt(x) __va(PFN_PHYS(page_to_pfn(x))) #endif
+#ifndef lm_alias +#define lm_alias(x) __va(__pa_symbol(x)) +#endif + /* * To prevent common memory management code establishing * a zero page mapping on a read fault.
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit 6840bdd73d07216ab4bc46f5a8768c37ea519038 upstream.
Now that we have per-CPU vectors, let's plug then in the KVM/arm64 code.
Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com [v4.9: account for files moved to virt/ upstream, use cpus_have_cap()] Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/include/asm/kvm_mmu.h | 10 ++++++++++ arch/arm/kvm/arm.c | 9 ++++++++- arch/arm64/include/asm/kvm_mmu.h | 38 ++++++++++++++++++++++++++++++++++++++ arch/arm64/kvm/hyp/switch.c | 2 +- 4 files changed, 57 insertions(+), 2 deletions(-)
--- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -223,6 +223,16 @@ static inline unsigned int kvm_get_vmid_ return 8; }
+static inline void *kvm_get_hyp_vector(void) +{ + return kvm_ksym_ref(__kvm_hyp_vector); +} + +static inline int kvm_map_vectors(void) +{ + return 0; +} + #endif /* !__ASSEMBLY__ */
#endif /* __ARM_KVM_MMU_H__ */ --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -1088,7 +1088,7 @@ static void cpu_init_hyp_mode(void *dumm pgd_ptr = kvm_mmu_get_httbr(); stack_page = __this_cpu_read(kvm_arm_hyp_stack_page); hyp_stack_ptr = stack_page + PAGE_SIZE; - vector_ptr = (unsigned long)kvm_ksym_ref(__kvm_hyp_vector); + vector_ptr = (unsigned long)kvm_get_hyp_vector();
__cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr); __cpu_init_stage2(); @@ -1345,6 +1345,13 @@ static int init_hyp_mode(void) goto out_err; }
+ + err = kvm_map_vectors(); + if (err) { + kvm_err("Cannot map vectors\n"); + goto out_err; + } + /* * Map the Hyp stack pages */ --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -313,5 +313,43 @@ static inline unsigned int kvm_get_vmid_ return (cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR1_VMIDBITS_SHIFT) == 2) ? 16 : 8; }
+#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR +#include <asm/mmu.h> + +static inline void *kvm_get_hyp_vector(void) +{ + struct bp_hardening_data *data = arm64_get_bp_hardening_data(); + void *vect = kvm_ksym_ref(__kvm_hyp_vector); + + if (data->fn) { + vect = __bp_harden_hyp_vecs_start + + data->hyp_vectors_slot * SZ_2K; + + if (!cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN)) + vect = lm_alias(vect); + } + + return vect; +} + +static inline int kvm_map_vectors(void) +{ + return create_hyp_mappings(kvm_ksym_ref(__bp_harden_hyp_vecs_start), + kvm_ksym_ref(__bp_harden_hyp_vecs_end), + PAGE_HYP_EXEC); +} + +#else +static inline void *kvm_get_hyp_vector(void) +{ + return kvm_ksym_ref(__kvm_hyp_vector); +} + +static inline int kvm_map_vectors(void) +{ + return 0; +} +#endif + #endif /* __ASSEMBLY__ */ #endif /* __ARM64_KVM_MMU_H__ */ --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -50,7 +50,7 @@ static void __hyp_text __activate_traps_ val &= ~CPACR_EL1_FPEN; write_sysreg(val, cpacr_el1);
- write_sysreg(__kvm_hyp_vector, vbar_el1); + write_sysreg(kvm_get_hyp_vector(), vbar_el1); }
static void __hyp_text __activate_traps_nvhe(void)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Will Deacon will.deacon@arm.com
commit 5dfc6ed27710c42cbc15db5c0d4475699991da0a upstream.
Software-step and PC alignment fault exceptions have higher priority than instruction abort exceptions, so apply the BP hardening hooks there too if the user PC appears to reside in kernel space.
Reported-by: Dan Hettena dhettena@nvidia.com Reviewed-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/entry.S | 6 ++++-- arch/arm64/mm/fault.c | 9 +++++++++ 2 files changed, 13 insertions(+), 2 deletions(-)
--- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -624,8 +624,10 @@ el0_sp_pc: * Stack or PC alignment exception handling */ mrs x26, far_el1 - // enable interrupts before calling the main handler - enable_dbg_and_irq + enable_dbg +#ifdef CONFIG_TRACE_IRQFLAGS + bl trace_hardirqs_off +#endif ct_user_exit mov x0, x26 mov x1, x25 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -617,6 +617,12 @@ asmlinkage void __exception do_sp_pc_abo struct siginfo info; struct task_struct *tsk = current;
+ if (user_mode(regs)) { + if (instruction_pointer(regs) > TASK_SIZE) + arm64_apply_bp_hardening(); + local_irq_enable(); + } + if (show_unhandled_signals && unhandled_signal(tsk, SIGBUS)) pr_info_ratelimited("%s[%d]: %s exception: pc=%p sp=%p\n", tsk->comm, task_pid_nr(tsk), @@ -676,6 +682,9 @@ asmlinkage int __exception do_debug_exce if (interrupts_enabled(regs)) trace_hardirqs_off();
+ if (user_mode(regs) && instruction_pointer(regs) > TASK_SIZE) + arm64_apply_bp_hardening(); + if (!inf->fn(addr, esr, regs)) { rv = 1; } else {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Will Deacon will.deacon@arm.com
commit 30d88c0e3ace625a92eead9ca0ad94093a8f59fe upstream.
It is possible to take an IRQ from EL0 following a branch to a kernel address in such a way that the IRQ is prioritised over the instruction abort. Whilst an attacker would need to get the stars to align here, it might be sufficient with enough calibration so perform BP hardening in the rare case that we see a kernel address in the ELR when handling an IRQ from EL0.
Reported-by: Dan Hettena dhettena@nvidia.com Reviewed-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/entry.S | 5 +++++ arch/arm64/mm/fault.c | 6 ++++++ 2 files changed, 11 insertions(+)
--- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -686,6 +686,11 @@ el0_irq_naked: #endif
ct_user_exit +#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR + tbz x22, #55, 1f + bl do_el0_irq_bp_hardening +1: +#endif irq_handler
#ifdef CONFIG_TRACE_IRQFLAGS --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -590,6 +590,12 @@ asmlinkage void __exception do_mem_abort arm64_notify_die("", regs, &info, esr); }
+asmlinkage void __exception do_el0_irq_bp_hardening(void) +{ + /* PC has already been checked in entry.S */ + arm64_apply_bp_hardening(); +} + asmlinkage void __exception do_el0_ia_bp_hardening(unsigned long addr, unsigned int esr, struct pt_regs *regs)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Will Deacon will.deacon@arm.com
commit a65d219fe5dc7887fd5ca04c2ac3e9a34feb8dfc upstream.
Hook up MIDR values for the Cortex-A72 and Cortex-A75 CPUs, since they will soon need MIDR matches for hardening the branch predictor.
Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/cputype.h | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -75,7 +75,10 @@ #define ARM_CPU_PART_AEM_V8 0xD0F #define ARM_CPU_PART_FOUNDATION 0xD00 #define ARM_CPU_PART_CORTEX_A57 0xD07 +#define ARM_CPU_PART_CORTEX_A72 0xD08 #define ARM_CPU_PART_CORTEX_A53 0xD03 +#define ARM_CPU_PART_CORTEX_A73 0xD09 +#define ARM_CPU_PART_CORTEX_A75 0xD0A
#define APM_CPU_PART_POTENZA 0x000
@@ -87,6 +90,9 @@
#define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53) #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57) +#define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72) +#define MIDR_CORTEX_A73 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A73) +#define MIDR_CORTEX_A75 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A75) #define MIDR_THUNDERX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX) #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX) #define MIDR_CAVIUM_THUNDERX2 MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX2)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit 06f1494f837da8997d670a1ba87add7963b08922 upstream.
Some minor erratum may not be fixed in further revisions of a core, leading to a situation where the workaround needs to be updated each time an updated core is released.
Introduce a MIDR_ALL_VERSIONS match helper that will work for all versions of that MIDR, once and for all.
Acked-by: Thomas Gleixner tglx@linutronix.de Acked-by: Mark Rutland mark.rutland@arm.com Acked-by: Daniel Lezcano daniel.lezcano@linaro.org Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpu_errata.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -127,6 +127,13 @@ static void install_bp_hardening_cb(con .midr_range_min = min, \ .midr_range_max = max
+#define MIDR_ALL_VERSIONS(model) \ + .def_scope = SCOPE_LOCAL_CPU, \ + .matches = is_affected_midr_range, \ + .midr_model = model, \ + .midr_range_min = 0, \ + .midr_range_max = (MIDR_VARIANT_MASK | MIDR_REVISION_MASK) + const struct arm64_cpu_capabilities arm64_errata[] = { #if defined(CONFIG_ARM64_ERRATUM_826319) || \ defined(CONFIG_ARM64_ERRATUM_827319) || \
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Will Deacon will.deacon@arm.com
commit aa6acde65e03186b5add8151e1ffe36c3c62639b upstream.
Cortex-A57, A72, A73 and A75 are susceptible to branch predictor aliasing and can theoretically be attacked by malicious code.
This patch implements a PSCI-based mitigation for these CPUs when available. The call into firmware will invalidate the branch predictor state, preventing any malicious entries from affecting other victim contexts.
Co-developed-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/bpi.S | 24 +++++++++++++++++++++++ arch/arm64/kernel/cpu_errata.c | 42 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 66 insertions(+)
--- a/arch/arm64/kernel/bpi.S +++ b/arch/arm64/kernel/bpi.S @@ -53,3 +53,27 @@ ENTRY(__bp_harden_hyp_vecs_start) vectors __kvm_hyp_vector .endr ENTRY(__bp_harden_hyp_vecs_end) +ENTRY(__psci_hyp_bp_inval_start) + sub sp, sp, #(8 * 18) + stp x16, x17, [sp, #(16 * 0)] + stp x14, x15, [sp, #(16 * 1)] + stp x12, x13, [sp, #(16 * 2)] + stp x10, x11, [sp, #(16 * 3)] + stp x8, x9, [sp, #(16 * 4)] + stp x6, x7, [sp, #(16 * 5)] + stp x4, x5, [sp, #(16 * 6)] + stp x2, x3, [sp, #(16 * 7)] + stp x0, x1, [sp, #(16 * 8)] + mov x0, #0x84000000 + smc #0 + ldp x16, x17, [sp, #(16 * 0)] + ldp x14, x15, [sp, #(16 * 1)] + ldp x12, x13, [sp, #(16 * 2)] + ldp x10, x11, [sp, #(16 * 3)] + ldp x8, x9, [sp, #(16 * 4)] + ldp x6, x7, [sp, #(16 * 5)] + ldp x4, x5, [sp, #(16 * 6)] + ldp x2, x3, [sp, #(16 * 7)] + ldp x0, x1, [sp, #(16 * 8)] + add sp, sp, #(8 * 18) +ENTRY(__psci_hyp_bp_inval_end) --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -53,6 +53,8 @@ static int cpu_enable_trap_ctr_access(vo DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
#ifdef CONFIG_KVM +extern char __psci_hyp_bp_inval_start[], __psci_hyp_bp_inval_end[]; + static void __copy_hyp_vect_bpi(int slot, const char *hyp_vecs_start, const char *hyp_vecs_end) { @@ -94,6 +96,9 @@ static void __install_bp_hardening_cb(bp spin_unlock(&bp_lock); } #else +#define __psci_hyp_bp_inval_start NULL +#define __psci_hyp_bp_inval_end NULL + static void __install_bp_hardening_cb(bp_hardening_cb_t fn, const char *hyp_vecs_start, const char *hyp_vecs_end) @@ -118,6 +123,21 @@ static void install_bp_hardening_cb(con
__install_bp_hardening_cb(fn, hyp_vecs_start, hyp_vecs_end); } + +#include <linux/psci.h> + +static int enable_psci_bp_hardening(void *data) +{ + const struct arm64_cpu_capabilities *entry = data; + + if (psci_ops.get_version) + install_bp_hardening_cb(entry, + (bp_hardening_cb_t)psci_ops.get_version, + __psci_hyp_bp_inval_start, + __psci_hyp_bp_inval_end); + + return 0; +} #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */
#define MIDR_RANGE(model, min, max) \ @@ -211,6 +231,28 @@ const struct arm64_cpu_capabilities arm6 .def_scope = SCOPE_LOCAL_CPU, .enable = cpu_enable_trap_ctr_access, }, +#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR + { + .capability = ARM64_HARDEN_BRANCH_PREDICTOR, + MIDR_ALL_VERSIONS(MIDR_CORTEX_A57), + .enable = enable_psci_bp_hardening, + }, + { + .capability = ARM64_HARDEN_BRANCH_PREDICTOR, + MIDR_ALL_VERSIONS(MIDR_CORTEX_A72), + .enable = enable_psci_bp_hardening, + }, + { + .capability = ARM64_HARDEN_BRANCH_PREDICTOR, + MIDR_ALL_VERSIONS(MIDR_CORTEX_A73), + .enable = enable_psci_bp_hardening, + }, + { + .capability = ARM64_HARDEN_BRANCH_PREDICTOR, + MIDR_ALL_VERSIONS(MIDR_CORTEX_A75), + .enable = enable_psci_bp_hardening, + }, +#endif { } };
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Jayachandran C jnair@caviumnetworks.com
commit f3d795d9b360523beca6d13ba64c2c532f601149 upstream.
Use PSCI based mitigation for speculative execution attacks targeting the branch predictor. We use the same mechanism as the one used for Cortex-A CPUs, we expect the PSCI version call to have a side effect of clearing the BTBs.
Acked-by: Will Deacon will.deacon@arm.com Signed-off-by: Jayachandran C jnair@caviumnetworks.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/cpu_errata.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -252,6 +252,16 @@ const struct arm64_cpu_capabilities arm6 MIDR_ALL_VERSIONS(MIDR_CORTEX_A75), .enable = enable_psci_bp_hardening, }, + { + .capability = ARM64_HARDEN_BRANCH_PREDICTOR, + MIDR_ALL_VERSIONS(MIDR_BRCM_VULCAN), + .enable = enable_psci_bp_hardening, + }, + { + .capability = ARM64_HARDEN_BRANCH_PREDICTOR, + MIDR_ALL_VERSIONS(MIDR_CAVIUM_THUNDERX2), + .enable = enable_psci_bp_hardening, + }, #endif { }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit f5115e8869e1dfafac0e414b4f1664f3a84a4683 upstream.
When handling an SMC trap, the "preferred return address" is set to that of the SMC, and not the next PC (which is a departure from the behaviour of an SMC that isn't trapped).
Increment PC in the handler, as the guest is otherwise forever stuck...
Cc: stable@vger.kernel.org Fixes: acfb3b883f6d ("arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls") Reviewed-by: Christoffer Dall christoffer.dall@linaro.org Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/handle_exit.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/arch/arm64/kvm/handle_exit.c +++ b/arch/arm64/kvm/handle_exit.c @@ -53,7 +53,16 @@ static int handle_hvc(struct kvm_vcpu *v
static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run) { + /* + * "If an SMC instruction executed at Non-secure EL1 is + * trapped to EL2 because HCR_EL2.TSC is 1, the exception is a + * Trap exception, not a Secure Monitor Call exception [...]" + * + * We need to advance the PC after the trap, as it would + * otherwise return to the same address... + */ vcpu_set_reg(vcpu, 0, ~0UL); + kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu)); return 1; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit 1a2fb94e6a771ff94f4afa22497a4695187b820c upstream.
As we're about to update the PSCI support, and because I'm lazy, let's move the PSCI include file to include/kvm so that both ARM architectures can find it.
Acked-by: Christoffer Dall christoffer.dall@linaro.org Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com [v4.9: account for files moved to virt/ upstream] Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/include/asm/kvm_psci.h | 27 --------------------------- arch/arm/kvm/arm.c | 2 +- arch/arm/kvm/handle_exit.c | 2 +- arch/arm/kvm/psci.c | 3 ++- arch/arm64/include/asm/kvm_psci.h | 27 --------------------------- arch/arm64/kvm/handle_exit.c | 5 ++++- include/kvm/arm_psci.h | 27 +++++++++++++++++++++++++++ 7 files changed, 35 insertions(+), 58 deletions(-) delete mode 100644 arch/arm/include/asm/kvm_psci.h rename arch/arm64/include/asm/kvm_psci.h => include/kvm/arm_psci.h (89%)
--- a/arch/arm/include/asm/kvm_psci.h +++ /dev/null @@ -1,27 +0,0 @@ -/* - * Copyright (C) 2012 - ARM Ltd - * Author: Marc Zyngier marc.zyngier@arm.com - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program. If not, see http://www.gnu.org/licenses/. - */ - -#ifndef __ARM_KVM_PSCI_H__ -#define __ARM_KVM_PSCI_H__ - -#define KVM_ARM_PSCI_0_1 1 -#define KVM_ARM_PSCI_0_2 2 - -int kvm_psci_version(struct kvm_vcpu *vcpu); -int kvm_psci_call(struct kvm_vcpu *vcpu); - -#endif /* __ARM_KVM_PSCI_H__ */ --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -29,6 +29,7 @@ #include <linux/kvm.h> #include <trace/events/kvm.h> #include <kvm/arm_pmu.h> +#include <kvm/arm_psci.h>
#define CREATE_TRACE_POINTS #include "trace.h" @@ -44,7 +45,6 @@ #include <asm/kvm_mmu.h> #include <asm/kvm_emulate.h> #include <asm/kvm_coproc.h> -#include <asm/kvm_psci.h> #include <asm/sections.h>
#ifdef REQUIRES_VIRT --- a/arch/arm/kvm/handle_exit.c +++ b/arch/arm/kvm/handle_exit.c @@ -21,7 +21,7 @@ #include <asm/kvm_emulate.h> #include <asm/kvm_coproc.h> #include <asm/kvm_mmu.h> -#include <asm/kvm_psci.h> +#include <kvm/arm_psci.h> #include <trace/events/kvm.h>
#include "trace.h" --- a/arch/arm/kvm/psci.c +++ b/arch/arm/kvm/psci.c @@ -21,9 +21,10 @@
#include <asm/cputype.h> #include <asm/kvm_emulate.h> -#include <asm/kvm_psci.h> #include <asm/kvm_host.h>
+#include <kvm/arm_psci.h> + #include <uapi/linux/psci.h>
/* --- a/arch/arm64/include/asm/kvm_psci.h +++ /dev/null @@ -1,27 +0,0 @@ -/* - * Copyright (C) 2012,2013 - ARM Ltd - * Author: Marc Zyngier marc.zyngier@arm.com - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program. If not, see http://www.gnu.org/licenses/. - */ - -#ifndef __ARM64_KVM_PSCI_H__ -#define __ARM64_KVM_PSCI_H__ - -#define KVM_ARM_PSCI_0_1 1 -#define KVM_ARM_PSCI_0_2 2 - -int kvm_psci_version(struct kvm_vcpu *vcpu); -int kvm_psci_call(struct kvm_vcpu *vcpu); - -#endif /* __ARM64_KVM_PSCI_H__ */ --- a/arch/arm64/kvm/handle_exit.c +++ b/arch/arm64/kvm/handle_exit.c @@ -22,12 +22,15 @@ #include <linux/kvm.h> #include <linux/kvm_host.h>
+#include <kvm/arm_psci.h> + #include <asm/esr.h> #include <asm/kvm_asm.h> #include <asm/kvm_coproc.h> #include <asm/kvm_emulate.h> #include <asm/kvm_mmu.h> -#include <asm/kvm_psci.h> +#include <asm/debug-monitors.h> +#include <asm/traps.h>
#define CREATE_TRACE_POINTS #include "trace.h" --- /dev/null +++ b/include/kvm/arm_psci.h @@ -0,0 +1,27 @@ +/* + * Copyright (C) 2012,2013 - ARM Ltd + * Author: Marc Zyngier marc.zyngier@arm.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef __KVM_ARM_PSCI_H__ +#define __KVM_ARM_PSCI_H__ + +#define KVM_ARM_PSCI_0_1 1 +#define KVM_ARM_PSCI_0_2 2 + +int kvm_psci_version(struct kvm_vcpu *vcpu); +int kvm_psci_call(struct kvm_vcpu *vcpu); + +#endif /* __KVM_ARM_PSCI_H__ */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit d0a144f12a7ca8368933eae6583c096c363ec506 upstream.
As we're about to trigger a PSCI version explosion, it doesn't hurt to introduce a PSCI_VERSION helper that is going to be used everywhere.
Reviewed-by: Christoffer Dall christoffer.dall@linaro.org Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com [v4.9: account for files moved to virt/ upstream] Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/kvm/psci.c | 4 +--- include/kvm/arm_psci.h | 6 ++++-- include/uapi/linux/psci.h | 3 +++ 3 files changed, 8 insertions(+), 5 deletions(-)
--- a/arch/arm/kvm/psci.c +++ b/arch/arm/kvm/psci.c @@ -25,8 +25,6 @@
#include <kvm/arm_psci.h>
-#include <uapi/linux/psci.h> - /* * This is an implementation of the Power State Coordination Interface * as described in ARM document number ARM DEN 0022A. @@ -220,7 +218,7 @@ static int kvm_psci_0_2_call(struct kvm_ * Bits[31:16] = Major Version = 0 * Bits[15:0] = Minor Version = 2 */ - val = 2; + val = KVM_ARM_PSCI_0_2; break; case PSCI_0_2_FN_CPU_SUSPEND: case PSCI_0_2_FN64_CPU_SUSPEND: --- a/include/kvm/arm_psci.h +++ b/include/kvm/arm_psci.h @@ -18,8 +18,10 @@ #ifndef __KVM_ARM_PSCI_H__ #define __KVM_ARM_PSCI_H__
-#define KVM_ARM_PSCI_0_1 1 -#define KVM_ARM_PSCI_0_2 2 +#include <uapi/linux/psci.h> + +#define KVM_ARM_PSCI_0_1 PSCI_VERSION(0, 1) +#define KVM_ARM_PSCI_0_2 PSCI_VERSION(0, 2)
int kvm_psci_version(struct kvm_vcpu *vcpu); int kvm_psci_call(struct kvm_vcpu *vcpu); --- a/include/uapi/linux/psci.h +++ b/include/uapi/linux/psci.h @@ -87,6 +87,9 @@ (((ver) & PSCI_VERSION_MAJOR_MASK) >> PSCI_VERSION_MAJOR_SHIFT) #define PSCI_VERSION_MINOR(ver) \ ((ver) & PSCI_VERSION_MINOR_MASK) +#define PSCI_VERSION(maj, min) \ + ((((maj) << PSCI_VERSION_MAJOR_SHIFT) & PSCI_VERSION_MAJOR_MASK) | \ + ((min) & PSCI_VERSION_MINOR_MASK))
/* PSCI features decoding (>=1.0) */ #define PSCI_1_0_FEATURES_CPU_SUSPEND_PF_SHIFT 1
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit 84684fecd7ea381824a96634a027b7719587fb77 upstream.
Instead of open coding the accesses to the various registers, let's add explicit SMCCC accessors.
Reviewed-by: Christoffer Dall christoffer.dall@linaro.org Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com [v4.9: account for files moved to virt/ upstream] Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/kvm/psci.c | 52 ++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 42 insertions(+), 10 deletions(-)
--- a/arch/arm/kvm/psci.c +++ b/arch/arm/kvm/psci.c @@ -32,6 +32,38 @@
#define AFFINITY_MASK(level) ~((0x1UL << ((level) * MPIDR_LEVEL_BITS)) - 1)
+static u32 smccc_get_function(struct kvm_vcpu *vcpu) +{ + return vcpu_get_reg(vcpu, 0); +} + +static unsigned long smccc_get_arg1(struct kvm_vcpu *vcpu) +{ + return vcpu_get_reg(vcpu, 1); +} + +static unsigned long smccc_get_arg2(struct kvm_vcpu *vcpu) +{ + return vcpu_get_reg(vcpu, 2); +} + +static unsigned long smccc_get_arg3(struct kvm_vcpu *vcpu) +{ + return vcpu_get_reg(vcpu, 3); +} + +static void smccc_set_retval(struct kvm_vcpu *vcpu, + unsigned long a0, + unsigned long a1, + unsigned long a2, + unsigned long a3) +{ + vcpu_set_reg(vcpu, 0, a0); + vcpu_set_reg(vcpu, 1, a1); + vcpu_set_reg(vcpu, 2, a2); + vcpu_set_reg(vcpu, 3, a3); +} + static unsigned long psci_affinity_mask(unsigned long affinity_level) { if (affinity_level <= 3) @@ -74,7 +106,7 @@ static unsigned long kvm_psci_vcpu_on(st unsigned long context_id; phys_addr_t target_pc;
- cpu_id = vcpu_get_reg(source_vcpu, 1) & MPIDR_HWID_BITMASK; + cpu_id = smccc_get_arg1(source_vcpu) & MPIDR_HWID_BITMASK; if (vcpu_mode_is_32bit(source_vcpu)) cpu_id &= ~((u32) 0);
@@ -93,8 +125,8 @@ static unsigned long kvm_psci_vcpu_on(st return PSCI_RET_INVALID_PARAMS; }
- target_pc = vcpu_get_reg(source_vcpu, 2); - context_id = vcpu_get_reg(source_vcpu, 3); + target_pc = smccc_get_arg2(source_vcpu); + context_id = smccc_get_arg3(source_vcpu);
kvm_reset_vcpu(vcpu);
@@ -113,7 +145,7 @@ static unsigned long kvm_psci_vcpu_on(st * NOTE: We always update r0 (or x0) because for PSCI v0.1 * the general puspose registers are undefined upon CPU_ON. */ - vcpu_set_reg(vcpu, 0, context_id); + smccc_set_retval(vcpu, context_id, 0, 0, 0); vcpu->arch.power_off = false; smp_mb(); /* Make sure the above is visible */
@@ -133,8 +165,8 @@ static unsigned long kvm_psci_vcpu_affin struct kvm *kvm = vcpu->kvm; struct kvm_vcpu *tmp;
- target_affinity = vcpu_get_reg(vcpu, 1); - lowest_affinity_level = vcpu_get_reg(vcpu, 2); + target_affinity = smccc_get_arg1(vcpu); + lowest_affinity_level = smccc_get_arg2(vcpu);
/* Determine target affinity mask */ target_affinity_mask = psci_affinity_mask(lowest_affinity_level); @@ -208,7 +240,7 @@ int kvm_psci_version(struct kvm_vcpu *vc static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu) { struct kvm *kvm = vcpu->kvm; - unsigned long psci_fn = vcpu_get_reg(vcpu, 0) & ~((u32) 0); + unsigned long psci_fn = smccc_get_function(vcpu); unsigned long val; int ret = 1;
@@ -275,14 +307,14 @@ static int kvm_psci_0_2_call(struct kvm_ break; }
- vcpu_set_reg(vcpu, 0, val); + smccc_set_retval(vcpu, val, 0, 0, 0); return ret; }
static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu) { struct kvm *kvm = vcpu->kvm; - unsigned long psci_fn = vcpu_get_reg(vcpu, 0) & ~((u32) 0); + unsigned long psci_fn = smccc_get_function(vcpu); unsigned long val;
switch (psci_fn) { @@ -300,7 +332,7 @@ static int kvm_psci_0_1_call(struct kvm_ break; }
- vcpu_set_reg(vcpu, 0, val); + smccc_set_retval(vcpu, val, 0, 0, 0); return 1; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit 58e0b2239a4d997094ba63986ef4de29ddc91d87 upstream.
PSCI 1.0 can be trivially implemented by providing the FEATURES call on top of PSCI 0.2 and returning 1.0 as the PSCI version.
We happily ignore everything else, as they are either optional or are clarifications that do not require any additional change.
PSCI 1.0 is now the default until we decide to add a userspace selection API.
Reviewed-by: Christoffer Dall christoffer.dall@linaro.org Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com [v4.9: account for files moved to virt/ upstream] Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/kvm/psci.c | 45 ++++++++++++++++++++++++++++++++++++++++++++- include/kvm/arm_psci.h | 3 +++ 2 files changed, 47 insertions(+), 1 deletion(-)
--- a/arch/arm/kvm/psci.c +++ b/arch/arm/kvm/psci.c @@ -232,7 +232,7 @@ static void kvm_psci_system_reset(struct int kvm_psci_version(struct kvm_vcpu *vcpu) { if (test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features)) - return KVM_ARM_PSCI_0_2; + return KVM_ARM_PSCI_LATEST;
return KVM_ARM_PSCI_0_1; } @@ -311,6 +311,47 @@ static int kvm_psci_0_2_call(struct kvm_ return ret; }
+static int kvm_psci_1_0_call(struct kvm_vcpu *vcpu) +{ + u32 psci_fn = smccc_get_function(vcpu); + u32 feature; + unsigned long val; + int ret = 1; + + switch(psci_fn) { + case PSCI_0_2_FN_PSCI_VERSION: + val = KVM_ARM_PSCI_1_0; + break; + case PSCI_1_0_FN_PSCI_FEATURES: + feature = smccc_get_arg1(vcpu); + switch(feature) { + case PSCI_0_2_FN_PSCI_VERSION: + case PSCI_0_2_FN_CPU_SUSPEND: + case PSCI_0_2_FN64_CPU_SUSPEND: + case PSCI_0_2_FN_CPU_OFF: + case PSCI_0_2_FN_CPU_ON: + case PSCI_0_2_FN64_CPU_ON: + case PSCI_0_2_FN_AFFINITY_INFO: + case PSCI_0_2_FN64_AFFINITY_INFO: + case PSCI_0_2_FN_MIGRATE_INFO_TYPE: + case PSCI_0_2_FN_SYSTEM_OFF: + case PSCI_0_2_FN_SYSTEM_RESET: + case PSCI_1_0_FN_PSCI_FEATURES: + val = 0; + break; + default: + val = PSCI_RET_NOT_SUPPORTED; + break; + } + break; + default: + return kvm_psci_0_2_call(vcpu); + } + + smccc_set_retval(vcpu, val, 0, 0, 0); + return ret; +} + static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu) { struct kvm *kvm = vcpu->kvm; @@ -353,6 +394,8 @@ static int kvm_psci_0_1_call(struct kvm_ int kvm_psci_call(struct kvm_vcpu *vcpu) { switch (kvm_psci_version(vcpu)) { + case KVM_ARM_PSCI_1_0: + return kvm_psci_1_0_call(vcpu); case KVM_ARM_PSCI_0_2: return kvm_psci_0_2_call(vcpu); case KVM_ARM_PSCI_0_1: --- a/include/kvm/arm_psci.h +++ b/include/kvm/arm_psci.h @@ -22,6 +22,9 @@
#define KVM_ARM_PSCI_0_1 PSCI_VERSION(0, 1) #define KVM_ARM_PSCI_0_2 PSCI_VERSION(0, 2) +#define KVM_ARM_PSCI_1_0 PSCI_VERSION(1, 0) + +#define KVM_ARM_PSCI_LATEST KVM_ARM_PSCI_1_0
int kvm_psci_version(struct kvm_vcpu *vcpu); int kvm_psci_call(struct kvm_vcpu *vcpu);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit 09e6be12effdb33bf7210c8867bbd213b66a499e upstream.
The new SMC Calling Convention (v1.1) allows for a reduced overhead when calling into the firmware, and provides a new feature discovery mechanism.
Make it visible to KVM guests.
Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Reviewed-by: Christoffer Dall christoffer.dall@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com [v4.9: account for files moved to virt/ upstream] Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/kvm/handle_exit.c | 2 +- arch/arm/kvm/psci.c | 24 +++++++++++++++++++++++- arch/arm64/kvm/handle_exit.c | 2 +- include/kvm/arm_psci.h | 2 +- include/linux/arm-smccc.h | 13 +++++++++++++ 5 files changed, 39 insertions(+), 4 deletions(-)
--- a/arch/arm/kvm/handle_exit.c +++ b/arch/arm/kvm/handle_exit.c @@ -36,7 +36,7 @@ static int handle_hvc(struct kvm_vcpu *v kvm_vcpu_hvc_get_imm(vcpu)); vcpu->stat.hvc_exit_stat++;
- ret = kvm_psci_call(vcpu); + ret = kvm_hvc_call_handler(vcpu); if (ret < 0) { vcpu_set_reg(vcpu, 0, ~0UL); return 1; --- a/arch/arm/kvm/psci.c +++ b/arch/arm/kvm/psci.c @@ -15,6 +15,7 @@ * along with this program. If not, see http://www.gnu.org/licenses/. */
+#include <linux/arm-smccc.h> #include <linux/preempt.h> #include <linux/kvm_host.h> #include <linux/wait.h> @@ -337,6 +338,7 @@ static int kvm_psci_1_0_call(struct kvm_ case PSCI_0_2_FN_SYSTEM_OFF: case PSCI_0_2_FN_SYSTEM_RESET: case PSCI_1_0_FN_PSCI_FEATURES: + case ARM_SMCCC_VERSION_FUNC_ID: val = 0; break; default: @@ -391,7 +393,7 @@ static int kvm_psci_0_1_call(struct kvm_ * Errors: * -EINVAL: Unrecognized PSCI function */ -int kvm_psci_call(struct kvm_vcpu *vcpu) +static int kvm_psci_call(struct kvm_vcpu *vcpu) { switch (kvm_psci_version(vcpu)) { case KVM_ARM_PSCI_1_0: @@ -404,3 +406,23 @@ int kvm_psci_call(struct kvm_vcpu *vcpu) return -EINVAL; }; } + +int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) +{ + u32 func_id = smccc_get_function(vcpu); + u32 val = PSCI_RET_NOT_SUPPORTED; + + switch (func_id) { + case ARM_SMCCC_VERSION_FUNC_ID: + val = ARM_SMCCC_VERSION_1_1; + break; + case ARM_SMCCC_ARCH_FEATURES_FUNC_ID: + /* Nothing supported yet */ + break; + default: + return kvm_psci_call(vcpu); + } + + smccc_set_retval(vcpu, val, 0, 0, 0); + return 1; +} --- a/arch/arm64/kvm/handle_exit.c +++ b/arch/arm64/kvm/handle_exit.c @@ -45,7 +45,7 @@ static int handle_hvc(struct kvm_vcpu *v kvm_vcpu_hvc_get_imm(vcpu)); vcpu->stat.hvc_exit_stat++;
- ret = kvm_psci_call(vcpu); + ret = kvm_hvc_call_handler(vcpu); if (ret < 0) { vcpu_set_reg(vcpu, 0, ~0UL); return 1; --- a/include/kvm/arm_psci.h +++ b/include/kvm/arm_psci.h @@ -27,6 +27,6 @@ #define KVM_ARM_PSCI_LATEST KVM_ARM_PSCI_1_0
int kvm_psci_version(struct kvm_vcpu *vcpu); -int kvm_psci_call(struct kvm_vcpu *vcpu); +int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
#endif /* __KVM_ARM_PSCI_H__ */ --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -60,6 +60,19 @@ #define ARM_SMCCC_QUIRK_NONE 0 #define ARM_SMCCC_QUIRK_QCOM_A6 1 /* Save/restore register a6 */
+#define ARM_SMCCC_VERSION_1_0 0x10000 +#define ARM_SMCCC_VERSION_1_1 0x10001 + +#define ARM_SMCCC_VERSION_FUNC_ID \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ + ARM_SMCCC_SMC_32, \ + 0, 0) + +#define ARM_SMCCC_ARCH_FEATURES_FUNC_ID \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ + ARM_SMCCC_SMC_32, \ + 0, 1) + #ifndef __ASSEMBLY__
#include <linux/linkage.h>
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit 90348689d500410ca7a55624c667f956771dce7f upstream.
For those CPUs that require PSCI to perform a BP invalidation, going all the way to the PSCI code for not much is a waste of precious cycles. Let's terminate that call as early as possible.
Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/hyp/switch.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -17,6 +17,7 @@
#include <linux/types.h> #include <linux/jump_label.h> +#include <uapi/linux/psci.h>
#include <asm/kvm_asm.h> #include <asm/kvm_emulate.h> @@ -308,6 +309,18 @@ again: if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu)) goto again;
+ if (exit_code == ARM_EXCEPTION_TRAP && + (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_HVC64 || + kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_HVC32) && + vcpu_get_reg(vcpu, 0) == PSCI_0_2_FN_PSCI_VERSION) { + u64 val = PSCI_RET_NOT_SUPPORTED; + if (test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features)) + val = 2; + + vcpu_set_reg(vcpu, 0, val); + goto again; + } + if (static_branch_unlikely(&vgic_v2_cpuif_trap) && exit_code == ARM_EXCEPTION_TRAP) { bool valid;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit a4097b351118e821841941a79ec77d3ce3f1c5d9 upstream.
We're about to need kvm_psci_version in HYP too. So let's turn it into a static inline, and pass the kvm structure as a second parameter (so that HYP can do a kern_hyp_va on it).
Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Reviewed-by: Christoffer Dall christoffer.dall@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com [v4.9: account for files moved to virt/ upstream] Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/kvm/psci.c | 12 ++---------- arch/arm64/kvm/hyp/switch.c | 18 +++++++++++------- include/kvm/arm_psci.h | 21 ++++++++++++++++++++- 3 files changed, 33 insertions(+), 18 deletions(-)
--- a/arch/arm/kvm/psci.c +++ b/arch/arm/kvm/psci.c @@ -120,7 +120,7 @@ static unsigned long kvm_psci_vcpu_on(st if (!vcpu) return PSCI_RET_INVALID_PARAMS; if (!vcpu->arch.power_off) { - if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1) + if (kvm_psci_version(source_vcpu, kvm) != KVM_ARM_PSCI_0_1) return PSCI_RET_ALREADY_ON; else return PSCI_RET_INVALID_PARAMS; @@ -230,14 +230,6 @@ static void kvm_psci_system_reset(struct kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET); }
-int kvm_psci_version(struct kvm_vcpu *vcpu) -{ - if (test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features)) - return KVM_ARM_PSCI_LATEST; - - return KVM_ARM_PSCI_0_1; -} - static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu) { struct kvm *kvm = vcpu->kvm; @@ -395,7 +387,7 @@ static int kvm_psci_0_1_call(struct kvm_ */ static int kvm_psci_call(struct kvm_vcpu *vcpu) { - switch (kvm_psci_version(vcpu)) { + switch (kvm_psci_version(vcpu, vcpu->kvm)) { case KVM_ARM_PSCI_1_0: return kvm_psci_1_0_call(vcpu); case KVM_ARM_PSCI_0_2: --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -19,6 +19,8 @@ #include <linux/jump_label.h> #include <uapi/linux/psci.h>
+#include <kvm/arm_psci.h> + #include <asm/kvm_asm.h> #include <asm/kvm_emulate.h> #include <asm/kvm_hyp.h> @@ -311,14 +313,16 @@ again:
if (exit_code == ARM_EXCEPTION_TRAP && (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_HVC64 || - kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_HVC32) && - vcpu_get_reg(vcpu, 0) == PSCI_0_2_FN_PSCI_VERSION) { - u64 val = PSCI_RET_NOT_SUPPORTED; - if (test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features)) - val = 2; + kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_HVC32)) { + u32 val = vcpu_get_reg(vcpu, 0);
- vcpu_set_reg(vcpu, 0, val); - goto again; + if (val == PSCI_0_2_FN_PSCI_VERSION) { + val = kvm_psci_version(vcpu, kern_hyp_va(vcpu->kvm)); + if (unlikely(val == KVM_ARM_PSCI_0_1)) + val = PSCI_RET_NOT_SUPPORTED; + vcpu_set_reg(vcpu, 0, val); + goto again; + } }
if (static_branch_unlikely(&vgic_v2_cpuif_trap) && --- a/include/kvm/arm_psci.h +++ b/include/kvm/arm_psci.h @@ -18,6 +18,7 @@ #ifndef __KVM_ARM_PSCI_H__ #define __KVM_ARM_PSCI_H__
+#include <linux/kvm_host.h> #include <uapi/linux/psci.h>
#define KVM_ARM_PSCI_0_1 PSCI_VERSION(0, 1) @@ -26,7 +27,25 @@
#define KVM_ARM_PSCI_LATEST KVM_ARM_PSCI_1_0
-int kvm_psci_version(struct kvm_vcpu *vcpu); +/* + * We need the KVM pointer independently from the vcpu as we can call + * this from HYP, and need to apply kern_hyp_va on it... + */ +static inline int kvm_psci_version(struct kvm_vcpu *vcpu, struct kvm *kvm) +{ + /* + * Our PSCI implementation stays the same across versions from + * v0.2 onward, only adding the few mandatory functions (such + * as FEATURES with 1.0) that are required by newer + * revisions. It is thus safe to return the latest. + */ + if (test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features)) + return KVM_ARM_PSCI_LATEST; + + return KVM_ARM_PSCI_0_1; +} + + int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
#endif /* __KVM_ARM_PSCI_H__ */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit 6167ec5c9145cdf493722dfd80a5d48bafc4a18a upstream.
A new feature of SMCCC 1.1 is that it offers firmware-based CPU workarounds. In particular, SMCCC_ARCH_WORKAROUND_1 provides BP hardening for CVE-2017-5715.
If the host has some mitigation for this issue, report that we deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the host workaround on every guest exit.
Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Reviewed-by: Christoffer Dall christoffer.dall@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com [v4.9: account for files moved to virt/ upstream] Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/include/asm/kvm_host.h | 6 ++++++ arch/arm/kvm/psci.c | 9 ++++++++- arch/arm64/include/asm/kvm_host.h | 5 +++++ include/linux/arm-smccc.h | 5 +++++ 4 files changed, 24 insertions(+), 1 deletion(-)
--- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -318,4 +318,10 @@ static inline int kvm_arm_vcpu_arch_has_ return -ENXIO; }
+static inline bool kvm_arm_harden_branch_predictor(void) +{ + /* No way to detect it yet, pretend it is not there. */ + return false; +} + #endif /* __ARM_KVM_HOST_H__ */ --- a/arch/arm/kvm/psci.c +++ b/arch/arm/kvm/psci.c @@ -403,13 +403,20 @@ int kvm_hvc_call_handler(struct kvm_vcpu { u32 func_id = smccc_get_function(vcpu); u32 val = PSCI_RET_NOT_SUPPORTED; + u32 feature;
switch (func_id) { case ARM_SMCCC_VERSION_FUNC_ID: val = ARM_SMCCC_VERSION_1_1; break; case ARM_SMCCC_ARCH_FEATURES_FUNC_ID: - /* Nothing supported yet */ + feature = smccc_get_arg1(vcpu); + switch(feature) { + case ARM_SMCCC_ARCH_WORKAROUND_1: + if (kvm_arm_harden_branch_predictor()) + val = 0; + break; + } break; default: return kvm_psci_call(vcpu); --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -393,4 +393,9 @@ static inline void __cpu_init_stage2(voi "PARange is %d bits, unsupported configuration!", parange); }
+static inline bool kvm_arm_harden_branch_predictor(void) +{ + return cpus_have_cap(ARM64_HARDEN_BRANCH_PREDICTOR); +} + #endif /* __ARM64_KVM_HOST_H__ */ --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -73,6 +73,11 @@ ARM_SMCCC_SMC_32, \ 0, 1)
+#define ARM_SMCCC_ARCH_WORKAROUND_1 \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ + ARM_SMCCC_SMC_32, \ + 0, 0x8000) + #ifndef __ASSEMBLY__
#include <linux/linkage.h>
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit f72af90c3783d924337624659b43e2d36f1b36b4 upstream.
We want SMCCC_ARCH_WORKAROUND_1 to be fast. As fast as possible. So let's intercept it as early as we can by testing for the function call number as soon as we've identified a HVC call coming from the guest.
Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Reviewed-by: Christoffer Dall christoffer.dall@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/hyp/hyp-entry.S | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
--- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -15,6 +15,7 @@ * along with this program. If not, see http://www.gnu.org/licenses/. */
+#include <linux/arm-smccc.h> #include <linux/linkage.h>
#include <asm/alternative.h> @@ -79,10 +80,11 @@ alternative_endif lsr x0, x1, #ESR_ELx_EC_SHIFT
cmp x0, #ESR_ELx_EC_HVC64 + ccmp x0, #ESR_ELx_EC_HVC32, #4, ne b.ne el1_trap
- mrs x1, vttbr_el2 // If vttbr is valid, the 64bit guest - cbnz x1, el1_trap // called HVC + mrs x1, vttbr_el2 // If vttbr is valid, the guest + cbnz x1, el1_hvc_guest // called HVC
/* Here, we're pretty sure the host called HVC. */ ldp x0, x1, [sp], #16 @@ -101,6 +103,20 @@ alternative_endif
2: eret
+el1_hvc_guest: + /* + * Fastest possible path for ARM_SMCCC_ARCH_WORKAROUND_1. + * The workaround has already been applied on the host, + * so let's quickly get back to the guest. We don't bother + * restoring x1, as it can be clobbered anyway. + */ + ldr x1, [sp] // Guest's x0 + eor w1, w1, #ARM_SMCCC_ARCH_WORKAROUND_1 + cbnz w1, el1_trap + mov x0, x1 + add sp, sp, #16 + eret + el1_trap: /* * x0: ESR_EC
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit 09a8d6d48499f93e2abde691f5800081cd858726 upstream.
In order to call into the firmware to apply workarounds, it is useful to find out whether we're using HVC or SMC. Let's expose this through the psci_ops.
Acked-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Robin Murphy robin.murphy@arm.com Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/psci.c | 28 +++++++++++++++++++++++----- include/linux/psci.h | 7 +++++++ 2 files changed, 30 insertions(+), 5 deletions(-)
--- a/drivers/firmware/psci.c +++ b/drivers/firmware/psci.c @@ -59,7 +59,9 @@ bool psci_tos_resident_on(int cpu) return cpu == resident_cpu; }
-struct psci_operations psci_ops; +struct psci_operations psci_ops = { + .conduit = PSCI_CONDUIT_NONE, +};
typedef unsigned long (psci_fn)(unsigned long, unsigned long, unsigned long, unsigned long); @@ -210,6 +212,22 @@ static unsigned long psci_migrate_info_u 0, 0, 0); }
+static void set_conduit(enum psci_conduit conduit) +{ + switch (conduit) { + case PSCI_CONDUIT_HVC: + invoke_psci_fn = __invoke_psci_fn_hvc; + break; + case PSCI_CONDUIT_SMC: + invoke_psci_fn = __invoke_psci_fn_smc; + break; + default: + WARN(1, "Unexpected PSCI conduit %d\n", conduit); + } + + psci_ops.conduit = conduit; +} + static int get_set_conduit_method(struct device_node *np) { const char *method; @@ -222,9 +240,9 @@ static int get_set_conduit_method(struct }
if (!strcmp("hvc", method)) { - invoke_psci_fn = __invoke_psci_fn_hvc; + set_conduit(PSCI_CONDUIT_HVC); } else if (!strcmp("smc", method)) { - invoke_psci_fn = __invoke_psci_fn_smc; + set_conduit(PSCI_CONDUIT_SMC); } else { pr_warn("invalid "method" property: %s\n", method); return -EINVAL; @@ -654,9 +672,9 @@ int __init psci_acpi_init(void) pr_info("probing for conduit method from ACPI.\n");
if (acpi_psci_use_hvc()) - invoke_psci_fn = __invoke_psci_fn_hvc; + set_conduit(PSCI_CONDUIT_HVC); else - invoke_psci_fn = __invoke_psci_fn_smc; + set_conduit(PSCI_CONDUIT_SMC);
return psci_probe(); } --- a/include/linux/psci.h +++ b/include/linux/psci.h @@ -25,6 +25,12 @@ bool psci_tos_resident_on(int cpu); int psci_cpu_init_idle(unsigned int cpu); int psci_cpu_suspend_enter(unsigned long index);
+enum psci_conduit { + PSCI_CONDUIT_NONE, + PSCI_CONDUIT_SMC, + PSCI_CONDUIT_HVC, +}; + struct psci_operations { u32 (*get_version)(void); int (*cpu_suspend)(u32 state, unsigned long entry_point); @@ -34,6 +40,7 @@ struct psci_operations { int (*affinity_info)(unsigned long target_affinity, unsigned long lowest_affinity_level); int (*migrate_info_type)(void); + enum psci_conduit conduit; };
extern struct psci_operations psci_ops;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit e78eef554a912ef6c1e0bbf97619dafbeae3339f upstream.
Since PSCI 1.0 allows the SMCCC version to be (indirectly) probed, let's do that at boot time, and expose the version of the calling convention as part of the psci_ops structure.
Acked-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Robin Murphy robin.murphy@arm.com Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/psci.c | 27 +++++++++++++++++++++++++++ include/linux/psci.h | 6 ++++++ 2 files changed, 33 insertions(+)
--- a/drivers/firmware/psci.c +++ b/drivers/firmware/psci.c @@ -61,6 +61,7 @@ bool psci_tos_resident_on(int cpu)
struct psci_operations psci_ops = { .conduit = PSCI_CONDUIT_NONE, + .smccc_version = SMCCC_VERSION_1_0, };
typedef unsigned long (psci_fn)(unsigned long, unsigned long, @@ -511,6 +512,31 @@ static void __init psci_init_migrate(voi pr_info("Trusted OS resident on physical CPU 0x%lx\n", cpuid); }
+static void __init psci_init_smccc(void) +{ + u32 ver = ARM_SMCCC_VERSION_1_0; + int feature; + + feature = psci_features(ARM_SMCCC_VERSION_FUNC_ID); + + if (feature != PSCI_RET_NOT_SUPPORTED) { + u32 ret; + ret = invoke_psci_fn(ARM_SMCCC_VERSION_FUNC_ID, 0, 0, 0); + if (ret == ARM_SMCCC_VERSION_1_1) { + psci_ops.smccc_version = SMCCC_VERSION_1_1; + ver = ret; + } + } + + /* + * Conveniently, the SMCCC and PSCI versions are encoded the + * same way. No, this isn't accidental. + */ + pr_info("SMC Calling Convention v%d.%d\n", + PSCI_VERSION_MAJOR(ver), PSCI_VERSION_MINOR(ver)); + +} + static void __init psci_0_2_set_functions(void) { pr_info("Using standard PSCI v0.2 function IDs\n"); @@ -559,6 +585,7 @@ static int __init psci_probe(void) psci_init_migrate();
if (PSCI_VERSION_MAJOR(ver) >= 1) { + psci_init_smccc(); psci_init_cpu_suspend(); psci_init_system_suspend(); } --- a/include/linux/psci.h +++ b/include/linux/psci.h @@ -31,6 +31,11 @@ enum psci_conduit { PSCI_CONDUIT_HVC, };
+enum smccc_version { + SMCCC_VERSION_1_0, + SMCCC_VERSION_1_1, +}; + struct psci_operations { u32 (*get_version)(void); int (*cpu_suspend)(u32 state, unsigned long entry_point); @@ -41,6 +46,7 @@ struct psci_operations { unsigned long lowest_affinity_level); int (*migrate_info_type)(void); enum psci_conduit conduit; + enum smccc_version smccc_version; };
extern struct psci_operations psci_ops;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit ded4c39e93f3b72968fdb79baba27f3b83dad34c upstream.
Function identifiers are a 32bit, unsigned quantity. But we never tell so to the compiler, resulting in the following:
4ac: b26187e0 mov x0, #0xffffffff80000001
We thus rely on the firmware narrowing it for us, which is not always a reasonable expectation.
Cc: stable@vger.kernel.org Reported-by: Ard Biesheuvel ard.biesheuvel@linaro.org Acked-by: Ard Biesheuvel ard.biesheuvel@linaro.org Reviewed-by: Robin Murphy robin.murphy@arm.com Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/arm-smccc.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -14,14 +14,16 @@ #ifndef __LINUX_ARM_SMCCC_H #define __LINUX_ARM_SMCCC_H
+#include <uapi/linux/const.h> + /* * This file provides common defines for ARM SMC Calling Convention as * specified in * http://infocenter.arm.com/help/topic/com.arm.doc.den0028a/index.html */
-#define ARM_SMCCC_STD_CALL 0 -#define ARM_SMCCC_FAST_CALL 1 +#define ARM_SMCCC_STD_CALL _AC(0,U) +#define ARM_SMCCC_FAST_CALL _AC(1,U) #define ARM_SMCCC_TYPE_SHIFT 31
#define ARM_SMCCC_SMC_32 0
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit f2d3b2e8759a5833df6f022e42df2d581e6d843c upstream.
One of the major improvement of SMCCC v1.1 is that it only clobbers the first 4 registers, both on 32 and 64bit. This means that it becomes very easy to provide an inline version of the SMC call primitive, and avoid performing a function call to stash the registers that would otherwise be clobbered by SMCCC v1.0.
Reviewed-by: Robin Murphy robin.murphy@arm.com Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/arm-smccc.h | 141 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 141 insertions(+)
--- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -150,5 +150,146 @@ asmlinkage void __arm_smccc_hvc(unsigned
#define arm_smccc_hvc_quirk(...) __arm_smccc_hvc(__VA_ARGS__)
+/* SMCCC v1.1 implementation madness follows */ +#ifdef CONFIG_ARM64 + +#define SMCCC_SMC_INST "smc #0" +#define SMCCC_HVC_INST "hvc #0" + +#elif defined(CONFIG_ARM) +#include <asm/opcodes-sec.h> +#include <asm/opcodes-virt.h> + +#define SMCCC_SMC_INST __SMC(0) +#define SMCCC_HVC_INST __HVC(0) + +#endif + +#define ___count_args(_0, _1, _2, _3, _4, _5, _6, _7, _8, x, ...) x + +#define __count_args(...) \ + ___count_args(__VA_ARGS__, 7, 6, 5, 4, 3, 2, 1, 0) + +#define __constraint_write_0 \ + "+r" (r0), "=&r" (r1), "=&r" (r2), "=&r" (r3) +#define __constraint_write_1 \ + "+r" (r0), "+r" (r1), "=&r" (r2), "=&r" (r3) +#define __constraint_write_2 \ + "+r" (r0), "+r" (r1), "+r" (r2), "=&r" (r3) +#define __constraint_write_3 \ + "+r" (r0), "+r" (r1), "+r" (r2), "+r" (r3) +#define __constraint_write_4 __constraint_write_3 +#define __constraint_write_5 __constraint_write_4 +#define __constraint_write_6 __constraint_write_5 +#define __constraint_write_7 __constraint_write_6 + +#define __constraint_read_0 +#define __constraint_read_1 +#define __constraint_read_2 +#define __constraint_read_3 +#define __constraint_read_4 "r" (r4) +#define __constraint_read_5 __constraint_read_4, "r" (r5) +#define __constraint_read_6 __constraint_read_5, "r" (r6) +#define __constraint_read_7 __constraint_read_6, "r" (r7) + +#define __declare_arg_0(a0, res) \ + struct arm_smccc_res *___res = res; \ + register u32 r0 asm("r0") = a0; \ + register unsigned long r1 asm("r1"); \ + register unsigned long r2 asm("r2"); \ + register unsigned long r3 asm("r3") + +#define __declare_arg_1(a0, a1, res) \ + struct arm_smccc_res *___res = res; \ + register u32 r0 asm("r0") = a0; \ + register typeof(a1) r1 asm("r1") = a1; \ + register unsigned long r2 asm("r2"); \ + register unsigned long r3 asm("r3") + +#define __declare_arg_2(a0, a1, a2, res) \ + struct arm_smccc_res *___res = res; \ + register u32 r0 asm("r0") = a0; \ + register typeof(a1) r1 asm("r1") = a1; \ + register typeof(a2) r2 asm("r2") = a2; \ + register unsigned long r3 asm("r3") + +#define __declare_arg_3(a0, a1, a2, a3, res) \ + struct arm_smccc_res *___res = res; \ + register u32 r0 asm("r0") = a0; \ + register typeof(a1) r1 asm("r1") = a1; \ + register typeof(a2) r2 asm("r2") = a2; \ + register typeof(a3) r3 asm("r3") = a3 + +#define __declare_arg_4(a0, a1, a2, a3, a4, res) \ + __declare_arg_3(a0, a1, a2, a3, res); \ + register typeof(a4) r4 asm("r4") = a4 + +#define __declare_arg_5(a0, a1, a2, a3, a4, a5, res) \ + __declare_arg_4(a0, a1, a2, a3, a4, res); \ + register typeof(a5) r5 asm("r5") = a5 + +#define __declare_arg_6(a0, a1, a2, a3, a4, a5, a6, res) \ + __declare_arg_5(a0, a1, a2, a3, a4, a5, res); \ + register typeof(a6) r6 asm("r6") = a6 + +#define __declare_arg_7(a0, a1, a2, a3, a4, a5, a6, a7, res) \ + __declare_arg_6(a0, a1, a2, a3, a4, a5, a6, res); \ + register typeof(a7) r7 asm("r7") = a7 + +#define ___declare_args(count, ...) __declare_arg_ ## count(__VA_ARGS__) +#define __declare_args(count, ...) ___declare_args(count, __VA_ARGS__) + +#define ___constraints(count) \ + : __constraint_write_ ## count \ + : __constraint_read_ ## count \ + : "memory" +#define __constraints(count) ___constraints(count) + +/* + * We have an output list that is not necessarily used, and GCC feels + * entitled to optimise the whole sequence away. "volatile" is what + * makes it stick. + */ +#define __arm_smccc_1_1(inst, ...) \ + do { \ + __declare_args(__count_args(__VA_ARGS__), __VA_ARGS__); \ + asm volatile(inst "\n" \ + __constraints(__count_args(__VA_ARGS__))); \ + if (___res) \ + *___res = (typeof(*___res)){r0, r1, r2, r3}; \ + } while (0) + +/* + * arm_smccc_1_1_smc() - make an SMCCC v1.1 compliant SMC call + * + * This is a variadic macro taking one to eight source arguments, and + * an optional return structure. + * + * @a0-a7: arguments passed in registers 0 to 7 + * @res: result values from registers 0 to 3 + * + * This macro is used to make SMC calls following SMC Calling Convention v1.1. + * The content of the supplied param are copied to registers 0 to 7 prior + * to the SMC instruction. The return values are updated with the content + * from register 0 to 3 on return from the SMC instruction if not NULL. + */ +#define arm_smccc_1_1_smc(...) __arm_smccc_1_1(SMCCC_SMC_INST, __VA_ARGS__) + +/* + * arm_smccc_1_1_hvc() - make an SMCCC v1.1 compliant HVC call + * + * This is a variadic macro taking one to eight source arguments, and + * an optional return structure. + * + * @a0-a7: arguments passed in registers 0 to 7 + * @res: result values from registers 0 to 3 + * + * This macro is used to make HVC calls following SMC Calling Convention v1.1. + * The content of the supplied param are copied to registers 0 to 7 prior + * to the HVC instruction. The return values are updated with the content + * from register 0 to 3 on return from the HVC instruction if not NULL. + */ +#define arm_smccc_1_1_hvc(...) __arm_smccc_1_1(SMCCC_HVC_INST, __VA_ARGS__) + #endif /*__ASSEMBLY__*/ #endif /*__LINUX_ARM_SMCCC_H*/
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit b092201e0020614127f495c092e0a12d26a2116e upstream.
Add the detection and runtime code for ARM_SMCCC_ARCH_WORKAROUND_1. It is lovely. Really.
Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/crypto/sha256-core.S | 2061 ++++++++++++++++++++++++++++++++++++++++ arch/arm64/crypto/sha512-core.S | 1085 +++++++++++++++++++++ arch/arm64/kernel/bpi.S | 20 arch/arm64/kernel/cpu_errata.c | 72 + 4 files changed, 3235 insertions(+), 3 deletions(-) create mode 100644 arch/arm64/crypto/sha256-core.S create mode 100644 arch/arm64/crypto/sha512-core.S
--- /dev/null +++ b/arch/arm64/crypto/sha256-core.S @@ -0,0 +1,2061 @@ +// Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved. +// +// Licensed under the OpenSSL license (the "License"). You may not use +// this file except in compliance with the License. You can obtain a copy +// in the file LICENSE in the source distribution or at +// https://www.openssl.org/source/license.html + +// ==================================================================== +// Written by Andy Polyakov appro@openssl.org for the OpenSSL +// project. The module is, however, dual licensed under OpenSSL and +// CRYPTOGAMS licenses depending on where you obtain it. For further +// details see http://www.openssl.org/~appro/cryptogams/. +// +// Permission to use under GPLv2 terms is granted. +// ==================================================================== +// +// SHA256/512 for ARMv8. +// +// Performance in cycles per processed byte and improvement coefficient +// over code generated with "default" compiler: +// +// SHA256-hw SHA256(*) SHA512 +// Apple A7 1.97 10.5 (+33%) 6.73 (-1%(**)) +// Cortex-A53 2.38 15.5 (+115%) 10.0 (+150%(***)) +// Cortex-A57 2.31 11.6 (+86%) 7.51 (+260%(***)) +// Denver 2.01 10.5 (+26%) 6.70 (+8%) +// X-Gene 20.0 (+100%) 12.8 (+300%(***)) +// Mongoose 2.36 13.0 (+50%) 8.36 (+33%) +// +// (*) Software SHA256 results are of lesser relevance, presented +// mostly for informational purposes. +// (**) The result is a trade-off: it's possible to improve it by +// 10% (or by 1 cycle per round), but at the cost of 20% loss +// on Cortex-A53 (or by 4 cycles per round). +// (***) Super-impressive coefficients over gcc-generated code are +// indication of some compiler "pathology", most notably code +// generated with -mgeneral-regs-only is significanty faster +// and the gap is only 40-90%. +// +// October 2016. +// +// Originally it was reckoned that it makes no sense to implement NEON +// version of SHA256 for 64-bit processors. This is because performance +// improvement on most wide-spread Cortex-A5x processors was observed +// to be marginal, same on Cortex-A53 and ~10% on A57. But then it was +// observed that 32-bit NEON SHA256 performs significantly better than +// 64-bit scalar version on *some* of the more recent processors. As +// result 64-bit NEON version of SHA256 was added to provide best +// all-round performance. For example it executes ~30% faster on X-Gene +// and Mongoose. [For reference, NEON version of SHA512 is bound to +// deliver much less improvement, likely *negative* on Cortex-A5x. +// Which is why NEON support is limited to SHA256.] + +#ifndef __KERNEL__ +# include "arm_arch.h" +#endif + +.text + +.extern OPENSSL_armcap_P +.globl sha256_block_data_order +.type sha256_block_data_order,%function +.align 6 +sha256_block_data_order: +#ifndef __KERNEL__ +# ifdef __ILP32__ + ldrsw x16,.LOPENSSL_armcap_P +# else + ldr x16,.LOPENSSL_armcap_P +# endif + adr x17,.LOPENSSL_armcap_P + add x16,x16,x17 + ldr w16,[x16] + tst w16,#ARMV8_SHA256 + b.ne .Lv8_entry + tst w16,#ARMV7_NEON + b.ne .Lneon_entry +#endif + stp x29,x30,[sp,#-128]! + add x29,sp,#0 + + stp x19,x20,[sp,#16] + stp x21,x22,[sp,#32] + stp x23,x24,[sp,#48] + stp x25,x26,[sp,#64] + stp x27,x28,[sp,#80] + sub sp,sp,#4*4 + + ldp w20,w21,[x0] // load context + ldp w22,w23,[x0,#2*4] + ldp w24,w25,[x0,#4*4] + add x2,x1,x2,lsl#6 // end of input + ldp w26,w27,[x0,#6*4] + adr x30,.LK256 + stp x0,x2,[x29,#96] + +.Loop: + ldp w3,w4,[x1],#2*4 + ldr w19,[x30],#4 // *K++ + eor w28,w21,w22 // magic seed + str x1,[x29,#112] +#ifndef __AARCH64EB__ + rev w3,w3 // 0 +#endif + ror w16,w24,#6 + add w27,w27,w19 // h+=K[i] + eor w6,w24,w24,ror#14 + and w17,w25,w24 + bic w19,w26,w24 + add w27,w27,w3 // h+=X[i] + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w20,w21 // a^b, b^c in next round + eor w16,w16,w6,ror#11 // Sigma1(e) + ror w6,w20,#2 + add w27,w27,w17 // h+=Ch(e,f,g) + eor w17,w20,w20,ror#9 + add w27,w27,w16 // h+=Sigma1(e) + and w28,w28,w19 // (b^c)&=(a^b) + add w23,w23,w27 // d+=h + eor w28,w28,w21 // Maj(a,b,c) + eor w17,w6,w17,ror#13 // Sigma0(a) + add w27,w27,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + //add w27,w27,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w4,w4 // 1 +#endif + ldp w5,w6,[x1],#2*4 + add w27,w27,w17 // h+=Sigma0(a) + ror w16,w23,#6 + add w26,w26,w28 // h+=K[i] + eor w7,w23,w23,ror#14 + and w17,w24,w23 + bic w28,w25,w23 + add w26,w26,w4 // h+=X[i] + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w27,w20 // a^b, b^c in next round + eor w16,w16,w7,ror#11 // Sigma1(e) + ror w7,w27,#2 + add w26,w26,w17 // h+=Ch(e,f,g) + eor w17,w27,w27,ror#9 + add w26,w26,w16 // h+=Sigma1(e) + and w19,w19,w28 // (b^c)&=(a^b) + add w22,w22,w26 // d+=h + eor w19,w19,w20 // Maj(a,b,c) + eor w17,w7,w17,ror#13 // Sigma0(a) + add w26,w26,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + //add w26,w26,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w5,w5 // 2 +#endif + add w26,w26,w17 // h+=Sigma0(a) + ror w16,w22,#6 + add w25,w25,w19 // h+=K[i] + eor w8,w22,w22,ror#14 + and w17,w23,w22 + bic w19,w24,w22 + add w25,w25,w5 // h+=X[i] + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w26,w27 // a^b, b^c in next round + eor w16,w16,w8,ror#11 // Sigma1(e) + ror w8,w26,#2 + add w25,w25,w17 // h+=Ch(e,f,g) + eor w17,w26,w26,ror#9 + add w25,w25,w16 // h+=Sigma1(e) + and w28,w28,w19 // (b^c)&=(a^b) + add w21,w21,w25 // d+=h + eor w28,w28,w27 // Maj(a,b,c) + eor w17,w8,w17,ror#13 // Sigma0(a) + add w25,w25,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + //add w25,w25,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w6,w6 // 3 +#endif + ldp w7,w8,[x1],#2*4 + add w25,w25,w17 // h+=Sigma0(a) + ror w16,w21,#6 + add w24,w24,w28 // h+=K[i] + eor w9,w21,w21,ror#14 + and w17,w22,w21 + bic w28,w23,w21 + add w24,w24,w6 // h+=X[i] + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w25,w26 // a^b, b^c in next round + eor w16,w16,w9,ror#11 // Sigma1(e) + ror w9,w25,#2 + add w24,w24,w17 // h+=Ch(e,f,g) + eor w17,w25,w25,ror#9 + add w24,w24,w16 // h+=Sigma1(e) + and w19,w19,w28 // (b^c)&=(a^b) + add w20,w20,w24 // d+=h + eor w19,w19,w26 // Maj(a,b,c) + eor w17,w9,w17,ror#13 // Sigma0(a) + add w24,w24,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + //add w24,w24,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w7,w7 // 4 +#endif + add w24,w24,w17 // h+=Sigma0(a) + ror w16,w20,#6 + add w23,w23,w19 // h+=K[i] + eor w10,w20,w20,ror#14 + and w17,w21,w20 + bic w19,w22,w20 + add w23,w23,w7 // h+=X[i] + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w24,w25 // a^b, b^c in next round + eor w16,w16,w10,ror#11 // Sigma1(e) + ror w10,w24,#2 + add w23,w23,w17 // h+=Ch(e,f,g) + eor w17,w24,w24,ror#9 + add w23,w23,w16 // h+=Sigma1(e) + and w28,w28,w19 // (b^c)&=(a^b) + add w27,w27,w23 // d+=h + eor w28,w28,w25 // Maj(a,b,c) + eor w17,w10,w17,ror#13 // Sigma0(a) + add w23,w23,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + //add w23,w23,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w8,w8 // 5 +#endif + ldp w9,w10,[x1],#2*4 + add w23,w23,w17 // h+=Sigma0(a) + ror w16,w27,#6 + add w22,w22,w28 // h+=K[i] + eor w11,w27,w27,ror#14 + and w17,w20,w27 + bic w28,w21,w27 + add w22,w22,w8 // h+=X[i] + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w23,w24 // a^b, b^c in next round + eor w16,w16,w11,ror#11 // Sigma1(e) + ror w11,w23,#2 + add w22,w22,w17 // h+=Ch(e,f,g) + eor w17,w23,w23,ror#9 + add w22,w22,w16 // h+=Sigma1(e) + and w19,w19,w28 // (b^c)&=(a^b) + add w26,w26,w22 // d+=h + eor w19,w19,w24 // Maj(a,b,c) + eor w17,w11,w17,ror#13 // Sigma0(a) + add w22,w22,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + //add w22,w22,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w9,w9 // 6 +#endif + add w22,w22,w17 // h+=Sigma0(a) + ror w16,w26,#6 + add w21,w21,w19 // h+=K[i] + eor w12,w26,w26,ror#14 + and w17,w27,w26 + bic w19,w20,w26 + add w21,w21,w9 // h+=X[i] + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w22,w23 // a^b, b^c in next round + eor w16,w16,w12,ror#11 // Sigma1(e) + ror w12,w22,#2 + add w21,w21,w17 // h+=Ch(e,f,g) + eor w17,w22,w22,ror#9 + add w21,w21,w16 // h+=Sigma1(e) + and w28,w28,w19 // (b^c)&=(a^b) + add w25,w25,w21 // d+=h + eor w28,w28,w23 // Maj(a,b,c) + eor w17,w12,w17,ror#13 // Sigma0(a) + add w21,w21,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + //add w21,w21,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w10,w10 // 7 +#endif + ldp w11,w12,[x1],#2*4 + add w21,w21,w17 // h+=Sigma0(a) + ror w16,w25,#6 + add w20,w20,w28 // h+=K[i] + eor w13,w25,w25,ror#14 + and w17,w26,w25 + bic w28,w27,w25 + add w20,w20,w10 // h+=X[i] + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w21,w22 // a^b, b^c in next round + eor w16,w16,w13,ror#11 // Sigma1(e) + ror w13,w21,#2 + add w20,w20,w17 // h+=Ch(e,f,g) + eor w17,w21,w21,ror#9 + add w20,w20,w16 // h+=Sigma1(e) + and w19,w19,w28 // (b^c)&=(a^b) + add w24,w24,w20 // d+=h + eor w19,w19,w22 // Maj(a,b,c) + eor w17,w13,w17,ror#13 // Sigma0(a) + add w20,w20,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + //add w20,w20,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w11,w11 // 8 +#endif + add w20,w20,w17 // h+=Sigma0(a) + ror w16,w24,#6 + add w27,w27,w19 // h+=K[i] + eor w14,w24,w24,ror#14 + and w17,w25,w24 + bic w19,w26,w24 + add w27,w27,w11 // h+=X[i] + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w20,w21 // a^b, b^c in next round + eor w16,w16,w14,ror#11 // Sigma1(e) + ror w14,w20,#2 + add w27,w27,w17 // h+=Ch(e,f,g) + eor w17,w20,w20,ror#9 + add w27,w27,w16 // h+=Sigma1(e) + and w28,w28,w19 // (b^c)&=(a^b) + add w23,w23,w27 // d+=h + eor w28,w28,w21 // Maj(a,b,c) + eor w17,w14,w17,ror#13 // Sigma0(a) + add w27,w27,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + //add w27,w27,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w12,w12 // 9 +#endif + ldp w13,w14,[x1],#2*4 + add w27,w27,w17 // h+=Sigma0(a) + ror w16,w23,#6 + add w26,w26,w28 // h+=K[i] + eor w15,w23,w23,ror#14 + and w17,w24,w23 + bic w28,w25,w23 + add w26,w26,w12 // h+=X[i] + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w27,w20 // a^b, b^c in next round + eor w16,w16,w15,ror#11 // Sigma1(e) + ror w15,w27,#2 + add w26,w26,w17 // h+=Ch(e,f,g) + eor w17,w27,w27,ror#9 + add w26,w26,w16 // h+=Sigma1(e) + and w19,w19,w28 // (b^c)&=(a^b) + add w22,w22,w26 // d+=h + eor w19,w19,w20 // Maj(a,b,c) + eor w17,w15,w17,ror#13 // Sigma0(a) + add w26,w26,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + //add w26,w26,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w13,w13 // 10 +#endif + add w26,w26,w17 // h+=Sigma0(a) + ror w16,w22,#6 + add w25,w25,w19 // h+=K[i] + eor w0,w22,w22,ror#14 + and w17,w23,w22 + bic w19,w24,w22 + add w25,w25,w13 // h+=X[i] + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w26,w27 // a^b, b^c in next round + eor w16,w16,w0,ror#11 // Sigma1(e) + ror w0,w26,#2 + add w25,w25,w17 // h+=Ch(e,f,g) + eor w17,w26,w26,ror#9 + add w25,w25,w16 // h+=Sigma1(e) + and w28,w28,w19 // (b^c)&=(a^b) + add w21,w21,w25 // d+=h + eor w28,w28,w27 // Maj(a,b,c) + eor w17,w0,w17,ror#13 // Sigma0(a) + add w25,w25,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + //add w25,w25,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w14,w14 // 11 +#endif + ldp w15,w0,[x1],#2*4 + add w25,w25,w17 // h+=Sigma0(a) + str w6,[sp,#12] + ror w16,w21,#6 + add w24,w24,w28 // h+=K[i] + eor w6,w21,w21,ror#14 + and w17,w22,w21 + bic w28,w23,w21 + add w24,w24,w14 // h+=X[i] + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w25,w26 // a^b, b^c in next round + eor w16,w16,w6,ror#11 // Sigma1(e) + ror w6,w25,#2 + add w24,w24,w17 // h+=Ch(e,f,g) + eor w17,w25,w25,ror#9 + add w24,w24,w16 // h+=Sigma1(e) + and w19,w19,w28 // (b^c)&=(a^b) + add w20,w20,w24 // d+=h + eor w19,w19,w26 // Maj(a,b,c) + eor w17,w6,w17,ror#13 // Sigma0(a) + add w24,w24,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + //add w24,w24,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w15,w15 // 12 +#endif + add w24,w24,w17 // h+=Sigma0(a) + str w7,[sp,#0] + ror w16,w20,#6 + add w23,w23,w19 // h+=K[i] + eor w7,w20,w20,ror#14 + and w17,w21,w20 + bic w19,w22,w20 + add w23,w23,w15 // h+=X[i] + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w24,w25 // a^b, b^c in next round + eor w16,w16,w7,ror#11 // Sigma1(e) + ror w7,w24,#2 + add w23,w23,w17 // h+=Ch(e,f,g) + eor w17,w24,w24,ror#9 + add w23,w23,w16 // h+=Sigma1(e) + and w28,w28,w19 // (b^c)&=(a^b) + add w27,w27,w23 // d+=h + eor w28,w28,w25 // Maj(a,b,c) + eor w17,w7,w17,ror#13 // Sigma0(a) + add w23,w23,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + //add w23,w23,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w0,w0 // 13 +#endif + ldp w1,w2,[x1] + add w23,w23,w17 // h+=Sigma0(a) + str w8,[sp,#4] + ror w16,w27,#6 + add w22,w22,w28 // h+=K[i] + eor w8,w27,w27,ror#14 + and w17,w20,w27 + bic w28,w21,w27 + add w22,w22,w0 // h+=X[i] + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w23,w24 // a^b, b^c in next round + eor w16,w16,w8,ror#11 // Sigma1(e) + ror w8,w23,#2 + add w22,w22,w17 // h+=Ch(e,f,g) + eor w17,w23,w23,ror#9 + add w22,w22,w16 // h+=Sigma1(e) + and w19,w19,w28 // (b^c)&=(a^b) + add w26,w26,w22 // d+=h + eor w19,w19,w24 // Maj(a,b,c) + eor w17,w8,w17,ror#13 // Sigma0(a) + add w22,w22,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + //add w22,w22,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w1,w1 // 14 +#endif + ldr w6,[sp,#12] + add w22,w22,w17 // h+=Sigma0(a) + str w9,[sp,#8] + ror w16,w26,#6 + add w21,w21,w19 // h+=K[i] + eor w9,w26,w26,ror#14 + and w17,w27,w26 + bic w19,w20,w26 + add w21,w21,w1 // h+=X[i] + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w22,w23 // a^b, b^c in next round + eor w16,w16,w9,ror#11 // Sigma1(e) + ror w9,w22,#2 + add w21,w21,w17 // h+=Ch(e,f,g) + eor w17,w22,w22,ror#9 + add w21,w21,w16 // h+=Sigma1(e) + and w28,w28,w19 // (b^c)&=(a^b) + add w25,w25,w21 // d+=h + eor w28,w28,w23 // Maj(a,b,c) + eor w17,w9,w17,ror#13 // Sigma0(a) + add w21,w21,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + //add w21,w21,w17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev w2,w2 // 15 +#endif + ldr w7,[sp,#0] + add w21,w21,w17 // h+=Sigma0(a) + str w10,[sp,#12] + ror w16,w25,#6 + add w20,w20,w28 // h+=K[i] + ror w9,w4,#7 + and w17,w26,w25 + ror w8,w1,#17 + bic w28,w27,w25 + ror w10,w21,#2 + add w20,w20,w2 // h+=X[i] + eor w16,w16,w25,ror#11 + eor w9,w9,w4,ror#18 + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w21,w22 // a^b, b^c in next round + eor w16,w16,w25,ror#25 // Sigma1(e) + eor w10,w10,w21,ror#13 + add w20,w20,w17 // h+=Ch(e,f,g) + and w19,w19,w28 // (b^c)&=(a^b) + eor w8,w8,w1,ror#19 + eor w9,w9,w4,lsr#3 // sigma0(X[i+1]) + add w20,w20,w16 // h+=Sigma1(e) + eor w19,w19,w22 // Maj(a,b,c) + eor w17,w10,w21,ror#22 // Sigma0(a) + eor w8,w8,w1,lsr#10 // sigma1(X[i+14]) + add w3,w3,w12 + add w24,w24,w20 // d+=h + add w20,w20,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + add w3,w3,w9 + add w20,w20,w17 // h+=Sigma0(a) + add w3,w3,w8 +.Loop_16_xx: + ldr w8,[sp,#4] + str w11,[sp,#0] + ror w16,w24,#6 + add w27,w27,w19 // h+=K[i] + ror w10,w5,#7 + and w17,w25,w24 + ror w9,w2,#17 + bic w19,w26,w24 + ror w11,w20,#2 + add w27,w27,w3 // h+=X[i] + eor w16,w16,w24,ror#11 + eor w10,w10,w5,ror#18 + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w20,w21 // a^b, b^c in next round + eor w16,w16,w24,ror#25 // Sigma1(e) + eor w11,w11,w20,ror#13 + add w27,w27,w17 // h+=Ch(e,f,g) + and w28,w28,w19 // (b^c)&=(a^b) + eor w9,w9,w2,ror#19 + eor w10,w10,w5,lsr#3 // sigma0(X[i+1]) + add w27,w27,w16 // h+=Sigma1(e) + eor w28,w28,w21 // Maj(a,b,c) + eor w17,w11,w20,ror#22 // Sigma0(a) + eor w9,w9,w2,lsr#10 // sigma1(X[i+14]) + add w4,w4,w13 + add w23,w23,w27 // d+=h + add w27,w27,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + add w4,w4,w10 + add w27,w27,w17 // h+=Sigma0(a) + add w4,w4,w9 + ldr w9,[sp,#8] + str w12,[sp,#4] + ror w16,w23,#6 + add w26,w26,w28 // h+=K[i] + ror w11,w6,#7 + and w17,w24,w23 + ror w10,w3,#17 + bic w28,w25,w23 + ror w12,w27,#2 + add w26,w26,w4 // h+=X[i] + eor w16,w16,w23,ror#11 + eor w11,w11,w6,ror#18 + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w27,w20 // a^b, b^c in next round + eor w16,w16,w23,ror#25 // Sigma1(e) + eor w12,w12,w27,ror#13 + add w26,w26,w17 // h+=Ch(e,f,g) + and w19,w19,w28 // (b^c)&=(a^b) + eor w10,w10,w3,ror#19 + eor w11,w11,w6,lsr#3 // sigma0(X[i+1]) + add w26,w26,w16 // h+=Sigma1(e) + eor w19,w19,w20 // Maj(a,b,c) + eor w17,w12,w27,ror#22 // Sigma0(a) + eor w10,w10,w3,lsr#10 // sigma1(X[i+14]) + add w5,w5,w14 + add w22,w22,w26 // d+=h + add w26,w26,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + add w5,w5,w11 + add w26,w26,w17 // h+=Sigma0(a) + add w5,w5,w10 + ldr w10,[sp,#12] + str w13,[sp,#8] + ror w16,w22,#6 + add w25,w25,w19 // h+=K[i] + ror w12,w7,#7 + and w17,w23,w22 + ror w11,w4,#17 + bic w19,w24,w22 + ror w13,w26,#2 + add w25,w25,w5 // h+=X[i] + eor w16,w16,w22,ror#11 + eor w12,w12,w7,ror#18 + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w26,w27 // a^b, b^c in next round + eor w16,w16,w22,ror#25 // Sigma1(e) + eor w13,w13,w26,ror#13 + add w25,w25,w17 // h+=Ch(e,f,g) + and w28,w28,w19 // (b^c)&=(a^b) + eor w11,w11,w4,ror#19 + eor w12,w12,w7,lsr#3 // sigma0(X[i+1]) + add w25,w25,w16 // h+=Sigma1(e) + eor w28,w28,w27 // Maj(a,b,c) + eor w17,w13,w26,ror#22 // Sigma0(a) + eor w11,w11,w4,lsr#10 // sigma1(X[i+14]) + add w6,w6,w15 + add w21,w21,w25 // d+=h + add w25,w25,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + add w6,w6,w12 + add w25,w25,w17 // h+=Sigma0(a) + add w6,w6,w11 + ldr w11,[sp,#0] + str w14,[sp,#12] + ror w16,w21,#6 + add w24,w24,w28 // h+=K[i] + ror w13,w8,#7 + and w17,w22,w21 + ror w12,w5,#17 + bic w28,w23,w21 + ror w14,w25,#2 + add w24,w24,w6 // h+=X[i] + eor w16,w16,w21,ror#11 + eor w13,w13,w8,ror#18 + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w25,w26 // a^b, b^c in next round + eor w16,w16,w21,ror#25 // Sigma1(e) + eor w14,w14,w25,ror#13 + add w24,w24,w17 // h+=Ch(e,f,g) + and w19,w19,w28 // (b^c)&=(a^b) + eor w12,w12,w5,ror#19 + eor w13,w13,w8,lsr#3 // sigma0(X[i+1]) + add w24,w24,w16 // h+=Sigma1(e) + eor w19,w19,w26 // Maj(a,b,c) + eor w17,w14,w25,ror#22 // Sigma0(a) + eor w12,w12,w5,lsr#10 // sigma1(X[i+14]) + add w7,w7,w0 + add w20,w20,w24 // d+=h + add w24,w24,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + add w7,w7,w13 + add w24,w24,w17 // h+=Sigma0(a) + add w7,w7,w12 + ldr w12,[sp,#4] + str w15,[sp,#0] + ror w16,w20,#6 + add w23,w23,w19 // h+=K[i] + ror w14,w9,#7 + and w17,w21,w20 + ror w13,w6,#17 + bic w19,w22,w20 + ror w15,w24,#2 + add w23,w23,w7 // h+=X[i] + eor w16,w16,w20,ror#11 + eor w14,w14,w9,ror#18 + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w24,w25 // a^b, b^c in next round + eor w16,w16,w20,ror#25 // Sigma1(e) + eor w15,w15,w24,ror#13 + add w23,w23,w17 // h+=Ch(e,f,g) + and w28,w28,w19 // (b^c)&=(a^b) + eor w13,w13,w6,ror#19 + eor w14,w14,w9,lsr#3 // sigma0(X[i+1]) + add w23,w23,w16 // h+=Sigma1(e) + eor w28,w28,w25 // Maj(a,b,c) + eor w17,w15,w24,ror#22 // Sigma0(a) + eor w13,w13,w6,lsr#10 // sigma1(X[i+14]) + add w8,w8,w1 + add w27,w27,w23 // d+=h + add w23,w23,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + add w8,w8,w14 + add w23,w23,w17 // h+=Sigma0(a) + add w8,w8,w13 + ldr w13,[sp,#8] + str w0,[sp,#4] + ror w16,w27,#6 + add w22,w22,w28 // h+=K[i] + ror w15,w10,#7 + and w17,w20,w27 + ror w14,w7,#17 + bic w28,w21,w27 + ror w0,w23,#2 + add w22,w22,w8 // h+=X[i] + eor w16,w16,w27,ror#11 + eor w15,w15,w10,ror#18 + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w23,w24 // a^b, b^c in next round + eor w16,w16,w27,ror#25 // Sigma1(e) + eor w0,w0,w23,ror#13 + add w22,w22,w17 // h+=Ch(e,f,g) + and w19,w19,w28 // (b^c)&=(a^b) + eor w14,w14,w7,ror#19 + eor w15,w15,w10,lsr#3 // sigma0(X[i+1]) + add w22,w22,w16 // h+=Sigma1(e) + eor w19,w19,w24 // Maj(a,b,c) + eor w17,w0,w23,ror#22 // Sigma0(a) + eor w14,w14,w7,lsr#10 // sigma1(X[i+14]) + add w9,w9,w2 + add w26,w26,w22 // d+=h + add w22,w22,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + add w9,w9,w15 + add w22,w22,w17 // h+=Sigma0(a) + add w9,w9,w14 + ldr w14,[sp,#12] + str w1,[sp,#8] + ror w16,w26,#6 + add w21,w21,w19 // h+=K[i] + ror w0,w11,#7 + and w17,w27,w26 + ror w15,w8,#17 + bic w19,w20,w26 + ror w1,w22,#2 + add w21,w21,w9 // h+=X[i] + eor w16,w16,w26,ror#11 + eor w0,w0,w11,ror#18 + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w22,w23 // a^b, b^c in next round + eor w16,w16,w26,ror#25 // Sigma1(e) + eor w1,w1,w22,ror#13 + add w21,w21,w17 // h+=Ch(e,f,g) + and w28,w28,w19 // (b^c)&=(a^b) + eor w15,w15,w8,ror#19 + eor w0,w0,w11,lsr#3 // sigma0(X[i+1]) + add w21,w21,w16 // h+=Sigma1(e) + eor w28,w28,w23 // Maj(a,b,c) + eor w17,w1,w22,ror#22 // Sigma0(a) + eor w15,w15,w8,lsr#10 // sigma1(X[i+14]) + add w10,w10,w3 + add w25,w25,w21 // d+=h + add w21,w21,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + add w10,w10,w0 + add w21,w21,w17 // h+=Sigma0(a) + add w10,w10,w15 + ldr w15,[sp,#0] + str w2,[sp,#12] + ror w16,w25,#6 + add w20,w20,w28 // h+=K[i] + ror w1,w12,#7 + and w17,w26,w25 + ror w0,w9,#17 + bic w28,w27,w25 + ror w2,w21,#2 + add w20,w20,w10 // h+=X[i] + eor w16,w16,w25,ror#11 + eor w1,w1,w12,ror#18 + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w21,w22 // a^b, b^c in next round + eor w16,w16,w25,ror#25 // Sigma1(e) + eor w2,w2,w21,ror#13 + add w20,w20,w17 // h+=Ch(e,f,g) + and w19,w19,w28 // (b^c)&=(a^b) + eor w0,w0,w9,ror#19 + eor w1,w1,w12,lsr#3 // sigma0(X[i+1]) + add w20,w20,w16 // h+=Sigma1(e) + eor w19,w19,w22 // Maj(a,b,c) + eor w17,w2,w21,ror#22 // Sigma0(a) + eor w0,w0,w9,lsr#10 // sigma1(X[i+14]) + add w11,w11,w4 + add w24,w24,w20 // d+=h + add w20,w20,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + add w11,w11,w1 + add w20,w20,w17 // h+=Sigma0(a) + add w11,w11,w0 + ldr w0,[sp,#4] + str w3,[sp,#0] + ror w16,w24,#6 + add w27,w27,w19 // h+=K[i] + ror w2,w13,#7 + and w17,w25,w24 + ror w1,w10,#17 + bic w19,w26,w24 + ror w3,w20,#2 + add w27,w27,w11 // h+=X[i] + eor w16,w16,w24,ror#11 + eor w2,w2,w13,ror#18 + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w20,w21 // a^b, b^c in next round + eor w16,w16,w24,ror#25 // Sigma1(e) + eor w3,w3,w20,ror#13 + add w27,w27,w17 // h+=Ch(e,f,g) + and w28,w28,w19 // (b^c)&=(a^b) + eor w1,w1,w10,ror#19 + eor w2,w2,w13,lsr#3 // sigma0(X[i+1]) + add w27,w27,w16 // h+=Sigma1(e) + eor w28,w28,w21 // Maj(a,b,c) + eor w17,w3,w20,ror#22 // Sigma0(a) + eor w1,w1,w10,lsr#10 // sigma1(X[i+14]) + add w12,w12,w5 + add w23,w23,w27 // d+=h + add w27,w27,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + add w12,w12,w2 + add w27,w27,w17 // h+=Sigma0(a) + add w12,w12,w1 + ldr w1,[sp,#8] + str w4,[sp,#4] + ror w16,w23,#6 + add w26,w26,w28 // h+=K[i] + ror w3,w14,#7 + and w17,w24,w23 + ror w2,w11,#17 + bic w28,w25,w23 + ror w4,w27,#2 + add w26,w26,w12 // h+=X[i] + eor w16,w16,w23,ror#11 + eor w3,w3,w14,ror#18 + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w27,w20 // a^b, b^c in next round + eor w16,w16,w23,ror#25 // Sigma1(e) + eor w4,w4,w27,ror#13 + add w26,w26,w17 // h+=Ch(e,f,g) + and w19,w19,w28 // (b^c)&=(a^b) + eor w2,w2,w11,ror#19 + eor w3,w3,w14,lsr#3 // sigma0(X[i+1]) + add w26,w26,w16 // h+=Sigma1(e) + eor w19,w19,w20 // Maj(a,b,c) + eor w17,w4,w27,ror#22 // Sigma0(a) + eor w2,w2,w11,lsr#10 // sigma1(X[i+14]) + add w13,w13,w6 + add w22,w22,w26 // d+=h + add w26,w26,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + add w13,w13,w3 + add w26,w26,w17 // h+=Sigma0(a) + add w13,w13,w2 + ldr w2,[sp,#12] + str w5,[sp,#8] + ror w16,w22,#6 + add w25,w25,w19 // h+=K[i] + ror w4,w15,#7 + and w17,w23,w22 + ror w3,w12,#17 + bic w19,w24,w22 + ror w5,w26,#2 + add w25,w25,w13 // h+=X[i] + eor w16,w16,w22,ror#11 + eor w4,w4,w15,ror#18 + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w26,w27 // a^b, b^c in next round + eor w16,w16,w22,ror#25 // Sigma1(e) + eor w5,w5,w26,ror#13 + add w25,w25,w17 // h+=Ch(e,f,g) + and w28,w28,w19 // (b^c)&=(a^b) + eor w3,w3,w12,ror#19 + eor w4,w4,w15,lsr#3 // sigma0(X[i+1]) + add w25,w25,w16 // h+=Sigma1(e) + eor w28,w28,w27 // Maj(a,b,c) + eor w17,w5,w26,ror#22 // Sigma0(a) + eor w3,w3,w12,lsr#10 // sigma1(X[i+14]) + add w14,w14,w7 + add w21,w21,w25 // d+=h + add w25,w25,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + add w14,w14,w4 + add w25,w25,w17 // h+=Sigma0(a) + add w14,w14,w3 + ldr w3,[sp,#0] + str w6,[sp,#12] + ror w16,w21,#6 + add w24,w24,w28 // h+=K[i] + ror w5,w0,#7 + and w17,w22,w21 + ror w4,w13,#17 + bic w28,w23,w21 + ror w6,w25,#2 + add w24,w24,w14 // h+=X[i] + eor w16,w16,w21,ror#11 + eor w5,w5,w0,ror#18 + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w25,w26 // a^b, b^c in next round + eor w16,w16,w21,ror#25 // Sigma1(e) + eor w6,w6,w25,ror#13 + add w24,w24,w17 // h+=Ch(e,f,g) + and w19,w19,w28 // (b^c)&=(a^b) + eor w4,w4,w13,ror#19 + eor w5,w5,w0,lsr#3 // sigma0(X[i+1]) + add w24,w24,w16 // h+=Sigma1(e) + eor w19,w19,w26 // Maj(a,b,c) + eor w17,w6,w25,ror#22 // Sigma0(a) + eor w4,w4,w13,lsr#10 // sigma1(X[i+14]) + add w15,w15,w8 + add w20,w20,w24 // d+=h + add w24,w24,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + add w15,w15,w5 + add w24,w24,w17 // h+=Sigma0(a) + add w15,w15,w4 + ldr w4,[sp,#4] + str w7,[sp,#0] + ror w16,w20,#6 + add w23,w23,w19 // h+=K[i] + ror w6,w1,#7 + and w17,w21,w20 + ror w5,w14,#17 + bic w19,w22,w20 + ror w7,w24,#2 + add w23,w23,w15 // h+=X[i] + eor w16,w16,w20,ror#11 + eor w6,w6,w1,ror#18 + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w24,w25 // a^b, b^c in next round + eor w16,w16,w20,ror#25 // Sigma1(e) + eor w7,w7,w24,ror#13 + add w23,w23,w17 // h+=Ch(e,f,g) + and w28,w28,w19 // (b^c)&=(a^b) + eor w5,w5,w14,ror#19 + eor w6,w6,w1,lsr#3 // sigma0(X[i+1]) + add w23,w23,w16 // h+=Sigma1(e) + eor w28,w28,w25 // Maj(a,b,c) + eor w17,w7,w24,ror#22 // Sigma0(a) + eor w5,w5,w14,lsr#10 // sigma1(X[i+14]) + add w0,w0,w9 + add w27,w27,w23 // d+=h + add w23,w23,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + add w0,w0,w6 + add w23,w23,w17 // h+=Sigma0(a) + add w0,w0,w5 + ldr w5,[sp,#8] + str w8,[sp,#4] + ror w16,w27,#6 + add w22,w22,w28 // h+=K[i] + ror w7,w2,#7 + and w17,w20,w27 + ror w6,w15,#17 + bic w28,w21,w27 + ror w8,w23,#2 + add w22,w22,w0 // h+=X[i] + eor w16,w16,w27,ror#11 + eor w7,w7,w2,ror#18 + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w23,w24 // a^b, b^c in next round + eor w16,w16,w27,ror#25 // Sigma1(e) + eor w8,w8,w23,ror#13 + add w22,w22,w17 // h+=Ch(e,f,g) + and w19,w19,w28 // (b^c)&=(a^b) + eor w6,w6,w15,ror#19 + eor w7,w7,w2,lsr#3 // sigma0(X[i+1]) + add w22,w22,w16 // h+=Sigma1(e) + eor w19,w19,w24 // Maj(a,b,c) + eor w17,w8,w23,ror#22 // Sigma0(a) + eor w6,w6,w15,lsr#10 // sigma1(X[i+14]) + add w1,w1,w10 + add w26,w26,w22 // d+=h + add w22,w22,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + add w1,w1,w7 + add w22,w22,w17 // h+=Sigma0(a) + add w1,w1,w6 + ldr w6,[sp,#12] + str w9,[sp,#8] + ror w16,w26,#6 + add w21,w21,w19 // h+=K[i] + ror w8,w3,#7 + and w17,w27,w26 + ror w7,w0,#17 + bic w19,w20,w26 + ror w9,w22,#2 + add w21,w21,w1 // h+=X[i] + eor w16,w16,w26,ror#11 + eor w8,w8,w3,ror#18 + orr w17,w17,w19 // Ch(e,f,g) + eor w19,w22,w23 // a^b, b^c in next round + eor w16,w16,w26,ror#25 // Sigma1(e) + eor w9,w9,w22,ror#13 + add w21,w21,w17 // h+=Ch(e,f,g) + and w28,w28,w19 // (b^c)&=(a^b) + eor w7,w7,w0,ror#19 + eor w8,w8,w3,lsr#3 // sigma0(X[i+1]) + add w21,w21,w16 // h+=Sigma1(e) + eor w28,w28,w23 // Maj(a,b,c) + eor w17,w9,w22,ror#22 // Sigma0(a) + eor w7,w7,w0,lsr#10 // sigma1(X[i+14]) + add w2,w2,w11 + add w25,w25,w21 // d+=h + add w21,w21,w28 // h+=Maj(a,b,c) + ldr w28,[x30],#4 // *K++, w19 in next round + add w2,w2,w8 + add w21,w21,w17 // h+=Sigma0(a) + add w2,w2,w7 + ldr w7,[sp,#0] + str w10,[sp,#12] + ror w16,w25,#6 + add w20,w20,w28 // h+=K[i] + ror w9,w4,#7 + and w17,w26,w25 + ror w8,w1,#17 + bic w28,w27,w25 + ror w10,w21,#2 + add w20,w20,w2 // h+=X[i] + eor w16,w16,w25,ror#11 + eor w9,w9,w4,ror#18 + orr w17,w17,w28 // Ch(e,f,g) + eor w28,w21,w22 // a^b, b^c in next round + eor w16,w16,w25,ror#25 // Sigma1(e) + eor w10,w10,w21,ror#13 + add w20,w20,w17 // h+=Ch(e,f,g) + and w19,w19,w28 // (b^c)&=(a^b) + eor w8,w8,w1,ror#19 + eor w9,w9,w4,lsr#3 // sigma0(X[i+1]) + add w20,w20,w16 // h+=Sigma1(e) + eor w19,w19,w22 // Maj(a,b,c) + eor w17,w10,w21,ror#22 // Sigma0(a) + eor w8,w8,w1,lsr#10 // sigma1(X[i+14]) + add w3,w3,w12 + add w24,w24,w20 // d+=h + add w20,w20,w19 // h+=Maj(a,b,c) + ldr w19,[x30],#4 // *K++, w28 in next round + add w3,w3,w9 + add w20,w20,w17 // h+=Sigma0(a) + add w3,w3,w8 + cbnz w19,.Loop_16_xx + + ldp x0,x2,[x29,#96] + ldr x1,[x29,#112] + sub x30,x30,#260 // rewind + + ldp w3,w4,[x0] + ldp w5,w6,[x0,#2*4] + add x1,x1,#14*4 // advance input pointer + ldp w7,w8,[x0,#4*4] + add w20,w20,w3 + ldp w9,w10,[x0,#6*4] + add w21,w21,w4 + add w22,w22,w5 + add w23,w23,w6 + stp w20,w21,[x0] + add w24,w24,w7 + add w25,w25,w8 + stp w22,w23,[x0,#2*4] + add w26,w26,w9 + add w27,w27,w10 + cmp x1,x2 + stp w24,w25,[x0,#4*4] + stp w26,w27,[x0,#6*4] + b.ne .Loop + + ldp x19,x20,[x29,#16] + add sp,sp,#4*4 + ldp x21,x22,[x29,#32] + ldp x23,x24,[x29,#48] + ldp x25,x26,[x29,#64] + ldp x27,x28,[x29,#80] + ldp x29,x30,[sp],#128 + ret +.size sha256_block_data_order,.-sha256_block_data_order + +.align 6 +.type .LK256,%object +.LK256: + .long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 + .long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 + .long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 + .long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 + .long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc + .long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da + .long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 + .long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 + .long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 + .long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 + .long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 + .long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 + .long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 + .long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 + .long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 + .long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 + .long 0 //terminator +.size .LK256,.-.LK256 +#ifndef __KERNEL__ +.align 3 +.LOPENSSL_armcap_P: +# ifdef __ILP32__ + .long OPENSSL_armcap_P-. +# else + .quad OPENSSL_armcap_P-. +# endif +#endif +.asciz "SHA256 block transform for ARMv8, CRYPTOGAMS by appro@openssl.org" +.align 2 +#ifndef __KERNEL__ +.type sha256_block_armv8,%function +.align 6 +sha256_block_armv8: +.Lv8_entry: + stp x29,x30,[sp,#-16]! + add x29,sp,#0 + + ld1 {v0.4s,v1.4s},[x0] + adr x3,.LK256 + +.Loop_hw: + ld1 {v4.16b-v7.16b},[x1],#64 + sub x2,x2,#1 + ld1 {v16.4s},[x3],#16 + rev32 v4.16b,v4.16b + rev32 v5.16b,v5.16b + rev32 v6.16b,v6.16b + rev32 v7.16b,v7.16b + orr v18.16b,v0.16b,v0.16b // offload + orr v19.16b,v1.16b,v1.16b + ld1 {v17.4s},[x3],#16 + add v16.4s,v16.4s,v4.4s + .inst 0x5e2828a4 //sha256su0 v4.16b,v5.16b + orr v2.16b,v0.16b,v0.16b + .inst 0x5e104020 //sha256h v0.16b,v1.16b,v16.4s + .inst 0x5e105041 //sha256h2 v1.16b,v2.16b,v16.4s + .inst 0x5e0760c4 //sha256su1 v4.16b,v6.16b,v7.16b + ld1 {v16.4s},[x3],#16 + add v17.4s,v17.4s,v5.4s + .inst 0x5e2828c5 //sha256su0 v5.16b,v6.16b + orr v2.16b,v0.16b,v0.16b + .inst 0x5e114020 //sha256h v0.16b,v1.16b,v17.4s + .inst 0x5e115041 //sha256h2 v1.16b,v2.16b,v17.4s + .inst 0x5e0460e5 //sha256su1 v5.16b,v7.16b,v4.16b + ld1 {v17.4s},[x3],#16 + add v16.4s,v16.4s,v6.4s + .inst 0x5e2828e6 //sha256su0 v6.16b,v7.16b + orr v2.16b,v0.16b,v0.16b + .inst 0x5e104020 //sha256h v0.16b,v1.16b,v16.4s + .inst 0x5e105041 //sha256h2 v1.16b,v2.16b,v16.4s + .inst 0x5e056086 //sha256su1 v6.16b,v4.16b,v5.16b + ld1 {v16.4s},[x3],#16 + add v17.4s,v17.4s,v7.4s + .inst 0x5e282887 //sha256su0 v7.16b,v4.16b + orr v2.16b,v0.16b,v0.16b + .inst 0x5e114020 //sha256h v0.16b,v1.16b,v17.4s + .inst 0x5e115041 //sha256h2 v1.16b,v2.16b,v17.4s + .inst 0x5e0660a7 //sha256su1 v7.16b,v5.16b,v6.16b + ld1 {v17.4s},[x3],#16 + add v16.4s,v16.4s,v4.4s + .inst 0x5e2828a4 //sha256su0 v4.16b,v5.16b + orr v2.16b,v0.16b,v0.16b + .inst 0x5e104020 //sha256h v0.16b,v1.16b,v16.4s + .inst 0x5e105041 //sha256h2 v1.16b,v2.16b,v16.4s + .inst 0x5e0760c4 //sha256su1 v4.16b,v6.16b,v7.16b + ld1 {v16.4s},[x3],#16 + add v17.4s,v17.4s,v5.4s + .inst 0x5e2828c5 //sha256su0 v5.16b,v6.16b + orr v2.16b,v0.16b,v0.16b + .inst 0x5e114020 //sha256h v0.16b,v1.16b,v17.4s + .inst 0x5e115041 //sha256h2 v1.16b,v2.16b,v17.4s + .inst 0x5e0460e5 //sha256su1 v5.16b,v7.16b,v4.16b + ld1 {v17.4s},[x3],#16 + add v16.4s,v16.4s,v6.4s + .inst 0x5e2828e6 //sha256su0 v6.16b,v7.16b + orr v2.16b,v0.16b,v0.16b + .inst 0x5e104020 //sha256h v0.16b,v1.16b,v16.4s + .inst 0x5e105041 //sha256h2 v1.16b,v2.16b,v16.4s + .inst 0x5e056086 //sha256su1 v6.16b,v4.16b,v5.16b + ld1 {v16.4s},[x3],#16 + add v17.4s,v17.4s,v7.4s + .inst 0x5e282887 //sha256su0 v7.16b,v4.16b + orr v2.16b,v0.16b,v0.16b + .inst 0x5e114020 //sha256h v0.16b,v1.16b,v17.4s + .inst 0x5e115041 //sha256h2 v1.16b,v2.16b,v17.4s + .inst 0x5e0660a7 //sha256su1 v7.16b,v5.16b,v6.16b + ld1 {v17.4s},[x3],#16 + add v16.4s,v16.4s,v4.4s + .inst 0x5e2828a4 //sha256su0 v4.16b,v5.16b + orr v2.16b,v0.16b,v0.16b + .inst 0x5e104020 //sha256h v0.16b,v1.16b,v16.4s + .inst 0x5e105041 //sha256h2 v1.16b,v2.16b,v16.4s + .inst 0x5e0760c4 //sha256su1 v4.16b,v6.16b,v7.16b + ld1 {v16.4s},[x3],#16 + add v17.4s,v17.4s,v5.4s + .inst 0x5e2828c5 //sha256su0 v5.16b,v6.16b + orr v2.16b,v0.16b,v0.16b + .inst 0x5e114020 //sha256h v0.16b,v1.16b,v17.4s + .inst 0x5e115041 //sha256h2 v1.16b,v2.16b,v17.4s + .inst 0x5e0460e5 //sha256su1 v5.16b,v7.16b,v4.16b + ld1 {v17.4s},[x3],#16 + add v16.4s,v16.4s,v6.4s + .inst 0x5e2828e6 //sha256su0 v6.16b,v7.16b + orr v2.16b,v0.16b,v0.16b + .inst 0x5e104020 //sha256h v0.16b,v1.16b,v16.4s + .inst 0x5e105041 //sha256h2 v1.16b,v2.16b,v16.4s + .inst 0x5e056086 //sha256su1 v6.16b,v4.16b,v5.16b + ld1 {v16.4s},[x3],#16 + add v17.4s,v17.4s,v7.4s + .inst 0x5e282887 //sha256su0 v7.16b,v4.16b + orr v2.16b,v0.16b,v0.16b + .inst 0x5e114020 //sha256h v0.16b,v1.16b,v17.4s + .inst 0x5e115041 //sha256h2 v1.16b,v2.16b,v17.4s + .inst 0x5e0660a7 //sha256su1 v7.16b,v5.16b,v6.16b + ld1 {v17.4s},[x3],#16 + add v16.4s,v16.4s,v4.4s + orr v2.16b,v0.16b,v0.16b + .inst 0x5e104020 //sha256h v0.16b,v1.16b,v16.4s + .inst 0x5e105041 //sha256h2 v1.16b,v2.16b,v16.4s + + ld1 {v16.4s},[x3],#16 + add v17.4s,v17.4s,v5.4s + orr v2.16b,v0.16b,v0.16b + .inst 0x5e114020 //sha256h v0.16b,v1.16b,v17.4s + .inst 0x5e115041 //sha256h2 v1.16b,v2.16b,v17.4s + + ld1 {v17.4s},[x3] + add v16.4s,v16.4s,v6.4s + sub x3,x3,#64*4-16 // rewind + orr v2.16b,v0.16b,v0.16b + .inst 0x5e104020 //sha256h v0.16b,v1.16b,v16.4s + .inst 0x5e105041 //sha256h2 v1.16b,v2.16b,v16.4s + + add v17.4s,v17.4s,v7.4s + orr v2.16b,v0.16b,v0.16b + .inst 0x5e114020 //sha256h v0.16b,v1.16b,v17.4s + .inst 0x5e115041 //sha256h2 v1.16b,v2.16b,v17.4s + + add v0.4s,v0.4s,v18.4s + add v1.4s,v1.4s,v19.4s + + cbnz x2,.Loop_hw + + st1 {v0.4s,v1.4s},[x0] + + ldr x29,[sp],#16 + ret +.size sha256_block_armv8,.-sha256_block_armv8 +#endif +#ifdef __KERNEL__ +.globl sha256_block_neon +#endif +.type sha256_block_neon,%function +.align 4 +sha256_block_neon: +.Lneon_entry: + stp x29, x30, [sp, #-16]! + mov x29, sp + sub sp,sp,#16*4 + + adr x16,.LK256 + add x2,x1,x2,lsl#6 // len to point at the end of inp + + ld1 {v0.16b},[x1], #16 + ld1 {v1.16b},[x1], #16 + ld1 {v2.16b},[x1], #16 + ld1 {v3.16b},[x1], #16 + ld1 {v4.4s},[x16], #16 + ld1 {v5.4s},[x16], #16 + ld1 {v6.4s},[x16], #16 + ld1 {v7.4s},[x16], #16 + rev32 v0.16b,v0.16b // yes, even on + rev32 v1.16b,v1.16b // big-endian + rev32 v2.16b,v2.16b + rev32 v3.16b,v3.16b + mov x17,sp + add v4.4s,v4.4s,v0.4s + add v5.4s,v5.4s,v1.4s + add v6.4s,v6.4s,v2.4s + st1 {v4.4s-v5.4s},[x17], #32 + add v7.4s,v7.4s,v3.4s + st1 {v6.4s-v7.4s},[x17] + sub x17,x17,#32 + + ldp w3,w4,[x0] + ldp w5,w6,[x0,#8] + ldp w7,w8,[x0,#16] + ldp w9,w10,[x0,#24] + ldr w12,[sp,#0] + mov w13,wzr + eor w14,w4,w5 + mov w15,wzr + b .L_00_48 + +.align 4 +.L_00_48: + ext v4.16b,v0.16b,v1.16b,#4 + add w10,w10,w12 + add w3,w3,w15 + and w12,w8,w7 + bic w15,w9,w7 + ext v7.16b,v2.16b,v3.16b,#4 + eor w11,w7,w7,ror#5 + add w3,w3,w13 + mov d19,v3.d[1] + orr w12,w12,w15 + eor w11,w11,w7,ror#19 + ushr v6.4s,v4.4s,#7 + eor w15,w3,w3,ror#11 + ushr v5.4s,v4.4s,#3 + add w10,w10,w12 + add v0.4s,v0.4s,v7.4s + ror w11,w11,#6 + sli v6.4s,v4.4s,#25 + eor w13,w3,w4 + eor w15,w15,w3,ror#20 + ushr v7.4s,v4.4s,#18 + add w10,w10,w11 + ldr w12,[sp,#4] + and w14,w14,w13 + eor v5.16b,v5.16b,v6.16b + ror w15,w15,#2 + add w6,w6,w10 + sli v7.4s,v4.4s,#14 + eor w14,w14,w4 + ushr v16.4s,v19.4s,#17 + add w9,w9,w12 + add w10,w10,w15 + and w12,w7,w6 + eor v5.16b,v5.16b,v7.16b + bic w15,w8,w6 + eor w11,w6,w6,ror#5 + sli v16.4s,v19.4s,#15 + add w10,w10,w14 + orr w12,w12,w15 + ushr v17.4s,v19.4s,#10 + eor w11,w11,w6,ror#19 + eor w15,w10,w10,ror#11 + ushr v7.4s,v19.4s,#19 + add w9,w9,w12 + ror w11,w11,#6 + add v0.4s,v0.4s,v5.4s + eor w14,w10,w3 + eor w15,w15,w10,ror#20 + sli v7.4s,v19.4s,#13 + add w9,w9,w11 + ldr w12,[sp,#8] + and w13,w13,w14 + eor v17.16b,v17.16b,v16.16b + ror w15,w15,#2 + add w5,w5,w9 + eor w13,w13,w3 + eor v17.16b,v17.16b,v7.16b + add w8,w8,w12 + add w9,w9,w15 + and w12,w6,w5 + add v0.4s,v0.4s,v17.4s + bic w15,w7,w5 + eor w11,w5,w5,ror#5 + add w9,w9,w13 + ushr v18.4s,v0.4s,#17 + orr w12,w12,w15 + ushr v19.4s,v0.4s,#10 + eor w11,w11,w5,ror#19 + eor w15,w9,w9,ror#11 + sli v18.4s,v0.4s,#15 + add w8,w8,w12 + ushr v17.4s,v0.4s,#19 + ror w11,w11,#6 + eor w13,w9,w10 + eor v19.16b,v19.16b,v18.16b + eor w15,w15,w9,ror#20 + add w8,w8,w11 + sli v17.4s,v0.4s,#13 + ldr w12,[sp,#12] + and w14,w14,w13 + ror w15,w15,#2 + ld1 {v4.4s},[x16], #16 + add w4,w4,w8 + eor v19.16b,v19.16b,v17.16b + eor w14,w14,w10 + eor v17.16b,v17.16b,v17.16b + add w7,w7,w12 + add w8,w8,w15 + and w12,w5,w4 + mov v17.d[1],v19.d[0] + bic w15,w6,w4 + eor w11,w4,w4,ror#5 + add w8,w8,w14 + add v0.4s,v0.4s,v17.4s + orr w12,w12,w15 + eor w11,w11,w4,ror#19 + eor w15,w8,w8,ror#11 + add v4.4s,v4.4s,v0.4s + add w7,w7,w12 + ror w11,w11,#6 + eor w14,w8,w9 + eor w15,w15,w8,ror#20 + add w7,w7,w11 + ldr w12,[sp,#16] + and w13,w13,w14 + ror w15,w15,#2 + add w3,w3,w7 + eor w13,w13,w9 + st1 {v4.4s},[x17], #16 + ext v4.16b,v1.16b,v2.16b,#4 + add w6,w6,w12 + add w7,w7,w15 + and w12,w4,w3 + bic w15,w5,w3 + ext v7.16b,v3.16b,v0.16b,#4 + eor w11,w3,w3,ror#5 + add w7,w7,w13 + mov d19,v0.d[1] + orr w12,w12,w15 + eor w11,w11,w3,ror#19 + ushr v6.4s,v4.4s,#7 + eor w15,w7,w7,ror#11 + ushr v5.4s,v4.4s,#3 + add w6,w6,w12 + add v1.4s,v1.4s,v7.4s + ror w11,w11,#6 + sli v6.4s,v4.4s,#25 + eor w13,w7,w8 + eor w15,w15,w7,ror#20 + ushr v7.4s,v4.4s,#18 + add w6,w6,w11 + ldr w12,[sp,#20] + and w14,w14,w13 + eor v5.16b,v5.16b,v6.16b + ror w15,w15,#2 + add w10,w10,w6 + sli v7.4s,v4.4s,#14 + eor w14,w14,w8 + ushr v16.4s,v19.4s,#17 + add w5,w5,w12 + add w6,w6,w15 + and w12,w3,w10 + eor v5.16b,v5.16b,v7.16b + bic w15,w4,w10 + eor w11,w10,w10,ror#5 + sli v16.4s,v19.4s,#15 + add w6,w6,w14 + orr w12,w12,w15 + ushr v17.4s,v19.4s,#10 + eor w11,w11,w10,ror#19 + eor w15,w6,w6,ror#11 + ushr v7.4s,v19.4s,#19 + add w5,w5,w12 + ror w11,w11,#6 + add v1.4s,v1.4s,v5.4s + eor w14,w6,w7 + eor w15,w15,w6,ror#20 + sli v7.4s,v19.4s,#13 + add w5,w5,w11 + ldr w12,[sp,#24] + and w13,w13,w14 + eor v17.16b,v17.16b,v16.16b + ror w15,w15,#2 + add w9,w9,w5 + eor w13,w13,w7 + eor v17.16b,v17.16b,v7.16b + add w4,w4,w12 + add w5,w5,w15 + and w12,w10,w9 + add v1.4s,v1.4s,v17.4s + bic w15,w3,w9 + eor w11,w9,w9,ror#5 + add w5,w5,w13 + ushr v18.4s,v1.4s,#17 + orr w12,w12,w15 + ushr v19.4s,v1.4s,#10 + eor w11,w11,w9,ror#19 + eor w15,w5,w5,ror#11 + sli v18.4s,v1.4s,#15 + add w4,w4,w12 + ushr v17.4s,v1.4s,#19 + ror w11,w11,#6 + eor w13,w5,w6 + eor v19.16b,v19.16b,v18.16b + eor w15,w15,w5,ror#20 + add w4,w4,w11 + sli v17.4s,v1.4s,#13 + ldr w12,[sp,#28] + and w14,w14,w13 + ror w15,w15,#2 + ld1 {v4.4s},[x16], #16 + add w8,w8,w4 + eor v19.16b,v19.16b,v17.16b + eor w14,w14,w6 + eor v17.16b,v17.16b,v17.16b + add w3,w3,w12 + add w4,w4,w15 + and w12,w9,w8 + mov v17.d[1],v19.d[0] + bic w15,w10,w8 + eor w11,w8,w8,ror#5 + add w4,w4,w14 + add v1.4s,v1.4s,v17.4s + orr w12,w12,w15 + eor w11,w11,w8,ror#19 + eor w15,w4,w4,ror#11 + add v4.4s,v4.4s,v1.4s + add w3,w3,w12 + ror w11,w11,#6 + eor w14,w4,w5 + eor w15,w15,w4,ror#20 + add w3,w3,w11 + ldr w12,[sp,#32] + and w13,w13,w14 + ror w15,w15,#2 + add w7,w7,w3 + eor w13,w13,w5 + st1 {v4.4s},[x17], #16 + ext v4.16b,v2.16b,v3.16b,#4 + add w10,w10,w12 + add w3,w3,w15 + and w12,w8,w7 + bic w15,w9,w7 + ext v7.16b,v0.16b,v1.16b,#4 + eor w11,w7,w7,ror#5 + add w3,w3,w13 + mov d19,v1.d[1] + orr w12,w12,w15 + eor w11,w11,w7,ror#19 + ushr v6.4s,v4.4s,#7 + eor w15,w3,w3,ror#11 + ushr v5.4s,v4.4s,#3 + add w10,w10,w12 + add v2.4s,v2.4s,v7.4s + ror w11,w11,#6 + sli v6.4s,v4.4s,#25 + eor w13,w3,w4 + eor w15,w15,w3,ror#20 + ushr v7.4s,v4.4s,#18 + add w10,w10,w11 + ldr w12,[sp,#36] + and w14,w14,w13 + eor v5.16b,v5.16b,v6.16b + ror w15,w15,#2 + add w6,w6,w10 + sli v7.4s,v4.4s,#14 + eor w14,w14,w4 + ushr v16.4s,v19.4s,#17 + add w9,w9,w12 + add w10,w10,w15 + and w12,w7,w6 + eor v5.16b,v5.16b,v7.16b + bic w15,w8,w6 + eor w11,w6,w6,ror#5 + sli v16.4s,v19.4s,#15 + add w10,w10,w14 + orr w12,w12,w15 + ushr v17.4s,v19.4s,#10 + eor w11,w11,w6,ror#19 + eor w15,w10,w10,ror#11 + ushr v7.4s,v19.4s,#19 + add w9,w9,w12 + ror w11,w11,#6 + add v2.4s,v2.4s,v5.4s + eor w14,w10,w3 + eor w15,w15,w10,ror#20 + sli v7.4s,v19.4s,#13 + add w9,w9,w11 + ldr w12,[sp,#40] + and w13,w13,w14 + eor v17.16b,v17.16b,v16.16b + ror w15,w15,#2 + add w5,w5,w9 + eor w13,w13,w3 + eor v17.16b,v17.16b,v7.16b + add w8,w8,w12 + add w9,w9,w15 + and w12,w6,w5 + add v2.4s,v2.4s,v17.4s + bic w15,w7,w5 + eor w11,w5,w5,ror#5 + add w9,w9,w13 + ushr v18.4s,v2.4s,#17 + orr w12,w12,w15 + ushr v19.4s,v2.4s,#10 + eor w11,w11,w5,ror#19 + eor w15,w9,w9,ror#11 + sli v18.4s,v2.4s,#15 + add w8,w8,w12 + ushr v17.4s,v2.4s,#19 + ror w11,w11,#6 + eor w13,w9,w10 + eor v19.16b,v19.16b,v18.16b + eor w15,w15,w9,ror#20 + add w8,w8,w11 + sli v17.4s,v2.4s,#13 + ldr w12,[sp,#44] + and w14,w14,w13 + ror w15,w15,#2 + ld1 {v4.4s},[x16], #16 + add w4,w4,w8 + eor v19.16b,v19.16b,v17.16b + eor w14,w14,w10 + eor v17.16b,v17.16b,v17.16b + add w7,w7,w12 + add w8,w8,w15 + and w12,w5,w4 + mov v17.d[1],v19.d[0] + bic w15,w6,w4 + eor w11,w4,w4,ror#5 + add w8,w8,w14 + add v2.4s,v2.4s,v17.4s + orr w12,w12,w15 + eor w11,w11,w4,ror#19 + eor w15,w8,w8,ror#11 + add v4.4s,v4.4s,v2.4s + add w7,w7,w12 + ror w11,w11,#6 + eor w14,w8,w9 + eor w15,w15,w8,ror#20 + add w7,w7,w11 + ldr w12,[sp,#48] + and w13,w13,w14 + ror w15,w15,#2 + add w3,w3,w7 + eor w13,w13,w9 + st1 {v4.4s},[x17], #16 + ext v4.16b,v3.16b,v0.16b,#4 + add w6,w6,w12 + add w7,w7,w15 + and w12,w4,w3 + bic w15,w5,w3 + ext v7.16b,v1.16b,v2.16b,#4 + eor w11,w3,w3,ror#5 + add w7,w7,w13 + mov d19,v2.d[1] + orr w12,w12,w15 + eor w11,w11,w3,ror#19 + ushr v6.4s,v4.4s,#7 + eor w15,w7,w7,ror#11 + ushr v5.4s,v4.4s,#3 + add w6,w6,w12 + add v3.4s,v3.4s,v7.4s + ror w11,w11,#6 + sli v6.4s,v4.4s,#25 + eor w13,w7,w8 + eor w15,w15,w7,ror#20 + ushr v7.4s,v4.4s,#18 + add w6,w6,w11 + ldr w12,[sp,#52] + and w14,w14,w13 + eor v5.16b,v5.16b,v6.16b + ror w15,w15,#2 + add w10,w10,w6 + sli v7.4s,v4.4s,#14 + eor w14,w14,w8 + ushr v16.4s,v19.4s,#17 + add w5,w5,w12 + add w6,w6,w15 + and w12,w3,w10 + eor v5.16b,v5.16b,v7.16b + bic w15,w4,w10 + eor w11,w10,w10,ror#5 + sli v16.4s,v19.4s,#15 + add w6,w6,w14 + orr w12,w12,w15 + ushr v17.4s,v19.4s,#10 + eor w11,w11,w10,ror#19 + eor w15,w6,w6,ror#11 + ushr v7.4s,v19.4s,#19 + add w5,w5,w12 + ror w11,w11,#6 + add v3.4s,v3.4s,v5.4s + eor w14,w6,w7 + eor w15,w15,w6,ror#20 + sli v7.4s,v19.4s,#13 + add w5,w5,w11 + ldr w12,[sp,#56] + and w13,w13,w14 + eor v17.16b,v17.16b,v16.16b + ror w15,w15,#2 + add w9,w9,w5 + eor w13,w13,w7 + eor v17.16b,v17.16b,v7.16b + add w4,w4,w12 + add w5,w5,w15 + and w12,w10,w9 + add v3.4s,v3.4s,v17.4s + bic w15,w3,w9 + eor w11,w9,w9,ror#5 + add w5,w5,w13 + ushr v18.4s,v3.4s,#17 + orr w12,w12,w15 + ushr v19.4s,v3.4s,#10 + eor w11,w11,w9,ror#19 + eor w15,w5,w5,ror#11 + sli v18.4s,v3.4s,#15 + add w4,w4,w12 + ushr v17.4s,v3.4s,#19 + ror w11,w11,#6 + eor w13,w5,w6 + eor v19.16b,v19.16b,v18.16b + eor w15,w15,w5,ror#20 + add w4,w4,w11 + sli v17.4s,v3.4s,#13 + ldr w12,[sp,#60] + and w14,w14,w13 + ror w15,w15,#2 + ld1 {v4.4s},[x16], #16 + add w8,w8,w4 + eor v19.16b,v19.16b,v17.16b + eor w14,w14,w6 + eor v17.16b,v17.16b,v17.16b + add w3,w3,w12 + add w4,w4,w15 + and w12,w9,w8 + mov v17.d[1],v19.d[0] + bic w15,w10,w8 + eor w11,w8,w8,ror#5 + add w4,w4,w14 + add v3.4s,v3.4s,v17.4s + orr w12,w12,w15 + eor w11,w11,w8,ror#19 + eor w15,w4,w4,ror#11 + add v4.4s,v4.4s,v3.4s + add w3,w3,w12 + ror w11,w11,#6 + eor w14,w4,w5 + eor w15,w15,w4,ror#20 + add w3,w3,w11 + ldr w12,[x16] + and w13,w13,w14 + ror w15,w15,#2 + add w7,w7,w3 + eor w13,w13,w5 + st1 {v4.4s},[x17], #16 + cmp w12,#0 // check for K256 terminator + ldr w12,[sp,#0] + sub x17,x17,#64 + bne .L_00_48 + + sub x16,x16,#256 // rewind x16 + cmp x1,x2 + mov x17, #64 + csel x17, x17, xzr, eq + sub x1,x1,x17 // avoid SEGV + mov x17,sp + add w10,w10,w12 + add w3,w3,w15 + and w12,w8,w7 + ld1 {v0.16b},[x1],#16 + bic w15,w9,w7 + eor w11,w7,w7,ror#5 + ld1 {v4.4s},[x16],#16 + add w3,w3,w13 + orr w12,w12,w15 + eor w11,w11,w7,ror#19 + eor w15,w3,w3,ror#11 + rev32 v0.16b,v0.16b + add w10,w10,w12 + ror w11,w11,#6 + eor w13,w3,w4 + eor w15,w15,w3,ror#20 + add v4.4s,v4.4s,v0.4s + add w10,w10,w11 + ldr w12,[sp,#4] + and w14,w14,w13 + ror w15,w15,#2 + add w6,w6,w10 + eor w14,w14,w4 + add w9,w9,w12 + add w10,w10,w15 + and w12,w7,w6 + bic w15,w8,w6 + eor w11,w6,w6,ror#5 + add w10,w10,w14 + orr w12,w12,w15 + eor w11,w11,w6,ror#19 + eor w15,w10,w10,ror#11 + add w9,w9,w12 + ror w11,w11,#6 + eor w14,w10,w3 + eor w15,w15,w10,ror#20 + add w9,w9,w11 + ldr w12,[sp,#8] + and w13,w13,w14 + ror w15,w15,#2 + add w5,w5,w9 + eor w13,w13,w3 + add w8,w8,w12 + add w9,w9,w15 + and w12,w6,w5 + bic w15,w7,w5 + eor w11,w5,w5,ror#5 + add w9,w9,w13 + orr w12,w12,w15 + eor w11,w11,w5,ror#19 + eor w15,w9,w9,ror#11 + add w8,w8,w12 + ror w11,w11,#6 + eor w13,w9,w10 + eor w15,w15,w9,ror#20 + add w8,w8,w11 + ldr w12,[sp,#12] + and w14,w14,w13 + ror w15,w15,#2 + add w4,w4,w8 + eor w14,w14,w10 + add w7,w7,w12 + add w8,w8,w15 + and w12,w5,w4 + bic w15,w6,w4 + eor w11,w4,w4,ror#5 + add w8,w8,w14 + orr w12,w12,w15 + eor w11,w11,w4,ror#19 + eor w15,w8,w8,ror#11 + add w7,w7,w12 + ror w11,w11,#6 + eor w14,w8,w9 + eor w15,w15,w8,ror#20 + add w7,w7,w11 + ldr w12,[sp,#16] + and w13,w13,w14 + ror w15,w15,#2 + add w3,w3,w7 + eor w13,w13,w9 + st1 {v4.4s},[x17], #16 + add w6,w6,w12 + add w7,w7,w15 + and w12,w4,w3 + ld1 {v1.16b},[x1],#16 + bic w15,w5,w3 + eor w11,w3,w3,ror#5 + ld1 {v4.4s},[x16],#16 + add w7,w7,w13 + orr w12,w12,w15 + eor w11,w11,w3,ror#19 + eor w15,w7,w7,ror#11 + rev32 v1.16b,v1.16b + add w6,w6,w12 + ror w11,w11,#6 + eor w13,w7,w8 + eor w15,w15,w7,ror#20 + add v4.4s,v4.4s,v1.4s + add w6,w6,w11 + ldr w12,[sp,#20] + and w14,w14,w13 + ror w15,w15,#2 + add w10,w10,w6 + eor w14,w14,w8 + add w5,w5,w12 + add w6,w6,w15 + and w12,w3,w10 + bic w15,w4,w10 + eor w11,w10,w10,ror#5 + add w6,w6,w14 + orr w12,w12,w15 + eor w11,w11,w10,ror#19 + eor w15,w6,w6,ror#11 + add w5,w5,w12 + ror w11,w11,#6 + eor w14,w6,w7 + eor w15,w15,w6,ror#20 + add w5,w5,w11 + ldr w12,[sp,#24] + and w13,w13,w14 + ror w15,w15,#2 + add w9,w9,w5 + eor w13,w13,w7 + add w4,w4,w12 + add w5,w5,w15 + and w12,w10,w9 + bic w15,w3,w9 + eor w11,w9,w9,ror#5 + add w5,w5,w13 + orr w12,w12,w15 + eor w11,w11,w9,ror#19 + eor w15,w5,w5,ror#11 + add w4,w4,w12 + ror w11,w11,#6 + eor w13,w5,w6 + eor w15,w15,w5,ror#20 + add w4,w4,w11 + ldr w12,[sp,#28] + and w14,w14,w13 + ror w15,w15,#2 + add w8,w8,w4 + eor w14,w14,w6 + add w3,w3,w12 + add w4,w4,w15 + and w12,w9,w8 + bic w15,w10,w8 + eor w11,w8,w8,ror#5 + add w4,w4,w14 + orr w12,w12,w15 + eor w11,w11,w8,ror#19 + eor w15,w4,w4,ror#11 + add w3,w3,w12 + ror w11,w11,#6 + eor w14,w4,w5 + eor w15,w15,w4,ror#20 + add w3,w3,w11 + ldr w12,[sp,#32] + and w13,w13,w14 + ror w15,w15,#2 + add w7,w7,w3 + eor w13,w13,w5 + st1 {v4.4s},[x17], #16 + add w10,w10,w12 + add w3,w3,w15 + and w12,w8,w7 + ld1 {v2.16b},[x1],#16 + bic w15,w9,w7 + eor w11,w7,w7,ror#5 + ld1 {v4.4s},[x16],#16 + add w3,w3,w13 + orr w12,w12,w15 + eor w11,w11,w7,ror#19 + eor w15,w3,w3,ror#11 + rev32 v2.16b,v2.16b + add w10,w10,w12 + ror w11,w11,#6 + eor w13,w3,w4 + eor w15,w15,w3,ror#20 + add v4.4s,v4.4s,v2.4s + add w10,w10,w11 + ldr w12,[sp,#36] + and w14,w14,w13 + ror w15,w15,#2 + add w6,w6,w10 + eor w14,w14,w4 + add w9,w9,w12 + add w10,w10,w15 + and w12,w7,w6 + bic w15,w8,w6 + eor w11,w6,w6,ror#5 + add w10,w10,w14 + orr w12,w12,w15 + eor w11,w11,w6,ror#19 + eor w15,w10,w10,ror#11 + add w9,w9,w12 + ror w11,w11,#6 + eor w14,w10,w3 + eor w15,w15,w10,ror#20 + add w9,w9,w11 + ldr w12,[sp,#40] + and w13,w13,w14 + ror w15,w15,#2 + add w5,w5,w9 + eor w13,w13,w3 + add w8,w8,w12 + add w9,w9,w15 + and w12,w6,w5 + bic w15,w7,w5 + eor w11,w5,w5,ror#5 + add w9,w9,w13 + orr w12,w12,w15 + eor w11,w11,w5,ror#19 + eor w15,w9,w9,ror#11 + add w8,w8,w12 + ror w11,w11,#6 + eor w13,w9,w10 + eor w15,w15,w9,ror#20 + add w8,w8,w11 + ldr w12,[sp,#44] + and w14,w14,w13 + ror w15,w15,#2 + add w4,w4,w8 + eor w14,w14,w10 + add w7,w7,w12 + add w8,w8,w15 + and w12,w5,w4 + bic w15,w6,w4 + eor w11,w4,w4,ror#5 + add w8,w8,w14 + orr w12,w12,w15 + eor w11,w11,w4,ror#19 + eor w15,w8,w8,ror#11 + add w7,w7,w12 + ror w11,w11,#6 + eor w14,w8,w9 + eor w15,w15,w8,ror#20 + add w7,w7,w11 + ldr w12,[sp,#48] + and w13,w13,w14 + ror w15,w15,#2 + add w3,w3,w7 + eor w13,w13,w9 + st1 {v4.4s},[x17], #16 + add w6,w6,w12 + add w7,w7,w15 + and w12,w4,w3 + ld1 {v3.16b},[x1],#16 + bic w15,w5,w3 + eor w11,w3,w3,ror#5 + ld1 {v4.4s},[x16],#16 + add w7,w7,w13 + orr w12,w12,w15 + eor w11,w11,w3,ror#19 + eor w15,w7,w7,ror#11 + rev32 v3.16b,v3.16b + add w6,w6,w12 + ror w11,w11,#6 + eor w13,w7,w8 + eor w15,w15,w7,ror#20 + add v4.4s,v4.4s,v3.4s + add w6,w6,w11 + ldr w12,[sp,#52] + and w14,w14,w13 + ror w15,w15,#2 + add w10,w10,w6 + eor w14,w14,w8 + add w5,w5,w12 + add w6,w6,w15 + and w12,w3,w10 + bic w15,w4,w10 + eor w11,w10,w10,ror#5 + add w6,w6,w14 + orr w12,w12,w15 + eor w11,w11,w10,ror#19 + eor w15,w6,w6,ror#11 + add w5,w5,w12 + ror w11,w11,#6 + eor w14,w6,w7 + eor w15,w15,w6,ror#20 + add w5,w5,w11 + ldr w12,[sp,#56] + and w13,w13,w14 + ror w15,w15,#2 + add w9,w9,w5 + eor w13,w13,w7 + add w4,w4,w12 + add w5,w5,w15 + and w12,w10,w9 + bic w15,w3,w9 + eor w11,w9,w9,ror#5 + add w5,w5,w13 + orr w12,w12,w15 + eor w11,w11,w9,ror#19 + eor w15,w5,w5,ror#11 + add w4,w4,w12 + ror w11,w11,#6 + eor w13,w5,w6 + eor w15,w15,w5,ror#20 + add w4,w4,w11 + ldr w12,[sp,#60] + and w14,w14,w13 + ror w15,w15,#2 + add w8,w8,w4 + eor w14,w14,w6 + add w3,w3,w12 + add w4,w4,w15 + and w12,w9,w8 + bic w15,w10,w8 + eor w11,w8,w8,ror#5 + add w4,w4,w14 + orr w12,w12,w15 + eor w11,w11,w8,ror#19 + eor w15,w4,w4,ror#11 + add w3,w3,w12 + ror w11,w11,#6 + eor w14,w4,w5 + eor w15,w15,w4,ror#20 + add w3,w3,w11 + and w13,w13,w14 + ror w15,w15,#2 + add w7,w7,w3 + eor w13,w13,w5 + st1 {v4.4s},[x17], #16 + add w3,w3,w15 // h+=Sigma0(a) from the past + ldp w11,w12,[x0,#0] + add w3,w3,w13 // h+=Maj(a,b,c) from the past + ldp w13,w14,[x0,#8] + add w3,w3,w11 // accumulate + add w4,w4,w12 + ldp w11,w12,[x0,#16] + add w5,w5,w13 + add w6,w6,w14 + ldp w13,w14,[x0,#24] + add w7,w7,w11 + add w8,w8,w12 + ldr w12,[sp,#0] + stp w3,w4,[x0,#0] + add w9,w9,w13 + mov w13,wzr + stp w5,w6,[x0,#8] + add w10,w10,w14 + stp w7,w8,[x0,#16] + eor w14,w4,w5 + stp w9,w10,[x0,#24] + mov w15,wzr + mov x17,sp + b.ne .L_00_48 + + ldr x29,[x29] + add sp,sp,#16*4+16 + ret +.size sha256_block_neon,.-sha256_block_neon +#ifndef __KERNEL__ +.comm OPENSSL_armcap_P,4,4 +#endif --- /dev/null +++ b/arch/arm64/crypto/sha512-core.S @@ -0,0 +1,1085 @@ +// Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved. +// +// Licensed under the OpenSSL license (the "License"). You may not use +// this file except in compliance with the License. You can obtain a copy +// in the file LICENSE in the source distribution or at +// https://www.openssl.org/source/license.html + +// ==================================================================== +// Written by Andy Polyakov appro@openssl.org for the OpenSSL +// project. The module is, however, dual licensed under OpenSSL and +// CRYPTOGAMS licenses depending on where you obtain it. For further +// details see http://www.openssl.org/~appro/cryptogams/. +// +// Permission to use under GPLv2 terms is granted. +// ==================================================================== +// +// SHA256/512 for ARMv8. +// +// Performance in cycles per processed byte and improvement coefficient +// over code generated with "default" compiler: +// +// SHA256-hw SHA256(*) SHA512 +// Apple A7 1.97 10.5 (+33%) 6.73 (-1%(**)) +// Cortex-A53 2.38 15.5 (+115%) 10.0 (+150%(***)) +// Cortex-A57 2.31 11.6 (+86%) 7.51 (+260%(***)) +// Denver 2.01 10.5 (+26%) 6.70 (+8%) +// X-Gene 20.0 (+100%) 12.8 (+300%(***)) +// Mongoose 2.36 13.0 (+50%) 8.36 (+33%) +// +// (*) Software SHA256 results are of lesser relevance, presented +// mostly for informational purposes. +// (**) The result is a trade-off: it's possible to improve it by +// 10% (or by 1 cycle per round), but at the cost of 20% loss +// on Cortex-A53 (or by 4 cycles per round). +// (***) Super-impressive coefficients over gcc-generated code are +// indication of some compiler "pathology", most notably code +// generated with -mgeneral-regs-only is significanty faster +// and the gap is only 40-90%. +// +// October 2016. +// +// Originally it was reckoned that it makes no sense to implement NEON +// version of SHA256 for 64-bit processors. This is because performance +// improvement on most wide-spread Cortex-A5x processors was observed +// to be marginal, same on Cortex-A53 and ~10% on A57. But then it was +// observed that 32-bit NEON SHA256 performs significantly better than +// 64-bit scalar version on *some* of the more recent processors. As +// result 64-bit NEON version of SHA256 was added to provide best +// all-round performance. For example it executes ~30% faster on X-Gene +// and Mongoose. [For reference, NEON version of SHA512 is bound to +// deliver much less improvement, likely *negative* on Cortex-A5x. +// Which is why NEON support is limited to SHA256.] + +#ifndef __KERNEL__ +# include "arm_arch.h" +#endif + +.text + +.extern OPENSSL_armcap_P +.globl sha512_block_data_order +.type sha512_block_data_order,%function +.align 6 +sha512_block_data_order: + stp x29,x30,[sp,#-128]! + add x29,sp,#0 + + stp x19,x20,[sp,#16] + stp x21,x22,[sp,#32] + stp x23,x24,[sp,#48] + stp x25,x26,[sp,#64] + stp x27,x28,[sp,#80] + sub sp,sp,#4*8 + + ldp x20,x21,[x0] // load context + ldp x22,x23,[x0,#2*8] + ldp x24,x25,[x0,#4*8] + add x2,x1,x2,lsl#7 // end of input + ldp x26,x27,[x0,#6*8] + adr x30,.LK512 + stp x0,x2,[x29,#96] + +.Loop: + ldp x3,x4,[x1],#2*8 + ldr x19,[x30],#8 // *K++ + eor x28,x21,x22 // magic seed + str x1,[x29,#112] +#ifndef __AARCH64EB__ + rev x3,x3 // 0 +#endif + ror x16,x24,#14 + add x27,x27,x19 // h+=K[i] + eor x6,x24,x24,ror#23 + and x17,x25,x24 + bic x19,x26,x24 + add x27,x27,x3 // h+=X[i] + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x20,x21 // a^b, b^c in next round + eor x16,x16,x6,ror#18 // Sigma1(e) + ror x6,x20,#28 + add x27,x27,x17 // h+=Ch(e,f,g) + eor x17,x20,x20,ror#5 + add x27,x27,x16 // h+=Sigma1(e) + and x28,x28,x19 // (b^c)&=(a^b) + add x23,x23,x27 // d+=h + eor x28,x28,x21 // Maj(a,b,c) + eor x17,x6,x17,ror#34 // Sigma0(a) + add x27,x27,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + //add x27,x27,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x4,x4 // 1 +#endif + ldp x5,x6,[x1],#2*8 + add x27,x27,x17 // h+=Sigma0(a) + ror x16,x23,#14 + add x26,x26,x28 // h+=K[i] + eor x7,x23,x23,ror#23 + and x17,x24,x23 + bic x28,x25,x23 + add x26,x26,x4 // h+=X[i] + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x27,x20 // a^b, b^c in next round + eor x16,x16,x7,ror#18 // Sigma1(e) + ror x7,x27,#28 + add x26,x26,x17 // h+=Ch(e,f,g) + eor x17,x27,x27,ror#5 + add x26,x26,x16 // h+=Sigma1(e) + and x19,x19,x28 // (b^c)&=(a^b) + add x22,x22,x26 // d+=h + eor x19,x19,x20 // Maj(a,b,c) + eor x17,x7,x17,ror#34 // Sigma0(a) + add x26,x26,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + //add x26,x26,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x5,x5 // 2 +#endif + add x26,x26,x17 // h+=Sigma0(a) + ror x16,x22,#14 + add x25,x25,x19 // h+=K[i] + eor x8,x22,x22,ror#23 + and x17,x23,x22 + bic x19,x24,x22 + add x25,x25,x5 // h+=X[i] + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x26,x27 // a^b, b^c in next round + eor x16,x16,x8,ror#18 // Sigma1(e) + ror x8,x26,#28 + add x25,x25,x17 // h+=Ch(e,f,g) + eor x17,x26,x26,ror#5 + add x25,x25,x16 // h+=Sigma1(e) + and x28,x28,x19 // (b^c)&=(a^b) + add x21,x21,x25 // d+=h + eor x28,x28,x27 // Maj(a,b,c) + eor x17,x8,x17,ror#34 // Sigma0(a) + add x25,x25,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + //add x25,x25,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x6,x6 // 3 +#endif + ldp x7,x8,[x1],#2*8 + add x25,x25,x17 // h+=Sigma0(a) + ror x16,x21,#14 + add x24,x24,x28 // h+=K[i] + eor x9,x21,x21,ror#23 + and x17,x22,x21 + bic x28,x23,x21 + add x24,x24,x6 // h+=X[i] + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x25,x26 // a^b, b^c in next round + eor x16,x16,x9,ror#18 // Sigma1(e) + ror x9,x25,#28 + add x24,x24,x17 // h+=Ch(e,f,g) + eor x17,x25,x25,ror#5 + add x24,x24,x16 // h+=Sigma1(e) + and x19,x19,x28 // (b^c)&=(a^b) + add x20,x20,x24 // d+=h + eor x19,x19,x26 // Maj(a,b,c) + eor x17,x9,x17,ror#34 // Sigma0(a) + add x24,x24,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + //add x24,x24,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x7,x7 // 4 +#endif + add x24,x24,x17 // h+=Sigma0(a) + ror x16,x20,#14 + add x23,x23,x19 // h+=K[i] + eor x10,x20,x20,ror#23 + and x17,x21,x20 + bic x19,x22,x20 + add x23,x23,x7 // h+=X[i] + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x24,x25 // a^b, b^c in next round + eor x16,x16,x10,ror#18 // Sigma1(e) + ror x10,x24,#28 + add x23,x23,x17 // h+=Ch(e,f,g) + eor x17,x24,x24,ror#5 + add x23,x23,x16 // h+=Sigma1(e) + and x28,x28,x19 // (b^c)&=(a^b) + add x27,x27,x23 // d+=h + eor x28,x28,x25 // Maj(a,b,c) + eor x17,x10,x17,ror#34 // Sigma0(a) + add x23,x23,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + //add x23,x23,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x8,x8 // 5 +#endif + ldp x9,x10,[x1],#2*8 + add x23,x23,x17 // h+=Sigma0(a) + ror x16,x27,#14 + add x22,x22,x28 // h+=K[i] + eor x11,x27,x27,ror#23 + and x17,x20,x27 + bic x28,x21,x27 + add x22,x22,x8 // h+=X[i] + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x23,x24 // a^b, b^c in next round + eor x16,x16,x11,ror#18 // Sigma1(e) + ror x11,x23,#28 + add x22,x22,x17 // h+=Ch(e,f,g) + eor x17,x23,x23,ror#5 + add x22,x22,x16 // h+=Sigma1(e) + and x19,x19,x28 // (b^c)&=(a^b) + add x26,x26,x22 // d+=h + eor x19,x19,x24 // Maj(a,b,c) + eor x17,x11,x17,ror#34 // Sigma0(a) + add x22,x22,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + //add x22,x22,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x9,x9 // 6 +#endif + add x22,x22,x17 // h+=Sigma0(a) + ror x16,x26,#14 + add x21,x21,x19 // h+=K[i] + eor x12,x26,x26,ror#23 + and x17,x27,x26 + bic x19,x20,x26 + add x21,x21,x9 // h+=X[i] + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x22,x23 // a^b, b^c in next round + eor x16,x16,x12,ror#18 // Sigma1(e) + ror x12,x22,#28 + add x21,x21,x17 // h+=Ch(e,f,g) + eor x17,x22,x22,ror#5 + add x21,x21,x16 // h+=Sigma1(e) + and x28,x28,x19 // (b^c)&=(a^b) + add x25,x25,x21 // d+=h + eor x28,x28,x23 // Maj(a,b,c) + eor x17,x12,x17,ror#34 // Sigma0(a) + add x21,x21,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + //add x21,x21,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x10,x10 // 7 +#endif + ldp x11,x12,[x1],#2*8 + add x21,x21,x17 // h+=Sigma0(a) + ror x16,x25,#14 + add x20,x20,x28 // h+=K[i] + eor x13,x25,x25,ror#23 + and x17,x26,x25 + bic x28,x27,x25 + add x20,x20,x10 // h+=X[i] + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x21,x22 // a^b, b^c in next round + eor x16,x16,x13,ror#18 // Sigma1(e) + ror x13,x21,#28 + add x20,x20,x17 // h+=Ch(e,f,g) + eor x17,x21,x21,ror#5 + add x20,x20,x16 // h+=Sigma1(e) + and x19,x19,x28 // (b^c)&=(a^b) + add x24,x24,x20 // d+=h + eor x19,x19,x22 // Maj(a,b,c) + eor x17,x13,x17,ror#34 // Sigma0(a) + add x20,x20,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + //add x20,x20,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x11,x11 // 8 +#endif + add x20,x20,x17 // h+=Sigma0(a) + ror x16,x24,#14 + add x27,x27,x19 // h+=K[i] + eor x14,x24,x24,ror#23 + and x17,x25,x24 + bic x19,x26,x24 + add x27,x27,x11 // h+=X[i] + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x20,x21 // a^b, b^c in next round + eor x16,x16,x14,ror#18 // Sigma1(e) + ror x14,x20,#28 + add x27,x27,x17 // h+=Ch(e,f,g) + eor x17,x20,x20,ror#5 + add x27,x27,x16 // h+=Sigma1(e) + and x28,x28,x19 // (b^c)&=(a^b) + add x23,x23,x27 // d+=h + eor x28,x28,x21 // Maj(a,b,c) + eor x17,x14,x17,ror#34 // Sigma0(a) + add x27,x27,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + //add x27,x27,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x12,x12 // 9 +#endif + ldp x13,x14,[x1],#2*8 + add x27,x27,x17 // h+=Sigma0(a) + ror x16,x23,#14 + add x26,x26,x28 // h+=K[i] + eor x15,x23,x23,ror#23 + and x17,x24,x23 + bic x28,x25,x23 + add x26,x26,x12 // h+=X[i] + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x27,x20 // a^b, b^c in next round + eor x16,x16,x15,ror#18 // Sigma1(e) + ror x15,x27,#28 + add x26,x26,x17 // h+=Ch(e,f,g) + eor x17,x27,x27,ror#5 + add x26,x26,x16 // h+=Sigma1(e) + and x19,x19,x28 // (b^c)&=(a^b) + add x22,x22,x26 // d+=h + eor x19,x19,x20 // Maj(a,b,c) + eor x17,x15,x17,ror#34 // Sigma0(a) + add x26,x26,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + //add x26,x26,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x13,x13 // 10 +#endif + add x26,x26,x17 // h+=Sigma0(a) + ror x16,x22,#14 + add x25,x25,x19 // h+=K[i] + eor x0,x22,x22,ror#23 + and x17,x23,x22 + bic x19,x24,x22 + add x25,x25,x13 // h+=X[i] + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x26,x27 // a^b, b^c in next round + eor x16,x16,x0,ror#18 // Sigma1(e) + ror x0,x26,#28 + add x25,x25,x17 // h+=Ch(e,f,g) + eor x17,x26,x26,ror#5 + add x25,x25,x16 // h+=Sigma1(e) + and x28,x28,x19 // (b^c)&=(a^b) + add x21,x21,x25 // d+=h + eor x28,x28,x27 // Maj(a,b,c) + eor x17,x0,x17,ror#34 // Sigma0(a) + add x25,x25,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + //add x25,x25,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x14,x14 // 11 +#endif + ldp x15,x0,[x1],#2*8 + add x25,x25,x17 // h+=Sigma0(a) + str x6,[sp,#24] + ror x16,x21,#14 + add x24,x24,x28 // h+=K[i] + eor x6,x21,x21,ror#23 + and x17,x22,x21 + bic x28,x23,x21 + add x24,x24,x14 // h+=X[i] + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x25,x26 // a^b, b^c in next round + eor x16,x16,x6,ror#18 // Sigma1(e) + ror x6,x25,#28 + add x24,x24,x17 // h+=Ch(e,f,g) + eor x17,x25,x25,ror#5 + add x24,x24,x16 // h+=Sigma1(e) + and x19,x19,x28 // (b^c)&=(a^b) + add x20,x20,x24 // d+=h + eor x19,x19,x26 // Maj(a,b,c) + eor x17,x6,x17,ror#34 // Sigma0(a) + add x24,x24,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + //add x24,x24,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x15,x15 // 12 +#endif + add x24,x24,x17 // h+=Sigma0(a) + str x7,[sp,#0] + ror x16,x20,#14 + add x23,x23,x19 // h+=K[i] + eor x7,x20,x20,ror#23 + and x17,x21,x20 + bic x19,x22,x20 + add x23,x23,x15 // h+=X[i] + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x24,x25 // a^b, b^c in next round + eor x16,x16,x7,ror#18 // Sigma1(e) + ror x7,x24,#28 + add x23,x23,x17 // h+=Ch(e,f,g) + eor x17,x24,x24,ror#5 + add x23,x23,x16 // h+=Sigma1(e) + and x28,x28,x19 // (b^c)&=(a^b) + add x27,x27,x23 // d+=h + eor x28,x28,x25 // Maj(a,b,c) + eor x17,x7,x17,ror#34 // Sigma0(a) + add x23,x23,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + //add x23,x23,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x0,x0 // 13 +#endif + ldp x1,x2,[x1] + add x23,x23,x17 // h+=Sigma0(a) + str x8,[sp,#8] + ror x16,x27,#14 + add x22,x22,x28 // h+=K[i] + eor x8,x27,x27,ror#23 + and x17,x20,x27 + bic x28,x21,x27 + add x22,x22,x0 // h+=X[i] + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x23,x24 // a^b, b^c in next round + eor x16,x16,x8,ror#18 // Sigma1(e) + ror x8,x23,#28 + add x22,x22,x17 // h+=Ch(e,f,g) + eor x17,x23,x23,ror#5 + add x22,x22,x16 // h+=Sigma1(e) + and x19,x19,x28 // (b^c)&=(a^b) + add x26,x26,x22 // d+=h + eor x19,x19,x24 // Maj(a,b,c) + eor x17,x8,x17,ror#34 // Sigma0(a) + add x22,x22,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + //add x22,x22,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x1,x1 // 14 +#endif + ldr x6,[sp,#24] + add x22,x22,x17 // h+=Sigma0(a) + str x9,[sp,#16] + ror x16,x26,#14 + add x21,x21,x19 // h+=K[i] + eor x9,x26,x26,ror#23 + and x17,x27,x26 + bic x19,x20,x26 + add x21,x21,x1 // h+=X[i] + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x22,x23 // a^b, b^c in next round + eor x16,x16,x9,ror#18 // Sigma1(e) + ror x9,x22,#28 + add x21,x21,x17 // h+=Ch(e,f,g) + eor x17,x22,x22,ror#5 + add x21,x21,x16 // h+=Sigma1(e) + and x28,x28,x19 // (b^c)&=(a^b) + add x25,x25,x21 // d+=h + eor x28,x28,x23 // Maj(a,b,c) + eor x17,x9,x17,ror#34 // Sigma0(a) + add x21,x21,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + //add x21,x21,x17 // h+=Sigma0(a) +#ifndef __AARCH64EB__ + rev x2,x2 // 15 +#endif + ldr x7,[sp,#0] + add x21,x21,x17 // h+=Sigma0(a) + str x10,[sp,#24] + ror x16,x25,#14 + add x20,x20,x28 // h+=K[i] + ror x9,x4,#1 + and x17,x26,x25 + ror x8,x1,#19 + bic x28,x27,x25 + ror x10,x21,#28 + add x20,x20,x2 // h+=X[i] + eor x16,x16,x25,ror#18 + eor x9,x9,x4,ror#8 + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x21,x22 // a^b, b^c in next round + eor x16,x16,x25,ror#41 // Sigma1(e) + eor x10,x10,x21,ror#34 + add x20,x20,x17 // h+=Ch(e,f,g) + and x19,x19,x28 // (b^c)&=(a^b) + eor x8,x8,x1,ror#61 + eor x9,x9,x4,lsr#7 // sigma0(X[i+1]) + add x20,x20,x16 // h+=Sigma1(e) + eor x19,x19,x22 // Maj(a,b,c) + eor x17,x10,x21,ror#39 // Sigma0(a) + eor x8,x8,x1,lsr#6 // sigma1(X[i+14]) + add x3,x3,x12 + add x24,x24,x20 // d+=h + add x20,x20,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + add x3,x3,x9 + add x20,x20,x17 // h+=Sigma0(a) + add x3,x3,x8 +.Loop_16_xx: + ldr x8,[sp,#8] + str x11,[sp,#0] + ror x16,x24,#14 + add x27,x27,x19 // h+=K[i] + ror x10,x5,#1 + and x17,x25,x24 + ror x9,x2,#19 + bic x19,x26,x24 + ror x11,x20,#28 + add x27,x27,x3 // h+=X[i] + eor x16,x16,x24,ror#18 + eor x10,x10,x5,ror#8 + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x20,x21 // a^b, b^c in next round + eor x16,x16,x24,ror#41 // Sigma1(e) + eor x11,x11,x20,ror#34 + add x27,x27,x17 // h+=Ch(e,f,g) + and x28,x28,x19 // (b^c)&=(a^b) + eor x9,x9,x2,ror#61 + eor x10,x10,x5,lsr#7 // sigma0(X[i+1]) + add x27,x27,x16 // h+=Sigma1(e) + eor x28,x28,x21 // Maj(a,b,c) + eor x17,x11,x20,ror#39 // Sigma0(a) + eor x9,x9,x2,lsr#6 // sigma1(X[i+14]) + add x4,x4,x13 + add x23,x23,x27 // d+=h + add x27,x27,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + add x4,x4,x10 + add x27,x27,x17 // h+=Sigma0(a) + add x4,x4,x9 + ldr x9,[sp,#16] + str x12,[sp,#8] + ror x16,x23,#14 + add x26,x26,x28 // h+=K[i] + ror x11,x6,#1 + and x17,x24,x23 + ror x10,x3,#19 + bic x28,x25,x23 + ror x12,x27,#28 + add x26,x26,x4 // h+=X[i] + eor x16,x16,x23,ror#18 + eor x11,x11,x6,ror#8 + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x27,x20 // a^b, b^c in next round + eor x16,x16,x23,ror#41 // Sigma1(e) + eor x12,x12,x27,ror#34 + add x26,x26,x17 // h+=Ch(e,f,g) + and x19,x19,x28 // (b^c)&=(a^b) + eor x10,x10,x3,ror#61 + eor x11,x11,x6,lsr#7 // sigma0(X[i+1]) + add x26,x26,x16 // h+=Sigma1(e) + eor x19,x19,x20 // Maj(a,b,c) + eor x17,x12,x27,ror#39 // Sigma0(a) + eor x10,x10,x3,lsr#6 // sigma1(X[i+14]) + add x5,x5,x14 + add x22,x22,x26 // d+=h + add x26,x26,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + add x5,x5,x11 + add x26,x26,x17 // h+=Sigma0(a) + add x5,x5,x10 + ldr x10,[sp,#24] + str x13,[sp,#16] + ror x16,x22,#14 + add x25,x25,x19 // h+=K[i] + ror x12,x7,#1 + and x17,x23,x22 + ror x11,x4,#19 + bic x19,x24,x22 + ror x13,x26,#28 + add x25,x25,x5 // h+=X[i] + eor x16,x16,x22,ror#18 + eor x12,x12,x7,ror#8 + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x26,x27 // a^b, b^c in next round + eor x16,x16,x22,ror#41 // Sigma1(e) + eor x13,x13,x26,ror#34 + add x25,x25,x17 // h+=Ch(e,f,g) + and x28,x28,x19 // (b^c)&=(a^b) + eor x11,x11,x4,ror#61 + eor x12,x12,x7,lsr#7 // sigma0(X[i+1]) + add x25,x25,x16 // h+=Sigma1(e) + eor x28,x28,x27 // Maj(a,b,c) + eor x17,x13,x26,ror#39 // Sigma0(a) + eor x11,x11,x4,lsr#6 // sigma1(X[i+14]) + add x6,x6,x15 + add x21,x21,x25 // d+=h + add x25,x25,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + add x6,x6,x12 + add x25,x25,x17 // h+=Sigma0(a) + add x6,x6,x11 + ldr x11,[sp,#0] + str x14,[sp,#24] + ror x16,x21,#14 + add x24,x24,x28 // h+=K[i] + ror x13,x8,#1 + and x17,x22,x21 + ror x12,x5,#19 + bic x28,x23,x21 + ror x14,x25,#28 + add x24,x24,x6 // h+=X[i] + eor x16,x16,x21,ror#18 + eor x13,x13,x8,ror#8 + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x25,x26 // a^b, b^c in next round + eor x16,x16,x21,ror#41 // Sigma1(e) + eor x14,x14,x25,ror#34 + add x24,x24,x17 // h+=Ch(e,f,g) + and x19,x19,x28 // (b^c)&=(a^b) + eor x12,x12,x5,ror#61 + eor x13,x13,x8,lsr#7 // sigma0(X[i+1]) + add x24,x24,x16 // h+=Sigma1(e) + eor x19,x19,x26 // Maj(a,b,c) + eor x17,x14,x25,ror#39 // Sigma0(a) + eor x12,x12,x5,lsr#6 // sigma1(X[i+14]) + add x7,x7,x0 + add x20,x20,x24 // d+=h + add x24,x24,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + add x7,x7,x13 + add x24,x24,x17 // h+=Sigma0(a) + add x7,x7,x12 + ldr x12,[sp,#8] + str x15,[sp,#0] + ror x16,x20,#14 + add x23,x23,x19 // h+=K[i] + ror x14,x9,#1 + and x17,x21,x20 + ror x13,x6,#19 + bic x19,x22,x20 + ror x15,x24,#28 + add x23,x23,x7 // h+=X[i] + eor x16,x16,x20,ror#18 + eor x14,x14,x9,ror#8 + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x24,x25 // a^b, b^c in next round + eor x16,x16,x20,ror#41 // Sigma1(e) + eor x15,x15,x24,ror#34 + add x23,x23,x17 // h+=Ch(e,f,g) + and x28,x28,x19 // (b^c)&=(a^b) + eor x13,x13,x6,ror#61 + eor x14,x14,x9,lsr#7 // sigma0(X[i+1]) + add x23,x23,x16 // h+=Sigma1(e) + eor x28,x28,x25 // Maj(a,b,c) + eor x17,x15,x24,ror#39 // Sigma0(a) + eor x13,x13,x6,lsr#6 // sigma1(X[i+14]) + add x8,x8,x1 + add x27,x27,x23 // d+=h + add x23,x23,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + add x8,x8,x14 + add x23,x23,x17 // h+=Sigma0(a) + add x8,x8,x13 + ldr x13,[sp,#16] + str x0,[sp,#8] + ror x16,x27,#14 + add x22,x22,x28 // h+=K[i] + ror x15,x10,#1 + and x17,x20,x27 + ror x14,x7,#19 + bic x28,x21,x27 + ror x0,x23,#28 + add x22,x22,x8 // h+=X[i] + eor x16,x16,x27,ror#18 + eor x15,x15,x10,ror#8 + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x23,x24 // a^b, b^c in next round + eor x16,x16,x27,ror#41 // Sigma1(e) + eor x0,x0,x23,ror#34 + add x22,x22,x17 // h+=Ch(e,f,g) + and x19,x19,x28 // (b^c)&=(a^b) + eor x14,x14,x7,ror#61 + eor x15,x15,x10,lsr#7 // sigma0(X[i+1]) + add x22,x22,x16 // h+=Sigma1(e) + eor x19,x19,x24 // Maj(a,b,c) + eor x17,x0,x23,ror#39 // Sigma0(a) + eor x14,x14,x7,lsr#6 // sigma1(X[i+14]) + add x9,x9,x2 + add x26,x26,x22 // d+=h + add x22,x22,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + add x9,x9,x15 + add x22,x22,x17 // h+=Sigma0(a) + add x9,x9,x14 + ldr x14,[sp,#24] + str x1,[sp,#16] + ror x16,x26,#14 + add x21,x21,x19 // h+=K[i] + ror x0,x11,#1 + and x17,x27,x26 + ror x15,x8,#19 + bic x19,x20,x26 + ror x1,x22,#28 + add x21,x21,x9 // h+=X[i] + eor x16,x16,x26,ror#18 + eor x0,x0,x11,ror#8 + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x22,x23 // a^b, b^c in next round + eor x16,x16,x26,ror#41 // Sigma1(e) + eor x1,x1,x22,ror#34 + add x21,x21,x17 // h+=Ch(e,f,g) + and x28,x28,x19 // (b^c)&=(a^b) + eor x15,x15,x8,ror#61 + eor x0,x0,x11,lsr#7 // sigma0(X[i+1]) + add x21,x21,x16 // h+=Sigma1(e) + eor x28,x28,x23 // Maj(a,b,c) + eor x17,x1,x22,ror#39 // Sigma0(a) + eor x15,x15,x8,lsr#6 // sigma1(X[i+14]) + add x10,x10,x3 + add x25,x25,x21 // d+=h + add x21,x21,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + add x10,x10,x0 + add x21,x21,x17 // h+=Sigma0(a) + add x10,x10,x15 + ldr x15,[sp,#0] + str x2,[sp,#24] + ror x16,x25,#14 + add x20,x20,x28 // h+=K[i] + ror x1,x12,#1 + and x17,x26,x25 + ror x0,x9,#19 + bic x28,x27,x25 + ror x2,x21,#28 + add x20,x20,x10 // h+=X[i] + eor x16,x16,x25,ror#18 + eor x1,x1,x12,ror#8 + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x21,x22 // a^b, b^c in next round + eor x16,x16,x25,ror#41 // Sigma1(e) + eor x2,x2,x21,ror#34 + add x20,x20,x17 // h+=Ch(e,f,g) + and x19,x19,x28 // (b^c)&=(a^b) + eor x0,x0,x9,ror#61 + eor x1,x1,x12,lsr#7 // sigma0(X[i+1]) + add x20,x20,x16 // h+=Sigma1(e) + eor x19,x19,x22 // Maj(a,b,c) + eor x17,x2,x21,ror#39 // Sigma0(a) + eor x0,x0,x9,lsr#6 // sigma1(X[i+14]) + add x11,x11,x4 + add x24,x24,x20 // d+=h + add x20,x20,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + add x11,x11,x1 + add x20,x20,x17 // h+=Sigma0(a) + add x11,x11,x0 + ldr x0,[sp,#8] + str x3,[sp,#0] + ror x16,x24,#14 + add x27,x27,x19 // h+=K[i] + ror x2,x13,#1 + and x17,x25,x24 + ror x1,x10,#19 + bic x19,x26,x24 + ror x3,x20,#28 + add x27,x27,x11 // h+=X[i] + eor x16,x16,x24,ror#18 + eor x2,x2,x13,ror#8 + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x20,x21 // a^b, b^c in next round + eor x16,x16,x24,ror#41 // Sigma1(e) + eor x3,x3,x20,ror#34 + add x27,x27,x17 // h+=Ch(e,f,g) + and x28,x28,x19 // (b^c)&=(a^b) + eor x1,x1,x10,ror#61 + eor x2,x2,x13,lsr#7 // sigma0(X[i+1]) + add x27,x27,x16 // h+=Sigma1(e) + eor x28,x28,x21 // Maj(a,b,c) + eor x17,x3,x20,ror#39 // Sigma0(a) + eor x1,x1,x10,lsr#6 // sigma1(X[i+14]) + add x12,x12,x5 + add x23,x23,x27 // d+=h + add x27,x27,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + add x12,x12,x2 + add x27,x27,x17 // h+=Sigma0(a) + add x12,x12,x1 + ldr x1,[sp,#16] + str x4,[sp,#8] + ror x16,x23,#14 + add x26,x26,x28 // h+=K[i] + ror x3,x14,#1 + and x17,x24,x23 + ror x2,x11,#19 + bic x28,x25,x23 + ror x4,x27,#28 + add x26,x26,x12 // h+=X[i] + eor x16,x16,x23,ror#18 + eor x3,x3,x14,ror#8 + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x27,x20 // a^b, b^c in next round + eor x16,x16,x23,ror#41 // Sigma1(e) + eor x4,x4,x27,ror#34 + add x26,x26,x17 // h+=Ch(e,f,g) + and x19,x19,x28 // (b^c)&=(a^b) + eor x2,x2,x11,ror#61 + eor x3,x3,x14,lsr#7 // sigma0(X[i+1]) + add x26,x26,x16 // h+=Sigma1(e) + eor x19,x19,x20 // Maj(a,b,c) + eor x17,x4,x27,ror#39 // Sigma0(a) + eor x2,x2,x11,lsr#6 // sigma1(X[i+14]) + add x13,x13,x6 + add x22,x22,x26 // d+=h + add x26,x26,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + add x13,x13,x3 + add x26,x26,x17 // h+=Sigma0(a) + add x13,x13,x2 + ldr x2,[sp,#24] + str x5,[sp,#16] + ror x16,x22,#14 + add x25,x25,x19 // h+=K[i] + ror x4,x15,#1 + and x17,x23,x22 + ror x3,x12,#19 + bic x19,x24,x22 + ror x5,x26,#28 + add x25,x25,x13 // h+=X[i] + eor x16,x16,x22,ror#18 + eor x4,x4,x15,ror#8 + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x26,x27 // a^b, b^c in next round + eor x16,x16,x22,ror#41 // Sigma1(e) + eor x5,x5,x26,ror#34 + add x25,x25,x17 // h+=Ch(e,f,g) + and x28,x28,x19 // (b^c)&=(a^b) + eor x3,x3,x12,ror#61 + eor x4,x4,x15,lsr#7 // sigma0(X[i+1]) + add x25,x25,x16 // h+=Sigma1(e) + eor x28,x28,x27 // Maj(a,b,c) + eor x17,x5,x26,ror#39 // Sigma0(a) + eor x3,x3,x12,lsr#6 // sigma1(X[i+14]) + add x14,x14,x7 + add x21,x21,x25 // d+=h + add x25,x25,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + add x14,x14,x4 + add x25,x25,x17 // h+=Sigma0(a) + add x14,x14,x3 + ldr x3,[sp,#0] + str x6,[sp,#24] + ror x16,x21,#14 + add x24,x24,x28 // h+=K[i] + ror x5,x0,#1 + and x17,x22,x21 + ror x4,x13,#19 + bic x28,x23,x21 + ror x6,x25,#28 + add x24,x24,x14 // h+=X[i] + eor x16,x16,x21,ror#18 + eor x5,x5,x0,ror#8 + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x25,x26 // a^b, b^c in next round + eor x16,x16,x21,ror#41 // Sigma1(e) + eor x6,x6,x25,ror#34 + add x24,x24,x17 // h+=Ch(e,f,g) + and x19,x19,x28 // (b^c)&=(a^b) + eor x4,x4,x13,ror#61 + eor x5,x5,x0,lsr#7 // sigma0(X[i+1]) + add x24,x24,x16 // h+=Sigma1(e) + eor x19,x19,x26 // Maj(a,b,c) + eor x17,x6,x25,ror#39 // Sigma0(a) + eor x4,x4,x13,lsr#6 // sigma1(X[i+14]) + add x15,x15,x8 + add x20,x20,x24 // d+=h + add x24,x24,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + add x15,x15,x5 + add x24,x24,x17 // h+=Sigma0(a) + add x15,x15,x4 + ldr x4,[sp,#8] + str x7,[sp,#0] + ror x16,x20,#14 + add x23,x23,x19 // h+=K[i] + ror x6,x1,#1 + and x17,x21,x20 + ror x5,x14,#19 + bic x19,x22,x20 + ror x7,x24,#28 + add x23,x23,x15 // h+=X[i] + eor x16,x16,x20,ror#18 + eor x6,x6,x1,ror#8 + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x24,x25 // a^b, b^c in next round + eor x16,x16,x20,ror#41 // Sigma1(e) + eor x7,x7,x24,ror#34 + add x23,x23,x17 // h+=Ch(e,f,g) + and x28,x28,x19 // (b^c)&=(a^b) + eor x5,x5,x14,ror#61 + eor x6,x6,x1,lsr#7 // sigma0(X[i+1]) + add x23,x23,x16 // h+=Sigma1(e) + eor x28,x28,x25 // Maj(a,b,c) + eor x17,x7,x24,ror#39 // Sigma0(a) + eor x5,x5,x14,lsr#6 // sigma1(X[i+14]) + add x0,x0,x9 + add x27,x27,x23 // d+=h + add x23,x23,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + add x0,x0,x6 + add x23,x23,x17 // h+=Sigma0(a) + add x0,x0,x5 + ldr x5,[sp,#16] + str x8,[sp,#8] + ror x16,x27,#14 + add x22,x22,x28 // h+=K[i] + ror x7,x2,#1 + and x17,x20,x27 + ror x6,x15,#19 + bic x28,x21,x27 + ror x8,x23,#28 + add x22,x22,x0 // h+=X[i] + eor x16,x16,x27,ror#18 + eor x7,x7,x2,ror#8 + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x23,x24 // a^b, b^c in next round + eor x16,x16,x27,ror#41 // Sigma1(e) + eor x8,x8,x23,ror#34 + add x22,x22,x17 // h+=Ch(e,f,g) + and x19,x19,x28 // (b^c)&=(a^b) + eor x6,x6,x15,ror#61 + eor x7,x7,x2,lsr#7 // sigma0(X[i+1]) + add x22,x22,x16 // h+=Sigma1(e) + eor x19,x19,x24 // Maj(a,b,c) + eor x17,x8,x23,ror#39 // Sigma0(a) + eor x6,x6,x15,lsr#6 // sigma1(X[i+14]) + add x1,x1,x10 + add x26,x26,x22 // d+=h + add x22,x22,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + add x1,x1,x7 + add x22,x22,x17 // h+=Sigma0(a) + add x1,x1,x6 + ldr x6,[sp,#24] + str x9,[sp,#16] + ror x16,x26,#14 + add x21,x21,x19 // h+=K[i] + ror x8,x3,#1 + and x17,x27,x26 + ror x7,x0,#19 + bic x19,x20,x26 + ror x9,x22,#28 + add x21,x21,x1 // h+=X[i] + eor x16,x16,x26,ror#18 + eor x8,x8,x3,ror#8 + orr x17,x17,x19 // Ch(e,f,g) + eor x19,x22,x23 // a^b, b^c in next round + eor x16,x16,x26,ror#41 // Sigma1(e) + eor x9,x9,x22,ror#34 + add x21,x21,x17 // h+=Ch(e,f,g) + and x28,x28,x19 // (b^c)&=(a^b) + eor x7,x7,x0,ror#61 + eor x8,x8,x3,lsr#7 // sigma0(X[i+1]) + add x21,x21,x16 // h+=Sigma1(e) + eor x28,x28,x23 // Maj(a,b,c) + eor x17,x9,x22,ror#39 // Sigma0(a) + eor x7,x7,x0,lsr#6 // sigma1(X[i+14]) + add x2,x2,x11 + add x25,x25,x21 // d+=h + add x21,x21,x28 // h+=Maj(a,b,c) + ldr x28,[x30],#8 // *K++, x19 in next round + add x2,x2,x8 + add x21,x21,x17 // h+=Sigma0(a) + add x2,x2,x7 + ldr x7,[sp,#0] + str x10,[sp,#24] + ror x16,x25,#14 + add x20,x20,x28 // h+=K[i] + ror x9,x4,#1 + and x17,x26,x25 + ror x8,x1,#19 + bic x28,x27,x25 + ror x10,x21,#28 + add x20,x20,x2 // h+=X[i] + eor x16,x16,x25,ror#18 + eor x9,x9,x4,ror#8 + orr x17,x17,x28 // Ch(e,f,g) + eor x28,x21,x22 // a^b, b^c in next round + eor x16,x16,x25,ror#41 // Sigma1(e) + eor x10,x10,x21,ror#34 + add x20,x20,x17 // h+=Ch(e,f,g) + and x19,x19,x28 // (b^c)&=(a^b) + eor x8,x8,x1,ror#61 + eor x9,x9,x4,lsr#7 // sigma0(X[i+1]) + add x20,x20,x16 // h+=Sigma1(e) + eor x19,x19,x22 // Maj(a,b,c) + eor x17,x10,x21,ror#39 // Sigma0(a) + eor x8,x8,x1,lsr#6 // sigma1(X[i+14]) + add x3,x3,x12 + add x24,x24,x20 // d+=h + add x20,x20,x19 // h+=Maj(a,b,c) + ldr x19,[x30],#8 // *K++, x28 in next round + add x3,x3,x9 + add x20,x20,x17 // h+=Sigma0(a) + add x3,x3,x8 + cbnz x19,.Loop_16_xx + + ldp x0,x2,[x29,#96] + ldr x1,[x29,#112] + sub x30,x30,#648 // rewind + + ldp x3,x4,[x0] + ldp x5,x6,[x0,#2*8] + add x1,x1,#14*8 // advance input pointer + ldp x7,x8,[x0,#4*8] + add x20,x20,x3 + ldp x9,x10,[x0,#6*8] + add x21,x21,x4 + add x22,x22,x5 + add x23,x23,x6 + stp x20,x21,[x0] + add x24,x24,x7 + add x25,x25,x8 + stp x22,x23,[x0,#2*8] + add x26,x26,x9 + add x27,x27,x10 + cmp x1,x2 + stp x24,x25,[x0,#4*8] + stp x26,x27,[x0,#6*8] + b.ne .Loop + + ldp x19,x20,[x29,#16] + add sp,sp,#4*8 + ldp x21,x22,[x29,#32] + ldp x23,x24,[x29,#48] + ldp x25,x26,[x29,#64] + ldp x27,x28,[x29,#80] + ldp x29,x30,[sp],#128 + ret +.size sha512_block_data_order,.-sha512_block_data_order + +.align 6 +.type .LK512,%object +.LK512: + .quad 0x428a2f98d728ae22,0x7137449123ef65cd + .quad 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc + .quad 0x3956c25bf348b538,0x59f111f1b605d019 + .quad 0x923f82a4af194f9b,0xab1c5ed5da6d8118 + .quad 0xd807aa98a3030242,0x12835b0145706fbe + .quad 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2 + .quad 0x72be5d74f27b896f,0x80deb1fe3b1696b1 + .quad 0x9bdc06a725c71235,0xc19bf174cf692694 + .quad 0xe49b69c19ef14ad2,0xefbe4786384f25e3 + .quad 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65 + .quad 0x2de92c6f592b0275,0x4a7484aa6ea6e483 + .quad 0x5cb0a9dcbd41fbd4,0x76f988da831153b5 + .quad 0x983e5152ee66dfab,0xa831c66d2db43210 + .quad 0xb00327c898fb213f,0xbf597fc7beef0ee4 + .quad 0xc6e00bf33da88fc2,0xd5a79147930aa725 + .quad 0x06ca6351e003826f,0x142929670a0e6e70 + .quad 0x27b70a8546d22ffc,0x2e1b21385c26c926 + .quad 0x4d2c6dfc5ac42aed,0x53380d139d95b3df + .quad 0x650a73548baf63de,0x766a0abb3c77b2a8 + .quad 0x81c2c92e47edaee6,0x92722c851482353b + .quad 0xa2bfe8a14cf10364,0xa81a664bbc423001 + .quad 0xc24b8b70d0f89791,0xc76c51a30654be30 + .quad 0xd192e819d6ef5218,0xd69906245565a910 + .quad 0xf40e35855771202a,0x106aa07032bbd1b8 + .quad 0x19a4c116b8d2d0c8,0x1e376c085141ab53 + .quad 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8 + .quad 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb + .quad 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3 + .quad 0x748f82ee5defb2fc,0x78a5636f43172f60 + .quad 0x84c87814a1f0ab72,0x8cc702081a6439ec + .quad 0x90befffa23631e28,0xa4506cebde82bde9 + .quad 0xbef9a3f7b2c67915,0xc67178f2e372532b + .quad 0xca273eceea26619c,0xd186b8c721c0c207 + .quad 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178 + .quad 0x06f067aa72176fba,0x0a637dc5a2c898a6 + .quad 0x113f9804bef90dae,0x1b710b35131c471b + .quad 0x28db77f523047d84,0x32caab7b40c72493 + .quad 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c + .quad 0x4cc5d4becb3e42b6,0x597f299cfc657e2a + .quad 0x5fcb6fab3ad6faec,0x6c44198c4a475817 + .quad 0 // terminator +.size .LK512,.-.LK512 +#ifndef __KERNEL__ +.align 3 +.LOPENSSL_armcap_P: +# ifdef __ILP32__ + .long OPENSSL_armcap_P-. +# else + .quad OPENSSL_armcap_P-. +# endif +#endif +.asciz "SHA512 block transform for ARMv8, CRYPTOGAMS by appro@openssl.org" +.align 2 +#ifndef __KERNEL__ +.comm OPENSSL_armcap_P,4,4 +#endif --- a/arch/arm64/kernel/bpi.S +++ b/arch/arm64/kernel/bpi.S @@ -17,6 +17,7 @@ */
#include <linux/linkage.h> +#include <linux/arm-smccc.h>
.macro ventry target .rept 31 @@ -77,3 +78,22 @@ ENTRY(__psci_hyp_bp_inval_start) ldp x0, x1, [sp, #(16 * 8)] add sp, sp, #(8 * 18) ENTRY(__psci_hyp_bp_inval_end) + +.macro smccc_workaround_1 inst + sub sp, sp, #(8 * 4) + stp x2, x3, [sp, #(8 * 0)] + stp x0, x1, [sp, #(8 * 2)] + mov w0, #ARM_SMCCC_ARCH_WORKAROUND_1 + \inst #0 + ldp x2, x3, [sp, #(8 * 0)] + ldp x0, x1, [sp, #(8 * 2)] + add sp, sp, #(8 * 4) +.endm + +ENTRY(__smccc_workaround_1_smc_start) + smccc_workaround_1 smc +ENTRY(__smccc_workaround_1_smc_end) + +ENTRY(__smccc_workaround_1_hvc_start) + smccc_workaround_1 hvc +ENTRY(__smccc_workaround_1_hvc_end) --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -54,6 +54,10 @@ DEFINE_PER_CPU_READ_MOSTLY(struct bp_har
#ifdef CONFIG_KVM extern char __psci_hyp_bp_inval_start[], __psci_hyp_bp_inval_end[]; +extern char __smccc_workaround_1_smc_start[]; +extern char __smccc_workaround_1_smc_end[]; +extern char __smccc_workaround_1_hvc_start[]; +extern char __smccc_workaround_1_hvc_end[];
static void __copy_hyp_vect_bpi(int slot, const char *hyp_vecs_start, const char *hyp_vecs_end) @@ -96,8 +100,12 @@ static void __install_bp_hardening_cb(bp spin_unlock(&bp_lock); } #else -#define __psci_hyp_bp_inval_start NULL -#define __psci_hyp_bp_inval_end NULL +#define __psci_hyp_bp_inval_start NULL +#define __psci_hyp_bp_inval_end NULL +#define __smccc_workaround_1_smc_start NULL +#define __smccc_workaround_1_smc_end NULL +#define __smccc_workaround_1_hvc_start NULL +#define __smccc_workaround_1_hvc_end NULL
static void __install_bp_hardening_cb(bp_hardening_cb_t fn, const char *hyp_vecs_start, @@ -124,17 +132,75 @@ static void install_bp_hardening_cb(con __install_bp_hardening_cb(fn, hyp_vecs_start, hyp_vecs_end); }
+#include <uapi/linux/psci.h> +#include <linux/arm-smccc.h> #include <linux/psci.h>
+static void call_smc_arch_workaround_1(void) +{ + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_1, NULL); +} + +static void call_hvc_arch_workaround_1(void) +{ + arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_1, NULL); +} + +static bool check_smccc_arch_workaround_1(const struct arm64_cpu_capabilities *entry) +{ + bp_hardening_cb_t cb; + void *smccc_start, *smccc_end; + struct arm_smccc_res res; + + if (!entry->matches(entry, SCOPE_LOCAL_CPU)) + return false; + + if (psci_ops.smccc_version == SMCCC_VERSION_1_0) + return false; + + switch (psci_ops.conduit) { + case PSCI_CONDUIT_HVC: + arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, + ARM_SMCCC_ARCH_WORKAROUND_1, &res); + if (res.a0) + return false; + cb = call_hvc_arch_workaround_1; + smccc_start = __smccc_workaround_1_hvc_start; + smccc_end = __smccc_workaround_1_hvc_end; + break; + + case PSCI_CONDUIT_SMC: + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, + ARM_SMCCC_ARCH_WORKAROUND_1, &res); + if (res.a0) + return false; + cb = call_smc_arch_workaround_1; + smccc_start = __smccc_workaround_1_smc_start; + smccc_end = __smccc_workaround_1_smc_end; + break; + + default: + return false; + } + + install_bp_hardening_cb(entry, cb, smccc_start, smccc_end); + + return true; +} + static int enable_psci_bp_hardening(void *data) { const struct arm64_cpu_capabilities *entry = data;
- if (psci_ops.get_version) + if (psci_ops.get_version) { + if (check_smccc_arch_workaround_1(entry)) + return 0; + install_bp_hardening_cb(entry, (bp_hardening_cb_t)psci_ops.get_version, __psci_hyp_bp_inval_start, __psci_hyp_bp_inval_end); + }
return 0; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
From: Marc Zyngier marc.zyngier@arm.com
commit 3a0a397ff5ff8b56ca9f7908b75dee6bf0b5fabb upstream.
Now that we've standardised on SMCCC v1.1 to perform the branch prediction invalidation, let's drop the previous band-aid. If vendors haven't updated their firmware to do SMCCC 1.1, they haven't updated PSCI either, so we don't loose anything.
Tested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Mark Rutland mark.rutland@arm.com [v4.9 backport] Tested-by: Greg Hackmann ghackmann@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/bpi.S | 24 --------------------- arch/arm64/kernel/cpu_errata.c | 45 +++++++++++------------------------------ arch/arm64/kvm/hyp/switch.c | 14 ------------ 3 files changed, 13 insertions(+), 70 deletions(-)
--- a/arch/arm64/kernel/bpi.S +++ b/arch/arm64/kernel/bpi.S @@ -54,30 +54,6 @@ ENTRY(__bp_harden_hyp_vecs_start) vectors __kvm_hyp_vector .endr ENTRY(__bp_harden_hyp_vecs_end) -ENTRY(__psci_hyp_bp_inval_start) - sub sp, sp, #(8 * 18) - stp x16, x17, [sp, #(16 * 0)] - stp x14, x15, [sp, #(16 * 1)] - stp x12, x13, [sp, #(16 * 2)] - stp x10, x11, [sp, #(16 * 3)] - stp x8, x9, [sp, #(16 * 4)] - stp x6, x7, [sp, #(16 * 5)] - stp x4, x5, [sp, #(16 * 6)] - stp x2, x3, [sp, #(16 * 7)] - stp x0, x1, [sp, #(16 * 8)] - mov x0, #0x84000000 - smc #0 - ldp x16, x17, [sp, #(16 * 0)] - ldp x14, x15, [sp, #(16 * 1)] - ldp x12, x13, [sp, #(16 * 2)] - ldp x10, x11, [sp, #(16 * 3)] - ldp x8, x9, [sp, #(16 * 4)] - ldp x6, x7, [sp, #(16 * 5)] - ldp x4, x5, [sp, #(16 * 6)] - ldp x2, x3, [sp, #(16 * 7)] - ldp x0, x1, [sp, #(16 * 8)] - add sp, sp, #(8 * 18) -ENTRY(__psci_hyp_bp_inval_end)
.macro smccc_workaround_1 inst sub sp, sp, #(8 * 4) --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -53,7 +53,6 @@ static int cpu_enable_trap_ctr_access(vo DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
#ifdef CONFIG_KVM -extern char __psci_hyp_bp_inval_start[], __psci_hyp_bp_inval_end[]; extern char __smccc_workaround_1_smc_start[]; extern char __smccc_workaround_1_smc_end[]; extern char __smccc_workaround_1_hvc_start[]; @@ -100,8 +99,6 @@ static void __install_bp_hardening_cb(bp spin_unlock(&bp_lock); } #else -#define __psci_hyp_bp_inval_start NULL -#define __psci_hyp_bp_inval_end NULL #define __smccc_workaround_1_smc_start NULL #define __smccc_workaround_1_smc_end NULL #define __smccc_workaround_1_hvc_start NULL @@ -146,24 +143,25 @@ static void call_hvc_arch_workaround_1(v arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_1, NULL); }
-static bool check_smccc_arch_workaround_1(const struct arm64_cpu_capabilities *entry) +static int enable_smccc_arch_workaround_1(void *data) { + const struct arm64_cpu_capabilities *entry = data; bp_hardening_cb_t cb; void *smccc_start, *smccc_end; struct arm_smccc_res res;
if (!entry->matches(entry, SCOPE_LOCAL_CPU)) - return false; + return 0;
if (psci_ops.smccc_version == SMCCC_VERSION_1_0) - return false; + return 0;
switch (psci_ops.conduit) { case PSCI_CONDUIT_HVC: arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_ARCH_WORKAROUND_1, &res); if (res.a0) - return false; + return 0; cb = call_hvc_arch_workaround_1; smccc_start = __smccc_workaround_1_hvc_start; smccc_end = __smccc_workaround_1_hvc_end; @@ -173,35 +171,18 @@ static bool check_smccc_arch_workaround_ arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_ARCH_WORKAROUND_1, &res); if (res.a0) - return false; + return 0; cb = call_smc_arch_workaround_1; smccc_start = __smccc_workaround_1_smc_start; smccc_end = __smccc_workaround_1_smc_end; break;
default: - return false; + return 0; }
install_bp_hardening_cb(entry, cb, smccc_start, smccc_end);
- return true; -} - -static int enable_psci_bp_hardening(void *data) -{ - const struct arm64_cpu_capabilities *entry = data; - - if (psci_ops.get_version) { - if (check_smccc_arch_workaround_1(entry)) - return 0; - - install_bp_hardening_cb(entry, - (bp_hardening_cb_t)psci_ops.get_version, - __psci_hyp_bp_inval_start, - __psci_hyp_bp_inval_end); - } - return 0; } #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */ @@ -301,32 +282,32 @@ const struct arm64_cpu_capabilities arm6 { .capability = ARM64_HARDEN_BRANCH_PREDICTOR, MIDR_ALL_VERSIONS(MIDR_CORTEX_A57), - .enable = enable_psci_bp_hardening, + .enable = enable_smccc_arch_workaround_1, }, { .capability = ARM64_HARDEN_BRANCH_PREDICTOR, MIDR_ALL_VERSIONS(MIDR_CORTEX_A72), - .enable = enable_psci_bp_hardening, + .enable = enable_smccc_arch_workaround_1, }, { .capability = ARM64_HARDEN_BRANCH_PREDICTOR, MIDR_ALL_VERSIONS(MIDR_CORTEX_A73), - .enable = enable_psci_bp_hardening, + .enable = enable_smccc_arch_workaround_1, }, { .capability = ARM64_HARDEN_BRANCH_PREDICTOR, MIDR_ALL_VERSIONS(MIDR_CORTEX_A75), - .enable = enable_psci_bp_hardening, + .enable = enable_smccc_arch_workaround_1, }, { .capability = ARM64_HARDEN_BRANCH_PREDICTOR, MIDR_ALL_VERSIONS(MIDR_BRCM_VULCAN), - .enable = enable_psci_bp_hardening, + .enable = enable_smccc_arch_workaround_1, }, { .capability = ARM64_HARDEN_BRANCH_PREDICTOR, MIDR_ALL_VERSIONS(MIDR_CAVIUM_THUNDERX2), - .enable = enable_psci_bp_hardening, + .enable = enable_smccc_arch_workaround_1, }, #endif { --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -311,20 +311,6 @@ again: if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu)) goto again;
- if (exit_code == ARM_EXCEPTION_TRAP && - (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_HVC64 || - kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_HVC32)) { - u32 val = vcpu_get_reg(vcpu, 0); - - if (val == PSCI_0_2_FN_PSCI_VERSION) { - val = kvm_psci_version(vcpu, kern_hyp_va(vcpu->kvm)); - if (unlikely(val == KVM_ARM_PSCI_0_1)) - val = PSCI_RET_NOT_SUPPORTED; - vcpu_set_reg(vcpu, 0, val); - goto again; - } - } - if (static_branch_unlikely(&vgic_v2_cpuif_trap) && exit_code == ARM_EXCEPTION_TRAP) { bool valid;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
commit f3aefb6a7066e24bfea7fcf1b07907576de69d63 upstream.
make_checksum_hmac_md5() is allocating an HMAC transform and doing crypto API calls in the following order:
crypto_ahash_init() crypto_ahash_setkey() crypto_ahash_digest()
This is wrong because it makes no sense to init() the request before a key has been set, given that the initial state depends on the key. And digest() is short for init() + update() + final(), so in this case there's no need to explicitly call init() at all.
Before commit 9fa68f620041 ("crypto: hash - prevent using keyed hashes without setting key") the extra init() had no real effect, at least for the software HMAC implementation. (There are also hardware drivers that implement HMAC-MD5, and it's not immediately obvious how gracefully they handle init() before setkey().) But now the crypto API detects this incorrect initialization and returns -ENOKEY. This is breaking NFS mounts in some cases.
Fix it by removing the incorrect call to crypto_ahash_init().
Reported-by: Michael Young m.a.young@durham.ac.uk Fixes: 9fa68f620041 ("crypto: hash - prevent using keyed hashes without setting key") Fixes: fffdaef2eb4a ("gss_krb5: Add support for rc4-hmac encryption") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/sunrpc/auth_gss/gss_krb5_crypto.c | 3 --- 1 file changed, 3 deletions(-)
--- a/net/sunrpc/auth_gss/gss_krb5_crypto.c +++ b/net/sunrpc/auth_gss/gss_krb5_crypto.c @@ -237,9 +237,6 @@ make_checksum_hmac_md5(struct krb5_ctx *
ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
- err = crypto_ahash_init(req); - if (err) - goto out; err = crypto_ahash_setkey(hmac_md5, cksumkey, kctx->gk5e->keylength); if (err) goto out;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 7525a238be8f46617cdda29d1be5b85ffe3b042d which is commit 94df1040b1e6aacd8dec0ba3c61d7e77cd695f26 upstream.
It breaks the build of perf on 4.9.y, so I'm dropping it.
Reported-by: Pavlos Parissis pavlos.parissis@gmail.com Reported-by: Lei Chen chenl.lei@gmail.com Reported-by: Maxime Hadjinlian maxime.hadjinlian@gmail.com Cc: Namhyung Kim namhyung@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Jiri Olsa jolsa@kernel.org Cc: David Ahern dsahern@gmail.com Cc: Peter Zijlstra a.p.zijlstra@chello.nl Cc: Wang Nan wangnan0@huawei.com Cc: kernel-team@lge.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Sasha Levin alexander.levin@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/perf/tests/code-reading.c | 20 +------------------- 1 file changed, 1 insertion(+), 19 deletions(-)
--- a/tools/perf/tests/code-reading.c +++ b/tools/perf/tests/code-reading.c @@ -224,8 +224,6 @@ static int read_object_code(u64 addr, si unsigned char buf2[BUFSZ]; size_t ret_len; u64 objdump_addr; - const char *objdump_name; - char decomp_name[KMOD_DECOMP_LEN]; int ret;
pr_debug("Reading object code for memory address: %#"PRIx64"\n", addr); @@ -286,25 +284,9 @@ static int read_object_code(u64 addr, si state->done[state->done_cnt++] = al.map->start; }
- objdump_name = al.map->dso->long_name; - if (dso__needs_decompress(al.map->dso)) { - if (dso__decompress_kmodule_path(al.map->dso, objdump_name, - decomp_name, - sizeof(decomp_name)) < 0) { - pr_debug("decompression failed\n"); - return -1; - } - - objdump_name = decomp_name; - } - /* Read the object code using objdump */ objdump_addr = map__rip_2objdump(al.map, al.addr); - ret = read_via_objdump(objdump_name, objdump_addr, buf2, len); - - if (dso__needs_decompress(al.map->dso)) - unlink(objdump_name); - + ret = read_via_objdump(al.map->dso->long_name, objdump_addr, buf2, len); if (ret > 0) { /* * The kernel maps are inaccurate - assume objdump is right in
On 04/17/2018 08:59 AM, Greg Kroah-Hartman wrote:
4.9-stable review patch. If anyone has any objections, please let me know.
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 7525a238be8f46617cdda29d1be5b85ffe3b042d which is commit 94df1040b1e6aacd8dec0ba3c61d7e77cd695f26 upstream.
It breaks the build of perf on 4.9.y, so I'm dropping it.
Sorry to hijack this thread, I was not able to find the original email when the offending patch was included in 4.1.52. So kernel 4.1.52 also has the same problem, can you push a 4.1.53 tag with that patch reverted as well?
Thank you!
Reported-by: Pavlos Parissis pavlos.parissis@gmail.com Reported-by: Lei Chen chenl.lei@gmail.com Reported-by: Maxime Hadjinlian maxime.hadjinlian@gmail.com Cc: Namhyung Kim namhyung@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Jiri Olsa jolsa@kernel.org Cc: David Ahern dsahern@gmail.com Cc: Peter Zijlstra a.p.zijlstra@chello.nl Cc: Wang Nan wangnan0@huawei.com Cc: kernel-team@lge.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Sasha Levin alexander.levin@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
tools/perf/tests/code-reading.c | 20 +------------------- 1 file changed, 1 insertion(+), 19 deletions(-)
--- a/tools/perf/tests/code-reading.c +++ b/tools/perf/tests/code-reading.c @@ -224,8 +224,6 @@ static int read_object_code(u64 addr, si unsigned char buf2[BUFSZ]; size_t ret_len; u64 objdump_addr;
- const char *objdump_name;
- char decomp_name[KMOD_DECOMP_LEN]; int ret;
pr_debug("Reading object code for memory address: %#"PRIx64"\n", addr); @@ -286,25 +284,9 @@ static int read_object_code(u64 addr, si state->done[state->done_cnt++] = al.map->start; }
- objdump_name = al.map->dso->long_name;
- if (dso__needs_decompress(al.map->dso)) {
if (dso__decompress_kmodule_path(al.map->dso, objdump_name,
decomp_name,
sizeof(decomp_name)) < 0) {
pr_debug("decompression failed\n");
return -1;
}
objdump_name = decomp_name;
- }
- /* Read the object code using objdump */ objdump_addr = map__rip_2objdump(al.map, al.addr);
- ret = read_via_objdump(objdump_name, objdump_addr, buf2, len);
- if (dso__needs_decompress(al.map->dso))
unlink(objdump_name);
- ret = read_via_objdump(al.map->dso->long_name, objdump_addr, buf2, len); if (ret > 0) { /*
- The kernel maps are inaccurate - assume objdump is right in
On Wed, Sep 05, 2018 at 11:50:02AM -0700, Florian Fainelli wrote:
On 04/17/2018 08:59 AM, Greg Kroah-Hartman wrote:
4.9-stable review patch. If anyone has any objections, please let me know.
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 7525a238be8f46617cdda29d1be5b85ffe3b042d which is commit 94df1040b1e6aacd8dec0ba3c61d7e77cd695f26 upstream.
It breaks the build of perf on 4.9.y, so I'm dropping it.
Sorry to hijack this thread, I was not able to find the original email when the offending patch was included in 4.1.52. So kernel 4.1.52 also has the same problem, can you push a 4.1.53 tag with that patch reverted as well?
4.1 is long end-of-life now, I'm not going to be going and updating a "dead' kernel for a perf build bug.
You shouldn't be using this kernel either :)
thanks,
greg k-h
On 09/05/2018 12:29 PM, Greg Kroah-Hartman wrote:
On Wed, Sep 05, 2018 at 11:50:02AM -0700, Florian Fainelli wrote:
On 04/17/2018 08:59 AM, Greg Kroah-Hartman wrote:
4.9-stable review patch. If anyone has any objections, please let me know.
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 7525a238be8f46617cdda29d1be5b85ffe3b042d which is commit 94df1040b1e6aacd8dec0ba3c61d7e77cd695f26 upstream.
It breaks the build of perf on 4.9.y, so I'm dropping it.
Sorry to hijack this thread, I was not able to find the original email when the offending patch was included in 4.1.52. So kernel 4.1.52 also has the same problem, can you push a 4.1.53 tag with that patch reverted as well?
4.1 is long end-of-life now, I'm not going to be going and updating a "dead' kernel for a perf build bug.
Meh, fair enough, I reverted the offending commit, it still points to a more fundamental problem, there is not always build testing of the tools being shipped with the kernel unfortunately.
You shouldn't be using this kernel either :)
Fortunately 4.9 is what we are mostly using these days, still getting occasional 4.1 kernel support requests unfortunately...
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 1e047eaab3bb5564f25b41e9cd3a053009f4e789 upstream.
syzbot is reporting deadlocks at __blkdev_get() [1].
---------------------------------------- [ 92.493919] systemd-udevd D12696 525 1 0x00000000 [ 92.495891] Call Trace: [ 92.501560] schedule+0x23/0x80 [ 92.502923] schedule_preempt_disabled+0x5/0x10 [ 92.504645] __mutex_lock+0x416/0x9e0 [ 92.510760] __blkdev_get+0x73/0x4f0 [ 92.512220] blkdev_get+0x12e/0x390 [ 92.518151] do_dentry_open+0x1c3/0x2f0 [ 92.519815] path_openat+0x5d9/0xdc0 [ 92.521437] do_filp_open+0x7d/0xf0 [ 92.527365] do_sys_open+0x1b8/0x250 [ 92.528831] do_syscall_64+0x6e/0x270 [ 92.530341] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 92.931922] 1 lock held by systemd-udevd/525: [ 92.933642] #0: 00000000a2849e25 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x73/0x4f0 ----------------------------------------
The reason of deadlock turned out that wait_event_interruptible() in blk_queue_enter() got stuck with bdev->bd_mutex held at __blkdev_put() due to q->mq_freeze_depth == 1.
---------------------------------------- [ 92.787172] a.out S12584 634 633 0x80000002 [ 92.789120] Call Trace: [ 92.796693] schedule+0x23/0x80 [ 92.797994] blk_queue_enter+0x3cb/0x540 [ 92.803272] generic_make_request+0xf0/0x3d0 [ 92.807970] submit_bio+0x67/0x130 [ 92.810928] submit_bh_wbc+0x15e/0x190 [ 92.812461] __block_write_full_page+0x218/0x460 [ 92.815792] __writepage+0x11/0x50 [ 92.817209] write_cache_pages+0x1ae/0x3d0 [ 92.825585] generic_writepages+0x5a/0x90 [ 92.831865] do_writepages+0x43/0xd0 [ 92.836972] __filemap_fdatawrite_range+0xc1/0x100 [ 92.838788] filemap_write_and_wait+0x24/0x70 [ 92.840491] __blkdev_put+0x69/0x1e0 [ 92.841949] blkdev_close+0x16/0x20 [ 92.843418] __fput+0xda/0x1f0 [ 92.844740] task_work_run+0x87/0xb0 [ 92.846215] do_exit+0x2f5/0xba0 [ 92.850528] do_group_exit+0x34/0xb0 [ 92.852018] SyS_exit_group+0xb/0x10 [ 92.853449] do_syscall_64+0x6e/0x270 [ 92.854944] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 92.943530] 1 lock held by a.out/634: [ 92.945105] #0: 00000000a2849e25 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x3c/0x1e0 ----------------------------------------
The reason of q->mq_freeze_depth == 1 turned out that loop_set_status() forgot to call blk_mq_unfreeze_queue() at error paths for info->lo_encrypt_type != NULL case.
---------------------------------------- [ 37.509497] CPU: 2 PID: 634 Comm: a.out Tainted: G W 4.16.0+ #457 [ 37.513608] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017 [ 37.518832] RIP: 0010:blk_freeze_queue_start+0x17/0x40 [ 37.521778] RSP: 0018:ffffb0c2013e7c60 EFLAGS: 00010246 [ 37.524078] RAX: 0000000000000000 RBX: ffff8b07b1519798 RCX: 0000000000000000 [ 37.527015] RDX: 0000000000000002 RSI: ffffb0c2013e7cc0 RDI: ffff8b07b1519798 [ 37.529934] RBP: ffffb0c2013e7cc0 R08: 0000000000000008 R09: 47a189966239b898 [ 37.532684] R10: dad78b99b278552f R11: 9332dca72259d5ef R12: ffff8b07acd73678 [ 37.535452] R13: 0000000000004c04 R14: 0000000000000000 R15: ffff8b07b841e940 [ 37.538186] FS: 00007fede33b9740(0000) GS:ffff8b07b8e80000(0000) knlGS:0000000000000000 [ 37.541168] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 37.543590] CR2: 00000000206fdf18 CR3: 0000000130b30006 CR4: 00000000000606e0 [ 37.546410] Call Trace: [ 37.547902] blk_freeze_queue+0x9/0x30 [ 37.549968] loop_set_status+0x67/0x3c0 [loop] [ 37.549975] loop_set_status64+0x3b/0x70 [loop] [ 37.549986] lo_ioctl+0x223/0x810 [loop] [ 37.549995] blkdev_ioctl+0x572/0x980 [ 37.550003] block_ioctl+0x34/0x40 [ 37.550006] do_vfs_ioctl+0xa7/0x6d0 [ 37.550017] ksys_ioctl+0x6b/0x80 [ 37.573076] SyS_ioctl+0x5/0x10 [ 37.574831] do_syscall_64+0x6e/0x270 [ 37.576769] entry_SYSCALL_64_after_hwframe+0x42/0xb7 ----------------------------------------
[1] https://syzkaller.appspot.com/bug?id=cd662bc3f6022c0979d01a262c318fab2ee9b56...
Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Reported-by: syzbot bot+48594378e9851eab70bcd6f99327c7db58c5a28a@syzkaller.appspotmail.com Fixes: ecdd09597a572513 ("block/loop: fix race between I/O and set_status") Cc: Ming Lei tom.leiming@gmail.com Cc: Dmitry Vyukov dvyukov@google.com Cc: stable stable@vger.kernel.org Cc: Jens Axboe axboe@fb.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/loop.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1110,11 +1110,15 @@ loop_set_status(struct loop_device *lo, if (info->lo_encrypt_type) { unsigned int type = info->lo_encrypt_type;
- if (type >= MAX_LO_CRYPT) - return -EINVAL; + if (type >= MAX_LO_CRYPT) { + err = -EINVAL; + goto exit; + } xfer = xfer_funcs[type]; - if (xfer == NULL) - return -EINVAL; + if (xfer == NULL) { + err = -EINVAL; + goto exit; + } } else xfer = NULL;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Williams dan.j.williams@intel.com
commit 8d0d8ed3356aa9ed43b819aaedd39b08ca453007 upstream.
Commit 1cf03c00e7c1 "nfit: scrub and register regions in a workqueue" mistakenly attempts to register a region per BLK aperture. There is nothing to register for individual apertures as they belong as a set to a BLK aperture group that are registered with a corresponding DIMM-control-region. Filter them for registration to prevent some needless devm_kzalloc() allocations.
Cc: stable@vger.kernel.org Fixes: 1cf03c00e7c1 ("nfit: scrub and register regions in a workqueue") Reviewed-by: Dave Jiang dave.jiang@intel.com Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/nfit/core.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-)
--- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -2547,15 +2547,21 @@ static void acpi_nfit_scrub(struct work_ static int acpi_nfit_register_regions(struct acpi_nfit_desc *acpi_desc) { struct nfit_spa *nfit_spa; - int rc;
- list_for_each_entry(nfit_spa, &acpi_desc->spas, list) - if (nfit_spa_type(nfit_spa->spa) == NFIT_SPA_DCR) { - /* BLK regions don't need to wait for ars results */ - rc = acpi_nfit_register_region(acpi_desc, nfit_spa); - if (rc) - return rc; - } + list_for_each_entry(nfit_spa, &acpi_desc->spas, list) { + int rc, type = nfit_spa_type(nfit_spa->spa); + + /* PMEM and VMEM will be registered by the ARS workqueue */ + if (type == NFIT_SPA_PM || type == NFIT_SPA_VOLATILE) + continue; + /* BLK apertures belong to BLK region registration below */ + if (type == NFIT_SPA_BDW) + continue; + /* BLK regions don't need to wait for ARS results */ + rc = acpi_nfit_register_region(acpi_desc, nfit_spa); + if (rc) + return rc; + }
queue_work(nfit_wq, &acpi_desc->work); return 0;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.vnet.ibm.com
commit dae55b6fef58530c13df074bcc182c096609339e upstream.
Immediate retry of EQBS after CCQ 96 means that we potentially misreport the state of buffers inspected during the first EQBS call.
This occurs when 1. the first EQBS finds all inspected buffers still in the initial state set by the driver (ie INPUT EMPTY or OUTPUT PRIMED), 2. the EQBS terminates early with CCQ 96, and 3. by the time that the second EQBS comes around, the state of those previously inspected buffers has changed.
If the state reported by the second EQBS is 'driver-owned', all we know is that the previous buffers are driver-owned now as well. But we can't tell if they all have the same state. So for instance - the second EQBS reports OUTPUT EMPTY, but any number of the previous buffers could be OUTPUT ERROR by now, - the second EQBS reports OUTPUT ERROR, but any number of the previous buffers could be OUTPUT EMPTY by now.
Effectively, this can result in both over- and underreporting of errors.
If the state reported by the second EQBS is 'HW-owned', that doesn't guarantee that the previous buffers have not been switched to driver-owned in the mean time. So for instance - the second EQBS reports INPUT EMPTY, but any number of the previous buffers could be INPUT PRIMED (or INPUT ERROR) by now.
This would result in failure to process pending work on the queue. If it's the final check before yielding initiative, this can cause a (temporary) queue stall due to IRQ avoidance.
Fixes: 25f269f17316 ("[S390] qdio: EQBS retry after CCQ 96") Cc: stable@vger.kernel.org #v3.2+ Signed-off-by: Julian Wiedmann jwi@linux.vnet.ibm.com Reviewed-by: Benjamin Block bblock@linux.vnet.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/s390/cio/qdio_main.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)
--- a/drivers/s390/cio/qdio_main.c +++ b/drivers/s390/cio/qdio_main.c @@ -126,7 +126,7 @@ static inline int qdio_check_ccq(struct static int qdio_do_eqbs(struct qdio_q *q, unsigned char *state, int start, int count, int auto_ack) { - int rc, tmp_count = count, tmp_start = start, nr = q->nr, retried = 0; + int rc, tmp_count = count, tmp_start = start, nr = q->nr; unsigned int ccq = 0;
qperf_inc(q, eqbs); @@ -149,14 +149,7 @@ again: qperf_inc(q, eqbs_partial); DBF_DEV_EVENT(DBF_WARN, q->irq_ptr, "EQBS part:%02x", tmp_count); - /* - * Retry once, if that fails bail out and process the - * extracted buffers before trying again. - */ - if (!retried++) - goto again; - else - return count - tmp_count; + return count - tmp_count; }
DBF_ERROR("%4x EQBS ERROR", SCH_NO(q));
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.vnet.ibm.com
commit 0cf1e05157b9e5530dcc3ca9fec9bf617fc93375 upstream.
On an Output queue, both EMPTY and PENDING buffer states imply that the buffer is ready for completion-processing by the upper-layer drivers.
So for a non-QEBSM Output queue, get_buf_states() merges mixed batches of PENDING and EMPTY buffers into one large batch of EMPTY buffers. The upper-layer driver (ie. qeth) later distuingishes PENDING from EMPTY by inspecting the slsb_state for QDIO_OUTBUF_STATE_FLAG_PENDING.
But the merge logic in get_buf_states() contains a bug that causes us to erronously also merge ERROR buffers into such a batch of EMPTY buffers (ERROR is 0xaf, EMPTY is 0xa1; so ERROR & EMPTY == EMPTY). Effectively, most outbound ERROR buffers are currently discarded silently and processed as if they had succeeded.
Note that this affects _all_ non-QEBSM device types, not just IQD with CQ.
Fix it by explicitly spelling out the exact conditions for merging.
For extracting the "get initial state" part out of the loop, this relies on the fact that get_buf_states() is never called with a count of 0. The QEBSM path already strictly requires this, and the two callers with variable 'count' make sure of it.
Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks") Cc: stable@vger.kernel.org #v3.2+ Signed-off-by: Julian Wiedmann jwi@linux.vnet.ibm.com Reviewed-by: Ursula Braun ubraun@linux.vnet.ibm.com Reviewed-by: Benjamin Block bblock@linux.vnet.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/s390/cio/qdio_main.c | 31 ++++++++++++++++++++----------- 1 file changed, 20 insertions(+), 11 deletions(-)
--- a/drivers/s390/cio/qdio_main.c +++ b/drivers/s390/cio/qdio_main.c @@ -205,7 +205,10 @@ again: return 0; }
-/* returns number of examined buffers and their common state in *state */ +/* + * Returns number of examined buffers and their common state in *state. + * Requested number of buffers-to-examine must be > 0. + */ static inline int get_buf_states(struct qdio_q *q, unsigned int bufnr, unsigned char *state, unsigned int count, int auto_ack, int merge_pending) @@ -216,17 +219,23 @@ static inline int get_buf_states(struct if (is_qebsm(q)) return qdio_do_eqbs(q, state, bufnr, count, auto_ack);
- for (i = 0; i < count; i++) { - if (!__state) { - __state = q->slsb.val[bufnr]; - if (merge_pending && __state == SLSB_P_OUTPUT_PENDING) - __state = SLSB_P_OUTPUT_EMPTY; - } else if (merge_pending) { - if ((q->slsb.val[bufnr] & __state) != __state) - break; - } else if (q->slsb.val[bufnr] != __state) - break; + /* get initial state: */ + __state = q->slsb.val[bufnr]; + if (merge_pending && __state == SLSB_P_OUTPUT_PENDING) + __state = SLSB_P_OUTPUT_EMPTY; + + for (i = 1; i < count; i++) { bufnr = next_buf(bufnr); + + /* merge PENDING into EMPTY: */ + if (merge_pending && + q->slsb.val[bufnr] == SLSB_P_OUTPUT_PENDING && + __state == SLSB_P_OUTPUT_EMPTY) + continue; + + /* stop if next state differs from initial state: */ + if (q->slsb.val[bufnr] != __state) + break; } *state = __state; return i;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Gorbik gor@linux.ibm.com
commit 15deb080a6087b73089139569558965750e69d67 upstream.
When loadparm is set in reipl parm block, the kernel should also set DIAG308_FLAGS_LP_VALID flag.
This fixes loadparm ignoring during z/VM fcp -> ccw reipl and kvm direct boot -> ccw reipl.
Cc: stable@vger.kernel.org Reviewed-by: Heiko Carstens heiko.carstens@de.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/kernel/ipl.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/s390/kernel/ipl.c +++ b/arch/s390/kernel/ipl.c @@ -798,6 +798,7 @@ static ssize_t reipl_generic_loadparm_st /* copy and convert to ebcdic */ memcpy(ipb->hdr.loadparm, buf, lp_len); ASCEBC(ipb->hdr.loadparm, LOADPARM_LEN); + ipb->hdr.flags |= DIAG308_FLAGS_LP_VALID; return len; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Al Viro viro@zeniv.linux.org.uk
commit 30ce4d1903e1d8a7ccd110860a5eef3c638ed8be upstream.
missed it in "kill struct filename.separate" several years ago.
Cc: stable@vger.kernel.org Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/namei.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/namei.c +++ b/fs/namei.c @@ -221,9 +221,10 @@ getname_kernel(const char * filename) if (len <= EMBEDDED_NAME_MAX) { result->name = (char *)result->iname; } else if (len <= PATH_MAX) { + const size_t size = offsetof(struct filename, iname[1]); struct filename *tmp;
- tmp = kmalloc(sizeof(*tmp), GFP_KERNEL); + tmp = kmalloc(size, GFP_KERNEL); if (unlikely(!tmp)) { __putname(result); return ERR_PTR(-ENOMEM);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Szymon Janc szymon.janc@codecoup.pl
commit 082f2300cfa1a3d9d5221c38c5eba85d4ab98bd8 upstream.
Local random address needs to be updated before creating connection if RPA from LE Direct Advertising Report was resolved in host. Otherwise remote device might ignore connection request due to address mismatch.
This was affecting following qualification test cases: GAP/CONN/SCEP/BV-03-C, GAP/CONN/GCEP/BV-05-C, GAP/CONN/DCEP/BV-05-C
Before patch: < HCI Command: LE Set Random Address (0x08|0x0005) plen 6 #11350 [hci0] 84680.231216 Address: 56:BC:E8:24:11:68 (Resolvable) Identity type: Random (0x01) Identity: F2:F1:06:3D:9C:42 (Static)
HCI Event: Command Complete (0x0e) plen 4 #11351 [hci0] 84680.246022
LE Set Random Address (0x08|0x0005) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7 #11352 [hci0] 84680.246417 Type: Passive (0x00) Interval: 60.000 msec (0x0060) Window: 30.000 msec (0x0030) Own address type: Random (0x01) Filter policy: Accept all advertisement, inc. directed unresolved RPA (0x02)
HCI Event: Command Complete (0x0e) plen 4 #11353 [hci0] 84680.248854
LE Set Scan Parameters (0x08|0x000b) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #11354 [hci0] 84680.249466 Scanning: Enabled (0x01) Filter duplicates: Enabled (0x01)
HCI Event: Command Complete (0x0e) plen 4 #11355 [hci0] 84680.253222
LE Set Scan Enable (0x08|0x000c) ncmd 1 Status: Success (0x00)
HCI Event: LE Meta Event (0x3e) plen 18 #11356 [hci0] 84680.458387
LE Direct Advertising Report (0x0b) Num reports: 1 Event type: Connectable directed - ADV_DIRECT_IND (0x01) Address type: Random (0x01) Address: 53:38:DA:46:8C:45 (Resolvable) Identity type: Public (0x00) Identity: 11:22:33:44:55:66 (OUI 11-22-33) Direct address type: Random (0x01) Direct address: 7C:D6:76:8C:DF:82 (Resolvable) Identity type: Random (0x01) Identity: F2:F1:06:3D:9C:42 (Static) RSSI: -74 dBm (0xb6) < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #11357 [hci0] 84680.458737 Scanning: Disabled (0x00) Filter duplicates: Disabled (0x00)
HCI Event: Command Complete (0x0e) plen 4 #11358 [hci0] 84680.469982
LE Set Scan Enable (0x08|0x000c) ncmd 1 Status: Success (0x00) < HCI Command: LE Create Connection (0x08|0x000d) plen 25 #11359 [hci0] 84680.470444 Scan interval: 60.000 msec (0x0060) Scan window: 60.000 msec (0x0060) Filter policy: White list is not used (0x00) Peer address type: Random (0x01) Peer address: 53:38:DA:46:8C:45 (Resolvable) Identity type: Public (0x00) Identity: 11:22:33:44:55:66 (OUI 11-22-33) Own address type: Random (0x01) Min connection interval: 30.00 msec (0x0018) Max connection interval: 50.00 msec (0x0028) Connection latency: 0 (0x0000) Supervision timeout: 420 msec (0x002a) Min connection length: 0.000 msec (0x0000) Max connection length: 0.000 msec (0x0000)
HCI Event: Command Status (0x0f) plen 4 #11360 [hci0] 84680.474971
LE Create Connection (0x08|0x000d) ncmd 1 Status: Success (0x00) < HCI Command: LE Create Connection Cancel (0x08|0x000e) plen 0 #11361 [hci0] 84682.545385
HCI Event: Command Complete (0x0e) plen 4 #11362 [hci0] 84682.551014
LE Create Connection Cancel (0x08|0x000e) ncmd 1 Status: Success (0x00)
HCI Event: LE Meta Event (0x3e) plen 19 #11363 [hci0] 84682.551074
LE Connection Complete (0x01) Status: Unknown Connection Identifier (0x02) Handle: 0 Role: Master (0x00) Peer address type: Public (0x00) Peer address: 00:00:00:00:00:00 (OUI 00-00-00) Connection interval: 0.00 msec (0x0000) Connection latency: 0 (0x0000) Supervision timeout: 0 msec (0x0000) Master clock accuracy: 0x00
After patch: < HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7 #210 [hci0] 667.152459 Type: Passive (0x00) Interval: 60.000 msec (0x0060) Window: 30.000 msec (0x0030) Own address type: Random (0x01) Filter policy: Accept all advertisement, inc. directed unresolved RPA (0x02)
HCI Event: Command Complete (0x0e) plen 4 #211 [hci0] 667.153613
LE Set Scan Parameters (0x08|0x000b) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #212 [hci0] 667.153704 Scanning: Enabled (0x01) Filter duplicates: Enabled (0x01)
HCI Event: Command Complete (0x0e) plen 4 #213 [hci0] 667.154584
LE Set Scan Enable (0x08|0x000c) ncmd 1 Status: Success (0x00)
HCI Event: LE Meta Event (0x3e) plen 18 #214 [hci0] 667.182619
LE Direct Advertising Report (0x0b) Num reports: 1 Event type: Connectable directed - ADV_DIRECT_IND (0x01) Address type: Random (0x01) Address: 50:52:D9:A6:48:A0 (Resolvable) Identity type: Public (0x00) Identity: 11:22:33:44:55:66 (OUI 11-22-33) Direct address type: Random (0x01) Direct address: 7C:C1:57:A5:B7:A8 (Resolvable) Identity type: Random (0x01) Identity: F4:28:73:5D:38:B0 (Static) RSSI: -70 dBm (0xba) < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #215 [hci0] 667.182704 Scanning: Disabled (0x00) Filter duplicates: Disabled (0x00)
HCI Event: Command Complete (0x0e) plen 4 #216 [hci0] 667.183599
LE Set Scan Enable (0x08|0x000c) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Random Address (0x08|0x0005) plen 6 #217 [hci0] 667.183645 Address: 7C:C1:57:A5:B7:A8 (Resolvable) Identity type: Random (0x01) Identity: F4:28:73:5D:38:B0 (Static)
HCI Event: Command Complete (0x0e) plen 4 #218 [hci0] 667.184590
LE Set Random Address (0x08|0x0005) ncmd 1 Status: Success (0x00) < HCI Command: LE Create Connection (0x08|0x000d) plen 25 #219 [hci0] 667.184613 Scan interval: 60.000 msec (0x0060) Scan window: 60.000 msec (0x0060) Filter policy: White list is not used (0x00) Peer address type: Random (0x01) Peer address: 50:52:D9:A6:48:A0 (Resolvable) Identity type: Public (0x00) Identity: 11:22:33:44:55:66 (OUI 11-22-33) Own address type: Random (0x01) Min connection interval: 30.00 msec (0x0018) Max connection interval: 50.00 msec (0x0028) Connection latency: 0 (0x0000) Supervision timeout: 420 msec (0x002a) Min connection length: 0.000 msec (0x0000) Max connection length: 0.000 msec (0x0000)
HCI Event: Command Status (0x0f) plen 4 #220 [hci0] 667.186558
LE Create Connection (0x08|0x000d) ncmd 1 Status: Success (0x00)
HCI Event: LE Meta Event (0x3e) plen 19 #221 [hci0] 667.485824
LE Connection Complete (0x01) Status: Success (0x00) Handle: 0 Role: Master (0x00) Peer address type: Random (0x01) Peer address: 50:52:D9:A6:48:A0 (Resolvable) Identity type: Public (0x00) Identity: 11:22:33:44:55:66 (OUI 11-22-33) Connection interval: 50.00 msec (0x0028) Connection latency: 0 (0x0000) Supervision timeout: 420 msec (0x002a) Master clock accuracy: 0x07 @ MGMT Event: Device Connected (0x000b) plen 13 {0x0002} [hci0] 667.485996 LE Address: 11:22:33:44:55:66 (OUI 11-22-33) Flags: 0x00000000 Data length: 0
Signed-off-by: Szymon Janc szymon.janc@codecoup.pl Signed-off-by: Marcel Holtmann marcel@holtmann.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/net/bluetooth/hci_core.h | 2 +- net/bluetooth/hci_conn.c | 29 +++++++++++++++++++++-------- net/bluetooth/hci_event.c | 15 +++++++++++---- net/bluetooth/l2cap_core.c | 2 +- 4 files changed, 34 insertions(+), 14 deletions(-)
--- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -893,7 +893,7 @@ struct hci_conn *hci_connect_le_scan(str u16 conn_timeout); struct hci_conn *hci_connect_le(struct hci_dev *hdev, bdaddr_t *dst, u8 dst_type, u8 sec_level, u16 conn_timeout, - u8 role); + u8 role, bdaddr_t *direct_rpa); struct hci_conn *hci_connect_acl(struct hci_dev *hdev, bdaddr_t *dst, u8 sec_level, u8 auth_type); struct hci_conn *hci_connect_sco(struct hci_dev *hdev, int type, bdaddr_t *dst, --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -749,18 +749,31 @@ static bool conn_use_rpa(struct hci_conn }
static void hci_req_add_le_create_conn(struct hci_request *req, - struct hci_conn *conn) + struct hci_conn *conn, + bdaddr_t *direct_rpa) { struct hci_cp_le_create_conn cp; struct hci_dev *hdev = conn->hdev; u8 own_addr_type;
- /* Update random address, but set require_privacy to false so - * that we never connect with an non-resolvable address. + /* If direct address was provided we use it instead of current + * address. */ - if (hci_update_random_address(req, false, conn_use_rpa(conn), - &own_addr_type)) - return; + if (direct_rpa) { + if (bacmp(&req->hdev->random_addr, direct_rpa)) + hci_req_add(req, HCI_OP_LE_SET_RANDOM_ADDR, 6, + direct_rpa); + + /* direct address is always RPA */ + own_addr_type = ADDR_LE_DEV_RANDOM; + } else { + /* Update random address, but set require_privacy to false so + * that we never connect with an non-resolvable address. + */ + if (hci_update_random_address(req, false, conn_use_rpa(conn), + &own_addr_type)) + return; + }
memset(&cp, 0, sizeof(cp));
@@ -825,7 +838,7 @@ static void hci_req_directed_advertising
struct hci_conn *hci_connect_le(struct hci_dev *hdev, bdaddr_t *dst, u8 dst_type, u8 sec_level, u16 conn_timeout, - u8 role) + u8 role, bdaddr_t *direct_rpa) { struct hci_conn_params *params; struct hci_conn *conn; @@ -940,7 +953,7 @@ struct hci_conn *hci_connect_le(struct h hci_dev_set_flag(hdev, HCI_LE_SCAN_INTERRUPTED); }
- hci_req_add_le_create_conn(&req, conn); + hci_req_add_le_create_conn(&req, conn, direct_rpa);
create_conn: err = hci_req_run(&req, create_le_conn_complete); --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -4646,7 +4646,8 @@ static void hci_le_conn_update_complete_ /* This function requires the caller holds hdev->lock */ static struct hci_conn *check_pending_le_conn(struct hci_dev *hdev, bdaddr_t *addr, - u8 addr_type, u8 adv_type) + u8 addr_type, u8 adv_type, + bdaddr_t *direct_rpa) { struct hci_conn *conn; struct hci_conn_params *params; @@ -4697,7 +4698,8 @@ static struct hci_conn *check_pending_le }
conn = hci_connect_le(hdev, addr, addr_type, BT_SECURITY_LOW, - HCI_LE_AUTOCONN_TIMEOUT, HCI_ROLE_MASTER); + HCI_LE_AUTOCONN_TIMEOUT, HCI_ROLE_MASTER, + direct_rpa); if (!IS_ERR(conn)) { /* If HCI_AUTO_CONN_EXPLICIT is set, conn is already owned * by higher layer that tried to connect, if no then @@ -4807,8 +4809,13 @@ static void process_adv_report(struct hc bdaddr_type = irk->addr_type; }
- /* Check if we have been requested to connect to this device */ - conn = check_pending_le_conn(hdev, bdaddr, bdaddr_type, type); + /* Check if we have been requested to connect to this device. + * + * direct_addr is set only for directed advertising reports (it is NULL + * for advertising reports) and is already verified to be RPA above. + */ + conn = check_pending_le_conn(hdev, bdaddr, bdaddr_type, type, + direct_addr); if (conn && type == LE_ADV_IND) { /* Store report for later inclusion by * mgmt_device_connected --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -7148,7 +7148,7 @@ int l2cap_chan_connect(struct l2cap_chan hcon = hci_connect_le(hdev, dst, dst_type, chan->sec_level, HCI_LE_CONN_TIMEOUT, - HCI_ROLE_SLAVE); + HCI_ROLE_SLAVE, NULL); else hcon = hci_connect_le_scan(hdev, dst, dst_type, chan->sec_level,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sudhir Sreedharan ssreedharan@mvista.com
commit 7972326a26b5bf8dc2adac575c4e03ee7e9d193a upstream.
This can be reproduced by bind/unbind the driver multiple times in AM3517 board.
Analysis revealed that rtl8187_start() was invoked before probe finishes(ie. before the mutex is initialized).
INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 0 PID: 821 Comm: wpa_supplicant Not tainted 4.9.80-dirty #250 Hardware name: Generic AM3517 (Flattened Device Tree) [<c010e0d8>] (unwind_backtrace) from [<c010beac>] (show_stack+0x10/0x14) [<c010beac>] (show_stack) from [<c017401c>] (register_lock_class+0x4f4/0x55c) [<c017401c>] (register_lock_class) from [<c0176fe0>] (__lock_acquire+0x74/0x1938) [<c0176fe0>] (__lock_acquire) from [<c0178cfc>] (lock_acquire+0xfc/0x23c) [<c0178cfc>] (lock_acquire) from [<c08aa2f8>] (mutex_lock_nested+0x50/0x3b0) [<c08aa2f8>] (mutex_lock_nested) from [<c05f5bf8>] (rtl8187_start+0x2c/0xd54) [<c05f5bf8>] (rtl8187_start) from [<c082dea0>] (drv_start+0xa8/0x320) [<c082dea0>] (drv_start) from [<c084d1d4>] (ieee80211_do_open+0x2bc/0x8e4) [<c084d1d4>] (ieee80211_do_open) from [<c069be94>] (__dev_open+0xb8/0x120) [<c069be94>] (__dev_open) from [<c069c11c>] (__dev_change_flags+0x88/0x14c) [<c069c11c>] (__dev_change_flags) from [<c069c1f8>] (dev_change_flags+0x18/0x48) [<c069c1f8>] (dev_change_flags) from [<c0710b08>] (devinet_ioctl+0x738/0x840) [<c0710b08>] (devinet_ioctl) from [<c067925c>] (sock_ioctl+0x164/0x2f4) [<c067925c>] (sock_ioctl) from [<c02883f8>] (do_vfs_ioctl+0x8c/0x9d0) [<c02883f8>] (do_vfs_ioctl) from [<c0288da8>] (SyS_ioctl+0x6c/0x7c) [<c0288da8>] (SyS_ioctl) from [<c0107760>] (ret_fast_syscall+0x0/0x1c) Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = cd1ec000 [00000000] *pgd=8d1de831, *pte=00000000, *ppte=00000000 Internal error: Oops: 817 [#1] PREEMPT ARM Modules linked in: CPU: 0 PID: 821 Comm: wpa_supplicant Not tainted 4.9.80-dirty #250 Hardware name: Generic AM3517 (Flattened Device Tree) task: ce73eec0 task.stack: cd1ea000 PC is at mutex_lock_nested+0xe8/0x3b0 LR is at mutex_lock_nested+0xd0/0x3b0
Cc: stable@vger.kernel.org Signed-off-by: Sudhir Sreedharan ssreedharan@mvista.com Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/realtek/rtl818x/rtl8187/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/wireless/realtek/rtl818x/rtl8187/dev.c +++ b/drivers/net/wireless/realtek/rtl818x/rtl8187/dev.c @@ -1454,6 +1454,7 @@ static int rtl8187_probe(struct usb_inte goto err_free_dev; } mutex_init(&priv->io_mutex); + mutex_init(&priv->conf_mutex);
SET_IEEE80211_DEV(dev, &intf->dev); usb_set_intfdata(intf, dev); @@ -1627,7 +1628,6 @@ static int rtl8187_probe(struct usb_inte printk(KERN_ERR "rtl8187: Cannot register device\n"); goto err_free_dmabuf; } - mutex_init(&priv->conf_mutex); skb_queue_head_init(&priv->b_tx_status.queue);
wiphy_info(dev->wiphy, "hwaddr %pM, %s V%d + %s, rfkill mask %d\n",
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marek Szyprowski m.szyprowski@samsung.com
commit 0c4c5860e9983eb3da7a3d73ca987643c3ed034b upstream.
Initialize data->config_lock mutex before it is used by the driver code.
This fixes following warning on Odroid XU3 boards:
INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc7-next-20180115-00001-gb75575dee3f2 #107 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [<c0111504>] (unwind_backtrace) from [<c010dbec>] (show_stack+0x10/0x14) [<c010dbec>] (show_stack) from [<c09b3f74>] (dump_stack+0x90/0xc8) [<c09b3f74>] (dump_stack) from [<c0179528>] (register_lock_class+0x1c0/0x59c) [<c0179528>] (register_lock_class) from [<c017bd1c>] (__lock_acquire+0x78/0x1850) [<c017bd1c>] (__lock_acquire) from [<c017de30>] (lock_acquire+0xc8/0x2b8) [<c017de30>] (lock_acquire) from [<c09ca59c>] (__mutex_lock+0x60/0xa0c) [<c09ca59c>] (__mutex_lock) from [<c09cafd0>] (mutex_lock_nested+0x1c/0x24) [<c09cafd0>] (mutex_lock_nested) from [<c068b0d0>] (ina2xx_set_shunt+0x70/0xb0) [<c068b0d0>] (ina2xx_set_shunt) from [<c068b218>] (ina2xx_probe+0x88/0x1b0) [<c068b218>] (ina2xx_probe) from [<c0673d90>] (i2c_device_probe+0x1e0/0x2d0) [<c0673d90>] (i2c_device_probe) from [<c053a268>] (driver_probe_device+0x2b8/0x4a0) [<c053a268>] (driver_probe_device) from [<c053a54c>] (__driver_attach+0xfc/0x120) [<c053a54c>] (__driver_attach) from [<c05384cc>] (bus_for_each_dev+0x58/0x7c) [<c05384cc>] (bus_for_each_dev) from [<c0539590>] (bus_add_driver+0x174/0x250) [<c0539590>] (bus_add_driver) from [<c053b5e0>] (driver_register+0x78/0xf4) [<c053b5e0>] (driver_register) from [<c0675ef0>] (i2c_register_driver+0x38/0xa8) [<c0675ef0>] (i2c_register_driver) from [<c0102b40>] (do_one_initcall+0x48/0x18c) [<c0102b40>] (do_one_initcall) from [<c0e00df0>] (kernel_init_freeable+0x110/0x1d4) [<c0e00df0>] (kernel_init_freeable) from [<c09c8120>] (kernel_init+0x8/0x114) [<c09c8120>] (kernel_init) from [<c01010b4>] (ret_from_fork+0x14/0x20)
Fixes: 5d389b125186 ("hwmon: (ina2xx) Make calibration register value fixed") Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com Signed-off-by: Guenter Roeck linux@roeck-us.net [backport to v4.4.y/v4.9.y: context changes] Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwmon/ina2xx.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/hwmon/ina2xx.c +++ b/drivers/hwmon/ina2xx.c @@ -447,6 +447,7 @@ static int ina2xx_probe(struct i2c_clien
/* set the device type */ data->config = &ina2xx_config[id->driver_data]; + mutex_init(&data->config_lock);
if (of_property_read_u32(dev->of_node, "shunt-resistor", &val) < 0) { struct ina2xx_platform_data *pdata = dev_get_platdata(dev); @@ -473,8 +474,6 @@ static int ina2xx_probe(struct i2c_clien return -ENODEV; }
- mutex_init(&data->config_lock); - data->groups[group++] = &ina2xx_group; if (id->driver_data == ina226) data->groups[group++] = &ina226_group;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bassem Boubaker bassem.boubaker@actia.fr
[ Upstream commit 53765341ee821c0a0f1dec41adc89c9096ad694c ]
The Cinterion AHS8 is a 3G device with one embedded WWAN interface using cdc_ether as a driver.
The modem is controlled via AT commands through the exposed TTYs.
AT+CGDCONT write command can be used to activate or deactivate a WWAN connection for a PDP context defined with the same command. UE supports one WWAN adapter.
Signed-off-by: Bassem Boubaker bassem.boubaker@actia.fr Acked-by: Oliver Neukum oneukum@suse.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/cdc_ether.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -774,6 +774,12 @@ static const struct usb_device_id produc USB_CDC_PROTO_NONE), .driver_info = (unsigned long)&wwan_info, }, { + /* Cinterion AHS3 modem by GEMALTO */ + USB_DEVICE_AND_INTERFACE_INFO(0x1e2d, 0x0055, USB_CLASS_COMM, + USB_CDC_SUBCLASS_ETHERNET, + USB_CDC_PROTO_NONE), + .driver_info = (unsigned long)&wwan_info, +}, { /* Telit modules */ USB_VENDOR_AND_INTERFACE_INFO(0x1bc7, USB_CLASS_COMM, USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE),
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ka-Cheong Poon ka-cheong.poon@oracle.com
[ Upstream commit a43cced9a348901f9015f4730b70b69e7c41a9c9 ]
rds_sendmsg() calls rds_send_mprds_hash() to find a c_path to use to send a message. Suppose the RDS connection is not yet up. In rds_send_mprds_hash(), it does
if (conn->c_npaths == 0) wait_event_interruptible(conn->c_hs_waitq, (conn->c_npaths != 0));
If it is interrupted before the connection is set up, rds_send_mprds_hash() will return a non-zero hash value. Hence rds_sendmsg() will use a non-zero c_path to send the message. But if the RDS connection ends up to be non-MP capable, the message will be lost as only the zero c_path can be used.
Signed-off-by: Ka-Cheong Poon ka-cheong.poon@oracle.com Acked-by: Santosh Shilimkar santosh.shilimkar@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rds/send.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
--- a/net/rds/send.c +++ b/net/rds/send.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 Oracle. All rights reserved. + * Copyright (c) 2006, 2018 Oracle and/or its affiliates. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -983,10 +983,15 @@ static int rds_send_mprds_hash(struct rd if (conn->c_npaths == 0 && hash != 0) { rds_send_ping(conn);
- if (conn->c_npaths == 0) { - wait_event_interruptible(conn->c_hs_waitq, - (conn->c_npaths != 0)); - } + /* The underlying connection is not up yet. Need to wait + * until it is up to be sure that the non-zero c_path can be + * used. But if we are interrupted, we have to use the zero + * c_path in case the connection ends up being non-MP capable. + */ + if (conn->c_npaths == 0) + if (wait_event_interruptible(conn->c_hs_waitq, + conn->c_npaths != 0)) + hash = 0; if (conn->c_npaths == 1) hash = 0; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tejaswi Tanikella tejaswit@codeaurora.org
[ Upstream commit 3f01ddb962dc506916c243f9524e8bef97119b77 ]
On receiving a packet the state index points to the rstate which must be used to fill up IP and TCP headers. But if the state index points to a rstate which is unitialized, i.e. filled with zeros, it gets stuck in an infinite loop inside ip_fast_csum trying to compute the ip checsum of a header with zero length.
89.666953: <2> [<ffffff9dd3e94d38>] slhc_uncompress+0x464/0x468 89.666965: <2> [<ffffff9dd3e87d88>] ppp_receive_nonmp_frame+0x3b4/0x65c 89.666978: <2> [<ffffff9dd3e89dd4>] ppp_receive_frame+0x64/0x7e0 89.666991: <2> [<ffffff9dd3e8a708>] ppp_input+0x104/0x198 89.667005: <2> [<ffffff9dd3e93868>] pppopns_recv_core+0x238/0x370 89.667027: <2> [<ffffff9dd4428fc8>] __sk_receive_skb+0xdc/0x250 89.667040: <2> [<ffffff9dd3e939e4>] pppopns_recv+0x44/0x60 89.667053: <2> [<ffffff9dd4426848>] __sock_queue_rcv_skb+0x16c/0x24c 89.667065: <2> [<ffffff9dd4426954>] sock_queue_rcv_skb+0x2c/0x38 89.667085: <2> [<ffffff9dd44f7358>] raw_rcv+0x124/0x154 89.667098: <2> [<ffffff9dd44f7568>] raw_local_deliver+0x1e0/0x22c 89.667117: <2> [<ffffff9dd44c8ba0>] ip_local_deliver_finish+0x70/0x24c 89.667131: <2> [<ffffff9dd44c92f4>] ip_local_deliver+0x100/0x10c
./scripts/faddr2line vmlinux slhc_uncompress+0x464/0x468 output: ip_fast_csum at arch/arm64/include/asm/checksum.h:40 (inlined by) slhc_uncompress at drivers/net/slip/slhc.c:615
Adding a variable to indicate if the current rstate is initialized. If such a packet arrives, move to toss state.
Signed-off-by: Tejaswi Tanikella tejaswit@codeaurora.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/slip/slhc.c | 5 +++++ include/net/slhc_vj.h | 1 + 2 files changed, 6 insertions(+)
--- a/drivers/net/slip/slhc.c +++ b/drivers/net/slip/slhc.c @@ -509,6 +509,10 @@ slhc_uncompress(struct slcompress *comp, if(x < 0 || x > comp->rslot_limit) goto bad;
+ /* Check if the cstate is initialized */ + if (!comp->rstate[x].initialized) + goto bad; + comp->flags &=~ SLF_TOSS; comp->recv_current = x; } else { @@ -673,6 +677,7 @@ slhc_remember(struct slcompress *comp, u if (cs->cs_tcp.doff > 5) memcpy(cs->cs_tcpopt, icp + ihl*4 + sizeof(struct tcphdr), (cs->cs_tcp.doff - 5) * 4); cs->cs_hsize = ihl*2 + cs->cs_tcp.doff*2; + cs->initialized = true; /* Put headers back on packet * Neither header checksum is recalculated */ --- a/include/net/slhc_vj.h +++ b/include/net/slhc_vj.h @@ -127,6 +127,7 @@ typedef __u32 int32; */ struct cstate { byte_t cs_this; /* connection id number (xmit) */ + bool initialized; /* true if initialized */ struct cstate *next; /* next in ring (xmit) */ struct iphdr cs_ip; /* ip/tcp hdr from most recent packet */ struct tcphdr cs_tcp;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Hajnoczi stefanha@redhat.com
[ Upstream commit d14d2b78090c7de0557362b26a4ca591aa6a9faa ]
Commit d65026c6c62e7d9616c8ceb5a53b68bcdc050525 ("vhost: validate log when IOTLB is enabled") introduced a regression. The logic was originally:
if (vq->iotlb) return 1; return A && B;
After the patch the short-circuit logic for A was inverted:
if (A || vq->iotlb) return A; return B;
This patch fixes the regression by rewriting the checks in the obvious way, no longer returning A when vq->iotlb is non-NULL (which is hard to understand).
Reported-by: syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com Cc: Jason Wang jasowang@redhat.com Signed-off-by: Stefan Hajnoczi stefanha@redhat.com Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vhost/vhost.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1175,10 +1175,12 @@ static int vq_log_access_ok(struct vhost /* Caller should have vq mutex and device mutex */ int vhost_vq_access_ok(struct vhost_virtqueue *vq) { - int ret = vq_log_access_ok(vq, vq->log_base); + if (!vq_log_access_ok(vq, vq->log_base)) + return 0;
- if (ret || vq->iotlb) - return ret; + /* Access validation occurs at prefetch time with IOTLB */ + if (vq->iotlb) + return 1;
return vq_access_ok(vq, vq->num, vq->desc, vq->avail, vq->used); }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Elwell phil@raspberrypi.org
[ Upstream commit 4bfc33807a9a02764bdd1e42e794b3b401240f27 ]
lan78xx_read_otp tries to return -EINVAL in the event of invalid OTP content, but the value gets overwritten before it is returned and the read goes ahead anyway. Make the read conditional as it should be and preserve the error code.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Phil Elwell phil@raspberrypi.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/lan78xx.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -873,7 +873,8 @@ static int lan78xx_read_otp(struct lan78 offset += 0x100; else ret = -EINVAL; - ret = lan78xx_read_raw_otp(dev, offset, length, data); + if (!ret) + ret = lan78xx_read_raw_otp(dev, offset, length, data); }
return ret;
On 04/17/2018 09:58 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.95 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Apr 19 15:56:27 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.95-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On Tue, Apr 17, 2018 at 05:58:33PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.95 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Apr 19 15:56:27 UTC 2018. Anything received after that time might be too late.
Build results: total: 145 pass: 145 fail: 0 Qemu test results: total: 137 pass: 137 fail: 0
Details are available at http://kerneltests.org/builders.
Guenter
On Tue, Apr 17, 2018 at 05:58:33PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.95 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Apr 19 15:56:27 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.95-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
We've noticed a regression on arm32 in sendto() and send() system calls, causing them to hang forever.
The easiest way we have found to reproduce, is to simply run 'ip link'. When running 'strace -T ip link' we see:
... getsockname(3, {sa_family=AF_NETLINK, nl_pid=364, nl_groups=00000000}, [12]) = 0 <0.000079> .... long wait (10 minute) and then eventually: send(3, {{len=40, type=0x12 /* NLMSG_??? */, flags=NLM_F_REQUEST|0x300, seq=1522961259, pid=0}, "\21\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\35\0\1\0\0\0"}, 40, 0[ 807.868633] random: crng init done
Ignore the 'random: crng init done' - I believe it is causing the dmesg buffer to print which is what gets us our send() output.
We saw a similar strace hang on sendto() with the ltp 'gethostid01' test.
We've also observed the following kernel log during boot on the x15/arm32 device:
[ 13.555030] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 13.580352] net eth0: initializing cpsw version 1.15 (0) [ 13.708220] Unable to handle kernel NULL pointer dereference at virtual address 00000008 [ 13.716375] pgd = ed724f40 [ 13.719126] [00000008] *pgd=ac173003, *pmd=fb03d003 [ 13.724068] Internal error: Oops: 207 [#1] SMP ARM [ 13.728882] Modules linked in: snd_soc_simple_card snd_soc_simple_card_utils snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer snd soundcore ac97_bus [ 13.742337] CPU: 0 PID: 243 Comm: NetworkManager Not tainted 4.9.95-rc1 #1 [ 13.749241] Hardware name: Generic DRA74X (Flattened Device Tree) [ 13.755360] task: ed6a6e40 task.stack: ec114000 [ 13.759916] PC is at kszphy_config_reset+0x1c/0x150 [ 13.764815] LR is at kszphy_resume+0x24/0x64 [ 13.769104] pc : [<c0b6f680>] lr : [<c0b6f940>] psr: 600e0113 [ 13.769104] sp : ec115920 ip : ec115940 fp : ec11593c [ 13.780630] r10: 00000000 r9 : 00000007 r8 : 00000000 [ 13.785877] r7 : ed049800 r6 : 00000000 r5 : ee1d3800 r4 : ed049c00 [ 13.792431] r3 : 00000001 r2 : 00000000 r1 : 00000110 r0 : ed049c00 [ 13.798987] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 13.806152] Control: 30c5387d Table: ad724f40 DAC: fffffffd [ 13.811923] Process NetworkManager (pid: 243, stack limit = 0xec114220) [ 13.818564] Stack: (0xec115920 to 0xec116000) [ 13.822941] 5920: ed049c00 ee1d3800 00000000 ed049800 ec115954 ec115940 c0b6f940 c0b6f670 [ 13.831154] 5940: ed049c00 ee1d3800 ec115984 ec115958 c0b693f8 c0b6f928 00000007 ed049c00 [ 13.839368] 5960: ed049c00 c0c11cec c0c11cec 00000007 ee1d3e0c 00000000 ec1159a4 ec115988 [ 13.847582] 5980: c0b69600 c0b69340 ee6af854 ed049c00 ee1d3800 c0c11cec ec1159cc ec1159a8 [ 13.855795] 59a0: c0b6968c c0b695e8 ed040310 ee1d3e00 00000000 ee4fbc10 ee4fbc10 ee1d3e0c [ 13.864009] 59c0: ec115a04 ec1159d0 c0c0e47c c0b69644 00000001 00000000 c04d5160 ee4fbc10 [ 13.872222] 59e0: ee1d3e00 ee1d3800 ee4fbc10 00000000 00000001 ed3aad80 ec115aa4 ec115a08 [ 13.880436] 5a00: c0c11634 c0c0e258 00000000 c0476cec c0ef92e0 c0476cec c0df67b8 fffffff4 [ 13.888647] 5a20: ec115aac fffffff3 ec115aac 0000000d 00000000 00000000 ec115a6c ec115a48 [ 13.896861] 5a40: c0476cec c0eba8ac ec115aac 0000000d ee1d3800 00001002 00000000 ed5a7810 [ 13.905074] 5a60: ec115a84 ec115a70 c0476ed0 c0476ca4 00000000 c0de20fc ec115aa4 ee1d3800 [ 13.913287] 5a80: 00000000 c1193410 ee1d3830 00000000 ed5a7810 ed3aad80 ec115acc ec115aa8 [ 13.921500] 5aa0: c0deeb64 c0c1118c ec115acc ee1d3800 ee1d3800 00001003 00000001 00001002 [ 13.929713] 5ac0: ec115af4 ec115ad0 c0deee4c c0deeab4 ee1d3800 00001002 00000000 ee1d394c [ 13.937925] 5ae0: 00000000 ed5a7810 ec115b1c ec115af8 c0deef24 c0deedb4 ee1d3800 ec115c28 [ 13.946137] 5b00: 00000000 c1193410 00000000 ed5a7810 ec115b8c ec115b20 c0e025f0 c0deef08 [ 13.954351] 5b20: c0491510 c04aa8d4 00000000 ec115b38 00000000 00000003 00000000 00000000 [ 13.962563] 5b40: 00000000 c230d40c ed6a7338 00000001 00000002 c1a58e78 c1a80870 ed6a6e40 [ 13.970775] 5b60: ec115bfc 00000000 ee1d3800 00000000 00000000 00000000 00000000 ed5a7800 [ 13.978988] 5b80: ec115d04 ec115b90 c0e04554 c0e0230c ec115bc4 00000000 ed3aad80 c11cb12c [ 13.987202] 5ba0: ed5a7820 c1b68b40 00000000 c1b69b50 ec115b90 ed5a7810 ec115c54 ec115bc8 [ 13.995416] 5bc0: c04adef0 c04acd00 ec115c64 ec115bd8 c04adef0 00000000 00000000 00000000 [ 14.003629] 5be0: 00000000 00000000 00000000 c230d400 ed6a7318 00000000 00000001 c230d40c [ 14.011842] 5c00: ed6a72f8 00000001 00000000 c1b69b84 c1a80870 ed6a6e40 ec115cb4 ec115c28 [ 14.020054] 5c20: c04adef0 c04acdd4 00000000 00000000 00000000 00000000 00000000 00000000 [ 14.028266] 5c40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 14.036479] 5c60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 14.044691] 5c80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 14.052903] 5ca0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 14.061115] 5cc0: 00000000 00000000 00000000 00000000 00000000 00000000 c0e194b0 c0e040a4 [ 14.069328] 5ce0: ed5a7800 c236d9e4 ed3aad80 ec115d84 00000000 00000000 ec115d44 ec115d08 [ 14.077540] 5d00: c0e04908 c0e040b0 c0f458a8 c04ad82c 00000001 00000000 c0e01c0c 00000000 [ 14.085753] 5d20: ed5a7800 c0e04828 ed3aad80 ed3aad80 ec115d84 00000000 ec115d64 ec115d48 [ 14.093966] 5d40: c0e1e538 c0e04834 ed3aad80 ec212000 00000020 ed3aad80 ec115d7c ec115d68 [ 14.102179] 5d60: c0e01c1c c0e1e494 ee2f0400 ec212000 ec115dac ec115d80 c0e1de70 c0e01bf0 [ 14.110391] 5d80: ec115e24 7fffffff ec115f48 ec212000 ed3aad80 00000000 00000020 00000000 [ 14.118603] 5da0: ec115e0c ec115db0 c0e1e29c c0e1dcfc ec115e24 ec115e48 0000000c 00000001 [ 14.126817] 5dc0: be8be8b4 00000008 00000000 ed73bd00 00000000 000000f3 00000000 00000000 [ 14.135031] 5de0: ec115e24 ec115f48 00000000 00000000 eddf6d80 ec115e28 00000000 00000000 [ 14.143244] 5e00: ec115e1c ec115e10 c0dc8304 c0e1dfe4 ec115f34 ec115e20 c0dc8a70 c0dc82ec [ 14.151457] 5e20: ec115e84 00000000 c04afaa8 c04adb9c 00000001 00000000 600b0013 c1b7a9a4 [ 14.159670] 5e40: 00000001 c1a4405c 001fa4f0 00000020 c04b00cc c04c87a0 c04afaa8 c04adb9c [ 14.167884] 5e60: 00000000 00000000 ffffe000 00000000 600b0013 00004000 00000000 c05fed1c [ 14.176097] 5e80: 600b0013 c04aa828 00000010 00000000 00000000 00000008 00004000 ed45ea00 [ 14.184308] 5ea0: c1b79b7b c13c1e6c ec115efc ec115eb8 c05fed48 c04afc5c 00000000 00000000 [ 14.192522] 5ec0: c05febb8 ec115ed0 c13e1ce0 ed45ea90 00000000 ec115f44 ec115f40 00000008 [ 14.200735] 5ee0: 00000128 c0408e04 ec114000 00000000 ec115f0c ec115f00 c05fee84 c05febc4 [ 14.208949] 5f00: ec115f1c ec115f10 c05fef04 be8be89c 00000000 eddf6d80 00000128 c0408e04 [ 14.217162] 5f20: ec114000 00000000 ec115f94 ec115f38 c0dc9884 c0dc889c 00000000 00000000 [ 14.225375] 5f40: 00000001 fffffff7 ec115e88 0000000c 00000001 00000000 00000000 ec115e50 [ 14.233588] 5f60: 00000000 00000006 00000000 00000000 00000000 00000000 ec115f94 00000000 [ 14.241801] 5f80: be8be89c 00000008 ec115fa4 ec115f98 c0dc98c8 c0dc9840 00000000 ec115fa8 [ 14.250013] 5fa0: c0408c80 c0dc98bc 00000000 be8be89c 00000008 be8be89c 00000000 00000000 [ 14.258225] 5fc0: 00000000 be8be89c 00000008 00000128 b69d9390 be8be900 00000000 001a93e0 [ 14.266438] 5fe0: b6f316e0 be8be840 00000000 b699a4a4 800b0010 00000008 00000000 00000000 [ 14.274661] [<c0b6f680>] (kszphy_config_reset) from [<c0b6f940>] (kszphy_resume+0x24/0x64) [ 14.282966] [<c0b6f940>] (kszphy_resume) from [<c0b693f8>] (phy_attach_direct+0xc4/0x1cc) [ 14.291183] [<c0b693f8>] (phy_attach_direct) from [<c0b69600>] (phy_connect_direct+0x24/0x5c) [ 14.299747] [<c0b69600>] (phy_connect_direct) from [<c0b6968c>] (phy_connect+0x54/0x88) [ 14.307792] [<c0b6968c>] (phy_connect) from [<c0c0e47c>] (cpsw_slave_open+0x230/0x294) [ 14.315749] [<c0c0e47c>] (cpsw_slave_open) from [<c0c11634>] (cpsw_ndo_open+0x4b4/0x618) [ 14.323883] [<c0c11634>] (cpsw_ndo_open) from [<c0deeb64>] (__dev_open+0xbc/0x124) [ 14.331498] [<c0deeb64>] (__dev_open) from [<c0deee4c>] (__dev_change_flags+0xa4/0x154) [ 14.331507] [<c0deee4c>] (__dev_change_flags) from [<c0deef24>] (dev_change_flags+0x28/0x58) [ 14.331522] [<c0deef24>] (dev_change_flags) from [<c0e025f0>] (do_setlink+0x2f0/0x890) [ 14.331531] [<c0e025f0>] (do_setlink) from [<c0e04554>] (rtnl_newlink+0x4b0/0x784) [ 14.331539] [<c0e04554>] (rtnl_newlink) from [<c0e04908>] (rtnetlink_rcv_msg+0xe0/0x1fc) [ 14.331550] [<c0e04908>] (rtnetlink_rcv_msg) from [<c0e1e538>] (netlink_rcv_skb+0xb0/0xcc) [ 14.331559] [<c0e1e538>] (netlink_rcv_skb) from [<c0e01c1c>] (rtnetlink_rcv+0x38/0x40) [ 14.331567] [<c0e01c1c>] (rtnetlink_rcv) from [<c0e1de70>] (netlink_unicast+0x180/0x210) [ 14.331575] [<c0e1de70>] (netlink_unicast) from [<c0e1e29c>] (netlink_sendmsg+0x2c4/0x380) [ 14.331584] [<c0e1e29c>] (netlink_sendmsg) from [<c0dc8304>] (sock_sendmsg+0x24/0x34) [ 14.331593] [<c0dc8304>] (sock_sendmsg) from [<c0dc8a70>] (___sys_sendmsg+0x1e0/0x1f0) [ 14.331601] [<c0dc8a70>] (___sys_sendmsg) from [<c0dc9884>] (__sys_sendmsg+0x50/0x7c) [ 14.331610] [<c0dc9884>] (__sys_sendmsg) from [<c0dc98c8>] (SyS_sendmsg+0x18/0x1c) [ 14.331620] [<c0dc98c8>] (SyS_sendmsg) from [<c0408c80>] (ret_fast_syscall+0x0/0x1c) [ 14.331629] Code: e52de004 e8bd4000 e59062dc e1a04000 (e5d63008) [ 14.331751] ---[ end trace 7927abed2565423d ]---
Full output of a lava job that shows both of these symptoms can be seen at https://lkft.validation.linaro.org/scheduler/job/187562.
Looking through the patches, it's not obvious to me where the issue may be. We haven't bisected it yet but will begin. I noticed quite a few arm changes from Mark Rutland, and a change to net/rds/send.c from Ka-Cheong Poon. Both added to CC.
Dan
On Wed, Apr 18, 2018 at 12:42:44PM -0500, Dan Rue wrote:
On Tue, Apr 17, 2018 at 05:58:33PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.95 release. There are 66 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Apr 19 15:56:27 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.95-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
We've noticed a regression on arm32 in sendto() and send() system calls, causing them to hang forever.
The easiest way we have found to reproduce, is to simply run 'ip link'. When running 'strace -T ip link' we see:
... getsockname(3, {sa_family=AF_NETLINK, nl_pid=364, nl_groups=00000000}, [12]) = 0 <0.000079> .... long wait (10 minute) and then eventually: send(3, {{len=40, type=0x12 /* NLMSG_??? */, flags=NLM_F_REQUEST|0x300, seq=1522961259, pid=0}, \"\21\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\35\0\1\0\0\0\"}, 40, 0[ 807.868633] random: crng init done
Ignore the 'random: crng init done' - I believe it is causing the dmesg buffer to print which is what gets us our send() output.
We saw a similar strace hang on sendto() with the ltp 'gethostid01' test.
We've also observed the following kernel log during boot on the x15/arm32 device:
[ 13.555030] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 13.580352] net eth0: initializing cpsw version 1.15 (0) [ 13.708220] Unable to handle kernel NULL pointer dereference at virtual address 00000008 [ 13.716375] pgd = ed724f40 [ 13.719126] [00000008] *pgd=ac173003, *pmd=fb03d003 [ 13.724068] Internal error: Oops: 207 [#1] SMP ARM [ 13.728882] Modules linked in: snd_soc_simple_card snd_soc_simple_card_utils snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer snd soundcore ac97_bus [ 13.742337] CPU: 0 PID: 243 Comm: NetworkManager Not tainted 4.9.95-rc1 #1 [ 13.749241] Hardware name: Generic DRA74X (Flattened Device Tree) [ 13.755360] task: ed6a6e40 task.stack: ec114000 [ 13.759916] PC is at kszphy_config_reset+0x1c/0x150 [ 13.764815] LR is at kszphy_resume+0x24/0x64 [ 13.769104] pc : [<c0b6f680>] lr : [<c0b6f940>] psr: 600e0113 [ 13.769104] sp : ec115920 ip : ec115940 fp : ec11593c [ 13.780630] r10: 00000000 r9 : 00000007 r8 : 00000000 [ 13.785877] r7 : ed049800 r6 : 00000000 r5 : ee1d3800 r4 : ed049c00 [ 13.792431] r3 : 00000001 r2 : 00000000 r1 : 00000110 r0 : ed049c00 [ 13.798987] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 13.806152] Control: 30c5387d Table: ad724f40 DAC: fffffffd [ 13.811923] Process NetworkManager (pid: 243, stack limit = 0xec114220) [ 13.818564] Stack: (0xec115920 to 0xec116000) [ 13.822941] 5920: ed049c00 ee1d3800 00000000 ed049800 ec115954 ec115940 c0b6f940 c0b6f670 [ 13.831154] 5940: ed049c00 ee1d3800 ec115984 ec115958 c0b693f8 c0b6f928 00000007 ed049c00 [ 13.839368] 5960: ed049c00 c0c11cec c0c11cec 00000007 ee1d3e0c 00000000 ec1159a4 ec115988 [ 13.847582] 5980: c0b69600 c0b69340 ee6af854 ed049c00 ee1d3800 c0c11cec ec1159cc ec1159a8 [ 13.855795] 59a0: c0b6968c c0b695e8 ed040310 ee1d3e00 00000000 ee4fbc10 ee4fbc10 ee1d3e0c [ 13.864009] 59c0: ec115a04 ec1159d0 c0c0e47c c0b69644 00000001 00000000 c04d5160 ee4fbc10 [ 13.872222] 59e0: ee1d3e00 ee1d3800 ee4fbc10 00000000 00000001 ed3aad80 ec115aa4 ec115a08 [ 13.880436] 5a00: c0c11634 c0c0e258 00000000 c0476cec c0ef92e0 c0476cec c0df67b8 fffffff4 [ 13.888647] 5a20: ec115aac fffffff3 ec115aac 0000000d 00000000 00000000 ec115a6c ec115a48 [ 13.896861] 5a40: c0476cec c0eba8ac ec115aac 0000000d ee1d3800 00001002 00000000 ed5a7810 [ 13.905074] 5a60: ec115a84 ec115a70 c0476ed0 c0476ca4 00000000 c0de20fc ec115aa4 ee1d3800 [ 13.913287] 5a80: 00000000 c1193410 ee1d3830 00000000 ed5a7810 ed3aad80 ec115acc ec115aa8 [ 13.921500] 5aa0: c0deeb64 c0c1118c ec115acc ee1d3800 ee1d3800 00001003 00000001 00001002 [ 13.929713] 5ac0: ec115af4 ec115ad0 c0deee4c c0deeab4 ee1d3800 00001002 00000000 ee1d394c [ 13.937925] 5ae0: 00000000 ed5a7810 ec115b1c ec115af8 c0deef24 c0deedb4 ee1d3800 ec115c28 [ 13.946137] 5b00: 00000000 c1193410 00000000 ed5a7810 ec115b8c ec115b20 c0e025f0 c0deef08 [ 13.954351] 5b20: c0491510 c04aa8d4 00000000 ec115b38 00000000 00000003 00000000 00000000 [ 13.962563] 5b40: 00000000 c230d40c ed6a7338 00000001 00000002 c1a58e78 c1a80870 ed6a6e40 [ 13.970775] 5b60: ec115bfc 00000000 ee1d3800 00000000 00000000 00000000 00000000 ed5a7800 [ 13.978988] 5b80: ec115d04 ec115b90 c0e04554 c0e0230c ec115bc4 00000000 ed3aad80 c11cb12c [ 13.987202] 5ba0: ed5a7820 c1b68b40 00000000 c1b69b50 ec115b90 ed5a7810 ec115c54 ec115bc8 [ 13.995416] 5bc0: c04adef0 c04acd00 ec115c64 ec115bd8 c04adef0 00000000 00000000 00000000 [ 14.003629] 5be0: 00000000 00000000 00000000 c230d400 ed6a7318 00000000 00000001 c230d40c [ 14.011842] 5c00: ed6a72f8 00000001 00000000 c1b69b84 c1a80870 ed6a6e40 ec115cb4 ec115c28 [ 14.020054] 5c20: c04adef0 c04acdd4 00000000 00000000 00000000 00000000 00000000 00000000 [ 14.028266] 5c40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 14.036479] 5c60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 14.044691] 5c80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 14.052903] 5ca0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 14.061115] 5cc0: 00000000 00000000 00000000 00000000 00000000 00000000 c0e194b0 c0e040a4 [ 14.069328] 5ce0: ed5a7800 c236d9e4 ed3aad80 ec115d84 00000000 00000000 ec115d44 ec115d08 [ 14.077540] 5d00: c0e04908 c0e040b0 c0f458a8 c04ad82c 00000001 00000000 c0e01c0c 00000000 [ 14.085753] 5d20: ed5a7800 c0e04828 ed3aad80 ed3aad80 ec115d84 00000000 ec115d64 ec115d48 [ 14.093966] 5d40: c0e1e538 c0e04834 ed3aad80 ec212000 00000020 ed3aad80 ec115d7c ec115d68 [ 14.102179] 5d60: c0e01c1c c0e1e494 ee2f0400 ec212000 ec115dac ec115d80 c0e1de70 c0e01bf0 [ 14.110391] 5d80: ec115e24 7fffffff ec115f48 ec212000 ed3aad80 00000000 00000020 00000000 [ 14.118603] 5da0: ec115e0c ec115db0 c0e1e29c c0e1dcfc ec115e24 ec115e48 0000000c 00000001 [ 14.126817] 5dc0: be8be8b4 00000008 00000000 ed73bd00 00000000 000000f3 00000000 00000000 [ 14.135031] 5de0: ec115e24 ec115f48 00000000 00000000 eddf6d80 ec115e28 00000000 00000000 [ 14.143244] 5e00: ec115e1c ec115e10 c0dc8304 c0e1dfe4 ec115f34 ec115e20 c0dc8a70 c0dc82ec [ 14.151457] 5e20: ec115e84 00000000 c04afaa8 c04adb9c 00000001 00000000 600b0013 c1b7a9a4 [ 14.159670] 5e40: 00000001 c1a4405c 001fa4f0 00000020 c04b00cc c04c87a0 c04afaa8 c04adb9c [ 14.167884] 5e60: 00000000 00000000 ffffe000 00000000 600b0013 00004000 00000000 c05fed1c [ 14.176097] 5e80: 600b0013 c04aa828 00000010 00000000 00000000 00000008 00004000 ed45ea00 [ 14.184308] 5ea0: c1b79b7b c13c1e6c ec115efc ec115eb8 c05fed48 c04afc5c 00000000 00000000 [ 14.192522] 5ec0: c05febb8 ec115ed0 c13e1ce0 ed45ea90 00000000 ec115f44 ec115f40 00000008 [ 14.200735] 5ee0: 00000128 c0408e04 ec114000 00000000 ec115f0c ec115f00 c05fee84 c05febc4 [ 14.208949] 5f00: ec115f1c ec115f10 c05fef04 be8be89c 00000000 eddf6d80 00000128 c0408e04 [ 14.217162] 5f20: ec114000 00000000 ec115f94 ec115f38 c0dc9884 c0dc889c 00000000 00000000 [ 14.225375] 5f40: 00000001 fffffff7 ec115e88 0000000c 00000001 00000000 00000000 ec115e50 [ 14.233588] 5f60: 00000000 00000006 00000000 00000000 00000000 00000000 ec115f94 00000000 [ 14.241801] 5f80: be8be89c 00000008 ec115fa4 ec115f98 c0dc98c8 c0dc9840 00000000 ec115fa8 [ 14.250013] 5fa0: c0408c80 c0dc98bc 00000000 be8be89c 00000008 be8be89c 00000000 00000000 [ 14.258225] 5fc0: 00000000 be8be89c 00000008 00000128 b69d9390 be8be900 00000000 001a93e0 [ 14.266438] 5fe0: b6f316e0 be8be840 00000000 b699a4a4 800b0010 00000008 00000000 00000000 [ 14.274661] [<c0b6f680>] (kszphy_config_reset) from [<c0b6f940>] (kszphy_resume+0x24/0x64) [ 14.282966] [<c0b6f940>] (kszphy_resume) from [<c0b693f8>] (phy_attach_direct+0xc4/0x1cc) [ 14.291183] [<c0b693f8>] (phy_attach_direct) from [<c0b69600>] (phy_connect_direct+0x24/0x5c) [ 14.299747] [<c0b69600>] (phy_connect_direct) from [<c0b6968c>] (phy_connect+0x54/0x88) [ 14.307792] [<c0b6968c>] (phy_connect) from [<c0c0e47c>] (cpsw_slave_open+0x230/0x294) [ 14.315749] [<c0c0e47c>] (cpsw_slave_open) from [<c0c11634>] (cpsw_ndo_open+0x4b4/0x618) [ 14.323883] [<c0c11634>] (cpsw_ndo_open) from [<c0deeb64>] (__dev_open+0xbc/0x124) [ 14.331498] [<c0deeb64>] (__dev_open) from [<c0deee4c>] (__dev_change_flags+0xa4/0x154) [ 14.331507] [<c0deee4c>] (__dev_change_flags) from [<c0deef24>] (dev_change_flags+0x28/0x58) [ 14.331522] [<c0deef24>] (dev_change_flags) from [<c0e025f0>] (do_setlink+0x2f0/0x890) [ 14.331531] [<c0e025f0>] (do_setlink) from [<c0e04554>] (rtnl_newlink+0x4b0/0x784) [ 14.331539] [<c0e04554>] (rtnl_newlink) from [<c0e04908>] (rtnetlink_rcv_msg+0xe0/0x1fc) [ 14.331550] [<c0e04908>] (rtnetlink_rcv_msg) from [<c0e1e538>] (netlink_rcv_skb+0xb0/0xcc) [ 14.331559] [<c0e1e538>] (netlink_rcv_skb) from [<c0e01c1c>] (rtnetlink_rcv+0x38/0x40) [ 14.331567] [<c0e01c1c>] (rtnetlink_rcv) from [<c0e1de70>] (netlink_unicast+0x180/0x210) [ 14.331575] [<c0e1de70>] (netlink_unicast) from [<c0e1e29c>] (netlink_sendmsg+0x2c4/0x380) [ 14.331584] [<c0e1e29c>] (netlink_sendmsg) from [<c0dc8304>] (sock_sendmsg+0x24/0x34) [ 14.331593] [<c0dc8304>] (sock_sendmsg) from [<c0dc8a70>] (___sys_sendmsg+0x1e0/0x1f0) [ 14.331601] [<c0dc8a70>] (___sys_sendmsg) from [<c0dc9884>] (__sys_sendmsg+0x50/0x7c) [ 14.331610] [<c0dc9884>] (__sys_sendmsg) from [<c0dc98c8>] (SyS_sendmsg+0x18/0x1c) [ 14.331620] [<c0dc98c8>] (SyS_sendmsg) from [<c0408c80>] (ret_fast_syscall+0x0/0x1c) [ 14.331629] Code: e52de004 e8bd4000 e59062dc e1a04000 (e5d63008) [ 14.331751] ---[ end trace 7927abed2565423d ]---
Full output of a lava job that shows both of these symptoms can be seen at https://lkft.validation.linaro.org/scheduler/job/187562.
Looking through the patches, it's not obvious to me where the issue may be. We haven't bisected it yet but will begin. I noticed quite a few arm changes from Mark Rutland, and a change to net/rds/send.c from Ka-Cheong Poon. Both added to CC.
Can you try 'git bisect'? I'll hold off on releasing 4.9.y until this gets figured out.
thanks,
greg k-h
On Thu, 2018-04-19 at 16:42 +0530, Naresh Kamboju wrote:
Can you try 'git bisect'? I'll hold off on releasing 4.9.y until this gets figured out.
After reverting this patch, network started works on arm32 x15 device. d7ba3c00047d ("net: phy: micrel: Restore led_mode and clk_sel on resume")
Does that also fix the hang during 'ip l'? (If the hang only occurred on the same systems that oops'd in the micrel driver, it could well be because the task that oops'd was holding the rtnetlink lock.)
Ben.
On 19 April 2018 at 17:39, Ben Hutchings ben.hutchings@codethink.co.uk wrote:
On Thu, 2018-04-19 at 16:42 +0530, Naresh Kamboju wrote:
Can you try 'git bisect'? I'll hold off on releasing 4.9.y until this gets figured out.
After reverting this patch, network started works on arm32 x15 device. d7ba3c00047d ("net: phy: micrel: Restore led_mode and clk_sel on resume")
Does that also fix the hang during 'ip l'? (If the hang only occurred on the same systems that oops'd in the micrel driver, it could well be because the task that oops'd was holding the rtnetlink lock.)
Now 'ip l' works and the oops message is gone.
- Naresh
Ben.
-- Ben Hutchings Software Developer, Codethink Ltd.
On Thu, Apr 19, 2018 at 06:00:52PM +0530, Naresh Kamboju wrote:
On 19 April 2018 at 17:39, Ben Hutchings ben.hutchings@codethink.co.uk wrote:
On Thu, 2018-04-19 at 16:42 +0530, Naresh Kamboju wrote:
Can you try 'git bisect'? I'll hold off on releasing 4.9.y until this gets figured out.
After reverting this patch, network started works on arm32 x15 device. d7ba3c00047d ("net: phy: micrel: Restore led_mode and clk_sel on resume")
Does that also fix the hang during 'ip l'? (If the hang only occurred on the same systems that oops'd in the micrel driver, it could well be because the task that oops'd was holding the rtnetlink lock.)
Now 'ip l' works and the oops message is gone.
- Naresh
There may be some confusion since this patch was in 4.9.94 and we did not report this issue then. The way these failures presented themselves made them a bit subtle (we didn't get any hard failures, just some incomplete jobs - not an unusual occurrence on embedded boards) so we didn't notice them in the last release. It was only after Naresh manually investigated that this was discovered.
We'll think about how we can improve our process to make these much more obvious in an automated way.
Dan
On Thu, Apr 19, 2018 at 04:42:56PM +0530, Naresh Kamboju wrote:
Can you try 'git bisect'? I'll hold off on releasing 4.9.y until this gets figured out.
After reverting this patch, network started works on arm32 x15 device. d7ba3c00047d ("net: phy: micrel: Restore led_mode and clk_sel on resume")
Thanks for letting me know, I've now reverted that commit.
greg k-h
On Thu, Apr 19, 2018 at 04:03:05PM +0200, Greg Kroah-Hartman wrote:
On Thu, Apr 19, 2018 at 04:42:56PM +0530, Naresh Kamboju wrote:
Can you try 'git bisect'? I'll hold off on releasing 4.9.y until this gets figured out.
After reverting this patch, network started works on arm32 x15 device. d7ba3c00047d ("net: phy: micrel: Restore led_mode and clk_sel on resume")
Thanks for letting me know, I've now reverted that commit.
Alright here we go.
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
A few notes about these results. Since yesterday, we have enabled qemu arm 32 and qemu arm 64 testing, and also upgraded kselftest to 4.16 from 4.15. These changes have caused some new failures and noise that will be triaged over the next couple of days. These aren't considered regressions because they're a result of upgrading the test / introducing new environments, not some change in kernel. I removed some of these false regressions from the below email report, but you would see them if you looked at qa-reports.
The x15 issue that was previously reported is resolved, but we have learned that running 'ethtool --phy-statistics eth0' causes an oops. This isn't a new issue, and we'll be tracking it going forward until it's resolved.
Summary ------------------------------------------------------------------------
kernel: 4.9.95-rc3 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.9.y git commit: 0afd4120faa7df778d227670fc91f49234be202b git describe: v4.9.94-69-g0afd4120faa7 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.94-69-g...
No regressions (compared to build v4.9.94-67-g148bd16b9893)
Boards, architectures and test suites: -------------------------------------
dragonboard-410c * boot - pass: 20, * kselftest - pass: 35, skip: 27, fail: 4 * libhugetlbfs - pass: 90, skip: 1, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 57, skip: 6, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 21, skip: 1, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 14, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1015, skip: 135, * ltp-timers-tests - pass: 13,
hi6220-hikey - arm64 * boot - pass: 20, * kselftest - pass: 38, skip: 24, fail: 4 * libhugetlbfs - pass: 90, skip: 1, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 57, skip: 6, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 21, skip: 1, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 10, skip: 4, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1014, skip: 136, * ltp-timers-tests - pass: 13,
juno-r2 - arm64 * boot - pass: 20, * kselftest - pass: 38, skip: 24, fail: 4 * libhugetlbfs - pass: 90, skip: 1, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 57, skip: 6, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 10, skip: 4, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1015, skip: 135, * ltp-timers-tests - pass: 13,
qemu_arm * boot - fail: 20 ^ This is the first run for qemu_arm and this boot problem is due to an error in our job template.
qemu_arm64 * boot - pass: 20, * kselftest - pass: 37, skip: 27, fail: 4 * libhugetlbfs - pass: 90, skip: 1, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 57, skip: 6, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 592, skip: 74, fail: 5 * ltp-timers-tests - pass: 13,
qemu_x86_64 * boot - pass: 22, * kselftest - pass: 49, skip: 27, fail: 4 * kselftest-vsyscall-mode-native - pass: 49, skip: 27, fail: 4 * kselftest-vsyscall-mode-none - pass: 49, skip: 27, fail: 4 * libhugetlbfs - pass: 90, skip: 1, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 57, skip: 6, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 13, skip: 1, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1003, skip: 147, * ltp-timers-tests - pass: 13,
x15 - arm * boot - pass: 20, * kselftest - pass: 37, skip: 24, fail: 4 * libhugetlbfs - pass: 87, skip: 1, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 63, skip: 18, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 58, skip: 5, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 20, skip: 2, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 13, skip: 1, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1075, skip: 75, * ltp-timers-tests - pass: 13,
x86_64 * boot - pass: 22, * kselftest - pass: 51, skip: 25, fail: 4 * kselftest-vsyscall-mode-native - pass: 50, skip: 25, fail: 5 * kselftest-vsyscall-mode-none - pass: 51, skip: 25, fail: 4 * libhugetlbfs - pass: 90, skip: 1, * ltp-cap_bounds-tests - pass: 2, * ltp-containers-tests - pass: 64, skip: 17, * ltp-fcntl-locktests-tests - pass: 2, * ltp-filecaps-tests - pass: 2, * ltp-fs-tests - pass: 58, skip: 5, * ltp-fs_bind-tests - pass: 2, * ltp-fs_perms_simple-tests - pass: 19, * ltp-fsx-tests - pass: 2, * ltp-hugetlb-tests - pass: 22, * ltp-io-tests - pass: 3, * ltp-ipc-tests - pass: 9, * ltp-math-tests - pass: 11, * ltp-nptl-tests - pass: 2, * ltp-pty-tests - pass: 4, * ltp-sched-tests - pass: 9, skip: 5, * ltp-securebits-tests - pass: 4, * ltp-syscalls-tests - pass: 1034, skip: 116, * ltp-timers-tests - pass: 13,
On Thu, Apr 19, 2018 at 03:04:05PM -0500, Dan Rue wrote:
On Thu, Apr 19, 2018 at 04:03:05PM +0200, Greg Kroah-Hartman wrote:
On Thu, Apr 19, 2018 at 04:42:56PM +0530, Naresh Kamboju wrote:
Can you try 'git bisect'? I'll hold off on releasing 4.9.y until this gets figured out.
After reverting this patch, network started works on arm32 x15 device. d7ba3c00047d ("net: phy: micrel: Restore led_mode and clk_sel on resume")
Thanks for letting me know, I've now reverted that commit.
Alright here we go.
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Great, thanks for testing and letting me know.
greg k-h
linux-stable-mirror@lists.linaro.org