This is the start of the stable review cycle for the 4.9.77 release. There are 96 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Jan 17 12:33:26 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.77-rc1.gz
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Linux 4.9.77-rc1
Thomas Gleixner <tglx@linutronix.de> x86/retpoline: Remove compile time warning
Andy Lutomirski <luto@kernel.org> selftests/x86: Add test_vsyscall
David Woodhouse <dwmw@amazon.co.uk> x86/retpoline: Fill return stack buffer on vmexit
Andi Kleen <ak@linux.intel.com> x86/retpoline/irq32: Convert assembler indirect jumps
David Woodhouse <dwmw@amazon.co.uk> x86/retpoline/checksum32: Convert assembler indirect jumps
David Woodhouse <dwmw@amazon.co.uk> x86/retpoline/xen: Convert Xen hypercall indirect jumps
David Woodhouse <dwmw@amazon.co.uk> x86/retpoline/hyperv: Convert assembler indirect jumps
David Woodhouse <dwmw@amazon.co.uk> x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
David Woodhouse <dwmw@amazon.co.uk> x86/retpoline/entry: Convert entry assembler indirect jumps
David Woodhouse <dwmw@amazon.co.uk> x86/retpoline/crypto: Convert crypto assembler indirect jumps
David Woodhouse <dwmw@amazon.co.uk> x86/spectre: Add boot time option to select Spectre v2 mitigation
David Woodhouse <dwmw@amazon.co.uk> x86/retpoline: Add initial retpoline support
Andrey Ryabinin <aryabinin@virtuozzo.com> x86/asm: Use register variable to get stack pointer value
Josh Poimboeuf <jpoimboe@redhat.com> objtool: Allow alternatives to be ignored
Josh Poimboeuf <jpoimboe@redhat.com> objtool: Detect jumps to retpoline thunks
Josh Poimboeuf <jpoimboe@redhat.com> objtool, modules: Discard objtool annotation sections for modules
Andy Lutomirski <luto@kernel.org> x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
David Woodhouse <dwmw@amazon.co.uk> x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
Borislav Petkov <bp@suse.de> x86/alternatives: Fix optimize_nops() checking
David Woodhouse <dwmw@amazon.co.uk> sysfs/cpu: Fix typos in vulnerability documentation
Tom Lendacky <thomas.lendacky@amd.com> x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
Tom Lendacky <thomas.lendacky@amd.com> x86/cpu/AMD: Make LFENCE a serializing instruction
Thomas Gleixner <tglx@linutronix.de> x86/cpu: Implement CPU vulnerabilites sysfs functions
Thomas Gleixner <tglx@linutronix.de> sysfs/cpu: Add vulnerability folder
Borislav Petkov <bp@suse.de> x86/cpu: Merge bugs.c and bugs_64.c
David Woodhouse <dwmw@amazon.co.uk> x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
Thomas Gleixner <tglx@linutronix.de> x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
Thomas Gleixner <tglx@linutronix.de> x86/cpufeatures: Add X86_BUG_CPU_INSECURE
Thomas Gleixner <tglx@linutronix.de> x86/cpufeatures: Make CPU bugs sticky
Andy Lutomirski <luto@kernel.org> x86/cpu: Factor out application of forced CPU caps
Dave Hansen <dave.hansen@linux.intel.com> x86/Documentation: Add PTI description
Benjamin Poirier <bpoirier@suse.com> e1000e: Fix e1000_check_for_copper_link_ich8lan return value.
Icenowy Zheng <icenowy@aosc.io> uas: ignore UAS for Norelsys NS1068(X) chips
Ben Seri <ben@armis.com> Bluetooth: Prevent stack info leak from the EFS element.
Viktor Slavkovic <viktors@google.com> staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
Shuah Khan <shuah@kernel.org> usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
Shuah Khan <shuah@kernel.org> usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
Shuah Khan <shuah@kernel.org> usbip: remove kernel addresses from usb device and urb debug msgs
Pete Zaitcev <zaitcev@redhat.com> USB: fix usbmon BUG trigger
Stefan Agner <stefan@agner.ch> usb: misc: usb3503: make sure reset is low for at least 100us
Christian Holl <cyborgx1@gmail.com> USB: serial: cp210x: add new device ID ELV ALC 8xxx
Diego Elio Pettenò <flameeyes@flameeyes.eu> USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
Daniel Borkmann <daniel@iogearbox.net> bpf, array: fix overflow in max_entries and undefined behavior in index_mask
Alexei Starovoitov <ast@kernel.org> bpf: prevent out-of-bounds speculation
Alexei Starovoitov <ast@fb.com> bpf: refactor fixup_bpf_calls()
Alexei Starovoitov <ast@fb.com> bpf: move fixup_bpf_calls() function
Nicholas Bellinger <nab@linux-iscsi.org> target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
Nicholas Bellinger <nab@linux-iscsi.org> iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
Lepton Wu <ytht.net@gmail.com> kaiser: Set _PAGE_NX only if supported
Dan Carpenter <dan.carpenter@oracle.com> drm/vmwgfx: Potential off by one in vmw_view_add()
Andrew Honig <ahonig@google.com> KVM: x86: Add memory barrier on vmcs field lookup
Jia Zhang <qianyue.zj@alibaba-inc.com> x86/microcode/intel: Extend BDW late-loading with a revision check
Ilya Dryomov <idryomov@gmail.com> rbd: set max_segments to USHRT_MAX
Eric Biggers <ebiggers@google.com> crypto: algapi - fix NULL dereference in crypto_remove_spawns()
Roi Dayan <roid@mellanox.com> net/sched: Fix update of lastuse in act modules implementing stats_update
Ido Schimmel <idosch@mellanox.com> mlxsw: spectrum_router: Fix NULL pointer deref
Stephen Hemminger <stephen@networkplumber.org> ethtool: do not print warning for applications using legacy API
Eric Dumazet <edumazet@google.com> ipv6: fix possible mem leaks in ipv6_make_skb()
Jerome Brunet <jbrunet@baylibre.com> net: stmmac: enable EEE in MII, GMII or RGMII only
Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> sh_eth: fix SH7757 GEther initialization
Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> sh_eth: fix TSU resource handling
Mohamed Ghannam <simo.ghannam@gmail.com> RDS: null pointer dereference in rds_atomic_free_op
Mohamed Ghannam <simo.ghannam@gmail.com> RDS: Heap OOB write in rds_message_alloc_sgs()
Andrii Vladyka <tulup@mail.ru> net: core: fix module type in sock_diag_bind
Eli Cooper <elicooper@gmx.com> ip6_tunnel: disable dst caching if tunnel is dual-stack
Cong Wang <xiyou.wangcong@gmail.com> 8021q: fix a memory leak for VLAN 0 device
Ben Hutchings <ben.hutchings@codethink.co.uk> xhci: Fix ring leak in failure path of xhci_alloc_virt_device()
Eric Dumazet <edumazet@google.com> cx82310_eth: use skb_cow_head() to deal with cloned skbs
Eric Dumazet <edumazet@google.com> smsc75xx: use skb_cow_head() to deal with cloned skbs
Eric Dumazet <edumazet@google.com> sr9700: use skb_cow_head() to deal with cloned skbs
Eric Dumazet <edumazet@google.com> lan78xx: use skb_cow_head() to deal with cloned skbs
Dan Streetman <ddstreet@ieee.org> zswap: don't param_set_charp while holding spinlock
Vikas C Sajjan <vikas.cha.sajjan@hpe.com> x86/acpi: Reduce code duplication in mp_override_legacy_irq()
Takashi Iwai <tiwai@suse.de> ALSA: aloop: Fix racy hw constraints adjustment
Takashi Iwai <tiwai@suse.de> ALSA: aloop: Fix inconsistent format due to incomplete rule
Takashi Iwai <tiwai@suse.de> ALSA: aloop: Release cable upon open error path
Takashi Iwai <tiwai@suse.de> ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
Takashi Iwai <tiwai@suse.de> ALSA: pcm: Abort properly at pending signal in OSS read/write loops
Takashi Iwai <tiwai@suse.de> ALSA: pcm: Add missing error checks in OSS emulation plugin builder
Takashi Iwai <tiwai@suse.de> ALSA: pcm: Remove incorrect snd_BUG_ON() usages
Vikas C Sajjan <vikas.cha.sajjan@hpe.com> x86/acpi: Handle SCI interrupts above legacy space gracefully
Rafael J. Wysocki <rafael.j.wysocki@intel.com> platform/x86: wmi: Call acpi_wmi_init() later
Jim Mattson <jmattson@google.com> kvm: vmx: Scrub hardware GPRs at VM-exit
Maciej W. Rozycki <macro@mips.com> MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
Maciej W. Rozycki <macro@mips.com> MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_GETREGSET
Maciej W. Rozycki <macro@mips.com> MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
Maciej W. Rozycki <macro@mips.com> MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
Maciej W. Rozycki <macro@mips.com> MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
Maciej W. Rozycki <macro@mips.com> MIPS: Factor out NT_PRFPREG regset access helpers
Maciej W. Rozycki <macro@mips.com> MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
Bart Van Assche <bart.vanassche@wdc.com> IB/srpt: Disable RDMA access by the initiator
Wolfgang Grandegger <wg@grandegger.com> can: gs_usb: fix return value of the "set_bittiming" callback
Wanpeng Li <wanpeng.li@hotmail.com> KVM: Fix stack-out-of-bounds read in write_mmio
Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com> ath10k: rebuild crypto header in rx data frames
David Spinadel <david.spinadel@intel.com> mac80211: Add RX flag to indicate ICV stripped
Suren Baghdasaryan <surenb@google.com> dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
-------------
Diffstat:
 Documentation/ABI/testing/sysfs-devices-system-cpu |  16 +
 Documentation/kernel-parameters.txt                |  49 +-
 Documentation/x86/pti.txt                          | 186 ++++++++
 Makefile                                           |   4 +-
 arch/arm/kvm/mmio.c                                |   6 +-
 arch/mips/kernel/process.c                         |  12 +
 arch/mips/kernel/ptrace.c                          | 147 ++++--
 arch/x86/Kconfig                                   |  14 +
 arch/x86/Makefile                                  |   8 +
 arch/x86/crypto/aesni-intel_asm.S                  |   5 +-
 arch/x86/crypto/camellia-aesni-avx-asm_64.S        |   3 +-
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S       |   3 +-
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S          |   3 +-
 arch/x86/entry/entry_32.S                          |  10 +-
 arch/x86/entry/entry_64.S                          |  10 +-
 arch/x86/include/asm/alternative.h                 |   4 +-
 arch/x86/include/asm/asm-prototypes.h              |  25 ++
 arch/x86/include/asm/asm.h                         |  11 +
 arch/x86/include/asm/cpufeature.h                  |   2 +
 arch/x86/include/asm/cpufeatures.h                 |   6 +
 arch/x86/include/asm/msr-index.h                   |   3 +
 arch/x86/include/asm/nospec-branch.h               | 214 +++++++++
 arch/x86/include/asm/processor.h                   |   4 +-
 arch/x86/include/asm/thread_info.h                 |  11 -
 arch/x86/include/asm/xen/hypercall.h               |   5 +-
 arch/x86/kernel/acpi/boot.c                        |  61 ++-
 arch/x86/kernel/alternative.c                      |   7 +-
 arch/x86/kernel/cpu/Makefile                       |   4 +-
 arch/x86/kernel/cpu/amd.c                          |  28 +-
 arch/x86/kernel/cpu/bugs.c                         | 219 ++++++++-
 arch/x86/kernel/cpu/bugs_64.c                      |  33 --
 arch/x86/kernel/cpu/common.c                       |  39 +-
 arch/x86/kernel/cpu/microcode/intel.c              |  13 +-
 arch/x86/kernel/irq_32.c                           |  15 +-
 arch/x86/kernel/mcount_64.S                        |   7 +-
 arch/x86/kernel/traps.c                            |   2 +-
 arch/x86/kvm/svm.c                                 |  23 +
 arch/x86/kvm/vmx.c                                 |  30 +-
 arch/x86/kvm/x86.c                                 |   8 +-
 arch/x86/lib/Makefile                              |   1 +
 arch/x86/lib/checksum_32.S                         |   7 +-
 arch/x86/lib/retpoline.S                           |  48 ++
 arch/x86/mm/kaiser.c                               |   2 +
 arch/x86/mm/tlb.c                                  |   2 +-
 crypto/algapi.c                                    |  12 +
 drivers/base/Kconfig                               |   3 +
 drivers/base/cpu.c                                 |  48 ++
 drivers/block/rbd.c                                |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c            |   2 +
 drivers/hv/hv.c                                    |  11 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c              |   3 +-
 drivers/md/dm-bufio.c                              |   7 +-
 drivers/net/can/usb/gs_usb.c                       |   2 +-
 drivers/net/ethernet/intel/e1000e/ich8lan.c        |  11 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  |   4 +-
 drivers/net/ethernet/renesas/sh_eth.c              |  29 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |   6 +
 drivers/net/usb/cx82310_eth.c                      |   7 +-
 drivers/net/usb/lan78xx.c                          |   9 +-
 drivers/net/usb/smsc75xx.c                         |   8 +-
 drivers/net/usb/sr9700.c                           |   9 +-
 drivers/net/wireless/ath/ath10k/htt_rx.c           | 105 ++++-
 drivers/net/wireless/ath/ath10k/rx_desc.h          |   3 +
 drivers/platform/x86/wmi.c                         |   2 +-
 drivers/staging/android/ashmem.c                   |   2 +
 drivers/target/iscsi/iscsi_target.c                |  20 +-
 drivers/target/target_core_tmr.c                   |   9 +
 drivers/target/target_core_transport.c             |   2 +
 drivers/usb/host/xhci-mem.c                        |   3 +-
 drivers/usb/misc/usb3503.c                         |   2 +
 drivers/usb/mon/mon_bin.c                          |   8 +-
 drivers/usb/serial/cp210x.c                        |   2 +
 drivers/usb/storage/unusual_uas.h                  |   7 +
 drivers/usb/usbip/usbip_common.c                   |  17 +-
 drivers/usb/usbip/vudc_rx.c                        |  19 +
 drivers/usb/usbip/vudc_tx.c                        |  11 +-
 include/linux/bpf.h                                |   2 +
 include/linux/bpf_verifier.h                       |   5 +-
 include/linux/cpu.h                                |   7 +
 include/linux/frame.h                              |   2 +-
 include/linux/phy.h                                |  11 +
 include/linux/sh_eth.h                             |   1 -
 include/net/mac80211.h                             |   5 +-
 include/target/target_core_base.h                  |   1 +
 include/trace/events/kvm.h                         |   7 +-
 kernel/bpf/arraymap.c                              |  45 +-
 kernel/bpf/syscall.c                               |  54 ---
 kernel/bpf/verifier.c                              |  89 +++-
 mm/zswap.c                                         |  12 +-
 net/8021q/vlan.c                                   |   7 +-
 net/bluetooth/l2cap_core.c                         |  20 +-
 net/core/ethtool.c                                 |  15 +-
 net/core/sock_diag.c                               |   2 +-
 net/ipv6/ip6_output.c                              |   5 +-
 net/ipv6/ip6_tunnel.c                              |   9 +-
 net/mac80211/wep.c                                 |   3 +-
 net/mac80211/wpa.c                                 |   3 +-
 net/rds/rdma.c                                     |   4 +
 net/sched/act_gact.c                               |   2 +-
 net/sched/act_mirred.c                             |   2 +-
 scripts/mod/modpost.c                              |   1 +
 scripts/module-common.lds                          |   5 +-
 sound/core/oss/pcm_oss.c                           |  41 +-
 sound/core/oss/pcm_plugin.c                        |  14 +-
 sound/core/pcm_lib.c                               |   4 +-
 sound/drivers/aloop.c                              |  98 ++--
 tools/objtool/builtin-check.c                      |  73 ++-
 tools/testing/selftests/x86/test_vsyscall.c        | 500 +++++++++++++++++++++
 108 files changed, 2286 insertions(+), 458 deletions(-)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Suren Baghdasaryan <surenb@google.com>

dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
commit fbc7c07ec23c040179384a1f16b62b6030eb6bdd upstream.
When the system is under memory pressure, the dm-bufio shrinker is often observed to reclaim only one buffer per scan. This change fixes the following two issues in the dm-bufio shrinker that cause this behavior:
1. The ((nr_to_scan - freed) <= retain_target) condition is used to terminate the slab scan process. This assumes that nr_to_scan is equal to the LRU size, which might not be correct because do_shrink_slab() in vmscan.c calculates nr_to_scan using multiple inputs. As a result, when nr_to_scan is less than retain_target (64) the scan terminates after the first iteration, effectively reclaiming one buffer per scan and making scans very inefficient. This hurts vmscan performance, especially because the mutex is acquired and released every time dm_bufio_shrink_scan() is called. The new implementation uses the ((LRU size - freed) <= retain_target) condition for scan termination. The LRU size can be safely determined inside __scan() because this function is called after dm_bufio_lock().
2. do_shrink_slab() uses the value returned by dm_bufio_shrink_count() to determine the number of freeable objects in the slab. However, dm-bufio always retains retain_target buffers in its LRU and terminates a scan when that mark is reached. Therefore returning the entire LRU size from dm_bufio_shrink_count() is misleading, because it does not represent the number of freeable objects the slab will reclaim during a scan. Returning (LRU size - retain_target) better represents the number of freeable objects in the slab. This way do_shrink_slab() returns 0 when (LRU size < retain_target) and vmscan will not try to scan this shrinker, avoiding scans that cannot reclaim any memory.
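Numerically, the corrected shrink-count logic reduces to the following (a minimal sketch with hypothetical numbers, not the driver code verbatim):

    /* Sketch: freeable objects reported by dm_bufio_shrink_count().
     * E.g. LRU size 100, retain_target 64 -> 36 freeable buffers;
     *      LRU size  50, retain_target 64 -> 0, vmscan skips the scan.
     */
    static unsigned long freeable(unsigned long lru_size,
                                  unsigned long retain_target)
    {
            return (lru_size < retain_target) ? 0 : (lru_size - retain_target);
    }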
Test: tested using Android device running <AOSP>/system/extras/alloc-stress that generates memory pressure and causes intensive shrinker scans
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/dm-bufio.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -1554,7 +1554,8 @@ static unsigned long __scan(struct dm_bu
 	int l;
 	struct dm_buffer *b, *tmp;
 	unsigned long freed = 0;
-	unsigned long count = nr_to_scan;
+	unsigned long count = c->n_buffers[LIST_CLEAN] +
+			      c->n_buffers[LIST_DIRTY];
 	unsigned long retain_target = get_retain_buffers(c);

 	for (l = 0; l < LIST_SIZE; l++) {
@@ -1591,6 +1592,7 @@ dm_bufio_shrink_count(struct shrinker *s
 {
 	struct dm_bufio_client *c;
 	unsigned long count;
+	unsigned long retain_target;

 	c = container_of(shrink, struct dm_bufio_client, shrinker);
 	if (sc->gfp_mask & __GFP_FS)
@@ -1599,8 +1601,9 @@ dm_bufio_shrink_count(struct shrinker *s
 		return 0;

 	count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];
+	retain_target = get_retain_buffers(c);
 	dm_bufio_unlock(c);
-	return count;
+	return (count < retain_target) ? 0 : (count - retain_target);
 }

 /*
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Spinadel <david.spinadel@intel.com>

mac80211: Add RX flag to indicate ICV stripped
commit cef0acd4d7d4811d2d19cd0195031bf0dfe41249 upstream.
Add a flag that indicates that the WEP ICV was stripped from an RX packet, allowing the device to avoid transferring the ICV when it has already been checked in hardware.
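A hypothetical driver whose hardware decrypts and strips the ICV but leaves the IV in place would report that roughly as follows (a sketch, not code from any in-tree driver):

    /* Sketch: mark an rx skb whose ICV was already checked and removed
     * by hardware; the IV is kept so mac80211 can do replay detection.
     */
    static void report_icv_stripped(struct sk_buff *skb)
    {
            struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);

            status->flag |= RX_FLAG_DECRYPTED | RX_FLAG_ICV_STRIPPED;
            /* RX_FLAG_IV_STRIPPED deliberately not set */
    }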
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/mac80211.h | 5 ++++-
 net/mac80211/wep.c     | 3 ++-
 net/mac80211/wpa.c     | 3 ++-
 3 files changed, 8 insertions(+), 3 deletions(-)

--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -1007,7 +1007,7 @@ ieee80211_tx_info_clear_status(struct ie
  * @RX_FLAG_DECRYPTED: This frame was decrypted in hardware.
  * @RX_FLAG_MMIC_STRIPPED: the Michael MIC is stripped off this frame,
  *	verification has been done by the hardware.
- * @RX_FLAG_IV_STRIPPED: The IV/ICV are stripped from this frame.
+ * @RX_FLAG_IV_STRIPPED: The IV and ICV are stripped from this frame.
  *	If this flag is set, the stack cannot do any replay detection
  *	hence the driver or hardware will have to do that.
  * @RX_FLAG_PN_VALIDATED: Currently only valid for CCMP/GCMP frames, this
@@ -1078,6 +1078,8 @@ ieee80211_tx_info_clear_status(struct ie
  * @RX_FLAG_ALLOW_SAME_PN: Allow the same PN as same packet before.
  *	This is used for AMSDU subframes which can have the same PN as
  *	the first subframe.
+ * @RX_FLAG_ICV_STRIPPED: The ICV is stripped from this frame. CRC checking must
+ *	be done in the hardware.
  */
 enum mac80211_rx_flags {
 	RX_FLAG_MMIC_ERROR		= BIT(0),
@@ -1113,6 +1115,7 @@ enum mac80211_rx_flags {
 	RX_FLAG_RADIOTAP_VENDOR_DATA	= BIT(31),
 	RX_FLAG_MIC_STRIPPED		= BIT_ULL(32),
 	RX_FLAG_ALLOW_SAME_PN		= BIT_ULL(33),
+	RX_FLAG_ICV_STRIPPED		= BIT_ULL(34),
 };

 #define RX_FLAG_STBC_SHIFT		26
--- a/net/mac80211/wep.c
+++ b/net/mac80211/wep.c
@@ -293,7 +293,8 @@ ieee80211_crypto_wep_decrypt(struct ieee
 			return RX_DROP_UNUSABLE;
 		ieee80211_wep_remove_iv(rx->local, rx->skb, rx->key);
 		/* remove ICV */
-		if (pskb_trim(rx->skb, rx->skb->len - IEEE80211_WEP_ICV_LEN))
+		if (!(status->flag & RX_FLAG_ICV_STRIPPED) &&
+		    pskb_trim(rx->skb, rx->skb->len - IEEE80211_WEP_ICV_LEN))
 			return RX_DROP_UNUSABLE;
 	}

--- a/net/mac80211/wpa.c
+++ b/net/mac80211/wpa.c
@@ -295,7 +295,8 @@ ieee80211_crypto_tkip_decrypt(struct iee
 		return RX_DROP_UNUSABLE;

 	/* Trim ICV */
-	skb_trim(skb, skb->len - IEEE80211_TKIP_ICV_LEN);
+	if (!(status->flag & RX_FLAG_ICV_STRIPPED))
+		skb_trim(skb, skb->len - IEEE80211_TKIP_ICV_LEN);

 	/* Remove IV */
 	memmove(skb->data + IEEE80211_TKIP_IV_LEN, skb->data, hdrlen);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com>

ath10k: rebuild crypto header in rx data frames
commit 7eccb738fce57cbe53ed903ccf43f9ab257b15b3 upstream.
Rx data frames notified through HTT_T2H_MSG_TYPE_RX_IND and HTT_T2H_MSG_TYPE_RX_FRAG_IND expect the PN/TSC check to be done on the host (mac80211) rather than in firmware. Rebuild the cipher header in every received data frame (notified through those HTT interfaces) from the rx_hdr_status tlv available in the rx descriptor of the first msdu. Skip setting the RX_FLAG_IV_STRIPPED flag for packets which require mac80211 PN/TSC check support, and set the appropriate RX_FLAG for the stripped crypto tail. The QCA988X, QCA9887, QCA99X0, QCA9984, QCA9888 and QCA4019 hardware currently needs this rebuilding of the cipher header to perform the PN/TSC check for replay attacks.
Please note that removing the crypto tail for the CCMP-256, GCMP and GCMP-256 ciphers in raw mode still needs to be fixed. Since rx with these ciphers in raw mode does not work in the current form even without this patch, and removing the crypto tail for these ciphers needs clean-up, the raw mode issues in CCMP-256, GCMP and GCMP-256 can be addressed in follow-up patches.
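Condensed, the per-path flag selection implemented by the diff below works out to (a sketch of the new logic, not the patch verbatim):

    if (fill_crypt_header) {
            /* RX_IND/RX_FRAG_IND: cipher header rebuilt, IV kept, so
             * mac80211 performs the PN/TSC replay check on the host.
             */
            status->flag |= RX_FLAG_MIC_STRIPPED | RX_FLAG_ICV_STRIPPED;
    } else {
            /* In-order indication path: firmware already did the check. */
            status->flag |= RX_FLAG_IV_STRIPPED;
    }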
Tested-by: Manikanta Pubbisetty <mpubbise@qti.qualcomm.com>
Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/wireless/ath/ath10k/htt_rx.c  | 105 +++++++++++++++++++++++++-----
 drivers/net/wireless/ath/ath10k/rx_desc.h |   3 +
 2 files changed, 92 insertions(+), 16 deletions(-)

--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -548,6 +548,11 @@ static int ath10k_htt_rx_crypto_param_le
 		return IEEE80211_TKIP_IV_LEN;
 	case HTT_RX_MPDU_ENCRYPT_AES_CCM_WPA2:
 		return IEEE80211_CCMP_HDR_LEN;
+	case HTT_RX_MPDU_ENCRYPT_AES_CCM256_WPA2:
+		return IEEE80211_CCMP_256_HDR_LEN;
+	case HTT_RX_MPDU_ENCRYPT_AES_GCMP_WPA2:
+	case HTT_RX_MPDU_ENCRYPT_AES_GCMP256_WPA2:
+		return IEEE80211_GCMP_HDR_LEN;
 	case HTT_RX_MPDU_ENCRYPT_WEP128:
 	case HTT_RX_MPDU_ENCRYPT_WAPI:
 		break;
@@ -573,6 +578,11 @@ static int ath10k_htt_rx_crypto_tail_len
 		return IEEE80211_TKIP_ICV_LEN;
 	case HTT_RX_MPDU_ENCRYPT_AES_CCM_WPA2:
 		return IEEE80211_CCMP_MIC_LEN;
+	case HTT_RX_MPDU_ENCRYPT_AES_CCM256_WPA2:
+		return IEEE80211_CCMP_256_MIC_LEN;
+	case HTT_RX_MPDU_ENCRYPT_AES_GCMP_WPA2:
+	case HTT_RX_MPDU_ENCRYPT_AES_GCMP256_WPA2:
+		return IEEE80211_GCMP_MIC_LEN;
 	case HTT_RX_MPDU_ENCRYPT_WEP128:
 	case HTT_RX_MPDU_ENCRYPT_WAPI:
 		break;
@@ -1024,9 +1034,21 @@ static void ath10k_htt_rx_h_undecap_raw(
 	hdr = (void *)msdu->data;

 	/* Tail */
-	if (status->flag & RX_FLAG_IV_STRIPPED)
+	if (status->flag & RX_FLAG_IV_STRIPPED) {
 		skb_trim(msdu, msdu->len -
 			 ath10k_htt_rx_crypto_tail_len(ar, enctype));
+	} else {
+		/* MIC */
+		if ((status->flag & RX_FLAG_MIC_STRIPPED) &&
+		    enctype == HTT_RX_MPDU_ENCRYPT_AES_CCM_WPA2)
+			skb_trim(msdu, msdu->len - 8);
+
+		/* ICV */
+		if (status->flag & RX_FLAG_ICV_STRIPPED &&
+		    enctype != HTT_RX_MPDU_ENCRYPT_AES_CCM_WPA2)
+			skb_trim(msdu, msdu->len -
+				 ath10k_htt_rx_crypto_tail_len(ar, enctype));
+	}

 	/* MMIC */
 	if ((status->flag & RX_FLAG_MMIC_STRIPPED) &&
@@ -1048,7 +1070,8 @@ static void ath10k_htt_rx_h_undecap_raw(
 static void ath10k_htt_rx_h_undecap_nwifi(struct ath10k *ar,
 					  struct sk_buff *msdu,
 					  struct ieee80211_rx_status *status,
-					  const u8 first_hdr[64])
+					  const u8 first_hdr[64],
+					  enum htt_rx_mpdu_encrypt_type enctype)
 {
 	struct ieee80211_hdr *hdr;
 	struct htt_rx_desc *rxd;
@@ -1056,6 +1079,7 @@ static void ath10k_htt_rx_h_undecap_nwif
 	u8 da[ETH_ALEN];
 	u8 sa[ETH_ALEN];
 	int l3_pad_bytes;
+	int bytes_aligned = ar->hw_params.decap_align_bytes;

 	/* Delivered decapped frame:
 	 * [nwifi 802.11 header] <-- replaced with 802.11 hdr
@@ -1084,6 +1108,14 @@ static void ath10k_htt_rx_h_undecap_nwif
 	/* push original 802.11 header */
 	hdr = (struct ieee80211_hdr *)first_hdr;
 	hdr_len = ieee80211_hdrlen(hdr->frame_control);
+
+	if (!(status->flag & RX_FLAG_IV_STRIPPED)) {
+		memcpy(skb_push(msdu,
+				ath10k_htt_rx_crypto_param_len(ar, enctype)),
+		       (void *)hdr + round_up(hdr_len, bytes_aligned),
+			ath10k_htt_rx_crypto_param_len(ar, enctype));
+	}
+
 	memcpy(skb_push(msdu, hdr_len), hdr, hdr_len);

 	/* original 802.11 header has a different DA and in
@@ -1144,6 +1176,7 @@ static void ath10k_htt_rx_h_undecap_eth(
 	u8 sa[ETH_ALEN];
 	int l3_pad_bytes;
 	struct htt_rx_desc *rxd;
+	int bytes_aligned = ar->hw_params.decap_align_bytes;

 	/* Delivered decapped frame:
 	 * [eth header] <-- replaced with 802.11 hdr & rfc1042/llc
@@ -1172,6 +1205,14 @@ static void ath10k_htt_rx_h_undecap_eth(
 	/* push original 802.11 header */
 	hdr = (struct ieee80211_hdr *)first_hdr;
 	hdr_len = ieee80211_hdrlen(hdr->frame_control);
+
+	if (!(status->flag & RX_FLAG_IV_STRIPPED)) {
+		memcpy(skb_push(msdu,
+				ath10k_htt_rx_crypto_param_len(ar, enctype)),
+		       (void *)hdr + round_up(hdr_len, bytes_aligned),
+			ath10k_htt_rx_crypto_param_len(ar, enctype));
+	}
+
 	memcpy(skb_push(msdu, hdr_len), hdr, hdr_len);

 	/* original 802.11 header has a different DA and in
@@ -1185,12 +1226,14 @@ static void ath10k_htt_rx_h_undecap_eth(
 static void ath10k_htt_rx_h_undecap_snap(struct ath10k *ar,
 					 struct sk_buff *msdu,
 					 struct ieee80211_rx_status *status,
-					 const u8 first_hdr[64])
+					 const u8 first_hdr[64],
+					 enum htt_rx_mpdu_encrypt_type enctype)
 {
 	struct ieee80211_hdr *hdr;
 	size_t hdr_len;
 	int l3_pad_bytes;
 	struct htt_rx_desc *rxd;
+	int bytes_aligned = ar->hw_params.decap_align_bytes;

 	/* Delivered decapped frame:
 	 * [amsdu header] <-- replaced with 802.11 hdr
@@ -1206,6 +1249,14 @@ static void ath10k_htt_rx_h_undecap_snap

 	hdr = (struct ieee80211_hdr *)first_hdr;
 	hdr_len = ieee80211_hdrlen(hdr->frame_control);
+
+	if (!(status->flag & RX_FLAG_IV_STRIPPED)) {
+		memcpy(skb_push(msdu,
+				ath10k_htt_rx_crypto_param_len(ar, enctype)),
+		       (void *)hdr + round_up(hdr_len, bytes_aligned),
+			ath10k_htt_rx_crypto_param_len(ar, enctype));
+	}
+
 	memcpy(skb_push(msdu, hdr_len), hdr, hdr_len);
 }

@@ -1240,13 +1291,15 @@ static void ath10k_htt_rx_h_undecap(stru
 					    is_decrypted);
 		break;
 	case RX_MSDU_DECAP_NATIVE_WIFI:
-		ath10k_htt_rx_h_undecap_nwifi(ar, msdu, status, first_hdr);
+		ath10k_htt_rx_h_undecap_nwifi(ar, msdu, status, first_hdr,
+					      enctype);
 		break;
 	case RX_MSDU_DECAP_ETHERNET2_DIX:
 		ath10k_htt_rx_h_undecap_eth(ar, msdu, status, first_hdr,
 					    enctype);
 		break;
 	case RX_MSDU_DECAP_8023_SNAP_LLC:
-		ath10k_htt_rx_h_undecap_snap(ar, msdu, status, first_hdr);
+		ath10k_htt_rx_h_undecap_snap(ar, msdu, status, first_hdr,
+					     enctype);
 		break;
 	}
 }
@@ -1289,7 +1342,8 @@ static void ath10k_htt_rx_h_csum_offload

 static void ath10k_htt_rx_h_mpdu(struct ath10k *ar,
 				 struct sk_buff_head *amsdu,
-				 struct ieee80211_rx_status *status)
+				 struct ieee80211_rx_status *status,
+				 bool fill_crypt_header)
 {
 	struct sk_buff *first;
 	struct sk_buff *last;
@@ -1299,7 +1353,6 @@ static void ath10k_htt_rx_h_mpdu(struct
 	enum htt_rx_mpdu_encrypt_type enctype;
 	u8 first_hdr[64];
 	u8 *qos;
-	size_t hdr_len;
 	bool has_fcs_err;
 	bool has_crypto_err;
 	bool has_tkip_err;
@@ -1324,15 +1377,17 @@ static void ath10k_htt_rx_h_mpdu(struct
 	 * decapped header. It'll be used for undecapping of each MSDU.
 	 */
 	hdr = (void *)rxd->rx_hdr_status;
-	hdr_len = ieee80211_hdrlen(hdr->frame_control);
-	memcpy(first_hdr, hdr, hdr_len);
+	memcpy(first_hdr, hdr, RX_HTT_HDR_STATUS_LEN);

 	/* Each A-MSDU subframe will use the original header as the base and be
 	 * reported as a separate MSDU so strip the A-MSDU bit from QoS Ctl.
 	 */
 	hdr = (void *)first_hdr;
-	qos = ieee80211_get_qos_ctl(hdr);
-	qos[0] &= ~IEEE80211_QOS_CTL_A_MSDU_PRESENT;
+
+	if (ieee80211_is_data_qos(hdr->frame_control)) {
+		qos = ieee80211_get_qos_ctl(hdr);
+		qos[0] &= ~IEEE80211_QOS_CTL_A_MSDU_PRESENT;
+	}

 	/* Some attention flags are valid only in the last MSDU. */
 	last = skb_peek_tail(amsdu);
@@ -1379,9 +1434,14 @@ static void ath10k_htt_rx_h_mpdu(struct
 		status->flag |= RX_FLAG_DECRYPTED;

 		if (likely(!is_mgmt))
-			status->flag |= RX_FLAG_IV_STRIPPED |
-					RX_FLAG_MMIC_STRIPPED;
-}
+			status->flag |= RX_FLAG_MMIC_STRIPPED;
+
+		if (fill_crypt_header)
+			status->flag |= RX_FLAG_MIC_STRIPPED |
+					RX_FLAG_ICV_STRIPPED;
+		else
+			status->flag |= RX_FLAG_IV_STRIPPED;
+	}

 	skb_queue_walk(amsdu, msdu) {
 		ath10k_htt_rx_h_csum_offload(msdu);
@@ -1397,6 +1457,9 @@ static void ath10k_htt_rx_h_mpdu(struct
 		if (is_mgmt)
 			continue;

+		if (fill_crypt_header)
+			continue;
+
 		hdr = (void *)msdu->data;
 		hdr->frame_control &= ~__cpu_to_le16(IEEE80211_FCTL_PROTECTED);
 	}
@@ -1407,6 +1470,9 @@ static void ath10k_htt_rx_h_deliver(stru
 				    struct ieee80211_rx_status *status)
 {
 	struct sk_buff *msdu;
+	struct sk_buff *first_subframe;
+
+	first_subframe = skb_peek(amsdu);

 	while ((msdu = __skb_dequeue(amsdu))) {
 		/* Setup per-MSDU flags */
@@ -1415,6 +1481,13 @@ static void ath10k_htt_rx_h_deliver(stru
 		else
 			status->flag |= RX_FLAG_AMSDU_MORE;

+		if (msdu == first_subframe) {
+			first_subframe = NULL;
+			status->flag &= ~RX_FLAG_ALLOW_SAME_PN;
+		} else {
+			status->flag |= RX_FLAG_ALLOW_SAME_PN;
+		}
+
 		ath10k_process_rx(ar, status, msdu);
 	}
 }
@@ -1557,7 +1630,7 @@ static int ath10k_htt_rx_handle_amsdu(st
 	ath10k_htt_rx_h_ppdu(ar, &amsdu, rx_status, 0xffff);
 	ath10k_htt_rx_h_unchain(ar, &amsdu, ret > 0);
 	ath10k_htt_rx_h_filter(ar, &amsdu, rx_status);
-	ath10k_htt_rx_h_mpdu(ar, &amsdu, rx_status);
+	ath10k_htt_rx_h_mpdu(ar, &amsdu, rx_status, true);
 	ath10k_htt_rx_h_deliver(ar, &amsdu, rx_status);

 	return num_msdus;
@@ -1892,7 +1965,7 @@ static int ath10k_htt_rx_in_ord_ind(stru
 			num_msdus += skb_queue_len(&amsdu);
 			ath10k_htt_rx_h_ppdu(ar, &amsdu, status, vdev_id);
 			ath10k_htt_rx_h_filter(ar, &amsdu, status);
-			ath10k_htt_rx_h_mpdu(ar, &amsdu, status);
+			ath10k_htt_rx_h_mpdu(ar, &amsdu, status, false);
 			ath10k_htt_rx_h_deliver(ar, &amsdu, status);
 			break;
 		case -EAGAIN:
--- a/drivers/net/wireless/ath/ath10k/rx_desc.h
+++ b/drivers/net/wireless/ath/ath10k/rx_desc.h
@@ -239,6 +239,9 @@ enum htt_rx_mpdu_encrypt_type {
 	HTT_RX_MPDU_ENCRYPT_WAPI = 5,
 	HTT_RX_MPDU_ENCRYPT_AES_CCM_WPA2 = 6,
 	HTT_RX_MPDU_ENCRYPT_NONE = 7,
+	HTT_RX_MPDU_ENCRYPT_AES_CCM256_WPA2 = 8,
+	HTT_RX_MPDU_ENCRYPT_AES_GCMP_WPA2 = 9,
+	HTT_RX_MPDU_ENCRYPT_AES_GCMP256_WPA2 = 10,
 };

 #define RX_MPDU_START_INFO0_PEER_IDX_MASK  0x000007ff
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wolfgang Grandegger <wg@grandegger.com>

can: gs_usb: fix return value of the "set_bittiming" callback
commit d5b42e6607661b198d8b26a0c30969605b1bf5c7 upstream.
The "set_bittiming" callback treats a positive return value as error! For that reason "can_changelink()" will quit silently after setting the bittiming values without processing ctrlmode, restart-ms, etc.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/can/usb/gs_usb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -449,7 +449,7 @@ static int gs_usb_set_bittiming(struct n
 		dev_err(netdev->dev.parent, "Couldn't set bittimings (err=%d)",
 			rc);

-	return rc;
+	return (rc > 0) ? 0 : rc;
 }

 static void gs_usb_xmit_callback(struct urb *urb)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bart Van Assche <bart.vanassche@wdc.com>

IB/srpt: Disable RDMA access by the initiator
commit bec40c26041de61162f7be9d2ce548c756ce0f65 upstream.
With the SRP protocol all RDMA operations are initiated by the target. Since no RDMA operations are initiated by the initiator, do not grant the initiator permission to submit RDMA reads or writes to the target.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -992,8 +992,7 @@ static int srpt_init_ch_qp(struct srpt_r
 		return -ENOMEM;

 	attr->qp_state = IB_QPS_INIT;
-	attr->qp_access_flags = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_READ |
-	    IB_ACCESS_REMOTE_WRITE;
+	attr->qp_access_flags = IB_ACCESS_LOCAL_WRITE;
 	attr->port_num = ch->sport->port;
 	attr->pkey_index = 0;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej W. Rozycki <macro@mips.com>

MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
commit b67336eee3fcb8ecedc6c13e2bf88aacfa3151e2 upstream.
Fix an API loophole introduced with commit 9791554b45a2 ("MIPS,prctl: add PR_[GS]ET_FP_MODE prctl options for MIPS"), where the caller of prctl(2) is incorrectly allowed to make a change to CP0.Status.FR or CP0.Config5.FRE register bits even if CONFIG_MIPS_O32_FP64_SUPPORT has not been enabled, despite that an executable requesting the mode via ELF file annotation would not be allowed to run in the first place, or for n64 and n32 ABI tasks, which do not have non-default modes defined at all. Add suitable checks to `mips_set_process_fp_mode' and bail out if an invalid mode change has been requested for the ABI in effect, even if the FPU hardware or emulation would otherwise allow it.
Always succeed however without taking any further action if the mode requested is the same as one already in effect, regardless of whether any mode change, should it be requested, would actually be allowed for the task concerned.
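Seen from userland, the new checks behave like this (a hypothetical example, assuming a task whose ABI defines no non-default FP modes and a libc exposing the PR_*_FP_MODE constants):

    #include <stdio.h>
    #include <sys/prctl.h>

    int main(void)
    {
            /* Requesting FR=1 where the ABI defines no mode change now
             * fails with EOPNOTSUPP instead of flipping CP0.Status.FR.
             */
            if (prctl(PR_SET_FP_MODE, PR_FP_MODE_FR) < 0)
                    perror("PR_SET_FP_MODE");

            /* Re-requesting the mode already in effect still succeeds. */
            return prctl(PR_SET_FP_MODE, prctl(PR_GET_FP_MODE));
    }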
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: 9791554b45a2 ("MIPS,prctl: add PR_[GS]ET_FP_MODE prctl options for MIPS")
Reviewed-by: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <james.hogan@mips.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17800/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/mips/kernel/process.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -683,6 +683,18 @@ int mips_set_process_fp_mode(struct task
 	struct task_struct *t;
 	int max_users;

+	/* If nothing to change, return right away, successfully.  */
+	if (value == mips_get_process_fp_mode(task))
+		return 0;
+
+	/* Only accept a mode change if 64-bit FP enabled for o32.  */
+	if (!IS_ENABLED(CONFIG_MIPS_O32_FP64_SUPPORT))
+		return -EOPNOTSUPP;
+
+	/* And only for o32 tasks.  */
+	if (IS_ENABLED(CONFIG_64BIT) && !test_thread_flag(TIF_32BIT_REGS))
+		return -EOPNOTSUPP;
+
 	/* Check the value is valid */
 	if (value & ~known_bits)
 		return -EOPNOTSUPP;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej W. Rozycki <macro@mips.com>

MIPS: Factor out NT_PRFPREG regset access helpers
commit a03fe72572c12e98f4173f8a535f32468e48b6ec upstream.
In preparation for fixing a commit 72b22bbad1e7 ("MIPS: Don't assume 64-bit FP registers for FP regset") FCSR access regression, factor out NT_PRFPREG regset access helpers for the non-MSA and MSA variants respectively, to avoid having to deal with excessive indentation in the actual fix.
No functional change, however use `target->thread.fpu.fpr[0]' rather than `target->thread.fpu.fpr[i]' for FGR holding type size determination as there's no `i' variable to refer to anymore, and for the factored out `i' variable declaration use `unsigned int' rather than `unsigned' as its type, following the common style.
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: 72b22bbad1e7 ("MIPS: Don't assume 64-bit FP registers for FP regset")
Cc: James Hogan <james.hogan@mips.com>
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17925/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/mips/kernel/ptrace.c | 108 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 83 insertions(+), 25 deletions(-)

--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -439,25 +439,36 @@ static int gpr64_set(struct task_struct

 #endif /* CONFIG_64BIT */

-static int fpr_get(struct task_struct *target,
-		   const struct user_regset *regset,
-		   unsigned int pos, unsigned int count,
-		   void *kbuf, void __user *ubuf)
+/*
+ * Copy the floating-point context to the supplied NT_PRFPREG buffer,
+ * !CONFIG_CPU_HAS_MSA variant.  FP context's general register slots
+ * correspond 1:1 to buffer slots.
+ */
+static int fpr_get_fpa(struct task_struct *target,
+		       unsigned int *pos, unsigned int *count,
+		       void **kbuf, void __user **ubuf)
 {
-	unsigned i;
-	int err;
-	u64 fpr_val;
-
-	/* XXX fcr31  */
+	return user_regset_copyout(pos, count, kbuf, ubuf,
+				   &target->thread.fpu,
+				   0, sizeof(elf_fpregset_t));
+}

-	if (sizeof(target->thread.fpu.fpr[i]) == sizeof(elf_fpreg_t))
-		return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
-					   &target->thread.fpu,
-					   0, sizeof(elf_fpregset_t));
+/*
+ * Copy the floating-point context to the supplied NT_PRFPREG buffer,
+ * CONFIG_CPU_HAS_MSA variant.  Only lower 64 bits of FP context's
+ * general register slots are copied to buffer slots.
+ */
+static int fpr_get_msa(struct task_struct *target,
+		       unsigned int *pos, unsigned int *count,
+		       void **kbuf, void __user **ubuf)
+{
+	unsigned int i;
+	u64 fpr_val;
+	int err;

 	for (i = 0; i < NUM_FPU_REGS; i++) {
 		fpr_val = get_fpr64(&target->thread.fpu.fpr[i], 0);
-		err = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+		err = user_regset_copyout(pos, count, kbuf, ubuf,
 					  &fpr_val, i * sizeof(elf_fpreg_t),
 					  (i + 1) * sizeof(elf_fpreg_t));
 		if (err)
@@ -467,27 +478,54 @@ static int fpr_get(struct task_struct *t
 	return 0;
 }

-static int fpr_set(struct task_struct *target,
+/* Copy the floating-point context to the supplied NT_PRFPREG buffer.  */
+static int fpr_get(struct task_struct *target,
 		   const struct user_regset *regset,
 		   unsigned int pos, unsigned int count,
-		   const void *kbuf, const void __user *ubuf)
+		   void *kbuf, void __user *ubuf)
 {
-	unsigned i;
 	int err;
-	u64 fpr_val;

 	/* XXX fcr31  */

-	init_fp_ctx(target);
+	if (sizeof(target->thread.fpu.fpr[0]) == sizeof(elf_fpreg_t))
+		err = fpr_get_fpa(target, &pos, &count, &kbuf, &ubuf);
+	else
+		err = fpr_get_msa(target, &pos, &count, &kbuf, &ubuf);
+
+	return err;
+}

-	if (sizeof(target->thread.fpu.fpr[i]) == sizeof(elf_fpreg_t))
-		return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
-					  &target->thread.fpu,
-					  0, sizeof(elf_fpregset_t));
+/*
+ * Copy the supplied NT_PRFPREG buffer to the floating-point context,
+ * !CONFIG_CPU_HAS_MSA variant.  Buffer slots correspond 1:1 to FP
+ * context's general register slots.
+ */
+static int fpr_set_fpa(struct task_struct *target,
+		       unsigned int *pos, unsigned int *count,
+		       const void **kbuf, const void __user **ubuf)
+{
+	return user_regset_copyin(pos, count, kbuf, ubuf,
+				  &target->thread.fpu,
+				  0, sizeof(elf_fpregset_t));
+}
+
+/*
+ * Copy the supplied NT_PRFPREG buffer to the floating-point context,
+ * CONFIG_CPU_HAS_MSA variant.  Buffer slots are copied to lower 64
+ * bits only of FP context's general register slots.
+ */
+static int fpr_set_msa(struct task_struct *target,
+		       unsigned int *pos, unsigned int *count,
+		       const void **kbuf, const void __user **ubuf)
+{
+	unsigned int i;
+	u64 fpr_val;
+	int err;

 	BUILD_BUG_ON(sizeof(fpr_val) != sizeof(elf_fpreg_t));
-	for (i = 0; i < NUM_FPU_REGS && count >= sizeof(elf_fpreg_t); i++) {
-		err = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+	for (i = 0; i < NUM_FPU_REGS && *count >= sizeof(elf_fpreg_t); i++) {
+		err = user_regset_copyin(pos, count, kbuf, ubuf,
 					 &fpr_val, i * sizeof(elf_fpreg_t),
 					 (i + 1) * sizeof(elf_fpreg_t));
 		if (err)
@@ -498,6 +536,26 @@ static int fpr_set(struct task_struct *t
 	return 0;
 }

+/* Copy the supplied NT_PRFPREG buffer to the floating-point context.  */
+static int fpr_set(struct task_struct *target,
+		   const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	int err;
+
+	/* XXX fcr31  */
+
+	init_fp_ctx(target);
+
+	if (sizeof(target->thread.fpu.fpr[0]) == sizeof(elf_fpreg_t))
+		err = fpr_set_fpa(target, &pos, &count, &kbuf, &ubuf);
+	else
+		err = fpr_set_msa(target, &pos, &count, &kbuf, &ubuf);
+
+	return err;
+}
+
 enum mips_regset {
 	REGSET_GPR,
 	REGSET_FPR,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej W. Rozycki <macro@mips.com>

MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
commit dc24d0edf33c3e15099688b6bbdf7bdc24bf6e91 upstream.
Complement commit d614fd58a283 ("mips/ptrace: Preserve previous registers for short regset write") and ensure that no partial register write attempt is made with PTRACE_SETREGSET, as we do not preinitialize any temporaries used to hold incoming register data and consequently random data could be written.
It is the responsibility of the caller, such as `ptrace_regset', to arrange for writes to span whole registers only, so here we only assert that it has indeed happened.
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: 72b22bbad1e7 ("MIPS: Don't assume 64-bit FP registers for FP regset")
Cc: James Hogan <james.hogan@mips.com>
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17926/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/mips/kernel/ptrace.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -536,7 +536,15 @@ static int fpr_set_msa(struct task_struc
 	return 0;
 }

-/* Copy the supplied NT_PRFPREG buffer to the floating-point context.  */
+/*
+ * Copy the supplied NT_PRFPREG buffer to the floating-point context.
+ *
+ * We optimize for the case where `count % sizeof(elf_fpreg_t) == 0',
+ * which is supposed to have been guaranteed by the kernel before
+ * calling us, e.g. in `ptrace_regset'.  We enforce that requirement,
+ * so that we can safely avoid preinitializing temporaries for
+ * partial register writes.
+ */
 static int fpr_set(struct task_struct *target,
 		   const struct user_regset *regset,
 		   unsigned int pos, unsigned int count,
@@ -544,6 +552,8 @@ static int fpr_set(struct task_struct *t
 {
 	int err;

+	BUG_ON(count % sizeof(elf_fpreg_t));
+
 	/* XXX fcr31  */

 	init_fp_ctx(target);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej W. Rozycki <macro@mips.com>

MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
commit 80b3ffce0196ea50068885d085ff981e4b8396f4 upstream.
Fix a bug in commit d614fd58a283 ("mips/ptrace: Preserve previous registers for short regset write") and consistently consume all data supplied to `fpr_set_msa' with the ptrace(2) PTRACE_SETREGSET request, such that a zero data buffer counter is returned where insufficient data has been given to fill a whole number of FP general registers.
In reality this is not going to happen, as the caller is supposed to only supply data covering a whole number of registers, and that is verified in `ptrace_regset' and again asserted in `fpr_set'. However, structuring the code such that the presence of trailing partial FP general register data causes `fpr_set_msa' to return with a non-zero data buffer counter makes it appear that this trailing data will be used if there are subsequent writes made to FP registers, which is going to be the case with the FCSR once the missing write to that register has been fixed.
Fixes: d614fd58a283 ("mips/ptrace: Preserve previous registers for short regset write")
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Cc: James Hogan <james.hogan@mips.com>
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17927/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/mips/kernel/ptrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -524,7 +524,7 @@ static int fpr_set_msa(struct task_struc
 	int err;

 	BUILD_BUG_ON(sizeof(fpr_val) != sizeof(elf_fpreg_t));
-	for (i = 0; i < NUM_FPU_REGS && *count >= sizeof(elf_fpreg_t); i++) {
+	for (i = 0; i < NUM_FPU_REGS && *count > 0; i++) {
 		err = user_regset_copyin(pos, count, kbuf, ubuf,
 					 &fpr_val, i * sizeof(elf_fpreg_t),
 					 (i + 1) * sizeof(elf_fpreg_t));
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej W. Rozycki <macro@mips.com>

MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
commit be07a6a1188372b6d19a3307ec33211fc9c9439d upstream.
Fix a commit 72b22bbad1e7 ("MIPS: Don't assume 64-bit FP registers for FP regset") public API regression, then activated by commit 1db1af84d6df ("MIPS: Basic MSA context switching support"), that caused the FCSR register not to be read or written for CONFIG_CPU_HAS_MSA kernel configurations (regardless of actual presence or absence of the MSA feature in a given processor) with ptrace(2) PTRACE_GETREGSET and PTRACE_SETREGSET requests nor recorded in core dumps.
This is because with !CONFIG_CPU_HAS_MSA configurations the whole of `elf_fpregset_t' array is bulk-copied as it is, which includes the FCSR in one half of the last, 33rd slot, whereas with CONFIG_CPU_HAS_MSA configurations array elements are copied individually, and then only the leading 32 FGR slots while the remaining slot is ignored.
Correct the code then such that only FGR slots are copied in the respective !MSA and MSA helpers, and the FCSR slot is handled separately in common code. Use `ptrace_setfcr31' to update the FCSR too, so that the read-only mask is respected.
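For reference, the NT_PRFPREG layout these helpers now operate on is, with NUM_FPU_REGS == 32 and sizeof(elf_fpreg_t) == 8, the following (offsets as implied by the diff below):

    /* slots  0..31: FP general registers fpr[0..31]
     *               (lower 64 bits only in the MSA variant)
     * slot      32: fcr31 (u32) in its lower half, padding above
     */
    const int fcr31_pos = NUM_FPU_REGS * sizeof(elf_fpreg_t);  /* 32 * 8 = 256 */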
Retrieving a correct value of FCSR is important in debugging not only for the human to be able to get the right interpretation of the situation, but for correct operation of GDB as well. This is because the condition code bits in FCSR are used by GDB to determine the location to place a breakpoint at when single-stepping through an FPU branch instruction. If such a breakpoint is placed incorrectly (i.e. with the condition reversed), then it will be missed, likely causing the debuggee to run away from the control of GDB and consequently breaking the process of investigation.
Fortunately GDB continues using the older PTRACE_GETFPREGS ptrace(2) request which is unaffected, so the regression only really hits with post-mortem debug sessions using a core dump file, in which case execution, and consequently single-stepping through branches is not possible. Of course core files created by buggy kernels out there will have the value of FCSR recorded clobbered, but such core files cannot be corrected and the person using them simply will have to be aware that the value of FCSR retrieved is not reliable.
Which also means we can likely get away without defining a replacement API which would ensure a correct value of FCSR to be retrieved, or none at all.
This is based on previous work by Alex Smith, extensively rewritten.
Signed-off-by: Alex Smith <alex@alex-smith.me.uk>
Signed-off-by: James Hogan <james.hogan@mips.com>
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: 72b22bbad1e7 ("MIPS: Don't assume 64-bit FP registers for FP regset")
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17928/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/mips/kernel/ptrace.c | 47 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 11 deletions(-)

--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -442,7 +442,7 @@ static int gpr64_set(struct task_struct
 /*
  * Copy the floating-point context to the supplied NT_PRFPREG buffer,
  * !CONFIG_CPU_HAS_MSA variant.  FP context's general register slots
- * correspond 1:1 to buffer slots.
+ * correspond 1:1 to buffer slots.  Only general registers are copied.
  */
 static int fpr_get_fpa(struct task_struct *target,
 		       unsigned int *pos, unsigned int *count,
@@ -450,13 +450,14 @@ static int fpr_get_fpa(struct task_struc
 {
 	return user_regset_copyout(pos, count, kbuf, ubuf,
 				   &target->thread.fpu,
-				   0, sizeof(elf_fpregset_t));
+				   0, NUM_FPU_REGS * sizeof(elf_fpreg_t));
 }

 /*
  * Copy the floating-point context to the supplied NT_PRFPREG buffer,
  * CONFIG_CPU_HAS_MSA variant.  Only lower 64 bits of FP context's
- * general register slots are copied to buffer slots.
+ * general register slots are copied to buffer slots.  Only general
+ * registers are copied.
  */
 static int fpr_get_msa(struct task_struct *target,
 		       unsigned int *pos, unsigned int *count,
@@ -478,20 +479,29 @@ static int fpr_get_msa(struct task_struc
 	return 0;
 }

-/* Copy the floating-point context to the supplied NT_PRFPREG buffer.  */
+/*
+ * Copy the floating-point context to the supplied NT_PRFPREG buffer.
+ * Choose the appropriate helper for general registers, and then copy
+ * the FCSR register separately.
+ */
 static int fpr_get(struct task_struct *target,
 		   const struct user_regset *regset,
 		   unsigned int pos, unsigned int count,
 		   void *kbuf, void __user *ubuf)
 {
+	const int fcr31_pos = NUM_FPU_REGS * sizeof(elf_fpreg_t);
 	int err;

-	/* XXX fcr31  */
-
 	if (sizeof(target->thread.fpu.fpr[0]) == sizeof(elf_fpreg_t))
 		err = fpr_get_fpa(target, &pos, &count, &kbuf, &ubuf);
 	else
 		err = fpr_get_msa(target, &pos, &count, &kbuf, &ubuf);
+	if (err)
+		return err;
+
+	err = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				  &target->thread.fpu.fcr31,
+				  fcr31_pos, fcr31_pos + sizeof(u32));

 	return err;
 }
@@ -499,7 +509,7 @@ static int fpr_get(struct task_struct *t
 /*
  * Copy the supplied NT_PRFPREG buffer to the floating-point context,
  * !CONFIG_CPU_HAS_MSA variant.  Buffer slots correspond 1:1 to FP
- * context's general register slots.
+ * context's general register slots.  Only general registers are copied.
  */
 static int fpr_set_fpa(struct task_struct *target,
 		       unsigned int *pos, unsigned int *count,
@@ -507,13 +517,14 @@ static int fpr_set_fpa(struct task_struc
 {
 	return user_regset_copyin(pos, count, kbuf, ubuf,
 				  &target->thread.fpu,
-				  0, sizeof(elf_fpregset_t));
+				  0, NUM_FPU_REGS * sizeof(elf_fpreg_t));
 }

 /*
  * Copy the supplied NT_PRFPREG buffer to the floating-point context,
  * CONFIG_CPU_HAS_MSA variant.  Buffer slots are copied to lower 64
- * bits only of FP context's general register slots.
+ * bits only of FP context's general register slots.  Only general
+ * registers are copied.
  */
 static int fpr_set_msa(struct task_struct *target,
 		       unsigned int *pos, unsigned int *count,
@@ -538,6 +549,8 @@ static int fpr_set_msa(struct task_struc

 /*
  * Copy the supplied NT_PRFPREG buffer to the floating-point context.
+ * Choose the appropriate helper for general registers, and then copy
+ * the FCSR register separately.
  *
  * We optimize for the case where `count % sizeof(elf_fpreg_t) == 0',
  * which is supposed to have been guaranteed by the kernel before
@@ -550,18 +563,30 @@ static int fpr_set(struct task_struct *t
 		   unsigned int pos, unsigned int count,
 		   const void *kbuf, const void __user *ubuf)
 {
+	const int fcr31_pos = NUM_FPU_REGS * sizeof(elf_fpreg_t);
+	u32 fcr31;
 	int err;

 	BUG_ON(count % sizeof(elf_fpreg_t));

-	/* XXX fcr31  */
-
 	init_fp_ctx(target);

 	if (sizeof(target->thread.fpu.fpr[0]) == sizeof(elf_fpreg_t))
 		err = fpr_set_fpa(target, &pos, &count, &kbuf, &ubuf);
 	else
 		err = fpr_set_msa(target, &pos, &count, &kbuf, &ubuf);
+	if (err)
+		return err;
+
+	if (count > 0) {
+		err = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+					 &fcr31,
+					 fcr31_pos, fcr31_pos + sizeof(u32));
+		if (err)
+			return err;
+
+		ptrace_setfcr31(target, fcr31);
+	}

 	return err;
 }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej W. Rozycki <macro@mips.com>

MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_GETREGSET
commit 006501e039eec411842bb3150c41358867d320c2 upstream.
Complement commit d614fd58a283 ("mips/ptrace: Preserve previous registers for short regset write") and, like with the PTRACE_SETREGSET ptrace(2) request, also apply a BUILD_BUG_ON check for the size of the `elf_fpreg_t' type in the PTRACE_GETREGSET request handler.
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: d614fd58a283 ("mips/ptrace: Preserve previous registers for short regset write")
Cc: James Hogan <james.hogan@mips.com>
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17929/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/mips/kernel/ptrace.c | 1 +
 1 file changed, 1 insertion(+)

--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -467,6 +467,7 @@ static int fpr_get_msa(struct task_struc
 	u64 fpr_val;
 	int err;

+	BUILD_BUG_ON(sizeof(fpr_val) != sizeof(elf_fpreg_t));
 	for (i = 0; i < NUM_FPU_REGS; i++) {
 		fpr_val = get_fpr64(&target->thread.fpu.fpr[i], 0);
 		err = user_regset_copyout(pos, count, kbuf, ubuf,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej W. Rozycki <macro@mips.com>

MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
commit c8c5a3a24d395b14447a9a89d61586a913840a3b upstream.
Complement commit c23b3d1a5311 ("MIPS: ptrace: Change GP regset to use correct core dump register layout") and also reject outsized PTRACE_SETREGSET requests to the NT_PRFPREG regset, like with the NT_PRSTATUS regset.
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: c23b3d1a5311 ("MIPS: ptrace: Change GP regset to use correct core dump register layout")
Cc: James Hogan <james.hogan@mips.com>
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17930/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/mips/kernel/ptrace.c | 3 +++
 1 file changed, 3 insertions(+)

--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -570,6 +570,9 @@ static int fpr_set(struct task_struct *t

 	BUG_ON(count % sizeof(elf_fpreg_t));

+	if (pos + count > sizeof(elf_fpregset_t))
+		return -EIO;
+
 	init_fp_ctx(target);

 	if (sizeof(target->thread.fpu.fpr[0]) == sizeof(elf_fpreg_t))
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jim Mattson <jmattson@google.com>

kvm: vmx: Scrub hardware GPRs at VM-exit
commit 0cb5b30698fdc8f6b4646012e3acb4ddce430788 upstream.
Guest GPR values are live in the hardware GPRs at VM-exit. Do not leave any guest values in hardware GPRs after the guest GPR values are saved to the vcpu_vmx structure.
This is a partial mitigation for CVE-2017-5715 and CVE-2017-5753. Specifically, it defeats the Project Zero PoC for CVE-2017-5715.
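One detail of the vmx side of the diff below: clearing only the 32-bit low halves (%r8d and friends) is sufficient, because x86-64 zero-extends every 32-bit register write to the full 64 bits, and the 32-bit xor also encodes shorter (a sketch, assuming GCC inline asm):

    /* r8 ends up zero across all 64 bits */
    asm volatile("xor %%r8d, %%r8d" : : : "r8");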
Suggested-by: Eric Northup <digitaleric@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Eric Northup <digitaleric@google.com>
Reviewed-by: Benjamin Serebrin <serebrin@google.com>
Reviewed-by: Andrew Honig <ahonig@google.com>
[Paolo: Add AMD bits, Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/svm.c | 19 +++++++++++++++++++
 arch/x86/kvm/vmx.c | 14 +++++++++++++-
 2 files changed, 32 insertions(+), 1 deletion(-)

--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4869,6 +4869,25 @@ static void svm_vcpu_run(struct kvm_vcpu
 		"mov %%r14, %c[r14](%[svm]) \n\t"
 		"mov %%r15, %c[r15](%[svm]) \n\t"
 #endif
+		/*
+		 * Clear host registers marked as clobbered to prevent
+		 * speculative use.
+		 */
+		"xor %%" _ASM_BX ", %%" _ASM_BX " \n\t"
+		"xor %%" _ASM_CX ", %%" _ASM_CX " \n\t"
+		"xor %%" _ASM_DX ", %%" _ASM_DX " \n\t"
+		"xor %%" _ASM_SI ", %%" _ASM_SI " \n\t"
+		"xor %%" _ASM_DI ", %%" _ASM_DI " \n\t"
+#ifdef CONFIG_X86_64
+		"xor %%r8, %%r8 \n\t"
+		"xor %%r9, %%r9 \n\t"
+		"xor %%r10, %%r10 \n\t"
+		"xor %%r11, %%r11 \n\t"
+		"xor %%r12, %%r12 \n\t"
+		"xor %%r13, %%r13 \n\t"
+		"xor %%r14, %%r14 \n\t"
+		"xor %%r15, %%r15 \n\t"
+#endif
 		"pop %%" _ASM_BP
 		:
 		: [svm]"a"(svm),
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8948,6 +8948,7 @@ static void __noclone vmx_vcpu_run(struc
 		/* Save guest registers, load host registers, keep flags */
 		"mov %0, %c[wordsize](%%" _ASM_SP ") \n\t"
 		"pop %0 \n\t"
+		"setbe %c[fail](%0)\n\t"
 		"mov %%" _ASM_AX ", %c[rax](%0) \n\t"
 		"mov %%" _ASM_BX ", %c[rbx](%0) \n\t"
 		__ASM_SIZE(pop) " %c[rcx](%0) \n\t"
@@ -8964,12 +8965,23 @@ static void __noclone vmx_vcpu_run(struc
 		"mov %%r13, %c[r13](%0) \n\t"
 		"mov %%r14, %c[r14](%0) \n\t"
 		"mov %%r15, %c[r15](%0) \n\t"
+		"xor %%r8d, %%r8d \n\t"
+		"xor %%r9d, %%r9d \n\t"
+		"xor %%r10d, %%r10d \n\t"
+		"xor %%r11d, %%r11d \n\t"
+		"xor %%r12d, %%r12d \n\t"
+		"xor %%r13d, %%r13d \n\t"
+		"xor %%r14d, %%r14d \n\t"
+		"xor %%r15d, %%r15d \n\t"
 #endif
 		"mov %%cr2, %%" _ASM_AX "   \n\t"
 		"mov %%" _ASM_AX ", %c[cr2](%0) \n\t"

+		"xor %%eax, %%eax \n\t"
+		"xor %%ebx, %%ebx \n\t"
+		"xor %%esi, %%esi \n\t"
+		"xor %%edi, %%edi \n\t"
 		"pop  %%" _ASM_BP "; pop  %%" _ASM_DX " \n\t"
-		"setbe %c[fail](%0) \n\t"
 		".pushsection .rodata \n\t"
 		".global vmx_return \n\t"
 		"vmx_return: " _ASM_PTR " 2b \n\t"
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

platform/x86: wmi: Call acpi_wmi_init() later
commit 98b8e4e5c17bf87c1b18ed929472051dab39878c upstream.
Calling acpi_wmi_init() at the subsys_initcall() level causes ordering issues to appear on some systems and they are difficult to reproduce, because there is no guaranteed ordering between subsys_initcall() calls, so they may occur in different orders on different systems.
In particular, commit 86d9f48534e8 ("mm/slab: fix kmemcg cache creation delayed issue") exposed one of these issues, where genl_init() and acpi_wmi_init() are both called at the same initcall level, but the former must run before the latter so as to avoid a NULL pointer dereference.
For this reason, move the acpi_wmi_init() invocation to the initcall_sync level which should still be early enough for things to work correctly in the WMI land.
Link: https://marc.info/?t=151274596700002&r=1&w=2
Reported-by: Jonathan McDowell noodles@earth.li
Reported-by: Joonsoo Kim iamjoonsoo.kim@lge.com
Tested-by: Jonathan McDowell noodles@earth.li
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Signed-off-by: Darren Hart (VMware) dvhart@infradead.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/platform/x86/wmi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/platform/x86/wmi.c
+++ b/drivers/platform/x86/wmi.c
@@ -848,5 +848,5 @@ static void __exit acpi_wmi_exit(void)
 	pr_info("Mapper unloaded\n");
 }
 
-subsys_initcall(acpi_wmi_init);
+subsys_initcall_sync(acpi_wmi_init);
 module_exit(acpi_wmi_exit);
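For background, a minimal sketch of the initcall ordering contract the fix relies on (illustrative, not from the patch; genl_init() is the clashing entry named above, other_init is a made-up placeholder):

    /*
     * Entries at the same initcall level run in an order that is not
     * guaranteed across drivers; the _sync variant of a level runs
     * only after every plain entry of that level has completed.
     */
    subsys_initcall(genl_init);          /* may run before or after...   */
    subsys_initcall(other_init);         /* ...any other subsys entry    */
    subsys_initcall_sync(acpi_wmi_init); /* always after all of the above */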
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vikas C Sajjan vikas.cha.sajjan@hpe.com
commit 252714155f04c5d16989cb3aadb85fd1b5772f99 upstream.
Platforms which support only IOAPIC mode pass the SCI information above the legacy space (0-15) via the FADT mechanism and not via MADT.
In such cases mp_override_legacy_irq(), which is invoked from acpi_sci_ioapic_setup() to register SCI interrupts, fails for interrupts greater than or equal to 16, since it is meant to handle only the legacy space, and emits the error "Invalid bus_irq %u for legacy override".
Add a new function to handle SCI interrupts >= 16 and invoke it conditionally in acpi_sci_ioapic_setup().
The code duplication due to this new function will be cleaned up in a separate patch.
Co-developed-by: Sunil V L sunil.vl@hpe.com
Signed-off-by: Vikas C Sajjan vikas.cha.sajjan@hpe.com
Signed-off-by: Sunil V L sunil.vl@hpe.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Tested-by: Abdul Lateef Attar abdul-lateef.attar@hpe.com
Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Cc: linux-pm@vger.kernel.org
Cc: kkamagui@gmail.com
Cc: linux-acpi@vger.kernel.org
Link: https://lkml.kernel.org/r/1510848825-21965-2-git-send-email-vikas.cha.sajjan...
Cc: Jean Delvare jdelvare@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/x86/kernel/acpi/boot.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -422,6 +422,34 @@ static int mp_config_acpi_gsi(struct dev
 	return 0;
 }
 
+static int __init mp_register_ioapic_irq(u8 bus_irq, u8 polarity,
+					 u8 trigger, u32 gsi)
+{
+	struct mpc_intsrc mp_irq;
+	int ioapic, pin;
+
+	/* Convert 'gsi' to 'ioapic.pin'(INTIN#) */
+	ioapic = mp_find_ioapic(gsi);
+	if (ioapic < 0) {
+		pr_warn("Failed to find ioapic for gsi : %u\n", gsi);
+		return ioapic;
+	}
+
+	pin = mp_find_ioapic_pin(ioapic, gsi);
+
+	mp_irq.type = MP_INTSRC;
+	mp_irq.irqtype = mp_INT;
+	mp_irq.irqflag = (trigger << 2) | polarity;
+	mp_irq.srcbus = MP_ISA_BUS;
+	mp_irq.srcbusirq = bus_irq;
+	mp_irq.dstapic = mpc_ioapic_id(ioapic);
+	mp_irq.dstirq = pin;
+
+	mp_save_irq(&mp_irq);
+
+	return 0;
+}
+
 static int __init acpi_parse_ioapic(struct acpi_subtable_header * header,
 				    const unsigned long end)
 {
@@ -466,7 +494,11 @@ static void __init acpi_sci_ioapic_setup
 	if (acpi_sci_flags & ACPI_MADT_POLARITY_MASK)
 		polarity = acpi_sci_flags & ACPI_MADT_POLARITY_MASK;
 
-	mp_override_legacy_irq(bus_irq, polarity, trigger, gsi);
+	if (bus_irq < NR_IRQS_LEGACY)
+		mp_override_legacy_irq(bus_irq, polarity, trigger, gsi);
+	else
+		mp_register_ioapic_irq(bus_irq, polarity, trigger, gsi);
+
 	acpi_penalize_sci_irq(bus_irq, trigger, polarity);
 
 	/*
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit fe08f34d066f4404934a509b6806db1a4f700c86 upstream.
syzkaller triggered kernel warnings through PCM OSS emulation at closing a stream:

  WARNING: CPU: 0 PID: 3502 at sound/core/pcm_lib.c:1635 snd_pcm_hw_param_first+0x289/0x690 sound/core/pcm_lib.c:1635
  Call Trace:
   ....
   snd_pcm_hw_param_near.constprop.27+0x78d/0x9a0 sound/core/oss/pcm_oss.c:457
   snd_pcm_oss_change_params+0x17d3/0x3720 sound/core/oss/pcm_oss.c:969
   snd_pcm_oss_make_ready+0xaa/0x130 sound/core/oss/pcm_oss.c:1128
   snd_pcm_oss_sync+0x257/0x830 sound/core/oss/pcm_oss.c:1638
   snd_pcm_oss_release+0x20b/0x280 sound/core/oss/pcm_oss.c:2431
   __fput+0x327/0x7e0 fs/file_table.c:210
   ....
This happens while it tries to open and set up the aloop device concurrently. The warning above (invoked from the snd_BUG_ON() macro) is there to detect an unexpected logical error where the snd_pcm_hw_refine() call shouldn't fail. The theory holds for the case where the hw_params config rules are static. But for an aloop device, the hw_params rule condition varies dynamically depending on the connected target; when another device is opened and changes the parameters, the device connected on the other side is affected too, and this causes the error from snd_pcm_hw_refine().
That is, the simplest "solution" for this is to remove the incorrect assumption of static rules, and treat such an error as a normal error path. As there are a couple of other places using snd_BUG_ON() incorrectly, this patch removes these spurious snd_BUG_ON() calls.
Reported-by: syzbot+6f11c7e2a1b91d466432@syzkaller.appspotmail.com
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 sound/core/oss/pcm_oss.c | 1 -
 sound/core/pcm_lib.c     | 4 ++--
 2 files changed, 2 insertions(+), 3 deletions(-)

--- a/sound/core/oss/pcm_oss.c
+++ b/sound/core/oss/pcm_oss.c
@@ -466,7 +466,6 @@ static int snd_pcm_hw_param_near(struct
 		v = snd_pcm_hw_param_last(pcm, params, var, dir);
 	else
 		v = snd_pcm_hw_param_first(pcm, params, var, dir);
-	snd_BUG_ON(v < 0);
 	return v;
 }
 
--- a/sound/core/pcm_lib.c
+++ b/sound/core/pcm_lib.c
@@ -1664,7 +1664,7 @@ int snd_pcm_hw_param_first(struct snd_pc
 		return changed;
 	if (params->rmask) {
 		int err = snd_pcm_hw_refine(pcm, params);
-		if (snd_BUG_ON(err < 0))
+		if (err < 0)
 			return err;
 	}
 	return snd_pcm_hw_param_value(params, var, dir);
@@ -1711,7 +1711,7 @@ int snd_pcm_hw_param_last(struct snd_pcm
 		return changed;
 	if (params->rmask) {
 		int err = snd_pcm_hw_refine(pcm, params);
-		if (snd_BUG_ON(err < 0))
+		if (err < 0)
 			return err;
 	}
 	return snd_pcm_hw_param_value(params, var, dir);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 6708913750344a900f2e73bfe4a4d6dbbce4fe8d upstream.
In the OSS emulation plugin builder, where the frame size is parsed through the plugin chain, some places miss the possible errors returned from the plugins' src_ or dst_frames callbacks.
This patch papers over such places.
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 sound/core/oss/pcm_plugin.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

--- a/sound/core/oss/pcm_plugin.c
+++ b/sound/core/oss/pcm_plugin.c
@@ -591,18 +591,26 @@ snd_pcm_sframes_t snd_pcm_plug_write_tra
 	snd_pcm_sframes_t frames = size;
 
 	plugin = snd_pcm_plug_first(plug);
-	while (plugin && frames > 0) {
+	while (plugin) {
+		if (frames <= 0)
+			return frames;
 		if ((next = plugin->next) != NULL) {
 			snd_pcm_sframes_t frames1 = frames;
-			if (plugin->dst_frames)
+			if (plugin->dst_frames) {
 				frames1 = plugin->dst_frames(plugin, frames);
+				if (frames1 <= 0)
+					return frames1;
+			}
 			if ((err = next->client_channels(next, frames1, &dst_channels)) < 0) {
 				return err;
 			}
 			if (err != frames1) {
 				frames = err;
-				if (plugin->src_frames)
+				if (plugin->src_frames) {
 					frames = plugin->src_frames(plugin, frames1);
+					if (frames <= 0)
+						return frames;
+				}
 			}
 		} else
 			dst_channels = NULL;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 29159a4ed7044c52e3e2cf1a9fb55cec4745c60b upstream.
The loops for read and write in PCM OSS emulation have no proper check for pending signals, and they keep processing even after the user tries to break out. This results in a very long delay, often seen as an RCU stall when a huge amount of unprocessed bytes remains queued. The bug could be easily triggered by syzkaller.
As a simple workaround, this patch adds the proper check of pending signals and aborts the loop appropriately.
Reported-by: syzbot+993cb4cfcbbff3947c21@syzkaller.appspotmail.com
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 sound/core/oss/pcm_oss.c | 8 ++++++++
 1 file changed, 8 insertions(+)

--- a/sound/core/oss/pcm_oss.c
+++ b/sound/core/oss/pcm_oss.c
@@ -1416,6 +1416,10 @@ static ssize_t snd_pcm_oss_write1(struct
 			    tmp != runtime->oss.period_bytes)
 				break;
 		}
+		if (signal_pending(current)) {
+			tmp = -ERESTARTSYS;
+			goto err;
+		}
 	}
 	mutex_unlock(&runtime->oss.params_lock);
 	return xfer;
@@ -1501,6 +1505,10 @@ static ssize_t snd_pcm_oss_read1(struct
 			bytes -= tmp;
 			xfer += tmp;
 		}
+		if (signal_pending(current)) {
+			tmp = -ERESTARTSYS;
+			goto err;
+		}
 	}
 	mutex_unlock(&runtime->oss.params_lock);
 	return xfer;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 900498a34a3ac9c611e9b425094c8106bdd7dc1c upstream.
PCM OSS read/write loops keep holding the mutex lock for the whole read/write, and this might take very long when an exceptionally high amount of data is given. Also, since the loop is entered with mutex_lock(), a concurrent read/write becomes unbreakable.
This patch tries to address these issues by replacing mutex_lock() with mutex_lock_interruptible(), and also splits / re-takes the lock at each read/write period chunk, so that it can switch the context more finely if requested.
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 sound/core/oss/pcm_oss.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

--- a/sound/core/oss/pcm_oss.c
+++ b/sound/core/oss/pcm_oss.c
@@ -1369,8 +1369,11 @@ static ssize_t snd_pcm_oss_write1(struct
 
 	if ((tmp = snd_pcm_oss_make_ready(substream)) < 0)
 		return tmp;
-	mutex_lock(&runtime->oss.params_lock);
 	while (bytes > 0) {
+		if (mutex_lock_interruptible(&runtime->oss.params_lock)) {
+			tmp = -ERESTARTSYS;
+			break;
+		}
 		if (bytes < runtime->oss.period_bytes || runtime->oss.buffer_used > 0) {
 			tmp = bytes;
 			if (tmp + runtime->oss.buffer_used > runtime->oss.period_bytes)
@@ -1414,18 +1417,18 @@ static ssize_t snd_pcm_oss_write1(struct
 			xfer += tmp;
 			if ((substream->f_flags & O_NONBLOCK) != 0 &&
 			    tmp != runtime->oss.period_bytes)
-				break;
+				tmp = -EAGAIN;
 		}
+ err:
+		mutex_unlock(&runtime->oss.params_lock);
+		if (tmp < 0)
+			break;
 		if (signal_pending(current)) {
 			tmp = -ERESTARTSYS;
-			goto err;
+			break;
 		}
+		tmp = 0;
 	}
-	mutex_unlock(&runtime->oss.params_lock);
-	return xfer;
-
- err:
-	mutex_unlock(&runtime->oss.params_lock);
 	return xfer > 0 ? (snd_pcm_sframes_t)xfer : tmp;
 }
 
@@ -1473,8 +1476,11 @@ static ssize_t snd_pcm_oss_read1(struct
 
 	if ((tmp = snd_pcm_oss_make_ready(substream)) < 0)
 		return tmp;
-	mutex_lock(&runtime->oss.params_lock);
 	while (bytes > 0) {
+		if (mutex_lock_interruptible(&runtime->oss.params_lock)) {
+			tmp = -ERESTARTSYS;
+			break;
+		}
 		if (bytes < runtime->oss.period_bytes || runtime->oss.buffer_used > 0) {
 			if (runtime->oss.buffer_used == 0) {
 				tmp = snd_pcm_oss_read2(substream, runtime->oss.buffer,
 							runtime->oss.period_bytes, 1);
@@ -1505,16 +1511,16 @@ static ssize_t snd_pcm_oss_read1(struct
 			bytes -= tmp;
 			xfer += tmp;
 		}
+ err:
+		mutex_unlock(&runtime->oss.params_lock);
+		if (tmp < 0)
+			break;
 		if (signal_pending(current)) {
 			tmp = -ERESTARTSYS;
-			goto err;
+			break;
 		}
+		tmp = 0;
 	}
-	mutex_unlock(&runtime->oss.params_lock);
-	return xfer;
-
- err:
-	mutex_unlock(&runtime->oss.params_lock);
 	return xfer > 0 ? (snd_pcm_sframes_t)xfer : tmp;
 }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 9685347aa0a5c2869058ca6ab79fd8e93084a67f upstream.
The aloop runtime object and its assignment in the cable are left behind even when opening a substream fails. This doesn't mean a memory leak, but it still keeps an invalid pointer that may be spontaneously referenced by the other side of the cable, which is a potential cause of an Oops.
Clean up the cable assignment and the empty cable upon the error path properly.
Fixes: 597603d615d2 ("ALSA: introduce the snd-aloop module for the PCM loopback")
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 sound/drivers/aloop.c | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

--- a/sound/drivers/aloop.c
+++ b/sound/drivers/aloop.c
@@ -658,12 +658,31 @@ static int rule_channels(struct snd_pcm_
 	return snd_interval_refine(hw_param_interval(params, rule->var), &t);
 }
 
+static void free_cable(struct snd_pcm_substream *substream)
+{
+	struct loopback *loopback = substream->private_data;
+	int dev = get_cable_index(substream);
+	struct loopback_cable *cable;
+
+	cable = loopback->cables[substream->number][dev];
+	if (!cable)
+		return;
+	if (cable->streams[!substream->stream]) {
+		/* other stream is still alive */
+		cable->streams[substream->stream] = NULL;
+	} else {
+		/* free the cable */
+		loopback->cables[substream->number][dev] = NULL;
+		kfree(cable);
+	}
+}
+
 static int loopback_open(struct snd_pcm_substream *substream)
 {
 	struct snd_pcm_runtime *runtime = substream->runtime;
 	struct loopback *loopback = substream->private_data;
 	struct loopback_pcm *dpcm;
-	struct loopback_cable *cable;
+	struct loopback_cable *cable = NULL;
 	int err = 0;
 	int dev = get_cable_index(substream);
 
@@ -682,7 +701,6 @@ static int loopback_open(struct snd_pcm_
 	if (!cable) {
 		cable = kzalloc(sizeof(*cable), GFP_KERNEL);
 		if (!cable) {
-			kfree(dpcm);
 			err = -ENOMEM;
 			goto unlock;
 		}
@@ -724,6 +742,10 @@ static int loopback_open(struct snd_pcm_
 	else
 		runtime->hw = cable->hw;
 unlock:
+	if (err < 0) {
+		free_cable(substream);
+		kfree(dpcm);
+	}
 	mutex_unlock(&loopback->cable_lock);
 	return err;
 }
@@ -732,20 +754,10 @@ static int loopback_close(struct snd_pcm
 {
 	struct loopback *loopback = substream->private_data;
 	struct loopback_pcm *dpcm = substream->runtime->private_data;
-	struct loopback_cable *cable;
-	int dev = get_cable_index(substream);
 
 	loopback_timer_stop(dpcm);
 	mutex_lock(&loopback->cable_lock);
-	cable = loopback->cables[substream->number][dev];
-	if (cable->streams[!substream->stream]) {
-		/* other stream is still alive */
-		cable->streams[substream->stream] = NULL;
-	} else {
-		/* free the cable */
-		loopback->cables[substream->number][dev] = NULL;
-		kfree(cable);
-	}
+	free_cable(substream);
 	mutex_unlock(&loopback->cable_lock);
 	return 0;
 }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit b088b53e20c7d09b5ab84c5688e609f478e5c417 upstream.
The extra hw constraint rule for formats that the aloop driver introduced has a slight flaw: it doesn't return a positive value when the mask got changed. This came from the fact that it's basically a copy&paste of snd_hw_constraint_mask64(). The original code is supposed to be a single-shot and modifies the mask bits only once and never after, while what we need for aloop is a dynamic hw rule that limits the mask bits.
This difference results in an inconsistent state, as hw_refine doesn't apply the dependencies fully. The worse and more surprising result is that it causes a crash in OSS emulation when multiple full-duplex reads/writes are performed concurrently (I leave the question of why it triggers an Oops to readers as homework).
To fix this, replace a few open-coded pieces with the standard snd_mask_*() macros.
Reported-by: syzbot+3902b5220e8ca27889ca@syzkaller.appspotmail.com
Fixes: b1c73fc8e697 ("ALSA: snd-aloop: Fix hw_params restrictions and checking")
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 sound/drivers/aloop.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

--- a/sound/drivers/aloop.c
+++ b/sound/drivers/aloop.c
@@ -39,6 +39,7 @@
 #include <sound/core.h>
 #include <sound/control.h>
 #include <sound/pcm.h>
+#include <sound/pcm_params.h>
 #include <sound/info.h>
 #include <sound/initval.h>
 
@@ -622,14 +623,12 @@ static int rule_format(struct snd_pcm_hw
 {
 
 	struct snd_pcm_hardware *hw = rule->private;
-	struct snd_mask *maskp = hw_param_mask(params, rule->var);
+	struct snd_mask m;
 
-	maskp->bits[0] &= (u_int32_t)hw->formats;
-	maskp->bits[1] &= (u_int32_t)(hw->formats >> 32);
-	memset(maskp->bits + 2, 0, (SNDRV_MASK_MAX-64) / 8); /* clear rest */
-	if (! maskp->bits[0] && ! maskp->bits[1])
-		return -EINVAL;
-	return 0;
+	snd_mask_none(&m);
+	m.bits[0] = (u_int32_t)hw->formats;
+	m.bits[1] = (u_int32_t)(hw->formats >> 32);
+	return snd_mask_refine(hw_param_mask(params, rule->var), &m);
 }
 
 static int rule_rate(struct snd_pcm_hw_params *params,
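The crux here is the return-value contract of hw-constraint rules, which snd_mask_refine() follows and the removed open-coded version did not. Roughly, as suggested by the description above:

    /*
     * A constraint rule reports whether it narrowed the parameter, so
     * the refinement loop knows whether to iterate again:
     *   > 0  - the mask was changed by this rule
     *     0  - the mask already satisfied the rule
     *   < 0  - error, e.g. the mask became empty (-EINVAL)
     * snd_mask_refine() returns exactly this; the removed code
     * returned 0 even when it had changed the mask.
     */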
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 898dfe4687f460ba337a01c11549f87269a13fa2 upstream.
The aloop driver tries to update the hw constraints of the connected target on the cable of the opened PCM substream. This is done by adding extra hw constraint rules referring to the substream's runtime->hw fields, while the other substream may update the runtime hw of the other side on the fly.
This is, however, racy and may result in inconsistent values when both PCM streams perform the prepare concurrently. One of the reasons is that it overwrites the other's runtime->hw field; this is not only racy but also broken when it's called before the open of the other side finishes. And, since the reference to runtime->hw isn't protected, a concurrent write may produce a partial value update and become inconsistent.
This patch is an attempt to fix and clean up:
- The prepare doesn't change the runtime->hw of the other side any longer, but only updates the cable->hw that is commonly referred to.
- The extra rules refer to the loopback_pcm object instead of the runtime->hw. The actual hw is deduced from cable->hw.
- The extra rules take the cable_lock to protect against the race.
Fixes: b1c73fc8e697 ("ALSA: snd-aloop: Fix hw_params restrictions and checking")
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 sound/drivers/aloop.c | 51 ++++++++++++++++++++------------------------------
 1 file changed, 21 insertions(+), 30 deletions(-)

--- a/sound/drivers/aloop.c
+++ b/sound/drivers/aloop.c
@@ -306,19 +306,6 @@ static int loopback_trigger(struct snd_p
 	return 0;
 }
 
-static void params_change_substream(struct loopback_pcm *dpcm,
-				    struct snd_pcm_runtime *runtime)
-{
-	struct snd_pcm_runtime *dst_runtime;
-
-	if (dpcm == NULL || dpcm->substream == NULL)
-		return;
-	dst_runtime = dpcm->substream->runtime;
-	if (dst_runtime == NULL)
-		return;
-	dst_runtime->hw = dpcm->cable->hw;
-}
-
 static void params_change(struct snd_pcm_substream *substream)
 {
 	struct snd_pcm_runtime *runtime = substream->runtime;
@@ -330,10 +317,6 @@ static void params_change(struct snd_pcm
 	cable->hw.rate_max = runtime->rate;
 	cable->hw.channels_min = runtime->channels;
 	cable->hw.channels_max = runtime->channels;
-	params_change_substream(cable->streams[SNDRV_PCM_STREAM_PLAYBACK],
-				runtime);
-	params_change_substream(cable->streams[SNDRV_PCM_STREAM_CAPTURE],
-				runtime);
 }
 
 static int loopback_prepare(struct snd_pcm_substream *substream)
@@ -621,24 +604,29 @@ static unsigned int get_cable_index(stru
 static int rule_format(struct snd_pcm_hw_params *params,
 		       struct snd_pcm_hw_rule *rule)
 {
-
-	struct snd_pcm_hardware *hw = rule->private;
+	struct loopback_pcm *dpcm = rule->private;
+	struct loopback_cable *cable = dpcm->cable;
 	struct snd_mask m;
 
 	snd_mask_none(&m);
-	m.bits[0] = (u_int32_t)hw->formats;
-	m.bits[1] = (u_int32_t)(hw->formats >> 32);
+	mutex_lock(&dpcm->loopback->cable_lock);
+	m.bits[0] = (u_int32_t)cable->hw.formats;
+	m.bits[1] = (u_int32_t)(cable->hw.formats >> 32);
+	mutex_unlock(&dpcm->loopback->cable_lock);
 	return snd_mask_refine(hw_param_mask(params, rule->var), &m);
 }
 
 static int rule_rate(struct snd_pcm_hw_params *params,
 		     struct snd_pcm_hw_rule *rule)
 {
-	struct snd_pcm_hardware *hw = rule->private;
+	struct loopback_pcm *dpcm = rule->private;
+	struct loopback_cable *cable = dpcm->cable;
 	struct snd_interval t;
 
-	t.min = hw->rate_min;
-	t.max = hw->rate_max;
+	mutex_lock(&dpcm->loopback->cable_lock);
+	t.min = cable->hw.rate_min;
+	t.max = cable->hw.rate_max;
+	mutex_unlock(&dpcm->loopback->cable_lock);
 	t.openmin = t.openmax = 0;
 	t.integer = 0;
 	return snd_interval_refine(hw_param_interval(params, rule->var), &t);
@@ -647,11 +635,14 @@ static int rule_rate(struct snd_pcm_hw_p
 static int rule_channels(struct snd_pcm_hw_params *params,
 			 struct snd_pcm_hw_rule *rule)
 {
-	struct snd_pcm_hardware *hw = rule->private;
+	struct loopback_pcm *dpcm = rule->private;
+	struct loopback_cable *cable = dpcm->cable;
 	struct snd_interval t;
 
-	t.min = hw->channels_min;
-	t.max = hw->channels_max;
+	mutex_lock(&dpcm->loopback->cable_lock);
+	t.min = cable->hw.channels_min;
+	t.max = cable->hw.channels_max;
+	mutex_unlock(&dpcm->loopback->cable_lock);
 	t.openmin = t.openmax = 0;
 	t.integer = 0;
 	return snd_interval_refine(hw_param_interval(params, rule->var), &t);
@@ -717,19 +708,19 @@ static int loopback_open(struct snd_pcm_
 	/* are cached -> they do not reflect the actual state */
 	err = snd_pcm_hw_rule_add(runtime, 0,
 				  SNDRV_PCM_HW_PARAM_FORMAT,
-				  rule_format, &runtime->hw,
+				  rule_format, dpcm,
 				  SNDRV_PCM_HW_PARAM_FORMAT, -1);
 	if (err < 0)
 		goto unlock;
 	err = snd_pcm_hw_rule_add(runtime, 0,
 				  SNDRV_PCM_HW_PARAM_RATE,
-				  rule_rate, &runtime->hw,
+				  rule_rate, dpcm,
 				  SNDRV_PCM_HW_PARAM_RATE, -1);
 	if (err < 0)
 		goto unlock;
 	err = snd_pcm_hw_rule_add(runtime, 0,
 				  SNDRV_PCM_HW_PARAM_CHANNELS,
-				  rule_channels, &runtime->hw,
+				  rule_channels, dpcm,
 				  SNDRV_PCM_HW_PARAM_CHANNELS, -1);
 	if (err < 0)
 		goto unlock;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vikas C Sajjan vikas.cha.sajjan@hpe.com
commit 4ee2ec1b122599f7b10c849fa7915cebb37b7edb upstream.
The new function mp_register_ioapic_irq() is a subset of the code in mp_override_legacy_irq().
Replace the code duplication by invoking mp_register_ioapic_irq() from mp_override_legacy_irq().
Signed-off-by: Vikas C Sajjan vikas.cha.sajjan@hpe.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Reviewed-by: Thomas Gleixner tglx@linutronix.de
Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Cc: linux-pm@vger.kernel.org
Cc: kkamagui@gmail.com
Cc: linux-acpi@vger.kernel.org
Link: https://lkml.kernel.org/r/1510848825-21965-3-git-send-email-vikas.cha.sajjan...
Cc: Jean Delvare jdelvare@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/x86/kernel/acpi/boot.c | 27 +++++----------------------
 1 file changed, 5 insertions(+), 22 deletions(-)

--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -335,13 +335,12 @@ acpi_parse_lapic_nmi(struct acpi_subtabl
 #ifdef CONFIG_X86_IO_APIC
 #define MP_ISA_BUS		0
 
+static int __init mp_register_ioapic_irq(u8 bus_irq, u8 polarity,
+					 u8 trigger, u32 gsi);
+
 static void __init mp_override_legacy_irq(u8 bus_irq, u8 polarity,
 					  u8 trigger, u32 gsi)
 {
-	int ioapic;
-	int pin;
-	struct mpc_intsrc mp_irq;
-
 	/*
 	 * Check bus_irq boundary.
 	 */
@@ -351,14 +350,6 @@ static void __init mp_override_legacy_ir
 	}
 
 	/*
-	 * Convert 'gsi' to 'ioapic.pin'.
-	 */
-	ioapic = mp_find_ioapic(gsi);
-	if (ioapic < 0)
-		return;
-	pin = mp_find_ioapic_pin(ioapic, gsi);
-
-	/*
 	 * TBD: This check is for faulty timer entries, where the override
 	 * erroneously sets the trigger to level, resulting in a HUGE
 	 * increase of timer interrupts!
@@ -366,16 +357,8 @@ static void __init mp_override_legacy_ir
 	if ((bus_irq == 0) && (trigger == 3))
 		trigger = 1;
 
-	mp_irq.type = MP_INTSRC;
-	mp_irq.irqtype = mp_INT;
-	mp_irq.irqflag = (trigger << 2) | polarity;
-	mp_irq.srcbus = MP_ISA_BUS;
-	mp_irq.srcbusirq = bus_irq;	/* IRQ */
-	mp_irq.dstapic = mpc_ioapic_id(ioapic);	/* APIC ID */
-	mp_irq.dstirq = pin;	/* INTIN# */
-
-	mp_save_irq(&mp_irq);
-
+	if (mp_register_ioapic_irq(bus_irq, polarity, trigger, gsi) < 0)
+		return;
 	/*
 	 * Reset default identity mapping if gsi is also an legacy IRQ,
 	 * otherwise there will be more than one entry with the same GSI
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Streetman ddstreet@ieee.org
commit fd5bb66cd934987e49557455b6497fc006521940 upstream.
Change the zpool/compressor param callback function to release the zswap_pools_lock spinlock before calling param_set_charp, since that function may sleep when it calls kmalloc with GFP_KERNEL.
While this problem has existed for a while, I wasn't able to trigger it using a tight loop changing either/both the zpool and compressor params; I think it's very unlikely to be an issue on the stable kernels, especially since most zswap users will change the compressor and/or zpool from sysfs only one time each boot - or zero times, if they add the params to the kernel boot.
Fixes: c99b42c3529e ("zswap: use charp for zswap param strings")
Link: http://lkml.kernel.org/r/20170126155821.4545-1-ddstreet@ieee.org
Signed-off-by: Dan Streetman dan.streetman@canonical.com
Reported-by: Sergey Senozhatsky sergey.senozhatsky.work@gmail.com
Cc: Michal Hocko mhocko@kernel.org
Cc: Minchan Kim minchan@kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Vlastimil Babka vbabka@suse.cz
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 mm/zswap.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -752,18 +752,22 @@ static int __zswap_param_set(const char
 	pool = zswap_pool_find_get(type, compressor);
 	if (pool) {
 		zswap_pool_debug("using existing", pool);
+		WARN_ON(pool == zswap_pool_current());
 		list_del_rcu(&pool->list);
-	} else {
-		spin_unlock(&zswap_pools_lock);
-		pool = zswap_pool_create(type, compressor);
-		spin_lock(&zswap_pools_lock);
 	}
 
+	spin_unlock(&zswap_pools_lock);
+
+	if (!pool)
+		pool = zswap_pool_create(type, compressor);
+
 	if (pool)
 		ret = param_set_charp(s, kp);
 	else
 		ret = -EINVAL;
 
+	spin_lock(&zswap_pools_lock);
+
 	if (!ret) {
 		put_pool = zswap_pool_current();
 		list_add_rcu(&pool->list, &zswap_pools);
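For context, the hazard the unlock/relock dance avoids is the classic sleep-under-spinlock bug; a minimal sketch (illustrative, not from the patch):

    spin_lock(&zswap_pools_lock);
    p = kmalloc(len, GFP_KERNEL);   /* may sleep for reclaim: a
                                     * "scheduling while atomic" bug
                                     * under a spinlock */
    spin_unlock(&zswap_pools_lock);

The alternatives are to drop the lock around the allocation (as done here) or to use GFP_ATOMIC, which never sleeps but may fail under memory pressure.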
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit d4ca73591916b760478d2b04334d5dcadc028e9c upstream.
We need to ensure there is enough headroom to push the extra header, but we also need to check whether we are allowed to change the headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Eric Dumazet edumazet@google.com
Cc: James Hughes james.hughes@raspberrypi.org
Cc: Woojung Huh woojung.huh@microchip.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Oliver Neukum oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/net/usb/lan78xx.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2419,14 +2419,9 @@ static struct sk_buff *lan78xx_tx_prep(s
 {
 	u32 tx_cmd_a, tx_cmd_b;
 
-	if (skb_headroom(skb) < TX_OVERHEAD) {
-		struct sk_buff *skb2;
-
-		skb2 = skb_copy_expand(skb, TX_OVERHEAD, 0, flags);
+	if (skb_cow_head(skb, TX_OVERHEAD)) {
 		dev_kfree_skb_any(skb);
-		skb = skb2;
-		if (!skb)
-			return NULL;
+		return NULL;
 	}
 
 	if (lan78xx_linearize(skb) < 0)
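Since the same conversion repeats in the next three patches, a short sketch of the helper's contract (skb_cow_head() is the real kernel API; the rest is illustrative):

    /*
     * skb_cow_head(skb, headroom) guarantees that at least 'headroom'
     * bytes are free in front of skb->data and that the header is
     * writable (not shared with a clone), reallocating if necessary.
     * It returns 0 on success, a negative errno on failure.
     */
    if (skb_cow_head(skb, TX_OVERHEAD)) {
    	dev_kfree_skb_any(skb);   /* could not get a private header */
    	return NULL;
    }
    __skb_push(skb, TX_OVERHEAD); /* now safe to prepend the header */

The old code only checked skb_headroom(), so a cloned skb with sufficient headroom could have its shared header rewritten in place.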
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit d532c1082f68176363ed766d09bf187616e282fe upstream.
We need to ensure there is enough headroom to push the extra header, but we also need to check whether we are allowed to change the headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: c9b37458e956 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support")
Signed-off-by: Eric Dumazet edumazet@google.com
Cc: James Hughes james.hughes@raspberrypi.org
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Oliver Neukum oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/net/usb/sr9700.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

--- a/drivers/net/usb/sr9700.c
+++ b/drivers/net/usb/sr9700.c
@@ -456,14 +456,9 @@ static struct sk_buff *sr9700_tx_fixup(s
 
 	len = skb->len;
 
-	if (skb_headroom(skb) < SR_TX_OVERHEAD) {
-		struct sk_buff *skb2;
-
-		skb2 = skb_copy_expand(skb, SR_TX_OVERHEAD, 0, flags);
+	if (skb_cow_head(skb, SR_TX_OVERHEAD)) {
 		dev_kfree_skb_any(skb);
-		skb = skb2;
-		if (!skb)
-			return NULL;
+		return NULL;
 	}
 
 	__skb_push(skb, SR_TX_OVERHEAD);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit b7c6d2675899cfff0180412c63fc9cbd5bacdb4d upstream.
We need to ensure there is enough headroom to push the extra header, but we also need to check whether we are allowed to change the headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: d0cad871703b ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver")
Signed-off-by: Eric Dumazet edumazet@google.com
Cc: James Hughes james.hughes@raspberrypi.org
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Oliver Neukum oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/net/usb/smsc75xx.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

--- a/drivers/net/usb/smsc75xx.c
+++ b/drivers/net/usb/smsc75xx.c
@@ -2205,13 +2205,9 @@ static struct sk_buff *smsc75xx_tx_fixup
 {
 	u32 tx_cmd_a, tx_cmd_b;
 
-	if (skb_headroom(skb) < SMSC75XX_TX_OVERHEAD) {
-		struct sk_buff *skb2 =
-			skb_copy_expand(skb, SMSC75XX_TX_OVERHEAD, 0, flags);
+	if (skb_cow_head(skb, SMSC75XX_TX_OVERHEAD)) {
 		dev_kfree_skb_any(skb);
-		skb = skb2;
-		if (!skb)
-			return NULL;
+		return NULL;
 	}
 
 	tx_cmd_a = (u32)(skb->len & TX_CMD_A_LEN) | TX_CMD_A_FCS;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit a9e840a2081ed28c2b7caa6a9a0041c950b3c37d upstream.
We need to ensure there is enough headroom to push the extra header, but we also need to check whether we are allowed to change the headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: cc28a20e77b2 ("introduce cx82310_eth: Conexant CX82310-based ADSL router USB ethernet driver")
Signed-off-by: Eric Dumazet edumazet@google.com
Cc: James Hughes james.hughes@raspberrypi.org
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Oliver Neukum oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/net/usb/cx82310_eth.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/drivers/net/usb/cx82310_eth.c
+++ b/drivers/net/usb/cx82310_eth.c
@@ -293,12 +293,9 @@ static struct sk_buff *cx82310_tx_fixup(
 {
 	int len = skb->len;
 
-	if (skb_headroom(skb) < 2) {
-		struct sk_buff *skb2 = skb_copy_expand(skb, 2, 0, flags);
+	if (skb_cow_head(skb, 2)) {
 		dev_kfree_skb_any(skb);
-		skb = skb2;
-		if (!skb)
-			return NULL;
+		return NULL;
 	}
 	skb_push(skb, 2);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Hutchings ben.hutchings@codethink.co.uk
This is a stable-only fix for the backport of commit 5d9b70f7d52e ("xhci: Don't add a virt_dev to the devs array before it's fully allocated").
In branches that predate commit c5628a2af83a ("xhci: remove endpoint ring cache") there is an additional failure path in xhci_alloc_virt_device() where ring cache allocation fails, in which case we need to free the ring allocated for endpoint 0.
Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Cc: Mathias Nyman mathias.nyman@intel.com
---
This is build-tested only.
Ben.
 drivers/usb/host/xhci-mem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1086,7 +1086,8 @@ int xhci_alloc_virt_device(struct xhci_h
 
 	return 1;
 fail:
-
+	if (dev->eps[0].ring)
+		xhci_ring_free(xhci, dev->eps[0].ring);
 	if (dev->in_ctx)
 		xhci_free_container_ctx(xhci, dev->in_ctx);
 	if (dev->out_ctx)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit 78bbb15f2239bc8e663aa20bbe1987c91a0b75f6 ]
A vlan device with vid 0 is allowed to be created, but is not able to be fully cleaned up by unregister_vlan_dev(), which checks for vlan_id != 0.
Also, VLAN 0 is probably not a valid number and it is kinda "reserved" for HW accelerating devices, but it is probably too late to reject it at creation even if that makes sense. Instead, just remove the check in unregister_vlan_dev().
Reported-by: Dmitry Vyukov dvyukov@google.com
Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Cc: Vlad Yasevich vyasevich@gmail.com
Cc: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Cong Wang xiyou.wangcong@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 net/8021q/vlan.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -111,12 +111,7 @@ void unregister_vlan_dev(struct net_devi
 		vlan_gvrp_uninit_applicant(real_dev);
 	}
 
-	/* Take it out of our own structures, but be sure to interlock with
-	 * HW accelerating devices or SW vlan input packet processing if
-	 * VLAN is not 0 (leave it there for 802.1p).
-	 */
-	if (vlan_id)
-		vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
+	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
 
 	/* Get rid of the vlan's reference to real_dev */
 	dev_put(real_dev);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eli Cooper elicooper@gmx.com
[ Upstream commit 23263ec86a5f44312d2899323872468752324107 ]
When an ip6_tunnel is in mode 'any', where the transport layer protocol can be either 4 or 41, dst_cache must be disabled.
This is because xfrm policies might apply to only one of the two protocols. Caching the dst would cause xfrm policies for one protocol to be incorrectly applied to the other.
Signed-off-by: Eli Cooper elicooper@gmx.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 net/ipv6/ip6_tunnel.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1080,10 +1080,11 @@ int ip6_tnl_xmit(struct sk_buff *skb, st
 			memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
 			neigh_release(neigh);
 		}
-	} else if (!(t->parms.flags &
-		     (IP6_TNL_F_USE_ORIG_TCLASS | IP6_TNL_F_USE_ORIG_FWMARK))) {
-		/* enable the cache only only if the routing decision does
-		 * not depend on the current inner header value
+	} else if (t->parms.proto != 0 && !(t->parms.flags &
+		   (IP6_TNL_F_USE_ORIG_TCLASS |
+		    IP6_TNL_F_USE_ORIG_FWMARK))) {
+		/* enable the cache only if neither the outer protocol nor the
+		 * routing decision depends on the current inner header value
 		 */
 		use_cache = true;
 	}
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrii Vladyka tulup@mail.ru
[ Upstream commit b8fd0823e0770c2d5fdbd865bccf0d5e058e5287 ]
Use AF_INET6 instead of AF_INET in IPv6-related code path
Signed-off-by: Andrii Vladyka tulup@mail.ru
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 net/core/sock_diag.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -295,7 +295,7 @@ static int sock_diag_bind(struct net *ne
 	case SKNLGRP_INET6_UDP_DESTROY:
 		if (!sock_diag_handlers[AF_INET6])
 			request_module("net-pf-%d-proto-%d-type-%d", PF_NETLINK,
-				       NETLINK_SOCK_DIAG, AF_INET);
+				       NETLINK_SOCK_DIAG, AF_INET6);
 		break;
 	}
 	return 0;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mohamed Ghannam simo.ghannam@gmail.com
[ Upstream commit c095508770aebf1b9218e77026e48345d719b17c ]
When args->nr_local is 0, nr_pages also ends up 0 due to the size calculation via rds_rm_size(), and it is later used to allocate pages for DMA; this bug produces a heap out-of-bounds write to a specific memory region.
Signed-off-by: Mohamed Ghannam simo.ghannam@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 net/rds/rdma.c | 3 +++
 1 file changed, 3 insertions(+)

--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -524,6 +524,9 @@ int rds_rdma_extra_size(struct rds_rdma_
 
 	local_vec = (struct rds_iovec __user *)(unsigned long) args->local_vec_addr;
 
+	if (args->nr_local == 0)
+		return -EINVAL;
+
 	/* figure out the number of pages in the vector */
 	for (i = 0; i < args->nr_local; i++) {
 		if (copy_from_user(&vec, &local_vec[i],
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mohamed Ghannam simo.ghannam@gmail.com
[ Upstream commit 7d11f77f84b27cef452cee332f4e469503084737 ]
Set rm->atomic.op_active to 0 when rds_pin_pages() fails or the user-supplied address is invalid; this prevents a NULL pointer from being used in rds_atomic_free_op().
Signed-off-by: Mohamed Ghannam simo.ghannam@gmail.com
Acked-by: Santosh Shilimkar santosh.shilimkar@oracle.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 net/rds/rdma.c | 1 +
 1 file changed, 1 insertion(+)

--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -876,6 +876,7 @@ int rds_cmsg_atomic(struct rds_sock *rs,
 err:
 	if (page)
 		put_page(page);
+	rm->atomic.op_active = 0;
 	kfree(rm->atomic.op_notifier);
 
 	return ret;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sergei Shtylyov sergei.shtylyov@cogentembedded.com
[ Upstream commit dfe8266b8dd10e12a731c985b725fcf7f0e537f0 ]
When switching the driver to the managed device API, I managed to break the case of dual Ether devices sharing a single TSU: the 2nd Ether port wouldn't probe. Iwamatsu-san tried to fix this, but his patch was buggy, and he then dropped the ball...
The solution is to limit calling devm_request_mem_region() to the first of the two ports sharing the same TSU, so devm_ioremap_resource() can't be used anymore for the TSU resource...
Fixes: d5e07e69218f ("sh_eth: use managed device API")
Reported-by: Nobuhiro Iwamatsu nobuhiro.iwamatsu.yj@renesas.com
Signed-off-by: Sergei Shtylyov sergei.shtylyov@cogentembedded.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/net/ethernet/renesas/sh_eth.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -3087,10 +3087,29 @@ static int sh_eth_drv_probe(struct platf
 	/* ioremap the TSU registers */
 	if (mdp->cd->tsu) {
 		struct resource *rtsu;
+
 		rtsu = platform_get_resource(pdev, IORESOURCE_MEM, 1);
-		mdp->tsu_addr = devm_ioremap_resource(&pdev->dev, rtsu);
-		if (IS_ERR(mdp->tsu_addr)) {
-			ret = PTR_ERR(mdp->tsu_addr);
+		if (!rtsu) {
+			dev_err(&pdev->dev, "no TSU resource\n");
+			ret = -ENODEV;
+			goto out_release;
+		}
+		/* We can only request the TSU region for the first port
+		 * of the two sharing this TSU for the probe to succeed...
+		 */
+		if (devno % 2 == 0 &&
+		    !devm_request_mem_region(&pdev->dev, rtsu->start,
+					     resource_size(rtsu),
+					     dev_name(&pdev->dev))) {
+			dev_err(&pdev->dev, "can't request TSU resource.\n");
+			ret = -EBUSY;
+			goto out_release;
+		}
+		mdp->tsu_addr = devm_ioremap(&pdev->dev, rtsu->start,
+					     resource_size(rtsu));
+		if (!mdp->tsu_addr) {
+			dev_err(&pdev->dev, "TSU region ioremap() failed.\n");
+			ret = -ENOMEM;
 			goto out_release;
 		}
 		mdp->port = devno % 2;
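The reason devm_ioremap_resource() had to be split is worth spelling out; roughly (illustrative, not from the patch):

    /*
     * devm_ioremap_resource() is approximately:
     *     devm_request_mem_region(dev, res->start, resource_size(res), name);
     *     devm_ioremap(dev, res->start, resource_size(res));
     * request_mem_region() fails with -EBUSY for a region someone has
     * already claimed, so the second port of a TSU-sharing pair could
     * never map the registers.  Requesting only on the even-numbered
     * port while letting both ports ioremap keeps exclusive ownership
     * with shared access.
     */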
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sergei Shtylyov sergei.shtylyov@cogentembedded.com
[ Upstream commit 5133550296d43236439494aa955bfb765a89f615 ]
Renesas SH7757 has 2 Fast and 2 Gigabit Ether controllers, while the 'sh_eth' driver can only reset and initialize the TSU of the first controller pair. Shimoda-san tried to solve that by adding the 'needs_init' member to 'struct sh_eth_plat_data'; however, the platform code never sets this flag. I think we can infer this information from the 'devno' variable (set to 'platform_device::id') and reset/init the Ether controller pair only for an even 'devno'; therefore 'sh_eth_plat_data::needs_init' can be removed...
Fixes: 150647fb2c31 ("net: sh_eth: change the condition of initialization")
Signed-off-by: Sergei Shtylyov sergei.shtylyov@cogentembedded.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/net/ethernet/renesas/sh_eth.c | 4 ++--
 include/linux/sh_eth.h                | 1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -3116,8 +3116,8 @@ static int sh_eth_drv_probe(struct platf
 		ndev->features = NETIF_F_HW_VLAN_CTAG_FILTER;
 	}
 
-	/* initialize first or needed device */
-	if (!devno || pd->needs_init) {
+	/* Need to init only the first port of the two sharing a TSU */
+	if (devno % 2 == 0) {
 		if (mdp->cd->chip_reset)
 			mdp->cd->chip_reset(ndev);
 
--- a/include/linux/sh_eth.h
+++ b/include/linux/sh_eth.h
@@ -16,7 +16,6 @@ struct sh_eth_plat_data {
 	unsigned char mac_addr[ETH_ALEN];
 	unsigned no_ether_link:1;
 	unsigned ether_link_active_low:1;
-	unsigned needs_init:1;
 };
 
 #endif
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jerome Brunet jbrunet@baylibre.com
[ Upstream commit 879626e3a52630316d817cbda7cec9a5446d1d82 ]
Note in the databook - Section 4.4 - EEE : " The EEE feature is not supported when the MAC is configured to use the TBI, RTBI, SMII, RMII or SGMII single PHY interface. Even if the MAC supports multiple PHY interfaces, you should activate the EEE mode only when the MAC is operating with GMII, MII, or RGMII interface."
Applying this restriction solves a stability issue observed on Amlogic gxl platforms operating with RMII interface and the internal PHY.
Fixes: 83bf79b6bb64 ("stmmac: disable at run-time the EEE if not supported")
Signed-off-by: Jerome Brunet jbrunet@baylibre.com
Tested-by: Arnaud Patard arnaud.patard@rtp-net.org
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |  6 ++++++
 include/linux/phy.h                               | 11 +++++++++++
 2 files changed, 17 insertions(+)

--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -280,8 +280,14 @@ static void stmmac_eee_ctrl_timer(unsign
 bool stmmac_eee_init(struct stmmac_priv *priv)
 {
 	unsigned long flags;
+	int interface = priv->plat->interface;
 	bool ret = false;
 
+	if ((interface != PHY_INTERFACE_MODE_MII) &&
+	    (interface != PHY_INTERFACE_MODE_GMII) &&
+	    !phy_interface_mode_is_rgmii(interface))
+		goto out;
+
 	/* Using PCS we cannot dial with the phy registers at this stage
 	 * so we do not support extra feature like EEE.
 	 */
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -684,6 +684,17 @@ static inline bool phy_is_internal(struc
 }
 
 /**
+ * phy_interface_mode_is_rgmii - Convenience function for testing if a
+ * PHY interface mode is RGMII (all variants)
+ * @mode: the phy_interface_t enum
+ */
+static inline bool phy_interface_mode_is_rgmii(phy_interface_t mode)
+{
+	return mode >= PHY_INTERFACE_MODE_RGMII &&
+	       mode <= PHY_INTERFACE_MODE_RGMII_TXID;
+};
+
+/**
 * phy_interface_is_rgmii - Convenience function for testing if a PHY interface
 * is RGMII (all variants)
 * @phydev: the phy_device struct
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 862c03ee1deb7e19e0f9931682e0294ecd1fcaf9 ]
ip6_setup_cork() might return an error, while memory allocations have been done and must be rolled back.
Fixes: 6422398c2ab0 ("ipv6: introduce ipv6_make_skb")
Signed-off-by: Eric Dumazet edumazet@google.com
Cc: Vlad Yasevich vyasevich@gmail.com
Reported-by: Mike Maloney maloney@google.com
Acked-by: Mike Maloney maloney@google.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 net/ipv6/ip6_output.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1800,9 +1800,10 @@ struct sk_buff *ip6_make_skb(struct sock
 	cork.base.opt = NULL;
 	v6_cork.opt = NULL;
 	err = ip6_setup_cork(sk, &cork, &v6_cork, ipc6, rt, fl6);
-	if (err)
+	if (err) {
+		ip6_cork_release(&cork, &v6_cork);
 		return ERR_PTR(err);
-
+	}
 	if (ipc6->dontfrag < 0)
 		ipc6->dontfrag = inet6_sk(sk)->dontfrag;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephen Hemminger stephen@networkplumber.org
[ Upstream commit 71891e2dab6b55a870f8f7735e44a2963860b5c6 ]
In the kernel log this message appears on every boot: "warning: `NetworkChangeNo' uses legacy ethtool link settings API, link modes are only partially reported"
When the ethtool link settings API changed, it started complaining about usages of the old API. Ironically, the original patch was from Google, but the application using the legacy API is Chrome.
The Linux ABI is fixed as much as possible. The kernel must not break it and should not complain about applications using legacy APIs. This patch just removes the warning, since using legacy APIs in Linux is perfectly acceptable.
Fixes: 3f1ac7a700d0 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Signed-off-by: Stephen Hemminger stephen@networkplumber.org
Signed-off-by: David Decotigny decot@googlers.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 net/core/ethtool.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -742,15 +742,6 @@ static int ethtool_set_link_ksettings(st
 	return dev->ethtool_ops->set_link_ksettings(dev, &link_ksettings);
 }
 
-static void
-warn_incomplete_ethtool_legacy_settings_conversion(const char *details)
-{
-	char name[sizeof(current->comm)];
-
-	pr_info_once("warning: `%s' uses legacy ethtool link settings API, %s\n",
-		     get_task_comm(name, current), details);
-}
-
 /* Query device for its ethtool_cmd settings.
  *
  * Backward compatibility note: for compatibility with legacy ethtool,
@@ -777,10 +768,8 @@ static int ethtool_get_settings(struct n
 							   &link_ksettings);
 		if (err < 0)
 			return err;
-		if (!convert_link_ksettings_to_legacy_settings(&cmd,
-							       &link_ksettings))
-			warn_incomplete_ethtool_legacy_settings_conversion(
-				"link modes are only partially reported");
+		convert_link_ksettings_to_legacy_settings(&cmd,
+							  &link_ksettings);
 
 		/* send a sensible cmd tag back to user */
 		cmd.cmd = ETHTOOL_GSET;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@mellanox.com
[ Upstream commit 8764a8267b128405cf383157d5e9a4a3735d2409 ]
When we remove the neighbour associated with a nexthop, we should always refuse to write the nexthop to the adjacency table, regardless of whether it is already present in the table or not.
Otherwise, we risk dereferencing the NULL pointer that was set instead of the neighbour.
Fixes: a7ff87acd995 ("mlxsw: spectrum_router: Implement next-hop routing")
Signed-off-by: Ido Schimmel idosch@mellanox.com
Reported-by: Alexander Petrovskiy alexpe@mellanox.com
Signed-off-by: Jiri Pirko jiri@mellanox.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -1328,9 +1328,9 @@ set_trap:
 static void __mlxsw_sp_nexthop_neigh_update(struct mlxsw_sp_nexthop *nh,
 					    bool removing)
 {
-	if (!removing && !nh->should_offload)
+	if (!removing)
 		nh->should_offload = 1;
-	else if (removing && nh->offloaded)
+	else
 		nh->should_offload = 0;
 	nh->update = 1;
 }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Roi Dayan roid@mellanox.com
[ Upstream commit 3bb23421a504f01551b7cb9dff0e41dbf16656b0 ]
We need to update lastuse to the most recent value between what is already set and the new value. If HW matching fails, i.e. because of an issue, the stats are not updated, but it could be that software did match and updated lastuse.
Fixes: 5712bf9c5c30 ("net/sched: act_mirred: Use passed lastuse argument")
Fixes: 9fea47d93bcc ("net/sched: act_gact: Update statistics when offloaded to hardware")
Signed-off-by: Roi Dayan roid@mellanox.com
Reviewed-by: Paul Blakey paulb@mellanox.com
Acked-by: Jiri Pirko jiri@mellanox.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 net/sched/act_gact.c   | 2 +-
 net/sched/act_mirred.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -161,7 +161,7 @@ static void tcf_gact_stats_update(struct
 	if (action == TC_ACT_SHOT)
 		this_cpu_ptr(gact->common.cpu_qstats)->drops += packets;
 
-	tm->lastuse = lastuse;
+	tm->lastuse = max_t(u64, tm->lastuse, lastuse);
 }
 
 static int tcf_gact_dump(struct sk_buff *skb, struct tc_action *a,
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -211,7 +211,7 @@ static void tcf_stats_update(struct tc_a
 	struct tcf_t *tm = &m->tcf_tm;
 
 	_bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets);
-	tm->lastuse = lastuse;
+	tm->lastuse = max_t(u64, tm->lastuse, lastuse);
 }
 
 static int tcf_mirred_dump(struct sk_buff *skb, struct tc_action *a, int bind,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
commit 9a00674213a3f00394f4e3221b88f2d21fc05789 upstream.
syzkaller triggered a NULL pointer dereference in crypto_remove_spawns() via a program that repeatedly and concurrently requests AEADs "authenc(cmac(des3_ede-asm),pcbc-aes-aesni)" and hashes "cmac(des3_ede)" through AF_ALG, where the hashes are requested as "untested" (CRYPTO_ALG_TESTED is set in ->salg_mask but clear in ->salg_feat; this causes the template to be instantiated for every request).
Although AF_ALG users really shouldn't be able to request an "untested" algorithm, the NULL pointer dereference is actually caused by a longstanding race condition where crypto_remove_spawns() can encounter an instance which has had spawn(s) "grabbed" but hasn't yet been registered, resulting in ->cra_users still being NULL.
We probably should properly initialize ->cra_users earlier, but that would require updating many templates individually. For now just fix the bug in a simple way that can easily be backported: make crypto_remove_spawns() treat a NULL ->cra_users list as empty.
Reported-by: syzbot syzkaller@googlegroups.com
Signed-off-by: Eric Biggers ebiggers@google.com
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 crypto/algapi.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -167,6 +167,18 @@ void crypto_remove_spawns(struct crypto_
 
 			spawn->alg = NULL;
 			spawns = &inst->alg.cra_users;
+
+			/*
+			 * We may encounter an unregistered instance here, since
+			 * an instance's spawns are set up prior to the instance
+			 * being registered.  An unregistered instance will have
+			 * NULL ->cra_users.next, since ->cra_users isn't
+			 * properly initialized until registration.  But an
+			 * unregistered instance cannot have any users, so treat
+			 * it the same as ->cra_users being empty.
+			 */
+			if (spawns->next == NULL)
+				break;
 		}
 	} while ((spawns = crypto_more_spawns(alg, &stack, &top,
 					      &secondary_spawns)));
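The heart of the fix is the difference between an empty list and a never-initialized one; a short sketch (illustrative, not from the patch):

    /* An initialized, empty list head points to itself: */
    LIST_HEAD(users);        /* users.next == &users  ->  "empty" */

    /*
     * A zeroed-but-unregistered instance instead has
     * inst->alg.cra_users.next == NULL, which is neither empty nor
     * walkable; iterating it would dereference NULL.  Hence the
     * explicit spawns->next == NULL check before continuing the walk.
     */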
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Dryomov idryomov@gmail.com
commit 21acdf45f4958135940f0b4767185cf911d4b010 upstream.
Commit d3834fefcfe5 ("rbd: bump queue_max_segments") bumped max_segments (unsigned short) to max_hw_sectors (unsigned int). max_hw_sectors is set to the number of 512-byte sectors in an object and overflows unsigned short for 32M (largest possible) objects, making the block layer resort to handing us single segment (i.e. single page or even smaller) bios in that case.
Fixes: d3834fefcfe5 ("rbd: bump queue_max_segments")
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Reviewed-by: Alex Elder elder@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/block/rbd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4511,7 +4511,7 @@ static int rbd_init_disk(struct rbd_devi
 	segment_size = rbd_obj_bytes(&rbd_dev->header);
 	blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);
 	q->limits.max_sectors = queue_max_hw_sectors(q);
-	blk_queue_max_segments(q, segment_size / SECTOR_SIZE);
+	blk_queue_max_segments(q, USHRT_MAX);
 	blk_queue_max_segment_size(q, segment_size);
 	blk_queue_io_min(q, segment_size);
 	blk_queue_io_opt(q, segment_size);
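The overflow is easy to verify from the numbers in the message above (32M objects, 512-byte sectors):

    /*
     * 32 MiB / 512 bytes = 65536 sectors per object, but
     * blk_queue_max_segments() takes an unsigned short (USHRT_MAX ==
     * 65535), so 65536 is truncated to 0, which the block layer then
     * treats as the degenerate single-segment case described above.
     * Passing USHRT_MAX avoids the truncation for every object size.
     */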
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jia Zhang qianyue.zj@alibaba-inc.com
commit b94b7373317164402ff7728d10f7023127a02b60 upstream.
Instead of blacklisting all model 79 CPUs when attempting late microcode loading, limit that only to CPUs with microcode revisions < 0x0b000021, because only on those can late loading cause a system hang.
For such processors either:
a) a BIOS update which might contain a newer microcode revision
or
b) the early microcode loading method
should be considered.
Processors with revisions 0x0b000021 or higher will not experience such hangs.
For more details, see erratum BDF90 in document #334165 (Intel Xeon Processor E7-8800/4800 v4 Product Family Specification Update) from September 2017.
[ bp: Heavily massage commit message and pr_* statements. ]
Fixes: 723f2828a98c ("x86/microcode/intel: Disable late loading on model 79")
Signed-off-by: Jia Zhang qianyue.zj@alibaba-inc.com
Signed-off-by: Borislav Petkov bp@suse.de
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Acked-by: Tony Luck tony.luck@intel.com
Cc: x86-ml x86@kernel.org
Link: http://lkml.kernel.org/r/1514772287-92959-1-git-send-email-qianyue.zj@alibab...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/x86/kernel/cpu/microcode/intel.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -1051,8 +1051,17 @@ static bool is_blacklisted(unsigned int
 {
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 
-	if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X) {
-		pr_err_once("late loading on model 79 is disabled.\n");
+	/*
+	 * Late loading on model 79 with microcode revision less than 0x0b000021
+	 * may result in a system hang. This behavior is documented in item
+	 * BDF90, #334165 (Intel Xeon Processor E7-8800/4800 v4 Product Family).
+	 */
+	if (c->x86 == 6 &&
+	    c->x86_model == INTEL_FAM6_BROADWELL_X &&
+	    c->x86_mask == 0x01 &&
+	    c->microcode < 0x0b000021) {
+		pr_err_once("Erratum BDF90: late loading with revision < 0x0b000021 (0x%x) disabled.\n", c->microcode);
+		pr_err_once("Please consider either early loading through initrd/built-in or a potential BIOS update.\n");
 		return true;
 	}
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrew Honig ahonig@google.com
commit 75f139aaf896d6fdeec2e468ddfa4b2fe469bf40 upstream.
This adds a memory barrier when performing a lookup into the vmcs_field_to_offset_table. This is related to CVE-2017-5753.
Signed-off-by: Andrew Honig ahonig@google.com Reviewed-by: Jim Mattson jmattson@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/vmx.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -857,8 +857,16 @@ static inline short vmcs_field_to_offset
 {
 	BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);

-	if (field >= ARRAY_SIZE(vmcs_field_to_offset_table) ||
-	    vmcs_field_to_offset_table[field] == 0)
+	if (field >= ARRAY_SIZE(vmcs_field_to_offset_table))
+		return -ENOENT;
+
+	/*
+	 * FIXME: Mitigation for CVE-2017-5753.  To be replaced with a
+	 * generic mechanism.
+	 */
+	asm("lfence");
+
+	if (vmcs_field_to_offset_table[field] == 0)
 		return -ENOENT;

 	return vmcs_field_to_offset_table[field];
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@oracle.com
commit 0d9cac0ca0429830c40fe1a4e50e60f6221fd7b6 upstream.
The vmw_view_cmd_to_type() function returns vmw_view_max (3) on error. It's one element beyond the end of the vmw_view_cotables[] table.
My read on this is that it's possible to hit this failure: header->id comes from vmw_cmd_check() and is a user-controlled number between 1040 and 1225, so we can hit that error. But I don't have the hardware to test this code.
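The fix is the standard sentinel-check pattern: validate the value returned by the lookup helper before using it as an array index. A hedged sketch of the shape, with illustrative bodies (only the enum names mirror the driver):

    #include <errno.h>

    enum vmw_view_type { vmw_view_sr, vmw_view_rt, vmw_view_ds, vmw_view_max };

    /* illustrative stand-in: map a user-supplied command id to a view
     * type, returning vmw_view_max (one past the table end) on error */
    static enum vmw_view_type view_cmd_to_type(unsigned int id)
    {
        return id < vmw_view_max ? (enum vmw_view_type)id : vmw_view_max;
    }

    int view_define(unsigned int id)
    {
        enum vmw_view_type vt = view_cmd_to_type(id);

        /* reject the sentinel before it can index one element
         * past the end of the per-type table */
        if (vt == vmw_view_max)
            return -EINVAL;
        /* ... vmw_view_cotables[vt] would be safe to use here ... */
        return 0;
    }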
Fixes: d80efd5cb3de ("drm/vmwgfx: Initial DX support") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Thomas Hellstrom thellstrom@vmware.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -2729,6 +2729,8 @@ static int vmw_cmd_dx_view_define(struct
 	}

 	view_type = vmw_view_cmd_to_type(header->id);
+	if (view_type == vmw_view_max)
+		return -EINVAL;
 	cmd = container_of(header, typeof(*cmd), header);
 	ret = vmw_cmd_res_check(dev_priv, sw_context, vmw_res_surface,
 				user_surface_converter,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lepton Wu ytht.net@gmail.com
This finally resolves a crash when the kernel is loaded under qemu + haxm. Haitao Shan pointed out that the reason for that crash is that the NX bit gets set for page tables. It seems we missed checking whether _PAGE_NX is supported in kaiser_add_user_map().
Link: https://www.spinics.net/lists/kernel/msg2689835.html
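The rule the fix enforces is: never set a PTE flag the CPU does not advertise. A hedged userspace model of the masking (the flag value mirrors x86-64 where NX is bit 63; the mask variable stands in for the kernel's __supported_pte_mask):

    #include <stdio.h>

    #define _PAGE_NX        (1UL << 63)

    /* pretend the CPU (or hypervisor, e.g. haxm) lacks NX */
    static unsigned long supported_pte_mask = ~_PAGE_NX;

    static unsigned long sanitize_pte_flags(unsigned long flags)
    {
        /* strip NX when unsupported, or the PTE has a reserved
         * bit set and the page walk faults */
        if (!(supported_pte_mask & _PAGE_NX))
            flags &= ~_PAGE_NX;
        return flags;
    }

    int main(void)
    {
        unsigned long flags = _PAGE_NX | 0x63;  /* NX|PRESENT|RW|A|D */

        printf("0x%lx -> 0x%lx\n", flags, sanitize_pte_flags(flags));
        return 0;
    }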
Reviewed-by: Guenter Roeck groeck@chromium.org Signed-off-by: Lepton Wu ytht.net@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/mm/kaiser.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -197,6 +197,8 @@ static int kaiser_add_user_map(const voi
 	 * requires that not to be #defined to 0): so mask it off here.
 	 */
 	flags &= ~_PAGE_GLOBAL;
+	if (!(__supported_pte_mask & _PAGE_NX))
+		flags &= ~_PAGE_NX;

 	for (; address < end_addr; address += PAGE_SIZE) {
 		target_address = get_pa_from_mapping(address);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicholas Bellinger nab@linux-iscsi.org
commit ae072726f6109bb1c94841d6fb3a82dde298ea85 upstream.
Since commit 59b6986dbf fixed a potential NULL pointer dereference by allocating a se_tmr_req for ISCSI_TM_FUNC_TASK_REASSIGN, the se_tmr_req is currently leaked by iscsit_free_cmd() because no iscsi_cmd->se_cmd.se_tfo was associated.
To address this, treat ISCSI_TM_FUNC_TASK_REASSIGN like any other TMR and call transport_init_se_cmd() + target_get_sess_cmd() to set up iscsi_cmd->se_cmd.se_tfo with a se_cmd->cmd_kref of 2.
This ensures normal release operation: once se_cmd->cmd_kref reaches zero and target_release_cmd_kref() is invoked, se_tmr_req will be released via the existing target_free_cmd_mem() and core_tmr_release_req() code.
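The reference counting being described can be pictured with a trivial model; this is a hedged userspace illustration of the kref lifecycle, not target-core code:

    #include <stdio.h>

    struct cmd {
        int kref;       /* models se_cmd->cmd_kref */
    };

    static void cmd_put(struct cmd *c)
    {
        if (--c->kref == 0)
            printf("last ref: release se_tmr_req and cmd memory\n");
    }

    int main(void)
    {
        /* target_get_sess_cmd() leaves the TMR with two references */
        struct cmd tmr = { .kref = 2 };

        cmd_put(&tmr);  /* put on the response path */
        cmd_put(&tmr);  /* final put from iscsit_free_cmd() */
        return 0;
    }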
Reported-by: Donald White dew@datera.io Cc: Donald White dew@datera.io Cc: Mike Christie mchristi@redhat.com Cc: Hannes Reinecke hare@suse.com Signed-off-by: Nicholas Bellinger nab@linux-iscsi.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/target/iscsi/iscsi_target.c | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-)
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -1940,7 +1940,6 @@ iscsit_handle_task_mgt_cmd(struct iscsi_
 	struct iscsi_tmr_req *tmr_req;
 	struct iscsi_tm *hdr;
 	int out_of_order_cmdsn = 0, ret;
-	bool sess_ref = false;
 	u8 function, tcm_function = TMR_UNKNOWN;

 	hdr = (struct iscsi_tm *) buf;
@@ -1982,18 +1981,17 @@ iscsit_handle_task_mgt_cmd(struct iscsi_
 					      buf);
 	}

+	transport_init_se_cmd(&cmd->se_cmd, &iscsi_ops,
+			      conn->sess->se_sess, 0, DMA_NONE,
+			      TCM_SIMPLE_TAG, cmd->sense_buffer + 2);
+
+	target_get_sess_cmd(&cmd->se_cmd, true);
+
 	/*
 	 * TASK_REASSIGN for ERL=2 / connection stays inside of
 	 * LIO-Target $FABRIC_MOD
 	 */
 	if (function != ISCSI_TM_FUNC_TASK_REASSIGN) {
-		transport_init_se_cmd(&cmd->se_cmd, &iscsi_ops,
-				      conn->sess->se_sess, 0, DMA_NONE,
-				      TCM_SIMPLE_TAG, cmd->sense_buffer + 2);
-
-		target_get_sess_cmd(&cmd->se_cmd, true);
-		sess_ref = true;
-
 		switch (function) {
 		case ISCSI_TM_FUNC_ABORT_TASK:
 			tcm_function = TMR_ABORT_TASK;
@@ -2132,12 +2130,8 @@ attach:
 	 * For connection recovery, this is also the default action for
 	 * TMR TASK_REASSIGN.
 	 */
-	if (sess_ref) {
-		pr_debug("Handle TMR, using sess_ref=true check\n");
-		target_put_sess_cmd(&cmd->se_cmd);
-	}
-
 	iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state);
+	target_put_sess_cmd(&cmd->se_cmd);
 	return 0;
 }
 EXPORT_SYMBOL(iscsit_handle_task_mgt_cmd);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicholas Bellinger nab@linux-iscsi.org
commit 1c21a48055a67ceb693e9c2587824a8de60a217c upstream.
This patch fixes bug where early se_cmd exceptions that occur before backend execution can result in use-after-free if/when a subsequent ABORT_TASK occurs for the same tag.
Since an early se_cmd exception will have had se_cmd added to se_session->sess_cmd_list via target_get_sess_cmd(), it will not have CMD_T_COMPLETE set by the usual target_complete_cmd() backend completion path.
This causes a subsequent ABORT_TASK + __target_check_io_state() to signal ABORT_TASK should proceed. As core_tmr_abort_task() executes, it will bring the outstanding se_cmd->cmd_kref count down to zero releasing se_cmd, after se_cmd has already been queued with error status into fabric driver response path code.
To address this bug, introduce a CMD_T_PRE_EXECUTE bit that is set at target_get_sess_cmd() time, and cleared immediately before backend driver dispatch in target_execute_cmd() once CMD_T_ACTIVE is set.
Then, check CMD_T_PRE_EXECUTE within __target_check_io_state() to determine when an early exception has occurred, and avoid aborting this se_cmd since it will have already been queued into the fabric driver response path code.
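A hedged userspace model of the flag's lifecycle (the real check also considers se_cmd->scsi_status; only the bit names mirror the kernel):

    #include <stdbool.h>
    #include <stdio.h>

    #define CMD_T_ACTIVE            (1u << 0)
    #define CMD_T_PRE_EXECUTE       (1u << 1)

    struct cmd { unsigned int transport_state; };

    static void get_sess_cmd(struct cmd *c)
    {
        c->transport_state |= CMD_T_PRE_EXECUTE;        /* at submit */
    }

    static void execute_cmd(struct cmd *c)
    {
        c->transport_state &= ~CMD_T_PRE_EXECUTE;       /* at dispatch */
        c->transport_state |= CMD_T_ACTIVE;
    }

    /* ABORT_TASK must skip commands still in the pre-execute window */
    static bool abortable(const struct cmd *c)
    {
        return !(c->transport_state & CMD_T_PRE_EXECUTE);
    }

    int main(void)
    {
        struct cmd c = { 0 };

        get_sess_cmd(&c);
        printf("before dispatch: abortable=%d\n", abortable(&c));
        execute_cmd(&c);
        printf("after dispatch:  abortable=%d\n", abortable(&c));
        return 0;
    }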
Reported-by: Donald White dew@datera.io Cc: Donald White dew@datera.io Cc: Mike Christie mchristi@redhat.com Cc: Hannes Reinecke hare@suse.com Signed-off-by: Nicholas Bellinger nab@linux-iscsi.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/target/target_core_tmr.c | 9 +++++++++ drivers/target/target_core_transport.c | 2 ++ include/target/target_core_base.h | 1 + 3 files changed, 12 insertions(+)
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -133,6 +133,15 @@ static bool __target_check_io_state(stru
 		spin_unlock(&se_cmd->t_state_lock);
 		return false;
 	}
+	if (se_cmd->transport_state & CMD_T_PRE_EXECUTE) {
+		if (se_cmd->scsi_status) {
+			pr_debug("Attempted to abort io tag: %llu early failure"
+				 " status: 0x%02x\n", se_cmd->tag,
+				 se_cmd->scsi_status);
+			spin_unlock(&se_cmd->t_state_lock);
+			return false;
+		}
+	}
 	if (sess->sess_tearing_down || se_cmd->cmd_wait_set) {
 		pr_debug("Attempted to abort io tag: %llu already shutdown,"
 			" skipping\n", se_cmd->tag);
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1939,6 +1939,7 @@ void target_execute_cmd(struct se_cmd *c
 	}

 	cmd->t_state = TRANSPORT_PROCESSING;
+	cmd->transport_state &= ~CMD_T_PRE_EXECUTE;
 	cmd->transport_state |= CMD_T_ACTIVE|CMD_T_BUSY|CMD_T_SENT;
 	spin_unlock_irq(&cmd->t_state_lock);
@@ -2592,6 +2593,7 @@ int target_get_sess_cmd(struct se_cmd *s
 		ret = -ESHUTDOWN;
 		goto out;
 	}
+	se_cmd->transport_state |= CMD_T_PRE_EXECUTE;
 	list_add_tail(&se_cmd->se_cmd_list, &se_sess->sess_cmd_list);
 out:
 	spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -493,6 +493,7 @@ struct se_cmd {
 #define CMD_T_BUSY		(1 << 9)
 #define CMD_T_TAS		(1 << 10)
 #define CMD_T_FABRIC_STOP	(1 << 11)
+#define CMD_T_PRE_EXECUTE	(1 << 12)
 	spinlock_t		t_state_lock;
 	struct kref		cmd_kref;
 	struct completion	t_transport_stop_comp;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexei Starovoitov ast@fb.com
commit e245c5c6a5656e4d61aa7bb08e9694fd6e5b2b9d upstream.
No functional change. Move fixup_bpf_calls() to verifier.c; it's being refactored in the next patch.
Signed-off-by: Alexei Starovoitov ast@kernel.org Acked-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: David S. Miller davem@davemloft.net Cc: Jiri Slaby jslaby@suse.cz [backported to 4.9 - gregkh] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/bpf/syscall.c | 54 -------------------------------------------------- kernel/bpf/verifier.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+), 54 deletions(-)
--- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -565,57 +565,6 @@ void bpf_register_prog_type(struct bpf_p list_add(&tl->list_node, &bpf_prog_types); }
-/* fixup insn->imm field of bpf_call instructions: - * if (insn->imm == BPF_FUNC_map_lookup_elem) - * insn->imm = bpf_map_lookup_elem - __bpf_call_base; - * else if (insn->imm == BPF_FUNC_map_update_elem) - * insn->imm = bpf_map_update_elem - __bpf_call_base; - * else ... - * - * this function is called after eBPF program passed verification - */ -static void fixup_bpf_calls(struct bpf_prog *prog) -{ - const struct bpf_func_proto *fn; - int i; - - for (i = 0; i < prog->len; i++) { - struct bpf_insn *insn = &prog->insnsi[i]; - - if (insn->code == (BPF_JMP | BPF_CALL)) { - /* we reach here when program has bpf_call instructions - * and it passed bpf_check(), means that - * ops->get_func_proto must have been supplied, check it - */ - BUG_ON(!prog->aux->ops->get_func_proto); - - if (insn->imm == BPF_FUNC_get_route_realm) - prog->dst_needed = 1; - if (insn->imm == BPF_FUNC_get_prandom_u32) - bpf_user_rnd_init_once(); - if (insn->imm == BPF_FUNC_tail_call) { - /* mark bpf_tail_call as different opcode - * to avoid conditional branch in - * interpeter for every normal call - * and to prevent accidental JITing by - * JIT compiler that doesn't support - * bpf_tail_call yet - */ - insn->imm = 0; - insn->code |= BPF_X; - continue; - } - - fn = prog->aux->ops->get_func_proto(insn->imm); - /* all functions that have prototype and verifier allowed - * programs to call them, must be real in-kernel functions - */ - BUG_ON(!fn->func); - insn->imm = fn->func - __bpf_call_base; - } - } -} - /* drop refcnt on maps used by eBPF program and free auxilary data */ static void free_used_maps(struct bpf_prog_aux *aux) { @@ -808,9 +757,6 @@ static int bpf_prog_load(union bpf_attr if (err < 0) goto free_used_maps;
- /* fixup BPF_CALL->imm field */ - fixup_bpf_calls(prog); - /* eBPF program is ready to be JITed */ prog = bpf_prog_select_runtime(prog, &err); if (err < 0) --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3362,6 +3362,57 @@ static int convert_ctx_accesses(struct b return 0; }
+/* fixup insn->imm field of bpf_call instructions: + * if (insn->imm == BPF_FUNC_map_lookup_elem) + * insn->imm = bpf_map_lookup_elem - __bpf_call_base; + * else if (insn->imm == BPF_FUNC_map_update_elem) + * insn->imm = bpf_map_update_elem - __bpf_call_base; + * else ... + * + * this function is called after eBPF program passed verification + */ +static void fixup_bpf_calls(struct bpf_prog *prog) +{ + const struct bpf_func_proto *fn; + int i; + + for (i = 0; i < prog->len; i++) { + struct bpf_insn *insn = &prog->insnsi[i]; + + if (insn->code == (BPF_JMP | BPF_CALL)) { + /* we reach here when program has bpf_call instructions + * and it passed bpf_check(), means that + * ops->get_func_proto must have been supplied, check it + */ + BUG_ON(!prog->aux->ops->get_func_proto); + + if (insn->imm == BPF_FUNC_get_route_realm) + prog->dst_needed = 1; + if (insn->imm == BPF_FUNC_get_prandom_u32) + bpf_user_rnd_init_once(); + if (insn->imm == BPF_FUNC_tail_call) { + /* mark bpf_tail_call as different opcode + * to avoid conditional branch in + * interpeter for every normal call + * and to prevent accidental JITing by + * JIT compiler that doesn't support + * bpf_tail_call yet + */ + insn->imm = 0; + insn->code |= BPF_X; + continue; + } + + fn = prog->aux->ops->get_func_proto(insn->imm); + /* all functions that have prototype and verifier allowed + * programs to call them, must be real in-kernel functions + */ + BUG_ON(!fn->func); + insn->imm = fn->func - __bpf_call_base; + } + } +} + static void free_states(struct bpf_verifier_env *env) { struct bpf_verifier_state_list *sl, *sln; @@ -3463,6 +3514,9 @@ skip_full_check: /* program is valid, convert *(u32*)(ctx + off) accesses */ ret = convert_ctx_accesses(env);
+ if (ret == 0) + fixup_bpf_calls(env->prog); + if (log_level && log_len >= log_size - 1) { BUG_ON(log_len >= log_size); /* verifier log exceeded user supplied buffer */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexei Starovoitov ast@fb.com
commit 79741b3bdec01a8628368fbcfccc7d189ed606cb upstream.
Reduce the indentation and make it iterate over instructions similarly to convert_ctx_accesses(). Also convert the hard BUG_ON() into a soft verifier error.
Signed-off-by: Alexei Starovoitov ast@kernel.org Acked-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: David S. Miller davem@davemloft.net Cc: Jiri Slaby jslaby@suse.cz [Backported to 4.9.y - gregkh] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/bpf/verifier.c | 75 +++++++++++++++++++++++--------------------------- 1 file changed, 35 insertions(+), 40 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3362,55 +3362,50 @@ static int convert_ctx_accesses(struct b return 0; }
-/* fixup insn->imm field of bpf_call instructions: - * if (insn->imm == BPF_FUNC_map_lookup_elem) - * insn->imm = bpf_map_lookup_elem - __bpf_call_base; - * else if (insn->imm == BPF_FUNC_map_update_elem) - * insn->imm = bpf_map_update_elem - __bpf_call_base; - * else ... +/* fixup insn->imm field of bpf_call instructions * * this function is called after eBPF program passed verification */ -static void fixup_bpf_calls(struct bpf_prog *prog) +static int fixup_bpf_calls(struct bpf_verifier_env *env) { + struct bpf_prog *prog = env->prog; + struct bpf_insn *insn = prog->insnsi; const struct bpf_func_proto *fn; + const int insn_cnt = prog->len; int i;
- for (i = 0; i < prog->len; i++) { - struct bpf_insn *insn = &prog->insnsi[i]; - - if (insn->code == (BPF_JMP | BPF_CALL)) { - /* we reach here when program has bpf_call instructions - * and it passed bpf_check(), means that - * ops->get_func_proto must have been supplied, check it - */ - BUG_ON(!prog->aux->ops->get_func_proto); - - if (insn->imm == BPF_FUNC_get_route_realm) - prog->dst_needed = 1; - if (insn->imm == BPF_FUNC_get_prandom_u32) - bpf_user_rnd_init_once(); - if (insn->imm == BPF_FUNC_tail_call) { - /* mark bpf_tail_call as different opcode - * to avoid conditional branch in - * interpeter for every normal call - * and to prevent accidental JITing by - * JIT compiler that doesn't support - * bpf_tail_call yet - */ - insn->imm = 0; - insn->code |= BPF_X; - continue; - } + for (i = 0; i < insn_cnt; i++, insn++) { + if (insn->code != (BPF_JMP | BPF_CALL)) + continue; + + if (insn->imm == BPF_FUNC_get_route_realm) + prog->dst_needed = 1; + if (insn->imm == BPF_FUNC_get_prandom_u32) + bpf_user_rnd_init_once(); + if (insn->imm == BPF_FUNC_tail_call) { + /* mark bpf_tail_call as different opcode to avoid + * conditional branch in the interpeter for every normal + * call and to prevent accidental JITing by JIT compiler + * that doesn't support bpf_tail_call yet + */ + insn->imm = 0; + insn->code |= BPF_X; + continue; + }
- fn = prog->aux->ops->get_func_proto(insn->imm); - /* all functions that have prototype and verifier allowed - * programs to call them, must be real in-kernel functions - */ - BUG_ON(!fn->func); - insn->imm = fn->func - __bpf_call_base; + fn = prog->aux->ops->get_func_proto(insn->imm); + /* all functions that have prototype and verifier allowed + * programs to call them, must be real in-kernel functions + */ + if (!fn->func) { + verbose("kernel subsystem misconfigured func %d\n", + insn->imm); + return -EFAULT; } + insn->imm = fn->func - __bpf_call_base; } + + return 0; }
static void free_states(struct bpf_verifier_env *env) @@ -3515,7 +3510,7 @@ skip_full_check: ret = convert_ctx_accesses(env);
if (ret == 0) - fixup_bpf_calls(env->prog); + ret = fixup_bpf_calls(env);
if (log_level && log_len >= log_size - 1) { BUG_ON(log_len >= log_size);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexei Starovoitov ast@kernel.org
commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream.
Under speculation, CPUs may mis-predict branches in bounds checks. Thus, memory accesses under a bounds check may be speculated even if the bounds check fails, providing a primitive for building a side channel.
To avoid leaking kernel data, round up array-based maps and mask the index after the bounds check, so that a speculated load with an out-of-bounds index will load either a valid value from the array or zero from the padded area.
Unconditionally mask the index for all array types, even when max_entries is not rounded to a power of 2 for the root user. When a map is created by an unprivileged user, generate a sequence of bpf insns that includes an AND operation to make sure that the JITed code includes the same 'index & index_mask' operation.
If a prog_array map is created by an unprivileged user, replace

  bpf_tail_call(ctx, map, index);

with

  if (index >= max_entries) {
    index &= map->index_mask;
    bpf_tail_call(ctx, map, index);
  }

(along with rounding up to a power of 2) to prevent out-of-bounds speculation. There is a secondary, redundant 'if (index >= max_entries)' check in the interpreter and in all JITs, but it can be optimized away later if necessary.
Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array) cannot be used by unpriv, so no changes there.
That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on all architectures with and without JIT.
v2->v3: Daniel noticed that the attack can potentially be crafted via syscall commands without loading a program, so add masking to those paths as well.
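The net effect of the rounding plus masking can be shown with a small hedged userspace model (plain C, not bpf insns):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t max_entries = 5;       /* user-requested size */
        uint32_t index_mask = 1;

        while (index_mask < max_entries)
            index_mask <<= 1;           /* rounded size: 8 */
        index_mask -= 1;                /* mask: 0x7 */

        /* even a speculated out-of-bounds index stays inside the
         * (zero-padded) allocation after masking */
        for (uint32_t idx = 0; idx < 12; idx++)
            printf("idx %2u -> slot %u\n", idx, idx & index_mask);
        return 0;
    }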
Signed-off-by: Alexei Starovoitov ast@kernel.org Acked-by: John Fastabend john.fastabend@gmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Cc: Jiri Slaby jslaby@suse.cz [ Backported to 4.9 - gregkh ] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/bpf.h | 2 ++ include/linux/bpf_verifier.h | 5 ++++- kernel/bpf/arraymap.c | 31 ++++++++++++++++++++++--------- kernel/bpf/verifier.c | 42 +++++++++++++++++++++++++++++++++++++++--- 4 files changed, 67 insertions(+), 13 deletions(-)
--- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -43,6 +43,7 @@ struct bpf_map { u32 max_entries; u32 map_flags; u32 pages; + bool unpriv_array; struct user_struct *user; const struct bpf_map_ops *ops; struct work_struct work; @@ -189,6 +190,7 @@ struct bpf_prog_aux { struct bpf_array { struct bpf_map map; u32 elem_size; + u32 index_mask; /* 'ownership' of prog_array is claimed by the first program that * is going to use this map or by the first program which FD is stored * in the map to make sure that all callers and callees have the same --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -67,7 +67,10 @@ struct bpf_verifier_state_list { };
struct bpf_insn_aux_data { - enum bpf_reg_type ptr_type; /* pointer type for load/store insns */ + union { + enum bpf_reg_type ptr_type; /* pointer type for load/store insns */ + struct bpf_map *map_ptr; /* pointer for call insn into lookup_elem */ + }; bool seen; /* this insn was processed by the verifier */ };
--- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -46,9 +46,10 @@ static int bpf_array_alloc_percpu(struct static struct bpf_map *array_map_alloc(union bpf_attr *attr) { bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY; + u32 elem_size, index_mask, max_entries; + bool unpriv = !capable(CAP_SYS_ADMIN); struct bpf_array *array; u64 array_size; - u32 elem_size;
/* check sanity of attributes */ if (attr->max_entries == 0 || attr->key_size != 4 || @@ -63,11 +64,20 @@ static struct bpf_map *array_map_alloc(u
elem_size = round_up(attr->value_size, 8);
+ max_entries = attr->max_entries; + index_mask = roundup_pow_of_two(max_entries) - 1; + + if (unpriv) + /* round up array size to nearest power of 2, + * since cpu will speculate within index_mask limits + */ + max_entries = index_mask + 1; + array_size = sizeof(*array); if (percpu) - array_size += (u64) attr->max_entries * sizeof(void *); + array_size += (u64) max_entries * sizeof(void *); else - array_size += (u64) attr->max_entries * elem_size; + array_size += (u64) max_entries * elem_size;
/* make sure there is no u32 overflow later in round_up() */ if (array_size >= U32_MAX - PAGE_SIZE) @@ -77,6 +87,8 @@ static struct bpf_map *array_map_alloc(u array = bpf_map_area_alloc(array_size); if (!array) return ERR_PTR(-ENOMEM); + array->index_mask = index_mask; + array->map.unpriv_array = unpriv;
/* copy mandatory map attributes */ array->map.map_type = attr->map_type; @@ -110,7 +122,7 @@ static void *array_map_lookup_elem(struc if (unlikely(index >= array->map.max_entries)) return NULL;
- return array->value + array->elem_size * index; + return array->value + array->elem_size * (index & array->index_mask); }
/* Called from eBPF program */ @@ -122,7 +134,7 @@ static void *percpu_array_map_lookup_ele if (unlikely(index >= array->map.max_entries)) return NULL;
- return this_cpu_ptr(array->pptrs[index]); + return this_cpu_ptr(array->pptrs[index & array->index_mask]); }
int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value) @@ -142,7 +154,7 @@ int bpf_percpu_array_copy(struct bpf_map */ size = round_up(map->value_size, 8); rcu_read_lock(); - pptr = array->pptrs[index]; + pptr = array->pptrs[index & array->index_mask]; for_each_possible_cpu(cpu) { bpf_long_memcpy(value + off, per_cpu_ptr(pptr, cpu), size); off += size; @@ -190,10 +202,11 @@ static int array_map_update_elem(struct return -EEXIST;
if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY) - memcpy(this_cpu_ptr(array->pptrs[index]), + memcpy(this_cpu_ptr(array->pptrs[index & array->index_mask]), value, map->value_size); else - memcpy(array->value + array->elem_size * index, + memcpy(array->value + + array->elem_size * (index & array->index_mask), value, map->value_size); return 0; } @@ -227,7 +240,7 @@ int bpf_percpu_array_update(struct bpf_m */ size = round_up(map->value_size, 8); rcu_read_lock(); - pptr = array->pptrs[index]; + pptr = array->pptrs[index & array->index_mask]; for_each_possible_cpu(cpu) { bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value + off, size); off += size; --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1187,7 +1187,7 @@ static void clear_all_pkt_pointers(struc } }
-static int check_call(struct bpf_verifier_env *env, int func_id) +static int check_call(struct bpf_verifier_env *env, int func_id, int insn_idx) { struct bpf_verifier_state *state = &env->cur_state; const struct bpf_func_proto *fn = NULL; @@ -1238,6 +1238,13 @@ static int check_call(struct bpf_verifie err = check_func_arg(env, BPF_REG_2, fn->arg2_type, &meta); if (err) return err; + if (func_id == BPF_FUNC_tail_call) { + if (meta.map_ptr == NULL) { + verbose("verifier bug\n"); + return -EINVAL; + } + env->insn_aux_data[insn_idx].map_ptr = meta.map_ptr; + } err = check_func_arg(env, BPF_REG_3, fn->arg3_type, &meta); if (err) return err; @@ -3019,7 +3026,7 @@ static int do_check(struct bpf_verifier_ return -EINVAL; }
- err = check_call(env, insn->imm); + err = check_call(env, insn->imm, insn_idx); if (err) return err;
@@ -3372,7 +3379,11 @@ static int fixup_bpf_calls(struct bpf_ve struct bpf_insn *insn = prog->insnsi; const struct bpf_func_proto *fn; const int insn_cnt = prog->len; - int i; + struct bpf_insn insn_buf[16]; + struct bpf_prog *new_prog; + struct bpf_map *map_ptr; + int i, cnt, delta = 0; +
for (i = 0; i < insn_cnt; i++, insn++) { if (insn->code != (BPF_JMP | BPF_CALL)) @@ -3390,6 +3401,31 @@ static int fixup_bpf_calls(struct bpf_ve */ insn->imm = 0; insn->code |= BPF_X; + + /* instead of changing every JIT dealing with tail_call + * emit two extra insns: + * if (index >= max_entries) goto out; + * index &= array->index_mask; + * to avoid out-of-bounds cpu speculation + */ + map_ptr = env->insn_aux_data[i + delta].map_ptr; + if (!map_ptr->unpriv_array) + continue; + insn_buf[0] = BPF_JMP_IMM(BPF_JGE, BPF_REG_3, + map_ptr->max_entries, 2); + insn_buf[1] = BPF_ALU32_IMM(BPF_AND, BPF_REG_3, + container_of(map_ptr, + struct bpf_array, + map)->index_mask); + insn_buf[2] = *insn; + cnt = 3; + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt); + if (!new_prog) + return -ENOMEM; + + delta += cnt - 1; + env->prog = prog = new_prog; + insn = new_prog->insnsi + i + delta; continue; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Borkmann daniel@iogearbox.net
commit bbeb6e4323dad9b5e0ee9f60c223dd532e2403b1 upstream.
syzkaller tried to alloc a map with 0xfffffffd entries out of a userns, and thus unprivileged. With the recently added logic in b2157399cc98 ("bpf: prevent out-of-bounds speculation") we round this up to the next power of two value for max_entries for unprivileged such that we can apply proper masking into potentially zeroed out map slots.
However, this will generate an index_mask of 0xffffffff, and the subsequent + 1 will overflow into a new max_entries of 0. This passes allocation etc., yet on later map access we still enforce the original attr->max_entries value of 0xfffffffd, therefore triggering a GPF all over the place. Thus, bail out on overflow in such a case.
Moreover, on 32 bit archs roundup_pow_of_two() can also not be used, since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit space is undefined. Therefore, do this by hand in a 64 bit variable.
This fixes all the issues triggered by syzkaller's reproducers.
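To make the arithmetic concrete, here is a hedged userspace model of the failure and the fix; fls_long() is replaced with a portable loop (an assumption, not the kernel implementation):

    #include <stdint.h>
    #include <stdio.h>

    /* portable stand-in for the kernel's fls_long(): position of the
     * highest set bit, counting from 1; 0 for an input of 0 */
    static unsigned int fls_long(unsigned long x)
    {
        unsigned int r = 0;

        while (x) {
            x >>= 1;
            r++;
        }
        return r;
    }

    int main(void)
    {
        uint32_t max_entries = 0xfffffffd;  /* syzkaller's request */
        uint64_t mask64;
        uint32_t index_mask, rounded;

        /* 1ULL << 32 is well-defined in 64-bit space, unlike
         * 1UL << 32 on a 32-bit arch */
        mask64 = fls_long(max_entries - 1);
        mask64 = 1ULL << mask64;
        mask64 -= 1;

        index_mask = (uint32_t)mask64;      /* 0xffffffff */
        rounded = index_mask + 1;           /* wraps to 0 */

        if (rounded < max_entries)
            printf("overflow detected: reject with -E2BIG\n");
        else
            printf("rounded size %u ok\n", rounded);
        return 0;
    }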
Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/bpf/arraymap.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -49,7 +49,7 @@ static struct bpf_map *array_map_alloc(u
 	u32 elem_size, index_mask, max_entries;
 	bool unpriv = !capable(CAP_SYS_ADMIN);
 	struct bpf_array *array;
-	u64 array_size;
+	u64 array_size, mask64;

 	/* check sanity of attributes */
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
@@ -65,13 +65,25 @@ static struct bpf_map *array_map_alloc(u

 	elem_size = round_up(attr->value_size, 8);

 	max_entries = attr->max_entries;
-	index_mask = roundup_pow_of_two(max_entries) - 1;

-	if (unpriv)
+	/* On 32 bit archs roundup_pow_of_two() with max_entries that has
+	 * upper most bit set in u32 space is undefined behavior due to
+	 * resulting 1U << 32, so do it manually here in u64 space.
+	 */
+	mask64 = fls_long(max_entries - 1);
+	mask64 = 1ULL << mask64;
+	mask64 -= 1;
+
+	index_mask = mask64;
+	if (unpriv) {
 		/* round up array size to nearest power of 2,
 		 * since cpu will speculate within index_mask limits
 		 */
 		max_entries = index_mask + 1;
+		/* Check for overflows. */
+		if (max_entries < attr->max_entries)
+			return ERR_PTR(-E2BIG);
+	}

 	array_size = sizeof(*array);
 	if (percpu)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Holl cyborgx1@gmail.com
commit d14ac576d10f865970bb1324d337e5e24d79aaf4 upstream.
This adds the ELV ALC 8xxx Battery Charging device to the list of USB IDs of drivers/usb/serial/cp210x.c
Signed-off-by: Christian Holl cyborgx1@gmail.com Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/serial/cp210x.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -172,6 +172,7 @@ static const struct usb_device_id id_tab
 	{ USB_DEVICE(0x1843, 0x0200) }, /* Vaisala USB Instrument Cable */
 	{ USB_DEVICE(0x18EF, 0xE00F) }, /* ELV USB-I2C-Interface */
 	{ USB_DEVICE(0x18EF, 0xE025) }, /* ELV Marble Sound Board 1 */
+	{ USB_DEVICE(0x18EF, 0xE030) }, /* ELV ALC 8xxx Battery Charger */
 	{ USB_DEVICE(0x18EF, 0xE032) }, /* ELV TFD500 Data Logger */
 	{ USB_DEVICE(0x1901, 0x0190) }, /* GE B850 CP2105 Recorder interface */
 	{ USB_DEVICE(0x1901, 0x0193) }, /* GE B650 CP2104 PMC interface */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Agner stefan@agner.ch
commit b8626f1dc29d3eee444bfaa92146ec7b291ef41c upstream.
When using a GPIO which is high by default and initializing the driver in USB Hub mode, initialization fails with:

[ 111.757794] usb3503 0-0008: SP_ILOCK failed (-5)

The reason seems to be that the chip is not properly reset. Probe initializes reset low; however, some lines later the code already sets it back high, which is not long enough.
Make sure reset is asserted for at least 100us by inserting a delay after initializing the reset pin during probe.
Signed-off-by: Stefan Agner stefan@agner.ch Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/misc/usb3503.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/usb/misc/usb3503.c
+++ b/drivers/usb/misc/usb3503.c
@@ -292,6 +292,8 @@ static int usb3503_probe(struct usb3503
 	if (gpio_is_valid(hub->gpio_reset)) {
 		err = devm_gpio_request_one(dev, hub->gpio_reset,
 				GPIOF_OUT_INIT_LOW, "usb3503 reset");
+		/* Datasheet defines a hardware reset to be at least 100us */
+		usleep_range(100, 10000);
 		if (err) {
 			dev_err(dev,
 				"unable to request GPIO %d as reset pin (%d)\n",
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pete Zaitcev zaitcev@redhat.com
commit 46eb14a6e1585d99c1b9f58d0e7389082a5f466b upstream.
Automated tests triggered this by opening usbmon and accessing the mmap while simultaneously resizing the buffers. This bug has been with us since 2006, because typically applications only size the buffers once and thus avoid racing. Reported by Kirill A. Shutemov.
Reported-by: syzbot+f9831b881b3e849829fc@syzkaller.appspotmail.com Signed-off-by: Pete Zaitcev zaitcev@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/mon/mon_bin.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/usb/mon/mon_bin.c
+++ b/drivers/usb/mon/mon_bin.c
@@ -1002,7 +1002,9 @@ static long mon_bin_ioctl(struct file *f
 		break;

 	case MON_IOCQ_RING_SIZE:
+		mutex_lock(&rp->fetch_lock);
 		ret = rp->b_size;
+		mutex_unlock(&rp->fetch_lock);
 		break;

 	case MON_IOCT_RING_SIZE:
@@ -1229,12 +1231,16 @@ static int mon_bin_vma_fault(struct vm_a
 	unsigned long offset, chunk_idx;
 	struct page *pageptr;

+	mutex_lock(&rp->fetch_lock);
 	offset = vmf->pgoff << PAGE_SHIFT;
-	if (offset >= rp->b_size)
+	if (offset >= rp->b_size) {
+		mutex_unlock(&rp->fetch_lock);
 		return VM_FAULT_SIGBUS;
+	}
 	chunk_idx = offset / CHUNK_SIZE;
 	pageptr = rp->b_vec[chunk_idx].pg;
 	get_page(pageptr);
+	mutex_unlock(&rp->fetch_lock);
 	vmf->page = pageptr;
 	return 0;
 }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan shuahkh@osg.samsung.com
commit e1346fd87c71a1f61de1fe476ec8df1425ac931c upstream.
usbip_dump_usb_device() and usbip_dump_urb() print kernel addresses. Remove kernel addresses from usb device and urb debug msgs and improve the message content.
Instead of printing parent device and bus addresses, print parent device and bus names.
Signed-off-by: Shuah Khan shuahkh@osg.samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/usbip/usbip_common.c | 17 +++-------------- 1 file changed, 3 insertions(+), 14 deletions(-)
--- a/drivers/usb/usbip/usbip_common.c
+++ b/drivers/usb/usbip/usbip_common.c
@@ -105,7 +105,7 @@ static void usbip_dump_usb_device(struct
 	dev_dbg(dev, "       devnum(%d) devpath(%s) usb speed(%s)",
 		udev->devnum, udev->devpath, usb_speed_string(udev->speed));

-	pr_debug("tt %p, ttport %d\n", udev->tt, udev->ttport);
+	pr_debug("tt hub ttport %d\n", udev->ttport);

 	dev_dbg(dev, "                    ");
 	for (i = 0; i < 16; i++)
@@ -138,12 +138,8 @@ static void usbip_dump_usb_device(struct
 	}
 	pr_debug("\n");

-	dev_dbg(dev, "parent %p, bus %p\n", udev->parent, udev->bus);
-
-	dev_dbg(dev,
-		"descriptor %p, config %p, actconfig %p, rawdescriptors %p\n",
-		&udev->descriptor, udev->config,
-		udev->actconfig, udev->rawdescriptors);
+	dev_dbg(dev, "parent %s, bus %s\n", dev_name(&udev->parent->dev),
+		udev->bus->bus_name);

 	dev_dbg(dev, "have_langid %d, string_langid %d\n",
 		udev->have_langid, udev->string_langid);
@@ -251,9 +247,6 @@ void usbip_dump_urb(struct urb *urb)

 	dev = &urb->dev->dev;

-	dev_dbg(dev, "   urb                   :%p\n", urb);
-	dev_dbg(dev, "   dev                   :%p\n", urb->dev);
-
 	usbip_dump_usb_device(urb->dev);

 	dev_dbg(dev, "   pipe                  :%08x ", urb->pipe);
@@ -262,11 +255,9 @@ void usbip_dump_urb(struct urb *urb)

 	dev_dbg(dev, "   status                :%d\n", urb->status);
 	dev_dbg(dev, "   transfer_flags        :%08X\n", urb->transfer_flags);
-	dev_dbg(dev, "   transfer_buffer       :%p\n", urb->transfer_buffer);
 	dev_dbg(dev, "   transfer_buffer_length:%d\n",
 		urb->transfer_buffer_length);
 	dev_dbg(dev, "   actual_length         :%d\n", urb->actual_length);
-	dev_dbg(dev, "   setup_packet          :%p\n", urb->setup_packet);

 	if (urb->setup_packet && usb_pipetype(urb->pipe) == PIPE_CONTROL)
 		usbip_dump_usb_ctrlrequest(
@@ -276,8 +267,6 @@ void usbip_dump_urb(struct urb *urb)
 	dev_dbg(dev, "   number_of_packets     :%d\n", urb->number_of_packets);
 	dev_dbg(dev, "   interval              :%d\n", urb->interval);
 	dev_dbg(dev, "   error_count           :%d\n", urb->error_count);
-	dev_dbg(dev, "   context               :%p\n", urb->context);
-	dev_dbg(dev, "   complete              :%p\n", urb->complete);
 }
 EXPORT_SYMBOL_GPL(usbip_dump_urb);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan shuahkh@osg.samsung.com
commit b78d830f0049ef1966dc1e0ebd1ec2a594e2cf25 upstream.
Harden the CMD_SUBMIT path to handle malicious input that could trigger large memory allocations. Add checks to validate transfer_buffer_length and number_of_packets to protect against bad input requesting unbounded memory allocations.
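The bound being enforced: number_of_packets may not exceed the packet count implied by the transfer length and the endpoint's max packet size. A hedged userspace model of the arithmetic (values are illustrative):

    #include <stdio.h>

    #define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))

    /* a transfer of `bytes` over an endpoint taking `maxp` bytes per
     * packet can need at most DIV_ROUND_UP(bytes, maxp) packets;
     * anything larger (or negative) in the wire header is bogus */
    static int validate_isoc(int number_of_packets, unsigned int bytes,
                             unsigned int maxp)
    {
        unsigned int packets = DIV_ROUND_UP(bytes, maxp);

        if (number_of_packets < 0 ||
            (unsigned int)number_of_packets > packets)
            return -1;  /* the kernel returns -EMSGSIZE here */
        return 0;
    }

    int main(void)
    {
        printf("%d\n", validate_isoc(4, 4096, 1024));       /*  0: fits */
        printf("%d\n", validate_isoc(1 << 20, 4096, 1024)); /* -1: rejected */
        return 0;
    }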
Signed-off-by: Shuah Khan shuahkh@osg.samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/usbip/vudc_rx.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
--- a/drivers/usb/usbip/vudc_rx.c
+++ b/drivers/usb/usbip/vudc_rx.c
@@ -132,6 +132,25 @@ static int v_recv_cmd_submit(struct vudc
 	urb_p->new = 1;
 	urb_p->seqnum = pdu->base.seqnum;

+	if (urb_p->ep->type == USB_ENDPOINT_XFER_ISOC) {
+		/* validate packet size and number of packets */
+		unsigned int maxp, packets, bytes;
+
+		maxp = usb_endpoint_maxp(urb_p->ep->desc);
+		maxp *= usb_endpoint_maxp_mult(urb_p->ep->desc);
+		bytes = pdu->u.cmd_submit.transfer_buffer_length;
+		packets = DIV_ROUND_UP(bytes, maxp);
+
+		if (pdu->u.cmd_submit.number_of_packets < 0 ||
+		    pdu->u.cmd_submit.number_of_packets > packets) {
+			dev_err(&udc->gadget.dev,
+				"CMD_SUBMIT: isoc invalid num packets %d\n",
+				pdu->u.cmd_submit.number_of_packets);
+			ret = -EMSGSIZE;
+			goto free_urbp;
+		}
+	}
+
 	ret = alloc_urb_from_cmd(&urb_p->urb, pdu, urb_p->ep->type);
 	if (ret) {
 		usbip_event_add(&udc->ud, VUDC_EVENT_ERROR_MALLOC);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuah Khan shuahkh@osg.samsung.com
commit 5fd77a3a0e408c23ab4002a57db980e46bc16e72 upstream.
v_send_ret_submit() can end up handling a urb with a null transfer_buffer when it replays a packet containing potentially malicious data that specifies a null buffer.
Add a check for the condition when actual_length > 0 and transfer_buffer is null.
Signed-off-by: Shuah Khan shuahkh@osg.samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/usbip/vudc_tx.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
--- a/drivers/usb/usbip/vudc_tx.c
+++ b/drivers/usb/usbip/vudc_tx.c
@@ -97,6 +97,13 @@ static int v_send_ret_submit(struct vudc
 	memset(&pdu_header, 0, sizeof(pdu_header));
 	memset(&msg, 0, sizeof(msg));

+	if (urb->actual_length > 0 && !urb->transfer_buffer) {
+		dev_err(&udc->gadget.dev,
+			"urb: actual_length %d transfer_buffer null\n",
+			urb->actual_length);
+		return -1;
+	}
+
 	if (urb_p->type == USB_ENDPOINT_XFER_ISOC)
 		iovnum = 2 + urb->number_of_packets;
 	else
@@ -112,8 +119,8 @@ static int v_send_ret_submit(struct vudc

 	/* 1. setup usbip_header */
 	setup_ret_submit_pdu(&pdu_header, urb_p);
-	usbip_dbg_stub_tx("setup txdata seqnum: %d urb: %p\n",
-			  pdu_header.base.seqnum, urb);
+	usbip_dbg_stub_tx("setup txdata seqnum: %d\n",
+			  pdu_header.base.seqnum);
 	usbip_header_correct_endian(&pdu_header, 1);

 	iov[iovnum].iov_base = &pdu_header;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Viktor Slavkovic viktors@google.com
commit 443064cb0b1fb4569fe0a71209da7625129fb760 upstream.
A lock-unlock is missing in ASHMEM_SET_SIZE ioctl which can result in a race condition when mmap is called. After the !asma->file check, before setting asma->size, asma->file can be set in mmap. That would result in having different asma->size than the mapped memory size. Combined with ASHMEM_UNPIN ioctl and shrinker invocation, this can result in memory corruption.
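The underlying pattern is a check-then-act race on shared state; the fix widens the lock so the !asma->file test and the size update are atomic with respect to mmap. A hedged userspace model with pthreads (names loosely mirror the driver; the locking granularity is the point):

    #include <pthread.h>
    #include <stddef.h>

    struct region {
        pthread_mutex_t lock;   /* models ashmem_mutex */
        void *file;             /* set by the mmap() path under lock */
        size_t size;
    };

    /* SET_SIZE is only legal before the region is mapped; the check
     * and the update must form one critical section, or mmap() can
     * slip in between and leave size out of sync with the mapping */
    int set_size(struct region *r, size_t newsize)
    {
        int ret = -1;           /* -EINVAL in the driver */

        pthread_mutex_lock(&r->lock);
        if (!r->file) {
            r->size = newsize;
            ret = 0;
        }
        pthread_mutex_unlock(&r->lock);
        return ret;
    }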
Signed-off-by: Viktor Slavkovic viktors@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/staging/android/ashmem.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -774,10 +774,12 @@ static long ashmem_ioctl(struct file *fi
 		break;
 	case ASHMEM_SET_SIZE:
 		ret = -EINVAL;
+		mutex_lock(&ashmem_mutex);
 		if (!asma->file) {
 			ret = 0;
 			asma->size = (size_t)arg;
 		}
+		mutex_unlock(&ashmem_mutex);
 		break;
 	case ASHMEM_GET_SIZE:
 		ret = asma->size;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Seri ben@armis.com
commit 06e7e776ca4d36547e503279aeff996cbb292c16 upstream.
In the function l2cap_parse_conf_rsp and in the function l2cap_parse_conf_req the following variable is declared without initialization:
struct l2cap_conf_efs efs;
In addition, when parsing input configuration parameters in both of these functions, the switch case for handling EFS elements may skip the memcpy call that will write to the efs variable:
	...
	case L2CAP_CONF_EFS:
		if (olen == sizeof(efs))
			memcpy(&efs, (void *)val, olen);
	...
The olen in the above if is attacker controlled, and regardless of that if, in both of these functions the efs variable would eventually be added to the outgoing configuration request that is being built:
l2cap_add_conf_opt(&ptr, L2CAP_CONF_EFS, sizeof(efs), (unsigned long) &efs);
So by sending a configuration request, or response, that contains an L2CAP_CONF_EFS element, but with an element length that is not sizeof(efs), the memcpy to the uninitialized efs variable can be avoided, and the uninitialized variable is then returned to the attacker (16 bytes).
This issue has been assigned CVE-2017-1000410
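The invariant behind the fix can be stated in a few lines; this is a hedged model rather than the l2cap code itself (the struct only mirrors the element's 16-byte wire size):

    #include <stdint.h>
    #include <string.h>

    struct efs {                    /* mirrors the 16-byte wire element */
        uint8_t  id, stype;
        uint16_t msdu;
        uint32_t sdu_itime, acc_lat, flush_to;
    };

    /* Fill *out only when the element length matches exactly; a short
     * or long element is ignored entirely, so no uninitialized stack
     * bytes can be echoed back in the response. Returns 1 on success. */
    int parse_efs(struct efs *out, const uint8_t *val, size_t olen)
    {
        if (olen != sizeof(*out))
            return 0;
        memcpy(out, val, olen);
        return 1;
    }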
Cc: Marcel Holtmann marcel@holtmann.org Cc: Gustavo Padovan gustavo@padovan.org Cc: Johan Hedberg johan.hedberg@gmail.com Signed-off-by: Ben Seri ben@armis.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/bluetooth/l2cap_core.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-)
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -3353,9 +3353,10 @@ static int l2cap_parse_conf_req(struct l
 			break;

 		case L2CAP_CONF_EFS:
-			remote_efs = 1;
-			if (olen == sizeof(efs))
+			if (olen == sizeof(efs)) {
+				remote_efs = 1;
 				memcpy(&efs, (void *) val, olen);
+			}
 			break;

 		case L2CAP_CONF_EWS:
@@ -3574,16 +3575,17 @@ static int l2cap_parse_conf_rsp(struct l
 			break;

 		case L2CAP_CONF_EFS:
-			if (olen == sizeof(efs))
+			if (olen == sizeof(efs)) {
 				memcpy(&efs, (void *)val, olen);

-			if (chan->local_stype != L2CAP_SERV_NOTRAFIC &&
-			    efs.stype != L2CAP_SERV_NOTRAFIC &&
-			    efs.stype != chan->local_stype)
-				return -ECONNREFUSED;
+				if (chan->local_stype != L2CAP_SERV_NOTRAFIC &&
+				    efs.stype != L2CAP_SERV_NOTRAFIC &&
+				    efs.stype != chan->local_stype)
+					return -ECONNREFUSED;

-			l2cap_add_conf_opt(&ptr, L2CAP_CONF_EFS, sizeof(efs),
-					   (unsigned long) &efs, endptr - ptr);
+				l2cap_add_conf_opt(&ptr, L2CAP_CONF_EFS, sizeof(efs),
+						   (unsigned long) &efs, endptr - ptr);
+			}
 			break;

 		case L2CAP_CONF_FCS:
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Icenowy Zheng icenowy@aosc.io
commit 928afc85270753657b5543e052cc270c279a3fe9 upstream.
The UAS mode of Norelsys NS1068(X) is reported to fail to work on several platforms with the following error message:
xhci-hcd xhci-hcd.0.auto: ERROR Transfer event for unknown stream ring slot 1 ep 8 xhci-hcd xhci-hcd.0.auto: @00000000bf04a400 00000000 00000000 1b000000 01098001
When trying to mount a partition on the disk, the disk disconnects from the USB controller; after re-connecting, the device is offlined and does not work at all.
Falling back to USB mass storage can solve this problem, so ignore the UAS function of this chip.
Signed-off-by: Icenowy Zheng icenowy@aosc.io Acked-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/storage/unusual_uas.h | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/usb/storage/unusual_uas.h
+++ b/drivers/usb/storage/unusual_uas.h
@@ -156,6 +156,13 @@ UNUSUAL_DEV(0x2109, 0x0711, 0x0000, 0x99
 		USB_SC_DEVICE, USB_PR_DEVICE, NULL,
 		US_FL_NO_ATA_1X),

+/* Reported-by: Icenowy Zheng icenowy@aosc.io */
+UNUSUAL_DEV(0x2537, 0x1068, 0x0000, 0x9999,
+		"Norelsys",
+		"NS1068X",
+		USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+		US_FL_IGNORE_UAS),
+
 /* Reported-by: Takeo Nakayama javhera@gmx.com */
 UNUSUAL_DEV(0x357d, 0x7788, 0x0000, 0x9999,
 		"JMicron",
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Poirier bpoirier@suse.com
commit 4110e02eb45ea447ec6f5459c9934de0a273fb91 upstream.
e1000e_check_for_copper_link() and e1000_check_for_copper_link_ich8lan() are the two functions that may be assigned to mac.ops.check_for_link when phy.media_type == e1000_media_type_copper. Commit 19110cfbb34d ("e1000e: Separate signaling for link check/link up") changed the meaning of the return value of check_for_link for copper media but only adjusted the first function. This patch adjusts the second function likewise.
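Under the adjusted convention, callers must treat only negative values as failure; a hedged, kernel-style caller sketch (not compilable standalone; names other than the ops hook are illustrative):

    /* check_for_link: <0 = -E1000_ERR_* failure, 0 = link down, 1 = link up */
    static int handle_link(struct e1000_hw *hw)
    {
        s32 ret = hw->mac.ops.check_for_link(hw);

        if (ret < 0)
            return ret;     /* real error: propagate */
        if (ret == 0)
            return 0;       /* link down: nothing to do */
        /* ret == 1: link up, safe to read speed/duplex */
        return 0;
    }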
Reported-by: Christian Hesse list@eworm.de Reported-by: Gabriel C nix.or.die@gmail.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=198047 Fixes: 19110cfbb34d ("e1000e: Separate signaling for link check/link up") Signed-off-by: Benjamin Poirier bpoirier@suse.com Tested-by: Aaron Brown aaron.f.brown@intel.com Tested-by: Christian Hesse list@eworm.de Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/intel/e1000e/ich8lan.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1364,6 +1364,9 @@ out:
  *  Checks to see of the link status of the hardware has changed.  If a
  *  change in link status has been detected, then we read the PHY registers
  *  to get the current speed/duplex if link exists.
+ *
+ *  Returns a negative error code (-E1000_ERR_*) or 0 (link down) or 1 (link
+ *  up).
  **/
 static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
 {
@@ -1379,7 +1382,7 @@ static s32 e1000_check_for_copper_link_i
 	 * Change or Rx Sequence Error interrupt.
 	 */
 	if (!mac->get_link_status)
-		return 0;
+		return 1;

 	/* First we want to see if the MII Status Register reports
 	 * link.  If so, then we want to get the current speed/duplex
@@ -1611,10 +1614,12 @@ static s32 e1000_check_for_copper_link_i
 	 * different link partner.
 	 */
 	ret_val = e1000e_config_fc_after_link_up(hw);
-	if (ret_val)
+	if (ret_val) {
 		e_dbg("Error configuring flow control\n");
+		return ret_val;
+	}

-	return ret_val;
+	return 1;
 }

 static s32 e1000_get_variants_ich8lan(struct e1000_adapter *adapter)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Hansen dave.hansen@linux.intel.com
commit 01c9b17bf673b05bb401b76ec763e9730ccf1376 upstream.
Add some details about how PTI works, what some of the downsides are, and how to debug it when things go wrong.
Also document the kernel parameter: 'pti/nopti'.
Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Randy Dunlap rdunlap@infradead.org Reviewed-by: Kees Cook keescook@chromium.org Cc: Moritz Lipp moritz.lipp@iaik.tugraz.at Cc: Daniel Gruss daniel.gruss@iaik.tugraz.at Cc: Michael Schwarz michael.schwarz@iaik.tugraz.at Cc: Richard Fellner richard.fellner@student.tugraz.at Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Hugh Dickins hughd@google.com Cc: Andi Lutomirsky luto@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180105174436.1BC6FA2B@viggo.jf.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/kernel-parameters.txt | 21 ++-- Documentation/x86/pti.txt | 186 ++++++++++++++++++++++++++++++++++++ 2 files changed, 200 insertions(+), 7 deletions(-)
--- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2763,8 +2763,6 @@ bytes respectively. Such letter suffixes
nojitter [IA-64] Disables jitter checking for ITC timers.
- nopti [X86-64] Disable KAISER isolation of kernel from user. - no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page @@ -3327,11 +3325,20 @@ bytes respectively. Such letter suffixes pt. [PARIDE] See Documentation/blockdev/paride.txt.
- pti= [X86_64] - Control KAISER user/kernel address space isolation: - on - enable - off - disable - auto - default setting + pti= [X86_64] Control Page Table Isolation of user and + kernel address spaces. Disabling this feature + removes hardening, but improves performance of + system calls and interrupts. + + on - unconditionally enable + off - unconditionally disable + auto - kernel detects whether your CPU model is + vulnerable to issues that PTI mitigates + + Not specifying this option is equivalent to pti=auto. + + nopti [X86_64] + Equivalent to pti=off
pty.legacy_count= [KNL] Number of legacy pty's. Overwrites compiled-in --- /dev/null +++ b/Documentation/x86/pti.txt @@ -0,0 +1,186 @@ +Overview +======== + +Page Table Isolation (pti, previously known as KAISER[1]) is a +countermeasure against attacks on the shared user/kernel address +space such as the "Meltdown" approach[2]. + +To mitigate this class of attacks, we create an independent set of +page tables for use only when running userspace applications. When +the kernel is entered via syscalls, interrupts or exceptions, the +page tables are switched to the full "kernel" copy. When the system +switches back to user mode, the user copy is used again. + +The userspace page tables contain only a minimal amount of kernel +data: only what is needed to enter/exit the kernel such as the +entry/exit functions themselves and the interrupt descriptor table +(IDT). There are a few strictly unnecessary things that get mapped +such as the first C function when entering an interrupt (see +comments in pti.c). + +This approach helps to ensure that side-channel attacks leveraging +the paging structures do not function when PTI is enabled. It can be +enabled by setting CONFIG_PAGE_TABLE_ISOLATION=y at compile time. +Once enabled at compile-time, it can be disabled at boot with the +'nopti' or 'pti=' kernel parameters (see kernel-parameters.txt). + +Page Table Management +===================== + +When PTI is enabled, the kernel manages two sets of page tables. +The first set is very similar to the single set which is present in +kernels without PTI. This includes a complete mapping of userspace +that the kernel can use for things like copy_to_user(). + +Although _complete_, the user portion of the kernel page tables is +crippled by setting the NX bit in the top level. This ensures +that any missed kernel->user CR3 switch will immediately crash +userspace upon executing its first instruction. + +The userspace page tables map only the kernel data needed to enter +and exit the kernel. This data is entirely contained in the 'struct +cpu_entry_area' structure which is placed in the fixmap which gives +each CPU's copy of the area a compile-time-fixed virtual address. + +For new userspace mappings, the kernel makes the entries in its +page tables like normal. The only difference is when the kernel +makes entries in the top (PGD) level. In addition to setting the +entry in the main kernel PGD, a copy of the entry is made in the +userspace page tables' PGD. + +This sharing at the PGD level also inherently shares all the lower +layers of the page tables. This leaves a single, shared set of +userspace page tables to manage. One PTE to lock, one set of +accessed bits, dirty bits, etc... + +Overhead +======== + +Protection against side-channel attacks is important. But, +this protection comes at a cost: + +1. Increased Memory Use + a. Each process now needs an order-1 PGD instead of order-0. + (Consumes an additional 4k per process). + b. The 'cpu_entry_area' structure must be 2MB in size and 2MB + aligned so that it can be mapped by setting a single PMD + entry. This consumes nearly 2MB of RAM once the kernel + is decompressed, but no space in the kernel image itself. + +2. Runtime Cost + a. CR3 manipulation to switch between the page table copies + must be done at interrupt, syscall, and exception entry + and exit (it can be skipped when the kernel is interrupted, + though.) Moves to CR3 are on the order of a hundred + cycles, and are required at every entry and exit. + b. 
A "trampoline" must be used for SYSCALL entry. This + trampoline depends on a smaller set of resources than the + non-PTI SYSCALL entry code, so requires mapping fewer + things into the userspace page tables. The downside is + that stacks must be switched at entry time. + d. Global pages are disabled for all kernel structures not + mapped into both kernel and userspace page tables. This + feature of the MMU allows different processes to share TLB + entries mapping the kernel. Losing the feature means more + TLB misses after a context switch. The actual loss of + performance is very small, however, never exceeding 1%. + d. Process Context IDentifiers (PCID) is a CPU feature that + allows us to skip flushing the entire TLB when switching page + tables by setting a special bit in CR3 when the page tables + are changed. This makes switching the page tables (at context + switch, or kernel entry/exit) cheaper. But, on systems with + PCID support, the context switch code must flush both the user + and kernel entries out of the TLB. The user PCID TLB flush is + deferred until the exit to userspace, minimizing the cost. + See intel.com/sdm for the gory PCID/INVPCID details. + e. The userspace page tables must be populated for each new + process. Even without PTI, the shared kernel mappings + are created by copying top-level (PGD) entries into each + new process. But, with PTI, there are now *two* kernel + mappings: one in the kernel page tables that maps everything + and one for the entry/exit structures. At fork(), we need to + copy both. + f. In addition to the fork()-time copying, there must also + be an update to the userspace PGD any time a set_pgd() is done + on a PGD used to map userspace. This ensures that the kernel + and userspace copies always map the same userspace + memory. + g. On systems without PCID support, each CR3 write flushes + the entire TLB. That means that each syscall, interrupt + or exception flushes the TLB. + h. INVPCID is a TLB-flushing instruction which allows flushing + of TLB entries for non-current PCIDs. Some systems support + PCIDs, but do not support INVPCID. On these systems, addresses + can only be flushed from the TLB for the current PCID. When + flushing a kernel address, we need to flush all PCIDs, so a + single kernel address flush will require a TLB-flushing CR3 + write upon the next use of every PCID. + +Possible Future Work +==================== +1. We can be more careful about not actually writing to CR3 + unless its value is actually changed. +2. Allow PTI to be enabled/disabled at runtime in addition to the + boot-time switching. + +Testing +======== + +To test stability of PTI, the following test procedure is recommended, +ideally doing all of these in parallel: + +1. Set CONFIG_DEBUG_ENTRY=y +2. Run several copies of all of the tools/testing/selftests/x86/ tests + (excluding MPX and protection_keys) in a loop on multiple CPUs for + several minutes. These tests frequently uncover corner cases in the + kernel entry code. In general, old kernels might cause these tests + themselves to crash, but they should never crash the kernel. +3. Run the 'perf' tool in a mode (top or record) that generates many + frequent performance monitoring non-maskable interrupts (see "NMI" + in /proc/interrupts). This exercises the NMI entry/exit code which + is known to trigger bugs in code paths that did not expect to be + interrupted, including nested NMIs. 
Using "-c" boosts the rate of + NMIs, and using two -c with separate counters encourages nested NMIs + and less deterministic behavior. + + while true; do perf record -c 10000 -e instructions,cycles -a sleep 10; done + +4. Launch a KVM virtual machine. +5. Run 32-bit binaries on systems supporting the SYSCALL instruction. + This has been a lightly-tested code path and needs extra scrutiny. + +Debugging +========= + +Bugs in PTI cause a few different signatures of crashes +that are worth noting here. + + * Failures of the selftests/x86 code. Usually a bug in one of the + more obscure corners of entry_64.S + * Crashes in early boot, especially around CPU bringup. Bugs + in the trampoline code or mappings cause these. + * Crashes at the first interrupt. Caused by bugs in entry_64.S, + like screwing up a page table switch. Also caused by + incorrectly mapping the IRQ handler entry code. + * Crashes at the first NMI. The NMI code is separate from main + interrupt handlers and can have bugs that do not affect + normal interrupts. Also caused by incorrectly mapping NMI + code. NMIs that interrupt the entry code must be very + careful and can be the cause of crashes that show up when + running perf. + * Kernel crashes at the first exit to userspace. entry_64.S + bugs, or failing to map some of the exit code. + * Crashes at first interrupt that interrupts userspace. The paths + in entry_64.S that return to userspace are sometimes separate + from the ones that return to the kernel. + * Double faults: overflowing the kernel stack because of page + faults upon page faults. Caused by touching non-pti-mapped + data in the entry code, or forgetting to switch to kernel + CR3 before calling into C functions which are not pti-mapped. + * Userspace segfaults early in boot, sometimes manifesting + as mount(8) failing to mount the rootfs. These have + tended to be TLB invalidation issues. Usually invalidating + the wrong PCID, or otherwise missing an invalidation. + +1. https://gruss.cc/files/kaiser.pdf +2. https://meltdownattack.com/meltdown.pdf
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 8bf1ebca215c262e48c15a4a15f175991776f57f upstream.
There are multiple call sites that apply forced CPU caps. Factor them into a helper.
Signed-off-by: Andy Lutomirski luto@kernel.org Reviewed-by: Borislav Petkov bp@suse.de Cc: Borislav Petkov bp@alien8.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: H. Peter Anvin hpa@zytor.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Matthew Whitehead tedheadster@gmail.com Cc: Oleg Nesterov oleg@redhat.com Cc: One Thousand Gnomes gnomes@lxorguk.ukuu.org.uk Cc: Peter Zijlstra peterz@infradead.org Cc: Rik van Riel riel@redhat.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Yu-cheng Yu yu-cheng.yu@intel.com Link: http://lkml.kernel.org/r/623ff7555488122143e4417de09b18be2085ad06.1484705016... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/common.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
--- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -706,6 +706,16 @@ void cpu_detect(struct cpuinfo_x86 *c) } }
+static void apply_forced_caps(struct cpuinfo_x86 *c) +{ + int i; + + for (i = 0; i < NCAPINTS; i++) { + c->x86_capability[i] &= ~cpu_caps_cleared[i]; + c->x86_capability[i] |= cpu_caps_set[i]; + } +} + void get_cpu_cap(struct cpuinfo_x86 *c) { u32 eax, ebx, ecx, edx; @@ -1086,10 +1096,7 @@ static void identify_cpu(struct cpuinfo_ this_cpu->c_identify(c);
/* Clear/Set all flags overridden by options, after probe */ - for (i = 0; i < NCAPINTS; i++) { - c->x86_capability[i] &= ~cpu_caps_cleared[i]; - c->x86_capability[i] |= cpu_caps_set[i]; - } + apply_forced_caps(c);
#ifdef CONFIG_X86_64 c->apicid = apic->phys_pkg_id(c->initial_apicid, 0); @@ -1151,10 +1158,7 @@ static void identify_cpu(struct cpuinfo_ * Clear/Set all flags overridden by options, need do it * before following smp all cpus cap AND. */ - for (i = 0; i < NCAPINTS; i++) { - c->x86_capability[i] &= ~cpu_caps_cleared[i]; - c->x86_capability[i] |= cpu_caps_set[i]; - } + apply_forced_caps(c);
/* * On SMP, boot_cpu_data holds the common feature set between
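[ Aside: the ordering inside the new helper matters.  Forced-clear bits are
stripped first and forced-set bits are OR-ed in afterwards, so a bit present
in both arrays ends up set.  A toy model with made-up mask values:

#include <stdio.h>

int main(void)
{
	unsigned int cap = 0x0f, cleared = 0x06, set = 0x34;	/* bit 2 in both */

	cap &= ~cleared;	/* 0x09: forced-clear applied */
	cap |= set;		/* 0x3d: forced-set wins for bit 2 */
	printf("0x%02x\n", cap);
	return 0;
}
]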
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 6cbd2171e89b13377261d15e64384df60ecb530e upstream.
There is currently no way to force CPU bug bits like CPU feature bits. That makes it impossible to set a bug bit once at boot and have it stick for all upcoming CPUs.
Extend the force set/clear arrays to handle bug bits as well.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Borislav Petkov bp@suse.de Cc: Andy Lutomirski luto@kernel.org Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Borislav Petkov bp@alien8.de Cc: Borislav Petkov bpetkov@suse.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@intel.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: David Laight David.Laight@aculab.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: Eduardo Valentin eduval@amazon.com Cc: Greg KH gregkh@linuxfoundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Juergen Gross jgross@suse.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Rik van Riel riel@redhat.com Cc: Will Deacon will.deacon@arm.com Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.992156574@linutronix.de Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/cpufeature.h | 2 ++ arch/x86/include/asm/processor.h | 4 ++-- arch/x86/kernel/cpu/common.c | 6 +++--- 3 files changed, 7 insertions(+), 5 deletions(-)
--- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -135,6 +135,8 @@ extern const char * const x86_bug_flags[ set_bit(bit, (unsigned long *)cpu_caps_set); \ } while (0)
+#define setup_force_cpu_bug(bit) setup_force_cpu_cap(bit) + #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_X86_FAST_FEATURE_TESTS) /* * Static testing of CPU features. Used the same as boot_cpu_has(). --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -156,8 +156,8 @@ extern struct cpuinfo_x86 boot_cpu_data; extern struct cpuinfo_x86 new_cpu_data;
extern struct tss_struct doublefault_tss; -extern __u32 cpu_caps_cleared[NCAPINTS]; -extern __u32 cpu_caps_set[NCAPINTS]; +extern __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS]; +extern __u32 cpu_caps_set[NCAPINTS + NBUGINTS];
#ifdef CONFIG_SMP DECLARE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info); --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -480,8 +480,8 @@ static const char *table_lookup_model(st return NULL; /* Not found */ }
-__u32 cpu_caps_cleared[NCAPINTS]; -__u32 cpu_caps_set[NCAPINTS]; +__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS]; +__u32 cpu_caps_set[NCAPINTS + NBUGINTS];
void load_percpu_segment(int cpu) { @@ -710,7 +710,7 @@ static void apply_forced_caps(struct cpu { int i;
- for (i = 0; i < NCAPINTS; i++) { + for (i = 0; i < NCAPINTS + NBUGINTS; i++) { c->x86_capability[i] &= ~cpu_caps_cleared[i]; c->x86_capability[i] |= cpu_caps_set[i]; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit a89f040fa34ec9cd682aed98b8f04e3c47d998bd upstream.
Many x86 CPUs leak information to user space due to missing isolation of user space and kernel space page tables. There are many well documented ways to exploit that.
The upcoming software mitigation of isolating the user and kernel space page tables needs a misfeature flag so code can be made runtime conditional.
Add the BUG bits which indicate that the CPU is affected and add a feature bit which indicates that the software mitigation is enabled.
Assume for now that _ALL_ x86 CPUs are affected by this. Exceptions can be made later.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Andy Lutomirski luto@kernel.org Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Borislav Petkov bp@alien8.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: David Laight David.Laight@aculab.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: Eduardo Valentin eduval@amazon.com Cc: Greg KH gregkh@linuxfoundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Juergen Gross jgross@suse.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Will Deacon will.deacon@arm.com Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/common.c | 4 ++++ 2 files changed, 5 insertions(+)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -316,5 +316,6 @@ #define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ #define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ #define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ +#define X86_BUG_CPU_INSECURE X86_BUG(14) /* CPU is insecure and needs kernel page table isolation */
#endif /* _ASM_X86_CPUFEATURES_H */ --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -882,6 +882,10 @@ static void __init early_identify_cpu(st }
setup_force_cpu_cap(X86_FEATURE_ALWAYS); + + /* Assume for now that ALL x86 CPUs are insecure */ + setup_force_cpu_bug(X86_BUG_CPU_INSECURE); + fpu__init_system(c); }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit de791821c295cc61419a06fe5562288417d1bc58 upstream.
Use the name associated with the particular attack which needs page table isolation for mitigation.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: David Woodhouse dwmw@amazon.co.uk Cc: Alan Cox gnomes@lxorguk.ukuu.org.uk Cc: Jiri Koshina jikos@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Lutomirski luto@amacapital.net Cc: Andi Kleen ak@linux.intel.com Cc: Peter Zijlstra peterz@infradead.org Cc: Paul Turner pjt@google.com Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Greg KH gregkh@linux-foundation.org Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801051525300.1724@nanos Signed-off-by: Razvan Ghitulete rga@amazon.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/cpufeatures.h | 2 +- arch/x86/kernel/cpu/common.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -316,6 +316,6 @@ #define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ #define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ #define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ -#define X86_BUG_CPU_INSECURE X86_BUG(14) /* CPU is insecure and needs kernel page table isolation */ +#define X86_BUG_CPU_MELTDOWN X86_BUG(14) /* CPU is affected by meltdown attack and needs kernel page table isolation */
#endif /* _ASM_X86_CPUFEATURES_H */ --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -884,7 +884,7 @@ static void __init early_identify_cpu(st setup_force_cpu_cap(X86_FEATURE_ALWAYS);
/* Assume for now that ALL x86 CPUs are insecure */ - setup_force_cpu_bug(X86_BUG_CPU_INSECURE); + setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
fpu__init_system(c); }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Woodhouse dwmw@amazon.co.uk
commit 99c6fa2511d8a683e61468be91b83f85452115fa upstream.
Add the bug bits for spectre v1/2 and force them unconditionally for all cpus.
Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: Andi Kleen ak@linux.intel.com Cc: Peter Zijlstra peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1515239374-23361-2-git-send-email-dwmw@amazon.co.u... Signed-off-by: Razvan Ghitulete rga@amazon.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/cpufeatures.h | 2 ++ arch/x86/kernel/cpu/common.c | 3 +++ 2 files changed, 5 insertions(+)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -317,5 +317,7 @@ #define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ #define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ #define X86_BUG_CPU_MELTDOWN X86_BUG(14) /* CPU is affected by meltdown attack and needs kernel page table isolation */ +#define X86_BUG_SPECTRE_V1 X86_BUG(15) /* CPU is affected by Spectre variant 1 attack with conditional branches */ +#define X86_BUG_SPECTRE_V2 X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */
#endif /* _ASM_X86_CPUFEATURES_H */ --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -886,6 +886,9 @@ static void __init early_identify_cpu(st /* Assume for now that ALL x86 CPUs are insecure */ setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
+ setup_force_cpu_bug(X86_BUG_SPECTRE_V1); + setup_force_cpu_bug(X86_BUG_SPECTRE_V2); + fpu__init_system(c); }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov bp@suse.de
commit 62a67e123e058a67db58bc6a14354dd037bafd0a upstream.
Should be easier when following boot paths. It is probably a leftover from the x86 unification eons ago.
No functionality change.
Signed-off-by: Borislav Petkov bp@suse.de Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Brian Gerst brgerst@gmail.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: H. Peter Anvin hpa@zytor.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20161024173844.23038-3-bp@alien8.de Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Razvan Ghitulete rga@amazon.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/Makefile | 4 +--- arch/x86/kernel/cpu/bugs.c | 26 ++++++++++++++++++++++---- arch/x86/kernel/cpu/bugs_64.c | 33 --------------------------------- 3 files changed, 23 insertions(+), 40 deletions(-) delete mode 100644 arch/x86/kernel/cpu/bugs_64.c
--- a/arch/x86/kernel/cpu/Makefile +++ b/arch/x86/kernel/cpu/Makefile @@ -20,13 +20,11 @@ obj-y := intel_cacheinfo.o scattered.o obj-y += common.o obj-y += rdrand.o obj-y += match.o +obj-y += bugs.o
obj-$(CONFIG_PROC_FS) += proc.o obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
-obj-$(CONFIG_X86_32) += bugs.o -obj-$(CONFIG_X86_64) += bugs_64.o - obj-$(CONFIG_CPU_SUP_INTEL) += intel.o obj-$(CONFIG_CPU_SUP_AMD) += amd.o obj-$(CONFIG_CPU_SUP_CYRIX_32) += cyrix.o --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -16,6 +16,8 @@ #include <asm/msr.h> #include <asm/paravirt.h> #include <asm/alternative.h> +#include <asm/pgtable.h> +#include <asm/cacheflush.h>
void __init check_bugs(void) { @@ -28,11 +30,13 @@ void __init check_bugs(void) #endif
identify_boot_cpu(); -#ifndef CONFIG_SMP - pr_info("CPU: "); - print_cpu_info(&boot_cpu_data); -#endif
+ if (!IS_ENABLED(CONFIG_SMP)) { + pr_info("CPU: "); + print_cpu_info(&boot_cpu_data); + } + +#ifdef CONFIG_X86_32 /* * Check whether we are able to run this kernel safely on SMP. * @@ -48,4 +52,18 @@ void __init check_bugs(void) alternative_instructions();
fpu__init_check_bugs(); +#else /* CONFIG_X86_64 */ + alternative_instructions(); + + /* + * Make sure the first 2MB area is not mapped by huge pages + * There are typically fixed size MTRRs in there and overlapping + * MTRRs into large pages causes slow downs. + * + * Right now we don't do that with gbpages because there seems + * very little benefit for that case. + */ + if (!direct_gbpages) + set_memory_4k((unsigned long)__va(0), 1); +#endif } --- a/arch/x86/kernel/cpu/bugs_64.c +++ /dev/null @@ -1,33 +0,0 @@ -/* - * Copyright (C) 1994 Linus Torvalds - * Copyright (C) 2000 SuSE - */ - -#include <linux/kernel.h> -#include <linux/init.h> -#include <asm/alternative.h> -#include <asm/bugs.h> -#include <asm/processor.h> -#include <asm/mtrr.h> -#include <asm/cacheflush.h> - -void __init check_bugs(void) -{ - identify_boot_cpu(); -#if !defined(CONFIG_SMP) - pr_info("CPU: "); - print_cpu_info(&boot_cpu_data); -#endif - alternative_instructions(); - - /* - * Make sure the first 2MB area is not mapped by huge pages - * There are typically fixed size MTRRs in there and overlapping - * MTRRs into large pages causes slow downs. - * - * Right now we don't do that with gbpages because there seems - * very little benefit for that case. - */ - if (!direct_gbpages) - set_memory_4k((unsigned long)__va(0), 1); -}
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 87590ce6e373d1a5401f6539f0c59ef92dd924a9 upstream.
As the meltdown/spectre problem affects several CPU architectures, it makes sense to have a common way to express whether a system is affected by a particular vulnerability or not. If affected, the way to express the mitigation should be common as well.
Create /sys/devices/system/cpu/vulnerabilities folder and files for meltdown, spectre_v1 and spectre_v2.
Allow architectures to override the show function.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Cc: Peter Zijlstra peterz@infradead.org Cc: Will Deacon will.deacon@arm.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linuxfoundation.org Cc: Borislav Petkov bp@alien8.de Cc: David Woodhouse dwmw@amazon.co.uk Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/ABI/testing/sysfs-devices-system-cpu | 16 +++++++ drivers/base/Kconfig | 3 + drivers/base/cpu.c | 48 +++++++++++++++++++++ include/linux/cpu.h | 7 +++ 4 files changed, 74 insertions(+)
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -350,3 +350,19 @@ Contact: Linux ARM Kernel Mailing list < Description: AArch64 CPU registers 'identification' directory exposes the CPU ID registers for identifying model and revision of the CPU. + +What: /sys/devices/system/cpu/vulnerabilities + /sys/devices/system/cpu/vulnerabilities/meltdown + /sys/devices/system/cpu/vulnerabilities/spectre_v1 + /sys/devices/system/cpu/vulnerabilities/spectre_v2 +Date: Januar 2018 +Contact: Linux kernel mailing list linux-kernel@vger.kernel.org +Description: Information about CPU vulnerabilities + + The files are named after the code names of CPU + vulnerabilities. The output of those files reflects the + state of the CPUs in the system. Possible output values: + + "Not affected" CPU is not affected by the vulnerability + "Vulnerable" CPU is affected and no mitigation in effect + "Mitigation: $M" CPU is affetcted and mitigation $M is in effect --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -235,6 +235,9 @@ config GENERIC_CPU_DEVICES config GENERIC_CPU_AUTOPROBE bool
+config GENERIC_CPU_VULNERABILITIES + bool + config SOC_BUS bool
--- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -499,10 +499,58 @@ static void __init cpu_dev_register_gene #endif }
+#ifdef CONFIG_GENERIC_CPU_VULNERABILITIES + +ssize_t __weak cpu_show_meltdown(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "Not affected\n"); +} + +ssize_t __weak cpu_show_spectre_v1(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "Not affected\n"); +} + +ssize_t __weak cpu_show_spectre_v2(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "Not affected\n"); +} + +static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL); +static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL); +static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL); + +static struct attribute *cpu_root_vulnerabilities_attrs[] = { + &dev_attr_meltdown.attr, + &dev_attr_spectre_v1.attr, + &dev_attr_spectre_v2.attr, + NULL +}; + +static const struct attribute_group cpu_root_vulnerabilities_group = { + .name = "vulnerabilities", + .attrs = cpu_root_vulnerabilities_attrs, +}; + +static void __init cpu_register_vulnerabilities(void) +{ + if (sysfs_create_group(&cpu_subsys.dev_root->kobj, + &cpu_root_vulnerabilities_group)) + pr_err("Unable to register CPU vulnerabilities\n"); +} + +#else +static inline void cpu_register_vulnerabilities(void) { } +#endif + void __init cpu_dev_init(void) { if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups)) panic("Failed to register CPU subsystem");
cpu_dev_register_generic(); + cpu_register_vulnerabilities(); } --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -44,6 +44,13 @@ extern void cpu_remove_dev_attr(struct d extern int cpu_add_dev_attr_group(struct attribute_group *attrs); extern void cpu_remove_dev_attr_group(struct attribute_group *attrs);
+extern ssize_t cpu_show_meltdown(struct device *dev, + struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_spectre_v1(struct device *dev, + struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_spectre_v2(struct device *dev, + struct device_attribute *attr, char *buf); + extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata, const struct attribute_group **groups,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 61dc0f555b5c761cdafb0ba5bd41ecf22d68a4c4 upstream.
Implement the CPU vulnerability show functions for meltdown, spectre_v1 and spectre_v2.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Reviewed-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Cc: Peter Zijlstra peterz@infradead.org Cc: Will Deacon will.deacon@arm.com Cc: Dave Hansen dave.hansen@intel.com Cc: Linus Torvalds torvalds@linuxfoundation.org Cc: Borislav Petkov bp@alien8.de Cc: David Woodhouse dwmw@amazon.co.uk Link: https://lkml.kernel.org/r/20180107214913.177414879@linutronix.de Signed-off-by: Razvan Ghitulete rga@amazon.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/Kconfig | 1 + arch/x86/kernel/cpu/bugs.c | 29 +++++++++++++++++++++++++++++ 2 files changed, 30 insertions(+)
--- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -64,6 +64,7 @@ config X86 select GENERIC_CLOCKEVENTS_MIN_ADJUST select GENERIC_CMOS_UPDATE select GENERIC_CPU_AUTOPROBE + select GENERIC_CPU_VULNERABILITIES select GENERIC_EARLY_IOREMAP select GENERIC_FIND_FIRST_BIT select GENERIC_IOMAP --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -9,6 +9,7 @@ */ #include <linux/init.h> #include <linux/utsname.h> +#include <linux/cpu.h> #include <asm/bugs.h> #include <asm/processor.h> #include <asm/processor-flags.h> @@ -67,3 +68,31 @@ void __init check_bugs(void) set_memory_4k((unsigned long)__va(0), 1); #endif } + +#ifdef CONFIG_SYSFS +ssize_t cpu_show_meltdown(struct device *dev, + struct device_attribute *attr, char *buf) +{ + if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN)) + return sprintf(buf, "Not affected\n"); + if (boot_cpu_has(X86_FEATURE_KAISER)) + return sprintf(buf, "Mitigation: PTI\n"); + return sprintf(buf, "Vulnerable\n"); +} + +ssize_t cpu_show_spectre_v1(struct device *dev, + struct device_attribute *attr, char *buf) +{ + if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1)) + return sprintf(buf, "Not affected\n"); + return sprintf(buf, "Vulnerable\n"); +} + +ssize_t cpu_show_spectre_v2(struct device *dev, + struct device_attribute *attr, char *buf) +{ + if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) + return sprintf(buf, "Not affected\n"); + return sprintf(buf, "Vulnerable\n"); +} +#endif
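[ Aside: with this patch and the generic folder from the previous one
applied, the new ABI can be read back directly.  A minimal userspace
reader; the three paths come from the patches themselves, the rest is
plain C:

#include <stdio.h>

int main(void)
{
	static const char *files[] = {
		"/sys/devices/system/cpu/vulnerabilities/meltdown",
		"/sys/devices/system/cpu/vulnerabilities/spectre_v1",
		"/sys/devices/system/cpu/vulnerabilities/spectre_v2",
	};
	char line[128];
	int i;

	for (i = 0; i < 3; i++) {
		FILE *f = fopen(files[i], "r");

		if (f && fgets(line, sizeof(line), f))
			printf("%s: %s", files[i], line);
		else
			printf("%s: <unavailable>\n", files[i]);
		if (f)
			fclose(f);
	}
	return 0;
}
]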
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Lendacky thomas.lendacky@amd.com
commit e4d0e84e490790798691aaa0f2e598637f1867ec upstream.
To aid in speculation control, make LFENCE a serializing instruction since it has less overhead than MFENCE. This is done by setting bit 1 of MSR 0xc0011029 (DE_CFG). Some families that support LFENCE do not have this MSR. For these families, the LFENCE instruction is already serializing.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Borislav Petkov bp@suse.de Cc: Peter Zijlstra peterz@infradead.org Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Borislav Petkov bp@alien8.de Cc: Dan Williams dan.j.williams@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: David Woodhouse dwmw@amazon.co.uk Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/20180108220921.12580.71694.stgit@tlendack-t1.amdof... Signed-off-by: Razvan Ghitulete rga@amazon.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/msr-index.h | 2 ++ arch/x86/kernel/cpu/amd.c | 10 ++++++++++ 2 files changed, 12 insertions(+)
--- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -330,6 +330,8 @@ #define FAM10H_MMIO_CONF_BASE_MASK 0xfffffffULL #define FAM10H_MMIO_CONF_BASE_SHIFT 20 #define MSR_FAM10H_NODE_ID 0xc001100c +#define MSR_F10H_DECFG 0xc0011029 +#define MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT 1
/* K8 MSRs */ #define MSR_K8_TOP_MEM1 0xc001001a --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -782,6 +782,16 @@ static void init_amd(struct cpuinfo_x86 set_cpu_cap(c, X86_FEATURE_K8);
if (cpu_has(c, X86_FEATURE_XMM2)) { + /* + * A serializing LFENCE has less overhead than MFENCE, so + * use it for execution serialization. On families which + * don't have that MSR, LFENCE is already serializing. + * msr_set_bit() uses the safe accessors, too, even if the MSR + * is not present. + */ + msr_set_bit(MSR_F10H_DECFG, + MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT); + /* MFENCE stops RDTSC speculation */ set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC); }
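[ Aside: whether the DE_CFG write actually stuck can be verified from
userspace through the msr driver.  A sketch assuming an AMD system, the
msr module loaded (modprobe msr) and root privileges; the driver takes the
MSR number as the file offset:

#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

int main(void)
{
	uint64_t val;
	int fd = open("/dev/cpu/0/msr", O_RDONLY);

	if (fd < 0 || pread(fd, &val, sizeof(val), 0xc0011029) != sizeof(val)) {
		perror("MSR_F10H_DECFG");
		return 1;
	}
	printf("DE_CFG=%#jx, LFENCE serializing: %s\n",
	       (uintmax_t)val, (val & 2) ? "yes" : "no");	/* bit 1 */
	return 0;
}

The next patch in the series performs the equivalent read-back check in
the kernel itself via rdmsrl_safe(). ]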
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Lendacky thomas.lendacky@amd.com
commit 9c6a73c75864ad9fa49e5fa6513e4c4071c0e29f upstream.
With LFENCE now a serializing instruction, use LFENCE_RDTSC in preference to MFENCE_RDTSC. However, since the kernel could be running under a hypervisor that does not support writing that MSR, read the MSR back and verify that the bit has been set successfully. If the MSR can be read and the bit is set, then set the LFENCE_RDTSC feature, otherwise set the MFENCE_RDTSC feature.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Borislav Petkov bp@suse.de Cc: Peter Zijlstra peterz@infradead.org Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Borislav Petkov bp@alien8.de Cc: Dan Williams dan.j.williams@intel.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: David Woodhouse dwmw@amazon.co.uk Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/20180108220932.12580.52458.stgit@tlendack-t1.amdof... Signed-off-by: Razvan Ghitulete rga@amazon.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/msr-index.h | 1 + arch/x86/kernel/cpu/amd.c | 18 ++++++++++++++++-- 2 files changed, 17 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -332,6 +332,7 @@ #define MSR_FAM10H_NODE_ID 0xc001100c #define MSR_F10H_DECFG 0xc0011029 #define MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT 1 +#define MSR_F10H_DECFG_LFENCE_SERIALIZE BIT_ULL(MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT)
/* K8 MSRs */ #define MSR_K8_TOP_MEM1 0xc001001a --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -782,6 +782,9 @@ static void init_amd(struct cpuinfo_x86 set_cpu_cap(c, X86_FEATURE_K8);
if (cpu_has(c, X86_FEATURE_XMM2)) { + unsigned long long val; + int ret; + /* * A serializing LFENCE has less overhead than MFENCE, so * use it for execution serialization. On families which @@ -792,8 +795,19 @@ static void init_amd(struct cpuinfo_x86 msr_set_bit(MSR_F10H_DECFG, MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT);
- /* MFENCE stops RDTSC speculation */ - set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC); + /* + * Verify that the MSR write was successful (could be running + * under a hypervisor) and only then assume that LFENCE is + * serializing. + */ + ret = rdmsrl_safe(MSR_F10H_DECFG, &val); + if (!ret && (val & MSR_F10H_DECFG_LFENCE_SERIALIZE)) { + /* A serializing LFENCE stops RDTSC speculation */ + set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC); + } else { + /* MFENCE stops RDTSC speculation */ + set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC); + } }
/*
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Woodhouse dwmw@amazon.co.uk
commit 9ecccfaa7cb5249bd31bdceb93fcf5bedb8a24d8 upstream.
Fixes: 87590ce6e ("sysfs/cpu: Add vulnerability folder") Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/ABI/testing/sysfs-devices-system-cpu | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -355,7 +355,7 @@ What: /sys/devices/system/cpu/vulnerabi /sys/devices/system/cpu/vulnerabilities/meltdown /sys/devices/system/cpu/vulnerabilities/spectre_v1 /sys/devices/system/cpu/vulnerabilities/spectre_v2 -Date: Januar 2018 +Date: January 2018 Contact: Linux kernel mailing list linux-kernel@vger.kernel.org Description: Information about CPU vulnerabilities
@@ -365,4 +365,4 @@ Description: Information about CPU vulne
"Not affected" CPU is not affected by the vulnerability "Vulnerable" CPU is affected and no mitigation in effect - "Mitigation: $M" CPU is affetcted and mitigation $M is in effect + "Mitigation: $M" CPU is affected and mitigation $M is in effect
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov bp@suse.de
commit 612e8e9350fd19cae6900cf36ea0c6892d1a0dca upstream.
The alternatives code checks only whether the first byte is a NOP; with NOPs in front of the payload and actual instructions after them, the "optimized" test breaks.
Make sure to scan all bytes before deciding to optimize the NOPs in there.
Reported-by: David Woodhouse dwmw2@infradead.org Signed-off-by: Borislav Petkov bp@suse.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Andi Kleen ak@linux.intel.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Peter Zijlstra peterz@infradead.org Cc: Jiri Kosina jikos@kernel.org Cc: Dave Hansen dave.hansen@intel.com Cc: Andi Kleen andi@firstfloor.org Cc: Andrew Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/20180110112815.mgciyf5acwacphkq@pd.tnic Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/alternative.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -340,9 +340,12 @@ done: static void __init_or_module optimize_nops(struct alt_instr *a, u8 *instr) { unsigned long flags; + int i;
- if (instr[0] != 0x90) - return; + for (i = 0; i < a->padlen; i++) { + if (instr[i] != 0x90) + return; + }
local_irq_save(flags); add_nops(instr + (a->instrlen - a->padlen), a->padlen);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Woodhouse dwmw@amazon.co.uk
commit b9e705ef7cfaf22db0daab91ad3cd33b0fa32eb9 upstream.
Where an ALTERNATIVE is used in the middle of an inline asm block, this would otherwise lead to the following instruction being appended directly to the trailing ".popsection", and a failed compile.
Fixes: 9cebed423c84 ("x86, alternative: Use .pushsection/.popsection") Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: ak@linux.intel.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Peter Zijlstra peterz@infradead.org Cc: Paul Turner pjt@google.com Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180104143710.8961-8-dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/alternative.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -139,7 +139,7 @@ static inline int alternatives_text_rese ".popsection\n" \ ".pushsection .altinstr_replacement, \"ax\"\n" \ ALTINSTR_REPLACEMENT(newinstr, feature, 1) \ - ".popsection" + ".popsection\n"
#define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\ OLDINSTR_2(oldinstr, 1, 2) \ @@ -150,7 +150,7 @@ static inline int alternatives_text_rese ".pushsection .altinstr_replacement, \"ax\"\n" \ ALTINSTR_REPLACEMENT(newinstr1, feature1, 1) \ ALTINSTR_REPLACEMENT(newinstr2, feature2, 2) \ - ".popsection" + ".popsection\n"
/* * Alternative instructions for different CPU types or capabilities.
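[ Aside: the failure mode is easiest to see with inline asm that continues
after the macro.  A kernel-context sketch, not taken from the tree; the
string literals are concatenated, so without the trailing newline the
assembler received ".popsectionpause" on a single line and errored out:

static inline void demo(void)
{
	/* ALTERNATIVE() now ends in ".popsection\n", so "pause"
	 * lands on its own line after concatenation. */
	asm volatile(ALTERNATIVE("nop", "lfence", X86_FEATURE_LFENCE_RDTSC)
		     "pause");
}
]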
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit b8b7abaed7a49b350f8ba659ddc264b04931d581 upstream.
Otherwise we might have the PCID feature bit set during cpu_init().
This is just for robustness. I haven't seen any actual bugs here.
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bpetkov@suse.de Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Fixes: cba4671af755 ("x86/mm: Disable PCID on 32-bit kernels") Link: http://lkml.kernel.org/r/b16dae9d6b0db5d9801ddbebbfd83384097c61f3.1505663533... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 8 -------- arch/x86/kernel/cpu/common.c | 8 ++++++++ 2 files changed, 8 insertions(+), 8 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -22,14 +22,6 @@
void __init check_bugs(void) { -#ifdef CONFIG_X86_32 - /* - * Regardless of whether PCID is enumerated, the SDM says - * that it can't be enabled in 32-bit mode. - */ - setup_clear_cpu_cap(X86_FEATURE_PCID); -#endif - identify_boot_cpu();
if (!IS_ENABLED(CONFIG_SMP)) { --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -890,6 +890,14 @@ static void __init early_identify_cpu(st setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
fpu__init_system(c); + +#ifdef CONFIG_X86_32 + /* + * Regardless of whether PCID is enumerated, the SDM says + * that it can't be enabled in 32-bit mode. + */ + setup_clear_cpu_cap(X86_FEATURE_PCID); +#endif }
void __init early_cpu_init(void)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit e390f9a9689a42f477a6073e2e7df530a4c1b740 upstream.
The '__unreachable' and '__func_stack_frame_non_standard' sections are only used at compile time. They're discarded for vmlinux but they should also be discarded for modules.
Since this is a recurring pattern, prefix the section names with ".discard.". It's a nice convention and vmlinux.lds.h already discards such sections.
Also remove the 'a' (allocatable) flag from the __unreachable section since it doesn't make sense for a discarded section.
Suggested-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Cc: Jessica Yu jeyu@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Fixes: d1091c7fa3d5 ("objtool: Improve detection of BUG() and other dead ends") Link: http://lkml.kernel.org/r/20170301180444.lhd53c5tibc4ns77@treble Signed-off-by: Ingo Molnar mingo@kernel.org [dwmw2: Remove the unreachable part in backporting since it's not here yet] Signed-off-by: David Woodhouse dwmw@amazon.co.ku Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/frame.h | 2 +- scripts/mod/modpost.c | 1 + scripts/module-common.lds | 5 ++++- tools/objtool/builtin-check.c | 2 +- 4 files changed, 7 insertions(+), 3 deletions(-)
--- a/include/linux/frame.h +++ b/include/linux/frame.h @@ -11,7 +11,7 @@ * For more information, see tools/objtool/Documentation/stack-validation.txt. */ #define STACK_FRAME_NON_STANDARD(func) \ - static void __used __section(__func_stack_frame_non_standard) \ + static void __used __section(.discard.func_stack_frame_non_standard) \ *__func_stack_frame_non_standard_##func = func
#else /* !CONFIG_STACK_VALIDATION */ --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -838,6 +838,7 @@ static const char *const section_white_l ".cmem*", /* EZchip */ ".fmt_slot*", /* EZchip */ ".gnu.lto*", + ".discard.*", NULL };
--- a/scripts/module-common.lds +++ b/scripts/module-common.lds @@ -4,7 +4,10 @@ * combine them automatically. */ SECTIONS { - /DISCARD/ : { *(.discard) } + /DISCARD/ : { + *(.discard) + *(.discard.*) + }
__ksymtab 0 : { *(SORT(___ksymtab+*)) } __ksymtab_gpl 0 : { *(SORT(___ksymtab_gpl+*)) } --- a/tools/objtool/builtin-check.c +++ b/tools/objtool/builtin-check.c @@ -1229,7 +1229,7 @@ int cmd_check(int argc, const char **arg
INIT_LIST_HEAD(&file.insn_list); hash_init(file.insn_hash); - file.whitelist = find_section_by_name(file.elf, "__func_stack_frame_non_standard"); + file.whitelist = find_section_by_name(file.elf, ".discard.func_stack_frame_non_standard"); file.rodata = find_section_by_name(file.elf, ".rodata"); file.ignore_unreachables = false; file.c_file = find_section_by_name(file.elf, ".comment");
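[ Aside: for readers unfamiliar with the marker being moved, this is its
usage pattern; the function here is hypothetical.  After this patch the
generated pointer lives in a ".discard.*" section, which vmlinux and now
modules drop at link time:

#include <linux/frame.h>

static void my_weird_trampoline(void)
{
	/* hand-written asm with a modified stack frame that
	 * objtool cannot validate would live here */
}
STACK_FRAME_NON_STANDARD(my_weird_trampoline);
]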
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit 39b735332cb8b33a27c28592d969e4016c86c3ea upstream.
A direct jump to a retpoline thunk is really an indirect jump in disguise. Change the objtool instruction type accordingly.
Objtool needs to know where indirect branches are so it can detect switch statement jump tables.
This fixes a bunch of warnings with CONFIG_RETPOLINE like:
arch/x86/events/intel/uncore_nhmex.o: warning: objtool: nhmex_rbox_msr_enable_event()+0x44: sibling call from callable instruction with modified stack frame kernel/signal.o: warning: objtool: copy_siginfo_to_user()+0x91: sibling call from callable instruction with modified stack frame ...
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: Andi Kleen ak@linux.intel.com Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/1515707194-20531-2-git-send-email-dwmw@amazon.co.u... [dwmw2: Applies to tools/objtool/builtin-check.c not check.c] Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/objtool/builtin-check.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/tools/objtool/builtin-check.c +++ b/tools/objtool/builtin-check.c @@ -382,6 +382,13 @@ static int add_jump_destinations(struct } else if (rela->sym->sec->idx) { dest_sec = rela->sym->sec; dest_off = rela->sym->sym.st_value + rela->addend + 4; + } else if (strstr(rela->sym->name, "_indirect_thunk_")) { + /* + * Retpoline jumps are really dynamic jumps in + * disguise, so convert them accordingly. + */ + insn->type = INSN_JUMP_DYNAMIC; + continue; } else { /* sibling call */ insn->jump_dest = 0;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josh Poimboeuf jpoimboe@redhat.com
commit 258c76059cece01bebae098e81bacb1af2edad17 upstream.
Getting objtool to understand retpolines is going to be a bit of a challenge. For now, take advantage of the fact that retpolines are patched in with alternatives. Just read the original (sane) non-alternative instruction, and ignore the patched-in retpoline.
This allows objtool to understand the control flow *around* the retpoline, even if it can't yet follow what's inside. This means the ORC unwinder will fail to unwind from inside a retpoline, but will work fine otherwise.
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: Andi Kleen ak@linux.intel.com Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/1515707194-20531-3-git-send-email-dwmw@amazon.co.u... [dwmw2: Applies to tools/objtool/builtin-check.c not check.[ch]] Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/objtool/builtin-check.c | 64 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 57 insertions(+), 7 deletions(-)
--- a/tools/objtool/builtin-check.c +++ b/tools/objtool/builtin-check.c @@ -51,7 +51,7 @@ struct instruction { unsigned int len, state; unsigned char type; unsigned long immediate; - bool alt_group, visited; + bool alt_group, visited, ignore_alts; struct symbol *call_dest; struct instruction *jump_dest; struct list_head alts; @@ -353,6 +353,40 @@ static void add_ignores(struct objtool_f }
/* + * FIXME: For now, just ignore any alternatives which add retpolines. This is + * a temporary hack, as it doesn't allow ORC to unwind from inside a retpoline. + * But it at least allows objtool to understand the control flow *around* the + * retpoline. + */ +static int add_nospec_ignores(struct objtool_file *file) +{ + struct section *sec; + struct rela *rela; + struct instruction *insn; + + sec = find_section_by_name(file->elf, ".rela.discard.nospec"); + if (!sec) + return 0; + + list_for_each_entry(rela, &sec->rela_list, list) { + if (rela->sym->type != STT_SECTION) { + WARN("unexpected relocation symbol type in %s", sec->name); + return -1; + } + + insn = find_insn(file, rela->sym->sec, rela->addend); + if (!insn) { + WARN("bad .discard.nospec entry"); + return -1; + } + + insn->ignore_alts = true; + } + + return 0; +} + +/* * Find the destination instructions for all jumps. */ static int add_jump_destinations(struct objtool_file *file) @@ -435,11 +469,18 @@ static int add_call_destinations(struct dest_off = insn->offset + insn->len + insn->immediate; insn->call_dest = find_symbol_by_offset(insn->sec, dest_off); + /* + * FIXME: Thanks to retpolines, it's now considered + * normal for a function to call within itself. So + * disable this warning for now. + */ +#if 0 if (!insn->call_dest) { WARN_FUNC("can't find call dest symbol at offset 0x%lx", insn->sec, insn->offset, dest_off); return -1; } +#endif } else if (rela->sym->type == STT_SECTION) { insn->call_dest = find_symbol_by_offset(rela->sym->sec, rela->addend+4); @@ -601,12 +642,6 @@ static int add_special_section_alts(stru return ret;
list_for_each_entry_safe(special_alt, tmp, &special_alts, list) { - alt = malloc(sizeof(*alt)); - if (!alt) { - WARN("malloc failed"); - ret = -1; - goto out; - }
orig_insn = find_insn(file, special_alt->orig_sec, special_alt->orig_off); @@ -617,6 +652,10 @@ static int add_special_section_alts(stru goto out; }
+ /* Ignore retpoline alternatives. */ + if (orig_insn->ignore_alts) + continue; + new_insn = NULL; if (!special_alt->group || special_alt->new_len) { new_insn = find_insn(file, special_alt->new_sec, @@ -642,6 +681,13 @@ static int add_special_section_alts(stru goto out; }
+ alt = malloc(sizeof(*alt)); + if (!alt) { + WARN("malloc failed"); + ret = -1; + goto out; + } + alt->insn = new_insn; list_add_tail(&alt->list, &orig_insn->alts);
@@ -861,6 +907,10 @@ static int decode_sections(struct objtoo
add_ignores(file);
+ ret = add_nospec_ignores(file); + if (ret) + return ret; + ret = add_jump_destinations(file); if (ret) return ret;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrey Ryabinin aryabinin@virtuozzo.com
commit 196bd485ee4f03ce4c690bfcf38138abfcd0a4bc upstream.
Currently we use current_stack_pointer() function to get the value of the stack pointer register. Since commit:
f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang")
... we have a stack register variable declared. It can be used instead of the current_stack_pointer() function, which allows us to optimize away some excessive "mov %rsp, %<dst>" instructions:
-mov %rsp,%rdx -sub %rdx,%rax -cmp $0x3fff,%rax -ja ffffffff810722fd <ist_begin_non_atomic+0x2d>
+sub %rsp,%rax +cmp $0x3fff,%rax +ja ffffffff810722fa <ist_begin_non_atomic+0x2a>
Remove current_stack_pointer(), rename __asm_call_sp to current_stack_pointer and use it instead of the removed function.
Signed-off-by: Andrey Ryabinin aryabinin@virtuozzo.com Reviewed-by: Josh Poimboeuf jpoimboe@redhat.com Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20170929141537.29167-1-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar mingo@kernel.org [dwmw2: We want ASM_CALL_CONSTRAINT for retpoline] Signed-off-by: David Woodhouse dwmw@amazon.co.ku Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/asm.h | 11 +++++++++++ arch/x86/include/asm/thread_info.h | 11 ----------- arch/x86/kernel/irq_32.c | 6 +++--- arch/x86/kernel/traps.c | 2 +- arch/x86/mm/tlb.c | 2 +- 5 files changed, 16 insertions(+), 16 deletions(-)
--- a/arch/x86/include/asm/asm.h +++ b/arch/x86/include/asm/asm.h @@ -125,4 +125,15 @@ /* For C file, we already have NOKPROBE_SYMBOL macro */ #endif
+#ifndef __ASSEMBLY__ +/* + * This output constraint should be used for any inline asm which has a "call" + * instruction. Otherwise the asm may be inserted before the frame pointer + * gets set up by the containing function. If you forget to do this, objtool + * may print a "call without frame pointer save/setup" warning. + */ +register unsigned long current_stack_pointer asm(_ASM_SP); +#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer) +#endif + #endif /* _ASM_X86_ASM_H */ --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -152,17 +152,6 @@ struct thread_info { */ #ifndef __ASSEMBLY__
-static inline unsigned long current_stack_pointer(void) -{ - unsigned long sp; -#ifdef CONFIG_X86_64 - asm("mov %%rsp,%0" : "=g" (sp)); -#else - asm("mov %%esp,%0" : "=g" (sp)); -#endif - return sp; -} - /* * Walks up the stack frames to make sure that the specified object is * entirely contained by a single stack frame. --- a/arch/x86/kernel/irq_32.c +++ b/arch/x86/kernel/irq_32.c @@ -64,7 +64,7 @@ static void call_on_stack(void *func, vo
static inline void *current_stack(void) { - return (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1)); + return (void *)(current_stack_pointer & ~(THREAD_SIZE - 1)); }
static inline int execute_on_irq_stack(int overflow, struct irq_desc *desc) @@ -88,7 +88,7 @@ static inline int execute_on_irq_stack(i
/* Save the next esp at the bottom of the stack */ prev_esp = (u32 *)irqstk; - *prev_esp = current_stack_pointer(); + *prev_esp = current_stack_pointer;
if (unlikely(overflow)) call_on_stack(print_stack_overflow, isp); @@ -139,7 +139,7 @@ void do_softirq_own_stack(void)
/* Push the previous esp onto the stack */ prev_esp = (u32 *)irqstk; - *prev_esp = current_stack_pointer(); + *prev_esp = current_stack_pointer;
call_on_stack(__do_softirq, isp); } --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -153,7 +153,7 @@ void ist_begin_non_atomic(struct pt_regs * from double_fault. */ BUG_ON((unsigned long)(current_top_of_stack() - - current_stack_pointer()) >= THREAD_SIZE); + current_stack_pointer) >= THREAD_SIZE);
preempt_enable_no_resched(); } --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -110,7 +110,7 @@ void switch_mm_irqs_off(struct mm_struct * mapped in the new pgd, we'll double-fault. Forcibly * map it. */ - unsigned int stack_pgd_index = pgd_index(current_stack_pointer()); + unsigned int stack_pgd_index = pgd_index(current_stack_pointer);
pgd_t *pgd = next->pgd + stack_pgd_index;
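[ Aside: a sketch of how the new constraint is meant to be used;
my_helper() is hypothetical.  Listing the stack-pointer register variable
as an in/out operand keeps the compiler from scheduling the asm before the
frame pointer is set up:

#include <asm/asm.h>

extern void my_helper(void);

static inline void call_my_helper(void)
{
	asm volatile("call my_helper"
		     : ASM_CALL_CONSTRAINT : : "memory");
}
]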
On 01/15/2018 03:35 PM, Greg Kroah-Hartman wrote:
4.9-stable review patch. If anyone has any objections, please let me know.
From: Andrey Ryabinin aryabinin@virtuozzo.com
commit 196bd485ee4f03ce4c690bfcf38138abfcd0a4bc upstream.
Currently we use current_stack_pointer() function to get the value of the stack pointer register. Since commit:
f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang")
... we have a stack register variable declared. It can be used instead of the current_stack_pointer() function, which allows us to optimize away some excessive "mov %rsp, %<dst>" instructions:
-mov %rsp,%rdx -sub %rdx,%rax -cmp $0x3fff,%rax -ja ffffffff810722fd <ist_begin_non_atomic+0x2d>
+sub %rsp,%rax +cmp $0x3fff,%rax +ja ffffffff810722fa <ist_begin_non_atomic+0x2a>
Remove current_stack_pointer(), rename __asm_call_sp to current_stack_pointer and use it instead of the removed function.
Signed-off-by: Andrey Ryabinin aryabinin@virtuozzo.com Reviewed-by: Josh Poimboeuf jpoimboe@redhat.com Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20170929141537.29167-1-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar mingo@kernel.org [dwmw2: We want ASM_CALL_CONSTRAINT for retpoline]
If we want ASM_CALL_CONSTRAINT it would be more correct to backport f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang") and some fixes for it: 520a13c530ae ("x86/asm: Fix inline asm call constraints for GCC 4.4") ca26cffa4e4a ("x86/asm: Allow again using asm.h when building for the 'bpf' clang target")
Because ASM_CALL_CONSTRAINT added in f5caf621ee35, not in this patch.
The end result looks fine though. So it's ok to keep it that way.
Signed-off-by: David Woodhouse dwmw@amazon.co.ku Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
arch/x86/include/asm/asm.h | 11 +++++++++++ arch/x86/include/asm/thread_info.h | 11 ----------- arch/x86/kernel/irq_32.c | 6 +++--- arch/x86/kernel/traps.c | 2 +- arch/x86/mm/tlb.c | 2 +- 5 files changed, 16 insertions(+), 16 deletions(-)
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -125,4 +125,15 @@
 /* For C file, we already have NOKPROBE_SYMBOL macro */
 #endif
+#ifndef __ASSEMBLY__
+/*
+ * This output constraint should be used for any inline asm which has a "call"
+ * instruction. Otherwise the asm may be inserted before the frame pointer
+ * gets set up by the containing function. If you forget to do this, objtool
+ * may print a "call without frame pointer save/setup" warning.
+ */
+register unsigned long current_stack_pointer asm(_ASM_SP);
+#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
+#endif
+
#endif /* _ASM_X86_ASM_H */
Resending with correct David's email s/dwmw@amazon.co.ku/dwmw@amazon.co.uk/ (it's incorrect in signed-off-by line)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Woodhouse dwmw@amazon.co.uk
commit 76b043848fd22dbf7f8bf3a1452f8c70d557b860 upstream.
Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide the corresponding thunks. Provide assembler macros for invoking the thunks in the same way that GCC does, from native and inline assembler.
This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In some circumstances, IBRS microcode features may be used instead, and the retpoline can be disabled.
On AMD CPUs if lfence is serialising, the retpoline can be dramatically simplified to a simple "lfence; jmp *\reg". A future patch, after it has been verified that lfence really is serialising in all circumstances, can enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition to X86_FEATURE_RETPOLINE.
Do not align the retpoline in the altinstr section, because there is no guarantee that it stays aligned when it's copied over the oldinstr during alternative patching.
[ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks] [ tglx: Put actual function CALL/JMP in front of the macros, convert to symbolic labels ] [ dwmw2: Convert back to numeric labels, merge objtool fixes ]
Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Arjan van de Ven arjan@linux.intel.com Acked-by: Ingo Molnar mingo@kernel.org Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: Andi Kleen ak@linux.intel.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.u... Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/Kconfig | 13 +++ arch/x86/Makefile | 10 ++ arch/x86/include/asm/asm-prototypes.h | 25 ++++++ arch/x86/include/asm/cpufeatures.h | 3 arch/x86/include/asm/nospec-branch.h | 128 ++++++++++++++++++++++++++++++++++ arch/x86/kernel/cpu/common.c | 4 + arch/x86/lib/Makefile | 1 arch/x86/lib/retpoline.S | 48 ++++++++++++ 8 files changed, 232 insertions(+) create mode 100644 arch/x86/include/asm/nospec-branch.h create mode 100644 arch/x86/lib/retpoline.S
--- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -408,6 +408,19 @@ config GOLDFISH def_bool y depends on X86_GOLDFISH
+config RETPOLINE + bool "Avoid speculative indirect branches in kernel" + default y + ---help--- + Compile kernel with the retpoline compiler options to guard against + kernel-to-user data leaks by avoiding speculative indirect + branches. Requires a compiler with -mindirect-branch=thunk-extern + support for full protection. The kernel may run slower. + + Without compiler support, at least indirect branches in assembler + code are eliminated. Since this includes the syscall entry path, + it is not entirely pointless. + if X86_32 config X86_EXTENDED_PLATFORM bool "Support for extended (non-PC) x86 platforms" --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -182,6 +182,16 @@ KBUILD_CFLAGS += -fno-asynchronous-unwin KBUILD_CFLAGS += $(mflags-y) KBUILD_AFLAGS += $(mflags-y)
+# Avoid indirect branches in kernel to deal with Spectre +ifdef CONFIG_RETPOLINE + RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register) + ifneq ($(RETPOLINE_CFLAGS),) + KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE + else + $(warning CONFIG_RETPOLINE=y, but not supported by the compiler. Toolchain update recommended.) + endif +endif + archscripts: scripts_basic $(Q)$(MAKE) $(build)=arch/x86/tools relocs
--- a/arch/x86/include/asm/asm-prototypes.h +++ b/arch/x86/include/asm/asm-prototypes.h @@ -10,7 +10,32 @@ #include <asm/pgtable.h> #include <asm/special_insns.h> #include <asm/preempt.h> +#include <asm/asm.h>
#ifndef CONFIG_X86_CMPXCHG64 extern void cmpxchg8b_emu(void); #endif + +#ifdef CONFIG_RETPOLINE +#ifdef CONFIG_X86_32 +#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## reg(void); +#else +#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## reg(void); +INDIRECT_THUNK(8) +INDIRECT_THUNK(9) +INDIRECT_THUNK(10) +INDIRECT_THUNK(11) +INDIRECT_THUNK(12) +INDIRECT_THUNK(13) +INDIRECT_THUNK(14) +INDIRECT_THUNK(15) +#endif +INDIRECT_THUNK(ax) +INDIRECT_THUNK(bx) +INDIRECT_THUNK(cx) +INDIRECT_THUNK(dx) +INDIRECT_THUNK(si) +INDIRECT_THUNK(di) +INDIRECT_THUNK(bp) +INDIRECT_THUNK(sp) +#endif /* CONFIG_RETPOLINE */ --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -194,6 +194,9 @@ #define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */ #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
+#define X86_FEATURE_RETPOLINE ( 7*32+12) /* Generic Retpoline mitigation for Spectre variant 2 */ +#define X86_FEATURE_RETPOLINE_AMD ( 7*32+13) /* AMD Retpoline mitigation for Spectre variant 2 */ + #define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ #define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */ #define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */ --- /dev/null +++ b/arch/x86/include/asm/nospec-branch.h @@ -0,0 +1,128 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __NOSPEC_BRANCH_H__ +#define __NOSPEC_BRANCH_H__ + +#include <asm/alternative.h> +#include <asm/alternative-asm.h> +#include <asm/cpufeatures.h> + +#ifdef __ASSEMBLY__ + +/* + * This should be used immediately before a retpoline alternative. It tells + * objtool where the retpolines are so that it can make sense of the control + * flow by just reading the original instruction(s) and ignoring the + * alternatives. + */ +.macro ANNOTATE_NOSPEC_ALTERNATIVE + .Lannotate_@: + .pushsection .discard.nospec + .long .Lannotate_@ - . + .popsection +.endm + +/* + * These are the bare retpoline primitives for indirect jmp and call. + * Do not use these directly; they only exist to make the ALTERNATIVE + * invocation below less ugly. + */ +.macro RETPOLINE_JMP reg:req + call .Ldo_rop_@ +.Lspec_trap_@: + pause + jmp .Lspec_trap_@ +.Ldo_rop_@: + mov \reg, (%_ASM_SP) + ret +.endm + +/* + * This is a wrapper around RETPOLINE_JMP so the called function in reg + * returns to the instruction after the macro. + */ +.macro RETPOLINE_CALL reg:req + jmp .Ldo_call_@ +.Ldo_retpoline_jmp_@: + RETPOLINE_JMP \reg +.Ldo_call_@: + call .Ldo_retpoline_jmp_@ +.endm + +/* + * JMP_NOSPEC and CALL_NOSPEC macros can be used instead of a simple + * indirect jmp/call which may be susceptible to the Spectre variant 2 + * attack. + */ +.macro JMP_NOSPEC reg:req +#ifdef CONFIG_RETPOLINE + ANNOTATE_NOSPEC_ALTERNATIVE + ALTERNATIVE_2 __stringify(jmp *\reg), \ + __stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE, \ + __stringify(lfence; jmp *\reg), X86_FEATURE_RETPOLINE_AMD +#else + jmp *\reg +#endif +.endm + +.macro CALL_NOSPEC reg:req +#ifdef CONFIG_RETPOLINE + ANNOTATE_NOSPEC_ALTERNATIVE + ALTERNATIVE_2 __stringify(call *\reg), \ + __stringify(RETPOLINE_CALL \reg), X86_FEATURE_RETPOLINE,\ + __stringify(lfence; call *\reg), X86_FEATURE_RETPOLINE_AMD +#else + call *\reg +#endif +.endm + +#else /* __ASSEMBLY__ */ + +#define ANNOTATE_NOSPEC_ALTERNATIVE \ + "999:\n\t" \ + ".pushsection .discard.nospec\n\t" \ + ".long 999b - .\n\t" \ + ".popsection\n\t" + +#if defined(CONFIG_X86_64) && defined(RETPOLINE) + +/* + * Since the inline asm uses the %V modifier which is only in newer GCC, + * the 64-bit one is dependent on RETPOLINE not CONFIG_RETPOLINE. + */ +# define CALL_NOSPEC \ + ANNOTATE_NOSPEC_ALTERNATIVE \ + ALTERNATIVE( \ + "call *%[thunk_target]\n", \ + "call __x86_indirect_thunk_%V[thunk_target]\n", \ + X86_FEATURE_RETPOLINE) +# define THUNK_TARGET(addr) [thunk_target] "r" (addr) + +#elif defined(CONFIG_X86_32) && defined(CONFIG_RETPOLINE) +/* + * For i386 we use the original ret-equivalent retpoline, because + * otherwise we'll run out of registers. We don't care about CET + * here, anyway. 
+ */ +# define CALL_NOSPEC ALTERNATIVE("call *%[thunk_target]\n", \ + " jmp 904f;\n" \ + " .align 16\n" \ + "901: call 903f;\n" \ + "902: pause;\n" \ + " jmp 902b;\n" \ + " .align 16\n" \ + "903: addl $4, %%esp;\n" \ + " pushl %[thunk_target];\n" \ + " ret;\n" \ + " .align 16\n" \ + "904: call 901b;\n", \ + X86_FEATURE_RETPOLINE) + +# define THUNK_TARGET(addr) [thunk_target] "rm" (addr) +#else /* No retpoline */ +# define CALL_NOSPEC "call *%[thunk_target]\n" +# define THUNK_TARGET(addr) [thunk_target] "rm" (addr) +#endif + +#endif /* __ASSEMBLY__ */ +#endif /* __NOSPEC_BRANCH_H__ */ --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -889,6 +889,10 @@ static void __init early_identify_cpu(st setup_force_cpu_bug(X86_BUG_SPECTRE_V1); setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
+#ifdef CONFIG_RETPOLINE + setup_force_cpu_cap(X86_FEATURE_RETPOLINE); +#endif + fpu__init_system(c);
#ifdef CONFIG_X86_32 --- a/arch/x86/lib/Makefile +++ b/arch/x86/lib/Makefile @@ -25,6 +25,7 @@ lib-y += memcpy_$(BITS).o lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o +lib-$(CONFIG_RETPOLINE) += retpoline.o
obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
--- /dev/null +++ b/arch/x86/lib/retpoline.S @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include <linux/stringify.h> +#include <linux/linkage.h> +#include <asm/dwarf2.h> +#include <asm/cpufeatures.h> +#include <asm/alternative-asm.h> +#include <asm/export.h> +#include <asm/nospec-branch.h> + +.macro THUNK reg + .section .text.__x86.indirect_thunk.\reg + +ENTRY(__x86_indirect_thunk_\reg) + CFI_STARTPROC + JMP_NOSPEC %\reg + CFI_ENDPROC +ENDPROC(__x86_indirect_thunk_\reg) +.endm + +/* + * Despite being an assembler file we can't just use .irp here + * because __KSYM_DEPS__ only uses the C preprocessor and would + * only see one instance of "__x86_indirect_thunk_\reg" rather + * than one per register with the correct names. So we do it + * the simple and nasty way... + */ +#define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg) +#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg) + +GENERATE_THUNK(_ASM_AX) +GENERATE_THUNK(_ASM_BX) +GENERATE_THUNK(_ASM_CX) +GENERATE_THUNK(_ASM_DX) +GENERATE_THUNK(_ASM_SI) +GENERATE_THUNK(_ASM_DI) +GENERATE_THUNK(_ASM_BP) +GENERATE_THUNK(_ASM_SP) +#ifdef CONFIG_64BIT +GENERATE_THUNK(r8) +GENERATE_THUNK(r9) +GENERATE_THUNK(r10) +GENERATE_THUNK(r11) +GENERATE_THUNK(r12) +GENERATE_THUNK(r13) +GENERATE_THUNK(r14) +GENERATE_THUNK(r15) +#endif
On 01/15/2018, 01:35 PM, Greg Kroah-Hartman wrote:
4.9-stable review patch. If anyone has any objections, please let me know.
May I ask if somebody has started the 4.4 port yet?
thanks,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Woodhouse dwmw@amazon.co.uk
commit da285121560e769cc31797bba6422eea71d473e0 upstream.
Add a spectre_v2= option to select the mitigation used for the indirect branch speculation vulnerability.
Currently, the only option available is retpoline, in its various forms. This will be expanded to cover the new IBRS/IBPB microcode features.
The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a serializing instruction, which is indicated by the LFENCE_RDTSC feature.
[ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS integration becomes simple ]
Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: Andi Kleen ak@linux.intel.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/1515707194-20531-5-git-send-email-dwmw@amazon.co.u... Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/kernel-parameters.txt | 28 ++++++ arch/x86/include/asm/nospec-branch.h | 10 ++ arch/x86/kernel/cpu/bugs.c | 158 ++++++++++++++++++++++++++++++++++- arch/x86/kernel/cpu/common.c | 4 4 files changed, 195 insertions(+), 5 deletions(-)
--- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2691,6 +2691,11 @@ bytes respectively. Such letter suffixes nosmt [KNL,S390] Disable symmetric multithreading (SMT). Equivalent to smt=1.
+ nospectre_v2 [X86] Disable all mitigations for the Spectre variant 2 + (indirect branch prediction) vulnerability. System may + allow data leaks with this option, which is equivalent + to spectre_v2=off. + noxsave [BUGS=X86] Disables x86 extended register state save and restore using xsave. The kernel will fallback to enabling legacy floating-point and sse state. @@ -3944,6 +3949,29 @@ bytes respectively. Such letter suffixes sonypi.*= [HW] Sony Programmable I/O Control Device driver See Documentation/laptops/sonypi.txt
+ spectre_v2= [X86] Control mitigation of Spectre variant 2 + (indirect branch speculation) vulnerability. + + on - unconditionally enable + off - unconditionally disable + auto - kernel detects whether your CPU model is + vulnerable + + Selecting 'on' will, and 'auto' may, choose a + mitigation method at run time according to the + CPU, the available microcode, the setting of the + CONFIG_RETPOLINE configuration option, and the + compiler with which the kernel was built. + + Specific mitigations can also be selected manually: + + retpoline - replace indirect branches + retpoline,generic - google's original retpoline + retpoline,amd - AMD-specific minimal thunk + + Not specifying this option is equivalent to + spectre_v2=auto. + spia_io_base= [HW,MTD] spia_fio_base= spia_pedr= --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -124,5 +124,15 @@ # define THUNK_TARGET(addr) [thunk_target] "rm" (addr) #endif
+/* The Spectre V2 mitigation variants */ +enum spectre_v2_mitigation { + SPECTRE_V2_NONE, + SPECTRE_V2_RETPOLINE_MINIMAL, + SPECTRE_V2_RETPOLINE_MINIMAL_AMD, + SPECTRE_V2_RETPOLINE_GENERIC, + SPECTRE_V2_RETPOLINE_AMD, + SPECTRE_V2_IBRS, +}; + #endif /* __ASSEMBLY__ */ #endif /* __NOSPEC_BRANCH_H__ */ --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -10,6 +10,9 @@ #include <linux/init.h> #include <linux/utsname.h> #include <linux/cpu.h> + +#include <asm/nospec-branch.h> +#include <asm/cmdline.h> #include <asm/bugs.h> #include <asm/processor.h> #include <asm/processor-flags.h> @@ -20,6 +23,8 @@ #include <asm/pgtable.h> #include <asm/cacheflush.h>
+static void __init spectre_v2_select_mitigation(void); + void __init check_bugs(void) { identify_boot_cpu(); @@ -29,6 +34,9 @@ void __init check_bugs(void) print_cpu_info(&boot_cpu_data); }
+ /* Select the proper spectre mitigation before patching alternatives */ + spectre_v2_select_mitigation(); + #ifdef CONFIG_X86_32 /* * Check whether we are able to run this kernel safely on SMP. @@ -61,6 +69,153 @@ void __init check_bugs(void) #endif }
+/* The kernel command line selection */ +enum spectre_v2_mitigation_cmd { + SPECTRE_V2_CMD_NONE, + SPECTRE_V2_CMD_AUTO, + SPECTRE_V2_CMD_FORCE, + SPECTRE_V2_CMD_RETPOLINE, + SPECTRE_V2_CMD_RETPOLINE_GENERIC, + SPECTRE_V2_CMD_RETPOLINE_AMD, +}; + +static const char *spectre_v2_strings[] = { + [SPECTRE_V2_NONE] = "Vulnerable", + [SPECTRE_V2_RETPOLINE_MINIMAL] = "Vulnerable: Minimal generic ASM retpoline", + [SPECTRE_V2_RETPOLINE_MINIMAL_AMD] = "Vulnerable: Minimal AMD ASM retpoline", + [SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline", + [SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline", +}; + +#undef pr_fmt +#define pr_fmt(fmt) "Spectre V2 mitigation: " fmt + +static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE; + +static void __init spec2_print_if_insecure(const char *reason) +{ + if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) + pr_info("%s\n", reason); +} + +static void __init spec2_print_if_secure(const char *reason) +{ + if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) + pr_info("%s\n", reason); +} + +static inline bool retp_compiler(void) +{ + return __is_defined(RETPOLINE); +} + +static inline bool match_option(const char *arg, int arglen, const char *opt) +{ + int len = strlen(opt); + + return len == arglen && !strncmp(arg, opt, len); +} + +static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void) +{ + char arg[20]; + int ret; + + ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, + sizeof(arg)); + if (ret > 0) { + if (match_option(arg, ret, "off")) { + goto disable; + } else if (match_option(arg, ret, "on")) { + spec2_print_if_secure("force enabled on command line."); + return SPECTRE_V2_CMD_FORCE; + } else if (match_option(arg, ret, "retpoline")) { + spec2_print_if_insecure("retpoline selected on command line."); + return SPECTRE_V2_CMD_RETPOLINE; + } else if (match_option(arg, ret, "retpoline,amd")) { + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) { + pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n"); + return SPECTRE_V2_CMD_AUTO; + } + spec2_print_if_insecure("AMD retpoline selected on command line."); + return SPECTRE_V2_CMD_RETPOLINE_AMD; + } else if (match_option(arg, ret, "retpoline,generic")) { + spec2_print_if_insecure("generic retpoline selected on command line."); + return SPECTRE_V2_CMD_RETPOLINE_GENERIC; + } else if (match_option(arg, ret, "auto")) { + return SPECTRE_V2_CMD_AUTO; + } + } + + if (!cmdline_find_option_bool(boot_command_line, "nospectre_v2")) + return SPECTRE_V2_CMD_AUTO; +disable: + spec2_print_if_insecure("disabled on command line."); + return SPECTRE_V2_CMD_NONE; +} + +static void __init spectre_v2_select_mitigation(void) +{ + enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline(); + enum spectre_v2_mitigation mode = SPECTRE_V2_NONE; + + /* + * If the CPU is not affected and the command line mode is NONE or AUTO + * then nothing to do. 
+ */ + if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2) && + (cmd == SPECTRE_V2_CMD_NONE || cmd == SPECTRE_V2_CMD_AUTO)) + return; + + switch (cmd) { + case SPECTRE_V2_CMD_NONE: + return; + + case SPECTRE_V2_CMD_FORCE: + /* FALLTRHU */ + case SPECTRE_V2_CMD_AUTO: + goto retpoline_auto; + + case SPECTRE_V2_CMD_RETPOLINE_AMD: + if (IS_ENABLED(CONFIG_RETPOLINE)) + goto retpoline_amd; + break; + case SPECTRE_V2_CMD_RETPOLINE_GENERIC: + if (IS_ENABLED(CONFIG_RETPOLINE)) + goto retpoline_generic; + break; + case SPECTRE_V2_CMD_RETPOLINE: + if (IS_ENABLED(CONFIG_RETPOLINE)) + goto retpoline_auto; + break; + } + pr_err("kernel not compiled with retpoline; no mitigation available!"); + return; + +retpoline_auto: + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) { + retpoline_amd: + if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) { + pr_err("LFENCE not serializing. Switching to generic retpoline\n"); + goto retpoline_generic; + } + mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD : + SPECTRE_V2_RETPOLINE_MINIMAL_AMD; + setup_force_cpu_cap(X86_FEATURE_RETPOLINE_AMD); + setup_force_cpu_cap(X86_FEATURE_RETPOLINE); + } else { + retpoline_generic: + mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC : + SPECTRE_V2_RETPOLINE_MINIMAL; + setup_force_cpu_cap(X86_FEATURE_RETPOLINE); + } + + spectre_v2_enabled = mode; + pr_info("%s\n", spectre_v2_strings[mode]); +} + +#undef pr_fmt + #ifdef CONFIG_SYSFS ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) @@ -85,6 +240,7 @@ ssize_t cpu_show_spectre_v2(struct devic { if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) return sprintf(buf, "Not affected\n"); - return sprintf(buf, "Vulnerable\n"); + + return sprintf(buf, "%s\n", spectre_v2_strings[spectre_v2_enabled]); } #endif --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -889,10 +889,6 @@ static void __init early_identify_cpu(st setup_force_cpu_bug(X86_BUG_SPECTRE_V1); setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
-#ifdef CONFIG_RETPOLINE - setup_force_cpu_cap(X86_FEATURE_RETPOLINE); -#endif - fpu__init_system(c);
#ifdef CONFIG_X86_32
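To illustrate the new knob (example invocation only; the reported string comes from the spectre_v2_strings table above):

	# appended to the kernel command line:
	spectre_v2=retpoline,generic

	# after boot, the selected mitigation is visible in sysfs:
	$ cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
	Mitigation: Full generic retpoline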
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Woodhouse dwmw@amazon.co.uk
commit 9697fa39efd3fc3692f2949d4045f393ec58450b upstream.
Convert all indirect jumps in crypto assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Arjan van de Ven arjan@linux.intel.com Acked-by: Ingo Molnar mingo@kernel.org Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: Andi Kleen ak@linux.intel.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/1515707194-20531-6-git-send-email-dwmw@amazon.co.u... Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/crypto/aesni-intel_asm.S | 5 +++-- arch/x86/crypto/camellia-aesni-avx-asm_64.S | 3 ++- arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 3 ++- arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 3 ++- 4 files changed, 9 insertions(+), 5 deletions(-)
--- a/arch/x86/crypto/aesni-intel_asm.S +++ b/arch/x86/crypto/aesni-intel_asm.S @@ -32,6 +32,7 @@ #include <linux/linkage.h> #include <asm/inst.h> #include <asm/frame.h> +#include <asm/nospec-branch.h>
/* * The following macros are used to move an (un)aligned 16 byte value to/from @@ -2734,7 +2735,7 @@ ENTRY(aesni_xts_crypt8) pxor INC, STATE4 movdqu IV, 0x30(OUTP)
- call *%r11 + CALL_NOSPEC %r11
movdqu 0x00(OUTP), INC pxor INC, STATE1 @@ -2779,7 +2780,7 @@ ENTRY(aesni_xts_crypt8) _aesni_gf128mul_x_ble() movups IV, (IVP)
- call *%r11 + CALL_NOSPEC %r11
movdqu 0x40(OUTP), INC pxor INC, STATE1 --- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S +++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S @@ -17,6 +17,7 @@
#include <linux/linkage.h> #include <asm/frame.h> +#include <asm/nospec-branch.h>
#define CAMELLIA_TABLE_BYTE_LEN 272
@@ -1224,7 +1225,7 @@ camellia_xts_crypt_16way: vpxor 14 * 16(%rax), %xmm15, %xmm14; vpxor 15 * 16(%rax), %xmm15, %xmm15;
- call *%r9; + CALL_NOSPEC %r9;
addq $(16 * 16), %rsp;
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S +++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S @@ -12,6 +12,7 @@
#include <linux/linkage.h> #include <asm/frame.h> +#include <asm/nospec-branch.h>
#define CAMELLIA_TABLE_BYTE_LEN 272
@@ -1337,7 +1338,7 @@ camellia_xts_crypt_32way: vpxor 14 * 32(%rax), %ymm15, %ymm14; vpxor 15 * 32(%rax), %ymm15, %ymm15;
- call *%r9; + CALL_NOSPEC %r9;
addq $(16 * 32), %rsp;
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S +++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S @@ -45,6 +45,7 @@
#include <asm/inst.h> #include <linux/linkage.h> +#include <asm/nospec-branch.h>
## ISCSI CRC 32 Implementation with crc32 and pclmulqdq Instruction
@@ -172,7 +173,7 @@ continue_block: movzxw (bufp, %rax, 2), len lea crc_array(%rip), bufp lea (bufp, len, 1), bufp - jmp *bufp + JMP_NOSPEC bufp
################################################################ ## 2a) PROCESS FULL BLOCKS:
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Woodhouse dwmw@amazon.co.uk
commit 2641f08bb7fc63a636a2b18173221d7040a3512e upstream.
Convert indirect jumps in core 32/64bit entry assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled.
Don't use CALL_NOSPEC in entry_SYSCALL_64_fastpath because the return address after the 'call' instruction must be *precisely* at the .Lentry_SYSCALL_64_after_fastpath label for stub_ptregs_64 to work, and the use of alternatives will mess that up unless we play horrid games to prepend with NOPs and make the variants the same length. It's not worth it; in the case where we ALTERNATIVE out the retpoline, the first instruction at __x86.indirect_thunk.rax is going to be a bare jmp *%rax anyway.
Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Ingo Molnar mingo@kernel.org Acked-by: Arjan van de Ven arjan@linux.intel.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: Andi Kleen ak@linux.intel.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/1515707194-20531-7-git-send-email-dwmw@amazon.co.u... Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/entry_32.S | 5 +++-- arch/x86/entry/entry_64.S | 10 ++++++++-- 2 files changed, 11 insertions(+), 4 deletions(-)
--- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -45,6 +45,7 @@ #include <asm/asm.h> #include <asm/smap.h> #include <asm/export.h> +#include <asm/nospec-branch.h>
.section .entry.text, "ax"
@@ -260,7 +261,7 @@ ENTRY(ret_from_fork)
/* kernel thread */ 1: movl %edi, %eax - call *%ebx + CALL_NOSPEC %ebx /* * A kernel thread is allowed to return here after successfully * calling do_execve(). Exit to userspace to complete the execve() @@ -1062,7 +1063,7 @@ error_code: movl %ecx, %es TRACE_IRQS_OFF movl %esp, %eax # pt_regs pointer - call *%edi + CALL_NOSPEC %edi jmp ret_from_exception END(page_fault)
--- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -37,6 +37,7 @@ #include <asm/pgtable_types.h> #include <asm/export.h> #include <asm/kaiser.h> +#include <asm/nospec-branch.h> #include <linux/err.h>
/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */ @@ -208,7 +209,12 @@ entry_SYSCALL_64_fastpath: * It might end up jumping to the slow path. If it jumps, RAX * and all argument registers are clobbered. */ +#ifdef CONFIG_RETPOLINE + movq sys_call_table(, %rax, 8), %rax + call __x86_indirect_thunk_rax +#else call *sys_call_table(, %rax, 8) +#endif .Lentry_SYSCALL_64_after_fastpath_call:
movq %rax, RAX(%rsp) @@ -380,7 +386,7 @@ ENTRY(stub_ptregs_64) jmp entry_SYSCALL64_slow_path
1: - jmp *%rax /* Called from C */ + JMP_NOSPEC %rax /* Called from C */ END(stub_ptregs_64)
.macro ptregs_stub func @@ -457,7 +463,7 @@ ENTRY(ret_from_fork) 1: /* kernel thread */ movq %r12, %rdi - call *%rbx + CALL_NOSPEC %rbx /* * A kernel thread is allowed to return here after successfully * calling do_execve(). Exit to userspace to complete the execve()
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Woodhouse dwmw@amazon.co.uk
commit 9351803bd803cdbeb9b5a7850b7b6f464806e3db upstream.
Convert all indirect jumps in ftrace assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Arjan van de Ven arjan@linux.intel.com Acked-by: Ingo Molnar mingo@kernel.org Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: Andi Kleen ak@linux.intel.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/1515707194-20531-8-git-send-email-dwmw@amazon.co.u... Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/entry/entry_32.S | 5 +++-- arch/x86/kernel/mcount_64.S | 7 ++++--- 2 files changed, 7 insertions(+), 5 deletions(-)
--- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -985,7 +985,8 @@ trace: movl 0x4(%ebp), %edx subl $MCOUNT_INSN_SIZE, %eax
- call *ftrace_trace_function + movl ftrace_trace_function, %ecx + CALL_NOSPEC %ecx
popl %edx popl %ecx @@ -1021,7 +1022,7 @@ return_to_handler: movl %eax, %ecx popl %edx popl %eax - jmp *%ecx + JMP_NOSPEC %ecx #endif
#ifdef CONFIG_TRACING --- a/arch/x86/kernel/mcount_64.S +++ b/arch/x86/kernel/mcount_64.S @@ -8,7 +8,7 @@ #include <asm/ptrace.h> #include <asm/ftrace.h> #include <asm/export.h> - +#include <asm/nospec-branch.h>
.code64 .section .entry.text, "ax" @@ -290,8 +290,9 @@ trace: * ip and parent ip are used and the list function is called when * function tracing is enabled. */ - call *ftrace_trace_function
+ movq ftrace_trace_function, %r8 + CALL_NOSPEC %r8 restore_mcount_regs
jmp fgraph_trace @@ -334,5 +335,5 @@ GLOBAL(return_to_handler) movq 8(%rsp), %rdx movq (%rsp), %rax addq $24, %rsp - jmp *%rdi + JMP_NOSPEC %rdi #endif
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Woodhouse dwmw@amazon.co.uk
commit e70e5892b28c18f517f29ab6e83bd57705104b31 upstream.
Convert all indirect jumps in hyperv inline asm code to use non-speculative sequences when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Arjan van de Ven arjan@linux.intel.com Acked-by: Ingo Molnar mingo@kernel.org Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: Andi Kleen ak@linux.intel.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/1515707194-20531-9-git-send-email-dwmw@amazon.co.u... [ backport to 4.9, hopefully correct, not tested... - gregkh ] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hv/hv.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/drivers/hv/hv.c +++ b/drivers/hv/hv.c @@ -31,6 +31,7 @@ #include <linux/clockchips.h> #include <asm/hyperv.h> #include <asm/mshyperv.h> +#include <asm/nospec-branch.h> #include "hyperv_vmbus.h"
/* The one and only */ @@ -103,9 +104,10 @@ u64 hv_do_hypercall(u64 control, void *i return (u64)ULLONG_MAX;
__asm__ __volatile__("mov %0, %%r8" : : "r" (output_address) : "r8"); - __asm__ __volatile__("call *%3" : "=a" (hv_status) : + __asm__ __volatile__(CALL_NOSPEC : + "=a" (hv_status) : "c" (control), "d" (input_address), - "m" (hypercall_page)); + THUNK_TARGET(hypercall_page));
return hv_status;
@@ -123,11 +125,12 @@ u64 hv_do_hypercall(u64 control, void *i if (!hypercall_page) return (u64)ULLONG_MAX;
- __asm__ __volatile__ ("call *%8" : "=d"(hv_status_hi), + __asm__ __volatile__ (CALL_NOSPEC : "=d"(hv_status_hi), "=a"(hv_status_lo) : "d" (control_hi), "a" (control_lo), "b" (input_address_hi), "c" (input_address_lo), "D"(output_address_hi), - "S"(output_address_lo), "m" (hypercall_page)); + "S"(output_address_lo), + THUNK_TARGET(hypercall_page));
return hv_status_lo | ((u64)hv_status_hi << 32); #endif /* !x86_64 */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Woodhouse dwmw@amazon.co.uk
commit ea08816d5b185ab3d09e95e393f265af54560350 upstream.
Convert indirect call in Xen hypercall to use non-speculative sequence when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Arjan van de Ven arjan@linux.intel.com Acked-by: Ingo Molnar mingo@kernel.org Reviewed-by: Juergen Gross jgross@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: Andi Kleen ak@linux.intel.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/1515707194-20531-10-git-send-email-dwmw@amazon.co.... Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/xen/hypercall.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -44,6 +44,7 @@ #include <asm/page.h> #include <asm/pgtable.h> #include <asm/smap.h> +#include <asm/nospec-branch.h>
#include <xen/interface/xen.h> #include <xen/interface/sched.h> @@ -216,9 +217,9 @@ privcmd_call(unsigned call, __HYPERCALL_5ARG(a1, a2, a3, a4, a5);
stac(); - asm volatile("call *%[call]" + asm volatile(CALL_NOSPEC : __HYPERCALL_5PARAM - : [call] "a" (&hypercall_page[call]) + : [thunk_target] "a" (&hypercall_page[call]) : __HYPERCALL_CLOBBER5); clac();
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Woodhouse dwmw@amazon.co.uk
commit 5096732f6f695001fa2d6f1335a2680b37912c69 upstream.
Convert all indirect jumps in 32bit checksum assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Arjan van de Ven arjan@linux.intel.com Acked-by: Ingo Molnar mingo@kernel.org Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: Andi Kleen ak@linux.intel.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/1515707194-20531-11-git-send-email-dwmw@amazon.co.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/lib/checksum_32.S | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/arch/x86/lib/checksum_32.S +++ b/arch/x86/lib/checksum_32.S @@ -29,7 +29,8 @@ #include <asm/errno.h> #include <asm/asm.h> #include <asm/export.h> - +#include <asm/nospec-branch.h> + /* * computes a partial checksum, e.g. for TCP/UDP fragments */ @@ -156,7 +157,7 @@ ENTRY(csum_partial) negl %ebx lea 45f(%ebx,%ebx,2), %ebx testl %esi, %esi - jmp *%ebx + JMP_NOSPEC %ebx
# Handle 2-byte-aligned regions 20: addw (%esi), %ax @@ -439,7 +440,7 @@ ENTRY(csum_partial_copy_generic) andl $-32,%edx lea 3f(%ebx,%ebx), %ebx testl %esi, %esi - jmp *%ebx + JMP_NOSPEC %ebx 1: addl $64,%esi addl $64,%edi SRC(movb -32(%edx),%bl) ; SRC(movb (%edx),%bl)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Kleen ak@linux.intel.com
commit 7614e913db1f40fff819b36216484dc3808995d4 upstream.
Convert all indirect jumps in 32bit irq inline asm code to use non-speculative sequences.
Signed-off-by: Andi Kleen ak@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Arjan van de Ven arjan@linux.intel.com Acked-by: Ingo Molnar mingo@kernel.org Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/1515707194-20531-12-git-send-email-dwmw@amazon.co.... Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/irq_32.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/arch/x86/kernel/irq_32.c +++ b/arch/x86/kernel/irq_32.c @@ -19,6 +19,7 @@ #include <linux/mm.h>
#include <asm/apic.h> +#include <asm/nospec-branch.h>
#ifdef CONFIG_DEBUG_STACKOVERFLOW
@@ -54,11 +55,11 @@ DEFINE_PER_CPU(struct irq_stack *, softi static void call_on_stack(void *func, void *stack) { asm volatile("xchgl %%ebx,%%esp \n" - "call *%%edi \n" + CALL_NOSPEC "movl %%ebx,%%esp \n" : "=b" (stack) : "0" (stack), - "D"(func) + [thunk_target] "D"(func) : "memory", "cc", "edx", "ecx", "eax"); }
@@ -94,11 +95,11 @@ static inline int execute_on_irq_stack(i call_on_stack(print_stack_overflow, isp);
asm volatile("xchgl %%ebx,%%esp \n" - "call *%%edi \n" + CALL_NOSPEC "movl %%ebx,%%esp \n" : "=a" (arg1), "=b" (isp) : "0" (desc), "1" (isp), - "D" (desc->handle_irq) + [thunk_target] "D" (desc->handle_irq) : "memory", "cc", "ecx"); return 1; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Woodhouse dwmw@amazon.co.uk
commit 117cc7a908c83697b0b737d15ae1eb5943afe35b upstream.
In accordance with the Intel and AMD documentation, we need to overwrite all entries in the RSB on exiting a guest, to prevent malicious branch target predictions from affecting the host kernel. This is needed both for retpoline and for IBRS.
[ak: numbers again for the RSB stuffing labels]
Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel riel@redhat.com Cc: Andi Kleen ak@linux.intel.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: thomas.lendacky@amd.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jikos@kernel.org Cc: Andy Lutomirski luto@amacapital.net Cc: Dave Hansen dave.hansen@intel.com Cc: Kees Cook keescook@google.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linux-foundation.org Cc: Paul Turner pjt@google.com Link: https://lkml.kernel.org/r/1515755487-8524-1-git-send-email-dwmw@amazon.co.uk Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/nospec-branch.h | 78 ++++++++++++++++++++++++++++++++++- arch/x86/kvm/svm.c | 4 + arch/x86/kvm/vmx.c | 4 + 3 files changed, 85 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -7,6 +7,48 @@ #include <asm/alternative-asm.h> #include <asm/cpufeatures.h>
+/* + * Fill the CPU return stack buffer. + * + * Each entry in the RSB, if used for a speculative 'ret', contains an + * infinite 'pause; jmp' loop to capture speculative execution. + * + * This is required in various cases for retpoline and IBRS-based + * mitigations for the Spectre variant 2 vulnerability. Sometimes to + * eliminate potentially bogus entries from the RSB, and sometimes + * purely to ensure that it doesn't get empty, which on some CPUs would + * allow predictions from other (unwanted!) sources to be used. + * + * We define a CPP macro such that it can be used from both .S files and + * inline assembly. It's possible to do a .macro and then include that + * from C via asm(".include <asm/nospec-branch.h>") but let's not go there. + */ + +#define RSB_CLEAR_LOOPS 32 /* To forcibly overwrite all entries */ +#define RSB_FILL_LOOPS 16 /* To avoid underflow */ + +/* + * Google experimented with loop-unrolling and this turned out to be + * the optimal version — two calls, each with their own speculation + * trap should their return address end up getting used, in a loop. + */ +#define __FILL_RETURN_BUFFER(reg, nr, sp) \ + mov $(nr/2), reg; \ +771: \ + call 772f; \ +773: /* speculation trap */ \ + pause; \ + jmp 773b; \ +772: \ + call 774f; \ +775: /* speculation trap */ \ + pause; \ + jmp 775b; \ +774: \ + dec reg; \ + jnz 771b; \ + add $(BITS_PER_LONG/8) * nr, sp; + #ifdef __ASSEMBLY__
/* @@ -76,6 +118,20 @@ #endif .endm
+ /* + * A simpler FILL_RETURN_BUFFER macro. Don't make people use the CPP + * monstrosity above, manually. + */ +.macro FILL_RETURN_BUFFER reg:req nr:req ftr:req +#ifdef CONFIG_RETPOLINE + ANNOTATE_NOSPEC_ALTERNATIVE + ALTERNATIVE "jmp .Lskip_rsb_@", \ + __stringify(__FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP)) \ + \ftr +.Lskip_rsb_@: +#endif +.endm + #else /* __ASSEMBLY__ */
#define ANNOTATE_NOSPEC_ALTERNATIVE \ @@ -119,7 +175,7 @@ X86_FEATURE_RETPOLINE)
# define THUNK_TARGET(addr) [thunk_target] "rm" (addr) -#else /* No retpoline */ +#else /* No retpoline for C / inline asm */ # define CALL_NOSPEC "call *%[thunk_target]\n" # define THUNK_TARGET(addr) [thunk_target] "rm" (addr) #endif @@ -134,5 +190,25 @@ enum spectre_v2_mitigation { SPECTRE_V2_IBRS, };
+/* + * On VMEXIT we must ensure that no RSB predictions learned in the guest + * can be followed in the host, by overwriting the RSB completely. Both + * retpoline and IBRS mitigations for Spectre v2 need this; only on future + * CPUs with IBRS_ATT *might* it be avoided. + */ +static inline void vmexit_fill_RSB(void) +{ +#ifdef CONFIG_RETPOLINE + unsigned long loops = RSB_CLEAR_LOOPS / 2; + + asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE + ALTERNATIVE("jmp 910f", + __stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1)), + X86_FEATURE_RETPOLINE) + "910:" + : "=&r" (loops), ASM_CALL_CONSTRAINT + : "r" (loops) : "memory" ); +#endif +} #endif /* __ASSEMBLY__ */ #endif /* __NOSPEC_BRANCH_H__ */ --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -44,6 +44,7 @@ #include <asm/debugreg.h> #include <asm/kvm_para.h> #include <asm/irq_remapping.h> +#include <asm/nospec-branch.h>
#include <asm/virtext.h> #include "trace.h" @@ -4917,6 +4918,9 @@ static void svm_vcpu_run(struct kvm_vcpu #endif );
+ /* Eliminate branch target predictions from guest mode */ + vmexit_fill_RSB(); + #ifdef CONFIG_X86_64 wrmsrl(MSR_GS_BASE, svm->host.gs_base); #else --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -48,6 +48,7 @@ #include <asm/kexec.h> #include <asm/apic.h> #include <asm/irq_remapping.h> +#include <asm/nospec-branch.h>
#include "trace.h" #include "pmu.h" @@ -9026,6 +9027,9 @@ static void __noclone vmx_vcpu_run(struc #endif );
+ /* Eliminate branch target predictions from guest mode */ + vmexit_fill_RSB(); + /* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */ if (debugctlmsr) update_debugctlmsr(debugctlmsr);
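For reference, with RSB_CLEAR_LOOPS = 32 the __FILL_RETURN_BUFFER loop runs nr/2 = 16 iterations of two calls each, planting 32 RSB entries that all point at a pause/jmp speculation trap, and the final add reclaims the nr * BITS_PER_LONG/8 = 32 * 8 = 256 bytes (on 64-bit) that those calls pushed onto the stack.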
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 352909b49ba0d74929b96af6dfbefc854ab6ebb5 upstream.
This tests that the vsyscall entries do what they're expected to do. It also confirms that attempts to read the vsyscall page behave as expected.
If changes are made to the vsyscall code or its memory map handling, running this test in all three of vsyscall=none, vsyscall=emulate, and vsyscall=native is helpful.
(Because it's easy, this also compares the vsyscall results to their vDSO equivalents.)
Note to KAISER backporters: please test this under all three vsyscall modes. Also, in the emulate and native modes, make sure that test_vsyscall_64 agrees with the command line or config option as to which mode you're in. It's quite easy to mess up the kernel such that native mode accidentally emulates or vice versa.
Greg, etc: please backport this to all your Meltdown-patched kernels. It'll help make sure the patches didn't regress vsyscalls.
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: H. Peter Anvin hpa@zytor.com Cc: Hugh Dickins hughd@google.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Juergen Gross jgross@suse.com Cc: Kees Cook keescook@chromium.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/2b9c5a174c1d60fd7774461d518aa75598b1d8fd.1515719552... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/testing/selftests/x86/test_vsyscall.c | 500 ++++++++++++++++++++++++++++ 1 file changed, 500 insertions(+)
--- /dev/null +++ b/tools/testing/selftests/x86/test_vsyscall.c @@ -0,0 +1,500 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#define _GNU_SOURCE + +#include <stdio.h> +#include <sys/time.h> +#include <time.h> +#include <stdlib.h> +#include <sys/syscall.h> +#include <unistd.h> +#include <dlfcn.h> +#include <string.h> +#include <inttypes.h> +#include <signal.h> +#include <sys/ucontext.h> +#include <errno.h> +#include <err.h> +#include <sched.h> +#include <stdbool.h> +#include <setjmp.h> + +#ifdef __x86_64__ +# define VSYS(x) (x) +#else +# define VSYS(x) 0 +#endif + +#ifndef SYS_getcpu +# ifdef __x86_64__ +# define SYS_getcpu 309 +# else +# define SYS_getcpu 318 +# endif +#endif + +static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *), + int flags) +{ + struct sigaction sa; + memset(&sa, 0, sizeof(sa)); + sa.sa_sigaction = handler; + sa.sa_flags = SA_SIGINFO | flags; + sigemptyset(&sa.sa_mask); + if (sigaction(sig, &sa, 0)) + err(1, "sigaction"); +} + +/* vsyscalls and vDSO */ +bool should_read_vsyscall = false; + +typedef long (*gtod_t)(struct timeval *tv, struct timezone *tz); +gtod_t vgtod = (gtod_t)VSYS(0xffffffffff600000); +gtod_t vdso_gtod; + +typedef int (*vgettime_t)(clockid_t, struct timespec *); +vgettime_t vdso_gettime; + +typedef long (*time_func_t)(time_t *t); +time_func_t vtime = (time_func_t)VSYS(0xffffffffff600400); +time_func_t vdso_time; + +typedef long (*getcpu_t)(unsigned *, unsigned *, void *); +getcpu_t vgetcpu = (getcpu_t)VSYS(0xffffffffff600800); +getcpu_t vdso_getcpu; + +static void init_vdso(void) +{ + void *vdso = dlopen("linux-vdso.so.1", RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD); + if (!vdso) + vdso = dlopen("linux-gate.so.1", RTLD_LAZY | RTLD_LOCAL | RTLD_NOLOAD); + if (!vdso) { + printf("[WARN]\tfailed to find vDSO\n"); + return; + } + + vdso_gtod = (gtod_t)dlsym(vdso, "__vdso_gettimeofday"); + if (!vdso_gtod) + printf("[WARN]\tfailed to find gettimeofday in vDSO\n"); + + vdso_gettime = (vgettime_t)dlsym(vdso, "__vdso_clock_gettime"); + if (!vdso_gettime) + printf("[WARN]\tfailed to find clock_gettime in vDSO\n"); + + vdso_time = (time_func_t)dlsym(vdso, "__vdso_time"); + if (!vdso_time) + printf("[WARN]\tfailed to find time in vDSO\n"); + + vdso_getcpu = (getcpu_t)dlsym(vdso, "__vdso_getcpu"); + if (!vdso_getcpu) { + /* getcpu() was never wired up in the 32-bit vDSO. */ + printf("[%s]\tfailed to find getcpu in vDSO\n", + sizeof(long) == 8 ? 
"WARN" : "NOTE"); + } +} + +static int init_vsys(void) +{ +#ifdef __x86_64__ + int nerrs = 0; + FILE *maps; + char line[128]; + bool found = false; + + maps = fopen("/proc/self/maps", "r"); + if (!maps) { + printf("[WARN]\tCould not open /proc/self/maps -- assuming vsyscall is r-x\n"); + should_read_vsyscall = true; + return 0; + } + + while (fgets(line, sizeof(line), maps)) { + char r, x; + void *start, *end; + char name[128]; + if (sscanf(line, "%p-%p %c-%cp %*x %*x:%*x %*u %s", + &start, &end, &r, &x, name) != 5) + continue; + + if (strcmp(name, "[vsyscall]")) + continue; + + printf("\tvsyscall map: %s", line); + + if (start != (void *)0xffffffffff600000 || + end != (void *)0xffffffffff601000) { + printf("[FAIL]\taddress range is nonsense\n"); + nerrs++; + } + + printf("\tvsyscall permissions are %c-%c\n", r, x); + should_read_vsyscall = (r == 'r'); + if (x != 'x') { + vgtod = NULL; + vtime = NULL; + vgetcpu = NULL; + } + + found = true; + break; + } + + fclose(maps); + + if (!found) { + printf("\tno vsyscall map in /proc/self/maps\n"); + should_read_vsyscall = false; + vgtod = NULL; + vtime = NULL; + vgetcpu = NULL; + } + + return nerrs; +#else + return 0; +#endif +} + +/* syscalls */ +static inline long sys_gtod(struct timeval *tv, struct timezone *tz) +{ + return syscall(SYS_gettimeofday, tv, tz); +} + +static inline int sys_clock_gettime(clockid_t id, struct timespec *ts) +{ + return syscall(SYS_clock_gettime, id, ts); +} + +static inline long sys_time(time_t *t) +{ + return syscall(SYS_time, t); +} + +static inline long sys_getcpu(unsigned * cpu, unsigned * node, + void* cache) +{ + return syscall(SYS_getcpu, cpu, node, cache); +} + +static jmp_buf jmpbuf; + +static void sigsegv(int sig, siginfo_t *info, void *ctx_void) +{ + siglongjmp(jmpbuf, 1); +} + +static double tv_diff(const struct timeval *a, const struct timeval *b) +{ + return (double)(a->tv_sec - b->tv_sec) + + (double)((int)a->tv_usec - (int)b->tv_usec) * 1e-6; +} + +static int check_gtod(const struct timeval *tv_sys1, + const struct timeval *tv_sys2, + const struct timezone *tz_sys, + const char *which, + const struct timeval *tv_other, + const struct timezone *tz_other) +{ + int nerrs = 0; + double d1, d2; + + if (tz_other && (tz_sys->tz_minuteswest != tz_other->tz_minuteswest || tz_sys->tz_dsttime != tz_other->tz_dsttime)) { + printf("[FAIL] %s tz mismatch\n", which); + nerrs++; + } + + d1 = tv_diff(tv_other, tv_sys1); + d2 = tv_diff(tv_sys2, tv_other); + printf("\t%s time offsets: %lf %lf\n", which, d1, d2); + + if (d1 < 0 || d2 < 0) { + printf("[FAIL]\t%s time was inconsistent with the syscall\n", which); + nerrs++; + } else { + printf("[OK]\t%s gettimeofday()'s timeval was okay\n", which); + } + + return nerrs; +} + +static int test_gtod(void) +{ + struct timeval tv_sys1, tv_sys2, tv_vdso, tv_vsys; + struct timezone tz_sys, tz_vdso, tz_vsys; + long ret_vdso = -1; + long ret_vsys = -1; + int nerrs = 0; + + printf("[RUN]\ttest gettimeofday()\n"); + + if (sys_gtod(&tv_sys1, &tz_sys) != 0) + err(1, "syscall gettimeofday"); + if (vdso_gtod) + ret_vdso = vdso_gtod(&tv_vdso, &tz_vdso); + if (vgtod) + ret_vsys = vgtod(&tv_vsys, &tz_vsys); + if (sys_gtod(&tv_sys2, &tz_sys) != 0) + err(1, "syscall gettimeofday"); + + if (vdso_gtod) { + if (ret_vdso == 0) { + nerrs += check_gtod(&tv_sys1, &tv_sys2, &tz_sys, "vDSO", &tv_vdso, &tz_vdso); + } else { + printf("[FAIL]\tvDSO gettimeofday() failed: %ld\n", ret_vdso); + nerrs++; + } + } + + if (vgtod) { + if (ret_vsys == 0) { + nerrs += check_gtod(&tv_sys1, &tv_sys2, &tz_sys, 
"vsyscall", &tv_vsys, &tz_vsys); + } else { + printf("[FAIL]\tvsys gettimeofday() failed: %ld\n", ret_vsys); + nerrs++; + } + } + + return nerrs; +} + +static int test_time(void) { + int nerrs = 0; + + printf("[RUN]\ttest time()\n"); + long t_sys1, t_sys2, t_vdso = 0, t_vsys = 0; + long t2_sys1 = -1, t2_sys2 = -1, t2_vdso = -1, t2_vsys = -1; + t_sys1 = sys_time(&t2_sys1); + if (vdso_time) + t_vdso = vdso_time(&t2_vdso); + if (vtime) + t_vsys = vtime(&t2_vsys); + t_sys2 = sys_time(&t2_sys2); + if (t_sys1 < 0 || t_sys1 != t2_sys1 || t_sys2 < 0 || t_sys2 != t2_sys2) { + printf("[FAIL]\tsyscall failed (ret1:%ld output1:%ld ret2:%ld output2:%ld)\n", t_sys1, t2_sys1, t_sys2, t2_sys2); + nerrs++; + return nerrs; + } + + if (vdso_time) { + if (t_vdso < 0 || t_vdso != t2_vdso) { + printf("[FAIL]\tvDSO failed (ret:%ld output:%ld)\n", t_vdso, t2_vdso); + nerrs++; + } else if (t_vdso < t_sys1 || t_vdso > t_sys2) { + printf("[FAIL]\tvDSO returned the wrong time (%ld %ld %ld)\n", t_sys1, t_vdso, t_sys2); + nerrs++; + } else { + printf("[OK]\tvDSO time() is okay\n"); + } + } + + if (vtime) { + if (t_vsys < 0 || t_vsys != t2_vsys) { + printf("[FAIL]\tvsyscall failed (ret:%ld output:%ld)\n", t_vsys, t2_vsys); + nerrs++; + } else if (t_vsys < t_sys1 || t_vsys > t_sys2) { + printf("[FAIL]\tvsyscall returned the wrong time (%ld %ld %ld)\n", t_sys1, t_vsys, t_sys2); + nerrs++; + } else { + printf("[OK]\tvsyscall time() is okay\n"); + } + } + + return nerrs; +} + +static int test_getcpu(int cpu) +{ + int nerrs = 0; + long ret_sys, ret_vdso = -1, ret_vsys = -1; + + printf("[RUN]\tgetcpu() on CPU %d\n", cpu); + + cpu_set_t cpuset; + CPU_ZERO(&cpuset); + CPU_SET(cpu, &cpuset); + if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0) { + printf("[SKIP]\tfailed to force CPU %d\n", cpu); + return nerrs; + } + + unsigned cpu_sys, cpu_vdso, cpu_vsys, node_sys, node_vdso, node_vsys; + unsigned node = 0; + bool have_node = false; + ret_sys = sys_getcpu(&cpu_sys, &node_sys, 0); + if (vdso_getcpu) + ret_vdso = vdso_getcpu(&cpu_vdso, &node_vdso, 0); + if (vgetcpu) + ret_vsys = vgetcpu(&cpu_vsys, &node_vsys, 0); + + if (ret_sys == 0) { + if (cpu_sys != cpu) { + printf("[FAIL]\tsyscall reported CPU %hu but should be %d\n", cpu_sys, cpu); + nerrs++; + } + + have_node = true; + node = node_sys; + } + + if (vdso_getcpu) { + if (ret_vdso) { + printf("[FAIL]\tvDSO getcpu() failed\n"); + nerrs++; + } else { + if (!have_node) { + have_node = true; + node = node_vdso; + } + + if (cpu_vdso != cpu) { + printf("[FAIL]\tvDSO reported CPU %hu but should be %d\n", cpu_vdso, cpu); + nerrs++; + } else { + printf("[OK]\tvDSO reported correct CPU\n"); + } + + if (node_vdso != node) { + printf("[FAIL]\tvDSO reported node %hu but should be %hu\n", node_vdso, node); + nerrs++; + } else { + printf("[OK]\tvDSO reported correct node\n"); + } + } + } + + if (vgetcpu) { + if (ret_vsys) { + printf("[FAIL]\tvsyscall getcpu() failed\n"); + nerrs++; + } else { + if (!have_node) { + have_node = true; + node = node_vsys; + } + + if (cpu_vsys != cpu) { + printf("[FAIL]\tvsyscall reported CPU %hu but should be %d\n", cpu_vsys, cpu); + nerrs++; + } else { + printf("[OK]\tvsyscall reported correct CPU\n"); + } + + if (node_vsys != node) { + printf("[FAIL]\tvsyscall reported node %hu but should be %hu\n", node_vsys, node); + nerrs++; + } else { + printf("[OK]\tvsyscall reported correct node\n"); + } + } + } + + return nerrs; +} + +static int test_vsys_r(void) +{ +#ifdef __x86_64__ + printf("[RUN]\tChecking read access to the vsyscall page\n"); + bool can_read; 
+ if (sigsetjmp(jmpbuf, 1) == 0) { + *(volatile int *)0xffffffffff600000; + can_read = true; + } else { + can_read = false; + } + + if (can_read && !should_read_vsyscall) { + printf("[FAIL]\tWe have read access, but we shouldn't\n"); + return 1; + } else if (!can_read && should_read_vsyscall) { + printf("[FAIL]\tWe don't have read access, but we should\n"); + return 1; + } else { + printf("[OK]\tgot expected result\n"); + } +#endif + + return 0; +} + + +#ifdef __x86_64__ +#define X86_EFLAGS_TF (1UL << 8) +static volatile sig_atomic_t num_vsyscall_traps; + +static unsigned long get_eflags(void) +{ + unsigned long eflags; + asm volatile ("pushfq\n\tpopq %0" : "=rm" (eflags)); + return eflags; +} + +static void set_eflags(unsigned long eflags) +{ + asm volatile ("pushq %0\n\tpopfq" : : "rm" (eflags) : "flags"); +} + +static void sigtrap(int sig, siginfo_t *info, void *ctx_void) +{ + ucontext_t *ctx = (ucontext_t *)ctx_void; + unsigned long ip = ctx->uc_mcontext.gregs[REG_RIP]; + + if (((ip ^ 0xffffffffff600000UL) & ~0xfffUL) == 0) + num_vsyscall_traps++; +} + +static int test_native_vsyscall(void) +{ + time_t tmp; + bool is_native; + + if (!vtime) + return 0; + + printf("[RUN]\tchecking for native vsyscall\n"); + sethandler(SIGTRAP, sigtrap, 0); + set_eflags(get_eflags() | X86_EFLAGS_TF); + vtime(&tmp); + set_eflags(get_eflags() & ~X86_EFLAGS_TF); + + /* + * If vsyscalls are emulated, we expect a single trap in the + * vsyscall page -- the call instruction will trap with RIP + * pointing to the entry point before emulation takes over. + * In native mode, we expect two traps, since whatever code + * the vsyscall page contains will be more than just a ret + * instruction. + */ + is_native = (num_vsyscall_traps > 1); + + printf("\tvsyscalls are %s (%d instructions in vsyscall page)\n", + (is_native ? "native" : "emulated"), + (int)num_vsyscall_traps); + + return 0; +} +#endif + +int main(int argc, char **argv) +{ + int nerrs = 0; + + init_vdso(); + nerrs += init_vsys(); + + nerrs += test_gtod(); + nerrs += test_time(); + nerrs += test_getcpu(0); + nerrs += test_getcpu(1); + + sethandler(SIGSEGV, sigsegv, 0); + nerrs += test_vsys_r(); + +#ifdef __x86_64__ + nerrs += test_native_vsyscall(); +#endif + + return nerrs ? 1 : 0; +}
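A typical way to run it (exact make invocation may vary by tree; shown here as an illustration) is:

	$ make -C tools/testing/selftests/x86 test_vsyscall_64
	$ ./tools/testing/selftests/x86/test_vsyscall_64

then reboot with vsyscall=none, vsyscall=emulate and vsyscall=native in turn and repeat, as the commit message asks.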
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit b8b9ce4b5aec8de9e23cabb0a26b78641f9ab1d6 upstream.
Remove the compile-time warning when CONFIG_RETPOLINE=y and the compiler does not have retpoline support. Linus' rationale for this is:
It's wrong because it will just make people turn off RETPOLINE, and the asm updates - and return stack clearing - that are independent of the compiler are likely the most important parts because they are likely the ones easiest to target.
And it's annoying because most people won't be able to do anything about it. The number of people building their own compiler? Very small. So if their distro hasn't got a compiler yet (and pretty much nobody does), the warning is just annoying crap.
It is already properly reported as part of the sysfs interface. The compile-time warning only encourages bad things.
Fixes: 76b043848fd2 ("x86/retpoline: Add initial retpoline support")
Requested-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Cc: David Woodhouse dwmw@amazon.co.uk
Cc: Peter Zijlstra (Intel) peterz@infradead.org
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel riel@redhat.com
Cc: Andi Kleen ak@linux.intel.com
Cc: Josh Poimboeuf jpoimboe@redhat.com
Cc: thomas.lendacky@amd.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Jiri Kosina jikos@kernel.org
Cc: Andy Lutomirski luto@amacapital.net
Cc: Dave Hansen dave.hansen@intel.com
Cc: Kees Cook keescook@google.com
Cc: Tim Chen tim.c.chen@linux.intel.com
Cc: Greg Kroah-Hartman gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAt...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/Makefile | 2 --
 1 file changed, 2 deletions(-)
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -187,8 +187,6 @@ ifdef CONFIG_RETPOLINE
     RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register)
     ifneq ($(RETPOLINE_CFLAGS),)
         KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
-    else
-        $(warning CONFIG_RETPOLINE=y, but not supported by the compiler. Toolchain update recommended.)
     endif
 endif
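For reference, the $(call cc-option,...) probe above just test-compiles with the new flags, so a toolchain without retpoline support now leaves RETPOLINE_CFLAGS empty and the kernel builds without -DRETPOLINE, silently. The place to look instead is the sysfs vulnerabilities interface added earlier in this series. Below is a minimal userspace sketch of such a runtime check; the path comes from this series' sysfs patches, and the exact reported strings vary by kernel version and compiler:

#include <stdio.h>

int main(void)
{
	char buf[128];
	/* File exposed by the "sysfs/cpu: Add vulnerability folder"
	 * and x86 bugs.c patches in this series. */
	FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/spectre_v2", "r");

	if (!f) {
		perror("spectre_v2");	/* older kernel, or sysfs not mounted */
		return 1;
	}
	if (fgets(buf, sizeof(buf), f))
		/* e.g. a full retpoline string when the compiler supports
		 * -mindirect-branch=, a minimal ASM retpoline otherwise */
		fputs(buf, stdout);
	fclose(f);
	return 0;
}

That runtime report is what makes the removed compile-time $(warning ...) redundant.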
On Mon, Jan 15, 2018 at 01:33:59PM +0100, Greg Kroah-Hartman wrote:
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary
------------------------------------------------------------------------
kernel: 4.9.77-rc2
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.9.y
git commit: aa8f4c62cd1ceee8dd60ba20231fc44d13b01db1
git describe: v4.9.76-99-gaa8f4c62cd1c
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.76-99-g...
No regressions (compared to build v4.9.76-97-g8cc984c38b60)
Boards, architectures and test suites:
-------------------------------------
hi6220-hikey - arm64
* boot - pass: 20,
* kselftest - skip: 23, pass: 40,
* libhugetlbfs - skip: 1, pass: 90,
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - skip: 1, pass: 21,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - skip: 121, pass: 983,
* ltp-timers-tests - pass: 12,

juno-r2 - arm64
* boot - pass: 20,
* kselftest - skip: 23, pass: 40,
* libhugetlbfs - skip: 1, pass: 90,
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - skip: 121, pass: 987,
* ltp-timers-tests - pass: 12,

x15 - arm
* boot - pass: 20,
* kselftest - skip: 25, pass: 37,
* libhugetlbfs - skip: 1, pass: 87,
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - skip: 2, pass: 20,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - skip: 1, pass: 13,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - skip: 66, pass: 1037,
* ltp-timers-tests - pass: 12,

x86_64
* boot - pass: 20,
* kselftest - skip: 24, pass: 53,
* libhugetlbfs - skip: 1, pass: 90,
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - skip: 1, pass: 61,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - skip: 1, pass: 9,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - skip: 116, pass: 1016,
* ltp-timers-tests - pass: 12,
On Mon, Jan 15, 2018 at 04:03:06PM -0600, Dan Rue wrote:
On Mon, Jan 15, 2018 at 01:33:59PM +0100, Greg Kroah-Hartman wrote:
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Really? Did you test ebpf? If not, can you go and manually do that (I don't know if it's part of your skips), as it is important here...
thanks,
greg k-h
On 16 January 2018 at 11:23, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Mon, Jan 15, 2018 at 04:03:06PM -0600, Dan Rue wrote:
On Mon, Jan 15, 2018 at 01:33:59PM +0100, Greg Kroah-Hartman wrote:
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Really? Did you test ebpf? If not, can you go and manually do that (I don't know if it's part of your skips), as it is important here...
We do not have selftests/bpf tests for 4.9, so running the 4.14 version of the bpf test cases on a 4.9 kernel causes these failures.
bpf# ./test_tag
test_tag: test_tag.c:111: tag_from_fdinfo: Assertion `!ret' failed.
[ 2947.456687] audit: type=1701 audit(1516100826.662:8): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2672 comm="test_6
Aborted (core dumped)
bpf# gdb ./test_tag
<>
test_tag: test_tag.c:111: tag_from_fdinfo: Assertion `!ret' failed.
Program received signal SIGABRT, Aborted.
0x0000ffffb7e96a00 in raise () from /lib64/libc.so.6
bpf# gdb ./test_lpm_map
Starting program: /opt/kselftests/mainline/bpf/test_lpm_map
test_lpm_map: test_lpm_map.c:191: test_lpm_map: Assertion `map >= 0' failed.
Program received signal SIGABRT, Aborted.
0x0000ffffb7e96a00 in raise () from /lib64/libc.so.6
The error log is the same before and after the previous stable reviews. We have old bugs open for these issues:
LKFT: 4.4 and 4.9: kselftest: bpf: test_lpm_map: Assertion `map >= 0' failed https://bugs.linaro.org/show_bug.cgi?id=3119
LKFT: 4.4 and 4.9: kselftest: bpf: test_lpm_map: Assertion `map >= 0' failed https://bugs.linaro.org/show_bug.cgi?id=3116
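For reference, the `map >= 0' assertion fires because BPF_MAP_TYPE_LPM_TRIE only entered the uapi in 4.11, so a 4.9 kernel rejects the map creation with EINVAL before any test logic runs. A minimal hand-rolled sketch of what the failing setup boils down to (not the actual selftest code; the numeric constants mirror the newer uapi values, since 4.9 headers do not define them):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

int main(void)
{
	union bpf_attr attr;
	int map;

	memset(&attr, 0, sizeof(attr));
	attr.map_type    = 11;	/* BPF_MAP_TYPE_LPM_TRIE in >= 4.11 uapi */
	attr.key_size    = 8;	/* 4-byte prefixlen + 4-byte IPv4 address */
	attr.value_size  = 1;
	attr.max_entries = 256;
	attr.map_flags   = 1;	/* BPF_F_NO_PREALLOC, mandatory for LPM tries */

	map = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
	if (map < 0) {
		/* This is the path a 4.9 kernel takes: EINVAL for the
		 * unknown map_type, hence the `map >= 0' assert aborting. */
		perror("BPF_MAP_CREATE(LPM_TRIE)");
		return 1;
	}
	printf("LPM trie map created, fd %d\n", map);
	close(map);
	return 0;
}

So the aborts are test/kernel version skew rather than anything introduced by this review cycle, consistent with the logs being unchanged across reviews.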
- Naresh
thanks,
greg k-h
On Tue, Jan 16, 2018 at 04:49:11PM +0530, Naresh Kamboju wrote:
On 16 January 2018 at 11:23, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Mon, Jan 15, 2018 at 04:03:06PM -0600, Dan Rue wrote:
On Mon, Jan 15, 2018 at 01:33:59PM +0100, Greg Kroah-Hartman wrote:
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Really? Did you test ebpf? If not, can you go and manually do that (I don't know if it's part of your skips), as it is important here...
We do not have selftests/bpf tests for 4.9, so running the 4.14 version of the bpf test cases on a 4.9 kernel causes these failures.
The error log is the same before and after the previous stable reviews. We have old bugs open for these issues:
LKFT: 4.4 and 4.9: kselftest: bpf: test_lpm_map: Assertion `map >= 0' failed https://bugs.linaro.org/show_bug.cgi?id=3119
LKFT: 4.4 and 4.9: kselftest: bpf: test_lpm_map: Assertion `map >= 0' failed https://bugs.linaro.org/show_bug.cgi?id=3116
Ok, I guess we are just as broken as before, I'll take it! :)
thanks,
greg k-h
On 01/15/2018 04:33 AM, Greg Kroah-Hartman wrote:
Build results:
	total: 145 pass: 145 fail: 0
Qemu test results:
	total: 126 pass: 126 fail: 0
Details are available at http://kerneltests.org/builders.
Guenter
On 01/15/2018 05:33 AM, Greg Kroah-Hartman wrote:
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks,
-- Shuah