This fixes quite a number of runtime PM bugs I found that have been
causing some pretty nasty issues such as:
- Deadlocking on boot
- Connector probing potentially not working while the GPU is in runtime
suspend
- i2c char dev not working while the GPU is in runtime suspend
- aux char dev not working while the GPU is in runtime suspend
There's definitely more parts of nouveau that need to be fixed to use
runtime power management correctly, such as the hwmon portions, but this
series just handles the more important fixes that should get into stable
for the time being.
Cc: Karol Herbst <karolherbst(a)gmail.com>
Cc: stable(a)vger.kernel.org
Lyude Paul (5):
drm/nouveau: Prevent RPM callback recursion in suspend/resume paths
drm/nouveau: Grab RPM ref while probing outputs
drm/nouveau: Add missing RPM get/put() when probing connectors
drm/nouveau: Grab RPM ref when i2c bus is in use
drm/nouveau: Grab RPM ref when aux bus is in use
drivers/gpu/drm/nouveau/dispnv50/disp.c | 12 +++++++++--
drivers/gpu/drm/nouveau/nouveau_connector.c | 21 +++++++++++++++++--
drivers/gpu/drm/nouveau/nouveau_connector.h | 3 +++
drivers/gpu/drm/nouveau/nouveau_drm.c | 10 ++++++++-
drivers/gpu/drm/nouveau/nvkm/subdev/i2c/aux.c | 12 ++++++++++-
drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c | 12 ++++++++++-
6 files changed, 63 insertions(+), 7 deletions(-)
--
2.17.1
NOTE, this kernel release is broken for i386 systems. If you are
running such a machine, do NOT update to this release, you will not be
able to boot properly.
I did this release anyway with this known problem as there is a fix in
here for x86-64 systems that was nasty to track down and was affecting
people. Given that the huge majority of systems are NOT i386, I felt
this was a safe release to do at this point in time.
Once the proper fix for i386 systems has been accepted into Linus's tree
(it has been posted already), I will pick it up and do a new 4.17.y
release so that users of those systems can update.
All users of non-i386 systems running the 4.17 kernel series must upgrade.
The updated 4.17.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.17.y
and can be browsed at the normal kernel.org git web browser:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/kbuild/kbuild.txt | 9
Makefile | 2
arch/arm/boot/dts/armada-38x.dtsi | 2
arch/arm64/include/asm/simd.h | 19
arch/mips/kernel/process.c | 43 -
arch/mips/kernel/traps.c | 1
arch/mips/mm/ioremap.c | 37 -
arch/x86/crypto/Makefile | 4
arch/x86/crypto/salsa20-i586-asm_32.S | 938 --------------------------
arch/x86/crypto/salsa20-x86_64-asm_64.S | 805 ----------------------
arch/x86/crypto/salsa20_glue.c | 91 --
arch/x86/include/asm/vmx.h | 3
arch/x86/kernel/uprobes.c | 2
arch/x86/kvm/vmx.c | 67 +
arch/x86/kvm/x86.h | 9
arch/x86/purgatory/Makefile | 2
arch/x86/xen/enlighten_pv.c | 25
arch/x86/xen/irq.c | 4
block/bsg.c | 2
crypto/Kconfig | 28
crypto/sha3_generic.c | 2
drivers/acpi/acpica/hwsleep.c | 15
drivers/acpi/nfit/core.c | 44 -
drivers/acpi/nfit/nfit.h | 1
drivers/ata/ahci.c | 60 +
drivers/ata/libata-core.c | 3
drivers/ata/libata-scsi.c | 18
drivers/block/loop.c | 79 +-
drivers/block/loop.h | 1
drivers/gpu/drm/etnaviv/etnaviv_drv.c | 24
drivers/gpu/drm/etnaviv/etnaviv_gpu.h | 3
drivers/gpu/drm/etnaviv/etnaviv_sched.c | 24
drivers/i2c/busses/i2c-tegra.c | 17
drivers/i2c/i2c-core-base.c | 11
drivers/infiniband/Kconfig | 11
drivers/infiniband/core/Makefile | 4
drivers/infiniband/hw/cxgb4/mem.c | 2
drivers/infiniband/hw/hfi1/rc.c | 2
drivers/infiniband/hw/hfi1/uc.c | 4
drivers/infiniband/hw/hfi1/ud.c | 4
drivers/infiniband/hw/hfi1/verbs_txreq.c | 4
drivers/infiniband/hw/hfi1/verbs_txreq.h | 4
drivers/misc/ibmasm/ibmasmfs.c | 27
drivers/misc/mei/interrupt.c | 5
drivers/misc/vmw_balloon.c | 4
drivers/mmc/host/dw_mmc.c | 7
drivers/mmc/host/renesas_sdhi_internal_dmac.c | 3
drivers/mmc/host/sdhci-esdhc-imx.c | 21
drivers/mtd/spi-nor/cadence-quadspi.c | 6
drivers/staging/rtl8723bs/core/rtw_ap.c | 2
drivers/staging/rtlwifi/rtl8822be/hw.c | 2
drivers/staging/rtlwifi/wifi.h | 1
drivers/thunderbolt/domain.c | 4
drivers/usb/core/quirks.c | 4
drivers/usb/host/xhci-mem.c | 2
drivers/usb/misc/yurex.c | 23
drivers/usb/serial/ch341.c | 2
drivers/usb/serial/cp210x.c | 1
drivers/usb/serial/keyspan_pda.c | 4
drivers/usb/serial/mos7840.c | 3
fs/binfmt_elf.c | 5
fs/f2fs/f2fs.h | 13
fs/f2fs/inode.c | 33
fs/f2fs/node.c | 21
fs/f2fs/segment.c | 25
fs/inode.c | 6
fs/proc/task_mmu.c | 3
fs/xfs/libxfs/xfs_ialloc_btree.c | 2
include/linux/libata.h | 1
kernel/bpf/verifier.c | 48 -
kernel/power/user.c | 5
kernel/trace/trace.c | 8
kernel/trace/trace_kprobe.c | 6
kernel/trace/trace_output.c | 5
mm/gup.c | 2
mm/mmap.c | 29
mm/page_alloc.c | 4
mm/rmap.c | 8
net/bridge/netfilter/ebtables.c | 2
net/ipv4/netfilter/ip_tables.c | 1
net/ipv6/netfilter/ip6_tables.c | 1
net/netfilter/nfnetlink_queue.c | 3
sound/pci/hda/patch_hdmi.c | 19
sound/pci/hda/patch_realtek.c | 6
tools/build/Build.include | 4
tools/testing/selftests/bpf/test_verifier.c | 58 +
86 files changed, 701 insertions(+), 2168 deletions(-)
Alexander Usyskin (1):
mei: discard messages from not connected client during power down.
Baruch Siach (1):
ARM: dts: armada-38x: use the new thermal binding
Chris Wilson (1):
ALSA: hda - Handle pm failure during hotplug
Christian Borntraeger (1):
mm: do not drop unused pages when userfaultd is running
Damien Le Moal (2):
ata: Fix ZBC_OUT command block check
ata: Fix ZBC_OUT all bit handling
Dan Carpenter (2):
USB: serial: ch341: fix type promotion bug in ch341_control_in()
xhci: xhci-mem: off by one in xhci_stream_id_to_ring()
Dan Williams (1):
acpi, nfit: Fix scrub idle detection
Daniel Borkmann (1):
bpf: reject passing modified ctx to helper functions
Darrick J. Wong (1):
xfs: fix inobt magic number check
Dmitry Vyukov (1):
crypto: don't optimize keccakf()
Eric Biggers (1):
crypto: x86/salsa20 - remove x86 salsa20 implementations
Eric Dumazet (1):
netfilter: nf_queue: augment nfqa_cfg_policy
Fabio Estevam (2):
drm/etnaviv: Check for platform_device_register_simple() failure
drm/etnaviv: Fix driver unregistering
Florian Westphal (1):
netfilter: x_tables: initialise match/target check parameter struct
Greg Kroah-Hartman (1):
Linux 4.17.7
Hans de Goede (1):
ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS
Hui Wang (1):
ALSA: hda/realtek - two more lenovo models need fixup of MIC_LOCATION
Jaegeuk Kim (4):
f2fs: give message and set need_fsck given broken node id
f2fs: avoid bug_on on corrupted inode
f2fs: sanity check on sit entry
f2fs: sanity check for total valid node blocks
Jann Horn (2):
ibmasm: don't write out of bounds in read handler
USB: yurex: fix out-of-bounds uaccess in read handler
Jiri Olsa (1):
tracing/kprobe: Release kprobe print_fmt properly
Joel Fernandes (Google) (1):
tracing: Reorder display of TGID to be after PID
Johan Hovold (2):
USB: serial: keyspan_pda: fix modem-status error handling
USB: serial: mos7840: fix status-register error handling
Jon Hunter (1):
i2c: tegra: Fix NACK error handling
Juergen Gross (2):
xen: remove global bit from __default_kernel_pte_mask for pv guests
xen: setup pv irq ops vector earlier
Leon Romanovsky (1):
RDMA/ucm: Mark UCM interface as BROKEN
Linus Torvalds (1):
Fix up non-directory creation in SGID directories
Lucas Stach (1):
drm/etnaviv: bring back progress check in job timeout handler
Marc Orr (1):
kvm: vmx: Nested VM-entry prereqs for event inj.
Michael J. Ruhl (1):
IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values
Michal Hocko (1):
mm: do not bug_on on incorrect length in __mm_populate()
Mika Westerberg (2):
ahci: Add Intel Ice Lake LP PCI ID
thunderbolt: Notify userspace when boot_acl is changed
Murray McAllister (1):
staging: rtl8723bs: Prevent an underflow in rtw_check_beacon_data().
Nadav Amit (1):
vmw_balloon: fix inflation with batching
Nico Sneck (1):
usb: quirks: add delay quirks for Corsair Strafe
Oleg Nesterov (1):
uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn()
Olli Salonen (1):
USB: serial: cp210x: add another USB ID for Qivicon ZigBee stick
Oscar Salvador (1):
fs, elf: make sure to page align bss in load_elf_library
Paul Burton (3):
MIPS: Call dump_stack() from show_regs()
MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
MIPS: Fix ioremap() RAM check
Paul Menzel (1):
tools build: fix # escaping in .cmd files for future Make
Pavel Tatashin (1):
mm: zero unavailable pages before memmap init
Philipp Rudo (1):
x86/purgatory: add missing FORCE to Makefile target
Ping-Ke Shih (1):
staging: r8822be: Fix RTL8822be can't find any wireless AP
Rafael J. Wysocki (1):
ACPICA: Clear status of all events when entering S5
Randy Dunlap (1):
kbuild: delete INSTALL_FW_PATH from kbuild documentation
Stefan Agner (1):
mmc: sdhci-esdhc-imx: allow 1.8V modes without 100/200MHz pinctrl states
Steve Wise (1):
iw_cxgb4: correctly enforce the max reg_mr depth
Tetsuo Handa (2):
PM / hibernate: Fix oops at snapshot_write()
loop: remember whether sysfs_create_group() was done
Theodore Ts'o (1):
loop: add recursion validation to LOOP_CHANGE_FD
Tony Battersby (1):
bsg: fix bogus EINVAL on non-data commands
Vignesh R (1):
mtd: spi-nor: cadence-quadspi: Fix direct mode write timeouts
Vlastimil Babka (1):
fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
Wolfram Sang (1):
i2c: recovery: if possible send STOP with recovery pulses
Yandong Zhao (1):
arm64: neon: Fix function may_use_simd() return error status
Yoshihiro Shimoda (1):
mmc: renesas_sdhi_internal_dmac: Cannot clear the RX_IN_USE in abort
x00270170 (1):
mmc: dw_mmc: fix card threshold control configuration
This is the start of the stable review cycle for the 4.17.7 release.
There are 67 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Jul 18 07:34:11 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.7-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.17.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.17.7-rc1
Baruch Siach <baruch(a)tkos.co.il>
ARM: dts: armada-38x: use the new thermal binding
Jaegeuk Kim <jaegeuk(a)kernel.org>
f2fs: sanity check for total valid node blocks
Jaegeuk Kim <jaegeuk(a)kernel.org>
f2fs: sanity check on sit entry
Jaegeuk Kim <jaegeuk(a)kernel.org>
f2fs: avoid bug_on on corrupted inode
Jaegeuk Kim <jaegeuk(a)kernel.org>
f2fs: give message and set need_fsck given broken node id
Marc Orr <marcorr(a)google.com>
kvm: vmx: Nested VM-entry prereqs for event inj.
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
loop: remember whether sysfs_create_group() was done
Leon Romanovsky <leon(a)kernel.org>
RDMA/ucm: Mark UCM interface as BROKEN
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
PM / hibernate: Fix oops at snapshot_write()
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: fix inobt magic number check
Theodore Ts'o <tytso(a)mit.edu>
loop: add recursion validation to LOOP_CHANGE_FD
Florian Westphal <fw(a)strlen.de>
netfilter: x_tables: initialise match/target check parameter struct
Dmitry Vyukov <dvyukov(a)google.com>
crypto: don't optimize keccakf()
Eric Dumazet <edumazet(a)google.com>
netfilter: nf_queue: augment nfqa_cfg_policy
Oleg Nesterov <oleg(a)redhat.com>
uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn()
Eric Biggers <ebiggers(a)google.com>
crypto: x86/salsa20 - remove x86 salsa20 implementations
Tony Battersby <tonyb(a)cybernetics.com>
bsg: fix bogus EINVAL on non-data commands
Juergen Gross <jgross(a)suse.com>
xen: setup pv irq ops vector earlier
Juergen Gross <jgross(a)suse.com>
xen: remove global bit from __default_kernel_pte_mask for pv guests
Steve Wise <swise(a)opengridcomputing.com>
iw_cxgb4: correctly enforce the max reg_mr depth
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: recovery: if possible send STOP with recovery pulses
Jon Hunter <jonathanh(a)nvidia.com>
i2c: tegra: Fix NACK error handling
Michael J. Ruhl <michael.j.ruhl(a)intel.com>
IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values
Paul Menzel <pmenzel(a)molgen.mpg.de>
tools build: fix # escaping in .cmd files for future Make
Yandong Zhao <yandong77520(a)gmail.com>
arm64: neon: Fix function may_use_simd() return error status
Dan Williams <dan.j.williams(a)intel.com>
acpi, nfit: Fix scrub idle detection
Randy Dunlap <rdunlap(a)infradead.org>
kbuild: delete INSTALL_FW_PATH from kbuild documentation
Joel Fernandes (Google) <joel(a)joelfernandes.org>
tracing: Reorder display of TGID to be after PID
Michal Hocko <mhocko(a)suse.com>
mm: do not bug_on on incorrect length in __mm_populate()
Oscar Salvador <osalvador(a)suse.de>
fs, elf: make sure to page align bss in load_elf_library
Philipp Rudo <prudo(a)linux.ibm.com>
x86/purgatory: add missing FORCE to Makefile target
Vlastimil Babka <vbabka(a)suse.cz>
fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
Christian Borntraeger <borntraeger(a)de.ibm.com>
mm: do not drop unused pages when userfaultd is running
Chris Wilson <chris(a)chris-wilson.co.uk>
ALSA: hda - Handle pm failure during hotplug
Hui Wang <hui.wang(a)canonical.com>
ALSA: hda/realtek - two more lenovo models need fixup of MIC_LOCATION
Pavel Tatashin <pasha.tatashin(a)oracle.com>
mm: zero unavailable pages before memmap init
Linus Torvalds <torvalds(a)linux-foundation.org>
Fix up non-directory creation in SGID directories
Dan Carpenter <dan.carpenter(a)oracle.com>
xhci: xhci-mem: off by one in xhci_stream_id_to_ring()
Nico Sneck <snecknico(a)gmail.com>
usb: quirks: add delay quirks for Corsair Strafe
Johan Hovold <johan(a)kernel.org>
USB: serial: mos7840: fix status-register error handling
Jann Horn <jannh(a)google.com>
USB: yurex: fix out-of-bounds uaccess in read handler
Johan Hovold <johan(a)kernel.org>
USB: serial: keyspan_pda: fix modem-status error handling
Olli Salonen <olli.salonen(a)iki.fi>
USB: serial: cp210x: add another USB ID for Qivicon ZigBee stick
Dan Carpenter <dan.carpenter(a)oracle.com>
USB: serial: ch341: fix type promotion bug in ch341_control_in()
Mika Westerberg <mika.westerberg(a)linux.intel.com>
thunderbolt: Notify userspace when boot_acl is changed
Hans de Goede <hdegoede(a)redhat.com>
ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS
Mika Westerberg <mika.westerberg(a)linux.intel.com>
ahci: Add Intel Ice Lake LP PCI ID
Nadav Amit <namit(a)vmware.com>
vmw_balloon: fix inflation with batching
Jiri Olsa <jolsa(a)kernel.org>
tracing/kprobe: Release kprobe print_fmt properly
Vignesh R <vigneshr(a)ti.com>
mtd: spi-nor: cadence-quadspi: Fix direct mode write timeouts
Alexander Usyskin <alexander.usyskin(a)intel.com>
mei: discard messages from not connected client during power down.
Damien Le Moal <damien.lemoal(a)wdc.com>
ata: Fix ZBC_OUT all bit handling
Damien Le Moal <damien.lemoal(a)wdc.com>
ata: Fix ZBC_OUT command block check
Ping-Ke Shih <pkshih(a)realtek.com>
staging: r8822be: Fix RTL8822be can't find any wireless AP
Murray McAllister <murray.mcallister(a)insomniasec.com>
staging: rtl8723bs: Prevent an underflow in rtw_check_beacon_data().
Jann Horn <jannh(a)google.com>
ibmasm: don't write out of bounds in read handler
Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
mmc: renesas_sdhi_internal_dmac: Cannot clear the RX_IN_USE in abort
x00270170 <xiaqing17(a)hisilicon.com>
mmc: dw_mmc: fix card threshold control configuration
Stefan Agner <stefan(a)agner.ch>
mmc: sdhci-esdhc-imx: allow 1.8V modes without 100/200MHz pinctrl states
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
ACPICA: Clear status of all events when entering S5
Lucas Stach <l.stach(a)pengutronix.de>
drm/etnaviv: bring back progress check in job timeout handler
Fabio Estevam <fabio.estevam(a)nxp.com>
drm/etnaviv: Fix driver unregistering
Fabio Estevam <fabio.estevam(a)nxp.com>
drm/etnaviv: Check for platform_device_register_simple() failure
Paul Burton <paul.burton(a)mips.com>
MIPS: Fix ioremap() RAM check
Paul Burton <paul.burton(a)mips.com>
MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
Paul Burton <paul.burton(a)mips.com>
MIPS: Call dump_stack() from show_regs()
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: reject passing modified ctx to helper functions
-------------
Diffstat:
Documentation/kbuild/kbuild.txt | 9 -
Makefile | 4 +-
arch/arm/boot/dts/armada-38x.dtsi | 2 +-
arch/arm64/include/asm/simd.h | 19 +-
arch/mips/kernel/process.c | 43 +-
arch/mips/kernel/traps.c | 1 +
arch/mips/mm/ioremap.c | 37 +-
arch/x86/crypto/Makefile | 4 -
arch/x86/crypto/salsa20-i586-asm_32.S | 938 --------------------------
arch/x86/crypto/salsa20-x86_64-asm_64.S | 805 ----------------------
arch/x86/crypto/salsa20_glue.c | 91 ---
arch/x86/include/asm/vmx.h | 3 +
arch/x86/kernel/uprobes.c | 2 +-
arch/x86/kvm/vmx.c | 67 ++
arch/x86/kvm/x86.h | 9 +
arch/x86/purgatory/Makefile | 2 +-
arch/x86/xen/enlighten_pv.c | 25 +-
arch/x86/xen/irq.c | 4 +-
block/bsg.c | 2 -
crypto/Kconfig | 28 -
crypto/sha3_generic.c | 2 +-
drivers/acpi/acpica/hwsleep.c | 15 +-
drivers/acpi/nfit/core.c | 44 +-
drivers/acpi/nfit/nfit.h | 1 +
drivers/ata/ahci.c | 60 ++
drivers/ata/libata-core.c | 3 +
drivers/ata/libata-scsi.c | 18 +-
drivers/block/loop.c | 79 ++-
drivers/block/loop.h | 1 +
drivers/gpu/drm/etnaviv/etnaviv_drv.c | 24 +-
drivers/gpu/drm/etnaviv/etnaviv_gpu.h | 3 +
drivers/gpu/drm/etnaviv/etnaviv_sched.c | 24 +
drivers/i2c/busses/i2c-tegra.c | 17 +-
drivers/i2c/i2c-core-base.c | 11 +-
drivers/infiniband/Kconfig | 11 +
drivers/infiniband/core/Makefile | 4 +-
drivers/infiniband/hw/cxgb4/mem.c | 2 +-
drivers/infiniband/hw/hfi1/rc.c | 2 +-
drivers/infiniband/hw/hfi1/uc.c | 4 +-
drivers/infiniband/hw/hfi1/ud.c | 4 +-
drivers/infiniband/hw/hfi1/verbs_txreq.c | 4 +-
drivers/infiniband/hw/hfi1/verbs_txreq.h | 4 +-
drivers/misc/ibmasm/ibmasmfs.c | 27 +-
drivers/misc/mei/interrupt.c | 5 +-
drivers/misc/vmw_balloon.c | 4 +-
drivers/mmc/host/dw_mmc.c | 7 +-
drivers/mmc/host/renesas_sdhi_internal_dmac.c | 3 +-
drivers/mmc/host/sdhci-esdhc-imx.c | 21 +-
drivers/mtd/spi-nor/cadence-quadspi.c | 6 +-
drivers/staging/rtl8723bs/core/rtw_ap.c | 2 +-
drivers/staging/rtlwifi/rtl8822be/hw.c | 2 +-
drivers/staging/rtlwifi/wifi.h | 1 +
drivers/thunderbolt/domain.c | 4 +
drivers/usb/core/quirks.c | 4 +
drivers/usb/host/xhci-mem.c | 2 +-
drivers/usb/misc/yurex.c | 23 +-
drivers/usb/serial/ch341.c | 2 +-
drivers/usb/serial/cp210x.c | 1 +
drivers/usb/serial/keyspan_pda.c | 4 +-
drivers/usb/serial/mos7840.c | 3 +
fs/binfmt_elf.c | 5 +-
fs/f2fs/f2fs.h | 13 +-
fs/f2fs/inode.c | 33 +-
fs/f2fs/node.c | 21 +-
fs/f2fs/segment.c | 25 +
fs/inode.c | 6 +
fs/proc/task_mmu.c | 3 +-
fs/xfs/libxfs/xfs_ialloc_btree.c | 2 +-
include/linux/libata.h | 1 +
kernel/bpf/verifier.c | 48 +-
kernel/power/user.c | 5 +
kernel/trace/trace.c | 8 +-
kernel/trace/trace_kprobe.c | 6 +-
kernel/trace/trace_output.c | 5 +-
mm/gup.c | 2 -
mm/mmap.c | 29 +-
mm/page_alloc.c | 4 +-
mm/rmap.c | 8 +-
net/bridge/netfilter/ebtables.c | 2 +
net/ipv4/netfilter/ip_tables.c | 1 +
net/ipv6/netfilter/ip6_tables.c | 1 +
net/netfilter/nfnetlink_queue.c | 3 +
sound/pci/hda/patch_hdmi.c | 19 +-
sound/pci/hda/patch_realtek.c | 6 +-
tools/build/Build.include | 4 +-
tools/testing/selftests/bpf/test_verifier.c | 58 +-
86 files changed, 702 insertions(+), 2169 deletions(-)
We have got a team of professional to do image editing service for you.
We have 20 image editors and on daily basis 2000 images can be processed.
If you want to check our quality of work please send us a photo with
instruction and we will work on it.
Our Services:
Photo cut out, masking, clipping path
Color, brightness and contrast correction
Beauty, Model retouching, skin retouching
Image cropping and resizing
Correcting the shape and size
We do unlimited revisions until you are satisfied with the work.
Thanks,
Ruby Young
This is a note to let you know that I've just added the patch titled
usb: dwc2: Fix DMA alignment to start at allocated boundary
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 56406e017a883b54b339207b230f85599f4d70ae Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antti=20Sepp=C3=A4l=C3=A4?= <a.seppala(a)gmail.com>
Date: Thu, 5 Jul 2018 17:31:53 +0300
Subject: usb: dwc2: Fix DMA alignment to start at allocated boundary
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The commit 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more
supported way") introduced a common way to align DMA allocations.
The code in the commit aligns the struct dma_aligned_buffer but the
actual DMA address pointed by data[0] gets aligned to an offset from
the allocated boundary by the kmalloc_ptr and the old_xfer_buffer
pointers.
This is against the recommendation in Documentation/DMA-API.txt which
states:
Therefore, it is recommended that driver writers who don't take
special care to determine the cache line size at run time only map
virtual regions that begin and end on page boundaries (which are
guaranteed also to be cache line boundaries).
The effect of this is that architectures with non-coherent DMA caches
may run into memory corruption or kernel crashes with Unhandled
kernel unaligned accesses exceptions.
Fix the alignment by positioning the DMA area in front of the allocation
and use memory at the end of the area for storing the orginal
transfer_buffer pointer. This may have the added benefit of increased
performance as the DMA area is now fully aligned on all architectures.
Tested with Lantiq xRX200 (MIPS) and RPi Model B Rev 2 (ARM).
Fixes: 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more supported way")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Douglas Anderson <dianders(a)chromium.org>
Signed-off-by: Antti Seppälä <a.seppala(a)gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi(a)linux.intel.com>
---
drivers/usb/dwc2/hcd.c | 44 ++++++++++++++++++++++--------------------
1 file changed, 23 insertions(+), 21 deletions(-)
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index b1104be3429c..2ed0ac18e053 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -2665,34 +2665,29 @@ static int dwc2_alloc_split_dma_aligned_buf(struct dwc2_hsotg *hsotg,
#define DWC2_USB_DMA_ALIGN 4
-struct dma_aligned_buffer {
- void *kmalloc_ptr;
- void *old_xfer_buffer;
- u8 data[0];
-};
-
static void dwc2_free_dma_aligned_buffer(struct urb *urb)
{
- struct dma_aligned_buffer *temp;
+ void *stored_xfer_buffer;
if (!(urb->transfer_flags & URB_ALIGNED_TEMP_BUFFER))
return;
- temp = container_of(urb->transfer_buffer,
- struct dma_aligned_buffer, data);
+ /* Restore urb->transfer_buffer from the end of the allocated area */
+ memcpy(&stored_xfer_buffer, urb->transfer_buffer +
+ urb->transfer_buffer_length, sizeof(urb->transfer_buffer));
if (usb_urb_dir_in(urb))
- memcpy(temp->old_xfer_buffer, temp->data,
+ memcpy(stored_xfer_buffer, urb->transfer_buffer,
urb->transfer_buffer_length);
- urb->transfer_buffer = temp->old_xfer_buffer;
- kfree(temp->kmalloc_ptr);
+ kfree(urb->transfer_buffer);
+ urb->transfer_buffer = stored_xfer_buffer;
urb->transfer_flags &= ~URB_ALIGNED_TEMP_BUFFER;
}
static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags)
{
- struct dma_aligned_buffer *temp, *kmalloc_ptr;
+ void *kmalloc_ptr;
size_t kmalloc_size;
if (urb->num_sgs || urb->sg ||
@@ -2700,22 +2695,29 @@ static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags)
!((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
return 0;
- /* Allocate a buffer with enough padding for alignment */
+ /*
+ * Allocate a buffer with enough padding for original transfer_buffer
+ * pointer. This allocation is guaranteed to be aligned properly for
+ * DMA
+ */
kmalloc_size = urb->transfer_buffer_length +
- sizeof(struct dma_aligned_buffer) + DWC2_USB_DMA_ALIGN - 1;
+ sizeof(urb->transfer_buffer);
kmalloc_ptr = kmalloc(kmalloc_size, mem_flags);
if (!kmalloc_ptr)
return -ENOMEM;
- /* Position our struct dma_aligned_buffer such that data is aligned */
- temp = PTR_ALIGN(kmalloc_ptr + 1, DWC2_USB_DMA_ALIGN) - 1;
- temp->kmalloc_ptr = kmalloc_ptr;
- temp->old_xfer_buffer = urb->transfer_buffer;
+ /*
+ * Position value of original urb->transfer_buffer pointer to the end
+ * of allocation for later referencing
+ */
+ memcpy(kmalloc_ptr + urb->transfer_buffer_length,
+ &urb->transfer_buffer, sizeof(urb->transfer_buffer));
+
if (usb_urb_dir_out(urb))
- memcpy(temp->data, urb->transfer_buffer,
+ memcpy(kmalloc_ptr, urb->transfer_buffer,
urb->transfer_buffer_length);
- urb->transfer_buffer = temp->data;
+ urb->transfer_buffer = kmalloc_ptr;
urb->transfer_flags |= URB_ALIGNED_TEMP_BUFFER;
--
2.18.0
This is a note to let you know that I've just added the patch titled
usb: gadget: Fix OS descriptors support
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 50b9773c13bffbef32060e67c4483ea7b2eca7b5 Mon Sep 17 00:00:00 2001
From: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Date: Wed, 27 Jun 2018 12:33:56 +1000
Subject: usb: gadget: Fix OS descriptors support
The current code is broken as it re-defines "req" inside the
if block, then goto out of it. Thus the request that ends
up being sent is not the one that was populated by the
code in question.
This fixes RNDIS driver autodetect by Windows 10 for me.
The bug was introduced by Chris rework to remove the local
queuing inside the if { } block of the redefined request.
Fixes: 636ba13aec8a ("usb: gadget: composite: remove duplicated code in OS desc handling")
Cc: <stable(a)vger.kernel.org> # v4.17
Signed-off-by: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Signed-off-by: Felipe Balbi <felipe.balbi(a)linux.intel.com>
---
drivers/usb/gadget/composite.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/usb/gadget/composite.c b/drivers/usb/gadget/composite.c
index d2fa071c21b1..b8a15840b4ff 100644
--- a/drivers/usb/gadget/composite.c
+++ b/drivers/usb/gadget/composite.c
@@ -1819,7 +1819,6 @@ composite_setup(struct usb_gadget *gadget, const struct usb_ctrlrequest *ctrl)
if (cdev->use_os_string && cdev->os_desc_config &&
(ctrl->bRequestType & USB_TYPE_VENDOR) &&
ctrl->bRequest == cdev->b_vendor_code) {
- struct usb_request *req;
struct usb_configuration *os_desc_cfg;
u8 *buf;
int interface;
--
2.18.0
From: Paul Burton <paul.burton(a)mips.com>
The current MIPS implementation of arch_trigger_cpumask_backtrace() is
broken because it attempts to use synchronous IPIs despite the fact that
it may be run with interrupts disabled.
This means that when arch_trigger_cpumask_backtrace() is invoked, for
example by the RCU CPU stall watchdog, we may:
- Deadlock due to use of synchronous IPIs with interrupts disabled,
causing the CPU that's attempting to generate the backtrace output
to hang itself.
- Not succeed in generating the desired output from remote CPUs.
- Produce warnings about this from smp_call_function_many(), for
example:
[42760.526910] INFO: rcu_sched detected stalls on CPUs/tasks:
[42760.535755] 0-...!: (1 GPs behind) idle=ade/140000000000000/0 softirq=526944/526945 fqs=0
[42760.547874] 1-...!: (0 ticks this GP) idle=e4a/140000000000000/0 softirq=547885/547885 fqs=0
[42760.559869] (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33)
[42760.568927] ------------[ cut here ]------------
[42760.576146] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c
[42760.587839] Modules linked in:
[42760.593152] CPU: 2 PID: 1216 Comm: sh Not tainted 4.15.4-00373-gee058bb4d0c2 #2
[42760.603767] Stack : 8e09bd20 8e09bd20 8e09bd20 fffffff0 00000007 00000006 00000000 8e09bca8
[42760.616937] 95b2b379 95b2b379 807a0080 00000007 81944518 0000018a 00000032 00000000
[42760.630095] 00000000 00000030 80000000 00000000 806eca74 00000009 8017e2b8 000001a0
[42760.643169] 00000000 00000002 00000000 8e09baa4 00000008 808b8008 86d69080 8e09bca0
[42760.656282] 8e09ad50 805e20aa 00000000 00000000 00000000 8017e2b8 00000009 801070ca
[42760.669424] ...
[42760.673919] Call Trace:
[42760.678672] [<27fde568>] show_stack+0x70/0xf0
[42760.685417] [<84751641>] dump_stack+0xaa/0xd0
[42760.692188] [<699d671c>] __warn+0x80/0x92
[42760.698549] [<68915d41>] warn_slowpath_null+0x28/0x36
[42760.705912] [<f7c76c1c>] smp_call_function_many+0x88/0x20c
[42760.713696] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a
[42760.722216] [<f845bd33>] rcu_dump_cpu_stacks+0x6a/0x98
[42760.729580] [<796e7629>] rcu_check_callbacks+0x672/0x6ac
[42760.737476] [<059b3b43>] update_process_times+0x18/0x34
[42760.744981] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38
[42760.752793] [<478d3d70>] tick_sched_timer+0x1c/0x50
[42760.759882] [<e56ea39f>] __hrtimer_run_queues+0xc6/0x226
[42760.767418] [<e88bbcae>] hrtimer_interrupt+0x88/0x19a
[42760.775031] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a
[42760.782761] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168
[42760.790795] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.798117] [<1b6d462c>] gic_handle_local_int+0x38/0x86
[42760.805545] [<b2ada1c7>] gic_irq_dispatch+0xa/0x14
[42760.812534] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.820086] [<c7521934>] do_IRQ+0x16/0x20
[42760.826274] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94
[42760.833458] [<6a94b53c>] except_vec_vi_end+0x70/0x78
[42760.840655] [<22284043>] smp_call_function_many+0x1ba/0x20c
[42760.848501] [<54022b58>] smp_call_function+0x1e/0x2c
[42760.855693] [<ab9fc705>] flush_tlb_mm+0x2a/0x98
[42760.862730] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44
[42760.869628] [<cb259b74>] arch_tlb_finish_mmu+0x26/0x3e
[42760.877021] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66
[42760.883907] [<b3fce717>] exit_mmap+0x76/0xea
[42760.890428] [<c4c8a2f6>] mmput+0x80/0x11a
[42760.896632] [<a41a08f4>] do_exit+0x1f4/0x80c
[42760.903158] [<ee01cef6>] do_group_exit+0x20/0x7e
[42760.909990] [<13fa8d54>] __wake_up_parent+0x0/0x1e
[42760.917045] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c
[42760.924893] [<8c21a93b>] syscall_common+0x14/0x1c
[42760.931765] ---[ end trace 02aa09da9dc52a60 ]---
[42760.938342] ------------[ cut here ]------------
[42760.945311] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8
...
This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async
IPIs & smp_call_function_single_async() in order to resolve this
problem. We ensure use of the pre-allocated call_single_data_t
structures is serialized by maintaining a cpumask indicating that
they're busy, and refusing to attempt to send an IPI when a CPU's bit is
set in this mask. This should only happen if a CPU hasn't responded to a
previous backtrace IPI - ie. if it's hung - and we print a warning to
the console in this case.
I've marked this for stable branches as far back as v4.9, to which it
applies cleanly. Strictly speaking the faulty MIPS implementation can be
traced further back to commit 856839b76836 ("MIPS: Add
arch_trigger_all_cpu_backtrace() function") in v3.19, but kernel
versions v3.19 through v4.8 will require further work to backport due to
the rework performed in commit 9a01c3ed5cdb ("nmi_backtrace: add more
trigger_*_cpu_backtrace() methods").
Signed-off-by: Paul Burton <paul.burton(a)mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19597/
Cc: James Hogan <jhogan(a)kernel.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: linux-mips(a)linux-mips.org
Cc: stable(a)vger.kernel.org
Fixes: 856839b76836 ("MIPS: Add arch_trigger_all_cpu_backtrace() function")
Fixes: 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods")
[ Huacai: backported to 4.9: Replace "call_single_data_t" with "struct call_single_data" ]
Signed-off-by: Huacai Chen <chenhc(a)lemote.com>
---
arch/mips/kernel/process.c | 45 ++++++++++++++++++++++++++++++---------------
1 file changed, 30 insertions(+), 15 deletions(-)
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index cb1e9c1..513a63b 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -26,6 +26,7 @@
#include <linux/kallsyms.h>
#include <linux/random.h>
#include <linux/prctl.h>
+#include <linux/nmi.h>
#include <asm/asm.h>
#include <asm/bootinfo.h>
@@ -633,28 +634,42 @@ unsigned long arch_align_stack(unsigned long sp)
return sp & ALMASK;
}
-static void arch_dump_stack(void *info)
-{
- struct pt_regs *regs;
+static DEFINE_PER_CPU(struct call_single_data, backtrace_csd);
+static struct cpumask backtrace_csd_busy;
- regs = get_irq_regs();
-
- if (regs)
- show_regs(regs);
- else
- dump_stack();
+static void handle_backtrace(void *info)
+{
+ nmi_cpu_backtrace(get_irq_regs());
+ cpumask_clear_cpu(smp_processor_id(), &backtrace_csd_busy);
}
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+static void raise_backtrace(cpumask_t *mask)
{
- long this_cpu = get_cpu();
+ struct call_single_data *csd;
+ int cpu;
- if (cpumask_test_cpu(this_cpu, mask) && !exclude_self)
- dump_stack();
+ for_each_cpu(cpu, mask) {
+ /*
+ * If we previously sent an IPI to the target CPU & it hasn't
+ * cleared its bit in the busy cpumask then it didn't handle
+ * our previous IPI & it's not safe for us to reuse the
+ * call_single_data_t.
+ */
+ if (cpumask_test_and_set_cpu(cpu, &backtrace_csd_busy)) {
+ pr_warn("Unable to send backtrace IPI to CPU%u - perhaps it hung?\n",
+ cpu);
+ continue;
+ }
- smp_call_function_many(mask, arch_dump_stack, NULL, 1);
+ csd = &per_cpu(backtrace_csd, cpu);
+ csd->func = handle_backtrace;
+ smp_call_function_single_async(cpu, csd);
+ }
+}
- put_cpu();
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+{
+ nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace);
}
int mips_get_process_fp_mode(struct task_struct *task)
--
2.7.0
This is the start of the stable review cycle for the 4.14.56 release.
There are 54 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Jul 18 07:34:24 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.56-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.56-rc1
Jaegeuk Kim <jaegeuk(a)kernel.org>
f2fs: give message and set need_fsck given broken node id
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
loop: remember whether sysfs_create_group() was done
Leon Romanovsky <leonro(a)mellanox.com>
RDMA/ucm: Mark UCM interface as BROKEN
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
PM / hibernate: Fix oops at snapshot_write()
Theodore Ts'o <tytso(a)mit.edu>
loop: add recursion validation to LOOP_CHANGE_FD
Florian Westphal <fw(a)strlen.de>
netfilter: x_tables: initialise match/target check parameter struct
Eric Dumazet <edumazet(a)google.com>
netfilter: nf_queue: augment nfqa_cfg_policy
Oleg Nesterov <oleg(a)redhat.com>
uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn()
Eric Biggers <ebiggers(a)google.com>
crypto: x86/salsa20 - remove x86 salsa20 implementations
Keith Busch <keith.busch(a)intel.com>
nvme-pci: Remap CMB SQ entries on every controller reset
Juergen Gross <jgross(a)suse.com>
xen: setup pv irq ops vector earlier
Steve Wise <swise(a)opengridcomputing.com>
iw_cxgb4: correctly enforce the max reg_mr depth
Jon Hunter <jonathanh(a)nvidia.com>
i2c: tegra: Fix NACK error handling
Michael J. Ruhl <michael.j.ruhl(a)intel.com>
IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values
Paul Menzel <pmenzel(a)molgen.mpg.de>
tools build: fix # escaping in .cmd files for future Make
Yandong Zhao <yandong77520(a)gmail.com>
arm64: neon: Fix function may_use_simd() return error status
Randy Dunlap <rdunlap(a)infradead.org>
kbuild: delete INSTALL_FW_PATH from kbuild documentation
Joel Fernandes (Google) <joel(a)joelfernandes.org>
tracing: Reorder display of TGID to be after PID
Michal Hocko <mhocko(a)suse.com>
mm: do not bug_on on incorrect length in __mm_populate()
Oscar Salvador <osalvador(a)suse.de>
fs, elf: make sure to page align bss in load_elf_library
Vlastimil Babka <vbabka(a)suse.cz>
fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
Christian Borntraeger <borntraeger(a)de.ibm.com>
mm: do not drop unused pages when userfaultd is running
Chris Wilson <chris(a)chris-wilson.co.uk>
ALSA: hda - Handle pm failure during hotplug
Hui Wang <hui.wang(a)canonical.com>
ALSA: hda/realtek - two more lenovo models need fixup of MIC_LOCATION
Ming Lei <ming.lei(a)redhat.com>
scsi: megaraid_sas: fix selection of reply queue
Shivasharan S <shivasharan.srikanteshwara(a)broadcom.com>
scsi: megaraid_sas: Create separate functions to allocate ctrl memory
Shivasharan S <shivasharan.srikanteshwara(a)broadcom.com>
scsi: megaraid_sas: replace is_ventura with adapter_type checks
Shivasharan S <shivasharan.srikanteshwara(a)broadcom.com>
scsi: megaraid_sas: replace instance->ctrl_context checks with instance->adapter_type
Shivasharan S <shivasharan.srikanteshwara(a)broadcom.com>
scsi: megaraid_sas: use adapter_type for all gen controllers
Christoph Hellwig <hch(a)lst.de>
genirq/affinity: assign vectors to all possible CPUs
Linus Torvalds <torvalds(a)linux-foundation.org>
Fix up non-directory creation in SGID directories
Christian Brauner <christian.brauner(a)ubuntu.com>
devpts: resolve devpts bind-mounts
Christian Brauner <christian.brauner(a)ubuntu.com>
devpts: hoist out check for DEVPTS_SUPER_MAGIC
Dan Carpenter <dan.carpenter(a)oracle.com>
xhci: xhci-mem: off by one in xhci_stream_id_to_ring()
Nico Sneck <snecknico(a)gmail.com>
usb: quirks: add delay quirks for Corsair Strafe
Johan Hovold <johan(a)kernel.org>
USB: serial: mos7840: fix status-register error handling
Jann Horn <jannh(a)google.com>
USB: yurex: fix out-of-bounds uaccess in read handler
Johan Hovold <johan(a)kernel.org>
USB: serial: keyspan_pda: fix modem-status error handling
Olli Salonen <olli.salonen(a)iki.fi>
USB: serial: cp210x: add another USB ID for Qivicon ZigBee stick
Dan Carpenter <dan.carpenter(a)oracle.com>
USB: serial: ch341: fix type promotion bug in ch341_control_in()
Hans de Goede <hdegoede(a)redhat.com>
ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS
Nadav Amit <namit(a)vmware.com>
vmw_balloon: fix inflation with batching
Damien Le Moal <damien.lemoal(a)wdc.com>
ata: Fix ZBC_OUT all bit handling
Damien Le Moal <damien.lemoal(a)wdc.com>
ata: Fix ZBC_OUT command block check
Ping-Ke Shih <pkshih(a)realtek.com>
staging: r8822be: Fix RTL8822be can't find any wireless AP
Murray McAllister <murray.mcallister(a)insomniasec.com>
staging: rtl8723bs: Prevent an underflow in rtw_check_beacon_data().
Jann Horn <jannh(a)google.com>
ibmasm: don't write out of bounds in read handler
x00270170 <xiaqing17(a)hisilicon.com>
mmc: dw_mmc: fix card threshold control configuration
Stefan Agner <stefan(a)agner.ch>
mmc: sdhci-esdhc-imx: allow 1.8V modes without 100/200MHz pinctrl states
Paul Burton <paul.burton(a)mips.com>
MIPS: Fix ioremap() RAM check
Paul Burton <paul.burton(a)mips.com>
MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
Paul Burton <paul.burton(a)mips.com>
MIPS: Call dump_stack() from show_regs()
Kai Chieh Chuang <kaichieh.chuang(a)mediatek.com>
ASoC: mediatek: preallocate pages use platform device
Sean Young <sean(a)mess.org>
media: rc: mce_kbd decoder: fix stuck keys
-------------
Diffstat:
Documentation/kbuild/kbuild.txt | 9 -
Makefile | 4 +-
arch/arm64/include/asm/simd.h | 19 +-
arch/mips/kernel/process.c | 43 +-
arch/mips/kernel/traps.c | 1 +
arch/mips/mm/ioremap.c | 37 +-
arch/x86/crypto/Makefile | 4 -
arch/x86/crypto/salsa20-i586-asm_32.S | 1114 --------------------
arch/x86/crypto/salsa20-x86_64-asm_64.S | 919 ----------------
arch/x86/crypto/salsa20_glue.c | 116 --
arch/x86/kernel/uprobes.c | 2 +-
arch/x86/xen/enlighten_pv.c | 24 +-
arch/x86/xen/irq.c | 4 +-
crypto/Kconfig | 26 -
drivers/ata/ahci.c | 59 ++
drivers/ata/libata-core.c | 3 +
drivers/ata/libata-scsi.c | 18 +-
drivers/block/loop.c | 79 +-
drivers/block/loop.h | 1 +
drivers/i2c/busses/i2c-tegra.c | 17 +-
drivers/infiniband/Kconfig | 12 +
drivers/infiniband/core/Makefile | 4 +-
drivers/infiniband/hw/cxgb4/mem.c | 2 +-
drivers/infiniband/hw/hfi1/rc.c | 2 +-
drivers/infiniband/hw/hfi1/uc.c | 4 +-
drivers/infiniband/hw/hfi1/ud.c | 4 +-
drivers/infiniband/hw/hfi1/verbs_txreq.c | 4 +-
drivers/infiniband/hw/hfi1/verbs_txreq.h | 4 +-
drivers/media/rc/ir-mce_kbd-decoder.c | 2 +
drivers/misc/ibmasm/ibmasmfs.c | 27 +-
drivers/misc/vmw_balloon.c | 4 +-
drivers/mmc/host/dw_mmc.c | 7 +-
drivers/mmc/host/sdhci-esdhc-imx.c | 21 +-
drivers/nvme/host/pci.c | 27 +-
drivers/scsi/megaraid/megaraid_sas.h | 10 +-
drivers/scsi/megaraid/megaraid_sas_base.c | 296 ++++--
drivers/scsi/megaraid/megaraid_sas_fp.c | 20 +-
drivers/scsi/megaraid/megaraid_sas_fusion.c | 54 +-
drivers/scsi/megaraid/megaraid_sas_fusion.h | 7 -
drivers/staging/rtl8723bs/core/rtw_ap.c | 2 +-
drivers/staging/rtlwifi/rtl8822be/hw.c | 2 +-
drivers/staging/rtlwifi/wifi.h | 1 +
drivers/usb/core/quirks.c | 4 +
drivers/usb/host/xhci-mem.c | 2 +-
drivers/usb/misc/yurex.c | 23 +-
drivers/usb/serial/ch341.c | 2 +-
drivers/usb/serial/cp210x.c | 1 +
drivers/usb/serial/keyspan_pda.c | 4 +-
drivers/usb/serial/mos7840.c | 3 +
fs/binfmt_elf.c | 5 +-
fs/devpts/inode.c | 48 +-
fs/f2fs/f2fs.h | 13 +-
fs/f2fs/inode.c | 13 +-
fs/f2fs/node.c | 21 +-
fs/inode.c | 6 +
fs/proc/task_mmu.c | 2 +-
include/linux/libata.h | 1 +
kernel/irq/affinity.c | 30 +-
kernel/power/user.c | 5 +
kernel/trace/trace.c | 8 +-
kernel/trace/trace_output.c | 5 +-
mm/gup.c | 2 -
mm/mmap.c | 29 +-
mm/rmap.c | 8 +-
net/bridge/netfilter/ebtables.c | 2 +
net/ipv4/netfilter/ip_tables.c | 1 +
net/ipv6/netfilter/ip6_tables.c | 1 +
net/netfilter/nfnetlink_queue.c | 3 +
sound/pci/hda/patch_hdmi.c | 19 +-
sound/pci/hda/patch_realtek.c | 6 +-
.../soc/mediatek/common/mtk-afe-platform-driver.c | 4 +-
tools/build/Build.include | 4 +-
72 files changed, 665 insertions(+), 2625 deletions(-)
SEV guest fails to update the UEFI runtime variables stored in the
flash. commit 1379edd59673 ("x86/efi: Access EFI data as encrypted
when SEV is active") unconditionally maps all the UEFI runtime data
as 'encrypted' (C=1). When SEV is active the UEFI runtime data marked
as EFI_MEMORY_MAPPED_IO should be mapped as 'unencrypted' so that both
guest and hypervisor can access the data.
Fixes: 1379edd59673 (x86/efi: Access EFI data as encrypted ...)
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: linux-efi(a)vger.kernel.org
Cc: kvm(a)vger.kernel.org
Cc: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Cc: Matt Fleming <matt(a)codeblueprint.co.uk>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # 4.15.x
Signed-off-by: Brijesh Singh <brijesh.singh(a)amd.com>
---
arch/x86/platform/efi/efi_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 77873ce..5f2eb32 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -417,7 +417,7 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
- if (sev_active())
+ if (sev_active() && md->type != EFI_MEMORY_MAPPED_IO)
flags |= _PAGE_ENC;
pfn = md->phys_addr >> PAGE_SHIFT;
--
2.7.4
The patch titled
Subject: mm: do not bug_on on incorrect length in __mm_populate()
has been removed from the -mm tree. Its filename was
mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm: do not bug_on on incorrect length in __mm_populate()
syzbot has noticed that a specially crafted library can easily hit
VM_BUG_ON in __mm_populate
localhost login: [ 81.210241] emacs (9634) used greatest stack depth: 10416 bytes left
[ 140.099935] ------------[ cut here ]------------
[ 140.101904] kernel BUG at mm/gup.c:1242!
[ 140.103572] invalid opcode: 0000 [#1] SMP
[ 140.105220] CPU: 2 PID: 9667 Comm: a.out Not tainted 4.18.0-rc3 #644
[ 140.107762] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 140.112000] RIP: 0010:__mm_populate+0x1e2/0x1f0
[ 140.113875] Code: 55 d0 65 48 33 14 25 28 00 00 00 89 d8 75 21 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 75 18 f1 ff 0f 0b e8 6e 18 f1 ff <0f> 0b 31 db eb c9 e8 93 06 e0 ff 0f 1f 00 55 48 89 e5 53 48 89 fb
[ 140.121403] RSP: 0018:ffffc90000dffd78 EFLAGS: 00010293
[ 140.123516] RAX: ffff8801366c63c0 RBX: 000000007bf81000 RCX: ffffffff813e4ee2
[ 140.126352] RDX: 0000000000000000 RSI: 0000000000007676 RDI: 000000007bf81000
[ 140.129236] RBP: ffffc90000dffdc0 R08: 0000000000000000 R09: 0000000000000000
[ 140.132110] R10: ffff880135895c80 R11: 0000000000000000 R12: 0000000000007676
[ 140.134955] R13: 0000000000008000 R14: 0000000000000000 R15: 0000000000007676
[ 140.137785] FS: 0000000000000000(0000) GS:ffff88013a680000(0063) knlGS:00000000f7db9700
[ 140.140998] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 140.143303] CR2: 00000000f7ea56e0 CR3: 0000000134674004 CR4: 00000000000606e0
[ 140.145906] Call Trace:
[ 140.146728] vm_brk_flags+0xc3/0x100
[ 140.147830] vm_brk+0x1f/0x30
[ 140.148714] load_elf_library+0x281/0x2e0
[ 140.149875] __ia32_sys_uselib+0x170/0x1e0
[ 140.151028] ? copy_overflow+0x30/0x30
[ 140.152105] ? __ia32_sys_uselib+0x170/0x1e0
[ 140.153301] do_fast_syscall_32+0xca/0x420
[ 140.154455] entry_SYSENTER_compat+0x70/0x7f
The reason is that the length of the new brk is not page aligned when we
try to populate the it. There is no reason to bug on that though.
do_brk_flags already aligns the length properly so the mapping is expanded
as it should. All we need is to tell mm_populate about it. Besides that
there is absolutely no reason to to bug_on in the first place. The worst
thing that could happen is that the last page wouldn't get populated and
that is far from putting system into an inconsistent state.
Fix the issue by moving the length sanitization code from do_brk_flags up
to vm_brk_flags. The only other caller of do_brk_flags is brk syscall
entry and it makes sure to provide the proper length so t here is no need
for sanitation and so we can use do_brk_flags without it.
Also remove the bogus BUG_ONs.
[osalvador(a)techadventures.net: fix up vm_brk_flags s@request@len@]
Link: http://lkml.kernel.org/r/20180706090217.GI32658@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Reported-by: syzbot <syzbot+5dcb560fe12aa5091c06(a)syzkaller.appspotmail.com>
Tested-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Zi Yan <zi.yan(a)cs.rutgers.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michael S. Tsirkin <mst(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 2 --
mm/mmap.c | 29 ++++++++++++-----------------
2 files changed, 12 insertions(+), 19 deletions(-)
diff -puN mm/gup.c~mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate mm/gup.c
--- a/mm/gup.c~mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate
+++ a/mm/gup.c
@@ -1238,8 +1238,6 @@ int __mm_populate(unsigned long start, u
int locked = 0;
long ret = 0;
- VM_BUG_ON(start & ~PAGE_MASK);
- VM_BUG_ON(len != PAGE_ALIGN(len));
end = start + len;
for (nstart = start; nstart < end; nstart = nend) {
diff -puN mm/mmap.c~mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate mm/mmap.c
--- a/mm/mmap.c~mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate
+++ a/mm/mmap.c
@@ -186,8 +186,8 @@ static struct vm_area_struct *remove_vma
return next;
}
-static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf);
-
+static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long flags,
+ struct list_head *uf);
SYSCALL_DEFINE1(brk, unsigned long, brk)
{
unsigned long retval;
@@ -245,7 +245,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
goto out;
/* Ok, looks good - let it rip. */
- if (do_brk(oldbrk, newbrk-oldbrk, &uf) < 0)
+ if (do_brk_flags(oldbrk, newbrk-oldbrk, 0, &uf) < 0)
goto out;
set_brk:
@@ -2929,21 +2929,14 @@ static inline void verify_mm_writelocked
* anonymous maps. eventually we may be able to do some
* brk-specific accounting here.
*/
-static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long flags, struct list_head *uf)
+static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long flags, struct list_head *uf)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev;
- unsigned long len;
struct rb_node **rb_link, *rb_parent;
pgoff_t pgoff = addr >> PAGE_SHIFT;
int error;
- len = PAGE_ALIGN(request);
- if (len < request)
- return -ENOMEM;
- if (!len)
- return 0;
-
/* Until we need other flags, refuse anything except VM_EXEC. */
if ((flags & (~VM_EXEC)) != 0)
return -EINVAL;
@@ -3015,18 +3008,20 @@ out:
return 0;
}
-static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf)
-{
- return do_brk_flags(addr, len, 0, uf);
-}
-
-int vm_brk_flags(unsigned long addr, unsigned long len, unsigned long flags)
+int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
{
struct mm_struct *mm = current->mm;
+ unsigned long len;
int ret;
bool populate;
LIST_HEAD(uf);
+ len = PAGE_ALIGN(request);
+ if (len < request)
+ return -ENOMEM;
+ if (!len)
+ return 0;
+
if (down_write_killable(&mm->mmap_sem))
return -EINTR;
_
Patches currently in -mm which might be from mhocko(a)suse.com are
mm-drop-vm_bug_on-from-__get_free_pages.patch
memcg-oom-move-out_of_memory-back-to-the-charge-path.patch
mm-oom-remove-sleep-from-under-oom_lock.patch
mm-oom-document-oom_lock.patch
mm-oom-docs-describe-the-cgroup-aware-oom-killer-fix-2.patch
The patch titled
Subject: fs, elf: make sure to page align bss in load_elf_library
has been removed from the -mm tree. Its filename was
fs-elf-make-sure-to-page-align-bss-in-load_elf_library.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Oscar Salvador <osalvador(a)suse.de>
Subject: fs, elf: make sure to page align bss in load_elf_library
The current code does not make sure to page align bss before calling
vm_brk(), and this can lead to a VM_BUG_ON() in __mm_populate()
due to the requested lenght not being correctly aligned.
Let us make sure to align it properly.
Kees: only applicable to CONFIG_USELIB kernels: 32-bit and configured for
libc5.
Link: http://lkml.kernel.org/r/20180705145539.9627-1-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador(a)suse.de>
Reported-by: syzbot+5dcb560fe12aa5091c06(a)syzkaller.appspotmail.com
Tested-by: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Acked-by: Kees Cook <keescook(a)chromium.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Nicolas Pitre <nicolas.pitre(a)linaro.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/binfmt_elf.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff -puN fs/binfmt_elf.c~fs-elf-make-sure-to-page-align-bss-in-load_elf_library fs/binfmt_elf.c
--- a/fs/binfmt_elf.c~fs-elf-make-sure-to-page-align-bss-in-load_elf_library
+++ a/fs/binfmt_elf.c
@@ -1259,9 +1259,8 @@ static int load_elf_library(struct file
goto out_free_ph;
}
- len = ELF_PAGESTART(eppnt->p_filesz + eppnt->p_vaddr +
- ELF_MIN_ALIGN - 1);
- bss = eppnt->p_memsz + eppnt->p_vaddr;
+ len = ELF_PAGEALIGN(eppnt->p_filesz + eppnt->p_vaddr);
+ bss = ELF_PAGEALIGN(eppnt->p_memsz + eppnt->p_vaddr);
if (bss > len) {
error = vm_brk(len, bss - len);
if (error)
_
Patches currently in -mm which might be from osalvador(a)suse.de are
mm-memory_hotplug-make-add_memory_resource-use-__try_online_node.patch
mm-memory_hotplug-call-register_mem_sect_under_node.patch
mm-memory_hotplug-make-register_mem_sect_under_node-a-cb-of-walk_memory_range.patch
mm-memory_hotplug-drop-unnecessary-checks-from-register_mem_sect_under_node.patch
mm-sparse-make-sparse_init_one_section-void-and-remove-check.patch
The patch titled
Subject: x86/purgatory: add missing FORCE to Makefile target
has been removed from the -mm tree. Its filename was
x86-purgatory-add-missing-force-to-makefile-target.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Philipp Rudo <prudo(a)linux.ibm.com>
Subject: x86/purgatory: add missing FORCE to Makefile target
- Build the kernel without the fix
- Add some flag to the purgatories KBUILD_CFLAGS,I used
-fno-asynchronous-unwind-tables
- Re-build the kernel
When you look at makes output you see that sha256.o is not re-build in the
last step. Also readelf -S still shows the .eh_frame section for
sha256.o.
With the fix sha256.o is rebuilt in the last step.
Without FORCE make does not detect changes only made to the command line
options. So object files might not be re-built even when they should be.
Fix this by adding FORCE where it is missing.
Link: http://lkml.kernel.org/r/20180704110044.29279-2-prudo@linux.ibm.com
Fixes: df6f2801f511 ("kernel/kexec_file.c: move purgatories sha256 to common code")
Signed-off-by: Philipp Rudo <prudo(a)linux.ibm.com>
Acked-by: Dave Young <dyoung(a)redhat.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org> [4.17+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/purgatory/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN arch/x86/purgatory/Makefile~x86-purgatory-add-missing-force-to-makefile-target arch/x86/purgatory/Makefile
--- a/arch/x86/purgatory/Makefile~x86-purgatory-add-missing-force-to-makefile-target
+++ a/arch/x86/purgatory/Makefile
@@ -6,7 +6,7 @@ purgatory-y := purgatory.o stack.o setup
targets += $(purgatory-y)
PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
-$(obj)/sha256.o: $(srctree)/lib/sha256.c
+$(obj)/sha256.o: $(srctree)/lib/sha256.c FORCE
$(call if_changed_rule,cc_o_c)
LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined -nostdlib -z nodefaultlib
_
Patches currently in -mm which might be from prudo(a)linux.ibm.com are
The patch titled
Subject: fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
has been removed from the -mm tree. Its filename was
mm-fix-locked-field-in-proc-pid-smaps.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
Thomas reports:
: While looking around in /proc on my v4.14.52 system I noticed that
: all processes got a lot of "Locked" memory in /proc/*/smaps. A lot
: more memory than a regular user can usually lock with mlock().
:
: commit 493b0e9d945fa9dfe96be93ae41b4ca4b6fdb317 (v4.14-rc1) seems
: to have changed the behavior of "Locked".
:
: Before that commit the code was like this. Notice the VM_LOCKED
: check.
:
: (vma->vm_flags & VM_LOCKED) ?
: (unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0);
:
: After that commit Locked is now the same as Pss. This looks like a
: mistake.
:
: (unsigned long)(mss->pss >> (10 + PSS_SHIFT)));
Indeed, the commit has added mss->pss_locked with the correct value that
depends on VM_LOCKED, but forgot to actually use it. Fix it.
Link: http://lkml.kernel.org/r/ebf6c7fb-fec3-6a26-544f-710ed193c154@suse.cz
Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Thomas Lindroth <thomas.lindroth(a)gmail.com>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Daniel Colascione <dancol(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff -puN fs/proc/task_mmu.c~mm-fix-locked-field-in-proc-pid-smaps fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~mm-fix-locked-field-in-proc-pid-smaps
+++ a/fs/proc/task_mmu.c
@@ -831,7 +831,8 @@ static int show_smap(struct seq_file *m,
SEQ_PUT_DEC(" kB\nSwap: ", mss->swap);
SEQ_PUT_DEC(" kB\nSwapPss: ",
mss->swap_pss >> PSS_SHIFT);
- SEQ_PUT_DEC(" kB\nLocked: ", mss->pss >> PSS_SHIFT);
+ SEQ_PUT_DEC(" kB\nLocked: ",
+ mss->pss_locked >> PSS_SHIFT);
seq_puts(m, " kB\n");
}
if (!rollup_mode) {
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
mm-page_alloc-actually-ignore-mempolicies-for-high-priority-allocations.patch
The patch titled
Subject: mm: do not drop unused pages when userfaultd is running
has been removed from the -mm tree. Its filename was
mm-do-not-drop-unused-pages-when-userfaultd-is-running.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Christian Borntraeger <borntraeger(a)de.ibm.com>
Subject: mm: do not drop unused pages when userfaultd is running
KVM guests on s390 can notify the host of unused pages. This can result
in pte_unused callbacks to be true for KVM guest memory.
If a page is unused (checked with pte_unused) we might drop this page
instead of paging it. This can have side-effects on userfaultd, when the
page in question was already migrated:
The next access of that page will trigger a fault and a user fault instead
of faulting in a new and empty zero page. As QEMU does not expect a
userfault on an already migrated page this migration will fail.
The most straightforward solution is to ignore the pte_unused hint if a
userfault context is active for this VMA.
Link: http://lkml.kernel.org/r/20180703171854.63981-1-borntraeger@de.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Cornelia Huck <cohuck(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff -puN mm/rmap.c~mm-do-not-drop-unused-pages-when-userfaultd-is-running mm/rmap.c
--- a/mm/rmap.c~mm-do-not-drop-unused-pages-when-userfaultd-is-running
+++ a/mm/rmap.c
@@ -64,6 +64,7 @@
#include <linux/backing-dev.h>
#include <linux/page_idle.h>
#include <linux/memremap.h>
+#include <linux/userfaultfd_k.h>
#include <asm/tlbflush.h>
@@ -1481,11 +1482,16 @@ static bool try_to_unmap_one(struct page
set_pte_at(mm, address, pvmw.pte, pteval);
}
- } else if (pte_unused(pteval)) {
+ } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
/*
* The guest indicated that the page content is of no
* interest anymore. Simply discard the pte, vmscan
* will take care of the rest.
+ * A future reference will then fault in a new zero
+ * page. When userfaultfd is active, we must not drop
+ * this page though, as its main user (postcopy
+ * migration) will not expect userfaults on already
+ * copied pages.
*/
dec_mm_counter(mm, mm_counter(page));
/* We have to invalidate as we cleared the pte */
_
Patches currently in -mm which might be from borntraeger(a)de.ibm.com are
With commit eca0fa28cd0d ("perf record: Provide detailed information on s390 CPU")
s390 platform provides detailed type/model/capacitiy information
in the CPU indentifier string instead of just "IBM/S390".
This breaks perf kvm support which uses hard coded string IBM/S390 to
compare with the CPU identifier string. Fix this by changing the comparison.
Fixes: eca0fa28cd0d ("perf record: Provide detailed information on s390 CPU")
Cc: Stefan Raspl <raspl(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org> # 4.17
Signed-off-by: Thomas Richter <tmricht(a)linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner(a)linux.ibm.com>
---
tools/perf/arch/s390/util/kvm-stat.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/arch/s390/util/kvm-stat.c b/tools/perf/arch/s390/util/kvm-stat.c
index d233e2eb9592..aaabab5e2830 100644
--- a/tools/perf/arch/s390/util/kvm-stat.c
+++ b/tools/perf/arch/s390/util/kvm-stat.c
@@ -102,7 +102,7 @@ const char * const kvm_skip_events[] = {
int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid)
{
- if (strstr(cpuid, "IBM/S390")) {
+ if (strstr(cpuid, "IBM")) {
kvm->exit_reasons = sie_exit_reasons;
kvm->exit_reasons_isa = "SIE";
} else
--
2.14.3
From: David Woodhouse <dwmw(a)amazon.co.uk>
commit def9331a12977770cc6132d79f8e6565871e8e38 upstream
When running as Xen pv guest X86_BUG_SYSRET_SS_ATTRS must not be set
on AMD cpus.
This bug/feature bit is kind of special as it will be used very early
when switching threads. Setting the bit and clearing it a little bit
later leaves a critical window where things can go wrong. This time
window has enlarged a little bit by using setup_clear_cpu_cap() instead
of the hypervisor's set_cpu_features callback. It seems this larger
window now makes it rather easy to hit the problem.
The proper solution is to never set the bit in case of Xen.
Signed-off-by: Juergen Gross <jgross(a)suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Acked-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Juergen Gross <jgross(a)suse.com>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/kernel/cpu/amd.c | 5 +++--
arch/x86/xen/enlighten.c | 4 +---
2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f4fb8f5..9b29414 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -791,8 +791,9 @@ static void init_amd(struct cpuinfo_x86 *c)
if (cpu_has(c, X86_FEATURE_3DNOW) || cpu_has(c, X86_FEATURE_LM))
set_cpu_cap(c, X86_FEATURE_3DNOWPREFETCH);
- /* AMD CPUs don't reset SS attributes on SYSRET */
- set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
+ /* AMD CPUs don't reset SS attributes on SYSRET, Xen does. */
+ if (!cpu_has(c, X86_FEATURE_XENPV))
+ set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
}
#ifdef CONFIG_X86_32
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 2d7ab4e..82fd84d 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -462,10 +462,8 @@ static void __init xen_init_cpuid_mask(void)
static void __init xen_init_capabilities(void)
{
- if (xen_pv_domain()) {
- setup_clear_cpu_cap(X86_BUG_SYSRET_SS_ATTRS);
+ if (xen_pv_domain())
setup_force_cpu_cap(X86_FEATURE_XENPV);
- }
}
static void xen_set_debugreg(int reg, unsigned long val)
On 15/07/18 01:32, kernelci.org bot wrote:
> mainline/master boot: 177 boots: 2 failed, 174 passed with 1 conflict (v4.18-rc4-160-gf353078f028f)
>
> Full Boot Summary: https://kernelci.org/boot/all/job/mainline/branch/master/kernel/v4.18-rc4-1…
> Full Build Summary: https://kernelci.org/build/mainline/branch/master/kernel/v4.18-rc4-160-gf35…
>
> Tree: mainline
> Branch: master
> Git Describe: v4.18-rc4-160-gf353078f028f
> Git Commit: f353078f028fbfe9acd4b747b4a19c69ef6846cd
> Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> Tested: 67 unique boards, 25 SoC families, 21 builds out of 199
>
> Boot Regressions Detected:
[...]
> x86:
>
> i386_defconfig:
> x86-celeron:
> lab-mhart: new failure (last pass: v4.18-rc4-147-g2db39a2f491a)
> x86-pentium4:
> lab-mhart: new failure (last pass: v4.18-rc4-147-g2db39a2f491a)
Please see below an automated bisection report for this
regression. Several bisections were run on other x86 platforms
with i386_defconfig on a few revisions up to v4.18-rc5, they all
reached the same "bad" commit.
Unfortunately there isn't much to learn from the kernelci.org
boot logs as the kernel seems to crash very early on:
https://kernelci.org/boot/all/job/mainline/branch/master/kernel/v4.18-rc5/https://storage.kernelci.org/mainline/master/v4.18-rc4-160-gf353078f028f/x8…
It looks like stable-rc/linux-4.17.y is also broken with
i386_defconfig, which tends to confirm the "bad" commit found by
the automated bisection which was applied there as well:
https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.17.y/kernel/v4.1…
The automated bisection on kernelci.org is still quite new, so
please take the results with a pinch of salt as the "bad" commit
found may not be the actual root cause of the boot failure.
Hope this helps!
Best wishes,
Guillaume
--------------------------------------8<--------------------------------------
Bisection result for mainline/master (v4.18-rc4-160-gf353078f028f) on x86-celeron
Good: 2db39a2f491a Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Bad: f353078f028f Merge branch 'akpm' (patches from Andrew)
Found: e181ae0c5db9 mm: zero unavailable pages before memmap init
Checks:
revert: PASS
verify: PASS
Parameters:
Tree: mainline
URL: http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Branch: master
Target: x86-celeron
Lab: lab-mhart
Config: i386_defconfig
Plan: boot
Breaking commit found:
-------------------------------------------------------------------------------
commit e181ae0c5db9544de9c53239eb22bc012ce75033
Author: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Date: Sat Jul 14 09:15:07 2018 -0400
mm: zero unavailable pages before memmap init
We must zero struct pages for memory that is not backed by physical
memory, or kernel does not have access to.
Recently, there was a change which zeroed all memmap for all holes in
e820. Unfortunately, it introduced a bug that is discussed here:
https://www.spinics.net/lists/linux-mm/msg156764.html
Linus, also saw this bug on his machine, and confirmed that reverting
commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into
memblock.reserved") fixes the issue.
The problem is that we incorrectly zero some struct pages after they
were setup.
The fix is to zero unavailable struct pages prior to initializing of
struct pages.
A more detailed fix should come later that would avoid double zeroing
cases: one in __init_single_page(), the other one in
zero_resv_unavail().
Fixes: 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved")
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..5d800d61ddb7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6847,6 +6847,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
/* Initialise every node */
mminit_verify_pageflags_layout();
setup_nr_node_ids();
+ zero_resv_unavail();
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
free_area_init_node(nid, NULL,
@@ -6857,7 +6858,6 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
node_set_state(nid, N_MEMORY);
check_for_memory(pgdat, nid);
}
- zero_resv_unavail();
}
static int __init cmdline_parse_core(char *p, unsigned long *core,
@@ -7033,9 +7033,9 @@ void __init set_dma_reserve(unsigned long new_dma_reserve)
void __init free_area_init(unsigned long *zones_size)
{
+ zero_resv_unavail();
free_area_init_node(0, zones_size,
__pa(PAGE_OFFSET) >> PAGE_SHIFT, NULL);
- zero_resv_unavail();
}
static int page_alloc_cpu_dead(unsigned int cpu)
-------------------------------------------------------------------------------
Git bisection log:
-------------------------------------------------------------------------------
git bisect start
# good: [2db39a2f491a48ec740e0214a7dd584eefc2137d] Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
git bisect good 2db39a2f491a48ec740e0214a7dd584eefc2137d
# bad: [f353078f028fbfe9acd4b747b4a19c69ef6846cd] Merge branch 'akpm' (patches from Andrew)
git bisect bad f353078f028fbfe9acd4b747b4a19c69ef6846cd
# good: [fa8cbda88db12e632a8987c94b66f5caf25bcec4] x86/purgatory: add missing FORCE to Makefile target
git bisect good fa8cbda88db12e632a8987c94b66f5caf25bcec4
# good: [bb177a732c4369bb58a1fe1df8f552b6f0f7db5f] mm: do not bug_on on incorrect length in __mm_populate()
git bisect good bb177a732c4369bb58a1fe1df8f552b6f0f7db5f
# good: [fe10e398e860955bac4d28ec031b701d358465e4] reiserfs: fix buffer overflow with long warning messages
git bisect good fe10e398e860955bac4d28ec031b701d358465e4
# bad: [e181ae0c5db9544de9c53239eb22bc012ce75033] mm: zero unavailable pages before memmap init
git bisect bad e181ae0c5db9544de9c53239eb22bc012ce75033
# first bad commit: [e181ae0c5db9544de9c53239eb22bc012ce75033] mm: zero unavailable pages before memmap init
-------------------------------------------------------------------------------
We can process 400+ images per day.
If you need any image editing, please let us know.
Photos cut out;
Photos clipping path;
Photos masking;
Photo shadow creation;
Photos retouching;
Beauty Model retouching on skin, face, body;
Glamour retouching;
Products retouching.
We can give you testing for your photos.
Turnaround time is fast
Thanks,
Simon
We can process 400+ images per day.
If you need any image editing, please let us know.
Photos cut out;
Photos clipping path;
Photos masking;
Photo shadow creation;
Photos retouching;
Beauty Model retouching on skin, face, body;
Glamour retouching;
Products retouching.
We can give you testing for your photos.
Turnaround time is fast
Thanks,
Simon
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 03703ed1debf777ea845aa9b50ba2e80a5e7dd3c Mon Sep 17 00:00:00 2001
From: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Date: Fri, 2 Feb 2018 05:08:59 -0500
Subject: [PATCH] media: vb2: core: Finish buffers at the end of the stream
If buffers were prepared or queued and the buffers were released without
starting the queue, the finish mem op (corresponding to the prepare mem
op) was never called to the buffers.
Before commit a136f59c0a1f there was no need to do this as in such a case
the prepare mem op had not been called yet. Address the problem by
explicitly calling finish mem op when the queue is stopped if the buffer
is in either prepared or queued state.
Fixes: a136f59c0a1f ("[media] vb2: Move buffer cache synchronisation to prepare from queue")
Cc: stable(a)vger.kernel.org # for v4.13 and up
Signed-off-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Tested-by: Devin Heitmueller <dheitmueller(a)kernellabs.com>
Signed-off-by: Hans Verkuil <hans.verkuil(a)cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab(a)s-opensource.com>
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index debe35fc66b4..d3f7bb33a54d 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -1696,6 +1696,15 @@ static void __vb2_queue_cancel(struct vb2_queue *q)
for (i = 0; i < q->num_buffers; ++i) {
struct vb2_buffer *vb = q->bufs[i];
+ if (vb->state == VB2_BUF_STATE_PREPARED ||
+ vb->state == VB2_BUF_STATE_QUEUED) {
+ unsigned int plane;
+
+ for (plane = 0; plane < vb->num_planes; ++plane)
+ call_void_memop(vb, finish,
+ vb->planes[plane].mem_priv);
+ }
+
if (vb->state != VB2_BUF_STATE_DEQUEUED) {
vb->state = VB2_BUF_STATE_PREPARED;
call_void_vb_qop(vb, buf_finish, vb);
From: Dewet Thibaut <thibaut.dewet(a)nokia.com>
Previously the min interval limitation (one second) has been introduced.
In this case it was no longer possible to disable the machine check.
This patch removes this limitation and allows the value 0 again.
Cc: stable(a)vger.kernel.org
Fixes: b3b7c4795c ("x86/MCE: Serialize sysfs changes")
Signed-off-by: Dewet Thibaut <thibaut.dewet(a)nokia.com>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)nokia.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index c102ad51025e..8c50754c09c1 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2165,9 +2165,6 @@ static ssize_t store_int_with_restart(struct device *s,
if (check_interval == old_check_interval)
return ret;
- if (check_interval < 1)
- check_interval = 1;
-
mutex_lock(&mce_sysfs_mutex);
mce_restart();
mutex_unlock(&mce_sysfs_mutex);
--
2.18.0
This is a note to let you know that I've just added the patch titled
usb: cdc_acm: Add quirk for Castles VEGA3000
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 1445cbe476fc3dd09c0b380b206526a49403c071 Mon Sep 17 00:00:00 2001
From: Lubomir Rintel <lkundrak(a)v3.sk>
Date: Tue, 10 Jul 2018 08:28:49 +0200
Subject: usb: cdc_acm: Add quirk for Castles VEGA3000
The device (a POS terminal) implements CDC ACM, but has not union
descriptor.
Signed-off-by: Lubomir Rintel <lkundrak(a)v3.sk>
Acked-by: Oliver Neukum <oneukum(a)suse.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/class/cdc-acm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 998b32d0167e..75c4623ad779 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -1831,6 +1831,9 @@ static const struct usb_device_id acm_ids[] = {
{ USB_DEVICE(0x09d8, 0x0320), /* Elatec GmbH TWN3 */
.driver_info = NO_UNION_NORMAL, /* has misplaced union descriptor */
},
+ { USB_DEVICE(0x0ca6, 0xa050), /* Castles VEGA3000 */
+ .driver_info = NO_UNION_NORMAL, /* reports zero length descriptor */
+ },
{ USB_DEVICE(0x2912, 0x0001), /* ATOL FPrint */
.driver_info = CLEAR_HALT_CONDITIONS,
--
2.18.0
This is a note to let you know that I've just added the patch titled
eventpoll.h: wrap casts in () properly
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 45cd74cb5061781e793a098c420a7f548fdc9e7d Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Date: Tue, 10 Jul 2018 17:15:38 +0200
Subject: eventpoll.h: wrap casts in () properly
When importing the latest copy of the kernel headers into Bionic,
Christpher and Elliott noticed that the eventpoll.h casts were not
wrapped in (). As it is, clang complains about macros without
surrounding (), so this makes it a pain for userspace tools.
So fix it up by adding another () pair, and make them line up purty by
using tabs.
Fixes: 65aaf87b3aa2 ("add EPOLLNVAL, annotate EPOLL... and event_poll->event")
Reported-by: Christopher Ferris <cferris(a)google.com>
Reported-by: Elliott Hughes <enh(a)google.com>
Cc: stable <stable(a)vger.kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/uapi/linux/eventpoll.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index bf48e71f2634..8a3432d0f0dc 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -42,7 +42,7 @@
#define EPOLLRDHUP (__force __poll_t)0x00002000
/* Set exclusive wakeup mode for the target file descriptor */
-#define EPOLLEXCLUSIVE (__force __poll_t)(1U << 28)
+#define EPOLLEXCLUSIVE ((__force __poll_t)(1U << 28))
/*
* Request the handling of system wakeup events so as to prevent system suspends
@@ -54,13 +54,13 @@
*
* Requires CAP_BLOCK_SUSPEND
*/
-#define EPOLLWAKEUP (__force __poll_t)(1U << 29)
+#define EPOLLWAKEUP ((__force __poll_t)(1U << 29))
/* Set the One Shot behaviour for the target file descriptor */
-#define EPOLLONESHOT (__force __poll_t)(1U << 30)
+#define EPOLLONESHOT ((__force __poll_t)(1U << 30))
/* Set the Edge Triggered behaviour for the target file descriptor */
-#define EPOLLET (__force __poll_t)(1U << 31)
+#define EPOLLET ((__force __poll_t)(1U << 31))
/*
* On x86-64 make the 64bit structure have the same alignment as the
--
2.18.0
The patch below does not apply to the 4.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 37becec95ac31b209eb1c8e096f1093a7db00f32 Mon Sep 17 00:00:00 2001
From: Omar Sandoval <osandov(a)fb.com>
Date: Mon, 21 May 2018 17:07:19 -0700
Subject: [PATCH] Btrfs: allow empty subvol= again
I got a report that after upgrading to 4.16, someone's filesystems
weren't mounting:
[ 23.845852] BTRFS info (device loop0): unrecognized mount option 'subvol='
Before 4.16, this mounted the default subvolume. It turns out that this
empty "subvol=" is actually an application bug, but it was causing the
application to fail, so it's an ABI break if you squint.
The generic parsing code we use for mount options (match_token())
doesn't match an empty string as "%s". Previously, setup_root_args()
removed the "subvol=" string, but the mount path was cleaned up to not
need that. Add a dummy Opt_subvol_empty to fix this.
The simple workaround is to use / or . for the value of 'subvol=' .
Fixes: 312c89fbca06 ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()")
CC: stable(a)vger.kernel.org # 4.16+
Signed-off-by: Omar Sandoval <osandov(a)fb.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index c67fafaa2fe7..81107ad49f3a 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -323,6 +323,7 @@ enum {
Opt_ssd, Opt_nossd,
Opt_ssd_spread, Opt_nossd_spread,
Opt_subvol,
+ Opt_subvol_empty,
Opt_subvolid,
Opt_thread_pool,
Opt_treelog, Opt_notreelog,
@@ -388,6 +389,7 @@ static const match_table_t tokens = {
{Opt_ssd_spread, "ssd_spread"},
{Opt_nossd_spread, "nossd_spread"},
{Opt_subvol, "subvol=%s"},
+ {Opt_subvol_empty, "subvol="},
{Opt_subvolid, "subvolid=%s"},
{Opt_thread_pool, "thread_pool=%u"},
{Opt_treelog, "treelog"},
@@ -461,6 +463,7 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
btrfs_set_opt(info->mount_opt, DEGRADED);
break;
case Opt_subvol:
+ case Opt_subvol_empty:
case Opt_subvolid:
case Opt_subvolrootid:
case Opt_device:
The regset API documented in <linux/regset.h> defines -ENODEV as the
result of the `->active' handler to be used where the feature requested
is not available on the hardware found. However code handling core file
note generation in `fill_thread_core_info' interpretes any non-zero
result from the `->active' handler as the regset requested being active.
Consequently processing continues (and hopefully gracefully fails later
on) rather than being abandoned right away for the regset requested.
Fix the problem then by making the code proceed only if a positive
result is returned from the `->active' handler.
Cc: stable(a)vger.kernel.org # 2.6.25+
Fixes: 4206d3aa1978 ("elf core dump: notes user_regset")
Signed-off-by: Maciej W. Rozycki <macro(a)mips.com>
---
Hi,
Overall we could also use the count returned by ->active to limit the
size of data requested, i.e. something along the lines of:
ssize_t size;
if (regset->active)
size = regset->active(t->task, regset);
else
size = regset_size(t->task, regset);
if (size > 0) {
...
however that would be an optimisation that belongs to a separate change,
which (due to the need to unload stuff I have in progress already) I am
not going to make at this point. Perhaps someone else would be willing
to pick up the idea.
Anyway, please apply.
Maciej
---
fs/binfmt_elf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
linux-elf-core-regset-active.diff
Index: linux-jhogan-test/fs/binfmt_elf.c
===================================================================
--- linux-jhogan-test.orig/fs/binfmt_elf.c 2018-03-21 17:14:55.000000000 +0000
+++ linux-jhogan-test/fs/binfmt_elf.c 2018-05-09 23:25:50.742255000 +0100
@@ -1739,7 +1739,7 @@ static int fill_thread_core_info(struct
const struct user_regset *regset = &view->regsets[i];
do_thread_regset_writeback(t->task, regset);
if (regset->core_note_type && regset->get &&
- (!regset->active || regset->active(t->task, regset))) {
+ (!regset->active || regset->active(t->task, regset) > 0)) {
int ret;
size_t size = regset_size(t->task, regset);
void *data = kmalloc(size, GFP_KERNEL);
From: Waldemar Rymarkiewicz <waldemar.rymarkiewicz(a)gmail.com>
Original commit c5c2a97b3ac7 ("PM / OPP: Update voltage in case freq ==
old_freq").
This commit fixes a rare but possible case when the clk rate is updated
without update of the regulator voltage.
At boot up, CPUfreq checks if the system is running at the right freq. This
is a sanity check in case a bootloader set clk rate that is outside of freq
table present with cpufreq core. In such cases system can be unstable so
better to change it to a freq that is preset in freq-table.
The CPUfreq takes next freq that is >= policy->cur and this is our
target_freq that needs to be set now.
dev_pm_opp_set_rate(dev, target_freq) checks the target_freq and the
old_freq (a current rate). If these are equal it returns early. If not,
it searches for OPP (old_opp) that fits best to old_freq (not listed in
the table) and updates old_freq (!).
Here, we can end up with old_freq = old_opp.rate = target_freq, which
is not handled in _generic_set_opp_regulator(). It's supposed to update
voltage only when freq > old_freq || freq > old_freq.
if (freq > old_freq) {
ret = _set_opp_voltage(dev, reg, new_supply);
[...]
if (freq < old_freq) {
ret = _set_opp_voltage(dev, reg, new_supply);
if (ret)
It results in, no voltage update while clk rate is updated.
Example:
freq-table = {
1000MHz 1.15V
666MHZ 1.10V
333MHz 1.05V
}
boot-up-freq = 800MHz # not listed in freq-table
freq = target_freq = 1GHz
old_freq = 800Mhz
old_opp = _find_freq_ceil(opp_table, &old_freq); #(old_freq is modified!)
old_freq = 1GHz
Fixes: 6a0712f6f199 ("PM / OPP: Add dev_pm_opp_set_rate()")
Cc: 4.6+ <stable(a)vger.kernel.org> # v4.6+
Signed-off-by: Waldemar Rymarkiewicz <waldemar.rymarkiewicz(a)gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
Sending it for stable kernels from 4.6 until 4.9.
drivers/base/power/opp/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
index 4c7c6da7a989..0580cbe5bd1c 100644
--- a/drivers/base/power/opp/core.c
+++ b/drivers/base/power/opp/core.c
@@ -642,7 +642,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
rcu_read_unlock();
/* Scaling up? Scale voltage before frequency */
- if (freq > old_freq) {
+ if (freq >= old_freq) {
ret = _set_opp_voltage(dev, reg, u_volt, u_volt_min,
u_volt_max);
if (ret)
--
2.18.0.rc1.242.g61856ae69a2c
This is a note to let you know that I've just added the patch titled
eventpoll.h: wrap casts in () properly
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 45cd74cb5061781e793a098c420a7f548fdc9e7d Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Date: Tue, 10 Jul 2018 17:15:38 +0200
Subject: eventpoll.h: wrap casts in () properly
When importing the latest copy of the kernel headers into Bionic,
Christpher and Elliott noticed that the eventpoll.h casts were not
wrapped in (). As it is, clang complains about macros without
surrounding (), so this makes it a pain for userspace tools.
So fix it up by adding another () pair, and make them line up purty by
using tabs.
Fixes: 65aaf87b3aa2 ("add EPOLLNVAL, annotate EPOLL... and event_poll->event")
Reported-by: Christopher Ferris <cferris(a)google.com>
Reported-by: Elliott Hughes <enh(a)google.com>
Cc: stable <stable(a)vger.kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/uapi/linux/eventpoll.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index bf48e71f2634..8a3432d0f0dc 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -42,7 +42,7 @@
#define EPOLLRDHUP (__force __poll_t)0x00002000
/* Set exclusive wakeup mode for the target file descriptor */
-#define EPOLLEXCLUSIVE (__force __poll_t)(1U << 28)
+#define EPOLLEXCLUSIVE ((__force __poll_t)(1U << 28))
/*
* Request the handling of system wakeup events so as to prevent system suspends
@@ -54,13 +54,13 @@
*
* Requires CAP_BLOCK_SUSPEND
*/
-#define EPOLLWAKEUP (__force __poll_t)(1U << 29)
+#define EPOLLWAKEUP ((__force __poll_t)(1U << 29))
/* Set the One Shot behaviour for the target file descriptor */
-#define EPOLLONESHOT (__force __poll_t)(1U << 30)
+#define EPOLLONESHOT ((__force __poll_t)(1U << 30))
/* Set the Edge Triggered behaviour for the target file descriptor */
-#define EPOLLET (__force __poll_t)(1U << 31)
+#define EPOLLET ((__force __poll_t)(1U << 31))
/*
* On x86-64 make the 64bit structure have the same alignment as the
--
2.18.0
Commit 815c6704bf9f1c59f3a6be380a4032b9c57b12f1 upstream.
The controller memory buffer is remapped into a kernel address on each
reset, but the driver was setting the submission queue base address
only on the very first queue creation. The remapped address is likely to
change after a reset, so accessing the old address will hit a kernel bug.
This patch fixes that by setting the queue's CMB base address each time
the queue is created.
Fixes: f63572dff1421 ("nvme: unmap CMB and remove sysfs file in reset path")
Reported-by: Christian Black <christian.d.black(a)intel.com>
Cc: Jon Derrick <jonathan.derrick(a)intel.com>
Signed-off-by: Keith Busch <keith.busch(a)intel.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Scott Bauer <scott.bauer(a)intel.com>
---
drivers/nvme/host/pci.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3d4724e38aa9..4cac4755abef 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1233,17 +1233,15 @@ static int nvme_cmb_qdepth(struct nvme_dev *dev, int nr_io_queues,
static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
int qid, int depth)
{
- if (qid && dev->cmb && use_cmb_sqes && NVME_CMB_SQS(dev->cmbsz)) {
- unsigned offset = (qid - 1) * roundup(SQ_SIZE(depth),
- dev->ctrl.page_size);
- nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset;
- nvmeq->sq_cmds_io = dev->cmb + offset;
- } else {
- nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
- &nvmeq->sq_dma_addr, GFP_KERNEL);
- if (!nvmeq->sq_cmds)
- return -ENOMEM;
- }
+
+ /* CMB SQEs will be mapped before creation */
+ if (qid && dev->cmb && use_cmb_sqes && NVME_CMB_SQS(dev->cmbsz))
+ return 0;
+
+ nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
+ &nvmeq->sq_dma_addr, GFP_KERNEL);
+ if (!nvmeq->sq_cmds)
+ return -ENOMEM;
return 0;
}
@@ -1320,6 +1318,13 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
struct nvme_dev *dev = nvmeq->dev;
int result;
+ if (qid && dev->cmb && use_cmb_sqes && NVME_CMB_SQS(dev->cmbsz)) {
+ unsigned offset = (qid - 1) * roundup(SQ_SIZE(nvmeq->q_depth),
+ dev->ctrl.page_size);
+ nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset;
+ nvmeq->sq_cmds_io = dev->cmb + offset;
+ }
+
nvmeq->cq_vector = qid - 1;
result = adapter_alloc_cq(dev, qid, nvmeq);
if (result < 0)
--
2.17.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From abe41184abac487264a4904bfcff2d5500dccce6 Mon Sep 17 00:00:00 2001
From: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
Date: Tue, 10 Jul 2018 23:42:15 +0200
Subject: [PATCH] i2c: recovery: if possible send STOP with recovery pulses
I2C clients may misunderstand recovery pulses if they can't read SDA to
bail out early. In the worst case, as a write operation. To avoid that
and if we can write SDA, try to send STOP to avoid the
misinterpretation.
Signed-off-by: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
Reviewed-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Wolfram Sang <wsa(a)the-dreams.de>
Cc: stable(a)kernel.org
diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c
index 31d16ada6e7d..301285c54603 100644
--- a/drivers/i2c/i2c-core-base.c
+++ b/drivers/i2c/i2c-core-base.c
@@ -198,7 +198,16 @@ int i2c_generic_scl_recovery(struct i2c_adapter *adap)
val = !val;
bri->set_scl(adap, val);
- ndelay(RECOVERY_NDELAY);
+
+ /*
+ * If we can set SDA, we will always create STOP here to ensure
+ * the additional pulses will do no harm. This is achieved by
+ * letting SDA follow SCL half a cycle later.
+ */
+ ndelay(RECOVERY_NDELAY / 2);
+ if (bri->set_sda)
+ bri->set_sda(adap, val);
+ ndelay(RECOVERY_NDELAY / 2);
}
/* check if recovery actually succeeded */
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b697d7d8c741f27b728a878fc55852b06d0f6f5e Mon Sep 17 00:00:00 2001
From: "Michael J. Ruhl" <michael.j.ruhl(a)intel.com>
Date: Wed, 20 Jun 2018 09:29:08 -0700
Subject: [PATCH] IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return
values
The __get_txreq() function can return a pointer, ERR_PTR(-EBUSY), or NULL.
All of the relevant call sites look for IS_ERR, so the NULL return would
lead to a NULL pointer exception.
Do not use the ERR_PTR mechanism for this function.
Update all call sites to handle the return value correctly.
Clean up error paths to reflect return value.
Fixes: 45842abbb292 ("staging/rdma/hfi1: move txreq header code")
Cc: <stable(a)vger.kernel.org> # 4.9.x+
Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam(a)intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/hw/hfi1/rc.c b/drivers/infiniband/hw/hfi1/rc.c
index 1a1a47ac53c6..f15c93102081 100644
--- a/drivers/infiniband/hw/hfi1/rc.c
+++ b/drivers/infiniband/hw/hfi1/rc.c
@@ -271,7 +271,7 @@ int hfi1_make_rc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
lockdep_assert_held(&qp->s_lock);
ps->s_txreq = get_txreq(ps->dev, qp);
- if (IS_ERR(ps->s_txreq))
+ if (!ps->s_txreq)
goto bail_no_tx;
if (priv->hdr_type == HFI1_PKT_TYPE_9B) {
diff --git a/drivers/infiniband/hw/hfi1/uc.c b/drivers/infiniband/hw/hfi1/uc.c
index b7b671017e59..e254dcec6f64 100644
--- a/drivers/infiniband/hw/hfi1/uc.c
+++ b/drivers/infiniband/hw/hfi1/uc.c
@@ -1,5 +1,5 @@
/*
- * Copyright(c) 2015, 2016 Intel Corporation.
+ * Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@@ -72,7 +72,7 @@ int hfi1_make_uc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
int middle = 0;
ps->s_txreq = get_txreq(ps->dev, qp);
- if (IS_ERR(ps->s_txreq))
+ if (!ps->s_txreq)
goto bail_no_tx;
if (!(ib_rvt_state_ops[qp->state] & RVT_PROCESS_SEND_OK)) {
diff --git a/drivers/infiniband/hw/hfi1/ud.c b/drivers/infiniband/hw/hfi1/ud.c
index 1ab332f1866e..70d39fc450a1 100644
--- a/drivers/infiniband/hw/hfi1/ud.c
+++ b/drivers/infiniband/hw/hfi1/ud.c
@@ -1,5 +1,5 @@
/*
- * Copyright(c) 2015, 2016 Intel Corporation.
+ * Copyright(c) 2015 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@@ -503,7 +503,7 @@ int hfi1_make_ud_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps)
u32 lid;
ps->s_txreq = get_txreq(ps->dev, qp);
- if (IS_ERR(ps->s_txreq))
+ if (!ps->s_txreq)
goto bail_no_tx;
if (!(ib_rvt_state_ops[qp->state] & RVT_PROCESS_NEXT_SEND_OK)) {
diff --git a/drivers/infiniband/hw/hfi1/verbs_txreq.c b/drivers/infiniband/hw/hfi1/verbs_txreq.c
index 873e48ea923f..c4ab2d5b4502 100644
--- a/drivers/infiniband/hw/hfi1/verbs_txreq.c
+++ b/drivers/infiniband/hw/hfi1/verbs_txreq.c
@@ -1,5 +1,5 @@
/*
- * Copyright(c) 2016 - 2017 Intel Corporation.
+ * Copyright(c) 2016 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@@ -94,7 +94,7 @@ struct verbs_txreq *__get_txreq(struct hfi1_ibdev *dev,
struct rvt_qp *qp)
__must_hold(&qp->s_lock)
{
- struct verbs_txreq *tx = ERR_PTR(-EBUSY);
+ struct verbs_txreq *tx = NULL;
write_seqlock(&dev->txwait_lock);
if (ib_rvt_state_ops[qp->state] & RVT_PROCESS_RECV_OK) {
diff --git a/drivers/infiniband/hw/hfi1/verbs_txreq.h b/drivers/infiniband/hw/hfi1/verbs_txreq.h
index 729244c3086c..1c19bbc764b2 100644
--- a/drivers/infiniband/hw/hfi1/verbs_txreq.h
+++ b/drivers/infiniband/hw/hfi1/verbs_txreq.h
@@ -1,5 +1,5 @@
/*
- * Copyright(c) 2016 Intel Corporation.
+ * Copyright(c) 2016 - 2018 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@@ -83,7 +83,7 @@ static inline struct verbs_txreq *get_txreq(struct hfi1_ibdev *dev,
if (unlikely(!tx)) {
/* call slow path to get the lock */
tx = __get_txreq(dev, qp);
- if (IS_ERR(tx))
+ if (!tx)
return tx;
}
tx->qp = qp;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fe48aecb4df837540f13b5216f27ddb306aaf4b9 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro(a)mellanox.com>
Date: Sun, 1 Jul 2018 15:31:54 +0300
Subject: [PATCH] RDMA/uverbs: Don't fail in creation of multiple flows
The conversion from offsetof() calculations to sizeof()
wrongly behaved for missed exact size and in scenario with
more than one flow.
In such scenario we got "create flow failed, flow 10: 8 bytes
left from uverb cmd" error, which is wrong because the size of
kern_spec is exactly 8 bytes, and we were not supposed to fail.
Cc: <stable(a)vger.kernel.org> # 3.12
Fixes: 4fae7f170416 ("RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow")
Reported-by: Ran Rozenstein <ranro(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 87ffeebc0b28..cc06e8404e9b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3586,7 +3586,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
kern_spec = kern_flow_attr->flow_specs;
ib_spec = flow_attr + 1;
for (i = 0; i < flow_attr->num_of_specs &&
- cmd.flow_attr.size > sizeof(*kern_spec) &&
+ cmd.flow_attr.size >= sizeof(*kern_spec) &&
cmd.flow_attr.size >= kern_spec->size;
i++) {
err = kern_spec_to_ib_spec(
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fe48aecb4df837540f13b5216f27ddb306aaf4b9 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro(a)mellanox.com>
Date: Sun, 1 Jul 2018 15:31:54 +0300
Subject: [PATCH] RDMA/uverbs: Don't fail in creation of multiple flows
The conversion from offsetof() calculations to sizeof()
wrongly behaved for missed exact size and in scenario with
more than one flow.
In such scenario we got "create flow failed, flow 10: 8 bytes
left from uverb cmd" error, which is wrong because the size of
kern_spec is exactly 8 bytes, and we were not supposed to fail.
Cc: <stable(a)vger.kernel.org> # 3.12
Fixes: 4fae7f170416 ("RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow")
Reported-by: Ran Rozenstein <ranro(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 87ffeebc0b28..cc06e8404e9b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3586,7 +3586,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
kern_spec = kern_flow_attr->flow_specs;
ib_spec = flow_attr + 1;
for (i = 0; i < flow_attr->num_of_specs &&
- cmd.flow_attr.size > sizeof(*kern_spec) &&
+ cmd.flow_attr.size >= sizeof(*kern_spec) &&
cmd.flow_attr.size >= kern_spec->size;
i++) {
err = kern_spec_to_ib_spec(
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fe48aecb4df837540f13b5216f27ddb306aaf4b9 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro(a)mellanox.com>
Date: Sun, 1 Jul 2018 15:31:54 +0300
Subject: [PATCH] RDMA/uverbs: Don't fail in creation of multiple flows
The conversion from offsetof() calculations to sizeof()
wrongly behaved for missed exact size and in scenario with
more than one flow.
In such scenario we got "create flow failed, flow 10: 8 bytes
left from uverb cmd" error, which is wrong because the size of
kern_spec is exactly 8 bytes, and we were not supposed to fail.
Cc: <stable(a)vger.kernel.org> # 3.12
Fixes: 4fae7f170416 ("RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow")
Reported-by: Ran Rozenstein <ranro(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 87ffeebc0b28..cc06e8404e9b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3586,7 +3586,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
kern_spec = kern_flow_attr->flow_specs;
ib_spec = flow_attr + 1;
for (i = 0; i < flow_attr->num_of_specs &&
- cmd.flow_attr.size > sizeof(*kern_spec) &&
+ cmd.flow_attr.size >= sizeof(*kern_spec) &&
cmd.flow_attr.size >= kern_spec->size;
i++) {
err = kern_spec_to_ib_spec(
The patch below does not apply to the 4.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fe48aecb4df837540f13b5216f27ddb306aaf4b9 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro(a)mellanox.com>
Date: Sun, 1 Jul 2018 15:31:54 +0300
Subject: [PATCH] RDMA/uverbs: Don't fail in creation of multiple flows
The conversion from offsetof() calculations to sizeof()
wrongly behaved for missed exact size and in scenario with
more than one flow.
In such scenario we got "create flow failed, flow 10: 8 bytes
left from uverb cmd" error, which is wrong because the size of
kern_spec is exactly 8 bytes, and we were not supposed to fail.
Cc: <stable(a)vger.kernel.org> # 3.12
Fixes: 4fae7f170416 ("RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow")
Reported-by: Ran Rozenstein <ranro(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 87ffeebc0b28..cc06e8404e9b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3586,7 +3586,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
kern_spec = kern_flow_attr->flow_specs;
ib_spec = flow_attr + 1;
for (i = 0; i < flow_attr->num_of_specs &&
- cmd.flow_attr.size > sizeof(*kern_spec) &&
+ cmd.flow_attr.size >= sizeof(*kern_spec) &&
cmd.flow_attr.size >= kern_spec->size;
i++) {
err = kern_spec_to_ib_spec(
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 940efcc8889f0d15567eb07fc9fd69b06e366aa5 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro(a)mellanox.com>
Date: Sun, 24 Jun 2018 11:23:42 +0300
Subject: [PATCH] RDMA/uverbs: Protect from attempts to create flows on
unsupported QP
Flows can be created on UD and RAW_PACKET QP types. Attempts to provide
other QP types as an input causes to various unpredictable failures.
The reason is that in order to support all various types (e.g. XRC), we
are supposed to use real_qp handle and not qp handle and expect to
driver/FW to fail such (XRC) flows. The simpler and safer variant is to
ban all QP types except UD and RAW_PACKET, instead of relying on
driver/FW.
Cc: <stable(a)vger.kernel.org> # 3.11
Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs")
Cc: syzkaller <syzkaller(a)googlegroups.com>
Reported-by: Noa Osherovich <noaos(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 3e90b6a1d9d2..89c4ce2da78b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3559,6 +3559,11 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
goto err_uobj;
}
+ if (qp->qp_type != IB_QPT_UD && qp->qp_type != IB_QPT_RAW_PACKET) {
+ err = -EINVAL;
+ goto err_put;
+ }
+
flow_attr = kzalloc(struct_size(flow_attr, flows,
cmd.flow_attr.num_of_specs), GFP_KERNEL);
if (!flow_attr) {
The patch below does not apply to the 4.3-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 940efcc8889f0d15567eb07fc9fd69b06e366aa5 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro(a)mellanox.com>
Date: Sun, 24 Jun 2018 11:23:42 +0300
Subject: [PATCH] RDMA/uverbs: Protect from attempts to create flows on
unsupported QP
Flows can be created on UD and RAW_PACKET QP types. Attempts to provide
other QP types as an input causes to various unpredictable failures.
The reason is that in order to support all various types (e.g. XRC), we
are supposed to use real_qp handle and not qp handle and expect to
driver/FW to fail such (XRC) flows. The simpler and safer variant is to
ban all QP types except UD and RAW_PACKET, instead of relying on
driver/FW.
Cc: <stable(a)vger.kernel.org> # 3.11
Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs")
Cc: syzkaller <syzkaller(a)googlegroups.com>
Reported-by: Noa Osherovich <noaos(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 3e90b6a1d9d2..89c4ce2da78b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3559,6 +3559,11 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
goto err_uobj;
}
+ if (qp->qp_type != IB_QPT_UD && qp->qp_type != IB_QPT_RAW_PACKET) {
+ err = -EINVAL;
+ goto err_put;
+ }
+
flow_attr = kzalloc(struct_size(flow_attr, flows,
cmd.flow_attr.num_of_specs), GFP_KERNEL);
if (!flow_attr) {
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 940efcc8889f0d15567eb07fc9fd69b06e366aa5 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro(a)mellanox.com>
Date: Sun, 24 Jun 2018 11:23:42 +0300
Subject: [PATCH] RDMA/uverbs: Protect from attempts to create flows on
unsupported QP
Flows can be created on UD and RAW_PACKET QP types. Attempts to provide
other QP types as an input causes to various unpredictable failures.
The reason is that in order to support all various types (e.g. XRC), we
are supposed to use real_qp handle and not qp handle and expect to
driver/FW to fail such (XRC) flows. The simpler and safer variant is to
ban all QP types except UD and RAW_PACKET, instead of relying on
driver/FW.
Cc: <stable(a)vger.kernel.org> # 3.11
Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs")
Cc: syzkaller <syzkaller(a)googlegroups.com>
Reported-by: Noa Osherovich <noaos(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 3e90b6a1d9d2..89c4ce2da78b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3559,6 +3559,11 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
goto err_uobj;
}
+ if (qp->qp_type != IB_QPT_UD && qp->qp_type != IB_QPT_RAW_PACKET) {
+ err = -EINVAL;
+ goto err_put;
+ }
+
flow_attr = kzalloc(struct_size(flow_attr, flows,
cmd.flow_attr.num_of_specs), GFP_KERNEL);
if (!flow_attr) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 940efcc8889f0d15567eb07fc9fd69b06e366aa5 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro(a)mellanox.com>
Date: Sun, 24 Jun 2018 11:23:42 +0300
Subject: [PATCH] RDMA/uverbs: Protect from attempts to create flows on
unsupported QP
Flows can be created on UD and RAW_PACKET QP types. Attempts to provide
other QP types as an input causes to various unpredictable failures.
The reason is that in order to support all various types (e.g. XRC), we
are supposed to use real_qp handle and not qp handle and expect to
driver/FW to fail such (XRC) flows. The simpler and safer variant is to
ban all QP types except UD and RAW_PACKET, instead of relying on
driver/FW.
Cc: <stable(a)vger.kernel.org> # 3.11
Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs")
Cc: syzkaller <syzkaller(a)googlegroups.com>
Reported-by: Noa Osherovich <noaos(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 3e90b6a1d9d2..89c4ce2da78b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3559,6 +3559,11 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
goto err_uobj;
}
+ if (qp->qp_type != IB_QPT_UD && qp->qp_type != IB_QPT_RAW_PACKET) {
+ err = -EINVAL;
+ goto err_put;
+ }
+
flow_attr = kzalloc(struct_size(flow_attr, flows,
cmd.flow_attr.num_of_specs), GFP_KERNEL);
if (!flow_attr) {
The patch below does not apply to the 4.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 940efcc8889f0d15567eb07fc9fd69b06e366aa5 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro(a)mellanox.com>
Date: Sun, 24 Jun 2018 11:23:42 +0300
Subject: [PATCH] RDMA/uverbs: Protect from attempts to create flows on
unsupported QP
Flows can be created on UD and RAW_PACKET QP types. Attempts to provide
other QP types as an input causes to various unpredictable failures.
The reason is that in order to support all various types (e.g. XRC), we
are supposed to use real_qp handle and not qp handle and expect to
driver/FW to fail such (XRC) flows. The simpler and safer variant is to
ban all QP types except UD and RAW_PACKET, instead of relying on
driver/FW.
Cc: <stable(a)vger.kernel.org> # 3.11
Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs")
Cc: syzkaller <syzkaller(a)googlegroups.com>
Reported-by: Noa Osherovich <noaos(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 3e90b6a1d9d2..89c4ce2da78b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3559,6 +3559,11 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
goto err_uobj;
}
+ if (qp->qp_type != IB_QPT_UD && qp->qp_type != IB_QPT_RAW_PACKET) {
+ err = -EINVAL;
+ goto err_put;
+ }
+
flow_attr = kzalloc(struct_size(flow_attr, flows,
cmd.flow_attr.num_of_specs), GFP_KERNEL);
if (!flow_attr) {
Just a note that I just pushed out commit e181ae0c5db9 ("mm: zero
unavailable pages before memmap init") that fixes the nasty VM crash
issue I had, and that apparently the Fedora people have also seen in
the wild in 4.17.4.
The commit that triggered this didn't make it in until 4.18-rc3, but
it has apparently already been back-ported to stable, so now the fix
needs to be back-ported too.
Linus
Fedora has integrated the jitter entropy daemon to work around slow
boot problems, especially on VM's that don't support virtio-rng:
https://bugzilla.redhat.com/show_bug.cgi?id=1572944
It's understandable why they did this, but the Jitter entropy daemon
works fundamentally on the principle: "the CPU microarchitecture is
**so** complicated and we can't figure it out, so it *must* be
random". Yes, it uses statistical tests to "prove" it is secure, but
AES_ENCRYPT(NSA_KEY, COUNTER++) will also pass statistical tests with
flying colors.
So if RDRAND is available, mix it into entropy submitted from
userspace. It can't hurt, and if you believe the NSA has backdoored
RDRAND, then they probably have enough details about the Intel
microarchitecture that they can reverse engineer how the Jitter
entropy daemon affects the microarchitecture, and attack its output
stream. And if RDRAND is in fact an honest DRNG, it will immeasurably
improve on what the Jitter entropy daemon might produce.
This also provides some protection against someone who is able to read
or set the entropy seed file.
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)vger.kernel.org
---
drivers/char/random.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0706646b018d..283fe390e878 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1896,14 +1896,22 @@ static int
write_pool(struct entropy_store *r, const char __user *buffer, size_t count)
{
size_t bytes;
- __u32 buf[16];
+ __u32 t, buf[16];
const char __user *p = buffer;
while (count > 0) {
+ int b, i = 0;
+
bytes = min(count, sizeof(buf));
if (copy_from_user(&buf, p, bytes))
return -EFAULT;
+ for (b = bytes ; b > 0 ; b -= sizeof(__u32), i++) {
+ if (arch_get_random_int(&t))
+ continue;
+ buf[i] ^= t;
+ }
+
count -= bytes;
p += bytes;
--
2.18.0.rc0
A report from Colin Ian King pointed a CoverityScan issue where error
values on these helpers where not checked in the drivers. These
helpers could error out only in case of a software bug in driver code,
not because of a runtime/hardware error but in any cases it is safer
to handle these errors properly.
Before fixing the drivers, let's add some consistency and fix these
helpers error handling.
Fixes: 8878b126df76 ("mtd: nand: add ->exec_op() implementation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/raw/nand_base.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index 10c4f9919850..51f68203aa63 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -2720,6 +2720,8 @@ int nand_subop_get_num_addr_cyc(const struct nand_subop *subop,
return -EINVAL;
start_off = nand_subop_get_addr_start_off(subop, instr_idx);
+ if (start_off < 0)
+ return start_off;
if (instr_idx == subop->ninstrs - 1 &&
subop->last_instr_end_off)
@@ -2774,6 +2776,8 @@ int nand_subop_get_data_len(const struct nand_subop *subop,
return -EINVAL;
start_off = nand_subop_get_data_start_off(subop, instr_idx);
+ if (start_off < 0)
+ return start_off;
if (instr_idx == subop->ninstrs - 1 &&
subop->last_instr_end_off)
--
2.14.1
From: Thomas Gleixner <tglx(a)linutronix.de>
commit be6fcb5478e95bb1c91f489121238deb3abca46a upstream
x86_spec_ctrL_mask is intended to mask out bits from a MSR_SPEC_CTRL value
which are not to be modified. However the implementation is not really used
and the bitmask was inverted to make a check easier, which was removed in
"x86/bugs: Remove x86_spec_ctrl_set()"
Aside of that it is missing the STIBP bit if it is supported by the
platform, so if the mask would be used in x86_virt_spec_ctrl() then it
would prevent a guest from setting STIBP.
Add the STIBP bit if supported and use the mask in x86_virt_spec_ctrl() to
sanitize the value which is supplied by the guest.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/kernel/cpu/bugs.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 05eed68..af11a02 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -41,7 +41,7 @@ EXPORT_SYMBOL_GPL(x86_spec_ctrl_base);
* The vendor and possibly platform specific bits which can be modified in
* x86_spec_ctrl_base.
*/
-static u64 x86_spec_ctrl_mask = ~SPEC_CTRL_IBRS;
+static u64 x86_spec_ctrl_mask = SPEC_CTRL_IBRS;
/*
* AMD specific MSR info for Speculative Store Bypass control.
@@ -67,6 +67,10 @@ void __init check_bugs(void)
if (boot_cpu_has(X86_FEATURE_MSR_SPEC_CTRL))
rdmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
+ /* Allow STIBP in MSR_SPEC_CTRL if supported */
+ if (boot_cpu_has(X86_FEATURE_STIBP))
+ x86_spec_ctrl_mask |= SPEC_CTRL_STIBP;
+
/* Select the proper spectre mitigation before patching alternatives */
spectre_v2_select_mitigation();
@@ -134,18 +138,26 @@ static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE;
void
x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool setguest)
{
+ u64 msrval, guestval, hostval = x86_spec_ctrl_base;
struct thread_info *ti = current_thread_info();
- u64 msr, host = x86_spec_ctrl_base;
/* Is MSR_SPEC_CTRL implemented ? */
if (static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) {
+ /*
+ * Restrict guest_spec_ctrl to supported values. Clear the
+ * modifiable bits in the host base value and or the
+ * modifiable bits from the guest value.
+ */
+ guestval = hostval & ~x86_spec_ctrl_mask;
+ guestval |= guest_spec_ctrl & x86_spec_ctrl_mask;
+
/* SSBD controlled in MSR_SPEC_CTRL */
if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD))
- host |= ssbd_tif_to_spec_ctrl(ti->flags);
+ hostval |= ssbd_tif_to_spec_ctrl(ti->flags);
- if (host != guest_spec_ctrl) {
- msr = setguest ? guest_spec_ctrl : host;
- wrmsrl(MSR_IA32_SPEC_CTRL, msr);
+ if (hostval != guestval) {
+ msrval = setguest ? guestval : hostval;
+ wrmsrl(MSR_IA32_SPEC_CTRL, msrval);
}
}
}
@@ -491,7 +503,7 @@ static enum ssb_mitigation __init __ssb_select_mitigation(void)
switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_INTEL:
x86_spec_ctrl_base |= SPEC_CTRL_SSBD;
- x86_spec_ctrl_mask &= ~SPEC_CTRL_SSBD;
+ x86_spec_ctrl_mask |= SPEC_CTRL_SSBD;
wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
break;
case X86_VENDOR_AMD:
From: Thomas Gleixner <tglx(a)linutronix.de>
commit fa8ac4988249c38476f6ad678a4848a736373403 upstream
x86_spec_ctrl_base is the system wide default value for the SPEC_CTRL MSR.
x86_spec_ctrl_get_default() returns x86_spec_ctrl_base and was intended to
prevent modification to that variable. Though the variable is read only
after init and globaly visible already.
Remove the function and export the variable instead.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/include/asm/nospec-branch.h | 16 +++++-----------
arch/x86/include/asm/spec-ctrl.h | 3 ---
arch/x86/kernel/cpu/bugs.c | 11 +----------
3 files changed, 6 insertions(+), 24 deletions(-)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 640c11b..2757c79 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -172,16 +172,7 @@ enum spectre_v2_mitigation {
SPECTRE_V2_IBRS,
};
-/*
- * The Intel specification for the SPEC_CTRL MSR requires that we
- * preserve any already set reserved bits at boot time (e.g. for
- * future additions that this kernel is not currently aware of).
- * We then set any additional mitigation bits that we want
- * ourselves and always use this as the base for SPEC_CTRL.
- * We also use this when handling guest entry/exit as below.
- */
extern void x86_spec_ctrl_set(u64);
-extern u64 x86_spec_ctrl_get_default(void);
/* The Speculative Store Bypass disable variants */
enum ssb_mitigation {
@@ -232,6 +223,9 @@ static inline void indirect_branch_prediction_barrier(void)
alternative_msr_write(MSR_IA32_PRED_CMD, val, X86_FEATURE_USE_IBPB);
}
+/* The Intel SPEC CTRL MSR base value cache */
+extern u64 x86_spec_ctrl_base;
+
/*
* With retpoline, we must use IBRS to restrict branch prediction
* before calling into firmware.
@@ -240,7 +234,7 @@ static inline void indirect_branch_prediction_barrier(void)
*/
#define firmware_restrict_branch_speculation_start() \
do { \
- u64 val = x86_spec_ctrl_get_default() | SPEC_CTRL_IBRS; \
+ u64 val = x86_spec_ctrl_base | SPEC_CTRL_IBRS; \
\
preempt_disable(); \
alternative_msr_write(MSR_IA32_SPEC_CTRL, val, \
@@ -249,7 +243,7 @@ do { \
#define firmware_restrict_branch_speculation_end() \
do { \
- u64 val = x86_spec_ctrl_get_default(); \
+ u64 val = x86_spec_ctrl_base; \
\
alternative_msr_write(MSR_IA32_SPEC_CTRL, val, \
X86_FEATURE_USE_IBRS_FW); \
diff --git a/arch/x86/include/asm/spec-ctrl.h b/arch/x86/include/asm/spec-ctrl.h
index 9cecbe5..763d497 100644
--- a/arch/x86/include/asm/spec-ctrl.h
+++ b/arch/x86/include/asm/spec-ctrl.h
@@ -47,9 +47,6 @@ void x86_spec_ctrl_restore_host(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl)
extern u64 x86_amd_ls_cfg_base;
extern u64 x86_amd_ls_cfg_ssbd_mask;
-/* The Intel SPEC CTRL MSR base value cache */
-extern u64 x86_spec_ctrl_base;
-
static inline u64 ssbd_tif_to_spec_ctrl(u64 tifn)
{
BUILD_BUG_ON(TIF_SSBD < SPEC_CTRL_SSBD_SHIFT);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 208d44c..5391df5 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -35,6 +35,7 @@ static void __init ssb_select_mitigation(void);
* writes to SPEC_CTRL contain whatever reserved bits have been set.
*/
u64 x86_spec_ctrl_base;
+EXPORT_SYMBOL_GPL(x86_spec_ctrl_base);
/*
* The vendor and possibly platform specific bits which can be modified in
@@ -139,16 +140,6 @@ void x86_spec_ctrl_set(u64 val)
}
EXPORT_SYMBOL_GPL(x86_spec_ctrl_set);
-u64 x86_spec_ctrl_get_default(void)
-{
- u64 msrval = x86_spec_ctrl_base;
-
- if (static_cpu_has(X86_FEATURE_SPEC_CTRL))
- msrval |= ssbd_tif_to_spec_ctrl(current_thread_info()->flags);
- return msrval;
-}
-EXPORT_SYMBOL_GPL(x86_spec_ctrl_get_default);
-
void
x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool setguest)
{
From: Tom Lendacky <thomas.lendacky(a)amd.com>
commit 11fb0683493b2da112cd64c9dada221b52463bf7 upstream
Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD). To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.
Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present.
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kernel/cpu/bugs.c | 4 +++-
arch/x86/kernel/process.c | 13 ++++++++++++-
4 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 8ae9132..f4b175d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -269,6 +269,7 @@
#define X86_FEATURE_AMD_IBPB (13*32+12) /* Indirect Branch Prediction Barrier */
#define X86_FEATURE_AMD_IBRS (13*32+14) /* Indirect Branch Restricted Speculation */
#define X86_FEATURE_AMD_STIBP (13*32+15) /* Single Thread Indirect Branch Predictors */
+#define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */
/* Thermal and Power Management Leaf, CPUID level 0x00000006 (eax), word 14 */
#define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 2ea2ff1..22f2dd5 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -328,6 +328,8 @@
#define MSR_AMD64_IBSOPDATA4 0xc001103d
#define MSR_AMD64_IBS_REG_COUNT_MAX 8 /* includes MSR_AMD64_IBSBRTARGET */
+#define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
+
/* Fam 16h MSRs */
#define MSR_F16H_L2I_PERF_CTL 0xc0010230
#define MSR_F16H_L2I_PERF_CTR 0xc0010231
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index a1c98fd..50ab206a 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -203,7 +203,9 @@ static void x86_amd_ssb_disable(void)
{
u64 msrval = x86_amd_ls_cfg_base | x86_amd_ls_cfg_ssbd_mask;
- if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD))
+ if (boot_cpu_has(X86_FEATURE_VIRT_SSBD))
+ wrmsrl(MSR_AMD64_VIRT_SPEC_CTRL, SPEC_CTRL_SSBD);
+ else if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD))
wrmsrl(MSR_AMD64_LS_CFG, msrval);
}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0842869..eab9d0c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -308,6 +308,15 @@ static __always_inline void amd_set_core_ssb_state(unsigned long tifn)
}
#endif
+static __always_inline void amd_set_ssb_virt_state(unsigned long tifn)
+{
+ /*
+ * SSBD has the same definition in SPEC_CTRL and VIRT_SPEC_CTRL,
+ * so ssbd_tif_to_spec_ctrl() just works.
+ */
+ wrmsrl(MSR_AMD64_VIRT_SPEC_CTRL, ssbd_tif_to_spec_ctrl(tifn));
+}
+
static __always_inline void intel_set_ssb_state(unsigned long tifn)
{
u64 msr = x86_spec_ctrl_base | ssbd_tif_to_spec_ctrl(tifn);
@@ -317,7 +326,9 @@ static __always_inline void intel_set_ssb_state(unsigned long tifn)
static __always_inline void __speculative_store_bypass_update(unsigned long tifn)
{
- if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD))
+ if (static_cpu_has(X86_FEATURE_VIRT_SSBD))
+ amd_set_ssb_virt_state(tifn);
+ else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD))
amd_set_core_ssb_state(tifn);
else
intel_set_ssb_state(tifn);
From: Thomas Gleixner <tglx(a)linutronix.de>
commit ccbcd2674472a978b48c91c1fbfb66c0ff959f24 upstream
AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store
Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care
about the bit position of the SSBD bit and thus facilitate migration.
Also, the sibling coordination on Family 17H CPUs can only be done on
the host.
Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an
extra argument for the VIRT_SPEC_CTRL MSR.
Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU
data structure which is going to be used in later patches for the actual
implementation.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[ Srivatsa: Backported to 4.4.y, skipping the KVM changes in this patch. ]
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/include/asm/spec-ctrl.h | 9 ++++++---
arch/x86/kernel/cpu/bugs.c | 20 ++++++++++++++++++--
2 files changed, 24 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/spec-ctrl.h b/arch/x86/include/asm/spec-ctrl.h
index 0cb49c4..6e28740 100644
--- a/arch/x86/include/asm/spec-ctrl.h
+++ b/arch/x86/include/asm/spec-ctrl.h
@@ -10,10 +10,13 @@
* the guest has, while on VMEXIT we restore the host view. This
* would be easier if SPEC_CTRL were architecturally maskable or
* shadowable for guests but this is not (currently) the case.
- * Takes the guest view of SPEC_CTRL MSR as a parameter.
+ * Takes the guest view of SPEC_CTRL MSR as a parameter and also
+ * the guest's version of VIRT_SPEC_CTRL, if emulated.
*/
-extern void x86_spec_ctrl_set_guest(u64);
-extern void x86_spec_ctrl_restore_host(u64);
+extern void x86_spec_ctrl_set_guest(u64 guest_spec_ctrl,
+ u64 guest_virt_spec_ctrl);
+extern void x86_spec_ctrl_restore_host(u64 guest_spec_ctrl,
+ u64 guest_virt_spec_ctrl);
/* AMD specific Speculative Store Bypass MSR data */
extern u64 x86_amd_ls_cfg_base;
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 9be7292..a1c98fd 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -149,7 +149,15 @@ u64 x86_spec_ctrl_get_default(void)
}
EXPORT_SYMBOL_GPL(x86_spec_ctrl_get_default);
-void x86_spec_ctrl_set_guest(u64 guest_spec_ctrl)
+/**
+ * x86_spec_ctrl_set_guest - Set speculation control registers for the guest
+ * @guest_spec_ctrl: The guest content of MSR_SPEC_CTRL
+ * @guest_virt_spec_ctrl: The guest controlled bits of MSR_VIRT_SPEC_CTRL
+ * (may get translated to MSR_AMD64_LS_CFG bits)
+ *
+ * Avoids writing to the MSR if the content/bits are the same
+ */
+void x86_spec_ctrl_set_guest(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl)
{
u64 host = x86_spec_ctrl_base;
@@ -166,7 +174,15 @@ void x86_spec_ctrl_set_guest(u64 guest_spec_ctrl)
}
EXPORT_SYMBOL_GPL(x86_spec_ctrl_set_guest);
-void x86_spec_ctrl_restore_host(u64 guest_spec_ctrl)
+/**
+ * x86_spec_ctrl_restore_host - Restore host speculation control registers
+ * @guest_spec_ctrl: The guest content of MSR_SPEC_CTRL
+ * @guest_virt_spec_ctrl: The guest controlled bits of MSR_VIRT_SPEC_CTRL
+ * (may get translated to MSR_AMD64_LS_CFG bits)
+ *
+ * Avoids writing to the MSR if the content/bits are the same
+ */
+void x86_spec_ctrl_restore_host(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl)
{
u64 host = x86_spec_ctrl_base;
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 1f50ddb4f4189243c05926b842dc1a0332195f31 upstream
The AMD64_LS_CFG MSR is a per core MSR on Family 17H CPUs. That means when
hyperthreading is enabled the SSBD bit toggle needs to take both cores into
account. Otherwise the following situation can happen:
CPU0 CPU1
disable SSB
disable SSB
enable SSB <- Enables it for the Core, i.e. for CPU0 as well
So after the SSB enable on CPU1 the task on CPU0 runs with SSB enabled
again.
On Intel the SSBD control is per core as well, but the synchronization
logic is implemented behind the per thread SPEC_CTRL MSR. It works like
this:
CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL
i.e. if one of the threads enables a mitigation then this affects both and
the mitigation is only disabled in the core when both threads disabled it.
Add the necessary synchronization logic for AMD family 17H. Unfortunately
that requires a spinlock to serialize the access to the MSR, but the locks
are only shared between siblings.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/include/asm/spec-ctrl.h | 6 ++
arch/x86/kernel/process.c | 125 ++++++++++++++++++++++++++++++++++++--
arch/x86/kernel/smpboot.c | 5 ++
3 files changed, 130 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/spec-ctrl.h b/arch/x86/include/asm/spec-ctrl.h
index dc21209..0cb49c4 100644
--- a/arch/x86/include/asm/spec-ctrl.h
+++ b/arch/x86/include/asm/spec-ctrl.h
@@ -33,6 +33,12 @@ static inline u64 ssbd_tif_to_amd_ls_cfg(u64 tifn)
return (tifn & _TIF_SSBD) ? x86_amd_ls_cfg_ssbd_mask : 0ULL;
}
+#ifdef CONFIG_SMP
+extern void speculative_store_bypass_ht_init(void);
+#else
+static inline void speculative_store_bypass_ht_init(void) { }
+#endif
+
extern void speculative_store_bypass_update(void);
#endif
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 8cefbd2..0842869 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -199,22 +199,135 @@ static inline void switch_to_bitmap(struct tss_struct *tss,
}
}
-static __always_inline void __speculative_store_bypass_update(unsigned long tifn)
+#ifdef CONFIG_SMP
+
+struct ssb_state {
+ struct ssb_state *shared_state;
+ raw_spinlock_t lock;
+ unsigned int disable_state;
+ unsigned long local_state;
+};
+
+#define LSTATE_SSB 0
+
+static DEFINE_PER_CPU(struct ssb_state, ssb_state);
+
+void speculative_store_bypass_ht_init(void)
{
- u64 msr;
+ struct ssb_state *st = this_cpu_ptr(&ssb_state);
+ unsigned int this_cpu = smp_processor_id();
+ unsigned int cpu;
+
+ st->local_state = 0;
+
+ /*
+ * Shared state setup happens once on the first bringup
+ * of the CPU. It's not destroyed on CPU hotunplug.
+ */
+ if (st->shared_state)
+ return;
+
+ raw_spin_lock_init(&st->lock);
- if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) {
- msr = x86_amd_ls_cfg_base | ssbd_tif_to_amd_ls_cfg(tifn);
+ /*
+ * Go over HT siblings and check whether one of them has set up the
+ * shared state pointer already.
+ */
+ for_each_cpu(cpu, topology_sibling_cpumask(this_cpu)) {
+ if (cpu == this_cpu)
+ continue;
+
+ if (!per_cpu(ssb_state, cpu).shared_state)
+ continue;
+
+ /* Link it to the state of the sibling: */
+ st->shared_state = per_cpu(ssb_state, cpu).shared_state;
+ return;
+ }
+
+ /*
+ * First HT sibling to come up on the core. Link shared state of
+ * the first HT sibling to itself. The siblings on the same core
+ * which come up later will see the shared state pointer and link
+ * themself to the state of this CPU.
+ */
+ st->shared_state = st;
+}
+
+/*
+ * Logic is: First HT sibling enables SSBD for both siblings in the core
+ * and last sibling to disable it, disables it for the whole core. This how
+ * MSR_SPEC_CTRL works in "hardware":
+ *
+ * CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL
+ */
+static __always_inline void amd_set_core_ssb_state(unsigned long tifn)
+{
+ struct ssb_state *st = this_cpu_ptr(&ssb_state);
+ u64 msr = x86_amd_ls_cfg_base;
+
+ if (!static_cpu_has(X86_FEATURE_ZEN)) {
+ msr |= ssbd_tif_to_amd_ls_cfg(tifn);
wrmsrl(MSR_AMD64_LS_CFG, msr);
+ return;
+ }
+
+ if (tifn & _TIF_SSBD) {
+ /*
+ * Since this can race with prctl(), block reentry on the
+ * same CPU.
+ */
+ if (__test_and_set_bit(LSTATE_SSB, &st->local_state))
+ return;
+
+ msr |= x86_amd_ls_cfg_ssbd_mask;
+
+ raw_spin_lock(&st->shared_state->lock);
+ /* First sibling enables SSBD: */
+ if (!st->shared_state->disable_state)
+ wrmsrl(MSR_AMD64_LS_CFG, msr);
+ st->shared_state->disable_state++;
+ raw_spin_unlock(&st->shared_state->lock);
} else {
- msr = x86_spec_ctrl_base | ssbd_tif_to_spec_ctrl(tifn);
- wrmsrl(MSR_IA32_SPEC_CTRL, msr);
+ if (!__test_and_clear_bit(LSTATE_SSB, &st->local_state))
+ return;
+
+ raw_spin_lock(&st->shared_state->lock);
+ st->shared_state->disable_state--;
+ if (!st->shared_state->disable_state)
+ wrmsrl(MSR_AMD64_LS_CFG, msr);
+ raw_spin_unlock(&st->shared_state->lock);
}
}
+#else
+static __always_inline void amd_set_core_ssb_state(unsigned long tifn)
+{
+ u64 msr = x86_amd_ls_cfg_base | ssbd_tif_to_amd_ls_cfg(tifn);
+
+ wrmsrl(MSR_AMD64_LS_CFG, msr);
+}
+#endif
+
+static __always_inline void intel_set_ssb_state(unsigned long tifn)
+{
+ u64 msr = x86_spec_ctrl_base | ssbd_tif_to_spec_ctrl(tifn);
+
+ wrmsrl(MSR_IA32_SPEC_CTRL, msr);
+}
+
+static __always_inline void __speculative_store_bypass_update(unsigned long tifn)
+{
+ if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD))
+ amd_set_core_ssb_state(tifn);
+ else
+ intel_set_ssb_state(tifn);
+}
void speculative_store_bypass_update(void)
{
+ preempt_disable();
__speculative_store_bypass_update(current_thread_info()->flags);
+ preempt_enable();
}
void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 1f7aefc..c017f1c 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -75,6 +75,7 @@
#include <asm/i8259.h>
#include <asm/realmode.h>
#include <asm/misc.h>
+#include <asm/spec-ctrl.h>
/* Number of siblings per CPU package */
int smp_num_siblings = 1;
@@ -217,6 +218,8 @@ static void notrace start_secondary(void *unused)
*/
check_tsc_sync_target();
+ speculative_store_bypass_ht_init();
+
/*
* Lock vector_lock and initialize the vectors on this cpu
* before setting the cpu online. We must set it online with
@@ -1209,6 +1212,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
set_mtrr_aps_delayed_init();
smp_quirk_init_udelay();
+
+ speculative_store_bypass_ht_init();
}
void arch_enable_nonboot_cpus_begin(void)
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 7eb8956a7fec3c1f0abc2a5517dada99ccc8a961 upstream
The availability of the SPEC_CTRL MSR is enumerated by a CPUID bit on
Intel and implied by IBRS or STIBP support on AMD. That's just confusing
and in case an AMD CPU has IBRS not supported because the underlying
problem has been fixed but has another bit valid in the SPEC_CTRL MSR,
the thing falls apart.
Add a synthetic feature bit X86_FEATURE_MSR_SPEC_CTRL to denote the
availability on both Intel and AMD.
While at it replace the boot_cpu_has() checks with static_cpu_has() where
possible. This prevents late microcode loading from exposing SPEC_CTRL, but
late loading is already very limited as it does not reevaluate the
mitigation options and other bits and pieces. Having static_cpu_has() is
the simplest and least fragile solution.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/include/asm/cpufeatures.h | 3 +++
arch/x86/kernel/cpu/bugs.c | 18 +++++++++++-------
arch/x86/kernel/cpu/common.c | 9 +++++++--
arch/x86/kernel/cpu/intel.c | 1 +
4 files changed, 22 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 9f64d10..dd04cd7 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -198,6 +198,9 @@
#define X86_FEATURE_RETPOLINE ( 7*32+29) /* "" Generic Retpoline mitigation for Spectre variant 2 */
#define X86_FEATURE_RETPOLINE_AMD ( 7*32+30) /* "" AMD Retpoline mitigation for Spectre variant 2 */
+
+#define X86_FEATURE_MSR_SPEC_CTRL ( 7*32+16) /* "" MSR SPEC_CTRL is implemented */
+
/* Because the ALTERNATIVE scheme is for members of the X86_FEATURE club... */
#define X86_FEATURE_KAISER ( 7*32+31) /* CONFIG_PAGE_TABLE_ISOLATION w/o nokaiser */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 84de0fc..e23e289 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -63,7 +63,7 @@ void __init check_bugs(void)
* have unknown values. AMD64_LS_CFG MSR is cached in the early AMD
* init code as it is not enumerated and depends on the family.
*/
- if (boot_cpu_has(X86_FEATURE_IBRS))
+ if (boot_cpu_has(X86_FEATURE_MSR_SPEC_CTRL))
rdmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
/* Select the proper spectre mitigation before patching alternatives */
@@ -143,7 +143,7 @@ u64 x86_spec_ctrl_get_default(void)
{
u64 msrval = x86_spec_ctrl_base;
- if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+ if (static_cpu_has(X86_FEATURE_SPEC_CTRL))
msrval |= ssbd_tif_to_spec_ctrl(current_thread_info()->flags);
return msrval;
}
@@ -153,10 +153,12 @@ void x86_spec_ctrl_set_guest(u64 guest_spec_ctrl)
{
u64 host = x86_spec_ctrl_base;
- if (!boot_cpu_has(X86_FEATURE_IBRS))
+ /* Is MSR_SPEC_CTRL implemented ? */
+ if (!static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL))
return;
- if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+ /* Intel controls SSB in MSR_SPEC_CTRL */
+ if (static_cpu_has(X86_FEATURE_SPEC_CTRL))
host |= ssbd_tif_to_spec_ctrl(current_thread_info()->flags);
if (host != guest_spec_ctrl)
@@ -168,10 +170,12 @@ void x86_spec_ctrl_restore_host(u64 guest_spec_ctrl)
{
u64 host = x86_spec_ctrl_base;
- if (!boot_cpu_has(X86_FEATURE_IBRS))
+ /* Is MSR_SPEC_CTRL implemented ? */
+ if (!static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL))
return;
- if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+ /* Intel controls SSB in MSR_SPEC_CTRL */
+ if (static_cpu_has(X86_FEATURE_SPEC_CTRL))
host |= ssbd_tif_to_spec_ctrl(current_thread_info()->flags);
if (host != guest_spec_ctrl)
@@ -629,7 +633,7 @@ int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which)
void x86_spec_ctrl_setup_ap(void)
{
- if (boot_cpu_has(X86_FEATURE_IBRS))
+ if (boot_cpu_has(X86_FEATURE_MSR_SPEC_CTRL))
x86_spec_ctrl_set(x86_spec_ctrl_base & ~x86_spec_ctrl_mask);
if (ssb_mode == SPEC_STORE_BYPASS_DISABLE)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f2b579f..1f70ff1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -687,19 +687,24 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {
set_cpu_cap(c, X86_FEATURE_IBRS);
set_cpu_cap(c, X86_FEATURE_IBPB);
+ set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);
}
if (cpu_has(c, X86_FEATURE_INTEL_STIBP))
set_cpu_cap(c, X86_FEATURE_STIBP);
- if (cpu_has(c, X86_FEATURE_AMD_IBRS))
+ if (cpu_has(c, X86_FEATURE_AMD_IBRS)) {
set_cpu_cap(c, X86_FEATURE_IBRS);
+ set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);
+ }
if (cpu_has(c, X86_FEATURE_AMD_IBPB))
set_cpu_cap(c, X86_FEATURE_IBPB);
- if (cpu_has(c, X86_FEATURE_AMD_STIBP))
+ if (cpu_has(c, X86_FEATURE_AMD_STIBP)) {
set_cpu_cap(c, X86_FEATURE_STIBP);
+ set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);
+ }
}
void get_cpu_cap(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index a34e357..9a84e75 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -118,6 +118,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
setup_clear_cpu_cap(X86_FEATURE_IBPB);
setup_clear_cpu_cap(X86_FEATURE_STIBP);
setup_clear_cpu_cap(X86_FEATURE_SPEC_CTRL);
+ setup_clear_cpu_cap(X86_FEATURE_MSR_SPEC_CTRL);
setup_clear_cpu_cap(X86_FEATURE_INTEL_STIBP);
setup_clear_cpu_cap(X86_FEATURE_SSBD);
}
From: Borislav Petkov <bp(a)suse.de>
commit dd0792699c4058e63c0715d9a7c2d40226fcdddc upstream
Fix some typos, improve formulations, end sentences with a fullstop.
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
Documentation/spec_ctrl.txt | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/Documentation/spec_ctrl.txt b/Documentation/spec_ctrl.txt
index 1b3690d..32f3d55 100644
--- a/Documentation/spec_ctrl.txt
+++ b/Documentation/spec_ctrl.txt
@@ -2,13 +2,13 @@
Speculation Control
===================
-Quite some CPUs have speculation related misfeatures which are in fact
-vulnerabilites causing data leaks in various forms even accross privilege
-domains.
+Quite some CPUs have speculation-related misfeatures which are in
+fact vulnerabilities causing data leaks in various forms even across
+privilege domains.
The kernel provides mitigation for such vulnerabilities in various
-forms. Some of these mitigations are compile time configurable and some on
-the kernel command line.
+forms. Some of these mitigations are compile-time configurable and some
+can be supplied on the kernel command line.
There is also a class of mitigations which are very expensive, but they can
be restricted to a certain set of processes or tasks in controlled
@@ -32,18 +32,18 @@ the following meaning:
Bit Define Description
==== ===================== ===================================================
0 PR_SPEC_PRCTL Mitigation can be controlled per task by
- PR_SET_SPECULATION_CTRL
+ PR_SET_SPECULATION_CTRL.
1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is
- disabled
+ disabled.
2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is
- enabled
+ enabled.
3 PR_SPEC_FORCE_DISABLE Same as PR_SPEC_DISABLE, but cannot be undone. A
subsequent prctl(..., PR_SPEC_ENABLE) will fail.
==== ===================== ===================================================
If all bits are 0 the CPU is not affected by the speculation misfeature.
-If PR_SPEC_PRCTL is set, then the per task control of the mitigation is
+If PR_SPEC_PRCTL is set, then the per-task control of the mitigation is
available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
misfeature will fail.
@@ -61,9 +61,9 @@ Common error codes
Value Meaning
======= =================================================================
EINVAL The prctl is not implemented by the architecture or unused
- prctl(2) arguments are not 0
+ prctl(2) arguments are not 0.
-ENODEV arg2 is selecting a not supported speculation misfeature
+ENODEV arg2 is selecting a not supported speculation misfeature.
======= =================================================================
PR_SET_SPECULATION_CTRL error codes
@@ -74,7 +74,7 @@ Value Meaning
0 Success
ERANGE arg3 is incorrect, i.e. it's neither PR_SPEC_ENABLE nor
- PR_SPEC_DISABLE nor PR_SPEC_FORCE_DISABLE
+ PR_SPEC_DISABLE nor PR_SPEC_FORCE_DISABLE.
ENXIO Control of the selected speculation misfeature is not possible.
See PR_GET_SPECULATION_CTRL.
From: Kees Cook <keescook(a)chromium.org>
commit f21b53b20c754021935ea43364dbf53778eeba32 upstream
Unless explicitly opted out of, anything running under seccomp will have
SSB mitigations enabled. Choosing the "prctl" mode will disable this.
[ tglx: Adjusted it to the new arch_seccomp_spec_mitigate() mechanism ]
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
Documentation/kernel-parameters.txt | 26 +++++++++++++++++---------
arch/x86/include/asm/nospec-branch.h | 1 +
arch/x86/kernel/cpu/bugs.c | 32 +++++++++++++++++++++++---------
3 files changed, 41 insertions(+), 18 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 80202de..3fd53e1 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3647,19 +3647,27 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
This parameter controls whether the Speculative Store
Bypass optimization is used.
- on - Unconditionally disable Speculative Store Bypass
- off - Unconditionally enable Speculative Store Bypass
- auto - Kernel detects whether the CPU model contains an
- implementation of Speculative Store Bypass and
- picks the most appropriate mitigation.
- prctl - Control Speculative Store Bypass per thread
- via prctl. Speculative Store Bypass is enabled
- for a process by default. The state of the control
- is inherited on fork.
+ on - Unconditionally disable Speculative Store Bypass
+ off - Unconditionally enable Speculative Store Bypass
+ auto - Kernel detects whether the CPU model contains an
+ implementation of Speculative Store Bypass and
+ picks the most appropriate mitigation. If the
+ CPU is not vulnerable, "off" is selected. If the
+ CPU is vulnerable the default mitigation is
+ architecture and Kconfig dependent. See below.
+ prctl - Control Speculative Store Bypass per thread
+ via prctl. Speculative Store Bypass is enabled
+ for a process by default. The state of the control
+ is inherited on fork.
+ seccomp - Same as "prctl" above, but all seccomp threads
+ will disable SSB unless they explicitly opt out.
Not specifying this option is equivalent to
spec_store_bypass_disable=auto.
+ Default mitigations:
+ X86: If CONFIG_SECCOMP=y "seccomp", otherwise "prctl"
+
spia_io_base= [HW,MTD]
spia_fio_base=
spia_pedr=
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 155d955..930c1594 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -188,6 +188,7 @@ enum ssb_mitigation {
SPEC_STORE_BYPASS_NONE,
SPEC_STORE_BYPASS_DISABLE,
SPEC_STORE_BYPASS_PRCTL,
+ SPEC_STORE_BYPASS_SECCOMP,
};
extern char __indirect_thunk_start[];
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index b005ef7..6fd3fcf 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -414,22 +414,25 @@ enum ssb_mitigation_cmd {
SPEC_STORE_BYPASS_CMD_AUTO,
SPEC_STORE_BYPASS_CMD_ON,
SPEC_STORE_BYPASS_CMD_PRCTL,
+ SPEC_STORE_BYPASS_CMD_SECCOMP,
};
static const char *ssb_strings[] = {
[SPEC_STORE_BYPASS_NONE] = "Vulnerable",
[SPEC_STORE_BYPASS_DISABLE] = "Mitigation: Speculative Store Bypass disabled",
- [SPEC_STORE_BYPASS_PRCTL] = "Mitigation: Speculative Store Bypass disabled via prctl"
+ [SPEC_STORE_BYPASS_PRCTL] = "Mitigation: Speculative Store Bypass disabled via prctl",
+ [SPEC_STORE_BYPASS_SECCOMP] = "Mitigation: Speculative Store Bypass disabled via prctl and seccomp",
};
static const struct {
const char *option;
enum ssb_mitigation_cmd cmd;
} ssb_mitigation_options[] = {
- { "auto", SPEC_STORE_BYPASS_CMD_AUTO }, /* Platform decides */
- { "on", SPEC_STORE_BYPASS_CMD_ON }, /* Disable Speculative Store Bypass */
- { "off", SPEC_STORE_BYPASS_CMD_NONE }, /* Don't touch Speculative Store Bypass */
- { "prctl", SPEC_STORE_BYPASS_CMD_PRCTL }, /* Disable Speculative Store Bypass via prctl */
+ { "auto", SPEC_STORE_BYPASS_CMD_AUTO }, /* Platform decides */
+ { "on", SPEC_STORE_BYPASS_CMD_ON }, /* Disable Speculative Store Bypass */
+ { "off", SPEC_STORE_BYPASS_CMD_NONE }, /* Don't touch Speculative Store Bypass */
+ { "prctl", SPEC_STORE_BYPASS_CMD_PRCTL }, /* Disable Speculative Store Bypass via prctl */
+ { "seccomp", SPEC_STORE_BYPASS_CMD_SECCOMP }, /* Disable Speculative Store Bypass via prctl and seccomp */
};
static enum ssb_mitigation_cmd __init ssb_parse_cmdline(void)
@@ -479,8 +482,15 @@ static enum ssb_mitigation_cmd __init __ssb_select_mitigation(void)
switch (cmd) {
case SPEC_STORE_BYPASS_CMD_AUTO:
- /* Choose prctl as the default mode */
- mode = SPEC_STORE_BYPASS_PRCTL;
+ case SPEC_STORE_BYPASS_CMD_SECCOMP:
+ /*
+ * Choose prctl+seccomp as the default mode if seccomp is
+ * enabled.
+ */
+ if (IS_ENABLED(CONFIG_SECCOMP))
+ mode = SPEC_STORE_BYPASS_SECCOMP;
+ else
+ mode = SPEC_STORE_BYPASS_PRCTL;
break;
case SPEC_STORE_BYPASS_CMD_ON:
mode = SPEC_STORE_BYPASS_DISABLE;
@@ -528,12 +538,14 @@ static void ssb_select_mitigation()
}
#undef pr_fmt
+#define pr_fmt(fmt) "Speculation prctl: " fmt
static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl)
{
bool update;
- if (ssb_mode != SPEC_STORE_BYPASS_PRCTL)
+ if (ssb_mode != SPEC_STORE_BYPASS_PRCTL &&
+ ssb_mode != SPEC_STORE_BYPASS_SECCOMP)
return -ENXIO;
switch (ctrl) {
@@ -581,7 +593,8 @@ int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which,
#ifdef CONFIG_SECCOMP
void arch_seccomp_spec_mitigate(struct task_struct *task)
{
- ssb_prctl_set(task, PR_SPEC_FORCE_DISABLE);
+ if (ssb_mode == SPEC_STORE_BYPASS_SECCOMP)
+ ssb_prctl_set(task, PR_SPEC_FORCE_DISABLE);
}
#endif
@@ -590,6 +603,7 @@ static int ssb_prctl_get(struct task_struct *task)
switch (ssb_mode) {
case SPEC_STORE_BYPASS_DISABLE:
return PR_SPEC_DISABLE;
+ case SPEC_STORE_BYPASS_SECCOMP:
case SPEC_STORE_BYPASS_PRCTL:
if (task_spec_ssb_force_disable(task))
return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE;
From: Kees Cook <keescook(a)chromium.org>
commit 00a02d0c502a06d15e07b857f8ff921e3e402675 upstream
If a seccomp user is not interested in Speculative Store Bypass mitigation
by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when
adding filters.
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
include/linux/seccomp.h | 3 +
include/uapi/linux/seccomp.h | 4 +
kernel/seccomp.c | 19 ++++--
tools/testing/selftests/seccomp/seccomp_bpf.c | 78 +++++++++++++++++++++++++
4 files changed, 93 insertions(+), 11 deletions(-)
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 2296e6b..5a53d34 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -3,7 +3,8 @@
#include <uapi/linux/seccomp.h>
-#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC)
+#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \
+ SECCOMP_FILTER_FLAG_SPEC_ALLOW)
#ifdef CONFIG_SECCOMP
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a4..e4acb61 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -15,7 +15,9 @@
#define SECCOMP_SET_MODE_FILTER 1
/* Valid flags for SECCOMP_SET_MODE_FILTER */
-#define SECCOMP_FILTER_FLAG_TSYNC 1
+#define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0)
+/* In v4.14+ SECCOMP_FILTER_FLAG_LOG is (1UL << 1) */
+#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2)
/*
* All BPF programs must return a 32-bit value.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index f33539f..4bb8a5a 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -230,7 +230,8 @@ static inline void spec_mitigate(struct task_struct *task,
}
static inline void seccomp_assign_mode(struct task_struct *task,
- unsigned long seccomp_mode)
+ unsigned long seccomp_mode,
+ unsigned long flags)
{
assert_spin_locked(&task->sighand->siglock);
@@ -240,8 +241,9 @@ static inline void seccomp_assign_mode(struct task_struct *task,
* filter) is set.
*/
smp_mb__before_atomic();
- /* Assume seccomp processes want speculation flaw mitigation. */
- spec_mitigate(task, PR_SPEC_STORE_BYPASS);
+ /* Assume default seccomp processes want spec flaw mitigation. */
+ if ((flags & SECCOMP_FILTER_FLAG_SPEC_ALLOW) == 0)
+ spec_mitigate(task, PR_SPEC_STORE_BYPASS);
set_tsk_thread_flag(task, TIF_SECCOMP);
}
@@ -309,7 +311,7 @@ static inline pid_t seccomp_can_sync_threads(void)
* without dropping the locks.
*
*/
-static inline void seccomp_sync_threads(void)
+static inline void seccomp_sync_threads(unsigned long flags)
{
struct task_struct *thread, *caller;
@@ -350,7 +352,8 @@ static inline void seccomp_sync_threads(void)
* allow one thread to transition the other.
*/
if (thread->seccomp.mode == SECCOMP_MODE_DISABLED)
- seccomp_assign_mode(thread, SECCOMP_MODE_FILTER);
+ seccomp_assign_mode(thread, SECCOMP_MODE_FILTER,
+ flags);
}
}
@@ -469,7 +472,7 @@ static long seccomp_attach_filter(unsigned int flags,
/* Now that the new filter is in place, synchronize to all threads. */
if (flags & SECCOMP_FILTER_FLAG_TSYNC)
- seccomp_sync_threads();
+ seccomp_sync_threads(flags);
return 0;
}
@@ -764,7 +767,7 @@ static long seccomp_set_mode_strict(void)
#ifdef TIF_NOTSC
disable_TSC();
#endif
- seccomp_assign_mode(current, seccomp_mode);
+ seccomp_assign_mode(current, seccomp_mode, 0);
ret = 0;
out:
@@ -822,7 +825,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
/* Do not free the successfully attached filter. */
prepared = NULL;
- seccomp_assign_mode(current, seccomp_mode);
+ seccomp_assign_mode(current, seccomp_mode, flags);
out:
spin_unlock_irq(¤t->sighand->siglock);
if (flags & SECCOMP_FILTER_FLAG_TSYNC)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 29487e0..b3f3454 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1477,7 +1477,11 @@ TEST_F(TRACE_syscall, syscall_dropped)
#endif
#ifndef SECCOMP_FILTER_FLAG_TSYNC
-#define SECCOMP_FILTER_FLAG_TSYNC 1
+#define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0)
+#endif
+
+#ifndef SECCOMP_FILTER_FLAG_SPEC_ALLOW
+#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2)
#endif
#ifndef seccomp
@@ -1576,6 +1580,78 @@ TEST(seccomp_syscall_mode_lock)
}
}
+/*
+ * Test detection of known and unknown filter flags. Userspace needs to be able
+ * to check if a filter flag is supported by the current kernel and a good way
+ * of doing that is by attempting to enter filter mode, with the flag bit in
+ * question set, and a NULL pointer for the _args_ parameter. EFAULT indicates
+ * that the flag is valid and EINVAL indicates that the flag is invalid.
+ */
+TEST(detect_seccomp_filter_flags)
+{
+ unsigned int flags[] = { SECCOMP_FILTER_FLAG_TSYNC,
+ SECCOMP_FILTER_FLAG_SPEC_ALLOW };
+ unsigned int flag, all_flags;
+ int i;
+ long ret;
+
+ /* Test detection of known-good filter flags */
+ for (i = 0, all_flags = 0; i < ARRAY_SIZE(flags); i++) {
+ int bits = 0;
+
+ flag = flags[i];
+ /* Make sure the flag is a single bit! */
+ while (flag) {
+ if (flag & 0x1)
+ bits ++;
+ flag >>= 1;
+ }
+ ASSERT_EQ(1, bits);
+ flag = flags[i];
+
+ ret = seccomp(SECCOMP_SET_MODE_FILTER, flag, NULL);
+ ASSERT_NE(ENOSYS, errno) {
+ TH_LOG("Kernel does not support seccomp syscall!");
+ }
+ EXPECT_EQ(-1, ret);
+ EXPECT_EQ(EFAULT, errno) {
+ TH_LOG("Failed to detect that a known-good filter flag (0x%X) is supported!",
+ flag);
+ }
+
+ all_flags |= flag;
+ }
+
+ /* Test detection of all known-good filter flags */
+ ret = seccomp(SECCOMP_SET_MODE_FILTER, all_flags, NULL);
+ EXPECT_EQ(-1, ret);
+ EXPECT_EQ(EFAULT, errno) {
+ TH_LOG("Failed to detect that all known-good filter flags (0x%X) are supported!",
+ all_flags);
+ }
+
+ /* Test detection of an unknown filter flag */
+ flag = -1;
+ ret = seccomp(SECCOMP_SET_MODE_FILTER, flag, NULL);
+ EXPECT_EQ(-1, ret);
+ EXPECT_EQ(EINVAL, errno) {
+ TH_LOG("Failed to detect that an unknown filter flag (0x%X) is unsupported!",
+ flag);
+ }
+
+ /*
+ * Test detection of an unknown filter flag that may simply need to be
+ * added to this test
+ */
+ flag = flags[ARRAY_SIZE(flags) - 1] << 1;
+ ret = seccomp(SECCOMP_SET_MODE_FILTER, flag, NULL);
+ EXPECT_EQ(-1, ret);
+ EXPECT_EQ(EINVAL, errno) {
+ TH_LOG("Failed to detect that an unknown filter flag (0x%X) is unsupported! Does a new flag need to be added to this test?",
+ flag);
+ }
+}
+
TEST(TSYNC_first)
{
struct sock_filter filter[] = {
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 356e4bfff2c5489e016fdb925adbf12a1e3950ee upstream
For certain use cases it is desired to enforce mitigations so they cannot
be undone afterwards. That's important for loader stubs which want to
prevent a child from disabling the mitigation again. Will also be used for
seccomp(). The extra state preserving of the prctl state for SSB is a
preparatory step for EBPF dymanic speculation control.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
Documentation/spec_ctrl.txt | 34 +++++++++++++++++++++-------------
arch/x86/kernel/cpu/bugs.c | 35 +++++++++++++++++++++++++----------
fs/proc/array.c | 3 +++
include/linux/sched.h | 9 +++++++++
include/uapi/linux/prctl.h | 1 +
5 files changed, 59 insertions(+), 23 deletions(-)
diff --git a/Documentation/spec_ctrl.txt b/Documentation/spec_ctrl.txt
index ddbebcd..1b3690d 100644
--- a/Documentation/spec_ctrl.txt
+++ b/Documentation/spec_ctrl.txt
@@ -25,19 +25,21 @@ PR_GET_SPECULATION_CTRL
-----------------------
PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
-which is selected with arg2 of prctl(2). The return value uses bits 0-2 with
+which is selected with arg2 of prctl(2). The return value uses bits 0-3 with
the following meaning:
-==== ================ ===================================================
-Bit Define Description
-==== ================ ===================================================
-0 PR_SPEC_PRCTL Mitigation can be controlled per task by
- PR_SET_SPECULATION_CTRL
-1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is
- disabled
-2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is
- enabled
-==== ================ ===================================================
+==== ===================== ===================================================
+Bit Define Description
+==== ===================== ===================================================
+0 PR_SPEC_PRCTL Mitigation can be controlled per task by
+ PR_SET_SPECULATION_CTRL
+1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is
+ disabled
+2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is
+ enabled
+3 PR_SPEC_FORCE_DISABLE Same as PR_SPEC_DISABLE, but cannot be undone. A
+ subsequent prctl(..., PR_SPEC_ENABLE) will fail.
+==== ===================== ===================================================
If all bits are 0 the CPU is not affected by the speculation misfeature.
@@ -47,9 +49,11 @@ misfeature will fail.
PR_SET_SPECULATION_CTRL
-----------------------
+
PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which
is selected by arg2 of :manpage:`prctl(2)` per task. arg3 is used to hand
-in the control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE.
+in the control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE or
+PR_SPEC_FORCE_DISABLE.
Common error codes
------------------
@@ -70,10 +74,13 @@ Value Meaning
0 Success
ERANGE arg3 is incorrect, i.e. it's neither PR_SPEC_ENABLE nor
- PR_SPEC_DISABLE
+ PR_SPEC_DISABLE nor PR_SPEC_FORCE_DISABLE
ENXIO Control of the selected speculation misfeature is not possible.
See PR_GET_SPECULATION_CTRL.
+
+EPERM Speculation was disabled with PR_SPEC_FORCE_DISABLE and caller
+ tried to enable it again.
======= =================================================================
Speculation misfeature controls
@@ -84,3 +91,4 @@ Speculation misfeature controls
* prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0);
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_ENABLE, 0, 0);
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_DISABLE, 0, 0);
+ * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_FORCE_DISABLE, 0, 0);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 64b54a4..d6897ca 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -531,21 +531,37 @@ static void ssb_select_mitigation()
static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl)
{
- bool rds = !!test_tsk_thread_flag(task, TIF_RDS);
+ bool update;
if (ssb_mode != SPEC_STORE_BYPASS_PRCTL)
return -ENXIO;
- if (ctrl == PR_SPEC_ENABLE)
- clear_tsk_thread_flag(task, TIF_RDS);
- else
- set_tsk_thread_flag(task, TIF_RDS);
+ switch (ctrl) {
+ case PR_SPEC_ENABLE:
+ /* If speculation is force disabled, enable is not allowed */
+ if (task_spec_ssb_force_disable(task))
+ return -EPERM;
+ task_clear_spec_ssb_disable(task);
+ update = test_and_clear_tsk_thread_flag(task, TIF_RDS);
+ break;
+ case PR_SPEC_DISABLE:
+ task_set_spec_ssb_disable(task);
+ update = !test_and_set_tsk_thread_flag(task, TIF_RDS);
+ break;
+ case PR_SPEC_FORCE_DISABLE:
+ task_set_spec_ssb_disable(task);
+ task_set_spec_ssb_force_disable(task);
+ update = !test_and_set_tsk_thread_flag(task, TIF_RDS);
+ break;
+ default:
+ return -ERANGE;
+ }
/*
* If being set on non-current task, delay setting the CPU
* mitigation until it is next scheduled.
*/
- if (task == current && rds != !!test_tsk_thread_flag(task, TIF_RDS))
+ if (task == current && update)
speculative_store_bypass_update();
return 0;
@@ -557,7 +573,9 @@ static int ssb_prctl_get(struct task_struct *task)
case SPEC_STORE_BYPASS_DISABLE:
return PR_SPEC_DISABLE;
case SPEC_STORE_BYPASS_PRCTL:
- if (test_tsk_thread_flag(task, TIF_RDS))
+ if (task_spec_ssb_force_disable(task))
+ return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE;
+ if (task_spec_ssb_disable(task))
return PR_SPEC_PRCTL | PR_SPEC_DISABLE;
return PR_SPEC_PRCTL | PR_SPEC_ENABLE;
default:
@@ -570,9 +588,6 @@ static int ssb_prctl_get(struct task_struct *task)
int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which,
unsigned long ctrl)
{
- if (ctrl != PR_SPEC_ENABLE && ctrl != PR_SPEC_DISABLE)
- return -ERANGE;
-
switch (which) {
case PR_SPEC_STORE_BYPASS:
return ssb_prctl_set(task, ctrl);
diff --git a/fs/proc/array.c b/fs/proc/array.c
index bb48358..3141478 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -341,6 +341,9 @@ static inline void task_seccomp(struct seq_file *m, struct task_struct *p)
case PR_SPEC_NOT_AFFECTED:
seq_printf(m, "not vulnerable");
break;
+ case PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE:
+ seq_printf(m, "thread force mitigated");
+ break;
case PR_SPEC_PRCTL | PR_SPEC_DISABLE:
seq_printf(m, "thread mitigated");
break;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 90bea39..725498c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2167,6 +2167,8 @@ static inline void memalloc_noio_restore(unsigned int flags)
#define PFA_NO_NEW_PRIVS 0 /* May not gain new privileges. */
#define PFA_SPREAD_PAGE 1 /* Spread page cache over cpuset */
#define PFA_SPREAD_SLAB 2 /* Spread some slab caches over cpuset */
+#define PFA_SPEC_SSB_DISABLE 4 /* Speculative Store Bypass disabled */
+#define PFA_SPEC_SSB_FORCE_DISABLE 5 /* Speculative Store Bypass force disabled*/
#define TASK_PFA_TEST(name, func) \
@@ -2190,6 +2192,13 @@ TASK_PFA_TEST(SPREAD_SLAB, spread_slab)
TASK_PFA_SET(SPREAD_SLAB, spread_slab)
TASK_PFA_CLEAR(SPREAD_SLAB, spread_slab)
+TASK_PFA_TEST(SPEC_SSB_DISABLE, spec_ssb_disable)
+TASK_PFA_SET(SPEC_SSB_DISABLE, spec_ssb_disable)
+TASK_PFA_CLEAR(SPEC_SSB_DISABLE, spec_ssb_disable)
+
+TASK_PFA_TEST(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable)
+TASK_PFA_SET(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable)
+
/*
* task->jobctl flags
*/
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 3b316be..64776b7 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -207,5 +207,6 @@ struct prctl_mm_map {
# define PR_SPEC_PRCTL (1UL << 0)
# define PR_SPEC_ENABLE (1UL << 1)
# define PR_SPEC_DISABLE (1UL << 2)
+# define PR_SPEC_FORCE_DISABLE (1UL << 3)
#endif /* _LINUX_PRCTL_H */
From: Thomas Gleixner <tglx(a)linutronix.de>
commit a73ec77ee17ec556fe7f165d00314cb7c047b1ac upstream
Add prctl based control for Speculative Store Bypass mitigation and make it
the default mitigation for Intel and AMD.
Andi Kleen provided the following rationale (slightly redacted):
There are multiple levels of impact of Speculative Store Bypass:
1) JITed sandbox.
It cannot invoke system calls, but can do PRIME+PROBE and may have call
interfaces to other code
2) Native code process.
No protection inside the process at this level.
3) Kernel.
4) Between processes.
The prctl tries to protect against case (1) doing attacks.
If the untrusted code can do random system calls then control is already
lost in a much worse way. So there needs to be system call protection in
some way (using a JIT not allowing them or seccomp). Or rather if the
process can subvert its environment somehow to do the prctl it can already
execute arbitrary code, which is much worse than SSB.
To put it differently, the point of the prctl is to not allow JITed code
to read data it shouldn't read from its JITed sandbox. If it already has
escaped its sandbox then it can already read everything it wants in its
address space, and do much worse.
The ability to control Speculative Store Bypass allows to enable the
protection selectively without affecting overall system performance.
Based on an initial patch from Tim Chen. Completely rewritten.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
Documentation/kernel-parameters.txt | 6 ++
arch/x86/include/asm/nospec-branch.h | 1
arch/x86/kernel/cpu/bugs.c | 83 ++++++++++++++++++++++++++++++----
3 files changed, 79 insertions(+), 11 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index dc138b8..80202de 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3651,7 +3651,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
off - Unconditionally enable Speculative Store Bypass
auto - Kernel detects whether the CPU model contains an
implementation of Speculative Store Bypass and
- picks the most appropriate mitigation
+ picks the most appropriate mitigation.
+ prctl - Control Speculative Store Bypass per thread
+ via prctl. Speculative Store Bypass is enabled
+ for a process by default. The state of the control
+ is inherited on fork.
Not specifying this option is equivalent to
spec_store_bypass_disable=auto.
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 47c454c..155d955 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -187,6 +187,7 @@ extern u64 x86_spec_ctrl_get_default(void);
enum ssb_mitigation {
SPEC_STORE_BYPASS_NONE,
SPEC_STORE_BYPASS_DISABLE,
+ SPEC_STORE_BYPASS_PRCTL,
};
extern char __indirect_thunk_start[];
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 0f8303e..bcfccd3 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -11,6 +11,8 @@
#include <linux/utsname.h>
#include <linux/cpu.h>
#include <linux/module.h>
+#include <linux/nospec.h>
+#include <linux/prctl.h>
#include <asm/spec-ctrl.h>
#include <asm/cmdline.h>
@@ -411,20 +413,23 @@ enum ssb_mitigation_cmd {
SPEC_STORE_BYPASS_CMD_NONE,
SPEC_STORE_BYPASS_CMD_AUTO,
SPEC_STORE_BYPASS_CMD_ON,
+ SPEC_STORE_BYPASS_CMD_PRCTL,
};
static const char *ssb_strings[] = {
[SPEC_STORE_BYPASS_NONE] = "Vulnerable",
- [SPEC_STORE_BYPASS_DISABLE] = "Mitigation: Speculative Store Bypass disabled"
+ [SPEC_STORE_BYPASS_DISABLE] = "Mitigation: Speculative Store Bypass disabled",
+ [SPEC_STORE_BYPASS_PRCTL] = "Mitigation: Speculative Store Bypass disabled via prctl"
};
static const struct {
const char *option;
enum ssb_mitigation_cmd cmd;
} ssb_mitigation_options[] = {
- { "auto", SPEC_STORE_BYPASS_CMD_AUTO }, /* Platform decides */
- { "on", SPEC_STORE_BYPASS_CMD_ON }, /* Disable Speculative Store Bypass */
- { "off", SPEC_STORE_BYPASS_CMD_NONE }, /* Don't touch Speculative Store Bypass */
+ { "auto", SPEC_STORE_BYPASS_CMD_AUTO }, /* Platform decides */
+ { "on", SPEC_STORE_BYPASS_CMD_ON }, /* Disable Speculative Store Bypass */
+ { "off", SPEC_STORE_BYPASS_CMD_NONE }, /* Don't touch Speculative Store Bypass */
+ { "prctl", SPEC_STORE_BYPASS_CMD_PRCTL }, /* Disable Speculative Store Bypass via prctl */
};
static enum ssb_mitigation_cmd __init ssb_parse_cmdline(void)
@@ -474,14 +479,15 @@ static enum ssb_mitigation_cmd __init __ssb_select_mitigation(void)
switch (cmd) {
case SPEC_STORE_BYPASS_CMD_AUTO:
- /*
- * AMD platforms by default don't need SSB mitigation.
- */
- if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
- break;
+ /* Choose prctl as the default mode */
+ mode = SPEC_STORE_BYPASS_PRCTL;
+ break;
case SPEC_STORE_BYPASS_CMD_ON:
mode = SPEC_STORE_BYPASS_DISABLE;
break;
+ case SPEC_STORE_BYPASS_CMD_PRCTL:
+ mode = SPEC_STORE_BYPASS_PRCTL;
+ break;
case SPEC_STORE_BYPASS_CMD_NONE:
break;
}
@@ -492,7 +498,7 @@ static enum ssb_mitigation_cmd __init __ssb_select_mitigation(void)
* - X86_FEATURE_RDS - CPU is able to turn off speculative store bypass
* - X86_FEATURE_SPEC_STORE_BYPASS_DISABLE - engage the mitigation
*/
- if (mode != SPEC_STORE_BYPASS_NONE) {
+ if (mode == SPEC_STORE_BYPASS_DISABLE) {
setup_force_cpu_cap(X86_FEATURE_SPEC_STORE_BYPASS_DISABLE);
/*
* Intel uses the SPEC CTRL MSR Bit(2) for this, while AMD uses
@@ -523,6 +529,63 @@ static void ssb_select_mitigation()
#undef pr_fmt
+static int ssb_prctl_set(unsigned long ctrl)
+{
+ bool rds = !!test_tsk_thread_flag(current, TIF_RDS);
+
+ if (ssb_mode != SPEC_STORE_BYPASS_PRCTL)
+ return -ENXIO;
+
+ if (ctrl == PR_SPEC_ENABLE)
+ clear_tsk_thread_flag(current, TIF_RDS);
+ else
+ set_tsk_thread_flag(current, TIF_RDS);
+
+ if (rds != !!test_tsk_thread_flag(current, TIF_RDS))
+ speculative_store_bypass_update();
+
+ return 0;
+}
+
+static int ssb_prctl_get(void)
+{
+ switch (ssb_mode) {
+ case SPEC_STORE_BYPASS_DISABLE:
+ return PR_SPEC_DISABLE;
+ case SPEC_STORE_BYPASS_PRCTL:
+ if (test_tsk_thread_flag(current, TIF_RDS))
+ return PR_SPEC_PRCTL | PR_SPEC_DISABLE;
+ return PR_SPEC_PRCTL | PR_SPEC_ENABLE;
+ default:
+ if (boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS))
+ return PR_SPEC_ENABLE;
+ return PR_SPEC_NOT_AFFECTED;
+ }
+}
+
+int arch_prctl_spec_ctrl_set(unsigned long which, unsigned long ctrl)
+{
+ if (ctrl != PR_SPEC_ENABLE && ctrl != PR_SPEC_DISABLE)
+ return -ERANGE;
+
+ switch (which) {
+ case PR_SPEC_STORE_BYPASS:
+ return ssb_prctl_set(ctrl);
+ default:
+ return -ENODEV;
+ }
+}
+
+int arch_prctl_spec_ctrl_get(unsigned long which)
+{
+ switch (which) {
+ case PR_SPEC_STORE_BYPASS:
+ return ssb_prctl_get();
+ default:
+ return -ENODEV;
+ }
+}
+
void x86_spec_ctrl_setup_ap(void)
{
if (boot_cpu_has(X86_FEATURE_IBRS))
From: Kyle Huey <me(a)kylehuey.com>
commit b9894a2f5bd18b1691cb6872c9afe32b148d0132 upstream
The debug control MSR is "highly magical" as the blockstep bit can be
cleared by hardware under not well documented circumstances.
So a task switch relying on the bit set by the previous task (according to
the previous tasks thread flags) can trip over this and not update the flag
for the next task.
To fix this its required to handle DEBUGCTLMSR_BTF when either the previous
or the next or both tasks have the TIF_BLOCKSTEP flag set.
While at it avoid branching within the TIF_BLOCKSTEP case and evaluating
boot_cpu_data twice in kernels without CONFIG_X86_DEBUGCTLMSR.
x86_64: arch/x86/kernel/process.o
text data bss dec hex
3024 8577 16 11617 2d61 Before
3008 8577 16 11601 2d51 After
i386: No change
[ tglx: Made the shift value explicit, use a local variable to make the
code readable and massaged changelog]
Originally-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Kyle Huey <khuey(a)kylehuey.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Link: http://lkml.kernel.org/r/20170214081104.9244-3-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/process.c | 12 +++++++-----
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index a29edb7..71a2c84 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -150,6 +150,7 @@
/* DEBUGCTLMSR bits (others vary by model): */
#define DEBUGCTLMSR_LBR (1UL << 0) /* last branch recording */
+#define DEBUGCTLMSR_BTF_SHIFT 1
#define DEBUGCTLMSR_BTF (1UL << 1) /* single-step on branches */
#define DEBUGCTLMSR_TR (1UL << 6)
#define DEBUGCTLMSR_BTS (1UL << 7)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index cc0f288..166aef3 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -223,13 +223,15 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
propagate_user_return_notify(prev_p, next_p);
- if ((tifp ^ tifn) & _TIF_BLOCKSTEP) {
- unsigned long debugctl = get_debugctlmsr();
+ if ((tifp & _TIF_BLOCKSTEP || tifn & _TIF_BLOCKSTEP) &&
+ arch_has_block_step()) {
+ unsigned long debugctl, msk;
+ rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
debugctl &= ~DEBUGCTLMSR_BTF;
- if (tifn & _TIF_BLOCKSTEP)
- debugctl |= DEBUGCTLMSR_BTF;
- update_debugctlmsr(debugctl);
+ msk = tifn & _TIF_BLOCKSTEP;
+ debugctl |= (msk >> TIF_BLOCKSTEP) << DEBUGCTLMSR_BTF_SHIFT;
+ wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
}
if ((tifp ^ tifn) & _TIF_NOTSC) {
From: Thomas Gleixner <tglx(a)linutronix.de>
commit b617cfc858161140d69cc0b5cc211996b557a1c7 upstream
Add two new prctls to control aspects of speculation related vulnerabilites
and their mitigations to provide finer grained control over performance
impacting mitigations.
PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
which is selected with arg2 of prctl(2). The return value uses bit 0-2 with
the following meaning:
Bit Define Description
0 PR_SPEC_PRCTL Mitigation can be controlled per task by
PR_SET_SPECULATION_CTRL
1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is
disabled
2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is
enabled
If all bits are 0 the CPU is not affected by the speculation misfeature.
If PR_SPEC_PRCTL is set, then the per task control of the mitigation is
available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
misfeature will fail.
PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which
is selected by arg2 of prctl(2) per task. arg3 is used to hand in the
control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE.
The common return values are:
EINVAL prctl is not implemented by the architecture or the unused prctl()
arguments are not 0
ENODEV arg2 is selecting a not supported speculation misfeature
PR_SET_SPECULATION_CTRL has these additional return values:
ERANGE arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE
ENXIO prctl control of the selected speculation misfeature is disabled
The first supported controlable speculation misfeature is
PR_SPEC_STORE_BYPASS. Add the define so this can be shared between
architectures.
Based on an initial patch from Tim Chen and mostly rewritten.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Ingo Molnar <mingo(a)kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
Documentation/spec_ctrl.txt | 86 +++++++++++++++++++++++++++++++++++++++++++
include/linux/nospec.h | 5 +++
include/uapi/linux/prctl.h | 11 ++++++
kernel/sys.c | 20 ++++++++++
4 files changed, 122 insertions(+)
create mode 100644 Documentation/spec_ctrl.txt
diff --git a/Documentation/spec_ctrl.txt b/Documentation/spec_ctrl.txt
new file mode 100644
index 0000000..ddbebcd
--- /dev/null
+++ b/Documentation/spec_ctrl.txt
@@ -0,0 +1,86 @@
+===================
+Speculation Control
+===================
+
+Quite some CPUs have speculation related misfeatures which are in fact
+vulnerabilites causing data leaks in various forms even accross privilege
+domains.
+
+The kernel provides mitigation for such vulnerabilities in various
+forms. Some of these mitigations are compile time configurable and some on
+the kernel command line.
+
+There is also a class of mitigations which are very expensive, but they can
+be restricted to a certain set of processes or tasks in controlled
+environments. The mechanism to control these mitigations is via
+:manpage:`prctl(2)`.
+
+There are two prctl options which are related to this:
+
+ * PR_GET_SPECULATION_CTRL
+
+ * PR_SET_SPECULATION_CTRL
+
+PR_GET_SPECULATION_CTRL
+-----------------------
+
+PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
+which is selected with arg2 of prctl(2). The return value uses bits 0-2 with
+the following meaning:
+
+==== ================ ===================================================
+Bit Define Description
+==== ================ ===================================================
+0 PR_SPEC_PRCTL Mitigation can be controlled per task by
+ PR_SET_SPECULATION_CTRL
+1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is
+ disabled
+2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is
+ enabled
+==== ================ ===================================================
+
+If all bits are 0 the CPU is not affected by the speculation misfeature.
+
+If PR_SPEC_PRCTL is set, then the per task control of the mitigation is
+available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
+misfeature will fail.
+
+PR_SET_SPECULATION_CTRL
+-----------------------
+PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which
+is selected by arg2 of :manpage:`prctl(2)` per task. arg3 is used to hand
+in the control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE.
+
+Common error codes
+------------------
+======= =================================================================
+Value Meaning
+======= =================================================================
+EINVAL The prctl is not implemented by the architecture or unused
+ prctl(2) arguments are not 0
+
+ENODEV arg2 is selecting a not supported speculation misfeature
+======= =================================================================
+
+PR_SET_SPECULATION_CTRL error codes
+-----------------------------------
+======= =================================================================
+Value Meaning
+======= =================================================================
+0 Success
+
+ERANGE arg3 is incorrect, i.e. it's neither PR_SPEC_ENABLE nor
+ PR_SPEC_DISABLE
+
+ENXIO Control of the selected speculation misfeature is not possible.
+ See PR_GET_SPECULATION_CTRL.
+======= =================================================================
+
+Speculation misfeature controls
+-------------------------------
+- PR_SPEC_STORE_BYPASS: Speculative Store Bypass
+
+ Invocations:
+ * prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0);
+ * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_ENABLE, 0, 0);
+ * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_DISABLE, 0, 0);
diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index e791ebc..700bb8a 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -55,4 +55,9 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
\
(typeof(_i)) (_i & _mask); \
})
+
+/* Speculation control prctl */
+int arch_prctl_spec_ctrl_get(unsigned long which);
+int arch_prctl_spec_ctrl_set(unsigned long which, unsigned long ctrl);
+
#endif /* _LINUX_NOSPEC_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759..3b316be 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,15 @@ struct prctl_mm_map {
# define PR_CAP_AMBIENT_LOWER 3
# define PR_CAP_AMBIENT_CLEAR_ALL 4
+/* Per task speculation control */
+#define PR_GET_SPECULATION_CTRL 52
+#define PR_SET_SPECULATION_CTRL 53
+/* Speculation control variants */
+# define PR_SPEC_STORE_BYPASS 0
+/* Return and control values for PR_SET/GET_SPECULATION_CTRL */
+# define PR_SPEC_NOT_AFFECTED 0
+# define PR_SPEC_PRCTL (1UL << 0)
+# define PR_SPEC_ENABLE (1UL << 1)
+# define PR_SPEC_DISABLE (1UL << 2)
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 6624919..d80c33f 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2075,6 +2075,16 @@ static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
}
#endif
+int __weak arch_prctl_spec_ctrl_get(unsigned long which)
+{
+ return -EINVAL;
+}
+
+int __weak arch_prctl_spec_ctrl_set(unsigned long which, unsigned long ctrl)
+{
+ return -EINVAL;
+}
+
SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
unsigned long, arg4, unsigned long, arg5)
{
@@ -2269,6 +2279,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
case PR_GET_FP_MODE:
error = GET_FP_MODE(me);
break;
+ case PR_GET_SPECULATION_CTRL:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_prctl_spec_ctrl_get(arg2);
+ break;
+ case PR_SET_SPECULATION_CTRL:
+ if (arg4 || arg5)
+ return -EINVAL;
+ error = arch_prctl_spec_ctrl_set(arg2, arg3);
+ break;
default:
error = -EINVAL;
break;
From: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
commit 24f7fc83b9204d20f878c57cb77d261ae825e033 upstream
Contemporary high performance processors use a common industry-wide
optimization known as "Speculative Store Bypass" in which loads from
addresses to which a recent store has occurred may (speculatively) see an
older value. Intel refers to this feature as "Memory Disambiguation" which
is part of their "Smart Memory Access" capability.
Memory Disambiguation can expose a cache side-channel attack against such
speculatively read values. An attacker can create exploit code that allows
them to read memory outside of a sandbox environment (for example,
malicious JavaScript in a web page), or to perform more complex attacks
against code running within the same privilege level, e.g. via the stack.
As a first step to mitigate against such attacks, provide two boot command
line control knobs:
nospec_store_bypass_disable
spec_store_bypass_disable=[off,auto,on]
By default affected x86 processors will power on with Speculative
Store Bypass enabled. Hence the provided kernel parameters are written
from the point of view of whether to enable a mitigation or not.
The parameters are as follows:
- auto - Kernel detects whether your CPU model contains an implementation
of Speculative Store Bypass and picks the most appropriate
mitigation.
- on - disable Speculative Store Bypass
- off - enable Speculative Store Bypass
[ tglx: Reordered the checks so that the whole evaluation is not done
when the CPU does not support RDS ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
Documentation/kernel-parameters.txt | 33 +++++++++++
arch/x86/include/asm/cpufeatures.h | 1
arch/x86/include/asm/nospec-branch.h | 6 ++
arch/x86/kernel/cpu/bugs.c | 103 ++++++++++++++++++++++++++++++++++
4 files changed, 143 insertions(+)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e60d0b5..dc138b8 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2460,6 +2460,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
allow data leaks with this option, which is equivalent
to spectre_v2=off.
+ nospec_store_bypass_disable
+ [HW] Disable all mitigations for the Speculative Store Bypass vulnerability
+
noxsave [BUGS=X86] Disables x86 extended register state save
and restore using xsave. The kernel will fallback to
enabling legacy floating-point and sse state.
@@ -3623,6 +3626,36 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Not specifying this option is equivalent to
spectre_v2=auto.
+ spec_store_bypass_disable=
+ [HW] Control Speculative Store Bypass (SSB) Disable mitigation
+ (Speculative Store Bypass vulnerability)
+
+ Certain CPUs are vulnerable to an exploit against a
+ a common industry wide performance optimization known
+ as "Speculative Store Bypass" in which recent stores
+ to the same memory location may not be observed by
+ later loads during speculative execution. The idea
+ is that such stores are unlikely and that they can
+ be detected prior to instruction retirement at the
+ end of a particular speculation execution window.
+
+ In vulnerable processors, the speculatively forwarded
+ store can be used in a cache side channel attack, for
+ example to read memory to which the attacker does not
+ directly have access (e.g. inside sandboxed code).
+
+ This parameter controls whether the Speculative Store
+ Bypass optimization is used.
+
+ on - Unconditionally disable Speculative Store Bypass
+ off - Unconditionally enable Speculative Store Bypass
+ auto - Kernel detects whether the CPU model contains an
+ implementation of Speculative Store Bypass and
+ picks the most appropriate mitigation
+
+ Not specifying this option is equivalent to
+ spec_store_bypass_disable=auto.
+
spia_io_base= [HW,MTD]
spia_fio_base=
spia_pedr=
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3fce65d..9510f5f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -203,6 +203,7 @@
#define X86_FEATURE_USE_IBPB ( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled*/
#define X86_FEATURE_USE_IBRS_FW ( 7*32+22) /* "" Use IBRS during runtime firmware calls */
+#define X86_FEATURE_SPEC_STORE_BYPASS_DISABLE ( 7*32+23) /* "" Disable Speculative Store Bypass. */
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 11db69a..c786d01 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -193,6 +193,12 @@ extern u64 x86_spec_ctrl_get_default(void);
extern void x86_spec_ctrl_set_guest(u64);
extern void x86_spec_ctrl_restore_host(u64);
+/* The Speculative Store Bypass disable variants */
+enum ssb_mitigation {
+ SPEC_STORE_BYPASS_NONE,
+ SPEC_STORE_BYPASS_DISABLE,
+};
+
extern char __indirect_thunk_start[];
extern char __indirect_thunk_end[];
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 0ad13b1..826aa81 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -26,6 +26,7 @@
#include <asm/intel-family.h>
static void __init spectre_v2_select_mitigation(void);
+static void __init ssb_select_mitigation(void);
/*
* Our boot-time value of the SPEC_CTRL MSR. We read it once so that any
@@ -52,6 +53,12 @@ void __init check_bugs(void)
/* Select the proper spectre mitigation before patching alternatives */
spectre_v2_select_mitigation();
+ /*
+ * Select proper mitigation for any exposure to the Speculative Store
+ * Bypass vulnerability.
+ */
+ ssb_select_mitigation();
+
#ifdef CONFIG_X86_32
/*
* Check whether we are able to run this kernel safely on SMP.
@@ -357,6 +364,99 @@ retpoline_auto:
}
#undef pr_fmt
+#define pr_fmt(fmt) "Speculative Store Bypass: " fmt
+
+static enum ssb_mitigation ssb_mode = SPEC_STORE_BYPASS_NONE;
+
+/* The kernel command line selection */
+enum ssb_mitigation_cmd {
+ SPEC_STORE_BYPASS_CMD_NONE,
+ SPEC_STORE_BYPASS_CMD_AUTO,
+ SPEC_STORE_BYPASS_CMD_ON,
+};
+
+static const char *ssb_strings[] = {
+ [SPEC_STORE_BYPASS_NONE] = "Vulnerable",
+ [SPEC_STORE_BYPASS_DISABLE] = "Mitigation: Speculative Store Bypass disabled"
+};
+
+static const struct {
+ const char *option;
+ enum ssb_mitigation_cmd cmd;
+} ssb_mitigation_options[] = {
+ { "auto", SPEC_STORE_BYPASS_CMD_AUTO }, /* Platform decides */
+ { "on", SPEC_STORE_BYPASS_CMD_ON }, /* Disable Speculative Store Bypass */
+ { "off", SPEC_STORE_BYPASS_CMD_NONE }, /* Don't touch Speculative Store Bypass */
+};
+
+static enum ssb_mitigation_cmd __init ssb_parse_cmdline(void)
+{
+ enum ssb_mitigation_cmd cmd = SPEC_STORE_BYPASS_CMD_AUTO;
+ char arg[20];
+ int ret, i;
+
+ if (cmdline_find_option_bool(boot_command_line, "nospec_store_bypass_disable")) {
+ return SPEC_STORE_BYPASS_CMD_NONE;
+ } else {
+ ret = cmdline_find_option(boot_command_line, "spec_store_bypass_disable",
+ arg, sizeof(arg));
+ if (ret < 0)
+ return SPEC_STORE_BYPASS_CMD_AUTO;
+
+ for (i = 0; i < ARRAY_SIZE(ssb_mitigation_options); i++) {
+ if (!match_option(arg, ret, ssb_mitigation_options[i].option))
+ continue;
+
+ cmd = ssb_mitigation_options[i].cmd;
+ break;
+ }
+
+ if (i >= ARRAY_SIZE(ssb_mitigation_options)) {
+ pr_err("unknown option (%s). Switching to AUTO select\n", arg);
+ return SPEC_STORE_BYPASS_CMD_AUTO;
+ }
+ }
+
+ return cmd;
+}
+
+static enum ssb_mitigation_cmd __init __ssb_select_mitigation(void)
+{
+ enum ssb_mitigation mode = SPEC_STORE_BYPASS_NONE;
+ enum ssb_mitigation_cmd cmd;
+
+ if (!boot_cpu_has(X86_FEATURE_RDS))
+ return mode;
+
+ cmd = ssb_parse_cmdline();
+ if (!boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS) &&
+ (cmd == SPEC_STORE_BYPASS_CMD_NONE ||
+ cmd == SPEC_STORE_BYPASS_CMD_AUTO))
+ return mode;
+
+ switch (cmd) {
+ case SPEC_STORE_BYPASS_CMD_AUTO:
+ case SPEC_STORE_BYPASS_CMD_ON:
+ mode = SPEC_STORE_BYPASS_DISABLE;
+ break;
+ case SPEC_STORE_BYPASS_CMD_NONE:
+ break;
+ }
+
+ if (mode != SPEC_STORE_BYPASS_NONE)
+ setup_force_cpu_cap(X86_FEATURE_SPEC_STORE_BYPASS_DISABLE);
+ return mode;
+}
+
+static void ssb_select_mitigation()
+{
+ ssb_mode = __ssb_select_mitigation();
+
+ if (boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS))
+ pr_info("%s\n", ssb_strings[ssb_mode]);
+}
+
+#undef pr_fmt
#ifdef CONFIG_SYSFS
@@ -382,6 +482,9 @@ ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr,
boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
spectre_v2_module_string());
+ case X86_BUG_SPEC_STORE_BYPASS:
+ return sprintf(buf, "%s\n", ssb_strings[ssb_mode]);
+
default:
break;
}
From: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
commit 5cf687548705412da47c9cec342fd952d71ed3d5 upstream
A guest may modify the SPEC_CTRL MSR from the value used by the
kernel. Since the kernel doesn't use IBRS, this means a value of zero is
what is needed in the host.
But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to
the other bits as reserved so the kernel should respect the boot time
SPEC_CTRL value and use that.
This allows to deal with future extensions to the SPEC_CTRL interface if
any at all.
Note: This uses wrmsrl() instead of native_wrmsl(). I does not make any
difference as paravirt will over-write the callq *0xfff.. with the wrmsrl
assembler code.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[ Srivatsa: Backported to 4.4.y, skipping the KVM changes in this patch. ]
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/include/asm/nospec-branch.h | 10 ++++++++++
arch/x86/kernel/cpu/bugs.c | 18 ++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index daec318..11db69a 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -183,6 +183,16 @@ enum spectre_v2_mitigation {
extern void x86_spec_ctrl_set(u64);
extern u64 x86_spec_ctrl_get_default(void);
+/*
+ * On VMENTER we must preserve whatever view of the SPEC_CTRL MSR
+ * the guest has, while on VMEXIT we restore the host view. This
+ * would be easier if SPEC_CTRL were architecturally maskable or
+ * shadowable for guests but this is not (currently) the case.
+ * Takes the guest view of SPEC_CTRL MSR as a parameter.
+ */
+extern void x86_spec_ctrl_set_guest(u64);
+extern void x86_spec_ctrl_restore_host(u64);
+
extern char __indirect_thunk_start[];
extern char __indirect_thunk_end[];
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 42c2204..e71e281 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -122,6 +122,24 @@ u64 x86_spec_ctrl_get_default(void)
}
EXPORT_SYMBOL_GPL(x86_spec_ctrl_get_default);
+void x86_spec_ctrl_set_guest(u64 guest_spec_ctrl)
+{
+ if (!boot_cpu_has(X86_FEATURE_IBRS))
+ return;
+ if (x86_spec_ctrl_base != guest_spec_ctrl)
+ wrmsrl(MSR_IA32_SPEC_CTRL, guest_spec_ctrl);
+}
+EXPORT_SYMBOL_GPL(x86_spec_ctrl_set_guest);
+
+void x86_spec_ctrl_restore_host(u64 guest_spec_ctrl)
+{
+ if (!boot_cpu_has(X86_FEATURE_IBRS))
+ return;
+ if (x86_spec_ctrl_base != guest_spec_ctrl)
+ wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
+}
+EXPORT_SYMBOL_GPL(x86_spec_ctrl_restore_host);
+
#ifdef RETPOLINE
static bool spectre_v2_bad_module;
From: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
commit 1b86883ccb8d5d9506529d42dbe1a5257cb30b18 upstream
The 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to all
the other bits as reserved. The Intel SDM glossary defines reserved as
implementation specific - aka unknown.
As such at bootup this must be taken it into account and proper masking for
the bits in use applied.
A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=199511
[ tglx: Made x86_spec_ctrl_base __ro_after_init ]
[ Srivatsa: Removed __ro_after_init for 4.4.y ]
Suggested-by: Jon Masters <jcm(a)redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/include/asm/nospec-branch.h | 24 ++++++++++++++++++++----
arch/x86/kernel/cpu/bugs.c | 27 +++++++++++++++++++++++++++
2 files changed, 47 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 6403016..daec318 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -172,6 +172,17 @@ enum spectre_v2_mitigation {
SPECTRE_V2_IBRS,
};
+/*
+ * The Intel specification for the SPEC_CTRL MSR requires that we
+ * preserve any already set reserved bits at boot time (e.g. for
+ * future additions that this kernel is not currently aware of).
+ * We then set any additional mitigation bits that we want
+ * ourselves and always use this as the base for SPEC_CTRL.
+ * We also use this when handling guest entry/exit as below.
+ */
+extern void x86_spec_ctrl_set(u64);
+extern u64 x86_spec_ctrl_get_default(void);
+
extern char __indirect_thunk_start[];
extern char __indirect_thunk_end[];
@@ -208,8 +219,9 @@ void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
static inline void indirect_branch_prediction_barrier(void)
{
- alternative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB,
- X86_FEATURE_USE_IBPB);
+ u64 val = PRED_CMD_IBPB;
+
+ alternative_msr_write(MSR_IA32_PRED_CMD, val, X86_FEATURE_USE_IBPB);
}
/*
@@ -220,14 +232,18 @@ static inline void indirect_branch_prediction_barrier(void)
*/
#define firmware_restrict_branch_speculation_start() \
do { \
+ u64 val = x86_spec_ctrl_get_default() | SPEC_CTRL_IBRS; \
+ \
preempt_disable(); \
- alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS, \
+ alternative_msr_write(MSR_IA32_SPEC_CTRL, val, \
X86_FEATURE_USE_IBRS_FW); \
} while (0)
#define firmware_restrict_branch_speculation_end() \
do { \
- alternative_msr_write(MSR_IA32_SPEC_CTRL, 0, \
+ u64 val = x86_spec_ctrl_get_default(); \
+ \
+ alternative_msr_write(MSR_IA32_SPEC_CTRL, val, \
X86_FEATURE_USE_IBRS_FW); \
preempt_enable(); \
} while (0)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 75f3d49..42c2204 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -27,6 +27,12 @@
static void __init spectre_v2_select_mitigation(void);
+/*
+ * Our boot-time value of the SPEC_CTRL MSR. We read it once so that any
+ * writes to SPEC_CTRL contain whatever reserved bits have been set.
+ */
+static u64 x86_spec_ctrl_base;
+
void __init check_bugs(void)
{
identify_boot_cpu();
@@ -36,6 +42,13 @@ void __init check_bugs(void)
print_cpu_info(&boot_cpu_data);
}
+ /*
+ * Read the SPEC_CTRL MSR to account for reserved bits which may
+ * have unknown values.
+ */
+ if (boot_cpu_has(X86_FEATURE_IBRS))
+ rdmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
+
/* Select the proper spectre mitigation before patching alternatives */
spectre_v2_select_mitigation();
@@ -94,6 +107,20 @@ static const char *spectre_v2_strings[] = {
static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE;
+void x86_spec_ctrl_set(u64 val)
+{
+ if (val & ~SPEC_CTRL_IBRS)
+ WARN_ONCE(1, "SPEC_CTRL MSR value 0x%16llx is unknown.\n", val);
+ else
+ wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base | val);
+}
+EXPORT_SYMBOL_GPL(x86_spec_ctrl_set);
+
+u64 x86_spec_ctrl_get_default(void)
+{
+ return x86_spec_ctrl_base;
+}
+EXPORT_SYMBOL_GPL(x86_spec_ctrl_get_default);
#ifdef RETPOLINE
static bool spectre_v2_bad_module;
From: Linus Torvalds <torvalds(a)linux-foundation.org>
commit 1aa7a5735a41418d8e01fa7c9565eb2657e2ea3f upstream
The macro is not type safe and I did look for why that "g" constraint for
the asm doesn't work: it's because the asm is more fundamentally wrong.
It does
movl %[val], %%eax
but "val" isn't a 32-bit value, so then gcc will pass it in a register,
and generate code like
movl %rsi, %eax
and gas will complain about a nonsensical 'mov' instruction (it's moving a
64-bit register to a 32-bit one).
Passing it through memory will just hide the real bug - gcc still thinks
the memory location is 64-bit, but the "movl" will only load the first 32
bits and it all happens to work because x86 is little-endian.
Convert it to a type safe inline function with a little trick which hands
the feature into the ALTERNATIVE macro.
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/include/asm/nospec-branch.h | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index b9dd1d9..6403016 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -195,15 +195,16 @@ static inline void vmexit_fill_RSB(void)
#endif
}
-#define alternative_msr_write(_msr, _val, _feature) \
- asm volatile(ALTERNATIVE("", \
- "movl %[msr], %%ecx\n\t" \
- "movl %[val], %%eax\n\t" \
- "movl $0, %%edx\n\t" \
- "wrmsr", \
- _feature) \
- : : [msr] "i" (_msr), [val] "i" (_val) \
- : "eax", "ecx", "edx", "memory")
+static __always_inline
+void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
+{
+ asm volatile(ALTERNATIVE("", "wrmsr", %c[feature])
+ : : "c" (msr),
+ "a" (val),
+ "d" (val >> 32),
+ [feature] "i" (feature)
+ : "memory");
+}
static inline void indirect_branch_prediction_barrier(void)
{
From: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
commit 36268223c1e9981d6cfc33aff8520b3bde4b8114 upstream.
As:
1) It's known that hypervisors lie about the environment anyhow (host
mismatch)
2) Even if the hypervisor (Xen, KVM, VMWare, etc) provided a valid
"correct" value, it all gets to be very murky when migration happens
(do you provide the "new" microcode of the machine?).
And in reality the cloud vendors are the ones that should make sure that
the microcode that is running is correct and we should just sing lalalala
and trust them.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Wanpeng Li <kernellwp(a)gmail.com>
Cc: kvm <kvm(a)vger.kernel.org>
Cc: Krčmář <rkrcmar(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
CC: "H. Peter Anvin" <hpa(a)zytor.com>
CC: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20180226213019.GE9497@char.us.oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/kernel/cpu/intel.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index b69d258..dcc0349 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -68,6 +68,13 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
{
int i;
+ /*
+ * We know that the hypervisor lie to us on the microcode version so
+ * we may as well hope that it is running the correct version.
+ */
+ if (cpu_has(c, X86_FEATURE_HYPERVISOR))
+ return false;
+
for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) {
if (c->x86_model == spectre_bad_microcodes[i].model &&
c->x86_mask == spectre_bad_microcodes[i].stepping)
From: Tim Chen <tim.c.chen(a)linux.intel.com>
commit 18bf3c3ea8ece8f03b6fc58508f2dfd23c7711c7 upstream.
Flush indirect branches when switching into a process that marked itself
non dumpable. This protects high value processes like gpg better,
without having too high performance overhead.
If done naïvely, we could switch to a kernel idle thread and then back
to the original process, such as:
process A -> idle -> process A
In such scenario, we do not have to do IBPB here even though the process
is non-dumpable, as we are switching back to the same process after a
hiatus.
To avoid the redundant IBPB, which is expensive, we track the last mm
user context ID. The cost is to have an extra u64 mm context id to track
the last mm we were using before switching to the init_mm used by idle.
Avoiding the extra IBPB is probably worth the extra memory for this
common scenario.
For those cases where tlb_defer_switch_to_init_mm() returns true (non
PCID), lazy tlb will defer switch to init_mm, so we will not be changing
the mm for the process A -> idle -> process A switch. So IBPB will be
skipped for this case.
Thanks to the reviewers and Andy Lutomirski for the suggestion of
using ctx_id which got rid of the problem of mm pointer recycling.
Signed-off-by: Tim Chen <tim.c.chen(a)linux.intel.com>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: ak(a)linux.intel.com
Cc: karahmed(a)amazon.de
Cc: arjan(a)linux.intel.com
Cc: torvalds(a)linux-foundation.org
Cc: linux(a)dominikbrodowski.net
Cc: peterz(a)infradead.org
Cc: bp(a)alien8.de
Cc: luto(a)kernel.org
Cc: pbonzini(a)redhat.com
Link: https://lkml.kernel.org/r/1517263487-3708-1-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/include/asm/tlbflush.h | 2 ++
arch/x86/mm/tlb.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 33 insertions(+)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index e2a89d2..8ce07db 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -68,6 +68,8 @@ static inline void invpcid_flush_all_nonglobals(void)
struct tlb_state {
struct mm_struct *active_mm;
int state;
+ /* last user mm's ctx id */
+ u64 last_ctx_id;
/*
* Access to this CR4 shadow and to H/W CR4 is protected by
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index efec198..6d683bb 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -10,6 +10,7 @@
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
+#include <asm/nospec-branch.h>
#include <asm/cache.h>
#include <asm/apic.h>
#include <asm/uv/uv.h>
@@ -106,6 +107,36 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
unsigned cpu = smp_processor_id();
if (likely(prev != next)) {
+ u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
+
+ /*
+ * Avoid user/user BTB poisoning by flushing the branch
+ * predictor when switching between processes. This stops
+ * one process from doing Spectre-v2 attacks on another.
+ *
+ * As an optimization, flush indirect branches only when
+ * switching into processes that disable dumping. This
+ * protects high value processes like gpg, without having
+ * too high performance overhead. IBPB is *expensive*!
+ *
+ * This will not flush branches when switching into kernel
+ * threads. It will also not flush if we switch to idle
+ * thread and back to the same process. It will flush if we
+ * switch to a different non-dumpable process.
+ */
+ if (tsk && tsk->mm &&
+ tsk->mm->context.ctx_id != last_ctx_id &&
+ get_dumpable(tsk->mm) != SUID_DUMP_USER)
+ indirect_branch_prediction_barrier();
+
+ /*
+ * Record last user mm's context id, so we can avoid
+ * flushing branch buffer with IBPB if we switch back
+ * to the same user.
+ */
+ if (next != &init_mm)
+ this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id);
+
this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
this_cpu_write(cpu_tlbstate.active_mm, next);
cpumask_set_cpu(cpu, mm_cpumask(next));
From: Andy Lutomirski <luto(a)kernel.org>
commit f39681ed0f48498b80455095376f11535feea332 upstream.
This adds two new variables to mmu_context_t: ctx_id and tlb_gen.
ctx_id uniquely identifies the mm_struct and will never be reused.
For a given mm_struct (and hence ctx_id), tlb_gen is a monotonic
count of the number of times that a TLB flush has been requested.
The pair (ctx_id, tlb_gen) can be used as an identifier for TLB
flush actions and will be used in subsequent patches to reliably
determine whether all needed TLB flushes have occurred on a given
CPU.
This patch is split out for ease of review. By itself, it has no
real effect other than creating and updating the new variables.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Reviewed-by: Nadav Amit <nadav.amit(a)gmail.com>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Arjan van de Ven <arjan(a)linux.intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: linux-mm(a)kvack.org
Link: http://lkml.kernel.org/r/413a91c24dab3ed0caa5f4e4d017d87b0857f920.149875120…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Tim Chen <tim.c.chen(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/include/asm/mmu.h | 15 +++++++++++++--
arch/x86/include/asm/mmu_context.h | 4 ++++
arch/x86/mm/tlb.c | 2 ++
3 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 7680b76..3359dfe 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -3,12 +3,18 @@
#include <linux/spinlock.h>
#include <linux/mutex.h>
+#include <linux/atomic.h>
/*
- * The x86 doesn't have a mmu context, but
- * we put the segment information here.
+ * x86 has arch-specific MMU state beyond what lives in mm_struct.
*/
typedef struct {
+ /*
+ * ctx_id uniquely identifies this mm_struct. A ctx_id will never
+ * be reused, and zero is not a valid ctx_id.
+ */
+ u64 ctx_id;
+
#ifdef CONFIG_MODIFY_LDT_SYSCALL
struct ldt_struct *ldt;
#endif
@@ -24,6 +30,11 @@ typedef struct {
atomic_t perf_rdpmc_allowed; /* nonzero if rdpmc is allowed */
} mm_context_t;
+#define INIT_MM_CONTEXT(mm) \
+ .context = { \
+ .ctx_id = 1, \
+ }
+
void leave_mm(int cpu);
#endif /* _ASM_X86_MMU_H */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 1c4794f..effc127 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -11,6 +11,9 @@
#include <asm/tlbflush.h>
#include <asm/paravirt.h>
#include <asm/mpx.h>
+
+extern atomic64_t last_mm_ctx_id;
+
#ifndef CONFIG_PARAVIRT
static inline void paravirt_activate_mm(struct mm_struct *prev,
struct mm_struct *next)
@@ -105,6 +108,7 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
static inline int init_new_context(struct task_struct *tsk,
struct mm_struct *mm)
{
+ mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
init_new_context_ldt(tsk, mm);
return 0;
}
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 7cad01af..efec198 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -29,6 +29,8 @@
* Implement flush IPI by CALL_FUNCTION_VECTOR, Alex Shi
*/
+atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1);
+
struct flush_tlb_info {
struct mm_struct *flush_mm;
unsigned long flush_start;
From: David Woodhouse <dwmw(a)amazon.co.uk>
commit d37fc6d360a404b208547ba112e7dabb6533c7fc upstream.
Arjan points out that the Intel document only clears the 0xc2 microcode
on *some* parts with CPUID 506E3 (INTEL_FAM6_SKYLAKE_DESKTOP stepping 3).
For the Skylake H/S platform it's OK but for Skylake E3 which has the
same CPUID it isn't (yet) cleared.
So removing it from the blacklist was premature. Put it back for now.
Also, Arjan assures me that the 0x84 microcode for Kaby Lake which was
featured in one of the early revisions of the Intel document was never
released to the public, and won't be until/unless it is also validated
as safe. So those can change to 0x80 which is what all *other* versions
of the doc have identified.
Once the retrospective testing of existing public microcodes is done, we
should be back into a mode where new microcodes are only released in
batches and we shouldn't even need to update the blacklist for those
anyway, so this tweaking of the list isn't expected to be a thing which
keeps happening.
Requested-by: Arjan van de Ven <arjan.van.de.ven(a)intel.com>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Arjan van de Ven <arjan(a)linux.intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: arjan.van.de.ven(a)intel.com
Cc: dave.hansen(a)intel.com
Cc: kvm(a)vger.kernel.org
Cc: pbonzini(a)redhat.com
Link: http://lkml.kernel.org/r/1518449255-2182-1-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/kernel/cpu/intel.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 71492d2..b69d258 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -40,13 +40,14 @@ struct sku_microcode {
u32 microcode;
};
static const struct sku_microcode spectre_bad_microcodes[] = {
- { INTEL_FAM6_KABYLAKE_DESKTOP, 0x0B, 0x84 },
- { INTEL_FAM6_KABYLAKE_DESKTOP, 0x0A, 0x84 },
- { INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x84 },
- { INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x84 },
- { INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x84 },
+ { INTEL_FAM6_KABYLAKE_DESKTOP, 0x0B, 0x80 },
+ { INTEL_FAM6_KABYLAKE_DESKTOP, 0x0A, 0x80 },
+ { INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x80 },
+ { INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x80 },
+ { INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x80 },
{ INTEL_FAM6_SKYLAKE_X, 0x03, 0x0100013e },
{ INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003c },
+ { INTEL_FAM6_SKYLAKE_DESKTOP, 0x03, 0xc2 },
{ INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 },
{ INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x1b },
{ INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 },
From: David Woodhouse <dwmw(a)amazon.co.uk>
commit 1751342095f0d2b36fa8114d8e12c5688c455ac4 upstream.
Intel have retroactively blessed the 0xc2 microcode on Skylake mobile
and desktop parts, and the Gemini Lake 0x22 microcode is apparently fine
too. We blacklisted the latter purely because it was present with all
the other problematic ones in the 2018-01-08 release, but now it's
explicitly listed as OK.
We still list 0x84 for the various Kaby Lake / Coffee Lake parts, as
that appeared in one version of the blacklist and then reverted to
0x80 again. We can change it if 0x84 is actually announced to be safe.
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Arjan van de Ven <arjan(a)linux.intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: arjan.van.de.ven(a)intel.com
Cc: jmattson(a)google.com
Cc: karahmed(a)amazon.de
Cc: kvm(a)vger.kernel.org
Cc: pbonzini(a)redhat.com
Cc: rkrcmar(a)redhat.com
Cc: sironi(a)amazon.de
Link: http://lkml.kernel.org/r/1518305967-31356-2-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/kernel/cpu/intel.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 0f13189..71492d2 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -47,8 +47,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
{ INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x84 },
{ INTEL_FAM6_SKYLAKE_X, 0x03, 0x0100013e },
{ INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003c },
- { INTEL_FAM6_SKYLAKE_MOBILE, 0x03, 0xc2 },
- { INTEL_FAM6_SKYLAKE_DESKTOP, 0x03, 0xc2 },
{ INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 },
{ INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x1b },
{ INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 },
@@ -60,8 +58,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
{ INTEL_FAM6_HASWELL_X, 0x02, 0x3b },
{ INTEL_FAM6_HASWELL_X, 0x04, 0x10 },
{ INTEL_FAM6_IVYBRIDGE_X, 0x04, 0x42a },
- /* Updated in the 20180108 release; blacklist until we know otherwise */
- { INTEL_FAM6_ATOM_GEMINI_LAKE, 0x01, 0x22 },
/* Observed in the wild */
{ INTEL_FAM6_SANDYBRIDGE_X, 0x06, 0x61b },
{ INTEL_FAM6_SANDYBRIDGE_X, 0x07, 0x712 },
From: David Woodhouse <dwmw(a)amazon.co.uk>
(cherry picked from commit 7fcae1118f5fd44a862aa5c3525248e35ee67c3b)
Despite the fact that all the other code there seems to be doing it, just
using set_cpu_cap() in early_intel_init() doesn't actually work.
For CPUs with PKU support, setup_pku() calls get_cpu_cap() after
c->c_init() has set those feature bits. That resets those bits back to what
was queried from the hardware.
Turning the bits off for bad microcode is easy to fix. That can just use
setup_clear_cpu_cap() to force them off for all CPUs.
I was less keen on forcing the feature bits *on* that way, just in case
of inconsistencies. I appreciate that the kernel is going to get this
utterly wrong if CPU features are not consistent, because it has already
applied alternatives by the time secondary CPUs are brought up.
But at least if setup_force_cpu_cap() isn't being used, we might have a
chance of *detecting* the lack of the corresponding bit and either
panicking or refusing to bring the offending CPU online.
So ensure that the appropriate feature bits are set within get_cpu_cap()
regardless of how many extra times it's called.
Fixes: 2961298e ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags")
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: karahmed(a)amazon.de
Cc: peterz(a)infradead.org
Cc: bp(a)alien8.de
Link: https://lkml.kernel.org/r/1517322623-15261-1-git-send-email-dwmw@amazon.co.…
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/kernel/cpu/common.c | 21 +++++++++++++++++++++
arch/x86/kernel/cpu/intel.c | 27 ++++++++-------------------
2 files changed, 29 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d6c097c..72d7e5a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -676,6 +676,26 @@ static void apply_forced_caps(struct cpuinfo_x86 *c)
}
}
+static void init_speculation_control(struct cpuinfo_x86 *c)
+{
+ /*
+ * The Intel SPEC_CTRL CPUID bit implies IBRS and IBPB support,
+ * and they also have a different bit for STIBP support. Also,
+ * a hypervisor might have set the individual AMD bits even on
+ * Intel CPUs, for finer-grained selection of what's available.
+ *
+ * We use the AMD bits in 0x8000_0008 EBX as the generic hardware
+ * features, which are visible in /proc/cpuinfo and used by the
+ * kernel. So set those accordingly from the Intel bits.
+ */
+ if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {
+ set_cpu_cap(c, X86_FEATURE_IBRS);
+ set_cpu_cap(c, X86_FEATURE_IBPB);
+ }
+ if (cpu_has(c, X86_FEATURE_INTEL_STIBP))
+ set_cpu_cap(c, X86_FEATURE_STIBP);
+}
+
void get_cpu_cap(struct cpuinfo_x86 *c)
{
u32 eax, ebx, ecx, edx;
@@ -768,6 +788,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
c->x86_capability[CPUID_8000_000A_EDX] = cpuid_edx(0x8000000a);
init_scattered_cpuid_features(c);
+ init_speculation_control(c);
}
static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index fee94ee..0f13189 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -105,28 +105,17 @@ static void early_init_intel(struct cpuinfo_x86 *c)
rdmsr(MSR_IA32_UCODE_REV, lower_word, c->microcode);
}
- /*
- * The Intel SPEC_CTRL CPUID bit implies IBRS and IBPB support,
- * and they also have a different bit for STIBP support. Also,
- * a hypervisor might have set the individual AMD bits even on
- * Intel CPUs, for finer-grained selection of what's available.
- */
- if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {
- set_cpu_cap(c, X86_FEATURE_IBRS);
- set_cpu_cap(c, X86_FEATURE_IBPB);
- }
- if (cpu_has(c, X86_FEATURE_INTEL_STIBP))
- set_cpu_cap(c, X86_FEATURE_STIBP);
-
/* Now if any of them are set, check the blacklist and clear the lot */
- if ((cpu_has(c, X86_FEATURE_IBRS) || cpu_has(c, X86_FEATURE_IBPB) ||
+ if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) ||
+ cpu_has(c, X86_FEATURE_INTEL_STIBP) ||
+ cpu_has(c, X86_FEATURE_IBRS) || cpu_has(c, X86_FEATURE_IBPB) ||
cpu_has(c, X86_FEATURE_STIBP)) && bad_spectre_microcode(c)) {
pr_warn("Intel Spectre v2 broken microcode detected; disabling Speculation Control\n");
- clear_cpu_cap(c, X86_FEATURE_IBRS);
- clear_cpu_cap(c, X86_FEATURE_IBPB);
- clear_cpu_cap(c, X86_FEATURE_STIBP);
- clear_cpu_cap(c, X86_FEATURE_SPEC_CTRL);
- clear_cpu_cap(c, X86_FEATURE_INTEL_STIBP);
+ setup_clear_cpu_cap(X86_FEATURE_IBRS);
+ setup_clear_cpu_cap(X86_FEATURE_IBPB);
+ setup_clear_cpu_cap(X86_FEATURE_STIBP);
+ setup_clear_cpu_cap(X86_FEATURE_SPEC_CTRL);
+ setup_clear_cpu_cap(X86_FEATURE_INTEL_STIBP);
}
/*
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit 0d47638f80a02b15869f1fe1fc09e5bf996750fd upstream
Kirill Shutemov pointed this out to me.
The tip tree currently has commit:
dfb4a70f2 [x86/cpufeature, x86/mm/pkeys: Add protection keys related CPUID definitions]
whioch added support for two new CPUID bits: X86_FEATURE_PKU and
X86_FEATURE_OSPKE. But, those bits were mis-merged and put in
cpufeature.h instead of cpufeatures.h.
This didn't cause any breakage *except* it keeps the "ospke" and
"pku" bits from showing up in cpuinfo.
Now cpuinfo has the two new flags:
flags : ... pku ospke
BTW, is it really wise to have cpufeature.h and cpufeatures.h?
It seems like they can only cause confusion and mahem with tab
completion.
Reported-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Borislav Petkov <bp(a)suse.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Dave Hansen <dave(a)sr71.net>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20160310221213.06F9DB53@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/include/asm/cpufeature.h | 4 ----
arch/x86/include/asm/cpufeatures.h | 4 ++++
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 7fdd717..b953fb7 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -94,10 +94,6 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
(__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \
x86_this_cpu_test_bit(bit, (unsigned long *)&cpu_info.x86_capability))
-/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */
-#define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */
-#define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */
-
/*
* This macro is for detection of features which need kernel
* infrastructure to be used. It may *not* directly test the CPU
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6ebb4c2d..f19b901 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -276,6 +276,10 @@
#define X86_FEATURE_PAUSEFILTER (15*32+10) /* filtered pause intercept */
#define X86_FEATURE_PFTHRESHOLD (15*32+12) /* pause filter threshold */
+/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */
+#define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */
+#define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */
+
/*
* BUG word(s)
*/
From: Borislav Petkov <bp(a)suse.de>
commit 337e4cc84021212a87b04b77b65cccc49304909e upstream
Add .altinstr_aux for additional instructions which will be used
before and/or during patching. All stuff which needs more
sophisticated patching should go there. See next patch.
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/kernel/vmlinux.lds.S | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index e065065..a703842 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -202,6 +202,17 @@ SECTIONS
:init
#endif
+ /*
+ * Section for code used exclusively before alternatives are run. All
+ * references to such code must be patched out by alternatives, normally
+ * by using X86_FEATURE_ALWAYS CPU feature bit.
+ *
+ * See static_cpu_has() for an example.
+ */
+ .altinstr_aux : AT(ADDR(.altinstr_aux) - LOAD_OFFSET) {
+ *(.altinstr_aux)
+ }
+
INIT_DATA_SECTION(16)
.x86_cpu_dev.init : AT(ADDR(.x86_cpu_dev.init) - LOAD_OFFSET) {
From: Andi Kleen <ak(a)linux.intel.com>
commit 153a4334c439cfb62e1d31cee0c790ba4157813d upstream
asm/atomic.h doesn't really need asm/processor.h anymore. Everything
it uses has moved to other header files. So remove that include.
processor.h is a nasty header that includes lots of
other headers and makes it prone to include loops. Removing the
include here makes asm/atomic.h a "leaf" header that can
be safely included in most other headers.
The only fallout is in the lib/atomic tester which relied on
this implicit include. Give it an explicit include.
(the include is in ifdef because the user is also in ifdef)
Signed-off-by: Andi Kleen <ak(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mike Galbraith <efault(a)gmx.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Stephane Eranian <eranian(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vince Weaver <vincent.weaver(a)maine.edu>
Cc: rostedt(a)goodmis.org
Link: http://lkml.kernel.org/r/1449018060-1742-1-git-send-email-andi@firstfloor.o…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/include/asm/atomic.h | 1 -
arch/x86/include/asm/atomic64_32.h | 1 -
lib/atomic64_test.c | 4 ++++
3 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index ae5fb83..3e86742 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -3,7 +3,6 @@
#include <linux/compiler.h>
#include <linux/types.h>
-#include <asm/processor.h>
#include <asm/alternative.h>
#include <asm/cmpxchg.h>
#include <asm/rmwcc.h>
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index a11c30b..a984111 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -3,7 +3,6 @@
#include <linux/compiler.h>
#include <linux/types.h>
-#include <asm/processor.h>
//#include <asm/cmpxchg.h>
/* An 64bit atomic type */
diff --git a/lib/atomic64_test.c b/lib/atomic64_test.c
index 83c33a5b..d51e25a 100644
--- a/lib/atomic64_test.c
+++ b/lib/atomic64_test.c
@@ -16,6 +16,10 @@
#include <linux/kernel.h>
#include <linux/atomic.h>
+#ifdef CONFIG_X86
+#include <asm/processor.h> /* for boot_cpu_has below */
+#endif
+
#define TEST(bit, op, c_op, val) \
do { \
atomic##bit##_set(&v, v0); \
From: Borislav Petkov <bp(a)suse.de>
commit 6e1315fe82308cd29e7550eab967262e8bbc71a3 upstream
This brings .text savings of about ~1.6K when building a tinyconfig. It
is off by default so nothing changes for the default.
Kconfig help text from Josh.
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Josh Triplett <josh(a)joshtriplett.org>
Link: http://lkml.kernel.org/r/1449481182-27541-5-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Srivatsa S. Bhat <srivatsa(a)csail.mit.edu>
Reviewed-by: Matt Helsley (VMware) <matt.helsley(a)gmail.com>
Reviewed-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: Bo Gan <ganb(a)vmware.com>
---
arch/x86/Kconfig | 11 +++++++++++
arch/x86/include/asm/cpufeature.h | 2 +-
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eab1ef2..d9afe6d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -346,6 +346,17 @@ config X86_FEATURE_NAMES
If in doubt, say Y.
+config X86_FAST_FEATURE_TESTS
+ bool "Fast CPU feature tests" if EMBEDDED
+ default y
+ ---help---
+ Some fast-paths in the kernel depend on the capabilities of the CPU.
+ Say Y here for the kernel to patch in the appropriate code at runtime
+ based on the capabilities of the CPU. The infrastructure for patching
+ code at runtime takes up some additional space; space-constrained
+ embedded systems may wish to say N here to produce smaller, slightly
+ slower code.
+
config X86_X2APIC
bool "Support x2apic"
depends on X86_LOCAL_APIC && X86_64 && (IRQ_REMAP || HYPERVISOR_GUEST)
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 54c7a0b..67f917b 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -422,7 +422,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
* fast paths and boot_cpu_has() otherwise!
*/
-#if __GNUC__ >= 4
+#if __GNUC__ >= 4 && defined(CONFIG_X86_FAST_FEATURE_TESTS)
extern void warn_pre_alternatives(void);
extern bool __static_cpu_has_safe(u16 bit);
Hi Greg,
This patch series is a backport of the Spectre-v2 fixes (IBPB/IBRS)
and patches for the Speculative Store Bypass vulnerability to 4.4.y
(they apply cleanly on top of 4.4.140).
I used 4.9.y as my reference when backporting to 4.4.y (as I thought
that would minimize the amount of fixing up necessary). Unfortunately
I had to skip the KVM fixes for these vulnerabilities, as the KVM
codebase is drastically different in 4.4 as compared to 4.9. (I tried
my best to backport them initially, but wasn't confident that they
were correct, so I decided to drop them from this series).
You'll notice that the initial few patches in this series include
cleanups etc., that are non-critical to IBPB/IBRS/SSBD. Most of these
patches are aimed at getting the cpufeature.h vs cpufeatures.h split
into 4.4, since a lot of the subsequent patches update these headers.
On my first attempt to backport these patches to 4.4.y, I had actually
tried to do all the updates on the cpufeature.h file itself, but it
started getting very cumbersome, so I resorted to backporting the
cpufeature.h vs cpufeatures.h split and their dependencies as well. I
think apart from these initial patches, the rest of the patchset
doesn't have all that much noise.
This patchset has been tested on both Intel and AMD machines (Intel
Xeon CPU E5-2660 v4 and AMD EPYC 7281 16-Core Processor, respectively)
with updated microcode. All the patch backports have been
independently reviewed by Matt Helsley, Alexey Makhalov and Bo Gan.
I would appreciate if you could kindly consider these patches for
review and inclusion in a future 4.4.y release.
Thank you very much!
Regards,
Srivatsa
VMware Photon OS
P.S. This patchset is also available in the following repo if anyone
is interested in giving it a try:
https://github.com/srivatsabhat/linux-stable spectre-v2-fixes-nokvm-4.4.140
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm: do not bug_on on incorrect length in __mm_populate()
syzbot has noticed that a specially crafted library can easily hit
VM_BUG_ON in __mm_populate
localhost login: [ 81.210241] emacs (9634) used greatest stack depth: 10416 bytes left
[ 140.099935] ------------[ cut here ]------------
[ 140.101904] kernel BUG at mm/gup.c:1242!
[ 140.103572] invalid opcode: 0000 [#1] SMP
[ 140.105220] CPU: 2 PID: 9667 Comm: a.out Not tainted 4.18.0-rc3 #644
[ 140.107762] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 140.112000] RIP: 0010:__mm_populate+0x1e2/0x1f0
[ 140.113875] Code: 55 d0 65 48 33 14 25 28 00 00 00 89 d8 75 21 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 75 18 f1 ff 0f 0b e8 6e 18 f1 ff <0f> 0b 31 db eb c9 e8 93 06 e0 ff 0f 1f 00 55 48 89 e5 53 48 89 fb
[ 140.121403] RSP: 0018:ffffc90000dffd78 EFLAGS: 00010293
[ 140.123516] RAX: ffff8801366c63c0 RBX: 000000007bf81000 RCX: ffffffff813e4ee2
[ 140.126352] RDX: 0000000000000000 RSI: 0000000000007676 RDI: 000000007bf81000
[ 140.129236] RBP: ffffc90000dffdc0 R08: 0000000000000000 R09: 0000000000000000
[ 140.132110] R10: ffff880135895c80 R11: 0000000000000000 R12: 0000000000007676
[ 140.134955] R13: 0000000000008000 R14: 0000000000000000 R15: 0000000000007676
[ 140.137785] FS: 0000000000000000(0000) GS:ffff88013a680000(0063) knlGS:00000000f7db9700
[ 140.140998] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 140.143303] CR2: 00000000f7ea56e0 CR3: 0000000134674004 CR4: 00000000000606e0
[ 140.145906] Call Trace:
[ 140.146728] vm_brk_flags+0xc3/0x100
[ 140.147830] vm_brk+0x1f/0x30
[ 140.148714] load_elf_library+0x281/0x2e0
[ 140.149875] __ia32_sys_uselib+0x170/0x1e0
[ 140.151028] ? copy_overflow+0x30/0x30
[ 140.152105] ? __ia32_sys_uselib+0x170/0x1e0
[ 140.153301] do_fast_syscall_32+0xca/0x420
[ 140.154455] entry_SYSENTER_compat+0x70/0x7f
The reason is that the length of the new brk is not page aligned when we
try to populate the it. There is no reason to bug on that though.
do_brk_flags already aligns the length properly so the mapping is expanded
as it should. All we need is to tell mm_populate about it. Besides that
there is absolutely no reason to to bug_on in the first place. The worst
thing that could happen is that the last page wouldn't get populated and
that is far from putting system into an inconsistent state.
Fix the issue by moving the length sanitization code from do_brk_flags up
to vm_brk_flags. The only other caller of do_brk_flags is brk syscall
entry and it makes sure to provide the proper length so t here is no need
for sanitation and so we can use do_brk_flags without it.
Also remove the bogus BUG_ONs.
[osalvador(a)techadventures.net: fix up vm_brk_flags s@request@len@]
Link: http://lkml.kernel.org/r/20180706090217.GI32658@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Reported-by: syzbot <syzbot+5dcb560fe12aa5091c06(a)syzkaller.appspotmail.com>
Tested-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Zi Yan <zi.yan(a)cs.rutgers.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michael S. Tsirkin <mst(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 2 --
mm/mmap.c | 29 ++++++++++++-----------------
2 files changed, 12 insertions(+), 19 deletions(-)
diff -puN mm/gup.c~mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate mm/gup.c
--- a/mm/gup.c~mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate
+++ a/mm/gup.c
@@ -1238,8 +1238,6 @@ int __mm_populate(unsigned long start, u
int locked = 0;
long ret = 0;
- VM_BUG_ON(start & ~PAGE_MASK);
- VM_BUG_ON(len != PAGE_ALIGN(len));
end = start + len;
for (nstart = start; nstart < end; nstart = nend) {
diff -puN mm/mmap.c~mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate mm/mmap.c
--- a/mm/mmap.c~mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate
+++ a/mm/mmap.c
@@ -186,8 +186,8 @@ static struct vm_area_struct *remove_vma
return next;
}
-static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf);
-
+static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long flags,
+ struct list_head *uf);
SYSCALL_DEFINE1(brk, unsigned long, brk)
{
unsigned long retval;
@@ -245,7 +245,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
goto out;
/* Ok, looks good - let it rip. */
- if (do_brk(oldbrk, newbrk-oldbrk, &uf) < 0)
+ if (do_brk_flags(oldbrk, newbrk-oldbrk, 0, &uf) < 0)
goto out;
set_brk:
@@ -2929,21 +2929,14 @@ static inline void verify_mm_writelocked
* anonymous maps. eventually we may be able to do some
* brk-specific accounting here.
*/
-static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long flags, struct list_head *uf)
+static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long flags, struct list_head *uf)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev;
- unsigned long len;
struct rb_node **rb_link, *rb_parent;
pgoff_t pgoff = addr >> PAGE_SHIFT;
int error;
- len = PAGE_ALIGN(request);
- if (len < request)
- return -ENOMEM;
- if (!len)
- return 0;
-
/* Until we need other flags, refuse anything except VM_EXEC. */
if ((flags & (~VM_EXEC)) != 0)
return -EINVAL;
@@ -3015,18 +3008,20 @@ out:
return 0;
}
-static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf)
-{
- return do_brk_flags(addr, len, 0, uf);
-}
-
-int vm_brk_flags(unsigned long addr, unsigned long len, unsigned long flags)
+int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
{
struct mm_struct *mm = current->mm;
+ unsigned long len;
int ret;
bool populate;
LIST_HEAD(uf);
+ len = PAGE_ALIGN(request);
+ if (len < request)
+ return -ENOMEM;
+ if (!len)
+ return 0;
+
if (down_write_killable(&mm->mmap_sem))
return -EINTR;
_
From: Oscar Salvador <osalvador(a)suse.de>
Subject: fs, elf: make sure to page align bss in load_elf_library
The current code does not make sure to page align bss before calling
vm_brk(), and this can lead to a VM_BUG_ON() in __mm_populate()
due to the requested lenght not being correctly aligned.
Let us make sure to align it properly.
Kees: only applicable to CONFIG_USELIB kernels: 32-bit and configured for
libc5.
Link: http://lkml.kernel.org/r/20180705145539.9627-1-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador(a)suse.de>
Reported-by: syzbot+5dcb560fe12aa5091c06(a)syzkaller.appspotmail.com
Tested-by: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Acked-by: Kees Cook <keescook(a)chromium.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Nicolas Pitre <nicolas.pitre(a)linaro.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/binfmt_elf.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff -puN fs/binfmt_elf.c~fs-elf-make-sure-to-page-align-bss-in-load_elf_library fs/binfmt_elf.c
--- a/fs/binfmt_elf.c~fs-elf-make-sure-to-page-align-bss-in-load_elf_library
+++ a/fs/binfmt_elf.c
@@ -1259,9 +1259,8 @@ static int load_elf_library(struct file
goto out_free_ph;
}
- len = ELF_PAGESTART(eppnt->p_filesz + eppnt->p_vaddr +
- ELF_MIN_ALIGN - 1);
- bss = eppnt->p_memsz + eppnt->p_vaddr;
+ len = ELF_PAGEALIGN(eppnt->p_filesz + eppnt->p_vaddr);
+ bss = ELF_PAGEALIGN(eppnt->p_memsz + eppnt->p_vaddr);
if (bss > len) {
error = vm_brk(len, bss - len);
if (error)
_
From: Philipp Rudo <prudo(a)linux.ibm.com>
Subject: x86/purgatory: add missing FORCE to Makefile target
- Build the kernel without the fix
- Add some flag to the purgatories KBUILD_CFLAGS,I used
-fno-asynchronous-unwind-tables
- Re-build the kernel
When you look at makes output you see that sha256.o is not re-build in the
last step. Also readelf -S still shows the .eh_frame section for
sha256.o.
With the fix sha256.o is rebuilt in the last step.
Without FORCE make does not detect changes only made to the command line
options. So object files might not be re-built even when they should be.
Fix this by adding FORCE where it is missing.
Link: http://lkml.kernel.org/r/20180704110044.29279-2-prudo@linux.ibm.com
Fixes: df6f2801f511 ("kernel/kexec_file.c: move purgatories sha256 to common code")
Signed-off-by: Philipp Rudo <prudo(a)linux.ibm.com>
Acked-by: Dave Young <dyoung(a)redhat.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org> [4.17+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/purgatory/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN arch/x86/purgatory/Makefile~x86-purgatory-add-missing-force-to-makefile-target arch/x86/purgatory/Makefile
--- a/arch/x86/purgatory/Makefile~x86-purgatory-add-missing-force-to-makefile-target
+++ a/arch/x86/purgatory/Makefile
@@ -6,7 +6,7 @@ purgatory-y := purgatory.o stack.o setup
targets += $(purgatory-y)
PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
-$(obj)/sha256.o: $(srctree)/lib/sha256.c
+$(obj)/sha256.o: $(srctree)/lib/sha256.c FORCE
$(call if_changed_rule,cc_o_c)
LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined -nostdlib -z nodefaultlib
_
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
Thomas reports:
: While looking around in /proc on my v4.14.52 system I noticed that
: all processes got a lot of "Locked" memory in /proc/*/smaps. A lot
: more memory than a regular user can usually lock with mlock().
:
: commit 493b0e9d945fa9dfe96be93ae41b4ca4b6fdb317 (v4.14-rc1) seems
: to have changed the behavior of "Locked".
:
: Before that commit the code was like this. Notice the VM_LOCKED
: check.
:
: (vma->vm_flags & VM_LOCKED) ?
: (unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0);
:
: After that commit Locked is now the same as Pss. This looks like a
: mistake.
:
: (unsigned long)(mss->pss >> (10 + PSS_SHIFT)));
Indeed, the commit has added mss->pss_locked with the correct value that
depends on VM_LOCKED, but forgot to actually use it. Fix it.
Link: http://lkml.kernel.org/r/ebf6c7fb-fec3-6a26-544f-710ed193c154@suse.cz
Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Thomas Lindroth <thomas.lindroth(a)gmail.com>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Daniel Colascione <dancol(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff -puN fs/proc/task_mmu.c~mm-fix-locked-field-in-proc-pid-smaps fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~mm-fix-locked-field-in-proc-pid-smaps
+++ a/fs/proc/task_mmu.c
@@ -831,7 +831,8 @@ static int show_smap(struct seq_file *m,
SEQ_PUT_DEC(" kB\nSwap: ", mss->swap);
SEQ_PUT_DEC(" kB\nSwapPss: ",
mss->swap_pss >> PSS_SHIFT);
- SEQ_PUT_DEC(" kB\nLocked: ", mss->pss >> PSS_SHIFT);
+ SEQ_PUT_DEC(" kB\nLocked: ",
+ mss->pss_locked >> PSS_SHIFT);
seq_puts(m, " kB\n");
}
if (!rollup_mode) {
_
From: Christian Borntraeger <borntraeger(a)de.ibm.com>
Subject: mm: do not drop unused pages when userfaultd is running
KVM guests on s390 can notify the host of unused pages. This can result
in pte_unused callbacks to be true for KVM guest memory.
If a page is unused (checked with pte_unused) we might drop this page
instead of paging it. This can have side-effects on userfaultd, when the
page in question was already migrated:
The next access of that page will trigger a fault and a user fault instead
of faulting in a new and empty zero page. As QEMU does not expect a
userfault on an already migrated page this migration will fail.
The most straightforward solution is to ignore the pte_unused hint if a
userfault context is active for this VMA.
Link: http://lkml.kernel.org/r/20180703171854.63981-1-borntraeger@de.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Cornelia Huck <cohuck(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff -puN mm/rmap.c~mm-do-not-drop-unused-pages-when-userfaultd-is-running mm/rmap.c
--- a/mm/rmap.c~mm-do-not-drop-unused-pages-when-userfaultd-is-running
+++ a/mm/rmap.c
@@ -64,6 +64,7 @@
#include <linux/backing-dev.h>
#include <linux/page_idle.h>
#include <linux/memremap.h>
+#include <linux/userfaultfd_k.h>
#include <asm/tlbflush.h>
@@ -1481,11 +1482,16 @@ static bool try_to_unmap_one(struct page
set_pte_at(mm, address, pvmw.pte, pteval);
}
- } else if (pte_unused(pteval)) {
+ } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
/*
* The guest indicated that the page content is of no
* interest anymore. Simply discard the pte, vmscan
* will take care of the rest.
+ * A future reference will then fault in a new zero
+ * page. When userfaultfd is active, we must not drop
+ * this page though, as its main user (postcopy
+ * migration) will not expect userfaults on already
+ * copied pages.
*/
dec_mm_counter(mm, mm_counter(page));
/* We have to invalidate as we cleared the pte */
_
The patch titled
Subject: mm/huge_memory.c: fix data loss when splitting a file pmd
has been added to the -mm tree. Its filename is
thp-fix-data-loss-when-splitting-a-file-pmd.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/thp-fix-data-loss-when-splitting-a…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/thp-fix-data-loss-when-splitting-a…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/huge_memory.c: fix data loss when splitting a file pmd
__split_huge_pmd_locked() must check if the cleared huge pmd was dirty,
and propagate that to PageDirty: otherwise, data may be lost when a huge
tmpfs page is modified then split then reclaimed.
How has this taken so long to be noticed? Because there was no problem
when the huge page is written by a write system call (shmem_write_end()
calls set_page_dirty()), nor when the page is allocated for a write fault
(fault_dirty_shared_page() calls set_page_dirty()); but when allocated for
a read fault (which MAP_POPULATE simulates), no set_page_dirty().
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1807111741430.1106@eggly.anvils
Fixes: d21b9e57c74c ("thp: handle file pages in split_huge_pmd()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reported-by: Ashwin Chaugule <ashwinch(a)google.com>
Reviewed-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 2 ++
1 file changed, 2 insertions(+)
diff -puN mm/huge_memory.c~thp-fix-data-loss-when-splitting-a-file-pmd mm/huge_memory.c
--- a/mm/huge_memory.c~thp-fix-data-loss-when-splitting-a-file-pmd
+++ a/mm/huge_memory.c
@@ -2084,6 +2084,8 @@ static void __split_huge_pmd_locked(stru
if (vma_is_dax(vma))
return;
page = pmd_page(_pmd);
+ if (!PageDirty(page) && pmd_dirty(_pmd))
+ set_page_dirty(page);
if (!PageReferenced(page) && pmd_young(_pmd))
SetPageReferenced(page);
page_remove_rmap(page, true);
_
Patches currently in -mm which might be from hughd(a)google.com are
thp-fix-data-loss-when-splitting-a-file-pmd.patch
The patch titled
Subject: fat: fix memory allocation failure handling of match_strdup()
has been added to the -mm tree. Its filename is
fat-fix-memory-allocation-failure-handling-of-match_strdup.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/fat-fix-memory-allocation-failure-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/fat-fix-memory-allocation-failure-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Subject: fat: fix memory allocation failure handling of match_strdup()
In parse_options(), if match_strdup() failed, parse_options() leaves
opts->iocharset in unexpected state (i.e. still pointing the freed
string). And this can be the cause of double free.
To fix, this initialize opts->iocharset always when freeing.
Link: http://lkml.kernel.org/r/8736wp9dzc.fsf@mail.parknet.co.jp
Signed-off-by: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Reported-by: syzbot+90b8e10515ae88228a92(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/fat/inode.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff -puN fs/fat/inode.c~fat-fix-memory-allocation-failure-handling-of-match_strdup fs/fat/inode.c
--- a/fs/fat/inode.c~fat-fix-memory-allocation-failure-handling-of-match_strdup
+++ a/fs/fat/inode.c
@@ -707,13 +707,21 @@ static void fat_set_state(struct super_b
brelse(bh);
}
+static void fat_reset_iocharset(struct fat_mount_options *opts)
+{
+ if (opts->iocharset != fat_default_iocharset) {
+ /* Note: opts->iocharset can be NULL here */
+ kfree(opts->iocharset);
+ opts->iocharset = fat_default_iocharset;
+ }
+}
+
static void delayed_free(struct rcu_head *p)
{
struct msdos_sb_info *sbi = container_of(p, struct msdos_sb_info, rcu);
unload_nls(sbi->nls_disk);
unload_nls(sbi->nls_io);
- if (sbi->options.iocharset != fat_default_iocharset)
- kfree(sbi->options.iocharset);
+ fat_reset_iocharset(&sbi->options);
kfree(sbi);
}
@@ -1132,7 +1140,7 @@ static int parse_options(struct super_bl
opts->fs_fmask = opts->fs_dmask = current_umask();
opts->allow_utime = -1;
opts->codepage = fat_default_codepage;
- opts->iocharset = fat_default_iocharset;
+ fat_reset_iocharset(opts);
if (is_vfat) {
opts->shortname = VFAT_SFN_DISPLAY_WINNT|VFAT_SFN_CREATE_WIN95;
opts->rodir = 0;
@@ -1289,8 +1297,7 @@ static int parse_options(struct super_bl
/* vfat specific */
case Opt_charset:
- if (opts->iocharset != fat_default_iocharset)
- kfree(opts->iocharset);
+ fat_reset_iocharset(opts);
iocharset = match_strdup(&args[0]);
if (!iocharset)
return -ENOMEM;
@@ -1881,8 +1888,7 @@ out_fail:
iput(fat_inode);
unload_nls(sbi->nls_io);
unload_nls(sbi->nls_disk);
- if (sbi->options.iocharset != fat_default_iocharset)
- kfree(sbi->options.iocharset);
+ fat_reset_iocharset(&sbi->options);
sb->s_fs_info = NULL;
kfree(sbi);
return error;
_
Patches currently in -mm which might be from hirofumi(a)mail.parknet.co.jp are
fat-fix-memory-allocation-failure-handling-of-match_strdup.patch
This fixes some nasty issues I found in nouveau that were being caused
looping through connectors using racy legacy methods, along with some
caused by making incorrect assumptions about the drm_connector structs
in nouveau's connector list. Most of these memory corruption issues
could be reproduced by using an MST hub with nouveau.
Cc: Karol Herbst <karolherbst(a)gmail.com>
Cc: stable(a)vger.kernel.org
Lyude Paul (2):
drm/nouveau: Use drm_connector_list_iter_* for iterating ues connectors
drm/nouveau: Avoid looping through fake MST connectors
drivers/gpu/drm/nouveau/nouveau_backlight.c | 6 ++--
drivers/gpu/drm/nouveau/nouveau_connector.c | 9 ++++--
drivers/gpu/drm/nouveau/nouveau_connector.h | 36 ++++++++++++++++++---
drivers/gpu/drm/nouveau/nouveau_display.c | 10 ++++--
4 files changed, 51 insertions(+), 10 deletions(-)
--
2.17.1
We are a team, we can process 300+ images per day for you.
If you need any image editing, please let us know.
Photos cut out;
Photos clipping path;
Photos masking;
Photo shadow creation;
Photos retouching;
Beauty Model retouching on skin, face, body;
Glamour retouching;
Products retouching.
We can give you editing test on your photos.
Turnaround time fast
7/24/365 available
Thanks,
Simon
We are a team, we can process 300+ images per day for you.
If you need any image editing, please let us know.
Photos cut out;
Photos clipping path;
Photos masking;
Photo shadow creation;
Photos retouching;
Beauty Model retouching on skin, face, body;
Glamour retouching;
Products retouching.
We can give you editing test on your photos.
Turnaround time fast
7/24/365 available
Thanks,
Simon
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 92748beac07c471d995fbec642b63572dc01b3dc Mon Sep 17 00:00:00 2001
From: Stefan Agner <stefan(a)agner.ch>
Date: Wed, 4 Jul 2018 17:07:45 +0200
Subject: [PATCH] mmc: sdhci-esdhc-imx: allow 1.8V modes without 100/200MHz
pinctrl states
If pinctrl nodes for 100/200MHz are missing, the controller should
not select any mode which need signal frequencies 100MHz or higher.
To prevent such speed modes the driver currently uses the quirk flag
SDHCI_QUIRK2_NO_1_8_V. This works nicely for SD cards since 1.8V
signaling is required for all faster modes and slower modes use 3.3V
signaling only.
However, there are eMMC modes which use 1.8V signaling and run below
100MHz, e.g. DDR52 at 1.8V. With using SDHCI_QUIRK2_NO_1_8_V this
mode is prevented. When using a fixed 1.8V regulator as vqmmc-supply
the stack has no valid mode to use. In this tenuous situation the
kernel continuously prints voltage switching errors:
mmc1: Switching to 3.3V signalling voltage failed
Avoid using SDHCI_QUIRK2_NO_1_8_V and prevent faster modes by
altering the SDHCI capability register. With that the stack is able
to select 1.8V modes even if no faster pinctrl states are available:
# cat /sys/kernel/debug/mmc1/ios
...
timing spec: 8 (mmc DDR52)
signal voltage: 1 (1.80 V)
...
Link: http://lkml.kernel.org/r/20180628081331.13051-1-stefan@agner.ch
Signed-off-by: Stefan Agner <stefan(a)agner.ch>
Fixes: ad93220de7da ("mmc: sdhci-esdhc-imx: change pinctrl state according
to uhs mode")
Cc: <stable(a)vger.kernel.org> # v4.13+
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c
index d6aef70d34fa..4eb3d29ecde1 100644
--- a/drivers/mmc/host/sdhci-esdhc-imx.c
+++ b/drivers/mmc/host/sdhci-esdhc-imx.c
@@ -312,6 +312,15 @@ static u32 esdhc_readl_le(struct sdhci_host *host, int reg)
if (imx_data->socdata->flags & ESDHC_FLAG_HS400)
val |= SDHCI_SUPPORT_HS400;
+
+ /*
+ * Do not advertise faster UHS modes if there are no
+ * pinctrl states for 100MHz/200MHz.
+ */
+ if (IS_ERR_OR_NULL(imx_data->pins_100mhz) ||
+ IS_ERR_OR_NULL(imx_data->pins_200mhz))
+ val &= ~(SDHCI_SUPPORT_SDR50 | SDHCI_SUPPORT_DDR50
+ | SDHCI_SUPPORT_SDR104 | SDHCI_SUPPORT_HS400);
}
}
@@ -1158,18 +1167,6 @@ sdhci_esdhc_imx_probe_dt(struct platform_device *pdev,
ESDHC_PINCTRL_STATE_100MHZ);
imx_data->pins_200mhz = pinctrl_lookup_state(imx_data->pinctrl,
ESDHC_PINCTRL_STATE_200MHZ);
- if (IS_ERR(imx_data->pins_100mhz) ||
- IS_ERR(imx_data->pins_200mhz)) {
- dev_warn(mmc_dev(host->mmc),
- "could not get ultra high speed state, work on normal mode\n");
- /*
- * fall back to not supporting uhs by specifying no
- * 1.8v quirk
- */
- host->quirks2 |= SDHCI_QUIRK2_NO_1_8_V;
- }
- } else {
- host->quirks2 |= SDHCI_QUIRK2_NO_1_8_V;
}
/* call to generic mmc_of_parse to support additional capabilities */
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 92748beac07c471d995fbec642b63572dc01b3dc Mon Sep 17 00:00:00 2001
From: Stefan Agner <stefan(a)agner.ch>
Date: Wed, 4 Jul 2018 17:07:45 +0200
Subject: [PATCH] mmc: sdhci-esdhc-imx: allow 1.8V modes without 100/200MHz
pinctrl states
If pinctrl nodes for 100/200MHz are missing, the controller should
not select any mode which need signal frequencies 100MHz or higher.
To prevent such speed modes the driver currently uses the quirk flag
SDHCI_QUIRK2_NO_1_8_V. This works nicely for SD cards since 1.8V
signaling is required for all faster modes and slower modes use 3.3V
signaling only.
However, there are eMMC modes which use 1.8V signaling and run below
100MHz, e.g. DDR52 at 1.8V. With using SDHCI_QUIRK2_NO_1_8_V this
mode is prevented. When using a fixed 1.8V regulator as vqmmc-supply
the stack has no valid mode to use. In this tenuous situation the
kernel continuously prints voltage switching errors:
mmc1: Switching to 3.3V signalling voltage failed
Avoid using SDHCI_QUIRK2_NO_1_8_V and prevent faster modes by
altering the SDHCI capability register. With that the stack is able
to select 1.8V modes even if no faster pinctrl states are available:
# cat /sys/kernel/debug/mmc1/ios
...
timing spec: 8 (mmc DDR52)
signal voltage: 1 (1.80 V)
...
Link: http://lkml.kernel.org/r/20180628081331.13051-1-stefan@agner.ch
Signed-off-by: Stefan Agner <stefan(a)agner.ch>
Fixes: ad93220de7da ("mmc: sdhci-esdhc-imx: change pinctrl state according
to uhs mode")
Cc: <stable(a)vger.kernel.org> # v4.13+
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c
index d6aef70d34fa..4eb3d29ecde1 100644
--- a/drivers/mmc/host/sdhci-esdhc-imx.c
+++ b/drivers/mmc/host/sdhci-esdhc-imx.c
@@ -312,6 +312,15 @@ static u32 esdhc_readl_le(struct sdhci_host *host, int reg)
if (imx_data->socdata->flags & ESDHC_FLAG_HS400)
val |= SDHCI_SUPPORT_HS400;
+
+ /*
+ * Do not advertise faster UHS modes if there are no
+ * pinctrl states for 100MHz/200MHz.
+ */
+ if (IS_ERR_OR_NULL(imx_data->pins_100mhz) ||
+ IS_ERR_OR_NULL(imx_data->pins_200mhz))
+ val &= ~(SDHCI_SUPPORT_SDR50 | SDHCI_SUPPORT_DDR50
+ | SDHCI_SUPPORT_SDR104 | SDHCI_SUPPORT_HS400);
}
}
@@ -1158,18 +1167,6 @@ sdhci_esdhc_imx_probe_dt(struct platform_device *pdev,
ESDHC_PINCTRL_STATE_100MHZ);
imx_data->pins_200mhz = pinctrl_lookup_state(imx_data->pinctrl,
ESDHC_PINCTRL_STATE_200MHZ);
- if (IS_ERR(imx_data->pins_100mhz) ||
- IS_ERR(imx_data->pins_200mhz)) {
- dev_warn(mmc_dev(host->mmc),
- "could not get ultra high speed state, work on normal mode\n");
- /*
- * fall back to not supporting uhs by specifying no
- * 1.8v quirk
- */
- host->quirks2 |= SDHCI_QUIRK2_NO_1_8_V;
- }
- } else {
- host->quirks2 |= SDHCI_QUIRK2_NO_1_8_V;
}
/* call to generic mmc_of_parse to support additional capabilities */
Hi Greg,
Seems you have missed 5845e6155d8f4a4a9bae2d4c1d1bb4a4d9a925c2 in the
stable trees. No backport required, it will apply cleanly.
--
Regards
Sudip
Commit 4aae4388165a2611fa4206363ccb243c1622446c ("nvme: fix hang in remove
path"), which was introduced in Linux 4.9.94, changed nvme_kill_queues()
to also forcibly start admin queues in order to avoid getting stuck during
device removal.
If a device is being removed because it did not respond during device
initialization (e.g., if it is not ready yet at boot time), we will end up
trying to start an admin queue that has not yet been set up at all. This
attempt will lead to a NULL pointer dereference.
To avoid hitting this bug, we add a sanity check around the invocation of
blk_mq_start_hw_queues() to ensure that the admin queue has actually been
set up already.
Upstream already has this check in place since commit
7dd1ab163c17e11473a65b11f7e748db30618ebb ("nvme: validate admin queue
before unquiesce"), and thus 4.14 contains it as well. Linux 4.4 is not
affected by this particular issue since it does not have the force-start
behavior yet.
Fixes: 4aae4388165a2611fa42 ("nvme: fix hang in remove path")
Signed-off-by: Simon Veith <sveith(a)amazon.de>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
---
drivers/nvme/host/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c823e93..8a30478 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2041,8 +2041,10 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
mutex_lock(&ctrl->namespaces_mutex);
- /* Forcibly start all queues to avoid having stuck requests */
- blk_mq_start_hw_queues(ctrl->admin_q);
+ if (ctrl->admin_q) {
+ /* Forcibly start all queues to avoid having stuck requests */
+ blk_mq_start_hw_queues(ctrl->admin_q);
+ }
list_for_each_entry(ns, &ctrl->namespaces, list) {
/*
--
2.7.4
Fixes: https://bugs.linaro.org/show_bug.cgi?id=3903
LTP Functional tests have caused a bad paging request when triggering
the regmap_read_debugfs() logic of the device PMIC Hi6553 (reading
regmap/f8000000.pmic/registers file during read_all test):
Unable to handle kernel paging request at virtual address ffff0
[ffff00000984e000] pgd=0000000077ffe803, pud=0000000077ffd803,0
Internal error: Oops: 96000007 [#1] SMP
...
Hardware name: HiKey Development Board (DT)
...
Call trace:
regmap_mmio_read8+0x24/0x40
regmap_mmio_read+0x48/0x70
_regmap_bus_reg_read+0x38/0x48
_regmap_read+0x68/0x170
regmap_read+0x50/0x78
regmap_read_debugfs+0x1a0/0x308
regmap_map_read_file+0x48/0x58
full_proxy_read+0x68/0x98
__vfs_read+0x48/0x80
vfs_read+0x94/0x150
SyS_read+0x6c/0xd8
el0_svc_naked+0x30/0x34
Code: aa1e03e0 d503201f f9400280 8b334000 (39400000)
Investigations have showed that, when triggered by debugfs read()
handler, the mmio regmap logic was reading a bigger (16k) register area
than the one mapped by devm_ioremap_resource() during hi655x-pmic probe
time (4k).
This commit changes hi655x's max register, according to HW specs, to be
the same as the one declared in the pmic device in hi6220's dts, fixing
the issue.
Signed-off-by: Rafael David Tinoco <rafael.tinoco(a)linaro.org>
Cc: <stable(a)vger.kernel.org> #v4.9 #v4.14 #v4.16 #v4.17
---
drivers/mfd/hi655x-pmic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mfd/hi655x-pmic.c b/drivers/mfd/hi655x-pmic.c
index c37ccbfd52f2..96c07fa1802a 100644
--- a/drivers/mfd/hi655x-pmic.c
+++ b/drivers/mfd/hi655x-pmic.c
@@ -49,7 +49,7 @@ static struct regmap_config hi655x_regmap_config = {
.reg_bits = 32,
.reg_stride = HI655X_STRIDE,
.val_bits = 8,
- .max_register = HI655X_BUS_ADDR(0xFFF),
+ .max_register = HI655X_BUS_ADDR(0x400) - HI655X_STRIDE,
};
static struct resource pwrkey_resources[] = {
--
2.18.0
Rather than using the index variable stored in vram. If
the device fails to come back online after a resume cycle,
reads from vram will return all 1s which will cause a
segfault. Based on a patch from Thomas Martitz <kugel(a)rockbox.org>.
This avoids the segfault, but we still need to sort out
why the GPU does not come back online after a resume.
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=105760
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c | 23 +++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c
index d644a9bb9078..9f407c48d4f0 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c
@@ -381,6 +381,7 @@ int smu7_request_smu_load_fw(struct pp_hwmgr *hwmgr)
uint32_t fw_to_load;
int result = 0;
struct SMU_DRAMData_TOC *toc;
+ uint32_t num_entries = 0;
if (!hwmgr->reload_fw) {
pr_info("skip reloading...\n");
@@ -422,41 +423,41 @@ int smu7_request_smu_load_fw(struct pp_hwmgr *hwmgr)
}
toc = (struct SMU_DRAMData_TOC *)smu_data->header;
- toc->num_entries = 0;
toc->structure_version = 1;
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_RLC_G, &toc->entry[toc->num_entries++]),
+ UCODE_ID_RLC_G, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_CE, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_CE, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_PFP, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_PFP, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_ME, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_ME, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_MEC, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_MEC, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_MEC_JT1, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_MEC_JT1, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_CP_MEC_JT2, &toc->entry[toc->num_entries++]),
+ UCODE_ID_CP_MEC_JT2, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_SDMA0, &toc->entry[toc->num_entries++]),
+ UCODE_ID_SDMA0, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_SDMA1, &toc->entry[toc->num_entries++]),
+ UCODE_ID_SDMA1, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
if (!hwmgr->not_vf)
PP_ASSERT_WITH_CODE(0 == smu7_populate_single_firmware_entry(hwmgr,
- UCODE_ID_MEC_STORAGE, &toc->entry[toc->num_entries++]),
+ UCODE_ID_MEC_STORAGE, &toc->entry[num_entries++]),
"Failed to Get Firmware Entry.", return -EINVAL);
+ toc->num_entries = num_entries;
smu7_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_DRV_DRAM_ADDR_HI, upper_32_bits(smu_data->header_buffer.mc_addr));
smu7_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_DRV_DRAM_ADDR_LO, lower_32_bits(smu_data->header_buffer.mc_addr));
--
2.13.6
mprotect(EXEC) was failing for stack mappings as default vm flags was
missing MAYEXEC.
This was triggered by glibc test suite nptl/tst-execstack testcase
What is surprising is that despite running LTP for years on, we didn't
catch this issue as it lacks a directed test case.
gcc dejagnu tests with nested functions also requiring exec stack work
fine though because they rely on the GNU_STACK segment spit out by
compiler and handled in kernel elf loader.
This glibc case is different as the stack is non exec to begin with and
a dlopen of shared lib with GNU_STACK segment triggers the exec stack
proceedings using a mprotect(PROT_EXEC) which was broken.
CC: stable(a)vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta(a)synopsys.com>
---
arch/arc/include/asm/page.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arc/include/asm/page.h b/arch/arc/include/asm/page.h
index 109baa06831c..09ddddf71cc5 100644
--- a/arch/arc/include/asm/page.h
+++ b/arch/arc/include/asm/page.h
@@ -105,7 +105,7 @@ typedef pte_t * pgtable_t;
#define virt_addr_valid(kaddr) pfn_valid(virt_to_pfn(kaddr))
/* Default Permissions for stack/heaps pages (Non Executable) */
-#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE)
+#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
#define WANT_PAGE_VIRTUAL 1
--
2.7.4
Setting pv_irq_ops for Xen PV domains should be done as early as
possible in order to support e.g. very early printk() usage.
The same applies to xen_vcpu_info_reset(0), as it is needed for the
pv irq ops.
Move the call of xen_setup_machphys_mapping() after initializing the
pv functions as it contains a WARN_ON(), too.
Remove the no longer necessary conditional in xen_init_irq_ops()
from PVH V1 times to make clear this is a PV only function.
Cc: <stable(a)vger.kernel.org> # 4.14
Signed-off-by: Juergen Gross <jgross(a)suse.com>
---
arch/x86/xen/enlighten_pv.c | 24 +++++++++++-------------
arch/x86/xen/irq.c | 4 +---
2 files changed, 12 insertions(+), 16 deletions(-)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 4816b6f82a9a..439a94bf89ad 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1207,12 +1207,20 @@ asmlinkage __visible void __init xen_start_kernel(void)
xen_setup_features();
- xen_setup_machphys_mapping();
-
/* Install Xen paravirt ops */
pv_info = xen_info;
pv_init_ops.patch = paravirt_patch_default;
pv_cpu_ops = xen_cpu_ops;
+ xen_init_irq_ops();
+
+ /*
+ * Setup xen_vcpu early because it is needed for
+ * local_irq_disable(), irqs_disabled(), e.g. in printk().
+ *
+ * Don't do the full vcpu_info placement stuff until we have
+ * the cpu_possible_mask and a non-dummy shared_info.
+ */
+ xen_vcpu_info_reset(0);
x86_platform.get_nmi_reason = xen_get_nmi_reason;
@@ -1225,6 +1233,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
* Set up some pagetable state before starting to set any ptes.
*/
+ xen_setup_machphys_mapping();
xen_init_mmu_ops();
/* Prevent unwanted bits from being set in PTEs. */
@@ -1250,20 +1259,9 @@ asmlinkage __visible void __init xen_start_kernel(void)
get_cpu_cap(&boot_cpu_data);
x86_configure_nx();
- xen_init_irq_ops();
-
/* Let's presume PV guests always boot on vCPU with id 0. */
per_cpu(xen_vcpu_id, 0) = 0;
- /*
- * Setup xen_vcpu early because idt_setup_early_handler needs it for
- * local_irq_disable(), irqs_disabled().
- *
- * Don't do the full vcpu_info placement stuff until we have
- * the cpu_possible_mask and a non-dummy shared_info.
- */
- xen_vcpu_info_reset(0);
-
idt_setup_early_handler();
xen_init_capabilities();
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 74179852e46c..7515a19fd324 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -128,8 +128,6 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
void __init xen_init_irq_ops(void)
{
- /* For PVH we use default pv_irq_ops settings. */
- if (!xen_feature(XENFEAT_hvm_callback_vector))
- pv_irq_ops = xen_irq_ops;
+ pv_irq_ops = xen_irq_ops;
x86_init.irqs.intr_init = xen_init_IRQ;
}
--
2.13.7
This both uses the legacy modesetting structures in a racy manner, and
additionally also doesn't even check the right variable (enabled != the
CRTC is actually turned on for atomic).
This fixes issues on my P50 regarding the dedicated GPU not entering
runtime suspend.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/nouveau/nouveau_drm.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 0f668e275ee1..c7ec86d6c3c9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -881,22 +881,11 @@ nouveau_pmops_runtime_resume(struct device *dev)
static int
nouveau_pmops_runtime_idle(struct device *dev)
{
- struct pci_dev *pdev = to_pci_dev(dev);
- struct drm_device *drm_dev = pci_get_drvdata(pdev);
- struct nouveau_drm *drm = nouveau_drm(drm_dev);
- struct drm_crtc *crtc;
-
if (!nouveau_pmops_runtime()) {
pm_runtime_forbid(dev);
return -EBUSY;
}
- list_for_each_entry(crtc, &drm->dev->mode_config.crtc_list, head) {
- if (crtc->enabled) {
- DRM_DEBUG_DRIVER("failing to power off - crtc active\n");
- return -EBUSY;
- }
- }
pm_runtime_mark_last_busy(dev);
pm_runtime_autosuspend(dev);
/* we don't want the main rpm_idle to call suspend - we want to autosuspend */
--
2.17.1
The MIPS implementation of pci_resource_to_user() introduced in v3.12 by
commit 4c2924b725fb ("MIPS: PCI: Use pci_resource_to_user to map pci
memory space properly") incorrectly sets *end to the address of the
byte after the resource, rather than the last byte of the resource.
This results in userland seeing resources as a byte larger than they
actually are, for example a 32 byte BAR will be reported by a tool such
as lspci as being 33 bytes in size:
Region 2: I/O ports at 1000 [disabled] [size=33]
Correct this by subtracting one from the calculated end address,
reporting the correct address to userland.
Signed-off-by: Paul Burton <paul.burton(a)mips.com>
Reported-by: Rui Wang <rui.wang(a)windriver.com>
Fixes: 4c2924b725fb ("MIPS: PCI: Use pci_resource_to_user to map pci memory space properly")
Cc: James Hogan <jhogan(a)kernel.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Wolfgang Grandegger <wg(a)grandegger.com>
Cc: linux-mips(a)linux-mips.org
Cc: stable(a)vger.kernel.org # v3.12+
---
arch/mips/pci/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 9632436d74d7..c2e94cf5ecda 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -54,5 +54,5 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar,
phys_addr_t size = resource_size(rsrc);
*start = fixup_bigphys_addr(rsrc->start, size);
- *end = rsrc->start + size;
+ *end = rsrc->start + size - 1;
}
--
2.18.0
This is a note to let you know that I've just added the patch titled
mei: don't update offset in write
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From a103af1b64d74853a5e08ca6c86aeb0e5c6ca4f1 Mon Sep 17 00:00:00 2001
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
Date: Mon, 9 Jul 2018 12:21:44 +0300
Subject: mei: don't update offset in write
MEI enables writes of complete messages only
while read can be performed in parts, hence
write should not update the file offset to
not break interleaving partial reads with writes.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/mei/main.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/misc/mei/main.c b/drivers/misc/mei/main.c
index f690918f7817..302ba7a63bd2 100644
--- a/drivers/misc/mei/main.c
+++ b/drivers/misc/mei/main.c
@@ -312,7 +312,6 @@ static ssize_t mei_write(struct file *file, const char __user *ubuf,
}
}
- *offset = 0;
cb = mei_cl_alloc_cb(cl, length, MEI_FOP_WRITE, file);
if (!cb) {
rets = -ENOMEM;
--
2.18.0
This is a note to let you know that I've just added the patch titled
mei: bus: type promotion bug in mei_nfc_if_version()
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From b40b3e9358fbafff6a4ba0f4b9658f6617146f9c Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)oracle.com>
Date: Wed, 11 Jul 2018 15:29:31 +0300
Subject: mei: bus: type promotion bug in mei_nfc_if_version()
We accidentally removed the check for negative returns
without considering the issue of type promotion.
The "if_version_length" variable is type size_t so if __mei_cl_recv()
returns a negative then "bytes_recv" is type promoted
to a high positive value and treated as success.
Cc: <stable(a)vger.kernel.org>
Fixes: 582ab27a063a ("mei: bus: fix received data size check in NFC fixup")
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/mei/bus-fixup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/mei/bus-fixup.c b/drivers/misc/mei/bus-fixup.c
index e45fe826d87d..65e28be3c8cc 100644
--- a/drivers/misc/mei/bus-fixup.c
+++ b/drivers/misc/mei/bus-fixup.c
@@ -341,7 +341,7 @@ static int mei_nfc_if_version(struct mei_cl *cl,
ret = 0;
bytes_recv = __mei_cl_recv(cl, (u8 *)reply, if_version_length, 0, 0);
- if (bytes_recv < if_version_length) {
+ if (bytes_recv < 0 || bytes_recv < if_version_length) {
dev_err(bus->dev, "Could not read IF version\n");
ret = -EIO;
goto err;
--
2.18.0
Hi Greg,
I think you have missed 6ed66c3ce095ae65bbc976b5817c318653745736 for
the v4.14-stable tree. No backport needed, it will apply cleanly.
--
Regards
Sudip
This is a note to let you know that I've just added the patch titled
mei: bus: type promotion bug in mei_nfc_if_version()
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From b40b3e9358fbafff6a4ba0f4b9658f6617146f9c Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)oracle.com>
Date: Wed, 11 Jul 2018 15:29:31 +0300
Subject: mei: bus: type promotion bug in mei_nfc_if_version()
We accidentally removed the check for negative returns
without considering the issue of type promotion.
The "if_version_length" variable is type size_t so if __mei_cl_recv()
returns a negative then "bytes_recv" is type promoted
to a high positive value and treated as success.
Cc: <stable(a)vger.kernel.org>
Fixes: 582ab27a063a ("mei: bus: fix received data size check in NFC fixup")
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/mei/bus-fixup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/mei/bus-fixup.c b/drivers/misc/mei/bus-fixup.c
index e45fe826d87d..65e28be3c8cc 100644
--- a/drivers/misc/mei/bus-fixup.c
+++ b/drivers/misc/mei/bus-fixup.c
@@ -341,7 +341,7 @@ static int mei_nfc_if_version(struct mei_cl *cl,
ret = 0;
bytes_recv = __mei_cl_recv(cl, (u8 *)reply, if_version_length, 0, 0);
- if (bytes_recv < if_version_length) {
+ if (bytes_recv < 0 || bytes_recv < if_version_length) {
dev_err(bus->dev, "Could not read IF version\n");
ret = -EIO;
goto err;
--
2.18.0
This is a note to let you know that I've just added the patch titled
mei: don't update offset in write
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From a103af1b64d74853a5e08ca6c86aeb0e5c6ca4f1 Mon Sep 17 00:00:00 2001
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
Date: Mon, 9 Jul 2018 12:21:44 +0300
Subject: mei: don't update offset in write
MEI enables writes of complete messages only
while read can be performed in parts, hence
write should not update the file offset to
not break interleaving partial reads with writes.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/mei/main.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/misc/mei/main.c b/drivers/misc/mei/main.c
index f690918f7817..302ba7a63bd2 100644
--- a/drivers/misc/mei/main.c
+++ b/drivers/misc/mei/main.c
@@ -312,7 +312,6 @@ static ssize_t mei_write(struct file *file, const char __user *ubuf,
}
}
- *offset = 0;
cb = mei_cl_alloc_cb(cl, length, MEI_FOP_WRITE, file);
if (!cb) {
rets = -ENOMEM;
--
2.18.0
On Thu, Jul 12, 2018 at 10:33:24AM +0200, Daniel Borkmann wrote:
> Hi Greg,
>
> if you have a chance, please queue the following BPF patches for -stable:
Always cc: stable@vger for stable stuff, as I'm not the only stable tree
out there...
>
> - 4.17.y:
>
> 3a38bb98d9ab ("bpf/tracing: fix a deadlock in perf_event_detach_bpf_prog")
This is already in 4.17.
> c93552c443eb ("bpf: properly enforce index mask to prevent out-of-bounds speculation")
So is this.
> 58990d1ff3f7 ("bpf: reject passing modified ctx to helper functions")
Now applied.
> b16558579576 ("bpf: implement dummy fops for bpf objects")
Does not apply at all :(
> 7d1982b4e335 ("bpf: fix panic in prog load calls cleanup")
Does not apply :(
> ed2b82c03dc1 ("bpf: hash map: decrement counter on error")
This is not in Linus's tree.
> - 4.14.y:
>
> c93552c443eb ("bpf: properly enforce index mask to prevent out-of-bounds speculation")
Doesn't apply to 4.14.y :(
> b16558579576 ("bpf: implement dummy fops for bpf objects")
Does not apply at all.
> ed2b82c03dc1 ("bpf: hash map: decrement counter on error")
Not in Linus's tree.
Can you send me the needed backports?
thanks,
greg k-h