This is a note to let you know that I've just added the patch titled
dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dm-bufio-fix-shrinker-scans-when-nr_to_scan-retain_target.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From fbc7c07ec23c040179384a1f16b62b6030eb6bdd Mon Sep 17 00:00:00 2001
From: Suren Baghdasaryan <surenb(a)google.com>
Date: Wed, 6 Dec 2017 09:27:30 -0800
Subject: dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
From: Suren Baghdasaryan <surenb(a)google.com>
commit fbc7c07ec23c040179384a1f16b62b6030eb6bdd upstream.
When system is under memory pressure it is observed that dm bufio
shrinker often reclaims only one buffer per scan. This change fixes
the following two issues in dm bufio shrinker that cause this behavior:
1. ((nr_to_scan - freed) <= retain_target) condition is used to
terminate slab scan process. This assumes that nr_to_scan is equal
to the LRU size, which might not be correct because do_shrink_slab()
in vmscan.c calculates nr_to_scan using multiple inputs.
As a result when nr_to_scan is less than retain_target (64) the scan
will terminate after the first iteration, effectively reclaiming one
buffer per scan and making scans very inefficient. This hurts vmscan
performance especially because mutex is acquired/released every time
dm_bufio_shrink_scan() is called.
New implementation uses ((LRU size - freed) <= retain_target)
condition for scan termination. LRU size can be safely determined
inside __scan() because this function is called after dm_bufio_lock().
2. do_shrink_slab() uses value returned by dm_bufio_shrink_count() to
determine number of freeable objects in the slab. However dm_bufio
always retains retain_target buffers in its LRU and will terminate
a scan when this mark is reached. Therefore returning the entire LRU size
from dm_bufio_shrink_count() is misleading because that does not
represent the number of freeable objects that slab will reclaim during
a scan. Returning (LRU size - retain_target) better represents the
number of freeable objects in the slab. This way do_shrink_slab()
returns 0 when (LRU size < retain_target) and vmscan will not try to
scan this shrinker avoiding scans that will not reclaim any memory.
Test: tested using Android device running
<AOSP>/system/extras/alloc-stress that generates memory pressure
and causes intensive shrinker scans
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/dm-bufio.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -1611,7 +1611,8 @@ static unsigned long __scan(struct dm_bu
int l;
struct dm_buffer *b, *tmp;
unsigned long freed = 0;
- unsigned long count = nr_to_scan;
+ unsigned long count = c->n_buffers[LIST_CLEAN] +
+ c->n_buffers[LIST_DIRTY];
unsigned long retain_target = get_retain_buffers(c);
for (l = 0; l < LIST_SIZE; l++) {
@@ -1647,8 +1648,11 @@ static unsigned long
dm_bufio_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
{
struct dm_bufio_client *c = container_of(shrink, struct dm_bufio_client, shrinker);
+ unsigned long count = ACCESS_ONCE(c->n_buffers[LIST_CLEAN]) +
+ ACCESS_ONCE(c->n_buffers[LIST_DIRTY]);
+ unsigned long retain_target = get_retain_buffers(c);
- return ACCESS_ONCE(c->n_buffers[LIST_CLEAN]) + ACCESS_ONCE(c->n_buffers[LIST_DIRTY]);
+ return (count < retain_target) ? 0 : (count - retain_target);
}
/*
Patches currently in stable-queue which might be from surenb(a)google.com are
queue-4.14/dm-bufio-fix-shrinker-scans-when-nr_to_scan-retain_target.patch
Handling CD-ROM devices from libsas is decidedly odd, as libata
relies on SCSI EH to be started to figure out that no medium is
present.
So we cannot do asynchronous aborts for SATA devices.
Fixes: 909657615d9 ("scsi: libsas: allow async aborts")
Cc: <stable(a)vger.kernel.org> # 4.12+
Signed-off-by: Hannes Reinecke <hare(a)suse.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Tested-by: Yves-Alexis Perez <corsac(a)debian.org>
---
drivers/scsi/libsas/sas_scsi_host.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 6267272..6de9681 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -487,15 +487,28 @@ static int sas_queue_reset(struct domain_device *dev, int reset_type,
int sas_eh_abort_handler(struct scsi_cmnd *cmd)
{
- int res;
+ int res = TMF_RESP_FUNC_FAILED;
struct sas_task *task = TO_SAS_TASK(cmd);
struct Scsi_Host *host = cmd->device->host;
+ struct domain_device *dev = cmd_to_domain_dev(cmd);
struct sas_internal *i = to_sas_internal(host->transportt);
+ unsigned long flags;
if (!i->dft->lldd_abort_task)
return FAILED;
- res = i->dft->lldd_abort_task(task);
+ spin_lock_irqsave(host->host_lock, flags);
+ /* We cannot do async aborts for SATA devices */
+ if (dev_is_sata(dev) && !host->host_eh_scheduled) {
+ spin_unlock_irqrestore(host->host_lock, flags);
+ return FAILED;
+ }
+ spin_unlock_irqrestore(host->host_lock, flags);
+
+ if (task)
+ res = i->dft->lldd_abort_task(task);
+ else
+ SAS_DPRINTK("no task to abort\n");
if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE)
return SUCCESS;
--
1.8.5.6
From: Eric Biggers <ebiggers(a)google.com>
pipe-user-pages-hard and pipe-user-pages-soft are only supposed to apply
to unprivileged users, as documented in both Documentation/sysctl/fs.txt
and the pipe(7) man page.
However, the capabilities are actually only checked when increasing a
pipe's size using F_SETPIPE_SZ, not when creating a new pipe.
Therefore, if pipe-user-pages-hard has been set, the root user can run
into it and be unable to create pipes. Similarly, if
pipe-user-pages-soft has been set, the root user can run into it and
have their pipes limited to 1 page each.
Fix this by allowing the privileged override in both cases.
Fixes: 759c01142a5d ("pipe: limit the per-user amount of pages allocated in pipes")
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/pipe.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/fs/pipe.c b/fs/pipe.c
index d0dec5e7ef33..847ecc388820 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -613,6 +613,11 @@ static bool too_many_pipe_buffers_hard(unsigned long user_bufs)
return pipe_user_pages_hard && user_bufs >= pipe_user_pages_hard;
}
+static bool is_unprivileged_user(void)
+{
+ return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN);
+}
+
struct pipe_inode_info *alloc_pipe_info(void)
{
struct pipe_inode_info *pipe;
@@ -629,12 +634,12 @@ struct pipe_inode_info *alloc_pipe_info(void)
user_bufs = account_pipe_buffers(user, 0, pipe_bufs);
- if (too_many_pipe_buffers_soft(user_bufs)) {
+ if (too_many_pipe_buffers_soft(user_bufs) && is_unprivileged_user()) {
user_bufs = account_pipe_buffers(user, pipe_bufs, 1);
pipe_bufs = 1;
}
- if (too_many_pipe_buffers_hard(user_bufs))
+ if (too_many_pipe_buffers_hard(user_bufs) && is_unprivileged_user())
goto out_revert_acct;
pipe->bufs = kcalloc(pipe_bufs, sizeof(struct pipe_buffer),
@@ -1065,7 +1070,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
if (nr_pages > pipe->buffers &&
(too_many_pipe_buffers_hard(user_bufs) ||
too_many_pipe_buffers_soft(user_bufs)) &&
- !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) {
+ is_unprivileged_user()) {
ret = -EPERM;
goto out_revert_acct;
}
--
2.15.1
Hello Bjorn,
Again, reviving this very old thread :-)
On Thu, 5 Oct 2017 12:23:30 -0500, Bjorn Helgaas wrote:
> > - if (PCI_SLOT(devfn) != 0) {
> > + if ((bus->number == pcie->root_bus_nr) && (PCI_SLOT(devfn) != 0)) {
>
> I'm fine with this, but please take a look at these:
>
> 8e7ca8ca5fd8 PCI: xilinx: Relax device number checking to allow SR-IOV
> e18934b5e9c7 PCI: designware: Relax device number checking to allow SR-IOV
> d99e30b7936a PCI: altera: Relax device number checking to allow SR-IOV
>
> and make sure that reasoning doesn't apply here, too.
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8…
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e…
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d…
The original code for xilinx/designware/altera was doing:
if (bus->number == port->root_busno && devfn > 0)
return false;
if (bus->primary == port->root_busno && devfn > 0)
return false;
I.e, it was checking both if bus->number *and* bus->primary were equal
to port->root_busno.
The commit you points removed the check on bus->primary, keeping the
check on bus->number.
Your patch for the Aadvark driver only adds a check on bus->number, i.e
exactly what the xilinx/designware/altera code is still doing today:
Altera:
/* access only one slot on each root port */
if (bus->number == pcie->root_bus_nr && dev > 0)
return false;
Designware:
/* access only one slot on each root port */
if (bus->number == pp->root_bus_nr && dev > 0)
return 0;
Xilinx:
/* Only one device down on each root port */
if (bus->number == port->root_busno && devfn > 0)
return false;
Aardvark (with our patch):
if ((bus->number == pcie->root_bus_nr) && (PCI_SLOT(devfn) != 0)) {
*val = 0xffffffff;
return PCIBIOS_DEVICE_NOT_FOUND;
}
So we're doing exactly the same thing.
Do you agree ?
Best regards,
Thomas Petazzoni
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
The patch titled
Subject: lib/strscpy: remove word-at-a-time optimization.
has been added to the -mm tree. Its filename is
lib-strscpy-remove-word-at-a-time-optimization.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/lib-strscpy-remove-word-at-a-time-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/lib-strscpy-remove-word-at-a-time-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Subject: lib/strscpy: remove word-at-a-time optimization.
strscpy() performs the word-at-a-time optimistic reads. So it may may
access the memory past the end of the object, which is perfectly fine
since strscpy() doesn't use that (past-the-end) data and makes sure the
optimistic read won't cross a page boundary.
But KASAN doesn't know anything about that so it will complain. There are
several possible ways to address this issue, but none are perfect. See
https://lkml.kernel.org/r/9f0a9cf6-51f7-cd1f-5dc6-6d510a7b8ec4@virtuozzo.com
It seems the best solution is to simply disable word-at-a-time
optimization. My trivial testing shows that byte-at-a-time could be up to
x4.3 times slower than word-at-a-time. It may seems like a lot, but it's
actually ~1.2e-10 sec per symbol vs ~4.8e-10 sec per symbol on modern
hardware. And we don't use strscpy() in a performance critical paths to
copy large amounts of data, so it shouldn't matter anyway.
Link: http://lkml.kernel.org/r/20180109163745.3692-1-aryabinin@virtuozzo.com
Fixes: 30035e45753b7 ("string: provide strscpy()")
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Eryu Guan <eguan(a)redhat.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Chris Metcalf <metcalf(a)alum.mit.edu>
Cc: David Laight <David.Laight(a)ACULAB.COM>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/string.c | 38 --------------------------------------
1 file changed, 38 deletions(-)
diff -puN lib/string.c~lib-strscpy-remove-word-at-a-time-optimization lib/string.c
--- a/lib/string.c~lib-strscpy-remove-word-at-a-time-optimization
+++ a/lib/string.c
@@ -29,7 +29,6 @@
#include <linux/errno.h>
#include <asm/byteorder.h>
-#include <asm/word-at-a-time.h>
#include <asm/page.h>
#ifndef __HAVE_ARCH_STRNCASECMP
@@ -177,45 +176,8 @@ EXPORT_SYMBOL(strlcpy);
*/
ssize_t strscpy(char *dest, const char *src, size_t count)
{
- const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
- size_t max = count;
long res = 0;
- if (count == 0)
- return -E2BIG;
-
-#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
- /*
- * If src is unaligned, don't cross a page boundary,
- * since we don't know if the next page is mapped.
- */
- if ((long)src & (sizeof(long) - 1)) {
- size_t limit = PAGE_SIZE - ((long)src & (PAGE_SIZE - 1));
- if (limit < max)
- max = limit;
- }
-#else
- /* If src or dest is unaligned, don't do word-at-a-time. */
- if (((long) dest | (long) src) & (sizeof(long) - 1))
- max = 0;
-#endif
-
- while (max >= sizeof(unsigned long)) {
- unsigned long c, data;
-
- c = *(unsigned long *)(src+res);
- if (has_zero(c, &data, &constants)) {
- data = prep_zero_mask(c, data, &constants);
- data = create_zero_mask(data);
- *(unsigned long *)(dest+res) = c & zero_bytemask(data);
- return res + find_zero(data);
- }
- *(unsigned long *)(dest+res) = c;
- res += sizeof(unsigned long);
- count -= sizeof(unsigned long);
- max -= sizeof(unsigned long);
- }
-
while (count) {
char c;
_
Patches currently in -mm which might be from aryabinin(a)virtuozzo.com are
kasan-makefile-support-llvm-style-asan-parameters.patch
lib-strscpy-remove-word-at-a-time-optimization.patch
> Christoph,
>
> > Ok. If the stable maintainers are ok with your small fix I'm not
> > going to complain too loudly. But I'm always worried about stable
> > trees divering too much from mainline.
>
> The seemingly innocuous transition from SG_GAPS to virt boundary has
> caused several data corruption regressions in the distro kernels. So has the
> corresponding conversion of storvsc.
>
> As a result, getting the current upstream code into 4.1 would mean
> backporting and testing a significant amount of both block layer and driver
> code. I don't think it's worth the risk. This patch is simple and the path of least
> resistance.
>
> Acked-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Sorry to bring up this patch again. It seems it hasn't made it to stable branches.
Please take a look.
>
> --
> Martin K. Petersen Oracle Linux Engineering
This is the start of the stable review cycle for the 4.14.13 release.
There are 38 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Jan 10 12:59:02 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.13-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.13-rc1
Christian Borntraeger <borntraeger(a)de.ibm.com>
KVM: s390: prevent buffer overrun on memory hotplug during migration
Christian Borntraeger <borntraeger(a)de.ibm.com>
KVM: s390: fix cmma migration for multiple memory slots
Boris Brezillon <boris.brezillon(a)free-electrons.com>
mtd: nand: pxa3xx: Fix READOOB implementation
Helge Deller <deller(a)gmx.de>
parisc: qemu idle sleep support
Helge Deller <deller(a)gmx.de>
parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
John Johansen <john.johansen(a)canonical.com>
apparmor: fix regression in mount mediation when feature set is pinned
Tom Lendacky <thomas.lendacky(a)amd.com>
x86/microcode/AMD: Add support for fam17h microcode loading
Aaron Ma <aaron.ma(a)canonical.com>
Input: elantech - add new icbody type 15
John Sperbeck <jsperbeck(a)google.com>
powerpc/mm: Fix SEGV on mapped region to return SEGV_ACCERR
Vineet Gupta <vgupta(a)synopsys.com>
ARC: uaccess: dont use "l" gcc inline asm constraint modifier
Robin Murphy <robin.murphy(a)arm.com>
iommu/arm-smmu-v3: Cope with duplicated Stream IDs
Jean-Philippe Brucker <jean-philippe.brucker(a)arm.com>
iommu/arm-smmu-v3: Don't free page table ops twice
Oleg Nesterov <oleg(a)redhat.com>
kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
Oleg Nesterov <oleg(a)redhat.com>
kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
Oleg Nesterov <oleg(a)redhat.com>
kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu()
David Howells <dhowells(a)redhat.com>
fscache: Fix the default for fscache_maybe_release_page()
Stefan Brüns <stefan.bruens(a)rwth-aachen.de>
sunxi-rsb: Include OF based modalias in device uevent
Lucas De Marchi <lucas.demarchi(a)intel.com>
drm/i915: Apply Display WA #1183 on skl, kbl, and cfl
Ville Syrjälä <ville.syrjala(a)linux.intel.com>
drm/i915: Disable DC states around GMBUS on GLK
Arnd Bergmann <arnd(a)arndb.de>
crypto: chelsio - select CRYPTO_GF128MUL
Eric Biggers <ebiggers(a)google.com>
crypto: pcrypt - fix freeing pcrypt instances
Eric Biggers <ebiggers(a)google.com>
crypto: chacha20poly1305 - validate the digest size
Jan Engelhardt <jengelh(a)inai.de>
crypto: n2 - cure use after free
Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
efi/capsule-loader: Reinstate virtual capsule mapping
Chris Mason <clm(a)fb.com>
btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
Andrea Arcangeli <aarcange(a)redhat.com>
userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails
Baoquan He <bhe(a)redhat.com>
mm/sparse.c: wrong allocation for mem_section
Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
mm/mprotect: add a cond_resched() inside change_pmd_range()
Oleg Nesterov <oleg(a)redhat.com>
kernel/acct.c: fix the acct->needcheck check in check_free_space()
Thomas Gleixner <tglx(a)linutronix.de>
x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
David Woodhouse <dwmw(a)amazon.co.uk>
x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
Thomas Gleixner <tglx(a)linutronix.de>
x86/tlb: Drop the _GPL from the cpu_tlbstate export
Peter Zijlstra <peterz(a)infradead.org>
x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
Thomas Gleixner <tglx(a)linutronix.de>
x86/kaslr: Fix the vaddr_end mess
Thomas Gleixner <tglx(a)linutronix.de>
x86/mm: Map cpu_entry_area at the same place on 4/5 level
Andrey Ryabinin <aryabinin(a)virtuozzo.com>
x86/mm: Set MODULES_END to 0xffffffffff000000
-------------
Diffstat:
Documentation/x86/x86_64/mm.txt | 18 +++++----
Makefile | 4 +-
arch/arc/include/asm/uaccess.h | 5 ++-
arch/parisc/include/asm/ldcw.h | 2 +
arch/parisc/kernel/entry.S | 13 +++++-
arch/parisc/kernel/pacache.S | 9 ++++-
arch/parisc/kernel/process.c | 39 ++++++++++++++++++
arch/powerpc/mm/fault.c | 7 +++-
arch/s390/kvm/kvm-s390.c | 9 +++--
arch/s390/kvm/priv.c | 2 +-
arch/x86/events/intel/ds.c | 16 ++++++++
arch/x86/include/asm/alternative.h | 4 +-
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/pgtable_64_types.h | 14 +++++--
arch/x86/kernel/cpu/Makefile | 2 +-
arch/x86/kernel/cpu/aperfmperf.c | 71 ++++++++++++++++++++++++---------
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/cpu/cpu.h | 3 ++
arch/x86/kernel/cpu/microcode/amd.c | 4 ++
arch/x86/kernel/cpu/proc.c | 6 ++-
arch/x86/mm/dump_pagetables.c | 2 +-
arch/x86/mm/init.c | 2 +-
arch/x86/mm/kaslr.c | 32 +++++----------
arch/x86/mm/pti.c | 6 +--
arch/x86/platform/efi/quirks.c | 13 +++++-
crypto/chacha20poly1305.c | 6 ++-
crypto/pcrypt.c | 19 ++++-----
drivers/bus/sunxi-rsb.c | 1 +
drivers/crypto/chelsio/Kconfig | 1 +
drivers/crypto/n2_core.c | 3 ++
drivers/firmware/efi/capsule-loader.c | 45 +++++++++++++++++----
drivers/gpu/drm/i915/i915_reg.h | 2 +
drivers/gpu/drm/i915/intel_cdclk.c | 35 +++++++++++-----
drivers/gpu/drm/i915/intel_runtime_pm.c | 11 +++++
drivers/input/mouse/elantech.c | 2 +-
drivers/iommu/arm-smmu-v3.c | 17 ++++++--
drivers/mtd/nand/pxa3xx_nand.c | 1 +
fs/btrfs/delayed-inode.c | 45 ++++++++++++++++-----
fs/proc/cpuinfo.c | 6 +++
fs/userfaultfd.c | 20 +++++++++-
include/linux/cpufreq.h | 1 +
include/linux/efi.h | 4 +-
include/linux/fscache.h | 2 +-
kernel/acct.c | 2 +-
kernel/signal.c | 18 +++++----
mm/mprotect.c | 6 ++-
mm/sparse.c | 2 +-
security/apparmor/mount.c | 12 +++++-
48 files changed, 409 insertions(+), 139 deletions(-)
This is a note to let you know that I've just added the patch titled
ANDROID: binder: remove waitqueue when thread exits.
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From f5cb779ba16334b45ba8946d6bfa6d9834d1527f Mon Sep 17 00:00:00 2001
From: Martijn Coenen <maco(a)android.com>
Date: Fri, 5 Jan 2018 11:27:07 +0100
Subject: ANDROID: binder: remove waitqueue when thread exits.
binder_poll() passes the thread->wait waitqueue that
can be slept on for work. When a thread that uses
epoll explicitly exits using BINDER_THREAD_EXIT,
the waitqueue is freed, but it is never removed
from the corresponding epoll data structure. When
the process subsequently exits, the epoll cleanup
code tries to access the waitlist, which results in
a use-after-free.
Prevent this by using POLLFREE when the thread exits.
Signed-off-by: Martijn Coenen <maco(a)android.com>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Cc: stable <stable(a)vger.kernel.org> # 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/android/binder.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 82fcc1e64e82..de4b67f09ddb 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -4365,6 +4365,18 @@ static int binder_thread_release(struct binder_proc *proc,
if (t)
spin_lock(&t->lock);
}
+
+ /*
+ * If this thread used poll, make sure we remove the waitqueue
+ * from any epoll data structures holding it with POLLFREE.
+ * waitqueue_active() is safe to use here because we're holding
+ * the inner lock.
+ */
+ if ((thread->looper & BINDER_LOOPER_STATE_POLL) &&
+ waitqueue_active(&thread->wait)) {
+ wake_up_poll(&thread->wait, POLLHUP | POLLFREE);
+ }
+
binder_inner_proc_unlock(thread->proc);
if (send_reply)
--
2.15.1
This is a note to let you know that I've just added the patch titled
usb: f_fs: Prevent gadget unbind if it is already unbound
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From ce5bf9a50daf2d9078b505aca1cea22e88ecb94a Mon Sep 17 00:00:00 2001
From: Hemant Kumar <hemantk(a)codeaurora.org>
Date: Tue, 9 Jan 2018 12:30:53 +0530
Subject: usb: f_fs: Prevent gadget unbind if it is already unbound
Upon usb composition switch there is possibility of ep0 file
release happening after gadget driver bind. In case of composition
switch from adb to a non-adb composition gadget will never gets
bound again resulting into failure of usb device enumeration. Fix
this issue by checking FFS_FL_BOUND flag and avoid extra
gadget driver unbind if it is already done as part of composition
switch.
This fixes adb reconnection error reported on Android running
v4.4 and above kernel versions. Verified on Hikey running vanilla
v4.15-rc7 + few out of tree Mali patches.
Reviewed-at: https://android-review.googlesource.com/#/c/582632/
Cc: Felipe Balbi <balbi(a)kernel.org>
Cc: Greg KH <gregkh(a)linux-foundation.org>
Cc: Michal Nazarewicz <mina86(a)mina86.com>
Cc: John Stultz <john.stultz(a)linaro.org>
Cc: Dmitry Shmidt <dimitrysh(a)google.com>
Cc: Badhri <badhri(a)google.com>
Cc: Android Kernel Team <kernel-team(a)android.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Hemant Kumar <hemantk(a)codeaurora.org>
[AmitP: Cherry-picked it from android-4.14 and updated the commit log]
Signed-off-by: Amit Pundir <amit.pundir(a)linaro.org>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/function/f_fs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index 038a27a13ebc..686af89323a5 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -3703,7 +3703,8 @@ static void ffs_closed(struct ffs_data *ffs)
ci = opts->func_inst.group.cg_item.ci_parent->ci_parent;
ffs_dev_unlock();
- unregister_gadget_item(ci);
+ if (test_bit(FFS_FL_BOUND, &ffs->flags))
+ unregister_gadget_item(ci);
return;
done:
ffs_dev_unlock();
--
2.15.1
This is a note to let you know that I've just added the patch titled
serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 38b1f0fb42f772b8c9aac53593883a18ff5eb9d7 Mon Sep 17 00:00:00 2001
From: Fabio Estevam <fabio.estevam(a)nxp.com>
Date: Thu, 4 Jan 2018 15:58:34 -0200
Subject: serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS
The wakeup mechanism via RTSDEN bit relies on the system using the RTS/CTS
lines, so only allow such wakeup method when the system actually has
RTS/CTS support.
Fixes: bc85734b126f ("serial: imx: allow waking up on RTSD")
Signed-off-by: Fabio Estevam <fabio.estevam(a)nxp.com>
Reviewed-by: Martin Kaiser <martin(a)kaiser.cx>
Acked-by: Fugang Duan <fugang.duan(a)nxp.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/imx.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index c2b29fd66e8a..7143da39c170 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -2225,12 +2225,14 @@ static void serial_imx_enable_wakeup(struct imx_port *sport, bool on)
val &= ~UCR3_AWAKEN;
writel(val, sport->port.membase + UCR3);
- val = readl(sport->port.membase + UCR1);
- if (on)
- val |= UCR1_RTSDEN;
- else
- val &= ~UCR1_RTSDEN;
- writel(val, sport->port.membase + UCR1);
+ if (sport->have_rtscts) {
+ val = readl(sport->port.membase + UCR1);
+ if (on)
+ val |= UCR1_RTSDEN;
+ else
+ val &= ~UCR1_RTSDEN;
+ writel(val, sport->port.membase + UCR1);
+ }
}
static int imx_serial_port_suspend_noirq(struct device *dev)
--
2.15.1
This is a note to let you know that I've just added the patch titled
serial: 8250_uniphier: fix error return code in uniphier_uart_probe()
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 7defa77d2baca4d6eb85234f10f38ab618332e75 Mon Sep 17 00:00:00 2001
From: Wei Yongjun <weiyongjun1(a)huawei.com>
Date: Thu, 4 Jan 2018 07:42:15 +0000
Subject: serial: 8250_uniphier: fix error return code in uniphier_uart_probe()
Fix to return a negative error code from the port register error
handling case instead of 0, as done elsewhere in this function.
Fixes: 39be40ce066d ("serial: 8250_uniphier: fix serial port index in private data")
Signed-off-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Acked-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/8250_uniphier.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_uniphier.c b/drivers/tty/serial/8250/8250_uniphier.c
index 45ef506293ae..28d88ccf5a0c 100644
--- a/drivers/tty/serial/8250/8250_uniphier.c
+++ b/drivers/tty/serial/8250/8250_uniphier.c
@@ -250,12 +250,13 @@ static int uniphier_uart_probe(struct platform_device *pdev)
up.dl_read = uniphier_serial_dl_read;
up.dl_write = uniphier_serial_dl_write;
- priv->line = serial8250_register_8250_port(&up);
- if (priv->line < 0) {
+ ret = serial8250_register_8250_port(&up);
+ if (ret < 0) {
dev_err(dev, "failed to register 8250 port\n");
clk_disable_unprepare(priv->clk);
return ret;
}
+ priv->line = ret;
platform_set_drvdata(pdev, priv);
--
2.15.1
This is a note to let you know that I've just added the patch titled
ANDROID: binder: remove waitqueue when thread exits.
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From f5cb779ba16334b45ba8946d6bfa6d9834d1527f Mon Sep 17 00:00:00 2001
From: Martijn Coenen <maco(a)android.com>
Date: Fri, 5 Jan 2018 11:27:07 +0100
Subject: ANDROID: binder: remove waitqueue when thread exits.
binder_poll() passes the thread->wait waitqueue that
can be slept on for work. When a thread that uses
epoll explicitly exits using BINDER_THREAD_EXIT,
the waitqueue is freed, but it is never removed
from the corresponding epoll data structure. When
the process subsequently exits, the epoll cleanup
code tries to access the waitlist, which results in
a use-after-free.
Prevent this by using POLLFREE when the thread exits.
Signed-off-by: Martijn Coenen <maco(a)android.com>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Cc: stable <stable(a)vger.kernel.org> # 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/android/binder.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 82fcc1e64e82..de4b67f09ddb 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -4365,6 +4365,18 @@ static int binder_thread_release(struct binder_proc *proc,
if (t)
spin_lock(&t->lock);
}
+
+ /*
+ * If this thread used poll, make sure we remove the waitqueue
+ * from any epoll data structures holding it with POLLFREE.
+ * waitqueue_active() is safe to use here because we're holding
+ * the inner lock.
+ */
+ if ((thread->looper & BINDER_LOOPER_STATE_POLL) &&
+ waitqueue_active(&thread->wait)) {
+ wake_up_poll(&thread->wait, POLLHUP | POLLFREE);
+ }
+
binder_inner_proc_unlock(thread->proc);
if (send_reply)
--
2.15.1
This is the start of the stable review cycle for the 4.4.111 release.
There are 22 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Jan 10 12:59:14 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.111-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.111-rc1
Borislav Petkov <bp(a)suse.de>
Map the vsyscall page with _PAGE_USER
Alexey Dobriyan <adobriyan(a)gmail.com>
proc: much faster /proc/vmstat
Libor Pechacek <lpechacek(a)suse.com>
module: Issue warnings when tainting kernel
Miroslav Benes <mbenes(a)suse.cz>
module: keep percpu symbols in module's symtab
Michal Marek <mmarek(a)suse.com>
genksyms: Handle string literals with spaces in reference files
Thomas Gleixner <tglx(a)linutronix.de>
x86/tlb: Drop the _GPL from the cpu_tlbstate export
Boris Brezillon <boris.brezillon(a)free-electrons.com>
mtd: nand: pxa3xx: Fix READOOB implementation
Helge Deller <deller(a)gmx.de>
parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
Tom Lendacky <thomas.lendacky(a)amd.com>
x86/microcode/AMD: Add support for fam17h microcode loading
Aaron Ma <aaron.ma(a)canonical.com>
Input: elantech - add new icbody type 15
Vineet Gupta <vgupta(a)synopsys.com>
ARC: uaccess: dont use "l" gcc inline asm constraint modifier
Oleg Nesterov <oleg(a)redhat.com>
kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
Oleg Nesterov <oleg(a)redhat.com>
kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
Oleg Nesterov <oleg(a)redhat.com>
kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
Thiago Rafael Becker <thiago.becker(a)gmail.com>
kernel: make groups_sort calling a responsibility group_info allocators
David Howells <dhowells(a)redhat.com>
fscache: Fix the default for fscache_maybe_release_page()
Stefan Brüns <stefan.bruens(a)rwth-aachen.de>
sunxi-rsb: Include OF based modalias in device uevent
Eric Biggers <ebiggers(a)google.com>
crypto: pcrypt - fix freeing pcrypt instances
Eric Biggers <ebiggers(a)google.com>
crypto: chacha20poly1305 - validate the digest size
Jan Engelhardt <jengelh(a)inai.de>
crypto: n2 - cure use after free
Oleg Nesterov <oleg(a)redhat.com>
kernel/acct.c: fix the acct->needcheck check in check_free_space()
Andrey Ryabinin <aryabinin(a)virtuozzo.com>
x86/kasan: Write protect kasan zero shadow
-------------
Diffstat:
Makefile | 4 ++--
arch/arc/include/asm/uaccess.h | 5 +++--
arch/parisc/include/asm/ldcw.h | 2 ++
arch/parisc/kernel/entry.S | 13 +++++++++++--
arch/parisc/kernel/pacache.S | 9 +++++++--
arch/s390/kernel/compat_linux.c | 1 +
arch/x86/entry/vsyscall/vsyscall_64.c | 5 +++++
arch/x86/include/asm/vsyscall.h | 2 ++
arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
arch/x86/mm/init.c | 2 +-
arch/x86/mm/kaiser.c | 34 ++++++++++++++++++++++++++++++----
arch/x86/mm/kasan_init_64.c | 10 ++++++++--
crypto/chacha20poly1305.c | 6 +++++-
crypto/pcrypt.c | 19 ++++++++++---------
drivers/bus/sunxi-rsb.c | 1 +
drivers/crypto/n2_core.c | 3 +++
drivers/input/mouse/elantech.c | 2 +-
drivers/mtd/nand/pxa3xx_nand.c | 1 +
fs/nfsd/auth.c | 3 +++
include/linux/cred.h | 1 +
include/linux/fscache.h | 2 +-
kernel/acct.c | 2 +-
kernel/groups.c | 5 +++--
kernel/module.c | 26 +++++++++++++++++++++-----
kernel/signal.c | 18 ++++++++++--------
kernel/uid16.c | 1 +
mm/vmstat.c | 4 +++-
net/sunrpc/auth_gss/gss_rpc_xdr.c | 1 +
net/sunrpc/auth_gss/svcauth_gss.c | 1 +
net/sunrpc/svcauth_unix.c | 2 ++
scripts/genksyms/genksyms.c | 6 ++++--
31 files changed, 149 insertions(+), 46 deletions(-)
This is a note to let you know that I've just added the patch titled
serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the tty-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 38b1f0fb42f772b8c9aac53593883a18ff5eb9d7 Mon Sep 17 00:00:00 2001
From: Fabio Estevam <fabio.estevam(a)nxp.com>
Date: Thu, 4 Jan 2018 15:58:34 -0200
Subject: serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS
The wakeup mechanism via RTSDEN bit relies on the system using the RTS/CTS
lines, so only allow such wakeup method when the system actually has
RTS/CTS support.
Fixes: bc85734b126f ("serial: imx: allow waking up on RTSD")
Signed-off-by: Fabio Estevam <fabio.estevam(a)nxp.com>
Reviewed-by: Martin Kaiser <martin(a)kaiser.cx>
Acked-by: Fugang Duan <fugang.duan(a)nxp.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/imx.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index c2b29fd66e8a..7143da39c170 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -2225,12 +2225,14 @@ static void serial_imx_enable_wakeup(struct imx_port *sport, bool on)
val &= ~UCR3_AWAKEN;
writel(val, sport->port.membase + UCR3);
- val = readl(sport->port.membase + UCR1);
- if (on)
- val |= UCR1_RTSDEN;
- else
- val &= ~UCR1_RTSDEN;
- writel(val, sport->port.membase + UCR1);
+ if (sport->have_rtscts) {
+ val = readl(sport->port.membase + UCR1);
+ if (on)
+ val |= UCR1_RTSDEN;
+ else
+ val &= ~UCR1_RTSDEN;
+ writel(val, sport->port.membase + UCR1);
+ }
}
static int imx_serial_port_suspend_noirq(struct device *dev)
--
2.15.1
This is a note to let you know that I've just added the patch titled
serial: 8250_uniphier: fix error return code in uniphier_uart_probe()
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the tty-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 7defa77d2baca4d6eb85234f10f38ab618332e75 Mon Sep 17 00:00:00 2001
From: Wei Yongjun <weiyongjun1(a)huawei.com>
Date: Thu, 4 Jan 2018 07:42:15 +0000
Subject: serial: 8250_uniphier: fix error return code in uniphier_uart_probe()
Fix to return a negative error code from the port register error
handling case instead of 0, as done elsewhere in this function.
Fixes: 39be40ce066d ("serial: 8250_uniphier: fix serial port index in private data")
Signed-off-by: Wei Yongjun <weiyongjun1(a)huawei.com>
Acked-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/8250_uniphier.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_uniphier.c b/drivers/tty/serial/8250/8250_uniphier.c
index 45ef506293ae..28d88ccf5a0c 100644
--- a/drivers/tty/serial/8250/8250_uniphier.c
+++ b/drivers/tty/serial/8250/8250_uniphier.c
@@ -250,12 +250,13 @@ static int uniphier_uart_probe(struct platform_device *pdev)
up.dl_read = uniphier_serial_dl_read;
up.dl_write = uniphier_serial_dl_write;
- priv->line = serial8250_register_8250_port(&up);
- if (priv->line < 0) {
+ ret = serial8250_register_8250_port(&up);
+ if (ret < 0) {
dev_err(dev, "failed to register 8250 port\n");
clk_disable_unprepare(priv->clk);
return ret;
}
+ priv->line = ret;
platform_set_drvdata(pdev, priv);
--
2.15.1
This is a note to let you know that I've just added the patch titled
usb: f_fs: Prevent gadget unbind if it is already unbound
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From ce5bf9a50daf2d9078b505aca1cea22e88ecb94a Mon Sep 17 00:00:00 2001
From: Hemant Kumar <hemantk(a)codeaurora.org>
Date: Tue, 9 Jan 2018 12:30:53 +0530
Subject: usb: f_fs: Prevent gadget unbind if it is already unbound
Upon usb composition switch there is possibility of ep0 file
release happening after gadget driver bind. In case of composition
switch from adb to a non-adb composition gadget will never gets
bound again resulting into failure of usb device enumeration. Fix
this issue by checking FFS_FL_BOUND flag and avoid extra
gadget driver unbind if it is already done as part of composition
switch.
This fixes adb reconnection error reported on Android running
v4.4 and above kernel versions. Verified on Hikey running vanilla
v4.15-rc7 + few out of tree Mali patches.
Reviewed-at: https://android-review.googlesource.com/#/c/582632/
Cc: Felipe Balbi <balbi(a)kernel.org>
Cc: Greg KH <gregkh(a)linux-foundation.org>
Cc: Michal Nazarewicz <mina86(a)mina86.com>
Cc: John Stultz <john.stultz(a)linaro.org>
Cc: Dmitry Shmidt <dimitrysh(a)google.com>
Cc: Badhri <badhri(a)google.com>
Cc: Android Kernel Team <kernel-team(a)android.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Hemant Kumar <hemantk(a)codeaurora.org>
[AmitP: Cherry-picked it from android-4.14 and updated the commit log]
Signed-off-by: Amit Pundir <amit.pundir(a)linaro.org>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/function/f_fs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index 038a27a13ebc..686af89323a5 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -3703,7 +3703,8 @@ static void ffs_closed(struct ffs_data *ffs)
ci = opts->func_inst.group.cg_item.ci_parent->ci_parent;
ffs_dev_unlock();
- unregister_gadget_item(ci);
+ if (test_bit(FFS_FL_BOUND, &ffs->flags))
+ unregister_gadget_item(ci);
return;
done:
ffs_dev_unlock();
--
2.15.1
Commit 384b38b66947 ("ARM: 7873/1: vfp: clear vfp_current_hw_state
for dying cpu") fixed the cpu dying notifier by using
vfp_current_hw_state(). However commit e5b61bafe704 ("arm: Convert VFP
hotplug notifiers to state machine") incorrectly used the original
vfp_force_reload() function in the cpu dying notifier.
Fix it by going back to the correct vfp_current_hw_state() usage.
Fixes: e5b61bafe704 ("arm: Convert VFP hotplug notifiers to state machine")
Cc: linux-stable <stable(a)vger.kernel.org>
Reported-by: Kohji Okuno <okuno.kohji(a)jp.panasonic.com>
Signed-off-by: Fabio Estevam <fabio.estevam(a)nxp.com>
---
arch/arm/vfp/vfpmodule.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index a71a48e..aa7496b 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -648,7 +648,7 @@ int vfp_restore_user_hwstate(struct user_vfp __user *ufp,
*/
static int vfp_dying_cpu(unsigned int cpu)
{
- vfp_force_reload(cpu, current_thread_info());
+ vfp_current_hw_state[cpu] = NULL;
return 0;
}
--
2.7.4
This is a note to let you know that I've just added the patch titled
USB: UDC core: fix double-free in usb_add_gadget_udc_release
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 7ae2c3c280db183ca9ada2675c34ec2f7378abfa Mon Sep 17 00:00:00 2001
From: Alan Stern <stern(a)rowland.harvard.edu>
Date: Wed, 3 Jan 2018 12:51:51 -0500
Subject: USB: UDC core: fix double-free in usb_add_gadget_udc_release
The error-handling pathways in usb_add_gadget_udc_release() are messed
up. Aside from the uninformative statement labels, they can deallocate
the udc structure after calling put_device(), which is a double-free.
This was observed by KASAN in automatic testing.
This patch cleans up the routine. It preserves the requirement that
when any failure occurs, we call put_device(&gadget->dev).
Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu>
Reported-by: Fengguang Wu <fengguang.wu(a)intel.com>
CC: <stable(a)vger.kernel.org>
Reviewed-by: Peter Chen <peter.chen(a)nxp.com>
Acked-by: Felipe Balbi <felipe.balbi(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/udc/core.c | 28 +++++++++++++---------------
1 file changed, 13 insertions(+), 15 deletions(-)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index 93eff7dec2f5..1b3efb14aec7 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -1147,11 +1147,7 @@ int usb_add_gadget_udc_release(struct device *parent, struct usb_gadget *gadget,
udc = kzalloc(sizeof(*udc), GFP_KERNEL);
if (!udc)
- goto err1;
-
- ret = device_add(&gadget->dev);
- if (ret)
- goto err2;
+ goto err_put_gadget;
device_initialize(&udc->dev);
udc->dev.release = usb_udc_release;
@@ -1160,7 +1156,11 @@ int usb_add_gadget_udc_release(struct device *parent, struct usb_gadget *gadget,
udc->dev.parent = parent;
ret = dev_set_name(&udc->dev, "%s", kobject_name(&parent->kobj));
if (ret)
- goto err3;
+ goto err_put_udc;
+
+ ret = device_add(&gadget->dev);
+ if (ret)
+ goto err_put_udc;
udc->gadget = gadget;
gadget->udc = udc;
@@ -1170,7 +1170,7 @@ int usb_add_gadget_udc_release(struct device *parent, struct usb_gadget *gadget,
ret = device_add(&udc->dev);
if (ret)
- goto err4;
+ goto err_unlist_udc;
usb_gadget_set_state(gadget, USB_STATE_NOTATTACHED);
udc->vbus = true;
@@ -1178,27 +1178,25 @@ int usb_add_gadget_udc_release(struct device *parent, struct usb_gadget *gadget,
/* pick up one of pending gadget drivers */
ret = check_pending_gadget_drivers(udc);
if (ret)
- goto err5;
+ goto err_del_udc;
mutex_unlock(&udc_lock);
return 0;
-err5:
+ err_del_udc:
device_del(&udc->dev);
-err4:
+ err_unlist_udc:
list_del(&udc->list);
mutex_unlock(&udc_lock);
-err3:
- put_device(&udc->dev);
device_del(&gadget->dev);
-err2:
- kfree(udc);
+ err_put_udc:
+ put_device(&udc->dev);
-err1:
+ err_put_gadget:
put_device(&gadget->dev);
return ret;
}
--
2.15.1
This is a note to let you know that I've just added the patch titled
USB: fix usbmon BUG trigger
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 46eb14a6e1585d99c1b9f58d0e7389082a5f466b Mon Sep 17 00:00:00 2001
From: Pete Zaitcev <zaitcev(a)redhat.com>
Date: Mon, 8 Jan 2018 15:46:41 -0600
Subject: USB: fix usbmon BUG trigger
Automated tests triggered this by opening usbmon and accessing the
mmap while simultaneously resizing the buffers. This bug was with
us since 2006, because typically applications only size the buffers
once and thus avoid racing. Reported by Kirill A. Shutemov.
Reported-by: <syzbot+f9831b881b3e849829fc(a)syzkaller.appspotmail.com>
Signed-off-by: Pete Zaitcev <zaitcev(a)redhat.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/mon/mon_bin.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/mon/mon_bin.c b/drivers/usb/mon/mon_bin.c
index f6ae753ab99b..f932f40302df 100644
--- a/drivers/usb/mon/mon_bin.c
+++ b/drivers/usb/mon/mon_bin.c
@@ -1004,7 +1004,9 @@ static long mon_bin_ioctl(struct file *file, unsigned int cmd, unsigned long arg
break;
case MON_IOCQ_RING_SIZE:
+ mutex_lock(&rp->fetch_lock);
ret = rp->b_size;
+ mutex_unlock(&rp->fetch_lock);
break;
case MON_IOCT_RING_SIZE:
@@ -1231,12 +1233,16 @@ static int mon_bin_vma_fault(struct vm_fault *vmf)
unsigned long offset, chunk_idx;
struct page *pageptr;
+ mutex_lock(&rp->fetch_lock);
offset = vmf->pgoff << PAGE_SHIFT;
- if (offset >= rp->b_size)
+ if (offset >= rp->b_size) {
+ mutex_unlock(&rp->fetch_lock);
return VM_FAULT_SIGBUS;
+ }
chunk_idx = offset / CHUNK_SIZE;
pageptr = rp->b_vec[chunk_idx].pg;
get_page(pageptr);
+ mutex_unlock(&rp->fetch_lock);
vmf->page = pageptr;
return 0;
}
--
2.15.1
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
A: No.
Q: Should I include quotations after my reply?
http://daringfireball.net/2007/07/on_top
On Tue, Jan 09, 2018 at 06:28:29AM -0800, Pradeep Sawlani wrote:
> 4.4.110 stable tree.
What about 4.9 and 4.14? You do not want to cause a regression if
someone moves from 4.4.y to 4.9.y, right?
> bwh was internal thing which got here. Do you want me to modify commit
> message and send it across?
It makes no sense in the context, the person who did the work should
sign-off on the patch, right?
thanks,
greg k-h
This is the start of the stable review cycle for the 4.9.76 release.
There are 21 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Jan 10 12:59:08 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.76-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.76-rc1
Borislav Petkov <bp(a)suse.de>
Map the vsyscall page with _PAGE_USER
Thomas Gleixner <tglx(a)linutronix.de>
x86/tlb: Drop the _GPL from the cpu_tlbstate export
Boris Brezillon <boris.brezillon(a)free-electrons.com>
mtd: nand: pxa3xx: Fix READOOB implementation
Helge Deller <deller(a)gmx.de>
parisc: qemu idle sleep support
Helge Deller <deller(a)gmx.de>
parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
Tom Lendacky <thomas.lendacky(a)amd.com>
x86/microcode/AMD: Add support for fam17h microcode loading
Aaron Ma <aaron.ma(a)canonical.com>
Input: elantech - add new icbody type 15
Vineet Gupta <vgupta(a)synopsys.com>
ARC: uaccess: dont use "l" gcc inline asm constraint modifier
Robin Murphy <robin.murphy(a)arm.com>
iommu/arm-smmu-v3: Cope with duplicated Stream IDs
Jean-Philippe Brucker <jean-philippe.brucker(a)arm.com>
iommu/arm-smmu-v3: Don't free page table ops twice
Oleg Nesterov <oleg(a)redhat.com>
kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
Oleg Nesterov <oleg(a)redhat.com>
kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
Oleg Nesterov <oleg(a)redhat.com>
kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
Thiago Rafael Becker <thiago.becker(a)gmail.com>
kernel: make groups_sort calling a responsibility group_info allocators
Jens Axboe <axboe(a)fb.com>
nbd: fix use-after-free of rq/bio in the xmit path
David Howells <dhowells(a)redhat.com>
fscache: Fix the default for fscache_maybe_release_page()
Stefan Brüns <stefan.bruens(a)rwth-aachen.de>
sunxi-rsb: Include OF based modalias in device uevent
Eric Biggers <ebiggers(a)google.com>
crypto: pcrypt - fix freeing pcrypt instances
Eric Biggers <ebiggers(a)google.com>
crypto: chacha20poly1305 - validate the digest size
Jan Engelhardt <jengelh(a)inai.de>
crypto: n2 - cure use after free
Oleg Nesterov <oleg(a)redhat.com>
kernel/acct.c: fix the acct->needcheck check in check_free_space()
-------------
Diffstat:
Makefile | 4 ++--
arch/arc/include/asm/uaccess.h | 5 +++--
arch/parisc/include/asm/ldcw.h | 2 ++
arch/parisc/kernel/entry.S | 13 ++++++++++--
arch/parisc/kernel/pacache.S | 9 ++++++--
arch/parisc/kernel/process.c | 39 +++++++++++++++++++++++++++++++++++
arch/s390/kernel/compat_linux.c | 1 +
arch/x86/entry/vsyscall/vsyscall_64.c | 5 +++++
arch/x86/include/asm/vsyscall.h | 2 ++
arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
arch/x86/mm/init.c | 2 +-
arch/x86/mm/kaiser.c | 34 ++++++++++++++++++++++++++----
crypto/chacha20poly1305.c | 6 +++++-
crypto/pcrypt.c | 19 +++++++++--------
drivers/block/nbd.c | 32 ++++++++++++++++++++--------
drivers/bus/sunxi-rsb.c | 1 +
drivers/crypto/n2_core.c | 3 +++
drivers/input/mouse/elantech.c | 2 +-
drivers/iommu/arm-smmu-v3.c | 17 +++++++++++----
drivers/mtd/nand/pxa3xx_nand.c | 1 +
fs/nfsd/auth.c | 3 +++
include/linux/cred.h | 1 +
include/linux/fscache.h | 2 +-
kernel/acct.c | 2 +-
kernel/groups.c | 5 +++--
kernel/signal.c | 18 +++++++++-------
kernel/uid16.c | 1 +
net/sunrpc/auth_gss/gss_rpc_xdr.c | 1 +
net/sunrpc/auth_gss/svcauth_gss.c | 1 +
net/sunrpc/svcauth_unix.c | 2 ++
30 files changed, 188 insertions(+), 49 deletions(-)
This is a note to let you know that I've just added the patch titled
mux: core: fix double get_device()
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From aa1f10e85b0ab53dee85d8e293c8159d18d293a8 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede(a)redhat.com>
Date: Fri, 29 Dec 2017 00:22:54 +0100
Subject: mux: core: fix double get_device()
class_find_device already does a get_device on the returned device.
So the device returned by of_find_mux_chip_by_node is already referenced
and we should not reference it again (and unref it on error).
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Peter Rosin <peda(a)axentia.se>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/mux/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/mux/core.c b/drivers/mux/core.c
index 2260063b0ea8..6e5cf9d9cd99 100644
--- a/drivers/mux/core.c
+++ b/drivers/mux/core.c
@@ -413,6 +413,7 @@ static int of_dev_node_match(struct device *dev, const void *data)
return dev->of_node == data;
}
+/* Note this function returns a reference to the mux_chip dev. */
static struct mux_chip *of_find_mux_chip_by_node(struct device_node *np)
{
struct device *dev;
@@ -466,6 +467,7 @@ struct mux_control *mux_control_get(struct device *dev, const char *mux_name)
(!args.args_count && (mux_chip->controllers > 1))) {
dev_err(dev, "%pOF: wrong #mux-control-cells for %pOF\n",
np, args.np);
+ put_device(&mux_chip->dev);
return ERR_PTR(-EINVAL);
}
@@ -476,10 +478,10 @@ struct mux_control *mux_control_get(struct device *dev, const char *mux_name)
if (controller >= mux_chip->controllers) {
dev_err(dev, "%pOF: bad mux controller %u specified in %pOF\n",
np, controller, args.np);
+ put_device(&mux_chip->dev);
return ERR_PTR(-EINVAL);
}
- get_device(&mux_chip->dev);
return &mux_chip->mux[controller];
}
EXPORT_SYMBOL_GPL(mux_control_get);
--
2.15.1
Tree/Branch: v3.16.53
Git describe: v3.16.53
Commit: 329f19bd63 Linux 3.16.53
Build Time: 34 min 29 sec
Passed: 9 / 9 (100.00 %)
Failed: 0 / 9 ( 0.00 %)
Errors: 0
Warnings: 8
Section Mismatches: 0
-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
1 warnings 0 mismatches : arm64-allnoconfig
2 warnings 0 mismatches : arm64-allmodconfig
1 warnings 0 mismatches : arm-multi_v5_defconfig
1 warnings 0 mismatches : arm-multi_v7_defconfig
2 warnings 0 mismatches : x86_64-defconfig
4 warnings 0 mismatches : arm-allmodconfig
1 warnings 0 mismatches : arm-allnoconfig
1 warnings 0 mismatches : x86_64-allnoconfig
3 warnings 0 mismatches : arm64-defconfig
-------------------------------------------------------------------------------
Warnings Summary: 8
7 ../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
2 ../ipc/sem.c:377:6: warning: '___p1' may be used uninitialized in this function [-Wmaybe-uninitialized]
2 ../drivers/net/ethernet/broadcom/genet/bcmgenet.c:1346:17: warning: unused variable 'kdev' [-Wunused-variable]
1 ../drivers/platform/x86/eeepc-laptop.c:279:10: warning: 'value' may be used uninitialized in this function [-Wmaybe-uninitialized]
1 ../drivers/media/platform/davinci/vpfe_capture.c:291:12: warning: 'vpfe_get_ccdc_image_format' defined but not used [-Wunused-function]
1 ../drivers/media/platform/davinci/vpfe_capture.c:1718:1: warning: label 'unlock_out' defined but not used [-Wunused-label]
1 ../arch/x86/mm/kaiser.c:325:11: warning: unused variable 'idx' [-Wunused-variable]
1 ../arch/x86/kernel/cpu/common.c:1022:13: warning: 'syscall32_cpu_init' defined but not used [-Wunused-function]
===============================================================================
Detailed per-defconfig build reports below:
-------------------------------------------------------------------------------
arm64-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
arm64-allmodconfig : PASS, 0 errors, 2 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
../drivers/net/ethernet/broadcom/genet/bcmgenet.c:1346:17: warning: unused variable 'kdev' [-Wunused-variable]
-------------------------------------------------------------------------------
arm-multi_v5_defconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
arm-multi_v7_defconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
x86_64-defconfig : PASS, 0 errors, 2 warnings, 0 section mismatches
Warnings:
../arch/x86/mm/kaiser.c:325:11: warning: unused variable 'idx' [-Wunused-variable]
../drivers/platform/x86/eeepc-laptop.c:279:10: warning: 'value' may be used uninitialized in this function [-Wmaybe-uninitialized]
-------------------------------------------------------------------------------
arm-allmodconfig : PASS, 0 errors, 4 warnings, 0 section mismatches
Warnings:
../drivers/media/platform/davinci/vpfe_capture.c:1718:1: warning: label 'unlock_out' defined but not used [-Wunused-label]
../drivers/media/platform/davinci/vpfe_capture.c:291:12: warning: 'vpfe_get_ccdc_image_format' defined but not used [-Wunused-function]
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
../drivers/net/ethernet/broadcom/genet/bcmgenet.c:1346:17: warning: unused variable 'kdev' [-Wunused-variable]
-------------------------------------------------------------------------------
arm-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
x86_64-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../arch/x86/kernel/cpu/common.c:1022:13: warning: 'syscall32_cpu_init' defined but not used [-Wunused-function]
-------------------------------------------------------------------------------
arm64-defconfig : PASS, 0 errors, 3 warnings, 0 section mismatches
Warnings:
../ipc/sem.c:377:6: warning: '___p1' may be used uninitialized in this function [-Wmaybe-uninitialized]
../ipc/sem.c:377:6: warning: '___p1' may be used uninitialized in this function [-Wmaybe-uninitialized]
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
Passed with no errors, warnings or mismatches:
This is a note to let you know that I've just added the patch titled
Fix build error in vma.c
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
fix-build-error-in-vma.c.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Tue Jan 9 10:24:02 CET 2018
Date: Tue, 09 Jan 2018 10:24:02 +0100
To: Greg KH <gregkh(a)linuxfoundation.org>
From: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Subject: Fix build error in vma.c
From: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
This fixes the following much-reported build issue:
arch/x86/entry/vdso/vma.c: In function ‘map_vdso’:
arch/x86/entry/vdso/vma.c:175:9: error:
implicit declaration of function ‘pvclock_pvti_cpu0_va’
on some arches and configurations.
Thanks to Guenter for being persistent enough to get it fixed :)
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/vdso/vma.c | 1 +
1 file changed, 1 insertion(+)
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -12,6 +12,7 @@
#include <linux/random.h>
#include <linux/elf.h>
#include <linux/cpu.h>
+#include <asm/pvclock.h>
#include <asm/vgtod.h>
#include <asm/proto.h>
#include <asm/vdso.h>
Patches currently in stable-queue which might be from gregkh(a)linuxfoundation.org are
queue-4.4/map-the-vsyscall-page-with-_page_user.patch
queue-4.4/x86-tlb-drop-the-_gpl-from-the-cpu_tlbstate-export.patch
queue-4.4/proc-much-faster-proc-vmstat.patch
queue-4.4/crypto-chacha20poly1305-validate-the-digest-size.patch
queue-4.4/crypto-pcrypt-fix-freeing-pcrypt-instances.patch
queue-4.4/parisc-fix-alignment-of-pa_tlb_lock-in-assembly-on-32-bit-smp-kernel.patch
queue-4.4/crypto-n2-cure-use-after-free.patch
queue-4.4/kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
queue-4.4/kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
queue-4.4/x86-kasan-write-protect-kasan-zero-shadow.patch
queue-4.4/fscache-fix-the-default-for-fscache_maybe_release_page.patch
queue-4.4/fix-build-error-in-vma.c.patch
queue-4.4/kernel-make-groups_sort-calling-a-responsibility-group_info-allocators.patch
queue-4.4/module-issue-warnings-when-tainting-kernel.patch
queue-4.4/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
queue-4.4/genksyms-handle-string-literals-with-spaces-in-reference-files.patch
queue-4.4/input-elantech-add-new-icbody-type-15.patch
queue-4.4/kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
queue-4.4/x86-microcode-amd-add-support-for-fam17h-microcode-loading.patch
queue-4.4/sunxi-rsb-include-of-based-modalias-in-device-uevent.patch
queue-4.4/module-keep-percpu-symbols-in-module-s-symtab.patch
queue-4.4/arc-uaccess-dont-use-l-gcc-inline-asm-constraint-modifier.patch
Replaces Pan Bian <bianpan2016(a)163.com> patch
"net: dsa: lan9303: correctly check return value of devm_gpiod_get_optional"
Errors need to be prograted back from probe.
Note: I have only compile tested the code as I don't have the hardware.
Phil Reid (2):
net: dsa: lan9303: make lan9303_handle_reset() a void function
net: dsa: lan9303: check error value from devm_gpiod_get_optional()
drivers/net/dsa/lan9303-core.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
--
1.8.3.1
This is a note to let you know that I've just added the patch titled
iio: chemical: ccs811: Fix output of IIO_CONCENTRATION channels
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 8f114acd4e1a9cfa05b70bcc4219bc88197b5c9b Mon Sep 17 00:00:00 2001
From: Narcisa Ana Maria Vasile <narcisaanamaria12(a)gmail.com>
Date: Wed, 6 Dec 2017 18:57:58 +0200
Subject: iio: chemical: ccs811: Fix output of IIO_CONCENTRATION channels
in_concentration_raw should report, according to sysfs-bus-iio documentation,
a "Raw (unscaled no offset etc.) percentage reading of a substance."
Modify scale to convert from ppm/ppb to percentage:
1 ppm = 0.0001%
1 ppb = 0.0000001%
There is no offset needed to convert the ppm/ppb to percentage,
so remove offset from IIO_CONCENTRATION (IIO_MOD_CO2) channel.
Cc'd stable to reduce chance of userspace breakage in the long
run as we fix this wrong bit of ABI usage.
Signed-off-by: Narcisa Ana Maria Vasile <narcisaanamaria12(a)gmail.com>
Cc: <Stable(a)vger.kernel.org>
Reviewed-by: Matt Ranostay <matt.ranostay(a)konsulko.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/iio/chemical/ccs811.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/drivers/iio/chemical/ccs811.c b/drivers/iio/chemical/ccs811.c
index 97bce8345c6a..fbe2431f5b81 100644
--- a/drivers/iio/chemical/ccs811.c
+++ b/drivers/iio/chemical/ccs811.c
@@ -96,7 +96,6 @@ static const struct iio_chan_spec ccs811_channels[] = {
.channel2 = IIO_MOD_CO2,
.modified = 1,
.info_mask_separate = BIT(IIO_CHAN_INFO_RAW) |
- BIT(IIO_CHAN_INFO_OFFSET) |
BIT(IIO_CHAN_INFO_SCALE),
.scan_index = 0,
.scan_type = {
@@ -255,24 +254,18 @@ static int ccs811_read_raw(struct iio_dev *indio_dev,
switch (chan->channel2) {
case IIO_MOD_CO2:
*val = 0;
- *val2 = 12834;
+ *val2 = 100;
return IIO_VAL_INT_PLUS_MICRO;
case IIO_MOD_VOC:
*val = 0;
- *val2 = 84246;
- return IIO_VAL_INT_PLUS_MICRO;
+ *val2 = 100;
+ return IIO_VAL_INT_PLUS_NANO;
default:
return -EINVAL;
}
default:
return -EINVAL;
}
- case IIO_CHAN_INFO_OFFSET:
- if (!(chan->type == IIO_CONCENTRATION &&
- chan->channel2 == IIO_MOD_CO2))
- return -EINVAL;
- *val = -400;
- return IIO_VAL_INT;
default:
return -EINVAL;
}
--
2.15.1
This is a note to let you know that I've just added the patch titled
iio: adc: stm32: fix scan of multiple channels with DMA
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 04e491ca9df60ffe8637d00d68e5ab8bc73b30d5 Mon Sep 17 00:00:00 2001
From: Fabrice Gasnier <fabrice.gasnier(a)st.com>
Date: Fri, 5 Jan 2018 15:34:54 +0100
Subject: iio: adc: stm32: fix scan of multiple channels with DMA
By default, watermark is set to '1'. Watermark is used to fine tune
cyclic dma buffer period. In case watermark is left untouched (e.g. 1)
and several channels are being scanned, buffer period is wrongly set
(e.g. to 1 sample). As a consequence, data is never pushed to upper layer.
Fix buffer period size, by taking scan channels number into account.
Fixes: 2763ea0585c9 ("iio: adc: stm32: add optional dma support")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier(a)st.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/iio/adc/stm32-adc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/adc/stm32-adc.c b/drivers/iio/adc/stm32-adc.c
index 6dbf9549cdc9..7f5def465340 100644
--- a/drivers/iio/adc/stm32-adc.c
+++ b/drivers/iio/adc/stm32-adc.c
@@ -1289,6 +1289,7 @@ static int stm32_adc_set_watermark(struct iio_dev *indio_dev, unsigned int val)
{
struct stm32_adc *adc = iio_priv(indio_dev);
unsigned int watermark = STM32_DMA_BUFFER_SIZE / 2;
+ unsigned int rx_buf_sz = STM32_DMA_BUFFER_SIZE;
/*
* dma cyclic transfers are used, buffer is split into two periods.
@@ -1297,7 +1298,7 @@ static int stm32_adc_set_watermark(struct iio_dev *indio_dev, unsigned int val)
* - one buffer (period) driver can push with iio_trigger_poll().
*/
watermark = min(watermark, val * (unsigned)(sizeof(u16)));
- adc->rx_buf_sz = watermark * 2;
+ adc->rx_buf_sz = min(rx_buf_sz, watermark * 2 * adc->num_conv);
return 0;
}
--
2.15.1
Hi,
I managed to take a bit of time to run some more tests on PTI both
native and hosted in KVM, on stable versions built with
CONFIG_PAGE_TABLE_ISOLATION=y. Here it's 4.9.75, used both on the
host and the VM. I could compare pti=on/off both in the host and the
VM. A single CPU was exposed in the VM.
It was running on my laptop (core i7 3320M at 2.6 GHz, 3.3 GHz single
core turbo).
The test was run on haproxy's ability to forward connections. The
results are below :
Host | Guest | conn/s | ratio_to_host | ratio_to_VM | Notes
---------+---------+---------+---------------+--------------+----------------
pti=off | - | 27400 | 100.0% | - | host reference
pti=off | pti=off | 24200 | 88.3% | 100.0% | VM reference
pti=off | pti=on | 13300 | 48.5% | 55.0% |
pti=on | - | 23800 | 86.9% | - | protected host
pti=on | pti=off | 23100 | 84.3% | 95.5% |
pti=on | pti=on | 13300 | 48.5% | 55.0% |
The ratio_to_host column shows the performance relative to the host
with pti=off. The ratio_to_VM column shows the performance relative to
the VM running with pti=off in a host also having pti=off (ie:
performance before upgrading the systems).
On this test we see a few things :
- the performance impact on the native host is around 13%
- the highest performance impact on VMs comes from having PTI on the
guest kernel (-45%). At this point it makes no difference whether
the host kernel has it or not.
- the host kernel's protection has a very limited impact on the guest
system's performance (-4.5%), which is probably nice for some cloud
users who might want to take the risk of turning the protection off
on their VMs.
The impact inside VMs is quite big but it's not where we usuall install
processes sensitive to syscall performance. I could find an even higher
impact on a packet generation program dropping from 2.5 Mpps to 600kpps
in the VM after the fix, but it doesn't make much sense to do this in
VMs so I don't really care.
I have not yet tried the retpoline patches.
Regards,
Willy
From: Hemant Kumar <hemantk(a)codeaurora.org>
Upon usb composition switch there is possibility of ep0 file
release happening after gadget driver bind. In case of composition
switch from adb to a non-adb composition gadget will never gets
bound again resulting into failure of usb device enumeration. Fix
this issue by checking FFS_FL_BOUND flag and avoid extra
gadget driver unbind if it is already done as part of composition
switch.
This fixes adb reconnection error reported on Android running
v4.4 and above kernel versions. Verified on Hikey running vanilla
v4.15-rc7 + few out of tree Mali patches.
Reviewed-at: https://android-review.googlesource.com/#/c/582632/
Cc: Felipe Balbi <balbi(a)kernel.org>
Cc: Greg KH <gregkh(a)linux-foundation.org>
Cc: Michal Nazarewicz <mina86(a)mina86.com>
Cc: John Stultz <john.stultz(a)linaro.org>
Cc: Dmitry Shmidt <dimitrysh(a)google.com>
Cc: Badhri <badhri(a)google.com>
Cc: Android Kernel Team <kernel-team(a)android.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Hemant Kumar <hemantk(a)codeaurora.org>
[AmitP: Cherry-picked it from android-4.14 and updated the commit log]
Signed-off-by: Amit Pundir <amit.pundir(a)linaro.org>
---
drivers/usb/gadget/function/f_fs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index b6cf5ab5a0a1..f9bd351637cd 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -3700,7 +3700,8 @@ static void ffs_closed(struct ffs_data *ffs)
ci = opts->func_inst.group.cg_item.ci_parent->ci_parent;
ffs_dev_unlock();
- unregister_gadget_item(ci);
+ if (test_bit(FFS_FL_BOUND, &ffs->flags))
+ unregister_gadget_item(ci);
return;
done:
ffs_dev_unlock();
--
2.7.4
Some distributions have turned on the reset attack mitigation feature,
which is designed to force the platform to clear the contents of RAM if
the machine is shut down uncleanly. However, in order for the platform
to be able to determine whether the shutdown was clean or not, userspace
has to be configured to clear the MemoryOverwriteRequest flag on
shutdown - otherwise the firmware will end up clearing RAM on every
reboot, which is unnecessarily time consuming. Add some additional
clarity to the kconfig text to reduce the risk of systems being
configured this way.
Signed-off-by: Matthew Garrett <mjg59(a)google.com>
Cc: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Cc: linux-efi(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
---
drivers/firmware/efi/Kconfig | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index 2b4c39fdfa91..86210f75d233 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -159,7 +159,10 @@ config RESET_ATTACK_MITIGATION
using the TCG Platform Reset Attack Mitigation specification. This
protects against an attacker forcibly rebooting the system while it
still contains secrets in RAM, booting another OS and extracting the
- secrets.
+ secrets. This should only be enabled when userland is configured to
+ clear the MemoryOverwriteRequest flag on clean shutdown after secrets
+ have been evicted, since otherwise it will trigger even on clean
+ reboots.
endmenu
--
2.16.0.rc0.223.g4a4ac83678-goog
On Mon, Jan 08, 2018 at 09:26:10PM +0100, Yves-Alexis Perez wrote:
> On Mon, 2018-01-08 at 19:26 +0100, Willy Tarreau wrote:
> > You're totally right, I discovered during my later developments that
> > indeed PCID is not exposed there. So we take the hit of a full TLB
> > flush twice per syscall.
>
> So I really think it might make sense to redo the tests with PCID, because the
> assumptions you're basing your patch series on might actually not hold.
I'll have to do it on the bare-metal server soon anyway.
Cheers,
Willy
From: Viktor Slavkovic <viktors(a)google.com>
A lock-unlock is missing in ASHMEM_SET_SIZE ioctl which can result in a
race condition when mmap is called. After the !asma->file check, before
setting asma->size, asma->file can be set in mmap. That would result in
having different asma->size than the mapped memory size. Combined with
ASHMEM_UNPIN ioctl and shrinker invocation, this can result in memory
corruption.
Signed-off-by: Viktor Slavkovic <viktors(a)google.com>
Cc: stable(a)vger.kernel.org
---
drivers/staging/android/ashmem.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index 0f695df14c9d..372ce9913e6d 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -765,10 +765,12 @@ static long ashmem_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
break;
case ASHMEM_SET_SIZE:
ret = -EINVAL;
+ mutex_lock(&ashmem_mutex);
if (!asma->file) {
ret = 0;
asma->size = (size_t)arg;
}
+ mutex_unlock(&ashmem_mutex);
break;
case ASHMEM_GET_SIZE:
ret = asma->size;
--
2.16.0.rc0.223.g4a4ac83678-goog
Hi Yves-Alexis,
On Mon, Jan 08, 2018 at 06:07:54PM +0100, Yves-Alexis Perez wrote:
> On Sun, 2018-01-07 at 11:18 +0100, Willy Tarreau wrote:
> > - the highest performance impact on VMs comes from having PTI on the
> > guest kernel (-45%). At this point it makes no difference whether
> > the host kernel has it or not.
>
> Hi Willy,
>
> out of curiosity, is the pcid/invpcid flags exposed to and used by your guest
> CPU? It might very well that the PCID optimisations are not used by the guests
> here, and it might be worth either checking on bare metal or with the PCID
> optimisations enabled.
You're totally right, I discovered during my later developments that
indeed PCID is not exposed there. So we take the hit of a full TLB
flush twice per syscall.
Willy
On Mon, Jan 8, 2018 at 3:05 PM, kernelci.org bot <bot(a)kernelci.org> wrote:
>
> stable-rc/linux-4.4.y build: 178 builds: 7 failed, 171 passed, 8 errors (v4.4.110-23-g49278737d445)
> Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.4.y/kernel/v4.4.110-23-…
> Tree: stable-rc
> Branch: linux-4.4.y
> Git Describe: v4.4.110-23-g49278737d445
> Git Commit: 49278737d4458032fb523dfe5451b441c04c5b73
> Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> Built: 4 unique architectures
>
> Build Failures Detected:
>
> x86: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
> allnoconfig FAIL
> i386_defconfig FAIL
> tinyconfig FAIL
I missed these earlier, since the kernelci summary output doesn't
print link errors:
arch/x86/kernel/setup.o: In function `vsyscall_enabled':
setup.c:(.text+0x10): multiple definition of `vsyscall_enabled'
arch/x86/kernel/time.o:time.c:(.text+0x10): first defined here
arch/x86/kernel/rtc.o: In function `vsyscall_enabled':
rtc.c:(.text+0x0): multiple definition of `vsyscall_enabled'
arch/x86/kernel/time.o:time.c:(.text+0x10): first defined here
arch/x86/kernel/cpu/built-in.o: In function `vsyscall_enabled':
(.text+0xbc0): multiple definition of `vsyscall_enabled'
This comes from 0cbf2b590bea ("Map the vsyscall page with _PAGE_USER")
which adds a line 'bool vsyscall_enabled(void) { return false; }' that
presumably
should have been 'static inline'.
Arnd
On 01/08/2018 01:05 AM, Yves-Alexis Perez wrote:
> On Mon, 2018-01-08 at 03:25 +0000, Ben Hutchings wrote:
>> This is with the full patch set applied (and a fix for NMI handling
>> that wasn't in 3.16.53-rc1):
>> https://www.decadent.org.uk/ben/tmp/linux-image-3.16.52_3.16.52-50_amd64.deb
I booted this. It crashes in *secondary* CPU startup when it sets
CR4.PCIDE while still in 32-bit protected mode. That's illegal.
Plain 3.16 doesn't do this:
https://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-kaiser.git/tree/a…
> /* Enable PAE mode and PGE */
> movl $(X86_CR4_PAE | X86_CR4_PGE), %ecx
> movq %rcx, %cr4
So I suspect the "Enable PAE and PGE" area is wrong.
On Mon, Jan 8, 2018 at 12:25 PM, kernelci.org bot <bot(a)kernelci.org> wrote:
>
> stable-rc/linux-4.4.y build: 178 builds: 4 failed, 174 passed, 8 errors (v4.4.110-18-g5da3d9af3a4b)
> Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.4.y/kernel/v4.4.110-18-…
> Tree: stable-rc
> Branch: linux-4.4.y
> Git Describe: v4.4.110-18-g5da3d9af3a4b
> Git Commit: 5da3d9af3a4b90d3c5ab19f9ad1dbb7d237edcf9
> Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> Built: 4 unique architectures
>
> Build Failures Detected:
>
> arm: gcc version 5.3.1 20160412 (Linaro GCC 5.3-2016.05)
> cm_x300_defconfig FAIL
> mvebu_v7_defconfig FAIL
> pxa3xx_defconfig FAIL
> raumfeld_defconfig FAIL
>
> mvebu_v7_defconfig (arm) — FAIL, 2 errors, 0 warnings, 0 section mismatches
>
> Errors:
> drivers/mtd/nand/pxa3xx_nand.c:918:2: error: duplicate case value
> drivers/mtd/nand/pxa3xx_nand.c:915:2: error: previously used here
Hi Greg,
Commit fee4380f368e ("mtd: nand: pxa3xx: Fix READOOB implementation") was
apparently backported in error, it looks like it should just be
dropped here in 4.4.y.
The commit lists 'Fixes: 43bcfd2bb24a ("mtd: nand: pxa3xx: Add driver-specific
ECC BCH support")', that commit was merged into v4.14, so backporting the
fix further is probably not appropriate for 4.9 or earlier kernels either.
The duplicate case statement only happens before linux-4.6, as commit
c2cdace755b5 ("mtd: nand: pxa3xx_nand: add support for partial chunks")'
removed a previous 'case READOOB' statement in this driver, so the build
regression appears only in 4.4 but not 4.9.
Arnd
This is a note to let you know that I've just added the patch titled
mtd: nand: pxa3xx: Fix READOOB implementation
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mtd-nand-pxa3xx-fix-readoob-implementation.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From fee4380f368e84ed216b62ccd2fbc4126f2bf40b Mon Sep 17 00:00:00 2001
From: Boris Brezillon <boris.brezillon(a)free-electrons.com>
Date: Mon, 18 Dec 2017 11:32:45 +0100
Subject: mtd: nand: pxa3xx: Fix READOOB implementation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Boris Brezillon <boris.brezillon(a)free-electrons.com>
commit fee4380f368e84ed216b62ccd2fbc4126f2bf40b upstream.
In the current driver, OOB bytes are accessed in raw mode, and when a
page access is done with NDCR_SPARE_EN set and NDCR_ECC_EN cleared, the
driver must read the whole spare area (64 bytes in case of a 2k page,
16 bytes for a 512 page). The driver was only reading the free OOB
bytes, which was leaving some unread data in the FIFO and was somehow
leading to a timeout.
We could patch the driver to read ->spare_size + ->ecc_size instead of
just ->spare_size when READOOB is requested, but we'd better make
in-band and OOB accesses consistent.
Since the driver is always accessing in-band data in non-raw mode (with
the ECC engine enabled), we should also access OOB data in this mode.
That's particularly useful when using the BCH engine because in this
mode the free OOB bytes are also ECC protected.
Fixes: 43bcfd2bb24a ("mtd: nand: pxa3xx: Add driver-specific ECC BCH support")
Reported-by: Sean Nyekjær <sean.nyekjaer(a)prevas.dk>
Tested-by: Willy Tarreau <w(a)1wt.eu>
Signed-off-by: Boris Brezillon <boris.brezillon(a)free-electrons.com>
Acked-by: Ezequiel Garcia <ezequiel(a)vanguardiasur.com.ar>
Tested-by: Sean Nyekjaer <sean.nyekjaer(a)prevas.dk>
Acked-by: Robert Jarzmik <robert.jarzmik(a)free.fr>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/mtd/nand/pxa3xx_nand.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/mtd/nand/pxa3xx_nand.c
+++ b/drivers/mtd/nand/pxa3xx_nand.c
@@ -950,6 +950,7 @@ static void prepare_start_command(struct
switch (command) {
case NAND_CMD_READ0:
+ case NAND_CMD_READOOB:
case NAND_CMD_PAGEPROG:
info->use_ecc = 1;
break;
Patches currently in stable-queue which might be from boris.brezillon(a)free-electrons.com are
queue-4.9/mtd-nand-pxa3xx-fix-readoob-implementation.patch
Hi Pavlos!
On Mon, Jan 08, 2018 at 04:19:17PM +0100, Pavlos Parissis wrote:
> Hi,
>
> The 4.14 kernel is listed as a 'Stable' but not as 'Longterm' in the kernel releases feed[1], is
> this expected?
Normally kernels are only listed as LTS once they leave their regular
"stable" status, often after next kernel version is issued. I think this
leaves a last minute chance to Greg to change his mind if the kernel he
elected is too damaged ;-)
Cheers,
Willy
Hi Greg,
sorry, I lost the overview -- the KPTI 4.14, 4.9 and 4.4 patch set is
diverging...
I am missing patches like
x86/pti: Switch to kernel CR3 at early in entry_SYSCALL_compat()
x86/pti: Make sure the user/kernel PTEs match
in 4.9.
The first one was really important but the problem might be hidden now
due to
x86/cpu, x86/pti: Do not enable PTI on AMD processors
--
Regards,
Thomas Deutschmann / Gentoo Linux Developer
C4DD 695F A713 8F24 2AA1 5638 5849 7EE5 1D5D 74A5
The patch
ASoC: rockchip: i2s: fix playback after runtime resume
has been applied to the asoc tree at
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
>From c66234cfedfc3e6e3b62563a5f2c1562be09a35d Mon Sep 17 00:00:00 2001
From: John Keeping <john(a)metanate.com>
Date: Mon, 8 Jan 2018 16:01:04 +0000
Subject: [PATCH] ASoC: rockchip: i2s: fix playback after runtime resume
When restoring registers during runtime resume, we must not write to
I2S_TXDR which is the transmit FIFO as this queues up a sample to be
output and pushes all of the output channels down by one.
This can be demonstrated with the speaker-test utility:
for i in a b c; do speaker-test -c 2 -s 1; done
which should play a test through the left speaker three times but if the
I2S hardware starts runtime suspended the first sample will be played
through the right speaker.
Fix this by marking I2S_TXDR as volatile (which also requires marking it
as readble, even though it technically isn't). This seems to be the
most robust fix, the alternative of giving I2S_TXDR a default value is
more fragile since it does not prevent regcache writing to the register
in all circumstances.
While here, also fix the configuration of I2S_RXDR and I2S_FIFOLR; these
are not writable so they do not suffer from the same problem as I2S_TXDR
but reading from I2S_RXDR does suffer from a similar problem.
Fixes: f0447f6cbb20 ("ASoC: rockchip: i2s: restore register during runtime_suspend/resume cycle", 2016-09-07)
Signed-off-by: John Keeping <john(a)metanate.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
sound/soc/rockchip/rockchip_i2s.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/sound/soc/rockchip/rockchip_i2s.c b/sound/soc/rockchip/rockchip_i2s.c
index 908211e1d6fc..eb27f6c24bf7 100644
--- a/sound/soc/rockchip/rockchip_i2s.c
+++ b/sound/soc/rockchip/rockchip_i2s.c
@@ -504,6 +504,7 @@ static bool rockchip_i2s_rd_reg(struct device *dev, unsigned int reg)
case I2S_INTCR:
case I2S_XFER:
case I2S_CLR:
+ case I2S_TXDR:
case I2S_RXDR:
case I2S_FIFOLR:
case I2S_INTSR:
@@ -518,6 +519,9 @@ static bool rockchip_i2s_volatile_reg(struct device *dev, unsigned int reg)
switch (reg) {
case I2S_INTSR:
case I2S_CLR:
+ case I2S_FIFOLR:
+ case I2S_TXDR:
+ case I2S_RXDR:
return true;
default:
return false;
@@ -527,6 +531,8 @@ static bool rockchip_i2s_volatile_reg(struct device *dev, unsigned int reg)
static bool rockchip_i2s_precious_reg(struct device *dev, unsigned int reg)
{
switch (reg) {
+ case I2S_RXDR:
+ return true;
default:
return false;
}
--
2.15.1
On 01/08/2018 07:28 AM, Yves-Alexis Perez wrote:
> On Mon, 2018-01-08 at 15:19 +0000, Ben Hutchings wrote:
>>> I tried this on a ThinkPad X230 (Ivy Bridge CPU, i7 3520M, so pcid but not
>>> invpcid) on full UEFI mode and it doesn't boot at all. grub-efi loads the
>>> kernel and initrd, but I don't have any message after “Loading initial
>>> ramdisk” (even with debug earlyprintk=vga).
>> If you boot with EFI you need to use earlyprintk=efi.
> Thanks! With that, I can see the last few log lines before it hangs:
>
> Kernel/User page tables isolation: enabled
> […]
> bootconsole [earlyefi0] disabled
Does "earlyprintk=efi,keep" help get more out to the console?
> Also I must have done something wrong earlier today, because when booting with
> 'nopcid' it does work (which is consistent with other people tests, afaict).
If I had to guess, I'd say we're probably putting a PCID value into the
low bits of CR3 before we've _enabled_ PCID support in the CPU.
On Mon, Jan 08, 2018 at 04:19:17PM +0100, Pavlos Parissis wrote:
> Hi,
>
> The 4.14 kernel is listed as a 'Stable' but not as 'Longterm' in the kernel releases feed[1], is
> this expected?
Yes. A kernel can not become "LTS" until it has a chance to be that,
right now it is still just a "Stable" release as 4.15 is not released.
thanks,
greg k-h
On 05/01/18 00:06, Kevin Hilman wrote:
> kernelci.org bot <bot(a)kernelci.org> writes:
>
>> stable-rc/linux-4.4.y boot: 100 boots: 4 failed, 93 passed with 1 offline, 2 conflicts (v4.4.109-38-g99abd6cdd65e)
>>
>> Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.4.y/kernel/v4.4.…
>> Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.4.y/kernel/v4.4.109-38-…
>>
>> Tree: stable-rc
>> Branch: linux-4.4.y
>> Git Describe: v4.4.109-38-g99abd6cdd65e
>> Git Commit: 99abd6cdd65e984d89c8565508a7a96ea0fce179
>> Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>> Tested: 53 unique boards, 19 SoC families, 16 builds out of 178
>
> TL;DR; All is well.
>
>> Boot Regressions Detected:
>>
>> arm:
>>
>> exynos_defconfig:
>> exynos5422-odroidxu3:
>> lab-collabora: failing since 58 days (last pass: v4.4.95-21-g32458fcb7bd6 - first fail: v4.4.96-41-g336421367b9c)
>
> Long standing issue in lab-collabora (passing in other labs) Guillaume?
This should be fixed now, with a tweak to the device config to
enable relocating the ramdisk and dtb:
https://review.linaro.org/#/c/23238/
>> multi_v7_defconfig:
>> armada-xp-linksys-mamba:
>> lab-free-electrons: new failure (last pass: v4.4.109-36-g8b381424010c)
>
> Not a kerel issue, bootROM fails to start bootloader. I pinged lab
> owners (Free Electrons)
>
>> tegra124-nyan-big:
>> lab-collabora: failing since 1 day (last pass: v4.4.109 - first fail: v4.4.109-36-g8b381424010c)
>>
>> tegra_defconfig:
>> tegra124-nyan-big:
>> lab-collabora: failing since 1 day (last pass: v4.4.108-65-g57856049c0f8 - first fail: v4.4.109)
>
> This one is booting fine, but the command to power-off the board is
> timing out, resulting in a failure report.
Indeed, this was due to a crash of the lavapdu daemon - it's back
on track now.
(On a side note, the tegra124-nyan-big is still failing to boot
in mainline due to a genuine kernel driver issue.)
Guillaume
+Sjoerd
On 05/01/18 00:12, Kevin Hilman wrote:
> kernelci.org bot <bot(a)kernelci.org> writes:
>
>> stable-rc/linux-4.14.y boot: 118 boots: 4 failed, 113 passed with 1 offline (v4.14.11-15-g732141e47ee6)
>>
>> Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.14.y/kernel/v4.1…
>> Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.14.y/kernel/v4.14.11-15…
>>
>> Tree: stable-rc
>> Branch: linux-4.14.y
>> Git Describe: v4.14.11-15-g732141e47ee6
>> Git Commit: 732141e47ee614d70aeb8ad828a977ad19447e87
>> Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>> Tested: 68 unique boards, 23 SoC families, 16 builds out of 185
>>
>> Boot Regressions Detected:
>
> TL;DR; All is well.
>
> Same issues as reported for stable-rc/linux-4.4.y, but...
>
>> arm:
>>
>> multi_v7_defconfig:
>> armada-xp-linksys-mamba:
>> lab-free-electrons: new failure (last pass: v4.14.11)
>> tegra124-nyan-big:
>> lab-collabora: failing since 1 day (last pass: v4.14.10-147-g61f3176f64e3 - first fail: v4.14.11)
>>
>> omap2plus_defconfig:
>> am335x-boneblack:
>> lab-collabora: new failure (last pass: v4.14.11)
>
> ... this one is unique to 4.14, but is still a lab issue: bootloader
> fails to DHCP.
Turns out this is a known hardware issue with the ethernet phy on
this type of board. It's possible to work around it (retry...),
we'll take a look.
Guillaume
Quoting the original report:
It looks like there is a double-free vulnerability in Linux usbtv driver on an error path of usbtv_probe function. When audio registration fails, usbtv_video_free function ends up freeing usbtv data structure, which gets freed the second time under usbtv_video_fail label.
usbtv_audio_fail:
usbtv_video_free(usbtv); =>
v4l2_device_put(&usbtv->v4l2_dev);
=> v4l2_device_put
=> kref_put
=> v4l2_device_release
=> usbtv_release (CALLBACK)
=> kfree(usbtv) (1st time)
usbtv_video_fail:
usb_set_intfdata(intf, NULL);
usb_put_dev(usbtv->udev);
kfree(usbtv); (2nd time)
So, as we have refcounting, use it
Reported-by: Yavuz, Tuba <tuba(a)ece.ufl.edu>
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
CC: stable(a)vger.kernel.org
---
drivers/media/usb/usbtv/usbtv-core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/media/usb/usbtv/usbtv-core.c b/drivers/media/usb/usbtv/usbtv-core.c
index 127f8a0c098b..0c2e628e8723 100644
--- a/drivers/media/usb/usbtv/usbtv-core.c
+++ b/drivers/media/usb/usbtv/usbtv-core.c
@@ -112,6 +112,8 @@ static int usbtv_probe(struct usb_interface *intf,
return 0;
usbtv_audio_fail:
+ /* we must not free at this point */
+ usb_get_dev(usbtv->udev);
usbtv_video_free(usbtv);
usbtv_video_fail:
--
2.13.6
This is a note to let you know that I've just added the patch titled
iio: chemical: ccs811: Fix output of IIO_CONCENTRATION channels
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the staging-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 1a0f51d5aaf71da6090ab839ba507b7df27efcbd Mon Sep 17 00:00:00 2001
From: Narcisa Ana Maria Vasile <narcisaanamaria12(a)gmail.com>
Date: Wed, 6 Dec 2017 18:57:58 +0200
Subject: iio: chemical: ccs811: Fix output of IIO_CONCENTRATION channels
in_concentration_raw should report, according to sysfs-bus-iio documentation,
a "Raw (unscaled no offset etc.) percentage reading of a substance."
Modify scale to convert from ppm/ppb to percentage:
1 ppm = 0.0001%
1 ppb = 0.0000001%
There is no offset needed to convert the ppm/ppb to percentage,
so remove offset from IIO_CONCENTRATION (IIO_MOD_CO2) channel.
Cc'd stable to reduce chance of userspace breakage in the long
run as we fix this wrong bit of ABI usage.
Signed-off-by: Narcisa Ana Maria Vasile <narcisaanamaria12(a)gmail.com>
Cc: <Stable(a)vger.kernel.org>
Reviewed-by: Matt Ranostay <matt.ranostay(a)konsulko.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/chemical/ccs811.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/drivers/iio/chemical/ccs811.c b/drivers/iio/chemical/ccs811.c
index 97bce8345c6a..fbe2431f5b81 100644
--- a/drivers/iio/chemical/ccs811.c
+++ b/drivers/iio/chemical/ccs811.c
@@ -96,7 +96,6 @@ static const struct iio_chan_spec ccs811_channels[] = {
.channel2 = IIO_MOD_CO2,
.modified = 1,
.info_mask_separate = BIT(IIO_CHAN_INFO_RAW) |
- BIT(IIO_CHAN_INFO_OFFSET) |
BIT(IIO_CHAN_INFO_SCALE),
.scan_index = 0,
.scan_type = {
@@ -255,24 +254,18 @@ static int ccs811_read_raw(struct iio_dev *indio_dev,
switch (chan->channel2) {
case IIO_MOD_CO2:
*val = 0;
- *val2 = 12834;
+ *val2 = 100;
return IIO_VAL_INT_PLUS_MICRO;
case IIO_MOD_VOC:
*val = 0;
- *val2 = 84246;
- return IIO_VAL_INT_PLUS_MICRO;
+ *val2 = 100;
+ return IIO_VAL_INT_PLUS_NANO;
default:
return -EINVAL;
}
default:
return -EINVAL;
}
- case IIO_CHAN_INFO_OFFSET:
- if (!(chan->type == IIO_CONCENTRATION &&
- chan->channel2 == IIO_MOD_CO2))
- return -EINVAL;
- *val = -400;
- return IIO_VAL_INT;
default:
return -EINVAL;
}
--
2.15.1
This is a note to let you know that I've just added the patch titled
iio: adc: stm32: fix scan of multiple channels with DMA
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the staging-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 69545dfbcaf7a346d87dc6fb6f847b51a1845852 Mon Sep 17 00:00:00 2001
From: Fabrice Gasnier <fabrice.gasnier(a)st.com>
Date: Fri, 5 Jan 2018 15:34:54 +0100
Subject: iio: adc: stm32: fix scan of multiple channels with DMA
By default, watermark is set to '1'. Watermark is used to fine tune
cyclic dma buffer period. In case watermark is left untouched (e.g. 1)
and several channels are being scanned, buffer period is wrongly set
(e.g. to 1 sample). As a consequence, data is never pushed to upper layer.
Fix buffer period size, by taking scan channels number into account.
Fixes: 2763ea0585c9 ("iio: adc: stm32: add optional dma support")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier(a)st.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/stm32-adc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/adc/stm32-adc.c b/drivers/iio/adc/stm32-adc.c
index 6dbf9549cdc9..7f5def465340 100644
--- a/drivers/iio/adc/stm32-adc.c
+++ b/drivers/iio/adc/stm32-adc.c
@@ -1289,6 +1289,7 @@ static int stm32_adc_set_watermark(struct iio_dev *indio_dev, unsigned int val)
{
struct stm32_adc *adc = iio_priv(indio_dev);
unsigned int watermark = STM32_DMA_BUFFER_SIZE / 2;
+ unsigned int rx_buf_sz = STM32_DMA_BUFFER_SIZE;
/*
* dma cyclic transfers are used, buffer is split into two periods.
@@ -1297,7 +1298,7 @@ static int stm32_adc_set_watermark(struct iio_dev *indio_dev, unsigned int val)
* - one buffer (period) driver can push with iio_trigger_poll().
*/
watermark = min(watermark, val * (unsigned)(sizeof(u16)));
- adc->rx_buf_sz = watermark * 2;
+ adc->rx_buf_sz = min(rx_buf_sz, watermark * 2 * adc->num_conv);
return 0;
}
--
2.15.1
The current code hides a couple of bugs.
- The global variable 'clock_event_ddata' is overwritten each time the
init function is invoked.
This is fixed with a kmemdup instead of assigning the global variable. That
prevents a memory corruption when several timers are defined in the DT.
- The clockevent's event_handler is NULL if the time framework does
not select the clockevent when registering it, this is fine but the init
code generates in any case an interrupt leading to dereference this
NULL pointer.
The stm32 timer works with shadow registers, a mechanism to cache the
registers. When a change is done in one buffered register, we need to
artificially generate an event to force the timer to copy the content
of the register to the shadowed register.
The auto-reload register (ARR) is one of the shadowed register as well as
the prescaler register (PSC), so in order to force the copy, we issue an
event which in turn leads to an interrupt and the NULL dereference.
This is fixed by inverting two lines where we clear the status register
before enabling the update event interrupt.
As this kernel crash is resulting from the combination of these two bugs,
the fixes are grouped into a single patch.
Cc: stable(a)vger.kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Tested-by: Benjamin Gaignard <benjamin.gaignard(a)st.com>
Acked-by: Benjamin Gaignard <benjamin.gaignard(a)st.com>
---
drivers/clocksource/timer-stm32.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/clocksource/timer-stm32.c b/drivers/clocksource/timer-stm32.c
index 8f24237..4bfeb99 100644
--- a/drivers/clocksource/timer-stm32.c
+++ b/drivers/clocksource/timer-stm32.c
@@ -106,6 +106,10 @@ static int __init stm32_clockevent_init(struct device_node *np)
unsigned long rate, max_delta;
int irq, ret, bits, prescaler = 1;
+ data = kmemdup(&clock_event_ddata, sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+
clk = of_clk_get(np, 0);
if (IS_ERR(clk)) {
ret = PTR_ERR(clk);
@@ -156,8 +160,8 @@ static int __init stm32_clockevent_init(struct device_node *np)
writel_relaxed(prescaler - 1, data->base + TIM_PSC);
writel_relaxed(TIM_EGR_UG, data->base + TIM_EGR);
- writel_relaxed(TIM_DIER_UIE, data->base + TIM_DIER);
writel_relaxed(0, data->base + TIM_SR);
+ writel_relaxed(TIM_DIER_UIE, data->base + TIM_DIER);
data->periodic_top = DIV_ROUND_CLOSEST(rate, prescaler * HZ);
@@ -184,6 +188,7 @@ static int __init stm32_clockevent_init(struct device_node *np)
err_clk_enable:
clk_put(clk);
err_clk_get:
+ kfree(data);
return ret;
}
--
2.7.4
This is a note to let you know that I've just added the patch titled
Map the vsyscall page with _PAGE_USER
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
map-the-vsyscall-page-with-_page_user.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From: Borislav Petkov <bp(a)suse.de>
Date: Thu, 4 Jan 2018 17:42:45 +0100
Subject: Map the vsyscall page with _PAGE_USER
From: Borislav Petkov <bp(a)suse.de>
This needs to happen early in kaiser_pagetable_walk(), before the
hierarchy is established so that _PAGE_USER permission can be really
set.
A proper fix would be to teach kaiser_pagetable_walk() to update those
permissions but the vsyscall page is the only exception here so ...
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/vsyscall/vsyscall_64.c | 5 +++++
arch/x86/include/asm/vsyscall.h | 2 ++
arch/x86/mm/kaiser.c | 34 ++++++++++++++++++++++++++++++----
3 files changed, 37 insertions(+), 4 deletions(-)
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -66,6 +66,11 @@ static int __init vsyscall_setup(char *s
}
early_param("vsyscall", vsyscall_setup);
+bool vsyscall_enabled(void)
+{
+ return vsyscall_mode != NONE;
+}
+
static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
const char *message)
{
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -12,12 +12,14 @@ extern void map_vsyscall(void);
* Returns true if handled.
*/
extern bool emulate_vsyscall(struct pt_regs *regs, unsigned long address);
+extern bool vsyscall_enabled(void);
#else
static inline void map_vsyscall(void) {}
static inline bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
{
return false;
}
+bool vsyscall_enabled(void) { return false; }
#endif
#endif /* _ASM_X86_VSYSCALL_H */
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -19,6 +19,7 @@
#include <asm/pgalloc.h>
#include <asm/desc.h>
#include <asm/cmdline.h>
+#include <asm/vsyscall.h>
int kaiser_enabled __read_mostly = 1;
EXPORT_SYMBOL(kaiser_enabled); /* for inlined TLB flush functions */
@@ -110,12 +111,13 @@ static inline unsigned long get_pa_from_
*
* Returns a pointer to a PTE on success, or NULL on failure.
*/
-static pte_t *kaiser_pagetable_walk(unsigned long address)
+static pte_t *kaiser_pagetable_walk(unsigned long address, bool user)
{
pmd_t *pmd;
pud_t *pud;
pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(address));
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+ unsigned long prot = _KERNPG_TABLE;
if (pgd_none(*pgd)) {
WARN_ONCE(1, "All shadow pgds should have been populated");
@@ -123,6 +125,17 @@ static pte_t *kaiser_pagetable_walk(unsi
}
BUILD_BUG_ON(pgd_large(*pgd) != 0);
+ if (user) {
+ /*
+ * The vsyscall page is the only page that will have
+ * _PAGE_USER set. Catch everything else.
+ */
+ BUG_ON(address != VSYSCALL_ADDR);
+
+ set_pgd(pgd, __pgd(pgd_val(*pgd) | _PAGE_USER));
+ prot = _PAGE_TABLE;
+ }
+
pud = pud_offset(pgd, address);
/* The shadow page tables do not use large mappings: */
if (pud_large(*pud)) {
@@ -135,7 +148,7 @@ static pte_t *kaiser_pagetable_walk(unsi
return NULL;
spin_lock(&shadow_table_allocation_lock);
if (pud_none(*pud)) {
- set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
+ set_pud(pud, __pud(prot | __pa(new_pmd_page)));
__inc_zone_page_state(virt_to_page((void *)
new_pmd_page), NR_KAISERTABLE);
} else
@@ -155,7 +168,7 @@ static pte_t *kaiser_pagetable_walk(unsi
return NULL;
spin_lock(&shadow_table_allocation_lock);
if (pmd_none(*pmd)) {
- set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
+ set_pmd(pmd, __pmd(prot | __pa(new_pte_page)));
__inc_zone_page_state(virt_to_page((void *)
new_pte_page), NR_KAISERTABLE);
} else
@@ -191,7 +204,7 @@ static int kaiser_add_user_map(const voi
ret = -EIO;
break;
}
- pte = kaiser_pagetable_walk(address);
+ pte = kaiser_pagetable_walk(address, flags & _PAGE_USER);
if (!pte) {
ret = -ENOMEM;
break;
@@ -318,6 +331,19 @@ void __init kaiser_init(void)
kaiser_init_all_pgds();
+ /*
+ * Note that this sets _PAGE_USER and it needs to happen when the
+ * pagetable hierarchy gets created, i.e., early. Otherwise
+ * kaiser_pagetable_walk() will encounter initialized PTEs in the
+ * hierarchy and not set the proper permissions, leading to the
+ * pagefaults with page-protection violations when trying to read the
+ * vsyscall page. For example.
+ */
+ if (vsyscall_enabled())
+ kaiser_add_user_map_early((void *)VSYSCALL_ADDR,
+ PAGE_SIZE,
+ __PAGE_KERNEL_VSYSCALL);
+
for_each_possible_cpu(cpu) {
void *percpu_vaddr = __per_cpu_user_mapped_start +
per_cpu_offset(cpu);
Patches currently in stable-queue which might be from bp(a)suse.de are
queue-4.9/map-the-vsyscall-page-with-_page_user.patch
This is a note to let you know that I've just added the patch titled
proc: much faster /proc/vmstat
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
proc-much-faster-proc-vmstat.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 68ba0326b4e14988f9e0c24a6e12a85cf2acd1ca Mon Sep 17 00:00:00 2001
From: Alexey Dobriyan <adobriyan(a)gmail.com>
Date: Fri, 7 Oct 2016 17:02:14 -0700
Subject: proc: much faster /proc/vmstat
From: Alexey Dobriyan <adobriyan(a)gmail.com>
commit 68ba0326b4e14988f9e0c24a6e12a85cf2acd1ca upstream.
Every current KDE system has process named ksysguardd polling files
below once in several seconds:
$ strace -e trace=open -p $(pidof ksysguardd)
Process 1812 attached
open("/etc/mtab", O_RDONLY|O_CLOEXEC) = 8
open("/etc/mtab", O_RDONLY|O_CLOEXEC) = 8
open("/proc/net/dev", O_RDONLY) = 8
open("/proc/net/wireless", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/proc/stat", O_RDONLY) = 8
open("/proc/vmstat", O_RDONLY) = 8
Hell knows what it is doing but speed up reading /proc/vmstat by 33%!
Benchmark is open+read+close 1.000.000 times.
BEFORE
$ perf stat -r 10 taskset -c 3 ./proc-vmstat
Performance counter stats for 'taskset -c 3 ./proc-vmstat' (10 runs):
13146.768464 task-clock (msec) # 0.960 CPUs utilized ( +- 0.60% )
15 context-switches # 0.001 K/sec ( +- 1.41% )
1 cpu-migrations # 0.000 K/sec ( +- 11.11% )
104 page-faults # 0.008 K/sec ( +- 0.57% )
45,489,799,349 cycles # 3.460 GHz ( +- 0.03% )
9,970,175,743 stalled-cycles-frontend # 21.92% frontend cycles idle ( +- 0.10% )
2,800,298,015 stalled-cycles-backend # 6.16% backend cycles idle ( +- 0.32% )
79,241,190,850 instructions # 1.74 insn per cycle
# 0.13 stalled cycles per insn ( +- 0.00% )
17,616,096,146 branches # 1339.956 M/sec ( +- 0.00% )
176,106,232 branch-misses # 1.00% of all branches ( +- 0.18% )
13.691078109 seconds time elapsed ( +- 0.03% )
^^^^^^^^^^^^
AFTER
$ perf stat -r 10 taskset -c 3 ./proc-vmstat
Performance counter stats for 'taskset -c 3 ./proc-vmstat' (10 runs):
8688.353749 task-clock (msec) # 0.950 CPUs utilized ( +- 1.25% )
10 context-switches # 0.001 K/sec ( +- 2.13% )
1 cpu-migrations # 0.000 K/sec
104 page-faults # 0.012 K/sec ( +- 0.56% )
30,384,010,730 cycles # 3.497 GHz ( +- 0.07% )
12,296,259,407 stalled-cycles-frontend # 40.47% frontend cycles idle ( +- 0.13% )
3,370,668,651 stalled-cycles-backend # 11.09% backend cycles idle ( +- 0.69% )
28,969,052,879 instructions # 0.95 insn per cycle
# 0.42 stalled cycles per insn ( +- 0.01% )
6,308,245,891 branches # 726.058 M/sec ( +- 0.00% )
214,685,502 branch-misses # 3.40% of all branches ( +- 0.26% )
9.146081052 seconds time elapsed ( +- 0.07% )
^^^^^^^^^^^
vsnprintf() is slow because:
1. format_decode() is busy looking for format specifier: 2 branches
per character (not in this case, but in others)
2. approximately million branches while parsing format mini language
and everywhere
3. just look at what string() does /proc/vmstat is good case because
most of its content are strings
Link: http://lkml.kernel.org/r/20160806125455.GA1187@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Joe Perches <joe(a)perches.com>
Cc: Andi Kleen <andi(a)firstfloor.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/vmstat.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1351,7 +1351,9 @@ static int vmstat_show(struct seq_file *
unsigned long *l = arg;
unsigned long off = l - (unsigned long *)m->private;
- seq_printf(m, "%s %lu\n", vmstat_text[off], *l);
+ seq_puts(m, vmstat_text[off]);
+ seq_put_decimal_ull(m, ' ', *l);
+ seq_putc(m, '\n');
return 0;
}
Patches currently in stable-queue which might be from adobriyan(a)gmail.com are
queue-4.4/proc-much-faster-proc-vmstat.patch
This is a note to let you know that I've just added the patch titled
module: keep percpu symbols in module's symtab
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
module-keep-percpu-symbols-in-module-s-symtab.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e0224418516b4d8a6c2160574bac18447c354ef0 Mon Sep 17 00:00:00 2001
From: Miroslav Benes <mbenes(a)suse.cz>
Date: Thu, 26 Nov 2015 13:18:06 +1030
Subject: module: keep percpu symbols in module's symtab
From: Miroslav Benes <mbenes(a)suse.cz>
commit e0224418516b4d8a6c2160574bac18447c354ef0 upstream.
Currently, percpu symbols from .data..percpu ELF section of a module are
not copied over and stored in final symtab array of struct module.
Consequently such symbol cannot be returned via kallsyms API (for
example kallsyms_lookup_name). This can be especially confusing when the
percpu symbol is exported. Only its __ksymtab et al. are present in its
symtab.
The culprit is in layout_and_allocate() function where SHF_ALLOC flag is
dropped for .data..percpu section. There is in fact no need to copy the
section to final struct module, because kernel module loader allocates
extra percpu section by itself. Unfortunately only symbols from
SHF_ALLOC sections are copied due to a check in is_core_symbol().
The patch changes is_core_symbol() function to copy over also percpu
symbols (their st_shndx points to .data..percpu ELF section). We do it
only if CONFIG_KALLSYMS_ALL is set to be consistent with the rest of the
function (ELF section is SHF_ALLOC but !SHF_EXECINSTR). Finally
elf_type() returns type 'a' for a percpu symbol because its address is
absolute.
Signed-off-by: Miroslav Benes <mbenes(a)suse.cz>
Signed-off-by: Rusty Russell <rusty(a)rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/module.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2404,7 +2404,7 @@ static char elf_type(const Elf_Sym *sym,
}
if (sym->st_shndx == SHN_UNDEF)
return 'U';
- if (sym->st_shndx == SHN_ABS)
+ if (sym->st_shndx == SHN_ABS || sym->st_shndx == info->index.pcpu)
return 'a';
if (sym->st_shndx >= SHN_LORESERVE)
return '?';
@@ -2433,7 +2433,7 @@ static char elf_type(const Elf_Sym *sym,
}
static bool is_core_symbol(const Elf_Sym *src, const Elf_Shdr *sechdrs,
- unsigned int shnum)
+ unsigned int shnum, unsigned int pcpundx)
{
const Elf_Shdr *sec;
@@ -2442,6 +2442,11 @@ static bool is_core_symbol(const Elf_Sym
|| !src->st_name)
return false;
+#ifdef CONFIG_KALLSYMS_ALL
+ if (src->st_shndx == pcpundx)
+ return true;
+#endif
+
sec = sechdrs + src->st_shndx;
if (!(sec->sh_flags & SHF_ALLOC)
#ifndef CONFIG_KALLSYMS_ALL
@@ -2479,7 +2484,8 @@ static void layout_symtab(struct module
/* Compute total space required for the core symbols' strtab. */
for (ndst = i = 0; i < nsrc; i++) {
if (i == 0 ||
- is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum)) {
+ is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum,
+ info->index.pcpu)) {
strtab_size += strlen(&info->strtab[src[i].st_name])+1;
ndst++;
}
@@ -2537,7 +2543,8 @@ static void add_kallsyms(struct module *
src = mod->kallsyms->symtab;
for (ndst = i = 0; i < mod->kallsyms->num_symtab; i++) {
if (i == 0 ||
- is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum)) {
+ is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum,
+ info->index.pcpu)) {
dst[ndst] = src[i];
dst[ndst++].st_name = s - mod->core_kallsyms.strtab;
s += strlcpy(s, &mod->kallsyms->strtab[src[i].st_name],
Patches currently in stable-queue which might be from mbenes(a)suse.cz are
queue-4.4/module-keep-percpu-symbols-in-module-s-symtab.patch
This is a note to let you know that I've just added the patch titled
Map the vsyscall page with _PAGE_USER
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
map-the-vsyscall-page-with-_page_user.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From: Borislav Petkov <bp(a)suse.de>
Date: Thu, 4 Jan 2018 17:42:45 +0100
Subject: Map the vsyscall page with _PAGE_USER
From: Borislav Petkov <bp(a)suse.de>
This needs to happen early in kaiser_pagetable_walk(), before the
hierarchy is established so that _PAGE_USER permission can be really
set.
A proper fix would be to teach kaiser_pagetable_walk() to update those
permissions but the vsyscall page is the only exception here so ...
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/vsyscall/vsyscall_64.c | 5 +++++
arch/x86/include/asm/vsyscall.h | 2 ++
arch/x86/mm/kaiser.c | 34 ++++++++++++++++++++++++++++++----
3 files changed, 37 insertions(+), 4 deletions(-)
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -66,6 +66,11 @@ static int __init vsyscall_setup(char *s
}
early_param("vsyscall", vsyscall_setup);
+bool vsyscall_enabled(void)
+{
+ return vsyscall_mode != NONE;
+}
+
static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
const char *message)
{
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -12,12 +12,14 @@ extern void map_vsyscall(void);
* Returns true if handled.
*/
extern bool emulate_vsyscall(struct pt_regs *regs, unsigned long address);
+extern bool vsyscall_enabled(void);
#else
static inline void map_vsyscall(void) {}
static inline bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
{
return false;
}
+bool vsyscall_enabled(void) { return false; }
#endif
#endif /* _ASM_X86_VSYSCALL_H */
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -20,6 +20,7 @@
#include <asm/pgalloc.h>
#include <asm/desc.h>
#include <asm/cmdline.h>
+#include <asm/vsyscall.h>
int kaiser_enabled __read_mostly = 1;
EXPORT_SYMBOL(kaiser_enabled); /* for inlined TLB flush functions */
@@ -111,12 +112,13 @@ static inline unsigned long get_pa_from_
*
* Returns a pointer to a PTE on success, or NULL on failure.
*/
-static pte_t *kaiser_pagetable_walk(unsigned long address)
+static pte_t *kaiser_pagetable_walk(unsigned long address, bool user)
{
pmd_t *pmd;
pud_t *pud;
pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(address));
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+ unsigned long prot = _KERNPG_TABLE;
if (pgd_none(*pgd)) {
WARN_ONCE(1, "All shadow pgds should have been populated");
@@ -124,6 +126,17 @@ static pte_t *kaiser_pagetable_walk(unsi
}
BUILD_BUG_ON(pgd_large(*pgd) != 0);
+ if (user) {
+ /*
+ * The vsyscall page is the only page that will have
+ * _PAGE_USER set. Catch everything else.
+ */
+ BUG_ON(address != VSYSCALL_ADDR);
+
+ set_pgd(pgd, __pgd(pgd_val(*pgd) | _PAGE_USER));
+ prot = _PAGE_TABLE;
+ }
+
pud = pud_offset(pgd, address);
/* The shadow page tables do not use large mappings: */
if (pud_large(*pud)) {
@@ -136,7 +149,7 @@ static pte_t *kaiser_pagetable_walk(unsi
return NULL;
spin_lock(&shadow_table_allocation_lock);
if (pud_none(*pud)) {
- set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
+ set_pud(pud, __pud(prot | __pa(new_pmd_page)));
__inc_zone_page_state(virt_to_page((void *)
new_pmd_page), NR_KAISERTABLE);
} else
@@ -156,7 +169,7 @@ static pte_t *kaiser_pagetable_walk(unsi
return NULL;
spin_lock(&shadow_table_allocation_lock);
if (pmd_none(*pmd)) {
- set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
+ set_pmd(pmd, __pmd(prot | __pa(new_pte_page)));
__inc_zone_page_state(virt_to_page((void *)
new_pte_page), NR_KAISERTABLE);
} else
@@ -192,7 +205,7 @@ static int kaiser_add_user_map(const voi
ret = -EIO;
break;
}
- pte = kaiser_pagetable_walk(address);
+ pte = kaiser_pagetable_walk(address, flags & _PAGE_USER);
if (!pte) {
ret = -ENOMEM;
break;
@@ -319,6 +332,19 @@ void __init kaiser_init(void)
kaiser_init_all_pgds();
+ /*
+ * Note that this sets _PAGE_USER and it needs to happen when the
+ * pagetable hierarchy gets created, i.e., early. Otherwise
+ * kaiser_pagetable_walk() will encounter initialized PTEs in the
+ * hierarchy and not set the proper permissions, leading to the
+ * pagefaults with page-protection violations when trying to read the
+ * vsyscall page. For example.
+ */
+ if (vsyscall_enabled())
+ kaiser_add_user_map_early((void *)VSYSCALL_ADDR,
+ PAGE_SIZE,
+ __PAGE_KERNEL_VSYSCALL);
+
for_each_possible_cpu(cpu) {
void *percpu_vaddr = __per_cpu_user_mapped_start +
per_cpu_offset(cpu);
Patches currently in stable-queue which might be from bp(a)suse.de are
queue-4.4/map-the-vsyscall-page-with-_page_user.patch
queue-4.4/x86-kasan-write-protect-kasan-zero-shadow.patch
This is a note to let you know that I've just added the patch titled
genksyms: Handle string literals with spaces in reference files
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
genksyms-handle-string-literals-with-spaces-in-reference-files.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a78f70e8d65e88b9f631d073f68cb26dcd746298 Mon Sep 17 00:00:00 2001
From: Michal Marek <mmarek(a)suse.com>
Date: Wed, 9 Dec 2015 15:08:21 +0100
Subject: genksyms: Handle string literals with spaces in reference files
From: Michal Marek <mmarek(a)suse.com>
commit a78f70e8d65e88b9f631d073f68cb26dcd746298 upstream.
The reference files use spaces to separate tokens, however, we must
preserve spaces inside string literals. Currently the only case in the
tree is struct edac_raw_error_desc in <linux/edac.h>:
$ KBUILD_SYMTYPES=1 make -s drivers/edac/amd64_edac.symtypes
$ mv drivers/edac/amd64_edac.{symtypes,symref}
$ KBUILD_SYMTYPES=1 make -s drivers/edac/amd64_edac.symtypes
drivers/edac/amd64_edac.c:527: warning: amd64_get_dram_hole_info: modversion changed because of changes in struct edac_raw_error_desc
Signed-off-by: Michal Marek <mmarek(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
scripts/genksyms/genksyms.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/scripts/genksyms/genksyms.c
+++ b/scripts/genksyms/genksyms.c
@@ -423,13 +423,15 @@ static struct string_list *read_node(FIL
struct string_list node = {
.string = buffer,
.tag = SYM_NORMAL };
- int c;
+ int c, in_string = 0;
while ((c = fgetc(f)) != EOF) {
- if (c == ' ') {
+ if (!in_string && c == ' ') {
if (node.string == buffer)
continue;
break;
+ } else if (c == '"') {
+ in_string = !in_string;
} else if (c == '\n') {
if (node.string == buffer)
return NULL;
Patches currently in stable-queue which might be from mmarek(a)suse.com are
queue-4.4/genksyms-handle-string-literals-with-spaces-in-reference-files.patch
The patch
spi: imx: do not access registers while clocks disabled
has been applied to the spi tree at
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
>From d593574aff0ab846136190b1729c151c736727ec Mon Sep 17 00:00:00 2001
From: Stefan Agner <stefan(a)agner.ch>
Date: Sun, 7 Jan 2018 15:05:49 +0100
Subject: [PATCH] spi: imx: do not access registers while clocks disabled
Since clocks are disabled except during message transfer clocks
are also disabled when spi_imx_remove gets called. Accessing
registers leads to a freeeze at least on a i.MX 6ULL. Enable
clocks before disabling accessing the MXC_CSPICTRL register.
Fixes: 9e556dcc55774 ("spi: spi-imx: only enable the clocks when we start to transfer a message")
Signed-off-by: Stefan Agner <stefan(a)agner.ch>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
drivers/spi/spi-imx.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 79ddefe4180d..40390d31a93b 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1668,12 +1668,23 @@ static int spi_imx_remove(struct platform_device *pdev)
{
struct spi_master *master = platform_get_drvdata(pdev);
struct spi_imx_data *spi_imx = spi_master_get_devdata(master);
+ int ret;
spi_bitbang_stop(&spi_imx->bitbang);
+ ret = clk_enable(spi_imx->clk_per);
+ if (ret)
+ return ret;
+
+ ret = clk_enable(spi_imx->clk_ipg);
+ if (ret) {
+ clk_disable(spi_imx->clk_per);
+ return ret;
+ }
+
writel(0, spi_imx->base + MXC_CSPICTRL);
- clk_unprepare(spi_imx->clk_ipg);
- clk_unprepare(spi_imx->clk_per);
+ clk_disable_unprepare(spi_imx->clk_ipg);
+ clk_disable_unprepare(spi_imx->clk_per);
spi_imx_sdma_exit(spi_imx);
spi_master_put(master);
--
2.15.1
On Fri, Jan 05, 2018 at 03:54:33PM +0100, Greg KH wrote:
> I'm announcing the release of the 4.4.110 kernel.
>
> All users of the 4.4 kernel series must upgrade.
>
> But be careful, there have been some reports of problems with this
> release during the -rc review cycle. Hopefully all of those issues are
> now resolved.
>
> So please test, as of right now, it should be "bug compatible" with the
> "enterprise" kernel releases with regards to the Meltdown bug and proper
> support on all virtual platforms (meaning there is still a vdso issue
> that might trip up some old binaries, again, please test!)
>
> If anyone has any problems, please let me know.
FWIW I've just booted one of our LBs on it and am hammering it at full
load with pti enabled and will let it run for the week-end. It takes
860k irq/s and about 1.7M syscalls/s. For now it works well (but slowly).
Hopefully if there are any rare race conditions left it has a chance to
trigger them.
Willy
This is a note to let you know that I've just added the patch titled
x86/tlb: Drop the _GPL from the cpu_tlbstate export
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-tlb-drop-the-_gpl-from-the-cpu_tlbstate-export.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1e5476815fd7f98b888e01a0f9522b63085f96c9 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Thu, 4 Jan 2018 22:19:04 +0100
Subject: x86/tlb: Drop the _GPL from the cpu_tlbstate export
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 1e5476815fd7f98b888e01a0f9522b63085f96c9 upstream.
The recent changes for PTI touch cpu_tlbstate from various tlb_flush
inlines. cpu_tlbstate is exported as GPL symbol, so this causes a
regression when building out of tree drivers for certain graphics cards.
Aside of that the export was wrong since it was introduced as it should
have been EXPORT_PER_CPU_SYMBOL_GPL().
Use the correct PER_CPU export and drop the _GPL to restore the previous
state which allows users to utilize the cards they payed for.
As always I'm really thrilled to make this kind of change to support the
#friends (or however the hot hashtag of today is spelled) from that closet
sauce graphics corp.
Fixes: 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4")
Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
Reported-by: Kees Cook <keescook(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Thomas Backlund <tmb(a)mageia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -768,7 +768,7 @@ DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb
.state = 0,
.cr4 = ~0UL, /* fail hard if we screw up cr4 shadow initialization */
};
-EXPORT_SYMBOL_GPL(cpu_tlbstate);
+EXPORT_PER_CPU_SYMBOL(cpu_tlbstate);
void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache)
{
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.9/x86-tlb-drop-the-_gpl-from-the-cpu_tlbstate-export.patch
queue-4.9/x86-microcode-amd-add-support-for-fam17h-microcode-loading.patch
This is a note to let you know that I've just added the patch titled
x86/tlb: Drop the _GPL from the cpu_tlbstate export
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-tlb-drop-the-_gpl-from-the-cpu_tlbstate-export.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1e5476815fd7f98b888e01a0f9522b63085f96c9 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Thu, 4 Jan 2018 22:19:04 +0100
Subject: x86/tlb: Drop the _GPL from the cpu_tlbstate export
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 1e5476815fd7f98b888e01a0f9522b63085f96c9 upstream.
The recent changes for PTI touch cpu_tlbstate from various tlb_flush
inlines. cpu_tlbstate is exported as GPL symbol, so this causes a
regression when building out of tree drivers for certain graphics cards.
Aside of that the export was wrong since it was introduced as it should
have been EXPORT_PER_CPU_SYMBOL_GPL().
Use the correct PER_CPU export and drop the _GPL to restore the previous
state which allows users to utilize the cards they payed for.
As always I'm really thrilled to make this kind of change to support the
#friends (or however the hot hashtag of today is spelled) from that closet
sauce graphics corp.
Fixes: 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4")
Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
Reported-by: Kees Cook <keescook(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Thomas Backlund <tmb(a)mageia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -757,7 +757,7 @@ DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb
.state = 0,
.cr4 = ~0UL, /* fail hard if we screw up cr4 shadow initialization */
};
-EXPORT_SYMBOL_GPL(cpu_tlbstate);
+EXPORT_PER_CPU_SYMBOL(cpu_tlbstate);
void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache)
{
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.4/x86-tlb-drop-the-_gpl-from-the-cpu_tlbstate-export.patch
queue-4.4/x86-kasan-write-protect-kasan-zero-shadow.patch
queue-4.4/x86-microcode-amd-add-support-for-fam17h-microcode-loading.patch
Hi Greg, if possible I would like to ask for commit
4307413256ac1e09b8f53e8715af3df9e49beec3
(https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial.git/commit…
— "USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ") to be
included in the next stable kernel(s).
It's a one line change that would save me a few headaches to not have
to build my own kernel just to get the driver loaded for this device
(for which I'm writing a tool) and will let users of the meter use the
tool sooner rather than later in the future.
Thanks!
--
Diego Elio Pettenò — Flameeyes
https://blog.flameeyes.eu/
This is a note to let you know that I've just added the patch titled
x86/tlb: Drop the _GPL from the cpu_tlbstate export
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-tlb-drop-the-_gpl-from-the-cpu_tlbstate-export.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1e5476815fd7f98b888e01a0f9522b63085f96c9 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Thu, 4 Jan 2018 22:19:04 +0100
Subject: x86/tlb: Drop the _GPL from the cpu_tlbstate export
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 1e5476815fd7f98b888e01a0f9522b63085f96c9 upstream.
The recent changes for PTI touch cpu_tlbstate from various tlb_flush
inlines. cpu_tlbstate is exported as GPL symbol, so this causes a
regression when building out of tree drivers for certain graphics cards.
Aside of that the export was wrong since it was introduced as it should
have been EXPORT_PER_CPU_SYMBOL_GPL().
Use the correct PER_CPU export and drop the _GPL to restore the previous
state which allows users to utilize the cards they payed for.
As always I'm really thrilled to make this kind of change to support the
#friends (or however the hot hashtag of today is spelled) from that closet
sauce graphics corp.
Fixes: 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4")
Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
Reported-by: Kees Cook <keescook(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -870,7 +870,7 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(
.next_asid = 1,
.cr4 = ~0UL, /* fail hard if we screw up cr4 shadow initialization */
};
-EXPORT_SYMBOL_GPL(cpu_tlbstate);
+EXPORT_PER_CPU_SYMBOL(cpu_tlbstate);
void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache)
{
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-tlb-drop-the-_gpl-from-the-cpu_tlbstate-export.patch
queue-4.14/x86-alternatives-add-missing-n-at-end-of-alternative-inline-asm.patch
queue-4.14/x86-kaslr-fix-the-vaddr_end-mess.patch
queue-4.14/efi-capsule-loader-reinstate-virtual-capsule-mapping.patch
queue-4.14/x86-mm-set-modules_end-to-0xffffffffff000000.patch
queue-4.14/mm-sparse.c-wrong-allocation-for-mem_section.patch
queue-4.14/x86-events-intel-ds-use-the-proper-cache-flush-method-for-mapping-ds-buffers.patch
queue-4.14/x86-pti-rename-bug_cpu_insecure-to-bug_cpu_meltdown.patch
queue-4.14/x86-mm-map-cpu_entry_area-at-the-same-place-on-4-5-level.patch
On 01/07/2018 03:35 PM, Ben Hutchings wrote:
> I sent this out for review on the stable list after quite minimal
> testing, but have done more since then. On bare metal (Sandy Bridge,
> with pcid but not invpcid) it crashes at boot. In fact it
> reboots without any panic message, suggesting a triple fault, as soon
> as I apply the patch that turns on CR4.PCIDE, i.e. without KPTI itself.
My first guess would be something around this stuff:
> commit c7ad5ad297e644601747d6dbee978bf85e14f7bc
> Author: Andy Lutomirski <luto(a)kernel.org>
> Date: Sun Sep 10 17:48:27 2017 -0700
>
> x86/mm/64: Initialize CR4.PCIDE early
But, if you also want to toss a set of binaries up somewhere that I can
test I can give them a quick run in the simulator or with a hardware
debugger attached. It's been very useful in getting these things
debugged, especially when normal debugging techniques fail.
Tree/Branch: v3.2.98
Git describe: v3.2.98
Commit: 05580ec65b Linux 3.2.98
Build Time: 0 min 4 sec
Passed: 0 / 4 ( 0.00 %)
Failed: 4 / 4 (100.00 %)
Errors: 6
Warnings: 20
Section Mismatches: 0
Failed defconfigs:
x86_64-allnoconfig
arm-allmodconfig
arm-allnoconfig
x86_64-defconfig
Errors:
x86_64-allnoconfig
/home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
/home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
arm-allnoconfig
/home/broonie/build/linux-stable/arch/arm/include/asm/div64.h:77:7: error: '__LINUX_ARM_ARCH__' undeclared (first use in this function)
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-cache.h:129:2: error: #error Unknown cache maintenance model
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-df.h:107:2: error: #error Unknown data abort handler type
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-pf.h:54:2: error: #error Unknown prefetch abort handler type
x86_64-defconfig
/home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
/home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
2 warnings 0 mismatches : arm-allmodconfig
19 warnings 0 mismatches : arm-allnoconfig
-------------------------------------------------------------------------------
Errors summary: 6
2 /home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
2 /home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
1 /home/broonie/build/linux-stable/arch/arm/include/asm/glue-pf.h:54:2: error: #error Unknown prefetch abort handler type
1 /home/broonie/build/linux-stable/arch/arm/include/asm/glue-df.h:107:2: error: #error Unknown data abort handler type
1 /home/broonie/build/linux-stable/arch/arm/include/asm/glue-cache.h:129:2: error: #error Unknown cache maintenance model
1 /home/broonie/build/linux-stable/arch/arm/include/asm/div64.h:77:7: error: '__LINUX_ARM_ARCH__' undeclared (first use in this function)
Warnings Summary: 20
2 .config:27:warning: symbol value '' invalid for PHYS_OFFSET
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:342:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:272:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:265:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:131:35: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:127:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:121:3: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:120:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:114:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/swab.h:25:28: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:82:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:102:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/irqflags.h:11:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/fpstate.h:32:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:33:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:28:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:196:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:194:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/bitops.h:217:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/atomic.h:30:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
===============================================================================
Detailed per-defconfig build reports below:
-------------------------------------------------------------------------------
x86_64-allnoconfig : FAIL, 2 errors, 0 warnings, 0 section mismatches
Errors:
/home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
/home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
-------------------------------------------------------------------------------
arm-allmodconfig : FAIL, 0 errors, 2 warnings, 0 section mismatches
Warnings:
.config:27:warning: symbol value '' invalid for PHYS_OFFSET
.config:27:warning: symbol value '' invalid for PHYS_OFFSET
-------------------------------------------------------------------------------
arm-allnoconfig : FAIL, 4 errors, 19 warnings, 0 section mismatches
Errors:
/home/broonie/build/linux-stable/arch/arm/include/asm/div64.h:77:7: error: '__LINUX_ARM_ARCH__' undeclared (first use in this function)
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-cache.h:129:2: error: #error Unknown cache maintenance model
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-df.h:107:2: error: #error Unknown data abort handler type
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-pf.h:54:2: error: #error Unknown prefetch abort handler type
Warnings:
/home/broonie/build/linux-stable/arch/arm/include/asm/irqflags.h:11:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:114:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:120:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:121:3: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:127:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:131:35: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:265:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:272:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:342:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/bitops.h:217:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/swab.h:25:28: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/fpstate.h:32:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:82:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:102:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/atomic.h:30:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:28:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:33:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:194:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:196:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
-------------------------------------------------------------------------------
x86_64-defconfig : FAIL, 2 errors, 0 warnings, 0 section mismatches
Errors:
/home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
/home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
-------------------------------------------------------------------------------
Passed with no errors, warnings or mismatches:
On Sun, Jan 7, 2018 at 3:35 PM, Ben Hutchings <ben(a)decadent.org.uk> wrote:
> I have a backport of KPTI/KAISER to 3.16, based on Hugh Dickins's work
> for 3.18, some upstream changes between 3.16 and 3.18, and other
> patches that went into 4.4.75.
>
> I sent this out for review on the stable list after quite minimal
> testing, but have done more since then. On bare metal (Sandy Bridge,
> with pcid but not invpcid) it crashes at boot. In fact it
> reboots without any panic message, suggesting a triple fault, as soon
> as I apply the patch that turns on CR4.PCIDE, i.e. without KPTI itself.
>
> Using the 'nopcid' kernel parameter gets it to boot but it's somewhat
> unstable even after that - once I start another kernel build I see
> programs segfaulting. So I'm guessing I've screwed up some of the TLB
> stuff.
>
> I'm going to continue investigating this myself before making a
> release, but would really appreciate any time people can spare to
> review this patch series.
>
> (I haven't found any such regression in 3.2.98.)
>
> Ben.
>
> --
> Ben Hutchings
> friends: People who know you well, but like you anyway.
>
> --
> You received this message because you are subscribed to the Google Groups "kaiser-discuss-external" group.
> To view this discussion on the web visit https://groups.google.com/a/google.com/d/msgid/kaiser-discuss-external/1515….
Hi Ben, I'm not on that "stable list", and don't see where it is
archived, so this is the first I've heard of your 3.16 port (which
does sound from your description like you've approached it in exactly
the right way).
Could you please attach a tarfile of git-format-patches since 3.16.52
that we could look through - that will be easier to manage than
downloading 40 mails or whatever - thanks.
Hugh
This is a note to let you know that I've just added the patch titled
parisc: qemu idle sleep support
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
parisc-qemu-idle-sleep-support.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 310d82784fb4d60c80569f5ca9f53a7f3bf1d477 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller(a)gmx.de>
Date: Fri, 5 Jan 2018 21:55:38 +0100
Subject: parisc: qemu idle sleep support
From: Helge Deller <deller(a)gmx.de>
commit 310d82784fb4d60c80569f5ca9f53a7f3bf1d477 upstream.
Add qemu idle sleep support when running under qemu with SeaBIOS PDC
firmware.
Like the power architecture we use the "or" assembler instructions,
which translate to nops on real hardware, to indicate that qemu shall
idle sleep.
Signed-off-by: Helge Deller <deller(a)gmx.de>
Cc: Richard Henderson <rth(a)twiddle.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/parisc/kernel/process.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -39,6 +39,7 @@
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/fs.h>
+#include <linux/cpu.h>
#include <linux/module.h>
#include <linux/personality.h>
#include <linux/ptrace.h>
@@ -181,6 +182,44 @@ int dump_task_fpu (struct task_struct *t
}
/*
+ * Idle thread support
+ *
+ * Detect when running on QEMU with SeaBIOS PDC Firmware and let
+ * QEMU idle the host too.
+ */
+
+int running_on_qemu __read_mostly;
+
+void __cpuidle arch_cpu_idle_dead(void)
+{
+ /* nop on real hardware, qemu will offline CPU. */
+ asm volatile("or %%r31,%%r31,%%r31\n":::);
+}
+
+void __cpuidle arch_cpu_idle(void)
+{
+ local_irq_enable();
+
+ /* nop on real hardware, qemu will idle sleep. */
+ asm volatile("or %%r10,%%r10,%%r10\n":::);
+}
+
+static int __init parisc_idle_init(void)
+{
+ const char *marker;
+
+ /* check QEMU/SeaBIOS marker in PAGE0 */
+ marker = (char *) &PAGE0->pad0;
+ running_on_qemu = (memcmp(marker, "SeaBIOS", 8) == 0);
+
+ if (!running_on_qemu)
+ cpu_idle_poll_ctrl(1);
+
+ return 0;
+}
+arch_initcall(parisc_idle_init);
+
+/*
* Copy architecture-specific thread state
*/
int
Patches currently in stable-queue which might be from deller(a)gmx.de are
queue-4.9/parisc-fix-alignment-of-pa_tlb_lock-in-assembly-on-32-bit-smp-kernel.patch
queue-4.9/parisc-qemu-idle-sleep-support.patch
This is a note to let you know that I've just added the patch titled
parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
parisc-fix-alignment-of-pa_tlb_lock-in-assembly-on-32-bit-smp-kernel.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 88776c0e70be0290f8357019d844aae15edaa967 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller(a)gmx.de>
Date: Tue, 2 Jan 2018 20:36:44 +0100
Subject: parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
From: Helge Deller <deller(a)gmx.de>
commit 88776c0e70be0290f8357019d844aae15edaa967 upstream.
Qemu for PARISC reported on a 32bit SMP parisc kernel strange failures
about "Not-handled unaligned insn 0x0e8011d6 and 0x0c2011c9."
Those opcodes evaluate to the ldcw() assembly instruction which requires
(on 32bit) an alignment of 16 bytes to ensure atomicity.
As it turns out, qemu is correct and in our assembly code in entry.S and
pacache.S we don't pay attention to the required alignment.
This patch fixes the problem by aligning the lock offset in assembly
code in the same manner as we do in our C-code.
Signed-off-by: Helge Deller <deller(a)gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/parisc/include/asm/ldcw.h | 2 ++
arch/parisc/kernel/entry.S | 13 +++++++++++--
arch/parisc/kernel/pacache.S | 9 +++++++--
3 files changed, 20 insertions(+), 4 deletions(-)
--- a/arch/parisc/include/asm/ldcw.h
+++ b/arch/parisc/include/asm/ldcw.h
@@ -11,6 +11,7 @@
for the semaphore. */
#define __PA_LDCW_ALIGNMENT 16
+#define __PA_LDCW_ALIGN_ORDER 4
#define __ldcw_align(a) ({ \
unsigned long __ret = (unsigned long) &(a)->lock[0]; \
__ret = (__ret + __PA_LDCW_ALIGNMENT - 1) \
@@ -28,6 +29,7 @@
ldcd). */
#define __PA_LDCW_ALIGNMENT 4
+#define __PA_LDCW_ALIGN_ORDER 2
#define __ldcw_align(a) (&(a)->slock)
#define __LDCW "ldcw,co"
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -35,6 +35,7 @@
#include <asm/pgtable.h>
#include <asm/signal.h>
#include <asm/unistd.h>
+#include <asm/ldcw.h>
#include <asm/thread_info.h>
#include <linux/linkage.h>
@@ -46,6 +47,14 @@
#endif
.import pa_tlb_lock,data
+ .macro load_pa_tlb_lock reg
+#if __PA_LDCW_ALIGNMENT > 4
+ load32 PA(pa_tlb_lock) + __PA_LDCW_ALIGNMENT-1, \reg
+ depi 0,31,__PA_LDCW_ALIGN_ORDER, \reg
+#else
+ load32 PA(pa_tlb_lock), \reg
+#endif
+ .endm
/* space_to_prot macro creates a prot id from a space id */
@@ -457,7 +466,7 @@
.macro tlb_lock spc,ptp,pte,tmp,tmp1,fault
#ifdef CONFIG_SMP
cmpib,COND(=),n 0,\spc,2f
- load32 PA(pa_tlb_lock),\tmp
+ load_pa_tlb_lock \tmp
1: LDCW 0(\tmp),\tmp1
cmpib,COND(=) 0,\tmp1,1b
nop
@@ -480,7 +489,7 @@
/* Release pa_tlb_lock lock. */
.macro tlb_unlock1 spc,tmp
#ifdef CONFIG_SMP
- load32 PA(pa_tlb_lock),\tmp
+ load_pa_tlb_lock \tmp
tlb_unlock0 \spc,\tmp
#endif
.endm
--- a/arch/parisc/kernel/pacache.S
+++ b/arch/parisc/kernel/pacache.S
@@ -36,6 +36,7 @@
#include <asm/assembly.h>
#include <asm/pgtable.h>
#include <asm/cache.h>
+#include <asm/ldcw.h>
#include <linux/linkage.h>
.text
@@ -333,8 +334,12 @@ ENDPROC_CFI(flush_data_cache_local)
.macro tlb_lock la,flags,tmp
#ifdef CONFIG_SMP
- ldil L%pa_tlb_lock,%r1
- ldo R%pa_tlb_lock(%r1),\la
+#if __PA_LDCW_ALIGNMENT > 4
+ load32 pa_tlb_lock + __PA_LDCW_ALIGNMENT-1, \la
+ depi 0,31,__PA_LDCW_ALIGN_ORDER, \la
+#else
+ load32 pa_tlb_lock, \la
+#endif
rsm PSW_SM_I,\flags
1: LDCW 0(\la),\tmp
cmpib,<>,n 0,\tmp,3f
Patches currently in stable-queue which might be from deller(a)gmx.de are
queue-4.9/parisc-fix-alignment-of-pa_tlb_lock-in-assembly-on-32-bit-smp-kernel.patch
queue-4.9/parisc-qemu-idle-sleep-support.patch
This is a note to let you know that I've just added the patch titled
mtd: nand: pxa3xx: Fix READOOB implementation
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mtd-nand-pxa3xx-fix-readoob-implementation.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From fee4380f368e84ed216b62ccd2fbc4126f2bf40b Mon Sep 17 00:00:00 2001
From: Boris Brezillon <boris.brezillon(a)free-electrons.com>
Date: Mon, 18 Dec 2017 11:32:45 +0100
Subject: mtd: nand: pxa3xx: Fix READOOB implementation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Boris Brezillon <boris.brezillon(a)free-electrons.com>
commit fee4380f368e84ed216b62ccd2fbc4126f2bf40b upstream.
In the current driver, OOB bytes are accessed in raw mode, and when a
page access is done with NDCR_SPARE_EN set and NDCR_ECC_EN cleared, the
driver must read the whole spare area (64 bytes in case of a 2k page,
16 bytes for a 512 page). The driver was only reading the free OOB
bytes, which was leaving some unread data in the FIFO and was somehow
leading to a timeout.
We could patch the driver to read ->spare_size + ->ecc_size instead of
just ->spare_size when READOOB is requested, but we'd better make
in-band and OOB accesses consistent.
Since the driver is always accessing in-band data in non-raw mode (with
the ECC engine enabled), we should also access OOB data in this mode.
That's particularly useful when using the BCH engine because in this
mode the free OOB bytes are also ECC protected.
Fixes: 43bcfd2bb24a ("mtd: nand: pxa3xx: Add driver-specific ECC BCH support")
Reported-by: Sean Nyekjær <sean.nyekjaer(a)prevas.dk>
Tested-by: Willy Tarreau <w(a)1wt.eu>
Signed-off-by: Boris Brezillon <boris.brezillon(a)free-electrons.com>
Acked-by: Ezequiel Garcia <ezequiel(a)vanguardiasur.com.ar>
Tested-by: Sean Nyekjaer <sean.nyekjaer(a)prevas.dk>
Acked-by: Robert Jarzmik <robert.jarzmik(a)free.fr>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/mtd/nand/pxa3xx_nand.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/mtd/nand/pxa3xx_nand.c
+++ b/drivers/mtd/nand/pxa3xx_nand.c
@@ -950,6 +950,7 @@ static void prepare_start_command(struct
switch (command) {
case NAND_CMD_READ0:
+ case NAND_CMD_READOOB:
case NAND_CMD_PAGEPROG:
info->use_ecc = 1;
break;
Patches currently in stable-queue which might be from boris.brezillon(a)free-electrons.com are
queue-4.9/mtd-nand-pxa3xx-fix-readoob-implementation.patch
This is a note to let you know that I've just added the patch titled
parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
parisc-fix-alignment-of-pa_tlb_lock-in-assembly-on-32-bit-smp-kernel.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 88776c0e70be0290f8357019d844aae15edaa967 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller(a)gmx.de>
Date: Tue, 2 Jan 2018 20:36:44 +0100
Subject: parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
From: Helge Deller <deller(a)gmx.de>
commit 88776c0e70be0290f8357019d844aae15edaa967 upstream.
Qemu for PARISC reported on a 32bit SMP parisc kernel strange failures
about "Not-handled unaligned insn 0x0e8011d6 and 0x0c2011c9."
Those opcodes evaluate to the ldcw() assembly instruction which requires
(on 32bit) an alignment of 16 bytes to ensure atomicity.
As it turns out, qemu is correct and in our assembly code in entry.S and
pacache.S we don't pay attention to the required alignment.
This patch fixes the problem by aligning the lock offset in assembly
code in the same manner as we do in our C-code.
Signed-off-by: Helge Deller <deller(a)gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/parisc/include/asm/ldcw.h | 2 ++
arch/parisc/kernel/entry.S | 13 +++++++++++--
arch/parisc/kernel/pacache.S | 9 +++++++--
3 files changed, 20 insertions(+), 4 deletions(-)
--- a/arch/parisc/include/asm/ldcw.h
+++ b/arch/parisc/include/asm/ldcw.h
@@ -11,6 +11,7 @@
for the semaphore. */
#define __PA_LDCW_ALIGNMENT 16
+#define __PA_LDCW_ALIGN_ORDER 4
#define __ldcw_align(a) ({ \
unsigned long __ret = (unsigned long) &(a)->lock[0]; \
__ret = (__ret + __PA_LDCW_ALIGNMENT - 1) \
@@ -28,6 +29,7 @@
ldcd). */
#define __PA_LDCW_ALIGNMENT 4
+#define __PA_LDCW_ALIGN_ORDER 2
#define __ldcw_align(a) (&(a)->slock)
#define __LDCW "ldcw,co"
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -35,6 +35,7 @@
#include <asm/pgtable.h>
#include <asm/signal.h>
#include <asm/unistd.h>
+#include <asm/ldcw.h>
#include <asm/thread_info.h>
#include <linux/linkage.h>
@@ -46,6 +47,14 @@
#endif
.import pa_tlb_lock,data
+ .macro load_pa_tlb_lock reg
+#if __PA_LDCW_ALIGNMENT > 4
+ load32 PA(pa_tlb_lock) + __PA_LDCW_ALIGNMENT-1, \reg
+ depi 0,31,__PA_LDCW_ALIGN_ORDER, \reg
+#else
+ load32 PA(pa_tlb_lock), \reg
+#endif
+ .endm
/* space_to_prot macro creates a prot id from a space id */
@@ -457,7 +466,7 @@
.macro tlb_lock spc,ptp,pte,tmp,tmp1,fault
#ifdef CONFIG_SMP
cmpib,COND(=),n 0,\spc,2f
- load32 PA(pa_tlb_lock),\tmp
+ load_pa_tlb_lock \tmp
1: LDCW 0(\tmp),\tmp1
cmpib,COND(=) 0,\tmp1,1b
nop
@@ -480,7 +489,7 @@
/* Release pa_tlb_lock lock. */
.macro tlb_unlock1 spc,tmp
#ifdef CONFIG_SMP
- load32 PA(pa_tlb_lock),\tmp
+ load_pa_tlb_lock \tmp
tlb_unlock0 \spc,\tmp
#endif
.endm
--- a/arch/parisc/kernel/pacache.S
+++ b/arch/parisc/kernel/pacache.S
@@ -36,6 +36,7 @@
#include <asm/assembly.h>
#include <asm/pgtable.h>
#include <asm/cache.h>
+#include <asm/ldcw.h>
#include <linux/linkage.h>
.text
@@ -333,8 +334,12 @@ ENDPROC(flush_data_cache_local)
.macro tlb_lock la,flags,tmp
#ifdef CONFIG_SMP
- ldil L%pa_tlb_lock,%r1
- ldo R%pa_tlb_lock(%r1),\la
+#if __PA_LDCW_ALIGNMENT > 4
+ load32 pa_tlb_lock + __PA_LDCW_ALIGNMENT-1, \la
+ depi 0,31,__PA_LDCW_ALIGN_ORDER, \la
+#else
+ load32 pa_tlb_lock, \la
+#endif
rsm PSW_SM_I,\flags
1: LDCW 0(\la),\tmp
cmpib,<>,n 0,\tmp,3f
Patches currently in stable-queue which might be from deller(a)gmx.de are
queue-4.4/parisc-fix-alignment-of-pa_tlb_lock-in-assembly-on-32-bit-smp-kernel.patch
This is a note to let you know that I've just added the patch titled
mtd: nand: pxa3xx: Fix READOOB implementation
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mtd-nand-pxa3xx-fix-readoob-implementation.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From fee4380f368e84ed216b62ccd2fbc4126f2bf40b Mon Sep 17 00:00:00 2001
From: Boris Brezillon <boris.brezillon(a)free-electrons.com>
Date: Mon, 18 Dec 2017 11:32:45 +0100
Subject: mtd: nand: pxa3xx: Fix READOOB implementation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Boris Brezillon <boris.brezillon(a)free-electrons.com>
commit fee4380f368e84ed216b62ccd2fbc4126f2bf40b upstream.
In the current driver, OOB bytes are accessed in raw mode, and when a
page access is done with NDCR_SPARE_EN set and NDCR_ECC_EN cleared, the
driver must read the whole spare area (64 bytes in case of a 2k page,
16 bytes for a 512 page). The driver was only reading the free OOB
bytes, which was leaving some unread data in the FIFO and was somehow
leading to a timeout.
We could patch the driver to read ->spare_size + ->ecc_size instead of
just ->spare_size when READOOB is requested, but we'd better make
in-band and OOB accesses consistent.
Since the driver is always accessing in-band data in non-raw mode (with
the ECC engine enabled), we should also access OOB data in this mode.
That's particularly useful when using the BCH engine because in this
mode the free OOB bytes are also ECC protected.
Fixes: 43bcfd2bb24a ("mtd: nand: pxa3xx: Add driver-specific ECC BCH support")
Reported-by: Sean Nyekjær <sean.nyekjaer(a)prevas.dk>
Tested-by: Willy Tarreau <w(a)1wt.eu>
Signed-off-by: Boris Brezillon <boris.brezillon(a)free-electrons.com>
Acked-by: Ezequiel Garcia <ezequiel(a)vanguardiasur.com.ar>
Tested-by: Sean Nyekjaer <sean.nyekjaer(a)prevas.dk>
Acked-by: Robert Jarzmik <robert.jarzmik(a)free.fr>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/mtd/nand/pxa3xx_nand.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/mtd/nand/pxa3xx_nand.c
+++ b/drivers/mtd/nand/pxa3xx_nand.c
@@ -912,6 +912,7 @@ static void prepare_start_command(struct
switch (command) {
case NAND_CMD_READ0:
+ case NAND_CMD_READOOB:
case NAND_CMD_PAGEPROG:
info->use_ecc = 1;
case NAND_CMD_READOOB:
Patches currently in stable-queue which might be from boris.brezillon(a)free-electrons.com are
queue-4.4/mtd-nand-pxa3xx-fix-readoob-implementation.patch
This is a note to let you know that I've just added the patch titled
parisc: qemu idle sleep support
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
parisc-qemu-idle-sleep-support.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 310d82784fb4d60c80569f5ca9f53a7f3bf1d477 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller(a)gmx.de>
Date: Fri, 5 Jan 2018 21:55:38 +0100
Subject: parisc: qemu idle sleep support
From: Helge Deller <deller(a)gmx.de>
commit 310d82784fb4d60c80569f5ca9f53a7f3bf1d477 upstream.
Add qemu idle sleep support when running under qemu with SeaBIOS PDC
firmware.
Like the power architecture we use the "or" assembler instructions,
which translate to nops on real hardware, to indicate that qemu shall
idle sleep.
Signed-off-by: Helge Deller <deller(a)gmx.de>
Cc: Richard Henderson <rth(a)twiddle.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/parisc/kernel/process.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -39,6 +39,7 @@
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/fs.h>
+#include <linux/cpu.h>
#include <linux/module.h>
#include <linux/personality.h>
#include <linux/ptrace.h>
@@ -184,6 +185,44 @@ int dump_task_fpu (struct task_struct *t
}
/*
+ * Idle thread support
+ *
+ * Detect when running on QEMU with SeaBIOS PDC Firmware and let
+ * QEMU idle the host too.
+ */
+
+int running_on_qemu __read_mostly;
+
+void __cpuidle arch_cpu_idle_dead(void)
+{
+ /* nop on real hardware, qemu will offline CPU. */
+ asm volatile("or %%r31,%%r31,%%r31\n":::);
+}
+
+void __cpuidle arch_cpu_idle(void)
+{
+ local_irq_enable();
+
+ /* nop on real hardware, qemu will idle sleep. */
+ asm volatile("or %%r10,%%r10,%%r10\n":::);
+}
+
+static int __init parisc_idle_init(void)
+{
+ const char *marker;
+
+ /* check QEMU/SeaBIOS marker in PAGE0 */
+ marker = (char *) &PAGE0->pad0;
+ running_on_qemu = (memcmp(marker, "SeaBIOS", 8) == 0);
+
+ if (!running_on_qemu)
+ cpu_idle_poll_ctrl(1);
+
+ return 0;
+}
+arch_initcall(parisc_idle_init);
+
+/*
* Copy architecture-specific thread state
*/
int
Patches currently in stable-queue which might be from deller(a)gmx.de are
queue-4.14/parisc-fix-alignment-of-pa_tlb_lock-in-assembly-on-32-bit-smp-kernel.patch
queue-4.14/parisc-qemu-idle-sleep-support.patch
This is a note to let you know that I've just added the patch titled
mtd: nand: pxa3xx: Fix READOOB implementation
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mtd-nand-pxa3xx-fix-readoob-implementation.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From fee4380f368e84ed216b62ccd2fbc4126f2bf40b Mon Sep 17 00:00:00 2001
From: Boris Brezillon <boris.brezillon(a)free-electrons.com>
Date: Mon, 18 Dec 2017 11:32:45 +0100
Subject: mtd: nand: pxa3xx: Fix READOOB implementation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Boris Brezillon <boris.brezillon(a)free-electrons.com>
commit fee4380f368e84ed216b62ccd2fbc4126f2bf40b upstream.
In the current driver, OOB bytes are accessed in raw mode, and when a
page access is done with NDCR_SPARE_EN set and NDCR_ECC_EN cleared, the
driver must read the whole spare area (64 bytes in case of a 2k page,
16 bytes for a 512 page). The driver was only reading the free OOB
bytes, which was leaving some unread data in the FIFO and was somehow
leading to a timeout.
We could patch the driver to read ->spare_size + ->ecc_size instead of
just ->spare_size when READOOB is requested, but we'd better make
in-band and OOB accesses consistent.
Since the driver is always accessing in-band data in non-raw mode (with
the ECC engine enabled), we should also access OOB data in this mode.
That's particularly useful when using the BCH engine because in this
mode the free OOB bytes are also ECC protected.
Fixes: 43bcfd2bb24a ("mtd: nand: pxa3xx: Add driver-specific ECC BCH support")
Reported-by: Sean Nyekjær <sean.nyekjaer(a)prevas.dk>
Tested-by: Willy Tarreau <w(a)1wt.eu>
Signed-off-by: Boris Brezillon <boris.brezillon(a)free-electrons.com>
Acked-by: Ezequiel Garcia <ezequiel(a)vanguardiasur.com.ar>
Tested-by: Sean Nyekjaer <sean.nyekjaer(a)prevas.dk>
Acked-by: Robert Jarzmik <robert.jarzmik(a)free.fr>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/mtd/nand/pxa3xx_nand.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/mtd/nand/pxa3xx_nand.c
+++ b/drivers/mtd/nand/pxa3xx_nand.c
@@ -950,6 +950,7 @@ static void prepare_start_command(struct
switch (command) {
case NAND_CMD_READ0:
+ case NAND_CMD_READOOB:
case NAND_CMD_PAGEPROG:
info->use_ecc = 1;
break;
Patches currently in stable-queue which might be from boris.brezillon(a)free-electrons.com are
queue-4.14/mtd-nand-pxa3xx-fix-readoob-implementation.patch
This is a note to let you know that I've just added the patch titled
parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
parisc-fix-alignment-of-pa_tlb_lock-in-assembly-on-32-bit-smp-kernel.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 88776c0e70be0290f8357019d844aae15edaa967 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller(a)gmx.de>
Date: Tue, 2 Jan 2018 20:36:44 +0100
Subject: parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
From: Helge Deller <deller(a)gmx.de>
commit 88776c0e70be0290f8357019d844aae15edaa967 upstream.
Qemu for PARISC reported on a 32bit SMP parisc kernel strange failures
about "Not-handled unaligned insn 0x0e8011d6 and 0x0c2011c9."
Those opcodes evaluate to the ldcw() assembly instruction which requires
(on 32bit) an alignment of 16 bytes to ensure atomicity.
As it turns out, qemu is correct and in our assembly code in entry.S and
pacache.S we don't pay attention to the required alignment.
This patch fixes the problem by aligning the lock offset in assembly
code in the same manner as we do in our C-code.
Signed-off-by: Helge Deller <deller(a)gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/parisc/include/asm/ldcw.h | 2 ++
arch/parisc/kernel/entry.S | 13 +++++++++++--
arch/parisc/kernel/pacache.S | 9 +++++++--
3 files changed, 20 insertions(+), 4 deletions(-)
--- a/arch/parisc/include/asm/ldcw.h
+++ b/arch/parisc/include/asm/ldcw.h
@@ -12,6 +12,7 @@
for the semaphore. */
#define __PA_LDCW_ALIGNMENT 16
+#define __PA_LDCW_ALIGN_ORDER 4
#define __ldcw_align(a) ({ \
unsigned long __ret = (unsigned long) &(a)->lock[0]; \
__ret = (__ret + __PA_LDCW_ALIGNMENT - 1) \
@@ -29,6 +30,7 @@
ldcd). */
#define __PA_LDCW_ALIGNMENT 4
+#define __PA_LDCW_ALIGN_ORDER 2
#define __ldcw_align(a) (&(a)->slock)
#define __LDCW "ldcw,co"
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -35,6 +35,7 @@
#include <asm/pgtable.h>
#include <asm/signal.h>
#include <asm/unistd.h>
+#include <asm/ldcw.h>
#include <asm/thread_info.h>
#include <linux/linkage.h>
@@ -46,6 +47,14 @@
#endif
.import pa_tlb_lock,data
+ .macro load_pa_tlb_lock reg
+#if __PA_LDCW_ALIGNMENT > 4
+ load32 PA(pa_tlb_lock) + __PA_LDCW_ALIGNMENT-1, \reg
+ depi 0,31,__PA_LDCW_ALIGN_ORDER, \reg
+#else
+ load32 PA(pa_tlb_lock), \reg
+#endif
+ .endm
/* space_to_prot macro creates a prot id from a space id */
@@ -457,7 +466,7 @@
.macro tlb_lock spc,ptp,pte,tmp,tmp1,fault
#ifdef CONFIG_SMP
cmpib,COND(=),n 0,\spc,2f
- load32 PA(pa_tlb_lock),\tmp
+ load_pa_tlb_lock \tmp
1: LDCW 0(\tmp),\tmp1
cmpib,COND(=) 0,\tmp1,1b
nop
@@ -480,7 +489,7 @@
/* Release pa_tlb_lock lock. */
.macro tlb_unlock1 spc,tmp
#ifdef CONFIG_SMP
- load32 PA(pa_tlb_lock),\tmp
+ load_pa_tlb_lock \tmp
tlb_unlock0 \spc,\tmp
#endif
.endm
--- a/arch/parisc/kernel/pacache.S
+++ b/arch/parisc/kernel/pacache.S
@@ -36,6 +36,7 @@
#include <asm/assembly.h>
#include <asm/pgtable.h>
#include <asm/cache.h>
+#include <asm/ldcw.h>
#include <linux/linkage.h>
.text
@@ -333,8 +334,12 @@ ENDPROC_CFI(flush_data_cache_local)
.macro tlb_lock la,flags,tmp
#ifdef CONFIG_SMP
- ldil L%pa_tlb_lock,%r1
- ldo R%pa_tlb_lock(%r1),\la
+#if __PA_LDCW_ALIGNMENT > 4
+ load32 pa_tlb_lock + __PA_LDCW_ALIGNMENT-1, \la
+ depi 0,31,__PA_LDCW_ALIGN_ORDER, \la
+#else
+ load32 pa_tlb_lock, \la
+#endif
rsm PSW_SM_I,\flags
1: LDCW 0(\la),\tmp
cmpib,<>,n 0,\tmp,3f
Patches currently in stable-queue which might be from deller(a)gmx.de are
queue-4.14/parisc-fix-alignment-of-pa_tlb_lock-in-assembly-on-32-bit-smp-kernel.patch
queue-4.14/parisc-qemu-idle-sleep-support.patch
This is a note to let you know that I've just added the patch titled
KVM: s390: prevent buffer overrun on memory hotplug during migration
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-s390-prevent-buffer-overrun-on-memory-hotplug-during-migration.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From c2cf265d860882b51a200e4a7553c17827f2b730 Mon Sep 17 00:00:00 2001
From: Christian Borntraeger <borntraeger(a)de.ibm.com>
Date: Thu, 21 Dec 2017 09:18:22 +0100
Subject: KVM: s390: prevent buffer overrun on memory hotplug during migration
From: Christian Borntraeger <borntraeger(a)de.ibm.com>
commit c2cf265d860882b51a200e4a7553c17827f2b730 upstream.
We must not go beyond the pre-allocated buffer. This can happen when
a new memory slot is added during migration.
Reported-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Fixes: 190df4a212a7 (KVM: s390: CMMA tracking, ESSA emulation, migration mode)
Reviewed-by: Cornelia Huck <cohuck(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/s390/kvm/priv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -1009,7 +1009,7 @@ static inline int do_essa(struct kvm_vcp
cbrlo[entries] = gfn << PAGE_SHIFT;
}
- if (orc) {
+ if (orc && gfn < ms->bitmap_size) {
/* increment only if we are really flipping the bit to 1 */
if (!test_and_set_bit(gfn, ms->pgste_bitmap))
atomic64_inc(&ms->dirty_pages);
Patches currently in stable-queue which might be from borntraeger(a)de.ibm.com are
queue-4.14/kvm-s390-prevent-buffer-overrun-on-memory-hotplug-during-migration.patch
queue-4.14/kvm-s390-fix-cmma-migration-for-multiple-memory-slots.patch
This is a note to let you know that I've just added the patch titled
KVM: s390: fix cmma migration for multiple memory slots
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-s390-fix-cmma-migration-for-multiple-memory-slots.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 32aa144fc32abfcbf7140f473dfbd94c5b9b4105 Mon Sep 17 00:00:00 2001
From: Christian Borntraeger <borntraeger(a)de.ibm.com>
Date: Fri, 15 Dec 2017 13:14:31 +0100
Subject: KVM: s390: fix cmma migration for multiple memory slots
From: Christian Borntraeger <borntraeger(a)de.ibm.com>
commit 32aa144fc32abfcbf7140f473dfbd94c5b9b4105 upstream.
When multiple memory slots are present the cmma migration code
does not allocate enough memory for the bitmap. The memory slots
are sorted in reverse order, so we must use gfn and size of
slot[0] instead of the last one.
Signed-off-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda(a)linux.vnet.ibm.com>
Fixes: 190df4a212a7 (KVM: s390: CMMA tracking, ESSA emulation, migration mode)
Reviewed-by: Cornelia Huck <cohuck(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/s390/kvm/kvm-s390.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -794,11 +794,12 @@ static int kvm_s390_vm_start_migration(s
if (kvm->arch.use_cmma) {
/*
- * Get the last slot. They should be sorted by base_gfn, so the
- * last slot is also the one at the end of the address space.
- * We have verified above that at least one slot is present.
+ * Get the first slot. They are reverse sorted by base_gfn, so
+ * the first slot is also the one at the end of the address
+ * space. We have verified above that at least one slot is
+ * present.
*/
- ms = slots->memslots + slots->used_slots - 1;
+ ms = slots->memslots;
/* round up so we only use full longs */
ram_pages = roundup(ms->base_gfn + ms->npages, BITS_PER_LONG);
/* allocate enough bytes to store all the bits */
Patches currently in stable-queue which might be from borntraeger(a)de.ibm.com are
queue-4.14/kvm-s390-prevent-buffer-overrun-on-memory-hotplug-during-migration.patch
queue-4.14/kvm-s390-fix-cmma-migration-for-multiple-memory-slots.patch
This is a note to let you know that I've just added the patch titled
apparmor: fix regression in mount mediation when feature set is pinned
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
apparmor-fix-regression-in-mount-mediation-when-feature-set-is-pinned.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 5b9f57cf47b87f07210875d6a24776b4496b818d Mon Sep 17 00:00:00 2001
From: John Johansen <john.johansen(a)canonical.com>
Date: Thu, 7 Dec 2017 00:28:27 -0800
Subject: apparmor: fix regression in mount mediation when feature set is pinned
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: John Johansen <john.johansen(a)canonical.com>
commit 5b9f57cf47b87f07210875d6a24776b4496b818d upstream.
When the mount code was refactored for Labels it was not correctly
updated to check whether policy supported mediation of the mount
class. This causes a regression when the kernel feature set is
reported as supporting mount and policy is pinned to a feature set
that does not support mount mediation.
BugLink: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=882697#41
Fixes: 2ea3ffb7782a ("apparmor: add mount mediation")
Reported-by: Fabian Grünbichler <f.gruenbichler(a)proxmox.com>
Signed-off-by: John Johansen <john.johansen(a)canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
security/apparmor/mount.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
--- a/security/apparmor/mount.c
+++ b/security/apparmor/mount.c
@@ -330,6 +330,9 @@ static int match_mnt_path_str(struct aa_
AA_BUG(!mntpath);
AA_BUG(!buffer);
+ if (!PROFILE_MEDIATES(profile, AA_CLASS_MOUNT))
+ return 0;
+
error = aa_path_name(mntpath, path_flags(profile, mntpath), buffer,
&mntpnt, &info, profile->disconnected);
if (error)
@@ -381,6 +384,9 @@ static int match_mnt(struct aa_profile *
AA_BUG(!profile);
AA_BUG(devpath && !devbuffer);
+ if (!PROFILE_MEDIATES(profile, AA_CLASS_MOUNT))
+ return 0;
+
if (devpath) {
error = aa_path_name(devpath, path_flags(profile, devpath),
devbuffer, &devname, &info,
@@ -559,6 +565,9 @@ static int profile_umount(struct aa_prof
AA_BUG(!profile);
AA_BUG(!path);
+ if (!PROFILE_MEDIATES(profile, AA_CLASS_MOUNT))
+ return 0;
+
error = aa_path_name(path, path_flags(profile, path), buffer, &name,
&info, profile->disconnected);
if (error)
@@ -614,7 +623,8 @@ static struct aa_label *build_pivotroot(
AA_BUG(!new_path);
AA_BUG(!old_path);
- if (profile_unconfined(profile))
+ if (profile_unconfined(profile) ||
+ !PROFILE_MEDIATES(profile, AA_CLASS_MOUNT))
return aa_get_newest_label(&profile->label);
error = aa_path_name(old_path, path_flags(profile, old_path),
Patches currently in stable-queue which might be from john.johansen(a)canonical.com are
queue-4.14/apparmor-fix-regression-in-mount-mediation-when-feature-set-is-pinned.patch
This is a note to let you know that I've just added the patch titled
mtd: nand: pxa3xx: Fix READOOB implementation
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mtd-nand-pxa3xx-fix-readoob-implementation.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From fee4380f368e84ed216b62ccd2fbc4126f2bf40b Mon Sep 17 00:00:00 2001
From: Boris Brezillon <boris.brezillon(a)free-electrons.com>
Date: Mon, 18 Dec 2017 11:32:45 +0100
Subject: mtd: nand: pxa3xx: Fix READOOB implementation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Boris Brezillon <boris.brezillon(a)free-electrons.com>
commit fee4380f368e84ed216b62ccd2fbc4126f2bf40b upstream.
In the current driver, OOB bytes are accessed in raw mode, and when a
page access is done with NDCR_SPARE_EN set and NDCR_ECC_EN cleared, the
driver must read the whole spare area (64 bytes in case of a 2k page,
16 bytes for a 512 page). The driver was only reading the free OOB
bytes, which was leaving some unread data in the FIFO and was somehow
leading to a timeout.
We could patch the driver to read ->spare_size + ->ecc_size instead of
just ->spare_size when READOOB is requested, but we'd better make
in-band and OOB accesses consistent.
Since the driver is always accessing in-band data in non-raw mode (with
the ECC engine enabled), we should also access OOB data in this mode.
That's particularly useful when using the BCH engine because in this
mode the free OOB bytes are also ECC protected.
Fixes: 43bcfd2bb24a ("mtd: nand: pxa3xx: Add driver-specific ECC BCH support")
Reported-by: Sean Nyekjær <sean.nyekjaer(a)prevas.dk>
Tested-by: Willy Tarreau <w(a)1wt.eu>
Signed-off-by: Boris Brezillon <boris.brezillon(a)free-electrons.com>
Acked-by: Ezequiel Garcia <ezequiel(a)vanguardiasur.com.ar>
Tested-by: Sean Nyekjaer <sean.nyekjaer(a)prevas.dk>
Acked-by: Robert Jarzmik <robert.jarzmik(a)free.fr>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/mtd/nand/pxa3xx_nand.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/mtd/nand/pxa3xx_nand.c
+++ b/drivers/mtd/nand/pxa3xx_nand.c
@@ -743,6 +743,7 @@ static void prepare_start_command(struct
switch (command) {
case NAND_CMD_READ0:
+ case NAND_CMD_READOOB:
case NAND_CMD_PAGEPROG:
info->use_ecc = 1;
case NAND_CMD_READOOB:
Patches currently in stable-queue which might be from boris.brezillon(a)free-electrons.com are
queue-3.18/mtd-nand-pxa3xx-fix-readoob-implementation.patch
If we are asked for number of entries of an offset bigger than the
sg list we should not crash.
Cc: stable(a)vger.kernel.org
Signed-off-by: Gilad Ben-Yossef <gilad(a)benyossef.com>
---
drivers/staging/ccree/ssi_buffer_mgr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/ccree/ssi_buffer_mgr.c b/drivers/staging/ccree/ssi_buffer_mgr.c
index 78288ed..0f71264 100644
--- a/drivers/staging/ccree/ssi_buffer_mgr.c
+++ b/drivers/staging/ccree/ssi_buffer_mgr.c
@@ -94,7 +94,7 @@ static unsigned int cc_get_sgl_nents(struct device *dev,
{
unsigned int nents = 0;
- while (nbytes) {
+ while (nbytes && sg_list) {
if (sg_list->length) {
nents++;
/* get the number of bytes in the last entry */
--
2.7.4
If we ran out of DMA pool buffers, we get into the unmap
code path with a NULL before. Deal with this by checking
the virtual mapping is not NULL.
Cc: stable(a)vger.kernel.org
Signed-off-by: Gilad Ben-Yossef <gilad(a)benyossef.com>
---
drivers/staging/ccree/ssi_buffer_mgr.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/staging/ccree/ssi_buffer_mgr.c b/drivers/staging/ccree/ssi_buffer_mgr.c
index e85bb53..78288ed 100644
--- a/drivers/staging/ccree/ssi_buffer_mgr.c
+++ b/drivers/staging/ccree/ssi_buffer_mgr.c
@@ -465,7 +465,8 @@ void cc_unmap_blkcipher_request(struct device *dev, void *ctx,
DMA_TO_DEVICE);
}
/* Release pool */
- if (req_ctx->dma_buf_type == CC_DMA_BUF_MLLI) {
+ if (req_ctx->dma_buf_type == CC_DMA_BUF_MLLI &&
+ req_ctx->mlli_params.mlli_virt_addr) {
dma_pool_free(req_ctx->mlli_params.curr_pool,
req_ctx->mlli_params.mlli_virt_addr,
req_ctx->mlli_params.mlli_dma_addr);
--
2.7.4
This is a note to let you know that I've just added the patch titled
x86/microcode/AMD: Add support for fam17h microcode loading
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-microcode-amd-add-support-for-fam17h-microcode-loading.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f4e9b7af0cd58dd039a0fb2cd67d57cea4889abf Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky(a)amd.com>
Date: Thu, 30 Nov 2017 16:46:40 -0600
Subject: x86/microcode/AMD: Add support for fam17h microcode loading
From: Tom Lendacky <thomas.lendacky(a)amd.com>
commit f4e9b7af0cd58dd039a0fb2cd67d57cea4889abf upstream.
The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes. Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)alien8.de>
Link: https://lkml.kernel.org/r/20171130224640.15391.40247.stgit@tlendack-t1.amdo…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Alice Ferrazzi <alicef(a)gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -592,6 +592,7 @@ static unsigned int verify_patch_size(u8
#define F14H_MPB_MAX_SIZE 1824
#define F15H_MPB_MAX_SIZE 4096
#define F16H_MPB_MAX_SIZE 3458
+#define F17H_MPB_MAX_SIZE 3200
switch (family) {
case 0x14:
@@ -603,6 +604,9 @@ static unsigned int verify_patch_size(u8
case 0x16:
max_size = F16H_MPB_MAX_SIZE;
break;
+ case 0x17:
+ max_size = F17H_MPB_MAX_SIZE;
+ break;
default:
max_size = F1XH_MPB_MAX_SIZE;
break;
Patches currently in stable-queue which might be from thomas.lendacky(a)amd.com are
queue-4.9/x86-microcode-amd-add-support-for-fam17h-microcode-loading.patch
This is a note to let you know that I've just added the patch titled
x86/microcode/AMD: Add support for fam17h microcode loading
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-microcode-amd-add-support-for-fam17h-microcode-loading.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f4e9b7af0cd58dd039a0fb2cd67d57cea4889abf Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky(a)amd.com>
Date: Thu, 30 Nov 2017 16:46:40 -0600
Subject: x86/microcode/AMD: Add support for fam17h microcode loading
From: Tom Lendacky <thomas.lendacky(a)amd.com>
commit f4e9b7af0cd58dd039a0fb2cd67d57cea4889abf upstream.
The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes. Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)alien8.de>
Link: https://lkml.kernel.org/r/20171130224640.15391.40247.stgit@tlendack-t1.amdo…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Alice Ferrazzi <alicef(a)gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -580,6 +580,7 @@ static unsigned int verify_patch_size(u8
#define F14H_MPB_MAX_SIZE 1824
#define F15H_MPB_MAX_SIZE 4096
#define F16H_MPB_MAX_SIZE 3458
+#define F17H_MPB_MAX_SIZE 3200
switch (family) {
case 0x14:
@@ -591,6 +592,9 @@ static unsigned int verify_patch_size(u8
case 0x16:
max_size = F16H_MPB_MAX_SIZE;
break;
+ case 0x17:
+ max_size = F17H_MPB_MAX_SIZE;
+ break;
default:
max_size = F1XH_MPB_MAX_SIZE;
break;
Patches currently in stable-queue which might be from thomas.lendacky(a)amd.com are
queue-4.4/x86-microcode-amd-add-support-for-fam17h-microcode-loading.patch
This is a note to let you know that I've just added the patch titled
x86/microcode/AMD: Add support for fam17h microcode loading
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-microcode-amd-add-support-for-fam17h-microcode-loading.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f4e9b7af0cd58dd039a0fb2cd67d57cea4889abf Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky(a)amd.com>
Date: Thu, 30 Nov 2017 16:46:40 -0600
Subject: x86/microcode/AMD: Add support for fam17h microcode loading
From: Tom Lendacky <thomas.lendacky(a)amd.com>
commit f4e9b7af0cd58dd039a0fb2cd67d57cea4889abf upstream.
The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes. Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)alien8.de>
Link: https://lkml.kernel.org/r/20171130224640.15391.40247.stgit@tlendack-t1.amdo…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Alice Ferrazzi <alicef(a)gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/cpu/microcode/amd.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -470,6 +470,7 @@ static unsigned int verify_patch_size(u8
#define F14H_MPB_MAX_SIZE 1824
#define F15H_MPB_MAX_SIZE 4096
#define F16H_MPB_MAX_SIZE 3458
+#define F17H_MPB_MAX_SIZE 3200
switch (family) {
case 0x14:
@@ -481,6 +482,9 @@ static unsigned int verify_patch_size(u8
case 0x16:
max_size = F16H_MPB_MAX_SIZE;
break;
+ case 0x17:
+ max_size = F17H_MPB_MAX_SIZE;
+ break;
default:
max_size = F1XH_MPB_MAX_SIZE;
break;
Patches currently in stable-queue which might be from thomas.lendacky(a)amd.com are
queue-4.14/x86-microcode-amd-add-support-for-fam17h-microcode-loading.patch
queue-4.14/x86-pti-rename-bug_cpu_insecure-to-bug_cpu_meltdown.patch
This is a note to let you know that I've just added the patch titled
iommu/arm-smmu-v3: Don't free page table ops twice
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iommu-arm-smmu-v3-don-t-free-page-table-ops-twice.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 57d72e159b60456c8bb281736c02ddd3164037aa Mon Sep 17 00:00:00 2001
From: Jean-Philippe Brucker <jean-philippe.brucker(a)arm.com>
Date: Thu, 14 Dec 2017 11:03:01 +0000
Subject: iommu/arm-smmu-v3: Don't free page table ops twice
From: Jean-Philippe Brucker <jean-philippe.brucker(a)arm.com>
commit 57d72e159b60456c8bb281736c02ddd3164037aa upstream.
Kasan reports a double free when finalise_stage_fn fails: the io_pgtable
ops are freed by arm_smmu_domain_finalise and then again by
arm_smmu_domain_free. Prevent this by leaving pgtbl_ops empty on failure.
Fixes: 48ec83bcbcf5 ("iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices")
Reviewed-by: Robin Murphy <robin.murphy(a)arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker(a)arm.com>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/iommu/arm-smmu-v3.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1547,13 +1547,15 @@ static int arm_smmu_domain_finalise(stru
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
domain->geometry.aperture_end = (1UL << ias) - 1;
domain->geometry.force_aperture = true;
- smmu_domain->pgtbl_ops = pgtbl_ops;
ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg);
- if (ret < 0)
+ if (ret < 0) {
free_io_pgtable_ops(pgtbl_ops);
+ return ret;
+ }
- return ret;
+ smmu_domain->pgtbl_ops = pgtbl_ops;
+ return 0;
}
static __le64 *arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid)
Patches currently in stable-queue which might be from jean-philippe.brucker(a)arm.com are
queue-4.9/iommu-arm-smmu-v3-don-t-free-page-table-ops-twice.patch
This is a note to let you know that I've just added the patch titled
Input: elantech - add new icbody type 15
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
input-elantech-add-new-icbody-type-15.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 10d900303f1c3a821eb0bef4e7b7ece16768fba4 Mon Sep 17 00:00:00 2001
From: Aaron Ma <aaron.ma(a)canonical.com>
Date: Sat, 25 Nov 2017 16:48:41 -0800
Subject: Input: elantech - add new icbody type 15
From: Aaron Ma <aaron.ma(a)canonical.com>
commit 10d900303f1c3a821eb0bef4e7b7ece16768fba4 upstream.
The touchpad of Lenovo Thinkpad L480 reports it's version as 15.
Signed-off-by: Aaron Ma <aaron.ma(a)canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/input/mouse/elantech.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/input/mouse/elantech.c
+++ b/drivers/input/mouse/elantech.c
@@ -1609,7 +1609,7 @@ static int elantech_set_properties(struc
case 5:
etd->hw_version = 3;
break;
- case 6 ... 14:
+ case 6 ... 15:
etd->hw_version = 4;
break;
default:
Patches currently in stable-queue which might be from aaron.ma(a)canonical.com are
queue-4.9/input-elantech-add-new-icbody-type-15.patch
This is a note to let you know that I've just added the patch titled
iommu/arm-smmu-v3: Cope with duplicated Stream IDs
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iommu-arm-smmu-v3-cope-with-duplicated-stream-ids.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 563b5cbe334e9503ab2b234e279d500fc4f76018 Mon Sep 17 00:00:00 2001
From: Robin Murphy <robin.murphy(a)arm.com>
Date: Tue, 2 Jan 2018 12:33:14 +0000
Subject: iommu/arm-smmu-v3: Cope with duplicated Stream IDs
From: Robin Murphy <robin.murphy(a)arm.com>
commit 563b5cbe334e9503ab2b234e279d500fc4f76018 upstream.
For PCI devices behind an aliasing PCIe-to-PCI/X bridge, the bridge
alias to DevFn 0.0 on the subordinate bus may match the original RID of
the device, resulting in the same SID being present in the device's
fwspec twice. This causes trouble later in arm_smmu_write_strtab_ent()
when we wind up visiting the STE a second time and find it already live.
Avoid the issue by giving arm_smmu_install_ste_for_dev() the cleverness
to skip over duplicates. It seems mildly counterintuitive compared to
preventing the duplicates from existing in the first place, but since
the DT and ACPI probe paths build their fwspecs differently, this is
actually the cleanest and most self-contained way to deal with it.
Fixes: 8f78515425da ("iommu/arm-smmu: Implement of_xlate() for SMMUv3")
Reported-by: Tomasz Nowicki <tomasz.nowicki(a)caviumnetworks.com>
Tested-by: Tomasz Nowicki <Tomasz.Nowicki(a)cavium.com>
Tested-by: Jayachandran C. <jnair(a)caviumnetworks.com>
Signed-off-by: Robin Murphy <robin.murphy(a)arm.com>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/iommu/arm-smmu-v3.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1582,7 +1582,7 @@ static __le64 *arm_smmu_get_step_for_sid
static int arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
{
- int i;
+ int i, j;
struct arm_smmu_master_data *master = fwspec->iommu_priv;
struct arm_smmu_device *smmu = master->smmu;
@@ -1590,6 +1590,13 @@ static int arm_smmu_install_ste_for_dev(
u32 sid = fwspec->ids[i];
__le64 *step = arm_smmu_get_step_for_sid(smmu, sid);
+ /* Bridged PCI devices may end up with duplicated IDs */
+ for (j = 0; j < i; j++)
+ if (fwspec->ids[j] == sid)
+ break;
+ if (j < i)
+ continue;
+
arm_smmu_write_strtab_ent(smmu, sid, step, &master->ste);
}
Patches currently in stable-queue which might be from robin.murphy(a)arm.com are
queue-4.9/iommu-arm-smmu-v3-cope-with-duplicated-stream-ids.patch
queue-4.9/iommu-arm-smmu-v3-don-t-free-page-table-ops-twice.patch
This is a note to let you know that I've just added the patch titled
ARC: uaccess: dont use "l" gcc inline asm constraint modifier
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arc-uaccess-dont-use-l-gcc-inline-asm-constraint-modifier.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 79435ac78d160e4c245544d457850a56f805ac0d Mon Sep 17 00:00:00 2001
From: Vineet Gupta <vgupta(a)synopsys.com>
Date: Fri, 8 Dec 2017 08:26:58 -0800
Subject: ARC: uaccess: dont use "l" gcc inline asm constraint modifier
From: Vineet Gupta <vgupta(a)synopsys.com>
commit 79435ac78d160e4c245544d457850a56f805ac0d upstream.
This used to setup the LP_COUNT register automatically, but now has been
removed.
There was an earlier fix 3c7c7a2fc8811 which fixed instance in delay.h but
somehow missed this one as gcc change had not made its way into
production toolchains and was not pedantic as it is now !
Signed-off-by: Vineet Gupta <vgupta(a)synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arc/include/asm/uaccess.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/arch/arc/include/asm/uaccess.h
+++ b/arch/arc/include/asm/uaccess.h
@@ -673,6 +673,7 @@ __arc_strncpy_from_user(char *dst, const
return 0;
__asm__ __volatile__(
+ " mov lp_count, %5 \n"
" lp 3f \n"
"1: ldb.ab %3, [%2, 1] \n"
" breq.d %3, 0, 3f \n"
@@ -689,8 +690,8 @@ __arc_strncpy_from_user(char *dst, const
" .word 1b, 4b \n"
" .previous \n"
: "+r"(res), "+r"(dst), "+r"(src), "=r"(val)
- : "g"(-EFAULT), "l"(count)
- : "memory");
+ : "g"(-EFAULT), "r"(count)
+ : "lp_count", "lp_start", "lp_end", "memory");
return res;
}
Patches currently in stable-queue which might be from vgupta(a)synopsys.com are
queue-4.9/arc-uaccess-dont-use-l-gcc-inline-asm-constraint-modifier.patch
This is a note to let you know that I've just added the patch titled
Input: elantech - add new icbody type 15
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
input-elantech-add-new-icbody-type-15.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 10d900303f1c3a821eb0bef4e7b7ece16768fba4 Mon Sep 17 00:00:00 2001
From: Aaron Ma <aaron.ma(a)canonical.com>
Date: Sat, 25 Nov 2017 16:48:41 -0800
Subject: Input: elantech - add new icbody type 15
From: Aaron Ma <aaron.ma(a)canonical.com>
commit 10d900303f1c3a821eb0bef4e7b7ece16768fba4 upstream.
The touchpad of Lenovo Thinkpad L480 reports it's version as 15.
Signed-off-by: Aaron Ma <aaron.ma(a)canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/input/mouse/elantech.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/input/mouse/elantech.c
+++ b/drivers/input/mouse/elantech.c
@@ -1613,7 +1613,7 @@ static int elantech_set_properties(struc
case 5:
etd->hw_version = 3;
break;
- case 6 ... 14:
+ case 6 ... 15:
etd->hw_version = 4;
break;
default:
Patches currently in stable-queue which might be from aaron.ma(a)canonical.com are
queue-4.4/input-elantech-add-new-icbody-type-15.patch
This is a note to let you know that I've just added the patch titled
ARC: uaccess: dont use "l" gcc inline asm constraint modifier
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arc-uaccess-dont-use-l-gcc-inline-asm-constraint-modifier.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 79435ac78d160e4c245544d457850a56f805ac0d Mon Sep 17 00:00:00 2001
From: Vineet Gupta <vgupta(a)synopsys.com>
Date: Fri, 8 Dec 2017 08:26:58 -0800
Subject: ARC: uaccess: dont use "l" gcc inline asm constraint modifier
From: Vineet Gupta <vgupta(a)synopsys.com>
commit 79435ac78d160e4c245544d457850a56f805ac0d upstream.
This used to setup the LP_COUNT register automatically, but now has been
removed.
There was an earlier fix 3c7c7a2fc8811 which fixed instance in delay.h but
somehow missed this one as gcc change had not made its way into
production toolchains and was not pedantic as it is now !
Signed-off-by: Vineet Gupta <vgupta(a)synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arc/include/asm/uaccess.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/arch/arc/include/asm/uaccess.h
+++ b/arch/arc/include/asm/uaccess.h
@@ -673,6 +673,7 @@ __arc_strncpy_from_user(char *dst, const
return 0;
__asm__ __volatile__(
+ " mov lp_count, %5 \n"
" lp 3f \n"
"1: ldb.ab %3, [%2, 1] \n"
" breq.d %3, 0, 3f \n"
@@ -689,8 +690,8 @@ __arc_strncpy_from_user(char *dst, const
" .word 1b, 4b \n"
" .previous \n"
: "+r"(res), "+r"(dst), "+r"(src), "=r"(val)
- : "g"(-EFAULT), "l"(count)
- : "memory");
+ : "g"(-EFAULT), "r"(count)
+ : "lp_count", "lp_start", "lp_end", "memory");
return res;
}
Patches currently in stable-queue which might be from vgupta(a)synopsys.com are
queue-4.4/arc-uaccess-dont-use-l-gcc-inline-asm-constraint-modifier.patch
This is a note to let you know that I've just added the patch titled
powerpc/mm: Fix SEGV on mapped region to return SEGV_ACCERR
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
powerpc-mm-fix-segv-on-mapped-region-to-return-segv_accerr.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ecb101aed86156ec7cd71e5dca668e09146e6994 Mon Sep 17 00:00:00 2001
From: John Sperbeck <jsperbeck(a)google.com>
Date: Sun, 31 Dec 2017 21:24:58 -0800
Subject: powerpc/mm: Fix SEGV on mapped region to return SEGV_ACCERR
From: John Sperbeck <jsperbeck(a)google.com>
commit ecb101aed86156ec7cd71e5dca668e09146e6994 upstream.
The recent refactoring of the powerpc page fault handler in commit
c3350602e876 ("powerpc/mm: Make bad_area* helper functions") caused
access to protected memory regions to indicate SEGV_MAPERR instead of
the traditional SEGV_ACCERR in the si_code field of a user-space
signal handler. This can confuse debug libraries that temporarily
change the protection of memory regions, and expect to use SEGV_ACCERR
as an indication to restore access to a region.
This commit restores the previous behavior. The following program
exhibits the issue:
$ ./repro read || echo "FAILED"
$ ./repro write || echo "FAILED"
$ ./repro exec || echo "FAILED"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <sys/mman.h>
#include <assert.h>
static void segv_handler(int n, siginfo_t *info, void *arg) {
_exit(info->si_code == SEGV_ACCERR ? 0 : 1);
}
int main(int argc, char **argv)
{
void *p = NULL;
struct sigaction act = {
.sa_sigaction = segv_handler,
.sa_flags = SA_SIGINFO,
};
assert(argc == 2);
p = mmap(NULL, getpagesize(),
(strcmp(argv[1], "write") == 0) ? PROT_READ : 0,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
assert(p != MAP_FAILED);
assert(sigaction(SIGSEGV, &act, NULL) == 0);
if (strcmp(argv[1], "read") == 0)
printf("%c", *(unsigned char *)p);
else if (strcmp(argv[1], "write") == 0)
*(unsigned char *)p = 0;
else if (strcmp(argv[1], "exec") == 0)
((void (*)(void))p)();
return 1; /* failed to generate SEGV */
}
Fixes: c3350602e876 ("powerpc/mm: Make bad_area* helper functions")
Signed-off-by: John Sperbeck <jsperbeck(a)google.com>
Acked-by: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
[mpe: Add commit references in change log]
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/powerpc/mm/fault.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -145,6 +145,11 @@ static noinline int bad_area(struct pt_r
return __bad_area(regs, address, SEGV_MAPERR);
}
+static noinline int bad_access(struct pt_regs *regs, unsigned long address)
+{
+ return __bad_area(regs, address, SEGV_ACCERR);
+}
+
static int do_sigbus(struct pt_regs *regs, unsigned long address,
unsigned int fault)
{
@@ -490,7 +495,7 @@ retry:
good_area:
if (unlikely(access_error(is_write, is_exec, vma)))
- return bad_area(regs, address);
+ return bad_access(regs, address);
/*
* If for any reason at all we couldn't handle the fault,
Patches currently in stable-queue which might be from jsperbeck(a)google.com are
queue-4.14/powerpc-mm-fix-segv-on-mapped-region-to-return-segv_accerr.patch
This is a note to let you know that I've just added the patch titled
iommu/arm-smmu-v3: Don't free page table ops twice
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iommu-arm-smmu-v3-don-t-free-page-table-ops-twice.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 57d72e159b60456c8bb281736c02ddd3164037aa Mon Sep 17 00:00:00 2001
From: Jean-Philippe Brucker <jean-philippe.brucker(a)arm.com>
Date: Thu, 14 Dec 2017 11:03:01 +0000
Subject: iommu/arm-smmu-v3: Don't free page table ops twice
From: Jean-Philippe Brucker <jean-philippe.brucker(a)arm.com>
commit 57d72e159b60456c8bb281736c02ddd3164037aa upstream.
Kasan reports a double free when finalise_stage_fn fails: the io_pgtable
ops are freed by arm_smmu_domain_finalise and then again by
arm_smmu_domain_free. Prevent this by leaving pgtbl_ops empty on failure.
Fixes: 48ec83bcbcf5 ("iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices")
Reviewed-by: Robin Murphy <robin.murphy(a)arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker(a)arm.com>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/iommu/arm-smmu-v3.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1611,13 +1611,15 @@ static int arm_smmu_domain_finalise(stru
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
domain->geometry.aperture_end = (1UL << ias) - 1;
domain->geometry.force_aperture = true;
- smmu_domain->pgtbl_ops = pgtbl_ops;
ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg);
- if (ret < 0)
+ if (ret < 0) {
free_io_pgtable_ops(pgtbl_ops);
+ return ret;
+ }
- return ret;
+ smmu_domain->pgtbl_ops = pgtbl_ops;
+ return 0;
}
static __le64 *arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid)
Patches currently in stable-queue which might be from jean-philippe.brucker(a)arm.com are
queue-4.14/iommu-arm-smmu-v3-don-t-free-page-table-ops-twice.patch
This is a note to let you know that I've just added the patch titled
iommu/arm-smmu-v3: Cope with duplicated Stream IDs
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iommu-arm-smmu-v3-cope-with-duplicated-stream-ids.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 563b5cbe334e9503ab2b234e279d500fc4f76018 Mon Sep 17 00:00:00 2001
From: Robin Murphy <robin.murphy(a)arm.com>
Date: Tue, 2 Jan 2018 12:33:14 +0000
Subject: iommu/arm-smmu-v3: Cope with duplicated Stream IDs
From: Robin Murphy <robin.murphy(a)arm.com>
commit 563b5cbe334e9503ab2b234e279d500fc4f76018 upstream.
For PCI devices behind an aliasing PCIe-to-PCI/X bridge, the bridge
alias to DevFn 0.0 on the subordinate bus may match the original RID of
the device, resulting in the same SID being present in the device's
fwspec twice. This causes trouble later in arm_smmu_write_strtab_ent()
when we wind up visiting the STE a second time and find it already live.
Avoid the issue by giving arm_smmu_install_ste_for_dev() the cleverness
to skip over duplicates. It seems mildly counterintuitive compared to
preventing the duplicates from existing in the first place, but since
the DT and ACPI probe paths build their fwspecs differently, this is
actually the cleanest and most self-contained way to deal with it.
Fixes: 8f78515425da ("iommu/arm-smmu: Implement of_xlate() for SMMUv3")
Reported-by: Tomasz Nowicki <tomasz.nowicki(a)caviumnetworks.com>
Tested-by: Tomasz Nowicki <Tomasz.Nowicki(a)cavium.com>
Tested-by: Jayachandran C. <jnair(a)caviumnetworks.com>
Signed-off-by: Robin Murphy <robin.murphy(a)arm.com>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/iommu/arm-smmu-v3.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1646,7 +1646,7 @@ static __le64 *arm_smmu_get_step_for_sid
static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
{
- int i;
+ int i, j;
struct arm_smmu_master_data *master = fwspec->iommu_priv;
struct arm_smmu_device *smmu = master->smmu;
@@ -1654,6 +1654,13 @@ static void arm_smmu_install_ste_for_dev
u32 sid = fwspec->ids[i];
__le64 *step = arm_smmu_get_step_for_sid(smmu, sid);
+ /* Bridged PCI devices may end up with duplicated IDs */
+ for (j = 0; j < i; j++)
+ if (fwspec->ids[j] == sid)
+ break;
+ if (j < i)
+ continue;
+
arm_smmu_write_strtab_ent(smmu, sid, step, &master->ste);
}
}
Patches currently in stable-queue which might be from robin.murphy(a)arm.com are
queue-4.14/iommu-arm-smmu-v3-cope-with-duplicated-stream-ids.patch
queue-4.14/iommu-arm-smmu-v3-don-t-free-page-table-ops-twice.patch
This is a note to let you know that I've just added the patch titled
Input: elantech - add new icbody type 15
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
input-elantech-add-new-icbody-type-15.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 10d900303f1c3a821eb0bef4e7b7ece16768fba4 Mon Sep 17 00:00:00 2001
From: Aaron Ma <aaron.ma(a)canonical.com>
Date: Sat, 25 Nov 2017 16:48:41 -0800
Subject: Input: elantech - add new icbody type 15
From: Aaron Ma <aaron.ma(a)canonical.com>
commit 10d900303f1c3a821eb0bef4e7b7ece16768fba4 upstream.
The touchpad of Lenovo Thinkpad L480 reports it's version as 15.
Signed-off-by: Aaron Ma <aaron.ma(a)canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/input/mouse/elantech.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/input/mouse/elantech.c
+++ b/drivers/input/mouse/elantech.c
@@ -1613,7 +1613,7 @@ static int elantech_set_properties(struc
case 5:
etd->hw_version = 3;
break;
- case 6 ... 14:
+ case 6 ... 15:
etd->hw_version = 4;
break;
default:
Patches currently in stable-queue which might be from aaron.ma(a)canonical.com are
queue-4.14/input-elantech-add-new-icbody-type-15.patch
This is a note to let you know that I've just added the patch titled
ARC: uaccess: dont use "l" gcc inline asm constraint modifier
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arc-uaccess-dont-use-l-gcc-inline-asm-constraint-modifier.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 79435ac78d160e4c245544d457850a56f805ac0d Mon Sep 17 00:00:00 2001
From: Vineet Gupta <vgupta(a)synopsys.com>
Date: Fri, 8 Dec 2017 08:26:58 -0800
Subject: ARC: uaccess: dont use "l" gcc inline asm constraint modifier
From: Vineet Gupta <vgupta(a)synopsys.com>
commit 79435ac78d160e4c245544d457850a56f805ac0d upstream.
This used to setup the LP_COUNT register automatically, but now has been
removed.
There was an earlier fix 3c7c7a2fc8811 which fixed instance in delay.h but
somehow missed this one as gcc change had not made its way into
production toolchains and was not pedantic as it is now !
Signed-off-by: Vineet Gupta <vgupta(a)synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arc/include/asm/uaccess.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/arch/arc/include/asm/uaccess.h
+++ b/arch/arc/include/asm/uaccess.h
@@ -668,6 +668,7 @@ __arc_strncpy_from_user(char *dst, const
return 0;
__asm__ __volatile__(
+ " mov lp_count, %5 \n"
" lp 3f \n"
"1: ldb.ab %3, [%2, 1] \n"
" breq.d %3, 0, 3f \n"
@@ -684,8 +685,8 @@ __arc_strncpy_from_user(char *dst, const
" .word 1b, 4b \n"
" .previous \n"
: "+r"(res), "+r"(dst), "+r"(src), "=r"(val)
- : "g"(-EFAULT), "l"(count)
- : "memory");
+ : "g"(-EFAULT), "r"(count)
+ : "lp_count", "lp_start", "lp_end", "memory");
return res;
}
Patches currently in stable-queue which might be from vgupta(a)synopsys.com are
queue-4.14/arc-uaccess-dont-use-l-gcc-inline-asm-constraint-modifier.patch
This is a note to let you know that I've just added the patch titled
Input: elantech - add new icbody type 15
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
input-elantech-add-new-icbody-type-15.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 10d900303f1c3a821eb0bef4e7b7ece16768fba4 Mon Sep 17 00:00:00 2001
From: Aaron Ma <aaron.ma(a)canonical.com>
Date: Sat, 25 Nov 2017 16:48:41 -0800
Subject: Input: elantech - add new icbody type 15
From: Aaron Ma <aaron.ma(a)canonical.com>
commit 10d900303f1c3a821eb0bef4e7b7ece16768fba4 upstream.
The touchpad of Lenovo Thinkpad L480 reports it's version as 15.
Signed-off-by: Aaron Ma <aaron.ma(a)canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/input/mouse/elantech.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/input/mouse/elantech.c
+++ b/drivers/input/mouse/elantech.c
@@ -1558,7 +1558,7 @@ static int elantech_set_properties(struc
case 5:
etd->hw_version = 3;
break;
- case 6 ... 14:
+ case 6 ... 15:
etd->hw_version = 4;
break;
default:
Patches currently in stable-queue which might be from aaron.ma(a)canonical.com are
queue-3.18/input-elantech-add-new-icbody-type-15.patch
The UAS mode of Norelsys NS1068(X) is reported to fail to work on
several platforms with the following error message:
xhci-hcd xhci-hcd.0.auto: ERROR Transfer event for unknown stream ring slot 1 ep 8
xhci-hcd xhci-hcd.0.auto: @00000000bf04a400 00000000 00000000 1b000000 01098001
And when trying to mount a partition on the disk the disk will
disconnect from the USB controller, then after re-connecting the device
will be offlined and not working at all.
Falling back to USB mass storage can solve this problem, so ignore UAS
function of this chip.
Cc: stable(a)vger.kernel.org
Signed-off-by: Icenowy Zheng <icenowy(a)aosc.io>
---
The NS1066 chip from the same vendor seems to also suffer from this
problem (its USB ID is 2537:1066) according to the report of Armbian
community. However I don't have such device (I have a USB HDD enclosure
with USB ID 2537:1066, but it doesn't report UAS function at all; as
it's
The `lsusb -v` result of my NS1068X is shown below:
Bus 004 Device 002: ID 2537:1068
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 3.00
bDeviceClass 0
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 9
idVendor 0x2537
idProduct 0x1068
bcdDevice 1.00
iManufacturer 1 Norelsys
iProduct 2 NS1068
iSerial 3 0123456789ABCDE
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 121
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xc0
Self Powered
MaxPower 2mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 8
bInterfaceSubClass 6
bInterfaceProtocol 80
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 15
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 15
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 1
bNumEndpoints 4
bInterfaceClass 8
bInterfaceSubClass 6
bInterfaceProtocol 98
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 15
MaxStreams 8
Status pipe (0x02)
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x06 EP 6 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 15
Command pipe (0x01)
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x84 EP 4 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 15
MaxStreams 8
Data-in pipe (0x03)
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x05 EP 5 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 15
MaxStreams 8
Data-out pipe (0x04)
Binary Object Store Descriptor:
bLength 5
bDescriptorType 15
wTotalLength 22
bNumDeviceCaps 2
USB 2.0 Extension Device Capability:
bLength 7
bDescriptorType 16
bDevCapabilityType 2
bmAttributes 0x00000002
HIRD Link Power Management (LPM) Supported
SuperSpeed USB Device Capability:
bLength 10
bDescriptorType 16
bDevCapabilityType 3
bmAttributes 0x00
wSpeedsSupported 0x000e
Device can operate at Full Speed (12Mbps)
Device can operate at High Speed (480Mbps)
Device can operate at SuperSpeed (5Gbps)
bFunctionalitySupport 1
Lowest fully-functional device speed is Full Speed (12Mbps)
bU1DevExitLat 10 micro seconds
bU2DevExitLat 2047 micro seconds
can't get debug descriptor: Resource temporarily unavailable
Device Status: 0x0001
Self Powered
drivers/usb/storage/unusual_uas.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/usb/storage/unusual_uas.h b/drivers/usb/storage/unusual_uas.h
index 9c2ee55ad32e..38434d88954a 100644
--- a/drivers/usb/storage/unusual_uas.h
+++ b/drivers/usb/storage/unusual_uas.h
@@ -80,6 +80,13 @@ UNUSUAL_DEV(0x2109, 0x0711, 0x0000, 0x9999,
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_NO_ATA_1X),
+/* Reported-by: Icenowy Zheng <icenowy(a)aosc.io> */
+UNUSUAL_DEV(0x2537, 0x1068, 0x0000, 0x9999,
+ "Norelsys",
+ "NS1068X",
+ USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+ US_FL_IGNORE_UAS),
+
/* Reported-by: Takeo Nakayama <javhera(a)gmx.com> */
UNUSUAL_DEV(0x357d, 0x7788, 0x0000, 0x9999,
"JMicron",
--
2.14.2
This is a note to let you know that I've just added the patch titled
kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 426915796ccaf9c2bd9bb06dc5702225957bc2e5 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Fri, 17 Nov 2017 15:30:08 -0800
Subject: kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
From: Oleg Nesterov <oleg(a)redhat.com>
commit 426915796ccaf9c2bd9bb06dc5702225957bc2e5 upstream.
complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
the thread group, today this is wrong in many ways.
If nothing else, fatal_signal_pending() should always imply that the
whole thread group (except ->group_exit_task if it is not NULL) is
killed, this check breaks the rule.
After the previous changes we can rely on sig_task_ignored();
sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
to kill this task and sig == SIGKILL OR it is traced and debugger can
intercept the signal.
This should hopefully fix the problem reported by Dmitry. This
test-case
static int init(void *arg)
{
for (;;)
pause();
}
int main(void)
{
char stack[16 * 1024];
for (;;) {
int pid = clone(init, stack + sizeof(stack)/2,
CLONE_NEWPID | SIGCHLD, NULL);
assert(pid > 0);
assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
assert(waitpid(-1, NULL, WSTOPPED) == pid);
assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
assert(pid == wait(NULL));
}
}
triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
task_participate_group_stop(). do_signal_stop()->signal_group_exit()
checks SIGNAL_GROUP_EXIT and return false, but task_set_jobctl_pending()
checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.
And his should fix the minor security problem reported by Kyle,
SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
task is the root of a pid namespace.
Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Reported-by: Dmitry Vyukov <dvyukov(a)google.com>
Reported-by: Kyle Huey <me(a)kylehuey.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Tested-by: Kyle Huey <me(a)kylehuey.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/signal.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -919,9 +919,9 @@ static void complete_signal(int sig, str
* then start taking the whole group down immediately.
*/
if (sig_fatal(p, sig) &&
- !(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) &&
+ !(signal->flags & SIGNAL_GROUP_EXIT) &&
!sigismember(&t->real_blocked, sig) &&
- (sig == SIGKILL || !t->ptrace)) {
+ (sig == SIGKILL || !p->ptrace)) {
/*
* This signal will be fatal to the whole group.
*/
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-4.9/kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
queue-4.9/kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
queue-4.9/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
queue-4.9/kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
This is a note to let you know that I've just added the patch titled
kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 628c1bcba204052d19b686b5bac149a644cdb72e Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Fri, 17 Nov 2017 15:30:01 -0800
Subject: kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
From: Oleg Nesterov <oleg(a)redhat.com>
commit 628c1bcba204052d19b686b5bac149a644cdb72e upstream.
The comment in sig_ignored() says "Tracers may want to know about even
ignored signals" but SIGKILL can not be reported to debugger and it is
just wrong to return 0 in this case: SIGKILL should only kill the
SIGNAL_UNKILLABLE task if it comes from the parent ns.
Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on
sig_task_ignored().
SISGTOP coming from within the namespace is not really right too but at
least debugger can intercept it, and we can't drop it here because this
will break "gdb -p 1": ptrace_attach() won't work. Perhaps we will add
another ->ptrace check later, we will see.
Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Tested-by: Kyle Huey <me(a)kylehuey.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/signal.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -88,13 +88,15 @@ static int sig_ignored(struct task_struc
if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
return 0;
- if (!sig_task_ignored(t, sig, force))
- return 0;
-
/*
- * Tracers may want to know about even ignored signals.
+ * Tracers may want to know about even ignored signal unless it
+ * is SIGKILL which can't be reported anyway but can be ignored
+ * by SIGNAL_UNKILLABLE task.
*/
- return !t->ptrace;
+ if (t->ptrace && sig != SIGKILL)
+ return 0;
+
+ return sig_task_ignored(t, sig, force);
}
/*
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-4.9/kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
queue-4.9/kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
queue-4.9/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
queue-4.9/kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
This is a note to let you know that I've just added the patch titled
kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ac25385089f673560867eb5179228a44ade0cfc1 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Fri, 17 Nov 2017 15:30:04 -0800
Subject: kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
From: Oleg Nesterov <oleg(a)redhat.com>
commit ac25385089f673560867eb5179228a44ade0cfc1 upstream.
Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only()
signals even if force == T. This simplifies the next change and this
matches the same check in get_signal() which will drop these signals
anyway.
Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Tested-by: Kyle Huey <me(a)kylehuey.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -72,7 +72,7 @@ static int sig_task_ignored(struct task_
handler = sig_handler(t, sig);
if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
- handler == SIG_DFL && !force)
+ handler == SIG_DFL && !(force && sig_kernel_only(sig)))
return 1;
return sig_handler_ignored(handler, sig);
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-4.9/kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
queue-4.9/kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
queue-4.9/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
queue-4.9/kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
This is a note to let you know that I've just added the patch titled
kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 426915796ccaf9c2bd9bb06dc5702225957bc2e5 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Fri, 17 Nov 2017 15:30:08 -0800
Subject: kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
From: Oleg Nesterov <oleg(a)redhat.com>
commit 426915796ccaf9c2bd9bb06dc5702225957bc2e5 upstream.
complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
the thread group, today this is wrong in many ways.
If nothing else, fatal_signal_pending() should always imply that the
whole thread group (except ->group_exit_task if it is not NULL) is
killed, this check breaks the rule.
After the previous changes we can rely on sig_task_ignored();
sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
to kill this task and sig == SIGKILL OR it is traced and debugger can
intercept the signal.
This should hopefully fix the problem reported by Dmitry. This
test-case
static int init(void *arg)
{
for (;;)
pause();
}
int main(void)
{
char stack[16 * 1024];
for (;;) {
int pid = clone(init, stack + sizeof(stack)/2,
CLONE_NEWPID | SIGCHLD, NULL);
assert(pid > 0);
assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
assert(waitpid(-1, NULL, WSTOPPED) == pid);
assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
assert(pid == wait(NULL));
}
}
triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
task_participate_group_stop(). do_signal_stop()->signal_group_exit()
checks SIGNAL_GROUP_EXIT and return false, but task_set_jobctl_pending()
checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.
And his should fix the minor security problem reported by Kyle,
SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
task is the root of a pid namespace.
Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Reported-by: Dmitry Vyukov <dvyukov(a)google.com>
Reported-by: Kyle Huey <me(a)kylehuey.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Tested-by: Kyle Huey <me(a)kylehuey.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/signal.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -919,9 +919,9 @@ static void complete_signal(int sig, str
* then start taking the whole group down immediately.
*/
if (sig_fatal(p, sig) &&
- !(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) &&
+ !(signal->flags & SIGNAL_GROUP_EXIT) &&
!sigismember(&t->real_blocked, sig) &&
- (sig == SIGKILL || !t->ptrace)) {
+ (sig == SIGKILL || !p->ptrace)) {
/*
* This signal will be fatal to the whole group.
*/
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-4.4/kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
queue-4.4/kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
queue-4.4/x86-kasan-write-protect-kasan-zero-shadow.patch
queue-4.4/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
queue-4.4/kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
This is a note to let you know that I've just added the patch titled
kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 628c1bcba204052d19b686b5bac149a644cdb72e Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Fri, 17 Nov 2017 15:30:01 -0800
Subject: kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
From: Oleg Nesterov <oleg(a)redhat.com>
commit 628c1bcba204052d19b686b5bac149a644cdb72e upstream.
The comment in sig_ignored() says "Tracers may want to know about even
ignored signals" but SIGKILL can not be reported to debugger and it is
just wrong to return 0 in this case: SIGKILL should only kill the
SIGNAL_UNKILLABLE task if it comes from the parent ns.
Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on
sig_task_ignored().
SISGTOP coming from within the namespace is not really right too but at
least debugger can intercept it, and we can't drop it here because this
will break "gdb -p 1": ptrace_attach() won't work. Perhaps we will add
another ->ptrace check later, we will see.
Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Tested-by: Kyle Huey <me(a)kylehuey.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/signal.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -88,13 +88,15 @@ static int sig_ignored(struct task_struc
if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
return 0;
- if (!sig_task_ignored(t, sig, force))
- return 0;
-
/*
- * Tracers may want to know about even ignored signals.
+ * Tracers may want to know about even ignored signal unless it
+ * is SIGKILL which can't be reported anyway but can be ignored
+ * by SIGNAL_UNKILLABLE task.
*/
- return !t->ptrace;
+ if (t->ptrace && sig != SIGKILL)
+ return 0;
+
+ return sig_task_ignored(t, sig, force);
}
/*
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-4.4/kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
queue-4.4/kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
queue-4.4/x86-kasan-write-protect-kasan-zero-shadow.patch
queue-4.4/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
queue-4.4/kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
This is a note to let you know that I've just added the patch titled
kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ac25385089f673560867eb5179228a44ade0cfc1 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Fri, 17 Nov 2017 15:30:04 -0800
Subject: kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
From: Oleg Nesterov <oleg(a)redhat.com>
commit ac25385089f673560867eb5179228a44ade0cfc1 upstream.
Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only()
signals even if force == T. This simplifies the next change and this
matches the same check in get_signal() which will drop these signals
anyway.
Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Tested-by: Kyle Huey <me(a)kylehuey.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -72,7 +72,7 @@ static int sig_task_ignored(struct task_
handler = sig_handler(t, sig);
if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
- handler == SIG_DFL && !force)
+ handler == SIG_DFL && !(force && sig_kernel_only(sig)))
return 1;
return sig_handler_ignored(handler, sig);
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-4.4/kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
queue-4.4/kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
queue-4.4/x86-kasan-write-protect-kasan-zero-shadow.patch
queue-4.4/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
queue-4.4/kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
This is a note to let you know that I've just added the patch titled
kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 426915796ccaf9c2bd9bb06dc5702225957bc2e5 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Fri, 17 Nov 2017 15:30:08 -0800
Subject: kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
From: Oleg Nesterov <oleg(a)redhat.com>
commit 426915796ccaf9c2bd9bb06dc5702225957bc2e5 upstream.
complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
the thread group, today this is wrong in many ways.
If nothing else, fatal_signal_pending() should always imply that the
whole thread group (except ->group_exit_task if it is not NULL) is
killed, this check breaks the rule.
After the previous changes we can rely on sig_task_ignored();
sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
to kill this task and sig == SIGKILL OR it is traced and debugger can
intercept the signal.
This should hopefully fix the problem reported by Dmitry. This
test-case
static int init(void *arg)
{
for (;;)
pause();
}
int main(void)
{
char stack[16 * 1024];
for (;;) {
int pid = clone(init, stack + sizeof(stack)/2,
CLONE_NEWPID | SIGCHLD, NULL);
assert(pid > 0);
assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
assert(waitpid(-1, NULL, WSTOPPED) == pid);
assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
assert(pid == wait(NULL));
}
}
triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
task_participate_group_stop(). do_signal_stop()->signal_group_exit()
checks SIGNAL_GROUP_EXIT and return false, but task_set_jobctl_pending()
checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.
And his should fix the minor security problem reported by Kyle,
SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
task is the root of a pid namespace.
Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Reported-by: Dmitry Vyukov <dvyukov(a)google.com>
Reported-by: Kyle Huey <me(a)kylehuey.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Tested-by: Kyle Huey <me(a)kylehuey.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/signal.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -931,9 +931,9 @@ static void complete_signal(int sig, str
* then start taking the whole group down immediately.
*/
if (sig_fatal(p, sig) &&
- !(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) &&
+ !(signal->flags & SIGNAL_GROUP_EXIT) &&
!sigismember(&t->real_blocked, sig) &&
- (sig == SIGKILL || !t->ptrace)) {
+ (sig == SIGKILL || !p->ptrace)) {
/*
* This signal will be fatal to the whole group.
*/
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-4.14/kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
queue-4.14/kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
queue-4.14/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
queue-4.14/kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
This is a note to let you know that I've just added the patch titled
kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 628c1bcba204052d19b686b5bac149a644cdb72e Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Fri, 17 Nov 2017 15:30:01 -0800
Subject: kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
From: Oleg Nesterov <oleg(a)redhat.com>
commit 628c1bcba204052d19b686b5bac149a644cdb72e upstream.
The comment in sig_ignored() says "Tracers may want to know about even
ignored signals" but SIGKILL can not be reported to debugger and it is
just wrong to return 0 in this case: SIGKILL should only kill the
SIGNAL_UNKILLABLE task if it comes from the parent ns.
Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on
sig_task_ignored().
SISGTOP coming from within the namespace is not really right too but at
least debugger can intercept it, and we can't drop it here because this
will break "gdb -p 1": ptrace_attach() won't work. Perhaps we will add
another ->ptrace check later, we will see.
Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Tested-by: Kyle Huey <me(a)kylehuey.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/signal.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -94,13 +94,15 @@ static int sig_ignored(struct task_struc
if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
return 0;
- if (!sig_task_ignored(t, sig, force))
- return 0;
-
/*
- * Tracers may want to know about even ignored signals.
+ * Tracers may want to know about even ignored signal unless it
+ * is SIGKILL which can't be reported anyway but can be ignored
+ * by SIGNAL_UNKILLABLE task.
*/
- return !t->ptrace;
+ if (t->ptrace && sig != SIGKILL)
+ return 0;
+
+ return sig_task_ignored(t, sig, force);
}
/*
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-4.14/kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
queue-4.14/kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
queue-4.14/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
queue-4.14/kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
This is a note to let you know that I've just added the patch titled
kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ac25385089f673560867eb5179228a44ade0cfc1 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Fri, 17 Nov 2017 15:30:04 -0800
Subject: kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
From: Oleg Nesterov <oleg(a)redhat.com>
commit ac25385089f673560867eb5179228a44ade0cfc1 upstream.
Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only()
signals even if force == T. This simplifies the next change and this
matches the same check in get_signal() which will drop these signals
anyway.
Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Tested-by: Kyle Huey <me(a)kylehuey.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -78,7 +78,7 @@ static int sig_task_ignored(struct task_
handler = sig_handler(t, sig);
if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
- handler == SIG_DFL && !force)
+ handler == SIG_DFL && !(force && sig_kernel_only(sig)))
return 1;
return sig_handler_ignored(handler, sig);
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-4.14/kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
queue-4.14/kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
queue-4.14/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
queue-4.14/kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
This is a note to let you know that I've just added the patch titled
kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 426915796ccaf9c2bd9bb06dc5702225957bc2e5 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Fri, 17 Nov 2017 15:30:08 -0800
Subject: kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
From: Oleg Nesterov <oleg(a)redhat.com>
commit 426915796ccaf9c2bd9bb06dc5702225957bc2e5 upstream.
complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
the thread group, today this is wrong in many ways.
If nothing else, fatal_signal_pending() should always imply that the
whole thread group (except ->group_exit_task if it is not NULL) is
killed, this check breaks the rule.
After the previous changes we can rely on sig_task_ignored();
sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
to kill this task and sig == SIGKILL OR it is traced and debugger can
intercept the signal.
This should hopefully fix the problem reported by Dmitry. This
test-case
static int init(void *arg)
{
for (;;)
pause();
}
int main(void)
{
char stack[16 * 1024];
for (;;) {
int pid = clone(init, stack + sizeof(stack)/2,
CLONE_NEWPID | SIGCHLD, NULL);
assert(pid > 0);
assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
assert(waitpid(-1, NULL, WSTOPPED) == pid);
assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
assert(pid == wait(NULL));
}
}
triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
task_participate_group_stop(). do_signal_stop()->signal_group_exit()
checks SIGNAL_GROUP_EXIT and return false, but task_set_jobctl_pending()
checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.
And his should fix the minor security problem reported by Kyle,
SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
task is the root of a pid namespace.
Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Reported-by: Dmitry Vyukov <dvyukov(a)google.com>
Reported-by: Kyle Huey <me(a)kylehuey.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Tested-by: Kyle Huey <me(a)kylehuey.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/signal.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -970,9 +970,9 @@ static void complete_signal(int sig, str
* then start taking the whole group down immediately.
*/
if (sig_fatal(p, sig) &&
- !(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) &&
+ !(signal->flags & SIGNAL_GROUP_EXIT) &&
!sigismember(&t->real_blocked, sig) &&
- (sig == SIGKILL || !t->ptrace)) {
+ (sig == SIGKILL || !p->ptrace)) {
/*
* This signal will be fatal to the whole group.
*/
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-3.18/kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
queue-3.18/kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
queue-3.18/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
queue-3.18/kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
This is a note to let you know that I've just added the patch titled
kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 628c1bcba204052d19b686b5bac149a644cdb72e Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Fri, 17 Nov 2017 15:30:01 -0800
Subject: kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
From: Oleg Nesterov <oleg(a)redhat.com>
commit 628c1bcba204052d19b686b5bac149a644cdb72e upstream.
The comment in sig_ignored() says "Tracers may want to know about even
ignored signals" but SIGKILL can not be reported to debugger and it is
just wrong to return 0 in this case: SIGKILL should only kill the
SIGNAL_UNKILLABLE task if it comes from the parent ns.
Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on
sig_task_ignored().
SISGTOP coming from within the namespace is not really right too but at
least debugger can intercept it, and we can't drop it here because this
will break "gdb -p 1": ptrace_attach() won't work. Perhaps we will add
another ->ptrace check later, we will see.
Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Tested-by: Kyle Huey <me(a)kylehuey.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/signal.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -88,13 +88,15 @@ static int sig_ignored(struct task_struc
if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
return 0;
- if (!sig_task_ignored(t, sig, force))
- return 0;
-
/*
- * Tracers may want to know about even ignored signals.
+ * Tracers may want to know about even ignored signal unless it
+ * is SIGKILL which can't be reported anyway but can be ignored
+ * by SIGNAL_UNKILLABLE task.
*/
- return !t->ptrace;
+ if (t->ptrace && sig != SIGKILL)
+ return 0;
+
+ return sig_task_ignored(t, sig, force);
}
/*
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-3.18/kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
queue-3.18/kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
queue-3.18/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
queue-3.18/kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
This is a note to let you know that I've just added the patch titled
kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ac25385089f673560867eb5179228a44ade0cfc1 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Fri, 17 Nov 2017 15:30:04 -0800
Subject: kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
From: Oleg Nesterov <oleg(a)redhat.com>
commit ac25385089f673560867eb5179228a44ade0cfc1 upstream.
Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only()
signals even if force == T. This simplifies the next change and this
matches the same check in get_signal() which will drop these signals
anyway.
Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Tested-by: Kyle Huey <me(a)kylehuey.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -72,7 +72,7 @@ static int sig_task_ignored(struct task_
handler = sig_handler(t, sig);
if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
- handler == SIG_DFL && !force)
+ handler == SIG_DFL && !(force && sig_kernel_only(sig)))
return 1;
return sig_handler_ignored(handler, sig);
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-3.18/kernel-signal.c-remove-the-no-longer-needed-signal_unkillable-check-in-complete_signal.patch
queue-3.18/kernel-signal.c-protect-the-signal_unkillable-tasks-from-sig_kernel_only-signals.patch
queue-3.18/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
queue-3.18/kernel-signal.c-protect-the-traced-signal_unkillable-tasks-from-sigkill.patch
Hi Greg at al,
Please include mainline commits
b29c6ef7bb12 x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu()
7d5905dc14a8 x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
(in this order) into 4.14.x (they also are applicable to 4.13.x if
that is still being maintained by anyone).
Thanks,
Rafael
This is a note to let you know that I've just added the patch titled
x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-cpu-avoid-unnecessary-ipis-in-arch_freq_get_on_cpu.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b29c6ef7bb1257853c1e31616d84f55e561cf631 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Mon, 13 Nov 2017 02:15:39 +0100
Subject: x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu()
From: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
commit b29c6ef7bb1257853c1e31616d84f55e561cf631 upstream.
Even though aperfmperf_snapshot_khz() caches the samples.khz value to
return if called again in a sufficiently short time, its caller,
arch_freq_get_on_cpu(), still uses smp_call_function_single() to run it
which may allow user space to trigger an IPI storm by reading from the
scaling_cur_freq cpufreq sysfs file in a tight loop.
To avoid that, move the decision on whether or not to return the cached
samples.khz value to arch_freq_get_on_cpu().
This change was part of commit 941f5f0f6ef5 ("x86: CPU: Fix up "cpu MHz"
in /proc/cpuinfo"), but it was not the reason for the revert and it
remains applicable.
Fixes: 4815d3c56d1e (cpufreq: x86: Make scaling_cur_freq behave more as expected)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Reviewed-by: WANG Chao <chao.wang(a)ucloud.cn>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/cpu/aperfmperf.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -42,10 +42,6 @@ static void aperfmperf_snapshot_khz(void
s64 time_delta = ktime_ms_delta(now, s->time);
unsigned long flags;
- /* Don't bother re-computing within the cache threshold time. */
- if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
- return;
-
local_irq_save(flags);
rdmsrl(MSR_IA32_APERF, aperf);
rdmsrl(MSR_IA32_MPERF, mperf);
@@ -74,6 +70,7 @@ static void aperfmperf_snapshot_khz(void
unsigned int arch_freq_get_on_cpu(int cpu)
{
+ s64 time_delta;
unsigned int khz;
if (!cpu_khz)
@@ -82,6 +79,12 @@ unsigned int arch_freq_get_on_cpu(int cp
if (!static_cpu_has(X86_FEATURE_APERFMPERF))
return 0;
+ /* Don't bother re-computing within the cache threshold time. */
+ time_delta = ktime_ms_delta(ktime_get(), per_cpu(samples.time, cpu));
+ khz = per_cpu(samples.khz, cpu);
+ if (khz && time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
+ return khz;
+
smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
khz = per_cpu(samples.khz, cpu);
if (khz)
Patches currently in stable-queue which might be from rafael.j.wysocki(a)intel.com are
queue-4.14/x86-cpu-always-show-current-cpu-frequency-in-proc-cpuinfo.patch
queue-4.14/x86-cpu-avoid-unnecessary-ipis-in-arch_freq_get_on_cpu.patch
This is a note to let you know that I've just added the patch titled
x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-cpu-always-show-current-cpu-frequency-in-proc-cpuinfo.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 7d5905dc14a87805a59f3c5bf70173aac2bb18f8 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Wed, 15 Nov 2017 02:13:40 +0100
Subject: x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
From: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
commit 7d5905dc14a87805a59f3c5bf70173aac2bb18f8 upstream.
After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get()
for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
on x86 can be either the nominal CPU frequency (which is constant)
or the frequency most recently requested by a scaling governor in
cpufreq, depending on the cpufreq configuration. That is somewhat
inconsistent and is different from what it was before 4.13, so in
order to restore the previous behavior, make it report the current
CPU frequency like the scaling_cur_freq sysfs file in cpufreq.
To that end, modify the /proc/cpuinfo implementation on x86 to use
aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
registers, if available, and use their values to compute the CPU
frequency to be reported as "cpu MHz".
However, do that carefully enough to avoid accumulating delays that
lead to unacceptable access times for /proc/cpuinfo on systems with
many CPUs. Run aperfmperf_snapshot_khz() once on all CPUs
asynchronously at the /proc/cpuinfo open time, add a single delay
upfront (if necessary) at that point and simply compute the current
frequency while running show_cpuinfo() for each individual CPU.
Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce
the default delay between consecutive APERF and MPERF reads to 10 ms,
which should be sufficient to get large enough numbers for the
frequency computation in all cases.
Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/cpu/Makefile | 2 -
arch/x86/kernel/cpu/aperfmperf.c | 74 +++++++++++++++++++++++++++------------
arch/x86/kernel/cpu/cpu.h | 3 +
arch/x86/kernel/cpu/proc.c | 6 ++-
fs/proc/cpuinfo.c | 6 +++
include/linux/cpufreq.h | 1
6 files changed, 68 insertions(+), 24 deletions(-)
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -22,7 +22,7 @@ obj-y += common.o
obj-y += rdrand.o
obj-y += match.o
obj-y += bugs.o
-obj-$(CONFIG_CPU_FREQ) += aperfmperf.o
+obj-y += aperfmperf.o
obj-y += cpuid-deps.o
obj-$(CONFIG_PROC_FS) += proc.o
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -14,6 +14,8 @@
#include <linux/percpu.h>
#include <linux/smp.h>
+#include "cpu.h"
+
struct aperfmperf_sample {
unsigned int khz;
ktime_t time;
@@ -24,7 +26,7 @@ struct aperfmperf_sample {
static DEFINE_PER_CPU(struct aperfmperf_sample, samples);
#define APERFMPERF_CACHE_THRESHOLD_MS 10
-#define APERFMPERF_REFRESH_DELAY_MS 20
+#define APERFMPERF_REFRESH_DELAY_MS 10
#define APERFMPERF_STALE_THRESHOLD_MS 1000
/*
@@ -38,8 +40,6 @@ static void aperfmperf_snapshot_khz(void
u64 aperf, aperf_delta;
u64 mperf, mperf_delta;
struct aperfmperf_sample *s = this_cpu_ptr(&samples);
- ktime_t now = ktime_get();
- s64 time_delta = ktime_ms_delta(now, s->time);
unsigned long flags;
local_irq_save(flags);
@@ -57,38 +57,68 @@ static void aperfmperf_snapshot_khz(void
if (mperf_delta == 0)
return;
- s->time = now;
+ s->time = ktime_get();
s->aperf = aperf;
s->mperf = mperf;
-
- /* If the previous iteration was too long ago, discard it. */
- if (time_delta > APERFMPERF_STALE_THRESHOLD_MS)
- s->khz = 0;
- else
- s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
+ s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
}
-unsigned int arch_freq_get_on_cpu(int cpu)
+static bool aperfmperf_snapshot_cpu(int cpu, ktime_t now, bool wait)
{
- s64 time_delta;
- unsigned int khz;
+ s64 time_delta = ktime_ms_delta(now, per_cpu(samples.time, cpu));
+
+ /* Don't bother re-computing within the cache threshold time. */
+ if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
+ return true;
+
+ smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, wait);
+
+ /* Return false if the previous iteration was too long ago. */
+ return time_delta <= APERFMPERF_STALE_THRESHOLD_MS;
+}
+unsigned int aperfmperf_get_khz(int cpu)
+{
if (!cpu_khz)
return 0;
if (!static_cpu_has(X86_FEATURE_APERFMPERF))
return 0;
- /* Don't bother re-computing within the cache threshold time. */
- time_delta = ktime_ms_delta(ktime_get(), per_cpu(samples.time, cpu));
- khz = per_cpu(samples.khz, cpu);
- if (khz && time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
- return khz;
+ aperfmperf_snapshot_cpu(cpu, ktime_get(), true);
+ return per_cpu(samples.khz, cpu);
+}
- smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
- khz = per_cpu(samples.khz, cpu);
- if (khz)
- return khz;
+void arch_freq_prepare_all(void)
+{
+ ktime_t now = ktime_get();
+ bool wait = false;
+ int cpu;
+
+ if (!cpu_khz)
+ return;
+
+ if (!static_cpu_has(X86_FEATURE_APERFMPERF))
+ return;
+
+ for_each_online_cpu(cpu)
+ if (!aperfmperf_snapshot_cpu(cpu, now, false))
+ wait = true;
+
+ if (wait)
+ msleep(APERFMPERF_REFRESH_DELAY_MS);
+}
+
+unsigned int arch_freq_get_on_cpu(int cpu)
+{
+ if (!cpu_khz)
+ return 0;
+
+ if (!static_cpu_has(X86_FEATURE_APERFMPERF))
+ return 0;
+
+ if (aperfmperf_snapshot_cpu(cpu, ktime_get(), true))
+ return per_cpu(samples.khz, cpu);
msleep(APERFMPERF_REFRESH_DELAY_MS);
smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -47,4 +47,7 @@ extern const struct cpu_dev *const __x86
extern void get_cpu_cap(struct cpuinfo_x86 *c);
extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
+
+unsigned int aperfmperf_get_khz(int cpu);
+
#endif /* ARCH_X86_CPU_H */
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -5,6 +5,8 @@
#include <linux/seq_file.h>
#include <linux/cpufreq.h>
+#include "cpu.h"
+
/*
* Get CPU information for use by the procfs.
*/
@@ -78,9 +80,11 @@ static int show_cpuinfo(struct seq_file
seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
if (cpu_has(c, X86_FEATURE_TSC)) {
- unsigned int freq = cpufreq_quick_get(cpu);
+ unsigned int freq = aperfmperf_get_khz(cpu);
if (!freq)
+ freq = cpufreq_quick_get(cpu);
+ if (!freq)
freq = cpu_khz;
seq_printf(m, "cpu MHz\t\t: %u.%03u\n",
freq / 1000, (freq % 1000));
--- a/fs/proc/cpuinfo.c
+++ b/fs/proc/cpuinfo.c
@@ -1,12 +1,18 @@
// SPDX-License-Identifier: GPL-2.0
+#include <linux/cpufreq.h>
#include <linux/fs.h>
#include <linux/init.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
+__weak void arch_freq_prepare_all(void)
+{
+}
+
extern const struct seq_operations cpuinfo_op;
static int cpuinfo_open(struct inode *inode, struct file *file)
{
+ arch_freq_prepare_all();
return seq_open(file, &cpuinfo_op);
}
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -917,6 +917,7 @@ static inline bool policy_has_boost_freq
}
#endif
+extern void arch_freq_prepare_all(void);
extern unsigned int arch_freq_get_on_cpu(int cpu);
/* the following are really really optional */
Patches currently in stable-queue which might be from rafael.j.wysocki(a)intel.com are
queue-4.14/x86-cpu-always-show-current-cpu-frequency-in-proc-cpuinfo.patch
queue-4.14/x86-cpu-avoid-unnecessary-ipis-in-arch_freq_get_on_cpu.patch
This is a note to let you know that I've just added the patch titled
fscache: Fix the default for fscache_maybe_release_page()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
fscache-fix-the-default-for-fscache_maybe_release_page.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 98801506552593c9b8ac11021b0cdad12cab4f6b Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Tue, 2 Jan 2018 10:02:19 +0000
Subject: fscache: Fix the default for fscache_maybe_release_page()
From: David Howells <dhowells(a)redhat.com>
commit 98801506552593c9b8ac11021b0cdad12cab4f6b upstream.
Fix the default for fscache_maybe_release_page() for when the cookie isn't
valid or the page isn't cached. It mustn't return false as that indicates
the page cannot yet be freed.
The problem with the default is that if, say, there's no cache, but a
network filesystem's pages are using up almost all the available memory, a
system can OOM because the filesystem ->releasepage() op will not allow
them to be released as fscache_maybe_release_page() incorrectly prevents
it.
This can be tested by writing a sequence of 512MiB files to an AFS mount.
It does not affect NFS or CIFS because both of those wrap the call in a
check of PG_fscache and it shouldn't bother Ceph as that only has
PG_private set whilst writeback is in progress. This might be an issue for
9P, however.
Note that the pages aren't entirely stuck. Removing a file or unmounting
will clear things because that uses ->invalidatepage() instead.
Fixes: 201a15428bd5 ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Reported-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Reviewed-by: Jeff Layton <jlayton(a)redhat.com>
Acked-by: Al Viro <viro(a)zeniv.linux.org.uk>
Tested-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/fscache.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -755,7 +755,7 @@ bool fscache_maybe_release_page(struct f
{
if (fscache_cookie_valid(cookie) && PageFsCache(page))
return __fscache_maybe_release_page(cookie, page, gfp);
- return false;
+ return true;
}
/**
Patches currently in stable-queue which might be from dhowells(a)redhat.com are
queue-4.14/fscache-fix-the-default-for-fscache_maybe_release_page.patch
This is a note to let you know that I've just added the patch titled
kernel/acct.c: fix the acct->needcheck check in check_free_space()
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4d9570158b6260f449e317a5f9ed030c2504a615 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Thu, 4 Jan 2018 16:17:49 -0800
Subject: kernel/acct.c: fix the acct->needcheck check in check_free_space()
From: Oleg Nesterov <oleg(a)redhat.com>
commit 4d9570158b6260f449e317a5f9ed030c2504a615 upstream.
As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check
is very wrong, we need time_is_after_jiffies() to make sys_acct() work.
Ignoring the overflows, the code should "goto out" if needcheck >
jiffies, while currently it checks "needcheck < jiffies" and thus in the
likely case check_free_space() does nothing until jiffies overflow.
In particular this means that sys_acct() is simply broken, acct_on()
sets acct->needcheck = jiffies and expects that check_free_space()
should set acct->active = 1 after the free-space check, but this won't
happen if jiffies increments in between.
This was broken by commit 32dc73086015 ("get rid of timer in
kern/acct.c") in 2011, then another (correct) commit 795a2f22a8ea
("acct() should honour the limits from the very beginning") made the
problem more visible.
Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com
Fixes: 32dc73086015 ("get rid of timer in kern/acct.c")
Reported-by: TSUKADA Koutaro <tsukada(a)ascade.co.jp>
Suggested-by: TSUKADA Koutaro <tsukada(a)ascade.co.jp>
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/acct.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -96,7 +96,7 @@ static int check_free_space(struct bsd_a
{
struct kstatfs sbuf;
- if (time_is_before_jiffies(acct->needcheck))
+ if (time_is_after_jiffies(acct->needcheck))
goto out;
/* May block */
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-3.18/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
This is a note to let you know that I've just added the patch titled
fscache: Fix the default for fscache_maybe_release_page()
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
fscache-fix-the-default-for-fscache_maybe_release_page.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 98801506552593c9b8ac11021b0cdad12cab4f6b Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Tue, 2 Jan 2018 10:02:19 +0000
Subject: fscache: Fix the default for fscache_maybe_release_page()
From: David Howells <dhowells(a)redhat.com>
commit 98801506552593c9b8ac11021b0cdad12cab4f6b upstream.
Fix the default for fscache_maybe_release_page() for when the cookie isn't
valid or the page isn't cached. It mustn't return false as that indicates
the page cannot yet be freed.
The problem with the default is that if, say, there's no cache, but a
network filesystem's pages are using up almost all the available memory, a
system can OOM because the filesystem ->releasepage() op will not allow
them to be released as fscache_maybe_release_page() incorrectly prevents
it.
This can be tested by writing a sequence of 512MiB files to an AFS mount.
It does not affect NFS or CIFS because both of those wrap the call in a
check of PG_fscache and it shouldn't bother Ceph as that only has
PG_private set whilst writeback is in progress. This might be an issue for
9P, however.
Note that the pages aren't entirely stuck. Removing a file or unmounting
will clear things because that uses ->invalidatepage() instead.
Fixes: 201a15428bd5 ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Reported-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Reviewed-by: Jeff Layton <jlayton(a)redhat.com>
Acked-by: Al Viro <viro(a)zeniv.linux.org.uk>
Tested-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/fscache.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -764,7 +764,7 @@ bool fscache_maybe_release_page(struct f
{
if (fscache_cookie_valid(cookie) && PageFsCache(page))
return __fscache_maybe_release_page(cookie, page, gfp);
- return false;
+ return true;
}
/**
Patches currently in stable-queue which might be from dhowells(a)redhat.com are
queue-3.18/fscache-fix-the-default-for-fscache_maybe_release_page.patch
This is a note to let you know that I've just added the patch titled
crypto: n2 - cure use after free
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
crypto-n2-cure-use-after-free.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 203f45003a3d03eea8fa28d74cfc74c354416fdb Mon Sep 17 00:00:00 2001
From: Jan Engelhardt <jengelh(a)inai.de>
Date: Tue, 19 Dec 2017 19:09:07 +0100
Subject: crypto: n2 - cure use after free
From: Jan Engelhardt <jengelh(a)inai.de>
commit 203f45003a3d03eea8fa28d74cfc74c354416fdb upstream.
queue_cache_init is first called for the Control Word Queue
(n2_crypto_probe). At that time, queue_cache[0] is NULL and a new
kmem_cache will be allocated. If the subsequent n2_register_algs call
fails, the kmem_cache will be released in queue_cache_destroy, but
queue_cache_init[0] is not set back to NULL.
So when the Module Arithmetic Unit gets probed next (n2_mau_probe),
queue_cache_init will not allocate a kmem_cache again, but leave it
as its bogus value, causing a BUG() to trigger when queue_cache[0] is
eventually passed to kmem_cache_zalloc:
n2_crypto: Found N2CP at /virtual-devices@100/n2cp@7
n2_crypto: Registered NCS HVAPI version 2.0
called queue_cache_init
n2_crypto: md5 alg registration failed
n2cp f028687c: /virtual-devices@100/n2cp@7: Unable to register algorithms.
called queue_cache_destroy
n2cp: probe of f028687c failed with error -22
n2_crypto: Found NCP at /virtual-devices@100/ncp@6
n2_crypto: Registered NCS HVAPI version 2.0
called queue_cache_init
kernel BUG at mm/slab.c:2993!
Call Trace:
[0000000000604488] kmem_cache_alloc+0x1a8/0x1e0
(inlined) kmem_cache_zalloc
(inlined) new_queue
(inlined) spu_queue_setup
(inlined) handle_exec_unit
[0000000010c61eb4] spu_mdesc_scan+0x1f4/0x460 [n2_crypto]
[0000000010c62b80] n2_mau_probe+0x100/0x220 [n2_crypto]
[000000000084b174] platform_drv_probe+0x34/0xc0
Signed-off-by: Jan Engelhardt <jengelh(a)inai.de>
Acked-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/crypto/n2_core.c | 3 +++
1 file changed, 3 insertions(+)
--- a/drivers/crypto/n2_core.c
+++ b/drivers/crypto/n2_core.c
@@ -1641,6 +1641,7 @@ static int queue_cache_init(void)
CWQ_ENTRY_SIZE, 0, NULL);
if (!queue_cache[HV_NCS_QTYPE_CWQ - 1]) {
kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_MAU - 1]);
+ queue_cache[HV_NCS_QTYPE_MAU - 1] = NULL;
return -ENOMEM;
}
return 0;
@@ -1650,6 +1651,8 @@ static void queue_cache_destroy(void)
{
kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_MAU - 1]);
kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_CWQ - 1]);
+ queue_cache[HV_NCS_QTYPE_MAU - 1] = NULL;
+ queue_cache[HV_NCS_QTYPE_CWQ - 1] = NULL;
}
static int spu_queue_register(struct spu_queue *p, unsigned long q_type)
Patches currently in stable-queue which might be from jengelh(a)inai.de are
queue-3.18/crypto-n2-cure-use-after-free.patch
This is a note to let you know that I've just added the patch titled
sunxi-rsb: Include OF based modalias in device uevent
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sunxi-rsb-include-of-based-modalias-in-device-uevent.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e2bf801ecd4e62222a46d1ba9e57e710171d29c1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Stefan=20Br=C3=BCns?= <stefan.bruens(a)rwth-aachen.de>
Date: Mon, 27 Nov 2017 20:05:34 +0100
Subject: sunxi-rsb: Include OF based modalias in device uevent
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Stefan Brüns <stefan.bruens(a)rwth-aachen.de>
commit e2bf801ecd4e62222a46d1ba9e57e710171d29c1 upstream.
Include the OF-based modalias in the uevent sent when registering devices
on the sunxi RSB bus, so that user space has a chance to autoload the
kernel module for the device.
Fixes a regression caused by commit 3f241bfa60bd ("arm64: allwinner: a64:
pine64: Use dcdc1 regulator for mmc0"). When the axp20x-rsb module for
the AXP803 PMIC is built as a module, it is not loaded and the system
ends up with an disfunctional MMC controller.
Fixes: d787dcdb9c8f ("bus: sunxi-rsb: Add driver for Allwinner Reduced Serial Bus")
Acked-by: Chen-Yu Tsai <wens(a)csie.org>
Signed-off-by: Stefan Brüns <stefan.bruens(a)rwth-aachen.de>
Signed-off-by: Maxime Ripard <maxime.ripard(a)free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/bus/sunxi-rsb.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/bus/sunxi-rsb.c
+++ b/drivers/bus/sunxi-rsb.c
@@ -178,6 +178,7 @@ static struct bus_type sunxi_rsb_bus = {
.match = sunxi_rsb_device_match,
.probe = sunxi_rsb_device_probe,
.remove = sunxi_rsb_device_remove,
+ .uevent = of_device_uevent_modalias,
};
static void sunxi_rsb_dev_release(struct device *dev)
Patches currently in stable-queue which might be from stefan.bruens(a)rwth-aachen.de are
queue-4.9/sunxi-rsb-include-of-based-modalias-in-device-uevent.patch
This is a note to let you know that I've just added the patch titled
nbd: fix use-after-free of rq/bio in the xmit path
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
nbd-fix-use-after-free-of-rq-bio-in-the-xmit-path.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 429a787be6793554ee02aacc7e1f11ebcecc4453 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)fb.com>
Date: Thu, 17 Nov 2016 12:30:37 -0700
Subject: nbd: fix use-after-free of rq/bio in the xmit path
From: Jens Axboe <axboe(a)fb.com>
commit 429a787be6793554ee02aacc7e1f11ebcecc4453 upstream.
For writes, we can get a completion in while we're still iterating
the request and bio chain. If that happens, we're reading freed
memory and we can crash.
Break out after the last segment and avoid having the iterator
read freed memory.
Reviewed-by: Josef Bacik <jbacik(a)fb.com>
Signed-off-by: Jens Axboe <axboe(a)fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/block/nbd.c | 32 +++++++++++++++++++++++---------
1 file changed, 23 insertions(+), 9 deletions(-)
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -272,6 +272,7 @@ static int nbd_send_cmd(struct nbd_devic
int result, flags;
struct nbd_request request;
unsigned long size = blk_rq_bytes(req);
+ struct bio *bio;
u32 type;
if (req->cmd_type == REQ_TYPE_DRV_PRIV)
@@ -305,16 +306,20 @@ static int nbd_send_cmd(struct nbd_devic
return -EIO;
}
- if (type == NBD_CMD_WRITE) {
- struct req_iterator iter;
+ if (type != NBD_CMD_WRITE)
+ return 0;
+
+ flags = 0;
+ bio = req->bio;
+ while (bio) {
+ struct bio *next = bio->bi_next;
+ struct bvec_iter iter;
struct bio_vec bvec;
- /*
- * we are really probing at internals to determine
- * whether to set MSG_MORE or not...
- */
- rq_for_each_segment(bvec, req, iter) {
- flags = 0;
- if (!rq_iter_last(bvec, iter))
+
+ bio_for_each_segment(bvec, bio, iter) {
+ bool is_last = !next && bio_iter_last(bvec, iter);
+
+ if (is_last)
flags = MSG_MORE;
dev_dbg(nbd_to_dev(nbd), "request %p: sending %d bytes data\n",
cmd, bvec.bv_len);
@@ -325,7 +330,16 @@ static int nbd_send_cmd(struct nbd_devic
result);
return -EIO;
}
+ /*
+ * The completion might already have come in,
+ * so break for the last one instead of letting
+ * the iterator do it. This prevents use-after-free
+ * of the bio.
+ */
+ if (is_last)
+ break;
}
+ bio = next;
}
return 0;
}
Patches currently in stable-queue which might be from axboe(a)fb.com are
queue-4.9/nbd-fix-use-after-free-of-rq-bio-in-the-xmit-path.patch
This is a note to let you know that I've just added the patch titled
kernel/acct.c: fix the acct->needcheck check in check_free_space()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4d9570158b6260f449e317a5f9ed030c2504a615 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Thu, 4 Jan 2018 16:17:49 -0800
Subject: kernel/acct.c: fix the acct->needcheck check in check_free_space()
From: Oleg Nesterov <oleg(a)redhat.com>
commit 4d9570158b6260f449e317a5f9ed030c2504a615 upstream.
As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check
is very wrong, we need time_is_after_jiffies() to make sys_acct() work.
Ignoring the overflows, the code should "goto out" if needcheck >
jiffies, while currently it checks "needcheck < jiffies" and thus in the
likely case check_free_space() does nothing until jiffies overflow.
In particular this means that sys_acct() is simply broken, acct_on()
sets acct->needcheck = jiffies and expects that check_free_space()
should set acct->active = 1 after the free-space check, but this won't
happen if jiffies increments in between.
This was broken by commit 32dc73086015 ("get rid of timer in
kern/acct.c") in 2011, then another (correct) commit 795a2f22a8ea
("acct() should honour the limits from the very beginning") made the
problem more visible.
Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com
Fixes: 32dc73086015 ("get rid of timer in kern/acct.c")
Reported-by: TSUKADA Koutaro <tsukada(a)ascade.co.jp>
Suggested-by: TSUKADA Koutaro <tsukada(a)ascade.co.jp>
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/acct.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -99,7 +99,7 @@ static int check_free_space(struct bsd_a
{
struct kstatfs sbuf;
- if (time_is_before_jiffies(acct->needcheck))
+ if (time_is_after_jiffies(acct->needcheck))
goto out;
/* May block */
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-4.9/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
This is a note to let you know that I've just added the patch titled
fscache: Fix the default for fscache_maybe_release_page()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
fscache-fix-the-default-for-fscache_maybe_release_page.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 98801506552593c9b8ac11021b0cdad12cab4f6b Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Tue, 2 Jan 2018 10:02:19 +0000
Subject: fscache: Fix the default for fscache_maybe_release_page()
From: David Howells <dhowells(a)redhat.com>
commit 98801506552593c9b8ac11021b0cdad12cab4f6b upstream.
Fix the default for fscache_maybe_release_page() for when the cookie isn't
valid or the page isn't cached. It mustn't return false as that indicates
the page cannot yet be freed.
The problem with the default is that if, say, there's no cache, but a
network filesystem's pages are using up almost all the available memory, a
system can OOM because the filesystem ->releasepage() op will not allow
them to be released as fscache_maybe_release_page() incorrectly prevents
it.
This can be tested by writing a sequence of 512MiB files to an AFS mount.
It does not affect NFS or CIFS because both of those wrap the call in a
check of PG_fscache and it shouldn't bother Ceph as that only has
PG_private set whilst writeback is in progress. This might be an issue for
9P, however.
Note that the pages aren't entirely stuck. Removing a file or unmounting
will clear things because that uses ->invalidatepage() instead.
Fixes: 201a15428bd5 ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Reported-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Reviewed-by: Jeff Layton <jlayton(a)redhat.com>
Acked-by: Al Viro <viro(a)zeniv.linux.org.uk>
Tested-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/fscache.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -764,7 +764,7 @@ bool fscache_maybe_release_page(struct f
{
if (fscache_cookie_valid(cookie) && PageFsCache(page))
return __fscache_maybe_release_page(cookie, page, gfp);
- return false;
+ return true;
}
/**
Patches currently in stable-queue which might be from dhowells(a)redhat.com are
queue-4.9/fscache-fix-the-default-for-fscache_maybe_release_page.patch
This is a note to let you know that I've just added the patch titled
crypto: pcrypt - fix freeing pcrypt instances
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
crypto-pcrypt-fix-freeing-pcrypt-instances.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d76c68109f37cb85b243a1cf0f40313afd2bae68 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Wed, 20 Dec 2017 14:28:25 -0800
Subject: crypto: pcrypt - fix freeing pcrypt instances
From: Eric Biggers <ebiggers(a)google.com>
commit d76c68109f37cb85b243a1cf0f40313afd2bae68 upstream.
pcrypt is using the old way of freeing instances, where the ->free()
method specified in the 'struct crypto_template' is passed a pointer to
the 'struct crypto_instance'. But the crypto_instance is being
kfree()'d directly, which is incorrect because the memory was actually
allocated as an aead_instance, which contains the crypto_instance at a
nonzero offset. Thus, the wrong pointer was being kfree()'d.
Fix it by switching to the new way to free aead_instance's where the
->free() method is specified in the aead_instance itself.
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Fixes: 0496f56065e0 ("crypto: pcrypt - Add support for new AEAD interface")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
crypto/pcrypt.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -254,6 +254,14 @@ static void pcrypt_aead_exit_tfm(struct
crypto_free_aead(ctx->child);
}
+static void pcrypt_free(struct aead_instance *inst)
+{
+ struct pcrypt_instance_ctx *ctx = aead_instance_ctx(inst);
+
+ crypto_drop_aead(&ctx->spawn);
+ kfree(inst);
+}
+
static int pcrypt_init_instance(struct crypto_instance *inst,
struct crypto_alg *alg)
{
@@ -319,6 +327,8 @@ static int pcrypt_create_aead(struct cry
inst->alg.encrypt = pcrypt_aead_encrypt;
inst->alg.decrypt = pcrypt_aead_decrypt;
+ inst->free = pcrypt_free;
+
err = aead_register_instance(tmpl, inst);
if (err)
goto out_drop_aead;
@@ -349,14 +359,6 @@ static int pcrypt_create(struct crypto_t
return -EINVAL;
}
-static void pcrypt_free(struct crypto_instance *inst)
-{
- struct pcrypt_instance_ctx *ctx = crypto_instance_ctx(inst);
-
- crypto_drop_aead(&ctx->spawn);
- kfree(inst);
-}
-
static int pcrypt_cpumask_change_notify(struct notifier_block *self,
unsigned long val, void *data)
{
@@ -469,7 +471,6 @@ static void pcrypt_fini_padata(struct pa
static struct crypto_template pcrypt_tmpl = {
.name = "pcrypt",
.create = pcrypt_create,
- .free = pcrypt_free,
.module = THIS_MODULE,
};
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-4.9/crypto-chacha20poly1305-validate-the-digest-size.patch
queue-4.9/crypto-pcrypt-fix-freeing-pcrypt-instances.patch
This is a note to let you know that I've just added the patch titled
crypto: n2 - cure use after free
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
crypto-n2-cure-use-after-free.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 203f45003a3d03eea8fa28d74cfc74c354416fdb Mon Sep 17 00:00:00 2001
From: Jan Engelhardt <jengelh(a)inai.de>
Date: Tue, 19 Dec 2017 19:09:07 +0100
Subject: crypto: n2 - cure use after free
From: Jan Engelhardt <jengelh(a)inai.de>
commit 203f45003a3d03eea8fa28d74cfc74c354416fdb upstream.
queue_cache_init is first called for the Control Word Queue
(n2_crypto_probe). At that time, queue_cache[0] is NULL and a new
kmem_cache will be allocated. If the subsequent n2_register_algs call
fails, the kmem_cache will be released in queue_cache_destroy, but
queue_cache_init[0] is not set back to NULL.
So when the Module Arithmetic Unit gets probed next (n2_mau_probe),
queue_cache_init will not allocate a kmem_cache again, but leave it
as its bogus value, causing a BUG() to trigger when queue_cache[0] is
eventually passed to kmem_cache_zalloc:
n2_crypto: Found N2CP at /virtual-devices@100/n2cp@7
n2_crypto: Registered NCS HVAPI version 2.0
called queue_cache_init
n2_crypto: md5 alg registration failed
n2cp f028687c: /virtual-devices@100/n2cp@7: Unable to register algorithms.
called queue_cache_destroy
n2cp: probe of f028687c failed with error -22
n2_crypto: Found NCP at /virtual-devices@100/ncp@6
n2_crypto: Registered NCS HVAPI version 2.0
called queue_cache_init
kernel BUG at mm/slab.c:2993!
Call Trace:
[0000000000604488] kmem_cache_alloc+0x1a8/0x1e0
(inlined) kmem_cache_zalloc
(inlined) new_queue
(inlined) spu_queue_setup
(inlined) handle_exec_unit
[0000000010c61eb4] spu_mdesc_scan+0x1f4/0x460 [n2_crypto]
[0000000010c62b80] n2_mau_probe+0x100/0x220 [n2_crypto]
[000000000084b174] platform_drv_probe+0x34/0xc0
Signed-off-by: Jan Engelhardt <jengelh(a)inai.de>
Acked-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/crypto/n2_core.c | 3 +++
1 file changed, 3 insertions(+)
--- a/drivers/crypto/n2_core.c
+++ b/drivers/crypto/n2_core.c
@@ -1620,6 +1620,7 @@ static int queue_cache_init(void)
CWQ_ENTRY_SIZE, 0, NULL);
if (!queue_cache[HV_NCS_QTYPE_CWQ - 1]) {
kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_MAU - 1]);
+ queue_cache[HV_NCS_QTYPE_MAU - 1] = NULL;
return -ENOMEM;
}
return 0;
@@ -1629,6 +1630,8 @@ static void queue_cache_destroy(void)
{
kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_MAU - 1]);
kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_CWQ - 1]);
+ queue_cache[HV_NCS_QTYPE_MAU - 1] = NULL;
+ queue_cache[HV_NCS_QTYPE_CWQ - 1] = NULL;
}
static int spu_queue_register(struct spu_queue *p, unsigned long q_type)
Patches currently in stable-queue which might be from jengelh(a)inai.de are
queue-4.9/crypto-n2-cure-use-after-free.patch
This is a note to let you know that I've just added the patch titled
crypto: chacha20poly1305 - validate the digest size
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
crypto-chacha20poly1305-validate-the-digest-size.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e57121d08c38dabec15cf3e1e2ad46721af30cae Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 11 Dec 2017 12:15:17 -0800
Subject: crypto: chacha20poly1305 - validate the digest size
From: Eric Biggers <ebiggers(a)google.com>
commit e57121d08c38dabec15cf3e1e2ad46721af30cae upstream.
If the rfc7539 template was instantiated with a hash algorithm with
digest size larger than 16 bytes (POLY1305_DIGEST_SIZE), then the digest
overran the 'tag' buffer in 'struct chachapoly_req_ctx', corrupting the
subsequent memory, including 'cryptlen'. This caused a crash during
crypto_skcipher_decrypt().
Fix it by, when instantiating the template, requiring that the
underlying hash algorithm has the digest size expected for Poly1305.
Reproducer:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
int algfd, reqfd;
struct sockaddr_alg addr = {
.salg_type = "aead",
.salg_name = "rfc7539(chacha20,sha256)",
};
unsigned char buf[32] = { 0 };
algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(algfd, (void *)&addr, sizeof(addr));
setsockopt(algfd, SOL_ALG, ALG_SET_KEY, buf, sizeof(buf));
reqfd = accept(algfd, 0, 0);
write(reqfd, buf, 16);
read(reqfd, buf, 16);
}
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Fixes: 71ebc4d1b27d ("crypto: chacha20poly1305 - Add a ChaCha20-Poly1305 AEAD construction, RFC7539")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
crypto/chacha20poly1305.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/crypto/chacha20poly1305.c
+++ b/crypto/chacha20poly1305.c
@@ -610,6 +610,11 @@ static int chachapoly_create(struct cryp
algt->mask));
if (IS_ERR(poly))
return PTR_ERR(poly);
+ poly_hash = __crypto_hash_alg_common(poly);
+
+ err = -EINVAL;
+ if (poly_hash->digestsize != POLY1305_DIGEST_SIZE)
+ goto out_put_poly;
err = -ENOMEM;
inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
@@ -618,7 +623,6 @@ static int chachapoly_create(struct cryp
ctx = aead_instance_ctx(inst);
ctx->saltlen = CHACHAPOLY_IV_SIZE - ivsize;
- poly_hash = __crypto_hash_alg_common(poly);
err = crypto_init_ahash_spawn(&ctx->poly, poly_hash,
aead_crypto_instance(inst));
if (err)
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-4.9/crypto-chacha20poly1305-validate-the-digest-size.patch
queue-4.9/crypto-pcrypt-fix-freeing-pcrypt-instances.patch
This is a note to let you know that I've just added the patch titled
sunxi-rsb: Include OF based modalias in device uevent
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sunxi-rsb-include-of-based-modalias-in-device-uevent.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e2bf801ecd4e62222a46d1ba9e57e710171d29c1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Stefan=20Br=C3=BCns?= <stefan.bruens(a)rwth-aachen.de>
Date: Mon, 27 Nov 2017 20:05:34 +0100
Subject: sunxi-rsb: Include OF based modalias in device uevent
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Stefan Brüns <stefan.bruens(a)rwth-aachen.de>
commit e2bf801ecd4e62222a46d1ba9e57e710171d29c1 upstream.
Include the OF-based modalias in the uevent sent when registering devices
on the sunxi RSB bus, so that user space has a chance to autoload the
kernel module for the device.
Fixes a regression caused by commit 3f241bfa60bd ("arm64: allwinner: a64:
pine64: Use dcdc1 regulator for mmc0"). When the axp20x-rsb module for
the AXP803 PMIC is built as a module, it is not loaded and the system
ends up with an disfunctional MMC controller.
Fixes: d787dcdb9c8f ("bus: sunxi-rsb: Add driver for Allwinner Reduced Serial Bus")
Acked-by: Chen-Yu Tsai <wens(a)csie.org>
Signed-off-by: Stefan Brüns <stefan.bruens(a)rwth-aachen.de>
Signed-off-by: Maxime Ripard <maxime.ripard(a)free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/bus/sunxi-rsb.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/bus/sunxi-rsb.c
+++ b/drivers/bus/sunxi-rsb.c
@@ -178,6 +178,7 @@ static struct bus_type sunxi_rsb_bus = {
.match = sunxi_rsb_device_match,
.probe = sunxi_rsb_device_probe,
.remove = sunxi_rsb_device_remove,
+ .uevent = of_device_uevent_modalias,
};
static void sunxi_rsb_dev_release(struct device *dev)
Patches currently in stable-queue which might be from stefan.bruens(a)rwth-aachen.de are
queue-4.4/sunxi-rsb-include-of-based-modalias-in-device-uevent.patch
This is a note to let you know that I've just added the patch titled
kernel/acct.c: fix the acct->needcheck check in check_free_space()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4d9570158b6260f449e317a5f9ed030c2504a615 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Thu, 4 Jan 2018 16:17:49 -0800
Subject: kernel/acct.c: fix the acct->needcheck check in check_free_space()
From: Oleg Nesterov <oleg(a)redhat.com>
commit 4d9570158b6260f449e317a5f9ed030c2504a615 upstream.
As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check
is very wrong, we need time_is_after_jiffies() to make sys_acct() work.
Ignoring the overflows, the code should "goto out" if needcheck >
jiffies, while currently it checks "needcheck < jiffies" and thus in the
likely case check_free_space() does nothing until jiffies overflow.
In particular this means that sys_acct() is simply broken, acct_on()
sets acct->needcheck = jiffies and expects that check_free_space()
should set acct->active = 1 after the free-space check, but this won't
happen if jiffies increments in between.
This was broken by commit 32dc73086015 ("get rid of timer in
kern/acct.c") in 2011, then another (correct) commit 795a2f22a8ea
("acct() should honour the limits from the very beginning") made the
problem more visible.
Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com
Fixes: 32dc73086015 ("get rid of timer in kern/acct.c")
Reported-by: TSUKADA Koutaro <tsukada(a)ascade.co.jp>
Suggested-by: TSUKADA Koutaro <tsukada(a)ascade.co.jp>
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/acct.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -99,7 +99,7 @@ static int check_free_space(struct bsd_a
{
struct kstatfs sbuf;
- if (time_is_before_jiffies(acct->needcheck))
+ if (time_is_after_jiffies(acct->needcheck))
goto out;
/* May block */
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-4.4/x86-kasan-write-protect-kasan-zero-shadow.patch
queue-4.4/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
This is a note to let you know that I've just added the patch titled
fscache: Fix the default for fscache_maybe_release_page()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
fscache-fix-the-default-for-fscache_maybe_release_page.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 98801506552593c9b8ac11021b0cdad12cab4f6b Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Tue, 2 Jan 2018 10:02:19 +0000
Subject: fscache: Fix the default for fscache_maybe_release_page()
From: David Howells <dhowells(a)redhat.com>
commit 98801506552593c9b8ac11021b0cdad12cab4f6b upstream.
Fix the default for fscache_maybe_release_page() for when the cookie isn't
valid or the page isn't cached. It mustn't return false as that indicates
the page cannot yet be freed.
The problem with the default is that if, say, there's no cache, but a
network filesystem's pages are using up almost all the available memory, a
system can OOM because the filesystem ->releasepage() op will not allow
them to be released as fscache_maybe_release_page() incorrectly prevents
it.
This can be tested by writing a sequence of 512MiB files to an AFS mount.
It does not affect NFS or CIFS because both of those wrap the call in a
check of PG_fscache and it shouldn't bother Ceph as that only has
PG_private set whilst writeback is in progress. This might be an issue for
9P, however.
Note that the pages aren't entirely stuck. Removing a file or unmounting
will clear things because that uses ->invalidatepage() instead.
Fixes: 201a15428bd5 ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Reported-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Reviewed-by: Jeff Layton <jlayton(a)redhat.com>
Acked-by: Al Viro <viro(a)zeniv.linux.org.uk>
Tested-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/fscache.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -764,7 +764,7 @@ bool fscache_maybe_release_page(struct f
{
if (fscache_cookie_valid(cookie) && PageFsCache(page))
return __fscache_maybe_release_page(cookie, page, gfp);
- return false;
+ return true;
}
/**
Patches currently in stable-queue which might be from dhowells(a)redhat.com are
queue-4.4/fscache-fix-the-default-for-fscache_maybe_release_page.patch
This is a note to let you know that I've just added the patch titled
crypto: pcrypt - fix freeing pcrypt instances
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
crypto-pcrypt-fix-freeing-pcrypt-instances.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d76c68109f37cb85b243a1cf0f40313afd2bae68 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Wed, 20 Dec 2017 14:28:25 -0800
Subject: crypto: pcrypt - fix freeing pcrypt instances
From: Eric Biggers <ebiggers(a)google.com>
commit d76c68109f37cb85b243a1cf0f40313afd2bae68 upstream.
pcrypt is using the old way of freeing instances, where the ->free()
method specified in the 'struct crypto_template' is passed a pointer to
the 'struct crypto_instance'. But the crypto_instance is being
kfree()'d directly, which is incorrect because the memory was actually
allocated as an aead_instance, which contains the crypto_instance at a
nonzero offset. Thus, the wrong pointer was being kfree()'d.
Fix it by switching to the new way to free aead_instance's where the
->free() method is specified in the aead_instance itself.
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Fixes: 0496f56065e0 ("crypto: pcrypt - Add support for new AEAD interface")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
crypto/pcrypt.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -254,6 +254,14 @@ static void pcrypt_aead_exit_tfm(struct
crypto_free_aead(ctx->child);
}
+static void pcrypt_free(struct aead_instance *inst)
+{
+ struct pcrypt_instance_ctx *ctx = aead_instance_ctx(inst);
+
+ crypto_drop_aead(&ctx->spawn);
+ kfree(inst);
+}
+
static int pcrypt_init_instance(struct crypto_instance *inst,
struct crypto_alg *alg)
{
@@ -319,6 +327,8 @@ static int pcrypt_create_aead(struct cry
inst->alg.encrypt = pcrypt_aead_encrypt;
inst->alg.decrypt = pcrypt_aead_decrypt;
+ inst->free = pcrypt_free;
+
err = aead_register_instance(tmpl, inst);
if (err)
goto out_drop_aead;
@@ -349,14 +359,6 @@ static int pcrypt_create(struct crypto_t
return -EINVAL;
}
-static void pcrypt_free(struct crypto_instance *inst)
-{
- struct pcrypt_instance_ctx *ctx = crypto_instance_ctx(inst);
-
- crypto_drop_aead(&ctx->spawn);
- kfree(inst);
-}
-
static int pcrypt_cpumask_change_notify(struct notifier_block *self,
unsigned long val, void *data)
{
@@ -469,7 +471,6 @@ static void pcrypt_fini_padata(struct pa
static struct crypto_template pcrypt_tmpl = {
.name = "pcrypt",
.create = pcrypt_create,
- .free = pcrypt_free,
.module = THIS_MODULE,
};
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-4.4/crypto-chacha20poly1305-validate-the-digest-size.patch
queue-4.4/crypto-pcrypt-fix-freeing-pcrypt-instances.patch
This is a note to let you know that I've just added the patch titled
crypto: n2 - cure use after free
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
crypto-n2-cure-use-after-free.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 203f45003a3d03eea8fa28d74cfc74c354416fdb Mon Sep 17 00:00:00 2001
From: Jan Engelhardt <jengelh(a)inai.de>
Date: Tue, 19 Dec 2017 19:09:07 +0100
Subject: crypto: n2 - cure use after free
From: Jan Engelhardt <jengelh(a)inai.de>
commit 203f45003a3d03eea8fa28d74cfc74c354416fdb upstream.
queue_cache_init is first called for the Control Word Queue
(n2_crypto_probe). At that time, queue_cache[0] is NULL and a new
kmem_cache will be allocated. If the subsequent n2_register_algs call
fails, the kmem_cache will be released in queue_cache_destroy, but
queue_cache_init[0] is not set back to NULL.
So when the Module Arithmetic Unit gets probed next (n2_mau_probe),
queue_cache_init will not allocate a kmem_cache again, but leave it
as its bogus value, causing a BUG() to trigger when queue_cache[0] is
eventually passed to kmem_cache_zalloc:
n2_crypto: Found N2CP at /virtual-devices@100/n2cp@7
n2_crypto: Registered NCS HVAPI version 2.0
called queue_cache_init
n2_crypto: md5 alg registration failed
n2cp f028687c: /virtual-devices@100/n2cp@7: Unable to register algorithms.
called queue_cache_destroy
n2cp: probe of f028687c failed with error -22
n2_crypto: Found NCP at /virtual-devices@100/ncp@6
n2_crypto: Registered NCS HVAPI version 2.0
called queue_cache_init
kernel BUG at mm/slab.c:2993!
Call Trace:
[0000000000604488] kmem_cache_alloc+0x1a8/0x1e0
(inlined) kmem_cache_zalloc
(inlined) new_queue
(inlined) spu_queue_setup
(inlined) handle_exec_unit
[0000000010c61eb4] spu_mdesc_scan+0x1f4/0x460 [n2_crypto]
[0000000010c62b80] n2_mau_probe+0x100/0x220 [n2_crypto]
[000000000084b174] platform_drv_probe+0x34/0xc0
Signed-off-by: Jan Engelhardt <jengelh(a)inai.de>
Acked-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/crypto/n2_core.c | 3 +++
1 file changed, 3 insertions(+)
--- a/drivers/crypto/n2_core.c
+++ b/drivers/crypto/n2_core.c
@@ -1641,6 +1641,7 @@ static int queue_cache_init(void)
CWQ_ENTRY_SIZE, 0, NULL);
if (!queue_cache[HV_NCS_QTYPE_CWQ - 1]) {
kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_MAU - 1]);
+ queue_cache[HV_NCS_QTYPE_MAU - 1] = NULL;
return -ENOMEM;
}
return 0;
@@ -1650,6 +1651,8 @@ static void queue_cache_destroy(void)
{
kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_MAU - 1]);
kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_CWQ - 1]);
+ queue_cache[HV_NCS_QTYPE_MAU - 1] = NULL;
+ queue_cache[HV_NCS_QTYPE_CWQ - 1] = NULL;
}
static int spu_queue_register(struct spu_queue *p, unsigned long q_type)
Patches currently in stable-queue which might be from jengelh(a)inai.de are
queue-4.4/crypto-n2-cure-use-after-free.patch
This is a note to let you know that I've just added the patch titled
crypto: chacha20poly1305 - validate the digest size
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
crypto-chacha20poly1305-validate-the-digest-size.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e57121d08c38dabec15cf3e1e2ad46721af30cae Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 11 Dec 2017 12:15:17 -0800
Subject: crypto: chacha20poly1305 - validate the digest size
From: Eric Biggers <ebiggers(a)google.com>
commit e57121d08c38dabec15cf3e1e2ad46721af30cae upstream.
If the rfc7539 template was instantiated with a hash algorithm with
digest size larger than 16 bytes (POLY1305_DIGEST_SIZE), then the digest
overran the 'tag' buffer in 'struct chachapoly_req_ctx', corrupting the
subsequent memory, including 'cryptlen'. This caused a crash during
crypto_skcipher_decrypt().
Fix it by, when instantiating the template, requiring that the
underlying hash algorithm has the digest size expected for Poly1305.
Reproducer:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
int algfd, reqfd;
struct sockaddr_alg addr = {
.salg_type = "aead",
.salg_name = "rfc7539(chacha20,sha256)",
};
unsigned char buf[32] = { 0 };
algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(algfd, (void *)&addr, sizeof(addr));
setsockopt(algfd, SOL_ALG, ALG_SET_KEY, buf, sizeof(buf));
reqfd = accept(algfd, 0, 0);
write(reqfd, buf, 16);
read(reqfd, buf, 16);
}
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Fixes: 71ebc4d1b27d ("crypto: chacha20poly1305 - Add a ChaCha20-Poly1305 AEAD construction, RFC7539")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
crypto/chacha20poly1305.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/crypto/chacha20poly1305.c
+++ b/crypto/chacha20poly1305.c
@@ -600,6 +600,11 @@ static int chachapoly_create(struct cryp
CRYPTO_ALG_TYPE_AHASH_MASK);
if (IS_ERR(poly))
return PTR_ERR(poly);
+ poly_hash = __crypto_hash_alg_common(poly);
+
+ err = -EINVAL;
+ if (poly_hash->digestsize != POLY1305_DIGEST_SIZE)
+ goto out_put_poly;
err = -ENOMEM;
inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
@@ -608,7 +613,6 @@ static int chachapoly_create(struct cryp
ctx = aead_instance_ctx(inst);
ctx->saltlen = CHACHAPOLY_IV_SIZE - ivsize;
- poly_hash = __crypto_hash_alg_common(poly);
err = crypto_init_ahash_spawn(&ctx->poly, poly_hash,
aead_crypto_instance(inst));
if (err)
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-4.4/crypto-chacha20poly1305-validate-the-digest-size.patch
queue-4.4/crypto-pcrypt-fix-freeing-pcrypt-instances.patch
On Thu, Jan 04, 2018 at 09:21:05AM +1100, NeilBrown wrote:
>
> From: Thiago Rafael Becker <thiago.becker(a)gmail.com>
>
Thanks for the two backports, now queued up.
greg k-h
Hi,
Can you include commit:
commit 429a787be6793554ee02aacc7e1f11ebcecc4453
Author: Jens Axboe <axboe(a)fb.com>
Date: Thu Nov 17 12:30:37 2016 -0700
nbd: fix use-after-free of rq/bio in the xmit path
in the 4.9 stable release? Bug doesn't exist in 4.8 or 4.10, but it does
in 4.9 and it's a mistake that the above commit wasn't marked for 4.9
stable when added.
--
Jens Axboe
This is a note to let you know that I've just added the patch titled
x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-pti-rename-bug_cpu_insecure-to-bug_cpu_meltdown.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From de791821c295cc61419a06fe5562288417d1bc58 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Fri, 5 Jan 2018 15:27:34 +0100
Subject: x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
From: Thomas Gleixner <tglx(a)linutronix.de>
commit de791821c295cc61419a06fe5562288417d1bc58 upstream.
Use the name associated with the particular attack which needs page table
isolation for mitigation.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: Alan Cox <gnomes(a)lxorguk.ukuu.org.uk>
Cc: Jiri Koshina <jikos(a)kernel.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Andi Lutomirski <luto(a)amacapital.net>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Paul Turner <pjt(a)google.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Greg KH <gregkh(a)linux-foundation.org>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Kees Cook <keescook(a)google.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801051525300.1724@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/mm/pti.c | 6 +++---
3 files changed, 5 insertions(+), 5 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -341,6 +341,6 @@
#define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */
#define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */
#define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */
-#define X86_BUG_CPU_INSECURE X86_BUG(14) /* CPU is insecure and needs kernel page table isolation */
+#define X86_BUG_CPU_MELTDOWN X86_BUG(14) /* CPU is affected by meltdown attack and needs kernel page table isolation */
#endif /* _ASM_X86_CPUFEATURES_H */
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -900,7 +900,7 @@ static void __init early_identify_cpu(st
setup_force_cpu_cap(X86_FEATURE_ALWAYS);
if (c->x86_vendor != X86_VENDOR_AMD)
- setup_force_cpu_bug(X86_BUG_CPU_INSECURE);
+ setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
fpu__init_system(c);
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -56,13 +56,13 @@
static void __init pti_print_if_insecure(const char *reason)
{
- if (boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
+ if (boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
pr_info("%s\n", reason);
}
static void __init pti_print_if_secure(const char *reason)
{
- if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
+ if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
pr_info("%s\n", reason);
}
@@ -96,7 +96,7 @@ void __init pti_check_boottime_disable(v
}
autosel:
- if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
+ if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
return;
enable:
setup_force_cpu_cap(X86_FEATURE_PTI);
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-tlb-drop-the-_gpl-from-the-cpu_tlbstate-export.patch
queue-4.14/x86-alternatives-add-missing-n-at-end-of-alternative-inline-asm.patch
queue-4.14/x86-kaslr-fix-the-vaddr_end-mess.patch
queue-4.14/efi-capsule-loader-reinstate-virtual-capsule-mapping.patch
queue-4.14/x86-mm-set-modules_end-to-0xffffffffff000000.patch
queue-4.14/mm-sparse.c-wrong-allocation-for-mem_section.patch
queue-4.14/x86-events-intel-ds-use-the-proper-cache-flush-method-for-mapping-ds-buffers.patch
queue-4.14/x86-pti-rename-bug_cpu_insecure-to-bug_cpu_meltdown.patch
queue-4.14/x86-mm-map-cpu_entry_area-at-the-same-place-on-4-5-level.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Set MODULES_END to 0xffffffffff000000
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-set-modules_end-to-0xffffffffff000000.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f5a40711fa58f1c109165a4fec6078bf2dfd2bdc Mon Sep 17 00:00:00 2001
From: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Date: Thu, 28 Dec 2017 19:06:20 +0300
Subject: x86/mm: Set MODULES_END to 0xffffffffff000000
From: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
commit f5a40711fa58f1c109165a4fec6078bf2dfd2bdc upstream.
Since f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size")
kasan_mem_to_shadow(MODULES_END) could be not aligned to a page boundary.
So passing page unaligned address to kasan_populate_zero_shadow() have two
possible effects:
1) It may leave one page hole in supposed to be populated area. After commit
21506525fb8d ("x86/kasan/64: Teach KASAN about the cpu_entry_area") that
hole happens to be in the shadow covering fixmap area and leads to crash:
BUG: unable to handle kernel paging request at fffffbffffe8ee04
RIP: 0010:check_memory_region+0x5c/0x190
Call Trace:
<NMI>
memcpy+0x1f/0x50
ghes_copy_tofrom_phys+0xab/0x180
ghes_read_estatus+0xfb/0x280
ghes_notify_nmi+0x2b2/0x410
nmi_handle+0x115/0x2c0
default_do_nmi+0x57/0x110
do_nmi+0xf8/0x150
end_repeat_nmi+0x1a/0x1e
Note, the crash likely disappeared after commit 92a0f81d8957, which
changed kasan_populate_zero_shadow() call the way it was before
commit 21506525fb8d.
2) Attempt to load module near MODULES_END will fail, because
__vmalloc_node_range() called from kasan_module_alloc() will hit the
WARN_ON(!pte_none(*pte)) in the vmap_pte_range() and bail out with error.
To fix this we need to make kasan_mem_to_shadow(MODULES_END) page aligned
which means that MODULES_END should be 8*PAGE_SIZE aligned.
The whole point of commit f06bdd4001c2 was to move MODULES_END down if
NR_CPUS is big, so the cpu_entry_area takes a lot of space.
But since 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
the cpu_entry_area is no longer in fixmap, so we could just set
MODULES_END to a fixed 8*PAGE_SIZE aligned address.
Fixes: f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size")
Reported-by: Jakub Kicinski <kubakici(a)wp.pl>
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Thomas Garnier <thgarnie(a)google.com>
Link: https://lkml.kernel.org/r/20171228160620.23818-1-aryabinin@virtuozzo.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/x86/x86_64/mm.txt | 5 +----
arch/x86/include/asm/pgtable_64_types.h | 2 +-
2 files changed, 2 insertions(+), 5 deletions(-)
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -43,7 +43,7 @@ ffffff0000000000 - ffffff7fffffffff (=39
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - [fixmap start] (~1526 MB) module mapping space
+ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
@@ -67,9 +67,6 @@ memory window (this size is arbitrary, i
The mappings are not part of any other kernel PGD and are only available
during EFI runtime calls.
-The module mapping space size changes based on the CONFIG requirements for the
-following fixmap section.
-
Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
physical memory, vmalloc/ioremap space and virtual memory map are randomized.
Their order is preserved but their base will be offset early at boot time.
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -104,7 +104,7 @@ typedef struct { pteval_t pte; } pte_t;
#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
/* The module sections ends with the start of the fixmap */
-#define MODULES_END __fix_to_virt(__end_of_fixed_addresses + 1)
+#define MODULES_END _AC(0xffffffffff000000, UL)
#define MODULES_LEN (MODULES_END - MODULES_VADDR)
#define ESPFIX_PGD_ENTRY _AC(-2, UL)
Patches currently in stable-queue which might be from aryabinin(a)virtuozzo.com are
queue-4.14/x86-mm-set-modules_end-to-0xffffffffff000000.patch
This is a note to let you know that I've just added the patch titled
x86/kaslr: Fix the vaddr_end mess
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-kaslr-fix-the-vaddr_end-mess.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1dddd25125112ba49706518ac9077a1026a18f37 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Thu, 4 Jan 2018 12:32:03 +0100
Subject: x86/kaslr: Fix the vaddr_end mess
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 1dddd25125112ba49706518ac9077a1026a18f37 upstream.
vaddr_end for KASLR is only documented in the KASLR code itself and is
adjusted depending on config options. So it's not surprising that a change
of the memory layout causes KASLR to have the wrong vaddr_end. This can map
arbitrary stuff into other areas causing hard to understand problems.
Remove the whole ifdef magic and define the start of the cpu_entry_area to
be the end of the KASLR vaddr range.
Add documentation to that effect.
Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
Reported-by: Benjamin Gilbert <benjamin.gilbert(a)coreos.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Benjamin Gilbert <benjamin.gilbert(a)coreos.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Garnier <thgarnie(a)google.com>,
Cc: Alexander Kuleshov <kuleshovmail(a)gmail.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/x86/x86_64/mm.txt | 6 ++++++
arch/x86/include/asm/pgtable_64_types.h | 8 +++++++-
arch/x86/mm/kaslr.c | 32 +++++++++-----------------------
3 files changed, 22 insertions(+), 24 deletions(-)
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -12,6 +12,7 @@ ffffea0000000000 - ffffeaffffffffff (=40
... unused hole ...
ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
... unused hole ...
+ vaddr_end for KASLR
fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
@@ -37,6 +38,7 @@ ffd4000000000000 - ffd5ffffffffffff (=49
... unused hole ...
ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB)
... unused hole ...
+ vaddr_end for KASLR
fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
... unused hole ...
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
@@ -71,3 +73,7 @@ during EFI runtime calls.
Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
physical memory, vmalloc/ioremap space and virtual memory map are randomized.
Their order is preserved but their base will be offset early at boot time.
+
+Be very careful vs. KASLR when changing anything here. The KASLR address
+range must not overlap with anything except the KASAN shadow area, which is
+correct as KASAN disables KASLR.
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -75,7 +75,13 @@ typedef struct { pteval_t pte; } pte_t;
#define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT)
#define PGDIR_MASK (~(PGDIR_SIZE - 1))
-/* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
+/*
+ * See Documentation/x86/x86_64/mm.txt for a description of the memory map.
+ *
+ * Be very careful vs. KASLR when changing anything here. The KASLR address
+ * range must not overlap with anything except the KASAN shadow area, which
+ * is correct as KASAN disables KASLR.
+ */
#define MAXMEM _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
#ifdef CONFIG_X86_5LEVEL
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -34,25 +34,14 @@
#define TB_SHIFT 40
/*
- * Virtual address start and end range for randomization. The end changes base
- * on configuration to have the highest amount of space for randomization.
- * It increases the possible random position for each randomized region.
+ * Virtual address start and end range for randomization.
*
- * You need to add an if/def entry if you introduce a new memory region
- * compatible with KASLR. Your entry must be in logical order with memory
- * layout. For example, ESPFIX is before EFI because its virtual address is
- * before. You also need to add a BUILD_BUG_ON() in kernel_randomize_memory() to
- * ensure that this order is correct and won't be changed.
+ * The end address could depend on more configuration options to make the
+ * highest amount of space for randomization available, but that's too hard
+ * to keep straight and caused issues already.
*/
static const unsigned long vaddr_start = __PAGE_OFFSET_BASE;
-
-#if defined(CONFIG_X86_ESPFIX64)
-static const unsigned long vaddr_end = ESPFIX_BASE_ADDR;
-#elif defined(CONFIG_EFI)
-static const unsigned long vaddr_end = EFI_VA_END;
-#else
-static const unsigned long vaddr_end = __START_KERNEL_map;
-#endif
+static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;
/* Default values */
unsigned long page_offset_base = __PAGE_OFFSET_BASE;
@@ -101,15 +90,12 @@ void __init kernel_randomize_memory(void
unsigned long remain_entropy;
/*
- * All these BUILD_BUG_ON checks ensures the memory layout is
- * consistent with the vaddr_start/vaddr_end variables.
+ * These BUILD_BUG_ON checks ensure the memory layout is consistent
+ * with the vaddr_start/vaddr_end variables. These checks are very
+ * limited....
*/
BUILD_BUG_ON(vaddr_start >= vaddr_end);
- BUILD_BUG_ON(IS_ENABLED(CONFIG_X86_ESPFIX64) &&
- vaddr_end >= EFI_VA_END);
- BUILD_BUG_ON((IS_ENABLED(CONFIG_X86_ESPFIX64) ||
- IS_ENABLED(CONFIG_EFI)) &&
- vaddr_end >= __START_KERNEL_map);
+ BUILD_BUG_ON(vaddr_end != CPU_ENTRY_AREA_BASE);
BUILD_BUG_ON(vaddr_end > __START_KERNEL_map);
if (!kaslr_memory_enabled())
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-tlb-drop-the-_gpl-from-the-cpu_tlbstate-export.patch
queue-4.14/x86-alternatives-add-missing-n-at-end-of-alternative-inline-asm.patch
queue-4.14/x86-kaslr-fix-the-vaddr_end-mess.patch
queue-4.14/efi-capsule-loader-reinstate-virtual-capsule-mapping.patch
queue-4.14/x86-mm-set-modules_end-to-0xffffffffff000000.patch
queue-4.14/mm-sparse.c-wrong-allocation-for-mem_section.patch
queue-4.14/x86-events-intel-ds-use-the-proper-cache-flush-method-for-mapping-ds-buffers.patch
queue-4.14/x86-pti-rename-bug_cpu_insecure-to-bug_cpu_meltdown.patch
queue-4.14/x86-mm-map-cpu_entry_area-at-the-same-place-on-4-5-level.patch
This is a note to let you know that I've just added the patch titled
x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-events-intel-ds-use-the-proper-cache-flush-method-for-mapping-ds-buffers.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 42f3bdc5dd962a5958bc024c1e1444248a6b8b4a Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Thu, 4 Jan 2018 18:07:12 +0100
Subject: x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
From: Peter Zijlstra <peterz(a)infradead.org>
commit 42f3bdc5dd962a5958bc024c1e1444248a6b8b4a upstream.
Thomas reported the following warning:
BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
caller is native_flush_tlb_single+0x57/0xc0
native_flush_tlb_single+0x57/0xc0
__set_pte_vaddr+0x2d/0x40
set_pte_vaddr+0x2f/0x40
cea_set_pte+0x30/0x40
ds_update_cea.constprop.4+0x4d/0x70
reserve_ds_buffers+0x159/0x410
x86_reserve_hardware+0x150/0x160
x86_pmu_event_init+0x3e/0x1f0
perf_try_init_event+0x69/0x80
perf_event_alloc+0x652/0x740
SyS_perf_event_open+0x3f6/0xd60
do_syscall_64+0x5c/0x190
set_pte_vaddr is used to map the ds buffers into the cpu entry area, but
there are two problems with that:
1) The resulting flush is not supposed to be called in preemptible context
2) The cpu entry area is supposed to be per CPU, but the debug store
buffers are mapped for all CPUs so these mappings need to be flushed
globally.
Add the necessary preemption protection across the mapping code and flush
TLBs globally.
Fixes: c1961a4631da ("x86/events/intel/ds: Map debug buffers in cpu_entry_area")
Reported-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml(a)ze-it.at>
Signed-off-by: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml(a)ze-it.at>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Hugh Dickins <hughd(a)google.com>
Link: https://lkml.kernel.org/r/20180104170712.GB3040@hirez.programming.kicks-ass…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/events/intel/ds.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -5,6 +5,7 @@
#include <asm/cpu_entry_area.h>
#include <asm/perf_event.h>
+#include <asm/tlbflush.h>
#include <asm/insn.h>
#include "../perf_event.h"
@@ -283,20 +284,35 @@ static DEFINE_PER_CPU(void *, insn_buffe
static void ds_update_cea(void *cea, void *addr, size_t size, pgprot_t prot)
{
+ unsigned long start = (unsigned long)cea;
phys_addr_t pa;
size_t msz = 0;
pa = virt_to_phys(addr);
+
+ preempt_disable();
for (; msz < size; msz += PAGE_SIZE, pa += PAGE_SIZE, cea += PAGE_SIZE)
cea_set_pte(cea, pa, prot);
+
+ /*
+ * This is a cross-CPU update of the cpu_entry_area, we must shoot down
+ * all TLB entries for it.
+ */
+ flush_tlb_kernel_range(start, start + size);
+ preempt_enable();
}
static void ds_clear_cea(void *cea, size_t size)
{
+ unsigned long start = (unsigned long)cea;
size_t msz = 0;
+ preempt_disable();
for (; msz < size; msz += PAGE_SIZE, cea += PAGE_SIZE)
cea_set_pte(cea, 0, PAGE_NONE);
+
+ flush_tlb_kernel_range(start, start + size);
+ preempt_enable();
}
static void *dsalloc_pages(size_t size, gfp_t flags, int cpu)
Patches currently in stable-queue which might be from peterz(a)infradead.org are
queue-4.14/x86-tlb-drop-the-_gpl-from-the-cpu_tlbstate-export.patch
queue-4.14/x86-alternatives-add-missing-n-at-end-of-alternative-inline-asm.patch
queue-4.14/x86-kaslr-fix-the-vaddr_end-mess.patch
queue-4.14/efi-capsule-loader-reinstate-virtual-capsule-mapping.patch
queue-4.14/mm-sparse.c-wrong-allocation-for-mem_section.patch
queue-4.14/x86-events-intel-ds-use-the-proper-cache-flush-method-for-mapping-ds-buffers.patch
queue-4.14/x86-pti-rename-bug_cpu_insecure-to-bug_cpu_meltdown.patch
queue-4.14/x86-mm-map-cpu_entry_area-at-the-same-place-on-4-5-level.patch
This is a note to let you know that I've just added the patch titled
x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-alternatives-add-missing-n-at-end-of-alternative-inline-asm.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b9e705ef7cfaf22db0daab91ad3cd33b0fa32eb9 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw(a)amazon.co.uk>
Date: Thu, 4 Jan 2018 14:37:05 +0000
Subject: x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
From: David Woodhouse <dwmw(a)amazon.co.uk>
commit b9e705ef7cfaf22db0daab91ad3cd33b0fa32eb9 upstream.
Where an ALTERNATIVE is used in the middle of an inline asm block, this
would otherwise lead to the following instruction being appended directly
to the trailing ".popsection", and a failed compile.
Fixes: 9cebed423c84 ("x86, alternative: Use .pushsection/.popsection")
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: gnomes(a)lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel(a)redhat.com>
Cc: ak(a)linux.intel.com
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Paul Turner <pjt(a)google.com>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Kees Cook <keescook(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh(a)linux-foundation.org>
Link: https://lkml.kernel.org/r/20180104143710.8961-8-dwmw@amazon.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/alternative.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -140,7 +140,7 @@ static inline int alternatives_text_rese
".popsection\n" \
".pushsection .altinstr_replacement, \"ax\"\n" \
ALTINSTR_REPLACEMENT(newinstr, feature, 1) \
- ".popsection"
+ ".popsection\n"
#define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\
OLDINSTR_2(oldinstr, 1, 2) \
@@ -151,7 +151,7 @@ static inline int alternatives_text_rese
".pushsection .altinstr_replacement, \"ax\"\n" \
ALTINSTR_REPLACEMENT(newinstr1, feature1, 1) \
ALTINSTR_REPLACEMENT(newinstr2, feature2, 2) \
- ".popsection"
+ ".popsection\n"
/*
* Alternative instructions for different CPU types or capabilities.
Patches currently in stable-queue which might be from dwmw(a)amazon.co.uk are
queue-4.14/x86-alternatives-add-missing-n-at-end-of-alternative-inline-asm.patch
queue-4.14/x86-pti-rename-bug_cpu_insecure-to-bug_cpu_meltdown.patch
This is a note to let you know that I've just added the patch titled
userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
userfaultfd-clear-the-vma-vm_userfaultfd_ctx-if-uffd_event_fork-fails.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 0cbb4b4f4c44f54af268969b18d8deda63aded59 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange(a)redhat.com>
Date: Thu, 4 Jan 2018 16:18:09 -0800
Subject: userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails
From: Andrea Arcangeli <aarcange(a)redhat.com>
commit 0cbb4b4f4c44f54af268969b18d8deda63aded59 upstream.
The previous fix in commit 384632e67e08 ("userfaultfd: non-cooperative:
fix fork use after free") corrected the refcounting in case of
UFFD_EVENT_FORK failure for the fork userfault paths.
That still didn't clear the vma->vm_userfaultfd_ctx of the vmas that
were set to point to the aborted new uffd ctx earlier in
dup_userfaultfd.
Link: http://lkml.kernel.org/r/20171223002505.593-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Eric Biggers <ebiggers3(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/userfaultfd.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -570,11 +570,14 @@ out:
static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
struct userfaultfd_wait_queue *ewq)
{
+ struct userfaultfd_ctx *release_new_ctx;
+
if (WARN_ON_ONCE(current->flags & PF_EXITING))
goto out;
ewq->ctx = ctx;
init_waitqueue_entry(&ewq->wq, current);
+ release_new_ctx = NULL;
spin_lock(&ctx->event_wqh.lock);
/*
@@ -601,8 +604,7 @@ static void userfaultfd_event_wait_compl
new = (struct userfaultfd_ctx *)
(unsigned long)
ewq->msg.arg.reserved.reserved1;
-
- userfaultfd_ctx_put(new);
+ release_new_ctx = new;
}
break;
}
@@ -617,6 +619,20 @@ static void userfaultfd_event_wait_compl
__set_current_state(TASK_RUNNING);
spin_unlock(&ctx->event_wqh.lock);
+ if (release_new_ctx) {
+ struct vm_area_struct *vma;
+ struct mm_struct *mm = release_new_ctx->mm;
+
+ /* the various vma->vm_userfaultfd_ctx still points to it */
+ down_write(&mm->mmap_sem);
+ for (vma = mm->mmap; vma; vma = vma->vm_next)
+ if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx)
+ vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
+ up_write(&mm->mmap_sem);
+
+ userfaultfd_ctx_put(release_new_ctx);
+ }
+
/*
* ctx may go away after this if the userfault pseudo fd is
* already released.
Patches currently in stable-queue which might be from aarcange(a)redhat.com are
queue-4.14/userfaultfd-clear-the-vma-vm_userfaultfd_ctx-if-uffd_event_fork-fails.patch
This is a note to let you know that I've just added the patch titled
sunxi-rsb: Include OF based modalias in device uevent
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sunxi-rsb-include-of-based-modalias-in-device-uevent.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e2bf801ecd4e62222a46d1ba9e57e710171d29c1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Stefan=20Br=C3=BCns?= <stefan.bruens(a)rwth-aachen.de>
Date: Mon, 27 Nov 2017 20:05:34 +0100
Subject: sunxi-rsb: Include OF based modalias in device uevent
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Stefan Brüns <stefan.bruens(a)rwth-aachen.de>
commit e2bf801ecd4e62222a46d1ba9e57e710171d29c1 upstream.
Include the OF-based modalias in the uevent sent when registering devices
on the sunxi RSB bus, so that user space has a chance to autoload the
kernel module for the device.
Fixes a regression caused by commit 3f241bfa60bd ("arm64: allwinner: a64:
pine64: Use dcdc1 regulator for mmc0"). When the axp20x-rsb module for
the AXP803 PMIC is built as a module, it is not loaded and the system
ends up with an disfunctional MMC controller.
Fixes: d787dcdb9c8f ("bus: sunxi-rsb: Add driver for Allwinner Reduced Serial Bus")
Acked-by: Chen-Yu Tsai <wens(a)csie.org>
Signed-off-by: Stefan Brüns <stefan.bruens(a)rwth-aachen.de>
Signed-off-by: Maxime Ripard <maxime.ripard(a)free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/bus/sunxi-rsb.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/bus/sunxi-rsb.c
+++ b/drivers/bus/sunxi-rsb.c
@@ -178,6 +178,7 @@ static struct bus_type sunxi_rsb_bus = {
.match = sunxi_rsb_device_match,
.probe = sunxi_rsb_device_probe,
.remove = sunxi_rsb_device_remove,
+ .uevent = of_device_uevent_modalias,
};
static void sunxi_rsb_dev_release(struct device *dev)
Patches currently in stable-queue which might be from stefan.bruens(a)rwth-aachen.de are
queue-4.14/sunxi-rsb-include-of-based-modalias-in-device-uevent.patch
This is a note to let you know that I've just added the patch titled
mm/sparse.c: wrong allocation for mem_section
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-sparse.c-wrong-allocation-for-mem_section.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d09cfbbfa0f761a97687828b5afb27b56cbf2e19 Mon Sep 17 00:00:00 2001
From: Baoquan He <bhe(a)redhat.com>
Date: Thu, 4 Jan 2018 16:18:06 -0800
Subject: mm/sparse.c: wrong allocation for mem_section
From: Baoquan He <bhe(a)redhat.com>
commit d09cfbbfa0f761a97687828b5afb27b56cbf2e19 upstream.
In commit 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime
for CONFIG_SPARSEMEM_EXTREME=y") mem_section is allocated at runtime to
save memory.
It allocates the first dimension of array with sizeof(struct mem_section).
It costs extra memory, should be sizeof(struct mem_section *).
Fix it.
Link: http://lkml.kernel.org/r/1513932498-20350-1-git-send-email-bhe@redhat.com
Fixes: 83e3c48729 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Tested-by: Dave Young <dyoung(a)redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Atsushi Kumagai <ats-kumagai(a)wm.jp.nec.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/sparse.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -211,7 +211,7 @@ void __init memory_present(int nid, unsi
if (unlikely(!mem_section)) {
unsigned long size, align;
- size = sizeof(struct mem_section) * NR_SECTION_ROOTS;
+ size = sizeof(struct mem_section*) * NR_SECTION_ROOTS;
align = 1 << (INTERNODE_CACHE_SHIFT);
mem_section = memblock_virt_alloc(size, align);
}
Patches currently in stable-queue which might be from bhe(a)redhat.com are
queue-4.14/mm-sparse.c-wrong-allocation-for-mem_section.patch
This is a note to let you know that I've just added the patch titled
mm/mprotect: add a cond_resched() inside change_pmd_range()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-mprotect-add-a-cond_resched-inside-change_pmd_range.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4991c09c7c812dba13ea9be79a68b4565bb1fa4e Mon Sep 17 00:00:00 2001
From: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
Date: Thu, 4 Jan 2018 16:17:52 -0800
Subject: mm/mprotect: add a cond_resched() inside change_pmd_range()
From: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
commit 4991c09c7c812dba13ea9be79a68b4565bb1fa4e upstream.
While testing on a large CPU system, detected the following RCU stall
many times over the span of the workload. This problem is solved by
adding a cond_resched() in the change_pmd_range() function.
INFO: rcu_sched detected stalls on CPUs/tasks:
154-....: (670 ticks this GP) idle=022/140000000000000/0 softirq=2825/2825 fqs=612
(detected by 955, t=6002 jiffies, g=4486, c=4485, q=90864)
Sending NMI from CPU 955 to CPUs 154:
NMI backtrace for cpu 154
CPU: 154 PID: 147071 Comm: workload Not tainted 4.15.0-rc3+ #3
NIP: c0000000000b3f64 LR: c0000000000b33d4 CTR: 000000000000aa18
REGS: 00000000a4b0fb44 TRAP: 0501 Not tainted (4.15.0-rc3+)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22422082 XER: 00000000
CFAR: 00000000006cf8f0 SOFTE: 1
GPR00: 0010000000000000 c00003ef9b1cb8c0 c0000000010cc600 0000000000000000
GPR04: 8e0000018c32b200 40017b3858fd6e00 8e0000018c32b208 40017b3858fd6e00
GPR08: 8e0000018c32b210 40017b3858fd6e00 8e0000018c32b218 40017b3858fd6e00
GPR12: ffffffffffffffff c00000000fb25100
NIP [c0000000000b3f64] plpar_hcall9+0x44/0x7c
LR [c0000000000b33d4] pSeries_lpar_flush_hash_range+0x384/0x420
Call Trace:
flush_hash_range+0x48/0x100
__flush_tlb_pending+0x44/0xd0
hpte_need_flush+0x408/0x470
change_protection_range+0xaac/0xf10
change_prot_numa+0x30/0xb0
task_numa_work+0x2d0/0x3e0
task_work_run+0x130/0x190
do_notify_resume+0x118/0x120
ret_from_except_lite+0x70/0x74
Instruction dump:
60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378
e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008
Link: http://lkml.kernel.org/r/20171214140551.5794-1-khandual@linux.vnet.ibm.com
Signed-off-by: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
Suggested-by: Nicholas Piggin <npiggin(a)gmail.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/mprotect.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -166,7 +166,7 @@ static inline unsigned long change_pmd_r
next = pmd_addr_end(addr, end);
if (!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)
&& pmd_none_or_clear_bad(pmd))
- continue;
+ goto next;
/* invoke the mmu notifier if the pmd is populated */
if (!mni_start) {
@@ -188,7 +188,7 @@ static inline unsigned long change_pmd_r
}
/* huge pmd was handled */
- continue;
+ goto next;
}
}
/* fall through, the trans huge pmd just split */
@@ -196,6 +196,8 @@ static inline unsigned long change_pmd_r
this_pages = change_pte_range(vma, pmd, addr, next, newprot,
dirty_accountable, prot_numa);
pages += this_pages;
+next:
+ cond_resched();
} while (pmd++, addr = next, addr != end);
if (mni_start)
Patches currently in stable-queue which might be from khandual(a)linux.vnet.ibm.com are
queue-4.14/mm-mprotect-add-a-cond_resched-inside-change_pmd_range.patch
This is a note to let you know that I've just added the patch titled
kernel/acct.c: fix the acct->needcheck check in check_free_space()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4d9570158b6260f449e317a5f9ed030c2504a615 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Thu, 4 Jan 2018 16:17:49 -0800
Subject: kernel/acct.c: fix the acct->needcheck check in check_free_space()
From: Oleg Nesterov <oleg(a)redhat.com>
commit 4d9570158b6260f449e317a5f9ed030c2504a615 upstream.
As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check
is very wrong, we need time_is_after_jiffies() to make sys_acct() work.
Ignoring the overflows, the code should "goto out" if needcheck >
jiffies, while currently it checks "needcheck < jiffies" and thus in the
likely case check_free_space() does nothing until jiffies overflow.
In particular this means that sys_acct() is simply broken, acct_on()
sets acct->needcheck = jiffies and expects that check_free_space()
should set acct->active = 1 after the free-space check, but this won't
happen if jiffies increments in between.
This was broken by commit 32dc73086015 ("get rid of timer in
kern/acct.c") in 2011, then another (correct) commit 795a2f22a8ea
("acct() should honour the limits from the very beginning") made the
problem more visible.
Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com
Fixes: 32dc73086015 ("get rid of timer in kern/acct.c")
Reported-by: TSUKADA Koutaro <tsukada(a)ascade.co.jp>
Suggested-by: TSUKADA Koutaro <tsukada(a)ascade.co.jp>
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/acct.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -102,7 +102,7 @@ static int check_free_space(struct bsd_a
{
struct kstatfs sbuf;
- if (time_is_before_jiffies(acct->needcheck))
+ if (time_is_after_jiffies(acct->needcheck))
goto out;
/* May block */
Patches currently in stable-queue which might be from oleg(a)redhat.com are
queue-4.14/kernel-acct.c-fix-the-acct-needcheck-check-in-check_free_space.patch
This is a note to let you know that I've just added the patch titled
efi/capsule-loader: Reinstate virtual capsule mapping
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
efi-capsule-loader-reinstate-virtual-capsule-mapping.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f24c4d478013d82bd1b943df566fff3561d52864 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Date: Tue, 2 Jan 2018 17:21:10 +0000
Subject: efi/capsule-loader: Reinstate virtual capsule mapping
From: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
commit f24c4d478013d82bd1b943df566fff3561d52864 upstream.
Commit:
82c3768b8d68 ("efi/capsule-loader: Use a cached copy of the capsule header")
... refactored the capsule loading code that maps the capsule header,
to avoid having to map it several times.
However, as it turns out, the vmap() call we ended up removing did not
just map the header, but the entire capsule image, and dropping this
virtual mapping breaks capsules that are processed by the firmware
immediately (i.e., without a reboot).
Unfortunately, that change was part of a larger refactor that allowed
a quirk to be implemented for Quark, which has a non-standard memory
layout for capsules, and we have slightly painted ourselves into a
corner by allowing quirk code to mangle the capsule header and memory
layout.
So we need to fix this without breaking Quark. Fortunately, Quark does
not appear to care about the virtual mapping, and so we can simply
do a partial revert of commit:
2a457fb31df6 ("efi/capsule-loader: Use page addresses rather than struct page pointers")
... and create a vmap() mapping of the entire capsule (including header)
based on the reinstated struct page array, unless running on Quark, in
which case we pass the capsule header copy as before.
Reported-by: Ge Song <ge.song(a)hxt-semitech.com>
Tested-by: Bryan O'Donoghue <pure.logic(a)nexus-software.ie>
Tested-by: Ge Song <ge.song(a)hxt-semitech.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Matt Fleming <matt(a)codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-efi(a)vger.kernel.org
Fixes: 82c3768b8d68 ("efi/capsule-loader: Use a cached copy of the capsule header")
Link: http://lkml.kernel.org/r/20180102172110.17018-3-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/platform/efi/quirks.c | 13 +++++++++
drivers/firmware/efi/capsule-loader.c | 45 +++++++++++++++++++++++++++-------
include/linux/efi.h | 4 ++-
3 files changed, 52 insertions(+), 10 deletions(-)
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -592,7 +592,18 @@ static int qrk_capsule_setup_info(struct
/*
* Update the first page pointer to skip over the CSH header.
*/
- cap_info->pages[0] += csh->headersize;
+ cap_info->phys[0] += csh->headersize;
+
+ /*
+ * cap_info->capsule should point at a virtual mapping of the entire
+ * capsule, starting at the capsule header. Our image has the Quark
+ * security header prepended, so we cannot rely on the default vmap()
+ * mapping created by the generic capsule code.
+ * Given that the Quark firmware does not appear to care about the
+ * virtual mapping, let's just point cap_info->capsule at our copy
+ * of the capsule header.
+ */
+ cap_info->capsule = &cap_info->header;
return 1;
}
--- a/drivers/firmware/efi/capsule-loader.c
+++ b/drivers/firmware/efi/capsule-loader.c
@@ -20,10 +20,6 @@
#define NO_FURTHER_WRITE_ACTION -1
-#ifndef phys_to_page
-#define phys_to_page(x) pfn_to_page((x) >> PAGE_SHIFT)
-#endif
-
/**
* efi_free_all_buff_pages - free all previous allocated buffer pages
* @cap_info: pointer to current instance of capsule_info structure
@@ -35,7 +31,7 @@
static void efi_free_all_buff_pages(struct capsule_info *cap_info)
{
while (cap_info->index > 0)
- __free_page(phys_to_page(cap_info->pages[--cap_info->index]));
+ __free_page(cap_info->pages[--cap_info->index]);
cap_info->index = NO_FURTHER_WRITE_ACTION;
}
@@ -71,6 +67,14 @@ int __efi_capsule_setup_info(struct caps
cap_info->pages = temp_page;
+ temp_page = krealloc(cap_info->phys,
+ pages_needed * sizeof(phys_addr_t *),
+ GFP_KERNEL | __GFP_ZERO);
+ if (!temp_page)
+ return -ENOMEM;
+
+ cap_info->phys = temp_page;
+
return 0;
}
@@ -105,9 +109,24 @@ int __weak efi_capsule_setup_info(struct
**/
static ssize_t efi_capsule_submit_update(struct capsule_info *cap_info)
{
+ bool do_vunmap = false;
int ret;
- ret = efi_capsule_update(&cap_info->header, cap_info->pages);
+ /*
+ * cap_info->capsule may have been assigned already by a quirk
+ * handler, so only overwrite it if it is NULL
+ */
+ if (!cap_info->capsule) {
+ cap_info->capsule = vmap(cap_info->pages, cap_info->index,
+ VM_MAP, PAGE_KERNEL);
+ if (!cap_info->capsule)
+ return -ENOMEM;
+ do_vunmap = true;
+ }
+
+ ret = efi_capsule_update(cap_info->capsule, cap_info->phys);
+ if (do_vunmap)
+ vunmap(cap_info->capsule);
if (ret) {
pr_err("capsule update failed\n");
return ret;
@@ -165,10 +184,12 @@ static ssize_t efi_capsule_write(struct
goto failed;
}
- cap_info->pages[cap_info->index++] = page_to_phys(page);
+ cap_info->pages[cap_info->index] = page;
+ cap_info->phys[cap_info->index] = page_to_phys(page);
cap_info->page_bytes_remain = PAGE_SIZE;
+ cap_info->index++;
} else {
- page = phys_to_page(cap_info->pages[cap_info->index - 1]);
+ page = cap_info->pages[cap_info->index - 1];
}
kbuff = kmap(page);
@@ -252,6 +273,7 @@ static int efi_capsule_release(struct in
struct capsule_info *cap_info = file->private_data;
kfree(cap_info->pages);
+ kfree(cap_info->phys);
kfree(file->private_data);
file->private_data = NULL;
return 0;
@@ -280,6 +302,13 @@ static int efi_capsule_open(struct inode
kfree(cap_info);
return -ENOMEM;
}
+
+ cap_info->phys = kzalloc(sizeof(void *), GFP_KERNEL);
+ if (!cap_info->phys) {
+ kfree(cap_info->pages);
+ kfree(cap_info);
+ return -ENOMEM;
+ }
file->private_data = cap_info;
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -140,11 +140,13 @@ struct efi_boot_memmap {
struct capsule_info {
efi_capsule_header_t header;
+ efi_capsule_header_t *capsule;
int reset_type;
long index;
size_t count;
size_t total_size;
- phys_addr_t *pages;
+ struct page **pages;
+ phys_addr_t *phys;
size_t page_bytes_remain;
};
Patches currently in stable-queue which might be from ard.biesheuvel(a)linaro.org are
queue-4.14/efi-capsule-loader-reinstate-virtual-capsule-mapping.patch
This is a note to let you know that I've just added the patch titled
drm/i915: Disable DC states around GMBUS on GLK
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
drm-i915-disable-dc-states-around-gmbus-on-glk.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3488d0237f6364614f0c59d6d784bb79b11eeb92 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Fri, 8 Dec 2017 23:37:36 +0200
Subject: drm/i915: Disable DC states around GMBUS on GLK
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
commit 3488d0237f6364614f0c59d6d784bb79b11eeb92 upstream.
Prevent the DMC from destroying GMBUS transfers on GLK. GMBUS
lives in PG1 so DC off is all we need.
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171208213739.16388-1-ville.…
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan(a)intel.com>
(cherry picked from commit 156961ae7bdf6feb72778e8da83d321b273343fd)
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/gpu/drm/i915/intel_runtime_pm.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -1786,6 +1786,7 @@ void intel_display_power_put(struct drm_
GLK_DISPLAY_POWERWELL_2_POWER_DOMAINS | \
BIT_ULL(POWER_DOMAIN_MODESET) | \
BIT_ULL(POWER_DOMAIN_AUX_A) | \
+ BIT_ULL(POWER_DOMAIN_GMBUS) | \
BIT_ULL(POWER_DOMAIN_INIT))
#define CNL_DISPLAY_POWERWELL_2_POWER_DOMAINS ( \
Patches currently in stable-queue which might be from ville.syrjala(a)linux.intel.com are
queue-4.14/drm-i915-apply-display-wa-1183-on-skl-kbl-and-cfl.patch
queue-4.14/drm-i915-disable-dc-states-around-gmbus-on-glk.patch
This is a note to let you know that I've just added the patch titled
drm/i915: Apply Display WA #1183 on skl, kbl, and cfl
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
drm-i915-apply-display-wa-1183-on-skl-kbl-and-cfl.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 30414f3010aff95ffdb6bed7b9dce62cde94fdc7 Mon Sep 17 00:00:00 2001
From: Lucas De Marchi <lucas.demarchi(a)intel.com>
Date: Tue, 2 Jan 2018 12:18:37 -0800
Subject: drm/i915: Apply Display WA #1183 on skl, kbl, and cfl
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Lucas De Marchi <lucas.demarchi(a)intel.com>
commit 30414f3010aff95ffdb6bed7b9dce62cde94fdc7 upstream.
Display WA #1183 was recently added to workaround
"Failures when enabling DPLL0 with eDP link rate 2.16
or 4.32 GHz and CD clock frequency 308.57 or 617.14 MHz
(CDCLK_CTL CD Frequency Select 10b or 11b) used in this
enabling or in previous enabling."
This workaround was designed to minimize the impact only
to save the bad case with that link rates. But HW engineers
indicated that it should be safe to apply broadly, although
they were expecting the DPLL0 link rate to be unchanged on
runtime.
We need to cover 2 cases: when we are in fact enabling DPLL0
and when we are just changing the frequency with small
differences.
This is based on previous patch by Rodrigo Vivi with suggestions
from Ville Syrjälä.
Cc: Arthur J Runyan <arthur.j.runyan(a)intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171204232210.4958-1-lucas.d…
(cherry picked from commit 53421c2fe99ce16838639ad89d772d914a119a49)
[ Lucas: Backport to 4.15 adding back variable that has been removed on
commits not meant to be backported ]
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180102201837.6812-1-lucas.d…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/gpu/drm/i915/i915_reg.h | 2 +
drivers/gpu/drm/i915/intel_cdclk.c | 35 +++++++++++++++++++++++---------
drivers/gpu/drm/i915/intel_runtime_pm.c | 10 +++++++++
3 files changed, 38 insertions(+), 9 deletions(-)
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -6944,6 +6944,7 @@ enum {
#define RESET_PCH_HANDSHAKE_ENABLE (1<<4)
#define GEN8_CHICKEN_DCPR_1 _MMIO(0x46430)
+#define SKL_SELECT_ALTERNATE_DC_EXIT (1<<30)
#define MASK_WAKEMEM (1<<13)
#define SKL_DFSM _MMIO(0x51000)
@@ -8475,6 +8476,7 @@ enum skl_power_gate {
#define BXT_CDCLK_CD2X_DIV_SEL_2 (2<<22)
#define BXT_CDCLK_CD2X_DIV_SEL_4 (3<<22)
#define BXT_CDCLK_CD2X_PIPE(pipe) ((pipe)<<20)
+#define CDCLK_DIVMUX_CD_OVERRIDE (1<<19)
#define BXT_CDCLK_CD2X_PIPE_NONE BXT_CDCLK_CD2X_PIPE(3)
#define BXT_CDCLK_SSA_PRECHARGE_ENABLE (1<<16)
#define CDCLK_FREQ_DECIMAL_MASK (0x7ff)
--- a/drivers/gpu/drm/i915/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/intel_cdclk.c
@@ -859,16 +859,10 @@ static void skl_set_preferred_cdclk_vco(
static void skl_dpll0_enable(struct drm_i915_private *dev_priv, int vco)
{
- int min_cdclk = skl_calc_cdclk(0, vco);
u32 val;
WARN_ON(vco != 8100000 && vco != 8640000);
- /* select the minimum CDCLK before enabling DPLL 0 */
- val = CDCLK_FREQ_337_308 | skl_cdclk_decimal(min_cdclk);
- I915_WRITE(CDCLK_CTL, val);
- POSTING_READ(CDCLK_CTL);
-
/*
* We always enable DPLL0 with the lowest link rate possible, but still
* taking into account the VCO required to operate the eDP panel at the
@@ -922,7 +916,7 @@ static void skl_set_cdclk(struct drm_i91
{
int cdclk = cdclk_state->cdclk;
int vco = cdclk_state->vco;
- u32 freq_select, pcu_ack;
+ u32 freq_select, pcu_ack, cdclk_ctl;
int ret;
WARN_ON((cdclk == 24000) != (vco == 0));
@@ -939,7 +933,7 @@ static void skl_set_cdclk(struct drm_i91
return;
}
- /* set CDCLK_CTL */
+ /* Choose frequency for this cdclk */
switch (cdclk) {
case 450000:
case 432000:
@@ -967,10 +961,33 @@ static void skl_set_cdclk(struct drm_i91
dev_priv->cdclk.hw.vco != vco)
skl_dpll0_disable(dev_priv);
+ cdclk_ctl = I915_READ(CDCLK_CTL);
+
+ if (dev_priv->cdclk.hw.vco != vco) {
+ /* Wa Display #1183: skl,kbl,cfl */
+ cdclk_ctl &= ~(CDCLK_FREQ_SEL_MASK | CDCLK_FREQ_DECIMAL_MASK);
+ cdclk_ctl |= freq_select | skl_cdclk_decimal(cdclk);
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
+ }
+
+ /* Wa Display #1183: skl,kbl,cfl */
+ cdclk_ctl |= CDCLK_DIVMUX_CD_OVERRIDE;
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
+ POSTING_READ(CDCLK_CTL);
+
if (dev_priv->cdclk.hw.vco != vco)
skl_dpll0_enable(dev_priv, vco);
- I915_WRITE(CDCLK_CTL, freq_select | skl_cdclk_decimal(cdclk));
+ /* Wa Display #1183: skl,kbl,cfl */
+ cdclk_ctl &= ~(CDCLK_FREQ_SEL_MASK | CDCLK_FREQ_DECIMAL_MASK);
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
+
+ cdclk_ctl |= freq_select | skl_cdclk_decimal(cdclk);
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
+
+ /* Wa Display #1183: skl,kbl,cfl */
+ cdclk_ctl &= ~CDCLK_DIVMUX_CD_OVERRIDE;
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
POSTING_READ(CDCLK_CTL);
/* inform PCU of the change */
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -598,6 +598,11 @@ void gen9_enable_dc5(struct drm_i915_pri
DRM_DEBUG_KMS("Enabling DC5\n");
+ /* Wa Display #1183: skl,kbl,cfl */
+ if (IS_GEN9_BC(dev_priv))
+ I915_WRITE(GEN8_CHICKEN_DCPR_1, I915_READ(GEN8_CHICKEN_DCPR_1) |
+ SKL_SELECT_ALTERNATE_DC_EXIT);
+
gen9_set_dc_state(dev_priv, DC_STATE_EN_UPTO_DC5);
}
@@ -625,6 +630,11 @@ void skl_disable_dc6(struct drm_i915_pri
{
DRM_DEBUG_KMS("Disabling DC6\n");
+ /* Wa Display #1183: skl,kbl,cfl */
+ if (IS_GEN9_BC(dev_priv))
+ I915_WRITE(GEN8_CHICKEN_DCPR_1, I915_READ(GEN8_CHICKEN_DCPR_1) |
+ SKL_SELECT_ALTERNATE_DC_EXIT);
+
gen9_set_dc_state(dev_priv, DC_STATE_DISABLE);
}
Patches currently in stable-queue which might be from lucas.demarchi(a)intel.com are
queue-4.14/drm-i915-apply-display-wa-1183-on-skl-kbl-and-cfl.patch
This is a note to let you know that I've just added the patch titled
crypto: pcrypt - fix freeing pcrypt instances
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
crypto-pcrypt-fix-freeing-pcrypt-instances.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d76c68109f37cb85b243a1cf0f40313afd2bae68 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Wed, 20 Dec 2017 14:28:25 -0800
Subject: crypto: pcrypt - fix freeing pcrypt instances
From: Eric Biggers <ebiggers(a)google.com>
commit d76c68109f37cb85b243a1cf0f40313afd2bae68 upstream.
pcrypt is using the old way of freeing instances, where the ->free()
method specified in the 'struct crypto_template' is passed a pointer to
the 'struct crypto_instance'. But the crypto_instance is being
kfree()'d directly, which is incorrect because the memory was actually
allocated as an aead_instance, which contains the crypto_instance at a
nonzero offset. Thus, the wrong pointer was being kfree()'d.
Fix it by switching to the new way to free aead_instance's where the
->free() method is specified in the aead_instance itself.
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Fixes: 0496f56065e0 ("crypto: pcrypt - Add support for new AEAD interface")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
crypto/pcrypt.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -254,6 +254,14 @@ static void pcrypt_aead_exit_tfm(struct
crypto_free_aead(ctx->child);
}
+static void pcrypt_free(struct aead_instance *inst)
+{
+ struct pcrypt_instance_ctx *ctx = aead_instance_ctx(inst);
+
+ crypto_drop_aead(&ctx->spawn);
+ kfree(inst);
+}
+
static int pcrypt_init_instance(struct crypto_instance *inst,
struct crypto_alg *alg)
{
@@ -319,6 +327,8 @@ static int pcrypt_create_aead(struct cry
inst->alg.encrypt = pcrypt_aead_encrypt;
inst->alg.decrypt = pcrypt_aead_decrypt;
+ inst->free = pcrypt_free;
+
err = aead_register_instance(tmpl, inst);
if (err)
goto out_drop_aead;
@@ -349,14 +359,6 @@ static int pcrypt_create(struct crypto_t
return -EINVAL;
}
-static void pcrypt_free(struct crypto_instance *inst)
-{
- struct pcrypt_instance_ctx *ctx = crypto_instance_ctx(inst);
-
- crypto_drop_aead(&ctx->spawn);
- kfree(inst);
-}
-
static int pcrypt_cpumask_change_notify(struct notifier_block *self,
unsigned long val, void *data)
{
@@ -469,7 +471,6 @@ static void pcrypt_fini_padata(struct pa
static struct crypto_template pcrypt_tmpl = {
.name = "pcrypt",
.create = pcrypt_create,
- .free = pcrypt_free,
.module = THIS_MODULE,
};
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-4.14/crypto-chacha20poly1305-validate-the-digest-size.patch
queue-4.14/crypto-pcrypt-fix-freeing-pcrypt-instances.patch
This is a note to let you know that I've just added the patch titled
crypto: n2 - cure use after free
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
crypto-n2-cure-use-after-free.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 203f45003a3d03eea8fa28d74cfc74c354416fdb Mon Sep 17 00:00:00 2001
From: Jan Engelhardt <jengelh(a)inai.de>
Date: Tue, 19 Dec 2017 19:09:07 +0100
Subject: crypto: n2 - cure use after free
From: Jan Engelhardt <jengelh(a)inai.de>
commit 203f45003a3d03eea8fa28d74cfc74c354416fdb upstream.
queue_cache_init is first called for the Control Word Queue
(n2_crypto_probe). At that time, queue_cache[0] is NULL and a new
kmem_cache will be allocated. If the subsequent n2_register_algs call
fails, the kmem_cache will be released in queue_cache_destroy, but
queue_cache_init[0] is not set back to NULL.
So when the Module Arithmetic Unit gets probed next (n2_mau_probe),
queue_cache_init will not allocate a kmem_cache again, but leave it
as its bogus value, causing a BUG() to trigger when queue_cache[0] is
eventually passed to kmem_cache_zalloc:
n2_crypto: Found N2CP at /virtual-devices@100/n2cp@7
n2_crypto: Registered NCS HVAPI version 2.0
called queue_cache_init
n2_crypto: md5 alg registration failed
n2cp f028687c: /virtual-devices@100/n2cp@7: Unable to register algorithms.
called queue_cache_destroy
n2cp: probe of f028687c failed with error -22
n2_crypto: Found NCP at /virtual-devices@100/ncp@6
n2_crypto: Registered NCS HVAPI version 2.0
called queue_cache_init
kernel BUG at mm/slab.c:2993!
Call Trace:
[0000000000604488] kmem_cache_alloc+0x1a8/0x1e0
(inlined) kmem_cache_zalloc
(inlined) new_queue
(inlined) spu_queue_setup
(inlined) handle_exec_unit
[0000000010c61eb4] spu_mdesc_scan+0x1f4/0x460 [n2_crypto]
[0000000010c62b80] n2_mau_probe+0x100/0x220 [n2_crypto]
[000000000084b174] platform_drv_probe+0x34/0xc0
Signed-off-by: Jan Engelhardt <jengelh(a)inai.de>
Acked-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/crypto/n2_core.c | 3 +++
1 file changed, 3 insertions(+)
--- a/drivers/crypto/n2_core.c
+++ b/drivers/crypto/n2_core.c
@@ -1625,6 +1625,7 @@ static int queue_cache_init(void)
CWQ_ENTRY_SIZE, 0, NULL);
if (!queue_cache[HV_NCS_QTYPE_CWQ - 1]) {
kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_MAU - 1]);
+ queue_cache[HV_NCS_QTYPE_MAU - 1] = NULL;
return -ENOMEM;
}
return 0;
@@ -1634,6 +1635,8 @@ static void queue_cache_destroy(void)
{
kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_MAU - 1]);
kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_CWQ - 1]);
+ queue_cache[HV_NCS_QTYPE_MAU - 1] = NULL;
+ queue_cache[HV_NCS_QTYPE_CWQ - 1] = NULL;
}
static long spu_queue_register_workfn(void *arg)
Patches currently in stable-queue which might be from jengelh(a)inai.de are
queue-4.14/crypto-n2-cure-use-after-free.patch
This is a note to let you know that I've just added the patch titled
crypto: chelsio - select CRYPTO_GF128MUL
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
crypto-chelsio-select-crypto_gf128mul.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d042566d8c704e1ecec370300545d4a409222e39 Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <arnd(a)arndb.de>
Date: Tue, 5 Dec 2017 11:10:26 +0100
Subject: crypto: chelsio - select CRYPTO_GF128MUL
From: Arnd Bergmann <arnd(a)arndb.de>
commit d042566d8c704e1ecec370300545d4a409222e39 upstream.
Without the gf128mul library support, we can run into a link
error:
drivers/crypto/chelsio/chcr_algo.o: In function `chcr_update_tweak':
chcr_algo.c:(.text+0x7e0): undefined reference to `gf128mul_x8_ble'
This adds a Kconfig select statement for it, next to the ones we
already have.
Fixes: b8fd1f4170e7 ("crypto: chcr - Add ctr mode and process large sg entries for cipher")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/crypto/chelsio/Kconfig | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/crypto/chelsio/Kconfig
+++ b/drivers/crypto/chelsio/Kconfig
@@ -5,6 +5,7 @@ config CRYPTO_DEV_CHELSIO
select CRYPTO_SHA256
select CRYPTO_SHA512
select CRYPTO_AUTHENC
+ select CRYPTO_GF128MUL
---help---
The Chelsio Crypto Co-processor driver for T6 adapters.
Patches currently in stable-queue which might be from arnd(a)arndb.de are
queue-4.14/crypto-chelsio-select-crypto_gf128mul.patch
This is a note to let you know that I've just added the patch titled
crypto: chacha20poly1305 - validate the digest size
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
crypto-chacha20poly1305-validate-the-digest-size.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e57121d08c38dabec15cf3e1e2ad46721af30cae Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 11 Dec 2017 12:15:17 -0800
Subject: crypto: chacha20poly1305 - validate the digest size
From: Eric Biggers <ebiggers(a)google.com>
commit e57121d08c38dabec15cf3e1e2ad46721af30cae upstream.
If the rfc7539 template was instantiated with a hash algorithm with
digest size larger than 16 bytes (POLY1305_DIGEST_SIZE), then the digest
overran the 'tag' buffer in 'struct chachapoly_req_ctx', corrupting the
subsequent memory, including 'cryptlen'. This caused a crash during
crypto_skcipher_decrypt().
Fix it by, when instantiating the template, requiring that the
underlying hash algorithm has the digest size expected for Poly1305.
Reproducer:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
int algfd, reqfd;
struct sockaddr_alg addr = {
.salg_type = "aead",
.salg_name = "rfc7539(chacha20,sha256)",
};
unsigned char buf[32] = { 0 };
algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(algfd, (void *)&addr, sizeof(addr));
setsockopt(algfd, SOL_ALG, ALG_SET_KEY, buf, sizeof(buf));
reqfd = accept(algfd, 0, 0);
write(reqfd, buf, 16);
read(reqfd, buf, 16);
}
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Fixes: 71ebc4d1b27d ("crypto: chacha20poly1305 - Add a ChaCha20-Poly1305 AEAD construction, RFC7539")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
crypto/chacha20poly1305.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/crypto/chacha20poly1305.c
+++ b/crypto/chacha20poly1305.c
@@ -610,6 +610,11 @@ static int chachapoly_create(struct cryp
algt->mask));
if (IS_ERR(poly))
return PTR_ERR(poly);
+ poly_hash = __crypto_hash_alg_common(poly);
+
+ err = -EINVAL;
+ if (poly_hash->digestsize != POLY1305_DIGEST_SIZE)
+ goto out_put_poly;
err = -ENOMEM;
inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
@@ -618,7 +623,6 @@ static int chachapoly_create(struct cryp
ctx = aead_instance_ctx(inst);
ctx->saltlen = CHACHAPOLY_IV_SIZE - ivsize;
- poly_hash = __crypto_hash_alg_common(poly);
err = crypto_init_ahash_spawn(&ctx->poly, poly_hash,
aead_crypto_instance(inst));
if (err)
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-4.14/crypto-chacha20poly1305-validate-the-digest-size.patch
queue-4.14/crypto-pcrypt-fix-freeing-pcrypt-instances.patch
This is a note to let you know that I've just added the patch titled
btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
btrfs-fix-refcount_t-usage-when-deleting-btrfs_delayed_nodes.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ec35e48b286959991cdbb886f1bdeda4575c80b4 Mon Sep 17 00:00:00 2001
From: Chris Mason <clm(a)fb.com>
Date: Fri, 15 Dec 2017 11:58:27 -0800
Subject: btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
From: Chris Mason <clm(a)fb.com>
commit ec35e48b286959991cdbb886f1bdeda4575c80b4 upstream.
refcounts have a generic implementation and an asm optimized one. The
generic version has extra debugging to make sure that once a refcount
goes to zero, refcount_inc won't increase it.
The btrfs delayed inode code wasn't expecting this, and we're tripping
over the warnings when the generic refcounts are used. We ended up with
this race:
Process A Process B
btrfs_get_delayed_node()
spin_lock(root->inode_lock)
radix_tree_lookup()
__btrfs_release_delayed_node()
refcount_dec_and_test(&delayed_node->refs)
our refcount is now zero
refcount_add(2) <---
warning here, refcount
unchanged
spin_lock(root->inode_lock)
radix_tree_delete()
With the generic refcounts, we actually warn again when process B above
tries to release his refcount because refcount_add() turned into a
no-op.
We saw this in production on older kernels without the asm optimized
refcounts.
The fix used here is to use refcount_inc_not_zero() to detect when the
object is in the middle of being freed and return NULL. This is almost
always the right answer anyway, since we usually end up pitching the
delayed_node if it didn't have fresh data in it.
This also changes __btrfs_release_delayed_node() to remove the extra
check for zero refcounts before radix tree deletion.
btrfs_get_delayed_node() was the only path that was allowing refcounts
to go from zero to one.
Fixes: 6de5f18e7b0da ("btrfs: fix refcount_t usage when deleting btrfs_delayed_node")
Signed-off-by: Chris Mason <clm(a)fb.com>
Reviewed-by: Liu Bo <bo.li.liu(a)oracle.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/btrfs/delayed-inode.c | 45 ++++++++++++++++++++++++++++++++++-----------
1 file changed, 34 insertions(+), 11 deletions(-)
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -87,6 +87,7 @@ static struct btrfs_delayed_node *btrfs_
spin_lock(&root->inode_lock);
node = radix_tree_lookup(&root->delayed_nodes_tree, ino);
+
if (node) {
if (btrfs_inode->delayed_node) {
refcount_inc(&node->refs); /* can be accessed */
@@ -94,9 +95,30 @@ static struct btrfs_delayed_node *btrfs_
spin_unlock(&root->inode_lock);
return node;
}
- btrfs_inode->delayed_node = node;
- /* can be accessed and cached in the inode */
- refcount_add(2, &node->refs);
+
+ /*
+ * It's possible that we're racing into the middle of removing
+ * this node from the radix tree. In this case, the refcount
+ * was zero and it should never go back to one. Just return
+ * NULL like it was never in the radix at all; our release
+ * function is in the process of removing it.
+ *
+ * Some implementations of refcount_inc refuse to bump the
+ * refcount once it has hit zero. If we don't do this dance
+ * here, refcount_inc() may decide to just WARN_ONCE() instead
+ * of actually bumping the refcount.
+ *
+ * If this node is properly in the radix, we want to bump the
+ * refcount twice, once for the inode and once for this get
+ * operation.
+ */
+ if (refcount_inc_not_zero(&node->refs)) {
+ refcount_inc(&node->refs);
+ btrfs_inode->delayed_node = node;
+ } else {
+ node = NULL;
+ }
+
spin_unlock(&root->inode_lock);
return node;
}
@@ -254,17 +276,18 @@ static void __btrfs_release_delayed_node
mutex_unlock(&delayed_node->mutex);
if (refcount_dec_and_test(&delayed_node->refs)) {
- bool free = false;
struct btrfs_root *root = delayed_node->root;
+
spin_lock(&root->inode_lock);
- if (refcount_read(&delayed_node->refs) == 0) {
- radix_tree_delete(&root->delayed_nodes_tree,
- delayed_node->inode_id);
- free = true;
- }
+ /*
+ * Once our refcount goes to zero, nobody is allowed to bump it
+ * back up. We can delete it now.
+ */
+ ASSERT(refcount_read(&delayed_node->refs) == 0);
+ radix_tree_delete(&root->delayed_nodes_tree,
+ delayed_node->inode_id);
spin_unlock(&root->inode_lock);
- if (free)
- kmem_cache_free(delayed_node_cache, delayed_node);
+ kmem_cache_free(delayed_node_cache, delayed_node);
}
}
Patches currently in stable-queue which might be from clm(a)fb.com are
queue-4.14/btrfs-fix-refcount_t-usage-when-deleting-btrfs_delayed_nodes.patch
This is a note to let you know that I've just added the patch titled
x86/kasan: Write protect kasan zero shadow
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-kasan-write-protect-kasan-zero-shadow.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 063fb3e56f6dd29b2633b678b837e1d904200e6f Mon Sep 17 00:00:00 2001
From: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Date: Mon, 11 Jan 2016 15:51:19 +0300
Subject: x86/kasan: Write protect kasan zero shadow
From: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
commit 063fb3e56f6dd29b2633b678b837e1d904200e6f upstream.
After kasan_init() executed, no one is allowed to write to kasan_zero_page,
so write protect it.
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof(a)suse.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Toshi Kani <toshi.kani(a)hp.com>
Cc: linux-mm(a)kvack.org
Link: http://lkml.kernel.org/r/1452516679-32040-3-git-send-email-aryabinin@virtuo…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kasan_init_64.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -126,10 +126,16 @@ void __init kasan_init(void)
/*
* kasan_zero_page has been used as early shadow memory, thus it may
- * contain some garbage. Now we can clear it, since after the TLB flush
- * no one should write to it.
+ * contain some garbage. Now we can clear and write protect it, since
+ * after the TLB flush no one should write to it.
*/
memset(kasan_zero_page, 0, PAGE_SIZE);
+ for (i = 0; i < PTRS_PER_PTE; i++) {
+ pte_t pte = __pte(__pa(kasan_zero_page) | __PAGE_KERNEL_RO);
+ set_pte(&kasan_zero_pte[i], pte);
+ }
+ /* Flush TLBs again to be sure that write protection applied. */
+ __flush_tlb_all();
init_task.kasan_depth = 0;
pr_info("KernelAddressSanitizer initialized\n");
Patches currently in stable-queue which might be from aryabinin(a)virtuozzo.com are
queue-4.4/x86-kasan-write-protect-kasan-zero-shadow.patch
Geminilake requires the 3D driver to select whether barriers are
intended for compute shaders, or tessellation control shaders, by
whacking a "Barrier Mode" bit in SLICE_COMMON_ECO_CHICKEN1 when
switching pipelines. Failure to do this properly can result in GPU
hangs.
Unfortunately, this means it needs to switch mid-batch, so only
userspace can properly set it. To facilitate this, the kernel needs
to whitelist the register.
The workarounds page currently tags this as applying to Broxton only,
but that doesn't make sense. The documentation for the register it
references says the bit userspace is supposed to toggle only exists on
Geminilake. Empirically, the Mesa patch to toggle this bit appears to
fix intermittent GPU hangs in tessellation control shader barrier tests
on Geminilake; we haven't seen those hangs on Broxton.
v2: Mention WA #0862 in the comment (it doesn't have a name).
Signed-off-by: Kenneth Graunke <kenneth(a)whitecape.org>
Acked-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/i915/i915_reg.h | 2 ++
drivers/gpu/drm/i915/intel_engine_cs.c | 5 +++++
2 files changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 966e4df9700e..505c605eff98 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -7079,6 +7079,8 @@ enum {
#define GEN9_SLICE_COMMON_ECO_CHICKEN0 _MMIO(0x7308)
#define DISABLE_PIXEL_MASK_CAMMING (1<<14)
+#define GEN9_SLICE_COMMON_ECO_CHICKEN1 _MMIO(0x731c)
+
#define GEN7_L3SQCREG1 _MMIO(0xB010)
#define VLV_B0_WA_L3SQCREG1_VALUE 0x00D30000
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index ebdcbcbacb3c..6bb51a502b8b 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1338,6 +1338,11 @@ static int glk_init_workarounds(struct intel_engine_cs *engine)
if (ret)
return ret;
+ /* WA #0862: Userspace has to set "Barrier Mode" to avoid hangs. */
+ ret = wa_ring_whitelist_reg(engine, GEN9_SLICE_COMMON_ECO_CHICKEN1);
+ if (ret)
+ return ret;
+
/* WaToEnableHwFixForPushConstHWBug:glk */
WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2,
GEN8_SBE_DISABLE_REPLAY_BUF_OPTIMIZATION);
--
2.15.1
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 152ec4e87b57..5d2676d043de 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2763,6 +2763,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
nojitter [IA-64] Disables jitter checking for ITC timers.
+ nopti [X86-64] Disable KAISER isolation of kernel from user.
+
no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
@@ -3325,6 +3327,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
pt. [PARIDE]
See Documentation/blockdev/paride.txt.
+ pti= [X86_64]
+ Control KAISER user/kernel address space isolation:
+ on - enable
+ off - disable
+ auto - default setting
+
pty.legacy_count=
[KNL] Number of legacy pty's. Overwrites compiled-in
default number.
diff --git a/Makefile b/Makefile
index 075e429732e7..acbc1b032db2 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
VERSION = 4
PATCHLEVEL = 9
-SUBLEVEL = 74
+SUBLEVEL = 75
EXTRAVERSION =
NAME = Roaring Lionus
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 766a5211f827..2728e1b7e4a6 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -9,6 +9,7 @@
*/
#undef CONFIG_PARAVIRT
#undef CONFIG_PARAVIRT_SPINLOCKS
+#undef CONFIG_PAGE_TABLE_ISOLATION
#undef CONFIG_KASAN
#include <linux/linkage.h>
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e7b0e7ff4c58..af4e58132d91 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -36,6 +36,7 @@
#include <asm/smap.h>
#include <asm/pgtable_types.h>
#include <asm/export.h>
+#include <asm/kaiser.h>
#include <linux/err.h>
/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
@@ -146,6 +147,7 @@ ENTRY(entry_SYSCALL_64)
* it is too small to ever cause noticeable irq latency.
*/
SWAPGS_UNSAFE_STACK
+ SWITCH_KERNEL_CR3_NO_STACK
/*
* A hypervisor implementation might want to use a label
* after the swapgs, so that it can do the swapgs
@@ -228,6 +230,14 @@ entry_SYSCALL_64_fastpath:
movq RIP(%rsp), %rcx
movq EFLAGS(%rsp), %r11
RESTORE_C_REGS_EXCEPT_RCX_R11
+ /*
+ * This opens a window where we have a user CR3, but are
+ * running in the kernel. This makes using the CS
+ * register useless for telling whether or not we need to
+ * switch CR3 in NMIs. Normal interrupts are OK because
+ * they are off here.
+ */
+ SWITCH_USER_CR3
movq RSP(%rsp), %rsp
USERGS_SYSRET64
@@ -323,10 +333,26 @@ return_from_SYSCALL_64:
syscall_return_via_sysret:
/* rcx and r11 are already restored (see code above) */
RESTORE_C_REGS_EXCEPT_RCX_R11
+ /*
+ * This opens a window where we have a user CR3, but are
+ * running in the kernel. This makes using the CS
+ * register useless for telling whether or not we need to
+ * switch CR3 in NMIs. Normal interrupts are OK because
+ * they are off here.
+ */
+ SWITCH_USER_CR3
movq RSP(%rsp), %rsp
USERGS_SYSRET64
opportunistic_sysret_failed:
+ /*
+ * This opens a window where we have a user CR3, but are
+ * running in the kernel. This makes using the CS
+ * register useless for telling whether or not we need to
+ * switch CR3 in NMIs. Normal interrupts are OK because
+ * they are off here.
+ */
+ SWITCH_USER_CR3
SWAPGS
jmp restore_c_regs_and_iret
END(entry_SYSCALL_64)
@@ -424,6 +450,7 @@ ENTRY(ret_from_fork)
movq %rsp, %rdi
call syscall_return_slowpath /* returns with IRQs disabled */
TRACE_IRQS_ON /* user mode is traced as IRQS on */
+ SWITCH_USER_CR3
SWAPGS
jmp restore_regs_and_iret
@@ -478,6 +505,7 @@ END(irq_entries_start)
* tracking that we're in kernel mode.
*/
SWAPGS
+ SWITCH_KERNEL_CR3
/*
* We need to tell lockdep that IRQs are off. We can't do this until
@@ -535,6 +563,7 @@ GLOBAL(retint_user)
mov %rsp,%rdi
call prepare_exit_to_usermode
TRACE_IRQS_IRETQ
+ SWITCH_USER_CR3
SWAPGS
jmp restore_regs_and_iret
@@ -612,6 +641,7 @@ native_irq_return_ldt:
pushq %rdi /* Stash user RDI */
SWAPGS
+ SWITCH_KERNEL_CR3
movq PER_CPU_VAR(espfix_waddr), %rdi
movq %rax, (0*8)(%rdi) /* user RAX */
movq (1*8)(%rsp), %rax /* user RIP */
@@ -638,6 +668,7 @@ native_irq_return_ldt:
* still points to an RO alias of the ESPFIX stack.
*/
orq PER_CPU_VAR(espfix_stack), %rax
+ SWITCH_USER_CR3
SWAPGS
movq %rax, %rsp
@@ -1022,7 +1053,11 @@ idtentry machine_check has_error_code=0 paranoid=1 do_sym=*machine_check_vec
/*
* Save all registers in pt_regs, and switch gs if needed.
* Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ *
+ * Return: ebx=0: needs swapgs but not SWITCH_USER_CR3 in paranoid_exit
+ * ebx=1: needs neither swapgs nor SWITCH_USER_CR3 in paranoid_exit
+ * ebx=2: needs both swapgs and SWITCH_USER_CR3 in paranoid_exit
+ * ebx=3: needs SWITCH_USER_CR3 but not swapgs in paranoid_exit
*/
ENTRY(paranoid_entry)
cld
@@ -1035,7 +1070,26 @@ ENTRY(paranoid_entry)
js 1f /* negative -> in kernel */
SWAPGS
xorl %ebx, %ebx
-1: ret
+1:
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ /*
+ * We might have come in between a swapgs and a SWITCH_KERNEL_CR3
+ * on entry, or between a SWITCH_USER_CR3 and a swapgs on exit.
+ * Do a conditional SWITCH_KERNEL_CR3: this could safely be done
+ * unconditionally, but we need to find out whether the reverse
+ * should be done on return (conveyed to paranoid_exit in %ebx).
+ */
+ ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
+ testl $KAISER_SHADOW_PGD_OFFSET, %eax
+ jz 2f
+ orl $2, %ebx
+ andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+ /* If PCID enabled, set X86_CR3_PCID_NOFLUSH_BIT */
+ ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
+ movq %rax, %cr3
+2:
+#endif
+ ret
END(paranoid_entry)
/*
@@ -1048,19 +1102,26 @@ END(paranoid_entry)
* be complicated. Fortunately, we there's no good reason
* to try to handle preemption here.
*
- * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
+ * On entry: ebx=0: needs swapgs but not SWITCH_USER_CR3
+ * ebx=1: needs neither swapgs nor SWITCH_USER_CR3
+ * ebx=2: needs both swapgs and SWITCH_USER_CR3
+ * ebx=3: needs SWITCH_USER_CR3 but not swapgs
*/
ENTRY(paranoid_exit)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF_DEBUG
- testl %ebx, %ebx /* swapgs needed? */
+ TRACE_IRQS_IRETQ_DEBUG
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ /* No ALTERNATIVE for X86_FEATURE_KAISER: paranoid_entry sets %ebx */
+ testl $2, %ebx /* SWITCH_USER_CR3 needed? */
+ jz paranoid_exit_no_switch
+ SWITCH_USER_CR3
+paranoid_exit_no_switch:
+#endif
+ testl $1, %ebx /* swapgs needed? */
jnz paranoid_exit_no_swapgs
- TRACE_IRQS_IRETQ
SWAPGS_UNSAFE_STACK
- jmp paranoid_exit_restore
paranoid_exit_no_swapgs:
- TRACE_IRQS_IRETQ_DEBUG
-paranoid_exit_restore:
RESTORE_EXTRA_REGS
RESTORE_C_REGS
REMOVE_PT_GPREGS_FROM_STACK 8
@@ -1075,6 +1136,13 @@ ENTRY(error_entry)
cld
SAVE_C_REGS 8
SAVE_EXTRA_REGS 8
+ /*
+ * error_entry() always returns with a kernel gsbase and
+ * CR3. We must also have a kernel CR3/gsbase before
+ * calling TRACE_IRQS_*. Just unconditionally switch to
+ * the kernel CR3 here.
+ */
+ SWITCH_KERNEL_CR3
xorl %ebx, %ebx
testb $3, CS+8(%rsp)
jz .Lerror_kernelspace
@@ -1235,6 +1303,10 @@ ENTRY(nmi)
*/
SWAPGS_UNSAFE_STACK
+ /*
+ * percpu variables are mapped with user CR3, so no need
+ * to switch CR3 here.
+ */
cld
movq %rsp, %rdx
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
@@ -1268,12 +1340,34 @@ ENTRY(nmi)
movq %rsp, %rdi
movq $-1, %rsi
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ /* Unconditionally use kernel CR3 for do_nmi() */
+ /* %rax is saved above, so OK to clobber here */
+ ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
+ /* If PCID enabled, NOFLUSH now and NOFLUSH on return */
+ ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
+ pushq %rax
+ /* mask off "user" bit of pgd address and 12 PCID bits: */
+ andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+ movq %rax, %cr3
+2:
+#endif
call do_nmi
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ /*
+ * Unconditionally restore CR3. I know we return to
+ * kernel code that needs user CR3, but do we ever return
+ * to "user mode" where we need the kernel CR3?
+ */
+ ALTERNATIVE "", "popq %rax; movq %rax, %cr3", X86_FEATURE_KAISER
+#endif
+
/*
* Return back to user mode. We must *not* do the normal exit
- * work, because we don't want to enable interrupts. Fortunately,
- * do_nmi doesn't modify pt_regs.
+ * work, because we don't want to enable interrupts. Do not
+ * switch to user CR3: we might be going back to kernel code
+ * that had a user CR3 set.
*/
SWAPGS
jmp restore_c_regs_and_iret
@@ -1470,22 +1564,55 @@ end_repeat_nmi:
ALLOC_PT_GPREGS_ON_STACK
/*
- * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit
- * as we should not be calling schedule in NMI context.
- * Even with normal interrupts enabled. An NMI should not be
- * setting NEED_RESCHED or anything that normal interrupts and
- * exceptions might do.
+ * Use the same approach as paranoid_entry to handle SWAPGS, but
+ * without CR3 handling since we do that differently in NMIs. No
+ * need to use paranoid_exit as we should not be calling schedule
+ * in NMI context. Even with normal interrupts enabled. An NMI
+ * should not be setting NEED_RESCHED or anything that normal
+ * interrupts and exceptions might do.
*/
- call paranoid_entry
-
- /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
+ cld
+ SAVE_C_REGS
+ SAVE_EXTRA_REGS
+ movl $1, %ebx
+ movl $MSR_GS_BASE, %ecx
+ rdmsr
+ testl %edx, %edx
+ js 1f /* negative -> in kernel */
+ SWAPGS
+ xorl %ebx, %ebx
+1:
movq %rsp, %rdi
movq $-1, %rsi
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ /* Unconditionally use kernel CR3 for do_nmi() */
+ /* %rax is saved above, so OK to clobber here */
+ ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
+ /* If PCID enabled, NOFLUSH now and NOFLUSH on return */
+ ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
+ pushq %rax
+ /* mask off "user" bit of pgd address and 12 PCID bits: */
+ andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+ movq %rax, %cr3
+2:
+#endif
+
+ /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
call do_nmi
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ /*
+ * Unconditionally restore CR3. We might be returning to
+ * kernel code that needs user CR3, like just just before
+ * a sysret.
+ */
+ ALTERNATIVE "", "popq %rax; movq %rax, %cr3", X86_FEATURE_KAISER
+#endif
+
testl %ebx, %ebx /* swapgs needed? */
jnz nmi_restore
nmi_swapgs:
+ /* We fixed up CR3 above, so no need to switch it here */
SWAPGS_UNSAFE_STACK
nmi_restore:
RESTORE_EXTRA_REGS
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index e1721dafbcb1..d76a97653980 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -13,6 +13,8 @@
#include <asm/irqflags.h>
#include <asm/asm.h>
#include <asm/smap.h>
+#include <asm/pgtable_types.h>
+#include <asm/kaiser.h>
#include <linux/linkage.h>
#include <linux/err.h>
@@ -48,6 +50,7 @@
ENTRY(entry_SYSENTER_compat)
/* Interrupts are off on entry. */
SWAPGS_UNSAFE_STACK
+ SWITCH_KERNEL_CR3_NO_STACK
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
/*
@@ -184,6 +187,7 @@ ENDPROC(entry_SYSENTER_compat)
ENTRY(entry_SYSCALL_compat)
/* Interrupts are off on entry. */
SWAPGS_UNSAFE_STACK
+ SWITCH_KERNEL_CR3_NO_STACK
/* Stash user ESP and switch to the kernel stack. */
movl %esp, %r8d
@@ -259,6 +263,7 @@ sysret32_from_system_call:
xorq %r8, %r8
xorq %r9, %r9
xorq %r10, %r10
+ SWITCH_USER_CR3
movq RSP-ORIG_RAX(%rsp), %rsp
swapgs
sysretl
@@ -297,7 +302,7 @@ ENTRY(entry_INT80_compat)
PARAVIRT_ADJUST_EXCEPTION_FRAME
ASM_CLAC /* Do this early to minimize exposure */
SWAPGS
-
+ SWITCH_KERNEL_CR3_NO_STACK
/*
* User tracing code (ptrace or signal handlers) might assume that
* the saved RAX contains a 32-bit number when we're invoking a 32-bit
@@ -338,6 +343,7 @@ ENTRY(entry_INT80_compat)
/* Go back to user mode. */
TRACE_IRQS_ON
+ SWITCH_USER_CR3
SWAPGS
jmp restore_regs_and_iret
END(entry_INT80_compat)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 9dfeeeca0ea8..8e7a3f1df3a5 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2,11 +2,15 @@
#include <linux/types.h>
#include <linux/slab.h>
+#include <asm/kaiser.h>
#include <asm/perf_event.h>
#include <asm/insn.h>
#include "../perf_event.h"
+static
+DEFINE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(struct debug_store, cpu_debug_store);
+
/* The size of a BTS record in bytes: */
#define BTS_RECORD_SIZE 24
@@ -268,6 +272,39 @@ void fini_debug_store_on_cpu(int cpu)
static DEFINE_PER_CPU(void *, insn_buffer);
+static void *dsalloc(size_t size, gfp_t flags, int node)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ unsigned int order = get_order(size);
+ struct page *page;
+ unsigned long addr;
+
+ page = __alloc_pages_node(node, flags | __GFP_ZERO, order);
+ if (!page)
+ return NULL;
+ addr = (unsigned long)page_address(page);
+ if (kaiser_add_mapping(addr, size, __PAGE_KERNEL) < 0) {
+ __free_pages(page, order);
+ addr = 0;
+ }
+ return (void *)addr;
+#else
+ return kmalloc_node(size, flags | __GFP_ZERO, node);
+#endif
+}
+
+static void dsfree(const void *buffer, size_t size)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ if (!buffer)
+ return;
+ kaiser_remove_mapping((unsigned long)buffer, size);
+ free_pages((unsigned long)buffer, get_order(size));
+#else
+ kfree(buffer);
+#endif
+}
+
static int alloc_pebs_buffer(int cpu)
{
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
@@ -278,7 +315,7 @@ static int alloc_pebs_buffer(int cpu)
if (!x86_pmu.pebs)
return 0;
- buffer = kzalloc_node(x86_pmu.pebs_buffer_size, GFP_KERNEL, node);
+ buffer = dsalloc(x86_pmu.pebs_buffer_size, GFP_KERNEL, node);
if (unlikely(!buffer))
return -ENOMEM;
@@ -289,7 +326,7 @@ static int alloc_pebs_buffer(int cpu)
if (x86_pmu.intel_cap.pebs_format < 2) {
ibuffer = kzalloc_node(PEBS_FIXUP_SIZE, GFP_KERNEL, node);
if (!ibuffer) {
- kfree(buffer);
+ dsfree(buffer, x86_pmu.pebs_buffer_size);
return -ENOMEM;
}
per_cpu(insn_buffer, cpu) = ibuffer;
@@ -315,7 +352,8 @@ static void release_pebs_buffer(int cpu)
kfree(per_cpu(insn_buffer, cpu));
per_cpu(insn_buffer, cpu) = NULL;
- kfree((void *)(unsigned long)ds->pebs_buffer_base);
+ dsfree((void *)(unsigned long)ds->pebs_buffer_base,
+ x86_pmu.pebs_buffer_size);
ds->pebs_buffer_base = 0;
}
@@ -329,7 +367,7 @@ static int alloc_bts_buffer(int cpu)
if (!x86_pmu.bts)
return 0;
- buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node);
+ buffer = dsalloc(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node);
if (unlikely(!buffer)) {
WARN_ONCE(1, "%s: BTS buffer allocation failure\n", __func__);
return -ENOMEM;
@@ -355,19 +393,15 @@ static void release_bts_buffer(int cpu)
if (!ds || !x86_pmu.bts)
return;
- kfree((void *)(unsigned long)ds->bts_buffer_base);
+ dsfree((void *)(unsigned long)ds->bts_buffer_base, BTS_BUFFER_SIZE);
ds->bts_buffer_base = 0;
}
static int alloc_ds_buffer(int cpu)
{
- int node = cpu_to_node(cpu);
- struct debug_store *ds;
-
- ds = kzalloc_node(sizeof(*ds), GFP_KERNEL, node);
- if (unlikely(!ds))
- return -ENOMEM;
+ struct debug_store *ds = per_cpu_ptr(&cpu_debug_store, cpu);
+ memset(ds, 0, sizeof(*ds));
per_cpu(cpu_hw_events, cpu).ds = ds;
return 0;
@@ -381,7 +415,6 @@ static void release_ds_buffer(int cpu)
return;
per_cpu(cpu_hw_events, cpu).ds = NULL;
- kfree(ds);
}
void release_ds_buffers(void)
diff --git a/arch/x86/include/asm/cmdline.h b/arch/x86/include/asm/cmdline.h
index e01f7f7ccb0c..84ae170bc3d0 100644
--- a/arch/x86/include/asm/cmdline.h
+++ b/arch/x86/include/asm/cmdline.h
@@ -2,5 +2,7 @@
#define _ASM_X86_CMDLINE_H
int cmdline_find_option_bool(const char *cmdline_ptr, const char *option);
+int cmdline_find_option(const char *cmdline_ptr, const char *option,
+ char *buffer, int bufsize);
#endif /* _ASM_X86_CMDLINE_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ed10b5bf9b93..454a37adb823 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -189,6 +189,7 @@
#define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */
#define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */
+#define X86_FEATURE_INVPCID_SINGLE ( 7*32+ 4) /* Effectively INVPCID && CR4.PCIDE=1 */
#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
@@ -197,6 +198,9 @@
#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */
#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */
+/* Because the ALTERNATIVE scheme is for members of the X86_FEATURE club... */
+#define X86_FEATURE_KAISER ( 7*32+31) /* CONFIG_PAGE_TABLE_ISOLATION w/o nokaiser */
+
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */
diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 12080d87da3b..2ed5a2b3f8f7 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -43,7 +43,7 @@ struct gdt_page {
struct desc_struct gdt[GDT_ENTRIES];
} __attribute__((aligned(PAGE_SIZE)));
-DECLARE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page);
+DECLARE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(struct gdt_page, gdt_page);
static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
{
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index b90e1053049b..0817d63bce41 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -178,7 +178,7 @@ extern char irq_entries_start[];
#define VECTOR_RETRIGGERED ((void *)~0UL)
typedef struct irq_desc* vector_irq_t[NR_VECTORS];
-DECLARE_PER_CPU(vector_irq_t, vector_irq);
+DECLARE_PER_CPU_USER_MAPPED(vector_irq_t, vector_irq);
#endif /* !ASSEMBLY_ */
diff --git a/arch/x86/include/asm/kaiser.h b/arch/x86/include/asm/kaiser.h
new file mode 100644
index 000000000000..802bbbdfe143
--- /dev/null
+++ b/arch/x86/include/asm/kaiser.h
@@ -0,0 +1,141 @@
+#ifndef _ASM_X86_KAISER_H
+#define _ASM_X86_KAISER_H
+
+#include <uapi/asm/processor-flags.h> /* For PCID constants */
+
+/*
+ * This file includes the definitions for the KAISER feature.
+ * KAISER is a counter measure against x86_64 side channel attacks on
+ * the kernel virtual memory. It has a shadow pgd for every process: the
+ * shadow pgd has a minimalistic kernel-set mapped, but includes the whole
+ * user memory. Within a kernel context switch, or when an interrupt is handled,
+ * the pgd is switched to the normal one. When the system switches to user mode,
+ * the shadow pgd is enabled. By this, the virtual memory caches are freed,
+ * and the user may not attack the whole kernel memory.
+ *
+ * A minimalistic kernel mapping holds the parts needed to be mapped in user
+ * mode, such as the entry/exit functions of the user space, or the stacks.
+ */
+
+#define KAISER_SHADOW_PGD_OFFSET 0x1000
+
+#ifdef __ASSEMBLY__
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+
+.macro _SWITCH_TO_KERNEL_CR3 reg
+movq %cr3, \reg
+andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
+/* If PCID enabled, set X86_CR3_PCID_NOFLUSH_BIT */
+ALTERNATIVE "", "bts $63, \reg", X86_FEATURE_PCID
+movq \reg, %cr3
+.endm
+
+.macro _SWITCH_TO_USER_CR3 reg regb
+/*
+ * regb must be the low byte portion of reg: because we have arranged
+ * for the low byte of the user PCID to serve as the high byte of NOFLUSH
+ * (0x80 for each when PCID is enabled, or 0x00 when PCID and NOFLUSH are
+ * not enabled): so that the one register can update both memory and cr3.
+ */
+movq %cr3, \reg
+orq PER_CPU_VAR(x86_cr3_pcid_user), \reg
+js 9f
+/* If PCID enabled, FLUSH this time, reset to NOFLUSH for next time */
+movb \regb, PER_CPU_VAR(x86_cr3_pcid_user+7)
+9:
+movq \reg, %cr3
+.endm
+
+.macro SWITCH_KERNEL_CR3
+ALTERNATIVE "jmp 8f", "pushq %rax", X86_FEATURE_KAISER
+_SWITCH_TO_KERNEL_CR3 %rax
+popq %rax
+8:
+.endm
+
+.macro SWITCH_USER_CR3
+ALTERNATIVE "jmp 8f", "pushq %rax", X86_FEATURE_KAISER
+_SWITCH_TO_USER_CR3 %rax %al
+popq %rax
+8:
+.endm
+
+.macro SWITCH_KERNEL_CR3_NO_STACK
+ALTERNATIVE "jmp 8f", \
+ __stringify(movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)), \
+ X86_FEATURE_KAISER
+_SWITCH_TO_KERNEL_CR3 %rax
+movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
+8:
+.endm
+
+#else /* CONFIG_PAGE_TABLE_ISOLATION */
+
+.macro SWITCH_KERNEL_CR3
+.endm
+.macro SWITCH_USER_CR3
+.endm
+.macro SWITCH_KERNEL_CR3_NO_STACK
+.endm
+
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * Upon kernel/user mode switch, it may happen that the address
+ * space has to be switched before the registers have been
+ * stored. To change the address space, another register is
+ * needed. A register therefore has to be stored/restored.
+*/
+DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+
+DECLARE_PER_CPU(unsigned long, x86_cr3_pcid_user);
+
+extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
+
+extern int kaiser_enabled;
+extern void __init kaiser_check_boottime_disable(void);
+#else
+#define kaiser_enabled 0
+static inline void __init kaiser_check_boottime_disable(void) {}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
+/*
+ * Kaiser function prototypes are needed even when CONFIG_PAGE_TABLE_ISOLATION is not set,
+ * so as to build with tests on kaiser_enabled instead of #ifdefs.
+ */
+
+/**
+ * kaiser_add_mapping - map a virtual memory part to the shadow (user) mapping
+ * @addr: the start address of the range
+ * @size: the size of the range
+ * @flags: The mapping flags of the pages
+ *
+ * The mapping is done on a global scope, so no bigger
+ * synchronization has to be done. the pages have to be
+ * manually unmapped again when they are not needed any longer.
+ */
+extern int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
+
+/**
+ * kaiser_remove_mapping - unmap a virtual memory part of the shadow mapping
+ * @addr: the start address of the range
+ * @size: the size of the range
+ */
+extern void kaiser_remove_mapping(unsigned long start, unsigned long size);
+
+/**
+ * kaiser_init - Initialize the shadow mapping
+ *
+ * Most parts of the shadow mapping can be mapped upon boot
+ * time. Only per-process things like the thread stacks
+ * or a new LDT have to be mapped at runtime. These boot-
+ * time mappings are permanent and never unmapped.
+ */
+extern void kaiser_init(void);
+
+#endif /* __ASSEMBLY */
+
+#endif /* _ASM_X86_KAISER_H */
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 437feb436efa..2536f90cd30c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -18,6 +18,12 @@
#ifndef __ASSEMBLY__
#include <asm/x86_init.h>
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+extern int kaiser_enabled;
+#else
+#define kaiser_enabled 0
+#endif
+
void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
void ptdump_walk_pgd_level_checkwx(void);
@@ -690,7 +696,17 @@ static inline pud_t *pud_offset(pgd_t *pgd, unsigned long address)
static inline int pgd_bad(pgd_t pgd)
{
- return (pgd_flags(pgd) & ~_PAGE_USER) != _KERNPG_TABLE;
+ pgdval_t ignore_flags = _PAGE_USER;
+ /*
+ * We set NX on KAISER pgds that map userspace memory so
+ * that userspace can not meaningfully use the kernel
+ * page table by accident; it will fault on the first
+ * instruction it tries to run. See native_set_pgd().
+ */
+ if (kaiser_enabled)
+ ignore_flags |= _PAGE_NX;
+
+ return (pgd_flags(pgd) & ~ignore_flags) != _KERNPG_TABLE;
}
static inline int pgd_none(pgd_t pgd)
@@ -903,7 +919,15 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
*/
static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
{
- memcpy(dst, src, count * sizeof(pgd_t));
+ memcpy(dst, src, count * sizeof(pgd_t));
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ if (kaiser_enabled) {
+ /* Clone the shadow pgd part as well */
+ memcpy(native_get_shadow_pgd(dst),
+ native_get_shadow_pgd(src),
+ count * sizeof(pgd_t));
+ }
+#endif
}
#define PTE_SHIFT ilog2(PTRS_PER_PTE)
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 1cc82ece9ac1..ce97c8c6a310 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -106,9 +106,32 @@ static inline void native_pud_clear(pud_t *pud)
native_set_pud(pud, native_make_pud(0));
}
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+extern pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd);
+
+static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
+{
+#ifdef CONFIG_DEBUG_VM
+ /* linux/mmdebug.h may not have been included at this point */
+ BUG_ON(!kaiser_enabled);
+#endif
+ return (pgd_t *)((unsigned long)pgdp | (unsigned long)PAGE_SIZE);
+}
+#else
+static inline pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ return pgd;
+}
+static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
+{
+ BUILD_BUG_ON(1);
+ return NULL;
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
- *pgdp = pgd;
+ *pgdp = kaiser_set_shadow_pgd(pgdp, pgd);
}
static inline void native_pgd_clear(pgd_t *pgd)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 8b4de22d6429..f1c8ac468292 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -119,7 +119,7 @@
#define _PAGE_DEVMAP (_AT(pteval_t, 0))
#endif
-#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
+#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
_PAGE_ACCESSED | _PAGE_DIRTY)
@@ -137,6 +137,33 @@
_PAGE_SOFT_DIRTY)
#define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
+/* The ASID is the lower 12 bits of CR3 */
+#define X86_CR3_PCID_ASID_MASK (_AC((1<<12)-1,UL))
+
+/* Mask for all the PCID-related bits in CR3: */
+#define X86_CR3_PCID_MASK (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_MASK)
+#define X86_CR3_PCID_ASID_KERN (_AC(0x0,UL))
+
+#if defined(CONFIG_PAGE_TABLE_ISOLATION) && defined(CONFIG_X86_64)
+/* Let X86_CR3_PCID_ASID_USER be usable for the X86_CR3_PCID_NOFLUSH bit */
+#define X86_CR3_PCID_ASID_USER (_AC(0x80,UL))
+
+#define X86_CR3_PCID_KERN_FLUSH (X86_CR3_PCID_ASID_KERN)
+#define X86_CR3_PCID_USER_FLUSH (X86_CR3_PCID_ASID_USER)
+#define X86_CR3_PCID_KERN_NOFLUSH (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_KERN)
+#define X86_CR3_PCID_USER_NOFLUSH (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_USER)
+#else
+#define X86_CR3_PCID_ASID_USER (_AC(0x0,UL))
+/*
+ * PCIDs are unsupported on 32-bit and none of these bits can be
+ * set in CR3:
+ */
+#define X86_CR3_PCID_KERN_FLUSH (0)
+#define X86_CR3_PCID_USER_FLUSH (0)
+#define X86_CR3_PCID_KERN_NOFLUSH (0)
+#define X86_CR3_PCID_USER_NOFLUSH (0)
+#endif
+
/*
* The cache modes defined here are used to translate between pure SW usage
* and the HW defined cache mode bits and/or PAT entries.
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 83db0eae9979..8cb52ee3ade6 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -308,7 +308,7 @@ struct tss_struct {
} ____cacheline_aligned;
-DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss);
+DECLARE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(struct tss_struct, cpu_tss);
#ifdef CONFIG_X86_32
DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 7d2ea6b1f7d9..94146f665a3c 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -132,6 +132,24 @@ static inline void cr4_set_bits_and_update_boot(unsigned long mask)
cr4_set_bits(mask);
}
+/*
+ * Declare a couple of kaiser interfaces here for convenience,
+ * to avoid the need for asm/kaiser.h in unexpected places.
+ */
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+extern int kaiser_enabled;
+extern void kaiser_setup_pcid(void);
+extern void kaiser_flush_tlb_on_return_to_user(void);
+#else
+#define kaiser_enabled 0
+static inline void kaiser_setup_pcid(void)
+{
+}
+static inline void kaiser_flush_tlb_on_return_to_user(void)
+{
+}
+#endif
+
static inline void __native_flush_tlb(void)
{
/*
@@ -140,6 +158,8 @@ static inline void __native_flush_tlb(void)
* back:
*/
preempt_disable();
+ if (kaiser_enabled)
+ kaiser_flush_tlb_on_return_to_user();
native_write_cr3(native_read_cr3());
preempt_enable();
}
@@ -149,20 +169,27 @@ static inline void __native_flush_tlb_global_irq_disabled(void)
unsigned long cr4;
cr4 = this_cpu_read(cpu_tlbstate.cr4);
- /* clear PGE */
- native_write_cr4(cr4 & ~X86_CR4_PGE);
- /* write old PGE again and flush TLBs */
- native_write_cr4(cr4);
+ if (cr4 & X86_CR4_PGE) {
+ /* clear PGE and flush TLB of all entries */
+ native_write_cr4(cr4 & ~X86_CR4_PGE);
+ /* restore PGE as it was before */
+ native_write_cr4(cr4);
+ } else {
+ /* do it with cr3, letting kaiser flush user PCID */
+ __native_flush_tlb();
+ }
}
static inline void __native_flush_tlb_global(void)
{
unsigned long flags;
- if (static_cpu_has(X86_FEATURE_INVPCID)) {
+ if (this_cpu_has(X86_FEATURE_INVPCID)) {
/*
* Using INVPCID is considerably faster than a pair of writes
* to CR4 sandwiched inside an IRQ flag save/restore.
+ *
+ * Note, this works with CR4.PCIDE=0 or 1.
*/
invpcid_flush_all();
return;
@@ -174,24 +201,45 @@ static inline void __native_flush_tlb_global(void)
* be called from deep inside debugging code.)
*/
raw_local_irq_save(flags);
-
__native_flush_tlb_global_irq_disabled();
-
raw_local_irq_restore(flags);
}
static inline void __native_flush_tlb_single(unsigned long addr)
{
- asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+ /*
+ * SIMICS #GP's if you run INVPCID with type 2/3
+ * and X86_CR4_PCIDE clear. Shame!
+ *
+ * The ASIDs used below are hard-coded. But, we must not
+ * call invpcid(type=1/2) before CR4.PCIDE=1. Just call
+ * invlpg in the case we are called early.
+ */
+
+ if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
+ if (kaiser_enabled)
+ kaiser_flush_tlb_on_return_to_user();
+ asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+ return;
+ }
+ /* Flush the address out of both PCIDs. */
+ /*
+ * An optimization here might be to determine addresses
+ * that are only kernel-mapped and only flush the kernel
+ * ASID. But, userspace flushes are probably much more
+ * important performance-wise.
+ *
+ * Make sure to do only a single invpcid when KAISER is
+ * disabled and we have only a single ASID.
+ */
+ if (kaiser_enabled)
+ invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+ invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
}
static inline void __flush_tlb_all(void)
{
- if (boot_cpu_has(X86_FEATURE_PGE))
- __flush_tlb_global();
- else
- __flush_tlb();
-
+ __flush_tlb_global();
/*
* Note: if we somehow had PCID but not PGE, then this wouldn't work --
* we'd end up flushing kernel translations for the current ASID but
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index 567de50a4c2a..6768d1321016 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -77,7 +77,8 @@
#define X86_CR3_PWT _BITUL(X86_CR3_PWT_BIT)
#define X86_CR3_PCD_BIT 4 /* Page Cache Disable */
#define X86_CR3_PCD _BITUL(X86_CR3_PCD_BIT)
-#define X86_CR3_PCID_MASK _AC(0x00000fff,UL) /* PCID Mask */
+#define X86_CR3_PCID_NOFLUSH_BIT 63 /* Preserve old PCID */
+#define X86_CR3_PCID_NOFLUSH _BITULL(X86_CR3_PCID_NOFLUSH_BIT)
/*
* Intel CPU features in CR4
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 91588be529b9..918e44772b04 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -93,7 +93,7 @@ static const struct cpu_dev default_cpu = {
static const struct cpu_dev *this_cpu = &default_cpu;
-DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
+DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(struct gdt_page, gdt_page) = { .gdt = {
#ifdef CONFIG_X86_64
/*
* We need valid kernel segments for data and code in long mode too
@@ -327,8 +327,21 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
static void setup_pcid(struct cpuinfo_x86 *c)
{
if (cpu_has(c, X86_FEATURE_PCID)) {
- if (cpu_has(c, X86_FEATURE_PGE)) {
+ if (cpu_has(c, X86_FEATURE_PGE) || kaiser_enabled) {
cr4_set_bits(X86_CR4_PCIDE);
+ /*
+ * INVPCID has two "groups" of types:
+ * 1/2: Invalidate an individual address
+ * 3/4: Invalidate all contexts
+ *
+ * 1/2 take a PCID, but 3/4 do not. So, 3/4
+ * ignore the PCID argument in the descriptor.
+ * But, we have to be careful not to call 1/2
+ * with an actual non-zero PCID in them before
+ * we do the above cr4_set_bits().
+ */
+ if (cpu_has(c, X86_FEATURE_INVPCID))
+ set_cpu_cap(c, X86_FEATURE_INVPCID_SINGLE);
} else {
/*
* flush_tlb_all(), as currently implemented, won't
@@ -341,6 +354,7 @@ static void setup_pcid(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_PCID);
}
}
+ kaiser_setup_pcid();
}
/*
@@ -1365,7 +1379,7 @@ static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
[DEBUG_STACK - 1] = DEBUG_STKSZ
};
-static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
+DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(char, exception_stacks
[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
/* May not be marked __init: used by software suspend */
@@ -1523,6 +1537,14 @@ void cpu_init(void)
* try to read it.
*/
cr4_init_shadow();
+ if (!kaiser_enabled) {
+ /*
+ * secondary_startup_64() deferred setting PGE in cr4:
+ * probe_page_size_mask() sets it on the boot cpu,
+ * but it needs to be set on each secondary cpu.
+ */
+ cr4_set_bits(X86_CR4_PGE);
+ }
/*
* Load microcode on this cpu if a valid microcode is available.
diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
index 04f89caef9c4..e33b38541be3 100644
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -41,6 +41,7 @@
#include <asm/pgalloc.h>
#include <asm/setup.h>
#include <asm/espfix.h>
+#include <asm/kaiser.h>
/*
* Note: we only need 6*8 = 48 bytes for the espfix stack, but round
@@ -126,6 +127,15 @@ void __init init_espfix_bsp(void)
/* Install the espfix pud into the kernel page directory */
pgd_p = &init_level4_pgt[pgd_index(ESPFIX_BASE_ADDR)];
pgd_populate(&init_mm, pgd_p, (pud_t *)espfix_pud_page);
+ /*
+ * Just copy the top-level PGD that is mapping the espfix
+ * area to ensure it is mapped into the shadow user page
+ * tables.
+ */
+ if (kaiser_enabled) {
+ set_pgd(native_get_shadow_pgd(pgd_p),
+ __pgd(_KERNPG_TABLE | __pa((pud_t *)espfix_pud_page)));
+ }
/* Randomize the locations */
init_espfix_random();
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index b4421cc191b0..67cd7c1b99da 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -190,8 +190,8 @@ ENTRY(secondary_startup_64)
movq $(init_level4_pgt - __START_KERNEL_map), %rax
1:
- /* Enable PAE mode and PGE */
- movl $(X86_CR4_PAE | X86_CR4_PGE), %ecx
+ /* Enable PAE and PSE, but defer PGE until kaiser_enabled is decided */
+ movl $(X86_CR4_PAE | X86_CR4_PSE), %ecx
movq %rcx, %cr4
/* Setup early boot stage 4 level pagetables. */
@@ -405,6 +405,27 @@ GLOBAL(early_recursion_flag)
.balign PAGE_SIZE; \
GLOBAL(name)
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * Each PGD needs to be 8k long and 8k aligned. We do not
+ * ever go out to userspace with these, so we do not
+ * strictly *need* the second page, but this allows us to
+ * have a single set_pgd() implementation that does not
+ * need to worry about whether it has 4k or 8k to work
+ * with.
+ *
+ * This ensures PGDs are 8k long:
+ */
+#define KAISER_USER_PGD_FILL 512
+/* This ensures they are 8k-aligned: */
+#define NEXT_PGD_PAGE(name) \
+ .balign 2 * PAGE_SIZE; \
+GLOBAL(name)
+#else
+#define NEXT_PGD_PAGE(name) NEXT_PAGE(name)
+#define KAISER_USER_PGD_FILL 0
+#endif
+
/* Automate the creation of 1 to 1 mapping pmd entries */
#define PMDS(START, PERM, COUNT) \
i = 0 ; \
@@ -414,9 +435,10 @@ GLOBAL(name)
.endr
__INITDATA
-NEXT_PAGE(early_level4_pgt)
+NEXT_PGD_PAGE(early_level4_pgt)
.fill 511,8,0
.quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .fill KAISER_USER_PGD_FILL,8,0
NEXT_PAGE(early_dynamic_pgts)
.fill 512*EARLY_DYNAMIC_PAGE_TABLES,8,0
@@ -424,16 +446,18 @@ NEXT_PAGE(early_dynamic_pgts)
.data
#ifndef CONFIG_XEN
-NEXT_PAGE(init_level4_pgt)
+NEXT_PGD_PAGE(init_level4_pgt)
.fill 512,8,0
+ .fill KAISER_USER_PGD_FILL,8,0
#else
-NEXT_PAGE(init_level4_pgt)
+NEXT_PGD_PAGE(init_level4_pgt)
.quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
.org init_level4_pgt + L4_PAGE_OFFSET*8, 0
.quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
.org init_level4_pgt + L4_START_KERNEL*8, 0
/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
.quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .fill KAISER_USER_PGD_FILL,8,0
NEXT_PAGE(level3_ident_pgt)
.quad level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
@@ -444,6 +468,7 @@ NEXT_PAGE(level2_ident_pgt)
*/
PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
#endif
+ .fill KAISER_USER_PGD_FILL,8,0
NEXT_PAGE(level3_kernel_pgt)
.fill L3_START_KERNEL,8,0
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index 1423ab1b0312..f480b38a03c3 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -51,7 +51,7 @@ static struct irqaction irq2 = {
.flags = IRQF_NO_THREAD,
};
-DEFINE_PER_CPU(vector_irq_t, vector_irq) = {
+DEFINE_PER_CPU_USER_MAPPED(vector_irq_t, vector_irq) = {
[0 ... NR_VECTORS - 1] = VECTOR_UNUSED,
};
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 5f70014ca602..8bc68cfc0d33 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/uaccess.h>
+#include <linux/kaiser.h>
#include <asm/ldt.h>
#include <asm/desc.h>
@@ -34,11 +35,21 @@ static void flush_ldt(void *current_mm)
set_ldt(pc->ldt->entries, pc->ldt->size);
}
+static void __free_ldt_struct(struct ldt_struct *ldt)
+{
+ if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
+ vfree(ldt->entries);
+ else
+ free_page((unsigned long)ldt->entries);
+ kfree(ldt);
+}
+
/* The caller must call finalize_ldt_struct on the result. LDT starts zeroed. */
static struct ldt_struct *alloc_ldt_struct(int size)
{
struct ldt_struct *new_ldt;
int alloc_size;
+ int ret;
if (size > LDT_ENTRIES)
return NULL;
@@ -66,7 +77,13 @@ static struct ldt_struct *alloc_ldt_struct(int size)
return NULL;
}
+ ret = kaiser_add_mapping((unsigned long)new_ldt->entries, alloc_size,
+ __PAGE_KERNEL);
new_ldt->size = size;
+ if (ret) {
+ __free_ldt_struct(new_ldt);
+ return NULL;
+ }
return new_ldt;
}
@@ -92,12 +109,10 @@ static void free_ldt_struct(struct ldt_struct *ldt)
if (likely(!ldt))
return;
+ kaiser_remove_mapping((unsigned long)ldt->entries,
+ ldt->size * LDT_ENTRY_SIZE);
paravirt_free_ldt(ldt->entries, ldt->size);
- if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
- vfree(ldt->entries);
- else
- free_page((unsigned long)ldt->entries);
- kfree(ldt);
+ __free_ldt_struct(ldt);
}
/*
diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c
index bb3840cedb4f..ee43b36075c7 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -9,7 +9,6 @@ DEF_NATIVE(pv_irq_ops, save_fl, "pushfq; popq %rax");
DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax");
DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax");
DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3");
-DEF_NATIVE(pv_mmu_ops, flush_tlb_single, "invlpg (%rdi)");
DEF_NATIVE(pv_cpu_ops, clts, "clts");
DEF_NATIVE(pv_cpu_ops, wbinvd, "wbinvd");
@@ -59,7 +58,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
PATCH_SITE(pv_mmu_ops, read_cr3);
PATCH_SITE(pv_mmu_ops, write_cr3);
PATCH_SITE(pv_cpu_ops, clts);
- PATCH_SITE(pv_mmu_ops, flush_tlb_single);
PATCH_SITE(pv_cpu_ops, wbinvd);
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock):
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 8e10e72bf6ee..a55b32007785 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -41,7 +41,7 @@
* section. Since TSS's are completely CPU-local, we want them
* on exact cacheline boundaries, to eliminate cacheline ping-pong.
*/
-__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
+__visible DEFINE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(struct tss_struct, cpu_tss) = {
.x86_tss = {
.sp0 = TOP_OF_INIT_STACK,
#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index feaab07fa124..6b55012d02a3 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -114,6 +114,7 @@
#include <asm/microcode.h>
#include <asm/mmu_context.h>
#include <asm/kaslr.h>
+#include <asm/kaiser.h>
/*
* max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -1019,6 +1020,12 @@ void __init setup_arch(char **cmdline_p)
*/
init_hypervisor_platform();
+ /*
+ * This needs to happen right after XENPV is set on xen and
+ * kaiser_enabled is checked below in cleanup_highmap().
+ */
+ kaiser_check_boottime_disable();
+
x86_init.resources.probe_roms();
/* after parse_early_param, so could debug it */
diff --git a/arch/x86/kernel/tracepoint.c b/arch/x86/kernel/tracepoint.c
index 1c113db9ed57..2bb5ee464df3 100644
--- a/arch/x86/kernel/tracepoint.c
+++ b/arch/x86/kernel/tracepoint.c
@@ -9,10 +9,12 @@
#include <linux/atomic.h>
atomic_t trace_idt_ctr = ATOMIC_INIT(0);
+__aligned(PAGE_SIZE)
struct desc_ptr trace_idt_descr = { NR_VECTORS * 16 - 1,
(unsigned long) trace_idt_table };
/* No need to be aligned, but done to keep all IDTs defined the same way. */
+__aligned(PAGE_SIZE)
gate_desc trace_idt_table[NR_VECTORS] __page_aligned_bss;
static int trace_irq_vector_refcount;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7e28e6c877d9..73304b1a03cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -773,7 +773,8 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
return 1;
/* PCID can not be enabled when cr3[11:0]!=000H or EFER.LMA=0 */
- if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_MASK) || !is_long_mode(vcpu))
+ if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_ASID_MASK) ||
+ !is_long_mode(vcpu))
return 1;
}
diff --git a/arch/x86/lib/cmdline.c b/arch/x86/lib/cmdline.c
index 5cc78bf57232..3261abb21ef4 100644
--- a/arch/x86/lib/cmdline.c
+++ b/arch/x86/lib/cmdline.c
@@ -104,7 +104,112 @@ __cmdline_find_option_bool(const char *cmdline, int max_cmdline_size,
return 0; /* Buffer overrun */
}
+/*
+ * Find a non-boolean option (i.e. option=argument). In accordance with
+ * standard Linux practice, if this option is repeated, this returns the
+ * last instance on the command line.
+ *
+ * @cmdline: the cmdline string
+ * @max_cmdline_size: the maximum size of cmdline
+ * @option: option string to look for
+ * @buffer: memory buffer to return the option argument
+ * @bufsize: size of the supplied memory buffer
+ *
+ * Returns the length of the argument (regardless of if it was
+ * truncated to fit in the buffer), or -1 on not found.
+ */
+static int
+__cmdline_find_option(const char *cmdline, int max_cmdline_size,
+ const char *option, char *buffer, int bufsize)
+{
+ char c;
+ int pos = 0, len = -1;
+ const char *opptr = NULL;
+ char *bufptr = buffer;
+ enum {
+ st_wordstart = 0, /* Start of word/after whitespace */
+ st_wordcmp, /* Comparing this word */
+ st_wordskip, /* Miscompare, skip */
+ st_bufcpy, /* Copying this to buffer */
+ } state = st_wordstart;
+
+ if (!cmdline)
+ return -1; /* No command line */
+
+ /*
+ * This 'pos' check ensures we do not overrun
+ * a non-NULL-terminated 'cmdline'
+ */
+ while (pos++ < max_cmdline_size) {
+ c = *(char *)cmdline++;
+ if (!c)
+ break;
+
+ switch (state) {
+ case st_wordstart:
+ if (myisspace(c))
+ break;
+
+ state = st_wordcmp;
+ opptr = option;
+ /* fall through */
+
+ case st_wordcmp:
+ if ((c == '=') && !*opptr) {
+ /*
+ * We matched all the way to the end of the
+ * option we were looking for, prepare to
+ * copy the argument.
+ */
+ len = 0;
+ bufptr = buffer;
+ state = st_bufcpy;
+ break;
+ } else if (c == *opptr++) {
+ /*
+ * We are currently matching, so continue
+ * to the next character on the cmdline.
+ */
+ break;
+ }
+ state = st_wordskip;
+ /* fall through */
+
+ case st_wordskip:
+ if (myisspace(c))
+ state = st_wordstart;
+ break;
+
+ case st_bufcpy:
+ if (myisspace(c)) {
+ state = st_wordstart;
+ } else {
+ /*
+ * Increment len, but don't overrun the
+ * supplied buffer and leave room for the
+ * NULL terminator.
+ */
+ if (++len < bufsize)
+ *bufptr++ = c;
+ }
+ break;
+ }
+ }
+
+ if (bufsize)
+ *bufptr = '\0';
+
+ return len;
+}
+
int cmdline_find_option_bool(const char *cmdline, const char *option)
{
return __cmdline_find_option_bool(cmdline, COMMAND_LINE_SIZE, option);
}
+
+int cmdline_find_option(const char *cmdline, const char *option, char *buffer,
+ int bufsize)
+{
+ return __cmdline_find_option(cmdline, COMMAND_LINE_SIZE, option,
+ buffer, bufsize);
+}
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 96d2b847e09e..c548b46100cb 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -37,5 +37,5 @@ obj-$(CONFIG_NUMA_EMU) += numa_emulation.o
obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
-obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
-
+obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
+obj-$(CONFIG_PAGE_TABLE_ISOLATION) += kaiser.o
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 0381638168d1..1e779bca4f3e 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -177,7 +177,7 @@ static void __init probe_page_size_mask(void)
cr4_set_bits_and_update_boot(X86_CR4_PSE);
/* Enable PGE if available */
- if (boot_cpu_has(X86_FEATURE_PGE)) {
+ if (boot_cpu_has(X86_FEATURE_PGE) && !kaiser_enabled) {
cr4_set_bits_and_update_boot(X86_CR4_PGE);
__supported_pte_mask |= _PAGE_GLOBAL;
} else
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3e27ded6ac65..7df8e3a79dc0 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -324,6 +324,16 @@ void __init cleanup_highmap(void)
continue;
if (vaddr < (unsigned long) _text || vaddr > end)
set_pmd(pmd, __pmd(0));
+ else if (kaiser_enabled) {
+ /*
+ * level2_kernel_pgt is initialized with _PAGE_GLOBAL:
+ * clear that now. This is not important, so long as
+ * CR4.PGE remains clear, but it removes an anomaly.
+ * Physical mapping setup below avoids _PAGE_GLOBAL
+ * by use of massage_pgprot() inside pfn_pte() etc.
+ */
+ set_pmd(pmd, pmd_clear_flags(*pmd, _PAGE_GLOBAL));
+ }
}
}
diff --git a/arch/x86/mm/kaiser.c b/arch/x86/mm/kaiser.c
new file mode 100644
index 000000000000..d8376b4ad9f0
--- /dev/null
+++ b/arch/x86/mm/kaiser.c
@@ -0,0 +1,455 @@
+#include <linux/bug.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/bug.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/spinlock.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+
+#undef pr_fmt
+#define pr_fmt(fmt) "Kernel/User page tables isolation: " fmt
+
+#include <asm/kaiser.h>
+#include <asm/tlbflush.h> /* to verify its kaiser declarations */
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/desc.h>
+#include <asm/cmdline.h>
+
+int kaiser_enabled __read_mostly = 1;
+EXPORT_SYMBOL(kaiser_enabled); /* for inlined TLB flush functions */
+
+__visible
+DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+
+/*
+ * These can have bit 63 set, so we can not just use a plain "or"
+ * instruction to get their value or'd into CR3. It would take
+ * another register. So, we use a memory reference to these instead.
+ *
+ * This is also handy because systems that do not support PCIDs
+ * just end up or'ing a 0 into their CR3, which does no harm.
+ */
+DEFINE_PER_CPU(unsigned long, x86_cr3_pcid_user);
+
+/*
+ * At runtime, the only things we map are some things for CPU
+ * hotplug, and stacks for new processes. No two CPUs will ever
+ * be populating the same addresses, so we only need to ensure
+ * that we protect between two CPUs trying to allocate and
+ * populate the same page table page.
+ *
+ * Only take this lock when doing a set_p[4um]d(), but it is not
+ * needed for doing a set_pte(). We assume that only the *owner*
+ * of a given allocation will be doing this for _their_
+ * allocation.
+ *
+ * This ensures that once a system has been running for a while
+ * and there have been stacks all over and these page tables
+ * are fully populated, there will be no further acquisitions of
+ * this lock.
+ */
+static DEFINE_SPINLOCK(shadow_table_allocation_lock);
+
+/*
+ * Returns -1 on error.
+ */
+static inline unsigned long get_pa_from_mapping(unsigned long vaddr)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ pgd = pgd_offset_k(vaddr);
+ /*
+ * We made all the kernel PGDs present in kaiser_init().
+ * We expect them to stay that way.
+ */
+ BUG_ON(pgd_none(*pgd));
+ /*
+ * PGDs are either 512GB or 128TB on all x86_64
+ * configurations. We don't handle these.
+ */
+ BUG_ON(pgd_large(*pgd));
+
+ pud = pud_offset(pgd, vaddr);
+ if (pud_none(*pud)) {
+ WARN_ON_ONCE(1);
+ return -1;
+ }
+
+ if (pud_large(*pud))
+ return (pud_pfn(*pud) << PAGE_SHIFT) | (vaddr & ~PUD_PAGE_MASK);
+
+ pmd = pmd_offset(pud, vaddr);
+ if (pmd_none(*pmd)) {
+ WARN_ON_ONCE(1);
+ return -1;
+ }
+
+ if (pmd_large(*pmd))
+ return (pmd_pfn(*pmd) << PAGE_SHIFT) | (vaddr & ~PMD_PAGE_MASK);
+
+ pte = pte_offset_kernel(pmd, vaddr);
+ if (pte_none(*pte)) {
+ WARN_ON_ONCE(1);
+ return -1;
+ }
+
+ return (pte_pfn(*pte) << PAGE_SHIFT) | (vaddr & ~PAGE_MASK);
+}
+
+/*
+ * This is a relatively normal page table walk, except that it
+ * also tries to allocate page tables pages along the way.
+ *
+ * Returns a pointer to a PTE on success, or NULL on failure.
+ */
+static pte_t *kaiser_pagetable_walk(unsigned long address)
+{
+ pmd_t *pmd;
+ pud_t *pud;
+ pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(address));
+ gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+
+ if (pgd_none(*pgd)) {
+ WARN_ONCE(1, "All shadow pgds should have been populated");
+ return NULL;
+ }
+ BUILD_BUG_ON(pgd_large(*pgd) != 0);
+
+ pud = pud_offset(pgd, address);
+ /* The shadow page tables do not use large mappings: */
+ if (pud_large(*pud)) {
+ WARN_ON(1);
+ return NULL;
+ }
+ if (pud_none(*pud)) {
+ unsigned long new_pmd_page = __get_free_page(gfp);
+ if (!new_pmd_page)
+ return NULL;
+ spin_lock(&shadow_table_allocation_lock);
+ if (pud_none(*pud)) {
+ set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
+ __inc_zone_page_state(virt_to_page((void *)
+ new_pmd_page), NR_KAISERTABLE);
+ } else
+ free_page(new_pmd_page);
+ spin_unlock(&shadow_table_allocation_lock);
+ }
+
+ pmd = pmd_offset(pud, address);
+ /* The shadow page tables do not use large mappings: */
+ if (pmd_large(*pmd)) {
+ WARN_ON(1);
+ return NULL;
+ }
+ if (pmd_none(*pmd)) {
+ unsigned long new_pte_page = __get_free_page(gfp);
+ if (!new_pte_page)
+ return NULL;
+ spin_lock(&shadow_table_allocation_lock);
+ if (pmd_none(*pmd)) {
+ set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
+ __inc_zone_page_state(virt_to_page((void *)
+ new_pte_page), NR_KAISERTABLE);
+ } else
+ free_page(new_pte_page);
+ spin_unlock(&shadow_table_allocation_lock);
+ }
+
+ return pte_offset_kernel(pmd, address);
+}
+
+static int kaiser_add_user_map(const void *__start_addr, unsigned long size,
+ unsigned long flags)
+{
+ int ret = 0;
+ pte_t *pte;
+ unsigned long start_addr = (unsigned long )__start_addr;
+ unsigned long address = start_addr & PAGE_MASK;
+ unsigned long end_addr = PAGE_ALIGN(start_addr + size);
+ unsigned long target_address;
+
+ /*
+ * It is convenient for callers to pass in __PAGE_KERNEL etc,
+ * and there is no actual harm from setting _PAGE_GLOBAL, so
+ * long as CR4.PGE is not set. But it is nonetheless troubling
+ * to see Kaiser itself setting _PAGE_GLOBAL (now that "nokaiser"
+ * requires that not to be #defined to 0): so mask it off here.
+ */
+ flags &= ~_PAGE_GLOBAL;
+
+ for (; address < end_addr; address += PAGE_SIZE) {
+ target_address = get_pa_from_mapping(address);
+ if (target_address == -1) {
+ ret = -EIO;
+ break;
+ }
+ pte = kaiser_pagetable_walk(address);
+ if (!pte) {
+ ret = -ENOMEM;
+ break;
+ }
+ if (pte_none(*pte)) {
+ set_pte(pte, __pte(flags | target_address));
+ } else {
+ pte_t tmp;
+ set_pte(&tmp, __pte(flags | target_address));
+ WARN_ON_ONCE(!pte_same(*pte, tmp));
+ }
+ }
+ return ret;
+}
+
+static int kaiser_add_user_map_ptrs(const void *start, const void *end, unsigned long flags)
+{
+ unsigned long size = end - start;
+
+ return kaiser_add_user_map(start, size, flags);
+}
+
+/*
+ * Ensure that the top level of the (shadow) page tables are
+ * entirely populated. This ensures that all processes that get
+ * forked have the same entries. This way, we do not have to
+ * ever go set up new entries in older processes.
+ *
+ * Note: we never free these, so there are no updates to them
+ * after this.
+ */
+static void __init kaiser_init_all_pgds(void)
+{
+ pgd_t *pgd;
+ int i = 0;
+
+ pgd = native_get_shadow_pgd(pgd_offset_k((unsigned long )0));
+ for (i = PTRS_PER_PGD / 2; i < PTRS_PER_PGD; i++) {
+ pgd_t new_pgd;
+ pud_t *pud = pud_alloc_one(&init_mm,
+ PAGE_OFFSET + i * PGDIR_SIZE);
+ if (!pud) {
+ WARN_ON(1);
+ break;
+ }
+ inc_zone_page_state(virt_to_page(pud), NR_KAISERTABLE);
+ new_pgd = __pgd(_KERNPG_TABLE |__pa(pud));
+ /*
+ * Make sure not to stomp on some other pgd entry.
+ */
+ if (!pgd_none(pgd[i])) {
+ WARN_ON(1);
+ continue;
+ }
+ set_pgd(pgd + i, new_pgd);
+ }
+}
+
+#define kaiser_add_user_map_early(start, size, flags) do { \
+ int __ret = kaiser_add_user_map(start, size, flags); \
+ WARN_ON(__ret); \
+} while (0)
+
+#define kaiser_add_user_map_ptrs_early(start, end, flags) do { \
+ int __ret = kaiser_add_user_map_ptrs(start, end, flags); \
+ WARN_ON(__ret); \
+} while (0)
+
+void __init kaiser_check_boottime_disable(void)
+{
+ bool enable = true;
+ char arg[5];
+ int ret;
+
+ if (boot_cpu_has(X86_FEATURE_XENPV))
+ goto silent_disable;
+
+ ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg));
+ if (ret > 0) {
+ if (!strncmp(arg, "on", 2))
+ goto enable;
+
+ if (!strncmp(arg, "off", 3))
+ goto disable;
+
+ if (!strncmp(arg, "auto", 4))
+ goto skip;
+ }
+
+ if (cmdline_find_option_bool(boot_command_line, "nopti"))
+ goto disable;
+
+skip:
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+ goto disable;
+
+enable:
+ if (enable)
+ setup_force_cpu_cap(X86_FEATURE_KAISER);
+
+ return;
+
+disable:
+ pr_info("disabled\n");
+
+silent_disable:
+ kaiser_enabled = 0;
+ setup_clear_cpu_cap(X86_FEATURE_KAISER);
+}
+
+/*
+ * If anything in here fails, we will likely die on one of the
+ * first kernel->user transitions and init will die. But, we
+ * will have most of the kernel up by then and should be able to
+ * get a clean warning out of it. If we BUG_ON() here, we run
+ * the risk of being before we have good console output.
+ */
+void __init kaiser_init(void)
+{
+ int cpu;
+
+ if (!kaiser_enabled)
+ return;
+
+ kaiser_init_all_pgds();
+
+ for_each_possible_cpu(cpu) {
+ void *percpu_vaddr = __per_cpu_user_mapped_start +
+ per_cpu_offset(cpu);
+ unsigned long percpu_sz = __per_cpu_user_mapped_end -
+ __per_cpu_user_mapped_start;
+ kaiser_add_user_map_early(percpu_vaddr, percpu_sz,
+ __PAGE_KERNEL);
+ }
+
+ /*
+ * Map the entry/exit text section, which is needed at
+ * switches from user to and from kernel.
+ */
+ kaiser_add_user_map_ptrs_early(__entry_text_start, __entry_text_end,
+ __PAGE_KERNEL_RX);
+
+#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN)
+ kaiser_add_user_map_ptrs_early(__irqentry_text_start,
+ __irqentry_text_end,
+ __PAGE_KERNEL_RX);
+#endif
+ kaiser_add_user_map_early((void *)idt_descr.address,
+ sizeof(gate_desc) * NR_VECTORS,
+ __PAGE_KERNEL_RO);
+#ifdef CONFIG_TRACING
+ kaiser_add_user_map_early(&trace_idt_descr,
+ sizeof(trace_idt_descr),
+ __PAGE_KERNEL);
+ kaiser_add_user_map_early(&trace_idt_table,
+ sizeof(gate_desc) * NR_VECTORS,
+ __PAGE_KERNEL);
+#endif
+ kaiser_add_user_map_early(&debug_idt_descr, sizeof(debug_idt_descr),
+ __PAGE_KERNEL);
+ kaiser_add_user_map_early(&debug_idt_table,
+ sizeof(gate_desc) * NR_VECTORS,
+ __PAGE_KERNEL);
+
+ pr_info("enabled\n");
+}
+
+/* Add a mapping to the shadow mapping, and synchronize the mappings */
+int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
+{
+ if (!kaiser_enabled)
+ return 0;
+ return kaiser_add_user_map((const void *)addr, size, flags);
+}
+
+void kaiser_remove_mapping(unsigned long start, unsigned long size)
+{
+ extern void unmap_pud_range_nofree(pgd_t *pgd,
+ unsigned long start, unsigned long end);
+ unsigned long end = start + size;
+ unsigned long addr, next;
+ pgd_t *pgd;
+
+ if (!kaiser_enabled)
+ return;
+ pgd = native_get_shadow_pgd(pgd_offset_k(start));
+ for (addr = start; addr < end; pgd++, addr = next) {
+ next = pgd_addr_end(addr, end);
+ unmap_pud_range_nofree(pgd, addr, next);
+ }
+}
+
+/*
+ * Page table pages are page-aligned. The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ * This returns true for user pages that need to get copied into
+ * both the user and kernel copies of the page tables, and false
+ * for kernel pages that should only be in the kernel copy.
+ */
+static inline bool is_userspace_pgd(pgd_t *pgdp)
+{
+ return ((unsigned long)pgdp % PAGE_SIZE) < (PAGE_SIZE / 2);
+}
+
+pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ if (!kaiser_enabled)
+ return pgd;
+ /*
+ * Do we need to also populate the shadow pgd? Check _PAGE_USER to
+ * skip cases like kexec and EFI which make temporary low mappings.
+ */
+ if (pgd.pgd & _PAGE_USER) {
+ if (is_userspace_pgd(pgdp)) {
+ native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+ /*
+ * Even if the entry is *mapping* userspace, ensure
+ * that userspace can not use it. This way, if we
+ * get out to userspace running on the kernel CR3,
+ * userspace will crash instead of running.
+ */
+ if (__supported_pte_mask & _PAGE_NX)
+ pgd.pgd |= _PAGE_NX;
+ }
+ } else if (!pgd.pgd) {
+ /*
+ * pgd_clear() cannot check _PAGE_USER, and is even used to
+ * clear corrupted pgd entries: so just rely on cases like
+ * kexec and EFI never to be using pgd_clear().
+ */
+ if (!WARN_ON_ONCE((unsigned long)pgdp & PAGE_SIZE) &&
+ is_userspace_pgd(pgdp))
+ native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+ }
+ return pgd;
+}
+
+void kaiser_setup_pcid(void)
+{
+ unsigned long user_cr3 = KAISER_SHADOW_PGD_OFFSET;
+
+ if (this_cpu_has(X86_FEATURE_PCID))
+ user_cr3 |= X86_CR3_PCID_USER_NOFLUSH;
+ /*
+ * These variables are used by the entry/exit
+ * code to change PCID and pgd and TLB flushing.
+ */
+ this_cpu_write(x86_cr3_pcid_user, user_cr3);
+}
+
+/*
+ * Make a note that this cpu will need to flush USER tlb on return to user.
+ * If cpu does not have PCID, then the NOFLUSH bit will never have been set.
+ */
+void kaiser_flush_tlb_on_return_to_user(void)
+{
+ if (this_cpu_has(X86_FEATURE_PCID))
+ this_cpu_write(x86_cr3_pcid_user,
+ X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
+}
+EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index aed206475aa7..319183d93602 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -189,6 +189,6 @@ void __meminit init_trampoline(void)
*pud_tramp = *pud;
}
- set_pgd(&trampoline_pgd_entry,
- __pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));
+ /* Avoid set_pgd(), in case it's complicated by CONFIG_PAGE_TABLE_ISOLATION */
+ trampoline_pgd_entry = __pgd(_KERNPG_TABLE | __pa(pud_page_tramp));
}
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index e3353c97d086..73dcb0e18c1b 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -52,6 +52,7 @@ static DEFINE_SPINLOCK(cpa_lock);
#define CPA_FLUSHTLB 1
#define CPA_ARRAY 2
#define CPA_PAGES_ARRAY 4
+#define CPA_FREE_PAGETABLES 8
#ifdef CONFIG_PROC_FS
static unsigned long direct_pages_count[PG_LEVEL_NUM];
@@ -729,10 +730,13 @@ static int split_large_page(struct cpa_data *cpa, pte_t *kpte,
return 0;
}
-static bool try_to_free_pte_page(pte_t *pte)
+static bool try_to_free_pte_page(struct cpa_data *cpa, pte_t *pte)
{
int i;
+ if (!(cpa->flags & CPA_FREE_PAGETABLES))
+ return false;
+
for (i = 0; i < PTRS_PER_PTE; i++)
if (!pte_none(pte[i]))
return false;
@@ -741,10 +745,13 @@ static bool try_to_free_pte_page(pte_t *pte)
return true;
}
-static bool try_to_free_pmd_page(pmd_t *pmd)
+static bool try_to_free_pmd_page(struct cpa_data *cpa, pmd_t *pmd)
{
int i;
+ if (!(cpa->flags & CPA_FREE_PAGETABLES))
+ return false;
+
for (i = 0; i < PTRS_PER_PMD; i++)
if (!pmd_none(pmd[i]))
return false;
@@ -753,7 +760,9 @@ static bool try_to_free_pmd_page(pmd_t *pmd)
return true;
}
-static bool unmap_pte_range(pmd_t *pmd, unsigned long start, unsigned long end)
+static bool unmap_pte_range(struct cpa_data *cpa, pmd_t *pmd,
+ unsigned long start,
+ unsigned long end)
{
pte_t *pte = pte_offset_kernel(pmd, start);
@@ -764,22 +773,23 @@ static bool unmap_pte_range(pmd_t *pmd, unsigned long start, unsigned long end)
pte++;
}
- if (try_to_free_pte_page((pte_t *)pmd_page_vaddr(*pmd))) {
+ if (try_to_free_pte_page(cpa, (pte_t *)pmd_page_vaddr(*pmd))) {
pmd_clear(pmd);
return true;
}
return false;
}
-static void __unmap_pmd_range(pud_t *pud, pmd_t *pmd,
+static void __unmap_pmd_range(struct cpa_data *cpa, pud_t *pud, pmd_t *pmd,
unsigned long start, unsigned long end)
{
- if (unmap_pte_range(pmd, start, end))
- if (try_to_free_pmd_page((pmd_t *)pud_page_vaddr(*pud)))
+ if (unmap_pte_range(cpa, pmd, start, end))
+ if (try_to_free_pmd_page(cpa, (pmd_t *)pud_page_vaddr(*pud)))
pud_clear(pud);
}
-static void unmap_pmd_range(pud_t *pud, unsigned long start, unsigned long end)
+static void unmap_pmd_range(struct cpa_data *cpa, pud_t *pud,
+ unsigned long start, unsigned long end)
{
pmd_t *pmd = pmd_offset(pud, start);
@@ -790,7 +800,7 @@ static void unmap_pmd_range(pud_t *pud, unsigned long start, unsigned long end)
unsigned long next_page = (start + PMD_SIZE) & PMD_MASK;
unsigned long pre_end = min_t(unsigned long, end, next_page);
- __unmap_pmd_range(pud, pmd, start, pre_end);
+ __unmap_pmd_range(cpa, pud, pmd, start, pre_end);
start = pre_end;
pmd++;
@@ -803,7 +813,8 @@ static void unmap_pmd_range(pud_t *pud, unsigned long start, unsigned long end)
if (pmd_large(*pmd))
pmd_clear(pmd);
else
- __unmap_pmd_range(pud, pmd, start, start + PMD_SIZE);
+ __unmap_pmd_range(cpa, pud, pmd,
+ start, start + PMD_SIZE);
start += PMD_SIZE;
pmd++;
@@ -813,17 +824,19 @@ static void unmap_pmd_range(pud_t *pud, unsigned long start, unsigned long end)
* 4K leftovers?
*/
if (start < end)
- return __unmap_pmd_range(pud, pmd, start, end);
+ return __unmap_pmd_range(cpa, pud, pmd, start, end);
/*
* Try again to free the PMD page if haven't succeeded above.
*/
if (!pud_none(*pud))
- if (try_to_free_pmd_page((pmd_t *)pud_page_vaddr(*pud)))
+ if (try_to_free_pmd_page(cpa, (pmd_t *)pud_page_vaddr(*pud)))
pud_clear(pud);
}
-static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
+static void __unmap_pud_range(struct cpa_data *cpa, pgd_t *pgd,
+ unsigned long start,
+ unsigned long end)
{
pud_t *pud = pud_offset(pgd, start);
@@ -834,7 +847,7 @@ static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
unsigned long next_page = (start + PUD_SIZE) & PUD_MASK;
unsigned long pre_end = min_t(unsigned long, end, next_page);
- unmap_pmd_range(pud, start, pre_end);
+ unmap_pmd_range(cpa, pud, start, pre_end);
start = pre_end;
pud++;
@@ -848,7 +861,7 @@ static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
if (pud_large(*pud))
pud_clear(pud);
else
- unmap_pmd_range(pud, start, start + PUD_SIZE);
+ unmap_pmd_range(cpa, pud, start, start + PUD_SIZE);
start += PUD_SIZE;
pud++;
@@ -858,7 +871,7 @@ static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
* 2M leftovers?
*/
if (start < end)
- unmap_pmd_range(pud, start, end);
+ unmap_pmd_range(cpa, pud, start, end);
/*
* No need to try to free the PUD page because we'll free it in
@@ -866,6 +879,24 @@ static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
*/
}
+static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
+{
+ struct cpa_data cpa = {
+ .flags = CPA_FREE_PAGETABLES,
+ };
+
+ __unmap_pud_range(&cpa, pgd, start, end);
+}
+
+void unmap_pud_range_nofree(pgd_t *pgd, unsigned long start, unsigned long end)
+{
+ struct cpa_data cpa = {
+ .flags = 0,
+ };
+
+ __unmap_pud_range(&cpa, pgd, start, end);
+}
+
static int alloc_pte_page(pmd_t *pmd)
{
pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL | __GFP_NOTRACK);
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 3feec5af4e67..5aaec8effc5f 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -344,14 +344,22 @@ static inline void _pgd_free(pgd_t *pgd)
kmem_cache_free(pgd_cache, pgd);
}
#else
+
+/*
+ * Instead of one pgd, Kaiser acquires two pgds. Being order-1, it is
+ * both 8k in size and 8k-aligned. That lets us just flip bit 12
+ * in a pointer to swap between the two 4k halves.
+ */
+#define PGD_ALLOCATION_ORDER kaiser_enabled
+
static inline pgd_t *_pgd_alloc(void)
{
- return (pgd_t *)__get_free_page(PGALLOC_GFP);
+ return (pgd_t *)__get_free_pages(PGALLOC_GFP, PGD_ALLOCATION_ORDER);
}
static inline void _pgd_free(pgd_t *pgd)
{
- free_page((unsigned long)pgd);
+ free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER);
}
#endif /* CONFIG_X86_PAE */
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 53b72fb4e781..41205de487e7 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -6,13 +6,14 @@
#include <linux/interrupt.h>
#include <linux/export.h>
#include <linux/cpu.h>
+#include <linux/debugfs.h>
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
#include <asm/cache.h>
#include <asm/apic.h>
#include <asm/uv/uv.h>
-#include <linux/debugfs.h>
+#include <asm/kaiser.h>
/*
* TLB flushing, formerly SMP-only
@@ -34,6 +35,36 @@ struct flush_tlb_info {
unsigned long flush_end;
};
+static void load_new_mm_cr3(pgd_t *pgdir)
+{
+ unsigned long new_mm_cr3 = __pa(pgdir);
+
+ if (kaiser_enabled) {
+ /*
+ * We reuse the same PCID for different tasks, so we must
+ * flush all the entries for the PCID out when we change tasks.
+ * Flush KERN below, flush USER when returning to userspace in
+ * kaiser's SWITCH_USER_CR3 (_SWITCH_TO_USER_CR3) macro.
+ *
+ * invpcid_flush_single_context(X86_CR3_PCID_ASID_USER) could
+ * do it here, but can only be used if X86_FEATURE_INVPCID is
+ * available - and many machines support pcid without invpcid.
+ *
+ * If X86_CR3_PCID_KERN_FLUSH actually added something, then it
+ * would be needed in the write_cr3() below - if PCIDs enabled.
+ */
+ BUILD_BUG_ON(X86_CR3_PCID_KERN_FLUSH);
+ kaiser_flush_tlb_on_return_to_user();
+ }
+
+ /*
+ * Caution: many callers of this function expect
+ * that load_cr3() is serializing and orders TLB
+ * fills with respect to the mm_cpumask writes.
+ */
+ write_cr3(new_mm_cr3);
+}
+
/*
* We cannot call mmdrop() because we are in interrupt context,
* instead update mm->cpu_vm_mask.
@@ -45,7 +76,7 @@ void leave_mm(int cpu)
BUG();
if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
- load_cr3(swapper_pg_dir);
+ load_new_mm_cr3(swapper_pg_dir);
/*
* This gets called in the idle path where RCU
* functions differently. Tracing normally
@@ -120,7 +151,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
* ordering guarantee we need.
*
*/
- load_cr3(next->pgd);
+ load_new_mm_cr3(next->pgd);
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
@@ -167,7 +198,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
* As above, load_cr3() is serializing and orders TLB
* fills with respect to the mm_cpumask write.
*/
- load_cr3(next->pgd);
+ load_new_mm_cr3(next->pgd);
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
load_mm_cr4(next);
load_mm_ldt(next);
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index dc81e5287ebf..2e6000a4eb2c 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -778,7 +778,14 @@
*/
#define PERCPU_INPUT(cacheline) \
VMLINUX_SYMBOL(__per_cpu_start) = .; \
+ VMLINUX_SYMBOL(__per_cpu_user_mapped_start) = .; \
*(.data..percpu..first) \
+ . = ALIGN(cacheline); \
+ *(.data..percpu..user_mapped) \
+ *(.data..percpu..user_mapped..shared_aligned) \
+ . = ALIGN(PAGE_SIZE); \
+ *(.data..percpu..user_mapped..page_aligned) \
+ VMLINUX_SYMBOL(__per_cpu_user_mapped_end) = .; \
. = ALIGN(PAGE_SIZE); \
*(.data..percpu..page_aligned) \
. = ALIGN(cacheline); \
diff --git a/include/linux/kaiser.h b/include/linux/kaiser.h
new file mode 100644
index 000000000000..58c55b1589d0
--- /dev/null
+++ b/include/linux/kaiser.h
@@ -0,0 +1,52 @@
+#ifndef _LINUX_KAISER_H
+#define _LINUX_KAISER_H
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+#include <asm/kaiser.h>
+
+static inline int kaiser_map_thread_stack(void *stack)
+{
+ /*
+ * Map that page of kernel stack on which we enter from user context.
+ */
+ return kaiser_add_mapping((unsigned long)stack +
+ THREAD_SIZE - PAGE_SIZE, PAGE_SIZE, __PAGE_KERNEL);
+}
+
+static inline void kaiser_unmap_thread_stack(void *stack)
+{
+ /*
+ * Note: may be called even when kaiser_map_thread_stack() failed.
+ */
+ kaiser_remove_mapping((unsigned long)stack +
+ THREAD_SIZE - PAGE_SIZE, PAGE_SIZE);
+}
+#else
+
+/*
+ * These stubs are used whenever CONFIG_PAGE_TABLE_ISOLATION is off, which
+ * includes architectures that support KAISER, but have it disabled.
+ */
+
+static inline void kaiser_init(void)
+{
+}
+static inline int kaiser_add_mapping(unsigned long addr,
+ unsigned long size, unsigned long flags)
+{
+ return 0;
+}
+static inline void kaiser_remove_mapping(unsigned long start,
+ unsigned long size)
+{
+}
+static inline int kaiser_map_thread_stack(void *stack)
+{
+ return 0;
+}
+static inline void kaiser_unmap_thread_stack(void *stack)
+{
+}
+
+#endif /* !CONFIG_PAGE_TABLE_ISOLATION */
+#endif /* _LINUX_KAISER_H */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fff21a82780c..490f5a83f947 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -124,8 +124,9 @@ enum zone_stat_item {
NR_SLAB_UNRECLAIMABLE,
NR_PAGETABLE, /* used for pagetables */
NR_KERNEL_STACK_KB, /* measured in KiB */
- /* Second 128 byte cacheline */
+ NR_KAISERTABLE,
NR_BOUNCE,
+ /* Second 128 byte cacheline */
#if IS_ENABLED(CONFIG_ZSMALLOC)
NR_ZSPAGES, /* allocated in zsmalloc */
#endif
diff --git a/include/linux/percpu-defs.h b/include/linux/percpu-defs.h
index 8f16299ca068..8902f23bb770 100644
--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -35,6 +35,12 @@
#endif
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+#define USER_MAPPED_SECTION "..user_mapped"
+#else
+#define USER_MAPPED_SECTION ""
+#endif
+
/*
* Base implementations of per-CPU variable declarations and definitions, where
* the section in which the variable is to be placed is provided by the
@@ -115,6 +121,12 @@
#define DEFINE_PER_CPU(type, name) \
DEFINE_PER_CPU_SECTION(type, name, "")
+#define DECLARE_PER_CPU_USER_MAPPED(type, name) \
+ DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION)
+
+#define DEFINE_PER_CPU_USER_MAPPED(type, name) \
+ DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION)
+
/*
* Declaration/definition used for per-CPU variables that must come first in
* the set of variables.
@@ -144,6 +156,14 @@
DEFINE_PER_CPU_SECTION(type, name, PER_CPU_SHARED_ALIGNED_SECTION) \
____cacheline_aligned_in_smp
+#define DECLARE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(type, name) \
+ DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION PER_CPU_SHARED_ALIGNED_SECTION) \
+ ____cacheline_aligned_in_smp
+
+#define DEFINE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(type, name) \
+ DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION PER_CPU_SHARED_ALIGNED_SECTION) \
+ ____cacheline_aligned_in_smp
+
#define DECLARE_PER_CPU_ALIGNED(type, name) \
DECLARE_PER_CPU_SECTION(type, name, PER_CPU_ALIGNED_SECTION) \
____cacheline_aligned
@@ -162,11 +182,21 @@
#define DEFINE_PER_CPU_PAGE_ALIGNED(type, name) \
DEFINE_PER_CPU_SECTION(type, name, "..page_aligned") \
__aligned(PAGE_SIZE)
+/*
+ * Declaration/definition used for per-CPU variables that must be page aligned and need to be mapped in user mode.
+ */
+#define DECLARE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(type, name) \
+ DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION"..page_aligned") \
+ __aligned(PAGE_SIZE)
+
+#define DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(type, name) \
+ DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION"..page_aligned") \
+ __aligned(PAGE_SIZE)
/*
* Declaration/definition used for per-CPU variables that must be read mostly.
*/
-#define DECLARE_PER_CPU_READ_MOSTLY(type, name) \
+#define DECLARE_PER_CPU_READ_MOSTLY(type, name) \
DECLARE_PER_CPU_SECTION(type, name, "..read_mostly")
#define DEFINE_PER_CPU_READ_MOSTLY(type, name) \
diff --git a/init/main.c b/init/main.c
index 25bac88bc66e..99f026565608 100644
--- a/init/main.c
+++ b/init/main.c
@@ -80,6 +80,7 @@
#include <linux/integrity.h>
#include <linux/proc_ns.h>
#include <linux/io.h>
+#include <linux/kaiser.h>
#include <asm/io.h>
#include <asm/bugs.h>
@@ -473,6 +474,7 @@ static void __init mm_init(void)
pgtable_init();
vmalloc_init();
ioremap_huge_init();
+ kaiser_init();
}
asmlinkage __visible void __init start_kernel(void)
diff --git a/kernel/fork.c b/kernel/fork.c
index 9321b1ad3335..70e10cb49be0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -58,6 +58,7 @@
#include <linux/tsacct_kern.h>
#include <linux/cn_proc.h>
#include <linux/freezer.h>
+#include <linux/kaiser.h>
#include <linux/delayacct.h>
#include <linux/taskstats_kern.h>
#include <linux/random.h>
@@ -213,6 +214,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
static inline void free_thread_stack(struct task_struct *tsk)
{
+ kaiser_unmap_thread_stack(tsk->stack);
#ifdef CONFIG_VMAP_STACK
if (task_stack_vm_area(tsk)) {
unsigned long flags;
@@ -495,6 +497,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
* functions again.
*/
tsk->stack = stack;
+
+ err= kaiser_map_thread_stack(tsk->stack);
+ if (err)
+ goto free_stack;
#ifdef CONFIG_VMAP_STACK
tsk->stack_vm_area = stack_vm_area;
#endif
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 604f26a4f696..6a088df04b29 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -932,6 +932,7 @@ const char * const vmstat_text[] = {
"nr_slab_unreclaimable",
"nr_page_table_pages",
"nr_kernel_stack",
+ "nr_overhead",
"nr_bounce",
#if IS_ENABLED(CONFIG_ZSMALLOC)
"nr_zspages",
diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index 97f9cac98348..e86a34fd5484 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -843,6 +843,11 @@ static u32 bbr_sndbuf_expand(struct sock *sk)
*/
static u32 bbr_undo_cwnd(struct sock *sk)
{
+ struct bbr *bbr = inet_csk_ca(sk);
+
+ bbr->full_bw = 0; /* spurious slow-down; reset full pipe detection */
+ bbr->full_bw_cnt = 0;
+ bbr_reset_lt_bw_sampling(sk);
return tcp_sk(sk)->snd_cwnd;
}
diff --git a/security/Kconfig b/security/Kconfig
index 118f4549404e..32f36b40e9f0 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -31,6 +31,16 @@ config SECURITY
If you are unsure how to answer this question, answer N.
+config PAGE_TABLE_ISOLATION
+ bool "Remove the kernel mapping in user mode"
+ default y
+ depends on X86_64 && SMP
+ help
+ This enforces a strict kernel and user space isolation, in order
+ to close hardware side channels on kernel address information.
+
+ If you are unsure how to answer this question, answer Y.
+
config SECURITYFS
bool "Enable the securityfs filesystem"
help
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index a39629206864..f79669a38c0c 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -197,6 +197,9 @@
#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */
#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */
+/* Because the ALTERNATIVE scheme is for members of the X86_FEATURE club... */
+#define X86_FEATURE_KAISER ( 7*32+31) /* CONFIG_PAGE_TABLE_ISOLATION w/o nokaiser */
+
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index b4a83a490212..5977c4d71356 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2523,6 +2523,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
nojitter [IA-64] Disables jitter checking for ITC timers.
+ nopti [X86-64] Disable KAISER isolation of kernel from user.
+
no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
@@ -3054,6 +3056,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
pt. [PARIDE]
See Documentation/blockdev/paride.txt.
+ pti= [X86_64]
+ Control KAISER user/kernel address space isolation:
+ on - enable
+ off - disable
+ auto - default setting
+
pty.legacy_count=
[KNL] Number of legacy pty's. Overwrites compiled-in
default number.
diff --git a/Makefile b/Makefile
index 5d67056e24dd..b028c106535b 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
VERSION = 4
PATCHLEVEL = 4
-SUBLEVEL = 109
+SUBLEVEL = 110
EXTRAVERSION =
NAME = Blurry Fish Butt
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 3783dc3e10b3..4abb284a5b9c 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -9,6 +9,7 @@
*/
#undef CONFIG_PARAVIRT
#undef CONFIG_PARAVIRT_SPINLOCKS
+#undef CONFIG_PAGE_TABLE_ISOLATION
#undef CONFIG_KASAN
#include <linux/linkage.h>
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index cc0f2f5da19b..952b23b5d4e9 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -35,6 +35,7 @@
#include <asm/asm.h>
#include <asm/smap.h>
#include <asm/pgtable_types.h>
+#include <asm/kaiser.h>
#include <linux/err.h>
/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
@@ -135,6 +136,7 @@ ENTRY(entry_SYSCALL_64)
* it is too small to ever cause noticeable irq latency.
*/
SWAPGS_UNSAFE_STACK
+ SWITCH_KERNEL_CR3_NO_STACK
/*
* A hypervisor implementation might want to use a label
* after the swapgs, so that it can do the swapgs
@@ -207,9 +209,17 @@ entry_SYSCALL_64_fastpath:
testl $_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
jnz int_ret_from_sys_call_irqs_off /* Go to the slow path */
- RESTORE_C_REGS_EXCEPT_RCX_R11
movq RIP(%rsp), %rcx
movq EFLAGS(%rsp), %r11
+ RESTORE_C_REGS_EXCEPT_RCX_R11
+ /*
+ * This opens a window where we have a user CR3, but are
+ * running in the kernel. This makes using the CS
+ * register useless for telling whether or not we need to
+ * switch CR3 in NMIs. Normal interrupts are OK because
+ * they are off here.
+ */
+ SWITCH_USER_CR3
movq RSP(%rsp), %rsp
/*
* 64-bit SYSRET restores rip from rcx,
@@ -347,10 +357,26 @@ GLOBAL(int_ret_from_sys_call)
syscall_return_via_sysret:
/* rcx and r11 are already restored (see code above) */
RESTORE_C_REGS_EXCEPT_RCX_R11
+ /*
+ * This opens a window where we have a user CR3, but are
+ * running in the kernel. This makes using the CS
+ * register useless for telling whether or not we need to
+ * switch CR3 in NMIs. Normal interrupts are OK because
+ * they are off here.
+ */
+ SWITCH_USER_CR3
movq RSP(%rsp), %rsp
USERGS_SYSRET64
opportunistic_sysret_failed:
+ /*
+ * This opens a window where we have a user CR3, but are
+ * running in the kernel. This makes using the CS
+ * register useless for telling whether or not we need to
+ * switch CR3 in NMIs. Normal interrupts are OK because
+ * they are off here.
+ */
+ SWITCH_USER_CR3
SWAPGS
jmp restore_c_regs_and_iret
END(entry_SYSCALL_64)
@@ -509,6 +535,7 @@ END(irq_entries_start)
* tracking that we're in kernel mode.
*/
SWAPGS
+ SWITCH_KERNEL_CR3
/*
* We need to tell lockdep that IRQs are off. We can't do this until
@@ -568,6 +595,7 @@ GLOBAL(retint_user)
mov %rsp,%rdi
call prepare_exit_to_usermode
TRACE_IRQS_IRETQ
+ SWITCH_USER_CR3
SWAPGS
jmp restore_regs_and_iret
@@ -625,6 +653,7 @@ native_irq_return_ldt:
pushq %rax
pushq %rdi
SWAPGS
+ SWITCH_KERNEL_CR3
movq PER_CPU_VAR(espfix_waddr), %rdi
movq %rax, (0*8)(%rdi) /* RAX */
movq (2*8)(%rsp), %rax /* RIP */
@@ -640,6 +669,7 @@ native_irq_return_ldt:
andl $0xffff0000, %eax
popq %rdi
orq PER_CPU_VAR(espfix_stack), %rax
+ SWITCH_USER_CR3
SWAPGS
movq %rax, %rsp
popq %rax
@@ -995,7 +1025,11 @@ idtentry machine_check has_error_code=0 paranoid=1 do_sym=*machine_check_vec
/*
* Save all registers in pt_regs, and switch gs if needed.
* Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ *
+ * Return: ebx=0: needs swapgs but not SWITCH_USER_CR3 in paranoid_exit
+ * ebx=1: needs neither swapgs nor SWITCH_USER_CR3 in paranoid_exit
+ * ebx=2: needs both swapgs and SWITCH_USER_CR3 in paranoid_exit
+ * ebx=3: needs SWITCH_USER_CR3 but not swapgs in paranoid_exit
*/
ENTRY(paranoid_entry)
cld
@@ -1008,7 +1042,26 @@ ENTRY(paranoid_entry)
js 1f /* negative -> in kernel */
SWAPGS
xorl %ebx, %ebx
-1: ret
+1:
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ /*
+ * We might have come in between a swapgs and a SWITCH_KERNEL_CR3
+ * on entry, or between a SWITCH_USER_CR3 and a swapgs on exit.
+ * Do a conditional SWITCH_KERNEL_CR3: this could safely be done
+ * unconditionally, but we need to find out whether the reverse
+ * should be done on return (conveyed to paranoid_exit in %ebx).
+ */
+ ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
+ testl $KAISER_SHADOW_PGD_OFFSET, %eax
+ jz 2f
+ orl $2, %ebx
+ andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+ /* If PCID enabled, set X86_CR3_PCID_NOFLUSH_BIT */
+ ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
+ movq %rax, %cr3
+2:
+#endif
+ ret
END(paranoid_entry)
/*
@@ -1021,19 +1074,26 @@ END(paranoid_entry)
* be complicated. Fortunately, we there's no good reason
* to try to handle preemption here.
*
- * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
+ * On entry: ebx=0: needs swapgs but not SWITCH_USER_CR3
+ * ebx=1: needs neither swapgs nor SWITCH_USER_CR3
+ * ebx=2: needs both swapgs and SWITCH_USER_CR3
+ * ebx=3: needs SWITCH_USER_CR3 but not swapgs
*/
ENTRY(paranoid_exit)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF_DEBUG
- testl %ebx, %ebx /* swapgs needed? */
+ TRACE_IRQS_IRETQ_DEBUG
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ /* No ALTERNATIVE for X86_FEATURE_KAISER: paranoid_entry sets %ebx */
+ testl $2, %ebx /* SWITCH_USER_CR3 needed? */
+ jz paranoid_exit_no_switch
+ SWITCH_USER_CR3
+paranoid_exit_no_switch:
+#endif
+ testl $1, %ebx /* swapgs needed? */
jnz paranoid_exit_no_swapgs
- TRACE_IRQS_IRETQ
SWAPGS_UNSAFE_STACK
- jmp paranoid_exit_restore
paranoid_exit_no_swapgs:
- TRACE_IRQS_IRETQ_DEBUG
-paranoid_exit_restore:
RESTORE_EXTRA_REGS
RESTORE_C_REGS
REMOVE_PT_GPREGS_FROM_STACK 8
@@ -1048,6 +1108,13 @@ ENTRY(error_entry)
cld
SAVE_C_REGS 8
SAVE_EXTRA_REGS 8
+ /*
+ * error_entry() always returns with a kernel gsbase and
+ * CR3. We must also have a kernel CR3/gsbase before
+ * calling TRACE_IRQS_*. Just unconditionally switch to
+ * the kernel CR3 here.
+ */
+ SWITCH_KERNEL_CR3
xorl %ebx, %ebx
testb $3, CS+8(%rsp)
jz .Lerror_kernelspace
@@ -1210,6 +1277,10 @@ ENTRY(nmi)
*/
SWAPGS_UNSAFE_STACK
+ /*
+ * percpu variables are mapped with user CR3, so no need
+ * to switch CR3 here.
+ */
cld
movq %rsp, %rdx
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
@@ -1243,12 +1314,34 @@ ENTRY(nmi)
movq %rsp, %rdi
movq $-1, %rsi
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ /* Unconditionally use kernel CR3 for do_nmi() */
+ /* %rax is saved above, so OK to clobber here */
+ ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
+ /* If PCID enabled, NOFLUSH now and NOFLUSH on return */
+ ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
+ pushq %rax
+ /* mask off "user" bit of pgd address and 12 PCID bits: */
+ andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+ movq %rax, %cr3
+2:
+#endif
call do_nmi
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ /*
+ * Unconditionally restore CR3. I know we return to
+ * kernel code that needs user CR3, but do we ever return
+ * to "user mode" where we need the kernel CR3?
+ */
+ ALTERNATIVE "", "popq %rax; movq %rax, %cr3", X86_FEATURE_KAISER
+#endif
+
/*
* Return back to user mode. We must *not* do the normal exit
- * work, because we don't want to enable interrupts. Fortunately,
- * do_nmi doesn't modify pt_regs.
+ * work, because we don't want to enable interrupts. Do not
+ * switch to user CR3: we might be going back to kernel code
+ * that had a user CR3 set.
*/
SWAPGS
jmp restore_c_regs_and_iret
@@ -1445,22 +1538,55 @@ end_repeat_nmi:
ALLOC_PT_GPREGS_ON_STACK
/*
- * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit
- * as we should not be calling schedule in NMI context.
- * Even with normal interrupts enabled. An NMI should not be
- * setting NEED_RESCHED or anything that normal interrupts and
- * exceptions might do.
+ * Use the same approach as paranoid_entry to handle SWAPGS, but
+ * without CR3 handling since we do that differently in NMIs. No
+ * need to use paranoid_exit as we should not be calling schedule
+ * in NMI context. Even with normal interrupts enabled. An NMI
+ * should not be setting NEED_RESCHED or anything that normal
+ * interrupts and exceptions might do.
*/
- call paranoid_entry
-
- /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
+ cld
+ SAVE_C_REGS
+ SAVE_EXTRA_REGS
+ movl $1, %ebx
+ movl $MSR_GS_BASE, %ecx
+ rdmsr
+ testl %edx, %edx
+ js 1f /* negative -> in kernel */
+ SWAPGS
+ xorl %ebx, %ebx
+1:
movq %rsp, %rdi
movq $-1, %rsi
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ /* Unconditionally use kernel CR3 for do_nmi() */
+ /* %rax is saved above, so OK to clobber here */
+ ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
+ /* If PCID enabled, NOFLUSH now and NOFLUSH on return */
+ ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
+ pushq %rax
+ /* mask off "user" bit of pgd address and 12 PCID bits: */
+ andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+ movq %rax, %cr3
+2:
+#endif
+
+ /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
call do_nmi
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ /*
+ * Unconditionally restore CR3. We might be returning to
+ * kernel code that needs user CR3, like just just before
+ * a sysret.
+ */
+ ALTERNATIVE "", "popq %rax; movq %rax, %cr3", X86_FEATURE_KAISER
+#endif
+
testl %ebx, %ebx /* swapgs needed? */
jnz nmi_restore
nmi_swapgs:
+ /* We fixed up CR3 above, so no need to switch it here */
SWAPGS_UNSAFE_STACK
nmi_restore:
RESTORE_EXTRA_REGS
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 15cfebaa7688..d03bf0e28b8b 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -13,6 +13,8 @@
#include <asm/irqflags.h>
#include <asm/asm.h>
#include <asm/smap.h>
+#include <asm/pgtable_types.h>
+#include <asm/kaiser.h>
#include <linux/linkage.h>
#include <linux/err.h>
@@ -50,6 +52,7 @@ ENDPROC(native_usergs_sysret32)
ENTRY(entry_SYSENTER_compat)
/* Interrupts are off on entry. */
SWAPGS_UNSAFE_STACK
+ SWITCH_KERNEL_CR3_NO_STACK
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
/*
@@ -161,6 +164,7 @@ ENDPROC(entry_SYSENTER_compat)
ENTRY(entry_SYSCALL_compat)
/* Interrupts are off on entry. */
SWAPGS_UNSAFE_STACK
+ SWITCH_KERNEL_CR3_NO_STACK
/* Stash user ESP and switch to the kernel stack. */
movl %esp, %r8d
@@ -208,6 +212,7 @@ ENTRY(entry_SYSCALL_compat)
/* Opportunistic SYSRET */
sysret32_from_system_call:
TRACE_IRQS_ON /* User mode traces as IRQs on. */
+ SWITCH_USER_CR3
movq RBX(%rsp), %rbx /* pt_regs->rbx */
movq RBP(%rsp), %rbp /* pt_regs->rbp */
movq EFLAGS(%rsp), %r11 /* pt_regs->flags (in r11) */
@@ -269,6 +274,7 @@ ENTRY(entry_INT80_compat)
PARAVIRT_ADJUST_EXCEPTION_FRAME
ASM_CLAC /* Do this early to minimize exposure */
SWAPGS
+ SWITCH_KERNEL_CR3_NO_STACK
/*
* User tracing code (ptrace or signal handlers) might assume that
@@ -311,6 +317,7 @@ ENTRY(entry_INT80_compat)
/* Go back to user mode. */
TRACE_IRQS_ON
+ SWITCH_USER_CR3
SWAPGS
jmp restore_regs_and_iret
END(entry_INT80_compat)
diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index ca94fa649251..5dd363d54348 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -36,6 +36,11 @@ static notrace cycle_t vread_hpet(void)
}
#endif
+#ifdef CONFIG_PARAVIRT_CLOCK
+extern u8 pvclock_page
+ __attribute__((visibility("hidden")));
+#endif
+
#ifndef BUILD_VDSO32
#include <linux/kernel.h>
@@ -62,63 +67,65 @@ notrace static long vdso_fallback_gtod(struct timeval *tv, struct timezone *tz)
#ifdef CONFIG_PARAVIRT_CLOCK
-static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
+static notrace const struct pvclock_vsyscall_time_info *get_pvti0(void)
{
- const struct pvclock_vsyscall_time_info *pvti_base;
- int idx = cpu / (PAGE_SIZE/PVTI_SIZE);
- int offset = cpu % (PAGE_SIZE/PVTI_SIZE);
-
- BUG_ON(PVCLOCK_FIXMAP_BEGIN + idx > PVCLOCK_FIXMAP_END);
-
- pvti_base = (struct pvclock_vsyscall_time_info *)
- __fix_to_virt(PVCLOCK_FIXMAP_BEGIN+idx);
-
- return &pvti_base[offset];
+ return (const struct pvclock_vsyscall_time_info *)&pvclock_page;
}
static notrace cycle_t vread_pvclock(int *mode)
{
- const struct pvclock_vsyscall_time_info *pvti;
+ const struct pvclock_vcpu_time_info *pvti = &get_pvti0()->pvti;
cycle_t ret;
- u64 last;
- u32 version;
- u8 flags;
- unsigned cpu, cpu1;
-
+ u64 tsc, pvti_tsc;
+ u64 last, delta, pvti_system_time;
+ u32 version, pvti_tsc_to_system_mul, pvti_tsc_shift;
/*
- * Note: hypervisor must guarantee that:
- * 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
- * 2. that per-CPU pvclock time info is updated if the
- * underlying CPU changes.
- * 3. that version is increased whenever underlying CPU
- * changes.
+ * Note: The kernel and hypervisor must guarantee that cpu ID
+ * number maps 1:1 to per-CPU pvclock time info.
+ *
+ * Because the hypervisor is entirely unaware of guest userspace
+ * preemption, it cannot guarantee that per-CPU pvclock time
+ * info is updated if the underlying CPU changes or that that
+ * version is increased whenever underlying CPU changes.
*
+ * On KVM, we are guaranteed that pvti updates for any vCPU are
+ * atomic as seen by *all* vCPUs. This is an even stronger
+ * guarantee than we get with a normal seqlock.
+ *
+ * On Xen, we don't appear to have that guarantee, but Xen still
+ * supplies a valid seqlock using the version field.
+
+ * We only do pvclock vdso timing at all if
+ * PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
+ * mean that all vCPUs have matching pvti and that the TSC is
+ * synced, so we can just look at vCPU 0's pvti.
*/
- do {
- cpu = __getcpu() & VGETCPU_CPU_MASK;
- /* TODO: We can put vcpu id into higher bits of pvti.version.
- * This will save a couple of cycles by getting rid of
- * __getcpu() calls (Gleb).
- */
-
- pvti = get_pvti(cpu);
-
- version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
-
- /*
- * Test we're still on the cpu as well as the version.
- * We could have been migrated just after the first
- * vgetcpu but before fetching the version, so we
- * wouldn't notice a version change.
- */
- cpu1 = __getcpu() & VGETCPU_CPU_MASK;
- } while (unlikely(cpu != cpu1 ||
- (pvti->pvti.version & 1) ||
- pvti->pvti.version != version));
-
- if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
+
+ if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
*mode = VCLOCK_NONE;
+ return 0;
+ }
+
+ do {
+ version = pvti->version;
+
+ /* This is also a read barrier, so we'll read version first. */
+ tsc = rdtsc_ordered();
+
+ pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
+ pvti_tsc_shift = pvti->tsc_shift;
+ pvti_system_time = pvti->system_time;
+ pvti_tsc = pvti->tsc_timestamp;
+
+ /* Make sure that the version double-check is last. */
+ smp_rmb();
+ } while (unlikely((version & 1) || version != pvti->version));
+
+ delta = tsc - pvti_tsc;
+ ret = pvti_system_time +
+ pvclock_scale_delta(delta, pvti_tsc_to_system_mul,
+ pvti_tsc_shift);
/* refer to tsc.c read_tsc() comment for rationale */
last = gtod->cycle_last;
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index de2c921025f5..4158acc17df0 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -25,7 +25,7 @@ SECTIONS
* segment.
*/
- vvar_start = . - 2 * PAGE_SIZE;
+ vvar_start = . - 3 * PAGE_SIZE;
vvar_page = vvar_start;
/* Place all vvars at the offsets in asm/vvar.h. */
@@ -36,6 +36,7 @@ SECTIONS
#undef EMIT_VVAR
hpet_page = vvar_start + PAGE_SIZE;
+ pvclock_page = vvar_start + 2 * PAGE_SIZE;
. = SIZEOF_HEADERS;
diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 785d9922b106..491020b2826d 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -73,6 +73,7 @@ enum {
sym_vvar_start,
sym_vvar_page,
sym_hpet_page,
+ sym_pvclock_page,
sym_VDSO_FAKE_SECTION_TABLE_START,
sym_VDSO_FAKE_SECTION_TABLE_END,
};
@@ -80,6 +81,7 @@ enum {
const int special_pages[] = {
sym_vvar_page,
sym_hpet_page,
+ sym_pvclock_page,
};
struct vdso_sym {
@@ -91,6 +93,7 @@ struct vdso_sym required_syms[] = {
[sym_vvar_start] = {"vvar_start", true},
[sym_vvar_page] = {"vvar_page", true},
[sym_hpet_page] = {"hpet_page", true},
+ [sym_pvclock_page] = {"pvclock_page", true},
[sym_VDSO_FAKE_SECTION_TABLE_START] = {
"VDSO_FAKE_SECTION_TABLE_START", false
},
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 64df47148160..aa828191c654 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -100,6 +100,7 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
.name = "[vvar]",
.pages = no_pages,
};
+ struct pvclock_vsyscall_time_info *pvti;
if (calculate_addr) {
addr = vdso_addr(current->mm->start_stack,
@@ -169,6 +170,18 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
}
#endif
+ pvti = pvclock_pvti_cpu0_va();
+ if (pvti && image->sym_pvclock_page) {
+ ret = remap_pfn_range(vma,
+ text_start + image->sym_pvclock_page,
+ __pa(pvti) >> PAGE_SHIFT,
+ PAGE_SIZE,
+ PAGE_READONLY);
+
+ if (ret)
+ goto up_fail;
+ }
+
up_fail:
if (ret)
current->mm->context.vdso = NULL;
diff --git a/arch/x86/include/asm/cmdline.h b/arch/x86/include/asm/cmdline.h
index e01f7f7ccb0c..84ae170bc3d0 100644
--- a/arch/x86/include/asm/cmdline.h
+++ b/arch/x86/include/asm/cmdline.h
@@ -2,5 +2,7 @@
#define _ASM_X86_CMDLINE_H
int cmdline_find_option_bool(const char *cmdline_ptr, const char *option);
+int cmdline_find_option(const char *cmdline_ptr, const char *option,
+ char *buffer, int bufsize);
#endif /* _ASM_X86_CMDLINE_H */
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index f7ba9fbf12ee..f6605712ca90 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -187,6 +187,7 @@
#define X86_FEATURE_ARAT ( 7*32+ 1) /* Always Running APIC Timer */
#define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */
#define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */
+#define X86_FEATURE_INVPCID_SINGLE ( 7*32+ 4) /* Effectively INVPCID && CR4.PCIDE=1 */
#define X86_FEATURE_PLN ( 7*32+ 5) /* Intel Power Limit Notification */
#define X86_FEATURE_PTS ( 7*32+ 6) /* Intel Package Thermal Status */
#define X86_FEATURE_DTHERM ( 7*32+ 7) /* Digital Thermal Sensor */
@@ -199,6 +200,9 @@
#define X86_FEATURE_HWP_PKG_REQ ( 7*32+14) /* Intel HWP_PKG_REQ */
#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
+/* Because the ALTERNATIVE scheme is for members of the X86_FEATURE club... */
+#define X86_FEATURE_KAISER ( 7*32+31) /* CONFIG_PAGE_TABLE_ISOLATION w/o nokaiser */
+
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */
diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 4e10d73cf018..880db91d9457 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -43,7 +43,7 @@ struct gdt_page {
struct desc_struct gdt[GDT_ENTRIES];
} __attribute__((aligned(PAGE_SIZE)));
-DECLARE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page);
+DECLARE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(struct gdt_page, gdt_page);
static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
{
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 59caa55fb9b5..ee52ff858699 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -187,7 +187,7 @@ extern char irq_entries_start[];
#define VECTOR_RETRIGGERED ((void *)~0UL)
typedef struct irq_desc* vector_irq_t[NR_VECTORS];
-DECLARE_PER_CPU(vector_irq_t, vector_irq);
+DECLARE_PER_CPU_USER_MAPPED(vector_irq_t, vector_irq);
#endif /* !ASSEMBLY_ */
diff --git a/arch/x86/include/asm/kaiser.h b/arch/x86/include/asm/kaiser.h
new file mode 100644
index 000000000000..802bbbdfe143
--- /dev/null
+++ b/arch/x86/include/asm/kaiser.h
@@ -0,0 +1,141 @@
+#ifndef _ASM_X86_KAISER_H
+#define _ASM_X86_KAISER_H
+
+#include <uapi/asm/processor-flags.h> /* For PCID constants */
+
+/*
+ * This file includes the definitions for the KAISER feature.
+ * KAISER is a counter measure against x86_64 side channel attacks on
+ * the kernel virtual memory. It has a shadow pgd for every process: the
+ * shadow pgd has a minimalistic kernel-set mapped, but includes the whole
+ * user memory. Within a kernel context switch, or when an interrupt is handled,
+ * the pgd is switched to the normal one. When the system switches to user mode,
+ * the shadow pgd is enabled. By this, the virtual memory caches are freed,
+ * and the user may not attack the whole kernel memory.
+ *
+ * A minimalistic kernel mapping holds the parts needed to be mapped in user
+ * mode, such as the entry/exit functions of the user space, or the stacks.
+ */
+
+#define KAISER_SHADOW_PGD_OFFSET 0x1000
+
+#ifdef __ASSEMBLY__
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+
+.macro _SWITCH_TO_KERNEL_CR3 reg
+movq %cr3, \reg
+andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
+/* If PCID enabled, set X86_CR3_PCID_NOFLUSH_BIT */
+ALTERNATIVE "", "bts $63, \reg", X86_FEATURE_PCID
+movq \reg, %cr3
+.endm
+
+.macro _SWITCH_TO_USER_CR3 reg regb
+/*
+ * regb must be the low byte portion of reg: because we have arranged
+ * for the low byte of the user PCID to serve as the high byte of NOFLUSH
+ * (0x80 for each when PCID is enabled, or 0x00 when PCID and NOFLUSH are
+ * not enabled): so that the one register can update both memory and cr3.
+ */
+movq %cr3, \reg
+orq PER_CPU_VAR(x86_cr3_pcid_user), \reg
+js 9f
+/* If PCID enabled, FLUSH this time, reset to NOFLUSH for next time */
+movb \regb, PER_CPU_VAR(x86_cr3_pcid_user+7)
+9:
+movq \reg, %cr3
+.endm
+
+.macro SWITCH_KERNEL_CR3
+ALTERNATIVE "jmp 8f", "pushq %rax", X86_FEATURE_KAISER
+_SWITCH_TO_KERNEL_CR3 %rax
+popq %rax
+8:
+.endm
+
+.macro SWITCH_USER_CR3
+ALTERNATIVE "jmp 8f", "pushq %rax", X86_FEATURE_KAISER
+_SWITCH_TO_USER_CR3 %rax %al
+popq %rax
+8:
+.endm
+
+.macro SWITCH_KERNEL_CR3_NO_STACK
+ALTERNATIVE "jmp 8f", \
+ __stringify(movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)), \
+ X86_FEATURE_KAISER
+_SWITCH_TO_KERNEL_CR3 %rax
+movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
+8:
+.endm
+
+#else /* CONFIG_PAGE_TABLE_ISOLATION */
+
+.macro SWITCH_KERNEL_CR3
+.endm
+.macro SWITCH_USER_CR3
+.endm
+.macro SWITCH_KERNEL_CR3_NO_STACK
+.endm
+
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * Upon kernel/user mode switch, it may happen that the address
+ * space has to be switched before the registers have been
+ * stored. To change the address space, another register is
+ * needed. A register therefore has to be stored/restored.
+*/
+DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+
+DECLARE_PER_CPU(unsigned long, x86_cr3_pcid_user);
+
+extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
+
+extern int kaiser_enabled;
+extern void __init kaiser_check_boottime_disable(void);
+#else
+#define kaiser_enabled 0
+static inline void __init kaiser_check_boottime_disable(void) {}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
+/*
+ * Kaiser function prototypes are needed even when CONFIG_PAGE_TABLE_ISOLATION is not set,
+ * so as to build with tests on kaiser_enabled instead of #ifdefs.
+ */
+
+/**
+ * kaiser_add_mapping - map a virtual memory part to the shadow (user) mapping
+ * @addr: the start address of the range
+ * @size: the size of the range
+ * @flags: The mapping flags of the pages
+ *
+ * The mapping is done on a global scope, so no bigger
+ * synchronization has to be done. the pages have to be
+ * manually unmapped again when they are not needed any longer.
+ */
+extern int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
+
+/**
+ * kaiser_remove_mapping - unmap a virtual memory part of the shadow mapping
+ * @addr: the start address of the range
+ * @size: the size of the range
+ */
+extern void kaiser_remove_mapping(unsigned long start, unsigned long size);
+
+/**
+ * kaiser_init - Initialize the shadow mapping
+ *
+ * Most parts of the shadow mapping can be mapped upon boot
+ * time. Only per-process things like the thread stacks
+ * or a new LDT have to be mapped at runtime. These boot-
+ * time mappings are permanent and never unmapped.
+ */
+extern void kaiser_init(void);
+
+#endif /* __ASSEMBLY */
+
+#endif /* _ASM_X86_KAISER_H */
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 6ec0c8b2e9df..84c62d950023 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -18,6 +18,12 @@
#ifndef __ASSEMBLY__
#include <asm/x86_init.h>
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+extern int kaiser_enabled;
+#else
+#define kaiser_enabled 0
+#endif
+
void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
void ptdump_walk_pgd_level_checkwx(void);
@@ -653,7 +659,17 @@ static inline pud_t *pud_offset(pgd_t *pgd, unsigned long address)
static inline int pgd_bad(pgd_t pgd)
{
- return (pgd_flags(pgd) & ~_PAGE_USER) != _KERNPG_TABLE;
+ pgdval_t ignore_flags = _PAGE_USER;
+ /*
+ * We set NX on KAISER pgds that map userspace memory so
+ * that userspace can not meaningfully use the kernel
+ * page table by accident; it will fault on the first
+ * instruction it tries to run. See native_set_pgd().
+ */
+ if (kaiser_enabled)
+ ignore_flags |= _PAGE_NX;
+
+ return (pgd_flags(pgd) & ~ignore_flags) != _KERNPG_TABLE;
}
static inline int pgd_none(pgd_t pgd)
@@ -855,7 +871,15 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
*/
static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
{
- memcpy(dst, src, count * sizeof(pgd_t));
+ memcpy(dst, src, count * sizeof(pgd_t));
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ if (kaiser_enabled) {
+ /* Clone the shadow pgd part as well */
+ memcpy(native_get_shadow_pgd(dst),
+ native_get_shadow_pgd(src),
+ count * sizeof(pgd_t));
+ }
+#endif
}
#define PTE_SHIFT ilog2(PTRS_PER_PTE)
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 2ee781114d34..c810226e741a 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -106,9 +106,32 @@ static inline void native_pud_clear(pud_t *pud)
native_set_pud(pud, native_make_pud(0));
}
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+extern pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd);
+
+static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
+{
+#ifdef CONFIG_DEBUG_VM
+ /* linux/mmdebug.h may not have been included at this point */
+ BUG_ON(!kaiser_enabled);
+#endif
+ return (pgd_t *)((unsigned long)pgdp | (unsigned long)PAGE_SIZE);
+}
+#else
+static inline pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ return pgd;
+}
+static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
+{
+ BUILD_BUG_ON(1);
+ return NULL;
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
- *pgdp = pgd;
+ *pgdp = kaiser_set_shadow_pgd(pgdp, pgd);
}
static inline void native_pgd_clear(pgd_t *pgd)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 79c91853e50e..8dba273da25a 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -89,7 +89,7 @@
#define _PAGE_NX (_AT(pteval_t, 0))
#endif
-#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
+#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
_PAGE_ACCESSED | _PAGE_DIRTY)
@@ -102,6 +102,33 @@
_PAGE_SOFT_DIRTY)
#define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
+/* The ASID is the lower 12 bits of CR3 */
+#define X86_CR3_PCID_ASID_MASK (_AC((1<<12)-1,UL))
+
+/* Mask for all the PCID-related bits in CR3: */
+#define X86_CR3_PCID_MASK (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_MASK)
+#define X86_CR3_PCID_ASID_KERN (_AC(0x0,UL))
+
+#if defined(CONFIG_PAGE_TABLE_ISOLATION) && defined(CONFIG_X86_64)
+/* Let X86_CR3_PCID_ASID_USER be usable for the X86_CR3_PCID_NOFLUSH bit */
+#define X86_CR3_PCID_ASID_USER (_AC(0x80,UL))
+
+#define X86_CR3_PCID_KERN_FLUSH (X86_CR3_PCID_ASID_KERN)
+#define X86_CR3_PCID_USER_FLUSH (X86_CR3_PCID_ASID_USER)
+#define X86_CR3_PCID_KERN_NOFLUSH (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_KERN)
+#define X86_CR3_PCID_USER_NOFLUSH (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_USER)
+#else
+#define X86_CR3_PCID_ASID_USER (_AC(0x0,UL))
+/*
+ * PCIDs are unsupported on 32-bit and none of these bits can be
+ * set in CR3:
+ */
+#define X86_CR3_PCID_KERN_FLUSH (0)
+#define X86_CR3_PCID_USER_FLUSH (0)
+#define X86_CR3_PCID_KERN_NOFLUSH (0)
+#define X86_CR3_PCID_USER_NOFLUSH (0)
+#endif
+
/*
* The cache modes defined here are used to translate between pure SW usage
* and the HW defined cache mode bits and/or PAT entries.
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 2d5a50cb61a2..f3bdaed0188f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -305,7 +305,7 @@ struct tss_struct {
} ____cacheline_aligned;
-DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss);
+DECLARE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(struct tss_struct, cpu_tss);
#ifdef CONFIG_X86_32
DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index baad72e4c100..6045cef376c2 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -4,6 +4,15 @@
#include <linux/clocksource.h>
#include <asm/pvclock-abi.h>
+#ifdef CONFIG_PARAVIRT_CLOCK
+extern struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
+#else
+static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
+{
+ return NULL;
+}
+#endif
+
/* some helper functions for xen and kvm pv clock sources */
cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 9fc5968da820..a691b66cc40a 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -131,6 +131,24 @@ static inline void cr4_set_bits_and_update_boot(unsigned long mask)
cr4_set_bits(mask);
}
+/*
+ * Declare a couple of kaiser interfaces here for convenience,
+ * to avoid the need for asm/kaiser.h in unexpected places.
+ */
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+extern int kaiser_enabled;
+extern void kaiser_setup_pcid(void);
+extern void kaiser_flush_tlb_on_return_to_user(void);
+#else
+#define kaiser_enabled 0
+static inline void kaiser_setup_pcid(void)
+{
+}
+static inline void kaiser_flush_tlb_on_return_to_user(void)
+{
+}
+#endif
+
static inline void __native_flush_tlb(void)
{
/*
@@ -139,6 +157,8 @@ static inline void __native_flush_tlb(void)
* back:
*/
preempt_disable();
+ if (kaiser_enabled)
+ kaiser_flush_tlb_on_return_to_user();
native_write_cr3(native_read_cr3());
preempt_enable();
}
@@ -148,20 +168,27 @@ static inline void __native_flush_tlb_global_irq_disabled(void)
unsigned long cr4;
cr4 = this_cpu_read(cpu_tlbstate.cr4);
- /* clear PGE */
- native_write_cr4(cr4 & ~X86_CR4_PGE);
- /* write old PGE again and flush TLBs */
- native_write_cr4(cr4);
+ if (cr4 & X86_CR4_PGE) {
+ /* clear PGE and flush TLB of all entries */
+ native_write_cr4(cr4 & ~X86_CR4_PGE);
+ /* restore PGE as it was before */
+ native_write_cr4(cr4);
+ } else {
+ /* do it with cr3, letting kaiser flush user PCID */
+ __native_flush_tlb();
+ }
}
static inline void __native_flush_tlb_global(void)
{
unsigned long flags;
- if (static_cpu_has(X86_FEATURE_INVPCID)) {
+ if (this_cpu_has(X86_FEATURE_INVPCID)) {
/*
* Using INVPCID is considerably faster than a pair of writes
* to CR4 sandwiched inside an IRQ flag save/restore.
+ *
+ * Note, this works with CR4.PCIDE=0 or 1.
*/
invpcid_flush_all();
return;
@@ -173,24 +200,45 @@ static inline void __native_flush_tlb_global(void)
* be called from deep inside debugging code.)
*/
raw_local_irq_save(flags);
-
__native_flush_tlb_global_irq_disabled();
-
raw_local_irq_restore(flags);
}
static inline void __native_flush_tlb_single(unsigned long addr)
{
- asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+ /*
+ * SIMICS #GP's if you run INVPCID with type 2/3
+ * and X86_CR4_PCIDE clear. Shame!
+ *
+ * The ASIDs used below are hard-coded. But, we must not
+ * call invpcid(type=1/2) before CR4.PCIDE=1. Just call
+ * invlpg in the case we are called early.
+ */
+
+ if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
+ if (kaiser_enabled)
+ kaiser_flush_tlb_on_return_to_user();
+ asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+ return;
+ }
+ /* Flush the address out of both PCIDs. */
+ /*
+ * An optimization here might be to determine addresses
+ * that are only kernel-mapped and only flush the kernel
+ * ASID. But, userspace flushes are probably much more
+ * important performance-wise.
+ *
+ * Make sure to do only a single invpcid when KAISER is
+ * disabled and we have only a single ASID.
+ */
+ if (kaiser_enabled)
+ invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+ invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
}
static inline void __flush_tlb_all(void)
{
- if (cpu_has_pge)
- __flush_tlb_global();
- else
- __flush_tlb();
-
+ __flush_tlb_global();
/*
* Note: if we somehow had PCID but not PGE, then this wouldn't work --
* we'd end up flushing kernel translations for the current ASID but
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 756de9190aec..deabaf9759b6 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -22,6 +22,7 @@ struct vdso_image {
long sym_vvar_page;
long sym_hpet_page;
+ long sym_pvclock_page;
long sym_VDSO32_NOTE_MASK;
long sym___kernel_sigreturn;
long sym___kernel_rt_sigreturn;
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index 79887abcb5e1..1361779f44fe 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -77,7 +77,8 @@
#define X86_CR3_PWT _BITUL(X86_CR3_PWT_BIT)
#define X86_CR3_PCD_BIT 4 /* Page Cache Disable */
#define X86_CR3_PCD _BITUL(X86_CR3_PCD_BIT)
-#define X86_CR3_PCID_MASK _AC(0x00000fff,UL) /* PCID Mask */
+#define X86_CR3_PCID_NOFLUSH_BIT 63 /* Preserve old PCID */
+#define X86_CR3_PCID_NOFLUSH _BITULL(X86_CR3_PCID_NOFLUSH_BIT)
/*
* Intel CPU features in CR4
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index aa1e7246b06b..cc154ac64f00 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -92,7 +92,7 @@ static const struct cpu_dev default_cpu = {
static const struct cpu_dev *this_cpu = &default_cpu;
-DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
+DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(struct gdt_page, gdt_page) = { .gdt = {
#ifdef CONFIG_X86_64
/*
* We need valid kernel segments for data and code in long mode too
@@ -324,8 +324,21 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
static void setup_pcid(struct cpuinfo_x86 *c)
{
if (cpu_has(c, X86_FEATURE_PCID)) {
- if (cpu_has(c, X86_FEATURE_PGE)) {
+ if (cpu_has(c, X86_FEATURE_PGE) || kaiser_enabled) {
cr4_set_bits(X86_CR4_PCIDE);
+ /*
+ * INVPCID has two "groups" of types:
+ * 1/2: Invalidate an individual address
+ * 3/4: Invalidate all contexts
+ *
+ * 1/2 take a PCID, but 3/4 do not. So, 3/4
+ * ignore the PCID argument in the descriptor.
+ * But, we have to be careful not to call 1/2
+ * with an actual non-zero PCID in them before
+ * we do the above cr4_set_bits().
+ */
+ if (cpu_has(c, X86_FEATURE_INVPCID))
+ set_cpu_cap(c, X86_FEATURE_INVPCID_SINGLE);
} else {
/*
* flush_tlb_all(), as currently implemented, won't
@@ -338,6 +351,7 @@ static void setup_pcid(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_PCID);
}
}
+ kaiser_setup_pcid();
}
/*
@@ -1229,7 +1243,7 @@ static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
[DEBUG_STACK - 1] = DEBUG_STKSZ
};
-static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
+DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(char, exception_stacks
[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
/* May not be marked __init: used by software suspend */
@@ -1392,6 +1406,14 @@ void cpu_init(void)
* try to read it.
*/
cr4_init_shadow();
+ if (!kaiser_enabled) {
+ /*
+ * secondary_startup_64() deferred setting PGE in cr4:
+ * probe_page_size_mask() sets it on the boot cpu,
+ * but it needs to be set on each secondary cpu.
+ */
+ cr4_set_bits(X86_CR4_PGE);
+ }
/*
* Load microcode on this cpu if a valid microcode is available.
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 1e7de3cefc9c..f01b3a12dce0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -2,11 +2,15 @@
#include <linux/types.h>
#include <linux/slab.h>
+#include <asm/kaiser.h>
#include <asm/perf_event.h>
#include <asm/insn.h>
#include "perf_event.h"
+static
+DEFINE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(struct debug_store, cpu_debug_store);
+
/* The size of a BTS record in bytes: */
#define BTS_RECORD_SIZE 24
@@ -268,6 +272,39 @@ void fini_debug_store_on_cpu(int cpu)
static DEFINE_PER_CPU(void *, insn_buffer);
+static void *dsalloc(size_t size, gfp_t flags, int node)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ unsigned int order = get_order(size);
+ struct page *page;
+ unsigned long addr;
+
+ page = __alloc_pages_node(node, flags | __GFP_ZERO, order);
+ if (!page)
+ return NULL;
+ addr = (unsigned long)page_address(page);
+ if (kaiser_add_mapping(addr, size, __PAGE_KERNEL) < 0) {
+ __free_pages(page, order);
+ addr = 0;
+ }
+ return (void *)addr;
+#else
+ return kmalloc_node(size, flags | __GFP_ZERO, node);
+#endif
+}
+
+static void dsfree(const void *buffer, size_t size)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ if (!buffer)
+ return;
+ kaiser_remove_mapping((unsigned long)buffer, size);
+ free_pages((unsigned long)buffer, get_order(size));
+#else
+ kfree(buffer);
+#endif
+}
+
static int alloc_pebs_buffer(int cpu)
{
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
@@ -278,7 +315,7 @@ static int alloc_pebs_buffer(int cpu)
if (!x86_pmu.pebs)
return 0;
- buffer = kzalloc_node(x86_pmu.pebs_buffer_size, GFP_KERNEL, node);
+ buffer = dsalloc(x86_pmu.pebs_buffer_size, GFP_KERNEL, node);
if (unlikely(!buffer))
return -ENOMEM;
@@ -289,7 +326,7 @@ static int alloc_pebs_buffer(int cpu)
if (x86_pmu.intel_cap.pebs_format < 2) {
ibuffer = kzalloc_node(PEBS_FIXUP_SIZE, GFP_KERNEL, node);
if (!ibuffer) {
- kfree(buffer);
+ dsfree(buffer, x86_pmu.pebs_buffer_size);
return -ENOMEM;
}
per_cpu(insn_buffer, cpu) = ibuffer;
@@ -315,7 +352,8 @@ static void release_pebs_buffer(int cpu)
kfree(per_cpu(insn_buffer, cpu));
per_cpu(insn_buffer, cpu) = NULL;
- kfree((void *)(unsigned long)ds->pebs_buffer_base);
+ dsfree((void *)(unsigned long)ds->pebs_buffer_base,
+ x86_pmu.pebs_buffer_size);
ds->pebs_buffer_base = 0;
}
@@ -329,7 +367,7 @@ static int alloc_bts_buffer(int cpu)
if (!x86_pmu.bts)
return 0;
- buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node);
+ buffer = dsalloc(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node);
if (unlikely(!buffer)) {
WARN_ONCE(1, "%s: BTS buffer allocation failure\n", __func__);
return -ENOMEM;
@@ -355,19 +393,15 @@ static void release_bts_buffer(int cpu)
if (!ds || !x86_pmu.bts)
return;
- kfree((void *)(unsigned long)ds->bts_buffer_base);
+ dsfree((void *)(unsigned long)ds->bts_buffer_base, BTS_BUFFER_SIZE);
ds->bts_buffer_base = 0;
}
static int alloc_ds_buffer(int cpu)
{
- int node = cpu_to_node(cpu);
- struct debug_store *ds;
-
- ds = kzalloc_node(sizeof(*ds), GFP_KERNEL, node);
- if (unlikely(!ds))
- return -ENOMEM;
+ struct debug_store *ds = per_cpu_ptr(&cpu_debug_store, cpu);
+ memset(ds, 0, sizeof(*ds));
per_cpu(cpu_hw_events, cpu).ds = ds;
return 0;
@@ -381,7 +415,6 @@ static void release_ds_buffer(int cpu)
return;
per_cpu(cpu_hw_events, cpu).ds = NULL;
- kfree(ds);
}
void release_ds_buffers(void)
diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
index 4d38416e2a7f..b02cb2ec6726 100644
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -41,6 +41,7 @@
#include <asm/pgalloc.h>
#include <asm/setup.h>
#include <asm/espfix.h>
+#include <asm/kaiser.h>
/*
* Note: we only need 6*8 = 48 bytes for the espfix stack, but round
@@ -126,6 +127,15 @@ void __init init_espfix_bsp(void)
/* Install the espfix pud into the kernel page directory */
pgd_p = &init_level4_pgt[pgd_index(ESPFIX_BASE_ADDR)];
pgd_populate(&init_mm, pgd_p, (pud_t *)espfix_pud_page);
+ /*
+ * Just copy the top-level PGD that is mapping the espfix
+ * area to ensure it is mapped into the shadow user page
+ * tables.
+ */
+ if (kaiser_enabled) {
+ set_pgd(native_get_shadow_pgd(pgd_p),
+ __pgd(_KERNPG_TABLE | __pa((pud_t *)espfix_pud_page)));
+ }
/* Randomize the locations */
init_espfix_random();
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index ffdc0e860390..4034e905741a 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -183,8 +183,8 @@ ENTRY(secondary_startup_64)
movq $(init_level4_pgt - __START_KERNEL_map), %rax
1:
- /* Enable PAE mode and PGE */
- movl $(X86_CR4_PAE | X86_CR4_PGE), %ecx
+ /* Enable PAE and PSE, but defer PGE until kaiser_enabled is decided */
+ movl $(X86_CR4_PAE | X86_CR4_PSE), %ecx
movq %rcx, %cr4
/* Setup early boot stage 4 level pagetables. */
@@ -441,6 +441,27 @@ early_idt_ripmsg:
.balign PAGE_SIZE; \
GLOBAL(name)
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * Each PGD needs to be 8k long and 8k aligned. We do not
+ * ever go out to userspace with these, so we do not
+ * strictly *need* the second page, but this allows us to
+ * have a single set_pgd() implementation that does not
+ * need to worry about whether it has 4k or 8k to work
+ * with.
+ *
+ * This ensures PGDs are 8k long:
+ */
+#define KAISER_USER_PGD_FILL 512
+/* This ensures they are 8k-aligned: */
+#define NEXT_PGD_PAGE(name) \
+ .balign 2 * PAGE_SIZE; \
+GLOBAL(name)
+#else
+#define NEXT_PGD_PAGE(name) NEXT_PAGE(name)
+#define KAISER_USER_PGD_FILL 0
+#endif
+
/* Automate the creation of 1 to 1 mapping pmd entries */
#define PMDS(START, PERM, COUNT) \
i = 0 ; \
@@ -450,9 +471,10 @@ GLOBAL(name)
.endr
__INITDATA
-NEXT_PAGE(early_level4_pgt)
+NEXT_PGD_PAGE(early_level4_pgt)
.fill 511,8,0
.quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .fill KAISER_USER_PGD_FILL,8,0
NEXT_PAGE(early_dynamic_pgts)
.fill 512*EARLY_DYNAMIC_PAGE_TABLES,8,0
@@ -460,16 +482,18 @@ NEXT_PAGE(early_dynamic_pgts)
.data
#ifndef CONFIG_XEN
-NEXT_PAGE(init_level4_pgt)
+NEXT_PGD_PAGE(init_level4_pgt)
.fill 512,8,0
+ .fill KAISER_USER_PGD_FILL,8,0
#else
-NEXT_PAGE(init_level4_pgt)
+NEXT_PGD_PAGE(init_level4_pgt)
.quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
.org init_level4_pgt + L4_PAGE_OFFSET*8, 0
.quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
.org init_level4_pgt + L4_START_KERNEL*8, 0
/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
.quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .fill KAISER_USER_PGD_FILL,8,0
NEXT_PAGE(level3_ident_pgt)
.quad level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
@@ -480,6 +504,7 @@ NEXT_PAGE(level2_ident_pgt)
*/
PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
#endif
+ .fill KAISER_USER_PGD_FILL,8,0
NEXT_PAGE(level3_kernel_pgt)
.fill L3_START_KERNEL,8,0
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index 1423ab1b0312..f480b38a03c3 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -51,7 +51,7 @@ static struct irqaction irq2 = {
.flags = IRQF_NO_THREAD,
};
-DEFINE_PER_CPU(vector_irq_t, vector_irq) = {
+DEFINE_PER_CPU_USER_MAPPED(vector_irq_t, vector_irq) = {
[0 ... NR_VECTORS - 1] = VECTOR_UNUSED,
};
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 2bd81e302427..ec1b06dc82d2 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -45,6 +45,11 @@ early_param("no-kvmclock", parse_no_kvmclock);
static struct pvclock_vsyscall_time_info *hv_clock;
static struct pvclock_wall_clock wall_clock;
+struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
+{
+ return hv_clock;
+}
+
/*
* The wallclock is the time of day when we booted. Since then, some time may
* have elapsed since the hypervisor wrote the data. So we try to account for
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index d6279593bcdd..bc429365b72a 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/uaccess.h>
+#include <linux/kaiser.h>
#include <asm/ldt.h>
#include <asm/desc.h>
@@ -34,11 +35,21 @@ static void flush_ldt(void *current_mm)
set_ldt(pc->ldt->entries, pc->ldt->size);
}
+static void __free_ldt_struct(struct ldt_struct *ldt)
+{
+ if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
+ vfree(ldt->entries);
+ else
+ free_page((unsigned long)ldt->entries);
+ kfree(ldt);
+}
+
/* The caller must call finalize_ldt_struct on the result. LDT starts zeroed. */
static struct ldt_struct *alloc_ldt_struct(int size)
{
struct ldt_struct *new_ldt;
int alloc_size;
+ int ret;
if (size > LDT_ENTRIES)
return NULL;
@@ -66,7 +77,13 @@ static struct ldt_struct *alloc_ldt_struct(int size)
return NULL;
}
+ ret = kaiser_add_mapping((unsigned long)new_ldt->entries, alloc_size,
+ __PAGE_KERNEL);
new_ldt->size = size;
+ if (ret) {
+ __free_ldt_struct(new_ldt);
+ return NULL;
+ }
return new_ldt;
}
@@ -92,12 +109,10 @@ static void free_ldt_struct(struct ldt_struct *ldt)
if (likely(!ldt))
return;
+ kaiser_remove_mapping((unsigned long)ldt->entries,
+ ldt->size * LDT_ENTRY_SIZE);
paravirt_free_ldt(ldt->entries, ldt->size);
- if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
- vfree(ldt->entries);
- else
- free_page((unsigned long)ldt->entries);
- kfree(ldt);
+ __free_ldt_struct(ldt);
}
/*
diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c
index 8aa05583bc42..0677bf8d3a42 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -9,7 +9,6 @@ DEF_NATIVE(pv_irq_ops, save_fl, "pushfq; popq %rax");
DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax");
DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax");
DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3");
-DEF_NATIVE(pv_mmu_ops, flush_tlb_single, "invlpg (%rdi)");
DEF_NATIVE(pv_cpu_ops, clts, "clts");
DEF_NATIVE(pv_cpu_ops, wbinvd, "wbinvd");
@@ -62,7 +61,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
PATCH_SITE(pv_mmu_ops, read_cr3);
PATCH_SITE(pv_mmu_ops, write_cr3);
PATCH_SITE(pv_cpu_ops, clts);
- PATCH_SITE(pv_mmu_ops, flush_tlb_single);
PATCH_SITE(pv_cpu_ops, wbinvd);
#if defined(CONFIG_PARAVIRT_SPINLOCKS) && defined(CONFIG_QUEUED_SPINLOCKS)
case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock):
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9f7c21c22477..7c5c5dc90ffa 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -39,7 +39,7 @@
* section. Since TSS's are completely CPU-local, we want them
* on exact cacheline boundaries, to eliminate cacheline ping-pong.
*/
-__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
+__visible DEFINE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(struct tss_struct, cpu_tss) = {
.x86_tss = {
.sp0 = TOP_OF_INIT_STACK,
#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e67b834279b2..bbaae4cf9e8e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -112,6 +112,7 @@
#include <asm/alternative.h>
#include <asm/prom.h>
#include <asm/microcode.h>
+#include <asm/kaiser.h>
/*
* max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -1016,6 +1017,12 @@ void __init setup_arch(char **cmdline_p)
*/
init_hypervisor_platform();
+ /*
+ * This needs to happen right after XENPV is set on xen and
+ * kaiser_enabled is checked below in cleanup_highmap().
+ */
+ kaiser_check_boottime_disable();
+
x86_init.resources.probe_roms();
/* after parse_early_param, so could debug it */
diff --git a/arch/x86/kernel/tracepoint.c b/arch/x86/kernel/tracepoint.c
index 1c113db9ed57..2bb5ee464df3 100644
--- a/arch/x86/kernel/tracepoint.c
+++ b/arch/x86/kernel/tracepoint.c
@@ -9,10 +9,12 @@
#include <linux/atomic.h>
atomic_t trace_idt_ctr = ATOMIC_INIT(0);
+__aligned(PAGE_SIZE)
struct desc_ptr trace_idt_descr = { NR_VECTORS * 16 - 1,
(unsigned long) trace_idt_table };
/* No need to be aligned, but done to keep all IDTs defined the same way. */
+__aligned(PAGE_SIZE)
gate_desc trace_idt_table[NR_VECTORS] __page_aligned_bss;
static int trace_irq_vector_refcount;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 796f1ec67469..ccf17dbfea09 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -759,7 +759,8 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
return 1;
/* PCID can not be enabled when cr3[11:0]!=000H or EFER.LMA=0 */
- if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_MASK) || !is_long_mode(vcpu))
+ if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_ASID_MASK) ||
+ !is_long_mode(vcpu))
return 1;
}
diff --git a/arch/x86/lib/cmdline.c b/arch/x86/lib/cmdline.c
index 422db000d727..a744506856b1 100644
--- a/arch/x86/lib/cmdline.c
+++ b/arch/x86/lib/cmdline.c
@@ -82,3 +82,108 @@ int cmdline_find_option_bool(const char *cmdline, const char *option)
return 0; /* Buffer overrun */
}
+
+/*
+ * Find a non-boolean option (i.e. option=argument). In accordance with
+ * standard Linux practice, if this option is repeated, this returns the
+ * last instance on the command line.
+ *
+ * @cmdline: the cmdline string
+ * @max_cmdline_size: the maximum size of cmdline
+ * @option: option string to look for
+ * @buffer: memory buffer to return the option argument
+ * @bufsize: size of the supplied memory buffer
+ *
+ * Returns the length of the argument (regardless of if it was
+ * truncated to fit in the buffer), or -1 on not found.
+ */
+static int
+__cmdline_find_option(const char *cmdline, int max_cmdline_size,
+ const char *option, char *buffer, int bufsize)
+{
+ char c;
+ int pos = 0, len = -1;
+ const char *opptr = NULL;
+ char *bufptr = buffer;
+ enum {
+ st_wordstart = 0, /* Start of word/after whitespace */
+ st_wordcmp, /* Comparing this word */
+ st_wordskip, /* Miscompare, skip */
+ st_bufcpy, /* Copying this to buffer */
+ } state = st_wordstart;
+
+ if (!cmdline)
+ return -1; /* No command line */
+
+ /*
+ * This 'pos' check ensures we do not overrun
+ * a non-NULL-terminated 'cmdline'
+ */
+ while (pos++ < max_cmdline_size) {
+ c = *(char *)cmdline++;
+ if (!c)
+ break;
+
+ switch (state) {
+ case st_wordstart:
+ if (myisspace(c))
+ break;
+
+ state = st_wordcmp;
+ opptr = option;
+ /* fall through */
+
+ case st_wordcmp:
+ if ((c == '=') && !*opptr) {
+ /*
+ * We matched all the way to the end of the
+ * option we were looking for, prepare to
+ * copy the argument.
+ */
+ len = 0;
+ bufptr = buffer;
+ state = st_bufcpy;
+ break;
+ } else if (c == *opptr++) {
+ /*
+ * We are currently matching, so continue
+ * to the next character on the cmdline.
+ */
+ break;
+ }
+ state = st_wordskip;
+ /* fall through */
+
+ case st_wordskip:
+ if (myisspace(c))
+ state = st_wordstart;
+ break;
+
+ case st_bufcpy:
+ if (myisspace(c)) {
+ state = st_wordstart;
+ } else {
+ /*
+ * Increment len, but don't overrun the
+ * supplied buffer and leave room for the
+ * NULL terminator.
+ */
+ if (++len < bufsize)
+ *bufptr++ = c;
+ }
+ break;
+ }
+ }
+
+ if (bufsize)
+ *bufptr = '\0';
+
+ return len;
+}
+
+int cmdline_find_option(const char *cmdline, const char *option, char *buffer,
+ int bufsize)
+{
+ return __cmdline_find_option(cmdline, COMMAND_LINE_SIZE, option,
+ buffer, bufsize);
+}
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 1ae7c141f778..61e6cead9c4a 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o
obj-$(CONFIG_NUMA_EMU) += numa_emulation.o
obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
+obj-$(CONFIG_PAGE_TABLE_ISOLATION) += kaiser.o
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index ed4b372860e4..2bd45ae91eb3 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -165,7 +165,7 @@ static void __init probe_page_size_mask(void)
cr4_set_bits_and_update_boot(X86_CR4_PSE);
/* Enable PGE if available */
- if (cpu_has_pge) {
+ if (cpu_has_pge && !kaiser_enabled) {
cr4_set_bits_and_update_boot(X86_CR4_PGE);
__supported_pte_mask |= _PAGE_GLOBAL;
} else
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ec081fe0ce2c..d76ec9348cff 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -395,6 +395,16 @@ void __init cleanup_highmap(void)
continue;
if (vaddr < (unsigned long) _text || vaddr > end)
set_pmd(pmd, __pmd(0));
+ else if (kaiser_enabled) {
+ /*
+ * level2_kernel_pgt is initialized with _PAGE_GLOBAL:
+ * clear that now. This is not important, so long as
+ * CR4.PGE remains clear, but it removes an anomaly.
+ * Physical mapping setup below avoids _PAGE_GLOBAL
+ * by use of massage_pgprot() inside pfn_pte() etc.
+ */
+ set_pmd(pmd, pmd_clear_flags(*pmd, _PAGE_GLOBAL));
+ }
}
}
diff --git a/arch/x86/mm/kaiser.c b/arch/x86/mm/kaiser.c
new file mode 100644
index 000000000000..b0b3a69f1c7f
--- /dev/null
+++ b/arch/x86/mm/kaiser.c
@@ -0,0 +1,456 @@
+#include <linux/bug.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/bug.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/spinlock.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+#include <linux/ftrace.h>
+
+#undef pr_fmt
+#define pr_fmt(fmt) "Kernel/User page tables isolation: " fmt
+
+#include <asm/kaiser.h>
+#include <asm/tlbflush.h> /* to verify its kaiser declarations */
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/desc.h>
+#include <asm/cmdline.h>
+
+int kaiser_enabled __read_mostly = 1;
+EXPORT_SYMBOL(kaiser_enabled); /* for inlined TLB flush functions */
+
+__visible
+DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+
+/*
+ * These can have bit 63 set, so we can not just use a plain "or"
+ * instruction to get their value or'd into CR3. It would take
+ * another register. So, we use a memory reference to these instead.
+ *
+ * This is also handy because systems that do not support PCIDs
+ * just end up or'ing a 0 into their CR3, which does no harm.
+ */
+DEFINE_PER_CPU(unsigned long, x86_cr3_pcid_user);
+
+/*
+ * At runtime, the only things we map are some things for CPU
+ * hotplug, and stacks for new processes. No two CPUs will ever
+ * be populating the same addresses, so we only need to ensure
+ * that we protect between two CPUs trying to allocate and
+ * populate the same page table page.
+ *
+ * Only take this lock when doing a set_p[4um]d(), but it is not
+ * needed for doing a set_pte(). We assume that only the *owner*
+ * of a given allocation will be doing this for _their_
+ * allocation.
+ *
+ * This ensures that once a system has been running for a while
+ * and there have been stacks all over and these page tables
+ * are fully populated, there will be no further acquisitions of
+ * this lock.
+ */
+static DEFINE_SPINLOCK(shadow_table_allocation_lock);
+
+/*
+ * Returns -1 on error.
+ */
+static inline unsigned long get_pa_from_mapping(unsigned long vaddr)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ pgd = pgd_offset_k(vaddr);
+ /*
+ * We made all the kernel PGDs present in kaiser_init().
+ * We expect them to stay that way.
+ */
+ BUG_ON(pgd_none(*pgd));
+ /*
+ * PGDs are either 512GB or 128TB on all x86_64
+ * configurations. We don't handle these.
+ */
+ BUG_ON(pgd_large(*pgd));
+
+ pud = pud_offset(pgd, vaddr);
+ if (pud_none(*pud)) {
+ WARN_ON_ONCE(1);
+ return -1;
+ }
+
+ if (pud_large(*pud))
+ return (pud_pfn(*pud) << PAGE_SHIFT) | (vaddr & ~PUD_PAGE_MASK);
+
+ pmd = pmd_offset(pud, vaddr);
+ if (pmd_none(*pmd)) {
+ WARN_ON_ONCE(1);
+ return -1;
+ }
+
+ if (pmd_large(*pmd))
+ return (pmd_pfn(*pmd) << PAGE_SHIFT) | (vaddr & ~PMD_PAGE_MASK);
+
+ pte = pte_offset_kernel(pmd, vaddr);
+ if (pte_none(*pte)) {
+ WARN_ON_ONCE(1);
+ return -1;
+ }
+
+ return (pte_pfn(*pte) << PAGE_SHIFT) | (vaddr & ~PAGE_MASK);
+}
+
+/*
+ * This is a relatively normal page table walk, except that it
+ * also tries to allocate page tables pages along the way.
+ *
+ * Returns a pointer to a PTE on success, or NULL on failure.
+ */
+static pte_t *kaiser_pagetable_walk(unsigned long address)
+{
+ pmd_t *pmd;
+ pud_t *pud;
+ pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(address));
+ gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+
+ if (pgd_none(*pgd)) {
+ WARN_ONCE(1, "All shadow pgds should have been populated");
+ return NULL;
+ }
+ BUILD_BUG_ON(pgd_large(*pgd) != 0);
+
+ pud = pud_offset(pgd, address);
+ /* The shadow page tables do not use large mappings: */
+ if (pud_large(*pud)) {
+ WARN_ON(1);
+ return NULL;
+ }
+ if (pud_none(*pud)) {
+ unsigned long new_pmd_page = __get_free_page(gfp);
+ if (!new_pmd_page)
+ return NULL;
+ spin_lock(&shadow_table_allocation_lock);
+ if (pud_none(*pud)) {
+ set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
+ __inc_zone_page_state(virt_to_page((void *)
+ new_pmd_page), NR_KAISERTABLE);
+ } else
+ free_page(new_pmd_page);
+ spin_unlock(&shadow_table_allocation_lock);
+ }
+
+ pmd = pmd_offset(pud, address);
+ /* The shadow page tables do not use large mappings: */
+ if (pmd_large(*pmd)) {
+ WARN_ON(1);
+ return NULL;
+ }
+ if (pmd_none(*pmd)) {
+ unsigned long new_pte_page = __get_free_page(gfp);
+ if (!new_pte_page)
+ return NULL;
+ spin_lock(&shadow_table_allocation_lock);
+ if (pmd_none(*pmd)) {
+ set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
+ __inc_zone_page_state(virt_to_page((void *)
+ new_pte_page), NR_KAISERTABLE);
+ } else
+ free_page(new_pte_page);
+ spin_unlock(&shadow_table_allocation_lock);
+ }
+
+ return pte_offset_kernel(pmd, address);
+}
+
+static int kaiser_add_user_map(const void *__start_addr, unsigned long size,
+ unsigned long flags)
+{
+ int ret = 0;
+ pte_t *pte;
+ unsigned long start_addr = (unsigned long )__start_addr;
+ unsigned long address = start_addr & PAGE_MASK;
+ unsigned long end_addr = PAGE_ALIGN(start_addr + size);
+ unsigned long target_address;
+
+ /*
+ * It is convenient for callers to pass in __PAGE_KERNEL etc,
+ * and there is no actual harm from setting _PAGE_GLOBAL, so
+ * long as CR4.PGE is not set. But it is nonetheless troubling
+ * to see Kaiser itself setting _PAGE_GLOBAL (now that "nokaiser"
+ * requires that not to be #defined to 0): so mask it off here.
+ */
+ flags &= ~_PAGE_GLOBAL;
+
+ for (; address < end_addr; address += PAGE_SIZE) {
+ target_address = get_pa_from_mapping(address);
+ if (target_address == -1) {
+ ret = -EIO;
+ break;
+ }
+ pte = kaiser_pagetable_walk(address);
+ if (!pte) {
+ ret = -ENOMEM;
+ break;
+ }
+ if (pte_none(*pte)) {
+ set_pte(pte, __pte(flags | target_address));
+ } else {
+ pte_t tmp;
+ set_pte(&tmp, __pte(flags | target_address));
+ WARN_ON_ONCE(!pte_same(*pte, tmp));
+ }
+ }
+ return ret;
+}
+
+static int kaiser_add_user_map_ptrs(const void *start, const void *end, unsigned long flags)
+{
+ unsigned long size = end - start;
+
+ return kaiser_add_user_map(start, size, flags);
+}
+
+/*
+ * Ensure that the top level of the (shadow) page tables are
+ * entirely populated. This ensures that all processes that get
+ * forked have the same entries. This way, we do not have to
+ * ever go set up new entries in older processes.
+ *
+ * Note: we never free these, so there are no updates to them
+ * after this.
+ */
+static void __init kaiser_init_all_pgds(void)
+{
+ pgd_t *pgd;
+ int i = 0;
+
+ pgd = native_get_shadow_pgd(pgd_offset_k((unsigned long )0));
+ for (i = PTRS_PER_PGD / 2; i < PTRS_PER_PGD; i++) {
+ pgd_t new_pgd;
+ pud_t *pud = pud_alloc_one(&init_mm,
+ PAGE_OFFSET + i * PGDIR_SIZE);
+ if (!pud) {
+ WARN_ON(1);
+ break;
+ }
+ inc_zone_page_state(virt_to_page(pud), NR_KAISERTABLE);
+ new_pgd = __pgd(_KERNPG_TABLE |__pa(pud));
+ /*
+ * Make sure not to stomp on some other pgd entry.
+ */
+ if (!pgd_none(pgd[i])) {
+ WARN_ON(1);
+ continue;
+ }
+ set_pgd(pgd + i, new_pgd);
+ }
+}
+
+#define kaiser_add_user_map_early(start, size, flags) do { \
+ int __ret = kaiser_add_user_map(start, size, flags); \
+ WARN_ON(__ret); \
+} while (0)
+
+#define kaiser_add_user_map_ptrs_early(start, end, flags) do { \
+ int __ret = kaiser_add_user_map_ptrs(start, end, flags); \
+ WARN_ON(__ret); \
+} while (0)
+
+void __init kaiser_check_boottime_disable(void)
+{
+ bool enable = true;
+ char arg[5];
+ int ret;
+
+ if (boot_cpu_has(X86_FEATURE_XENPV))
+ goto silent_disable;
+
+ ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg));
+ if (ret > 0) {
+ if (!strncmp(arg, "on", 2))
+ goto enable;
+
+ if (!strncmp(arg, "off", 3))
+ goto disable;
+
+ if (!strncmp(arg, "auto", 4))
+ goto skip;
+ }
+
+ if (cmdline_find_option_bool(boot_command_line, "nopti"))
+ goto disable;
+
+skip:
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+ goto disable;
+
+enable:
+ if (enable)
+ setup_force_cpu_cap(X86_FEATURE_KAISER);
+
+ return;
+
+disable:
+ pr_info("disabled\n");
+
+silent_disable:
+ kaiser_enabled = 0;
+ setup_clear_cpu_cap(X86_FEATURE_KAISER);
+}
+
+/*
+ * If anything in here fails, we will likely die on one of the
+ * first kernel->user transitions and init will die. But, we
+ * will have most of the kernel up by then and should be able to
+ * get a clean warning out of it. If we BUG_ON() here, we run
+ * the risk of being before we have good console output.
+ */
+void __init kaiser_init(void)
+{
+ int cpu;
+
+ if (!kaiser_enabled)
+ return;
+
+ kaiser_init_all_pgds();
+
+ for_each_possible_cpu(cpu) {
+ void *percpu_vaddr = __per_cpu_user_mapped_start +
+ per_cpu_offset(cpu);
+ unsigned long percpu_sz = __per_cpu_user_mapped_end -
+ __per_cpu_user_mapped_start;
+ kaiser_add_user_map_early(percpu_vaddr, percpu_sz,
+ __PAGE_KERNEL);
+ }
+
+ /*
+ * Map the entry/exit text section, which is needed at
+ * switches from user to and from kernel.
+ */
+ kaiser_add_user_map_ptrs_early(__entry_text_start, __entry_text_end,
+ __PAGE_KERNEL_RX);
+
+#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN)
+ kaiser_add_user_map_ptrs_early(__irqentry_text_start,
+ __irqentry_text_end,
+ __PAGE_KERNEL_RX);
+#endif
+ kaiser_add_user_map_early((void *)idt_descr.address,
+ sizeof(gate_desc) * NR_VECTORS,
+ __PAGE_KERNEL_RO);
+#ifdef CONFIG_TRACING
+ kaiser_add_user_map_early(&trace_idt_descr,
+ sizeof(trace_idt_descr),
+ __PAGE_KERNEL);
+ kaiser_add_user_map_early(&trace_idt_table,
+ sizeof(gate_desc) * NR_VECTORS,
+ __PAGE_KERNEL);
+#endif
+ kaiser_add_user_map_early(&debug_idt_descr, sizeof(debug_idt_descr),
+ __PAGE_KERNEL);
+ kaiser_add_user_map_early(&debug_idt_table,
+ sizeof(gate_desc) * NR_VECTORS,
+ __PAGE_KERNEL);
+
+ pr_info("enabled\n");
+}
+
+/* Add a mapping to the shadow mapping, and synchronize the mappings */
+int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
+{
+ if (!kaiser_enabled)
+ return 0;
+ return kaiser_add_user_map((const void *)addr, size, flags);
+}
+
+void kaiser_remove_mapping(unsigned long start, unsigned long size)
+{
+ extern void unmap_pud_range_nofree(pgd_t *pgd,
+ unsigned long start, unsigned long end);
+ unsigned long end = start + size;
+ unsigned long addr, next;
+ pgd_t *pgd;
+
+ if (!kaiser_enabled)
+ return;
+ pgd = native_get_shadow_pgd(pgd_offset_k(start));
+ for (addr = start; addr < end; pgd++, addr = next) {
+ next = pgd_addr_end(addr, end);
+ unmap_pud_range_nofree(pgd, addr, next);
+ }
+}
+
+/*
+ * Page table pages are page-aligned. The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ * This returns true for user pages that need to get copied into
+ * both the user and kernel copies of the page tables, and false
+ * for kernel pages that should only be in the kernel copy.
+ */
+static inline bool is_userspace_pgd(pgd_t *pgdp)
+{
+ return ((unsigned long)pgdp % PAGE_SIZE) < (PAGE_SIZE / 2);
+}
+
+pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ if (!kaiser_enabled)
+ return pgd;
+ /*
+ * Do we need to also populate the shadow pgd? Check _PAGE_USER to
+ * skip cases like kexec and EFI which make temporary low mappings.
+ */
+ if (pgd.pgd & _PAGE_USER) {
+ if (is_userspace_pgd(pgdp)) {
+ native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+ /*
+ * Even if the entry is *mapping* userspace, ensure
+ * that userspace can not use it. This way, if we
+ * get out to userspace running on the kernel CR3,
+ * userspace will crash instead of running.
+ */
+ if (__supported_pte_mask & _PAGE_NX)
+ pgd.pgd |= _PAGE_NX;
+ }
+ } else if (!pgd.pgd) {
+ /*
+ * pgd_clear() cannot check _PAGE_USER, and is even used to
+ * clear corrupted pgd entries: so just rely on cases like
+ * kexec and EFI never to be using pgd_clear().
+ */
+ if (!WARN_ON_ONCE((unsigned long)pgdp & PAGE_SIZE) &&
+ is_userspace_pgd(pgdp))
+ native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+ }
+ return pgd;
+}
+
+void kaiser_setup_pcid(void)
+{
+ unsigned long user_cr3 = KAISER_SHADOW_PGD_OFFSET;
+
+ if (this_cpu_has(X86_FEATURE_PCID))
+ user_cr3 |= X86_CR3_PCID_USER_NOFLUSH;
+ /*
+ * These variables are used by the entry/exit
+ * code to change PCID and pgd and TLB flushing.
+ */
+ this_cpu_write(x86_cr3_pcid_user, user_cr3);
+}
+
+/*
+ * Make a note that this cpu will need to flush USER tlb on return to user.
+ * If cpu does not have PCID, then the NOFLUSH bit will never have been set.
+ */
+void kaiser_flush_tlb_on_return_to_user(void)
+{
+ if (this_cpu_has(X86_FEATURE_PCID))
+ this_cpu_write(x86_cr3_pcid_user,
+ X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
+}
+EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 4e5ac46adc9d..81ec7c02f968 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -121,11 +121,16 @@ void __init kasan_init(void)
kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END),
(void *)KASAN_SHADOW_END);
- memset(kasan_zero_page, 0, PAGE_SIZE);
-
load_cr3(init_level4_pgt);
__flush_tlb_all();
- init_task.kasan_depth = 0;
+ /*
+ * kasan_zero_page has been used as early shadow memory, thus it may
+ * contain some garbage. Now we can clear it, since after the TLB flush
+ * no one should write to it.
+ */
+ memset(kasan_zero_page, 0, PAGE_SIZE);
+
+ init_task.kasan_depth = 0;
pr_info("KernelAddressSanitizer initialized\n");
}
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index b599a780a5a9..79377e2a7bcd 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -52,6 +52,7 @@ static DEFINE_SPINLOCK(cpa_lock);
#define CPA_FLUSHTLB 1
#define CPA_ARRAY 2
#define CPA_PAGES_ARRAY 4
+#define CPA_FREE_PAGETABLES 8
#ifdef CONFIG_PROC_FS
static unsigned long direct_pages_count[PG_LEVEL_NUM];
@@ -723,10 +724,13 @@ static int split_large_page(struct cpa_data *cpa, pte_t *kpte,
return 0;
}
-static bool try_to_free_pte_page(pte_t *pte)
+static bool try_to_free_pte_page(struct cpa_data *cpa, pte_t *pte)
{
int i;
+ if (!(cpa->flags & CPA_FREE_PAGETABLES))
+ return false;
+
for (i = 0; i < PTRS_PER_PTE; i++)
if (!pte_none(pte[i]))
return false;
@@ -735,10 +739,13 @@ static bool try_to_free_pte_page(pte_t *pte)
return true;
}
-static bool try_to_free_pmd_page(pmd_t *pmd)
+static bool try_to_free_pmd_page(struct cpa_data *cpa, pmd_t *pmd)
{
int i;
+ if (!(cpa->flags & CPA_FREE_PAGETABLES))
+ return false;
+
for (i = 0; i < PTRS_PER_PMD; i++)
if (!pmd_none(pmd[i]))
return false;
@@ -759,7 +766,9 @@ static bool try_to_free_pud_page(pud_t *pud)
return true;
}
-static bool unmap_pte_range(pmd_t *pmd, unsigned long start, unsigned long end)
+static bool unmap_pte_range(struct cpa_data *cpa, pmd_t *pmd,
+ unsigned long start,
+ unsigned long end)
{
pte_t *pte = pte_offset_kernel(pmd, start);
@@ -770,22 +779,23 @@ static bool unmap_pte_range(pmd_t *pmd, unsigned long start, unsigned long end)
pte++;
}
- if (try_to_free_pte_page((pte_t *)pmd_page_vaddr(*pmd))) {
+ if (try_to_free_pte_page(cpa, (pte_t *)pmd_page_vaddr(*pmd))) {
pmd_clear(pmd);
return true;
}
return false;
}
-static void __unmap_pmd_range(pud_t *pud, pmd_t *pmd,
+static void __unmap_pmd_range(struct cpa_data *cpa, pud_t *pud, pmd_t *pmd,
unsigned long start, unsigned long end)
{
- if (unmap_pte_range(pmd, start, end))
- if (try_to_free_pmd_page((pmd_t *)pud_page_vaddr(*pud)))
+ if (unmap_pte_range(cpa, pmd, start, end))
+ if (try_to_free_pmd_page(cpa, (pmd_t *)pud_page_vaddr(*pud)))
pud_clear(pud);
}
-static void unmap_pmd_range(pud_t *pud, unsigned long start, unsigned long end)
+static void unmap_pmd_range(struct cpa_data *cpa, pud_t *pud,
+ unsigned long start, unsigned long end)
{
pmd_t *pmd = pmd_offset(pud, start);
@@ -796,7 +806,7 @@ static void unmap_pmd_range(pud_t *pud, unsigned long start, unsigned long end)
unsigned long next_page = (start + PMD_SIZE) & PMD_MASK;
unsigned long pre_end = min_t(unsigned long, end, next_page);
- __unmap_pmd_range(pud, pmd, start, pre_end);
+ __unmap_pmd_range(cpa, pud, pmd, start, pre_end);
start = pre_end;
pmd++;
@@ -809,7 +819,8 @@ static void unmap_pmd_range(pud_t *pud, unsigned long start, unsigned long end)
if (pmd_large(*pmd))
pmd_clear(pmd);
else
- __unmap_pmd_range(pud, pmd, start, start + PMD_SIZE);
+ __unmap_pmd_range(cpa, pud, pmd,
+ start, start + PMD_SIZE);
start += PMD_SIZE;
pmd++;
@@ -819,17 +830,19 @@ static void unmap_pmd_range(pud_t *pud, unsigned long start, unsigned long end)
* 4K leftovers?
*/
if (start < end)
- return __unmap_pmd_range(pud, pmd, start, end);
+ return __unmap_pmd_range(cpa, pud, pmd, start, end);
/*
* Try again to free the PMD page if haven't succeeded above.
*/
if (!pud_none(*pud))
- if (try_to_free_pmd_page((pmd_t *)pud_page_vaddr(*pud)))
+ if (try_to_free_pmd_page(cpa, (pmd_t *)pud_page_vaddr(*pud)))
pud_clear(pud);
}
-static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
+static void __unmap_pud_range(struct cpa_data *cpa, pgd_t *pgd,
+ unsigned long start,
+ unsigned long end)
{
pud_t *pud = pud_offset(pgd, start);
@@ -840,7 +853,7 @@ static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
unsigned long next_page = (start + PUD_SIZE) & PUD_MASK;
unsigned long pre_end = min_t(unsigned long, end, next_page);
- unmap_pmd_range(pud, start, pre_end);
+ unmap_pmd_range(cpa, pud, start, pre_end);
start = pre_end;
pud++;
@@ -854,7 +867,7 @@ static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
if (pud_large(*pud))
pud_clear(pud);
else
- unmap_pmd_range(pud, start, start + PUD_SIZE);
+ unmap_pmd_range(cpa, pud, start, start + PUD_SIZE);
start += PUD_SIZE;
pud++;
@@ -864,7 +877,7 @@ static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
* 2M leftovers?
*/
if (start < end)
- unmap_pmd_range(pud, start, end);
+ unmap_pmd_range(cpa, pud, start, end);
/*
* No need to try to free the PUD page because we'll free it in
@@ -872,6 +885,24 @@ static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
*/
}
+static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
+{
+ struct cpa_data cpa = {
+ .flags = CPA_FREE_PAGETABLES,
+ };
+
+ __unmap_pud_range(&cpa, pgd, start, end);
+}
+
+void unmap_pud_range_nofree(pgd_t *pgd, unsigned long start, unsigned long end)
+{
+ struct cpa_data cpa = {
+ .flags = 0,
+ };
+
+ __unmap_pud_range(&cpa, pgd, start, end);
+}
+
static void unmap_pgd_range(pgd_t *root, unsigned long addr, unsigned long end)
{
pgd_t *pgd_entry = root + pgd_index(addr);
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fb0a9dd1d6e4..dbc27a2b4ad5 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -6,7 +6,7 @@
#include <asm/fixmap.h>
#include <asm/mtrr.h>
-#define PGALLOC_GFP GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO
+#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO)
#ifdef CONFIG_HIGHPTE
#define PGALLOC_USER_GFP __GFP_HIGHMEM
@@ -340,14 +340,24 @@ static inline void _pgd_free(pgd_t *pgd)
kmem_cache_free(pgd_cache, pgd);
}
#else
+
+/*
+ * Instead of one pgd, Kaiser acquires two pgds. Being order-1, it is
+ * both 8k in size and 8k-aligned. That lets us just flip bit 12
+ * in a pointer to swap between the two 4k halves.
+ */
+#define PGD_ALLOCATION_ORDER kaiser_enabled
+
static inline pgd_t *_pgd_alloc(void)
{
- return (pgd_t *)__get_free_page(PGALLOC_GFP);
+ /* No __GFP_REPEAT: to avoid page allocation stalls in order-1 case */
+ return (pgd_t *)__get_free_pages(PGALLOC_GFP & ~__GFP_REPEAT,
+ PGD_ALLOCATION_ORDER);
}
static inline void _pgd_free(pgd_t *pgd)
{
- free_page((unsigned long)pgd);
+ free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER);
}
#endif /* CONFIG_X86_PAE */
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 7a4cdb632508..7cad01af6dcd 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -6,13 +6,14 @@
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/cpu.h>
+#include <linux/debugfs.h>
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
#include <asm/cache.h>
#include <asm/apic.h>
#include <asm/uv/uv.h>
-#include <linux/debugfs.h>
+#include <asm/kaiser.h>
/*
* TLB flushing, formerly SMP-only
@@ -34,6 +35,36 @@ struct flush_tlb_info {
unsigned long flush_end;
};
+static void load_new_mm_cr3(pgd_t *pgdir)
+{
+ unsigned long new_mm_cr3 = __pa(pgdir);
+
+ if (kaiser_enabled) {
+ /*
+ * We reuse the same PCID for different tasks, so we must
+ * flush all the entries for the PCID out when we change tasks.
+ * Flush KERN below, flush USER when returning to userspace in
+ * kaiser's SWITCH_USER_CR3 (_SWITCH_TO_USER_CR3) macro.
+ *
+ * invpcid_flush_single_context(X86_CR3_PCID_ASID_USER) could
+ * do it here, but can only be used if X86_FEATURE_INVPCID is
+ * available - and many machines support pcid without invpcid.
+ *
+ * If X86_CR3_PCID_KERN_FLUSH actually added something, then it
+ * would be needed in the write_cr3() below - if PCIDs enabled.
+ */
+ BUILD_BUG_ON(X86_CR3_PCID_KERN_FLUSH);
+ kaiser_flush_tlb_on_return_to_user();
+ }
+
+ /*
+ * Caution: many callers of this function expect
+ * that load_cr3() is serializing and orders TLB
+ * fills with respect to the mm_cpumask writes.
+ */
+ write_cr3(new_mm_cr3);
+}
+
/*
* We cannot call mmdrop() because we are in interrupt context,
* instead update mm->cpu_vm_mask.
@@ -45,7 +76,7 @@ void leave_mm(int cpu)
BUG();
if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
- load_cr3(swapper_pg_dir);
+ load_new_mm_cr3(swapper_pg_dir);
/*
* This gets called in the idle path where RCU
* functions differently. Tracing normally
@@ -105,7 +136,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
* ordering guarantee we need.
*
*/
- load_cr3(next->pgd);
+ load_new_mm_cr3(next->pgd);
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
@@ -152,7 +183,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
* As above, load_cr3() is serializing and orders TLB
* fills with respect to the mm_cpumask write.
*/
- load_cr3(next->pgd);
+ load_new_mm_cr3(next->pgd);
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
load_mm_cr4(next);
load_mm_ldt(next);
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index ef2e8c97e183..a461b6604fd9 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -725,7 +725,14 @@
*/
#define PERCPU_INPUT(cacheline) \
VMLINUX_SYMBOL(__per_cpu_start) = .; \
+ VMLINUX_SYMBOL(__per_cpu_user_mapped_start) = .; \
*(.data..percpu..first) \
+ . = ALIGN(cacheline); \
+ *(.data..percpu..user_mapped) \
+ *(.data..percpu..user_mapped..shared_aligned) \
+ . = ALIGN(PAGE_SIZE); \
+ *(.data..percpu..user_mapped..page_aligned) \
+ VMLINUX_SYMBOL(__per_cpu_user_mapped_end) = .; \
. = ALIGN(PAGE_SIZE); \
*(.data..percpu..page_aligned) \
. = ALIGN(cacheline); \
diff --git a/include/linux/kaiser.h b/include/linux/kaiser.h
new file mode 100644
index 000000000000..58c55b1589d0
--- /dev/null
+++ b/include/linux/kaiser.h
@@ -0,0 +1,52 @@
+#ifndef _LINUX_KAISER_H
+#define _LINUX_KAISER_H
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+#include <asm/kaiser.h>
+
+static inline int kaiser_map_thread_stack(void *stack)
+{
+ /*
+ * Map that page of kernel stack on which we enter from user context.
+ */
+ return kaiser_add_mapping((unsigned long)stack +
+ THREAD_SIZE - PAGE_SIZE, PAGE_SIZE, __PAGE_KERNEL);
+}
+
+static inline void kaiser_unmap_thread_stack(void *stack)
+{
+ /*
+ * Note: may be called even when kaiser_map_thread_stack() failed.
+ */
+ kaiser_remove_mapping((unsigned long)stack +
+ THREAD_SIZE - PAGE_SIZE, PAGE_SIZE);
+}
+#else
+
+/*
+ * These stubs are used whenever CONFIG_PAGE_TABLE_ISOLATION is off, which
+ * includes architectures that support KAISER, but have it disabled.
+ */
+
+static inline void kaiser_init(void)
+{
+}
+static inline int kaiser_add_mapping(unsigned long addr,
+ unsigned long size, unsigned long flags)
+{
+ return 0;
+}
+static inline void kaiser_remove_mapping(unsigned long start,
+ unsigned long size)
+{
+}
+static inline int kaiser_map_thread_stack(void *stack)
+{
+ return 0;
+}
+static inline void kaiser_unmap_thread_stack(void *stack)
+{
+}
+
+#endif /* !CONFIG_PAGE_TABLE_ISOLATION */
+#endif /* _LINUX_KAISER_H */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ff88d6189411..b93b578cfa42 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -131,8 +131,9 @@ enum zone_stat_item {
NR_SLAB_RECLAIMABLE,
NR_SLAB_UNRECLAIMABLE,
NR_PAGETABLE, /* used for pagetables */
- NR_KERNEL_STACK,
/* Second 128 byte cacheline */
+ NR_KERNEL_STACK,
+ NR_KAISERTABLE,
NR_UNSTABLE_NFS, /* NFS unstable pages */
NR_BOUNCE,
NR_VMSCAN_WRITE,
diff --git a/include/linux/percpu-defs.h b/include/linux/percpu-defs.h
index 8f16299ca068..8902f23bb770 100644
--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -35,6 +35,12 @@
#endif
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+#define USER_MAPPED_SECTION "..user_mapped"
+#else
+#define USER_MAPPED_SECTION ""
+#endif
+
/*
* Base implementations of per-CPU variable declarations and definitions, where
* the section in which the variable is to be placed is provided by the
@@ -115,6 +121,12 @@
#define DEFINE_PER_CPU(type, name) \
DEFINE_PER_CPU_SECTION(type, name, "")
+#define DECLARE_PER_CPU_USER_MAPPED(type, name) \
+ DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION)
+
+#define DEFINE_PER_CPU_USER_MAPPED(type, name) \
+ DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION)
+
/*
* Declaration/definition used for per-CPU variables that must come first in
* the set of variables.
@@ -144,6 +156,14 @@
DEFINE_PER_CPU_SECTION(type, name, PER_CPU_SHARED_ALIGNED_SECTION) \
____cacheline_aligned_in_smp
+#define DECLARE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(type, name) \
+ DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION PER_CPU_SHARED_ALIGNED_SECTION) \
+ ____cacheline_aligned_in_smp
+
+#define DEFINE_PER_CPU_SHARED_ALIGNED_USER_MAPPED(type, name) \
+ DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION PER_CPU_SHARED_ALIGNED_SECTION) \
+ ____cacheline_aligned_in_smp
+
#define DECLARE_PER_CPU_ALIGNED(type, name) \
DECLARE_PER_CPU_SECTION(type, name, PER_CPU_ALIGNED_SECTION) \
____cacheline_aligned
@@ -162,11 +182,21 @@
#define DEFINE_PER_CPU_PAGE_ALIGNED(type, name) \
DEFINE_PER_CPU_SECTION(type, name, "..page_aligned") \
__aligned(PAGE_SIZE)
+/*
+ * Declaration/definition used for per-CPU variables that must be page aligned and need to be mapped in user mode.
+ */
+#define DECLARE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(type, name) \
+ DECLARE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION"..page_aligned") \
+ __aligned(PAGE_SIZE)
+
+#define DEFINE_PER_CPU_PAGE_ALIGNED_USER_MAPPED(type, name) \
+ DEFINE_PER_CPU_SECTION(type, name, USER_MAPPED_SECTION"..page_aligned") \
+ __aligned(PAGE_SIZE)
/*
* Declaration/definition used for per-CPU variables that must be read mostly.
*/
-#define DECLARE_PER_CPU_READ_MOSTLY(type, name) \
+#define DECLARE_PER_CPU_READ_MOSTLY(type, name) \
DECLARE_PER_CPU_SECTION(type, name, "..read_mostly")
#define DEFINE_PER_CPU_READ_MOSTLY(type, name) \
diff --git a/init/main.c b/init/main.c
index 9e64d7097f1a..49926d95442f 100644
--- a/init/main.c
+++ b/init/main.c
@@ -81,6 +81,7 @@
#include <linux/integrity.h>
#include <linux/proc_ns.h>
#include <linux/io.h>
+#include <linux/kaiser.h>
#include <asm/io.h>
#include <asm/bugs.h>
@@ -492,6 +493,7 @@ static void __init mm_init(void)
pgtable_init();
vmalloc_init();
ioremap_huge_init();
+ kaiser_init();
}
asmlinkage __visible void __init start_kernel(void)
diff --git a/kernel/fork.c b/kernel/fork.c
index 68cfda1c1800..ac00f14208b7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -58,6 +58,7 @@
#include <linux/tsacct_kern.h>
#include <linux/cn_proc.h>
#include <linux/freezer.h>
+#include <linux/kaiser.h>
#include <linux/delayacct.h>
#include <linux/taskstats_kern.h>
#include <linux/random.h>
@@ -169,6 +170,7 @@ static struct thread_info *alloc_thread_info_node(struct task_struct *tsk,
static inline void free_thread_info(struct thread_info *ti)
{
+ kaiser_unmap_thread_stack(ti);
free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER);
}
# else
@@ -352,6 +354,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
goto free_ti;
tsk->stack = ti;
+
+ err = kaiser_map_thread_stack(tsk->stack);
+ if (err)
+ goto free_ti;
#ifdef CONFIG_SECCOMP
/*
* We must handle setting up seccomp filters once we're under
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c344e3609c53..324b7e90b4c5 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -736,6 +736,7 @@ const char * const vmstat_text[] = {
"nr_slab_unreclaimable",
"nr_page_table_pages",
"nr_kernel_stack",
+ "nr_overhead",
"nr_unstable",
"nr_bounce",
"nr_vmscan_write",
diff --git a/security/Kconfig b/security/Kconfig
index e45237897b43..a3ebb6ee5bd5 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -31,6 +31,16 @@ config SECURITY
If you are unsure how to answer this question, answer N.
+config PAGE_TABLE_ISOLATION
+ bool "Remove the kernel mapping in user mode"
+ default y
+ depends on X86_64 && SMP
+ help
+ This enforces a strict kernel and user space isolation, in order
+ to close hardware side channels on kernel address information.
+
+ If you are unsure how to answer this question, answer Y.
+
config SECURITYFS
bool "Enable the securityfs filesystem"
help
This is a note to let you know that I've just added the patch titled
x86/kasan: Clear kasan_zero_page after TLB flush
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-kasan-clear-kasan_zero_page-after-tlb-flush.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 69e0210fd01ff157d332102219aaf5c26ca8069b Mon Sep 17 00:00:00 2001
From: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Date: Mon, 11 Jan 2016 15:51:18 +0300
Subject: x86/kasan: Clear kasan_zero_page after TLB flush
From: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
commit 69e0210fd01ff157d332102219aaf5c26ca8069b upstream.
Currently we clear kasan_zero_page before __flush_tlb_all(). This
works with current implementation of native_flush_tlb[_global]()
because it doesn't cause do any writes to kasan shadow memory.
But any subtle change made in native_flush_tlb*() could break this.
Also current code seems doesn't work for paravirt guests (lguest).
Only after the TLB flush we can be sure that kasan_zero_page is not
used as early shadow anymore (instrumented code will not write to it).
So it should cleared it only after the TLB flush.
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof(a)suse.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Toshi Kani <toshi.kani(a)hp.com>
Cc: linux-mm(a)kvack.org
Link: http://lkml.kernel.org/r/1452516679-32040-2-git-send-email-aryabinin@virtuo…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Jamie Iles <jamie.iles(a)oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kasan_init_64.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -121,11 +121,16 @@ void __init kasan_init(void)
kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END),
(void *)KASAN_SHADOW_END);
- memset(kasan_zero_page, 0, PAGE_SIZE);
-
load_cr3(init_level4_pgt);
__flush_tlb_all();
- init_task.kasan_depth = 0;
+ /*
+ * kasan_zero_page has been used as early shadow memory, thus it may
+ * contain some garbage. Now we can clear it, since after the TLB flush
+ * no one should write to it.
+ */
+ memset(kasan_zero_page, 0, PAGE_SIZE);
+
+ init_task.kasan_depth = 0;
pr_info("KernelAddressSanitizer initialized\n");
}
Patches currently in stable-queue which might be from aryabinin(a)virtuozzo.com are
queue-4.4/x86-kasan-clear-kasan_zero_page-after-tlb-flush.patch
queue-4.4/x86-boot-add-early-cmdline-parsing-for-options-with-arguments.patch
This is a note to let you know that I've just added the patch titled
x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-vdso-pvclock-simplify-and-speed-up-the-vdso-pvclock-reader.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6b078f5de7fc0851af4102493c7b5bb07e49c4cb Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)amacapital.net>
Date: Thu, 10 Dec 2015 19:20:19 -0800
Subject: x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
From: Andy Lutomirski <luto(a)amacapital.net>
commit 6b078f5de7fc0851af4102493c7b5bb07e49c4cb upstream.
The pvclock vdso code was too abstracted to understand easily
and excessively paranoid. Simplify it for a huge speedup.
This opens the door for additional simplifications, as the vdso
no longer accesses the pvti for any vcpu other than vcpu 0.
Before, vclock_gettime using kvm-clock took about 45ns on my
machine. With this change, it takes 29ns, which is almost as
fast as the pure TSC implementation.
Signed-off-by: Andy Lutomirski <luto(a)amacapital.net>
Reviewed-by: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-mm(a)kvack.org
Link: http://lkml.kernel.org/r/6b51dcc41f1b101f963945c5ec7093d72bdac429.144970253…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Jamie Iles <jamie.iles(a)oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/vdso/vclock_gettime.c | 79 +++++++++++++++++++----------------
1 file changed, 45 insertions(+), 34 deletions(-)
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -78,47 +78,58 @@ static notrace const struct pvclock_vsys
static notrace cycle_t vread_pvclock(int *mode)
{
- const struct pvclock_vsyscall_time_info *pvti;
+ const struct pvclock_vcpu_time_info *pvti = &get_pvti(0)->pvti;
cycle_t ret;
- u64 last;
- u32 version;
- u8 flags;
- unsigned cpu, cpu1;
-
+ u64 tsc, pvti_tsc;
+ u64 last, delta, pvti_system_time;
+ u32 version, pvti_tsc_to_system_mul, pvti_tsc_shift;
/*
- * Note: hypervisor must guarantee that:
- * 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
- * 2. that per-CPU pvclock time info is updated if the
- * underlying CPU changes.
- * 3. that version is increased whenever underlying CPU
- * changes.
+ * Note: The kernel and hypervisor must guarantee that cpu ID
+ * number maps 1:1 to per-CPU pvclock time info.
+ *
+ * Because the hypervisor is entirely unaware of guest userspace
+ * preemption, it cannot guarantee that per-CPU pvclock time
+ * info is updated if the underlying CPU changes or that that
+ * version is increased whenever underlying CPU changes.
+ *
+ * On KVM, we are guaranteed that pvti updates for any vCPU are
+ * atomic as seen by *all* vCPUs. This is an even stronger
+ * guarantee than we get with a normal seqlock.
*
+ * On Xen, we don't appear to have that guarantee, but Xen still
+ * supplies a valid seqlock using the version field.
+
+ * We only do pvclock vdso timing at all if
+ * PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
+ * mean that all vCPUs have matching pvti and that the TSC is
+ * synced, so we can just look at vCPU 0's pvti.
*/
- do {
- cpu = __getcpu() & VGETCPU_CPU_MASK;
- /* TODO: We can put vcpu id into higher bits of pvti.version.
- * This will save a couple of cycles by getting rid of
- * __getcpu() calls (Gleb).
- */
-
- pvti = get_pvti(cpu);
-
- version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
-
- /*
- * Test we're still on the cpu as well as the version.
- * We could have been migrated just after the first
- * vgetcpu but before fetching the version, so we
- * wouldn't notice a version change.
- */
- cpu1 = __getcpu() & VGETCPU_CPU_MASK;
- } while (unlikely(cpu != cpu1 ||
- (pvti->pvti.version & 1) ||
- pvti->pvti.version != version));
- if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
+ if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
*mode = VCLOCK_NONE;
+ return 0;
+ }
+
+ do {
+ version = pvti->version;
+
+ /* This is also a read barrier, so we'll read version first. */
+ tsc = rdtsc_ordered();
+
+ pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
+ pvti_tsc_shift = pvti->tsc_shift;
+ pvti_system_time = pvti->system_time;
+ pvti_tsc = pvti->tsc_timestamp;
+
+ /* Make sure that the version double-check is last. */
+ smp_rmb();
+ } while (unlikely((version & 1) || version != pvti->version));
+
+ delta = tsc - pvti_tsc;
+ ret = pvti_system_time +
+ pvclock_scale_delta(delta, pvti_tsc_to_system_mul,
+ pvti_tsc_shift);
/* refer to tsc.c read_tsc() comment for rationale */
last = gtod->cycle_last;
Patches currently in stable-queue which might be from luto(a)amacapital.net are
queue-4.4/x86-vdso-pvclock-simplify-and-speed-up-the-vdso-pvclock-reader.patch
queue-4.4/x86-vdso-get-pvclock-data-from-the-vvar-vma-instead-of-the-fixmap.patch
This is the start of the stable review cycle for the 4.14.12 release.
There are 14 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat Jan 6 12:08:52 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.12-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.12-rc1
Troy Kisky <troy.kisky(a)boundarydevices.com>
rtc: m41t80: remove unneeded checks from m41t80_sqw_set_rate
Troy Kisky <troy.kisky(a)boundarydevices.com>
rtc: m41t80: avoid i2c read in m41t80_sqw_is_prepared
Troy Kisky <troy.kisky(a)boundarydevices.com>
rtc: m41t80: avoid i2c read in m41t80_sqw_recalc_rate
Troy Kisky <troy.kisky(a)boundarydevices.com>
rtc: m41t80: fix m41t80_sqw_round_rate return value
Troy Kisky <troy.kisky(a)boundarydevices.com>
rtc: m41t80: m41t80_sqw_set_rate should return 0 on success
Steffen Klassert <steffen.klassert(a)secunet.com>
Revert "xfrm: Fix stack-out-of-bounds read in xfrm_state_find."
Nick Desaulniers <ndesaulniers(a)google.com>
x86/process: Define cpu_tss_rw in same section as declaration
Thomas Gleixner <tglx(a)linutronix.de>
x86/pti: Switch to kernel CR3 at early in entry_SYSCALL_compat()
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/dumpstack: Print registers for first stack frame
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/dumpstack: Fix partial register dumps
Thomas Gleixner <tglx(a)linutronix.de>
x86/pti: Make sure the user/kernel PTEs match
Tom Lendacky <thomas.lendacky(a)amd.com>
x86/cpu, x86/pti: Do not enable PTI on AMD processors
Eric Biggers <ebiggers(a)google.com>
capabilities: fix buffer overread on very short xattr
Kees Cook <keescook(a)chromium.org>
exec: Weaken dumpability for secureexec
-------------
Diffstat:
Makefile | 4 +-
arch/x86/entry/entry_64_compat.S | 13 +++----
arch/x86/include/asm/unwind.h | 17 ++++++--
arch/x86/kernel/cpu/common.c | 4 +-
arch/x86/kernel/dumpstack.c | 31 ++++++++++-----
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/stacktrace.c | 2 +-
arch/x86/mm/pti.c | 3 +-
drivers/rtc/rtc-m41t80.c | 84 ++++++++++++++++++----------------------
fs/exec.c | 9 ++++-
net/xfrm/xfrm_policy.c | 29 ++++++++------
security/commoncap.c | 21 +++++-----
12 files changed, 120 insertions(+), 99 deletions(-)
From: Eric Biggers <ebiggers(a)google.com>
syzkaller triggered a NULL pointer dereference in crypto_remove_spawns()
via a program that repeatedly and concurrently requests AEADs
"authenc(cmac(des3_ede-asm),pcbc-aes-aesni)" and hashes "cmac(des3_ede)"
through AF_ALG, where the hashes are requested as "untested"
(CRYPTO_ALG_TESTED is set in ->salg_mask but clear in ->salg_feat; this
causes the template to be instantiated for every request).
Although AF_ALG users really shouldn't be able to request an "untested"
algorithm, the NULL pointer dereference is actually caused by a
longstanding race condition where crypto_remove_spawns() can encounter
an instance which has had spawn(s) "grabbed" but hasn't yet been
registered, resulting in ->cra_users still being NULL.
We probably should properly initialize ->cra_users earlier, but that
would require updating many templates individually. For now just fix
the bug in a simple way that can easily be backported: make
crypto_remove_spawns() treat a NULL ->cra_users list as empty.
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
crypto/algapi.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/crypto/algapi.c b/crypto/algapi.c
index 9895cafcce7e..395b082d03a9 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -166,6 +166,18 @@ void crypto_remove_spawns(struct crypto_alg *alg, struct list_head *list,
spawn->alg = NULL;
spawns = &inst->alg.cra_users;
+
+ /*
+ * We may encounter an unregistered instance here, since
+ * an instance's spawns are set up prior to the instance
+ * being registered. An unregistered instance will have
+ * NULL ->cra_users.next, since ->cra_users isn't
+ * properly initialized until registration. But an
+ * unregistered instance cannot have any users, so treat
+ * it the same as ->cra_users being empty.
+ */
+ if (spawns->next == NULL)
+ break;
}
} while ((spawns = crypto_more_spawns(alg, &stack, &top,
&secondary_spawns)));
--
2.15.1
As mentioned on commit '88be58be886f ("drm/i915/fbdev:
Always forward hotplug events") we have real valid cases
of hotplugs where fbdev is not fully setup yet.
Unfortunately this remove the checkpoint after the sync point.
So probably we can live without it. Or we need a more robust
serialization.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104158
Fixes: a45b30a6c5db ("drm/i915/fbdev: Serialise early hotplug events with async fbdev config")
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Lukas Wunner <lukas(a)wunner.de>
Cc: jrg2718(a)gmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
---
drivers/gpu/drm/i915/intel_fbdev.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index da48af11eb6b..7a6069b389f2 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -801,8 +801,7 @@ void intel_fbdev_output_poll_changed(struct drm_device *dev)
return;
intel_fbdev_sync(ifbdev);
- if (ifbdev->vma)
- drm_fb_helper_hotplug_event(&ifbdev->helper);
+ drm_fb_helper_hotplug_event(&ifbdev->helper);
}
void intel_fbdev_restore_mode(struct drm_device *dev)
--
2.13.6
On Thu, Jan 04, 2018 at 11:39:23PM +0000, Kenneth Graunke wrote:
> On Thursday, January 4, 2018 1:23:06 PM PST Chris Wilson wrote:
> > Quoting Kenneth Graunke (2018-01-04 19:38:05)
> > > Geminilake requires the 3D driver to select whether barriers are
> > > intended for compute shaders, or tessellation control shaders, by
> > > whacking a "Barrier Mode" bit in SLICE_COMMON_ECO_CHICKEN1 when
> > > switching pipelines. Failure to do this properly can result in GPU
> > > hangs.
> > >
> > > Unfortunately, this means it needs to switch mid-batch, so only
> > > userspace can properly set it. To facilitate this, the kernel needs
> > > to whitelist the register.
> > >
> > > Signed-off-by: Kenneth Graunke <kenneth(a)whitecape.org>
> > > Cc: stable(a)vger.kernel.org
> > > ---
> > > drivers/gpu/drm/i915/i915_reg.h | 2 ++
> > > drivers/gpu/drm/i915/intel_engine_cs.c | 5 +++++
> > > 2 files changed, 7 insertions(+)
> > >
> > > Hello,
> > >
> > > We unfortunately need to whitelist an extra register for GPU hang fix
> > > on Geminilake. Here's the corresponding Mesa patch:
> >
> > Thankfully it appears to be context saved. Has a w/a name been assigned
> > for this?
> > -Chris
>
> There doesn't appear to be one. The workaround page lists it, but there
> is no name. The register description has a note saying that you need to
> set this, but doesn't call it out as a workaround.
It mentions only BXT:ALL, but not mention to GLK.
Should we add to both then?
>
> That's why I put a generic comment, rather than the name.
On Display side we started using the row name for this case, to help
easily finding this later.
ex: "Display WA #0390: skl,kbl"
The number for this apparently is:
WA #0862
Maybe we could use this one to start
/* GT WA #0862: bxt,glk */
GT? GEM?
Unnamed WA #0862?
Thanks,
Rodrigo.
>
> --Ken
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx(a)lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
From: Baoquan He <bhe(a)redhat.com>
Subject: mm/sparse.c: wrong allocation for mem_section
In 83e3c48729 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") mem_section is allocated at runtime to save memory. While it allocates
the first dimension of array with sizeof(struct mem_section). It costs
extra memory, should be sizeof(struct mem_section*).
Fix it.
Link: http://lkml.kernel.org/r/1513932498-20350-1-git-send-email-bhe@redhat.com
Fixes: 83e3c48729 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Tested-by: Dave Young <dyoung(a)redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Atsushi Kumagai <ats-kumagai(a)wm.jp.nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/sparse.c~mm-sparsec-wrong-allocation-for-mem_section mm/sparse.c
--- a/mm/sparse.c~mm-sparsec-wrong-allocation-for-mem_section
+++ a/mm/sparse.c
@@ -211,7 +211,7 @@ void __init memory_present(int nid, unsi
if (unlikely(!mem_section)) {
unsigned long size, align;
- size = sizeof(struct mem_section) * NR_SECTION_ROOTS;
+ size = sizeof(struct mem_section*) * NR_SECTION_ROOTS;
align = 1 << (INTERNODE_CACHE_SHIFT);
mem_section = memblock_virt_alloc(size, align);
}
_
From: Oleg Nesterov <oleg(a)redhat.com>
Subject: kernel/acct.c: fix the acct->needcheck check in check_free_space()
As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check is
very wrong, we need time_is_after_jiffies() to make sys_acct() work.
Ignoring the overflows, the code should "goto out" if needcheck > jiffies,
while currently it checks "needcheck < jiffies" and thus in the likely
case check_free_space() does nothing until jiffies overflow.
In particular this means that sys_acct() is simply broken, acct_on() sets
acct->needcheck = jiffies and expects that check_free_space() should set
acct->active = 1 after the free-space check, but this won't happen if
jiffies increments in between.
This was broken by commit 32dc73086015 ("get rid of timer in kern/acct.c")
in 2011, then another (correct) commit 795a2f22a8ea ("acct() should honour
the limits from the very beginning") made the problem more visible.
Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com
Fixes: 32dc73086015 ("get rid of timer in kern/acct.c")
Reported-by: TSUKADA Koutaro <tsukada(a)ascade.co.jp>
Suggested-by: TSUKADA Koutaro <tsukada(a)ascade.co.jp>
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/acct.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN kernel/acct.c~acct-fix-the-acct-needcheck-check-in-check_free_space kernel/acct.c
--- a/kernel/acct.c~acct-fix-the-acct-needcheck-check-in-check_free_space
+++ a/kernel/acct.c
@@ -102,7 +102,7 @@ static int check_free_space(struct bsd_a
{
struct kstatfs sbuf;
- if (time_is_before_jiffies(acct->needcheck))
+ if (time_is_after_jiffies(acct->needcheck))
goto out;
/* May block */
_
From: Thomas Gleixner <tglx(a)linutronix.de>
The preparation for PTI which added CR3 switching to the entry code
misplaced the CR3 switch in entry_SYSCALL_compat().
With PTI enabled the entry code tries to access a per cpu variable after
switching to kernel GS. This fails because that variable is not mapped to
user space. This results in a double fault and in the worst case a kernel
crash.
Move the switch ahead of the access and clobber RSP which has been saved
already.
Fixes: 8a09317b895f ("x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching")
Reported-by: Lars Wendler <wendler.lars(a)web.de>
Reported-by: Laura Abbott <labbott(a)redhat.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: Andy Lutomirski <luto(a)kernel.org>,
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>,
Cc: Peter Zijlstra <peterz(a)infradead.org>,
Cc: Greg KH <gregkh(a)linuxfoundation.org>, ,
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>,
Cc: Juergen Gross <jgross(a)suse.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801031949200.1957@nanos
---
arch/x86/entry/entry_64_compat.S | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 40f17009ec20..98d5358e4041 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -190,8 +190,13 @@ ENTRY(entry_SYSCALL_compat)
/* Interrupts are off on entry. */
swapgs
- /* Stash user ESP and switch to the kernel stack. */
+ /* Stash user ESP */
movl %esp, %r8d
+
+ /* Use %rsp as scratch reg. User ESP is stashed in r8 */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
+
+ /* Switch to the kernel stack */
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
/* Construct struct pt_regs on stack */
@@ -219,12 +224,6 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
pushq $0 /* pt_regs->r14 = 0 */
pushq $0 /* pt_regs->r15 = 0 */
- /*
- * We just saved %rdi so it is safe to clobber. It is not
- * preserved during the C calls inside TRACE_IRQS_OFF anyway.
- */
- SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
-
/*
* User mode is traced as though IRQs are on, and SYSENTER
* turned them off.
--
2.15.1
Geminilake requires the 3D driver to select whether barriers are
intended for compute shaders, or tessellation control shaders, by
whacking a "Barrier Mode" bit in SLICE_COMMON_ECO_CHICKEN1 when
switching pipelines. Failure to do this properly can result in GPU
hangs.
Unfortunately, this means it needs to switch mid-batch, so only
userspace can properly set it. To facilitate this, the kernel needs
to whitelist the register.
Signed-off-by: Kenneth Graunke <kenneth(a)whitecape.org>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/i915/i915_reg.h | 2 ++
drivers/gpu/drm/i915/intel_engine_cs.c | 5 +++++
2 files changed, 7 insertions(+)
Hello,
We unfortunately need to whitelist an extra register for GPU hang fix
on Geminilake. Here's the corresponding Mesa patch:
https://patchwork.freedesktop.org/patch/196047/
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 966e4df9700e..505c605eff98 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -7079,6 +7079,8 @@ enum {
#define GEN9_SLICE_COMMON_ECO_CHICKEN0 _MMIO(0x7308)
#define DISABLE_PIXEL_MASK_CAMMING (1<<14)
+#define GEN9_SLICE_COMMON_ECO_CHICKEN1 _MMIO(0x731c)
+
#define GEN7_L3SQCREG1 _MMIO(0xB010)
#define VLV_B0_WA_L3SQCREG1_VALUE 0x00D30000
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index ebdcbcbacb3c..d64a9f907550 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1338,6 +1338,11 @@ static int glk_init_workarounds(struct intel_engine_cs *engine)
if (ret)
return ret;
+ /* Userspace needs to toggle "Barrier Mode" to avoid GPU hangs */
+ ret = wa_ring_whitelist_reg(engine, GEN9_SLICE_COMMON_ECO_CHICKEN1);
+ if (ret)
+ return ret;
+
/* WaToEnableHwFixForPushConstHWBug:glk */
WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2,
GEN8_SBE_DISABLE_REPLAY_BUF_OPTIMIZATION);
--
2.15.1
This is a note to let you know that I've just added the patch titled
usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 5fd77a3a0e408c23ab4002a57db980e46bc16e72 Mon Sep 17 00:00:00 2001
From: Shuah Khan <shuahkh(a)osg.samsung.com>
Date: Fri, 22 Dec 2017 19:23:47 -0700
Subject: usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer
buffer
v_send_ret_submit() handles urb with a null transfer_buffer, when it
replays a packet with potential malicious data that could contain a
null buffer.
Add a check for the condition when actual_length > 0 and transfer_buffer
is null.
Signed-off-by: Shuah Khan <shuahkh(a)osg.samsung.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/usbip/vudc_tx.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/usbip/vudc_tx.c b/drivers/usb/usbip/vudc_tx.c
index 1440ae0919ec..3ccb17c3e840 100644
--- a/drivers/usb/usbip/vudc_tx.c
+++ b/drivers/usb/usbip/vudc_tx.c
@@ -85,6 +85,13 @@ static int v_send_ret_submit(struct vudc *udc, struct urbp *urb_p)
memset(&pdu_header, 0, sizeof(pdu_header));
memset(&msg, 0, sizeof(msg));
+ if (urb->actual_length > 0 && !urb->transfer_buffer) {
+ dev_err(&udc->gadget.dev,
+ "urb: actual_length %d transfer_buffer null\n",
+ urb->actual_length);
+ return -1;
+ }
+
if (urb_p->type == USB_ENDPOINT_XFER_ISOC)
iovnum = 2 + urb->number_of_packets;
else
@@ -100,8 +107,8 @@ static int v_send_ret_submit(struct vudc *udc, struct urbp *urb_p)
/* 1. setup usbip_header */
setup_ret_submit_pdu(&pdu_header, urb_p);
- usbip_dbg_stub_tx("setup txdata seqnum: %d urb: %p\n",
- pdu_header.base.seqnum, urb);
+ usbip_dbg_stub_tx("setup txdata seqnum: %d\n",
+ pdu_header.base.seqnum);
usbip_header_correct_endian(&pdu_header, 1);
iov[iovnum].iov_base = &pdu_header;
--
2.15.1
This is a note to let you know that I've just added the patch titled
usbip: remove kernel addresses from usb device and urb debug msgs
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From e1346fd87c71a1f61de1fe476ec8df1425ac931c Mon Sep 17 00:00:00 2001
From: Shuah Khan <shuahkh(a)osg.samsung.com>
Date: Fri, 22 Dec 2017 17:00:06 -0700
Subject: usbip: remove kernel addresses from usb device and urb debug msgs
usbip_dump_usb_device() and usbip_dump_urb() print kernel addresses.
Remove kernel addresses from usb device and urb debug msgs and improve
the message content.
Instead of printing parent device and bus addresses, print parent device
and bus names.
Signed-off-by: Shuah Khan <shuahkh(a)osg.samsung.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/usbip/usbip_common.c | 17 +++--------------
1 file changed, 3 insertions(+), 14 deletions(-)
diff --git a/drivers/usb/usbip/usbip_common.c b/drivers/usb/usbip/usbip_common.c
index 7b219d9109b4..ee2bbce24584 100644
--- a/drivers/usb/usbip/usbip_common.c
+++ b/drivers/usb/usbip/usbip_common.c
@@ -91,7 +91,7 @@ static void usbip_dump_usb_device(struct usb_device *udev)
dev_dbg(dev, " devnum(%d) devpath(%s) usb speed(%s)",
udev->devnum, udev->devpath, usb_speed_string(udev->speed));
- pr_debug("tt %p, ttport %d\n", udev->tt, udev->ttport);
+ pr_debug("tt hub ttport %d\n", udev->ttport);
dev_dbg(dev, " ");
for (i = 0; i < 16; i++)
@@ -124,12 +124,8 @@ static void usbip_dump_usb_device(struct usb_device *udev)
}
pr_debug("\n");
- dev_dbg(dev, "parent %p, bus %p\n", udev->parent, udev->bus);
-
- dev_dbg(dev,
- "descriptor %p, config %p, actconfig %p, rawdescriptors %p\n",
- &udev->descriptor, udev->config,
- udev->actconfig, udev->rawdescriptors);
+ dev_dbg(dev, "parent %s, bus %s\n", dev_name(&udev->parent->dev),
+ udev->bus->bus_name);
dev_dbg(dev, "have_langid %d, string_langid %d\n",
udev->have_langid, udev->string_langid);
@@ -237,9 +233,6 @@ void usbip_dump_urb(struct urb *urb)
dev = &urb->dev->dev;
- dev_dbg(dev, " urb :%p\n", urb);
- dev_dbg(dev, " dev :%p\n", urb->dev);
-
usbip_dump_usb_device(urb->dev);
dev_dbg(dev, " pipe :%08x ", urb->pipe);
@@ -248,11 +241,9 @@ void usbip_dump_urb(struct urb *urb)
dev_dbg(dev, " status :%d\n", urb->status);
dev_dbg(dev, " transfer_flags :%08X\n", urb->transfer_flags);
- dev_dbg(dev, " transfer_buffer :%p\n", urb->transfer_buffer);
dev_dbg(dev, " transfer_buffer_length:%d\n",
urb->transfer_buffer_length);
dev_dbg(dev, " actual_length :%d\n", urb->actual_length);
- dev_dbg(dev, " setup_packet :%p\n", urb->setup_packet);
if (urb->setup_packet && usb_pipetype(urb->pipe) == PIPE_CONTROL)
usbip_dump_usb_ctrlrequest(
@@ -262,8 +253,6 @@ void usbip_dump_urb(struct urb *urb)
dev_dbg(dev, " number_of_packets :%d\n", urb->number_of_packets);
dev_dbg(dev, " interval :%d\n", urb->interval);
dev_dbg(dev, " error_count :%d\n", urb->error_count);
- dev_dbg(dev, " context :%p\n", urb->context);
- dev_dbg(dev, " complete :%p\n", urb->complete);
}
EXPORT_SYMBOL_GPL(usbip_dump_urb);
--
2.15.1
This is a note to let you know that I've just added the patch titled
usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From b78d830f0049ef1966dc1e0ebd1ec2a594e2cf25 Mon Sep 17 00:00:00 2001
From: Shuah Khan <shuahkh(a)osg.samsung.com>
Date: Fri, 22 Dec 2017 19:23:46 -0700
Subject: usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
Harden CMD_SUBMIT path to handle malicious input that could trigger
large memory allocations. Add checks to validate transfer_buffer_length
and number_of_packets to protect against bad input requesting for
unbounded memory allocations.
Signed-off-by: Shuah Khan <shuahkh(a)osg.samsung.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/usbip/vudc_rx.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/usb/usbip/vudc_rx.c b/drivers/usb/usbip/vudc_rx.c
index df1e30989148..1e8a23d92cb4 100644
--- a/drivers/usb/usbip/vudc_rx.c
+++ b/drivers/usb/usbip/vudc_rx.c
@@ -120,6 +120,25 @@ static int v_recv_cmd_submit(struct vudc *udc,
urb_p->new = 1;
urb_p->seqnum = pdu->base.seqnum;
+ if (urb_p->ep->type == USB_ENDPOINT_XFER_ISOC) {
+ /* validate packet size and number of packets */
+ unsigned int maxp, packets, bytes;
+
+ maxp = usb_endpoint_maxp(urb_p->ep->desc);
+ maxp *= usb_endpoint_maxp_mult(urb_p->ep->desc);
+ bytes = pdu->u.cmd_submit.transfer_buffer_length;
+ packets = DIV_ROUND_UP(bytes, maxp);
+
+ if (pdu->u.cmd_submit.number_of_packets < 0 ||
+ pdu->u.cmd_submit.number_of_packets > packets) {
+ dev_err(&udc->gadget.dev,
+ "CMD_SUBMIT: isoc invalid num packets %d\n",
+ pdu->u.cmd_submit.number_of_packets);
+ ret = -EMSGSIZE;
+ goto free_urbp;
+ }
+ }
+
ret = alloc_urb_from_cmd(&urb_p->urb, pdu, urb_p->ep->type);
if (ret) {
usbip_event_add(&udc->ud, VUDC_EVENT_ERROR_MALLOC);
--
2.15.1