This is a note to let you know that I've just added the patch titled
x86/mm/64: Fix reboot interaction with CR4.PCIDE
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-64-fix-reboot-interaction-with-cr4.pcide.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 924c6b900cfdf376b07bccfd80e62b21914f8a5a Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Sun, 8 Oct 2017 21:53:05 -0700
Subject: x86/mm/64: Fix reboot interaction with CR4.PCIDE
From: Andy Lutomirski <luto(a)kernel.org>
commit 924c6b900cfdf376b07bccfd80e62b21914f8a5a upstream.
Trying to reboot via real mode fails with PCID on: long mode cannot
be exited while CR4.PCIDE is set. (No, I have no idea why, but the
SDM and actual CPUs are in agreement here.) The result is a GPF and
a hang instead of a reboot.
I didn't catch this in testing because neither my computer nor my VM
reboots this way. I can trigger it with reboot=bios, though.
Fixes: 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems")
Reported-and-tested-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Link: https://lkml.kernel.org/r/f1e7d965998018450a7a70c2823873686a8b21c0.15075247…
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/reboot.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -106,6 +106,10 @@ void __noreturn machine_real_restart(uns
load_cr3(initial_page_table);
#else
write_cr3(real_mode_header->trampoline_pgd);
+
+ /* Exiting long mode will fail if CR4.PCIDE is set. */
+ if (static_cpu_has(X86_FEATURE_PCID))
+ cr4_clear_bits(X86_CR4_PCIDE);
#endif
/* Jump to the identity-mapped low memory code */
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.9/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.9/x86-mm-add-the-nopcid-boot-option-to-turn-off-pcid.patch
queue-4.9/x86-mm-enable-cr4.pcide-on-supported-systems.patch
queue-4.9/x86-mm-remove-the-up-asm-tlbflush.h-code-always-use-the-formerly-smp-code.patch
queue-4.9/x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
queue-4.9/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.9/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
queue-4.9/x86-mm-disable-pcid-on-32-bit-kernels.patch
queue-4.9/x86-mm-64-fix-reboot-interaction-with-cr4.pcide.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Add the 'nopcid' boot option to turn off PCID
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-add-the-nopcid-boot-option-to-turn-off-pcid.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 0790c9aad84901ca1bdc14746175549c8b5da215 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Thu, 29 Jun 2017 08:53:20 -0700
Subject: x86/mm: Add the 'nopcid' boot option to turn off PCID
From: Andy Lutomirski <luto(a)kernel.org>
commit 0790c9aad84901ca1bdc14746175549c8b5da215 upstream.
The parameter is only present on x86_64 systems to save a few bytes,
as PCID is always disabled on x86_32.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Reviewed-by: Nadav Amit <nadav.amit(a)gmail.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Arjan van de Ven <arjan(a)linux.intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: linux-mm(a)kvack.org
Link: http://lkml.kernel.org/r/8bbb2e65bcd249a5f18bfb8128b4689f08ac2b60.149875120…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/kernel-parameters.txt | 2 ++
arch/x86/kernel/cpu/common.c | 18 ++++++++++++++++++
2 files changed, 20 insertions(+)
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2795,6 +2795,8 @@ bytes respectively. Such letter suffixes
nopat [X86] Disable PAT (page attribute table extension of
pagetables) support.
+ nopcid [X86-64] Disable the PCID cpu feature.
+
norandmaps Don't use address space randomization. Equivalent to
echo 0 > /proc/sys/kernel/randomize_va_space
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -163,6 +163,24 @@ static int __init x86_mpx_setup(char *s)
}
__setup("nompx", x86_mpx_setup);
+#ifdef CONFIG_X86_64
+static int __init x86_pcid_setup(char *s)
+{
+ /* require an exact match without trailing characters */
+ if (strlen(s))
+ return 0;
+
+ /* do not emit a message if the feature is not present */
+ if (!boot_cpu_has(X86_FEATURE_PCID))
+ return 1;
+
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
+ pr_info("nopcid: PCID feature disabled\n");
+ return 1;
+}
+__setup("nopcid", x86_pcid_setup);
+#endif
+
static int __init x86_noinvpcid_setup(char *s)
{
/* noinvpcid doesn't accept parameters */
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.9/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.9/x86-mm-add-the-nopcid-boot-option-to-turn-off-pcid.patch
queue-4.9/x86-mm-enable-cr4.pcide-on-supported-systems.patch
queue-4.9/x86-mm-remove-the-up-asm-tlbflush.h-code-always-use-the-formerly-smp-code.patch
queue-4.9/x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
queue-4.9/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.9/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
queue-4.9/x86-mm-disable-pcid-on-32-bit-kernels.patch
queue-4.9/x86-mm-64-fix-reboot-interaction-with-cr4.pcide.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Enable CR4.PCIDE on supported systems
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-enable-cr4.pcide-on-supported-systems.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 660da7c9228f685b2ebe664f9fd69aaddcc420b5 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Thu, 29 Jun 2017 08:53:21 -0700
Subject: x86/mm: Enable CR4.PCIDE on supported systems
From: Andy Lutomirski <luto(a)kernel.org>
commit 660da7c9228f685b2ebe664f9fd69aaddcc420b5 upstream.
We can use PCID if the CPU has PCID and PGE and we're not on Xen.
By itself, this has no effect. A followup patch will start using PCID.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Reviewed-by: Nadav Amit <nadav.amit(a)gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Arjan van de Ven <arjan(a)linux.intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: linux-mm(a)kvack.org
Link: http://lkml.kernel.org/r/6327ecd907b32f79d5aa0d466f04503bbec5df88.149875120…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 8 ++++++++
arch/x86/kernel/cpu/common.c | 22 ++++++++++++++++++++++
arch/x86/xen/enlighten.c | 6 ++++++
3 files changed, 36 insertions(+)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -190,6 +190,14 @@ static inline void __flush_tlb_all(void)
__flush_tlb_global();
else
__flush_tlb();
+
+ /*
+ * Note: if we somehow had PCID but not PGE, then this wouldn't work --
+ * we'd end up flushing kernel translations for the current ASID but
+ * we might fail to flush kernel translations for other cached ASIDs.
+ *
+ * To avoid this issue, we force PCID off if PGE is off.
+ */
}
static inline void __flush_tlb_one(unsigned long addr)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -321,6 +321,25 @@ static __always_inline void setup_smap(s
}
}
+static void setup_pcid(struct cpuinfo_x86 *c)
+{
+ if (cpu_has(c, X86_FEATURE_PCID)) {
+ if (cpu_has(c, X86_FEATURE_PGE)) {
+ cr4_set_bits(X86_CR4_PCIDE);
+ } else {
+ /*
+ * flush_tlb_all(), as currently implemented, won't
+ * work if PCID is on but PGE is not. Since that
+ * combination doesn't exist on real hardware, there's
+ * no reason to try to fully support it, but it's
+ * polite to avoid corrupting data if we're on
+ * an improperly configured VM.
+ */
+ clear_cpu_cap(c, X86_FEATURE_PCID);
+ }
+ }
+}
+
/*
* Some CPU features depend on higher CPUID levels, which may not always
* be available due to CPUID level capping or broken virtualization
@@ -952,6 +971,9 @@ static void identify_cpu(struct cpuinfo_
setup_smep(c);
setup_smap(c);
+ /* Set up PCID */
+ setup_pcid(c);
+
/*
* The vendor-specific functions might have changed features.
* Now we do "generic changes."
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -433,6 +433,12 @@ static void __init xen_init_cpuid_mask(v
~((1 << X86_FEATURE_MTRR) | /* disable MTRR */
(1 << X86_FEATURE_ACC)); /* thermal monitoring */
+ /*
+ * Xen PV would need some work to support PCID: CR3 handling as well
+ * as xen_flush_tlb_others() would need updating.
+ */
+ cpuid_leaf1_ecx_mask &= ~(1 << X86_FEATURE_PCID); /* disable PCID */
+
if (!xen_initial_domain())
cpuid_leaf1_edx_mask &=
~((1 << X86_FEATURE_ACPI)); /* disable ACPI */
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.4/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.4/x86-mm-add-the-nopcid-boot-option-to-turn-off-pcid.patch
queue-4.4/x86-mm-enable-cr4.pcide-on-supported-systems.patch
queue-4.4/x86-mm-remove-the-up-asm-tlbflush.h-code-always-use-the-formerly-smp-code.patch
queue-4.4/x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
queue-4.4/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.4/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
queue-4.4/x86-mm-disable-pcid-on-32-bit-kernels.patch
queue-4.4/x86-mm-64-fix-reboot-interaction-with-cr4.pcide.patch
This is a note to let you know that I've just added the patch titled
x86/mm/64: Fix reboot interaction with CR4.PCIDE
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-64-fix-reboot-interaction-with-cr4.pcide.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 924c6b900cfdf376b07bccfd80e62b21914f8a5a Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Sun, 8 Oct 2017 21:53:05 -0700
Subject: x86/mm/64: Fix reboot interaction with CR4.PCIDE
From: Andy Lutomirski <luto(a)kernel.org>
commit 924c6b900cfdf376b07bccfd80e62b21914f8a5a upstream.
Trying to reboot via real mode fails with PCID on: long mode cannot
be exited while CR4.PCIDE is set. (No, I have no idea why, but the
SDM and actual CPUs are in agreement here.) The result is a GPF and
a hang instead of a reboot.
I didn't catch this in testing because neither my computer nor my VM
reboots this way. I can trigger it with reboot=bios, though.
Fixes: 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems")
Reported-and-tested-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Link: https://lkml.kernel.org/r/f1e7d965998018450a7a70c2823873686a8b21c0.15075247…
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/reboot.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -93,6 +93,10 @@ void __noreturn machine_real_restart(uns
load_cr3(initial_page_table);
#else
write_cr3(real_mode_header->trampoline_pgd);
+
+ /* Exiting long mode will fail if CR4.PCIDE is set. */
+ if (static_cpu_has(X86_FEATURE_PCID))
+ cr4_clear_bits(X86_CR4_PCIDE);
#endif
/* Jump to the identity-mapped low memory code */
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.4/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.4/x86-mm-add-the-nopcid-boot-option-to-turn-off-pcid.patch
queue-4.4/x86-mm-enable-cr4.pcide-on-supported-systems.patch
queue-4.4/x86-mm-remove-the-up-asm-tlbflush.h-code-always-use-the-formerly-smp-code.patch
queue-4.4/x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
queue-4.4/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.4/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
queue-4.4/x86-mm-disable-pcid-on-32-bit-kernels.patch
queue-4.4/x86-mm-64-fix-reboot-interaction-with-cr4.pcide.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Add the 'nopcid' boot option to turn off PCID
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-add-the-nopcid-boot-option-to-turn-off-pcid.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 0790c9aad84901ca1bdc14746175549c8b5da215 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Thu, 29 Jun 2017 08:53:20 -0700
Subject: x86/mm: Add the 'nopcid' boot option to turn off PCID
From: Andy Lutomirski <luto(a)kernel.org>
commit 0790c9aad84901ca1bdc14746175549c8b5da215 upstream.
The parameter is only present on x86_64 systems to save a few bytes,
as PCID is always disabled on x86_32.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Reviewed-by: Nadav Amit <nadav.amit(a)gmail.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Arjan van de Ven <arjan(a)linux.intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: linux-mm(a)kvack.org
Link: http://lkml.kernel.org/r/8bbb2e65bcd249a5f18bfb8128b4689f08ac2b60.149875120…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/kernel-parameters.txt | 2 ++
arch/x86/kernel/cpu/common.c | 18 ++++++++++++++++++
2 files changed, 20 insertions(+)
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2555,6 +2555,8 @@ bytes respectively. Such letter suffixes
nopat [X86] Disable PAT (page attribute table extension of
pagetables) support.
+ nopcid [X86-64] Disable the PCID cpu feature.
+
norandmaps Don't use address space randomization. Equivalent to
echo 0 > /proc/sys/kernel/randomize_va_space
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -162,6 +162,24 @@ static int __init x86_mpx_setup(char *s)
}
__setup("nompx", x86_mpx_setup);
+#ifdef CONFIG_X86_64
+static int __init x86_pcid_setup(char *s)
+{
+ /* require an exact match without trailing characters */
+ if (strlen(s))
+ return 0;
+
+ /* do not emit a message if the feature is not present */
+ if (!boot_cpu_has(X86_FEATURE_PCID))
+ return 1;
+
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
+ pr_info("nopcid: PCID feature disabled\n");
+ return 1;
+}
+__setup("nopcid", x86_pcid_setup);
+#endif
+
static int __init x86_noinvpcid_setup(char *s)
{
/* noinvpcid doesn't accept parameters */
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.4/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.4/x86-mm-add-the-nopcid-boot-option-to-turn-off-pcid.patch
queue-4.4/x86-mm-enable-cr4.pcide-on-supported-systems.patch
queue-4.4/x86-mm-remove-the-up-asm-tlbflush.h-code-always-use-the-formerly-smp-code.patch
queue-4.4/x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
queue-4.4/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.4/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
queue-4.4/x86-mm-disable-pcid-on-32-bit-kernels.patch
queue-4.4/x86-mm-64-fix-reboot-interaction-with-cr4.pcide.patch
This is a note to let you know that I've just added the patch titled
kbuild: add '-fno-stack-check' to kernel build options
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kbuild-add-fno-stack-check-to-kernel-build-options.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3ce120b16cc548472f80cf8644f90eda958cf1b6 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds(a)linux-foundation.org>
Date: Fri, 29 Dec 2017 17:34:43 -0800
Subject: kbuild: add '-fno-stack-check' to kernel build options
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Linus Torvalds <torvalds(a)linux-foundation.org>
commit 3ce120b16cc548472f80cf8644f90eda958cf1b6 upstream.
It appears that hardened gentoo enables "-fstack-check" by default for
gcc.
That doesn't work _at_all_ for the kernel, because the kernel stack
doesn't act like a user stack at all: it's much smaller, and it doesn't
auto-expand on use. So the extra "probe one page below the stack" code
generated by -fstack-check just breaks the kernel in horrible ways,
causing infinite double faults etc.
[ I have to say, that the particular code gcc generates looks very
stupid even for user space where it works, but that's a separate
issue. ]
Reported-and-tested-by: Alexander Tsoy <alexander(a)tsoy.me>
Reported-and-tested-by: Toralf Förster <toralf.foerster(a)gmx.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Makefile | 3 +++
1 file changed, 3 insertions(+)
--- a/Makefile
+++ b/Makefile
@@ -788,6 +788,9 @@ KBUILD_CFLAGS += $(call cc-disable-warni
# disable invalid "can't wrap" optimizations for signed / pointers
KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow)
+# Make sure -fstack-check isn't enabled (like gentoo apparently did)
+KBUILD_CFLAGS += $(call cc-option,-fno-stack-check,)
+
# conserve stack if available
KBUILD_CFLAGS += $(call cc-option,-fconserve-stack)
Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are
queue-4.9/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.9/x86-mm-remove-the-up-asm-tlbflush.h-code-always-use-the-formerly-smp-code.patch
queue-4.9/kbuild-add-fno-stack-check-to-kernel-build-options.patch
queue-4.9/x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
queue-4.9/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.9/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
queue-4.9/x86-mm-disable-pcid-on-32-bit-kernels.patch
This is a note to let you know that I've just added the patch titled
kbuild: add '-fno-stack-check' to kernel build options
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kbuild-add-fno-stack-check-to-kernel-build-options.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3ce120b16cc548472f80cf8644f90eda958cf1b6 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds(a)linux-foundation.org>
Date: Fri, 29 Dec 2017 17:34:43 -0800
Subject: kbuild: add '-fno-stack-check' to kernel build options
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Linus Torvalds <torvalds(a)linux-foundation.org>
commit 3ce120b16cc548472f80cf8644f90eda958cf1b6 upstream.
It appears that hardened gentoo enables "-fstack-check" by default for
gcc.
That doesn't work _at_all_ for the kernel, because the kernel stack
doesn't act like a user stack at all: it's much smaller, and it doesn't
auto-expand on use. So the extra "probe one page below the stack" code
generated by -fstack-check just breaks the kernel in horrible ways,
causing infinite double faults etc.
[ I have to say, that the particular code gcc generates looks very
stupid even for user space where it works, but that's a separate
issue. ]
Reported-and-tested-by: Alexander Tsoy <alexander(a)tsoy.me>
Reported-and-tested-by: Toralf Förster <toralf.foerster(a)gmx.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Makefile | 3 +++
1 file changed, 3 insertions(+)
--- a/Makefile
+++ b/Makefile
@@ -782,6 +782,9 @@ KBUILD_CFLAGS += $(call cc-disable-warni
# disable invalid "can't wrap" optimizations for signed / pointers
KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow)
+# Make sure -fstack-check isn't enabled (like gentoo apparently did)
+KBUILD_CFLAGS += $(call cc-option,-fno-stack-check,)
+
# conserve stack if available
KBUILD_CFLAGS += $(call cc-option,-fconserve-stack)
Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are
queue-4.4/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.4/x86-mm-remove-the-up-asm-tlbflush.h-code-always-use-the-formerly-smp-code.patch
queue-4.4/kbuild-add-fno-stack-check-to-kernel-build-options.patch
queue-4.4/x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
queue-4.4/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.4/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
queue-4.4/x86-mm-disable-pcid-on-32-bit-kernels.patch
This is a note to let you know that I've just added the patch titled
kbuild: add '-fno-stack-check' to kernel build options
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kbuild-add-fno-stack-check-to-kernel-build-options.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3ce120b16cc548472f80cf8644f90eda958cf1b6 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds(a)linux-foundation.org>
Date: Fri, 29 Dec 2017 17:34:43 -0800
Subject: kbuild: add '-fno-stack-check' to kernel build options
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Linus Torvalds <torvalds(a)linux-foundation.org>
commit 3ce120b16cc548472f80cf8644f90eda958cf1b6 upstream.
It appears that hardened gentoo enables "-fstack-check" by default for
gcc.
That doesn't work _at_all_ for the kernel, because the kernel stack
doesn't act like a user stack at all: it's much smaller, and it doesn't
auto-expand on use. So the extra "probe one page below the stack" code
generated by -fstack-check just breaks the kernel in horrible ways,
causing infinite double faults etc.
[ I have to say, that the particular code gcc generates looks very
stupid even for user space where it works, but that's a separate
issue. ]
Reported-and-tested-by: Alexander Tsoy <alexander(a)tsoy.me>
Reported-and-tested-by: Toralf Förster <toralf.foerster(a)gmx.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Makefile | 3 +++
1 file changed, 3 insertions(+)
--- a/Makefile
+++ b/Makefile
@@ -802,6 +802,9 @@ KBUILD_CFLAGS += $(call cc-disable-warni
# disable invalid "can't wrap" optimizations for signed / pointers
KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow)
+# Make sure -fstack-check isn't enabled (like gentoo apparently did)
+KBUILD_CFLAGS += $(call cc-option,-fno-stack-check,)
+
# conserve stack if available
KBUILD_CFLAGS += $(call cc-option,-fconserve-stack)
Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/kbuild-add-fno-stack-check-to-kernel-build-options.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
kbuild: add '-fno-stack-check' to kernel build options
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kbuild-add-fno-stack-check-to-kernel-build-options.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3ce120b16cc548472f80cf8644f90eda958cf1b6 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds(a)linux-foundation.org>
Date: Fri, 29 Dec 2017 17:34:43 -0800
Subject: kbuild: add '-fno-stack-check' to kernel build options
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Linus Torvalds <torvalds(a)linux-foundation.org>
commit 3ce120b16cc548472f80cf8644f90eda958cf1b6 upstream.
It appears that hardened gentoo enables "-fstack-check" by default for
gcc.
That doesn't work _at_all_ for the kernel, because the kernel stack
doesn't act like a user stack at all: it's much smaller, and it doesn't
auto-expand on use. So the extra "probe one page below the stack" code
generated by -fstack-check just breaks the kernel in horrible ways,
causing infinite double faults etc.
[ I have to say, that the particular code gcc generates looks very
stupid even for user space where it works, but that's a separate
issue. ]
Reported-and-tested-by: Alexander Tsoy <alexander(a)tsoy.me>
Reported-and-tested-by: Toralf Förster <toralf.foerster(a)gmx.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Makefile | 3 +++
1 file changed, 3 insertions(+)
--- a/Makefile
+++ b/Makefile
@@ -762,6 +762,9 @@ KBUILD_CFLAGS += $(call cc-disable-warni
# disable invalid "can't wrap" optimizations for signed / pointers
KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow)
+# Make sure -fstack-check isn't enabled (like gentoo apparently did)
+KBUILD_CFLAGS += $(call cc-option,-fno-stack-check,)
+
# conserve stack if available
KBUILD_CFLAGS += $(call cc-option,-fconserve-stack)
Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are
queue-3.18/kbuild-add-fno-stack-check-to-kernel-build-options.patch
I just discovered that commit 7042c2b9a19e74972f5783ab3b7ef98ebdee293f
in 4.14.4 completely breaks the external displays attached to my Dell
TB16 dock.
How/where should I report this? (Does the stable tree use the regular
kernel Bugzilla?)
Thanks!
--
========================================================================
Ian Pilcher arequipeno(a)gmail.com
-------- "I grew up before Mark Zuckerberg invented friendship" --------
========================================================================
From: Eric Biggers <ebiggers(a)google.com>
If a message sent to a PF_KEY socket ended with one of the extensions
that takes a 'struct sadb_address' but there were not enough bytes
remaining in the message for the ->sa_family member of the 'struct
sockaddr' which is supposed to follow, then verify_address_len() read
past the end of the message, into uninitialized memory. Fix it by
returning -EINVAL in this case.
This bug was found using syzkaller with KMSAN.
Reproducer:
#include <linux/pfkeyv2.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
char buf[24] = { 0 };
struct sadb_msg *msg = (void *)buf;
struct sadb_address *addr = (void *)(msg + 1);
msg->sadb_msg_version = PF_KEY_V2;
msg->sadb_msg_type = SADB_DELETE;
msg->sadb_msg_len = 3;
addr->sadb_address_len = 1;
addr->sadb_address_exttype = SADB_EXT_ADDRESS_SRC;
write(sock, buf, 24);
}
Reported-by: Alexander Potapenko <glider(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
net/key/af_key.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 3dffb892d52c..596499cc8b2f 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -401,6 +401,11 @@ static int verify_address_len(const void *p)
#endif
int len;
+ if (sp->sadb_address_len <
+ DIV_ROUND_UP(sizeof(*sp) + offsetofend(typeof(*addr), sa_family),
+ sizeof(uint64_t)))
+ return -EINVAL;
+
switch (addr->sa_family) {
case AF_INET:
len = DIV_ROUND_UP(sizeof(*sp) + sizeof(*sin), sizeof(uint64_t));
--
2.15.1
I can confirm now, that that kernel breaks both a desktop (an ThinkPad T440s i5) and a headless server (i3930) setup. For the server the attached .config works fine but switching from CONFIG_GENERIC_CPU to CONFIG_MCORE2 legt them hang at boot w/op any messages. Similar picture at the desktop.
Both are stable Gentoo Linux hardened systems.
This issue seems to exist in mainline too, probably visible with d120cd749 (stable) and 9aaefe7b59 (upstream).
--
Toralf
PGP C4EACDDE 0076E94E
Hi Greg,
v4.4.y-stable-queue reference used to be v4.4.108-15-gbb44baf,
now it is v4.4.108. Did something get lost, or did you drop
the pending patches ?
Guenter
This is a note to let you know that I've just added the patch titled
x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ca6c99c0794875c6d1db6e22f246699691ab7e6b Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Mon, 22 May 2017 15:30:01 -0700
Subject: x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()
From: Andy Lutomirski <luto(a)kernel.org>
commit ca6c99c0794875c6d1db6e22f246699691ab7e6b upstream.
flush_tlb_page() was very similar to flush_tlb_mm_range() except that
it had a couple of issues:
- It was missing an smp_mb() in the case where
current->active_mm != mm. (This is a longstanding bug reported by Nadav Amit)
- It was missing tracepoints and vm counter updates.
The only reason that I can see for keeping it at as a separate
function is that it could avoid a few branches that
flush_tlb_mm_range() needs to decide to flush just one page. This
hardly seems worthwhile. If we decide we want to get rid of those
branches again, a better way would be to introduce an
__flush_tlb_mm_range() helper and make both flush_tlb_page() and
flush_tlb_mm_range() use it.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Acked-by: Kees Cook <keescook(a)chromium.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Borislav Petkov <bpetkov(a)suse.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Nadav Amit <namit(a)vmware.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-mm(a)kvack.org
Link: http://lkml.kernel.org/r/3cc3847cf888d8907577569b8bac3f01992ef8f9.149549206…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 6 +++++-
arch/x86/mm/tlb.c | 27 ---------------------------
2 files changed, 5 insertions(+), 28 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -297,11 +297,15 @@ static inline void flush_tlb_kernel_rang
flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
extern void flush_tlb_all(void);
-extern void flush_tlb_page(struct vm_area_struct *, unsigned long);
extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
unsigned long end, unsigned long vmflag);
extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
+static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
+{
+ flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, VM_NONE);
+}
+
void native_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm,
unsigned long start, unsigned long end);
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -354,33 +354,6 @@ out:
preempt_enable();
}
-void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
-{
- struct mm_struct *mm = vma->vm_mm;
-
- preempt_disable();
-
- if (current->active_mm == mm) {
- if (current->mm) {
- /*
- * Implicit full barrier (INVLPG) that synchronizes
- * with switch_mm.
- */
- __flush_tlb_one(start);
- } else {
- leave_mm(smp_processor_id());
-
- /* Synchronize with switch_mm. */
- smp_mb();
- }
- }
-
- if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
- flush_tlb_others(mm_cpumask(mm), mm, start, start + PAGE_SIZE);
-
- preempt_enable();
-}
-
static void do_flush_tlb_all(void *info)
{
count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.9/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.9/x86-mm-remove-the-up-asm-tlbflush.h-code-always-use-the-formerly-smp-code.patch
queue-4.9/x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
queue-4.9/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.9/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
queue-4.9/x86-mm-disable-pcid-on-32-bit-kernels.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Disable PCID on 32-bit kernels
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-disable-pcid-on-32-bit-kernels.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From cba4671af7550e008f7a7835f06df0763825bf3e Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Thu, 29 Jun 2017 08:53:19 -0700
Subject: x86/mm: Disable PCID on 32-bit kernels
From: Andy Lutomirski <luto(a)kernel.org>
commit cba4671af7550e008f7a7835f06df0763825bf3e upstream.
32-bit kernels on new hardware will see PCID in CPUID, but PCID can
only be used in 64-bit mode. Rather than making all PCID code
conditional, just disable the feature on 32-bit builds.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Reviewed-by: Nadav Amit <nadav.amit(a)gmail.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Arjan van de Ven <arjan(a)linux.intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: linux-mm(a)kvack.org
Link: http://lkml.kernel.org/r/2e391769192a4d31b808410c383c6bf0734bc6ea.149875120…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/disabled-features.h | 4 +++-
arch/x86/kernel/cpu/bugs.c | 8 ++++++++
2 files changed, 11 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -21,11 +21,13 @@
# define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31))
# define DISABLE_CYRIX_ARR (1<<(X86_FEATURE_CYRIX_ARR & 31))
# define DISABLE_CENTAUR_MCR (1<<(X86_FEATURE_CENTAUR_MCR & 31))
+# define DISABLE_PCID 0
#else
# define DISABLE_VME 0
# define DISABLE_K6_MTRR 0
# define DISABLE_CYRIX_ARR 0
# define DISABLE_CENTAUR_MCR 0
+# define DISABLE_PCID (1<<(X86_FEATURE_PCID & 31))
#endif /* CONFIG_X86_64 */
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
@@ -43,7 +45,7 @@
#define DISABLED_MASK1 0
#define DISABLED_MASK2 0
#define DISABLED_MASK3 (DISABLE_CYRIX_ARR|DISABLE_CENTAUR_MCR|DISABLE_K6_MTRR)
-#define DISABLED_MASK4 0
+#define DISABLED_MASK4 (DISABLE_PCID)
#define DISABLED_MASK5 0
#define DISABLED_MASK6 0
#define DISABLED_MASK7 0
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -19,6 +19,14 @@
void __init check_bugs(void)
{
+#ifdef CONFIG_X86_32
+ /*
+ * Regardless of whether PCID is enumerated, the SDM says
+ * that it can't be enabled in 32-bit mode.
+ */
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
+#endif
+
identify_boot_cpu();
#ifndef CONFIG_SMP
pr_info("CPU: ");
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.9/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.9/x86-mm-remove-the-up-asm-tlbflush.h-code-always-use-the-formerly-smp-code.patch
queue-4.9/x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
queue-4.9/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.9/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
queue-4.9/x86-mm-disable-pcid-on-32-bit-kernels.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ca6c99c0794875c6d1db6e22f246699691ab7e6b Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Mon, 22 May 2017 15:30:01 -0700
Subject: x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()
From: Andy Lutomirski <luto(a)kernel.org>
commit ca6c99c0794875c6d1db6e22f246699691ab7e6b upstream.
flush_tlb_page() was very similar to flush_tlb_mm_range() except that
it had a couple of issues:
- It was missing an smp_mb() in the case where
current->active_mm != mm. (This is a longstanding bug reported by Nadav Amit)
- It was missing tracepoints and vm counter updates.
The only reason that I can see for keeping it at as a separate
function is that it could avoid a few branches that
flush_tlb_mm_range() needs to decide to flush just one page. This
hardly seems worthwhile. If we decide we want to get rid of those
branches again, a better way would be to introduce an
__flush_tlb_mm_range() helper and make both flush_tlb_page() and
flush_tlb_mm_range() use it.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Acked-by: Kees Cook <keescook(a)chromium.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Borislav Petkov <bpetkov(a)suse.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Nadav Amit <namit(a)vmware.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-mm(a)kvack.org
Link: http://lkml.kernel.org/r/3cc3847cf888d8907577569b8bac3f01992ef8f9.149549206…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 6 +++++-
arch/x86/mm/tlb.c | 27 ---------------------------
2 files changed, 5 insertions(+), 28 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -296,11 +296,15 @@ static inline void flush_tlb_kernel_rang
flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
extern void flush_tlb_all(void);
-extern void flush_tlb_page(struct vm_area_struct *, unsigned long);
extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
unsigned long end, unsigned long vmflag);
extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
+static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
+{
+ flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, VM_NONE);
+}
+
void native_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm,
unsigned long start, unsigned long end);
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -339,33 +339,6 @@ out:
preempt_enable();
}
-void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
-{
- struct mm_struct *mm = vma->vm_mm;
-
- preempt_disable();
-
- if (current->active_mm == mm) {
- if (current->mm) {
- /*
- * Implicit full barrier (INVLPG) that synchronizes
- * with switch_mm.
- */
- __flush_tlb_one(start);
- } else {
- leave_mm(smp_processor_id());
-
- /* Synchronize with switch_mm. */
- smp_mb();
- }
- }
-
- if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
- flush_tlb_others(mm_cpumask(mm), mm, start, start + PAGE_SIZE);
-
- preempt_enable();
-}
-
static void do_flush_tlb_all(void *info)
{
count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.4/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.4/x86-mm-remove-the-up-asm-tlbflush.h-code-always-use-the-formerly-smp-code.patch
queue-4.4/x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
queue-4.4/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.4/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
queue-4.4/x86-mm-disable-pcid-on-32-bit-kernels.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Disable PCID on 32-bit kernels
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-disable-pcid-on-32-bit-kernels.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From cba4671af7550e008f7a7835f06df0763825bf3e Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Thu, 29 Jun 2017 08:53:19 -0700
Subject: x86/mm: Disable PCID on 32-bit kernels
From: Andy Lutomirski <luto(a)kernel.org>
commit cba4671af7550e008f7a7835f06df0763825bf3e upstream.
32-bit kernels on new hardware will see PCID in CPUID, but PCID can
only be used in 64-bit mode. Rather than making all PCID code
conditional, just disable the feature on 32-bit builds.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Reviewed-by: Nadav Amit <nadav.amit(a)gmail.com>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Arjan van de Ven <arjan(a)linux.intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: linux-mm(a)kvack.org
Link: http://lkml.kernel.org/r/2e391769192a4d31b808410c383c6bf0734bc6ea.149875120…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/disabled-features.h | 4 +++-
arch/x86/kernel/cpu/bugs.c | 8 ++++++++
2 files changed, 11 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -21,11 +21,13 @@
# define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31))
# define DISABLE_CYRIX_ARR (1<<(X86_FEATURE_CYRIX_ARR & 31))
# define DISABLE_CENTAUR_MCR (1<<(X86_FEATURE_CENTAUR_MCR & 31))
+# define DISABLE_PCID 0
#else
# define DISABLE_VME 0
# define DISABLE_K6_MTRR 0
# define DISABLE_CYRIX_ARR 0
# define DISABLE_CENTAUR_MCR 0
+# define DISABLE_PCID (1<<(X86_FEATURE_PCID & 31))
#endif /* CONFIG_X86_64 */
/*
@@ -35,7 +37,7 @@
#define DISABLED_MASK1 0
#define DISABLED_MASK2 0
#define DISABLED_MASK3 (DISABLE_CYRIX_ARR|DISABLE_CENTAUR_MCR|DISABLE_K6_MTRR)
-#define DISABLED_MASK4 0
+#define DISABLED_MASK4 (DISABLE_PCID)
#define DISABLED_MASK5 0
#define DISABLED_MASK6 0
#define DISABLED_MASK7 0
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -19,6 +19,14 @@
void __init check_bugs(void)
{
+#ifdef CONFIG_X86_32
+ /*
+ * Regardless of whether PCID is enumerated, the SDM says
+ * that it can't be enabled in 32-bit mode.
+ */
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
+#endif
+
identify_boot_cpu();
#ifndef CONFIG_SMP
pr_info("CPU: ");
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.4/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.4/x86-mm-remove-the-up-asm-tlbflush.h-code-always-use-the-formerly-smp-code.patch
queue-4.4/x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
queue-4.4/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.4/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
queue-4.4/x86-mm-disable-pcid-on-32-bit-kernels.patch
This is a note to let you know that I've just added the patch titled
block: don't let passthrough IO go into .make_request_fn()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
block-don-t-let-passthrough-io-go-into-.make_request_fn.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 14cb0dc6479dc5ebc63b3a459a5d89a2f1b39fed Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei(a)redhat.com>
Date: Mon, 18 Dec 2017 15:40:43 +0800
Subject: block: don't let passthrough IO go into .make_request_fn()
From: Ming Lei <ming.lei(a)redhat.com>
commit 14cb0dc6479dc5ebc63b3a459a5d89a2f1b39fed upstream.
Commit a8821f3f3("block: Improvements to bounce-buffer handling") tries
to make sure that the bio to .make_request_fn won't exceed BIO_MAX_PAGES,
but ignores that passthrough I/O can use blk_queue_bounce() too.
Especially, passthrough IO may not be sector-aligned, and the check
of 'sectors < bio_sectors(*bio_orig)' inside __blk_queue_bounce() may
become true even though the max bvec number doesn't exceed BIO_MAX_PAGES,
then cause the bio splitted, and the original passthrough bio is submited
to generic_make_request().
This patch fixes this issue by checking if the bio is passthrough IO,
and use bio_kmalloc() to allocate the cloned passthrough bio.
Cc: NeilBrown <neilb(a)suse.com>
Fixes: a8821f3f3("block: Improvements to bounce-buffer handling")
Tested-by: Michele Ballabio <barra_cuda(a)katamail.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
block/bounce.c | 6 ++++--
include/linux/blkdev.h | 21 +++++++++++++++++++--
2 files changed, 23 insertions(+), 4 deletions(-)
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -200,6 +200,7 @@ static void __blk_queue_bounce(struct re
unsigned i = 0;
bool bounce = false;
int sectors = 0;
+ bool passthrough = bio_is_passthrough(*bio_orig);
bio_for_each_segment(from, *bio_orig, iter) {
if (i++ < BIO_MAX_PAGES)
@@ -210,13 +211,14 @@ static void __blk_queue_bounce(struct re
if (!bounce)
return;
- if (sectors < bio_sectors(*bio_orig)) {
+ if (!passthrough && sectors < bio_sectors(*bio_orig)) {
bio = bio_split(*bio_orig, sectors, GFP_NOIO, bounce_bio_split);
bio_chain(bio, *bio_orig);
generic_make_request(*bio_orig);
*bio_orig = bio;
}
- bio = bio_clone_bioset(*bio_orig, GFP_NOIO, bounce_bio_set);
+ bio = bio_clone_bioset(*bio_orig, GFP_NOIO, passthrough ? NULL :
+ bounce_bio_set);
bio_for_each_segment_all(to, bio, i) {
struct page *page = to->bv_page;
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -241,14 +241,24 @@ struct request {
struct request *next_rq;
};
+static inline bool blk_op_is_scsi(unsigned int op)
+{
+ return op == REQ_OP_SCSI_IN || op == REQ_OP_SCSI_OUT;
+}
+
+static inline bool blk_op_is_private(unsigned int op)
+{
+ return op == REQ_OP_DRV_IN || op == REQ_OP_DRV_OUT;
+}
+
static inline bool blk_rq_is_scsi(struct request *rq)
{
- return req_op(rq) == REQ_OP_SCSI_IN || req_op(rq) == REQ_OP_SCSI_OUT;
+ return blk_op_is_scsi(req_op(rq));
}
static inline bool blk_rq_is_private(struct request *rq)
{
- return req_op(rq) == REQ_OP_DRV_IN || req_op(rq) == REQ_OP_DRV_OUT;
+ return blk_op_is_private(req_op(rq));
}
static inline bool blk_rq_is_passthrough(struct request *rq)
@@ -256,6 +266,13 @@ static inline bool blk_rq_is_passthrough
return blk_rq_is_scsi(rq) || blk_rq_is_private(rq);
}
+static inline bool bio_is_passthrough(struct bio *bio)
+{
+ unsigned op = bio_op(bio);
+
+ return blk_op_is_scsi(op) || blk_op_is_private(op);
+}
+
static inline unsigned short req_get_ioprio(struct request *req)
{
return req->ioprio;
Patches currently in stable-queue which might be from ming.lei(a)redhat.com are
queue-4.14/block-don-t-let-passthrough-io-go-into-.make_request_fn.patch
queue-4.14/block-fix-blk_rq_append_bio.patch
This is a note to let you know that I've just added the patch titled
block: fix blk_rq_append_bio
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
block-fix-blk_rq_append_bio.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 0abc2a10389f0c9070f76ca906c7382788036b93 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Mon, 18 Dec 2017 15:40:44 +0800
Subject: block: fix blk_rq_append_bio
From: Jens Axboe <axboe(a)kernel.dk>
commit 0abc2a10389f0c9070f76ca906c7382788036b93 upstream.
Commit caa4b02476e3(blk-map: call blk_queue_bounce from blk_rq_append_bio)
moves blk_queue_bounce() into blk_rq_append_bio(), but don't consider
the fact that the bounced bio becomes invisible to caller since the
parameter type is 'struct bio *'. Make it a pointer to a pointer to
a bio, so the caller sees the right bio also after a bounce.
Fixes: caa4b02476e3 ("blk-map: call blk_queue_bounce from blk_rq_append_bio")
Cc: Christoph Hellwig <hch(a)lst.de>
Reported-by: Michele Ballabio <barra_cuda(a)katamail.com>
(handling failure of blk_rq_append_bio(), only call bio_get() after
blk_rq_append_bio() returns OK)
Tested-by: Michele Ballabio <barra_cuda(a)katamail.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
block/blk-map.c | 38 +++++++++++++++++++++----------------
drivers/scsi/osd/osd_initiator.c | 4 ++-
drivers/target/target_core_pscsi.c | 4 +--
include/linux/blkdev.h | 2 -
4 files changed, 28 insertions(+), 20 deletions(-)
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -12,22 +12,29 @@
#include "blk.h"
/*
- * Append a bio to a passthrough request. Only works can be merged into
- * the request based on the driver constraints.
+ * Append a bio to a passthrough request. Only works if the bio can be merged
+ * into the request based on the driver constraints.
*/
-int blk_rq_append_bio(struct request *rq, struct bio *bio)
+int blk_rq_append_bio(struct request *rq, struct bio **bio)
{
- blk_queue_bounce(rq->q, &bio);
+ struct bio *orig_bio = *bio;
+
+ blk_queue_bounce(rq->q, bio);
if (!rq->bio) {
- blk_rq_bio_prep(rq->q, rq, bio);
+ blk_rq_bio_prep(rq->q, rq, *bio);
} else {
- if (!ll_back_merge_fn(rq->q, rq, bio))
+ if (!ll_back_merge_fn(rq->q, rq, *bio)) {
+ if (orig_bio != *bio) {
+ bio_put(*bio);
+ *bio = orig_bio;
+ }
return -EINVAL;
+ }
- rq->biotail->bi_next = bio;
- rq->biotail = bio;
- rq->__data_len += bio->bi_iter.bi_size;
+ rq->biotail->bi_next = *bio;
+ rq->biotail = *bio;
+ rq->__data_len += (*bio)->bi_iter.bi_size;
}
return 0;
@@ -80,14 +87,12 @@ static int __blk_rq_map_user_iov(struct
* We link the bounce buffer in and could have to traverse it
* later so we have to get a ref to prevent it from being freed
*/
- ret = blk_rq_append_bio(rq, bio);
- bio_get(bio);
+ ret = blk_rq_append_bio(rq, &bio);
if (ret) {
- bio_endio(bio);
__blk_rq_unmap_user(orig_bio);
- bio_put(bio);
return ret;
}
+ bio_get(bio);
return 0;
}
@@ -220,7 +225,7 @@ int blk_rq_map_kern(struct request_queue
int reading = rq_data_dir(rq) == READ;
unsigned long addr = (unsigned long) kbuf;
int do_copy = 0;
- struct bio *bio;
+ struct bio *bio, *orig_bio;
int ret;
if (len > (queue_max_hw_sectors(q) << 9))
@@ -243,10 +248,11 @@ int blk_rq_map_kern(struct request_queue
if (do_copy)
rq->rq_flags |= RQF_COPY_USER;
- ret = blk_rq_append_bio(rq, bio);
+ orig_bio = bio;
+ ret = blk_rq_append_bio(rq, &bio);
if (unlikely(ret)) {
/* request is too big */
- bio_put(bio);
+ bio_put(orig_bio);
return ret;
}
--- a/drivers/scsi/osd/osd_initiator.c
+++ b/drivers/scsi/osd/osd_initiator.c
@@ -1576,7 +1576,9 @@ static struct request *_make_request(str
return req;
for_each_bio(bio) {
- ret = blk_rq_append_bio(req, bio);
+ struct bio *bounce_bio = bio;
+
+ ret = blk_rq_append_bio(req, &bounce_bio);
if (ret)
return ERR_PTR(ret);
}
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -920,7 +920,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct
" %d i: %d bio: %p, allocating another"
" bio\n", bio->bi_vcnt, i, bio);
- rc = blk_rq_append_bio(req, bio);
+ rc = blk_rq_append_bio(req, &bio);
if (rc) {
pr_err("pSCSI: failed to append bio\n");
goto fail;
@@ -938,7 +938,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct
}
if (bio) {
- rc = blk_rq_append_bio(req, bio);
+ rc = blk_rq_append_bio(req, &bio);
if (rc) {
pr_err("pSCSI: failed to append bio\n");
goto fail;
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -952,7 +952,7 @@ extern int blk_rq_prep_clone(struct requ
extern void blk_rq_unprep_clone(struct request *rq);
extern blk_status_t blk_insert_cloned_request(struct request_queue *q,
struct request *rq);
-extern int blk_rq_append_bio(struct request *rq, struct bio *bio);
+extern int blk_rq_append_bio(struct request *rq, struct bio **bio);
extern void blk_delay_queue(struct request_queue *, unsigned long);
extern void blk_queue_split(struct request_queue *, struct bio **);
extern void blk_recount_segments(struct request_queue *, struct bio *);
Patches currently in stable-queue which might be from axboe(a)kernel.dk are
queue-4.14/block-don-t-let-passthrough-io-go-into-.make_request_fn.patch
queue-4.14/block-fix-blk_rq_append_bio.patch
Hi,
there are two commits in master that should be cherry-picked for
4.14.x:
commit 0abc2a10389f0c9070f76ca906c7382788036b93
block: fix blk_rq_append_bio
commit 14cb0dc6479dc5ebc63b3a459a5d89a2f1b39fed
block: don't let passthrough IO go into .make_request_fn()
These avoid oopses, especially on x86-32.
Thanks,
Michele Ballabio
This is a note to let you know that I've just added the patch titled
x86/mm: Remove flush_tlb() and flush_tlb_current_task()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 29961b59a51f8c6838a26a45e871a7ed6771809b Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Sat, 22 Apr 2017 00:01:20 -0700
Subject: x86/mm: Remove flush_tlb() and flush_tlb_current_task()
From: Andy Lutomirski <luto(a)kernel.org>
commit 29961b59a51f8c6838a26a45e871a7ed6771809b upstream.
I was trying to figure out what how flush_tlb_current_task() would
possibly work correctly if current->mm != current->active_mm, but I
realized I could spare myself the effort: it has no callers except
the unused flush_tlb() macro.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Nadav Amit <namit(a)vmware.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.149284437…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 9 ---------
arch/x86/mm/tlb.c | 17 -----------------
2 files changed, 26 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -205,7 +205,6 @@ static inline void __flush_tlb_one(unsig
/*
* TLB flushing:
*
- * - flush_tlb() flushes the current mm struct TLBs
* - flush_tlb_all() flushes all processes TLBs
* - flush_tlb_mm(mm) flushes the specified mm context TLB's
* - flush_tlb_page(vma, vmaddr) flushes one page
@@ -237,11 +236,6 @@ static inline void flush_tlb_all(void)
__flush_tlb_all();
}
-static inline void flush_tlb(void)
-{
- __flush_tlb_up();
-}
-
static inline void local_flush_tlb(void)
{
__flush_tlb_up();
@@ -303,14 +297,11 @@ static inline void flush_tlb_kernel_rang
flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
extern void flush_tlb_all(void);
-extern void flush_tlb_current_task(void);
extern void flush_tlb_page(struct vm_area_struct *, unsigned long);
extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
unsigned long end, unsigned long vmflag);
extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
-#define flush_tlb() flush_tlb_current_task()
-
void native_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm,
unsigned long start, unsigned long end);
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -287,23 +287,6 @@ void native_flush_tlb_others(const struc
smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
}
-void flush_tlb_current_task(void)
-{
- struct mm_struct *mm = current->mm;
-
- preempt_disable();
-
- count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
-
- /* This is an implicit full barrier that synchronizes with switch_mm. */
- local_flush_tlb();
-
- trace_tlb_flush(TLB_LOCAL_SHOOTDOWN, TLB_FLUSH_ALL);
- if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
- flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
- preempt_enable();
-}
-
/*
* See Documentation/x86/tlb.txt for details. We choose 33
* because it is large enough to cover the vast majority (at
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.9/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.9/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.9/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
This is a note to let you know that I've just added the patch titled
x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9ccee2373f0658f234727700e619df097ba57023 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Sat, 22 Apr 2017 00:01:19 -0700
Subject: x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
From: Andy Lutomirski <luto(a)kernel.org>
commit 9ccee2373f0658f234727700e619df097ba57023 upstream.
mark_screen_rdonly() is the last remaining caller of flush_tlb().
flush_tlb_mm_range() is potentially faster and isn't obsolete.
Compile-tested only because I don't know whether software that uses
this mechanism even exists.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Nadav Amit <namit(a)vmware.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Sasha Levin <sasha.levin(a)oracle.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.149284437…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/vm86_32.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -191,7 +191,7 @@ static void mark_screen_rdonly(struct mm
pte_unmap_unlock(pte, ptl);
out:
up_write(&mm->mmap_sem);
- flush_tlb();
+ flush_tlb_mm_range(mm, 0xA0000, 0xA0000 + 32*PAGE_SIZE, 0UL);
}
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.9/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.9/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.9/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Make flush_tlb_mm_range() more predictable
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-make-flush_tlb_mm_range-more-predictable.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ce27374fabf553153c3f53efcaa9bfab9216bd8c Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Sat, 22 Apr 2017 00:01:21 -0700
Subject: x86/mm: Make flush_tlb_mm_range() more predictable
From: Andy Lutomirski <luto(a)kernel.org>
commit ce27374fabf553153c3f53efcaa9bfab9216bd8c upstream.
I'm about to rewrite the function almost completely, but first I
want to get a functional change out of the way. Currently, if
flush_tlb_mm_range() does not flush the local TLB at all, it will
never do individual page flushes on remote CPUs. This seems to be
an accident, and preserving it will be awkward. Let's change it
first so that any regressions in the rewrite will be easier to
bisect and so that the rewrite can attempt to change no visible
behavior at all.
The fix is simple: we can simply avoid short-circuiting the
calculation of base_pages_to_flush.
As a side effect, this also eliminates a potential corner case: if
tlb_single_page_flush_ceiling == TLB_FLUSH_ALL, flush_tlb_mm_range()
could have ended up flushing the entire address space one page at a
time.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Acked-by: Dave Hansen <dave.hansen(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Nadav Amit <namit(a)vmware.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/4b29b771d9975aad7154c314534fec235618175a.149284437…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/tlb.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -307,6 +307,12 @@ void flush_tlb_mm_range(struct mm_struct
unsigned long base_pages_to_flush = TLB_FLUSH_ALL;
preempt_disable();
+
+ if ((end != TLB_FLUSH_ALL) && !(vmflag & VM_HUGETLB))
+ base_pages_to_flush = (end - start) >> PAGE_SHIFT;
+ if (base_pages_to_flush > tlb_single_page_flush_ceiling)
+ base_pages_to_flush = TLB_FLUSH_ALL;
+
if (current->active_mm != mm) {
/* Synchronize with switch_mm. */
smp_mb();
@@ -323,15 +329,11 @@ void flush_tlb_mm_range(struct mm_struct
goto out;
}
- if ((end != TLB_FLUSH_ALL) && !(vmflag & VM_HUGETLB))
- base_pages_to_flush = (end - start) >> PAGE_SHIFT;
-
/*
* Both branches below are implicit full barriers (MOV to CR or
* INVLPG) that synchronize with switch_mm.
*/
- if (base_pages_to_flush > tlb_single_page_flush_ceiling) {
- base_pages_to_flush = TLB_FLUSH_ALL;
+ if (base_pages_to_flush == TLB_FLUSH_ALL) {
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
local_flush_tlb();
} else {
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.9/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.9/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.9/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
This is a note to let you know that I've just added the patch titled
x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9ccee2373f0658f234727700e619df097ba57023 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Sat, 22 Apr 2017 00:01:19 -0700
Subject: x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
From: Andy Lutomirski <luto(a)kernel.org>
commit 9ccee2373f0658f234727700e619df097ba57023 upstream.
mark_screen_rdonly() is the last remaining caller of flush_tlb().
flush_tlb_mm_range() is potentially faster and isn't obsolete.
Compile-tested only because I don't know whether software that uses
this mechanism even exists.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Nadav Amit <namit(a)vmware.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Sasha Levin <sasha.levin(a)oracle.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.149284437…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/vm86_32.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -187,7 +187,7 @@ static void mark_screen_rdonly(struct mm
pte_unmap_unlock(pte, ptl);
out:
up_write(&mm->mmap_sem);
- flush_tlb();
+ flush_tlb_mm_range(mm, 0xA0000, 0xA0000 + 32*PAGE_SIZE, 0UL);
}
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.4/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.4/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.4/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Make flush_tlb_mm_range() more predictable
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-make-flush_tlb_mm_range-more-predictable.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ce27374fabf553153c3f53efcaa9bfab9216bd8c Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Sat, 22 Apr 2017 00:01:21 -0700
Subject: x86/mm: Make flush_tlb_mm_range() more predictable
From: Andy Lutomirski <luto(a)kernel.org>
commit ce27374fabf553153c3f53efcaa9bfab9216bd8c upstream.
I'm about to rewrite the function almost completely, but first I
want to get a functional change out of the way. Currently, if
flush_tlb_mm_range() does not flush the local TLB at all, it will
never do individual page flushes on remote CPUs. This seems to be
an accident, and preserving it will be awkward. Let's change it
first so that any regressions in the rewrite will be easier to
bisect and so that the rewrite can attempt to change no visible
behavior at all.
The fix is simple: we can simply avoid short-circuiting the
calculation of base_pages_to_flush.
As a side effect, this also eliminates a potential corner case: if
tlb_single_page_flush_ceiling == TLB_FLUSH_ALL, flush_tlb_mm_range()
could have ended up flushing the entire address space one page at a
time.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Acked-by: Dave Hansen <dave.hansen(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Nadav Amit <namit(a)vmware.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/4b29b771d9975aad7154c314534fec235618175a.149284437…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/tlb.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -292,6 +292,12 @@ void flush_tlb_mm_range(struct mm_struct
unsigned long base_pages_to_flush = TLB_FLUSH_ALL;
preempt_disable();
+
+ if ((end != TLB_FLUSH_ALL) && !(vmflag & VM_HUGETLB))
+ base_pages_to_flush = (end - start) >> PAGE_SHIFT;
+ if (base_pages_to_flush > tlb_single_page_flush_ceiling)
+ base_pages_to_flush = TLB_FLUSH_ALL;
+
if (current->active_mm != mm) {
/* Synchronize with switch_mm. */
smp_mb();
@@ -308,15 +314,11 @@ void flush_tlb_mm_range(struct mm_struct
goto out;
}
- if ((end != TLB_FLUSH_ALL) && !(vmflag & VM_HUGETLB))
- base_pages_to_flush = (end - start) >> PAGE_SHIFT;
-
/*
* Both branches below are implicit full barriers (MOV to CR or
* INVLPG) that synchronize with switch_mm.
*/
- if (base_pages_to_flush > tlb_single_page_flush_ceiling) {
- base_pages_to_flush = TLB_FLUSH_ALL;
+ if (base_pages_to_flush == TLB_FLUSH_ALL) {
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
local_flush_tlb();
} else {
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.4/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.4/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.4/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Remove flush_tlb() and flush_tlb_current_task()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 29961b59a51f8c6838a26a45e871a7ed6771809b Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Sat, 22 Apr 2017 00:01:20 -0700
Subject: x86/mm: Remove flush_tlb() and flush_tlb_current_task()
From: Andy Lutomirski <luto(a)kernel.org>
commit 29961b59a51f8c6838a26a45e871a7ed6771809b upstream.
I was trying to figure out what how flush_tlb_current_task() would
possibly work correctly if current->mm != current->active_mm, but I
realized I could spare myself the effort: it has no callers except
the unused flush_tlb() macro.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Nadav Amit <namit(a)vmware.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.149284437…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 9 ---------
arch/x86/mm/tlb.c | 17 -----------------
2 files changed, 26 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -204,7 +204,6 @@ static inline void __flush_tlb_one(unsig
/*
* TLB flushing:
*
- * - flush_tlb() flushes the current mm struct TLBs
* - flush_tlb_all() flushes all processes TLBs
* - flush_tlb_mm(mm) flushes the specified mm context TLB's
* - flush_tlb_page(vma, vmaddr) flushes one page
@@ -236,11 +235,6 @@ static inline void flush_tlb_all(void)
__flush_tlb_all();
}
-static inline void flush_tlb(void)
-{
- __flush_tlb_up();
-}
-
static inline void local_flush_tlb(void)
{
__flush_tlb_up();
@@ -302,14 +296,11 @@ static inline void flush_tlb_kernel_rang
flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
extern void flush_tlb_all(void);
-extern void flush_tlb_current_task(void);
extern void flush_tlb_page(struct vm_area_struct *, unsigned long);
extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
unsigned long end, unsigned long vmflag);
extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
-#define flush_tlb() flush_tlb_current_task()
-
void native_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm,
unsigned long start, unsigned long end);
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -272,23 +272,6 @@ void native_flush_tlb_others(const struc
smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
}
-void flush_tlb_current_task(void)
-{
- struct mm_struct *mm = current->mm;
-
- preempt_disable();
-
- count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
-
- /* This is an implicit full barrier that synchronizes with switch_mm. */
- local_flush_tlb();
-
- trace_tlb_flush(TLB_LOCAL_SHOOTDOWN, TLB_FLUSH_ALL);
- if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
- flush_tlb_others(mm_cpumask(mm), mm, 0UL, TLB_FLUSH_ALL);
- preempt_enable();
-}
-
/*
* See Documentation/x86/tlb.txt for details. We choose 33
* because it is large enough to cover the vast majority (at
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.4/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.4/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.4/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
This is a note to let you know that I've just added the patch titled
ring-buffer: Mask out the info bits when returning buffer page length
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ring-buffer-mask-out-the-info-bits-when-returning-buffer-page-length.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 22 Dec 2017 20:32:35 -0500
Subject: ring-buffer: Mask out the info bits when returning buffer page length
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 upstream.
Two info bits were added to the "commit" part of the ring buffer data page
when returned to be consumed. This was to inform the user space readers that
events have been missed, and that the count may be stored at the end of the
page.
What wasn't handled, was the splice code that actually called a function to
return the length of the data in order to zero out the rest of the page
before sending it up to user space. These data bits were returned with the
length making the value negative, and that negative value was not checked.
It was compared to PAGE_SIZE, and only used if the size was less than
PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
unsigned compare, meaning the negative size value did not end up causing a
large portion of memory to be randomly zeroed out.
Fixes: 66a8cb95ed040 ("ring-buffer: Add place holder recording of dropped events")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/ring_buffer.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -280,6 +280,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data
/* Missed count stored at end */
#define RB_MISSED_STORED (1 << 30)
+#define RB_MISSED_FLAGS (RB_MISSED_EVENTS|RB_MISSED_STORED)
+
struct buffer_data_page {
u64 time_stamp; /* page time stamp */
local_t commit; /* write committed index */
@@ -331,7 +333,9 @@ static void rb_init_page(struct buffer_d
*/
size_t ring_buffer_page_len(void *page)
{
- return local_read(&((struct buffer_data_page *)page)->commit)
+ struct buffer_data_page *bpage = page;
+
+ return (local_read(&bpage->commit) & ~RB_MISSED_FLAGS)
+ BUF_PAGE_HDR_SIZE;
}
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-4.9/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-4.9/tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
queue-4.9/ring-buffer-mask-out-the-info-bits-when-returning-buffer-page-length.patch
queue-4.9/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
ASoC: wm_adsp: Fix validation of firmware and coeff lengths
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-wm_adsp-fix-validation-of-firmware-and-coeff-lengths.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 50dd2ea8ef67a1617e0c0658bcbec4b9fb03b936 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
Date: Fri, 8 Dec 2017 16:15:20 +0000
Subject: ASoC: wm_adsp: Fix validation of firmware and coeff lengths
From: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
commit 50dd2ea8ef67a1617e0c0658bcbec4b9fb03b936 upstream.
The checks for whether another region/block header could be present
are subtracting the size from the current offset. Obviously we should
instead subtract the offset from the size.
The checks for whether the region/block data fit in the file are
adding the data size to the current offset and header size, without
checking for integer overflow. Rearrange these so that overflow is
impossible.
Signed-off-by: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
Acked-by: Charles Keepax <ckeepax(a)opensource.cirrus.com>
Tested-by: Charles Keepax <ckeepax(a)opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/codecs/wm_adsp.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/sound/soc/codecs/wm_adsp.c
+++ b/sound/soc/codecs/wm_adsp.c
@@ -1465,7 +1465,7 @@ static int wm_adsp_load(struct wm_adsp *
le64_to_cpu(footer->timestamp));
while (pos < firmware->size &&
- pos - firmware->size > sizeof(*region)) {
+ sizeof(*region) < firmware->size - pos) {
region = (void *)&(firmware->data[pos]);
region_name = "Unknown";
reg = 0;
@@ -1526,8 +1526,8 @@ static int wm_adsp_load(struct wm_adsp *
regions, le32_to_cpu(region->len), offset,
region_name);
- if ((pos + le32_to_cpu(region->len) + sizeof(*region)) >
- firmware->size) {
+ if (le32_to_cpu(region->len) >
+ firmware->size - pos - sizeof(*region)) {
adsp_err(dsp,
"%s.%d: %s region len %d bytes exceeds file length %zu\n",
file, regions, region_name,
@@ -1992,7 +1992,7 @@ static int wm_adsp_load_coeff(struct wm_
blocks = 0;
while (pos < firmware->size &&
- pos - firmware->size > sizeof(*blk)) {
+ sizeof(*blk) < firmware->size - pos) {
blk = (void *)(&firmware->data[pos]);
type = le16_to_cpu(blk->type);
@@ -2066,8 +2066,8 @@ static int wm_adsp_load_coeff(struct wm_
}
if (reg) {
- if ((pos + le32_to_cpu(blk->len) + sizeof(*blk)) >
- firmware->size) {
+ if (le32_to_cpu(blk->len) >
+ firmware->size - pos - sizeof(*blk)) {
adsp_err(dsp,
"%s.%d: %s region len %d bytes exceeds file length %zu\n",
file, blocks, region_name,
Patches currently in stable-queue which might be from ben.hutchings(a)codethink.co.uk are
queue-4.9/asoc-wm_adsp-fix-validation-of-firmware-and-coeff-lengths.patch
This is a note to let you know that I've just added the patch titled
iw_cxgb4: Only validate the MSN for successful completions
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iw_cxgb4-only-validate-the-msn-for-successful-completions.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f55688c45442bc863f40ad678c638785b26cdce6 Mon Sep 17 00:00:00 2001
From: Steve Wise <swise(a)opengridcomputing.com>
Date: Mon, 18 Dec 2017 13:10:00 -0800
Subject: iw_cxgb4: Only validate the MSN for successful completions
From: Steve Wise <swise(a)opengridcomputing.com>
commit f55688c45442bc863f40ad678c638785b26cdce6 upstream.
If the RECV CQE is in error, ignore the MSN check. This was causing
recvs that were flushed into the sw cq to be completed with the wrong
status (BAD_MSN instead of FLUSHED).
Signed-off-by: Steve Wise <swise(a)opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/cxgb4/cq.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -575,10 +575,10 @@ static int poll_cq(struct t4_wq *wq, str
ret = -EAGAIN;
goto skip_cqe;
}
- if (unlikely((CQE_WRID_MSN(hw_cqe) != (wq->rq.msn)))) {
+ if (unlikely(!CQE_STATUS(hw_cqe) &&
+ CQE_WRID_MSN(hw_cqe) != wq->rq.msn)) {
t4_set_wq_in_error(wq);
- hw_cqe->header |= htonl(CQE_STATUS_V(T4_ERR_MSN));
- goto proc_cqe;
+ hw_cqe->header |= cpu_to_be32(CQE_STATUS_V(T4_ERR_MSN));
}
goto proc_cqe;
}
Patches currently in stable-queue which might be from swise(a)opengridcomputing.com are
queue-4.9/iw_cxgb4-only-validate-the-msn-for-successful-completions.patch
This is a note to let you know that I've just added the patch titled
ASoC: twl4030: fix child-node lookup
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-twl4030-fix-child-node-lookup.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 15f8c5f2415bfac73f33a14bcd83422bcbfb5298 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Mon, 13 Nov 2017 12:12:56 +0100
Subject: ASoC: twl4030: fix child-node lookup
From: Johan Hovold <johan(a)kernel.org>
commit 15f8c5f2415bfac73f33a14bcd83422bcbfb5298 upstream.
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
To make things worse, the parent codec node was also prematurely freed,
while the child node was leaked.
Fixes: 2d6d649a2e0f ("ASoC: twl4030: Support for DT booted kernel")
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/codecs/twl4030.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/sound/soc/codecs/twl4030.c
+++ b/sound/soc/codecs/twl4030.c
@@ -232,7 +232,7 @@ static struct twl4030_codec_data *twl403
struct twl4030_codec_data *pdata = dev_get_platdata(codec->dev);
struct device_node *twl4030_codec_node = NULL;
- twl4030_codec_node = of_find_node_by_name(codec->dev->parent->of_node,
+ twl4030_codec_node = of_get_child_by_name(codec->dev->parent->of_node,
"codec");
if (!pdata && twl4030_codec_node) {
@@ -241,9 +241,11 @@ static struct twl4030_codec_data *twl403
GFP_KERNEL);
if (!pdata) {
dev_err(codec->dev, "Can not allocate memory\n");
+ of_node_put(twl4030_codec_node);
return NULL;
}
twl4030_setup_pdata_of(pdata, twl4030_codec_node);
+ of_node_put(twl4030_codec_node);
}
return pdata;
Patches currently in stable-queue which might be from johan(a)kernel.org are
queue-4.9/asoc-twl4030-fix-child-node-lookup.patch
queue-4.9/asoc-da7218-fix-fix-child-node-lookup.patch
This is a note to let you know that I've just added the patch titled
ASoC: tlv320aic31xx: Fix GPIO1 register definition
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-tlv320aic31xx-fix-gpio1-register-definition.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 737e0b7b67bdfe24090fab2852044bb283282fc5 Mon Sep 17 00:00:00 2001
From: "Andrew F. Davis" <afd(a)ti.com>
Date: Wed, 29 Nov 2017 15:32:46 -0600
Subject: ASoC: tlv320aic31xx: Fix GPIO1 register definition
From: Andrew F. Davis <afd(a)ti.com>
commit 737e0b7b67bdfe24090fab2852044bb283282fc5 upstream.
GPIO1 control register is number 51, fix this here.
Fixes: bafcbfe429eb ("ASoC: tlv320aic31xx: Make the register values human readable")
Signed-off-by: Andrew F. Davis <afd(a)ti.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/codecs/tlv320aic31xx.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/codecs/tlv320aic31xx.h
+++ b/sound/soc/codecs/tlv320aic31xx.h
@@ -115,7 +115,7 @@ struct aic31xx_pdata {
/* INT2 interrupt control */
#define AIC31XX_INT2CTRL AIC31XX_REG(0, 49)
/* GPIO1 control */
-#define AIC31XX_GPIO1 AIC31XX_REG(0, 50)
+#define AIC31XX_GPIO1 AIC31XX_REG(0, 51)
#define AIC31XX_DACPRB AIC31XX_REG(0, 60)
/* ADC Instruction Set Register */
Patches currently in stable-queue which might be from afd(a)ti.com are
queue-4.9/asoc-tlv320aic31xx-fix-gpio1-register-definition.patch
This is a note to let you know that I've just added the patch titled
ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-fsl_ssi-ac-97-ops-need-regmap-clock-and-cleaning-up-on-failure.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 695b78b548d8a26288f041e907ff17758df9e1d5 Mon Sep 17 00:00:00 2001
From: "Maciej S. Szmigiero" <mail(a)maciej.szmigiero.name>
Date: Mon, 20 Nov 2017 23:14:55 +0100
Subject: ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
From: Maciej S. Szmigiero <mail(a)maciej.szmigiero.name>
commit 695b78b548d8a26288f041e907ff17758df9e1d5 upstream.
AC'97 ops (register read / write) need SSI regmap and clock, so they have
to be set after them.
We also need to set these ops back to NULL if we fail the probe.
Signed-off-by: Maciej S. Szmigiero <mail(a)maciej.szmigiero.name>
Acked-by: Nicolin Chen <nicoleotsuka(a)gmail.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/fsl/fsl_ssi.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -1467,12 +1467,6 @@ static int fsl_ssi_probe(struct platform
sizeof(fsl_ssi_ac97_dai));
fsl_ac97_data = ssi_private;
-
- ret = snd_soc_set_ac97_ops_of_reset(&fsl_ssi_ac97_ops, pdev);
- if (ret) {
- dev_err(&pdev->dev, "could not set AC'97 ops\n");
- return ret;
- }
} else {
/* Initialize this copy of the CPU DAI driver structure */
memcpy(&ssi_private->cpu_dai_drv, &fsl_ssi_dai_template,
@@ -1583,6 +1577,14 @@ static int fsl_ssi_probe(struct platform
return ret;
}
+ if (fsl_ssi_is_ac97(ssi_private)) {
+ ret = snd_soc_set_ac97_ops_of_reset(&fsl_ssi_ac97_ops, pdev);
+ if (ret) {
+ dev_err(&pdev->dev, "could not set AC'97 ops\n");
+ goto error_ac97_ops;
+ }
+ }
+
ret = devm_snd_soc_register_component(&pdev->dev, &fsl_ssi_component,
&ssi_private->cpu_dai_drv, 1);
if (ret) {
@@ -1666,6 +1668,10 @@ error_sound_card:
fsl_ssi_debugfs_remove(&ssi_private->dbg_stats);
error_asoc_register:
+ if (fsl_ssi_is_ac97(ssi_private))
+ snd_soc_set_ac97_ops(NULL);
+
+error_ac97_ops:
if (ssi_private->soc->imx)
fsl_ssi_imx_clean(pdev, ssi_private);
Patches currently in stable-queue which might be from mail(a)maciej.szmigiero.name are
queue-4.9/asoc-fsl_ssi-ac-97-ops-need-regmap-clock-and-cleaning-up-on-failure.patch
This is a note to let you know that I've just added the patch titled
ASoC: da7218: fix fix child-node lookup
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-da7218-fix-fix-child-node-lookup.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From bc6476d6c1edcb9b97621b5131bd169aa81f27db Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Mon, 13 Nov 2017 12:12:55 +0100
Subject: ASoC: da7218: fix fix child-node lookup
From: Johan Hovold <johan(a)kernel.org>
commit bc6476d6c1edcb9b97621b5131bd169aa81f27db upstream.
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
To make things worse, the parent codec node was also prematurely freed.
Fixes: 4d50934abd22 ("ASoC: da7218: Add da7218 codec driver")
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Acked-by: Adam Thomson <Adam.Thomson.Opensource(a)diasemi.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/codecs/da7218.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/codecs/da7218.c
+++ b/sound/soc/codecs/da7218.c
@@ -2519,7 +2519,7 @@ static struct da7218_pdata *da7218_of_to
}
if (da7218->dev_id == DA7218_DEV_ID) {
- hpldet_np = of_find_node_by_name(np, "da7218_hpldet");
+ hpldet_np = of_get_child_by_name(np, "da7218_hpldet");
if (!hpldet_np)
return pdata;
Patches currently in stable-queue which might be from johan(a)kernel.org are
queue-4.9/asoc-twl4030-fix-child-node-lookup.patch
queue-4.9/asoc-da7218-fix-fix-child-node-lookup.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda - fix headset mic detection issue on a Dell machine
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-fix-headset-mic-detection-issue-on-a-dell-machine.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 285d5ddcffafa5d5e68c586f4c9eaa8b24a2897d Mon Sep 17 00:00:00 2001
From: Hui Wang <hui.wang(a)canonical.com>
Date: Fri, 22 Dec 2017 11:17:45 +0800
Subject: ALSA: hda - fix headset mic detection issue on a Dell machine
From: Hui Wang <hui.wang(a)canonical.com>
commit 285d5ddcffafa5d5e68c586f4c9eaa8b24a2897d upstream.
It has the codec alc256, and add its pin definition to pin quirk
table to let it apply ALC255_FIXUP_DELL1_MIC_NO_PRESENCE.
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/pci/hda/patch_realtek.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -5972,6 +5972,11 @@ static const struct snd_hda_pin_quirk al
{0x1b, 0x01011020},
{0x21, 0x02211010}),
SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE,
+ {0x12, 0x90a60130},
+ {0x14, 0x90170110},
+ {0x1b, 0x01011020},
+ {0x21, 0x0221101f}),
+ SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE,
{0x12, 0x90a60160},
{0x14, 0x90170120},
{0x21, 0x02211030}),
Patches currently in stable-queue which might be from hui.wang(a)canonical.com are
queue-4.9/alsa-hda-fix-headset-mic-detection-issue-on-a-dell-machine.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda: Drop useless WARN_ON()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-drop-useless-warn_on.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a36c2638380c0a4676647a1f553b70b20d3ebce1 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Fri, 22 Dec 2017 10:45:07 +0100
Subject: ALSA: hda: Drop useless WARN_ON()
From: Takashi Iwai <tiwai(a)suse.de>
commit a36c2638380c0a4676647a1f553b70b20d3ebce1 upstream.
Since the commit 97cc2ed27e5a ("ALSA: hda - Fix yet another i915
pointer leftover in error path") cleared hdac_acomp pointer, the
WARN_ON() non-NULL check in snd_hdac_i915_register_notifier() may give
a false-positive warning, as the function gets called no matter
whether the component is registered or not. For fixing it, let's get
rid of the spurious WARN_ON().
Fixes: 97cc2ed27e5a ("ALSA: hda - Fix yet another i915 pointer leftover in error path")
Reported-by: Kouta Okamoto <kouta.okamoto(a)toshiba.co.jp>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/hda/hdac_i915.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/hda/hdac_i915.c
+++ b/sound/hda/hdac_i915.c
@@ -319,7 +319,7 @@ static int hdac_component_master_match(s
*/
int snd_hdac_i915_register_notifier(const struct i915_audio_component_audio_ops *aops)
{
- if (WARN_ON(!hdac_acomp))
+ if (!hdac_acomp)
return -ENODEV;
hdac_acomp->audio_ops = aops;
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.9/alsa-hda-drop-useless-warn_on.patch
queue-4.9/alsa-hda-fix-headset-mic-detection-issue-on-a-dell-machine.patch
This is a note to let you know that I've just added the patch titled
ring-buffer: Mask out the info bits when returning buffer page length
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ring-buffer-mask-out-the-info-bits-when-returning-buffer-page-length.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 22 Dec 2017 20:32:35 -0500
Subject: ring-buffer: Mask out the info bits when returning buffer page length
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 upstream.
Two info bits were added to the "commit" part of the ring buffer data page
when returned to be consumed. This was to inform the user space readers that
events have been missed, and that the count may be stored at the end of the
page.
What wasn't handled, was the splice code that actually called a function to
return the length of the data in order to zero out the rest of the page
before sending it up to user space. These data bits were returned with the
length making the value negative, and that negative value was not checked.
It was compared to PAGE_SIZE, and only used if the size was less than
PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
unsigned compare, meaning the negative size value did not end up causing a
large portion of memory to be randomly zeroed out.
Fixes: 66a8cb95ed040 ("ring-buffer: Add place holder recording of dropped events")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/ring_buffer.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -280,6 +280,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data
/* Missed count stored at end */
#define RB_MISSED_STORED (1 << 30)
+#define RB_MISSED_FLAGS (RB_MISSED_EVENTS|RB_MISSED_STORED)
+
struct buffer_data_page {
u64 time_stamp; /* page time stamp */
local_t commit; /* write committed index */
@@ -331,7 +333,9 @@ static void rb_init_page(struct buffer_d
*/
size_t ring_buffer_page_len(void *page)
{
- return local_read(&((struct buffer_data_page *)page)->commit)
+ struct buffer_data_page *bpage = page;
+
+ return (local_read(&bpage->commit) & ~RB_MISSED_FLAGS)
+ BUF_PAGE_HDR_SIZE;
}
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-4.4/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-4.4/tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
queue-4.4/ring-buffer-mask-out-the-info-bits-when-returning-buffer-page-length.patch
queue-4.4/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
ASoC: twl4030: fix child-node lookup
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-twl4030-fix-child-node-lookup.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 15f8c5f2415bfac73f33a14bcd83422bcbfb5298 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Mon, 13 Nov 2017 12:12:56 +0100
Subject: ASoC: twl4030: fix child-node lookup
From: Johan Hovold <johan(a)kernel.org>
commit 15f8c5f2415bfac73f33a14bcd83422bcbfb5298 upstream.
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
To make things worse, the parent codec node was also prematurely freed,
while the child node was leaked.
Fixes: 2d6d649a2e0f ("ASoC: twl4030: Support for DT booted kernel")
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/codecs/twl4030.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/sound/soc/codecs/twl4030.c
+++ b/sound/soc/codecs/twl4030.c
@@ -232,7 +232,7 @@ static struct twl4030_codec_data *twl403
struct twl4030_codec_data *pdata = dev_get_platdata(codec->dev);
struct device_node *twl4030_codec_node = NULL;
- twl4030_codec_node = of_find_node_by_name(codec->dev->parent->of_node,
+ twl4030_codec_node = of_get_child_by_name(codec->dev->parent->of_node,
"codec");
if (!pdata && twl4030_codec_node) {
@@ -241,9 +241,11 @@ static struct twl4030_codec_data *twl403
GFP_KERNEL);
if (!pdata) {
dev_err(codec->dev, "Can not allocate memory\n");
+ of_node_put(twl4030_codec_node);
return NULL;
}
twl4030_setup_pdata_of(pdata, twl4030_codec_node);
+ of_node_put(twl4030_codec_node);
}
return pdata;
Patches currently in stable-queue which might be from johan(a)kernel.org are
queue-4.4/mfd-twl4030-audio-fix-sibling-node-lookup.patch
queue-4.4/asoc-twl4030-fix-child-node-lookup.patch
queue-4.4/mfd-twl6040-fix-child-node-lookup.patch
This is a note to let you know that I've just added the patch titled
iw_cxgb4: Only validate the MSN for successful completions
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iw_cxgb4-only-validate-the-msn-for-successful-completions.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f55688c45442bc863f40ad678c638785b26cdce6 Mon Sep 17 00:00:00 2001
From: Steve Wise <swise(a)opengridcomputing.com>
Date: Mon, 18 Dec 2017 13:10:00 -0800
Subject: iw_cxgb4: Only validate the MSN for successful completions
From: Steve Wise <swise(a)opengridcomputing.com>
commit f55688c45442bc863f40ad678c638785b26cdce6 upstream.
If the RECV CQE is in error, ignore the MSN check. This was causing
recvs that were flushed into the sw cq to be completed with the wrong
status (BAD_MSN instead of FLUSHED).
Signed-off-by: Steve Wise <swise(a)opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/cxgb4/cq.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -579,10 +579,10 @@ static int poll_cq(struct t4_wq *wq, str
ret = -EAGAIN;
goto skip_cqe;
}
- if (unlikely((CQE_WRID_MSN(hw_cqe) != (wq->rq.msn)))) {
+ if (unlikely(!CQE_STATUS(hw_cqe) &&
+ CQE_WRID_MSN(hw_cqe) != wq->rq.msn)) {
t4_set_wq_in_error(wq);
- hw_cqe->header |= htonl(CQE_STATUS_V(T4_ERR_MSN));
- goto proc_cqe;
+ hw_cqe->header |= cpu_to_be32(CQE_STATUS_V(T4_ERR_MSN));
}
goto proc_cqe;
}
Patches currently in stable-queue which might be from swise(a)opengridcomputing.com are
queue-4.4/iw_cxgb4-only-validate-the-msn-for-successful-completions.patch
This is a note to let you know that I've just added the patch titled
ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-fsl_ssi-ac-97-ops-need-regmap-clock-and-cleaning-up-on-failure.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 695b78b548d8a26288f041e907ff17758df9e1d5 Mon Sep 17 00:00:00 2001
From: "Maciej S. Szmigiero" <mail(a)maciej.szmigiero.name>
Date: Mon, 20 Nov 2017 23:14:55 +0100
Subject: ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
From: Maciej S. Szmigiero <mail(a)maciej.szmigiero.name>
commit 695b78b548d8a26288f041e907ff17758df9e1d5 upstream.
AC'97 ops (register read / write) need SSI regmap and clock, so they have
to be set after them.
We also need to set these ops back to NULL if we fail the probe.
Signed-off-by: Maciej S. Szmigiero <mail(a)maciej.szmigiero.name>
Acked-by: Nicolin Chen <nicoleotsuka(a)gmail.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/fsl/fsl_ssi.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -1408,12 +1408,6 @@ static int fsl_ssi_probe(struct platform
sizeof(fsl_ssi_ac97_dai));
fsl_ac97_data = ssi_private;
-
- ret = snd_soc_set_ac97_ops_of_reset(&fsl_ssi_ac97_ops, pdev);
- if (ret) {
- dev_err(&pdev->dev, "could not set AC'97 ops\n");
- return ret;
- }
} else {
/* Initialize this copy of the CPU DAI driver structure */
memcpy(&ssi_private->cpu_dai_drv, &fsl_ssi_dai_template,
@@ -1473,6 +1467,14 @@ static int fsl_ssi_probe(struct platform
return ret;
}
+ if (fsl_ssi_is_ac97(ssi_private)) {
+ ret = snd_soc_set_ac97_ops_of_reset(&fsl_ssi_ac97_ops, pdev);
+ if (ret) {
+ dev_err(&pdev->dev, "could not set AC'97 ops\n");
+ goto error_ac97_ops;
+ }
+ }
+
ret = devm_snd_soc_register_component(&pdev->dev, &fsl_ssi_component,
&ssi_private->cpu_dai_drv, 1);
if (ret) {
@@ -1556,6 +1558,10 @@ error_sound_card:
fsl_ssi_debugfs_remove(&ssi_private->dbg_stats);
error_asoc_register:
+ if (fsl_ssi_is_ac97(ssi_private))
+ snd_soc_set_ac97_ops(NULL);
+
+error_ac97_ops:
if (ssi_private->soc->imx)
fsl_ssi_imx_clean(pdev, ssi_private);
Patches currently in stable-queue which might be from mail(a)maciej.szmigiero.name are
queue-4.4/asoc-fsl_ssi-ac-97-ops-need-regmap-clock-and-cleaning-up-on-failure.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda: Drop useless WARN_ON()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-drop-useless-warn_on.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a36c2638380c0a4676647a1f553b70b20d3ebce1 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Fri, 22 Dec 2017 10:45:07 +0100
Subject: ALSA: hda: Drop useless WARN_ON()
From: Takashi Iwai <tiwai(a)suse.de>
commit a36c2638380c0a4676647a1f553b70b20d3ebce1 upstream.
Since the commit 97cc2ed27e5a ("ALSA: hda - Fix yet another i915
pointer leftover in error path") cleared hdac_acomp pointer, the
WARN_ON() non-NULL check in snd_hdac_i915_register_notifier() may give
a false-positive warning, as the function gets called no matter
whether the component is registered or not. For fixing it, let's get
rid of the spurious WARN_ON().
Fixes: 97cc2ed27e5a ("ALSA: hda - Fix yet another i915 pointer leftover in error path")
Reported-by: Kouta Okamoto <kouta.okamoto(a)toshiba.co.jp>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/hda/hdac_i915.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/hda/hdac_i915.c
+++ b/sound/hda/hdac_i915.c
@@ -183,7 +183,7 @@ static int hdac_component_master_match(s
*/
int snd_hdac_i915_register_notifier(const struct i915_audio_component_audio_ops *aops)
{
- if (WARN_ON(!hdac_acomp))
+ if (!hdac_acomp)
return -ENODEV;
hdac_acomp->audio_ops = aops;
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.4/alsa-usb-audio-fix-the-missing-ctl-name-suffix-at-parsing-su.patch
queue-4.4/alsa-hda-drop-useless-warn_on.patch
queue-4.4/alsa-rawmidi-avoid-racy-info-ioctl-via-ctl-device.patch
queue-4.4/alsa-hda-fix-headset-mic-detection-issue-on-a-dell-machine.patch
queue-4.4/acpi-apei-erst-fix-missing-error-handling-in-erst_reader.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda - fix headset mic detection issue on a Dell machine
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-fix-headset-mic-detection-issue-on-a-dell-machine.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 285d5ddcffafa5d5e68c586f4c9eaa8b24a2897d Mon Sep 17 00:00:00 2001
From: Hui Wang <hui.wang(a)canonical.com>
Date: Fri, 22 Dec 2017 11:17:45 +0800
Subject: ALSA: hda - fix headset mic detection issue on a Dell machine
From: Hui Wang <hui.wang(a)canonical.com>
commit 285d5ddcffafa5d5e68c586f4c9eaa8b24a2897d upstream.
It has the codec alc256, and add its pin definition to pin quirk
table to let it apply ALC255_FIXUP_DELL1_MIC_NO_PRESENCE.
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/pci/hda/patch_realtek.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -5954,6 +5954,11 @@ static const struct snd_hda_pin_quirk al
{0x1b, 0x01011020},
{0x21, 0x02211010}),
SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE,
+ {0x12, 0x90a60130},
+ {0x14, 0x90170110},
+ {0x1b, 0x01011020},
+ {0x21, 0x0221101f}),
+ SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE,
{0x12, 0x90a60160},
{0x14, 0x90170120},
{0x21, 0x02211030}),
Patches currently in stable-queue which might be from hui.wang(a)canonical.com are
queue-4.4/alsa-hda-fix-headset-mic-detection-issue-on-a-dell-machine.patch
This is a note to let you know that I've just added the patch titled
ring-buffer: Mask out the info bits when returning buffer page length
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ring-buffer-mask-out-the-info-bits-when-returning-buffer-page-length.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 22 Dec 2017 20:32:35 -0500
Subject: ring-buffer: Mask out the info bits when returning buffer page length
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 upstream.
Two info bits were added to the "commit" part of the ring buffer data page
when returned to be consumed. This was to inform the user space readers that
events have been missed, and that the count may be stored at the end of the
page.
What wasn't handled, was the splice code that actually called a function to
return the length of the data in order to zero out the rest of the page
before sending it up to user space. These data bits were returned with the
length making the value negative, and that negative value was not checked.
It was compared to PAGE_SIZE, and only used if the size was less than
PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
unsigned compare, meaning the negative size value did not end up causing a
large portion of memory to be randomly zeroed out.
Fixes: 66a8cb95ed040 ("ring-buffer: Add place holder recording of dropped events")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/ring_buffer.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -281,6 +281,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data
/* Missed count stored at end */
#define RB_MISSED_STORED (1 << 30)
+#define RB_MISSED_FLAGS (RB_MISSED_EVENTS|RB_MISSED_STORED)
+
struct buffer_data_page {
u64 time_stamp; /* page time stamp */
local_t commit; /* write committed index */
@@ -332,7 +334,9 @@ static void rb_init_page(struct buffer_d
*/
size_t ring_buffer_page_len(void *page)
{
- return local_read(&((struct buffer_data_page *)page)->commit)
+ struct buffer_data_page *bpage = page;
+
+ return (local_read(&bpage->commit) & ~RB_MISSED_FLAGS)
+ BUF_PAGE_HDR_SIZE;
}
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-4.14/ring-buffer-do-no-reuse-reader-page-if-still-in-use.patch
queue-4.14/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-4.14/tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
queue-4.14/ring-buffer-mask-out-the-info-bits-when-returning-buffer-page-length.patch
queue-4.14/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
iw_cxgb4: Only validate the MSN for successful completions
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iw_cxgb4-only-validate-the-msn-for-successful-completions.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f55688c45442bc863f40ad678c638785b26cdce6 Mon Sep 17 00:00:00 2001
From: Steve Wise <swise(a)opengridcomputing.com>
Date: Mon, 18 Dec 2017 13:10:00 -0800
Subject: iw_cxgb4: Only validate the MSN for successful completions
From: Steve Wise <swise(a)opengridcomputing.com>
commit f55688c45442bc863f40ad678c638785b26cdce6 upstream.
If the RECV CQE is in error, ignore the MSN check. This was causing
recvs that were flushed into the sw cq to be completed with the wrong
status (BAD_MSN instead of FLUSHED).
Signed-off-by: Steve Wise <swise(a)opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/cxgb4/cq.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -586,10 +586,10 @@ static int poll_cq(struct t4_wq *wq, str
ret = -EAGAIN;
goto skip_cqe;
}
- if (unlikely((CQE_WRID_MSN(hw_cqe) != (wq->rq.msn)))) {
+ if (unlikely(!CQE_STATUS(hw_cqe) &&
+ CQE_WRID_MSN(hw_cqe) != wq->rq.msn)) {
t4_set_wq_in_error(wq);
- hw_cqe->header |= htonl(CQE_STATUS_V(T4_ERR_MSN));
- goto proc_cqe;
+ hw_cqe->header |= cpu_to_be32(CQE_STATUS_V(T4_ERR_MSN));
}
goto proc_cqe;
}
Patches currently in stable-queue which might be from swise(a)opengridcomputing.com are
queue-4.14/iw_cxgb4-only-validate-the-msn-for-successful-completions.patch
This is a note to let you know that I've just added the patch titled
ring-buffer: Do no reuse reader page if still in use
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ring-buffer-do-no-reuse-reader-page-if-still-in-use.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ae415fa4c5248a8cf4faabd5a3c20576cb1ad607 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 22 Dec 2017 21:19:29 -0500
Subject: ring-buffer: Do no reuse reader page if still in use
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit ae415fa4c5248a8cf4faabd5a3c20576cb1ad607 upstream.
To free the reader page that is allocated with ring_buffer_alloc_read_page(),
ring_buffer_free_read_page() must be called. For faster performance, this
page can be reused by the ring buffer to avoid having to free and allocate
new pages.
The issue arises when the page is used with a splice pipe into the
networking code. The networking code may up the page counter for the page,
and keep it active while sending it is queued to go to the network. The
incrementing of the page ref does not prevent it from being reused in the
ring buffer, and this can cause the page that is being sent out to the
network to be modified before it is sent by reading new data.
Add a check to the page ref counter, and only reuse the page if it is not
being used anywhere else.
Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/ring_buffer.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4443,8 +4443,13 @@ void ring_buffer_free_read_page(struct r
{
struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
struct buffer_data_page *bpage = data;
+ struct page *page = virt_to_page(bpage);
unsigned long flags;
+ /* If the page is still in use someplace else, we can't reuse it */
+ if (page_ref_count(page) > 1)
+ goto out;
+
local_irq_save(flags);
arch_spin_lock(&cpu_buffer->lock);
@@ -4456,6 +4461,7 @@ void ring_buffer_free_read_page(struct r
arch_spin_unlock(&cpu_buffer->lock);
local_irq_restore(flags);
+ out:
free_page((unsigned long)bpage);
}
EXPORT_SYMBOL_GPL(ring_buffer_free_read_page);
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-4.14/ring-buffer-do-no-reuse-reader-page-if-still-in-use.patch
queue-4.14/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-4.14/tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
queue-4.14/ring-buffer-mask-out-the-info-bits-when-returning-buffer-page-length.patch
queue-4.14/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
IB/mlx5: Serialize access to the VMA list
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-mlx5-serialize-access-to-the-vma-list.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ad9a3668a434faca1339789ed2f043d679199309 Mon Sep 17 00:00:00 2001
From: Majd Dibbiny <majd(a)mellanox.com>
Date: Sun, 24 Dec 2017 13:54:56 +0200
Subject: IB/mlx5: Serialize access to the VMA list
From: Majd Dibbiny <majd(a)mellanox.com>
commit ad9a3668a434faca1339789ed2f043d679199309 upstream.
User-space applications can do mmap and munmap directly at
any time.
Since the VMA list is not protected with a mutex, concurrent
accesses to the VMA list from the mmap and munmap can cause
data corruption. Add a mutex around the list.
Fixes: 7c2344c3bbf9 ("IB/mlx5: Implements disassociate_ucontext API")
Reviewed-by: Yishai Hadas <yishaih(a)mellanox.com>
Signed-off-by: Majd Dibbiny <majd(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/mlx5/main.c | 8 ++++++++
drivers/infiniband/hw/mlx5/mlx5_ib.h | 4 ++++
2 files changed, 12 insertions(+)
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1415,6 +1415,7 @@ static struct ib_ucontext *mlx5_ib_alloc
}
INIT_LIST_HEAD(&context->vma_private_list);
+ mutex_init(&context->vma_private_list_mutex);
INIT_LIST_HEAD(&context->db_page_list);
mutex_init(&context->db_page_mutex);
@@ -1576,7 +1577,9 @@ static void mlx5_ib_vma_close(struct vm
* mlx5_ib_disassociate_ucontext().
*/
mlx5_ib_vma_priv_data->vma = NULL;
+ mutex_lock(mlx5_ib_vma_priv_data->vma_private_list_mutex);
list_del(&mlx5_ib_vma_priv_data->list);
+ mutex_unlock(mlx5_ib_vma_priv_data->vma_private_list_mutex);
kfree(mlx5_ib_vma_priv_data);
}
@@ -1596,10 +1599,13 @@ static int mlx5_ib_set_vma_data(struct v
return -ENOMEM;
vma_prv->vma = vma;
+ vma_prv->vma_private_list_mutex = &ctx->vma_private_list_mutex;
vma->vm_private_data = vma_prv;
vma->vm_ops = &mlx5_ib_vm_ops;
+ mutex_lock(&ctx->vma_private_list_mutex);
list_add(&vma_prv->list, vma_head);
+ mutex_unlock(&ctx->vma_private_list_mutex);
return 0;
}
@@ -1642,6 +1648,7 @@ static void mlx5_ib_disassociate_ucontex
* mlx5_ib_vma_close.
*/
down_write(&owning_mm->mmap_sem);
+ mutex_lock(&context->vma_private_list_mutex);
list_for_each_entry_safe(vma_private, n, &context->vma_private_list,
list) {
vma = vma_private->vma;
@@ -1656,6 +1663,7 @@ static void mlx5_ib_disassociate_ucontex
list_del(&vma_private->list);
kfree(vma_private);
}
+ mutex_unlock(&context->vma_private_list_mutex);
up_write(&owning_mm->mmap_sem);
mmput(owning_mm);
put_task_struct(owning_process);
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -115,6 +115,8 @@ enum {
struct mlx5_ib_vma_private_data {
struct list_head list;
struct vm_area_struct *vma;
+ /* protect vma_private_list add/del */
+ struct mutex *vma_private_list_mutex;
};
struct mlx5_ib_ucontext {
@@ -129,6 +131,8 @@ struct mlx5_ib_ucontext {
/* Transport Domain number */
u32 tdn;
struct list_head vma_private_list;
+ /* protect vma_private_list add/del */
+ struct mutex vma_private_list_mutex;
unsigned long upd_xlt_page;
/* protect ODP/KSM */
Patches currently in stable-queue which might be from majd(a)mellanox.com are
queue-4.14/ib-mlx5-serialize-access-to-the-vma-list.patch
This is a note to let you know that I've just added the patch titled
IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-uverbs-fix-command-checking-as-part-of-ib_uverbs_ex_modify_qp.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 05d14e7b0c138cb07ba30e464f47b39434f3fdef Mon Sep 17 00:00:00 2001
From: Moni Shoua <monis(a)mellanox.com>
Date: Sun, 24 Dec 2017 13:54:57 +0200
Subject: IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()
From: Moni Shoua <monis(a)mellanox.com>
commit 05d14e7b0c138cb07ba30e464f47b39434f3fdef upstream.
If the input command length is larger than the kernel supports an error should
be returned in case the unsupported bytes are not cleared, instead of the
other way aroudn. This matches what all other callers of ib_is_udata_cleared
do and will avoid user ABI problems in the future.
Fixes: 189aba99e700 ("IB/uverbs: Extend modify_qp and support packet pacing")
Reviewed-by: Yishai Hadas <yishaih(a)mellanox.com>
Signed-off-by: Moni Shoua <monis(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/core/uverbs_cmd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2085,8 +2085,8 @@ int ib_uverbs_ex_modify_qp(struct ib_uve
return -EOPNOTSUPP;
if (ucore->inlen > sizeof(cmd)) {
- if (ib_is_udata_cleared(ucore, sizeof(cmd),
- ucore->inlen - sizeof(cmd)))
+ if (!ib_is_udata_cleared(ucore, sizeof(cmd),
+ ucore->inlen - sizeof(cmd)))
return -EOPNOTSUPP;
}
Patches currently in stable-queue which might be from monis(a)mellanox.com are
queue-4.14/ib-core-verify-that-qp-is-security-enabled-in-create-and-destroy.patch
queue-4.14/ib-uverbs-fix-command-checking-as-part-of-ib_uverbs_ex_modify_qp.patch
This is a note to let you know that I've just added the patch titled
IB/core: Verify that QP is security enabled in create and destroy
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-core-verify-that-qp-is-security-enabled-in-create-and-destroy.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4a50881bbac309e6f0684816a180bc3c14e1485d Mon Sep 17 00:00:00 2001
From: Moni Shoua <monis(a)mellanox.com>
Date: Sun, 24 Dec 2017 13:54:58 +0200
Subject: IB/core: Verify that QP is security enabled in create and destroy
From: Moni Shoua <monis(a)mellanox.com>
commit 4a50881bbac309e6f0684816a180bc3c14e1485d upstream.
The XRC target QP create flow sets up qp_sec only if there is an IB link with
LSM security enabled. However, several other related uAPI entry points blindly
follow the qp_sec NULL pointer, resulting in a possible oops.
Check for NULL before using qp_sec.
Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Reviewed-by: Daniel Jurgens <danielj(a)mellanox.com>
Signed-off-by: Moni Shoua <monis(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/core/security.c | 3 +++
drivers/infiniband/core/verbs.c | 3 ++-
2 files changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -386,6 +386,9 @@ int ib_open_shared_qp_security(struct ib
if (ret)
return ret;
+ if (!qp->qp_sec)
+ return 0;
+
mutex_lock(&real_qp->qp_sec->mutex);
ret = check_qp_port_pkey_settings(real_qp->qp_sec->ports_pkeys,
qp->qp_sec);
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1400,7 +1400,8 @@ int ib_close_qp(struct ib_qp *qp)
spin_unlock_irqrestore(&real_qp->device->event_handler_lock, flags);
atomic_dec(&real_qp->usecnt);
- ib_close_shared_qp_security(qp->qp_sec);
+ if (qp->qp_sec)
+ ib_close_shared_qp_security(qp->qp_sec);
kfree(qp);
return 0;
Patches currently in stable-queue which might be from monis(a)mellanox.com are
queue-4.14/ib-core-verify-that-qp-is-security-enabled-in-create-and-destroy.patch
queue-4.14/ib-uverbs-fix-command-checking-as-part-of-ib_uverbs_ex_modify_qp.patch
This is a note to let you know that I've just added the patch titled
IB/hfi: Only read capability registers if the capability exists
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-hfi-only-read-capability-registers-if-the-capability-exists.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4c009af473b2026caaa26107e34d7cc68dad7756 Mon Sep 17 00:00:00 2001
From: "Michael J. Ruhl" <michael.j.ruhl(a)intel.com>
Date: Fri, 22 Dec 2017 08:47:20 -0800
Subject: IB/hfi: Only read capability registers if the capability exists
From: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
commit 4c009af473b2026caaa26107e34d7cc68dad7756 upstream.
During driver init, various registers are saved to allow restoration
after an FLR or gen3 bump. Some of these registers are not available
in some circumstances (i.e. Virtual machines).
This bug makes the driver unusable when the PCI device is passed into
a VM, it fails during probe.
Delete unnecessary register read/write, and only access register if
the capability exists.
Fixes: a618b7e40af2 ("IB/hfi1: Move saving PCI values to a separate function")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/hfi1/hfi.h | 1 -
drivers/infiniband/hw/hfi1/pcie.c | 30 ++++++++++++------------------
2 files changed, 12 insertions(+), 19 deletions(-)
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1129,7 +1129,6 @@ struct hfi1_devdata {
u16 pcie_lnkctl;
u16 pcie_devctl2;
u32 pci_msix0;
- u32 pci_lnkctl3;
u32 pci_tph2;
/*
--- a/drivers/infiniband/hw/hfi1/pcie.c
+++ b/drivers/infiniband/hw/hfi1/pcie.c
@@ -411,15 +411,12 @@ int restore_pci_variables(struct hfi1_de
if (ret)
goto error;
- ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_SPCIE1,
- dd->pci_lnkctl3);
- if (ret)
- goto error;
-
- ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_TPH2, dd->pci_tph2);
- if (ret)
- goto error;
-
+ if (pci_find_ext_capability(dd->pcidev, PCI_EXT_CAP_ID_TPH)) {
+ ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_TPH2,
+ dd->pci_tph2);
+ if (ret)
+ goto error;
+ }
return 0;
error:
@@ -469,15 +466,12 @@ int save_pci_variables(struct hfi1_devda
if (ret)
goto error;
- ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_SPCIE1,
- &dd->pci_lnkctl3);
- if (ret)
- goto error;
-
- ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_TPH2, &dd->pci_tph2);
- if (ret)
- goto error;
-
+ if (pci_find_ext_capability(dd->pcidev, PCI_EXT_CAP_ID_TPH)) {
+ ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_TPH2,
+ &dd->pci_tph2);
+ if (ret)
+ goto error;
+ }
return 0;
error:
Patches currently in stable-queue which might be from michael.j.ruhl(a)intel.com are
queue-4.14/ib-hfi-only-read-capability-registers-if-the-capability-exists.patch
This is a note to let you know that I've just added the patch titled
gpio: fix "gpio-line-names" property retrieval
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
gpio-fix-gpio-line-names-property-retrieval.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 822703354774ec935169cbbc8d503236bcb54fda Mon Sep 17 00:00:00 2001
From: Christophe Leroy <christophe.leroy(a)c-s.fr>
Date: Fri, 15 Dec 2017 15:02:33 +0100
Subject: gpio: fix "gpio-line-names" property retrieval
From: Christophe Leroy <christophe.leroy(a)c-s.fr>
commit 822703354774ec935169cbbc8d503236bcb54fda upstream.
Following commit 9427ecbed46cc ("gpio: Rework of_gpiochip_set_names()
to use device property accessors"), "gpio-line-names" DT property is
not retrieved anymore when chip->parent is not set by the driver.
This is due to OF based property reads having been replaced by device
based property reads.
This patch fixes that by making use of
fwnode_property_read_string_array() instead of
device_property_read_string_array() and handing over either
of_fwnode_handle(chip->of_node) or dev_fwnode(chip->parent)
to that function.
Fixes: 9427ecbed46cc ("gpio: Rework of_gpiochip_set_names() to use device property accessors")
Signed-off-by: Christophe Leroy <christophe.leroy(a)c-s.fr>
Acked-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/gpio/gpiolib-acpi.c | 2 +-
drivers/gpio/gpiolib-devprop.c | 17 +++++++----------
drivers/gpio/gpiolib-of.c | 3 ++-
drivers/gpio/gpiolib.h | 3 ++-
4 files changed, 12 insertions(+), 13 deletions(-)
--- a/drivers/gpio/gpiolib-acpi.c
+++ b/drivers/gpio/gpiolib-acpi.c
@@ -1074,7 +1074,7 @@ void acpi_gpiochip_add(struct gpio_chip
}
if (!chip->names)
- devprop_gpiochip_set_names(chip);
+ devprop_gpiochip_set_names(chip, dev_fwnode(chip->parent));
acpi_gpiochip_request_regions(acpi_gpio);
acpi_gpiochip_scan_gpios(acpi_gpio);
--- a/drivers/gpio/gpiolib-devprop.c
+++ b/drivers/gpio/gpiolib-devprop.c
@@ -19,30 +19,27 @@
/**
* devprop_gpiochip_set_names - Set GPIO line names using device properties
* @chip: GPIO chip whose lines should be named, if possible
+ * @fwnode: Property Node containing the gpio-line-names property
*
* Looks for device property "gpio-line-names" and if it exists assigns
* GPIO line names for the chip. The memory allocated for the assigned
* names belong to the underlying firmware node and should not be released
* by the caller.
*/
-void devprop_gpiochip_set_names(struct gpio_chip *chip)
+void devprop_gpiochip_set_names(struct gpio_chip *chip,
+ const struct fwnode_handle *fwnode)
{
struct gpio_device *gdev = chip->gpiodev;
const char **names;
int ret, i;
- if (!chip->parent) {
- dev_warn(&gdev->dev, "GPIO chip parent is NULL\n");
- return;
- }
-
- ret = device_property_read_string_array(chip->parent, "gpio-line-names",
+ ret = fwnode_property_read_string_array(fwnode, "gpio-line-names",
NULL, 0);
if (ret < 0)
return;
if (ret != gdev->ngpio) {
- dev_warn(chip->parent,
+ dev_warn(&gdev->dev,
"names %d do not match number of GPIOs %d\n", ret,
gdev->ngpio);
return;
@@ -52,10 +49,10 @@ void devprop_gpiochip_set_names(struct g
if (!names)
return;
- ret = device_property_read_string_array(chip->parent, "gpio-line-names",
+ ret = fwnode_property_read_string_array(fwnode, "gpio-line-names",
names, gdev->ngpio);
if (ret < 0) {
- dev_warn(chip->parent, "failed to read GPIO line names\n");
+ dev_warn(&gdev->dev, "failed to read GPIO line names\n");
kfree(names);
return;
}
--- a/drivers/gpio/gpiolib-of.c
+++ b/drivers/gpio/gpiolib-of.c
@@ -493,7 +493,8 @@ int of_gpiochip_add(struct gpio_chip *ch
/* If the chip defines names itself, these take precedence */
if (!chip->names)
- devprop_gpiochip_set_names(chip);
+ devprop_gpiochip_set_names(chip,
+ of_fwnode_handle(chip->of_node));
of_node_get(chip->of_node);
--- a/drivers/gpio/gpiolib.h
+++ b/drivers/gpio/gpiolib.h
@@ -224,7 +224,8 @@ static inline int gpio_chip_hwgpio(const
return desc - &desc->gdev->descs[0];
}
-void devprop_gpiochip_set_names(struct gpio_chip *chip);
+void devprop_gpiochip_set_names(struct gpio_chip *chip,
+ const struct fwnode_handle *fwnode);
/* With descriptor prefix */
Patches currently in stable-queue which might be from christophe.leroy(a)c-s.fr are
queue-4.14/gpio-fix-gpio-line-names-property-retrieval.patch
This is a note to let you know that I've just added the patch titled
cpufreq: schedutil: Use idle_calls counter of the remote CPU
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
cpufreq-schedutil-use-idle_calls-counter-of-the-remote-cpu.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 466a2b42d67644447a1765276259a3ea5531ddff Mon Sep 17 00:00:00 2001
From: Joel Fernandes <joelaf(a)google.com>
Date: Thu, 21 Dec 2017 02:22:45 +0100
Subject: cpufreq: schedutil: Use idle_calls counter of the remote CPU
From: Joel Fernandes <joelaf(a)google.com>
commit 466a2b42d67644447a1765276259a3ea5531ddff upstream.
Since the recent remote cpufreq callback work, its possible that a cpufreq
update is triggered from a remote CPU. For single policies however, the current
code uses the local CPU when trying to determine if the remote sg_cpu entered
idle or is busy. This is incorrect. To remedy this, compare with the nohz tick
idle_calls counter of the remote CPU.
Fixes: 674e75411fc2 (sched: cpufreq: Allow remote cpufreq callbacks)
Acked-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Joel Fernandes <joelaf(a)google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/tick.h | 1 +
kernel/sched/cpufreq_schedutil.c | 2 +-
kernel/time/tick-sched.c | 13 +++++++++++++
3 files changed, 15 insertions(+), 1 deletion(-)
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -119,6 +119,7 @@ extern void tick_nohz_idle_exit(void);
extern void tick_nohz_irq_exit(void);
extern ktime_t tick_nohz_get_sleep_length(void);
extern unsigned long tick_nohz_get_idle_calls(void);
+extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
#else /* !CONFIG_NO_HZ_COMMON */
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -244,7 +244,7 @@ static void sugov_iowait_boost(struct su
#ifdef CONFIG_NO_HZ_COMMON
static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
{
- unsigned long idle_calls = tick_nohz_get_idle_calls();
+ unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
bool ret = idle_calls == sg_cpu->saved_idle_calls;
sg_cpu->saved_idle_calls = idle_calls;
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1010,6 +1010,19 @@ ktime_t tick_nohz_get_sleep_length(void)
}
/**
+ * tick_nohz_get_idle_calls_cpu - return the current idle calls counter value
+ * for a particular CPU.
+ *
+ * Called from the schedutil frequency scaling governor in scheduler context.
+ */
+unsigned long tick_nohz_get_idle_calls_cpu(int cpu)
+{
+ struct tick_sched *ts = tick_get_tick_sched(cpu);
+
+ return ts->idle_calls;
+}
+
+/**
* tick_nohz_get_idle_calls - return the current idle calls counter value
*
* Called from the schedutil frequency scaling governor in scheduler context.
Patches currently in stable-queue which might be from joelaf(a)google.com are
queue-4.14/cpufreq-schedutil-use-idle_calls-counter-of-the-remote-cpu.patch
This is a note to let you know that I've just added the patch titled
ASoC: wm_adsp: Fix validation of firmware and coeff lengths
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-wm_adsp-fix-validation-of-firmware-and-coeff-lengths.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 50dd2ea8ef67a1617e0c0658bcbec4b9fb03b936 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
Date: Fri, 8 Dec 2017 16:15:20 +0000
Subject: ASoC: wm_adsp: Fix validation of firmware and coeff lengths
From: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
commit 50dd2ea8ef67a1617e0c0658bcbec4b9fb03b936 upstream.
The checks for whether another region/block header could be present
are subtracting the size from the current offset. Obviously we should
instead subtract the offset from the size.
The checks for whether the region/block data fit in the file are
adding the data size to the current offset and header size, without
checking for integer overflow. Rearrange these so that overflow is
impossible.
Signed-off-by: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
Acked-by: Charles Keepax <ckeepax(a)opensource.cirrus.com>
Tested-by: Charles Keepax <ckeepax(a)opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/codecs/wm_adsp.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/sound/soc/codecs/wm_adsp.c
+++ b/sound/soc/codecs/wm_adsp.c
@@ -1733,7 +1733,7 @@ static int wm_adsp_load(struct wm_adsp *
le64_to_cpu(footer->timestamp));
while (pos < firmware->size &&
- pos - firmware->size > sizeof(*region)) {
+ sizeof(*region) < firmware->size - pos) {
region = (void *)&(firmware->data[pos]);
region_name = "Unknown";
reg = 0;
@@ -1782,8 +1782,8 @@ static int wm_adsp_load(struct wm_adsp *
regions, le32_to_cpu(region->len), offset,
region_name);
- if ((pos + le32_to_cpu(region->len) + sizeof(*region)) >
- firmware->size) {
+ if (le32_to_cpu(region->len) >
+ firmware->size - pos - sizeof(*region)) {
adsp_err(dsp,
"%s.%d: %s region len %d bytes exceeds file length %zu\n",
file, regions, region_name,
@@ -2253,7 +2253,7 @@ static int wm_adsp_load_coeff(struct wm_
blocks = 0;
while (pos < firmware->size &&
- pos - firmware->size > sizeof(*blk)) {
+ sizeof(*blk) < firmware->size - pos) {
blk = (void *)(&firmware->data[pos]);
type = le16_to_cpu(blk->type);
@@ -2327,8 +2327,8 @@ static int wm_adsp_load_coeff(struct wm_
}
if (reg) {
- if ((pos + le32_to_cpu(blk->len) + sizeof(*blk)) >
- firmware->size) {
+ if (le32_to_cpu(blk->len) >
+ firmware->size - pos - sizeof(*blk)) {
adsp_err(dsp,
"%s.%d: %s region len %d bytes exceeds file length %zu\n",
file, blocks, region_name,
Patches currently in stable-queue which might be from ben.hutchings(a)codethink.co.uk are
queue-4.14/asoc-wm_adsp-fix-validation-of-firmware-and-coeff-lengths.patch
This is a note to let you know that I've just added the patch titled
ASoC: twl4030: fix child-node lookup
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-twl4030-fix-child-node-lookup.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 15f8c5f2415bfac73f33a14bcd83422bcbfb5298 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Mon, 13 Nov 2017 12:12:56 +0100
Subject: ASoC: twl4030: fix child-node lookup
From: Johan Hovold <johan(a)kernel.org>
commit 15f8c5f2415bfac73f33a14bcd83422bcbfb5298 upstream.
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
To make things worse, the parent codec node was also prematurely freed,
while the child node was leaked.
Fixes: 2d6d649a2e0f ("ASoC: twl4030: Support for DT booted kernel")
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/codecs/twl4030.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/sound/soc/codecs/twl4030.c
+++ b/sound/soc/codecs/twl4030.c
@@ -232,7 +232,7 @@ static struct twl4030_codec_data *twl403
struct twl4030_codec_data *pdata = dev_get_platdata(codec->dev);
struct device_node *twl4030_codec_node = NULL;
- twl4030_codec_node = of_find_node_by_name(codec->dev->parent->of_node,
+ twl4030_codec_node = of_get_child_by_name(codec->dev->parent->of_node,
"codec");
if (!pdata && twl4030_codec_node) {
@@ -241,9 +241,11 @@ static struct twl4030_codec_data *twl403
GFP_KERNEL);
if (!pdata) {
dev_err(codec->dev, "Can not allocate memory\n");
+ of_node_put(twl4030_codec_node);
return NULL;
}
twl4030_setup_pdata_of(pdata, twl4030_codec_node);
+ of_node_put(twl4030_codec_node);
}
return pdata;
Patches currently in stable-queue which might be from johan(a)kernel.org are
queue-4.14/asoc-twl4030-fix-child-node-lookup.patch
queue-4.14/asoc-da7218-fix-fix-child-node-lookup.patch
This is a note to let you know that I've just added the patch titled
ASoC: da7218: fix fix child-node lookup
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-da7218-fix-fix-child-node-lookup.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From bc6476d6c1edcb9b97621b5131bd169aa81f27db Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Mon, 13 Nov 2017 12:12:55 +0100
Subject: ASoC: da7218: fix fix child-node lookup
From: Johan Hovold <johan(a)kernel.org>
commit bc6476d6c1edcb9b97621b5131bd169aa81f27db upstream.
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
To make things worse, the parent codec node was also prematurely freed.
Fixes: 4d50934abd22 ("ASoC: da7218: Add da7218 codec driver")
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Acked-by: Adam Thomson <Adam.Thomson.Opensource(a)diasemi.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/codecs/da7218.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/codecs/da7218.c
+++ b/sound/soc/codecs/da7218.c
@@ -2520,7 +2520,7 @@ static struct da7218_pdata *da7218_of_to
}
if (da7218->dev_id == DA7218_DEV_ID) {
- hpldet_np = of_find_node_by_name(np, "da7218_hpldet");
+ hpldet_np = of_get_child_by_name(np, "da7218_hpldet");
if (!hpldet_np)
return pdata;
Patches currently in stable-queue which might be from johan(a)kernel.org are
queue-4.14/asoc-twl4030-fix-child-node-lookup.patch
queue-4.14/asoc-da7218-fix-fix-child-node-lookup.patch
This is a note to let you know that I've just added the patch titled
ASoC: tlv320aic31xx: Fix GPIO1 register definition
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-tlv320aic31xx-fix-gpio1-register-definition.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 737e0b7b67bdfe24090fab2852044bb283282fc5 Mon Sep 17 00:00:00 2001
From: "Andrew F. Davis" <afd(a)ti.com>
Date: Wed, 29 Nov 2017 15:32:46 -0600
Subject: ASoC: tlv320aic31xx: Fix GPIO1 register definition
From: Andrew F. Davis <afd(a)ti.com>
commit 737e0b7b67bdfe24090fab2852044bb283282fc5 upstream.
GPIO1 control register is number 51, fix this here.
Fixes: bafcbfe429eb ("ASoC: tlv320aic31xx: Make the register values human readable")
Signed-off-by: Andrew F. Davis <afd(a)ti.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/codecs/tlv320aic31xx.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/codecs/tlv320aic31xx.h
+++ b/sound/soc/codecs/tlv320aic31xx.h
@@ -116,7 +116,7 @@ struct aic31xx_pdata {
/* INT2 interrupt control */
#define AIC31XX_INT2CTRL AIC31XX_REG(0, 49)
/* GPIO1 control */
-#define AIC31XX_GPIO1 AIC31XX_REG(0, 50)
+#define AIC31XX_GPIO1 AIC31XX_REG(0, 51)
#define AIC31XX_DACPRB AIC31XX_REG(0, 60)
/* ADC Instruction Set Register */
Patches currently in stable-queue which might be from afd(a)ti.com are
queue-4.14/asoc-tlv320aic31xx-fix-gpio1-register-definition.patch
This is a note to let you know that I've just added the patch titled
ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-fsl_ssi-ac-97-ops-need-regmap-clock-and-cleaning-up-on-failure.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 695b78b548d8a26288f041e907ff17758df9e1d5 Mon Sep 17 00:00:00 2001
From: "Maciej S. Szmigiero" <mail(a)maciej.szmigiero.name>
Date: Mon, 20 Nov 2017 23:14:55 +0100
Subject: ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
From: Maciej S. Szmigiero <mail(a)maciej.szmigiero.name>
commit 695b78b548d8a26288f041e907ff17758df9e1d5 upstream.
AC'97 ops (register read / write) need SSI regmap and clock, so they have
to be set after them.
We also need to set these ops back to NULL if we fail the probe.
Signed-off-by: Maciej S. Szmigiero <mail(a)maciej.szmigiero.name>
Acked-by: Nicolin Chen <nicoleotsuka(a)gmail.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/fsl/fsl_ssi.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -1452,12 +1452,6 @@ static int fsl_ssi_probe(struct platform
sizeof(fsl_ssi_ac97_dai));
fsl_ac97_data = ssi_private;
-
- ret = snd_soc_set_ac97_ops_of_reset(&fsl_ssi_ac97_ops, pdev);
- if (ret) {
- dev_err(&pdev->dev, "could not set AC'97 ops\n");
- return ret;
- }
} else {
/* Initialize this copy of the CPU DAI driver structure */
memcpy(&ssi_private->cpu_dai_drv, &fsl_ssi_dai_template,
@@ -1568,6 +1562,14 @@ static int fsl_ssi_probe(struct platform
return ret;
}
+ if (fsl_ssi_is_ac97(ssi_private)) {
+ ret = snd_soc_set_ac97_ops_of_reset(&fsl_ssi_ac97_ops, pdev);
+ if (ret) {
+ dev_err(&pdev->dev, "could not set AC'97 ops\n");
+ goto error_ac97_ops;
+ }
+ }
+
ret = devm_snd_soc_register_component(&pdev->dev, &fsl_ssi_component,
&ssi_private->cpu_dai_drv, 1);
if (ret) {
@@ -1651,6 +1653,10 @@ error_sound_card:
fsl_ssi_debugfs_remove(&ssi_private->dbg_stats);
error_asoc_register:
+ if (fsl_ssi_is_ac97(ssi_private))
+ snd_soc_set_ac97_ops(NULL);
+
+error_ac97_ops:
if (ssi_private->soc->imx)
fsl_ssi_imx_clean(pdev, ssi_private);
Patches currently in stable-queue which might be from mail(a)maciej.szmigiero.name are
queue-4.14/asoc-fsl_ssi-ac-97-ops-need-regmap-clock-and-cleaning-up-on-failure.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda - Fix missing COEF init for ALC225/295/299
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-fix-missing-coef-init-for-alc225-295-299.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 44be77c590f381bc629815ac789b8b15ecc4ddcf Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Wed, 27 Dec 2017 08:53:59 +0100
Subject: ALSA: hda - Fix missing COEF init for ALC225/295/299
From: Takashi Iwai <tiwai(a)suse.de>
commit 44be77c590f381bc629815ac789b8b15ecc4ddcf upstream.
There was a long-standing problem on HP Spectre X360 with Kabylake
where it lacks of the front speaker output in some situations. Also
there are other products showing the similar behavior. The culprit
seems to be the missing COEF setup on ALC codecs, ALC225/295/299,
which are all compatible.
This patch adds the proper COEF setup (to initialize idx 0x67 / bits
0x3000) for addressing the issue.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195457
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/pci/hda/patch_realtek.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -324,8 +324,12 @@ static void alc_fill_eapd_coef(struct hd
case 0x10ec0292:
alc_update_coef_idx(codec, 0x4, 1<<15, 0);
break;
- case 0x10ec0215:
case 0x10ec0225:
+ case 0x10ec0295:
+ case 0x10ec0299:
+ alc_update_coef_idx(codec, 0x67, 0xf000, 0x3000);
+ /* fallthrough */
+ case 0x10ec0215:
case 0x10ec0233:
case 0x10ec0236:
case 0x10ec0255:
@@ -336,10 +340,8 @@ static void alc_fill_eapd_coef(struct hd
case 0x10ec0286:
case 0x10ec0288:
case 0x10ec0285:
- case 0x10ec0295:
case 0x10ec0298:
case 0x10ec0289:
- case 0x10ec0299:
alc_update_coef_idx(codec, 0x10, 1<<9, 0);
break;
case 0x10ec0275:
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.14/alsa-hda-drop-useless-warn_on.patch
queue-4.14/alsa-hda-fix-missing-coef-init-for-alc225-295-299.patch
queue-4.14/alsa-hda-fix-headset-mic-detection-issue-on-a-dell-machine.patch
queue-4.14/alsa-hda-change-the-location-for-one-mic-on-a-lenovo-machine.patch
queue-4.14/alsa-hda-add-mic_no_presence-fixup-for-2-hp-machines.patch
This is a note to let you know that I've just added the patch titled
ASoC: codecs: msm8916-wcd: Fix supported formats
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-codecs-msm8916-wcd-fix-supported-formats.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 51f493ae71adc2c49a317a13c38e54e1cdf46005 Mon Sep 17 00:00:00 2001
From: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Date: Thu, 30 Nov 2017 10:15:02 +0000
Subject: ASoC: codecs: msm8916-wcd: Fix supported formats
From: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
commit 51f493ae71adc2c49a317a13c38e54e1cdf46005 upstream.
This codec is configurable for only 16 bit and 32 bit samples, so reflect
this in the supported formats also remove 24bit sample from supported list.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/codecs/msm8916-wcd-analog.c | 2 +-
sound/soc/codecs/msm8916-wcd-digital.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
--- a/sound/soc/codecs/msm8916-wcd-analog.c
+++ b/sound/soc/codecs/msm8916-wcd-analog.c
@@ -267,7 +267,7 @@
#define MSM8916_WCD_ANALOG_RATES (SNDRV_PCM_RATE_8000 | SNDRV_PCM_RATE_16000 |\
SNDRV_PCM_RATE_32000 | SNDRV_PCM_RATE_48000)
#define MSM8916_WCD_ANALOG_FORMATS (SNDRV_PCM_FMTBIT_S16_LE |\
- SNDRV_PCM_FMTBIT_S24_LE)
+ SNDRV_PCM_FMTBIT_S32_LE)
static int btn_mask = SND_JACK_BTN_0 | SND_JACK_BTN_1 |
SND_JACK_BTN_2 | SND_JACK_BTN_3 | SND_JACK_BTN_4;
--- a/sound/soc/codecs/msm8916-wcd-digital.c
+++ b/sound/soc/codecs/msm8916-wcd-digital.c
@@ -194,7 +194,7 @@
SNDRV_PCM_RATE_32000 | \
SNDRV_PCM_RATE_48000)
#define MSM8916_WCD_DIGITAL_FORMATS (SNDRV_PCM_FMTBIT_S16_LE |\
- SNDRV_PCM_FMTBIT_S24_LE)
+ SNDRV_PCM_FMTBIT_S32_LE)
struct msm8916_wcd_digital_priv {
struct clk *ahbclk, *mclk;
@@ -645,7 +645,7 @@ static int msm8916_wcd_digital_hw_params
RX_I2S_CTL_RX_I2S_MODE_MASK,
RX_I2S_CTL_RX_I2S_MODE_16);
break;
- case SNDRV_PCM_FORMAT_S24_LE:
+ case SNDRV_PCM_FORMAT_S32_LE:
snd_soc_update_bits(dai->codec, LPASS_CDC_CLK_TX_I2S_CTL,
TX_I2S_CTL_TX_I2S_MODE_MASK,
TX_I2S_CTL_TX_I2S_MODE_32);
Patches currently in stable-queue which might be from srinivas.kandagatla(a)linaro.org are
queue-4.14/asoc-codecs-msm8916-wcd-fix-supported-formats.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda - change the location for one mic on a Lenovo machine
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-change-the-location-for-one-mic-on-a-lenovo-machine.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 8da5bbfc7cbba909f4f32d5e1dda3750baa5d853 Mon Sep 17 00:00:00 2001
From: Hui Wang <hui.wang(a)canonical.com>
Date: Fri, 22 Dec 2017 11:17:46 +0800
Subject: ALSA: hda - change the location for one mic on a Lenovo machine
From: Hui Wang <hui.wang(a)canonical.com>
commit 8da5bbfc7cbba909f4f32d5e1dda3750baa5d853 upstream.
There are two front mics on this machine, and current driver assign
the same name Mic to both of them, but pulseaudio can't handle them.
As a workaround, we change the location for one of them, then the
driver will assign "Front Mic" and "Mic" for them.
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/pci/hda/patch_realtek.c | 1 +
1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6305,6 +6305,7 @@ static const struct snd_pci_quirk alc269
SND_PCI_QUIRK(0x17aa, 0x30bb, "ThinkCentre AIO", ALC233_FIXUP_LENOVO_LINE2_MIC_HOTKEY),
SND_PCI_QUIRK(0x17aa, 0x30e2, "ThinkCentre AIO", ALC233_FIXUP_LENOVO_LINE2_MIC_HOTKEY),
SND_PCI_QUIRK(0x17aa, 0x310c, "ThinkCentre Station", ALC294_FIXUP_LENOVO_MIC_LOCATION),
+ SND_PCI_QUIRK(0x17aa, 0x313c, "ThinkCentre Station", ALC294_FIXUP_LENOVO_MIC_LOCATION),
SND_PCI_QUIRK(0x17aa, 0x3112, "ThinkCentre AIO", ALC233_FIXUP_LENOVO_LINE2_MIC_HOTKEY),
SND_PCI_QUIRK(0x17aa, 0x3902, "Lenovo E50-80", ALC269_FIXUP_DMIC_THINKPAD_ACPI),
SND_PCI_QUIRK(0x17aa, 0x3977, "IdeaPad S210", ALC283_FIXUP_INT_MIC),
Patches currently in stable-queue which might be from hui.wang(a)canonical.com are
queue-4.14/alsa-hda-fix-headset-mic-detection-issue-on-a-dell-machine.patch
queue-4.14/alsa-hda-change-the-location-for-one-mic-on-a-lenovo-machine.patch
queue-4.14/alsa-hda-add-mic_no_presence-fixup-for-2-hp-machines.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda: Drop useless WARN_ON()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-drop-useless-warn_on.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a36c2638380c0a4676647a1f553b70b20d3ebce1 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Fri, 22 Dec 2017 10:45:07 +0100
Subject: ALSA: hda: Drop useless WARN_ON()
From: Takashi Iwai <tiwai(a)suse.de>
commit a36c2638380c0a4676647a1f553b70b20d3ebce1 upstream.
Since the commit 97cc2ed27e5a ("ALSA: hda - Fix yet another i915
pointer leftover in error path") cleared hdac_acomp pointer, the
WARN_ON() non-NULL check in snd_hdac_i915_register_notifier() may give
a false-positive warning, as the function gets called no matter
whether the component is registered or not. For fixing it, let's get
rid of the spurious WARN_ON().
Fixes: 97cc2ed27e5a ("ALSA: hda - Fix yet another i915 pointer leftover in error path")
Reported-by: Kouta Okamoto <kouta.okamoto(a)toshiba.co.jp>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/hda/hdac_i915.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/hda/hdac_i915.c
+++ b/sound/hda/hdac_i915.c
@@ -325,7 +325,7 @@ static int hdac_component_master_match(s
*/
int snd_hdac_i915_register_notifier(const struct i915_audio_component_audio_ops *aops)
{
- if (WARN_ON(!hdac_acomp))
+ if (!hdac_acomp)
return -ENODEV;
hdac_acomp->audio_ops = aops;
Patches currently in stable-queue which might be from tiwai(a)suse.de are
queue-4.14/alsa-hda-drop-useless-warn_on.patch
queue-4.14/alsa-hda-fix-missing-coef-init-for-alc225-295-299.patch
queue-4.14/alsa-hda-fix-headset-mic-detection-issue-on-a-dell-machine.patch
queue-4.14/alsa-hda-change-the-location-for-one-mic-on-a-lenovo-machine.patch
queue-4.14/alsa-hda-add-mic_no_presence-fixup-for-2-hp-machines.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda - fix headset mic detection issue on a Dell machine
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-fix-headset-mic-detection-issue-on-a-dell-machine.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 285d5ddcffafa5d5e68c586f4c9eaa8b24a2897d Mon Sep 17 00:00:00 2001
From: Hui Wang <hui.wang(a)canonical.com>
Date: Fri, 22 Dec 2017 11:17:45 +0800
Subject: ALSA: hda - fix headset mic detection issue on a Dell machine
From: Hui Wang <hui.wang(a)canonical.com>
commit 285d5ddcffafa5d5e68c586f4c9eaa8b24a2897d upstream.
It has the codec alc256, and add its pin definition to pin quirk
table to let it apply ALC255_FIXUP_DELL1_MIC_NO_PRESENCE.
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/pci/hda/patch_realtek.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6559,6 +6559,11 @@ static const struct snd_hda_pin_quirk al
{0x1b, 0x01011020},
{0x21, 0x02211010}),
SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE,
+ {0x12, 0x90a60130},
+ {0x14, 0x90170110},
+ {0x1b, 0x01011020},
+ {0x21, 0x0221101f}),
+ SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE,
{0x12, 0x90a60160},
{0x14, 0x90170120},
{0x21, 0x02211030}),
Patches currently in stable-queue which might be from hui.wang(a)canonical.com are
queue-4.14/alsa-hda-fix-headset-mic-detection-issue-on-a-dell-machine.patch
queue-4.14/alsa-hda-change-the-location-for-one-mic-on-a-lenovo-machine.patch
queue-4.14/alsa-hda-add-mic_no_presence-fixup-for-2-hp-machines.patch
This is a note to let you know that I've just added the patch titled
ALSA: hda - Add MIC_NO_PRESENCE fixup for 2 HP machines
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
alsa-hda-add-mic_no_presence-fixup-for-2-hp-machines.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 322f74ede933b3e2cb78768b6a6fdbfbf478a0c1 Mon Sep 17 00:00:00 2001
From: Hui Wang <hui.wang(a)canonical.com>
Date: Fri, 22 Dec 2017 11:17:44 +0800
Subject: ALSA: hda - Add MIC_NO_PRESENCE fixup for 2 HP machines
From: Hui Wang <hui.wang(a)canonical.com>
commit 322f74ede933b3e2cb78768b6a6fdbfbf478a0c1 upstream.
There is a headset jack on the front panel, when we plug a headset
into it, the headset mic can't trigger unsol events, and
read_pin_sense() can't detect its presence too. So add this fixup
to fix this issue.
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/pci/hda/patch_conexant.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
--- a/sound/pci/hda/patch_conexant.c
+++ b/sound/pci/hda/patch_conexant.c
@@ -271,6 +271,8 @@ enum {
CXT_FIXUP_HP_SPECTRE,
CXT_FIXUP_HP_GATE_MIC,
CXT_FIXUP_MUTE_LED_GPIO,
+ CXT_FIXUP_HEADSET_MIC,
+ CXT_FIXUP_HP_MIC_NO_PRESENCE,
};
/* for hda_fixup_thinkpad_acpi() */
@@ -350,6 +352,18 @@ static void cxt_fixup_headphone_mic(stru
}
}
+static void cxt_fixup_headset_mic(struct hda_codec *codec,
+ const struct hda_fixup *fix, int action)
+{
+ struct conexant_spec *spec = codec->spec;
+
+ switch (action) {
+ case HDA_FIXUP_ACT_PRE_PROBE:
+ spec->parse_flags |= HDA_PINCFG_HEADSET_MIC;
+ break;
+ }
+}
+
/* OPLC XO 1.5 fixup */
/* OLPC XO-1.5 supports DC input mode (e.g. for use with analog sensors)
@@ -880,6 +894,19 @@ static const struct hda_fixup cxt_fixups
.type = HDA_FIXUP_FUNC,
.v.func = cxt_fixup_mute_led_gpio,
},
+ [CXT_FIXUP_HEADSET_MIC] = {
+ .type = HDA_FIXUP_FUNC,
+ .v.func = cxt_fixup_headset_mic,
+ },
+ [CXT_FIXUP_HP_MIC_NO_PRESENCE] = {
+ .type = HDA_FIXUP_PINS,
+ .v.pins = (const struct hda_pintbl[]) {
+ { 0x1a, 0x02a1113c },
+ { }
+ },
+ .chained = true,
+ .chain_id = CXT_FIXUP_HEADSET_MIC,
+ },
};
static const struct snd_pci_quirk cxt5045_fixups[] = {
@@ -934,6 +961,8 @@ static const struct snd_pci_quirk cxt506
SND_PCI_QUIRK(0x103c, 0x8115, "HP Z1 Gen3", CXT_FIXUP_HP_GATE_MIC),
SND_PCI_QUIRK(0x103c, 0x814f, "HP ZBook 15u G3", CXT_FIXUP_MUTE_LED_GPIO),
SND_PCI_QUIRK(0x103c, 0x822e, "HP ProBook 440 G4", CXT_FIXUP_MUTE_LED_GPIO),
+ SND_PCI_QUIRK(0x103c, 0x8299, "HP 800 G3 SFF", CXT_FIXUP_HP_MIC_NO_PRESENCE),
+ SND_PCI_QUIRK(0x103c, 0x829a, "HP 800 G3 DM", CXT_FIXUP_HP_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1043, 0x138d, "Asus", CXT_FIXUP_HEADPHONE_MIC_PIN),
SND_PCI_QUIRK(0x152d, 0x0833, "OLPC XO-1.5", CXT_FIXUP_OLPC_XO),
SND_PCI_QUIRK(0x17aa, 0x20f2, "Lenovo T400", CXT_PINCFG_LENOVO_TP410),
Patches currently in stable-queue which might be from hui.wang(a)canonical.com are
queue-4.14/alsa-hda-fix-headset-mic-detection-issue-on-a-dell-machine.patch
queue-4.14/alsa-hda-change-the-location-for-one-mic-on-a-lenovo-machine.patch
queue-4.14/alsa-hda-add-mic_no_presence-fixup-for-2-hp-machines.patch
This is a note to let you know that I've just added the patch titled
ring-buffer: Mask out the info bits when returning buffer page length
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ring-buffer-mask-out-the-info-bits-when-returning-buffer-page-length.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 22 Dec 2017 20:32:35 -0500
Subject: ring-buffer: Mask out the info bits when returning buffer page length
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 upstream.
Two info bits were added to the "commit" part of the ring buffer data page
when returned to be consumed. This was to inform the user space readers that
events have been missed, and that the count may be stored at the end of the
page.
What wasn't handled, was the splice code that actually called a function to
return the length of the data in order to zero out the rest of the page
before sending it up to user space. These data bits were returned with the
length making the value negative, and that negative value was not checked.
It was compared to PAGE_SIZE, and only used if the size was less than
PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
unsigned compare, meaning the negative size value did not end up causing a
large portion of memory to be randomly zeroed out.
Fixes: 66a8cb95ed040 ("ring-buffer: Add place holder recording of dropped events")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/ring_buffer.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -336,6 +336,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data
/* Missed count stored at end */
#define RB_MISSED_STORED (1 << 30)
+#define RB_MISSED_FLAGS (RB_MISSED_EVENTS|RB_MISSED_STORED)
+
struct buffer_data_page {
u64 time_stamp; /* page time stamp */
local_t commit; /* write committed index */
@@ -387,7 +389,9 @@ static void rb_init_page(struct buffer_d
*/
size_t ring_buffer_page_len(void *page)
{
- return local_read(&((struct buffer_data_page *)page)->commit)
+ struct buffer_data_page *bpage = page;
+
+ return (local_read(&bpage->commit) & ~RB_MISSED_FLAGS)
+ BUF_PAGE_HDR_SIZE;
}
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-3.18/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-3.18/tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
queue-3.18/ring-buffer-mask-out-the-info-bits-when-returning-buffer-page-length.patch
queue-3.18/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
ASoC: twl4030: fix child-node lookup
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
asoc-twl4030-fix-child-node-lookup.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 15f8c5f2415bfac73f33a14bcd83422bcbfb5298 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Mon, 13 Nov 2017 12:12:56 +0100
Subject: ASoC: twl4030: fix child-node lookup
From: Johan Hovold <johan(a)kernel.org>
commit 15f8c5f2415bfac73f33a14bcd83422bcbfb5298 upstream.
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
To make things worse, the parent codec node was also prematurely freed,
while the child node was leaked.
Fixes: 2d6d649a2e0f ("ASoC: twl4030: Support for DT booted kernel")
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
sound/soc/codecs/twl4030.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/sound/soc/codecs/twl4030.c
+++ b/sound/soc/codecs/twl4030.c
@@ -232,7 +232,7 @@ static struct twl4030_codec_data *twl403
struct twl4030_codec_data *pdata = dev_get_platdata(codec->dev);
struct device_node *twl4030_codec_node = NULL;
- twl4030_codec_node = of_find_node_by_name(codec->dev->parent->of_node,
+ twl4030_codec_node = of_get_child_by_name(codec->dev->parent->of_node,
"codec");
if (!pdata && twl4030_codec_node) {
@@ -241,9 +241,11 @@ static struct twl4030_codec_data *twl403
GFP_KERNEL);
if (!pdata) {
dev_err(codec->dev, "Can not allocate memory\n");
+ of_node_put(twl4030_codec_node);
return NULL;
}
twl4030_setup_pdata_of(pdata, twl4030_codec_node);
+ of_node_put(twl4030_codec_node);
}
return pdata;
Patches currently in stable-queue which might be from johan(a)kernel.org are
queue-3.18/mfd-twl4030-audio-fix-sibling-node-lookup.patch
queue-3.18/asoc-twl4030-fix-child-node-lookup.patch
queue-3.18/mfd-twl6040-fix-child-node-lookup.patch
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d14587334580bc94d3ee11e8320e0c157f91ae8f Mon Sep 17 00:00:00 2001
From: Steve Wise <swise(a)opengridcomputing.com>
Date: Tue, 19 Dec 2017 14:02:10 -0800
Subject: [PATCH] iw_cxgb4: when flushing, complete all wrs in a chain
If a wr chain was posted and needed to be flushed, only the first
wr in the chain was completed with FLUSHED status. The rest were
never completed. This caused isert to hang on shutdown due to the
missing completions which left iscsi IO commands referenced, stalling
the shutdown.
Fixes: 4fe7c2962e11 ("iw_cxgb4: refactor sq/rq drain logic")
Cc: stable(a)vger.kernel.org
Signed-off-by: Steve Wise <swise(a)opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 21495f917bcc..d5c92fc520d6 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -858,6 +858,22 @@ static int complete_sq_drain_wr(struct c4iw_qp *qhp, struct ib_send_wr *wr)
return 0;
}
+static int complete_sq_drain_wrs(struct c4iw_qp *qhp, struct ib_send_wr *wr,
+ struct ib_send_wr **bad_wr)
+{
+ int ret = 0;
+
+ while (wr) {
+ ret = complete_sq_drain_wr(qhp, wr);
+ if (ret) {
+ *bad_wr = wr;
+ break;
+ }
+ wr = wr->next;
+ }
+ return ret;
+}
+
static void complete_rq_drain_wr(struct c4iw_qp *qhp, struct ib_recv_wr *wr)
{
struct t4_cqe cqe = {};
@@ -890,6 +906,14 @@ static void complete_rq_drain_wr(struct c4iw_qp *qhp, struct ib_recv_wr *wr)
}
}
+static void complete_rq_drain_wrs(struct c4iw_qp *qhp, struct ib_recv_wr *wr)
+{
+ while (wr) {
+ complete_rq_drain_wr(qhp, wr);
+ wr = wr->next;
+ }
+}
+
int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
struct ib_send_wr **bad_wr)
{
@@ -913,7 +937,7 @@ int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
*/
if (qhp->wq.flushed) {
spin_unlock_irqrestore(&qhp->lock, flag);
- err = complete_sq_drain_wr(qhp, wr);
+ err = complete_sq_drain_wrs(qhp, wr, bad_wr);
return err;
}
num_wrs = t4_sq_avail(&qhp->wq);
@@ -1061,7 +1085,7 @@ int c4iw_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr,
*/
if (qhp->wq.flushed) {
spin_unlock_irqrestore(&qhp->lock, flag);
- complete_rq_drain_wr(qhp, wr);
+ complete_rq_drain_wrs(qhp, wr);
return err;
}
num_wrs = t4_rq_avail(&qhp->wq);
This is a note to let you know that I've just added the patch titled
x86/pti: Put the LDT in its own PGD if PTI is on
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f55f0501cbf65ec41cca5058513031b711730b1d Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Tue, 12 Dec 2017 07:56:45 -0800
Subject: x86/pti: Put the LDT in its own PGD if PTI is on
From: Andy Lutomirski <luto(a)kernel.org>
commit f55f0501cbf65ec41cca5058513031b711730b1d upstream.
With PTI enabled, the LDT must be mapped in the usermode tables somewhere.
The LDT is per process, i.e. per mm.
An earlier approach mapped the LDT on context switch into a fixmap area,
but that's a big overhead and exhausted the fixmap space when NR_CPUS got
big.
Take advantage of the fact that there is an address space hole which
provides a completely unused pgd. Use this pgd to manage per-mm LDT
mappings.
This has a down side: the LDT isn't (currently) randomized, and an attack
that can write the LDT is instant root due to call gates (thanks, AMD, for
leaving call gates in AMD64 but designing them wrong so they're only useful
for exploits). This can be mitigated by making the LDT read-only or
randomizing the mapping, either of which is strightforward on top of this
patch.
This will significantly slow down LDT users, but that shouldn't matter for
important workloads -- the LDT is only used by DOSEMU(2), Wine, and very
old libc implementations.
[ tglx: Cleaned it up. ]
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/x86/x86_64/mm.txt | 3
arch/x86/include/asm/mmu_context.h | 59 ++++++++++++-
arch/x86/include/asm/pgtable_64_types.h | 4
arch/x86/include/asm/processor.h | 23 +++--
arch/x86/kernel/ldt.c | 139 +++++++++++++++++++++++++++++++-
arch/x86/mm/dump_pagetables.c | 9 ++
6 files changed, 220 insertions(+), 17 deletions(-)
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -12,6 +12,7 @@ ffffea0000000000 - ffffeaffffffffff (=40
... unused hole ...
ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
... unused hole ...
+fffffe0000000000 - fffffe7fffffffff (=39 bits) LDT remap for PTI
fffffe8000000000 - fffffeffffffffff (=39 bits) cpu_entry_area mapping
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
@@ -29,7 +30,7 @@ Virtual memory map with 5 level page tab
hole caused by [56:63] sign extension
ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
-ff90000000000000 - ff9fffffffffffff (=52 bits) hole
+ff90000000000000 - ff9fffffffffffff (=52 bits) LDT remap for PTI
ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space (12800 TB)
ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -50,10 +50,33 @@ struct ldt_struct {
* call gates. On native, we could merge the ldt_struct and LDT
* allocations, but it's not worth trying to optimize.
*/
- struct desc_struct *entries;
- unsigned int nr_entries;
+ struct desc_struct *entries;
+ unsigned int nr_entries;
+
+ /*
+ * If PTI is in use, then the entries array is not mapped while we're
+ * in user mode. The whole array will be aliased at the addressed
+ * given by ldt_slot_va(slot). We use two slots so that we can allocate
+ * and map, and enable a new LDT without invalidating the mapping
+ * of an older, still-in-use LDT.
+ *
+ * slot will be -1 if this LDT doesn't have an alias mapping.
+ */
+ int slot;
};
+/* This is a multiple of PAGE_SIZE. */
+#define LDT_SLOT_STRIDE (LDT_ENTRIES * LDT_ENTRY_SIZE)
+
+static inline void *ldt_slot_va(int slot)
+{
+#ifdef CONFIG_X86_64
+ return (void *)(LDT_BASE_ADDR + LDT_SLOT_STRIDE * slot);
+#else
+ BUG();
+#endif
+}
+
/*
* Used for LDT copy/destruction.
*/
@@ -64,6 +87,7 @@ static inline void init_new_context_ldt(
}
int ldt_dup_context(struct mm_struct *oldmm, struct mm_struct *mm);
void destroy_context_ldt(struct mm_struct *mm);
+void ldt_arch_exit_mmap(struct mm_struct *mm);
#else /* CONFIG_MODIFY_LDT_SYSCALL */
static inline void init_new_context_ldt(struct mm_struct *mm) { }
static inline int ldt_dup_context(struct mm_struct *oldmm,
@@ -71,7 +95,8 @@ static inline int ldt_dup_context(struct
{
return 0;
}
-static inline void destroy_context_ldt(struct mm_struct *mm) {}
+static inline void destroy_context_ldt(struct mm_struct *mm) { }
+static inline void ldt_arch_exit_mmap(struct mm_struct *mm) { }
#endif
static inline void load_mm_ldt(struct mm_struct *mm)
@@ -96,10 +121,31 @@ static inline void load_mm_ldt(struct mm
* that we can see.
*/
- if (unlikely(ldt))
- set_ldt(ldt->entries, ldt->nr_entries);
- else
+ if (unlikely(ldt)) {
+ if (static_cpu_has(X86_FEATURE_PTI)) {
+ if (WARN_ON_ONCE((unsigned long)ldt->slot > 1)) {
+ /*
+ * Whoops -- either the new LDT isn't mapped
+ * (if slot == -1) or is mapped into a bogus
+ * slot (if slot > 1).
+ */
+ clear_LDT();
+ return;
+ }
+
+ /*
+ * If page table isolation is enabled, ldt->entries
+ * will not be mapped in the userspace pagetables.
+ * Tell the CPU to access the LDT through the alias
+ * at ldt_slot_va(ldt->slot).
+ */
+ set_ldt(ldt_slot_va(ldt->slot), ldt->nr_entries);
+ } else {
+ set_ldt(ldt->entries, ldt->nr_entries);
+ }
+ } else {
clear_LDT();
+ }
#else
clear_LDT();
#endif
@@ -194,6 +240,7 @@ static inline int arch_dup_mmap(struct m
static inline void arch_exit_mmap(struct mm_struct *mm)
{
paravirt_arch_exit_mmap(mm);
+ ldt_arch_exit_mmap(mm);
}
#ifdef CONFIG_X86_64
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -82,10 +82,14 @@ typedef struct { pteval_t pte; } pte_t;
# define VMALLOC_SIZE_TB _AC(12800, UL)
# define __VMALLOC_BASE _AC(0xffa0000000000000, UL)
# define __VMEMMAP_BASE _AC(0xffd4000000000000, UL)
+# define LDT_PGD_ENTRY _AC(-112, UL)
+# define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT)
#else
# define VMALLOC_SIZE_TB _AC(32, UL)
# define __VMALLOC_BASE _AC(0xffffc90000000000, UL)
# define __VMEMMAP_BASE _AC(0xffffea0000000000, UL)
+# define LDT_PGD_ENTRY _AC(-4, UL)
+# define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT)
#endif
#ifdef CONFIG_RANDOMIZE_MEMORY
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -851,13 +851,22 @@ static inline void spin_lock_prefetch(co
#else
/*
- * User space process size. 47bits minus one guard page. The guard
- * page is necessary on Intel CPUs: if a SYSCALL instruction is at
- * the highest possible canonical userspace address, then that
- * syscall will enter the kernel with a non-canonical return
- * address, and SYSRET will explode dangerously. We avoid this
- * particular problem by preventing anything from being mapped
- * at the maximum canonical address.
+ * User space process size. This is the first address outside the user range.
+ * There are a few constraints that determine this:
+ *
+ * On Intel CPUs, if a SYSCALL instruction is at the highest canonical
+ * address, then that syscall will enter the kernel with a
+ * non-canonical return address, and SYSRET will explode dangerously.
+ * We avoid this particular problem by preventing anything executable
+ * from being mapped at the maximum canonical address.
+ *
+ * On AMD CPUs in the Ryzen family, there's a nasty bug in which the
+ * CPUs malfunction if they execute code from the highest canonical page.
+ * They'll speculate right off the end of the canonical space, and
+ * bad things happen. This is worked around in the same way as the
+ * Intel problem.
+ *
+ * With page table isolation enabled, we map the LDT in ... [stay tuned]
*/
#define TASK_SIZE_MAX ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -24,6 +24,7 @@
#include <linux/uaccess.h>
#include <asm/ldt.h>
+#include <asm/tlb.h>
#include <asm/desc.h>
#include <asm/mmu_context.h>
#include <asm/syscalls.h>
@@ -51,13 +52,11 @@ static void refresh_ldt_segments(void)
static void flush_ldt(void *__mm)
{
struct mm_struct *mm = __mm;
- mm_context_t *pc;
if (this_cpu_read(cpu_tlbstate.loaded_mm) != mm)
return;
- pc = &mm->context;
- set_ldt(pc->ldt->entries, pc->ldt->nr_entries);
+ load_mm_ldt(mm);
refresh_ldt_segments();
}
@@ -94,10 +93,121 @@ static struct ldt_struct *alloc_ldt_stru
return NULL;
}
+ /* The new LDT isn't aliased for PTI yet. */
+ new_ldt->slot = -1;
+
new_ldt->nr_entries = num_entries;
return new_ldt;
}
+/*
+ * If PTI is enabled, this maps the LDT into the kernelmode and
+ * usermode tables for the given mm.
+ *
+ * There is no corresponding unmap function. Even if the LDT is freed, we
+ * leave the PTEs around until the slot is reused or the mm is destroyed.
+ * This is harmless: the LDT is always in ordinary memory, and no one will
+ * access the freed slot.
+ *
+ * If we wanted to unmap freed LDTs, we'd also need to do a flush to make
+ * it useful, and the flush would slow down modify_ldt().
+ */
+static int
+map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ bool is_vmalloc, had_top_level_entry;
+ unsigned long va;
+ spinlock_t *ptl;
+ pgd_t *pgd;
+ int i;
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return 0;
+
+ /*
+ * Any given ldt_struct should have map_ldt_struct() called at most
+ * once.
+ */
+ WARN_ON(ldt->slot != -1);
+
+ /*
+ * Did we already have the top level entry allocated? We can't
+ * use pgd_none() for this because it doens't do anything on
+ * 4-level page table kernels.
+ */
+ pgd = pgd_offset(mm, LDT_BASE_ADDR);
+ had_top_level_entry = (pgd->pgd != 0);
+
+ is_vmalloc = is_vmalloc_addr(ldt->entries);
+
+ for (i = 0; i * PAGE_SIZE < ldt->nr_entries * LDT_ENTRY_SIZE; i++) {
+ unsigned long offset = i << PAGE_SHIFT;
+ const void *src = (char *)ldt->entries + offset;
+ unsigned long pfn;
+ pte_t pte, *ptep;
+
+ va = (unsigned long)ldt_slot_va(slot) + offset;
+ pfn = is_vmalloc ? vmalloc_to_pfn(src) :
+ page_to_pfn(virt_to_page(src));
+ /*
+ * Treat the PTI LDT range as a *userspace* range.
+ * get_locked_pte() will allocate all needed pagetables
+ * and account for them in this mm.
+ */
+ ptep = get_locked_pte(mm, va, &ptl);
+ if (!ptep)
+ return -ENOMEM;
+ pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL & ~_PAGE_GLOBAL));
+ set_pte_at(mm, va, ptep, pte);
+ pte_unmap_unlock(ptep, ptl);
+ }
+
+ if (mm->context.ldt) {
+ /*
+ * We already had an LDT. The top-level entry should already
+ * have been allocated and synchronized with the usermode
+ * tables.
+ */
+ WARN_ON(!had_top_level_entry);
+ if (static_cpu_has(X86_FEATURE_PTI))
+ WARN_ON(!kernel_to_user_pgdp(pgd)->pgd);
+ } else {
+ /*
+ * This is the first time we're mapping an LDT for this process.
+ * Sync the pgd to the usermode tables.
+ */
+ WARN_ON(had_top_level_entry);
+ if (static_cpu_has(X86_FEATURE_PTI)) {
+ WARN_ON(kernel_to_user_pgdp(pgd)->pgd);
+ set_pgd(kernel_to_user_pgdp(pgd), *pgd);
+ }
+ }
+
+ va = (unsigned long)ldt_slot_va(slot);
+ flush_tlb_mm_range(mm, va, va + LDT_SLOT_STRIDE, 0);
+
+ ldt->slot = slot;
+#endif
+ return 0;
+}
+
+static void free_ldt_pgtables(struct mm_struct *mm)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ struct mmu_gather tlb;
+ unsigned long start = LDT_BASE_ADDR;
+ unsigned long end = start + (1UL << PGDIR_SHIFT);
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ tlb_gather_mmu(&tlb, mm, start, end);
+ free_pgd_range(&tlb, start, end, start, end);
+ tlb_finish_mmu(&tlb, start, end);
+#endif
+}
+
/* After calling this, the LDT is immutable. */
static void finalize_ldt_struct(struct ldt_struct *ldt)
{
@@ -156,6 +266,12 @@ int ldt_dup_context(struct mm_struct *ol
new_ldt->nr_entries * LDT_ENTRY_SIZE);
finalize_ldt_struct(new_ldt);
+ retval = map_ldt_struct(mm, new_ldt, 0);
+ if (retval) {
+ free_ldt_pgtables(mm);
+ free_ldt_struct(new_ldt);
+ goto out_unlock;
+ }
mm->context.ldt = new_ldt;
out_unlock:
@@ -174,6 +290,11 @@ void destroy_context_ldt(struct mm_struc
mm->context.ldt = NULL;
}
+void ldt_arch_exit_mmap(struct mm_struct *mm)
+{
+ free_ldt_pgtables(mm);
+}
+
static int read_ldt(void __user *ptr, unsigned long bytecount)
{
struct mm_struct *mm = current->mm;
@@ -287,6 +408,18 @@ static int write_ldt(void __user *ptr, u
new_ldt->entries[ldt_info.entry_number] = ldt;
finalize_ldt_struct(new_ldt);
+ /*
+ * If we are using PTI, map the new LDT into the userspace pagetables.
+ * If there is already an LDT, use the other slot so that other CPUs
+ * will continue to use the old LDT until install_ldt() switches
+ * them over to the new LDT.
+ */
+ error = map_ldt_struct(mm, new_ldt, old_ldt ? !old_ldt->slot : 0);
+ if (error) {
+ free_ldt_struct(old_ldt);
+ goto out_unlock;
+ }
+
install_ldt(mm, new_ldt);
free_ldt_struct(old_ldt);
error = 0;
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -52,12 +52,18 @@ enum address_markers_idx {
USER_SPACE_NR = 0,
KERNEL_SPACE_NR,
LOW_KERNEL_NR,
+#if defined(CONFIG_MODIFY_LDT_SYSCALL) && defined(CONFIG_X86_5LEVEL)
+ LDT_NR,
+#endif
VMALLOC_START_NR,
VMEMMAP_START_NR,
#ifdef CONFIG_KASAN
KASAN_SHADOW_START_NR,
KASAN_SHADOW_END_NR,
#endif
+#if defined(CONFIG_MODIFY_LDT_SYSCALL) && !defined(CONFIG_X86_5LEVEL)
+ LDT_NR,
+#endif
CPU_ENTRY_AREA_NR,
#ifdef CONFIG_X86_ESPFIX64
ESPFIX_START_NR,
@@ -82,6 +88,9 @@ static struct addr_marker address_marker
[KASAN_SHADOW_START_NR] = { KASAN_SHADOW_START, "KASAN shadow" },
[KASAN_SHADOW_END_NR] = { KASAN_SHADOW_END, "KASAN shadow end" },
#endif
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+ [LDT_NR] = { LDT_BASE_ADDR, "LDT remap" },
+#endif
[CPU_ENTRY_AREA_NR] = { CPU_ENTRY_AREA_BASE,"CPU entry Area" },
#ifdef CONFIG_X86_ESPFIX64
[ESPFIX_START_NR] = { ESPFIX_BASE_ADDR, "ESPfix Area", 16 },
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/pti: Map the vsyscall page if needed
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-pti-map-the-vsyscall-page-if-needed.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 85900ea51577e31b186e523c8f4e068c79ecc7d3 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Tue, 12 Dec 2017 07:56:42 -0800
Subject: x86/pti: Map the vsyscall page if needed
From: Andy Lutomirski <luto(a)kernel.org>
commit 85900ea51577e31b186e523c8f4e068c79ecc7d3 upstream.
Make VSYSCALLs work fully in PTI mode by mapping them properly to the user
space visible page tables.
[ tglx: Hide unused functions (Patch by Arnd Bergmann) ]
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/vsyscall/vsyscall_64.c | 6 +--
arch/x86/include/asm/vsyscall.h | 1
arch/x86/mm/pti.c | 65 ++++++++++++++++++++++++++++++++++
3 files changed, 69 insertions(+), 3 deletions(-)
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -344,14 +344,14 @@ int in_gate_area_no_mm(unsigned long add
* vsyscalls but leave the page not present. If so, we skip calling
* this.
*/
-static void __init set_vsyscall_pgtable_user_bits(void)
+void __init set_vsyscall_pgtable_user_bits(pgd_t *root)
{
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
- pgd = pgd_offset_k(VSYSCALL_ADDR);
+ pgd = pgd_offset_pgd(root, VSYSCALL_ADDR);
set_pgd(pgd, __pgd(pgd_val(*pgd) | _PAGE_USER));
p4d = p4d_offset(pgd, VSYSCALL_ADDR);
#if CONFIG_PGTABLE_LEVELS >= 5
@@ -373,7 +373,7 @@ void __init map_vsyscall(void)
vsyscall_mode == NATIVE
? PAGE_KERNEL_VSYSCALL
: PAGE_KERNEL_VVAR);
- set_vsyscall_pgtable_user_bits();
+ set_vsyscall_pgtable_user_bits(swapper_pg_dir);
}
BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -7,6 +7,7 @@
#ifdef CONFIG_X86_VSYSCALL_EMULATION
extern void map_vsyscall(void);
+extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
/*
* Called on instruction fetch fault in vsyscall page.
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -38,6 +38,7 @@
#include <asm/cpufeature.h>
#include <asm/hypervisor.h>
+#include <asm/vsyscall.h>
#include <asm/cmdline.h>
#include <asm/pti.h>
#include <asm/pgtable.h>
@@ -223,6 +224,69 @@ static pmd_t *pti_user_pagetable_walk_pm
return pmd_offset(pud, address);
}
+#ifdef CONFIG_X86_VSYSCALL_EMULATION
+/*
+ * Walk the shadow copy of the page tables (optionally) trying to allocate
+ * page table pages on the way down. Does not support large pages.
+ *
+ * Note: this is only used when mapping *new* kernel data into the
+ * user/shadow page tables. It is never used for userspace data.
+ *
+ * Returns a pointer to a PTE on success, or NULL on failure.
+ */
+static __init pte_t *pti_user_pagetable_walk_pte(unsigned long address)
+{
+ gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+ pmd_t *pmd = pti_user_pagetable_walk_pmd(address);
+ pte_t *pte;
+
+ /* We can't do anything sensible if we hit a large mapping. */
+ if (pmd_large(*pmd)) {
+ WARN_ON(1);
+ return NULL;
+ }
+
+ if (pmd_none(*pmd)) {
+ unsigned long new_pte_page = __get_free_page(gfp);
+ if (!new_pte_page)
+ return NULL;
+
+ if (pmd_none(*pmd)) {
+ set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
+ new_pte_page = 0;
+ }
+ if (new_pte_page)
+ free_page(new_pte_page);
+ }
+
+ pte = pte_offset_kernel(pmd, address);
+ if (pte_flags(*pte) & _PAGE_USER) {
+ WARN_ONCE(1, "attempt to walk to user pte\n");
+ return NULL;
+ }
+ return pte;
+}
+
+static void __init pti_setup_vsyscall(void)
+{
+ pte_t *pte, *target_pte;
+ unsigned int level;
+
+ pte = lookup_address(VSYSCALL_ADDR, &level);
+ if (!pte || WARN_ON(level != PG_LEVEL_4K) || pte_none(*pte))
+ return;
+
+ target_pte = pti_user_pagetable_walk_pte(VSYSCALL_ADDR);
+ if (WARN_ON(!target_pte))
+ return;
+
+ *target_pte = *pte;
+ set_vsyscall_pgtable_user_bits(kernel_to_user_pgdp(swapper_pg_dir));
+}
+#else
+static void __init pti_setup_vsyscall(void) { }
+#endif
+
static void __init
pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
{
@@ -319,4 +383,5 @@ void __init pti_init(void)
pti_clone_user_shared();
pti_clone_entry_text();
pti_setup_espfix64();
+ pti_setup_vsyscall();
}
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/pti: Add the pti= cmdline option and documentation
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-pti-add-the-pti-cmdline-option-and-documentation.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 41f4c20b57a4890ea7f56ff8717cc83fefb8d537 Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp(a)suse.de>
Date: Tue, 12 Dec 2017 14:39:52 +0100
Subject: x86/pti: Add the pti= cmdline option and documentation
From: Borislav Petkov <bp(a)suse.de>
commit 41f4c20b57a4890ea7f56ff8717cc83fefb8d537 upstream.
Keep the "nopti" optional for traditional reasons.
[ tglx: Don't allow force on when running on XEN PV and made 'on'
printout conditional ]
Requested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Andy Lutomirsky <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Link: https://lkml.kernel.org/r/20171212133952.10177-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/admin-guide/kernel-parameters.txt | 6 +++++
arch/x86/mm/pti.c | 26 +++++++++++++++++++++++-
2 files changed, 31 insertions(+), 1 deletion(-)
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3255,6 +3255,12 @@
pt. [PARIDE]
See Documentation/blockdev/paride.txt.
+ pti= [X86_64]
+ Control user/kernel address space isolation:
+ on - enable
+ off - disable
+ auto - default setting
+
pty.legacy_count=
[KNL] Number of legacy pty's. Overwrites compiled-in
default number.
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -54,21 +54,45 @@ static void __init pti_print_if_insecure
pr_info("%s\n", reason);
}
+static void __init pti_print_if_secure(const char *reason)
+{
+ if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
+ pr_info("%s\n", reason);
+}
+
void __init pti_check_boottime_disable(void)
{
+ char arg[5];
+ int ret;
+
if (hypervisor_is_type(X86_HYPER_XEN_PV)) {
pti_print_if_insecure("disabled on XEN PV.");
return;
}
+ ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg));
+ if (ret > 0) {
+ if (ret == 3 && !strncmp(arg, "off", 3)) {
+ pti_print_if_insecure("disabled on command line.");
+ return;
+ }
+ if (ret == 2 && !strncmp(arg, "on", 2)) {
+ pti_print_if_secure("force enabled on command line.");
+ goto enable;
+ }
+ if (ret == 4 && !strncmp(arg, "auto", 4))
+ goto autosel;
+ }
+
if (cmdline_find_option_bool(boot_command_line, "nopti")) {
pti_print_if_insecure("disabled on command line.");
return;
}
+autosel:
if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
return;
-
+enable:
setup_force_cpu_cap(X86_FEATURE_PTI);
}
Patches currently in stable-queue which might be from bp(a)suse.de are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Use INVPCID for __native_flush_tlb_single()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6cff64b86aaaa07f89f50498055a20e45754b0c1 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Mon, 4 Dec 2017 15:08:01 +0100
Subject: x86/mm: Use INVPCID for __native_flush_tlb_single()
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit 6cff64b86aaaa07f89f50498055a20e45754b0c1 upstream.
This uses INVPCID to shoot down individual lines of the user mapping
instead of marking the entire user map as invalid. This
could/might/possibly be faster.
This for sure needs tlb_single_page_flush_ceiling to be redetermined;
esp. since INVPCID is _slow_.
A detailed performance analysis is available here:
https://lkml.kernel.org/r/3062e486-3539-8a1f-5724-16199420be71@intel.com
[ Peterz: Split out from big combo patch ]
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/cpufeatures.h | 1
arch/x86/include/asm/tlbflush.h | 23 ++++++++++++-
arch/x86/mm/init.c | 64 +++++++++++++++++++++----------------
3 files changed, 60 insertions(+), 28 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -197,6 +197,7 @@
#define X86_FEATURE_CAT_L3 ( 7*32+ 4) /* Cache Allocation Technology L3 */
#define X86_FEATURE_CAT_L2 ( 7*32+ 5) /* Cache Allocation Technology L2 */
#define X86_FEATURE_CDP_L3 ( 7*32+ 6) /* Code and Data Prioritization L3 */
+#define X86_FEATURE_INVPCID_SINGLE ( 7*32+ 7) /* Effectively INVPCID && CR4.PCIDE=1 */
#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -85,6 +85,18 @@ static inline u16 kern_pcid(u16 asid)
return asid + 1;
}
+/*
+ * The user PCID is just the kernel one, plus the "switch bit".
+ */
+static inline u16 user_pcid(u16 asid)
+{
+ u16 ret = kern_pcid(asid);
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ ret |= 1 << X86_CR3_PTI_SWITCH_BIT;
+#endif
+ return ret;
+}
+
struct pgd_t;
static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
{
@@ -335,6 +347,8 @@ static inline void __native_flush_tlb_gl
/*
* Using INVPCID is considerably faster than a pair of writes
* to CR4 sandwiched inside an IRQ flag save/restore.
+ *
+ * Note, this works with CR4.PCIDE=0 or 1.
*/
invpcid_flush_all();
return;
@@ -368,7 +382,14 @@ static inline void __native_flush_tlb_si
if (!static_cpu_has(X86_FEATURE_PTI))
return;
- invalidate_user_asid(loaded_mm_asid);
+ /*
+ * Some platforms #GP if we call invpcid(type=1/2) before CR4.PCIDE=1.
+ * Just use invalidate_user_asid() in case we are called early.
+ */
+ if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE))
+ invalidate_user_asid(loaded_mm_asid);
+ else
+ invpcid_flush_one(user_pcid(loaded_mm_asid), addr);
}
/*
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -203,34 +203,44 @@ static void __init probe_page_size_mask(
static void setup_pcid(void)
{
-#ifdef CONFIG_X86_64
- if (boot_cpu_has(X86_FEATURE_PCID)) {
- if (boot_cpu_has(X86_FEATURE_PGE)) {
- /*
- * This can't be cr4_set_bits_and_update_boot() --
- * the trampoline code can't handle CR4.PCIDE and
- * it wouldn't do any good anyway. Despite the name,
- * cr4_set_bits_and_update_boot() doesn't actually
- * cause the bits in question to remain set all the
- * way through the secondary boot asm.
- *
- * Instead, we brute-force it and set CR4.PCIDE
- * manually in start_secondary().
- */
- cr4_set_bits(X86_CR4_PCIDE);
- } else {
- /*
- * flush_tlb_all(), as currently implemented, won't
- * work if PCID is on but PGE is not. Since that
- * combination doesn't exist on real hardware, there's
- * no reason to try to fully support it, but it's
- * polite to avoid corrupting data if we're on
- * an improperly configured VM.
- */
- setup_clear_cpu_cap(X86_FEATURE_PCID);
- }
+ if (!IS_ENABLED(CONFIG_X86_64))
+ return;
+
+ if (!boot_cpu_has(X86_FEATURE_PCID))
+ return;
+
+ if (boot_cpu_has(X86_FEATURE_PGE)) {
+ /*
+ * This can't be cr4_set_bits_and_update_boot() -- the
+ * trampoline code can't handle CR4.PCIDE and it wouldn't
+ * do any good anyway. Despite the name,
+ * cr4_set_bits_and_update_boot() doesn't actually cause
+ * the bits in question to remain set all the way through
+ * the secondary boot asm.
+ *
+ * Instead, we brute-force it and set CR4.PCIDE manually in
+ * start_secondary().
+ */
+ cr4_set_bits(X86_CR4_PCIDE);
+
+ /*
+ * INVPCID's single-context modes (2/3) only work if we set
+ * X86_CR4_PCIDE, *and* we INVPCID support. It's unusable
+ * on systems that have X86_CR4_PCIDE clear, or that have
+ * no INVPCID support at all.
+ */
+ if (boot_cpu_has(X86_FEATURE_INVPCID))
+ setup_force_cpu_cap(X86_FEATURE_INVPCID_SINGLE);
+ } else {
+ /*
+ * flush_tlb_all(), as currently implemented, won't work if
+ * PCID is on but PGE is not. Since that combination
+ * doesn't exist on real hardware, there's no reason to try
+ * to fully support it, but it's polite to avoid corrupting
+ * data if we're on an improperly configured VM.
+ */
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
}
-#endif
}
#ifdef CONFIG_X86_32
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Use/Fix PCID to optimize user/kernel switches
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6fd166aae78c0ab738d49bda653cbd9e3b1491cf Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Mon, 4 Dec 2017 15:07:59 +0100
Subject: x86/mm: Use/Fix PCID to optimize user/kernel switches
From: Peter Zijlstra <peterz(a)infradead.org>
commit 6fd166aae78c0ab738d49bda653cbd9e3b1491cf upstream.
We can use PCID to retain the TLBs across CR3 switches; including those now
part of the user/kernel switch. This increases performance of kernel
entry/exit at the cost of more expensive/complicated TLB flushing.
Now that we have two address spaces, one for kernel and one for user space,
we need two PCIDs per mm. We use the top PCID bit to indicate a user PCID
(just like we use the PFN LSB for the PGD). Since we do TLB invalidation
from kernel space, the existing code will only invalidate the kernel PCID,
we augment that by marking the corresponding user PCID invalid, and upon
switching back to userspace, use a flushing CR3 write for the switch.
In order to access the user_pcid_flush_mask we use PER_CPU storage, which
means the previously established SWAPGS vs CR3 ordering is now mandatory
and required.
Having to do this memory access does require additional registers, most
sites have a functioning stack and we can spill one (RAX), sites without
functional stack need to otherwise provide the second scratch register.
Note: PCID is generally available on Intel Sandybridge and later CPUs.
Note: Up until this point TLB flushing was broken in this series.
Based-on-code-from: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/calling.h | 72 ++++++++++++++++++----
arch/x86/entry/entry_64.S | 9 +-
arch/x86/entry/entry_64_compat.S | 4 -
arch/x86/include/asm/processor-flags.h | 5 +
arch/x86/include/asm/tlbflush.h | 91 ++++++++++++++++++++++++----
arch/x86/include/uapi/asm/processor-flags.h | 7 +-
arch/x86/kernel/asm-offsets.c | 4 +
arch/x86/mm/init.c | 2
arch/x86/mm/tlb.c | 1
9 files changed, 162 insertions(+), 33 deletions(-)
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -3,6 +3,9 @@
#include <asm/unwind_hints.h>
#include <asm/cpufeatures.h>
#include <asm/page_types.h>
+#include <asm/percpu.h>
+#include <asm/asm-offsets.h>
+#include <asm/processor-flags.h>
/*
@@ -191,17 +194,21 @@ For 32-bit we have the following convent
#ifdef CONFIG_PAGE_TABLE_ISOLATION
-/* PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two halves: */
-#define PTI_SWITCH_MASK (1<<PAGE_SHIFT)
+/*
+ * PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two
+ * halves:
+ */
+#define PTI_SWITCH_PGTABLES_MASK (1<<PAGE_SHIFT)
+#define PTI_SWITCH_MASK (PTI_SWITCH_PGTABLES_MASK|(1<<X86_CR3_PTI_SWITCH_BIT))
-.macro ADJUST_KERNEL_CR3 reg:req
- /* Clear "PAGE_TABLE_ISOLATION bit", point CR3 at kernel pagetables: */
- andq $(~PTI_SWITCH_MASK), \reg
+.macro SET_NOFLUSH_BIT reg:req
+ bts $X86_CR3_PCID_NOFLUSH_BIT, \reg
.endm
-.macro ADJUST_USER_CR3 reg:req
- /* Move CR3 up a page to the user page tables: */
- orq $(PTI_SWITCH_MASK), \reg
+.macro ADJUST_KERNEL_CR3 reg:req
+ ALTERNATIVE "", "SET_NOFLUSH_BIT \reg", X86_FEATURE_PCID
+ /* Clear PCID and "PAGE_TABLE_ISOLATION bit", point CR3 at kernel pagetables: */
+ andq $(~PTI_SWITCH_MASK), \reg
.endm
.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
@@ -212,21 +219,58 @@ For 32-bit we have the following convent
.Lend_\@:
.endm
-.macro SWITCH_TO_USER_CR3 scratch_reg:req
+#define THIS_CPU_user_pcid_flush_mask \
+ PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask
+
+.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
mov %cr3, \scratch_reg
- ADJUST_USER_CR3 \scratch_reg
+
+ ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
+
+ /*
+ * Test if the ASID needs a flush.
+ */
+ movq \scratch_reg, \scratch_reg2
+ andq $(0x7FF), \scratch_reg /* mask ASID */
+ bt \scratch_reg, THIS_CPU_user_pcid_flush_mask
+ jnc .Lnoflush_\@
+
+ /* Flush needed, clear the bit */
+ btr \scratch_reg, THIS_CPU_user_pcid_flush_mask
+ movq \scratch_reg2, \scratch_reg
+ jmp .Lwrcr3_\@
+
+.Lnoflush_\@:
+ movq \scratch_reg2, \scratch_reg
+ SET_NOFLUSH_BIT \scratch_reg
+
+.Lwrcr3_\@:
+ /* Flip the PGD and ASID to the user version */
+ orq $(PTI_SWITCH_MASK), \scratch_reg
mov \scratch_reg, %cr3
.Lend_\@:
.endm
+.macro SWITCH_TO_USER_CR3_STACK scratch_reg:req
+ pushq %rax
+ SWITCH_TO_USER_CR3_NOSTACK scratch_reg=\scratch_reg scratch_reg2=%rax
+ popq %rax
+.endm
+
.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
ALTERNATIVE "jmp .Ldone_\@", "", X86_FEATURE_PTI
movq %cr3, \scratch_reg
movq \scratch_reg, \save_reg
/*
- * Is the switch bit zero? This means the address is
- * up in real PAGE_TABLE_ISOLATION patches in a moment.
+ * Is the "switch mask" all zero? That means that both of
+ * these are zero:
+ *
+ * 1. The user/kernel PCID bit, and
+ * 2. The user/kernel "bit" that points CR3 to the
+ * bottom half of the 8k PGD
+ *
+ * That indicates a kernel CR3 value, not a user CR3.
*/
testq $(PTI_SWITCH_MASK), \scratch_reg
jz .Ldone_\@
@@ -251,7 +295,9 @@ For 32-bit we have the following convent
.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
.endm
-.macro SWITCH_TO_USER_CR3 scratch_reg:req
+.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
+.endm
+.macro SWITCH_TO_USER_CR3_STACK scratch_reg:req
.endm
.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
.endm
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -23,7 +23,6 @@
#include <asm/segment.h>
#include <asm/cache.h>
#include <asm/errno.h>
-#include "calling.h"
#include <asm/asm-offsets.h>
#include <asm/msr.h>
#include <asm/unistd.h>
@@ -40,6 +39,8 @@
#include <asm/frame.h>
#include <linux/err.h>
+#include "calling.h"
+
.code64
.section .entry.text, "ax"
@@ -406,7 +407,7 @@ syscall_return_via_sysret:
* We are on the trampoline stack. All regs except RDI are live.
* We can do future final exit work right here.
*/
- SWITCH_TO_USER_CR3 scratch_reg=%rdi
+ SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
popq %rdi
popq %rsp
@@ -744,7 +745,7 @@ GLOBAL(swapgs_restore_regs_and_return_to
* We can do future final exit work right here.
*/
- SWITCH_TO_USER_CR3 scratch_reg=%rdi
+ SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
/* Restore RDI. */
popq %rdi
@@ -857,7 +858,7 @@ native_irq_return_ldt:
*/
orq PER_CPU_VAR(espfix_stack), %rax
- SWITCH_TO_USER_CR3 scratch_reg=%rdi /* to user CR3 */
+ SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
SWAPGS /* to user GS */
popq %rdi /* Restore user RDI */
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -275,9 +275,9 @@ sysret32_from_system_call:
* switch until after after the last reference to the process
* stack.
*
- * %r8 is zeroed before the sysret, thus safe to clobber.
+ * %r8/%r9 are zeroed before the sysret, thus safe to clobber.
*/
- SWITCH_TO_USER_CR3 scratch_reg=%r8
+ SWITCH_TO_USER_CR3_NOSTACK scratch_reg=%r8 scratch_reg2=%r9
xorq %r8, %r8
xorq %r9, %r9
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -38,6 +38,11 @@
#define CR3_ADDR_MASK __sme_clr(0x7FFFFFFFFFFFF000ull)
#define CR3_PCID_MASK 0xFFFull
#define CR3_NOFLUSH BIT_ULL(63)
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define X86_CR3_PTI_SWITCH_BIT 11
+#endif
+
#else
/*
* CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -10,6 +10,8 @@
#include <asm/special_insns.h>
#include <asm/smp.h>
#include <asm/invpcid.h>
+#include <asm/pti.h>
+#include <asm/processor-flags.h>
static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
{
@@ -24,24 +26,54 @@ static inline u64 inc_mm_tlb_gen(struct
/* There are 12 bits of space for ASIDS in CR3 */
#define CR3_HW_ASID_BITS 12
+
/*
* When enabled, PAGE_TABLE_ISOLATION consumes a single bit for
* user/kernel switches
*/
-#define PTI_CONSUMED_ASID_BITS 0
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define PTI_CONSUMED_PCID_BITS 1
+#else
+# define PTI_CONSUMED_PCID_BITS 0
+#endif
+
+#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS)
-#define CR3_AVAIL_ASID_BITS (CR3_HW_ASID_BITS - PTI_CONSUMED_ASID_BITS)
/*
* ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. -1 below to account
* for them being zero-based. Another -1 is because ASID 0 is reserved for
* use by non-PCID-aware users.
*/
-#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_ASID_BITS) - 2)
+#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_PCID_BITS) - 2)
+
+/*
+ * 6 because 6 should be plenty and struct tlb_state will fit in two cache
+ * lines.
+ */
+#define TLB_NR_DYN_ASIDS 6
static inline u16 kern_pcid(u16 asid)
{
VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
/*
+ * Make sure that the dynamic ASID space does not confict with the
+ * bit we are using to switch between user and kernel ASIDs.
+ */
+ BUILD_BUG_ON(TLB_NR_DYN_ASIDS >= (1 << X86_CR3_PTI_SWITCH_BIT));
+
+ /*
+ * The ASID being passed in here should have respected the
+ * MAX_ASID_AVAILABLE and thus never have the switch bit set.
+ */
+ VM_WARN_ON_ONCE(asid & (1 << X86_CR3_PTI_SWITCH_BIT));
+#endif
+ /*
+ * The dynamically-assigned ASIDs that get passed in are small
+ * (<TLB_NR_DYN_ASIDS). They never have the high switch bit set,
+ * so do not bother to clear it.
+ *
* If PCID is on, ASID-aware code paths put the ASID+1 into the
* PCID bits. This serves two purposes. It prevents a nasty
* situation in which PCID-unaware code saves CR3, loads some other
@@ -95,12 +127,6 @@ static inline bool tlb_defer_switch_to_i
return !static_cpu_has(X86_FEATURE_PCID);
}
-/*
- * 6 because 6 should be plenty and struct tlb_state will fit in
- * two cache lines.
- */
-#define TLB_NR_DYN_ASIDS 6
-
struct tlb_context {
u64 ctx_id;
u64 tlb_gen;
@@ -146,6 +172,13 @@ struct tlb_state {
bool invalidate_other;
/*
+ * Mask that contains TLB_NR_DYN_ASIDS+1 bits to indicate
+ * the corresponding user PCID needs a flush next time we
+ * switch to it; see SWITCH_TO_USER_CR3.
+ */
+ unsigned short user_pcid_flush_mask;
+
+ /*
* Access to this CR4 shadow and to H/W CR4 is protected by
* disabling interrupts when modifying either one.
*/
@@ -250,14 +283,41 @@ static inline void cr4_set_bits_and_upda
extern void initialize_tlbstate_and_flush(void);
/*
+ * Given an ASID, flush the corresponding user ASID. We can delay this
+ * until the next time we switch to it.
+ *
+ * See SWITCH_TO_USER_CR3.
+ */
+static inline void invalidate_user_asid(u16 asid)
+{
+ /* There is no user ASID if address space separation is off */
+ if (!IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
+ return;
+
+ /*
+ * We only have a single ASID if PCID is off and the CR3
+ * write will have flushed it.
+ */
+ if (!cpu_feature_enabled(X86_FEATURE_PCID))
+ return;
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ __set_bit(kern_pcid(asid),
+ (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
+}
+
+/*
* flush the entire current user mapping
*/
static inline void __native_flush_tlb(void)
{
+ invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
/*
- * If current->mm == NULL then we borrow a mm which may change during a
- * task switch and therefore we must not be preempted while we write CR3
- * back:
+ * If current->mm == NULL then we borrow a mm which may change
+ * during a task switch and therefore we must not be preempted
+ * while we write CR3 back:
*/
preempt_disable();
native_write_cr3(__native_read_cr3());
@@ -301,7 +361,14 @@ static inline void __native_flush_tlb_gl
*/
static inline void __native_flush_tlb_single(unsigned long addr)
{
+ u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+
asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ invalidate_user_asid(loaded_mm_asid);
}
/*
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -78,7 +78,12 @@
#define X86_CR3_PWT _BITUL(X86_CR3_PWT_BIT)
#define X86_CR3_PCD_BIT 4 /* Page Cache Disable */
#define X86_CR3_PCD _BITUL(X86_CR3_PCD_BIT)
-#define X86_CR3_PCID_MASK _AC(0x00000fff,UL) /* PCID Mask */
+
+#define X86_CR3_PCID_BITS 12
+#define X86_CR3_PCID_MASK (_AC((1UL << X86_CR3_PCID_BITS) - 1, UL))
+
+#define X86_CR3_PCID_NOFLUSH_BIT 63 /* Preserve old PCID */
+#define X86_CR3_PCID_NOFLUSH _BITULL(X86_CR3_PCID_NOFLUSH_BIT)
/*
* Intel CPU features in CR4
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -17,6 +17,7 @@
#include <asm/sigframe.h>
#include <asm/bootparam.h>
#include <asm/suspend.h>
+#include <asm/tlbflush.h>
#ifdef CONFIG_XEN
#include <xen/interface/xen.h>
@@ -94,6 +95,9 @@ void common(void) {
BLANK();
DEFINE(PTREGS_SIZE, sizeof(struct pt_regs));
+ /* TLB state for the entry code */
+ OFFSET(TLB_STATE_user_pcid_flush_mask, tlb_state, user_pcid_flush_mask);
+
/* Layout info for cpu_entry_area */
OFFSET(CPU_ENTRY_AREA_tss, cpu_entry_area, tss);
OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline);
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -855,7 +855,7 @@ void __init zone_sizes_init(void)
free_area_init_nodes(max_zone_pfns);
}
-DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = {
+__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = {
.loaded_mm = &init_mm,
.next_asid = 1,
.cr4 = ~0UL, /* fail hard if we screw up cr4 shadow initialization */
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -105,6 +105,7 @@ static void load_new_mm_cr3(pgd_t *pgdir
unsigned long new_mm_cr3;
if (need_flush) {
+ invalidate_user_asid(new_asid);
new_mm_cr3 = build_cr3(pgdir, new_asid);
} else {
new_mm_cr3 = build_cr3_noflush(pgdir, new_asid);
Patches currently in stable-queue which might be from peterz(a)infradead.org are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Share entry text PMD
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-share-entry-text-pmd.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6dc72c3cbca0580642808d677181cad4c6433893 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Mon, 4 Dec 2017 15:07:47 +0100
Subject: x86/mm/pti: Share entry text PMD
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 6dc72c3cbca0580642808d677181cad4c6433893 upstream.
Share the entry text PMD of the kernel mapping with the user space
mapping. If large pages are enabled this is a single PMD entry and at the
point where it is copied into the user page table the RW bit has not been
cleared yet. Clear it right away so the user space visible map becomes RX.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/pti.c | 10 ++++++++++
1 file changed, 10 insertions(+)
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -288,6 +288,15 @@ static void __init pti_clone_user_shared
}
/*
+ * Clone the populated PMDs of the entry and irqentry text and force it RO.
+ */
+static void __init pti_clone_entry_text(void)
+{
+ pti_clone_pmds((unsigned long) __entry_text_start,
+ (unsigned long) __irqentry_text_end, _PAGE_RW);
+}
+
+/*
* Initialize kernel page table isolation
*/
void __init pti_init(void)
@@ -298,4 +307,5 @@ void __init pti_init(void)
pr_info("enabled\n");
pti_clone_user_shared();
+ pti_clone_entry_text();
}
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Share cpu_entry_area with user space page tables
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f7cfbee91559ca7e3e961a00ffac921208a115ad Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Mon, 4 Dec 2017 15:07:45 +0100
Subject: x86/mm/pti: Share cpu_entry_area with user space page tables
From: Andy Lutomirski <luto(a)kernel.org>
commit f7cfbee91559ca7e3e961a00ffac921208a115ad upstream.
Share the cpu entry area so the user space and kernel space page tables
have the same P4D page.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/pti.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -265,6 +265,29 @@ pti_clone_pmds(unsigned long start, unsi
}
/*
+ * Clone a single p4d (i.e. a top-level entry on 4-level systems and a
+ * next-level entry on 5-level systems.
+ */
+static void __init pti_clone_p4d(unsigned long addr)
+{
+ p4d_t *kernel_p4d, *user_p4d;
+ pgd_t *kernel_pgd;
+
+ user_p4d = pti_user_pagetable_walk_p4d(addr);
+ kernel_pgd = pgd_offset_k(addr);
+ kernel_p4d = p4d_offset(kernel_pgd, addr);
+ *user_p4d = *kernel_p4d;
+}
+
+/*
+ * Clone the CPU_ENTRY_AREA into the user space visible page table.
+ */
+static void __init pti_clone_user_shared(void)
+{
+ pti_clone_p4d(CPU_ENTRY_AREA_BASE);
+}
+
+/*
* Initialize kernel page table isolation
*/
void __init pti_init(void)
@@ -273,4 +296,6 @@ void __init pti_init(void)
return;
pr_info("enabled\n");
+
+ pti_clone_user_shared();
}
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 8a09317b895f073977346779df52f67c1056d81d Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Mon, 4 Dec 2017 15:07:35 +0100
Subject: x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit 8a09317b895f073977346779df52f67c1056d81d upstream.
PAGE_TABLE_ISOLATION needs to switch to a different CR3 value when it
enters the kernel and switch back when it exits. This essentially needs to
be done before leaving assembly code.
This is extra challenging because the switching context is tricky: the
registers that can be clobbered can vary. It is also hard to store things
on the stack because there is an established ABI (ptregs) or the stack is
entirely unsafe to use.
Establish a set of macros that allow changing to the user and kernel CR3
values.
Interactions with SWAPGS:
Previous versions of the PAGE_TABLE_ISOLATION code relied on having
per-CPU scratch space to save/restore a register that can be used for the
CR3 MOV. The %GS register is used to index into our per-CPU space, so
SWAPGS *had* to be done before the CR3 switch. That scratch space is gone
now, but the semantic that SWAPGS must be done before the CR3 MOV is
retained. This is good to keep because it is not that hard to do and it
allows to do things like add per-CPU debugging information.
What this does in the NMI code is worth pointing out. NMIs can interrupt
*any* context and they can also be nested with NMIs interrupting other
NMIs. The comments below ".Lnmi_from_kernel" explain the format of the
stack during this situation. Changing the format of this stack is hard.
Instead of storing the old CR3 value on the stack, this depends on the
*regular* register save/restore mechanism and then uses %r14 to keep CR3
during the NMI. It is callee-saved and will not be clobbered by the C NMI
handlers that get called.
[ PeterZ: ESPFIX optimization ]
Based-on-code-from: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/calling.h | 66 +++++++++++++++++++++++++++++++++++++++
arch/x86/entry/entry_64.S | 45 +++++++++++++++++++++++---
arch/x86/entry/entry_64_compat.S | 24 +++++++++++++-
3 files changed, 128 insertions(+), 7 deletions(-)
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -1,6 +1,8 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/jump_label.h>
#include <asm/unwind_hints.h>
+#include <asm/cpufeatures.h>
+#include <asm/page_types.h>
/*
@@ -187,6 +189,70 @@ For 32-bit we have the following convent
#endif
.endm
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+
+/* PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two halves: */
+#define PTI_SWITCH_MASK (1<<PAGE_SHIFT)
+
+.macro ADJUST_KERNEL_CR3 reg:req
+ /* Clear "PAGE_TABLE_ISOLATION bit", point CR3 at kernel pagetables: */
+ andq $(~PTI_SWITCH_MASK), \reg
+.endm
+
+.macro ADJUST_USER_CR3 reg:req
+ /* Move CR3 up a page to the user page tables: */
+ orq $(PTI_SWITCH_MASK), \reg
+.endm
+
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+ mov %cr3, \scratch_reg
+ ADJUST_KERNEL_CR3 \scratch_reg
+ mov \scratch_reg, %cr3
+.endm
+
+.macro SWITCH_TO_USER_CR3 scratch_reg:req
+ mov %cr3, \scratch_reg
+ ADJUST_USER_CR3 \scratch_reg
+ mov \scratch_reg, %cr3
+.endm
+
+.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+ movq %cr3, \scratch_reg
+ movq \scratch_reg, \save_reg
+ /*
+ * Is the switch bit zero? This means the address is
+ * up in real PAGE_TABLE_ISOLATION patches in a moment.
+ */
+ testq $(PTI_SWITCH_MASK), \scratch_reg
+ jz .Ldone_\@
+
+ ADJUST_KERNEL_CR3 \scratch_reg
+ movq \scratch_reg, %cr3
+
+.Ldone_\@:
+.endm
+
+.macro RESTORE_CR3 save_reg:req
+ /*
+ * The CR3 write could be avoided when not changing its value,
+ * but would require a CR3 read *and* a scratch register.
+ */
+ movq \save_reg, %cr3
+.endm
+
+#else /* CONFIG_PAGE_TABLE_ISOLATION=n: */
+
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+.endm
+.macro SWITCH_TO_USER_CR3 scratch_reg:req
+.endm
+.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+.endm
+.macro RESTORE_CR3 save_reg:req
+.endm
+
+#endif
+
#endif /* CONFIG_X86_64 */
/*
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -164,6 +164,9 @@ ENTRY(entry_SYSCALL_64_trampoline)
/* Stash the user RSP. */
movq %rsp, RSP_SCRATCH
+ /* Note: using %rsp as a scratch reg. */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
+
/* Load the top of the task stack into RSP */
movq CPU_ENTRY_AREA_tss + TSS_sp1 + CPU_ENTRY_AREA, %rsp
@@ -203,6 +206,10 @@ ENTRY(entry_SYSCALL_64)
*/
swapgs
+ /*
+ * This path is not taken when PAGE_TABLE_ISOLATION is disabled so it
+ * is not required to switch CR3.
+ */
movq %rsp, PER_CPU_VAR(rsp_scratch)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
@@ -399,6 +406,7 @@ syscall_return_via_sysret:
* We are on the trampoline stack. All regs except RDI are live.
* We can do future final exit work right here.
*/
+ SWITCH_TO_USER_CR3 scratch_reg=%rdi
popq %rdi
popq %rsp
@@ -736,6 +744,8 @@ GLOBAL(swapgs_restore_regs_and_return_to
* We can do future final exit work right here.
*/
+ SWITCH_TO_USER_CR3 scratch_reg=%rdi
+
/* Restore RDI. */
popq %rdi
SWAPGS
@@ -818,7 +828,9 @@ native_irq_return_ldt:
*/
pushq %rdi /* Stash user RDI */
- SWAPGS
+ SWAPGS /* to kernel GS */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi /* to kernel CR3 */
+
movq PER_CPU_VAR(espfix_waddr), %rdi
movq %rax, (0*8)(%rdi) /* user RAX */
movq (1*8)(%rsp), %rax /* user RIP */
@@ -834,7 +846,6 @@ native_irq_return_ldt:
/* Now RAX == RSP. */
andl $0xffff0000, %eax /* RAX = (RSP & 0xffff0000) */
- popq %rdi /* Restore user RDI */
/*
* espfix_stack[31:16] == 0. The page tables are set up such that
@@ -845,7 +856,11 @@ native_irq_return_ldt:
* still points to an RO alias of the ESPFIX stack.
*/
orq PER_CPU_VAR(espfix_stack), %rax
- SWAPGS
+
+ SWITCH_TO_USER_CR3 scratch_reg=%rdi /* to user CR3 */
+ SWAPGS /* to user GS */
+ popq %rdi /* Restore user RDI */
+
movq %rax, %rsp
UNWIND_HINT_IRET_REGS offset=8
@@ -945,6 +960,8 @@ ENTRY(switch_to_thread_stack)
UNWIND_HINT_FUNC
pushq %rdi
+ /* Need to switch before accessing the thread stack. */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
UNWIND_HINT sp_offset=16 sp_reg=ORC_REG_DI
@@ -1244,7 +1261,11 @@ ENTRY(paranoid_entry)
js 1f /* negative -> in kernel */
SWAPGS
xorl %ebx, %ebx
-1: ret
+
+1:
+ SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
+
+ ret
END(paranoid_entry)
/*
@@ -1266,6 +1287,7 @@ ENTRY(paranoid_exit)
testl %ebx, %ebx /* swapgs needed? */
jnz .Lparanoid_exit_no_swapgs
TRACE_IRQS_IRETQ
+ RESTORE_CR3 save_reg=%r14
SWAPGS_UNSAFE_STACK
jmp .Lparanoid_exit_restore
.Lparanoid_exit_no_swapgs:
@@ -1293,6 +1315,8 @@ ENTRY(error_entry)
* from user mode due to an IRET fault.
*/
SWAPGS
+ /* We have user CR3. Change to kernel CR3. */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
.Lerror_entry_from_usermode_after_swapgs:
/* Put us onto the real thread stack. */
@@ -1339,6 +1363,7 @@ ENTRY(error_entry)
* .Lgs_change's error handler with kernel gsbase.
*/
SWAPGS
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
jmp .Lerror_entry_done
.Lbstep_iret:
@@ -1348,10 +1373,11 @@ ENTRY(error_entry)
.Lerror_bad_iret:
/*
- * We came from an IRET to user mode, so we have user gsbase.
- * Switch to kernel gsbase:
+ * We came from an IRET to user mode, so we have user
+ * gsbase and CR3. Switch to kernel gsbase and CR3:
*/
SWAPGS
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
/*
* Pretend that the exception came from user mode: set up pt_regs
@@ -1383,6 +1409,10 @@ END(error_exit)
/*
* Runs on exception stack. Xen PV does not go through this path at all,
* so we can use real assembly here.
+ *
+ * Registers:
+ * %r14: Used to save/restore the CR3 of the interrupted context
+ * when PAGE_TABLE_ISOLATION is in use. Do not clobber.
*/
ENTRY(nmi)
UNWIND_HINT_IRET_REGS
@@ -1446,6 +1476,7 @@ ENTRY(nmi)
swapgs
cld
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
movq %rsp, %rdx
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
UNWIND_HINT_IRET_REGS base=%rdx offset=8
@@ -1698,6 +1729,8 @@ end_repeat_nmi:
movq $-1, %rsi
call do_nmi
+ RESTORE_CR3 save_reg=%r14
+
testl %ebx, %ebx /* swapgs needed? */
jnz nmi_restore
nmi_swapgs:
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -49,6 +49,10 @@
ENTRY(entry_SYSENTER_compat)
/* Interrupts are off on entry. */
SWAPGS
+
+ /* We are about to clobber %rsp anyway, clobbering here is OK */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
+
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
/*
@@ -216,6 +220,12 @@ GLOBAL(entry_SYSCALL_compat_after_hwfram
pushq $0 /* pt_regs->r15 = 0 */
/*
+ * We just saved %rdi so it is safe to clobber. It is not
+ * preserved during the C calls inside TRACE_IRQS_OFF anyway.
+ */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
+
+ /*
* User mode is traced as though IRQs are on, and SYSENTER
* turned them off.
*/
@@ -256,10 +266,22 @@ sysret32_from_system_call:
* when the system call started, which is already known to user
* code. We zero R8-R10 to avoid info leaks.
*/
+ movq RSP-ORIG_RAX(%rsp), %rsp
+
+ /*
+ * The original userspace %rsp (RSP-ORIG_RAX(%rsp)) is stored
+ * on the process stack which is not mapped to userspace and
+ * not readable after we SWITCH_TO_USER_CR3. Delay the CR3
+ * switch until after after the last reference to the process
+ * stack.
+ *
+ * %r8 is zeroed before the sysret, thus safe to clobber.
+ */
+ SWITCH_TO_USER_CR3 scratch_reg=%r8
+
xorq %r8, %r8
xorq %r9, %r9
xorq %r10, %r10
- movq RSP-ORIG_RAX(%rsp), %rsp
swapgs
sysretl
END(entry_SYSCALL_compat)
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Populate user PGD
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-populate-user-pgd.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From fc2fbc8512ed08d1de7720936fd7d2e4ce02c3a2 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Mon, 4 Dec 2017 15:07:40 +0100
Subject: x86/mm/pti: Populate user PGD
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit fc2fbc8512ed08d1de7720936fd7d2e4ce02c3a2 upstream.
In clone_pgd_range() copy the init user PGDs which cover the kernel half of
the address space, so a process has all the required kernel mappings
visible.
[ tglx: Split out from the big kaiser dump and folded Andys simplification ]
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/pgtable.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1125,7 +1125,14 @@ static inline int pud_write(pud_t pud)
*/
static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
{
- memcpy(dst, src, count * sizeof(pgd_t));
+ memcpy(dst, src, count * sizeof(pgd_t));
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+ /* Clone the user space pgd as well */
+ memcpy(kernel_to_user_pgdp(dst), kernel_to_user_pgdp(src),
+ count * sizeof(pgd_t));
+#endif
}
#define PTE_SHIFT ilog2(PTRS_PER_PTE)
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Map ESPFIX into user space
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-map-espfix-into-user-space.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4b6bbe95b87966ba08999574db65c93c5e925a36 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Fri, 15 Dec 2017 22:08:18 +0100
Subject: x86/mm/pti: Map ESPFIX into user space
From: Andy Lutomirski <luto(a)kernel.org>
commit 4b6bbe95b87966ba08999574db65c93c5e925a36 upstream.
Map the ESPFIX pages into user space when PTI is enabled.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/pti.c | 11 +++++++++++
1 file changed, 11 insertions(+)
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -288,6 +288,16 @@ static void __init pti_clone_user_shared
}
/*
+ * Clone the ESPFIX P4D into the user space visinble page table
+ */
+static void __init pti_setup_espfix64(void)
+{
+#ifdef CONFIG_X86_ESPFIX64
+ pti_clone_p4d(ESPFIX_BASE_ADDR);
+#endif
+}
+
+/*
* Clone the populated PMDs of the entry and irqentry text and force it RO.
*/
static void __init pti_clone_entry_text(void)
@@ -308,4 +318,5 @@ void __init pti_init(void)
pti_clone_user_shared();
pti_clone_entry_text();
+ pti_setup_espfix64();
}
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Force entry through trampoline when PTI active
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 8d4b067895791ab9fdb1aadfc505f64d71239dd2 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Mon, 4 Dec 2017 15:07:43 +0100
Subject: x86/mm/pti: Force entry through trampoline when PTI active
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 8d4b067895791ab9fdb1aadfc505f64d71239dd2 upstream.
Force the entry through the trampoline only when PTI is active. Otherwise
go through the normal entry code.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/cpu/common.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1339,7 +1339,10 @@ void syscall_init(void)
(entry_SYSCALL_64_trampoline - _entry_trampoline);
wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
- wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline);
+ if (static_cpu_has(X86_FEATURE_PTI))
+ wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline);
+ else
+ wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
#ifdef CONFIG_IA32_EMULATION
wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Disable global pages if PAGE_TABLE_ISOLATION=y
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From c313ec66317d421fb5768d78c56abed2dc862264 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Mon, 4 Dec 2017 15:07:34 +0100
Subject: x86/mm/pti: Disable global pages if PAGE_TABLE_ISOLATION=y
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit c313ec66317d421fb5768d78c56abed2dc862264 upstream.
Global pages stay in the TLB across context switches. Since all contexts
share the same kernel mapping, these mappings are marked as global pages
so kernel entries in the TLB are not flushed out on a context switch.
But, even having these entries in the TLB opens up something that an
attacker can use, such as the double-page-fault attack:
http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
That means that even when PAGE_TABLE_ISOLATION switches page tables
on return to user space the global pages would stay in the TLB cache.
Disable global pages so that kernel TLB entries can be flushed before
returning to user space. This way, all accesses to kernel addresses from
userspace result in a TLB miss independent of the existence of a kernel
mapping.
Suppress global pages via the __supported_pte_mask. The user space
mappings set PAGE_GLOBAL for the minimal kernel mappings which are
required for entry/exit. These mappings are set up manually so the
filtering does not take place.
[ The __supported_pte_mask simplification was written by Thomas Gleixner. ]
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/init.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -161,6 +161,12 @@ struct map_range {
static int page_size_mask;
+static void enable_global_pages(void)
+{
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ __supported_pte_mask |= _PAGE_GLOBAL;
+}
+
static void __init probe_page_size_mask(void)
{
/*
@@ -179,11 +185,11 @@ static void __init probe_page_size_mask(
cr4_set_bits_and_update_boot(X86_CR4_PSE);
/* Enable PGE if available */
+ __supported_pte_mask &= ~_PAGE_GLOBAL;
if (boot_cpu_has(X86_FEATURE_PGE)) {
cr4_set_bits_and_update_boot(X86_CR4_PGE);
- __supported_pte_mask |= _PAGE_GLOBAL;
- } else
- __supported_pte_mask &= ~_PAGE_GLOBAL;
+ enable_global_pages();
+ }
/* Enable 1 GB linear kernel mappings if available: */
if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Allow NX poison to be set in p4d/pgd
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1c4de1ff4fe50453b968579ee86fac3da80dd783 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Mon, 4 Dec 2017 15:07:38 +0100
Subject: x86/mm/pti: Allow NX poison to be set in p4d/pgd
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit 1c4de1ff4fe50453b968579ee86fac3da80dd783 upstream.
With PAGE_TABLE_ISOLATION the user portion of the kernel page tables is
poisoned with the NX bit so if the entry code exits with the kernel page
tables selected in CR3, userspace crashes.
But doing so trips the p4d/pgd_bad() checks. Make sure it does not do
that.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/pgtable.h | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -846,7 +846,12 @@ static inline pud_t *pud_offset(p4d_t *p
static inline int p4d_bad(p4d_t p4d)
{
- return (p4d_flags(p4d) & ~(_KERNPG_TABLE | _PAGE_USER)) != 0;
+ unsigned long ignore_flags = _KERNPG_TABLE | _PAGE_USER;
+
+ if (IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
+ ignore_flags |= _PAGE_NX;
+
+ return (p4d_flags(p4d) & ~ignore_flags) != 0;
}
#endif /* CONFIG_PGTABLE_LEVELS > 3 */
@@ -880,7 +885,12 @@ static inline p4d_t *p4d_offset(pgd_t *p
static inline int pgd_bad(pgd_t pgd)
{
- return (pgd_flags(pgd) & ~_PAGE_USER) != _KERNPG_TABLE;
+ unsigned long ignore_flags = _PAGE_USER;
+
+ if (IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
+ ignore_flags |= _PAGE_NX;
+
+ return (pgd_flags(pgd) & ~ignore_flags) != _KERNPG_TABLE;
}
static inline int pgd_none(pgd_t pgd)
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Allocate a separate user PGD
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-allocate-a-separate-user-pgd.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d9e9a6418065bb376e5de8d93ce346939b9a37a6 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Mon, 4 Dec 2017 15:07:39 +0100
Subject: x86/mm/pti: Allocate a separate user PGD
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit d9e9a6418065bb376e5de8d93ce346939b9a37a6 upstream.
Kernel page table isolation requires to have two PGDs. One for the kernel,
which contains the full kernel mapping plus the user space mapping and one
for user space which contains the user space mappings and the minimal set
of kernel mappings which are required by the architecture to be able to
transition from and to user space.
Add the necessary preliminaries.
[ tglx: Split out from the big kaiser dump. EFI fixup from Kirill ]
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/pgalloc.h | 11 +++++++++++
arch/x86/kernel/head_64.S | 30 +++++++++++++++++++++++++++---
arch/x86/mm/pgtable.c | 5 +++--
arch/x86/platform/efi/efi_64.c | 5 ++++-
4 files changed, 45 insertions(+), 6 deletions(-)
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -30,6 +30,17 @@ static inline void paravirt_release_p4d(
*/
extern gfp_t __userpte_alloc_gfp;
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * Instead of one PGD, we acquire two PGDs. Being order-1, it is
+ * both 8k in size and 8k-aligned. That lets us just flip bit 12
+ * in a pointer to swap between the two 4k halves.
+ */
+#define PGD_ALLOCATION_ORDER 1
+#else
+#define PGD_ALLOCATION_ORDER 0
+#endif
+
/*
* Allocate and free page tables.
*/
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -341,6 +341,27 @@ GLOBAL(early_recursion_flag)
.balign PAGE_SIZE; \
GLOBAL(name)
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * Each PGD needs to be 8k long and 8k aligned. We do not
+ * ever go out to userspace with these, so we do not
+ * strictly *need* the second page, but this allows us to
+ * have a single set_pgd() implementation that does not
+ * need to worry about whether it has 4k or 8k to work
+ * with.
+ *
+ * This ensures PGDs are 8k long:
+ */
+#define PTI_USER_PGD_FILL 512
+/* This ensures they are 8k-aligned: */
+#define NEXT_PGD_PAGE(name) \
+ .balign 2 * PAGE_SIZE; \
+GLOBAL(name)
+#else
+#define NEXT_PGD_PAGE(name) NEXT_PAGE(name)
+#define PTI_USER_PGD_FILL 0
+#endif
+
/* Automate the creation of 1 to 1 mapping pmd entries */
#define PMDS(START, PERM, COUNT) \
i = 0 ; \
@@ -350,13 +371,14 @@ GLOBAL(name)
.endr
__INITDATA
-NEXT_PAGE(early_top_pgt)
+NEXT_PGD_PAGE(early_top_pgt)
.fill 511,8,0
#ifdef CONFIG_X86_5LEVEL
.quad level4_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
#else
.quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
#endif
+ .fill PTI_USER_PGD_FILL,8,0
NEXT_PAGE(early_dynamic_pgts)
.fill 512*EARLY_DYNAMIC_PAGE_TABLES,8,0
@@ -364,13 +386,14 @@ NEXT_PAGE(early_dynamic_pgts)
.data
#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH)
-NEXT_PAGE(init_top_pgt)
+NEXT_PGD_PAGE(init_top_pgt)
.quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
.org init_top_pgt + PGD_PAGE_OFFSET*8, 0
.quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
.org init_top_pgt + PGD_START_KERNEL*8, 0
/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
.quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
+ .fill PTI_USER_PGD_FILL,8,0
NEXT_PAGE(level3_ident_pgt)
.quad level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
@@ -381,8 +404,9 @@ NEXT_PAGE(level2_ident_pgt)
*/
PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
#else
-NEXT_PAGE(init_top_pgt)
+NEXT_PGD_PAGE(init_top_pgt)
.fill 512,8,0
+ .fill PTI_USER_PGD_FILL,8,0
#endif
#ifdef CONFIG_X86_5LEVEL
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -355,14 +355,15 @@ static inline void _pgd_free(pgd_t *pgd)
kmem_cache_free(pgd_cache, pgd);
}
#else
+
static inline pgd_t *_pgd_alloc(void)
{
- return (pgd_t *)__get_free_page(PGALLOC_GFP);
+ return (pgd_t *)__get_free_pages(PGALLOC_GFP, PGD_ALLOCATION_ORDER);
}
static inline void _pgd_free(pgd_t *pgd)
{
- free_page((unsigned long)pgd);
+ free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER);
}
#endif /* CONFIG_X86_PAE */
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -195,6 +195,9 @@ static pgd_t *efi_pgd;
* because we want to avoid inserting EFI region mappings (EFI_VA_END
* to EFI_VA_START) into the standard kernel page tables. Everything
* else can be shared, see efi_sync_low_kernel_mappings().
+ *
+ * We don't want the pgd on the pgd_list and cannot use pgd_alloc() for the
+ * allocation.
*/
int __init efi_alloc_page_tables(void)
{
@@ -207,7 +210,7 @@ int __init efi_alloc_page_tables(void)
return 0;
gfp_mask = GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO;
- efi_pgd = (pgd_t *)__get_free_page(gfp_mask);
+ efi_pgd = (pgd_t *)__get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER);
if (!efi_pgd)
return -ENOMEM;
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Add mapping helper functions
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-add-mapping-helper-functions.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 61e9b3671007a5da8127955a1a3bda7e0d5f42e8 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Mon, 4 Dec 2017 15:07:37 +0100
Subject: x86/mm/pti: Add mapping helper functions
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit 61e9b3671007a5da8127955a1a3bda7e0d5f42e8 upstream.
Add the pagetable helper functions do manage the separate user space page
tables.
[ tglx: Split out from the big combo kaiser patch. Folded Andys
simplification and made it out of line as Boris suggested ]
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/pgtable.h | 6 ++
arch/x86/include/asm/pgtable_64.h | 92 ++++++++++++++++++++++++++++++++++++++
arch/x86/mm/pti.c | 41 ++++++++++++++++
3 files changed, 138 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -909,7 +909,11 @@ static inline int pgd_none(pgd_t pgd)
* pgd_offset() returns a (pgd_t *)
* pgd_index() is used get the offset into the pgd page's array of pgd_t's;
*/
-#define pgd_offset(mm, address) ((mm)->pgd + pgd_index((address)))
+#define pgd_offset_pgd(pgd, address) (pgd + pgd_index((address)))
+/*
+ * a shortcut to get a pgd_t in a given mm
+ */
+#define pgd_offset(mm, address) pgd_offset_pgd((mm)->pgd, (address))
/*
* a shortcut which implies the use of the kernel's pgd, instead
* of a process's
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -131,9 +131,97 @@ static inline pud_t native_pudp_get_and_
#endif
}
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages
+ * (8k-aligned and 8k in size). The kernel one is at the beginning 4k and
+ * the user one is in the last 4k. To switch between them, you
+ * just need to flip the 12th bit in their addresses.
+ */
+#define PTI_PGTABLE_SWITCH_BIT PAGE_SHIFT
+
+/*
+ * This generates better code than the inline assembly in
+ * __set_bit().
+ */
+static inline void *ptr_set_bit(void *ptr, int bit)
+{
+ unsigned long __ptr = (unsigned long)ptr;
+
+ __ptr |= BIT(bit);
+ return (void *)__ptr;
+}
+static inline void *ptr_clear_bit(void *ptr, int bit)
+{
+ unsigned long __ptr = (unsigned long)ptr;
+
+ __ptr &= ~BIT(bit);
+ return (void *)__ptr;
+}
+
+static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
+{
+ return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp)
+{
+ return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp)
+{
+ return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
+{
+ return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
+/*
+ * Page table pages are page-aligned. The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ *
+ * Returns true for parts of the PGD that map userspace and
+ * false for the parts that map the kernel.
+ */
+static inline bool pgdp_maps_userspace(void *__ptr)
+{
+ unsigned long ptr = (unsigned long)__ptr;
+
+ return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2);
+}
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd);
+
+/*
+ * Take a PGD location (pgdp) and a pgd value that needs to be set there.
+ * Populates the user and returns the resulting PGD that must be set in
+ * the kernel copy of the page tables.
+ */
+static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return pgd;
+ return __pti_set_user_pgd(pgdp, pgd);
+}
+#else
+static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ return pgd;
+}
+#endif
+
static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
{
+#if defined(CONFIG_PAGE_TABLE_ISOLATION) && !defined(CONFIG_X86_5LEVEL)
+ p4dp->pgd = pti_set_user_pgd(&p4dp->pgd, p4d.pgd);
+#else
*p4dp = p4d;
+#endif
}
static inline void native_p4d_clear(p4d_t *p4d)
@@ -147,7 +235,11 @@ static inline void native_p4d_clear(p4d_
static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ *pgdp = pti_set_user_pgd(pgdp, pgd);
+#else
*pgdp = pgd;
+#endif
}
static inline void native_pgd_clear(pgd_t *pgd)
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -96,6 +96,47 @@ enable:
setup_force_cpu_cap(X86_FEATURE_PTI);
}
+pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ /*
+ * Changes to the high (kernel) portion of the kernelmode page
+ * tables are not automatically propagated to the usermode tables.
+ *
+ * Users should keep in mind that, unlike the kernelmode tables,
+ * there is no vmalloc_fault equivalent for the usermode tables.
+ * Top-level entries added to init_mm's usermode pgd after boot
+ * will not be automatically propagated to other mms.
+ */
+ if (!pgdp_maps_userspace(pgdp))
+ return pgd;
+
+ /*
+ * The user page tables get the full PGD, accessible from
+ * userspace:
+ */
+ kernel_to_user_pgdp(pgdp)->pgd = pgd.pgd;
+
+ /*
+ * If this is normal user memory, make it NX in the kernel
+ * pagetables so that, if we somehow screw up and return to
+ * usermode with the kernel CR3 loaded, we'll get a page fault
+ * instead of allowing user code to execute with the wrong CR3.
+ *
+ * As exceptions, we don't set NX if:
+ * - _PAGE_USER is not set. This could be an executable
+ * EFI runtime mapping or something similar, and the kernel
+ * may execute from it
+ * - we don't have NX support
+ * - we're clearing the PGD (i.e. the new pgd is not present).
+ */
+ if ((pgd.pgd & (_PAGE_USER|_PAGE_PRESENT)) == (_PAGE_USER|_PAGE_PRESENT) &&
+ (__supported_pte_mask & _PAGE_NX))
+ pgd.pgd |= _PAGE_NX;
+
+ /* return the copy of the PGD we want the kernel to use: */
+ return pgd;
+}
+
/*
* Initialize kernel page table isolation
*/
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Add Kconfig
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-add-kconfig.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 385ce0ea4c078517fa51c261882c4e72fba53005 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Mon, 4 Dec 2017 15:08:03 +0100
Subject: x86/mm/pti: Add Kconfig
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit 385ce0ea4c078517fa51c261882c4e72fba53005 upstream.
Finally allow CONFIG_PAGE_TABLE_ISOLATION to be enabled.
PARAVIRT generally requires that the kernel not manage its own page tables.
It also means that the hypervisor and kernel must agree wholeheartedly
about what format the page tables are in and what they contain.
PAGE_TABLE_ISOLATION, unfortunately, changes the rules and they
can not be used together.
I've seen conflicting feedback from maintainers lately about whether they
want the Kconfig magic to go first or last in a patch series. It's going
last here because the partially-applied series leads to kernels that can
not boot in a bunch of cases. I did a run through the entire series with
CONFIG_PAGE_TABLE_ISOLATION=y to look for build errors, though.
[ tglx: Removed SMP and !PARAVIRT dependencies as they not longer exist ]
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
security/Kconfig | 10 ++++++++++
1 file changed, 10 insertions(+)
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -54,6 +54,16 @@ config SECURITY_NETWORK
implement socket and networking access controls.
If you are unsure how to answer this question, answer N.
+config PAGE_TABLE_ISOLATION
+ bool "Remove the kernel mapping in user mode"
+ depends on X86_64 && !UML
+ help
+ This feature reduces the number of hardware side channels by
+ ensuring that the majority of kernel addresses are not mapped
+ into userspace.
+
+ See Documentation/x86/pagetable-isolation.txt for more details.
+
config SECURITY_INFINIBAND
bool "Infiniband Security Hooks"
depends on SECURITY && INFINIBAND
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Add infrastructure for page table isolation
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From aa8c6248f8c75acfd610fe15d8cae23cf70d9d09 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Mon, 4 Dec 2017 15:07:36 +0100
Subject: x86/mm/pti: Add infrastructure for page table isolation
From: Thomas Gleixner <tglx(a)linutronix.de>
commit aa8c6248f8c75acfd610fe15d8cae23cf70d9d09 upstream.
Add the initial files for kernel page table isolation, with a minimal init
function and the boot time detection for this misfeature.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/admin-guide/kernel-parameters.txt | 2
arch/x86/boot/compressed/pagetable.c | 3
arch/x86/entry/calling.h | 7 ++
arch/x86/include/asm/pti.h | 14 ++++
arch/x86/mm/Makefile | 7 +-
arch/x86/mm/init.c | 2
arch/x86/mm/pti.c | 84 ++++++++++++++++++++++++
include/linux/pti.h | 11 +++
init/main.c | 3
9 files changed, 130 insertions(+), 3 deletions(-)
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2685,6 +2685,8 @@
steal time is computed, but won't influence scheduler
behaviour
+ nopti [X86-64] Disable kernel page table isolation
+
nolapic [X86-32,APIC] Do not enable or use the local APIC.
nolapic_timer [X86-32,APIC] Do not use the local APIC timer.
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -23,6 +23,9 @@
*/
#undef CONFIG_AMD_MEM_ENCRYPT
+/* No PAGE_TABLE_ISOLATION support needed either: */
+#undef CONFIG_PAGE_TABLE_ISOLATION
+
#include "misc.h"
/* These actually do the work of building the kernel identity maps. */
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -205,18 +205,23 @@ For 32-bit we have the following convent
.endm
.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
mov %cr3, \scratch_reg
ADJUST_KERNEL_CR3 \scratch_reg
mov \scratch_reg, %cr3
+.Lend_\@:
.endm
.macro SWITCH_TO_USER_CR3 scratch_reg:req
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
mov %cr3, \scratch_reg
ADJUST_USER_CR3 \scratch_reg
mov \scratch_reg, %cr3
+.Lend_\@:
.endm
.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+ ALTERNATIVE "jmp .Ldone_\@", "", X86_FEATURE_PTI
movq %cr3, \scratch_reg
movq \scratch_reg, \save_reg
/*
@@ -233,11 +238,13 @@ For 32-bit we have the following convent
.endm
.macro RESTORE_CR3 save_reg:req
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
/*
* The CR3 write could be avoided when not changing its value,
* but would require a CR3 read *and* a scratch register.
*/
movq \save_reg, %cr3
+.Lend_\@:
.endm
#else /* CONFIG_PAGE_TABLE_ISOLATION=n: */
--- /dev/null
+++ b/arch/x86/include/asm/pti.h
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef _ASM_X86_PTI_H
+#define _ASM_X86_PTI_H
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+extern void pti_init(void);
+extern void pti_check_boottime_disable(void);
+#else
+static inline void pti_check_boottime_disable(void) { }
+#endif
+
+#endif /* __ASSEMBLY__ */
+#endif /* _ASM_X86_PTI_H */
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -43,9 +43,10 @@ obj-$(CONFIG_AMD_NUMA) += amdtopology.o
obj-$(CONFIG_ACPI_NUMA) += srat.o
obj-$(CONFIG_NUMA_EMU) += numa_emulation.o
-obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
-obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
-obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
+obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
+obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
+obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
+obj-$(CONFIG_PAGE_TABLE_ISOLATION) += pti.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -20,6 +20,7 @@
#include <asm/kaslr.h>
#include <asm/hypervisor.h>
#include <asm/cpufeature.h>
+#include <asm/pti.h>
/*
* We need to define the tracepoints somewhere, and tlb.c
@@ -630,6 +631,7 @@ void __init init_mem_mapping(void)
{
unsigned long end;
+ pti_check_boottime_disable();
probe_page_size_mask();
setup_pcid();
--- /dev/null
+++ b/arch/x86/mm/pti.c
@@ -0,0 +1,84 @@
+/*
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * This code is based in part on work published here:
+ *
+ * https://github.com/IAIK/KAISER
+ *
+ * The original work was written by and and signed off by for the Linux
+ * kernel by:
+ *
+ * Signed-off-by: Richard Fellner <richard.fellner(a)student.tugraz.at>
+ * Signed-off-by: Moritz Lipp <moritz.lipp(a)iaik.tugraz.at>
+ * Signed-off-by: Daniel Gruss <daniel.gruss(a)iaik.tugraz.at>
+ * Signed-off-by: Michael Schwarz <michael.schwarz(a)iaik.tugraz.at>
+ *
+ * Major changes to the original code by: Dave Hansen <dave.hansen(a)intel.com>
+ * Mostly rewritten by Thomas Gleixner <tglx(a)linutronix.de> and
+ * Andy Lutomirsky <luto(a)amacapital.net>
+ */
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/bug.h>
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+
+#include <asm/cpufeature.h>
+#include <asm/hypervisor.h>
+#include <asm/cmdline.h>
+#include <asm/pti.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
+#include <asm/desc.h>
+
+#undef pr_fmt
+#define pr_fmt(fmt) "Kernel/User page tables isolation: " fmt
+
+static void __init pti_print_if_insecure(const char *reason)
+{
+ if (boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
+ pr_info("%s\n", reason);
+}
+
+void __init pti_check_boottime_disable(void)
+{
+ if (hypervisor_is_type(X86_HYPER_XEN_PV)) {
+ pti_print_if_insecure("disabled on XEN PV.");
+ return;
+ }
+
+ if (cmdline_find_option_bool(boot_command_line, "nopti")) {
+ pti_print_if_insecure("disabled on command line.");
+ return;
+ }
+
+ if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
+ return;
+
+ setup_force_cpu_cap(X86_FEATURE_PTI);
+}
+
+/*
+ * Initialize kernel page table isolation
+ */
+void __init pti_init(void)
+{
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ pr_info("enabled\n");
+}
--- /dev/null
+++ b/include/linux/pti.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef _INCLUDE_PTI_H
+#define _INCLUDE_PTI_H
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+#include <asm/pti.h>
+#else
+static inline void pti_init(void) { }
+#endif
+
+#endif
--- a/init/main.c
+++ b/init/main.c
@@ -75,6 +75,7 @@
#include <linux/slab.h>
#include <linux/perf_event.h>
#include <linux/ptrace.h>
+#include <linux/pti.h>
#include <linux/blkdev.h>
#include <linux/elevator.h>
#include <linux/sched_clock.h>
@@ -506,6 +507,8 @@ static void __init mm_init(void)
ioremap_huge_init();
/* Should be run before the first non-init thread is created */
init_espfix_bsp();
+ /* Should be run after espfix64 is set up. */
+ pti_init();
}
asmlinkage __visible void __init start_kernel(void)
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Add functions to clone kernel PMDs
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 03f4424f348e8be95eb1bbeba09461cd7b867828 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Mon, 4 Dec 2017 15:07:42 +0100
Subject: x86/mm/pti: Add functions to clone kernel PMDs
From: Andy Lutomirski <luto(a)kernel.org>
commit 03f4424f348e8be95eb1bbeba09461cd7b867828 upstream.
Provide infrastructure to:
- find a kernel PMD for a mapping which must be visible to user space for
the entry/exit code to work.
- walk an address range and share the kernel PMD with it.
This reuses a small part of the original KAISER patches to populate the
user space page table.
[ tglx: Made it universally usable so it can be used for any kind of shared
mapping. Add a mechanism to clear specific bits in the user space
visible PMD entry. Folded Andys simplifactions ]
Originally-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Borislav Petkov <bpetkov(a)suse.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/pti.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 127 insertions(+)
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -48,6 +48,11 @@
#undef pr_fmt
#define pr_fmt(fmt) "Kernel/User page tables isolation: " fmt
+/* Backporting helper */
+#ifndef __GFP_NOTRACK
+#define __GFP_NOTRACK 0
+#endif
+
static void __init pti_print_if_insecure(const char *reason)
{
if (boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
@@ -138,6 +143,128 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pg
}
/*
+ * Walk the user copy of the page tables (optionally) trying to allocate
+ * page table pages on the way down.
+ *
+ * Returns a pointer to a P4D on success, or NULL on failure.
+ */
+static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
+{
+ pgd_t *pgd = kernel_to_user_pgdp(pgd_offset_k(address));
+ gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+
+ if (address < PAGE_OFFSET) {
+ WARN_ONCE(1, "attempt to walk user address\n");
+ return NULL;
+ }
+
+ if (pgd_none(*pgd)) {
+ unsigned long new_p4d_page = __get_free_page(gfp);
+ if (!new_p4d_page)
+ return NULL;
+
+ if (pgd_none(*pgd)) {
+ set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
+ new_p4d_page = 0;
+ }
+ if (new_p4d_page)
+ free_page(new_p4d_page);
+ }
+ BUILD_BUG_ON(pgd_large(*pgd) != 0);
+
+ return p4d_offset(pgd, address);
+}
+
+/*
+ * Walk the user copy of the page tables (optionally) trying to allocate
+ * page table pages on the way down.
+ *
+ * Returns a pointer to a PMD on success, or NULL on failure.
+ */
+static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
+{
+ gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+ p4d_t *p4d = pti_user_pagetable_walk_p4d(address);
+ pud_t *pud;
+
+ BUILD_BUG_ON(p4d_large(*p4d) != 0);
+ if (p4d_none(*p4d)) {
+ unsigned long new_pud_page = __get_free_page(gfp);
+ if (!new_pud_page)
+ return NULL;
+
+ if (p4d_none(*p4d)) {
+ set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
+ new_pud_page = 0;
+ }
+ if (new_pud_page)
+ free_page(new_pud_page);
+ }
+
+ pud = pud_offset(p4d, address);
+ /* The user page tables do not use large mappings: */
+ if (pud_large(*pud)) {
+ WARN_ON(1);
+ return NULL;
+ }
+ if (pud_none(*pud)) {
+ unsigned long new_pmd_page = __get_free_page(gfp);
+ if (!new_pmd_page)
+ return NULL;
+
+ if (pud_none(*pud)) {
+ set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
+ new_pmd_page = 0;
+ }
+ if (new_pmd_page)
+ free_page(new_pmd_page);
+ }
+
+ return pmd_offset(pud, address);
+}
+
+static void __init
+pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
+{
+ unsigned long addr;
+
+ /*
+ * Clone the populated PMDs which cover start to end. These PMD areas
+ * can have holes.
+ */
+ for (addr = start; addr < end; addr += PMD_SIZE) {
+ pmd_t *pmd, *target_pmd;
+ pgd_t *pgd;
+ p4d_t *p4d;
+ pud_t *pud;
+
+ pgd = pgd_offset_k(addr);
+ if (WARN_ON(pgd_none(*pgd)))
+ return;
+ p4d = p4d_offset(pgd, addr);
+ if (WARN_ON(p4d_none(*p4d)))
+ return;
+ pud = pud_offset(p4d, addr);
+ if (pud_none(*pud))
+ continue;
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd))
+ continue;
+
+ target_pmd = pti_user_pagetable_walk_pmd(addr);
+ if (WARN_ON(!target_pmd))
+ return;
+
+ /*
+ * Copy the PMD. That is, the kernelmode and usermode
+ * tables will share the last-level page tables of this
+ * address range
+ */
+ *target_pmd = pmd_clear_flags(*pmd, clear);
+ }
+}
+
+/*
* Initialize kernel page table isolation
*/
void __init pti_init(void)
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Optimize RESTORE_CR3
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-optimize-restore_cr3.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 21e94459110252d41b45c0c8ba50fd72a664d50c Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Mon, 4 Dec 2017 15:08:00 +0100
Subject: x86/mm: Optimize RESTORE_CR3
From: Peter Zijlstra <peterz(a)infradead.org>
commit 21e94459110252d41b45c0c8ba50fd72a664d50c upstream.
Most NMI/paranoid exceptions will not in fact change pagetables and would
thus not require TLB flushing, however RESTORE_CR3 uses flushing CR3
writes.
Restores to kernel PCIDs can be NOFLUSH, because we explicitly flush the
kernel mappings and now that we track which user PCIDs need flushing we can
avoid those too when possible.
This does mean RESTORE_CR3 needs an additional scratch_reg, luckily both
sites have plenty available.
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/calling.h | 30 ++++++++++++++++++++++++++++--
arch/x86/entry/entry_64.S | 4 ++--
2 files changed, 30 insertions(+), 4 deletions(-)
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -281,8 +281,34 @@ For 32-bit we have the following convent
.Ldone_\@:
.endm
-.macro RESTORE_CR3 save_reg:req
+.macro RESTORE_CR3 scratch_reg:req save_reg:req
ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+
+ ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
+
+ /*
+ * KERNEL pages can always resume with NOFLUSH as we do
+ * explicit flushes.
+ */
+ bt $X86_CR3_PTI_SWITCH_BIT, \save_reg
+ jnc .Lnoflush_\@
+
+ /*
+ * Check if there's a pending flush for the user ASID we're
+ * about to set.
+ */
+ movq \save_reg, \scratch_reg
+ andq $(0x7FF), \scratch_reg
+ bt \scratch_reg, THIS_CPU_user_pcid_flush_mask
+ jnc .Lnoflush_\@
+
+ btr \scratch_reg, THIS_CPU_user_pcid_flush_mask
+ jmp .Lwrcr3_\@
+
+.Lnoflush_\@:
+ SET_NOFLUSH_BIT \save_reg
+
+.Lwrcr3_\@:
/*
* The CR3 write could be avoided when not changing its value,
* but would require a CR3 read *and* a scratch register.
@@ -301,7 +327,7 @@ For 32-bit we have the following convent
.endm
.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
.endm
-.macro RESTORE_CR3 save_reg:req
+.macro RESTORE_CR3 scratch_reg:req save_reg:req
.endm
#endif
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1288,7 +1288,7 @@ ENTRY(paranoid_exit)
testl %ebx, %ebx /* swapgs needed? */
jnz .Lparanoid_exit_no_swapgs
TRACE_IRQS_IRETQ
- RESTORE_CR3 save_reg=%r14
+ RESTORE_CR3 scratch_reg=%rbx save_reg=%r14
SWAPGS_UNSAFE_STACK
jmp .Lparanoid_exit_restore
.Lparanoid_exit_no_swapgs:
@@ -1730,7 +1730,7 @@ end_repeat_nmi:
movq $-1, %rsi
call do_nmi
- RESTORE_CR3 save_reg=%r14
+ RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
testl %ebx, %ebx /* swapgs needed? */
jnz nmi_restore
Patches currently in stable-queue which might be from peterz(a)infradead.org are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 75298aa179d56cd64f54e58a19fffc8ab922b4c0 Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp(a)suse.de>
Date: Mon, 4 Dec 2017 15:08:04 +0100
Subject: x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy
From: Borislav Petkov <bp(a)suse.de>
commit 75298aa179d56cd64f54e58a19fffc8ab922b4c0 upstream.
The upcoming support for dumping the kernel and the user space page tables
of the current process would create more random files in the top level
debugfs directory.
Add a page table directory and move the existing file to it.
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/debug_pagetables.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
--- a/arch/x86/mm/debug_pagetables.c
+++ b/arch/x86/mm/debug_pagetables.c
@@ -22,21 +22,26 @@ static const struct file_operations ptdu
.release = single_release,
};
-static struct dentry *pe;
+static struct dentry *dir, *pe;
static int __init pt_dump_debug_init(void)
{
- pe = debugfs_create_file("kernel_page_tables", S_IRUSR, NULL, NULL,
- &ptdump_fops);
- if (!pe)
+ dir = debugfs_create_dir("page_tables", NULL);
+ if (!dir)
return -ENOMEM;
+ pe = debugfs_create_file("kernel", 0400, dir, NULL, &ptdump_fops);
+ if (!pe)
+ goto err;
return 0;
+err:
+ debugfs_remove_recursive(dir);
+ return -ENOMEM;
}
static void __exit pt_dump_debug_exit(void)
{
- debugfs_remove_recursive(pe);
+ debugfs_remove_recursive(dir);
}
module_init(pt_dump_debug_init);
Patches currently in stable-queue which might be from bp(a)suse.de are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 0a126abd576ebc6403f063dbe20cf7416c9d9393 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Tue, 5 Dec 2017 13:34:53 +0100
Subject: x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming
From: Peter Zijlstra <peterz(a)infradead.org>
commit 0a126abd576ebc6403f063dbe20cf7416c9d9393 upstream.
Ideally we'd also use sparse to enforce this separation so it becomes much
more difficult to mess up.
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 55 +++++++++++++++++++++++++++++++---------
1 file changed, 43 insertions(+), 12 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -13,16 +13,33 @@
#include <asm/pti.h>
#include <asm/processor-flags.h>
-static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
-{
- /*
- * Bump the generation count. This also serves as a full barrier
- * that synchronizes with switch_mm(): callers are required to order
- * their read of mm_cpumask after their writes to the paging
- * structures.
- */
- return atomic64_inc_return(&mm->context.tlb_gen);
-}
+/*
+ * The x86 feature is called PCID (Process Context IDentifier). It is similar
+ * to what is traditionally called ASID on the RISC processors.
+ *
+ * We don't use the traditional ASID implementation, where each process/mm gets
+ * its own ASID and flush/restart when we run out of ASID space.
+ *
+ * Instead we have a small per-cpu array of ASIDs and cache the last few mm's
+ * that came by on this CPU, allowing cheaper switch_mm between processes on
+ * this CPU.
+ *
+ * We end up with different spaces for different things. To avoid confusion we
+ * use different names for each of them:
+ *
+ * ASID - [0, TLB_NR_DYN_ASIDS-1]
+ * the canonical identifier for an mm
+ *
+ * kPCID - [1, TLB_NR_DYN_ASIDS]
+ * the value we write into the PCID part of CR3; corresponds to the
+ * ASID+1, because PCID 0 is special.
+ *
+ * uPCID - [2048 + 1, 2048 + TLB_NR_DYN_ASIDS]
+ * for KPTI each mm has two address spaces and thus needs two
+ * PCID values, but we can still do with a single ASID denomination
+ * for each mm. Corresponds to kPCID + 2048.
+ *
+ */
/* There are 12 bits of space for ASIDS in CR3 */
#define CR3_HW_ASID_BITS 12
@@ -41,7 +58,7 @@ static inline u64 inc_mm_tlb_gen(struct
/*
* ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. -1 below to account
- * for them being zero-based. Another -1 is because ASID 0 is reserved for
+ * for them being zero-based. Another -1 is because PCID 0 is reserved for
* use by non-PCID-aware users.
*/
#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_PCID_BITS) - 2)
@@ -52,6 +69,9 @@ static inline u64 inc_mm_tlb_gen(struct
*/
#define TLB_NR_DYN_ASIDS 6
+/*
+ * Given @asid, compute kPCID
+ */
static inline u16 kern_pcid(u16 asid)
{
VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
@@ -86,7 +106,7 @@ static inline u16 kern_pcid(u16 asid)
}
/*
- * The user PCID is just the kernel one, plus the "switch bit".
+ * Given @asid, compute uPCID
*/
static inline u16 user_pcid(u16 asid)
{
@@ -484,6 +504,17 @@ static inline void flush_tlb_page(struct
void native_flush_tlb_others(const struct cpumask *cpumask,
const struct flush_tlb_info *info);
+static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
+{
+ /*
+ * Bump the generation count. This also serves as a full barrier
+ * that synchronizes with switch_mm(): callers are required to order
+ * their read of mm_cpumask after their writes to the paging
+ * structures.
+ */
+ return atomic64_inc_return(&mm->context.tlb_gen);
+}
+
static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
struct mm_struct *mm)
{
Patches currently in stable-queue which might be from peterz(a)infradead.org are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Allow flushing for future ASID switches
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-allow-flushing-for-future-asid-switches.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 2ea907c4fe7b78e5840c1dc07800eae93248cad1 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Mon, 4 Dec 2017 15:07:57 +0100
Subject: x86/mm: Allow flushing for future ASID switches
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit 2ea907c4fe7b78e5840c1dc07800eae93248cad1 upstream.
If changing the page tables in such a way that an invalidation of all
contexts (aka. PCIDs / ASIDs) is required, they can be actively invalidated
by:
1. INVPCID for each PCID (works for single pages too).
2. Load CR3 with each PCID without the NOFLUSH bit set
3. Load CR3 with the NOFLUSH bit set for each and do INVLPG for each address.
But, none of these are really feasible since there are ~6 ASIDs (12 with
PAGE_TABLE_ISOLATION) at the time that invalidation is required.
Instead of actively invalidating them, invalidate the *current* context and
also mark the cpu_tlbstate _quickly_ to indicate future invalidation to be
required.
At the next context-switch, look for this indicator
('invalidate_other' being set) invalidate all of the
cpu_tlbstate.ctxs[] entries.
This ensures that any future context switches will do a full flush
of the TLB, picking up the previous changes.
[ tglx: Folded more fixups from Peter ]
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 37 +++++++++++++++++++++++++++++--------
arch/x86/mm/tlb.c | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 64 insertions(+), 8 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -135,6 +135,17 @@ struct tlb_state {
bool is_lazy;
/*
+ * If set we changed the page tables in such a way that we
+ * needed an invalidation of all contexts (aka. PCIDs / ASIDs).
+ * This tells us to go invalidate all the non-loaded ctxs[]
+ * on the next context switch.
+ *
+ * The current ctx was kept up-to-date as it ran and does not
+ * need to be invalidated.
+ */
+ bool invalidate_other;
+
+ /*
* Access to this CR4 shadow and to H/W CR4 is protected by
* disabling interrupts when modifying either one.
*/
@@ -212,6 +223,14 @@ static inline unsigned long cr4_read_sha
}
/*
+ * Mark all other ASIDs as invalid, preserves the current.
+ */
+static inline void invalidate_other_asid(void)
+{
+ this_cpu_write(cpu_tlbstate.invalidate_other, true);
+}
+
+/*
* Save some of cr4 feature set we're using (e.g. Pentium 4MB
* enable and PPro Global page enable), so that any CPU's that boot
* up after us can get the correct flags. This should only be used
@@ -298,14 +317,6 @@ static inline void __flush_tlb_all(void)
*/
__flush_tlb();
}
-
- /*
- * Note: if we somehow had PCID but not PGE, then this wouldn't work --
- * we'd end up flushing kernel translations for the current ASID but
- * we might fail to flush kernel translations for other cached ASIDs.
- *
- * To avoid this issue, we force PCID off if PGE is off.
- */
}
/*
@@ -315,6 +326,16 @@ static inline void __flush_tlb_one(unsig
{
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ONE);
__flush_tlb_single(addr);
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ /*
+ * __flush_tlb_single() will have cleared the TLB entry for this ASID,
+ * but since kernel space is replicated across all, we must also
+ * invalidate all others.
+ */
+ invalidate_other_asid();
}
#define TLB_FLUSH_ALL -1UL
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -28,6 +28,38 @@
* Implement flush IPI by CALL_FUNCTION_VECTOR, Alex Shi
*/
+/*
+ * We get here when we do something requiring a TLB invalidation
+ * but could not go invalidate all of the contexts. We do the
+ * necessary invalidation by clearing out the 'ctx_id' which
+ * forces a TLB flush when the context is loaded.
+ */
+void clear_asid_other(void)
+{
+ u16 asid;
+
+ /*
+ * This is only expected to be set if we have disabled
+ * kernel _PAGE_GLOBAL pages.
+ */
+ if (!static_cpu_has(X86_FEATURE_PTI)) {
+ WARN_ON_ONCE(1);
+ return;
+ }
+
+ for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) {
+ /* Do not need to flush the current asid */
+ if (asid == this_cpu_read(cpu_tlbstate.loaded_mm_asid))
+ continue;
+ /*
+ * Make sure the next time we go to switch to
+ * this asid, we do a flush:
+ */
+ this_cpu_write(cpu_tlbstate.ctxs[asid].ctx_id, 0);
+ }
+ this_cpu_write(cpu_tlbstate.invalidate_other, false);
+}
+
atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1);
@@ -42,6 +74,9 @@ static void choose_new_asid(struct mm_st
return;
}
+ if (this_cpu_read(cpu_tlbstate.invalidate_other))
+ clear_asid_other();
+
for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) {
if (this_cpu_read(cpu_tlbstate.ctxs[asid].ctx_id) !=
next->context.ctx_id)
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Abstract switching CR3
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-abstract-switching-cr3.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 48e111982cda033fec832c6b0592c2acedd85d04 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Mon, 4 Dec 2017 15:07:58 +0100
Subject: x86/mm: Abstract switching CR3
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit 48e111982cda033fec832c6b0592c2acedd85d04 upstream.
In preparation to adding additional PCID flushing, abstract the
loading of a new ASID into CR3.
[ PeterZ: Split out from big combo patch ]
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/tlb.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -100,6 +100,24 @@ static void choose_new_asid(struct mm_st
*need_flush = true;
}
+static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush)
+{
+ unsigned long new_mm_cr3;
+
+ if (need_flush) {
+ new_mm_cr3 = build_cr3(pgdir, new_asid);
+ } else {
+ new_mm_cr3 = build_cr3_noflush(pgdir, new_asid);
+ }
+
+ /*
+ * Caution: many callers of this function expect
+ * that load_cr3() is serializing and orders TLB
+ * fills with respect to the mm_cpumask writes.
+ */
+ write_cr3(new_mm_cr3);
+}
+
void leave_mm(int cpu)
{
struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
@@ -230,7 +248,7 @@ void switch_mm_irqs_off(struct mm_struct
if (need_flush) {
this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
- write_cr3(build_cr3(next->pgd, new_asid));
+ load_new_mm_cr3(next->pgd, new_asid, true);
/*
* NB: This gets called via leave_mm() in the idle path
@@ -243,7 +261,7 @@ void switch_mm_irqs_off(struct mm_struct
trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
} else {
/* The new ASID is already up to date. */
- write_cr3(build_cr3_noflush(next->pgd, new_asid));
+ load_new_mm_cr3(next->pgd, new_asid, false);
/* See above wrt _rcuidle. */
trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/mm/64: Make a full PGD-entry size hole in the memory map
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9f449772a3106bcdd4eb8fdeb281147b0e99fb30 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Tue, 12 Dec 2017 07:56:44 -0800
Subject: x86/mm/64: Make a full PGD-entry size hole in the memory map
From: Andy Lutomirski <luto(a)kernel.org>
commit 9f449772a3106bcdd4eb8fdeb281147b0e99fb30 upstream.
Shrink vmalloc space from 16384TiB to 12800TiB to enlarge the hole starting
at 0xff90000000000000 to be a full PGD entry.
A subsequent patch will use this hole for the pagetable isolation LDT
alias.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/x86/x86_64/mm.txt | 4 ++--
arch/x86/include/asm/pgtable_64_types.h | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -29,8 +29,8 @@ Virtual memory map with 5 level page tab
hole caused by [56:63] sign extension
ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
-ff90000000000000 - ff91ffffffffffff (=49 bits) hole
-ff92000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space
+ff90000000000000 - ff9fffffffffffff (=52 bits) hole
+ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space (12800 TB)
ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
... unused hole ...
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -79,8 +79,8 @@ typedef struct { pteval_t pte; } pte_t;
#define MAXMEM _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
#ifdef CONFIG_X86_5LEVEL
-# define VMALLOC_SIZE_TB _AC(16384, UL)
-# define __VMALLOC_BASE _AC(0xff92000000000000, UL)
+# define VMALLOC_SIZE_TB _AC(12800, UL)
+# define __VMALLOC_BASE _AC(0xffa0000000000000, UL)
# define __VMEMMAP_BASE _AC(0xffd4000000000000, UL)
#else
# define VMALLOC_SIZE_TB _AC(32, UL)
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/ldt: Make the LDT mapping RO
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-ldt-make-the-ldt-mapping-ro.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9f5cb6b32d9e0a3a7453222baaf15664d92adbf2 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Fri, 15 Dec 2017 20:35:11 +0100
Subject: x86/ldt: Make the LDT mapping RO
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 9f5cb6b32d9e0a3a7453222baaf15664d92adbf2 upstream.
Now that the LDT mapping is in a known area when PAGE_TABLE_ISOLATION is
enabled its a primary target for attacks, if a user space interface fails
to validate a write address correctly. That can never happen, right?
The SDM states:
If the segment descriptors in the GDT or an LDT are placed in ROM, the
processor can enter an indefinite loop if software or the processor
attempts to update (write to) the ROM-based segment descriptors. To
prevent this problem, set the accessed bits for all segment descriptors
placed in a ROM. Also, remove operating-system or executive code that
attempts to modify segment descriptors located in ROM.
So its a valid approach to set the ACCESS bit when setting up the LDT entry
and to map the table RO. Fixup the selftest so it can handle that new mode.
Remove the manual ACCESS bit setter in set_tls_desc() as this is now
pointless. Folded the patch from Peter Ziljstra.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/desc.h | 2 ++
arch/x86/kernel/ldt.c | 7 ++++++-
arch/x86/kernel/tls.c | 11 ++---------
tools/testing/selftests/x86/ldt_gdt.c | 3 +--
4 files changed, 11 insertions(+), 12 deletions(-)
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -21,6 +21,8 @@ static inline void fill_ldt(struct desc_
desc->type = (info->read_exec_only ^ 1) << 1;
desc->type |= info->contents << 2;
+ /* Set the ACCESS bit so it can be mapped RO */
+ desc->type |= 1;
desc->s = 1;
desc->dpl = 0x3;
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -158,7 +158,12 @@ map_ldt_struct(struct mm_struct *mm, str
ptep = get_locked_pte(mm, va, &ptl);
if (!ptep)
return -ENOMEM;
- pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL & ~_PAGE_GLOBAL));
+ /*
+ * Map it RO so the easy to find address is not a primary
+ * target via some kernel interface which misses a
+ * permission check.
+ */
+ pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL_RO & ~_PAGE_GLOBAL));
set_pte_at(mm, va, ptep, pte);
pte_unmap_unlock(ptep, ptl);
}
--- a/arch/x86/kernel/tls.c
+++ b/arch/x86/kernel/tls.c
@@ -93,17 +93,10 @@ static void set_tls_desc(struct task_str
cpu = get_cpu();
while (n-- > 0) {
- if (LDT_empty(info) || LDT_zero(info)) {
+ if (LDT_empty(info) || LDT_zero(info))
memset(desc, 0, sizeof(*desc));
- } else {
+ else
fill_ldt(desc, info);
-
- /*
- * Always set the accessed bit so that the CPU
- * doesn't try to write to the (read-only) GDT.
- */
- desc->type |= 1;
- }
++info;
++desc;
}
--- a/tools/testing/selftests/x86/ldt_gdt.c
+++ b/tools/testing/selftests/x86/ldt_gdt.c
@@ -122,8 +122,7 @@ static void check_valid_segment(uint16_t
* NB: Different Linux versions do different things with the
* accessed bit in set_thread_area().
*/
- if (ar != expected_ar &&
- (ldt || ar != (expected_ar | AR_ACCESSED))) {
+ if (ar != expected_ar && ar != (expected_ar | AR_ACCESSED)) {
printf("[FAIL]\t%s entry %hu has AR 0x%08X but expected 0x%08X\n",
(ldt ? "LDT" : "GDT"), index, ar, expected_ar);
nerrs++;
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/entry: Align entry text section to PMD boundary
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-entry-align-entry-text-section-to-pmd-boundary.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 2f7412ba9c6af5ab16bdbb4a3fdb1dcd2b4fd3c2 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Mon, 4 Dec 2017 15:07:46 +0100
Subject: x86/entry: Align entry text section to PMD boundary
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 2f7412ba9c6af5ab16bdbb4a3fdb1dcd2b4fd3c2 upstream.
The (irq)entry text must be visible in the user space page tables. To allow
simple PMD based sharing, make the entry text PMD aligned.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/vmlinux.lds.S | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -61,11 +61,17 @@ jiffies_64 = jiffies;
. = ALIGN(HPAGE_SIZE); \
__end_rodata_hpage_align = .;
+#define ALIGN_ENTRY_TEXT_BEGIN . = ALIGN(PMD_SIZE);
+#define ALIGN_ENTRY_TEXT_END . = ALIGN(PMD_SIZE);
+
#else
#define X64_ALIGN_RODATA_BEGIN
#define X64_ALIGN_RODATA_END
+#define ALIGN_ENTRY_TEXT_BEGIN
+#define ALIGN_ENTRY_TEXT_END
+
#endif
PHDRS {
@@ -102,8 +108,10 @@ SECTIONS
CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
+ ALIGN_ENTRY_TEXT_BEGIN
ENTRY_TEXT
IRQENTRY_TEXT
+ ALIGN_ENTRY_TEXT_END
SOFTIRQENTRY_TEXT
*(.fixup)
*(.gnu.warning)
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/dumpstack: Indicate in Oops whether PTI is configured and enabled
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 5f26d76c3fd67c48806415ef8b1116c97beff8ba Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka(a)suse.cz>
Date: Tue, 19 Dec 2017 22:33:46 +0100
Subject: x86/dumpstack: Indicate in Oops whether PTI is configured and enabled
From: Vlastimil Babka <vbabka(a)suse.cz>
commit 5f26d76c3fd67c48806415ef8b1116c97beff8ba upstream.
CONFIG_PAGE_TABLE_ISOLATION is relatively new and intrusive feature that may
still have some corner cases which could take some time to manifest and be
fixed. It would be useful to have Oops messages indicate whether it was
enabled for building the kernel, and whether it was disabled during boot.
Example of fully enabled:
Oops: 0001 [#1] SMP PTI
Example of enabled during build, but disabled during boot:
Oops: 0001 [#1] SMP NOPTI
We can decide to remove this after the feature has been tested in the field
long enough.
[ tglx: Made it use boot_cpu_has() as requested by Borislav ]
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Eduardo Valentin <eduval(a)amazon.com>
Acked-by: Dave Hansen <dave.hansen(a)intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Andy Lutomirsky <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: bpetkov(a)suse.de
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: jkosina(a)suse.cz
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/dumpstack.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -297,11 +297,13 @@ int __die(const char *str, struct pt_reg
unsigned long sp;
#endif
printk(KERN_DEFAULT
- "%s: %04lx [#%d]%s%s%s%s\n", str, err & 0xffff, ++die_counter,
+ "%s: %04lx [#%d]%s%s%s%s%s\n", str, err & 0xffff, ++die_counter,
IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT" : "",
IS_ENABLED(CONFIG_SMP) ? " SMP" : "",
debug_pagealloc_enabled() ? " DEBUG_PAGEALLOC" : "",
- IS_ENABLED(CONFIG_KASAN) ? " KASAN" : "");
+ IS_ENABLED(CONFIG_KASAN) ? " KASAN" : "",
+ IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION) ?
+ (boot_cpu_has(X86_FEATURE_PTI) ? " PTI" : " NOPTI") : "");
if (notify_die(DIE_OOPS, str, regs, err,
current->thread.trap_nr, SIGSEGV) == NOTIFY_STOP)
Patches currently in stable-queue which might be from vbabka(a)suse.cz are
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
This is a note to let you know that I've just added the patch titled
x86/cpu_entry_area: Add debugstore entries to cpu_entry_area
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 10043e02db7f8a4161f76434931051e7d797a5f6 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Mon, 4 Dec 2017 15:07:49 +0100
Subject: x86/cpu_entry_area: Add debugstore entries to cpu_entry_area
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 10043e02db7f8a4161f76434931051e7d797a5f6 upstream.
The Intel PEBS/BTS debug store is a design trainwreck as it expects virtual
addresses which must be visible in any execution context.
So it is required to make these mappings visible to user space when kernel
page table isolation is active.
Provide enough room for the buffer mappings in the cpu_entry_area so the
buffers are available in the user space visible page tables.
At the point where the kernel side entry area is populated there is no
buffer available yet, but the kernel PMD must be populated. To achieve this
set the entries for these buffers to non present.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/events/intel/ds.c | 5 ++--
arch/x86/events/perf_event.h | 21 +------------------
arch/x86/include/asm/cpu_entry_area.h | 13 ++++++++++++
arch/x86/include/asm/intel_ds.h | 36 ++++++++++++++++++++++++++++++++++
arch/x86/mm/cpu_entry_area.c | 27 +++++++++++++++++++++++++
5 files changed, 81 insertions(+), 21 deletions(-)
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -8,11 +8,12 @@
#include "../perf_event.h"
+/* Waste a full page so it can be mapped into the cpu_entry_area */
+DEFINE_PER_CPU_PAGE_ALIGNED(struct debug_store, cpu_debug_store);
+
/* The size of a BTS record in bytes: */
#define BTS_RECORD_SIZE 24
-#define BTS_BUFFER_SIZE (PAGE_SIZE << 4)
-#define PEBS_BUFFER_SIZE (PAGE_SIZE << 4)
#define PEBS_FIXUP_SIZE PAGE_SIZE
/*
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -14,6 +14,8 @@
#include <linux/perf_event.h>
+#include <asm/intel_ds.h>
+
/* To enable MSR tracing please use the generic trace points. */
/*
@@ -77,8 +79,6 @@ struct amd_nb {
struct event_constraint event_constraints[X86_PMC_IDX_MAX];
};
-/* The maximal number of PEBS events: */
-#define MAX_PEBS_EVENTS 8
#define PEBS_COUNTER_MASK ((1ULL << MAX_PEBS_EVENTS) - 1)
/*
@@ -95,23 +95,6 @@ struct amd_nb {
PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR | \
PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)
-/*
- * A debug store configuration.
- *
- * We only support architectures that use 64bit fields.
- */
-struct debug_store {
- u64 bts_buffer_base;
- u64 bts_index;
- u64 bts_absolute_maximum;
- u64 bts_interrupt_threshold;
- u64 pebs_buffer_base;
- u64 pebs_index;
- u64 pebs_absolute_maximum;
- u64 pebs_interrupt_threshold;
- u64 pebs_event_reset[MAX_PEBS_EVENTS];
-};
-
#define PEBS_REGS \
(PERF_REG_X86_AX | \
PERF_REG_X86_BX | \
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -5,6 +5,7 @@
#include <linux/percpu-defs.h>
#include <asm/processor.h>
+#include <asm/intel_ds.h>
/*
* cpu_entry_area is a percpu region that contains things needed by the CPU
@@ -40,6 +41,18 @@ struct cpu_entry_area {
*/
char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ];
#endif
+#ifdef CONFIG_CPU_SUP_INTEL
+ /*
+ * Per CPU debug store for Intel performance monitoring. Wastes a
+ * full page at the moment.
+ */
+ struct debug_store cpu_debug_store;
+ /*
+ * The actual PEBS/BTS buffers must be mapped to user space
+ * Reserve enough fixmap PTEs.
+ */
+ struct debug_store_buffers cpu_debug_buffers;
+#endif
};
#define CPU_ENTRY_AREA_SIZE (sizeof(struct cpu_entry_area))
--- /dev/null
+++ b/arch/x86/include/asm/intel_ds.h
@@ -0,0 +1,36 @@
+#ifndef _ASM_INTEL_DS_H
+#define _ASM_INTEL_DS_H
+
+#include <linux/percpu-defs.h>
+
+#define BTS_BUFFER_SIZE (PAGE_SIZE << 4)
+#define PEBS_BUFFER_SIZE (PAGE_SIZE << 4)
+
+/* The maximal number of PEBS events: */
+#define MAX_PEBS_EVENTS 8
+
+/*
+ * A debug store configuration.
+ *
+ * We only support architectures that use 64bit fields.
+ */
+struct debug_store {
+ u64 bts_buffer_base;
+ u64 bts_index;
+ u64 bts_absolute_maximum;
+ u64 bts_interrupt_threshold;
+ u64 pebs_buffer_base;
+ u64 pebs_index;
+ u64 pebs_absolute_maximum;
+ u64 pebs_interrupt_threshold;
+ u64 pebs_event_reset[MAX_PEBS_EVENTS];
+} __aligned(PAGE_SIZE);
+
+DECLARE_PER_CPU_PAGE_ALIGNED(struct debug_store, cpu_debug_store);
+
+struct debug_store_buffers {
+ char bts_buffer[BTS_BUFFER_SIZE];
+ char pebs_buffer[PEBS_BUFFER_SIZE];
+};
+
+#endif
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -38,6 +38,32 @@ cea_map_percpu_pages(void *cea_vaddr, vo
cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
}
+static void percpu_setup_debug_store(int cpu)
+{
+#ifdef CONFIG_CPU_SUP_INTEL
+ int npages;
+ void *cea;
+
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+ return;
+
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_store;
+ npages = sizeof(struct debug_store) / PAGE_SIZE;
+ BUILD_BUG_ON(sizeof(struct debug_store) % PAGE_SIZE != 0);
+ cea_map_percpu_pages(cea, &per_cpu(cpu_debug_store, cpu), npages,
+ PAGE_KERNEL);
+
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers;
+ /*
+ * Force the population of PMDs for not yet allocated per cpu
+ * memory like debug store buffers.
+ */
+ npages = sizeof(struct debug_store_buffers) / PAGE_SIZE;
+ for (; npages; npages--, cea += PAGE_SIZE)
+ cea_set_pte(cea, 0, PAGE_NONE);
+#endif
+}
+
/* Setup the fixmap mappings only once per-processor */
static void __init setup_cpu_entry_area(int cpu)
{
@@ -109,6 +135,7 @@ static void __init setup_cpu_entry_area(
cea_set_pte(&get_cpu_entry_area(cpu)->entry_trampoline,
__pa_symbol(_entry_trampoline), PAGE_KERNEL_RX);
#endif
+ percpu_setup_debug_store(cpu);
}
static __init void setup_cpu_entry_area_ptes(void)
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-mm-pti-allow-nx-poison-to-be-set-in-p4d-pgd.patch
queue-4.14/x86-dumpstack-indicate-in-oops-whether-pti-is-configured-and-enabled.patch
queue-4.14/x86-mm-clarify-the-whole-asid-kernel-pcid-user-pcid-naming.patch
queue-4.14/x86-mm-abstract-switching-cr3.patch
queue-4.14/x86-mm-optimize-restore_cr3.patch
queue-4.14/x86-mm-pti-force-entry-through-trampoline-when-pti-active.patch
queue-4.14/x86-entry-align-entry-text-section-to-pmd-boundary.patch
queue-4.14/x86-mm-64-make-a-full-pgd-entry-size-hole-in-the-memory-map.patch
queue-4.14/x86-cpu_entry_area-add-debugstore-entries-to-cpu_entry_area.patch
queue-4.14/x86-mm-pti-share-entry-text-pmd.patch
queue-4.14/x86-ldt-make-the-ldt-mapping-ro.patch
queue-4.14/x86-mm-dump_pagetables-check-user-space-page-table-for-wx-pages.patch
queue-4.14/x86-pti-map-the-vsyscall-page-if-needed.patch
queue-4.14/x86-mm-pti-disable-global-pages-if-page_table_isolation-y.patch
queue-4.14/x86-mm-dump_pagetables-allow-dumping-current-pagetables.patch
queue-4.14/x86-mm-pti-add-infrastructure-for-page-table-isolation.patch
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.14/x86-mm-use-fix-pcid-to-optimize-user-kernel-switches.patch
queue-4.14/x86-mm-use-invpcid-for-__native_flush_tlb_single.patch
queue-4.14/x86-mm-pti-prepare-the-x86-entry-assembly-code-for-entry-exit-cr3-switching.patch
queue-4.14/x86-mm-allow-flushing-for-future-asid-switches.patch
queue-4.14/x86-mm-pti-add-kconfig.patch
queue-4.14/x86-mm-pti-add-mapping-helper-functions.patch
queue-4.14/x86-mm-pti-map-espfix-into-user-space.patch
queue-4.14/x86-mm-pti-share-cpu_entry_area-with-user-space-page-tables.patch
queue-4.14/x86-mm-pti-add-functions-to-clone-kernel-pmds.patch
queue-4.14/x86-pti-add-the-pti-cmdline-option-and-documentation.patch
queue-4.14/x86-events-intel-ds-map-debug-buffers-in-cpu_entry_area.patch
queue-4.14/x86-mm-pti-allocate-a-separate-user-pgd.patch
queue-4.14/x86-pti-put-the-ldt-in-its-own-pgd-if-pti-is-on.patch
queue-4.14/x86-mm-pti-populate-user-pgd.patch
queue-4.14/x86-mm-dump_pagetables-add-page-table-directory-to-the-debugfs-vfs-hierarchy.patch
This is a note to let you know that I've just added the patch titled
x86/cpufeatures: Add X86_BUG_CPU_INSECURE
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-cpufeatures-add-x86_bug_cpu_insecure.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a89f040fa34ec9cd682aed98b8f04e3c47d998bd Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Mon, 4 Dec 2017 15:07:33 +0100
Subject: x86/cpufeatures: Add X86_BUG_CPU_INSECURE
From: Thomas Gleixner <tglx(a)linutronix.de>
commit a89f040fa34ec9cd682aed98b8f04e3c47d998bd upstream.
Many x86 CPUs leak information to user space due to missing isolation of
user space and kernel space page tables. There are many well documented
ways to exploit that.
The upcoming software migitation of isolating the user and kernel space
page tables needs a misfeature flag so code can be made runtime
conditional.
Add the BUG bits which indicates that the CPU is affected and add a feature
bit which indicates that the software migitation is enabled.
Assume for now that _ALL_ x86 CPUs are affected by this. Exceptions can be
made later.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/cpufeatures.h | 3 ++-
arch/x86/include/asm/disabled-features.h | 8 +++++++-
arch/x86/kernel/cpu/common.c | 4 ++++
3 files changed, 13 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -201,7 +201,7 @@
#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */
-
+#define X86_FEATURE_PTI ( 7*32+11) /* Kernel Page Table Isolation enabled */
#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
#define X86_FEATURE_AVX512_4VNNIW ( 7*32+16) /* AVX-512 Neural Network Instructions */
@@ -340,5 +340,6 @@
#define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */
#define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */
#define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */
+#define X86_BUG_CPU_INSECURE X86_BUG(14) /* CPU is insecure and needs kernel page table isolation */
#endif /* _ASM_X86_CPUFEATURES_H */
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -44,6 +44,12 @@
# define DISABLE_LA57 (1<<(X86_FEATURE_LA57 & 31))
#endif
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define DISABLE_PTI 0
+#else
+# define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -54,7 +60,7 @@
#define DISABLED_MASK4 (DISABLE_PCID)
#define DISABLED_MASK5 0
#define DISABLED_MASK6 0
-#define DISABLED_MASK7 0
+#define DISABLED_MASK7 (DISABLE_PTI)
#define DISABLED_MASK8 0
#define DISABLED_MASK9 (DISABLE_MPX)
#define DISABLED_MASK10 0
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -898,6 +898,10 @@ static void __init early_identify_cpu(st
}
setup_force_cpu_cap(X86_FEATURE_ALWAYS);
+
+ /* Assume for now that ALL x86 CPUs are insecure */
+ setup_force_cpu_bug(X86_BUG_CPU_INSECURE);
+
fpu__init_system(c);
#ifdef CONFIG_X86_32
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
Hello,
4.14.9 fails to boot if CONFIG_MCORE2 is enabled and when compiled with
gcc 6+. More details in the following bug reports:
https://bugzilla.kernel.org/show_bug.cgi?id=198263https://bugs.gentoo.org/642268
I bisected it to the commit below:
$ git bisect good
2bc9fa0beaf10206a778f02e9e5cb62f50345b1a is the first bad commit
commit 2bc9fa0beaf10206a778f02e9e5cb62f50345b1a
Author: Andy Lutomirski <luto(a)kernel.org>
Date: Mon Dec 4 15:07:23 2017 +0100
x86/entry/64: Use a per-CPU trampoline stack for IDT entries
commit 7f2590a110b837af5679d08fc25c6227c5a8c497 upstream.
Historically, IDT entries from usermode have always gone directly
to the running task's kernel stack. Rearrange it so that we enter
on
a per-CPU trampoline stack and then manually switch to the task's
stack.
This touches a couple of extra cachelines, but it gives us a chance
to run some code before we touch the kernel stack.
The asm isn't exactly beautiful, but I think that fully refactoring
it can wait.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Borislav Petkov <bpetkov(a)suse.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Link: https://lkml.kernel.org/r/20171204150606.225330557@linutronix
.de
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
:040000 040000 275d4746936a9e521a2b5041856f7dc1d1820dc6
8f8e869fd59c3dd781dceffa76e53e41d733a0cf M arch
$ git bisect log
git bisect start
# bad: [dad5c1402c570cd07a80113784bc20a7f930c8ae] Linux 4.14.9
git bisect bad dad5c1402c570cd07a80113784bc20a7f930c8ae
# good: [7b3775017f4e6b87dfd2c7f63d1eaf057948f31d] Linux 4.14.8
git bisect good 7b3775017f4e6b87dfd2c7f63d1eaf057948f31d
# good: [d120cd749ef9770ee98b708a83b49547dcf1c0e1] x86/entry/64:
Separate cpu_current_top_of_stack from TSS.sp0
git bisect good d120cd749ef9770ee98b708a83b49547dcf1c0e1
# bad: [97f41b41c432e5a80c91445d92c2f4b729984d36] powerpc/xmon: Avoid
tripping SMP hardlockup watchdog
git bisect bad 97f41b41c432e5a80c91445d92c2f4b729984d36
# bad: [bfd66a406fe7e590055c1d6714adc697f18664c8] PCI: Avoid bus reset
if bridge itself is broken
git bisect bad bfd66a406fe7e590055c1d6714adc697f18664c8
# bad: [8388d287e361a2fd0a39bece30a736d692d5c3d8] x86/cpufeatures: Make
CPU bugs sticky
git bisect bad 8388d287e361a2fd0a39bece30a736d692d5c3d8
# bad: [bb568391775d4a840992e2d2493f39d6e86401e3] x86/entry/64: Move
the IST stacks into struct cpu_entry_area
git bisect bad bb568391775d4a840992e2d2493f39d6e86401e3
# bad: [2bc9fa0beaf10206a778f02e9e5cb62f50345b1a] x86/entry/64: Use a
per-CPU trampoline stack for IDT entries
git bisect bad 2bc9fa0beaf10206a778f02e9e5cb62f50345b1a
# good: [c3dbef1bd0f7eb09daf49409ea533aa1b0eeb82e] x86/espfix/64: Stop
assuming that pt_regs is on the entry stack
git bisect good c3dbef1bd0f7eb09daf49409ea533aa1b0eeb82e
# first bad commit: [2bc9fa0beaf10206a778f02e9e5cb62f50345b1a]
x86/entry/64: Use a per-CPU trampoline stack for IDT entries
This is a note to let you know that I've just added the patch titled
tracing: Fix possible double free on failure of allocating trace buffer
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4397f04575c44e1440ec2e49b6302785c95fd2f8 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Tue, 26 Dec 2017 20:07:34 -0500
Subject: tracing: Fix possible double free on failure of allocating trace buffer
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 4397f04575c44e1440ec2e49b6302785c95fd2f8 upstream.
Jing Xia and Chunyan Zhang reported that on failing to allocate part of the
tracing buffer, memory is freed, but the pointers that point to them are not
initialized back to NULL, and later paths may try to free the freed memory
again. Jing and Chunyan fixed one of the locations that does this, but
missed a spot.
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Reported-by: Jing Xia <jing.xia(a)spreadtrum.com>
Reported-by: Chunyan Zhang <chunyan.zhang(a)spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/trace.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6955,6 +6955,7 @@ allocate_trace_buffer(struct trace_array
buf->data = alloc_percpu(struct trace_array_cpu);
if (!buf->data) {
ring_buffer_free(buf->buffer);
+ buf->buffer = NULL;
return -ENOMEM;
}
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-4.9/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-4.9/tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
queue-4.9/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
tracing: Remove extra zeroing out of the ring buffer page
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6b7e633fe9c24682df550e5311f47fb524701586 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 22 Dec 2017 20:38:57 -0500
Subject: tracing: Remove extra zeroing out of the ring buffer page
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 6b7e633fe9c24682df550e5311f47fb524701586 upstream.
The ring_buffer_read_page() takes care of zeroing out any extra data in the
page that it returns. There's no need to zero it out again from the
consumer. It was removed from one consumer of this function, but
read_buffers_splice_read() did not remove it, and worse, it contained a
nasty bug because of it.
Fixes: 2711ca237a084 ("ring-buffer: Move zeroing out excess in page to ring buffer code")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/trace.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6181,7 +6181,7 @@ tracing_buffers_splice_read(struct file
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
- int entries, size, i;
+ int entries, i;
ssize_t ret = 0;
#ifdef CONFIG_TRACER_MAX_TRACE
@@ -6232,14 +6232,6 @@ tracing_buffers_splice_read(struct file
break;
}
- /*
- * zero out any left over data, this is going to
- * user land.
- */
- size = ring_buffer_page_len(ref->page);
- if (size < PAGE_SIZE)
- memset(ref->page + size, 0, PAGE_SIZE - size);
-
page = virt_to_page(ref->page);
spd.pages[i] = page;
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-4.9/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-4.9/tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
queue-4.9/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
tracing: Fix crash when it fails to alloc ring buffer
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 Mon Sep 17 00:00:00 2001
From: Jing Xia <jing.xia(a)spreadtrum.com>
Date: Tue, 26 Dec 2017 15:12:53 +0800
Subject: tracing: Fix crash when it fails to alloc ring buffer
From: Jing Xia <jing.xia(a)spreadtrum.com>
commit 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 upstream.
Double free of the ring buffer happens when it fails to alloc new
ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured.
The root cause is that the pointer is not set to NULL after the buffer
is freed in allocate_trace_buffers(), and the freeing of the ring
buffer is invoked again later if the pointer is not equal to Null,
as:
instance_mkdir()
|-allocate_trace_buffers()
|-allocate_trace_buffer(tr, &tr->trace_buffer...)
|-allocate_trace_buffer(tr, &tr->max_buffer...)
// allocate fail(-ENOMEM),first free
// and the buffer pointer is not set to null
|-ring_buffer_free(tr->trace_buffer.buffer)
// out_free_tr
|-free_trace_buffers()
|-free_trace_buffer(&tr->trace_buffer);
//if trace_buffer is not null, free again
|-ring_buffer_free(buf->buffer)
|-rb_free_cpu_buffer(buffer->buffers[cpu])
// ring_buffer_per_cpu is null, and
// crash in ring_buffer_per_cpu->pages
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Signed-off-by: Jing Xia <jing.xia(a)spreadtrum.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang(a)spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/trace.c | 2 ++
1 file changed, 2 insertions(+)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6979,7 +6979,9 @@ static int allocate_trace_buffers(struct
allocate_snapshot ? size : 1);
if (WARN_ON(ret)) {
ring_buffer_free(tr->trace_buffer.buffer);
+ tr->trace_buffer.buffer = NULL;
free_percpu(tr->trace_buffer.data);
+ tr->trace_buffer.data = NULL;
return -ENOMEM;
}
tr->allocated_snapshot = allocate_snapshot;
Patches currently in stable-queue which might be from jing.xia(a)spreadtrum.com are
queue-4.9/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-4.9/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
tracing: Fix possible double free on failure of allocating trace buffer
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4397f04575c44e1440ec2e49b6302785c95fd2f8 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Tue, 26 Dec 2017 20:07:34 -0500
Subject: tracing: Fix possible double free on failure of allocating trace buffer
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 4397f04575c44e1440ec2e49b6302785c95fd2f8 upstream.
Jing Xia and Chunyan Zhang reported that on failing to allocate part of the
tracing buffer, memory is freed, but the pointers that point to them are not
initialized back to NULL, and later paths may try to free the freed memory
again. Jing and Chunyan fixed one of the locations that does this, but
missed a spot.
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Reported-by: Jing Xia <jing.xia(a)spreadtrum.com>
Reported-by: Chunyan Zhang <chunyan.zhang(a)spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/trace.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6531,6 +6531,7 @@ allocate_trace_buffer(struct trace_array
buf->data = alloc_percpu(struct trace_array_cpu);
if (!buf->data) {
ring_buffer_free(buf->buffer);
+ buf->buffer = NULL;
return -ENOMEM;
}
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-4.4/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-4.4/tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
queue-4.4/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
tracing: Remove extra zeroing out of the ring buffer page
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6b7e633fe9c24682df550e5311f47fb524701586 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 22 Dec 2017 20:38:57 -0500
Subject: tracing: Remove extra zeroing out of the ring buffer page
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 6b7e633fe9c24682df550e5311f47fb524701586 upstream.
The ring_buffer_read_page() takes care of zeroing out any extra data in the
page that it returns. There's no need to zero it out again from the
consumer. It was removed from one consumer of this function, but
read_buffers_splice_read() did not remove it, and worse, it contained a
nasty bug because of it.
Fixes: 2711ca237a084 ("ring-buffer: Move zeroing out excess in page to ring buffer code")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/trace.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5754,7 +5754,7 @@ tracing_buffers_splice_read(struct file
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
- int entries, size, i;
+ int entries, i;
ssize_t ret = 0;
#ifdef CONFIG_TRACER_MAX_TRACE
@@ -5805,14 +5805,6 @@ tracing_buffers_splice_read(struct file
break;
}
- /*
- * zero out any left over data, this is going to
- * user land.
- */
- size = ring_buffer_page_len(ref->page);
- if (size < PAGE_SIZE)
- memset(ref->page + size, 0, PAGE_SIZE - size);
-
page = virt_to_page(ref->page);
spd.pages[i] = page;
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-4.4/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-4.4/tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
queue-4.4/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
tracing: Fix crash when it fails to alloc ring buffer
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 Mon Sep 17 00:00:00 2001
From: Jing Xia <jing.xia(a)spreadtrum.com>
Date: Tue, 26 Dec 2017 15:12:53 +0800
Subject: tracing: Fix crash when it fails to alloc ring buffer
From: Jing Xia <jing.xia(a)spreadtrum.com>
commit 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 upstream.
Double free of the ring buffer happens when it fails to alloc new
ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured.
The root cause is that the pointer is not set to NULL after the buffer
is freed in allocate_trace_buffers(), and the freeing of the ring
buffer is invoked again later if the pointer is not equal to Null,
as:
instance_mkdir()
|-allocate_trace_buffers()
|-allocate_trace_buffer(tr, &tr->trace_buffer...)
|-allocate_trace_buffer(tr, &tr->max_buffer...)
// allocate fail(-ENOMEM),first free
// and the buffer pointer is not set to null
|-ring_buffer_free(tr->trace_buffer.buffer)
// out_free_tr
|-free_trace_buffers()
|-free_trace_buffer(&tr->trace_buffer);
//if trace_buffer is not null, free again
|-ring_buffer_free(buf->buffer)
|-rb_free_cpu_buffer(buffer->buffers[cpu])
// ring_buffer_per_cpu is null, and
// crash in ring_buffer_per_cpu->pages
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Signed-off-by: Jing Xia <jing.xia(a)spreadtrum.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang(a)spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/trace.c | 2 ++
1 file changed, 2 insertions(+)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6555,7 +6555,9 @@ static int allocate_trace_buffers(struct
allocate_snapshot ? size : 1);
if (WARN_ON(ret)) {
ring_buffer_free(tr->trace_buffer.buffer);
+ tr->trace_buffer.buffer = NULL;
free_percpu(tr->trace_buffer.data);
+ tr->trace_buffer.data = NULL;
return -ENOMEM;
}
tr->allocated_snapshot = allocate_snapshot;
Patches currently in stable-queue which might be from jing.xia(a)spreadtrum.com are
queue-4.4/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-4.4/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
tracing: Remove extra zeroing out of the ring buffer page
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6b7e633fe9c24682df550e5311f47fb524701586 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 22 Dec 2017 20:38:57 -0500
Subject: tracing: Remove extra zeroing out of the ring buffer page
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 6b7e633fe9c24682df550e5311f47fb524701586 upstream.
The ring_buffer_read_page() takes care of zeroing out any extra data in the
page that it returns. There's no need to zero it out again from the
consumer. It was removed from one consumer of this function, but
read_buffers_splice_read() did not remove it, and worse, it contained a
nasty bug because of it.
Fixes: 2711ca237a084 ("ring-buffer: Move zeroing out excess in page to ring buffer code")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/trace.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6769,7 +6769,7 @@ tracing_buffers_splice_read(struct file
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
- int entries, size, i;
+ int entries, i;
ssize_t ret = 0;
#ifdef CONFIG_TRACER_MAX_TRACE
@@ -6823,14 +6823,6 @@ tracing_buffers_splice_read(struct file
break;
}
- /*
- * zero out any left over data, this is going to
- * user land.
- */
- size = ring_buffer_page_len(ref->page);
- if (size < PAGE_SIZE)
- memset(ref->page + size, 0, PAGE_SIZE - size);
-
page = virt_to_page(ref->page);
spd.pages[i] = page;
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-4.14/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-4.14/tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
queue-4.14/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
tracing: Fix possible double free on failure of allocating trace buffer
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4397f04575c44e1440ec2e49b6302785c95fd2f8 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Tue, 26 Dec 2017 20:07:34 -0500
Subject: tracing: Fix possible double free on failure of allocating trace buffer
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 4397f04575c44e1440ec2e49b6302785c95fd2f8 upstream.
Jing Xia and Chunyan Zhang reported that on failing to allocate part of the
tracing buffer, memory is freed, but the pointers that point to them are not
initialized back to NULL, and later paths may try to free the freed memory
again. Jing and Chunyan fixed one of the locations that does this, but
missed a spot.
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Reported-by: Jing Xia <jing.xia(a)spreadtrum.com>
Reported-by: Chunyan Zhang <chunyan.zhang(a)spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/trace.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7580,6 +7580,7 @@ allocate_trace_buffer(struct trace_array
buf->data = alloc_percpu(struct trace_array_cpu);
if (!buf->data) {
ring_buffer_free(buf->buffer);
+ buf->buffer = NULL;
return -ENOMEM;
}
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-4.14/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-4.14/tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
queue-4.14/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
tracing: Fix crash when it fails to alloc ring buffer
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 Mon Sep 17 00:00:00 2001
From: Jing Xia <jing.xia(a)spreadtrum.com>
Date: Tue, 26 Dec 2017 15:12:53 +0800
Subject: tracing: Fix crash when it fails to alloc ring buffer
From: Jing Xia <jing.xia(a)spreadtrum.com>
commit 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 upstream.
Double free of the ring buffer happens when it fails to alloc new
ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured.
The root cause is that the pointer is not set to NULL after the buffer
is freed in allocate_trace_buffers(), and the freeing of the ring
buffer is invoked again later if the pointer is not equal to Null,
as:
instance_mkdir()
|-allocate_trace_buffers()
|-allocate_trace_buffer(tr, &tr->trace_buffer...)
|-allocate_trace_buffer(tr, &tr->max_buffer...)
// allocate fail(-ENOMEM),first free
// and the buffer pointer is not set to null
|-ring_buffer_free(tr->trace_buffer.buffer)
// out_free_tr
|-free_trace_buffers()
|-free_trace_buffer(&tr->trace_buffer);
//if trace_buffer is not null, free again
|-ring_buffer_free(buf->buffer)
|-rb_free_cpu_buffer(buffer->buffers[cpu])
// ring_buffer_per_cpu is null, and
// crash in ring_buffer_per_cpu->pages
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Signed-off-by: Jing Xia <jing.xia(a)spreadtrum.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang(a)spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/trace.c | 2 ++
1 file changed, 2 insertions(+)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7604,7 +7604,9 @@ static int allocate_trace_buffers(struct
allocate_snapshot ? size : 1);
if (WARN_ON(ret)) {
ring_buffer_free(tr->trace_buffer.buffer);
+ tr->trace_buffer.buffer = NULL;
free_percpu(tr->trace_buffer.data);
+ tr->trace_buffer.data = NULL;
return -ENOMEM;
}
tr->allocated_snapshot = allocate_snapshot;
Patches currently in stable-queue which might be from jing.xia(a)spreadtrum.com are
queue-4.14/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-4.14/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
tracing: Fix possible double free on failure of allocating trace buffer
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4397f04575c44e1440ec2e49b6302785c95fd2f8 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Tue, 26 Dec 2017 20:07:34 -0500
Subject: tracing: Fix possible double free on failure of allocating trace buffer
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 4397f04575c44e1440ec2e49b6302785c95fd2f8 upstream.
Jing Xia and Chunyan Zhang reported that on failing to allocate part of the
tracing buffer, memory is freed, but the pointers that point to them are not
initialized back to NULL, and later paths may try to free the freed memory
again. Jing and Chunyan fixed one of the locations that does this, but
missed a spot.
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Reported-by: Jing Xia <jing.xia(a)spreadtrum.com>
Reported-by: Chunyan Zhang <chunyan.zhang(a)spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/trace.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6244,6 +6244,7 @@ allocate_trace_buffer(struct trace_array
buf->data = alloc_percpu(struct trace_array_cpu);
if (!buf->data) {
ring_buffer_free(buf->buffer);
+ buf->buffer = NULL;
return -ENOMEM;
}
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-3.18/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-3.18/tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
queue-3.18/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
tracing: Remove extra zeroing out of the ring buffer page
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6b7e633fe9c24682df550e5311f47fb524701586 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 22 Dec 2017 20:38:57 -0500
Subject: tracing: Remove extra zeroing out of the ring buffer page
From: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
commit 6b7e633fe9c24682df550e5311f47fb524701586 upstream.
The ring_buffer_read_page() takes care of zeroing out any extra data in the
page that it returns. There's no need to zero it out again from the
consumer. It was removed from one consumer of this function, but
read_buffers_splice_read() did not remove it, and worse, it contained a
nasty bug because of it.
Fixes: 2711ca237a084 ("ring-buffer: Move zeroing out excess in page to ring buffer code")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/trace.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5502,7 +5502,7 @@ tracing_buffers_splice_read(struct file
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
- int entries, size, i;
+ int entries, i;
ssize_t ret = 0;
mutex_lock(&trace_types_lock);
@@ -5563,14 +5563,6 @@ tracing_buffers_splice_read(struct file
break;
}
- /*
- * zero out any left over data, this is going to
- * user land.
- */
- size = ring_buffer_page_len(ref->page);
- if (size < PAGE_SIZE)
- memset(ref->page + size, 0, PAGE_SIZE - size);
-
page = virt_to_page(ref->page);
spd.pages[i] = page;
Patches currently in stable-queue which might be from rostedt(a)goodmis.org are
queue-3.18/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-3.18/tracing-remove-extra-zeroing-out-of-the-ring-buffer-page.patch
queue-3.18/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
This is a note to let you know that I've just added the patch titled
tracing: Fix crash when it fails to alloc ring buffer
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 Mon Sep 17 00:00:00 2001
From: Jing Xia <jing.xia(a)spreadtrum.com>
Date: Tue, 26 Dec 2017 15:12:53 +0800
Subject: tracing: Fix crash when it fails to alloc ring buffer
From: Jing Xia <jing.xia(a)spreadtrum.com>
commit 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 upstream.
Double free of the ring buffer happens when it fails to alloc new
ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured.
The root cause is that the pointer is not set to NULL after the buffer
is freed in allocate_trace_buffers(), and the freeing of the ring
buffer is invoked again later if the pointer is not equal to Null,
as:
instance_mkdir()
|-allocate_trace_buffers()
|-allocate_trace_buffer(tr, &tr->trace_buffer...)
|-allocate_trace_buffer(tr, &tr->max_buffer...)
// allocate fail(-ENOMEM),first free
// and the buffer pointer is not set to null
|-ring_buffer_free(tr->trace_buffer.buffer)
// out_free_tr
|-free_trace_buffers()
|-free_trace_buffer(&tr->trace_buffer);
//if trace_buffer is not null, free again
|-ring_buffer_free(buf->buffer)
|-rb_free_cpu_buffer(buffer->buffers[cpu])
// ring_buffer_per_cpu is null, and
// crash in ring_buffer_per_cpu->pages
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Signed-off-by: Jing Xia <jing.xia(a)spreadtrum.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang(a)spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
kernel/trace/trace.c | 2 ++
1 file changed, 2 insertions(+)
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6268,7 +6268,9 @@ static int allocate_trace_buffers(struct
allocate_snapshot ? size : 1);
if (WARN_ON(ret)) {
ring_buffer_free(tr->trace_buffer.buffer);
+ tr->trace_buffer.buffer = NULL;
free_percpu(tr->trace_buffer.data);
+ tr->trace_buffer.data = NULL;
return -ENOMEM;
}
tr->allocated_snapshot = allocate_snapshot;
Patches currently in stable-queue which might be from jing.xia(a)spreadtrum.com are
queue-3.18/tracing-fix-crash-when-it-fails-to-alloc-ring-buffer.patch
queue-3.18/tracing-fix-possible-double-free-on-failure-of-allocating-trace-buffer.patch
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
To make things worse, the parent display node was also prematurely
freed.
Note that the display and timings node references are never put after a
successful dt-initialisation so the nodes would leak on later probe
deferrals and on driver unbind.
Fixes: b985172b328a ("video: atmel_lcdfb: add device tree suport")
Cc: stable <stable(a)vger.kernel.org> # 3.13
Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj(a)jcrosoft.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/video/fbdev/atmel_lcdfb.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/atmel_lcdfb.c b/drivers/video/fbdev/atmel_lcdfb.c
index e06358da4b99..3dee267d7c75 100644
--- a/drivers/video/fbdev/atmel_lcdfb.c
+++ b/drivers/video/fbdev/atmel_lcdfb.c
@@ -1119,7 +1119,7 @@ static int atmel_lcdfb_of_init(struct atmel_lcdfb_info *sinfo)
goto put_display_node;
}
- timings_np = of_find_node_by_name(display_np, "display-timings");
+ timings_np = of_get_child_by_name(display_np, "display-timings");
if (!timings_np) {
dev_err(dev, "failed to find display-timings node\n");
ret = -ENODEV;
@@ -1140,6 +1140,12 @@ static int atmel_lcdfb_of_init(struct atmel_lcdfb_info *sinfo)
fb_add_videomode(&fb_vm, &info->modelist);
}
+ /*
+ * FIXME: Make sure we are not referencing any fields in display_np
+ * and timings_np and drop our references to them before returning to
+ * avoid leaking the nodes on probe deferral and driver unbind.
+ */
+
return 0;
put_timings_node:
--
2.15.0
On Fri, Dec 22, 2017 at 03:51:13PM +0100, Thomas Gleixner wrote:
> The conditions in irq_exit() to invoke tick_nohz_irq_exit() are:
>
> if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
>
> This is too permissive in various aspects:
>
> 1) If need_resched() is set, then the tick cannot be stopped whether
> the CPU is idle or in nohz full mode.
That's not exactly true. In nohz full mode the tick is not restarted on the
switch from idle to a single task. And if an idle interrupt wakes up a
single task and enqueues a timer, we want that timer to be programmed even
though we have need_resched().
Problems reported yesterday have been fixed. New problems:
Building arm:axm55xx_defconfig ... failed
--------------
Error log:
/opt/buildbot/slave/stable-queue-4.1/build/arch/arm/kvm/interrupts.S: Assembler messages:
/opt/buildbot/slave/stable-queue-4.1/build/arch/arm/kvm/interrupts.S:339: Error: garbage following instruction -- `ldr r2,=BSYM(panic)'
/opt/buildbot/slave/stable-queue-4.1/build/arch/arm/kvm/interrupts.S:343: Error: garbage following instruction -- `ldr r2,=BSYM(panic)'
/opt/buildbot/slave/stable-queue-4.1/build/arch/arm/kvm/interrupts.S:347: Error: garbage following instruction -- `ldr r2,=BSYM(panic)'
/opt/buildbot/slave/stable-queue-4.1/build/arch/arm/kvm/interrupts.S:351: Error: garbage following instruction -- `ldr r2,=BSYM(panic)'
Looks like something is wrong with the backport of fdc31f5c95c9
("ARM: replace BSYM() with badr assembly macro") ?
Building mips:defconfig ... failed
--------------
Error log:
/opt/buildbot/slave/stable-queue-4.1/build/arch/mips/kernel/genex.S: Assembler messages:
/opt/buildbot/slave/stable-queue-4.1/build/arch/mips/kernel/genex.S:219: Error: absolute expression required `li $9,_IRQ_STACK_SIZE'
/opt/buildbot/slave/stable-queue-4.1/build/arch/mips/kernel/genex.S:329: Error: absolute expression required `li $9,_IRQ_STACK_SIZE'
_IRQ_STACK_SIZE is not defined. It was introduced with commit fe8bd18ffea53
("MIPS: Introduce irq_stack"). No idea if that can be backported.
Guenter
This is the start of the stable review cycle for the 4.9.73 release.
There are 21 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri Dec 29 16:45:43 UTC 2017.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.73-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.73-rc1
Yelena Krivosheev <yelena(a)marvell.com>
net: mvneta: eliminate wrong call to handle rx descriptor error
Yelena Krivosheev <yelena(a)marvell.com>
net: mvneta: use proper rxq_number in loop on rx queues
Yelena Krivosheev <yelena(a)marvell.com>
net: mvneta: clear interface link status on port disable
Dan Williams <dan.j.williams(a)intel.com>
libnvdimm, pfn: fix start_pad handling for aligned namespaces
Ravi Bangoria <ravi.bangoria(a)linux.vnet.ibm.com>
powerpc/perf: Dereference BHRB entries safely
Chen-Yu Tsai <wens(a)csie.org>
clk: sunxi: sun9i-mmc: Implement reset callback for reset controls
Paolo Bonzini <pbonzini(a)redhat.com>
kvm: x86: fix RSM when PCID is non-zero
Wanpeng Li <wanpeng.li(a)hotmail.com>
KVM: X86: Fix load RFLAGS w/o the fixed bit
Mika Westerberg <mika.westerberg(a)linux.intel.com>
pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
Ricardo Ribalda Delgado <ricardo.ribalda(a)gmail.com>
spi: xilinx: Detect stall with Unknown commands
Helge Deller <deller(a)gmx.de>
parisc: Hide Diva-built-in serial aux and graphics card
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU
Jussi Laako <jussi(a)sonarnerd.net>
ALSA: usb-audio: Add native DSD support for Esoteric D-05X
Takashi Iwai <tiwai(a)suse.de>
ALSA: rawmidi: Avoid racy info ioctl via ctl device
Johan Hovold <johan(a)kernel.org>
mfd: twl6040: Fix child-node lookup
Johan Hovold <johan(a)kernel.org>
mfd: twl4030-audio: Fix sibling-node lookup
Jon Hunter <jonathanh(a)nvidia.com>
mfd: cros ec: spi: Don't send first message too soon
Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
crypto: mcryptd - protect the per-CPU queue with a lock
Dan Williams <dan.j.williams(a)intel.com>
acpi, nfit: fix health event notification
Takashi Iwai <tiwai(a)suse.de>
ACPI: APEI / ERST: Fix missing error handling in erst_reader()
-------------
Diffstat:
Makefile | 4 ++--
arch/powerpc/perf/core-book3s.c | 8 ++++++--
arch/x86/kvm/emulate.c | 32 ++++++++++++++++++++++-------
arch/x86/kvm/x86.c | 2 +-
crypto/mcryptd.c | 23 +++++++++------------
drivers/acpi/apei/erst.c | 2 +-
drivers/acpi/nfit/core.c | 9 +++++++-
drivers/clk/sunxi/clk-sun9i-mmc.c | 12 +++++++++++
drivers/mfd/cros_ec_spi.c | 1 +
drivers/mfd/twl4030-audio.c | 9 ++++++--
drivers/mfd/twl6040.c | 12 +++++++----
drivers/net/ethernet/marvell/mvneta.c | 8 ++++++--
drivers/nvdimm/pfn_devs.c | 5 +++--
drivers/parisc/lba_pci.c | 33 ++++++++++++++++++++++++++++++
drivers/pci/pci-driver.c | 7 ++++++-
drivers/pinctrl/intel/pinctrl-cherryview.c | 16 +++++++++++++++
drivers/spi/spi-xilinx.c | 11 ++++++++++
include/crypto/mcryptd.h | 1 +
sound/core/rawmidi.c | 15 +++++++++++---
sound/usb/mixer.c | 27 ++++++++++++++----------
sound/usb/quirks.c | 7 ++++---
21 files changed, 189 insertions(+), 55 deletions(-)
Just a heads-up to avoid more people lose hours in debugging:
After upgrading from 4.9 to 4.14 I noticed USB support was broken on
my Allwinner A20 boards like Cubietruck and first generation BananaPi.
There is simply no output of the lsusb command, and subsequently no
connected USB devices are detected.
Bisecting led to
commit 6254a6a94489c4be717f757bec7d3a372cba1b6e
Author: Arnd Bergmann <arnd(a)arndb.de>
Date: Thu Apr 27 21:11:48 2017 +0200
power: supply: axp20x_usb_power: add IIO dependency
which already happened in the 4.12 development. Turns out after that
commit "make oldconfig" will silently drop CONFIG_AXP20X_POWER unless
CONFIG_IIO is set - but CONFIG_AXP20X_POWER is needed for USB on these
boards.
Solution: Enable CONFIG_IIO, but no particular driver is required. I'm
not aware whether it's possible to provide a smoother upgrade path in
Kconfig, "selects" instead of "depends" might have been an option.
Christoph, two more 4.14 regressions to go
The patch titled
Subject: userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails
has been added to the -mm tree. Its filename is
userfaultfd-clear-the-vma-vm_userfaultfd_ctx-if-uffd_event_fork-fails.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/userfaultfd-clear-the-vma-vm_userf…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/userfaultfd-clear-the-vma-vm_userf…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Andrea Arcangeli <aarcange(a)redhat.com>
Subject: userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails
The previous fix 384632e67e0829 ("userfaultfd: non-cooperative: fix fork
use after free") corrected the refcounting in case of UFFD_EVENT_FORK
failure for the fork userfault paths. That still didn't clear the
vma->vm_userfaultfd_ctx of the vmas that were set to point to the aborted
new uffd ctx earlier in dup_userfaultfd.
Link: http://lkml.kernel.org/r/20171223002505.593-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Eric Biggers <ebiggers3(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff -puN fs/userfaultfd.c~userfaultfd-clear-the-vma-vm_userfaultfd_ctx-if-uffd_event_fork-fails fs/userfaultfd.c
--- a/fs/userfaultfd.c~userfaultfd-clear-the-vma-vm_userfaultfd_ctx-if-uffd_event_fork-fails
+++ a/fs/userfaultfd.c
@@ -570,11 +570,14 @@ out:
static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
struct userfaultfd_wait_queue *ewq)
{
+ struct userfaultfd_ctx *release_new_ctx;
+
if (WARN_ON_ONCE(current->flags & PF_EXITING))
goto out;
ewq->ctx = ctx;
init_waitqueue_entry(&ewq->wq, current);
+ release_new_ctx = NULL;
spin_lock(&ctx->event_wqh.lock);
/*
@@ -601,8 +604,7 @@ static void userfaultfd_event_wait_compl
new = (struct userfaultfd_ctx *)
(unsigned long)
ewq->msg.arg.reserved.reserved1;
-
- userfaultfd_ctx_put(new);
+ release_new_ctx = new;
}
break;
}
@@ -617,6 +619,20 @@ static void userfaultfd_event_wait_compl
__set_current_state(TASK_RUNNING);
spin_unlock(&ctx->event_wqh.lock);
+ if (release_new_ctx) {
+ struct vm_area_struct *vma;
+ struct mm_struct *mm = release_new_ctx->mm;
+
+ /* the various vma->vm_userfaultfd_ctx still points to it */
+ down_write(&mm->mmap_sem);
+ for (vma = mm->mmap; vma; vma = vma->vm_next)
+ if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx)
+ vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
+ up_write(&mm->mmap_sem);
+
+ userfaultfd_ctx_put(release_new_ctx);
+ }
+
/*
* ctx may go away after this if the userfault pseudo fd is
* already released.
_
Patches currently in -mm which might be from aarcange(a)redhat.com are
userfaultfd-clear-the-vma-vm_userfaultfd_ctx-if-uffd_event_fork-fails.patch
Since f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size")
kasan_mem_to_shadow(MODULES_END) could be not aligned to a page boundary.
So passing page unaligned address to kasan_populate_zero_shadow() have two
possible effects:
1) It may leave one page hole in supposed to be populated area. After commit
21506525fb8d ("x86/kasan/64: Teach KASAN about the cpu_entry_area") that
hole happens to be in the shadow covering fixmap area and leads to crash:
BUG: unable to handle kernel paging request at fffffbffffe8ee04
RIP: 0010:check_memory_region+0x5c/0x190
Call Trace:
<NMI>
memcpy+0x1f/0x50
ghes_copy_tofrom_phys+0xab/0x180
ghes_read_estatus+0xfb/0x280
ghes_notify_nmi+0x2b2/0x410
nmi_handle+0x115/0x2c0
default_do_nmi+0x57/0x110
do_nmi+0xf8/0x150
end_repeat_nmi+0x1a/0x1e
Note, the crash likely disappeared after commit 92a0f81d8957, which
changed kasan_populate_zero_shadow() call the way it was before
commit 21506525fb8d.
2) Attempt to load module near MODULES_END will fail, because
__vmalloc_node_range() called from kasan_module_alloc() will hit the
WARN_ON(!pte_none(*pte)) in the vmap_pte_range() and bail out with error.
To fix this we need to make kasan_mem_to_shadow(MODULES_END) page aligned
which means that MODULES_END should be 8*PAGE_SIZE aligned.
The whole point of commit f06bdd4001c2 was to move MODULES_END down if
NR_CPUS is big, so the cpu_entry_area takes a lot of space.
But since 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
the cpu_entry_area is no longer in fixmap, so we could just set
MODULES_END to a fixed 8*PAGE_SIZE aligned address.
Reported-by: Jakub Kicinski <kubakici(a)wp.pl>
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org> # 4.14
---
Documentation/x86/x86_64/mm.txt | 5 +----
arch/x86/include/asm/pgtable_64_types.h | 2 +-
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 51101708a03a..60a0fa2bc01f 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -42,7 +42,7 @@ ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - [fixmap start] (~1526 MB) module mapping space
+ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
@@ -66,9 +66,6 @@ memory window (this size is arbitrary, it can be raised later if needed).
The mappings are not part of any other kernel PGD and are only available
during EFI runtime calls.
-The module mapping space size changes based on the CONFIG requirements for the
-following fixmap section.
-
Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
physical memory, vmalloc/ioremap space and virtual memory map are randomized.
Their order is preserved but their base will be offset early at boot time.
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 3d27831bc58d..ed2da4e7f259 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -100,7 +100,7 @@ typedef struct { pteval_t pte; } pte_t;
#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
/* The module sections ends with the start of the fixmap */
-#define MODULES_END __fix_to_virt(__end_of_fixed_addresses + 1)
+#define MODULES_END _AC(0xffffffffff000000, UL)
#define MODULES_LEN (MODULES_END - MODULES_VADDR)
#define ESPFIX_PGD_ENTRY _AC(-2, UL)
--
2.13.6
On Tue, Dec 26, 2017 at 1:11 AM, 王元良 <yuanliang.wyl(a)alibaba-inc.com> wrote:
> To Andy and josh:
>
> Forgive me for not expressing clearly, The context is that we encountered
> hundreds of this panic in stable kernel, and hope this change can be merged
> into the stable branch linux-3.16.y
>
> Regards,
> Yuanliang Wang
>
You could try to backport the upstream fix to 3.16. Josh or I could review it.
--Andy
This is a note to let you know that I've just added the patch titled
bpf/verifier: Fix states_equal() comparison of pointer and UNKNOWN
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
bpf-verifier-fix-states_equal-comparison-of-pointer-and-unknown.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ben(a)decadent.org.uk Wed Dec 27 21:04:06 2017
From: Ben Hutchings <ben(a)decadent.org.uk>
Date: Sat, 23 Dec 2017 02:26:17 +0000
Subject: bpf/verifier: Fix states_equal() comparison of pointer and UNKNOWN
To: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: stable(a)vger.kernel.org, netdev(a)vger.kernel.org, Edward Cree <ecree(a)solarflare.com>, Jann Horn <jannh(a)google.com>, Alexei Starovoitov <ast(a)kernel.org>
Message-ID: <20171223022617.GO2971(a)decadent.org.uk>
Content-Disposition: inline
From: Ben Hutchings <ben(a)decadent.org.uk>
An UNKNOWN_VALUE is not supposed to be derived from a pointer, unless
pointer leaks are allowed. Therefore, states_equal() must not treat
a state with a pointer in a register as "equal" to a state with an
UNKNOWN_VALUE in that register.
This was fixed differently upstream, but the code around here was
largely rewritten in 4.14 by commit f1174f77b50c "bpf/verifier: rework
value tracking". The bug can be detected by the bpf/verifier sub-test
"pointer/scalar confusion in state equality check (way 1)".
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
Cc: Edward Cree <ecree(a)solarflare.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: Daniel Borkmann <daniel(a)iogearbox.net>
---
kernel/bpf/verifier.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2722,11 +2722,12 @@ static bool states_equal(struct bpf_veri
/* If we didn't map access then again we don't care about the
* mismatched range values and it's ok if our old type was
- * UNKNOWN and we didn't go to a NOT_INIT'ed reg.
+ * UNKNOWN and we didn't go to a NOT_INIT'ed or pointer reg.
*/
if (rold->type == NOT_INIT ||
(!varlen_map_access && rold->type == UNKNOWN_VALUE &&
- rcur->type != NOT_INIT))
+ rcur->type != NOT_INIT &&
+ !__is_pointer_value(env->allow_ptr_leaks, rcur)))
continue;
/* Don't care about the reg->id in this case. */
Patches currently in stable-queue which might be from ben(a)decadent.org.uk are
queue-4.9/bpf-verifier-fix-states_equal-comparison-of-pointer-and-unknown.patch
Here is my lspci output of ThunderX2 for which I am observing kernel panic coming from
SMMUv3 driver -> arm_smmu_write_strtab_ent() -> BUG_ON(ste_live):
# lspci -vt
-[0000:00]-+-00.0-[01-1f]--+ [...]
+ [...]
\-00.0-[1e-1f]----00.0-[1f]----00.0 ASPEED Technology, Inc. ASPEED Graphics Family
ASP device -> 1f:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family
PCI-Express to PCI/PCI-X Bridge -> 1e:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge
While setting up ASP device SID in IORT dirver:
iort_iommu_configure() -> pci_for_each_dma_alias()
we need to walk up and iterate over each device which alias transaction from
downstream devices.
AST device (1f:00.0) gets BDF=0x1f00 and corresponding SID=0x1f00 from IORT.
Bridge (1e:00.0) is the first alias. Following PCI Express to PCI/PCI-X Bridge
spec: PCIe-to-PCI/X bridges alias transactions from downstream devices using
the subordinate bus number. For bridge (1e:00.0), the subordinate is equal
to 0x1f. This gives BDF=0x1f00 and SID=1f00 which is the same as downstream
device. So it is possible to have two identical SIDs. The question is what we
should do about such case. Presented patch prevents from registering the same
ID so that SMMUv3 is not complaining later on.
Tomasz Nowicki (1):
iommu: Make sure device's ID array elements are unique
drivers/iommu/iommu.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
--
2.7.4
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Jing Xia and Chunyan Zhang reported that on failing to allocate part of the
tracing buffer, memory is freed, but the pointers that point to them are not
initialized back to NULL, and later paths may try to free the freed memory
again. Jing and Chunyan fixed one of the locations that does this, but
missed a spot.
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Cc: stable(a)vger.kernel.org
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Reported-by: Jing Xia <jing.xia(a)spreadtrum.com>
Reported-by: Chunyan Zhang <chunyan.zhang(a)spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0e53d46544b8..2a8d8a294345 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7580,6 +7580,7 @@ allocate_trace_buffer(struct trace_array *tr, struct trace_buffer *buf, int size
buf->data = alloc_percpu(struct trace_array_cpu);
if (!buf->data) {
ring_buffer_free(buf->buffer);
+ buf->buffer = NULL;
return -ENOMEM;
}
--
2.13.2
From: Jing Xia <jing.xia(a)spreadtrum.com>
Double free of the ring buffer happens when it fails to alloc new
ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured.
The root cause is that the pointer is not set to NULL after the buffer
is freed in allocate_trace_buffers(), and the freeing of the ring
buffer is invoked again later if the pointer is not equal to Null,
as:
instance_mkdir()
|-allocate_trace_buffers()
|-allocate_trace_buffer(tr, &tr->trace_buffer...)
|-allocate_trace_buffer(tr, &tr->max_buffer...)
// allocate fail(-ENOMEM),first free
// and the buffer pointer is not set to null
|-ring_buffer_free(tr->trace_buffer.buffer)
// out_free_tr
|-free_trace_buffers()
|-free_trace_buffer(&tr->trace_buffer);
//if trace_buffer is not null, free again
|-ring_buffer_free(buf->buffer)
|-rb_free_cpu_buffer(buffer->buffers[cpu])
// ring_buffer_per_cpu is null, and
// crash in ring_buffer_per_cpu->pages
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Cc: stable(a)vger.kernel.org
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Signed-off-by: Jing Xia <jing.xia(a)spreadtrum.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang(a)spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 73652d5318b2..0e53d46544b8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7603,7 +7603,9 @@ static int allocate_trace_buffers(struct trace_array *tr, int size)
allocate_snapshot ? size : 1);
if (WARN_ON(ret)) {
ring_buffer_free(tr->trace_buffer.buffer);
+ tr->trace_buffer.buffer = NULL;
free_percpu(tr->trace_buffer.data);
+ tr->trace_buffer.data = NULL;
return -ENOMEM;
}
tr->allocated_snapshot = allocate_snapshot;
--
2.13.2
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
To free the reader page that is allocated with ring_buffer_alloc_read_page(),
ring_buffer_free_read_page() must be called. For faster performance, this
page can be reused by the ring buffer to avoid having to free and allocate
new pages.
The issue arises when the page is used with a splice pipe into the
networking code. The networking code may up the page counter for the page,
and keep it active while sending it is queued to go to the network. The
incrementing of the page ref does not prevent it from being reused in the
ring buffer, and this can cause the page that is being sent out to the
network to be modified before it is sent by reading new data.
Add a check to the page ref counter, and only reuse the page if it is not
being used anywhere else.
Cc: stable(a)vger.kernel.org
Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index e06cde093f76..9ab18995ff1e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4404,8 +4404,13 @@ void ring_buffer_free_read_page(struct ring_buffer *buffer, int cpu, void *data)
{
struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
struct buffer_data_page *bpage = data;
+ struct page *page = virt_to_page(bpage);
unsigned long flags;
+ /* If the page is still in use someplace else, we can't reuse it */
+ if (page_ref_count(page) > 1)
+ goto out;
+
local_irq_save(flags);
arch_spin_lock(&cpu_buffer->lock);
@@ -4417,6 +4422,7 @@ void ring_buffer_free_read_page(struct ring_buffer *buffer, int cpu, void *data)
arch_spin_unlock(&cpu_buffer->lock);
local_irq_restore(flags);
+ out:
free_page((unsigned long)bpage);
}
EXPORT_SYMBOL_GPL(ring_buffer_free_read_page);
--
2.13.2
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
The ring_buffer_read_page() takes care of zeroing out any extra data in the
page that it returns. There's no need to zero it out again from the
consumer. It was removed from one consumer of this function, but
read_buffers_splice_read() did not remove it, and worse, it contained a
nasty bug because of it.
Cc: stable(a)vger.kernel.org
Fixes: 2711ca237a084 ("ring-buffer: Move zeroing out excess in page to ring buffer code")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 59518b8126d0..73652d5318b2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6769,7 +6769,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
- int entries, size, i;
+ int entries, i;
ssize_t ret = 0;
#ifdef CONFIG_TRACER_MAX_TRACE
@@ -6823,14 +6823,6 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
break;
}
- /*
- * zero out any left over data, this is going to
- * user land.
- */
- size = ring_buffer_page_len(ref->page);
- if (size < PAGE_SIZE)
- memset(ref->page + size, 0, PAGE_SIZE - size);
-
page = virt_to_page(ref->page);
spd.pages[i] = page;
--
2.13.2
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Two info bits were added to the "commit" part of the ring buffer data page
when returned to be consumed. This was to inform the user space readers that
events have been missed, and that the count may be stored at the end of the
page.
What wasn't handled, was the splice code that actually called a function to
return the length of the data in order to zero out the rest of the page
before sending it up to user space. These data bits were returned with the
length making the value negative, and that negative value was not checked.
It was compared to PAGE_SIZE, and only used if the size was less than
PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
unsigned compare, meaning the negative size value did not end up causing a
large portion of memory to be randomly zeroed out.
Cc: stable(a)vger.kernel.org
Fixes: 66a8cb95ed040 ("ring-buffer: Add place holder recording of dropped events")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c87766c1c204..e06cde093f76 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -280,6 +280,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data);
/* Missed count stored at end */
#define RB_MISSED_STORED (1 << 30)
+#define RB_MISSED_FLAGS (RB_MISSED_EVENTS|RB_MISSED_STORED)
+
struct buffer_data_page {
u64 time_stamp; /* page time stamp */
local_t commit; /* write committed index */
@@ -331,7 +333,9 @@ static void rb_init_page(struct buffer_data_page *bpage)
*/
size_t ring_buffer_page_len(void *page)
{
- return local_read(&((struct buffer_data_page *)page)->commit)
+ struct buffer_data_page *bpage = page;
+
+ return (local_read(&bpage->commit) & ~RB_MISSED_FLAGS)
+ BUF_PAGE_HDR_SIZE;
}
--
2.13.2
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5663d8f9bbe4bf15488f7351efb61ea20fa6de06 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx(a)redhat.com>
Date: Tue, 12 Dec 2017 17:15:02 +0100
Subject: [PATCH] kvm: x86: fix WARN due to uninitialized guest FPU state
------------[ cut here ]------------
Bad FPU state detected at kvm_put_guest_fpu+0xd8/0x2d0 [kvm], reinitializing FPU registers.
WARNING: CPU: 1 PID: 4594 at arch/x86/mm/extable.c:103 ex_handler_fprestore+0x88/0x90
CPU: 1 PID: 4594 Comm: qemu-system-x86 Tainted: G B OE 4.15.0-rc2+ #10
RIP: 0010:ex_handler_fprestore+0x88/0x90
Call Trace:
fixup_exception+0x4e/0x60
do_general_protection+0xff/0x270
general_protection+0x22/0x30
RIP: 0010:kvm_put_guest_fpu+0xd8/0x2d0 [kvm]
RSP: 0018:ffff8803d5627810 EFLAGS: 00010246
kvm_vcpu_reset+0x3b4/0x3c0 [kvm]
kvm_apic_accept_events+0x1c0/0x240 [kvm]
kvm_arch_vcpu_ioctl_run+0x1658/0x2fb0 [kvm]
kvm_vcpu_ioctl+0x479/0x880 [kvm]
do_vfs_ioctl+0x142/0x9a0
SyS_ioctl+0x74/0x80
do_syscall_64+0x15f/0x600
where kvm_put_guest_fpu is called without a prior kvm_load_guest_fpu.
To fix it, move kvm_load_guest_fpu to the very beginning of
kvm_arch_vcpu_ioctl_run.
Cc: stable(a)vger.kernel.org
Fixes: f775b13eedee2f7f3c6fdd4e90fb79090ce5d339
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 154ea27746e9..56d036b9ad75 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7264,13 +7264,12 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
- struct fpu *fpu = ¤t->thread.fpu;
int r;
- fpu__initialize(fpu);
-
kvm_sigset_activate(vcpu);
+ kvm_load_guest_fpu(vcpu);
+
if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
if (kvm_run->immediate_exit) {
r = -EINTR;
@@ -7296,14 +7295,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
}
}
- kvm_load_guest_fpu(vcpu);
-
if (unlikely(vcpu->arch.complete_userspace_io)) {
int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
vcpu->arch.complete_userspace_io = NULL;
r = cui(vcpu);
if (r <= 0)
- goto out_fpu;
+ goto out;
} else
WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
@@ -7312,9 +7309,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
else
r = vcpu_run(vcpu);
-out_fpu:
- kvm_put_guest_fpu(vcpu);
out:
+ kvm_put_guest_fpu(vcpu);
post_kvm_run_save(vcpu);
kvm_sigset_deactivate(vcpu);
From: Corey Minyard <cminyard(a)mvista.com>
This reverts commit c97e41076a298dbc4e910c33048e553658388eed.
A backport was requested of c0a32fe13cd32 "ipmi_si: fix memory leak on
new_smi", but the backport shouldn't have been done. This change needs
to be reverted, as it can result in an oops and the previous code is
correct.
Reverts: c97e41076a29 ("ipmi_si: fix memory leak on new_smi")
Link: https://bbs.archlinux.org/viewtopic.php?pid=1757130#p1757130
Reported-by: Neil Romig <neil(a)sixtythree.me.uk>
Cc: Sasha Levin <alexander.levin(a)verizon.com>
Signed-off-by: Corey Minyard <cminyard(a)mvista.com>
---
This is for 4.14 stable tree only. In hindsight, I should scrutinize
stable kernel requests from others in the IPMI tree. Sorry about that.
drivers/char/ipmi/ipmi_si_intf.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index e1cbb78..c04aa11 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -3469,7 +3469,6 @@ static int add_smi(struct smi_info *new_smi)
ipmi_addr_src_to_str(new_smi->addr_source),
si_to_str[new_smi->si_type]);
rv = -EBUSY;
- kfree(new_smi);
goto out_err;
}
}
--
2.7.4
This is a note to let you know that I've just added the patch titled
libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
libnvdimm-dax-fix-1gb-aligned-namespaces-vs-physical-misalignment.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 41fce90f26333c4fa82e8e43b9ace86c4e8a0120 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Mon, 4 Dec 2017 14:07:43 -0800
Subject: libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment
From: Dan Williams <dan.j.williams(a)intel.com>
commit 41fce90f26333c4fa82e8e43b9ace86c4e8a0120 upstream.
The following namespace configuration attempt:
# ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f
libndctl: ndctl_dax_enable: dax0.1: failed to enable
Error: namespace0.0: failed to enable
failed to reconfigure namespace: No such device or address
...fails when the backing memory range is not physically aligned to 1G:
# cat /proc/iomem | grep Persistent
210000000-30fffffff : Persistent Memory (legacy)
In the above example the 4G persistent memory range starts and ends on a
256MB boundary.
We handle this case correctly when needing to handle cases that violate
section alignment (128MB) collisions against "System RAM", and we simply
need to extend that padding/truncation for the 1GB alignment use case.
Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute...")
Reported-and-tested-by: Jane Chu <jane.chu(a)oracle.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/nvdimm/pfn_devs.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -562,6 +562,12 @@ static struct vmem_altmap *__nvdimm_setu
return altmap;
}
+static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys)
+{
+ return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys),
+ ALIGN_DOWN(phys, nd_pfn->align));
+}
+
static int nd_pfn_init(struct nd_pfn *nd_pfn)
{
u32 dax_label_reserve = is_nd_dax(&nd_pfn->dev) ? SZ_128K : 0;
@@ -617,13 +623,16 @@ static int nd_pfn_init(struct nd_pfn *nd
start = nsio->res.start;
size = PHYS_SECTION_ALIGN_UP(start + size) - start;
if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
- IORES_DESC_NONE) == REGION_MIXED) {
+ IORES_DESC_NONE) == REGION_MIXED
+ || !IS_ALIGNED(start + resource_size(&nsio->res),
+ nd_pfn->align)) {
size = resource_size(&nsio->res);
- end_trunc = start + size - PHYS_SECTION_ALIGN_DOWN(start + size);
+ end_trunc = start + size - phys_pmem_align_down(nd_pfn,
+ start + size);
}
if (start_pad + end_trunc)
- dev_info(&nd_pfn->dev, "%s section collision, truncate %d bytes\n",
+ dev_info(&nd_pfn->dev, "%s alignment collision, truncate %d bytes\n",
dev_name(&ndns->dev), start_pad + end_trunc);
/*
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.9/libnvdimm-pfn-fix-start_pad-handling-for-aligned-namespaces.patch
queue-4.9/libnvdimm-dax-fix-1gb-aligned-namespaces-vs-physical-misalignment.patch
queue-4.9/acpi-nfit-fix-health-event-notification.patch
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 41fce90f26333c4fa82e8e43b9ace86c4e8a0120 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Mon, 4 Dec 2017 14:07:43 -0800
Subject: [PATCH] libnvdimm, dax: fix 1GB-aligned namespaces vs physical
misalignment
The following namespace configuration attempt:
# ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f
libndctl: ndctl_dax_enable: dax0.1: failed to enable
Error: namespace0.0: failed to enable
failed to reconfigure namespace: No such device or address
...fails when the backing memory range is not physically aligned to 1G:
# cat /proc/iomem | grep Persistent
210000000-30fffffff : Persistent Memory (legacy)
In the above example the 4G persistent memory range starts and ends on a
256MB boundary.
We handle this case correctly when needing to handle cases that violate
section alignment (128MB) collisions against "System RAM", and we simply
need to extend that padding/truncation for the 1GB alignment use case.
Cc: <stable(a)vger.kernel.org>
Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute...")
Reported-and-tested-by: Jane Chu <jane.chu(a)oracle.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index db2fc7c02e01..2adada1a5855 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -583,6 +583,12 @@ static struct vmem_altmap *__nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
return altmap;
}
+static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys)
+{
+ return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys),
+ ALIGN_DOWN(phys, nd_pfn->align));
+}
+
static int nd_pfn_init(struct nd_pfn *nd_pfn)
{
u32 dax_label_reserve = is_nd_dax(&nd_pfn->dev) ? SZ_128K : 0;
@@ -638,13 +644,16 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
start = nsio->res.start;
size = PHYS_SECTION_ALIGN_UP(start + size) - start;
if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
- IORES_DESC_NONE) == REGION_MIXED) {
+ IORES_DESC_NONE) == REGION_MIXED
+ || !IS_ALIGNED(start + resource_size(&nsio->res),
+ nd_pfn->align)) {
size = resource_size(&nsio->res);
- end_trunc = start + size - PHYS_SECTION_ALIGN_DOWN(start + size);
+ end_trunc = start + size - phys_pmem_align_down(nd_pfn,
+ start + size);
}
if (start_pad + end_trunc)
- dev_info(&nd_pfn->dev, "%s section collision, truncate %d bytes\n",
+ dev_info(&nd_pfn->dev, "%s alignment collision, truncate %d bytes\n",
dev_name(&ndns->dev), start_pad + end_trunc);
/*
This is a note to let you know that I've just added the patch titled
pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
pinctrl-cherryview-mask-all-interrupts-on-intel_strago-based-systems.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d2b3c353595a855794f8b9df5b5bdbe8deb0c413 Mon Sep 17 00:00:00 2001
From: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Date: Mon, 4 Dec 2017 12:11:02 +0300
Subject: pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
From: Mika Westerberg <mika.westerberg(a)linux.intel.com>
commit d2b3c353595a855794f8b9df5b5bdbe8deb0c413 upstream.
Guenter Roeck reported an interrupt storm on a prototype system which is
based on Cyan Chromebook. The root cause turned out to be a incorrectly
configured pin that triggers spurious interrupts. This will be fixed in
coreboot but currently we need to prevent the interrupt storm from
happening by masking all interrupts (but not GPEs) on those systems.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197953
Fixes: bcb48cca23ec ("pinctrl: cherryview: Do not mask all interrupts in probe")
Reported-and-tested-by: Guenter Roeck <linux(a)roeck-us.net>
Reported-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/pinctrl/intel/pinctrl-cherryview.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
--- a/drivers/pinctrl/intel/pinctrl-cherryview.c
+++ b/drivers/pinctrl/intel/pinctrl-cherryview.c
@@ -1466,6 +1466,22 @@ static int chv_gpio_probe(struct chv_pin
offset += range->npins;
}
+ /*
+ * The same set of machines in chv_no_valid_mask[] have incorrectly
+ * configured GPIOs that generate spurious interrupts so we use
+ * this same list to apply another quirk for them.
+ *
+ * See also https://bugzilla.kernel.org/show_bug.cgi?id=197953.
+ */
+ if (!need_valid_mask) {
+ /*
+ * Mask all interrupts the community is able to generate
+ * but leave the ones that can only generate GPEs unmasked.
+ */
+ chv_writel(GENMASK(31, pctrl->community->nirqs),
+ pctrl->regs + CHV_INTMASK);
+ }
+
/* Clear all interrupts */
chv_writel(0xffff, pctrl->regs + CHV_INTSTAT);
Patches currently in stable-queue which might be from mika.westerberg(a)linux.intel.com are
queue-4.4/pinctrl-cherryview-mask-all-interrupts-on-intel_strago-based-systems.patch
This is a note to let you know that I've just added the patch titled
x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4831b779403a836158917d59a7ca880483c67378 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Sun, 10 Dec 2017 22:47:20 -0800
Subject: x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
From: Andy Lutomirski <luto(a)kernel.org>
commit 4831b779403a836158917d59a7ca880483c67378 upstream.
If something goes wrong with pagetable setup, vsyscall=native will
accidentally fall back to emulation. Make it warn and fail so that we
notice.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/vsyscall/vsyscall_64.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -139,6 +139,10 @@ bool emulate_vsyscall(struct pt_regs *re
WARN_ON_ONCE(address != regs->ip);
+ /* This should be unreachable in NATIVE mode. */
+ if (WARN_ON(vsyscall_mode == NATIVE))
+ return false;
+
if (vsyscall_mode == NONE) {
warn_bad_vsyscall(KERN_INFO, regs,
"vsyscall attempted with vsyscall=none");
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
This is a note to let you know that I've just added the patch titled
x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 49275fef986abfb8b476e4708aaecc07e7d3e087 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Sun, 10 Dec 2017 22:47:19 -0800
Subject: x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy
From: Andy Lutomirski <luto(a)kernel.org>
commit 49275fef986abfb8b476e4708aaecc07e7d3e087 upstream.
The kernel is very erratic as to which pagetables have _PAGE_USER set. The
vsyscall page gets lucky: it seems that all of the relevant pagetables are
among the apparently arbitrary ones that set _PAGE_USER. Rather than
relying on chance, just explicitly set _PAGE_USER.
This will let us clean up pagetable setup to stop setting _PAGE_USER. The
added code can also be reused by pagetable isolation to manage the
_PAGE_USER bit in the usermode tables.
[ tglx: Folded paravirt fix from Juergen Gross ]
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -37,6 +37,7 @@
#include <asm/unistd.h>
#include <asm/fixmap.h>
#include <asm/traps.h>
+#include <asm/paravirt.h>
#define CREATE_TRACE_POINTS
#include "vsyscall_trace.h"
@@ -329,16 +330,47 @@ int in_gate_area_no_mm(unsigned long add
return vsyscall_mode != NONE && (addr & PAGE_MASK) == VSYSCALL_ADDR;
}
+/*
+ * The VSYSCALL page is the only user-accessible page in the kernel address
+ * range. Normally, the kernel page tables can have _PAGE_USER clear, but
+ * the tables covering VSYSCALL_ADDR need _PAGE_USER set if vsyscalls
+ * are enabled.
+ *
+ * Some day we may create a "minimal" vsyscall mode in which we emulate
+ * vsyscalls but leave the page not present. If so, we skip calling
+ * this.
+ */
+static void __init set_vsyscall_pgtable_user_bits(void)
+{
+ pgd_t *pgd;
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ pgd = pgd_offset_k(VSYSCALL_ADDR);
+ set_pgd(pgd, __pgd(pgd_val(*pgd) | _PAGE_USER));
+ p4d = p4d_offset(pgd, VSYSCALL_ADDR);
+#if CONFIG_PGTABLE_LEVELS >= 5
+ p4d->p4d |= _PAGE_USER;
+#endif
+ pud = pud_offset(p4d, VSYSCALL_ADDR);
+ set_pud(pud, __pud(pud_val(*pud) | _PAGE_USER));
+ pmd = pmd_offset(pud, VSYSCALL_ADDR);
+ set_pmd(pmd, __pmd(pmd_val(*pmd) | _PAGE_USER));
+}
+
void __init map_vsyscall(void)
{
extern char __vsyscall_page;
unsigned long physaddr_vsyscall = __pa_symbol(&__vsyscall_page);
- if (vsyscall_mode != NONE)
+ if (vsyscall_mode != NONE) {
__set_fixmap(VSYSCALL_PAGE, physaddr_vsyscall,
vsyscall_mode == NATIVE
? PAGE_KERNEL_VSYSCALL
: PAGE_KERNEL_VVAR);
+ set_vsyscall_pgtable_user_bits();
+ }
BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
(unsigned long)VSYSCALL_ADDR);
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
This is a note to let you know that I've just added the patch titled
x86/uv: Use the right TLB-flush API
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-uv-use-the-right-tlb-flush-api.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3e46e0f5ee3643a1239be9046c7ba6c66ca2b329 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Tue, 5 Dec 2017 13:34:50 +0100
Subject: x86/uv: Use the right TLB-flush API
From: Peter Zijlstra <peterz(a)infradead.org>
commit 3e46e0f5ee3643a1239be9046c7ba6c66ca2b329 upstream.
Since uv_flush_tlb_others() implements flush_tlb_others() which is
about flushing user mappings, we should use __flush_tlb_single(),
which too is about flushing user mappings.
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Andrew Banman <abanman(a)hpe.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mike Travis <mike.travis(a)hpe.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/platform/uv/tlb_uv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/platform/uv/tlb_uv.c
+++ b/arch/x86/platform/uv/tlb_uv.c
@@ -299,7 +299,7 @@ static void bau_process_message(struct m
local_flush_tlb();
stat->d_alltlb++;
} else {
- __flush_tlb_one(msg->address);
+ __flush_tlb_single(msg->address);
stat->d_onetlb++;
}
stat->d_requestee++;
Patches currently in stable-queue which might be from peterz(a)infradead.org are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Use __flush_tlb_one() for kernel memory
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a501686b2923ce6f2ff2b1d0d50682c6411baf72 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Tue, 5 Dec 2017 13:34:49 +0100
Subject: x86/mm: Use __flush_tlb_one() for kernel memory
From: Peter Zijlstra <peterz(a)infradead.org>
commit a501686b2923ce6f2ff2b1d0d50682c6411baf72 upstream.
__flush_tlb_single() is for user mappings, __flush_tlb_one() for
kernel mappings.
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/tlb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -551,7 +551,7 @@ static void do_kernel_range_flush(void *
/* flush range by one by one 'invlpg' */
for (addr = f->start; addr < f->end; addr += PAGE_SIZE)
- __flush_tlb_single(addr);
+ __flush_tlb_one(addr);
}
void flush_tlb_kernel_range(unsigned long start, unsigned long end)
Patches currently in stable-queue which might be from peterz(a)infradead.org are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Remove hard-coded ASID limit checks
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-remove-hard-coded-asid-limit-checks.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From cb0a9144a744e55207e24dcef812f05cd15a499a Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Mon, 4 Dec 2017 15:07:55 +0100
Subject: x86/mm: Remove hard-coded ASID limit checks
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit cb0a9144a744e55207e24dcef812f05cd15a499a upstream.
First, it's nice to remove the magic numbers.
Second, PAGE_TABLE_ISOLATION is going to consume half of the available ASID
space. The space is currently unused, but add a comment to spell out this
new restriction.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -69,6 +69,22 @@ static inline u64 inc_mm_tlb_gen(struct
return atomic64_inc_return(&mm->context.tlb_gen);
}
+/* There are 12 bits of space for ASIDS in CR3 */
+#define CR3_HW_ASID_BITS 12
+/*
+ * When enabled, PAGE_TABLE_ISOLATION consumes a single bit for
+ * user/kernel switches
+ */
+#define PTI_CONSUMED_ASID_BITS 0
+
+#define CR3_AVAIL_ASID_BITS (CR3_HW_ASID_BITS - PTI_CONSUMED_ASID_BITS)
+/*
+ * ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. -1 below to account
+ * for them being zero-based. Another -1 is because ASID 0 is reserved for
+ * use by non-PCID-aware users.
+ */
+#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_ASID_BITS) - 2)
+
/*
* If PCID is on, ASID-aware code paths put the ASID+1 into the PCID bits.
* This serves two purposes. It prevents a nasty situation in which
@@ -81,7 +97,7 @@ struct pgd_t;
static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
{
if (static_cpu_has(X86_FEATURE_PCID)) {
- VM_WARN_ON_ONCE(asid > 4094);
+ VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
return __sme_pa(pgd) | (asid + 1);
} else {
VM_WARN_ON_ONCE(asid != 0);
@@ -91,7 +107,7 @@ static inline unsigned long build_cr3(pg
static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
{
- VM_WARN_ON_ONCE(asid > 4094);
+ VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
return __sme_pa(pgd) | (asid + 1) | CR3_NOFLUSH;
}
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Remove superfluous barriers
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-remove-superfluous-barriers.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b5fc6d943808b570bdfbec80f40c6b3855f1c48b Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Tue, 5 Dec 2017 13:34:46 +0100
Subject: x86/mm: Remove superfluous barriers
From: Peter Zijlstra <peterz(a)infradead.org>
commit b5fc6d943808b570bdfbec80f40c6b3855f1c48b upstream.
atomic64_inc_return() already implies smp_mb() before and after.
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -60,19 +60,13 @@ static inline void invpcid_flush_all_non
static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
{
- u64 new_tlb_gen;
-
/*
* Bump the generation count. This also serves as a full barrier
* that synchronizes with switch_mm(): callers are required to order
* their read of mm_cpumask after their writes to the paging
* structures.
*/
- smp_mb__before_atomic();
- new_tlb_gen = atomic64_inc_return(&mm->context.tlb_gen);
- smp_mb__after_atomic();
-
- return new_tlb_gen;
+ return atomic64_inc_return(&mm->context.tlb_gen);
}
#ifdef CONFIG_PARAVIRT
Patches currently in stable-queue which might be from peterz(a)infradead.org are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Put MMU to hardware ASID translation in one place
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From dd95f1a4b5ca904c78e6a097091eb21436478abb Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Mon, 4 Dec 2017 15:07:56 +0100
Subject: x86/mm: Put MMU to hardware ASID translation in one place
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit dd95f1a4b5ca904c78e6a097091eb21436478abb upstream.
There are effectively two ASID types:
1. The one stored in the mmu_context that goes from 0..5
2. The one programmed into the hardware that goes from 1..6
This consolidates the locations where converting between the two (by doing
a +1) to a single place which gives us a nice place to comment.
PAGE_TABLE_ISOLATION will also need to, given an ASID, know which hardware
ASID to flush for the userspace mapping.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -85,20 +85,26 @@ static inline u64 inc_mm_tlb_gen(struct
*/
#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_ASID_BITS) - 2)
-/*
- * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID bits.
- * This serves two purposes. It prevents a nasty situation in which
- * PCID-unaware code saves CR3, loads some other value (with PCID == 0),
- * and then restores CR3, thus corrupting the TLB for ASID 0 if the saved
- * ASID was nonzero. It also means that any bugs involving loading a
- * PCID-enabled CR3 with CR4.PCIDE off will trigger deterministically.
- */
+static inline u16 kern_pcid(u16 asid)
+{
+ VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
+ /*
+ * If PCID is on, ASID-aware code paths put the ASID+1 into the
+ * PCID bits. This serves two purposes. It prevents a nasty
+ * situation in which PCID-unaware code saves CR3, loads some other
+ * value (with PCID == 0), and then restores CR3, thus corrupting
+ * the TLB for ASID 0 if the saved ASID was nonzero. It also means
+ * that any bugs involving loading a PCID-enabled CR3 with
+ * CR4.PCIDE off will trigger deterministically.
+ */
+ return asid + 1;
+}
+
struct pgd_t;
static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
{
if (static_cpu_has(X86_FEATURE_PCID)) {
- VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
- return __sme_pa(pgd) | (asid + 1);
+ return __sme_pa(pgd) | kern_pcid(asid);
} else {
VM_WARN_ON_ONCE(asid != 0);
return __sme_pa(pgd);
@@ -108,7 +114,8 @@ static inline unsigned long build_cr3(pg
static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
{
VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
- return __sme_pa(pgd) | (asid + 1) | CR3_NOFLUSH;
+ VM_WARN_ON_ONCE(!this_cpu_has(X86_FEATURE_PCID));
+ return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
}
#ifdef CONFIG_PARAVIRT
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Move the CR3 construction functions to tlbflush.h
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 50fb83a62cf472dc53ba23bd3f7bd6c1b2b3b53e Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Mon, 4 Dec 2017 15:07:54 +0100
Subject: x86/mm: Move the CR3 construction functions to tlbflush.h
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit 50fb83a62cf472dc53ba23bd3f7bd6c1b2b3b53e upstream.
For flushing the TLB, the ASID which has been programmed into the hardware
must be known. That differs from what is in 'cpu_tlbstate'.
Add functions to transform the 'cpu_tlbstate' values into to the one
programmed into the hardware (CR3).
It's not easy to include mmu_context.h into tlbflush.h, so just move the
CR3 building over to tlbflush.h.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/mmu_context.h | 29 +----------------------------
arch/x86/include/asm/tlbflush.h | 26 ++++++++++++++++++++++++++
arch/x86/mm/tlb.c | 8 ++++----
3 files changed, 31 insertions(+), 32 deletions(-)
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -291,33 +291,6 @@ static inline bool arch_vma_access_permi
}
/*
- * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID
- * bits. This serves two purposes. It prevents a nasty situation in
- * which PCID-unaware code saves CR3, loads some other value (with PCID
- * == 0), and then restores CR3, thus corrupting the TLB for ASID 0 if
- * the saved ASID was nonzero. It also means that any bugs involving
- * loading a PCID-enabled CR3 with CR4.PCIDE off will trigger
- * deterministically.
- */
-
-static inline unsigned long build_cr3(struct mm_struct *mm, u16 asid)
-{
- if (static_cpu_has(X86_FEATURE_PCID)) {
- VM_WARN_ON_ONCE(asid > 4094);
- return __sme_pa(mm->pgd) | (asid + 1);
- } else {
- VM_WARN_ON_ONCE(asid != 0);
- return __sme_pa(mm->pgd);
- }
-}
-
-static inline unsigned long build_cr3_noflush(struct mm_struct *mm, u16 asid)
-{
- VM_WARN_ON_ONCE(asid > 4094);
- return __sme_pa(mm->pgd) | (asid + 1) | CR3_NOFLUSH;
-}
-
-/*
* This can be used from process context to figure out what the value of
* CR3 is without needing to do a (slow) __read_cr3().
*
@@ -326,7 +299,7 @@ static inline unsigned long build_cr3_no
*/
static inline unsigned long __get_current_cr3_fast(void)
{
- unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm),
+ unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
this_cpu_read(cpu_tlbstate.loaded_mm_asid));
/* For now, be very restrictive about when this can be called. */
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -69,6 +69,32 @@ static inline u64 inc_mm_tlb_gen(struct
return atomic64_inc_return(&mm->context.tlb_gen);
}
+/*
+ * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID bits.
+ * This serves two purposes. It prevents a nasty situation in which
+ * PCID-unaware code saves CR3, loads some other value (with PCID == 0),
+ * and then restores CR3, thus corrupting the TLB for ASID 0 if the saved
+ * ASID was nonzero. It also means that any bugs involving loading a
+ * PCID-enabled CR3 with CR4.PCIDE off will trigger deterministically.
+ */
+struct pgd_t;
+static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
+{
+ if (static_cpu_has(X86_FEATURE_PCID)) {
+ VM_WARN_ON_ONCE(asid > 4094);
+ return __sme_pa(pgd) | (asid + 1);
+ } else {
+ VM_WARN_ON_ONCE(asid != 0);
+ return __sme_pa(pgd);
+ }
+}
+
+static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
+{
+ VM_WARN_ON_ONCE(asid > 4094);
+ return __sme_pa(pgd) | (asid + 1) | CR3_NOFLUSH;
+}
+
#ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h>
#else
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -128,7 +128,7 @@ void switch_mm_irqs_off(struct mm_struct
* isn't free.
*/
#ifdef CONFIG_DEBUG_VM
- if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev, prev_asid))) {
+ if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
/*
* If we were to BUG here, we'd be very likely to kill
* the system so hard that we don't see the call trace.
@@ -195,7 +195,7 @@ void switch_mm_irqs_off(struct mm_struct
if (need_flush) {
this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
- write_cr3(build_cr3(next, new_asid));
+ write_cr3(build_cr3(next->pgd, new_asid));
/*
* NB: This gets called via leave_mm() in the idle path
@@ -208,7 +208,7 @@ void switch_mm_irqs_off(struct mm_struct
trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
} else {
/* The new ASID is already up to date. */
- write_cr3(build_cr3_noflush(next, new_asid));
+ write_cr3(build_cr3_noflush(next->pgd, new_asid));
/* See above wrt _rcuidle. */
trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
@@ -288,7 +288,7 @@ void initialize_tlbstate_and_flush(void)
!(cr4_read_shadow() & X86_CR4_PCIDE));
/* Force ASID 0 and force a TLB flush. */
- write_cr3(build_cr3(mm, 0));
+ write_cr3(build_cr3(mm->pgd, 0));
/* Reinitialize tlbstate. */
this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Create asm/invpcid.h
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-create-asm-invpcid.h.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1a3b0caeb77edeac5ce5fa05e6a61c474c9a9745 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Tue, 5 Dec 2017 13:34:47 +0100
Subject: x86/mm: Create asm/invpcid.h
From: Peter Zijlstra <peterz(a)infradead.org>
commit 1a3b0caeb77edeac5ce5fa05e6a61c474c9a9745 upstream.
Unclutter tlbflush.h a little.
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/invpcid.h | 53 ++++++++++++++++++++++++++++++++++++++++
arch/x86/include/asm/tlbflush.h | 49 ------------------------------------
2 files changed, 54 insertions(+), 48 deletions(-)
--- /dev/null
+++ b/arch/x86/include/asm/invpcid.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_INVPCID
+#define _ASM_X86_INVPCID
+
+static inline void __invpcid(unsigned long pcid, unsigned long addr,
+ unsigned long type)
+{
+ struct { u64 d[2]; } desc = { { pcid, addr } };
+
+ /*
+ * The memory clobber is because the whole point is to invalidate
+ * stale TLB entries and, especially if we're flushing global
+ * mappings, we don't want the compiler to reorder any subsequent
+ * memory accesses before the TLB flush.
+ *
+ * The hex opcode is invpcid (%ecx), %eax in 32-bit mode and
+ * invpcid (%rcx), %rax in long mode.
+ */
+ asm volatile (".byte 0x66, 0x0f, 0x38, 0x82, 0x01"
+ : : "m" (desc), "a" (type), "c" (&desc) : "memory");
+}
+
+#define INVPCID_TYPE_INDIV_ADDR 0
+#define INVPCID_TYPE_SINGLE_CTXT 1
+#define INVPCID_TYPE_ALL_INCL_GLOBAL 2
+#define INVPCID_TYPE_ALL_NON_GLOBAL 3
+
+/* Flush all mappings for a given pcid and addr, not including globals. */
+static inline void invpcid_flush_one(unsigned long pcid,
+ unsigned long addr)
+{
+ __invpcid(pcid, addr, INVPCID_TYPE_INDIV_ADDR);
+}
+
+/* Flush all mappings for a given PCID, not including globals. */
+static inline void invpcid_flush_single_context(unsigned long pcid)
+{
+ __invpcid(pcid, 0, INVPCID_TYPE_SINGLE_CTXT);
+}
+
+/* Flush all mappings, including globals, for all PCIDs. */
+static inline void invpcid_flush_all(void)
+{
+ __invpcid(0, 0, INVPCID_TYPE_ALL_INCL_GLOBAL);
+}
+
+/* Flush all mappings for all PCIDs except globals. */
+static inline void invpcid_flush_all_nonglobals(void)
+{
+ __invpcid(0, 0, INVPCID_TYPE_ALL_NON_GLOBAL);
+}
+
+#endif /* _ASM_X86_INVPCID */
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -9,54 +9,7 @@
#include <asm/cpufeature.h>
#include <asm/special_insns.h>
#include <asm/smp.h>
-
-static inline void __invpcid(unsigned long pcid, unsigned long addr,
- unsigned long type)
-{
- struct { u64 d[2]; } desc = { { pcid, addr } };
-
- /*
- * The memory clobber is because the whole point is to invalidate
- * stale TLB entries and, especially if we're flushing global
- * mappings, we don't want the compiler to reorder any subsequent
- * memory accesses before the TLB flush.
- *
- * The hex opcode is invpcid (%ecx), %eax in 32-bit mode and
- * invpcid (%rcx), %rax in long mode.
- */
- asm volatile (".byte 0x66, 0x0f, 0x38, 0x82, 0x01"
- : : "m" (desc), "a" (type), "c" (&desc) : "memory");
-}
-
-#define INVPCID_TYPE_INDIV_ADDR 0
-#define INVPCID_TYPE_SINGLE_CTXT 1
-#define INVPCID_TYPE_ALL_INCL_GLOBAL 2
-#define INVPCID_TYPE_ALL_NON_GLOBAL 3
-
-/* Flush all mappings for a given pcid and addr, not including globals. */
-static inline void invpcid_flush_one(unsigned long pcid,
- unsigned long addr)
-{
- __invpcid(pcid, addr, INVPCID_TYPE_INDIV_ADDR);
-}
-
-/* Flush all mappings for a given PCID, not including globals. */
-static inline void invpcid_flush_single_context(unsigned long pcid)
-{
- __invpcid(pcid, 0, INVPCID_TYPE_SINGLE_CTXT);
-}
-
-/* Flush all mappings, including globals, for all PCIDs. */
-static inline void invpcid_flush_all(void)
-{
- __invpcid(0, 0, INVPCID_TYPE_ALL_INCL_GLOBAL);
-}
-
-/* Flush all mappings for all PCIDs except globals. */
-static inline void invpcid_flush_all_nonglobals(void)
-{
- __invpcid(0, 0, INVPCID_TYPE_ALL_NON_GLOBAL);
-}
+#include <asm/invpcid.h>
static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
{
Patches currently in stable-queue which might be from peterz(a)infradead.org are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
x86/mm/dump_pagetables: Check PAGE_PRESENT for real
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-dump_pagetables-check-page_present-for-real.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From c05344947b37f7cda726e802457370bc6eac4d26 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Sat, 16 Dec 2017 01:14:39 +0100
Subject: x86/mm/dump_pagetables: Check PAGE_PRESENT for real
From: Thomas Gleixner <tglx(a)linutronix.de>
commit c05344947b37f7cda726e802457370bc6eac4d26 upstream.
The check for a present page in printk_prot():
if (!pgprot_val(prot)) {
/* Not present */
is bogus. If a PTE is set to PAGE_NONE then the pgprot_val is not zero and
the entry is decoded in bogus ways, e.g. as RX GLB. That is confusing when
analyzing mapping correctness. Check for the present bit to make an
informed decision.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/dump_pagetables.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -140,7 +140,7 @@ static void printk_prot(struct seq_file
static const char * const level_name[] =
{ "cr3", "pgd", "p4d", "pud", "pmd", "pte" };
- if (!pgprot_val(prot)) {
+ if (!(pr & _PAGE_PRESENT)) {
/* Not present */
pt_dump_cont_printf(m, dmsg, " ");
} else {
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-cpu_entry_area-prevent-wraparound-in-setup_cpu_entry_area_ptes-on-32bit.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/pci-pm-force-devices-to-d0-in-pci_pm_thaw_noirq.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 3f67af51e56f291d7417d77c4f67cd774633c5e1 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Tue, 5 Dec 2017 13:34:52 +0100
Subject: x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what
From: Peter Zijlstra <peterz(a)infradead.org>
commit 3f67af51e56f291d7417d77c4f67cd774633c5e1 upstream.
Per popular request..
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -228,6 +228,9 @@ static inline void cr4_set_bits_and_upda
extern void initialize_tlbstate_and_flush(void);
+/*
+ * flush the entire current user mapping
+ */
static inline void __native_flush_tlb(void)
{
/*
@@ -240,6 +243,9 @@ static inline void __native_flush_tlb(vo
preempt_enable();
}
+/*
+ * flush everything
+ */
static inline void __native_flush_tlb_global(void)
{
unsigned long cr4, flags;
@@ -269,17 +275,27 @@ static inline void __native_flush_tlb_gl
raw_local_irq_restore(flags);
}
+/*
+ * flush one page in the user mapping
+ */
static inline void __native_flush_tlb_single(unsigned long addr)
{
asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
}
+/*
+ * flush everything
+ */
static inline void __flush_tlb_all(void)
{
- if (boot_cpu_has(X86_FEATURE_PGE))
+ if (boot_cpu_has(X86_FEATURE_PGE)) {
__flush_tlb_global();
- else
+ } else {
+ /*
+ * !PGE -> !PCID (setup_pcid()), thus every flush is total.
+ */
__flush_tlb();
+ }
/*
* Note: if we somehow had PCID but not PGE, then this wouldn't work --
@@ -290,6 +306,9 @@ static inline void __flush_tlb_all(void)
*/
}
+/*
+ * flush one page in the kernel mapping
+ */
static inline void __flush_tlb_one(unsigned long addr)
{
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ONE);
Patches currently in stable-queue which might be from peterz(a)infradead.org are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
x86/microcode: Dont abuse the TLB-flush interface
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-microcode-dont-abuse-the-tlb-flush-interface.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 23cb7d46f371844c004784ad9552a57446f73e5a Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Tue, 5 Dec 2017 13:34:51 +0100
Subject: x86/microcode: Dont abuse the TLB-flush interface
From: Peter Zijlstra <peterz(a)infradead.org>
commit 23cb7d46f371844c004784ad9552a57446f73e5a upstream.
Commit:
ec400ddeff20 ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU")
... grubbed into tlbflush internals without coherent explanation.
Since it says its a precaution and the SDM doesn't mention anything like
this, take it out back.
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: fenghua.yu(a)intel.com
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 19 ++++++-------------
arch/x86/kernel/cpu/microcode/intel.c | 13 -------------
2 files changed, 6 insertions(+), 26 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -246,20 +246,9 @@ static inline void __native_flush_tlb(vo
preempt_enable();
}
-static inline void __native_flush_tlb_global_irq_disabled(void)
-{
- unsigned long cr4;
-
- cr4 = this_cpu_read(cpu_tlbstate.cr4);
- /* clear PGE */
- native_write_cr4(cr4 & ~X86_CR4_PGE);
- /* write old PGE again and flush TLBs */
- native_write_cr4(cr4);
-}
-
static inline void __native_flush_tlb_global(void)
{
- unsigned long flags;
+ unsigned long cr4, flags;
if (static_cpu_has(X86_FEATURE_INVPCID)) {
/*
@@ -277,7 +266,11 @@ static inline void __native_flush_tlb_gl
*/
raw_local_irq_save(flags);
- __native_flush_tlb_global_irq_disabled();
+ cr4 = this_cpu_read(cpu_tlbstate.cr4);
+ /* toggle PGE */
+ native_write_cr4(cr4 ^ X86_CR4_PGE);
+ /* write old PGE again and flush TLBs */
+ native_write_cr4(cr4);
raw_local_irq_restore(flags);
}
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -565,15 +565,6 @@ static void print_ucode(struct ucode_cpu
}
#else
-/*
- * Flush global tlb. We only do this in x86_64 where paging has been enabled
- * already and PGE should be enabled as well.
- */
-static inline void flush_tlb_early(void)
-{
- __native_flush_tlb_global_irq_disabled();
-}
-
static inline void print_ucode(struct ucode_cpu_info *uci)
{
struct microcode_intel *mc;
@@ -602,10 +593,6 @@ static int apply_microcode_early(struct
if (rev != mc->hdr.rev)
return -1;
-#ifdef CONFIG_X86_64
- /* Flush global tlb. This is precaution. */
- flush_tlb_early();
-#endif
uci->cpu_sig.rev = rev;
if (early)
Patches currently in stable-queue which might be from peterz(a)infradead.org are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
x86/mm/64: Improve the memory map documentation
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-64-improve-the-memory-map-documentation.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 5a7ccf4754fb3660569a6de52ba7f7fc3dfaf280 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Tue, 12 Dec 2017 07:56:43 -0800
Subject: x86/mm/64: Improve the memory map documentation
From: Andy Lutomirski <luto(a)kernel.org>
commit 5a7ccf4754fb3660569a6de52ba7f7fc3dfaf280 upstream.
The old docs had the vsyscall range wrong and were missing the fixmap.
Fix both.
There used to be 8 MB reserved for future vsyscalls, but that's long gone.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/x86/x86_64/mm.txt | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -19,8 +19,9 @@ ffffff0000000000 - ffffff7fffffffff (=39
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - ffffffffff5fffff (=1526 MB) module mapping space (variable)
-ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
+ffffffffa0000000 - [fixmap start] (~1526 MB) module mapping space (variable)
+[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
+ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
Virtual memory map with 5 level page tables:
@@ -41,8 +42,9 @@ ffffff0000000000 - ffffff7fffffffff (=39
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - ffffffffff5fffff (=1526 MB) module mapping space
-ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
+ffffffffa0000000 - [fixmap start] (~1526 MB) module mapping space
+[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
+ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
Architecture defines a 64-bit virtual address. Implementations can support
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
This is a note to let you know that I've just added the patch titled
x86/ldt: Rework locking
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-ldt-rework-locking.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From c2b3496bb30bd159e9de42e5c952e1f1f33c9a77 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Thu, 14 Dec 2017 12:27:30 +0100
Subject: x86/ldt: Rework locking
From: Peter Zijlstra <peterz(a)infradead.org>
commit c2b3496bb30bd159e9de42e5c952e1f1f33c9a77 upstream.
The LDT is duplicated on fork() and on exec(), which is wrong as exec()
should start from a clean state, i.e. without LDT. To fix this the LDT
duplication code will be moved into arch_dup_mmap() which is only called
for fork().
This introduces a locking problem. arch_dup_mmap() holds mmap_sem of the
parent process, but the LDT duplication code needs to acquire
mm->context.lock to access the LDT data safely, which is the reverse lock
order of write_ldt() where mmap_sem nests into context.lock.
Solve this by introducing a new rw semaphore which serializes the
read/write_ldt() syscall operations and use context.lock to protect the
actual installment of the LDT descriptor.
So context.lock stabilizes mm->context.ldt and can nest inside of the new
semaphore or mmap_sem.
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Andy Lutomirsky <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Borislav Petkov <bpetkov(a)suse.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: dan.j.williams(a)intel.com
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: kirill.shutemov(a)linux.intel.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/mmu.h | 4 +++-
arch/x86/include/asm/mmu_context.h | 2 ++
arch/x86/kernel/ldt.c | 33 +++++++++++++++++++++------------
3 files changed, 26 insertions(+), 13 deletions(-)
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -3,6 +3,7 @@
#define _ASM_X86_MMU_H
#include <linux/spinlock.h>
+#include <linux/rwsem.h>
#include <linux/mutex.h>
#include <linux/atomic.h>
@@ -27,7 +28,8 @@ typedef struct {
atomic64_t tlb_gen;
#ifdef CONFIG_MODIFY_LDT_SYSCALL
- struct ldt_struct *ldt;
+ struct rw_semaphore ldt_usr_sem;
+ struct ldt_struct *ldt;
#endif
#ifdef CONFIG_X86_64
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -132,6 +132,8 @@ void enter_lazy_tlb(struct mm_struct *mm
static inline int init_new_context(struct task_struct *tsk,
struct mm_struct *mm)
{
+ mutex_init(&mm->context.lock);
+
mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
atomic64_set(&mm->context.tlb_gen, 0);
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -5,6 +5,11 @@
* Copyright (C) 2002 Andi Kleen
*
* This handles calls from both 32bit and 64bit mode.
+ *
+ * Lock order:
+ * contex.ldt_usr_sem
+ * mmap_sem
+ * context.lock
*/
#include <linux/errno.h>
@@ -42,7 +47,7 @@ static void refresh_ldt_segments(void)
#endif
}
-/* context.lock is held for us, so we don't need any locking. */
+/* context.lock is held by the task which issued the smp function call */
static void flush_ldt(void *__mm)
{
struct mm_struct *mm = __mm;
@@ -99,15 +104,17 @@ static void finalize_ldt_struct(struct l
paravirt_alloc_ldt(ldt->entries, ldt->nr_entries);
}
-/* context.lock is held */
-static void install_ldt(struct mm_struct *current_mm,
- struct ldt_struct *ldt)
+static void install_ldt(struct mm_struct *mm, struct ldt_struct *ldt)
{
+ mutex_lock(&mm->context.lock);
+
/* Synchronizes with READ_ONCE in load_mm_ldt. */
- smp_store_release(¤t_mm->context.ldt, ldt);
+ smp_store_release(&mm->context.ldt, ldt);
- /* Activate the LDT for all CPUs using current_mm. */
- on_each_cpu_mask(mm_cpumask(current_mm), flush_ldt, current_mm, true);
+ /* Activate the LDT for all CPUs using currents mm. */
+ on_each_cpu_mask(mm_cpumask(mm), flush_ldt, mm, true);
+
+ mutex_unlock(&mm->context.lock);
}
static void free_ldt_struct(struct ldt_struct *ldt)
@@ -133,7 +140,8 @@ int init_new_context_ldt(struct task_str
struct mm_struct *old_mm;
int retval = 0;
- mutex_init(&mm->context.lock);
+ init_rwsem(&mm->context.ldt_usr_sem);
+
old_mm = current->mm;
if (!old_mm) {
mm->context.ldt = NULL;
@@ -180,7 +188,7 @@ static int read_ldt(void __user *ptr, un
unsigned long entries_size;
int retval;
- mutex_lock(&mm->context.lock);
+ down_read(&mm->context.ldt_usr_sem);
if (!mm->context.ldt) {
retval = 0;
@@ -209,7 +217,7 @@ static int read_ldt(void __user *ptr, un
retval = bytecount;
out_unlock:
- mutex_unlock(&mm->context.lock);
+ up_read(&mm->context.ldt_usr_sem);
return retval;
}
@@ -269,7 +277,8 @@ static int write_ldt(void __user *ptr, u
ldt.avl = 0;
}
- mutex_lock(&mm->context.lock);
+ if (down_write_killable(&mm->context.ldt_usr_sem))
+ return -EINTR;
old_ldt = mm->context.ldt;
old_nr_entries = old_ldt ? old_ldt->nr_entries : 0;
@@ -291,7 +300,7 @@ static int write_ldt(void __user *ptr, u
error = 0;
out_unlock:
- mutex_unlock(&mm->context.lock);
+ up_write(&mm->context.ldt_usr_sem);
out:
return error;
}
Patches currently in stable-queue which might be from peterz(a)infradead.org are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
x86/ldt: Prevent LDT inheritance on exec
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-ldt-prevent-ldt-inheritance-on-exec.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a4828f81037f491b2cc986595e3a969a6eeb2fb5 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Thu, 14 Dec 2017 12:27:31 +0100
Subject: x86/ldt: Prevent LDT inheritance on exec
From: Thomas Gleixner <tglx(a)linutronix.de>
commit a4828f81037f491b2cc986595e3a969a6eeb2fb5 upstream.
The LDT is inherited across fork() or exec(), but that makes no sense
at all because exec() is supposed to start the process clean.
The reason why this happens is that init_new_context_ldt() is called from
init_new_context() which obviously needs to be called for both fork() and
exec().
It would be surprising if anything relies on that behaviour, so it seems to
be safe to remove that misfeature.
Split the context initialization into two parts. Clear the LDT pointer and
initialize the mutex from the general context init and move the LDT
duplication to arch_dup_mmap() which is only called on fork().
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Peter Zijlstra <peterz(a)infradead.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Andy Lutomirsky <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Borislav Petkov <bpetkov(a)suse.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: dan.j.williams(a)intel.com
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: kirill.shutemov(a)linux.intel.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/mmu_context.h | 21 ++++++++++++++-------
arch/x86/kernel/ldt.c | 18 +++++-------------
tools/testing/selftests/x86/ldt_gdt.c | 9 +++------
3 files changed, 22 insertions(+), 26 deletions(-)
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -57,11 +57,17 @@ struct ldt_struct {
/*
* Used for LDT copy/destruction.
*/
-int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm);
+static inline void init_new_context_ldt(struct mm_struct *mm)
+{
+ mm->context.ldt = NULL;
+ init_rwsem(&mm->context.ldt_usr_sem);
+}
+int ldt_dup_context(struct mm_struct *oldmm, struct mm_struct *mm);
void destroy_context_ldt(struct mm_struct *mm);
#else /* CONFIG_MODIFY_LDT_SYSCALL */
-static inline int init_new_context_ldt(struct task_struct *tsk,
- struct mm_struct *mm)
+static inline void init_new_context_ldt(struct mm_struct *mm) { }
+static inline int ldt_dup_context(struct mm_struct *oldmm,
+ struct mm_struct *mm)
{
return 0;
}
@@ -137,15 +143,16 @@ static inline int init_new_context(struc
mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
atomic64_set(&mm->context.tlb_gen, 0);
- #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
/* pkey 0 is the default and always allocated */
mm->context.pkey_allocation_map = 0x1;
/* -1 means unallocated or invalid */
mm->context.execute_only_pkey = -1;
}
- #endif
- return init_new_context_ldt(tsk, mm);
+#endif
+ init_new_context_ldt(mm);
+ return 0;
}
static inline void destroy_context(struct mm_struct *mm)
{
@@ -181,7 +188,7 @@ do { \
static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
{
paravirt_arch_dup_mmap(oldmm, mm);
- return 0;
+ return ldt_dup_context(oldmm, mm);
}
static inline void arch_exit_mmap(struct mm_struct *mm)
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -131,28 +131,20 @@ static void free_ldt_struct(struct ldt_s
}
/*
- * we do not have to muck with descriptors here, that is
- * done in switch_mm() as needed.
+ * Called on fork from arch_dup_mmap(). Just copy the current LDT state,
+ * the new task is not running, so nothing can be installed.
*/
-int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm)
+int ldt_dup_context(struct mm_struct *old_mm, struct mm_struct *mm)
{
struct ldt_struct *new_ldt;
- struct mm_struct *old_mm;
int retval = 0;
- init_rwsem(&mm->context.ldt_usr_sem);
-
- old_mm = current->mm;
- if (!old_mm) {
- mm->context.ldt = NULL;
+ if (!old_mm)
return 0;
- }
mutex_lock(&old_mm->context.lock);
- if (!old_mm->context.ldt) {
- mm->context.ldt = NULL;
+ if (!old_mm->context.ldt)
goto out_unlock;
- }
new_ldt = alloc_ldt_struct(old_mm->context.ldt->nr_entries);
if (!new_ldt) {
--- a/tools/testing/selftests/x86/ldt_gdt.c
+++ b/tools/testing/selftests/x86/ldt_gdt.c
@@ -627,13 +627,10 @@ static void do_multicpu_tests(void)
static int finish_exec_test(void)
{
/*
- * In a sensible world, this would be check_invalid_segment(0, 1);
- * For better or for worse, though, the LDT is inherited across exec.
- * We can probably change this safely, but for now we test it.
+ * Older kernel versions did inherit the LDT on exec() which is
+ * wrong because exec() starts from a clean state.
*/
- check_valid_segment(0, 1,
- AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB,
- 42, true);
+ check_invalid_segment(0, 1);
return nerrs ? 1 : 0;
}
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-cpu_entry_area-prevent-wraparound-in-setup_cpu_entry_area_ptes-on-32bit.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/pci-pm-force-devices-to-d0-in-pci_pm_thaw_noirq.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
x86/Kconfig: Limit NR_CPUS on 32-bit to a sane amount
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 7bbcbd3d1cdcbacd0f9f8dc4c98d550972f1ca30 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Wed, 20 Dec 2017 18:02:34 +0100
Subject: x86/Kconfig: Limit NR_CPUS on 32-bit to a sane amount
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 7bbcbd3d1cdcbacd0f9f8dc4c98d550972f1ca30 upstream.
The recent cpu_entry_area changes fail to compile on 32-bit when BIGSMP=y
and NR_CPUS=512, because the fixmap area becomes too big.
Limit the number of CPUs with BIGSMP to 64, which is already way to big for
32-bit, but it's at least a working limitation.
We performed a quick survey of 32-bit-only machines that might be affected
by this change negatively, but found none.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -925,7 +925,8 @@ config MAXSMP
config NR_CPUS
int "Maximum number of CPUs" if SMP && !MAXSMP
range 2 8 if SMP && X86_32 && !X86_BIGSMP
- range 2 512 if SMP && !MAXSMP && !CPUMASK_OFFSTACK
+ range 2 64 if SMP && X86_32 && X86_BIGSMP
+ range 2 512 if SMP && !MAXSMP && !CPUMASK_OFFSTACK && X86_64
range 2 8192 if SMP && !MAXSMP && CPUMASK_OFFSTACK && X86_64
default "1" if !SMP
default "8192" if MAXSMP
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-cpu_entry_area-prevent-wraparound-in-setup_cpu_entry_area_ptes-on-32bit.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/pci-pm-force-devices-to-d0-in-pci_pm_thaw_noirq.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e8ffe96e5933d417195268478479933d56213a3f Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Tue, 5 Dec 2017 13:34:54 +0100
Subject: x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation
From: Peter Zijlstra <peterz(a)infradead.org>
commit e8ffe96e5933d417195268478479933d56213a3f upstream.
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/x86/x86_64/mm.txt | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -1,6 +1,4 @@
-<previous description obsolete, deleted>
-
Virtual memory map with 4 level page tables:
0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
@@ -49,8 +47,9 @@ ffffffffffe00000 - ffffffffffffffff (=2
Architecture defines a 64-bit virtual address. Implementations can support
less. Currently supported are 48- and 57-bit virtual addresses. Bits 63
-through to the most-significant implemented bit are set to either all ones
-or all zero. This causes hole between user space and kernel addresses.
+through to the most-significant implemented bit are sign extended.
+This causes hole between user space and kernel addresses if you interpret them
+as unsigned.
The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
@@ -60,9 +59,6 @@ vmalloc space is lazily synchronized int
the processes using the page fault handler, with init_top_pgt as
reference.
-Current X86-64 implementations support up to 46 bits of address space (64 TB),
-which is our current limit. This expands into MBZ space in the page tables.
-
We map EFI runtime services in the 'efi_pgd' PGD in a 64Gb large virtual
memory window (this size is arbitrary, it can be raised later if needed).
The mappings are not part of any other kernel PGD and are only available
@@ -74,5 +70,3 @@ following fixmap section.
Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
physical memory, vmalloc/ioremap space and virtual memory map are randomized.
Their order is preserved but their base will be offset early at boot time.
-
--Andi Kleen, Jul 2004
Patches currently in stable-queue which might be from peterz(a)infradead.org are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
x86/cpu_entry_area: Move it to a separate unit
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-cpu_entry_area-move-it-to-a-separate-unit.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ed1bbc40a0d10e0c5c74fe7bdc6298295cf40255 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Wed, 20 Dec 2017 18:28:54 +0100
Subject: x86/cpu_entry_area: Move it to a separate unit
From: Thomas Gleixner <tglx(a)linutronix.de>
commit ed1bbc40a0d10e0c5c74fe7bdc6298295cf40255 upstream.
Separate the cpu_entry_area code out of cpu/common.c and the fixmap.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/cpu_entry_area.h | 52 +++++++++++++++++
arch/x86/include/asm/fixmap.h | 41 -------------
arch/x86/kernel/cpu/common.c | 94 ------------------------------
arch/x86/kernel/traps.c | 1
arch/x86/mm/Makefile | 2
arch/x86/mm/cpu_entry_area.c | 104 ++++++++++++++++++++++++++++++++++
6 files changed, 159 insertions(+), 135 deletions(-)
--- /dev/null
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#ifndef _ASM_X86_CPU_ENTRY_AREA_H
+#define _ASM_X86_CPU_ENTRY_AREA_H
+
+#include <linux/percpu-defs.h>
+#include <asm/processor.h>
+
+/*
+ * cpu_entry_area is a percpu region that contains things needed by the CPU
+ * and early entry/exit code. Real types aren't used for all fields here
+ * to avoid circular header dependencies.
+ *
+ * Every field is a virtual alias of some other allocated backing store.
+ * There is no direct allocation of a struct cpu_entry_area.
+ */
+struct cpu_entry_area {
+ char gdt[PAGE_SIZE];
+
+ /*
+ * The GDT is just below entry_stack and thus serves (on x86_64) as
+ * a a read-only guard page.
+ */
+ struct entry_stack_page entry_stack_page;
+
+ /*
+ * On x86_64, the TSS is mapped RO. On x86_32, it's mapped RW because
+ * we need task switches to work, and task switches write to the TSS.
+ */
+ struct tss_struct tss;
+
+ char entry_trampoline[PAGE_SIZE];
+
+#ifdef CONFIG_X86_64
+ /*
+ * Exception stacks used for IST entries.
+ *
+ * In the future, this should have a separate slot for each stack
+ * with guard pages between them.
+ */
+ char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ];
+#endif
+};
+
+#define CPU_ENTRY_AREA_SIZE (sizeof(struct cpu_entry_area))
+#define CPU_ENTRY_AREA_PAGES (CPU_ENTRY_AREA_SIZE / PAGE_SIZE)
+
+DECLARE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
+
+extern void setup_cpu_entry_areas(void);
+
+#endif
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -25,6 +25,7 @@
#else
#include <uapi/asm/vsyscall.h>
#endif
+#include <asm/cpu_entry_area.h>
/*
* We can't declare FIXADDR_TOP as variable for x86_64 because vsyscall
@@ -45,46 +46,6 @@ extern unsigned long __FIXADDR_TOP;
#endif
/*
- * cpu_entry_area is a percpu region in the fixmap that contains things
- * needed by the CPU and early entry/exit code. Real types aren't used
- * for all fields here to avoid circular header dependencies.
- *
- * Every field is a virtual alias of some other allocated backing store.
- * There is no direct allocation of a struct cpu_entry_area.
- */
-struct cpu_entry_area {
- char gdt[PAGE_SIZE];
-
- /*
- * The GDT is just below entry_stack and thus serves (on x86_64) as
- * a a read-only guard page.
- */
- struct entry_stack_page entry_stack_page;
-
- /*
- * On x86_64, the TSS is mapped RO. On x86_32, it's mapped RW because
- * we need task switches to work, and task switches write to the TSS.
- */
- struct tss_struct tss;
-
- char entry_trampoline[PAGE_SIZE];
-
-#ifdef CONFIG_X86_64
- /*
- * Exception stacks used for IST entries.
- *
- * In the future, this should have a separate slot for each stack
- * with guard pages between them.
- */
- char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ];
-#endif
-};
-
-#define CPU_ENTRY_AREA_PAGES (sizeof(struct cpu_entry_area) / PAGE_SIZE)
-
-extern void setup_cpu_entry_areas(void);
-
-/*
* Here we define all the compile-time 'special' virtual
* addresses. The point is to have a constant address at
* compile time, but to set the physical address only
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -482,102 +482,8 @@ static const unsigned int exception_stac
[0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STKSZ,
[DEBUG_STACK - 1] = DEBUG_STKSZ
};
-
-static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
- [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
-#endif
-
-static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page,
- entry_stack_storage);
-
-static void __init
-set_percpu_fixmap_pages(int idx, void *ptr, int pages, pgprot_t prot)
-{
- for ( ; pages; pages--, idx--, ptr += PAGE_SIZE)
- __set_fixmap(idx, per_cpu_ptr_to_phys(ptr), prot);
-}
-
-/* Setup the fixmap mappings only once per-processor */
-static void __init setup_cpu_entry_area(int cpu)
-{
-#ifdef CONFIG_X86_64
- extern char _entry_trampoline[];
-
- /* On 64-bit systems, we use a read-only fixmap GDT and TSS. */
- pgprot_t gdt_prot = PAGE_KERNEL_RO;
- pgprot_t tss_prot = PAGE_KERNEL_RO;
-#else
- /*
- * On native 32-bit systems, the GDT cannot be read-only because
- * our double fault handler uses a task gate, and entering through
- * a task gate needs to change an available TSS to busy. If the
- * GDT is read-only, that will triple fault. The TSS cannot be
- * read-only because the CPU writes to it on task switches.
- *
- * On Xen PV, the GDT must be read-only because the hypervisor
- * requires it.
- */
- pgprot_t gdt_prot = boot_cpu_has(X86_FEATURE_XENPV) ?
- PAGE_KERNEL_RO : PAGE_KERNEL;
- pgprot_t tss_prot = PAGE_KERNEL;
#endif
- __set_fixmap(get_cpu_entry_area_index(cpu, gdt), get_cpu_gdt_paddr(cpu), gdt_prot);
- set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, entry_stack_page),
- per_cpu_ptr(&entry_stack_storage, cpu), 1,
- PAGE_KERNEL);
-
- /*
- * The Intel SDM says (Volume 3, 7.2.1):
- *
- * Avoid placing a page boundary in the part of the TSS that the
- * processor reads during a task switch (the first 104 bytes). The
- * processor may not correctly perform address translations if a
- * boundary occurs in this area. During a task switch, the processor
- * reads and writes into the first 104 bytes of each TSS (using
- * contiguous physical addresses beginning with the physical address
- * of the first byte of the TSS). So, after TSS access begins, if
- * part of the 104 bytes is not physically contiguous, the processor
- * will access incorrect information without generating a page-fault
- * exception.
- *
- * There are also a lot of errata involving the TSS spanning a page
- * boundary. Assert that we're not doing that.
- */
- BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^
- offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK);
- BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0);
- set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, tss),
- &per_cpu(cpu_tss_rw, cpu),
- sizeof(struct tss_struct) / PAGE_SIZE,
- tss_prot);
-
-#ifdef CONFIG_X86_32
- per_cpu(cpu_entry_area, cpu) = get_cpu_entry_area(cpu);
-#endif
-
-#ifdef CONFIG_X86_64
- BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
- BUILD_BUG_ON(sizeof(exception_stacks) !=
- sizeof(((struct cpu_entry_area *)0)->exception_stacks));
- set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, exception_stacks),
- &per_cpu(exception_stacks, cpu),
- sizeof(exception_stacks) / PAGE_SIZE,
- PAGE_KERNEL);
-
- __set_fixmap(get_cpu_entry_area_index(cpu, entry_trampoline),
- __pa_symbol(_entry_trampoline), PAGE_KERNEL_RX);
-#endif
-}
-
-void __init setup_cpu_entry_areas(void)
-{
- unsigned int cpu;
-
- for_each_possible_cpu(cpu)
- setup_cpu_entry_area(cpu);
-}
-
/* Load the original GDT from the per-cpu structure */
void load_direct_gdt(int cpu)
{
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -52,6 +52,7 @@
#include <asm/traps.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/cpu_entry_area.h>
#include <asm/mce.h>
#include <asm/fixmap.h>
#include <asm/mach_traps.h>
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -10,7 +10,7 @@ CFLAGS_REMOVE_mem_encrypt.o = -pg
endif
obj-y := init.o init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
- pat.o pgtable.o physaddr.o setup_nx.o tlb.o
+ pat.o pgtable.o physaddr.o setup_nx.o tlb.o cpu_entry_area.o
# Make sure __phys_addr has no stackprotector
nostackp := $(call cc-option, -fno-stack-protector)
--- /dev/null
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/spinlock.h>
+#include <linux/percpu.h>
+
+#include <asm/cpu_entry_area.h>
+#include <asm/pgtable.h>
+#include <asm/fixmap.h>
+#include <asm/desc.h>
+
+static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, entry_stack_storage);
+
+#ifdef CONFIG_X86_64
+static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
+ [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
+#endif
+
+static void __init
+set_percpu_fixmap_pages(int idx, void *ptr, int pages, pgprot_t prot)
+{
+ for ( ; pages; pages--, idx--, ptr += PAGE_SIZE)
+ __set_fixmap(idx, per_cpu_ptr_to_phys(ptr), prot);
+}
+
+/* Setup the fixmap mappings only once per-processor */
+static void __init setup_cpu_entry_area(int cpu)
+{
+#ifdef CONFIG_X86_64
+ extern char _entry_trampoline[];
+
+ /* On 64-bit systems, we use a read-only fixmap GDT and TSS. */
+ pgprot_t gdt_prot = PAGE_KERNEL_RO;
+ pgprot_t tss_prot = PAGE_KERNEL_RO;
+#else
+ /*
+ * On native 32-bit systems, the GDT cannot be read-only because
+ * our double fault handler uses a task gate, and entering through
+ * a task gate needs to change an available TSS to busy. If the
+ * GDT is read-only, that will triple fault. The TSS cannot be
+ * read-only because the CPU writes to it on task switches.
+ *
+ * On Xen PV, the GDT must be read-only because the hypervisor
+ * requires it.
+ */
+ pgprot_t gdt_prot = boot_cpu_has(X86_FEATURE_XENPV) ?
+ PAGE_KERNEL_RO : PAGE_KERNEL;
+ pgprot_t tss_prot = PAGE_KERNEL;
+#endif
+
+ __set_fixmap(get_cpu_entry_area_index(cpu, gdt), get_cpu_gdt_paddr(cpu), gdt_prot);
+ set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, entry_stack_page),
+ per_cpu_ptr(&entry_stack_storage, cpu), 1,
+ PAGE_KERNEL);
+
+ /*
+ * The Intel SDM says (Volume 3, 7.2.1):
+ *
+ * Avoid placing a page boundary in the part of the TSS that the
+ * processor reads during a task switch (the first 104 bytes). The
+ * processor may not correctly perform address translations if a
+ * boundary occurs in this area. During a task switch, the processor
+ * reads and writes into the first 104 bytes of each TSS (using
+ * contiguous physical addresses beginning with the physical address
+ * of the first byte of the TSS). So, after TSS access begins, if
+ * part of the 104 bytes is not physically contiguous, the processor
+ * will access incorrect information without generating a page-fault
+ * exception.
+ *
+ * There are also a lot of errata involving the TSS spanning a page
+ * boundary. Assert that we're not doing that.
+ */
+ BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^
+ offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK);
+ BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0);
+ set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, tss),
+ &per_cpu(cpu_tss_rw, cpu),
+ sizeof(struct tss_struct) / PAGE_SIZE,
+ tss_prot);
+
+#ifdef CONFIG_X86_32
+ per_cpu(cpu_entry_area, cpu) = get_cpu_entry_area(cpu);
+#endif
+
+#ifdef CONFIG_X86_64
+ BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
+ BUILD_BUG_ON(sizeof(exception_stacks) !=
+ sizeof(((struct cpu_entry_area *)0)->exception_stacks));
+ set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, exception_stacks),
+ &per_cpu(exception_stacks, cpu),
+ sizeof(exception_stacks) / PAGE_SIZE,
+ PAGE_KERNEL);
+
+ __set_fixmap(get_cpu_entry_area_index(cpu, entry_trampoline),
+ __pa_symbol(_entry_trampoline), PAGE_KERNEL_RX);
+#endif
+}
+
+void __init setup_cpu_entry_areas(void)
+{
+ unsigned int cpu;
+
+ for_each_possible_cpu(cpu)
+ setup_cpu_entry_area(cpu);
+}
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-cpu_entry_area-prevent-wraparound-in-setup_cpu_entry_area_ptes-on-32bit.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/pci-pm-force-devices-to-d0-in-pci_pm_thaw_noirq.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-cpu_entry_area-prevent-wraparound-in-setup_cpu_entry_area_ptes-on-32bit.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f6c4fd506cb626e4346aa81688f255e593a7c5a0 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Sat, 23 Dec 2017 19:45:11 +0100
Subject: x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit
From: Thomas Gleixner <tglx(a)linutronix.de>
commit f6c4fd506cb626e4346aa81688f255e593a7c5a0 upstream.
The loop which populates the CPU entry area PMDs can wrap around on 32bit
machines when the number of CPUs is small.
It worked wonderful for NR_CPUS=64 for whatever reason and the moron who
wrote that code did not bother to test it with !SMP.
Check for the wraparound to fix it.
Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
Reported-by: kernel test robot <fengguang.wu(a)intel.com>
Signed-off-by: Thomas "Feels stupid" Gleixner <tglx(a)linutronix.de>
Tested-by: Borislav Petkov <bp(a)alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/cpu_entry_area.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -122,7 +122,8 @@ static __init void setup_cpu_entry_area_
start = CPU_ENTRY_AREA_BASE;
end = start + CPU_ENTRY_AREA_MAP_SIZE;
- for (; start < end; start += PMD_SIZE)
+ /* Careful here: start + PMD_SIZE might wrap around */
+ for (; start < end && start >= CPU_ENTRY_AREA_BASE; start += PMD_SIZE)
populate_extra_pte(start);
#endif
}
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-cpu_entry_area-prevent-wraparound-in-setup_cpu_entry_area_ptes-on-32bit.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/pci-pm-force-devices-to-d0-in-pci_pm_thaw_noirq.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
spi: xilinx: Detect stall with Unknown commands
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
spi-xilinx-detect-stall-with-unknown-commands.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 5a1314fa697fc65cefaba64cd4699bfc3e6882a6 Mon Sep 17 00:00:00 2001
From: Ricardo Ribalda <ricardo.ribalda(a)gmail.com>
Date: Tue, 21 Nov 2017 10:09:02 +0100
Subject: spi: xilinx: Detect stall with Unknown commands
From: Ricardo Ribalda Delgado <ricardo.ribalda(a)gmail.com>
commit 5a1314fa697fc65cefaba64cd4699bfc3e6882a6 upstream.
When the core is configured in C_SPI_MODE > 0, it integrates a
lookup table that automatically configures the core in dual or quad mode
based on the command (first byte on the tx fifo).
Unfortunately, that list mode_?_memoy_*.mif does not contain all the
supported commands by the flash.
Since 4.14 spi-nor automatically tries to probe the flash using SFDP
(command 0x5a), and that command is not part of the list_mode table.
Whit the right combination of C_SPI_MODE and C_SPI_MEMORY this leads
into a stall that can only be recovered with a soft rest.
This patch detects this kind of stall and returns -EIO to the caller on
those commands. spi-nor can handle this error properly:
m25p80 spi0.0: Detected stall. Check C_SPI_MODE and C_SPI_MEMORY. 0x21 0x2404
m25p80 spi0.0: SPI transfer failed: -5
spi_master spi0: failed to transfer one message from queue
m25p80 spi0.0: s25sl064p (8192 Kbytes)
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda(a)gmail.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/spi/spi-xilinx.c | 11 +++++++++++
1 file changed, 11 insertions(+)
--- a/drivers/spi/spi-xilinx.c
+++ b/drivers/spi/spi-xilinx.c
@@ -271,6 +271,7 @@ static int xilinx_spi_txrx_bufs(struct s
while (remaining_words) {
int n_words, tx_words, rx_words;
u32 sr;
+ int stalled;
n_words = min(remaining_words, xspi->buffer_size);
@@ -299,7 +300,17 @@ static int xilinx_spi_txrx_bufs(struct s
/* Read out all the data from the Rx FIFO */
rx_words = n_words;
+ stalled = 10;
while (rx_words) {
+ if (rx_words == n_words && !(stalled--) &&
+ !(sr & XSPI_SR_TX_EMPTY_MASK) &&
+ (sr & XSPI_SR_RX_EMPTY_MASK)) {
+ dev_err(&spi->dev,
+ "Detected stall. Check C_SPI_MODE and C_SPI_MEMORY\n");
+ xspi_init_hw(xspi);
+ return -EIO;
+ }
+
if ((sr & XSPI_SR_TX_EMPTY_MASK) && (rx_words > 1)) {
xilinx_spi_rx(xspi);
rx_words--;
Patches currently in stable-queue which might be from ricardo.ribalda(a)gmail.com are
queue-4.14/spi-xilinx-detect-stall-with-unknown-commands.patch
This is a note to let you know that I've just added the patch titled
spi: a3700: Fix clk prescaling for coefficient over 15
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
spi-a3700-fix-clk-prescaling-for-coefficient-over-15.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 251c201bf4f8b5bf4f1ccb4f8920eed2e1f57580 Mon Sep 17 00:00:00 2001
From: Maxime Chevallier <maxime.chevallier(a)smile.fr>
Date: Mon, 27 Nov 2017 15:16:32 +0100
Subject: spi: a3700: Fix clk prescaling for coefficient over 15
From: Maxime Chevallier <maxime.chevallier(a)smile.fr>
commit 251c201bf4f8b5bf4f1ccb4f8920eed2e1f57580 upstream.
The Armada 3700 SPI controller has 2 ranges of prescaler coefficients.
One ranging from 0 to 15 by steps of 1, and one ranging from 0 to 30 by
steps of 2.
This commit fixes the prescaler coefficients that are over 15 so that it
uses the correct range of values. The prescaling coefficient is rounded
to the upper value if it is odd.
This was tested on Espressobin with spidev and a locigal analyser.
Signed-off-by: Maxime Chevallier <maxime.chevallier(a)smile.fr>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/spi/spi-armada-3700.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/drivers/spi/spi-armada-3700.c
+++ b/drivers/spi/spi-armada-3700.c
@@ -79,6 +79,7 @@
#define A3700_SPI_BYTE_LEN BIT(5)
#define A3700_SPI_CLK_PRESCALE BIT(0)
#define A3700_SPI_CLK_PRESCALE_MASK (0x1f)
+#define A3700_SPI_CLK_EVEN_OFFS (0x10)
#define A3700_SPI_WFIFO_THRS_BIT 28
#define A3700_SPI_RFIFO_THRS_BIT 24
@@ -220,6 +221,13 @@ static void a3700_spi_clock_set(struct a
prescale = DIV_ROUND_UP(clk_get_rate(a3700_spi->clk), speed_hz);
+ /* For prescaler values over 15, we can only set it by steps of 2.
+ * Starting from A3700_SPI_CLK_EVEN_OFFS, we set values from 0 up to
+ * 30. We only use this range from 16 to 30.
+ */
+ if (prescale > 15)
+ prescale = A3700_SPI_CLK_EVEN_OFFS + DIV_ROUND_UP(prescale, 2);
+
val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
val = val & ~A3700_SPI_CLK_PRESCALE_MASK;
Patches currently in stable-queue which might be from maxime.chevallier(a)smile.fr are
queue-4.14/spi-a3700-fix-clk-prescaling-for-coefficient-over-15.patch
This is a note to let you know that I've just added the patch titled
Revert "parisc: Re-enable interrupts early"
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
revert-parisc-re-enable-interrupts-early.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9352aeada4d8d8753fc0e414fbfe8fdfcb68a12c Mon Sep 17 00:00:00 2001
From: John David Anglin <dave.anglin(a)bell.net>
Date: Mon, 13 Nov 2017 19:35:33 -0500
Subject: Revert "parisc: Re-enable interrupts early"
From: John David Anglin <dave.anglin(a)bell.net>
commit 9352aeada4d8d8753fc0e414fbfe8fdfcb68a12c upstream.
This reverts commit 5c38602d83e584047906b41b162ababd4db4106d.
Interrupts can't be enabled early because the register saves are done on
the thread stack prior to switching to the IRQ stack. This caused stack
overflows and the thread stack needed increasing to 32k. Even then,
stack overflows still occasionally occurred.
Background:
Even with a 32 kB thread stack, I have seen instances where the thread
stack overflowed on the mx3210 buildd. Detection of stack overflow only
occurs when we have an external interrupt. When an external interrupt
occurs, we switch to the thread stack if we are not already on a kernel
stack. Then, registers and specials are saved to the kernel stack.
The bug occurs in intr_return where interrupts are reenabled prior to
returning from the interrupt. This was done incase we need to schedule
or deliver signals. However, it introduces the possibility that
multiple external interrupts may occur on the thread stack and cause a
stack overflow. These might not be detected and cause the kernel to
misbehave in random ways.
This patch changes the code back to only reenable interrupts when we are
going to schedule or deliver signals. As a result, we generally return
from an interrupt before reenabling interrupts. This minimizes the
growth of the thread stack.
Fixes: 5c38602d83e5 ("parisc: Re-enable interrupts early")
Signed-off-by: John David Anglin <dave.anglin(a)bell.net>
Signed-off-by: Helge Deller <deller(a)gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/parisc/kernel/entry.S | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -878,9 +878,6 @@ ENTRY_CFI(syscall_exit_rfi)
STREG %r19,PT_SR7(%r16)
intr_return:
- /* NOTE: Need to enable interrupts incase we schedule. */
- ssm PSW_SM_I, %r0
-
/* check for reschedule */
mfctl %cr30,%r1
LDREG TI_FLAGS(%r1),%r19 /* sched.h: TIF_NEED_RESCHED */
@@ -907,6 +904,11 @@ intr_check_sig:
LDREG PT_IASQ1(%r16), %r20
cmpib,COND(=),n 0,%r20,intr_restore /* backward */
+ /* NOTE: We need to enable interrupts if we have to deliver
+ * signals. We used to do this earlier but it caused kernel
+ * stack overflows. */
+ ssm PSW_SM_I, %r0
+
copy %r0, %r25 /* long in_syscall = 0 */
#ifdef CONFIG_64BIT
ldo -16(%r30),%r29 /* Reference param save area */
@@ -958,6 +960,10 @@ intr_do_resched:
cmpib,COND(=) 0, %r20, intr_do_preempt
nop
+ /* NOTE: We need to enable interrupts if we schedule. We used
+ * to do this earlier but it caused kernel stack overflows. */
+ ssm PSW_SM_I, %r0
+
#ifdef CONFIG_64BIT
ldo -16(%r30),%r29 /* Reference param save area */
#endif
Patches currently in stable-queue which might be from dave.anglin(a)bell.net are
queue-4.14/revert-parisc-re-enable-interrupts-early.patch
This is a note to let you know that I've just added the patch titled
powerpc/perf: Dereference BHRB entries safely
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
powerpc-perf-dereference-bhrb-entries-safely.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f41d84dddc66b164ac16acf3f584c276146f1c48 Mon Sep 17 00:00:00 2001
From: Ravi Bangoria <ravi.bangoria(a)linux.vnet.ibm.com>
Date: Tue, 12 Dec 2017 17:59:15 +0530
Subject: powerpc/perf: Dereference BHRB entries safely
From: Ravi Bangoria <ravi.bangoria(a)linux.vnet.ibm.com>
commit f41d84dddc66b164ac16acf3f584c276146f1c48 upstream.
It's theoretically possible that branch instructions recorded in
BHRB (Branch History Rolling Buffer) entries have already been
unmapped before they are processed by the kernel. Hence, trying to
dereference such memory location will result in a crash. eg:
Unable to handle kernel paging request for data at address 0xd000000019c41764
Faulting instruction address: 0xc000000000084a14
NIP [c000000000084a14] branch_target+0x4/0x70
LR [c0000000000eb828] record_and_restart+0x568/0x5c0
Call Trace:
[c0000000000eb3b4] record_and_restart+0xf4/0x5c0 (unreliable)
[c0000000000ec378] perf_event_interrupt+0x298/0x460
[c000000000027964] performance_monitor_exception+0x54/0x70
[c000000000009ba4] performance_monitor_common+0x114/0x120
Fix it by deferefencing the addresses safely.
Fixes: 691231846ceb ("powerpc/perf: Fix setting of "to" addresses for BHRB")
Suggested-by: Naveen N. Rao <naveen.n.rao(a)linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria(a)linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao(a)linux.vnet.ibm.com>
[mpe: Use probe_kernel_read() which is clearer, tweak change log]
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/powerpc/perf/core-book3s.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -410,8 +410,12 @@ static __u64 power_pmu_bhrb_to(u64 addr)
int ret;
__u64 target;
- if (is_kernel_addr(addr))
- return branch_target((unsigned int *)addr);
+ if (is_kernel_addr(addr)) {
+ if (probe_kernel_read(&instr, (void *)addr, sizeof(instr)))
+ return 0;
+
+ return branch_target(&instr);
+ }
/* Userspace: need copy instruction here then translate it */
pagefault_disable();
Patches currently in stable-queue which might be from ravi.bangoria(a)linux.vnet.ibm.com are
queue-4.14/powerpc-perf-dereference-bhrb-entries-safely.patch
This is a note to let you know that I've just added the patch titled
pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
pinctrl-cherryview-mask-all-interrupts-on-intel_strago-based-systems.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d2b3c353595a855794f8b9df5b5bdbe8deb0c413 Mon Sep 17 00:00:00 2001
From: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Date: Mon, 4 Dec 2017 12:11:02 +0300
Subject: pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
From: Mika Westerberg <mika.westerberg(a)linux.intel.com>
commit d2b3c353595a855794f8b9df5b5bdbe8deb0c413 upstream.
Guenter Roeck reported an interrupt storm on a prototype system which is
based on Cyan Chromebook. The root cause turned out to be a incorrectly
configured pin that triggers spurious interrupts. This will be fixed in
coreboot but currently we need to prevent the interrupt storm from
happening by masking all interrupts (but not GPEs) on those systems.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197953
Fixes: bcb48cca23ec ("pinctrl: cherryview: Do not mask all interrupts in probe")
Reported-and-tested-by: Guenter Roeck <linux(a)roeck-us.net>
Reported-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/pinctrl/intel/pinctrl-cherryview.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
--- a/drivers/pinctrl/intel/pinctrl-cherryview.c
+++ b/drivers/pinctrl/intel/pinctrl-cherryview.c
@@ -1620,6 +1620,22 @@ static int chv_gpio_probe(struct chv_pin
clear_bit(i, chip->irq_valid_mask);
}
+ /*
+ * The same set of machines in chv_no_valid_mask[] have incorrectly
+ * configured GPIOs that generate spurious interrupts so we use
+ * this same list to apply another quirk for them.
+ *
+ * See also https://bugzilla.kernel.org/show_bug.cgi?id=197953.
+ */
+ if (!need_valid_mask) {
+ /*
+ * Mask all interrupts the community is able to generate
+ * but leave the ones that can only generate GPEs unmasked.
+ */
+ chv_writel(GENMASK(31, pctrl->community->nirqs),
+ pctrl->regs + CHV_INTMASK);
+ }
+
/* Clear all interrupts */
chv_writel(0xffff, pctrl->regs + CHV_INTSTAT);
Patches currently in stable-queue which might be from mika.westerberg(a)linux.intel.com are
queue-4.14/pinctrl-cherryview-mask-all-interrupts-on-intel_strago-based-systems.patch
This is a note to let you know that I've just added the patch titled
parisc: Hide Diva-built-in serial aux and graphics card
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
parisc-hide-diva-built-in-serial-aux-and-graphics-card.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From bcf3f1752a622f1372d3252d0fea8855d89812e7 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller(a)gmx.de>
Date: Tue, 12 Dec 2017 21:52:26 +0100
Subject: parisc: Hide Diva-built-in serial aux and graphics card
From: Helge Deller <deller(a)gmx.de>
commit bcf3f1752a622f1372d3252d0fea8855d89812e7 upstream.
Diva GSP card has built-in serial AUX port and ATI graphic card which simply
don't work and which both don't have external connectors. User Guides even
mention that those devices shouldn't be used.
So, prevent that Linux drivers try to enable those devices.
Signed-off-by: Helge Deller <deller(a)gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/parisc/lba_pci.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
--- a/drivers/parisc/lba_pci.c
+++ b/drivers/parisc/lba_pci.c
@@ -1692,3 +1692,36 @@ void lba_set_iregs(struct parisc_device
iounmap(base_addr);
}
+
+/*
+ * The design of the Diva management card in rp34x0 machines (rp3410, rp3440)
+ * seems rushed, so that many built-in components simply don't work.
+ * The following quirks disable the serial AUX port and the built-in ATI RV100
+ * Radeon 7000 graphics card which both don't have any external connectors and
+ * thus are useless, and even worse, e.g. the AUX port occupies ttyS0 and as
+ * such makes those machines the only PARISC machines on which we can't use
+ * ttyS0 as boot console.
+ */
+static void quirk_diva_ati_card(struct pci_dev *dev)
+{
+ if (dev->subsystem_vendor != PCI_VENDOR_ID_HP ||
+ dev->subsystem_device != 0x1292)
+ return;
+
+ dev_info(&dev->dev, "Hiding Diva built-in ATI card");
+ dev->device = 0;
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_RADEON_QY,
+ quirk_diva_ati_card);
+
+static void quirk_diva_aux_disable(struct pci_dev *dev)
+{
+ if (dev->subsystem_vendor != PCI_VENDOR_ID_HP ||
+ dev->subsystem_device != 0x1291)
+ return;
+
+ dev_info(&dev->dev, "Hiding Diva built-in AUX serial device");
+ dev->device = 0;
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_DIVA_AUX,
+ quirk_diva_aux_disable);
Patches currently in stable-queue which might be from deller(a)gmx.de are
queue-4.14/parisc-align-os_hpmc_size-on-word-boundary.patch
queue-4.14/parisc-fix-indenting-in-puts.patch
queue-4.14/revert-parisc-re-enable-interrupts-early.patch
queue-4.14/parisc-hide-diva-built-in-serial-aux-and-graphics-card.patch
This is a note to let you know that I've just added the patch titled
PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
pci-pm-force-devices-to-d0-in-pci_pm_thaw_noirq.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 5839ee7389e893a31e4e3c9cf17b50d14103c902 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Fri, 15 Dec 2017 03:07:18 +0100
Subject: PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
From: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
commit 5839ee7389e893a31e4e3c9cf17b50d14103c902 upstream.
It is incorrect to call pci_restore_state() for devices in low-power
states (D1-D3), as that involves the restoration of MSI setup which
requires MMIO to be operational and that is only the case in D0.
However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
callbacks put the device into a low-power state, so fix it by making
it force devices into D0 via pci_set_power_state() instead of trying
to "update" their power state which is pointless.
Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
Reported-by: Thomas Gleixner <tglx(a)linutronix.de>
Reported-by: Maarten Lankhorst <dev(a)mblankhorst.nl>
Tested-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Maarten Lankhorst <dev(a)mblankhorst.nl>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Bjorn Helgaas <bhelgaas(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/pci/pci-driver.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -968,7 +968,12 @@ static int pci_pm_thaw_noirq(struct devi
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_resume_early(dev);
- pci_update_current_state(pci_dev, PCI_D0);
+ /*
+ * pci_restore_state() requires the device to be in D0 (because of MSI
+ * restoration among other things), so force it into D0 in case the
+ * driver's "freeze" callbacks put it into a low-power state directly.
+ */
+ pci_set_power_state(pci_dev, PCI_D0);
pci_restore_state(pci_dev);
if (drv && drv->pm && drv->pm->thaw_noirq)
Patches currently in stable-queue which might be from rafael.j.wysocki(a)intel.com are
queue-4.14/pci-pm-force-devices-to-d0-in-pci_pm_thaw_noirq.patch
queue-4.14/acpi-apei-erst-fix-missing-error-handling-in-erst_reader.patch
This is a note to let you know that I've just added the patch titled
net: mvneta: use proper rxq_number in loop on rx queues
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net-mvneta-use-proper-rxq_number-in-loop-on-rx-queues.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ca5902a6547f662419689ca28b3c29a772446caa Mon Sep 17 00:00:00 2001
From: Yelena Krivosheev <yelena(a)marvell.com>
Date: Tue, 19 Dec 2017 17:59:46 +0100
Subject: net: mvneta: use proper rxq_number in loop on rx queues
From: Yelena Krivosheev <yelena(a)marvell.com>
commit ca5902a6547f662419689ca28b3c29a772446caa upstream.
When adding the RX queue association with each CPU, a typo was made in
the mvneta_cleanup_rxqs() function. This patch fixes it.
[gregory.clement(a)free-electrons.com: add commit log and fixes tag]
Fixes: 2dcf75e2793c ("net: mvneta: Associate RX queues with each CPU")
Signed-off-by: Yelena Krivosheev <yelena(a)marvell.com>
Tested-by: Dmitri Epshtein <dima(a)marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement(a)free-electrons.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/marvell/mvneta.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3015,7 +3015,7 @@ static void mvneta_cleanup_rxqs(struct m
{
int queue;
- for (queue = 0; queue < txq_number; queue++)
+ for (queue = 0; queue < rxq_number; queue++)
mvneta_rxq_deinit(pp, &pp->rxqs[queue]);
}
Patches currently in stable-queue which might be from yelena(a)marvell.com are
queue-4.14/net-mvneta-eliminate-wrong-call-to-handle-rx-descriptor-error.patch
queue-4.14/net-mvneta-clear-interface-link-status-on-port-disable.patch
queue-4.14/net-mvneta-use-proper-rxq_number-in-loop-on-rx-queues.patch
This is a note to let you know that I've just added the patch titled
parisc: Fix indenting in puts()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
parisc-fix-indenting-in-puts.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 203c110b39a89b48156c7450504e454fedb7f7f6 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller(a)gmx.de>
Date: Tue, 12 Dec 2017 21:32:16 +0100
Subject: parisc: Fix indenting in puts()
From: Helge Deller <deller(a)gmx.de>
commit 203c110b39a89b48156c7450504e454fedb7f7f6 upstream.
Static analysis tools complain that we intended to have curly braces
around this indent block. In this case this assumption is wrong, so fix
the indenting.
Fixes: 2f3c7b8137ef ("parisc: Add core code for self-extracting kernel")
Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Signed-off-by: Helge Deller <deller(a)gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/parisc/boot/compressed/misc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/parisc/boot/compressed/misc.c
+++ b/arch/parisc/boot/compressed/misc.c
@@ -123,8 +123,8 @@ int puts(const char *s)
while ((nuline = strchr(s, '\n')) != NULL) {
if (nuline != s)
pdc_iodc_print(s, nuline - s);
- pdc_iodc_print("\r\n", 2);
- s = nuline + 1;
+ pdc_iodc_print("\r\n", 2);
+ s = nuline + 1;
}
if (*s != '\0')
pdc_iodc_print(s, strlen(s));
Patches currently in stable-queue which might be from deller(a)gmx.de are
queue-4.14/parisc-align-os_hpmc_size-on-word-boundary.patch
queue-4.14/parisc-fix-indenting-in-puts.patch
queue-4.14/revert-parisc-re-enable-interrupts-early.patch
queue-4.14/parisc-hide-diva-built-in-serial-aux-and-graphics-card.patch
This is a note to let you know that I've just added the patch titled
parisc: Align os_hpmc_size on word boundary
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
parisc-align-os_hpmc_size-on-word-boundary.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 0ed9d3de5f8f97e6efd5ca0e3377cab5f0451ead Mon Sep 17 00:00:00 2001
From: Helge Deller <deller(a)gmx.de>
Date: Tue, 12 Dec 2017 21:25:41 +0100
Subject: parisc: Align os_hpmc_size on word boundary
From: Helge Deller <deller(a)gmx.de>
commit 0ed9d3de5f8f97e6efd5ca0e3377cab5f0451ead upstream.
The os_hpmc_size variable sometimes wasn't aligned at word boundary and thus
triggered the unaligned fault handler at startup.
Fix it by aligning it properly.
Signed-off-by: Helge Deller <deller(a)gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/parisc/kernel/hpmc.S | 1 +
1 file changed, 1 insertion(+)
--- a/arch/parisc/kernel/hpmc.S
+++ b/arch/parisc/kernel/hpmc.S
@@ -305,6 +305,7 @@ ENDPROC_CFI(os_hpmc)
__INITRODATA
+ .align 4
.export os_hpmc_size
os_hpmc_size:
.word .os_hpmc_end-.os_hpmc
Patches currently in stable-queue which might be from deller(a)gmx.de are
queue-4.14/parisc-align-os_hpmc_size-on-word-boundary.patch
queue-4.14/parisc-fix-indenting-in-puts.patch
queue-4.14/revert-parisc-re-enable-interrupts-early.patch
queue-4.14/parisc-hide-diva-built-in-serial-aux-and-graphics-card.patch
This is a note to let you know that I've just added the patch titled
net: mvneta: clear interface link status on port disable
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net-mvneta-clear-interface-link-status-on-port-disable.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4423c18e466afdfb02a36ee8b9f901d144b3c607 Mon Sep 17 00:00:00 2001
From: Yelena Krivosheev <yelena(a)marvell.com>
Date: Tue, 19 Dec 2017 17:59:45 +0100
Subject: net: mvneta: clear interface link status on port disable
From: Yelena Krivosheev <yelena(a)marvell.com>
commit 4423c18e466afdfb02a36ee8b9f901d144b3c607 upstream.
When port connect to PHY in polling mode (with poll interval 1 sec),
port and phy link status must be synchronize in order don't loss link
change event.
[gregory.clement(a)free-electrons.com: add fixes tag]
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Signed-off-by: Yelena Krivosheev <yelena(a)marvell.com>
Tested-by: Dmitri Epshtein <dima(a)marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement(a)free-electrons.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/marvell/mvneta.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1214,6 +1214,10 @@ static void mvneta_port_disable(struct m
val &= ~MVNETA_GMAC0_PORT_ENABLE;
mvreg_write(pp, MVNETA_GMAC_CTRL_0, val);
+ pp->link = 0;
+ pp->duplex = -1;
+ pp->speed = 0;
+
udelay(200);
}
Patches currently in stable-queue which might be from yelena(a)marvell.com are
queue-4.14/net-mvneta-eliminate-wrong-call-to-handle-rx-descriptor-error.patch
queue-4.14/net-mvneta-clear-interface-link-status-on-port-disable.patch
queue-4.14/net-mvneta-use-proper-rxq_number-in-loop-on-rx-queues.patch
This is a note to let you know that I've just added the patch titled
net: mvneta: eliminate wrong call to handle rx descriptor error
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net-mvneta-eliminate-wrong-call-to-handle-rx-descriptor-error.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 2eecb2e04abb62ef8ea7b43e1a46bdb5b99d1bf8 Mon Sep 17 00:00:00 2001
From: Yelena Krivosheev <yelena(a)marvell.com>
Date: Tue, 19 Dec 2017 17:59:47 +0100
Subject: net: mvneta: eliminate wrong call to handle rx descriptor error
From: Yelena Krivosheev <yelena(a)marvell.com>
commit 2eecb2e04abb62ef8ea7b43e1a46bdb5b99d1bf8 upstream.
There are few reasons in mvneta_rx_swbm() function when received packet
is dropped. mvneta_rx_error() should be called only if error bit [16]
is set in rx descriptor.
[gregory.clement(a)free-electrons.com: add fixes tag]
Fixes: dc35a10f68d3 ("net: mvneta: bm: add support for hardware buffer management")
Signed-off-by: Yelena Krivosheev <yelena(a)marvell.com>
Tested-by: Dmitri Epshtein <dima(a)marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement(a)free-electrons.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/marvell/mvneta.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1962,9 +1962,9 @@ static int mvneta_rx_swbm(struct mvneta_
if (!mvneta_rxq_desc_is_first_last(rx_status) ||
(rx_status & MVNETA_RXD_ERR_SUMMARY)) {
+ mvneta_rx_error(pp, rx_desc);
err_drop_frame:
dev->stats.rx_errors++;
- mvneta_rx_error(pp, rx_desc);
/* leave the descriptor untouched */
continue;
}
Patches currently in stable-queue which might be from yelena(a)marvell.com are
queue-4.14/net-mvneta-eliminate-wrong-call-to-handle-rx-descriptor-error.patch
queue-4.14/net-mvneta-clear-interface-link-status-on-port-disable.patch
queue-4.14/net-mvneta-use-proper-rxq_number-in-loop-on-rx-queues.patch
This is a note to let you know that I've just added the patch titled
mfd: twl6040: Fix child-node lookup
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mfd-twl6040-fix-child-node-lookup.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 85e9b13cbb130a3209f21bd7933933399c389ffe Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Sat, 11 Nov 2017 16:38:44 +0100
Subject: mfd: twl6040: Fix child-node lookup
From: Johan Hovold <johan(a)kernel.org>
commit 85e9b13cbb130a3209f21bd7933933399c389ffe upstream.
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
To make things worse, the parent node was prematurely freed, while the
child node was leaked.
Note that the CONFIG_OF compile guard can be removed as
of_get_child_by_name() provides a !CONFIG_OF implementation which always
fails.
Fixes: 37e13cecaa14 ("mfd: Add support for Device Tree to twl6040")
Fixes: ca2cad6ae38e ("mfd: Fix twl6040 build failure")
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Acked-by: Peter Ujfalusi <peter.ujfalusi(a)ti.com>
Signed-off-by: Lee Jones <lee.jones(a)linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/mfd/twl6040.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
--- a/drivers/mfd/twl6040.c
+++ b/drivers/mfd/twl6040.c
@@ -97,12 +97,16 @@ static struct reg_sequence twl6040_patch
};
-static bool twl6040_has_vibra(struct device_node *node)
+static bool twl6040_has_vibra(struct device_node *parent)
{
-#ifdef CONFIG_OF
- if (of_find_node_by_name(node, "vibra"))
+ struct device_node *node;
+
+ node = of_get_child_by_name(parent, "vibra");
+ if (node) {
+ of_node_put(node);
return true;
-#endif
+ }
+
return false;
}
Patches currently in stable-queue which might be from johan(a)kernel.org are
queue-4.14/mfd-twl4030-audio-fix-sibling-node-lookup.patch
queue-4.14/mfd-twl6040-fix-child-node-lookup.patch
This is a note to let you know that I've just added the patch titled
mfd: twl4030-audio: Fix sibling-node lookup
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mfd-twl4030-audio-fix-sibling-node-lookup.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 0a423772de2f3d7b00899987884f62f63ae00dcb Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Sat, 11 Nov 2017 16:38:43 +0100
Subject: mfd: twl4030-audio: Fix sibling-node lookup
From: Johan Hovold <johan(a)kernel.org>
commit 0a423772de2f3d7b00899987884f62f63ae00dcb upstream.
A helper purported to look up a child node based on its name was using
the wrong of-helper and ended up prematurely freeing the parent of-node
while leaking any matching node.
To make things worse, any matching node would not even necessarily be a
child node as the whole device tree was searched depth-first starting at
the parent.
Fixes: 019a7e6b7b31 ("mfd: twl4030-audio: Add DT support")
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Acked-by: Peter Ujfalusi <peter.ujfalusi(a)ti.com>
Signed-off-by: Lee Jones <lee.jones(a)linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/mfd/twl4030-audio.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/drivers/mfd/twl4030-audio.c
+++ b/drivers/mfd/twl4030-audio.c
@@ -159,13 +159,18 @@ unsigned int twl4030_audio_get_mclk(void
EXPORT_SYMBOL_GPL(twl4030_audio_get_mclk);
static bool twl4030_audio_has_codec(struct twl4030_audio_data *pdata,
- struct device_node *node)
+ struct device_node *parent)
{
+ struct device_node *node;
+
if (pdata && pdata->codec)
return true;
- if (of_find_node_by_name(node, "codec"))
+ node = of_get_child_by_name(parent, "codec");
+ if (node) {
+ of_node_put(node);
return true;
+ }
return false;
}
Patches currently in stable-queue which might be from johan(a)kernel.org are
queue-4.14/mfd-twl4030-audio-fix-sibling-node-lookup.patch
queue-4.14/mfd-twl6040-fix-child-node-lookup.patch
This is a note to let you know that I've just added the patch titled
mfd: cros ec: spi: Don't send first message too soon
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mfd-cros-ec-spi-don-t-send-first-message-too-soon.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 15d8374874ded0bec37ef27f8301a6d54032c0e5 Mon Sep 17 00:00:00 2001
From: Jon Hunter <jonathanh(a)nvidia.com>
Date: Tue, 14 Nov 2017 14:43:27 +0000
Subject: mfd: cros ec: spi: Don't send first message too soon
From: Jon Hunter <jonathanh(a)nvidia.com>
commit 15d8374874ded0bec37ef27f8301a6d54032c0e5 upstream.
On the Tegra124 Nyan-Big chromebook the very first SPI message sent to
the EC is failing.
The Tegra SPI driver configures the SPI chip-selects to be active-high
by default (and always has for many years). The EC SPI requires an
active-low chip-select and so the Tegra chip-select is reconfigured to
be active-low when the EC SPI driver calls spi_setup(). The problem is
that if the first SPI message to the EC is sent too soon after
reconfiguring the SPI chip-select, it fails.
The EC SPI driver prevents back-to-back SPI messages being sent too
soon by keeping track of the time the last transfer was sent via the
variable 'last_transfer_ns'. To prevent the very first transfer being
sent too soon, initialise the 'last_transfer_ns' variable after calling
spi_setup() and before sending the first SPI message.
Signed-off-by: Jon Hunter <jonathanh(a)nvidia.com>
Reviewed-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: Douglas Anderson <dianders(a)chromium.org>
Acked-by: Benson Leung <bleung(a)chromium.org>
Signed-off-by: Lee Jones <lee.jones(a)linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/mfd/cros_ec_spi.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/mfd/cros_ec_spi.c
+++ b/drivers/mfd/cros_ec_spi.c
@@ -667,6 +667,7 @@ static int cros_ec_spi_probe(struct spi_
sizeof(struct ec_response_get_protocol_info);
ec_dev->dout_size = sizeof(struct ec_host_request);
+ ec_spi->last_transfer_ns = ktime_get_ns();
err = cros_ec_register(ec_dev);
if (err) {
Patches currently in stable-queue which might be from jonathanh(a)nvidia.com are
queue-4.14/mfd-cros-ec-spi-don-t-send-first-message-too-soon.patch
This is a note to let you know that I've just added the patch titled
libnvdimm, pfn: fix start_pad handling for aligned namespaces
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
libnvdimm-pfn-fix-start_pad-handling-for-aligned-namespaces.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 19deaa217bc04e83b59b5e8c8229eb0e53ad9efc Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Tue, 19 Dec 2017 15:07:10 -0800
Subject: libnvdimm, pfn: fix start_pad handling for aligned namespaces
From: Dan Williams <dan.j.williams(a)intel.com>
commit 19deaa217bc04e83b59b5e8c8229eb0e53ad9efc upstream.
The alignment checks at pfn driver startup fail to properly account for
the 'start_pad' in the case where the namespace is misaligned relative
to its internal alignment. This is typically triggered in 1G aligned
namespace, but could theoretically trigger with small namespace
alignments. When this triggers the kernel reports messages of the form:
dax2.1: bad offset: 0x3c000000 dax disabled align: 0x40000000
Fixes: 1ee6667cd8d1 ("libnvdimm, pfn, dax: fix initialization vs autodetect...")
Reported-by: Jane Chu <jane.chu(a)oracle.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/nvdimm/pfn_devs.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -364,9 +364,9 @@ struct device *nd_pfn_create(struct nd_r
int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
{
u64 checksum, offset;
- unsigned long align;
enum nd_pfn_mode mode;
struct nd_namespace_io *nsio;
+ unsigned long align, start_pad;
struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb;
struct nd_namespace_common *ndns = nd_pfn->ndns;
const u8 *parent_uuid = nd_dev_to_uuid(&ndns->dev);
@@ -410,6 +410,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pf
align = le32_to_cpu(pfn_sb->align);
offset = le64_to_cpu(pfn_sb->dataoff);
+ start_pad = le32_to_cpu(pfn_sb->start_pad);
if (align == 0)
align = 1UL << ilog2(offset);
mode = le32_to_cpu(pfn_sb->mode);
@@ -468,7 +469,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pf
return -EBUSY;
}
- if ((align && !IS_ALIGNED(offset, align))
+ if ((align && !IS_ALIGNED(nsio->res.start + offset + start_pad, align))
|| !IS_ALIGNED(offset, PAGE_SIZE)) {
dev_err(&nd_pfn->dev,
"bad offset: %#llx dax disabled align: %#lx\n",
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/libnvdimm-btt-fix-an-incompatibility-in-the-log-layout.patch
queue-4.14/libnvdimm-pfn-fix-start_pad-handling-for-aligned-namespaces.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/libnvdimm-dax-fix-1gb-aligned-namespaces-vs-physical-misalignment.patch
queue-4.14/acpi-nfit-fix-health-event-notification.patch
This is a note to let you know that I've just added the patch titled
libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
libnvdimm-dax-fix-1gb-aligned-namespaces-vs-physical-misalignment.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 41fce90f26333c4fa82e8e43b9ace86c4e8a0120 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Mon, 4 Dec 2017 14:07:43 -0800
Subject: libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment
From: Dan Williams <dan.j.williams(a)intel.com>
commit 41fce90f26333c4fa82e8e43b9ace86c4e8a0120 upstream.
The following namespace configuration attempt:
# ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f
libndctl: ndctl_dax_enable: dax0.1: failed to enable
Error: namespace0.0: failed to enable
failed to reconfigure namespace: No such device or address
...fails when the backing memory range is not physically aligned to 1G:
# cat /proc/iomem | grep Persistent
210000000-30fffffff : Persistent Memory (legacy)
In the above example the 4G persistent memory range starts and ends on a
256MB boundary.
We handle this case correctly when needing to handle cases that violate
section alignment (128MB) collisions against "System RAM", and we simply
need to extend that padding/truncation for the 1GB alignment use case.
Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute...")
Reported-and-tested-by: Jane Chu <jane.chu(a)oracle.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/nvdimm/pfn_devs.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -582,6 +582,12 @@ static struct vmem_altmap *__nvdimm_setu
return altmap;
}
+static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys)
+{
+ return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys),
+ ALIGN_DOWN(phys, nd_pfn->align));
+}
+
static int nd_pfn_init(struct nd_pfn *nd_pfn)
{
u32 dax_label_reserve = is_nd_dax(&nd_pfn->dev) ? SZ_128K : 0;
@@ -637,13 +643,16 @@ static int nd_pfn_init(struct nd_pfn *nd
start = nsio->res.start;
size = PHYS_SECTION_ALIGN_UP(start + size) - start;
if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
- IORES_DESC_NONE) == REGION_MIXED) {
+ IORES_DESC_NONE) == REGION_MIXED
+ || !IS_ALIGNED(start + resource_size(&nsio->res),
+ nd_pfn->align)) {
size = resource_size(&nsio->res);
- end_trunc = start + size - PHYS_SECTION_ALIGN_DOWN(start + size);
+ end_trunc = start + size - phys_pmem_align_down(nd_pfn,
+ start + size);
}
if (start_pad + end_trunc)
- dev_info(&nd_pfn->dev, "%s section collision, truncate %d bytes\n",
+ dev_info(&nd_pfn->dev, "%s alignment collision, truncate %d bytes\n",
dev_name(&ndns->dev), start_pad + end_trunc);
/*
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/libnvdimm-btt-fix-an-incompatibility-in-the-log-layout.patch
queue-4.14/libnvdimm-pfn-fix-start_pad-handling-for-aligned-namespaces.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/libnvdimm-dax-fix-1gb-aligned-namespaces-vs-physical-misalignment.patch
queue-4.14/acpi-nfit-fix-health-event-notification.patch
This is a note to let you know that I've just added the patch titled
libnvdimm, btt: Fix an incompatibility in the log layout
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
libnvdimm-btt-fix-an-incompatibility-in-the-log-layout.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 24e3a7fb60a9187e5df90e5fa655ffc94b9c4f77 Mon Sep 17 00:00:00 2001
From: Vishal Verma <vishal.l.verma(a)intel.com>
Date: Mon, 18 Dec 2017 09:28:39 -0700
Subject: libnvdimm, btt: Fix an incompatibility in the log layout
From: Vishal Verma <vishal.l.verma(a)intel.com>
commit 24e3a7fb60a9187e5df90e5fa655ffc94b9c4f77 upstream.
Due to a spec misinterpretation, the Linux implementation of the BTT log
area had different padding scheme from other implementations, such as
UEFI and NVML.
This fixes the padding scheme, and defaults to it for new BTT layouts.
We attempt to detect the padding scheme in use when probing for an
existing BTT. If we detect the older/incompatible scheme, we continue
using it.
Reported-by: Juston Li <juston.li(a)intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/nvdimm/btt.c | 201 ++++++++++++++++++++++++++++++++++++++++++---------
drivers/nvdimm/btt.h | 45 +++++++++++
2 files changed, 211 insertions(+), 35 deletions(-)
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -210,12 +210,12 @@ static int btt_map_read(struct arena_inf
return ret;
}
-static int btt_log_read_pair(struct arena_info *arena, u32 lane,
- struct log_entry *ent)
+static int btt_log_group_read(struct arena_info *arena, u32 lane,
+ struct log_group *log)
{
return arena_read_bytes(arena,
- arena->logoff + (2 * lane * LOG_ENT_SIZE), ent,
- 2 * LOG_ENT_SIZE, 0);
+ arena->logoff + (lane * LOG_GRP_SIZE), log,
+ LOG_GRP_SIZE, 0);
}
static struct dentry *debugfs_root;
@@ -255,6 +255,8 @@ static void arena_debugfs_init(struct ar
debugfs_create_x64("logoff", S_IRUGO, d, &a->logoff);
debugfs_create_x64("info2off", S_IRUGO, d, &a->info2off);
debugfs_create_x32("flags", S_IRUGO, d, &a->flags);
+ debugfs_create_u32("log_index_0", S_IRUGO, d, &a->log_index[0]);
+ debugfs_create_u32("log_index_1", S_IRUGO, d, &a->log_index[1]);
}
static void btt_debugfs_init(struct btt *btt)
@@ -273,6 +275,11 @@ static void btt_debugfs_init(struct btt
}
}
+static u32 log_seq(struct log_group *log, int log_idx)
+{
+ return le32_to_cpu(log->ent[log_idx].seq);
+}
+
/*
* This function accepts two log entries, and uses the
* sequence number to find the 'older' entry.
@@ -282,8 +289,10 @@ static void btt_debugfs_init(struct btt
*
* TODO The logic feels a bit kludge-y. make it better..
*/
-static int btt_log_get_old(struct log_entry *ent)
+static int btt_log_get_old(struct arena_info *a, struct log_group *log)
{
+ int idx0 = a->log_index[0];
+ int idx1 = a->log_index[1];
int old;
/*
@@ -291,23 +300,23 @@ static int btt_log_get_old(struct log_en
* the next time, the following logic works out to put this
* (next) entry into [1]
*/
- if (ent[0].seq == 0) {
- ent[0].seq = cpu_to_le32(1);
+ if (log_seq(log, idx0) == 0) {
+ log->ent[idx0].seq = cpu_to_le32(1);
return 0;
}
- if (ent[0].seq == ent[1].seq)
+ if (log_seq(log, idx0) == log_seq(log, idx1))
return -EINVAL;
- if (le32_to_cpu(ent[0].seq) + le32_to_cpu(ent[1].seq) > 5)
+ if (log_seq(log, idx0) + log_seq(log, idx1) > 5)
return -EINVAL;
- if (le32_to_cpu(ent[0].seq) < le32_to_cpu(ent[1].seq)) {
- if (le32_to_cpu(ent[1].seq) - le32_to_cpu(ent[0].seq) == 1)
+ if (log_seq(log, idx0) < log_seq(log, idx1)) {
+ if ((log_seq(log, idx1) - log_seq(log, idx0)) == 1)
old = 0;
else
old = 1;
} else {
- if (le32_to_cpu(ent[0].seq) - le32_to_cpu(ent[1].seq) == 1)
+ if ((log_seq(log, idx0) - log_seq(log, idx1)) == 1)
old = 1;
else
old = 0;
@@ -327,17 +336,18 @@ static int btt_log_read(struct arena_inf
{
int ret;
int old_ent, ret_ent;
- struct log_entry log[2];
+ struct log_group log;
- ret = btt_log_read_pair(arena, lane, log);
+ ret = btt_log_group_read(arena, lane, &log);
if (ret)
return -EIO;
- old_ent = btt_log_get_old(log);
+ old_ent = btt_log_get_old(arena, &log);
if (old_ent < 0 || old_ent > 1) {
dev_err(to_dev(arena),
"log corruption (%d): lane %d seq [%d, %d]\n",
- old_ent, lane, log[0].seq, log[1].seq);
+ old_ent, lane, log.ent[arena->log_index[0]].seq,
+ log.ent[arena->log_index[1]].seq);
/* TODO set error state? */
return -EIO;
}
@@ -345,7 +355,7 @@ static int btt_log_read(struct arena_inf
ret_ent = (old_flag ? old_ent : (1 - old_ent));
if (ent != NULL)
- memcpy(ent, &log[ret_ent], LOG_ENT_SIZE);
+ memcpy(ent, &log.ent[arena->log_index[ret_ent]], LOG_ENT_SIZE);
return ret_ent;
}
@@ -359,17 +369,13 @@ static int __btt_log_write(struct arena_
u32 sub, struct log_entry *ent, unsigned long flags)
{
int ret;
- /*
- * Ignore the padding in log_entry for calculating log_half.
- * The entry is 'committed' when we write the sequence number,
- * and we want to ensure that that is the last thing written.
- * We don't bother writing the padding as that would be extra
- * media wear and write amplification
- */
- unsigned int log_half = (LOG_ENT_SIZE - 2 * sizeof(u64)) / 2;
- u64 ns_off = arena->logoff + (((2 * lane) + sub) * LOG_ENT_SIZE);
+ u32 group_slot = arena->log_index[sub];
+ unsigned int log_half = LOG_ENT_SIZE / 2;
void *src = ent;
+ u64 ns_off;
+ ns_off = arena->logoff + (lane * LOG_GRP_SIZE) +
+ (group_slot * LOG_ENT_SIZE);
/* split the 16B write into atomic, durable halves */
ret = arena_write_bytes(arena, ns_off, src, log_half, flags);
if (ret)
@@ -452,7 +458,7 @@ static int btt_log_init(struct arena_inf
{
size_t logsize = arena->info2off - arena->logoff;
size_t chunk_size = SZ_4K, offset = 0;
- struct log_entry log;
+ struct log_entry ent;
void *zerobuf;
int ret;
u32 i;
@@ -484,11 +490,11 @@ static int btt_log_init(struct arena_inf
}
for (i = 0; i < arena->nfree; i++) {
- log.lba = cpu_to_le32(i);
- log.old_map = cpu_to_le32(arena->external_nlba + i);
- log.new_map = cpu_to_le32(arena->external_nlba + i);
- log.seq = cpu_to_le32(LOG_SEQ_INIT);
- ret = __btt_log_write(arena, i, 0, &log, 0);
+ ent.lba = cpu_to_le32(i);
+ ent.old_map = cpu_to_le32(arena->external_nlba + i);
+ ent.new_map = cpu_to_le32(arena->external_nlba + i);
+ ent.seq = cpu_to_le32(LOG_SEQ_INIT);
+ ret = __btt_log_write(arena, i, 0, &ent, 0);
if (ret)
goto free;
}
@@ -593,6 +599,123 @@ static int btt_freelist_init(struct aren
return 0;
}
+static bool ent_is_padding(struct log_entry *ent)
+{
+ return (ent->lba == 0) && (ent->old_map == 0) && (ent->new_map == 0)
+ && (ent->seq == 0);
+}
+
+/*
+ * Detecting valid log indices: We read a log group (see the comments in btt.h
+ * for a description of a 'log_group' and its 'slots'), and iterate over its
+ * four slots. We expect that a padding slot will be all-zeroes, and use this
+ * to detect a padding slot vs. an actual entry.
+ *
+ * If a log_group is in the initial state, i.e. hasn't been used since the
+ * creation of this BTT layout, it will have three of the four slots with
+ * zeroes. We skip over these log_groups for the detection of log_index. If
+ * all log_groups are in the initial state (i.e. the BTT has never been
+ * written to), it is safe to assume the 'new format' of log entries in slots
+ * (0, 1).
+ */
+static int log_set_indices(struct arena_info *arena)
+{
+ bool idx_set = false, initial_state = true;
+ int ret, log_index[2] = {-1, -1};
+ u32 i, j, next_idx = 0;
+ struct log_group log;
+ u32 pad_count = 0;
+
+ for (i = 0; i < arena->nfree; i++) {
+ ret = btt_log_group_read(arena, i, &log);
+ if (ret < 0)
+ return ret;
+
+ for (j = 0; j < 4; j++) {
+ if (!idx_set) {
+ if (ent_is_padding(&log.ent[j])) {
+ pad_count++;
+ continue;
+ } else {
+ /* Skip if index has been recorded */
+ if ((next_idx == 1) &&
+ (j == log_index[0]))
+ continue;
+ /* valid entry, record index */
+ log_index[next_idx] = j;
+ next_idx++;
+ }
+ if (next_idx == 2) {
+ /* two valid entries found */
+ idx_set = true;
+ } else if (next_idx > 2) {
+ /* too many valid indices */
+ return -ENXIO;
+ }
+ } else {
+ /*
+ * once the indices have been set, just verify
+ * that all subsequent log groups are either in
+ * their initial state or follow the same
+ * indices.
+ */
+ if (j == log_index[0]) {
+ /* entry must be 'valid' */
+ if (ent_is_padding(&log.ent[j]))
+ return -ENXIO;
+ } else if (j == log_index[1]) {
+ ;
+ /*
+ * log_index[1] can be padding if the
+ * lane never got used and it is still
+ * in the initial state (three 'padding'
+ * entries)
+ */
+ } else {
+ /* entry must be invalid (padding) */
+ if (!ent_is_padding(&log.ent[j]))
+ return -ENXIO;
+ }
+ }
+ }
+ /*
+ * If any of the log_groups have more than one valid,
+ * non-padding entry, then the we are no longer in the
+ * initial_state
+ */
+ if (pad_count < 3)
+ initial_state = false;
+ pad_count = 0;
+ }
+
+ if (!initial_state && !idx_set)
+ return -ENXIO;
+
+ /*
+ * If all the entries in the log were in the initial state,
+ * assume new padding scheme
+ */
+ if (initial_state)
+ log_index[1] = 1;
+
+ /*
+ * Only allow the known permutations of log/padding indices,
+ * i.e. (0, 1), and (0, 2)
+ */
+ if ((log_index[0] == 0) && ((log_index[1] == 1) || (log_index[1] == 2)))
+ ; /* known index possibilities */
+ else {
+ dev_err(to_dev(arena), "Found an unknown padding scheme\n");
+ return -ENXIO;
+ }
+
+ arena->log_index[0] = log_index[0];
+ arena->log_index[1] = log_index[1];
+ dev_dbg(to_dev(arena), "log_index_0 = %d\n", log_index[0]);
+ dev_dbg(to_dev(arena), "log_index_1 = %d\n", log_index[1]);
+ return 0;
+}
+
static int btt_rtt_init(struct arena_info *arena)
{
arena->rtt = kcalloc(arena->nfree, sizeof(u32), GFP_KERNEL);
@@ -649,8 +772,7 @@ static struct arena_info *alloc_arena(st
available -= 2 * BTT_PG_SIZE;
/* The log takes a fixed amount of space based on nfree */
- logsize = roundup(2 * arena->nfree * sizeof(struct log_entry),
- BTT_PG_SIZE);
+ logsize = roundup(arena->nfree * LOG_GRP_SIZE, BTT_PG_SIZE);
available -= logsize;
/* Calculate optimal split between map and data area */
@@ -667,6 +789,10 @@ static struct arena_info *alloc_arena(st
arena->mapoff = arena->dataoff + datasize;
arena->logoff = arena->mapoff + mapsize;
arena->info2off = arena->logoff + logsize;
+
+ /* Default log indices are (0,1) */
+ arena->log_index[0] = 0;
+ arena->log_index[1] = 1;
return arena;
}
@@ -757,6 +883,13 @@ static int discover_arenas(struct btt *b
arena->external_lba_start = cur_nlba;
parse_arena_meta(arena, super, cur_off);
+ ret = log_set_indices(arena);
+ if (ret) {
+ dev_err(to_dev(arena),
+ "Unable to deduce log/padding indices\n");
+ goto out;
+ }
+
mutex_init(&arena->err_lock);
ret = btt_freelist_init(arena);
if (ret)
--- a/drivers/nvdimm/btt.h
+++ b/drivers/nvdimm/btt.h
@@ -27,6 +27,7 @@
#define MAP_ERR_MASK (1 << MAP_ERR_SHIFT)
#define MAP_LBA_MASK (~((1 << MAP_TRIM_SHIFT) | (1 << MAP_ERR_SHIFT)))
#define MAP_ENT_NORMAL 0xC0000000
+#define LOG_GRP_SIZE sizeof(struct log_group)
#define LOG_ENT_SIZE sizeof(struct log_entry)
#define ARENA_MIN_SIZE (1UL << 24) /* 16 MB */
#define ARENA_MAX_SIZE (1ULL << 39) /* 512 GB */
@@ -50,12 +51,52 @@ enum btt_init_state {
INIT_READY
};
+/*
+ * A log group represents one log 'lane', and consists of four log entries.
+ * Two of the four entries are valid entries, and the remaining two are
+ * padding. Due to an old bug in the padding location, we need to perform a
+ * test to determine the padding scheme being used, and use that scheme
+ * thereafter.
+ *
+ * In kernels prior to 4.15, 'log group' would have actual log entries at
+ * indices (0, 2) and padding at indices (1, 3), where as the correct/updated
+ * format has log entries at indices (0, 1) and padding at indices (2, 3).
+ *
+ * Old (pre 4.15) format:
+ * +-----------------+-----------------+
+ * | ent[0] | ent[1] |
+ * | 16B | 16B |
+ * | lba/old/new/seq | pad |
+ * +-----------------------------------+
+ * | ent[2] | ent[3] |
+ * | 16B | 16B |
+ * | lba/old/new/seq | pad |
+ * +-----------------+-----------------+
+ *
+ * New format:
+ * +-----------------+-----------------+
+ * | ent[0] | ent[1] |
+ * | 16B | 16B |
+ * | lba/old/new/seq | lba/old/new/seq |
+ * +-----------------------------------+
+ * | ent[2] | ent[3] |
+ * | 16B | 16B |
+ * | pad | pad |
+ * +-----------------+-----------------+
+ *
+ * We detect during start-up which format is in use, and set
+ * arena->log_index[(0, 1)] with the detected format.
+ */
+
struct log_entry {
__le32 lba;
__le32 old_map;
__le32 new_map;
__le32 seq;
- __le64 padding[2];
+};
+
+struct log_group {
+ struct log_entry ent[4];
};
struct btt_sb {
@@ -125,6 +166,7 @@ struct aligned_lock {
* @list: List head for list of arenas
* @debugfs_dir: Debugfs dentry
* @flags: Arena flags - may signify error states.
+ * @log_index: Indices of the valid log entries in a log_group
*
* arena_info is a per-arena handle. Once an arena is narrowed down for an
* IO, this struct is passed around for the duration of the IO.
@@ -157,6 +199,7 @@ struct arena_info {
/* Arena flags */
u32 flags;
struct mutex err_lock;
+ int log_index[2];
};
/**
Patches currently in stable-queue which might be from vishal.l.verma(a)intel.com are
queue-4.14/libnvdimm-btt-fix-an-incompatibility-in-the-log-layout.patch
queue-4.14/acpi-nfit-fix-health-event-notification.patch
This is a note to let you know that I've just added the patch titled
kvm: x86: fix RSM when PCID is non-zero
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-x86-fix-rsm-when-pcid-is-non-zero.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From fae1a3e775cca8c3a9e0eb34443b310871a15a92 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Thu, 21 Dec 2017 00:49:14 +0100
Subject: kvm: x86: fix RSM when PCID is non-zero
From: Paolo Bonzini <pbonzini(a)redhat.com>
commit fae1a3e775cca8c3a9e0eb34443b310871a15a92 upstream.
rsm_load_state_64() and rsm_enter_protected_mode() load CR3, then
CR4 & ~PCIDE, then CR0, then CR4.
However, setting CR4.PCIDE fails if CR3[11:0] != 0. It's probably easier
in the long run to replace rsm_enter_protected_mode() with an emulator
callback that sets all the special registers (like KVM_SET_SREGS would
do). For now, set the PCID field of CR3 only after CR4.PCIDE is 1.
Reported-by: Laszlo Ersek <lersek(a)redhat.com>
Tested-by: Laszlo Ersek <lersek(a)redhat.com>
Fixes: 660a5d517aaab9187f93854425c4c63f4a09195c
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kvm/emulate.c | 32 +++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2404,9 +2404,21 @@ static int rsm_load_seg_64(struct x86_em
}
static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
- u64 cr0, u64 cr4)
+ u64 cr0, u64 cr3, u64 cr4)
{
int bad;
+ u64 pcid;
+
+ /* In order to later set CR4.PCIDE, CR3[11:0] must be zero. */
+ pcid = 0;
+ if (cr4 & X86_CR4_PCIDE) {
+ pcid = cr3 & 0xfff;
+ cr3 &= ~0xfff;
+ }
+
+ bad = ctxt->ops->set_cr(ctxt, 3, cr3);
+ if (bad)
+ return X86EMUL_UNHANDLEABLE;
/*
* First enable PAE, long mode needs it before CR0.PG = 1 is set.
@@ -2425,6 +2437,12 @@ static int rsm_enter_protected_mode(stru
bad = ctxt->ops->set_cr(ctxt, 4, cr4);
if (bad)
return X86EMUL_UNHANDLEABLE;
+ if (pcid) {
+ bad = ctxt->ops->set_cr(ctxt, 3, cr3 | pcid);
+ if (bad)
+ return X86EMUL_UNHANDLEABLE;
+ }
+
}
return X86EMUL_CONTINUE;
@@ -2435,11 +2453,11 @@ static int rsm_load_state_32(struct x86_
struct desc_struct desc;
struct desc_ptr dt;
u16 selector;
- u32 val, cr0, cr4;
+ u32 val, cr0, cr3, cr4;
int i;
cr0 = GET_SMSTATE(u32, smbase, 0x7ffc);
- ctxt->ops->set_cr(ctxt, 3, GET_SMSTATE(u32, smbase, 0x7ff8));
+ cr3 = GET_SMSTATE(u32, smbase, 0x7ff8);
ctxt->eflags = GET_SMSTATE(u32, smbase, 0x7ff4) | X86_EFLAGS_FIXED;
ctxt->_eip = GET_SMSTATE(u32, smbase, 0x7ff0);
@@ -2481,14 +2499,14 @@ static int rsm_load_state_32(struct x86_
ctxt->ops->set_smbase(ctxt, GET_SMSTATE(u32, smbase, 0x7ef8));
- return rsm_enter_protected_mode(ctxt, cr0, cr4);
+ return rsm_enter_protected_mode(ctxt, cr0, cr3, cr4);
}
static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase)
{
struct desc_struct desc;
struct desc_ptr dt;
- u64 val, cr0, cr4;
+ u64 val, cr0, cr3, cr4;
u32 base3;
u16 selector;
int i, r;
@@ -2505,7 +2523,7 @@ static int rsm_load_state_64(struct x86_
ctxt->ops->set_dr(ctxt, 7, (val & DR7_VOLATILE) | DR7_FIXED_1);
cr0 = GET_SMSTATE(u64, smbase, 0x7f58);
- ctxt->ops->set_cr(ctxt, 3, GET_SMSTATE(u64, smbase, 0x7f50));
+ cr3 = GET_SMSTATE(u64, smbase, 0x7f50);
cr4 = GET_SMSTATE(u64, smbase, 0x7f48);
ctxt->ops->set_smbase(ctxt, GET_SMSTATE(u32, smbase, 0x7f00));
val = GET_SMSTATE(u64, smbase, 0x7ed0);
@@ -2533,7 +2551,7 @@ static int rsm_load_state_64(struct x86_
dt.address = GET_SMSTATE(u64, smbase, 0x7e68);
ctxt->ops->set_gdt(ctxt, &dt);
- r = rsm_enter_protected_mode(ctxt, cr0, cr4);
+ r = rsm_enter_protected_mode(ctxt, cr0, cr3, cr4);
if (r != X86EMUL_CONTINUE)
return r;
Patches currently in stable-queue which might be from pbonzini(a)redhat.com are
queue-4.14/kvm-mmu-fix-infinite-loop-when-there-is-no-available-mmu-page.patch
queue-4.14/kvm-x86-fix-load-rflags-w-o-the-fixed-bit.patch
queue-4.14/kvm-x86-fix-rsm-when-pcid-is-non-zero.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
This is a note to let you know that I've just added the patch titled
KVM: X86: Fix load RFLAGS w/o the fixed bit
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-x86-fix-load-rflags-w-o-the-fixed-bit.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d73235d17ba63b53dc0e1051dbc10a1f1be91b71 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpeng.li(a)hotmail.com>
Date: Thu, 7 Dec 2017 00:30:08 -0800
Subject: KVM: X86: Fix load RFLAGS w/o the fixed bit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Wanpeng Li <wanpeng.li(a)hotmail.com>
commit d73235d17ba63b53dc0e1051dbc10a1f1be91b71 upstream.
*** Guest State ***
CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffe871
CR3 = 0x00000000fffbc000
RSP = 0x0000000000000000 RIP = 0x0000000000000000
RFLAGS=0x00000000 DR7 = 0x0000000000000400
^^^^^^^^^^
The failed vmentry is triggered by the following testcase when ept=Y:
#include <unistd.h>
#include <sys/syscall.h>
#include <string.h>
#include <stdint.h>
#include <linux/kvm.h>
#include <fcntl.h>
#include <sys/ioctl.h>
long r[5];
int main()
{
r[2] = open("/dev/kvm", O_RDONLY);
r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
struct kvm_regs regs = {
.rflags = 0,
};
ioctl(r[4], KVM_SET_REGS, ®s);
ioctl(r[4], KVM_RUN, 0);
}
X86 RFLAGS bit 1 is fixed set, userspace can simply clearing bit 1
of RFLAGS with KVM_SET_REGS ioctl which results in vmentry fails.
This patch fixes it by oring X86_EFLAGS_FIXED during ioctl.
Suggested-by: Jim Mattson <jmattson(a)google.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Quan Xu <quan.xu0(a)gmail.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: Jim Mattson <jmattson(a)google.com>
Signed-off-by: Wanpeng Li <wanpeng.li(a)hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kvm/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7359,7 +7359,7 @@ int kvm_arch_vcpu_ioctl_set_regs(struct
#endif
kvm_rip_write(vcpu, regs->rip);
- kvm_set_rflags(vcpu, regs->rflags);
+ kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
vcpu->arch.exception.pending = false;
Patches currently in stable-queue which might be from wanpeng.li(a)hotmail.com are
queue-4.14/kvm-mmu-fix-infinite-loop-when-there-is-no-available-mmu-page.patch
queue-4.14/kvm-x86-fix-load-rflags-w-o-the-fixed-bit.patch
This is a note to let you know that I've just added the patch titled
KVM: PPC: Book3S HV: Fix pending_pri value in kvmppc_xive_get_icp()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-ppc-book3s-hv-fix-pending_pri-value-in-kvmppc_xive_get_icp.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 7333b5aca412d6ad02667b5a513485838a91b136 Mon Sep 17 00:00:00 2001
From: Laurent Vivier <lvivier(a)redhat.com>
Date: Tue, 12 Dec 2017 18:23:56 +0100
Subject: KVM: PPC: Book3S HV: Fix pending_pri value in kvmppc_xive_get_icp()
From: Laurent Vivier <lvivier(a)redhat.com>
commit 7333b5aca412d6ad02667b5a513485838a91b136 upstream.
When we migrate a VM from a POWER8 host (XICS) to a POWER9 host
(XICS-on-XIVE), we have an error:
qemu-kvm: Unable to restore KVM interrupt controller state \
(0xff000000) for CPU 0: Invalid argument
This is because kvmppc_xics_set_icp() checks the new state
is internaly consistent, and especially:
...
1129 if (xisr == 0) {
1130 if (pending_pri != 0xff)
1131 return -EINVAL;
...
On the other side, kvmppc_xive_get_icp() doesn't set
neither the pending_pri value, nor the xisr value (set to 0)
(and kvmppc_xive_set_icp() ignores the pending_pri value)
As xisr is 0, pending_pri must be set to 0xff.
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Signed-off-by: Laurent Vivier <lvivier(a)redhat.com>
Acked-by: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/powerpc/kvm/book3s_xive.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -725,7 +725,8 @@ u64 kvmppc_xive_get_icp(struct kvm_vcpu
/* Return the per-cpu state for state saving/migration */
return (u64)xc->cppr << KVM_REG_PPC_ICP_CPPR_SHIFT |
- (u64)xc->mfrr << KVM_REG_PPC_ICP_MFRR_SHIFT;
+ (u64)xc->mfrr << KVM_REG_PPC_ICP_MFRR_SHIFT |
+ (u64)0xff << KVM_REG_PPC_ICP_PPRI_SHIFT;
}
int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval)
Patches currently in stable-queue which might be from lvivier(a)redhat.com are
queue-4.14/kvm-ppc-book3s-hv-fix-pending_pri-value-in-kvmppc_xive_get_icp.patch
queue-4.14/kvm-ppc-book3s-fix-xive-migration-of-pending-interrupts.patch
This is a note to let you know that I've just added the patch titled
KVM: MMU: Fix infinite loop when there is no available mmu page
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-mmu-fix-infinite-loop-when-there-is-no-available-mmu-page.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ed52870f4676489124d8697fd00e6ae6c504e586 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpeng.li(a)hotmail.com>
Date: Mon, 4 Dec 2017 22:21:30 -0800
Subject: KVM: MMU: Fix infinite loop when there is no available mmu page
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Wanpeng Li <wanpeng.li(a)hotmail.com>
commit ed52870f4676489124d8697fd00e6ae6c504e586 upstream.
The below test case can cause infinite loop in kvm when ept=0.
#include <unistd.h>
#include <sys/syscall.h>
#include <string.h>
#include <stdint.h>
#include <linux/kvm.h>
#include <fcntl.h>
#include <sys/ioctl.h>
long r[5];
int main()
{
r[2] = open("/dev/kvm", O_RDONLY);
r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
ioctl(r[4], KVM_RUN, 0);
}
It doesn't setup the memory regions, mmu_alloc_shadow/direct_roots() in
kvm return 1 when kvm fails to allocate root page table which can result
in beblow infinite loop:
vcpu_run() {
for (;;) {
r = vcpu_enter_guest()::kvm_mmu_reload() returns 1
if (r <= 0)
break;
if (need_resched())
cond_resched();
}
}
This patch fixes it by returning -ENOSPC when there is no available kvm mmu
page for root page table.
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Fixes: 26eeb53cf0f (KVM: MMU: Bail out immediately if there is no available mmu page)
Signed-off-by: Wanpeng Li <wanpeng.li(a)hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kvm/mmu.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3382,7 +3382,7 @@ static int mmu_alloc_direct_roots(struct
spin_lock(&vcpu->kvm->mmu_lock);
if(make_mmu_pages_available(vcpu) < 0) {
spin_unlock(&vcpu->kvm->mmu_lock);
- return 1;
+ return -ENOSPC;
}
sp = kvm_mmu_get_page(vcpu, 0, 0,
vcpu->arch.mmu.shadow_root_level, 1, ACC_ALL);
@@ -3397,7 +3397,7 @@ static int mmu_alloc_direct_roots(struct
spin_lock(&vcpu->kvm->mmu_lock);
if (make_mmu_pages_available(vcpu) < 0) {
spin_unlock(&vcpu->kvm->mmu_lock);
- return 1;
+ return -ENOSPC;
}
sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
@@ -3437,7 +3437,7 @@ static int mmu_alloc_shadow_roots(struct
spin_lock(&vcpu->kvm->mmu_lock);
if (make_mmu_pages_available(vcpu) < 0) {
spin_unlock(&vcpu->kvm->mmu_lock);
- return 1;
+ return -ENOSPC;
}
sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
vcpu->arch.mmu.shadow_root_level, 0, ACC_ALL);
@@ -3474,7 +3474,7 @@ static int mmu_alloc_shadow_roots(struct
spin_lock(&vcpu->kvm->mmu_lock);
if (make_mmu_pages_available(vcpu) < 0) {
spin_unlock(&vcpu->kvm->mmu_lock);
- return 1;
+ return -ENOSPC;
}
sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL,
0, ACC_ALL);
Patches currently in stable-queue which might be from wanpeng.li(a)hotmail.com are
queue-4.14/kvm-mmu-fix-infinite-loop-when-there-is-no-available-mmu-page.patch
queue-4.14/kvm-x86-fix-load-rflags-w-o-the-fixed-bit.patch
This is a note to let you know that I've just added the patch titled
KVM: PPC: Book3S: fix XIVE migration of pending interrupts
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-ppc-book3s-fix-xive-migration-of-pending-interrupts.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From dc1c4165d189350cb51bdd3057deb6ecd164beda Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg(a)kaod.org>
Date: Tue, 12 Dec 2017 12:02:04 +0000
Subject: KVM: PPC: Book3S: fix XIVE migration of pending interrupts
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Cédric Le Goater <clg(a)kaod.org>
commit dc1c4165d189350cb51bdd3057deb6ecd164beda upstream.
When restoring a pending interrupt, we are setting the Q bit to force
a retrigger in xive_finish_unmask(). But we also need to force an EOI
in this case to reach the same initial state : P=1, Q=0.
This can be done by not setting 'old_p' for pending interrupts which
will inform xive_finish_unmask() that an EOI needs to be sent.
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Suggested-by: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg(a)kaod.org>
Reviewed-by: Laurent Vivier <lvivier(a)redhat.com>
Tested-by: Laurent Vivier <lvivier(a)redhat.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/powerpc/kvm/book3s_xive.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1558,7 +1558,7 @@ static int xive_set_source(struct kvmppc
/*
* Restore P and Q. If the interrupt was pending, we
- * force both P and Q, which will trigger a resend.
+ * force Q and !P, which will trigger a resend.
*
* That means that a guest that had both an interrupt
* pending (queued) and Q set will restore with only
@@ -1566,7 +1566,7 @@ static int xive_set_source(struct kvmppc
* is perfectly fine as coalescing interrupts that haven't
* been presented yet is always allowed.
*/
- if (val & KVM_XICS_PRESENTED || val & KVM_XICS_PENDING)
+ if (val & KVM_XICS_PRESENTED && !(val & KVM_XICS_PENDING))
state->old_p = true;
if (val & KVM_XICS_QUEUED || val & KVM_XICS_PENDING)
state->old_q = true;
Patches currently in stable-queue which might be from clg(a)kaod.org are
queue-4.14/kvm-ppc-book3s-fix-xive-migration-of-pending-interrupts.patch
This is a note to let you know that I've just added the patch titled
KVM: arm/arm64: Fix HYP unmapping going off limits
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kvm-arm-arm64-fix-hyp-unmapping-going-off-limits.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 7839c672e58bf62da8f2f0197fefb442c02ba1dd Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier(a)arm.com>
Date: Thu, 7 Dec 2017 11:45:45 +0000
Subject: KVM: arm/arm64: Fix HYP unmapping going off limits
From: Marc Zyngier <marc.zyngier(a)arm.com>
commit 7839c672e58bf62da8f2f0197fefb442c02ba1dd upstream.
When we unmap the HYP memory, we try to be clever and unmap one
PGD at a time. If we start with a non-PGD aligned address and try
to unmap a whole PGD, things go horribly wrong in unmap_hyp_range
(addr and end can never match, and it all goes really badly as we
keep incrementing pgd and parse random memory as page tables...).
The obvious fix is to let unmap_hyp_range do what it does best,
which is to iterate over a range.
The size of the linear mapping, which begins at PAGE_OFFSET, can be
easily calculated by subtracting PAGE_OFFSET form high_memory, because
high_memory is defined as the linear map address of the last byte of
DRAM, plus one.
The size of the vmalloc region is given trivially by VMALLOC_END -
VMALLOC_START.
Reported-by: Andre Przywara <andre.przywara(a)arm.com>
Tested-by: Andre Przywara <andre.przywara(a)arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall(a)linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier(a)arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall(a)linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
virt/kvm/arm/mmu.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -509,8 +509,6 @@ static void unmap_hyp_range(pgd_t *pgdp,
*/
void free_hyp_pgds(void)
{
- unsigned long addr;
-
mutex_lock(&kvm_hyp_pgd_mutex);
if (boot_hyp_pgd) {
@@ -521,10 +519,10 @@ void free_hyp_pgds(void)
if (hyp_pgd) {
unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
- for (addr = PAGE_OFFSET; virt_addr_valid(addr); addr += PGDIR_SIZE)
- unmap_hyp_range(hyp_pgd, kern_hyp_va(addr), PGDIR_SIZE);
- for (addr = VMALLOC_START; is_vmalloc_addr((void*)addr); addr += PGDIR_SIZE)
- unmap_hyp_range(hyp_pgd, kern_hyp_va(addr), PGDIR_SIZE);
+ unmap_hyp_range(hyp_pgd, kern_hyp_va(PAGE_OFFSET),
+ (uintptr_t)high_memory - PAGE_OFFSET);
+ unmap_hyp_range(hyp_pgd, kern_hyp_va(VMALLOC_START),
+ VMALLOC_END - VMALLOC_START);
free_pages((unsigned long)hyp_pgd, hyp_pgd_order);
hyp_pgd = NULL;
Patches currently in stable-queue which might be from marc.zyngier(a)arm.com are
queue-4.14/arm64-kvm-prevent-restoring-stale-pmscr_el1-for-vcpu.patch
queue-4.14/kvm-arm-arm64-fix-hyp-unmapping-going-off-limits.patch
This is a note to let you know that I've just added the patch titled
init: Invoke init_espfix_bsp() from mm_init()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
init-invoke-init_espfix_bsp-from-mm_init.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 613e396bc0d4c7604fba23256644e78454c68cf6 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Sun, 17 Dec 2017 10:56:29 +0100
Subject: init: Invoke init_espfix_bsp() from mm_init()
From: Thomas Gleixner <tglx(a)linutronix.de>
commit 613e396bc0d4c7604fba23256644e78454c68cf6 upstream.
init_espfix_bsp() needs to be invoked before the page table isolation
initialization. Move it into mm_init() which is the place where pti_init()
will be added.
While at it get rid of the #ifdeffery and provide proper stub functions.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/espfix.h | 7 ++++---
arch/x86/kernel/smpboot.c | 6 +-----
include/asm-generic/pgtable.h | 5 +++++
init/main.c | 6 ++----
4 files changed, 12 insertions(+), 12 deletions(-)
--- a/arch/x86/include/asm/espfix.h
+++ b/arch/x86/include/asm/espfix.h
@@ -2,7 +2,7 @@
#ifndef _ASM_X86_ESPFIX_H
#define _ASM_X86_ESPFIX_H
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_X86_ESPFIX64
#include <asm/percpu.h>
@@ -11,7 +11,8 @@ DECLARE_PER_CPU_READ_MOSTLY(unsigned lon
extern void init_espfix_bsp(void);
extern void init_espfix_ap(int cpu);
-
-#endif /* CONFIG_X86_64 */
+#else
+static inline void init_espfix_ap(int cpu) { }
+#endif
#endif /* _ASM_X86_ESPFIX_H */
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -990,12 +990,8 @@ static int do_boot_cpu(int apicid, int c
initial_code = (unsigned long)start_secondary;
initial_stack = idle->thread.sp;
- /*
- * Enable the espfix hack for this CPU
- */
-#ifdef CONFIG_X86_ESPFIX64
+ /* Enable the espfix hack for this CPU */
init_espfix_ap(cpu);
-#endif
/* So we see what's up */
announce_cpu(cpu, apicid);
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1025,6 +1025,11 @@ static inline int pmd_clear_huge(pmd_t *
struct file;
int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t *vma_prot);
+
+#ifndef CONFIG_X86_ESPFIX64
+static inline void init_espfix_bsp(void) { }
+#endif
+
#endif /* !__ASSEMBLY__ */
#ifndef io_remap_pfn_range
--- a/init/main.c
+++ b/init/main.c
@@ -504,6 +504,8 @@ static void __init mm_init(void)
pgtable_init();
vmalloc_init();
ioremap_huge_init();
+ /* Should be run before the first non-init thread is created */
+ init_espfix_bsp();
}
asmlinkage __visible void __init start_kernel(void)
@@ -674,10 +676,6 @@ asmlinkage __visible void __init start_k
if (efi_enabled(EFI_RUNTIME_SERVICES))
efi_enter_virtual_mode();
#endif
-#ifdef CONFIG_X86_ESPFIX64
- /* Should be run before the first non-init thread is created */
- init_espfix_bsp();
-#endif
thread_stack_cache_init();
cred_init();
fork_init();
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-decoder-fix-and-update-the-opcodes-map.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-cpu_entry_area-prevent-wraparound-in-setup_cpu_entry_area_ptes-on-32bit.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/pci-pm-force-devices-to-d0-in-pci_pm_thaw_noirq.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/tools-headers-sync-objtool-uapi-header.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/objtool-fix-64-bit-build-on-32-bit-host.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/objtool-move-synced-files-to-their-original-relative-locations.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/objtool-move-kernel-headers-code-sync-check-to-a-script.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
queue-4.14/objtool-fix-cross-build.patch
This is a note to let you know that I've just added the patch titled
drm/sun4i: Fix error path handling
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
drm-sun4i-fix-error-path-handling.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 92411f6d7f1afcc95e54295d40e96a75385212ec Mon Sep 17 00:00:00 2001
From: Maxime Ripard <maxime.ripard(a)free-electrons.com>
Date: Thu, 7 Dec 2017 16:58:50 +0100
Subject: drm/sun4i: Fix error path handling
From: Maxime Ripard <maxime.ripard(a)free-electrons.com>
commit 92411f6d7f1afcc95e54295d40e96a75385212ec upstream.
The commit 4c7f16d14a33 ("drm/sun4i: Fix TCON clock and regmap
initialization sequence") moved a bunch of logic around, but forgot to
update the gotos after the introduction of the err_free_dotclock label.
It means that if we fail later that the one introduced in that commit,
we'll just to the old label which isn't free the clock we created. This
will result in a breakage as soon as someone tries to do something with
that clock, since its resources will have been long reclaimed.
Fixes: 4c7f16d14a33 ("drm/sun4i: Fix TCON clock and regmap initialization sequence")
Reviewed-by: Chen-Yu Tsai <wens(a)csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard(a)free-electrons.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f83c1cebc731f0b4251f5ddd7b38c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/gpu/drm/sun4i/sun4i_tcon.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/sun4i/sun4i_tcon.c
+++ b/drivers/gpu/drm/sun4i/sun4i_tcon.c
@@ -567,12 +567,12 @@ static int sun4i_tcon_bind(struct device
if (IS_ERR(tcon->crtc)) {
dev_err(dev, "Couldn't create our CRTC\n");
ret = PTR_ERR(tcon->crtc);
- goto err_free_clocks;
+ goto err_free_dotclock;
}
ret = sun4i_rgb_init(drm, tcon);
if (ret < 0)
- goto err_free_clocks;
+ goto err_free_dotclock;
list_add_tail(&tcon->list, &drv->tcon_list);
Patches currently in stable-queue which might be from maxime.ripard(a)free-electrons.com are
queue-4.14/drm-sun4i-fix-error-path-handling.patch
queue-4.14/clk-sunxi-sun9i-mmc-implement-reset-callback-for-reset-controls.patch
This is a note to let you know that I've just added the patch titled
crypto: skcipher - set walk.iv for zero-length inputs
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
crypto-skcipher-set-walk.iv-for-zero-length-inputs.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 2b4f27c36bcd46e820ddb9a8e6fe6a63fa4250b8 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Wed, 29 Nov 2017 01:18:57 -0800
Subject: crypto: skcipher - set walk.iv for zero-length inputs
From: Eric Biggers <ebiggers(a)google.com>
commit 2b4f27c36bcd46e820ddb9a8e6fe6a63fa4250b8 upstream.
All the ChaCha20 algorithms as well as the ARM bit-sliced AES-XTS
algorithms call skcipher_walk_virt(), then access the IV (walk.iv)
before checking whether any bytes need to be processed (walk.nbytes).
But if the input is empty, then skcipher_walk_virt() doesn't set the IV,
and the algorithms crash trying to use the uninitialized IV pointer.
Fix it by setting the IV earlier in skcipher_walk_virt(). Also fix it
for the AEAD walk functions.
This isn't a perfect solution because we can't actually align the IV to
->cra_alignmask unless there are bytes to process, for one because the
temporary buffer for the aligned IV is freed by skcipher_walk_done(),
which is only called when there are bytes to process. Thus, algorithms
that require aligned IVs will still need to avoid accessing the IV when
walk.nbytes == 0. Still, many algorithms/architectures are fine with
IVs having any alignment, and even for those that aren't, a misaligned
pointer bug is much less severe than an uninitialized pointer bug.
This change also matches the behavior of the older blkcipher_walk API.
Fixes: 0cabf2af6f5a ("crypto: skcipher - Fix crash on zero-length input")
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
crypto/skcipher.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -449,6 +449,8 @@ static int skcipher_walk_skcipher(struct
walk->total = req->cryptlen;
walk->nbytes = 0;
+ walk->iv = req->iv;
+ walk->oiv = req->iv;
if (unlikely(!walk->total))
return 0;
@@ -456,9 +458,6 @@ static int skcipher_walk_skcipher(struct
scatterwalk_start(&walk->in, req->src);
scatterwalk_start(&walk->out, req->dst);
- walk->iv = req->iv;
- walk->oiv = req->iv;
-
walk->flags &= ~SKCIPHER_WALK_SLEEP;
walk->flags |= req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
SKCIPHER_WALK_SLEEP : 0;
@@ -510,6 +509,8 @@ static int skcipher_walk_aead_common(str
int err;
walk->nbytes = 0;
+ walk->iv = req->iv;
+ walk->oiv = req->iv;
if (unlikely(!walk->total))
return 0;
@@ -525,9 +526,6 @@ static int skcipher_walk_aead_common(str
scatterwalk_done(&walk->in, 0, walk->total);
scatterwalk_done(&walk->out, 0, walk->total);
- walk->iv = req->iv;
- walk->oiv = req->iv;
-
if (req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP)
walk->flags |= SKCIPHER_WALK_SLEEP;
else
Patches currently in stable-queue which might be from ebiggers(a)google.com are
queue-4.14/crypto-skcipher-set-walk.iv-for-zero-length-inputs.patch
This is a note to let you know that I've just added the patch titled
drm/i915: Flush pending GTT writes before unbinding
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
drm-i915-flush-pending-gtt-writes-before-unbinding.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 2797c4a11f373b2545c2398ccb02e362ee66a142 Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris(a)chris-wilson.co.uk>
Date: Mon, 4 Dec 2017 13:25:13 +0000
Subject: drm/i915: Flush pending GTT writes before unbinding
From: Chris Wilson <chris(a)chris-wilson.co.uk>
commit 2797c4a11f373b2545c2398ccb02e362ee66a142 upstream.
>From the shrinker paths, we want to relinquish the GPU and GGTT access to
the object, releasing the backing storage back to the system for
swapout. As a part of that process we would unpin the pages, marking
them for access by the CPU (for the swapout/swapin). However, if that
process was interrupted after unbind the vma, we missed a flush of the
inflight GGTT writes before we made that GTT space available again for
reuse, with the prospect that we would redirect them to another page.
The bug dates back to the introduction of multiple GGTT vma, but the
code itself dates to commit 02bef8f98d26 ("drm/i915: Unbind closed vma
for i915_gem_object_unbind()").
Fixes: 02bef8f98d26 ("drm/i915: Unbind closed vma for i915_gem_object_unbind()")
Fixes: c5ad54cf7dd8 ("drm/i915: Use partial view in mmap fault handler")
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171204132513.7303-1-chris@c…
(cherry picked from commit 5888fc9eac3c2ff96e76aeeb865fdb46ab2d711e)
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/gpu/drm/i915/i915_gem.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -325,17 +325,10 @@ int i915_gem_object_unbind(struct drm_i9
* must wait for all rendering to complete to the object (as unbinding
* must anyway), and retire the requests.
*/
- ret = i915_gem_object_wait(obj,
- I915_WAIT_INTERRUPTIBLE |
- I915_WAIT_LOCKED |
- I915_WAIT_ALL,
- MAX_SCHEDULE_TIMEOUT,
- NULL);
+ ret = i915_gem_object_set_to_cpu_domain(obj, false);
if (ret)
return ret;
- i915_gem_retire_requests(to_i915(obj->base.dev));
-
while ((vma = list_first_entry_or_null(&obj->vma_list,
struct i915_vma,
obj_link))) {
Patches currently in stable-queue which might be from chris(a)chris-wilson.co.uk are
queue-4.14/drm-i915-flush-pending-gtt-writes-before-unbinding.patch