This is the start of the stable review cycle for the 4.9.73 release.
There are 21 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri Dec 29 16:45:43 UTC 2017.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.73-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.73-rc1
Yelena Krivosheev <yelena(a)marvell.com>
net: mvneta: eliminate wrong call to handle rx descriptor error
Yelena Krivosheev <yelena(a)marvell.com>
net: mvneta: use proper rxq_number in loop on rx queues
Yelena Krivosheev <yelena(a)marvell.com>
net: mvneta: clear interface link status on port disable
Dan Williams <dan.j.williams(a)intel.com>
libnvdimm, pfn: fix start_pad handling for aligned namespaces
Ravi Bangoria <ravi.bangoria(a)linux.vnet.ibm.com>
powerpc/perf: Dereference BHRB entries safely
Chen-Yu Tsai <wens(a)csie.org>
clk: sunxi: sun9i-mmc: Implement reset callback for reset controls
Paolo Bonzini <pbonzini(a)redhat.com>
kvm: x86: fix RSM when PCID is non-zero
Wanpeng Li <wanpeng.li(a)hotmail.com>
KVM: X86: Fix load RFLAGS w/o the fixed bit
Mika Westerberg <mika.westerberg(a)linux.intel.com>
pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
Ricardo Ribalda Delgado <ricardo.ribalda(a)gmail.com>
spi: xilinx: Detect stall with Unknown commands
Helge Deller <deller(a)gmx.de>
parisc: Hide Diva-built-in serial aux and graphics card
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU
Jussi Laako <jussi(a)sonarnerd.net>
ALSA: usb-audio: Add native DSD support for Esoteric D-05X
Takashi Iwai <tiwai(a)suse.de>
ALSA: rawmidi: Avoid racy info ioctl via ctl device
Johan Hovold <johan(a)kernel.org>
mfd: twl6040: Fix child-node lookup
Johan Hovold <johan(a)kernel.org>
mfd: twl4030-audio: Fix sibling-node lookup
Jon Hunter <jonathanh(a)nvidia.com>
mfd: cros ec: spi: Don't send first message too soon
Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
crypto: mcryptd - protect the per-CPU queue with a lock
Dan Williams <dan.j.williams(a)intel.com>
acpi, nfit: fix health event notification
Takashi Iwai <tiwai(a)suse.de>
ACPI: APEI / ERST: Fix missing error handling in erst_reader()
-------------
Diffstat:
Makefile | 4 ++--
arch/powerpc/perf/core-book3s.c | 8 ++++++--
arch/x86/kvm/emulate.c | 32 ++++++++++++++++++++++-------
arch/x86/kvm/x86.c | 2 +-
crypto/mcryptd.c | 23 +++++++++------------
drivers/acpi/apei/erst.c | 2 +-
drivers/acpi/nfit/core.c | 9 +++++++-
drivers/clk/sunxi/clk-sun9i-mmc.c | 12 +++++++++++
drivers/mfd/cros_ec_spi.c | 1 +
drivers/mfd/twl4030-audio.c | 9 ++++++--
drivers/mfd/twl6040.c | 12 +++++++----
drivers/net/ethernet/marvell/mvneta.c | 8 ++++++--
drivers/nvdimm/pfn_devs.c | 5 +++--
drivers/parisc/lba_pci.c | 33 ++++++++++++++++++++++++++++++
drivers/pci/pci-driver.c | 7 ++++++-
drivers/pinctrl/intel/pinctrl-cherryview.c | 16 +++++++++++++++
drivers/spi/spi-xilinx.c | 11 ++++++++++
include/crypto/mcryptd.h | 1 +
sound/core/rawmidi.c | 15 +++++++++++---
sound/usb/mixer.c | 27 ++++++++++++++----------
sound/usb/quirks.c | 7 ++++---
21 files changed, 189 insertions(+), 55 deletions(-)
Just a heads-up to avoid more people lose hours in debugging:
After upgrading from 4.9 to 4.14 I noticed USB support was broken on
my Allwinner A20 boards like Cubietruck and first generation BananaPi.
There is simply no output of the lsusb command, and subsequently no
connected USB devices are detected.
Bisecting led to
commit 6254a6a94489c4be717f757bec7d3a372cba1b6e
Author: Arnd Bergmann <arnd(a)arndb.de>
Date: Thu Apr 27 21:11:48 2017 +0200
power: supply: axp20x_usb_power: add IIO dependency
which already happened in the 4.12 development. Turns out after that
commit "make oldconfig" will silently drop CONFIG_AXP20X_POWER unless
CONFIG_IIO is set - but CONFIG_AXP20X_POWER is needed for USB on these
boards.
Solution: Enable CONFIG_IIO, but no particular driver is required. I'm
not aware whether it's possible to provide a smoother upgrade path in
Kconfig, "selects" instead of "depends" might have been an option.
Christoph, two more 4.14 regressions to go
The patch titled
Subject: userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails
has been added to the -mm tree. Its filename is
userfaultfd-clear-the-vma-vm_userfaultfd_ctx-if-uffd_event_fork-fails.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/userfaultfd-clear-the-vma-vm_userf…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/userfaultfd-clear-the-vma-vm_userf…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Andrea Arcangeli <aarcange(a)redhat.com>
Subject: userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails
The previous fix 384632e67e0829 ("userfaultfd: non-cooperative: fix fork
use after free") corrected the refcounting in case of UFFD_EVENT_FORK
failure for the fork userfault paths. That still didn't clear the
vma->vm_userfaultfd_ctx of the vmas that were set to point to the aborted
new uffd ctx earlier in dup_userfaultfd.
Link: http://lkml.kernel.org/r/20171223002505.593-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Eric Biggers <ebiggers3(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff -puN fs/userfaultfd.c~userfaultfd-clear-the-vma-vm_userfaultfd_ctx-if-uffd_event_fork-fails fs/userfaultfd.c
--- a/fs/userfaultfd.c~userfaultfd-clear-the-vma-vm_userfaultfd_ctx-if-uffd_event_fork-fails
+++ a/fs/userfaultfd.c
@@ -570,11 +570,14 @@ out:
static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
struct userfaultfd_wait_queue *ewq)
{
+ struct userfaultfd_ctx *release_new_ctx;
+
if (WARN_ON_ONCE(current->flags & PF_EXITING))
goto out;
ewq->ctx = ctx;
init_waitqueue_entry(&ewq->wq, current);
+ release_new_ctx = NULL;
spin_lock(&ctx->event_wqh.lock);
/*
@@ -601,8 +604,7 @@ static void userfaultfd_event_wait_compl
new = (struct userfaultfd_ctx *)
(unsigned long)
ewq->msg.arg.reserved.reserved1;
-
- userfaultfd_ctx_put(new);
+ release_new_ctx = new;
}
break;
}
@@ -617,6 +619,20 @@ static void userfaultfd_event_wait_compl
__set_current_state(TASK_RUNNING);
spin_unlock(&ctx->event_wqh.lock);
+ if (release_new_ctx) {
+ struct vm_area_struct *vma;
+ struct mm_struct *mm = release_new_ctx->mm;
+
+ /* the various vma->vm_userfaultfd_ctx still points to it */
+ down_write(&mm->mmap_sem);
+ for (vma = mm->mmap; vma; vma = vma->vm_next)
+ if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx)
+ vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
+ up_write(&mm->mmap_sem);
+
+ userfaultfd_ctx_put(release_new_ctx);
+ }
+
/*
* ctx may go away after this if the userfault pseudo fd is
* already released.
_
Patches currently in -mm which might be from aarcange(a)redhat.com are
userfaultfd-clear-the-vma-vm_userfaultfd_ctx-if-uffd_event_fork-fails.patch
Since f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size")
kasan_mem_to_shadow(MODULES_END) could be not aligned to a page boundary.
So passing page unaligned address to kasan_populate_zero_shadow() have two
possible effects:
1) It may leave one page hole in supposed to be populated area. After commit
21506525fb8d ("x86/kasan/64: Teach KASAN about the cpu_entry_area") that
hole happens to be in the shadow covering fixmap area and leads to crash:
BUG: unable to handle kernel paging request at fffffbffffe8ee04
RIP: 0010:check_memory_region+0x5c/0x190
Call Trace:
<NMI>
memcpy+0x1f/0x50
ghes_copy_tofrom_phys+0xab/0x180
ghes_read_estatus+0xfb/0x280
ghes_notify_nmi+0x2b2/0x410
nmi_handle+0x115/0x2c0
default_do_nmi+0x57/0x110
do_nmi+0xf8/0x150
end_repeat_nmi+0x1a/0x1e
Note, the crash likely disappeared after commit 92a0f81d8957, which
changed kasan_populate_zero_shadow() call the way it was before
commit 21506525fb8d.
2) Attempt to load module near MODULES_END will fail, because
__vmalloc_node_range() called from kasan_module_alloc() will hit the
WARN_ON(!pte_none(*pte)) in the vmap_pte_range() and bail out with error.
To fix this we need to make kasan_mem_to_shadow(MODULES_END) page aligned
which means that MODULES_END should be 8*PAGE_SIZE aligned.
The whole point of commit f06bdd4001c2 was to move MODULES_END down if
NR_CPUS is big, so the cpu_entry_area takes a lot of space.
But since 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
the cpu_entry_area is no longer in fixmap, so we could just set
MODULES_END to a fixed 8*PAGE_SIZE aligned address.
Reported-by: Jakub Kicinski <kubakici(a)wp.pl>
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org> # 4.14
---
Documentation/x86/x86_64/mm.txt | 5 +----
arch/x86/include/asm/pgtable_64_types.h | 2 +-
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 51101708a03a..60a0fa2bc01f 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -42,7 +42,7 @@ ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - [fixmap start] (~1526 MB) module mapping space
+ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
@@ -66,9 +66,6 @@ memory window (this size is arbitrary, it can be raised later if needed).
The mappings are not part of any other kernel PGD and are only available
during EFI runtime calls.
-The module mapping space size changes based on the CONFIG requirements for the
-following fixmap section.
-
Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
physical memory, vmalloc/ioremap space and virtual memory map are randomized.
Their order is preserved but their base will be offset early at boot time.
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 3d27831bc58d..ed2da4e7f259 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -100,7 +100,7 @@ typedef struct { pteval_t pte; } pte_t;
#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
/* The module sections ends with the start of the fixmap */
-#define MODULES_END __fix_to_virt(__end_of_fixed_addresses + 1)
+#define MODULES_END _AC(0xffffffffff000000, UL)
#define MODULES_LEN (MODULES_END - MODULES_VADDR)
#define ESPFIX_PGD_ENTRY _AC(-2, UL)
--
2.13.6
On Tue, Dec 26, 2017 at 1:11 AM, 王元良 <yuanliang.wyl(a)alibaba-inc.com> wrote:
> To Andy and josh:
>
> Forgive me for not expressing clearly, The context is that we encountered
> hundreds of this panic in stable kernel, and hope this change can be merged
> into the stable branch linux-3.16.y
>
> Regards,
> Yuanliang Wang
>
You could try to backport the upstream fix to 3.16. Josh or I could review it.
--Andy
This is a note to let you know that I've just added the patch titled
bpf/verifier: Fix states_equal() comparison of pointer and UNKNOWN
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
bpf-verifier-fix-states_equal-comparison-of-pointer-and-unknown.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From ben(a)decadent.org.uk Wed Dec 27 21:04:06 2017
From: Ben Hutchings <ben(a)decadent.org.uk>
Date: Sat, 23 Dec 2017 02:26:17 +0000
Subject: bpf/verifier: Fix states_equal() comparison of pointer and UNKNOWN
To: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: stable(a)vger.kernel.org, netdev(a)vger.kernel.org, Edward Cree <ecree(a)solarflare.com>, Jann Horn <jannh(a)google.com>, Alexei Starovoitov <ast(a)kernel.org>
Message-ID: <20171223022617.GO2971(a)decadent.org.uk>
Content-Disposition: inline
From: Ben Hutchings <ben(a)decadent.org.uk>
An UNKNOWN_VALUE is not supposed to be derived from a pointer, unless
pointer leaks are allowed. Therefore, states_equal() must not treat
a state with a pointer in a register as "equal" to a state with an
UNKNOWN_VALUE in that register.
This was fixed differently upstream, but the code around here was
largely rewritten in 4.14 by commit f1174f77b50c "bpf/verifier: rework
value tracking". The bug can be detected by the bpf/verifier sub-test
"pointer/scalar confusion in state equality check (way 1)".
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
Cc: Edward Cree <ecree(a)solarflare.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: Daniel Borkmann <daniel(a)iogearbox.net>
---
kernel/bpf/verifier.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2722,11 +2722,12 @@ static bool states_equal(struct bpf_veri
/* If we didn't map access then again we don't care about the
* mismatched range values and it's ok if our old type was
- * UNKNOWN and we didn't go to a NOT_INIT'ed reg.
+ * UNKNOWN and we didn't go to a NOT_INIT'ed or pointer reg.
*/
if (rold->type == NOT_INIT ||
(!varlen_map_access && rold->type == UNKNOWN_VALUE &&
- rcur->type != NOT_INIT))
+ rcur->type != NOT_INIT &&
+ !__is_pointer_value(env->allow_ptr_leaks, rcur)))
continue;
/* Don't care about the reg->id in this case. */
Patches currently in stable-queue which might be from ben(a)decadent.org.uk are
queue-4.9/bpf-verifier-fix-states_equal-comparison-of-pointer-and-unknown.patch
Here is my lspci output of ThunderX2 for which I am observing kernel panic coming from
SMMUv3 driver -> arm_smmu_write_strtab_ent() -> BUG_ON(ste_live):
# lspci -vt
-[0000:00]-+-00.0-[01-1f]--+ [...]
+ [...]
\-00.0-[1e-1f]----00.0-[1f]----00.0 ASPEED Technology, Inc. ASPEED Graphics Family
ASP device -> 1f:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family
PCI-Express to PCI/PCI-X Bridge -> 1e:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge
While setting up ASP device SID in IORT dirver:
iort_iommu_configure() -> pci_for_each_dma_alias()
we need to walk up and iterate over each device which alias transaction from
downstream devices.
AST device (1f:00.0) gets BDF=0x1f00 and corresponding SID=0x1f00 from IORT.
Bridge (1e:00.0) is the first alias. Following PCI Express to PCI/PCI-X Bridge
spec: PCIe-to-PCI/X bridges alias transactions from downstream devices using
the subordinate bus number. For bridge (1e:00.0), the subordinate is equal
to 0x1f. This gives BDF=0x1f00 and SID=1f00 which is the same as downstream
device. So it is possible to have two identical SIDs. The question is what we
should do about such case. Presented patch prevents from registering the same
ID so that SMMUv3 is not complaining later on.
Tomasz Nowicki (1):
iommu: Make sure device's ID array elements are unique
drivers/iommu/iommu.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
--
2.7.4
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Jing Xia and Chunyan Zhang reported that on failing to allocate part of the
tracing buffer, memory is freed, but the pointers that point to them are not
initialized back to NULL, and later paths may try to free the freed memory
again. Jing and Chunyan fixed one of the locations that does this, but
missed a spot.
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Cc: stable(a)vger.kernel.org
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Reported-by: Jing Xia <jing.xia(a)spreadtrum.com>
Reported-by: Chunyan Zhang <chunyan.zhang(a)spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0e53d46544b8..2a8d8a294345 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7580,6 +7580,7 @@ allocate_trace_buffer(struct trace_array *tr, struct trace_buffer *buf, int size
buf->data = alloc_percpu(struct trace_array_cpu);
if (!buf->data) {
ring_buffer_free(buf->buffer);
+ buf->buffer = NULL;
return -ENOMEM;
}
--
2.13.2
From: Jing Xia <jing.xia(a)spreadtrum.com>
Double free of the ring buffer happens when it fails to alloc new
ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured.
The root cause is that the pointer is not set to NULL after the buffer
is freed in allocate_trace_buffers(), and the freeing of the ring
buffer is invoked again later if the pointer is not equal to Null,
as:
instance_mkdir()
|-allocate_trace_buffers()
|-allocate_trace_buffer(tr, &tr->trace_buffer...)
|-allocate_trace_buffer(tr, &tr->max_buffer...)
// allocate fail(-ENOMEM),first free
// and the buffer pointer is not set to null
|-ring_buffer_free(tr->trace_buffer.buffer)
// out_free_tr
|-free_trace_buffers()
|-free_trace_buffer(&tr->trace_buffer);
//if trace_buffer is not null, free again
|-ring_buffer_free(buf->buffer)
|-rb_free_cpu_buffer(buffer->buffers[cpu])
// ring_buffer_per_cpu is null, and
// crash in ring_buffer_per_cpu->pages
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Cc: stable(a)vger.kernel.org
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Signed-off-by: Jing Xia <jing.xia(a)spreadtrum.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang(a)spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 73652d5318b2..0e53d46544b8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7603,7 +7603,9 @@ static int allocate_trace_buffers(struct trace_array *tr, int size)
allocate_snapshot ? size : 1);
if (WARN_ON(ret)) {
ring_buffer_free(tr->trace_buffer.buffer);
+ tr->trace_buffer.buffer = NULL;
free_percpu(tr->trace_buffer.data);
+ tr->trace_buffer.data = NULL;
return -ENOMEM;
}
tr->allocated_snapshot = allocate_snapshot;
--
2.13.2
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
To free the reader page that is allocated with ring_buffer_alloc_read_page(),
ring_buffer_free_read_page() must be called. For faster performance, this
page can be reused by the ring buffer to avoid having to free and allocate
new pages.
The issue arises when the page is used with a splice pipe into the
networking code. The networking code may up the page counter for the page,
and keep it active while sending it is queued to go to the network. The
incrementing of the page ref does not prevent it from being reused in the
ring buffer, and this can cause the page that is being sent out to the
network to be modified before it is sent by reading new data.
Add a check to the page ref counter, and only reuse the page if it is not
being used anywhere else.
Cc: stable(a)vger.kernel.org
Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index e06cde093f76..9ab18995ff1e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4404,8 +4404,13 @@ void ring_buffer_free_read_page(struct ring_buffer *buffer, int cpu, void *data)
{
struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
struct buffer_data_page *bpage = data;
+ struct page *page = virt_to_page(bpage);
unsigned long flags;
+ /* If the page is still in use someplace else, we can't reuse it */
+ if (page_ref_count(page) > 1)
+ goto out;
+
local_irq_save(flags);
arch_spin_lock(&cpu_buffer->lock);
@@ -4417,6 +4422,7 @@ void ring_buffer_free_read_page(struct ring_buffer *buffer, int cpu, void *data)
arch_spin_unlock(&cpu_buffer->lock);
local_irq_restore(flags);
+ out:
free_page((unsigned long)bpage);
}
EXPORT_SYMBOL_GPL(ring_buffer_free_read_page);
--
2.13.2
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
The ring_buffer_read_page() takes care of zeroing out any extra data in the
page that it returns. There's no need to zero it out again from the
consumer. It was removed from one consumer of this function, but
read_buffers_splice_read() did not remove it, and worse, it contained a
nasty bug because of it.
Cc: stable(a)vger.kernel.org
Fixes: 2711ca237a084 ("ring-buffer: Move zeroing out excess in page to ring buffer code")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 59518b8126d0..73652d5318b2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6769,7 +6769,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
- int entries, size, i;
+ int entries, i;
ssize_t ret = 0;
#ifdef CONFIG_TRACER_MAX_TRACE
@@ -6823,14 +6823,6 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
break;
}
- /*
- * zero out any left over data, this is going to
- * user land.
- */
- size = ring_buffer_page_len(ref->page);
- if (size < PAGE_SIZE)
- memset(ref->page + size, 0, PAGE_SIZE - size);
-
page = virt_to_page(ref->page);
spd.pages[i] = page;
--
2.13.2
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Two info bits were added to the "commit" part of the ring buffer data page
when returned to be consumed. This was to inform the user space readers that
events have been missed, and that the count may be stored at the end of the
page.
What wasn't handled, was the splice code that actually called a function to
return the length of the data in order to zero out the rest of the page
before sending it up to user space. These data bits were returned with the
length making the value negative, and that negative value was not checked.
It was compared to PAGE_SIZE, and only used if the size was less than
PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
unsigned compare, meaning the negative size value did not end up causing a
large portion of memory to be randomly zeroed out.
Cc: stable(a)vger.kernel.org
Fixes: 66a8cb95ed040 ("ring-buffer: Add place holder recording of dropped events")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c87766c1c204..e06cde093f76 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -280,6 +280,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data);
/* Missed count stored at end */
#define RB_MISSED_STORED (1 << 30)
+#define RB_MISSED_FLAGS (RB_MISSED_EVENTS|RB_MISSED_STORED)
+
struct buffer_data_page {
u64 time_stamp; /* page time stamp */
local_t commit; /* write committed index */
@@ -331,7 +333,9 @@ static void rb_init_page(struct buffer_data_page *bpage)
*/
size_t ring_buffer_page_len(void *page)
{
- return local_read(&((struct buffer_data_page *)page)->commit)
+ struct buffer_data_page *bpage = page;
+
+ return (local_read(&bpage->commit) & ~RB_MISSED_FLAGS)
+ BUF_PAGE_HDR_SIZE;
}
--
2.13.2
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5663d8f9bbe4bf15488f7351efb61ea20fa6de06 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx(a)redhat.com>
Date: Tue, 12 Dec 2017 17:15:02 +0100
Subject: [PATCH] kvm: x86: fix WARN due to uninitialized guest FPU state
------------[ cut here ]------------
Bad FPU state detected at kvm_put_guest_fpu+0xd8/0x2d0 [kvm], reinitializing FPU registers.
WARNING: CPU: 1 PID: 4594 at arch/x86/mm/extable.c:103 ex_handler_fprestore+0x88/0x90
CPU: 1 PID: 4594 Comm: qemu-system-x86 Tainted: G B OE 4.15.0-rc2+ #10
RIP: 0010:ex_handler_fprestore+0x88/0x90
Call Trace:
fixup_exception+0x4e/0x60
do_general_protection+0xff/0x270
general_protection+0x22/0x30
RIP: 0010:kvm_put_guest_fpu+0xd8/0x2d0 [kvm]
RSP: 0018:ffff8803d5627810 EFLAGS: 00010246
kvm_vcpu_reset+0x3b4/0x3c0 [kvm]
kvm_apic_accept_events+0x1c0/0x240 [kvm]
kvm_arch_vcpu_ioctl_run+0x1658/0x2fb0 [kvm]
kvm_vcpu_ioctl+0x479/0x880 [kvm]
do_vfs_ioctl+0x142/0x9a0
SyS_ioctl+0x74/0x80
do_syscall_64+0x15f/0x600
where kvm_put_guest_fpu is called without a prior kvm_load_guest_fpu.
To fix it, move kvm_load_guest_fpu to the very beginning of
kvm_arch_vcpu_ioctl_run.
Cc: stable(a)vger.kernel.org
Fixes: f775b13eedee2f7f3c6fdd4e90fb79090ce5d339
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 154ea27746e9..56d036b9ad75 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7264,13 +7264,12 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
- struct fpu *fpu = ¤t->thread.fpu;
int r;
- fpu__initialize(fpu);
-
kvm_sigset_activate(vcpu);
+ kvm_load_guest_fpu(vcpu);
+
if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
if (kvm_run->immediate_exit) {
r = -EINTR;
@@ -7296,14 +7295,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
}
}
- kvm_load_guest_fpu(vcpu);
-
if (unlikely(vcpu->arch.complete_userspace_io)) {
int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
vcpu->arch.complete_userspace_io = NULL;
r = cui(vcpu);
if (r <= 0)
- goto out_fpu;
+ goto out;
} else
WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
@@ -7312,9 +7309,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
else
r = vcpu_run(vcpu);
-out_fpu:
- kvm_put_guest_fpu(vcpu);
out:
+ kvm_put_guest_fpu(vcpu);
post_kvm_run_save(vcpu);
kvm_sigset_deactivate(vcpu);
From: Corey Minyard <cminyard(a)mvista.com>
This reverts commit c97e41076a298dbc4e910c33048e553658388eed.
A backport was requested of c0a32fe13cd32 "ipmi_si: fix memory leak on
new_smi", but the backport shouldn't have been done. This change needs
to be reverted, as it can result in an oops and the previous code is
correct.
Reverts: c97e41076a29 ("ipmi_si: fix memory leak on new_smi")
Link: https://bbs.archlinux.org/viewtopic.php?pid=1757130#p1757130
Reported-by: Neil Romig <neil(a)sixtythree.me.uk>
Cc: Sasha Levin <alexander.levin(a)verizon.com>
Signed-off-by: Corey Minyard <cminyard(a)mvista.com>
---
This is for 4.14 stable tree only. In hindsight, I should scrutinize
stable kernel requests from others in the IPMI tree. Sorry about that.
drivers/char/ipmi/ipmi_si_intf.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index e1cbb78..c04aa11 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -3469,7 +3469,6 @@ static int add_smi(struct smi_info *new_smi)
ipmi_addr_src_to_str(new_smi->addr_source),
si_to_str[new_smi->si_type]);
rv = -EBUSY;
- kfree(new_smi);
goto out_err;
}
}
--
2.7.4
This is a note to let you know that I've just added the patch titled
libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
libnvdimm-dax-fix-1gb-aligned-namespaces-vs-physical-misalignment.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 41fce90f26333c4fa82e8e43b9ace86c4e8a0120 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Mon, 4 Dec 2017 14:07:43 -0800
Subject: libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment
From: Dan Williams <dan.j.williams(a)intel.com>
commit 41fce90f26333c4fa82e8e43b9ace86c4e8a0120 upstream.
The following namespace configuration attempt:
# ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f
libndctl: ndctl_dax_enable: dax0.1: failed to enable
Error: namespace0.0: failed to enable
failed to reconfigure namespace: No such device or address
...fails when the backing memory range is not physically aligned to 1G:
# cat /proc/iomem | grep Persistent
210000000-30fffffff : Persistent Memory (legacy)
In the above example the 4G persistent memory range starts and ends on a
256MB boundary.
We handle this case correctly when needing to handle cases that violate
section alignment (128MB) collisions against "System RAM", and we simply
need to extend that padding/truncation for the 1GB alignment use case.
Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute...")
Reported-and-tested-by: Jane Chu <jane.chu(a)oracle.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/nvdimm/pfn_devs.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -562,6 +562,12 @@ static struct vmem_altmap *__nvdimm_setu
return altmap;
}
+static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys)
+{
+ return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys),
+ ALIGN_DOWN(phys, nd_pfn->align));
+}
+
static int nd_pfn_init(struct nd_pfn *nd_pfn)
{
u32 dax_label_reserve = is_nd_dax(&nd_pfn->dev) ? SZ_128K : 0;
@@ -617,13 +623,16 @@ static int nd_pfn_init(struct nd_pfn *nd
start = nsio->res.start;
size = PHYS_SECTION_ALIGN_UP(start + size) - start;
if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
- IORES_DESC_NONE) == REGION_MIXED) {
+ IORES_DESC_NONE) == REGION_MIXED
+ || !IS_ALIGNED(start + resource_size(&nsio->res),
+ nd_pfn->align)) {
size = resource_size(&nsio->res);
- end_trunc = start + size - PHYS_SECTION_ALIGN_DOWN(start + size);
+ end_trunc = start + size - phys_pmem_align_down(nd_pfn,
+ start + size);
}
if (start_pad + end_trunc)
- dev_info(&nd_pfn->dev, "%s section collision, truncate %d bytes\n",
+ dev_info(&nd_pfn->dev, "%s alignment collision, truncate %d bytes\n",
dev_name(&ndns->dev), start_pad + end_trunc);
/*
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.9/libnvdimm-pfn-fix-start_pad-handling-for-aligned-namespaces.patch
queue-4.9/libnvdimm-dax-fix-1gb-aligned-namespaces-vs-physical-misalignment.patch
queue-4.9/acpi-nfit-fix-health-event-notification.patch
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 41fce90f26333c4fa82e8e43b9ace86c4e8a0120 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Mon, 4 Dec 2017 14:07:43 -0800
Subject: [PATCH] libnvdimm, dax: fix 1GB-aligned namespaces vs physical
misalignment
The following namespace configuration attempt:
# ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f
libndctl: ndctl_dax_enable: dax0.1: failed to enable
Error: namespace0.0: failed to enable
failed to reconfigure namespace: No such device or address
...fails when the backing memory range is not physically aligned to 1G:
# cat /proc/iomem | grep Persistent
210000000-30fffffff : Persistent Memory (legacy)
In the above example the 4G persistent memory range starts and ends on a
256MB boundary.
We handle this case correctly when needing to handle cases that violate
section alignment (128MB) collisions against "System RAM", and we simply
need to extend that padding/truncation for the 1GB alignment use case.
Cc: <stable(a)vger.kernel.org>
Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute...")
Reported-and-tested-by: Jane Chu <jane.chu(a)oracle.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index db2fc7c02e01..2adada1a5855 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -583,6 +583,12 @@ static struct vmem_altmap *__nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
return altmap;
}
+static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys)
+{
+ return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys),
+ ALIGN_DOWN(phys, nd_pfn->align));
+}
+
static int nd_pfn_init(struct nd_pfn *nd_pfn)
{
u32 dax_label_reserve = is_nd_dax(&nd_pfn->dev) ? SZ_128K : 0;
@@ -638,13 +644,16 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
start = nsio->res.start;
size = PHYS_SECTION_ALIGN_UP(start + size) - start;
if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
- IORES_DESC_NONE) == REGION_MIXED) {
+ IORES_DESC_NONE) == REGION_MIXED
+ || !IS_ALIGNED(start + resource_size(&nsio->res),
+ nd_pfn->align)) {
size = resource_size(&nsio->res);
- end_trunc = start + size - PHYS_SECTION_ALIGN_DOWN(start + size);
+ end_trunc = start + size - phys_pmem_align_down(nd_pfn,
+ start + size);
}
if (start_pad + end_trunc)
- dev_info(&nd_pfn->dev, "%s section collision, truncate %d bytes\n",
+ dev_info(&nd_pfn->dev, "%s alignment collision, truncate %d bytes\n",
dev_name(&ndns->dev), start_pad + end_trunc);
/*
This is a note to let you know that I've just added the patch titled
pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
pinctrl-cherryview-mask-all-interrupts-on-intel_strago-based-systems.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d2b3c353595a855794f8b9df5b5bdbe8deb0c413 Mon Sep 17 00:00:00 2001
From: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Date: Mon, 4 Dec 2017 12:11:02 +0300
Subject: pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
From: Mika Westerberg <mika.westerberg(a)linux.intel.com>
commit d2b3c353595a855794f8b9df5b5bdbe8deb0c413 upstream.
Guenter Roeck reported an interrupt storm on a prototype system which is
based on Cyan Chromebook. The root cause turned out to be a incorrectly
configured pin that triggers spurious interrupts. This will be fixed in
coreboot but currently we need to prevent the interrupt storm from
happening by masking all interrupts (but not GPEs) on those systems.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197953
Fixes: bcb48cca23ec ("pinctrl: cherryview: Do not mask all interrupts in probe")
Reported-and-tested-by: Guenter Roeck <linux(a)roeck-us.net>
Reported-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/pinctrl/intel/pinctrl-cherryview.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
--- a/drivers/pinctrl/intel/pinctrl-cherryview.c
+++ b/drivers/pinctrl/intel/pinctrl-cherryview.c
@@ -1466,6 +1466,22 @@ static int chv_gpio_probe(struct chv_pin
offset += range->npins;
}
+ /*
+ * The same set of machines in chv_no_valid_mask[] have incorrectly
+ * configured GPIOs that generate spurious interrupts so we use
+ * this same list to apply another quirk for them.
+ *
+ * See also https://bugzilla.kernel.org/show_bug.cgi?id=197953.
+ */
+ if (!need_valid_mask) {
+ /*
+ * Mask all interrupts the community is able to generate
+ * but leave the ones that can only generate GPEs unmasked.
+ */
+ chv_writel(GENMASK(31, pctrl->community->nirqs),
+ pctrl->regs + CHV_INTMASK);
+ }
+
/* Clear all interrupts */
chv_writel(0xffff, pctrl->regs + CHV_INTSTAT);
Patches currently in stable-queue which might be from mika.westerberg(a)linux.intel.com are
queue-4.4/pinctrl-cherryview-mask-all-interrupts-on-intel_strago-based-systems.patch
This is a note to let you know that I've just added the patch titled
x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4831b779403a836158917d59a7ca880483c67378 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Sun, 10 Dec 2017 22:47:20 -0800
Subject: x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
From: Andy Lutomirski <luto(a)kernel.org>
commit 4831b779403a836158917d59a7ca880483c67378 upstream.
If something goes wrong with pagetable setup, vsyscall=native will
accidentally fall back to emulation. Make it warn and fail so that we
notice.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/vsyscall/vsyscall_64.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -139,6 +139,10 @@ bool emulate_vsyscall(struct pt_regs *re
WARN_ON_ONCE(address != regs->ip);
+ /* This should be unreachable in NATIVE mode. */
+ if (WARN_ON(vsyscall_mode == NATIVE))
+ return false;
+
if (vsyscall_mode == NONE) {
warn_bad_vsyscall(KERN_INFO, regs,
"vsyscall attempted with vsyscall=none");
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch
This is a note to let you know that I've just added the patch titled
x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 49275fef986abfb8b476e4708aaecc07e7d3e087 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Sun, 10 Dec 2017 22:47:19 -0800
Subject: x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy
From: Andy Lutomirski <luto(a)kernel.org>
commit 49275fef986abfb8b476e4708aaecc07e7d3e087 upstream.
The kernel is very erratic as to which pagetables have _PAGE_USER set. The
vsyscall page gets lucky: it seems that all of the relevant pagetables are
among the apparently arbitrary ones that set _PAGE_USER. Rather than
relying on chance, just explicitly set _PAGE_USER.
This will let us clean up pagetable setup to stop setting _PAGE_USER. The
added code can also be reused by pagetable isolation to manage the
_PAGE_USER bit in the usermode tables.
[ tglx: Folded paravirt fix from Juergen Gross ]
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/vsyscall/vsyscall_64.c | 34 +++++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -37,6 +37,7 @@
#include <asm/unistd.h>
#include <asm/fixmap.h>
#include <asm/traps.h>
+#include <asm/paravirt.h>
#define CREATE_TRACE_POINTS
#include "vsyscall_trace.h"
@@ -329,16 +330,47 @@ int in_gate_area_no_mm(unsigned long add
return vsyscall_mode != NONE && (addr & PAGE_MASK) == VSYSCALL_ADDR;
}
+/*
+ * The VSYSCALL page is the only user-accessible page in the kernel address
+ * range. Normally, the kernel page tables can have _PAGE_USER clear, but
+ * the tables covering VSYSCALL_ADDR need _PAGE_USER set if vsyscalls
+ * are enabled.
+ *
+ * Some day we may create a "minimal" vsyscall mode in which we emulate
+ * vsyscalls but leave the page not present. If so, we skip calling
+ * this.
+ */
+static void __init set_vsyscall_pgtable_user_bits(void)
+{
+ pgd_t *pgd;
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ pgd = pgd_offset_k(VSYSCALL_ADDR);
+ set_pgd(pgd, __pgd(pgd_val(*pgd) | _PAGE_USER));
+ p4d = p4d_offset(pgd, VSYSCALL_ADDR);
+#if CONFIG_PGTABLE_LEVELS >= 5
+ p4d->p4d |= _PAGE_USER;
+#endif
+ pud = pud_offset(p4d, VSYSCALL_ADDR);
+ set_pud(pud, __pud(pud_val(*pud) | _PAGE_USER));
+ pmd = pmd_offset(pud, VSYSCALL_ADDR);
+ set_pmd(pmd, __pmd(pmd_val(*pmd) | _PAGE_USER));
+}
+
void __init map_vsyscall(void)
{
extern char __vsyscall_page;
unsigned long physaddr_vsyscall = __pa_symbol(&__vsyscall_page);
- if (vsyscall_mode != NONE)
+ if (vsyscall_mode != NONE) {
__set_fixmap(VSYSCALL_PAGE, physaddr_vsyscall,
vsyscall_mode == NATIVE
? PAGE_KERNEL_VSYSCALL
: PAGE_KERNEL_VVAR);
+ set_vsyscall_pgtable_user_bits();
+ }
BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
(unsigned long)VSYSCALL_ADDR);
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.14/x86-entry-rename-sysenter_stack-to-cpu_entry_area_entry_stack.patch
queue-4.14/x86-mm-put-mmu-to-hardware-asid-translation-in-one-place.patch
queue-4.14/x86-vsyscall-64-explicitly-set-_page_user-in-the-pagetable-hierarchy.patch
queue-4.14/x86-uv-use-the-right-tlb-flush-api.patch
queue-4.14/x86-mm-dump_pagetables-check-page_present-for-real.patch
queue-4.14/x86-ldt-prevent-ldt-inheritance-on-exec.patch
queue-4.14/x86-microcode-dont-abuse-the-tlb-flush-interface.patch
queue-4.14/x86-doc-remove-obvious-weirdnesses-from-the-x86-mm-layout-documentation.patch
queue-4.14/init-invoke-init_espfix_bsp-from-mm_init.patch
queue-4.14/x86-cpu_entry_area-move-it-to-a-separate-unit.patch
queue-4.14/x86-vsyscall-64-warn-and-fail-vsyscall-emulation-in-native-mode.patch
queue-4.14/x86-mm-create-asm-invpcid.h.patch
queue-4.14/x86-mm-remove-superfluous-barriers.patch
queue-4.14/x86-ldt-rework-locking.patch
queue-4.14/arch-mm-allow-arch_dup_mmap-to-fail.patch
queue-4.14/x86-cpu_entry_area-move-it-out-of-the-fixmap.patch
queue-4.14/x86-mm-remove-hard-coded-asid-limit-checks.patch
queue-4.14/x86-kconfig-limit-nr_cpus-on-32-bit-to-a-sane-amount.patch
queue-4.14/x86-mm-add-comments-to-clarify-which-tlb-flush-functions-are-supposed-to-flush-what.patch
queue-4.14/x86-mm-move-the-cr3-construction-functions-to-tlbflush.h.patch
queue-4.14/x86-mm-dump_pagetables-make-the-address-hints-correct-and-readable.patch
queue-4.14/x86-insn-eval-add-utility-functions-to-get-segment-selector.patch
queue-4.14/x86-mm-use-__flush_tlb_one-for-kernel-memory.patch
queue-4.14/x86-mm-64-improve-the-memory-map-documentation.patch