The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9d219554d9bf59875b4e571a0392d620e8954879 Mon Sep 17 00:00:00 2001
From: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Date: Wed, 2 May 2018 10:52:55 -0700
Subject: [PATCH] drm/i915: Adjust eDP's logical vco in a reliable place.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On intel_dp_compute_config() we were calculating the needed vco
for eDP on gen9 and we stashing it in
intel_atomic_state.cdclk.logical.vco
However few moments later on intel_modeset_checks() we fully
replace entire intel_atomic_state.cdclk.logical with
dev_priv->cdclk.logical fully overwriting the logical desired
vco for eDP on gen9.
So, with wrong VCO value we end up with wrong desired cdclk, but
also it will raise a lot of WARNs: On gen9, when we read
CDCLK_CTL to verify if we configured properly the desired
frequency the CD Frequency Select bits [27:26] == 10b can mean
337.5 or 308.57 MHz depending on the VCO. So if we have wrong
VCO value stashed we will believe the frequency selection didn't
stick and start to raise WARNs of cdclk mismatch.
[ 42.857519] [drm:intel_dump_cdclk_state [i915]] Changing CDCLK to 308571 kHz, VCO 8640000 kHz, ref 24000 kHz, bypass 24000 kHz, voltage level 0
[ 42.897269] cdclk state doesn't match!
[ 42.901052] WARNING: CPU: 5 PID: 1116 at drivers/gpu/drm/i915/intel_cdclk.c:2084 intel_set_cdclk+0x5d/0x110 [i915]
[ 42.938004] RIP: 0010:intel_set_cdclk+0x5d/0x110 [i915]
[ 43.155253] WARNING: CPU: 5 PID: 1116 at drivers/gpu/drm/i915/intel_cdclk.c:2084 intel_set_cdclk+0x5d/0x110 [i915]
[ 43.170277] [drm:intel_dump_cdclk_state [i915]] [hw state] 337500 kHz, VCO 8100000 kHz, ref 24000 kHz, bypass 24000 kHz, voltage level 0
[ 43.182566] [drm:intel_dump_cdclk_state [i915]] [sw state] 308571 kHz, VCO 8640000 kHz, ref 24000 kHz, bypass 24000 kHz, voltage level 0
v2: Move the entire eDP's vco logical adjustment to inside
the skl_modeset_calc_cdclk as suggested by Ville.
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Fixes: bb0f4aab0e76 ("drm/i915: Track full cdclk state for the logical and actual cdclk frequencies")
Cc: <stable(a)vger.kernel.org> # v4.12+
Link: https://patchwork.freedesktop.org/patch/msgid/20180502175255.5344-1-rodrigo…
(cherry picked from commit 3297234a05ab1e90091b0574db4c397ef0e90d5f)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/intel_cdclk.c b/drivers/gpu/drm/i915/intel_cdclk.c
index 32d24c69da3c..704ddb4d3ca7 100644
--- a/drivers/gpu/drm/i915/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/intel_cdclk.c
@@ -2302,9 +2302,44 @@ static int bdw_modeset_calc_cdclk(struct drm_atomic_state *state)
return 0;
}
+static int skl_dpll0_vco(struct intel_atomic_state *intel_state)
+{
+ struct drm_i915_private *dev_priv = to_i915(intel_state->base.dev);
+ struct intel_crtc *crtc;
+ struct intel_crtc_state *crtc_state;
+ int vco, i;
+
+ vco = intel_state->cdclk.logical.vco;
+ if (!vco)
+ vco = dev_priv->skl_preferred_vco_freq;
+
+ for_each_new_intel_crtc_in_state(intel_state, crtc, crtc_state, i) {
+ if (!crtc_state->base.enable)
+ continue;
+
+ if (!intel_crtc_has_type(crtc_state, INTEL_OUTPUT_EDP))
+ continue;
+
+ /*
+ * DPLL0 VCO may need to be adjusted to get the correct
+ * clock for eDP. This will affect cdclk as well.
+ */
+ switch (crtc_state->port_clock / 2) {
+ case 108000:
+ case 216000:
+ vco = 8640000;
+ break;
+ default:
+ vco = 8100000;
+ break;
+ }
+ }
+
+ return vco;
+}
+
static int skl_modeset_calc_cdclk(struct drm_atomic_state *state)
{
- struct drm_i915_private *dev_priv = to_i915(state->dev);
struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
int min_cdclk, cdclk, vco;
@@ -2312,9 +2347,7 @@ static int skl_modeset_calc_cdclk(struct drm_atomic_state *state)
if (min_cdclk < 0)
return min_cdclk;
- vco = intel_state->cdclk.logical.vco;
- if (!vco)
- vco = dev_priv->skl_preferred_vco_freq;
+ vco = skl_dpll0_vco(intel_state);
/*
* FIXME should also account for plane ratio
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 9a4a51e79fa1..b7b4cfdeb974 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -1881,26 +1881,6 @@ intel_dp_compute_config(struct intel_encoder *encoder,
reduce_m_n);
}
- /*
- * DPLL0 VCO may need to be adjusted to get the correct
- * clock for eDP. This will affect cdclk as well.
- */
- if (intel_dp_is_edp(intel_dp) && IS_GEN9_BC(dev_priv)) {
- int vco;
-
- switch (pipe_config->port_clock / 2) {
- case 108000:
- case 216000:
- vco = 8640000;
- break;
- default:
- vco = 8100000;
- break;
- }
-
- to_intel_atomic_state(pipe_config->base.state)->cdclk.logical.vco = vco;
- }
-
if (!HAS_DDI(dev_priv))
intel_dp_set_clock(encoder, pipe_config);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d79f7aa496fc94d763f67b833a1f36f4c171176f Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro(a)fb.com>
Date: Tue, 10 Apr 2018 16:27:47 -0700
Subject: [PATCH] mm: treat indirectly reclaimable memory as free in overcommit
logic
Indirectly reclaimable memory can consume a significant part of total
memory and it's actually reclaimable (it will be released under actual
memory pressure).
So, the overcommit logic should treat it as free.
Otherwise, it's possible to cause random system-wide memory allocation
failures by consuming a significant amount of memory by indirectly
reclaimable memory, e.g. dentry external names.
If overcommit policy GUESS is used, it might be used for denial of
service attack under some conditions.
The following program illustrates the approach. It causes the kernel to
allocate an unreclaimable kmalloc-256 chunk for each stat() call, so
that at some point the overcommit logic may start blocking large
allocation system-wide.
int main()
{
char buf[256];
unsigned long i;
struct stat statbuf;
buf[0] = '/';
for (i = 1; i < sizeof(buf); i++)
buf[i] = '_';
for (i = 0; 1; i++) {
sprintf(&buf[248], "%8lu", i);
stat(buf, &statbuf);
}
return 0;
}
This patch in combination with related indirectly reclaimable memory
patches closes this issue.
Link: http://lkml.kernel.org/r/20180313130041.8078-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/util.c b/mm/util.c
index 029fc2f3b395..73676f0f1b43 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -667,6 +667,13 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
*/
free += global_node_page_state(NR_SLAB_RECLAIMABLE);
+ /*
+ * Part of the kernel memory, which can be released
+ * under memory pressure.
+ */
+ free += global_node_page_state(
+ NR_INDIRECTLY_RECLAIMABLE_BYTES) >> PAGE_SHIFT;
+
/*
* Leave reserved pages. The pages are not for anonymous pages.
*/
AER handling expects a successful return from slot_reset means the
driver made the device functional again. The nvme driver had been using
an asynchronous reset to recover the device, so the device
may still be initializing after control is returned to the
AER handler. This creates problems for subsequent event handling,
causing the initializion to fail.
This patch fixes that by syncing the controller reset before returning
to the AER driver, and reporting the true state of the reset.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199657
Reported-by: Alex Gagniuc <mr.nuke.me(a)gmail.com>
Cc: Sinan Kaya <okaya(a)codeaurora.org>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Keith Busch <keith.busch(a)intel.com>
---
drivers/nvme/host/pci.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b542dce45927..2e221796257a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2681,8 +2681,15 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
dev_info(dev->ctrl.device, "restart after slot reset\n");
pci_restore_state(pdev);
- nvme_reset_ctrl(&dev->ctrl);
- return PCI_ERS_RESULT_RECOVERED;
+ nvme_reset_ctrl_sync(&dev->ctrl);
+
+ switch (dev->ctrl.state) {
+ case NVME_CTRL_LIVE:
+ case NVME_CTRL_ADMIN_ONLY:
+ return PCI_ERS_RESULT_RECOVERED;
+ default:
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
}
static void nvme_error_resume(struct pci_dev *pdev)
--
2.14.3
From: Stephen Hemminger <stephen(a)networkplumber.org>
In 4.9 kernel, the sysfs files for Hyper-V VMBus changed name but
the documentation files were not updated. The current sysfs file
names are /sys/bus/vmbus/devices/<UUID>/...
See commit 9a56e5d6a0ba ("Drivers: hv: make VMBus bus ids persistent")
and commit f6b2db084b65 ("vmbus: make sysfs names consistent with PCI")
Reported-by: Michael Kelley <mikelley(a)microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin(a)microsoft.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: K. Y. Srinivasan <kys(a)microsoft.com>
---
Documentation/ABI/stable/sysfs-bus-vmbus | 40 ++++++++++++++++----------------
1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/Documentation/ABI/stable/sysfs-bus-vmbus b/Documentation/ABI/stable/sysfs-bus-vmbus
index 0c9d9dcd2151..3eaffbb2d468 100644
--- a/Documentation/ABI/stable/sysfs-bus-vmbus
+++ b/Documentation/ABI/stable/sysfs-bus-vmbus
@@ -1,25 +1,25 @@
-What: /sys/bus/vmbus/devices/vmbus_*/id
+What: /sys/bus/vmbus/devices/<UUID>/id
Date: Jul 2009
KernelVersion: 2.6.31
Contact: K. Y. Srinivasan <kys(a)microsoft.com>
Description: The VMBus child_relid of the device's primary channel
Users: tools/hv/lsvmbus
-What: /sys/bus/vmbus/devices/vmbus_*/class_id
+What: /sys/bus/vmbus/devices/<UUID>/class_id
Date: Jul 2009
KernelVersion: 2.6.31
Contact: K. Y. Srinivasan <kys(a)microsoft.com>
Description: The VMBus interface type GUID of the device
Users: tools/hv/lsvmbus
-What: /sys/bus/vmbus/devices/vmbus_*/device_id
+What: /sys/bus/vmbus/devices/<UUID>/device_id
Date: Jul 2009
KernelVersion: 2.6.31
Contact: K. Y. Srinivasan <kys(a)microsoft.com>
Description: The VMBus interface instance GUID of the device
Users: tools/hv/lsvmbus
-What: /sys/bus/vmbus/devices/vmbus_*/channel_vp_mapping
+What: /sys/bus/vmbus/devices/<UUID>/channel_vp_mapping
Date: Jul 2015
KernelVersion: 4.2.0
Contact: K. Y. Srinivasan <kys(a)microsoft.com>
@@ -28,112 +28,112 @@ Description: The mapping of which primary/sub channels are bound to which
Format: <channel's child_relid:the bound cpu's number>
Users: tools/hv/lsvmbus
-What: /sys/bus/vmbus/devices/vmbus_*/device
+What: /sys/bus/vmbus/devices/<UUID>/device
Date: Dec. 2015
KernelVersion: 4.5
Contact: K. Y. Srinivasan <kys(a)microsoft.com>
Description: The 16 bit device ID of the device
Users: tools/hv/lsvmbus and user level RDMA libraries
-What: /sys/bus/vmbus/devices/vmbus_*/vendor
+What: /sys/bus/vmbus/devices/<UUID>/vendor
Date: Dec. 2015
KernelVersion: 4.5
Contact: K. Y. Srinivasan <kys(a)microsoft.com>
Description: The 16 bit vendor ID of the device
Users: tools/hv/lsvmbus and user level RDMA libraries
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Directory for per-channel information
NN is the VMBUS relid associtated with the channel.
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/cpu
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/cpu
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: VCPU (sub)channel is affinitized to
Users: tools/hv/lsvmbus and other debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/cpu
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/cpu
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: VCPU (sub)channel is affinitized to
Users: tools/hv/lsvmbus and other debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/in_mask
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/in_mask
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Host to guest channel interrupt mask
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/latency
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/latency
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Channel signaling latency
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/out_mask
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/out_mask
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Guest to host channel interrupt mask
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/pending
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/pending
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Channel interrupt pending state
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/read_avail
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/read_avail
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Bytes available to read
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/write_avail
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/write_avail
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Bytes available to write
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/events
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/events
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Number of times we have signaled the host
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/interrupts
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/interrupts
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Number of times we have taken an interrupt (incoming)
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/subchannel_id
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/subchannel_id
Date: January. 2018
KernelVersion: 4.16
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Subchannel ID associated with VMBUS channel
Users: Debugging tools and userspace drivers
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/monitor_id
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/monitor_id
Date: January. 2018
KernelVersion: 4.16
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Monitor bit associated with channel
Users: Debugging tools and userspace drivers
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/ring
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/ring
Date: January. 2018
KernelVersion: 4.16
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
--
2.15.1
On Linux kernel 4.16.8 (using Arch Linux) if I eject a USB 3.0 device
from my system (unplug or "udisksctl power-off --block-device /dev/sde),
it freezes instantly with a null pointer error in XHCI. I've been unable
to successfully capture a kernel log of the error (and my cameras did
not return usable results) and the screen begins scrolling
near-immediately with hung processors in a VT. I have done a git
bisection (between 4.16.7 and 4.16.7) narrowing it down to the following
commit:
commit f5331826b0b7a5f2db56a9020ddbb8ce16acdfc0
Author: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Date: Thu May 3 17:30:07 2018 +0300
xhci: Fix use-after-free in xhci_free_virt_device
commit 44a182b9d17765514fa2b1cc911e4e65134eef93 upstream.
KASAN found a use-after-free in xhci_free_virt_device+0x33b/0x38e
where xhci_free_virt_device() sets slot id to 0 if udev exists:
if (dev->udev && dev->udev->slot_id)
dev->udev->slot_id = 0;
dev->udev will be true even if udev is freed because dev->udev is
not set to NULL.
set dev->udev pointer to NULL in xhci_free_dev()
The original patch went to stable so this fix needs to be applied
there as well.
Fixes: a400efe455f7 ("xhci: zero usb device slot_id member when
disabling and freeing a xhci slot")
Cc: <stable(a)vger.kernel.org>
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
:040000 040000 356ecdc13bf252535bd51b8558a8173dae546f19
d28f201228652e87e51ba48f5a0ad4539cca5c29 M drivers
Can I be CC'd on future responses to this? I am not subscribed to the list.
Jacob Saunders
From: David Rientjes <rientjes(a)google.com>
Subject: mm, oom: fix concurrent munlock and oom reaper unmap, v3
Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.
This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma. Since munlock_vma_pages_range() depends on
clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.
This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires serialize_against_pte_lookup(). If the pmd
is zapped by the oom reaper during follow_page_mask() after the check for
pmd_none() is bypassed, this ends up deferencing a NULL ptl or a kernel
oops.
Fix this by manually freeing all possible memory from the mm before doing
the munlock and then setting MMF_OOM_SKIP. The oom reaper can not run on
the mm anymore so the munlock is safe to do in exit_mmap(). It also
matches the logic that the oom reaper currently uses for determining when
to set MMF_OOM_SKIP itself, so there's no new risk of excessive oom
killing.
This issue fixes CVE-2018-1000200.
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.cor…
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes(a)google.com>
Suggested-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/oom.h | 2 +
mm/mmap.c | 44 +++++++++++++---------
mm/oom_kill.c | 81 ++++++++++++++++++++++--------------------
3 files changed, 71 insertions(+), 56 deletions(-)
diff -puN include/linux/oom.h~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap include/linux/oom.h
--- a/include/linux/oom.h~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/include/linux/oom.h
@@ -95,6 +95,8 @@ static inline int check_stable_address_s
return 0;
}
+void __oom_reap_task_mm(struct mm_struct *mm);
+
extern unsigned long oom_badness(struct task_struct *p,
struct mem_cgroup *memcg, const nodemask_t *nodemask,
unsigned long totalpages);
diff -puN mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/mmap.c
--- a/mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/mm/mmap.c
@@ -3024,6 +3024,32 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
+ if (unlikely(mm_is_oom_victim(mm))) {
+ /*
+ * Manually reap the mm to free as much memory as possible.
+ * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
+ * this mm from further consideration. Taking mm->mmap_sem for
+ * write after setting MMF_OOM_SKIP will guarantee that the oom
+ * reaper will not run on this mm again after mmap_sem is
+ * dropped.
+ *
+ * Nothing can be holding mm->mmap_sem here and the above call
+ * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
+ * __oom_reap_task_mm() will not block.
+ *
+ * This needs to be done before calling munlock_vma_pages_all(),
+ * which clears VM_LOCKED, otherwise the oom reaper cannot
+ * reliably test it.
+ */
+ mutex_lock(&oom_lock);
+ __oom_reap_task_mm(mm);
+ mutex_unlock(&oom_lock);
+
+ set_bit(MMF_OOM_SKIP, &mm->flags);
+ down_write(&mm->mmap_sem);
+ up_write(&mm->mmap_sem);
+ }
+
if (mm->locked_vm) {
vma = mm->mmap;
while (vma) {
@@ -3045,24 +3071,6 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
-
- if (unlikely(mm_is_oom_victim(mm))) {
- /*
- * Wait for oom_reap_task() to stop working on this
- * mm. Because MMF_OOM_SKIP is already set before
- * calling down_read(), oom_reap_task() will not run
- * on this "mm" post up_write().
- *
- * mm_is_oom_victim() cannot be set from under us
- * either because victim->mm is already set to NULL
- * under task_lock before calling mmput and oom_mm is
- * set not NULL by the OOM killer only if victim->mm
- * is found not NULL while holding the task_lock.
- */
- set_bit(MMF_OOM_SKIP, &mm->flags);
- down_write(&mm->mmap_sem);
- up_write(&mm->mmap_sem);
- }
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb, 0, -1);
diff -puN mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/oom_kill.c
--- a/mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/mm/oom_kill.c
@@ -469,7 +469,6 @@ bool process_shares_mm(struct task_struc
return false;
}
-
#ifdef CONFIG_MMU
/*
* OOM Reaper kernel thread which tries to reap the memory used by the OOM
@@ -480,16 +479,54 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reape
static struct task_struct *oom_reaper_list;
static DEFINE_SPINLOCK(oom_reaper_lock);
-static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+void __oom_reap_task_mm(struct mm_struct *mm)
{
- struct mmu_gather tlb;
struct vm_area_struct *vma;
+
+ /*
+ * Tell all users of get_user/copy_from_user etc... that the content
+ * is no longer stable. No barriers really needed because unmapping
+ * should imply barriers already and the reader would hit a page fault
+ * if it stumbled over a reaped memory.
+ */
+ set_bit(MMF_UNSTABLE, &mm->flags);
+
+ for (vma = mm->mmap ; vma; vma = vma->vm_next) {
+ if (!can_madv_dontneed_vma(vma))
+ continue;
+
+ /*
+ * Only anonymous pages have a good chance to be dropped
+ * without additional steps which we cannot afford as we
+ * are OOM already.
+ *
+ * We do not even care about fs backed pages because all
+ * which are reclaimable have already been reclaimed and
+ * we do not want to block exit_mmap by keeping mm ref
+ * count elevated without a good reason.
+ */
+ if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
+ const unsigned long start = vma->vm_start;
+ const unsigned long end = vma->vm_end;
+ struct mmu_gather tlb;
+
+ tlb_gather_mmu(&tlb, mm, start, end);
+ mmu_notifier_invalidate_range_start(mm, start, end);
+ unmap_page_range(&tlb, vma, start, end, NULL);
+ mmu_notifier_invalidate_range_end(mm, start, end);
+ tlb_finish_mmu(&tlb, start, end);
+ }
+ }
+}
+
+static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+{
bool ret = true;
/*
* We have to make sure to not race with the victim exit path
* and cause premature new oom victim selection:
- * __oom_reap_task_mm exit_mm
+ * oom_reap_task_mm exit_mm
* mmget_not_zero
* mmput
* atomic_dec_and_test
@@ -534,39 +571,8 @@ static bool __oom_reap_task_mm(struct ta
trace_start_task_reaping(tsk->pid);
- /*
- * Tell all users of get_user/copy_from_user etc... that the content
- * is no longer stable. No barriers really needed because unmapping
- * should imply barriers already and the reader would hit a page fault
- * if it stumbled over a reaped memory.
- */
- set_bit(MMF_UNSTABLE, &mm->flags);
-
- for (vma = mm->mmap ; vma; vma = vma->vm_next) {
- if (!can_madv_dontneed_vma(vma))
- continue;
+ __oom_reap_task_mm(mm);
- /*
- * Only anonymous pages have a good chance to be dropped
- * without additional steps which we cannot afford as we
- * are OOM already.
- *
- * We do not even care about fs backed pages because all
- * which are reclaimable have already been reclaimed and
- * we do not want to block exit_mmap by keeping mm ref
- * count elevated without a good reason.
- */
- if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
- const unsigned long start = vma->vm_start;
- const unsigned long end = vma->vm_end;
-
- tlb_gather_mmu(&tlb, mm, start, end);
- mmu_notifier_invalidate_range_start(mm, start, end);
- unmap_page_range(&tlb, vma, start, end, NULL);
- mmu_notifier_invalidate_range_end(mm, start, end);
- tlb_finish_mmu(&tlb, start, end);
- }
- }
pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
task_pid_nr(tsk), tsk->comm,
K(get_mm_counter(mm, MM_ANONPAGES)),
@@ -587,14 +593,13 @@ static void oom_reap_task(struct task_st
struct mm_struct *mm = tsk->signal->oom_mm;
/* Retry the down_read_trylock(mmap_sem) a few times */
- while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
+ while (attempts++ < MAX_OOM_REAP_RETRIES && !oom_reap_task_mm(tsk, mm))
schedule_timeout_idle(HZ/10);
if (attempts <= MAX_OOM_REAP_RETRIES ||
test_bit(MMF_OOM_SKIP, &mm->flags))
goto done;
-
pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
task_pid_nr(tsk), tsk->comm);
debug_show_all_locks();
_
From: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Subject: mm: sections are not offlined during memory hotremove
Memory hotplug and hotremove operate with per-block granularity. If the
machine has a large amount of memory (more than 64G), the size of a memory
block can span multiple sections. By mistake, during hotremove we set
only the first section to offline state.
The bug was discovered because kernel selftest started to fail:
https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop
After commit, "mm/memory_hotplug: optimize probe routine". But, the bug
is older than this commit. In this optimization we also added a check for
sections to be in a proper state during hotplug operation.
Link: http://lkml.kernel.org/r/20180427145257.15222-1-pasha.tatashin@oracle.com
Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Steven Sistare <steven.sistare(a)oracle.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/sparse.c~mm-sections-are-not-offlined-during-memory-hotremove mm/sparse.c
--- a/mm/sparse.c~mm-sections-are-not-offlined-during-memory-hotremove
+++ a/mm/sparse.c
@@ -629,7 +629,7 @@ void offline_mem_sections(unsigned long
unsigned long pfn;
for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
- unsigned long section_nr = pfn_to_section_nr(start_pfn);
+ unsigned long section_nr = pfn_to_section_nr(pfn);
struct mem_section *ms;
/*
_
Quoting Lionel Landwerlin (2018-05-11 18:41:28)
> On 11/05/18 16:51, Chris Wilson wrote:
>
> But I can't even startup a gdm on that machine with drm-tip. So maybe
> there is some much more broken...
>
> Don't leave us in suspense...
>
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890614
>
>
> Not our bug :)
You would not believe how relieved I am that someone else has bugs in
their code.
-Chris