The patch titled
Subject: mm, thp: always specify disabled vmas as nh in smaps
has been removed from the -mm tree. Its filename was
mm-thp-always-specify-disabled-vmas-as-nh-in-smaps.patch
This patch was dropped because it was nacked
------------------------------------------------------
From: David Rientjes <rientjes(a)google.com>
Subject: mm, thp: always specify disabled vmas as nh in smaps
1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active") introduced
a regression in that userspace cannot always determine the set of vmas
where thp is disabled.
Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
to determine if a vma has been disabled from being backed by hugepages.
Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to
be disabled and emit "nh" as a flag for the corresponding vmas as part of
/proc/pid/smaps. After the commit, thp is disabled by means of an mm flag
and "nh" is not emitted.
This causes smaps parsing libraries to assume a vma is enabled for thp and
ends up puzzling the user on why its memory is not backed by thp.
This also clears the "hg" flag to make the behavior of MADV_HUGEPAGE and
PR_SET_THP_DISABLE definitive.
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1809251449060.96762@chino.kir.corp…
Fixes: 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
Signed-off-by: David Rientjes <rientjes(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Cyrill Gorcunov <gorcunov(a)gmail.com>
Cc: Pavel Emelyanov <xemul(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/filesystems/proc.txt | 7 ++++++-
fs/proc/task_mmu.c | 14 +++++++++++++-
2 files changed, 19 insertions(+), 2 deletions(-)
--- a/Documentation/filesystems/proc.txt~mm-thp-always-specify-disabled-vmas-as-nh-in-smaps
+++ a/Documentation/filesystems/proc.txt
@@ -508,9 +508,14 @@ manner. The codes are the following:
sd - soft-dirty flag
mm - mixed map area
hg - huge page advise flag
- nh - no-huge page advise flag
+ nh - no-huge page advise flag [*]
mg - mergable advise flag
+ [*] A process mapping may be advised to not be backed by transparent hugepages
+ by either madvise(MADV_NOHUGEPAGE) or prctl(PR_SET_THP_DISABLE). See
+ Documentation/admin-guide/mm/transhuge.rst for system-wide and process
+ mapping policies.
+
Note that there is no guarantee that every flag and associated mnemonic will
be present in all further kernel releases. Things get changed, the flags may
be vanished or the reverse -- new added. Interpretation of their meaning
--- a/fs/proc/task_mmu.c~mm-thp-always-specify-disabled-vmas-as-nh-in-smaps
+++ a/fs/proc/task_mmu.c
@@ -653,13 +653,25 @@ static void show_smap_vma_flags(struct s
#endif
#endif /* CONFIG_ARCH_HAS_PKEYS */
};
+ unsigned long flags = vma->vm_flags;
size_t i;
+ /*
+ * Disabling thp is possible through both MADV_NOHUGEPAGE and
+ * PR_SET_THP_DISABLE. Both historically used VM_NOHUGEPAGE. Since
+ * the introduction of MMF_DISABLE_THP, however, userspace needs the
+ * ability to detect vmas where thp is not eligible in the same manner.
+ */
+ if (vma->vm_mm && test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) {
+ flags &= ~VM_HUGEPAGE;
+ flags |= VM_NOHUGEPAGE;
+ }
+
seq_puts(m, "VmFlags: ");
for (i = 0; i < BITS_PER_LONG; i++) {
if (!mnemonics[i][0])
continue;
- if (vma->vm_flags & (1UL << i)) {
+ if (flags & (1UL << i)) {
seq_putc(m, mnemonics[i][0]);
seq_putc(m, mnemonics[i][1]);
seq_putc(m, ' ');
_
Patches currently in -mm which might be from rientjes(a)google.com are
Commit 1598ecda7b23 ("arm64: kaslr: ensure randomized quantities are
clean to the PoC") added cache maintenance to ensure that global
variables set by the kaslr init routine are not wiped clean due to
cache invalidation occurring during the second round of page table
creation.
However, if kaslr_early_init() exits early with no randomization
being applied (either due to the lack of a seed, or because the user
has disabled kaslr explicitly), no cache maintenance is performed,
leading to the same issue we attempted to fix earlier, as far as the
module_alloc_base variable is concerned.
Note that module_alloc_base cannot be initialized statically, because
that would cause it to be subject to a R_AARCH64_RELATIVE relocation,
causing it to be overwritten by the second round of KASLR relocation
processing.
Fixes: f80fb3a3d508 ("arm64: add support for kernel ASLR")
Cc: <stable(a)vger.kernel.org> # v4.6+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
---
arch/arm64/kernel/kaslr.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index ba6b41790fcd..b09b6f75f759 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -88,6 +88,7 @@ u64 __init kaslr_early_init(u64 dt_phys)
* we end up running with module randomization disabled.
*/
module_alloc_base = (u64)_etext - MODULES_VSIZE;
+ __flush_dcache_area(&module_alloc_base, sizeof(module_alloc_base));
/*
* Try to map the FDT early. If this fails, we simply bail,
--
2.20.1
The patch titled
Subject: mm: hwpoison: use do_send_sig_info() instead of force_sig()
has been added to the -mm tree. Its filename is
mm-hwpoison-use-do_send_sig_info-instead-of-force_sig-re-pmem-error-handling-forces-sigkill-causes-kernel-panic.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hwpoison-use-do_send_sig_info-i…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hwpoison-use-do_send_sig_info-i…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Subject: mm: hwpoison: use do_send_sig_info() instead of force_sig()
Currently memory_failure() is racy against process's exiting, which
results in kernel crash by null pointer dereference.
The root cause is that memory_failure() uses force_sig() to forcibly kill
asynchronous (meaning not in the current context) processes. As discussed
in thread https://lkml.org/lkml/2010/6/8/236 years ago for OOM fixes, this
is not a right thing to do. OOM solves this issue by using
do_send_sig_info() as done in commit d2d393099de2 ("signal: oom_kill_task:
use SEND_SIG_FORCED instead of force_sig()"), so this patch is suggesting
to do the same for hwpoison. do_send_sig_info() properly accesses to
siglock with lock_task_sighand(), so is free from the reported race.
I confirmed that the reported bug reproduces with inserting some delay in
kill_procs(), and it never reproduces with this patch.
Note that memory_failure() can send another type of signal using
force_sig_mceerr(), and the reported race shouldn't happen on it because
force_sig_mceerr() is called only for synchronous processes (i.e.
BUS_MCEERR_AR happens only when some process accesses to the corrupted
memory.)
Link: http://lkml.kernel.org/r/20190116093046.GA29835@hori1.linux.bs1.fc.nec.co.jp
Signed-off-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Reported-by: Jane Chu <jane.chu(a)oracle.com>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/memory-failure.c~mm-hwpoison-use-do_send_sig_info-instead-of-force_sig-re-pmem-error-handling-forces-sigkill-causes-kernel-panic
+++ a/mm/memory-failure.c
@@ -372,7 +372,8 @@ static void kill_procs(struct list_head
if (fail || tk->addr_valid == 0) {
pr_err("Memory failure: %#lx: forcibly killing %s:%d because of failure to unmap corrupted page\n",
pfn, tk->tsk->comm, tk->tsk->pid);
- force_sig(SIGKILL, tk->tsk);
+ do_send_sig_info(SIGKILL, SEND_SIG_PRIV,
+ tk->tsk, PIDTYPE_PID);
}
/*
_
Patches currently in -mm which might be from n-horiguchi(a)ah.jp.nec.com are
mm-hwpoison-use-do_send_sig_info-instead-of-force_sig-re-pmem-error-handling-forces-sigkill-causes-kernel-panic.patch
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ef68e831840c40c7d01b328b3c0f5d8c4796c232 Mon Sep 17 00:00:00 2001
From: Pavel Shilovsky <pshilov(a)microsoft.com>
Date: Fri, 18 Jan 2019 17:25:36 -0800
Subject: [PATCH] CIFS: Do not reconnect TCP session in add_credits()
When executing add_credits() we currently call cifs_reconnect()
if the number of credits is zero and there are no requests in
flight. In this case we may call cifs_reconnect() recursively
twice and cause memory corruption given the following sequence
of functions:
mid1.callback() -> add_credits() -> cifs_reconnect() ->
-> mid2.callback() -> add_credits() -> cifs_reconnect().
Fix this by avoiding to call cifs_reconnect() in add_credits()
and checking for zero credits in the demultiplex thread.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov(a)microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 683310f26171..8463c940e0e5 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -720,6 +720,21 @@ server_unresponsive(struct TCP_Server_Info *server)
return false;
}
+static inline bool
+zero_credits(struct TCP_Server_Info *server)
+{
+ int val;
+
+ spin_lock(&server->req_lock);
+ val = server->credits + server->echo_credits + server->oplock_credits;
+ if (server->in_flight == 0 && val == 0) {
+ spin_unlock(&server->req_lock);
+ return true;
+ }
+ spin_unlock(&server->req_lock);
+ return false;
+}
+
static int
cifs_readv_from_socket(struct TCP_Server_Info *server, struct msghdr *smb_msg)
{
@@ -732,6 +747,12 @@ cifs_readv_from_socket(struct TCP_Server_Info *server, struct msghdr *smb_msg)
for (total_read = 0; msg_data_left(smb_msg); total_read += length) {
try_to_freeze();
+ /* reconnect if no credits and no requests in flight */
+ if (zero_credits(server)) {
+ cifs_reconnect(server);
+ return -ECONNABORTED;
+ }
+
if (server_unresponsive(server))
return -ECONNABORTED;
if (cifs_rdma_enabled(server) && server->smbd_conn)
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 1d3ce127b02e..fa8d2e1076c8 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -34,6 +34,7 @@
#include "cifs_ioctl.h"
#include "smbdirect.h"
+/* Change credits for different ops and return the total number of credits */
static int
change_conf(struct TCP_Server_Info *server)
{
@@ -41,17 +42,15 @@ change_conf(struct TCP_Server_Info *server)
server->oplock_credits = server->echo_credits = 0;
switch (server->credits) {
case 0:
- return -1;
+ return 0;
case 1:
server->echoes = false;
server->oplocks = false;
- cifs_dbg(VFS, "disabling echoes and oplocks\n");
break;
case 2:
server->echoes = true;
server->oplocks = false;
server->echo_credits = 1;
- cifs_dbg(FYI, "disabling oplocks\n");
break;
default:
server->echoes = true;
@@ -64,14 +63,15 @@ change_conf(struct TCP_Server_Info *server)
server->echo_credits = 1;
}
server->credits -= server->echo_credits + server->oplock_credits;
- return 0;
+ return server->credits + server->echo_credits + server->oplock_credits;
}
static void
smb2_add_credits(struct TCP_Server_Info *server, const unsigned int add,
const int optype)
{
- int *val, rc = 0;
+ int *val, rc = -1;
+
spin_lock(&server->req_lock);
val = server->ops->get_credits_field(server, optype);
@@ -101,8 +101,26 @@ smb2_add_credits(struct TCP_Server_Info *server, const unsigned int add,
}
spin_unlock(&server->req_lock);
wake_up(&server->request_q);
- if (rc)
- cifs_reconnect(server);
+
+ if (server->tcpStatus == CifsNeedReconnect)
+ return;
+
+ switch (rc) {
+ case -1:
+ /* change_conf hasn't been executed */
+ break;
+ case 0:
+ cifs_dbg(VFS, "Possible client or server bug - zero credits\n");
+ break;
+ case 1:
+ cifs_dbg(VFS, "disabling echoes and oplocks\n");
+ break;
+ case 2:
+ cifs_dbg(FYI, "disabling oplocks\n");
+ break;
+ default:
+ cifs_dbg(FYI, "add %u credits total=%d\n", add, rc);
+ }
}
static void
When resuming, we check whether or not any previously connected
MST topologies are still present and if so, attempt to resume them. If
this fails, we disable said MST topologies and fire off a hotplug event
so that userspace knows to reprobe.
However, sending a hotplug event involves calling
drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a
connector reprobe in the caller's thread - something we can't do at the
point in which i915 calls drm_dp_mst_topology_mgr_resume() since
hotplugging hasn't been fully initialized yet.
This currently causes some rather subtle but fatal issues. For example,
on my T480s the laptop dock connected to it usually disappears during a
suspend cycle, and comes back up a short while after the system has been
resumed. This guarantees pretty much every suspend and resume cycle,
drm_dp_mst_topology_mgr_set_mst(mgr, false); will be caused and in turn,
a connector hotplug will occur. Now it's Rute Goldberg time: when the
connector hotplug occurs, i915 reprobes /all/ of the connectors,
including eDP. However, eDP probing requires that we power on the panel
VDD which in turn, grabs a wakeref to the appropriate power domain on
the GPU (on my T480s, this is the PORT_DDI_A_IO domain). This is where
things start breaking, since this all happens before
intel_power_domains_enable() is called we end up leaking the wakeref
that was acquired and never releasing it later. Come next suspend/resume
cycle, this causes us to fail to shut down the GPU properly, which
causes it not to resume properly and die a horrible complicated death.
(as a note: this only happens when there's both an eDP panel and MST
topology connected which is removed mid-suspend. One or the other seems
to always be OK).
We could try to fix the VDD wakeref leak, but this doesn't seem like
it's worth it at all since we aren't able to handle hotplug detection
while resuming anyway. So, let's go with a more robust solution inspired
by nouveau: block fbdev from handling hotplug events until we resume
fbdev. This allows us to still send sysfs hotplug events to be handled
later by user space while we're resuming, while also preventing us from
actually processing any hotplug events we receive until it's safe.
This fixes the wakeref leak observed on the T480s and as such, also
fixes suspend/resume with MST topologies connected on this machine.
Changes since v2:
* Don't call drm_fb_helper_hotplug_event() under lock, do it after lock
(Chris Wilson)
* Don't call drm_fb_helper_hotplug_event() in
intel_fbdev_output_poll_changed() under lock (Chris Wilson)
* Always set ifbdev->hpd_waiting (Chris Wilson)
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 0e32b39ceed6 ("drm/i915: add DP 1.2 MST support (v0.7)")
Cc: Todd Previte <tprevite(a)gmail.com>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Imre Deak <imre.deak(a)intel.com>
Cc: intel-gfx(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v3.17+
---
drivers/gpu/drm/i915/intel_drv.h | 10 +++++++++
drivers/gpu/drm/i915/intel_fbdev.c | 33 +++++++++++++++++++++++++++++-
2 files changed, 42 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 90ba5436370e..9ec3d00fbd19 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -213,6 +213,16 @@ struct intel_fbdev {
unsigned long vma_flags;
async_cookie_t cookie;
int preferred_bpp;
+
+ /* Whether or not fbdev hpd processing is temporarily suspended */
+ bool hpd_suspended : 1;
+ /* Set when a hotplug was received while HPD processing was
+ * suspended
+ */
+ bool hpd_waiting : 1;
+
+ /* Protects hpd_suspended */
+ struct mutex hpd_lock;
};
struct intel_encoder {
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 8cf3efe88f02..376ffe842e26 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -681,6 +681,7 @@ int intel_fbdev_init(struct drm_device *dev)
if (ifbdev == NULL)
return -ENOMEM;
+ mutex_init(&ifbdev->hpd_lock);
drm_fb_helper_prepare(dev, &ifbdev->helper, &intel_fb_helper_funcs);
if (!intel_fbdev_init_bios(dev, ifbdev))
@@ -754,6 +755,26 @@ void intel_fbdev_fini(struct drm_i915_private *dev_priv)
intel_fbdev_destroy(ifbdev);
}
+/* Suspends/resumes fbdev processing of incoming HPD events. When resuming HPD
+ * processing, fbdev will perform a full connector reprobe if a hotplug event
+ * was received while HPD was suspended.
+ */
+static void intel_fbdev_hpd_set_suspend(struct intel_fbdev *ifbdev, int state)
+{
+ bool send_hpd = false;
+
+ mutex_lock(&ifbdev->hpd_lock);
+ ifbdev->hpd_suspended = state == FBINFO_STATE_SUSPENDED;
+ send_hpd = !ifbdev->hpd_suspended && ifbdev->hpd_waiting;
+ ifbdev->hpd_waiting = false;
+ mutex_unlock(&ifbdev->hpd_lock);
+
+ if (send_hpd) {
+ DRM_DEBUG_KMS("Handling delayed fbcon HPD event\n");
+ drm_fb_helper_hotplug_event(&ifbdev->helper);
+ }
+}
+
void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous)
{
struct drm_i915_private *dev_priv = to_i915(dev);
@@ -775,6 +796,7 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
*/
if (state != FBINFO_STATE_RUNNING)
flush_work(&dev_priv->fbdev_suspend_work);
+
console_lock();
} else {
/*
@@ -802,17 +824,26 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
drm_fb_helper_set_suspend(&ifbdev->helper, state);
console_unlock();
+
+ intel_fbdev_hpd_set_suspend(ifbdev, state);
}
void intel_fbdev_output_poll_changed(struct drm_device *dev)
{
struct intel_fbdev *ifbdev = to_i915(dev)->fbdev;
+ bool send_hpd;
if (!ifbdev)
return;
intel_fbdev_sync(ifbdev);
- if (ifbdev->vma || ifbdev->helper.deferred_setup)
+
+ mutex_lock(&ifbdev->hpd_lock);
+ send_hpd = !ifbdev->hpd_suspended;
+ ifbdev->hpd_waiting = true;
+ mutex_unlock(&ifbdev->hpd_lock);
+
+ if (send_hpd && (ifbdev->vma || ifbdev->helper.deferred_setup))
drm_fb_helper_hotplug_event(&ifbdev->helper);
}
--
2.20.1