From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/sparse: fix kernel crash with pfn_section_valid check
Fix the below crash
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc000000000c3447c
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
...
NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
LR [c000000000088354] vmemmap_free+0x144/0x320
Call Trace:
section_deactivate+0x220/0x240
__remove_pages+0x118/0x170
arch_remove_memory+0x3c/0x150
memunmap_pages+0x1cc/0x2f0
devm_action_release+0x30/0x50
release_nodes+0x2f8/0x3e0
device_release_driver_internal+0x168/0x270
unbind_store+0x130/0x170
drv_attr_store+0x44/0x60
sysfs_kf_write+0x68/0x80
kernfs_fop_write+0x100/0x290
__vfs_write+0x3c/0x70
vfs_write+0xcc/0x240
ksys_write+0x7c/0x140
system_call+0x5c/0x68
The crash is due to NULL dereference at
test_bit(idx, ms->usage->subsection_map); due to ms->usage = NULL; in
pfn_section_valid()
With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in
SPARSEMEM|!VMEMMAP case") section_mem_map is set to NULL after
depopulate_section_mem(). This was done so that pfn_page() can work
correctly with kernel config that disables SPARSEMEM_VMEMMAP. With that
config pfn_to_page does
__section_mem_map_addr(__sec) + __pfn;
where
static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
unsigned long map = section->section_mem_map;
map &= SECTION_MAP_MASK;
return (struct page *)map;
}
Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is
used to check the pfn validity (pfn_valid()). Since section_deactivate
release mem_section->usage if a section is fully deactivated, pfn_valid()
check after a subsection_deactivate cause a kernel crash.
static inline int pfn_valid(unsigned long pfn)
{
...
return early_section(ms) || pfn_section_valid(ms, pfn);
}
where
static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
int idx = subsection_map_index(pfn);
return test_bit(idx, ms->usage->subsection_map);
}
Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is
freed. For architectures like ppc64 where large pages are used for
vmmemap mapping (16MB), a specific vmemmap mapping can cover multiple
sections. Hence before a vmemmap mapping page can be freed, the kernel
needs to make sure there are no valid sections within that mapping.
Clearing the section valid bit before depopulate_section_memap enables
this.
[aneesh.kumar(a)linux.ibm.com: add comment]
Link: http://lkml.kernel.org/r/20200326133235.343616-1-aneesh.kumar@linux.ibm.com…: http://lkml.kernel.org/r/20200325031914.107660-1-aneesh.kumar@linux.ibm.com
Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Reported-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux(a)gmail.com>
Reviewed-by: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/mm/sparse.c~mm-sparse-fix-kernel-crash-with-pfn_section_valid-check
+++ a/mm/sparse.c
@@ -781,6 +781,12 @@ static void section_deactivate(unsigned
ms->usage = NULL;
}
memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+ /*
+ * Mark the section invalid so that valid_section()
+ * return false. This prevents code from dereferencing
+ * ms->usage array.
+ */
+ ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
}
if (section_is_early && memmap)
_
The patch titled
Subject: mm-sparse-fix-kernel-crash-with-pfn_section_valid-check-v2
has been removed from the -mm tree. Its filename was
mm-sparse-fix-kernel-crash-with-pfn_section_valid-check-v2.patch
This patch was dropped because it was folded into mm-sparse-fix-kernel-crash-with-pfn_section_valid-check.patch
------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm-sparse-fix-kernel-crash-with-pfn_section_valid-check-v2
add comment
Link: http://lkml.kernel.org/r/20200326133235.343616-1-aneesh.kumar@linux.ibm.com
Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Reported-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux(a)gmail.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Wei Yang <richardw.yang(a)linux.intel.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/mm/sparse.c~mm-sparse-fix-kernel-crash-with-pfn_section_valid-check-v2
+++ a/mm/sparse.c
@@ -781,7 +781,11 @@ static void section_deactivate(unsigned
ms->usage = NULL;
}
memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
- /* Mark the section invalid */
+ /*
+ * Mark the section invalid so that valid_section()
+ * return false. This prevents code from dereferencing
+ * ms->usage array.
+ */
ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
}
_
Patches currently in -mm which might be from aneesh.kumar(a)linux.ibm.com are
mm-sparse-fix-kernel-crash-with-pfn_section_valid-check.patch
The patch titled
Subject: umh: fix refcount underflow in fork_usermode_blob().
has been added to the -mm tree. Its filename is
umh-fix-refcount-underflow-in-fork_usermode_blob.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/umh-fix-refcount-underflow-in-fork…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/umh-fix-refcount-underflow-in-fork…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Subject: umh: fix refcount underflow in fork_usermode_blob().
Since free_bprm(bprm) always calls allow_write_access(bprm->file) and
fput(bprm->file) if bprm->file is set to non-NULL, __do_execve_file()
must call deny_write_access(file) and get_file(file) if called from
do_execve_file() path. Otherwise, use-after-free access can happen at
fput(file) in fork_usermode_blob().
general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU: 3 PID: 4131 Comm: insmod Tainted: G O 5.6.0-rc5+ #978
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
RIP: 0010:fork_usermode_blob+0xaa/0x190
Link: http://lkml.kernel.org/r/9b846b1f-a231-4f09-8c37-6bfb0d1e7b05@i-love.sakura…
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Fixes: 449325b52b7a6208 ("umh: introduce fork_usermode_blob() helper")
Cc: <stable(a)vger.kernel.org> [4.18+]
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/exec.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
--- a/fs/exec.c~umh-fix-refcount-underflow-in-fork_usermode_blob
+++ a/fs/exec.c
@@ -1761,11 +1761,17 @@ static int __do_execve_file(int fd, stru
check_unsafe_exec(bprm);
current->in_execve = 1;
- if (!file)
+ if (!file) {
file = do_open_execat(fd, filename, flags);
- retval = PTR_ERR(file);
- if (IS_ERR(file))
- goto out_unmark;
+ retval = PTR_ERR(file);
+ if (IS_ERR(file))
+ goto out_unmark;
+ } else {
+ retval = deny_write_access(file);
+ if (retval)
+ goto out_unmark;
+ get_file(file);
+ }
sched_exec();
_
Patches currently in -mm which might be from penguin-kernel(a)i-love.sakura.ne.jp are
kernel-hung_taskc-monitor-killed-tasks.patch
umh-fix-refcount-underflow-in-fork_usermode_blob.patch
Hi,
Please consider applying the following patches to v4.4.y.
The following patches were found to be missing in v4.4.y by the ChromeOS
missing patches robot. The patches meet the following criteria.
- The patch includes a Fixes: tag
- The patch referenced in the Fixes: tag has been applied to v4.4.y
- The patch itself has not been applied to v4.4.y
All patches have been applied to v4.4.y and chromeos-4-4. Resulting images
have been build- and runtime-tested on real hardware running chromeos-4.4
and with virtual hardware on kerneltests.org.
Upstream commit 14fa91e0fef8 ("IB/ipoib: Do not warn if IPoIB debugfs doesn't exist")
Fixes: 771a52584096 ("IB/IPoIB: ibX: failed to create mcg debug file")
in v4.4.y: 771a52584096
Upstream commit efc45154828a ("uapi glibc compat: fix outer guard of net device flags enum")
Fixes: 4a91cb61bb99 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h")
in v4.4.y: 1575c095e444
Upstream commit c4409905cd6e ("KVM: VMX: Do not allow reexecute_instruction() when skipping MMIO instr")
Fixes: d391f1207067 ("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested")
in v4.4.y: 0c53038267a9
Notes:
This patch also applies to v4.9.y.
This patch has already been applied to v4.14.y.
This patch does not affect v4.19.y and later.
Upstream commit b76ba4af4ddd ("drivers/hwspinlock: use correct radix tree API")
Fixes: c6400ba7e13a ("drivers/hwspinlock: fix race between radix tree insertion and lookup")
in v4.4.y: 077b6173a8c8
Upstream commit 28d35bcdd392 ("net: ipv4: don't let PMTU updates increase route MTU")
Fixes: d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu")
in v4.4.y: 119bbaa6795a
Notes:
This patch also applies to v4.9.y.
This patch also applies to v4.14.y.
This patch does not affect v4.19.y and later.
Note: I generated this notification manually. I hope I'll have some
automation soon; until then I'll send similar notifications for other
kernel branches as I find the time.
Thanks,
Guenter
Commit 42d84c8490f9 ("vhost: Check docket sk_family instead of call getname")
fixes CVE-2020-10942. It has been applied to v4.14.y and later, but not to v4.4.y.
While it does not apply directly to v4.4.y, its backport to v4.14.y (commit
ff8e12b0cfe2 in v4.14.y) does. Please apply the backport to v4.4.y as well.
Thanks,
Guenter
The following commit has been merged into the smp/core branch of tip:
Commit-ID: e98eac6ff1b45e4e73f2e6031b37c256ccb5d36b
Gitweb: https://git.kernel.org/tip/e98eac6ff1b45e4e73f2e6031b37c256ccb5d36b
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Fri, 27 Mar 2020 12:06:44 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Sat, 28 Mar 2020 11:42:55 +01:00
cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
A recent change to freeze_secondary_cpus() which added an early abort if a
wakeup is pending missed the fact that the function is also invoked for
shutdown, reboot and kexec via disable_nonboot_cpus().
In case of disable_nonboot_cpus() the wakeup event needs to be ignored as
the purpose is to terminate the currently running kernel.
Add a 'suspend' argument which is only set when the freeze is in context of
a suspend operation. If not set then an eventually pending wakeup event is
ignored.
Fixes: a66d955e910a ("cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending")
Reported-by: Boqun Feng <boqun.feng(a)gmail.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Pavankumar Kondeti <pkondeti(a)codeaurora.org>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/874kuaxdiz.fsf@nanos.tec.linutronix.de
---
include/linux/cpu.h | 12 +++++++++---
kernel/cpu.c | 4 ++--
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 9ead281..beaed2d 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -144,12 +144,18 @@ static inline void get_online_cpus(void) { cpus_read_lock(); }
static inline void put_online_cpus(void) { cpus_read_unlock(); }
#ifdef CONFIG_PM_SLEEP_SMP
-extern int freeze_secondary_cpus(int primary);
+int __freeze_secondary_cpus(int primary, bool suspend);
+static inline int freeze_secondary_cpus(int primary)
+{
+ return __freeze_secondary_cpus(primary, true);
+}
+
static inline int disable_nonboot_cpus(void)
{
- return freeze_secondary_cpus(0);
+ return __freeze_secondary_cpus(0, false);
}
-extern void enable_nonboot_cpus(void);
+
+void enable_nonboot_cpus(void);
static inline int suspend_disable_secondary_cpus(void)
{
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 3084849..12ae636 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1327,7 +1327,7 @@ void bringup_nonboot_cpus(unsigned int setup_max_cpus)
#ifdef CONFIG_PM_SLEEP_SMP
static cpumask_var_t frozen_cpus;
-int freeze_secondary_cpus(int primary)
+int __freeze_secondary_cpus(int primary, bool suspend)
{
int cpu, error = 0;
@@ -1352,7 +1352,7 @@ int freeze_secondary_cpus(int primary)
if (cpu == primary)
continue;
- if (pm_wakeup_pending()) {
+ if (suspend && pm_wakeup_pending()) {
pr_info("Wakeup pending. Abort CPU freeze\n");
error = -EBUSY;
break;
This is a note to let you know that I've just added the patch titled
vt: vt_ioctl: fix use-after-free in vt_in_use()
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 7cf64b18b0b96e751178b8d0505d8466ff5a448f Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Sat, 21 Mar 2020 20:43:05 -0700
Subject: vt: vt_ioctl: fix use-after-free in vt_in_use()
vt_in_use() dereferences console_driver->ttys[i] without proper locking.
This is broken because the tty can be closed and freed concurrently.
We could fix this by using 'READ_ONCE(console_driver->ttys[i]) != NULL'
and skipping the check of tty_struct::count. But, looking at
console_driver->ttys[i] isn't really appropriate anyway because even if
it is NULL the tty can still be in the process of being closed.
Instead, fix it by making vt_in_use() require console_lock() and check
whether the vt is allocated and has port refcount > 1. This works since
following the patch "vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use
virtual console" the port refcount is incremented while the vt is open.
Reproducer (very unreliable, but it worked for me after a few minutes):
#include <fcntl.h>
#include <linux/vt.h>
int main()
{
int fd, nproc;
struct vt_stat state;
char ttyname[16];
fd = open("/dev/tty10", O_RDONLY);
for (nproc = 1; nproc < 8; nproc *= 2)
fork();
for (;;) {
sprintf(ttyname, "/dev/tty%d", rand() % 8);
close(open(ttyname, O_RDONLY));
ioctl(fd, VT_GETSTATE, &state);
}
}
KASAN report:
BUG: KASAN: use-after-free in vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline]
BUG: KASAN: use-after-free in vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657
Read of size 4 at addr ffff888065722468 by task syz-vt2/132
CPU: 0 PID: 132 Comm: syz-vt2 Not tainted 5.6.0-rc5-00130-g089b6d3654916 #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
Call Trace:
[...]
vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline]
vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657
tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660
[...]
Allocated by task 136:
[...]
kzalloc include/linux/slab.h:669 [inline]
alloc_tty_struct+0x96/0x8a0 drivers/tty/tty_io.c:2982
tty_init_dev+0x23/0x350 drivers/tty/tty_io.c:1334
tty_open_by_driver drivers/tty/tty_io.c:1987 [inline]
tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035
[...]
Freed by task 41:
[...]
kfree+0xbf/0x200 mm/slab.c:3757
free_tty_struct+0x8d/0xb0 drivers/tty/tty_io.c:177
release_one_tty+0x22d/0x2f0 drivers/tty/tty_io.c:1468
process_one_work+0x7f1/0x14b0 kernel/workqueue.c:2264
worker_thread+0x8b/0xc80 kernel/workqueue.c:2410
[...]
Fixes: 4001d7b7fc27 ("vt: push down the tty lock so we can see what is left to tackle")
Cc: <stable(a)vger.kernel.org> # v3.4+
Acked-by: Jiri Slaby <jslaby(a)suse.cz>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Link: https://lore.kernel.org/r/20200322034305.210082-3-ebiggers@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/vt/vt_ioctl.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c
index f62f498f63c0..daf61c28ba76 100644
--- a/drivers/tty/vt/vt_ioctl.c
+++ b/drivers/tty/vt/vt_ioctl.c
@@ -43,9 +43,15 @@ bool vt_dont_switch;
static inline bool vt_in_use(unsigned int i)
{
- extern struct tty_driver *console_driver;
+ const struct vc_data *vc = vc_cons[i].d;
- return console_driver->ttys[i] && console_driver->ttys[i]->count;
+ /*
+ * console_lock must be held to prevent the vc from being deallocated
+ * while we're checking whether it's in-use.
+ */
+ WARN_CONSOLE_UNLOCKED();
+
+ return vc && kref_read(&vc->port.kref) > 1;
}
static inline bool vt_busy(int i)
@@ -643,15 +649,16 @@ int vt_ioctl(struct tty_struct *tty,
struct vt_stat __user *vtstat = up;
unsigned short state, mask;
- /* Review: FIXME: Console lock ? */
if (put_user(fg_console + 1, &vtstat->v_active))
ret = -EFAULT;
else {
state = 1; /* /dev/tty0 is always open */
+ console_lock(); /* required by vt_in_use() */
for (i = 0, mask = 2; i < MAX_NR_CONSOLES && mask;
++i, mask <<= 1)
if (vt_in_use(i))
state |= mask;
+ console_unlock();
ret = put_user(state, &vtstat->v_state);
}
break;
@@ -661,10 +668,11 @@ int vt_ioctl(struct tty_struct *tty,
* Returns the first available (non-opened) console.
*/
case VT_OPENQRY:
- /* FIXME: locking ? - but then this is a stupid API */
+ console_lock(); /* required by vt_in_use() */
for (i = 0; i < MAX_NR_CONSOLES; ++i)
if (!vt_in_use(i))
break;
+ console_unlock();
uival = i < MAX_NR_CONSOLES ? (i+1) : -1;
goto setint;
--
2.26.0