From: Eric Biggers <ebiggers(a)google.com>
Subject: ipc/shm: fix use-after-free of shm file via remap_file_pages()
syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
shm_get_unmapped_area(), called via sys_remap_file_pages(). Unfortunately
it couldn't generate a reproducer, but I found a bug which I think caused
it. When remap_file_pages() is passed a full System V shared memory
segment, the memory is first unmapped, then a new map is created using the
->vm_file. Between these steps, the shm ID can be removed and reused for
a new shm segment. But, shm_mmap() only checks whether the ID is
currently valid before calling the underlying file's ->mmap(); it doesn't
check whether it was reused. Thus it can use the wrong underlying file,
one that was already freed.
Fix this by making the "outer" shm file (the one that gets put in
->vm_file) hold a reference to the real shm file, and by making
__shm_open() require that the file associated with the shm ID matches the
one associated with the "outer" file. Taking the reference to the real
shm file is needed to fully solve the problem, since otherwise sfd->file
could point to a freed file, which then could be reallocated for the
reused shm ID, causing the wrong shm segment to be mapped (and without the
required permission checks).
Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in
shm_mmap()") almost fixed this bug, but it didn't go far enough because it
didn't consider the case where the shm ID is reused.
The following program usually reproduces this bug:
#include <stdlib.h>
#include <sys/shm.h>
#include <sys/syscall.h>
#include <unistd.h>
int main()
{
int is_parent = (fork() != 0);
srand(getpid());
for (;;) {
int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
if (is_parent) {
void *addr = shmat(id, NULL, 0);
usleep(rand() % 50);
while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
} else {
usleep(rand() % 50);
shmctl(id, IPC_RMID, NULL);
}
}
}
It causes the following NULL pointer dereference due to a 'struct file'
being used while it's being freed. (I couldn't actually get a KASAN
use-after-free splat like in the syzbot report. But I think it's possible
with this bug; it would just take a more extraordinary race...)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
[...]
Call Trace:
file_accessed include/linux/fs.h:2063 [inline]
shmem_mmap+0x25/0x40 mm/shmem.c:2149
call_mmap include/linux/fs.h:1789 [inline]
shm_mmap+0x34/0x80 ipc/shm.c:465
call_mmap include/linux/fs.h:1789 [inline]
mmap_region+0x309/0x5b0 mm/mmap.c:1712
do_mmap+0x294/0x4a0 mm/mmap.c:1483
do_mmap_pgoff include/linux/mm.h:2235 [inline]
SYSC_remap_file_pages mm/mmap.c:2853 [inline]
SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ebiggers(a)google.com: add comment]
Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com
Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439(a)syzkaller.appspotmail.com
Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Davidlohr Bueso <dbueso(a)suse.de>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: "Eric W . Biederman" <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
ipc/shm.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff -puN ipc/shm.c~ipc-shm-fix-use-after-free-of-shm-file-via-remap_file_pages ipc/shm.c
--- a/ipc/shm.c~ipc-shm-fix-use-after-free-of-shm-file-via-remap_file_pages
+++ a/ipc/shm.c
@@ -225,6 +225,12 @@ static int __shm_open(struct vm_area_str
if (IS_ERR(shp))
return PTR_ERR(shp);
+ if (shp->shm_file != sfd->file) {
+ /* ID was reused */
+ shm_unlock(shp);
+ return -EINVAL;
+ }
+
shp->shm_atim = ktime_get_real_seconds();
ipc_update_pid(&shp->shm_lprid, task_tgid(current));
shp->shm_nattch++;
@@ -455,8 +461,9 @@ static int shm_mmap(struct file *file, s
int ret;
/*
- * In case of remap_file_pages() emulation, the file can represent
- * removed IPC ID: propogate shm_lock() error to caller.
+ * In case of remap_file_pages() emulation, the file can represent an
+ * IPC ID that was removed, and possibly even reused by another shm
+ * segment already. Propagate this case as an error to caller.
*/
ret = __shm_open(vma);
if (ret)
@@ -480,6 +487,7 @@ static int shm_release(struct inode *ino
struct shm_file_data *sfd = shm_file_data(file);
put_ipc_ns(sfd->ns);
+ fput(sfd->file);
shm_file_data(file) = NULL;
kfree(sfd);
return 0;
@@ -1445,7 +1453,16 @@ long do_shmat(int shmid, char __user *sh
file->f_mapping = shp->shm_file->f_mapping;
sfd->id = shp->shm_perm.id;
sfd->ns = get_ipc_ns(ns);
- sfd->file = shp->shm_file;
+ /*
+ * We need to take a reference to the real shm file to prevent the
+ * pointer from becoming stale in cases where the lifetime of the outer
+ * file extends beyond that of the shm segment. It's not usually
+ * possible, but it can happen during remap_file_pages() emulation as
+ * that unmaps the memory, then does ->mmap() via file reference only.
+ * We'll deny the ->mmap() if the shm segment was since removed, but to
+ * detect shm ID reuse we need to compare the file pointers.
+ */
+ sfd->file = get_file(shp->shm_file);
sfd->vm_ops = NULL;
err = security_mmap_file(file, prot, flags);
_
From: "Michael S. Tsirkin" <mst(a)redhat.com>
Subject: get_user_pages_fast(): return -EFAULT on access_ok failure
get_user_pages_fast is supposed to be a faster drop-in equivalent of
get_user_pages. As such, callers expect it to return a negative return
code when passed an invalid address, and never expect it to return 0 when
passed a positive number of pages, since its documentation says:
* Returns number of pages pinned. This may be fewer than the number
* requested. If nr_pages is 0 or negative, returns 0. If no pages
* were pinned, returns -errno.
When get_user_pages_fast fall back on get_user_pages this is exactly what
happens. Unfortunately the implementation is inconsistent: it returns 0
if passed a kernel address, confusing callers: for example, the following
is pretty common but does not appear to do the right thing with a kernel
address:
ret = get_user_pages_fast(addr, 1, writeable, &page);
if (ret < 0)
return ret;
Change get_user_pages_fast to return -EFAULT when supplied a kernel
address to make it match expectations.
All callers have been audited for consistency with the documented
semantics.
Link: http://lkml.kernel.org/r/1522962072-182137-4-git-send-email-mst@redhat.com
Fixes: 5b65c4677a57 ("mm, x86/mm: Fix performance regression in get_user_pages_fast()")
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
Reported-by: syzbot+6304bf97ef436580fede(a)syzkaller.appspotmail.com
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Thorsten Leemhuis <regressions(a)leemhuis.info>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff -puN mm/gup.c~gup-return-efault-on-access_ok-failure mm/gup.c
--- a/mm/gup.c~gup-return-efault-on-access_ok-failure
+++ a/mm/gup.c
@@ -1806,9 +1806,12 @@ int get_user_pages_fast(unsigned long st
len = (unsigned long) nr_pages << PAGE_SHIFT;
end = start + len;
+ if (nr_pages <= 0)
+ return 0;
+
if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
(void __user *)start, len)))
- return 0;
+ return -EFAULT;
if (gup_fast_permitted(start, nr_pages, write)) {
local_irq_disable();
_
From: "Michael S. Tsirkin" <mst(a)redhat.com>
Subject: mm/gup_benchmark: handle gup failures
Patch series "mm/get_user_pages_fast fixes, cleanups", v2.
Turns out get_user_pages_fast and __get_user_pages_fast return different
values on error when given a single page: __get_user_pages_fast returns 0.
get_user_pages_fast returns either 0 or an error.
Callers of get_user_pages_fast expect an error so fix it up to return an
error consistently.
Stress the difference between get_user_pages_fast and
__get_user_pages_fast to make sure callers aren't confused.
This patch (of 3):
__gup_benchmark_ioctl does not handle the case where get_user_pages_fast
fails:
- a negative return code will cause a buffer overrun
- returning with partial success will cause use of
uninitialized memory.
[akpm(a)linux-foundation.org: simplification]
Link: http://lkml.kernel.org/r/1522962072-182137-3-git-send-email-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Thorsten Leemhuis <regressions(a)leemhuis.info>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup_benchmark.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff -puN mm/gup_benchmark.c~mm-gup_benchmark-handle-gup-failures mm/gup_benchmark.c
--- a/mm/gup_benchmark.c~mm-gup_benchmark-handle-gup-failures
+++ a/mm/gup_benchmark.c
@@ -23,7 +23,7 @@ static int __gup_benchmark_ioctl(unsigne
struct page **pages;
nr_pages = gup->size / PAGE_SIZE;
- pages = kvmalloc(sizeof(void *) * nr_pages, GFP_KERNEL);
+ pages = kvzalloc(sizeof(void *) * nr_pages, GFP_KERNEL);
if (!pages)
return -ENOMEM;
@@ -41,6 +41,8 @@ static int __gup_benchmark_ioctl(unsigne
}
nr = get_user_pages_fast(addr, nr, gup->flags & 1, pages + i);
+ if (nr <= 0)
+ break;
i += nr;
}
end_time = ktime_get();
_
From: Takashi Iwai <tiwai(a)suse.de>
Subject: resource: fix integer overflow at reallocation
We've got a bug report indicating a kernel panic at booting on an x86-32
system, and it turned out to be the invalid PCI resource assigned after
reallocation. __find_resource() first aligns the resource start address
and resets the end address with start+size-1 accordingly, then checks
whether it's contained. Here the end address may overflow the integer,
although resource_contains() still returns true because the function
validates only start and end address. So this ends up with returning an
invalid resource (start > end).
There was already an attempt to cover such a problem in the commit
47ea91b4052d ("Resource: fix wrong resource window calculation"), but this
case is an overseen one.
This patch adds the validity check of the newly calculated resource for
avoiding the integer overflow problem.
Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1086739
Link: http://lkml.kernel.org/r/s5hpo37d5l8.wl-tiwai@suse.de
Fixes: 23c570a67448 ("resource: ability to resize an allocated resource")
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Reported-by: Michael Henders <hendersm(a)shaw.ca>
Tested-by: Michael Henders <hendersm(a)shaw.ca>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ram Pai <linuxram(a)us.ibm.com>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/resource.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff -puN kernel/resource.c~resource-fix-integer-overflow-at-reallocation-v1 kernel/resource.c
--- a/kernel/resource.c~resource-fix-integer-overflow-at-reallocation-v1
+++ a/kernel/resource.c
@@ -651,7 +651,8 @@ static int __find_resource(struct resour
alloc.start = constraint->alignf(constraint->alignf_data, &avail,
size, constraint->align);
alloc.end = alloc.start + size - 1;
- if (resource_contains(&avail, &alloc)) {
+ if (alloc.start <= alloc.end &&
+ resource_contains(&avail, &alloc)) {
new->start = alloc.start;
new->end = alloc.end;
return 0;
_
The patch titled
Subject: mm-gup_benchmark-handle-gup-failures-fix
has been removed from the -mm tree. Its filename was
mm-gup_benchmark-handle-gup-failures-fix.patch
This patch was dropped because it was folded into mm-gup_benchmark-handle-gup-failures.patch
------------------------------------------------------
From: Andrew Morton <akpm(a)linux-foundation.org>
Subject: mm-gup_benchmark-handle-gup-failures-fix
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Thorsten Leemhuis <regressions(a)leemhuis.info>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup_benchmark.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff -puN mm/gup_benchmark.c~mm-gup_benchmark-handle-gup-failures-fix mm/gup_benchmark.c
--- a/mm/gup_benchmark.c~mm-gup_benchmark-handle-gup-failures-fix
+++ a/mm/gup_benchmark.c
@@ -41,8 +41,9 @@ static int __gup_benchmark_ioctl(unsigne
}
nr = get_user_pages_fast(addr, nr, gup->flags & 1, pages + i);
- if (nr > 0)
- i += nr;
+ if (nr <= 0)
+ break;
+ i += nr;
}
end_time = ktime_get();
_
Patches currently in -mm which might be from akpm(a)linux-foundation.org are
i-need-old-gcc.patch
mm-gup_benchmark-handle-gup-failures.patch
mm-pagemap-fix-swap-offset-value-for-pmd-migration-entry-fix.patch
writeback-safer-lock-nesting-fix.patch
arm-arch-arm-include-asm-pageh-needs-personalityh.patch
ocfs2-without-quota-support-try-to-avoid-calling-quota-recovery-checkpatch-fixes.patch
mm.patch
list_lru-prefetch-neighboring-list-entries-before-acquiring-lock-fix.patch
mm-oom-cgroup-aware-oom-killer-fix.patch
mm-oom-docs-describe-the-cgroup-aware-oom-killer-fix-2-fix.patch
linux-next-rejects.patch
fs-fsnotify-account-fsnotify-metadata-to-kmemcg-fix.patch
kernel-forkc-export-kernel_thread-to-modules.patch
slab-leaks3-default-y.patch
The patch titled
Subject: autofs: mount point create should honour passed in mode
has been added to the -mm tree. Its filename is
autofs-mount-point-create-should-honour-passed-in-mode.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/autofs-mount-point-create-should-h…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/autofs-mount-point-create-should-h…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Ian Kent <raven(a)themaw.net>
Subject: autofs: mount point create should honour passed in mode
The autofs file system mkdir inode operation blindly sets the created
directory mode to S_IFDIR | 0555, ingoring the passed in mode, which can
cause selinux dac_override denials.
But the function also checks if the caller is the daemon (as no-one else
should be able to do anything here) so there's no point in not honouring
the passed in mode, allowing the daemon to set appropriate mode when
required.
Link: http://lkml.kernel.org/r/152361593601.8051.14014139124905996173.stgit@pluto…
Signed-off-by: Ian Kent <raven(a)themaw.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/autofs4/root.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN fs/autofs4/root.c~autofs-mount-point-create-should-honour-passed-in-mode fs/autofs4/root.c
--- a/fs/autofs4/root.c~autofs-mount-point-create-should-honour-passed-in-mode
+++ a/fs/autofs4/root.c
@@ -749,7 +749,7 @@ static int autofs4_dir_mkdir(struct inod
autofs4_del_active(dentry);
- inode = autofs4_get_inode(dir->i_sb, S_IFDIR | 0555);
+ inode = autofs4_get_inode(dir->i_sb, S_IFDIR | mode);
if (!inode)
return -ENOMEM;
d_add(dentry, inode);
_
Patches currently in -mm which might be from raven(a)themaw.net are
autofs-mount-point-create-should-honour-passed-in-mode.patch