The patch titled
Subject: mm: page_mapped: don't assume compound page is huge or THP
has been added to the -mm tree. Its filename is
mm-page_mapped-dont-assume-compound-page-is-huge-or-thp.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-page_mapped-dont-assume-compoun…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_mapped-dont-assume-compoun…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Jan Stancek <jstancek(a)redhat.com>
Subject: mm: page_mapped: don't assume compound page is huge or THP
LTP proc01 testcase has been observed to rarely trigger crashes
on arm64:
page_mapped+0x78/0xb4
stable_page_flags+0x27c/0x338
kpageflags_read+0xfc/0x164
proc_reg_read+0x7c/0xb8
__vfs_read+0x58/0x178
vfs_read+0x90/0x14c
SyS_read+0x60/0xc0
Issue is that page_mapped() assumes that if compound page is not huge,
then it must be THP. But if this is 'normal' compound page
(COMPOUND_PAGE_DTOR), then following loop can keep running (for
HPAGE_PMD_NR iterations) until it tries to read from memory that isn't
mapped and triggers a panic:
for (i = 0; i < hpage_nr_pages(page); i++) {
if (atomic_read(&page[i]._mapcount) >= 0)
return true;
}
I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only
with a custom kernel module [1] which:
- allocates compound page (PAGEC) of order 1
- allocates 2 normal pages (COPY), which are initialized to 0xff
(to satisfy _mapcount >= 0)
- 2 PAGEC page structs are copied to address of first COPY page
- second page of COPY is marked as not present
- call to page_mapped(COPY) now triggers fault on access to 2nd
COPY page at offset 0x30 (_mapcount)
[1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_cras…
Fix the loop to iterate for "1 << compound_order" pages.
Kirrill said "IIRC, sound subsystem can producuce custom mapped compound
pages".
Link: http://lkml.kernel.org/r/c440d69879e34209feba21e12d236d06bc0a25db.154357715…
Fixes: e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
Signed-off-by: Jan Stancek <jstancek(a)redhat.com>
Debugged-by: Laszlo Ersek <lersek(a)redhat.com>
Suggested-by: "Kirill A. Shutemov" <kirill(a)shutemov.name>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/util.c~mm-page_mapped-dont-assume-compound-page-is-huge-or-thp
+++ a/mm/util.c
@@ -478,7 +478,7 @@ bool page_mapped(struct page *page)
return true;
if (PageHuge(page))
return false;
- for (i = 0; i < hpage_nr_pages(page); i++) {
+ for (i = 0; i < (1 << compound_order(page)); i++) {
if (atomic_read(&page[i]._mapcount) >= 0)
return true;
}
_
Patches currently in -mm which might be from jstancek(a)redhat.com are
mm-page_mapped-dont-assume-compound-page-is-huge-or-thp.patch
Commit 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on
memcg charge fail") fixes a crash caused due to failed memcg charge of
the kernel stack. However the fix misses the cached_stacks case which
this patch fixes. So, the same crash can happen if the memcg charge of
a cached stack is failed.
Fixes: 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on memcg charge fail")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
---
kernel/fork.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/fork.c b/kernel/fork.c
index e4a51124661a..593cd1577dff 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -216,6 +216,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
memset(s->addr, 0, THREAD_SIZE);
tsk->stack_vm_area = s;
+ tsk->stack = s->addr;
return s->addr;
}
--
2.20.1.415.g653613c723-goog
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
Muyu Yu provided a POC where user root with CAP_NET_ADMIN can create a CAN
frame modification rule that makes the data length code a higher value than
the available CAN frame data size. In combination with a configured checksum
calculation where the result is stored relatively to the end of the data
(e.g. cgw_csum_xor_rel) the tail of the skb (e.g. frag_list pointer in
skb_shared_info) can be rewritten which finally can cause a system crash.
Michael Kubecek suggested to drop frames that have a DLC exceeding the
available space after the modification process and provided a patch that can
handle CAN FD frames too. Within this patch we also limit the length for the
checksum calculations to the maximum of Classic CAN data length (8).
CAN frames that are dropped by these additional checks are counted with the
CGW_DELETED counter which indicates misconfigurations in can-gw rules.
This fixes CVE-2019-3701.
Reported-by: Muyu Yu <ieatmuttonchuan(a)gmail.com>
Reported-by: Marcus Meissner <meissner(a)suse.de>
Suggested-by: Michal Kubecek <mkubecek(a)suse.cz>
Tested-by: Muyu Yu <ieatmuttonchuan(a)gmail.com>
Tested-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Cc: linux-stable <stable(a)vger.kernel.org> # >= v3.2
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
Hello,
I've removed the else from dlc length check. Keeps the code and the
patch more readable.
Marc
net/can/gw.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/net/can/gw.c b/net/can/gw.c
index faa3da88a127..bb85d815f092 100644
--- a/net/can/gw.c
+++ b/net/can/gw.c
@@ -416,13 +416,28 @@ static void can_can_gw_rcv(struct sk_buff *skb, void *data)
while (modidx < MAX_MODFUNCTIONS && gwj->mod.modfunc[modidx])
(*gwj->mod.modfunc[modidx++])(cf, &gwj->mod);
- /* check for checksum updates when the CAN frame has been modified */
+ /* Has the CAN frame been modified? */
if (modidx) {
- if (gwj->mod.csumfunc.crc8)
+ /* get available space for the processed CAN frame type */
+ int max_len = nskb->len - offsetof(struct can_frame, data);
+
+ /* dlc may have changed, make sure it fits to the CAN frame */
+ if (cf->can_dlc > max_len)
+ goto out_delete;
+
+ /* check for checksum updates in classic CAN length only */
+ if (gwj->mod.csumfunc.crc8) {
+ if (cf->can_dlc > 8)
+ goto out_delete;
+
(*gwj->mod.csumfunc.crc8)(cf, &gwj->mod.csum.crc8);
+ }
- if (gwj->mod.csumfunc.xor)
+ if (gwj->mod.csumfunc.xor) {
+ if (cf->can_dlc > 8)
+ goto out_delete;
(*gwj->mod.csumfunc.xor)(cf, &gwj->mod.csum.xor);
+ }
}
/* clear the skb timestamp if not configured the other way */
@@ -434,6 +449,14 @@ static void can_can_gw_rcv(struct sk_buff *skb, void *data)
gwj->dropped_frames++;
else
gwj->handled_frames++;
+
+ return;
+
+ out_delete:
+ /* delete frame due to misconfiguration */
+ gwj->deleted_frames++;
+ kfree_skb(nskb);
+ return;
}
static inline int cgw_register_filter(struct net *net, struct cgw_job *gwj)
--
2.20.1
Muyu Yu provided a POC where user root with CAP_NET_ADMIN can create a CAN
frame modification rule that makes the data length code a higher value than
the available CAN frame data size. In combination with a configured checksum
calculation where the result is stored relatively to the end of the data
(e.g. cgw_csum_xor_rel) the tail of the skb (e.g. frag_list pointer in
skb_shared_info) can be rewritten which finally can cause a system crash.
Michael Kubecek suggested to drop frames that have a DLC exceeding the
available space after the modification process and provided a patch that can
handle CAN FD frames too. Within this patch we also limit the length for the
checksum calculations to the maximum of Classic CAN data length (8).
CAN frames that are dropped by these additional checks are counted with the
CGW_DELETED counter which indicates misconfigurations in can-gw rules.
This fixes CVE-2019-3701.
Reported-by: Muyu Yu <ieatmuttonchuan(a)gmail.com>
Reported-by: Marcus Meissner <meissner(a)suse.de>
Suggested-by: Michal Kubecek <mkubecek(a)suse.cz>
Tested-by: Muyu Yu <ieatmuttonchuan(a)gmail.com>
Tested-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Cc: linux-stable <stable(a)vger.kernel.org> # >= v3.2
---
net/can/gw.c | 36 +++++++++++++++++++++++++++++++-----
1 file changed, 31 insertions(+), 5 deletions(-)
diff --git a/net/can/gw.c b/net/can/gw.c
index faa3da88a127..180c389af5b1 100644
--- a/net/can/gw.c
+++ b/net/can/gw.c
@@ -416,13 +416,31 @@ static void can_can_gw_rcv(struct sk_buff *skb, void *data)
while (modidx < MAX_MODFUNCTIONS && gwj->mod.modfunc[modidx])
(*gwj->mod.modfunc[modidx++])(cf, &gwj->mod);
- /* check for checksum updates when the CAN frame has been modified */
+ /* Has the CAN frame been modified? */
if (modidx) {
- if (gwj->mod.csumfunc.crc8)
- (*gwj->mod.csumfunc.crc8)(cf, &gwj->mod.csum.crc8);
+ /* get available space for the processed CAN frame type */
+ int max_len = nskb->len - offsetof(struct can_frame, data);
- if (gwj->mod.csumfunc.xor)
- (*gwj->mod.csumfunc.xor)(cf, &gwj->mod.csum.xor);
+ /* dlc may have changed, make sure it fits to the CAN frame */
+ if (cf->can_dlc > max_len)
+ goto out_delete;
+
+ /* check for checksum updates in classic CAN length only */
+ if (gwj->mod.csumfunc.crc8) {
+ if (cf->can_dlc > 8)
+ goto out_delete;
+ else
+ (*gwj->mod.csumfunc.crc8)
+ (cf, &gwj->mod.csum.crc8);
+ }
+
+ if (gwj->mod.csumfunc.xor) {
+ if (cf->can_dlc > 8)
+ goto out_delete;
+ else
+ (*gwj->mod.csumfunc.xor)
+ (cf, &gwj->mod.csum.xor);
+ }
}
/* clear the skb timestamp if not configured the other way */
@@ -434,6 +452,14 @@ static void can_can_gw_rcv(struct sk_buff *skb, void *data)
gwj->dropped_frames++;
else
gwj->handled_frames++;
+
+ return;
+
+out_delete:
+ /* delete frame due to misconfiguration */
+ gwj->deleted_frames++;
+ kfree_skb(nskb);
+ return;
}
static inline int cgw_register_filter(struct net *net, struct cgw_job *gwj)
--
2.19.2
The ARM Linux kernel handles the EABI syscall numbers as follows:
0 - NR_SYSCALLS-1 : Invoke syscall via syscall table
NR_SYSCALLS - 0xeffff : -ENOSYS (to be allocated in future)
0xf0000 - 0xf07ff : Private syscall or -ENOSYS if not allocated
> 0xf07ff : SIGILL
Our compat code gets this wrong and ends up sending SIGILL in response
to all syscalls greater than NR_SYSCALLS which have a value greater
than 0x7ff in the bottom 16 bits.
Fix this by defining the end of the ARM private syscall region and
checking the syscall number against that directly. Update the comment
while we're at it.
Cc: <stable(a)vger.kernel.org>
Cc: Dave Martin <Dave.Martin(a)arm.com>
Reported-by: Pi-Hsun Shih <pihsun(a)chromium.org>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
---
arch/arm64/include/asm/unistd.h | 5 +++--
arch/arm64/kernel/sys_compat.c | 4 ++--
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index b13ca091f833..be66a54ee3a1 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -40,8 +40,9 @@
* The following SVCs are ARM private.
*/
#define __ARM_NR_COMPAT_BASE 0x0f0000
-#define __ARM_NR_compat_cacheflush (__ARM_NR_COMPAT_BASE+2)
-#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE+5)
+#define __ARM_NR_compat_cacheflush (__ARM_NR_COMPAT_BASE + 2)
+#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
+#define __ARM_NR_compat_syscall_end (__ARM_NR_COMPAT_BASE + 0x800)
#define __NR_compat_syscalls 399
#endif
diff --git a/arch/arm64/kernel/sys_compat.c b/arch/arm64/kernel/sys_compat.c
index 32653d156747..5972b7533fa0 100644
--- a/arch/arm64/kernel/sys_compat.c
+++ b/arch/arm64/kernel/sys_compat.c
@@ -102,12 +102,12 @@ long compat_arm_syscall(struct pt_regs *regs)
default:
/*
- * Calls 9f00xx..9f07ff are defined to return -ENOSYS
+ * Calls 0xf0xxx..0xf07ff are defined to return -ENOSYS
* if not implemented, rather than raising SIGILL. This
* way the calling program can gracefully determine whether
* a feature is supported.
*/
- if ((no & 0xffff) <= 0x7ff)
+ if (no < __ARM_NR_compat_syscall_end)
return -ENOSYS;
break;
}
--
2.1.4
The syscall number may have been changed by a tracer, so we should pass
the actual number in from the caller instead of pulling it from the
saved r7 value directly.
Cc: <stable(a)vger.kernel.org>
Cc: Dave Martin <Dave.Martin(a)arm.com>
Cc: Pi-Hsun Shih <pihsun(a)chromium.org>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
---
arch/arm64/kernel/sys_compat.c | 9 ++++-----
arch/arm64/kernel/syscall.c | 9 ++++-----
2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kernel/sys_compat.c b/arch/arm64/kernel/sys_compat.c
index 5972b7533fa0..54c29cd38ff9 100644
--- a/arch/arm64/kernel/sys_compat.c
+++ b/arch/arm64/kernel/sys_compat.c
@@ -66,12 +66,11 @@ do_compat_cache_op(unsigned long start, unsigned long end, int flags)
/*
* Handle all unrecognised system calls.
*/
-long compat_arm_syscall(struct pt_regs *regs)
+long compat_arm_syscall(struct pt_regs *regs, int scno)
{
- unsigned int no = regs->regs[7];
void __user *addr;
- switch (no) {
+ switch (scno) {
/*
* Flush a region from virtual address 'r0' to virtual address 'r1'
* _exclusive_. There is no alignment requirement on either address;
@@ -107,7 +106,7 @@ long compat_arm_syscall(struct pt_regs *regs)
* way the calling program can gracefully determine whether
* a feature is supported.
*/
- if (no < __ARM_NR_compat_syscall_end)
+ if (scno < __ARM_NR_compat_syscall_end)
return -ENOSYS;
break;
}
@@ -116,6 +115,6 @@ long compat_arm_syscall(struct pt_regs *regs)
(compat_thumb_mode(regs) ? 2 : 4);
arm64_notify_die("Oops - bad compat syscall(2)", regs,
- SIGILL, ILL_ILLTRP, addr, no);
+ SIGILL, ILL_ILLTRP, addr, scno);
return 0;
}
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 032d22312881..5610ac01c1ec 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -13,16 +13,15 @@
#include <asm/thread_info.h>
#include <asm/unistd.h>
-long compat_arm_syscall(struct pt_regs *regs);
-
+long compat_arm_syscall(struct pt_regs *regs, int scno);
long sys_ni_syscall(void);
-asmlinkage long do_ni_syscall(struct pt_regs *regs)
+static long do_ni_syscall(struct pt_regs *regs, int scno)
{
#ifdef CONFIG_COMPAT
long ret;
if (is_compat_task()) {
- ret = compat_arm_syscall(regs);
+ ret = compat_arm_syscall(regs, scno);
if (ret != -ENOSYS)
return ret;
}
@@ -47,7 +46,7 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
syscall_fn = syscall_table[array_index_nospec(scno, sc_nr)];
ret = __invoke_syscall(regs, syscall_fn);
} else {
- ret = do_ni_syscall(regs);
+ ret = do_ni_syscall(regs, scno);
}
regs->regs[0] = ret;
--
2.1.4