Commit 17c2895 ("arm64: Abstract syscallno manipulation") abstracts
out the pt_regs.syscallno value for a syscall cancelled by a tracer
as NO_SYSCALL, and provides helpers to set and check for this
condition. However, the way this was implemented has the
unintended side-effect of disabling part of the syscall restart
logic.
This comes about because the second in_syscall() check in
do_signal() re-evaluates the "in a syscall" condition based on the
updated pt_regs instead of the original pt_regs. forget_syscall()
is explicitly called prior to the second check in order to prevent
restart logic in the ret_to_user path being spuriously triggered,
which means that the second in_syscall() check always yields false.
This triggers a failure in
tools/testing/selftests/seccomp/seccomp_bpf.c, when using ptrace to
suppress a signal that interrups a nanosleep() syscall.
Misbehaviour of this type is only expected in the case where a
tracer suppresses a signal and the target process is either being
single-stepped or the interrupted syscall attempts to restart via
-ERESTARTBLOCK.
This patch restores the old behaviour by performing the
in_syscall() check only once at the start of the function.
Fixes: 17c289586009 ("arm64: Abstract syscallno manipulation")
Signed-off-by: Dave Martin <Dave.Martin(a)arm.com>
Reported-by: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: <stable(a)vger.kernel.org> # 4.14.x-
---
arch/arm64/kernel/signal.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 154b7d3..f212090 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -830,11 +830,12 @@ static void do_signal(struct pt_regs *regs)
unsigned long continue_addr = 0, restart_addr = 0;
int retval = 0;
struct ksignal ksig;
+ bool syscall = in_syscall(regs);
/*
* If we were from a system call, check for system call restarting...
*/
- if (in_syscall(regs)) {
+ if (syscall) {
continue_addr = regs->pc;
restart_addr = continue_addr - (compat_thumb_mode(regs) ? 2 : 4);
retval = regs->regs[0];
@@ -886,7 +887,7 @@ static void do_signal(struct pt_regs *regs)
* Handle restarting a different system call. As above, if a debugger
* has chosen to restart at a different PC, ignore the restart.
*/
- if (in_syscall(regs) && regs->pc == restart_addr) {
+ if (syscall && regs->pc == restart_addr) {
if (retval == -ERESTART_RESTARTBLOCK)
setup_restart_syscall(regs);
user_rewind_single_step(current);
--
2.1.4
This patch series includes some improvement to Machine check handler
for pseries. Patch 1 fixes an issue where machine check handler crashes
kernel while accessing vmalloc-ed buffer while in nmi context.
Patch 2 fixes endain bug while restoring of r3 in MCE handler.
Patch 4 dumps the SLB contents on SLB MCE errors to improve the debugability.
Patch 5 display's the MCE error details on console.
CHange in V3:
- Moved patch 5 to patch 2
Change in V2:
- patch 3: Display additional info (NIP and task info) in MCE error details.
- patch 5: Fix endain bug while restoring of r3 in MCE handler.
---
Mahesh Salgaonkar (5):
powerpc/pseries: convert rtas_log_buf to linear allocation.
powerpc/pseries: Fix endainness while restoring of r3 in MCE handler.
powerpc/pseries: Define MCE error event section.
powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
powerpc/pseries: Display machine check error details.
arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1
arch/powerpc/include/asm/rtas.h | 109 ++++++++++++++++++
arch/powerpc/kernel/rtasd.c | 2
arch/powerpc/mm/slb.c | 35 ++++++
arch/powerpc/platforms/pseries/ras.c | 155 +++++++++++++++++++++++++
5 files changed, 299 insertions(+), 3 deletions(-)
--
Signature
From: Arnd Bergmann <arnd(a)arndb.de>
commit 590347e4000356f55eb10b03ced2686bd74dab40 upstream.
gcc-6.3 and earlier show a new warning after a seemingly unrelated
change to the arm64 PAGE_KERNEL definition:
In file included from drivers/md/dm-bufio.c:14:0:
drivers/md/dm-bufio.c: In function 'alloc_buffer':
include/linux/sched/mm.h:182:56: warning: 'noio_flag' may be used uninitialized in this function [-Wmaybe-uninitialized]
current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
^
The same warning happened earlier on linux-3.18 for MIPS and I did a
workaround for that, but now it's come back.
gcc-7 and newer are apparently smart enough to figure this out, and
other architectures don't show it, so the best I could come up with is
to rework the caller slightly in a way that makes it obvious enough to
all arm64 compilers what is happening here.
Fixes: 41acec624087 ("arm64: kpti: Make use of nG dependent on arm64_kernel_unmapped_at_el0()")
Link: https://patchwork.kernel.org/patch/9692829/
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
[snitzer: moved declarations inside conditional, altered vmalloc return]
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
[nc: Backport to 4.9, adjust context for lack of 19809c2da28a]
Signed-off-by: Nathan Chancellor <natechancellor(a)gmail.com>
---
drivers/md/dm-bufio.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 3ec647e8b9c6..35fd57fdeba9 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -373,9 +373,6 @@ static void __cache_size_refresh(void)
static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
enum data_mode *data_mode)
{
- unsigned noio_flag;
- void *ptr;
-
if (c->block_size <= DM_BUFIO_BLOCK_SIZE_SLAB_LIMIT) {
*data_mode = DATA_MODE_SLAB;
return kmem_cache_alloc(DM_BUFIO_CACHE(c), gfp_mask);
@@ -399,16 +396,16 @@ static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
* all allocations done by this process (including pagetables) are done
* as if GFP_NOIO was specified.
*/
+ if (gfp_mask & __GFP_NORETRY) {
+ unsigned noio_flag = memalloc_noio_save();
+ void *ptr = __vmalloc(c->block_size, gfp_mask | __GFP_HIGHMEM,
+ PAGE_KERNEL);
- if (gfp_mask & __GFP_NORETRY)
- noio_flag = memalloc_noio_save();
-
- ptr = __vmalloc(c->block_size, gfp_mask | __GFP_HIGHMEM, PAGE_KERNEL);
-
- if (gfp_mask & __GFP_NORETRY)
memalloc_noio_restore(noio_flag);
+ return ptr;
+ }
- return ptr;
+ return __vmalloc(c->block_size, gfp_mask | __GFP_HIGHMEM, PAGE_KERNEL);
}
/*
--
2.17.1
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for
allocations that can ignore memory policies. The zonelist is obtained
from current CPU's node. This is a problem for __GFP_THISNODE allocations
that want to allocate on a different node, e.g. because the allocating
thread has been migrated to a different CPU.
This has been observed to break SLAB in our 4.4-based kernel, because
there it relies on __GFP_THISNODE working as intended. If a slab page is
put on wrong node's list, then further list manipulations may corrupt the
list because page_to_nid() is used to determine which node's list_lock
should be locked and thus we may take a wrong lock and race.
Current SLAB implementation seems to be immune by luck thanks to commit
511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on
arbitrary node") but there may be others assuming that __GFP_THISNODE
works as promised.
We can fix it by simply removing the zonelist reset completely. There is
actually no reason to reset it, because memory policies and cpusets don't
affect the zonelist choice in the first place. This was different when
commit 183f6371aac2 ("mm: ignore mempolicies when using
ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their
own restricted zonelists.
We might consider this for 4.17 although I don't know if there's anything
currently broken.
SLAB is currently not affected, but in kernels older than 4.7 that
don't yet have 511e3a058812 ("mm/slab: make cache_grow() handle the
page allocated on arbitrary node") it is. That's at least 4.4 LTS.
Older ones I'll have to check.
So stable backports should be more important, but will have to be
reviewed carefully, as the code went through many changes. BTW I think
that also the ac->preferred_zoneref reset is currently useless if we
don't also reset ac->nodemask from a mempolicy to NULL first (which we
probably should for the OOM victims etc?), but I would leave that for a
separate patch.
Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Fixes: 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK")
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 1 -
1 file changed, 1 deletion(-)
diff -puN mm/page_alloc.c~mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset
+++ a/mm/page_alloc.c
@@ -4169,7 +4169,6 @@ retry:
* orientated.
*/
if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
- ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
ac->high_zoneidx, ac->nodemask);
}
_
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for
allocations that can ignore memory policies. The zonelist is obtained
from current CPU's node. This is a problem for __GFP_THISNODE allocations
that want to allocate on a different node, e.g. because the allocating
thread has been migrated to a different CPU.
This has been observed to break SLAB in our 4.4-based kernel, because
there it relies on __GFP_THISNODE working as intended. If a slab page is
put on wrong node's list, then further list manipulations may corrupt the
list because page_to_nid() is used to determine which node's list_lock
should be locked and thus we may take a wrong lock and race.
Current SLAB implementation seems to be immune by luck thanks to commit
511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on
arbitrary node") but there may be others assuming that __GFP_THISNODE
works as promised.
We can fix it by simply removing the zonelist reset completely. There is
actually no reason to reset it, because memory policies and cpusets don't
affect the zonelist choice in the first place. This was different when
commit 183f6371aac2 ("mm: ignore mempolicies when using
ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their
own restricted zonelists.
We might consider this for 4.17 although I don't know if there's anything
currently broken.
SLAB is currently not affected, but in kernels older than 4.7 that
don't yet have 511e3a058812 ("mm/slab: make cache_grow() handle the
page allocated on arbitrary node") it is. That's at least 4.4 LTS.
Older ones I'll have to check.
So stable backports should be more important, but will have to be
reviewed carefully, as the code went through many changes. BTW I think
that also the ac->preferred_zoneref reset is currently useless if we
don't also reset ac->nodemask from a mempolicy to NULL first (which we
probably should for the OOM victims etc?), but I would leave that for a
separate patch.
Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Fixes: 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK")
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 1 -
1 file changed, 1 deletion(-)
diff -puN mm/page_alloc.c~mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset
+++ a/mm/page_alloc.c
@@ -4169,7 +4169,6 @@ retry:
* orientated.
*/
if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
- ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
ac->high_zoneidx, ac->nodemask);
}
_