This series backports the patchset "exit: Put an upper limit on how often we can oops" (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#...) to 5.4, as recommended at https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-... This follows the backports to 5.10 and 5.15, which have already been released.
This required backporting various prerequisite patches.
I've tested that oops_limit and warn_limit work correctly on x86_64.
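For readers who have not looked at the upstream series, the check being tested can be modelled roughly as below. This is a userspace sketch, not the kernel code: model_oops_hits_limit() is a made-up name, the default of 10000 is what I recall from upstream and should be checked against the kernel.rst hunk, and the real code uses an atomic counter and calls panic() instead of returning a flag.

```c
#include <stdbool.h>

/* Model of the kernel.oops_limit sysctl; 0 means "no limit". */
static unsigned int oops_limit = 10000;   /* assumed default, see note above */
/* Model of the counter exposed as /sys/kernel/oops_count. */
static unsigned int oops_count;

/*
 * Hypothetical helper: account for one oops and report whether the
 * limit has been reached. The kernel would panic() at that point.
 */
static bool model_oops_hits_limit(void)
{
    unsigned int limit = oops_limit;   /* read once, like READ_ONCE() */

    oops_count++;
    return limit != 0 && oops_count >= limit;
}
```

warn_limit follows the same pattern with its own counter. Setting the limit to 0 disables the check, which is what "exit: Allow oops_limit to be disabled" is about.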
David Gow (1):
  mm: kasan: do not panic if both panic_on_warn and kasan_multishot set

Eric W. Biederman (2):
  exit: Add and use make_task_dead.
  objtool: Add a missing comma to avoid string concatenation

Jann Horn (1):
  exit: Put an upper limit on how often we can oops

Kees Cook (7):
  exit: Expose "oops_count" to sysfs
  exit: Allow oops_limit to be disabled
  panic: Consolidate open-coded panic_on_warn checks
  panic: Introduce warn_limit
  panic: Expose "warn_count" to sysfs
  docs: Fix path paste-o for /sys/kernel/warn_count
  exit: Use READ_ONCE() for all oops/warn limit reads

Nathan Chancellor (3):
  hexagon: Fix function name in die()
  h8300: Fix build errors from do_exit() to make_task_dead() transition
  csky: Fix function name in csky_alignment() and die()

Randy Dunlap (1):
  ia64: make IA64_MCA_RECOVERY bool instead of tristate

Tiezhu Yang (1):
  panic: unset panic_on_warn inside panic()

Xiaoming Ni (1):
  sysctl: add a new register_sysctl_init() interface
 .../ABI/testing/sysfs-kernel-oops_count     |  6 ++
 .../ABI/testing/sysfs-kernel-warn_count     |  6 ++
 Documentation/admin-guide/sysctl/kernel.rst | 19 +++++
 arch/alpha/kernel/traps.c                   |  6 +-
 arch/alpha/mm/fault.c                       |  2 +-
 arch/arm/kernel/traps.c                     |  2 +-
 arch/arm/mm/fault.c                         |  2 +-
 arch/arm64/kernel/traps.c                   |  2 +-
 arch/arm64/mm/fault.c                       |  2 +-
 arch/csky/abiv1/alignment.c                 |  2 +-
 arch/csky/kernel/traps.c                    |  2 +-
 arch/h8300/kernel/traps.c                   |  3 +-
 arch/h8300/mm/fault.c                       |  2 +-
 arch/hexagon/kernel/traps.c                 |  2 +-
 arch/ia64/Kconfig                           |  2 +-
 arch/ia64/kernel/mca_drv.c                  |  2 +-
 arch/ia64/kernel/traps.c                    |  2 +-
 arch/ia64/mm/fault.c                        |  2 +-
 arch/m68k/kernel/traps.c                    |  2 +-
 arch/m68k/mm/fault.c                        |  2 +-
 arch/microblaze/kernel/exceptions.c         |  4 +-
 arch/mips/kernel/traps.c                    |  2 +-
 arch/nds32/kernel/fpu.c                     |  2 +-
 arch/nds32/kernel/traps.c                   |  8 +-
 arch/nios2/kernel/traps.c                   |  4 +-
 arch/openrisc/kernel/traps.c                |  2 +-
 arch/parisc/kernel/traps.c                  |  2 +-
 arch/powerpc/kernel/traps.c                 |  2 +-
 arch/riscv/kernel/traps.c                   |  2 +-
 arch/riscv/mm/fault.c                       |  2 +-
 arch/s390/kernel/dumpstack.c                |  2 +-
 arch/s390/kernel/nmi.c                      |  2 +-
 arch/sh/kernel/traps.c                      |  2 +-
 arch/sparc/kernel/traps_32.c                |  4 +-
 arch/sparc/kernel/traps_64.c                |  4 +-
 arch/x86/entry/entry_32.S                   |  6 +-
 arch/x86/entry/entry_64.S                   |  6 +-
 arch/x86/kernel/dumpstack.c                 |  4 +-
 arch/xtensa/kernel/traps.c                  |  2 +-
 fs/proc/proc_sysctl.c                       | 33 ++++++++
 include/linux/kernel.h                      |  1 +
 include/linux/sched/task.h                  |  1 +
 include/linux/sysctl.h                      |  3 +
 kernel/exit.c                               | 72 ++++++++++++++++++
 kernel/panic.c                              | 75 ++++++++++++++++---
 kernel/sched/core.c                         |  3 +-
 mm/kasan/report.c                           |  4 +-
 tools/objtool/check.c                       |  3 +-
 48 files changed, 260 insertions(+), 67 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-oops_count
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-warn_count
From: Xiaoming Ni nixiaoming@huawei.com
commit 3ddd9a808cee7284931312f2f3e854c9617f44b2 upstream.
Patch series "sysctl: first set of kernel/sysctl cleanups", v2.
Finally had time to respin the series of the work we had started last year on cleaning up the kernel/sysctl.c kitchen sink. People keep stuffing their sysctls in that file and this creates a maintenance burden. So this effort is aimed at placing sysctls where they actually belong.
I'm going to split patches up into series as there is quite a bit of work.
This first set adds register_sysctl_init() for registering a sysctl on the init path, adds const where missing in a few places, generalizes common values so they are easier to share, and starts moving a few kernel/sysctl.c entries out to where they belong.
The majority of rework on v2 in this first patch set is 0-day fixes. Eric Biederman's feedback is addressed in subsequent patch sets.
I'll only post the first two patch sets for now. We can address the rest once the first two patch sets get completely reviewed / Acked.
This patch (of 9):
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes; this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code; we just care about the core logic.
Today though folks heavily rely on tables in kernel/sysctl.c so they can easily just extend this table with their needed sysctls. In order to help users move their sysctls out we need to provide a helper which can be used during code initialization.
We special-case the initialization use of register_sysctl() since it *is* safe to fail, given all that sysctls do is provide a dynamic interface to query or modify at runtime an existing variable. So the use case of register_sysctl() on init should *not* stop if the sysctls don't end up getting registered. It would be counterproductive to stop boot if a simple sysctl registration failed.
Provide a helper for init then, and document the recommended init levels to use for callers of this routine. We will later use this in subsequent patches to start slimming down kernel/sysctl.c tables and moving sysctl registration to the code which actually needs these sysctls.
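The pattern described above can be sketched in userspace as follows. This is only a model under stated assumptions: the struct definitions are stand-ins, every model_* name and the fail_registration knob are invented for illustration, and the kmemleak annotation from the real helper is left out. What it does show is the two properties the commit message relies on: the macro stringifies the table name for the log message, and a failed registration is reported but never propagated to the caller.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-ins for the kernel types; not the real definitions. */
struct ctl_table { const char *procname; };
struct ctl_table_header { struct ctl_table *table; };

static int fail_registration;          /* model knob: force the failure path */
static int registered_count;           /* tables successfully registered */
static const char *last_failed_table;  /* name logged on the last failure */
static struct ctl_table_header fake_header;

/* Model of register_sysctl(): returns NULL on failure. */
static struct ctl_table_header *model_register_sysctl(const char *path,
                                                      struct ctl_table *table)
{
    (void)path;
    if (fail_registration)
        return NULL;
    fake_header.table = table;
    registered_count++;
    return &fake_header;
}

/* Same shape as the helper in the patch: log the failure and carry on. */
static void model_register_sysctl_init_named(const char *path,
                                             struct ctl_table *table,
                                             const char *table_name)
{
    struct ctl_table_header *hdr = model_register_sysctl(path, table);

    if (!hdr) {
        fprintf(stderr, "failed when register_sysctl %s to %s\n",
                table_name, path);
        last_failed_table = table_name;
        return;   /* no error reaches the caller, so init continues */
    }
}

/* #table turns the identifier into the string used in the log line. */
#define model_register_sysctl_init(path, table) \
    model_register_sysctl_init_named(path, table, #table)

/* Hypothetical subsystem table and init function. */
static struct ctl_table demo_sysctls[] = { { "demo_knob" }, { NULL } };

static void demo_init(void)
{
    model_register_sysctl_init("kernel", demo_sysctls);
}
```

In the kernel the caller would be an initcall at one of the recommended init levels, and the variables behind the table keep their built-in defaults whether or not registration succeeded.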
[mcgrof@kernel.org: major commit log and documentation rephrasing also moved to fs/proc/proc_sysctl.c ]
Link: https://lkml.kernel.org/r/20211123202347.818157-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20211123202347.818157-2-mcgrof@kernel.org Signed-off-by: Xiaoming Ni nixiaoming@huawei.com Signed-off-by: Luis Chamberlain mcgrof@kernel.org Reviewed-by: Kees Cook keescook@chromium.org Cc: Iurii Zaikin yzaikin@google.com Cc: "Eric W. Biederman" ebiederm@xmission.com Cc: Peter Zijlstra peterz@infradead.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Paul Turner pjt@google.com Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: Sebastian Reichel sre@kernel.org Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Cc: Petr Mladek pmladek@suse.com Cc: Sergey Senozhatsky senozhatsky@chromium.org Cc: Qing Wang wangqing@vivo.com Cc: Benjamin LaHaise bcrl@kvack.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Jan Kara jack@suse.cz Cc: Amir Goldstein amir73il@gmail.com Cc: Stephen Kitt steve@sk2.org Cc: Antti Palosaari crope@iki.fi Cc: Arnd Bergmann arnd@arndb.de Cc: Benjamin Herrenschmidt benh@kernel.crashing.org Cc: Clemens Ladisch clemens@ladisch.de Cc: David Airlie airlied@linux.ie Cc: Jani Nikula jani.nikula@linux.intel.com Cc: Joel Becker jlbec@evilplan.org Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Joseph Qi joseph.qi@linux.alibaba.com Cc: Julia Lawall julia.lawall@inria.fr Cc: Lukas Middendorf kernel@tuxforce.de Cc: Mark Fasheh mark@fasheh.com Cc: Phillip Potter phil@philpotter.co.uk Cc: Rodrigo Vivi rodrigo.vivi@intel.com Cc: Douglas Gilbert dgilbert@interlog.com Cc: James E.J. Bottomley jejb@linux.ibm.com Cc: Jani Nikula jani.nikula@intel.com Cc: John Ogness john.ogness@linutronix.de Cc: Martin K. Petersen martin.petersen@oracle.com
Cc: "Rafael J. Wysocki" rafael@kernel.org Cc: Steven Rostedt (VMware) rostedt@goodmis.org Cc: Suren Baghdasaryan surenb@google.com Cc: "Theodore Ts'o" tytso@mit.edu Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Eric Biggers ebiggers@google.com --- fs/proc/proc_sysctl.c | 33 +++++++++++++++++++++++++++++++++ include/linux/sysctl.h | 3 +++ 2 files changed, 36 insertions(+)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index d80989b6c3448..f4264dd4ea31b 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -14,6 +14,7 @@ #include <linux/mm.h> #include <linux/module.h> #include <linux/bpf-cgroup.h> +#include <linux/kmemleak.h> #include "internal.h"
static const struct dentry_operations proc_sys_dentry_operations; @@ -1397,6 +1398,38 @@ struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *tab } EXPORT_SYMBOL(register_sysctl);
+/** + * __register_sysctl_init() - register sysctl table to path + * @path: path name for sysctl base + * @table: This is the sysctl table that needs to be registered to the path + * @table_name: The name of sysctl table, only used for log printing when + * registration fails + * + * The sysctl interface is used by userspace to query or modify at runtime + * a predefined value set on a variable. These variables however have default + * values pre-set. Code which depends on these variables will always work even + * if register_sysctl() fails. If register_sysctl() fails you'd just loose the + * ability to query or modify the sysctls dynamically at run time. Chances of + * register_sysctl() failing on init are extremely low, and so for both reasons + * this function does not return any error as it is used by initialization code. + * + * Context: Can only be called after your respective sysctl base path has been + * registered. So for instance, most base directories are registered early on + * init before init levels are processed through proc_sys_init() and + * sysctl_init(). + */ +void __init __register_sysctl_init(const char *path, struct ctl_table *table, + const char *table_name) +{ + struct ctl_table_header *hdr = register_sysctl(path, table); + + if (unlikely(!hdr)) { + pr_err("failed when register_sysctl %s to %s\n", table_name, path); + return; + } + kmemleak_not_leak(hdr); +} + static char *append_path(const char *path, char *pos, const char *name) { int namelen; diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 6df477329b76e..aa615a0863f5c 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -208,6 +208,9 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path, void unregister_sysctl_table(struct ctl_table_header * table);
extern int sysctl_init(void); +extern void __register_sysctl_init(const char *path, struct ctl_table *table, + const char *table_name); +#define register_sysctl_init(path, table) __register_sysctl_init(path, table, #table)
extern struct ctl_table sysctl_mount_point[];
From: Tiezhu Yang yangtiezhu@loongson.cn
commit 1a2383e8b84c0451fd9b1eec3b9aab16f30b597c upstream.
In the current code, the following three places need to unset panic_on_warn before calling panic() to avoid recursive panics:
kernel/kcsan/report.c: print_report() kernel/sched/core.c: __schedule_bug() mm/kfence/report.c: kfence_report_error()
In order to avoid copy-pasting "panic_on_warn = 0" all over the place, it is better to move it inside panic() and then remove it from the other places.
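Why clearing the flag inside panic() is enough can be seen in a small userspace model. This is a sketch only: model_panic() and model_warn() are invented names, the model returns where the real panic() never does, and the depth counters exist purely so the behaviour can be observed.

```c
#include <stdio.h>

static int panic_on_warn = 1;   /* model of the panic_on_warn setting */
static int panic_calls;         /* how many times model_panic() was entered */
static int panic_depth;         /* current nesting of model_panic() */
static int max_panic_depth;     /* deepest nesting observed */

static void model_warn(void);

/* Model of panic(): clears panic_on_warn before doing anything else. */
static void model_panic(const char *msg)
{
    if (panic_on_warn)
        panic_on_warn = 0;   /* a WARN() on the panic path cannot re-enter */

    panic_calls++;
    panic_depth++;
    if (panic_depth > max_panic_depth)
        max_panic_depth = panic_depth;

    printf("panic: %s\n", msg);
    model_warn();            /* pretend the panic path trips a warning */
    panic_depth--;
}

/* Model of a WARN() site: callers no longer reset the flag themselves. */
static void model_warn(void)
{
    if (panic_on_warn)
        model_panic("panic_on_warn set ...");
}
```

Calling model_warn() once enters model_panic() exactly once. The nested warning sees the flag already cleared and falls through, so none of the callers needs its own "panic_on_warn = 0".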
Link: https://lkml.kernel.org/r/1644324666-15947-4-git-send-email-yangtiezhu@loong... Signed-off-by: Tiezhu Yang yangtiezhu@loongson.cn Reviewed-by: Marco Elver elver@google.com Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: Baoquan He bhe@redhat.com Cc: Jonathan Corbet corbet@lwn.net Cc: Xuefeng Li lixuefeng@loongson.cn Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Eric Biggers ebiggers@google.com --- kernel/panic.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/kernel/panic.c b/kernel/panic.c index f470a038b05bd..5e2b764ff5d54 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -173,6 +173,16 @@ void panic(const char *fmt, ...) int old_cpu, this_cpu; bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
+ if (panic_on_warn) { + /* + * This thread may hit another WARN() in the panic path. + * Resetting this prevents additional WARN() from panicking the + * system on this thread. Other threads are blocked by the + * panic_mutex in panic(). + */ + panic_on_warn = 0; + } + /* * Disable local interrupts. This will prevent panic_smp_self_stop * from deadlocking the first cpu that invokes the panic, since @@ -571,16 +581,8 @@ void __warn(const char *file, int line, void *caller, unsigned taint, if (args) vprintk(args->fmt, args->args);
- if (panic_on_warn) { - /* - * This thread may hit another WARN() in the panic path. - * Resetting this prevents additional WARN() from panicking the - * system on this thread. Other threads are blocked by the - * panic_mutex in panic(). - */ - panic_on_warn = 0; + if (panic_on_warn) panic("panic_on_warn set ...\n"); - }
print_modules();
From: David Gow davidgow@google.com
commit be4f1ae978ffe98cc95ec49ceb95386fb4474974 upstream.
KASAN errors will currently trigger a panic when panic_on_warn is set. This renders kasan_multishot useless, as further KASAN errors won't be reported if the kernel has already panicked. By making kasan_multishot disable this behaviour for KASAN errors, we can still have the benefits of panic_on_warn for non-KASAN warnings, yet be able to use kasan_multishot.
This is particularly important when running KASAN tests, which need to trigger multiple KASAN errors: previously these would panic the system if panic_on_warn was set, now they can run (and will panic the system should non-KASAN warnings show up).
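The resulting decision at the end of a KASAN report reduces to a two-input predicate. The helper below is a hypothetical illustration (the patch open-codes the test with test_bit() on kasan_flags and does not add a function like this):

```c
#include <stdbool.h>

/*
 * Hypothetical helper: should a KASAN report end in panic()?
 * multi_shot stands for KASAN_BIT_MULTI_SHOT being set in kasan_flags.
 */
static bool kasan_report_should_panic(bool panic_on_warn, bool multi_shot)
{
    return panic_on_warn && !multi_shot;
}
```

Only the combination of panic_on_warn set and multishot clear panics after a KASAN report. Non-KASAN warnings are unaffected and still panic whenever panic_on_warn is set.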
Signed-off-by: David Gow davidgow@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Tested-by: Andrey Konovalov andreyknvl@google.com Reviewed-by: Andrey Konovalov andreyknvl@google.com Reviewed-by: Brendan Higgins brendanhiggins@google.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Ingo Molnar mingo@redhat.com Cc: Juri Lelli juri.lelli@redhat.com Cc: Patricia Alfonso trishalfonso@google.com Cc: Peter Zijlstra a.p.zijlstra@chello.nl Cc: Shuah Khan shuah@kernel.org Cc: Vincent Guittot vincent.guittot@linaro.org Link: https://lkml.kernel.org/r/20200915035828.570483-6-davidgow@google.com Link: https://lkml.kernel.org/r/20200910070331.3358048-6-davidgow@google.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Eric Biggers ebiggers@google.com --- mm/kasan/report.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/kasan/report.c b/mm/kasan/report.c index 621782100eaa0..a05ff1922d499 100644 --- a/mm/kasan/report.c +++ b/mm/kasan/report.c @@ -92,7 +92,7 @@ static void end_report(unsigned long *flags) pr_err("==================================================================\n"); add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); spin_unlock_irqrestore(&report_lock, *flags); - if (panic_on_warn) + if (panic_on_warn && !test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags)) panic("panic_on_warn set ...\n"); kasan_enable_current(); }
From: "Eric W. Biederman" ebiederm@xmission.com
commit 0e25498f8cd43c1b5aa327f373dd094e9a006da7 upstream.
There are two big uses of do_exit. The first is its designed use as the guts of the exit(2) system call. The second use is to terminate a task after something catastrophic has happened, like a NULL pointer dereference in kernel code.
Add a function make_task_dead that is initially exactly the same as do_exit to cover the cases where do_exit is called to handle catastrophic failure. In time this can probably be reduced to just a light wrapper around do_task_dead. For now keep it exactly the same so that there will be no behavioral differences introducing this new concept.
Replace all of the uses of do_exit that use it for catastrophic task cleanup with make_task_dead to make it clear what the code is doing.
As part of this, rename rewind_stack_do_exit to rewind_stack_and_make_dead.
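The split between the two entry points can be modelled in userspace roughly as below. All model_* names are invented, the functions return where the kernel ones are __noreturn, and the call counter is bookkeeping for this model only (the patch adds no such state).

```c
#include <signal.h>

static int last_exit_code = -1;    /* code the modelled task exited with */
static int make_task_dead_calls;   /* model-only counter, not in the patch */

/* Model of do_exit(): the ordinary exit(2) path. */
static void model_do_exit(int code)
{
    last_exit_code = code;
}

/* Model of make_task_dead(): for now it simply forwards to do_exit(). */
static void model_make_task_dead(int signr)
{
    make_task_dead_calls++;
    model_do_exit(signr);
}

/* Model of an arch die() path after the conversion. */
static void model_die(void)
{
    model_make_task_dead(SIGSEGV);
}
```

Behaviour is unchanged by the conversion, but every catastrophic exit now goes through a single function, which gives later patches one place to attach extra handling for that case.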
Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com Signed-off-by: Eric Biggers ebiggers@google.com --- arch/alpha/kernel/traps.c | 6 +++--- arch/alpha/mm/fault.c | 2 +- arch/arm/kernel/traps.c | 2 +- arch/arm/mm/fault.c | 2 +- arch/arm64/kernel/traps.c | 2 +- arch/arm64/mm/fault.c | 2 +- arch/csky/abiv1/alignment.c | 2 +- arch/csky/kernel/traps.c | 2 +- arch/h8300/kernel/traps.c | 2 +- arch/h8300/mm/fault.c | 2 +- arch/hexagon/kernel/traps.c | 2 +- arch/ia64/kernel/mca_drv.c | 2 +- arch/ia64/kernel/traps.c | 2 +- arch/ia64/mm/fault.c | 2 +- arch/m68k/kernel/traps.c | 2 +- arch/m68k/mm/fault.c | 2 +- arch/microblaze/kernel/exceptions.c | 4 ++-- arch/mips/kernel/traps.c | 2 +- arch/nds32/kernel/fpu.c | 2 +- arch/nds32/kernel/traps.c | 8 ++++---- arch/nios2/kernel/traps.c | 4 ++-- arch/openrisc/kernel/traps.c | 2 +- arch/parisc/kernel/traps.c | 2 +- arch/powerpc/kernel/traps.c | 2 +- arch/riscv/kernel/traps.c | 2 +- arch/riscv/mm/fault.c | 2 +- arch/s390/kernel/dumpstack.c | 2 +- arch/s390/kernel/nmi.c | 2 +- arch/sh/kernel/traps.c | 2 +- arch/sparc/kernel/traps_32.c | 4 +--- arch/sparc/kernel/traps_64.c | 4 +--- arch/x86/entry/entry_32.S | 6 +++--- arch/x86/entry/entry_64.S | 6 +++--- arch/x86/kernel/dumpstack.c | 4 ++-- arch/xtensa/kernel/traps.c | 2 +- include/linux/sched/task.h | 1 + kernel/exit.c | 9 +++++++++ tools/objtool/check.c | 3 ++- 38 files changed, 59 insertions(+), 52 deletions(-)
diff --git a/arch/alpha/kernel/traps.c b/arch/alpha/kernel/traps.c index f6b9664ac5042..f87d8e1fcfe42 100644 --- a/arch/alpha/kernel/traps.c +++ b/arch/alpha/kernel/traps.c @@ -192,7 +192,7 @@ die_if_kernel(char * str, struct pt_regs *regs, long err, unsigned long *r9_15) local_irq_enable(); while (1); } - do_exit(SIGSEGV); + make_task_dead(SIGSEGV); }
#ifndef CONFIG_MATHEMU @@ -577,7 +577,7 @@ do_entUna(void * va, unsigned long opcode, unsigned long reg,
printk("Bad unaligned kernel access at %016lx: %p %lx %lu\n", pc, va, opcode, reg); - do_exit(SIGSEGV); + make_task_dead(SIGSEGV);
got_exception: /* Ok, we caught the exception, but we don't want it. Is there @@ -632,7 +632,7 @@ do_entUna(void * va, unsigned long opcode, unsigned long reg, local_irq_enable(); while (1); } - do_exit(SIGSEGV); + make_task_dead(SIGSEGV); }
/* diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c index 741e61ef9d3fe..a86286d2d3f3f 100644 --- a/arch/alpha/mm/fault.c +++ b/arch/alpha/mm/fault.c @@ -206,7 +206,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr, printk(KERN_ALERT "Unable to handle kernel paging request at " "virtual address %016lx\n", address); die_if_kernel("Oops", regs, cause, (unsigned long*)regs - 16); - do_exit(SIGKILL); + make_task_dead(SIGKILL);
/* We ran out of memory, or some other thing happened to us that made us unable to handle the page fault gracefully. */ diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c index 207ef9a797bd4..03dfeb1208431 100644 --- a/arch/arm/kernel/traps.c +++ b/arch/arm/kernel/traps.c @@ -341,7 +341,7 @@ static void oops_end(unsigned long flags, struct pt_regs *regs, int signr) if (panic_on_oops) panic("Fatal exception"); if (signr) - do_exit(signr); + make_task_dead(signr); }
/* diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c index bd0f4821f7e11..d623932437208 100644 --- a/arch/arm/mm/fault.c +++ b/arch/arm/mm/fault.c @@ -124,7 +124,7 @@ __do_kernel_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr, show_pte(KERN_ALERT, mm, addr); die("Oops", regs, fsr); bust_spinlocks(0); - do_exit(SIGKILL); + make_task_dead(SIGKILL); }
/* diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index 4e3e9d9c81517..a436a6972ced7 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -202,7 +202,7 @@ void die(const char *str, struct pt_regs *regs, int err) raw_spin_unlock_irqrestore(&die_lock, flags);
if (ret != NOTIFY_STOP) - do_exit(SIGSEGV); + make_task_dead(SIGSEGV); }
static void arm64_show_signal(int signo, const char *str) diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 2a7339aeb1ad4..a8e9c98147a19 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -296,7 +296,7 @@ static void die_kernel_fault(const char *msg, unsigned long addr, show_pte(addr); die("Oops", regs, esr); bust_spinlocks(0); - do_exit(SIGKILL); + make_task_dead(SIGKILL); }
static void __do_kernel_fault(unsigned long addr, unsigned int esr, diff --git a/arch/csky/abiv1/alignment.c b/arch/csky/abiv1/alignment.c index cb2a0d94a144d..5e2fb45d605cf 100644 --- a/arch/csky/abiv1/alignment.c +++ b/arch/csky/abiv1/alignment.c @@ -294,7 +294,7 @@ void csky_alignment(struct pt_regs *regs) __func__, opcode, rz, rx, imm, addr); show_regs(regs); bust_spinlocks(0); - do_exit(SIGKILL); + make_dead_task(SIGKILL); }
force_sig_fault(SIGBUS, BUS_ADRALN, (void __user *)addr); diff --git a/arch/csky/kernel/traps.c b/arch/csky/kernel/traps.c index 63715cb90ee99..af7562907f7fa 100644 --- a/arch/csky/kernel/traps.c +++ b/arch/csky/kernel/traps.c @@ -85,7 +85,7 @@ void die_if_kernel(char *str, struct pt_regs *regs, int nr) pr_err("%s: %08x\n", str, nr); show_regs(regs); add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE); - do_exit(SIGSEGV); + make_dead_task(SIGSEGV); }
void buserr(struct pt_regs *regs) diff --git a/arch/h8300/kernel/traps.c b/arch/h8300/kernel/traps.c index e47a9e0dc278f..a284c126f07a6 100644 --- a/arch/h8300/kernel/traps.c +++ b/arch/h8300/kernel/traps.c @@ -110,7 +110,7 @@ void die(const char *str, struct pt_regs *fp, unsigned long err) dump(fp);
spin_unlock_irq(&die_lock); - do_exit(SIGSEGV); + make_dead_task(SIGSEGV); }
static int kstack_depth_to_print = 24; diff --git a/arch/h8300/mm/fault.c b/arch/h8300/mm/fault.c index fabffb83930af..a8d8fc63780e4 100644 --- a/arch/h8300/mm/fault.c +++ b/arch/h8300/mm/fault.c @@ -52,7 +52,7 @@ asmlinkage int do_page_fault(struct pt_regs *regs, unsigned long address, printk(" at virtual address %08lx\n", address); if (!user_mode(regs)) die("Oops", regs, error_code); - do_exit(SIGKILL); + make_dead_task(SIGKILL);
return 1; } diff --git a/arch/hexagon/kernel/traps.c b/arch/hexagon/kernel/traps.c index 69c623b14ddd2..bfd04a388bcac 100644 --- a/arch/hexagon/kernel/traps.c +++ b/arch/hexagon/kernel/traps.c @@ -221,7 +221,7 @@ int die(const char *str, struct pt_regs *regs, long err) panic("Fatal exception");
oops_exit(); - do_exit(err); + make_dead_task(err); return 0; }
diff --git a/arch/ia64/kernel/mca_drv.c b/arch/ia64/kernel/mca_drv.c index 2a40268c3d494..d9ee3b186249d 100644 --- a/arch/ia64/kernel/mca_drv.c +++ b/arch/ia64/kernel/mca_drv.c @@ -176,7 +176,7 @@ mca_handler_bh(unsigned long paddr, void *iip, unsigned long ipsr) spin_unlock(&mca_bh_lock);
/* This process is about to be killed itself */ - do_exit(SIGKILL); + make_task_dead(SIGKILL); }
/** diff --git a/arch/ia64/kernel/traps.c b/arch/ia64/kernel/traps.c index e13cb905930fb..753642366e12e 100644 --- a/arch/ia64/kernel/traps.c +++ b/arch/ia64/kernel/traps.c @@ -85,7 +85,7 @@ die (const char *str, struct pt_regs *regs, long err) if (panic_on_oops) panic("Fatal exception");
- do_exit(SIGSEGV); + make_task_dead(SIGSEGV); return 0; }
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c index c2f299fe9e04a..7f8c49579a2c2 100644 --- a/arch/ia64/mm/fault.c +++ b/arch/ia64/mm/fault.c @@ -272,7 +272,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re regs = NULL; bust_spinlocks(0); if (regs) - do_exit(SIGKILL); + make_task_dead(SIGKILL); return;
out_of_memory: diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c index 344f93d36a9a0..a245c1933d418 100644 --- a/arch/m68k/kernel/traps.c +++ b/arch/m68k/kernel/traps.c @@ -1139,7 +1139,7 @@ void die_if_kernel (char *str, struct pt_regs *fp, int nr) pr_crit("%s: %08x\n", str, nr); show_registers(fp); add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE); - do_exit(SIGSEGV); + make_task_dead(SIGSEGV); }
asmlinkage void set_esp0(unsigned long ssp) diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c index e9b1d7585b43b..03ebb67b413ef 100644 --- a/arch/m68k/mm/fault.c +++ b/arch/m68k/mm/fault.c @@ -48,7 +48,7 @@ int send_fault_sig(struct pt_regs *regs) pr_alert("Unable to handle kernel access"); pr_cont(" at virtual address %p\n", addr); die_if_kernel("Oops", regs, 0 /*error_code*/); - do_exit(SIGKILL); + make_task_dead(SIGKILL); }
return 1; diff --git a/arch/microblaze/kernel/exceptions.c b/arch/microblaze/kernel/exceptions.c index cf99c411503e3..6d3a6a6442205 100644 --- a/arch/microblaze/kernel/exceptions.c +++ b/arch/microblaze/kernel/exceptions.c @@ -44,10 +44,10 @@ void die(const char *str, struct pt_regs *fp, long err) pr_warn("Oops: %s, sig: %ld\n", str, err); show_regs(fp); spin_unlock_irq(&die_lock); - /* do_exit() should take care of panic'ing from an interrupt + /* make_task_dead() should take care of panic'ing from an interrupt * context so we don't handle it here */ - do_exit(err); + make_task_dead(err); }
/* for user application debugging */ diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c index 749089c25d5e6..5a491eca456fc 100644 --- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -415,7 +415,7 @@ void __noreturn die(const char *str, struct pt_regs *regs) if (regs && kexec_should_crash(current)) crash_kexec(regs);
- do_exit(sig); + make_task_dead(sig); }
extern struct exception_table_entry __start___dbe_table[]; diff --git a/arch/nds32/kernel/fpu.c b/arch/nds32/kernel/fpu.c index 62bdafbc53f4c..26c62d5a55c15 100644 --- a/arch/nds32/kernel/fpu.c +++ b/arch/nds32/kernel/fpu.c @@ -223,7 +223,7 @@ inline void handle_fpu_exception(struct pt_regs *regs) } } else if (fpcsr & FPCSR_mskRIT) { if (!user_mode(regs)) - do_exit(SIGILL); + make_task_dead(SIGILL); si_signo = SIGILL; }
diff --git a/arch/nds32/kernel/traps.c b/arch/nds32/kernel/traps.c index f4d386b526227..f6648845aae76 100644 --- a/arch/nds32/kernel/traps.c +++ b/arch/nds32/kernel/traps.c @@ -184,7 +184,7 @@ void die(const char *str, struct pt_regs *regs, int err)
bust_spinlocks(0); spin_unlock_irq(&die_lock); - do_exit(SIGSEGV); + make_task_dead(SIGSEGV); }
EXPORT_SYMBOL(die); @@ -288,7 +288,7 @@ void unhandled_interruption(struct pt_regs *regs) pr_emerg("unhandled_interruption\n"); show_regs(regs); if (!user_mode(regs)) - do_exit(SIGKILL); + make_task_dead(SIGKILL); force_sig(SIGKILL); }
@@ -299,7 +299,7 @@ void unhandled_exceptions(unsigned long entry, unsigned long addr, addr, type); show_regs(regs); if (!user_mode(regs)) - do_exit(SIGKILL); + make_task_dead(SIGKILL); force_sig(SIGKILL); }
@@ -326,7 +326,7 @@ void do_revinsn(struct pt_regs *regs) pr_emerg("Reserved Instruction\n"); show_regs(regs); if (!user_mode(regs)) - do_exit(SIGILL); + make_task_dead(SIGILL); force_sig(SIGILL); }
diff --git a/arch/nios2/kernel/traps.c b/arch/nios2/kernel/traps.c index 486db793923c0..8e192d6564261 100644 --- a/arch/nios2/kernel/traps.c +++ b/arch/nios2/kernel/traps.c @@ -37,10 +37,10 @@ void die(const char *str, struct pt_regs *regs, long err) show_regs(regs); spin_unlock_irq(&die_lock); /* - * do_exit() should take care of panic'ing from an interrupt + * make_task_dead() should take care of panic'ing from an interrupt * context so we don't handle it here */ - do_exit(err); + make_task_dead(err); }
void _exception(int signo, struct pt_regs *regs, int code, unsigned long addr) diff --git a/arch/openrisc/kernel/traps.c b/arch/openrisc/kernel/traps.c index 932a8ec2b520e..2804852a55924 100644 --- a/arch/openrisc/kernel/traps.c +++ b/arch/openrisc/kernel/traps.c @@ -218,7 +218,7 @@ void die(const char *str, struct pt_regs *regs, long err) __asm__ __volatile__("l.nop 1"); do {} while (1); #endif - do_exit(SIGSEGV); + make_task_dead(SIGSEGV); }
/* This is normally the 'Oops' routine */ diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c index 2a1060d747a5d..37988f7f3abcb 100644 --- a/arch/parisc/kernel/traps.c +++ b/arch/parisc/kernel/traps.c @@ -268,7 +268,7 @@ void die_if_kernel(char *str, struct pt_regs *regs, long err) panic("Fatal exception");
oops_exit(); - do_exit(SIGSEGV); + make_task_dead(SIGSEGV); }
/* gdb uses break 4,8 */ diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index ecfa460f66d17..70b99246dec46 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -246,7 +246,7 @@ static void oops_end(unsigned long flags, struct pt_regs *regs,
if (panic_on_oops) panic("Fatal exception"); - do_exit(signr); + make_task_dead(signr); } NOKPROBE_SYMBOL(oops_end);
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c index ae462037910be..c28d4debf5926 100644 --- a/arch/riscv/kernel/traps.c +++ b/arch/riscv/kernel/traps.c @@ -57,7 +57,7 @@ void die(struct pt_regs *regs, const char *str) if (panic_on_oops) panic("Fatal exception"); if (ret != NOTIFY_STOP) - do_exit(SIGSEGV); + make_task_dead(SIGSEGV); }
void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr) diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c index 247b8c859c448..1cfce62caa119 100644 --- a/arch/riscv/mm/fault.c +++ b/arch/riscv/mm/fault.c @@ -189,7 +189,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs) (addr < PAGE_SIZE) ? "NULL pointer dereference" : "paging request", addr); die(regs, "Oops"); - do_exit(SIGKILL); + make_task_dead(SIGKILL);
 	/*
 	 * We ran out of memory, call the OOM killer, and return the userspace
diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
index 34bdc60c0b11d..2100833adfb69 100644
--- a/arch/s390/kernel/dumpstack.c
+++ b/arch/s390/kernel/dumpstack.c
@@ -210,5 +210,5 @@ void die(struct pt_regs *regs, const char *str)
 	if (panic_on_oops)
 		panic("Fatal exception: panic_on_oops");
 	oops_exit();
-	do_exit(SIGSEGV);
+	make_task_dead(SIGSEGV);
 }
diff --git a/arch/s390/kernel/nmi.c b/arch/s390/kernel/nmi.c
index 0a487fae763ee..d8951274658bd 100644
--- a/arch/s390/kernel/nmi.c
+++ b/arch/s390/kernel/nmi.c
@@ -179,7 +179,7 @@ void s390_handle_mcck(void)
 		       "malfunction (code 0x%016lx).\n", mcck.mcck_code);
 		printk(KERN_EMERG "mcck: task: %s, pid: %d.\n",
 		       current->comm, current->pid);
-		do_exit(SIGSEGV);
+		make_task_dead(SIGSEGV);
 	}
 }
 EXPORT_SYMBOL_GPL(s390_handle_mcck);
diff --git a/arch/sh/kernel/traps.c b/arch/sh/kernel/traps.c
index 63cf17bc760da..6a228c00b73f4 100644
--- a/arch/sh/kernel/traps.c
+++ b/arch/sh/kernel/traps.c
@@ -57,7 +57,7 @@ void die(const char *str, struct pt_regs *regs, long err)
 	if (panic_on_oops)
 		panic("Fatal exception");
 
-	do_exit(SIGSEGV);
+	make_task_dead(SIGSEGV);
 }
 
 void die_if_kernel(const char *str, struct pt_regs *regs, long err)
diff --git a/arch/sparc/kernel/traps_32.c b/arch/sparc/kernel/traps_32.c
index 4ceecad556a9f..dbf068ac54ff3 100644
--- a/arch/sparc/kernel/traps_32.c
+++ b/arch/sparc/kernel/traps_32.c
@@ -86,9 +86,7 @@ void __noreturn die_if_kernel(char *str, struct pt_regs *regs)
 	}
 	printk("Instruction DUMP:");
 	instruction_dump ((unsigned long *) regs->pc);
-	if(regs->psr & PSR_PS)
-		do_exit(SIGKILL);
-	do_exit(SIGSEGV);
+	make_task_dead((regs->psr & PSR_PS) ? SIGKILL : SIGSEGV);
 }
 
 void do_hw_interrupt(struct pt_regs *regs, unsigned long type)
diff --git a/arch/sparc/kernel/traps_64.c b/arch/sparc/kernel/traps_64.c
index f2b22c496fb97..17768680cbaeb 100644
--- a/arch/sparc/kernel/traps_64.c
+++ b/arch/sparc/kernel/traps_64.c
@@ -2564,9 +2564,7 @@ void __noreturn die_if_kernel(char *str, struct pt_regs *regs)
 	}
 	if (panic_on_oops)
 		panic("Fatal exception");
-	if (regs->tstate & TSTATE_PRIV)
-		do_exit(SIGKILL);
-	do_exit(SIGSEGV);
+	make_task_dead((regs->tstate & TSTATE_PRIV)? SIGKILL : SIGSEGV);
 }
 EXPORT_SYMBOL(die_if_kernel);
 
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 2d837fb54c31b..740df9cc21963 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1659,13 +1659,13 @@ ENTRY(async_page_fault)
 END(async_page_fault)
 #endif
 
-ENTRY(rewind_stack_do_exit)
+ENTRY(rewind_stack_and_make_dead)
 	/* Prevent any naive code from trying to unwind to our caller. */
 	xorl	%ebp, %ebp
 
 	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esi
 	leal	-TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE(%esi), %esp
 
-	call	do_exit
+	call	make_task_dead
 1:	jmp 1b
-END(rewind_stack_do_exit)
+END(rewind_stack_and_make_dead)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c82136030d58f..bd7a4ad0937c4 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1757,7 +1757,7 @@ ENTRY(ignore_sysret)
 END(ignore_sysret)
 #endif
 
-ENTRY(rewind_stack_do_exit)
+ENTRY(rewind_stack_and_make_dead)
 	UNWIND_HINT_FUNC
 	/* Prevent any naive code from trying to unwind to our caller. */
 	xorl	%ebp, %ebp
@@ -1766,5 +1766,5 @@ ENTRY(rewind_stack_do_exit)
 	leaq	-PTREGS_SIZE(%rax), %rsp
 	UNWIND_HINT_REGS
 
-	call	do_exit
-END(rewind_stack_do_exit)
+	call	make_task_dead
+END(rewind_stack_and_make_dead)
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index e07424e19274b..e72042dc9487c 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -326,7 +326,7 @@ unsigned long oops_begin(void)
 }
 NOKPROBE_SYMBOL(oops_begin);
 
-void __noreturn rewind_stack_do_exit(int signr);
+void __noreturn rewind_stack_and_make_dead(int signr);
 
 void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
 {
@@ -361,7 +361,7 @@ void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
 	 * reuse the task stack and that existing poisons are invalid.
 	 */
 	kasan_unpoison_task_stack(current);
-	rewind_stack_do_exit(signr);
+	rewind_stack_and_make_dead(signr);
 }
 NOKPROBE_SYMBOL(oops_end);
 
diff --git a/arch/xtensa/kernel/traps.c b/arch/xtensa/kernel/traps.c
index 4a6c495ce9b6d..16af8e514cb3b 100644
--- a/arch/xtensa/kernel/traps.c
+++ b/arch/xtensa/kernel/traps.c
@@ -543,5 +543,5 @@ void die(const char * str, struct pt_regs * regs, long err)
 	if (panic_on_oops)
 		panic("Fatal exception");
 
-	do_exit(err);
+	make_task_dead(err);
 }
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 36f3011ab6013..6f33a07858cf6 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -51,6 +51,7 @@ extern int sched_fork(unsigned long clone_flags, struct task_struct *p);
 extern void sched_dead(struct task_struct *p);
 
 void __noreturn do_task_dead(void);
+void __noreturn make_task_dead(int signr);
 
 extern void proc_caches_init(void);
 
diff --git a/kernel/exit.c b/kernel/exit.c
index ece64771a31f5..6512d82b4d9b0 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -864,6 +864,15 @@ void __noreturn do_exit(long code)
 }
 EXPORT_SYMBOL_GPL(do_exit);
 
+void __noreturn make_task_dead(int signr)
+{
+	/*
+	 * Take the task off the cpu after something catastrophic has
+	 * happened.
+	 */
+	do_exit(signr);
+}
+
 void complete_and_exit(struct completion *comp, long code)
 {
 	if (comp)
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index ccf5580442d29..14be7d261ae7a 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -136,6 +136,7 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func,
 		"panic",
 		"do_exit",
 		"do_task_dead",
+		"make_task_dead",
 		"__module_put_and_exit",
 		"complete_and_exit",
 		"__reiserfs_panic",
@@ -143,7 +144,7 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func,
 		"fortify_panic",
 		"usercopy_abort",
 		"machine_real_restart",
-		"rewind_stack_do_exit",
+		"rewind_stack_and_make_dead"
 		"cpu_bringup_and_idle",
 	};
 
From: "Eric W. Biederman" ebiederm@xmission.com
commit 1fb466dff904e4a72282af336f2c355f011eec61 upstream.
Recently the kbuild robot reported two new errors:
lib/kunit/kunit-example-test.o: warning: objtool: .text.unlikely: unexpected end of section
arch/x86/kernel/dumpstack.o: warning: objtool: oops_end() falls through to next function show_opcodes()
I don't know why they did not occur in my test setup, but after digging into it I realized I had accidentally dropped a comma in tools/objtool/check.c when I renamed rewind_stack_do_exit to rewind_stack_and_make_dead.
Add that comma back to fix objtool errors.
Link: https://lkml.kernel.org/r/202112140949.Uq5sFKR1-lkp@intel.com
Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.")
Reported-by: kernel test robot lkp@intel.com
Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com
Signed-off-by: Eric Biggers ebiggers@google.com
---
 tools/objtool/check.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 14be7d261ae7a..dfd67243faac0 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -144,7 +144,7 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func,
 		"fortify_panic",
 		"usercopy_abort",
 		"machine_real_restart",
-		"rewind_stack_and_make_dead"
+		"rewind_stack_and_make_dead",
 		"cpu_bringup_and_idle",
 	};
 
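To see why the dropped comma in the patch above matters: C concatenates adjacent string literals, so the two table entries silently merge into a single name that matches neither function. The following is a minimal userspace sketch of that effect; the shortened tables and the helper names (in_table, with_comma_len, and so on) are illustrative assumptions, not objtool's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative three-entry table; objtool's real list is longer. */
static const char *const with_comma[] = {
	"machine_real_restart",
	"rewind_stack_and_make_dead",
	"cpu_bringup_and_idle",
};

/* Same table with the comma after the second entry dropped. */
static const char *const without_comma[] = {
	"machine_real_restart",
	"rewind_stack_and_make_dead"
	"cpu_bringup_and_idle",
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Returns 1 if name is an exact match for some table entry, else 0. */
static int in_table(const char *const *table, size_t n, const char *name)
{
	assert(table != NULL && name != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(table[i], name) == 0)
			return 1;
	return 0;
}

size_t with_comma_len(void)    { return ARRAY_LEN(with_comma); }
size_t without_comma_len(void) { return ARRAY_LEN(without_comma); }

int found_with_comma(const char *name)
{
	return in_table(with_comma, ARRAY_LEN(with_comma), name);
}

int found_without_comma(const char *name)
{
	return in_table(without_comma, ARRAY_LEN(without_comma), name);
}
```

With the comma missing, the table has two entries instead of three, and a lookup for "rewind_stack_and_make_dead" fails. That is consistent with the reported "falls through to next function" warning, since the function would no longer be recognized as a dead end.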
From: Nathan Chancellor nathan@kernel.org
commit 4f0712ccec09c071e221242a2db9a6779a55a949 upstream.
When building ARCH=hexagon defconfig:
arch/hexagon/kernel/traps.c:217:2: error: implicit declaration of function 'make_dead_task' [-Werror,-Wimplicit-function-declaration]
        make_dead_task(err);
        ^
The function's name is make_task_dead(), change it so there is no more build error.
Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.")
Signed-off-by: Nathan Chancellor nathan@kernel.org
Link: https://lkml.kernel.org/r/20211227184851.2297759-2-nathan@kernel.org
Signed-off-by: Eric W. Biederman ebiederm@xmission.com
Signed-off-by: Eric Biggers ebiggers@google.com
---
 arch/hexagon/kernel/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/hexagon/kernel/traps.c b/arch/hexagon/kernel/traps.c
index bfd04a388bcac..f69eae3f32bd2 100644
--- a/arch/hexagon/kernel/traps.c
+++ b/arch/hexagon/kernel/traps.c
@@ -221,7 +221,7 @@ int die(const char *str, struct pt_regs *regs, long err)
 		panic("Fatal exception");
 
 	oops_exit();
-	make_dead_task(err);
+	make_task_dead(err);
 	return 0;
 }
 
From: Nathan Chancellor nathan@kernel.org
commit ab4ababdf77ccc56c7301c751dff49c79709c51c upstream.
When building ARCH=h8300 defconfig:
arch/h8300/kernel/traps.c: In function 'die':
arch/h8300/kernel/traps.c:109:2: error: implicit declaration of function 'make_dead_task' [-Werror=implicit-function-declaration]
  109 |  make_dead_task(SIGSEGV);
      |  ^~~~~~~~~~~~~~

arch/h8300/mm/fault.c: In function 'do_page_fault':
arch/h8300/mm/fault.c:54:2: error: implicit declaration of function 'make_dead_task' [-Werror=implicit-function-declaration]
   54 |  make_dead_task(SIGKILL);
      |  ^~~~~~~~~~~~~~
The function's name is make_task_dead(), change it so there is no more build error.
Additionally, include linux/sched/task.h in arch/h8300/kernel/traps.c to avoid the same error because do_exit()'s declaration is in kernel.h but make_task_dead()'s is in task.h, which is not included in traps.c.
Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.")
Signed-off-by: Nathan Chancellor nathan@kernel.org
Link: https://lkml.kernel.org/r/20211227184851.2297759-3-nathan@kernel.org
Signed-off-by: Eric W. Biederman ebiederm@xmission.com
Signed-off-by: Eric Biggers ebiggers@google.com
---
 arch/h8300/kernel/traps.c | 3 ++-
 arch/h8300/mm/fault.c     | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/h8300/kernel/traps.c b/arch/h8300/kernel/traps.c
index a284c126f07a6..090adaee4b84c 100644
--- a/arch/h8300/kernel/traps.c
+++ b/arch/h8300/kernel/traps.c
@@ -17,6 +17,7 @@
 #include <linux/types.h>
 #include <linux/sched.h>
 #include <linux/sched/debug.h>
+#include <linux/sched/task.h>
 #include <linux/mm_types.h>
 #include <linux/kernel.h>
 #include <linux/errno.h>
@@ -110,7 +111,7 @@ void die(const char *str, struct pt_regs *fp, unsigned long err)
 	dump(fp);
 
 	spin_unlock_irq(&die_lock);
-	make_dead_task(SIGSEGV);
+	make_task_dead(SIGSEGV);
 }
 
 static int kstack_depth_to_print = 24;
diff --git a/arch/h8300/mm/fault.c b/arch/h8300/mm/fault.c
index a8d8fc63780e4..573825c3cb708 100644
--- a/arch/h8300/mm/fault.c
+++ b/arch/h8300/mm/fault.c
@@ -52,7 +52,7 @@ asmlinkage int do_page_fault(struct pt_regs *regs, unsigned long address,
 	printk(" at virtual address %08lx\n", address);
 	if (!user_mode(regs))
 		die("Oops", regs, error_code);
-	make_dead_task(SIGKILL);
+	make_task_dead(SIGKILL);
 
 	return 1;
 }
From: Nathan Chancellor nathan@kernel.org
commit 751971af2e3615dc5bd12674080bc795505fefeb upstream.
When building ARCH=csky defconfig:
arch/csky/kernel/traps.c: In function 'die':
arch/csky/kernel/traps.c:112:17: error: implicit declaration of function 'make_dead_task' [-Werror=implicit-function-declaration]
  112 |                 make_dead_task(SIGSEGV);
      |                 ^~~~~~~~~~~~~~
The function's name is make_task_dead(), change it so there is no more build error.
Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.")
Signed-off-by: Nathan Chancellor nathan@kernel.org
Reviewed-by: Guo Ren guoren@kernel.org
Link: https://lkml.kernel.org/r/20211227184851.2297759-4-nathan@kernel.org
Signed-off-by: Eric W. Biederman ebiederm@xmission.com
Signed-off-by: Eric Biggers ebiggers@google.com
---
 arch/csky/abiv1/alignment.c | 2 +-
 arch/csky/kernel/traps.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/csky/abiv1/alignment.c b/arch/csky/abiv1/alignment.c
index 5e2fb45d605cf..2df115d0e2105 100644
--- a/arch/csky/abiv1/alignment.c
+++ b/arch/csky/abiv1/alignment.c
@@ -294,7 +294,7 @@ void csky_alignment(struct pt_regs *regs)
 				__func__, opcode, rz, rx, imm, addr);
 		show_regs(regs);
 		bust_spinlocks(0);
-		make_dead_task(SIGKILL);
+		make_task_dead(SIGKILL);
 	}
 
 	force_sig_fault(SIGBUS, BUS_ADRALN, (void __user *)addr);
diff --git a/arch/csky/kernel/traps.c b/arch/csky/kernel/traps.c
index af7562907f7fa..8cdbbcb5ed875 100644
--- a/arch/csky/kernel/traps.c
+++ b/arch/csky/kernel/traps.c
@@ -85,7 +85,7 @@ void die_if_kernel(char *str, struct pt_regs *regs, int nr)
 	pr_err("%s: %08x\n", str, nr);
 	show_regs(regs);
 	add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
-	make_dead_task(SIGSEGV);
+	make_task_dead(SIGSEGV);
 }
 
 void buserr(struct pt_regs *regs)
From: Randy Dunlap rdunlap@infradead.org
commit dbecf9b8b8ce580f4e11afed9d61e8aa294cddd2 upstream.
In linux-next, IA64_MCA_RECOVERY uses the (new) function make_task_dead(), which is not exported for use by modules. Instead of exporting it for one user, convert IA64_MCA_RECOVERY to be a bool Kconfig symbol.
In a config file from "kernel test robot lkp@intel.com" for a different problem, this linker error was exposed when CONFIG_IA64_MCA_RECOVERY=m.
Fixes this build error:
ERROR: modpost: "make_task_dead" [arch/ia64/kernel/mca_recovery.ko] undefined!
Link: https://lkml.kernel.org/r/20220124213129.29306-1-rdunlap@infradead.org
Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.")
Signed-off-by: Randy Dunlap rdunlap@infradead.org
Suggested-by: Christoph Hellwig hch@infradead.org
Reviewed-by: Christoph Hellwig hch@lst.de
Reviewed-by: "Eric W. Biederman" ebiederm@xmission.com
Cc: Tony Luck tony.luck@intel.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Eric Biggers ebiggers@google.com
---
 arch/ia64/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 16714477eef42..6a6036f16abe6 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -360,7 +360,7 @@ config ARCH_PROC_KCORE_TEXT
 	depends on PROC_KCORE
 
 config IA64_MCA_RECOVERY
-	tristate "MCA recovery from errors other than TLB."
+	bool "MCA recovery from errors other than TLB."
 
 config PERFMON
 	bool "Performance monitor support"
From: Jann Horn jannh@google.com
commit d4ccd54d28d3c8598e2354acc13e28c060961dbb upstream.
Many Linux systems are configured to not panic on oops; but allowing an attacker to oops the system **really** often can make even bugs that look completely unexploitable exploitable (like NULL dereferences and such) if each crash elevates a refcount by one or a lock is taken in read mode, and this causes a counter to eventually overflow.
The most interesting counters for this are 32 bits wide (like open-coded refcounts that don't use refcount_t). (The ldsem reader count on 32-bit platforms is just 16 bits, but probably nobody cares about 32-bit platforms that much nowadays.)
So let's panic the system if the kernel is constantly oopsing.
The speed of oopsing 2^32 times probably depends on several factors, like how long the stack trace is and which unwinder you're using; an empirically important one is whether your console is showing a graphical environment or a text console that oopses will be printed to. In a quick single-threaded benchmark, it looks like oopsing in a vfork() child with a very short stack trace only takes ~510 microseconds per run when a graphical console is active; but switching to a text console that oopses are printed to slows it down around 87x, to ~45 milliseconds per run. (Adding more threads makes this faster, but the actual oops printing happens under &die_lock on x86, so you can maybe speed this up by a factor of around 2 and then any further improvement gets eaten up by lock contention.)
It looks like it would take around 8-12 days to overflow a 32-bit counter with repeated oopsing on a multi-core X86 system running a graphical environment; both me (in an X86 VM) and Seth (with a distro kernel on normal hardware in a standard configuration) got numbers in that ballpark.
12 days aren't *that* short on a desktop system, and you'd likely need much longer on a typical server system (assuming that people don't run graphical desktop environments on their servers), and this is a *very* noisy and violent approach to exploiting the kernel; and it also seems to take orders of magnitude longer on some machines, probably because stuff like EFI pstore will slow it down a ton if that's active.
Signed-off-by: Jann Horn jannh@google.com
Link: https://lore.kernel.org/r/20221107201317.324457-1-jannh@google.com
Reviewed-by: Luis Chamberlain mcgrof@kernel.org
Signed-off-by: Kees Cook keescook@chromium.org
Link: https://lore.kernel.org/r/20221117234328.594699-2-keescook@chromium.org
Signed-off-by: Eric Biggers ebiggers@google.com
---
 Documentation/admin-guide/sysctl/kernel.rst |  8 ++++
 kernel/exit.c                               | 43 +++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 9715685be6e3b..4bdf845c79aa3 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -557,6 +557,14 @@ numa_balancing_scan_size_mb is how many megabytes worth of pages are
 scanned for a given scan.
 
 
+oops_limit
+==========
+
+Number of kernel oopses after which the kernel should panic when
+``panic_on_oops`` is not set. Setting this to 0 or 1 has the same effect
+as setting ``panic_on_oops=1``.
+
+
 osrelease, ostype & version:
 ============================
 
diff --git a/kernel/exit.c b/kernel/exit.c
index 6512d82b4d9b0..4236970aa4384 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -69,6 +69,33 @@
 #include <asm/pgtable.h>
 #include <asm/mmu_context.h>
 
+/*
+ * The default value should be high enough to not crash a system that randomly
+ * crashes its kernel from time to time, but low enough to at least not permit
+ * overflowing 32-bit refcounts or the ldsem writer count.
+ */
+static unsigned int oops_limit = 10000;
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table kern_exit_table[] = {
+	{
+		.procname       = "oops_limit",
+		.data           = &oops_limit,
+		.maxlen         = sizeof(oops_limit),
+		.mode           = 0644,
+		.proc_handler   = proc_douintvec,
+	},
+	{ }
+};
+
+static __init int kernel_exit_sysctls_init(void)
+{
+	register_sysctl_init("kernel", kern_exit_table);
+	return 0;
+}
+late_initcall(kernel_exit_sysctls_init);
+#endif
+
 static void __unhash_process(struct task_struct *p, bool group_dead)
 {
 	nr_threads--;
@@ -866,10 +893,26 @@ EXPORT_SYMBOL_GPL(do_exit);
 
 void __noreturn make_task_dead(int signr)
 {
+	static atomic_t oops_count = ATOMIC_INIT(0);
+
 	/*
 	 * Take the task off the cpu after something catastrophic has
 	 * happened.
 	 */
+
+	/*
+	 * Every time the system oopses, if the oops happens while a reference
+	 * to an object was held, the reference leaks.
+	 * If the oops doesn't also leak memory, repeated oopsing can cause
+	 * reference counters to wrap around (if they're not using refcount_t).
+	 * This means that repeated oopsing can make unexploitable-looking bugs
+	 * exploitable through repeated oopsing.
+	 * To make sure this can't happen, place an upper bound on how often the
+	 * kernel may oops without panic().
+	 */
+	if (atomic_inc_return(&oops_count) >= READ_ONCE(oops_limit))
+		panic("Oopsed too often (kernel.oops_limit is %d)", oops_limit);
+
 	do_exit(signr);
 }
 
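The timing estimate in the commit message above can be reproduced with simple arithmetic. The sketch below is a rough back-of-the-envelope calculation, not a measurement: it takes the ~510 microseconds per oops and the factor-of-2 multi-thread speedup quoted in the commit message and assumes both stay constant over the whole run. The function names are hypothetical.

```c
#include <assert.h>

/*
 * Seconds needed to trigger 2^bits oopses, given a per-oops cost in
 * microseconds and an assumed parallel speedup factor.
 */
double seconds_to_wrap(unsigned int bits, double usec_per_oops, double speedup)
{
	assert(bits < 64 && speedup > 0.0);
	double oopses = (double)(1ULL << bits);

	return oopses * usec_per_oops / 1e6 / speedup;
}

/* Seconds until the kernel panics once `limit` oopses have occurred. */
double seconds_to_limit(unsigned int limit, double usec_per_oops, double speedup)
{
	assert(speedup > 0.0);
	return (double)limit * usec_per_oops / 1e6 / speedup;
}
```

Under these assumptions, seconds_to_wrap(32, 510.0, 2.0) / 86400.0 comes out at roughly 12.7 days, close to the upper end of the 8-12 day range reported above, while seconds_to_limit(10000, 510.0, 1.0) is about 5 seconds. The default limit of 10000 is also below 2^16 = 65536, so it stays under the 16-bit ldsem count mentioned in the commit message.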
From: Kees Cook keescook@chromium.org
commit 9db89b41117024f80b38b15954017fb293133364 upstream.
Since Oops count is now tracked and is a fairly interesting signal, add the entry /sys/kernel/oops_count to expose it to userspace.
Cc: "Eric W. Biederman" ebiederm@xmission.com
Cc: Jann Horn jannh@google.com
Cc: Arnd Bergmann arnd@arndb.de
Reviewed-by: Luis Chamberlain mcgrof@kernel.org
Signed-off-by: Kees Cook keescook@chromium.org
Link: https://lore.kernel.org/r/20221117234328.594699-3-keescook@chromium.org
Signed-off-by: Eric Biggers ebiggers@google.com
---
 .../ABI/testing/sysfs-kernel-oops_count       |  6 +++++
 kernel/exit.c                                 | 22 +++++++++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-oops_count

diff --git a/Documentation/ABI/testing/sysfs-kernel-oops_count b/Documentation/ABI/testing/sysfs-kernel-oops_count
new file mode 100644
index 0000000000000..156cca9dbc960
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-oops_count
@@ -0,0 +1,6 @@
+What:		/sys/kernel/oops_count
+Date:		November 2022
+KernelVersion:	6.2.0
+Contact:	Linux Kernel Hardening List linux-hardening@vger.kernel.org
+Description:
+		Shows how many times the system has Oopsed since last boot.
diff --git a/kernel/exit.c b/kernel/exit.c
index 4236970aa4384..48ac68ebab728 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -63,6 +63,7 @@
 #include <linux/random.h>
 #include <linux/rcuwait.h>
 #include <linux/compat.h>
+#include <linux/sysfs.h>
 
 #include <linux/uaccess.h>
 #include <asm/unistd.h>
@@ -96,6 +97,25 @@ static __init int kernel_exit_sysctls_init(void)
 late_initcall(kernel_exit_sysctls_init);
 #endif
 
+static atomic_t oops_count = ATOMIC_INIT(0);
+
+#ifdef CONFIG_SYSFS
+static ssize_t oops_count_show(struct kobject *kobj, struct kobj_attribute *attr,
+			       char *page)
+{
+	return sysfs_emit(page, "%d\n", atomic_read(&oops_count));
+}
+
+static struct kobj_attribute oops_count_attr = __ATTR_RO(oops_count);
+
+static __init int kernel_exit_sysfs_init(void)
+{
+	sysfs_add_file_to_group(kernel_kobj, &oops_count_attr.attr, NULL);
+	return 0;
+}
+late_initcall(kernel_exit_sysfs_init);
+#endif
+
 static void __unhash_process(struct task_struct *p, bool group_dead)
 {
 	nr_threads--;
@@ -893,8 +913,6 @@ EXPORT_SYMBOL_GPL(do_exit);
 
 void __noreturn make_task_dead(int signr)
 {
-	static atomic_t oops_count = ATOMIC_INIT(0);
-
 	/*
 	 * Take the task off the cpu after something catastrophic has
 	 * happened.
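With the counter exposed, a monitoring tool could poll it from userspace. The following is a small sketch under the assumption that the file holds a single decimal integer, as the sysfs_emit() call in the patch above suggests; read_counter is a hypothetical helper, not part of the series.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Reads one non-negative decimal integer from the file at `path`.
 * Returns -1 if the file cannot be opened or does not parse.
 */
long read_counter(const char *path)
{
	assert(path != NULL);
	FILE *f = fopen(path, "r");
	long value;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &value) != 1 || value < 0)
		value = -1;
	fclose(f);
	return value;
}
```

On a kernel carrying these patches, read_counter("/sys/kernel/oops_count") should return the number of oopses since boot, and read_counter("/proc/sys/kernel/oops_limit") the configured limit. On kernels without them the files are absent and the helper returns -1.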
From: Kees Cook keescook@chromium.org
commit de92f65719cd672f4b48397540b9f9eff67eca40 upstream.
In preparation for keeping oops_limit logic in sync with warn_limit, have oops_limit == 0 disable checking the Oops counter.
Cc: Jann Horn jannh@google.com
Cc: Jonathan Corbet corbet@lwn.net
Cc: Andrew Morton akpm@linux-foundation.org
Cc: Baolin Wang baolin.wang@linux.alibaba.com
Cc: "Jason A. Donenfeld" Jason@zx2c4.com
Cc: Eric Biggers ebiggers@google.com
Cc: Huang Ying ying.huang@intel.com
Cc: "Eric W. Biederman" ebiederm@xmission.com
Cc: Arnd Bergmann arnd@arndb.de
Cc: linux-doc@vger.kernel.org
Signed-off-by: Kees Cook keescook@chromium.org
Signed-off-by: Eric Biggers ebiggers@google.com
---
 Documentation/admin-guide/sysctl/kernel.rst | 5 +++--
 kernel/exit.c                               | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 4bdf845c79aa3..bc31c4a88f20f 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -561,8 +561,9 @@ oops_limit
 ==========
 
 Number of kernel oopses after which the kernel should panic when
-``panic_on_oops`` is not set. Setting this to 0 or 1 has the same effect
-as setting ``panic_on_oops=1``.
+``panic_on_oops`` is not set. Setting this to 0 disables checking
+the count. Setting this to 1 has the same effect as setting
+``panic_on_oops=1``. The default value is 10000.
 
 
 osrelease, ostype & version:
diff --git a/kernel/exit.c b/kernel/exit.c
index 48ac68ebab728..381282fb756c3 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -928,7 +928,7 @@ void __noreturn make_task_dead(int signr)
 	 * To make sure this can't happen, place an upper bound on how often the
 	 * kernel may oops without panic().
 	 */
-	if (atomic_inc_return(&oops_count) >= READ_ONCE(oops_limit))
+	if (atomic_inc_return(&oops_count) >= READ_ONCE(oops_limit) && oops_limit)
 		panic("Oopsed too often (kernel.oops_limit is %d)", oops_limit);
 
 	do_exit(signr);
From: Kees Cook keescook@chromium.org
commit 79cc1ba7badf9e7a12af99695a557e9ce27ee967 upstream.
Several run-time checkers (KASAN, UBSAN, KFENCE, KCSAN, sched) roll their own warnings, and each check "panic_on_warn". Consolidate this into a single function so that future instrumentation can be added in a single location.
Cc: Marco Elver elver@google.com
Cc: Dmitry Vyukov dvyukov@google.com
Cc: Ingo Molnar mingo@redhat.com
Cc: Peter Zijlstra peterz@infradead.org
Cc: Juri Lelli juri.lelli@redhat.com
Cc: Vincent Guittot vincent.guittot@linaro.org
Cc: Dietmar Eggemann dietmar.eggemann@arm.com
Cc: Steven Rostedt rostedt@goodmis.org
Cc: Ben Segall bsegall@google.com
Cc: Mel Gorman mgorman@suse.de
Cc: Daniel Bristot de Oliveira bristot@redhat.com
Cc: Valentin Schneider vschneid@redhat.com
Cc: Andrey Ryabinin ryabinin.a.a@gmail.com
Cc: Alexander Potapenko glider@google.com
Cc: Andrey Konovalov andreyknvl@gmail.com
Cc: Vincenzo Frascino vincenzo.frascino@arm.com
Cc: Andrew Morton akpm@linux-foundation.org
Cc: David Gow davidgow@google.com
Cc: tangmeng tangmeng@uniontech.com
Cc: Jann Horn jannh@google.com
Cc: Shuah Khan skhan@linuxfoundation.org
Cc: Petr Mladek pmladek@suse.com
Cc: "Paul E. McKenney" paulmck@kernel.org
Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de
Cc: "Guilherme G. Piccoli" gpiccoli@igalia.com
Cc: Tiezhu Yang yangtiezhu@loongson.cn
Cc: kasan-dev@googlegroups.com
Cc: linux-mm@kvack.org
Reviewed-by: Luis Chamberlain mcgrof@kernel.org
Signed-off-by: Kees Cook keescook@chromium.org
Reviewed-by: Marco Elver elver@google.com
Reviewed-by: Andrey Konovalov andreyknvl@gmail.com
Link: https://lore.kernel.org/r/20221117234328.594699-4-keescook@chromium.org
Signed-off-by: Eric Biggers ebiggers@google.com
---
 include/linux/kernel.h | 1 +
 kernel/panic.c         | 9 +++++++--
 kernel/sched/core.c    | 3 +--
 mm/kasan/report.c      | 4 ++--
 4 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 77c86a2236daf..1fdb251947ed4 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -321,6 +321,7 @@ extern long (*panic_blink)(int state);
 __printf(1, 2)
 void panic(const char *fmt, ...) __noreturn __cold;
 void nmi_panic(struct pt_regs *regs, const char *msg);
+void check_panic_on_warn(const char *origin);
 extern void oops_enter(void);
 extern void oops_exit(void);
 void print_oops_end_marker(void);
diff --git a/kernel/panic.c b/kernel/panic.c
index 5e2b764ff5d54..7e4900eb25ac1 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -156,6 +156,12 @@ static void panic_print_sys_info(void)
 		ftrace_dump(DUMP_ALL);
 }
 
+void check_panic_on_warn(const char *origin)
+{
+	if (panic_on_warn)
+		panic("%s: panic_on_warn set ...\n", origin);
+}
+
 /**
  *	panic - halt the system
  *	@fmt: The text string to print
@@ -581,8 +587,7 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
 	if (args)
 		vprintk(args->fmt, args->args);
 
-	if (panic_on_warn)
-		panic("panic_on_warn set ...\n");
+	check_panic_on_warn("kernel");
 
 	print_modules();
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 06b686ef36e68..8ab239fd1c8d3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3964,8 +3964,7 @@ static noinline void __schedule_bug(struct task_struct *prev)
 		print_ip_sym(preempt_disable_ip);
 		pr_cont("\n");
 	}
-	if (panic_on_warn)
-		panic("scheduling while atomic\n");
+	check_panic_on_warn("scheduling while atomic");
 
 	dump_stack();
 	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index a05ff1922d499..4d87df96acc1e 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -92,8 +92,8 @@ static void end_report(unsigned long *flags)
 	pr_err("==================================================================\n");
 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
 	spin_unlock_irqrestore(&report_lock, *flags);
-	if (panic_on_warn && !test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
-		panic("panic_on_warn set ...\n");
+	if (!test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
+		check_panic_on_warn("KASAN");
 	kasan_enable_current();
 }
 
From: Kees Cook keescook@chromium.org
commit 9fc9e278a5c0b708eeffaf47d6eb0c82aa74ed78 upstream.
Like oops_limit, add warn_limit for limiting the number of warnings when panic_on_warn is not set.
Cc: Jonathan Corbet corbet@lwn.net
Cc: Andrew Morton akpm@linux-foundation.org
Cc: Baolin Wang baolin.wang@linux.alibaba.com
Cc: "Jason A. Donenfeld" Jason@zx2c4.com
Cc: Eric Biggers ebiggers@google.com
Cc: Huang Ying ying.huang@intel.com
Cc: Petr Mladek pmladek@suse.com
Cc: tangmeng tangmeng@uniontech.com
Cc: "Guilherme G. Piccoli" gpiccoli@igalia.com
Cc: Tiezhu Yang yangtiezhu@loongson.cn
Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de
Cc: linux-doc@vger.kernel.org
Reviewed-by: Luis Chamberlain mcgrof@kernel.org
Signed-off-by: Kees Cook keescook@chromium.org
Link: https://lore.kernel.org/r/20221117234328.594699-5-keescook@chromium.org
Signed-off-by: Eric Biggers ebiggers@google.com
---
 Documentation/admin-guide/sysctl/kernel.rst | 10 ++++++++
 kernel/panic.c                              | 27 +++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index bc31c4a88f20f..568c24ff00a72 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1186,6 +1186,16 @@ entry will default to 2 instead of 0.
 2 Unprivileged calls to ``bpf()`` are disabled
 = =============================================================
 
+
+warn_limit
+==========
+
+Number of kernel warnings after which the kernel should panic when
+``panic_on_warn`` is not set. Setting this to 0 disables checking
+the warning count. Setting this to 1 has the same effect as setting
+``panic_on_warn=1``. The default value is 0.
+
+
 watchdog:
 =========
 
diff --git a/kernel/panic.c b/kernel/panic.c
index 7e4900eb25ac1..8f72305dd501d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -44,6 +44,7 @@ static int pause_on_oops_flag;
 static DEFINE_SPINLOCK(pause_on_oops_lock);
 bool crash_kexec_post_notifiers;
 int panic_on_warn __read_mostly;
+static unsigned int warn_limit __read_mostly;
 
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
@@ -60,6 +61,26 @@ ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
 
 EXPORT_SYMBOL(panic_notifier_list);
 
+#ifdef CONFIG_SYSCTL
+static struct ctl_table kern_panic_table[] = {
+	{
+		.procname       = "warn_limit",
+		.data           = &warn_limit,
+		.maxlen         = sizeof(warn_limit),
+		.mode           = 0644,
+		.proc_handler   = proc_douintvec,
+	},
+	{ }
+};
+
+static __init int kernel_panic_sysctls_init(void)
+{
+	register_sysctl_init("kernel", kern_panic_table);
+	return 0;
+}
+late_initcall(kernel_panic_sysctls_init);
+#endif
+
 static long no_blink(int state)
 {
 	return 0;
@@ -158,8 +179,14 @@ static void panic_print_sys_info(void)
 
 void check_panic_on_warn(const char *origin)
 {
+	static atomic_t warn_count = ATOMIC_INIT(0);
+
 	if (panic_on_warn)
 		panic("%s: panic_on_warn set ...\n", origin);
+
+	if (atomic_inc_return(&warn_count) >= READ_ONCE(warn_limit) && warn_limit)
+		panic("%s: system warned too often (kernel.warn_limit is %d)",
+		      origin, warn_limit);
 }
 
 /**
From: Kees Cook keescook@chromium.org
commit 8b05aa26336113c4cea25f1c333ee8cd4fc212a6 upstream.
Since Warn count is now tracked and is a fairly interesting signal, add the entry /sys/kernel/warn_count to expose it to userspace.
Cc: Petr Mladek pmladek@suse.com
Cc: Andrew Morton akpm@linux-foundation.org
Cc: tangmeng tangmeng@uniontech.com
Cc: "Guilherme G. Piccoli" gpiccoli@igalia.com
Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de
Cc: Tiezhu Yang yangtiezhu@loongson.cn
Reviewed-by: Luis Chamberlain mcgrof@kernel.org
Signed-off-by: Kees Cook keescook@chromium.org
Link: https://lore.kernel.org/r/20221117234328.594699-6-keescook@chromium.org
Signed-off-by: Eric Biggers ebiggers@google.com
---
 .../ABI/testing/sysfs-kernel-warn_count       |  6 +++++
 kernel/panic.c                                | 22 +++++++++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-warn_count

diff --git a/Documentation/ABI/testing/sysfs-kernel-warn_count b/Documentation/ABI/testing/sysfs-kernel-warn_count
new file mode 100644
index 0000000000000..08f083d2fd51b
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-warn_count
@@ -0,0 +1,6 @@
+What:		/sys/kernel/oops_count
+Date:		November 2022
+KernelVersion:	6.2.0
+Contact:	Linux Kernel Hardening List linux-hardening@vger.kernel.org
+Description:
+		Shows how many times the system has Warned since last boot.
diff --git a/kernel/panic.c b/kernel/panic.c
index 8f72305dd501d..2c118645e7408 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -31,6 +31,7 @@
 #include <linux/bug.h>
 #include <linux/ratelimit.h>
 #include <linux/debugfs.h>
+#include <linux/sysfs.h>
 #include <asm/sections.h>
 
 #define PANIC_TIMER_STEP 100
@@ -81,6 +82,25 @@ static __init int kernel_panic_sysctls_init(void)
 late_initcall(kernel_panic_sysctls_init);
 #endif
 
+static atomic_t warn_count = ATOMIC_INIT(0);
+
+#ifdef CONFIG_SYSFS
+static ssize_t warn_count_show(struct kobject *kobj, struct kobj_attribute *attr,
+			       char *page)
+{
+	return sysfs_emit(page, "%d\n", atomic_read(&warn_count));
+}
+
+static struct kobj_attribute warn_count_attr = __ATTR_RO(warn_count);
+
+static __init int kernel_panic_sysfs_init(void)
+{
+	sysfs_add_file_to_group(kernel_kobj, &warn_count_attr.attr, NULL);
+	return 0;
+}
+late_initcall(kernel_panic_sysfs_init);
+#endif
+
 static long no_blink(int state)
 {
 	return 0;
@@ -179,8 +199,6 @@ static void panic_print_sys_info(void)
 
 void check_panic_on_warn(const char *origin)
 {
-	static atomic_t warn_count = ATOMIC_INIT(0);
-
 	if (panic_on_warn)
 		panic("%s: panic_on_warn set ...\n", origin);
 
From: Kees Cook keescook@chromium.org
commit 00dd027f721e0458418f7750d8a5a664ed3e5994 upstream.
Running "make htmldocs" shows that "/sys/kernel/oops_count" was duplicated. This should have been "warn_count":
Warning: /sys/kernel/oops_count is defined 2 times:
./Documentation/ABI/testing/sysfs-kernel-warn_count:0
./Documentation/ABI/testing/sysfs-kernel-oops_count:0
Fix the typo.
Reported-by: kernel test robot lkp@intel.com
Link: https://lore.kernel.org/linux-doc/202212110529.A3Qav8aR-lkp@intel.com
Fixes: 8b05aa263361 ("panic: Expose "warn_count" to sysfs")
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook keescook@chromium.org
Signed-off-by: Eric Biggers ebiggers@google.com
---
 Documentation/ABI/testing/sysfs-kernel-warn_count | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-warn_count b/Documentation/ABI/testing/sysfs-kernel-warn_count
index 08f083d2fd51b..90a029813717d 100644
--- a/Documentation/ABI/testing/sysfs-kernel-warn_count
+++ b/Documentation/ABI/testing/sysfs-kernel-warn_count
@@ -1,4 +1,4 @@
-What:		/sys/kernel/oops_count
+What:		/sys/kernel/warn_count
 Date:		November 2022
 KernelVersion:	6.2.0
 Contact:	Linux Kernel Hardening List linux-hardening@vger.kernel.org
From: Kees Cook keescook@chromium.org
commit 7535b832c6399b5ebfc5b53af5c51dd915ee2538 upstream.
Use a temporary variable to take full advantage of READ_ONCE() behavior. Without this, the report (and even the test) might be out of sync with the initial test.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y5x7GXeluFmZ8E0E@hirez.programming.kicks-ass.ne...
Fixes: 9fc9e278a5c0 ("panic: Introduce warn_limit")
Fixes: d4ccd54d28d3 ("exit: Put an upper limit on how often we can oops")
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: tangmeng <tangmeng@uniontech.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 kernel/exit.c  | 6 ++++--
 kernel/panic.c | 7 +++++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 381282fb756c3..563bdaa766945 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -917,6 +917,7 @@ void __noreturn make_task_dead(int signr)
 	 * Take the task off the cpu after something catastrophic has
 	 * happened.
 	 */
+	unsigned int limit;
 
 	/*
 	 * Every time the system oopses, if the oops happens while a reference
@@ -928,8 +929,9 @@ void __noreturn make_task_dead(int signr)
 	 * To make sure this can't happen, place an upper bound on how often the
 	 * kernel may oops without panic().
 	 */
-	if (atomic_inc_return(&oops_count) >= READ_ONCE(oops_limit) && oops_limit)
-		panic("Oopsed too often (kernel.oops_limit is %d)", oops_limit);
+	limit = READ_ONCE(oops_limit);
+	if (atomic_inc_return(&oops_count) >= limit && limit)
+		panic("Oopsed too often (kernel.oops_limit is %d)", limit);
 
 	do_exit(signr);
 }
diff --git a/kernel/panic.c b/kernel/panic.c
index 2c118645e7408..cef79466f9417 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -199,12 +199,15 @@ static void panic_print_sys_info(void)
 
 void check_panic_on_warn(const char *origin)
 {
+	unsigned int limit;
+
 	if (panic_on_warn)
 		panic("%s: panic_on_warn set ...\n", origin);
 
-	if (atomic_inc_return(&warn_count) >= READ_ONCE(warn_limit) && warn_limit)
+	limit = READ_ONCE(warn_limit);
+	if (atomic_inc_return(&warn_count) >= limit && limit)
 		panic("%s: system warned too often (kernel.warn_limit is %d)",
-		      origin, warn_limit);
+		      origin, limit);
 }
 
 /**
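For readers outside the kernel tree, the snapshot pattern in the patch above can be sketched in user space. This is only an illustration, not the kernel code: demo_limit, demo_count and demo_hit_limit() are hypothetical names, and a relaxed C11 atomic load stands in for READ_ONCE(). The point it shows is that the limit is loaded once, so the zero check, the comparison and the reported value all use the same snapshot even if the tunable is changed concurrently.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's oops_limit and oops_count. */
_Atomic unsigned int demo_limit = 3;
atomic_uint demo_count;

/*
 * Returns true when the caller should give up (where the kernel would
 * call panic()). The limit is loaded exactly once into a local, so the
 * "is it enabled" check, the comparison and the value handed back in
 * *reported can never disagree with each other.
 */
bool demo_hit_limit(unsigned int *reported)
{
	/* Single load; a relaxed atomic load substitutes for READ_ONCE(). */
	unsigned int limit = atomic_load_explicit(&demo_limit,
						  memory_order_relaxed);
	/* atomic_fetch_add() returns the old value; +1 mimics atomic_inc_return(). */
	unsigned int count = atomic_fetch_add(&demo_count, 1) + 1;

	*reported = limit;
	return limit && count >= limit;
}
```

With the initial limit of 3, the first two calls return false and the third returns true. Storing 0 into demo_limit disables the check, mirroring how a limit of 0 is treated as "no limit" in the patch.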
On Wed, Feb 01, 2023 at 08:42:38PM -0800, Eric Biggers wrote:
> This series backports the patchset "exit: Put an upper limit on how often we can oops" (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#...) to 5.4, as recommended at https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-... This follows the backports to 5.10 and 5.15 which already released.
>
> This required backporting various prerequisite patches.
>
> I've tested that oops_limit and warn_limit work correctly on x86_64.
Queued up all 3 backports, thanks!
On Thu, Feb 02, 2023 at 12:16:52PM -0500, Sasha Levin wrote:
> On Wed, Feb 01, 2023 at 08:42:38PM -0800, Eric Biggers wrote:
>> This series backports the patchset "exit: Put an upper limit on how often we can oops" (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#...) to 5.4, as recommended at https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-... This follows the backports to 5.10 and 5.15 which already released.
>>
>> This required backporting various prerequisite patches.
>>
>> I've tested that oops_limit and warn_limit work correctly on x86_64.
>
> Queued up all 3 backports, thanks!

... and proceeded to drop the 4.19 and 4.14 backports which fail to build:

mm/kasan/report.c: In function 'kasan_end_report':
mm/kasan/report.c:175:16: error: 'KASAN_BIT_MULTI_SHOT' undeclared (first use in this function)
  175 |         if (!test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
On Thu, Feb 02, 2023 at 12:47:07PM -0500, Sasha Levin wrote:
> On Thu, Feb 02, 2023 at 12:16:52PM -0500, Sasha Levin wrote:
>> On Wed, Feb 01, 2023 at 08:42:38PM -0800, Eric Biggers wrote:
>>> This series backports the patchset "exit: Put an upper limit on how often we can oops" (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#...) to 5.4, as recommended at https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-... This follows the backports to 5.10 and 5.15 which already released.
>>>
>>> This required backporting various prerequisite patches.
>>>
>>> I've tested that oops_limit and warn_limit work correctly on x86_64.
>>
>> Queued up all 3 backports, thanks!
>
> ... and proceeded to drop the 4.19 and 4.14 backports which fail to build:
>
> mm/kasan/report.c: In function 'kasan_end_report':
> mm/kasan/report.c:175:16: error: 'KASAN_BIT_MULTI_SHOT' undeclared (first use in this function)
>   175 |         if (!test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
Thanks, I'll fix that. I had grepped for KASAN_BIT_MULTI_SHOT to make sure those branches had it, but I didn't notice it was defined later in the file :-(
- Eric
On Wed, 1 Feb 2023 20:42:38 -0800 Eric Biggers <ebiggers@kernel.org> wrote:
> This series backports the patchset "exit: Put an upper limit on how often we can oops" (https://lore.kernel.org/linux-mm/20221117233838.give.484-kees@kernel.org/T/#...) to 5.4, as recommended at https://googleprojectzero.blogspot.com/2023/01/exploiting-null-dereferences-... This follows the backports to 5.10 and 5.15 which already released.
>
> This required backporting various prerequisite patches.
>
> I've tested that oops_limit and warn_limit work correctly on x86_64.
Thanks for your great efforts on this.
Tested-by: SeongJae Park <sj@kernel.org>
Thanks, SJ
linux-stable-mirror@lists.linaro.org