This patch set enables kdump (crash dump kernel) support on arm64 on top of Geoff's kexec patchset.
In this version, there are some arm64-specific usage/constraints: 1) "mem=" boot parameter must be specified on crash dump kernel if the system starts on uefi. 2) Kvm will not be enabled on crash dump kernel even if configured See commit messages and Documentation/kdump/kdump.txt for details.
The only concern I have is whether or not we can use the exact same kernel as both system kernel and crash dump kernel. The current arm64 kernel is not relocatable in the exact sense but I see no problems in using the same binary when testing kdump.
I tested the code with - ATF v1.1 + EDK2(UEFI) v3.0-rc0 - kernel v4.0 + Geoff' kexec v9 on - Base fast model, and - MediaTek MT8173-EVB using my own kexec-tools [1], currently v0.12.
You may want to start a kernel with the following boot parameter: crashkernel=64M (or so, on model) and try $ kexec -p --load <vmlinux> --append ... $ echo c > /proc/sysrq-trigger
To examine vmcore (/proc/vmcore), you should use - gdb v7.7 or later - crash + a small patch (to recognize v4.0 kernel)
Changes from v1: * rebased to Geoff's v9 * tested this patchset on real hardware and fixed bugs: - added cache flush operation in ipi_cpu_stop() when shutting down the system. Otherwise, data saved in vmcore's note sections by crash_save_cpu() might not be flushed to dumped memory and crash command fail to fetch correct data. I will address Mark's commit[2] after Geoff takes care of it on kexec. - modified to use ioremap_cache() instead of ioremap() when reading crash memory. Otherwise, accessing /proc/vmcore on crash dump kernel might cause an alignment fault. * allows reserve_crashkernel() to handle "crashkernel=xyz[MG]" correctly, thanks to Pratyush Anand. And it now also enforces memory limit. * moved reserve_crashkernel() and reserve_elfcorehdr() to arm64_memblock_init() to clarify that they should be called before dma_contignuous_reserve().
[1] https://git.linaro.org/people/takahiro.akashi/kexec-tools.git [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-April/338171.html
AKASHI Takahiro (5): arm64: kdump: reserve memory for crash dump kernel arm64: kdump: implement machine_crash_shutdown() arm64: kdump: do not go into EL2 before starting a crash dump kernel arm64: add kdump support arm64: enable kdump in the arm64 defconfig
Documentation/kdump/kdump.txt | 31 +++++++++++++- arch/arm64/Kconfig | 12 ++++++ arch/arm64/configs/defconfig | 1 + arch/arm64/include/asm/kexec.h | 34 ++++++++++++++- arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/crash_dump.c | 71 +++++++++++++++++++++++++++++++ arch/arm64/kernel/machine_kexec.c | 55 +++++++++++++++++++++++- arch/arm64/kernel/process.c | 7 +++- arch/arm64/kernel/setup.c | 8 +++- arch/arm64/kernel/smp.c | 12 +++++- arch/arm64/mm/init.c | 84 +++++++++++++++++++++++++++++++++++++ 11 files changed, 309 insertions(+), 7 deletions(-) create mode 100644 arch/arm64/kernel/crash_dump.c
On system kernel, the memory region used by crash dump kernel must be specified by "crashkernel=X@Y" boot parameter. reserve_crashkernel() will allocate the region in "System RAM" and reserve it for later use.
On crash dump kernel, memory region information in system kernel is described in a specific region specified by "elfcorehdr=X@Y" boot parameter. reserve_elfcorehdr() will set aside the region to avoid data destruction by the kernel.
Crash dump kernel will access memory regions in system kernel via copy_oldmem_page(), which reads a page by ioremap'ing it assuming that such pages are not part of main memory of crash dump kernel. This is true under non-UEFI environment because kexec-tools modifies a device tree adding "usablemem" attributes to memory sections. Under UEFI, however, this is not true because UEFI remove memory sections in a device tree and export all the memory regions, even though they belong to system kernel.
So we should add "mem=X[MG]" boot parameter to limit the memory size and avoid hitting the following assertion in ioremap(): if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr)))) return NULL;
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org --- arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/crash_dump.c | 71 +++++++++++++++++++++++++++++++++ arch/arm64/kernel/setup.c | 8 +++- arch/arm64/mm/init.c | 84 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 163 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/kernel/crash_dump.c
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index ac3c2e2..6fcc602 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -36,6 +36,7 @@ arm64-obj-$(CONFIG_EFI) += efi.o efi-stub.o efi-entry.o arm64-obj-$(CONFIG_PCI) += pci.o arm64-obj-$(CONFIG_ARMV8_DEPRECATED) += armv8_deprecated.o arm64-obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o +arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
obj-y += $(arm64-obj-y) vdso/ obj-m += $(arm64-obj-m) diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c new file mode 100644 index 0000000..3d86c0a --- /dev/null +++ b/arch/arm64/kernel/crash_dump.c @@ -0,0 +1,71 @@ +/* + * Routines for doing kexec-based kdump + * + * Copyright (C) 2014 Linaro Limited + * Author: AKASHI Takahiro takahiro.akashi@linaro.org + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/crash_dump.h> +#include <linux/errno.h> +#include <linux/io.h> +#include <linux/memblock.h> +#include <linux/uaccess.h> +#include <asm/memory.h> + +/** + * copy_oldmem_page() - copy one page from old kernel memory + * @pfn: page frame number to be copied + * @buf: buffer where the copied page is placed + * @csize: number of bytes to copy + * @offset: offset in bytes into the page + * @userbuf: if set, @buf is in a user address space + * + * This function copies one page from old kernel memory into buffer pointed by + * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes + * copied or negative error in case of failure. + */ +ssize_t copy_oldmem_page(unsigned long pfn, char *buf, + size_t csize, unsigned long offset, + int userbuf) +{ + void *vaddr; + + if (!csize) + return 0; + + vaddr = ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE); + if (!vaddr) + return -ENOMEM; + + if (userbuf) { + if (copy_to_user(buf, vaddr + offset, csize)) { + iounmap(vaddr); + return -EFAULT; + } + } else { + memcpy(buf, vaddr + offset, csize); + } + + iounmap(vaddr); + + return csize; +} + +/** + * elfcorehdr_read - read from ELF core header + * @buf: buffer where the data is placed + * @csize: number of bytes to read + * @ppos: address in the memory + * + * This function reads @count bytes from elf core header which exists + * on crash dump kernel's memory. + */ +ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos) +{ + memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count); + return count; +} diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 51ef972..7932bd0 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -31,7 +31,6 @@ #include <linux/screen_info.h> #include <linux/init.h> #include <linux/kexec.h> -#include <linux/crash_dump.h> #include <linux/root_dev.h> #include <linux/clk-provider.h> #include <linux/cpu.h> @@ -364,6 +363,12 @@ static void __init request_standard_resources(void) kernel_data.end <= res->end) request_resource(res, &kernel_data); } + +#ifdef CONFIG_KEXEC + /* User space tools will detect the region with /proc/iomem */ + if (crashk_res.end) + insert_resource(&iomem_resource, &crashk_res); +#endif }
u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID }; @@ -393,6 +398,7 @@ void __init setup_arch(char **cmdline_p) local_async_enable();
efi_init(); + arm64_memblock_init();
paging_init(); diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index ae85da6..ea70d41 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -34,6 +34,8 @@ #include <linux/dma-contiguous.h> #include <linux/efi.h> #include <linux/swiotlb.h> +#include <linux/kexec.h> +#include <linux/crash_dump.h>
#include <asm/fixmap.h> #include <asm/memory.h> @@ -66,6 +68,81 @@ static int __init early_initrd(char *p) early_param("initrd", early_initrd); #endif
+#ifdef CONFIG_KEXEC +/* + * reserve_crashkernel() - reserves memory for crash kernel + * + * This function reserves memory area given in "crashkernel=" kernel command + * line parameter. The memory reserved is used by a dump capture kernel when + * primary kernel is crashing. + */ +static void __init reserve_crashkernel(phys_addr_t limit) +{ + unsigned long long crash_size = 0, crash_base = 0; + int ret; + + ret = parse_crashkernel(boot_command_line, limit, + &crash_size, &crash_base); + if (ret) + return; + + if (crash_base == 0) { + crash_base = memblock_alloc(crash_size, 1 << 20); + if (crash_base == 0) { + pr_warn("crashkernel allocation failed (size:%llx)\n", + crash_size); + return; + } + } else { + /* User specifies base address explicitly. Sanity check */ + if (!memblock_is_region_memory(crash_base, crash_size) || + memblock_is_region_reserved(crash_base, crash_size)) { + pr_warn("crashkernel= has wrong address or size\n"); + return; + } + + if (memblock_reserve(crash_base, crash_size)) { + pr_warn("crashkernel reservation failed - out of memory\n"); + return; + } + } + + pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n", + crash_size >> 20, crash_base >> 20); + + crashk_res.start = crash_base; + crashk_res.end = crash_base + crash_size - 1; +} +#endif /* CONFIG_KEXEC */ + +#ifdef CONFIG_CRASH_DUMP +/* + * reserve_elfcorehdr() - reserves memory for elf core header + * + * This function reserves memory area given in "elfcorehdr=" kernel command + * line parameter. The memory reserved is used by a dump capture kernel to + * identify the memory used by primary kernel. + */ +static void __init reserve_elfcorehdr(void) +{ + if (!elfcorehdr_size) + return; + + if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) { + pr_warn("elfcorehdr reservation failed - memory is in use (0x%llx)\n", + elfcorehdr_addr); + return; + } + + if (memblock_reserve(elfcorehdr_addr, elfcorehdr_size)) { + pr_warn("elfcorehdr reservation failed - out of memory\n"); + return; + } + + pr_info("Reserving %lldKB of memory at %lldMB for elfcorehdr\n", + elfcorehdr_size >> 10, elfcorehdr_addr >> 20); +} +#endif /* CONFIG_CRASH_DUMP */ /* * Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It * currently assumes that for memory starting above 4G, 32-bit devices will @@ -170,6 +247,13 @@ void __init arm64_memblock_init(void) memblock_reserve(__virt_to_phys(initrd_start), initrd_end - initrd_start); #endif
+#ifdef CONFIG_KEXEC + reserve_crashkernel(memory_limit); +#endif +#ifdef CONFIG_CRASH_DUMP + reserve_elfcorehdr(); +#endif + early_init_fdt_scan_reserved_mem();
/* 4GB maximum for 32-bit only capable devices */
On Fri, Apr 24, 2015 at 08:53:04AM +0100, AKASHI Takahiro wrote:
On system kernel, the memory region used by crash dump kernel must be specified by "crashkernel=X@Y" boot parameter. reserve_crashkernel() will allocate the region in "System RAM" and reserve it for later use.
On crash dump kernel, memory region information in system kernel is described in a specific region specified by "elfcorehdr=X@Y" boot parameter. reserve_elfcorehdr() will set aside the region to avoid data destruction by the kernel.
Crash dump kernel will access memory regions in system kernel via copy_oldmem_page(), which reads a page by ioremap'ing it assuming that such pages are not part of main memory of crash dump kernel. This is true under non-UEFI environment because kexec-tools modifies a device tree adding "usablemem" attributes to memory sections.
I'm not sure what you mean by "usablemem" here.
Do you just mean that the memory nodes are altered such that they only cover memory usable by the crash kernel?
Why not _always_ require a command line argument for the crash kernel that restricts its memory usage to a particular range? That way it doesn't matter whether we're using UEFI or not.
Under UEFI, however, this is not true because UEFI remove memory sections in a device tree and export all the memory regions, even though they belong to system kernel.
So we should add "mem=X[MG]" boot parameter to limit the memory size and avoid hitting the following assertion in ioremap(): if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr)))) return NULL;
That looks suspicious. What is being ioremapped at that point?
[...]
@@ -393,6 +398,7 @@ void __init setup_arch(char **cmdline_p) local_async_enable();
efi_init();
arm64_memblock_init(); paging_init();
Nit: unrelated whitespace change.
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index ae85da6..ea70d41 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -34,6 +34,8 @@ #include <linux/dma-contiguous.h> #include <linux/efi.h> #include <linux/swiotlb.h> +#include <linux/kexec.h> +#include <linux/crash_dump.h>
Nit: please keep these ordered.
[...]
if (memblock_reserve(crash_base, crash_size)) {
pr_warn("crashkernel reservation failed - out of memory\n");
return;
}
If we can remove this memory rather than reserving it, we can limit the first kernel's ability to accidentally clobber the crash kernel, at the expense of having to explicitly map/unmap around loading it.
Mark.
On 04/24/2015 07:11 PM, Mark Rutland wrote:
On Fri, Apr 24, 2015 at 08:53:04AM +0100, AKASHI Takahiro wrote:
On system kernel, the memory region used by crash dump kernel must be specified by "crashkernel=X@Y" boot parameter. reserve_crashkernel() will allocate the region in "System RAM" and reserve it for later use.
On crash dump kernel, memory region information in system kernel is described in a specific region specified by "elfcorehdr=X@Y" boot parameter. reserve_elfcorehdr() will set aside the region to avoid data destruction by the kernel.
Crash dump kernel will access memory regions in system kernel via copy_oldmem_page(), which reads a page by ioremap'ing it assuming that such pages are not part of main memory of crash dump kernel. This is true under non-UEFI environment because kexec-tools modifies a device tree adding "usablemem" attributes to memory sections.
I'm not sure what you mean by "usablemem" here.
I think I explained it in my previous reply.
Do you just mean that the memory nodes are altered such that they only cover memory usable by the crash kernel?
Why not _always_ require a command line argument for the crash kernel that restricts its memory usage to a particular range? That way it doesn't matter whether we're using UEFI or not.
This is one option, but why does uefi ignore all the memory properties?
Under UEFI, however, this is not true because UEFI remove memory sections in a device tree and export all the memory regions, even though they belong to system kernel.
So we should add "mem=X[MG]" boot parameter to limit the memory size and avoid hitting the following assertion in ioremap(): if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr)))) return NULL;
That looks suspicious. What is being ioremapped at that point?
As explained so far, all the memory regions are exposed to crash dump kernel, and it recognizes any pages which should belong to the old kernel also as part of crash kernel's memory. So pfn_valid() returns true.
[...]
@@ -393,6 +398,7 @@ void __init setup_arch(char **cmdline_p) local_async_enable();
efi_init();
arm64_memblock_init(); paging_init();
Nit: unrelated whitespace change.
Ok. Will fix it.
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index ae85da6..ea70d41 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -34,6 +34,8 @@ #include <linux/dma-contiguous.h> #include <linux/efi.h> #include <linux/swiotlb.h> +#include <linux/kexec.h> +#include <linux/crash_dump.h>
Nit: please keep these ordered.
Yeah, but others "linux/*.h" in this file are already in a random order.
[...]
if (memblock_reserve(crash_base, crash_size)) {
pr_warn("crashkernel reservation failed - out of memory\n");
return;
}
If we can remove this memory rather than reserving it, we can limit the first kernel's ability to accidentally clobber the crash kernel, at the expense of having to explicitly map/unmap around loading it.
Do you mean that we should remove mmu mapping of crash kernel memory? Might be a good idea, but it requires modifying kernel/kexec.c.
-Takahiro AKASHI
Mark.
+#ifdef CONFIG_CRASH_DUMP +/*
- reserve_elfcorehdr() - reserves memory for elf core header
- This function reserves memory area given in "elfcorehdr=" kernel command
- line parameter. The memory reserved is used by a dump capture kernel to
- identify the memory used by primary kernel.
- */
Hi AKASHI,
May I know why elfcorehdr need be reserved separately but not locate a memory region in crashkernel reserved region like all other ARCHs? Is there any special reason?
Thanks Baoquan
+static void __init reserve_elfcorehdr(void) +{
- if (!elfcorehdr_size)
return;
- if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
pr_warn("elfcorehdr reservation failed - memory is in use (0x%llx)\n",
elfcorehdr_addr);
return;
- }
- if (memblock_reserve(elfcorehdr_addr, elfcorehdr_size)) {
pr_warn("elfcorehdr reservation failed - out of memory\n");
return;
- }
- pr_info("Reserving %lldKB of memory at %lldMB for elfcorehdr\n",
elfcorehdr_size >> 10, elfcorehdr_addr >> 20);
+} +#endif /* CONFIG_CRASH_DUMP */ /*
- Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
- currently assumes that for memory starting above 4G, 32-bit devices will
@@ -170,6 +247,13 @@ void __init arm64_memblock_init(void) memblock_reserve(__virt_to_phys(initrd_start), initrd_end - initrd_start); #endif +#ifdef CONFIG_KEXEC
- reserve_crashkernel(memory_limit);
+#endif +#ifdef CONFIG_CRASH_DUMP
- reserve_elfcorehdr();
+#endif
- early_init_fdt_scan_reserved_mem();
/* 4GB maximum for 32-bit only capable devices */ -- 1.7.9.5
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Hi Baoquan,
On 04/28/2015 06:19 PM, Baoquan He wrote:
+#ifdef CONFIG_CRASH_DUMP +/*
- reserve_elfcorehdr() - reserves memory for elf core header
- This function reserves memory area given in "elfcorehdr=" kernel command
- line parameter. The memory reserved is used by a dump capture kernel to
- identify the memory used by primary kernel.
- */
Hi AKASHI,
May I know why elfcorehdr need be reserved separately but not locate a memory region in crashkernel reserved region like all other ARCHs? Is there any special reason?
I don't get your point, but arm as well as arm64 locates elfcorehdr in a crash kernel's memory region. See kexec/arch/arm{,64}/crashdump-arm{,64}.c in kexec-tools.
And this region is reserved at boot time *on crash kernel* because we don't want to corrupt it accidentally. (After Mark's comment, we might better remove the mmu mapping for this region, too.)
Make sense?
-Takahiro AKASHI
Thanks Baoquan
+static void __init reserve_elfcorehdr(void) +{
- if (!elfcorehdr_size)
return;
- if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
pr_warn("elfcorehdr reservation failed - memory is in use (0x%llx)\n",
elfcorehdr_addr);
return;
- }
- if (memblock_reserve(elfcorehdr_addr, elfcorehdr_size)) {
pr_warn("elfcorehdr reservation failed - out of memory\n");
return;
- }
- pr_info("Reserving %lldKB of memory at %lldMB for elfcorehdr\n",
elfcorehdr_size >> 10, elfcorehdr_addr >> 20);
+} +#endif /* CONFIG_CRASH_DUMP */ /*
- Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
- currently assumes that for memory starting above 4G, 32-bit devices will
@@ -170,6 +247,13 @@ void __init arm64_memblock_init(void) memblock_reserve(__virt_to_phys(initrd_start), initrd_end - initrd_start); #endif
+#ifdef CONFIG_KEXEC
- reserve_crashkernel(memory_limit);
+#endif +#ifdef CONFIG_CRASH_DUMP
- reserve_elfcorehdr();
+#endif
early_init_fdt_scan_reserved_mem();
/* 4GB maximum for 32-bit only capable devices */
-- 1.7.9.5
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On 05/11/15 at 04:38pm, AKASHI Takahiro wrote:
Hi Baoquan,
On 04/28/2015 06:19 PM, Baoquan He wrote:
+#ifdef CONFIG_CRASH_DUMP +/*
- reserve_elfcorehdr() - reserves memory for elf core header
- This function reserves memory area given in "elfcorehdr=" kernel command
- line parameter. The memory reserved is used by a dump capture kernel to
- identify the memory used by primary kernel.
- */
Hi AKASHI,
May I know why elfcorehdr need be reserved separately but not locate a memory region in crashkernel reserved region like all other ARCHs? Is there any special reason?
I don't get your point, but arm as well as arm64 locates elfcorehdr in a crash kernel's memory region. See kexec/arch/arm{,64}/crashdump-arm{,64}.c in kexec-tools.
And this region is reserved at boot time *on crash kernel* because we don't want to corrupt it accidentally. (After Mark's comment, we might better remove the mmu mapping for this region, too.)
Sorry, I don't make myself clear.
In this patch you reserve a separate memory region in 1st kernel to store elfcorehdr. I am wondering why you don't call add_buffer in kexec-tools directly. Like this you can get a region from reserved crashkernel region. Then you don't need reserve_elfcorehdr() to reserve memory for elfcorehdr specifically. Like other ARCHs do only one memory region is reserved in 1st kernel, that's crashkernel region.
Thanks Baoquan
Make sense?
-Takahiro AKASHI
Thanks Baoquan
+static void __init reserve_elfcorehdr(void) +{
- if (!elfcorehdr_size)
return;
- if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
pr_warn("elfcorehdr reservation failed - memory is in use (0x%llx)\n",
elfcorehdr_addr);
return;
- }
- if (memblock_reserve(elfcorehdr_addr, elfcorehdr_size)) {
pr_warn("elfcorehdr reservation failed - out of memory\n");
return;
- }
- pr_info("Reserving %lldKB of memory at %lldMB for elfcorehdr\n",
elfcorehdr_size >> 10, elfcorehdr_addr >> 20);
+} +#endif /* CONFIG_CRASH_DUMP */ /*
- Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
- currently assumes that for memory starting above 4G, 32-bit devices will
@@ -170,6 +247,13 @@ void __init arm64_memblock_init(void) memblock_reserve(__virt_to_phys(initrd_start), initrd_end - initrd_start); #endif
+#ifdef CONFIG_KEXEC
- reserve_crashkernel(memory_limit);
+#endif +#ifdef CONFIG_CRASH_DUMP
- reserve_elfcorehdr();
+#endif
early_init_fdt_scan_reserved_mem();
/* 4GB maximum for 32-bit only capable devices */
-- 1.7.9.5
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On 05/11/2015 04:54 PM, Baoquan He wrote:
On 05/11/15 at 04:38pm, AKASHI Takahiro wrote:
Hi Baoquan,
On 04/28/2015 06:19 PM, Baoquan He wrote:
+#ifdef CONFIG_CRASH_DUMP +/*
- reserve_elfcorehdr() - reserves memory for elf core header
- This function reserves memory area given in "elfcorehdr=" kernel command
- line parameter. The memory reserved is used by a dump capture kernel to
- identify the memory used by primary kernel.
- */
Hi AKASHI,
May I know why elfcorehdr need be reserved separately but not locate a memory region in crashkernel reserved region like all other ARCHs? Is there any special reason?
I don't get your point, but arm as well as arm64 locates elfcorehdr in a crash kernel's memory region. See kexec/arch/arm{,64}/crashdump-arm{,64}.c in kexec-tools.
And this region is reserved at boot time *on crash kernel* because we don't want to corrupt it accidentally. (After Mark's comment, we might better remove the mmu mapping for this region, too.)
Sorry, I don't make myself clear.
In this patch you reserve a separate memory region in 1st kernel to store elfcorehdr. I am wondering why you don't call add_buffer in kexec-tools directly. Like this you can get a region from reserved crashkernel region. Then you don't need reserve_elfcorehdr() to reserve memory for elfcorehdr specifically. Like other ARCHs do only one memory region is reserved in 1st kernel, that's crashkernel region.
I think that you misunderstand somewhat. * Kexec-tools only locates/identifies a small region for elfcore header within crash kernel's memory region while 1st kernel is running. * the data in elfcore header is filled up by kexec_load system call on 1st kernel. * 1st kernel doesn't reserve any region for elfcore header because the kernel commandline parameters don't contains "elfcorehdr=" parameter, then elfcorehdr_size=0. * Crash dump kernel does reserve the region, as I said, because we don't want to corrupt the info in elfcore header accidentally while crash kernel is running.
Clear?
-Takahiro AKASHI
Thanks Baoquan
Make sense?
-Takahiro AKASHI
Thanks Baoquan
+static void __init reserve_elfcorehdr(void) +{
- if (!elfcorehdr_size)
return;
- if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
pr_warn("elfcorehdr reservation failed - memory is in use (0x%llx)\n",
elfcorehdr_addr);
return;
- }
- if (memblock_reserve(elfcorehdr_addr, elfcorehdr_size)) {
pr_warn("elfcorehdr reservation failed - out of memory\n");
return;
- }
- pr_info("Reserving %lldKB of memory at %lldMB for elfcorehdr\n",
elfcorehdr_size >> 10, elfcorehdr_addr >> 20);
+} +#endif /* CONFIG_CRASH_DUMP */ /*
- Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
- currently assumes that for memory starting above 4G, 32-bit devices will
@@ -170,6 +247,13 @@ void __init arm64_memblock_init(void) memblock_reserve(__virt_to_phys(initrd_start), initrd_end - initrd_start); #endif
+#ifdef CONFIG_KEXEC
- reserve_crashkernel(memory_limit);
+#endif +#ifdef CONFIG_CRASH_DUMP
- reserve_elfcorehdr();
+#endif
early_init_fdt_scan_reserved_mem();
/* 4GB maximum for 32-bit only capable devices */
-- 1.7.9.5
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On 05/11/15 at 05:17pm, AKASHI Takahiro wrote:
On 05/11/2015 04:54 PM, Baoquan He wrote:
In this patch you reserve a separate memory region in 1st kernel to store elfcorehdr. I am wondering why you don't call add_buffer in kexec-tools directly. Like this you can get a region from reserved crashkernel region. Then you don't need reserve_elfcorehdr() to reserve memory for elfcorehdr specifically. Like other ARCHs do only one memory region is reserved in 1st kernel, that's crashkernel region.
I think that you misunderstand somewhat.
- Kexec-tools only locates/identifies a small region for elfcore header within crash kernel's
memory region while 1st kernel is running.
- the data in elfcore header is filled up by kexec_load system call on 1st kernel.
- 1st kernel doesn't reserve any region for elfcore header because the kernel
commandline parameters don't contains "elfcorehdr=" parameter, then elfcorehdr_size=0.
- Crash dump kernel does reserve the region, as I said, because we don't want to
corrupt the info in elfcore header accidentally while crash kernel is running.
Clear?
OK, got it now.
Then I am wondering why "elfcorehdr=" can't be contained in kernel cmdline as other ARCH does. Maybe I need go over all related threads then know why it is. Thanks for explanation.
Thanks Baoquan
On 05/11/2015 06:41 PM, Baoquan He wrote:
On 05/11/15 at 05:17pm, AKASHI Takahiro wrote:
On 05/11/2015 04:54 PM, Baoquan He wrote:
In this patch you reserve a separate memory region in 1st kernel to store elfcorehdr. I am wondering why you don't call add_buffer in kexec-tools directly. Like this you can get a region from reserved crashkernel region. Then you don't need reserve_elfcorehdr() to reserve memory for elfcorehdr specifically. Like other ARCHs do only one memory region is reserved in 1st kernel, that's crashkernel region.
I think that you misunderstand somewhat.
- Kexec-tools only locates/identifies a small region for elfcore header within crash kernel's
memory region while 1st kernel is running.
- the data in elfcore header is filled up by kexec_load system call on 1st kernel.
- 1st kernel doesn't reserve any region for elfcore header because the kernel
commandline parameters don't contains "elfcorehdr=" parameter, then elfcorehdr_size=0.
- Crash dump kernel does reserve the region, as I said, because we don't want to
corrupt the info in elfcore header accidentally while crash kernel is running.
Clear?
OK, got it now.
Then I am wondering why "elfcorehdr=" can't be contained in kernel cmdline as other ARCH does. Maybe I need go over all related threads then know why it is. Thanks for explanation.
Kexec-tools on 1st kernel appends "elfcorehdr=" to the kernel command line, actually chosen/bootargs in a device tree, that is passed to *crash dump kernel*. So when crash dump kernel boots up, it can recognizes that area (and reserves it for later use of managing /proc/vmcore.)
Thanks, -Takahiro AKASHI
Thanks Baoquan
kdump calls machine_crash_shutdown() to shut down non-boot cpus and save per-cpu general-purpose registers before restarting the crash dump kernel. See kernel_kexec(). ipi_cpu_stop() is used and a bit modified to support this behavior.
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org --- arch/arm64/include/asm/kexec.h | 34 ++++++++++++++++++++++- arch/arm64/kernel/machine_kexec.c | 55 ++++++++++++++++++++++++++++++++++++- arch/arm64/kernel/smp.c | 12 ++++++-- 3 files changed, 97 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h index 3530ff5..eaf3fcb 100644 --- a/arch/arm64/include/asm/kexec.h +++ b/arch/arm64/include/asm/kexec.h @@ -30,6 +30,8 @@
#if !defined(__ASSEMBLY__)
+extern bool in_crash_kexec; + /** * crash_setup_regs() - save registers for the panic kernel * @@ -40,7 +42,37 @@ static inline void crash_setup_regs(struct pt_regs *newregs, struct pt_regs *oldregs) { - /* Empty routine needed to avoid build errors. */ + if (oldregs) { + memcpy(newregs, oldregs, sizeof(*newregs)); + } else { + __asm__ __volatile__ ( + "stp x0, x1, [%3]\n\t" + "stp x2, x3, [%3, 0x10]\n\t" + "stp x4, x5, [%3, 0x20]\n\t" + "stp x6, x7, [%3, 0x30]\n\t" + "stp x8, x9, [%3, 0x40]\n\t" + "stp x10, x11, [%3, 0x50]\n\t" + "stp x12, x13, [%3, 0x60]\n\t" + "stp x14, x15, [%3, 0x70]\n\t" + "stp x16, x17, [%3, 0x80]\n\t" + "stp x18, x19, [%3, 0x90]\n\t" + "stp x20, x21, [%3, 0xa0]\n\t" + "stp x22, x23, [%3, 0xb0]\n\t" + "stp x24, x25, [%3, 0xc0]\n\t" + "stp x26, x27, [%3, 0xd0]\n\t" + "stp x28, x29, [%3, 0xe0]\n\t" + "str x30, [%3, 0xf0]\n\t" + "mov %0, sp\n\t" + "adr %1, 1f\n\t" + "mrs %2, spsr_el1\n\t" + "1:" + : "=r" (newregs->sp), + "=r" (newregs->pc), + "=r" (newregs->pstate) + : "r" (&newregs->regs) + : "memory" + ); + } }
#endif /* !defined(__ASSEMBLY__) */ diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index fd6b742..6f7887b 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -9,6 +9,8 @@ * published by the Free Software Foundation. */
+#include <linux/irq.h> +#include <linux/kernel.h> #include <linux/kexec.h> #include <linux/of_fdt.h> #include <linux/slab.h> @@ -24,6 +26,8 @@ extern unsigned long arm64_kexec_dtb_addr; extern unsigned long arm64_kexec_kimage_head; extern unsigned long arm64_kexec_kimage_start;
+bool in_crash_kexec = false; + /** * kexec_is_dtb - Helper routine to check the device tree header signature. */ @@ -178,7 +182,56 @@ void machine_kexec(struct kimage *image) soft_restart(reboot_code_buffer_phys); }
+static void machine_kexec_mask_interrupts(void) +{ + unsigned int i; + struct irq_desc *desc; + + for_each_irq_desc(i, desc) { + struct irq_chip *chip; + + chip = irq_desc_get_chip(desc); + if (!chip) + continue; + + if (chip->irq_eoi && irqd_irq_inprogress(&desc->irq_data)) + chip->irq_eoi(&desc->irq_data); + + if (chip->irq_mask) + chip->irq_mask(&desc->irq_data); + + if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data)) + chip->irq_disable(&desc->irq_data); + } +} + +/** + * machine_crash_shutdown - shutdown non-boot cpus and save registers + */ void machine_crash_shutdown(struct pt_regs *regs) { - /* Empty routine needed to avoid build errors. */ + struct pt_regs dummy_regs; + int cpu; + + local_irq_disable(); + + in_crash_kexec = true; + + /* + * clear and initialize the per-cpu info. This is necessary + * because, otherwise, slots for offline cpus would not be + * filled up. See smp_send_stop(). + */ + memset(&dummy_regs, 0, sizeof(dummy_regs)); + for_each_possible_cpu(cpu) + crash_save_cpu(&dummy_regs, cpu); + + /* shutdown non-boot cpus */ + smp_send_stop(); + + /* for this cpu */ + crash_save_cpu(regs, smp_processor_id()); + machine_kexec_mask_interrupts(); + + pr_info("Loading crashdump kernel...\n"); } diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index ffe8e1b..cf7f361 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -36,6 +36,7 @@ #include <linux/completion.h> #include <linux/of.h> #include <linux/irq_work.h> +#include <linux/kexec.h>
#include <asm/alternative.h> #include <asm/atomic.h> @@ -542,7 +543,7 @@ static DEFINE_RAW_SPINLOCK(stop_lock); /* * ipi_cpu_stop - handle IPI from smp_send_stop() */ -static void ipi_cpu_stop(unsigned int cpu) +static void ipi_cpu_stop(unsigned int cpu, struct pt_regs *regs) { if (system_state == SYSTEM_BOOTING || system_state == SYSTEM_RUNNING) { @@ -556,6 +557,13 @@ static void ipi_cpu_stop(unsigned int cpu)
local_irq_disable();
+#ifdef CONFIG_KEXEC + if (in_crash_kexec) { + crash_save_cpu(regs, cpu); + flush_cache_all(); + } +#endif /* CONFIG_KEXEC */ + while (1) cpu_relax(); } @@ -586,7 +594,7 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
case IPI_CPU_STOP: irq_enter(); - ipi_cpu_stop(cpu); + ipi_cpu_stop(cpu, regs); irq_exit(); break;
On Fri, Apr 24, 2015 at 08:53:05AM +0100, AKASHI Takahiro wrote:
kdump calls machine_crash_shutdown() to shut down non-boot cpus and save per-cpu general-purpose registers before restarting the crash dump kernel. See kernel_kexec(). ipi_cpu_stop() is used and a bit modified to support this behavior.
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org
arch/arm64/include/asm/kexec.h | 34 ++++++++++++++++++++++- arch/arm64/kernel/machine_kexec.c | 55 ++++++++++++++++++++++++++++++++++++- arch/arm64/kernel/smp.c | 12 ++++++-- 3 files changed, 97 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h index 3530ff5..eaf3fcb 100644 --- a/arch/arm64/include/asm/kexec.h +++ b/arch/arm64/include/asm/kexec.h @@ -30,6 +30,8 @@ #if !defined(__ASSEMBLY__) +extern bool in_crash_kexec;
/**
- crash_setup_regs() - save registers for the panic kernel
@@ -40,7 +42,37 @@ static inline void crash_setup_regs(struct pt_regs *newregs, struct pt_regs *oldregs) {
- /* Empty routine needed to avoid build errors. */
- if (oldregs) {
memcpy(newregs, oldregs, sizeof(*newregs));
- } else {
__asm__ __volatile__ (
"stp x0, x1, [%3]\n\t"
Why the tabs?
Please use #16 * N as the offset for consistency with entry.S, with 0 for the first N.
[...]
+static void machine_kexec_mask_interrupts(void) +{
- unsigned int i;
- struct irq_desc *desc;
- for_each_irq_desc(i, desc) {
struct irq_chip *chip;
chip = irq_desc_get_chip(desc);
if (!chip)
continue;
if (chip->irq_eoi && irqd_irq_inprogress(&desc->irq_data))
chip->irq_eoi(&desc->irq_data);
if (chip->irq_mask)
chip->irq_mask(&desc->irq_data);
if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
chip->irq_disable(&desc->irq_data);
- }
+}
I'm surprised that this isn't left to the irqchip driver init code in the crash kernel. For all we know this state could be corrupt anyway.
Is there any reason we can't get the GIC driver to nuke all of this at probe time?
[...]
@@ -542,7 +543,7 @@ static DEFINE_RAW_SPINLOCK(stop_lock); /*
- ipi_cpu_stop - handle IPI from smp_send_stop()
*/ -static void ipi_cpu_stop(unsigned int cpu) +static void ipi_cpu_stop(unsigned int cpu, struct pt_regs *regs) { if (system_state == SYSTEM_BOOTING || system_state == SYSTEM_RUNNING) { @@ -556,6 +557,13 @@ static void ipi_cpu_stop(unsigned int cpu) local_irq_disable(); +#ifdef CONFIG_KEXEC
- if (in_crash_kexec) {
crash_save_cpu(regs, cpu);
flush_cache_all();
Any cache maintenance will need to be by VA; flush_cache_all doesn't do what the name implies, though may appear to work by chance.
Is kdump implemented for ARM? I don't see equivalent for in the arch/arm ipi_cpu_stop.
Mark.
On 24/04/15 11:39, Mark Rutland wrote:
On Fri, Apr 24, 2015 at 08:53:05AM +0100, AKASHI Takahiro wrote:
kdump calls machine_crash_shutdown() to shut down non-boot cpus and save per-cpu general-purpose registers before restarting the crash dump kernel. See kernel_kexec(). ipi_cpu_stop() is used and a bit modified to support this behavior.
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org
arch/arm64/include/asm/kexec.h | 34 ++++++++++++++++++++++- arch/arm64/kernel/machine_kexec.c | 55 ++++++++++++++++++++++++++++++++++++- arch/arm64/kernel/smp.c | 12 ++++++-- 3 files changed, 97 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h index 3530ff5..eaf3fcb 100644 --- a/arch/arm64/include/asm/kexec.h +++ b/arch/arm64/include/asm/kexec.h @@ -30,6 +30,8 @@ #if !defined(__ASSEMBLY__) +extern bool in_crash_kexec;
/**
- crash_setup_regs() - save registers for the panic kernel
@@ -40,7 +42,37 @@ static inline void crash_setup_regs(struct pt_regs *newregs, struct pt_regs *oldregs) {
- /* Empty routine needed to avoid build errors. */
- if (oldregs) {
memcpy(newregs, oldregs, sizeof(*newregs));
- } else {
__asm__ __volatile__ (
"stp x0, x1, [%3]\n\t"
Why the tabs?
Please use #16 * N as the offset for consistency with entry.S, with 0 for the first N.
[...]
+static void machine_kexec_mask_interrupts(void) +{
- unsigned int i;
- struct irq_desc *desc;
- for_each_irq_desc(i, desc) {
struct irq_chip *chip;
chip = irq_desc_get_chip(desc);
if (!chip)
continue;
if (chip->irq_eoi && irqd_irq_inprogress(&desc->irq_data))
chip->irq_eoi(&desc->irq_data);
if (chip->irq_mask)
chip->irq_mask(&desc->irq_data);
if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
chip->irq_disable(&desc->irq_data);
- }
+}
I'm surprised that this isn't left to the irqchip driver init code in the crash kernel. For all we know this state could be corrupt anyway.
Indeed, parsing the irqdesc list is a recipe for disaster. Who knows which locks have been taken or simply corrupted, pointers nuked...
Is there any reason we can't get the GIC driver to nuke all of this at probe time?
This feels like the better option. I can cook a patch or two for that.
Thanks,
M.
Marc, Mark
Sorry for not revisiting your comment below for a while.
On 04/24/2015 07:43 PM, Marc Zyngier wrote:
On 24/04/15 11:39, Mark Rutland wrote:
On Fri, Apr 24, 2015 at 08:53:05AM +0100, AKASHI Takahiro wrote:
kdump calls machine_crash_shutdown() to shut down non-boot cpus and save per-cpu general-purpose registers before restarting the crash dump kernel. See kernel_kexec(). ipi_cpu_stop() is used and a bit modified to support this behavior.
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org
arch/arm64/include/asm/kexec.h | 34 ++++++++++++++++++++++- arch/arm64/kernel/machine_kexec.c | 55 ++++++++++++++++++++++++++++++++++++- arch/arm64/kernel/smp.c | 12 ++++++-- 3 files changed, 97 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h index 3530ff5..eaf3fcb 100644 --- a/arch/arm64/include/asm/kexec.h +++ b/arch/arm64/include/asm/kexec.h @@ -30,6 +30,8 @@
#if !defined(__ASSEMBLY__)
+extern bool in_crash_kexec;
- /**
- crash_setup_regs() - save registers for the panic kernel
@@ -40,7 +42,37 @@ static inline void crash_setup_regs(struct pt_regs *newregs, struct pt_regs *oldregs) {
- /* Empty routine needed to avoid build errors. */
- if (oldregs) {
memcpy(newregs, oldregs, sizeof(*newregs));
- } else {
__asm__ __volatile__ (
"stp x0, x1, [%3]\n\t"
Why the tabs?
Please use #16 * N as the offset for consistency with entry.S, with 0 for the first N.
[...]
+static void machine_kexec_mask_interrupts(void) +{
- unsigned int i;
- struct irq_desc *desc;
- for_each_irq_desc(i, desc) {
struct irq_chip *chip;
chip = irq_desc_get_chip(desc);
if (!chip)
continue;
if (chip->irq_eoi && irqd_irq_inprogress(&desc->irq_data))
chip->irq_eoi(&desc->irq_data);
if (chip->irq_mask)
chip->irq_mask(&desc->irq_data);
if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
chip->irq_disable(&desc->irq_data);
- }
+}
I'm surprised that this isn't left to the irqchip driver init code in the crash kernel. For all we know this state could be corrupt anyway.
Indeed, parsing the irqdesc list is a recipe for disaster. Who knows which locks have been taken or simply corrupted, pointers nuked...
Is there any reason we can't get the GIC driver to nuke all of this at probe time?
Is it just enough to remove kexec_mask_interrupts() and add gic_eoi_irq() at the beginning of gic_cpu_init() in irq-gic.c and irq-gic-v3.c?
This feels like the better option. I can cook a patch or two for that.
If you do, that will be much better :)
BTW, in arm-gic-v3.h, GICD_CTRL_ARE_NS is defined as (1U << 4) but should it be 5? (I'm referring to the page 8-415 in IHI0069A.)
Thanks, -Takahiro AKASHI
Thanks,
M.
Hi,
On 06/08/15 08:09, AKASHI Takahiro wrote:
Marc, Mark
Sorry for not revisiting your comment below for a while.
Wow. It took me a few minutes to page the context back in.
On 04/24/2015 07:43 PM, Marc Zyngier wrote:
On 24/04/15 11:39, Mark Rutland wrote:
On Fri, Apr 24, 2015 at 08:53:05AM +0100, AKASHI Takahiro wrote:
kdump calls machine_crash_shutdown() to shut down non-boot cpus and save per-cpu general-purpose registers before restarting the crash dump kernel. See kernel_kexec(). ipi_cpu_stop() is used and a bit modified to support this behavior.
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org
arch/arm64/include/asm/kexec.h | 34 ++++++++++++++++++++++- arch/arm64/kernel/machine_kexec.c | 55 ++++++++++++++++++++++++++++++++++++- arch/arm64/kernel/smp.c | 12 ++++++-- 3 files changed, 97 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h index 3530ff5..eaf3fcb 100644 --- a/arch/arm64/include/asm/kexec.h +++ b/arch/arm64/include/asm/kexec.h @@ -30,6 +30,8 @@
#if !defined(__ASSEMBLY__)
+extern bool in_crash_kexec;
- /**
- crash_setup_regs() - save registers for the panic kernel
@@ -40,7 +42,37 @@ static inline void crash_setup_regs(struct pt_regs *newregs, struct pt_regs *oldregs) {
- /* Empty routine needed to avoid build errors. */
- if (oldregs) {
memcpy(newregs, oldregs, sizeof(*newregs));
- } else {
__asm__ __volatile__ (
"stp x0, x1, [%3]\n\t"
Why the tabs?
Please use #16 * N as the offset for consistency with entry.S, with 0 for the first N.
[...]
+static void machine_kexec_mask_interrupts(void) +{
- unsigned int i;
- struct irq_desc *desc;
- for_each_irq_desc(i, desc) {
struct irq_chip *chip;
chip = irq_desc_get_chip(desc);
if (!chip)
continue;
if (chip->irq_eoi && irqd_irq_inprogress(&desc->irq_data))
chip->irq_eoi(&desc->irq_data);
if (chip->irq_mask)
chip->irq_mask(&desc->irq_data);
if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
chip->irq_disable(&desc->irq_data);
- }
+}
I'm surprised that this isn't left to the irqchip driver init code in the crash kernel. For all we know this state could be corrupt anyway.
Indeed, parsing the irqdesc list is a recipe for disaster. Who knows which locks have been taken or simply corrupted, pointers nuked...
Is there any reason we can't get the GIC driver to nuke all of this at probe time?
Is it just enough to remove kexec_mask_interrupts() and add gic_eoi_irq() at the beginning of gic_cpu_init() in irq-gic.c and irq-gic-v3.c?
No, doing an EOI is definitely the wrong thing to do. If you do it in the wrong order, you just screw up the GIC state machine. Plus, you have no idea what to write there...
The only real solution is to zero the "active" registers.
This feels like the better option. I can cook a patch or two for that.
If you do, that will be much better :)
OK, I'll prepare something that we can merge at the same time kexec comes back from the dead (if it ever does - I'm not holding my breath).
BTW, in arm-gic-v3.h, GICD_CTRL_ARE_NS is defined as (1U << 4) but should it be 5? (I'm referring to the page 8-415 in IHI0069A.)
No, look at the definition ARE_NS has when the access is non-secure or on a system supporting a single security state. The definition you're referring to is for a secure access (firmware).
Thanks,
M.
Marc,
On 08/07/2015 12:51 AM, Marc Zyngier wrote:
Hi,
On 06/08/15 08:09, AKASHI Takahiro wrote:
Marc, Mark
Sorry for not revisiting your comment below for a while.
Wow. It took me a few minutes to page the context back in.
Please don't purge the page from your cache for a while :)
On 04/24/2015 07:43 PM, Marc Zyngier wrote:
On 24/04/15 11:39, Mark Rutland wrote:
On Fri, Apr 24, 2015 at 08:53:05AM +0100, AKASHI Takahiro wrote:
kdump calls machine_crash_shutdown() to shut down non-boot cpus and save per-cpu general-purpose registers before restarting the crash dump kernel. See kernel_kexec(). ipi_cpu_stop() is used and a bit modified to support this behavior.
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org
arch/arm64/include/asm/kexec.h | 34 ++++++++++++++++++++++- arch/arm64/kernel/machine_kexec.c | 55 ++++++++++++++++++++++++++++++++++++- arch/arm64/kernel/smp.c | 12 ++++++-- 3 files changed, 97 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h index 3530ff5..eaf3fcb 100644 --- a/arch/arm64/include/asm/kexec.h +++ b/arch/arm64/include/asm/kexec.h @@ -30,6 +30,8 @@
#if !defined(__ASSEMBLY__)
+extern bool in_crash_kexec;
- /**
- crash_setup_regs() - save registers for the panic kernel
@@ -40,7 +42,37 @@ static inline void crash_setup_regs(struct pt_regs *newregs, struct pt_regs *oldregs) {
- /* Empty routine needed to avoid build errors. */
- if (oldregs) {
memcpy(newregs, oldregs, sizeof(*newregs));
- } else {
__asm__ __volatile__ (
"stp x0, x1, [%3]\n\t"
Why the tabs?
Please use #16 * N as the offset for consistency with entry.S, with 0 for the first N.
[...]
+static void machine_kexec_mask_interrupts(void) +{
- unsigned int i;
- struct irq_desc *desc;
- for_each_irq_desc(i, desc) {
struct irq_chip *chip;
chip = irq_desc_get_chip(desc);
if (!chip)
continue;
if (chip->irq_eoi && irqd_irq_inprogress(&desc->irq_data))
chip->irq_eoi(&desc->irq_data);
if (chip->irq_mask)
chip->irq_mask(&desc->irq_data);
if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
chip->irq_disable(&desc->irq_data);
- }
+}
I'm surprised that this isn't left to the irqchip driver init code in the crash kernel. For all we know this state could be corrupt anyway.
Indeed, parsing the irqdesc list is a recipe for disaster. Who knows which locks have been taken or simply corrupted, pointers nuked...
Is there any reason we can't get the GIC driver to nuke all of this at probe time?
Is it just enough to remove kexec_mask_interrupts() and add gic_eoi_irq() at the beginning of gic_cpu_init() in irq-gic.c and irq-gic-v3.c?
No, doing an EOI is definitely the wrong thing to do. If you do it in the wrong order, you just screw up the GIC state machine. Plus, you have no idea what to write there...
The only real solution is to zero the "active" registers.
This feels like the better option. I can cook a patch or two for that.
If you do, that will be much better :)
OK, I'll prepare something that we can merge at the same time kexec comes back from the dead (if it ever does - I'm not holding my breath).
Thank you. Please note that the same function, machine_kexec_mask_interrupts(), is already there on arm(/kernel/machine_kexec.c).
Well, kexec/kdump stuff is not dead. Hopefully I and Geoff will submit a full series of patchset in a few weeks although the main logic will be the same.
BTW, in arm-gic-v3.h, GICD_CTRL_ARE_NS is defined as (1U << 4) but should it be 5? (I'm referring to the page 8-415 in IHI0069A.)
No, look at the definition ARE_NS has when the access is non-secure or on a system supporting a single security state. The definition you're referring to is for a secure access (firmware).
Aha, I should remember that arm has "multiple personalities."
-Takahiro AKASHI
Thanks,
M.
On 04/24/2015 07:39 PM, Mark Rutland wrote:
On Fri, Apr 24, 2015 at 08:53:05AM +0100, AKASHI Takahiro wrote:
kdump calls machine_crash_shutdown() to shut down non-boot cpus and save per-cpu general-purpose registers before restarting the crash dump kernel. See kernel_kexec(). ipi_cpu_stop() is used and a bit modified to support this behavior.
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org
arch/arm64/include/asm/kexec.h | 34 ++++++++++++++++++++++- arch/arm64/kernel/machine_kexec.c | 55 ++++++++++++++++++++++++++++++++++++- arch/arm64/kernel/smp.c | 12 ++++++-- 3 files changed, 97 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h index 3530ff5..eaf3fcb 100644 --- a/arch/arm64/include/asm/kexec.h +++ b/arch/arm64/include/asm/kexec.h @@ -30,6 +30,8 @@
#if !defined(__ASSEMBLY__)
+extern bool in_crash_kexec;
- /**
- crash_setup_regs() - save registers for the panic kernel
@@ -40,7 +42,37 @@ static inline void crash_setup_regs(struct pt_regs *newregs, struct pt_regs *oldregs) {
- /* Empty routine needed to avoid build errors. */
- if (oldregs) {
memcpy(newregs, oldregs, sizeof(*newregs));
- } else {
__asm__ __volatile__ (
"stp x0, x1, [%3]\n\t"
Why the tabs?
Don't know. Will remove them.
Please use #16 * N as the offset for consistency with entry.S, with 0 for the first N.
OK.
[...]
+static void machine_kexec_mask_interrupts(void) +{
- unsigned int i;
- struct irq_desc *desc;
- for_each_irq_desc(i, desc) {
struct irq_chip *chip;
chip = irq_desc_get_chip(desc);
if (!chip)
continue;
if (chip->irq_eoi && irqd_irq_inprogress(&desc->irq_data))
chip->irq_eoi(&desc->irq_data);
if (chip->irq_mask)
chip->irq_mask(&desc->irq_data);
if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
chip->irq_disable(&desc->irq_data);
- }
+}
I'm surprised that this isn't left to the irqchip driver init code in the crash kernel. For all we know this state could be corrupt anyway.
Is there any reason we can't get the GIC driver to nuke all of this at probe time?
I don't get the point. You mean that the code be put in probe() of GIC driver? To be honest, this function was copied from arm's implementation.
[...]
@@ -542,7 +543,7 @@ static DEFINE_RAW_SPINLOCK(stop_lock); /*
- ipi_cpu_stop - handle IPI from smp_send_stop()
*/ -static void ipi_cpu_stop(unsigned int cpu) +static void ipi_cpu_stop(unsigned int cpu, struct pt_regs *regs) { if (system_state == SYSTEM_BOOTING || system_state == SYSTEM_RUNNING) { @@ -556,6 +557,13 @@ static void ipi_cpu_stop(unsigned int cpu)
local_irq_disable();
+#ifdef CONFIG_KEXEC
- if (in_crash_kexec) {
crash_save_cpu(regs, cpu);
flush_cache_all();
Any cache maintenance will need to be by VA; flush_cache_all doesn't do what the name implies, though may appear to work by chance.
Is kdump implemented for ARM? I don't see equivalent for in the arch/arm ipi_cpu_stop.
Arm has a dedicated function in arch/arm/kernel/machine_kexec.c: machine_crash_shutdown() => machine_crash_nonpanic_core()
Thanks, -Takahiro AKASHI
Mark.
Mark,
On 05/11/2015 04:10 PM, AKASHI Takahiro wrote:
On 04/24/2015 07:39 PM, Mark Rutland wrote:
On Fri, Apr 24, 2015 at 08:53:05AM +0100, AKASHI Takahiro wrote:
kdump calls machine_crash_shutdown() to shut down non-boot cpus and save per-cpu general-purpose registers before restarting the crash dump kernel. See kernel_kexec(). ipi_cpu_stop() is used and a bit modified to support this behavior.
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org
arch/arm64/include/asm/kexec.h | 34 ++++++++++++++++++++++- arch/arm64/kernel/machine_kexec.c | 55 ++++++++++++++++++++++++++++++++++++- arch/arm64/kernel/smp.c | 12 ++++++-- 3 files changed, 97 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h index 3530ff5..eaf3fcb 100644 --- a/arch/arm64/include/asm/kexec.h +++ b/arch/arm64/include/asm/kexec.h @@ -30,6 +30,8 @@
#if !defined(__ASSEMBLY__)
+extern bool in_crash_kexec;
- /**
- crash_setup_regs() - save registers for the panic kernel
@@ -40,7 +42,37 @@ static inline void crash_setup_regs(struct pt_regs *newregs, struct pt_regs *oldregs) {
- /* Empty routine needed to avoid build errors. */
- if (oldregs) {
memcpy(newregs, oldregs, sizeof(*newregs));
- } else {
__asm__ __volatile__ (
"stp x0, x1, [%3]\n\t"
Why the tabs?
Don't know. Will remove them.
Please use #16 * N as the offset for consistency with entry.S, with 0 for the first N.
OK.
[...]
+static void machine_kexec_mask_interrupts(void) +{
- unsigned int i;
- struct irq_desc *desc;
- for_each_irq_desc(i, desc) {
struct irq_chip *chip;
chip = irq_desc_get_chip(desc);
if (!chip)
continue;
if (chip->irq_eoi && irqd_irq_inprogress(&desc->irq_data))
chip->irq_eoi(&desc->irq_data);
if (chip->irq_mask)
chip->irq_mask(&desc->irq_data);
if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
chip->irq_disable(&desc->irq_data);
- }
+}
I'm surprised that this isn't left to the irqchip driver init code in the crash kernel. For all we know this state could be corrupt anyway.
Is there any reason we can't get the GIC driver to nuke all of this at probe time?
I don't get the point. You mean that the code be put in probe() of GIC driver? To be honest, this function was copied from arm's implementation.
Could you please elaborate a bit more details so that I can address the issue?
Thanks, -Takahiro AKASHI
[...]
@@ -542,7 +543,7 @@ static DEFINE_RAW_SPINLOCK(stop_lock); /*
- ipi_cpu_stop - handle IPI from smp_send_stop()
*/ -static void ipi_cpu_stop(unsigned int cpu) +static void ipi_cpu_stop(unsigned int cpu, struct pt_regs *regs) { if (system_state == SYSTEM_BOOTING || system_state == SYSTEM_RUNNING) { @@ -556,6 +557,13 @@ static void ipi_cpu_stop(unsigned int cpu)
local_irq_disable();
+#ifdef CONFIG_KEXEC
- if (in_crash_kexec) {
crash_save_cpu(regs, cpu);
flush_cache_all();
Any cache maintenance will need to be by VA; flush_cache_all doesn't do what the name implies, though may appear to work by chance.
Is kdump implemented for ARM? I don't see equivalent for in the arch/arm ipi_cpu_stop.
Arm has a dedicated function in arch/arm/kernel/machine_kexec.c: machine_crash_shutdown() => machine_crash_nonpanic_core()
Thanks, -Takahiro AKASHI
Mark.
Unlike normal kexec case, we don't have a chance to reset EL2 context in a generic way because bad exceptions may directly invoke crash_kexec(). (See die().) Kvm is not useful on crash dump kernel anyway, and so we let it un-initialized across rebooting.
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org --- arch/arm64/kernel/process.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index d894d3e..9859f5c 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -48,6 +48,7 @@ #include <asm/compat.h> #include <asm/cacheflush.h> #include <asm/fpsimd.h> +#include <asm/kexec.h> #include <asm/mmu_context.h> #include <asm/processor.h> #include <asm/stacktrace.h> @@ -64,7 +65,11 @@ void soft_restart(unsigned long addr) setup_mm_for_reboot();
cpu_soft_restart(virt_to_phys(cpu_reset), - is_hyp_mode_available(), addr); +#ifdef CONFIG_KEXEC + !in_crash_kexec && +#endif + is_hyp_mode_available(), + addr);
/* Should never get here */ BUG();
Please read the following commits for arm64-specific constraints: arm64: kdump: reserve memory for crash dump kernel arm64: kdump: do not go into EL2 before starting a crash dump kernel
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org --- Documentation/kdump/kdump.txt | 31 ++++++++++++++++++++++++++++++- arch/arm64/Kconfig | 12 ++++++++++++ 2 files changed, 42 insertions(+), 1 deletion(-)
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt index bc4bd5a..f9cf6f5 100644 --- a/Documentation/kdump/kdump.txt +++ b/Documentation/kdump/kdump.txt @@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to a remote system.
Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64, -s390x and arm architectures. +s390x, arm and arm64 architectures.
When the system kernel boots, it reserves a small section of memory for the dump-capture kernel. This ensures that ongoing Direct Memory Access @@ -249,6 +249,29 @@ Dump-capture kernel config options (Arch Dependent, arm)
AUTO_ZRELADDR=y
+Dump-capture kernel config options (Arch Dependent, arm64) +---------------------------------------------------------- + +1) Disable symmetric multi-processing support under "Processor type and + features": + + CONFIG_SMP=n + + (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line + when loading the dump-capture kernel, see section "Load the Dump-capture + Kernel".) + +2) Under uefi, the maximum memory size on the dump-capture kernel must be + limited by specifying: + + mem=X[MG] + + where X should be less than or equal to the size in "crashkernel=" + boot parameter. Kexec-tools will automatically add this. + +3) Currently, kvm will not be initialized on the dump-capture kernel even + if it is configured. + Extended crashkernel syntax ===========================
@@ -312,6 +335,7 @@ Boot into System Kernel any space below the alignment point may be overwritten by the dump-capture kernel, which means it is possible that the vmcore is not that precise as expected.
+ On arm64, use "crashkernel=Y@X".
Load the Dump-capture Kernel ============================ @@ -334,6 +358,8 @@ For s390x: - Use image or bzImage For arm: - Use zImage +For arm64: + - Use vmlinux
If you are using a uncompressed vmlinux image then use following command to load dump-capture kernel. @@ -377,6 +403,9 @@ For s390x: For arm: "1 maxcpus=1 reset_devices"
+For arm64: + "1 maxcpus=1 mem=X[MG] reset_devices" + Notes on loading the dump-capture kernel:
* By default, the ELF headers are stored in ELF64 format to support diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 5716edf..8e2f545 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -562,6 +562,18 @@ config KEXEC but it is independent of the system firmware. And like a reboot you can start any kernel with it, not just Linux.
+config CRASH_DUMP + bool "Build kdump crash kernel" + help + Generate crash dump after being started by kexec. This should + be normally only set in special crash dump kernels which are + loaded in the main kernel with kexec-tools into a specially + reserved region and then later executed after a crash by + kdump/kexec. The crash dump kernel must be compiled to a + memory address not used by the main kernel. + + For more details see Documentation/kdump/kdump.txt + config XEN_DOM0 def_bool y depends on XEN
Hi,
On 04/24/15 at 04:53pm, AKASHI Takahiro wrote:
Please read the following commits for arm64-specific constraints: arm64: kdump: reserve memory for crash dump kernel arm64: kdump: do not go into EL2 before starting a crash dump kernel
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org
Documentation/kdump/kdump.txt | 31 ++++++++++++++++++++++++++++++- arch/arm64/Kconfig | 12 ++++++++++++ 2 files changed, 42 insertions(+), 1 deletion(-)
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt index bc4bd5a..f9cf6f5 100644 --- a/Documentation/kdump/kdump.txt +++ b/Documentation/kdump/kdump.txt @@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to a remote system. Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64, -s390x and arm architectures. +s390x, arm and arm64 architectures. When the system kernel boots, it reserves a small section of memory for the dump-capture kernel. This ensures that ongoing Direct Memory Access @@ -249,6 +249,29 @@ Dump-capture kernel config options (Arch Dependent, arm) AUTO_ZRELADDR=y +Dump-capture kernel config options (Arch Dependent, arm64) +----------------------------------------------------------
+1) Disable symmetric multi-processing support under "Processor type and
- features":
- CONFIG_SMP=n
- (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
- when loading the dump-capture kernel, see section "Load the Dump-capture
- Kernel".)
+2) Under uefi, the maximum memory size on the dump-capture kernel must be
- limited by specifying:
- mem=X[MG]
- where X should be less than or equal to the size in "crashkernel="
- boot parameter. Kexec-tools will automatically add this.
I noticed you are passing mem=X in kexec-tools, Pratyush found a problem that vmcore is corrupted because kdump kernel is using the crash notes memory.
How does kdump kernel know the system ram range especially the mem_start? Only with the size from mem=? parammeter?
In X86 because it use E820 so kexec-tools can pass the memory range to kdump kernel even for UEFI booting. But in arm64 may need find other way to communicate with 2nd kernel like memmap=exactmap in X86..
+3) Currently, kvm will not be initialized on the dump-capture kernel even
- if it is configured.
Extended crashkernel syntax
@@ -312,6 +335,7 @@ Boot into System Kernel any space below the alignment point may be overwritten by the dump-capture kernel, which means it is possible that the vmcore is not that precise as expected.
- On arm64, use "crashkernel=Y@X".
Load the Dump-capture Kernel ============================ @@ -334,6 +358,8 @@ For s390x:
- Use image or bzImage
For arm:
- Use zImage
+For arm64:
- Use vmlinux
If you are using a uncompressed vmlinux image then use following command to load dump-capture kernel. @@ -377,6 +403,9 @@ For s390x: For arm: "1 maxcpus=1 reset_devices" +For arm64:
- "1 maxcpus=1 mem=X[MG] reset_devices"
Notes on loading the dump-capture kernel:
- By default, the ELF headers are stored in ELF64 format to support
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 5716edf..8e2f545 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -562,6 +562,18 @@ config KEXEC but it is independent of the system firmware. And like a reboot you can start any kernel with it, not just Linux. +config CRASH_DUMP
- bool "Build kdump crash kernel"
- help
Generate crash dump after being started by kexec. This should
be normally only set in special crash dump kernels which are
loaded in the main kernel with kexec-tools into a specially
reserved region and then later executed after a crash by
kdump/kexec. The crash dump kernel must be compiled to a
memory address not used by the main kernel.
For more details see Documentation/kdump/kdump.txt
config XEN_DOM0 def_bool y depends on XEN -- 1.7.9.5
kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
On 05/08/15 at 08:19pm, Dave Young wrote:
Hi,
On 04/24/15 at 04:53pm, AKASHI Takahiro wrote:
Please read the following commits for arm64-specific constraints: arm64: kdump: reserve memory for crash dump kernel arm64: kdump: do not go into EL2 before starting a crash dump kernel
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org
Documentation/kdump/kdump.txt | 31 ++++++++++++++++++++++++++++++- arch/arm64/Kconfig | 12 ++++++++++++ 2 files changed, 42 insertions(+), 1 deletion(-)
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt index bc4bd5a..f9cf6f5 100644 --- a/Documentation/kdump/kdump.txt +++ b/Documentation/kdump/kdump.txt @@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to a remote system. Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64, -s390x and arm architectures. +s390x, arm and arm64 architectures. When the system kernel boots, it reserves a small section of memory for the dump-capture kernel. This ensures that ongoing Direct Memory Access @@ -249,6 +249,29 @@ Dump-capture kernel config options (Arch Dependent, arm) AUTO_ZRELADDR=y +Dump-capture kernel config options (Arch Dependent, arm64) +----------------------------------------------------------
+1) Disable symmetric multi-processing support under "Processor type and
- features":
- CONFIG_SMP=n
- (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
- when loading the dump-capture kernel, see section "Load the Dump-capture
- Kernel".)
+2) Under uefi, the maximum memory size on the dump-capture kernel must be
- limited by specifying:
- mem=X[MG]
- where X should be less than or equal to the size in "crashkernel="
- boot parameter. Kexec-tools will automatically add this.
I noticed you are passing mem=X in kexec-tools, Pratyush found a problem that vmcore is corrupted because kdump kernel is using the crash notes memory.
How does kdump kernel know the system ram range especially the mem_start? Only with the size from mem=? parammeter?
In X86 because it use E820 so kexec-tools can pass the memory range to kdump kernel even for UEFI booting. But in arm64 may need find other way to communicate with 2nd kernel like memmap=exactmap in X86..
Discussed with Pratyush in irc, he mentioned that start address is get from kernel->entry which was passed by kexec-tools.
So sorry for the noise, it will works fine with the UEFI mem= limitation fix from Mark Salter
+3) Currently, kvm will not be initialized on the dump-capture kernel even
- if it is configured.
Extended crashkernel syntax
@@ -312,6 +335,7 @@ Boot into System Kernel any space below the alignment point may be overwritten by the dump-capture kernel, which means it is possible that the vmcore is not that precise as expected.
- On arm64, use "crashkernel=Y@X".
Load the Dump-capture Kernel ============================ @@ -334,6 +358,8 @@ For s390x:
- Use image or bzImage
For arm:
- Use zImage
+For arm64:
- Use vmlinux
If you are using a uncompressed vmlinux image then use following command to load dump-capture kernel. @@ -377,6 +403,9 @@ For s390x: For arm: "1 maxcpus=1 reset_devices" +For arm64:
- "1 maxcpus=1 mem=X[MG] reset_devices"
Notes on loading the dump-capture kernel:
- By default, the ELF headers are stored in ELF64 format to support
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 5716edf..8e2f545 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -562,6 +562,18 @@ config KEXEC but it is independent of the system firmware. And like a reboot you can start any kernel with it, not just Linux. +config CRASH_DUMP
- bool "Build kdump crash kernel"
- help
Generate crash dump after being started by kexec. This should
be normally only set in special crash dump kernels which are
loaded in the main kernel with kexec-tools into a specially
reserved region and then later executed after a crash by
kdump/kexec. The crash dump kernel must be compiled to a
memory address not used by the main kernel.
For more details see Documentation/kdump/kdump.txt
config XEN_DOM0 def_bool y depends on XEN -- 1.7.9.5
kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Dave,
On 05/11/2015 04:47 PM, Dave Young wrote:
On 05/08/15 at 08:19pm, Dave Young wrote:
Hi,
On 04/24/15 at 04:53pm, AKASHI Takahiro wrote:
Please read the following commits for arm64-specific constraints: arm64: kdump: reserve memory for crash dump kernel arm64: kdump: do not go into EL2 before starting a crash dump kernel
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org
Documentation/kdump/kdump.txt | 31 ++++++++++++++++++++++++++++++- arch/arm64/Kconfig | 12 ++++++++++++ 2 files changed, 42 insertions(+), 1 deletion(-)
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt index bc4bd5a..f9cf6f5 100644 --- a/Documentation/kdump/kdump.txt +++ b/Documentation/kdump/kdump.txt @@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to a remote system.
Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64, -s390x and arm architectures. +s390x, arm and arm64 architectures.
When the system kernel boots, it reserves a small section of memory for the dump-capture kernel. This ensures that ongoing Direct Memory Access @@ -249,6 +249,29 @@ Dump-capture kernel config options (Arch Dependent, arm)
AUTO_ZRELADDR=y
+Dump-capture kernel config options (Arch Dependent, arm64) +----------------------------------------------------------
+1) Disable symmetric multi-processing support under "Processor type and
- features":
- CONFIG_SMP=n
- (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
- when loading the dump-capture kernel, see section "Load the Dump-capture
- Kernel".)
+2) Under uefi, the maximum memory size on the dump-capture kernel must be
- limited by specifying:
- mem=X[MG]
- where X should be less than or equal to the size in "crashkernel="
- boot parameter. Kexec-tools will automatically add this.
I noticed you are passing mem=X in kexec-tools, Pratyush found a problem that vmcore is corrupted because kdump kernel is using the crash notes memory.
How does kdump kernel know the system ram range especially the mem_start? Only with the size from mem=? parammeter?
In X86 because it use E820 so kexec-tools can pass the memory range to kdump kernel even for UEFI booting. But in arm64 may need find other way to communicate with 2nd kernel like memmap=exactmap in X86..
Discussed with Pratyush in irc, he mentioned that start address is get from kernel->entry which was passed by kexec-tools.
So sorry for the noise, it will works fine with the UEFI mem= limitation fix from Mark Salter
No, not at all. As you know, in another thread (with Mark), we're discussing this. If we conclude that we should go for "memmap=" approach, I will update my patch.
Thanks, -Takahiro AKASHI
+3) Currently, kvm will not be initialized on the dump-capture kernel even
- if it is configured.
Extended crashkernel syntax
@@ -312,6 +335,7 @@ Boot into System Kernel any space below the alignment point may be overwritten by the dump-capture kernel, which means it is possible that the vmcore is not that precise as expected.
On arm64, use "crashkernel=Y@X".
Load the Dump-capture Kernel
@@ -334,6 +358,8 @@ For s390x:
- Use image or bzImage
For arm:
- Use zImage
+For arm64:
- Use vmlinux
If you are using a uncompressed vmlinux image then use following command to load dump-capture kernel.
@@ -377,6 +403,9 @@ For s390x: For arm: "1 maxcpus=1 reset_devices"
+For arm64:
"1 maxcpus=1 mem=X[MG] reset_devices"
Notes on loading the dump-capture kernel:
- By default, the ELF headers are stored in ELF64 format to support
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 5716edf..8e2f545 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -562,6 +562,18 @@ config KEXEC but it is independent of the system firmware. And like a reboot you can start any kernel with it, not just Linux.
+config CRASH_DUMP
- bool "Build kdump crash kernel"
- help
Generate crash dump after being started by kexec. This should
be normally only set in special crash dump kernels which are
loaded in the main kernel with kexec-tools into a specially
reserved region and then later executed after a crash by
kdump/kexec. The crash dump kernel must be compiled to a
memory address not used by the main kernel.
For more details see Documentation/kdump/kdump.txt
- config XEN_DOM0 def_bool y depends on XEN
-- 1.7.9.5
kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
On 05/11/15 at 04:58pm, AKASHI Takahiro wrote:
Dave,
On 05/11/2015 04:47 PM, Dave Young wrote:
On 05/08/15 at 08:19pm, Dave Young wrote:
Hi,
On 04/24/15 at 04:53pm, AKASHI Takahiro wrote:
Please read the following commits for arm64-specific constraints: arm64: kdump: reserve memory for crash dump kernel arm64: kdump: do not go into EL2 before starting a crash dump kernel
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org
Documentation/kdump/kdump.txt | 31 ++++++++++++++++++++++++++++++- arch/arm64/Kconfig | 12 ++++++++++++ 2 files changed, 42 insertions(+), 1 deletion(-)
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt index bc4bd5a..f9cf6f5 100644 --- a/Documentation/kdump/kdump.txt +++ b/Documentation/kdump/kdump.txt @@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to a remote system.
Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64, -s390x and arm architectures. +s390x, arm and arm64 architectures.
When the system kernel boots, it reserves a small section of memory for the dump-capture kernel. This ensures that ongoing Direct Memory Access @@ -249,6 +249,29 @@ Dump-capture kernel config options (Arch Dependent, arm)
AUTO_ZRELADDR=y
+Dump-capture kernel config options (Arch Dependent, arm64) +----------------------------------------------------------
+1) Disable symmetric multi-processing support under "Processor type and
- features":
- CONFIG_SMP=n
- (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
- when loading the dump-capture kernel, see section "Load the Dump-capture
- Kernel".)
+2) Under uefi, the maximum memory size on the dump-capture kernel must be
- limited by specifying:
- mem=X[MG]
- where X should be less than or equal to the size in "crashkernel="
- boot parameter. Kexec-tools will automatically add this.
I noticed you are passing mem=X in kexec-tools, Pratyush found a problem that vmcore is corrupted because kdump kernel is using the crash notes memory.
How does kdump kernel know the system ram range especially the mem_start? Only with the size from mem=? parammeter?
In X86 because it use E820 so kexec-tools can pass the memory range to kdump kernel even for UEFI booting. But in arm64 may need find other way to communicate with 2nd kernel like memmap=exactmap in X86..
Discussed with Pratyush in irc, he mentioned that start address is get from kernel->entry which was passed by kexec-tools.
So sorry for the noise, it will works fine with the UEFI mem= limitation fix from Mark Salter
No, not at all. As you know, in another thread (with Mark), we're discussing this. If we conclude that we should go for "memmap=" approach, I will update my patch.
Ok, good to know. I did not notice the thread, will take a look.
There's another problem for memmap= approach though. In x86 we have problems when we need pass reserved memory ranges as well because of some pci mmconfig issue. In that case kernel command line buffer is too small and there's too many memory ranges for reserved ranges. Thus later we switched to pass e820 in x86 boot params directly.
If there's same potential problems we may consider to use memmap= only for system ram, and for any other reserved types we can still get from same dtb in 1st kernel.
Thanks Dave
Signed-off-by: AKASHI Takahiro takahiro.akashi@linaro.org --- arch/arm64/configs/defconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig index 583cacb..7b3fa3a 100644 --- a/arch/arm64/configs/defconfig +++ b/arch/arm64/configs/defconfig @@ -49,6 +49,7 @@ CONFIG_KSM=y CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_CMA=y CONFIG_KEXEC=y +CONFIG_CRASH_DUMP=y CONFIG_CMDLINE="console=ttyAMA0" # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set CONFIG_COMPAT=y
Hi,
On Fri, Apr 24, 2015 at 08:53:03AM +0100, AKASHI Takahiro wrote:
This patch set enables kdump (crash dump kernel) support on arm64 on top of Geoff's kexec patchset.
In this version, there are some arm64-specific usage/constraints:
- "mem=" boot parameter must be specified on crash dump kernel if the system starts on uefi.
This sounds very painful. Why is this the case, and how do x86 and/or ia64 get around that?
- Kvm will not be enabled on crash dump kernel even if configured
See commit messages and Documentation/kdump/kdump.txt for details.
The only concern I have is whether or not we can use the exact same kernel as both system kernel and crash dump kernel. The current arm64 kernel is not relocatable in the exact sense but I see no problems in using the same binary when testing kdump.
Ard has been working on decoupling the kernel text/data, FDT, and linear memory mappings, which would allow the kernel to be loaded anywhere and still be able to access all of memory [3]. I assume that's what you mean by "relocatable"?
I tested the code with
- ATF v1.1 + EDK2(UEFI) v3.0-rc0
- kernel v4.0 + Geoff' kexec v9
on
- Base fast model, and
- MediaTek MT8173-EVB
using my own kexec-tools [1], currently v0.12.
You may want to start a kernel with the following boot parameter: crashkernel=64M (or so, on model) and try $ kexec -p --load <vmlinux> --append ... $ echo c > /proc/sysrq-trigger
To examine vmcore (/proc/vmcore), you should use
- gdb v7.7 or later
- crash + a small patch (to recognize v4.0 kernel)
Changes from v1:
- rebased to Geoff's v9
- tested this patchset on real hardware and fixed bugs:
- added cache flush operation in ipi_cpu_stop() when shutting down the system. Otherwise, data saved in vmcore's note sections by crash_save_cpu() might not be flushed to dumped memory and crash command fail to fetch correct data.
We'll need to give this a go on something with a system cache too (e.g. Seattle or X-Gene). Even if that's only UP it would give me much greater confidence in the cache maintenance.
I will address Mark's commit[2] after Geoff takes care of it on kexec.
- modified to use ioremap_cache() instead of ioremap() when reading crash memory. Otherwise, accessing /proc/vmcore on crash dump kernel might cause an alignment fault.
- allows reserve_crashkernel() to handle "crashkernel=xyz[MG]" correctly, thanks to Pratyush Anand. And it now also enforces memory limit.
I worry that there could be potentially bad interaction between this and Ard's patches, depending on how the memory area to use is chosen. It is probably fine, but we should make sure that it is.
- moved reserve_crashkernel() and reserve_elfcorehdr() to arm64_memblock_init() to clarify that they should be called before dma_contignuous_reserve().
[1] https://git.linaro.org/people/takahiro.akashi/kexec-tools.git [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-April/338171.html
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-April/337596.html
Thanks, Mark.
Hi
Sorry for late response. I was on vacation.
On 04/24/2015 06:53 PM, Mark Rutland wrote:
Hi,
On Fri, Apr 24, 2015 at 08:53:03AM +0100, AKASHI Takahiro wrote:
This patch set enables kdump (crash dump kernel) support on arm64 on top of Geoff's kexec patchset.
In this version, there are some arm64-specific usage/constraints:
- "mem=" boot parameter must be specified on crash dump kernel if the system starts on uefi.
This sounds very painful. Why is this the case, and how do x86 and/or ia64 get around that?
As Dave (Young) said, x86 uses "memmap=XX" kernel commandline parameters to specify usable memory for crash dump kernel. On my arm64 implementation, "linux,usable-memory" property is added to device tree blob by kexec-tools for this purpose. This is because, when I first implemented kdump on arm64, ppc is the only architecture that supports kdump AND utilizes device trees. Since kexec-tools as well as the kernel already has this framework, I believed that device-tree approach was smarter than a commandline parameter.
However, uefi-based kernel ignores all the memory-related properties in a device tree and so this "mem=" workaround was added.
- Kvm will not be enabled on crash dump kernel even if configured
See commit messages and Documentation/kdump/kdump.txt for details.
The only concern I have is whether or not we can use the exact same kernel as both system kernel and crash dump kernel. The current arm64 kernel is not relocatable in the exact sense but I see no problems in using the same binary when testing kdump.
Ard has been working on decoupling the kernel text/data, FDT, and linear memory mappings, which would allow the kernel to be loaded anywhere and still be able to access all of memory [3]. I assume that's what you mean by "relocatable"?
I'm still trying to understand Ard's patchset, but I think yes.
I tested the code with
- ATF v1.1 + EDK2(UEFI) v3.0-rc0
- kernel v4.0 + Geoff' kexec v9
on
- Base fast model, and
- MediaTek MT8173-EVB
using my own kexec-tools [1], currently v0.12.
You may want to start a kernel with the following boot parameter: crashkernel=64M (or so, on model) and try $ kexec -p --load <vmlinux> --append ... $ echo c > /proc/sysrq-trigger
To examine vmcore (/proc/vmcore), you should use
- gdb v7.7 or later
- crash + a small patch (to recognize v4.0 kernel)
Changes from v1:
- rebased to Geoff's v9
- tested this patchset on real hardware and fixed bugs:
- added cache flush operation in ipi_cpu_stop() when shutting down the system. Otherwise, data saved in vmcore's note sections by crash_save_cpu() might not be flushed to dumped memory and crash command fail to fetch correct data.
We'll need to give this a go on something with a system cache too (e.g. Seattle or X-Gene). Even if that's only UP it would give me much greater confidence in the cache maintenance.
Is there any genric interface to do so?
I will address Mark's commit[2] after Geoff takes care of it on kexec.
- modified to use ioremap_cache() instead of ioremap() when reading crash memory. Otherwise, accessing /proc/vmcore on crash dump kernel might cause an alignment fault.
- allows reserve_crashkernel() to handle "crashkernel=xyz[MG]" correctly, thanks to Pratyush Anand. And it now also enforces memory limit.
I worry that there could be potentially bad interaction between this and Ard's patches, depending on how the memory area to use is chosen. It is probably fine, but we should make sure that it is.
I'm not sure what the point is, but I will try to check it.
Thanks, -Takahiro AKASHI
- moved reserve_crashkernel() and reserve_elfcorehdr() to arm64_memblock_init() to clarify that they should be called before dma_contignuous_reserve().
[1] https://git.linaro.org/people/takahiro.akashi/kexec-tools.git [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-April/338171.html
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-April/337596.html
Thanks, Mark.
On 05/11/15 at 03:16pm, AKASHI Takahiro wrote:
Hi
Sorry for late response. I was on vacation.
On 04/24/2015 06:53 PM, Mark Rutland wrote:
Hi,
On Fri, Apr 24, 2015 at 08:53:03AM +0100, AKASHI Takahiro wrote:
This patch set enables kdump (crash dump kernel) support on arm64 on top of Geoff's kexec patchset.
In this version, there are some arm64-specific usage/constraints:
- "mem=" boot parameter must be specified on crash dump kernel if the system starts on uefi.
This sounds very painful. Why is this the case, and how do x86 and/or ia64 get around that?
As Dave (Young) said, x86 uses "memmap=XX" kernel commandline parameters to specify usable memory for crash dump kernel.
Originally x86 use memmap=exactmap memmap=XX to specify each section of memories for 2nd kernel. But later because a lot of reserved type ranges need to be passed ie. for pci mmconfig, and kernel cmdline buffer is limited so kexec-tools later switch to passing these in x86 boot params as E820 memory ranges directly.
On my arm64 implementation, "linux,usable-memory" property is added to device tree blob by kexec-tools for this purpose. This is because, when I first implemented kdump on arm64, ppc is the only architecture that supports kdump AND utilizes device trees. Since kexec-tools as well as the kernel already has this framework, I believed that device-tree approach was smarter than a commandline parameter.
However, uefi-based kernel ignores all the memory-related properties in a device tree and so this "mem=" workaround was added.
Kdump kernel reuses the memmap info getting from firmware during 1st kernel boot, I do not think the memmap info can be cooked for crash kernel usable memory. But it might be a better way to use a special fdt node for crash kernel memory even for UEFI..
Another way is introducing a similar memmap=, but maybe consider only system_ram type ranges. For other memory areas still use UEFI memmap.
Thanks Dave
Mark,
On 05/12/2015 02:43 PM, Dave Young wrote:
On 05/11/15 at 03:16pm, AKASHI Takahiro wrote:
Hi
Sorry for late response. I was on vacation.
On 04/24/2015 06:53 PM, Mark Rutland wrote:
Hi,
On Fri, Apr 24, 2015 at 08:53:03AM +0100, AKASHI Takahiro wrote:
This patch set enables kdump (crash dump kernel) support on arm64 on top of Geoff's kexec patchset.
In this version, there are some arm64-specific usage/constraints:
- "mem=" boot parameter must be specified on crash dump kernel if the system starts on uefi.
This sounds very painful. Why is this the case, and how do x86 and/or ia64 get around that?
As Dave (Young) said, x86 uses "memmap=XX" kernel commandline parameters to specify usable memory for crash dump kernel.
Originally x86 use memmap=exactmap memmap=XX to specify each section of memories for 2nd kernel. But later because a lot of reserved type ranges need to be passed ie. for pci mmconfig, and kernel cmdline buffer is limited so kexec-tools later switch to passing these in x86 boot params as E820 memory ranges directly.
On my arm64 implementation, "linux,usable-memory" property is added to device tree blob by kexec-tools for this purpose. This is because, when I first implemented kdump on arm64, ppc is the only architecture that supports kdump AND utilizes device trees. Since kexec-tools as well as the kernel already has this framework, I believed that device-tree approach was smarter than a commandline parameter.
However, uefi-based kernel ignores all the memory-related properties in a device tree and so this "mem=" workaround
Kdump kernel reuses the memmap info getting from firmware during 1st kernel boot, I do not think the memmap info can be cooked for crash kernel usable memory. But it might be a better way to use a special fdt node for crash kernel memory even for UEFI..
Do you still prefer "memmap=" approach?
Just FYI,
x86 arm arm64 powerpc{,64}
(a) cmdline(crashkernel=) cmdline cmdline cmdline (b) iomem iomem iomem dev tree (c) cmdline(memmap=) cmdline(mem=) dev tree dev tree & cmdline(mem=)
(a) how we specify/reserve crash dump kernel's memory on 1st kernel (b) how we inform userspace tool of crash dump kernel's memory on 1st kernel - iomem: add "Crash kernel" entry in /proc/iomem - dev tree: add /chosen/linux,crashkernel-base (and crashkernel-size) (c) how we specify usable memory on crash dump kernel - dev tree: add /chosen/memory/linux,usable-memory
As you see, arm64, and arm as well, does things in a *hybrid* way of x86 and ppc. I think we should go for a uniform way with device tree as ppc does. (Even for (a), we can use a device tree.)
Another way is introducing a similar memmap=, but maybe consider only system_ram type ranges. For other memory areas still use UEFI memmap.
Yeah, I don't know whether we need additional formats of memmap=, like memmap=XX$YY or memmap=XX#YY, for ACPI stuffs.
Thanks, -Takahiro AKASHI
Thanks Dave
linaro-kernel@lists.linaro.org