Introducing a chosen node, rng-seed, which is an entropy that can be passed to kernel called very early to increase initial device randomness. Bootloader should provide this entropy and the value is read from /chosen/rng-seed in DT.
Signed-off-by: Hsin-Yi Wang hsinyi@chromium.org --- change log: v1->v2: * call function in early_init_dt_scan_chosen * will add doc to devicetree-org/dt-schema on github if this is accepted --- Documentation/devicetree/bindings/chosen.txt | 14 ++++++++++++++ drivers/of/fdt.c | 11 +++++++++++ 2 files changed, 25 insertions(+)
diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt index 45e79172a646..fef5c82672dc 100644 --- a/Documentation/devicetree/bindings/chosen.txt +++ b/Documentation/devicetree/bindings/chosen.txt @@ -28,6 +28,20 @@ mode) when EFI_RNG_PROTOCOL is supported, it will be overwritten by the Linux EFI stub (which will populate the property itself, using EFI_RNG_PROTOCOL).
+rng-seed +----------- + +This property served as an entropy to add device randomness. It is parsed +as a byte array, e.g. + +/ { + chosen { + rng-seed = <0x31 0x95 0x1b 0x3c 0xc9 0xfa 0xb3 ...>; + }; +}; + +This random value should be provided by bootloader. + stdout-path -----------
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index de893c9616a1..96ea5eba9dd5 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -24,6 +24,7 @@ #include <linux/debugfs.h> #include <linux/serial_core.h> #include <linux/sysfs.h> +#include <linux/random.h>
#include <asm/setup.h> /* for COMMAND_LINE_SIZE */ #include <asm/page.h> @@ -1079,6 +1080,7 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname, { int l; const char *p; + const void *rng_seed;
pr_debug("search "chosen", depth: %d, uname: %s\n", depth, uname);
@@ -1113,6 +1115,15 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
pr_debug("Command line is: %s\n", (char*)data);
+ rng_seed = of_get_flat_dt_prop(node, "rng-seed", &l); + if (!rng_seed || l == 0) + return 1; + + /* try to clear seed so it won't be found. */ + fdt_nop_property(initial_boot_params, node, "rng-seed"); + + add_device_randomness(rng_seed, l); + /* break now */ return 1; }
Currently in arm64, FDT is mapped to RO before it's passed to early_init_dt_scan(). However, there might be some code that needs to modify FDT during init. Map FDT to RW until unflatten DT.
Signed-off-by: Hsin-Yi Wang hsinyi@chromium.org --- arch/arm64/kernel/setup.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 413d566405d1..08b22c1e72a9 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -179,9 +179,13 @@ static void __init smp_build_mpidr_hash(void) pr_warn("Large number of MPIDR hash buckets detected\n"); }
+extern void *__init __fixmap_remap_fdt(phys_addr_t dt_phys, int *size, + pgprot_t prot); + static void __init setup_machine_fdt(phys_addr_t dt_phys) { - void *dt_virt = fixmap_remap_fdt(dt_phys); + int size; + void *dt_virt = __fixmap_remap_fdt(dt_phys, &size, PAGE_KERNEL); const char *name;
if (!dt_virt || !early_init_dt_scan(dt_virt)) { @@ -320,6 +324,9 @@ void __init setup_arch(char **cmdline_p) /* Parse the ACPI tables for possible boot-time configuration */ acpi_boot_table_init();
+ /* remap fdt to RO */ + fixmap_remap_fdt(__fdt_pointer); + if (acpi_disabled) unflatten_device_tree();
On Mon, May 13, 2019 at 08:38:19AM +0800, Hsin-Yi Wang wrote:
Currently in arm64, FDT is mapped to RO before it's passed to early_init_dt_scan(). However, there might be some code that needs to modify FDT during init. Map FDT to RW until unflatten DT.
Signed-off-by: Hsin-Yi Wang hsinyi@chromium.org
arch/arm64/kernel/setup.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 413d566405d1..08b22c1e72a9 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -179,9 +179,13 @@ static void __init smp_build_mpidr_hash(void) pr_warn("Large number of MPIDR hash buckets detected\n"); } +extern void *__init __fixmap_remap_fdt(phys_addr_t dt_phys, int *size,
pgprot_t prot);
static void __init setup_machine_fdt(phys_addr_t dt_phys) {
- void *dt_virt = fixmap_remap_fdt(dt_phys);
- int size;
- void *dt_virt = __fixmap_remap_fdt(dt_phys, &size, PAGE_KERNEL); const char *name;
This makes the fdt mapped without the call to meblock_reserve(fdt) which makes the fdt memory available for memblock allocations.
Chances that is will be actually allocated are small, but you know, things happen.
IMHO, instead of calling directly __fixmap_remap_fdt() it would be better to add pgprot parameter to fixmap_remap_fdt(). Then here and in kaslr.c it can be called with PAGE_KERNEL and below with PAGE_KERNEL_RO.
There is no problem to call memblock_reserve() for the same area twice, it's essentially a NOP.
if (!dt_virt || !early_init_dt_scan(dt_virt)) { @@ -320,6 +324,9 @@ void __init setup_arch(char **cmdline_p) /* Parse the ACPI tables for possible boot-time configuration */ acpi_boot_table_init();
- /* remap fdt to RO */
- fixmap_remap_fdt(__fdt_pointer);
- if (acpi_disabled) unflatten_device_tree();
2.20.1
On Mon, May 13, 2019 at 4:59 PM Mike Rapoport rppt@linux.ibm.com wrote:
This makes the fdt mapped without the call to meblock_reserve(fdt) which makes the fdt memory available for memblock allocations.
Chances that is will be actually allocated are small, but you know, things happen.
IMHO, instead of calling directly __fixmap_remap_fdt() it would be better to add pgprot parameter to fixmap_remap_fdt(). Then here and in kaslr.c it can be called with PAGE_KERNEL and below with PAGE_KERNEL_RO.
There is no problem to call memblock_reserve() for the same area twice, it's essentially a NOP.
Thanks for the suggestion. Will update fixmap_remap_fdt() in next patch.
However, I tested on some arm64 platform, if we also call memblock_reserve() in kaslr.c, would cause warning[1] when memblock_reserve() is called again in setup_machine_fdt(). The warning comes from https://elixir.bootlin.com/linux/latest/source/mm/memblock.c#L601 ``` if (type->regions[0].size == 0) { WARN_ON(type->cnt != 1 || type->total_size); ... ```
Call memblock_reserve() multiple times after setup_machine_fdt() doesn't have such warning though.
I didn't trace the real reason causing this. But in this case, maybe don't call memblock_reserve() in kaslr?
[1] [ 0.000000] WARNING: CPU: 0 PID: 0 at /mnt/host/source/src/third_party/kernel/v4.19/mm/memblock.c:583 memblock_add_range+0x1bc/0x1c8 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.38 #125 [ 0.000000] pstate: 600001c5 (nZCv dAIF -PAN -UAO) [ 0.000000] pc : memblock_add_range+0x1bc/0x1c8 [ 0.000000] lr : memblock_add_range+0x30/0x1c8 [ 0.000000] sp : ffffff9b5e203e80 [ 0.000000] x29: ffffff9b5e203ed0 x28: 0000000040959324 [ 0.000000] x27: 0000000040080000 x26: 0000000000080000 [ 0.000000] x25: 0000000080127e4b x24: 0000000000000000 [ 0.000000] x23: 0000001b55000000 x22: 000000000001152b [ 0.000000] x21: 000000005f800000 x20: 0000000000000000 [ 0.000000] x19: ffffff9b5e24bf00 x18: 00000000ffffffb8 [ 0.000000] x17: 000000000000003c x16: ffffffbefea00000 [ 0.000000] x15: ffffffbefea00000 x14: ffffff9b5e3c17d8 [ 0.000000] x13: 00e8000000000713 x12: 0000000000000000 [ 0.000000] x11: ffffffbefea00000 x10: 00e800005f800710 [ 0.000000] x9 : 000000000001152b x8 : ffffff9b5e365690 [ 0.000000] x7 : 6f20646573616228 x6 : 0000000000000002 [ 0.000000] x5 : 0000000000000000 x4 : 0000000000000000 [ 0.000000] x3 : 0000000000200000 x2 : 000000000001152b [ 0.000000] x1 : 000000005f800000 x0 : ffffff9b5e24bf00 [ 0.000000] Call trace: [ 0.000000] memblock_add_range+0x1bc/0x1c8 [ 0.000000] memblock_reserve+0x60/0xac [ 0.000000] fixmap_remap_fdt+0x4c/0x78 [ 0.000000] setup_machine_fdt+0x64/0xfc [ 0.000000] setup_arch+0x68/0x1e0 [ 0.000000] start_kernel+0x68/0x380
On Mon, May 13, 2019 at 07:14:32PM +0800, Hsin-Yi Wang wrote:
On Mon, May 13, 2019 at 4:59 PM Mike Rapoport rppt@linux.ibm.com wrote:
This makes the fdt mapped without the call to meblock_reserve(fdt) which makes the fdt memory available for memblock allocations.
Chances that is will be actually allocated are small, but you know, things happen.
IMHO, instead of calling directly __fixmap_remap_fdt() it would be better to add pgprot parameter to fixmap_remap_fdt(). Then here and in kaslr.c it can be called with PAGE_KERNEL and below with PAGE_KERNEL_RO.
There is no problem to call memblock_reserve() for the same area twice, it's essentially a NOP.
Thanks for the suggestion. Will update fixmap_remap_fdt() in next patch.
However, I tested on some arm64 platform, if we also call memblock_reserve() in kaslr.c, would cause warning[1] when memblock_reserve() is called again in setup_machine_fdt(). The warning comes from https://elixir.bootlin.com/linux/latest/source/mm/memblock.c#L601
if (type->regions[0].size == 0) { WARN_ON(type->cnt != 1 || type->total_size); ...
Call memblock_reserve() multiple times after setup_machine_fdt() doesn't have such warning though.
I'm not sure if early console is available at the time kaslr_early_init() is called, but if yes, running with memblock=debug may shed some light.
I didn't trace the real reason causing this. But in this case, maybe don't call memblock_reserve() in kaslr?
My concern that this uncovered a real bug which might hit us later.
[1] [ 0.000000] WARNING: CPU: 0 PID: 0 at /mnt/host/source/src/third_party/kernel/v4.19/mm/memblock.c:583 memblock_add_range+0x1bc/0x1c8 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.38 #125 [ 0.000000] pstate: 600001c5 (nZCv dAIF -PAN -UAO) [ 0.000000] pc : memblock_add_range+0x1bc/0x1c8 [ 0.000000] lr : memblock_add_range+0x30/0x1c8 [ 0.000000] sp : ffffff9b5e203e80 [ 0.000000] x29: ffffff9b5e203ed0 x28: 0000000040959324 [ 0.000000] x27: 0000000040080000 x26: 0000000000080000 [ 0.000000] x25: 0000000080127e4b x24: 0000000000000000 [ 0.000000] x23: 0000001b55000000 x22: 000000000001152b [ 0.000000] x21: 000000005f800000 x20: 0000000000000000 [ 0.000000] x19: ffffff9b5e24bf00 x18: 00000000ffffffb8 [ 0.000000] x17: 000000000000003c x16: ffffffbefea00000 [ 0.000000] x15: ffffffbefea00000 x14: ffffff9b5e3c17d8 [ 0.000000] x13: 00e8000000000713 x12: 0000000000000000 [ 0.000000] x11: ffffffbefea00000 x10: 00e800005f800710 [ 0.000000] x9 : 000000000001152b x8 : ffffff9b5e365690 [ 0.000000] x7 : 6f20646573616228 x6 : 0000000000000002 [ 0.000000] x5 : 0000000000000000 x4 : 0000000000000000 [ 0.000000] x3 : 0000000000200000 x2 : 000000000001152b [ 0.000000] x1 : 000000005f800000 x0 : ffffff9b5e24bf00 [ 0.000000] Call trace: [ 0.000000] memblock_add_range+0x1bc/0x1c8 [ 0.000000] memblock_reserve+0x60/0xac [ 0.000000] fixmap_remap_fdt+0x4c/0x78 [ 0.000000] setup_machine_fdt+0x64/0xfc [ 0.000000] setup_arch+0x68/0x1e0 [ 0.000000] start_kernel+0x68/0x380
On Tue, May 14, 2019 at 11:42 PM Mike Rapoport rppt@linux.ibm.com wrote:
I'm not sure if early console is available at the time kaslr_early_init() is called, but if yes, running with memblock=debug may shed some light.
I didn't trace the real reason causing this. But in this case, maybe don't call memblock_reserve() in kaslr?
My concern that this uncovered a real bug which might hit us later.
Hi Mike, Thanks for the hint. I tried on my device but seems that earlycon happens after the warning call trace, so can't more information.
Since on my device kaslr will be runned, I tried call memblock_reserve() in kaslr and not in setup_machine_fdt()#fixmap_remap_fdt, but got following warning
[ 0.000000] memblock_remove: [0x0001000000000000-0x0000fffffffffffe] arm64_memblock_init+0x28/0x224 [ 0.000000] memblock_remove: [0x0000004040000000-0x000000403ffffffe] arm64_memblock_init+0x64/0x224 [ 0.000000] memblock_reserve: [0x0000000040080000-0x00000000413c3fff] arm64_memblock_init+0x188/0x224 [ 0.000000] WARNING: CPU: 0 PID: 0 at /mnt/host/source/src/third_party/kernel/v4.19/mm/memblock.c:583 memblock_add_range+0x1bc/0x1c8 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.38 #222 [ 0.000000] Hardware name: MediaTek kukui rev2 board (DT) [ 0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO) [ 0.000000] pc : memblock_add_range+0x1bc/0x1c8 [ 0.000000] lr : memblock_add_range+0x30/0x1c8 [ 0.000000] sp : ffffffab68603ea0 [ 0.000000] x29: ffffffab68603ef0 x28: 0000000040954324 [ 0.000000] x27: 0000000040080000 x26: 0000000000080000 [ 0.000000] x25: 0000000080127e4b x24: ffffffab68716000 [ 0.000000] x23: ffffffab680b5000 x22: 0000000001344000 [ 0.000000] x21: 0000000040080000 x20: 0000000000000000 [ 0.000000] x19: ffffffab6864bf00 x18: 00000000fffffc94 [ 0.000000] x17: 000000000000003c x16: ffffffab67d49064 [ 0.000000] x15: 0000000000000006 x14: 626d656d5f34366d [ 0.000000] x13: 7261205d66666633 x12: 0000000000000000 [ 0.000000] x11: 0000000000000000 x10: ffffffffffffffff [ 0.000000] x9 : 0000000000011547 x8 : ffffffab68765690 [ 0.000000] x7 : 696e695f6b636f6c x6 : ffffffab6875dd41 [ 0.000000] x5 : 0000000000000000 x4 : 0000000000000000 [ 0.000000] x3 : ffffffab678a24a0 x2 : 0000000001344000 [ 0.000000] x1 : 0000000040080000 x0 : ffffffab6864bf00 [ 0.000000] Call trace: [ 0.000000] memblock_add_range+0x1bc/0x1c8 [ 0.000000] memblock_reserve+0x60/0xac [ 0.000000] arm64_memblock_init+0x188/0x224 [ 0.000000] setup_arch+0x138/0x19c [ 0.000000] start_kernel+0x68/0x380 [ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x3c/0x58 with crng_init=0 [ 0.000000] ---[ end trace ea99802b425f7adf ]--- [ 0.000000] memblock_reserve: [0x000000005f800000-0x000000005f811536] early_init_dt_reserve_memory_arch+0x38/0x48 [ 0.000000] memblock_reserve: [0x00000000ffe00000-0x00000000ffffffff] early_init_dt_reserve_memory_arch+0x38/0x48
So I guess we just can't call memblock_reserve() in kaslr?
On Wed, 15 May 2019 at 12:24, Hsin-Yi Wang hsinyi@chromium.org wrote:
On Tue, May 14, 2019 at 11:42 PM Mike Rapoport rppt@linux.ibm.com wrote:
I'm not sure if early console is available at the time kaslr_early_init() is called, but if yes, running with memblock=debug may shed some light.
I didn't trace the real reason causing this. But in this case, maybe don't call memblock_reserve() in kaslr?
My concern that this uncovered a real bug which might hit us later.
Hi Mike, Thanks for the hint. I tried on my device but seems that earlycon happens after the warning call trace, so can't more information.
Since on my device kaslr will be runned, I tried call memblock_reserve() in kaslr and not in setup_machine_fdt()#fixmap_remap_fdt, but got following warning
I realize this is not documented sufficiently in the commit log, but the reason I introduced the separate __fixmap_remap_fdt() [which does not call memblock_reserve()] was that the KASLR init code should set as little global state as possible, given that it is called with the kernel mapped at the wrong virtual address.
The KASLR boot sequence is something like - map kernel at default [unrandomized] address - apply relocations and clear BSS - run KASLR init to map and parse the FDT [*] - if KASLR is enabled, unmap the kernel and remap it at the randomized address - apply relocations and clear BSS - proceed with start_kernel()
The issue you are seeing is caused by the fact that the memblock bookkeeping gets into an inconsistent state due to the 2nd clearing of BSS.
[*] The reason we need to map the FDT this early is to obtain the random seed, and to check whether 'nokaslr' was passed on the kernel command line. The reason arm64 deviates from other architectures in this regard is that we don't have a decompressor, and so there is no other execution context available where we can run C code to parse the FDT etc before we enter the kernel proper.
[ 0.000000] memblock_remove: [0x0001000000000000-0x0000fffffffffffe] arm64_memblock_init+0x28/0x224 [ 0.000000] memblock_remove: [0x0000004040000000-0x000000403ffffffe] arm64_memblock_init+0x64/0x224 [ 0.000000] memblock_reserve: [0x0000000040080000-0x00000000413c3fff] arm64_memblock_init+0x188/0x224 [ 0.000000] WARNING: CPU: 0 PID: 0 at /mnt/host/source/src/third_party/kernel/v4.19/mm/memblock.c:583 memblock_add_range+0x1bc/0x1c8 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.38 #222 [ 0.000000] Hardware name: MediaTek kukui rev2 board (DT) [ 0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO) [ 0.000000] pc : memblock_add_range+0x1bc/0x1c8 [ 0.000000] lr : memblock_add_range+0x30/0x1c8 [ 0.000000] sp : ffffffab68603ea0 [ 0.000000] x29: ffffffab68603ef0 x28: 0000000040954324 [ 0.000000] x27: 0000000040080000 x26: 0000000000080000 [ 0.000000] x25: 0000000080127e4b x24: ffffffab68716000 [ 0.000000] x23: ffffffab680b5000 x22: 0000000001344000 [ 0.000000] x21: 0000000040080000 x20: 0000000000000000 [ 0.000000] x19: ffffffab6864bf00 x18: 00000000fffffc94 [ 0.000000] x17: 000000000000003c x16: ffffffab67d49064 [ 0.000000] x15: 0000000000000006 x14: 626d656d5f34366d [ 0.000000] x13: 7261205d66666633 x12: 0000000000000000 [ 0.000000] x11: 0000000000000000 x10: ffffffffffffffff [ 0.000000] x9 : 0000000000011547 x8 : ffffffab68765690 [ 0.000000] x7 : 696e695f6b636f6c x6 : ffffffab6875dd41 [ 0.000000] x5 : 0000000000000000 x4 : 0000000000000000 [ 0.000000] x3 : ffffffab678a24a0 x2 : 0000000001344000 [ 0.000000] x1 : 0000000040080000 x0 : ffffffab6864bf00 [ 0.000000] Call trace: [ 0.000000] memblock_add_range+0x1bc/0x1c8 [ 0.000000] memblock_reserve+0x60/0xac [ 0.000000] arm64_memblock_init+0x188/0x224 [ 0.000000] setup_arch+0x138/0x19c [ 0.000000] start_kernel+0x68/0x380 [ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x3c/0x58 with crng_init=0 [ 0.000000] ---[ end trace ea99802b425f7adf ]--- [ 0.000000] memblock_reserve: [0x000000005f800000-0x000000005f811536] early_init_dt_reserve_memory_arch+0x38/0x48 [ 0.000000] memblock_reserve: [0x00000000ffe00000-0x00000000ffffffff] early_init_dt_reserve_memory_arch+0x38/0x48
So I guess we just can't call memblock_reserve() in kaslr?
On Wed, May 15, 2019 at 10:11:53PM +0200, Ard Biesheuvel wrote:
On Wed, 15 May 2019 at 12:24, Hsin-Yi Wang hsinyi@chromium.org wrote:
On Tue, May 14, 2019 at 11:42 PM Mike Rapoport rppt@linux.ibm.com wrote:
I'm not sure if early console is available at the time kaslr_early_init() is called, but if yes, running with memblock=debug may shed some light.
I didn't trace the real reason causing this. But in this case, maybe don't call memblock_reserve() in kaslr?
My concern that this uncovered a real bug which might hit us later.
Hi Mike, Thanks for the hint. I tried on my device but seems that earlycon happens after the warning call trace, so can't more information.
Since on my device kaslr will be runned, I tried call memblock_reserve() in kaslr and not in setup_machine_fdt()#fixmap_remap_fdt, but got following warning
I realize this is not documented sufficiently in the commit log, but the reason I introduced the separate __fixmap_remap_fdt() [which does not call memblock_reserve()] was that the KASLR init code should set as little global state as possible, given that it is called with the kernel mapped at the wrong virtual address.
The KASLR boot sequence is something like
- map kernel at default [unrandomized] address
- apply relocations and clear BSS
- run KASLR init to map and parse the FDT [*]
- if KASLR is enabled, unmap the kernel and remap it at the randomized address
- apply relocations and clear BSS
- proceed with start_kernel()
The issue you are seeing is caused by the fact that the memblock bookkeeping gets into an inconsistent state due to the 2nd clearing of BSS.
Ah, now the warning makes perfect sense :) Thanks!
[*] The reason we need to map the FDT this early is to obtain the random seed, and to check whether 'nokaslr' was passed on the kernel command line. The reason arm64 deviates from other architectures in this regard is that we don't have a decompressor, and so there is no other execution context available where we can run C code to parse the FDT etc before we enter the kernel proper.
[ 0.000000] memblock_remove: [0x0001000000000000-0x0000fffffffffffe] arm64_memblock_init+0x28/0x224 [ 0.000000] memblock_remove: [0x0000004040000000-0x000000403ffffffe] arm64_memblock_init+0x64/0x224 [ 0.000000] memblock_reserve: [0x0000000040080000-0x00000000413c3fff] arm64_memblock_init+0x188/0x224 [ 0.000000] WARNING: CPU: 0 PID: 0 at /mnt/host/source/src/third_party/kernel/v4.19/mm/memblock.c:583 memblock_add_range+0x1bc/0x1c8 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.38 #222 [ 0.000000] Hardware name: MediaTek kukui rev2 board (DT) [ 0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO) [ 0.000000] pc : memblock_add_range+0x1bc/0x1c8 [ 0.000000] lr : memblock_add_range+0x30/0x1c8 [ 0.000000] sp : ffffffab68603ea0 [ 0.000000] x29: ffffffab68603ef0 x28: 0000000040954324 [ 0.000000] x27: 0000000040080000 x26: 0000000000080000 [ 0.000000] x25: 0000000080127e4b x24: ffffffab68716000 [ 0.000000] x23: ffffffab680b5000 x22: 0000000001344000 [ 0.000000] x21: 0000000040080000 x20: 0000000000000000 [ 0.000000] x19: ffffffab6864bf00 x18: 00000000fffffc94 [ 0.000000] x17: 000000000000003c x16: ffffffab67d49064 [ 0.000000] x15: 0000000000000006 x14: 626d656d5f34366d [ 0.000000] x13: 7261205d66666633 x12: 0000000000000000 [ 0.000000] x11: 0000000000000000 x10: ffffffffffffffff [ 0.000000] x9 : 0000000000011547 x8 : ffffffab68765690 [ 0.000000] x7 : 696e695f6b636f6c x6 : ffffffab6875dd41 [ 0.000000] x5 : 0000000000000000 x4 : 0000000000000000 [ 0.000000] x3 : ffffffab678a24a0 x2 : 0000000001344000 [ 0.000000] x1 : 0000000040080000 x0 : ffffffab6864bf00 [ 0.000000] Call trace: [ 0.000000] memblock_add_range+0x1bc/0x1c8 [ 0.000000] memblock_reserve+0x60/0xac [ 0.000000] arm64_memblock_init+0x188/0x224 [ 0.000000] setup_arch+0x138/0x19c [ 0.000000] start_kernel+0x68/0x380 [ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x3c/0x58 with crng_init=0 [ 0.000000] ---[ end trace ea99802b425f7adf ]--- [ 0.000000] memblock_reserve: [0x000000005f800000-0x000000005f811536] early_init_dt_reserve_memory_arch+0x38/0x48 [ 0.000000] memblock_reserve: [0x00000000ffe00000-0x00000000ffffffff] early_init_dt_reserve_memory_arch+0x38/0x48
So I guess we just can't call memblock_reserve() in kaslr?
Quoting Hsin-Yi Wang (2019-05-13 04:14:32)
On Mon, May 13, 2019 at 4:59 PM Mike Rapoport rppt@linux.ibm.com wrote:
This makes the fdt mapped without the call to meblock_reserve(fdt) which makes the fdt memory available for memblock allocations.
Chances that is will be actually allocated are small, but you know, things happen.
IMHO, instead of calling directly __fixmap_remap_fdt() it would be better to add pgprot parameter to fixmap_remap_fdt(). Then here and in kaslr.c it can be called with PAGE_KERNEL and below with PAGE_KERNEL_RO.
There is no problem to call memblock_reserve() for the same area twice, it's essentially a NOP.
Thanks for the suggestion. Will update fixmap_remap_fdt() in next patch.
However, I tested on some arm64 platform, if we also call memblock_reserve() in kaslr.c, would cause warning[1] when memblock_reserve() is called again in setup_machine_fdt(). The warning comes from https://elixir.bootlin.com/linux/latest/source/mm/memblock.c#L601
if (type->regions[0].size == 0) { WARN_ON(type->cnt != 1 || type->total_size); ...
Call memblock_reserve() multiple times after setup_machine_fdt() doesn't have such warning though.
I didn't trace the real reason causing this. But in this case, maybe don't call memblock_reserve() in kaslr?
Why not just have fixmap_remap_fdt() that maps it as RW and reserves memblock once, and then call __fixmap_remap_fdt() with RO after early_init_dt_scan() or unflatten_device_tree() is called? Why the desire to call memblock_reserve() twice or even three times?
On Tue, May 14, 2019 at 02:05:43PM -0700, Stephen Boyd wrote:
Quoting Hsin-Yi Wang (2019-05-13 04:14:32)
On Mon, May 13, 2019 at 4:59 PM Mike Rapoport rppt@linux.ibm.com wrote:
This makes the fdt mapped without the call to meblock_reserve(fdt) which makes the fdt memory available for memblock allocations.
Chances that is will be actually allocated are small, but you know, things happen.
IMHO, instead of calling directly __fixmap_remap_fdt() it would be better to add pgprot parameter to fixmap_remap_fdt(). Then here and in kaslr.c it can be called with PAGE_KERNEL and below with PAGE_KERNEL_RO.
There is no problem to call memblock_reserve() for the same area twice, it's essentially a NOP.
Thanks for the suggestion. Will update fixmap_remap_fdt() in next patch.
However, I tested on some arm64 platform, if we also call memblock_reserve() in kaslr.c, would cause warning[1] when memblock_reserve() is called again in setup_machine_fdt(). The warning comes from https://elixir.bootlin.com/linux/latest/source/mm/memblock.c#L601
if (type->regions[0].size == 0) { WARN_ON(type->cnt != 1 || type->total_size); ...
Call memblock_reserve() multiple times after setup_machine_fdt() doesn't have such warning though.
I didn't trace the real reason causing this. But in this case, maybe don't call memblock_reserve() in kaslr?
Why not just have fixmap_remap_fdt() that maps it as RW and reserves memblock once, and then call __fixmap_remap_fdt() with RO after early_init_dt_scan() or unflatten_device_tree() is called? Why the desire to call memblock_reserve() twice or even three times?
There's no desire to call memblock_reserve() twice. It's just that leaving the call for it in kaslr rather than in setup_arch() may end up with unreserved FDT because kaslr was disabled or even compiled out.
I've suggested to use fixmap_remap_fdt() everywhere because IMHO this improves readability and allows to un-export __fixmap_remap_fdt().
On Wed, May 15, 2019 at 1:01 PM Mike Rapoport rppt@linux.ibm.com wrote:
Why not just have fixmap_remap_fdt() that maps it as RW and reserves memblock once, and then call __fixmap_remap_fdt() with RO after early_init_dt_scan() or unflatten_device_tree() is called? Why the desire to call memblock_reserve() twice or even three times?
There's no desire to call memblock_reserve() twice. It's just that leaving the call for it in kaslr rather than in setup_arch() may end up with unreserved FDT because kaslr was disabled or even compiled out.
I've suggested to use fixmap_remap_fdt() everywhere because IMHO this improves readability and allows to un-export __fixmap_remap_fdt().
-- Sincerely yours, Mike.
How about adding an arch hook that's not limited to be called at init (like fixmap_remap_fdt). In this way we don't have to change currently arm64 setup structure and only map fdt to RW before we need to modify it (and can map back to RO after write). Since it can be called after init, for CONFIG_KEXEC, we can also use it before updating fdt with a new seed.
Does nothing by default, and for arm64 it can be like[1]. It's similar to __fixmap_remap_fdt on counting fdt size but using update_mapping_prot() (will flush the TLBs). And suppose fixmap_remap_fdt() is called at least once, region checking is skipped.
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 8080c9f489c3..e0fff8a009da 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -32,6 +32,7 @@ #include <linux/io.h> #include <linux/mm.h> #include <linux/vmalloc.h> +#include <linux/of_fdt.h>
#include <asm/barrier.h> #include <asm/cputype.h> @@ -919,6 +920,22 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys) return dt_virt; }
+extern phys_addr_t fdt_pointer; + +/* Should be called after fixmap_remap_fdt() is called. */ +void update_fdt_pgprot(pgprot_t prot) +{ + u64 dt_virt_base = __fix_to_virt(FIX_FDT); + int offset, size; + + offset = fdt_pointer % SWAPPER_BLOCK_SIZE; + size = fdt_totalsize((void *)dt_virt_base + offset); + + update_mapping_prot(round_down(fdt_pointer, SWAPPER_BLOCK_SIZE), + dt_virt_base, + round_up(offset + size, SWAPPER_BLOCK_SIZE), prot); +} +
example use: update_fdt_pgprot(PAGE_KERNEL); fdt_delprop(initial_boot_params, node, "rng-seed"); update_fdt_pgprot(PAGE_KERNEL_RO);
I tested on arm64 device and it works. But if this doesn't seems right, I'll probably just don't don't map fdt back to RO if kexec is set.
Is this reasonable?
Thanks!
On Mon, May 13, 2019 at 08:38:18AM +0800, Hsin-Yi Wang wrote:
Introducing a chosen node, rng-seed, which is an entropy that can be passed to kernel called very early to increase initial device randomness. Bootloader should provide this entropy and the value is read from /chosen/rng-seed in DT.
Signed-off-by: Hsin-Yi Wang hsinyi@chromium.org
change log: v1->v2:
- call function in early_init_dt_scan_chosen
- will add doc to devicetree-org/dt-schema on github if this is accepted
Documentation/devicetree/bindings/chosen.txt | 14 ++++++++++++++ drivers/of/fdt.c | 11 +++++++++++ 2 files changed, 25 insertions(+)
diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt index 45e79172a646..fef5c82672dc 100644 --- a/Documentation/devicetree/bindings/chosen.txt +++ b/Documentation/devicetree/bindings/chosen.txt @@ -28,6 +28,20 @@ mode) when EFI_RNG_PROTOCOL is supported, it will be overwritten by the Linux EFI stub (which will populate the property itself, using EFI_RNG_PROTOCOL). +rng-seed +-----------
+This property served as an entropy to add device randomness. It is parsed +as a byte array, e.g.
+/ {
- chosen {
rng-seed = <0x31 0x95 0x1b 0x3c 0xc9 0xfa 0xb3 ...>;
- };
+};
+This random value should be provided by bootloader.
stdout-path
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index de893c9616a1..96ea5eba9dd5 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -24,6 +24,7 @@ #include <linux/debugfs.h> #include <linux/serial_core.h> #include <linux/sysfs.h> +#include <linux/random.h> #include <asm/setup.h> /* for COMMAND_LINE_SIZE */ #include <asm/page.h> @@ -1079,6 +1080,7 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname, { int l; const char *p;
- const void *rng_seed;
pr_debug("search "chosen", depth: %d, uname: %s\n", depth, uname); @@ -1113,6 +1115,15 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname, pr_debug("Command line is: %s\n", (char*)data);
- rng_seed = of_get_flat_dt_prop(node, "rng-seed", &l);
- if (!rng_seed || l == 0)
return 1;
- /* try to clear seed so it won't be found. */
fdt_nop_property(initial_boot_params, node, "rng-seed");
add_device_randomness(rng_seed, l);
Please fix the indentation.
- /* break now */ return 1;
}
2.20.1
On Sun, May 12, 2019 at 7:39 PM Hsin-Yi Wang hsinyi@chromium.org wrote:
Introducing a chosen node, rng-seed, which is an entropy that can be passed to kernel called very early to increase initial device randomness. Bootloader should provide this entropy and the value is read from /chosen/rng-seed in DT.
Signed-off-by: Hsin-Yi Wang hsinyi@chromium.org
change log: v1->v2:
- call function in early_init_dt_scan_chosen
- will add doc to devicetree-org/dt-schema on github if this is accepted
Documentation/devicetree/bindings/chosen.txt | 14 ++++++++++++++ drivers/of/fdt.c | 11 +++++++++++ 2 files changed, 25 insertions(+)
diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt index 45e79172a646..fef5c82672dc 100644 --- a/Documentation/devicetree/bindings/chosen.txt +++ b/Documentation/devicetree/bindings/chosen.txt @@ -28,6 +28,20 @@ mode) when EFI_RNG_PROTOCOL is supported, it will be overwritten by the Linux EFI stub (which will populate the property itself, using EFI_RNG_PROTOCOL).
+rng-seed +-----------
+This property served as an entropy to add device randomness. It is parsed +as a byte array, e.g.
+/ {
chosen {
rng-seed = <0x31 0x95 0x1b 0x3c 0xc9 0xfa 0xb3 ...>;
};
+};
+This random value should be provided by bootloader.
stdout-path
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index de893c9616a1..96ea5eba9dd5 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -24,6 +24,7 @@ #include <linux/debugfs.h> #include <linux/serial_core.h> #include <linux/sysfs.h> +#include <linux/random.h>
#include <asm/setup.h> /* for COMMAND_LINE_SIZE */ #include <asm/page.h> @@ -1079,6 +1080,7 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname, { int l; const char *p;
const void *rng_seed; pr_debug("search \"chosen\", depth: %d, uname: %s\n", depth, uname);
@@ -1113,6 +1115,15 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
pr_debug("Command line is: %s\n", (char*)data);
rng_seed = of_get_flat_dt_prop(node, "rng-seed", &l);
if (!rng_seed || l == 0)
return 1;
This only works if this hunk stays at the end of the function. I'd invert the if and move the next 2 functions under it.
/* try to clear seed so it won't be found. */
fdt_nop_property(initial_boot_params, node, "rng-seed");
I'd just delete the property.
Also, what about kexec? Don't you need to add a new seed?
add_device_randomness(rng_seed, l);
/* break now */ return 1;
}
2.20.1
On Mon, May 13, 2019 at 9:14 PM Rob Herring robh+dt@kernel.org wrote:
fdt_nop_property(initial_boot_params, node, "rng-seed");
I'd just delete the property.
Also, what about kexec? Don't you need to add a new seed?
Will update in v3. Thanks.
boot-architecture@lists.linaro.org