The qemu-arm64 boot failed with linux next-20240219 tag kernel.
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Boot log:
---------
<6>[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
<1>[ 0.000000] Unable to handle kernel paging request at virtual address ffff80008001ffe8
<1>[ 0.000000] Mem abort info:
<1>[ 0.000000] ESR = 0x0000000096000004
<1>[ 0.000000] EC = 0x25: DABT (current EL), IL = 32 bits
<1>[ 0.000000] SET = 0, FnV = 0
<1>[ 0.000000] EA = 0, S1PTW = 0
<1>[ 0.000000] FSC = 0x04: level 0 translation fault
<1>[ 0.000000] Data abort info:
<1>[ 0.000000] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
<1>[ 0.000000] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[ 0.000000] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[ 0.000000] swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000042497000
<1>[ 0.000000] [ffff80008001ffe8] pgd=10000000439a5003, p4d=10000001000e3003, pud=10000001000e4003, pmd=10000001000e5003, pte=006800000800f413
<0>[ 0.000000] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
<4>[ 0.000000] Modules linked in:
<4>[ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc5-next-20240219 #1
<4>[ 0.000000] Hardware name: linux,dummy-virt (DT)
<4>[ 0.000000] pstate: 804000c9 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
<4>[ 0.000000] pc : gic_of_init+0x84/0x3a8
<4>[ 0.000000] lr : gic_of_init+0x290/0x3a8
...
<4>[ 0.000000] Call trace:
<4>[ 0.000000] gic_of_init+0x84/0x3a8
<4>[ 0.000000] of_irq_init+0x1d4/0x3d0
<4>[ 0.000000] irqchip_init+0x20/0x50
<4>[ 0.000000] init_IRQ+0xa8/0xc8
<4>[ 0.000000] start_kernel+0x270/0x690
<4>[ 0.000000] __primary_switched+0x80/0x90
<0>[ 0.000000] Code: f94017e0 f90007e0 d29ffd00 8b0002c0 (b9400000)
<4>[ 0.000000] ---[ end trace 0000000000000000 ]---
<0>[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
<0>[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
Links:
 - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240219/tes...
 - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240219/tes...

--
Linaro LKFT
https://lkft.linaro.org
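
The "Mem abort info" lines in the boot log above are a decoded form of the ESR value. As a minimal sketch of how that decoding works, the C fragment below splits an ESR_ELx value into the fields the log prints, using the bit positions from the Arm architecture documentation (EC in bits [31:26], IL in bit [25], ISS in bits [24:0], and for data aborts WnR in ISS bit [6] and the fault status code in ISS bits [5:0]). The macro names, `struct esr_fields` and `decode_esr()` are hypothetical names chosen for illustration; they are not the kernel's own helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative macros (not kernel definitions) for the ESR_ELx layout. */
#define ESR_EC(esr)   (((esr) >> 26) & 0x3f)      /* exception class */
#define ESR_IL(esr)   (((esr) >> 25) & 0x1)       /* 1 = 32-bit instruction */
#define ESR_ISS(esr)  ((esr) & 0x1ffffff)         /* instruction specific syndrome */
#define ISS_WNR(iss)  (((iss) >> 6) & 0x1)        /* write-not-read (data aborts) */
#define ISS_FSC(iss)  ((iss) & 0x3f)              /* fault status code (data aborts) */

/* Compile-time check against the value reported in the log. */
static_assert(ESR_EC(0x96000004ULL) == 0x25, "EC 0x25 is a data abort taken at the current EL");

/* Hypothetical container for the decoded fields. */
struct esr_fields {
    unsigned ec;
    unsigned il;
    uint32_t iss;
    unsigned wnr;
    unsigned fsc;
};

/* Hypothetical helper: decode an ESR value into its fields. */
static struct esr_fields decode_esr(uint64_t esr)
{
    struct esr_fields f;

    f.ec  = (unsigned)ESR_EC(esr);
    f.il  = (unsigned)ESR_IL(esr);
    f.iss = (uint32_t)ESR_ISS(esr);
    f.wnr = (unsigned)ISS_WNR(f.iss);
    f.fsc = (unsigned)ISS_FSC(f.iss);
    return f;
}
```

Feeding in 0x96000004 gives the EC, IL, ISS, WnR and FSC values shown in the log: a read access (WnR = 0) that hit a level 0 translation fault (FSC = 0x04).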
On Mon, 19 Feb 2024 09:42:45 +0000, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> The qemu-arm64 boot failed with linux next-20240219 tag kernel.
>
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
>
> Boot log:
> [...]
>
> Links:

Where is the configuration file? What are the parameters to QEMU? Please consider making this a useful and actionable report.
Thanks,
M.
On Mon, 19 Feb 2024 09:48:28 +0000, Marc Zyngier <maz@kernel.org> wrote:
> On Mon, 19 Feb 2024 09:42:45 +0000, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> > The qemu-arm64 boot failed with linux next-20240219 tag kernel.
> >
> > Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> >
> > Boot log:
> > [...]
> >
> > Links:
>
> Where is the configuration file? What are the parameters to QEMU? Please consider making this a useful and actionable report.

For what it is worth, I've just tested both defconfig and my own configuration with both 4k (kvmtool, QEMU+KVM and on SynQuacer) and 16k (kvmtool), without any obvious problem.
So until you come up with more specific details, there isn't much I can do.
M.
On 2024/2/19 19:32, Marc Zyngier wrote:
> For what it is worth, I've just tested both defconfig and my own configuration with both 4k (kvmtool, QEMU+KVM and on SynQuacer) and 16k (kvmtool), without any obvious problem.

I ran a quick test on top of next-20240219 with defconfig. I can reproduce it with the QEMU parameters '-cpu max -accel tcg', but things are fine with '-cpu max,lpa2=off -accel tcg'.
Bisection shows that the problem happens when we start putting the latest arm64 and kvmarm changes together. The following hack fixes the problem for me (but for now I have **only** written it for a kernel built from defconfig with ARM64_4K_PAGES=y).
I can investigate it further tomorrow (as it's too late now ;-) ). Or maybe Marc or Catalin can help fix it with a proper approach.
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 4f7662008ede..babdc3f4721b 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2798,6 +2798,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.sign = FTR_SIGNED,
 		.field_pos = ID_AA64MMFR0_EL1_TGRAN4_SHIFT,
 		.min_field_value = ID_AA64MMFR0_EL1_TGRAN4_52_BIT,
+		.max_field_value = BIT(ID_AA64MMFR0_EL1_TGRAN4_WIDTH - 1) - 1,
 #else
 		.sign = FTR_UNSIGNED,
 		.field_pos = ID_AA64MMFR0_EL1_TGRAN16_SHIFT,

Thanks,
Zenghui
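
To make the expression in the hack above concrete, here is a small sketch of what `BIT(ID_AA64MMFR0_EL1_TGRAN4_WIDTH - 1) - 1` evaluates to for a 4-bit signed ID register field, and how a [min, max] range check behaves with it. It assumes the TGran4 field sits at bits [31:28] of ID_AA64MMFR0_EL1, that the value 1 means "4KB granule with 52-bit support", and that 0xF (-1 when read as signed) means "not supported". It also assumes that capability matching checks `min <= value <= max`; the real kernel logic may differ. The macro values, `extract_signed_field()` and `field_in_range()` are hypothetical stand-ins, not the kernel's functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define TGRAN4_SHIFT   28
#define TGRAN4_WIDTH   4
#define TGRAN4_52_BIT  0x1
#define BIT(n)         (1ULL << (n))

/* Hypothetical helper: extract `width` bits at `shift` and sign-extend them. */
static int64_t extract_signed_field(uint64_t reg, unsigned shift, unsigned width)
{
    uint64_t raw;

    assert(width >= 1 && width <= 32 && shift + width <= 64);
    raw = (reg >> shift) & (BIT(width) - 1);
    if (raw & BIT(width - 1))               /* top bit set: negative value */
        return (int64_t)raw - (int64_t)BIT(width);
    return (int64_t)raw;
}

/* Hypothetical range check, assuming a feature matches when min <= value <= max. */
static bool field_in_range(uint64_t reg, unsigned shift, unsigned width,
                           int64_t min, int64_t max)
{
    int64_t val = extract_signed_field(reg, shift, width);

    return val >= min && val <= max;
}
```

With a 4-bit signed field the largest representable value is `BIT(4 - 1) - 1`, that is 7. Under the assumptions above, a register with TGran4 = 1 passes `field_in_range(reg, TGRAN4_SHIFT, TGRAN4_WIDTH, TGRAN4_52_BIT, 7)`, whereas the same check with a maximum of 0 would reject it.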
On 2024-02-19 14:46, Zenghui Yu wrote:
> On 2024/2/19 19:32, Marc Zyngier wrote:
> > For what it is worth, I've just tested both defconfig and my own configuration with both 4k (kvmtool, QEMU+KVM and on SynQuacer) and 16k (kvmtool), without any obvious problem.
>
> I ran a quick test on top of next-20240219 with defconfig. I can reproduce it with the QEMU parameters '-cpu max -accel tcg', but things are fine with '-cpu max,lpa2=off -accel tcg'.
>
> Bisection shows that the problem happens when we start putting the latest arm64 and kvmarm changes together. The following hack fixes the problem for me (but for now I have **only** written it for a kernel built from defconfig with ARM64_4K_PAGES=y).
>
> I can investigate it further tomorrow (as it's too late now ;-) ). Or maybe Marc or Catalin can help fix it with a proper approach.
>
> [...]

Yup, got to that point too.
Working on a slightly more elaborate fix.
Thanks,
M.
On Mon, 19 Feb 2024 14:46:46 +0000, Zenghui Yu <yuzenghui@huawei.com> wrote:
> On 2024/2/19 19:32, Marc Zyngier wrote:
> > For what it is worth, I've just tested both defconfig and my own configuration with both 4k (kvmtool, QEMU+KVM and on SynQuacer) and 16k (kvmtool), without any obvious problem.
>
> I ran a quick test on top of next-20240219 with defconfig. I can reproduce it with the QEMU parameters '-cpu max -accel tcg', but things are fine with '-cpu max,lpa2=off -accel tcg'.
>
> Bisection shows that the problem happens when we start putting the latest arm64 and kvmarm changes together. The following hack fixes the problem for me (but for now I have **only** written it for a kernel built from defconfig with ARM64_4K_PAGES=y).
>
> I can investigate it further tomorrow (as it's too late now ;-) ). Or maybe Marc or Catalin can help fix it with a proper approach.
>
> [...]

I've posted my take on this at [1], which hopefully matches what you were aiming at.
Thanks,
M.
[1] https://lore.kernel.org/all/86bk8c4gyh.wl-maz@kernel.org/
On 2024/2/19 23:27, Marc Zyngier wrote:
> On Mon, 19 Feb 2024 14:46:46 +0000, Zenghui Yu <yuzenghui@huawei.com> wrote:
> > On 2024/2/19 19:32, Marc Zyngier wrote:
> > > For what it is worth, I've just tested both defconfig and my own configuration with both 4k (kvmtool, QEMU+KVM and on SynQuacer) and 16k (kvmtool), without any obvious problem.
> >
> > I ran a quick test on top of next-20240219 with defconfig. I can reproduce it with the QEMU parameters '-cpu max -accel tcg', but things are fine with '-cpu max,lpa2=off -accel tcg'.
> >
> > Bisection shows that the problem happens when we start putting the latest arm64 and kvmarm changes together. The following hack fixes the problem for me (but for now I have **only** written it for a kernel built from defconfig with ARM64_4K_PAGES=y).
> >
> > I can investigate it further tomorrow (as it's too late now ;-) ). Or maybe Marc or Catalin can help fix it with a proper approach.
> >
> > [...]
>
> I've posted my take on this at [1], which hopefully matches what you were aiming at.
>
> [1] https://lore.kernel.org/all/86bk8c4gyh.wl-maz@kernel.org/

Yup, this looks good to me.

Thanks,
Zenghui
On Mon, 19 Feb 2024 at 20:57, Marc Zyngier <maz@kernel.org> wrote:
> On Mon, 19 Feb 2024 14:46:46 +0000, Zenghui Yu <yuzenghui@huawei.com> wrote:
> > On 2024/2/19 19:32, Marc Zyngier wrote:
> > > For what it is worth, I've just tested both defconfig and my own configuration with both 4k (kvmtool, QEMU+KVM and on SynQuacer) and 16k (kvmtool), without any obvious problem.
> >
> > I ran a quick test on top of next-20240219 with defconfig. I can reproduce it with the QEMU parameters '-cpu max -accel tcg', but things are fine with '-cpu max,lpa2=off -accel tcg'.
> >
> > Bisection shows that the problem happens when we start putting the latest arm64 and kvmarm changes together. The following hack fixes the problem for me (but for now I have **only** written it for a kernel built from defconfig with ARM64_4K_PAGES=y).
> >
> > I can investigate it further tomorrow (as it's too late now ;-) ). Or maybe Marc or Catalin can help fix it with a proper approach.
> >
> > [...]
>
> I've posted my take on this at [1], which hopefully matches what you were aiming at.

I applied this patch [1] on top of Linux next-20240219 and the boot test passed. I have also validated today's Linux next-20240220, and the boot test passed there as well.

> Thanks,
>
> M.
>
> [1] https://lore.kernel.org/all/86bk8c4gyh.wl-maz@kernel.org/

- Naresh