On Sun, Nov 10, 2024 at 03:34:54AM +0800, Yangyu Chen wrote:
Hi Charlie,
I have tested this patchset with ghostwrite, rebased onto Linux commit da4373fbcf ("Merge tag 'thermal-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm") [1], on my D1 Nezha board. With defconfig + CONFIG_ERRATA_THEAD_GHOSTWRITE=n, I got this message during boot:
[ 0.027584] Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'riscv_vector_ctx'. Error -22
[ 0.038057] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc6-00310-gb276cf69df24-dirty #11
[ 0.047240] Hardware name: Allwinner D1 Nezha (DT)
[ 0.052007] Call Trace:
[ 0.054434] [<ffffffff80007172>] dump_backtrace+0x1c/0x24
[ 0.059806] [<ffffffff809f6834>] show_stack+0x2c/0x38
[ 0.064833] [<ffffffff80a040f0>] dump_stack_lvl+0x52/0x74
[ 0.070206] [<ffffffff80a04126>] dump_stack+0x14/0x1c
[ 0.075233] [<ffffffff809f6db6>] panic+0x10c/0x300
[ 0.080000] [<ffffffff8017b5a0>] __kmem_cache_create_args+0x24a/0x2b6
[ 0.086413] [<ffffffff80c04c68>] riscv_v_setup_ctx_cache+0x56/0x84
[ 0.092566] [<ffffffff80c04288>] arch_task_cache_init+0x10/0x1c
[ 0.098460] [<ffffffff80c07d02>] fork_init+0x68/0x1a8
[ 0.103486] [<ffffffff80c00ed2>] start_kernel+0x77e/0x822
[ 0.108870] ---[ end Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'riscv_vector_ctx'. Error -22 ]---
[1] https://github.com/cyyself/linux/tree/xtheadvector_20241110
On 9/12/24 13:55, Charlie Jenkins wrote:
diff --git a/arch/riscv/kernel/vector.c b/arch/riscv/kernel/vector.c
index 682b3feee451..9775d6a9c8ee 100644
--- a/arch/riscv/kernel/vector.c
+++ b/arch/riscv/kernel/vector.c
@@ -33,7 +33,17 @@ int riscv_v_setup_vsize(void)
 {
 	unsigned long this_vsize;
 
-	/* There are 32 vector registers with vlenb length. */
+	/*
+	 * There are 32 vector registers with vlenb length.
+	 *
+	 * If the thead,vlenb property was provided by the firmware, use that
+	 * instead of probing the CSRs.
+	 */
+	if (thead_vlenb_of) {
+		this_vsize = thead_vlenb_of * 32;
I then patched this line, replacing "this_vsize" with "riscv_v_vsize". With that change the kernel boots normally and I can see "xtheadvector" in /proc/cpuinfo.
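Why this substitution helps can be sketched in plain C. This is a simplified userspace model, not the kernel code. It assumes that the rest of riscv_v_setup_vsize() copies the local this_vsize into the global riscv_v_vsize only after the point of the early return, so returning early leaves the global at 0 and the 'riscv_vector_ctx' slab is then requested with a zero size. The function names setup_vsize_buggy() and setup_vsize_patched() are hypothetical, and the globals are stand-ins for the kernel variables of the same name.

```c
/* Stand-ins for the kernel variables named in the thread (model only). */
static unsigned long riscv_v_vsize;	/* global context size, 0 until set */
static unsigned long thead_vlenb_of;	/* vlenb from firmware, 0 if absent */

/* Hypothetical model of the hunk as posted: only the local is assigned. */
static int setup_vsize_buggy(void)
{
	unsigned long this_vsize;

	if (thead_vlenb_of) {
		this_vsize = thead_vlenb_of * 32;
		(void)this_vsize;	/* value is lost on the early return */
		return 0;
	}
	return 0;
}

/* Hypothetical model of the local fix: assign the global directly. */
static int setup_vsize_patched(void)
{
	if (thead_vlenb_of) {
		riscv_v_vsize = thead_vlenb_of * 32;
		return 0;
	}
	return 0;
}
```

With thead_vlenb_of set to 16 (an illustrative value), setup_vsize_buggy() returns 0 but leaves riscv_v_vsize at 0, while setup_vsize_patched() sets it to 512.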
However, when I try to run the "v_exec_initval_nolibc" test, the kernel panics with these outputs:
[ 978.788878] Oops - illegal instruction [#1]
[ 978.788897] Modules linked in:
[ 978.788908] CPU: 0 UID: 1000 PID: 461 Comm: v_exec_initval_ Not tainted 6.12.0-rc6-00310-gb276cf69df24-dirty #12
[ 978.788924] Hardware name: Allwinner D1 Nezha (DT)
[ 978.788929] epc : do_trap_ecall_u+0x56/0x20a
[ 978.788956] ra : _new_vmalloc_restore_context_a0+0xc2/0xce
[ 978.788974] epc : ffffffff80a04afe ra : ffffffff80a0e742 sp : ffffffc6003fbeb0
[ 978.788983] gp : ffffffff81717080 tp : ffffffd60723b300 t0 : ffffffff81001268
[ 978.788991] t1 : ffffffff80a04aa8 t2 : ffffffff810012a8 s0 : ffffffc6003fbee0
[ 978.789000] s1 : ffffffc6003fbee0 a0 : ffffffc6003fbee0 a1 : 000000000000005d
[ 978.789007] a2 : 0000000000000000 a3 : ffffffffffffffda a4 : 0000000000000003
[ 978.789015] a5 : 0000000000000000 a6 : 0000000002adb5fe a7 : 000000000000005d
[ 978.789022] s2 : 00000000000108a8 s3 : 0000000000000000 s4 : 0000000000000008
[ 978.789030] s5 : 0000003fb42ab780 s6 : 0000002adb5fe420 s7 : 0000002adb5fb9e0
[ 978.789038] s8 : 0000002adb5fe440 s9 : 0000002adb5fe420 s10: 0000002adb572ad4
[ 978.789046] s11: 0000002adb572ad0 t3 : 0000003fb43c5e3c t4 : 622f7273752f3d5f
[ 978.789053] t5 : 0000002adb5fd5a1 t6 : 0000000002adb5ff
[ 978.789060] status: 8000000201800100 badaddr: 000000005e0fb057 cause: 0000000000000002
[ 978.789069] [<ffffffff80a04afe>] do_trap_ecall_u+0x56/0x20a
[ 978.789086] [<ffffffff80a0e742>] _new_vmalloc_restore_context_a0+0xc2/0xce
[ 978.789113] Code: a073 1007 006f 1a60 7057 0c30 57fd 17fe 77d7 0c30 (b057) 5e0f
[ 978.789123] ---[ end trace 0000000000000000 ]---
[ 978.789131] Kernel panic - not syncing: Fatal exception in interrupt
[ 978.937158] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Is something wrong with my setup?
Thanks for reporting this! I just sent out a new version with the fix. Something was wrong in __riscv_v_vstate_discard(), which triggered this failure. I have tested that the new version passes the test case.
https://lore.kernel.org/linux-riscv/20241113-xtheadvector-v11-0-236c22791ef9...
- Charlie
Thanks, Yangyu Chen
+		return 0;
+	}
+
 	riscv_v_enable();
 	this_vsize = csr_read(CSR_VLENB) * 32;
 	riscv_v_disable();
diff --git a/arch/riscv/kernel/vendor_extensions/thead.c b/arch/riscv/kernel/vendor_extensions/thead.c
index 0f27baf8d245..519dbf70710a 100644
--- a/arch/riscv/kernel/vendor_extensions/thead.c
+++ b/arch/riscv/kernel/vendor_extensions/thead.c
@@ -5,6 +5,7 @@
 #include <asm/vendor_extensions/thead.h>
 
 #include <linux/array_size.h>
+#include <linux/cpumask.h>
 #include <linux/types.h>
 
 /* All T-Head vendor extensions supported in Linux */
@@ -16,3 +17,13 @@ struct riscv_isa_vendor_ext_data_list riscv_isa_vendor_ext_list_thead = {
 	.ext_data_count = ARRAY_SIZE(riscv_isa_vendor_ext_thead),
 	.ext_data = riscv_isa_vendor_ext_thead,
 };
+
+void disable_xtheadvector(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		clear_bit(RISCV_ISA_VENDOR_EXT_XTHEADVECTOR, riscv_isa_vendor_ext_list_thead.per_hart_isa_bitmap[cpu].isa);
+
+	clear_bit(RISCV_ISA_VENDOR_EXT_XTHEADVECTOR, riscv_isa_vendor_ext_list_thead.all_harts_isa_bitmap.isa);
+}