On 2022-04-11 at 07:42:53 -0700, Dave Hansen wrote:
On 4/11/22 03:14, Pengfei Xu wrote:
On 2022-04-08 at 09:58:50 -0700, Dave Hansen wrote:
On 3/16/22 05:40, Pengfei Xu wrote:
+#ifndef __cpuid_count +#define __cpuid_count(level, count, a, b, c, d) ({ \
- __asm__ __volatile__ ("cpuid\n\t" \
: "=a" (a), "=b" (b), "=c" (c), "=d" (d) \
: "0" (level), "2" (count)); \
+}) +#endif
By the time you post the next revision, I hope these are merged:
https://lore.kernel.org/all/cover.1647360971.git.reinette.chatre@intel.com/
Could you rebase on top of those to avoid duplication, please?
Yes, thanks for remind. I would like to use the helper __cpuid_count() based on above fixed patch. And I have a concern with stdlib.h with "-mno-sse" issue, it's a GCC bug, and it will be fixed in GCC11. And I could not use kselftest.h, because kselftest.h used stdlib.h also...
Ugh. What a mess.
I found the solution you mentioned as follow. I will try to fix it. Thanks!
Thanks for the link! From above "id=27600"link, Lu Hongjiu mentioned that: It's a "GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99652" And id=99652 shows that it's fixed in GCC 11. I tried "-mgeneral-regs-only"(it includes "-mno-sse"), it also gcc failed with same error as "-mno-sse": " /usr/include/bits/stdlib-float.h: In function ‘atof’: /usr/include/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled " And the error shows that: it's related to "stdlib-float.h", seems I didn't use stdlib-float.h related function.
In order for this test code to support GCC versions before GCC11, such as GCC8, etc., I used this workaround "avoid stdlib.h" for GCC bug, and add *aligned_alloc() declaration above.
Because kselftest.h uses stdlib.h also, so I could not use kselftest.h with "-mno-see" special GCC requirement, and seems I could not use __cpuid_count() patch in kselftest.h from Reinette Chatre. Thanks!
Can you break this test up into two pieces which are spread across two .c files? One that *just* does the sequences that need XSAVE and to preserve FPU state with -mno-sse, and then a separate one that uses stdlib.h and also the kselftest infrastructure?
For instance, all of the detection and enumeration can be in the nornal .c file.
Thanks a lot for suggestion! I will put normal function in "prepare_xstate.c", and there is a "prepare_xstate.h" also. Only keep _xsave, _xrstor, key test function in xstate.c. gcc -xx(some other parm) -c prepare_xstate.c gcc -xx -mno-sse xstate.c prepare_xstate.o
...
+/* Fill FP/XMM/YMM/OPMASK and PKRU xstates into buffer. */ +static void fill_xstates_buf(struct xsave_buffer *buf, uint32_t xsave_mask) +{
- uint32_t i;
- /* The data of FP x87 state are as follows. */
OK, but what *is* this? Random data? Or did you put some data in that means something?
I learned from filling the fp data by follow functions: fill FPU xstate by fldl instructions, push the source operand onto the FPU register stack and recorded the fp xstate, then used buffer filling way to fill the fpu xstates: Some fp data could be set as random value under the "FP in SDM rules". Shall I add the comment for the fpu data filling like as follow: " /*
- The data of FP x87 state has the same effect as pushing source operand
- 0x1f2f3f4f1f2f3f4f onto the FPU stack ST0-7.
*/ unsigned char fp_data[160] = {... "
No, that's hideous. If you generated the fp state with code, let's include the *code*.
Thanks for suggestion. Yes, I could use fp instructions back and xsave inline function to xsave fp xstate into buf. I will think about it. Thanks!
As follow is the pushing source operand 0x1f2f3f4f1f2f3f4f onto the FPU stack ST0-7: " static inline void prepare_fp_buf(uint64_t ui64_fp) { /* Populate ui64_fp value onto FP registers stack ST0-7. */ asm volatile("finit"); asm volatile("fldl %0" : : "m" (ui64_fp)); asm volatile("fldl %0" : : "m" (ui64_fp)); asm volatile("fldl %0" : : "m" (ui64_fp)); asm volatile("fldl %0" : : "m" (ui64_fp)); asm volatile("fldl %0" : : "m" (ui64_fp)); asm volatile("fldl %0" : : "m" (ui64_fp)); asm volatile("fldl %0" : : "m" (ui64_fp)); asm volatile("fldl %0" : : "m" (ui64_fp)); } ... prepare_fp_buf(0x1f2f3f4f); __xsave(buf, xstate_info.mask); "
The code to set fp data and display xstate is as follow: https://github.com/xupengfe/xstate_show/blob/0411_fp/x86/xstate.c
I found there were 2 different:
- unsigned char fp_data[160] = {
0x7f, 0x03, 0x00, 0x00, 0xff, 0x00, 0x00, 0x00,
0xf0, 0x19, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
^Above function shows "0xff, 0x12" not "0xf0, 0x19" in 0x8/0x9
offset bytes: Bytes 11:8 are used for bits 31:0 of the x87 FPU Instruction Pointer Offset(FIP). It could be impacted if I added "vzeroall" and so on instructions, in order to match with above fpu function result, will change to "0xff, 0x12".
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00,
^Above function shows "0x80, 0x1f" not "0x00, 0x00" in offset
0x18/0x19 bytes: Bytes 27:24 are used for the MXCSR register. XRSTOR and XRSTORS generate general-protection faults (#GP) in response to attempts to set any of the reserved bits of the MXCSR register. It could be set as "0x00, 0x00", in order to match with result of above function, will fill as "0x80, 0x1f".
I'm totally lost with what you are trying to say here.
I used above "finit, fldl" instructions way to populate fp xstates as before. Sorry for unclear description.
...
I have to wonder if we can do this in a bit more structured way:
struct xstate_test { bool (*init)(void); bool (*fill_state)(struct xsave_buffer *buf); ... }
Yes, that means indirect function calls, but that's OK since we know it will all be compiled with the "no-sse" flag (and friends).
Then fill_xstates_buf() just becomes a loop over an xstate_tests[] array.
Yes, it's much better to fill the buf in a loop! Thanks! Because it's special for pkru filling, so I will improve it like as follow: " uint32_t xfeature_num; ...
/* Fill test byte value into each tested xstate buffer(offset/size). */ for (xfeature_num = XFEATURE_YMM; xfeature_num < XFEATURE_MAX; xfeature_num++) { if (xstate_tested(xfeature_num)) { if (xfeature_num == XFEATURE_PKRU) { /* Only 0-3 bytes of pkru xstates are allowed to be written. */ memset((unsigned char *)buf + xstate_info.offset[XFEATURE_PKRU], PKRU_TESTBYTE, sizeof(uint32_t)); continue; }
memset((unsigned char *)buf + xstate_info.offset[xfeature_num], XSTATE_TESTBYTE, xstate_info.size[xfeature_num]); }
} "
I'm not sure that's an improvement.
Ok. Except FP and xstate_bv, this loop will fill each tested value into xstate buffer. Thanks!
+/*
- Because xstate like XMM, YMM registers are not preserved across function
- calls, so use inline function with assembly code only to raise signal.
- */
+static inline long long __raise(long long pid_num, long long sig_num) +{
- long long ret, nr = SYS_kill;
- register long long arg1 asm("rdi") = pid_num;
- register long long arg2 asm("rsi") = sig_num;
I really don't like register variables. They're very fragile. Imagine if someone put a printf() right here. Why don't you just do this with inline assembly directives?
It seems that adding printf() to an inline function is not good. I'm not sure if I understand correctly: should I use inline assembly to assign variables to rdi, rsi and then syscall? It it's right, I will think about how to implement it by inline assembly way and fix it.
Yes, do it with inline assembly.
I will move value into rdi rsi register as inline assembly like follow: asm volatile("movq %0, %%rdi" : : "r"(pid_num) : "%rdi");
Thanks! --Pengfei