Hi All, We have been noticing protection key test failures on our systems here. Shggy(Dave Kleikamp) from oracle reported to this problem couple of weeks ago. Here the is the failure.
# ./protection_keys_64 has pkeys: 1 startup pkey_reg: 0000000055555550 test 0 PASSED (iteration 1) test 1 PASSED (iteration 1) test 2 PASSED (iteration 1) test 3 PASSED (iteration 1) test 4 PASSED (iteration 1) test 5 PASSED (iteration 1) test 6 PASSED (iteration 1) test 7 PASSED (iteration 1) test 8 PASSED (iteration 1) test 9 PASSED (iteration 1) test 10 PASSED (iteration 1) test 11 PASSED (iteration 1) test 12 PASSED (iteration 1) test 13 PASSED (iteration 1) protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion `pkey_reg == shadow_pkey_reg' failed. Aborted (core dumped)
The test that is failing is "test_ptrace_of_child".
Sometimes it fails even earlier also. Sometimes(very rarely) it passes when I insert few printfs. Most probably it fails with test_ptrace_of_child.
In the test "test_ptrace_of_child", the parent thread attaches to the child and modifies the data structure associated with protection key. Verifies the access results based on the key permissions. While running the test, the tool finds the key permission changes out of nowhere. The test fails with assert "assert(pkey_reg == shadow_pkey_reg);"
Investigation so far. 1. The test fails on AMD(Milan) systems. Test appears to pass on Intel systems. Tested on old Intel system available here.
2. I was trying to see if the hardware(or firmware) is corrupting the pkru register value. At this point, it does not appear that way. I was able to trace all the MSR writes to the application or kernel. At this point, it does not appear to me as an hardware issue. I see that kernel appears to do save/restore properly during the context switch. This value stays default(value 0x55555554) in most cases unless the application changes the default permissions. Only application that changes here is "protection_keys".
3. I played with the test tool little bit. The behavior changes drastically when I make minor changes. For example, in the following code.
void setup_handlers(void) { signal(SIGCHLD, &sig_chld); setup_sigsegv_handler(); }
Just commenting the first line "signal(SIGCHLD, &sig_chld);" changes the behavior drastically. I see some tests PASS after this change. The first line appear to not do anything except some printing.
I still have not figured out completely what is going on with setup_sigsegv_handler();. It seems very odd modifying some structures in the signal handler. I am not sure about some of the offsets it is trying to modify. I feel it may be messing up something there. I am not sure though. Will have to investigate.
4. I took the traces(x86_fpu) while running test. It appears to show some of the feature headers are not copied during the test. But I could not figure out why it was happening. It appears to show not all the feature headers are copied when the new child is created. When I printed the feature bits, they all appear to be intact. Here is the trace.
protection_keys-17350 [035] 59275.833511: x86_fpu_regs_activated: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834197: x86_fpu_copy_src: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834197: x86_fpu_copy_dst: x86/fpu: 0xffff93d722877800 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207 protection_keys-17351 [040] 59275.834274: x86_fpu_regs_activated: x86/fpu: 0xffff93d722877800 load: 1 xfeatures: 2 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834283: x86_fpu_regs_deactivated: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207 protection_keys-17351 [040] 59275.834289: x86_fpu_regs_deactivated: x86/fpu: 0xffff93d722877800 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834296: x86_fpu_regs_activated: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834298: x86_fpu_regs_activated: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834406: x86_fpu_before_save: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834406: x86_fpu_after_save: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834408: x86_fpu_before_save: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834408: x86_fpu_after_save: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834654: x86_fpu_regs_deactivated: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834654: x86_fpu_dropped: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207 auditd-1834 [036] 59275.834655: x86_fpu_regs_activated: x86/fpu: 0xffff93d713fef800 load: 1 xfeatures: 202 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834665: x86_fpu_regs_deactivated: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207 <...>-17285 [041] 59275.834679: x86_fpu_regs_activated: x86/fpu: 0xffff93d716d0df40 load: 1 xfeatures: 202 xcomp_bv: 8000000000000207 5. I instrumented parent child interactions using a separate standalone application(without the signal handler), it appears to work just fine. I see PRKU register staying intact swiching from parent to child.
My suspicion at this point is towards the selftest tool protection_keys.c. I will keep looking. Any feedback would be much appreciated to debug further.
Shaggy, Feel free to add if I missed something.
Thanks Babu
On 5/25/21 2:37 PM, Babu Moger wrote:
My suspicion at this point is towards the selftest tool protection_keys.c. I will keep looking. Any feedback would be much appreciated to debug further.
The pkey selftest code that pokes at the signal stack is rather hackish. If I had to guess, I'd suspect that PKRU ends up in there in a slightly different place than on Intel CPUs.
One oddity is that xfeatures seems to lose its pkey bit somewhere:
protection_keys-17350 [035] 59275.834197: x86_fpu_copy_src: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834197: x86_fpu_copy_dst: x86/fpu: 0xffff93d722877800 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
The only legitimate way that can happen (on Intel at least) is an XRSTOR that brings PKRU back to the init state. That would destroy all meaningful PKRU state, unless PKRU=0, which it almost never does on Linux.
What values do PKRU and the shadow have when the test fails? Is PKRU 0? Any idea how xfeatures&0x200 got clear?
On 5/25/21 5:18 PM, Dave Hansen wrote:
On 5/25/21 2:37 PM, Babu Moger wrote:
My suspicion at this point is towards the selftest tool protection_keys.c. I will keep looking. Any feedback would be much appreciated to debug further.
The pkey selftest code that pokes at the signal stack is rather hackish. If I had to guess, I'd suspect that PKRU ends up in there in a slightly different place than on Intel CPUs.
One oddity is that xfeatures seems to lose its pkey bit somewhere:
protection_keys-17350 [035] 59275.834197: x86_fpu_copy_src: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834197: x86_fpu_copy_dst: x86/fpu: 0xffff93d722877800 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
The only legitimate way that can happen (on Intel at least) is an XRSTOR that brings PKRU back to the init state. That would destroy all meaningful PKRU state, unless PKRU=0, which it almost never does on Linux.
What values do PKRU and the shadow have when the test fails? Is PKRU 0? Any idea how xfeatures&0x200 got clear?
I did observe that PKRU is 0 right before the failure.
Shouldn't this still be a valid value?
On 5/25/21 3:22 PM, Dave Kleikamp wrote:
On 5/25/21 5:18 PM, Dave Hansen wrote:
What values do PKRU and the shadow have when the test fails? Is PKRU 0? Any idea how xfeatures&0x200 got clear?
I did observe that PKRU is 0 right before the failure.
Shouldn't this still be a valid value?
It's architecturall *valid* in the hardware for sure, but nothing in the tests or the kernel should ever set PKRU=0. The selftest is noticing that it ends up at a value that's entirely unexpected and properly bugging out.
The reason I'm suspecting an XRSTOR is that the kernel always calls
XRSTOR(RFBM=-1)
setting all of the bits in the Requested Feature BitMap (RFBM). If RFBM[X]=1 and XSTATE_BV[i]=0 (XSTATE_BV is 'xfeatures' in the traces), then the state component is set to its initial configuration, which for PKRU is 0.
That's why I'm asking how xfeatures&0x200 got clear.
On 5/25/21 5:18 PM, Dave Hansen wrote:
On 5/25/21 2:37 PM, Babu Moger wrote:
My suspicion at this point is towards the selftest tool protection_keys.c. I will keep looking. Any feedback would be much appreciated to debug further.
The pkey selftest code that pokes at the signal stack is rather hackish. If I had to guess, I'd suspect that PKRU ends up in there in a slightly different place than on Intel CPUs.
You mean the offsets can be different? Not sure how to figure that out. Let me take a look.
One oddity is that xfeatures seems to lose its pkey bit somewhere:
Yes. I noticed that. I did not see that happening on Intel box where test runs successfully.
protection_keys-17350 [035] 59275.834197: x86_fpu_copy_src: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207 protection_keys-17350 [035] 59275.834197: x86_fpu_copy_dst: x86/fpu: 0xffff93d722877800 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
The only legitimate way that can happen (on Intel at least) is an XRSTOR that brings PKRU back to the init state. That would destroy all meaningful PKRU state, unless PKRU=0, which it almost never does on Linux.
What values do PKRU and the shadow have when the test fails? Is PKRU 0?
It goes back to default value 0x55555554. The test is expecting it to be 0. Printed them below.
test_ptrace_of_child()::1346, pkey_reg: 0x0000000055555554 shadow: 0000000000000000 protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion `pkey_reg == shadow_pkey_reg' failed.
Any idea how xfeatures&0x200 got clear?
Printed all the flags while in __switch_to, the header flags and CR4 flags appears to be intact. Dont know how the feature 0x200 got cleared. Let me check if XRSTOR is coming into play here.
On 5/25/21 5:03 PM, Babu Moger wrote:
What values do PKRU and the shadow have when the test fails? Is PKRU 0?
It goes back to default value 0x55555554. The test is expecting it to be 0. Printed them below.
test_ptrace_of_child()::1346, pkey_reg: 0x0000000055555554 shadow: 0000000000000000 protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion `pkey_reg == shadow_pkey_reg' failed.
That's backwards (shadow vs pkru) from what I was expecting.
Can you turn on all the debuging?
Just compile with -DDEBUG_LEVEL=5
On 5/25/21 7:20 PM, Dave Hansen wrote:
On 5/25/21 5:03 PM, Babu Moger wrote:
What values do PKRU and the shadow have when the test fails? Is PKRU 0?
It goes back to default value 0x55555554. The test is expecting it to be 0. Printed them below.
test_ptrace_of_child()::1346, pkey_reg: 0x0000000055555554 shadow: 0000000000000000 protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion `pkey_reg == shadow_pkey_reg' failed.
That's backwards (shadow vs pkru) from what I was expecting.
Can you turn on all the debuging?
Just compile with -DDEBUG_LEVEL=5
Copied the logs at https://pastebin.com/gtQiHg8Q
On 5/26/21 10:25 AM, Babu Moger wrote:
On 5/25/21 7:20 PM, Dave Hansen wrote:
On 5/25/21 5:03 PM, Babu Moger wrote:
What values do PKRU and the shadow have when the test fails? Is PKRU 0?
It goes back to default value 0x55555554. The test is expecting it to be 0. Printed them below.
test_ptrace_of_child()::1346, pkey_reg: 0x0000000055555554 shadow: 0000000000000000 protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion `pkey_reg == shadow_pkey_reg' failed.
That's backwards (shadow vs pkru) from what I was expecting.
Can you turn on all the debuging?
Just compile with -DDEBUG_LEVEL=5
Copied the logs at https://pastebin.com/gtQiHg8Q
I think this failed early due to the debug code. I have a patch that gets around this.
On 5/26/21 8:25 AM, Babu Moger wrote:
On 5/25/21 7:20 PM, Dave Hansen wrote:
On 5/25/21 5:03 PM, Babu Moger wrote:
What values do PKRU and the shadow have when the test fails? Is PKRU 0?
It goes back to default value 0x55555554. The test is expecting it to be 0. Printed them below.
test_ptrace_of_child()::1346, pkey_reg: 0x0000000055555554 shadow: 0000000000000000 protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion `pkey_reg == shadow_pkey_reg' failed.
That's backwards (shadow vs pkru) from what I was expecting.
Can you turn on all the debuging?
Just compile with -DDEBUG_LEVEL=5
Copied the logs at https://pastebin.com/gtQiHg8Q
Well, it's a bit backwards from what I'm expecting. The PKRU=0 value *WAS* legitimate because all of the pkeys got allocated and their disable bits cleared.
I think Andy was close when he was blaming:
static inline void write_pkru(u32 pkru) {
...
pk = get_xsave_addr(¤t->thread.fpu.state.xsave, XFEATURE_PKRU);
...
if (pk) pk->pkru = pkru; __write_pkru(pkru);
}
But that can't be it because PKRU ended up with 0x55555554. Something must have been writing 'init_pkru_value'.
switch_fpu_finish() does that:
static inline void switch_fpu_finish(struct fpu *new_fpu) { u32 pkru_val = init_pkru_value;
...
if (current->mm) { pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU); if (pk) pkru_val = pk->pkru; } __write_pkru(pkru_val);
...
}
If 'new_fpu' had XSTATE_BV[PKRU]=0 then we'd have pk=NULL and 'pkru_val' would still have 'init_pkru_value'. *Then*, we'd have a shadow=0x0 and pkru=0x55555554. It would also only trigger if the hardware has an init tracker that fires when wrpkru(0). Intel doesn't do that. AMD must.
Anyway, I need to think about this a bit more. But, an entirely guaranteed to be 100% untested patch is attached. I'm *NOT* confident this is the right fix.
I don't have much AMD hardware laying around, so testing would be appreciated.
On 5/26/21 11:06 AM, Dave Hansen wrote:
On 5/26/21 8:25 AM, Babu Moger wrote:
On 5/25/21 7:20 PM, Dave Hansen wrote:
On 5/25/21 5:03 PM, Babu Moger wrote:
What values do PKRU and the shadow have when the test fails? Is PKRU 0?
It goes back to default value 0x55555554. The test is expecting it to be 0. Printed them below.
test_ptrace_of_child()::1346, pkey_reg: 0x0000000055555554 shadow: 0000000000000000 protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion `pkey_reg == shadow_pkey_reg' failed.
That's backwards (shadow vs pkru) from what I was expecting.
Can you turn on all the debuging?
Just compile with -DDEBUG_LEVEL=5
Copied the logs at https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpastebin.c...
Well, it's a bit backwards from what I'm expecting. The PKRU=0 value *WAS* legitimate because all of the pkeys got allocated and their disable bits cleared.
I think Andy was close when he was blaming:
static inline void write_pkru(u32 pkru) {
...
pk = get_xsave_addr(¤t->thread.fpu.state.xsave, XFEATURE_PKRU);
...
if (pk) pk->pkru = pkru; __write_pkru(pkru);
}
But that can't be it because PKRU ended up with 0x55555554. Something must have been writing 'init_pkru_value'.
switch_fpu_finish() does that:
Yes, I have noticed switch_fpu_finish writing init_pkru_value sometimes. But, I was not sure why that was happening..
static inline void switch_fpu_finish(struct fpu *new_fpu) { u32 pkru_val = init_pkru_value;
...
if (current->mm) { pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU); if (pk) pkru_val = pk->pkru; } __write_pkru(pkru_val);
...
}
If 'new_fpu' had XSTATE_BV[PKRU]=0 then we'd have pk=NULL and 'pkru_val' would still have 'init_pkru_value'. *Then*, we'd have a shadow=0x0 and pkru=0x55555554. It would also only trigger if the hardware has an init tracker that fires when wrpkru(0). Intel doesn't do that. AMD must.
Ok. I will check with hardware guys here about this behavior.
Anyway, I need to think about this a bit more. But, an entirely guaranteed to be 100% untested patch is attached. I'm *NOT* confident this is the right fix.
I don't have much AMD hardware laying around, so testing would be appreciated.
Yes. Patch fixes problem on AMD. Also tested on Intel box to make sure it does not cause any regression there. It does work fine there as well. Thanks for the patch.
On 5/25/21 2:37 PM, Babu Moger wrote:
Investigation so far.
- The test fails on AMD(Milan) systems. Test appears to pass on Intel
systems. Tested on old Intel system available here.
Ten cents says that you have discovered that AMD returns a synthesized value for XINUSE[PKRU] and that write_pkru() is buggy.
Any volunteers to fix the if (!pk) case in write_pkru() and see if the problem goes away? (Or at least to print something for the if (!pk) case and seee if it gets printed?)
--Andy
On 5/25/21 7:36 PM, Andy Lutomirski wrote:
On 5/25/21 2:37 PM, Babu Moger wrote:
Investigation so far.
- The test fails on AMD(Milan) systems. Test appears to pass on Intel
systems. Tested on old Intel system available here.
Ten cents says that you have discovered that AMD returns a synthesized value for XINUSE[PKRU] and that write_pkru() is buggy.
Any volunteers to fix the if (!pk) case in write_pkru() and see if the problem goes away? (Or at least to print something for the if (!pk) case and seee if it gets printed?)
I inserted this patch to test this path.
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index b1099f2d9800..a0bca917f864 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -151,6 +160,8 @@ static inline void write_pkru(u32 pkru) fpregs_lock(); if (pk) pk->pkru = pkru; + else + printk_once("%s read_pkru 0x%x\n",__FUNCTION__, read_pkru()); __write_pkru(pkru); fpregs_unlock(); }
Yes. The pk value is sometimes NULL and the value in the pkru register is 0 at that time. Here is the dmesg output.
[ 1.700889] write_pkru read_pkru 0x0
On Wed, May 26, 2021, at 2:14 PM, Babu Moger wrote:
On 5/25/21 7:36 PM, Andy Lutomirski wrote:
On 5/25/21 2:37 PM, Babu Moger wrote:
Investigation so far.
- The test fails on AMD(Milan) systems. Test appears to pass on Intel
systems. Tested on old Intel system available here.
Ten cents says that you have discovered that AMD returns a synthesized value for XINUSE[PKRU] and that write_pkru() is buggy.
Any volunteers to fix the if (!pk) case in write_pkru() and see if the problem goes away? (Or at least to print something for the if (!pk) case and seee if it gets printed?)
I inserted this patch to test this path.
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index b1099f2d9800..a0bca917f864 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -151,6 +160,8 @@ static inline void write_pkru(u32 pkru) fpregs_lock(); if (pk) pk->pkru = pkru;
else
printk_once("%s read_pkru 0x%x\n",__FUNCTION__, read_pkru()); __write_pkru(pkru); fpregs_unlock();
}
Yes. The pk value is sometimes NULL and the value in the pkru register is 0 at that time. Here is the dmesg output.
[ 1.700889] write_pkru read_pkru 0x0
Sweet, maybe I get my hypothetical dime back! Both bugs are real.
Dave and I (mostly Dave — I’m supplying the snark) are working on an omnibus patch to fix this whole mess.
Thanks, Andy
linux-kselftest-mirror@lists.linaro.org