Re: [PATCH 0/6] Memory Mapping (VMA) protection using PKU - set 1

18 May 2023


      Hello Dave,
Thanks for your email.
On Thu, May 18, 2023 at 8:38 AM Dave Hansen dave.hansen@intel.com wrote:
...
On 5/17/23 16:48, Jeff Xu wrote:
...
However, there are a few challenges I have not yet worked through.
First, the code needs to track when the first signaling entry occurs
(saving the PKRU register to the thread struct) and when it is last
returned (restoring the PKRU register from the thread struct).
Would tracking signal "depth" work in the face of things like siglongjmp?
Thank you for your question! I am eager to learn more about this area
and I worry about blind spots. I will investigate and get back to you.
...
Taking a step back...
Here's my concern about this whole thing: it's headed down a rabbit hole
which is *highly* specialized both in the apps that will use it and the
attacks it will mitigate.  It probably *requires* turning off a bunch of
syscalls (like io_uring) that folks kinda like in general.
ChromeOS currently disabled io_uring, but it is not required to do so.
io_uring supports the IORING_OP_MADVICE operation, which calls the
do_madvise() function. This means that io_uring will have the same
pkey checks as the madvice() system call.  From that perspective, we
will fully support io_uring for this feature.
...
We're balancing that highly specialized mitigation with a feature that
add new ABI, touches core memory management code and signal handling.
The ABI change uses the existing flag field in pkey_alloc() which is
reserved. The implementation is backward compatible with all existing
pkey usages in both kernel and user space.  Or do you have other
concerns about ABI in mind ?
Yes, you are right about the risk of touching core mm code. To
minimize the risk, I try to control the scope of the change (it is
about 3 lines in mprotect, more in munmap but really just 3 effective
lines from syscall entry). I added new self-tests in mm to make sure
it doesn't regress in api behavior. I run those tests before and after
my kernel code change to make sure the behavior remains the same, I
tested it on 5.15 and 6.1 and 6.4-rc1.  Actually, the testing
discovered a behavior change for mprotect() between 6.1 and 6.4  (not
from this patch, there are refactoring works going on in mm) see this
thread [1]
I hope those steps will help to mitigate the risk.
Agreed on signaling handling is a tough part: what do you think about
the approach (modifying PKRU from saved stack after XSAVE), is there a
blocker ?
...
On the x86 side, PKRU is a painfully special snowflake.  It's exposed in
the "XSAVE" ABIs, but not actually managed *with* XSAVE in the kernel.
This would be making it an even more special snowflake because it would
I admit I'm quite ignorant on XSAVE  to understand the above
statement, and how that is related. Could you explain it to me please
? And what is in your mind that might improve the situation ?
...
need new altstack ABI and handling.
I thought adding protected memory support to signaling handling is an
independent project with its own weight. As Jann Horn points out in
[2]:  "we could prevent the attacker from corrupting the signal
context if we can protect the signal stack with a pkey."   However,
the kernel will send SIGSEGV when the stack is protected by PKEY,  so
there is a benefit to make this work.  (Maybe Jann can share some more
thoughts on the benefits)
And I believe we could do this in a way with minimum ABI change, as below:
- allocate PKEY with a new flag (PKEY_ALTSTACK)
- at sigaltstack() call, detect the memory is PKEY_ALTSTACK protected,
(similar as what mprotect does in this patch) and save it along with
stack address/size.
- at signaling handling, use the saved info to fill in PKRU.
The ABI change is similar to PKEY_ENFORCE_API, and there is no
backward compatibility issue.
Will these mentioned help our case ? What do you think ?
(Stephan has more info on gains,  as far as I know, V8 engineers have
worked/thought really hard to come to a suitable solution to make
chrome browser safer)
[1] https://lore.kernel.org/linux-mm/20230516165754.pocx4kaagn3yyw3r@revolver/T/
[2] https://docs.google.com/document/d/1OlnJbR5TMoaOAJsf4hHOc-FdTmYK2aDUI7d2hfCZ...
Thanks!
Best regards,
-Jeff

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH 0/6] Memory Mapping (VMA) protection using PKU - set 1