On 2024-10-10 at 16:21+0000 Patrick Roy wrote:
On Tue, 2024-10-08 at 20:56 +0100, Sean Christopherson wrote:
Another (slightly crazy) approach would be to use protection keys to provide the security properties that you want, while giving KVM (and userspace) a quick-and-easy override to access guest memory.
 1. mmap() guest_memfd into userspace with RW protections
 2. Configure PKRU to make guest_memfd memory inaccessible by default
 3. Swizzle PKRU on-demand when intentionally accessing guest memory
It's essentially the same idea as SMAP+STAC/CLAC, just applied to guest memory instead of to userspace memory.
The benefit of the PKRU approach is that there are no PTE modifications, and thus no TLB flushes, and only the CPU that is accessing guest memory gains temporary access. The big downside is that it would be limited to modern hardware, but that might be acceptable, especially if it simplifies KVM's implementation.
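To make the "swizzle" in step 3 concrete, here is a minimal sketch of the PKRU arithmetic only. On x86, PKRU holds two bits per key: bit 2*k is Access-Disable and bit 2*k+1 is Write-Disable. The helper names (pkru_deny, pkru_allow, pkru_can_read) are hypothetical and exist only for illustration; they compute register values and do not execute WRPKRU or touch any mapping. In userspace the resulting rights would typically be applied with glibc's pkey_set(); how KVM would do this in-kernel is not something this sketch tries to answer.

```c
#include <assert.h>
#include <stdint.h>

/* x86 PKRU layout: two bits per protection key. */
#define PKRU_AD   0x1u   /* Access-Disable bit for a key */
#define PKRU_WD   0x2u   /* Write-Disable bit for a key  */
#define NUM_PKEYS 16

/* Hypothetical helper: return a PKRU value with `key` fully blocked. */
static inline uint32_t pkru_deny(uint32_t pkru, int key)
{
    assert(key >= 0 && key < NUM_PKEYS);
    return pkru | ((PKRU_AD | PKRU_WD) << (2 * key));
}

/* Hypothetical helper: return a PKRU value with `key` readable and writable. */
static inline uint32_t pkru_allow(uint32_t pkru, int key)
{
    assert(key >= 0 && key < NUM_PKEYS);
    return pkru & ~((PKRU_AD | PKRU_WD) << (2 * key));
}

/* Hypothetical helper: would a data read through `key` be permitted? */
static inline int pkru_can_read(uint32_t pkru, int key)
{
    assert(key >= 0 && key < NUM_PKEYS);
    return !((pkru >> (2 * key)) & PKRU_AD);
}
```

The default state would be pkru_deny() for the guest_memfd key; an intentional access brackets the copy with pkru_allow() and then restores the denied value. Because PKRU is a per-CPU (per-thread) register, only the CPU doing the access sees the relaxed value, and no PTE is rewritten.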
Mh, but we only have 16 protection keys, so we cannot give each VM a unique one. And if all guest memory shares the same protection key, then during the on-demand swizzling the CPU would get access to _all_ guest memory on the host, which "feels" scary. What do you think, @Derek?
Yes, I am concerned about this. I don't see a way to use protection keys that would ensure the host kernel cannot be tricked by one guest into speculatively accessing another guest's memory (unless we do a key per VM, which, like you say, severely limits how many guests you can host).
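A rough way to see the trade-off, as a toy model rather than anything resembling real KVM code: count how many guests become readable while PKRU is opened for one guest's key. The function guests_readable() and the guest-to-key arrays are hypothetical. The model assumes the x86 layout in which bit 2*k of PKRU is the Access-Disable bit for key k.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toy model: guest_key[i] is the protection key tagged on
 * guest i's memory. Returns how many guests are readable under `pkru`.
 * Bit 2*k of PKRU is the Access-Disable bit for key k.
 */
static inline size_t guests_readable(uint32_t pkru, const int *guest_key,
                                     size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (!((pkru >> (2 * guest_key[i])) & 0x1u))
            count++;
    }
    return count;
}
```

With every guest on one shared key, opening that key makes all of them readable at once; with a key per VM, only the targeted guest opens up, but the 16-key limit (less in practice, since Linux uses key 0 as the default key) caps the number of guests.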
Does ARM have something equivalent, btw?
Yes - Permission Overlay Extension [1]. Although even the most recent parts don't offer it. I don't see it in Neoverse V3 or Cortex-X4.
Derek
[1] https://lore.kernel.org/all/20240822151113.1479789-1-joey.gouly@arm.com/