On Fri Nov 7, 2025 at 3:54 PM UTC, Brendan Jackman wrote:
On Wed Sep 24, 2025 at 3:10 PM UTC, Patrick Roy wrote:
From: Patrick Roy roypat@amazon.co.uk
[ based on kvm/next ]
Unmapping virtual machine guest memory from the host kernel's direct map is a successful mitigation against Spectre-style transient execution issues: If the kernel page tables do not contain entries pointing to guest memory, then any attempted speculative read through the direct map will necessarily be blocked by the MMU before any observable microarchitectural side-effects happen. This means that Spectre-gadgets and similar cannot be used to target virtual machine memory. Roughly 60% of speculative execution issues fall into this category [1, Table 1].
This patch series extends guest_memfd with the ability to remove its memory from the host kernel's direct map, to be able to attain the above protection for KVM guests running inside guest_memfd.
Additionally, a Firecracker branch with support for these VMs can be found on GitHub [2].
For more details, please refer to the v5 cover letter [v5]. No substantial changes in design have taken place since.
=== Changes Since v6 ===
- Drop patch for passing struct address_space to ->free_folio(), due to possible races with freeing of the address_space. (Hugh)
- Stop using PG_uptodate / gmem preparedness tracking to keep track of direct map state. Instead, use the lowest bit of folio->private. (Mike, David)
- Do direct map removal when establishing mapping of gmem folio instead of at allocation time, due to impossibility of handling direct map removal errors in kvm_gmem_populate(). (Patrick)
- Do TLB flushes after direct map removal, and provide a module parameter to opt out from them, and a new patch to export flush_tlb_kernel_range() to KVM. (Will)
I just got around to trying this out. I checked out this patchset using its base-commit and grabbed the Firecracker branch. Things seem OK until I set the secrets_free flag in the Firecracker config, which IIUC makes it set GUEST_MEMFD_FLAG_NO_DIRECT_MAP.
If I set it, I find the guest doesn't show anything on the console. Running it in a VM and attaching GDB suggests that it's entering the guest repeatedly; it doesn't seem like the vCPU thread is stuck or anything. I'm a bit clueless about how to debug that (so far, whenever I've broken KVM, things always exploded very dramatically).
I discovered that Firecracker has a GDB stub, so I can just attach to that and see what the guest is up to.
The issue is that the pvclock_vcpu_time_info in kvmclock is all zero:
(gdb) backtrace
#0  pvclock_tsc_khz (src=0xffffffff83a03000 <hv_clock_boot>) at ../arch/x86/kernel/pvclock.c:28
#1  0xffffffff8109d137 in kvm_get_tsc_khz () at ../arch/x86/include/asm/kvmclock.h:11
#2  0xffffffff835c1842 in kvm_get_preset_lpj () at ../arch/x86/kernel/kvmclock.c:128
#3  kvmclock_init () at ../arch/x86/kernel/kvmclock.c:332
#4  0xffffffff835c1487 in kvm_init_platform () at ../arch/x86/kernel/kvm.c:982
#5  0xffffffff835a83df in setup_arch (cmdline_p=cmdline_p@entry=0xffffffff82e03f00) at ../arch/x86/kernel/setup.c:916
#6  0xffffffff83595a22 in start_kernel () at ../init/main.c:925
#7  0xffffffff835a7354 in x86_64_start_reservations (
    real_mode_data=real_mode_data@entry=0x36326c0 <error: Cannot access memory at address 0x36326c0>) at ../arch/x86/kernel/head64.c:507
#8  0xffffffff835a7466 in x86_64_start_kernel (real_mode_data=0x36326c0 <error: Cannot access memory at address 0x36326c0>)
    at ../arch/x86/kernel/head64.c:488
#9  0xffffffff8103e7fd in secondary_startup_64 () at ../arch/x86/kernel/head_64.S:413
#10 0x0000000000000000 in ?? ()
(gdb) p *src
$3 = {version = 0, pad0 = 0, tsc_timestamp = 0, system_time = 0, tsc_to_system_mul = 0, tsc_shift = 0 '\000', flags = 0 '\000',
  pad = "\000"}
This causes a divide by zero in kvm_get_tsc_khz().
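For anyone following along, here is a rough userspace sketch of why an all-zero struct trips the division. It is not the kernel code: it assumes the conversion at frame #0 (pvclock_tsc_khz()) is roughly (10^6 << 32) / tsc_to_system_mul followed by a shift by tsc_shift, and every sketch_* name is made up for illustration. The struct just mirrors the fields GDB printed above. The zero check is only there so the sketch itself doesn't fault; I'm not suggesting it as a fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the fields shown in the GDB dump above. */
struct sketch_pvclock_time_info {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
	uint8_t  flags;
	uint8_t  pad[2];
};

/*
 * Assumed shape of the TSC frequency conversion: divide a fixed-point
 * constant by the multiplier, then undo the shift. Returns false instead
 * of dividing when the multiplier is zero, which is the state seen when
 * the host never managed to write the structure.
 */
static bool sketch_tsc_khz(const struct sketch_pvclock_time_info *src,
			   uint64_t *khz)
{
	uint64_t v = 1000000ULL << 32;

	if (src->tsc_to_system_mul == 0)
		return false;	/* the unguarded version would divide by zero here */

	v /= src->tsc_to_system_mul;
	if (src->tsc_shift < 0)
		v <<= -src->tsc_shift;
	else
		v >>= src->tsc_shift;

	*khz = v;
	return true;
}
```

With the all-zero contents from the dump, sketch_tsc_khz() returns false; a multiplier of 0x80000000 with a shift of 0 yields 2000000 kHz under this assumed formula.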
Probably the only reason I didn't see any console output is that I forgot to set earlyprintk, oops...