On 06.04.23 04:25, Sean Christopherson wrote:
On Sat, Mar 25, 2023, Greg KH wrote:
On Sat, Mar 25, 2023 at 12:39:59PM +0100, Mathias Krause wrote:
As this is a huge performance fix for us, we'd like to get it integrated into current stable kernels as well -- not without having the changes get some wider testing, of course, i.e. not before they end up in a non-rc version released by Linus. But I already did a backport to 5.4 to get a feeling for how hard it would be and what impact it has on older kernels.
Using the 'ssdd 10 50000' test I used before, I get promising results there as well. Without the patches it takes 9.31s, while with them we're down to 4.64s. Given that this halves the runtime of a workload in a VM, I hope it qualifies as stable material, as it's a huge performance fix.
Greg, what's your opinion on it? Original series here: https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/
I'll leave the judgement call up to the KVM maintainers, as they are the ones that need to ack any KVM patch added to stable trees.
These are quite risky to backport. E.g. we botched patch 6[*], and my initial fix also had a subtle bug. There have also been quite a few KVM MMU changes since 5.4, so it's possible that an edge case may exist in 5.4 that doesn't exist in mainline.
I totally agree. Getting the changes to work with older kernels needs more work. The MMU role handling was refactored in 5.14, and going further back to 5.4 it differs even more, so backports to earlier kernels definitely need more care.
My plan would be to limit backporting of the whole series to kernels down to 5.15 (maybe 5.10 if it turns out to be doable) and, for kernels before that, to backport it only without patch 6. That would leave out the problematic change but still give us the benefit of dropping the needless MMU unloads when the VM only toggles CR0.WP. This already helps us a lot!
I'm not totally opposed to the idea since our tests _should_ provide solid coverage, e.g. existing tests caught my subtle bug, but I don't think we should backport these without a solid use case, as there is a fairly high risk of breaking random KVM users that wouldn't see any meaningful benefit.
In other words, who cares enough about the performance of running grsecurity kernels in VMs to want these backported, but doesn't have the resources to maintain (or pay someone to maintain) their own host kernel?
The ones who care are, obviously, our customers -- and we, of course! Customers that can run their own infrastructure don't need these backports in upstream LTS kernels, as we will provide them as well. However, customers that rent VMs in the cloud have no control over what runs as the host kernel. It'll likely be some distribution kernel or a tailored version of one, which is likely based on one of the LTS kernels.
Proxmox[1], for example, is a Debian-based virtualization management system. They do provide their own kernels, based on 5.15. However, the official Debian stable kernel is based on 5.10, so it would be nice to get backports down to at least this version.
[1] https://www.proxmox.com/en/proxmox-ve/features
[*] https://lkml.kernel.org/r/20230405002608.418442-1-seanjc%40google.com