Although CONFIG_DEVICE_PRIVATE and hmm_range_fault() and related
functionality was first developed on x86, it also works on arm64.
However, when trying this out on an arm64 system, it turns out that
there is a massive slowdown during the setup and teardown phases.
This slowdown is due to lots of calls to WARN_ON()'s that are checking
for pages that are out of the physical range for the CPU. However,
that's a design feature of device private pages: they are specfically
chosen in order to be outside of the range of the CPU's true physical
pages.
x86 doesn't have this warning. It only checks that pages are properly
aligned. I've shown a comparison below between x86 (which works well)
and arm64 (which has these warnings).
memunmap_pages()
pageunmap_range()
if (pgmap->type == MEMORY_DEVICE_PRIVATE)
__remove_pages()
__remove_section()
sparse_remove_section()
section_deactivate()
depopulate_section_memmap()
/* arch/arm64/mm/mmu.c */
vmemmap_free()
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
...
}
/* arch/x86/mm/init_64.c */
vmemmap_free()
{
VM_BUG_ON(!PAGE_ALIGNED(start));
VM_BUG_ON(!PAGE_ALIGNED(end));
...
}
So, the warning is a false positive for this case. Therefore, skip the
warning if CONFIG_DEVICE_PRIVATE is set.
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
cc: <stable(a)vger.kernel.org>
---
arch/arm64/mm/mmu.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 6f9d8898a025..d5c9b611a8d1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1157,8 +1157,10 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
+/* Device private pages are outside of the CPU's physical page range. */
+#ifndef CONFIG_DEVICE_PRIVATE
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
-
+#endif
if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
return vmemmap_populate_basepages(start, end, node, altmap);
else
@@ -1169,8 +1171,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
void vmemmap_free(unsigned long start, unsigned long end,
struct vmem_altmap *altmap)
{
+/* Device private pages are outside of the CPU's physical page range. */
+#ifndef CONFIG_DEVICE_PRIVATE
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
-
+#endif
unmap_hotplug_range(start, end, true, altmap);
free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
}
--
2.40.0
This patch series backports a few VM preemption_status, steal_time and
PV TLB flushing fixes to 5.10 stable kernel.
Most of the changes backport cleanly except i had to work around a few
becauseof missing support/APIs in 5.10 kernel. I have captured those in
the changelog as well in the individual patches.
Changelog
- Use mark_page_dirty_in_slot api without kvm argument (KVM: x86: Fix
recording of guest steal time / preempted status)
- Avoid checking for xen_msr and SEV-ES conditions (KVM: x86:
do not set st->preempted when going back to user space)
- Use VCPU_STAT macro to expose preemption_reported and
preemption_other fields (KVM: x86: do not report a vCPU as preempted
outside instruction boundaries)
David Woodhouse (2):
KVM: x86: Fix recording of guest steal time / preempted status
KVM: Fix steal time asm constraints
Lai Jiangshan (1):
KVM: x86: Ensure PV TLB flush tracepoint reflects KVM behavior
Paolo Bonzini (5):
KVM: x86: do not set st->preempted when going back to user space
KVM: x86: do not report a vCPU as preempted outside instruction
boundaries
KVM: x86: revalidate steal time cache if MSR value changes
KVM: x86: do not report preemption if the steal time cache is stale
KVM: x86: move guest_pv_has out of user_access section
Sean Christopherson (1):
KVM: x86: Remove obsolete disabling of page faults in
kvm_arch_vcpu_put()
arch/x86/include/asm/kvm_host.h | 5 +-
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/vmx/vmx.c | 1 +
arch/x86/kvm/x86.c | 164 ++++++++++++++++++++++----------
4 files changed, 122 insertions(+), 50 deletions(-)
--
2.37.1
The "serial/8250: Use fifo in 8250 console driver" commit
has revealed an issue of not re-enabling FIFO after resume.
The problematic path is inside uart_resume_port() function.
First, when the console device is re-enabled,
a call to uport->ops->set_termios() internally initializes FIFO
(in serial8250_do_set_termios()), although further code
disables it by issuing ops->startup() (pointer to serial8250_do_startup,
internally calling serial8250_clear_fifos()).
There is even a comment saying "Clear the FIFO buffers and disable them.
(they will be reenabled in set_termios())", but in this scenario,
set_termios() has been called already and FIFO remains disabled.
This patch address the issue by reversing the order - first checks
if tty port is suspended and performs actions accordingly
(e.g. call to ops->startup()), then tries to re-enable
the console device after suspend (and call to uport->ops->set_termios()).
Signed-off-by: Lukasz Majczak <lma(a)semihalf.com>
Cc: <stable(a)vger.kernel.org> # 6.1+
---
drivers/tty/serial/serial_core.c | 54 ++++++++++++++++----------------
1 file changed, 27 insertions(+), 27 deletions(-)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 394a05c09d87..57a153adba3a 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -2406,33 +2406,6 @@ int uart_resume_port(struct uart_driver *drv, struct uart_port *uport)
put_device(tty_dev);
uport->suspended = 0;
- /*
- * Re-enable the console device after suspending.
- */
- if (uart_console(uport)) {
- /*
- * First try to use the console cflag setting.
- */
- memset(&termios, 0, sizeof(struct ktermios));
- termios.c_cflag = uport->cons->cflag;
- termios.c_ispeed = uport->cons->ispeed;
- termios.c_ospeed = uport->cons->ospeed;
-
- /*
- * If that's unset, use the tty termios setting.
- */
- if (port->tty && termios.c_cflag == 0)
- termios = port->tty->termios;
-
- if (console_suspend_enabled)
- uart_change_pm(state, UART_PM_STATE_ON);
- uport->ops->set_termios(uport, &termios, NULL);
- if (!console_suspend_enabled && uport->ops->start_rx)
- uport->ops->start_rx(uport);
- if (console_suspend_enabled)
- console_start(uport->cons);
- }
-
if (tty_port_suspended(port)) {
const struct uart_ops *ops = uport->ops;
int ret;
@@ -2471,6 +2444,33 @@ int uart_resume_port(struct uart_driver *drv, struct uart_port *uport)
tty_port_set_suspended(port, false);
}
+ /*
+ * Re-enable the console device after suspending.
+ */
+ if (uart_console(uport)) {
+ /*
+ * First try to use the console cflag setting.
+ */
+ memset(&termios, 0, sizeof(struct ktermios));
+ termios.c_cflag = uport->cons->cflag;
+ termios.c_ispeed = uport->cons->ispeed;
+ termios.c_ospeed = uport->cons->ospeed;
+
+ /*
+ * If that's unset, use the tty termios setting.
+ */
+ if (port->tty && termios.c_cflag == 0)
+ termios = port->tty->termios;
+
+ if (console_suspend_enabled)
+ uart_change_pm(state, UART_PM_STATE_ON);
+ uport->ops->set_termios(uport, &termios, NULL);
+ if (!console_suspend_enabled && uport->ops->start_rx)
+ uport->ops->start_rx(uport);
+ if (console_suspend_enabled)
+ console_start(uport->cons);
+ }
+
mutex_unlock(&port->mutex);
return 0;
--
2.40.0.577.gac1e443424-goog
On 23.03.23 23:50, Sean Christopherson wrote:
> On Wed, 22 Mar 2023 02:37:25 +0100, Mathias Krause wrote:
>> v3: https://lore.kernel.org/kvm/20230201194604.11135-1-minipli@grsecurity.net/
>>
>> This series is the fourth iteration of resurrecting the missing pieces of
>> Paolo's previous attempt[1] to avoid needless MMU roots unloading.
>>
>> It's incorporating Sean's feedback to v3 and rebased on top of
>> kvm-x86/next, namely commit d8708b80fa0e ("KVM: Change return type of
>> kvm_arch_vm_ioctl() to "int"").
>>
>> [...]
>
> Applied 1 and 5 to kvm-x86 mmu, and the rest to kvm-x86 misc, thanks!
>
> [1/6] KVM: x86/mmu: Avoid indirect call for get_cr3
> https://github.com/kvm-x86/linux/commit/2fdcc1b32418
> [2/6] KVM: x86: Do not unload MMU roots when only toggling CR0.WP with TDP enabled
> https://github.com/kvm-x86/linux/commit/01b31714bd90
> [3/6] KVM: x86: Ignore CR0.WP toggles in non-paging mode
> https://github.com/kvm-x86/linux/commit/e40bcf9f3a18
> [4/6] KVM: x86: Make use of kvm_read_cr*_bits() when testing bits
> https://github.com/kvm-x86/linux/commit/74cdc836919b
> [5/6] KVM: x86/mmu: Fix comment typo
> https://github.com/kvm-x86/linux/commit/50f13998451e
> [6/6] KVM: VMX: Make CR0.WP a guest owned bit
> https://github.com/kvm-x86/linux/commit/fb509f76acc8
Thanks a lot, Sean!
As this is a huge performance fix for us, we'd like to get it integrated
into current stable kernels as well -- not without having the changes
get some wider testing, of course, i.e. not before they end up in a
non-rc version released by Linus. But I already did a backport to 5.4 to
get a feeling how hard it would be and for the impact it has on older
kernels.
Using the 'ssdd 10 50000' test I used before, I get promising results
there as well. Without the patches it takes 9.31s, while with them we're
down to 4.64s. Taking into account that this is the runtime of a
workload in a VM that gets cut in half, I hope this qualifies as stable
material, as it's a huge performance fix.
Greg, what's your opinion on it? Original series here:
https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/
Thanks,
Mathias