On Tue, Dec 08, 2020 at 06:25:13PM +0200, Maxim Levitsky wrote:
On Tue, 2020-12-08 at 17:02 +0100, Thomas Gleixner wrote:
On Tue, Dec 08 2020 at 16:50, Maxim Levitsky wrote:
On Mon, 2020-12-07 at 20:29 -0300, Marcelo Tosatti wrote:
+This ioctl allows to reconstruct the guest's IA32_TSC and TSC_ADJUST value
+from the state obtained in the past by KVM_GET_TSC_STATE on the same vCPU.
+If 'KVM_TSC_STATE_TIMESTAMP_VALID' is set in flags,
+KVM will adjust the guest TSC value by the time that passed since the moment
+CLOCK_REALTIME timestamp was saved in the struct and current value of
+CLOCK_REALTIME, and set the guest's TSC to the new value.
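For illustration only, the adjustment described in the quoted documentation can be sketched as below. This is not the KVM code from the patch: the helper name advance_tsc(), its parameters, and the use of a TSC frequency in kHz are all assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): advance a saved guest TSC
 * value by the wall-clock time that passed between saving and restoring.
 *
 * saved_tsc - guest TSC value captured when the state was saved
 * saved_ns  - CLOCK_REALTIME at save time, in nanoseconds
 * now_ns    - CLOCK_REALTIME at restore time, in nanoseconds
 * tsc_khz   - assumed guest TSC frequency in kHz
 */
static uint64_t advance_tsc(uint64_t saved_tsc, int64_t saved_ns,
                            int64_t now_ns, uint64_t tsc_khz)
{
    uint64_t elapsed_ns, secs, rem_ns, cycles;

    /* If the realtime clock did not move forward, leave the TSC alone. */
    if (now_ns <= saved_ns)
        return saved_tsc;

    elapsed_ns = (uint64_t)(now_ns - saved_ns);

    /* Split into seconds and remainder to keep intermediates small. */
    secs = elapsed_ns / 1000000000ULL;
    rem_ns = elapsed_ns % 1000000000ULL;

    /* cycles = ns * kHz / 10^6; one second is tsc_khz * 1000 cycles. */
    cycles = secs * tsc_khz * 1000ULL + rem_ns * tsc_khz / 1000000ULL;

    return saved_tsc + cycles;
}
```

With an assumed 3 GHz TSC (tsc_khz = 3000000), one second of elapsed CLOCK_REALTIME moves the guest TSC forward by 3,000,000,000 cycles in a single step, which is the kind of jump discussed in the rest of the thread.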
This introduces the wraparound bug in Linux timekeeping, doesn't it?
Which bug?
It does. Could you prepare a reproducer for this bug so I get a better idea about what you are talking about?
I assume you need a very long (like days' worth) jump to trigger this bug, and for such a case we can either work around it in qemu / kernel or fix it in the guest kernel, and I strongly prefer the latter.
Thomas, what do you think about it?
For one I have no idea which bug you are talking about and if the bug is caused by the VMM then why would you "fix" it in the guest kernel?
The "bug" is that if the VMM moves a hardware time counter (TSC or anything else) forward by a large enough value in one go, then the guest kernel will supposedly have an overflow in the time code. I don't consider this to be buggy VMM behavior, but rather a kernel bug that should be fixed (if this bug actually exists).
It exists.
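To make the overflow concern concrete, here is a minimal sketch of one way it could arise. It assumes the guest converts a cycle delta to nanoseconds with a 64-bit multiply followed by a shift, the form used by the kernel's clocksource_cyc2ns(). The thread does not say which code path is affected, and the mult and shift values below are made up for the example.

```c
#include <stdint.h>

/*
 * Cycles-to-nanoseconds conversion in multiply-and-shift form.
 * The product is computed in 64 bits, so it silently wraps once
 * cycles * mult exceeds UINT64_MAX.
 */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}

/* Largest cycle delta whose product with mult still fits in 64 bits. */
static uint64_t max_safe_cycles(uint32_t mult)
{
    return UINT64_MAX / mult;
}
```

With the illustrative values mult = 1 << 22 and shift = 24 (0.25 ns per cycle, i.e. a 4 GHz counter), max_safe_cycles() is 2^42 - 1. A single delta of 2^42 cycles, about 18 minutes at that rate, makes the product wrap to zero, so cyc2ns() reports no elapsed time at all.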
Purely in theory this can even happen on real hardware if, for example, an SMM handler blocks a CPU from running for a long duration, or a hardware debugging interface does, or some other hardware-transparent sleep mechanism kicks in and blocks a CPU from running. (We do handle this gracefully for S3/S4.)
Aside of that I think I made it pretty clear what the right thing to do is.
This is orthogonal to the issue of the 'bug'. Here we are not talking about per-vcpu TSC offsets, something that, as I said, I agree with you it would be very nice to get rid of. We are talking about the fact that the TSC can jump forward by an arbitrarily large value if the migration took an arbitrary amount of time, which (assuming that the bug is real) can crash the guest kernel.
QE reproduced it.
This will happen even if we use a per-VM global TSC offset.
So what do you think?
Best regards, Maxim Levitsky
Thanks,
tglx