Accessing guest physical addresses at 0xFFFD_0000_0000 and above causes a failure on AMD processors because those addresses are reserved by HyperTransport (this is not documented). Avoid selftests failures by reserving those guest physical addresses.
Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()") Cc: stable@vger.kernel.org Cc: David Matlack dmatlack@google.com Reported-by: Maxim Levitsky mlevitsk@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com --- tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 10a8ed691c66..d995cc9836ee 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -309,6 +309,12 @@ struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm) /* Limit physical addresses to PA-bits. */ vm->max_gfn = ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;
+#ifdef __x86_64__ + /* Avoid reserved HyperTransport region on AMD processors. */ + if (vm->pa_bits == 48) + vm->max_gfn = 0xfffcfffff; +#endif + /* Allocate and setup memory for guest. */ vm->vpages_mapped = sparsebit_alloc(); if (phy_pages != 0)
On Thu, 2021-08-05 at 06:54 -0400, Paolo Bonzini wrote:
Accessing guest physical addresses at 0xFFFD_0000_0000 and above causes a failure on AMD processors because those addresses are reserved by HyperTransport (this is not documented). Avoid selftests failures by reserving those guest physical addresses.
Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()") Cc: stable@vger.kernel.org Cc: David Matlack dmatlack@google.com Reported-by: Maxim Levitsky mlevitsk@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com
tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 10a8ed691c66..d995cc9836ee 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -309,6 +309,12 @@ struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm) /* Limit physical addresses to PA-bits. */ vm->max_gfn = ((1ULL << vm->pa_bits) >> vm->page_shift) - 1; +#ifdef __x86_64__
- /* Avoid reserved HyperTransport region on AMD processors. */
- if (vm->pa_bits == 48)
vm->max_gfn = 0xfffcfffff;
+#endif
- /* Allocate and setup memory for guest. */ vm->vpages_mapped = sparsebit_alloc(); if (phy_pages != 0)
I probably would have restricted this workaround to AMD vendor string, but I don't mind this to be like that as well at least for now.
Reviewed-by: Maxim Levitsky mlevitsk@redhat.com
Best regards, Maxim Levitsky
On 8/5/21 11:54 AM, Paolo Bonzini wrote:
Accessing guest physical addresses at 0xFFFD_0000_0000 and above causes a failure on AMD processors because those addresses are reserved by HyperTransport (this is not documented).
Oh, but it's actually documented in the AMD IOMMU manual [0] (and AMD IOMMU in linux do mark it as a reserved IOVA region i.e. HT_RANGE_START..HT_RANGE_END). And it's usually marked as a reserved type in E820. At least on the machines I've seen.
See manual section '2.1.2 IOMMU Logical Topology':
"Special address controls in Table 3 are interpreted against untranslated guest physical addressess (GPA) that lack a PASID TLP prefix."
Base Address Top Address Use
FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK FD_F910_0000h FD_F91F_FFFFh System Management FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables FD_FB00_0000h FD_FBFF_FFFFh Address Translation FD_FC00_0000h FD_FDFF_FFFFh I/O Space FD_FE00_0000h FD_FFFF_FFFFh Configuration FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages FE_2000_0000h FF_FFFF_FFFFh Reserved
It covers the range starting that address you fixed up ... up to 1Tb, fwiw.
You mark it ~1010G as max gfn so shouldn't be a problem.
[0] https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
Avoid selftests failures by reserving those guest physical addresses.
Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()") Cc: stable@vger.kernel.org Cc: David Matlack dmatlack@google.com Reported-by: Maxim Levitsky mlevitsk@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com
tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 10a8ed691c66..d995cc9836ee 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -309,6 +309,12 @@ struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm) /* Limit physical addresses to PA-bits. */ vm->max_gfn = ((1ULL << vm->pa_bits) >> vm->page_shift) - 1; +#ifdef __x86_64__
- /* Avoid reserved HyperTransport region on AMD processors. */
- if (vm->pa_bits == 48)
vm->max_gfn = 0xfffcfffff;
+#endif
Not sure if it's worth the trouble having a macro with the same name as AMD iommu like:
#define HT_RANGE_START (0xfd00000000ULL) #define MAX_GFN (HT_RANGE_START - 1ULL)
#ifdef __x86_64__ /* Avoid reserved HyperTransport region on AMD processors. */ if (vm->pa_bits == 48) vm->max_gfn = MAX_GFN; #endif
It's a detail, but *perhaps* would help people grepping around it.
Also, not sure if checking against AMD cpuid vendor is worth, considering this is a limitation only on AMD.
/* Allocate and setup memory for guest. */ vm->vpages_mapped = sparsebit_alloc(); if (phy_pages != 0)
On Fri, 2021-08-06 at 11:57 +0100, Joao Martins wrote:
On 8/5/21 11:54 AM, Paolo Bonzini wrote:
Accessing guest physical addresses at 0xFFFD_0000_0000 and above causes a failure on AMD processors because those addresses are reserved by HyperTransport (this is not documented).
Oh, but it's actually documented in the AMD IOMMU manual [0] (and AMD IOMMU in linux do mark it as a reserved IOVA region i.e. HT_RANGE_START..HT_RANGE_END). And it's usually marked as a reserved type in E820. At least on the machines I've seen.
See manual section '2.1.2 IOMMU Logical Topology':
"Special address controls in Table 3 are interpreted against untranslated guest physical addressess (GPA) that lack a PASID TLP prefix."
Base Address Top Address Use
FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK FD_F910_0000h FD_F91F_FFFFh System Management FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables FD_FB00_0000h FD_FBFF_FFFFh Address Translation FD_FC00_0000h FD_FDFF_FFFFh I/O Space FD_FE00_0000h FD_FFFF_FFFFh Configuration FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages FE_2000_0000h FF_FFFF_FFFFh Reserved
It covers the range starting that address you fixed up ... up to 1Tb, fwiw.
You mark it ~1010G as max gfn so shouldn't be a problem.
[0] https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
Avoid selftests failures by reserving those guest physical addresses.
Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()") Cc: stable@vger.kernel.org Cc: David Matlack dmatlack@google.com Reported-by: Maxim Levitsky mlevitsk@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com
tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 10a8ed691c66..d995cc9836ee 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -309,6 +309,12 @@ struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm) /* Limit physical addresses to PA-bits. */ vm->max_gfn = ((1ULL << vm->pa_bits) >> vm->page_shift) - 1; +#ifdef __x86_64__
- /* Avoid reserved HyperTransport region on AMD processors. */
- if (vm->pa_bits == 48)
vm->max_gfn = 0xfffcfffff;
+#endif
Not sure if it's worth the trouble having a macro with the same name as AMD iommu like:
#define HT_RANGE_START (0xfd00000000ULL) #define MAX_GFN (HT_RANGE_START - 1ULL)
#ifdef __x86_64__ /* Avoid reserved HyperTransport region on AMD processors. */ if (vm->pa_bits == 48) vm->max_gfn = MAX_GFN; #endif
I guess now that we know that it is documented, it is worth it, to remove '== 48' check and add check for an AMD cpu, and add reference to this manual.
I am mentioning the 48 bit check because I have seen that AMD just recently posted 5 level NPT support, so I guess CPUs which > 48 bit max physical address are also probably on horison.
And long term solution for this I guess is to add these areas to a blacklist and avoid them.
Best regards, Maxim Levitsky
It's a detail, but *perhaps* would help people grepping around it.
Also, not sure if checking against AMD cpuid vendor is worth, considering this is a limitation only on AMD.
/* Allocate and setup memory for guest. */ vm->vpages_mapped = sparsebit_alloc(); if (phy_pages != 0)
On 06/08/21 12:57, Joao Martins wrote:
Base Address Top Address Use
FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK FD_F910_0000h FD_F91F_FFFFh System Management FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables FD_FB00_0000h FD_FBFF_FFFFh Address Translation FD_FC00_0000h FD_FDFF_FFFFh I/O Space FD_FE00_0000h FD_FFFF_FFFFh Configuration FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages FE_2000_0000h FF_FFFF_FFFFh Reserved
The problems we're seeing are with FFFD_0000_0000h. How does the IOMMU interpret bits 40-47 of the address?
Paolo
On 8/9/21 10:03 AM, Paolo Bonzini wrote:
On 06/08/21 12:57, Joao Martins wrote:
Base Address Top Address Use
FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK FD_F910_0000h FD_F91F_FFFFh System Management FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables FD_FB00_0000h FD_FBFF_FFFFh Address Translation FD_FC00_0000h FD_FDFF_FFFFh I/O Space FD_FE00_0000h FD_FFFF_FFFFh Configuration FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages FE_2000_0000h FF_FFFF_FFFFh Reserved
The problems we're seeing are with FFFD_0000_0000h. How does the IOMMU interpret bits 40-47 of the address?
The IOMMU supposedly makes an invalid device request iommu fault event when touching the reserved interrupt address space. But that presumes performing DMA from/to device with said IOVA=fdXXXXXXXX (doesn't matter if it's guest or baremetal).
Oh wait. You asked fffd******** not 00fd******** which then the spec quote I gave doesn't document what you're aiming for. Ugh, sorry if I mislead you and Maxim. My mind eliminated the starting 0xff when reading the address given the similarity.
But I've seen that address range somewhere. This errata might help [0]? But anyways, it doesn't explain why how this is interpreted, just that it is 'reserved'. It doesn't look to be HyperTransport address range either as that would be the 1010..1023G range (close to 1T) as documented by the IOMMU manual, not the 256T boundary (fffd********). It's interesting that fn8000_000A EDX[28] is part of the reserved bits from that CPUID leaf.
[0] https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf
1286 Spurious #GP May Occur When Hypervisor Running on Another Hypervisor
Description
The processor may incorrectly generate a #GP fault if a hypervisor running on a hypervisor attempts to access the following secure memory areas:
• The reserved memory address region starting at FFFD_0000_0000h and extending up to FFFF_FFFF_FFFFh. • ASEG and TSEG memory regions for SMM (System Management Mode) • MMIO APIC Space
Potential Effect on System
Software running a hypervisor on a hypervisor may encounter an unexpected #GP fault.
Suggested Workaround
If CPUID bit fn8000_000A EDX[28] = 0b, then: • Hypervisor software can trap #GP faults that potentially have this issue and ignore #GP faults that were erroneously generated. If CPUID bit fn8000_000A EDX[28] = 1b, then the issue has been fixed and no workaround is necessary. Fix Planned No fix planned
On 09/08/21 12:00, Joao Martins wrote:
[0]https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf
1286 Spurious #GP May Occur When Hypervisor Running on Another Hypervisor
Description
The processor may incorrectly generate a #GP fault if a hypervisor running on a hypervisor attempts to access the following secure memory areas:
• The reserved memory address region starting at FFFD_0000_0000h and extending up to FFFF_FFFF_FFFFh. • ASEG and TSEG memory regions for SMM (System Management Mode) • MMIO APIC Space
This errata took a few months to debug so we're quite familiar with it :) but I only knew about the ASEG/TSEG/APIC cases.
So this HyperTransport region is not related to this issue, but the errata does point out that FFFD_0000_0000h and upwards is special in guests.
The Xen folks also had to deal with it only a couple months ago (https://yhbt.net/lore/all/1eb16baa-6b1b-3b18-c712-4459bd83e1aa@citrix.com/):
From "Open-Source Register Reference for AMD Family 17h Processors (PUB)": https://developer.amd.com/wp-content/resources/56255_3_03.PDF
"The processor defines a reserved memory address region starting at FFFD_0000_0000h and extending up to FFFF_FFFF_FFFFh."
It's still doesn't say that it's at the top of physical address space although I understand that's how it's now implemented. The official document doesn't confirm it will move along with physical address space extension.
[...]
1) On parts with <40 bits, its fully hidden from software 2) Before Fam17h, it was always 12G just below 1T, even if there was more RAM above this location 3) On Fam17h and later, it is variable based on SME, and is either just below 2^48 (no encryption) or 2^43 (encryption)
It's interesting that fn8000_000A EDX[28] is part of the reserved bits from that CPUID leaf.
It's only been defined after AMD deemed that the errata was not fixable in current generation processors); it's X86_FEATURE_SVME_ADDR_CHK now.
I'll update the patch based on the findings from the Xen team.
Paolo
On Mon, Aug 09, 2021, Paolo Bonzini wrote:
So this HyperTransport region is not related to this issue, but the errata does point out that FFFD_0000_0000h and upwards is special in guests.
The Xen folks also had to deal with it only a couple months ago (https://yhbt.net/lore/all/1eb16baa-6b1b-3b18-c712-4459bd83e1aa@citrix.com/):
From "Open-Source Register Reference for AMD Family 17h Processors (PUB)": https://developer.amd.com/wp-content/resources/56255_3_03.PDF
"The processor defines a reserved memory address region starting at FFFD_0000_0000h and extending up to FFFF_FFFF_FFFFh."
It's still doesn't say that it's at the top of physical address space although I understand that's how it's now implemented. The official document doesn't confirm it will move along with physical address space extension.
[...]
- On parts with <40 bits, its fully hidden from software
- Before Fam17h, it was always 12G just below 1T, even if there was
more RAM above this location 3) On Fam17h and later, it is variable based on SME, and is either just below 2^48 (no encryption) or 2^43 (encryption)
It's interesting that fn8000_000A EDX[28] is part of the reserved bits from that CPUID leaf.
It's only been defined after AMD deemed that the errata was not fixable in current generation processors); it's X86_FEATURE_SVME_ADDR_CHK now.
I'll update the patch based on the findings from the Xen team.
So, about that update... :-)
linux-stable-mirror@lists.linaro.org