A 5-level paging capable machine can have memory above 46 bits in the physical address space. This memory is only addressable in the 5-level paging mode: we don't have enough virtual address space to create a direct mapping for such memory in the 4-level paging mode.

Currently, we fail boot completely: NULL pointer dereference in subsection_map_init().

Skip creating a memblock for such memory instead and notify the user that some memory is not addressable.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: stable@vger.kernel.org # v4.14
---
Tested with a hacked QEMU: https://gist.github.com/kiryl/d45eb54110944ff95e544972d8bdac1d
---
 arch/x86/kernel/e820.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index c5399e80c59c..d320d37d0f95 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1280,8 +1280,8 @@ void __init e820__memory_setup(void)
 
 void __init e820__memblock_setup(void)
 {
+	u64 size, end, not_addressable = 0;
 	int i;
-	u64 end;
 
 	/*
 	 * The bootstrap memblock region count maximum is 128 entries
@@ -1307,7 +1307,22 @@ void __init e820__memblock_setup(void)
 		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
 			continue;
 
-		memblock_add(entry->addr, entry->size);
+		if (entry->addr >= MAXMEM) {
+			not_addressable += entry->size;
+			continue;
+		}
+
+		end = min_t(u64, end, MAXMEM - 1);
+		size = end - entry->addr;
+		not_addressable += entry->size - size;
+		memblock_add(entry->addr, size);
+	}
+
+	if (not_addressable) {
+		pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
+				not_addressable >> 30);
+		if (!pgtable_l5_enabled())
+			pr_err("Consider enabling 5-level paging\n");
 	}
 
 	/* Throw away partial pages: */
-- 
2.26.2
On Mon, May 11, 2020 at 10:17:21PM +0300, Kirill A. Shutemov wrote:
> A 5-level paging capable machine can have memory above 46 bits in the
> physical address space. This memory is only addressable in the 5-level
> paging mode: we don't have enough virtual address space to create a
> direct mapping for such memory in the 4-level paging mode.
> 
> Currently, we fail boot completely: NULL pointer dereference in
> subsection_map_init().
> 
> Skip creating a memblock for such memory instead and notify the user
> that some memory is not addressable.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Dave Hansen <dave.hansen@intel.com>
> Cc: stable@vger.kernel.org # v4.14
Gentle ping.
It's not urgent, but it's a bug fix. Please consider applying.
> ---
> 
> Tested with a hacked QEMU: https://gist.github.com/kiryl/d45eb54110944ff95e544972d8bdac1d
> 
> [full patch snipped]
On Mon, May 25, 2020 at 07:49:02AM +0300, Kirill A. Shutemov wrote:
> On Mon, May 11, 2020 at 10:17:21PM +0300, Kirill A. Shutemov wrote:
> > [...]
> > +	if (not_addressable) {
> > +		pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
> > +				not_addressable >> 30);
> > +		if (!pgtable_l5_enabled())
> > +			pr_err("Consider enabling 5-level paging\n");
Could this happen at all when l5 is enabled? Does it mean we need kmap() for 64-bit?
On Mon, May 25, 2020 at 05:59:43PM +0300, Mike Rapoport wrote:
> On Mon, May 25, 2020 at 07:49:02AM +0300, Kirill A. Shutemov wrote:
> > > [...]
> > > +	if (not_addressable) {
> > > +		pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
> > > +				not_addressable >> 30);
> > > +		if (!pgtable_l5_enabled())
> > > +			pr_err("Consider enabling 5-level paging\n");
> 
> Could this happen at all when l5 is enabled? Does it mean we need
> kmap() for 64-bit?
It's future-proofing. Who knows what paging modes we would have in the future.
On Mon, May 25, 2020 at 06:08:20PM +0300, Kirill A. Shutemov wrote:
> On Mon, May 25, 2020 at 05:59:43PM +0300, Mike Rapoport wrote:
> > > > [...]
> > > > +	if (not_addressable) {
> > > > +		pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
> > > > +				not_addressable >> 30);
> > > > +		if (!pgtable_l5_enabled())
> > > > +			pr_err("Consider enabling 5-level paging\n");
> > 
> > Could this happen at all when l5 is enabled? Does it mean we need
> > kmap() for 64-bit?
> 
> It's future-proofing. Who knows what paging modes we would have in the
> future.
Then maybe

	pr_err("%lldGB of physical memory is not addressable in %s paging mode\n",
			not_addressable >> 30,
			pgtable_l5_enabled() ? "5-level" : "4-level");
"the paging mode" on its own sounds a bit awkward to me.
On 5/25/20 8:08 AM, Kirill A. Shutemov wrote:
> > > > +	if (not_addressable) {
> > > > +		pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
> > > > +				not_addressable >> 30);
> > > > +		if (!pgtable_l5_enabled())
> > > > +			pr_err("Consider enabling 5-level paging\n");
> > 
> > Could this happen at all when l5 is enabled? Does it mean we need
> > kmap() for 64-bit?
> 
> It's future-proofing. Who knows what paging modes we would have in the
> future.
Future-proofing and firmware-proofing. :)
In any case, are we *really* limited to 52 bits of physical memory with 5-level paging? Previously, we said we were limited to 46 bits, and now we're saying that the limit is 52 with 5-level paging:

	#define MAX_PHYSMEM_BITS	(pgtable_l5_enabled() ? 52 : 46)
The 46 was fine with the 48 bits of address space on 4-level paging systems since we need 1/2 of the address space for userspace, 1/4 for the direct map and 1/4 for the vmalloc-and-friends area. At 46 bits of address space, we fill up the direct map.
The hardware designers know this and never enumerated a MAXPHYADDR from CPUID which was higher than what we could cover with 46 bits. It was nice and convenient that these two separate things matched:

 1. The amount of physical address space addressable in a direct map
    consuming 1/4 of the virtual address space.
 2. The CPU-enumerated MAXPHYADDR which among other things dictates how
    much physical address space is addressable in a PTE.
But, with 5-level paging, things are a little different. The limit on addressable memory imposed by running out of direct-map space actually falls at 55 bits (57-2=55, analogous to the 4-level 48-2=46).
So shouldn't it technically be this:

	#define MAX_PHYSMEM_BITS	(pgtable_l5_enabled() ? 55 : 46)

?
On Tue, May 26, 2020 at 07:27:15AM -0700, Dave Hansen wrote:
> On 5/25/20 8:08 AM, Kirill A. Shutemov wrote:
> > It's future-proofing. Who knows what paging modes we would have in the
> > future.
> 
> Future-proofing and firmware-proofing. :)
> 
> In any case, are we *really* limited to 52 bits of physical memory with
> 5-level paging?
Yes. It's architectural. SDM says "MAXPHYADDR is at most 52" (Vol 3A, 4.1.4).
I guess it could be extended with an opt-in feature and relevant changes to the page table structure. But as of today there's no such thing.
> Previously, we said we were limited to 46 bits, and now we're saying
> that the limit is 52 with 5-level paging:
> 
> 	#define MAX_PHYSMEM_BITS	(pgtable_l5_enabled() ? 52 : 46)
> 
> [...]
> 
> So shouldn't it technically be this:
> 
> 	#define MAX_PHYSMEM_BITS	(pgtable_l5_enabled() ? 55 : 46)
> 
> ?
Bits above 52 are ignored in the page table entries and are accessible to software. Some of them got claimed by HW features (XD-bit, protection keys), but such features require explicit opt-in on the software side.

The kernel could claim bits 53-55 for the physical address, but it doesn't get us anything: if future HW were to provide such a feature, it would require an opt-in anyway. On the other hand, claiming them now means we cannot use them for other purposes as SW bits. I don't see the point.
On 6/2/20 4:18 PM, Kirill A. Shutemov wrote:
> On Tue, May 26, 2020 at 07:27:15AM -0700, Dave Hansen wrote:
> > In any case, are we *really* limited to 52 bits of physical memory with
> > 5-level paging?
> 
> Yes. It's architectural. SDM says "MAXPHYADDR is at most 52" (Vol 3A, 4.1.4).
Right you are.
I'm glad it's in the architecture. Makes all of this a lot easier!
> > So shouldn't it technically be this:
> > 
> > 	#define MAX_PHYSMEM_BITS	(pgtable_l5_enabled() ? 55 : 46)
> > 
> > ?
> 
> Bits above 52 are ignored in the page table entries and are accessible
> to software. Some of them got claimed by HW features (XD-bit, protection
> keys), but such features require explicit opt-in on the software side.
> 
> The kernel could claim bits 53-55 for the physical address, but it
> doesn't get us anything: if future HW were to provide such a feature, it
> would require an opt-in anyway. On the other hand, claiming them now
> means we cannot use them for other purposes as SW bits. I don't see the
> point.
Yep, agreed.