Previous patch series [1][2] changed mmap to treat the hint address as the upper bound of the mmap address range. The motivation was that some user space software may assume a 48-bit address space and use the higher bits of pointers to encode information, which may collide with the addresses mmap can return from a larger virtual address space. However, making sv48 the default does not require changing the meaning of the mmap hint address into an upper bound of the mmap address range. That behavior breaks user space software such as Chromium, which gets an ENOMEM error when the hint address + size is not big enough, as described in [3].
Other ISAs with virtual address spaces larger than 48 bits, such as x86, arm64, and powerpc, do not have this special mmap behavior for the hint address. They all simply use a 48-bit / 47-bit virtual address space by default, and if user space software wants a larger virtual address space, it only needs to specify a hint address above 48-bit / 47-bit.
Thus, this patch series changes mmap to use sv48 by default without treating the hint address as the upper bound of the mmap address range. After this series, mmap behaves as it already does on other ISAs with virtual address spaces larger than 48 bits, such as x86, arm64, and powerpc. User space software will no longer need to be rewritten to fit a special mmap behavior that exists only on RISC-V.
Note: Charlie also created another series [4] to completely remove the arch_get_mmap_end and arch_get_mmap_base behavior based on the hint address and size. However, this will cause programs like Go and Java, which need to store information in the higher bits of the pointer, to fail on Sv57 machines.
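To make the collision concrete, here is a minimal sketch of the kind of pointer tagging the cover letter refers to. The 16-bit tag layout and the names tag_pointer, untag_pointer and round_trips are hypothetical, chosen only for illustration; they are not taken from Go, Java, or any other runtime.

```c
#include <stdint.h>

/* Hypothetical scheme: bits 63..48 hold a tag, bits 47..0 hold the address. */
#define ADDR_BITS 48
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Pack a 16-bit tag into the upper bits of an address. */
uint64_t tag_pointer(uint64_t addr, uint16_t tag)
{
	return ((uint64_t)tag << ADDR_BITS) | (addr & ADDR_MASK);
}

/* Recover the address, assuming it never used bit 48 or above. */
uint64_t untag_pointer(uint64_t tagged)
{
	return tagged & ADDR_MASK;
}

/* Returns 1 if the address survives a tag/untag round trip, 0 otherwise. */
int round_trips(uint64_t addr)
{
	return untag_pointer(tag_pointer(addr, 0xabcd)) == addr;
}
```

An address inside the 47-bit default window round-trips cleanly, while an address that only exists with Sv57 (for example one with bit 55 set) is silently truncated, which is the failure mode the default window is meant to avoid.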
Changes in v3:
- Rebase to newest master
- Changes some information in cover letter after patchset [2]
- Use patch [5] to patch selftests
- Link to v2: https://lore.kernel.org/linux-riscv/tencent_B2D0435BC011135736262764B511994F...
Changes in v2:
- correct arch_get_mmap_end and arch_get_mmap_base
- Add description in documentation about mmap behavior on kernel v6.6-6.7.
- Improve commit message and cover letter
- Rebase to newest riscv/for-next branch
- Link to v1: https://lore.kernel.org/linux-riscv/tencent_F3B3B5AB1C9D704763CA423E1A41F8BE...
[1] https://lore.kernel.org/linux-riscv/20230809232218.849726-1-charlie@rivosinc...
[2] https://lore.kernel.org/linux-riscv/20240130-use_mmap_hint_address-v3-0-8a65...
[3] https://lore.kernel.org/linux-riscv/MEYP282MB2312A08FF95D44014AB78411C68D2@M...
[4] https://lore.kernel.org/linux-riscv/20240826-riscv_mmap-v1-0-cd8962afe47f@ri...
[5] https://lore.kernel.org/linux-riscv/20240826-riscv_mmap-v1-2-cd8962afe47f@ri...
Charlie Jenkins (1):
  riscv: selftests: Remove mmap hint address checks

Yangyu Chen (2):
  RISC-V: mm: not use hint addr as upper bound
  Documentation: riscv: correct sv57 kernel behavior

 Documentation/arch/riscv/vm-layout.rst        | 43 ++++++++----
 arch/riscv/include/asm/processor.h            | 20 ++----
 .../selftests/riscv/mm/mmap_bottomup.c        |  2 -
 .../testing/selftests/riscv/mm/mmap_default.c |  2 -
 tools/testing/selftests/riscv/mm/mmap_test.h  | 67 -------------------
 5 files changed, 36 insertions(+), 98 deletions(-)
From: Charlie Jenkins charlie@rivosinc.com
The mmap behavior that restricts the addresses returned by mmap caused unexpected behavior, so get rid of the test cases that check that behavior.
Signed-off-by: Charlie Jenkins charlie@rivosinc.com
Fixes: 73d05262a2ca ("selftests: riscv: Generalize mm selftests")
Signed-off-by: Yangyu Chen cyy@cyyself.name
---
 .../selftests/riscv/mm/mmap_bottomup.c        |  2 -
 .../testing/selftests/riscv/mm/mmap_default.c |  2 -
 tools/testing/selftests/riscv/mm/mmap_test.h  | 67 -------------------
 3 files changed, 71 deletions(-)
diff --git a/tools/testing/selftests/riscv/mm/mmap_bottomup.c b/tools/testing/selftests/riscv/mm/mmap_bottomup.c
index 7f7d3eb8b9c9..f9ccae50349b 100644
--- a/tools/testing/selftests/riscv/mm/mmap_bottomup.c
+++ b/tools/testing/selftests/riscv/mm/mmap_bottomup.c
@@ -7,8 +7,6 @@
 TEST(infinite_rlimit)
 {
 	EXPECT_EQ(BOTTOM_UP, memory_layout());
-
-	TEST_MMAPS;
 }
 
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/riscv/mm/mmap_default.c b/tools/testing/selftests/riscv/mm/mmap_default.c
index 2ba3ec990006..3f53b6ecc326 100644
--- a/tools/testing/selftests/riscv/mm/mmap_default.c
+++ b/tools/testing/selftests/riscv/mm/mmap_default.c
@@ -7,8 +7,6 @@
 TEST(default_rlimit)
 {
 	EXPECT_EQ(TOP_DOWN, memory_layout());
-
-	TEST_MMAPS;
 }
 
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/riscv/mm/mmap_test.h b/tools/testing/selftests/riscv/mm/mmap_test.h
index 3b29ca3bb3d4..75918d15919f 100644
--- a/tools/testing/selftests/riscv/mm/mmap_test.h
+++ b/tools/testing/selftests/riscv/mm/mmap_test.h
@@ -10,76 +10,9 @@
 #define TOP_DOWN 0
 #define BOTTOM_UP 1
 
-#if __riscv_xlen == 64
-uint64_t random_addresses[] = {
-	0x19764f0d73b3a9f0, 0x016049584cecef59, 0x3580bdd3562f4acd,
-	0x1164219f20b17da0, 0x07d97fcb40ff2373, 0x76ec528921272ee7,
-	0x4dd48c38a3de3f70, 0x2e11415055f6997d, 0x14b43334ac476c02,
-	0x375a60795aff19f6, 0x47f3051725b8ee1a, 0x4e697cf240494a9f,
-	0x456b59b5c2f9e9d1, 0x101724379d63cb96, 0x7fe9ad31619528c1,
-	0x2f417247c495c2ea, 0x329a5a5b82943a5e, 0x06d7a9d6adcd3827,
-	0x327b0b9ee37f62d5, 0x17c7b1851dfd9b76, 0x006ebb6456ec2cd9,
-	0x00836cd14146a134, 0x00e5c4dcde7126db, 0x004c29feadf75753,
-	0x00d8b20149ed930c, 0x00d71574c269387a, 0x0006ebe4a82acb7a,
-	0x0016135df51f471b, 0x00758bdb55455160, 0x00d0bdd949b13b32,
-	0x00ecea01e7c5f54b, 0x00e37b071b9948b1, 0x0011fdd00ff57ab3,
-	0x00e407294b52f5ea, 0x00567748c200ed20, 0x000d073084651046,
-	0x00ac896f4365463c, 0x00eb0d49a0b26216, 0x0066a2564a982a31,
-	0x002e0d20237784ae, 0x0000554ff8a77a76, 0x00006ce07a54c012,
-	0x000009570516d799, 0x00000954ca15b84d, 0x0000684f0d453379,
-	0x00002ae5816302b5, 0x0000042403fb54bf, 0x00004bad7392bf30,
-	0x00003e73bfa4b5e3, 0x00005442c29978e0, 0x00002803f11286b6,
-	0x000073875d745fc6, 0x00007cede9cb8240, 0x000027df84cc6a4f,
-	0x00006d7e0e74242a, 0x00004afd0b836e02, 0x000047d0e837cd82,
-	0x00003b42405efeda, 0x00001531bafa4c95, 0x00007172cae34ac4,
-};
-#else
-uint32_t random_addresses[] = {
-	0x8dc302e0, 0x929ab1e0, 0xb47683ba, 0xea519c73, 0xa19f1c90, 0xc49ba213,
-	0x8f57c625, 0xadfe5137, 0x874d4d95, 0xaa20f09d, 0xcf21ebfc, 0xda7737f1,
-	0xcedf392a, 0x83026c14, 0xccedca52, 0xc6ccf826, 0xe0cd9415, 0x997472ca,
-	0xa21a44c1, 0xe82196f5, 0xa23fd66b, 0xc28d5590, 0xd009cdce, 0xcf0be646,
-	0x8fc8c7ff, 0xe2a85984, 0xa3d3236b, 0x89a0619d, 0xc03db924, 0xb5d4cc1b,
-	0xb96ee04c, 0xd191da48, 0xb432a000, 0xaa2bebbc, 0xa2fcb289, 0xb0cca89b,
-	0xb0c18d6a, 0x88f58deb, 0xa4d42d1c, 0xe4d74e86, 0x99902b09, 0x8f786d31,
-	0xbec5e381, 0x9a727e65, 0xa9a65040, 0xa880d789, 0x8f1b335e, 0xfc821c1e,
-	0x97e34be4, 0xbbef84ed, 0xf447d197, 0xfd7ceee2, 0xe632348d, 0xee4590f4,
-	0x958992a5, 0xd57e05d6, 0xfd240970, 0xc5b0dcff, 0xd96da2c2, 0xa7ae041d,
-};
-#endif
-
-// Only works on 64 bit
-#if __riscv_xlen == 64
 #define PROT (PROT_READ | PROT_WRITE)
 #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS)
 
-/* mmap must return a value that doesn't use more bits than the hint address. */
-static inline unsigned long get_max_value(unsigned long input)
-{
-	unsigned long max_bit = (1UL << (((sizeof(unsigned long) * 8) - 1 -
-					  __builtin_clzl(input))));
-
-	return max_bit + (max_bit - 1);
-}
-
-#define TEST_MMAPS                                                            \
-	({                                                                    \
-		void *mmap_addr;                                              \
-		for (int i = 0; i < ARRAY_SIZE(random_addresses); i++) {      \
-			mmap_addr = mmap((void *)random_addresses[i],         \
-					 5 * sizeof(int), PROT, FLAGS, 0, 0); \
-			EXPECT_NE(MAP_FAILED, mmap_addr);                     \
-			EXPECT_GE((void *)get_max_value(random_addresses[i]), \
-				  mmap_addr);                                 \
-			mmap_addr = mmap((void *)random_addresses[i],         \
-					 5 * sizeof(int), PROT, FLAGS, 0, 0); \
-			EXPECT_NE(MAP_FAILED, mmap_addr);                     \
-			EXPECT_GE((void *)get_max_value(random_addresses[i]), \
-				  mmap_addr);                                 \
-		}                                                             \
-	})
-#endif /* __riscv_xlen == 64 */
-
 static inline int memory_layout(void)
 {
 	void *value1 = mmap(NULL, sizeof(int), PROT, FLAGS, 0, 0);
This patch reverts the change to the meaning of the addr parameter of the mmap syscall made by the previous commit b5b4287accd7 ("riscv: mm: Use hint address in mmap if available") from patch series [1], which treats hint addr + size as the upper bound of the address mmap returns. That behavior results in an ENOMEM error when hint address + size is not big enough.
Thus, this patch aligns the behavior of the mmap syscall with x86, arm64, and powerpc by only limiting the address space to DEFAULT_MAP_WINDOW, which is defined as no larger than 47-bit. If a user program wants to use the sv57 address space, it can call mmap with a hint address larger than BIT(47), as is already documented for x86 and arm64.
[1] https://lore.kernel.org/linux-riscv/20240130-use_mmap_hint_address-v3-0-8a65...
Signed-off-by: Yangyu Chen cyy@cyyself.name
---
 arch/riscv/include/asm/processor.h | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 8702b8721a27..faf3e230ab24 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -13,22 +13,17 @@
 #include <vdso/processor.h>
 
 #include <asm/ptrace.h>
+#include <asm/mman.h>
 
-/*
- * addr is a hint to the maximum userspace address that mmap should provide, so
- * this macro needs to return the largest address space available so that
- * mmap_end < addr, being mmap_end the top of that address space.
- * See Documentation/arch/riscv/vm-layout.rst for more details.
- */
 #define arch_get_mmap_end(addr, len, flags)			\
 ({								\
 	unsigned long mmap_end;					\
 	typeof(addr) _addr = (addr);				\
-	if ((_addr) == 0 || is_compat_task() ||			\
-	    ((_addr + len) > BIT(VA_BITS - 1)))			\
+	if (((_addr + len) > DEFAULT_MAP_WINDOW) ||		\
+	    ((flags) & MAP_FIXED))				\
 		mmap_end = STACK_TOP_MAX;			\
 	else							\
-		mmap_end = (_addr + len);			\
+		mmap_end = DEFAULT_MAP_WINDOW;			\
 	mmap_end;						\
 })
 
@@ -38,11 +33,10 @@
 	typeof(addr) _addr = (addr);				\
 	typeof(base) _base = (base);				\
 	unsigned long rnd_gap = DEFAULT_MAP_WINDOW - (_base);	\
-	if ((_addr) == 0 || is_compat_task() ||			\
-	    ((_addr + len) > BIT(VA_BITS - 1)))			\
-		mmap_base = (_base);				\
+	if ((_addr + len) > DEFAULT_MAP_WINDOW)			\
+		mmap_base = (STACK_TOP_MAX - rnd_gap);		\
 	else							\
-		mmap_base = (_addr + len) - rnd_gap;		\
+		mmap_base = (_base);				\
 	mmap_base;						\
 })
 
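The selection logic in the two macros above can be modelled in user space to see which inputs widen the search range. This is only a sketch: the MODEL_* constants are assumed values for an Sv57 machine (a 47-bit default window and a 56-bit user address space top), and model_mmap_end and model_mmap_base are hypothetical names, not kernel symbols.

```c
#include <stdint.h>
#include <sys/mman.h>	/* MAP_FIXED */

/* Assumed values for an Sv57 machine; the kernel computes its own. */
#define MODEL_DEFAULT_MAP_WINDOW (UINT64_C(1) << 47)
#define MODEL_STACK_TOP_MAX      (UINT64_C(1) << 56)

/* Upper limit of the search: widened by a high hint or by MAP_FIXED. */
uint64_t model_mmap_end(uint64_t addr, uint64_t len, int flags)
{
	if ((addr + len) > MODEL_DEFAULT_MAP_WINDOW || (flags & MAP_FIXED))
		return MODEL_STACK_TOP_MAX;
	return MODEL_DEFAULT_MAP_WINDOW;
}

/* Top-down search base: shifted up by the same random gap for a high hint. */
uint64_t model_mmap_base(uint64_t addr, uint64_t len, uint64_t base)
{
	uint64_t rnd_gap = MODEL_DEFAULT_MAP_WINDOW - base;

	if ((addr + len) > MODEL_DEFAULT_MAP_WINDOW)
		return MODEL_STACK_TOP_MAX - rnd_gap;
	return base;
}
```

With these assumptions, a NULL or low hint keeps the search below 1 << 47, while any hint whose addr + len crosses the window moves both the limit and the base up to the full address space.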
The original documentation described the mmap hint address as an upper bound. Since that behavior has now been removed, the document should be updated. Most of the content is copied from the corresponding x86_64 documentation, with some modifications to match the current kernel's behavior on RISC-V.
Signed-off-by: Yangyu Chen cyy@cyyself.name
---
 Documentation/arch/riscv/vm-layout.rst | 43 +++++++++++++++++---------
 1 file changed, 29 insertions(+), 14 deletions(-)
diff --git a/Documentation/arch/riscv/vm-layout.rst b/Documentation/arch/riscv/vm-layout.rst
index 077b968dcc81..826d0a3f4cbf 100644
--- a/Documentation/arch/riscv/vm-layout.rst
+++ b/Documentation/arch/riscv/vm-layout.rst
@@ -136,17 +136,32 @@ RISC-V Linux Kernel SV57
  __________________|____________|__________________|_________|____________________________________________________________
 
 
-Userspace VAs
--------------------
-To maintain compatibility with software that relies on the VA space with a
-maximum of 48 bits the kernel will, by default, return virtual addresses to
-userspace from a 48-bit range (sv48). This default behavior is achieved by
-passing 0 into the hint address parameter of mmap. On CPUs with an address space
-smaller than sv48, the CPU maximum supported address space will be the default.
-
-Software can "opt-in" to receiving VAs from another VA space by providing
-a hint address to mmap. When a hint address is passed to mmap, the returned
-address will never use more bits than the hint address. For example, if a hint
-address of `1 << 40` is passed to mmap, a valid returned address will never use
-bits 41 through 63. If no mappable addresses are available in that range, mmap
-will return `MAP_FAILED`.
+User-space and large virtual address space
+==========================================
+On RISC-V, Sv57 paging enables 56-bit userspace virtual address space. Not all
+user space is ready to handle wide addresses. It's known that at least some JIT
+compilers use higher bits in pointers to encode their information. It collides
+with valid pointers with Sv57 paging and leads to crashes.
+
+To mitigate this, we are not going to allocate virtual address space above
+47-bit by default.
+
+But userspace can ask for allocation from full address space by specifying hint
+address (with or without MAP_FIXED) above 47-bits, or hint address + size above
+47-bits with MAP_FIXED.
+
+If hint address set above 47-bit, but MAP_FIXED is not specified, we try to look
+for unmapped area by specified address. If it's already occupied, we look for
+unmapped area in *full* address space, rather than from 47-bit window.
+
+A high hint address would only affect the allocation in question, but not any
+future mmap()s.
+
+Specifying high hint address without MAP_FIXED on older kernel or on machine
+without Sv57 paging support is safe. The hint will be treated as the upper bound
+of the address space to search, but this was removed in the future version of
+kernels. On machine without Sv57 paging support, the kernel will fall back to
+allocation from the supported address space.
+
+This approach helps to easily make application's memory allocator aware about
+large address space without manually tracking allocated virtual address space.
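From the application side, the opt-in described in the documentation text above might look like the following sketch. It assumes a 64-bit build and a 47-bit default window; DEFAULT_WINDOW_TOP, below_default_window and map_anon are hypothetical names introduced for illustration, and only mmap and its standard flags come from the system headers.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumed top of the default window described above (47-bit). */
#define DEFAULT_WINDOW_TOP (UINT64_C(1) << 47)

/* Returns 1 if [p, p + len) lies entirely below the 47-bit window. */
int below_default_window(const void *p, size_t len)
{
	return (uint64_t)(uintptr_t)p + len <= DEFAULT_WINDOW_TOP;
}

/*
 * Map len bytes of anonymous memory. When want_high is non-zero, pass a hint
 * at the top of the default window to ask for the full address space;
 * otherwise pass NULL and let the kernel choose. Returns MAP_FAILED on error.
 */
void *map_anon(size_t len, int want_high)
{
	void *hint = want_high ? (void *)(uintptr_t)DEFAULT_WINDOW_TOP : NULL;

	return mmap(hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

A caller that stores tags in high pointer bits would keep using map_anon(len, 0), while an allocator that wants the whole Sv57 range would pass want_high and should still check the result, since on a machine without Sv57 the kernel falls back to the supported address space.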
On Tue, 27 Aug 2024 01:05:15 PDT (-0700), cyy@cyyself.name wrote:
Thus, this patch series changes mmap to use sv48 by default without treating the hint address as the upper bound of the mmap address range. After this series, mmap behaves as it already does on other ISAs with virtual address spaces larger than 48 bits, such as x86, arm64, and powerpc. User space software will no longer need to be rewritten to fit a special mmap behavior that exists only on RISC-V.
So it actually looks like we just screwed up the original version of this: the reason we went with the more complicated address splits was that we actually started with a de facto 39-bit page table uABI (ie 38-bit user VAs), and moving to even 48-bit page tables (ie, 47-bit user VAs) broke users (here's an ASAN bug, for example: https://github.com/google/android-riscv64/issues/64).
Unless I'm missing something, though, the code doesn't actually do that. I remember having that discussion at some point, but I must have forgotten to make sure it worked. As far as I can tell we've just moved to the 48-bit VAs by default, which breaks the whole point of doing the compatibility stuff. Probably a good sign I need to pay more attention to this stuff.
So I'm not really sure what to do here: we can just copy the arm64 behavior and tell the other users that's just how things work, but then we're just pushing around breakages. At a certain point all we can really do with this hint stuff is push around problems, though, and at least if we copy arm64 then most of those problems get reported as bugs for us.
On Tue, Aug 27, 2024 at 09:33:11AM -0700, Palmer Dabbelt wrote:
So I'm not really sure what to do here: we can just copy the arm64 behavior and tell the other users that's just how things work, but then we're just pushing around breakages. At a certain point all we can really do with this hint stuff is push around problems, though, and at least if we copy arm64 then most of those problems get reported as bugs for us.
My perspective as well is that relying on the hint address in any capacity will push around breakages. I messed this up from the start. I believe the only way to have consistent behavior is to mark mmap relying on the hint address as a bug, and only rely on the hint address if a flag defines the behavior.
There is an awkward window of releases that will have this "buggy" behavior. However, since the mmap changes introduced a variety of userspace bugs it seems acceptable to revert to the previous behavior and to create a consistent path forward.
- Charlie
On Aug 28, 2024, at 00:40, Charlie Jenkins charlie@rivosinc.com wrote:
My perspective as well is that relying on the hint address in any capacity will push around breakages. I messed this up from the start. I believe the only way to have consistent behavior is to mark mmap relying on the hint address as a bug, and only rely on the hint address if a flag defines the behavior.
I agree with this. However, since x86 and aarch64 have had this behavior for quite a long time, to prevent breaking userspace, I think we can use this patch and then add a flag like MAP_VA_FULL to enable the full VA space in the future.
Thanks, Yangyu Chen
On Wed, Aug 28, 2024 at 02:04:29AM +0800, Yangyu Chen wrote:
I agree with this. However, since x86 and aarch64 have had this behavior for quite a long time, to prevent breaking userspace, I think we can use this patch and then add a flag like MAP_VA_FULL to enable the full VA space in the future.
Since riscv is not x86 or aarch64, we should be able to make the decisions that are best for riscv, regardless of whether they are identical to how things are implemented on x86 or aarch64.
- Charlie
Thanks, Yangyu Chen
There is an awkward window of releases that will have this "buggy" behavior. However, since the mmap changes introduced a variety of userspace bugs it seems acceptable to revert to the previous behavior and to create a consistent path forward.
- Charlie
On Aug 28, 2024, at 00:33, Palmer Dabbelt <palmer@rivosinc.com> wrote:
On Tue, 27 Aug 2024 01:05:15 PDT (-0700), cyy@cyyself.name wrote:
The previous patch series [1][2] changed mmap to treat the hint address as the upper bound of the mmap address range. The motivation was that some user space software may assume a 48-bit address space and use the higher bits to encode information, which may collide with the larger virtual addresses mmap may return. However, making sv48 the default does not require changing the meaning of the hint address into an upper bound of the mmap address range. That behavior breaks user space software such as Chromium, which gets an ENOMEM error when the hint address + size is not big enough, as described in [3].
Other ISAs with larger than 48-bit virtual address spaces, such as x86, arm64, and powerpc, do not have this special mmap behavior for the hint address. They all just use a 48-bit / 47-bit virtual address space by default, and if user space software wants a larger virtual address space, it only needs to specify a hint address larger than 48-bit / 47-bit.
Thus, this patch series changes mmap to use sv48 by default but does not treat the hint address as the upper bound of the mmap address range. After this patch, mmap behavior will align with the existing behavior on other ISAs with larger than 48-bit virtual address spaces, such as x86, arm64, and powerpc. User space software will no longer need to rewrite code to fit this mmap behavior that is special to RISC-V.
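The difference between the two policies can be sketched as a small userspace model. This is not the kernel's arch_get_mmap_end / arch_get_mmap_base code, only a simplified illustration of the cover letter's description; the function names, the hw_max parameter, the strict ">" comparison against the window, and the default used when no hint is given are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* User VA sizes are one bit less than the page-table width (Sv48 -> 47 bits). */
#define USER_VA_SV48 (UINT64_C(1) << 47)
#define USER_VA_SV57 (UINT64_C(1) << 56)

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Model of the behavior being reverted: a nonzero hint caps the search
 * range at hint + len, so a small hint can leave too little room and the
 * caller sees ENOMEM. hw_max is the largest user VA the hardware supports.
 */
static uint64_t mmap_limit_hint_as_bound(uint64_t hint, uint64_t len,
					 uint64_t hw_max)
{
	assert(len > 0 && len <= hw_max);
	if (hint == 0)
		return min_u64(hw_max, USER_VA_SV48); /* assumed default */
	if (hint > hw_max - len) /* hint + len would pass the hardware limit */
		return hw_max;
	return hint + len;
}

/*
 * Model of the behavior described for x86, arm64 and powerpc: stay inside
 * the default window unless the hint itself points above that window.
 */
static uint64_t mmap_limit_default_window(uint64_t hint, uint64_t hw_max)
{
	uint64_t window = min_u64(hw_max, USER_VA_SV48);

	return hint > window ? hw_max : window;
}
```

With a 1 GiB hint and a 4 KiB length on an Sv57 machine, the first model limits the search to just above 1 GiB, while the second keeps the whole 47-bit window available; only a hint above the window unlocks the Sv57 range.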
So it actually looks like we just screwed up the original version of this: the reason we went with the more complicated address splits was that we actually started with a de facto 39-bit page table uABI (i.e., 38-bit user VAs), and moving to even 48-bit page tables (i.e., 47-bit user VAs) broke users (here's an ASAN bug, for example: https://github.com/google/android-riscv64/issues/64). Unless I'm missing something, though, the code doesn't actually do that. I remember having that discussion at some point, but I must have forgotten to make sure it worked. As far as I can tell we've just moved to the 48-bit VAs by default, which breaks the whole point of doing the compatibility stuff. Probably a good sign I need to pay more attention to this stuff.
It seems the issues have been solved in LLVM D139823 [1] and LLVM D152895 [2].
[1] https://reviews.llvm.org/D139823
[2] https://reviews.llvm.org/D152895
So I'm not really sure what to do here: we can just copy the arm64 behavior and tell the other users that's just how things work, but then we're just pushing around breakages. At a certain point all we can really do with this hint stuff is push around problems, though, and at least if we copy arm64 then most of those problems get reported as bugs for us.
Note: Charlie also created another series [4] to completely remove the arch_get_mmap_end and arch_get_mmap_base behavior based on the hint address and size. However, this will cause programs like Go and Java, which need to store information in the higher bits of the pointer, to fail on Sv57 machines.
Changes in v3:
- Rebase to newest master
- Changes some information in cover letter after patchset [2]
- Use patch [5] to patch selftests
- Link to v2: https://lore.kernel.org/linux-riscv/tencent_B2D0435BC011135736262764B511994F...
Changes in v2:
- correct arch_get_mmap_end and arch_get_mmap_base
- Add description in documentation about mmap behavior on kernel v6.6-6.7.
- Improve commit message and cover letter
- Rebase to newest riscv/for-next branch
- Link to v1: https://lore.kernel.org/linux-riscv/tencent_F3B3B5AB1C9D704763CA423E1A41F8BE...
[1] https://lore.kernel.org/linux-riscv/20230809232218.849726-1-charlie@rivosinc...
[2] https://lore.kernel.org/linux-riscv/20240130-use_mmap_hint_address-v3-0-8a65...
[3] https://lore.kernel.org/linux-riscv/MEYP282MB2312A08FF95D44014AB78411C68D2@M...
[4] https://lore.kernel.org/linux-riscv/20240826-riscv_mmap-v1-0-cd8962afe47f@ri...
[5] https://lore.kernel.org/linux-riscv/20240826-riscv_mmap-v1-2-cd8962afe47f@ri...
Charlie Jenkins (1):
  riscv: selftests: Remove mmap hint address checks

Yangyu Chen (2):
  RISC-V: mm: not use hint addr as upper bound
  Documentation: riscv: correct sv57 kernel behavior

 Documentation/arch/riscv/vm-layout.rst        | 43 ++++++++----
 arch/riscv/include/asm/processor.h            | 20 ++----
 .../selftests/riscv/mm/mmap_bottomup.c        |  2 -
 .../testing/selftests/riscv/mm/mmap_default.c |  2 -
 tools/testing/selftests/riscv/mm/mmap_test.h  | 67 -------------------
 5 files changed, 36 insertions(+), 98 deletions(-)
Hello:
This series was applied to riscv/linux.git (fixes) by Palmer Dabbelt <palmer@rivosinc.com>:
On Tue, 27 Aug 2024 16:05:15 +0800 you wrote:
The previous patch series [1][2] changed mmap to treat the hint address as the upper bound of the mmap address range. The motivation was that some user space software may assume a 48-bit address space and use the higher bits to encode information, which may collide with the larger virtual addresses mmap may return. However, making sv48 the default does not require changing the meaning of the hint address into an upper bound of the mmap address range. That behavior breaks user space software such as Chromium, which gets an ENOMEM error when the hint address + size is not big enough, as described in [3].
[...]
Here is the summary with links:
  - [v3,1/3] riscv: selftests: Remove mmap hint address checks
    https://git.kernel.org/riscv/c/83dae72ac038
  - [v3,2/3] RISC-V: mm: not use hint addr as upper bound
    (no matching commit)
  - [v3,3/3] Documentation: riscv: correct sv57 kernel behavior
    (no matching commit)
You are awesome, thank you!