On riscv, mmap currently returns an address from the largest address space that can fit entirely inside of the hint address. This makes it such that the hint address is almost never returned. This patch raises the mappable area up to and including the hint address. This allows mmap to often return the hint address, which allows a performance improvement over searching for a valid address as well as making the behavior more similar to other architectures.
Note that a previous patch introduced stronger semantics compared to other architectures for riscv mmap. On riscv, mmap will not use bits in the upper bits of the virtual address depending on the hint address. On other architectures, a random address is returned in the address space requested. On all architectures the hint address will be returned if it is available. This allows riscv applications to configure how many bits in the virtual address should be left empty. This has the two benefits of being able to request address spaces that are smaller than the default and doesn't require the application to know the page table layout of riscv.
Signed-off-by: Charlie Jenkins charlie@rivosinc.com --- Changes in v2: - Add back forgotten "mmap_end = STACK_TOP_MAX" - Link to v1: https://lore.kernel.org/r/20240129-use_mmap_hint_address-v1-0-4c74da813ba1@r...
--- Charlie Jenkins (3): riscv: mm: Use hint address in mmap if available selftests: riscv: Generalize mm selftests docs: riscv: Define behavior of mmap
Documentation/arch/riscv/vm-layout.rst | 16 ++-- arch/riscv/include/asm/processor.h | 22 +++--- tools/testing/selftests/riscv/mm/mmap_bottomup.c | 20 +---- tools/testing/selftests/riscv/mm/mmap_default.c | 20 +---- tools/testing/selftests/riscv/mm/mmap_test.h | 93 +++++++++++++----------- 5 files changed, 67 insertions(+), 104 deletions(-) --- base-commit: 556e2d17cae620d549c5474b1ece053430cd50bc change-id: 20240119-use_mmap_hint_address-f9f4b1b6f5f1
On riscv it is guaranteed that the address returned by mmap is less than the hint address. Allow mmap to return an address all the way up to addr, if provided, rather than just up to the lower address space.
This provides a performance benefit as well, allowing mmap to exit after checking that the address is in range rather than searching for a valid address.
It is possible to provide an address that uses at most the same number of bits, however it is significantly more computationally expensive to provide that number rather than setting the max to be the hint address. There is the instruction clz/clzw in Zbb that returns the highest set bit which could be used to performantly implement this, but it would still be slower than the current implementation. At worst case, half of the address would not be able to be allocated when a hint address is provided.
Signed-off-by: Charlie Jenkins charlie@rivosinc.com --- arch/riscv/include/asm/processor.h | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h index f19f861cda54..5d966ae81a58 100644 --- a/arch/riscv/include/asm/processor.h +++ b/arch/riscv/include/asm/processor.h @@ -22,14 +22,12 @@ ({ \ unsigned long mmap_end; \ typeof(addr) _addr = (addr); \ - if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) && is_compat_task())) \ - mmap_end = STACK_TOP_MAX; \ - else if ((_addr) >= VA_USER_SV57) \ - mmap_end = STACK_TOP_MAX; \ - else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \ - mmap_end = VA_USER_SV48; \ + if ((_addr) == 0 || \ + (IS_ENABLED(CONFIG_COMPAT) && is_compat_task()) || \ + ((_addr + len) > BIT(VA_BITS - 1))) \ + mmap_end = STACK_TOP_MAX \ else \ - mmap_end = VA_USER_SV39; \ + mmap_end = (_addr + len); \ mmap_end; \ })
@@ -39,14 +37,12 @@ typeof(addr) _addr = (addr); \ typeof(base) _base = (base); \ unsigned long rnd_gap = DEFAULT_MAP_WINDOW - (_base); \ - if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) && is_compat_task())) \ + if ((_addr) == 0 || \ + (IS_ENABLED(CONFIG_COMPAT) && is_compat_task()) || \ + ((_addr + len) > BIT(VA_BITS - 1))) \ mmap_base = (_base); \ - else if (((_addr) >= VA_USER_SV57) && (VA_BITS >= VA_BITS_SV57)) \ - mmap_base = VA_USER_SV57 - rnd_gap; \ - else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \ - mmap_base = VA_USER_SV48 - rnd_gap; \ else \ - mmap_base = VA_USER_SV39 - rnd_gap; \ + mmap_base = (_addr + len) - rnd_gap; \ mmap_base; \ })
On Tue, 2024-01-30 at 11:04 -0800, Charlie Jenkins wrote:
On riscv it is guaranteed that the address returned by mmap is less than the hint address. Allow mmap to return an address all the way up to addr, if provided, rather than just up to the lower address space.
This provides a performance benefit as well, allowing mmap to exit after checking that the address is in range rather than searching for a valid address.
It is possible to provide an address that uses at most the same number of bits, however it is significantly more computationally expensive to provide that number rather than setting the max to be the hint address. There is the instruction clz/clzw in Zbb that returns the highest set bit which could be used to performantly implement this, but it would still be slower than the current implementation. At worst case, half of the address would not be able to be allocated when a hint address is provided.
Signed-off-by: Charlie Jenkins charlie@rivosinc.com
arch/riscv/include/asm/processor.h | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h index f19f861cda54..5d966ae81a58 100644 --- a/arch/riscv/include/asm/processor.h +++ b/arch/riscv/include/asm/processor.h @@ -22,14 +22,12 @@ ({ \ unsigned long mmap_end; \ typeof(addr) _addr = (addr); \
- if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) &&
is_compat_task())) \
mmap_end = STACK_TOP_MAX; \
- else if ((_addr) >= VA_USER_SV57) \
mmap_end = STACK_TOP_MAX; \
- else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >=
VA_BITS_SV48)) \
mmap_end = VA_USER_SV48; \
- if ((_addr) == 0 || \
- (IS_ENABLED(CONFIG_COMPAT) && is_compat_task()) || \
- ((_addr + len) > BIT(VA_BITS -
1))) \
mmap_end = STACK_TOP_MAX \
else \
mmap_end = VA_USER_SV39; \
mmap_end = (_addr + len); \
mmap_end; \ }) @@ -39,14 +37,12 @@ typeof(addr) _addr = (addr); \ typeof(base) _base = (base); \ unsigned long rnd_gap = DEFAULT_MAP_WINDOW - (_base); \
- if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) &&
is_compat_task())) \
- if ((_addr) == 0 || \
- (IS_ENABLED(CONFIG_COMPAT) && is_compat_task()) || \
- ((_addr + len) > BIT(VA_BITS -
1))) \ mmap_base = (_base); \
- else if (((_addr) >= VA_USER_SV57) && (VA_BITS >=
VA_BITS_SV57)) \
mmap_base = VA_USER_SV57 - rnd_gap; \
- else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >=
VA_BITS_SV48)) \
mmap_base = VA_USER_SV48 - rnd_gap; \
else \
mmap_base = VA_USER_SV39 - rnd_gap; \
mmap_base = (_addr + len) - rnd_gap; \
Please mind that rnd_gap can be non-zero, in this case, the map will fail. It will be better to let mmap_base = min((_addr + len), (base) + TASK_SIZE - DEFAULT_MAP_WINDOW) .
mmap_base; \ })
On Wed, Jan 31, 2024 at 03:15:16AM +0800, Yangyu Chen wrote:
On Tue, 2024-01-30 at 11:04 -0800, Charlie Jenkins wrote:
On riscv it is guaranteed that the address returned by mmap is less than the hint address. Allow mmap to return an address all the way up to addr, if provided, rather than just up to the lower address space.
This provides a performance benefit as well, allowing mmap to exit after checking that the address is in range rather than searching for a valid address.
It is possible to provide an address that uses at most the same number of bits, however it is significantly more computationally expensive to provide that number rather than setting the max to be the hint address. There is the instruction clz/clzw in Zbb that returns the highest set bit which could be used to performantly implement this, but it would still be slower than the current implementation. At worst case, half of the address would not be able to be allocated when a hint address is provided.
Signed-off-by: Charlie Jenkins charlie@rivosinc.com
arch/riscv/include/asm/processor.h | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h index f19f861cda54..5d966ae81a58 100644 --- a/arch/riscv/include/asm/processor.h +++ b/arch/riscv/include/asm/processor.h @@ -22,14 +22,12 @@ ({ \ unsigned long mmap_end; \ typeof(addr) _addr = (addr); \
- if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) &&
is_compat_task())) \
mmap_end = STACK_TOP_MAX; \
- else if ((_addr) >= VA_USER_SV57) \
mmap_end = STACK_TOP_MAX; \
- else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >=
VA_BITS_SV48)) \
mmap_end = VA_USER_SV48; \
- if ((_addr) == 0 || \
- (IS_ENABLED(CONFIG_COMPAT) && is_compat_task()) || \
- ((_addr + len) > BIT(VA_BITS -
1))) \
mmap_end = STACK_TOP_MAX \
else \
mmap_end = VA_USER_SV39; \
mmap_end = (_addr + len); \
mmap_end; \ }) @@ -39,14 +37,12 @@ typeof(addr) _addr = (addr); \ typeof(base) _base = (base); \ unsigned long rnd_gap = DEFAULT_MAP_WINDOW - (_base); \
- if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) &&
is_compat_task())) \
- if ((_addr) == 0 || \
- (IS_ENABLED(CONFIG_COMPAT) && is_compat_task()) || \
- ((_addr + len) > BIT(VA_BITS -
1))) \ mmap_base = (_base); \
- else if (((_addr) >= VA_USER_SV57) && (VA_BITS >=
VA_BITS_SV57)) \
mmap_base = VA_USER_SV57 - rnd_gap; \
- else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >=
VA_BITS_SV48)) \
mmap_base = VA_USER_SV48 - rnd_gap; \
else \
mmap_base = VA_USER_SV39 - rnd_gap; \
mmap_base = (_addr + len) - rnd_gap; \
Please mind that rnd_gap can be non-zero, in this case, the map will fail. It will be better to let mmap_base = min((_addr + len), (base) + TASK_SIZE - DEFAULT_MAP_WINDOW) .
Why would an rnd_gap that is non-zero cause mmap to fail? mmap will fail if rnd_gap is greater than (_addr + len). That is expected and should be interpreted as no address was available in the requested address space.
mmap_base is used if the hint address is not available.
I do see that I fumbled the test cases in this patch so I will fix that in a v3.
- Charlie
mmap_base; \ })
The behavior of mmap on riscv is defined to not provide an address that uses more bits than the hint address, if provided. Make the tests reflect that.
Signed-off-by: Charlie Jenkins charlie@rivosinc.com --- tools/testing/selftests/riscv/mm/mmap_bottomup.c | 20 +---- tools/testing/selftests/riscv/mm/mmap_default.c | 20 +---- tools/testing/selftests/riscv/mm/mmap_test.h | 93 +++++++++++++----------- 3 files changed, 53 insertions(+), 80 deletions(-)
diff --git a/tools/testing/selftests/riscv/mm/mmap_bottomup.c b/tools/testing/selftests/riscv/mm/mmap_bottomup.c index 1757d19ca89b..bad8e854263d 100644 --- a/tools/testing/selftests/riscv/mm/mmap_bottomup.c +++ b/tools/testing/selftests/riscv/mm/mmap_bottomup.c @@ -8,27 +8,9 @@ TEST(infinite_rlimit) { // Only works on 64 bit #if __riscv_xlen == 64 - struct addresses mmap_addresses; - EXPECT_EQ(BOTTOM_UP, memory_layout());
- do_mmaps(&mmap_addresses); - - EXPECT_NE(MAP_FAILED, mmap_addresses.no_hint); - EXPECT_NE(MAP_FAILED, mmap_addresses.on_37_addr); - EXPECT_NE(MAP_FAILED, mmap_addresses.on_38_addr); - EXPECT_NE(MAP_FAILED, mmap_addresses.on_46_addr); - EXPECT_NE(MAP_FAILED, mmap_addresses.on_47_addr); - EXPECT_NE(MAP_FAILED, mmap_addresses.on_55_addr); - EXPECT_NE(MAP_FAILED, mmap_addresses.on_56_addr); - - EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.no_hint); - EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_37_addr); - EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_38_addr); - EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_46_addr); - EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_47_addr); - EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_55_addr); - EXPECT_GT(1UL << 56, (unsigned long)mmap_addresses.on_56_addr); + TEST_MMAPS; #endif }
diff --git a/tools/testing/selftests/riscv/mm/mmap_default.c b/tools/testing/selftests/riscv/mm/mmap_default.c index c63c60b9397e..a3874778d795 100644 --- a/tools/testing/selftests/riscv/mm/mmap_default.c +++ b/tools/testing/selftests/riscv/mm/mmap_default.c @@ -8,27 +8,9 @@ TEST(default_rlimit) { // Only works on 64 bit #if __riscv_xlen == 64 - struct addresses mmap_addresses; - EXPECT_EQ(TOP_DOWN, memory_layout());
- do_mmaps(&mmap_addresses); - - EXPECT_NE(MAP_FAILED, mmap_addresses.no_hint); - EXPECT_NE(MAP_FAILED, mmap_addresses.on_37_addr); - EXPECT_NE(MAP_FAILED, mmap_addresses.on_38_addr); - EXPECT_NE(MAP_FAILED, mmap_addresses.on_46_addr); - EXPECT_NE(MAP_FAILED, mmap_addresses.on_47_addr); - EXPECT_NE(MAP_FAILED, mmap_addresses.on_55_addr); - EXPECT_NE(MAP_FAILED, mmap_addresses.on_56_addr); - - EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.no_hint); - EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_37_addr); - EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_38_addr); - EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_46_addr); - EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_47_addr); - EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_55_addr); - EXPECT_GT(1UL << 56, (unsigned long)mmap_addresses.on_56_addr); + TEST_MMAPS; #endif }
diff --git a/tools/testing/selftests/riscv/mm/mmap_test.h b/tools/testing/selftests/riscv/mm/mmap_test.h index 9b8434f62f57..93face2b3118 100644 --- a/tools/testing/selftests/riscv/mm/mmap_test.h +++ b/tools/testing/selftests/riscv/mm/mmap_test.h @@ -4,60 +4,69 @@ #include <sys/mman.h> #include <sys/resource.h> #include <stddef.h> +#include <strings.h> +#include "../../kselftest_harness.h"
#define TOP_DOWN 0 #define BOTTOM_UP 1
-struct addresses { - int *no_hint; - int *on_37_addr; - int *on_38_addr; - int *on_46_addr; - int *on_47_addr; - int *on_55_addr; - int *on_56_addr; +uint64_t random_addresses[] = { + 0x19764f0d73b3a9f0, 0x016049584cecef59, 0x3580bdd3562f4acd, + 0x1164219f20b17da0, 0x07d97fcb40ff2373, 0x76ec528921272ee7, + 0x4dd48c38a3de3f70, 0x2e11415055f6997d, 0x14b43334ac476c02, + 0x375a60795aff19f6, 0x47f3051725b8ee1a, 0x4e697cf240494a9f, + 0x456b59b5c2f9e9d1, 0x101724379d63cb96, 0x7fe9ad31619528c1, + 0x2f417247c495c2ea, 0x329a5a5b82943a5e, 0x06d7a9d6adcd3827, + 0x327b0b9ee37f62d5, 0x17c7b1851dfd9b76, 0x006ebb6456ec2cd9, + 0x00836cd14146a134, 0x00e5c4dcde7126db, 0x004c29feadf75753, + 0x00d8b20149ed930c, 0x00d71574c269387a, 0x0006ebe4a82acb7a, + 0x0016135df51f471b, 0x00758bdb55455160, 0x00d0bdd949b13b32, + 0x00ecea01e7c5f54b, 0x00e37b071b9948b1, 0x0011fdd00ff57ab3, + 0x00e407294b52f5ea, 0x00567748c200ed20, 0x000d073084651046, + 0x00ac896f4365463c, 0x00eb0d49a0b26216, 0x0066a2564a982a31, + 0x002e0d20237784ae, 0x0000554ff8a77a76, 0x00006ce07a54c012, + 0x000009570516d799, 0x00000954ca15b84d, 0x0000684f0d453379, + 0x00002ae5816302b5, 0x0000042403fb54bf, 0x00004bad7392bf30, + 0x00003e73bfa4b5e3, 0x00005442c29978e0, 0x00002803f11286b6, + 0x000073875d745fc6, 0x00007cede9cb8240, 0x000027df84cc6a4f, + 0x00006d7e0e74242a, 0x00004afd0b836e02, 0x000047d0e837cd82, + 0x00003b42405efeda, 0x00001531bafa4c95, 0x00007172cae34ac4, + 0x0000002732f06b2b, 0x00000012cbf8fd0b, 0x0000001fcc6af0e8, };
-static inline void do_mmaps(struct addresses *mmap_addresses) -{ - /* - * Place all of the hint addresses on the boundaries of mmap - * sv39, sv48, sv57 - * User addresses end at 1<<38, 1<<47, 1<<56 respectively - */ - void *on_37_bits = (void *)(1UL << 37); - void *on_38_bits = (void *)(1UL << 38); - void *on_46_bits = (void *)(1UL << 46); - void *on_47_bits = (void *)(1UL << 47); - void *on_55_bits = (void *)(1UL << 55); - void *on_56_bits = (void *)(1UL << 56);
- int prot = PROT_READ | PROT_WRITE; - int flags = MAP_PRIVATE | MAP_ANONYMOUS; +#define PROT (PROT_READ | PROT_WRITE) +#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS) + +/* mmap must return a value that doesn't use more bits than the hint address. */ +static inline unsigned long get_max_value(unsigned long input) +{ + unsigned long max_bit = (1UL << (ffsl(input) - 1));
- mmap_addresses->no_hint = - mmap(NULL, 5 * sizeof(int), prot, flags, 0, 0); - mmap_addresses->on_37_addr = - mmap(on_37_bits, 5 * sizeof(int), prot, flags, 0, 0); - mmap_addresses->on_38_addr = - mmap(on_38_bits, 5 * sizeof(int), prot, flags, 0, 0); - mmap_addresses->on_46_addr = - mmap(on_46_bits, 5 * sizeof(int), prot, flags, 0, 0); - mmap_addresses->on_47_addr = - mmap(on_47_bits, 5 * sizeof(int), prot, flags, 0, 0); - mmap_addresses->on_55_addr = - mmap(on_55_bits, 5 * sizeof(int), prot, flags, 0, 0); - mmap_addresses->on_56_addr = - mmap(on_56_bits, 5 * sizeof(int), prot, flags, 0, 0); + return max_bit + (max_bit - 1); }
+#define TEST_MMAPS \ + ({ \ + void *mmap_addr; \ + for (int i = 0; i < ARRAY_SIZE(random_addresses); i++) { \ + mmap_addr = mmap((void *)random_addresses[i], \ + 5 * sizeof(int), PROT, FLAGS, 0, 0); \ + EXPECT_NE(MAP_FAILED, mmap_addr); \ + EXPECT_GE((void *)get_max_value(random_addresses[i]), \ + mmap_addr); \ + mmap_addr = mmap((void *)random_addresses[i], \ + 5 * sizeof(int), PROT, FLAGS, 0, 0); \ + EXPECT_NE(MAP_FAILED, mmap_addr); \ + EXPECT_GE((void *)get_max_value(random_addresses[i]), \ + mmap_addr); \ + } \ + }) + static inline int memory_layout(void) { - int prot = PROT_READ | PROT_WRITE; - int flags = MAP_PRIVATE | MAP_ANONYMOUS; - - void *value1 = mmap(NULL, sizeof(int), prot, flags, 0, 0); - void *value2 = mmap(NULL, sizeof(int), prot, flags, 0, 0); + void *value1 = mmap(NULL, sizeof(int), PROT, FLAGS, 0, 0); + void *value2 = mmap(NULL, sizeof(int), PROT, FLAGS, 0, 0);
return value2 > value1; }
Define mmap on riscv to not provide an address that uses more bits than the hint address, if provided.
Signed-off-by: Charlie Jenkins charlie@rivosinc.com --- Documentation/arch/riscv/vm-layout.rst | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/Documentation/arch/riscv/vm-layout.rst b/Documentation/arch/riscv/vm-layout.rst index 69ff6da1dbf8..e476b4386bd9 100644 --- a/Documentation/arch/riscv/vm-layout.rst +++ b/Documentation/arch/riscv/vm-layout.rst @@ -144,14 +144,8 @@ passing 0 into the hint address parameter of mmap. On CPUs with an address space smaller than sv48, the CPU maximum supported address space will be the default.
Software can "opt-in" to receiving VAs from another VA space by providing -a hint address to mmap. A hint address passed to mmap will cause the largest -address space that fits entirely into the hint to be used, unless there is no -space left in the address space. If there is no space available in the requested -address space, an address in the next smallest available address space will be -returned. - -For example, in order to obtain 48-bit VA space, a hint address greater than -:code:`1 << 47` must be provided. Note that this is 47 due to sv48 userspace -ending at :code:`1 << 47` and the addresses beyond this are reserved for the -kernel. Similarly, to obtain 57-bit VA space addresses, a hint address greater -than or equal to :code:`1 << 56` must be provided. +a hint address to mmap. When a hint address is passed to mmap, the returned +address will never use more bits than the hint address. For example, if a hint +address of `1 << 40` is passed to mmap, a valid returned address will never use +bits 41 through 63. If no mappable addresses are available in that range, mmap +will return `MAP_FAILED`.
Hello:
This series was applied to riscv/linux.git (for-next) by Palmer Dabbelt palmer@rivosinc.com:
On Tue, 30 Jan 2024 11:04:29 -0800 you wrote:
On riscv, mmap currently returns an address from the largest address space that can fit entirely inside of the hint address. This makes it such that the hint address is almost never returned. This patch raises the mappable area up to and including the hint address. This allows mmap to often return the hint address, which allows a performance improvement over searching for a valid address as well as making the behavior more similar to other architectures.
[...]
Here is the summary with links: - [v2,1/3] riscv: mm: Use hint address in mmap if available (no matching commit) - [v2,2/3] selftests: riscv: Generalize mm selftests (no matching commit) - [v2,3/3] docs: riscv: Define behavior of mmap https://git.kernel.org/riscv/c/371a3c2055db
You are awesome, thank you!
linux-kselftest-mirror@lists.linaro.org