On Fri, Sep 06, 2024 at 10:52:34AM +0100, Lorenzo Stoakes wrote:
(Sorry, I'm having issues with my IPv6 setup that duplicated the original email...)
On Fri, Sep 06, 2024 at 09:14:08AM GMT, Arnd Bergmann wrote:
On Fri, Sep 6, 2024, at 08:14, Lorenzo Stoakes wrote:
On Fri, Sep 06, 2024 at 07:17:44AM GMT, Arnd Bergmann wrote:
On Thu, Sep 5, 2024, at 21:15, Charlie Jenkins wrote:
Create a personality flag ADDR_LIMIT_47BIT to support applications that wish to transition from running in environments that support at most 47-bit VAs to environments that support larger VAs. This personality can be set to cause all allocations to be below the 47-bit boundary. Using MAP_FIXED with mmap() will bypass this restriction.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
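For illustration only, here is a minimal userspace sketch of how a process might request the proposed behaviour through personality(2). The flag value is taken from the patch quoted in this thread; it is not assumed to exist in any released header, so it is defined locally under the hypothetical name PROPOSED_ADDR_LIMIT_47BIT. The helper names are likewise made up for this sketch.

```c
#include <assert.h>
#include <sys/personality.h>

/* Value from the proposed patch; an assumption, not a mainline definition. */
#define PROPOSED_ADDR_LIMIT_47BIT 0x10000000UL

/* Pure helper: add the proposed flag to an existing persona value. */
static inline unsigned long persona_with_47bit_limit(unsigned long persona)
{
	/* 0xffffffff is the "query only" sentinel for personality(2). */
	assert(persona != 0xffffffffUL);
	return persona | PROPOSED_ADDR_LIMIT_47BIT;
}

/* Returns 0 on success, -1 if either personality() call fails. */
static inline int request_47bit_limit(void)
{
	int cur = personality(0xffffffff);	/* read current persona */

	if (cur == -1)
		return -1;
	if (personality(persona_with_47bit_limit((unsigned long)cur)) == -1)
		return -1;
	return 0;
}
```

On a kernel without the patch the flag would simply have no effect on mmap(), so a caller that depends on it would still need to check returned addresses.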
I think having an architecture-independent mechanism to limit the size of the 64-bit address space is useful in general, and we've discussed the same thing for arm64 in the past, though we have not actually reached an agreement on the ABI previously.
The thread on the original proposals attests to this being rather a fraught topic, and I think the weight of opinion was more so in favour of opt-in rather than opt-out.
You mean opt-in to using the larger addresses like we do on arm64 and powerpc, while "opt-out" means a limit as Charlie suggested?
I guess I'm not using brilliant terminology here haha!
To clarify - the weight of opinion was for a situation where the address space is limited, except if you set a hint above that (you could call that opt-out or opt-in depending which way you look at it, so yeah ok very unclear sorry!).
It was against the MAP_ flag, and I think a _flexible_ per-process limit is also questionable, as you might end up setting a limit which breaks something else, and this starts getting messy quickly.
To be clear, the ADDR_LIMIT_47BIT suggestion is absolutely a compromise and practical suggestion.
@@ -22,6 +22,7 @@ enum {
 	WHOLE_SECONDS = 0x2000000,
 	STICKY_TIMEOUTS = 0x4000000,
 	ADDR_LIMIT_3GB = 0x8000000,
+	ADDR_LIMIT_47BIT = 0x10000000,
 };
I'm a bit worried about having this done specifically in the personality flag bits, as they are rather limited. We obviously don't want to add many more such flags when there could be a way to just set the default limit.
Since I'm the one who suggested it, I feel I should offer some kind of vague defence here :)
We shouldn't let perfect be the enemy of the good. This is a relatively straightforward means of achieving the aim (assuming your concern about arch_get_mmap_end() below isn't a blocker) which has the least impact on existing code.
Of course we can end up in absurdities where we start doing ADDR_LIMIT_xxBIT... but again - it's simple, shouldn't represent an egregious maintenance burden and is entirely opt-in so has things going for it.
I'm more confused now, I think most importantly we should try to handle this consistently across all architectures. The proposed implementation seems to completely block addresses above BIT(47) even for applications that opt in by calling mmap(BIT(47), ...), which seems to break the existing applications.
Hm, I thought the commit message suggested the hint overrides it still?
The intent is to optionally be able to run a process that keeps higher bits free for tagging and to be sure no memory mapping in the process will clobber these (correct me if I'm wrong Charlie! :)
So you really wouldn't want this if you are using tagged pointers, you'd want to be sure literally nothing touches the higher bits.
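As a rough sketch of why such applications care: if every mapping is guaranteed to sit below the 47-bit boundary, bits 47..63 of a pointer are always zero and can carry a tag. The helper names and the choice to use all 17 upper bits are assumptions made for this example, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define VA_BITS   47
#define ADDR_MASK ((UINT64_C(1) << VA_BITS) - 1)

/* Store a tag in the bits above VA_BITS; addr must fit in 47 bits. */
static inline uint64_t tag_addr(uint64_t addr, uint32_t tag)
{
	assert((addr & ~ADDR_MASK) == 0);		/* upper bits are free */
	assert(tag < (UINT32_C(1) << (64 - VA_BITS)));	/* tag fits in 17 bits */
	return addr | ((uint64_t)tag << VA_BITS);
}

/* Recover the original address by masking off the tag bits. */
static inline uint64_t untag_addr(uint64_t tagged)
{
	return tagged & ADDR_MASK;
}

/* Extract the tag stored by tag_addr(). */
static inline uint32_t addr_tag(uint64_t tagged)
{
	return (uint32_t)(tagged >> VA_BITS);
}
```

If the kernel ever returned a mapping at or above 1 << 47, the first assertion would trip, which is exactly the clobbering this scheme needs to rule out.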
Various architectures handle the hint address differently, but it appears that the only case across any architecture where an address above 47 bits will be returned is if the application had a hint address with a value greater than 47 bits and was using the MAP_FIXED flag. MAP_FIXED bypasses all other checks so I was assuming that it would be logical for MAP_FIXED to bypass this as well. If MAP_FIXED is not set, then the intent is for no hint address to cause a value greater than 47 bits to be returned.
This does have the issue that if MAP_FIXED is used then an address above 47 bits can be returned, but if an application does not want addresses above 47 bits then it shouldn't ask for a fixed address above that range.
If we want this flag for RISC-V and also keep the behavior of defaulting to >BIT(47) addresses for mmap(0, ...) how about changing arch_get_mmap_end() to return the limit based on ADDR_LIMIT_47BIT and then make this default to enabled on arm64 and powerpc but disabled on riscv?
But you wouldn't necessarily want all processes to be so restricted, I think this is what Charlie's trying to avoid :)
On the other hand - I'm not sure there are many processes on any arch that'd want the higher mappings.
So that'd push us again towards RISC-V just limiting to 48 bits and only mapping above this if a hint is provided, like x86-64 does (and as you mentioned via IRC - it seems RISC-V is an outlier in that DEFAULT_MAP_WINDOW == TASK_SIZE).
This would be more consistent vs. other arches.
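The hint-based behaviour discussed here can be modelled in a few lines of userspace C, loosely following the shape of arm64's arch_get_mmap_end() macro. The MODEL_* constants are illustrative values chosen for this sketch, not the real per-architecture definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; real ones are architecture-specific. */
#define MODEL_DEFAULT_MAP_WINDOW (UINT64_C(1) << 47)
#define MODEL_TASK_SIZE          (UINT64_C(1) << 56)

/*
 * Upper bound for the mmap search: stay inside the default window
 * unless the caller's hint is above it.
 */
static inline uint64_t model_mmap_end(uint64_t hint)
{
	assert(MODEL_DEFAULT_MAP_WINDOW <= MODEL_TASK_SIZE);
	return hint > MODEL_DEFAULT_MAP_WINDOW ? MODEL_TASK_SIZE
					       : MODEL_DEFAULT_MAP_WINDOW;
}
```

With DEFAULT_MAP_WINDOW == TASK_SIZE, as described for RISC-V above, both branches return the same value and the hint makes no difference to the upper bound.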
Yes, riscv is an outlier here. The reason I am pushing for something like a flag to restrict the address space, rather than making the restriction the default, is that if applications rely on the upper bits being free, they should explicitly ask the kernel to keep them free rather than assume they are free.
It's also unclear to me how we want this flag to interact with the existing logic in arch_get_mmap_end(), which attempts to limit the default mapping to a 47-bit address space already.
How does ADDR_LIMIT_3GB presently interact with that?
That is x86 specific and only relevant to compat tasks, limiting them to 3 instead of 4 GB. There is also ADDR_LIMIT_32BIT, which on arm32 is always set in practice to allow 32-bit addressing as opposed to ARMv2 style 26-bit addressing (IIRC ARMv3 supported both 26-bit and 32-bit addressing, while ARMv4 through ARMv7 are 32-bit only).
OK, I understand what it's for, I missed that it was an arch-specific bit, urgh.
I'd say this limit should be the min of the arch-specific limit vs. the 48-bit limit. If you have a 36-bit address space obviously it'd be rather unwise to try to provide 48-bit addresses.
In this patch I set the high limit to be the minimum of the provided high limit and 47 bits so I think that should cover this case?
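A minimal userspace model of that clamp, assuming the flag value from the quoted diff; the function name and the way the persona is passed in are made up for this sketch and are not the patch's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Value from the proposed patch; an assumption, not a mainline definition. */
#define PROPOSED_ADDR_LIMIT_47BIT 0x10000000UL
#define LIMIT_47BIT               (UINT64_C(1) << 47)

/*
 * Return min(arch_high_limit, 2^47) when the flag is set,
 * otherwise leave the architecture's limit untouched.
 */
static inline uint64_t model_high_limit(uint64_t arch_high_limit,
					unsigned long persona)
{
	if ((persona & PROPOSED_ADDR_LIMIT_47BIT) &&
	    arch_high_limit > LIMIT_47BIT)
		return LIMIT_47BIT;
	return arch_high_limit;
}
```

A 36-bit architecture limit therefore passes through unchanged, while a larger limit is cut down to the 47-bit boundary only for processes that set the flag.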
- Charlie
Arnd