On Thursday 28 April 2016 16:00:22 Maxim Kuvyrkov wrote:
This is a summary of discussions we had on IRC between kernel and toolchain engineers regarding support for JITs and a 52-bit virtual address space (mostly in the context of LuaJIT, but this concerns other JITs too).
The summary is that we need to consider ways of reducing the VA size for a given process or container on a Linux system.
The high-level problem is that JITs tend to use the upper bits of addresses to encode various pieces of data, and the number of available bits is shrinking as VA size increases. With the usual 42-bit VA (which is what most JITs assume) they have 22 bits (64 - 42) to encode performance-critical data. With a 48-bit VA (e.g., the ThunderX world) things start to get complicated, and JITs need non-trivial source-level patches to keep working with fewer bits available for their performance-critical storage. With the upcoming 52-bit VA, things might get dire enough for some JITs to declare such configurations unsupported.
On the other hand, most JITs are not expected to require terabytes of RAM and a huge VA for their applications. Most JIT applications will happily live in a 42-bit world with the mere 4 terabytes of address space it provides. Therefore, what JITs need in the modern world is a way to make mmap() return addresses below a certain threshold, and fail with ENOMEM when "lower" memory is exhausted. This is very similar to the ADDR_LIMIT_32BIT personality, but extended to common VA sizes on 64-bit systems: 39-bit, 42-bit, 48-bit, 52-bit, etc.
Since we do not want to penalize the whole system (with an artificially small VA), it would be best to have a way to enable a VA limit on a per-process basis (similar to the ADDR_LIMIT_32BIT personality). If that's not possible -- then on a per-container / cgroup basis. If that's not possible -- then at the system level (similar to vm.mmap_min_addr, but from the other end).
Dear kernel people, what can be done to address JITs' need to reduce the effective VA size?
Thanks for the summary, now it all makes much more sense.
One simple approach (simple from the kernel's perspective, not the JIT's) might be to always use MAP_FIXED whenever an allocation is made for memory that needs these special pointers, and then manage the available address space explicitly. Would that work, or do you require everything, including the binary itself, to be below the threshold address?
Regarding which memory sizes are needed, my impression from your explanation is that a single personality flag (e.g. ADDR_LIMIT_42BIT) would be sufficient for the use case, and you don't actually need to tie this to the architecture-provided virtual addressing limits at all. If it's only one such flag, we can probably find a way to fit it into the personality flags, though ironically we are running out of bits in there as well.
Arnd