This is a summary of discussions we had on IRC between kernel and toolchain engineers regarding support for JITs and 52-bit virtual address space (mostly in the context of LuaJIT, but this concerns other JITs too).
The summary is that we need to consider ways of reducing the size of VA for a given process or container on a Linux system.
The high-level problem is that JITs tend to use upper bits of addresses to encode various pieces of data, and that the number of available bits is shrinking due to VA size increasing. With the usual 42-bit VA (which is what most JITs assume) they have 22 bits to encode various performance-critical data. With 48-bit VA (e.g., ThunderX world) things start to get complicated, and JITs need to be non-trivially patched at the source level to continue working with less bits available for their performance-critical storage. With upcoming 52-bit VA things might get dire enough for some JITs to declare such configurations unsupported.
On the other hand, most JITs are not expected to requires terabytes of RAM and huge VA for their applications. Most JIT applications will happily live in 42-bit world with mere 4 terabytes of RAM that it provides. Therefore, what JITs need in the modern world is a way to make mmap() return addresses below a certain threshold, and error out with ENOMEM when "lower" memory is exhausted. This is very similar to ADDR_LIMIT_32BIT personality, but extended to common VA sizes on 64-bit systems: 39-bit, 42-bit, 48-bit, 52-bit, etc.
Since we do not want to penalize the whole system (using an artificially low-size VA), it would be best to have a way to enable VA limit on per-process basis (similar to ADDR_LIMIT_32BIT personality). If that's not possible -- then on per-container / cgroup basis. If that's not possible -- then on system level (similar to vm.mmap_min_addr, but from the other end).
Dear kernel people, what can be done to address the JITs need to reduce effective VA size?
-- Maxim Kuvyrkov www.linaro.org