This is a summary of discussions we had on IRC between kernel and toolchain engineers regarding support for JITs and 52-bit virtual address space (mostly in the context of LuaJIT, but this concerns other JITs too).
The summary is that we need to consider ways of reducing the size of VA for a given process or container on a Linux system.
The high-level problem is that JITs tend to use the upper bits of addresses to encode various pieces of data, and the number of available bits is shrinking as VA sizes increase. With the usual 42-bit VA (which is what most JITs assume) they have 22 bits in which to encode performance-critical data. With a 48-bit VA (e.g., the ThunderX world) things start to get complicated, and JITs need to be non-trivially patched at the source level to keep working with fewer bits available for their performance-critical storage. With the upcoming 52-bit VA things might get dire enough for some JITs to declare such configurations unsupported.
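For illustration only, a minimal sketch (not taken from any particular JIT) of this kind of pointer packing, assuming a 42-bit VA:

#include <stdint.h>
#include <assert.h>

/* Illustrative only: with a 42-bit user VA, 22 bits are free at the top of
 * a 64-bit word. A 48- or 52-bit VA shrinks TAG_BITS to 16 or 12 and the
 * whole layout has to be reworked. */
#define VA_BITS   42
#define TAG_BITS  (64 - VA_BITS)
#define PTR_MASK  ((1ULL << VA_BITS) - 1)

static inline uint64_t box_encode(void *p, uint64_t tag)
{
    uint64_t u = (uint64_t)(uintptr_t)p;
    assert((u & ~PTR_MASK) == 0);          /* pointer must fit in VA_BITS */
    assert(tag < (1ULL << TAG_BITS));
    return (tag << VA_BITS) | u;
}

static inline void *box_ptr(uint64_t w)    { return (void *)(uintptr_t)(w & PTR_MASK); }
static inline uint64_t box_tag(uint64_t w) { return w >> VA_BITS; }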
On the other hand, most JITs are not expected to require terabytes of RAM or a huge VA for their applications. Most JIT applications will happily live in a 42-bit world with the mere 4 terabytes of address space that it provides. Therefore, what JITs need in the modern world is a way to make mmap() return addresses below a certain threshold, and error out with ENOMEM when the "lower" memory is exhausted. This is very similar to the ADDR_LIMIT_32BIT personality, but extended to common VA sizes on 64-bit systems: 39-bit, 42-bit, 48-bit, 52-bit, etc.
Since we do not want to penalize the whole system (by using an artificially small VA), it would be best to have a way to enable a VA limit on a per-process basis (similar to the ADDR_LIMIT_32BIT personality). If that's not possible -- then on a per-container / cgroup basis. If that's not possible -- then at the system level (similar to vm.mmap_min_addr, but from the other end).
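As a point of reference, the 32-bit limit can already be requested per process through personality(2); a sketch of how an extended variant might be used (ADDR_LIMIT_32BIT is real, anything wider is the extension being asked for here):

#include <sys/personality.h>
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
    /* Exists today: cap this process's address space at 32 bits. A
     * hypothetical ADDR_LIMIT_42BIT (etc.) would be used the same way. */
    if (personality(ADDR_LIMIT_32BIT) == -1)
        perror("personality");

    void *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        perror("mmap");     /* ENOMEM once the limited range is exhausted */
    else
        printf("allocated at %p\n", p);
    return 0;
}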
Dear kernel people, what can be done to address the JITs' need to reduce the effective VA size?
-- Maxim Kuvyrkov www.linaro.org
Thanks for the summary, now it all makes much more sense.
One simple (from the kernel's perspective, not from the JIT's) approach might be to always use MAP_FIXED whenever an allocation is made for memory that needs these special pointers, and then manage the available address space explicitly. Would that work, or do you require everything, including the binary itself, to be below the address limit?
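A rough sketch of what that explicit management could look like, assuming the JIT reserves one low region up front and carves it up itself (the sizes, the hint and the 42-bit limit are arbitrary):

#include <sys/mman.h>
#include <stdint.h>
#include <stddef.h>

#define JIT_REGION_SIZE  (1UL << 30)              /* arbitrary 1 GiB arena */
#define JIT_REGION_HINT  ((void *)(1UL << 40))    /* well below 2^42 */

static char *jit_base, *jit_next;

static int jit_region_init(void)
{
    /* Reserve address space only; the hint is not guaranteed, so check
     * that the kernel actually placed us below the limit. */
    void *p = mmap(JIT_REGION_HINT, JIT_REGION_SIZE, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED || (uintptr_t)p + JIT_REGION_SIZE > (1UL << 42))
        return -1;
    jit_base = jit_next = p;
    return 0;
}

static void *jit_alloc(size_t size)
{
    size = (size + 4095) & ~(size_t)4095;
    if ((size_t)(jit_base + JIT_REGION_SIZE - jit_next) < size)
        return NULL;
    /* MAP_FIXED is safe here because it only ever replaces our own
     * PROT_NONE reservation, never glibc's mappings. */
    void *p = mmap(jit_next, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    jit_next += size;
    return p;
}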
Regarding which memory sizes are needed, my impression from your explanation is that a single personality flag (e.g. ADDR_LIMIT_42BIT) would be sufficient for the use case, and you don't actually need to tie this to the architecture-provided virtual addressing limits at all. If it's only one such flag, we can probably find a way to fit it into the personality flags, though ironically we are actually running out of bits in there as well.
Arnd
On 28 April 2016 at 14:17, Arnd Bergmann arnd@arndb.de wrote:
One simple (from the kernel's perspective, not from the JIT's) approach might be to always use MAP_FIXED whenever an allocation is made for memory that needs these special pointers, and then manage the available address space explicitly. Would that work, or do you require everything, including the binary itself, to be below the address limit?
The trouble IME with this idea is that in practice you're linking with glibc, which means glibc is managing (and using) the address space, not the JIT. So MAP_FIXED is pretty awkward to use.
thanks -- PMM
Hi,
One can find holes in the VA space by examining /proc/self/maps, so suitable addresses for MAP_FIXED can be deduced.
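A rough sketch of that approach, with no attempt at robustness (and note the inherent race: glibc or another thread can grab the gap before the MAP_FIXED call):

#include <stdio.h>

/* Scan /proc/self/maps for the first gap of at least 'size' bytes that
 * ends below 'limit'; the returned address can then be passed to mmap
 * with MAP_FIXED. Returns 0 if no suitable gap was found. */
static unsigned long find_gap(unsigned long limit, unsigned long size)
{
    FILE *f = fopen("/proc/self/maps", "r");
    unsigned long start, end, prev_end = 0x10000;  /* stay above mmap_min_addr */
    unsigned long found = 0;
    char line[512];

    if (!f)
        return 0;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;
        if (start >= prev_end + size && prev_end + size <= limit) {
            found = prev_end;
            break;
        }
        if (end > prev_end)
            prev_end = end;
    }
    fclose(f);
    return found;
}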
The other problem is, as Arnd alluded to, when a JIT'ed object then needs to refer to something allocated outside the JIT. This could be remedied by another level of indirection/trampoline.
Taking two steps back though, I would view VA space squeezing as a stop-gap before removing tags from the upper bits of a pointer altogether (tagging the bottom bits, by controlling alignment, is perfectly safe). The larger the VA space, the more scope mechanisms such as Address Space Layout Randomisation have to improve security.
Cheers, -- Steve
FWIW: OpenJDK assumes a 48-bit virtual address space. There is no inherent reason for this other than that we do
movz/movk/movk
to form an address. It is relatively trivial to change this to
movz/movk/movk/movk
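For readers unfamiliar with the sequence: each movk patches in one 16-bit chunk, so three instructions cover 48 bits and four cover the full 64. An illustrative C model of the difference:

#include <stdint.h>

/* Model of movz/movk address materialisation: each instruction supplies
 * one 16-bit chunk of the final value. */
static uint64_t materialize48(uint16_t c0, uint16_t c1, uint16_t c2)
{
    return (uint64_t)c0 | ((uint64_t)c1 << 16) | ((uint64_t)c2 << 32);
}

static uint64_t materialize64(uint16_t c0, uint16_t c1, uint16_t c2, uint16_t c3)
{
    return materialize48(c0, c1, c2) | ((uint64_t)c3 << 48);
}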
All the best, Ed.
On 28 April 2016 at 14:00, Maxim Kuvyrkov maxim.kuvyrkov@linaro.org wrote:
This is a summary of discussions we had on IRC between kernel and toolchain engineers regarding support for JITs and 52-bit virtual address space (mostly in the context of LuaJIT, but this concerns other JITs too).
The summary is that we need to consider ways of reducing the size of VA for a given process or container on a Linux system.
I do not think this issue is inherent to all JIT implementations, but rather to LuaJIT with its NaN-tagging scheme [1], which packs different types of objects into an 8-byte word. It works well on x86_64, which limits the user VA to 47 bits, but things get messy with large VA support. LuaJIT works around this issue by changing its internal block allocator [2] to basically limit mmap allocations to 47 bits: it retries mmap with random hint addresses until an allocation returns an address within 47 bits. It is far from the ideal solution and it might break in some scenarios (fragmented or exhausted VA space).
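Roughly the shape of that workaround (an illustrative sketch, not the actual LuaJIT code):

#include <sys/mman.h>
#include <stdlib.h>
#include <stdint.h>

/* Retry anonymous mmap at random hint addresses until the kernel returns
 * something that fits below 2^47; give up after a fixed number of tries. */
static void *alloc_below_47bit(size_t size)
{
    for (int attempt = 0; attempt < 64; attempt++) {
        uintptr_t hint = ((uintptr_t)rand() << 16) & ((1ULL << 47) - 1);
        void *p = mmap((void *)hint, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            continue;
        if ((uintptr_t)p + size <= (1ULL << 47))
            return p;               /* landed in the usable range */
        munmap(p, size);            /* too high; throw it back and retry */
    }
    return NULL;
}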
Another project that shows some limitation with different VA sizes is the LLVM sanitizers: for each VA size they must use a different scheme to map application addresses directly to shadow memory. They work with 39- and 42-bit VAs, but with some tradeoffs: either the total shadow memory is limited to a lower bound (ASan sets it to the 39-bit maximum), or there is a performance cost in address translation (MSan and TSan) from checking the VA size and applying the correct transformation.
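The translation in question is essentially a shift-and-add; a simplified illustration (the offsets are placeholders, not the real sanitizer layouts) of why one known VA size keeps it cheap and a run-time choice makes it dearer:

#include <stdint.h>

#define SHADOW_OFFSET_39  0x0000001000000000UL    /* placeholder */
#define SHADOW_OFFSET_42  0x0000008000000000UL    /* placeholder */

/* One known VA size: a fixed shift-and-add the compiler can fold into
 * every instrumented access. */
static uintptr_t shadow_fixed(uintptr_t addr)
{
    return (addr >> 3) + SHADOW_OFFSET_39;
}

static int va_bits;                               /* detected at start-up */

/* Several possible VA sizes: an extra check (or indirect load of the
 * offset) on every instrumented access. */
static uintptr_t shadow_variable(uintptr_t addr)
{
    return (addr >> 3) + (va_bits == 39 ? SHADOW_OFFSET_39
                                        : SHADOW_OFFSET_42);
}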
I see that adding a personality flag could work, but it has the problem of consuming another flag and limiting the scheme to a narrow set of VA sizes (I do not think we could add two flags, for 39 and 42). I still think that limiting it via cgroups is a better strategy, and it might also help with testing on the userland side (by using 48-bit kernels and setting the VA to 39 and 42 bits).
[1] http://lua-users.org/lists/lua-l/2009-11/msg00089.html
[2] https://github.com/LuaJIT/LuaJIT/commit/0c6fdc1039a3a4450d366fba7af4b29de73f...
+++ Adhemerval Zanella [2016-04-28 12:07 -0300]:
I do not think this issue is inherent to all JIT implementations, but rather to LuaJIT with its NaN-tagging scheme [1], which packs different types of objects into an 8-byte word.
Other JITs use the same or similar schemes (Mozilla's IonMonkey is one, AIUI). Not sure how many others do this, or how many JITs do it a different way, but it is certainly wider than just LuaJIT.
Wookey
For info, Google's V8 JavaScript engine uses the bottom bit to tag pointers. One problem with this mechanism, though, is that the pointers can only be used directly with unscaled-offset memory access instructions (LDUR/STUR). So in particular, no LDP/STP.
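To spell out why: when the tag is folded into the load's displacement the effective offsets become odd, and only the unscaled-immediate forms accept that. An illustrative sketch (the struct and offsets are invented):

#include <stdint.h>
#include <stddef.h>

/* Heap pointers carry tag 1 in bit 0. Keeping the tag and folding it into
 * the displacement means a field at offset 8 is loaded from tagged + 7;
 * odd offsets only fit LDUR/STUR on AArch64, never LDP/STP or the scaled
 * LDR/STR immediate forms. */
#define HEAP_TAG 1

struct object { uint64_t header; uint64_t field; };

static inline uint64_t tag_ptr(struct object *p)
{
    return (uint64_t)(uintptr_t)p | HEAP_TAG;
}

static uint64_t load_field(uint64_t tagged)
{
    char *raw = (char *)(uintptr_t)tagged;
    /* compiles to roughly: ldur x0, [x0, #7] */
    return *(uint64_t *)(raw + offsetof(struct object, field) - HEAP_TAG);
}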
+Andy, Cyrill, Dmitry who have been discussing variable TASK_SIZE on x86 on linux-mm
http://marc.info/?l=linux-mm&m=146290118818484&w=2
I was working on an (AArch64-specific) auxiliary vector entry to export TASK_SIZE to userspace at exec time. The goal was to allow for more elegant, robust, and efficient replacements for the following changes:
https://hg.mozilla.org/integration/mozilla-inbound/rev/dfaafbaaa291
https://github.com/xemul/criu/commit/c0c0546c31e6df4932669f4740197bb830a24c8...
However, based on the above discussion, it appears that some sort of prctl(PR_GET_TASK_SIZE, ...) and prctl(PR_SET_TASK_SIZE, ...) may be preferable for AArch64. (And perhaps additional justifications for the new calls would influence the x86 decisions.) What do folks think?
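To make that concrete, a sketch of how such interfaces might look from userspace; every constant here (AT_TASK_SIZE, PR_GET_TASK_SIZE, PR_SET_TASK_SIZE and their values) is hypothetical and does not exist today:

#include <sys/auxv.h>
#include <sys/prctl.h>
#include <stdio.h>

#define AT_TASK_SIZE      0x1000   /* hypothetical auxv tag */
#define PR_GET_TASK_SIZE  0x1001   /* hypothetical prctl options */
#define PR_SET_TASK_SIZE  0x1002

int main(void)
{
    /* Auxiliary vector variant: fixed at exec time, read-only. */
    unsigned long ts = getauxval(AT_TASK_SIZE);
    printf("task size from auxv: %#lx\n", ts);

    /* prctl variant: query, then shrink the usable VA for this process. */
    long cur = prctl(PR_GET_TASK_SIZE, 0, 0, 0, 0);
    printf("task size from prctl: %#lx\n", (unsigned long)cur);
    prctl(PR_SET_TASK_SIZE, 1UL << 42, 0, 0, 0);
    return 0;
}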
Thanks, Cov