Hi,
Does anyone know how I do
adrp x0, dest & ~0xfff
add x0, x0, dest & 0xfff
in aarch64 assembler?
Thanks for your help, Ed.
On Mon, 2015-12-07 at 10:51 +0000, Edward Nevill wrote:
Hi,
Does anyone know how I do
adrp x0, dest & ~0xfff
add x0, x0, dest & 0xfff
in aarch64 assembler?
Guessing here from experience with 32-bit ARM...
adr x0, dest
and the assembler will generate the instructions you're asking for, or a PC-relative address instruction, or anything else it can to give the address of dest.
On 7 December 2015 at 11:51, Edward Nevill edward.nevill@linaro.org wrote:
Hi,
Does anyone know how I do
adrp x0, dest & ~0xfff
add x0, x0, dest & 0xfff
in aarch64 assembler?
You can't. ADRP is PC relative, but rounded to page granularity, so you can't use it for arbitrary expressions against symbols.
Can you elaborate on your use case?
On 7 December 2015 at 14:40, Ard Biesheuvel ard.biesheuvel@linaro.org wrote:
On 7 December 2015 at 11:51, Edward Nevill edward.nevill@linaro.org wrote:
Hi,
Does anyone know how I do
adrp x0, dest & ~0xfff
add x0, x0, dest & 0xfff
in aarch64 assembler?
You can't. ADRP is PC relative, but rounded to page granularity, so you can't use it for arbitrary expressions against symbols.
Can you elaborate on your use case?
Ah hold on
You mean
adrp x0, dest
add x0, x0, #:lo12:dest
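As a rough illustration of why this pair yields the full address, here is a minimal Python model of the arithmetic. The function names (`adrp`, `lo12`, `materialize`) are made up for this sketch and the addresses are arbitrary; it assumes 4 KB pages and the signed 21-bit page immediate of ADRP, and it models only the address computation, not the instruction encoding.

```python
PAGE_MASK = 0xFFF  # low 12 bits: the offset within a 4 KB page


def adrp(pc, dest):
    """Model ADRP: return the 4 KB page base of dest, computed relative to pc's page."""
    page_delta = (dest >> 12) - (pc >> 12)  # signed page count held in the instruction
    if not -(1 << 20) <= page_delta < (1 << 20):
        raise ValueError("dest is outside the +/-4 GB reach of ADRP")
    return ((pc >> 12) + page_delta) << 12


def lo12(dest):
    """Model the :lo12: modifier: the low 12 bits of the symbol's address."""
    return dest & PAGE_MASK


def materialize(pc, dest):
    """Model the pair: adrp x0, dest ; add x0, x0, #:lo12:dest."""
    x0 = adrp(pc, dest)
    x0 = x0 + lo12(dest)
    return x0


if __name__ == "__main__":
    print(hex(materialize(0x400123, 0x7654321)))
```

The page base from ADRP plus the low 12 bits from the ADD always recombine to dest, whatever the alignment of the PC, which is why the original "dest & ~0xfff" expression is not needed.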
Thanks! That's the syntax I wanted.
The use case: I want to benchmark this as a way of generating far calls in the JIT, for when the code cache grows beyond 128 MB.
At the moment we generate trampolines
tramp:  ldr Xn, here
        br Xn
here:   .dword dest
and then do
bl dest
but if, on relocation in the code cache, it doesn't reach, then relocate this to
bl tramp
but I think it may be better to simply always generate
adrp Xn, dest
add Xn, Xn, :lo12:dest
blr Xn
Regards, Ed.
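A minimal sketch of the relocation-time decision described in the current scheme, in Python for illustration only. The helper names (`bl_reaches`, `relocated_call`) are hypothetical and not taken from any JIT; the only hard fact used is that bl encodes a signed 26-bit word offset, giving a reach of -128 MB up to just under +128 MB. It also assumes the trampoline itself is within reach of the call site.

```python
BL_RANGE = 128 * 1024 * 1024  # bl: signed 26-bit offset in 4-byte words, so +/-128 MB


def bl_reaches(call_site, dest):
    """Return True if a direct bl at call_site can reach dest."""
    offset = dest - call_site
    return -BL_RANGE <= offset < BL_RANGE


def relocated_call(call_site, dest):
    """Pick the instruction to emit at call_site after the code has moved."""
    if bl_reaches(call_site, dest):
        return "bl dest"
    # Out of range: go through the trampoline, which loads the full
    # 64-bit address from a literal and branches with br.
    return "bl tramp"


if __name__ == "__main__":
    print(relocated_call(0x1000, 0x2000))
    print(relocated_call(0, 256 * 1024 * 1024))
```

The always-adrp/add/blr alternative removes this check from relocation, at the cost of two extra instructions at every call site.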
On 7 December 2015 at 14:53, Edward Nevill edward.nevill@linaro.org wrote:
adrp x0, dest
add x0, x0, #:lo12:dest
Thanks! That's the syntax I wanted.
The use case: I want to benchmark this as a way of generating far calls in the JIT, for when the code cache grows beyond 128 MB.
At the moment we generate trampolines
tramp:  ldr Xn, here
        br Xn
here:   .dword dest
and then do
bl dest
but if, on relocation in the code cache, it doesn't reach, then relocate this to
bl tramp
but I think it may be better to simply always generate
adrp Xn, dest
add Xn, Xn, :lo12:dest
blr Xn
I don't suppose the branch predictor would like that very much, though. Can't you keep the original arrangement, and use this sequence instead? Or use the new sequence unconditionally, but nop out the add instruction and change the blr for a straight bl if the target is in range?
No, the branch predictor is fine with it. The blr Xn always goes to the same address, so it is predicted. The adrp/add pair is folded into a single micro-op and then dual-issued with the blr, so it takes exactly the same time as a straight bl.
It depends, of course, on your implementation.
Regards, Ed.