Help with some assembler syntax


Edward Nevill

7 Dec 2015

10:51 a.m.

Hi,

Does anyone know how I do

adrp  x0, dest & ~0xfff
add   x0, x0, dest & 0xfff

in aarch64 assembler?

Thanks for your help, Ed.


Jon Medhurst (Tixy)

7 Dec 2015

1:33 p.m.

On Mon, 2015-12-07 at 10:51 +0000, Edward Nevill wrote:


Guessing here from experience with 32-bit ARM...

adr x0, dest

and the assembler will generate the instructions you're asking for, or a PC-relative address instruction, or anything else it can to give the address of dest.

-- Tixy
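For reference, on AArch64 ADR encodes a signed 21-bit byte offset (about ±1 MiB), while ADRP encodes a signed 21-bit offset counted in 4 KiB pages (about ±4 GiB). A minimal Python sketch of the two reach checks, assuming plain non-negative integer addresses; the function names are illustrative only:

```python
# Signed 21-bit immediate range shared by ADR and ADRP.
IMM21_MIN, IMM21_MAX = -(1 << 20), (1 << 20) - 1


def adr_reaches(pc: int, dest: int) -> bool:
    """True if one ADR at address pc can produce dest (offset counted in bytes)."""
    return IMM21_MIN <= dest - pc <= IMM21_MAX


def adrp_reaches(pc: int, dest: int) -> bool:
    """True if ADRP at pc can produce the 4 KiB page containing dest.

    The immediate counts pages, so only the page numbers of pc and dest matter.
    """
    page_delta = (dest >> 12) - (pc >> 12)
    return IMM21_MIN <= page_delta <= IMM21_MAX


if __name__ == "__main__":
    pc = 0x400000
    far = pc + (2 << 20)  # a target 2 MiB away
    print(adr_reaches(pc, far), adrp_reaches(pc, far))  # False True
```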

Ard Biesheuvel

1:40 p.m.

On 7 December 2015 at 11:51, Edward Nevill edward.nevill@linaro.org wrote:


You can't. ADRP is PC relative, but rounded to page granularity, so you can't use it for arbitrary expressions against symbols.

Can you elaborate on your use case?
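The point about page granularity can be made concrete with a small model. ADRP computes the 4 KiB page of its own address plus a signed page count, so the value encoded in the instruction is a page delta relative to where the ADRP sits, not `dest & ~0xfff` itself. In object files this is typically left to the linker via a relocation (R_AARCH64_ADR_PREL_PG_HI21 in the AArch64 ELF ABI). A Python sketch with illustrative names:

```python
def adrp_result(pc: int, imm21: int) -> int:
    """Value ADRP at address pc writes to Xd: pc's 4 KiB page plus imm21 pages."""
    return (pc & ~0xFFF) + (imm21 << 12)


def adrp_immediate(pc: int, dest: int) -> int:
    """Page delta that must be encoded so that ADRP at pc yields dest's page."""
    return (dest >> 12) - (pc >> 12)


if __name__ == "__main__":
    dest = 0x12345678
    for pc in (0x400000, 0x400FFC, 0x9000000):
        imm = adrp_immediate(pc, dest)
        # The result is always dest's page, but the encoded immediate depends on
        # which page the ADRP itself lives in.
        assert adrp_result(pc, imm) == dest & ~0xFFF
        print(hex(pc), hex(imm))
```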

Ard Biesheuvel

1:42 p.m.

On 7 December 2015 at 14:40, Ard Biesheuvel ard.biesheuvel@linaro.org wrote:


Ah, hold on.

You mean:

adrp  x0, dest
add   x0, x0, #:lo12:dest
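The `#:lo12:dest` operand supplies the low 12 bits of dest's absolute address (relocation R_AARCH64_ADD_ABS_LO12_NC), which is exactly what ADRP's page-aligned result lacks. A Python model of the pair, assuming ADRP's fixed 4 KiB page size, checking that it reconstructs dest wherever the sequence sits (names are illustrative):

```python
def adrp(pc: int, dest: int) -> int:
    """adrp xN, dest at address pc: evaluates to the 4 KiB page containing dest."""
    imm21 = (dest >> 12) - (pc >> 12)  # page delta a linker would encode
    if not -(1 << 20) <= imm21 < (1 << 20):
        raise ValueError("dest out of ADRP range (about +/-4 GiB)")
    return (pc & ~0xFFF) + (imm21 << 12)


def lo12(dest: int) -> int:
    """:lo12:dest, the low 12 bits of the absolute address (pc is not involved)."""
    return dest & 0xFFF


def adrp_add(pc: int, dest: int) -> int:
    """adrp xN, dest ; add xN, xN, #:lo12:dest"""
    return adrp(pc, dest) + lo12(dest)


if __name__ == "__main__":
    dest = 0x7F123ABC
    for pc in (0x400000, 0x7F000FF8, 0x100000000):
        assert adrp_add(pc, dest) == dest
    print("ok")
```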

Edward Nevill

1:53 p.m.


Thanks! That's the syntax I wanted.

The use case is that I want to benchmark this as a way of generating far calls in the JIT, for when the code cache grows beyond 128 MB.

At the moment we generate trampolines:

tramp:  ldr    Xn, here
        br     Xn
here:   .dword dest

and then do

bl dest

but if, on relocation in the code cache, it doesn't reach, we relocate this to

bl tramp
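The 128 MB limit comes from BL's signed 26-bit word offset, which reaches about ±128 MiB from the branch. A minimal Python sketch of the relocation decision described above; `call_target` and its parameters are hypothetical names, not taken from any particular JIT:

```python
BL_MIN, BL_MAX = -(1 << 27), (1 << 27) - 4  # imm26 * 4, in bytes


def bl_reaches(pc: int, dest: int) -> bool:
    """True if a BL at address pc can branch directly to dest."""
    off = dest - pc
    return off % 4 == 0 and BL_MIN <= off <= BL_MAX


def call_target(pc: int, dest: int, tramp: int) -> int:
    """Choose the BL target for a call site at pc after the code has moved.

    tramp is assumed to be the address of a trampoline
    (ldr Xn, here ; br Xn ; here: .dword dest) placed within BL range of pc.
    """
    if bl_reaches(pc, dest):
        return dest   # bl dest
    if bl_reaches(pc, tramp):
        return tramp  # bl tramp
    raise ValueError("neither dest nor the trampoline is within BL range")
```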

but I think it may be better to simply always generate

adrp  Xn, dest
add   Xn, Xn, :lo12:dest
blr   Xn

Regards, Ed.
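For a JIT that emits machine code directly, the adrp/add/blr far call could be encoded as below. This is a Python sketch of the A64 encodings for ADRP, ADD (immediate, 64-bit) and BLR; the helper names are hypothetical, and x16 is the default only because AAPCS64 designates x16/x17 as intra-procedure-call scratch registers:

```python
def enc_adrp(rd: int, pc: int, dest: int) -> int:
    """ADRP Xd, dest, with the instruction placed at address pc."""
    imm = (dest >> 12) - (pc >> 12)  # signed page delta
    if not -(1 << 20) <= imm < (1 << 20):
        raise ValueError("dest out of ADRP range (about +/-4 GiB)")
    imm &= 0x1FFFFF  # 21-bit two's complement
    immlo, immhi = imm & 0x3, imm >> 2
    return 0x90000000 | (immlo << 29) | (immhi << 5) | rd


def enc_add_imm(rd: int, rn: int, imm12: int) -> int:
    """ADD Xd, Xn, #imm12 (64-bit, unshifted immediate)."""
    return 0x91000000 | ((imm12 & 0xFFF) << 10) | (rn << 5) | rd


def enc_blr(rn: int) -> int:
    """BLR Xn."""
    return 0xD63F0000 | (rn << 5)


def far_call(pc: int, dest: int, xn: int = 16) -> list[int]:
    """adrp Xn, dest ; add Xn, Xn, :lo12:dest ; blr Xn, starting at address pc."""
    return [enc_adrp(xn, pc, dest),
            enc_add_imm(xn, xn, dest & 0xFFF),
            enc_blr(xn)]


if __name__ == "__main__":
    print([hex(w) for w in far_call(0x1000, 0x1000)])
```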

Ard Biesheuvel

2:09 p.m.

On 7 December 2015 at 14:53, Edward Nevill edward.nevill@linaro.org wrote:


I don't suppose the branch predictor would like that very much, though. Can't you keep the original arrangement, and use this sequence instead? Or use the new sequence unconditionally, but nop out the add instruction and change the blr for a straight bl if the target is in range?
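The second alternative (always emit the three-instruction slot and, when the target turns out to be in range, turn the ADD into a NOP and the BLR into a direct BL) might look like this as a patching step. A Python sketch under stated assumptions: `patch_call_site` is a hypothetical name, `far` holds the adrp/add/blr words already encoded for dest at that site, and the BL offset is computed from the BL's own address, eight bytes into the slot:

```python
NOP = 0xD503201F


def enc_bl(pc: int, dest: int) -> int:
    """BL dest, with the instruction placed at address pc (signed 26-bit word offset)."""
    off = dest - pc
    if off % 4 or not -(1 << 27) <= off < (1 << 27):
        raise ValueError("dest out of BL range")
    return 0x94000000 | ((off >> 2) & 0x3FFFFFF)


def patch_call_site(site: int, dest: int, far: list[int]) -> list[int]:
    """Return the words for the 3-instruction slot at address site.

    If a BL in the third slot can reach dest, the ADD becomes a NOP and the BLR
    becomes BL dest. The ADRP is left in place: it only writes the scratch
    register, whose value is then unused.
    """
    bl_pc = site + 8
    off = dest - bl_pc
    if off % 4 == 0 and -(1 << 27) <= off < (1 << 27):
        return [far[0], NOP, enc_bl(bl_pc, dest)]
    return list(far)  # out of range: keep the adrp/add/blr far call
```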

Edward Nevill

3:05 p.m.


No, the branch predictor is fine with it. The blr Xn always goes to the same address, so it is predicted. The adrp/add pair is fused into a single micro-op and then dual-issued with the branch, so it takes exactly the same time as a straight bl.

It depends, of course, on your implementation.

Regards, Ed.


linaro-dev@lists.linaro.org


participants (3)

Ard Biesheuvel
Edward Nevill
Jon Medhurst (Tixy)