Using inline NEON code
dave.martin at linaro.org
Fri Dec 3 11:36:41 UTC 2010
On Thu, Dec 2, 2010 at 9:49 PM, Michael Hope <michael.hope at linaro.org> wrote:
> Hi there. Currently you can't use NEON instructions in inline
> assembly if the compiler is set to -mfpu=vfp such as Ubuntu's
> -mfpu=vfpv3-d16. Trying code like this:
> int main()
> {
>     asm("veor d1, d2, d3");
>     return 0;
> }
> gives an error message like:
> test.s: Assembler messages:
> test.s:29: Error: selected processor does not support Thumb mode `veor d1,d2,d3'
> The problem is that -mfpu=vfpv3-d16 has two jobs: it tells the
> compiler what instructions to use, and also tells the assembler what
> instructions are valid. We might want the compiler to use the VFP for
> compatibility or power reasons, but still be able to use NEON
> instructions in inline assembler without passing extra flags.
We came across a similar case in the kernel just recently... and it's
likely to recur as we try to move toward more unified kernels.
The problem is that the toolchain considers:
a) the architecture baseline _needed_ by an object, and the
architectural features it _may use_ to be one and the same thing.
This is true for C code, but not universally true for assembler
(either inline or not).
b) the architecture baseline _needed_ by the output of the linker to
be the union of all the architecture baselines _needed_ by all the
individual objects linked together. This is not necessarily true even
for C code.
These conservative assumptions only really support the
fixed-configuration use case; they don't accommodate the concept of
run-time adaptation to CPU features.
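To make "run-time adaptation" concrete, here is a minimal sketch in C
of the pattern I have in mind. It is an illustration under assumptions,
not code from the kernel or from any package: the function names are
made up, the check assumes 32-bit ARM Linux with a libc that provides
getauxval(), and xor_neon() is only a placeholder for a routine that
would really live in a separate file built with NEON enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_NEON */
#endif

/* Returns nonzero if the running CPU reports NEON support.
 * Assumption: only 32-bit ARM Linux is probed; elsewhere we report 0. */
int cpu_has_neon(void)
{
#if defined(__linux__) && defined(__arm__)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    return 0;
#endif
}

typedef void (*xor_fn)(uint8_t *dst, const uint8_t *src, size_t len);

/* Baseline implementation: safe on every CPU the object may run on. */
void xor_generic(uint8_t *dst, const uint8_t *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}

/* Placeholder (hypothetical): a real NEON version would be written in
 * assembler or built separately; here it just forwards to the baseline. */
void xor_neon(uint8_t *dst, const uint8_t *src, size_t len)
{
    xor_generic(dst, src, len);
}

/* Choose the implementation once, based on what the CPU reports. */
xor_fn select_xor(void)
{
    return cpu_has_neon() ? xor_neon : xor_generic;
}
```

The point is that the object only _needs_ the baseline, yet _may use_
NEON when it is present; the toolchain currently has no way to express
that difference.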
For background on the kernel discussion, see this thread -- you'll
have to follow it a bit:
> Inserting ".fpu neon" to the start of the inline assembly fixes the
> problem. Is this valid? Are assembly files with multiple .fpu
> statements allowed? Passing '-Wa,-mfpu=neon' to GCC doesn't work as
> gas seems to ignore the second -mfpu.
Strictly speaking, no, because many many points of gas behaviour are
not specified and there's no definition of what should happen if there
are multiple conflicting .arch or .fpu directives. We'd need a
toolchain expert to pass judgement on this.
Also, changing the arch part way through the file means you're no
longer protected: incorrect code generation by the compiler from that
point onwards may not be detected if it occurs.
Worse, if there were to be a "neonv2" in the future, you would now
unexpectedly downgrade the architecture halfway through the file, so
the assembler may barf on subsequent compiler-generated code... so to
avoid future maintenance problems, a way to restore the "true"
architecture is definitely needed.
In principle, you could change and restore the architecture with the
help of some build system hacks:
gcc -DASM_DEFAULT_ARCH='".arch $(ARCH_VERSION); .fpu $(FP_ARCH_VERSION);"'
"veor d0, d1, d2\n\t"
This doesn't sit well with the Debian/Ubuntu way of building things
where we have to build options into the compiler as defaults for there
to be any hope of them taking effect ... because of the way package
build systems clobber CFLAGS/CPPFLAGS all over the place and in
practice can't be overridden globally. So, you'd have to tweak the
build scripts for each affected package.
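For illustration, the change-and-restore hack could look roughly like
the sketch below. This is an assumption-laden example, not a tested
recipe: NEON_ASM, neon_veor_text(), restores_default_arch() and
neon_clear_d0() are names invented here, the fallback value of
ASM_DEFAULT_ARCH (armv7-a plus vfpv3-d16) is only a stand-in for what
the build system would pass with -D, and whether gas handles repeated
.arch/.fpu directives the way we want is exactly the unspecified
behaviour discussed above.

```c
#include <assert.h>
#include <string.h>

/* Normally supplied by the build system via -DASM_DEFAULT_ARCH=...
 * The fallback below is a hypothetical default for illustration only. */
#ifndef ASM_DEFAULT_ARCH
#define ASM_DEFAULT_ARCH ".arch armv7-a; .fpu vfpv3-d16;"
#endif

/* Wrap an instruction string: enable NEON, emit the body, then restore
 * the architecture the rest of the file was compiled for. */
#define NEON_ASM(body) ".fpu neon\n\t" body "\n\t" ASM_DEFAULT_ARCH

/* The text handed to the assembler, exposed so it can be inspected. */
const char *neon_veor_text(void)
{
    return NEON_ASM("veor d0, d1, d2");
}

/* Check that a wrapped block ends by restoring the default architecture. */
int restores_default_arch(const char *text)
{
    size_t n = strlen(text);
    size_t m = strlen(ASM_DEFAULT_ARCH);

    assert(m > 0);
    return n >= m && strcmp(text + n - m, ASM_DEFAULT_ARCH) == 0;
}

#if defined(__arm__) && defined(__ARM_FP)
/* Only built for 32-bit ARM with FP registers available. The caller must
 * already know (e.g. from a run-time check) that the CPU has NEON. */
void neon_clear_d0(void)
{
    __asm__ volatile(NEON_ASM("veor d0, d1, d2") : : : "d0");
}
#endif
```

Every inline asm block that uses NEON would have to go through a
wrapper like this, which is part of why it scales so badly across
packages.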
Also, the architecture feature requirements put in the object can look
a bit weird, presumably because the ".fpu" directive is overloaded
to describe two different architectural features (VFP and NEON).
If I do this:
veor d0, d1, d2
then fromelf lists the following attributes for the object:
Attribute Section: aeabi
Tag_DIV_use: Not allowed
i.e., the baseline for each architectural feature is whatever the last
applicable .arch or .fpu directive in the file specified, or the arch
required by the instructions present in the file, whichever is the
However, the assembler checks instruction validity line by line, so this:
veor d0, d1, d2
veor d0, d1, d2
gives an assembler error, which is sort of what we expect/want:
tst.s: Assembler messages:
tst.s:5: Error: selected processor does not support ARM mode `veor d0,d1,d2'
Note that only the second veor causes the error here, because NEON
instructions are no longer permitted after the ".fpu vfp" directive.
While these tricks might be of some use in practice, I'd be cautious
about relying on them.
> What's the best way to handle this? Some options are:
> * Add '.fpu neon' directives to the start of any inline assembly
May work for now, but probably not a great idea, as above; plus there
is no easy way to restore the correct architecture afterwards.
> * Separate out the features, so you can specify the capabilities with
> one option and restrict the compiler to a subset with another.
> Something like '-mfpu=neon -mfpu-tune=vfpv3-d16'
Could work, but might be controversial. I guess it's for toolchain
guys to comment.
> * Relax the assembler so that any instructions are accepted. We'd
> lose some checking of GCC's output though.
Likely to be controversial? -- this could be a straightforward fix,
but it certainly should never be the default behaviour. And the
question of what architecture version requirement attributes get
written into the resulting object remains.
What I'd really like on my wishlist is to be able to write something like:
/* fancy stuff */
Where .poparch restores whatever architecture version was in force
before .pusharch, and everything between the outermost .pusharch ...
.poparch pair is ignored for the purpose of setting the attributes on
the object file. This should be safe to use inside inline asm, and
appears to fit well with the Linux kernel use case and with what
you're trying to do.
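To pin down the semantics I mean, here is a toy model in C of the
bookkeeping the assembler would need. It is purely hypothetical:
.pusharch and .poparch do not exist in gas, and the struct and function
names below are invented for this sketch. It only models which
architecture is "in force" and which one gets recorded for the object.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 8

/* Toy model of the assembler's architecture state (hypothetical). */
struct arch_state {
    const char *current;          /* used to validate instructions */
    const char *object_attr;      /* what the object attributes would say */
    const char *saved[MAX_DEPTH]; /* stack for .pusharch / .poparch */
    int depth;
};

void arch_init(struct arch_state *s, const char *base)
{
    memset(s, 0, sizeof(*s));
    s->current = base;
    s->object_attr = base;
}

/* Models ".fpu X" / ".arch X": always changes what is accepted, but only
 * affects the object attributes outside any .pusharch ... .poparch pair. */
void arch_set(struct arch_state *s, const char *arch)
{
    s->current = arch;
    if (s->depth == 0)
        s->object_attr = arch;
}

/* Models ".pusharch": remember the architecture currently in force. */
void arch_push(struct arch_state *s)
{
    assert(s->depth < MAX_DEPTH);
    s->saved[s->depth++] = s->current;
}

/* Models ".poparch": restore whatever was in force before the push. */
void arch_pop(struct arch_state *s)
{
    assert(s->depth > 0);
    s->current = s->saved[--s->depth];
}
```

With this model, a push, a switch to "neon" and a pop leave both the
checking architecture and the object attributes exactly as the compiler
set them, which is the property that makes it safe inside inline asm.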
That doesn't feel like rocket science, but then I'm not a toolchain hacker ;)