Hi!
There is a branch, experimental/pyarmnn, created by Matthew Bentham, which contains Python wrappers for armnn. It initially seems to work pretty well: building a whl archive works, the archive can be installed using pip, and I was able to write an example that runs inference on a float/quantized model using all the supported frameworks - tf, tf-lite, caffe and onnx. What is missing is to get the Python wrappers integrated, to run and check the unit tests, and to write a few examples. We have already discussed this with Matthew, but I would be glad to hear more opinions on how we should proceed and to kick off a discussion.
1. How to integrate pyarmnn?
There are 2 paths initially:
1. Build pyarmnn together with armnn using a single cmake command
* By default it would be turned off; otherwise it would be built using e.g. -DBUILD_PYARMNN
* The product is either a whl or a src package - so should there be 2 options, e.g. -DBUILD_PYARMNN_SRC and -DBUILD_PYARMNN_WHL, or only a single one that always builds both?
2. Separate pyarmnn from armnn into a different repository (and keep it as a separate project)
* In addition to the options from 1, -DARMNN_LIB and -DARMNN_INCLUDE would be required as well, so that pyarmnn can be "linked" against a configurable armnn build
The difference is mainly in maintainability - option 1 forces us to maintain pyarmnn and update the SWIG files that generate the wrappers for every release; option 2, on the other hand, keeps the project separate, allows pyarmnn to be built against a configurable armnn release, and doesn't create a dependency to update the SWIG files whenever the armnn interface changes a little.
2. Remove tox? Yes/No - tox is a Python automation tool which is used to generate the wrappers and to run the unit tests. It is not really needed, because the wrappers can be generated directly using SWIG and the src/whl packages built using python/setuptools, and it just creates another dependency. The unit tests can also be run directly using Python.
3. Get pyarmnn published on pypi.org? Yes/No - we would then be able to install pyarmnn using "pip install pyarmnn".
Any additional ideas, comments, feedback etc. would of course be appreciated.
Thanks!
Pavel M
Hi all,
Regarding the ILayerSupport interface in ILayerSupport.hpp, most of the methods have output TensorInfos. Some of the methods (e.g. IsDetectionPostProcessSupported) don't have output infos. This caused an issue in our custom backend because we were unable to check the output tensor info and reject the layer properly. I think it should be possible to have this information for all layers. What do you think?
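To make the idea concrete, here is a rough sketch of the kind of output-aware signature I have in mind - this is not the current ILayerSupport declaration, and the parameter names are only illustrative:

#include <armnn/Descriptors.hpp>
#include <armnn/Optional.hpp>
#include <armnn/Tensor.hpp>
#include <string>

// Hypothetical sketch only: an output-aware variant of the support check, so a custom
// backend can inspect (and reject) the output TensorInfos as well as the inputs.
struct IOutputAwareLayerSupport
{
    virtual ~IOutputAwareLayerSupport() = default;

    virtual bool IsDetectionPostProcessSupported(
        const armnn::TensorInfo& boxEncodings,                      // input
        const armnn::TensorInfo& scores,                            // input
        const armnn::TensorInfo& anchors,                           // input
        const armnn::TensorInfo& detectionBoxes,                    // output info to check
        const armnn::TensorInfo& detectionScores,                   // output info to check
        const armnn::DetectionPostProcessDescriptor& descriptor,
        armnn::Optional<std::string&> reasonIfUnsupported) const = 0;
};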
Thanks,
Josh
Hello Derek,
Is this issue still open? If it is, can I work on it?
On Mon, Oct 28, 2019 at 5:30 PM <armnn-dev-request(a)lists.linaro.org> wrote:
> Date: Mon, 28 Oct 2019 10:31:11 +0000
> From: Derek Lamberti <derek.lamberti(a)linaro.org>
> To: Rahul Chowdhury <rahul.c(a)pathpartnertech.com>
> Cc: Manjunath Kulkarni <manjunath.kulkarni(a)pathpartnertech.com>,
> armnn-dev(a)lists.linaro.org
> Subject: Re: [Armnn-dev] ArmNN | ONXX model load issue
>
> Hi Rahul,
>
>
> ArmNN doesn't support zero dimension tensors implicitly. Often this
> can be resolved by converting the tensor to a 1D tensor with 1
> element. We have done this conversion automatically within the TfLite
> parser and this has worked for a particular use case we ran into. A
> similar solution might work for your use case too. This could be done
> within the ToTensorInfo() function in OnnxParser.cpp. If this resolves
> the issue for you I'd recommend issuing a pull request so that we can
> integrate it into master.
>
>
> Hope that helps,
> ~Derek
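A minimal standalone sketch of the conversion described above - promoting a zero-dimensional (scalar) tensor to a 1-D tensor with one element before building the armnn::TensorInfo; this illustrates the idea rather than the actual ToTensorInfo() change in OnnxParser.cpp:

#include <armnn/Tensor.hpp>
#include <vector>

// Sketch: build a TensorInfo from parsed ONNX dimensions, treating a scalar
// (zero dimensions) as a 1-D tensor with a single element, since
// armnn::TensorShape requires at least one dimension.
armnn::TensorInfo MakeTensorInfo(std::vector<unsigned int> dims, armnn::DataType dataType)
{
    if (dims.empty())
    {
        dims.push_back(1u); // promote scalar to shape {1}
    }
    return armnn::TensorInfo(static_cast<unsigned int>(dims.size()), dims.data(), dataType);
}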
Hi,
We are using ArmNN to cross-compile a standalone C++ application on Linux
that loads a standard onnx model. During the model loading, we see a crash
with the below error output -
terminate called after throwing an instance of
'armnn::InvalidArgumentException'
what(): Tensor numDimensions must be greater than 0
Initially we were on armnn master, and later we switched to tag v19.05, but
the error was the same for both.
Below is the code snippet to load the model -
armnnOnnxParser::IOnnxParserPtr parser = armnnOnnxParser::IOnnxParser::Create();
std::cout << "\nmodel load start";
armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile("onnx_3DDFA.onnx");
std::cout << "\nmodel load end";
It crashes after printing "model load start" with the error message printed
above.
A gdb backtrace is also provided below -
(gdb) r
Starting program:
/home/root/Rahul/armnn_onnx/3DDFA_ArmNN_onnx/3ddfa_armnn_onnx
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
terminate called after throwing an instance of
'armnn::InvalidArgumentException'
what(): Tensor numDimensions must be greater than 0
model load start
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at
/usr/src/debug/glibc/2.26-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51 }
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at
/usr/src/debug/glibc/2.26-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1 0x0000ffffbe41df00 in __GI_abort () at
/usr/src/debug/glibc/2.26-r0/git/stdlib/abort.c:90
#2 0x0000ffffbe6aa0f8 in __gnu_cxx::__verbose_terminate_handler() () from
/usr/lib/libstdc++.so.6
#3 0x0000ffffbe6a7afc in ?? () from /usr/lib/libstdc++.so.6
#4 0x0000ffffbe6a7b50 in std::terminate() () from /usr/lib/libstdc++.so.6
#5 0x0000ffffbe6a7e20 in __cxa_throw () from /usr/lib/libstdc++.so.6
#6 0x0000ffffbefdad84 in armnn::TensorShape::TensorShape(unsigned int,
unsigned int const*) () from /home/root/Rahul/armnn_onnx/build/libarmnn.so
#7 0x0000ffffbe7e34d8 in armnnOnnxParser::(anonymous
namespace)::ToTensorInfo(onnx::ValueInfoProto const&) [clone
.constprop.493] () from
/home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#8 0x0000ffffbe7e4080 in
armnnOnnxParser::OnnxParser::SetupInfo(google::protobuf::RepeatedPtrField<onnx::ValueInfoProto>
const*) () from /home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#9 0x0000ffffbe7e41ac in armnnOnnxParser::OnnxParser::LoadGraph() () from
/home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#10 0x0000ffffbe7e4760 in
armnnOnnxParser::OnnxParser::CreateNetworkFromModel(onnx::ModelProto&) ()
from /home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#11 0x0000ffffbe7e49b0 in
armnnOnnxParser::OnnxParser::CreateNetworkFromBinaryFile(char const*) ()
from /home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#12 0x0000000000402290 in main ()
(gdb)
Can someone point out if we are missing something or doing something wrong? Any help or input is highly appreciated.
Regards,
Rahul
Hi,
I'm trying to send a minor ArmNN patch for review, but I ran into the authentication failure below when running 'git review' (I added the Gerrit remote with 'git remote add gerrit https://review.mlplatform.org/ml/armnn').
remote: Unauthorized
fatal: Authentication failed for 'https://review.mlplatform.org/ml/armnn/'
I can log in to the Gerrit server with the same username/password. Is there any special permission required? I cannot find any related information on the mlplatform.org website.
Please let me know if I missed something.
Thanks,
Jammy
Hi Derek,
Thanks for your reply and I'm glad we agree on this.
Are there any tickets/issues which I can use to track the changes that you have suggested?
Thanks,
Rob
From: Derek Lamberti <Derek.Lamberti(a)arm.com>
Sent: 13 August 2019 14:01
To: Matthew Bentham <Matthew.Bentham(a)arm.com>; Robert Hughes <Robert.Hughes(a)arm.com>; Armnn-dev(a)lists.linaro.org
Subject: Re: Validation of inputs
Hi Rob,
Yes, I think this is certainly an area where we should do better:
1. For completeness, there are different levels of validation that need to occur, and this can be different from the validation performed by the backend::IsSupported() functions. For example, IsSupported only needs to report what is valid for that backend's implementation, which may cover only a subset of the full ArmNN specification for the layer - which is also worth bearing in mind.
2. There are different stages where we should perform the validation.
* On the input graph during Graph building (much like you suggested). This would be validation against the ArmNN spec and would indicate to the user immediately (at the point of error) that they have tried to add an invalid op.
* On the LoadedNetwork during workload creation. This is essentially what the current code does and is a validation against the ArmNN spec. However, it's currently performed during the construction of the workloads, and should instead be called by the ArmNN framework just before, which would be independent of workload implementation. I would also make this check for debug builds only. It's useful to validate that all the graph transformations to this point have been valid.
3. Furthermore, there are different stages where we could perform further validation.
* In the optimizer (post-backend-optimization) to verify the optimized result. This is essentially the same as 2b (above) but earlier in the pipe and would give a better user experience. It is required because backend optimization implementations could produce an invalid graph. If we do it early enough in the optimizer pipeline, we could use it to reject invalid optimizations from the backends, and fall back to the next backend instead of failing outright.
* It would remain up to the backend implementations to verify that the workloads created are compatible with their implementations. This is similar to IsSupported() but during workload creation (like we are doing now). This could also be made a debug-only option.
4. InferTensorInfos should certainly be safe code. We will soon be updating this code so that it can be used to actually infer the shapes for tensors where the shape is unknown in the model (rather than just for validation). Suffice it to say, the current implementation could be safer.
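To illustrate the kind of defensive check meant in point 4, here is a minimal sketch of a shape-inference helper that rejects unexpected dimensionality instead of indexing blindly - a hypothetical helper rather than the actual FullyConnectedLayer::InferOutputShapes, and it assumes a [inputSize, outputSize] weight layout:

#include <armnn/Exceptions.hpp>
#include <armnn/Tensor.hpp>
#include <vector>

// Hypothetical safe shape inference for a FullyConnected-style layer: validate the
// dimensionality before indexing into the shapes (assumes weights are [inputSize, outputSize]).
std::vector<armnn::TensorShape> InferFullyConnectedOutputShapeSafe(const armnn::TensorShape& input,
                                                                   const armnn::TensorShape& weights)
{
    if (input.GetNumDimensions() < 2 || weights.GetNumDimensions() != 2)
    {
        throw armnn::InvalidArgumentException(
            "FullyConnected: input must be at least 2D and weights must be 2D");
    }
    // Batch size from the input, output channel count from the weights.
    return { armnn::TensorShape({ input[0], weights[1] }) };
}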
Thanks for your feedback and keep it coming. I'm eager to make ArmNN a lot more user friendly in the coming year, so this all helps.
Regards,
~Derek
________________________________
From: Matthew Bentham <Matthew.Bentham(a)arm.com>
Sent: 13 August 2019 12:00
To: Robert Hughes <Robert.Hughes(a)arm.com>; Armnn-dev(a)lists.linaro.org
Cc: Derek Lamberti <Derek.Lamberti(a)arm.com>
Subject: Re: Validation of inputs
Thanks Rob, that does seem wrong. At first glance it looks to me like QueueDescriptor::Validate should not exist, and all that checking should move to roughly where InferTensorInfos is called now. I'll let Derek comment further.
All the best,
Matthew
________________________________
From: Armnn-dev <armnn-dev-bounces(a)lists.linaro.org> on behalf of Robert Hughes <Robert.Hughes(a)arm.com>
Sent: 13 August 2019 11:52
To: Armnn-dev(a)lists.linaro.org
Subject: [Armnn-dev] Validation of inputs
Hi ArmNN dev team,
I am part of the team developing the ArmNN backend for the Arm NPU and have some concerns about the validation that the ArmNN core library performs on its inputs. Below is a description of how I believe validation is performed within ArmNN and the problems that I see with this. This understanding may be flawed so please correct me where I have misunderstood.
When the user creates an INetwork there is minimal validation of the data provided by the user. For example, the dimensionality of input tensors is not checked at this point. The user then calls Optimize() which performs the following steps:
1. InferTensorInfos() - this calls ValidateTensorShapesFromInputs on each Layer in the Graph, which confirms that the output tensor shape set on each Layer during Network construction is consistent with the Layer's inputs. For the example of a FullyConnectedLayer, this uses the shape of the input and the shape of the weights to determine the correct output shape. This code seems to make assumptions about the dimensionality of the input tensors; for example, FullyConnectedLayer::InferOutputShapes() indexes into the input and weight shapes without checking their dimensionality first.
2. AssignBackends() - this calls each backend's IsLayerSupported() APIs. The only data that has been validated so far is that the output shapes of each layer are correct, so the backend IsLayerSupported() APIs cannot assume anything about the shapes of the tensors. This means the backends must perform additional validation.
3. ApplyBackendOptimizations() - this gives each backend the opportunity to "optimize" each subgraph which has been assigned to it. Again, the layers passed to the backend still have not been properly validated, although the backend has had the chance to reject the layers via the IsLayerSupported() APIs.
The user then creates a LoadedNetwork from the IOptimizedNetwork, which creates the Workloads. This is delegated to the backend's IWorkloadFactory, which is responsible for returning an object implementing IWorkload. In the case of the default backends (reference, Neon, CL), these workloads derive from BaseWorkload, which calls Validate() on the QueueDescriptor for that workload type. This is the place that seems to perform the "proper" validation of what is supported by ArmNN. In the example of Fully Connected, FullyConnectedQueueDescriptor::Validate checks the dimensionality of all tensors, the quantisation infos, etc. Note that there seems to be no requirement that this validation code is called at all, in the case that the backend-created workloads do not inherit BaseWorkload (this is always the case for backends which replace subgraphs with PreCompiledLayers).
The problems that this causes are as follows:
* The InferTensorInfos() code could crash as it makes assumptions that have not been validated
* Every backend's IsLayerSupported APIs must duplicate the validation code in ArmNN in order to check that the layer is valid, before they even get to the point of checking if that particular backend supports it.
* The "proper" ArmNN layer validation code may never be run, depending on how the backend processes the graph. Specifically, in the case of backends which replace subgraphs with PreCompiledLayers, the validation code is never run.
* These problems affect both end users of the ArmNN API and backend developers
I would suggest that a better method of validation would be to validate the INetwork completely before it is processed any further. This could be done during construction of the INetwork or as the first step in Optimize(). This would simplify the backend code as it would not need to duplicate ArmNN's validation code and give a more consistent interface to end users.
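To illustrate the duplication point above, here is a rough sketch (a hypothetical custom backend, and an approximate signature rather than the real ILayerSupport one) of how generic checks currently have to precede the genuinely backend-specific ones:

#include <armnn/Descriptors.hpp>
#include <armnn/Optional.hpp>
#include <armnn/Tensor.hpp>
#include <armnn/Types.hpp>
#include <string>

// Hypothetical support check for a custom NPU backend. The first block repeats generic
// ArmNN-level validation that every backend currently has to duplicate defensively.
bool IsFullyConnectedSupportedOnMyNpu(const armnn::TensorInfo& input,
                                      const armnn::TensorInfo& /*output*/,
                                      const armnn::TensorInfo& weights,
                                      const armnn::TensorInfo& /*biases*/,
                                      const armnn::FullyConnectedDescriptor& /*descriptor*/,
                                      armnn::Optional<std::string&> reasonIfUnsupported)
{
    // Generic validation (dimensionality) duplicated from what ArmNN itself should guarantee.
    if (input.GetNumDimensions() < 2 || weights.GetNumDimensions() != 2)
    {
        if (reasonIfUnsupported.has_value())
        {
            reasonIfUnsupported.value() = "Unexpected tensor dimensionality";
        }
        return false;
    }
    // Only now the genuinely backend-specific restriction (made up for this example).
    if (input.GetDataType() != armnn::DataType::QuantisedAsymm8)
    {
        if (reasonIfUnsupported.has_value())
        {
            reasonIfUnsupported.value() = "This NPU only supports 8-bit quantised tensors";
        }
        return false;
    }
    return true;
}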
Please let me know your thoughts,
Thanks,
Rob
Hi
I'd like to submit for consideration and discussion the following
proposed backend API design to address some of the current limitations
regarding excessive mem copies and sub-optimal memory behavior in Arm
NN. This design also lays the foundation for future roadmap items to
address protected content and affects backend authors only.
One open question which I would like feedback on is "how important is
backward compatibility and stability of this backend API?". I believe
it should be possible to keep existing backends working though it
would be far simpler from an implementation and testing perspective if
we could implement this in an API breaking way. Of course, if this is
unacceptable for the community we will endeavor to maintain the
current API (though deprecated) alongside the new API for at least
one release cycle. As the API matures, I expect this type of
intrusive change to become far less common.
...
So why change the API?
The current design requires that all tensors are allocated by the
backend which executes the workload. The workload inputs and outputs
are allocated by the backend via the workload factory interface. In
order for inter-backend compatibility to work, all TensorHandles are
required to implement Map/UnMap methods which expose the raw CPU
accessible pointer. A standard mem copy is then applied to copy the
data from one tensor type to another using these mapped tensors. This
copy is even performed in situations where different backends could
potentially use the same TensorHandle type, making the mem copy
redundant. The current mechanism is not sufficient to cover all the
multiple types of heaps that may be available on a system or the
different usage patterns required for optimal performance.
What follows is a design which should enable the ArmNN framework to
minimize the number of mem copies required when transitioning between
different backends while also allowing backends to use their optimal
heaps while maintaining compatibility and correct functionality.
Design
There are two aspects to this design:
a mechanism to query tensor compatibility between backends
a mechanism to select and allocate the best compatible tensor type.
TensorHandle Factory
This design introduces a new interface class ITensorHandleFactory
which exposes the following methods:
virtual std::unique_ptr<ITensorHandle> CreateSubTensorHandle(ITensorHandle& parent,
                                                             TensorShape const& subTensorShape,
                                                             unsigned int const* subTensorOrigin) const = 0;
virtual std::unique_ptr<ITensorHandle> CreateTensorHandle(const TensorInfo& tensorInfo) const = 0;
virtual const FactoryId GetId() const = 0;
These methods are currently located on the IWorkloadFactory interface.
By moving this interface onto a new dedicated class, it becomes
possible for backends to implement multiple factories, each with
different TensorHandle properties.
FactoryId
Each TensorHandleFactory has a globally unique identifier string. This
should take the form of "VendorId/BackendName/FactoryName".
Multiple factories
It should be possible for a backend to support multiple TensorHandle
types, each with different access properties. For example, a discrete
GPU might have GPU memory tensors (which are not mappable but provide
fast read/write access by the GPU) and staging Tensors (which are
mappable and slower access). In this scenario, the framework should
use the GPU tensors between workloads which execute on the GPU, and
staging Tensors which transition between the GPU and another backend.
Another scenario where this would be useful is for vendors with
proprietary formats/compression/layout where these tensors would not
be compatible with other backends. The current design cannot support
these easily.
TensorHandleFactoryRegistry
Each backend will register its TensorHandleFactory objects as well as
any IMemoryManager objects they might require. There is a new method
on the IBackendInternal interface which backend authors need to
implement.
virtual void RegisterTensorHandleFactories(class TensorHandleFactoryRegistry& registry) {}
The implementation of this method needs to create the concrete factory
and memory manager instances and register them via the following
methods on the ITensorHandleFactoryRegistry parameter object.
void RegisterFactory(std::unique_ptr<ITensorHandleFactory> factory);
void RegisterMemoryManager(std::weak_ptr<IMemoryManager> memoryManger);
Note: The registry currently takes ownership of the factories but only
keeps a weak ptr to the memory manager. The exact detail of this
interface is not final and could change regarding ownership.
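To make the registration flow concrete, here is a short sketch of what a backend author's implementation might look like against the proposed interface - MyBackend, MyMemoryManager and the two factory types are hypothetical placeholder classes, and the ownership details follow the caveat above:

// Sketch against the proposed interface; MyMemoryManager, MyDeviceTensorHandleFactory and
// MyStagingTensorHandleFactory are hypothetical backend-defined types.
void MyBackend::RegisterTensorHandleFactories(TensorHandleFactoryRegistry& registry)
{
    // One shared memory manager; the registry only keeps a weak_ptr, so the backend
    // holds the shared_ptr (e.g. in a member) to keep it alive.
    auto memoryManager = std::make_shared<MyMemoryManager>();
    m_MemoryManager = memoryManager;
    registry.RegisterMemoryManager(memoryManager);

    // Register a fast device-only factory and a mappable staging factory.
    registry.RegisterFactory(std::make_unique<MyDeviceTensorHandleFactory>(memoryManager));
    registry.RegisterFactory(std::make_unique<MyStagingTensorHandleFactory>(memoryManager));
}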
TensorHandleFactory preferences
In some scenarios, such as on a system with a Unified Memory
Architecture and compatible APIs, it might be possible for two
different backends to be able to access Tensors of the same
TensorHandle type. For example, The CpuAcc (Neon) backend can work
just as well using tensors allocated by the GpuAcc (CL) backend. In
order to support this in a generic way the backend will be able to
report a list of known TensorHandleFactory instances that it is
compatible with. To support this, the following method is added to the
IBackendInternal interface.
virtual std::vector<ITensorHandleFactory::FactoryId>
GetHandleFactoryPreferences() const = 0;
This method should return, in preference order, the FactoryId of any
factories (including its own) with which the backend is compatible.
The ranking is in the order from highest performance to highest
compatibility.
In the discrete GPU example, the GPU-only tensor factory would be
first on the list and the tensor factory which supports Map/Unmap
would be second.
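As a made-up illustration of that preference list for the discrete GPU example (the factory IDs are invented, following the "VendorId/BackendName/FactoryName" convention above):

#include <string>
#include <vector>

// FactoryId is a globally unique string in the proposal; these particular IDs are made up.
using FactoryId = std::string;

// Hypothetical preference order for a discrete GPU backend: highest performance first,
// highest compatibility (mappable staging tensors) last.
std::vector<FactoryId> GetMyGpuHandleFactoryPreferences()
{
    return {
        "MyVendor/MyGpu/DeviceTensorHandleFactory",  // GPU-only memory, not mappable, fastest
        "MyVendor/MyGpu/StagingTensorHandleFactory"  // mappable, usable when crossing backends
    };
}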
TensorHandleFactory properties
There will be additional methods on this ITensorHandleFactory
interface to query the properties of the TensorHandles allocated by
the factory (exact API TBD). These properties will be queried by the
Optimizer when coming up with a tensor handle strategy for "optimal
performance".
Some example properties might be:
SupportsSubTensors - Equivalent to existing functionality on the
IWorkloadFactory
SupportsMapUnmap - Currently Map/Unmap support is required; however,
this will likely become optional in the future.
SupportsMemoryImport - The mem copy of inputs could be removed for
scenarios where TensorHandles can import externally allocated memory.
SupportsMemoryExport - The mem copy between different backends could
be removed for scenarios where the two backends support memory export
and memory import respectively.
The framework will use these properties to determine the best strategy
for allocation (i.e. which factory to use or when to insert mem copies)
and to identify unsupported/invalid scenarios (i.e. no compatible
factories found).
MemoryTypes
For memory import and export scenarios, we will limit this to CPU
addressable memory for this initial implementation. In the future we
can add support for import from Dma_buf or IonBuffer and even
protected DmaBuf.
...
I hope you'll agree that this design opens a lot of potential for
improved flexibility and performance. I look forward to further
discussions on this subject.
Kind regards,
Derek
Hi all,
We've set up a persistent IRC channel on FreeNode called #mlplatform for random chat about Arm NN and Compute Library development.
All the best,
Matthew
Thanks Nicolas, very interesting.
Caveat: I'm on holiday without a computer at the moment so can't check anything :-)
That string copying one looks like a false positive for the warning; we can probably rearrange the code to avoid it. Maybe the documentation for the warning has some advice.
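One possible rearrangement, as a sketch (assuming, as the current code seems to, that truncatedString points to a buffer with room for maxLength characters plus a terminator): copy with memcpy using the pre-computed length, so the bound no longer looks like a strncpy bound derived from the source length.

#include <algorithm>
#include <cstring>

// Sketch of a CopyErrorMessage variant that sidesteps the -Wstringop-overflow warning;
// assumes the destination can hold maxLength characters plus a null terminator.
void CopyErrorMessage(char* truncatedString, const char* fullString, size_t maxLength)
{
    if (truncatedString == nullptr || fullString == nullptr)
    {
        return;
    }
    size_t copyLength = std::min(maxLength, std::strlen(fullString));
    std::memcpy(truncatedString, fullString, copyLength); // plain copy of a known length
    truncatedString[copyLength] = '\0';                   // explicit termination
}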
The exception catch in the other one should be a catch by const reference (i.e. catch (const InvalidArgumentException& e)).
On the code size thing, I imagine what we have is a few clusters of highly related symbols. For example, the IsLayerSupported functions have common backend wrangling and error handling in each that maybe we could factor out?
All the best and happy Christmas!
Matthew
On 19 Dec 2018 04:44, Nicolas Pitre <nicolas.pitre(a)linaro.org> wrote:
Hello everybody,
Before we all go into Xmas mode and things start to fizzle out of my
head, here's a quick summary of my observations so far. Any comments
welcome.
To start with, Arm NN does not compile successfully with gcc version 8.2.1.
The first error to be hit is:
/home/nico/armnn/src/armnn/LayerSupport.cpp: In function ‘void armnn::{anonymous}::CopyErrorMessage(char*, const char*, size_t)’:
/home/nico/armnn/src/armnn/LayerSupport.cpp:30:21: error: ‘char* strncpy(char*, const char*, size_t)’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
std::strncpy(truncatedString, fullString, copyLength);
~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nico/armnn/src/armnn/LayerSupport.cpp:29:55: note: length computed here
size_t copyLength = std::min(maxLength, strlen(fullString));
~~~~~~^~~~~~~~~~~~
In function ‘void armnn::{anonymous}::CopyErrorMessage(char*, const char*, size_t)’,
inlined from ‘bool armnn::IsSpaceToBatchNdSupported(const armnn::BackendId&, const armnn::TensorInfo&, const armnn::TensorInfo&, const armnn::SpaceToBatchNdDescriptor&, char*, size_t)’ at /home/nico/armnn/src/armnn/LayerSupport.cpp:342:5:
/home/nico/armnn/src/armnn/LayerSupport.cpp:30:21: error: ‘char* strncpy(char*, const char*, size_t)’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
std::strncpy(truncatedString, fullString, copyLength);
~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nico/armnn/src/armnn/LayerSupport.cpp: In function ‘bool armnn::IsSpaceToBatchNdSupported(const armnn::BackendId&, const armnn::TensorInfo&, const armnn::TensorInfo&, const armnn::SpaceToBatchNdDescriptor&, char*, size_t)’:
/home/nico/armnn/src/armnn/LayerSupport.cpp:29:55: note: length computed here
size_t copyLength = std::min(maxLength, strlen(fullString));
~~~~~~^~~~~~~~~~~~
The build progresses a bit further when using -Wno-stringop-overflow.
However it then fails on this:
/home/nico/armnn/src/armnn/LayerSupport.cpp: In function ‘bool armnn::IsActivationSupported(const armnn::BackendId&, const armnn::TensorInfo&, const armnn::TensorInfo&, const armnn::ActivationDescriptor&, char*, size_t)’:
/home/nico/armnn/src/armnn/LayerSupport.cpp:60:39: error: catching polymorphic type ‘class armnn::InvalidArgumentException’ by value [-Werror=catch-value=]
} catch (InvalidArgumentException e) { \
^
/home/nico/armnn/src/armnn/LayerSupport.cpp:78:5: note: in expansion of macro ‘FORWARD_LAYER_SUPPORT_FUNC’
FORWARD_LAYER_SUPPORT_FUNC(backend, IsActivationSupported, input, output, descriptor);
^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nico/armnn/src/armnn/LayerSupport.cpp: In function ‘bool armnn::IsAdditionSupported(const armnn::BackendId&, const armnn::TensorInfo&, const armnn::TensorInfo&, const armnn::TensorInfo&, char*, size_t)’:
/home/nico/armnn/src/armnn/LayerSupport.cpp:60:39: error: catching polymorphic type ‘class armnn::InvalidArgumentException’ by value [-Werror=catch-value=]
} catch (InvalidArgumentException e) { \
^
/home/nico/armnn/src/armnn/LayerSupport.cpp:93:5: note: in expansion of macro ‘FORWARD_LAYER_SUPPORT_FUNC’
FORWARD_LAYER_SUPPORT_FUNC(backend, IsAdditionSupported, input0, input1, output);
^~~~~~~~~~~~~~~~~~~~~~~~~~
[...]
My C++-fu is not yet up to snuff to make sense of this, so I gave up and
moved the whole thing to a build environment with gcc version 6.3.0
instead, where the build completed successfully. It would be a good idea if someone could address the above errors properly.
Now looking at the binary size. I configured out all parsers and used
the smallest ACL config (no Neon, etc) to keep things simple. I got:
$ ls -l libarmnn.so
-rwxr-xr-x 1 nico nico 2816920 Dec 14 13:53 libarmnn.so
$ size libarmnn.so
text data bss dec hex filename
2080167 69088 2436 2151691 20d50b libarmnn.so
Finding out where that 2080167 bytes of text (which also includes
rodata) is distributed should be interesting.
After some scripting, I got the following list of symbols sorted by
their size:
Type Size Symbol
T 20288 armnn::IWorkloadFactory::IsLayerSupported(armnn::BackendId const&, armnn::IConnectableLayer const&, armnn::Optional<armnn::DataType>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
T 16288 _init
d 13840 typeinfo for boost::system::(anonymous namespace)::system_error_category
T 11568 armnn::Profiler::Print(std::ostream&) const
T 7784 armnn::RefLstmFloat32Workload::Execute() const
T 6056 armnn::Optimize(armnn::INetwork const&, std::vector<armnn::BackendId, std::allocator<armnn::BackendId> > const&, armnn::IDeviceSpec const&, armnn::OptimizerOptions const&, armnn::Optional<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&>)
T 5344 armnn::StringifyLayerParameters<armnn::Pooling2dDescriptor>::Serialize(std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&, armnn::Pooling2dDescriptor const&)
T 5224 armnn::Graph::Print() const
T 5112 boost::thread::physical_concurrency()
T 4624 armnn::Graph::AddCopyLayers()
T 4528 armnn::LoadedNetwork::LoadedNetwork(std::unique_ptr<armnn::OptimizedNetwork, std::default_delete<armnn::OptimizedNetwork> >)
T 4520 armnn::Layer::VerifyLayerConnections(unsigned int, armnn::CheckLocation const&) const
T 4472 armnn::Runtime::UnloadNetwork(int)
T 4128 boost::log::v2s_mt_posix::attribute_name::get_id_from_string(char const*)
t 4096 e843419@002d_000018a1_5824
t 4092 e843419@007c_00003070_c
t 4092 e843419@0041_00002011_1ed0
T 4024 armnn::SubGraphSelector::SelectSubGraphs(armnn::Graph&, std::function<bool (armnn::Layer const&)> const&)
T 3864 armnn::RefBatchNormalizationUint8Workload::Execute() const
T 3776 armnn::RefConvolution2dUint8Workload::Execute() const
[...]
This shows a long list of symbols whose size follows a pretty regular
curve towards zero. In other words, there is no obvious outlier. The
first few symbols could be investigated for their largish size, but that
wouldn't make a significant dent in the total size.
However, there are 1688 symbols with a non-zero size. That corresponds
to an average of 1274 bytes per symbol, which is not unreasonable. It's
the sheer number of them that is overwhelming. Without the ability to
parse a model at compile time, which would allow static linking of
only the necessary ops, there is hardly any way to easily scale this
down.
Quick observation: the size of boost related symbols alone is 190416 bytes.
That's it for now. Once again, please feel free to comment.
Nicolas