Hi!
There is a branch, experimental/pyarmnn, created by Matthew Bentham, which contains Python wrappers for armnn. It initially seems to work pretty well: building a whl archive works, the archive can be installed using pip, and I was able to write an example that runs inference on a float/quantized model using all the supported frameworks - tf, tf-lite, caffe and onnx. What is missing is to get the Python wrappers integrated, to run and check the unit tests, and to write a few examples. We have already discussed this with Matthew, but I would be glad to hear more opinions on how we should proceed and to kick off a discussion.
1. How to integrate pyarmnn?
There are 2 paths initially:
1. Build pyarmnn together with armnn using a single cmake command
* By default it would be turned off; otherwise it would be built using e.g. -DBUILD_PYARMNN
* The product is either a whl or a src package - so should there be 2 options, e.g. -DBUILD_PYARMNN_SRC and -DBUILD_PYARMNN_WHL, or only a single one that always builds both?
2. Separate pyarmnn from armnn into a different repository (and keep it as a separate project)
* In addition to the options from 1, -DARMNN_LIB and -DARMNN_INCLUDE would be required as well, so that pyarmnn can be "linked" against a configurable armnn build
The difference is mainly in maintainability - option 1 forces us to maintain pyarmnn and update the SWIG files that generate the wrappers for every release; option 2, on the other hand, keeps the project separate, allows pyarmnn to be built against a configurable armnn release, and doesn't create a dependency to update the SWIG files whenever the armnn interface changes a little.
2. Remove tox? Yes/No - tox is a Python automation tool which is used to generate the wrappers and to run the unit tests. It is not really needed, because the wrappers can be generated directly using SWIG and the src/whl packages built using python/setuptools, and it just creates another dependency. The unit tests can also be run directly using Python.
3. Get pyarmnn published on pypi.org? Yes/No - we would then be able to install pyarmnn using "pip install pyarmnn".
Any additional ideas, comments, feedback etc. would of course be appreciated.
Thanks!
Pavel M
Hi all,
Regarding the ILayerSupport interface in ILayerSupport.hpp, most of the methods have output TensorInfos. Some of the methods (e.g. IsDetectionPostProcessSupported) don't have output infos. This caused an issue in our custom backend because we were unable to check the output tensor info and reject the layer properly. I think it should be possible to have this information for all layers. What do you think?
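To make the idea concrete, here is a rough sketch of the kind of output-aware signature I have in mind - this is not the current ILayerSupport declaration, and the parameter names are only illustrative:

#include <armnn/Descriptors.hpp>
#include <armnn/Optional.hpp>
#include <armnn/Tensor.hpp>
#include <string>

// Hypothetical sketch only: an output-aware variant of the support check, so a custom
// backend can inspect (and reject) the output TensorInfos as well as the inputs.
struct IOutputAwareLayerSupport
{
    virtual ~IOutputAwareLayerSupport() = default;

    virtual bool IsDetectionPostProcessSupported(
        const armnn::TensorInfo& boxEncodings,                      // input
        const armnn::TensorInfo& scores,                            // input
        const armnn::TensorInfo& anchors,                           // input
        const armnn::TensorInfo& detectionBoxes,                    // output info to check
        const armnn::TensorInfo& detectionScores,                   // output info to check
        const armnn::DetectionPostProcessDescriptor& descriptor,
        armnn::Optional<std::string&> reasonIfUnsupported) const = 0;
};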
Thanks,
Josh
Hello Derek,
Is this issue still open? If it is, can I work on it?
On Mon, Oct 28, 2019 at 5:30 PM <armnn-dev-request(a)lists.linaro.org> wrote:
> Date: Mon, 28 Oct 2019 10:31:11 +0000
> From: Derek Lamberti <derek.lamberti(a)linaro.org>
> To: Rahul Chowdhury <rahul.c(a)pathpartnertech.com>
> Cc: Manjunath Kulkarni <manjunath.kulkarni(a)pathpartnertech.com>,
> armnn-dev(a)lists.linaro.org
> Subject: Re: [Armnn-dev] ArmNN | ONXX model load issue
>
> Hi Rahul,
>
>
> ArmNN doesn't support zero dimension tensors implicitly. Often this
> can be resolved by converting the tensor to a 1D tensor with 1
> element. We have done this conversion automatically within the TfLite
> parser and this has worked for a particular use case we ran into. A
> similar solution might work for your use case too. This could be done
> within the ToTensorInfo() function in OnnxParser.cpp. If this resolves
> the issue for you I'd recommend issuing a pull request so that we can
> integrate it into master.
>
>
> Hope that helps,
> ~Derek
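A minimal standalone sketch of the conversion described above - promoting a zero-dimensional (scalar) tensor to a 1-D tensor with one element before building the armnn::TensorInfo; this illustrates the idea rather than the actual ToTensorInfo() change in OnnxParser.cpp:

#include <armnn/Tensor.hpp>
#include <vector>

// Sketch: build a TensorInfo from parsed ONNX dimensions, treating a scalar
// (zero dimensions) as a 1-D tensor with a single element, since
// armnn::TensorShape requires at least one dimension.
armnn::TensorInfo MakeTensorInfo(std::vector<unsigned int> dims, armnn::DataType dataType)
{
    if (dims.empty())
    {
        dims.push_back(1u); // promote scalar to shape {1}
    }
    return armnn::TensorInfo(static_cast<unsigned int>(dims.size()), dims.data(), dataType);
}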
Hi,
We are using ArmNN to cross-compile a standalone C++ application on Linux
that loads a standard onnx model. During the model loading, we see a crash
with the below error output -
terminate called after throwing an instance of
'armnn::InvalidArgumentException'
what(): Tensor numDimensions must be greater than 0
Initially we were on armnn master, and later we switched to tag v19.05, but
the error was the same for both.
Below is the code snippet to load the model -
armnnOnnxParser::IOnnxParserPtr parser = armnnOnnxParser::IOnnxParser::Create();
std::cout << "\nmodel load start";
armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile("onnx_3DDFA.onnx");
std::cout << "\nmodel load end";
It crashes after printing "model load start" with the error message printed
above.
A gdb backtrace is also provided below -
(gdb) r
Starting program:
/home/root/Rahul/armnn_onnx/3DDFA_ArmNN_onnx/3ddfa_armnn_onnx
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
terminate called after throwing an instance of
'armnn::InvalidArgumentException'
what(): Tensor numDimensions must be greater than 0
model load start
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at
/usr/src/debug/glibc/2.26-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51 }
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at
/usr/src/debug/glibc/2.26-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1 0x0000ffffbe41df00 in __GI_abort () at
/usr/src/debug/glibc/2.26-r0/git/stdlib/abort.c:90
#2 0x0000ffffbe6aa0f8 in __gnu_cxx::__verbose_terminate_handler() () from
/usr/lib/libstdc++.so.6
#3 0x0000ffffbe6a7afc in ?? () from /usr/lib/libstdc++.so.6
#4 0x0000ffffbe6a7b50 in std::terminate() () from /usr/lib/libstdc++.so.6
#5 0x0000ffffbe6a7e20 in __cxa_throw () from /usr/lib/libstdc++.so.6
#6 0x0000ffffbefdad84 in armnn::TensorShape::TensorShape(unsigned int,
unsigned int const*) () from /home/root/Rahul/armnn_onnx/build/libarmnn.so
#7 0x0000ffffbe7e34d8 in armnnOnnxParser::(anonymous
namespace)::ToTensorInfo(onnx::ValueInfoProto const&) [clone
.constprop.493] () from
/home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#8 0x0000ffffbe7e4080 in
armnnOnnxParser::OnnxParser::SetupInfo(google::protobuf::RepeatedPtrField<onnx::ValueInfoProto>
const*) () from /home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#9 0x0000ffffbe7e41ac in armnnOnnxParser::OnnxParser::LoadGraph() () from
/home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#10 0x0000ffffbe7e4760 in
armnnOnnxParser::OnnxParser::CreateNetworkFromModel(onnx::ModelProto&) ()
from /home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#11 0x0000ffffbe7e49b0 in
armnnOnnxParser::OnnxParser::CreateNetworkFromBinaryFile(char const*) ()
from /home/root/Rahul/armnn_onnx/build/libarmnnOnnxParser.so
#12 0x0000000000402290 in main ()
(gdb)
Can someone point out if we are missing something or doing something wrong? Any help or input is highly appreciated.
Regards,
Rahul
Hi,
I'm trying to send a minor ArmNN patch for review, but I ran into the authentication failure below when running 'git review' (I added the Gerrit remote with 'git remote add gerrit https://review.mlplatform.org/ml/armnn').
remote: Unauthorized
fatal: Authentication failed for 'https://review.mlplatform.org/ml/armnn/'
I can log in to the Gerrit server with the same username/password. Is there any special permission required? I cannot find any related information on the mlplatform.org website.
Please let me know if I missed something.
Thanks,
Jammy
Hi Derek,
Thanks for your reply and I'm glad we agree on this.
Are there any tickets/issues which I can use to track the changes that you have suggested?
Thanks,
Rob
From: Derek Lamberti <Derek.Lamberti(a)arm.com>
Sent: 13 August 2019 14:01
To: Matthew Bentham <Matthew.Bentham(a)arm.com>; Robert Hughes <Robert.Hughes(a)arm.com>; Armnn-dev(a)lists.linaro.org
Subject: Re: Validation of inputs
Hi Rob,
Yes, I think this is certainly an area where we should do better:
1. For completeness, there are different levels of validation that need to occur, and this can be different from the validation performed by the backend::IsSupported() functions. For example, IsSupported only needs to report what is valid for that backend's implementation, which may cover only a subset of the full ArmNN specification for the layer - which is also worth bearing in mind.
2. There are different stages where we should perform the validation.
* On the input graph during Graph building (much like you suggested). This would be validation against the ArmNN spec and would indicate to the user immediately (at the point of error) that they have tried to add an invalid op.
* On the LoadedNetwork during workload creation. This is essentially what the current code does and is a validation against the ArmNN spec. However, it's currently performed during the construction of the workloads, and should instead be called by the ArmNN framework just before, which would be independent of workload implementation. I would also make this check for debug builds only. It's useful to validate that all the graph transformations to this point have been valid.
3. Furthermore, there are different stages where we could perform further validation.
* In the optimizer (post-backend-optimization) to verify the optimized result. This is essentially the same as 2b (above) but earlier in the pipe and would give a better user experience. It is required because backend optimization implementations could produce an invalid graph. If we do it early enough in the optimizer pipeline, we could use it to reject invalid optimizations from the backends, and fall back to the next backend instead of failing outright.
* It would remain up to the backend implementations to verify that the workloads created are compatible with their implementations. This is similar to IsSupported() but during workload creation (like we are doing now). This could also be made a debug-only option.
4. InferTensorInfos should certainly be safe code. We will soon be updating this code so that it can be used to actually infer the shapes for tensors where the shape is unknown in the model (rather than just for validation). Suffice it to say, the current implementation could be safer.
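To illustrate the kind of defensive check meant in point 4, here is a minimal sketch of a shape-inference helper that rejects unexpected dimensionality instead of indexing blindly - a hypothetical helper rather than the actual FullyConnectedLayer::InferOutputShapes, and it assumes a [inputSize, outputSize] weight layout:

#include <armnn/Exceptions.hpp>
#include <armnn/Tensor.hpp>
#include <vector>

// Hypothetical safe shape inference for a FullyConnected-style layer: validate the
// dimensionality before indexing into the shapes (assumes weights are [inputSize, outputSize]).
std::vector<armnn::TensorShape> InferFullyConnectedOutputShapeSafe(const armnn::TensorShape& input,
                                                                   const armnn::TensorShape& weights)
{
    if (input.GetNumDimensions() < 2 || weights.GetNumDimensions() != 2)
    {
        throw armnn::InvalidArgumentException(
            "FullyConnected: input must be at least 2D and weights must be 2D");
    }
    // Batch size from the input, output channel count from the weights.
    return { armnn::TensorShape({ input[0], weights[1] }) };
}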
Thanks for your feedback and keep it coming. I'm eager to make ArmNN a lot more user friendly in the coming year, so this all helps.
Regards,
~Derek
________________________________
From: Matthew Bentham <Matthew.Bentham(a)arm.com>
Sent: 13 August 2019 12:00
To: Robert Hughes <Robert.Hughes(a)arm.com>; Armnn-dev(a)lists.linaro.org
Cc: Derek Lamberti <Derek.Lamberti(a)arm.com>
Subject: Re: Validation of inputs
Thanks Rob, that does seem wrong. At first glance it looks to me like QueueDescriptor::Validate should not exist, and all that checking should move to roughly where InferTensorInfos is called now. I'll let Derek comment further.
All the best,
Matthew
________________________________
From: Armnn-dev <armnn-dev-bounces(a)lists.linaro.org> on behalf of Robert Hughes <Robert.Hughes(a)arm.com>
Sent: 13 August 2019 11:52
To: Armnn-dev(a)lists.linaro.org
Subject: [Armnn-dev] Validation of inputs
Hi ArmNN dev team,
I am part of the team developing the ArmNN backend for the Arm NPU and have some concerns about the validation that the ArmNN core library performs on its inputs. Below is a description of how I believe validation is performed within ArmNN and the problems that I see with this. This understanding may be flawed so please correct me where I have misunderstood.
When the user creates an INetwork there is minimal validation of the data provided by the user. For example, the dimensionality of input tensors is not checked at this point. The user then calls Optimize() which performs the following steps:
1. InferTensorInfos() - this calls ValidateTensorShapesFromInputs on each Layer in the Graph, which confirms that the output tensor shape set on each Layer during Network construction is consistent with the Layer's inputs. For the example of a FullyConnectedLayer, this uses the shape of the input and the shape of the weights to determine the correct output shape. This code seems to make assumptions about the dimensionality of the input tensors; for example, FullyConnectedLayer::InferOutputShapes() indexes into the input and weight shapes without checking their dimensionality first.
2. AssignBackends() - this calls each backend's IsLayerSupported() APIs. The only data that has been validated so far is that the output shapes of each layer are correct, so the backend IsLayerSupported() APIs cannot assume anything about the shapes of the tensors. This means the backends must perform additional validation.
3. ApplyBackendOptimizations() - this gives each backend the opportunity to "optimize" each subgraph which has been assigned to it. Again, the layers passed to the backend still have not been properly validated, although the backend has had the chance to reject the layers via the IsLayerSupported() APIs.
The user then creates a LoadedNetwork from the IOptimizedNetwork, which creates the Workloads. This is delegated to the backend's IWorkloadFactory, which is responsible for returning an object implementing IWorkload. In the case of the default backends (reference, Neon, CL), these workloads derive from BaseWorkload, which calls Validate() on the QueueDescriptor for that workload type. This is the place that seems to perform the "proper" validation of what is supported by ArmNN. In the example of Fully Connected, FullyConnectedQueueDescriptor::Validate checks the dimensionality of all tensors, the quantisation infos, etc. Note that there seems to be no requirement that this validation code is called at all, in the case that the backend-created workloads do not inherit BaseWorkload (this is always the case for backends which replace subgraphs with PreCompiledLayers).
The problems that this causes are as follows:
* The InferTensorInfos() code could crash as it makes assumptions that have not been validated
* Every backend's IsLayerSupported APIs must duplicate the validation code in ArmNN in order to check that the layer is valid, before they even get to the point of checking if that particular backend supports it.
* The "proper" ArmNN layer validation code may never be run, depending on how the backend processes the graph. Specifically, in the case of backends which replace subgraphs with PreCompiledLayers, the validation code is never run.
* These problems affect both end users of the ArmNN API and backend developers
I would suggest that a better method of validation would be to validate the INetwork completely before it is processed any further. This could be done during construction of the INetwork or as the first step in Optimize(). This would simplify the backend code as it would not need to duplicate ArmNN's validation code and give a more consistent interface to end users.
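To illustrate the duplication point above, here is a rough sketch (a hypothetical custom backend, and an approximate signature rather than the real ILayerSupport one) of how generic checks currently have to precede the genuinely backend-specific ones:

#include <armnn/Descriptors.hpp>
#include <armnn/Optional.hpp>
#include <armnn/Tensor.hpp>
#include <armnn/Types.hpp>
#include <string>

// Hypothetical support check for a custom NPU backend. The first block repeats generic
// ArmNN-level validation that every backend currently has to duplicate defensively.
bool IsFullyConnectedSupportedOnMyNpu(const armnn::TensorInfo& input,
                                      const armnn::TensorInfo& /*output*/,
                                      const armnn::TensorInfo& weights,
                                      const armnn::TensorInfo& /*biases*/,
                                      const armnn::FullyConnectedDescriptor& /*descriptor*/,
                                      armnn::Optional<std::string&> reasonIfUnsupported)
{
    // Generic validation (dimensionality) duplicated from what ArmNN itself should guarantee.
    if (input.GetNumDimensions() < 2 || weights.GetNumDimensions() != 2)
    {
        if (reasonIfUnsupported.has_value())
        {
            reasonIfUnsupported.value() = "Unexpected tensor dimensionality";
        }
        return false;
    }
    // Only now the genuinely backend-specific restriction (made up for this example).
    if (input.GetDataType() != armnn::DataType::QuantisedAsymm8)
    {
        if (reasonIfUnsupported.has_value())
        {
            reasonIfUnsupported.value() = "This NPU only supports 8-bit quantised tensors";
        }
        return false;
    }
    return true;
}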
Please let me know your thoughts,
Thanks,
Rob
Hi
I'd like to submit for consideration and discussion the following
proposed backend API design to address some of the current limitations
regarding excessive mem copies and sub-optimal memory behavior in Arm
NN. This design also lays the foundation for future roadmap items to
address protected content and affects backend authors only.
One open question which I would like feedback on is "how important is
backward compatibility and stability of this backend API?". I believe
it should be possible to keep existing backends working though it
would be far simpler from an implementation and testing perspective if
we could implement this in an API breaking way. Of course, if this is
unacceptable for the community we will endeavor to maintain the
current API (though deprecated) alongside the new API for at least
one release cycle. As the API matures, I expect this type of
intrusive change to become far less common.
...
So why change the API?
The current design requires that all tensors are allocated by the
backend which executes the workload. The workload inputs and outputs
are allocated by the backend via the workload factory interface. In
order for inter-backend compatibility to work, all TensorHandles are
required to implement Map/UnMap methods which expose the raw CPU
accessible pointer. A standard mem copy is then applied to copy the
data from one tensor type to another using these mapped tensors. This
copy is even performed in situations where different backends could
potentially use the same TensorHandle type, making the mem copy
redundant. The current mechanism is not sufficient to cover all the
multiple types of heaps that may be available on a system or the
different usage patterns required for optimal performance.
What follows is a design which should enable the ArmNN framework to
minimize the number of mem copies required when transitioning between
different backends while also allowing backends to use their optimal
heaps while maintaining compatibility and correct functionality.
Design
There are two aspects to this design:
a mechanism to query tensor compatibility between backends
a mechanism to select and allocate the best compatible tensor type.
TensorHandle Factory
This design introduces a new interface class ITensorHandleFactory
which exposes the following methods:
virtual std::unique_ptr<ITensorHandle> CreateSubTensorHandle(ITensorHandle& parent,
                                                             TensorShape const& subTensorShape,
                                                             unsigned int const* subTensorOrigin) const = 0;
virtual std::unique_ptr<ITensorHandle> CreateTensorHandle(const TensorInfo& tensorInfo) const = 0;
virtual const FactoryId GetId() const = 0;
These methods are currently located on the IWorkloadFactory interface.
By moving this interface onto a new dedicated class, it becomes
possible for backends to implement multiple factories, each with
different TensorHandle properties.
FactoryId
Each TensorHandleFactory has a globally unique identifier string. This
should take the form of "VendorId/BackendName/FactoryName".
Multiple factories
It should be possible for a backend to support multiple TensorHandle
types, each with different access properties. For example, a discrete
GPU might have GPU memory tensors (which are not mappable but provide
fast read/write access by the GPU) and staging Tensors (which are
mappable and slower access). In this scenario, the framework should
use the GPU tensors between workloads which execute on the GPU, and
staging Tensors which transition between the GPU and another backend.
Another scenario where this would be useful is for vendors with
proprietary formats/compression/layout where these tensors would not
be compatible with other backends. The current design cannot support
these easily.
TensorHandleFactoryRegistry
Each backend will register its TensorHandleFactory objects as well as
any IMemoryManager objects they might require. There is a new method
on the IBackendInternal interface which backend authors need to
implement.
virtual void RegisterTensorHandleFactories(class TensorHandleFactoryRegistry& registry) {}
The implementation of this method needs to create the concrete factory
and memory manager instances and register them via the following
methods on the ITensorHandleFactoryRegistry parameter object.
void RegisterFactory(std::unique_ptr<ITensorHandleFactory> factory);
void RegisterMemoryManager(std::weak_ptr<IMemoryManager> memoryManger);
Note: The registry currently takes ownership of the factories but only
keeps a weak ptr to the memory manager. The exact detail of this
interface is not final and could change regarding ownership.
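To make the registration flow concrete, here is a short sketch of what a backend author's implementation might look like against the proposed interface - MyBackend, MyMemoryManager and the two factory types are hypothetical placeholder classes, and the ownership details follow the caveat above:

// Sketch against the proposed interface; MyMemoryManager, MyDeviceTensorHandleFactory and
// MyStagingTensorHandleFactory are hypothetical backend-defined types.
void MyBackend::RegisterTensorHandleFactories(TensorHandleFactoryRegistry& registry)
{
    // One shared memory manager; the registry only keeps a weak_ptr, so the backend
    // holds the shared_ptr (e.g. in a member) to keep it alive.
    auto memoryManager = std::make_shared<MyMemoryManager>();
    m_MemoryManager = memoryManager;
    registry.RegisterMemoryManager(memoryManager);

    // Register a fast device-only factory and a mappable staging factory.
    registry.RegisterFactory(std::make_unique<MyDeviceTensorHandleFactory>(memoryManager));
    registry.RegisterFactory(std::make_unique<MyStagingTensorHandleFactory>(memoryManager));
}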
TensorHandleFactory preferences
In some scenarios, such as on a system with a Unified Memory
Architecture and compatible APIs, it might be possible for two
different backends to be able to access Tensors of the same
TensorHandle type. For example, The CpuAcc (Neon) backend can work
just as well using tensors allocated by the GpuAcc (CL) backend. In
order to support this in a generic way the backend will be able to
report a list of known TensorHandleFactory instances that it is
compatible with. To support this, the following method is added to the
IBackendInternal interface.
virtual std::vector<ITensorHandleFactory::FactoryId>
GetHandleFactoryPreferences() const = 0;
This method should return, in preference order, the FactoryId of any
factories (including its own) with which the backend is compatible.
The ranking is in the order from highest performance to highest
compatibility.
In the discrete GPU example, the GPU-only tensor factory would be
first on the list and the tensor factory which supports Map/Unmap
would be second.
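As a made-up illustration of that preference list for the discrete GPU example (the factory IDs are invented, following the "VendorId/BackendName/FactoryName" convention above):

#include <string>
#include <vector>

// FactoryId is a globally unique string in the proposal; these particular IDs are made up.
using FactoryId = std::string;

// Hypothetical preference order for a discrete GPU backend: highest performance first,
// highest compatibility (mappable staging tensors) last.
std::vector<FactoryId> GetMyGpuHandleFactoryPreferences()
{
    return {
        "MyVendor/MyGpu/DeviceTensorHandleFactory",  // GPU-only memory, not mappable, fastest
        "MyVendor/MyGpu/StagingTensorHandleFactory"  // mappable, usable when crossing backends
    };
}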
TensorHandleFactory properties
There will be additional methods on this ITensorHandleFactory
interface to query the properties of the TensorHandles allocated by
the factory (exact API TBD). These properties will be queried by the
Optimizer when coming up with a tensor handle strategy for "optimal
performance".
Some example properties might be:
SupportsSubTensors - Equivalent to existing functionality on the
IWorkloadFactory
SupportsMapUnmap - Currently Map/Unmap support is required; however,
this will likely become optional in the future.
SupportsMemoryImport - The mem copy of inputs could be removed for
scenarios where TensorHandles can import externally allocated memory.
SupportsMemoryExport - The mem copy between different backends could
be removed for scenarios where the two backends support memory export
and memory import respectively.
The framework will use these properties to determine the best strategy
for allocation (i.e. which factory to use or when to insert mem copies)
and to identify unsupported/invalid scenarios (i.e. no compatible
factories found).
MemoryTypes
For memory import and export scenarios, we will limit this to CPU
addressable memory for this initial implementation. In the future we
can add support for import from Dma_buf or IonBuffer and even
protected DmaBuf.
...
I hope you'll agree that this design opens a lot of potential for
improved flexibility and performance. I look forward to further
discussions on this subject.
Kind regards,
Derek
Hi all,
We've set up a persistent IRC channel on FreeNode called #mlplatform for random chat about Arm NN and Compute Library development.
All the best,
Matthew
Thanks Nicolas, very interesting.
Caveat: I'm on holiday without a computer at the moment so can't check anything :-)
That string copying one looks like a false positive for the warning; we can probably rearrange the code to avoid it. Maybe the documentation for the warning has some advice.
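One possible rearrangement, as a sketch (assuming, as the current code seems to, that truncatedString points to a buffer with room for maxLength characters plus a terminator): copy with memcpy using the pre-computed length, so the bound no longer looks like a strncpy bound derived from the source length.

#include <algorithm>
#include <cstring>

// Sketch of a CopyErrorMessage variant that sidesteps the -Wstringop-overflow warning;
// assumes the destination can hold maxLength characters plus a null terminator.
void CopyErrorMessage(char* truncatedString, const char* fullString, size_t maxLength)
{
    if (truncatedString == nullptr || fullString == nullptr)
    {
        return;
    }
    size_t copyLength = std::min(maxLength, std::strlen(fullString));
    std::memcpy(truncatedString, fullString, copyLength); // plain copy of a known length
    truncatedString[copyLength] = '\0';                   // explicit termination
}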
The exception catch in the other one should be a catch by const reference (i.e. catch (const InvalidArgumentException& e)).
On the code size thing, I imagine what we have is a few clusters of highly related symbols. For example, the IsLayerSupported functions have common backend wrangling and error handling in each that maybe we could factor out?
All the best and happy Christmas!
Matthew
On 19 Dec 2018 04:44, Nicolas Pitre <nicolas.pitre(a)linaro.org> wrote:
Hello everybody,
Before we all go into Xmas mode and things start to fizzle out of my
head, here's a quick summary of my observations so far. Any comments
welcome.
To start with, Arm NN does not compile successfully with gcc version 8.2.1.
The first error to be hit is:
/home/nico/armnn/src/armnn/LayerSupport.cpp: In function ‘void armnn::{anonymous}::CopyErrorMessage(char*, const char*, size_t)’:
/home/nico/armnn/src/armnn/LayerSupport.cpp:30:21: error: ‘char* strncpy(char*, const char*, size_t)’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
std::strncpy(truncatedString, fullString, copyLength);
~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nico/armnn/src/armnn/LayerSupport.cpp:29:55: note: length computed here
size_t copyLength = std::min(maxLength, strlen(fullString));
~~~~~~^~~~~~~~~~~~
In function ‘void armnn::{anonymous}::CopyErrorMessage(char*, const char*, size_t)’,
inlined from ‘bool armnn::IsSpaceToBatchNdSupported(const armnn::BackendId&, const armnn::TensorInfo&, const armnn::TensorInfo&, const armnn::SpaceToBatchNdDescriptor&, char*, size_t)’ at /home/nico/armnn/src/armnn/LayerSupport.cpp:342:5:
/home/nico/armnn/src/armnn/LayerSupport.cpp:30:21: error: ‘char* strncpy(char*, const char*, size_t)’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
std::strncpy(truncatedString, fullString, copyLength);
~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nico/armnn/src/armnn/LayerSupport.cpp: In function ‘bool armnn::IsSpaceToBatchNdSupported(const armnn::BackendId&, const armnn::TensorInfo&, const armnn::TensorInfo&, const armnn::SpaceToBatchNdDescriptor&, char*, size_t)’:
/home/nico/armnn/src/armnn/LayerSupport.cpp:29:55: note: length computed here
size_t copyLength = std::min(maxLength, strlen(fullString));
~~~~~~^~~~~~~~~~~~
The build progresses a bit further when using -Wno-stringop-overflow.
However it then fails on this:
/home/nico/armnn/src/armnn/LayerSupport.cpp: In function ‘bool armnn::IsActivationSupported(const armnn::BackendId&, const armnn::TensorInfo&, const armnn::TensorInfo&, const armnn::ActivationDescriptor&, char*, size_t)’:
/home/nico/armnn/src/armnn/LayerSupport.cpp:60:39: error: catching polymorphic type ‘class armnn::InvalidArgumentException’ by value [-Werror=catch-value=]
} catch (InvalidArgumentException e) { \
^
/home/nico/armnn/src/armnn/LayerSupport.cpp:78:5: note: in expansion of macro ‘FORWARD_LAYER_SUPPORT_FUNC’
FORWARD_LAYER_SUPPORT_FUNC(backend, IsActivationSupported, input, output, descriptor);
^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nico/armnn/src/armnn/LayerSupport.cpp: In function ‘bool armnn::IsAdditionSupported(const armnn::BackendId&, const armnn::TensorInfo&, const armnn::TensorInfo&, const armnn::TensorInfo&, char*, size_t)’:
/home/nico/armnn/src/armnn/LayerSupport.cpp:60:39: error: catching polymorphic type ‘class armnn::InvalidArgumentException’ by value [-Werror=catch-value=]
} catch (InvalidArgumentException e) { \
^
/home/nico/armnn/src/armnn/LayerSupport.cpp:93:5: note: in expansion of macro ‘FORWARD_LAYER_SUPPORT_FUNC’
FORWARD_LAYER_SUPPORT_FUNC(backend, IsAdditionSupported, input0, input1, output);
^~~~~~~~~~~~~~~~~~~~~~~~~~
[...]
My C++-fu is not yet up to snuff to make sense of this, so I gave up and
moved the whole thing to a build environment with gcc version 6.3.0
instead, where the build completed successfully. It would be a good idea if someone could address the above errors properly.
Now looking at the binary size. I configured out all parsers and used
the smallest ACL config (no Neon, etc) to keep things simple. I got:
$ ls -l libarmnn.so
-rwxr-xr-x 1 nico nico 2816920 Dec 14 13:53 libarmnn.so
$ size libarmnn.so
text data bss dec hex filename
2080167 69088 2436 2151691 20d50b libarmnn.so
Finding out where that 2080167 bytes of text (which also includes
rodata) is distributed should be interesting.
After some scripting, I got the following list of symbols sorted by
their size:
Type Size Symbol
T 20288 armnn::IWorkloadFactory::IsLayerSupported(armnn::BackendId const&, armnn::IConnectableLayer const&, armnn::Optional<armnn::DataType>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
T 16288 _init
d 13840 typeinfo for boost::system::(anonymous namespace)::system_error_category
T 11568 armnn::Profiler::Print(std::ostream&) const
T 7784 armnn::RefLstmFloat32Workload::Execute() const
T 6056 armnn::Optimize(armnn::INetwork const&, std::vector<armnn::BackendId, std::allocator<armnn::BackendId> > const&, armnn::IDeviceSpec const&, armnn::OptimizerOptions const&, armnn::Optional<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&>)
T 5344 armnn::StringifyLayerParameters<armnn::Pooling2dDescriptor>::Serialize(std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&, armnn::Pooling2dDescriptor const&)
T 5224 armnn::Graph::Print() const
T 5112 boost::thread::physical_concurrency()
T 4624 armnn::Graph::AddCopyLayers()
T 4528 armnn::LoadedNetwork::LoadedNetwork(std::unique_ptr<armnn::OptimizedNetwork, std::default_delete<armnn::OptimizedNetwork> >)
T 4520 armnn::Layer::VerifyLayerConnections(unsigned int, armnn::CheckLocation const&) const
T 4472 armnn::Runtime::UnloadNetwork(int)
T 4128 boost::log::v2s_mt_posix::attribute_name::get_id_from_string(char const*)
t 4096 e843419@002d_000018a1_5824
t 4092 e843419@007c_00003070_c
t 4092 e843419@0041_00002011_1ed0
T 4024 armnn::SubGraphSelector::SelectSubGraphs(armnn::Graph&, std::function<bool (armnn::Layer const&)> const&)
T 3864 armnn::RefBatchNormalizationUint8Workload::Execute() const
T 3776 armnn::RefConvolution2dUint8Workload::Execute() const
[...]
This shows a long list of symbols whose size follows a pretty regular
curve towards zero. In other words, there is no obvious outlier. The
first few symbols could be investigated for their largish size, but that
wouldn't make a significant dent in the total size.
However, there are 1688 symbols with a non-zero size. That corresponds
to an average of 1274 bytes per symbol, which is not unreasonable. It's
the sheer number of them that is overwhelming. Without the ability to
parse a model at compile time, which would allow static linking of
only the necessary ops, there is hardly any way to easily scale this
down.
Quick observation: the size of boost related symbols alone is 190416 bytes.
That's it for now. Once again, please feel free to comment.
Nicolas