Thanks Nicolas, very interesting.
Caveat: I'm on holiday without a computer at the moment so can't check anything :-)
That string copying one looks like a false positive for the warning, we can probably rearrange the code to avoid it. Maybe the documentation for the warning has some advice.
The exception catch in the other one should be a catch by const reference (i.e. (const InvalidArgumentException& e)).
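A minimal sketch of what that means, using illustrative stand-ins for the ArmNN exception types (the names are modelled on the thread, not the real class definitions). Catching by const reference avoids copying the exception object, and it avoids slicing when a derived exception is caught through a base-class handler:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the ArmNN exception hierarchy.
struct Exception : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

struct InvalidArgumentException : Exception
{
    using Exception::Exception;
};

std::string CheckArgument(int n)
{
    try
    {
        if (n < 0)
        {
            throw InvalidArgumentException("negative argument");
        }
        return "ok";
    }
    catch (const InvalidArgumentException& e) // by const ref: no copy, no slicing
    {
        return e.what();
    }
}
```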
On the code size thing, I imagine what we have is a few clusters of highly related symbols. For example, the IsLayerSupported functions have common backend wrangling and error handling in each that maybe we could factor out?
All the best and happy Christmas! Matthew
On 19 Dec 2018 04:44, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:

Hello everybody,
Before we all go into Xmas mode and things start to fizzle out of my head, here's a quick summary of my observations so far. Any comments welcome.
To start with, Arm NN does not compile successfully with gcc version 8.2.1.
The first error to be hit is:
/home/nico/armnn/src/armnn/LayerSupport.cpp: In function ‘void armnn::{anonymous}::CopyErrorMessage(char*, const char*, size_t)’:
/home/nico/armnn/src/armnn/LayerSupport.cpp:30:21: error: ‘char* strncpy(char*, const char*, size_t)’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
     std::strncpy(truncatedString, fullString, copyLength);
     ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nico/armnn/src/armnn/LayerSupport.cpp:29:55: note: length computed here
     size_t copyLength = std::min(maxLength, strlen(fullString));
                                             ~~~~~~^~~~~~~~~~~~
In function ‘void armnn::{anonymous}::CopyErrorMessage(char*, const char*, size_t)’,
    inlined from ‘bool armnn::IsSpaceToBatchNdSupported(const armnn::BackendId&, const armnn::TensorInfo&, const armnn::TensorInfo&, const armnn::SpaceToBatchNdDescriptor&, char*, size_t)’ at /home/nico/armnn/src/armnn/LayerSupport.cpp:342:5:
/home/nico/armnn/src/armnn/LayerSupport.cpp:30:21: error: ‘char* strncpy(char*, const char*, size_t)’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
     std::strncpy(truncatedString, fullString, copyLength);
     ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nico/armnn/src/armnn/LayerSupport.cpp: In function ‘bool armnn::IsSpaceToBatchNdSupported(const armnn::BackendId&, const armnn::TensorInfo&, const armnn::TensorInfo&, const armnn::SpaceToBatchNdDescriptor&, char*, size_t)’:
/home/nico/armnn/src/armnn/LayerSupport.cpp:29:55: note: length computed here
     size_t copyLength = std::min(maxLength, strlen(fullString));
                                             ~~~~~~^~~~~~~~~~~~
The build progresses a bit further when using -Wno-stringop-overflow.
However it then fails on this:
/home/nico/armnn/src/armnn/LayerSupport.cpp: In function ‘bool armnn::IsActivationSupported(const armnn::BackendId&, const armnn::TensorInfo&, const armnn::TensorInfo&, const armnn::ActivationDescriptor&, char*, size_t)’:
/home/nico/armnn/src/armnn/LayerSupport.cpp:60:39: error: catching polymorphic type ‘class armnn::InvalidArgumentException’ by value [-Werror=catch-value=]
     } catch (InvalidArgumentException e) { \
                                       ^
/home/nico/armnn/src/armnn/LayerSupport.cpp:78:5: note: in expansion of macro ‘FORWARD_LAYER_SUPPORT_FUNC’
     FORWARD_LAYER_SUPPORT_FUNC(backend, IsActivationSupported, input, output, descriptor);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nico/armnn/src/armnn/LayerSupport.cpp: In function ‘bool armnn::IsAdditionSupported(const armnn::BackendId&, const armnn::TensorInfo&, const armnn::TensorInfo&, const armnn::TensorInfo&, char*, size_t)’:
/home/nico/armnn/src/armnn/LayerSupport.cpp:60:39: error: catching polymorphic type ‘class armnn::InvalidArgumentException’ by value [-Werror=catch-value=]
     } catch (InvalidArgumentException e) { \
                                       ^
/home/nico/armnn/src/armnn/LayerSupport.cpp:93:5: note: in expansion of macro ‘FORWARD_LAYER_SUPPORT_FUNC’
     FORWARD_LAYER_SUPPORT_FUNC(backend, IsAdditionSupported, input0, input1, output);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~
[...]
My C++-fu is not yet up to snuff to make sense of this, so I gave up and moved the whole thing to a build environment with gcc version 6.3.0 instead, where the build completed successfully. It would be a good idea if someone could address the above errors properly.
Now looking at the binary size. I configured out all parsers and used the smallest ACL config (no Neon, etc) to keep things simple. I got:
$ ls -l libarmnn.so
-rwxr-xr-x 1 nico nico 2816920 Dec 14 13:53 libarmnn.so
$ size libarmnn.so
   text    data     bss     dec     hex filename
2080167   69088    2436 2151691  20d50b libarmnn.so
Finding out how those 2080167 bytes of text (which also include rodata) are distributed should be interesting.
After some scripting, I got the following list of symbols sorted by their size:
Type   Size  Symbol
T     20288  armnn::IWorkloadFactory::IsLayerSupported(armnn::BackendId const&, armnn::IConnectableLayer const&, armnn::Optional<armnn::DataType>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
T     16288  _init
d     13840  typeinfo for boost::system::(anonymous namespace)::system_error_category
T     11568  armnn::Profiler::Print(std::ostream&) const
T      7784  armnn::RefLstmFloat32Workload::Execute() const
T      6056  armnn::Optimize(armnn::INetwork const&, std::vector<armnn::BackendId, std::allocator<armnn::BackendId> > const&, armnn::IDeviceSpec const&, armnn::OptimizerOptions const&, armnn::Optional<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&>)
T      5344  armnn::StringifyLayerParameters<armnn::Pooling2dDescriptor>::Serialize(std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>&, armnn::Pooling2dDescriptor const&)
T      5224  armnn::Graph::Print() const
T      5112  boost::thread::physical_concurrency()
T      4624  armnn::Graph::AddCopyLayers()
T      4528  armnn::LoadedNetwork::LoadedNetwork(std::unique_ptr<armnn::OptimizedNetwork, std::default_delete<armnn::OptimizedNetwork> >)
T      4520  armnn::Layer::VerifyLayerConnections(unsigned int, armnn::CheckLocation const&) const
T      4472  armnn::Runtime::UnloadNetwork(int)
T      4128  boost::log::v2s_mt_posix::attribute_name::get_id_from_string(char const*)
t      4096  e843419@002d_000018a1_5824
t      4092  e843419@007c_00003070_c
t      4092  e843419@0041_00002011_1ed0
T      4024  armnn::SubGraphSelector::SelectSubGraphs(armnn::Graph&, std::function<bool (armnn::Layer const&)> const&)
T      3864  armnn::RefBatchNormalizationUint8Workload::Execute() const
T      3776  armnn::RefConvolution2dUint8Workload::Execute() const
[...]
This shows a long list of symbols whose size follows a pretty regular curve towards zero. In other words, there is no obvious outlier. The first few symbols could be investigated for their largish size, but that wouldn't make a significant dent in the total size.
However, there are 1688 symbols with a non-zero size. That corresponds to an average of 1274 bytes per symbol, which is not unreasonable. It's the sheer number of them that is overwhelming. Without the ability to parse a model at compile time, which would allow static linking of only the necessary ops, there is hardly any way to easily scale this down.
Quick observation: Boost-related symbols alone account for 190416 bytes.
That's it for now. Once again, please feel free to comment.
Nicolas
_______________________________________________
Armnn-dev mailing list
Armnn-dev@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/armnn-dev
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
After some delays dealing with unrelated issues, and going through a C++ knowledge refresh, I'm now back to this. The build errors were indeed very trivial to fix... trivial now that I understand what the compiler is telling me that is.
For the out-of-bound warning, I'm suggesting this fix:
diff --git a/src/armnn/LayerSupport.cpp b/src/armnn/LayerSupport.cpp
index 6489fe4..caca14b 100644
--- a/src/armnn/LayerSupport.cpp
+++ b/src/armnn/LayerSupport.cpp
@@ -26,8 +26,8 @@ void CopyErrorMessage(char* truncatedString, const char* fullString, size_t maxL
 {
     if(truncatedString != nullptr)
     {
-        size_t copyLength = std::min(maxLength, strlen(fullString));
-        std::strncpy(truncatedString, fullString, copyLength);
+        size_t copyLength = std::min(maxLength-1, strlen(fullString));
+        std::memcpy(truncatedString, fullString, copyLength);
         // Ensure null-terminated string.
         truncatedString[copyLength] = '\0';
     }
In addition to the strncpy() warning, the original code already had an out-of-bounds issue: when copyLength == maxLength, the terminating null was stored one byte past the end of the buffer.
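For reference, here is the patched helper as a standalone sketch (same names as the diff, reconstructed outside its original file). Reserving one byte for the terminator means the trailing '\0' can never land past the buffer; like the original, it assumes maxLength is at least 1:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Sketch of the patched CopyErrorMessage from the diff above.
void CopyErrorMessage(char* truncatedString, const char* fullString, std::size_t maxLength)
{
    if (truncatedString != nullptr)
    {
        // Leave room for the terminator: copy at most maxLength-1 bytes.
        std::size_t copyLength = std::min(maxLength - 1, std::strlen(fullString));
        std::memcpy(truncatedString, fullString, copyLength);
        // Ensure null-terminated string (now always inside the buffer).
        truncatedString[copyLength] = '\0';
    }
}
```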
Back to the size story. My latest observations are as follows:
   text    data     bss     dec     hex filename
2147131   69376    2356 2218863  21db6f libarmnn.so
 314314    7896     392  322602   4ec2a libarmnnTfLiteParser.so
4421419  134728   27928 4584075  45f28b libarmnnTfParser.so
The TF parser is gigantic! More than twice the size of the whole ArmNN base library. The TF Lite parser is more reasonably sized. Yet neither parser requires much in terms of libarmnn symbols:
libarmnnTfLiteParser:
armnn::BaseTensor<void const*>::BaseTensor(armnn::BaseTensor<void const*> const&)
armnn::BaseTensor<void const*>::BaseTensor(armnn::TensorInfo const&, void const*)
armnn::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
armnn::Exception::what() const
armnn::INetwork::Create()
armnn::OriginsDescriptor::~OriginsDescriptor()
armnn::OriginsDescriptor::OriginsDescriptor(unsigned int, unsigned int)
armnn::OriginsDescriptor::SetViewOriginCoord(unsigned int, unsigned int, unsigned int)
armnn::PermutationVector::PermutationVector(std::initializer_list<unsigned int>)
armnn::TensorInfo::GetNumBytes() const
armnn::TensorInfo::operator=(armnn::TensorInfo const&)
armnn::TensorInfo::TensorInfo(armnn::TensorInfo const&)
armnn::TensorInfo::TensorInfo(unsigned int, unsigned int const*, armnn::DataType, float, int)
armnn::TensorShape::GetNumElements() const
armnn::TensorShape::operator=(armnn::TensorShape const&)
armnn::TensorShape::TensorShape()
armnn::TensorShape::TensorShape(unsigned int, unsigned int const*)
armnnUtils::Permuted(armnn::TensorInfo const&, armnn::PermutationVector const&)
libarmnnTfParser:
armnn::BaseTensor<void const*>::BaseTensor(armnn::TensorInfo const&, void const*)
armnn::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
armnn::Exception::what() const
armnn::INetwork::Create()
armnn::OriginsDescriptor::~OriginsDescriptor()
armnn::OriginsDescriptor::OriginsDescriptor(unsigned int, unsigned int)
armnn::OriginsDescriptor::SetViewOriginCoord(unsigned int, unsigned int, unsigned int)
armnn::PermutationVector::PermutationVector(std::initializer_list<unsigned int>)
armnn::TensorInfo::GetNumBytes() const
armnn::TensorInfo::operator=(armnn::TensorInfo const&)
armnn::TensorInfo::TensorInfo()
armnn::TensorInfo::TensorInfo(armnn::TensorInfo const&)
armnn::TensorInfo::TensorInfo(armnn::TensorShape const&, armnn::DataType, float, int)
armnn::TensorInfo::TensorInfo(unsigned int, unsigned int const*, armnn::DataType, float, int)
armnn::TensorShape::GetNumElements() const
armnn::TensorShape::operator=(armnn::TensorShape const&)
armnn::TensorShape::TensorShape()
armnn::TensorShape::TensorShape(armnn::TensorShape const&)
armnn::TensorShape::TensorShape(std::initializer_list<unsigned int>)
armnn::TensorShape::TensorShape(unsigned int, unsigned int const*)
armnnUtils::Permuted(armnn::TensorInfo const&, armnn::PermutationVector const&)
Despite the parser size difference, they relate to more or less the same number of symbols (18 vs 21).
I think that the ultimate solution for bringing the resulting binary size down should involve static linking. If parsers could generate some source code with as many constants as possible to be directly linked against the ArmNN API, then compiler features such as devirtualization and Link Time Optimization (LTO) could be leveraged to shrink the final executable. That is especially true for TF models. The runtime dynamic nature of the network creation might play against LTO though.
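A miniature illustration of the idea (not ArmNN code; all names here are invented for the sketch). If a parser runs offline and emits direct calls with the weights baked in as constants, the linker and LTO can see exactly which ops a given model uses and discard the rest:

```cpp
#include <cassert>

// Toy "tensor" for the sketch.
struct Tensor
{
    float value;
};

// Each op is a plain function. Any op the generated model code never
// references becomes dead code under static linking with LTO (or
// -ffunction-sections plus --gc-sections).
Tensor Conv(Tensor in, float weight) { return {in.value * weight}; }
Tensor Add(Tensor a, Tensor b)       { return {a.value + b.value}; }
Tensor Pool(Tensor in)               { return {in.value / 2.0f}; } // unreferenced: removable

// What generated source for a tiny two-layer model might look like,
// with the trained constants embedded directly in the code.
Tensor RunModel(Tensor input)
{
    Tensor conv = Conv(input, 2.0f);
    return Add(conv, Tensor{1.0f});
}
```

With a runtime parser, every op must stay linked in because any model might need it; with generated source, the reachable set is known at link time.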
Of course that would also mean that the model would be completely embedded into the executable. This is an advantage on small systems with constrained resources that may only accommodate small models in the first place. I think this is the only scenario where size optimization makes sense. Larger systems are likely to run larger models that are likely to dwarf the library size, and demand paging should keep unused portions of the library out of RAM anyway.
What do you think?
Hi Nicolas,
Sorry for the delay replying, somehow I missed this email.
I was thinking: instead of changing to memcpy to silence the warning, how about just removing the std::min(maxLength, strlen(fullString)), fixing it in the spirit of the compiler warning - the maximum number of bytes to copy shouldn't be the length of the source string, it should be the length of the destination buffer. This will copy extra zeros, which should be fine in this piece of error handling code.
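One possible reading of that suggestion as code (a hypothetical rewrite, not anything merged): bound the copy by the destination size only, so the bound no longer depends on strlen of the source. strncpy zero-pads short sources, which is harmless here, but it does not terminate on truncation, so the last byte is forced to '\0':

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical alternative to the memcpy patch: bound by destination size.
void CopyErrorMessage(char* truncatedString, const char* fullString, std::size_t maxLength)
{
    if (truncatedString != nullptr && maxLength > 0)
    {
        // Copies up to maxLength bytes; pads with zeros if the source is shorter.
        std::strncpy(truncatedString, fullString, maxLength);
        // strncpy does not null-terminate when it truncates, so force it.
        truncatedString[maxLength - 1] = '\0';
    }
}
```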
On the code size thing, the Tensorflow Parser is indeed huge, which I believe is a consequence of using protobuf. I think people who care about code size won't be using Tensorflow or protobuf in practice, they'll want to convert to something smaller.
In the master branch at the moment we're adding a visitor mechanism that should enable ArmNN graphs to be serialized into any format desired, including C++ code that uses the ArmNN API. Would you be able to have a look to see if this fits what you're suggesting?
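A generic sketch of such a visitor mechanism (illustrative names only, not the actual ArmNN interface): each layer accepts a visitor, and a serializer visitor can emit any format it likes, including C++ source that rebuilds the graph through the public API:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Visitor interface: one callback per layer kind.
struct LayerVisitor
{
    virtual ~LayerVisitor() = default;
    virtual void VisitInput() = 0;
    virtual void VisitActivation() = 0;
};

// Layer hierarchy: each concrete layer dispatches to its callback.
struct Layer
{
    virtual ~Layer() = default;
    virtual void Accept(LayerVisitor& visitor) const = 0;
};

struct InputLayer : Layer
{
    void Accept(LayerVisitor& v) const override { v.VisitInput(); }
};

struct ActivationLayer : Layer
{
    void Accept(LayerVisitor& v) const override { v.VisitActivation(); }
};

// A visitor that "serializes" the graph to C++-like source text.
struct CppSourceEmitter : LayerVisitor
{
    std::string out;
    void VisitInput() override      { out += "net.AddInputLayer();\n"; }
    void VisitActivation() override { out += "net.AddActivationLayer();\n"; }
};

std::string EmitCpp(const std::vector<std::unique_ptr<Layer>>& graph)
{
    CppSourceEmitter emitter;
    for (const auto& layer : graph)
    {
        layer->Accept(emitter);
    }
    return emitter.out;
}
```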
Thanks,
Matthew
________________________________
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Sent: 25 January 2019 05:07:11
To: Matthew Bentham
Cc: armnn-dev@lists.linaro.org
Subject: Re: [Armnn-dev] initial Arm NN binary size observations