Ubuntu 16.04 LTS is reaching End of Life.
Ubuntu Linux 16.04 LTS will reach end of support on April 30, 2021.
At that time, Ubuntu 16.04 LTS will no longer receive security patches or other software updates.
Consequently, as of the 21.08 release at the end of August 2021, Arm NN will no longer be officially
supported on Ubuntu 16.04 LTS; it will instead be supported on Ubuntu 18.04 LTS.
Yours sincerely,
The Arm NN Team
The ArmNN team is pleased to announce the release of ArmNN 21.02.
ArmNN 21.02 Release Notes
Summary
The 21.02 release provides two major pieces of functionality. The first is performance related: the ability to cache compiled OpenCL kernels when running on the GPU backend. Cached kernel files can be loaded into the runtime, eliminating the cost of compiling their associated graphs and resulting in a significant performance uplift on the first execution of a newly loaded graph. The second is that the operators which were not added to the Arm NN Tensorflow Lite delegate in the 20.11 release are now present, giving the delegate the same level of operator support as the android-nn-driver.
The other features of the 21.02 release are an update of the Tensorflow Lite parser to work with Tensorflow Lite v2.3.1 and changes to the public APIs to make binary compatibility between releases easier to maintain. Each group of public interfaces (SDK, backend, TfLiteDelegate, etc.) is now separately versioned and will have its version updated independently in subsequent releases to indicate changes in its Application Binary Interface (ABI).
Support has also been added for the SSD-MobileNetv2 and SSD-MobileNetv3 models, which have been verified to execute correctly with good performance. Work to generate accuracy figures for the models using the Tensorflow Lite coco_object_detection tool is ongoing and will be published when complete.
Two configuration options have also been added: one for the CpuAcc backend to specify the number of threads to use when executing ML workloads on the CPU, and one for the GpuAcc backend to load an MLGO tuning file to increase the performance of GEMM operations on the GPU.
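As an illustration, both options are passed to the optimizer as backend-specific ModelOptions. The sketch below shows one way to set them through armnn::BackendOptions; the option names ("NumberOfThreads", "MLGOTuningFilePath") follow this release's backend options, while the thread count, file path, and the assumption of an already-built network are illustrative.

    // Minimal sketch: pass CpuAcc/GpuAcc model options to the optimizer.
    #include <armnn/ArmNN.hpp>
    #include <armnn/BackendOptions.hpp>

    armnn::IOptimizedNetworkPtr OptimizeWithBackendTuning(const armnn::INetwork& network,
                                                          armnn::IRuntime& runtime)
    {
        armnn::OptimizerOptions options;
        // CpuAcc: execute ML workloads with four threads (illustrative count).
        options.m_ModelOptions.push_back(
            armnn::BackendOptions("CpuAcc", {{"NumberOfThreads", 4u}}));
        // GpuAcc: load an MLGO tuning file to speed up GEMM kernels (illustrative path).
        options.m_ModelOptions.push_back(
            armnn::BackendOptions("GpuAcc", {{"MLGOTuningFilePath", "/tmp/mlgo.bin"}}));
        return armnn::Optimize(network,
                               {armnn::Compute::CpuAcc, armnn::Compute::GpuAcc},
                               runtime.GetDeviceSpec(), options);
    }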
ArmNN SDK
New Features:
* Added the ability to save and load the ClContext through ExecuteNetwork and the Android-nn-driver.
* This removes the time taken for the initial compilation of OpenCL kernels and speeds up the first execution (see the sketch after this list).
* Semantic Versioning for ArmNN APIs
* Arm NN TfLite Delegate (more extensive details in Arm NN TfLite Delegate section)
* Further operator support
* Add capability to build on Android
* Verification of support for SSD-MobileNetv2 & SSD-MobileNetv3
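A minimal sketch of the ClContext cache workflow mentioned above follows. It assumes the "SaveCachedNetwork" and "CachedNetworkFilePath" GpuAcc model options behind the ExecuteNetwork flags listed later in these notes; the cache path is illustrative.

    // Minimal sketch: on the first run compiled OpenCL kernels are written to
    // the cache file; on later runs they are loaded, skipping compilation.
    #include <armnn/ArmNN.hpp>
    #include <armnn/BackendOptions.hpp>

    armnn::IOptimizedNetworkPtr OptimizeWithClCache(const armnn::INetwork& network,
                                                    armnn::IRuntime& runtime,
                                                    bool saveCache) // true on first run
    {
        armnn::OptimizerOptions options;
        options.m_ModelOptions.push_back(armnn::BackendOptions("GpuAcc",
            {
                {"SaveCachedNetwork", saveCache},
                {"CachedNetworkFilePath", "/data/cl_cache.bin"} // illustrative path
            }));
        return armnn::Optimize(network, {armnn::Compute::GpuAcc},
                               runtime.GetDeviceSpec(), options);
    }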
TfLite Parser:
* Added DEPTH_TO_SPACE operator support
* Added GATHER operator support
* Added SUM operator support
* Added REDUCE_MAX, REDUCE_MIN operator support
Tf Parser:
* Added support for ELU activation
* Support Dilation in Conv2D
ONNX Parser:
* Support Dilation in Conv2D
Caffe Parser:
* Added Dilation support
* Added argmax deconv support
ArmNN Serializer
* Serialise ArmNN Model on android-nn-driver
ExecuteNetwork App Changes:
* Two optimization parameters were added to enable saving and loading of the ClContext.
* --save-cached-network
* --cached-network-filepath
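For example, a first run that populates the cache might look like the following (assuming ExecuteNetwork's usual -m model-path, -f model-format, and -c compute arguments; the model and cache paths are illustrative):

    ExecuteNetwork -m mobilenet_v2.tflite -f tflite-binary -c GpuAcc \
        --save-cached-network --cached-network-filepath /data/cl_cache.bin

A later run that passes only --cached-network-filepath will load the cached kernels instead of recompiling them.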
Other changes:
* Made it easier for backends to traverse the subgraph during optimization by sorting SubgraphView layers on construction
* Added CL/NEON implementation of RANK Workload
* Added REDUCE layer for REDUCE_MAX, REDUCE_MIN, REDUCE_SUM operators
* Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support for the CpuRef backend
* Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support/workloads for the CpuAcc backend
* Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support/workloads for the GpuAcc backend
* Added more Fused Activation unit tests
* Handle Neon optionality on 32-bit Linux platforms
* Validated MobileNetv2-SSD and MobileNetv3-SSD support (further details in executive summary)
* Add CpuAcc specific configuration option numberOfThreads
* Add GpuAcc MLGO tuning file configuration argument
Bug Fixes:
* Defaulted stride values in depthwise and convolution layers to 1 instead of 0
* Fixed transpose conv InferOutputShape
* Fixed incorrect padding value for asymmetric quantized type
* Fixed build breaks for the armnnDeserializer test and Threads.cpp on macOS
* Further fix for macOS, where filenames are case insensitive
* Fixed unit test failures on mipsel/s390x/ppc64/powerpc
* Fixed ArmnnQuantizer incorrectly quantizing all DataTypes
* Fixed TFLite parser not parsing TransposeConvolution
* Fixed TfLite parser and ExecuteNetwork issues where an error was not thrown in some cases
* Fixed wav2letter not producing correct output for the Neon backend
* Fixed a ReduceLayer InferOutputShape issue so that the correct axis data is read in the TfLiteParser
* Fixed the Reduce workload to allow input tensors of any rank into the validate function
* Updated JsonPrinterTestImpl to use CpuLogitsDLogSoftmaxKernel_#
* Added missing serializer support for m_DimensionsSpecificity
* Removed an unnecessary friend function in INetwork and fixed the TransformIterator operator= to allow compilation on further compilers
Known issues:
Deprecation Notification:
The following components have been deprecated and will be removed in the next (21.05) release of ArmNN
* armnnQuantizer
Now that the Tensorflow Lite Converter (https://www.tensorflow.org/lite/convert/) has mature post-training quantization capabilities, the need for this component has gone.
See https://www.tensorflow.org/model_optimization/guide/quantization/post_train… and https://www.tensorflow.org/lite/performance/post_training_quantization for more details.
* armnnTfParser
As Tensorflow Lite is our current recommended deployment environment for ArmNN, and the Tensorflow Lite Converter provides a path for converting most common machine learning
models into Tensorflow Lite format, the need for a Tensorflow parser has gone.
* armnnCaffeParser
Caffe is no longer as widely used as a framework for machine learning as it once was.
TfLite Delegate
New Features:
* Enabled ELU Activation
* Enabled HARD_SWISH Activation
* Added GATHER operator support
* Added Logical AND, NOT and OR operator support.
* Added PAD operator support
* Added PADV2 operator support
* Added SPLIT operator support
* Added SPLIT_V operator support
* Added ARG_MAX operator support
* Added ARG_MIN operator support
* Added LOCAL_RESPONSE_NORMALIZATION operator support
* Added L2_NORMALIZATION operator support
* Added BATCH_TO_SPACE_ND operator support
* Added SPACE_TO_BATCH_ND operator support
* Added DEPTH_TO_SPACE operator support
* Added SPACE_TO_DEPTH operator support
* Added SUM operator support
* Added REDUCE_MAX, REDUCE_MIN operator support
* Added FLOOR operator support
* Added OptimizerOptions (see the sketch after this list)
* Reduce Float32 to Float16
* Reduce Float32 to BFloat16
* Enable debug data
* Enable memory import
* Added STRIDED_SLICE operator support
* Added LSTM operator support
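As a sketch of how the OptimizerOptions above reach the delegate: the code below builds an armnn::OptimizerOptions, hands it to armnnDelegate::DelegateOptions, and registers the delegate with a TF Lite interpreter. It assumes the DelegateOptions constructor overload taking an OptimizerOptions that accompanies this item; the member names are those of armnn::OptimizerOptions.

    // Minimal sketch: apply the Arm NN delegate to an already-built
    // tflite::Interpreter with OptimizerOptions.
    #include <armnn/INetwork.hpp>   // armnn::OptimizerOptions
    #include <armnn_delegate.hpp>
    #include <DelegateOptions.hpp>
    #include <tensorflow/lite/interpreter.h>
    #include <memory>

    void ApplyArmnnDelegate(tflite::Interpreter& interpreter)
    {
        armnn::OptimizerOptions optimizerOptions;
        optimizerOptions.m_ReduceFp32ToFp16 = true; // run Float32 layers in Float16
        optimizerOptions.m_Debug = false;           // no debug data
        optimizerOptions.m_ImportEnabled = true;    // enable memory import

        // Assumed overload: DelegateOptions(Compute, OptimizerOptions).
        armnnDelegate::DelegateOptions delegateOptions(armnn::Compute::GpuAcc,
                                                       optimizerOptions);

        std::unique_ptr<TfLiteDelegate, decltype(&armnnDelegate::TfLiteArmnnDelegateDelete)>
            armnnDelegatePtr(armnnDelegate::TfLiteArmnnDelegateCreate(delegateOptions),
                             armnnDelegate::TfLiteArmnnDelegateDelete);
        interpreter.ModifyGraphWithDelegate(std::move(armnnDelegatePtr));
    }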
Other Changes:
* Provided Android build
* Removed Tensorflow requirement
Bug Fixes:
* Fixed fused activation in Fully Connected layer
* Fixed TfLiteDelegate Reshape operator failure when running models with 2D shape tensor.
Known Issues:
Android NNAPI driver
Deprecated features:
New Features:
* if "-request-inputs-and-outputs-dump-dir" is enabled it will serialize the network graph to a ".armnn" file to given directory
* Added ability to save and load the ClContext through Android-nn-driver.
* Two optimization parameters were added to enable this (an example invocation follows this list):
* "q,cached-network-file": if non-empty, the given file will be used to load/save the cached network. If the save-cached-network option is given, the cached network will be saved to the given file; otherwise the cached network will be loaded from it.
* "s,save-cached-network": enables saving the cached network to the file given with the cached-network-file option.
Other Changes:
* Provide LayerSupportHandle to frontend users (see the sketch after this list)
* Update setup and Android.bp files to build v8.2a driver
* Add CpuAcc specific configuration option numberOfThreads
* Add GpuAcc MLGO tuning file configuration argument
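For illustration, a frontend can query a backend's operator support through the LayerSupportHandle mentioned above. A minimal sketch using armnn::GetILayerSupportByBackendId with an addition query (the tensor shape is illustrative):

    // Minimal sketch: ask the CpuAcc backend whether it supports adding
    // two Float32 tensors of the given shape.
    #include <armnn/BackendHelper.hpp>
    #include <armnn/Tensor.hpp>
    #include <armnn/Types.hpp>

    bool CpuAccSupportsAddition()
    {
        armnn::LayerSupportHandle handle = armnn::GetILayerSupportByBackendId("CpuAcc");
        armnn::TensorInfo info({1, 2, 2, 3}, armnn::DataType::Float32);
        return handle.IsAdditionSupported(info, info, info); // in0, in1, output
    }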
Build Dependencies
* Git: 2.17.1 or later
* SCons: 2.4.1 (Ubuntu), 2.5.1 (Debian)
* CMake: 3.5.1 (Ubuntu), 3.7.2 (Debian)
* Acl: branches/arm_compute_21_02
* android-nn-driver: branches/android-nn-driver_21_02
* npu backend
* boost: 1.64
* Tensorflow: 2.3.1
* Caffe: tag 1.0
* Onnx: 1.6.0
* Flatbuffer: 1.12.0
* Protobuf: 3.12.0
* Eigen3: 3.3
* Android: 10 & 11
* Mali Driver: r26p0_01eac0
* Android NDK: r20b
* mapbox/variant: 1.2.0