Hi, I wonder if the ARM Compute Library can be built and run on ARMv7l
processors, with a subset of the functionality, since SVE is not supported on
v7l? Thanks for the info.
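(Editorial note: the Compute Library build guide documents an armv7a scons target, so a cross-compile along these lines may work for v7l, with the SVE-dependent paths excluded by the 32-bit target. The toolchain name and flag values below are assumptions for illustration, not a verified v7l configuration.)

```shell
# Hypothetical cross-compile for a 32-bit ARMv7-A target (NEON, no SVE).
# Assumes an arm-linux-gnueabihf cross toolchain is on PATH; adjust to your setup.
scons -j8 debug=0 neon=1 opencl=0 os=linux arch=armv7a build=cross_compile
```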
Hello,
The 23.08 release of Compute Library is out and comes with a collection of improvements and new features.
Source code and prebuilt binaries are available at: https://github.com/ARM-software/ComputeLibrary/releases/tag/v23.08
Public major release. Documentation (API, changelogs, build guide, contribution guide, errata, etc.) available here: https://arm-software.github.io/ComputeLibrary/v23.08/
Highlights of the release:
* Rewrite CLArgMinMaxLayer for axis 0 and enable S64 output.
* Add multi-sketch support for dynamic fusion.
* Break up arm_compute/core/Types.h and utils/Utils.h a bit to reduce unused code in each inclusion of these headers.
* Add Fused Activation to CLMatMul.
* Implement FP32/FP16 opencl::kernels::ClMatMulNativeMMULKernel using the MMUL extension.
* Use MatMul in fully connected layer with dynamic weights when supported.
* Optimize CPU depthwise convolution with channel multiplier.
* Add support in CpuCastKernel for conversion of S64/U64 to F32.
* Add new OpenCL™ kernels:
  * opencl::kernels::ClMatMulNativeMMULKernel support for FP32 and FP16, with batch support.
* Enable transposed convolution with non-square kernels on CPU and GPU.
* Add support for input data type U64/S64 in CLCast.
* Add new Compute Kernel Writer (CKW) subproject that offers a C++ interface to generate tile-based OpenCL code in just-in-time fashion.
* Port the following kernels in the experimental Dynamic Fusion interface to use the new Compute Kernel Writer interface with support for FP16/FP32 only:
  * experimental::dynamic_fusion::GpuCkwActivation
  * experimental::dynamic_fusion::GpuCkwCast
  * experimental::dynamic_fusion::GpuCkwDirectConv2d
  * experimental::dynamic_fusion::GpuCkwElementwiseBinary
  * experimental::dynamic_fusion::GpuCkwStore
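As a rough illustration of what "Fused Activation" in a matmul buys (generic NumPy pseudocode of the idea, not the CLMatMul API):

```python
import numpy as np

def matmul_fused_relu(a, b):
    # In a fused GPU kernel the activation is applied in-register before the
    # result is stored, so the raw product never round-trips through memory.
    # Here we only model the math of the fused operator.
    return np.maximum(a @ b, 0.0)

a = np.array([[1.0, -2.0], [3.0, 4.0]])
b = np.eye(2)
out = matmul_fused_relu(a, b)  # negative products are clamped to 0
```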
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hello,
The 23.05 release of Compute Library is out and comes with a collection of improvements and new features.
Source code and prebuilt binaries are available at: https://github.com/ARM-software/ComputeLibrary/releases/tag/v23.05
Public major release. Documentation (API, changelogs, build guide, contribution guide, errata, etc.) available here: https://arm-software.github.io/ComputeLibrary/v23.05/
Highlights of the release:
- New features:
* Add new Arm® Neon™ kernels / functions:
* NEMatMul for QASYMM8, QASYMM8_SIGNED, FP32 and FP16, with batch support.
* NEReorderLayer (aarch64 only)
* Add new OpenCL™ kernels / functions:
* CLMatMul support for QASYMM8, QASYMM8_SIGNED, FP32 and FP16, with batch support.
* Add support for multiple dimensions in the indices parameter for both the Arm® Neon™ and OpenCL™ implementations of the Gather Layer.
* Add support for dynamic weights in CLFullyConnectedLayer and NEFullyConnectedLayer for all data types.
* Add support for cropping in the Arm® Neon™ and OpenCL™ implementations of the BatchToSpace Layer for all data types.
* Add support for quantized data types for the ElementwiseUnary Operators for Arm® Neon™.
* Implement RSQRT for quantized data types on OpenCL™.
* Add FP16 depthwise convolution kernels for SME2.
- Performance optimizations:
* Improve CLTuner exhaustive mode tuning time.
- Deprecate dynamic block shape in NEBatchToSpaceLayer and CLBatchToSpaceLayer.
- Various optimizations and bug fixes.
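To illustrate what multi-dimensional indices in a Gather mean for the output shape (generic NumPy semantics, not the Arm® Neon™/OpenCL™ API itself):

```python
import numpy as np

def gather(params, indices, axis=0):
    # The gathered axis of `params` is replaced by the full shape of
    # `indices`, so 2-D indices on a (3, 4) input along axis 0 yield
    # an output of shape indices.shape + (4,).
    return np.take(params, indices, axis=axis)

p = np.arange(12).reshape(3, 4)
idx = np.array([[0, 2], [1, 1]])   # 2-D indices
out = gather(p, idx, axis=0)       # shape (2, 2, 4)
```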
Hello,
The v23.02.1 patch release of Compute Library is out and comes with several fixes.
Source code and prebuilt binaries are available at: https://github.com/ARM-software/ComputeLibrary/releases/tag/v23.02.1
Highlights of the release:
v23.02.1 Public patch release:
* Allow mismatching data layouts between the source tensor and weights for CpuGemmDirectConv2d with fixed format kernels.
* Fixes for experimental CPU only Bazel and CMake builds.
Hello,
It has come to our attention that the Compute Library v23.02 release contains the following erratum:
* Missing .bazelrc file for experimental Bazel builds
This erratum has now been rectified in the latest commit of the main branch on the GitHub release repository: cfb1c3035cbfc31a2fe8491c7df13e911698e2b6
Please use this commit if you rely on the new experimental Bazel build for Compute Library.
Hello,
The 23.02 release of Compute Library is out and comes with a collection of improvements and new features.
Source code and prebuilt binaries are available at: https://github.com/ARM-software/ComputeLibrary/releases/tag/v23.02
Highlights of the release:
v23.02 Public major release
* New features:
* Rework the experimental dynamic fusion interface by identifying auxiliary and intermediate tensors, and specifying an explicit output operator.
* Add the following operators to the experimental dynamic fusion API:
* GpuAdd, GpuCast, GpuClamp, GpuDepthwiseConv2d, GpuMul, GpuOutput, GpuPool2d, GpuReshape, GpuResize, GpuSoftmax, GpuSub.
* Add SME/SME2 kernels for GeMM, Winograd convolution, Depthwise convolution and Pooling.
* Add new CPU operator AddMulAdd for float and quantized types.
* Add new flag ITensorInfo::lock_paddings() to tensors to prevent extending tensor paddings.
* Add experimental support for CPU only Bazel and CMake builds.
* Performance optimizations:
* Optimize CPU base-e exponential functions for FP32.
* Optimize CPU StridedSlice by copying first dimension elements in bulk where possible.
* Optimize CPU quantized Subtraction by reusing the quantized Addition kernel.
* Optimize CPU ReduceMean by removing quantization steps and performing the operation in integer domain.
* Optimize GPU Scale and Dynamic Fusion GpuResize by removing quantization steps and performing the operation in integer domain.
* Update the heuristic for CLDepthwiseConvolutionNative kernel.
* Add new optimized OpenCL kernel to compute indirect convolution:
* ClIndirectConv2dKernel
* Add new optimized OpenCL kernel to compute transposed convolution:
* ClTransposedConvolutionKernel
* Update recommended/minimum NDK version to r20b.
* Various optimizations and bug fixes.
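The quantized-Subtraction-via-Addition optimization mentioned above works because affine dequantization is linear in the scale: negating b's scale negates its real value, so the unmodified addition kernel computes a subtraction. A minimal NumPy sketch of the idea (illustrative only, not the actual CPU kernel):

```python
import numpy as np

def quantize(x, scale, zp):
    return np.clip(np.round(x / scale) + zp, 0, 255).astype(np.uint8)

def dequantize(q, scale, zp):
    return scale * (q.astype(np.int32) - zp)

def qadd(qa, pa, qb, pb, pout):
    # Reference quantized addition kernel: dequantize, add, requantize.
    f = dequantize(qa, *pa) + dequantize(qb, *pb)
    return quantize(f, *pout)

def qsub(qa, pa, qb, pb, pout):
    # Subtraction reuses the addition kernel by negating b's scale:
    # dequantize(q, -s, z) == -dequantize(q, s, z), so add(a, b') == a - b.
    sb, zb = pb
    return qadd(qa, pa, qb, (-sb, zb), pout)
```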
Hello,
The 22.11 release of Compute Library is out and comes with a collection of improvements and new features.
Source code and prebuilt binaries are available at: https://github.com/ARM-software/ComputeLibrary/releases/tag/v22.11
Public major release. Documentation (API, changelogs, build guide, contribution guide, errata, etc.) available here: https://arm-software.github.io/ComputeLibrary/v22.11/
Highlights of the release:
* New features:
* Add new experimental dynamic fusion API.
* Add CPU batch matrix multiplication with adj_x = false and adj_y = false for FP32.
* Add CPU MeanStdDevNorm for QASYMM8.
* Add CPU and GPU GELU activation function for FP32 and FP16.
* Add CPU swish activation function for FP32 and FP16.
* Performance optimizations:
* Optimize CPU bilinear scale for FP32, FP16, QASYMM8, QASYMM8_SIGNED, U8 and S8.
* Optimize CPU activation functions using LUT-based implementation:
* Sigmoid function for QASYMM8 and QASYMM8_SIGNED.
* Hard swish function for QASYMM8_SIGNED.
* Optimize CPU addition for QASYMM8 and QASYMM8_SIGNED using fixed-point arithmetic.
* Optimize CPU multiplication, subtraction and activation layers by considering tensors as 1D.
* Optimize GPU depthwise convolution kernel and heuristic.
* Optimize GPU Conv2d heuristic.
* Optimize CPU MeanStdDevNorm for FP16.
* Optimize CPU tanh activation function for FP16 using rational approximation.
* Improve GPU GeMMLowp start-up time.
* Various optimizations and bug fixes.
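The LUT-based activation optimization above exploits the fact that an 8-bit quantized input has only 256 possible values, so the whole activation collapses to one table lookup per element. A generic sketch of the technique for a QASYMM8 sigmoid (illustrative, not ACL kernel code; the quantization parameters are made up):

```python
import numpy as np

def sigmoid_lut_u8(in_scale, in_zp, out_scale, out_zp):
    # Precompute the quantized output for every possible 8-bit input value.
    q = np.arange(256)
    x = in_scale * (q - in_zp)        # dequantize all 256 inputs
    y = 1.0 / (1.0 + np.exp(-x))      # sigmoid in float, done once
    return np.clip(np.round(y / out_scale) + out_zp, 0, 255).astype(np.uint8)

# Example parameters: input centered at zp=128, output mapped to [0, 1).
lut = sigmoid_lut_u8(in_scale=0.1, in_zp=128, out_scale=1 / 256, out_zp=0)

def sigmoid_q8(tensor):
    return lut[tensor]  # the per-element activation is a single lookup
```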
Hello,
The 22.08 release of Compute Library is out and comes with a collection of improvements and new features.
Source code and prebuilt binaries are available at: https://github.com/ARM-software/ComputeLibrary/releases/tag/v22.08
Public major release. Documentation (API, changelogs, build guide, contribution guide, errata, etc.) available here: https://arm-software.github.io/ComputeLibrary/v22.08/
Highlights of the release:
* Add Dynamic Fusion of Elementwise Operators: Div, Floor, Add.
* Optimize the gemm_reshaped_rhs_only_nt OpenCL kernel using the arm_matrix_multiply extension available for Arm® Mali™-G715 and Arm® Mali™-G615.
* Add support for the arm_matrix_multiply extension in the gemmlowp_mm_reshaped_only_rhs_t OpenCL kernel.
* Expand GPUTarget list with missing Mali™ GPUs product names: G57, G68, G78AE, G610, G510, G310.
* Extend the direct convolution 2d interface to configure the block size.
* Update ClConv2D heuristic to use direct convolution.
* Use official Khronos® OpenCL extensions:
* Add cl_khr_integer_dot_product extension support.
* Add support of OpenCL 3.0 non-uniform workgroup.
* Cpu performance optimizations:
* Add LUT-based implementation of Hard Swish and Leaky ReLU activation function for aarch64 build.
* Optimize Add layer by considering the input tensors as 1D array.
* Add fixed-format BF16, FP16 and FP32 Neon™ GEMM kernels to support variable weights.
* Add experimental support for native builds for Windows on Arm®.
* Build flag interpretation change: arch=armv8.6-a now translates to the -march=armv8.6-a CXX flag instead of -march=armv8.2-a plus explicit selection of feature extensions.
* The armv7a Android build will no longer be tested or maintained.