Hello,
It has come to our attention that the Compute Library v23.02 release contains the following erratum:
* Missing .bazelrc file for experimental Bazel builds
This erratum has now been fixed in the latest commit on the main branch of the GitHub repository: cfb1c3035cbfc31a2fe8491c7df13e911698e2b6
Please use this commit if you rely on the new experimental Bazel build for Compute Library.
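For anyone scripting their setup, pinning a checkout to that commit could look roughly like the sketch below. It only builds and prints the git commands rather than running them. The clone URL is inferred from the release page address, and the helper name checkout_commands and the destination directory are illustrative assumptions, not part of Compute Library.

```python
import re
import shlex

# Assumption: clone URL derived from the GitHub release page address.
REPO_URL = "https://github.com/ARM-software/ComputeLibrary.git"
# Commit named in this announcement as containing the .bazelrc fix.
FIX_COMMIT = "cfb1c3035cbfc31a2fe8491c7df13e911698e2b6"


def checkout_commands(commit, dest="ComputeLibrary"):
    """Return the git commands that clone the repo and check out `commit`.

    Hypothetical helper for illustration. It insists on a full 40-character
    SHA-1 so that a tag such as "v23.02" (which lacks the fix) is rejected.
    """
    if not re.fullmatch(r"[0-9a-f]{40}", commit):
        raise ValueError(f"expected a full 40-character commit hash, got {commit!r}")
    return [
        ["git", "clone", REPO_URL, dest],
        ["git", "-C", dest, "checkout", commit],
    ]


if __name__ == "__main__":
    for cmd in checkout_commands(FIX_COMMIT):
        print(shlex.join(cmd))
```

Requiring the full hash is a deliberate choice here: the v23.02 tag itself predates the fix, so checking out the tag would not pick up the missing file.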
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hello,
The 23.02 release of Compute Library is out and comes with a collection of improvements and new features.
Source code and prebuilt binaries are available at: https://github.com/ARM-software/ComputeLibrary/releases/tag/v23.02
Highlights of the release:
v23.02 Public major release
* New features:
  * Rework the experimental dynamic fusion interface by identifying auxiliary and intermediate tensors, and specifying an explicit output operator.
  * Add the following operators to the experimental dynamic fusion API:
    * GpuAdd, GpuCast, GpuClamp, GpuDepthwiseConv2d, GpuMul, GpuOutput, GpuPool2d, GpuReshape, GpuResize, GpuSoftmax, GpuSub.
  * Add SME/SME2 kernels for GeMM, Winograd convolution, Depthwise convolution and Pooling.
  * Add new CPU operator AddMulAdd for float and quantized types.
  * Add new flag ITensorInfo::lock_paddings() to tensors to prevent extending tensor paddings.
  * Add experimental support for CPU-only Bazel and CMake builds.
* Performance optimizations:
  * Optimize CPU base-e exponential functions for FP32.
  * Optimize CPU StridedSlice by copying first dimension elements in bulk where possible.
  * Optimize CPU quantized Subtraction by reusing the quantized Addition kernel.
  * Optimize CPU ReduceMean by removing quantization steps and performing the operation in integer domain.
  * Optimize GPU Scale and Dynamic Fusion GpuResize by removing quantization steps and performing the operation in integer domain.
  * Update the heuristic for CLDepthwiseConvolutionNative kernel.
  * Add new optimized OpenCL kernel to compute indirect convolution:
    * ClIndirectConv2dKernel
  * Add new optimized OpenCL kernel to compute transposed convolution:
    * ClTransposedConvolutionKernel
* Update recommended/minimum NDK version to r20b.
* Various optimizations and bug fixes.
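As background on the quantized Subtraction item in the list above, here is a minimal sketch of why an addition routine can serve for subtraction under affine quantization: negating the second operand's scale turns a + b into a - b. This is plain Python for illustration only and is not the library's kernel code. The function names, the (scale, zero_point) tuples and the uint8 clamping range are all assumptions.

```python
def quantized_add(a_q, b_q, a_qinfo, b_qinfo, out_qinfo):
    """Element-wise addition of affine-quantized uint8 values.

    Each *_qinfo is a (scale, zero_point) pair, so that
    real_value = scale * (quantized_value - zero_point).
    """
    sa, za = a_qinfo
    sb, zb = b_qinfo
    so, zo = out_qinfo
    out = []
    for a, b in zip(a_q, b_q):
        real = sa * (a - za) + sb * (b - zb)   # dequantize and add
        q = round(real / so) + zo              # requantize to the output space
        out.append(max(0, min(255, q)))        # clamp to the uint8 range
    return out


def quantized_sub(a_q, b_q, a_qinfo, b_qinfo, out_qinfo):
    """Subtraction expressed as addition with the second scale negated."""
    sb, zb = b_qinfo
    return quantized_add(a_q, b_q, a_qinfo, (-sb, zb), out_qinfo)


if __name__ == "__main__":
    a = [130, 152]          # real values 1.0 and 12.0 with scale 0.5, zero point 128
    b = [132, 136]          # real values 1.0 and 2.0 with scale 0.25, zero point 128
    print(quantized_sub(a, b, (0.5, 128), (0.25, 128), (0.5, 128)))  # [128, 148]
```

The appeal of this arrangement is that only one code path needs to be tuned and maintained. Whether the library negates the scale in exactly this way is not stated in the release notes.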