Releases: ermig1979/Simd

Simd v6.2.152

01 Aug 08:24

Algorithms

New features
  • AVX2, AVX-512BW optimizations of class SynetQuantizedAddUniform.
  • Base implementation of class SynetQuantizedInnerProductRef.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedInnerProductGemmNN.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcSpecV0.
  • Base implementation, SSE4.1, AVX2 optimizations of class SynetQuantizedConvolutionNhwcDepthwise.
Improving
  • AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm.
Bug fixing
  • Error in NEON optimization of function Float32ToBFloat16.
  • Error in Base implementation of class SynetQuantizedConvolutionNhwcGemm.
  • Error in Base implementation of class SynetQuantizedConvolutionGemm.
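
For readers unfamiliar with the conversion behind Float32ToBFloat16, the following is a minimal scalar sketch, not the library's code. The function names are hypothetical, and the rounding mode (round to nearest, ties to even) and the NaN handling are assumptions; Simd's implementation may differ in both.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Simd): converts float32 to bfloat16 bits.
// Assumes round-to-nearest, ties-to-even on the 16 discarded mantissa bits.
inline uint16_t Float32ToBFloat16Sketch(float value)
{
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits)); // read the bit pattern without undefined behavior
    // NaN: keep the upper half and force a mantissa bit so the result stays a NaN.
    if ((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0)
        return static_cast<uint16_t>((bits >> 16) | 0x0040u);
    // Add 0x7FFF plus the lowest kept bit, so exact ties round to an even result.
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<uint16_t>((bits + rounding) >> 16);
}

// Hypothetical helper: widens bfloat16 bits back to float32 (exact, no rounding).
inline float BFloat16ToFloat32Sketch(uint16_t value)
{
    uint32_t bits = static_cast<uint32_t>(value) << 16;
    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}
```

A vectorized version has to reproduce the same rounding and special-case behavior lane by lane, which is where scalar and SIMD paths can diverge.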

Test framework

New features
  • Tests for verifying functionality of SynetQuantizedInnerProduct framework.

Simd v6.2.151

07 Jul 10:36

Algorithms

New features
  • Support for OpenCV compatibility in Simd::Resize (SimdResizeMethodBilinearOpenCv).
  • AVX-512BW optimizations of class ResizerByteBilinearOpenCv.
  • Base implementation of class SynetQuantizedConvolutionGemm.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetQuantizedConvolutionNhwcGemm.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetDequantizeLinear.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetQuantizeLinear.
  • Base implementation, SSE4.1 optimizations of class SynetQuantizedAddUniform.
Improving
  • AVX2 optimizations of class ResizerByteBilinearOpenCv.
Bug fixing
  • Linker error in ResizeOpenCvSpecialTest.
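
As background for SynetQuantizeLinear and SynetDequantizeLinear, here is a minimal scalar sketch of linear quantization to 8-bit unsigned values. It assumes ONNX-style semantics (divide by scale, round half to even, add the zero point, saturate to [0, 255]); the function names and signatures are hypothetical and do not reproduce the Simd API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not the Simd signature): q = saturate(round(x / scale) + zero).
// std::nearbyint rounds half to even under the default floating-point rounding mode.
// NaN input is not handled in this sketch.
inline uint8_t QuantizeLinearSketch(float value, float scale, int zero)
{
    float shifted = std::nearbyint(value / scale) + static_cast<float>(zero);
    // Clamp while still in float to avoid an out-of-range integer conversion.
    return static_cast<uint8_t>(std::min(std::max(shifted, 0.0f), 255.0f));
}

// Hypothetical helper: x = (q - zero) * scale, the inverse mapping up to rounding error.
inline float DequantizeLinearSketch(uint8_t value, float scale, int zero)
{
    return static_cast<float>(static_cast<int>(value) - zero) * scale;
}
```

With scale = 0.5 and zero = 10, the value 1.0 quantizes to 12 and dequantizes back to 1.0; inputs outside the representable range saturate to 0 or 255.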

Test framework

New features
  • Tests for verifying functionality of SynetQuantizedConvolution framework.
  • Tests for verifying functionality of function SynetDequantizeLinear.
  • Tests for verifying functionality of function SynetQuantizeLinear.
  • Tests for verifying functionality of SynetQuantizedAdd framework.

Python wrapper

New features
  • BilinearOpenCv in Simd.ResizeMethod enumeration.

Infrastructure

Removing
  • Support of Microsoft Visual Studio 2015.
  • Support of Microsoft Visual Studio 2017.

Simd v6.1.150

02 Jun 08:24

Algorithms

New features
  • Base implementation, SSE4.1, AVX2 optimizations of class ResizerByteBilinearOpenCv.
Improving
  • Base implementation, SSE4.1, AVX2 optimizations of function SynetPoolingAverage.
  • Base implementation, SSE4.1, AVX2 optimizations of class SynetGridSample2d32fBlZ.

Test framework

New features
  • Special tests to compare Simd and OpenCV resize.

Simd v6.1.149

05 May 07:59

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution16bNhwcSpecV1.
  • Caching of AMX tile configuration changes.
  • Function SimdSetAmxFull.
Improving
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution16bNhwcSpecV0.
Bug fixing
  • Error in function Simd::SynetSetInput.
Renaming
  • Class SynetConvolution16bNhwcDirect to SynetConvolution16bNhwcSpecV0.

Infrastructure

Bug fixing
  • CMake warning (required minimal version of CMake must be greater than or equal to 3.10).

Simd v6.1.148

01 Apr 08:27

Algorithms

New features
  • ForwardSmallNK algorithm in Base implementation of class SynetDeconvolution16bNhwcGemm.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetChannelSum16b.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetScale16b.
Improving
  • AMX-BF16 optimizations of class SynetDeconvolution16bNhwcGemm.
  • AMX-BF16 optimizations of class SynetMergedConvolution16bCdc.
  • AMX-BF16 optimizations of class SynetMergedConvolution16bCd.
  • AMX-BF16 optimizations of class SynetMergedConvolution16bDc.
Bug fixing
  • Error in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetDeconvolution16bNhwcGemm.
  • Error in Base implementation, AVX-512BW, AMX-BF16 optimizations of class SynetInnerProduct16bGemmNN.
  • Error in class Xml::NodeIterator.
  • Error in class Xml::AttributeIterator.

Test framework

New features
  • Tests for verifying functionality of function SynetChannelSum16b.
  • Tests for verifying functionality of class SynetScale16b.
  • Pinning of test threads (-pt=1 command line argument).

Infrastructure

New features
  • Clang version parameter in GitHub Actions script for CMake.
  • Check of Clang-19 in GitHub Actions script for CMake.

Simd v6.1.147

03 Mar 07:08

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function BgrToLab.
  • LAB pixel format in Simd::View.
  • LAB pixel format in Simd::Frame.
  • Support for BMP file format in function SimdImageSaveToMemory.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class ImageBmpSaver.
  • Support for BMP file format in function SimdImageLoadFromMemory.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class ImageBmpLoader.
Improving
  • AMX-BF16 optimizations of class SynetConvolution16bNhwcGemm.
  • AMX-BF16 optimizations of class SynetConvolution16bNhwcDirect.
  • AMX-BF16 optimizations of class SynetInnerProduct16bGemmNN.
Bug fixing
  • Error in AVX-512BW optimizations of class SynetConvolution32fNhwcDepthwise.
  • Error in AMX-BF16 optimizations of class SynetConvolution16bNchwGemm.
  • Error in AMX-BF16 optimizations of class SynetMergedConvolution16bCdc (micro kernel DepthwiseConvolution3x3xH).
  • Error in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetInnerProduct16bGemmNN.

Python wrapper

New features
  • Lab24 in Simd.PixelFormat enumeration.
  • Lab24 in Simd.FrameFormat enumeration.
  • Bmp in Simd.ImageFile enumeration.
  • Wrapper for function SimdBgrToLab.
  • Function Simd.BgrToLab.

Test framework

New features
  • Tests for verifying functionality of function BgrToLab.

Simd v6.1.146

04 Feb 11:22

Algorithms

New features
  • AVX2, AVX-512BW optimizations of class ResizerBf16Bilinear.
  • Deleter callback parameter in Simd::Frame.
Improving
  • SSE4.1 optimizations of class ResizerBf16Bilinear.
  • SSE4.1, AVX2, AVX-512BW optimizations of class ResizerFloatBilinear.
  • AMX-BF16 optimizations of class SynetConvolution16bNchwGemm.
  • AMX-BF16 optimizations of class SynetConvolution16bNhwcGemm.
Bug fixing
  • Error in Base implementation of class SynetConvolution16bNchwGemm.

Test framework

New features
  • Special tests for verifying functionality of function DescrIntCosineDistancesMxNa.

Simd v6.1.145

01 Jan 21:01

Algorithms

New features
  • Parameter add in function SimdSynetMergedConvolution16bInit.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetTiledScale2D32f.
  • AMX-BF16 kernel DepthwiseConvolution_k5p2d1s1w6 for class SynetMergedConvolution16b.
  • AMX-BF16 kernel DepthwiseConvolution_k5p2d1s1w4 for class SynetMergedConvolution16b.
  • AMX-BF16 kernel DepthwiseConvolution_k3p1d1s1w8 for class SynetMergedConvolution16b.
  • AMX-BF16 kernel DepthwiseConvolution_k3p1d1s1w6 for class SynetMergedConvolution16b.
  • Base implementation, SSE4.1 optimizations of class ResizerBf16Bilinear.
Improving
  • Extended use of AMX-BF16 optimization of function DepthwiseConvolution_k7p3d1s1w4.
  • Extended use of AMX-BF16 optimization of function DepthwiseConvolution_k7p3d1s1w6.
  • Extended use of AMX-BF16 optimization of function DepthwiseConvolution_k7p3d1s1w8.
  • Extended use of AVX-512BW optimization of function Convolution32fNhwcDepthwise_k7p3d1s1w4.
  • Extended use of AMX-BF16 optimization of function DepthwiseConvolution_k5p2d1s1w8.
  • Performance of SynetConvolution32f (NHWC, srcC=1, dstC=1).
Bug fixing
  • Error in AMX-BF16 optimizations of class SynetInnerProduct16bGemmNN.
  • Error in AVX-512BW optimizations of class SynetAdd16bUniform.
  • Error in AMX-BF16 optimizations of function DepthwiseConvolutionDefault.
  • Error in AMX-BF16 optimizations of function DepthwiseConvolutionLargePad.
  • Error in Base implementation of class SynetMergedConvolution16bCdc.
  • Error in Base implementation of class SynetMergedConvolution16bCd.
  • Error in class InputMemoryStream.
Removing
  • Parameter compatibility in function SimdSynetMergedConvolution16bInit.
  • Parameter internal in function SimdSynetMergedConvolution16bSetParams.

Test framework

New features
  • Tests for verifying functionality of function SynetTiledScale2D32f.

Simd v6.1.144

02 Dec 12:31

Algorithms

New features
  • SSE4.1, AVX2 optimizations of function Yuv444pToRgbaV2.
  • SSE4.1 optimizations of class ImageJpegLoader.
  • isRgb parameter of function Simd::SynetSetInput.
Bug fixing
  • Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution16bNhwcGemm.

Python wrapper

New features
  • isRgb parameter of function Simd.SynetSetInput.

Simd v6.1.143

04 Nov 15:26

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetConvolution16bNhwcDepthwise.
  • AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w4 for class SynetConvolution32fNhwcDepthwise.
  • AMX-BF16 kernel DepthwiseConvolution_k7p3d1s1w4 for class SynetMergedConvolution16b.
  • AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w6 for class SynetConvolution32fNhwcDepthwise.
  • AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w8 for class SynetConvolution32fNhwcDepthwise.
  • AMX-BF16 kernel DepthwiseConvolution_k7p3d1s1w6 for class SynetMergedConvolution16b.
  • AMX-BF16 kernel DepthwiseConvolution_k7p3d1s1w8 for class SynetMergedConvolution16b.
  • AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w4 for framework SynetMergedConvolution32f.
  • AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w6 for framework SynetMergedConvolution32f.
  • AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w8 for framework SynetMergedConvolution32f.
  • AMX-BF16 kernel DepthwiseConvolution_k5p2d1s1w8 for class SynetMergedConvolution16b.
  • Base implementation of function SimdYuv444pToRgbaV2.
Improving
  • AVX-512BW optimizations of function Convolution32fNhwcDepthwiseDefault.
  • AMX-BF16 optimizations of function DepthwiseConvolutionLargePad.
Bug fixing
  • Error in Base implementation of class SynetDeconvolution16bNhwcGemm.

Test framework

New features
  • Tests for verifying functionality of function SimdYuv444pToRgbaV2.