BLIS 1.2
This release contains several new features and optimizations related to threaded execution, as well as internal changes that improve maintainability and lay the groundwork for future refactoring. The build system and kernel sets saw lots of new code and tweaks to old code, and of course there were many bugfixes.
Improvements present in 1.2 (June 25, 2025):
Compatibility:
- `gemmtr` aliases for the `gemmt` BLAS and CBLAS compatibility functions have been added to support recent versions of LAPACK. (Mo Zhou)
Kernels:
- Fixed bug affecting reference kernels with clang 14.
- Fixed an incompatibility between the `haswell` `gemmsup` kernels and gcc 15. (Dave Love, Christopher Hillenbrand)
Build system:
- Disabled `armsve` on Windows due to build failures. (Hernan Martinez, Atsushi Tatsuma)
- Moved `#include <omp.h>` from `blis.h` to the relevant source files. (Melven Roehrig-Zoellner)
- Disable building KNL with gcc 15. (Dave Love)
Testing:
- CI testing infrastructure has moved to CircleCI.
Documentation:
- Widened print format in code examples to avoid misinterpretation of results. (Minh Quan Ho, Mason McBride)
Improvements present in 1.1 (January 15, 2025):
Compatibility:
- Added a ScaLAPACK compatibility mode which disables some conflicting BLAS definitions.
- Fixed issues with improperly escaped strings in python scripts for compatibility with python 3.12+. (@AngryLoki)
Kernels:
- Fixed an out-of-bounds read bug in the `haswell` `gemmsup` kernels. (John Mather)
- Fixed a bug in the complex-domain `gemm` kernels for `piledriver`. (@rmast)
Improvements present in 1.0 (May 6, 2024):
Framework:
- Initialize/finalize BLIS via a new `bli_pthread_switch_t` API. (Field Van Zee, Devin Matthews)
- Revamped `bli_init()` to use TLS where feasible. (Field Van Zee, Edward Smyth, Minh Quan Ho)
- Implemented support for fat multithreading.
- Implemented tile-level load balancing (tlb), or tile-level partitioning, in jr/ir loops for `gemm`, `gemmt`, and `trmm` macrokernels. (Field Van Zee, Devin Matthews, Leick Robinson, Minh Quan Ho)
- Added padding to `thrcomm_t` fields to avoid false sharing of cache lines. (Leick Robinson)
- Rewrote/fixed broken tree barrier implementation. (Leick Robinson)
- Refactored some `rntm_t` management code. (Field Van Zee, Devin Matthews)
- Initialize `rntm_t` nt/ways fields with 1 (not -1). (Field Van Zee, Jeff Diamond, Leick Robinson, Devin Matthews)
- Defined `invscalv`, `invscalm`, `invscald` operations.
- Added consistent `NaN`/`Inf` handling in `sumsqv`. (Devin Matthews)
- Implemented support for HPX as a threading backend option. (Christopher Taylor, Srinivas Yadav)
- Relocated the pba, sba pool (from the `rntm_t`), and `mem_t` (from the `cntl_t`) to the `thrinfo_t` object.
- Modified which communicator is associated with a given node of the `thrinfo_t` tree. (Devin Matthews)
- Refactored level-3 thread decorator into two parts: a thread launcher and a function to pass operands. (Devin Matthews)
- Refactored structure awareness in `bli_packm_blk_var1.c`. (Devin Matthews)
- Reimplemented `bli_l3_determine_kc()`. (Devin Matthews)
- Implemented `cntx_t` pointer caching in gks. (Field Van Zee, Harihara Sudhan S)
- Added `const` keyword to pointers in kernel APIs. (Field Van Zee, Nisanth M P)
- Migrated all kernel APIs to use `void*` pointers.
- Defined new global scalar constants: `BLIS_ONE_I`, `BLIS_MINUS_ONE_I`, `BLIS_NAN`. (Devin Matthews)
- Disabled modification of KC in the `gemmsup` kernels. (Devin Matthews)
- Defined `lt`, `lte`, `gt`, `gte` operations and other miscellaneous updates.
- Consolidated `INSERT_` macro sets via variadic macros. (Devin Matthews)
- De-templatized macrokernels for `gemmt`, `trmm`, and `trsm` to match that of `gemm`. (Devin Matthews)
- De-templatized `bli_l3_sup_var1n2m.c` and unified `_sup_packm_a/b()`. (Devin Matthews)
- Fixed 1m enablement for `herk`/`her2k`/`syrk`/`syr2k`. (Devin Matthews)
- Fixed `trmm[3]`/`trsm` performance bug introduced in `cf7d616`. (Field Van Zee, Leick Robinson)
- Fixed a 1m optimization bug in right-sided `hemm`/`symm`. (Field Van Zee, Nisanth M P)
- Fixed a bug in sup threshold registration. (Devin Matthews, Field Van Zee)
- Fixed brokenness in the small block allocator (sba) when the sba is disabled. (Field Van Zee, John Mather)
- Fixed type bug in `bli_cntx_set_ukr_prefs()`. (Field Van Zee, Leick Robinson, Devin Matthews, Jeff Diamond)
- Fixed incorrect `sizeof(type)` in edge case macros. (@moon-chilled)
- Fixed bugs and added sanity check in `bli_pool.c`. (Devin Matthews)
- Fixed a typo in the macro definition for `VEXTRACTF64X2` in `bli_x86_asm_macros.h`. (Harsh Dave)
- Fixed a typo in `bli_type_defs.h` where `BLIS_BLAS_INT_TYPE_SIZE` was misspelled. (Devin Matthews)
- Typecast `printf()` args in `bli_thread_range_tlb.c` to avoid compiler warnings. (Lee Killough)
- Minor tweaks to `bli_l3_check.c`.
- Partial addition of `const` to all interfaces above the (micro)kernels. (Devin Matthews)
- Fixed a harmless misspelling of `xpbys` in gemm macrokernel.
- Various internal API renaming/reorganization.
- Various other fixes.
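
The padding added to `thrcomm_t` fields (listed above) addresses false sharing: when two variables written by different threads land on the same cache line, that line bounces between cores even though the threads never touch the same data. The following is a minimal C11 sketch of the general technique, not the actual `thrcomm_t` layout. The struct `padded_comm_t`, its field names, and the 64-byte `CACHE_LINE_SIZE` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache-line size; 64 bytes is common on x86-64 and many arm64
   cores, but the right value is hardware-specific. */
#define CACHE_LINE_SIZE 64

/* Hypothetical barrier state, NOT the real thrcomm_t layout. Without the
   alignment specifiers both fields would normally share one cache line,
   so an arriving thread's write to `arrived` would invalidate the line
   that waiting threads are polling for `sense`. */
typedef struct
{
    alignas(CACHE_LINE_SIZE) volatile int sense;   /* polled by waiting threads */
    alignas(CACHE_LINE_SIZE) volatile int arrived; /* updated by arriving threads */
} padded_comm_t;

/* Compile-time checks that each field starts on its own cache line. */
static_assert(offsetof(padded_comm_t, sense) == 0,
              "sense should start the first cache line");
static_assert(offsetof(padded_comm_t, arrived) == CACHE_LINE_SIZE,
              "arrived should start the second cache line");
static_assert(sizeof(padded_comm_t) == 2 * CACHE_LINE_SIZE,
              "struct should span exactly two cache lines");
```

Giving each frequently written field its own cache line costs a little memory per communicator in exchange for less coherence traffic while threads spin at a barrier.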
Compatibility:
- Implemented `[cz]symv_()`, `[cz]syr_()`, `[cz]rot_()`. (Field Van Zee, James Foster)
- Fixed compilation errors when `BLIS_DISABLE_BLAS_DEFS` is defined. (Field Van Zee, Edward Smyth, Devin Matthews)
- Include `bli_config.h` before `bli_system.h` in `cblas.h` so that `BLIS_ENABLE_SYSTEM` is defined in time for proper OS detection. (Edward Smyth)
Kernels:
- Updated ARMv8a kernels to fix two prefetching issues and re-enable general stride IO. (Jeff Diamond)
- Restored general storage case to `armsve` kernels. (RuQing Xu)
- Added arm64 `dgemmsup` with extended MR and NR. (RuQing Xu)
- Reorganized the way `packm` kernels are stored within the `cntx_t` so that BLIS only stores two `packm` kernels per datatype: one for MRxk upanels and one for kxNR upanels. (Devin Matthews)
- Fixed bugs in `scal2v` reference kernel when alpha == 1.
- Fixed out-of-bounds read in `haswell` `gemmsup` kernels. (Daniël de Kok, Bhaskar Nallani, Madeesh Kannan)
- Fixed k = 0 edge case in `power10` microkernels. (Nisanth M P)
- Disabled `power10` kernels other than `sgemm`, `dgemm`. (Nisanth M P)
- Fixed `bli_gemm_small()` prototype mismatch. (Jeff Diamond)
Extras:
- Use the conventional level-3 sup thread decorator within the `gemmlike` sandbox.
- Fixed type-mismatch errors in `power10` sandbox. (Nisanth M P)
- Fixed `gemmlike` sandbox bug that stems from reuse of `bli_thrinfo_sup_grow()`.
Build system:
- Added two arm64 subconfigs: `altra` and `altramax`. (Jeff Diamond, Leick Robinson)
- Added support for RISC-V configuration targets. (Angelika Schwarz, Lee Killough)
- Auto-detect the RISC-V ABI of the compiler and use `-mabi=` during RISC-V builds. (Lee Killough)
- Added `sifive_x280` subconfig and kernel set. (Aaron Hutchinson, Lee Killough, Devin Matthews, and Angelika Schwarz)
- Added AddressSanitizer (--enable-asan) option to `configure`. (Devin Matthews)
- Added option to disable thread-local storage via `--disable-tls`. (Field Van Zee, Nick Knight)
- Exclude `-lrt` on Android with Bionic libraries. (Lee Killough)
- Omit `-fPIC` option when shared library build is disabled. (Field Van Zee, Nick Knight)
- Move `-fPIC` option insertion to subconfigs' `make_defs.mk` files. (Field Van Zee, Nick Knight)
- Install one-line helper headers to `INCDIR` prefix so that users can `#include "blis.h"` instead of `#include <blis/blis.h>` (and/or `"cblas.h"` instead of `<blis/cblas.h>` if CBLAS is enabled). (Field Van Zee, Jed Brown, Devin Matthews, Mo Zhou)
- Enhanced detection of Fortran compiler when checking the version string for the purposes of determining a default return convention for complex domain values. (Bart Oldeman)
- Added detection of the NVIDIA nvhpc compiler (`nvc`) in `configure`. (Ajay Panyala)
- Updated `zen3` subconfig to support NVHPC compilers. (Abhishek Bagusetty)
- Use kernel CFLAGS for `kernels` subdirs in addons. (AMD, Mithun Mohan)
- Created `power` umbrella configuration family (which currently includes `power9` and `power10` subconfigs). (Nisanth M P)
- Defined `BLIS_VERSION_STRING` in `blis.h` instead of via command line argument during compilation. (Field Van Zee, Mohsen Aznaveh, Tim Davis)
- Rewrote `regen-symbols.sh` as `gen-libblis-symbols.sh`. (Field Van Zee)
- Support `clang` targeting MinGW. (Isuru Fernando)
- Added autodetection (via `/proc/cpuinfo`) for POWER7, POWER9 and POWER10 microarchitectures. (Alexander Grund)
- Added `#line` directives to flattened `blis.h` to facilitate easier debugging. (Devin Matthews)
- Added `--nosup` and `--sup` shorthand options to `configure`.
- Use here-document syntax for `configure --help` output. (Lee Killough)
- Updated `configure` to pass all `shellcheck` checks. (Lee Killough)
- Tweaks to `.dir-locals.el` to enhance emacs formatting of C files. (Lee Killough)
- Removed buggy cruft from `power10` subconfig. (Field Van Zee, Nicholai Tukanov)
- Added missing `#include <io.h>` for Windows. (@h-vetinari)
- Fixed hardware auto-detection for `firestorm` (Apple M1) subconfig. (Devin Matthews)
- Fixed bug in detection of Fortran compiler vendor. (Devin Matthews)
- Fixed version check for `znver3`, which needs gcc >= 10.3. (Jed Brown)
- Fixed typo in `configure --help` text. (Lee Killough)
- Fixed warning about regular expressions with stray backslashes as the result of recent changes to `grep`.
- Added `output.testsuite` to `.gitignore`.
- Minor changes to .gitignore and LICENSE files. (Jeff Diamond)
- Minor decluttering of top-level directory.
- Very minor tweaks to common.mk.
Testing:
- Rewrote `test/3` drivers to take parameters via command line arguments. (Field Van Zee, Jeff Diamond, Leick Robinson)
- Added `arm64` entry to `.travis.yml` so that Travis CI will compile/test ARM builds. (Field Van Zee, RuQing Xu)
- Test the `gemmlike` sandbox via AppVeyor. (Jeff Diamond)
- Added `-q` quiet mode option to testsuite.
- Fixed non-deterministic segfault in standalone `test/3` drivers. (Field Van Zee, Leick Robinson)
- Fixed a crash that occurs when either `cblat1` or `zblat1` are linked with a build of BLIS that was compiled with `--complex-return=intel`. (Bart Oldeman)
- Other minor fixes/tweaks.
Documentation:
- Added Discord documentation (`docs/Discord.md`) and logo to `README.md`.
- Added the `mm_algorithm` files (for bp and pb) to `docs/diagrams`.
- Added mention of Wilkinson Prize to `README.md`.
- Minor fixes and improvements to `docs/Multithreading.md`.
- Fix typos in docs + example code comments. (Igor Zhuravlov)