Release 1.9.0
The Ginkgo team is proud to announce the new Ginkgo minor release 1.9.0.
This release brings new features such as:
- Support for half precision (IEEE FP16). The type gko::half can now be selected in most instances as the value type of a matrix, solver, preconditioner, etc. If the selected backend supports FP16 as a native type, the native type is used within the kernels, otherwise an overhead might occur. The new behavior is enabled by default, but it can be turned off during configuration.
- New implementations of the ILU and IC factorization for CUDA, HIP, OpenMP, and Reference backends. These are available in addition to the existing implementations based on the vendor libraries cuSPARSE and hipSPARSE.
- New (S)SOR and Gauss-Seidel preconditioners.
- Simplified distributed matrix assembly by exchanging local rows between neighboring processes.
And more!
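For readers unfamiliar with the (S)SOR and Gauss-Seidel methods named in the list above, the following is a minimal sketch of the underlying sweeps in plain C++ on a small dense matrix. It does not use Ginkgo and does not reflect Ginkgo's API or kernels; the function names sor_sweep and sor_iterate are hypothetical and chosen only for this illustration. It assumes a nonzero diagonal and a relaxation factor omega between 0 and 2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One SOR sweep for A x = b, with A stored dense in row-major order.
// omega == 1.0 reduces to a Gauss-Seidel sweep. Illustration only,
// hypothetical helper, not part of Ginkgo.
void sor_sweep(const std::vector<double>& a, const std::vector<double>& b,
               std::vector<double>& x, double omega, bool backward = false)
{
    const std::size_t n = b.size();
    assert(a.size() == n * n && x.size() == n);
    for (std::size_t k = 0; k < n; ++k) {
        // Forward sweeps visit rows 0..n-1, backward sweeps n-1..0.
        const std::size_t i = backward ? n - 1 - k : k;
        double sigma = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            if (j != i) {
                sigma += a[i * n + j] * x[j];
            }
        }
        // Gauss-Seidel update for row i (assumes a nonzero diagonal entry).
        const double gs = (b[i] - sigma) / a[i * n + i];
        // Relax between the old value and the Gauss-Seidel update.
        x[i] = (1.0 - omega) * x[i] + omega * gs;
    }
}

// Repeats the sweep a fixed number of times, starting from x = 0.
// With symmetric == true, each iteration is a forward sweep followed by a
// backward sweep (SSOR).
std::vector<double> sor_iterate(const std::vector<double>& a,
                                const std::vector<double>& b, double omega,
                                int iterations, bool symmetric)
{
    std::vector<double> x(b.size(), 0.0);
    for (int it = 0; it < iterations; ++it) {
        sor_sweep(a, b, x, omega, false);
        if (symmetric) {
            sor_sweep(a, b, x, omega, true);
        }
    }
    return x;
}
```

Used as a preconditioner, only one or a few such sweeps would typically be applied per application rather than iterating to convergence.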
If you face an issue, please first check our known issues page and the open issues list. If you do not
find a solution, feel free to open a new issue or ask a question using GitHub Discussions.
Supported systems and requirements:
- For all platforms, CMake 3.16+
- C++17 compliant compiler
- Linux and macOS
  - GCC: 7.0+
  - clang: 5.0+
  - Intel compiler: 2019+
  - Apple Clang: 15.0 is tested. Earlier versions might also work.
  - NVHPC: 22.7+
  - Cray Compiler: 14.0.1+
  - CUDA module: CMake 3.18+, and CUDA 11.0+ or NVHPC 22.7+, Compute Capability 5.3+
  - HIP module: CMake 3.21+, and ROCm 4.5+
  - DPC++ module: Intel oneAPI 2023.1+ with oneMKL and oneDPL. Set the CXX compiler to dpcpp or icpx.
  - MPI: standard version 3.1+, ideally GPU Aware, for best performance
- Windows
  - MinGW: GCC 7.0+
  - Microsoft Visual Studio: VS 2019+
  - CUDA module: CUDA 11.0+, Microsoft Visual Studio
  - OpenMP module: MinGW.
Version support changes
- Ginkgo now requires a compiler with C++17 support #1603
Deprecations
- The Executor::run overload taking in multiple functions without a name as first parameter has been deprecated #1667
- The master branch has been deprecated in favor of a new branch named main #1739.
Summary of previous deprecations
- The device_reset parameter of CUDA and HIP executors no longer has an effect, and its allocation_mode parameters have been deprecated in favor of the Allocator interface.
- The CMake parameter GINKGO_BUILD_DPCPP has been deprecated in favor of GINKGO_BUILD_SYCL.
- The gko::reorder::Rcm interface has been deprecated in favor of gko::experimental::reorder::Rcm based on Permutation.
- The Permutation class' permute_mask functionality.
- Multiple functions with typos (set_complex_subpsace(), range functions such as conj_operaton etc).
- gko::lend() is not necessary anymore.
- The classes RelativeResidualNorm and AbsoluteResidualNorm are deprecated in favor of ResidualNorm.
- The class AmgxPgm is deprecated in favor of Pgm.
- Default constructors for the CSR load_balance and automatical strategies
- The PolymorphicObject's move-semantic copy_from variant
- The templated SolverBase class.
- The class MachineTopology is deprecated in favor of machine_topology.
- Logger constructors and create functions with the executor parameter.
- The virtual, protected, Dense functions compute_norm1_impl, add_scaled_impl, etc.
- Logger events for solvers and criterion without the additional implicit_tau_sq parameter.
- The global gko::solver::default_krylov_dim, use instead gko::solver::gmres_default_krylov_dim.
- array::get_num_elems() has been renamed to get_size()
- matrix_data::ensure_row_major_order() has been renamed to sort_row_major()
- device_matrix_data::get_num_elems() has been renamed to get_num_stored_elements()
- The CMake parameter GINKGO_COMPILER_FLAGS has been superseded by CMAKE_CXX_FLAGS, and GINKGO_CUDA_COMPILER_FLAGS has been superseded by CMAKE_CUDA_FLAGS
- The std::initializer_list overloads of matrix create methods and constructors are deprecated in favor of explicit array parameters
Added features
- Add Executor::get_description() for textual representation of the device #1615
- Add row and column scaling functionality to the distributed matrix #1640
- Add SolverProgress logger printing out or storing to disk the individual scalars (and vectors) of an iterative solver after each iteration #1620
- Add new ortho_method parameter for GMRES, with classical Gram-Schmidt and classical Gram-Schmidt with re-orthogonalization options in addition to previously-available modified Gram-Schmidt #1646
- Add file config support for Schwarz #1658
- Add overload for Executor::run which accepts a name and a closure for the ReferenceExecutor as the first two arguments #1667
- Add function to fill device_matrix_data with zeros #1683
- Add (S)SOR and Gauss-Seidel preconditioner #1633, #1634
- Add support for additive read_distributed for the distributed matrix #1650
- Add Ginkgo's own ILU and IC implementation #1684
- Add NVIDIA Ada architecture #1733
- Add half precision support #1706, #1708, #1711, #1712, #1713, #1716, #1710, #1736
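To illustrate the difference between the three orthogonalization options mentioned for GMRES above (modified Gram-Schmidt, classical Gram-Schmidt, and classical Gram-Schmidt with re-orthogonalization), here is a minimal sketch in plain C++. It is not Ginkgo code: the enum ortho, its value names, and the functions dot and orthogonalize are hypothetical and exist only for this example. A real GMRES implementation would also store the projection coefficients in the Hessenberg matrix, which is omitted here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// Hypothetical selector for this sketch, not Ginkgo's enum.
enum class ortho { mgs, cgs, cgs2 };

double dot(const vec& x, const vec& y)
{
    assert(x.size() == y.size());
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        s += x[i] * y[i];
    }
    return s;
}

// Orthogonalizes w against the orthonormal vectors in basis, then
// normalizes w (assumes w is not in the span of basis).
void orthogonalize(const std::vector<vec>& basis, vec& w, ortho method)
{
    // Re-orthogonalization simply repeats the classical projection once.
    const int passes = (method == ortho::cgs2) ? 2 : 1;
    for (int p = 0; p < passes; ++p) {
        if (method == ortho::mgs) {
            // Modified: each coefficient is computed from the already
            // updated w, so the projections are applied one after another.
            for (const vec& q : basis) {
                const double h = dot(q, w);
                for (std::size_t i = 0; i < w.size(); ++i) {
                    w[i] -= h * q[i];
                }
            }
        } else {
            // Classical: all coefficients are computed from the same w
            // first, then subtracted together.
            std::vector<double> h;
            for (const vec& q : basis) {
                h.push_back(dot(q, w));
            }
            for (std::size_t k = 0; k < basis.size(); ++k) {
                for (std::size_t i = 0; i < w.size(); ++i) {
                    w[i] -= h[k] * basis[k][i];
                }
            }
        }
    }
    const double norm = std::sqrt(dot(w, w));
    for (double& wi : w) {
        wi /= norm;
    }
}
```

In the classical variants the dot products of one pass are independent of each other, which is why they can be batched, while the second pass in the re-orthogonalized variant recovers orthogonality lost to rounding.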
Improvements
- Add a workspace to the residual norm check, which reduces the number of allocations and deallocations and the corresponding overhead #1687
- Add distributed VectorCache and use it as workspace in Schwarz #1688.
- Add example to show the file config usage #1662
- Improve compile time for batched solvers #1629
- Reduce conflicting thrust symbols when linking with different thrust libraries by adding a custom thrust namespace #1730
Fixes
- Fix transposed triangular solvers not using the same algorithm as the original triangular solver #1641
- Fix the inconsistent handling of zero diagonal values in scalar Jacobi #1642
- Fix an issue related to GCR and non-default strides in the rhs vector #1656
- Fix an issue related to triangular solvers with CUDA on Windows #1665
- Fix an issue where non-conforming MatrixMarket files were parsed without an error #1628
- Fix finding rocthrust if it is not installed in paths included by default #1668
- Fix an issue related to casting between vectors of different value types in the mixed-precision multigrid setup #1663
- Fix some test failures with ROCm 6.x #1670
- Fix a race condition in bicgstab #1676
- Fix an issue with MGS GMRES for complex numbers #1678
- Fix finding ROCm on recent ROCm versions (5.0+) #1673
- Fix a compiler error when using NVHPC with MPI enabled #1697
- Fix build issues of OMP backend when using HIPCC as C++ compiler #1695
- Fix build issues for Intel oneAPI 2025.0 #1718
- Fix inconsistencies between declaration and definition of functions and classes/structs, which mainly fixes clang-cl #1725
- Fix undefined symbols in shared library in msys2/clang #1724
- Fix page fault issues when running on multiple Intel GPUs in parallel #1723
- Fix data races in several OMP kernels #1743