v0.9.1
Sparse support + bugfixes
- New operators: argminmax, dense2sparse, sparse2dense, interp1, normalize, argsort
- Removed requirement for --relaxed-constexpr
- Added MatX NVTX domain
- Significantly improved speed of svd and inv
- Python integration sample
- Experimental sparse tensor support (SpMM and solver routines supported)
- Significantly reduced FFT memory usage
What's Changed
- Moving definition of CUB cache up by @cliffburdick in #771
- Added documentation of memory types by @cliffburdick in #770
- Cleaning up non-const operator() to avoid code duplication by @cliffburdick in #769
- Switch to CUB/Thrust backend for cuda executor argmax by @tmartin-gh in #772
- Refactor cub argmax to generic cub reduce, use for argmin. Fixes #774. by @tmartin-gh in #776
- Change any() and all() to use CUB's reduce by @tmartin-gh in #777
- Add argminmax operator by @tmartin-gh in #778
- Fix matx::HostExecutor segfault with argmin/argmax by @tmartin-gh in #780
- Added new cusolverDnXsyevBatched API for batched eigen calls for CTK 12.6.2 and up by @cliffburdick in #781
- cub.h CUDACC guards for custom ops by @nvjonwong in #782
- Add example compiled with host compiler to catch regressions. by @tmartin-gh in #783
- Remove relaxed constexpr by @cliffburdick in #775
- Cleanup versions.json so jq can parse it. by @alliepiper in #785
- Allow rapids-cmake's version file to be overridden. by @alliepiper in #786
- Update rapids-cmake (branch-24.12@03ec7ef) by @alliepiper in #787
- Created MatX NVTX domain by @cliffburdick in #784
- Update docs github action by @tmartin-gh in #789
- Update docs github action by @tmartin-gh in #790
- Work around compiler parser bug by @cliffburdick in #791
- Updating developer documentation by @cliffburdick in #793
- Modify concat op to enable concatenating float3. by @nvjonwong in #792
- Fix rapids cmake by @alliepiper in #799
- Switched to getRs instead of getRi for faster inverse by @cliffburdick in #797
- Update CMakeLists.txt by @cliffburdick in #801
- Support half precision R2C transforms by @cliffburdick in #796
- Fix gcc13 erroneous warning by @cliffburdick in #802
- fixed missing forwarding code for allocate by @aartbik in #804
- Fix bug with eye, and also zero workspace before LU factorization by @cliffburdick in #807
- Change shape_type for the remap op by @nvjonwong in #806
- Faster batched SVD for small sizes by @cliffburdick in #805
- Fixing broadcasting in all operator() by @cliffburdick in #795
- Add a better error on memory allocation failure by @cliffburdick in #808
- Fix solver interfaces to use executor in cache by @cliffburdick in #809
- Python integration sample by @tmartin-gh in #812
- Fixes for clang17 errors/warnings by @cliffburdick in #815
- Misc Cleanup by @tmartin-gh in #814
- frexp_fix by @cliffburdick in #817
- Adding structures needed for sparse support by @cliffburdick in #819
- fix missing newline at EOF (to avoid future diff issues) by @aartbik in #822
- add size() to container storage by @aartbik in #824
- minor edit for sparse (layout and proper swap def) by @aartbik in #820
- add a to-string method for memory space by @aartbik in #823
- Cleanup cmake usage when MatX is a dependent project by @tmartin-gh in #827
- Fixing warnings issues by clang-19, both host and device by @cliffburdick in #825
- Update build_docs actions to newest. Add CI_RUN_DATETIME in version.rst by @tmartin-gh in #829
- introduce a versatile sparse tensor type to MatX (experimental) by @aartbik in #821
- Add initial tiff support by @tmartin-gh in #831
- Make dim2lvl translation for printing more in the style of MatX by @aartbik in #832
- Expose tensor format (and lvl specs) to sparse tensor data by @aartbik in #833
- Add cross product operator by @mfzmullen in #818
- remove LVL depth restriction with constexpr templating by @aartbik in #834
- Guard all DIM/LVL recursion against completely empty format by @aartbik in #835
- Adjust half-type threshold for cross product unit tests by @mfzmullen in #838
- Added fp32 version of normcdf by @cliffburdick in #839
- Changing black scholes to float and improving performance by @cliffburdick in #840
- Implement the () operator on sparse tensors by @aartbik in #837
- Support operators into einsum interface by @cliffburdick in #845
- Add print function with nonzero dim args by @tbensonatl in #844
- Updated CCCL to fix regression in newer CTK versions by @cliffburdick in #846
- First version of MATX SpMM (using dispatch to cuSPARSE) by @aartbik in #843
- Moved sparse operator() into tensor_impl_t by @cliffburdick in #841
- Adding timing metrics to CUDA and host executors by @cliffburdick in #842
- Remove dense "testers" from the sparse tensor format type by @aartbik in #847
- cuDSS by @cliffburdick in #848
- Update deprecated CUB types by @cliffburdick in #851
- Renamed versatile into universal for sparse tensor types by @aartbik in #850
- Ignore incorrect gcc warning in einsum by @cliffburdick in #853
- Added documentation on integrating with existing software by @cliffburdick in #852
- Add compile-time check for minimum CUDA arch by @tbensonatl in #855
- First version of MATX Sparse-Direct-Solve (using dispatch to cuDSS) by @aartbik in #849
- First version of MATX sparse2dense conversion (dispatch to cuSPARSE) by @aartbik in #856
- Improve cuFFT errors by @cliffburdick in #860
- workaround for CTAD bug in NVC++ by @cliffburdick in #859
- Add note about host-allocated memory to external guide by @cliffburdick in #862
- Cleanup to use pass-by-reference more consistently by @aartbik in #861
- Move empty storage construction to inline helper method by @aartbik in #857
- Make CCCL copy false by @cliffburdick in #865
- Remove test for free memory on FFTs by @cliffburdick in #864
- Fix initializer list order by @tmartin-gh in #867
- Initialize host cuRAND API when using host compiler by @cliffburdick in #866
- Add user-friendly assertions to make_sparse_tensor by @aartbik in #869
- Add "zero" matrix factor methods for COO,CSR,CSC by @aartbik in #870
- First version of MATX dense2sparse conversion (dispatch to cuSPARSE) by @aartbik in #868
- Add sparse factory method tests by @aartbik in #871
- Enforce library restrictions on MatX transformations by @aartbik in #872
- Add sparse conversion tests (dense2sparse, sparse2dense) by @aartbik in #873
- Add sparse direct-solver tests by @aartbik in #874
- Add SpMM tests by @aartbik in #875
- Refactored OperatorTests.cu for faster compilation time by @cliffburdick in #876
- Test feeding dense output as intermediate for the new sparse ops by @aartbik in #877
- Use transitive include in benchmarks cmake by @cliffburdick in #880
- Remove const qualifier on input to thrust iterator by @cliffburdick in #879
- allow incoming transformations for convert by @aartbik in #881
- allow for incoming/outgoing transformations on SpMM by @aartbik in #884
- Update sparse_tensor_format.h by @cliffburdick in #886
- allow transforming output for sparse2dense by @aartbik in #882
- allow for incoming and outgoing transformations on solve by @aartbik in #883
- modify constexpr fall-through return into else by @aartbik in #888
- Revert broadcasting changes to concat by @cliffburdick in #890
- Added CUDA executor alias by @cliffburdick in #891
- Do not create CUDA events in ephemeral executors by @cliffburdick in #889
- Update to CCCL 2.8.0 by @cliffburdick in #895
- MatX Containers by @tylera-nvidia in #892
- Add configurable Pwelch scaling and improve performance by @tmartin-gh in #897
- Fix initialization order of stream/profiling by @cliffburdick in #893
- Start sparse tensor documentation by @aartbik in #898
- add various references to sparse tensor api, extend type doc by @aartbik in #901
- feat: added economic QR by @mfzmullen in #903
- Add SpMV support for matvec transformation by @aartbik in #904
- Support mixed-precision for SpMM by @aartbik in #906
- Refine print order and skip device contents by @aartbik in #908
- Support mixed-precision for SpMV by @aartbik in #907
- Add a SpMM (COO) benchmark (all types) by @aartbik in #909
- Enabled mixed precision tests for SpMM and SpMV by @aartbik in #910
- fixed sparse tensor print format by @aartbik in #913
- Add guard for Dss support to sparse solver test by @aartbik in #915
- minor code cleanup by @aartbik in #916
- guard half type usage with proper cuda capability by @aartbik in #917
- Implemented sparse2sparse transformation (COO to CSR using cuSPARSE) by @aartbik in #918
- minor code cleanup by @aartbik in #919
- Add COO to CSR test by @aartbik in #921
- Fixed bug in sort() where memory was not properly freed by @cliffburdick in #922
- Improve tensor format types by @aartbik in #923
- Change rank limit for batching by @nvjonwong in #911
- Add std::complex to dlpack converter by @cliffburdick in #924
- typo: missing closing brace in docs by @simonbyrne in #927
- changed order of level exp / format in the file by @aartbik in #925
- various stylistic changes to sparse tensor format file by @aartbik in #930
- Disable additional NaN/Inf checks for complex ops by @tbensonatl in #932
- Add CMake option to enable -lineinfo; update docs by @tbensonatl in #933
- Update linspace for more of a pythonic syntax by @cliffburdick in #935
- Remove MATX_ROOT macro by @cliffburdick in #937
- Fix build issues with 32-bit indices by @tbensonatl in #938
- Bump GTest version for CMake 4.0 compatibility. by @alliepiper in #939
- Improve clone docs by @simonbyrne in #942
- Fix rank of linspace by @cliffburdick in #943
- Destroy events for profiling by @cliffburdick in #944
- Add support for as_int64 and as_uint64 casts by @tmartin-gh in #945
- Update nvbench to fix compiler errors by @cliffburdick in #948
- Add ddof parameter to stdd by @ahmedhus22 in #950
- implement basic interp by @simonbyrne in #936
- Fix ExecArgReduce compile error with matxBinaryOp by @tmartin-gh in #947
- Add Normalize function by @ahmedhus22 in #951
- Fix compiler warning by @cliffburdick in #952
- Fix build error using shape_type in std::conditional_t by @tbensonatl in #954
- Fixing another compiler warning by @cliffburdick in #955
- Use sincos() rather than separate sin/cos for expj by @tbensonatl in #958
- add argsort operator by @simonbyrne in #956
- Reduce FFT Memory Usage by @cliffburdick in #961
- Fix accidental example checkin by @cliffburdick in #960
- simplify OptimizedExecSort by @simonbyrne in #962
New Contributors
- @alliepiper made their first contribution in #785
- @aartbik made their first contribution in #804
- @simonbyrne made their first contribution in #927
- @ahmedhus22 made their first contribution in #950
Full Changelog: v0.9.0...v0.9.1