Sparse support + bugfixes

New operators: argminmax, dense2sparse, sparse2dense, interp1, normalize, argsort
Removed requirement for --relaxed-constexpr
Added MatX NVTX domain
Significantly improved speed of svd and inv
Python integration sample
Experimental sparse tensor support (SpMM and solver routines supported)
Significantly reduced FFT memory usage
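
For readers new to the sparse conversions listed above, here is a minimal host-side sketch of what a dense-to-CSR and CSR-to-dense round trip computes. It is plain C++ and not MatX's API: the `CsrMatrix` struct and the `dense_to_csr` / `csr_to_dense` helpers are hypothetical names invented for illustration. Per the changelog below, MatX itself dispatches these conversions to cuSPARSE.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container for illustration only; not a MatX type.
struct CsrMatrix {
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<float> values;         // nonzero values, stored row by row
  std::vector<std::size_t> col_idx;  // column index of each nonzero
  std::vector<std::size_t> row_ptr;  // rows + 1 offsets into values/col_idx
};

// Dense (row-major) -> CSR: keep only the nonzero entries.
CsrMatrix dense_to_csr(const std::vector<float>& dense,
                       std::size_t rows, std::size_t cols) {
  CsrMatrix m;
  m.rows = rows;
  m.cols = cols;
  m.row_ptr.reserve(rows + 1);
  m.row_ptr.push_back(0);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      const float v = dense[r * cols + c];
      if (v != 0.0f) {
        m.values.push_back(v);
        m.col_idx.push_back(c);
      }
    }
    // Offset one past the last nonzero of row r.
    m.row_ptr.push_back(m.values.size());
  }
  return m;
}

// CSR -> dense (row-major): scatter nonzeros into a zero-filled buffer.
std::vector<float> csr_to_dense(const CsrMatrix& m) {
  std::vector<float> dense(m.rows * m.cols, 0.0f);
  for (std::size_t r = 0; r < m.rows; ++r) {
    for (std::size_t k = m.row_ptr[r]; k < m.row_ptr[r + 1]; ++k) {
      dense[r * m.cols + m.col_idx[k]] = m.values[k];
    }
  }
  return dense;
}
```

An empty row shows up as two equal consecutive entries in `row_ptr`, which is why the array has `rows + 1` elements.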

What's Changed

Moving definition of CUB cache up by @cliffburdick in #771
Added documentation of memory types by @cliffburdick in #770
Cleaning up non-const operator() to avoid code duplication by @cliffburdick in #769
Switch to CUB/Thrust backend for cuda executor argmax by @tmartin-gh in #772
Refactor cub argmax to generic cub reduce, use for argmin. Fixes #774. by @tmartin-gh in #776
Change any() and all() to use CUB's reduce by @tmartin-gh in #777
Add argminmax operator by @tmartin-gh in #778
Fix matx::HostExecutor segfault with argmin/argmax by @tmartin-gh in #780
Added new cusolverDnXsyevBatched API for batched eigen calls for CTK 12.6.2 and up by @cliffburdick in #781
cub.h CUDACC guards for custom ops by @nvjonwong in #782
Add example compiled with host compiler to catch regressions. by @tmartin-gh in #783
Remove relaxed constexpr by @cliffburdick in #775
Cleanup versions.json so jq can parse it. by @alliepiper in #785
Allow rapids-cmake's version file to be overridden. by @alliepiper in #786
Update rapids-cmake (branch-24.12@03ec7ef) by @alliepiper in #787
Created MatX NVTX domain by @cliffburdick in #784
Update docs github action by @tmartin-gh in #789
Update docs github action by @tmartin-gh in #790
Work around compiler parser bug by @cliffburdick in #791
Updating developer documentation by @cliffburdick in #793
Modify concat op to enable concatenating float3. by @nvjonwong in #792
Fix rapids cmake by @alliepiper in #799
Switched to getRs instead of getRi for faster inverse by @cliffburdick in #797
Update CMakeLists.txt by @cliffburdick in #801
Support half precision R2C transforms by @cliffburdick in #796
Fix gcc13 erroneous warning by @cliffburdick in #802
fixed missing forwarding code for allocate by @aartbik in #804
Fix bug with eye, and also zero workspace before LU factorization by @cliffburdick in #807
Change shape_type for the remap op by @nvjonwong in #806
Faster batched SVD for small sizes by @cliffburdick in #805
Fixing broadcasting in all operator() by @cliffburdick in #795
Add a better error on memory allocation failure by @cliffburdick in #808
Fix solver interfaces to use executor in cache by @cliffburdick in #809
Python integration sample by @tmartin-gh in #812
Fixes for clang17 errors/warnings by @cliffburdick in #815
Misc Cleanup by @tmartin-gh in #814
frexp_fix by @cliffburdick in #817
Adding structures needed for sparse support by @cliffburdick in #819
fix missing newline at EOF (to avoid future diff issues) by @aartbik in #822
add size() to container storage by @aartbik in #824
minor edit for sparse (layout and proper swap def) by @aartbik in #820
add a to-string method for memory space by @aartbik in #823
Cleanup cmake usage when MatX is a dependent project by @tmartin-gh in #827
Fixing warnings issues by clang-19, both host and device by @cliffburdick in #825
Update build_docs actions to newest. Add CI_RUN_DATETIME in version.rst by @tmartin-gh in #829
introduce a versatile sparse tensor type to MatX (experimental) by @aartbik in #821
Add initial tiff support by @tmartin-gh in #831
Make dim2lvl translation for printing more in the style of MatX by @aartbik in #832
Expose tensor format (and lvl specs) to sparse tensor data by @aartbik in #833
Add cross product operator by @mfzmullen in #818
remove LVL depth restriction with constexpr templating by @aartbik in #834
Guard all DIM/LVL recursion against completely empty format by @aartbik in #835
Adjust half-type threshold for cross product unit tests by @mfzmullen in #838
Added fp32 version of normcdf by @cliffburdick in #839
Changing black scholes to float and improving performance by @cliffburdick in #840
Implement the () operator on sparse tensors by @aartbik in #837
Support operators into einsum interface by @cliffburdick in #845
Add print function with nonzero dim args by @tbensonatl in #844
Updated CCCL to fix regression in newer CTK versions by @cliffburdick in #846
First version of MATX SpMM (using dispatch to cuSPARSE) by @aartbik in #843
Moved sparse operator() into tensor_impl_t by @cliffburdick in #841
Adding timing metrics to CUDA and host executors by @cliffburdick in #842
Remove dense "testers" from the sparse tensor format type by @aartbik in #847
cuDSS by @cliffburdick in #848
Update deprecated CUB types by @cliffburdick in #851
Renamed versatile into universal for sparse tensor types by @aartbik in #850
Ignore incorrect gcc warning in einsum by @cliffburdick in #853
Added documentation on integrating with existing software by @cliffburdick in #852
Add compile-time check for minimum CUDA arch by @tbensonatl in #855
First version of MATX Sparse-Direct-Solve (using dispatch to cuDSS) by @aartbik in #849
First version of MATX sparse2dense conversion (dispatch to cuSPARSE) by @aartbik in #856
Improve cuFFT errors by @cliffburdick in #860
workaround for CTAD bug in NVC++ by @cliffburdick in #859
Add note about host-allocated memory to external guide by @cliffburdick in #862
Cleanup to use pass-by-reference more consistently by @aartbik in #861
Move empty storage construction to inline helper method by @aartbik in #857
Make CCCL copy false by @cliffburdick in #865
Remove test for free memory on FFTs by @cliffburdick in #864
Fix initializer list order by @tmartin-gh in #867
Initialize host cuRAND API when using host compiler by @cliffburdick in #866
Add user-friendly assertions to make_sparse_tensor by @aartbik in #869
Add "zero" matrix factor methods for COO,CSR,CSC by @aartbik in #870
First version of MATX dense2sparse conversion (dispatch to cuSPARSE) by @aartbik in #868
Add sparse factory method tests by @aartbik in #871
Enforce library restrictions on MatX transformations by @aartbik in #872
Add sparse conversion tests (dense2sparse, sparse2dense) by @aartbik in #873
Add sparse direct-solver tests by @aartbik in #874
Add SpMM tests by @aartbik in #875
Refactored OperatorTests.cu for faster compilation time by @cliffburdick in #876
Test feeding dense output as intermediate for the new sparse ops by @aartbik in #877
Use transitive include in benchmarks cmake by @cliffburdick in #880
Remove const qualifier on input to thrust iterator by @cliffburdick in #879
allow incoming transformations for convert by @aartbik in #881
allow for incoming/outgoing transformations on SpMM by @aartbik in #884
Update sparse_tensor_format.h by @cliffburdick in #886
allow transforming output for sparse2dense by @aartbik in #882
allow for incoming and outgoing transformations on solve by @aartbik in #883
modify constexpr fall-through return into else by @aartbik in #888
Revert broadcasting changes to concat by @cliffburdick in #890
Added CUDA executor alias by @cliffburdick in #891
Do not create CUDA events in ephemeral executors by @cliffburdick in #889
Update to CCCL 2.8.0 by @cliffburdick in #895
MatX Containers by @tylera-nvidia in #892
Add configurable Pwelch scaling and improve performance by @tmartin-gh in #897
Fix initialization order of stream/profiling by @cliffburdick in #893
Start sparse tensor documentation by @aartbik in #898
add various references to sparse tensor api, extend type doc by @aartbik in #901
feat: added economic QR by @mfzmullen in #903
Add SpMV support for matvec transformation by @aartbik in #904
Support mixed-precision for SpMM by @aartbik in #906
Refine print order and skip device contents by @aartbik in #908
Support mixed-precision for SpMV by @aartbik in #907
Add a SpMM (COO) benchmark (all types) by @aartbik in #909
Enabled mixed precision tests for SpMM and SpMV by @aartbik in #910
fixed sparse tensor print format by @aartbik in #913
Add guard for Dss support to sparse solver test by @aartbik in #915
minor code cleanup by @aartbik in #916
guard half type usage with proper cuda capability by @aartbik in #917
Implemented sparse2sparse transformation (COO to CSR using cuSPARSE) by @aartbik in #918
minor code cleanup by @aartbik in #919
Add COO to CSR test by @aartbik in #921
Fixed bug in sort() where memory was not properly freed by @cliffburdick in #922
Improve tensor format types by @aartbik in #923
Change rank limit for batching by @nvjonwong in #911
Add std::complex to dlpack converter by @cliffburdick in #924
typo: missing closing brace in docs by @simonbyrne in #927
changed order of level exp / format in the file by @aartbik in #925
various stylistic changes to sparse tensor format file by @aartbik in #930
Disable additional NaN/Inf checks for complex ops by @tbensonatl in #932
Add CMake option to enable -lineinfo; update docs by @tbensonatl in #933
Update linspace for more of a pythonic syntax by @cliffburdick in #935
Remove MATX_ROOT macro by @cliffburdick in #937
Fix build issues with 32-bit indices by @tbensonatl in #938
Bump GTest version for CMake 4.0 compatibility. by @alliepiper in #939
Improve clone docs by @simonbyrne in #942
Fix rank of linspace by @cliffburdick in #943
Destroy events for profiling by @cliffburdick in #944
Add support for as_int64 and as_uint64 casts by @tmartin-gh in #945
Update nvbench to fix compiler errors by @cliffburdick in #948
Add ddof parameter to stdd by @ahmedhus22 in #950
implement basic interp by @simonbyrne in #936
Fix ExecArgReduce compile error with matxBinaryOp by @tmartin-gh in #947
Add Normalize function by @ahmedhus22 in #951
Fix compiler warning by @cliffburdick in #952
Fix build error using shape_type in std::conditional_t by @tbensonatl in #954
Fixing another compiler warning by @cliffburdick in #955
Use sincos() rather than separate sin/cos for expj by @tbensonatl in #958
add argsort operator by @simonbyrne in #956
Reduce FFT Memory Usage by @cliffburdick in #961
Fix accidental example checkin by @cliffburdick in #960
simplify OptimizedExecSort by @simonbyrne in #962
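
As a rough illustration of what the new `argminmax` and `argsort` operators compute, the following host-only C++ sketch returns the extreme values with their indices in one pass, and the permutation of indices that sorts the input. It does not use MatX: `ArgMinMax`, `arg_min_max`, and `arg_sort` are hypothetical names, and the tie-breaking shown (first occurrence, stable order) is this sketch's choice and may not match MatX's behavior.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type for illustration only; not a MatX type.
struct ArgMinMax {
  float min_val;
  std::size_t min_idx;
  float max_val;
  std::size_t max_idx;
};

// Single pass over a non-empty input; ties resolve to the first occurrence.
ArgMinMax arg_min_max(const std::vector<float>& x) {
  ArgMinMax r{x[0], 0, x[0], 0};
  for (std::size_t i = 1; i < x.size(); ++i) {
    if (x[i] < r.min_val) { r.min_val = x[i]; r.min_idx = i; }
    if (x[i] > r.max_val) { r.max_val = x[i]; r.max_idx = i; }
  }
  return r;
}

// Indices that would sort x in ascending order (stable for equal values).
std::vector<std::size_t> arg_sort(const std::vector<float>& x) {
  std::vector<std::size_t> idx(x.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::stable_sort(idx.begin(), idx.end(),
                   [&x](std::size_t a, std::size_t b) { return x[a] < x[b]; });
  return idx;
}
```

Computing both extremes in one pass reads the input only once, which is the usual motivation for a combined operator over separate argmin and argmax calls.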

New Contributors

@alliepiper made their first contribution in #785
@aartbik made their first contribution in #804
@simonbyrne made their first contribution in #927
@ahmedhus22 made their first contribution in #950

Full Changelog: v0.9.0...v0.9.1
