Commit b5542d3

Squashed 'tpls/kokkos/' changes from 5ad6096..aa1f48f
aa1f48f Merge pull request #5912 from ndellingwood/release-candidate-4.0.0 606866d Update master_history.txt for 4.0.0 52ea295 Merge branch release-candidate-4.0.0 for 4.0.0 0394f7f Merge pull request #5900 from kokkos/update-changelog-to-4.0.0 d4690ab Update changelog to 4.0.0 44a5b1e Merge pull request #5872 from masterleinad/fix_version_macro_4_0_0 8af43c4 Fix version macros in 4.0.00 36f65d0 Merge pull request #5851 from crtrott/no-deprecated-3-in-makefile-400 0f820d0 Drop (deprecated) KokkosCore_UnitTest_DefaultDeviceTypeInit_* from the makefile df33d98 Don't enable deprecated code 3 in Makefile builds anymore f4cc47a Merge pull request #5842 from PhilMiller/4.0-fix-macros 25b84ad Merge pull request #5839 from dalg24/rc40_typo_deprecared 77aa52a Fixup typo `#ifdef KOKKOS_ENABLE_DEPRECA{R -> T}ED_CODE_3` 5f58dfe HIP: Drop obsolete macro definition c3f9e34 ViewLayoutTiled: Be scrupulous about macro naming and undefining 41a9eb4 OpenMPTarget: Be scrupulous about macro naming and undefining 38ab536 CUDA: Fix up comment e49a724 CUDA: Convert simple value macro to constexpr ed51dea CRS: Use Kokkos device function macros rather than duplicating code when compiling for GPU targets 16b4c26 Merge pull request #5830 from dalg24/rc40_omp_chunck_sz_static_schedule d35a58d Merge pull request #5829 from dalg24/rc40_simd_neon 86d51ae Merge pull request #5824 from dalg24/rc40_deprecate_kokkos_active_execution_memory_space_macros 7bd2961 Merge pull request #5826 from dalg24/rc40_cuda_occupancy_fixup c6c12d0 OpenMP: Adding an ifdef around chunksize for static schedule for GCC compiler. 
7b00f62 SIMD backend of ARM NEON (#5775) c482a65 Further update to CUDA occupancy calculation (#5739) ab9922e Change `#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_{4 -> 3}` a10d514 Deprecate `KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_*` macros dfafa6a Merge pull request #5773 from ndellingwood/resolve-intel-ice 0f38d03 Intel ICE Sacado: turn off support for nested OpenMP with ICPC dc9d27e Intel ICE Sacado: use new HostIterateTile API in OpenMP 41b3856 Intel ICE Sacado: use new HostIterateTile API in HPX 624c71c Intel ICE Sacado: use new HostIterateTile API in Threads 3676622 Intel ICE Sacado: use new HostIterateTile API in Serial f545c68 Intel ICE Sacado: rewrite HostIterateTile 14deae4 Merge pull request #5806 from Rombur/fix_typo d55bf83 Fixup ROCm 5.4 ImplForceGlobalLaunch{Launch -> }_t typo in unit tests d882b10 [4.0.0.] Add parameter to force using GlobaLMemory launch mechanism using HIP (#5803) e95c37b Merge pull request #5799 from Rombur/hip_global_launch 9e8d143 Merge pull request #5798 from dalg24/rc40-reduction_identity_char 5ef3844 Fix race condition when using GlobalLaunch with HIP and HSA_XNACK=1 48ca904 Add missing ReductionIdentity<char> specialization a2e3df5 Merge pull request #5788 from masterleinad/cherry_pick_5785_4.0.0 10d0bb0 Merge pull request #5783 from masterleinad/fix_ci_4.0.00 c20de31 sprintf -> snprintf 39c34f1 Fix build on Fedora rawhise 8dc7a2b Merge pull request #5771 from crtrott/fix-sycl-scratch-ptr-40 c8b7344 Let increment be of type uintptr_t fixing warning 4479f1b Fix ScratchSpace pointer comparison for SYCL b3f1ba3 Merge pull request #5768 from dalg24/rc40_fixup_desul_atomics e2c3caa Generate <desul/atomics/Config.hpp> file from the generated Makefiles 05d6271 Desul atomics configure library based what the user enabled 1f0f2df Desul atomics: drop unnecessary macro guard that checks for__CUDA_ARCH__ in PTX assembly code d39980e Desul atomics: drop unnecessary macro guard that checks for__CUDA_ARCH__ in compare exchange 93e4e7c Desul atomics 
cleanup enable GCC or MSVC atomics e695638 Desul atomics fixup detect use of SYCL fd1a8f8 Merge pull request #5753 from masterleinad/fix_kokkos_version_4_0_0 4bc5f7f CMake: change package COMPATIBILITY mode {SameMajorVersion -> AnyNewerVersion} 1a41b7e Update Kokkos version for 4.0.0 26406b6 Merge pull request #5743 from crtrott/fix-dynamic-view-400 5e44305 Apply clang-format 03acf65 fix broken DynamicView test case #4 374cc5c fix src/dst Properties in deep_copy(DynamicView,View) fa7d6b3 Merge pull request #5736 from crtrott/fix-intel19-werror-4.0.0 904583a Merge pull request #5734 from crtrott/remove-kokkos-cxx-standard-from-build.md b053cbb Fix -Werror with intel/19 27aea3c Remove KOKKOS_CXX_STANDARD mentioning from BUILD.md 661e6d6 Merge pull request #5687 from crtrott/fix-4942 af7c2d3 Merge pull request #5729 from dalg24/desul_impl_atomic_cuda_use_double_atomicadd 2191d7f Remove dead code guarded by `#ifdef DESUL_IMPL_ATOMIC_CUDA_USE_DOUBLE_ATOMICADD` 50058de Use proxy clang-format script for Christian 5ee40a9 Merge pull request #5515 from nliber/ctad-reducers 712a838 Scratch Allocation: Completely reset m_iter if fails 404b585 Scratch alignment: fix failed to allocate path 51228af Add comments in scratch alignment test b3793a7 Merge pull request #5706 from crtrott/cuda-cache-config3 e35f0e8 Update license 90dc058 Changed all the reducer run-time tests to compile-time tests daea95f Changed ASSERT_* to static_assert in ctad reducer tests 1bbea42 Removed Kokkos::LayoutStride out of the VS View, as it wasn't strictly necessary and was causing a runtime issue on Cuda 5ec23ed Fixed core/unit_test/CMakeLists.txt d69abb0 CTAD Reducer additions 914e12a Fix UB in scratch allocation calculation 7380b59 Fix new scratch alignment test for OpenMPTarget Team size restriction 28b73f5 Merge pull request #5724 from dalg24/fixup_device_annotation_random_pool baad0f9 Fixup device annotation on defaulted random pool default constructors 94a161e Fix scratch calculation in test 
f54c04c Merge pull request #5720 from Rombur/global_launch_fix 119d29d CudaCacheConfig update address review comments 446c706 Cleanup random pool special member functions and precondition check in `get_state()` (#5716) 0576f97 [HIP] Fix GlobalMemory launching mechanism 75a344e Merge pull request #5718 from crtrott/fix-cuda-max-scratch-size-calc 87752bc More finetuning based on review feedback 32e1bfb Address review comments for scratch size align fix 6455fc4 Scratch Size calculation moving variable to smaller scope 752c6db P max_team_scratch level 0 calculation c3e38dd Fix CUDA max_team_scratch level 0 calculation 9595d11 Fix lock arrays d9d43bf Merge pull request #5715 from tcclevenger/add_readme_to_incremental_tests efc5059 Add README.md explaining the incremental tests 7de2b62 Rework CUDA cache config using carveout calculations 9ed7dbf Merge pull request #5696 from dalg24/openacc_parallel_reduce_local_variable 30f477b Merge pull request #5396 from masterleinad/guard_t_openmp_instance 083467a Merge pull request #5708 from crtrott/disable-perf-test-in-trilinos 5fec21d Disable perf tests in Trilinos 537e36d Use local variable in the parallel_reduce(RangePolicy<OpenACC>, ...) 
f64e5a6 Update nvcc_wrapper default arch to work with CUDA 12 53bf1f5 Scratch space alignment: make private variable for View align value 1b933bd Merge pull request #5678 from kokkos/2873-check-execinfo a0914e7 Finalize HIP lock arrays 0241326 Merge pull request #5688 from dalg24/nvhpc_cuda_home 9a52696 Suppress (bogus) warning with NVHPC f4a182b Rename CUDA_HOME environment variable to NVHPC_CUDA_HOME for NVHPC SDK 22.9 5a2197f Merge pull request #5680 from dalg24/openacc_refactor_parallel_reduce c87d1f7 Fix nvhpc-docker container to work with CUDA 12 driver (#5686) 68c22d7 Make SYCL::concurrency non-static (#5682) 0b4912f Fixup test 39951d3 Make Kokkos_View scratch space use a minimum alignment 503f190 Fix alignment calculation in ScratchSpace 7fd8bfb Update alignment test eb5d8e9 Refactor Policy constructor tests (#5598) 2eeb52b Fix a bug in FunctorAnalysis::Reducer final_reducer() 3a2f5da #5348: Add git information to benchmark metadata (#5463) 5b7bbc6 Merge pull request #5676 from neutrinoceros/sprintf_to_snprintf f939065 Merge pull request #5684 from dalg24/fixup_tempfile_check_copyright_script 61f63e6 Fixup "mktemp: too few X's in template" dbda659 Fixup temporary file in script to check copyright dc96975 Guard t_openmp_instance with KOKKOS_ENABLE_DEPRECATED_CODE_3 d2b6f9d Fixup declare ParallelReduce<RangePolicy, OpenACC>::execute member function const cf32a3c Refactor ParallelReduce<OpenACC> using macros to reduce code duplication 6bf0fca Define KOKKOS_IMPL_ACC_PRAGMA macro 4cad83c Fix copyright in SIMD file a7c7a41 #2873: Check for presense of execinfo.h before enabling its use for stack tracing d63d81e SIMD AVX2 backend (#5512) 6a49537 core: add is_team_handle and test (#5375) 42cc936 ENH: drop deprecated sprintf usage in Kokkos_Profiling.cpp 600be76 Merge pull request #5672 from masterleinad/fix_sycl_atomic_ref b2e2096 Work around CUDA+Clang Thrust issue (#5660) 2f5f2ee Merge pull request #5669 from dalg24/cuda_concurrency_non_static 8303668 #5667: dont 
install std algorithm headers multiple times (#5670) 439fc18 Fix default memory order for sycl::atomic_ref f3cf72a Add architecture flags for MSVC (#5673) 17170f6 Fixup HIP header include incomplete type Impl::HIPInternal f8d5e8e Merge pull request #5664 from dalg24/cleanup_cuda_ldg_fetch 95fa3a7 Make HIP::concurrency() member function non static 969490a Introduce Impl::HIPInternal::concurrency static member function 965998d Fix {HIP:: -> HIP().}concurrency() occurences a1b8646 Make Cuda::concurrency() member function non static 4cfa416 Fix chicken and egg Cuda concurrency issue when initializing lock arrays 136abca Replace Impl::CudaInternal::m_maxConcurrency data member by Impl::CudaInternal::concurrency() member function ad6c3da Fix {Cuda:: -> Cuda().}concurrency() occurences e4cce6e Add partition_space to OpenMP (#5105) d24d7ed Merge pull request #5671 from crtrott/fix-5651 2513fc7 Fix classic Intel compiler Serial/OpenMP backend build 35fa45c Turn off classic intel workaround in Serial backend 8942959 Merge pull request #5668 from ndellingwood/update-changelog 14e9887 [ci skip] Update changelog 588dda8 Drop unnecessary inline specifiers in CudaLDGFetch d9ea15a Drop CudaLDGFetch default constructor definition (prefer defaulted one) 8aa2315 Drop CudaLDGFetch special member functions 87eafb9 Merge pull request #5621 from cwpearson/fix/issue-5594 783aa92 Merge pull request #5662 from dalg24/serial_openacc_threads_concurrency c9ff0d4 Merge pull request #5661 from mrowan137/mrowan137/bytes_and_flops_benchmark_typecast_to_scalar e2efab2 Fixup non-static OpenTarget::concurrency() member function unless deprecated code 4 is ON 94135a8 typecast to Scalar 4e08cb2 5594: Remove two-argument CudaLDFFetch constructor 33762bf Make Threads::concurrency() a non-static member function unless deprecated code 4 is ON 052256f Fixup non-static {Serial,OpenACC}::concurrency() member function unless deprecated code 4 is ON cd99be4 Check compatibility of execution space and memory 
space in View creation (#5544) a8e3f4a Merge pull request #5601 from masterleinad/avoid_extern_static_thread_local d707bd1 Merge pull request #5640 from crtrott/update-copyright a2e8b4e Use git ls-find instead of find 6324fd1 Merge pull request #5656 from dalg24/non_static_concurrency_member_function ffb50a5 Merge pull request #5522 from Rombur/navi 3324cf0 Update check-copyright to ignore build directory 1c31498 Fix bug in Makefile.kokkos 7edc47d Merge pull request #5645 from dalg24/prefer_std_thread_hardware_concurrency 6bc0f21 Drop non backend-specific use of static ExecutionSpace::concurrency() member function 50e0d34 Prepare for ExecutionSpace::concurrency() member function becoming a non-static member function 6389e89 Detect existence of ExecutionSpace::concurency() member function 732727e Fix reviewers' comments 5dbe8ea Merge pull request #5649 from masterleinad/fix_unused_warning_view_ctor 1e72c61 Fix Reduce/Scan on Navi 6053c71 Fix WarpSize for NAVI 39e8ba8 Remove support for MI25 and support for NAVI 1030 8e7a91b Avoid another ICC 19 warning in with_properties_if_unset 7e52986 Drop `Impl::processors_per_node()` since not used anymore 1d9f0ef Prefer std::thread::hardware_concurrency to our own cpu discovery facility 2e0c8a6 Merge pull request #5630 from dalg24/detect_mpi ed38bb9 Merge pull request #5627 from dalg24/cpp17_fold_expressions 26eb6f3 Drop pc.in filter 0b4ff6c Merge pull request #5634 from masterleinad/set_device_hip_cuda_interop_test 137158e Add LICENSE URL to header e13514e Update automatic check e4227c2 SYCL: Set RangePolicy default chunk_size to 1 (#5625) 2420eb2 Deprecate `Kokkos_ENABLE_CUDA_UVM` (#5608) d802fd0 Remove Kokkos_ENABLE_CUDA_LDG_INTRINSIC option (#5623) 8f76928 Fix cudaErrorInvalidDeviceFunction error caused by an uninstantiated functor (#5605) fdadafe Some more license updates post rebase 0457063 Update copyright update script 15170bf Update more copyrights caf525f Update copyrights in files ffad768 Have update-copyright go 
live 9922107 Update header 3995833 Fix up old copyright missing etc. f5ac2cb Update copyright script 2ff5577 Initial update of License to Apache2 with LLVM exception d0f65ae Merge pull request #5304 from nmm0/mdspan-extents-conversion f40e3bc mdspan: remove deprecated macro check for non-public header inclusion from Kokkos_MDSpan_Header.hpp and Kokkos_MDSpan_Extents.hpp 60634fb Set the correct device/context in InterOp tests 6d5fffb tests: minor mdspan tests formatting issues e17f6c1 tests: mdspan formatting 49052b4 tests: add helper template to mdspan extents tests to make it more clear what we are testing 93f7034 mdspan: minor comment formatting and rename the header include logic file to Kokkos_MDSpan_Header 15d4e2a mdspan: fix formatting feabc86 mdspan: move mdspan header logic into its own header 415f102 mdspan extents test: weird formatting issue 13a3842 tests: remove old TestViewMDSpan which was accidentally left in 5c2c6a9 mdspan: comment forward declarations in Kokkos_MDSpan_Extents.hpp 9c3efbb mdspan: use absolute namespaces rather than nesting forward decls with the extents impl 992202a tests: move extents test to a subdirectory so we can begin better organizing view tests. 
Rename the test and make it compile-only be033cd mdspan: add IndexType to ExtentsFromDataType template parameters 91faeb0 tests: remove namespace Test from mdspan tests 2116d19 mdspan: remove extents_type and dimension_type since they are unused efd308a mdspan: get rid of SizeType template parameter and use size_t directly;; remove extra inline specifiers 02469c3 mdspan: remove unused include and macro guards around KOKKOS_ENABLE_IMPL_MDSPAN since that's checked at the point of inclusion faef259 additional formatting adjustment 8dc0723 adjust formatting e36a0aa mdspan: add conversion from extents to datatype c3948fa mdspan: initial implementation of ExtentsFromDataType 6db6d49 Fix misplaced negation in tool testing utils b92d661 Try with a right fold to see if NVC++ like this better c4df87c Avoid use of immediately invoked lambdas to increase consensus e78d6a5 Merge pull request #5629 from arghdos/xnack_warn_msg 0b10dbb Reuse mpi_local_rank_on_node at initialization when picking a GPU 0a9b6d6 Let mpi_ranks_per_node and mpi_local_rank_on_node return -1 when detection fails 7d01fe7 Handle MPI local size -1 when checking for over-subscription in host parallel backends da0e6db Let get_ctest_gpu take the local MPI rank as an integer rather than a C string 4939350 Avoid forward declaration in ctest resource allocation tests 5780462 Fixup CPU discovery source file 5fdc7df fix typo in HSA_XNACK warning message 3afc80e Drop are_valid workaround in tool testing utils e886da9 Remove comma folding emulation pre C++17 workaround and use fold expressions instead 3c37ea0 Drop `#ifdef __cpp_fold_expressions` guards in partition_space 9516700 Update CUDA occupancy calculation to reflect register allocation granularity of 256 registers per warp (#5624) 682499f Merge pull request #5617 from masterleinad/fix_containers_compile_time_test 9377346 Merge pull request #5616 from dalg24/is_view_v fea40ca Make TestStdAlgorithmsCompileOnly runnable 8afd450 Merge pull request #5614 from 
JBludau/deprecate_uvm_available e107e64 More compile-time test for view e4481f5 Add compile-time test for is_view[_v] ad95098 Add is_view_v helper variable template ed10458 removed CudaUVMSpace::availible() guard from test d5e1a2f deprecate CudaUVMSpace::available() ad2af17 HIP as a CMake language (#5611) 3d1e8f9 Merge pull request #5612 from brian-kelley/FixGenMakefileStandard 3a6543c Fix help text in generate_makefile 3c842fb fix incorrect offset in cuda parallel scan for < 4 byte types (#5555) 21df075 Merge pull request #5588 from dalg24/upgrade_nvhpc_22_9 26e9ba2 remove RDC flags when using CMake language CUDA (#5564) dc76918 Merge pull request #5604 from masterleinad/fix_kokkos_deprecated 5ca9274 Revert disabling KOKKOS_DEPRECATED for OpenACC d7811b5 Fix position of KOKKOS_DEPRECATED e2659a1 Upgrade to NVHPC 22.9 and re-enable OpenMP in CI build d7b65db Merge pull request #5412 from PhilMiller/cleanup-volatile 34cdf58 Work around stupidity about [[deprecated]] for OpenACC too bccba60 Add KOKKOS_DEPRECATED to the stuff that's guarded by deprecation macros ace8eea Merge pull request #5599 from PhilMiller/ci-naming 53e0407 [ci skip] Rename Jenkins builds to not restate the baseline 76d2e53 Avoid static/extern thread_local 5955797 Don't test volatile complex<T> 228b281 Deprecate volatile qualified members instead of deleting them 351a0d1 Merge pull request #5595 from masterleinad/update_sycl_aot_flags 260f113 SYCL: Update AOT architectures bbc2f02 Merge pull request #5511 from JBludau/fixup_shared_spaces bbd39de Update core/unit_test/TestSharedSpace.cpp bfc9194 Workaround for "missing return statement at end of non-void function" warning in Kokkos_ViewCtor.hpp (#5493) 5695a29 Merge pull request #5529 from masterleinad/always_check_view_rank 051fd73 Merge pull request #5577 from JBludau/clock_tic_power_pc_fixup 7e81ad6 Merge pull request #5590 from ldh4/fix_warning_host_fn_called_from_host_device_fn d2bb223 Changed to call a host device fn Kokkos::abort instead of 
a host only fn Kokkos::throw_runtime_exception in a host device function. 8bc6922 Merge pull request #5586 from kliegeois/fix_LIFO_include 1096634 Min max pragma push (#5541) a83f5a6 Fix Kokkos_LIFO include ae718ba Merge pull request #5580 from dalg24/nvcc-extended-lambda 90100d9 Let Kokkos_ENABLE_CUDA_LAMBDA be ON by default 523066d Let nvcc_wrapper accept -[-]extended-lambda (w/o expt- prefix) flag 363df92 dropped 32 bit powerpc support 45a5835 deleted () in cmake to stick to the usual practice in kokkos cc6504d Merge pull request #5576 from dalg24/more_local_mpi_rank_detection 9a8ea2f Suppress warning: function declared with "noreturn" does return for CUDA and Debug mode (#5441) 6339975 Merge pull request #5579 from dalg24/hip-extended-lambda 9ce3b5f Merge pull request #5578 from ldh4/fix_signed_int_overflow_team_md_range b552886 Do not add --expt-extended-lambda compile flag with HIP and GNU generated makefiles 246efc9 Convert int to int64_t to avoid signed int overflow warnings from clang UBsan fba2cc4 Merge pull request #5575 from dalg24/release_37_changelog dbc1d50 Add support to detect the local MPI rank with PMI (Process Management Interface) c926bf5 split powerPC clocktic into 32 and 64 bit version 1faed5a Fixup release 3.7 chagelog Kokkos::common_view_alloc_prop not deprecated 7695978 Add 3.7.00 changelog 449d925 Cherry-pick missing 3.6.01 changelog lost in translation 0fb8b8a Merge pull request #5568 from bartlettroscoe/tril-11152-remove-undefined-tpl-deps 3b24f4e Merge pull request #5567 from dalg24/fixup_ub_logical_spaces_unit_test 453a812 Kokkos: Remove listing of undefined TPL deps (trilinos/Trilinos#11152) 1762225 Fix UB in logical spaces unit test 9764d03 Merge pull request #5503 from dalg24/desul_atomics_more_macros 91d0a4f Merge pull request #5549 from dalg24/desul_atomics_fixup_msvc 481cb8d Prefer C++ alignas specifier over GCC-specific language extension 5fbe4de Merge pull request #5546 from Rombur/trilinos_amdclang 1e0ce64 Merge pull request 
#5539 from Rombur/amdclang 7176188 Fix repeated team_reduce without barrier (#5540) 68d2f26 MSVC atomics template atomic_[compare_]exchange on MemoryOrder f9d3ef0 Fixup missing host_ prefix for MSVC lock-based atomic_[compare_]exchange 8fcf9ae Refactor desul atomics generic host and device fetch op with macros af65233 Merge pull request #5490 from dalg24/cuda_with_nvc++_ci_build a872b76 Merge pull request #5542 from dalg24/doc_housekeeping f1b5067 Do not error when using amdclang with Trilinos 04de99c Merge pull request #5527 from krasznaa/CUDAInitFix-develop-20221006 2384d9a Merge pull request #5184 from junghans/patch-5 92f8ae2 CI: test flang 05811e9 s/FIXME please/FIXME wrong result/ 8239e56 Merge pull request #5543 from dalg24/rm_travis_yml_file 42139b5 [CI skip] Retire unused .travis.yml file e1d0a8d [CI skip] Remove source helper file to query cuda arch 181bece Add doc/README with a word of warning redirecting to the online documentation on kokkos.github.io 1bc4913 Remove wildly outdated document about develop builds 05c9f1a Remove programming guide markdown file that points to outdated wiki page 1a88e22 Fixup not runing sort unit test with NVHPC 5cafe04 Fix linking when using amdclang 833da38 Merge pull request #5538 from crtrott/support-hopper 27393a0 Fix up cases where the arch macro is used for HOPPER 191b238 Trilinos: Pass OpenMP flags instead of linking with the OpenMP target (#5532) ce3014c Add hopper to compute_capability detector 519cef6 Merge pull request #5536 from crtrott/fix-mixed-arch-workgrpah 30c8db1 Merge pull request #5537 from crtrott/fix-5501 18cefac Add config output and shared mem config for Hopper 1e1cfe3 Merge pull request #5535 from crtrott/fix-5534 3fe9540 Add Hopper support fcf8a3c Add test to check for mismatch static dimension and mismatch layout 6989f38 CUDA: fixes mixed-arch-use of WorkGraphPolicy 18ddf7d Drop -Werror in NVHPC build for now 6a2fe1e Disable join unit test for Cuda too bb4755b Skipping one more Cuda test to get 
NVHPC CI build to pass (could not reproduce) 73a57e3 Disable serial unit test failing with NVHPC CI CUDA build 2d73754 Disable core unit tests to get NVHPC CI build to pass 02ef991 Disable containers unit tests to get NVHPC CI build to pass fb8179f Disable algorithm unit tests to get NVHPC CI build to pass 5425eb2 Try with -Werror and disabling bogus diagnostics f7ee64d Temporarily disable OpenMP in the NVHPC CI build e784787 Update CI build to use NVC++ to compile CUDA as well f7bfcc5 Simplify View create_mirror returning HostMirror 8524dda Only link against libatomic in gnu-make OpenMPTarget build be72920 Fix unnecessary check for runtime-rank 1 for Left/Right assignment d5fcc32 Merge pull request #5528 from Rombur/trilinos_fix bd9adc6 Fix 5315: use Kokkos::atomic_load to Correct Race Condition Giving Rise to Seg Fault'ing Error in OpenMP tests (#5530) b684f57 Merge pull request #5531 from JBludau/fix_unnamed_functor_instance 88ce0aa fixup for intel19 (most-vexing parse) 4bbe86c Always check rank in View construction f88d8ac Simplify copying the layout 1a63570 Export the flags in KOKKOS_AMDGPU_OPTIONS when using Trilinos 0ef177c Fixed the logic for building Kokkos for an older architecture. 
f70b121 Merge pull request #5525 from dalg24/mpich_local_rank 292ba24 Add support for detecting MPI local rank with MPICH 5b21511 Team MD range policies impl (#5238) 11385fe Merge pull request #5491 from etphipp/fix_as_view_of_rank_n_for_sacado d5575d4 Merge pull request #5343 from cz4rs/port-sample-perf-test 32921a7 Fix memory spaces in create_mirror_view overloads using view_alloc (#5488) f73a8c9 fixing the preproc define and unified some naming in the tests aa98af2 `SharedHostPinnedSpace` alias in fwd declaration (#5405) bfe8f8c Merge pull request #5451 from seyonglee/openacc_parallel_team 056e812 Merge pull request #5510 from dalg24/fixup_tools_tests 25e2302 Merge pull request #5520 from dalg24/rm_unused_header_cuda_alloc bc89f48 Remove (unused) header <Cuda/Kokkos_Cuda_Alloc.hpp> 2ce06b9 Merge pull request #5509 from masterleinad/update_cuda_11_0_dockerhub d9e2a51 Replace nvidia/cuda:11.0-devel->nvidia/cuda:11.0.3-devel-ubuntu18.04 8bc1adc Fixup tools callbacks signature (pointers to const) 7a82a2a Fixup prefer KOKKOS_PROFILE_LIBRARY -> KOKKOS_TOOLS_LIBS env var in tests to avoid warnings 503c78e Merge pull request #5506 from ndellingwood/update-nightly-script 746e600 [ci skip] test_all_sandia: updates and cleanup 497b3f9 Replace 0 with nullptr 99e2013 Merge pull request #5500 from e10harvey/a64fx db563cc Merge pull request #5498 from dalg24/drop_unused_host_device_atomic_compare_exchange 6385f26 core/src/impl: Fix warning as error 98699d1 cmake: define KOKKOS_ARCH_A64FX 7110f8c Also restrict other as_view_of_rank_n overloads to void specialize type 267f5eb Test Legion use case (#5206) e0331a8 OpenMPTarget: Update CI to use llvm/15.0.0 and enable corresponding unit tests (#5496) 67f521a Drop unused desul generic fallback atomic_compare_exchange_{strong,weak} implementation 7a8ebaf Merge pull request #5497 from dalg24/desul_drop_unused_serial_atomics 1179da6 Drop unused serial atomics bf01d32 Fence after View creation a92bdae Report units correctly 05cef20 Try 
removing volatile from AtomicDataElement (#5455) f97008a Merge pull request #5495 from ndellingwood/update-testscripts f9f528c test_all_sandia: nightly testing script updates 4a5a3e7 add cmake flag to enable mdspan and include mdspan as a tpl (#4973) c7ec8fb Move __pgi_vectoridx() call, which is used to set m_team_rank variable, into the OpenACCTeamMember constructor. ac33c8e ClangFormat d32e88b Update core/src/OpenACC/Kokkos_OpenACC_ParallelFor_Team.hpp f49f1ed Adding OpenACC support for Makefiles (Makefile.kokkos and Makefile.targets) (#5437) 035a875 Remove redundant implementation file 9df3555 Use benchmark's native rate support aaf14a8 OpenMPTarget: adding implementation to set device id. (#5492) 1a15ff5 Merge pull request #5289 from JBludau/SharedMemorySpace 9a42b6c Add FIXME_OPENACC the collapsing transformation macro in Kokkos_OpenACC.hpp Delete unused variable/function in Kokkos_OpenACC_Team.hpp 14a431f Fix formatting 83538b2 Allow as_view_of_rank_n() to be overloaded for "special" scalar types 9f7bc93 Delete struct always_false : std::false_type {}; in Kokkos_Utilities.hpp 5de7bb5 Apply suggestions from code review b152efa Merge pull request #5431 from dalg24/nvcc-support-with-desul 1bdbe63 Move KOKKOS_ENABLE_OPENACC_COLLAPSE_HIERARCHICAL_CONSTRUCTS macro into Kokkos_OpenACC.hpp 4dd3eaf Remove unused variables as suggested by code review. Move KOKKOS_ENABLE_OPENACC_COLLAPSE_HIERARCHICAL_CONSTRUCTS macro into an OpenACC header file. 
26516e0 Merge pull request #5452 from cwpearson/fix/for-single-volatile 7484954 Fixup `<desul/atomics/Lock_Array_{Cuda -> CUDA}.hpp>` 8abec24 Cleanup on Kokkos side following the desul atomics refactor eb67f51 Fixup SYCL bug in desul atomics refactor 77dd69e Refactor desul atomics to support compiling CUDA with nvc++ 051d049 Merge pull request #5486 from masterleinad/fix_cmake_threads 882655c Update core/unit_test/TestTeamBasic.hpp 93f69db Merge pull request #5485 from dalg24/nvc++_wo_cuda 9d25e52 Merge pull request #5478 from Rombur/block_size_deduction 627d018 Remove Kokkos option, KOKKOS_ENABLE_OPENACC_COLLAPSE_HIERARCHICAL_CONSTRUCTS, and instead pass it to the NVHPC compiler directly. 5af13cf Refactor code in Kokkos_OpenACC_ParallelFor_Team.hpp so that a single '#ifdef KOKKOS_ENABLE_OPENACC_COLLAPSE_HIERARCHICAL_CONSTRUCTS` statement is used. 3fad6d2 Fix configuring with Threads support when rerunning CMake 60c51b6 Merge pull request #3 from dalg24/block_size_deduction 6edc1d5 Deduce pattern tag from the closure type 523107a Reorder order of template parameters 584af83 Make sure we don't add '-cuda' to the link line with NVC++ 6173186 Merge pull request #5484 from ndellingwood/disable-hypot-ld-power9 f049bff Add missing HIP cpp files in Makefile.targets (#5481) 2024db7 Merge pull request #5479 from ndellingwood/cherrypick-5318 5fc53cd Disable kk3_hypot in Power9 testing 153a39e Merge pull request #5450 from dalg24/move_reduction_identity f08afd4 Don't require user-defined volatile overloads in Kokkos::single 19fb19c Merge pull request #5318 from ibaned/avx-512-gcc-lt-8 9dfa1ec Use if constexpr in Kokkos_HIP_KernelLaunch.hpp f1e9659 Simplify computation of team size 504169a Make functions constexpr in Kokkos_HIP_BlockSize_Deduction 2dee5cb Update core/perf_test/test_sharedSpace.cpp 9457b67 Update core/perf_test/test_sharedSpace.cpp 137c7d6 Move acquisition of memory scratch space to its own function (#5468) 1f048cf Use inline static member variables for 
CudaInternal (#5473) 96a9c76 Include <Kokkos_ReductionIdentity.hpp> from <Kokkos_NumericTraits.hpp> for backward compatibility 3343b17 HIP: Initialize device-related variables only by the singleton (#5444) 421ecb4 Refactor conditional codes as suggested by code review. 4fa3d2a Update core/src/OpenACC/Kokkos_OpenACC_Team.hpp aace644 Use view's size to calculate statistics 296fcd9 Fix compiler error in SYCL parallel_scan (#5469) 48227a6 Add comments in `report_results()` 517f3ff Avoid unnecessary fence 80ca393 Extract ViewCopy_Raw benchmarks into separate file 012a789 dropped clang analyzer annotation for ShareSpace 9dff8cc Merge pull request #5466 from Rombur/test_work_graph 3487571 Remove HIP-only parameter for a test 1db9848 Remove obsolete comments 574fd77 Extract figure of merit helper function bf84c82 Let benchmark determine number of repetitions 1da7253 Remove benchmarks from Makefile 659da41 Port DeepCopy rank 1, 2 & 3 tests 2a961bc Port DeepCopy rank 4 & 5 tests e10bf56 Remove redundant DeepCopy Raw tests 4c576fc Port DeepCopy rank 6 tests 00820cf Port DeepCopy rank 7 tests 433e7b0 Mark selected counter as Figure of Merit 2fcfa2c Use separate ViewCopy header for benchmarks to avoid gtest dependency b414149 Port remaining ViewCopy rank 8 tests 0387994 Move helper methods to a common header 2ec000e Use the same filename for ported test 0f4b3a6 Remove obsolete code 0d82de4 Port single `ViewCopy` test to use google benchmark lib 673a0ef Merge pull request #4875 from masterleinad/sycl_launch_bounds_wgroup_size 2dcb24a Merge pull request #5457 from Rombur/print_config 4ae0b50 Merge pull request #5378 from thearusable/5348-add-kokkos-config-to-metadata 96af187 Merge pull request #5438 from masterleinad/remove_kokkos_abort_message_buffer_size c02a932 Merge pull request #5449 from masterleinad/print_configuration_add_architectures 55e428e Merge pull request #5462 from masterleinad/fix_restrict_sycl_cuda a3ec6ee #5438: Change namespace to KokkosBenchmark 38ce3ff Don't 
enable displaying architectures based on Kokkos_ENABLE_UNSUPPORTED_ARCHS 10abb82 Fix forcing Kokkos_ENABLE_UNSUPPORTED_ARCHS with SYCL+NVidia GPUs 7f1ee99 Merge pull request #5442 from masterleinad/cuda_set_device_only_for_singleton 3372a27 #5438: Improve removal of unwanted characters from the context data c53a6d0 #5438: Update core/perf_test/Benchmark_Context.hpp 2d389e4 #5438: Update code style with clang-format bbcceb0 #5438: Add kokkos configuration to benchamrk metadata e5a649c 1) Update Kokkos::Experimental::OpenACC::print_configuration() 2) Add FIXME_OPENACC comments for team_size_max and team_size_recommended APIs in Kokkos_OpenACC_Team.hpp 8342929 Merge pull request #5448 from Rombur/launch_local e2ab8a5 1) Created Kokkos::Impl::always_false<T> in impl/Kokkos_Utilities.hpp file, and used it to issue the compile-time error if unimplemented functions are instantiated. 2) Deleted unused header files. 667d47b Merge pull request #5454 from ldh4/fix_hip_desc_cmake a845d62 Apply clang format 6dc2317 Update as suggested by code review - Remove unnecessary inline keyword - Change KOKKOS_INLINE_FUNCTION to KOKKOS_FUNCTION in a class - Rename macros. 3a613c8 Apply suggestions from code review 7d23ecc Print the architecture for AMD GPU c4257c0 Fixed cmake configure still printing HIP backend as Experimental::HIP 5a440a4 Move singleton construction close to initialization e114205 Fix comments from review 3feb4de comment why we are using different memory for warmup 4200e53 Initial OpenACC backend implementation to support parallel-for constructs with Team policy. - Add COLLAPSE_HIERARCHICAL_CONSTRUCTS option to avoid issues on existing OpenACC compilers not supporting lambdas with parallel loops. - Not implemented features: scratch memory support, team_barrier(), team_broadcast(), team_reduce(), and team_scan(). 
8b5a155 Remove ok_id 8544fb2 Move reduction_identity into its own header file 406d8fd Add architcetures to print_configuration e1a7697 Use default class member initialization for lists f757503 Introduce team shared memory pool c5f7108 HIP: Pass functor by value when using LocalMemory 4477a25 Merge pull request #5445 from dalg24/openacc_shared_allocation_record_header 8bf5526 switched from lambda to named functor to get rid of ENABLE_CUDA_LAMBDA 51cc823 Drop unnecessary SharedAllocationRecord<OpenACCSpace, void>::alocate member function 3c7fd11 Move OpenACC SharedAllocationRecord implementation to separate header and source file c3b5f5a switched from double to uint64_t in for_each f1b01fb okay, lets redefine NOMINMAX ... but something is really fucked up 8697f2b try if minimal windows header solves the issue 744617a love windows includes 7808e3b remove include windows.h as it is done in Kokkos_core 00aa7cd moved include order to please Bill Gates f33e820 Cuda: Initialize device-related variables only by the singleton 4b614a9 changed tests to use has_shared_space constexpr variable 74d6881 try if () around _WIN32 is getting windows to compile b17e264 change from constexp func to constexpr variable and switching to snake case 07c32f5 SYCL RangePolicy: manually specify workgroup size through chunk size d0f710d Merge pull request #5434 from dalg24/promote_math_constants 91c0b67 Remove unused KOKKOS_ABORT_MESSAGE_BUFFER_SIZE d3da941 Merge pull request #5430 from masterleinad/clean_kokkos_compiler_cuda 4ceebf4 Merge pull request #5435 from dalg24/hip_do_not_warn_about_xnack_when_no_support_for_page_migration 20c2142 Merge pull request #5433 from dalg24/intel_suppress_missing_return_statement_warning bcfbbb3 Unit test skips if host and device execution space are the same, as there is no migration fe6c769 Fixup quad support for math functions specialized in the right namespace a58c622 Merge pull request #5428 from ibaned/fma-function eb3c89f Avoid spamming users with 
warnings about XNACK after detecting that page migration is not supported b2676e8 Update tests following the promotion of the math constants 7f3c6c4 Cleanup mathematical special functions 081ff05 Promote mathematical constants to Kokkos::{Experimental -> numbers} namespace 389d70f Add math constant variables without the _v suffix 53d88bf Adjusted threshold to 1.5 in an attempt to make ci pass on cpu (parallel workloads) c9ad8aa Suppress bogus missing return statement warning with Intel Compiler Classic 8ba3174 Add quad-precission fma overload f1a8cfc Merge pull request #5429 from ldh4/fix_missing_namespace 7a8e7ed Remove KOKKOS_COMPILER_CUDA_VERSION 31914bd Clean up for NVCC<11 0f807c8 Merge pull request #5411 from masterleinad/disable_ice_openmptarget_tests 34ac119 Merge pull request #5424 from dalg24/kokkos_version_macros 50e4e33 Merge pull request #5427 from crtrott/issue-5426 5a4d1b3 added perf-test with extended information about the migrations 8194fa1 added unit test for SharedSpace to defaultDevice test 94b994a added SharedSpace alias and utility functions b730a74 add test for Kokkos::fma a4f3de3 fix bug in ternary function test macro f20bcd8 move fma to where its comment was located 6af2343 Fixed incorrect namespace 59f0a7a adding Kokkos::fma(x, y, z) 11ab46f Update core/src/impl/Kokkos_ViewCtor.hpp cc24f21 Fix spurious warning in NVCC < 11.5 about missing return 9b127a4 Define KOKKOS_COMPILER_NVCC with version number 3be4cef Merge pull request #5425 from dalg24/abort_illegal_init_or_finalize a67c16a Per review KOKKOS_VERSION_COMPARE -> KOKKOS_VERSION_{LESS,GREATER,EQUAL} cd2d5e3 Dispatch Kokkos::sort(Kokkos::View) to CUDA Thrust (#5183) c2e7ea2 Abort when calling initialize() more than once or calling finalize before init or after finalize c82eb33 Silence warnings about valueView being unused (#5421) 4e2c540 Enable Android x86_64 support (#5423) 6f5de58 Draft version comparison macro fccc196 Defined KOKKOS_VERSION_{MAJOR,MINOR,PATCH} macros 9645d46 Merge 
pull request #5391 from masterleinad/dont_rely_on_default_stream 30c17a8 Dispatch Kokkos::sort(Kokkos::View) to std::sort (#5372) 338b458 Remove dummy arguments for ViewCtorProp (#5314) bd44b5c Merge pull request #5418 from Rombur/rocm_52 c4c9bfc Remove XL compiler support (#5349) 397a6bc Don't rely on synchronization behavior of default stream in CUDA and HIP 802b7e6 Refactor HIP backend (#5410) 1f678ec Merge pull request #5417 from masterleinad/remove_deprecated_kokkos_task_policy 0fb2f48 Remove code only used by ROCm < 5.0 068973a Merge pull request #5374 from masterleinad/no_inline_default_delete ef85afa Merge pull request #5416 from masterleinad/require_rocm_5_2_0 b35228a Remove deprecated Kokkos_TaskPolicy.hpp 25bec65 Merge pull request #5415 from brian-kelley/SplitRandomSortTest c844e90 Extent FIXME_OPENMPTARGET comments 3220c59 Update algorithms/unit_tests/CMakeLists.txt fcfa27b Require ROCm 5.2.0 578e6ae Split Random/Sort/NestedSort test into multiple cpps e50a7be Merge pull request #5317 from brian-kelley/Do645 76b36ac Define KOKKOS_DEFAULTED_FUNCTION and KOKKOS_INLINE_FUNCTION_DELETED empty 34bda9e Merge pull request #5389 from PhilMiller/5385-develop-sort-fence 108d6e8 Merge pull request #5398 from dalg24/execution_spaces_regular 45522bf Disable OpenMPTarget unit tests that cause ICEs with icpx b0e4d55 Merge pull request #5409 from PhilMiller/5194-deprecate-volatile 6c54349 parallel_scan with View as result type (#5146) f08c241 Format 4efb43f Add unit test that volatile-qualified join() is called when we expect it to be d7baa9f #5194: Fail compilation if a volatile-qualified join() would be called 4f8e0ea Introduce dependent_false_v<T> so obscure code and clarifying comments don't need to be repeated 4689208 #5385: Add fences to all non-exec-space sorting routines other than BinSort constructors 7522044 Replace CL/sycl.hpp (#5387) 72a4a6c Merge pull request #5404 from masterleinad/dont_use_filesystem 20e072b Merge pull request #5407 from 
nmm0/5406-remove-Kokkos_PhysicalLayout c50399a OpenACC schedule type in parallel_for and parallel_reduce (RangePolicy) (#5340) f3265d5 #5406: remove Kokkos_PhysicalLayout.hpp since it is unused 2daa5e4 Merge pull request #5388 from kokkos/5382-develop-deprecate-sort-bool 8bd5bfb Don't use <filesystem> functionality 1d77ae9 Merge pull request #5402 from masterleinad/disable_device_and_threads_test_for_trilinos b38f68a Merge pull request #5395 from Rombur/m_regsPerSM fe2cd1b Disable KokkosCore_UnitTest_DeviceAndThreads for Trilinos 43eba1c Remove m_regsPerSM in the CUDA backend c39b4a3 Provide equality comparison operators for HPX as well bcbc086 Check that execution spaces meet the requirements of regular types 2cb40e1 Take advantage of C++17 when checking that execution spaces meet requirements bb94f0e Define comparison operators == and != for all execution spaces d1f61dc Add is_device_v helper variable template 5371c05 Remove unused variable: m_regsPerSM aef15d2 Merge pull request #5383 from Rombur/hip_experimental 2a8d03a Fix unit test that was calling deleted code path 39d0d58 Move HIP out of experimental 5dd1ceb Merge pull request #5377 from masterleinad/fix_init_array_reduction 3a74a23 Merge pull request #5376 from dalg24/openacc_parallel_mdrange 01adeee Fix test to match code change f909cb5 Remove code used only in the ex-deprecated deleted path eda9c4b Remove 'previously' deprecated sort() overloads 0971348 #5382: Deprecate overloads of Kokkos::sort() taking parameter 'bool always_use_kokkos_sort' 477df17 Minor update on the FIXME_OPENACC comment. 
a9a8699 Fix initialization for array reductions ddddeae Enable more tests and fixup unimplemented comments for OpenACC 2694523 Implement OpenACC MDRangePolicy parallel_for 243a6c3 Merge pull request #5328 from dalg24/test_device_and_num_threads_after_initialization 045a0d3 Fixup typo async_arg[c] in ParallelReduce OpenACC 2d6cbad Tracking performance testing: Integrate google benchmark (#5177) 2b2bb3d Don't use 'inline' with KOKKOS_DEFAULTED_FUNCTION or KOKKOS_INLINE_FUNCTION_DELETED 38ac8f1 Merge pull request #5371 from masterleinad/cleanup_cuda_10.0 7d9b708 Clean up CUDA <= 10.0 checks 0b9dee7 Merge pull request #5370 from dalg24/type_identity 889a530 Rename Impl::{identity -> type_identity} and import it from std:: when C++20 is available 81715f8 Merge pull request #5368 from dalg24/prefer_std_void_t d85fb44 Fixup include <Kokkos_Macros.hpp> to pass header self-containment test 22ee620 Merge pull request #5365 from dalg24/do_not_drop_label_when_reallocating_view 0e128e7 Enable automatic detection of arch when enabling 'HIP' with 'hipcc' (#5327) 746a110 Prefer std::void_t now that C++17 is available 8ad66d0 Merge pull request #5 from masterleinad/do_not_drop_label_when_reallocating_view 69085ef Add dummy source file to SIMD to allow deduction of linker language. 
(#5354) f48e95e Check labels for container types for resize and realloc ac67f75 Merge pull request #5367 from dalg24/refactor_initialization_settings_class_use_std_optional c260025 Enable the default OpenACC execution space (#5360) 1a9d992 Merge pull request #5345 from brian-kelley/FixAllocRecordPrintouts dcdb511 Refactor InitializationSettings class to leverage std::optional dd63284 Per review do not bother with temporary variable to store the label in realloc 14a542c Check labels in realloc unit test 8698ab7 Bugfix preserve view label when calling realloc() 4ebaaf5 SharedAlloc print_records: don't check m_alloc_ptr 0cc7917 Move sort test includes, revert whitespace changes 35faf09 Move nested sort tests into a separate file 7994904 Nested sort test: add error messages 67bb790 NestedSort test cleanup 9d67e0b Pass tag by value af4bc75 Update minimum compiler versions + clean up (#5323) 4cd0fc4 Merge pull request #5357 from masterleinad/bump_kokkos_version_develop 4d5b4cf Merge pull request #5356 from masterleinad/fix_pragma_ivdep_openmp d7ba238 Bump Kokkos version on develop 82834e3 Merge pull request #5227 from masterleinad/sycl_deduce_wgroup_size_reduce 421870d Fix pragma ivdep in Kokkos_OpenMP_Parallel.hpp 867a4ad Restrict the number of tests for num_threads b82c258 Add overloads of `hypot` math function that take 3 arguments (#5341) b77fb8f Drop mutable 9474a89 Merge pull request #5350 from kokkos/revert-5338-fix-linker-language-simd 466ba83 Revert "Fix missing linker language for SIMD (#5338)" 188fac1 Merge pull request #4105 from masterleinad/openmp_detection cb2eaff Nested bitonic sort: small changes b51b5e7 Merge pull request #5307 from masterleinad/avoid_default_ctor_withoutinitializing 38873b0 Fix null deref in SharedAllocationRecord::print_records 10bcbb8 Fix missing linker language for SIMD (#5338) 7d55fa3 Merge pull request #5344 from masterleinad/fix_unordered_map 6688894 Fix UnorderedMapRehash::operator() 93dc5a4 Fixup capture_ouptut parameter for 
subprocess.run was added in version 3.7 7096260 Temporarily disable test for OpenMPTarget since it does not select the right device d2d9275 removed DEBIAN_FRONTEND=noninteractive 21807ab pep8ed the python file 6d4b49b added header 4f3ebbe Merge pull request #5331 from dalg24/fixup_cxx17 a0e3579 added apt repo 63a5d35 Merge pull request #5336 from bartlettroscoe/tril-10810-fix-cmake-install 373e68e Install newer GCC in ubuntu18.04 based Docker images c3e1a21 Add missing <thread> header include faad639 Add OpenMP Target support in the test 65364d2 Fixup USE_SOURCE_PERMISSIONS is only supported since CMake 3.20 ae3e7fd Add tests for disable_warnings and tune_internals as well f213a6a Add Python test for device_id and num_threads after initialization 5d84992 Add executable that writes num_threads or device_id on demand after initialization 8fe13d1 Merge pull request #5325 from PhilMiller/5312-revert-3580 ee476ec Simplify KOKKOS_{EXT,SUB,INT}_LIBRARIES logic into a single KOKKOS_COMPONENT_LIBRARIES list 0638341 Fixup unconditionally enable SIMD now that C++17 is the minimum cxx standard required 767cbc9 Merge pull request #5330 from dalg24/openmp_print_warning_to_standard_error_stream 085c6fc Print OpenMP warnings to the standard error stream daf933a OpenACC `parallel_for` and `parallel_reduce` (#5322) 745cfd8 Fix Sort/NestedSort includes 6ea1250 Move team/thread sort to Experimental::, new header 294b7f6 Use KOKKOS_FUNCTION in place of INLINE, FORCEINLINE 839df99 Merge pull request #5326 from dalg24/drop_reciprocal_overflow_thresold_trait 6ae55a9 sort_thread tests: Fix OOB access on idle threads 8f73a3c Drop reciprocal_overflow_threshold trait 441f39f Merge pull request #5297 from masterleinad/remove_deprecated_code_3 75346c5 #5312: Revert #3580 and try a different workaround a86eb7f Refactor nested-parallelism sort to need 1 impl only 91531a5 Small updates to nested sort e645320 Reintroduce Kokkos_ENABLE_DEPRECATED_CODE_3 c481603 Merge pull request #5321 from 
dalg24/fixup_cuda_arch_auto_detection_when_included_in_other_cmake_project 8e274f6 Merge remote-tracking branch 'upstream/develop' into openmp_detection c14b43a Move HWLOC test a82b9ab reindent KokkosCore.hpp 5290902 Keep math functions in Experimental namespace for now e068698 Keep InitArguments for now 5df1d23 Remove warn_if_deprecated ef1e253 Use std::bool_constant 97bdee7 Fix HWLOC test eb4c146 KOKKOS_ENABLE_DEPRECATED_CODE_3->KOKKOS_ENABLE_DEPRECATED_CODE_4 bf2ba6c Remove all deprecated code, except for partition_master d10baa0 Remove Pthread backend a6042cb Including private headers is an error 8268c40 Remove KOKKOS_IMPL_CUDA_CLANG_WORKAROUND comment db45fb4 Guard destroy functor instantiated for GPU backends 1f946f9 Team- and thread-level sort, sort_by_key b0f3ef7 Merge pull request #5295 from masterleinad/cleanup_cxx_17 7415171 Merge pull request #5316 from masterleinad/arch_native_msvc f7859f1 Initial support for multiple OpenACC execution space instances: (#5296) 5e92b46 Fixup CUDA arch auto detection in Trilinos db66946 Don't test using a different compiler with OpenMP support d845b57 Merge pull request #5273 from nliber/is_concept_v 36251dd Update error message for GCC < 7 c45e698 Merge remote-tracking branch 'upstream/develop' into openmp_detection 5c5c9ea inline constexpr bool is_CONCEPT_v variables added to match C++17 traits fc175ba Merge pull request #5281 from masterleinad/test_size_containers_create_mirror c5dffba Error out if ARCH_NATIVE is requested for MSVC 2a9ef31 Apply suggestions from code review 9da3fc0 Restore KOKKOS_ATTRIBUTE_NODISCARD efbac46 Update comments in cmake/kokkos_corner.cmake 7515804 Always instantiate the destroy functor f44063b Fix labels in View initialization e615701 Avoid instantiating default constructor of value type when WithoutInitializing is given 8acdaed KOKKOS_CLASS_LAMBDA is always defined f855c1d static constexpr variables and *_v 57c3434 Remove another FIXME in cmake/kokkos_corner_cases.cmake e8280d0 
Miscellaneous clean-ups af6c8db STATIC_ASSERT -> static_assert fcac9c5 Replace attributes be7dd5f Merge pull request #5310 from ndellingwood/update-testing-cpp17 1a0c77f [ci skip] set cpp standard to 17 6b89c47 Merge pull request #5308 from masterleinad/layouts_not_defuault_constructible d052ce8 Merge pull request #5277 from masterleinad/require_cxx_17 7cc17c6 Don't assume layouts are default-constructible 95d7082 Merge pull request #5303 from masterleinad/improve_offset_view 882076b Don't change nvcc_wrapper 89fa162 Merge pull request #5302 from dalg24/cherry-pick-fix-intel-ice 8efa799 Use copyright header in Kokkos_OffsetView.hpp 1b93a44 KOKKOS_INLINE_FUNCTION -> KOKKOS_FUNCTION in class ff8ed0d Implement OffsetView constructor taking pairs and ViewCtorProp fefefce Use if constexpr in cplusplus17.cpp 1d36652 Work around intel compiler bug 7f3a4d3 Merge pull request #5300 from masterleinad/clean_internal_scratch_bitset f590343 Avoid allocating memory for UniqueToken 7b0fedf Add FIXME_CXX17 9abc0b5 More nvcc_wrapper clean up dff1f45 Use OMP_NESTED = 'true' for gcc-8.4.0 in CI 35b5728 Update nvcc_wrapper 2b820b1 Update build system 277230b Update CI 4c5be02 Minimal changes to source b2371de Allow using C++23 (#5283) 28a8631 Merge pull request #5246 from masterleinad/sycl_store_device_id fd17f28 Merge pull request #5294 from masterleinad/fix_bhalf_t_prod_test 1c7f66b Support finding libquadmath with native compiler support (#5286) 955f2ec Merge pull request #5293 from masterleinad/kokkos_cxx_standard_error 966fa3c Merge pull request #5292 from dalg24/forward_scope_guard_arguments_to_initialize afb977b Only test product for bhalf_t for N<=5 515d5b7 Turn setting Kokkos_CXX_STANDARD into an error 39c677e Merge pull request #5291 from dalg24/fixup_pin_openacc_build f0b6442 Use perfect forwarding to Kokkos::initialize in ScopeGuard constructor 638b1b9 Fixup OpenACC must run on a machine that can handle large images c8f1ffb Merge pull request #5288 from 
masterleinad/force_openacc_ci_volta70 d4c88ce Run OpenACC CI on a Volta70 machine ab1fdea SYCL: Store device_id passed from initialization 4b9bce5 Merge pull request #5268 from crtrott/fix-warnings 12519a4 Check size in Containers WithoutInitializing test 5f6e3a8 Merge pull request #5272 from dalg24/fixup_flag_removal b33dd1e Merge pull request #5275 from PhilMiller/5274-dynamicview-mirror 824629a Merge pull request #5271 from ldh4/rank_remove_dim_limit 23fb091 #5274: Fix test to match expectation of fences that need to be there 636727d #5274 DynamicView: Properly resize mirror instances after construction e8c5024 Add test for kokkos-tools parsing --kokkos-tools-libs flag 8542fdc Test flag removal c53ca75 Fix flag removal in Tools and warn when flag is not recognized bc7adbf Do not forget to set last element to nullptr when removing flag a53ec9b Remove Kokkos::Rank limit to 6 ranks 6b07e39 Merge pull request #5267 from dalg24/raised_by_kokkos_initialize 54b3ec3 Cleanup "Raised by Kokkos::initialize" error and warning messages 3297b04 Limit workgroup size to 512 when not using an Intel GPU 040419a Link with OpenMP 3c64b3a Don't use link flags 426d982 Use LIB_NAMES instead 9bb4005 Try using FindOpenMP instead of figuring out flags manually e90fca1 Deduce workgroup size for SYCL parallel_reduce RangePolicy git-subtree-dir: tpls/kokkos git-subtree-split: aa1f48f
1 parent d4ea755 commit b5542d3

File tree

1,139 files changed: +33692 -42155 lines changed


.github/workflows/continuous-integration-workflow-hpx.yml

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ jobs:
  -DHPX_ROOT=$PWD/../../hpx/install \
  -DKokkos_ARCH_NATIVE=ON \
  -DKokkos_ENABLE_COMPILER_WARNINGS=ON \
- -DKokkos_ENABLE_DEPRECATED_CODE_3=OFF \
+ -DKokkos_ENABLE_DEPRECATED_CODE_4=OFF \
  -DKokkos_ENABLE_EXAMPLES=ON \
  -DKokkos_ENABLE_HPX=ON \
  -DKokkos_ENABLE_HPX_ASYNC_DISPATCH=ON \

.github/workflows/continuous-integration-workflow.yml

Lines changed: 10 additions & 2 deletions
@@ -10,7 +10,7 @@ jobs:
  continue-on-error: true
  strategy:
  matrix:
- distro: ['fedora:latest', 'ubuntu:latest']
+ distro: ['fedora:latest', 'fedora:rawhide', 'ubuntu:latest']
  cxx: ['g++', 'clang++']
  cmake_build_type: ['Release', 'Debug']
  backend: ['OPENMP']
@@ -76,6 +76,13 @@ jobs:
  - name: maybe_disable_death_tests
  if: ${{ matrix.distro == 'fedora:rawhide' }}
  run: echo "GTEST_FILTER=-*DeathTest*" >> $GITHUB_ENV
+ # Re-enable when latest is F37+
+ # - name: maybe_use_flang
+ # if: ${{ matrix.cxx == 'clang++' && startsWith(matrix.distro,'fedora:') }}
+ # run: echo "FC=flang" >> $GITHUB_ENV
+ - name: maybe_use_flang_new
+ if: ${{ matrix.cxx == 'clang++' && startsWith(matrix.distro,'fedora:rawhide') }}
+ run: echo "FC=flang-new" >> $GITHUB_ENV
  - name: maybe_use_external_gtest
  if: ${{ matrix.distro == 'ubuntu:latest' }}
  run: sudo apt-get update && sudo apt-get install -y libgtest-dev
@@ -93,8 +100,9 @@ jobs:
  -DKokkos_ENABLE_HWLOC=ON \
  -DKokkos_ENABLE_${{ matrix.backend }}=ON \
  -DKokkos_ENABLE_TESTS=ON \
+ -DKokkos_ENABLE_BENCHMARKS=ON \
  -DKokkos_ENABLE_EXAMPLES=ON \
- -DKokkos_ENABLE_DEPRECATED_CODE_3=ON \
+ -DKokkos_ENABLE_DEPRECATED_CODE_4=ON \
  -DKokkos_ENABLE_DEPRECATION_WARNINGS=OFF \
  -DCMAKE_CXX_COMPILER=${{ matrix.cxx }} \
  -DCMAKE_BUILD_TYPE=${{ matrix.cmake_build_type }}

.github/workflows/osx.yml

Lines changed: 2 additions & 2 deletions
@@ -30,10 +30,10 @@ jobs:
  cmake -B build .
  -DKokkos_ENABLE_${{ matrix.backend }}=On
  -DCMAKE_CXX_FLAGS="-Werror"
- -DCMAKE_CXX_STANDARD=14
+ -DCMAKE_CXX_STANDARD=17
  -DKokkos_ARCH_NATIVE=ON
  -DKokkos_ENABLE_COMPILER_WARNINGS=ON
- -DKokkos_ENABLE_DEPRECATED_CODE_3=OFF
+ -DKokkos_ENABLE_DEPRECATED_CODE_4=OFF
  -DKokkos_ENABLE_TESTS=On
  -DCMAKE_BUILD_TYPE=${{ matrix.cmake_build_type }}
  - name: build

.jenkins

Lines changed: 47 additions & 34 deletions
Large diffs are not rendered by default.

.travis.yml

Lines changed: 0 additions & 108 deletions
This file was deleted.

BUILD.md

Lines changed: 6 additions & 4 deletions
@@ -52,6 +52,10 @@ There are numerous device backends, options, and architecture-specific optimizat
  ````
  which activates the OpenMP backend. All of the options controlling device backends, options, architectures, and third-party libraries (TPLs) are given below.

+ Kokkos requires as a minimum C++17, however C++20 and C++23 are supported depending on the compiler.
+
+ The latest minimum compiler versions can be found in `cmake/kokkos_compiler_id.cmake`.
+
  ## Known Issues<a name="KnownIssues"></a>

  ### Cray
@@ -148,12 +152,14 @@ Options can be enabled by specifying `-DKokkos_ENABLE_X`.
  * Whether to activate experimental lambda features
  * BOOL Default: OFF
  * Kokkos_ENABLE_CUDA_LDG_INTRINSIC
+ * Deprecated since 4.0, LDG intrinsics are always enabled.
  * Whether to use CUDA LDG intrinsics
  * BOOL Default: OFF
  * Kokkos_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
  * Whether to enable relocatable device code (RDC) for CUDA
  * BOOL Default: OFF
  * Kokkos_ENABLE_CUDA_UVM
+ * Deprecated since 4.0
  * Whether to use unified memory (UM) by default for CUDA
  * BOOL Default: OFF
  * Kokkos_ENABLE_DEBUG
@@ -184,10 +190,6 @@ Options can be enabled by specifying `-DKokkos_ENABLE_X`.
  * Whether to enable test suite
  * BOOL Default: OFF

- ## Other Options
- * Kokkos_CXX_STANDARD
- * The C++ standard for Kokkos to use: c++14, c++17, or c++20. This should be given in CMake style as 14, 17, or 20.
- * STRING Default: 14

  ## Third-party Libraries (TPLs)
  The following options control enabling TPLs:
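The BUILD.md hunk above states that Kokkos now requires C++17 as a minimum. As a rough illustration of what that requirement means for a translation unit, the sketch below compares a `__cplusplus` value against the C++17 constant. It is not code from Kokkos: `meets_minimum_standard`, `current_build_ok`, and the `kCxx*` constants are hypothetical names introduced only for this example.

```cpp
// Illustrative sketch only; none of these names are part of Kokkos.
// Values of the predefined __cplusplus macro for each language standard.
constexpr long kCxx14 = 201402L;
constexpr long kCxx17 = 201703L;
constexpr long kCxx20 = 202002L;

// Returns true when the given __cplusplus value satisfies the C++17 minimum.
constexpr bool meets_minimum_standard(long cplusplus_value) {
  return cplusplus_value >= kCxx17;
}

// True if the current translation unit is compiled as C++17 or newer.
// Caveat: MSVC reports an accurate __cplusplus only with /Zc:__cplusplus.
constexpr bool current_build_ok() {
  return meets_minimum_standard(__cplusplus);
}
```

A project could place `static_assert(current_build_ok(), "C++17 or newer is required");` in a common header to fail early, which is one possible design and not necessarily how Kokkos itself enforces the minimum.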

CHANGELOG.md

Lines changed: 103 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,108 @@
11
# Change Log
22

3+
## [4.0.0](https://github.com/kokkos/kokkos/tree/4.0.0) (2023-02-21)
4+
[Full Changelog](https://github.com/kokkos/kokkos/compare/3.7.01...4.0.0)
5+
6+
### Features:
7+
- Allow value types without default constructor in `Kokkos::View` with `Kokkos::WithoutInitializing` [\#5307](https://github.com/kokkos/kokkos/pull/5307)
8+
- `parallel_scan` with `View` as result type. [\#5146](https://github.com/kokkos/kokkos/pull/5146)
9+
- Introduced `SharedSpace`, an alias for a `MemorySpace` that is accessible by every `ExecutionSpace`. The memory is moved and then accessed locally. [\#5289](https://github.com/kokkos/kokkos/pull/5289)
10+
- Introduced `SharedHostPinnedSpace`, an alias for a `MemorySpace` that is accessible by every `ExecutionSpace`. The memory is pinned to the host and accessed via zero-copy access. [\#5405](https://github.com/kokkos/kokkos/pull/5405)
11+
- Groundwork for `MDSpan` integration. [\#4973](https://github.com/kokkos/kokkos/pull/4973) and [\#5304](https://github.com/kokkos/kokkos/pull/5304)
12+
- Introduced MD version of hierarchical parallelism: `TeamThreadMDRange`, `ThreadVectorMDRange` and `TeamVectorMDRange`. [\#5238](https://github.com/kokkos/kokkos/pull/5238)
13+
14+
### Backend and Architecture Enhancements:
15+
16+
#### CUDA:
17+
- Allow CUDA PTX forward compatibility [\#3612](https://github.com/kokkos/kokkos/pull/3612) [\#5536](https://github.com/kokkos/kokkos/pull/5536) [\#5527](https://github.com/kokkos/kokkos/pull/5527)
18+
- Add support for NVIDIA Hopper GPU architecture [\#5538](https://github.com/kokkos/kokkos/pull/5538)
19+
- Don't rely on synchronization behavior of default stream in CUDA and HIP [\#5391](https://github.com/kokkos/kokkos/pull/5391)
20+
- Improve CUDA cache config settings [\#5706](https://github.com/kokkos/kokkos/pull/5706)
21+
22+
#### HIP:
23+
- Move `HIP`, `HIPSpace`, `HIPHostPinnedSpace`, and `HIPManagedSpace` out of the `Experimental` namespace [\#5383](https://github.com/kokkos/kokkos/pull/5383)
24+
- Don't rely on synchronization behavior of default stream in CUDA and HIP [\#5391](https://github.com/kokkos/kokkos/pull/5391)
25+
- Export AMD architecture flag when using Trilinos [\#5528](https://github.com/kokkos/kokkos/pull/5528)
26+
- Fix linking error (see [OLCF issue](https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#olcfdev-1167-kokkos-build-failures-with-prgenv-amd)) when using `amdclang`: [\#5539](https://github.com/kokkos/kokkos/pull/5539)
27+
- Remove support for MI25 and added support for Navi 1030 [\#5522](https://github.com/kokkos/kokkos/pull/5522)
28+
- Fix race condition when using `HSA_XNACK=1` [\#5755](https://github.com/kokkos/kokkos/pull/5755)
29+
- Add parameter to force using GlobalMemory launch mechanism. This can be used when encountering compiler bugs with ROCm 5.3 and 5.4 [\#5796](https://github.com/kokkos/kokkos/pull/5796)
30+
31+
#### SYCL:
32+
- Delegate choice of workgroup size for `parallel_reduce` with `RangePolicy` to the compiler. [\#5227](https://github.com/kokkos/kokkos/pull/5227)
33+
- SYCL `RangePolicy`: manually specify workgroup size through chunk size [\#4875](https://github.com/kokkos/kokkos/pull/4875)
34+
35+
#### OpenMPTarget:
36+
- Select the right device [\#5492](https://github.com/kokkos/kokkos/pull/5492)
37+
38+
#### OpenMP:
39+
- Add `partition_space` [\#5105](https://github.com/kokkos/kokkos/pull/5105)
40+
41+
### General Enhancements
42+
- Implement `OffsetView` constructor taking `pair`s and `ViewCtorProp` [\#5303](https://github.com/kokkos/kokkos/pull/5303)
43+
- Promote math constants to `Kokkos::numbers` namespace [\#5434](https://github.com/kokkos/kokkos/pull/5434)
44+
- Add overloads of `hypot` math function that take 3 arguments [\#5341](https://github.com/kokkos/kokkos/pull/5341)
45+
- Add `fma` fused multiply-add math function [\#5428](https://github.com/kokkos/kokkos/pull/5428)
46+
- Views using `MemoryTraits::Atomic` don't need `volatile` overloads for the value type anymore. [\#5455](https://github.com/kokkos/kokkos/pull/5455)
47+
- Added `is_team_handle` trait [\#5375](https://github.com/kokkos/kokkos/pull/5375)
48+
- Refactor desul atomics to support compiling CUDA with NVC++ [\#5431](https://github.com/kokkos/kokkos/pull/5431) [\#5497](https://github.com/kokkos/kokkos/pull/5497) [\#5498](https://github.com/kokkos/kokkos/pull/5498)
49+
- Support finding `libquadmath` with native compiler support [\#5286](https://github.com/kokkos/kokkos/pull/5286)
50+
- Add architecture flags for MSVC [\#5673](https://github.com/kokkos/kokkos/pull/5673)
51+
- SIMD backend for ARM NEON [\#5829](https://github.com/kokkos/kokkos/pull/5829)
52+
53+
### Build System Changes
54+
- Let CMake determine OpenMP flags. [\#4105](https://github.com/kokkos/kokkos/pull/4105)
55+
- Update minimum compiler versions. [\#5323](https://github.com/kokkos/kokkos/pull/5323)
56+
- Makefile and CMake support for C++23 [\#5283](https://github.com/kokkos/kokkos/pull/5283)
57+
- Do not add `-cuda` to the link line with NVHPC compiler when the CUDA backend is not actually enabled [\#5485](https://github.com/kokkos/kokkos/pull/5485)
58+
- Only add `-latomic` in generated GNU makefiles when OpenMPTarget backend is enabled [\#5501](https://github.com/kokkos/kokkos/pull/5501) [\#5537](https://github.com/kokkos/kokkos/pull/5537) (3.7 patch release candidate)
59+
- `Kokkos_ENABLE_CUDA_LAMBDA` now `ON` by default with NVCC [\#5580](https://github.com/kokkos/kokkos/pull/5580)
60+
- Fix enabling of relocatable device code when using CUDA as CMake language [\#5564](https://github.com/kokkos/kokkos/pull/5564)
61+
- Fix cmake configuration with CUDA 12 [\#5691](https://github.com/kokkos/kokkos/pull/5691)
62+
63+
### Incompatibilities (i.e. breaking changes)
64+
- ***Require C++17*** [\#5277](https://github.com/kokkos/kokkos/pull/5277)
65+
- Turn setting `Kokkos_CXX_STANDARD` into an error [\#5293](https://github.com/kokkos/kokkos/pull/5293)
66+
- Remove all deprecations in Kokkos 3 [\#5297](https://github.com/kokkos/kokkos/pull/5297)
67+
- Remove `KOKKOS_COMPILER_CUDA_VERSION` [\#5430](https://github.com/kokkos/kokkos/pull/5430)
68+
- Drop `reciprocal_overflow_threshold` numeric trait [\#5326](https://github.com/kokkos/kokkos/pull/5326)
69+
- Move `reduction_identity` out of `<Kokkos_NumericTraits.hpp>` into a new `<Kokkos_ReductionIdentity.hpp>` header [\#5450](https://github.com/kokkos/kokkos/pull/5450)
70+
- Reduction and scan routines will report an error if the `join()` operator they would use takes `volatile`-qualified parameters [\#5409](https://github.com/kokkos/kokkos/pull/5409)
71+
- `ENABLE_CUDA_UVM` is dropped in favor of using `SharedSpace` as `MemorySpace` explicitly [\#5608](https://github.com/kokkos/kokkos/pull/5608)
72+
- Remove Kokkos_ENABLE_CUDA_LDG_INTRINSIC option [\#5623](https://github.com/kokkos/kokkos/pull/5623)
73+
- Don't rely on synchronization behavior of default stream in CUDA and HIP - this potentially will break unintended implicit synchronization with other libraries such as MPI [\#5391](https://github.com/kokkos/kokkos/pull/5391)
74+
- Make ExecutionSpace::concurrency() a non-static member function [\#5655](https://github.com/kokkos/kokkos/pull/5655) and related PRs
75+
76+
### Deprecations
77+
- Guard against non-public header inclusion [\#5178](https://github.com/kokkos/kokkos/pull/5178)
78+
- Raise deprecation warnings if non empty WorkTag class is used [\#5230](https://github.com/kokkos/kokkos/pull/5230)
79+
- Deprecate `parallel_*` overloads taking the label as trailing argument [\#5141](https://github.com/kokkos/kokkos/pull/5141)
80+
- Deprecate nested types in functional [\#5185](https://github.com/kokkos/kokkos/pull/5185)
81+
- Deprecate `InitArguments` struct and replace it with `InitializationSettings` [\#5135](https://github.com/kokkos/kokkos/pull/5135)
82+
- Deprecate `finalize_all()` [\#5134](https://github.com/kokkos/kokkos/pull/5134)
83+
- Deprecate command line arguments (other than `--help`) that are not prefixed with `kokkos-*` [\#5120](https://github.com/kokkos/kokkos/pull/5120)
84+
- Deprecate `--[kokkos-]numa` cmdline arg and `KOKKOS_NUMA` env var [\#5117](https://github.com/kokkos/kokkos/pull/5117)
85+
- Deprecate `--[kokkos-]threads` command line argument in favor of `--[kokkos-]num-threads` [\#5111](https://github.com/kokkos/kokkos/pull/5111)
86+
- Deprecate `Kokkos::is_reducer_type` [\#4957](https://github.com/kokkos/kokkos/pull/4957)
87+
- Deprecate `OffsetView` constructors taking `index_list_type` [\#4810](https://github.com/kokkos/kokkos/pull/4810)
88+
- Deprecate overloads of `Kokkos::sort` taking a parameter `bool always_use_kokkos_sort` [\#5382](https://github.com/kokkos/kokkos/issues/5382)
89+
- Deprecate `CudaUVMSpace::available()` which always returned `true` [\#5614](https://github.com/kokkos/kokkos/pull/5614)
- Deprecate `volatile`-qualified members from `Kokkos::pair` and `Kokkos::complex` [\#5412](https://github.com/kokkos/kokkos/pull/5412)
- Deprecate `KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_*` macros [\#5824](https://github.com/kokkos/kokkos/pull/5824) (oversight in 3.2)

### Bug Fixes
- Avoid allocating memory for `UniqueToken` [\#5300](https://github.com/kokkos/kokkos/pull/5300)
- Fix `pragma ivdep` in `Kokkos_OpenMP_Parallel.hpp` [\#5356](https://github.com/kokkos/kokkos/pull/5356)
- Fix configuring with Threads support when rerunning CMake [\#5486](https://github.com/kokkos/kokkos/pull/5486)
- Fix View assignment between `LayoutLeft` and `LayoutRight` with static extents [\#5535](https://github.com/kokkos/kokkos/pull/5535) (3.7 patch release candidate)
- Add `fence()` calls to sorting routine overloads that don't take an execution space parameter [\#5389](https://github.com/kokkos/kokkos/pull/5389)
- Change `ClockTic` to 64-bit to fix overflow on Power [\#5577](https://github.com/kokkos/kokkos/pull/5577) (incl. in 3.7.01 patch release)
- Fix incorrect offset in CUDA and HIP `parallel_scan` for types smaller than 4 bytes [\#5555](https://github.com/kokkos/kokkos/pull/5555) (3.7 patch release candidate)
- Fix incorrect alignment behavior of scratch allocations in some corner cases (e.g. very small allocations) [\#5687](https://github.com/kokkos/kokkos/pull/5687) (3.7 patch release candidate)
- Add missing `ReductionIdentity<char>` specialization [\#5798](https://github.com/kokkos/kokkos/pull/5798)
- Don't install standard algorithms headers multiple times [\#5670](https://github.com/kokkos/kokkos/pull/5670)
- Fix max scratch size calculation for level 0 scratch in CUDA and HIP [\#5718](https://github.com/kokkos/kokkos/pull/5718)

## [3.7.01](https://github.com/kokkos/kokkos/tree/3.7.01) (2022-12-01)

[Full Changelog](https://github.com/kokkos/kokkos/compare/3.7.00...3.7.01)
