Commit b5542d3
aa1f48f Merge pull request #5912 from ndellingwood/release-candidate-4.0.0
606866d Update master_history.txt for 4.0.0
52ea295 Merge branch release-candidate-4.0.0 for 4.0.0
0394f7f Merge pull request #5900 from kokkos/update-changelog-to-4.0.0
d4690ab Update changelog to 4.0.0
44a5b1e Merge pull request #5872 from masterleinad/fix_version_macro_4_0_0
8af43c4 Fix version macros in 4.0.00
36f65d0 Merge pull request #5851 from crtrott/no-deprecated-3-in-makefile-400
0f820d0 Drop (deprecated) KokkosCore_UnitTest_DefaultDeviceTypeInit_* from the makefile
df33d98 Don't enable deprecated code 3 in Makefile builds anymore
f4cc47a Merge pull request #5842 from PhilMiller/4.0-fix-macros
25b84ad Merge pull request #5839 from dalg24/rc40_typo_deprecared
77aa52a Fixup typo `#ifdef KOKKOS_ENABLE_DEPRECA{R -> T}ED_CODE_3`
5f58dfe HIP: Drop obsolete macro definition
c3f9e34 ViewLayoutTiled: Be scrupulous about macro naming and undefining
41a9eb4 OpenMPTarget: Be scrupulous about macro naming and undefining
38ab536 CUDA: Fix up comment
e49a724 CUDA: Convert simple value macro to constexpr
ed51dea CRS: Use Kokkos device function macros rather than duplicating code when compiling for GPU targets
16b4c26 Merge pull request #5830 from dalg24/rc40_omp_chunck_sz_static_schedule
d35a58d Merge pull request #5829 from dalg24/rc40_simd_neon
86d51ae Merge pull request #5824 from dalg24/rc40_deprecate_kokkos_active_execution_memory_space_macros
7bd2961 Merge pull request #5826 from dalg24/rc40_cuda_occupancy_fixup
c6c12d0 OpenMP: Adding an ifdef around chunksize for static schedule for GCC compiler.
7b00f62 SIMD backend of ARM NEON (#5775)
c482a65 Further update to CUDA occupancy calculation (#5739)
ab9922e Change `#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_{4 -> 3}`
a10d514 Deprecate `KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_*` macros
dfafa6a Merge pull request #5773 from ndellingwood/resolve-intel-ice
0f38d03 Intel ICE Sacado: turn off support for nested OpenMP with ICPC
dc9d27e Intel ICE Sacado: use new HostIterateTile API in OpenMP
41b3856 Intel ICE Sacado: use new HostIterateTile API in HPX
624c71c Intel ICE Sacado: use new HostIterateTile API in Threads
3676622 Intel ICE Sacado: use new HostIterateTile API in Serial
f545c68 Intel ICE Sacado: rewrite HostIterateTile
14deae4 Merge pull request #5806 from Rombur/fix_typo
d55bf83 Fixup ROCm 5.4 ImplForceGlobalLaunch{Launch -> }_t typo in unit tests
d882b10 [4.0.0.] Add parameter to force using GlobalMemory launch mechanism using HIP (#5803)
e95c37b Merge pull request #5799 from Rombur/hip_global_launch
9e8d143 Merge pull request #5798 from dalg24/rc40-reduction_identity_char
5ef3844 Fix race condition when using GlobalLaunch with HIP and HSA_XNACK=1
48ca904 Add missing ReductionIdentity<char> specialization
a2e3df5 Merge pull request #5788 from masterleinad/cherry_pick_5785_4.0.0
10d0bb0 Merge pull request #5783 from masterleinad/fix_ci_4.0.00
c20de31 sprintf -> snprintf
39c34f1 Fix build on Fedora Rawhide
8dc7a2b Merge pull request #5771 from crtrott/fix-sycl-scratch-ptr-40
c8b7344 Let increment be of type uintptr_t fixing warning
4479f1b Fix ScratchSpace pointer comparison for SYCL
b3f1ba3 Merge pull request #5768 from dalg24/rc40_fixup_desul_atomics
e2c3caa Generate <desul/atomics/Config.hpp> file from the generated Makefiles
05d6271 Desul atomics configure library based on what the user enabled
1f0f2df Desul atomics: drop unnecessary macro guard that checks for __CUDA_ARCH__ in PTX assembly code
d39980e Desul atomics: drop unnecessary macro guard that checks for __CUDA_ARCH__ in compare exchange
93e4e7c Desul atomics cleanup enable GCC or MSVC atomics
e695638 Desul atomics fixup detect use of SYCL
fd1a8f8 Merge pull request #5753 from masterleinad/fix_kokkos_version_4_0_0
4bc5f7f CMake: change package COMPATIBILITY mode {SameMajorVersion -> AnyNewerVersion}
1a41b7e Update Kokkos version for 4.0.0
26406b6 Merge pull request #5743 from crtrott/fix-dynamic-view-400
5e44305 Apply clang-format
03acf65 fix broken DynamicView test case #4
374cc5c fix src/dst Properties in deep_copy(DynamicView,View)
fa7d6b3 Merge pull request #5736 from crtrott/fix-intel19-werror-4.0.0
904583a Merge pull request #5734 from crtrott/remove-kokkos-cxx-standard-from-build.md
b053cbb Fix -Werror with intel/19
27aea3c Remove KOKKOS_CXX_STANDARD mentioning from BUILD.md
661e6d6 Merge pull request #5687 from crtrott/fix-4942
af7c2d3 Merge pull request #5729 from dalg24/desul_impl_atomic_cuda_use_double_atomicadd
2191d7f Remove dead code guarded by `#ifdef DESUL_IMPL_ATOMIC_CUDA_USE_DOUBLE_ATOMICADD`
50058de Use proxy clang-format script for Christian
5ee40a9 Merge pull request #5515 from nliber/ctad-reducers
712a838 Scratch Allocation: Completely reset m_iter if fails
404b585 Scratch alignment: fix failed to allocate path
51228af Add comments in scratch alignment test
b3793a7 Merge pull request #5706 from crtrott/cuda-cache-config3
e35f0e8 Update license
90dc058 Changed all the reducer run-time tests to compile-time tests
daea95f Changed ASSERT_* to static_assert in ctad reducer tests
1bbea42 Removed Kokkos::LayoutStride out of the VS View, as it wasn't strictly necessary and was causing a runtime issue on Cuda
5ec23ed Fixed core/unit_test/CMakeLists.txt
d69abb0 CTAD Reducer additions
914e12a Fix UB in scratch allocation calculation
7380b59 Fix new scratch alignment test for OpenMPTarget Team size restriction
28b73f5 Merge pull request #5724 from dalg24/fixup_device_annotation_random_pool
baad0f9 Fixup device annotation on defaulted random pool default constructors
94a161e Fix scratch calculation in test
f54c04c Merge pull request #5720 from Rombur/global_launch_fix
119d29d CudaCacheConfig update address review comments
446c706 Cleanup random pool special member functions and precondition check in `get_state()` (#5716)
0576f97 [HIP] Fix GlobalMemory launching mechanism
75a344e Merge pull request #5718 from crtrott/fix-cuda-max-scratch-size-calc
87752bc More finetuning based on review feedback
32e1bfb Address review comments for scratch size align fix
6455fc4 Scratch Size calculation moving variable to smaller scope
752c6db P max_team_scratch level 0 calculation
c3e38dd Fix CUDA max_team_scratch level 0 calculation
9595d11 Fix lock arrays
d9d43bf Merge pull request #5715 from tcclevenger/add_readme_to_incremental_tests
efc5059 Add README.md explaining the incremental tests
7de2b62 Rework CUDA cache config using carveout calculations
9ed7dbf Merge pull request #5696 from dalg24/openacc_parallel_reduce_local_variable
30f477b Merge pull request #5396 from masterleinad/guard_t_openmp_instance
083467a Merge pull request #5708 from crtrott/disable-perf-test-in-trilinos
5fec21d Disable perf tests in Trilinos
537e36d Use local variable in the parallel_reduce(RangePolicy<OpenACC>, ...)
f64e5a6 Update nvcc_wrapper default arch to work with CUDA 12
53bf1f5 Scratch space alignment: make private variable for View align value
1b933bd Merge pull request #5678 from kokkos/2873-check-execinfo
a0914e7 Finalize HIP lock arrays
0241326 Merge pull request #5688 from dalg24/nvhpc_cuda_home
9a52696 Suppress (bogus) warning with NVHPC
f4a182b Rename CUDA_HOME environment variable to NVHPC_CUDA_HOME for NVHPC SDK 22.9
5a2197f Merge pull request #5680 from dalg24/openacc_refactor_parallel_reduce
c87d1f7 Fix nvhpc-docker container to work with CUDA 12 driver (#5686)
68c22d7 Make SYCL::concurrency non-static (#5682)
0b4912f Fixup test
39951d3 Make Kokkos_View scratch space use a minimum alignment
503f190 Fix alignment calculation in ScratchSpace
7fd8bfb Update alignment test
eb5d8e9 Refactor Policy constructor tests (#5598)
2eeb52b Fix a bug in FunctorAnalysis::Reducer final_reducer()
3a2f5da #5348: Add git information to benchmark metadata (#5463)
5b7bbc6 Merge pull request #5676 from neutrinoceros/sprintf_to_snprintf
f939065 Merge pull request #5684 from dalg24/fixup_tempfile_check_copyright_script
61f63e6 Fixup "mktemp: too few X's in template"
dbda659 Fixup temporary file in script to check copyright
dc96975 Guard t_openmp_instance with KOKKOS_ENABLE_DEPRECATED_CODE_3
d2b6f9d Fixup declare ParallelReduce<RangePolicy, OpenACC>::execute member function const
cf32a3c Refactor ParallelReduce<OpenACC> using macros to reduce code duplication
6bf0fca Define KOKKOS_IMPL_ACC_PRAGMA macro
4cad83c Fix copyright in SIMD file
a7c7a41 #2873: Check for presence of execinfo.h before enabling its use for stack tracing
d63d81e SIMD AVX2 backend (#5512)
6a49537 core: add is_team_handle and test (#5375)
42cc936 ENH: drop deprecated sprintf usage in Kokkos_Profiling.cpp
600be76 Merge pull request #5672 from masterleinad/fix_sycl_atomic_ref
b2e2096 Work around CUDA+Clang Thrust issue (#5660)
2f5f2ee Merge pull request #5669 from dalg24/cuda_concurrency_non_static
8303668 #5667: dont install std algorithm headers multiple times (#5670)
439fc18 Fix default memory order for sycl::atomic_ref
f3cf72a Add architecture flags for MSVC (#5673)
17170f6 Fixup HIP header include incomplete type Impl::HIPInternal
f8d5e8e Merge pull request #5664 from dalg24/cleanup_cuda_ldg_fetch
95fa3a7 Make HIP::concurrency() member function non static
969490a Introduce Impl::HIPInternal::concurrency static member function
965998d Fix {HIP:: -> HIP().}concurrency() occurrences
a1b8646 Make Cuda::concurrency() member function non static
4cfa416 Fix chicken and egg Cuda concurrency issue when initializing lock arrays
136abca Replace Impl::CudaInternal::m_maxConcurrency data member by Impl::CudaInternal::concurrency() member function
ad6c3da Fix {Cuda:: -> Cuda().}concurrency() occurrences
e4cce6e Add partition_space to OpenMP (#5105)
d24d7ed Merge pull request #5671 from crtrott/fix-5651
2513fc7 Fix classic Intel compiler Serial/OpenMP backend build
35fa45c Turn off classic intel workaround in Serial backend
8942959 Merge pull request #5668 from ndellingwood/update-changelog
14e9887 [ci skip] Update changelog
588dda8 Drop unnecessary inline specifiers in CudaLDGFetch
d9ea15a Drop CudaLDGFetch default constructor definition (prefer defaulted one)
8aa2315 Drop CudaLDGFetch special member functions
87eafb9 Merge pull request #5621 from cwpearson/fix/issue-5594
783aa92 Merge pull request #5662 from dalg24/serial_openacc_threads_concurrency
c9ff0d4 Merge pull request #5661 from mrowan137/mrowan137/bytes_and_flops_benchmark_typecast_to_scalar
e2efab2 Fixup non-static OpenTarget::concurrency() member function unless deprecated code 4 is ON
94135a8 typecast to Scalar
4e08cb2 5594: Remove two-argument CudaLDGFetch constructor
33762bf Make Threads::concurrency() a non-static member function unless deprecated code 4 is ON
052256f Fixup non-static {Serial,OpenACC}::concurrency() member function unless deprecated code 4 is ON
cd99be4 Check compatibility of execution space and memory space in View creation (#5544)
a8e3f4a Merge pull request #5601 from masterleinad/avoid_extern_static_thread_local
d707bd1 Merge pull request #5640 from crtrott/update-copyright
a2e8b4e Use git ls-find instead of find
6324fd1 Merge pull request #5656 from dalg24/non_static_concurrency_member_function
ffb50a5 Merge pull request #5522 from Rombur/navi
3324cf0 Update check-copyright to ignore build directory
1c31498 Fix bug in Makefile.kokkos
7edc47d Merge pull request #5645 from dalg24/prefer_std_thread_hardware_concurrency
6bc0f21 Drop non backend-specific use of static ExecutionSpace::concurrency() member function
50e0d34 Prepare for ExecutionSpace::concurrency() member function becoming a non-static member function
6389e89 Detect existence of ExecutionSpace::concurrency() member function
732727e Fix reviewers' comments
5dbe8ea Merge pull request #5649 from masterleinad/fix_unused_warning_view_ctor
1e72c61 Fix Reduce/Scan on Navi
6053c71 Fix WarpSize for NAVI
39e8ba8 Remove support for MI25 and support for NAVI 1030
8e7a91b Avoid another ICC 19 warning in with_properties_if_unset
7e52986 Drop `Impl::processors_per_node()` since not used anymore
1d9f0ef Prefer std::thread::hardware_concurrency to our own cpu discovery facility
2e0c8a6 Merge pull request #5630 from dalg24/detect_mpi
ed38bb9 Merge pull request #5627 from dalg24/cpp17_fold_expressions
26eb6f3 Drop pc.in filter
0b4ff6c Merge pull request #5634 from masterleinad/set_device_hip_cuda_interop_test
137158e Add LICENSE URL to header
e13514e Update automatic check
e4227c2 SYCL: Set RangePolicy default chunk_size to 1 (#5625)
2420eb2 Deprecate `Kokkos_ENABLE_CUDA_UVM` (#5608)
d802fd0 Remove Kokkos_ENABLE_CUDA_LDG_INTRINSIC option (#5623)
8f76928 Fix cudaErrorInvalidDeviceFunction error caused by an uninstantiated functor (#5605)
fdadafe Some more license updates post rebase
0457063 Update copyright update script
15170bf Update more copyrights
caf525f Update copyrights in files
ffad768 Have update-copyright go live
9922107 Update header
3995833 Fix up old copyright missing etc.
f5ac2cb Update copyright script
2ff5577 Initial update of License to Apache2 with LLVM exception
d0f65ae Merge pull request #5304 from nmm0/mdspan-extents-conversion
f40e3bc mdspan: remove deprecated macro check for non-public header inclusion from Kokkos_MDSpan_Header.hpp and Kokkos_MDSpan_Extents.hpp
60634fb Set the correct device/context in InterOp tests
6d5fffb tests: minor mdspan tests formatting issues
e17f6c1 tests: mdspan formatting
49052b4 tests: add helper template to mdspan extents tests to make it more clear what we are testing
93f7034 mdspan: minor comment formatting and rename the header include logic file to Kokkos_MDSpan_Header
15d4e2a mdspan: fix formatting
feabc86 mdspan: move mdspan header logic into its own header
415f102 mdspan extents test: weird formatting issue
13a3842 tests: remove old TestViewMDSpan which was accidentally left in
5c2c6a9 mdspan: comment forward declarations in Kokkos_MDSpan_Extents.hpp
9c3efbb mdspan: use absolute namespaces rather than nesting forward decls with the extents impl
992202a tests: move extents test to a subdirectory so we can begin better organizing view tests. Rename the test and make it compile-only
be033cd mdspan: add IndexType to ExtentsFromDataType template parameters
91faeb0 tests: remove namespace Test from mdspan tests
2116d19 mdspan: remove extents_type and dimension_type since they are unused
efd308a mdspan: get rid of SizeType template parameter and use size_t directly; remove extra inline specifiers
02469c3 mdspan: remove unused include and macro guards around KOKKOS_ENABLE_IMPL_MDSPAN since that's checked at the point of inclusion
faef259 additional formatting adjustment
8dc0723 adjust formatting
e36a0aa mdspan: add conversion from extents to datatype
c3948fa mdspan: initial implementation of ExtentsFromDataType
6db6d49 Fix misplaced negation in tool testing utils
b92d661 Try with a right fold to see if NVC++ like this better
c4df87c Avoid use of immediately invoked lambdas to increase consensus
e78d6a5 Merge pull request #5629 from arghdos/xnack_warn_msg
0b10dbb Reuse mpi_local_rank_on_node at initialization when picking a GPU
0a9b6d6 Let mpi_ranks_per_node and mpi_local_rank_on_node return -1 when detection fails
7d01fe7 Handle MPI local size -1 when checking for over-subscription in host parallel backends
da0e6db Let get_ctest_gpu take the local MPI rank as an integer rather than a C string
4939350 Avoid forward declaration in ctest resource allocation tests
5780462 Fixup CPU discovery source file
5fdc7df fix typo in HSA_XNACK warning message
3afc80e Drop are_valid workaround in tool testing utils
e886da9 Remove comma folding emulation pre C++17 workaround and use fold expressions instead
3c37ea0 Drop `#ifdef __cpp_fold_expressions` guards in partition_space
9516700 Update CUDA occupancy calculation to reflect register allocation granularity of 256 registers per warp (#5624)
682499f Merge pull request #5617 from masterleinad/fix_containers_compile_time_test
9377346 Merge pull request #5616 from dalg24/is_view_v
fea40ca Make TestStdAlgorithmsCompileOnly runnable
8afd450 Merge pull request #5614 from JBludau/deprecate_uvm_available
e107e64 More compile-time test for view
e4481f5 Add compile-time test for is_view[_v]
ad95098 Add is_view_v helper variable template
ed10458 removed CudaUVMSpace::available() guard from test
d5e1a2f deprecate CudaUVMSpace::available()
ad2af17 HIP as a CMake language (#5611)
3d1e8f9 Merge pull request #5612 from brian-kelley/FixGenMakefileStandard
3a6543c Fix help text in generate_makefile
3c842fb fix incorrect offset in cuda parallel scan for < 4 byte types (#5555)
21df075 Merge pull request #5588 from dalg24/upgrade_nvhpc_22_9
26e9ba2 remove RDC flags when using CMake language CUDA (#5564)
dc76918 Merge pull request #5604 from masterleinad/fix_kokkos_deprecated
5ca9274 Revert disabling KOKKOS_DEPRECATED for OpenACC
d7811b5 Fix position of KOKKOS_DEPRECATED
e2659a1 Upgrade to NVHPC 22.9 and re-enable OpenMP in CI build
d7b65db Merge pull request #5412 from PhilMiller/cleanup-volatile
34cdf58 Work around stupidity about [[deprecated]] for OpenACC too
bccba60 Add KOKKOS_DEPRECATED to the stuff that's guarded by deprecation macros
ace8eea Merge pull request #5599 from PhilMiller/ci-naming
53e0407 [ci skip] Rename Jenkins builds to not restate the baseline
76d2e53 Avoid static/extern thread_local
5955797 Don't test volatile complex<T>
228b281 Deprecate volatile qualified members instead of deleting them
351a0d1 Merge pull request #5595 from masterleinad/update_sycl_aot_flags
260f113 SYCL: Update AOT architectures
bbc2f02 Merge pull request #5511 from JBludau/fixup_shared_spaces
bbd39de Update core/unit_test/TestSharedSpace.cpp
bfc9194 Workaround for "missing return statement at end of non-void function" warning in Kokkos_ViewCtor.hpp (#5493)
5695a29 Merge pull request #5529 from masterleinad/always_check_view_rank
051fd73 Merge pull request #5577 from JBludau/clock_tic_power_pc_fixup
7e81ad6 Merge pull request #5590 from ldh4/fix_warning_host_fn_called_from_host_device_fn
d2bb223 Changed to call a host device fn Kokkos::abort instead of a host only fn Kokkos::throw_runtime_exception in a host device function.
8bc6922 Merge pull request #5586 from kliegeois/fix_LIFO_include
1096634 Min max pragma push (#5541)
a83f5a6 Fix Kokkos_LIFO include
ae718ba Merge pull request #5580 from dalg24/nvcc-extended-lambda
90100d9 Let Kokkos_ENABLE_CUDA_LAMBDA be ON by default
523066d Let nvcc_wrapper accept -[-]extended-lambda (w/o expt- prefix) flag
363df92 dropped 32 bit powerpc support
45a5835 deleted () in cmake to stick to the usual practice in kokkos
cc6504d Merge pull request #5576 from dalg24/more_local_mpi_rank_detection
9a8ea2f Suppress warning: function declared with "noreturn" does return for CUDA and Debug mode (#5441)
6339975 Merge pull request #5579 from dalg24/hip-extended-lambda
9ce3b5f Merge pull request #5578 from ldh4/fix_signed_int_overflow_team_md_range
b552886 Do not add --expt-extended-lambda compile flag with HIP and GNU generated makefiles
246efc9 Convert int to int64_t to avoid signed int overflow warnings from clang UBsan
fba2cc4 Merge pull request #5575 from dalg24/release_37_changelog
dbc1d50 Add support to detect the local MPI rank with PMI (Process Management Interface)
c926bf5 split powerPC clocktic into 32 and 64 bit version
1faed5a Fixup release 3.7 changelog Kokkos::common_view_alloc_prop not deprecated
7695978 Add 3.7.00 changelog
449d925 Cherry-pick missing 3.6.01 changelog lost in translation
0fb8b8a Merge pull request #5568 from bartlettroscoe/tril-11152-remove-undefined-tpl-deps
3b24f4e Merge pull request #5567 from dalg24/fixup_ub_logical_spaces_unit_test
453a812 Kokkos: Remove listing of undefined TPL deps (trilinos/Trilinos#11152)
1762225 Fix UB in logical spaces unit test
9764d03 Merge pull request #5503 from dalg24/desul_atomics_more_macros
91d0a4f Merge pull request #5549 from dalg24/desul_atomics_fixup_msvc
481cb8d Prefer C++ alignas specifier over GCC-specific language extension
5fbe4de Merge pull request #5546 from Rombur/trilinos_amdclang
1e0ce64 Merge pull request #5539 from Rombur/amdclang
7176188 Fix repeated team_reduce without barrier (#5540)
68d2f26 MSVC atomics template atomic_[compare_]exchange on MemoryOrder
f9d3ef0 Fixup missing host_ prefix for MSVC lock-based atomic_[compare_]exchange
8fcf9ae Refactor desul atomics generic host and device fetch op with macros
af65233 Merge pull request #5490 from dalg24/cuda_with_nvc++_ci_build
a872b76 Merge pull request #5542 from dalg24/doc_housekeeping
f1b5067 Do not error when using amdclang with Trilinos
04de99c Merge pull request #5527 from krasznaa/CUDAInitFix-develop-20221006
2384d9a Merge pull request #5184 from junghans/patch-5
92f8ae2 CI: test flang
05811e9 s/FIXME please/FIXME wrong result/
8239e56 Merge pull request #5543 from dalg24/rm_travis_yml_file
42139b5 [CI skip] Retire unused .travis.yml file
e1d0a8d [CI skip] Remove source helper file to query cuda arch
181bece Add doc/README with a word of warning redirecting to the online documentation on kokkos.github.io
1bc4913 Remove wildly outdated document about develop builds
05c9f1a Remove programming guide markdown file that points to outdated wiki page
1a88e22 Fixup not running sort unit test with NVHPC
5cafe04 Fix linking when using amdclang
833da38 Merge pull request #5538 from crtrott/support-hopper
27393a0 Fix up cases where the arch macro is used for HOPPER
191b238 Trilinos: Pass OpenMP flags instead of linking with the OpenMP target (#5532)
ce3014c Add hopper to compute_capability detector
519cef6 Merge pull request #5536 from crtrott/fix-mixed-arch-workgrpah
30c8db1 Merge pull request #5537 from crtrott/fix-5501
18cefac Add config output and shared mem config for Hopper
1e1cfe3 Merge pull request #5535 from crtrott/fix-5534
3fe9540 Add Hopper support
fcf8a3c Add test to check for mismatch static dimension and mismatch layout
6989f38 CUDA: fixes mixed-arch-use of WorkGraphPolicy
18ddf7d Drop -Werror in NVHPC build for now
6a2fe1e Disable join unit test for Cuda too
bb4755b Skipping one more Cuda test to get NVHPC CI build to pass (could not reproduce)
73a57e3 Disable serial unit test failing with NVHPC CI CUDA build
2d73754 Disable core unit tests to get NVHPC CI build to pass
02ef991 Disable containers unit tests to get NVHPC CI build to pass
fb8179f Disable algorithm unit tests to get NVHPC CI build to pass
5425eb2 Try with -Werror and disabling bogus diagnostics
f7ee64d Temporarily disable OpenMP in the NVHPC CI build
e784787 Update CI build to use NVC++ to compile CUDA as well
f7bfcc5 Simplify View create_mirror returning HostMirror
8524dda Only link against libatomic in gnu-make OpenMPTarget build
be72920 Fix unnecessary check for runtime-rank 1 for Left/Right assignment
d5fcc32 Merge pull request #5528 from Rombur/trilinos_fix
bd9adc6 Fix 5315: use Kokkos::atomic_load to Correct Race Condition Giving Rise to Seg Fault'ing Error in OpenMP tests (#5530)
b684f57 Merge pull request #5531 from JBludau/fix_unnamed_functor_instance
88ce0aa fixup for intel19 (most-vexing parse)
4bbe86c Always check rank in View construction
f88d8ac Simplify copying the layout
1a63570 Export the flags in KOKKOS_AMDGPU_OPTIONS when using Trilinos
0ef177c Fixed the logic for building Kokkos for an older architecture.
f70b121 Merge pull request #5525 from dalg24/mpich_local_rank
292ba24 Add support for detecting MPI local rank with MPICH
5b21511 Team MD range policies impl (#5238)
11385fe Merge pull request #5491 from etphipp/fix_as_view_of_rank_n_for_sacado
d5575d4 Merge pull request #5343 from cz4rs/port-sample-perf-test
32921a7 Fix memory spaces in create_mirror_view overloads using view_alloc (#5488)
f73a8c9 fixing the preproc define and unified some naming in the tests
aa98af2 `SharedHostPinnedSpace` alias in fwd declaration (#5405)
bfe8f8c Merge pull request #5451 from seyonglee/openacc_parallel_team
056e812 Merge pull request #5510 from dalg24/fixup_tools_tests
25e2302 Merge pull request #5520 from dalg24/rm_unused_header_cuda_alloc
bc89f48 Remove (unused) header <Cuda/Kokkos_Cuda_Alloc.hpp>
2ce06b9 Merge pull request #5509 from masterleinad/update_cuda_11_0_dockerhub
d9e2a51 Replace nvidia/cuda:11.0-devel->nvidia/cuda:11.0.3-devel-ubuntu18.04
8bc1adc Fixup tools callbacks signature (pointers to const)
7a82a2a Fixup prefer KOKKOS_PROFILE_LIBRARY -> KOKKOS_TOOLS_LIBS env var in tests to avoid warnings
503c78e Merge pull request #5506 from ndellingwood/update-nightly-script
746e600 [ci skip] test_all_sandia: updates and cleanup
497b3f9 Replace 0 with nullptr
99e2013 Merge pull request #5500 from e10harvey/a64fx
db563cc Merge pull request #5498 from dalg24/drop_unused_host_device_atomic_compare_exchange
6385f26 core/src/impl: Fix warning as error
98699d1 cmake: define KOKKOS_ARCH_A64FX
7110f8c Also restrict other as_view_of_rank_n overloads to void specialize type
267f5eb Test Legion use case (#5206)
e0331a8 OpenMPTarget: Update CI to use llvm/15.0.0 and enable corresponding unit tests (#5496)
67f521a Drop unused desul generic fallback atomic_compare_exchange_{strong,weak} implementation
7a8ebaf Merge pull request #5497 from dalg24/desul_drop_unused_serial_atomics
1179da6 Drop unused serial atomics
bf01d32 Fence after View creation
a92bdae Report units correctly
05cef20 Try removing volatile from AtomicDataElement (#5455)
f97008a Merge pull request #5495 from ndellingwood/update-testscripts
f9f528c test_all_sandia: nightly testing script updates
4a5a3e7 add cmake flag to enable mdspan and include mdspan as a tpl (#4973)
c7ec8fb Move __pgi_vectoridx() call, which is used to set m_team_rank variable, into the OpenACCTeamMember constructor.
ac33c8e ClangFormat
d32e88b Update core/src/OpenACC/Kokkos_OpenACC_ParallelFor_Team.hpp
f49f1ed Adding OpenACC support for Makefiles (Makefile.kokkos and Makefile.targets) (#5437)
035a875 Remove redundant implementation file
9df3555 Use benchmark's native rate support
aaf14a8 OpenMPTarget: adding implementation to set device id. (#5492)
1a15ff5 Merge pull request #5289 from JBludau/SharedMemorySpace
9a42b6c Add FIXME_OPENACC the collapsing transformation macro in Kokkos_OpenACC.hpp Delete unused variable/function in Kokkos_OpenACC_Team.hpp
14a431f Fix formatting
83538b2 Allow as_view_of_rank_n() to be overloaded for "special" scalar types
9f7bc93 Delete struct always_false : std::false_type {}; in Kokkos_Utilities.hpp
5de7bb5 Apply suggestions from code review
b152efa Merge pull request #5431 from dalg24/nvcc-support-with-desul
1bdbe63 Move KOKKOS_ENABLE_OPENACC_COLLAPSE_HIERARCHICAL_CONSTRUCTS macro into Kokkos_OpenACC.hpp
4dd3eaf Remove unused variables as suggested by code review. Move KOKKOS_ENABLE_OPENACC_COLLAPSE_HIERARCHICAL_CONSTRUCTS macro into an OpenACC header file.
26516e0 Merge pull request #5452 from cwpearson/fix/for-single-volatile
7484954 Fixup `<desul/atomics/Lock_Array_{Cuda -> CUDA}.hpp>`
8abec24 Cleanup on Kokkos side following the desul atomics refactor
eb67f51 Fixup SYCL bug in desul atomics refactor
77dd69e Refactor desul atomics to support compiling CUDA with nvc++
051d049 Merge pull request #5486 from masterleinad/fix_cmake_threads
882655c Update core/unit_test/TestTeamBasic.hpp
93f69db Merge pull request #5485 from dalg24/nvc++_wo_cuda
9d25e52 Merge pull request #5478 from Rombur/block_size_deduction
627d018 Remove Kokkos option, KOKKOS_ENABLE_OPENACC_COLLAPSE_HIERARCHICAL_CONSTRUCTS, and instead pass it to the NVHPC compiler directly.
5af13cf Refactor code in Kokkos_OpenACC_ParallelFor_Team.hpp so that a single '#ifdef KOKKOS_ENABLE_OPENACC_COLLAPSE_HIERARCHICAL_CONSTRUCTS` statement is used.
3fad6d2 Fix configuring with Threads support when rerunning CMake
60c51b6 Merge pull request #3 from dalg24/block_size_deduction
6edc1d5 Deduce pattern tag from the closure type
523107a Reorder order of template parameters
584af83 Make sure we don't add '-cuda' to the link line with NVC++
6173186 Merge pull request #5484 from ndellingwood/disable-hypot-ld-power9
f049bff Add missing HIP cpp files in Makefile.targets (#5481)
2024db7 Merge pull request #5479 from ndellingwood/cherrypick-5318
5fc53cd Disable kk3_hypot in Power9 testing
153a39e Merge pull request #5450 from dalg24/move_reduction_identity
f08afd4 Don't require user-defined volatile overloads in Kokkos::single
19fb19c Merge pull request #5318 from ibaned/avx-512-gcc-lt-8
9dfa1ec Use if constexpr in Kokkos_HIP_KernelLaunch.hpp
f1e9659 Simplify computation of team size
504169a Make functions constexpr in Kokkos_HIP_BlockSize_Deduction
2dee5cb Update core/perf_test/test_sharedSpace.cpp
9457b67 Update core/perf_test/test_sharedSpace.cpp
137c7d6 Move acquisition of memory scratch space to its own function (#5468)
1f048cf Use inline static member variables for CudaInternal (#5473)
96a9c76 Include <Kokkos_ReductionIdentity.hpp> from <Kokkos_NumericTraits.hpp> for backward compatibility
3343b17 HIP: Initialize device-related variables only by the singleton (#5444)
421ecb4 Refactor conditional codes as suggested by code review.
4fa3d2a Update core/src/OpenACC/Kokkos_OpenACC_Team.hpp
aace644 Use view's size to calculate statistics
296fcd9 Fix compiler error in SYCL parallel_scan (#5469)
48227a6 Add comments in `report_results()`
517f3ff Avoid unnecessary fence
80ca393 Extract ViewCopy_Raw benchmarks into separate file
012a789 dropped clang analyzer annotation for ShareSpace
9dff8cc Merge pull request #5466 from Rombur/test_work_graph
3487571 Remove HIP-only parameter for a test
1db9848 Remove obsolete comments
574fd77 Extract figure of merit helper function
bf84c82 Let benchmark determine number of repetitions
1da7253 Remove benchmarks from Makefile
659da41 Port DeepCopy rank 1, 2 & 3 tests
2a961bc Port DeepCopy rank 4 & 5 tests
e10bf56 Remove redundant DeepCopy Raw tests
4c576fc Port DeepCopy rank 6 tests
00820cf Port DeepCopy rank 7 tests
433e7b0 Mark selected counter as Figure of Merit
2fcfa2c Use separate ViewCopy header for benchmarks to avoid gtest dependency
b414149 Port remaining ViewCopy rank 8 tests
0387994 Move helper methods to a common header
2ec000e Use the same filename for ported test
0f4b3a6 Remove obsolete code
0d82de4 Port single `ViewCopy` test to use google benchmark lib
673a0ef Merge pull request #4875 from masterleinad/sycl_launch_bounds_wgroup_size
2dcb24a Merge pull request #5457 from Rombur/print_config
4ae0b50 Merge pull request #5378 from thearusable/5348-add-kokkos-config-to-metadata
96af187 Merge pull request #5438 from masterleinad/remove_kokkos_abort_message_buffer_size
c02a932 Merge pull request #5449 from masterleinad/print_configuration_add_architectures
55e428e Merge pull request #5462 from masterleinad/fix_restrict_sycl_cuda
a3ec6ee #5438: Change namespace to KokkosBenchmark
38ce3ff Don't enable displaying architectures based on Kokkos_ENABLE_UNSUPPORTED_ARCHS
10abb82 Fix forcing Kokkos_ENABLE_UNSUPPORTED_ARCHS with SYCL+NVidia GPUs
7f1ee99 Merge pull request #5442 from masterleinad/cuda_set_device_only_for_singleton
3372a27 #5438: Improve removal of unwanted characters from the context data
c53a6d0 #5438: Update core/perf_test/Benchmark_Context.hpp
2d389e4 #5438: Update code style with clang-format
bbcceb0 #5438: Add kokkos configuration to benchmark metadata
e5a649c 1) Update Kokkos::Experimental::OpenACC::print_configuration() 2) Add FIXME_OPENACC comments for team_size_max and team_size_recommended APIs in Kokkos_OpenACC_Team.hpp
8342929 Merge pull request #5448 from Rombur/launch_local
e2ab8a5 1) Created Kokkos::Impl::always_false<T> in impl/Kokkos_Utilities.hpp file, and used it to issue the compile-time error if unimplemented functions are instantiated. 2) Deleted unused header files.
667d47b Merge pull request #5454 from ldh4/fix_hip_desc_cmake
a845d62 Apply clang format
6dc2317 Update as suggested by code review - Remove unnecessary inline keyword - Change KOKKOS_INLINE_FUNCTION to KOKKOS_FUNCTION in a class - Rename macros.
3a613c8 Apply suggestions from code review
7d23ecc Print the architecture for AMD GPU
c4257c0 Fixed cmake configure still printing HIP backend as Experimental::HIP
5a440a4 Move singleton construction close to initialization
e114205 Fix comments from review
3feb4de comment why we are using different memory for warmup
4200e53 Initial OpenACC backend implementation to support parallel-for constructs with Team policy. - Add COLLAPSE_HIERARCHICAL_CONSTRUCTS option to avoid issues on existing OpenACC compilers not supporting lambdas with parallel loops. - Not implemented features: scratch memory support, team_barrier(), team_broadcast(), team_reduce(), and team_scan().
8b5a155 Remove ok_id
8544fb2 Move reduction_identity into its own header file
406d8fd Add architectures to print_configuration
e1a7697 Use default class member initialization for lists
f757503 Introduce team shared memory pool
c5f7108 HIP: Pass functor by value when using LocalMemory
4477a25 Merge pull request #5445 from dalg24/openacc_shared_allocation_record_header
8bf5526 switched from lambda to named functor to get rid of ENABLE_CUDA_LAMBDA
51cc823 Drop unnecessary SharedAllocationRecord<OpenACCSpace, void>::alocate member function
3c7fd11 Move OpenACC SharedAllocationRecord implementation to separate header and source file
c3b5f5a switched from double to uint64_t in for_each
f1b01fb okay, lets redefine NOMINMAX ... but something is really fucked up
8697f2b try if minimal windows header solves the issue
744617a love windows includes
7808e3b remove include windows.h as it is done in Kokkos_core
00aa7cd moved include order to please Bill Gates
f33e820 Cuda: Initialize device-related variables only by the singleton
4b614a9 changed tests to use has_shared_space constexpr variable
74d6881 try if () around _WIN32 is getting windows to compile
b17e264 change from constexp func to constexpr variable and switching to snake case
07c32f5 SYCL RangePolicy: manually specify workgroup size through chunk size
d0f710d Merge pull request #5434 from dalg24/promote_math_constants
91c0b67 Remove unused KOKKOS_ABORT_MESSAGE_BUFFER_SIZE
d3da941 Merge pull request #5430 from masterleinad/clean_kokkos_compiler_cuda
4ceebf4 Merge pull request #5435 from dalg24/hip_do_not_warn_about_xnack_when_no_support_for_page_migration
20c2142 Merge pull request #5433 from dalg24/intel_suppress_missing_return_statement_warning
bcfbbb3 Unit test skips if host and device execution space are the same, as there is no migration
fe6c769 Fixup quad support for math functions specialized in the right namespace
a58c622 Merge pull request #5428 from ibaned/fma-function
eb3c89f Avoid spamming users with warnings about XNACK after detecting that page migration is not supported
b2676e8 Update tests following the promotion of the math constants
7f3c6c4 Cleanup mathematical special functions
081ff05 Promote mathematical constants to Kokkos::{Experimental -> numbers} namespace
389d70f Add math constant variables without the _v suffix
53d88bf Adjusted threshold to 1.5 in an attempt to make ci pass on cpu (parallel workloads)
c9ad8aa Suppress bogus missing return statement warning with Intel Compiler Classic
8ba3174 Add quad-precision fma overload
f1a8cfc Merge pull request #5429 from ldh4/fix_missing_namespace
7a8e7ed Remove KOKKOS_COMPILER_CUDA_VERSION
31914bd Clean up for NVCC<11
0f807c8 Merge pull request #5411 from masterleinad/disable_ice_openmptarget_tests
34ac119 Merge pull request #5424 from dalg24/kokkos_version_macros
50e4e33 Merge pull request #5427 from crtrott/issue-5426
5a4d1b3 added perf-test with extended information about the migrations
8194fa1 added unit test for SharedSpace to defaultDevice test
94b994a added SharedSpace alias and utility functions
b730a74 add test for Kokkos::fma
a4f3de3 fix bug in ternary function test macro
f20bcd8 move fma to where its comment was located
6af2343 Fixed incorrect namespace
59f0a7a adding Kokkos::fma(x, y, z)
11ab46f Update core/src/impl/Kokkos_ViewCtor.hpp
cc24f21 Fix spurious warning in NVCC < 11.5 about missing return
9b127a4 Define KOKKOS_COMPILER_NVCC with version number
3be4cef Merge pull request #5425 from dalg24/abort_illegal_init_or_finalize
a67c16a Per review KOKKOS_VERSION_COMPARE -> KOKKOS_VERSION_{LESS,GREATER,EQUAL}
cd2d5e3 Dispatch Kokkos::sort(Kokkos::View) to CUDA Thrust (#5183)
c2e7ea2 Abort when calling initialize() more than once or calling finalize before init or after finalize
c82eb33 Silence warnings about valueView being unused (#5421)
4e2c540 Enable Android x86_64 support (#5423)
6f5de58 Draft version comparison macro
fccc196 Defined KOKKOS_VERSION_{MAJOR,MINOR,PATCH} macros
9645d46 Merge pull request #5391 from masterleinad/dont_rely_on_default_stream
30c17a8 Dispatch Kokkos::sort(Kokkos::View) to std::sort (#5372)
338b458 Remove dummy arguments for ViewCtorProp (#5314)
bd44b5c Merge pull request #5418 from Rombur/rocm_52
c4c9bfc Remove XL compiler support (#5349)
397a6bc Don't rely on synchronization behavior of default stream in CUDA and HIP
802b7e6 Refactor HIP backend (#5410)
1f678ec Merge pull request #5417 from masterleinad/remove_deprecated_kokkos_task_policy
0fb2f48 Remove code only used by ROCm < 5.0
068973a Merge pull request #5374 from masterleinad/no_inline_default_delete
ef85afa Merge pull request #5416 from masterleinad/require_rocm_5_2_0
b35228a Remove deprecated Kokkos_TaskPolicy.hpp
25bec65 Merge pull request #5415 from brian-kelley/SplitRandomSortTest
c844e90 Extend FIXME_OPENMPTARGET comments
3220c59 Update algorithms/unit_tests/CMakeLists.txt
fcfa27b Require ROCm 5.2.0
578e6ae Split Random/Sort/NestedSort test into multiple cpps
e50a7be Merge pull request #5317 from brian-kelley/Do645
76b36ac Define KOKKOS_DEFAULTED_FUNCTION and KOKKOS_INLINE_FUNCTION_DELETED empty
34bda9e Merge pull request #5389 from PhilMiller/5385-develop-sort-fence
108d6e8 Merge pull request #5398 from dalg24/execution_spaces_regular
45522bf Disable OpenMPTarget unit tests that cause ICEs with icpx
b0e4d55 Merge pull request #5409 from PhilMiller/5194-deprecate-volatile
6c54349 parallel_scan with View as result type (#5146)
f08c241 Format
4efb43f Add unit test that volatile-qualified join() is called when we expect it to be
d7baa9f #5194: Fail compilation if a volatile-qualified join() would be called
4f8e0ea Introduce dependent_false_v<T> so obscure code and clarifying comments don't need to be repeated
4689208 #5385: Add fences to all non-exec-space sorting routines other than BinSort constructors
7522044 Replace CL/sycl.hpp (#5387)
72a4a6c Merge pull request #5404 from masterleinad/dont_use_filesystem
20e072b Merge pull request #5407 from nmm0/5406-remove-Kokkos_PhysicalLayout
c50399a OpenACC schedule type in parallel_for and parallel_reduce (RangePolicy) (#5340)
f3265d5 #5406: remove Kokkos_PhysicalLayout.hpp since it is unused
2daa5e4 Merge pull request #5388 from kokkos/5382-develop-deprecate-sort-bool
8bd5bfb Don't use <filesystem> functionality
1d77ae9 Merge pull request #5402 from masterleinad/disable_device_and_threads_test_for_trilinos
b38f68a Merge pull request #5395 from Rombur/m_regsPerSM
fe2cd1b Disable KokkosCore_UnitTest_DeviceAndThreads for Trilinos
43eba1c Remove m_regsPerSM in the CUDA backend
c39b4a3 Provide equality comparison operators for HPX as well
bcbc086 Check that execution spaces meet the requirements of regular types
2cb40e1 Take advantage of C++17 when checking that execution spaces meet requirements
bb94f0e Define comparison operators == and != for all execution spaces
d1f61dc Add is_device_v helper variable template
5371c05 Remove unused variable: m_regsPerSM
aef15d2 Merge pull request #5383 from Rombur/hip_experimental
2a8d03a Fix unit test that was calling deleted code path
39d0d58 Move HIP out of experimental
5dd1ceb Merge pull request #5377 from masterleinad/fix_init_array_reduction
3a74a23 Merge pull request #5376 from dalg24/openacc_parallel_mdrange
01adeee Fix test to match code change
f909cb5 Remove code used only in the ex-deprecated deleted path
eda9c4b Remove 'previously' deprecated sort() overloads
0971348 #5382: Deprecate overloads of Kokkos::sort() taking parameter 'bool always_use_kokkos_sort'
477df17 Minor update on the FIXME_OPENACC comment.
a9a8699 Fix initialization for array reductions
ddddeae Enable more tests and fixup unimplemented comments for OpenACC
2694523 Implement OpenACC MDRangePolicy parallel_for
243a6c3 Merge pull request #5328 from dalg24/test_device_and_num_threads_after_initialization
045a0d3 Fixup typo async_arg[c] in ParallelReduce OpenACC
2d6cbad Tracking performance testing: Integrate google benchmark (#5177)
2b2bb3d Don't use 'inline' with KOKKOS_DEFAULTED_FUNCTION or KOKKOS_INLINE_FUNCTION_DELETED
38ac8f1 Merge pull request #5371 from masterleinad/cleanup_cuda_10.0
7d9b708 Clean up CUDA <= 10.0 checks
0b9dee7 Merge pull request #5370 from dalg24/type_identity
889a530 Rename Impl::{identity -> type_identity} and import it from std:: when C++20 is available
81715f8 Merge pull request #5368 from dalg24/prefer_std_void_t
d85fb44 Fixup include <Kokkos_Macros.hpp> to pass header self-containment test
22ee620 Merge pull request #5365 from dalg24/do_not_drop_label_when_reallocating_view
0e128e7 Enable automatic detection of arch when enabling 'HIP' with 'hipcc' (#5327)
746a110 Prefer std::void_t now that C++17 is available
8ad66d0 Merge pull request #5 from masterleinad/do_not_drop_label_when_reallocating_view
69085ef Add dummy source file to SIMD to allow deduction of linker language. (#5354)
f48e95e Check labels for container types for resize and realloc
ac67f75 Merge pull request #5367 from dalg24/refactor_initialization_settings_class_use_std_optional
c260025 Enable the default OpenACC execution space (#5360)
1a9d992 Merge pull request #5345 from brian-kelley/FixAllocRecordPrintouts
dcdb511 Refactor InitializationSettings class to leverage std::optional
dd63284 Per review do not bother with temporary variable to store the label in realloc
14a542c Check labels in realloc unit test
8698ab7 Bugfix preserve view label when calling realloc()
4ebaaf5 SharedAlloc print_records: don't check m_alloc_ptr
0cc7917 Move sort test includes, revert whitespace changes
35faf09 Move nested sort tests into a separate file
7994904 Nested sort test: add error messages
67bb790 NestedSort test cleanup
9d67e0b Pass tag by value
af4bc75 Update minimum compiler versions + clean up (#5323)
4cd0fc4 Merge pull request #5357 from masterleinad/bump_kokkos_version_develop
4d5b4cf Merge pull request #5356 from masterleinad/fix_pragma_ivdep_openmp
d7ba238 Bump Kokkos version on develop
82834e3 Merge pull request #5227 from masterleinad/sycl_deduce_wgroup_size_reduce
421870d Fix pragma ivdep in Kokkos_OpenMP_Parallel.hpp
867a4ad Restrict the number of tests for num_threads
b82c258 Add overloads of `hypot` math function that take 3 arguments (#5341)
b77fb8f Drop mutable
9474a89 Merge pull request #5350 from kokkos/revert-5338-fix-linker-language-simd
466ba83 Revert "Fix missing linker language for SIMD (#5338)"
188fac1 Merge pull request #4105 from masterleinad/openmp_detection
cb2eaff Nested bitonic sort: small changes
b51b5e7 Merge pull request #5307 from masterleinad/avoid_default_ctor_withoutinitializing
38873b0 Fix null deref in SharedAllocationRecord::print_records
10bcbb8 Fix missing linker language for SIMD (#5338)
7d55fa3 Merge pull request #5344 from masterleinad/fix_unordered_map
6688894 Fix UnorderedMapRehash::operator()
93dc5a4 Fixup capture_output parameter for subprocess.run was added in version 3.7
7096260 Temporarily disable test for OpenMPTarget since it does not select the right device
d2d9275 removed DEBIAN_FRONTEND=noninteractive
21807ab pep8ed the python file
6d4b49b added header
4f3ebbe Merge pull request #5331 from dalg24/fixup_cxx17
a0e3579 added apt repo
63a5d35 Merge pull request #5336 from bartlettroscoe/tril-10810-fix-cmake-install
373e68e Install newer GCC in ubuntu18.04 based Docker images
c3e1a21 Add missing <thread> header include
faad639 Add OpenMP Target support in the test
65364d2 Fixup USE_SOURCE_PERMISSIONS is only supported since CMake 3.20
ae3e7fd Add tests for disable_warnings and tune_internals as well
f213a6a Add Python test for device_id and num_threads after initialization
5d84992 Add executable that writes num_threads or device_id on demand after initialization
8fe13d1 Merge pull request #5325 from PhilMiller/5312-revert-3580
ee476ec Simplify KOKKOS_{EXT,SUB,INT}_LIBRARIES logic into a single KOKKOS_COMPONENT_LIBRARIES list
0638341 Fixup unconditionally enable SIMD now that C++17 is the minimum cxx standard required
767cbc9 Merge pull request #5330 from dalg24/openmp_print_warning_to_standard_error_stream
085c6fc Print OpenMP warnings to the standard error stream
daf933a OpenACC `parallel_for` and `parallel_reduce` (#5322)
745cfd8 Fix Sort/NestedSort includes
6ea1250 Move team/thread sort to Experimental::, new header
294b7f6 Use KOKKOS_FUNCTION in place of INLINE, FORCEINLINE
839df99 Merge pull request #5326 from dalg24/drop_reciprocal_overflow_thresold_trait
6ae55a9 sort_thread tests: Fix OOB access on idle threads
8f73a3c Drop reciprocal_overflow_threshold trait
441f39f Merge pull request #5297 from masterleinad/remove_deprecated_code_3
75346c5 #5312: Revert #3580 and try a different workaround
a86eb7f Refactor nested-parallelism sort to need 1 impl only
91531a5 Small updates to nested sort
e645320 Reintroduce Kokkos_ENABLE_DEPRECATED_CODE_3
c481603 Merge pull request #5321 from dalg24/fixup_cuda_arch_auto_detection_when_included_in_other_cmake_project
8e274f6 Merge remote-tracking branch 'upstream/develop' into openmp_detection
c14b43a Move HWLOC test
a82b9ab reindent KokkosCore.hpp
5290902 Keep math functions in Experimental namespace for now
e068698 Keep InitArguments for now
5df1d23 Remove warn_if_deprecated
ef1e253 Use std::bool_constant
97bdee7 Fix HWLOC test
eb4c146 KOKKOS_ENABLE_DEPRECATED_CODE_3->KOKKOS_ENABLE_DEPRECATED_CODE_4
bf2ba6c Remove all deprecated code, except for partition_master
d10baa0 Remove Pthread backend
a6042cb Including private headers is an error
8268c40 Remove KOKKOS_IMPL_CUDA_CLANG_WORKAROUND comment
db45fb4 Guard destroy functor instantiated for GPU backends
1f946f9 Team- and thread-level sort, sort_by_key
b0f3ef7 Merge pull request #5295 from masterleinad/cleanup_cxx_17
7415171 Merge pull request #5316 from masterleinad/arch_native_msvc
f7859f1 Initial support for multiple OpenACC execution space instances: (#5296)
5e92b46 Fixup CUDA arch auto detection in Trilinos
db66946 Don't test using a different compiler with OpenMP support
d845b57 Merge pull request #5273 from nliber/is_concept_v
36251dd Update error message for GCC < 7
c45e698 Merge remote-tracking branch 'upstream/develop' into openmp_detection
5c5c9ea inline constexpr bool is_CONCEPT_v variables added to match C++17 traits
fc175ba Merge pull request #5281 from masterleinad/test_size_containers_create_mirror
c5dffba Error out if ARCH_NATIVE is requested for MSVC
2a9ef31 Apply suggestions from code review
9da3fc0 Restore KOKKOS_ATTRIBUTE_NODISCARD
efbac46 Update comments in cmake/kokkos_corner.cmake
7515804 Always instantiate the destroy functor
f44063b Fix labels in View initialization
e615701 Avoid instantiating default constructor of value type when WithoutInitializing is given
8acdaed KOKKOS_CLASS_LAMBDA is always defined
f855c1d static constexpr variables and *_v
57c3434 Remove another FIXME in cmake/kokkos_corner_cases.cmake
e8280d0 Miscellaneous clean-ups
af6c8db STATIC_ASSERT -> static_assert
fcac9c5 Replace attributes
be7dd5f Merge pull request #5310 from ndellingwood/update-testing-cpp17
1a0c77f [ci skip] set cpp standard to 17
6b89c47 Merge pull request #5308 from masterleinad/layouts_not_defuault_constructible
d052ce8 Merge pull request #5277 from masterleinad/require_cxx_17
7cc17c6 Don't assume layouts are default-constructible
95d7082 Merge pull request #5303 from masterleinad/improve_offset_view
882076b Don't change nvcc_wrapper
89fa162 Merge pull request #5302 from dalg24/cherry-pick-fix-intel-ice
8efa799 Use copyright header in Kokkos_OffsetView.hpp
1b93a44 KOKKOS_INLINE_FUNCTION -> KOKKOS_FUNCTION in class
ff8ed0d Implement OffsetView constructor taking pairs and ViewCtorProp
fefefce Use if constexpr in cplusplus17.cpp
1d36652 Work around intel compiler bug
7f3a4d3 Merge pull request #5300 from masterleinad/clean_internal_scratch_bitset
f590343 Avoid allocating memory for UniqueToken
7b0fedf Add FIXME_CXX17
9abc0b5 More nvcc_wrapper clean up
dff1f45 Use OMP_NESTED = 'true' for gcc-8.4.0 in CI
35b5728 Update nvcc_wrapper
2b820b1 Update build system
277230b Update CI
4c5be02 Minimal changes to source
b2371de Allow using C++23 (#5283)
28a8631 Merge pull request #5246 from masterleinad/sycl_store_device_id
fd17f28 Merge pull request #5294 from masterleinad/fix_bhalf_t_prod_test
1c7f66b Support finding libquadmath with native compiler support (#5286)
955f2ec Merge pull request #5293 from masterleinad/kokkos_cxx_standard_error
966fa3c Merge pull request #5292 from dalg24/forward_scope_guard_arguments_to_initialize
afb977b Only test product for bhalf_t for N<=5
515d5b7 Turn setting Kokkos_CXX_STANDARD into an error
39c677e Merge pull request #5291 from dalg24/fixup_pin_openacc_build
f0b6442 Use perfect forwarding to Kokkos::initialize in ScopeGuard constructor
638b1b9 Fixup OpenACC must run on a machine that can handle large images
c8f1ffb Merge pull request #5288 from masterleinad/force_openacc_ci_volta70
d4c88ce Run OpenACC CI on a Volta70 machine
ab1fdea SYCL: Store device_id passed from initialization
4b9bce5 Merge pull request #5268 from crtrott/fix-warnings
12519a4 Check size in Containers WithoutInitializing test
5f6e3a8 Merge pull request #5272 from dalg24/fixup_flag_removal
b33dd1e Merge pull request #5275 from PhilMiller/5274-dynamicview-mirror
824629a Merge pull request #5271 from ldh4/rank_remove_dim_limit
23fb091 #5274: Fix test to match expectation of fences that need to be there
636727d #5274 DynamicView: Properly resize mirror instances after construction
e8c5024 Add test for kokkos-tools parsing --kokkos-tools-libs flag
8542fdc Test flag removal
c53ca75 Fix flag removal in Tools and warn when flag is not recognized
bc7adbf Do not forget to set last element to nullptr when removing flag
a53ec9b Remove Kokkos::Rank limit to 6 ranks
6b07e39 Merge pull request #5267 from dalg24/raised_by_kokkos_initialize
54b3ec3 Cleanup "Raised by Kokkos::initialize" error and warning messages
3297b04 Limit workgroup size to 512 when not using an Intel GPU
040419a Link with OpenMP
3c64b3a Don't use link flags
426d982 Use LIB_NAMES instead
9bb4005 Try using FindOpenMP instead of figuring out flags manually
e90fca1 Deduce workgroup size for SYCL parallel_reduce RangePolicy
git-subtree-dir: tpls/kokkos
git-subtree-split: aa1f48f
1 parent d4ea755 commit b5542d3
File tree
1,139 files changed
+33692
-42155
lines changed
- .github/workflows
- algorithms
- src
- std_algorithms
- impl
- unit_tests
- benchmarks
- atomic
- bytes_and_flops
- gather
- gups
- policy_performance
- stream
- bin
- cmake
- Modules
- compile_tests
- deps
- tpls
- containers
- performance_tests
- src
- impl
- unit_tests
- core
- perf_test
- src
- Cuda
- HIP
- HPX
- OpenACC
- OpenMPTarget
- OpenMP
- SYCL
- Serial
- Threads
- View
- Hooks
- MDSpan
- decl
- fwd
- impl
- setup
- traits
- unit_test
- category_files
- configuration/test-code
- cuda
- default
- headers_self_contained
- hip
- hpx
- incremental
- openmptarget
- openmp
- serial
- standalone
- sycl
- tools
- include
- view
- doc
- hardware_identification
- example
- build_cmake_in_tree
- build_cmake_installed_different_compiler
- build_cmake_installed_kk_as_language
- build_cmake_installed
- make_buildlink
- query_device
- tutorial
- 01_hello_world_lambda
- 01_hello_world
- 02_simple_reduce_lambda
- 02_simple_reduce
- 03_simple_view_lambda
- 03_simple_view
- 04_simple_memoryspaces
- 05_simple_atomics
- 06_simple_mdrangepolicy
- Advanced_Views
- 01_data_layouts
- 02_memory_traits
- 03_subviews
- 04_dualviews
- 05_NVIDIA_UVM
- 07_Overlapping_DeepCopy
- Algorithms/01_random_numbers
- Hierarchical_Parallelism
- 01_thread_teams_lambda
- 01_thread_teams
- 02_nested_parallel_for
- 03_vectorization
- 04_team_scan
- launch_bounds
- virtual_functions
- scripts
- docker
- testing_scripts
- simd
- src
- unit_tests
- tpls
- desul
- include/desul/atomics
- cuda
- openmp
- src
- mdspan/include/experimental
- __p0009_bits
- __p1684_bits