bugprone-narrowing-conversions#9505
Conversation
This comment was marked as off-topic.
This comment was marked as off-topic.
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
cub/cub/detail/warpspeed/allocators/smem_allocator.cuh (1)
53-60:⚠️ Potential issue | 🟠 Major | ⚡ Quick winimportant: Handle
align == 0before round-up math inalloc.
With the current formula on Line 56,aligndefaulting to0makes(align - 1)underflow and can forceptrAllocation32to0, which breaks allocation monotonicity. The cast on Line 59 then hides this by folding a wrapped unsigned delta intoint. Guard zero alignment (e.g., normalize to1or assert/reject) before computingptrAllocation32.libcudacxx/benchmarks/bench/find_if/basic.cu (1)
27-33:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winsuggestion:
mismatch_pointcan beelementswhenMismatchAtis1.0, butstate.add_global_memory_reads<T>(mismatch_point + 1)then recordselements + 1reads. Cap the reported read count toelementsto avoid skewed benchmark metadata for that axis value.thrust/benchmarks/bench/none_of/basic.cu (1)
21-27:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winsuggestion: for
MismatchAt == 1.0,mismatch_pointbecomeselements, sostate.add_global_memory_reads<T>(mismatch_point + 1)overreports by one. Please clamp the accounting value toelementsto keep benchmark stats accurate.cub/test/catch2_test_device_reduce_env_api.cu (1)
418-446:⚠️ Potential issue | 🟠 Majorimportant: Line 443 validates device results after launching work on
stream_ref, but this block does not synchronize that stream first. Unlike the other stream tests in this file, this can race outstanding work and make the assertion nondeterministic. Addstream.sync();before theREQUIREchecks.
🧹 Nitpick comments (11)
thrust/thrust/detail/complex/cexpf.h (1)
61-61: ⚡ Quick winimportant: Line 61 still evaluates the exponent expression in
uint32_tbefore casting, so underflow can occur in unsigned space and then convert tointwith implementation-defined semantics. Cast to signed before arithmetic (e.g.,const int exp_bits = static_cast<int>(hx >> 23); *expt = exp_bits - (0x7f + 127) + static_cast<int>(k);).cudax/benchmarks/bench/copy/copy_bench.cu (1)
25-25: important: this benchmark.cuchange should include before/after performance evidence (and SASS-diff check if codegen changed), otherwise we can’t confirm it is regression-free.As per coding guidelines, “Do not commit SASS code changes without running benchmarks to check for performance regressions,” and benchmark changes should follow
docs/cub/benchmarking.rst.Source: Coding guidelines
libcudacxx/benchmarks/bench/all_of/basic.cu (1)
27-27: important: please attach benchmark comparison output for this arithmetic/codegen-touching benchmark change (base vs new), so we can confirm no performance regression.As per coding guidelines, benchmark updates should follow
docs/cub/benchmarking.rst, and.cuSASS-affecting changes require benchmark validation.Source: Coding guidelines
libcudacxx/benchmarks/bench/rotate_copy/basic.cu (1)
24-24: important: this benchmark arithmetic change should include base/new benchmark results to validate no throughput regression inrotate_copy.As per coding guidelines, benchmark changes should follow
docs/cub/benchmarking.rst, and.cuchanges that can alter generated code should be benchmark-validated.Source: Coding guidelines
thrust/benchmarks/bench/any_of/basic.cu (1)
21-21: important: please provide before/afterany_ofbenchmark output for this change to confirm no regression in measured performance.As per coding guidelines, benchmark updates should follow
docs/cub/benchmarking.rst, and.cuchanges with potential codegen impact should be validated with benchmarks.Source: Coding guidelines
c/experimental/stf/test/test_cuda_kernel.cu (1)
20-21: ⚡ Quick winsuggestion: declare
tidandnthreadsasconst intsince they are not reassigned; this matches the repository rule that non-mutating locals should beconst.As per coding guidelines: “All non-const variables must use
const.”Source: Coding guidelines
c/experimental/stf/test/test_host_launch.cu (1)
20-21: ⚡ Quick winsuggestion: make
tidandnthreadsconst int; both are immutable after initialization and should follow the project’s const-local requirement.As per coding guidelines: “All non-const variables must use
const.”Source: Coding guidelines
c/experimental/stf/test/test_logical_data_with_place.cu (1)
26-26: ⚡ Quick winsuggestion: change
int itoconst int ihere; the value is not mutated and the repo style requires const locals when possible.As per coding guidelines: “All non-const variables must use
const.”Source: Coding guidelines
thrust/examples/sum_columns.cu (1)
43-43: ⚡ Quick winsuggestion: add
<cstddef>forstd::ptrdiff_tand make this localconst autosince it is not reassigned. As per coding guidelines, files must include headers for symbols they use (no transitive reliance), and non-const variables should beconst.Source: Coding guidelines
thrust/examples/sum_rows.cu (1)
43-43: ⚡ Quick winsuggestion: include
<cstddef>forstd::ptrdiff_tand changerow_idx_endtoconst autobecause it is never modified. As per coding guidelines, do not rely on transitive includes and useconstfor non-mutated variables.Source: Coding guidelines
c/parallel/test/test_segmented_reduce.cpp (1)
416-417: ⚡ Quick winsuggestion: add
<cstddef>because these lines usestd::ptrdiff_t, and make both iterator localsconst autosince they are read-only. As per coding guidelines, files must include headers for used symbols and non-mutated locals should beconst.Also applies to: 487-488
Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: db866020-3a6c-4d48-a5b0-c6db8daef821
📒 Files selected for processing (167)
.clang-tidyc/experimental/stf/test/test_async_resources_handle.cuc/experimental/stf/test/test_cuda_kernel.cuc/experimental/stf/test/test_host_launch.cuc/experimental/stf/test/test_logical_data_with_place.cuc/experimental/stf/test/test_stackable.cuc/experimental/stf/test/test_stream_ctx_override.cuc/experimental/stf/test/test_task_get_graph.cuc/parallel/test/test_histogram.cppc/parallel/test/test_segmented_reduce.cppc/parallel/test/test_segmented_sort.cppc/parallel/test/test_util.hc2h/include/c2h/cpu_timer.hcub/benchmarks/bench/merge/merge_common.cuhcub/cub/agent/agent_batch_memcpy.cuhcub/cub/agent/agent_histogram.cuhcub/cub/agent/agent_radix_sort_histogram.cuhcub/cub/agent/agent_radix_sort_onesweep.cuhcub/cub/agent/agent_radix_sort_upsweep.cuhcub/cub/agent/agent_reduce_by_key.cuhcub/cub/agent/agent_rle.cuhcub/cub/agent/agent_scan.cuhcub/cub/agent/agent_scan_by_key.cuhcub/cub/agent/agent_select_if.cuhcub/cub/agent/agent_three_way_partition.cuhcub/cub/agent/agent_topk.cuhcub/cub/agent/agent_unique_by_key.cuhcub/cub/agent/single_pass_scan_operators.cuhcub/cub/block/block_radix_rank.cuhcub/cub/block/specializations/block_reduce_warp_reductions.cuhcub/cub/detail/warpspeed/allocators/smem_allocator.cuhcub/cub/detail/warpspeed/look_ahead.cuhcub/cub/detail/warpspeed/squad/squad.cuhcub/cub/detail/warpspeed/sync_handler.cuhcub/cub/device/dispatch/dispatch_scan_by_key.cuhcub/cub/device/dispatch/kernels/kernel_histogram.cuhcub/cub/device/dispatch/kernels/kernel_reduce_deterministic.cuhcub/cub/device/dispatch/kernels/kernel_scan_lookahead.cuhcub/cub/device/dispatch/kernels/kernel_segmented_reduce.cuhcub/cub/util_ptx.cuhcub/cub/warp/specializations/warp_scan_shfl.cuhcub/examples/block/example_block_radix_sort.cucub/examples/block/example_block_reduce.cucub/examples/block/example_block_scan.cucub/examples/device/example_device_partition_flagged.cucub/examples/device/example_device_partition_if.cucub/examples/device/example_device_select_flagged.cucub/examples/device/example_device_select_if.cucub/examples/device/example_device_select_unique.cucub/examples/device/example_device_sort_find_non_trivial_runs.cucub/examples/device/example_device_topk_pairs.cucub/test/catch2_large_array_sort_helper.cuhcub/test/catch2_large_problem_helper.cuhcub/test/catch2_segmented_sort_helper.cuhcub/test/catch2_test_block_load_to_shared.cucub/test/catch2_test_block_scan.cucub/test/catch2_test_block_shuffle.cucub/test/catch2_test_device_reduce_env_api.cucub/test/catch2_test_device_scan.cuhcub/test/catch2_test_device_segmented_radix_sort_pairs.cucub/test/catch2_test_device_segmented_scan_api.cucub/test/catch2_test_device_segmented_scan_noncommutative.cucub/test/catch2_test_device_select_env_api.cucub/test/catch2_test_device_topk_common.cuhcub/test/catch2_test_device_transform.cucub/test/catch2_test_device_transform_api.cucub/test/catch2_test_launch_wrapper.cucub/test/catch2_test_memcpy_vectorized_copy.cucub/test/catch2_test_radix_operations.cucub/test/catch2_test_warp_exchange.cuhcub/test/catch2_test_warp_load.cucub/test/catch2_test_warp_merge_sort.cucub/test/catch2_test_warp_scan.cucub/test/catch2_test_warp_scan_api.cucub/test/catch2_test_warp_scan_partial.cucub/test/catch2_test_warp_scan_partial_helper.cuhcub/test/catch2_test_warp_scan_partial_invalid.cucub/test/catch2_test_warp_store.cucub/test/test_allocator.cucub/test/test_util.hcub/test/warp/catch2_test_warp_reduce_batched.cucub/test/warp/catch2_test_warp_segmented_reduce.cucudax/benchmarks/bench/copy/copy_bench.cucudax/examples/simple_p2p.cucudax/examples/stdexec_stream.cucudax/examples/vector_add.cucudax/test/copy/copy.cucudax/test/copy_bytes/mdspan_d2h_h2d.cucudax/test/copy_bytes/mdspan_d2h_h2d_relaxed.cucudax/test/cuco/hyperloglog/test_hyperloglog.culibcudacxx/benchmarks/bench/adjacent_find/basic.culibcudacxx/benchmarks/bench/all_of/basic.culibcudacxx/benchmarks/bench/any_of/basic.culibcudacxx/benchmarks/bench/equal/basic.culibcudacxx/benchmarks/bench/find/basic.culibcudacxx/benchmarks/bench/find_if/basic.culibcudacxx/benchmarks/bench/find_if_not/basic.culibcudacxx/benchmarks/bench/is_heap/basic.culibcudacxx/benchmarks/bench/is_heap_until/basic.culibcudacxx/benchmarks/bench/is_partitioned/basic.culibcudacxx/benchmarks/bench/is_sorted/basic.culibcudacxx/benchmarks/bench/is_sorted_until/basic.culibcudacxx/benchmarks/bench/mismatch/basic.culibcudacxx/benchmarks/bench/none_of/basic.culibcudacxx/benchmarks/bench/rotate/basic.culibcudacxx/benchmarks/bench/rotate_copy/basic.culibcudacxx/benchmarks/bench/shift_left/basic.culibcudacxx/benchmarks/bench/shift_right/basic.culibcudacxx/include/cuda/__barrier/barrier_block_scope.hlibcudacxx/test/libcudacxx/cuda/ccclrt/hierarchy/hierarchy_smoke.culibcudacxx/test/libcudacxx/cuda/containers/buffer/transform.culibcudacxx/test/support/test_pstl.hnvbench_helper/nvbench_helper/nvbench_helper.cuhtest/cuda_smoke/cuda_runtime_smoke.cuthrust/benchmarks/bench/all_of/basic.cuthrust/benchmarks/bench/any_of/basic.cuthrust/benchmarks/bench/equal/basic.cuthrust/benchmarks/bench/find/basic.cuthrust/benchmarks/bench/find_if/basic.cuthrust/benchmarks/bench/find_if_not/basic.cuthrust/benchmarks/bench/is_partitioned/basic.cuthrust/benchmarks/bench/is_sorted/basic.cuthrust/benchmarks/bench/is_sorted_until/basic.cuthrust/benchmarks/bench/mismatch/basic.cuthrust/benchmarks/bench/none_of/basic.cuthrust/examples/bucket_sort2d.cuthrust/examples/cuda/wrap_pointer.cuthrust/examples/discrete_voronoi.cuthrust/examples/dot_products_with_zip.cuthrust/examples/monte_carlo.cuthrust/examples/monte_carlo_disjoint_sequences.cuthrust/examples/padded_grid_reduction.cuthrust/examples/sort.cuthrust/examples/sum_columns.cuthrust/examples/sum_rows.cuthrust/examples/transform_iterator.cuthrust/testing/binary_search_vector.cuthrust/testing/catch2_test_adjacent_difference.cuthrust/testing/catch2_test_transform.cuthrust/testing/cuda/copy_if.cuthrust/testing/cuda/sort.cuthrust/testing/cuda/transform.cuthrust/testing/dereference.cuthrust/testing/for_each.cuthrust/testing/gather.cuthrust/testing/generate.cuthrust/testing/merge.cuthrust/testing/merge_by_key.cuthrust/testing/partition.cuthrust/testing/remove.cuthrust/testing/replace.cuthrust/testing/reverse.cuthrust/testing/scan.cuthrust/testing/shuffle.cuthrust/testing/tabulate.cuthrust/testing/transform_scan.cuthrust/testing/unique_by_key.cuthrust/testing/zip_function.cuthrust/thrust/detail/complex/catrig.hthrust/thrust/detail/complex/catrigf.hthrust/thrust/detail/complex/cexp.hthrust/thrust/detail/complex/cexpf.hthrust/thrust/detail/complex/math_private.hthrust/thrust/system/cuda/detail/core/util.hthrust/thrust/system/cuda/detail/reduce.hthrust/thrust/system/cuda/detail/reduce_by_key.hthrust/thrust/system/cuda/detail/set_operations.h
💤 Files with no reviewable changes (1)
- .clang-tidy
| { | ||
| auto h_pairs = thrust::make_zip_iterator(h_res_keys.begin(), h_res_values.begin()); | ||
| thrust::sort(h_pairs, h_pairs + h_res_keys.size()); | ||
| thrust::sort(h_pairs, h_pairs + static_cast<std::ptrdiff_t>(h_res_keys.size())); |
There was a problem hiding this comment.
important: std::ptrdiff_t is introduced on Line 74 without adding <cstddef>. Add an explicit #include <cstddef> to keep this TU self-sufficient and avoid transitive-include dependency.
As per coding guidelines, “Files must include all headers related to the symbols that they are using and relying on transitive header inclusion is not allowed.”
Source: Coding guidelines
| for (int j = 0; j < N; ++j) | ||
| { | ||
| host_data[i * N * 2 + j] = i * N + j; | ||
| host_data[static_cast<std::size_t>(i) * N * 2 + j] = static_cast<int>(static_cast<std::size_t>(i) * N + j); |
There was a problem hiding this comment.
important: Line 369 hardcodes row stride as N * 2 instead of using Ld, which makes the test silently wrong if Ld is changed independently. Use static_cast<std::size_t>(i) * Ld + j to match the declared stride contract.
| for (int i = static_cast<int>(cuda::gpu_thread.rank(cuda::grid)); i < a.size(); | ||
| i += static_cast<int>(cuda::gpu_thread.count(cuda::grid))) |
There was a problem hiding this comment.
suggestion: use cuda::std::span<T>::size_type (or auto) for the grid-stride loop index/stride instead of int. On Line 73 and Line 74, int can overflow for large buffers while a.size() is wider, which can break termination/correct indexing in stress cases.
| __global__ void increment_kernel(int* p, int n) | ||
| { | ||
| int idx = blockIdx.x * blockDim.x + threadIdx.x; | ||
| int idx = static_cast<int>(blockIdx.x * blockDim.x + threadIdx.x); |
There was a problem hiding this comment.
important: Line 18 should declare idx as const because it is never reassigned.
As per coding guidelines, “All non-const variables must use const.”
Source: Coding guidelines
😬 CI Workflow Results🟥 Finished in 5h 52m: Pass: 99%/503 | Total: 22d 17h | Max: 2h 52m | Hits: 26%/3096939See results here. |
I disagree. This checks makes developers carefully consider the size type and does illustrate some possible real bugs. There's a lot of places where we take for granted that something fits in an Granted the way I've silenced a lot of these is noise (by just |
|
I don't have a strong preference. The warning is useful. This will force developers to think about potential data loss. On the other hand, it is very verbose. The thread hierarchy API will help a lot for device code (instead of |
| const auto common_prefix = state.get_float64("MismatchAt"); | ||
| const auto mismatch_point = cuda::std::clamp<std::size_t>(elements * common_prefix, 0, elements - 2); | ||
| const auto mismatch_point = cuda::std::clamp<std::size_t>( | ||
| static_cast<std::size_t>(static_cast<double>(elements) * common_prefix), std::size_t{0}, elements - 2); |
There was a problem hiding this comment.
We should just ensure that common_prefix is size_t
Description
closes
Checklist