bugprone-misplaced-widening-cast by Jacobfaib · Pull Request #9506 · NVIDIA/cccl

Jacobfaib · 2026-06-17T17:36:48Z

Description

closes

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

coderabbitai · 2026-06-17T17:47:46Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f78850fe-0a49-4cfb-9347-aec8f85246b3

📥 Commits

Reviewing files that changed from the base of the PR and between f412ebc and f30e5e9.

📒 Files selected for processing (75)

.clang-tidy
cub/cub/agent/agent_batch_memcpy.cuh
cub/cub/agent/agent_for.cuh
cub/cub/agent/agent_merge.cuh
cub/cub/agent/agent_radix_sort_downsweep.cuh
cub/cub/agent/agent_radix_sort_histogram.cuh
cub/cub/agent/agent_radix_sort_onesweep.cuh
cub/cub/agent/agent_reduce.cuh
cub/cub/agent/agent_reduce_by_key.cuh
cub/cub/agent/agent_rle.cuh
cub/cub/agent/agent_select_if.cuh
cub/cub/agent/agent_three_way_partition.cuh
cub/cub/agent/agent_topk.cuh
cub/cub/agent/agent_unique_by_key.cuh
cub/cub/block/block_load.cuh
cub/cub/block/block_store.cuh
cub/cub/detail/rfa.cuh
cub/cub/device/device_histogram.cuh
cub/cub/device/dispatch/dispatch_batch_memcpy.cuh
cub/cub/device/dispatch/dispatch_for.cuh
cub/cub/device/dispatch/dispatch_merge.cuh
cub/cub/device/dispatch/dispatch_radix_sort.cuh
cub/cub/device/dispatch/dispatch_reduce.cuh
cub/cub/device/dispatch/dispatch_reduce_deterministic.cuh
cub/cub/device/dispatch/dispatch_rle.cuh
cub/cub/device/dispatch/dispatch_segmented_reduce.cuh
cub/cub/device/dispatch/dispatch_segmented_sort.cuh
cub/cub/device/dispatch/dispatch_select_if.cuh
cub/cub/device/dispatch/dispatch_streaming_reduce_by_key.cuh
cub/cub/device/dispatch/kernels/kernel_merge_sort.cuh
cub/cub/device/dispatch/kernels/kernel_segmented_scan.cuh
cub/cub/device/dispatch/kernels/kernel_segmented_sort.cuh
cub/cub/device/dispatch/kernels/kernel_three_way_partition.cuh
cub/cub/device/dispatch/kernels/kernel_transform.cuh
cub/cub/device/dispatch/kernels/kernel_unique_by_key.cuh
cub/cub/grid/grid_even_share.cuh
cub/cub/thread/thread_reduce.cuh
cub/test/catch2_segmented_sort_helper.cuh
cub/test/catch2_test_block_adjacent_difference.cu
cub/test/catch2_test_device_for.cu
cub/test/catch2_test_device_reduce.cuh
cub/test/catch2_test_device_run_length_encode_non_trivial_runs.cu
cub/test/catch2_test_device_scan_alignment.cu
cub/test/catch2_test_device_segmented_scan.cu
cub/test/catch2_test_device_segmented_scan_multi_segment.cu
cub/test/catch2_test_device_segmented_scan_noncommutative.cu
cub/test/catch2_test_device_segmented_topk_pairs.cu
cub/test/catch2_test_device_topk_common.cuh
cub/test/catch2_test_device_transform_reduce.cu
cub/test/catch2_test_warp_load.cu
cub/test/catch2_test_warp_scan_api.cu
cub/test/catch2_test_warp_store.cu
cub/test/warp/catch2_test_warp_reduce.cu
cub/test/warp/catch2_test_warp_reduce_batched.cu
cub/test/warp/catch2_test_warp_segmented_reduce.cu
cudax/benchmarks/bench/cuco/hashers.cu
cudax/test/copy/copy.cu
cudax/test/copy/copy_edge_cases.cu
cudax/test/copy_bytes/mdspan_d2h_h2d.cu
cudax/test/copy_bytes/mdspan_d2h_h2d_relaxed.cu
libcudacxx/test/libcudacxx/cuda/ccclrt/hierarchy/hierarchy_smoke.cu
libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.shift/pstl_shift_left.cu
libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.shift/pstl_shift_right.cu
libcudacxx/test/libcudacxx/std/algorithms/alg.sorting/alg.merge/pstl_merge.cu
libcudacxx/test/libcudacxx/std/numerics/numeric.ops/transform.reduce/pstl_transform_reduce_unary.cu
thrust/examples/bucket_sort2d.cu
thrust/examples/discrete_voronoi.cu
thrust/examples/monte_carlo_disjoint_sequences.cu
thrust/examples/padded_grid_reduction.cu
thrust/examples/scan_matrix_by_rows.cu
thrust/testing/counting_iterator.cu
thrust/thrust/system/cuda/detail/reduce.h
thrust/thrust/system/cuda/detail/set_operations.h
thrust/thrust/system/detail/generic/shuffle.inl
thrust/thrust/system/detail/sequential/stable_radix_sort.h

✅ Files skipped from review due to trivial changes (53)

cub/cub/device/dispatch/dispatch_reduce_deterministic.cuh
cub/cub/device/dispatch/dispatch_streaming_reduce_by_key.cuh
cub/cub/device/dispatch/dispatch_batch_memcpy.cuh
cub/cub/agent/agent_three_way_partition.cuh
cub/cub/detail/rfa.cuh
cub/cub/device/dispatch/dispatch_merge.cuh
cub/test/catch2_test_device_run_length_encode_non_trivial_runs.cu
cub/test/catch2_segmented_sort_helper.cuh
thrust/testing/counting_iterator.cu
libcudacxx/test/libcudacxx/std/numerics/numeric.ops/transform.reduce/pstl_transform_reduce_unary.cu
cub/cub/device/dispatch/kernels/kernel_unique_by_key.cuh
cub/cub/grid/grid_even_share.cuh
cub/cub/agent/agent_unique_by_key.cuh
cub/cub/device/dispatch/dispatch_reduce.cuh
cub/cub/device/dispatch/kernels/kernel_three_way_partition.cuh
cub/cub/agent/agent_radix_sort_downsweep.cuh
cub/cub/block/block_load.cuh
cub/test/catch2_test_device_scan_alignment.cu
cub/cub/thread/thread_reduce.cuh
cub/test/catch2_test_device_for.cu
cub/cub/agent/agent_for.cuh
cub/cub/agent/agent_radix_sort_onesweep.cuh
cub/cub/agent/agent_reduce_by_key.cuh
cub/test/catch2_test_block_adjacent_difference.cu
cub/test/catch2_test_device_segmented_scan_noncommutative.cu
cub/test/catch2_test_device_segmented_topk_pairs.cu
libcudacxx/test/libcudacxx/std/algorithms/alg.sorting/alg.merge/pstl_merge.cu
cub/cub/agent/agent_merge.cuh
cub/cub/device/dispatch/kernels/kernel_transform.cuh
cub/test/catch2_test_warp_load.cu
cub/cub/device/dispatch/dispatch_segmented_reduce.cuh
cub/cub/agent/agent_select_if.cuh
thrust/examples/discrete_voronoi.cu
cub/cub/block/block_store.cuh
cub/test/catch2_test_device_topk_common.cuh
cub/cub/agent/agent_batch_memcpy.cuh
cub/cub/agent/agent_reduce.cuh
libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.shift/pstl_shift_right.cu
cub/cub/device/dispatch/kernels/kernel_segmented_scan.cuh
thrust/examples/scan_matrix_by_rows.cu
cub/cub/device/dispatch/kernels/kernel_segmented_sort.cuh
thrust/examples/monte_carlo_disjoint_sequences.cu
cub/cub/device/dispatch/kernels/kernel_merge_sort.cuh
libcudacxx/test/libcudacxx/cuda/ccclrt/hierarchy/hierarchy_smoke.cu
thrust/thrust/system/detail/sequential/stable_radix_sort.h
cub/cub/agent/agent_rle.cuh
thrust/thrust/system/cuda/detail/set_operations.h
cub/test/catch2_test_device_segmented_scan_multi_segment.cu
cub/test/catch2_test_warp_store.cu
cub/cub/device/dispatch/dispatch_radix_sort.cuh
cub/test/warp/catch2_test_warp_segmented_reduce.cu
cub/test/catch2_test_device_segmented_scan.cu
cub/test/catch2_test_warp_scan_api.cu

🚧 Files skipped from review as they are similar to previous changes (20)

thrust/thrust/system/detail/generic/shuffle.inl
cub/cub/device/dispatch/dispatch_for.cuh
cub/test/catch2_test_device_transform_reduce.cu
libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.shift/pstl_shift_left.cu
cub/cub/agent/agent_radix_sort_histogram.cuh
cub/test/warp/catch2_test_warp_reduce_batched.cu
cub/cub/device/dispatch/dispatch_rle.cuh
thrust/examples/bucket_sort2d.cu
thrust/thrust/system/cuda/detail/reduce.h
cub/cub/agent/agent_topk.cuh
thrust/examples/padded_grid_reduction.cu
cudax/test/copy_bytes/mdspan_d2h_h2d_relaxed.cu
cudax/test/copy/copy_edge_cases.cu
cub/cub/device/dispatch/dispatch_select_if.cuh
cudax/benchmarks/bench/cuco/hashers.cu
cub/test/warp/catch2_test_warp_reduce.cu
cub/test/catch2_test_device_reduce.cuh
cudax/test/copy_bytes/mdspan_d2h_h2d.cu
cub/cub/device/device_histogram.cuh
cudax/test/copy/copy.cu

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Summary

This PR simplifies the .clang-tidy configuration file by removing extensive comments and complex settings, while enabling the bugprone-misplaced-widening-cast check across the codebase.

Configuration Changes

.clang-tidy:

Simplified the configuration format: converted the verbose multi-line checks list with extensive inline documentation (326 lines) into a compact folded-scalar format (23 lines total)
Removed all comments documenting rationale for specific check exclusions
Enabled bugprone-misplaced-widening-cast by removing its disable entry from the bugprone checks
Added -bugprone-signed-char-misuse to the disabled checks list
Changed WarningsAsErrors from '*' (treat all warnings as errors) to '' (disabled)
Removed extensive configuration sections:
- HeaderFileExtensions, ImplementationFileExtensions, and SystemHeaders settings
- HeaderFilterRegex and compiler argument specifications (ExtraArgsBefore, ExtraArgs)
- All CheckOptions customizations for various checks (100+ lines of configuration)
Retained all functional check enable/disable logic in a more concise format

Impact

The configuration is now more maintainable with reduced file size (303 line reduction)
Warnings are no longer treated as errors by default, allowing for a transitional period as the codebase addresses the newly enabled bugprone-misplaced-widening-cast check
The disabled checks list is significantly streamlined while preserving project-specific customizations

Walkthrough

Enables the bugprone-misplaced-widening-cast clang-tidy check by removing it from .clang-tidy's disabled list, then resolves all resulting violations across CUB, Thrust, cudax, and libcudacxx. Fixes are either arithmetic order corrections (casting individual operands to the wider type before multiplying) or NOLINT suppressions where the existing pattern is intentional.

Changes

Enable check and fix all violations

| Layer / File(s) | Summary |
| --- | --- |
| clang-tidy configuration: `.clang-tidy` | Removes `bugprone-misplaced-widening-cast` from disabled checks. |
| Dispatch-layer tile-size and offset arithmetic fixes: `cub/cub/device/dispatch/dispatch_for.cuh`, `dispatch_reduce.cuh`, `dispatch_reduce_deterministic.cuh`, `dispatch_rle.cuh`, `dispatch_select_if.cuh`, `dispatch_streaming_reduce_by_key.cuh`, `dispatch_radix_sort.cuh`, `dispatch_merge.cuh`, `dispatch_segmented_sort.cuh`, `dispatch_segmented_reduce.cuh`, `dispatch_batch_memcpy.cuh`, `cub/cub/grid/grid_even_share.cuh` | Changes `static_cast<OffsetT>(threads * items)` patterns to `static_cast<OffsetT>(threads) * static_cast<OffsetT>(items)` so multiplication is performed in the wider type, eliminating overflow before widening. |
| Agent-level NOLINT suppressions and cast-order fixes: `cub/cub/agent/agent_batch_memcpy.cuh`, `agent_for.cuh`, `agent_merge.cuh`, `agent_radix_sort_downsweep.cuh`, `agent_radix_sort_histogram.cuh`, `agent_radix_sort_onesweep.cuh`, `agent_reduce.cuh`, `agent_reduce_by_key.cuh`, `agent_rle.cuh`, `agent_select_if.cuh`, `agent_three_way_partition.cuh`, `agent_topk.cuh`, `agent_unique_by_key.cuh` | Applies NOLINT suppressions and cast-before-multiply fixes across all CUB agent headers, covering partial-tile guards, scatter offsets, load/store indexing, and tile-base calculations. |
| Kernel-level NOLINT suppressions and histogram cast fix: `cub/cub/device/dispatch/kernels/kernel_merge_sort.cuh`, `kernel_segmented_scan.cuh`, `kernel_segmented_sort.cuh`, `kernel_three_way_partition.cuh`, `kernel_transform.cuh`, `kernel_unique_by_key.cuh`, `cub/cub/device/device_histogram.cuh` | Adds `NOLINT`/`NOLINTNEXTLINE` annotations to kernel offset expressions where the existing pattern is intentional; updates device_histogram overflow check from C-style cast to `static_cast<unsigned long long>`. |
| Block and warp primitive NOLINT suppressions: `cub/cub/block/block_load.cuh`, `block_store.cuh`, `cub/cub/thread/thread_reduce.cuh`, `cub/cub/detail/rfa.cuh` | Annotates element-index expressions in block load/store variants and thread-level reduction with NOLINT comments. |
| Thrust internal and example fixes: `thrust/thrust/system/cuda/detail/reduce.h`, `set_operations.h`, `thrust/thrust/system/detail/generic/shuffle.inl`, `thrust/thrust/system/detail/sequential/stable_radix_sort.h`, `thrust/examples/bucket_sort2d.cu`, `discrete_voronoi.cu`, `padded_grid_reduction.cu`, `scan_matrix_by_rows.cu`, `monte_carlo_disjoint_sequences.cu`, `thrust/testing/counting_iterator.cu` | Fixes cast order for `ITEMS_PER_TILE`/`blockIdx.x` products in reduce and set-operations; adds `iter_difference_t` cast in shuffle's inclusive_scan range; annotates or rewrites index expressions in Thrust examples. |
| CUB test arithmetic and NOLINT fixes: `cub/test/catch2_test_device_reduce.cuh`, `catch2_test_device_for.cu`, `catch2_test_device_scan_alignment.cu`, `catch2_test_device_segmented_scan.cu`, `catch2_test_device_segmented_scan_multi_segment.cu`, `catch2_test_device_segmented_scan_noncommutative.cu`, `catch2_test_device_run_length_encode_non_trivial_runs.cu`, `catch2_test_device_transform_reduce.cu`, `catch2_test_block_adjacent_difference.cu`, `catch2_test_device_segmented_topk_pairs.cu`, `catch2_test_device_topk_common.cuh`, `catch2_test_warp_load.cu`, `catch2_test_warp_scan_api.cu`, `catch2_test_warp_store.cu`, `catch2_segmented_sort_helper.cuh`, `warp/catch2_test_warp_reduce.cu`, `warp/catch2_test_warp_reduce_batched.cu`, `warp/catch2_test_warp_segmented_reduce.cu` | Adds explicit `long`/`size_t`/`uint64_t` casts to segment iterator and index arithmetic in host-reference computations; annotates remaining instances with NOLINT comments. |
| cudax and libcudacxx test index arithmetic fixes: `cudax/benchmarks/bench/cuco/hashers.cu`, `cudax/test/copy/copy.cu`, `cudax/test/copy/copy_edge_cases.cu`, `cudax/test/copy_bytes/mdspan_d2h_h2d.cu`, `cudax/test/copy_bytes/mdspan_d2h_h2d_relaxed.cu`, `libcudacxx/test/libcudacxx/cuda/ccclrt/hierarchy/hierarchy_smoke.cu`, `libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.shift/pstl_shift_left.cu`, `pstl_shift_right.cu`, `alg.sorting/alg.merge/pstl_merge.cu`, `std/numerics/numeric.ops/transform.reduce/pstl_transform_reduce_unary.cu` | Adds `static_cast<std::size_t>` or `static_cast<unsigned long>` to multidimensional index computations and algorithm test expressions to prevent int overflow before widening. |
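
The two kinds of change listed above (cast-before-multiply and `NOLINT` suppression) can be sketched roughly as follows. This is an illustration only, not code from the PR: `tile_base_fixed`, `tile_base_suppressed`, and the parameter names are hypothetical, and `OffsetT` is assumed here to be a 64-bit signed integer.

```cpp
#include <cstdint>

// Hypothetical stand-in for a CUB offset type; assumed to be 64 bits wide.
using OffsetT = std::int64_t;

// Cast-before-multiply: both operands are widened first, so the product
// is computed in OffsetT rather than in 32-bit arithmetic.
constexpr OffsetT tile_base_fixed(std::uint32_t tile_idx, std::uint32_t tile_items)
{
  return static_cast<OffsetT>(tile_idx) * static_cast<OffsetT>(tile_items);
}

// Suppression: the product is still formed in 32 bits and widened afterwards.
// This is only reasonable when the caller guarantees the product fits in
// 32 bits (for example, a thread index times a small items-per-thread count).
constexpr OffsetT tile_base_suppressed(std::uint32_t thread_idx, std::uint32_t items_per_thread)
{
  return static_cast<OffsetT>(thread_idx * items_per_thread); // NOLINT(bugprone-misplaced-widening-cast)
}
```

With small inputs the two functions agree; they diverge only once the 32-bit product wraps, which is the case the suppressed form must rule out by construction.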

Possibly related PRs

NVIDIA/cccl#9467: Updates clang-tidy bugprone check configuration related to bugprone-misplaced-widening-cast handling.

Suggested reviewers

bernhardmgruber
gonidelis
caugonnet


coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

cub/cub/device/device_histogram.cuh (1)

753-786: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

important: Mirror this cast-before-multiply fix in the env overloads.

These branches are fine, but the EnvT overloads later in cub/cub/device/device_histogram.cuh still do the cast after multiplying num_rows * row_stride_bytes, so the same clang-tidy hit and overflow risk remain in this header.

Also applies to: 1453-1484
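
To illustrate why the cast order matters for an overflow guard of the kind this comment refers to, here is a minimal sketch. The function names are hypothetical, and 32-bit unsigned operands are an assumption chosen so that the wraparound is well defined; the actual header uses its own offset types, and with signed operands the narrow product would be undefined behavior rather than a wrap.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical guard: should return true when the image size in bytes
// does not fit in an int.

// Cast after multiply: the product has already wrapped in 32 bits, so a
// size that is far too large can pass the check unnoticed.
constexpr bool exceeds_int_cast_after(std::uint32_t num_rows, std::uint32_t row_stride_bytes)
{
  return static_cast<unsigned long long>(num_rows * row_stride_bytes)
      >= static_cast<unsigned long long>(INT_MAX);
}

// Cast before multiply: the product is computed in unsigned long long,
// so the comparison sees the true size.
constexpr bool exceeds_int_cast_before(std::uint32_t num_rows, std::uint32_t row_stride_bytes)
{
  return static_cast<unsigned long long>(num_rows) * static_cast<unsigned long long>(row_stride_bytes)
      >= static_cast<unsigned long long>(INT_MAX);
}
```

For `num_rows = 65536` and `row_stride_bytes = 65536` the true size is 2^32 bytes; the first guard sees a wrapped product of 0 and reports no problem, while the second reports the overflow.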

cub/cub/agent/agent_rle.cuh (1)

363-369: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

critical: tile_successor_item/tile_predecessor_item are only written by one lane, but every lane still passes the local variable by value into FlagHeadsAndTails. Most lanes therefore hand the callee an indeterminate value here, which makes the head/tail classification undefined. Broadcast the neighbor item before the call instead of just suppressing the lint.

Also applies to: 377-383, 389-402

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ca2cbe9d-1d29-425b-833e-2ac72191e0ff

📥 Commits

Reviewing files that changed from the base of the PR and between a4ffbcf and f412ebc.

💤 Files with no reviewable changes (1)

.clang-tidy

coderabbitai · 2026-06-17T17:47:50Z

  using Difference = thrust::detail::it_difference_t<Iterator>;

-  Difference diff = std::numeric_limits<std::uint32_t>::max() + 1;
+  Difference diff = std::numeric_limits<std::uint32_t>::max() + 1; // NOLINT(bugprone-misplaced-widening-cast)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

In C++, for std::uint32_t x = std::numeric_limits<std::uint32_t>::max(); auto y = x + 1;, does y overflow in 32-bit before assignment to a wider signed type? Please cite the usual arithmetic conversion rules from cppreference.

💡 Result:

In the expression auto y = x + 1; where x is std::uint32_t initialized to std::numeric_limits<std::uint32_t>::max(), the operation does not "overflow" in the sense of undefined behavior [1][2]. Instead, it performs well-defined unsigned integer arithmetic [1]. According to the C++ standard's rules for arithmetic operators [1][3]:

1. Integral Promotion: Before the addition, the operand x (type std::uint32_t) and the literal 1 (type int) undergo integral promotion [3][4]. If int can represent all values of std::uint32_t (which is typically not the case on systems where both are 32 bits), the uint32_t is promoted to int [5][4]. However, if std::uint32_t cannot be represented by int (common on 32-bit systems), it remains std::uint32_t or is promoted to unsigned int [5].

2. Usual Arithmetic Conversions: Once promoted, if the types of the operands differ (e.g., std::uint32_t and int), the "usual arithmetic conversions" are applied to bring them to a common type [6][3]. Specifically, if one operand is unsigned and the other is signed, the signed operand is converted to the corresponding unsigned type if the unsigned type's rank is greater than or equal to the signed type's rank [6]. As a result, the addition is performed using unsigned arithmetic [7][8].

3. Unsigned Wrap-around: Unsigned integer arithmetic is defined to be performed modulo 2^n, where n is the number of bits [1]. Therefore, adding 1 to the maximum value of a 32-bit unsigned integer results in 0 [1][2]. This wrap-around occurs during the evaluation of the expression x + 1 itself, before any assignment takes place [8].

Consequently, y will be assigned the value 0, not a value that overflowed into a wider signed type [8][9].

Citations:

1: https://cs.cppreference.com/w/cpp/language/operator_arithmetic.html

2: https://giodicanio.com/2023/11/08/protecting-your-c-plus-plus-code-against-unsigned-integer-overflow/

3: https://en.cppreference.com/cpp/language/operator_arithmetic

4: https://en.cppreference.com/cpp/language/implicit_conversion

5: https://learn.microsoft.com/en-us/cpp/cpp/standard-conversions?view=msvc-170

6: https://en.cppreference.com/cpp/language/usual_arithmetic_conversions

7: https://stackoverflow.com/questions/12889237/c-integer-overflow-and-promotion

8: https://stackoverflow.com/questions/62828526/why-is-uint32-max-1-0

9: https://discourse.llvm.org/t/libcxx-bugs-incoherent-behaviour-on-integer-overflow/54312

🏁 Script executed:

cd /tmp && git clone --depth 1 https://github.com/NVIDIA/cccl.git 2>&1 | head -20

Repository: NVIDIA/cccl

Length of output: 140

🏁 Script executed:

cat -n cccl/thrust/testing/counting_iterator.cu | sed -n '270,290p'

Repository: NVIDIA/cccl

Length of output: 128

🏁 Script executed:

# Find the Difference type definition in the test file
rg -A 5 -B 5 "Difference" cccl/thrust/testing/counting_iterator.cu | head -40

Repository: NVIDIA/cccl

Length of output: 136

🏁 Script executed:

find /tmp/cccl -name "counting_iterator.cu" -type f

Repository: NVIDIA/cccl

Length of output: 148

🏁 Script executed:

find /tmp/cccl -name "*counting_iterator*" -type f

Repository: NVIDIA/cccl

Length of output: 426

🏁 Script executed:

cat -n /tmp/cccl/thrust/testing/counting_iterator.cu | sed -n '270,290p'

Repository: NVIDIA/cccl

Length of output: 714

🏁 Script executed:

# Get more context around line 280 to understand the Difference type
cat -n /tmp/cccl/thrust/testing/counting_iterator.cu | sed -n '260,300p'

Repository: NVIDIA/cccl

Length of output: 1492

🏁 Script executed:

# Search for Difference typedef or using declaration
rg -B 20 "std::numeric_limits<std::uint32_t>::max\(\) \+ 1" /tmp/cccl/thrust/testing/counting_iterator.cu

Repository: NVIDIA/cccl

Length of output: 794

🏁 Script executed:

rg -A 10 "it_difference_t" /tmp/cccl/thrust/thrust/iterator/detail/counting_iterator.inl | head -30

Repository: NVIDIA/cccl

Length of output: 157

🏁 Script executed:

# Look for the difference_type in counting_iterator definition
rg -B 5 -A 5 "difference_type" /tmp/cccl/thrust/thrust/iterator/counting_iterator.h | head -50

Repository: NVIDIA/cccl

Length of output: 2024

🏁 Script executed:

# Find the definition of it_difference_t
find /tmp/cccl -name "*.h" -o -name "*.hpp" | xargs rg "template.*it_difference_t|using it_difference_t" | head -20

Repository: NVIDIA/cccl

Length of output: 190

🏁 Script executed:

cat /tmp/cccl/thrust/thrust/iterator/iterator_traits.h | grep -A 5 "it_difference_t"

Repository: NVIDIA/cccl

Length of output: 346

🏁 Script executed:

# Verify the difference_type for counting_iterator with uint64_t
cat /tmp/cccl/thrust/thrust/iterator/counting_iterator.h | sed -n '1,150p' | grep -A 20 "counting_iterator_difference_type"

Repository: NVIDIA/cccl

Length of output: 1619

🏁 Script executed:

# Confirm ptrdiff_t is typically 64-bit on 64-bit systems and larger than uint32_t
python3 << 'PY'
# In C++, ptrdiff_t is typically signed long on 64-bit systems
# The issue is that:
# std::numeric_limits<std::uint32_t>::max() = 4294967295 (0xFFFFFFFF)
# std::numeric_limits<std::uint32_t>::max() + 1
# Both operands are uint32_t, so addition happens in uint32_t
# Result: 0xFFFFFFFF + 1 = 0x100000000, but wraps to 0x00000000 in uint32_t
# Then assigns 0 to ptrdiff_t

# The fix would cast first:
# static_cast<Difference>(std::numeric_limits<std::uint32_t>::max()) + 1
# This casts 0xFFFFFFFF to ptrdiff_t first (value 4294967295)
# Then adds 1 in ptrdiff_t, resulting in 4294967296

print("Original expression: std::numeric_limits<std::uint32_t>::max() + 1")
print("Operand types: uint32_t + int -> uint32_t + uint32_t")
print("Arithmetic: 0xFFFFFFFF + 0x00000001 = 0x100000000 (wraps to 0x00000000 in uint32_t)")
print("Result assigned to ptrdiff_t: 0")
print()
print("Fixed expression: static_cast<Difference>(std::numeric_limits<std::uint32_t>::max()) + 1")
print("Operand types: ptrdiff_t + int -> ptrdiff_t + ptrdiff_t")
print("Arithmetic: 4294967295 + 1 = 4294967296 (in ptrdiff_t)")
print("Result assigned to ptrdiff_t: 4294967296 ✓")
PY

Repository: NVIDIA/cccl

Length of output: 535

important: the expression std::numeric_limits<std::uint32_t>::max() + 1 wraps to 0 in 32-bit unsigned arithmetic before it is assigned to Difference (which is ::cuda::std::ptrdiff_t for counting_iterator<std::uint64_t>). The test then runs with diff = 0, which defeats its purpose of exercising iterator differences wider than 32 bits. Cast before the addition: static_cast<Difference>(std::numeric_limits<std::uint32_t>::max()) + 1.
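
The wraparound described in this comment can be reproduced in isolation. The sketch below is not the test from the PR: `Difference` is assumed to be a 64-bit signed type (standing in for `it_difference_t<Iterator>`), `unsigned int` is assumed to be 32 bits, and the two function names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Assumption: a 64-bit signed difference type, as on typical 64-bit targets.
using Difference = std::int64_t;

// The addition happens in 32-bit unsigned arithmetic and wraps to 0;
// only that 0 is then converted to Difference.
constexpr Difference wrapped_diff()
{
  return std::numeric_limits<std::uint32_t>::max() + 1;
}

// Widening the operand first makes the addition happen in Difference,
// so the result is 2^32 as intended.
constexpr Difference widened_diff()
{
  return static_cast<Difference>(std::numeric_limits<std::uint32_t>::max()) + 1;
}
```

Because unsigned wraparound is well defined, the first form compiles without any diagnostic from the compiler itself, which is why a lint check is the only thing that flags it.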

miscco

This is just too much noise

Jacobfaib · 2026-06-22T12:35:50Z

> This is just too much noise

@miscco this check is arguably even more valuable than the narrowing-cast one. I strongly urge us to keep it. Basically, it warns on explicit widening conversions that still lose information.

I.e. a developer has come along, considered the possibility that x * y may overflow, and added a static_cast<BiggerType>(x * y) thinking that makes it OK. But they did not realize that the cast does not help, because the inner computation is still performed in the smaller type and overflows before the result is widened. Instead they should have written static_cast<BiggerType>(x) * y.
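
One way to make the intended pattern hard to get wrong is a small helper that widens before multiplying. The sketch below is an assumption for illustration: `widening_mul` is a hypothetical name and is not taken from the PR or from CCCL.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: multiplies two narrow integers in a wider type.
// Both operands are converted to Wide before the multiplication, so the
// product is never formed in the narrow type.
template <class Wide, class Narrow>
constexpr Wide widening_mul(Narrow x, Narrow y)
{
  static_assert(std::is_integral<Wide>::value && std::is_integral<Narrow>::value,
                "widening_mul only supports integer types");
  static_assert(sizeof(Wide) > sizeof(Narrow), "Wide must be wider than Narrow");
  return static_cast<Wide>(x) * static_cast<Wide>(y);
}
```

A call such as `widening_mul<std::uint64_t>(std::uint32_t{100000}, std::uint32_t{100000})` then produces the full 64-bit product, and the `static_assert` rejects uses where the destination type is not actually wider.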

fbusato · 2026-06-22T16:37:57Z

Noisy, but not too noisy. It affects ~1 LoC in a subset of files. Also, we could restrict this warning and the narrowing one to library code, which requires higher quality.

github-actions · 2026-06-22T23:02:10Z

😬 CI Workflow Results

🟥 Finished in 6h 04m: Pass: 99%/443 | Total: 18d 12h | Max: 2h 42m | Hits: 29%/2218283

See results here.

Jacobfaib self-assigned this Jun 17, 2026

Jacobfaib requested review from a team as code owners June 17, 2026 17:36

Jacobfaib requested a review from alliepiper June 17, 2026 17:36

github-project-automation Bot added this to CCCL Jun 17, 2026

Jacobfaib requested review from caugonnet and gonidelis June 17, 2026 17:36

github-project-automation Bot moved this to Todo in CCCL Jun 17, 2026

Jacobfaib requested review from davebayer, ericniebler and pauleonix June 17, 2026 17:36

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 17, 2026

coderabbitai Bot reviewed Jun 17, 2026

miscco requested changes Jun 22, 2026

github-project-automation Bot moved this from In Review to In Progress in CCCL Jun 22, 2026

Jacobfaib added 2 commits June 22, 2026 12:54

bugprone-misplaced-widening-cast

78a7c81

fixup! bugprone-misplaced-widening-cast

f30e5e9

Jacobfaib force-pushed the jacobf/2026-06-17/bugprone-misplaced-widening-cast branch from f412ebc to f30e5e9 Compare June 22, 2026 16:54

Jacobfaib wants to merge 2 commits into NVIDIA:main from Jacobfaib:jacobf/2026-06-17/bugprone-misplaced-widening-cast
