bugprone-misplaced-widening-cast#9506
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (75)
✅ Files skipped from review due to trivial changes (53)
🚧 Files skipped from review as they are similar to previous changes (20)
SummaryThis PR simplifies the Configuration Changes
Impact
WalkthroughEnables the ChangesEnable check and fix all violations
Possibly related PRs
Suggested reviewers
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cub/cub/device/device_histogram.cuh (1)
753-786:⚠️ Potential issue | 🟠 Major | ⚡ Quick winimportant: Mirror this cast-before-multiply fix in the env overloads.
These branches are fine, but the
EnvToverloads later incub/cub/device/device_histogram.cuhstill do the cast after multiplyingnum_rows * row_stride_bytes, so the same clang-tidy hit and overflow risk remain in this header.Also applies to: 1453-1484
cub/cub/agent/agent_rle.cuh (1)
363-369:⚠️ Potential issue | 🔴 Critical | ⚡ Quick wincritical:
tile_successor_item/tile_predecessor_itemare only written by one lane, but every lane still passes the local variable by value intoFlagHeadsAndTails. Most lanes therefore hand the callee an indeterminate value here, which makes the head/tail classification undefined. Broadcast the neighbor item before the call instead of just suppressing the lint.Also applies to: 377-383, 389-402
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ca2cbe9d-1d29-425b-833e-2ac72191e0ff
📒 Files selected for processing (75)
.clang-tidycub/cub/agent/agent_batch_memcpy.cuhcub/cub/agent/agent_for.cuhcub/cub/agent/agent_merge.cuhcub/cub/agent/agent_radix_sort_downsweep.cuhcub/cub/agent/agent_radix_sort_histogram.cuhcub/cub/agent/agent_radix_sort_onesweep.cuhcub/cub/agent/agent_reduce.cuhcub/cub/agent/agent_reduce_by_key.cuhcub/cub/agent/agent_rle.cuhcub/cub/agent/agent_select_if.cuhcub/cub/agent/agent_three_way_partition.cuhcub/cub/agent/agent_topk.cuhcub/cub/agent/agent_unique_by_key.cuhcub/cub/block/block_load.cuhcub/cub/block/block_store.cuhcub/cub/detail/rfa.cuhcub/cub/device/device_histogram.cuhcub/cub/device/dispatch/dispatch_batch_memcpy.cuhcub/cub/device/dispatch/dispatch_for.cuhcub/cub/device/dispatch/dispatch_merge.cuhcub/cub/device/dispatch/dispatch_radix_sort.cuhcub/cub/device/dispatch/dispatch_reduce.cuhcub/cub/device/dispatch/dispatch_reduce_deterministic.cuhcub/cub/device/dispatch/dispatch_rle.cuhcub/cub/device/dispatch/dispatch_segmented_reduce.cuhcub/cub/device/dispatch/dispatch_segmented_sort.cuhcub/cub/device/dispatch/dispatch_select_if.cuhcub/cub/device/dispatch/dispatch_streaming_reduce_by_key.cuhcub/cub/device/dispatch/kernels/kernel_merge_sort.cuhcub/cub/device/dispatch/kernels/kernel_segmented_scan.cuhcub/cub/device/dispatch/kernels/kernel_segmented_sort.cuhcub/cub/device/dispatch/kernels/kernel_three_way_partition.cuhcub/cub/device/dispatch/kernels/kernel_transform.cuhcub/cub/device/dispatch/kernels/kernel_unique_by_key.cuhcub/cub/grid/grid_even_share.cuhcub/cub/thread/thread_reduce.cuhcub/test/catch2_segmented_sort_helper.cuhcub/test/catch2_test_block_adjacent_difference.cucub/test/catch2_test_device_for.cucub/test/catch2_test_device_reduce.cuhcub/test/catch2_test_device_run_length_encode_non_trivial_runs.cucub/test/catch2_test_device_scan_alignment.cucub/test/catch2_test_device_segmented_scan.cucub/test/catch2_test_device_segmented_scan_multi_segment.cucub/test/catch2_test_device_segmented_scan_noncommutative.cucub/test/catch2_test_device_segmented_topk_pairs.cucub/test/catch2_test_device_topk_common.cuhcub/test/catch2_test_device_transform_reduce.cucub/test/catch2_test_warp_load.cucub/test/catch2_test_warp_scan_api.cucub/test/catch2_test_warp_store.cucub/test/warp/catch2_test_warp_reduce.cucub/test/warp/catch2_test_warp_reduce_batched.cucub/test/warp/catch2_test_warp_segmented_reduce.cucudax/benchmarks/bench/cuco/hashers.cucudax/test/copy/copy.cucudax/test/copy/copy_edge_cases.cucudax/test/copy_bytes/mdspan_d2h_h2d.cucudax/test/copy_bytes/mdspan_d2h_h2d_relaxed.culibcudacxx/test/libcudacxx/cuda/ccclrt/hierarchy/hierarchy_smoke.culibcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.shift/pstl_shift_left.culibcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.shift/pstl_shift_right.culibcudacxx/test/libcudacxx/std/algorithms/alg.sorting/alg.merge/pstl_merge.culibcudacxx/test/libcudacxx/std/numerics/numeric.ops/transform.reduce/pstl_transform_reduce_unary.cuthrust/examples/bucket_sort2d.cuthrust/examples/discrete_voronoi.cuthrust/examples/monte_carlo_disjoint_sequences.cuthrust/examples/padded_grid_reduction.cuthrust/examples/scan_matrix_by_rows.cuthrust/testing/counting_iterator.cuthrust/thrust/system/cuda/detail/reduce.hthrust/thrust/system/cuda/detail/set_operations.hthrust/thrust/system/detail/generic/shuffle.inlthrust/thrust/system/detail/sequential/stable_radix_sort.h
💤 Files with no reviewable changes (1)
- .clang-tidy
| using Difference = thrust::detail::it_difference_t<Iterator>; | ||
|
|
||
| Difference diff = std::numeric_limits<std::uint32_t>::max() + 1; | ||
| Difference diff = std::numeric_limits<std::uint32_t>::max() + 1; // NOLINT(bugprone-misplaced-widening-cast) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In C++, for std::uint32_t x = std::numeric_limitsstd::uint32_t::max(); auto y = x + 1;, does y overflow in 32-bit before assignment to a wider signed type? Please cite the usual arithmetic conversion rules from cppreference.
💡 Result:
In the expression auto y = x + 1; where x is std::uint32_t initialized to std::numeric_limits<std::uint32_t>::max(), the operation does not "overflow" in the sense of undefined behavior [1][2]. Instead, it performs well-defined unsigned integer arithmetic [1]. According to the C++ standard's rules for arithmetic operators [1][3]: 1. Integral Promotion: Before the addition, the operand x (type std::uint32_t) and the literal 1 (type int) undergo integral promotion [3][4]. If int can represent all values of std::uint32_t (which is typically not the case on systems where both are 32 bits), the uint32_t is promoted to int [5][4]. However, if std::uint32_t cannot be represented by int (common on 32-bit systems), it remains std::uint32_t or is promoted to unsigned int [5]. 2. Usual Arithmetic Conversions: Once promoted, if the types of the operands differ (e.g., std::uint32_t and int), the "usual arithmetic conversions" are applied to bring them to a common type [6][3]. Specifically, if one operand is unsigned and the other is signed, the signed operand is converted to the corresponding unsigned type if the unsigned type's rank is greater than or equal to the signed type's rank [6]. As a result, the addition is performed using unsigned arithmetic [7][8]. 3. Unsigned Wrap-around: Unsigned integer arithmetic is defined to be performed modulo 2^n, where n is the number of bits [1]. Therefore, adding 1 to the maximum value of a 32-bit unsigned integer results in 0 [1][2]. This wrap-around occurs during the evaluation of the expression x + 1 itself, before any assignment takes place [8]. Consequently, y will be assigned the value 0, not a value that overflowed into a wider signed type [8][9].
Citations:
- 1: https://cs.cppreference.com/w/cpp/language/operator_arithmetic.html
- 2: https://giodicanio.com/2023/11/08/protecting-your-c-plus-plus-code-against-unsigned-integer-overflow/
- 3: https://en.cppreference.com/cpp/language/operator_arithmetic
- 4: https://en.cppreference.com/cpp/language/implicit_conversion
- 5: https://learn.microsoft.com/en-us/cpp/cpp/standard-conversions?view=msvc-170
- 6: https://en.cppreference.com/cpp/language/usual_arithmetic_conversions
- 7: https://stackoverflow.com/questions/12889237/c-integer-overflow-and-promotion
- 8: https://stackoverflow.com/questions/62828526/why-is-uint32-max-1-0
- 9: https://discourse.llvm.org/t/libcxx-bugs-incoherent-behaviour-on-integer-overflow/54312
🏁 Script executed:
cd /tmp && git clone --depth 1 https://github.com/NVIDIA/cccl.git 2>&1 | head -20Repository: NVIDIA/cccl
Length of output: 140
🏁 Script executed:
cat -n cccl/thrust/testing/counting_iterator.cu | sed -n '270,290p'Repository: NVIDIA/cccl
Length of output: 128
🏁 Script executed:
# Find the Difference type definition in the test file
rg -A 5 -B 5 "Difference" cccl/thrust/testing/counting_iterator.cu | head -40Repository: NVIDIA/cccl
Length of output: 136
🏁 Script executed:
find /tmp/cccl -name "counting_iterator.cu" -type fRepository: NVIDIA/cccl
Length of output: 148
🏁 Script executed:
find /tmp/cccl -name "*counting_iterator*" -type fRepository: NVIDIA/cccl
Length of output: 426
🏁 Script executed:
cat -n /tmp/cccl/thrust/testing/counting_iterator.cu | sed -n '270,290p'Repository: NVIDIA/cccl
Length of output: 714
🏁 Script executed:
# Get more context around line 280 to understand the Difference type
cat -n /tmp/cccl/thrust/testing/counting_iterator.cu | sed -n '260,300p'Repository: NVIDIA/cccl
Length of output: 1492
🏁 Script executed:
# Search for Difference typedef or using declaration
rg -B 20 "std::numeric_limits<std::uint32_t>::max\(\) \+ 1" /tmp/cccl/thrust/testing/counting_iterator.cuRepository: NVIDIA/cccl
Length of output: 794
🏁 Script executed:
rg -A 10 "it_difference_t" /tmp/cccl/thrust/thrust/iterator/detail/counting_iterator.inl | head -30Repository: NVIDIA/cccl
Length of output: 157
🏁 Script executed:
# Look for the difference_type in counting_iterator definition
rg -B 5 -A 5 "difference_type" /tmp/cccl/thrust/thrust/iterator/counting_iterator.h | head -50Repository: NVIDIA/cccl
Length of output: 2024
🏁 Script executed:
# Find the definition of it_difference_t
find /tmp/cccl -name "*.h" -o -name "*.hpp" | xargs rg "template.*it_difference_t|using it_difference_t" | head -20Repository: NVIDIA/cccl
Length of output: 190
🏁 Script executed:
cat /tmp/cccl/thrust/thrust/iterator/iterator_traits.h | grep -A 5 "it_difference_t"Repository: NVIDIA/cccl
Length of output: 346
🏁 Script executed:
# Verify the difference_type for counting_iterator with uint64_t
cat /tmp/cccl/thrust/thrust/iterator/counting_iterator.h | sed -n '1,150p' | grep -A 20 "counting_iterator_difference_type"Repository: NVIDIA/cccl
Length of output: 1619
🏁 Script executed:
# Confirm ptrdiff_t is typically 64-bit on 64-bit systems and larger than uint32_t
python3 << 'PY'
# In C++, ptrdiff_t is typically signed long on 64-bit systems
# The issue is that:
# std::numeric_limits<std::uint32_t>::max() = 4294967295 (0xFFFFFFFF)
# std::numeric_limits<std::uint32_t>::max() + 1
# Both operands are uint32_t, so addition happens in uint32_t
# Result: 0xFFFFFFFF + 1 = 0x100000000, but wraps to 0x00000000 in uint32_t
# Then assigns 0 to ptrdiff_t
# The fix would cast first:
# static_cast<Difference>(std::numeric_limits<std::uint32_t>::max()) + 1
# This casts 0xFFFFFFFF to ptrdiff_t first (value 4294967295)
# Then adds 1 in ptrdiff_t, resulting in 4294967296
print("Original expression: std::numeric_limits<std::uint32_t>::max() + 1")
print("Operand types: uint32_t + int -> uint32_t + uint32_t")
print("Arithmetic: 0xFFFFFFFF + 0x00000001 = 0x100000000 (wraps to 0x00000000 in uint32_t)")
print("Result assigned to ptrdiff_t: 0")
print()
print("Fixed expression: static_cast<Difference>(std::numeric_limits<std::uint32_t>::max()) + 1")
print("Operand types: ptrdiff_t + int -> ptrdiff_t + ptrdiff_t")
print("Arithmetic: 4294967295 + 1 = 4294967296 (in ptrdiff_t)")
print("Result assigned to ptrdiff_t: 4294967296 ✓")
PYRepository: NVIDIA/cccl
Length of output: 535
important: the expression std::numeric_limits<std::uint32_t>::max() + 1 wraps to 0 in 32-bit unsigned arithmetic before assignment to Difference (which is ::cuda::std::ptrdiff_t for counting_iterator<std::uint64_t>). The test then validates with diff = 0, defeating its purpose to test >32-bit iterator differences. Cast before addition: static_cast<Difference>(std::numeric_limits<std::uint32_t>::max()) + 1.
This comment has been minimized.
This comment has been minimized.
miscco
left a comment
There was a problem hiding this comment.
This is just too much noise
@miscco this check is arguably even more valuable than the narrowing cast. I strongly urge us to keep it. Basically, this check warns on explicit conversions which still result in lossy conversion. I.e. a developer has come along, considered the possibility that |
|
noisy, but not too noisy. It affects ~1 LoC for a subset of files. Also, we could focus this warning and the narrowing one only for library code which requires higher quality |
f412ebc to
f30e5e9
Compare
😬 CI Workflow Results🟥 Finished in 6h 04m: Pass: 99%/443 | Total: 18d 12h | Max: 2h 42m | Hits: 29%/2218283See results here. |
Description
closes
Checklist