Commit de4287c

Improve fastcall half-float benchmark stability
* Increase ctypes benchmark samples

  Use more ASV repeats for the half-float ctypes round-trip benchmark. The extra samples reduce noise while keeping the targeted benchmark runtime well below the expected limit.

  Signed-off-by: Eric Shi <ershi@nvidia.com>

* Improve half-float ctypes benchmark

  Reduce the round-trip ctypes sample size so each ASV measurement is short enough to avoid millisecond-scale scheduler noise. Increase the method-specific repeat count to preserve statistical power while keeping the targeted benchmark runtime bounded.

  Signed-off-by: Eric Shi <ershi@nvidia.com>

Approved-by: Eric Shi <ershi@nvidia.com>

See merge request omniverse/warp!2406
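The trade-off described in the commit message can be checked with simple arithmetic. The sketch below uses only the loop sizes and repeat counts visible in the diff for `time_round_trip_ctypes`; it assumes ASV collects exactly `repeat` samples with `number = 1` and ignores warmup, which may not hold for every ASV configuration. The helper name `total_round_trips` is made up for illustration.

```python
# Loop sizes and repeat counts taken from the diff for time_round_trip_ctypes.
OLD_LOOP, OLD_REPEAT = 5_000, 300      # class-level repeat applied before
NEW_LOOP, NEW_REPEAT = 100, 20_000     # method-specific repeat added here


def total_round_trips(loop_size: int, repeat: int) -> int:
    """Round trips executed across all samples (assumes number = 1)."""
    return loop_size * repeat


old_total = total_round_trips(OLD_LOOP, OLD_REPEAT)
new_total = total_round_trips(NEW_LOOP, NEW_REPEAT)

print(f"inner loop per sample shrinks by {OLD_LOOP // NEW_LOOP}x")
print(f"sample count grows by {NEW_REPEAT / OLD_REPEAT:.1f}x")
print(f"total round trips: {old_total:,} -> {new_total:,}")
```

Under these assumptions each sample becomes 50 times shorter while the total work grows only from 1.5 million to 2 million round trips, which is consistent with the message's claim that the runtime stays bounded.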
1 parent 7b97e6e commit de4287c

1 file changed

Lines changed: 7 additions & 4 deletions

asv/benchmarks/fastcall.py

@@ -19,9 +19,11 @@
 class HalfFloatConversion:
     """Benchmark half-float conversion via METH_FASTCALL and ctypes paths."""
 
-    # The fastcall path is short enough that a ~2 ms sample can hide scheduler
-    # noise. Keep those samples near 0.05-0.10 ms and compensate with more ASV
-    # repeats. The ctypes path is slower, so keep its sampling unchanged.
+    # Short fastcall loops can hide scheduler noise in millisecond-scale
+    # samples. Keep those samples near 0.05-0.10 ms and compensate with more
+    # ASV repeats. The single-conversion ctypes benchmarks still use larger
+    # inner loops; only the round-trip ctypes benchmark uses a shorter loop
+    # because it performs two ctypes calls per iteration.
     repeat = 300
     number = 1
     warmup_time = 0.1
@@ -63,10 +65,11 @@ def time_round_trip_fastcall(self):
     def time_round_trip_ctypes(self):
         to_half = self.ctypes.wp_float_to_half_bits
         to_float = self.ctypes.wp_half_bits_to_float
-        for _ in range(5_000):
+        for _ in range(100):
             to_float(to_half(1.0))
 
 
 HalfFloatConversion.time_float_to_half_bits_fastcall.repeat = 2_000
 HalfFloatConversion.time_half_bits_to_float_fastcall.repeat = 2_000
 HalfFloatConversion.time_round_trip_fastcall.repeat = 2_000
+HalfFloatConversion.time_round_trip_ctypes.repeat = 20_000
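
The trailing assignments in the diff rely on a Python detail: accessing a method through its class yields a plain function, so it can carry its own `repeat` attribute alongside the class-level default. The sketch below illustrates that mechanism with a stand-in class. `HalfFloatConversionSketch`, `time_single_conversion`, and `effective_repeat` are hypothetical names, and the resolver is only an approximation of how a harness might pick the method-level value over the class default, not ASV's actual code.

```python
class HalfFloatConversionSketch:
    """Stand-in for the benchmark class; not the real Warp benchmark."""

    repeat = 300  # class-level default shared by every time_* method
    number = 1

    def time_round_trip_fastcall(self):
        pass

    def time_single_conversion(self):
        pass


# In Python 3, Class.method is a plain function, so it accepts attributes.
HalfFloatConversionSketch.time_round_trip_fastcall.repeat = 2_000


def effective_repeat(cls, method_name: str) -> int:
    """Hypothetical resolver: method attribute first, then the class default."""
    func = getattr(cls, method_name)
    return getattr(func, "repeat", cls.repeat)


for name in ("time_round_trip_fastcall", "time_single_conversion"):
    print(name, effective_repeat(HalfFloatConversionSketch, name))
```

Only the method that received an explicit assignment reports the overridden value; the other falls back to the class-level `repeat = 300`, which is why the single-conversion ctypes benchmarks in the real file keep their original sampling.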
