Problem
inspect() in mipcandy/data/inspection.py has two unnecessary performance bottlenecks:
1. Per-element tolist() in class_locations (line 294)
class_locations[class_id] = [tuple(coord.tolist()[1:]) for coord in indices]
This iterates over up to 10,000 single-row tensors, and each tolist() call triggers a separate Python-C++ bridge call. It should be a single batch operation:
class_locations[class_id] = tuple(tuple(loc) for loc in indices[:, 1:].tolist())
2. Redundant label != background (line 275 vs 301)
The boolean mask is computed twice on the full label tensor:
indices = (label != background).nonzero() # line 275
# ...
fg_mask = label != background # line 301 (same operation)
The mask should be computed once and reused.
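
A minimal sketch of both changes together, assuming `label` is an integer tensor and `background` is 0. The helper names `locations_per_row` and `locations_batched` are hypothetical and exist only for this illustration; they are not functions in mipcandy. The sketch keeps the original list-of-tuples return type so the two variants compare equal, whereas the one-liner proposed above builds a tuple of tuples, so callers that rely on a list may need a check.

```python
import torch


def locations_per_row(indices: torch.Tensor) -> list[tuple[int, ...]]:
    # Current pattern: one tolist() call per row of `indices`.
    return [tuple(coord.tolist()[1:]) for coord in indices]


def locations_batched(indices: torch.Tensor) -> list[tuple[int, ...]]:
    # Proposed pattern: slice off the first index column, then a single tolist() call.
    return [tuple(loc) for loc in indices[:, 1:].tolist()]


background = 0  # assumed background class id for this example
label = torch.tensor([[[0, 1, 0], [2, 0, 1]]])  # toy label of shape (1, 2, 3)

fg_mask = label != background  # computed once
indices = fg_mask.nonzero()    # reused here instead of recomputing the comparison
fg_values = label[fg_mask]     # and reused again for foreground-only work

# Both variants yield the same coordinates.
assert locations_per_row(indices) == locations_batched(indices)
```

The only behavioral difference between the two helpers is how many times the Python/C++ boundary is crossed: once per row versus once in total.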
Benchmark
Tested on PH2 (200 2D samples) and BRaTS (368 3D volumes, ~240x240x155):
PH2
| Variant | Time | Speedup |
|---|---|---|
| baseline | 31.7s | 1.00x |
| batch tolist | 16.6s | 1.91x |
| + reuse mask | 16.7s | 1.90x |
BRaTS
| Variant | Time | Speedup |
|---|---|---|
| baseline | 360s | 1.00x |
| batch tolist | 341s | 1.06x |
| + reuse mask | 335s | 1.08x |
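On 2D datasets, the batch tolist() change (optimization 1) alone nearly halves execution time, while reusing the mask makes no measurable difference. On 3D datasets, I/O dominates, but both optimizations still provide a small, measurable improvement.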
Notes
- Streaming statistics (replacing torch.cat + np.percentile with online mean/std and reservoir sampling) were also benchmarked but showed no improvement (even slightly slower on BRaTS) and introduced approximation error in percentile values. Not worth pursuing.
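
For reference, the streaming approach mentioned in the note can be sketched roughly as follows. This is not the code that was benchmarked: the class `StreamingStats` and its parameters are hypothetical, it uses Welford's update for mean/std (population variance is an assumption) and Algorithm R for the reservoir. It illustrates where the approximation error comes from: once the stream is longer than the reservoir, percentiles are read from a random subsample rather than from all values.

```python
import random


class StreamingStats:
    """Online mean/std (Welford) plus a fixed-size reservoir for approximate percentiles."""

    def __init__(self, reservoir_size: int, seed: int = 0) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self._size = reservoir_size
        self.reservoir: list[float] = []
        self._rng = random.Random(seed)

    def update(self, x: float) -> None:
        # Welford's online update for mean and variance.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        # Reservoir sampling (Algorithm R): keep a uniform sample of the stream.
        if len(self.reservoir) < self._size:
            self.reservoir.append(x)
        else:
            j = self._rng.randrange(self.count)
            if j < self._size:
                self.reservoir[j] = x

    @property
    def std(self) -> float:
        # Population standard deviation (assumed convention for this sketch).
        return (self._m2 / self.count) ** 0.5 if self.count else 0.0

    def percentile(self, q: float) -> float:
        # Nearest-rank percentile over the reservoir only, hence approximate.
        ordered = sorted(self.reservoir)
        return ordered[round(q / 100 * (len(ordered) - 1))]


stats = StreamingStats(reservoir_size=100)
for v in range(1, 1001):
    stats.update(float(v))
```

Mean and std are exact up to floating-point error, but `percentile` depends on which 100 of the 1,000 values happened to stay in the reservoir, which matches the approximation error reported above.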