Context
Surfaced by /review-pr on PR #190 (Performance reviewer). tracked_tip_pipeline.py (around line 247) builds result["trajectories"] via for _, row in df.iterrows(): plus a result["trajectories"].append({...}) per row. df.iterrows() is the documented pandas anti-pattern (per-row Series construction + dtype boxing).
This is sub-second at current scale (4-plate × 6-track × 311-frame ≈ 7,460 rows). At 100-plate × 5,000-frame × 20-track scale (~10M rows) it becomes the dominant compute cost.
Proposal
Replace with df.to_dict(orient="records") plus explicit dtype coercion, OR a vectorized zip over numpy arrays. Both are ~5-10× faster than iterrows. Pick whichever benchmarks better.
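A minimal sketch of the two options next to the current pattern, for benchmarking. The column names (track_id, frame, x, y) and the helper names are hypothetical stand-ins, since the actual fields of the appended dict are not shown here; the real pipeline would substitute its own columns and dtypes.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical column set; the real trajectory fields may differ.
DTYPES = {"track_id": "int64", "frame": "int64", "x": "float64", "y": "float64"}


def make_df(n_rows, seed=0):
    """Synthetic stand-in for the tracking DataFrame (assumed schema)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "track_id": rng.integers(0, 6, size=n_rows),
        "frame": np.arange(n_rows),
        "x": rng.random(n_rows),
        "y": rng.random(n_rows),
    })


def build_iterrows(df):
    """Baseline: per-row loop with an append per row."""
    out = []
    for _, row in df.iterrows():
        out.append({
            "track_id": int(row["track_id"]),
            "frame": int(row["frame"]),
            "x": float(row["x"]),
            "y": float(row["y"]),
        })
    return out


def build_records(df):
    """Option 1: coerce dtypes once per column, then to_dict(orient="records")."""
    return df[list(DTYPES)].astype(DTYPES).to_dict(orient="records")


def build_zip(df):
    """Option 2: zip over per-column arrays; tolist() yields plain Python scalars."""
    cols = [df[name].to_numpy(dtype=dtype).tolist() for name, dtype in DTYPES.items()]
    keys = list(DTYPES)
    return [dict(zip(keys, values)) for values in zip(*cols)]


if __name__ == "__main__":
    df = make_df(7_464)  # roughly the current scale quoted above
    for fn in (build_iterrows, build_records, build_zip):
        start = time.perf_counter()
        fn(df)
        print(f"{fn.__name__}: {time.perf_counter() - start:.4f} s")
```

Running the script prints one timing per variant. Actual speedups depend on the pandas version and the real column set, so the numbers should be measured on representative data before picking one.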
Acceptance
Related
TrackedTipPipeline for SLEAP-Trained Root Tip Tracking (#129): TrackedTipPipeline. Surfaced by the multi-agent review.