This directory contains evaluator implementations for computing tracking quality metrics in the evaluation pipeline.
Each tracker evaluator implements the TrackerEvaluator abstract base class (see ../base/tracker_evaluator.py) to:
- Configure which metrics to compute
- Process tracker outputs and ground-truth data
- Compute industry-standard tracking quality metrics
- Export results and optional plots
Evaluators handle metric-library-specific details (TrackEval, py-motmetrics, etc.) while providing a unified interface to the evaluation pipeline.
Purpose: Compute tracking quality metrics using the TrackEval library with custom 3D point tracking support.
Status: FULLY IMPLEMENTED - Computes real metrics from tracker outputs using TrackEval library with custom MotChallenge3DPoint dataset class.
Supported Metrics:
- HOTA metrics: HOTA, DetA, AssA, LocA, DetPr, DetRe, AssPr, AssRe
- CLEAR MOT metrics: MOTA, MOTP, MT, ML, Frag
- Identity metrics: IDF1, IDP, IDR
For full metric list, refer to the TrackEval documentation: https://pypi.org/project/trackeval/.
Key Features:
- 3D Point Tracking: Custom
MotChallenge3DPointclass extends TrackEval'sMotChallenge2DBoxwith:- Euclidean distance similarity (instead of IoU)
- 3D position extraction (x, y, z from translation field)
- Configurable distance threshold (default: 2.0 meters for 0.5 similarity)
- Format Conversion: Automatic conversion from canonical JSON format to MOTChallenge CSV
- UUID Mapping: Consistent UUID-to-integer ID mapping for track identity preservation
- Timestamp Handling: Frame synchronization via FPS-based timestamp-to-frame conversion
Usage Example:
import sys
from pathlib import Path
# Add parent directories to path
sys.path.insert(0, str(Path(__file__).parent))
from evaluators.trackeval_evaluator import TrackEvalEvaluator
from datasets.metric_test_dataset import MetricTestDataset
from harnesses.scene_controller_harness import SceneControllerHarness
# Initialize dataset
dataset = MetricTestDataset("path/to/dataset")
dataset.set_cameras(["x1", "x2"]).set_camera_fps(30)
# Initialize and run harness
harness = SceneControllerHarness(container_image='scenescape-controller:latest')
harness.set_scene_config(dataset.get_scene_config())
harness.set_custom_config({'tracker_config_path': '/path/to/tracker-config.json'})
tracker_outputs = harness.process_inputs(dataset.get_inputs())
# Initialize evaluator
evaluator = TrackEvalEvaluator()
# Configure metrics
evaluator.configure_metrics(['HOTA', 'MOTA', 'IDF1'])
evaluator.set_output_folder(Path('/path/to/results'))
# Process and evaluate
evaluator.process_tracker_outputs(
tracker_outputs=tracker_outputs,
ground_truth=dataset.get_ground_truth()
)
# Get metrics
metrics = evaluator.evaluate_metrics()
print(f"HOTA: {metrics['HOTA']:.3f}")
print(f"MOTA: {metrics['MOTA']:.3f}")
print(f"IDF1: {metrics['IDF1']:.3f}")Current Limitations:
- Fixed class name ("pedestrian") for all objects
- Single-sequence evaluation only
- No parallel processing support
- Limited configuration options for TrackEval parameters
Implementation: trackeval_evaluator.py
Tests: See tests/test_trackeval_evaluator.py for comprehensive test suite with 16 test cases covering configuration, processing, evaluation, and integration workflows.
Purpose: Evaluate tracker smoothness by measuring positional jitter in tracked object trajectories, and compare it against jitter already present in the ground-truth test data.
Status: FULLY IMPLEMENTED — Computes RMS jerk and acceleration variance from both tracker outputs and ground-truth tracks using numerical differentiation.
Supported Metrics:
| Metric | Source | Description |
|---|---|---|
rms_jerk |
Tracker output | RMS jerk across all tracker output tracks (m/s³) |
acceleration_variance |
Tracker output | Variance of acceleration magnitudes across all tracker output tracks (m/s²)² |
rms_jerk_gt |
Ground truth | Same as rms_jerk computed on ground-truth tracks |
acceleration_variance_gt |
Ground truth | Same as acceleration_variance computed on ground-truth tracks |
rms_jerk_ratio |
Tracker / GT | rms_jerk / rms_jerk_gt — tracker jitter relative to GT (1.0 = equal) |
acceleration_variance_ratio |
Tracker / GT | acceleration_variance / acceleration_variance_gt |
Comparing rms_jerk with rms_jerk_gt shows how much jitter the tracker
adds on top of any jitter already present in the test data.
Algorithm:
All metrics are derived by applying three sequential layers of forward finite differences to 3D positions, accounting for variable time steps between frames:
-
rms_jerk / rms_jerk_gt:
$\sqrt{\frac{1}{N}\sum |j_i|^2}$ over all jerk samples from all tracks. -
acceleration_variance / acceleration_variance_gt:
$\text{Var}(|a_i|)$ over all acceleration magnitude samples from all tracks. - rms_jerk_ratio / acceleration_variance_ratio: tracker metric divided by the corresponding GT metric. Returns 0.0 when the GT denominator is zero. Values >1.0 indicate the tracker adds more jitter than is inherent in the ground truth.
Minimum track length: 3 points for acceleration, 4 points for jerk. Shorter tracks are skipped; if no eligible tracks exist, the metric returns 0.0.
For GT metrics, ground-truth frame numbers are converted to relative timestamps using the FPS derived from the tracker output.
Key Features:
- Builds per-track position histories from canonical tracker output format.
- Parses MOTChallenge 3D CSV ground-truth file for GT metric computation.
- Supports variable frame rates — time deltas are computed from actual timestamps.
- Deduplicates frames with identical timestamps (mirrors
TrackEvalEvaluatorbehaviour). - Sorts each track's positions by timestamp before metric computation.
- Saves a plain-text
jitter_results.txtsummary to the configured output folder.
Purpose: Per-frame location comparison and error analysis between matched output tracks and ground-truth tracks.
Status: FULLY IMPLEMENTED - Bipartite track matching with per-frame location and distance CSV/plot outputs.
Supported Metrics:
- LOC_T_X: Per-frame X position of each matched (output, GT) track pair
- LOC_T_Y: Per-frame Y position of each matched (output, GT) track pair
- DIST_T: Per-frame Euclidean distance error between each matched pair
Key Features:
- Track Matching: Bipartite assignment (Hungarian algorithm) minimizing mean Euclidean distance over overlapping frames. Requires a minimum of 10 overlapping frames (
MIN_OVERLAP_FRAMES). - Missing Frame Handling: Frames where only one side (output or GT) has data produce
NaNin CSV output, preserving full temporal context. - CSV Output: Per-metric CSV files with headers:
- LOC_T_X / LOC_T_Y:
[frame_id, track_id, gt_id, value_track, value_gt] - DIST_T:
[frame_id, track_id, gt_id, distance]
- LOC_T_X / LOC_T_Y:
- Plot Output: One matplotlib figure per metric with all matched pairs overlaid.
- Summary Scalars:
evaluate_metrics()returnsDIST_T_mean,LOC_T_X_mae,LOC_T_Y_mae, andnum_matches.
Usage Example:
from pathlib import Path
from evaluators.jitter_evaluator import JitterEvaluator
evaluator = JitterEvaluator()
evaluator.configure_metrics(['rms_jerk', 'rms_jerk_gt', 'rms_jerk_ratio',
'acceleration_variance', 'acceleration_variance_gt',
'acceleration_variance_ratio'])
evaluator.set_output_folder(Path('/path/to/results'))
# Pass ground_truth=None to skip GT metrics
evaluator.process_tracker_outputs(tracker_outputs, ground_truth=dataset.get_ground_truth())
metrics = evaluator.evaluate_metrics()
print(f"RMS Jerk (tracker): {metrics['rms_jerk']:.4f} m/s³")
print(f"RMS Jerk (GT): {metrics['rms_jerk_gt']:.4f} m/s³")
print(f"RMS Jerk ratio: {metrics['rms_jerk_ratio']:.4f} (1.0 = equal jitter)")Pipeline Configuration:
evaluators:
- class: evaluators.jitter_evaluator.JitterEvaluator
config:
metrics:
[
rms_jerk,
rms_jerk_gt,
rms_jerk_ratio,
acceleration_variance,
acceleration_variance_gt,
acceleration_variance_ratio,
]Implementation: jitter_evaluator.py
Tests: See tests/test_jitter_evaluator.py. from evaluators.diagnostic_evaluator import DiagnosticEvaluator from pathlib import Path
evaluator = DiagnosticEvaluator() metrics = (evaluator .configure_metrics(['LOC_T_X', 'LOC_T_Y', 'DIST_T']) .set_output_folder(Path('/path/to/results')) .process_tracker_outputs(tracker_outputs, gt_file_path) .evaluate_metrics()) print(f"Mean distance: {metrics['DIST_T_mean']:.3f}") print(f"X MAE: {metrics['LOC_T_X_mae']:.3f}") print(f"Y MAE: {metrics['LOC_T_Y_mae']:.3f}") print(f"Matched pairs: {int(metrics['num_matches'])}")
**Current Limitations**:
- Uses only X and Y coordinates (Z ignored)
- Single-sequence evaluation only
- No configurable overlap threshold (fixed at 10 frames)
**Implementation**: [diagnostic_evaluator.py](diagnostic_evaluator.py)
**Tests**: See [tests/test_diagnostic_evaluator.py](tests/test_diagnostic_evaluator.py) for unit tests covering track matching, scalar metrics, CSV output, and reset workflows.
## Adding New Evaluators
To add support for a new metric computation library:
1. **Create evaluator class**: Implement all abstract methods from `TrackerEvaluator` base class (see [../base/tracker_evaluator.py](../base/tracker_evaluator.py))
2. **Integrate metric library**: Wrap the external library (TrackEval, py-motmetrics, etc.) or implement custom code to compute metrics
3. **Handle formats**: Convert canonical tracker outputs and ground-truth to library-specific formats
4. **Support configuration**:
- `configure_metrics()` - specify which metrics to compute
- `set_output_folder()` - where to save results and plots
5. **Document requirements**: Update this README with supported metrics and configuration options
6. **Create tests**: Add tests validating metric computation and result export
### Implementation Patterns
**Metric computation workflow**:
1. Configure metrics via `configure_metrics(['HOTA', 'MOTA', ...])`
2. Set result output folder via `set_output_folder(Path('/results'))`
3. Process data via `process_tracker_outputs(tracker_outputs, ground_truth)`
4. Compute metrics via `evaluate_metrics()` → returns `Dict[str, float]`
5. Reset state via `reset()` to evaluate another tracker
**Method chaining**:
All configuration methods return `self` for fluent API:
```python
metrics = (evaluator
.configure_metrics(['HOTA', 'MOTA'])
.set_output_folder(Path('/results'))
.process_tracker_outputs(outputs, gt)
.evaluate_metrics())
Ground-truth format: Evaluators receive ground-truth in MOTChallenge 3D CSV format: See Canonical Data Formats
- Provided by dataset's
get_ground_truth()method
See tracker-evaluation-pipeline.md for overall architecture and design decisions.