This directory contains evaluator implementations for computing tracking quality metrics in the evaluation pipeline.
Each tracker evaluator implements the TrackerEvaluator abstract base class (see ../base/tracker_evaluator.py) to:
- Configure which metrics to compute
- Process tracker outputs and ground-truth data
- Compute industry-standard tracking quality metrics
- Export results and optional plots
Evaluators handle metric-library-specific details (TrackEval, py-motmetrics, etc.) while providing a unified interface to the evaluation pipeline.
## TrackEvalEvaluator

Purpose: Compute tracking quality metrics using the TrackEval library with custom 3D point tracking support.
Status: FULLY IMPLEMENTED - Computes real metrics from tracker outputs using TrackEval library with custom MotChallenge3DPoint dataset class.
Supported Metrics:
- HOTA metrics: HOTA, DetA, AssA, LocA, DetPr, DetRe, AssPr, AssRe
- CLEAR MOT metrics: MOTA, MOTP, MT, ML, Frag
- Identity metrics: IDF1, IDP, IDR
For full metric list, refer to the TrackEval documentation: https://pypi.org/project/trackeval/.
Key Features:
- 3D Point Tracking: Custom `MotChallenge3DPoint` class extends TrackEval's `MotChallenge2DBox` with:
  - Euclidean distance similarity (instead of IoU)
  - 3D position extraction (x, y, z from the translation field)
  - Configurable distance threshold (default: 2.0 meters for 0.5 similarity)
- Format Conversion: Automatic conversion from canonical JSON format to MOTChallenge CSV
- UUID Mapping: Consistent UUID-to-integer ID mapping for track identity preservation
- Timestamp Handling: Frame synchronization via FPS-based timestamp-to-frame conversion
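The Euclidean similarity can be illustrated with a minimal sketch. The linear falloff (reaching 0.5 similarity at the default 2.0 m threshold) and the helper name are assumptions for illustration; the actual mapping lives inside the `MotChallenge3DPoint` class.

```python
import numpy as np

def euclidean_similarity(tracker_pos, gt_pos, distance_threshold=2.0):
    """Map 3D Euclidean distance to a [0, 1] similarity score.

    Hypothetical helper: a linear falloff that yields similarity 0.5
    exactly at `distance_threshold` and 0.0 at twice that distance.
    """
    # Pairwise distances between (N, 3) tracker and (M, 3) GT positions
    dists = np.linalg.norm(tracker_pos[:, None, :] - gt_pos[None, :, :], axis=2)
    return np.clip(1.0 - dists / (2.0 * distance_threshold), 0.0, 1.0)
```

With the default threshold, a detection 2.0 m from its ground-truth point scores 0.5, and anything beyond 4.0 m scores 0.0.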
Usage Example:
```python
import sys
from pathlib import Path

# Make sibling packages importable when running as a script
sys.path.insert(0, str(Path(__file__).parent))

from evaluators.trackeval_evaluator import TrackEvalEvaluator
from datasets.metric_test_dataset import MetricTestDataset
from harnesses.scene_controller_harness import SceneControllerHarness

# Initialize dataset
dataset = MetricTestDataset("path/to/dataset")
dataset.set_cameras(["x1", "x2"]).set_camera_fps(30)

# Initialize and run harness
harness = SceneControllerHarness(container_image='scenescape-controller:latest')
harness.set_scene_config(dataset.get_scene_config())
harness.set_custom_config({'tracker_config_path': '/path/to/tracker-config.json'})
tracker_outputs = harness.process_inputs(dataset.get_inputs())

# Initialize evaluator
evaluator = TrackEvalEvaluator()

# Configure metrics
evaluator.configure_metrics(['HOTA', 'MOTA', 'IDF1'])
evaluator.set_output_folder(Path('/path/to/results'))

# Process and evaluate
evaluator.process_tracker_outputs(
    tracker_outputs=tracker_outputs,
    ground_truth=dataset.get_ground_truth()
)

# Get metrics
metrics = evaluator.evaluate_metrics()
print(f"HOTA: {metrics['HOTA']:.3f}")
print(f"MOTA: {metrics['MOTA']:.3f}")
print(f"IDF1: {metrics['IDF1']:.3f}")
```

Current Limitations:
- Fixed class name ("pedestrian") for all objects
- Single-sequence evaluation only
- No parallel processing support
- Limited configuration options for TrackEval parameters
Implementation: trackeval_evaluator.py
Tests: See tests/test_trackeval_evaluator.py for comprehensive test suite with 16 test cases covering configuration, processing, evaluation, and integration workflows.
## JitterEvaluator

Purpose: Evaluate tracker smoothness by measuring positional jitter in tracked object trajectories, and compare it against jitter already present in the ground-truth test data.
Status: FULLY IMPLEMENTED — Computes RMS jerk and acceleration variance from both tracker outputs and ground-truth tracks using numerical differentiation.
Supported Metrics:
| Metric | Source | Description |
|---|---|---|
| `rms_jerk` | Tracker output | RMS jerk across all tracker output tracks (m/s³) |
| `acceleration_variance` | Tracker output | Variance of acceleration magnitudes across all tracker output tracks ((m/s²)²) |
| `rms_jerk_gt` | Ground truth | Same as `rms_jerk`, computed on ground-truth tracks |
| `acceleration_variance_gt` | Ground truth | Same as `acceleration_variance`, computed on ground-truth tracks |
Comparing `rms_jerk` with `rms_jerk_gt` shows how much jitter the tracker adds on top of any jitter already present in the test data.
Algorithm:
All metrics are derived by applying three sequential layers of forward finite differences to 3D positions, accounting for variable time steps between frames:
- `rms_jerk` / `rms_jerk_gt`: $\sqrt{\frac{1}{N}\sum |j_i|^2}$ over all jerk samples from all tracks.
- `acceleration_variance` / `acceleration_variance_gt`: $\text{Var}(|a_i|)$ over all acceleration magnitude samples from all tracks.
Minimum track length: 3 points for acceleration, 4 points for jerk. Shorter tracks are skipped; if no eligible tracks exist, the metric returns 0.0.
For GT metrics, ground-truth frame numbers are converted to relative timestamps using the FPS derived from the tracker output.
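As a concrete sketch of the differentiation scheme, per-track RMS jerk can be computed with three layers of forward differences over actual time deltas. The helper name is hypothetical and the evaluator's internal implementation may differ in detail:

```python
import numpy as np

def rms_jerk(positions, timestamps):
    """RMS jerk (m/s³) for one track, via three layers of forward differences.

    Hypothetical sketch: velocity, acceleration, and jerk are each computed
    as forward finite differences using the actual time deltas between frames.
    """
    p = np.asarray(positions, dtype=float)   # shape (N, 3)
    t = np.asarray(timestamps, dtype=float)  # shape (N,)
    if len(p) < 4:                           # need 4 points for one jerk sample
        return 0.0
    # Layer 1 — velocity: first forward difference of position
    v = np.diff(p, axis=0) / np.diff(t)[:, None]
    # Layer 2 — acceleration: forward difference of velocity
    a = np.diff(v, axis=0) / np.diff(t[:-1])[:, None]
    # Layer 3 — jerk: forward difference of acceleration
    j = np.diff(a, axis=0) / np.diff(t[:-2])[:, None]
    mags = np.linalg.norm(j, axis=1)         # jerk magnitudes |j_i|
    return float(np.sqrt(np.mean(mags ** 2)))
```

A constant-velocity track yields zero jerk, while a cubic trajectory such as x = t³ yields a constant jerk of 6 m/s³ under unit-step forward differences.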
Key Features:
- Builds per-track position histories from canonical tracker output format.
- Parses MOTChallenge 3D CSV ground-truth file for GT metric computation.
- Supports variable frame rates — time deltas are computed from actual timestamps.
- Deduplicates frames with identical timestamps (mirrors `TrackEvalEvaluator` behaviour).
- Sorts each track's positions by timestamp before metric computation.
- Saves a plain-text `jitter_results.txt` summary to the configured output folder.
Usage Example:
```python
from pathlib import Path

from evaluators.jitter_evaluator import JitterEvaluator

evaluator = JitterEvaluator()
evaluator.configure_metrics(['rms_jerk', 'rms_jerk_gt', 'acceleration_variance', 'acceleration_variance_gt'])
evaluator.set_output_folder(Path('/path/to/results'))

# Pass ground_truth=None to skip GT metrics
evaluator.process_tracker_outputs(tracker_outputs, ground_truth=dataset.get_ground_truth())

metrics = evaluator.evaluate_metrics()
print(f"RMS Jerk (tracker): {metrics['rms_jerk']:.4f} m/s³")
print(f"RMS Jerk (GT): {metrics['rms_jerk_gt']:.4f} m/s³")
print(f"Tracker-added jitter: {metrics['rms_jerk'] - metrics['rms_jerk_gt']:.4f} m/s³")
```

Pipeline Configuration:
```yaml
evaluators:
  - class: evaluators.jitter_evaluator.JitterEvaluator
    config:
      metrics:
        [rms_jerk, rms_jerk_gt, acceleration_variance, acceleration_variance_gt]
```

Implementation: jitter_evaluator.py
Tests: See tests/test_jitter_evaluator.py.
## Adding a New Evaluator

To add support for a new metric computation library:
- Create evaluator class: Implement all abstract methods from the `TrackerEvaluator` base class (see ../base/tracker_evaluator.py)
- Integrate metric library: Wrap the external library (TrackEval, py-motmetrics, etc.) or implement custom code to compute metrics
- Handle formats: Convert canonical tracker outputs and ground truth to library-specific formats
- Support configuration: `configure_metrics()` to specify which metrics to compute; `set_output_folder()` to set where results and plots are saved
- Document requirements: Update this README with supported metrics and configuration options
- Create tests: Add tests validating metric computation and result export
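The steps above can be sketched as a minimal skeleton. The method names follow the workflow described in this README; the `MyLibraryEvaluator` name and stubbed internals are hypothetical, and subclassing of `TrackerEvaluator` is omitted to keep the sketch self-contained:

```python
from pathlib import Path
from typing import Dict, List


class MyLibraryEvaluator:
    """Hypothetical evaluator skeleton; a real one subclasses TrackerEvaluator."""

    def __init__(self):
        self._metrics: List[str] = []
        self._output_folder = None
        self._results: Dict[str, float] = {}

    def configure_metrics(self, metrics):
        # Record which metrics to compute
        self._metrics = list(metrics)
        return self  # fluent API: configuration methods return self

    def set_output_folder(self, folder):
        # Where results and optional plots are saved
        self._output_folder = Path(folder)
        return self

    def process_tracker_outputs(self, tracker_outputs, ground_truth=None):
        # Convert canonical tracker outputs / ground truth to the metric
        # library's native format here; stubbed out for this sketch.
        self._results = {name: 0.0 for name in self._metrics}
        return self

    def evaluate_metrics(self) -> Dict[str, float]:
        # Return one float per configured metric
        return dict(self._results)

    def reset(self):
        # Clear state so another tracker can be evaluated
        self._metrics, self._results = [], {}
        return self
```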
Metric computation workflow:
- Configure metrics via `configure_metrics(['HOTA', 'MOTA', ...])`
- Set the result output folder via `set_output_folder(Path('/results'))`
- Process data via `process_tracker_outputs(tracker_outputs, ground_truth)`
- Compute metrics via `evaluate_metrics()` → returns `Dict[str, float]`
- Reset state via `reset()` to evaluate another tracker
Method chaining:
All configuration methods return self for fluent API:
```python
metrics = (evaluator
           .configure_metrics(['HOTA', 'MOTA'])
           .set_output_folder(Path('/results'))
           .process_tracker_outputs(outputs, gt)
           .evaluate_metrics())
```

Ground-truth format: Evaluators receive ground truth in MOTChallenge 3D CSV format; see Canonical Data Formats.
- Provided by the dataset's `get_ground_truth()` method
See tracker-evaluation-pipeline.md for overall architecture and design decisions.