Tracker Evaluators

This directory contains evaluator implementations for computing tracking quality metrics in the evaluation pipeline.

Overview

Each tracker evaluator implements the TrackerEvaluator abstract base class (see ../base/tracker_evaluator.py) to:

  • Configure which metrics to compute
  • Process tracker outputs and ground-truth data
  • Compute industry-standard tracking quality metrics
  • Export results and optional plots

Evaluators handle metric-library-specific details (TrackEval, py-motmetrics, etc.) while providing a unified interface to the evaluation pipeline.

Available Evaluators

TrackEvalEvaluator

Purpose: Compute tracking quality metrics using the TrackEval library with custom 3D point tracking support.

Status: FULLY IMPLEMENTED - Computes real metrics from tracker outputs using the TrackEval library with a custom MotChallenge3DPoint dataset class.

Supported Metrics:

  • HOTA metrics: HOTA, DetA, AssA, LocA, DetPr, DetRe, AssPr, AssRe
  • CLEAR MOT metrics: MOTA, MOTP, MT, ML, Frag
  • Identity metrics: IDF1, IDP, IDR

For the full metric list, refer to the TrackEval documentation: https://pypi.org/project/trackeval/.

Key Features:

  • 3D Point Tracking: Custom MotChallenge3DPoint class extends TrackEval's MotChallenge2DBox with:
    • Euclidean distance similarity (instead of IoU)
    • 3D position extraction (x, y, z from translation field)
    • Configurable distance threshold (default: a distance of 2.0 meters maps to a similarity of 0.5)
  • Format Conversion: Automatic conversion from canonical JSON format to MOTChallenge CSV
  • UUID Mapping: Consistent UUID-to-integer ID mapping for track identity preservation
  • Timestamp Handling: Frame synchronization via FPS-based timestamp-to-frame conversion
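
The three conversions listed above (distance-based similarity, UUID-to-integer mapping, and timestamp-to-frame conversion) can be illustrated with a minimal sketch. This is not the code in trackeval_evaluator.py: the function and class names are hypothetical, the linear falloff is only one mapping consistent with "2.0 meters for 0.5 similarity", and the 1-based frame index relative to a start timestamp is an assumption.

```python
import numpy as np


def euclidean_similarity(gt_xyz, trk_xyz, half_similarity_distance=2.0):
    """Pairwise similarity in [0, 1] from 3D Euclidean distance.

    Assumed mapping: linear falloff, 1.0 at zero distance, 0.5 at
    `half_similarity_distance`, and 0.0 at twice that distance or more.
    """
    gt = np.asarray(gt_xyz, dtype=float).reshape(-1, 3)
    trk = np.asarray(trk_xyz, dtype=float).reshape(-1, 3)
    # Broadcast to a (num_gt, num_trk) matrix of distances.
    dist = np.linalg.norm(gt[:, None, :] - trk[None, :, :], axis=2)
    return np.maximum(0.0, 1.0 - dist / (2.0 * half_similarity_distance))


class UuidToIntMapper:
    """Assigns each UUID a stable positive integer in first-seen order."""

    def __init__(self):
        self._ids = {}

    def __call__(self, uuid):
        # An already seen UUID keeps its integer; a new one gets the next value.
        return self._ids.setdefault(uuid, len(self._ids) + 1)


def timestamp_to_frame(timestamp_s, start_s, fps):
    """1-based frame index for a timestamp in seconds (assumed convention)."""
    return int(round((timestamp_s - start_s) * fps)) + 1
```

With these definitions, a tracker point 2.0 meters from a ground-truth point scores 0.5, and a repeated UUID always maps to the same integer ID.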

Usage Example:

import sys
from pathlib import Path

# Add parent directories to path
sys.path.insert(0, str(Path(__file__).parent))

from evaluators.trackeval_evaluator import TrackEvalEvaluator
from datasets.metric_test_dataset import MetricTestDataset
from harnesses.scene_controller_harness import SceneControllerHarness

# Initialize dataset
dataset = MetricTestDataset("path/to/dataset")
dataset.set_cameras(["x1", "x2"]).set_camera_fps(30)

# Initialize and run harness
harness = SceneControllerHarness(container_image='scenescape-controller:latest')
harness.set_scene_config(dataset.get_scene_config())
harness.set_custom_config({'tracker_config_path': '/path/to/tracker-config.json'})
tracker_outputs = harness.process_inputs(dataset.get_inputs())

# Initialize evaluator
evaluator = TrackEvalEvaluator()

# Configure metrics
evaluator.configure_metrics(['HOTA', 'MOTA', 'IDF1'])
evaluator.set_output_folder(Path('/path/to/results'))

# Process and evaluate
evaluator.process_tracker_outputs(
    tracker_outputs=tracker_outputs,
    ground_truth=dataset.get_ground_truth()
)

# Get metrics
metrics = evaluator.evaluate_metrics()
print(f"HOTA: {metrics['HOTA']:.3f}")
print(f"MOTA: {metrics['MOTA']:.3f}")
print(f"IDF1: {metrics['IDF1']:.3f}")

Current Limitations:

  • Fixed class name ("pedestrian") for all objects
  • Single-sequence evaluation only
  • No parallel processing support
  • Limited configuration options for TrackEval parameters

Implementation: trackeval_evaluator.py

Tests: See tests/test_trackeval_evaluator.py for the test suite of 16 test cases covering configuration, processing, evaluation, and integration workflows.

JitterEvaluator

Purpose: Evaluate tracker smoothness by measuring positional jitter in tracked object trajectories, and compare it against jitter already present in the ground-truth test data.

Status: FULLY IMPLEMENTED - Computes RMS jerk and acceleration variance from both tracker outputs and ground-truth tracks using numerical differentiation.

Supported Metrics:

| Metric | Source | Description |
| --- | --- | --- |
| rms_jerk | Tracker output | RMS jerk across all tracker output tracks (m/s³) |
| acceleration_variance | Tracker output | Variance of acceleration magnitudes across all tracker output tracks ((m/s²)²) |
| rms_jerk_gt | Ground truth | Same as rms_jerk, computed on ground-truth tracks |
| acceleration_variance_gt | Ground truth | Same as acceleration_variance, computed on ground-truth tracks |
| rms_jerk_ratio | Tracker / GT | rms_jerk / rms_jerk_gt: tracker jitter relative to GT (1.0 = equal) |
| acceleration_variance_ratio | Tracker / GT | acceleration_variance / acceleration_variance_gt |

Comparing rms_jerk with rms_jerk_gt shows how much jitter the tracker adds on top of any jitter already present in the test data.

Algorithm:

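All metrics are derived by applying three successive layers of forward finite differences to 3D positions, using the actual (possibly variable) time step between samples at each layer:

$$v_i = \frac{p_{i+1} - p_i}{\Delta t_i}, \quad a_i = \frac{v_{i+1} - v_i}{\Delta t_{v,i}}, \quad j_i = \frac{a_{i+1} - a_i}{\Delta t_{a,i}}$$

Here $\Delta t_i$ is the time between consecutive positions, and $\Delta t_{v,i}$ and $\Delta t_{a,i}$ are the time steps between consecutive velocity and acceleration samples, respectively.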

  • rms_jerk / rms_jerk_gt: $\sqrt{\frac{1}{N}\sum |j_i|^2}$ over all jerk samples from all tracks.
  • acceleration_variance / acceleration_variance_gt: $\text{Var}(|a_i|)$ over all acceleration magnitude samples from all tracks.
  • rms_jerk_ratio / acceleration_variance_ratio: tracker metric divided by the corresponding GT metric. Returns 0.0 when the GT denominator is zero. Values >1.0 indicate the tracker adds more jitter than is inherent in the ground truth.

Minimum track length: 3 points for acceleration, 4 points for jerk. Shorter tracks are skipped; if no eligible tracks exist, the metric returns 0.0.

For GT metrics, ground-truth frame numbers are converted to relative timestamps using the FPS derived from the tracker output.
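
The algorithm above can be sketched in a few lines of NumPy. This is an illustration, not the code in jitter_evaluator.py, and it rests on several assumptions: each derivative sample is stamped at the start of its interval (so higher layers reuse the original frame deltas), variance is the population variance, ground-truth timestamps are taken relative to the first frame, and every track is already sorted and deduplicated by timestamp. All function names are hypothetical.

```python
import numpy as np


def frames_to_timestamps(frames, fps):
    """Relative timestamps in seconds for ground-truth frame numbers."""
    frames = np.asarray(frames, dtype=float)
    return (frames - frames[0]) / fps


def forward_diff(values, times):
    """One layer of forward differences; returns samples and their timestamps."""
    values = np.asarray(values, dtype=float)
    times = np.asarray(times, dtype=float)
    dt = np.diff(times)  # variable time steps, one per interval
    return np.diff(values, axis=0) / dt[:, None], times[:-1]


def jitter_metrics(tracks):
    """tracks: iterable of (timestamps, positions), positions shaped (n, 3)."""
    acc_mags, jerk_sq = [], []
    for times, positions in tracks:
        if len(times) < 3:  # too short for acceleration
            continue
        vel, t_v = forward_diff(positions, times)
        acc, t_a = forward_diff(vel, t_v)
        acc_mags.extend(np.linalg.norm(acc, axis=1))
        if len(times) >= 4:  # jerk needs one more point
            jerk, _ = forward_diff(acc, t_a)
            jerk_sq.extend(np.sum(jerk ** 2, axis=1))
    return {
        "rms_jerk": float(np.sqrt(np.mean(jerk_sq))) if jerk_sq else 0.0,
        "acceleration_variance": float(np.var(acc_mags)) if acc_mags else 0.0,
    }


def safe_ratio(tracker_value, gt_value):
    """Tracker metric divided by GT metric; 0.0 when the denominator is zero."""
    return tracker_value / gt_value if gt_value != 0 else 0.0
```

For a track whose x coordinate follows t³ sampled at unit intervals, every jerk sample equals 6, so rms_jerk is 6.0. That makes a convenient sanity check for any implementation of the formulas.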

Key Features:

  • Builds per-track position histories from canonical tracker output format.
  • Parses MOTChallenge 3D CSV ground-truth file for GT metric computation.
  • Supports variable frame rates — time deltas are computed from actual timestamps.
  • Deduplicates frames with identical timestamps (mirrors TrackEvalEvaluator behaviour).
  • Sorts each track's positions by timestamp before metric computation.
  • Saves a plain-text jitter_results.txt summary to the configured output folder.

Usage Example:

from pathlib import Path
from evaluators.jitter_evaluator import JitterEvaluator

# tracker_outputs and dataset are created as in the TrackEvalEvaluator example
evaluator = JitterEvaluator()
evaluator.configure_metrics(['rms_jerk', 'rms_jerk_gt', 'rms_jerk_ratio',
                             'acceleration_variance', 'acceleration_variance_gt',
                             'acceleration_variance_ratio'])
evaluator.set_output_folder(Path('/path/to/results'))

# Pass ground_truth=None to skip GT metrics
evaluator.process_tracker_outputs(tracker_outputs, ground_truth=dataset.get_ground_truth())
metrics = evaluator.evaluate_metrics()

print(f"RMS Jerk (tracker): {metrics['rms_jerk']:.4f} m/s³")
print(f"RMS Jerk (GT):      {metrics['rms_jerk_gt']:.4f} m/s³")
print(f"RMS Jerk ratio:     {metrics['rms_jerk_ratio']:.4f}  (1.0 = equal jitter)")

Pipeline Configuration:

evaluators:
  - class: evaluators.jitter_evaluator.JitterEvaluator
    config:
      metrics:
        [
          rms_jerk,
          rms_jerk_gt,
          rms_jerk_ratio,
          acceleration_variance,
          acceleration_variance_gt,
          acceleration_variance_ratio,
        ]

Implementation: jitter_evaluator.py

Tests: See tests/test_jitter_evaluator.py.

DiagnosticEvaluator

Purpose: Per-frame location comparison and error analysis between matched output tracks and ground-truth tracks.

Status: FULLY IMPLEMENTED - Bipartite track matching with per-frame location and distance CSV/plot outputs.

Supported Metrics:

  • LOC_T_X: Per-frame X position of each matched (output, GT) track pair
  • LOC_T_Y: Per-frame Y position of each matched (output, GT) track pair
  • DIST_T: Per-frame Euclidean distance error between each matched pair

Key Features:

  • Track Matching: Bipartite assignment (Hungarian algorithm) minimizing mean Euclidean distance over overlapping frames. Requires a minimum of 10 overlapping frames (MIN_OVERLAP_FRAMES).
  • Missing Frame Handling: Frames where only one side (output or GT) has data produce NaN in CSV output, preserving full temporal context.
  • CSV Output: Per-metric CSV files with headers:
    • LOC_T_X / LOC_T_Y: [frame_id, track_id, gt_id, value_track, value_gt]
    • DIST_T: [frame_id, track_id, gt_id, distance]
  • Plot Output: One matplotlib figure per metric with all matched pairs overlaid.
  • Summary Scalars: evaluate_metrics() returns DIST_T_mean, LOC_T_X_mae, LOC_T_Y_mae, and num_matches.

Usage Example:

from pathlib import Path
from evaluators.diagnostic_evaluator import DiagnosticEvaluator

# tracker_outputs comes from the harness, as in the TrackEvalEvaluator example;
# gt_file_path is the path to the ground-truth file (MOTChallenge 3D CSV)
evaluator = DiagnosticEvaluator()
metrics = (evaluator
           .configure_metrics(['LOC_T_X', 'LOC_T_Y', 'DIST_T'])
           .set_output_folder(Path('/path/to/results'))
           .process_tracker_outputs(tracker_outputs, gt_file_path)
           .evaluate_metrics())

print(f"Mean distance: {metrics['DIST_T_mean']:.3f}")
print(f"X MAE: {metrics['LOC_T_X_mae']:.3f}")
print(f"Y MAE: {metrics['LOC_T_Y_mae']:.3f}")
print(f"Matched pairs: {int(metrics['num_matches'])}")


**Current Limitations**:

- Uses only X and Y coordinates (Z ignored)
- Single-sequence evaluation only
- No configurable overlap threshold (fixed at 10 frames)
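
The track matching step of DiagnosticEvaluator (bipartite assignment on mean X/Y distance, with the fixed 10-frame overlap noted above) can be sketched with SciPy's `linear_sum_assignment`. This is a simplified illustration and not the code in diagnostic_evaluator.py: the input layout (track ID to a dict of frame ID to `(x, y)`), the `match_tracks` name, and the use of a large sentinel cost for ineligible pairs are all assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

MIN_OVERLAP_FRAMES = 10
INFEASIBLE_COST = 1e9  # sentinel for pairs with too few shared frames


def match_tracks(output_tracks, gt_tracks, min_overlap=MIN_OVERLAP_FRAMES):
    """Return (output_id, gt_id, mean_distance) for each matched pair.

    Both arguments map track_id -> {frame_id: (x, y)}.
    """
    out_ids, gt_ids = list(output_tracks), list(gt_tracks)
    if not out_ids or not gt_ids:
        return []
    cost = np.full((len(out_ids), len(gt_ids)), INFEASIBLE_COST)
    for i, out_id in enumerate(out_ids):
        for j, gt_id in enumerate(gt_ids):
            shared = sorted(set(output_tracks[out_id]) & set(gt_tracks[gt_id]))
            if len(shared) < min_overlap:
                continue  # not enough overlapping frames to compare
            a = np.array([output_tracks[out_id][f] for f in shared], dtype=float)
            b = np.array([gt_tracks[gt_id][f] for f in shared], dtype=float)
            cost[i, j] = np.linalg.norm(a - b, axis=1).mean()
    rows, cols = linear_sum_assignment(cost)  # minimizes the total cost
    return [
        (out_ids[r], gt_ids[c], float(cost[r, c]))
        for r, c in zip(rows, cols)
        if cost[r, c] < INFEASIBLE_COST  # drop pairs forced onto the sentinel
    ]
```

Filtering on the sentinel after the assignment keeps tracks without an eligible partner unmatched instead of pairing them arbitrarily.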

**Implementation**: [diagnostic_evaluator.py](diagnostic_evaluator.py)

**Tests**: See [tests/test_diagnostic_evaluator.py](tests/test_diagnostic_evaluator.py) for unit tests covering track matching, scalar metrics, CSV output, and reset workflows.
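
As an illustration of the missing-frame handling and the DIST_T CSV layout described for this evaluator, the sketch below builds one row per frame for a single matched pair and writes NaN where only one side has data. The helper names and the input layout (frame ID to `(x, y)`) are hypothetical; only the header `[frame_id, track_id, gt_id, distance]` comes from the description above.

```python
import csv
import io
import math


def dist_rows(track_id, gt_id, out_xy, gt_xy):
    """Per-frame distance rows over the union of frames of both tracks."""
    rows = []
    for frame in sorted(set(out_xy) | set(gt_xy)):
        if frame in out_xy and frame in gt_xy:
            (x1, y1), (x2, y2) = out_xy[frame], gt_xy[frame]
            distance = math.hypot(x1 - x2, y1 - y2)
        else:
            distance = float("nan")  # only one side has data for this frame
        rows.append([frame, track_id, gt_id, distance])
    return rows


def rows_to_csv(rows):
    """Serialize rows with the DIST_T header to a CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["frame_id", "track_id", "gt_id", "distance"])
    writer.writerows(rows)
    return buffer.getvalue()
```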

## Adding New Evaluators

To add support for a new metric computation library:

1. **Create evaluator class**: Implement all abstract methods from the `TrackerEvaluator` base class (see [../base/tracker_evaluator.py](../base/tracker_evaluator.py))
2. **Integrate metric library**: Wrap the external library (TrackEval, py-motmetrics, etc.) or implement custom code to compute metrics
3. **Handle formats**: Convert canonical tracker outputs and ground-truth to library-specific formats
4. **Support configuration**:
   - `configure_metrics()` - specify which metrics to compute
   - `set_output_folder()` - where to save results and plots

5. **Document requirements**: Update this README with supported metrics and configuration options
6. **Create tests**: Add tests validating metric computation and result export
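
The steps above can be sketched with a deliberately trivial evaluator. Everything here is illustrative: the `TrackerEvaluator` class below is a stand-in so the example runs on its own (the real base class in `../base/tracker_evaluator.py` may declare different signatures), `TrackCountEvaluator` and its `num_tracks` metric are hypothetical, and the tracker output layout is an assumption.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List, Optional


class TrackerEvaluator(ABC):
    """Stand-in for the real base class; actual signatures may differ."""

    @abstractmethod
    def configure_metrics(self, metrics: List[str]) -> "TrackerEvaluator": ...

    @abstractmethod
    def set_output_folder(self, path: Path) -> "TrackerEvaluator": ...

    @abstractmethod
    def process_tracker_outputs(self, tracker_outputs: Any,
                                ground_truth: Any = None) -> "TrackerEvaluator": ...

    @abstractmethod
    def evaluate_metrics(self) -> Dict[str, float]: ...

    @abstractmethod
    def reset(self) -> "TrackerEvaluator": ...


class TrackCountEvaluator(TrackerEvaluator):
    """Hypothetical evaluator that reports the number of distinct track IDs."""

    SUPPORTED_METRICS = ("num_tracks",)

    def __init__(self) -> None:
        self._metrics: List[str] = []
        self._output_folder: Optional[Path] = None
        self._track_ids: set = set()

    def configure_metrics(self, metrics):
        unknown = sorted(set(metrics) - set(self.SUPPORTED_METRICS))
        if unknown:
            raise ValueError(f"Unsupported metrics: {unknown}")
        self._metrics = list(metrics)
        return self

    def set_output_folder(self, path):
        self._output_folder = Path(path)
        return self

    def process_tracker_outputs(self, tracker_outputs, ground_truth=None):
        # Assumed input: a list of frames, each {"objects": [{"id": ...}, ...]}.
        for frame in tracker_outputs:
            for obj in frame.get("objects", []):
                self._track_ids.add(obj["id"])
        return self

    def evaluate_metrics(self):
        values = {"num_tracks": float(len(self._track_ids))}
        return {name: values[name] for name in self._metrics}

    def reset(self):
        # Clears processed data only; keeping the configuration is an assumption.
        self._track_ids = set()
        return self
```

Returning `self` from the configuration and processing methods is what allows the calls to be chained, and `evaluate_metrics()` ends the chain by returning the metric dictionary.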

### Implementation Patterns

**Metric computation workflow**:

1. Configure metrics via `configure_metrics(['HOTA', 'MOTA', ...])`
2. Set result output folder via `set_output_folder(Path('/results'))`
3. Process data via `process_tracker_outputs(tracker_outputs, ground_truth)`
4. Compute metrics via `evaluate_metrics()` → returns `Dict[str, float]`
5. Reset state via `reset()` to evaluate another tracker

**Method chaining**:
Configuration and processing methods return `self`, so calls can be chained in a fluent style:

```python
metrics = (evaluator
           .configure_metrics(['HOTA', 'MOTA'])
           .set_output_folder(Path('/results'))
           .process_tracker_outputs(outputs, gt)
           .evaluate_metrics())
```

Ground-truth format: Evaluators receive ground truth in MOTChallenge 3D CSV format (see Canonical Data Formats), as provided by the dataset's get_ground_truth() method.

Design Documentation

See tracker-evaluation-pipeline.md for overall architecture and design decisions.