Every tracker in trackers uses a Kalman filter to predict where objects will appear in the next frame. The state estimator controls how bounding boxes are represented inside that filter. Different representations make different assumptions about object motion, and picking the right one can improve tracking quality without changing anything else.
What you'll learn:
- What state estimators are and why they matter
- How XYXYStateEstimator and XCYCSRStateEstimator represent bounding boxes
- When to use each representation
- How to swap the state estimator in any tracker
Get started by installing the package.
pip install trackers
For more options, see the install guide.
A state estimator wraps a Kalman filter and defines how bounding boxes are encoded into the filter's state vector. The Kalman filter then predicts the next position of each tracked object and corrects that prediction when a new detection arrives.
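The predict-then-correct cycle described above can be illustrated with a scalar filter. This is a minimal sketch, not the library's implementation: it tracks a single coordinate with no velocity term, and the helper names `predict` and `correct` and the noise values are assumptions chosen for illustration.

```python
from __future__ import annotations


def predict(x: float, p: float, q: float) -> tuple[float, float]:
    """Predict step: the estimate carries over, uncertainty grows by process noise q."""
    return x, p + q


def correct(x: float, p: float, z: float, r: float) -> tuple[float, float]:
    """Correct step: blend the prediction x with the measurement z via the Kalman gain."""
    k = p / (p + r)            # gain in [0, 1]: how much to trust the measurement
    x_new = x + k * (z - x)    # move the estimate toward the measurement
    p_new = (1.0 - k) * p      # uncertainty shrinks after the correction
    return x_new, p_new


# One cycle: prior estimate 0 with variance 1, then a detection at 10 with variance 1.
x, p = predict(x=0.0, p=1.0, q=0.0)
x, p = correct(x, p, z=10.0, r=1.0)
print(x, p)
```

With equal prior and measurement variance, the corrected estimate lands halfway between the prediction and the detection. A state estimator does the same thing on a vector state, which is where the choice of box encoding comes in.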
Two representations are available:
| Estimator | State Dimensions | Representation | Aspect Ratio |
|---|---|---|---|
| XYXYStateEstimator | 8 | Top-left and bottom-right corners + their velocities | Can change |
| XCYCSRStateEstimator | 7 | Center point, area, their velocities and aspect ratio | Held constant |
Both estimators accept [x1, y1, x2, y2] bounding boxes as input and produce [x1, y1, x2, y2] bounding boxes as output. The difference is entirely in how the filter models motion internally.
XYXYStateEstimator tracks the four corner coordinates independently. Each corner gets its own velocity term, giving the filter 8 state variables:
State: [x1, y1, x2, y2, vx1, vy1, vx2, vy2]
Measure: [x1, y1, x2, y2]
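The transition matrix $F$ (one frame per step):

$$
F = \begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
$$

State order:

$$
\mathbf{x} = [x_1, y_1, x_2, y_2, v_{x1}, v_{y1}, v_{x2}, v_{y2}]^\top
$$

Equivalent update equations: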
x1' = x1 + vx1
y1' = y1 + vy1
x2' = x2 + vx2
y2' = y2 + vy2
vx1' = vx1
vy1' = vy1
vx2' = vx2
vy2' = vy2
| Row | Meaning |
|---|---|
| 1-4 | Each corner coordinate is updated by adding its velocity |
| 5-8 | Velocities persist unchanged from frame to frame |
Because each corner moves freely, the box width and height can change between frames. This makes XYXY a natural fit when objects change shape — due to camera perspective, non-rigid motion, or inconsistent detections.
In Trackers, this is the default for ByteTrackTracker and SORTTracker.
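The constant-velocity update for the corner representation can be sketched in plain Python. This is an illustrative sketch rather than the library's code: `build_transition_matrix` and `predict` are hypothetical helpers, and the example state values are made up.

```python
from __future__ import annotations

N = 4  # number of box coordinates: x1, y1, x2, y2


def build_transition_matrix(n: int = N) -> list[list[float]]:
    """Build the 2n x 2n constant-velocity matrix [[I, I], [0, I]]."""
    size = 2 * n
    f = [[1.0 if row == col else 0.0 for col in range(size)] for row in range(size)]
    for i in range(n):
        f[i][i + n] = 1.0  # coordinate i picks up velocity i each frame
    return f


def predict(f: list[list[float]], state: list[float]) -> list[float]:
    """Apply one prediction step: state' = F @ state."""
    return [sum(a * b for a, b in zip(row, state)) for row in f]


# Left edge moves 2 px/frame, right edge 4 px/frame: the box widens as it moves.
state = [10.0, 20.0, 50.0, 80.0, 2.0, 0.0, 4.0, 0.0]
predicted = predict(build_transition_matrix(), state)

width_before = state[2] - state[0]
width_after = predicted[2] - predicted[0]
print(predicted[:4], width_before, width_after)
```

Because the two x-velocities differ, the predicted box is wider than the current one. Nothing in this state layout ties width to height, which is why the aspect ratio is free to change.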
XCYCSRStateEstimator tracks the box center, area (scale), and aspect ratio. Only the center and scale get velocity terms; aspect ratio is treated as constant. This gives 7 state variables:
State: [x_center, y_center, scale, aspect_ratio, vx, vy, vs]
Measure: [x_center, y_center, scale, aspect_ratio]
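The transition matrix $F$ (one frame per step):

$$
F = \begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
$$

State order:

$$
\mathbf{x} = [x_c, y_c, s, r, v_x, v_y, v_s]^\top
$$

Equivalent update equations: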
x_center' = x_center + vx
y_center' = y_center + vy
scale' = scale + vs
aspect_ratio' = aspect_ratio
vx' = vx
vy' = vy
vs' = vs
| Row | Meaning |
|---|---|
| 1-3 | Center position and scale follow constant-velocity motion |
| 4 | Aspect ratio is copied forward unchanged |
| 5-7 | Velocities persist unchanged from frame to frame |
The aspect ratio r = w / h is carried forward unchanged. This acts as a regularizer — the filter resists sudden shape changes. It works well for rigid objects whose proportions stay consistent, like pedestrians walking or cars on a highway.
This is the default for OCSORTTracker, matching the original OC-SORT paper.
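The center/scale/ratio encoding and its prediction step can be sketched as follows. This is a hedged illustration, not the library's implementation: the helper names `xyxy_to_xcycsr`, `xcycsr_to_xyxy`, and `predict` are hypothetical, and the velocities are invented for the example. It uses r = w / h and scale = w * h, as defined above.

```python
from __future__ import annotations

import math


def xyxy_to_xcycsr(box: list[float]) -> list[float]:
    """Convert [x1, y1, x2, y2] to [x_center, y_center, scale, aspect_ratio]."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [x1 + w / 2.0, y1 + h / 2.0, w * h, w / h]


def xcycsr_to_xyxy(z: list[float]) -> list[float]:
    """Convert [x_center, y_center, scale, aspect_ratio] back to [x1, y1, x2, y2]."""
    xc, yc, s, r = z
    w = math.sqrt(s * r)  # s * r = (w * h) * (w / h) = w^2
    h = s / w
    return [xc - w / 2.0, yc - h / 2.0, xc + w / 2.0, yc + h / 2.0]


def predict(state: list[float]) -> list[float]:
    """One constant-velocity step; the aspect ratio is copied forward unchanged."""
    xc, yc, s, r, vx, vy, vs = state
    return [xc + vx, yc + vy, s + vs, r, vx, vy, vs]


box = [10.0, 20.0, 50.0, 100.0]                  # 40 x 80 box, so r = 0.5
state = xyxy_to_xcycsr(box) + [3.0, 0.0, 400.0]  # made-up velocities vx, vy, vs
predicted_box = xcycsr_to_xyxy(predict(state)[:4])

w = predicted_box[2] - predicted_box[0]
h = predicted_box[3] - predicted_box[1]
print(predicted_box, w / h)
```

The predicted box grows in area and shifts right, but its width-to-height ratio stays at the value it had before the step. That is the regularizing effect: the prediction alone cannot change the box's proportions, only a corrected measurement can.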
| Scenario | Recommended | Why |
|---|---|---|
| Pedestrians, vehicles, rigid objects | XCYCSRStateEstimator | Constant aspect ratio stabilizes predictions |
| Non-rigid or deformable objects | XYXYStateEstimator | Corners move independently to track shape changes |
| Noisy detections with fluctuating box sizes | XCYCSRStateEstimator | Aspect ratio constraint absorbs size noise |
| Strong perspective changes (camera pan/zoom) | XYXYStateEstimator | Box proportions shift with viewpoint; corners adapt freely |
| Default choice when unsure | XYXYStateEstimator | More general, fewer assumptions |
All trackers accept a state_estimator_class parameter. Import the class you want and pass it to the constructor.
=== "ByteTrack with XCYCSR"

    ```python
    from trackers import ByteTrackTracker
    from trackers.utils.state_representations import XCYCSRStateEstimator

    tracker = ByteTrackTracker(
        state_estimator_class=XCYCSRStateEstimator,
    )
    ```

=== "OC-SORT with XYXY"

    ```python
    from trackers import OCSORTTracker
    from trackers.utils.state_representations import XYXYStateEstimator

    tracker = OCSORTTracker(
        state_estimator_class=XYXYStateEstimator,
    )
    ```

=== "SORT with XCYCSR"

    ```python
    from trackers import SORTTracker
    from trackers.utils.state_representations import XCYCSRStateEstimator

    tracker = SORTTracker(
        state_estimator_class=XCYCSRStateEstimator,
    )
    ```

Everything else stays the same — detection, association, and visualization work identically regardless of which estimator you choose.
Run ByteTrack with both estimators on the same video and compare the results side by side.

    import cv2
    import supervision as sv
    from inference import get_model
    from trackers import ByteTrackTracker
    from trackers.utils.state_representations import (
        XCYCSRStateEstimator,
        XYXYStateEstimator,
    )

    model = get_model("rfdetr-nano")

    # One tracker per estimator, so each keeps its own filter state and IDs.
    tracker_xyxy = ByteTrackTracker(
        state_estimator_class=XYXYStateEstimator,
    )
    tracker_xcycsr = ByteTrackTracker(
        state_estimator_class=XCYCSRStateEstimator,
    )

    cap = cv2.VideoCapture("source.mp4")
    while True:
        ret, frame = cap.read()
        if not ret:
            break  # end of video or read failure

        result = model.infer(frame)[0]
        detections = sv.Detections.from_inference(result)

        # Pass each tracker its own copy of the same detections.
        tracked_xyxy = tracker_xyxy.update(detections.copy())
        tracked_xcycsr = tracker_xcycsr.update(detections.copy())

        # Compare tracker_id assignments, box smoothness, etc.
        print(f"XYXY IDs: {tracked_xyxy.tracker_id}")
        print(f"XCYCSR IDs: {tracked_xcycsr.tracker_id}")

    cap.release()

The state estimator is a single-line change that controls how the Kalman filter models bounding box motion. Use XCYCSRStateEstimator when objects keep a consistent shape, and XYXYStateEstimator when shape varies or you want fewer assumptions. Try both on your own footage: the best choice depends on the scene.