
Commit 2ecc59e

Józef Daniecki, kblaszczak-intel
authored and committed
Add ADR 7: Tracker Service - Architecture changes for tracking performance scaling (open-edge-platform#611)
1 parent 7f76d3b commit 2ecc59e

1 file changed

Lines changed: 254 additions & 0 deletions

docs/adr/0007-tracker-service.md

# ADR 7: Tracker Service

- **Author(s)**: [Józef Daniecki](https://github.com/jdanieck)
- **Date**: 2025-11-21
- **Status**: `Proposed`

## TLDR

Split the Controller into two services. The **Tracker Service** (pure C++) handles real-time tracking with a data-oriented design that enables compiler auto-vectorization (SIMD). The **Analytics Service** (Python) provides spatial analytics and event detection. This removes the GIL from the tracking path, maximizes CPU cache efficiency, and enables true multithreaded parallelism to meet current scale (300 objects @ 4 cameras @ 15 FPS) and future growth.

## Context

The SceneScape Controller must process multiple scenes with 4 cameras at 15 FPS (67 ms frame intervals) and 1000 objects per frame, while providing both real-time tracking and rich analytics capabilities. Long-term scale requirements will likely increase across all dimensions: cameras, FPS, scene count, and object count.

The SceneScape v2025.2 Controller runs as a single Python microservice that calls C++ via pybind11 for performance-critical operations such as positioning, tracking, and spatial analytics. However, analysis shows that the Python orchestration layer and the analytics processing stages create overhead that prevents meeting real-time constraints at target scale.
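
To make the real-time constraint concrete, the following sketch computes the processing budget implied by the figures above (4 cameras, 15 FPS, 1000 objects per frame). It assumes the 1000 objects are counted per camera frame and that all updates for a scene run serially on one core. Both are illustrative assumptions, and `Workload` and the helper functions are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical description of a scene workload; values come from the Context section.
struct Workload {
    std::uint32_t cameras;
    std::uint32_t fps;
    std::uint32_t objects_per_frame;  // assumption: counted per camera frame
};

// Interval between consecutive frames of one camera, in milliseconds.
constexpr double frame_interval_ms(const Workload& w) {
    return 1000.0 / w.fps;
}

// Total object updates the tracker must perform per second for one scene.
constexpr std::uint64_t object_updates_per_second(const Workload& w) {
    return static_cast<std::uint64_t>(w.cameras) * w.fps * w.objects_per_frame;
}

// Average time available per object update if everything runs serially on one core.
constexpr double serial_budget_us(const Workload& w) {
    return 1e6 / static_cast<double>(object_updates_per_second(w));
}

constexpr Workload kTarget{4, 15, 1000};
static_assert(object_updates_per_second(kTarget) == 60000, "4 cameras x 15 FPS x 1000 objects");
```

For this workload, the sketch gives a 66.7 ms frame interval and 60,000 object updates per second. That works out to about 16.7 µs per update on a single core, which must cover parsing, transformation, and tracking combined. Adding scenes, cameras, or objects raises the update rate and shrinks that serial budget further.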

The current hybrid implementation (Python + C++ via pybind11) cannot use modern hardware efficiently because of the following limitations:

- **GIL prevents true multithreading**: Python's Global Interpreter Lock serializes execution of Python bytecode, preventing parallel processing across CPU cores within a single process
- **Object-oriented design**: Poor CPU cache utilization from scattered memory access patterns
- **Boundary overhead**: Repeated memory allocation/deallocation across the Python-C++ boundary
- **Individual object processing**: Prevents efficient batch operations and compiler auto-vectorization
- **Mixed critical path**: Real-time tracking is mixed with non-critical analytics processing

### Data flow overview

The current Controller service processes all camera data through a hybrid Python + C++ (pybind11) pipeline.

```mermaid
flowchart TD
    subgraph "Input Stage"
        C1["📷 Camera 1<br/>MQTT Messages"]
        C2["📷 Camera 2<br/>MQTT Messages"]
        S1["🛰️ Sensor 1<br/>MQTT Messages"]
        S2["🛰️ Sensor 2<br/>MQTT Messages"]
    end

    subgraph "Controller Service"
        P1["🐍 Message Parsing<br/>(Python)<br/>JSON decode"]
        P2["🐍 Data Validation<br/>(Python)<br/>Schema validation"]
        P3["🔧 Coordinate Transform<br/>(C++ via pybind11)"]
        P4["🔧 Object Tracking<br/>(C++ via pybind11)"]
        P5["🐍 Spatial Analytics<br/>(Python)<br/>Region checks"]
        P6["🐍 Event Detection<br/>(Python)<br/>State comparison"]
    end

    subgraph "Output Stage"
        O1["📤 Tracking MQTT<br/>`scenescape/data/scene/{scene_id}/{thing_type}`"]
        O2["📤 Analytics MQTT<br/>`scenescape/regulated/scene/{scene_id}`"]
        O3["📤 Event MQTT<br/>`scenescape/event/...`"]
    end

    C1 --> P1
    C2 --> P1
    S1 --> P1
    S2 --> P1

    P1 --> P2
    P2 --> P3
    P3 --> P4
    P4 --> P5
    P5 --> P6

    P4 --> O1
    P5 --> O2
    P6 --> O3

    style P1 fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#e2e8f0
    style P2 fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#e2e8f0
    style P3 fill:#2d3748,stroke:#90cdf4,stroke-width:3px,color:#bee3f8
    style P4 fill:#2d3748,stroke:#90cdf4,stroke-width:3px,color:#bee3f8
    style P5 fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#e2e8f0
    style P6 fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#e2e8f0
```

**Legend:**

- 🐍 **Python**: Orchestration and analytics logic
- 🔧 **C++ (pybind11)**: Performance-critical operations called from Python

### Python GIL prevents true parallelism

The Global Interpreter Lock (GIL) in CPython allows only one thread to execute Python bytecode at a time, even on multi-core processors. In the current hybrid architecture, this creates critical performance limitations:

1. **Serialization**: When processing 1000 objects per frame, the C++ tracking code releases the GIL, but the Python orchestration layer (message parsing, validation, analytics) still requires it. Multiple camera streams cannot execute Python code in parallel, which forces sequential execution even though multiple CPU cores are available.

1. **Context switching overhead**: Every pybind11 call that releases the GIL for its C++ work must reacquire it before control returns to Python. This constant lock contention wastes CPU cycles on synchronization rather than useful computation.

1. **Cache invalidation**: Thread switches around GIL acquisition and release pollute the CPU caches. Data that was in L1/L2 cache gets evicted, which forces slower memory accesses in both the Python and C++ code paths.
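
Without a GIL, a C++ service can process independent camera streams on separate cores within one process. The following is a minimal sketch that assumes one worker thread per camera. `CameraFrame` and `process_camera` are hypothetical stand-ins for the parse, transform, and track stages. A production tracker would more likely use a thread pool and handle cross-camera association, which this sketch omits.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-camera frame: one x coordinate per detected object.
struct CameraFrame {
    std::vector<float> xs;
};

// Hypothetical stand-in for parse -> transform -> track on one camera's frame.
// Here it only applies a fixed offset to every coordinate.
void process_camera(CameraFrame& frame, float offset) {
    for (float& x : frame.xs) {
        x += offset;
    }
}

// Process all cameras concurrently, one std::thread per camera.
// Each thread writes only to its own frame, so no locking is needed.
void process_all_cameras(std::vector<CameraFrame>& frames, float offset) {
    std::vector<std::thread> workers;
    workers.reserve(frames.size());
    for (std::size_t i = 0; i < frames.size(); ++i) {
        workers.emplace_back([&frames, i, offset] { process_camera(frames[i], offset); });
    }
    for (std::thread& t : workers) {
        t.join();
    }
}
```

Because each worker owns its frame, the threads share no mutable state and can run truly in parallel. Equivalent Python-level threads in a CPython process would be serialized by the GIL whenever they execute Python code.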

### Memory layout: Object-Oriented vs Data-Oriented Design

The current implementation uses **Object-Oriented Design (OOD)**, where each tracked object is represented as a class instance with methods and encapsulated data. While this provides clean abstractions, it creates severe performance penalties for batch processing workloads.

**Object-Oriented Approach** (current):

```python
class TrackedObject:
    def __init__(self, id, position, velocity):
        self.id = id
        self.position = position
        self.velocity = velocity

    def update(self, detection):
        # Process one object at a time
        # (transform and calculate_velocity stand in for the real per-object steps)
        self.position = transform(detection)
        self.velocity = calculate_velocity(self.position)


# Process 1000 objects individually, one method call per object
for obj, detection in zip(tracked_objects, detections):
    obj.update(detection)  # Scattered memory access, pointer chasing
```

**Problems with OOD for batch processing**:

- **Cache misses**: Objects are scattered across the heap, so accessing `obj.position` for each object is likely to cause a cache miss
- **Pointer chasing**: Following object pointers defeats effective CPU prefetching
- **No auto-vectorization**: The compiler cannot vectorize operations across scattered individual objects
- **Memory overhead**: Each object carries a per-object header (a type pointer and reference count in CPython, or a vtable pointer in C++ classes with virtual methods), plus padding and heap allocation overhead
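
A rough way to quantify the cache-miss problem is to count how many cache lines a loop touches when it reads one field of every object. The sketch below makes three assumptions: a 64-byte line (typical on current x86 CPUs), aligned data, and a hypothetical 80-byte per-object record for the array-of-objects case. The count is an approximation for strides below one line. Real Python objects are also scattered across the heap, which only makes the object-oriented case worse.

```cpp
#include <cstddef>

constexpr std::size_t kCacheLineBytes = 64;  // assumption: typical x86 cache line size

// Approximate cache lines touched when reading one field from each of `count`
// elements whose fields are `stride_bytes` apart (data assumed line-aligned).
constexpr std::size_t lines_touched(std::size_t count, std::size_t stride_bytes) {
    if (stride_bytes >= kCacheLineBytes) {
        return count;  // fields at least one line apart never share a line
    }
    return (count * stride_bytes + kCacheLineBytes - 1) / kCacheLineBytes;
}

// Reading the x coordinate of 1000 objects:
constexpr std::size_t kObjects = 1000;
constexpr std::size_t kAosLines = lines_touched(kObjects, 80);             // hypothetical 80-byte record
constexpr std::size_t kSoaLines = lines_touched(kObjects, sizeof(float));  // contiguous float array
```

Under these assumptions, the contiguous layout touches 63 cache lines instead of 1000 for the same logical work, roughly 16x fewer memory transfers.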
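
**Data-Oriented Design (DOD)** (proposed):

```cpp
#include <vector>
struct vec3 { float x, y, z; };  // minimal stand-in for a 3D vector type

struct TrackedObjects {
    std::vector<int> ids;          // All IDs together
    std::vector<vec3> positions;   // All positions together
    std::vector<vec3> velocities;  // All velocities together
};

// Illustrative batch kernels, each looping over contiguous arrays
void transform_batch(const std::vector<vec3>& detections, std::vector<vec3>& positions);
void calculate_velocities_batch(const std::vector<vec3>& positions, std::vector<vec3>& velocities);

// Process all 1000 objects in batches
void update(TrackedObjects& objects, const std::vector<vec3>& detections) {
    transform_batch(detections, objects.positions);                     // Compiler auto-vectorizes
    calculate_velocities_batch(objects.positions, objects.velocities);  // Compiler auto-vectorizes
}
```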
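
The batch kernels above are only declared. The sketch below shows one possible concrete shape. It stores each coordinate in its own contiguous `float` array (a fully split structure-of-arrays) and advances positions with a constant-velocity step. The `ObjectsSoA` type and `predict_positions` function are hypothetical names chosen for illustration. Each loop is simple and branch-free over contiguous arrays, which is the pattern compilers such as GCC and Clang can auto-vectorize at `-O3`, possibly with runtime alias checks.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fully split SoA layout: one contiguous array per coordinate.
struct ObjectsSoA {
    std::vector<float> px, py, pz;  // positions
    std::vector<float> vx, vy, vz;  // velocities

    std::size_t size() const { return px.size(); }
};

// Constant-velocity prediction: p += v * dt for every object.
// Each loop streams through contiguous floats with no branches.
void predict_positions(ObjectsSoA& o, float dt) {
    const std::size_t n = o.size();
    float* px = o.px.data();
    float* py = o.py.data();
    float* pz = o.pz.data();
    const float* vx = o.vx.data();
    const float* vy = o.vy.data();
    const float* vz = o.vz.data();
    for (std::size_t i = 0; i < n; ++i) px[i] += vx[i] * dt;
    for (std::size_t i = 0; i < n; ++i) py[i] += vy[i] * dt;
    for (std::size_t i = 0; i < n; ++i) pz[i] += vz[i] * dt;
}
```

Compared with `std::vector<vec3>`, splitting x, y, and z means each SIMD register loads the same coordinate from consecutive objects, so no shuffling between lanes is needed.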

**Benefits of DOD** (as per [Mike Acton's CppCon talk](https://www.youtube.com/watch?v=rX0ItVEVjHc)):

- **Cache efficiency**: Contiguous arrays make full use of every fetched cache line, and the CPU prefetcher can stream them ahead of use
- **Compiler auto-vectorization**: The layout lets the compiler generate SIMD instructions (e.g., AVX/AVX2) that process several values per instruction, such as 8 single-precision floats per 256-bit AVX2 operation
- **No pointer chasing**: Sequential memory access patterns
- **Minimal overhead**: Plain data arrays without per-object metadata
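
To illustrate what a `transform_batch`-style kernel might look like for the coordinate-transform stage, the sketch below maps camera pixel coordinates to ground-plane coordinates with a 3x3 planar homography over SoA arrays. The homography is an assumption made for illustration and is not necessarily how SceneScape's positioning code computes world coordinates. `Homography` and `pixels_to_ground` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical 3x3 homography, row-major: [h00 h01 h02; h10 h11 h12; h20 h21 h22].
using Homography = std::array<float, 9>;

// Map n pixel points (u[i], v[i]) to ground-plane points (x[i], y[i]).
// One branch-free loop over contiguous arrays; outputs are resized to match.
void pixels_to_ground(const Homography& H,
                      const std::vector<float>& u, const std::vector<float>& v,
                      std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = u.size();
    x.resize(n);
    y.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float w = H[6] * u[i] + H[7] * v[i] + H[8];  // projective scale
        x[i] = (H[0] * u[i] + H[1] * v[i] + H[2]) / w;
        y[i] = (H[3] * u[i] + H[4] * v[i] + H[5]) / w;
    }
}
```

All 1000 detections of a frame go through a single call, instead of one call (and, in the current design, one Python-C++ crossing) per object.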

## Decision

Split the Controller into two specialized services to address the fundamental performance bottlenecks identified above.

**Why separation is necessary:**

1. **Eliminate GIL serialization**: Moving tracking to pure C++ removes Python's GIL from the critical real-time path entirely. This enables true parallel processing of multiple camera streams across CPU cores within one process, which the GIL prevents in any CPython-based design.

2. **Enable data-oriented design**: A pure C++ service allows restructuring from an object-oriented (scattered memory) to a data-oriented (contiguous arrays) design. This transformation:
   - Enables compiler auto-vectorization (SIMD), processing several values per instruction instead of one
   - Maximizes CPU cache efficiency through contiguous memory access
   - Cannot be achieved in the Python orchestration layer, where each tracked object is a separate heap-allocated Python object

3. **Remove Python-C++ boundary overhead**: The current architecture incurs repeated memory allocation/deallocation and GIL acquire/release on every pybind11 call. A pure C++ tracking service eliminates these transitions entirely from the hot path.

4. **Decouple critical paths**: Real-time tracking has strict latency requirements, while analytics does not. Separating them prevents analytics processing from interfering with tracking latency.
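
One way for the Tracker Service to avoid per-frame allocation on the hot path is to keep its working buffers alive across frames and only clear them. The sketch below uses a hypothetical `FrameScratch` struct. `std::vector::clear()` destroys the elements but leaves the capacity unchanged. Once the first frames have grown the buffers, later frames of similar size typically trigger no heap allocation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-thread scratch buffers reused for every frame.
struct FrameScratch {
    std::vector<float> xs;
    std::vector<float> ys;

    // Load one frame's detections without releasing previously allocated memory.
    void load(const std::vector<float>& in_x, const std::vector<float>& in_y) {
        xs.clear();  // size becomes 0, capacity is kept
        ys.clear();
        xs.insert(xs.end(), in_x.begin(), in_x.end());
        ys.insert(ys.end(), in_y.begin(), in_y.end());
    }
};
```

Only frames larger than any previous frame cause a reallocation, so steady-state processing performs no allocation or deallocation for these buffers.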

**The two services:**

- **Tracker Service** (pure C++) handles the critical real-time tracking path with data-oriented design
- **Analytics Service** (Python, refactored Controller) provides analytics and event detection, maintaining Python for rapid development velocity (see [Alternative 2](#2-monolithic-c-rewrite))

See [Implementation Plan](#implementation-plan) for the phased migration strategy.

```mermaid
flowchart TD
    subgraph "Edge Inputs"
        CAM["📷 Cameras<br/>`scenescape/data/camera/{camera_id}`"]
        SEN["🛰️ Sensors<br/>`scenescape/data/sensor/{sensor_id}`"]
    end

    CAM --> CPP
    SEN --> CPP

    subgraph "Tracker Service"
        CPP["🔧 C++ Tracker<br/>parse • transform • track"]
    end

    CPP --> TRACK_OUT["📤 MQTT<br/>`scenescape/data/scene/{scene_id}/{thing_type}`"]

    TRACK_OUT --> ANALYTICS

    subgraph "Analytics Service"
        ANALYTICS["🐍 Python Analytics<br/>analytics • events"]
    end

    ANALYTICS --> REG["📤 MQTT<br/>`scenescape/regulated/scene/{scene_id}`"]
    ANALYTICS --> EVT["📤 MQTT<br/>`scenescape/event/{region_type}/{scene_id}/{region_id}/{event_type}`"]

    style CPP fill:#2d3748,stroke:#90cdf4,stroke-width:3px,color:#bee3f8
    style ANALYTICS fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#e2e8f0
```

**Legend:**

- 🐍 **Python**: Analytics and orchestration logic
- 🔧 **C++**: Real-time tracking operations
- 📤 **MQTT**: Message broker topics
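
The interface between the Tracker Service and its neighbours is the MQTT topic layout shown in the diagram. Building those topic strings in one place keeps publishers and subscribers in agreement. The helpers below are a hypothetical sketch of the Tracker Service side: the names are illustrative, and the patterns simply reproduce the topics listed above. `+` is the standard MQTT single-level wildcard.

```cpp
#include <string>

// Hypothetical subscription filters for all camera and sensor inputs.
const std::string kCameraSubscription = "scenescape/data/camera/+";
const std::string kSensorSubscription = "scenescape/data/sensor/+";

// Input topic for one camera: scenescape/data/camera/{camera_id}
std::string camera_topic(const std::string& camera_id) {
    return "scenescape/data/camera/" + camera_id;
}

// Input topic for one sensor: scenescape/data/sensor/{sensor_id}
std::string sensor_topic(const std::string& sensor_id) {
    return "scenescape/data/sensor/" + sensor_id;
}

// Tracker Service output consumed by the Analytics Service:
// scenescape/data/scene/{scene_id}/{thing_type}
std::string scene_output_topic(const std::string& scene_id, const std::string& thing_type) {
    return "scenescape/data/scene/" + scene_id + "/" + thing_type;
}
```

Keeping `scene_output_topic` identical to the topic the Controller publishes today is what would let the Analytics Service consume Tracker Service output without changes during migration.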

## Alternatives Considered

### 1. Optimize Current Python + pybind11 Architecture

- **Pros**: Minimal change, leverages existing code
- **Cons**: Cannot eliminate GIL overhead, boundary costs, or OOD limitations; limited performance upside

### 2. Monolithic C++ Rewrite

- **Pros**: Maximum performance, no language boundaries
- **Cons**: Slower analytics development velocity; loses the benefits of the Python ML/AI ecosystem

### 3. Tracker Service in Go

- **Pros**: Native concurrency, good performance, memory safety, team familiarity
- **Cons**: Reusing existing C++ tracking code requires C bindings (cgo); compiler auto-vectorization is more limited than in C++; GC pauses affect real-time guarantees

## Consequences

### Positive

- Utilizes modern hardware efficiently (no GIL; data-oriented design enables compiler auto-vectorization)
- Reuses existing tracking algorithms
- Independent scaling and fault isolation per service
- Analytics development keeps Python's rapid iteration speed

### Negative

- Two services to deploy and maintain
- MQTT communication between services adds latency to analytics
- Cross-service debugging complexity

## Appendix

### Implementation Plan

This is a gradual migration using feature flags to maintain backward compatibility. The Controller runs by default while the Tracker Service is developed and validated.

**Phase 1: Tracker Service Development**

1. POC - Minimal implementation validated with load tests to measure performance gains
2. MVP - Works with out-of-the-box (OOB) scenes
3. v1.0 - Feature parity with Controller tracking (VDMS, NTP, etc.)

**Phase 2: Migration**

1. Enable Tracker Service as default, Controller in analytics-only mode
2. Refactor Controller analytics into Analytics Service
3. Enable Analytics Service as default and retire Controller
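
The feature-flag-driven migration could be modeled as an explicit deployment mode selected at startup. The sketch below is hypothetical: the mode names, the flag strings, and the `parse_deployment_mode` function are illustrative and do not correspond to existing SceneScape configuration.

```cpp
#include <optional>
#include <string>

// Hypothetical deployment modes matching the migration phases.
enum class DeploymentMode {
    ControllerOnly,         // default until Phase 2: Controller tracks and runs analytics
    TrackerPlusController,  // Phase 2, step 1: Tracker Service tracks, Controller analytics-only
    TrackerPlusAnalytics,   // Phase 2, step 3: Tracker + Analytics Service, Controller retired
};

// Parse a flag value; an unknown value yields std::nullopt so startup can fail loudly.
std::optional<DeploymentMode> parse_deployment_mode(const std::string& value) {
    if (value.empty() || value == "controller") return DeploymentMode::ControllerOnly;
    if (value == "tracker+controller") return DeploymentMode::TrackerPlusController;
    if (value == "tracker+analytics") return DeploymentMode::TrackerPlusAnalytics;
    return std::nullopt;
}

// Whether this deployment starts the C++ Tracker Service.
bool runs_tracker_service(DeploymentMode mode) {
    return mode != DeploymentMode::ControllerOnly;
}
```

Defaulting an empty flag to `ControllerOnly` mirrors the plan's requirement that the Controller keeps running by default until the Tracker Service is validated.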

### References

- [Spatial Analytics developer guide](https://github.com/open-edge-platform/scenescape/pull/598)
- [CppCon 2014: Mike Acton "Data-Oriented Design and C++"](https://www.youtube.com/watch?v=rX0ItVEVjHc)
