Add ADR 7: Tracker Service - Architecture changes for tracking performance scaling #611
Merged
**Commits (18):**

- 26b31fd add Tracker Service ADR
- f095879 minor cleanup
- 0451b3a add TLDR
- 9f6904f Fixes to data flow section
- 35c9957 remove performance claims
- ec15893 linter fixes
- fe51bb3 SIMD clarifications
- 48f62b5 revise decission
- b22fc16 update go service alternative
- 717aeb8 improve context
- e304d29 add appendix section
- 4ed9c04 fix analytics service inputs
- 0eedbb4 simplify decission section
- b98e9b0 Merge branch 'main' into adr-tracker-service
- 3c8c4f6 Apply suggestions from code review
- dd40167 add more details into decission section
- 4c4093c aded analytics latency to negative consequences
- b2735d4 Merge branch 'main' into adr-tracker-service
# ADR 7: Tracker Service

- **Author(s)**: [Józef Daniecki](https://github.com/jdanieck)
- **Date**: 2025-11-21
- **Status**: `Proposed`

## TLDR

Split the Controller into two services: a **Tracker Service** (pure C++) that handles real-time tracking with a data-oriented design enabling compiler auto-vectorization (SIMD), and an **Analytics Service** (Python) that provides spatial analytics and event detection. This eliminates the GIL limitation, maximizes CPU cache efficiency, and enables true multiprocessing to meet current scale (300 objects @ 4 cameras @ 15 FPS) and future growth.

## Context

The SceneScape Controller must process multiple scenes with 4 cameras at 15 FPS (67 ms frame intervals) and 1000 objects per frame while providing both real-time tracking and rich analytics capabilities. Long-term scale requirements will likely increase across all dimensions: cameras, FPS, scene and object counts.
The SceneScape v2025.2 Controller runs as a single Python microservice that calls C++ via pybind11 for performance-critical operations such as positioning, tracking, and spatial analytics. However, analysis shows that the Python orchestration layer and the analytics processing stages create overhead that prevents meeting real-time constraints at target scale.

The current hybrid implementation (Python + C++ pybind11) cannot utilize modern hardware efficiently due to:

- **GIL prevents true multiprocessing**: Python's Global Interpreter Lock serializes execution, preventing parallel processing across CPU cores
- **Object-oriented design**: Poor CPU cache utilization from scattered memory access patterns
- **Boundary overhead**: Repeated memory allocation/deallocation across Python-C++ boundaries
- **Individual object processing**: Prevents efficient batch operations and compiler auto-vectorization
- **Mixed critical path**: Real-time tracking mixed with non-critical analytics processing
### Data flow overview

The current Controller service processes all camera data through a hybrid Python + C++ (pybind11) pipeline.

```mermaid
flowchart TD
    subgraph "Input Stage"
        C1["📷 Camera 1<br/>MQTT Messages"]
        C2["📷 Camera 2<br/>MQTT Messages"]
        S1["🛰️ Sensor 1<br/>MQTT Messages"]
        S2["🛰️ Sensor 2<br/>MQTT Messages"]
    end

    subgraph "Controller Service"
        P1["🐍 Message Parsing<br/>(Python)<br/>JSON decode"]
        P2["🐍 Data Validation<br/>(Python)<br/>Schema validation"]
        P3["🔧 Coordinate Transform<br/>(C++ via pybind11)"]
        P4["🔧 Object Tracking<br/>(C++ via pybind11)"]
        P5["🐍 Spatial Analytics<br/>(Python)<br/>Region checks"]
        P6["🐍 Event Detection<br/>(Python)<br/>State comparison"]
    end

    subgraph "Output Stage"
        O1["📤 Tracking MQTT<br/>`scenescape/data/scene/{scene_id}/{thing_type}`"]
        O2["📤 Analytics MQTT<br/>`scenescape/regulated/scene/{scene_id}`"]
        O3["📤 Event MQTT<br/>`scenescape/event/...`"]
    end

    C1 --> P1
    C2 --> P1
    S1 --> P1
    S2 --> P1

    P1 --> P2
    P2 --> P3
    P3 --> P4
    P4 --> P5
    P5 --> P6

    P4 --> O1
    P5 --> O2
    P6 --> O3

    style P1 fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#e2e8f0
    style P2 fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#e2e8f0
    style P3 fill:#2d3748,stroke:#90cdf4,stroke-width:3px,color:#bee3f8
    style P4 fill:#2d3748,stroke:#90cdf4,stroke-width:3px,color:#bee3f8
    style P5 fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#e2e8f0
    style P6 fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#e2e8f0
```

**Legend:**

- 🐍 **Python**: Orchestration and analytics logic
- 🔧 **C++ (pybind11)**: Performance-critical operations called from Python
### Python GIL prevents true parallelism

The Global Interpreter Lock (GIL) in CPython allows only one thread to execute Python bytecode at a time, even on multi-core processors. For the current hybrid architecture, this creates critical performance limitations:

1. **Serialization**: When processing 1000 objects per frame, even though the C++ tracking code releases the GIL, the Python orchestration layer (message parsing, validation, analytics) still requires it. Multiple camera streams cannot process Python code in parallel, forcing sequential execution despite having multiple CPU cores available.

2. **Context switching overhead**: Each transition between Python and C++ requires acquiring and releasing the GIL. This constant lock contention wastes CPU cycles on synchronization rather than useful computation.

3. **Cache invalidation**: Thread switching during GIL acquisition/release invalidates CPU caches, degrading performance of both Python and C++ code paths. Data that was in L1/L2 cache gets evicted, forcing slower memory accesses.
### Memory layout: Object-Oriented vs Data-Oriented Design

The current implementation uses **Object-Oriented Design (OOD)**, where each tracked object is represented as a class instance with methods and encapsulated data. While this provides clean abstractions, it creates severe performance penalties for batch processing workloads.

**Object-Oriented Approach** (current):

```python
class TrackedObject:
    def __init__(self, id, position, velocity):
        self.id = id
        self.position = position
        self.velocity = velocity

    def update(self, detection):
        # Process one object at a time
        self.position = transform(detection)
        self.velocity = calculate_velocity(self.position)

# Process 1000 objects individually
for obj in tracked_objects:
    obj.update(detection)  # Scattered memory access, pointer chasing
```

**Problems with OOD for batch processing**:

- **Cache misses**: Objects are scattered in memory, so accessing `obj.position` causes a cache miss
- **Pointer chasing**: Following object pointers prevents CPU prefetching
- **No auto-vectorization**: The compiler cannot vectorize operations across scattered individual objects
- **Memory overhead**: Each object carries vtable pointers, padding, and heap allocation overhead
**Data-Oriented Design (DOD)** (proposed):

```cpp
struct TrackedObjects {
    std::vector<int> ids;          // All IDs together
    std::vector<vec3> positions;   // All positions together
    std::vector<vec3> velocities;  // All velocities together
};

// Process all 1000 objects in batches
transform_batch(detections, positions);            // Compiler auto-vectorizes
calculate_velocities_batch(positions, velocities); // Compiler auto-vectorizes
```

**Benefits of DOD** (as per [Mike Acton's CppCon talk](https://www.youtube.com/watch?v=rX0ItVEVjHc)):

- **Cache efficiency**: Contiguous arrays fit in cache lines, and the CPU prefetcher works optimally
- **Compiler auto-vectorization**: The structure enables the compiler to generate SIMD instructions (AVX/AVX2) processing 4-8 objects per CPU cycle
- **No pointer chasing**: Sequential memory access patterns
- **Minimal overhead**: Plain data arrays without object metadata
## Decision

Split the Controller into two specialized services to address the fundamental performance bottlenecks identified above.

**Why separation is necessary:**

1. **Eliminate GIL serialization**: Moving tracking to pure C++ removes Python's GIL entirely from the critical real-time path. This enables true parallel processing across multiple camera streams on multi-core CPUs, which is impossible with any Python-based architecture.

2. **Enable data-oriented design**: A pure C++ service allows restructuring from object-oriented (scattered memory) to data-oriented (contiguous arrays) design. This transformation:
   - Enables compiler auto-vectorization (SIMD) processing 4-8 objects per CPU cycle
   - Maximizes CPU cache efficiency through contiguous memory access
   - Cannot be achieved in the Python orchestration layer due to language constraints

3. **Remove Python-C++ boundary overhead**: The current architecture incurs repeated memory allocation/deallocation and GIL acquire/release on every pybind11 call. A pure C++ tracking service eliminates these transitions entirely from the hot path.

4. **Decouple critical paths**: Real-time tracking requires a different architecture than analytics, which has no strict timing requirements. Separating them prevents analytics processing from interfering with tracking latency.

**The two services:**

- **Tracker Service** (pure C++) handles the critical real-time tracking path with a data-oriented design
- **Analytics Service** (Python, refactored Controller) provides analytics and event detection, keeping Python for rapid development velocity (see [Alternative 2](#2-monolithic-c-rewrite))

See [Implementation Plan](#implementation-plan) for the phased migration strategy.
```mermaid
flowchart TD
    subgraph "Edge Inputs"
        CAM["📷 Cameras<br/>`scenescape/data/camera/{camera_id}`"]
        SEN["🛰️ Sensors<br/>`scenescape/data/sensor/{sensor_id}`"]
    end

    CAM --> CPP
    SEN --> CPP

    subgraph "Tracker Service"
        CPP["🔧 C++ Tracker<br/>parse • transform • track"]
    end

    CPP --> TRACK_OUT["📤 MQTT<br/>`scenescape/data/scene/{scene_id}/{thing_type}`"]

    TRACK_OUT --> ANALYTICS

    subgraph "Analytics Service"
        ANALYTICS["🐍 Python Analytics<br/>analytics • events"]
    end

    ANALYTICS --> REG["📤 MQTT<br/>`scenescape/regulated/scene/{scene_id}`"]
    ANALYTICS --> EVT["📤 MQTT<br/>`scenescape/event/{region_type}/{scene_id}/{region_id}/{event_type}`"]

    style CPP fill:#2d3748,stroke:#90cdf4,stroke-width:3px,color:#bee3f8
    style ANALYTICS fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#e2e8f0
```

**Legend:**

- 🐍 **Python**: Analytics and orchestration logic
- 🔧 **C++**: Real-time tracking operations
- 📤 **MQTT**: Message broker topics
## Alternatives Considered

### 1. Optimize Current Python + pybind11 Architecture

- **Pros**: Minimal change, leverages existing code
- **Cons**: Cannot eliminate GIL overhead, boundary costs, or OOD limitations; limited performance upside

### 2. Monolithic C++ Rewrite

- **Pros**: Maximum performance, no language boundaries
- **Cons**: Slower analytics development velocity, loses Python ML/AI ecosystem benefits

### 3. Tracker Service in Go

- **Pros**: Native concurrency, good performance, memory safety, team familiarity
- **Cons**: Reusing the existing C++ tracking code requires C bindings; limited compiler auto-vectorization compared to C++; GC pauses affect real-time guarantees

## Consequences

### Positive

- Utilizes modern hardware efficiently (no GIL; data-oriented design enables compiler auto-vectorization)
- Reuses existing tracking algorithms
- Independent scaling and fault isolation per service
- Analytics continues rapid Python development

### Negative

- Two services to deploy and maintain
- MQTT communication overhead between services adds latency to analytics
- Cross-service debugging complexity
|
## Appendix

### Implementation Plan

This is a gradual migration using feature flags to maintain backward compatibility. The Controller runs by default while the Tracker Service is developed and validated.

**Phase 1: Tracker Service Development**

1. POC - Minimal implementation validated with load tests to measure performance gains
2. MVP - Works with out-of-the-box (OOB) scenes
3. v1.0 - Feature parity with Controller tracking (VDMS, NTP, etc.)

**Phase 2: Migration**

1. Enable the Tracker Service as default, with the Controller in analytics-only mode
2. Refactor Controller analytics into the Analytics Service
3. Enable the Analytics Service as default and retire the Controller
|
### References

- [Spatial Analytics developer guide](https://github.com/open-edge-platform/scenescape/pull/598)
- [CppCon 2014: Mike Acton "Data-Oriented Design and C++"](https://www.youtube.com/watch?v=rX0ItVEVjHc)
---
My main assumption is that analytics can tolerate added latency compared to tracking, which has strict real-time requirements. The proposed split is effectively "real-time service" vs "near-real-time service."

If analytics also requires hard real-time guarantees, we should pursue Alternative 2: a full C++ rewrite kept as a monolith. However, since we already have analytics in Python and it doesn't have the same latency constraints, we can start with this hybrid approach. If performance measurements later show Analytics becoming a bottleneck, we can incrementally migrate it to C++, a more manageable transition than rewriting everything upfront.

Regarding the processing volume: the Tracker Service handles significantly more data (1000 objects × 4 cameras × 15 FPS) than Analytics, which processes aggregated results. This asymmetry further supports the split architecture.

I agree that optimizing serialization/deserialization is worth investigating. I'll benchmark message encoding options (JSON vs Protobuf vs FlatBuffers) with 1k-object payloads so we can make an informed decision based on actual performance data rather than assumptions.
---
@rawatts10 @saratpoluri completed serialization benchmarks for the Tracker/Analytics split in #636.

Good news: we can keep JSON initially. C++'s faster processing mostly compensates for the added MQTT hop; we measured a 6% improvement on localhost (1,370 μs → 1,290 μs per frame).

Migrating to Protobuf afterward would add a further 43% improvement through faster serialization and 29-63% smaller messages.

Proposed:

See ANALYSIS.md for details and RESULTS.md for raw benchmark data.
---
What's our policy on backward compatibility for inter-service communication? Can we make breaking changes to the message format?
---
Right now, there is no policy: we make breaking changes if needed and let users complain. In the future, if we implement Protobuf, backward compatibility can be discussed.