
Commit 7dc8928

[DOCS}: Added Docs for ROS2 KPI Framework (open-edge-platform#2253)
1 parent 1eef008 commit 7dc8928

10 files changed

Lines changed: 1170 additions & 0 deletions


robotics-ai-suite/docs/robotics/dev_guide/index_tutorials.rst

Lines changed: 1 addition & 0 deletions
@@ -20,4 +20,5 @@ these tutorials provide a learning path for developers to use and configure Auto
 tutorials_amr/perception/index
 tutorials_amr/navigation/index
 tutorials_amr/simulation/index
+tutorials_amr/kpi_monitoring/index
 tutorials_amr/robot-tutorials-troubleshooting
Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,186 @@
<!--
Copyright (C) 2026 Intel Corporation

SPDX-License-Identifier: Apache-2.0
-->

# Command Reference

## Monitoring Modes

| Mode | Tracks | Overhead | Use when |
|------|--------|----------|----------|
| **Thread** (default) | Individual threads (TIDs) | ~5–10% | Debugging, optimization |
| **PID** (`--pid-only`) | Processes only | ~2–3% | Production, long-term runs |

## Quick Reference

| Task | Command | Duration |
|------|---------|----------|
| Quick check | `make quick-check` | 30 s |
| Full monitor | `make monitor` | 60 s |
| Full monitor (PID mode) | `make monitor-pid` | 60 s |
| Monitor specific node | `make monitor NODE=/my_node` | 60 s |
| Extended session | `make monitor-long` | 5 min |
| Graph only | `make graph-only` | 60 s |
| Resources only (threads) | `make resources-threads` | 60 s |
| Resources only (PIDs) | `make resources-pid` | 60 s |
| Remote system | `make monitor-remote REMOTE_IP=<ip>` | 60 s |
| Remote system (PID mode) | `make monitor-remote-pid REMOTE_IP=<ip>` | 60 s |
| Pipeline graph (PNG) | `make pipeline-graph` ||
| Pipeline graph (session) | `make pipeline-graph SESSION=<name>` ||
| List sessions | `make list-sessions` ||
| Re-visualize last session | `make visualize-last` ||
| Clean all data | `make clean` ||

All `make` targets accept optional variables: `NODE=`, `DURATION=`, `INTERVAL=`,
`SESSION=`, `REMOTE_IP=`, and `REMOTE_USER=`.

```bash
make monitor NODE=/slam_toolbox DURATION=120 INTERVAL=2
make monitor-remote REMOTE_IP=192.168.1.100 NODE=/slam_toolbox REMOTE_USER=ros
```
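When the `make` targets are driven from a script, the variable assignments can be assembled programmatically. The following is a minimal Python sketch and not part of the framework: `build_make_command` is a hypothetical helper that only formats an invocation like the ones above and does not run it.

```python
# Variables the Command Reference lists as accepted by the make targets.
ALLOWED_VARS = ("NODE", "DURATION", "INTERVAL", "SESSION", "REMOTE_IP", "REMOTE_USER")


def build_make_command(target, **variables):
    """Return an argv list such as ['make', 'monitor', 'NODE=/slam_toolbox'].

    Hypothetical helper for illustration only; it formats the command
    and leaves execution (for example via subprocess.run) to the caller.
    """
    argv = ["make", target]
    for name in ALLOWED_VARS:  # fixed order keeps the output predictable
        value = variables.pop(name, None)
        if value is not None:
            argv.append(f"{name}={value}")
    if variables:
        # Anything left over is not one of the documented variables.
        raise ValueError(f"unsupported make variables: {sorted(variables)}")
    return argv


if __name__ == "__main__":
    # Mirrors: make monitor NODE=/slam_toolbox DURATION=120 INTERVAL=2
    print(" ".join(build_make_command(
        "monitor", NODE="/slam_toolbox", DURATION=120, INTERVAL=2)))
```

Returning an argv list rather than a shell string avoids quoting problems if a node name ever contains unusual characters.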

## monitor_stack.py

```bash
uv run python src/monitor_stack.py [OPTIONS]
```

| Option | Description |
|--------|-------------|
| `--node NAME` | Narrow graph discovery to one node (processing delay is still measured for all nodes) |
| `--session NAME` | Name for this session (default: timestamp) |
| `--duration SECS` | Auto-stop after N seconds |
| `--interval SECS` | Update interval (default: 5) |
| `--output-dir PATH` | Where to save results |
| `--graph-only` | Skip resource monitoring |
| `--resources-only` | Skip graph monitoring |
| `--pid-only` | Process-level only, no thread details |
| `--no-visualize` | Skip auto-visualization on exit |
| `--remote-ip IP` | Monitor a remote machine |
| `--remote-user USER` | SSH user for remote machine (default: ubuntu) |
| `--list-sessions` | List previous sessions and exit |

```bash
uv run python src/monitor_stack.py --node /slam_toolbox --session my_test --duration 120
uv run python src/monitor_stack.py --remote-ip 192.168.1.100 --node /slam_toolbox
uv run python src/monitor_stack.py --resources-only --pid-only --duration 60
```

## ros2_graph_monitor.py

```bash
uv run python src/ros2_graph_monitor.py # All nodes
uv run python src/ros2_graph_monitor.py --node /slam_toolbox # Scope to one node
uv run python src/ros2_graph_monitor.py --node /ctrl --log t.csv # With CSV logging
uv run python src/ros2_graph_monitor.py --interval 2 # Custom interval
uv run python src/ros2_graph_monitor.py --remote-ip 192.168.1.100
```

## monitor_resources.py

```bash
uv run python src/monitor_resources.py # CPU only
uv run python src/monitor_resources.py --memory --threads # CPU + memory + threads
uv run python src/monitor_resources.py --memory --log out.log # With logging
uv run python src/monitor_resources.py --list # List ROS2 processes
uv run python src/monitor_resources.py --remote-ip 192.168.1.100 --memory
```

## visualize_timing.py

```bash
uv run python src/visualize_timing.py timing.csv --delays --frequencies --output-dir ./plots/
```

| Option | Description |
|--------|-------------|
| `--timestamps` | Message arrival scatter plot |
| `--frequencies` | Topic message rates over time |
| `--delays` | Processing delay over time |
| `--inter-arrival` | Inter-message timing / jitter |
| `--output-dir DIR` | Save plots as PNG (omit to display interactively) |
| `--summary` | Print statistics only, no plots |
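
As a rough illustration of what an inter-arrival view represents, the sketch below derives the intervals between consecutive message arrivals and a simple jitter figure from a list of timestamps. The function name, the input format, and the choice of standard deviation as the jitter measure are assumptions made for this example; how `visualize_timing.py` computes its plots is not documented here.

```python
import statistics


def inter_arrival_stats(timestamps):
    """Summarize the spacing between message arrival times (in seconds).

    Illustrative only: jitter is taken here as the population standard
    deviation of the inter-arrival intervals.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    ordered = sorted(timestamps)
    # Interval between each message and the one before it.
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    mean_interval = statistics.fmean(intervals)
    return {
        "mean_interval_s": mean_interval,
        "jitter_s": statistics.pstdev(intervals),
        "mean_rate_hz": 1.0 / mean_interval if mean_interval > 0 else float("inf"),
    }


if __name__ == "__main__":
    # A perfectly regular 2 Hz topic: constant 0.5 s spacing, zero jitter.
    print(inter_arrival_stats([0.0, 0.5, 1.0, 1.5]))
```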

## visualize_resources.py

```bash
uv run python src/visualize_resources.py resource.log --cores --heatmap --top 10 --output-dir ./plots/
uv run python src/visualize_resources.py resource.log --summary
```

| Option | Description |
|--------|-------------|
| `--cores` | CPU utilization per core over time |
| `--pids` | CPU utilization per PID/thread (top N) |
| `--heatmap` | Core utilization heatmap |
| `--mapping` | Thread-to-core scatter plot |
| `--top N` | Number of top threads to show (default: 10) |
| `--output-dir DIR` | Save plots as PNG |
| `--summary` | Print statistics only, no plots |

> **Note:** `pidstat` reports CPU% where 100% = 1 full core. On a 20-core
> system the maximum is 2000%. Use the **Avg Cores** column in `--summary`
> output for a human-readable reading.
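
The conversion described in the note is a division by 100. The helpers below are a small sketch with hypothetical names; whether the **Avg Cores** column is derived in exactly this way is an assumption.

```python
def cpu_percent_to_cores(cpu_percent):
    """Convert a pidstat-style CPU% (100% = 1 full core) to a core count."""
    return cpu_percent / 100.0


def max_cpu_percent(core_count):
    """Upper bound of the CPU% scale on a machine with core_count cores."""
    return 100.0 * core_count


if __name__ == "__main__":
    # A reading of 563% corresponds to roughly 5.63 fully busy cores.
    print(cpu_percent_to_cores(563.0))
    # On a 20-core system the scale tops out at 2000%.
    print(max_cpu_percent(20))
```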

## visualize_graph.py

Renders the ROS2 computation graph as a directed topology diagram.

```bash
# Headless PNG
uv run python src/visualize_graph.py monitoring_sessions/<name> --no-show --output graph.png

# Interactive (click nodes to see topic detail popups)
uv run python src/visualize_graph.py monitoring_sessions/<name> --show
```

Or via make:

```bash
make pipeline-graph
make pipeline-graph SESSION=20260306_154140
```

## Grafana Dashboard Commands

| Command | Description |
|---------|-------------|
| `make grafana-start` | Start Grafana + Prometheus (Docker) |
| `make grafana-stop` | Stop the stack |
| `make grafana-status` | Check services; shows the URL http://localhost:30000 |
| `make grafana-export SESSION=<name>` | Export session metrics to Prometheus |
| `make grafana-export-live` | Continuously export live monitoring data |
| `make grafana-open` | Open dashboard in browser |

Metrics are exposed on **port 9092** (Prometheus occupies 9090 in
host-network mode). Prometheus is pre-configured to scrape `localhost:9092`.
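
Before starting an export, it can help to check whether something is already listening on the metrics port. The following is a minimal sketch using only the Python standard library; `port_in_use` is a hypothetical helper and not part of the framework.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 9092 is the metrics port named above; 9090 is used by Prometheus.
    for port in (9090, 9092):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```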

## Remote Monitoring

| Component | How it works |
|-----------|-------------|
| Graph monitor | DDS peer discovery via `CYCLONEDDS_URI` / `ROS_STATIC_PEERS` |
| Resource monitor | Runs `ps` and `pidstat` over SSH |

Results are stored and visualized **locally** on the monitoring machine.

```bash
make monitor-remote REMOTE_IP=192.168.1.100
make monitor-remote REMOTE_IP=192.168.1.100 REMOTE_USER=ros NODE=/slam_toolbox
uv run python src/monitor_stack.py --remote-ip 192.168.1.100 --pid-only --duration 120
```
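
To make the "runs `ps` and `pidstat` over SSH" row more concrete, the sketch below shows one way such a remote invocation could be assembled. It is illustrative only: `build_remote_command` is a hypothetical helper, and the exact commands and flags the resource monitor sends are not shown in this document. The default user follows the `--remote-user` default listed earlier.

```python
import shlex


def build_remote_command(remote_ip, remote_cmd, remote_user="ubuntu"):
    """Return an ssh argv list that would run remote_cmd on the target host.

    Hypothetical helper: it only builds the command and leaves execution
    (for example via subprocess.run) to the caller.
    """
    # shlex.join quotes each argument so the remote shell sees it intact.
    return ["ssh", f"{remote_user}@{remote_ip}", shlex.join(remote_cmd)]


if __name__ == "__main__":
    # Example: one CPU utilization sample from pidstat (-u, 1 s interval, 1 report).
    print(build_remote_command("192.168.1.100", ["pidstat", "-u", "1", "1"]))
```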

## Troubleshooting

| Problem | Fix |
|---------|-----|
| No ROS2 processes found | Run `ros2 node list` to verify nodes are up |
| Monitor exits immediately | Source ROS2, for example `source /opt/ros/humble/setup.bash` (use `jazzy` on Jazzy) |
| Visualizations not generated | Run `make visualize-last` manually |
| Permission denied | Run `uv sync` if modules are missing |
| Remote: no data | Check SSH auth and matching `ROS_DOMAIN_ID` |
| CPU shows e.g. "563%" | Normal: 100% = 1 core. Check the **Avg Cores** column. |
| `grafana-export` port in use | `fuser -k 9092/tcp && make grafana-export SESSION=<name>` |
| Graph click does nothing | Use the `--show` flag to enable TkAgg interactive mode |
Lines changed: 189 additions & 0 deletions
@@ -0,0 +1,189 @@
<!--
Copyright (C) 2026 Intel Corporation

SPDX-License-Identifier: Apache-2.0
-->

# Practical Examples

## Quick Performance Check

A 30-second snapshot to verify system health.

**Prerequisites:** ROS2 system running, monitoring stack installed.

1. Source your ROS2 environment:

<!--hide_directive::::{tab-set}hide_directive-->
<!--hide_directive:::{tab-item}hide_directive--> **Jazzy**
<!--hide_directive:sync: jazzyhide_directive-->

```bash
source /opt/ros/jazzy/setup.bash
```

<!--hide_directive:::hide_directive-->
<!--hide_directive:::{tab-item}hide_directive--> **Humble**
<!--hide_directive:sync: humblehide_directive-->

```bash
source /opt/ros/humble/setup.bash
```

<!--hide_directive:::hide_directive-->
<!--hide_directive::::hide_directive-->

2. Launch your ROS2 system:

```bash
ros2 launch my_robot robot.launch.py
```

3. In a new terminal, run the quick check (completes automatically):

```bash
make quick-check
```

4. Review auto-generated results:

```bash
ls monitoring_sessions/latest/visualizations/
```

**Output files:**

| File | Contents |
|------|---------|
| `timing_delays.png` | Processing delays per node |
| `message_frequencies.png` | Topic Hz over time |
| `cpu_usage_timeline.png` | CPU usage over time |
| `cpu_heatmap.png` | CPU distribution across cores |
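
A scripted run can confirm that the plots listed above were actually produced. The following is a minimal sketch; `missing_plots` is a hypothetical helper, and the file names are taken from the table.

```python
from pathlib import Path

# File names from the "Output files" table.
EXPECTED_PLOTS = (
    "timing_delays.png",
    "message_frequencies.png",
    "cpu_usage_timeline.png",
    "cpu_heatmap.png",
)


def missing_plots(visualizations_dir):
    """Return the expected plot names that are absent from the directory."""
    directory = Path(visualizations_dir)
    return [name for name in EXPECTED_PLOTS if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_plots("monitoring_sessions/latest/visualizations")
    print("all plots present" if not missing else f"missing: {missing}")
```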

## Monitor a Specific Node

Detailed monitoring of a single ROS2 node for performance analysis.

**Use when:** Analyzing a particular node's processing delays, CPU/memory usage,
or identifying bottlenecks.

```bash
# 1. Find available nodes
ros2 node list

# 2. Start monitoring (stops after the default duration, listed as 60 s
#    in the Command Reference, or earlier on Ctrl+C)
make monitor NODE=/slam_toolbox

# 3. Let it run while your system operates normally

# 4. When monitoring stops, visualizations are auto-generated
ls monitoring_sessions/latest/visualizations/
```

With a fixed duration:

```bash
make monitor NODE=/slam_toolbox DURATION=120 # 2 minutes
```

Using Python directly for a named session:

```bash
uv run python src/monitor_stack.py --node /slam_toolbox --session slam_analysis
```

**What to look for in results:**

- `timing_delays.png`: high delays indicate callback bottlenecks
- `cpu_usage_timeline.png`: CPU spikes correlate with processing load
- `cpu_heatmap.png`: uneven distribution may indicate single-threaded bottlenecks
- `message_frequencies.png`: irregular rates can reveal queue or scheduling issues

## Debug a Performance Issue

Step-by-step guide to isolate and diagnose a performance problem.

**Scenario:** Your robot is running slowly and you suspect a specific node.

### Step 1: Identify the Problematic Process

```bash
uv run python src/monitor_resources.py --list
```

Look for processes with unexpectedly high CPU usage.

### Step 2: Start Detailed Monitoring

```bash
uv run python src/monitor_stack.py --node /problematic_node --session debug_session_1
```

### Step 3: Reproduce the Issue

While monitoring is running, execute the operations that trigger the performance
problem. Let it run for at least 30–60 seconds to collect representative data.

### Step 4: Stop and Analyze

```bash
# Press Ctrl+C; visualizations are auto-generated
ls monitoring_sessions/debug_session_1/visualizations/
```

For deeper inspection:

```bash
# Inspect raw timing data
cat monitoring_sessions/debug_session_1/graph_timing.csv

# Check resource patterns
tail -100 monitoring_sessions/debug_session_1/resource_usage.log
```
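
For a large `graph_timing.csv`, a quick structural summary can be more useful than `cat`. The sketch below reports the column names and row count using the standard `csv` module. It deliberately makes no assumption about which columns the file contains, since the CSV schema is not documented here; `summarize_csv` is a hypothetical helper.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, data_row_count) for a CSV file with a header row."""
    with Path(path).open(newline="") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)  # data rows, header excluded
        columns = list(reader.fieldnames or [])
    return columns, row_count


if __name__ == "__main__":
    csv_path = Path("monitoring_sessions/debug_session_1/graph_timing.csv")
    if csv_path.is_file():
        columns, rows = summarize_csv(csv_path)
        print(f"{rows} rows, columns: {columns}")
    else:
        print(f"{csv_path} not found")
```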

### Step 5: Interpret Results

| Symptom | Possible causes | Next steps |
|---------|----------------|-----------|
| Spikes in `timing_delays.png` | Heavy callback computation, blocking I/O | Profile the node's code; check for synchronous I/O |
| Peaks in `cpu_usage_timeline.png` | Periodic heavy computation, message bursts | Review periodic timers; check queue sizes |
| Concentrated `cpu_heatmap.png` | Single-threaded bottleneck | Consider multi-threaded callbacks; review executor config |
| Irregular `message_frequencies.png` | Network latency, scheduler pressure | Check DDS QoS settings; review publisher rates |
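
Spikes can also be flagged numerically once delay values have been extracted from the session data. The sketch below uses a simple rule (a sample counts as a spike when it exceeds a multiple of the median) purely as an illustration. The function name, the millisecond unit, and the factor of 3 are assumptions, not part of the framework.

```python
import statistics


def find_spikes(delays_ms, factor=3.0):
    """Return indices of samples larger than factor times the median delay."""
    if not delays_ms:
        return []
    baseline = statistics.median(delays_ms)
    return [i for i, delay in enumerate(delays_ms) if delay > factor * baseline]


if __name__ == "__main__":
    # Median is 2.0 ms, so the threshold is 6.0 ms and only index 3 is flagged.
    print(find_spikes([2.0, 2.1, 1.9, 9.5, 2.0]))
```

The median is used as the baseline because, unlike the mean, it is barely affected by the spikes being searched for.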

### Step 6: Validate a Fix

After making changes, record a second session and compare:

```bash
uv run python src/monitor_stack.py --node /problematic_node --session debug_session_2 --duration 60

# Report which visualization files differ between the two sessions
# (PNGs are binary, so diff only says whether they differ, not how)
diff -r monitoring_sessions/debug_session_1/visualizations/ \
  monitoring_sessions/debug_session_2/visualizations/
```

Use `make compare-sessions` for an automated comparison report.
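
The same file-level comparison can be done from Python with the standard `filecmp` module. This is a minimal sketch with a hypothetical helper name; it is unrelated to how `make compare-sessions` works, and it reports only which files differ, not how the plots changed.

```python
import filecmp


def compare_visualizations(before_dir, after_dir):
    """Classify files in two visualization directories by name and content."""
    # dircmp compares os.stat signatures first and reads file contents
    # only when the signatures differ (shallow comparison).
    comparison = filecmp.dircmp(before_dir, after_dir)
    return {
        "only_before": sorted(comparison.left_only),
        "only_after": sorted(comparison.right_only),
        "changed": sorted(comparison.diff_files),
        "unchanged": sorted(comparison.same_files),
    }


if __name__ == "__main__":
    import os

    before = "monitoring_sessions/debug_session_1/visualizations"
    after = "monitoring_sessions/debug_session_2/visualizations"
    if os.path.isdir(before) and os.path.isdir(after):
        print(compare_visualizations(before, after))
    else:
        print("one or both session directories are missing")
```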

## Monitor a Navigation Stack

```bash
# Terminal 1: start the navigation stack
ros2 launch nav2_bringup tb3_simulation_launch.py

# Terminal 2: monitor interactively
./quickstart
# Choose: 1) Monitor my ROS2 application
```

## Before/After Optimization Comparison

```bash
# Before optimization
make monitor NODE=/my_node DURATION=120

# Make your code changes, then run again
make monitor NODE=/my_node DURATION=120

# Compare sessions
make compare-sessions
```
