perf(mavlink): remove mutex from forward_message hot path#26927
perf(mavlink): remove mutex from forward_message hot path#26927
Conversation
Remove mavlink_module_mutex from forward_message() to eliminate per-message lock overhead in the forwarding path. The mutex is unnecessary because: - Instances can only be destroyed via destroy_all_instances(), which stops all threads before modifying the array - Pointer reads from mavlink_module_instances[] are naturally atomic on aligned architectures (ARM, x86) - pass_message() has its own per-instance _message_buffer_mutex Benchmark results (SITL SIH, 2 connections, wall-clock timing): 200 Hz (400 msg/s): avg 648 ns -> 563 ns (-13%) 500 Hz (1000 msg/s): avg 469 ns -> 407 ns (-13%) Mutex lock section: avg 327 ns -> 262 ns (-20%) Benchmark gist: https://gist.github.com/mrpollo/167d4ec7436fb773bf193be6c2800639 Also includes a MAVSDK C++ traffic generator tool for reproducing the benchmark. Signed-off-by: Ramon Roche <mrpollo@gmail.com>
There was a problem hiding this comment.
Pull request overview
This PR aims to improve MAVLink forwarding performance by removing the global mavlink_module_mutex from the Mavlink::forward_message() hot path, and adds a standalone MAVSDK-based benchmark tool + runner script to measure forwarding-path performance and related perf counters.
Changes:
- Remove the
mavlink_module_mutexlock inMavlink::forward_message()to reduce per-message overhead. - Add a MAVSDK C++ benchmark that floods broadcast MAVLink messages across multiple UDP connections.
- Add a helper shell script and standalone CMake file to build and run the benchmark against PX4 SITL.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
src/modules/mavlink/mavlink_main.cpp |
Removes the forwarding-path mutex and updates inline rationale. |
test/mavsdk_tests/test_mavlink_forwarding_bench.cpp |
Adds a multi-connection sender benchmark for stressing forward_message() contention/perf. |
test/mavsdk_tests/run_forwarding_bench.sh |
Adds an end-to-end script to build SITL + benchmark and gather logs/results. |
test/mavsdk_tests/CMakeLists_bench.txt |
Adds a standalone CMake definition for building the benchmark binary. |
Comments suppressed due to low confidence (1)
src/modules/mavlink/mavlink_main.cpp:525
- Removing the global lock here introduces a data race and potential use-after-free: other code paths still modify and delete entries in
mavlink_module_instances[]while receiver threads can be running (e.g.Mavlink::stop_command()removes an instance anddeletes it undermavlink_module_mutex). Iterating the raw pointer array without synchronization can concurrently observe a pointer that is being cleared/freed, and calling methods on it is undefined behavior.
If the goal is to eliminate per-message lock overhead, consider switching mavlink_module_instances[] to an atomic pointer array (with acquire/release semantics) and using a safe reclamation strategy (or restricting deletion to destroy_all_instances() by changing stop_command() accordingly). Otherwise this needs to keep synchronization around the iteration.
// No mutex needed: instances can't be destroyed while receiver threads run
// (destroy_all_instances() stops all threads first). Pointer reads are
// naturally atomic on aligned architectures. pass_message() has its own
// per-instance _message_buffer_mutex for the ring buffer.
for (Mavlink *inst : mavlink_module_instances) {
if (inst && (inst != self) && (inst->get_forwarding_on())) {
// Pass message only if target component was seen before
if (inst->_receiver.component_was_seen(target_system_id, target_component_id)) {
inst->pass_message(msg);
| int main(int argc, char *argv[]) | ||
| { | ||
| int duration_sec = 60; | ||
| int rate_hz = 200; | ||
| int num_connections = 2; | ||
| int gcs_port = 14550; | ||
| int api_port = 14540; | ||
| std::string report_file; | ||
|
|
||
| for (int i = 1; i < argc; ++i) { | ||
| std::string arg(argv[i]); | ||
|
|
||
| if (arg == "--duration" && i + 1 < argc) { | ||
| duration_sec = std::atoi(argv[++i]); | ||
|
|
||
| } else if (arg == "--rate" && i + 1 < argc) { | ||
| rate_hz = std::atoi(argv[++i]); | ||
|
|
||
| } else if (arg == "--connections" && i + 1 < argc) { | ||
| num_connections = std::atoi(argv[++i]); |
There was a problem hiding this comment.
rate_hz is parsed from CLI but never validated. If the user passes --rate 0 (or a negative value), 1'000'000 / rate_hz will divide by zero (or produce a negative interval), which will break the sender timing loop.
Please validate rate_hz > 0 (and similarly validate duration_sec > 0, num_connections > 0) and fail with a clear error message before starting threads.
| for (size_t i = 0; i < ports.size(); ++i) { | ||
| auto config = Mavsdk::Configuration( | ||
| static_cast<uint8_t>(245 + i), // unique sysid per connection | ||
| MAV_COMP_ID_MISSIONPLANNER, | ||
| false // not autopilot | ||
| ); | ||
| auto mavsdk = std::make_unique<Mavsdk>(config); | ||
|
|
||
| std::string url = "udp://:" + std::to_string(ports[i]); | ||
| std::cout << "Connecting to " << url << " (sysid=" << (245 + i) << ")..." << std::endl; |
There was a problem hiding this comment.
system_id is derived as static_cast<uint8_t>(245 + i). For larger --connections this silently wraps past 255, producing duplicate sysids and potentially confusing routing/forwarding behavior.
Consider bounding num_connections to a safe range (e.g. so sysid stays within 1..255 and unique), or choose sysids from a configurable base with validation.
| for (size_t i = 0; i < ports.size(); ++i) { | |
| auto config = Mavsdk::Configuration( | |
| static_cast<uint8_t>(245 + i), // unique sysid per connection | |
| MAV_COMP_ID_MISSIONPLANNER, | |
| false // not autopilot | |
| ); | |
| auto mavsdk = std::make_unique<Mavsdk>(config); | |
| std::string url = "udp://:" + std::to_string(ports[i]); | |
| std::cout << "Connecting to " << url << " (sysid=" << (245 + i) << ")..." << std::endl; | |
| constexpr uint8_t sysid_base = 245; | |
| const size_t max_unique_sysids = static_cast<size_t>(255u - sysid_base + 1u); | |
| if (ports.size() > max_unique_sysids) { | |
| std::cerr << "Requested " << ports.size() | |
| << " connections, but only " << max_unique_sysids | |
| << " unique system IDs are available starting at " | |
| << static_cast<unsigned>(sysid_base) << "." << std::endl; | |
| return 1; | |
| } | |
| for (size_t i = 0; i < ports.size(); ++i) { | |
| const uint8_t system_id = static_cast<uint8_t>(sysid_base + static_cast<uint8_t>(i)); | |
| auto config = Mavsdk::Configuration( | |
| system_id, // unique sysid per connection | |
| MAV_COMP_ID_MISSIONPLANNER, | |
| false // not autopilot | |
| ); | |
| auto mavsdk = std::make_unique<Mavsdk>(config); | |
| std::string url = "udp://:" + std::to_string(ports[i]); | |
| std::cout << "Connecting to " << url << " (sysid=" | |
| << static_cast<unsigned>(system_id) << ")..." << std::endl; |
| std::cout << " OK" << std::endl; | ||
|
|
||
| auto system = mavsdk->systems().front(); | ||
| auto *passthrough = new MavlinkPassthrough(system); | ||
| passthroughs.push_back(passthrough); | ||
| mavsdks.push_back(std::move(mavsdk)); |
There was a problem hiding this comment.
passthrough is allocated with new and stored as a raw pointer. There are multiple early return 1; paths above, so this can leak on error, and it makes ownership harder to reason about.
Prefer storing MavlinkPassthrough in a std::unique_ptr vector (or even directly in a vector) so cleanup is automatic on all exit paths.
| echo "" | ||
| echo "=== Step 5: Collect perf counters ===" | ||
|
|
||
| # Send 'perf' command to PX4 via mavlink shell (or direct if available) | ||
| sleep 2 | ||
|
|
||
| echo "" | ||
| echo "=== Step 6: Stop PX4 and extract perf data ===" | ||
| kill -INT "$PX4_PID" 2>/dev/null || true |
There was a problem hiding this comment.
The script claims to “collect perf counter data” and has a “Step 5: Collect perf counters”, but it never actually issues the perf command to the PX4 console/MAVLink shell. As written, ${RESULT_PREFIX}_perf.txt will usually be missing unless PX4 happens to dump perf counters on exit.
Either implement the perf collection step (e.g. via mavlink shell/pxh) or update the step text and header comment so expectations match what the script really does.
| echo "=== Step 6: Stop PX4 and extract perf data ===" | ||
| kill -INT "$PX4_PID" 2>/dev/null || true | ||
| sleep 3 | ||
|
|
||
| # PX4 dumps perf counters on exit in some configurations. | ||
| # Extract perf lines from the log. | ||
| if grep -q "mavlink: fwd_msg" "${RESULT_PREFIX}_px4.log"; then | ||
| echo "Perf counters from PX4 log:" | ||
| grep -E "mavlink: (fwd_|comp_seen|unsigned_cb)" "${RESULT_PREFIX}_px4.log" | tee "${RESULT_PREFIX}_perf.txt" |
There was a problem hiding this comment.
After sending SIGINT to PX4, the script sleeps for a fixed 3 seconds but never waits for the process to exit. If shutdown takes longer, the log may be incomplete when you grep for perf lines, and the PX4 process can be left running in the background.
Consider wait "$PX4_PID" (optionally with a timeout/fallback SIGKILL) before extracting results.
- Validate rate, duration, and connections args (reject <= 0) - Cap connections at 10 to prevent sysid overflow past 255 - Use unique_ptr for MavlinkPassthrough (no manual delete) - Fix runner script: document that perf is read via MAVLink shell, wait for PX4 process on shutdown Signed-off-by: Ramon Roche <mrpollo@gmail.com>
💡 Commit messages could be improvedNot blocking, but these commit messages could use some cleanup.
See the commit message convention for details. This comment will be automatically removed once the issues are resolved. |
Remove
mavlink_module_mutexfromforward_message()to eliminate per-message lock overhead in the MAVLink forwarding hot path.Spun off from discussion in #26894 with @dakejahl.
The mutex is unnecessary because instances can only be destroyed via
destroy_all_instances(), which stops all receiver threads before modifying the array. Pointer reads frommavlink_module_instances[]are naturally atomic on aligned architectures.pass_message()has its own per-instance_message_buffer_mutexfor the ring buffer.Benchmarked on three platforms (200 Hz, 30-60s,
clock_gettime(CLOCK_MONOTONIC)):Full methodology and raw data: https://gist.github.com/mrpollo/167d4ec7436fb773bf193be6c2800639
Also includes a MAVSDK C++ benchmark tool for reproducing the measurements.