Background
PR #183 fixed TRANSIENT_LOCAL interop with rmw_zenoh_cpp by switching to AdvancedPublisher/AdvancedSubscriber. The root cause — express(true) being mistaken for a caching mechanism — went undetected because:
- No existing test exercised the late-joiner case (subscriber created after publisher emits samples). All QoS tests subscribed first, then published, which works fine even without a cache.
- No automated test verified ros-z behaviour against a running rmw_zenoh_cpp node, only ros-z ↔ ros-z.
PR #183 adds a handful of late-joiner tests, but they only cover a small slice of the QoS space. A more thorough conformance effort is warranted to catch the next class of silent QoS bug.
Goals
1. Internal QoS conformance matrix (ros-z ↔ ros-z)
Add a parameterised test matrix in ros-z-tests/ (or crates/ros-z/tests/) that, for every meaningful combination of:
- QosDurability: Volatile, TransientLocal
- QosReliability: Reliable, BestEffort
- QosHistory: KeepLast(1), KeepLast(10), KeepAll
- Join ordering: subscriber-first, publisher-first (late-join)
- Subscribers: single, two (early + late)
…verifies that the subscriber receives the expected sample set. The expectation table should be derived from ROS 2 QoS semantics, not ros-z behaviour, so it serves as a spec.
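As a rough illustration of an expectation table derived from QoS semantics rather than from observed behaviour, the sketch below enumerates the matrix and computes the expected delivery for one subscriber. Everything in it is hypothetical: the enums are stand-ins, not the real ros-z QoS types, and the rules assume the subscriber drains its queue promptly, that KeepAll is not capped by resource limits, and that BestEffort can only be bounded from above. The two-subscriber dimension is left out for brevity.

```rust
// Stand-in types for illustration only; not the ros-z API.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Durability { Volatile, TransientLocal }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Reliability { Reliable, BestEffort }

#[derive(Clone, Copy, Debug, PartialEq)]
enum History { KeepLast(usize), KeepAll }

#[derive(Clone, Copy, Debug, PartialEq)]
enum JoinOrder { SubscriberFirst, PublisherFirst }

/// What the subscriber is allowed to observe.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Expect { Exactly(usize), AtMost(usize) }

#[derive(Clone, Copy, Debug)]
struct Case {
    durability: Durability,
    reliability: Reliability,
    history: History,
    join: JoinOrder,
}

/// Expected delivery after the publisher has emitted `published` samples.
fn expected(case: Case, published: usize) -> Expect {
    let deliverable = match case.join {
        // Subscriber already present: every sample is a candidate
        // (assumes the subscriber consumes samples as they arrive).
        JoinOrder::SubscriberFirst => published,
        // Late joiner: only what the publisher retained can be replayed.
        JoinOrder::PublisherFirst => match (case.durability, case.history) {
            (Durability::Volatile, _) => 0,
            (Durability::TransientLocal, History::KeepLast(depth)) => published.min(depth),
            // Assumes no resource limit caps the KeepAll history.
            (Durability::TransientLocal, History::KeepAll) => published,
        },
    };
    match case.reliability {
        Reliability::Reliable => Expect::Exactly(deliverable),
        // Best effort may drop samples, so only an upper bound is asserted.
        Reliability::BestEffort => Expect::AtMost(deliverable),
    }
}

/// Enumerate every combination of the four dimensions (2 * 2 * 3 * 2 = 24).
fn all_cases() -> Vec<Case> {
    let mut cases = Vec::new();
    for durability in [Durability::Volatile, Durability::TransientLocal] {
        for reliability in [Reliability::Reliable, Reliability::BestEffort] {
            for history in [History::KeepLast(1), History::KeepLast(10), History::KeepAll] {
                for join in [JoinOrder::SubscriberFirst, JoinOrder::PublisherFirst] {
                    cases.push(Case { durability, reliability, history, join });
                }
            }
        }
    }
    cases
}
```

A test driver could iterate `all_cases()`, run each case against real publishers and subscribers, and compare the observed count with `expected(case, n)`. Keeping the rules in one function makes the table reviewable against the ROS 2 QoS documentation independently of ros-z.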
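Suggested implementation:
- rstest or a build.rs-generated table, to keep the test file maintainable.
- The wait_for_count polling helper introduced in PR #183; no sleep-based assertions.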
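For readers who have not seen the wait_for_count helper from PR #183, the sketch below shows the general shape of a deadline-bounded polling assertion. It is not that helper: the name `wait_for_count_sketch`, its signature, and the use of an `AtomicUsize` counter are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Poll `counter` until it reaches `target` or `timeout` elapses.
/// Returns true as soon as the target is reached, false on timeout.
/// Hypothetical stand-in; the real helper's signature may differ.
fn wait_for_count_sketch(counter: &AtomicUsize, target: usize, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if counter.load(Ordering::SeqCst) >= target {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        // Short poll interval: the test finishes as soon as the condition
        // holds instead of always paying a fixed sleep.
        thread::sleep(Duration::from_millis(5));
    }
}
```

The difference from a sleep-based assertion is that the timeout is only an upper bound, so passing tests stay fast and slow CI machines do not produce flaky failures. For cases where nothing should arrive (a Volatile late joiner, for example), the same helper can be called with a target of 1 and a short timeout, asserting that it returns false.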
2. Cross-implementation interop tests (ros-z ↔ rmw_zenoh_cpp)
The harder part: automated tests that spawn a real rmw_zenoh_cpp publisher (or subscriber) and verify ros-z agrees on delivery semantics across the wire.
Open design questions:
- How is the rmw_zenoh_cpp node spawned from a Rust test? Likely via the existing ros-z-bridge test infrastructure or ./scripts/test-ros-packages.nu. (See .claude/ros-z/rules/testing.md: nix develop subprocesses are forbidden; everything must come from the flake.)
- Which subset of the conformance matrix should run in cross-impl mode? CI cost will dominate; pick the QoS combos that are common in practice (/tf_static, /parameter_events, defaults).
- Should this live in ros-z-tests/ or in a new crate?
Not blocking on these answers — file a design sub-issue once we start.
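Whichever spawning mechanism is chosen, the test will need to guarantee that the external node is torn down even when an assertion panics. One possible shape, using only the Rust standard library, is a guard that kills the child on drop. The `ChildGuard` type is hypothetical and deliberately says nothing about which binary is launched, since that is one of the open questions above.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus, Stdio};

/// Owns an external process and terminates it when dropped.
/// Hypothetical helper for illustration; not part of ros-z.
struct ChildGuard {
    child: Child,
}

impl ChildGuard {
    /// Spawn the command prepared by the caller (binary, args, env).
    fn spawn(mut cmd: Command) -> io::Result<Self> {
        let child = cmd.stdin(Stdio::null()).spawn()?;
        Ok(Self { child })
    }

    /// True while the process has not exited yet.
    fn is_running(&mut self) -> io::Result<bool> {
        Ok(self.child.try_wait()?.is_none())
    }

    /// Kill the process (if still alive) and reap it.
    fn stop(&mut self) -> io::Result<ExitStatus> {
        // Ignore the kill error: the process may already have exited.
        let _ = self.child.kill();
        self.child.wait()
    }
}

impl Drop for ChildGuard {
    fn drop(&mut self) {
        // Runs during panic unwinding too, so a failed assertion
        // does not leave a stray node behind.
        let _ = self.stop();
    }
}
```

The caller would build the `Command` from whatever the flake puts on the PATH, for example setting the `RMW_IMPLEMENTATION` environment variable to `rmw_zenoh_cpp` before spawning, and then hold the guard for the duration of the test.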
Why both, not just one
The internal matrix is cheap to add and catches drift inside ros-z. But it cannot catch wire-format / advanced-builder configuration mismatches with the C++ rmw — that's exactly the class of bug PR #183 fixed. Both layers are needed.
Related