Support io_uring #2332

Open
yellowhatter wants to merge 99 commits into eclipse-zenoh:main from ZettaScaleLabs:io_uring
yellowhatter commented Dec 23, 2025

Contributor

Summary

This PR introduces io_uring support to Zenoh behind the uring feature flag.

The implementation is intentionally scoped as an internal transport/runtime enhancement rather than a public API change. The patch adds a new internal crate, commons/zenoh-uring, and wires it into the workspace so the io_uring-specific memory, batching, reader, and writer logic can live outside the public surface area.

What this changes

Transport integration

  • Adds a uring feature flag that takes effect only on Linux.
  • Adds io_uring support to the Zenoh transport stack.
  • Integrates the new I/O path with most links.
  • Adds automatic read-mode selection between io_uring and tokio.
  • Updates transport-side code paths to route io_uring operations through dedicated transport support code.
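
A minimal sketch of what automatic read-mode selection could look like. The `ReadMode`, `LinkCaps`, and `select_read_mode` names are hypothetical and do not come from the patch, and the two criteria used here (the reactor started successfully, and the link exposes a file descriptor) are assumptions inferred from the commit notes further down in this PR, not a description of the actual decision logic.

```rust
/// Hypothetical read modes; illustrative names, not Zenoh's internal types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReadMode {
    Uring,
    Tokio,
}

/// Hypothetical summary of what a link can offer to the reader.
#[derive(Debug, Clone, Copy)]
struct LinkCaps {
    /// True when the link exposes a raw descriptor that io_uring can read from.
    has_fd: bool,
}

/// Use io_uring only when the reactor is available and the link exposes a
/// descriptor; in every other case stay on the existing tokio path.
fn select_read_mode(reactor_ready: bool, link: LinkCaps) -> ReadMode {
    if reactor_ready && link.has_fd {
        ReadMode::Uring
    } else {
        ReadMode::Tokio
    }
}

fn main() {
    let stream_link = LinkCaps { has_fd: true };
    let shared_socket_link = LinkCaps { has_fd: false };

    println!("{:?}", select_read_mode(true, stream_link));
    println!("{:?}", select_read_mode(true, shared_socket_link));
    // Reactor failed to start: every link falls back to tokio.
    println!("{:?}", select_read_mode(false, stream_link));
}
```

Keeping the decision in one pure function makes the fallback easy to unit-test without a kernel that supports io_uring.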

Link coverage

  • Adds support for streamed/socket-based links.
  • Keeps the existing transport model intact so the non-uring path still works normally.

Buffer and memory handling

  • Introduces a dedicated page-backed arena for uring buffers.
  • Allocates pinned memory with mmap/mlock and exposes it both as io_uring provided buffers and as registered iovec buffers.
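
To illustrate the bookkeeping behind a page-backed arena, here is a hedged sketch that carves one page-aligned region into fixed-size slots and lists them as (address, length) pairs, the shape an iovec registration would consume. It deliberately uses the standard allocator as a stand-in for mmap/mlock so that it runs anywhere; the `PageArena` type, its methods, and the 4096-byte page size are assumptions for illustration and are not the types in commons/zenoh-uring.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

/// Assumed page size; a real implementation would query the system.
const PAGE_SIZE: usize = 4096;

/// Hypothetical arena: one page-aligned region split into equal slots.
struct PageArena {
    base: *mut u8,
    layout: Layout,
    slot_size: usize,
    slots: usize,
}

impl PageArena {
    /// Allocate `slots` buffers, each rounded up to a whole number of pages.
    fn new(slots: usize, slot_size: usize) -> Option<Self> {
        let slot_size = slot_size.checked_add(PAGE_SIZE - 1)? / PAGE_SIZE * PAGE_SIZE;
        let total = slots.checked_mul(slot_size)?;
        if total == 0 {
            return None;
        }
        let layout = Layout::from_size_align(total, PAGE_SIZE).ok()?;
        // SAFETY: `layout` has a non-zero size.
        let base = unsafe { alloc_zeroed(layout) };
        if base.is_null() {
            return None;
        }
        Some(Self { base, layout, slot_size, slots })
    }

    /// (address, length) pairs describing every slot in the region.
    fn iovecs(&self) -> Vec<(usize, usize)> {
        (0..self.slots)
            .map(|i| (self.base as usize + i * self.slot_size, self.slot_size))
            .collect()
    }

    /// Mutable view of one slot, or `None` when the index is out of range.
    fn slot_mut(&mut self, index: usize) -> Option<&mut [u8]> {
        if index >= self.slots {
            return None;
        }
        // SAFETY: the slot lies inside the allocation and `&mut self`
        // guarantees exclusive access for the lifetime of the slice.
        Some(unsafe {
            std::slice::from_raw_parts_mut(self.base.add(index * self.slot_size), self.slot_size)
        })
    }
}

impl Drop for PageArena {
    fn drop(&mut self) {
        // SAFETY: `base` was allocated with exactly this layout.
        unsafe { dealloc(self.base, self.layout) };
    }
}

fn main() {
    let mut arena = PageArena::new(4, 1000).expect("allocation failed");
    arena.slot_mut(0).expect("slot 0 exists")[0] = 7;
    println!("slots registered: {}", arena.iovecs().len());
}
```

Because every slot starts on a page boundary and spans whole pages, the same region could in principle be handed to the kernel both as provided buffers and as registered buffers without slots sharing a page.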

Reader / writer path

  • Adds dedicated reader and writer implementations for the uring path.
  • Adds zero-copy oriented handling for split and fragmented payloads.
  • Improves RX batch reclamation so large payloads can be handled with less memory waste.
  • Tunes batch growth behavior to better support big messages and improve footprint.
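
As a rough illustration of the batch growth tuning mentioned above, the sketch below doubles a capacity until it covers the requested size and clamps the result to a configured ceiling. The `next_capacity` function, its parameters, and the doubling factor are assumptions made for this example; the patch's actual growth policy may differ.

```rust
/// Hypothetical growth policy: double `current` until it covers `needed`,
/// never exceeding `max`. Returns `None` when `needed` cannot fit at all.
fn next_capacity(current: usize, needed: usize, max: usize) -> Option<usize> {
    if needed > max {
        return None;
    }
    // Start from at least one byte so doubling always makes progress.
    let mut cap = current.max(1);
    while cap < needed {
        cap = cap.saturating_mul(2);
    }
    Some(cap.min(max))
}

fn main() {
    // A 4 KiB batch that must hold a 10 000-byte payload grows by doubling.
    println!("{:?}", next_capacity(4096, 10_000, 65_536));
    // A request above the ceiling is rejected instead of over-allocating.
    println!("{:?}", next_capacity(4096, 70_000, 65_536));
}
```

Doubling keeps the number of reallocations logarithmic in the payload size, while the ceiling bounds the footprint for small-message workloads.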

Error handling and runtime integration

  • Passes Tokio executor context into the uring thread so Tokio-aware user code remains compatible.

Build / CI / test coverage

  • Adds the uring crate and dependency wiring to the workspace.
  • Ensures Zenoh still compiles when the uring feature is disabled.
  • Adds uring to CI.
  • Expands uring-related tests, including dynamic-port coverage and extended UDP test paths.

Design notes

The implementation follows a conservative enhancement approach:

  • no public API expansion,
  • no behavioral change for users who do not enable uring,
  • all uring-specific code isolated behind feature gates and internal crates,
  • transport and buffer-management changes kept close to the existing architecture instead of introducing a separate parallel stack.
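
The feature-gate isolation described above can be sketched with ordinary `cfg` attributes. This is an assumption about the general shape, not the patch's code: the `backend` module and `NAME` constant are hypothetical, and only the `uring` feature name comes from this PR.

```rust
// Compiled only when the `uring` feature is enabled on a Linux target.
#[cfg(all(feature = "uring", target_os = "linux"))]
mod backend {
    pub const NAME: &str = "io_uring";
}

// Compiled in every other configuration, so callers never see a gap.
#[cfg(not(all(feature = "uring", target_os = "linux")))]
mod backend {
    pub const NAME: &str = "tokio";
}

fn main() {
    // Callers use `backend::NAME` without any cfg of their own.
    println!("read backend: {}", backend::NAME);
}
```

Pairing each gated item with a `not(...)` counterpart is one way to keep the crate compiling when the feature is disabled or the target is not Linux.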

Expected user impact

For users building with -F uring, this adds a new Linux I/O backend that reduces overhead in the transport path and improves handling of larger or fragmented payloads. For users who do not enable the feature, the existing build and transport behavior remain unchanged. The measured improvement is a 25-50% latency reduction and a few percent increase in messaging rate. Performance test results are included in the comments on this PR below.

Validation

Validation for this PR is centered on:

  • successful compilation with and without uring,
  • CI coverage for the new feature,
  • transport-level tests for the new I/O path,
  • UDP and split-buffer / fragmentation scenarios,
  • correctness of buffer reclamation and RX finalization behavior.

🏷️ Label-Based Checklist

Based on the labels applied to this PR, please complete these additional requirements:

Labels: enhancement

✨ Enhancement Requirements

Since this PR enhances existing functionality:

  • Enhancement scope documented - Clear description of what is being improved
  • Minimum necessary code - Implementation is as simple as possible, doesn't overcomplicate the system
  • Backwards compatible - Existing code/APIs still work unchanged
  • No new APIs added - Only improving existing functionality
  • Tests updated - Existing tests pass, new test cases added if needed
  • Performance improvement measured - If applicable, before/after metrics provided
  • Documentation updated - Existing docs updated to reflect improvements
  • User impact documented - How users benefit from this enhancement

Remember: Enhancements should not introduce new APIs or breaking changes.

Instructions:

  1. Check off items as you complete them (change - [ ] to - [x])
  2. The PR checklist CI will verify these are completed

This checklist updates automatically when labels change, but preserves your checked boxes.

yellowhatter self-assigned this on Dec 23, 2025
yellowhatter added the enhancement (Existing things could work better) label on Dec 23, 2025
codecov Bot commented Feb 9, 2026

Codecov Report

❌ Patch coverage is 78.15207% with 227 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.08%. Comparing base (e03b0c0) to head (4487490).
⚠️ Report is 9 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| commons/zenoh-uring/src/linux/reader/window.rs | 55.70% | 66 Missing ⚠️ |
| commons/zenoh-uring/src/linux/api/reader/mod.rs | 82.60% | 36 Missing ⚠️ |
| commons/zenoh-uring/src/linux/batch_arena.rs | 62.19% | 31 Missing ⚠️ |
| ...noh-uring/src/linux/api/reader/fragmented_batch.rs | 47.22% | 19 Missing ⚠️ |
| commons/zenoh-uring/src/linux/page_arena.rs | 72.85% | 19 Missing ⚠️ |
| ...mons/zenoh-uring/src/linux/api/reader/rx_buffer.rs | 58.62% | 12 Missing ⚠️ |
| ...mons/zenoh-uring/src/linux/api/reader/read_task.rs | 76.74% | 10 Missing ⚠️ |
| ...s/zenoh-uring/src/linux/reader/reservable_arena.rs | 90.62% | 9 Missing ⚠️ |
| commons/zenoh-buffers/src/zbuf.rs | 46.15% | 7 Missing ⚠️ |
| io/zenoh-transport/src/common/batch.rs | 72.00% | 7 Missing ⚠️ |

... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2332      +/-   ##
==========================================
- Coverage   74.11%   74.08%   -0.04%     
==========================================
  Files         400      414      +14     
  Lines       61030    62032    +1002     
==========================================
+ Hits        45234    45954     +720     
- Misses      15796    16078     +282     


- make RX batch memory growable (to optimize memory footprint and support zero-copy for big payloads)
- exponential growth for uring batch arena
- tune RX arena size based on config
- add async wait for uring-global errors (as opposed to rx-task-related errors)
- supply uring thread with tokio executor context to be compatible with tokio-aware user code
Conflicts:
	io/zenoh-transport/src/unicast/universal/link.rs
@yellowhatter

Contributor Author

100Hz:

main:
image

uring:
image

@yellowhatter

Contributor Author

1kHz:

main:
image

uring:
image

@yellowhatter

Contributor Author

max freq

main:
image

uring:
image

- support transport_compression for io_uring
- fix stats for io_uring
yellowhatter marked this pull request as ready for review on April 21, 2026 08:26
yellowhatter changed the title from "WIP on io_uring" to "Support io_uring" on Apr 21, 2026
@yellowhatter

Contributor Author

Docs update: zenoh-rs/zenoh-web#127

@yellowhatter

Contributor Author

@YuanYuYuan can you please validate this performance on your side? The issue should be fixed now

@YuanYuYuan

Contributor

> @YuanYuYuan can you please validate this performance on your side? The issue should be fixed now

The evaluation is finally out. The results look quite good. https://gist.github.com/YuanYuYuan/f919ebf7eb4b171e502aa13536e5d5db

- Remove debug sleep(100ms) from z_ping warmup and measurement loops
- Revert unrelated liveliness test weakening: restore order-specific
  assert_eq! checks, remove HashSet indirection, #[ignore] attrs,
  debug println!s, and commented-out code
- Delete orphaned linux/memory/ directory (never compiled, duplicate
  of reader/ left over from the split-into-files refactor)
- Degrade gracefully when io_uring reactor init fails (Option<Uring>
  in TransportManagerState, warn + fallback to tokio instead of error)
- Enforce lease timeout on the uring RX task via TimeoutTracker; reset
  on each received fragment, bail on expiry (was TODO/unimplemented)
- Restrict get_fd() to Connected UDP only; unconnected sockets are
  shared across peers and cannot be demuxed by io_uring RecvMulti
- Fix BatchHeader::new() called without transport_compression feature;
  use tuple constructor BatchHeader(h) which is always available
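
The lease enforcement described in the commit above (reset on each received fragment, bail on expiry) can be sketched as follows. The `LeaseTracker` type here is a hypothetical stand-in written for illustration; the internals of the patch's `TimeoutTracker` are not shown in this conversation, so nothing below should be read as its actual API.

```rust
use std::time::{Duration, Instant};

/// Hypothetical lease tracker: remembers when data was last seen and
/// reports expiry once a full lease period passes without a reset.
struct LeaseTracker {
    lease: Duration,
    last_seen: Instant,
}

impl LeaseTracker {
    fn new(lease: Duration, now: Instant) -> Self {
        Self { lease, last_seen: now }
    }

    /// Call on every received fragment to restart the lease window.
    fn reset(&mut self, now: Instant) {
        self.last_seen = now;
    }

    /// True once at least one full lease has elapsed since the last reset.
    fn expired(&self, now: Instant) -> bool {
        now.duration_since(self.last_seen) >= self.lease
    }
}

fn main() {
    let start = Instant::now();
    let mut tracker = LeaseTracker::new(Duration::from_secs(10), start);

    // A fragment arrives five seconds in, so the window restarts.
    tracker.reset(start + Duration::from_secs(5));

    // An RX loop would bail out as soon as this returns true.
    println!("expired: {}", tracker.expired(start + Duration::from_secs(15)));
}
```

Passing `now` explicitly rather than calling `Instant::now()` inside the tracker keeps the logic deterministic in tests.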
Strip the unused thread-priority/realtime scheduling block, commented
//ring.submit(), //let len, alternate enter() calls, and the no-op
//bail! match arms from api/reader/mod.rs. Remove the commented log!
macro from window.rs. Remove commented writer field and constructor
from uring.rs. Fix "Urng" typo in reactor thread log message.
YuanYuYuan and others added 2 commits June 12, 2026 22:59
The io_uring TX writer is not wired into the transport layer in this PR.
Remove linux/writer/ and linux/api/writer/ rather than shipping unused
public API. Remove thread-priority from Cargo.toml; its only consumer
was the commented-out realtime scheduling block already deleted.
Labels

enhancement Existing things could work better

3 participants