All notable changes to UTun will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- `TunnelSession` abstraction encapsulating a complete tunnel connection (codec, writer, demux, heartbeat) with per-session shutdown, enabling concurrent sessions for blue-green refresh.
- `TunnelRecoverySignal` channel replacing the dead `reconnection_needed` AtomicBool -- heartbeat, demux, and writer tasks now signal failures via typed messages (`HeartbeatDead`, `DemuxExited`, `WriterExited`).
- Runtime tunnel recovery in `run()` -- recovery signals are received and processed in the main select loop, triggering teardown of the failed session and reconnection with exponential backoff.
- Blue-green proactive connection refresh -- a configurable timer (`connection_refresh_interval_secs`, default 1 hour) establishes a new tunnel session, atomically swaps it as active, and gracefully drains the old session.
- `retire_session()` method for graceful session draining with configurable timeout (`connection_drain_timeout_secs`, default 60s) before force-closing old sessions.
- `HeartbeatState::reset()` method for clearing all tracking fields.
- `connection_refresh_interval_secs` and `connection_drain_timeout_secs` configuration fields on `SourceConfig`.
- Comprehensive reconnection test suite (`tests/reconnection_tests.rs`) covering heartbeat recovery, demux/writer failure detection, full recovery cycles, blue-green zero-downtime refresh, drain timeout, circuit breaker integration, and backoff timing.
- `write_queue_size` configuration field on `SourceConfig` (default 8192) for bounded write queue backpressure.
- Transient error retry (single retry after 50ms) on dest target write path for WouldBlock, Interrupted, and TimedOut errors.
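The transient error retry noted above could look roughly like the following. This is a minimal sketch, not the project's code: `is_transient` and `write_with_single_retry` are hypothetical names, and blocking `std::io::Write` stands in for whatever async write path the dest side actually uses.

```rust
use std::io::{self, ErrorKind, Write};
use std::thread;
use std::time::Duration;

/// True for the error kinds the changelog lists as transient.
fn is_transient(kind: ErrorKind) -> bool {
    matches!(
        kind,
        ErrorKind::WouldBlock | ErrorKind::Interrupted | ErrorKind::TimedOut
    )
}

/// Writes all of `buf`, tolerating exactly one transient error.
/// Tracks how many bytes were already written so the retry never
/// re-sends data. (Hypothetical helper; blocking I/O is an assumption.)
fn write_with_single_retry<W: Write>(writer: &mut W, buf: &[u8]) -> io::Result<()> {
    let mut written = 0;
    let mut retried = false;
    while written < buf.len() {
        match writer.write(&buf[written..]) {
            Ok(0) => return Err(ErrorKind::WriteZero.into()),
            Ok(n) => written += n,
            Err(e) if is_transient(e.kind()) && !retried => {
                // Single retry after 50 ms, as described in the entry above.
                retried = true;
                thread::sleep(Duration::from_millis(50));
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

A second transient error, or any non-transient error, is returned to the caller unchanged.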
- Heartbeat now breaks and signals recovery after `max_missed_pongs` consecutive timeouts instead of logging forever and setting a flag nobody reads.
- Demux task now sends `DemuxExited` recovery signal on read errors instead of silently exiting.
- Writer task now sends `WriterExited` recovery signal on write errors instead of silently exiting.
- DemuxWatchdog was created but never started (`spawn_watchdog()` was never called) -- replaced with the recovery signal channel which is always active.
- Runtime reconnection now works: previously `ReconnectionManager` only ran during `start()`, so dead tunnels after `run()` began stayed dead permanently.
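To illustrate the heartbeat behaviour described above, here is a small sketch of a loop that counts consecutive missed pongs and reports `HeartbeatDead` over a channel. The variant names come from the changelog; everything else is an assumption, including the use of `std::sync::mpsc` rather than an async channel, and the `run_heartbeat` and `recovery_channel` helpers.

```rust
use std::sync::mpsc::{self, Receiver, Sender};

/// Typed recovery messages; variant names are taken from the changelog.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TunnelRecoverySignal {
    HeartbeatDead,
    DemuxExited,
    WriterExited,
}

/// Creates the channel the session tasks would share (hypothetical helper).
fn recovery_channel() -> (Sender<TunnelRecoverySignal>, Receiver<TunnelRecoverySignal>) {
    mpsc::channel()
}

/// Simulated heartbeat loop: each item in `pongs` says whether a ping was
/// answered in time. After `max_missed_pongs` consecutive misses the loop
/// sends `HeartbeatDead` and stops. Returns the number of pings sent.
fn run_heartbeat<I>(pongs: I, max_missed_pongs: u32, tx: &Sender<TunnelRecoverySignal>) -> u32
where
    I: IntoIterator<Item = bool>,
{
    let mut missed = 0u32;
    let mut pings = 0u32;
    for answered in pongs {
        pings += 1;
        if answered {
            missed = 0; // any pong resets the consecutive-miss counter
        } else {
            missed += 1;
        }
        if missed >= max_missed_pongs {
            // Signal the owner instead of setting a flag nobody reads.
            let _ = tx.send(TunnelRecoverySignal::HeartbeatDead);
            break;
        }
    }
    pings
}
```

The receiving side would match on the signal variant to decide which session to tear down before reconnecting.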
- Extracted tunnel connection state from flat `SourceContainer` fields into `TunnelSession` struct, enabling multiple concurrent sessions.
- `handle_client()` now captures `Arc<TunnelSession>` at connection time, binding the client to its session for the connection lifetime (supports blue-green: old clients drain on old session, new clients use new session).
- Heartbeat, demux, and writer tasks no longer clone `SourceContainer` -- they receive individual fields and a per-session shutdown channel.
- Writer task now owns `OwnedWriteHalf` directly instead of going through `Arc<Mutex<Option<OwnedWriteHalf>>>`, eliminating lock contention entirely.
- `send_frame()` and `registry_count()` now route through the active session.
- Removed `frame_codec`, `tunnel_stream`, `tunnel_read`, `tunnel_write`, `write_queue_tx`, `connection_registry`, `heartbeat_state`, `demux_handle`, `demux_watchdog`, and `reconnection_needed` fields from `SourceContainer`.
- Source writer now batches pending frames via try_recv() and flushes once per batch instead of per-frame.
- TCP_NODELAY set on source tunnel socket, client sockets, and dest target sockets for lower latency.
- All crypto atomic orderings relaxed from SeqCst to Relaxed (nonce counter, sequence counters).
- `HeartbeatState` replaced 5x `Arc<RwLock<T>>` fields with lock-free atomics using epoch-relative microsecond timestamps. All methods now synchronous.
- `Connection.state` replaced `RwLock<ConnectionState>` with `AtomicU8`. `state()` and `set_state()` now synchronous.
- `Connection.last_activity` replaced `RwLock<Instant>` with epoch-relative `AtomicU64`. `touch()` and `is_idle()` now synchronous.
- `cleanup_stale()` now uses two-phase locking: read lock to identify stale IDs, write lock only for removal.
- Per-frame flush removed from dest target data handler (TCP_NODELAY ensures immediacy).
- Write queue changed from unbounded to bounded channel (configurable via `write_queue_size`). Heartbeat uses `try_send()` to avoid blocking on full queue.
- Frame read buffers reused across iterations in both source demux and dest read loops to reduce per-frame heap allocations.
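The switch from `RwLock` fields to epoch-relative atomics can be sketched as follows. The method names (`touch()`, `is_idle()`, `state()`, `set_state()`) come from the entries above; the struct layout, the `STATE_*` constants, and the choice of `Ordering::Relaxed` are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU64, AtomicU8, Ordering};
use std::time::{Duration, Instant};

// Hypothetical u8 encoding of the connection state.
const STATE_OPEN: u8 = 0;
const STATE_CLOSED: u8 = 1;

/// Simplified connection record (layout is an assumption).
struct Connection {
    epoch: Instant,              // fixed reference point taken at creation
    last_activity_us: AtomicU64, // microseconds elapsed since `epoch`
    state: AtomicU8,             // state stored as a plain discriminant
}

impl Connection {
    fn new() -> Self {
        Connection {
            epoch: Instant::now(),
            last_activity_us: AtomicU64::new(0),
            state: AtomicU8::new(STATE_OPEN),
        }
    }

    /// Microseconds since the per-connection epoch.
    fn now_us(&self) -> u64 {
        self.epoch.elapsed().as_micros() as u64
    }

    /// Records activity without taking any lock.
    fn touch(&self) {
        self.last_activity_us.store(self.now_us(), Ordering::Relaxed);
    }

    /// True if no activity was recorded for at least `timeout`.
    fn is_idle(&self, timeout: Duration) -> bool {
        let last = self.last_activity_us.load(Ordering::Relaxed);
        self.now_us().saturating_sub(last) >= timeout.as_micros() as u64
    }

    fn state(&self) -> u8 {
        self.state.load(Ordering::Relaxed)
    }

    fn set_state(&self, new_state: u8) {
        self.state.store(new_state, Ordering::Relaxed);
    }
}
```

Storing an offset from a fixed `Instant` is what lets a timestamp fit in a single `AtomicU64`, since `Instant` itself has no atomic form.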
- Critical connection leak on dest side caused by ID mismatch between ConnectionManager (auto-generated IDs) and target_connections/frame connection_id (source IDs). ConnectionManager.remove_connection() was always a no-op on the dest, causing unbounded growth until the 8000 connection limit was hit with no recovery.
- Reader task on dest side now removes connections from both target_connections and ConnectionManager on close, preventing leaked entries.
- FIN Data frame handler now removes connections from ConnectionManager in addition to target_connections.
- Tunnel disconnect cleanup now calls ConnectionManager.close_all() to release all tracked connections.
- Source demux task now removes dead channel entries from the connection registry when a send fails, preventing stale entries from accumulating.
- `ConnectionManager::create_connection_with_id()` method allowing caller-provided IDs so the dest side uses the same connection IDs as the source, eliminating the ID mismatch bug.
- `ConnectionManager::get_connection()` for read-only connection lookups.
- Periodic stale connection cleanup task on the dest side that runs every N seconds (configurable via `stale_cleanup_interval_secs`), reaping idle and closed connections from both ConnectionManager and target_connections.
- `stale_cleanup_interval_secs` configuration field on DestConfig (default: 15 seconds).
- `connection_count()` and `target_connection_count()` observability methods on DestContainer.
- `connection_count()` and `registry_count()` observability methods on SourceContainer.
- Unit tests for `create_connection_with_id`, `cleanup_stale` return values, and idle timeout cleanup.
- `ConnectionManager::cleanup_stale()` now returns `Vec<u32>` (removed IDs) instead of `usize`, allowing callers to sync other data structures.
- Dest data forwarding now uses a read lock on the target_connections HashMap with a per-connection Mutex on the writer, eliminating the global write lock that blocked all concurrent data forwarding.
- `TargetConnection.writer` changed from `OwnedWriteHalf` to `Mutex<OwnedWriteHalf>` and stored as `Arc<TargetConnection>` to support concurrent access.
- Dest writer task now batches pending frames via try_recv() and flushes once per batch instead of per-frame, reducing syscall overhead.
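A `cleanup_stale()` that returns the removed IDs and uses two-phase locking might be shaped like the sketch below. It assumes a much simpler `ConnectionManager` than the real one (a map from ID to last-activity `Instant`), and the `idle_timeout` and `now` parameters are hypothetical, added so the behaviour is easy to test.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::time::{Duration, Instant};

/// Simplified stand-in: maps connection ID to its last activity time.
struct ConnectionManager {
    connections: RwLock<HashMap<u32, Instant>>,
}

impl ConnectionManager {
    /// Removes connections idle for at least `idle_timeout` as of `now`
    /// and returns their IDs so callers can sync other structures.
    fn cleanup_stale(&self, idle_timeout: Duration, now: Instant) -> Vec<u32> {
        // Phase 1: read lock only, so concurrent lookups are not blocked.
        let candidates: Vec<u32> = {
            let map = self.connections.read().unwrap();
            map.iter()
                .filter(|(_, last)| now.saturating_duration_since(**last) >= idle_timeout)
                .map(|(id, _)| *id)
                .collect()
        };
        if candidates.is_empty() {
            return candidates; // nothing stale: the write lock is never taken
        }

        // Phase 2: write lock for removal. Re-check each entry, because it
        // may have been touched or removed between the two phases.
        let mut map = self.connections.write().unwrap();
        let mut removed = Vec::new();
        for id in candidates {
            let still_stale = map
                .get(&id)
                .map_or(false, |last| now.saturating_duration_since(*last) >= idle_timeout);
            if still_stale {
                map.remove(&id);
                removed.push(id);
            }
        }
        removed
    }
}
```

A caller on the dest side could then iterate over the returned IDs and drop the matching entries from `target_connections`.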
- TCP_NODELAY now set on the tunnel socket at connection start for lower latency.
- Frame read timeout increased from 30s to 60s to prevent heartbeat timeout during idle periods
- Heartbeat task now respects reconnection_enabled setting
- Connection cleanup now properly removes from both demux registry and connection manager
- GitHub Actions release workflow artifact download configuration
- Resilience module with circuit breaker to prevent infinite demux restart loops
- Demux task watchdog for automatic recovery from task failures
- Tunnel metrics tracking for observability (demux restarts, channel full events, frames dropped, lock wait times, heartbeat timeouts)
- Configurable channel sizes for connection management (`connection_channel_size` in source and dest configs)
- Circuit breaker configuration options (`circuit_breaker_window_secs`, `circuit_breaker_max_restarts`)
- Dedicated writer task for lock-free frame sending via unbounded write queue
- Heartbeat race condition where pong could arrive after timeout check began, now uses atomic flag checked atomically before timeout
- Potential deadlock in destination response channel by switching to unbounded channel (monitored via metrics for backpressure)
- Lock contention during frame writes by implementing dedicated writer task that minimizes critical section to just write/flush operations
- Clippy dead_code warnings by adding appropriate allow attributes to config fields and public API methods not yet used
- Source demux task now takes ownership of tunnel read half for lock-free operation (no mutex on read path)
- Connection channels now use configurable size (default 1024) instead of hardcoded 100
- Heartbeat pong detection now atomic (flag cleared before ping sent, checked atomically after timeout)
- Frame sending now uses unbounded write queue instead of direct writes (eliminates per-frame lock acquisition)
- Frame demultiplexing system in source container to route incoming frames to correct connection handlers via connection registry
- Comprehensive session crypto tests including nonce format validation, replay protection, high-volume unique nonces, bidirectional communication, and out-of-order delivery
- Three new test suites: demux_tests.rs, end_to_end_tests.rs, and full_system_tests.rs for integration testing
- Port reuse support with SO_REUSEADDR and SO_REUSEPORT for better socket management
- Port fallback logic that tries up to 5 additional ports if the configured port is in use
- Session crypto now uses proper session-prefixed nonces (4-byte session prefix + 8-byte counter) instead of counter-only nonces to prevent nonce collisions across sessions
- Timing attack vulnerability by always incrementing sequence counter regardless of decryption success/failure
- Bidirectional crypto key ordering in destination container (swapped enc_key and mac_key for proper symmetric communication)
- Connection state handling by making rx_from_tunnel public for demux access
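The session-prefixed nonce layout mentioned above (4-byte session prefix followed by an 8-byte counter, 12 bytes in total) can be illustrated like this. The big-endian counter encoding, the `build_nonce` function, and the `NonceGenerator` type are assumptions for the sketch, not the project's actual API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Builds a 12-byte nonce: bytes 0..4 are the session prefix,
/// bytes 4..12 are the counter (big-endian here, by assumption).
fn build_nonce(session_prefix: [u8; 4], counter: u64) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    nonce[..4].copy_from_slice(&session_prefix);
    nonce[4..].copy_from_slice(&counter.to_be_bytes());
    nonce
}

/// Hypothetical per-session generator. Counter exhaustion is not handled;
/// a real implementation would rekey long before the counter wraps.
struct NonceGenerator {
    session_prefix: [u8; 4],
    counter: AtomicU64,
}

impl NonceGenerator {
    fn new(session_prefix: [u8; 4]) -> Self {
        NonceGenerator {
            session_prefix,
            counter: AtomicU64::new(0),
        }
    }

    /// Returns a nonce unique within this session.
    fn next_nonce(&self) -> [u8; 12] {
        let n = self.counter.fetch_add(1, Ordering::Relaxed);
        build_nonce(self.session_prefix, n)
    }
}
```

Two sessions that both start their counters at zero still produce different nonces as long as their prefixes differ, which is the collision the entry above describes.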
- Split tunnel stream into separate read/write halves to avoid lock contention
- Refactored frame reception to use dedicated demux task instead of direct receive_frame calls
- Simplified handshake flow by removing excessive logging throughout codebase
- Improved connection handling with better timeout management and cleanup
- Unused counter-mode encryption methods (encrypt_with_counter, decrypt_with_counter, nonce_from_counter)
- Counter-mode tests and benchmarks that are no longer relevant
- Excessive/over-the-top log messages across source and destination containers
- Transparent multi-port forwarding mode for source containers. Each port in `exposed_ports` now gets its own listener that automatically forwards connections to the same port on the destination server. This replaces the previous single-port limitation where all traffic went through one port.
- New `mode` configuration option for source containers with three modes: `transparent` (default, multi-port), `protocol` (single entry point), and `hybrid` (both modes).
- Port validation to prevent duplicate ports in `exposed_ports` configuration.
- Example configuration file `examples/config-source-transparent.toml` showing multi-port setup.
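The duplicate-port check on `exposed_ports` could be as simple as the sketch below. It assumes each entry reduces to a bare `u16` port number, which may not match the real config type; `validate_exposed_ports` and its error text are hypothetical.

```rust
use std::collections::HashSet;

/// Rejects a port list that contains the same port more than once.
/// (Hypothetical helper; the real validation may check more than this.)
fn validate_exposed_ports(ports: &[u16]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for &port in ports {
        // `insert` returns false when the value was already present.
        if !seen.insert(port) {
            return Err(format!("duplicate port {} in exposed_ports", port));
        }
    }
    Ok(())
}
```

Running this at config load time means a duplicate is reported before any listener is bound, rather than surfacing later as an address-in-use error on the second listener.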
- Source mode now uses actual target port in CONNECT frames instead of hardcoded port 22. Each connection now correctly specifies which destination port to connect to.
- Updated `bytes` dependency from 1.11.0 to 1.11.1 to fix integer overflow vulnerability (RUSTSEC-2026-0007).
- Updated `time` dependency from 0.3.46 to 0.3.47 to fix denial of service vulnerability via stack exhaustion (RUSTSEC-2026-0009).
- Improved error messages for certificate file access issues with better diagnostics and fix suggestions.
- Source container now spawns separate listener tasks for each exposed port, using channel-based communication for centralized connection handling.
- Added graceful shutdown for all listener tasks.
- Added safe config reload with blue/green swap so invalid configs no longer crash the running container.
- Dynamic handshake buffer size based on KEM mode. Hybrid and McEliece modes now use 2MB buffers to accommodate ~500KB Classic McEliece public keys. ML-KEM-768 mode uses 64KB buffers. This fixes "Frame size exceeds maximum" errors when using hybrid KEM mode.
- Increased handshake timeout from 3-10 seconds to 30 seconds per message to allow time for large key exchanges.
- Fixed PEM certificate handling in handshake. Certificates are now properly converted from PEM to DER format before verification. This fixes "Certificate verification failed" errors when using PEM-encoded certificate files.
- Fixed keypair storage in handshake. Both client and server now properly store their ephemeral keypairs and reuse them for decapsulation. Previously, new keypairs were incorrectly generated during decapsulation, causing key exchange failures.
- Fixed shared secret ordering in verify_data and session key derivation. Both client and server now use canonical ordering (client-to-server, server-to-client) regardless of which side is computing. This ensures both sides derive identical verify_data hashes and session keys.
- Fixed handshake transcript ordering. Verify_data is now computed before adding the finished message to the transcript, ensuring both sides compute the same hash over the same transcript state.
- Added graceful shutdown handling for Docker containers. The process now properly handles SIGTERM and SIGINT signals, triggering a clean shutdown with a 30-second timeout. This fixes containers that would not stop or disconnect cleanly.
- New `max_handshake_size` option in `[crypto]` config section to manually override the handshake buffer size if needed.
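The buffer sizing rule described in this release (2MB for hybrid and McEliece modes, 64KB for ML-KEM-768, with `max_handshake_size` as a manual override) can be summarised in a few lines. The `KemMode` enum, the constant names, and the `handshake_buffer_size` function are hypothetical; only the sizes and the option name come from the entries above.

```rust
/// Hypothetical enum for the configured KEM mode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KemMode {
    MlKem768,
    McEliece,
    Hybrid,
}

const SMALL_HANDSHAKE_BUF: usize = 64 * 1024; // 64KB for ML-KEM-768
const LARGE_HANDSHAKE_BUF: usize = 2 * 1024 * 1024; // 2MB for McEliece-sized keys

/// Picks the handshake buffer size. An explicit `max_handshake_size`
/// from the `[crypto]` section wins over the mode-based default.
fn handshake_buffer_size(mode: KemMode, max_handshake_size: Option<usize>) -> usize {
    if let Some(size) = max_handshake_size {
        return size;
    }
    match mode {
        KemMode::MlKem768 => SMALL_HANDSHAKE_BUF,
        // Classic McEliece public keys are roughly 500KB, so both modes
        // that carry one get the large buffer.
        KemMode::McEliece | KemMode::Hybrid => LARGE_HANDSHAKE_BUF,
    }
}
```

Keeping the small default for ML-KEM-768 avoids allocating 2MB per handshake when no large key is ever exchanged.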
- Post-quantum cryptography with hybrid KEM (ML-KEM-768 + Classic McEliece-460896)
- AES-256-GCM symmetric encryption with HKDF key derivation
- Mutual TLS authentication with custom CA support
- Automatic key rotation and session rekeying
- TCP and UDP protocol tunneling
- Connection pooling and efficient connection management
- Prometheus metrics endpoint
- Health check endpoint for container orchestration
- Docker and Docker Compose support
- Certificate generation and management CLI
- Source and destination container modes
- Configuration via TOML files
- Initial implementation of quantum-safe tunnel system
- Core cryptographic primitives
- Network layer with TCP/UDP support
- Certificate management utilities
- Example configurations
- Build scripts and Docker support
- Switched the hybrid KEM from Kyber-768 to ML-KEM-768 (pqcrypto-mlkem)
- Certificate CLI now requires `--out-cert`/`--out-key` and will not print PEM material to stdout