129 changes: 129 additions & 0 deletions docs/dev/issue-6403-tcp-bind-plan.md
@@ -0,0 +1,129 @@
# Issue #6403 Plan: Allocate TCP Ephemeral Ports at `bind()`

Issue: [wasmerio/wasmer#6403](https://github.com/wasmerio/wasmer/issues/6403)

## Problem Summary

For TCP sockets in Wasix, `bind(("host", 0))` currently stores the requested address but does not perform a real backend bind. As a result:

- `getsockname()` after `bind()` still reports port `0`
- the kernel-assigned ephemeral port only appears after `listen()`
- code that reads the assigned port between `bind()` and `listen()`, as POSIX permits, breaks

Native systems allocate the ephemeral port at `bind()` time, not at `listen()` time.

## Root Cause

The relevant behavior is in `lib/wasix/src/net/socket.rs`.

- `InodeSocket::bind()` stores the requested TCP address for `PreSocket` / `RemoteSocket` stream sockets and returns without calling into the networking backend.
- `InodeSocket::addr_local()` for those pre-listen socket states simply returns the stored address, which remains `host:0`.
- `InodeSocket::listen()` is the first place that actually calls `net.listen_tcp(...)`, so the real OS bind and ephemeral port allocation happen too late.

This is not just a `getsockname()` reporting bug. The underlying port is not actually reserved until `listen()`.

## Secondary Constraint

The current `virtual-net` abstraction exposes:

- `listen_tcp(...)`
- `bind_udp(...)`
- `connect_tcp(...)`

but it does not expose a TCP bind primitive that can:

- perform a real bind without listening yet
- report the effective local address after binding
- later transition into `listen()` or `connect()`

That means the fix needs to extend the backend abstraction rather than only patching Wasix-local state.

## Proposed Fix

## 1. Add a regression test first

Add a new socket test under `lib/wasix/tests/wasm_tests/socket_tests/` that:

1. creates an IPv4 TCP socket
2. binds to `127.0.0.1:0`
3. checks that `getsockname().port != 0` immediately after `bind()`
4. calls `listen()`
5. checks that the port stays the same after `listen()`

This locks in the POSIX behavior expected by the issue report.
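
The steps above can be sketched as a backend-agnostic check. This is only an illustration: the `TestSocket` trait, both fake sockets, and the port `49152` are hypothetical stand-ins rather than Wasix APIs, and the real test would issue the same calls from a guest program.

```rust
use std::net::{Ipv4Addr, SocketAddr};

/// Hypothetical minimal socket surface needed by the check (not a Wasix API).
trait TestSocket {
    fn bind(&mut self, addr: SocketAddr);
    fn listen(&mut self);
    fn local_addr(&self) -> SocketAddr;
}

/// Runs steps 2-5 of the plan and reports the first violation.
fn check_bind_allocates_port<S: TestSocket>(sock: &mut S) -> Result<u16, String> {
    sock.bind(SocketAddr::from((Ipv4Addr::LOCALHOST, 0)));
    let after_bind = sock.local_addr().port();
    if after_bind == 0 {
        return Err("getsockname() still reports port 0 after bind()".to_string());
    }
    sock.listen();
    let after_listen = sock.local_addr().port();
    if after_listen != after_bind {
        return Err(format!("port changed across listen(): {after_bind} -> {after_listen}"));
    }
    Ok(after_bind)
}

/// Models the behavior described in the issue: the port is assigned only at listen().
struct LateBindSocket {
    addr: Option<SocketAddr>,
}

impl TestSocket for LateBindSocket {
    fn bind(&mut self, addr: SocketAddr) {
        self.addr = Some(addr);
    }
    fn listen(&mut self) {
        if let Some(a) = self.addr.as_mut() {
            if a.port() == 0 {
                a.set_port(49152); // arbitrary illustrative port
            }
        }
    }
    fn local_addr(&self) -> SocketAddr {
        self.addr.expect("socket is bound")
    }
}

/// Models the desired behavior: the port is assigned at bind().
struct EagerBindSocket {
    addr: Option<SocketAddr>,
}

impl TestSocket for EagerBindSocket {
    fn bind(&mut self, mut addr: SocketAddr) {
        if addr.port() == 0 {
            addr.set_port(49152); // arbitrary illustrative port
        }
        self.addr = Some(addr);
    }
    fn listen(&mut self) {}
    fn local_addr(&self) -> SocketAddr {
        self.addr.expect("socket is bound")
    }
}
```

The check fails on `LateBindSocket` at step 3 and passes on `EagerBindSocket`, which is the distinction the regression test is meant to pin down.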

## 2. Introduce a real TCP-bound socket state in `virtual-net`

Extend `lib/virtual-net` with a TCP bind API and a corresponding bound-socket type that can:

- return `addr_local()`
- transition into a TCP listener
- transition into a TCP stream connection

At a minimum, the new backend capability needs to preserve the actual local port selected during `bind()`.
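
A simplified sketch of what such a bound-socket capability could look like follows. The method names follow the three capabilities listed above; `BoundTcp`, `Listener`, `Stream`, and `ToyBound` are hypothetical, and the real crate would return its own boxed trait objects and error type rather than these stand-ins.

```rust
use std::io;
use std::net::SocketAddr;

/// Hypothetical stand-ins for the crate's listener and stream types.
#[derive(Debug, PartialEq)]
struct Listener {
    local: SocketAddr,
}

#[derive(Debug, PartialEq)]
struct Stream {
    local: SocketAddr,
    peer: SocketAddr,
}

/// Simplified shape of a TCP socket that is bound but not yet listening or connected.
trait BoundTcp {
    /// Effective local address chosen at bind time.
    fn addr_local(&self) -> io::Result<SocketAddr>;
    /// Starts listening on the already-bound address.
    fn listen(&mut self) -> io::Result<Listener>;
    /// Connects outward from the already-bound address.
    fn connect(&mut self, peer: SocketAddr) -> io::Result<Stream>;
}

/// Toy in-memory implementation that only carries the address through.
struct ToyBound {
    local: SocketAddr,
}

impl BoundTcp for ToyBound {
    fn addr_local(&self) -> io::Result<SocketAddr> {
        Ok(self.local)
    }
    fn listen(&mut self) -> io::Result<Listener> {
        Ok(Listener { local: self.local })
    }
    fn connect(&mut self, peer: SocketAddr) -> io::Result<Stream> {
        Ok(Stream { local: self.local, peer })
    }
}
```

The point of the sketch is that both transitions start from the same stored local address, so the port observed after `bind()` is the one the listener or stream ends up with.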

## 3. Implement the new backend path

### Host backend

Update `lib/virtual-net/src/host.rs` to create a TCP socket explicitly, apply socket options, call `bind()`, and read back the effective local address before any later `listen()` or `connect()` step.

This likely requires `socket2`, similar to the existing UDP bind implementation.

### Loopback backend

Update `lib/virtual-net/src/loopback.rs` so a TCP bind to port `0` allocates an ephemeral port during bind, rather than preserving `0` until listen.
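
One way a loopback backend could allocate the port at bind time is sketched below. `PortTable` and its methods are hypothetical and do not reflect the existing `loopback.rs` structure; the range 49152-65535 is the IANA dynamic port range, chosen here as an assumption (hosts often use a different ephemeral range).

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// IANA dynamic/private port range, assumed here as the ephemeral range.
const EPHEMERAL_START: u16 = 49152;
const EPHEMERAL_END: u16 = 65535;

/// Hypothetical bookkeeping for a loopback backend: the set of bound TCP addresses.
#[derive(Default)]
struct PortTable {
    bound: HashSet<SocketAddr>,
}

impl PortTable {
    /// Binds `addr`, replacing port 0 with the lowest free ephemeral port.
    /// Returns the effective address, or `None` if the address is already
    /// bound or the ephemeral range is exhausted.
    fn bind_tcp(&mut self, mut addr: SocketAddr) -> Option<SocketAddr> {
        if addr.port() == 0 {
            let free = (EPHEMERAL_START..=EPHEMERAL_END).find(|&port| {
                let mut candidate = addr;
                candidate.set_port(port);
                !self.bound.contains(&candidate)
            })?;
            addr.set_port(free);
        }
        // `insert` returns false when the address was already present.
        self.bound.insert(addr).then_some(addr)
    }

    /// Releases a previously bound address; returns whether it was bound.
    fn release(&mut self, addr: &SocketAddr) -> bool {
        self.bound.remove(addr)
    }
}
```

Because the address is inserted into the table during `bind_tcp`, the port is reserved from that moment on, which matches the requirement that the reported port is also the held port.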

### Remote client/server backend

Update `lib/virtual-net/src/meta.rs`, `client.rs`, and `server.rs` to carry the new TCP bind operation across the remote networking protocol.

Without this, Wasix behavior will diverge depending on which backend is active.

## 4. Update the Wasix socket state machine

In `lib/wasix/src/net/socket.rs`:

- make TCP `bind()` return a real upgraded socket object instead of `Ok(None)`
- add a socket state representing “TCP socket bound locally but not yet listening/connected”
- make `addr_local()` read the effective address from that bound socket state
- make `listen()` consume the bound socket instead of rebinding from scratch
- make `connect()` also honor the previously bound local address

This keeps bind/listen/connect semantics aligned and avoids reporting a port that is not actually reserved.
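
The intended transitions can be modeled as a small state machine. This is a deliberately reduced sketch: `TcpState` and its variants are hypothetical, and the real socket enum in `socket.rs` has more states and holds backend socket objects rather than bare addresses.

```rust
use std::net::SocketAddr;

/// Hypothetical, simplified TCP socket states.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TcpState {
    /// Created, nothing bound yet.
    Unbound,
    /// Bound locally with the effective address, not yet listening or connected.
    Bound { local: SocketAddr },
    Listening { local: SocketAddr },
    Connected { local: SocketAddr, peer: SocketAddr },
}

impl TcpState {
    /// `effective` is the address the backend reports after a real bind.
    fn bind(self, effective: SocketAddr) -> Result<TcpState, &'static str> {
        match self {
            TcpState::Unbound => Ok(TcpState::Bound { local: effective }),
            _ => Err("socket is already bound"),
        }
    }

    /// Consumes the bound state instead of binding again from scratch.
    fn listen(self) -> Result<TcpState, &'static str> {
        match self {
            TcpState::Bound { local } => Ok(TcpState::Listening { local }),
            _ => Err("listen() requires a bound socket"),
        }
    }

    /// Keeps the previously bound local address when connecting.
    fn connect(self, peer: SocketAddr) -> Result<TcpState, &'static str> {
        match self {
            TcpState::Bound { local } => Ok(TcpState::Connected { local, peer }),
            _ => Err("this sketch only models bind-then-connect"),
        }
    }

    /// Local address as seen by `getsockname()`.
    fn addr_local(&self) -> Option<SocketAddr> {
        match self {
            TcpState::Unbound => None,
            TcpState::Bound { local }
            | TcpState::Listening { local }
            | TcpState::Connected { local, .. } => Some(*local),
        }
    }
}
```

In this model `addr_local()` returns the same address in the `Bound`, `Listening`, and `Connected` states, which is the invariant the bullets above describe.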

## 5. Fix journaling semantics

`lib/wasix/src/syscalls/wasix/sock_bind.rs` currently journals the requested address read from guest memory. For a port-`0` bind that records `host:0`, not the port the backend actually assigned.

After the functional fix:

- query the effective local address after `sock_bind` succeeds
- journal that effective address instead of the requested `host:0`

Otherwise journal replay can observe a different port from the one the program originally saw.
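
A minimal sketch of the ordering, assuming a hypothetical `JournalEntry` type and a `bind` callback standing in for the real syscall path (neither reflects the actual journal format):

```rust
use std::net::SocketAddr;

/// Hypothetical journal entry; the real journal format is not modeled here.
#[derive(Debug, PartialEq)]
enum JournalEntry {
    SockBind { fd: u32, addr: SocketAddr },
}

/// Performs the bind through `bind`, then journals the address the backend
/// actually chose. Nothing is journaled if the bind fails.
fn bind_and_journal<F>(
    fd: u32,
    requested: SocketAddr,
    bind: F,
    journal: &mut Vec<JournalEntry>,
) -> Result<SocketAddr, String>
where
    F: FnOnce(SocketAddr) -> Result<SocketAddr, String>,
{
    let effective = bind(requested)?;
    journal.push(JournalEntry::SockBind { fd, addr: effective });
    Ok(effective)
}
```

Recording after the bind succeeds means the journal never contains `host:0` for a successful port-`0` bind, so a replay can request the exact port the program originally observed.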

## Implementation Order

1. Add the Wasix regression test for `bind(..., 0)` + `getsockname()`.
2. Add the new TCP bind abstraction in `virtual-net`.
3. Implement the host backend first.
4. Update Wasix socket state transitions to use the real bound socket.
5. Update journaling to store the effective address.
6. Extend loopback and remote client/server backends.
7. Run targeted socket tests and any relevant `virtual-net` tests.

## Non-Goals

- Faking `getsockname()` by inventing a port in Wasix state without actually reserving it
- Fixing only the listen path while leaving bind-then-connect semantics inconsistent
- Fixing only the host backend and leaving other `virtual-net` backends with different behavior

## Expected Outcome

After the fix:

- `bind(("127.0.0.1", 0))` allocates a real ephemeral port immediately
- `getsockname()` reports the assigned port right after `bind()`
- `listen()` keeps the same local port
- journal replay preserves the same observed bound address
105 changes: 105 additions & 0 deletions lib/virtual-net/src/client.rs
@@ -54,6 +54,7 @@ use crate::VirtualIoSource;
use crate::VirtualNetworking;
use crate::VirtualRawSocket;
use crate::VirtualSocket;
use crate::VirtualTcpBoundSocket;
use crate::VirtualTcpListener;
use crate::VirtualTcpSocket;
use crate::VirtualUdpSocket;
@@ -276,6 +277,7 @@ impl RemoteNetworkingClient {
            buffer_accept: Default::default(),
            buffer_recv_with_addr: Default::default(),
            send_available: 0,
            owns_socket_bindings: true,
        }
    }
}
@@ -760,6 +762,39 @@ impl VirtualNetworking for RemoteNetworkingClient {
        }
    }

    async fn bind_tcp(
        &self,
        addr: SocketAddr,
        only_v6: bool,
        reuse_port: bool,
        reuse_addr: bool,
    ) -> Result<Box<dyn VirtualTcpBoundSocket + Sync>> {
        let socket_id: SocketId = self
            .common
            .socket_seed
            .fetch_add(1, Ordering::SeqCst)
            .into();
        match self
            .common
            .io_iface(RequestType::BindTcp {
                socket_id,
                addr,
                only_v6,
                reuse_port,
                reuse_addr,
            })
            .await
        {
            ResponseType::Err(err) => Err(err),
            ResponseType::None => Ok(Box::new(self.new_socket(socket_id))),
            ResponseType::Socket(socket_id) => Ok(Box::new(self.new_socket(socket_id))),
            res => {
                tracing::debug!("invalid response to bind TCP request - {res:?}");
                Err(NetworkError::IOError)
            }
        }
    }

    async fn bind_udp(
        &self,
        addr: SocketAddr,
@@ -880,9 +915,13 @@ struct RemoteSocket {
    buffer_recv_with_addr: VecDeque<DataWithAddr>,
    buffer_accept: VecDeque<SocketWithAddr>,
    send_available: u64,
    owns_socket_bindings: bool,
}
impl Drop for RemoteSocket {
    fn drop(&mut self) {
        if !self.owns_socket_bindings {
            return;
        }
        self.common.recv_tx.lock().unwrap().remove(&self.socket_id);
        self.common
            .recv_with_addr_tx
@@ -941,6 +980,31 @@ impl RemoteSocket {
        self.pending_accept.replace((child_id, rx_recv));
        Ok(())
    }

    fn transition_socket(&mut self) -> RemoteSocket {
        let (_tx_recv, rx_recv) = tokio::sync::mpsc::channel(1);
        let (_tx_recv_with_addr, rx_recv_with_addr) = tokio::sync::mpsc::channel(1);
        let (_tx_accept, rx_accept) = tokio::sync::mpsc::channel(1);
        let (_tx_sent, rx_sent) = tokio::sync::mpsc::channel(1);

        self.owns_socket_bindings = false;

        RemoteSocket {
            socket_id: self.socket_id,
            common: self.common.clone(),
            rx_buffer: std::mem::take(&mut self.rx_buffer),
            rx_recv: std::mem::replace(&mut self.rx_recv, rx_recv),
            rx_recv_with_addr: std::mem::replace(&mut self.rx_recv_with_addr, rx_recv_with_addr),
            tx_waker: self.tx_waker.clone(),
            rx_accept: std::mem::replace(&mut self.rx_accept, rx_accept),
            rx_sent: std::mem::replace(&mut self.rx_sent, rx_sent),
            pending_accept: self.pending_accept.take(),
            buffer_recv_with_addr: std::mem::take(&mut self.buffer_recv_with_addr),
            buffer_accept: std::mem::take(&mut self.buffer_accept),
            send_available: self.send_available,
            owns_socket_bindings: true,
        }
    }
}

impl VirtualIoSource for RemoteSocket {
@@ -1121,6 +1185,7 @@ impl VirtualTcpListener for RemoteSocket {
            buffer_accept: Default::default(),
            buffer_recv_with_addr: Default::default(),
            send_available: 0,
            owns_socket_bindings: true,
        };
        Ok((Box::new(socket), accepted.addr))
    }
@@ -1159,6 +1224,46 @@ impl VirtualTcpListener for RemoteSocket {
    }
}

impl VirtualTcpBoundSocket for RemoteSocket {
    fn addr_local(&self) -> Result<SocketAddr> {
        VirtualSocket::addr_local(self)
    }

    fn listen(&mut self) -> Result<Box<dyn VirtualTcpListener + Sync>> {
        match block_on(self.io_socket(RequestType::ListenBound)) {
            ResponseType::Err(err) => Err(err),
            ResponseType::None => {
                let mut socket = self.transition_socket();
                socket.touch_begin_accept().ok();
                Ok(Box::new(socket))
            }
            res => {
                tracing::debug!("invalid response to listen bound request - {res:?}");
                Err(NetworkError::IOError)
            }
        }
    }

    fn connect(&mut self, peer: SocketAddr) -> Result<Box<dyn VirtualTcpSocket + Sync>> {
        match block_on(self.io_socket(RequestType::ConnectBound { peer })) {
            ResponseType::Err(err) => Err(err),
            ResponseType::None => Ok(Box::new(self.transition_socket())),
            res => {
                tracing::debug!("invalid response to connect bound request - {res:?}");
                Err(NetworkError::IOError)
            }
        }
    }

    fn set_ttl(&mut self, ttl: u32) -> Result<()> {
        VirtualSocket::set_ttl(self, ttl)
    }

    fn ttl(&self) -> Result<u32> {
        VirtualSocket::ttl(self)
    }
}

impl VirtualRawSocket for RemoteSocket {
    fn try_send(&mut self, data: &[u8]) -> Result<usize> {
        let mut cx = Context::from_waker(&self.tx_waker);