fix(ipc): retry unix socket comms if none available #117
Merged
Pull request overview
This PR aims to make IPC over Unix sockets more resilient during component startup by retrying sends when the target socket isn’t immediately available, and by switching the hub startup flow from a fixed sleep to explicit component readiness checks.
Changes:
- Add a short retry loop waiting for the target Unix socket file to appear before connecting/sending.
- Add `tracing` dependency to the `mate_ipc` crate to support new debug logging.
- Replace a fixed startup delay in the hub with `wait_for_components()` readiness checks (plus minor import cleanup in CLI).
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/ipc/src/transport/unix_socket.rs | Adds socket-availability polling + debug logging before connect/send. |
| src/ipc/Cargo.toml | Adds tracing as a dependency for IPC crate. |
| src/cli/src/transport.rs | Import formatting cleanup only. |
| src/cli/src/process/hub.rs | Replaces fixed sleep with wait_for_components() during process spawn. |
| Cargo.lock | Locks tracing dependency addition. |
Comments suppressed due to low confidence (2)
src/ipc/src/transport/unix_socket.rs:200
- The new socket-availability wait only sleeps 10ms * 3 plus connect retries (10ms + 20ms), so send_message_internal will fail after ~60ms if the target component is still starting. Previously the CLI waited 1s before pinging components, so this change can reintroduce flaky startup failures on slower machines/CI. Consider retrying up to a time-based deadline (e.g., a few seconds) with exponential backoff, or make the retry count/delay configurable.
    let target_socket = Self::socket_path_for_process(&self.base_path, &msg.to);
    let mut tries = 0;
    if !target_socket.exists() {
        loop {
            if target_socket.exists() {
                break;
            }
            if tries >= UNIX_SOCKET_CONNECTION_RETRIES {
                return Err(anyhow!(
                    "Target process {:?} socket does not exist at {:?}",
                    msg.to,
                    target_socket
                ));
            }
            sleep(Duration::from_millis(10)).await;
            tries += 1;
            debug!(
                "Waiting for target process {:?} socket to be available at {:?} (attempt {}/{})",
                msg.to, target_socket, tries, UNIX_SOCKET_CONNECTION_RETRIES
            );
        }
    }
    let mut stream =
        Self::connect_with_retry(&target_socket, UNIX_SOCKET_CONNECTION_RETRIES).await?;
    let serialized = serde_json::to_vec(msg)?;
    let len = (serialized.len() as u32).to_le_bytes();
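
The deadline-based retry suggested in the comment above could look roughly like the following. This is a minimal sketch, not the PR's code: it is synchronous and uses only the standard library (the PR's code is async), and `connect_with_deadline` and its parameters are hypothetical names chosen for illustration.

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: keep trying to connect until `deadline` has elapsed,
/// doubling the delay between attempts up to `max_delay`.
fn connect_with_deadline(
    path: &Path,
    deadline: Duration,
    initial_delay: Duration,
    max_delay: Duration,
) -> io::Result<UnixStream> {
    let start = Instant::now();
    let mut delay = initial_delay;
    loop {
        match UnixStream::connect(path) {
            Ok(stream) => return Ok(stream),
            Err(err) => {
                let elapsed = start.elapsed();
                if elapsed >= deadline {
                    // Out of time: surface the last connect error.
                    return Err(err);
                }
                // Never sleep past the deadline.
                let remaining = deadline - elapsed;
                sleep(delay.min(remaining));
                // Exponential backoff, capped at `max_delay`.
                delay = (delay * 2).min(max_delay);
            }
        }
    }
}
```

With a budget of a few seconds, a slow-starting component has time to create its socket, while a fast one is still reached after the first short delay. Whether the budget should be configurable is a design choice left open here.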
src/ipc/src/transport/unix_socket.rs:200
- There are now two layers of retry logic: a manual `target_socket.exists()` polling loop and then `connect_with_retry(...)`, which already retries on connect errors (including ENOENT). This duplication increases complexity and extends the overall wait in a hard-to-reason-about way; it would be simpler to consolidate into a single retry/backoff path (and optionally improve the error message when the last error is ENOENT).
    if !target_socket.exists() {
        loop {
            if target_socket.exists() {
                break;
            }
            if tries >= UNIX_SOCKET_CONNECTION_RETRIES {
                return Err(anyhow!(
                    "Target process {:?} socket does not exist at {:?}",
                    msg.to,
                    target_socket
                ));
            }
            sleep(Duration::from_millis(10)).await;
            tries += 1;
            debug!(
                "Waiting for target process {:?} socket to be available at {:?} (attempt {}/{})",
                msg.to, target_socket, tries, UNIX_SOCKET_CONNECTION_RETRIES
            );
        }
    }
    let mut stream =
        Self::connect_with_retry(&target_socket, UNIX_SOCKET_CONNECTION_RETRIES).await?;
    let serialized = serde_json::to_vec(msg)?;
    let len = (serialized.len() as u32).to_le_bytes();
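
One way to read the consolidation suggested above is a single loop that relies on the connect call itself to report a missing socket, and only rewrites the error message once the attempts are used up. The sketch below assumes a synchronous, standard-library-only setting; `connect_single_path` and its parameters are hypothetical and do not reflect the actual `connect_with_retry` signature, which is not shown in this review.

```rust
use std::io::{self, ErrorKind};
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical single retry path: no separate `exists()` polling loop.
/// A missing socket file shows up as `ErrorKind::NotFound` (ENOENT) from
/// `connect`, so one loop covers both "not created yet" and "not accepting yet".
fn connect_single_path(
    path: &Path,
    retries: u32,
    base_delay: Duration,
) -> io::Result<UnixStream> {
    let mut attempt = 0;
    loop {
        match UnixStream::connect(path) {
            Ok(stream) => return Ok(stream),
            Err(err) if attempt >= retries => {
                // Attempts exhausted: give ENOENT a more descriptive message,
                // pass any other error through unchanged.
                return Err(if err.kind() == ErrorKind::NotFound {
                    io::Error::new(
                        ErrorKind::NotFound,
                        format!(
                            "target socket does not exist at {:?} after {} retries",
                            path, retries
                        ),
                    )
                } else {
                    err
                });
            }
            Err(_) => {
                attempt += 1;
                // Linear backoff: base_delay, 2 * base_delay, ...
                sleep(base_delay * attempt);
            }
        }
    }
}
```

Because there is only one loop, the worst-case wait is the sum of the backoff delays, which is easier to reason about than two stacked retry layers.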
Instead of sleeping to give processes time to start, we now use IPC to check that they are available.
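
As a rough illustration of the difference between a fixed sleep and a readiness check, the sketch below polls each component until it answers or a timeout expires. It is only an assumption about the general shape of such a check: `wait_until_ready`, the `is_ready` probe, and the parameters are hypothetical, and the real `wait_for_components()` implementation is not shown on this page.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical readiness check: polls `is_ready` for every pending component
/// until all of them respond or `timeout` elapses. On timeout, returns the
/// names of the components that never answered.
fn wait_until_ready<F>(
    components: &[&str],
    mut is_ready: F,
    timeout: Duration,
    poll_interval: Duration,
) -> Result<(), Vec<String>>
where
    F: FnMut(&str) -> bool,
{
    let start = Instant::now();
    let mut pending: Vec<&str> = components.to_vec();
    loop {
        // Keep only the components that are still not responding.
        pending.retain(|name| !is_ready(*name));
        if pending.is_empty() {
            return Ok(());
        }
        if start.elapsed() >= timeout {
            return Err(pending.iter().map(|name| name.to_string()).collect());
        }
        sleep(poll_interval);
    }
}
```

In a real hub the probe would be an IPC ping to the component; here it is a closure so the polling logic can be exercised on its own. Unlike a fixed delay, this returns as soon as every component answers and reports which ones did not.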