[Design Question] IPC timeout behaviour undefined - transport has no socket timeout and fail-open/fail-closed is unresolved per hook #23

## Summary

While reviewing the Phase 1 implementation I found that the Python transport 
has no socket timeout set on the UDS path. A stalled or slow sidecar will 
block the agent thread indefinitely. This also surfaces a broader design 
question that was raised in the project Slack but never formally resolved: 
what is the intended fail behaviour per hook when IPC fails or times out?

## The concrete issue in transport.py

In `_connect_and_send_uds` (transport.py:75):
```python
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
    sock.connect(self.socket_path)
    sock.sendall(frame_bytes)
    return self._read_response(sock)
```

No `sock.settimeout()` is called. If the sidecar stalls mid-response, 
`_read_response` blocks on `sock.recv()` forever.
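Below is a minimal sketch of the fix. The real framing in transport.py is not shown here, so it assumes `_read_response` can be approximated by a read-until-EOF loop. `read_until_eof`, `connect_and_send_uds` and `demo_stalled_sidecar` are hypothetical stand-ins. The sketch reproduces the stall with a local UDS listener that queues the connection in its backlog but never replies. Because `settimeout()` is called before `connect()`, the read raises `socket.timeout` instead of hanging.

```python
import os
import socket
import tempfile


def read_until_eof(sock: socket.socket) -> bytes:
    # Hypothetical stand-in for _read_response: read until the peer closes.
    chunks = []
    while True:
        chunk = sock.recv(4096)  # raises socket.timeout once the timeout elapses
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def connect_and_send_uds(socket_path: str, frame_bytes: bytes, timeout_s: float) -> bytes:
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        # Applies to connect(), sendall() and every subsequent recv().
        sock.settimeout(timeout_s)
        sock.connect(socket_path)
        sock.sendall(frame_bytes)
        return read_until_eof(sock)


def demo_stalled_sidecar(timeout_s: float = 0.1) -> bool:
    """Return True if a silent sidecar produces a timeout rather than a hang."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sidecar.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)  # connection sits in the backlog; accept() is never called
            try:
                connect_and_send_uds(path, b"frame", timeout_s)
            except socket.timeout:
                return True
    return False
```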

Additionally, the retry logic in `send()` only catches
`ConnectionRefusedError` and `FileNotFoundError`. Once a timeout is set,
the resulting `socket.timeout` (an alias of `TimeoutError` since Python 3.10)
would not be retried and would propagate to the caller as an unhandled error.
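One option is a retry wrapper that takes the retry set as an explicit parameter. That would reduce the decision in question 4 below to a one-line change. `send_with_retry` and `CONNECT_ERRORS` are hypothetical names and do not reflect the existing `send()` signature.

```python
import socket
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")

# The errors send() is described as retrying today: sidecar not (yet) listening.
CONNECT_ERRORS: Tuple[Type[BaseException], ...] = (ConnectionRefusedError, FileNotFoundError)


def send_with_retry(
    attempt: Callable[[], T],
    retries: int = 3,
    backoff_s: float = 0.01,
    retry_on: Tuple[Type[BaseException], ...] = CONNECT_ERRORS,
) -> T:
    """Call attempt() up to retries + 1 times, retrying only on retry_on errors."""
    last_exc: Optional[BaseException] = None
    for i in range(retries + 1):
        try:
            return attempt()
        except retry_on as exc:  # anything else (e.g. socket.timeout) propagates at once
            last_exc = exc
            if i < retries:
                time.sleep(backoff_s * (2 ** i))  # exponential backoff
    assert last_exc is not None
    raise last_exc
```

If timeouts are added to the retry set, the worst-case latency of a single call becomes roughly `(retries + 1) * timeout` plus backoff. That exceeds the ~10ms budget unless the timeout is divided across attempts.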

## Why this matters for a security layer

The architecture doc specifies a 4-8ms typical latency budget and ~10ms 
worst-case. But there is currently no enforcement of that budget on the 
SDK side. An attacker who can induce load on the sidecar process can 
stall every agent call that goes through the firewall.
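`settimeout()` bounds each individual socket operation, not the call as a whole. A sidecar that drips one byte every few milliseconds never trips a per-`recv` timeout, yet it can hold the agent well past the budget. One way to enforce the budget on the SDK side is an overall deadline that shrinks the socket timeout before every `recv`. This sketch assumes a fixed-size read, such as the body of a length-prefixed frame. `recv_exact_with_deadline` is a hypothetical helper and only covers the read path.

```python
import socket
import time


def recv_exact_with_deadline(sock: socket.socket, n: int, budget_s: float) -> bytes:
    """Read exactly n bytes, failing if the total time exceeds budget_s."""
    deadline = time.monotonic() + budget_s
    buf = bytearray()
    while len(buf) < n:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise socket.timeout(f"latency budget of {budget_s * 1000:.0f} ms exceeded")
        # Only the time left in the budget; never 0, which would mean non-blocking.
        sock.settimeout(remaining)
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("sidecar closed the connection mid-response")
        buf.extend(chunk)
    return bytes(buf)
```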

The consequence depends on the fail behaviour, which is currently 
undefined:

- **Fail-open** (let the call proceed if firewall is unreachable): 
  the attacker has bypassed the enforcement layer entirely
- **Fail-closed** (block the call if firewall is unreachable): 
  the attacker has created a denial of service

Neither is acceptable as a silent default. This needs to be an explicit, 
configurable decision.
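A minimal sketch of making the fail decision explicit follows. It assumes every transport failure surfaces as an `OSError` subclass, which holds for `ConnectionRefusedError`, `FileNotFoundError` and `socket.timeout`. `FailMode`, `FirewallUnavailable` and `guarded` are hypothetical names. Both branches log, so neither mode is silent. Fail-closed raises a dedicated exception, so callers can tell "firewall unreachable" apart from a policy denial.

```python
import enum
import logging
from typing import Callable

log = logging.getLogger("acf.transport")  # hypothetical logger name


class FailMode(enum.Enum):
    OPEN = "fail_open"
    CLOSED = "fail_closed"


class FirewallUnavailable(Exception):
    """Hypothetical: raised to block a call when the sidecar cannot give a verdict."""


def guarded(hook: str, inspect: Callable[[], bool], mode: FailMode) -> bool:
    """Return True if the call may proceed; inspect() returns the sidecar's verdict."""
    try:
        return inspect()
    except OSError as exc:  # covers refused, missing socket file and timeouts
        if mode is FailMode.OPEN:
            log.warning("%s: sidecar unreachable (%s); failing open", hook, exc)
            return True
        log.error("%s: sidecar unreachable (%s); failing closed", hook, exc)
        raise FirewallUnavailable(f"{hook}: sidecar unreachable") from exc
```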

## The design question

Different hooks have different risk profiles and the right fail mode 
probably differs per hook:

| Hook | Suggested default | Reasoning |
|---|---|---|
| `on_prompt` | fail-closed | blocking a turn is recoverable |
| `on_tool_call` | fail-closed | tool execution without inspection is unsafe |
| `on_context` | configurable | degraded RAG is acceptable in some deployments |
| `on_memory` | fail-closed | a poisoned write that bypasses inspection persists |

This could be expressed in `sidecar.yaml` under each hook definition, 
similar to how `on_ipc_timeout` was proposed in the policy taxonomy 
discussion.
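Here is a sketch of how per-hook defaults could be resolved against the parsed `sidecar.yaml`. It assumes each hook section carries an `on_ipc_timeout` key whose value is `fail_open` or `fail_closed`. The key name comes from the policy taxonomy proposal; the value spellings and the dict shape are assumptions. `on_context` deliberately has no default. A deployment that never chooses therefore fails at load time instead of silently getting one mode or the other.

```python
from typing import Any, Dict, Mapping, Optional

# Defaults from the table above; None means "no default, the deployment must choose".
DEFAULT_FAIL_MODES: Dict[str, Optional[str]] = {
    "on_prompt": "fail_closed",
    "on_tool_call": "fail_closed",
    "on_context": None,
    "on_memory": "fail_closed",
}
VALID_MODES = {"fail_open", "fail_closed"}


def resolve_fail_modes(hooks_config: Mapping[str, Mapping[str, Any]]) -> Dict[str, str]:
    """hooks_config: assumed shape {hook_name: {"on_ipc_timeout": mode, ...}}."""
    resolved: Dict[str, str] = {}
    for hook, default in DEFAULT_FAIL_MODES.items():
        mode = hooks_config.get(hook, {}).get("on_ipc_timeout", default)
        if mode is None:
            raise ValueError(f"{hook}: on_ipc_timeout must be set explicitly")
        if mode not in VALID_MODES:
            raise ValueError(f"{hook}: invalid on_ipc_timeout {mode!r}")
        resolved[hook] = mode
    return resolved
```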

## Questions for the mentor

1. What is the intended fail behaviour when the sidecar is unreachable 
   or times out: fail-open or fail-closed?
2. Should this be configurable per hook, or a single global setting for v1?
3. Should the SDK enforce the latency budget with a configurable timeout 
   (e.g. `ACF_TIMEOUT_MS` env var), or is that the sidecar's responsibility?
4. Should `socket.timeout` be included in the retry set, or should it 
   propagate immediately as a hard failure?
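For question 3, here is a sketch of reading the timeout from the environment, in case the SDK does own enforcement. `ACF_TIMEOUT_MS` is the name suggested above. The 10 ms default (the architecture doc's worst case), the `timeout_from_env` helper and its validation rules are assumptions. Zero is rejected because `socket.settimeout(0)` switches the socket to non-blocking mode rather than meaning "no timeout".

```python
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT_MS = 10  # assumed default: the ~10 ms worst case from the architecture doc


def timeout_from_env(env: Optional[Mapping[str, str]] = None) -> float:
    """Return the IPC timeout in seconds, suitable for socket.settimeout()."""
    env = os.environ if env is None else env
    raw = env.get("ACF_TIMEOUT_MS")
    if raw is None or raw.strip() == "":
        return DEFAULT_TIMEOUT_MS / 1000
    try:
        ms = int(raw)
    except ValueError:
        raise ValueError(f"ACF_TIMEOUT_MS must be an integer (ms), got {raw!r}") from None
    if ms <= 0:
        raise ValueError("ACF_TIMEOUT_MS must be positive")
    return ms / 1000
```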

Once the direction is decided, I can put together a PR implementing it.
